tl;dr: One peer generates a self-signed certificate and sends the fingerprint of that over the signalling channel; the other connects to it as a "client".
The resulting DTLS keying material is subsequently used for SRTP encryption (for media) and SCTP over DTLS (for the data channel, which is presumably what's being used here).