Sometimes. The VoIP world has been dealing with this for years. The process of finding and connecting the clients is called signaling, and that typically requires a shared relay (a server both clients can reach). The signaling messages can carry information on how each client can be reached directly. In SIP, each client will typically use a STUN server to discover its public IP address and port, and a process called ICE allows the clients to iteratively try those candidate addresses until they establish a connection. Depending on how the NATs behave, a direct connection may not be possible, so clients will also include the address of a TURN relay server as a fallback candidate.
The challenges with Croc here are largely the same, but with data as the media instead of voice or video. (Although, “VoIP” can also handle data in this way. See WebRTC data channels.)
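To make the STUN/ICE/TURN fallback above concrete, here is a rough Python sketch of the "try candidates in order" idea. It is heavily simplified: real ICE checks pairs of local and remote candidates and runs STUN connectivity checks, whereas this only ranks one side's candidates and probes them in turn. The `Candidate` class, the `first_reachable` helper and the `probe` callback are hypothetical names for illustration and not part of Croc or any SIP stack. The priority formula and type preferences are the ones recommended in RFC 8445.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

# Type preferences recommended by RFC 8445: direct host addresses first,
# STUN-discovered (server reflexive) addresses next, TURN relays last.
TYPE_PREFERENCE = {"host": 126, "srflx": 100, "relay": 0}


@dataclass(frozen=True)
class Candidate:
    """Hypothetical container for one address a peer might be reachable at."""
    kind: str  # "host", "srflx" (learned via STUN) or "relay" (TURN)
    ip: str
    port: int


def priority(c: Candidate, local_pref: int = 65535, component: int = 1) -> int:
    """RFC 8445 formula: 2^24 * type pref + 2^8 * local pref + (256 - component)."""
    return (TYPE_PREFERENCE[c.kind] << 24) + (local_pref << 8) + (256 - component)


def first_reachable(
    candidates: Iterable[Candidate],
    probe: Callable[[Candidate], bool],
) -> Optional[Candidate]:
    """Probe candidates from highest to lowest priority; return the first that works.

    `probe` stands in for a real connectivity check (an assumption of this sketch).
    """
    for c in sorted(candidates, key=priority, reverse=True):
        if probe(c):
            return c
    return None
```

With a `probe` that only succeeds for the relay, `first_reachable` tries the host address, then the STUN-discovered address, and finally settles on the TURN relay, which is the "direct if possible, relay otherwise" behavior described above.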
This blog post[1] may interest you. As you suggested, the workflow seems to be:
1. Try various techniques that might trick the firewalls on both ends into letting the connection through. This requires a relay for the initial negotiation only.