How we built WebRTC chat: top 3 lessons learned

WebRTC (Web Real-Time Communication) is an awesome new tech for video/audio chat directly inside your browser or mobile app. The best of its kind. But like many new technologies, it has some nasty pitfalls even for experienced developers. This is the story of our WebRTC chat app.

Today, I’ll explain what WebRTC is and tell you how to avoid three mistakes that can easily destroy your video chat project:

Pitfall 1: Not understanding the WebRTC technology
Pitfall 2: Choosing the wrong library
Pitfall 3: Using public STUN/TURN servers
Conclusions

The story starts in early February 2018.

On a cold rainy evening, a man appeared in front of our office. He was dressed in a gray trench coat and a rain-soaked fedora. The man’s name was [Redacted]. As we invited him to take shelter in the warmth of our server room, he made us an offer we couldn’t refuse.

We can’t, of course, tell you the exact nature of that offer. You see, he wanted us to implement a [Redacted] feature in his [Redacted] application. Well, you can think of it as a real-time video-chat. For spies.

Video conferences are planned for the next release; credit: Kingsman: the Secret Service

After a short discussion, we chose the perfect technology for the task: WebRTC.

It allows direct communication between browsers or mobile apps. Apps built on WebRTC include Google Hangouts, Facebook Messenger, and Discord.

What is WebRTC used for besides video/audio chat?

P2P file sharing.
Channeling large amounts of data (e.g. voice chat in multiplayer video games).
Online video conferences and webinars.
Live presentations.
Screen sharing.
Remotely controlling smart TVs.
Sharing data between IoT devices, etc.

Unlike previous solutions, it doesn’t require any plugins (e.g. Flash) or additional software.

It is free. It’s open source. It works in most modern browsers.

The whole feature was estimated to take less than 60 hours. But as soon as we got to the bottom of it, things got much more complicated.

Pitfall 1: Not understanding the WebRTC technology

At the beginning, we didn’t have any practical experience with WebRTC. Still, it seemed familiar. WebRTC was only released in 2011, but it packs ideas from many established domains such as VoIP communications, web development, and video streaming.

But WebRTC is a new tech. Its specification is in flux. Its implementation in browsers is constantly changing. The information you find about WebRTC can often be outdated or incorrect.

Soon we realized how incredibly optimistic that 60-hour estimate was 🙂

So my first advice would be to get a good understanding of what WebRTC is before you start developing your own application:

Know everything about the servers you must deploy for a WebRTC app.
Learn about the signaling process needed to establish a peer-to-peer connection.
Figure out how media is processed and transmitted.
Consult the experts when necessary.

It’s easy to overestimate your own understanding of the tech. On the other hand, developing your own solution requires serious investment and continuous development effort. So, if you have limited time or money, a better option would be to use a WebRTC platform.

Pitfall 2: Choosing the wrong library

After looking for a ready-made solution to implement and maintain our WebRTC connections, we settled on PeerJS.

The library is one of the most starred GitHub repositories related to WebRTC. It has tons of positive feedback from developers of similar projects.

Implementing WebRTC with PeerJS would’ve allowed us to work on the application logic level, instead of bogging us down in network protocols.

The library even comes with its own signaling server implementation.

Sounds fantastic, right?

Here’s the catch. The last commit to the repository was made 3.5 years ago.

You can’t use such an outdated library for your WebRTC project. The technology is evolving at an incredible pace and WebRTC code gets stale real fast.

Anything older than a couple of months is already outdated. Anything older than a year is pretty much dead.

After creating a quick PeerJS prototype, we tested it in various browsers. It turned out that not all of them supported our solution.

Here’s the official table for WebRTC browser support.

But in practice, things looked different. Google Chrome/Chromium had a great connection. At the same time, Edge, Safari, and Firefox on Linux had trouble establishing and maintaining the connection. So, which browsers really support WebRTC chat?
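A first, very rough filter is plain feature detection. The sketch below is only an illustration: `BrowserEnv` and `detectWebRtcSupport` are hypothetical names, and the check merely confirms that the standard `RTCPeerConnection` and `navigator.mediaDevices.getUserMedia` APIs are present. As our experience shows, a browser can pass this check and still struggle to keep a connection alive, so it doesn’t replace testing real calls.

```typescript
// Minimal shape of the browser globals we look at (hypothetical helper type).
interface BrowserEnv {
  RTCPeerConnection?: unknown;
  navigator?: { mediaDevices?: { getUserMedia?: unknown } };
}

interface WebRtcSupport {
  peerConnection: boolean; // can the browser create P2P connections at all?
  mediaCapture: boolean;   // can the page ask for a camera or microphone?
}

// Checks that the APIs exist. It does NOT prove that calls will be stable.
function detectWebRtcSupport(env: BrowserEnv): WebRtcSupport {
  return {
    peerConnection: typeof env.RTCPeerConnection === "function",
    mediaCapture:
      typeof env.navigator?.mediaDevices?.getUserMedia === "function",
  };
}
```

In a browser you would pass the global object (for example `window`) as `env`. Taking it as a parameter keeps the function easy to exercise outside a browser.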

But even if our PeerJS-based solution worked fine in the near future, there was no guarantee that it would continue working reliably in all the modern browsers as they get more updates.

In the end, we decided to find another library.

Our search led us to this list of two dozen libraries that could be used for a WebRTC project. However, of all the candidates, only SimpleWebRTC and EasyRTC met all our criteria.

Here’s what you should consider when choosing a WebRTC library.

Whether the project is still alive

Look for libraries that were updated in the last few months. Code that is older than a year might not even work anymore. Also, find out what the update was about and how often the library is updated.

Whether it has quality documentation
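If you are comparing a couple dozen candidates, this rule of thumb is easy to automate. Here is a minimal sketch: `classifyFreshness` is a hypothetical helper, and the thresholds (60 days for “a couple of months”, 365 days for a year) are my assumptions rather than hard limits. The last commit date has to come from wherever you track the repository.

```typescript
type Freshness = "fresh" | "outdated" | "probably dead";

const DAY_MS = 24 * 60 * 60 * 1000;

// Thresholds are assumptions: "a couple of months" is taken as 60 days.
function classifyFreshness(lastCommit: Date, now: Date): Freshness {
  const ageDays = (now.getTime() - lastCommit.getTime()) / DAY_MS;
  if (ageDays > 365) return "probably dead";
  if (ageDays > 60) return "outdated";
  return "fresh";
}
```

A library whose last commit is 3.5 years old lands firmly in the last bucket. Treat the result as a prompt to look closer, not as a verdict: a small, finished library may be quiet for good reasons.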

The quality of documentation varies greatly from one WebRTC library to another. The gold standard would be having an introduction to the library’s makeup and architecture, an API reference, explanation of the exposed properties and methods, the project’s showcase, info on how to install, configure, maintain and scale the solution, etc.

Quality documentation will save you a lot of headaches and help developers get started quickly. It will also help you better understand what you can achieve with the library.

Whether you understand the library’s code and can maintain it by yourself

This echoes the first pitfall. Great libraries come with comments explaining what the code actually does. This makes maintaining the code so much easier.

Whether it’s popular among the developers

An active community and support from the developers are strong indications that the solution is worth your attention. Popularity also means you can easily find answers to nagging questions or even hire people for your project.

Is the library a standalone or a hosted solution?

With hosted solutions, you get virtually zero control over the parts that are hosted on the developer’s servers. Standalone libraries, on the other hand, allow you to control every aspect of your WebRTC implementation.

Is there a back-end implementation available?

This could save you a ton of time, but sadly only a few libraries have got the servers covered.

And finally, is it configurable?

Pitfall 3: Using public STUN/TURN servers

Now that we have taken care of the browser compatibility issue, another problem has become our priority. Everything worked fine inside of our local network. But as soon as we tried to reach out beyond the firewall, the connection got wonky.

We discovered that the reason for the frequent disconnects was the public STUN/TURN servers that we used for our project.

But didn’t I just say that WebRTC is entirely P2P and doesn’t need servers?

Well, that’s in theory. In practice, it’s a bit more complicated.

Let’s say you want to get some mission updates from the agent [Redacted]. But to establish a P2P communication channel with the agent [Redacted], a WebRTC app first has to locate him.

With regular websites and servers, you’ll receive their IP address via a DNS server.

But the agent [Redacted] isn’t a server. You can’t discover his address that way.

Most PCs don’t have a public-facing IP. Your “friend” is likely surfing the Internet from the safety of his top secret LAN. His true IP is hidden behind firewalls and Network Address Translation (NAT) devices. These devices map his internal IP to the available public-facing IPs. Even the agent [Redacted] might be unaware of his external IP.

One way to discover his IP address and let him know where to send responses is using a STUN (Session Traversal Utilities for NAT) server.

When establishing a peer-to-peer connection, you first ask a STUN server to reveal your public-facing IP. After this, you can tell your friend how to contact you. He would, in turn, do the same thing.

Now that you both know each other’s IP, you can establish a P2P connection.
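You can see the result of the STUN step in the ICE candidates a browser gathers. The sketch below assumes the standard candidate attribute format; `parseCandidate` is a hypothetical helper and the addresses in the usage note are documentation examples. A `host` candidate is a local address, `srflx` (server reflexive) is the public address revealed by a STUN server, and `relay` is an address allocated on a TURN server.

```typescript
type CandidateType = "host" | "srflx" | "prflx" | "relay";

interface ParsedCandidate {
  protocol: string;    // "udp" or "tcp"
  address: string;     // IP address (or hostname) advertised to the other peer
  port: number;
  type: CandidateType; // how the address was discovered
}

// Parses lines such as:
// candidate:<foundation> <component> <protocol> <priority> <address> <port> typ <type> ...
// Returns null when the line does not look like an ICE candidate.
function parseCandidate(line: string): ParsedCandidate | null {
  const pattern =
    /^(?:a=)?candidate:\S+ \d+ (\S+) \d+ (\S+) (\d+) typ (host|srflx|prflx|relay)\b/;
  const match = pattern.exec(line);
  if (match === null) return null;
  return {
    protocol: match[1].toLowerCase(),
    address: match[2],
    port: Number(match[3]),
    type: match[4] as CandidateType,
  };
}
```

For example, `candidate:842163049 1 udp 1677729535 203.0.113.7 46154 typ srflx raddr 192.168.1.5 rport 46154` parses to a `srflx` candidate with the public address `203.0.113.7`, while the private `192.168.1.5` stays behind the NAT. In a browser, the input string would come from the `candidate` property of the candidates delivered to `RTCPeerConnection.onicecandidate`.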

But discovering each other’s IPs is just a small piece of the puzzle called signaling.

It consists of discovering the network, traversing NAT, establishing and managing sessions, securing the communication channel, handling the errors, etc.
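To make the idea less abstract, here is a minimal sketch of the messages two peers exchange through a signaling server. WebRTC doesn’t prescribe a signaling protocol, so everything here is an assumption: the message shapes, the `SignalingHub` class, and the in-memory delivery that stands in for a real server (typically a WebSocket service).

```typescript
// Messages two peers exchange before media can flow. The SDP and candidate
// payloads are opaque strings here; in a browser they come from RTCPeerConnection.
type SignalMessage =
  | { kind: "offer"; from: string; to: string; sdp: string }
  | { kind: "answer"; from: string; to: string; sdp: string }
  | { kind: "candidate"; from: string; to: string; candidate: string };

type Handler = (message: SignalMessage) => void;

// Hypothetical in-memory stand-in for a signaling server.
class SignalingHub {
  private readonly handlers = new Map<string, Handler>();

  // Registers a peer and the callback that receives its messages.
  join(peerId: string, handler: Handler): void {
    this.handlers.set(peerId, handler);
  }

  // Delivers a message to its recipient.
  // Returns false when the recipient is not connected.
  send(message: SignalMessage): boolean {
    const handler = this.handlers.get(message.to);
    if (handler === undefined) return false;
    handler(message);
    return true;
  }
}
```

A call then goes roughly like this: the caller sends an `offer`, the callee replies with an `answer`, and both sides keep sending `candidate` messages as their browsers discover new network paths. Session management, security, and error handling all sit on top of this exchange.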

But sometimes NAT devices and firewalls won’t let you establish a peer-to-peer connection.

In such a case, a TURN (Traversal Using Relays around NAT) server is used to relay data between the two browsers.

In WebRTC, using a TURN server is the last resort when the standard course of action fails.

When using a TURN server, browsers don’t need to understand how to connect to each other and send data between them. All they need to know is which TURN server to use as an intermediary.

For a smooth user experience, your TURN server should be pretty robust, have plenty of bandwidth, and be able to handle a lot of data.

In the end, we had to set up our own servers. This took care of the connectivity issues.

So what should you know about STUN/TURN servers to avoid the same mistakes?

While it’s possible to make a P2P connection work without any servers, a real-world project needs them for a reliable connection.
Never bank on free STUN (and especially TURN) servers.
With STUN servers you don’t need a particularly powerful machine. Our estimates show that a single video call adds about 10 KB of signaling traffic. STUN servers are also not that demanding when it comes to memory or processing power.
TURN servers can quickly become resource hogs. The bitrate for an HD video varies between 2 and 4 Mbps. This means that a ten-minute WebRTC video chat would eat up at least 150 MB of traffic. If your users make on average 1,000 calls a day and you relay just 10% of traffic via a TURN server, you’ll get about 30 GB per day. Nobody would offer a free-for-all TURN server that could handle so much traffic.
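The TURN arithmetic above is easy to rerun with your own numbers. This is a minimal sketch under stated assumptions: the function names are hypothetical, 1 GB is taken as 1,000 MB, and each relayed call is assumed to carry two streams (one per direction) at the low end of the HD range, 2 Mbps. Under those assumptions a ten-minute stream comes to 150 megabytes and 100 relayed calls come to roughly 30 gigabytes a day.

```typescript
// Megabytes transferred by one media stream:
// Mbit/s * seconds / 8 bits per byte.
function streamMegabytes(bitrateMbps: number, minutes: number): number {
  return (bitrateMbps * minutes * 60) / 8;
}

// Daily relay volume in gigabytes (1 GB = 1000 MB).
// streamsPerCall = 2 is an assumption: one video stream in each direction.
function dailyRelayGigabytes(
  callsPerDay: number,
  relayedShare: number, // fraction of calls that go through TURN, e.g. 0.1
  bitrateMbps: number,
  minutes: number,
  streamsPerCall: number = 2,
): number {
  const relayedCalls = callsPerDay * relayedShare;
  const megabytes =
    relayedCalls * streamsPerCall * streamMegabytes(bitrateMbps, minutes);
  return megabytes / 1000;
}
```

Calling `dailyRelayGigabytes(1000, 0.1, 2, 10)` reproduces the estimate. Keep in mind that a relay both receives and forwards every stream, so the bandwidth bill on the server side can be higher than the volume of media itself.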

Another thing to consider is that TURN servers are pretty sensitive to user geography. If you’ve got two users from the UK talking to each other via a West Coast TURN server, the lag alone would noticeably degrade the stream’s quality.

That’s why if you have a global audience, you’ll likely need several TURN servers in various places around the world. But usually, three TURN servers (TLS, UDP, and TCP) and a single STUN server are enough for the WebRTC server setup.
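On the client side, such a setup ends up as a list of ICE servers handed to the browser. Below is a minimal sketch: `buildIceServers` is a hypothetical helper, all endpoints are assumed to live on one host, and 3478 and 5349 are the conventional default ports for STUN/TURN and TURN over TLS. Your own hosts, ports, and credentials will differ.

```typescript
// Same shape as the entries of RTCConfiguration.iceServers in the browser API.
interface IceServer {
  urls: string[];
  username?: string;
  credential?: string;
}

// Builds one STUN entry plus TURN over UDP, TCP, and TLS for a single host.
// Ports 3478 and 5349 are conventional defaults; adjust them to your setup.
function buildIceServers(
  host: string,
  username: string,
  credential: string,
): IceServer[] {
  return [
    { urls: [`stun:${host}:3478`] },
    {
      urls: [
        `turn:${host}:3478?transport=udp`,
        `turn:${host}:3478?transport=tcp`,
        `turns:${host}:5349?transport=tcp`, // TURN over TLS
      ],
      username,
      credential,
    },
  ];
}
```

In a browser, the result would be passed as `new RTCPeerConnection({ iceServers: buildIceServers("turn.example.org", "user", "secret") })`, where the host name and credentials are placeholders. The browser tries direct and STUN-assisted paths first and falls back to the TURN entries only when those fail.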

From our experience, I’d recommend using either restund or coturn STUN/TURN servers.

Conclusions

And that pretty much covers WebRTC pitfalls. Hope you now understand what WebRTC is and how to avoid the three most common developer mistakes.

As for us, these four months were a bumpy ride. We’ve made mistakes, but we’ve learned a lot about WebRTC and have become pretty confident with the tech.

So if you’ve got any questions about WebRTC chat development, you can always message us.

See you next time.
