Evil empires are a frequent target of end users’ grudges and hatred. This is understandable and sometimes even justified. Nevertheless, Uber still partially pays for our trips (even if only temporarily), and Google accelerated WebRTC, a technology that, if not for the Evil Empire of Google, would have remained a proprietary and expensive software tool for fairly narrow B2B purposes.

Since WebRTC was made public, video chats have become much easier to develop. A number of APIs, services, servers and frameworks have emerged for that purpose. In this article we describe in detail one more way to develop a video chat between a web browser and a native Android application.

Video chat in a browser

A classic WebRTC video chat between browsers starts with an exchange of SDP (Session Description Protocol) messages. Alice sends her SDP to Boris, and Boris responds with his own. An SDP is a config like the one below:

v=0
o=- 1468792194439919690 2 IN IP4 127.0.0.1
s=-
t=0 0
a=group:BUNDLE audio video
a=msid-semantic: WMS 9nKsWmxMvOQBYaz9xhRffWeWSUbCbnox6aQ4
m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
c=IN IP4 0.0.0.0
a=rtcp:9 IN IP4 0.0.0.0
a=ice-ufrag:kSrQ
a=ice-pwd:4uIyZd/mbVSVe2iMFgC1i3qn
a=fingerprint:sha-256 6B:29:2F:47:EB:38:64:F3:25:CE:BD:E6:B0:3F:A6:FA:55:57:A9:EA:44:0B:7C:45:D2:0D:F4:96:8D:B2:9F:BA
a=setup:actpass
a=mid:audio
a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
a=sendonly
a=rtcp-mux
a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10;useinbandfec=1
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:106 CN/32000
a=rtpmap:105 CN/16000
a=rtpmap:13 CN/8000
a=rtpmap:110 telephone-event/48000
a=rtpmap:112 telephone-event/32000
a=rtpmap:113 telephone-event/16000
a=rtpmap:126 telephone-event/8000
a=ssrc:3525514540 cname:drYey7idt605CcEG
a=ssrc:3525514540 msid:9nKsWmxMvOQBYaz9xhRffWeWSUbCbnox6aQ4 09bdb6e7-b4b3-437b-945e-771f535052e3
a=ssrc:3525514540 mslabel:9nKsWmxMvOQBYaz9xhRffWeWSUbCbnox6aQ4
a=ssrc:3525514540 label:09bdb6e7-b4b3-437b-945e-771f535052e3
m=video 9 UDP/TLS/RTP/SAVPF 96 98 100 102 127 97 99 101 125
c=IN IP4 0.0.0.0
a=rtcp:9 IN IP4 0.0.0.0
a=ice-ufrag:kSrQ
a=ice-pwd:4uIyZd/mbVSVe2iMFgC1i3qn
a=fingerprint:sha-256 6B:29:2F:47:EB:38:64:F3:25:CE:BD:E6:B0:3F:A6:FA:55:57:A9:EA:44:0B:7C:45:D2:0D:F4:96:8D:B2:9F:BA
a=setup:actpass
a=mid:video
a=extmap:2 urn:ietf:params:rtp-hdrext:toffset
a=extmap:3 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=extmap:4 urn:3gpp:video-orientation
a=extmap:5 http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01
a=extmap:6 http://www.webrtc.org/experiments/rtp-hdrext/playout-delay
a=sendonly
a=rtcp-mux
a=rtcp-rsize
a=rtpmap:96 VP8/90000
a=rtcp-fb:96 ccm fir
a=rtcp-fb:96 nack
a=rtcp-fb:96 nack pli
a=rtcp-fb:96 goog-remb
a=rtcp-fb:96 transport-cc
a=rtpmap:98 VP9/90000
a=rtcp-fb:98 ccm fir
a=rtcp-fb:98 nack
a=rtcp-fb:98 nack pli
a=rtcp-fb:98 goog-remb
a=rtcp-fb:98 transport-cc
a=rtpmap:100 H264/90000
a=rtcp-fb:100 ccm fir
a=rtcp-fb:100 nack
a=rtcp-fb:100 nack pli
a=rtcp-fb:100 goog-remb
a=rtcp-fb:100 transport-cc
a=fmtp:100 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42e01f
a=rtpmap:102 red/90000
a=rtpmap:127 ulpfec/90000
a=rtpmap:97 rtx/90000
a=fmtp:97 apt=96
a=rtpmap:99 rtx/90000
a=fmtp:99 apt=98
a=rtpmap:101 rtx/90000
a=fmtp:101 apt=100
a=rtpmap:125 rtx/90000
a=fmtp:125 apt=102
a=ssrc-group:FID 2470936840 2969787502
a=ssrc:2470936840 cname:drYey7idt605CcEG
a=ssrc:2470936840 msid:9nKsWmxMvOQBYaz9xhRffWeWSUbCbnox6aQ4 ce9235c5-f300-466a-aadd-b969dc2f3664
a=ssrc:2470936840 mslabel:9nKsWmxMvOQBYaz9xhRffWeWSUbCbnox6aQ4
a=ssrc:2470936840 label:ce9235c5-f300-466a-aadd-b969dc2f3664
a=ssrc:2969787502 cname:drYey7idt605CcEG
a=ssrc:2969787502 msid:9nKsWmxMvOQBYaz9xhRffWeWSUbCbnox6aQ4 ce9235c5-f300-466a-aadd-b969dc2f3664
a=ssrc:2969787502 mslabel:9nKsWmxMvOQBYaz9xhRffWeWSUbCbnox6aQ4
a=ssrc:2969787502 label:ce9235c5-f300-466a-aadd-b969dc2f3664

From this SDP we can tell, for example, that it offers the VP8, VP9 and H.264 codecs for video and Opus (among others) for audio. It also carries a lot of other information useful for communication: the codec priority, the use of FIR, NACK and PLI feedback, the H.264 profile-level-id 42e01f (Constrained Baseline profile, level 3.1), and so on.
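To make the reading of such a config more concrete, here is a minimal sketch that lists the codecs of each media section in priority order and decodes an H.264 profile-level-id. The function names listCodecs and decodeH264ProfileLevelId and the shortened sample SDP are ours, introduced only for illustration. The decoder handles just a few common profiles and ignores special cases such as level 1b.

```javascript
// List the codecs offered in each media section, in the order of the m= line.
function listCodecs(sdp) {
  const sections = [];
  let current = null;
  for (const line of sdp.split(/\r?\n/)) {
    if (line.startsWith("m=")) {
      // m=<media> <port> <proto> <payload types...>
      const [media, , , ...payloadTypes] = line.slice(2).trim().split(/\s+/);
      current = { media, payloadTypes, names: {} };
      sections.push(current);
    } else if (current && line.startsWith("a=rtpmap:")) {
      // a=rtpmap:<payload type> <codec>/<clock rate>[/<channels>]
      const [pt, encoding] = line.slice("a=rtpmap:".length).trim().split(/\s+/);
      current.names[pt] = encoding.split("/")[0];
    }
  }
  return sections.map(({ media, payloadTypes, names }) => ({
    media,
    codecs: payloadTypes.filter((pt) => pt in names).map((pt) => names[pt]),
  }));
}

// Decode the three bytes of profile-level-id: profile_idc, constraint flags, level_idc.
function decodeH264ProfileLevelId(hex) {
  const profileIdc = parseInt(hex.slice(0, 2), 16);
  const constraintFlags = parseInt(hex.slice(2, 4), 16);
  const levelIdc = parseInt(hex.slice(4, 6), 16);
  const profiles = { 66: "Baseline", 77: "Main", 100: "High" };
  let profile = profiles[profileIdc] || `profile_idc ${profileIdc}`;
  // Baseline with constraint_set1_flag set is the Constrained Baseline profile.
  if (profileIdc === 66 && (constraintFlags & 0x40) !== 0) {
    profile = "Constrained Baseline";
  }
  return { profile, constraintFlags, level: (levelIdc / 10).toFixed(1) };
}

// Shortened sample built from lines of the SDP above.
const sampleSdp = [
  "m=audio 9 UDP/TLS/RTP/SAVPF 111 0",
  "a=rtpmap:111 opus/48000/2",
  "a=rtpmap:0 PCMU/8000",
  "m=video 9 UDP/TLS/RTP/SAVPF 96 100",
  "a=rtpmap:96 VP8/90000",
  "a=rtpmap:100 H264/90000",
].join("\r\n");

console.log(listCodecs(sampleSdp));
console.log(decodeH264ProfileLevelId("42e01f"));
```

In a real application the browser does this parsing itself. The sketch only shows that an SDP is plain text that can be read line by line.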

When you implement a video chat on top of the native WebRTC API, you need to understand SDP, candidates, codecs, ICE, STUN, TURN and many other scary words.


WebRTC, Websockets and SIP

The terms WebRTC and Websockets are often confused. Sometimes SIP gets dragged into this mess too.


Well, we can definitely state that WebRTC is not directly related to either Websockets or SIP.

Websockets is simply a convenient way to transmit SDP from Boris to Alice. We could just as well use plain HTTP or send SDP by e-mail. The SDP exchange is signalling, and any protocol can carry it. In browsers, the protocols available by default for sending data are Websockets and HTTP, and Websockets is used more often because it is closer to real time than HTTP. In a WebRTC chat, Websockets carries only signalling information (text and commands), not the video or audio itself.
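The point that signalling is just text passed over any transport can be sketched as follows. The InMemoryChannel class below is a hypothetical stand-in for Websockets, HTTP or e-mail, and the message fields (type, from, sdp) are our own illustrative choice, not part of any standard. The SDP strings are placeholders.

```javascript
// A stand-in for any text transport: it only delivers strings to a named recipient.
class InMemoryChannel {
  constructor() {
    this.handlers = new Map();
  }
  register(user, handler) {
    this.handlers.set(user, handler);
  }
  send(to, text) {
    const handler = this.handlers.get(to);
    if (handler) handler(text);
  }
}

// A peer that answers every incoming offer and can place calls.
function createPeer(name, channel, log) {
  const describe = (type) =>
    JSON.stringify({ type, from: name, sdp: `(placeholder SDP of ${name})` });
  channel.register(name, (text) => {
    const message = JSON.parse(text);
    log.push(`${name} got ${message.type} from ${message.from}`);
    if (message.type === "offer") {
      // A real browser peer would call setRemoteDescription() and createAnswer() here.
      channel.send(message.from, describe("answer"));
    }
  });
  return {
    call(peerName) {
      channel.send(peerName, describe("offer"));
    },
  };
}

const signallingLog = [];
const channel = new InMemoryChannel();
const alice = createPeer("Alice", channel, signallingLog);
createPeer("Boris", channel, signallingLog);
alice.call("Boris");
console.log(signallingLog.join("\n"));
```

Swapping InMemoryChannel for a Websocket connection would change only how the strings travel; the offer and answer messages would stay the same.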


SIP is a text protocol for exchanging messages. WebRTC is often wrongly called "SIP in a browser", most likely because SIP messages also use SDP to configure codecs and establish connections.

On the other hand, when we say something like "SIP telephone", we mean a device that, along with the SIP protocol (RFC 3261), also supports a dozen other network specifications and protocols: RTP, SDP, AVPF, etc.

Indeed, at its core WebRTC uses building blocks similar to those used by a SIP telephone (SRTP, STUN, and so on). So one could say that WebRTC and SIP devices and software share the same technological basis. But calling WebRTC "SIP in a browser" is incorrect, not least because browsers do not support SIP out of the box.


WebRTC is a technology that has three main audio/video transmission functions:

  • Capturing, encoding and sending
  • Receiving, decoding and playback
  • Traversing NAT and firewalls

Plus a lot of auxiliary functions such as jitter compensation, adaptive bitrate, network congestion control and so on.

As described above, to transmit media successfully via WebRTC, Alice and Boris must exchange SDP containing detailed information on video stream formats, packetization and other parameters that specify how the SDP sender will receive video.
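As a rough illustration of what this exchange achieves, the sketch below models the outcome as the list of offered codecs that the other side also supports, kept in the offerer's order of preference. This is a deliberately simplified model of offer/answer negotiation, and the function name negotiateCodecs and the codec lists are assumptions made for the example.

```javascript
// Keep the offered codecs that the answering side supports, in the offerer's order.
function negotiateCodecs(offered, supported) {
  const supportedSet = new Set(supported.map((codec) => codec.toLowerCase()));
  return offered.filter((codec) => supportedSet.has(codec.toLowerCase()));
}

// Alice offers three video codecs; Boris supports only two of them.
const aliceOffer = ["VP8", "VP9", "H264"];
const borisSupports = ["h264", "vp8"];
const commonCodecs = negotiateCodecs(aliceOffer, borisSupports);

console.log(commonCodecs);
```

If the resulting list is empty, the two sides have no codec in common and media cannot flow for that section.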

In addition to the SDP exchange, a TURN server may be required. This server relays the video traffic when a peer-to-peer connection cannot be established, for example if Alice or Boris is behind an unfriendly (for instance, symmetric) NAT.
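In the browser WebRTC API, STUN and TURN servers are passed to the peer connection as a configuration object. The sketch below only builds that object so it can be tried outside a browser. The helper name buildRtcConfig, the host names and the credentials are placeholders, not real servers.

```javascript
// Build an RTCConfiguration-style object with one STUN and one TURN server.
function buildRtcConfig({ stunUrl, turnUrl, turnUser, turnPassword, relayOnly = false }) {
  return {
    iceServers: [
      { urls: stunUrl },
      { urls: turnUrl, username: turnUser, credential: turnPassword },
    ],
    // "relay" forces all media through TURN; "all" lets ICE try direct paths first.
    iceTransportPolicy: relayOnly ? "relay" : "all",
  };
}

const rtcConfig = buildRtcConfig({
  stunUrl: "stun:stun.example.com:3478", // placeholder host
  turnUrl: "turn:turn.example.com:3478", // placeholder host
  turnUser: "alice",                     // placeholder credentials
  turnPassword: "secret",
});

// In a browser this object would be passed on: new RTCPeerConnection(rtcConfig)
console.log(JSON.stringify(rtcConfig, null, 2));
```

With the default "all" policy, the TURN relay is used only as a fallback when direct candidates fail, which matches the symmetric NAT case described above.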

Now, suppose we want to add a third active participant to the chat, or simply another viewer. Here is a good example: debates. Two participants talk while all others just watch. Another example is a chat for three or more participants.

When the third participant arrives, things get more complex. Now every participant needs to capture and compress two video streams instead of just one, and establish mutual connections to traverse NAT. The time needed to establish a connection increases while its stability decreases. Compressing and sending two or more video streams simultaneously puts a serious load on the CPU and the network and degrades quality, especially on mobile devices:

6_WebRTC_Websockets_SIP

Tasks like these:

  • connection of three or more participants
  • connection of additional subscribers of the video chat
  • recording of the video chat

are beyond the scope of peer-to-peer and require a centralized WebRTC server that will manage all the connections.
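To put rough numbers on this, the sketch below compares a full peer-to-peer mesh with a centralized server for a chat of n participants. It is a simplified model with names of our own choosing: it assumes every participant sends to and receives from every other participant, and that with a server each participant has a single connection to it.

```javascript
// Full mesh: every participant connects to, and encodes a stream for, every other one.
function meshLoad(n) {
  return {
    connections: (n * (n - 1)) / 2,
    encodedStreamsPerParticipant: n - 1,
    receivedStreamsPerParticipant: n - 1,
  };
}

// Centralized server: each participant publishes once; the server fans the stream out.
function serverLoad(n) {
  return {
    connections: n,
    encodedStreamsPerParticipant: 1,
    receivedStreamsPerParticipant: n - 1,
  };
}

for (const n of [2, 3, 6]) {
  console.log(n, "participants, mesh:", meshLoad(n), "server:", serverLoad(n));
}
```

The number of received streams is the same in both cases. What the server removes is the repeated encoding and uploading on each device, which is the expensive part on mobile.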


As we said above, there are services and servers, as well as more or less convenient APIs on top of the WebRTC API, that can speed up the development of video chats and let you work with handier abstractions such as Stream, Room, Publisher, Subscriber and so on.

For example, to create the simplest video chat, exchanging the names of the streams is enough. Boris knows the name of Alice’s stream. Alice knows the name of Boris’s stream. The video chat is ready:

8_server_streamname

Example of a video chat in a browser

In this article we will demonstrate how Streaming API works with Web Call Server 5 – a WebRTC server for video chats and online broadcasts.

The video chat in action is illustrated in the following two screenshots. The first subscriber, Alice, sees the video chat like this:

9_video_chat

The second subscriber, Edward, sees the video chat like this:

10_video_chat_in_browser

In this example a few things happen:

  1. Alice sends the video stream named Alice from the browser to the server.
  2. Edward sends the video stream named Edward from the browser to the server.
  3. Alice fetches and plays the video stream named Edward.
  4. Edward fetches and plays the video stream named Alice.

As the example shows, we built a video chat on the assumption that Alice and Edward know each other’s stream names. We didn’t use SDP, PeerConnection, NAT, TURN, etc. directly.

Therefore, a video chat can be implemented by simply passing names of the streams to those who should play them.

This simple concept works with any front-end and back-end technologies such as jQuery, Bootstrap, React, Angular, PHP, Java, .NET, and so on. The good news is that embedding video streams and video chat does not affect the existing web application. You control your video chat simply by allowing (or denying) given subscribers to play specific video streams.
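The idea of running a chat purely on stream names can be sketched in the application's own back end or front end, without touching any WebRTC API. The buildRoomPlan and canPlay helpers below are hypothetical and not part of any SDK. They assume each participant publishes a stream under their own name and may play the streams of everyone else in the room.

```javascript
// For every participant: the stream to publish and the streams they may play.
function buildRoomPlan(participants) {
  const plan = new Map();
  for (const name of participants) {
    plan.set(name, {
      publish: name,
      play: participants.filter((other) => other !== name),
    });
  }
  return plan;
}

// Allow playback only for subscribers that are in the room plan.
function canPlay(plan, subscriber, streamName) {
  const entry = plan.get(subscriber);
  return entry !== undefined && entry.play.includes(streamName);
}

const roomPlan = buildRoomPlan(["Alice", "Edward"]);
console.log(roomPlan.get("Alice"));
console.log(canPlay(roomPlan, "Alice", "Edward"));
console.log(canPlay(roomPlan, "Mallory", "Alice"));
```

The application would hand each client the names from its plan entry and refuse any playback request for which canPlay returns false.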

Source code of the video chat in a browser

Now let’s see how the corresponding code looks. An HTML page with the video chat has two main div elements:

  • localVideo – the video captured from the web camera
  • remoteVideo – the video that is played from the server


You can assign arbitrary identifiers to these divs, for instance id="captureVideo" or id="playbackVideo", but both div elements must be present on the page.

The HTML page with localVideo and remoteVideo blocks looks as follows:

<html>
<head>
    <script language="javascript" src="https://flashphoner.com/downloads/builds/flashphoner_client/wcs_api-2.0/current/flashphoner.js"></script>
    <script language="javascript" src="video-chat.js"></script>
</head>
<body onLoad="init()">
<h1>Video Chat</h1>
<div id="localVideo" style="width:320px;height:240px;border: 1px solid"></div>
<div id="remoteVideo" style="width:320px;height:240px;border: 1px solid"></div>
<input type="button" value="connect" onClick="connect()"/>
<input type="button" value="publish" onClick="publish('Alice')"/>
<input type="button" value="play" onClick="play('Edward')"/>
<p id="status"></p>
</body>
</html>

Now, here is the code that is responsible for sending and playing the video.

Sending the stream from a webcam

To send, we use the session.createStream().publish() API method. For this stream we specify the HTML div element that should display the video captured from the webcam, localVideo, as well as the name of the video stream, Alice, so that any connected client that knows this name will be able to play the stream.

// localVideo is the div element that displays the video captured from the webcam
session.createStream({
    name: "Alice",
    display: localVideo,
    cacheLocalResources: true,
    // Publish only: do not receive media in this stream
    receiveVideo: false,
    receiveAudio: false
}).on(Flashphoner.constants.STREAM_STATUS.PUBLISHING, function (publishStream) {
    setStatus(Flashphoner.constants.STREAM_STATUS.PUBLISHING);
}).on(Flashphoner.constants.STREAM_STATUS.UNPUBLISHED, function () {
    setStatus(Flashphoner.constants.STREAM_STATUS.UNPUBLISHED);
}).on(Flashphoner.constants.STREAM_STATUS.FAILED, function () {
    setStatus(Flashphoner.constants.STREAM_STATUS.FAILED);
}).publish();

Playing the stream from the server

To play, we specify the name of the stream that we want to play, and the HTML div element, remoteVideo, that should display the stream received from the server. We use the session.createStream().play() API method.

// remoteVideo is the div element that displays the stream received from the server
session.createStream({
    name: "Edward",
    display: remoteVideo,
    cacheLocalResources: true,
    // Playback: receive both video and audio
    receiveVideo: true,
    receiveAudio: true
}).on(Flashphoner.constants.STREAM_STATUS.PLAYING, function (playStream) {
    setStatus(Flashphoner.constants.STREAM_STATUS.PLAYING);
}).on(Flashphoner.constants.STREAM_STATUS.STOPPED, function () {
    setStatus(Flashphoner.constants.STREAM_STATUS.STOPPED);
}).on(Flashphoner.constants.STREAM_STATUS.FAILED, function () {
    setStatus(Flashphoner.constants.STREAM_STATUS.FAILED);
}).play();

While working with the server, the HTML page receives various statuses from it, for example PLAYING and STOPPED for playback and PUBLISHING and UNPUBLISHED for publishing. So the basic thing we need to do for a video chat to work is place two div blocks on the web page and include the scripts that execute stream.play() and stream.publish() for the given stream names. The full source code of the Two Way Streaming example can be downloaded here.
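The snippets above call setStatus() without showing it. One possible implementation, assuming it simply writes the status text into the element with id="status" from the HTML page, is sketched below. This is our guess at a minimal helper, not the code from the original example. The document object is passed as an optional second argument only so that the function can be tried outside a browser with a stand-in object.

```javascript
// Write the latest stream status into the element with id="status".
function setStatus(status, doc = globalThis.document) {
  const element = doc.getElementById("status");
  element.textContent = String(status);
  return element;
}

// Minimal stand-in for the browser document, used only for this demonstration.
const fakeDocument = {
  elements: { status: { textContent: "" } },
  getElementById(id) {
    return this.elements[id];
  },
};

setStatus("PUBLISHING", fakeDocument);
setStatus("PLAYING", fakeDocument);
console.log(fakeDocument.elements.status.textContent);
```

On the real page the second argument can be omitted, so the calls in the snippets above work unchanged.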

Example of a WebRTC video chat in an Android application

The video chat for Android works exactly the same way as the video chat in a browser. The app establishes a connection to the server, sends a video stream from the camera of the Android device, and receives and plays the other video stream from the server. Below is the Android app Streaming Min (a mobile version of the Two Way Streaming example for a video chat in a browser), which lets you exchange video streams.

12_video_chat_in_android

As you can see from the screenshot, nothing has changed. We have two video windows. The left one displays the video captured from the device camera, and the right one displays the video received from the server. The exchange of video streams is also based on stream names: we publish one stream and play the other.

Source code of the video chat for an Android application

To create a video chat in a browser we used the Web SDK, which includes the flashphoner.js API script. For a full-featured Android application we need to import the aar-file of the Android SDK into the project. To understand how this works, we recommend building and running the Streaming Min example based on the Android SDK. All examples are available in the GitHub repository.

1. Download all examples

git clone https://github.com/flashphoner/wcs-android-sdk-samples.git

2. Download SDK

wget -P /tmp https://flashphoner.com/downloads/builds/flashphoner_client/wcs-android-sdk/aar/wcs-android-sdk-1.0.1.25.aar

3. Link the SDK aar-file to the examples.

cd export
./export.sh /tmp/wcs-android-sdk-1.0.1.25.aar

Note that we pass the export.sh script the path to the downloaded Android SDK file, wcs-android-sdk-1.0.1.25.aar.

As a result, in the export/output folder you will find a completely configured project that you can open in Android Studio.

Now you only need to build the examples using Gradle.

1 – Create a new run configuration

13_source_code_android_application

2 – Select the Gradle script

14_source_code_of_video_chat

3 – Build

15_video_chat_for_android

As a result, we should get apk-files that can be installed on an Android device. In this example we exchanged video streams with a browser. The video stream test33 was sent from the Android device to the server and played in a browser. The video stream 8880 was sent by the browser and played on the Android device. We therefore ended up with two-way audio and video communication between the browser and the Android app.


In the Web version of the video chat we used HTML div elements for video. On Android, we use renderers.

private SurfaceViewRenderer localRender;
private SurfaceViewRenderer remoteRender;


The localRender renderer displays the video captured from the camera of the Android device. The remoteRender renderer shows the video received from the server.

1. Establish a connection to the server and set renderers.

sessionOptions = new SessionOptions(mWcsUrlView.getText().toString());
sessionOptions.setLocalRenderer(localRender);
sessionOptions.setRemoteRenderer(remoteRender);
...
session = Flashphoner.createSession(sessionOptions);
...
session.connect(new Connection());

2. Create a stream with an arbitrary name and publish the stream to the server.

StreamOptions streamOptions = new StreamOptions(mPublishStreamView.getText().toString());
publishStream = session.createStream(streamOptions);
...
publishStream.publish();

3. Specify the name of the stream to play and fetch the stream from the server.

StreamOptions streamOptions = new StreamOptions(mPlayStreamView.getText().toString());
playStream = session.createStream(streamOptions);
...
playStream.play();

The full source code of the StreamingMinActivity.java class is available here. And the entire Streaming Min example for Android is available in the repository here.

Web Call Server

In conclusion, we demonstrated how to create a simple exchange of video streams between an HTML page in a browser and an Android application. Video streams go through Web Call Server, which acts as both the signalling server and an audio and video proxy.


Web Call Server is server software that can be installed on Linux, on either a virtual or a dedicated server. WCS is a WebRTC streaming video server, and it can manage video streams from browsers and from iOS and Android devices.

References

Technologies and protocols

WebRTC – WebRTC technology
SDP – Session description protocol, RFC
Websocket – Websocket protocol, RFC

Server and API for video chat development

Web Call Server – WebRTC streaming video server for video chats
Download Web Call Server – install the server
Web Call Server on EC2 – run a server image on Amazon EC2
Web SDK – Web SDK to develop video chats with WebRTC support
Android SDK – Android SDK to develop video chats with WebRTC support

Working examples

Web Two Way Streaming – an example video stream exchange for Web
Android Two Way Streaming – an example video stream exchange Android application

Source codes of examples

Web Two Way Streaming – source code of an example video stream exchange for Web
Android Two Way Streaming – source code of an example video stream exchange for Android