Review of WCS 5.2 – Server for Webcast and Webcam Developers

Alice is an experienced full stack developer, who is capable of writing a SAAS project framework on her favorite framework using php in a week. As for frontend, she prefers Vue.js.

A client contacts you via Telegram, asking you to develop website that will be the meeting place for the employer and the employee to conduct an in-person interview. In-person means face-to-face, a direct video contact in real time with video and voice. “Why not use Skype?” some may ask. It just so happened that serious projects – and each startup undoubtedly considers itself a serious project – are trying to offer an internal communications service for a variety of reasons, including:

1) They don’t want to lend its users to third-party communicators (Skype, Hangouts, etc.) and want to retain them in the service.

2) The want to monitor their communications, such as call history and interview results.

3) Record calls (naturally, notifying both parties about the recording).

4) They don’t want to depend on policies and updates of third-party services. Everyone knows this story: Skype got updated, and everything goes up in smoke…

It seems like an easy task. WebRTC comes up when googling on the topic, and it seems like you can organize a peer-to-peer connection between two browsers, but there are still some questions:

1) Where to get STUN/TURN servers?

2) Can we do without them?

3) How to record a peer-to-peer WebRTC call?

4) What will happen if we need to add a third party to the call, for example, an HR manager or another specialist from the employer?

It turns out that WebRTC and peer-to-peer alone are not enough, and it is not clear what to do with all this to launch the required video functions of the service.

Article content

Server and API
Incoming streams
- WebRTC
- RTMP
- RTSP
- VOD
- SIP/RTP
Outgoing streams
- WebRTC
- RTMP
- RTSP
- MSE
- HLS
Incoming and outgoing
Incoming stream manipulation
Stream relaying
- WebRTC
- RTMP
- SIP/RTP
Connecting servers to a CDN content processing network
To summarize
Links
Related articles
Related features
Documentation

Server and API

To close these gaps, server solutions and peer-server-peer architecture are used. Web Call Server 5.2 (hereinafter — WCS) is one of the server solutions; it is a development platform that allows you to add such video functions to the project and not worry about STUN/TURN and peer-to-peer connection stability.

At the highest level, WCS is a JavaScript API + server part. API is used for development using regular JavaScript on the browser side, and the server processes video traffic, acting as a Stateful Proxy for media traffic.

javascript_server_WebRTC_Android_iOS_SDK_API_WCS_browser_RTMP_RTSP_VOD_SIP_RTP

In addition to the JavaScript API, there is also the Android SDK and iOS SDK, which are needed to develop native mobile applications for iOS and Android, respectively.

For example, publishing a stream to the server (streaming from a webcam towards the server) looks like this:

Web SDK

session.createStream({name:”stream123”}).publish();

Android SDK

publishStream = session.createStream(streamOptions)
publishStream.publish();

iOS SDK

FPWCSApi2Stream *stream = [session createStream:options error:&error];
if(![stream publish:&error]) {
//published without errors
}

As a result, we can implement not only a web application, but also full-fledged ups for Google Play and the App Store with video streaming support. If we add mobile SDK to the top-level picture, we will get this:

javascript_sdk_server_WebRTC_Android_iOS_SDK_API_WCS_browser_RTMP_RTSP_VOD_SIP_RTP

Incoming streams

The streaming server, which is WCS, starts with incoming streams. To distribute something, we need to have it. To distribute video streams to viewers, it is necessary that these streams enter the server, go through its RAM, and exit through the network card. Therefore, the first question that we need to ask when familiarizing ourselves with the media server is what protocols and formats the latter uses to accept streams. In the case of WCS, these are the following technologies: WebRTC, RTMP, RTSP, VOD, SIP/RTP.

webRTC_RTMP_RTSP_VOD_SIP_RTP

Each of the protocols can be used by diverse clients. For example, not only the stream from the browser can enter via WebRTC, but also from another server. In the table below there are the possible sources of incoming traffic.

WebRTC	RTMP	RTSP	VOD	SIP/RTP
Web SDK camera + mic canvas screen sharing Android SDK iOS SDK WCS push pull CDN	RTMP encoder ffmpeg OBS Wirecast Adobe Encoder WCS push pull Flash Player	IP camera RTSP server	Filesystem AWS S3	SIP endpoint SIP conferencing

If we go through the sources of incoming traffic, we can add the following:

Incoming WebRTC

Web SDK allows not only capturing the camera and microphone, but also using the capabilities of the browser API to access the screen by means of screen sharing. In addition, we can capture an arbitrary Canvas element, and all that is drawn on it for subsequent broadcasting is canvas streaming:

Due to the mobile specifics, the Android SDK and iOS SDK have the ability to switch between the front and rear cameras of the device on the go. This allows us to switch the source during streaming without having to pause the stream.

The incoming WebRTC stream can also be obtained from another WCS server using the push, pull and CDN methods, which will be discussed later.

publish_scheme_WebRTC_Android_iOS_SDK_API_WCS_browser_RTMP_RTSP_VOD_SIP_RTP

Incoming RTMP

The RTMP protocol is widely used in the streamers’ favorite OBS, as well as in other encoders: Wirecast, Adobe Media Encoder, ffmpeg, etc. Using one of these encoders, we can capture the stream and send it to the server.

We can also pick up an RTMP stream from another media server or WCS server using the push and pull methods. In the case of push, the initiator is the remote server. In the case of pull, we turn to the local server to pull the stream from the remote one.

rtmp_server_WebRTC_Android_iOS_SDK_API_WCS_browser_RTMP_RTSP_VOD_SIP_RTP

Incoming RTSP

Sources of RTSP traffic are usually IP cameras or third-party media servers that support RTSP protocol. Despite the fact that when initiating an RTSP connection, the initiator is WCS, the audio and video traffic in the direction from the IP camera moves towards the WCS server. Therefore, we consider the stream from the camera to be incoming.

rtsp_ip_cam_streaming_WebRTC_Android_iOS_SDK_API_WCS_browser_RTMP_RTSP_VOD_SIP_RTP

Incoming VOD

At first glance, it might seem that the VOD (Video On Demand) function is exclusively associated with outgoing streams and with the file playback by browsers. But in our case, this is not entirely true. WCS broadcasts an mp4 file from the file system to localhost; as a result, an incoming stream is created, as if it came from a third-party source. Further, if we restrict one viewer to one mp4 file, we get the classic VOD, where the viewer gets the stream and plays it from the very beginning. If we do not restrict one viewer to one mp4 file, we get VOD LIVE – a variation of VOD, in which viewers can play the same file as a stream, connecting to the playback point at which all the others are currently located (pre-recorded television broadcast mode).

watermarked_stream_WebRTC_Android_iOS_SDK_API_WCS_browser_RTMP_RTSP_VOD_SIP_RTP

Incoming SIP/RTP

To receive incoming RTP traffic inside a SIP session, we need to set up a call with a third-party SIP gateway. If the connection is successfully established, the audio and/or video traffic will go from the SIP gateway, which will be wrapped in the incoming stream on the WCS side.

sip_as_stream_WebRTC_Android_iOS_SDK_API_WCS_browser_RTMP_RTSP_VOD_SIP_RTP

Outgoing streams

After receiving the stream to the server, we can replicate the stream received to one or many viewers on request. The viewer requests a stream from the player or other device. Such streams are called outgoing or “viewer streams,” because sessions of such streams are always initiated on the side of the viewer/player. The set of playback technologies includes the following protocols/formats: WebRTC, RTMP, RTSP, MSE, and HLS.

WebRTC	RTMP	RTSP	MSE	HLS
Web SDK Android SDK iOS SDK WC pull CDN	Flash Player RTMP Players	RTSP Player VLC WCS etc	Web SDK	HLS players hls.js native Safari

Outgoing WebRTC

In this case, the Web SDK, Android SDK and iOS SDK act as the API for the player. An example of playing WebRTC stream looks like this:

Web SDK

session.createStream({name:”stream123”}).play();

Android SDK

playStream = session.createStream(streamOptions);
playStream.play();

iOS SDK

FPWCSApi2Stream *stream = [session createStream:options error:nil];
    if(![stream play:&error]) 
    {
   //published without errors
   }

This is very similar to the publishing API, with the only difference being that instead of stream.publish(), stream.play() is called to play.

A third-party WCS server can be the player, which will be instructed to pick up the stream via WebRTC from another server using the pull method or pick up the stream inside CDN.

Outgoing RTMP

rtmp_play_WebRTC_Android_iOS_SDK_API_WCS_browser_RTMP_RTSP_VOD_SIP_RTP

Here there will be mainly RTMP players – both the well-known Flash Player and desktop and mobile applications that use the RTMP protocol, receive and play an RTMP stream. Despite the fact that Flash left the browser, it kept the RTMP protocol, which is widely used for video broadcasts, and the lack of native support in browsers does not prevent the use of this quite successful protocol in other client applications. It is known that RTMP is widely used in VR players for mobile applications on Android and iOS.

Outgoing RTSP

rtsp_play_WebRTC_Android_iOS_SDK_API_WCS_browser_RTMP_RTSP_VOD_SIP_RTP

The WCS server can act as an RTSP server and distribute the received stream via RTSP as a regular IP camera. In this case, the player must establish an RTSP connection with the server and pick up the stream for playback, as if it was an IP camera.

Outgoing MSE

mse_play_WebRTC_Android_iOS_SDK_API_WCS_browser_RTMP_RTSP_VOD_SIP_RTP

In this case, the player requests a stream from the server using the Websocket protocol. The server distributes audio and video data via web sockets. The data reaches the browser and is converted into chunks that the browser can play thanks to the native MSE extension supported out of the box. The player ultimately works on the basis of the HTML5 video element.

Outgoing HLS

hls_play_WebRTC_Android_iOS_SDK_API_WCS_browser_RTMP_RTSP_VOD_SIP_RTP

Here, WCS acts as an HLS server or Web server that supports HLS (HTTP Live Streaming). Once the incoming stream appears on the server, an .m3u8 HLS playlist is generated, which is given to the player in response to an HTTP request. The playlist describes which video segments the player should download and display. The player downloads video segments and plays it on the browser page, on the mobile device, on the desktop, in the Apple TV set-top box, and wherever HLS support is claimed.

Incoming and outgoing

In total, we have 5 incoming and outgoing stream types. They are listed in the table:

Incoming	Outgoing
WebRTC	WebRTC
RTMP	RTMP
RTSP	RTSP
VOD	MSE
SIP/RTP	HLS

That is, we can upload the streams to the server, connect to them, and play them with suitable players. To play a WebRTC stream, use the Web SDK. To play a WebRTC stream as HLS, use an HLS player, etc. One stream can be played by many spectators. One-to-many broadcasts do work.

Now, let us describe what actions can be performed with streams.

Incoming stream manipulation

Outgoing streams with spectators are not easily manipulated. Indeed, if the viewer has established a session with the server and is already getting some kind of stream, there is no way to make any changes to it without breaking the session. For this reason, all manipulations and changes take place on incoming streams, at the point where its replication has not yet occurred. The stream that has undergone changes is then distributed to all connected viewers.

Stream opeartions include:

– recording

– taking a snapshot

– adding a stream to the mixer

– stream transcoding

– adding a watermark

– adding a FPS filter

– image rotation by 90, 180, 270 degrees

Incoming stream recording

stream_rec_WebRTC_Android_iOS_SDK_API_WCS_browser_RTMP_RTSP_VOD_SIP_RTP

Perhaps the most understandable and frequently encountered function. Indeed, in many cases, streams require recording: webinars, English lessons, consultations, etc. Recording can be initiated with either the Web SDK or the REST API with a special request:

/stream/startRecording {}

The result is saved in the file system as an mp4 file.

Taking a snapshot

stream_snapshots_WebRTC_Android_iOS_SDK_API_WCS_browser_RTMP_RTSP_VOD_SIP_RTP

An equally common task is to take pictures of the current stream to display icons on the site. For example, we have 50 streams in a video surveillance system, each of which has one IP camera as a source. Displaying all 50 threads on one page is not only problematic for browser resources, but also pointless. In case of 30 FPS, the total FPS of the changing picture will be 1500, and the human eye simply will not accept such a display frequency. As a solution, we can configure automatic slicing or snapshot-taking on demand; in this case, images with an arbitrary frequency can be displayed on the site, for example, 1 frame in 10 seconds. Snapshots can be removed from the SDK through the REST API, or sliced automatically.

snapshot_WebRTC_Android_iOS_SDK_API_WCS_browser_RTMP_RTSP_VOD_SIP_RTP

The WCS server supports the REST method for receiving snapshots:

/stream/snapshot

stream_snapshot_WebRTC_Android_iOS_SDK_API_WCS_browser_RTMP_RTSP_VOD_SIP_RTP

Adding a stream to the mixer

mixer_WebRTC_Android_iOS_SDK_API_WCS_browser_RTMP_RTSP_VOD_SIP_RTP

An image from two or more sources can be combined into one for display to end viewers. This procedure is called mixing. Basic examples: 1) Video surveillance from multiple cameras on the screen in one picture. 2) Videoconference, where each user receives one stream, in which the rest are mixed, to save resources. The mixer is controlled via the REST API and has an MCU mode of operation for creating video conferences.

REST command to add a stream to the mixer:

/mixer/startup

Stream transcoding

transcoding_WebRTC_Android_iOS_SDK_API_WCS_browser_RTMP_RTSP_VOD_SIP_RTP

Streams sometimes need to be compressed in order to adapt for certain groups of client devices by resolution and bit rate. For this, transcoding is used. Transcoding can be enabled on the Web SDK side, through the REST API, or automatically through a special transcoding node in the CDN. For example, a video of 1280×720 can be transcoded to 640×360 for distribution to customers from a geographic region with a traditionally low bandwidth. Where are your satellites, Elon Musk?

stream_video_transcoding_WebRTC_Android_iOS_SDK_API_WCS_browser_RTMP_RTSP_VOD_SIP_RTP

Used REST method:

/transcoder/startup

Adding a watermark

watermarked_stream_WebRTC_Android_iOS_SDK_API_WCS_browser_RTMP_RTSP_VOD_SIP_RTP

It is known that any content can be stolen and turned into WebRip, no matter what protection the player has. If your content is valuable, you can embed a watermark or logo into it that will greatly complicate its further use and public display. To add a watermark, just upload a PNG image, and it will be inserted into the video stream by transcoding. Therefore, you will have to prepare a couple of CPU cores on the server side in case you still decide to add a watermark to the stream. In order not to create the watermark on the server by transcoding, it is better to add it directly on the encoder/streamer, which often provide such an opportunity.

Adding a FPS filter

fps_WebRTC_Android_iOS_SDK_API_WCS_browser_RTMP_RTSP_VOD_SIP_RTP

In some cases, it is required that the stream have an even FPS (frames per second). This may come in handy if we re-stream the stream to a third-party resource like Youtube or Facebook or play it with a sensitive HLS player. Filtering also requires transcoding, so make sure to properly assess the power of your server and prepare 2 cores per stream if such an operation is planned.

Image rotation by 90, 180, 270 degrees

rotation_WebRTC_Android_iOS_SDK_API_WCS_browser_RTMP_RTSP_VOD_SIP_RTP

Mobile devices have the ability to change the resolution of the published stream depending on the angle of rotation. For example, you started to stream, holding the iPhone horizontally, and then rotated it. According to the WebRTC specification, the streamer browser of the mobile device (in this case iOS Safari) should signal the rotation to the server. In turn, The server must send this event to all subscribers. Otherwise, it would be like this – the streamer put the phone on its side, but still sees its camera vertically, while the viewers see a rotated image. To work with rotations on the SDK side, the corresponding cvoExtension extension is included.

Management of incoming streams – table

Automatic – the configuration is usually set on the server side in the settings.

Action	Web, iOS, Android SDK	REST API	Automatic	CDN
Recording	+	+
Taking a snapshot	+	+	+
Adding to mixer	+	+
Stream transcoding	+	+		+
Adding a watermark			+
Adding a FPS filter			+
Image rotation by 90, 180, 270 degrees	+

Stream relaying

Relaying is also an option for manipulating streams entering the server; it consists in forcing the stream to a third-party server. Relaying is synonymous with such words as republishing, push, inject.

Relaying can be implemented using one of the following protocols: WebRTC, RTMP, SIP/RTP. The table shows the direction in which the stream can be relayed.

WebRTC	RTMP	SIP/RTP
WCS	RTMP server WCS	SIP server

WebRTC relaying

webrtc_push_WebRTC_Android_iOS_SDK_API_WCS_browser_RTMP_RTSP_VOD_SIP_RTP

A stream can be relayed to another WCS server if for some reason it is required to make the stream available on another server. Relaying is done through the REST API via /push method. Upon receipt of such a REST request, WCS connects to the specified server and publishes a server-server stream to it. After that, the stream becomes available for playback on another machine.

/pull/push

– used REST method

RTMP relaying

rtmp_republish_WebRTC_Android_iOS_SDK_API_WCS_browser_RTMP_RTSP_VOD_SIP_RTP

As with WebRTC relaying, RTMP relaying to another server is also possible. The difference will only be in the relay protocol. RTMP relaying is also performed via /push and allows transferring the stream to both third-party RTMP servers and to services supporting RTMP Ingest: Youtube, Facebook streaming, etc. Thus, WebRTC stream can be relayed to RTMP. We might as well as relay any other stream entering the server, for example RTSP or VOD, to RTMP.

The video stream is relayed to another RTMP server using REST calls.

/push/startup

– used REST call

SIP/RTP relaying

sip_republish_WebRTC_Android_iOS_SDK_API_WCS_browser_RTMP_RTSP_VOD_SIP_RTP

It is rarely used function. Most often, it is used in enterprise. For example, when we need to establish a SIP call with an external SIP conference server and redirect the audio or video stream to this call so that the audience of the conference sees some kind of video content: “Please watch this video” or “Colleagues, now let’s watch an IP camera stream from the construction site”. We need to keep in mind that in this case, the conference itself exists and is managed on an external VKS-server with SIP support (recently, we have tested the solution from Polycom DMA), whereas we just connect and relay the existing stream to this server. The REST API function is called /inject and serves just for this case.

REST API command:

/call/inject_stream/startup

Connecting servers to a CDN content processing network

Usually, one server has a limited amount of resources. Therefore, for large online broadcasts where the audience counts for thousands and tens of thousands, scaling is necessary. Several WCS servers can be combined into one CDN content delivery network. Internally, CDN will work through WebRTC to maintain low latency during streaming.

cdn_WebRTC_Android_iOS_SDK_API_WCS_browser_RTMP_RTSP_VOD_SIP_RTP

The server can be configured in one of the following roles: Origin, Edge, or Transcoder. Origin type servers receive traffic and distribute it to the Edge servers, which are responsible for delivering the stream to viewers. If it is necessary to prepare a stream in several resolutions, Transcoder nodes are included in the scheme, which take on the resource-consuming mission of transcoding streams.

To summarize

WCS 5.2 is a server for developing applications with realtime audio and video support for browsers and mobile devices. Four APIs are provided for development: Web SDK, iOS SDK, Android SDK, REST API. We can publish (feed) video streams to the server using five protocols: WebRTC, RTMP, RTSP, VOD, SIP/RTP. From the server, we can play streams with players using five protocols: WebRTC, RTMP, RTSP, MSE, HLS. Streams can be controlled and undergo such operations as recording, slicing snapshots, mixing, transcoding, adding a watermark, filtering FPS, and broadcasting video turns on mobile devices. Streams can be relayed to other servers via WebRTC and RTMP protocols, as well as redirected to SIP conferences. Servers can be combined into a content delivery network and scaled to process an arbitrary number of video streams.

What Alice must know to work with the server

The developer needs to be able to use Linux. The following commands in the command line should not cause confusion:

tar -xvzf wcs5.2.tar.gz

cd wcs5.2

./install.sh

tail -f flashphoner.log

ps aux | grep WebCallServer

top

One also need to know Vanilla JavaScript when it comes to Web development.

//publishing the stream
session.createStream({name:’mystream’}).publish();
//playing the stream
session.createStream({name:’mystream’}).publish();

The ability to work with the back-end may also come in handy.

REST_Methods_WebRTC_Android_iOS_SDK_API_WCS_browser_RTMP_RTSP_VOD_SIP_RTP

WCS can not only receive control commands through the REST API, but also send hooks – i.e. notifications about events that occur on it. For example, when trying to establish a connection from a browser or mobile application, WCS will trigger the /connect hook, and when trying to play a stream, it will trigger the playStream hook. Therefore, the developer will have to walk a little in the shoes of the back ender, who is able to create both a simple REST client and a small REST server for processing hooks.

REST API example

/rest-api/stream/find_all

– example of REST API for listing streams on the server

REST hook example

https://myback-end.com/hook/connect

– REST hook /connect processing on the backend side.

Linux, JavaScript, REST Client / Server – three elements that are enough to develop a production service on the WCS platform working with video streams.

Developing mobile applications will require knowledge of Java and Objective-C for Android and iOS, respectively.