John was happy. He’d just turned in a commission and he was enjoying a relaxing evening. Hours upon hours of development, optimization, testing, changes and approvals were left behind.

And just as he was contemplating picking up a nice cold beer, his phone rang.

“Only half the viewers could connect to the stream!” — said the voice on the other side of the line.

With a resigned sigh, John opened up his laptop and started poring over the logs.

Unfortunately, in all of those many tests, he never considered that a large number of viewers would put a heavy strain on the server infrastructure and on the network itself.

As it happens, John is not alone in his plight. Many users reach out to tech support with questions like these:

“What kind of server do I need for 1000 viewers?”

“My server is solid, but only 250 viewers can connect simultaneously, the rest either can’t join, or get stuck with terrible video quality”

These questions boil down to one: how do you choose the right server?

We have already touched on the topic of choosing a server based on the number of subscribers. Here’s the gist:

1. When choosing a server for streaming, with or without load balancing, you need to take the load profile into account:

  • basic streaming;
  • streaming with transcoding;
  • stream mixing.

 

2. Streams with transcoding and mixing put a greater load on the CPU and RAM than basic streams. The load on the server CPU shouldn’t exceed 80%. As long as it stays under that threshold, all viewers should receive video of decent quality.

3. In practice, it is often the case that the stream quality depends not on the server specs, but on the network capacity.

Here’s a quick reference on how to calculate the number of streams based on the network capacity:

One 480p stream takes up about 0.5 – 1 Mbps of traffic. WebRTC streams have a variable bitrate, so let’s assume it’s going to take 1 Mbps. Thus, 1000 streams equals 1000 Mbps.

4. The number of viewers and the stream quality partially depend on the way the server is configured — the number of media ports and the ZGC usage.
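The bandwidth estimate from point 3 can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names are made up for this example, and the default of 1 Mbps per stream is the article's working assumption for a 480p WebRTC stream, not a measured value.

```python
import math


def required_mbps(streams: int, stream_mbps: float = 1.0) -> float:
    """Outgoing bandwidth needed to serve `streams` viewers."""
    if streams < 0 or stream_mbps <= 0:
        raise ValueError("streams must be >= 0 and stream_mbps must be > 0")
    return streams * stream_mbps


def max_streams(link_mbps: float, stream_mbps: float = 1.0) -> int:
    """Upper bound on concurrent streams, looking at bandwidth alone."""
    if link_mbps < 0 or stream_mbps <= 0:
        raise ValueError("link_mbps must be >= 0 and stream_mbps must be > 0")
    return math.floor(link_mbps / stream_mbps)


if __name__ == "__main__":
    print(required_mbps(1000))  # 1000.0 Mbps for 1000 viewers at 1 Mbps each
    print(max_streams(1000))    # 1000 viewers on a 1000 Mbps link
```

Keep in mind that this is a ceiling set by the network only. CPU load and server configuration can cap the real number much lower.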

In this article, we’ll take a look at how to run a server stress test and we’ll see whether all the aforementioned points hold true.

Testing plan

scheme_сonnection_websocket_big_servers_WCS_WebRTC_browser_stream_WebSocket_publishing_testing

  1. Publish a stream from a camera on WCS server #1 following the instructions from a previous example, “Two Way Streaming”.
  2. Using the Console web app, start a stress test in which WCS server #2 imitates 1000 users trying to connect to WCS #1.
  3. Using the data from the Prometheus monitoring system, check the server load and the number of outgoing WebRTC streams.
  4. If the server is handling the requested number of streams, select a random one and manually check the stream quality degradation (if there is any).

 

The test is considered successful if 1000 viewers can connect to WCS #1 with no visible quality degradation.

Preparing for testing

For this test you’ll need:

  • two WCS servers;
  • standard streaming setup described in Two-Way Streaming;
  • Console web app for testing;
  • Google Chrome browser and Allow-Control-Allow-Origin extension for working with Console.

 

If you don’t have the Allow-Control-Allow-Origin extension installed, install it and turn it on:

enable_CORS_WCS_WebRTC_browser_stream_WebSocket_publishing_testing

We assume your WCS is already installed and configured. If not, follow this guide.

For the test to be successful and for the results to be usable, you need to go through the following preliminary steps.

1. Extend the media port range for WebRTC connections in the following file: flashphoner.properties

media_port_from = 20001
media_port_to = 40000

Make sure the range doesn’t overlap with other ports used by the server or with the Linux ephemeral port range (you can change the latter if necessary).
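As a rough way to check the overlap mentioned above, the sketch below compares the media port range with the ephemeral range that the Linux kernel exposes in /proc/sys/net/ipv4/ip_local_port_range. The helper names are invented for this example, and the script only reports the overlap; it changes nothing on the system.

```python
from pathlib import Path

# Kernel file holding the ephemeral range as two integers: "low high".
EPHEMERAL_RANGE_FILE = Path("/proc/sys/net/ipv4/ip_local_port_range")


def ranges_overlap(a, b):
    """True if two inclusive (low, high) port ranges share at least one port."""
    return a[0] <= b[1] and b[0] <= a[1]


def read_ephemeral_range(path=EPHEMERAL_RANGE_FILE):
    """Return (low, high) from the kernel, or None if it cannot be read."""
    try:
        low, high = path.read_text().split()
        return int(low), int(high)
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    media = (20001, 40000)  # media_port_from / media_port_to from this article
    ephemeral = read_ephemeral_range()
    if ephemeral is None:
        print("Could not read the ephemeral port range on this system")
    elif ranges_overlap(media, ephemeral):
        print(f"Overlap: media {media} vs ephemeral {ephemeral}")
    else:
        print(f"No overlap: media {media} vs ephemeral {ephemeral}")
```

On many Linux distributions the ephemeral range starts at 32768, so a media range that ends at 40000 would overlap with it unless one of the two is adjusted.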

2. In the same file, set the parameter that extends the duration of the test and the parameter that shows network load data on the statistics page:

wcs_activity_timer_timeout=86400000
global_bandwidth_check_enabled=true

3. Enable ZGC in the JVM. The recommended JDK versions, which have been proven to work well with WCS, are 12 and 14 (here’s the installation guide).

To configure ZGC, do the following:

– In wcs-core.properties comment out the following lines:

-XX:+UseConcMarkSweepGC
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=70
-XX:+PrintGCDateStamps
-XX:+PrintGCDetails

– Adjust the log settings

-Xlog:gc*:/usr/local/FlashphonerWebCallServer/logs/gc-core.log
-XX:ErrorFile=/usr/local/FlashphonerWebCallServer/logs/error%p.log

– Make the heap size no less than half of the server’s physical memory

### JVM OPTIONS ###
-Xmx16g
-Xms16g

– Enable ZGC

# ZGC
-XX:+UnlockExperimentalVMOptions -XX:+UseZGC -XX:+UseLargePages -XX:ZPath=/hugepages

– Configure the memory pages. Calculate the number of 2 MB huge pages that fits the selected heap size (heap_size is in gigabytes):

(1.125*heap_size*1024)/2

For -Xmx16g this number is:

(1.125*16*1024)/2=9216
mkdir /hugepages
echo "echo 9216 >/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages" >>/etc/rc.local
echo "mount -t hugetlbfs -o uid=0 nodev /hugepages" >>/etc/rc.local
chmod +x /etc/rc.d/rc.local
systemctl enable rc-local.service
systemctl restart rc-local.service

Once ZGC is configured, you’ll need to restart WCS.
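If you use a different heap size, the page count from the formula above can be recomputed with a small helper. This is a minimal sketch: the function name is hypothetical, the 1.125 factor and the 2 MB page size are taken from the formula and the hugepages-2048kB path used in this article, and rounding up is an assumption made here so that fractional results never come out short.

```python
import math

HUGEPAGE_MB = 2    # matches /sys/kernel/mm/hugepages/hugepages-2048kB
OVERHEAD = 1.125   # factor from the formula in this article


def hugepages_for_heap(heap_gb: float) -> int:
    """Number of 2 MB huge pages to reserve for a heap of `heap_gb` gigabytes."""
    if heap_gb <= 0:
        raise ValueError("heap_gb must be positive")
    heap_mb = heap_gb * 1024
    # Round up so a fractional page count is never truncated.
    return math.ceil(OVERHEAD * heap_mb / HUGEPAGE_MB)


if __name__ == "__main__":
    print(hugepages_for_heap(16))  # 9216, the value used for -Xmx16g
```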

4. To make server monitoring easy, we suggest deploying the Prometheus+Grafana monitoring system.

We’ll be monitoring the following indicators:

– CPU load:

node_load1
node_load5
node_load15

– Physical memory usage for Java:

core_stats{instance="your.WCS.server.name:8081", job="flashphoner", param="core_java_freePhysicalMemorySize"}
core_stats{instance="your.WCS.server.name:8081", job="flashphoner", param="core_java_totalPhysicalMemorySize"}

– Pauses in ZGC operations:

custom_stats{instance="your.WCS.server.name:8081", job="flashphoner", param="gc_pause"}

– Network capacity:

network_stats

– Number of streams. For the stress tests, we’ll count the outgoing WebRTC connections:

streams_stats{instance="your.WCS.server.name:8081", job="flashphoner", param="streams_webrtc_out"}

– Number and percentage of degraded streams:

degraded_streams_stats{instance="your.WCS.server.name:8081", job="flashphoner", param="degraded_streams"}
degraded_streams_stats{instance="your.WCS.server.name:8081", job="flashphoner", param="degraded_streams_percent"}
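If you prefer to read these indicators from a script instead of a Grafana dashboard, the Prometheus HTTP API can serve them through its instant query endpoint, /api/v1/query. The sketch below is one possible way to do it: the Prometheus address is an assumption (adjust it to your setup), the function names are invented for this example, and the selector simply repeats the one listed above.

```python
import json
import urllib.parse
import urllib.request

# Selector copied from the list of monitored indicators.
STREAMS_OUT = (
    'streams_stats{instance="your.WCS.server.name:8081", '
    'job="flashphoner", param="streams_webrtc_out"}'
)


def build_query_url(prometheus_url: str, selector: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": selector})
    return prometheus_url.rstrip("/") + "/api/v1/query?" + query


def first_value(response: dict):
    """Return the first sample value of an instant-query response, or None."""
    if response.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {response}")
    result = response["data"]["result"]
    if not result:
        return None
    # Each sample is [unix_timestamp, "value as string"].
    return float(result[0]["value"][1])


def fetch_metric(prometheus_url: str, selector: str, timeout: float = 5.0):
    """Query Prometheus and return the first sample value (needs network access)."""
    url = build_query_url(prometheus_url, selector)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return first_value(json.load(resp))


# Example call, assuming Prometheus listens on localhost:9090:
# print(fetch_metric("http://localhost:9090", STREAMS_OUT))
```

Polling this value during the stress test gives the same number the Grafana graph plots, which is handy if you want to log the moment the stream count stops growing.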

Test #1 — weak servers

For our first test, we’ll use two servers with the following specs:

  • 1x Intel Atom C2550 @ 2.4GHz (4 cores, 4 threads);
  • 8GB RAM;
  • 2x 1Gbps.

 

The channel’s nominal capacity is 2x 1Gbps. Let’s check whether that holds in practice.

You can measure network performance with the iperf tool. It is available for all common operating systems: Windows, macOS, Ubuntu/Debian, CentOS. In server mode, iperf can run alongside WCS, which makes it possible to test the whole channel, from the publisher to the viewer.

Launch iperf in server mode:

iperf3 -s -p 5201

where:

5201 – the port on which iperf listens for client connections.

Launch iperf in client mode to test sending data from client to server via TCP:

iperf3 -c test.flashphoner.com -p 5201

where:

  • test.flashphoner.com – WCS server;
  • 5201 – port for iperf in server mode.
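To capture the measurement in a script rather than read it off the terminal, you can ask iperf3 for JSON output with its -J flag and pull the receiver-side throughput out of the report. The sketch below assumes the usual layout of a TCP test report, where the summary sits under end.sum_received.bits_per_second; the function names are made up for this example.

```python
import json
import subprocess


def received_gbps(report: dict) -> float:
    """Receiver-side throughput in Gbps from a parsed iperf3 JSON report."""
    bits_per_second = report["end"]["sum_received"]["bits_per_second"]
    return bits_per_second / 1e9


def run_iperf_client(host: str, port: int = 5201) -> dict:
    """Run the iperf3 client against `host` and return the parsed JSON report.

    Requires iperf3 on the PATH and an iperf3 server listening on host:port.
    """
    completed = subprocess.run(
        ["iperf3", "-c", host, "-p", str(port), "-J"],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)


# Example call, using the host name from this article:
# print(f"{received_gbps(run_iperf_client('test.flashphoner.com')):.2f} Gbps")
```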

 

Install and run iperf in server mode on server #1:

start_iperf_server_WCS_WebRTC_browser_stream_WebSocket_publishing_testing

Install and run iperf in client mode on server #2 to collect the network performance data:

start_iperf_client_WCS_WebRTC_browser_stream_WebSocket_publishing_testing

We see that the average network throughput between the publisher (server #1) and the viewer (server #2) is 2.25 Gbps. At about 1 Mbps per stream, this means the channel can theoretically support around 2000 viewers.

Let’s test this out.

Start the stress test.

On server #1, open the Console app over HTTP: http://your.WCS.server.name:9091/client2/examples/demo/streaming/console/console.html

Specify the domain name or IP address of server #1 and click the “Add node” button. This will be our test server, the source of the streams. Following the same steps, add server #2, which will imitate the viewers and capture the streams.

add_servers_to_console_WCS_WebRTC_browser_stream_WebSocket_publishing_testing

On server #1, start streaming the webcam feed following the guide in the Two-way Streaming article. Any stream name will do.

two-way_streaming_WCS_WebRTC_browser_stream_WebSocket_publishing_testing

In the Console app, select server #2, click “Pull streams”, and enter the following test parameters:

  • choose node – pick server #1;
  • local stream name, Remote stream name – specify the name of the published stream (server #1);
  • qty – specify the number of viewers (for this test — 1000).

 

Click “Pull”:

pull_streams_WCS_WebRTC_browser_stream_WebSocket_publishing_testing

During the test, we will monitor the situation using graphs provided by Grafana:

dashboard_Grafana_WCS_WebRTC_browser_stream_WebSocket_publishing_testing

As you can see, the test increased the load on the server CPU. A Load Average above 5 on a 4-core processor means it is fully loaded:

CPU_averrage_Grafana_WCS_WebRTC_browser_stream_WebSocket_publishing_testing
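The reasoning behind that reading of the graph can be expressed as a quick calculation: divide the Load Average by the number of cores and compare the result with 1. The sketch below is a rough heuristic only, since Load Average counts runnable tasks rather than measuring CPU utilization directly; the function names and the threshold of 1.0 are assumptions made for this example.

```python
import os


def load_per_core(load_avg: float, cores: int) -> float:
    """Load Average divided by the number of cores (or hardware threads)."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_avg / cores


def is_saturated(load_avg: float, cores: int, threshold: float = 1.0) -> bool:
    """True if, on average, every core has at least `threshold` runnable tasks."""
    return load_per_core(load_avg, cores) >= threshold


if __name__ == "__main__":
    # The situation from test #1: Load Average of 5 on 4 cores.
    print(load_per_core(5, 4))  # 1.25
    print(is_saturated(5, 4))   # True

    # The same check for the machine this script runs on (Unix only).
    try:
        one_minute = os.getloadavg()[0]
    except (OSError, AttributeError):
        one_minute = None
    if one_minute is not None:
        cores = os.cpu_count() or 1
        print(f"local: {load_per_core(one_minute, cores):.2f} per core")
```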

The available RAM size for Java Heap has decreased:

Java_physical_memory_Grafana_WCS_WebRTC_browser_stream_WebSocket_publishing_testing

ZGC pauses lasted up to 5 ms, which is acceptable:

ZGC_pause_Grafana_WCS_WebRTC_browser_stream_WebSocket_publishing_testing

The graph of the channel bandwidth. Here we see that the outgoing streams take up the majority of the traffic. The throughput never exceeded 100 Mbps (less than 5% of the nominal bandwidth, which we successfully tested with iperf):

Global_bandwidth_WCS_WebRTC_browser_stream_WebSocket_publishing_testing

The number of outgoing streams. As you can see, the test was almost a complete failure. We didn’t manage to serve more than 260 users, while our target was 1000.

Connection_websocket_WCS_WebRTC_browser_stream_WebSocket_publishing_testing

The degraded streams. Toward the end of the test, quality degradation started to appear. This makes sense: a Load Average above 5 on a 4-core processor means it is fully loaded, and under such conditions stream degradation is inevitable:

degraded_streams_Grafana_WCS_WebRTC_browser_stream_WebSocket_publishing_testing

The conclusion is simple:

As tempting as it may be to save money on hardware, weak servers and weak virtual instances can’t handle the serious loads that high-quality production requires. That said, if your goals don’t include streaming to 1000 viewers, a weak server might be a viable option.
For instance, using an “underpowered” server you can:

  • Set up a simple video surveillance system – distribute feed from an IP camera to a small number of subscribers via WebRTC;
  • Set up a system for webinar hosting for a small company;
  • Stream audio only (audio streams are less demanding).

 

Now let’s fire up more powerful servers and see if we can serve 1000 viewers.

Test #2 — powerful servers

For test #2 we’ll use two servers with the following specs:

  • 2x Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz (24 cores, 48 threads in total);
  • 192GB RAM;
  • 2x 10Gbps.

 

As before, test the channel bandwidth with iperf:

iperf_big_servers_WCS_WebRTC_browser_stream_WebSocket_publishing_testing

In this case, the bandwidth of the channel between the publisher (server #1) and the viewer (server #2) is 9.42 Gbps, which at about 1 Mbps per stream corresponds to roughly 9000 potential viewers.

Start the stress test as before using the Console web app.

Monitor the situation using graphs provided by Grafana:

dashboard_big_servers_Grafana_WCS_WebRTC_browser_stream_WebSocket_publishing_testing

Let’s take a closer look at the graphs.

Predictably, the stress test increased the CPU load, but with 48 threads a peak Load Average of 11 does not indicate a high load. The server processor is clearly busy, but it’s far from reaching full capacity:

CPU_averrage_big_servers_Grafana_WCS_WebRTC_browser_stream_WebSocket_publishing_testing

The available RAM allocated for the Java heap did not change significantly:

Java_physical_memory_big_servers_Grafana_WCS_WebRTC_browser_stream_WebSocket_publishing_testing

ZGC pauses reached 2.5 ms, which is acceptable:

ZGC_pause_Grafana_big_servers_WCS_WebRTC_browser_stream_WebSocket_publishing_testing

The graph of the network channel bandwidth. Here, we see that the outgoing streams take up the majority of the traffic. The throughput never went beyond 500 Mbps (around 5% of the nominal bandwidth, which we successfully tested with iperf):

Global_bandwidth_big_servers_WCS_WebRTC_browser_stream_WebSocket_publishing_testing

The number of outgoing WebRTC streams. The test of the big servers was successful. We managed to serve 1000 viewers.

Connection_websocket_big_servers_WCS_WebRTC_browser_stream_WebSocket_publishing_testing

Degraded streams. The server handled the stress well, and no degraded streams occurred during the test:

degraded_stream_big_servers_Grafana_WCS_WebRTC_browser_stream_WebSocket_publishing_testing

To make sure the result is genuine, connect to the server and pick a random stream for a visual quality check.

For instance, pick a stream from the list provided by the console:

console_choose_stream_WCS_WebRTC_browser_stream_WebSocket_publishing_testing

And play it on server #2. It should have no visible quality loss: no stutters, freezes, or artifacts:

play_stream_WCS_WebRTC_browser_stream_WebSocket_publishing_testing

The test of powerful servers was a success. We managed to connect 1000 viewers to the stream and maintained decent video quality.

The test results may vary depending on the environment and on the location of the servers and users (both subscribers and streamers).

In this article, we’ve looked at one variant of a stress test. The data from such tests can be used to tune the server configuration to your specific needs and to avoid both underloading and overloading the equipment.

Good streaming to you!

Links

Demo

WCS on Amazon EC2

WCS on DigitalOcean

WCS in Docker