The Internet is full of guides on how to record what’s happening on the screen into a file using FFmpeg. In this article, we’ll go a step further and see how to broadcast screensharing via FFmpeg and create a stream on your site.

It goes without saying that there are many streaming solutions out there, both paid and free. FFmpeg, however, retains its prominence thanks to its cross-platform support, its minimalist interface (or rather the lack of one, since it is controlled entirely from the OS console) and its vast functionality. Many file-conversion programs are built on top of FFmpeg, yet FFmpeg itself is completely self-sufficient: you don’t need to hunt for codecs online, download them and install them. All you need is a single file (ffmpeg.exe) that contains all the necessary codecs.

We could sing its praises all day, but today we’re here for a different reason.

Let’s go!

Streaming the screen

The task is simple: broadcast what’s happening on the screen to a site, where the stream will be played using WebRTC technology, capturing both video and audio.

[Diagram: FFmpeg screensharing published via RTMP to a WCS server and played via WebRTC]

Let’s take a look at two possible solutions using FFmpeg, one for Linux and one for Windows.

A little tangent on the dangers of perfectionism

At this point, I’d like to make a confession. Research for this article was my first experience with FFmpeg, so I spent a long time poring over guides and googling solutions. Eventually, I managed to find a combination of FFmpeg keys that allowed me to screenshare and capture audio. Following the manual, I added the screensharing keys to the stream going to a WCS server, successfully tested publishing via FFmpeg and playback via WebRTC on Windows and Linux, and started writing this article.

While testing, I caught myself thinking that the FFmpeg screensharing command looked rather chaotic. So I decided to tidy up the code a bit: I put all the keys for video capture and encoding first, the keys for audio capture and encoding second, and the keys for forming and transmitting the stream last.

NB! DO NOT use this command!

ffmpeg.exe -f gdigrab -i desktop -draw_mouse 1 -rtbufsize 100M -framerate 30 -probesize 10M -c:v libx264 -r 30 -preset ultrafast -tune zerolatency -crf 25 -pix_fmt yuv420p -f dshow -i audio="@device_cm_{33D9A762-90C8-11D0-BD43-00A0C911CE86}\wave_{F585B65B-4690-4433-8109-F16C6389C066}" -acodec aac -f flv rtmp://demo.flashphoner.com:1935/live/rtmp_stream

Before publishing the article on the blog, I tested it again and, to my dismay, saw that the stream had neither video nor audio.

[Screenshot: playing the stream, with neither video nor audio]

That sent me back to Google, the manuals and the guides.

I checked the FFmpeg keys over and over, and even found an alternative way of capturing system sounds in Windows (see below). I tested different Windows drivers and different desktop versions of Ubuntu, but all of it was in vain:

The WCS server would receive an empty stream.

In one of the guides I read a phrase that stuck with me. It said: “Do not mix the video and audio keys!”. So I diligently avoided doing that and tried to separate them neatly: first the keys related to audio, then the video keys, then the streaming keys.

Then, at one point, I tried the original command again, the one that looked like a chaotic mess. And lo and behold, it worked! I took a close look at the command and realized my mistake: it’s the keys for different actions that must not be mixed. Capture keys have to stay with capture keys, and encoding keys with encoding keys.

By trial and error, I concluded that the code must follow the following structure:

keys for audio capture + keys for video capture + keys for audio encoding + keys for video encoding + keys for transmitting the data into the stream going to the server

Unfortunately, FFmpeg has no service keys for visually separating these sections, which can be confusing for inexperienced users. This ordering actually follows FFmpeg’s general rule: options placed before an -i apply to that input, while options placed after all the inputs apply to the output. Later in the article, in the command breakdowns, I will explain which keys relate to which actions.
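To make the structure visible right away, here is a sketch of the working Windows command split so that each line is one section, using cmd line continuations (the ^ character). The device’s friendly name is substituted for readability (an assumption: use the name or alternative name your own system reports):

ffmpeg.exe ^
-f dshow -i audio="Stereo Mix (Realtek (R) Audio)" ^
-f gdigrab -rtbufsize 100M -framerate 30 -probesize 10M -draw_mouse 1 -i desktop ^
-acodec aac ^
-c:v libx264 -r 30 -preset ultrafast -tune zerolatency -crf 25 -pix_fmt yuv420p ^
-f flv rtmp://demo.flashphoner.com:1935/live/rtmp_stream

The lines are, in order: audio capture, video capture, audio encoding, video encoding, and forming and sending the stream.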

Solution for Windows

Let’s take a look at how to stream screensharing via FFmpeg. In the command prompt, run the following command:

ffmpeg.exe -f gdigrab -rtbufsize 100M -framerate 30 -probesize 10M -draw_mouse 1 -i desktop -c:v libx264 -r 30 -preset ultrafast -tune zerolatency -crf 25 -pix_fmt yuv420p -f flv rtmp://demo.flashphoner.com:1935/live/rtmp_stream

where:

//capture of video component of screensharing

-f gdigrab — Windows driver for screen capture;

-rtbufsize 100M — size of the real-time buffer for captured frames. The stream must be fast and smooth, with no frame drops; that’s why it’s better to buffer the captured video in RAM before FFmpeg processes and streams it;

-framerate 30 — framerate upon screen capture;

-probesize 10M — amount of input data (in bytes) FFmpeg analyzes to identify the stream;

-draw_mouse 1 — mouse cursor capture;

-i desktop — capture the entire desktop;

//encoding of video component of screensharing

-c:v libx264 — encode the video to H.264 with the x264 codec;

-r 30 — the codec outputs the video at 30 frames per second;

-preset ultrafast — tells the encoder to use its fastest settings and start encoding with minimal delay (relevant when doing screen recording);

-tune zerolatency — an optional x264 setting that minimizes encoding latency;

-crf 25 — video quality (the higher the number, the worse the quality; the lower, the better);

-pix_fmt yuv420p — color format of the resulting video.

//forming the stream and transmitting it to WCS

-f flv rtmp://demo.flashphoner.com:1935/live/rtmp_stream — packs the output into an FLV container and publishes it via RTMP as a stream named “rtmp_stream” on the demo.flashphoner.com server. (Before streaming to a server, you can sanity-check the capture locally; see the sketch below.)
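A minimal sketch of such a local check: record ten seconds of the same capture to a file (output.mp4 is an arbitrary name; the -t key limits the duration):

ffmpeg.exe -f gdigrab -rtbufsize 100M -framerate 30 -probesize 10M -draw_mouse 1 -i desktop -c:v libx264 -r 30 -preset ultrafast -tune zerolatency -crf 25 -pix_fmt yuv420p -t 10 output.mp4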

And so, we capture the screen for streaming

[Screenshot: publishing the video-only screensharing stream in Windows]

and receive it on the WCS side. (From this point on, I use the player from the “Media Devices” example, because it makes the presence or absence of an audio stream visible on a screenshot.)

[Screenshot: playing the stream, video only, no audio]

We receive the stream, albeit with no sound. The screenshot shows that the Audio stats section is full of zeroes.

Now let’s move on to audio capture.

The first thing to do is determine which devices are available for audio capture. In the Windows console, run the following command:

ffmpeg -list_devices true -f dshow -i dummy

You’ll receive a response like this:

[Screenshot: list of audio devices returned by FFmpeg]

The command brings up a list of devices that can record or play back sound: speakers, microphones and webcams. Find the name of the device you want to use for audio capture (the speakers). On the screenshot above, the device I chose for capture is called Stereo Mix (Realtek (R) Audio). Stereo Mix is one of the best devices for audio capture: it’s a virtual device that mixes various audio sources, both those coming from within the OS and the one from the microphone.
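By the way, dshow also accepts a device’s friendly name directly, which can be handier than the long alternative name used below. A minimal sketch for testing the audio capture locally, assuming the device name from my screenshot (substitute yours; test_audio.aac is an arbitrary file name, and -t 10 records ten seconds):

ffmpeg -f dshow -i audio="Stereo Mix (Realtek (R) Audio)" -acodec aac -t 10 test_audio.aac

If the resulting file contains the system sound, the device is suitable for the streaming command below.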

To start audio capture, add the following keys into the original screensharing code:

-f dshow -i audio="@device_cm_{33D9A762-90C8-11D0-BD43-00A0C911CE86}\wave_{8A4D10E5-8DB9-4B92-8C29-4BA2E60C1DDE}" ... -acodec aac

where:

//capture of audio component of screensharing

-f dshow — Windows driver for audio capture;

-i audio= — here, set the “audio” parameter to your capture device. The long string in quotes is the alternative name that -list_devices prints for our “Stereo Mix (Realtek (R) Audio)”; substitute the one listed for your device.

//capture of video component of screensharing

//encoding of audio component of screensharing

-acodec aac — audio compression is done via the aac codec

//encoding the video component of screensharing

//forming the stream and transmitting it to WCS

Launch it:

ffmpeg.exe -f dshow -i audio="@device_cm_{33D9A762-90C8-11D0-BD43-00A0C911CE86}\wave_{8A4D10E5-8DB9-4B92-8C29-4BA2E60C1DDE}" -rtbufsize 100M -f gdigrab -framerate 30 -probesize 10M -draw_mouse 1 -i desktop -acodec aac -c:v libx264 -r 30 -preset ultrafast -tune zerolatency -crf 25 -pix_fmt yuv420p -f flv rtmp://demo.flashphoner.com:1935/live/rtmp_stream

And enjoy:

[Screenshot: playing the stream with audio captured via Stereo Mix]

There is also an alternative method. It might be useful if your PC doesn’t have a Stereo Mix device, or if your audio card driver doesn’t support it for some reason.

For this method you’ll need the small VB-Audio Virtual Cable utility (free at the time of writing).

VB-CABLE is a virtual audio device that functions as a virtual audio cable: all the audio that goes into the CABLE Input is transmitted to the CABLE Output.

Download and install VB-CABLE. The installation is nothing special.

Run the installer as Administrator:

[Screenshot: running the VB-CABLE installer as Administrator]

Click “Install Driver”

[Screenshot: the “Install Driver” button in the VB-CABLE installer]

Once the installation is complete, set the virtual cable as the default device for both audio playback and recording.

[Screenshot: setting the virtual cable as the default playback and recording device]

Now the audio from running software will be sent to the CABLE Output virtual device, which functions as a regular microphone and therefore allows audio capture. There is one caveat, however: you won’t hear what you’re recording, since the audio is routed to the virtual device instead of the speakers or the headset. (If you need to monitor the sound anyway, Windows lets a recording device be played back through the speakers: open the CABLE Output properties in the sound settings and enable “Listen to this device”.)

Now, determine the available devices for audio capture again:

ffmpeg -list_devices true -f dshow -i dummy

A new device has appeared on the list: CABLE Output (VB-Audio Virtual Cable)

[Screenshot: device list now including CABLE Output (VB-Audio Virtual Cable)]

Launch the screensharing capture using this device, addressing it by its alternative name:

ffmpeg.exe -f dshow -i audio="@device_cm_{33D9A762-90C8-11D0-BD43-00A0C911CE86}\wave_{F585B65B-4690-4433-8109-F16C6389C066}" -rtbufsize 100M -f gdigrab -framerate 30 -probesize 10M -draw_mouse 1 -i desktop -acodec aac -c:v libx264 -r 30 -preset ultrafast -tune zerolatency -crf 25 -pix_fmt yuv420p -f flv rtmp://demo.flashphoner.com:1935/live/rtmp_stream

[Screenshot: playing the stream with audio captured via CABLE Output]

Solution for Linux

In this example, we’re launching screensharing on Ubuntu Desktop 20.04 LTS.

Let’s begin by determining the audio devices available for capture. Run the following command in the console:

pacmd list-sources

The results should look like the screenshot below. For the next steps, we’re interested in the device titled Monitor of Built-in Audio Analog Stereo. It’s a virtual device that, much like its Windows cousin Stereo Mix, lets you mix the system audio with audio from the microphone. To proceed, we’ll need the index value of that device.

[Screenshot: pacmd list-sources output with the monitor device and its index]
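The pacmd output is quite verbose. A small sketch for pulling out just the indexes and descriptions with standard grep:

pacmd list-sources | grep -e 'index:' -e 'device.description'

Note that FFmpeg’s pulse input also accepts a source name instead of an index, so the monitor source (usually named <sink_name>.monitor) can be addressed by name as well.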

Start the screensharing stream via FFmpeg:

ffmpeg -f pulse -ac 2 -i 0 -f x11grab -rtbufsize 100M -s 1200x720 -framerate 30 -probesize 10M -draw_mouse 1 -i :0.0 -acodec aac -c:v libx264 -r 30 -preset ultrafast -tune zerolatency -crf 25 -pix_fmt yuv420p -f flv rtmp://demo.flashphoner.com:1935/live/rtmp_stream

where:

//capture of audio component of screensharing

-f pulse — Linux driver for audio capture;

-ac 2 — stereo audio mode (-ac 1 — mono audio mode);

-i 0 — the index of the device for audio capture. As shown above, for Monitor of Built-in Audio Analog Stereo that value is “0”.

//capture of video component of screensharing

-f x11grab — Linux (X11) driver for screen capture;

-rtbufsize 100M — video buffer;

-s 1200x720 — size of the captured screen area (px);

-framerate 30 — framerate of capture;

-probesize 10M — amount of input data (in bytes) FFmpeg analyzes to identify the stream;

-draw_mouse 1 — mouse cursor capture;

-i :0.0 — the X11 display and screen to capture; an offset for the top-left corner of the captured area can be appended, e.g. :0.0+100,200 (see the example after this key breakdown).

//encoding of audio component of screensharing

-acodec aac — audio compression is done via the aac codec.

//encoding of video component of screensharing

-c:v libx264 — encode the video to H.264 with the x264 codec;

-r 30 — the codec outputs the video at 30 frames per second;

-preset ultrafast — tells the encoder to use its fastest settings (relevant when doing screen recording);

-tune zerolatency — an optional x264 setting that minimizes encoding latency;

-crf 25 — video quality (the higher the number, the worse the quality; the lower, the better);

-pix_fmt yuv420p — color format of the resulting video.

//forming the stream and transmitting it to WCS

-f flv rtmp://demo.flashphoner.com:1935/live/rtmp_stream — packs the output into an FLV container and publishes it via RTMP as a stream named “rtmp_stream” on the demo.flashphoner.com server.
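As promised in the key breakdown, here is a sketch of capturing only part of the screen: a 1280x720 area whose top-left corner sits at x=100, y=200 (the size and offsets are arbitrary; -video_size is the long form of -s):

ffmpeg -f pulse -ac 2 -i 0 -f x11grab -rtbufsize 100M -video_size 1280x720 -framerate 30 -probesize 10M -draw_mouse 1 -i :0.0+100,200 -acodec aac -c:v libx264 -r 30 -preset ultrafast -tune zerolatency -crf 25 -pix_fmt yuv420p -f flv rtmp://demo.flashphoner.com:1935/live/rtmp_stream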

Now let’s test it.

Start playing a video on the screen (so there’s something to capture), then run the abovementioned FFmpeg command in the Ubuntu console:

[Screenshot: publishing the stream from the Ubuntu console]

On the WCS server side, open any suitable player and start playback of the captured screensharing stream via WebRTC (stream name: rtmp_stream).

[Screenshot: playing the screensharing stream via WebRTC]

A side-by-side comparison of FFmpeg commands for Windows and Linux

Here’s a comparison table of the FFmpeg keys for screensharing with system audio capture and streaming to WCS:

| Key | Windows | Linux |
| --- | --- | --- |
| Driver for audio capture | -f dshow | -f pulse |
| Audio capture device | -i audio="@device_cm_{33D9A762-90C8-11D0-BD43-00A0C911CE86}\wave_{8A4D10E5-8DB9-4B92-8C29-4BA2E60C1DDE}" | -i 0 |
| Driver for screen capture | -f gdigrab | -f x11grab |
| Video buffer | -rtbufsize 100M | -rtbufsize 100M |
| Captured screen area | -i desktop | -i :0.0, -s 1200x720 |
| Framerate of screen capture | -framerate 30 | -framerate 30 |
| Data analyzed to identify the stream | -probesize 10M | -probesize 10M |
| Mouse cursor capture | -draw_mouse 1 | -draw_mouse 1 |
| Codec for audio compression | -acodec aac | -acodec aac |
| Codec for video compression | -c:v libx264 | -c:v libx264 |
| Framerate of the resulting video | -r 30 | -r 30 |
| Option for faster encoding | -preset ultrafast | -preset ultrafast |
| x264 option for faster encoding | -tune zerolatency | -tune zerolatency |
| Quality of the captured video | -crf 25 | -crf 25 |
| Color format of the resulting video | -pix_fmt yuv420p | -pix_fmt yuv420p |
| Streaming to the demo.flashphoner.com server | -f flv rtmp://demo.flashphoner.com:1935/live/rtmp_stream | -f flv rtmp://demo.flashphoner.com:1935/live/rtmp_stream |


Conclusion

Today we’ve taken a close look at screensharing with audio capture using FFmpeg for Windows and Linux. With the screensharing streams you now know how to create, you can do anything WCS supports: record them, play them, relay them to other servers, transcode them, or add them to a mixer.

See you later on this blog and good streaming to you!

Links

Demo

WCS on Amazon EC2

WCS on DigitalOcean

WCS in Docker