Some of our customers ask:
– So what about the HTML5 phone? Is it done yet?

In this article we want to explain the current situation with an HTML5, SIP, WebRTC phone solution for Windows, iOS, and Android, including the iPad and iPhone.

Some people read the tech news and think that the future is already here: that HTML5 is a standard that lets us build real-time audio/video communications in desktop and mobile browsers, for example on iPhone and iPad devices.

Unfortunately, it’s not true.

We would like to explain the technologies involved and describe their prospects and current restrictions.
In short: an HTML5 phone is possible to build, and it will work everywhere, but not right now.
It may become available in 6-12 months (no guarantees), depending on how active the browser vendors are.

Several terms:

HTML5 – a new standard for browsers that adds, among other things, new tags such as audio and video.
SIP – Session Initiation Protocol, defined in RFC 3261.
WebRTC – an open source framework for browsers that can send and receive encoded audio and video over the network.

So, let’s find out what the “HTML5 phone” is and what technologies we need to create it.
First of all, the HTML5 phone is not really “HTML5”, because nobody can create a SIP phone using pure HTML5.
Surprised? But it is true.

Media traffic:
1. HTML5 cannot encode audio or video.
2. HTML5 cannot capture audio or video from the microphone and camera.
3. HTML5 cannot send audio and video data over TCP or UDP.
4. HTML5 does not support echo cancellation.

What about SIP signaling?
HTML5 cannot communicate with SIP devices over TCP or UDP.

This list may not be complete, but it is enough to see that a pure HTML5 phone is impossible for now.
Yes, the HTML5 standard supports playing video over HTTP and playing audio/video files, but that is all the current standard can do.
It is not designed for communications, and we think the standard will never support real-time features,
because it is just a version of HTML (Hypertext Markup Language).

Still, we can create a kind of “HTML5 phone”, but we need to add some technologies.
We will need two modules: 1) a signaling module and 2) a media module.
So, what should our HTML5 phone be called?

In fact, HTML5 will do nothing, or almost nothing, in the phone application.
All it will do is play the “ding-dong” for an incoming call, that is, play a ringtone file.
So we will put the “HTML5” keyword at the end of our solution name, or even drop it.

Let’s see which words we need to put first. There are two new technologies:
Websocket and WebRTC. An “HTML5 phone” is impossible without them.

Websocket is a protocol designed for browser-server communication over a persistent TCP connection.
What is the difference between the Websocket protocol and plain HTTP? HTTP is a request-response protocol: the server cannot push data on its own, so the browser has to issue a new GET/POST request, often over a new connection, for every update.
If you need to implement a real-time data application, you can use HTTP, but plain HTTP gives very high traffic overhead for such applications.

For example, suppose you need to receive stock prices from the server every second. With plain HTTP, you have to initiate a new HTTP request every second to get the changes.
If your requirement is 20 ms latency, HTTP will cost you too much in traffic and CPU resources.

Websocket uses a single persistent TCP connection for all incoming and outgoing traffic.
Websocket works over TCP, which is a reliable protocol, so all the data will be delivered reliably.
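
To put rough numbers on the overhead argument, here is a minimal sketch in Python. The Websocket framing sizes follow the base framing rules of RFC 6455; the HTTP header size is only an assumed figure (`ASSUMED_HTTP_HEADER_BYTES` is our placeholder, not a measurement), and TCP/IP headers and connection setup are ignored on both sides.

```python
def ws_frame_overhead(payload_len: int, from_client: bool) -> int:
    """Bytes of Websocket framing added to one message (RFC 6455 base framing)."""
    if payload_len <= 125:
        header = 2
    elif payload_len <= 0xFFFF:
        header = 2 + 2   # 16-bit extended payload length
    else:
        header = 2 + 8   # 64-bit extended payload length
    # Frames sent by the browser also carry a 4-byte masking key.
    return header + (4 if from_client else 0)


# Assumption: one polling request plus its response carries roughly this many
# bytes of HTTP headers (cookies, User-Agent and so on). Measure your own.
ASSUMED_HTTP_HEADER_BYTES = 700


def overhead_per_hour(updates_per_second: float, payload_len: int) -> dict:
    """Protocol overhead in bytes per hour for server-to-browser updates."""
    updates = int(updates_per_second * 3600)
    return {
        "http_polling": updates * ASSUMED_HTTP_HEADER_BYTES,
        "websocket": updates * ws_frame_overhead(payload_len, from_client=False),
    }


print(overhead_per_hour(1, 40))    # one 40-byte stock quote per second
print(overhead_per_hour(50, 40))   # one update every 20 ms
```

Under these assumptions the gap is a few hundred times, and it grows with the size of the HTTP headers, not with the size of the payload.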

Websocket is the only way to implement low-overhead SIP signaling for a browser application. You may find solutions where SIP signaling is implemented over HTTP:
for example, a PHP server that maintains the SIP state machine and receives HTTP requests from the web browser. But that is many times worse than Websockets.
Adobe Flash Player also has the RTMP protocol, which can do the same, but in this article we are talking about an application that is as Flash-free as possible.

The Websocket protocol is supported in the latest Chrome, Firefox, Safari, and IE 10 browsers and has a Javascript API for working with websockets.
For example, you can open a connection or send and receive data from pure Javascript code.
Thus, to implement SIP signaling we need a Javascript SIP stack that works over the Websocket protocol.
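
To show what such a stack actually puts on the wire, here is a minimal sketch of a SIP REGISTER request as it would be sent in a single Websocket text message. It is written in Python only to illustrate the message format; a real browser stack would do the same in Javascript. The function name `build_register`, the user, the domain, and the `.invalid` client host are placeholders of ours, and the `WS` transport token in the Via header comes from the SIP over Websocket specification.

```python
import uuid

CRLF = "\r\n"


def build_register(user: str, domain: str, ws_host: str) -> str:
    """Build a minimal SIP REGISTER request for sending over a Websocket."""
    # RFC 3261 requires branch values to start with the magic cookie z9hG4bK.
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]
    lines = [
        f"REGISTER sip:{domain} SIP/2.0",
        f"Via: SIP/2.0/WS {ws_host};branch={branch}",
        "Max-Forwards: 70",
        f"From: <sip:{user}@{domain}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <sip:{user}@{domain}>",
        f"Call-ID: {uuid.uuid4().hex}",
        "CSeq: 1 REGISTER",
        f"Contact: <sip:{user}@{ws_host};transport=ws>",
        "Expires: 600",
        "Content-Length: 0",
    ]
    # An empty line ends the headers; this request has no body.
    return CRLF.join(lines) + CRLF + CRLF


# Placeholder identities, for illustration only.
print(build_register("alice", "example.com", "df7jal23ls0d.invalid"))
```

Authentication, re-registration, and response handling are left out; the point is only that the SIP text itself is unchanged and just the transport underneath is new.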

So, our HTML5 phone solution should be called the Javascript SIP Websocket HTML5 phone, or for short: “JS SIP Websocket HTML5 phone”.
Is that all? Not exactly.

Unfortunately, SIP over Websocket is not supported by current VoIP vendors or by the millions of VoIP switches all over the world.
It means that if you built a Javascript SIP Websocket HTML5 phone, it would not work with 99% of SIP vendors, because they do not implement this
protocol in their products. Why?

There are two reasons:

– The “SIP over Websocket” draft has not reached RFC status. It is just a draft, not a standard.
– SIP vendors and providers are very conservative. Many SIP providers support only UDP for SIP and support neither TCP nor Websockets.

Register on the sites of 10 SIP providers and try to send a SIP REGISTER over TCP transport. In most cases you will see an “Unavailable” response.
For this and other reasons, the Websocket protocol is not supported by a wide range of SIP engines.
So what is the solution?

We can solve this problem with intermediate server software that converts Websocket SIP requests and responses to UDP SIP requests and responses, and vice versa.
We call this solution a Websocket SIP proxy server. It supports both the SIP over Websocket specification and the plain SIP specification, for compatibility with
vendors that do not support SIP over Websocket.

In effect, this proxy server converts the reliable Websocket/TCP-based protocol to unreliable UDP. According to the SIP specification (RFC 3261), the proxy server must
handle messages statefully. So our server should not just forward messages; it must also keep two separate transactions, Websocket SIP client to Websocket SIP proxy
and Websocket SIP proxy to SIP vendor or provider, and maintain the state of each.
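
A very reduced sketch of the bookkeeping this implies, in Python. The class `ProxyState` and its method names are hypothetical, not part of any real product or library. It only shows the Via handling: the proxy puts its own UDP Via on top of a request, remembers which Websocket client it belongs to, and strips that Via again when the response comes back. It assumes the topmost Via is the first header line of the response, and it leaves out the retransmission timers, Record-Route handling, and error cases a real stateful proxy needs.

```python
import uuid

CRLF = "\r\n"


class ProxyState:
    """Maps the proxy's own Via branch to the Websocket client that is waiting."""

    def __init__(self, host: str, port: int = 5060):
        self.host, self.port = host, port
        self.pending = {}  # branch -> Websocket client connection id

    def to_udp(self, raw: str, client_id: str) -> str:
        """Prepare a request received over Websocket for sending over UDP."""
        head, sep, body = raw.partition(CRLF + CRLF)
        start_line, *headers = head.split(CRLF)
        branch = "z9hG4bK" + uuid.uuid4().hex[:16]
        self.pending[branch] = client_id
        via = f"Via: SIP/2.0/UDP {self.host}:{self.port};branch={branch}"
        out = []
        for header in headers:
            name, _, value = header.partition(":")
            if name.strip().lower() == "max-forwards":
                header = f"Max-Forwards: {int(value) - 1}"
            out.append(header)
        # Our Via goes on top; the client's original Via stays below it.
        return CRLF.join([start_line, via, *out]) + sep + body

    def to_websocket(self, raw: str):
        """Strip our Via from a UDP response; return (client_id, message)."""
        head, sep, body = raw.partition(CRLF + CRLF)
        status_line, top_via, *rest = head.split(CRLF)
        branch = top_via.rpartition("branch=")[2].split(";")[0]
        client_id = self.pending.get(branch)  # None if the branch is unknown
        return client_id, CRLF.join([status_line, *rest]) + sep + body
```

The `pending` table is the smallest possible piece of transaction state: without it the proxy could not tell which browser connection a UDP response belongs to.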

So, our solution should be called “JS SIP Websocket HTML5 phone + Websocket SIP proxy”.

OK, we have added signaling and an intermediate proxy server to our HTML5 phone solution, and HTML5 is used to play sounds. But it is still not a working solution, because the browser does not support streaming. By streaming we mean audio/video capture, encoding, sending over a transport protocol, receiving and playback, along with support for different codecs, echo cancellation, and buffering.

Of course, Adobe Flash Player has all the features required for streaming:

– Audio/video capture and playback
– H.264 and Sorenson Spark video codecs
– G.711, Speex, and Nellymoser audio codecs
– Acoustic echo cancellation
– RTMP/RTMFP transport protocols for media relay

But we still want to avoid Flash Player and get a pure browser solution without any plugins. So let’s move forward. The alternative to Flash Player for streaming is WebRTC, a streaming framework aimed at browser vendors.

At the moment, no browser officially supports this built-in framework.
Your users can enable WebRTC in some browsers, but it is turned off by default and is experimental for now.
The technology is promoted by Google and Mozilla and has experimental support in Chrome and Firefox. You need to change some settings to enable it.

WebRTC features:

– Audio/video capture and playback
– iSAC/iLBC audio codecs
– VP8 video codec
– SRTP
– Acoustic echo cancellation
– Video jitter buffer

Adobe Flash Player 11 offers similar features.

So, what is better: Adobe Flash Player or WebRTC?
Let’s consider pros and cons:

WebRTC

+ The iLBC audio codec is a VoIP standard.
+ VP8 is a video codec well suited to real-time video relay on the web.
– SRTP (Secure Real-time Transport Protocol) is an open protocol described in an RFC, but many vendors do not support it. We will probably need to convert SRTP to RTP and vice versa for vendors that do not support SRTP.
– VP8 is not compatible with the products of many VoIP vendors, so we will probably need server-side transcoding between VP8 and H.264 (currently the VoIP video standard).
– WebRTC is not supported by 99% of browsers. So far it is enabled only in one experimental browser, Chrome Canary.
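
The codec mismatch can be checked mechanically. The sketch below, in Python, intersects the codecs WebRTC offers (as listed in this article) with those of a hypothetical VoIP peer; `ASSUMED_PEER_CODECS` is an invented example, since real endpoints advertise their codecs during call setup. An empty result for a media type means the gateway has to transcode.

```python
# WebRTC codecs as listed in this article.
WEBRTC_CODECS = {"audio": {"iSAC", "iLBC"}, "video": {"VP8"}}

# Assumption: a hypothetical VoIP peer; real devices announce their own list.
ASSUMED_PEER_CODECS = {"audio": {"G.711", "iLBC"}, "video": {"H.264"}}


def common_codecs(browser: dict, peer: dict) -> dict:
    """Return the codecs both sides share, per media type.

    An empty list means there is no common codec, so a server in the
    middle would have to transcode that media stream.
    """
    return {
        media: sorted(browser.get(media, set()) & peer.get(media, set()))
        for media in ("audio", "video")
    }


print(common_codecs(WEBRTC_CODECS, ASSUMED_PEER_CODECS))
```

With this assumed peer, audio can pass through on iLBC, while video has nothing in common and would need VP8 to H.264 transcoding on the server.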

Adobe Flash Player

– Browser plugin. If Adobe Flash Player were open source and positioned like WebRTC, it would be an excellent solution. But Flash Player is a proprietary browser plugin, so developers and end users depend on Adobe while using it.
– Incompatible protocols. RTMP and RTMFP are well suited to media relay, but they are proprietary protocols and we cannot use them to talk to any VoIP vendor. We need a Flash-VoIP gateway to carry media between RTMP/RTMFP and RTP (Real-time Transport Protocol).
+ H.264 encoding, decoding, and playback support. H.264 is a robust standard for video in the VoIP industry, which gives high compatibility with a wide range of VoIP systems.
+ The G.711 and Speex codecs are VoIP standards too.
+ High availability. It works in the 99% of desktop browsers that have Adobe Flash Player installed, and Adobe Flash Player has had auto-updates since version 11.2.
+ Robust transport protocols, RTMP and RTMFP, tested in millions of applications.

Thus, WebRTC is a promising technology. It can take the place of Flash Player for browser-to-VoIP communications in the future.
To get there, we would like to see the following features in WebRTC:

– G.711 and G.729 audio codecs
– H.264 video codec
– Non-encrypted RTP support
– Compatibility with the top browsers, such as IE, Chrome, Firefox, Safari, and Opera

For now, though, WebRTC is objectively worse than Flash Player as a browser-to-VoIP communication platform and is not suitable for production use. This will change if the top browser vendors release WebRTC support. We have to wait a while.

What about mobile devices?
Nothing new. It makes no difference whether the browser is installed on a desktop platform or a mobile device:
the browser must support WebRTC for streaming.
We know of one browser that can do it: Chrome Canary.
If you compile the Chrome Canary browser with WebRTC support for a mobile platform, it should work, though we have not tested this yet. So the world of mobile technologies is in the same situation: we are waiting for mobile browsers with WebRTC support…

So, what will our solution be called in the end?
The solution name is: “Javascript SIP Websocket HTML5 phone + Websocket SIP proxy + WebRTC”.

Too long, isn’t it?
But we cannot drop any element of this name. This is the real name of the HTML5 phone. Now you know that a pure “HTML5 phone” does not exist, and that the HTML5 phone is a complex solution built on top of several technologies.

How can we use this solution in production right now?

There is one trick.
We can take WebRTC out of this scheme and replace it with an Adobe Flash Player application. This application will provide audio and video streaming and the other features that WebRTC supports. Yes, in this case we are Flash-dependent again. But this solution is closer to the “HTML5 phone dream”, and to production, than WebRTC.

Feel free to try our solution below.

We are glad to introduce our Javascript SIP Websocket HTML5 phone + Websocket SIP proxy + Flash solution.