voice recognition

Speech Recognition in the Browser

Last Thursday I had the pleasure of giving a talk about Speech Recognition in the Browser at Code Fellows in Seattle.

Many people were surprised by how easy it is to add speech recognition to a website with pure JavaScript, so I thought I would share a few code snippets here. So far it works only in Chrome.

Recognizing speech

This is how you can translate speech to text:

var sr = new webkitSpeechRecognition();
sr.onresult = function (evt) {
    // results[0] is the first result, [0] is its top alternative
    console.log(evt.results[0][0].transcript);
};
sr.start();

You can also get the confidence level of the result:

var sr = new webkitSpeechRecognition();
sr.onresult = function (evt) {
    console.log(evt.results[0][0].transcript, evt.results[0][0].confidence);
};
sr.start();

You can get interim results:

sr.interimResults = true;    // false by default
sr.onresult = function (evt) {
    for (var i = 0; i < evt.results.length; ++i) {
        console.log(evt.results[i][0].transcript);
    }
};
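
With interim results enabled, it can help to tell finished text apart from text that may still change. Here is a minimal sketch that relies on the isFinal flag each result carries. The helper name splitTranscript is my own, not part of the API, and the browser wiring is guarded so the snippet does nothing where webkitSpeechRecognition is missing.

```javascript
// Split the results list into finalized text and still-changing interim text.
// `results` is array-like (length plus numeric indexes), just like evt.results.
function splitTranscript(results) {
    var finalText = '';
    var interimText = '';
    for (var i = 0; i < results.length; ++i) {
        var text = results[i][0].transcript;
        if (results[i].isFinal) {
            finalText += text;
        } else {
            interimText += text;
        }
    }
    return { finalText: finalText, interimText: interimText };
}

// Browser-only wiring; skipped where the prefixed constructor is unavailable.
if (typeof webkitSpeechRecognition !== 'undefined') {
    var sr = new webkitSpeechRecognition();
    sr.interimResults = true;
    sr.onresult = function (evt) {
        var parts = splitTranscript(evt.results);
        console.log('final:', parts.finalText, '| interim:', parts.interimText);
    };
    sr.start();
}
```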

Or different alternatives of recognized speech:

sr.maxAlternatives = 10;	// default = 1
sr.onresult = function(evt) {
	for (var i = 0; i < evt.results[0].length; ++i) {
		console.log(evt.results[0][i].transcript);
	}
};
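
If you only want one of the alternatives, you could pick the one with the highest confidence. This is a rough sketch: pickBestAlternative is a hypothetical helper, and I have not verified how Chrome orders alternatives or whether every alternative reports a meaningful confidence, so treat it as an illustration only.

```javascript
// Return the alternative with the highest confidence, or null if there are none.
// `result` is array-like, as evt.results[0] is.
function pickBestAlternative(result) {
    var best = null;
    for (var i = 0; i < result.length; ++i) {
        if (best === null || result[i].confidence > best.confidence) {
            best = result[i];
        }
    }
    return best;
}

// Browser-only wiring; skipped where the prefixed constructor is unavailable.
if (typeof webkitSpeechRecognition !== 'undefined') {
    var sr = new webkitSpeechRecognition();
    sr.maxAlternatives = 10;
    sr.onresult = function (evt) {
        var best = pickBestAlternative(evt.results[0]);
        if (best) {
            console.log(best.transcript, best.confidence);
        }
    };
}
```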

You can set a language, e.g., to Polish:

sr.lang = 'pl-PL';

All of the above will stop recognition when you stop speaking. To keep recognition running, you need to set the continuous flag to true. In this mode every fragment of your speech is delivered as a separate result, so you need to update the onresult callback to read the latest one:

sr.continuous = true;	// false by default
sr.onresult = function(evt) {
	console.log(evt.results[evt.results.length-1][0].transcript);
};
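
In continuous mode the results list keeps growing, so reading only the last entry can miss something when several results change at once. The event also carries a resultIndex property (the lowest index that changed), and a small sketch using it could look like this. The helper name collectNewTranscripts is my own.

```javascript
// Collect the transcripts of every result that changed in this event.
function collectNewTranscripts(evt) {
    var texts = [];
    for (var i = evt.resultIndex; i < evt.results.length; ++i) {
        texts.push(evt.results[i][0].transcript);
    }
    return texts;
}

// Browser-only wiring; skipped where the prefixed constructor is unavailable.
if (typeof webkitSpeechRecognition !== 'undefined') {
    var sr = new webkitSpeechRecognition();
    sr.continuous = true;
    sr.onresult = function (evt) {
        console.log(collectNewTranscripts(evt).join(' '));
    };
}
```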

The speech recognition object has other callbacks besides onresult that you can take advantage of:

sr.onstart = function() { console.log("onstart"); }
sr.onend = function() { console.log("onend"); }
sr.onspeechstart = function() { console.info("speech start"); }
sr.onspeechend = function() { console.info("speech end"); }
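
One more callback worth wiring is onerror, whose event carries an error code such as 'no-speech' or 'not-allowed'. Below is a small sketch that turns a few of those codes into readable messages. The codes come from the Web Speech API specification, while the describeError helper and the message texts are my own assumptions.

```javascript
// Messages for a few of the error codes defined by the Web Speech API.
var ERROR_MESSAGES = {
    'no-speech': 'No speech was detected.',
    'audio-capture': 'No microphone could be used.',
    'not-allowed': 'Microphone access was denied.',
    'network': 'A network error occurred.'
};

// Map an error code to a readable message, with a generic fallback.
function describeError(code) {
    return Object.prototype.hasOwnProperty.call(ERROR_MESSAGES, code)
        ? ERROR_MESSAGES[code]
        : 'Recognition error: ' + code;
}

// Browser-only wiring; skipped where the prefixed constructor is unavailable.
if (typeof webkitSpeechRecognition !== 'undefined') {
    var sr = new webkitSpeechRecognition();
    sr.onerror = function (evt) {
        console.warn(describeError(evt.error));
    };
}
```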

Emitting speech

var msg = new SpeechSynthesisUtterance('Hi, I\'m Jakub!');
speechSynthesis.speak(msg);

You can also change the speaker voice:

var voices = window.speechSynthesis.getVoices();
msg.voice = voices[10]; // Note: some voices don't support altering params
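
A fixed index like voices[10] depends on which voices happen to be installed, and in Chrome the list is usually filled in asynchronously, so getVoices() may return an empty array at first. A more defensive sketch is to look a voice up by language once the voiceschanged event fires. The findVoiceByLang helper is a hypothetical name of mine, and 'pl-PL' is just an example.

```javascript
// Return the first voice whose lang matches exactly, or null if none does.
function findVoiceByLang(voices, lang) {
    for (var i = 0; i < voices.length; ++i) {
        if (voices[i].lang === lang) {
            return voices[i];
        }
    }
    return null;
}

// Browser-only wiring; skipped where speechSynthesis is unavailable.
if (typeof speechSynthesis !== 'undefined') {
    var reportVoice = function () {
        var voice = findVoiceByLang(speechSynthesis.getVoices(), 'pl-PL');
        console.log(voice ? 'Using voice: ' + voice.name : 'No pl-PL voice loaded yet.');
    };
    reportVoice();                                  // may run before the list is ready
    speechSynthesis.onvoiceschanged = reportVoice;  // runs again once voices load
}
```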

There are also other options you can set:

msg.volume = 1; // 0 to 1
msg.pitch = 2; //0 to 2
msg.text = 'Hello World';
msg.lang = 'en-US';

msg.onend = function(e) {
	// use the handler's own parameter rather than the global event object
	console.log('Finished in ' + e.elapsedTime + ' seconds.');
};

Summary

Speech is coming to the browser, and you cannot stop it. The question is when most websites will add voice support. Check out voiceCmdr – a library that I blogged about earlier this year, which makes it very easy to add voice commands to your website. You can also check out a website that can be navigated with voice commands – you can find the available commands in my blog post. You can find the entire logic for voice command support in this file (lines: 38-103).


Website with speech recognition for free

A few weeks ago I blogged about voiceCmdr – a library for adding voice commands to a website (built on top of the Web Speech API).

I put up a simple website – BooksLib – a book library that allows upvoting books, adding them to favorites, and searching.

This application also supports voice interaction. You can check it live here (Azure) or here (Heroku).

It works in two modes:

  1. continuous – the website listens for commands continuously
  2. single – the website listens for a single command and stops listening after receiving it

You can enable one of the two modes through the panel in the top-right corner:

BooksLib - voice panel

Available commands:

  • Home – go to home site
  • Books – go to books site
  • Favorites – go to favorites site
  • Top 10 – go to top 10 site
  • Search [phrase] – search for given phrase (e.g. “Search JavaScript” will display all books with “JavaScript” phrase in title)
  • Favorite – add the currently displayed book to favorites (can be used only when a single book is displayed)

You can check how these commands were added in lines 60-85 of the app.js file (only 25 lines, including empty lines and brackets!).

To add voice commands to your website easily, check out the voiceCmdr library.

* Voice commands work only in Google Chrome – the only web browser that supports the Web Speech API so far.


voiceCmdr – voice commands in the Browser

Recently I discovered the Web Speech API. I was already talking to the browser using Google Hangouts and Google Translate, but I had never thought about adding voice support to my own website.

I did some research and found a demo. Based on that I put up a simple demo website (say “show website blog” and it will take you directly to a subpage that otherwise takes 3 mouse clicks to reach). For now speech recognition works only in Google Chrome and Safari. In Chrome the interface is not exposed as SpeechRecognition but under the prefixed name webkitSpeechRecognition. I hope other browsers will also implement it in the near future, especially Spartan, which is integrated with Cortana.
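
Because of the prefix, it is safer to detect which constructor is available before using it. Here is a minimal sketch: getSpeechRecognitionCtor is a hypothetical helper name, and it simply prefers the unprefixed SpeechRecognition and falls back to webkitSpeechRecognition.

```javascript
// Return the speech recognition constructor found on the given global object,
// preferring the unprefixed name, or null when neither is present.
function getSpeechRecognitionCtor(globalObject) {
    return globalObject.SpeechRecognition ||
        globalObject.webkitSpeechRecognition ||
        null;
}

var Recognition = getSpeechRecognitionCtor(typeof window !== 'undefined' ? window : {});
if (Recognition) {
    var sr = new Recognition();
    sr.onresult = function (evt) {
        console.log(evt.results[0][0].transcript);
    };
    // Call sr.start() when you want to begin listening.
} else {
    console.log('Speech recognition is not available in this environment.');
}
```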

I noticed that while the API is flexible, it is not easy to use. I think that for the most common scenarios a developer would like to be able to add commands associated with callback functions, and control the recognition state with start/stop actions.

So I created a JavaScript library, voiceCmdr. It is a single .js file without any dependencies. You can install it with npm or bower (check the README on GitHub).

You can add commands:

voiceCmdr.addCommand("go home", function () {
  // go to home page
});

The callback function can take a parameter, which is everything you said after the command. E.g.:

voiceCmdr.addCommand("search", function (param) {
  // search for phrase specified in param
});

You can also remove commands:

voiceCmdr.removeCommand("go home");

In order to start listening for commands:

voiceCmdr.start();

To stop listening:

voiceCmdr.stop();

You can also listen for a single command:

voiceCmdr.getCommand();
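
To make the idea of commands with callbacks more concrete, here is a rough sketch of how a recognized transcript could be matched against registered phrases, with the rest of the sentence passed on as the parameter. This is not voiceCmdr's actual code: createCommandRegistry and its add, remove, and dispatch methods are hypothetical names used only for illustration.

```javascript
// Hypothetical command registry; an illustration, not voiceCmdr's implementation.
function createCommandRegistry() {
    var commands = {};
    return {
        add: function (phrase, callback) {
            commands[phrase.toLowerCase()] = callback;
        },
        remove: function (phrase) {
            delete commands[phrase.toLowerCase()];
        },
        // Run the first command whose phrase starts the transcript.
        // Returns true when a command was found and invoked.
        dispatch: function (transcript) {
            var trimmed = transcript.trim();
            var spoken = trimmed.toLowerCase();
            for (var phrase in commands) {
                if (!Object.prototype.hasOwnProperty.call(commands, phrase)) {
                    continue;
                }
                if (spoken === phrase || spoken.indexOf(phrase + ' ') === 0) {
                    // Slice the original text so the parameter keeps its casing
                    // (assumes lowercasing did not change the string length).
                    commands[phrase](trimmed.slice(phrase.length).trim());
                    return true;
                }
            }
            return false;
        }
    };
}

var registry = createCommandRegistry();
registry.add('search', function (param) {
    console.log('Searching for: ' + param);
});
registry.dispatch('Search JavaScript');
```

In a real page, dispatch would be called from the onresult callback with the recognized transcript.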

Check the examples, and be aware that the Web Speech API works only over http and https (it will not work if you open a static HTML file directly). The easiest way to run a server is to use Python's SimpleHTTPServer (http.server in Python 3):

python -m SimpleHTTPServer 8080    # Python 2
python3 -m http.server 8080        # Python 3

This can also go the other way (the browser talking to you) with the speech synthesis part of the Web Speech API.

I am curious what you think. Are we ready for voice commands in the browser? Are you concerned about your privacy (check Scott Hanselman’s tweet)?