a few words about

Speech recognition technologies have already become a common part of our daily lives.

We use them while driving, and voice assistants live in the smart speakers in our homes. Smart home control systems are becoming more popular every day.

Some people probably even use the voice assistants on their smartphones.

ACCESSIBILITY

Speech recognition and synthesis technologies certainly make life easier for people who cannot use other input interfaces or cannot see the information displayed on the screen.

That could be our elderly parents or ourselves, so interface accessibility deserves enough attention.

Web Speech API

https://developer.mozilla.org

The Web Speech API enables you to incorporate voice data into web apps. The Web Speech API has two parts:

SpeechSynthesis (Text-to-Speech)
SpeechRecognition (Asynchronous Speech Recognition)

Speech recognition is accessed via the SpeechRecognition interface, which provides the ability to recognize voice context from an audio input (normally via the device's default speech recognition service) and respond appropriately. Generally you'll use the interface's constructor to create a new SpeechRecognition object, which has a number of event handlers available for detecting when speech is input through the device's microphone.
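The paragraph above can be illustrated with a minimal sketch of creating a recognition object and attaching two of its event handlers. The helper name `createRecognizer` and its `onTranscript` callback are assumptions made for illustration; the constructor is passed in as a parameter so the wiring can also be tried outside a browser.

```javascript
// Sketch only: `createRecognizer` and `onTranscript` are hypothetical names.
function createRecognizer(RecognitionCtor, onTranscript) {
  const recognition = new RecognitionCtor();
  recognition.lang = 'en-US';

  // Fired when the recognition service returns a result.
  recognition.onresult = (event) => {
    // results[0] is the first result, [0] is its most likely alternative.
    onTranscript(event.results[0][0].transcript);
  };

  // Fired when recognition fails, for example when microphone access is denied.
  recognition.onerror = (event) => {
    console.warn('Speech recognition error:', event.error);
  };

  return recognition;
}

// In a supporting browser the real constructor can be passed in.
if (typeof window !== 'undefined') {
  const Ctor = window.SpeechRecognition || window.webkitSpeechRecognition;
  if (Ctor) {
    const recognition = createRecognizer(Ctor, (text) => console.log(text));
    // recognition.start() would typically be called from a click handler.
  }
}
```

Passing the constructor in keeps the handler logic separate from browser detection, which is one possible design choice rather than a requirement of the API.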

https://developer.mozilla.org

Speech Recognition API

This API has a set of standard properties, methods, and events.

Here are some of them:

SpeechRecognition.start()
Starts the speech recognition service listening to incoming audio with intent to recognize grammars associated with the current SpeechRecognition.

SpeechRecognition.stop()
Stops the speech recognition service from listening to incoming audio, and attempts to return a SpeechRecognitionResult using the audio captured so far.

SpeechRecognition.abort()
Stops the speech recognition service from listening to incoming audio, and doesn't attempt to return a SpeechRecognitionResult.
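To make the difference between the three methods concrete, here is a small sketch of a wrapper around a recognition object. The names `makeControls`, `finish`, and `cancel` are hypothetical; only `start()`, `stop()`, `abort()`, and the `end` event come from the API itself.

```javascript
// Sketch only: a thin wrapper that tracks whether a session is running.
function makeControls(recognition) {
  let listening = false;

  // The `end` event fires once the service has disconnected,
  // whether that followed stop(), abort(), or an error.
  recognition.onend = () => {
    listening = false;
  };

  return {
    start() {
      if (listening) return; // start() throws if a session is already running
      listening = true;
      recognition.start();
    },
    // stop(): stop listening but still try to return a result.
    finish() {
      if (listening) recognition.stop();
    },
    // abort(): stop listening and discard whatever was captured.
    cancel() {
      if (listening) recognition.abort();
    },
    isListening() {
      return listening;
    },
  };
}
```

A "done" button would call `finish()`, while a "cancel" button would call `cancel()`.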

https://developer.mozilla.org

browser compatibility

This is an experimental technology, so it has very limited browser support.

(In practice, mainly Chromium-based browsers support speech recognition right now.)
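Because support is so limited, it is worth checking for the API before using it. The sketch below looks for the standard constructor and then the `webkit`-prefixed one; the helper name `getSpeechRecognition` is an assumption made for illustration.

```javascript
// Sketch only: returns the available constructor, or null if there is none.
function getSpeechRecognition(globalObject = globalThis) {
  return (
    globalObject.SpeechRecognition ||
    globalObject.webkitSpeechRecognition ||
    null
  );
}

const RecognitionCtor = getSpeechRecognition();
if (RecognitionCtor === null) {
  // Fall back to a regular control, e.g. a button or a text field.
  console.log('Speech recognition is not available here.');
}
```

Taking the global object as a parameter is only a convenience for trying the function outside a browser.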

example

From: https://wicg.github.io/speech-api/#examples-recognition

<script type="text/javascript">
  const recognition = new SpeechRecognition();
  recognition.onresult = (event) => {
    if (event.results.length > 0) {
      q.value = event.results[0][0].transcript;
      q.form.submit();
    }
  }
</script>

<form action="https://www.example.com/search">
  <input type="search" id="q" name="q" size=60>
  <input type="button" value="Click to Speak" onclick="recognition.start()">
</form>

Speech synthesis is now widely used for various tasks, such as voicing text in online translators (I am writing this text in one of them right now) or browsing the web with screen readers.
Speech recognition, however, is used mainly in native applications and is not very common on the web. This is probably because browsers restrict microphone use for privacy and security reasons and, of course, because of the limited browser support.

This makes the API a little difficult to use in practice, but I tried to find an example of using the Speech Recognition API that a web developer might find interesting to implement in a pet project.
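One practical consequence of the microphone restrictions is that recognition can fail before any speech is heard, so the `error` event deserves a handler. The sketch below maps a few error codes defined by the Web Speech API to user-facing messages; the message texts and the helper name `describeRecognitionError` are made up for illustration.

```javascript
// Sketch only: the wording of the messages is an assumption.
const ERROR_MESSAGES = new Map([
  ['not-allowed', 'Microphone access was blocked. Please allow it and try again.'],
  ['service-not-allowed', 'Speech recognition is not allowed in this browser.'],
  ['audio-capture', 'No microphone could be used.'],
  ['no-speech', 'No speech was detected.'],
  ['network', 'The recognition service could not be reached.'],
]);

function describeRecognitionError(code) {
  return ERROR_MESSAGES.get(code) || `Speech recognition failed (${code}).`;
}

// Possible usage with a recognition object:
// recognition.onerror = (event) => {
//   output.textContent = describeRecognitionError(event.error);
// };
```

Here `output` stands for whatever element shows status text on the page.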

let's make a page for your cat!

with the ability to switch between light and dark themes using speech recognition

Let's start by adding markup and some content to the page.

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  ...
  <title>Karasique's Page</title>
</head>
<body class="page theme theme_dark1">
  <button class="button toggler" id="speech-btn">&#128540;</button>
  <h1 class="heading">most famous cat in web</h1>
  <p class="output"></p>
  <section class="section">
    <h2 class="heading">
      Meow
    </h2>
    <div class="wrapper">
      <ul class="list">
        <li class="list__item">
          <button class="button button_red">Meow!</button>
        </li>
        <li class="list__item">
          <button class="button button_green">Meow?</button>
        </li>
      </ul>
      <p class="paragraph">
        Meow meow meow meow meow meow meow meoweow meow meow
        ...
        meow meow meow meow.
      </p>
    </div>
  </section>
  <img class="photo" src="/image.png" alt="Karasique" width="700">
  <script src="script.js"></script>
</body>
</html>

Now let's add some good-looking styles

(with CSS custom properties, I really love them)

It's a small fragment of code, but it does the main job:

:root {
  --transition: all 0.4s ease-in-out;
  --image-width: 70rem;

  margin: 0;
  font-size: 62.5%;
  overflow: hidden;
}

.theme_dark {
  --background-default: #232946;
  --text-default: #b8c1ec;
  --text-heading: #fffffe;
  --background-danger: #b011e0;
  --background-success: #3886ec;
}

.theme {
  background-color: var(--background-default, #abd1c6);
  color: var(--text-default, #0f3433);
  transition: var(--transition);
}

We got this. Pretty nice.

It's time to do some serious things!

Here we connect and configure our Speech Recognition API:

// Fall back to the webkit-prefixed constructors where the standard ones are missing.
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const SpeechGrammarList = window.SpeechGrammarList || window.webkitSpeechGrammarList;
const SpeechRecognitionEvent = window.SpeechRecognitionEvent || window.webkitSpeechRecognitionEvent;

// The words we want to recognize, described as a JSGF grammar.
const colors = ['dark', 'light'];
const grammar = '#JSGF V1.0; grammar colors; public <color> = ' + colors.join(' | ') + ' ;';

const recognition = new SpeechRecognition();

const speechRecognitionList = new SpeechGrammarList();
speechRecognitionList.addFromString(grammar, 1);
recognition.grammars = speechRecognitionList;
recognition.lang = 'en-US';
recognition.interimResults = false; // deliver only final results
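The grammar string in this configuration is plain JSGF text, so it can be built by a small helper when the word list changes. This is a sketch; `buildGrammar` and its parameters are hypothetical names, not part of the API.

```javascript
// Sketch only: builds a one-rule JSGF grammar from a list of words.
function buildGrammar(grammarName, ruleName, words) {
  return `#JSGF V1.0; grammar ${grammarName}; public <${ruleName}> = ${words.join(' | ')} ;`;
}

const themeGrammar = buildGrammar('colors', 'color', ['dark', 'light']);
console.log(themeGrammar);
// prints "#JSGF V1.0; grammar colors; public <color> = dark | light ;"
```

Browsers are not guaranteed to honor the grammar, so the result handler should still check the transcript against the expected words.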

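And here we listen for a click on the button and start recognition.

// Elements from the markup. `ding` is assumed to be an audio object
// (a short "listening" sound) created elsewhere in script.js.
const button = document.getElementById('speech-btn');
const page = document.querySelector('.page');

let micOn = false;

const onClick = () => {
  if (!micOn) {
    ding.play();
    recognition.start();
    micOn = true;
  }
};

button.addEventListener('click', onClick);

recognition.onresult = (event) => {
  // Normalize the transcript: the service may add capitals or whitespace.
  const result = event.results[0][0].transcript.trim().toLowerCase();

  if (colors.includes(result)) {
    switch (result) {
      case 'dark':
        page.classList.add('theme_dark');
        break;

      default:
        page.classList.remove('theme_dark');
        break;
    }
  }
};

// Let the button start a new session once recognition has finished.
recognition.onend = () => {
  micOn = false;
};

The API will do the rest for us 👍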
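The recognition service may return a whole phrase ("switch to dark, please") rather than a single word, so matching the transcript exactly can miss valid commands. A slightly more forgiving approach is sketched below; `pickTheme` and `applyTheme` are hypothetical helpers, and only the `theme_dark` class name comes from the page itself.

```javascript
// Sketch only: find the first known theme word anywhere in the transcript.
function pickTheme(transcript, themes = ['dark', 'light']) {
  const words = transcript
    .toLowerCase()
    .split(/[^a-z]+/) // split on anything that is not a letter
    .filter(Boolean); // drop empty strings left by the split
  return words.find((word) => themes.includes(word)) || null;
}

// Adds the dark class for 'dark' and removes it for any other theme.
function applyTheme(page, theme) {
  if (theme !== null) {
    page.classList.toggle('theme_dark', theme === 'dark');
  }
}

// Possible usage inside the result handler:
// recognition.onresult = (event) => {
//   applyTheme(page, pickTheme(event.results[0][0].transcript));
// };
```

An unrecognized phrase leaves the current theme untouched, because `pickTheme` returns `null` for it.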

THANK YOU FOR WATCHING!

GOODBYE!
