React - Voice Assistance using React Speech Recognition

It is fascinating when we speak and our computer automatically translates it into human-readable text.


In this blog post, we will look at how to implement voice assistance in a React app using React Speech Recognition.

What is React Speech Recognition?

It is a React hook that uses the Web Speech API to convert speech from the machine's microphone into text. That text can then be read by our React app and used to perform tasks.

There are two important pieces in this package:

  • useSpeechRecognition, a React hook that gives component access to a transcript of speech picked up from the user's microphone.
  • SpeechRecognition manages the global state of the Web Speech API, exposing functions to turn the microphone on and off.

Prerequisites:

To use React Speech Recognition, we need React 16.8 or above so that React hooks can be used. You can see the full package README here.

Browser Support:

React Speech Recognition supports the following browsers:

  • Google Chrome (recommended)
  • Microsoft Edge
  • Google Chrome for Android
  • Android Webview
  • Samsung Internet

Unfortunately, iOS does not support these APIs.

What we'll make

We will make a simple web page that has a search bar with a mic icon at the end of the input field. Initially the mic is off; when you click the icon, the mic turns on, and the first time, your browser will ask for permission to use your device's microphone. As you speak, your speech is translated to text and displayed in the search bar. You can clear the text with the Reset button below the search bar. If you turn the mic on but don't speak, the mic turns off after 6 seconds (you can change this in the code). You can also write your own commands to perform particular tasks.

Let's get started!

Setting up the workspace:

1) Create a new React app with create-react-app:

npx create-react-app vb-voice-assistant

2) After initializing the project with the above command, install react-speech-recognition:

npm i react-speech-recognition
# if using yarn
yarn add react-speech-recognition

3) After completing the above steps, open the App.js file, remove everything from it, and add the following content.

import VoiceAssistant from "./components/voice-assistant/VoiceAssistant";

const App = () => (
  <>
    <VoiceAssistant />
  </>
);

export default App;

Don't worry, we haven't created the VoiceAssistant component yet; we will create it next. Our app only renders this one component.

4) Create a directory named components inside src directory. Inside the components directory, create another directory named voice-assistant. We will create our VoiceAssistant component in this directory. So create a JavaScript file named VoiceAssistant.js inside it and add the following content to it.

import React from 'react';

const VoiceAssistant = () => {
  return (
    <>
      Voice Assistant
    </>
  );
}

export default VoiceAssistant;

I won't cover the CSS part; you can look at the source code attached to this blog post or at this GitHub repo. The index.css file contains most of the CSS related to this project.

5) Add the following content to give the web page a nice look. Make sure you add the stylesheet import shown below; look at the source code for the CSS part.

import React from 'react';
import '../../assets/css/voice_assistant.scss';

const VoiceAssistant = () => {
  return (
    <>
      <div className="voice-asst">
        <form>
          <img
            src="https://velocitybytes.com/uploads/logo/logo_60bbbd94a064d1.png"
            alt="VelocityBytes"
            width="145" height="46"
          />
          <fieldset>
            <legend>Voice Assistant</legend>
            <div className="inner-form">
              <div className="input-field">
                <button className="btn-search" type="button">
                  <svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24">
                    <path d="M15.5 14h-.79l-.28-.27C15.41 12.59 16 11.11 16 9.5 16 5.91 13.09 3 9.5 3S3 5.91 3
                    9.5 5.91 16 9.5 16c1.61 0 3.09-.59 4.23-1.57l.27.28v.79l5 4.99L20.49 19l-4.99-5zm-6 0C7.01 14
                     5 11.99 5 9.5S7.01 5 9.5 5 14 7.01 14 9.5 11.99 14 9.5 14z"/>
                  </svg>
                </button>
                <button
                  className="btn-voice"
                  type="button"
                >
                  <svg width="24px" height="24px" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg">
                  <g data-name="Layer 2">
                    <g data-name="mic">
                      <rect width="24" height="24" opacity="0"/>
                      <path d="M12 15a4 4 0 0 0 4-4V6a4 4 0 0 0-8 0v5a4 4 0 0 0 4 4z"/>
                      <path d="M19 11a1 1 0 0 0-2 0 5 5 0 0 1-10 0 1 1 0 0 0-2 0 7 7 0 0 0 6 6.92V20H8.89a.89.89 0 0
                          0-.89.89v.22a.89.89 0 0 0 .89.89h6.22a.89.89 0 0 0 .89-.89v-.22a.89.89 0 0 0-.89-.89H13v-2.08A7
                          7 0 0 0 19 11z"/>
                    </g>
                  </g>
                </svg>
                </button>
                <input
                  type="search"
                  placeholder="Speak something..."
                  value=""
                  readOnly
                />
              </div>
              <div className="suggestion-wrap">
                <button
                  type="button"
                  className="reset-button"
                >
                  Reset
                </button>
              </div>
            </div>
          </fieldset>
        </form>
      </div>
    </>
  );
}

export default VoiceAssistant;

Working on Speech Recognition:

6) Let's work on the core part: adding the React Speech Recognition hooks.

To use React Speech Recognition, we must first import it into the component. We will use the useSpeechRecognition hook and the SpeechRecognition object.

To import them:

import SpeechRecognition, {useSpeechRecognition} from 'react-speech-recognition';

To start listening to the user's voice, we call the startListening() function.

SpeechRecognition.startListening();

To stop listening, we call the stopListening() function.

SpeechRecognition.stopListening();
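By default, listening stops automatically when the user pauses. The listening call also accepts an options object; the option names below come from the react-speech-recognition README, so double-check them against the version you install:

```javascript
// Options object for SpeechRecognition.startListening (names per the
// react-speech-recognition README; verify against your installed version).
const listenOptions = {
  continuous: true,   // keep listening through pauses instead of stopping automatically
  language: 'en-US',  // BCP-47 language tag for the recogniser
};
// SpeechRecognition.startListening(listenOptions);
```

Continuous listening is useful if you want the search bar to keep updating across several spoken phrases instead of one.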

To get the transcript (text basically) of the user's speech, we will use transcript.

const { transcript } = useSpeechRecognition();

To reset or clear the value of the transcript, you can call resetTranscript().

const { resetTranscript } = useSpeechRecognition();

To check whether the browser supports Web Speech API or not, we can use this:

if (!SpeechRecognition.browserSupportsSpeechRecognition()) {
    return (
      <>
        Your browser doesn't support speech recognition
      </>
    );
  }

Putting the pieces together:

7) Let's join the pieces that we have covered so far.

import React, {useEffect, useState} from 'react';
import SpeechRecognition, {useSpeechRecognition} from 'react-speech-recognition';
import '../../assets/css/voice_assistant.scss';

const VoiceAssistant = () => {
  const {
    transcript,
    resetTranscript
  } = useSpeechRecognition();

  const [isMicOn, setIsMicOn] = useState(true);
  const [isListening, setIsListening] = useState(false);
  const [searchTerm, setSearchTerm] = useState('');

  useEffect(() => {
    setSearchTerm(transcript);
    // eslint-disable-next-line react-hooks/exhaustive-deps
  }, [transcript]);

  useEffect(() => {
    setTimeout(() => {
      if (isListening && searchTerm.length === 0) {
        setIsListening(false);
        setIsMicOn(true);
      }
    }, 6000);
    // eslint-disable-next-line react-hooks/exhaustive-deps
  }, [isListening]);

  const handleResetTranscript = (event) => {
    resetTranscript();
    event.preventDefault();
    event.target.classList.remove('animate');
    event.target.classList.add('animate');

    setTimeout(function(){
      event.target.classList.remove('animate');
    }, 700);
  }

  const handleMic = () => {
    setIsMicOn(!isMicOn);
    if (isMicOn) {
      setIsListening(true);
      SpeechRecognition.startListening()
        .then(() => {})
    }
    if (!isMicOn) {
      SpeechRecognition.stopListening();
      SpeechRecognition.abortListening();
      setIsListening(false);
    }
  };

  if (!SpeechRecognition.browserSupportsSpeechRecognition()) {
    return (
      <>
        Your browser doesn't support speech recognition
      </>
    );
  }

  return (
    <>
      <div className="voice-asst">
        <form>
          <img
            src="https://velocitybytes.com/uploads/logo/logo_60bbbd94a064d1.png"
            alt="VelocityBytes"
            width="145" height="46"
          />
          <fieldset>
            <legend>Voice Assistant</legend>
            <div className="inner-form">
              <div className="input-field">
                <button className="btn-search" type="button">
                  <svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24">
                    <path d="M15.5 14h-.79l-.28-.27C15.41 12.59 16 11.11 16 9.5 16 5.91 13.09 3 9.5 3S3 5.91 3
                    9.5 5.91 16 9.5 16c1.61 0 3.09-.59 4.23-1.57l.27.28v.79l5 4.99L20.49 19l-4.99-5zm-6 0C7.01 14
                     5 11.99 5 9.5S7.01 5 9.5 5 14 7.01 14 9.5 11.99 14 9.5 14z"/>
                  </svg>
                </button>
                <button
                  className="btn-voice"
                  type="button"
                  onClick={handleMic}
                >
                  {
                    !isMicOn ? (<svg width="24px" height="24px" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg">
                      <g data-name="Layer 2">
                        <g data-name="mic">
                          <rect width="24" height="24" opacity="0"/>
                          <path d="M12 15a4 4 0 0 0 4-4V6a4 4 0 0 0-8 0v5a4 4 0 0 0 4 4z"/>
                          <path d="M19 11a1 1 0 0 0-2 0 5 5 0 0 1-10 0 1 1 0 0 0-2 0 7 7 0 0 0 6 6.92V20H8.89a.89.89 0 0
                          0-.89.89v.22a.89.89 0 0 0 .89.89h6.22a.89.89 0 0 0 .89-.89v-.22a.89.89 0 0 0-.89-.89H13v-2.08A7
                          7 0 0 0 19 11z"/>
                        </g>
                      </g>
                    </svg>) : (<svg width="24px" height="24px" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg">
                      <g data-name="Layer 2">
                        <g data-name="mic-off">
                          <rect width="24" height="24" opacity="0"/>
                          <path d="M15.58 12.75A4 4 0 0 0 16 11V6a4 4 0 0 0-7.92-.75"/>
                          <path d="M19 11a1 1 0 0 0-2 0 4.86 4.86 0 0 1-.69 2.48L17.78 15A7 7 0 0 0 19 11z"/>
                          <path d="M12 15h.16L8 10.83V11a4 4 0 0 0 4 4z"/>
                          <path d="M20.71 19.29l-16-16a1 1 0 0 0-1.42 1.42l16 16a1 1 0 0 0 1.42 0 1 1 0 0 0 0-1.42z"/>
                          <path
                            d="M15 20h-2v-2.08a7 7 0 0 0 1.65-.44l-1.6-1.6A4.57 4.57 0 0 1 12 16a5 5 0 0 1-5-5 1 1 0 0 0-2
                        0 7 7 0 0 0 6 6.92V20H9a1 1 0 0 0 0 2h6a1 1 0 0 0 0-2z"/>
                        </g>
                      </g>
                    </svg>)
                  }
                </button>
                <input
                  type="search"
                  placeholder={isListening ? 'Listening...' : 'Speak something...'}
                  value={searchTerm}
                  readOnly
                />
              </div>
              <div className="suggestion-wrap">
                <button
                  type="button"
                  className="reset-button"
                  onClick={handleResetTranscript}
                >
                  Reset
                </button>
              </div>
            </div>
          </fieldset>
        </form>
      </div>
    </>
  );
}

export default VoiceAssistant;
  • Apart from importing the react-speech-recognition hook, we have declared three pieces of state: isMicOn, isListening, and searchTerm. When we click the mic icon (a button, initially showing the mic-off icon), the handleMic() function is called. We invert the state and, if the mic is turned on, we start listening. If the user clicks the mic icon again, the mic is turned off and we stop listening. Based on the isMicOn state we render the mic-on or mic-off icon, and similarly, based on the isListening state, we update the placeholder of the input field.
  • We have written two useEffect(...) calls. First one: when the user speaks, the speech is translated to text and becomes available in transcript. So whenever transcript changes, the effect runs and updates searchTerm, which we place in the value attribute of the input field. Second one: when the user turns the mic on, we set isListening to true, which triggers the effect. If the user has not spoken anything, searchTerm is still empty; we check for that inside a setTimeout(...) with a delay, then turn the mic off and update isListening.
    • useEffect(() => {
        setSearchTerm(transcript);
        // eslint-disable-next-line react-hooks/exhaustive-deps
      }, [transcript]);
    • useEffect(() => {
        setTimeout(() => {
          if (isListening && searchTerm.length === 0) {
            setIsListening(false);
            setIsMicOn(true);
          }
        }, 6000);
        // eslint-disable-next-line react-hooks/exhaustive-deps
      }, [isListening]);
  • We have added a Reset button to clear text from the input field. When you click it, handleResetTranscript() is called; resetTranscript() inside that function clears the text, and the rest of the code animates the Reset button on click.
  • To check whether the browser supports Web Speech API or not, we have added:
    • if (!SpeechRecognition.browserSupportsSpeechRecognition()) {
        return (
          <>
            Your browser doesn't support speech recognition
          </>
        );
      }
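One subtlety in the second useEffect above: the timeout is never cleared, so toggling the mic on and off quickly can leave stale timers running. A hypothetical sketch of the same silence-timer logic, extracted as a plain function with a cancel callback (the helper name is my own, not part of the library):

```javascript
// Hypothetical helper: the 6-second silence timer from the second useEffect,
// extracted as a plain function that returns a cancel callback. Returning the
// cancel callback from the useEffect lets React clear stale timers whenever
// `isListening` changes again.
function startSilenceTimer(getState, onSilence, delayMs = 6000) {
  const id = setTimeout(() => {
    const { isListening, searchTerm } = getState();
    // Only switch the mic off if we are still listening and heard nothing.
    if (isListening && searchTerm.length === 0) onSilence();
  }, delayMs);
  return () => clearTimeout(id); // cleanup callback
}
```

Inside the component this would look like `useEffect(() => startSilenceTimer(...), [isListening])`, since the returned function becomes the effect's cleanup.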

8) You can add commands to speech recognition like this:

const commands = [
  {
    command: "reset",
    callback: () => {
      resetTranscript();
    },
  },
];

const {
  transcript,
  resetTranscript
} = useSpeechRecognition({commands});
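Beyond a single fixed phrase, commands also support wildcards and fuzzy matching. A hypothetical, self-contained sketch (the `*` wildcard and the isFuzzyMatch / fuzzyMatchingThreshold options come from the react-speech-recognition README; the stub variables stand in for your component's real handlers):

```javascript
// Hypothetical sketch of richer command definitions. Stub variables replace
// the component's real state setters so this runs standalone.
let lastSearch = '';
let cleared = false;

const commands = [
  {
    // "*" captures the remainder of the speech and passes it to the callback.
    command: 'search for *',
    callback: (term) => { lastSearch = term; },
  },
  {
    command: 'clear the search',
    callback: () => { cleared = true; },
    isFuzzyMatch: true,            // tolerate close-but-not-exact phrasing
    fuzzyMatchingThreshold: 0.8,   // 0..1; higher means a stricter match
  },
];

// Simulate what the library does when it matches "search for pizza":
commands[0].callback('pizza');
```

In the real component you would pass this array to useSpeechRecognition({commands}) exactly as in step 8.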

Conclusion:

That's it! You have successfully built a voice assistant using React Speech Recognition. If you like it, please do share it.
