Tag Archives: Python

Raspberry Pi – Translator

via Wolf Paulus » Embedded | Wolf Paulus

Recently, I described how to perform speech recognition on a Raspberry Pi, using the on device sphinxbase / pocketsphinx open source speech recognition toolkit. This approach works reasonably well, but with high accuracy, only for a relatively small dictionary of words.

Like the article showed, pocketsphinx works great on a Raspberry Pi to do keyword spotting, for instance to use your voice, to launch an application. General purpose speech recognition however, is still best performed, using one of the prominent web services.

Speech Recognition via Google Service

Google’s speech recognition and related services used to be accessible and easy to integrate. Recently however, they got much more restrictive and (hard to belief, I know) Microsoft is now the place to start, when looking for decent speech related services. Still, let’s start with Google’s Speech Recognition Service, which requires an FLAC (Free Lossless Audio Codec) encoded voice sound file.

Google API Key

Accessing Google’s speech recognition service requires an API key, available through the Google Developers Console.
I followed these instructions for Chromium Developers and while the process is a little involved, even intimidating, it’s manageable.
I created a project and named in TranslatorPi and selected the following APIs for this project:

Goggle Developer Console: Enabled APIs for this project

Goggle Developer Console: Enabled APIs for this project

The important part is to create an API Key for public access. On the left side menu, select API & auth / Credentials. Here you can create the API key, a 40 character long alpha numeric string.

Installing tools and required libraries

Back on the Raspberry Pi, there are only a few more libraries needed, additionally to what was installed in the above mentioned on-device recognition project.

sudo apt-get install flac
sudo apt-get install python-requests
sudo apt-get install python-pycurl

Testing Google’s Recognition Service from a Raspberry Pi

I have the same audio setup as previously described, now allowing me to capture a FLAC encoded test recording like so:
arecord -D plughw:0,0 -f cd -c 1 -t wav -d 0 -q -r 16000 | flac - -s -f --best --sample-rate 16000 -o test.flac
..which does a high quality, wave type recording and pipes it into the flac encoder, which outputs ./test.flac

The following bash script will send the flac encoded voice sound to Google’s recognition service and display the received JSON response:

#!/bin/bash
# parameter 1 : file name, containg flac encoded voiuce recording
 
echo Sending FLAC encoded Sound File to Google:
key='<TRANSLATORPI API KEY>'
url='https://www.google.com/speech-api/v2/recognize?output=json&lang=en-us&key='$key
curl -i -X POST -H "Content-Type: audio/x-flac; rate=16000" --data-binary @$1 $url
echo '..all done'
Speech Recognition via Google Service - JSON Response

Speech Recognition via Google Service – JSON Response

{
  "result": [
    {
      "alternative": [
        {
          "transcript": "today is Sunday",
          "confidence": 0.98650438
        },
        {
          "transcript": "today it's Sunday"
        },
        {
          "transcript": "today is Sundy"
        },
        {
          "transcript": "today it is Sundy"
        },
        {
          "transcript": "today he is Sundy"
        }
      ],
      "final": true
    }
  ],
  "result_index": 0
}

More details about accessing the Google Speech API can be found here: https://github.com/gillesdemey/google-speech-v2

Building a Translator

Encoding doesn’t take long and the Google Speech Recognizer is the fastest in the industry, i.e. the transcription is available swiftly and we can send it for translation to yet another web service.

Microsoft Azure Marketplace

Creating an account at the Azure Marketplace is a little easier and the My Data section shows that I have subscribed to the free translation service, providing me with 2,000,000 Characters/month. Again, I named my project TranslatorPi. On the ‘Developers‘ page, under ‘Registered Applications‘, take note of the Client ID and Client secret, both are required for the next step.
regapp1

regapp2

Strategy

With the speech recognition from Google and text translation from Microsoft, the strategy to build the translator looks like this:

  • Record voice sound, FLAC encode it, and send it to Google for transcription
  • Use Google’s Speech Synthesizer and synthesize the recognized utterance.
  • Use Microsoft’s translation service to translate the transcription into the target language.
  • Use Google’s Speech Synthesizer again, to synthesize the translation in the target language.

For my taste, that’s a little too much for a shell script and I use the following Python program instead:

# -*- coding: utf-8 -*-
import json
import requests
import urllib
import subprocess
import argparse
import pycurl
import StringIO
import os.path
 
 
def speak_text(language, phrase):
    tts_url = "http://translate.google.com/translate_tts?tl=" + language + "&q=" + phrase
    subprocess.call(["mplayer", tts_url], shell=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
 
def transcribe():
    key = '[Google API Key]'
    stt_url = 'https://www.google.com/speech-api/v2/recognize?output=json&lang=en-us&key=' + key
    filename = 'test.flac'
    print "listening .."
    os.system(
        'arecord -D plughw:0,0 -f cd -c 1 -t wav -d 0 -q -r 16000 -d 3 | flac - -s -f --best --sample-rate 16000 -o ' + filename)
    print "interpreting .."
    # send the file to google speech api
    c = pycurl.Curl()
    c.setopt(pycurl.VERBOSE, 0)
    c.setopt(pycurl.URL, stt_url)
    fout = StringIO.StringIO()
    c.setopt(pycurl.WRITEFUNCTION, fout.write)
 
    c.setopt(pycurl.POST, 1)
    c.setopt(pycurl.HTTPHEADER, ['Content-Type: audio/x-flac; rate=16000'])
 
    file_size = os.path.getsize(filename)
    c.setopt(pycurl.POSTFIELDSIZE, file_size)
    fin = open(filename, 'rb')
    c.setopt(pycurl.READFUNCTION, fin.read)
    c.perform()
 
    response_data = fout.getvalue()
 
    start_loc = response_data.find("transcript")
    temp_str = response_data[start_loc + 13:]
    end_loc = temp_str.find(""")
    final_result = temp_str[:end_loc]
    c.close()
    return final_result
 
 
class Translator(object):
    oauth_url = 'https://datamarket.accesscontrol.windows.net/v2/OAuth2-13'
    translation_url = 'http://api.microsofttranslator.com/V2/Ajax.svc/Translate?'
 
    def __init__(self):
        oauth_args = {
            'client_id': 'TranslatorPI',
            'client_secret': '[Microsoft Client Secret]',
            'scope': 'http://api.microsofttranslator.com',
            'grant_type': 'client_credentials'
        }
        oauth_junk = json.loads(requests.post(Translator.oauth_url, data=urllib.urlencode(oauth_args)).content)
        self.headers = {'Authorization': 'Bearer ' + oauth_junk['access_token']}
 
    def translate(self, origin_language, destination_language, text):
        german_umlauts = {
            0xe4: u'ae',
            ord(u'ö'): u'oe',
            ord(u'ü'): u'ue',
            ord(u'ß'): None,
        }
 
        translation_args = {
            'text': text,
            'to': destination_language,
            'from': origin_language
        }
        translation_result = requests.get(Translator.translation_url + urllib.urlencode(translation_args),
                                          headers=self.headers)
        translation = translation_result.text[2:-1]
        if destination_language == 'DE':
            translation = translation.translate(german_umlauts)
        print "Translation: ", translation
        speak_text(origin_language, 'Translating ' + text)
        speak_text(destination_language, translation)
 
 
if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Raspberry Pi - Translator.')
    parser.add_argument('-o', '--origin_language', help='Origin Language', required=True)
    parser.add_argument('-d', '--destination_language', help='Destination Language', required=True)
    args = parser.parse_args()
    while True:
        Translator().translate(args.origin_language, args.destination_language, transcribe())

Testing the $35 Universal Translator

So here are a few test sentences for our translator app, using English to Spanish or English to German:

  • How are you today?
  • What would you recommend on this menu?
  • Where is the nearest train station?
  • Thanks for listening.

Live Demo

This video shows the Raspberry Pi running the translator, using web services from Google and Microsoft for speech recognition, speech synthesis, and translation.


Embedded Scripting (Lua, Espruino, Micro Python)

via OSHUG

The thirty-second OSHUG meeting will take a look at the use of scripting languages with deeply embedded computing platforms, which have much more constrained resources than the platforms which were originally targeted by the languages.

Programming a microcontroller with Lua

eLua is a full version of the Lua programming language for microcontrollers, running on bare metal. Lua provides a modern high level dynamicaly typed language, with first class functions, coroutines and an API for interacting with C code, and yet which is very small and can run in a memory constrained environment. This talk will cover the Lua language and microcontroller environment, and show it running on-off-the-shelf ARM Cortex boards as well as the Mizar32, an open hardware design built especially for eLua.

Justin Cormack is a software developer based in London. He previously worked at a startup that built LED displays and retains a fondness for hardware. He organizes the London Lua User Group, which hosts talks on the Lua programming language.

Bringing JavaScript to Microcontrollers

This talk will discuss the benefits and challenges of running a modern scripting language on microcontrollers with extremely limited resources. In particular we will take a look at the Espruino JavaScript interpreter and how it addresses these challenges and manages to run in less than 8kB of RAM.

Gordon Williams has developed software for companies such as Altera, Nokia, Microsoft and Lloyds Register, but has been working on the Espruino JavaScript interpreter for the last 18 months. In his free time he enjoys making things - from little gadgets to whole cars.

Micro Python — Python for microcontrollers

Microcontrollers have recently become powerful enough to host high-level scripting languages and run meaningful programs written in them. In this talk we will explore the software and hardware of the Micro Python project, an open source implementation of Python 3 which aims to be as compatible as possible with CPython, whilst still fitting within the RAM and ROM constraints of a microcontroller. Many tricks are employed to put as much as possible within ROM, and to use the least RAM and minimal heap allocations as is feasible. The project was successfully funded via a Kickstarter campaign at the end of 2013, and the hardware is currently being manufactured at Jaltek Systems UK.

Damien George is a theoretical physicist who likes to write compilers and build robots in his spare time.

Note: Please aim to by 18:15 as the first talk will start at 18:30 prompt.

Sponsored by: