Google's speech-to-text service

Reposted from:

http://mikepultz.com/2011/03/accessing-google-speech-api-chrome-11/

Just yesterday, Google pushed version 11 of their Chrome browser into beta, and along with it came one really interesting new feature: support for the HTML5 speech input API. This means that you’ll be able to talk to your computer, and Chrome will be able to interpret it. This feature has been available for a while on Android devices, so many of you will already be used to it and will welcome the new feature.

If you’re running Chrome version 11, you can test out the new speech capabilities by going to their simple test page on the html5rocks.com site:

http://slides.html5rocks.com/#speech-input

Genius! But how does it work? I started digging around in the Chromium source code to find out whether the speech recognition is implemented as a library built into Chrome, or whether it sends the audio back to Google to process. I know I’ve seen the Sphinx libraries in the Android build, but I was sure the latter was the case: the speech recognition was really good, and that’s really hard to do without really good language models, which are not something you’d be able to build into a browser.

I found the files I was looking for in the chromium source repo:

http://src.chromium.org/viewvc/chrome/trunk/src/content/browser/speech/

It looks like the audio is collected from the mic and then passed via an HTTPS POST to a Google web service, which responds with a JSON object containing the results. Looking through their audio encoder code, it looks like the audio can be either FLAC or Speex, but the Speex seems to be some sort of specially modified version. I’m not sure what it is, but it just didn’t look quite right.

If that’s the case, there should be no reason why I can’t just POST something to it myself?

The URL listed in speech_recognition_request.cc is:

https://www.google.com/speech-api/v1/recognize

So a quick few lines of Perl (or PHP, or just use wget on the command line):

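#!/usr/bin/perl

use strict;
use warnings;
use LWP::UserAgent;

my $url = "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US";

# Read the FLAC file named on the command line, in binary mode.
my $file = shift @ARGV or die "usage: $0 file.flac\n";
open(my $fh, "<", $file) or die "cannot open $file: $!\n";
binmode($fh);
my $audio = do { local $/; <$fh> };
close($fh);

my $ua = LWP::UserAgent->new;

# POST the raw audio; the rate in the Content-Type must match the file's sample rate.
my $response = $ua->post($url, Content_Type => "audio/x-flac; rate=16000", Content => $audio);
if ($response->is_success)
{
    print $response->content;
}
else
{
    # Report the HTTP status instead of failing silently.
    print STDERR $response->status_line, "\n";
}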

This quick Perl script uses LWP::UserAgent to POST the binary audio from my audio clip. I recorded a quick WAV file and then converted it to FLAC on the command line (see SoX for more info).

To run it, just do:

[root@prague mike]# ./speech i_like_pickles.flac
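
For readers who prefer Python, the same request can be sketched with the standard library alone. This is only a rough equivalent of the Perl script above, not something from the original post: the names `SPEECH_URL`, `build_request` and `recognize` are made up for illustration, and it assumes the endpoint still behaves as described here.

```python
import urllib.request

# Endpoint and query parameters exactly as given in the post.
SPEECH_URL = ("https://www.google.com/speech-api/v1/recognize"
              "?xjerr=1&client=chromium&lang=en-US")


def build_request(audio: bytes, rate: int = 16000) -> urllib.request.Request:
    """Build (but do not send) the POST request for a FLAC clip.

    The rate in the Content-Type header has to match the sample rate
    of the FLAC data being sent.
    """
    headers = {"Content-Type": f"audio/x-flac; rate={rate}"}
    return urllib.request.Request(SPEECH_URL, data=audio,
                                  headers=headers, method="POST")


def recognize(path: str, rate: int = 16000) -> str:
    """Hypothetical helper: POST a FLAC file and return the raw response body."""
    with open(path, "rb") as f:  # binary mode, the audio is not text
        audio = f.read()
    with urllib.request.urlopen(build_request(audio, rate)) as resp:
        return resp.read().decode("utf-8")
```

Calling `recognize("i_like_pickles.flac")` would send the clip. Keeping `build_request` separate makes it easy to inspect the headers without touching the network.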

The response is pretty straightforward JSON:

{
    "status": 0,
    "id": "b3447b5d98c5653e0067f35b32c0a8ca-1",
    "hypotheses": [
    {
        "utterance": "i like pickles",
        "confidence": 0.9012539
    },
    {
        "utterance": "i like pickle"
    }]
}
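
Picking the result out of that JSON takes only a few lines. The sketch below is in Python and works on the sample response shown above; it assumes that a `status` of 0 means success and that the hypotheses are listed best first, neither of which is documented anywhere I know of. The function name `best_hypothesis` is hypothetical.

```python
import json

# The sample response from the post, used here as test input.
SAMPLE = """
{
    "status": 0,
    "id": "b3447b5d98c5653e0067f35b32c0a8ca-1",
    "hypotheses": [
    {
        "utterance": "i like pickles",
        "confidence": 0.9012539
    },
    {
        "utterance": "i like pickle"
    }]
}
"""


def best_hypothesis(body: str):
    """Return (utterance, confidence) for the first hypothesis, or None.

    Assumption: status 0 means success and the first hypothesis is the
    best one. Confidence may be missing, as in the second sample entry.
    """
    data = json.loads(body)
    if data.get("status") != 0:
        return None
    hypotheses = data.get("hypotheses") or []
    if not hypotheses:
        return None
    top = hypotheses[0]
    return top["utterance"], top.get("confidence")


print(best_hypothesis(SAMPLE))  # ('i like pickles', 0.9012539)
```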

I’m not sure if Google intends this to be a public, usable web service API, but it works, and it has all sorts of possibilities!

Comments (121), Trackbacks (17)

  1. Hey Sushant,

    I’m not sure what you mean by recording it and sending it to PHP? If you just mean converting to flac, you should be able to find what you need here:

    http://flac.sourceforge.net/download.html

    I’m not a flac expert or anything- I was just using the command line flac utility for Linux.

    Mike

  2. Hey Mike,
    I meant converting a WAV file to FLAC using PHP.
    Is there a script for that?

    If you are on Skype, let me know and we can discuss it.

    Sushant

  4. Hey Sushant,

    I don’t know too much about FLAC- but that URL I sent you (http://flac.sourceforge.net/download.html) should have some details.

    Cheers,

    Mike

  5. Are there any specific options for the FLAC files? I’m converting from WAV and getting only three words back on a 50-word voicemail, and they are not the correct words either. The sample is a 16-bit mono 8000 Hz WAV. The FLAC plays and sounds good.

  6. The only thing to make sure of is that, if you’re sending 8 kHz audio, you’re setting the Content-Type correctly:

    Content-Type: audio/x-flac; rate=8000

    I’m also not sure how long an audio clip you can send it. I’ve noticed some errors when I tried to send too much.

    Mike

  7. How could this be done in PHP? It would be really appreciated if you could show me a snippet. Thank you.

  8. Read the comments- there’s a full working PHP example listed.

  9. Hi Mike,
    Thanks for your response. Here is my command:
    wget --post-file="test.flac" --header= "Content-Type: audio/x-flac; rate=16000" --output-file="result.txt"

    In my result.txt I am getting the following:

    --2012-05-07 19:11:54--
    ftp://content-type/%20audio/x-flac;%20rate=16000

    => `x-flac'
    Resolving content-type (content-type)... failed: Name or service not known.
    wget: unable to resolve host address `content-type'
    Do you have any idea why this is happening?

    Is it possible that Google has restricted it now?

  10. I assume you want -O result.txt, not --output-file (which puts the logs, not the content, in result.txt).

    Also, did you actually include the URL to post to?

    This works fine for me:

    wget --post-file=filename.flac --header="Content-Type: audio/x-flac; rate=16000" -O result.txt "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US"

    Note the quotes around the header and URL.

    Mike

  11. It looks like my encoding was not correct. I changed the Content-Type to audio/flac instead of audio/x-flac, and now I am getting a 400 “Unknown encoding” error. Thanks anyway.

  12. Hi there,

    Just wondering: following your process will let me access the Google Speech API from any browser, right? Or at least modern browsers. I guess that’s the point of the post, to bypass the need for Chrome. Just making sure before I dive into coding everything.

    Thanks,
    ak

  13. It will let you access the Google Speech service via a simple web request, so from anything that can POST an audio file to Google (from the command line or from server-side languages like PHP, Perl, etc.).

    Mike

  14. What is the limit in terms of file size or minutes? Any ideas?

  15. Hey Mike,

    I am working on an audio transcription website. I want to run the Google API using cron on a Linux machine. Is it possible to run your PHP in the backend?

  16. I don’t see any reason why not.

    Check out the php site for details on PHP from the command line:

    http://ca3.php.net/manual/en/features.commandline.php

    Mike

  17. I have tested the Google API numerous times, and it seems that it only works for short audio files of less than 15 seconds, like in Mike’s example. I am not sure what the exact limits on audio file size are. I tried digging into the Chromium source code as well and could not tell. But if you try posting audio that is 15 seconds or longer, the service will return an “Entity too large” response.

  18. Hi Al,

    That was my experience as well- I’m not sure what the exact length is, but it makes sense as it’s meant for short in-browser commands.

    Mike

  19. This is awesome, thank you for sharing the knowledge! This might be perfect for a smart house system (where you shout ‘LET ME OUT! FIRE! FIRE!’ to open the door :) )

  20. Hi Mike

    Great work, and many thanks! I was trying to find out how to record using the speech API, but now I can use this technique instead and send my recorded voice to the server.

    I have an issue with the format, though. I recorded “1”, and the speech API in my browser returns “1”. But when I encode it to FLAC and try it, it returns “new york” with a confidence of 60%. I have posted the SoX output for my recording. Kindly let me know the formats you used.

    prompt3.flac:

    File Size: 29.7k Bit Rate: 418k
    Encoding: FLAC Info: Processed by SoX
    Channels: 2 @ 16-bit
    Samplerate: 44100Hz
    Replaygain: off
    Duration: 00:00:00.57

    In:100% 00:00:00.57 [00:00:00.00] Out:25.1k [ | ] Hd:3.6 Clip:0
    Done.

  21. Hey, it’s working now. I changed the bitrate as mentioned in the comments. Thanks again, Mike!
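
Several of the comments above come down to the same two problems: the rate in the Content-Type not matching the audio, and clips being too long. A quick check on the WAV file before converting it to FLAC can catch both. The sketch below is in Python and uses only the standard `wave` module. The `inspect_wav` name is hypothetical, the 15-second limit is only the rough figure reported in the comments (not a documented value), and the suggested Content-Type is valid only if the FLAC conversion keeps the same sample rate.

```python
import wave

# Assumption: rough limit reported by commenters, not a documented value.
MAX_SECONDS = 15.0


def inspect_wav(path: str) -> dict:
    """Report the properties of a WAV file that matter for the request."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        channels = w.getnchannels()
        seconds = w.getnframes() / float(rate)
    return {
        "rate": rate,
        "channels": channels,
        "seconds": seconds,
        # Header to send if the FLAC conversion keeps this sample rate.
        "content_type": f"audio/x-flac; rate={rate}",
        "too_long": seconds > MAX_SECONDS,
    }
```

For the recording described in comment 20, this would report 2 channels at 44100 Hz, which is a hint that the default `rate=16000` header from the post would not match the file.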

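All rights reserved; reposting is prohibited. To repost, please first obtain the blogger's consent and cite the source of the article; otherwise it will be treated as infringement.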
