Microsoft Azure provides an excellent AI speech service, but the way it organizes its documentation and SDK drags me back to the bad memories of reading the Microsoft Visual C++ MSDN docs for Windows 98. Twenty years later, Microsoft is still producing unclear tutorials, and its API still does not return the error code you would expect.

Install Python Package for Azure Speech

If you have worked with OpenAI or similar AI service providers, you expect a simple pip installation. It should be as simple as a one-line shell command.

pip install azure-cognitiveservices-speech

First, I have to complain about the long names they use, which make busy programmers’ heads want to explode. In Python, the recommended import looks like this.

import azure.cognitiveservices.speech as speechsdk

Hey, on top of the long pip package name, the import path is just as lengthy and does not even match it: hyphens in one, dots in the other. While I understand that Microsoft Azure is a massive service provider, I’m not convinced that this justifies such complex naming conventions. I would prefer something more intuitive and closer to common practice.
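Because the pip name and the import name differ, it is easy to lose track of which one failed when something goes wrong. Here is a minimal sketch of a sanity check that looks at both. The helper name check_speech_sdk is my own, not part of the SDK, and catching OSError is an assumption on my part that a missing native library could surface that way.

```python
import importlib
from importlib import metadata

PIP_NAME = "azure-cognitiveservices-speech"     # name you give to pip
IMPORT_NAME = "azure.cognitiveservices.speech"  # name you give to import


def check_speech_sdk():
    """Return (installed_version, importable) for the Speech SDK.

    installed_version is None when pip has no record of the package.
    importable is False when the import itself fails.
    """
    try:
        version = metadata.version(PIP_NAME)
    except metadata.PackageNotFoundError:
        version = None

    try:
        importlib.import_module(IMPORT_NAME)
        importable = True
    except (ImportError, OSError):
        # OSError is caught as an assumption: a native library that
        # cannot be loaded may fail this way instead of ImportError.
        importable = False

    return version, importable


if __name__ == "__main__":
    version, importable = check_speech_sdk()
    print(f"pip package {PIP_NAME}: {version or 'not installed'}")
    print(f"import {IMPORT_NAME}: {'ok' if importable else 'failed'}")
```

If pip reports a version but the import fails, the problem is probably not the pip installation itself.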

Everything runs perfectly, and the pip installation does not report any errors. You think you are ready to call the API in a couple of lines of code, until you run into the weirdest problem you have seen this week, at least.

What the heck is a Subscription?

You try hard to find a simple hello-world Python snippet that calls speech-to-text. It looks something like the following.

import azure.cognitiveservices.speech as speechsdk

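# Azure calls the API key a "subscription"; fill in your own key and region.
speech_cfg = speechsdk.SpeechConfig(subscription='your_subscription_key', region='your_region')
audio_cfg = speechsdk.audio.AudioConfig(filename="helloworld.wav")
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_cfg, audio_config=audio_cfg)

# Run a single recognition pass over the audio file and wait for the result.
speech_recognition_result = speech_recognizer.recognize_once()
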
if speech_recognition_result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Recognized: {}".format(speech_recognition_result.text))
elif speech_recognition_result.reason == speechsdk.ResultReason.NoMatch:
    print("No speech could be recognized: {}".format(speech_recognition_result.no_match_details))
elif speech_recognition_result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = speech_recognition_result.cancellation_details
    print("Speech Recognition canceled: {}".format(cancellation_details.reason))
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        print("Error details: {}".format(cancellation_details.error_details))
        print("Did you set the speech resource key and region values?")

Every other API calls an API key a key, but Azure calls it a subscription. Okay, you decide to live with it and supply the key you got from the Azure website, until you hit this.

What else needs to be done?

It turns out that Azure Speech depends on many other libraries, which is another Microsoft tradition.
