How to Automatically Generate Video Sub-Title in Another Language

I recently started a French YouTube channel. Quickly, I got a message asking to add English sub-title, and got also a suggestion to leverage Azure Logic App and some Cognitive Services to help me in that task. I really liked the idea, so I gave it a shot. I recorded myself and in twenty minutes I was done. Even though, it was not the success I was hoping for, the application works perfectly. It's just that speaking in French with a lot of English technical word was a little bite too hard for the Video Indexer. However, If you are speaking only one language in your video that solution would work perfectly. In this post, I will show you how to create that Logic App with Azure Video Indexer and Cognitive Services.

The Idea


Once a video is dropped in an OneDrive folder (or any file system accessible from Azure), a Logic App will get triggered and uploads the file to the Azure Video Indexer, generate a Video Text Tracks (VTT) file, and save this new file in another folder. A second Logic App will get started and use the Translator Text API from Azure Cognitive Service to translate the VTT file, and save it into the final folder.

GenerateSubTitle


The Generation


Before getting started, you will need to create your Video Indexer API. To do this, login to the Video Indexer developer portal, and subscribe at the Video Indexer APIs - Production in the Product tab. You should then get your API keys.


ApiKeyPage

To get more detail on the subscription refer to the documentation. To know the names, parameters, code sample to all the methods available in your new API, click on APIs tab.

APIDetails

Now let's create our first Logic App. I always prefer to start with a blank template, but take what fits you. Any Online file system's trigger will do, in this case I'm using the When a file is created from OneDrive. I got some issue with the trigger. It was not always getting fired by a new file. I tried the When a file is modified trigger, but it didn't solve the problem. If you think, you know what I was doing wrong feel free to leave a comment :).

First reel action is to upload the file to the Azure Video Indexer. We can to that ery easily by using the method Upload video and and index, passing the name and content from the trigger.

Of course, the longer is the video the longer will be the process, so we will need to wait. A way to do that is by adding a waiting loop. Will use the method Get processing state from the Video Indexer and loop until the status is processed. To slow down your loop just add a wait action and set it at tree or five minutes.

When the file is completely processed, it will be time to retrieve the VTT file. This is done in two simple step. First, we will get the URL by calling the method Get the transcript URL, then with a simple HTTP GET we will download the file. The last thing we will need to do will be to save it in a folder where our second Logic App will be watching for new drop.

In the visual designer, the Logic App should look to this.

LogicAppGenerateVTT


The Translation


The second Logic App is very short. Once again, it will get triggered by a new file trigger in our OneDrive Folder. Then it will be time to call our Translator Text API from Azure Cognitive Service. That's to the great Logic App interface it's very intuitive to fill all the parameter for our call. Once we got the translation, we need to save it into our final destination.

The Logic App should look like this.

LogicAppTranslate


Conclusion


It was much easier than I expected. I really like implementing those integration projects with Logic App. It's so easy to "plug" all those APIs together with this interface. And yes like I mentioned in the introduction the result was not "great". I run test with video purely in English (even with my accent) or only in French (no mix) and the result was really good. So I think the problem is really the fact that I mix French and English. I could improve the Indexer by spending time providing files so the service could understand better my "Franglish". However, in twenty minutes, I'm really impressed by the way, in turned out. If you have idea on how to improve this solution, or if you have some questions, feel free to leave a comment. You can also watch my French YouTube video.

All the code is available online on Github - Cloud5VideoHelper.

References: