Auto-merge auto-generated VTT from Udemy with manual transcript


Hi!

 

I have a transcript that I read from during recording. Udemy later generates a VTT file, which ideally should be the transcript again with timings added. However, the course uses many technical terms that the VTT generator does not know, so it garbles almost every generated sentence. In addition, I am not a native English speaker.

 

Therefore I came up with a tool that merges the VTT timings with the transcript, using a kind of fuzzy matching to find the best-matching parts. For instance, take this content of the VTT generated by Udemy:

 

03:42.650 --> 03:50.570
And you may need an original, preferably the discontinued Dumela, not on the right or the Autoline

 

03:51.080 --> 03:51.710
on the left.

 

03:53.000 --> 04:00.080
If you go to a store, watch out that the MCU that is definitely featuring HP, which is located here.

 

04:01.110 --> 04:04.380
Is not Soyland to the board and can be removed.

 

As you can see, it is a mess, and I am sure it will not help anyone understand the content. The tool searches the transcript for matching sentences, finds them in this paragraph

 

Slide 6
And you will need an Arduino. Preferably the discontinued Duemilanove on the right or the Arduino Uno on the left. If you go to a computer store, watch out that the MCU, that big black processor in the middle is like that and not soldered to the board. Otherwise you will need some separate ATmega328Ps. In this course, you will need at least 2 of those Arduinos, better would be 3.

 

and converts this mess into

 

00:03:42.650 --> 00:03:50.570
And you will need an Arduino. Preferably the discontinued Duemilanove on the right or the Arduino Uno

 

00:03:51.080 --> 00:03:51.710
on the left.

 

00:03:53.000 --> 00:04:00.080
If you go to a computer store, watch out that the MCU, that big black processor in the middle is like that

 

00:04:01.110 --> 00:04:04.380
and not soldered to the board.
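
For readers wondering what such a merge could look like, here is a minimal sketch. It is not the script described in this post; it only assumes Python's standard `difflib` module and a greedy, in-order search. The names `norm`, `with_hours`, `merge`, `slack` and `max_skip` are made up for illustration, and the cue texts in the usage part are shortened stand-ins rather than real Udemy output. For each cue it tries a few nearby word spans of the transcript and keeps the one most similar to the garbled cue text.

```python
import re
from difflib import SequenceMatcher


def norm(text):
    """Lowercase and strip punctuation so only the words are compared."""
    return " ".join(re.sub(r"[^a-z0-9]", "", w) for w in text.lower().split())


def with_hours(timestamp):
    """Turn 'MM:SS.mmm' into 'HH:MM:SS.mmm'; leave full timestamps alone."""
    return timestamp if timestamp.count(":") == 2 else "00:" + timestamp


def merge(cues, transcript, slack=2, max_skip=2):
    """Replace each cue text with the best-matching span of the transcript.

    cues: list of (start, end, text) tuples in playback order.
    slack: how many words shorter or longer than the cue a span may be.
    max_skip: how many transcript words may be skipped before a span.
    Both limits are arbitrary assumptions of this sketch.
    """
    words = transcript.split()
    pos = 0  # index of the first transcript word not yet assigned
    merged = []
    for start, end, text in cues:
        target = norm(text)
        n = len(text.split())
        best = (-1.0, pos, pos)  # (score, begin, stop)
        for skip in range(max_skip + 1):
            begin = pos + skip
            for length in range(max(1, n - slack), n + slack + 1):
                stop = min(begin + length, len(words))
                candidate = norm(" ".join(words[begin:stop]))
                score = SequenceMatcher(None, target, candidate).ratio()
                if score > best[0]:
                    best = (score, begin, stop)
        _, begin, stop = best
        merged.append(
            (with_hours(start), with_hours(end), " ".join(words[begin:stop]))
        )
        pos = stop
    return merged


# Shortened, illustrative input (not real Udemy output).
cues = [
    ("03:42.650", "03:50.570", "you will need an original"),
    ("03:51.080", "03:51.710", "on the left"),
]
transcript = "Slide 6 You will need an Arduino on the left."
for start, end, text in merge(cues, transcript):
    print(f"{start} --> {end}\n{text}\n")
```

A greedy search like this is cheap but fragile: once one cue is matched to the wrong span, every later cue starts from the wrong position, and passages in the transcript longer than `max_skip` words that were never spoken will throw it off. A more robust tool would probably align the whole word sequence at once instead of cue by cue.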

 

It is not perfect yet and is quite slow (it takes about an hour for 20 hours of lecture material on machines with 128 CPUs in total). My question: do you know of a tool that does the same thing, but faster and with higher accuracy?

 

Thanks,

Alex

2 Replies

Hey, I don't know of one yet. Tell us which tool you are using. I am okay with slow speed.

Hey HSE-Trainer,

 

I wrote it myself. Currently it is a small Python script that I threw together. I did a quick search, could not find anything, and just wrote it. However, I might have overlooked something in my search, so I wanted to ask whether a tool like that already exists.

 

If you are interested in using it, I have to clean up the source first. 🙂

 

Alex
