Synchronizing text and video

After visiting the land of transcription as my first stop in the world of Web video, my next logical step was to look into how this wonderful transcription of my video could actually be shown along with the video.

Transcriber, the tool I used to create the video's captions, saves the transcription in its own XML format:

<Trans scribe="Dominique Hazael-Massieux" 
  audio_filename="parisweb" version="5" version_date="090210" xml:lang="fr">
  <Speakers>
    <Speaker id="spk1" name="Stéphane Deschamps" 
      check="no" type="male" dialect="native" accent="French" scope="local"/>
  </Speakers>
  <Episode program="ParisWeb 2007 - Les Bonnes Pratiques du Web Mobile" 
    air_date="2007-11-16">
    <Section type="report" startTime="0" endTime="44.209">
      <Turn startTime="0" endTime="19.933" speaker="spk1" mode="planned">
        <Sync time="0"/>
        Y'a quelque chose auquel on croit beaucoup à ParisWeb,
        <Sync time="3.458"/>
        c'est "les standards, c'est bon, mangez-en",
        <Sync time="6.553"/>
        c'est pour ça que cette association existe
      </Turn>
    </Section>
  </Episode>
</Trans>

Transcriber can export to a variety of other formats (including HTML), but for the sake of exploring one of the technologies under development at W3C for that precise use case, Timed Text DFXP, I started looking into transforming its XML format into Timed Text.

Another motivation was that Subtitle Editor, the other tool I had looked at, can import and export Timed Text data; this meant that the very same tool would let me quickly visualize the subtitles superimposed on the video, one of its advantages over Transcriber.

It turned out (unsurprisingly, I suppose) that converting between the two formats was quite easy with an XSLT style sheet; the main structural difference is that Transcriber marks break points with empty XML elements (<Sync> in the example above), while Timed Text wraps the transcribed content in elements (<span> or <p>).
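The XSLT style sheet itself isn't shown here. As a rough illustration of the same mapping, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree rather than XSLT: each <Sync/> break point opens a segment whose text is the element's tail, and that segment ends at the next <Sync/> or, for the last one, at the enclosing <Turn>'s endTime. The function name sync_points_to_timed_text is hypothetical; the output namespace is assumed to be the one from the later TTML 1.0 Recommendation (DFXP drafts of that period used a different URI); and other inline Transcriber elements, such as events or comments, are simply ignored.

```python
import xml.etree.ElementTree as ET

# Assumption: TTML 1.0 namespace; earlier DFXP drafts used a different URI.
TT_NS = "http://www.w3.org/ns/ttml"


def sync_points_to_timed_text(trs_xml: str) -> str:
    """Hypothetical helper: turn Transcriber <Sync/> break points into Timed Text <p> cues."""
    ET.register_namespace("", TT_NS)  # serialize TT elements with a default namespace
    source = ET.fromstring(trs_xml)
    tt = ET.Element(f"{{{TT_NS}}}tt")
    body = ET.SubElement(tt, f"{{{TT_NS}}}body")
    div = ET.SubElement(body, f"{{{TT_NS}}}div")

    for turn in source.iter("Turn"):
        syncs = turn.findall("Sync")
        for i, sync in enumerate(syncs):
            # The spoken text follows each <Sync/>, so ElementTree stores it in the tail.
            text = " ".join((sync.tail or "").split())
            if not text:
                continue
            begin = sync.get("time")
            # A segment ends at the next break point, or at the end of the turn.
            end = syncs[i + 1].get("time") if i + 1 < len(syncs) else turn.get("endTime")
            p = ET.SubElement(div, f"{{{TT_NS}}}p", begin=f"{begin}s", end=f"{end}s")
            p.text = text

    return ET.tostring(tt, encoding="unicode")
```

For the <Turn> in the excerpt above, this yields three <p> elements, timed 0s to 3.458s, 3.458s to 6.553s, and 6.553s to 19.933s. Wrapping content in elements is what turns Transcriber's point-in-time markers into the explicit begin/end intervals that Timed Text expects.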

So, now that I had a Timed Text version of my transcription, how did that help me put the transcribed video on the Web?

A quick look around the Web suggests that some video hosting services, including dotSub and Dailymotion but not (I think) YouTube, allow publishers to upload subtitles with their videos; as I have since verified, dotSub even supports importing and exporting subtitles in Timed Text format.

But I was curious to know how to include these subtitles when self-hosting a video; I had little hope of finding subtitle support through the classic <object> tag in HTML, but I hoped that the new <video> element in HTML 5 would help solve that problem.

Unfortunately, it doesn’t do so out of the box, as of the draft dated February 12:

[…] authors are expected to provide alternative media streams and/or to embed accessibility aids (such as caption or subtitle tracks) into their media streams.

That certainly seemed extremely suboptimal to me: having to download a whole video just to access its transcript doesn’t sound like a good use of anyone’s bandwidth. Discussions on fixing this aspect of the HTML 5 spec have apparently started, and they brought to my attention the work my colleague Philippe had begun on a JavaScript-based Timed Text player for HTML 5.

This was exactly what I needed, so I started playing with that code to embed my video's subtitles in an HTML page.
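Philippe's player code isn't reproduced here, and its internal structure may well differ. As a hedged illustration of the core step any such player needs, the sketch below (in Python for brevity, not browser JavaScript) answers one question: given the current playback time, which caption is active? The Cue and CueTrack names are hypothetical, and the sketch assumes the cues do not overlap and have already been read from the Timed Text begin/end attributes into seconds.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Cue:
    begin: float  # seconds
    end: float    # seconds (exclusive)
    text: str


class CueTrack:
    """Hypothetical helper: finds the caption active at a given playback time."""

    def __init__(self, cues):
        self.cues = sorted(cues, key=lambda c: c.begin)
        self.begins = [c.begin for c in self.cues]

    def active_text(self, t: float) -> str:
        # Binary search for the last cue starting at or before t (assumes no overlaps).
        i = bisect.bisect_right(self.begins, t) - 1
        if i >= 0 and t < self.cues[i].end:
            return self.cues[i].text
        return ""  # no caption at this moment
```

In a browser, the equivalent lookup would run in a handler for the <video> element's timeupdate event, reading its currentTime property and writing the returned text into an element positioned over the video. Binary search keeps each lookup logarithmic in the number of cues, so the player does not rescan the whole transcript every time the event fires.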

And this is what got me looking into why the new <video> element in HTML 5 is actually a game changer rather than just a nice wrapper around the existing functionality of <object>, which is what my next blog post will explore.
