
SpeakUp: A Transcript Markup Language

What is SpeakUp?

SpeakUp is a simple text markup language for transcripts of moving pictures or video. It also includes markup for annotation.

Overview

When the Folkstreams project required a way for filmmakers and academic contributors to create and maintain transcripts for films archived and presented through the Folkstreams website, I decided a simple text markup language would be the best way to store and edit transcripts.

A transcript markup language defines a series of conventions for formatting text (like wiki text) that is translated into HTML for display. SpeakUp was designed to capture as much content as possible and to preserve its meaning, so that transcripts can later be converted into XML or database form.
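To make the idea of text conventions translated into HTML concrete, here is a minimal sketch in Python. It does not use SpeakUp's actual syntax or the PEAR Text_Wiki API: the two conventions (a "NAME:" prefix for a speaker and a "[note: ...]" line for an annotation), the function name transcript_to_html and the HTML class names are all assumptions invented for illustration.

```python
import html
import re

# Hypothetical conventions, invented for this sketch (NOT SpeakUp's real syntax):
#   "NAME: words"   -> a line of dialog attributed to a speaker
#   "[note: words]" -> an annotation attached to the transcript
SPEAKER_LINE = re.compile(r"^([A-Z][A-Z .'-]*):\s*(.+)$")
NOTE_LINE = re.compile(r"^\[note:\s*(.+)\]$")


def transcript_to_html(text):
    """Translate a plain-text transcript into HTML, one line at a time."""
    out = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines only separate entries
        note = NOTE_LINE.match(line)
        speaker = SPEAKER_LINE.match(line)
        if note:
            # Annotations keep their own element so they stay distinguishable.
            out.append('<aside class="note">%s</aside>' % html.escape(note.group(1)))
        elif speaker:
            name, words = speaker.groups()
            out.append('<p><b class="speaker">%s</b> %s</p>'
                       % (html.escape(name), html.escape(words)))
        else:
            # Anything else is treated as plain narration.
            out.append("<p>%s</p>" % html.escape(line))
    return "\n".join(out)


sample = "INTERVIEWER: Where did you learn that song?\n[note: The song is not named here.]"
print(transcript_to_html(sample))
```

Because speakers and notes are recognized as distinct kinds of lines rather than just styled text, the same parse could in principle feed an XML or database export instead of HTML.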

SpeakUp is implemented as a module extending the PEAR Text_Wiki text translation library, which is required to use it.

Although development and documentation of SpeakUp are not complete, it is in use on the Folkstreams website.

SpeakUp, including all markup, code and documentation, is open source and released under a GPL license. I apologize for the brevity of this document, but the best way to learn SpeakUp is to download the package and experiment with it.

Some Background

Here is some background on why transcripts are important. As the Folkstreams project was developed, project director Tom Davenport and developer Steve Knoblock arrived, through a series of discussions, at the conclusion that transcripts are essential to searching, finding and understanding films online. Two points emerged: that transcripts are a rich source of indexable text that helps make media searchable and, more importantly, that transcripts are a rich source of conversation and debate.

Frequently, notes are more informative and interesting than the work they annotate. We discovered this was true for film transcripts (see Sadobabies for an example of a conversation going on in the notes about the nature of folklore). Although there are sophisticated means to capture the dialog of a moving picture and render it to text, these transcripts are inadequate. They lack annotation. They lack the expressive quality of a transcript edited by a knowledgeable person. They are, in a sense, a travesty, like an OCR'ed copy of Dickens left uncorrected.
