
SpeakUp: A Transcript Markup Language

What is SpeakUp?

SpeakUp is a simple text markup language for transcripts of film and video, including markup for annotations.

Overview

When the Folkstreams project required a way for filmmakers and academic contributors to create and maintain transcripts for films archived and presented through the Folkstreams website, I decided a simple text markup language would be the best way to store and edit transcripts.

A transcript markup language defines a series of conventions for formatting text (like wiki text) that are translated into HTML for display. SpeakUp was designed to capture as much content as possible and to preserve its meaning for possible later conversion into XML or database form.
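To make the idea of "conventions translated into HTML" concrete, here is a minimal sketch in Python. It is not SpeakUp's implementation (SpeakUp is a PHP module built on Text_Wiki), and the two conventions it recognizes, a `NAME:` prefix for a speaker and `((...))` for an inline note, are invented purely for illustration and are not SpeakUp's actual markup. The function names and CSS class names are likewise hypothetical.

```python
import html
import re

# Hypothetical conventions, NOT SpeakUp's real syntax:
#   "NAME: words"   -> a line of dialogue spoken by NAME
#   "((note text))" -> an inline annotation
SPEAKER_LINE = re.compile(r"^([A-Z][A-Za-z .'-]*):\s+(.*)$")
NOTE = re.compile(r"\(\((.+?)\)\)")


def render_line(line: str) -> str:
    """Translate one transcript line into an HTML paragraph."""
    match = SPEAKER_LINE.match(line)
    if match:
        speaker, speech = match.groups()
    else:
        speaker, speech = None, line
    # Escape first so transcript text cannot inject markup of its own.
    speech = html.escape(speech)
    # Then wrap annotations in a span so they can be styled or extracted.
    speech = NOTE.sub(r'<span class="note">\1</span>', speech)
    if speaker is None:
        return f"<p>{speech}</p>"
    return f'<p><span class="speaker">{html.escape(speaker)}</span> {speech}</p>'


def render(transcript: str) -> str:
    """Translate a whole transcript, skipping blank lines."""
    return "\n".join(
        render_line(line) for line in transcript.splitlines() if line.strip()
    )


if __name__ == "__main__":
    print(render("TOM: Hello ((waves)) there\n\nA line of narration."))
```

Because the speaker and the note keep their own elements rather than being flattened into plain text, the same source could in principle be converted later into XML or database rows, which is the kind of preserved meaning described above.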

SpeakUp is implemented as a module that extends the PEAR Text_Wiki text translation library, so Text_Wiki is required to use it.

Although development and documentation of SpeakUp are not complete, it is in use on the Folkstreams website.

SpeakUp, including all markup, code and documentation, is open source and released under a GPL license. I apologize for the brevity of this document, but the best way to learn SpeakUp is to download the package and experiment with it. Download.

Some Background

Here is some background on why transcripts are important. As the Folkstreams project was developed, project director Tom Davenport and developer Steve Knoblock, in a series of discussions, arrived at the conclusion that transcripts are essential to searching, finding and understanding films online. Two points emerged: transcripts are a rich source of indexable text that helps make media searchable and, more importantly, transcripts are a rich source of conversation and debate.

Frequently, notes are more informative and interesting than the work they annotate. We discovered this was true for film transcripts (see Sadobabies for an example of a conversation going on in the notes about the nature of folklore). Although there are sophisticated means to capture the dialog of a moving picture and render it to text, the resulting transcripts are inadequate. They lack annotation. They lack the expressive quality of a transcript edited by a knowledgeable person. They are, in a sense, a travesty, like an OCR'ed copy of Dickens left uncorrected.
