Closed captioning's role becomes more complex in multiscreen world

by Samantha Bookman

WGBH introduced closed captioning in the 1970s. (Image source: Wikipedia)

Pity the misunderstood, overlooked caption: the text that displays underneath television pictures, white letters framed in black. Pity the subtitles of foreign movies, oft-maligned for their inaccuracy. Getting these captioning functions right is an expensive, time-consuming proposition. At least, that's what broadcasters want consumers to think.

In the wake of FCC rules, passed in 2012, that require television programs and movies to have closed captions even in the online video space--and an added rule this year regarding short clips--broadcasters and online video providers are scrambling to make sure their online catalogs are captioned.

But some, broadcasters in particular, complain that the rules are onerous and will impose tremendous expense on them.

"The time and cost of enabling captions is not substantially less for a 2-minute clip than for a 2-hour full-length movie," the Digital Media Association, a group that represents Amazon (NASDAQ: AMZN), Apple (NASDAQ: AAPL), Microsoft (NASDAQ: MSFT) and YouTube, told The Hill.

Technicolor, Manzanita, Deluxe Media and SDI Media are among caption providers that have been serving film and television studios and distributors for years. The traditional technologies used to place captions correctly on television screens are from a different era and were built for a different way of delivering video. Thus, adding captions in the online video era requires providers to change their workflow and technologies--if they haven't done so already.

Then there's the technical complexity of delivering captions along with the video. The file containing the captions must be encoded into the same format as the video, and each caption has to appear at the correct point in the video.

"In order to hit all the end devices a user might have… each uses a different streaming format," Chris Knowlton, Wowza VP and streaming industry evangelist, told FierceOnlineVideo. "To take in a single stream or piece of content, you have to take and convert it into a format that device can deal with, and display captions, in this case. It's a challenge for a number of folks."

Wowza is noted for its multiscreen streaming capability--converting source video into the various digital formats and compression levels needed to stream simultaneously to a plethora of mobile devices, computers, and TV screens.

Wowza does not create closed captions or subtitles, but does handle conversion.

"There are a whole bunch of different (captioning) standards, which are more de facto industry standards," Knowlton said. "Default TV is CEA 608, which specified in the 1980s how to embed caption data inside a video signal being delivered to a TV--aka, Line 21 data. From there, additional formats were created. For instance, if you're using a version of Flash streaming, folks would embed 'on-text data events' or 'on text caption events.' That is the default way to do it with streaming because Flash was so prevalent in the early 2000s."

The need for established broadcast captioning providers to convert or add digital caption technologies is clear. But pivoting could be difficult, and smaller providers have been waiting in the wings for this moment.

Not only are captioning and subtitling startups offering cost-effective ways to caption online content, they're pointing to the captions file as a potential metadata gold mine.

Two closed-captioning and subtitle providers, Dotsub and 3Play Media, are among the numerous startups making the case for affordable, compliant and monetizable captions.


Dotsub's platform enables closed captions and translated subtitles to be generated--by its professional translators or by a community of employees, partners or fans--for any online video in any language.

Founded in 2007, Dotsub serves enterprises across several verticals, and lists Adobe, Airbnb, Bank of America, RSA and the U.S. Army among its customers.

A "freemium" business model for adding subtitles to IP-based videos is also working well for Dotsub, a New York-based vendor. The company's tiered subscription service lets users caption their own content and translate subtitles at no cost; subscription rates apply at higher levels.

Dotsub doesn't add its captions within online video files; rather, users have the option of choosing to have captions or subtitles display by clicking an appropriate icon on the video player.

"We're talking about video captions as overlays," said Peter Crosby, Chief Revenue Officer at Dotsub. "For example, on YouTube or other video players, if there's a 'CC' button or 'Choose a Language' box, you make an API call to Dotsub or another third party, like Wowza, Brightcove, Kaltura or Ooyala, for the subtitle file. The files are infinitesimal, they're kilobytes in size."

"With many customers we integrate with their players. So when a user makes a call for the video, we stream text subtitle files which hit the player far before the streaming. So there isn't an additional load for captions or subtitles if they're done as overlays."

Wowza's Knowlton explains this type of captioning, using what are called sidecar files.

"You can embed captions directly in the video stream (the traditional way). Or you can get sidecar files that travel independently of audio and video. If you have multiple languages, you only have to send out the file for the specific language," he said.

Using this sidecar method not only helps streamline the video file and playout, it also creates new options for content providers.
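To make the sidecar idea concrete, here is a minimal Python sketch. It assumes the WebVTT text format, which neither Knowlton nor Crosby names, and the cue data, language codes and function names are all hypothetical. The point is only that each language lives in its own small text file, and the player receives just the one the viewer selects.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_sidecar(cues) -> str:
    """Build the text of a WebVTT sidecar file from (start, end, text) cues."""
    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")
    return "\n".join(lines)


# Hypothetical caption tracks, one per language, stored apart from the video.
TRACKS = {
    "en": [(0.0, 2.5, "Welcome back."), (2.5, 5.0, "[door slams]")],
    "es": [(0.0, 2.5, "Bienvenidos de nuevo."), (2.5, 5.0, "[portazo]")],
}


def sidecar_for(language: str) -> str:
    """Return only the sidecar text for the language the viewer selected."""
    return build_sidecar(TRACKS[language])


if __name__ == "__main__":
    print(sidecar_for("es"))
```

Because the caption text sits outside the video, a correction to one language means replacing one small text file rather than re-encoding the video.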

"Netflix and Amazon Instant (NASDAQ: AMZN) were under huge pressure to caption everything. What's happened now is they have all made it to 100 percent and now found huge utility around captions," said Crosby.

Online video providers, for example, can correct or update the captions and subtitles separately when needed. They can also learn how their audiences are viewing their content--if they're selecting specific languages, or how often they request the closed captioning file.

There are only a few good use cases for doing captions the old way, by including them as part of the video file. Security is the biggest reason, Crosby said. "Only in extreme situations, where there's no bandwidth on the user end, or there's a high security requirement like with Boeing where they want that kind of control on captions. But that's the old model, that's dying out." Further, he said, with the "burned-in" caption placed in a specific spot, a user's view of the action on the screen could be obscured if they're using a player the video wasn't originally designed to stream on. The caption is single-language only, and the size of the video file, encoded as MP4, can be onerous.
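A rough calculation illustrates why burned-in captions weigh on storage. The Python sketch below is illustrative only: the 700 MB video size, the 50 KB caption file size and the count of 10 languages are assumptions chosen for the example, not measured figures.

```python
VIDEO_MB = 700     # assumed size of one encoded video file
SIDECAR_KB = 50    # assumed size of one caption file
LANGUAGES = 10     # assumed number of caption languages


def burned_in_storage_mb(video_mb: float, languages: int) -> float:
    """Burned-in captions need a full copy of the video per language."""
    return video_mb * languages


def sidecar_storage_mb(video_mb: float, sidecar_kb: float, languages: int) -> float:
    """Sidecar captions need one video plus one small text file per language."""
    return video_mb + languages * sidecar_kb / 1024


if __name__ == "__main__":
    burned = burned_in_storage_mb(VIDEO_MB, LANGUAGES)
    sidecar = sidecar_storage_mb(VIDEO_MB, SIDECAR_KB, LANGUAGES)
    print(f"burned-in: {burned:.0f} MB, sidecar: {sidecar:.2f} MB")
```

Under these assumptions, ten burned-in language versions take ten times the storage of a single video, while ten sidecar files add less than 1 MB.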

"When you burn them in, the old fashioned way, then you have to replicate each 700 MB or whatever file over and over and over, which makes no sense storage-wise, streaming bandwidth-wise, or cost-wise." 

Using a cloud-based platform allows Dotsub to scale its operations to meet demand from its freemium subscribers and to ensure its enterprise customers and its own staff of professionals are able to easily access its captioning services. It also gives the provider and its customers a lot of flexibility in how they caption files, the file types they use--such as M4V "softsub" files for iOS devices--and how they manage them, like automatically pushing files from Dotsub to YouTube, Crosby noted.

For example, when Adobe needed subtitles in different languages added to 14,000 training videos for its Adobe TV platform, Dotsub's service made the job easier. "Adobe recruited their worldwide developer network to translate these training videos," Crosby said. Adobe "gamified" it, he explained, giving away points for performance.

"When we do the enterprise services, we work only with human beings. Well trained professionals who caption in the FCC-required 508 compliance style. (Which include noting off-screen noises, for example.) Our translators are from around the world; we work with them individually, through agencies, through university pools of talent."

3Play Media

Started by three MIT grad students participating in a project to caption the university's online OpenCourseWare video content, 3Play Media builds captioning from a different angle: speech recognition.

"Media and entertainment companies see captions as clunky," said Josh Miller, VP of Sales and Business Development for 3Play. Captioning for the broadcast and distribution world has long been a manual process. But IP is changing all that, and introducing a measure of automation.

The company's proprietary software is built on a speech recognition engine--similar to software that consumers can buy off-the-shelf to record memos and letters through their PC's microphone. Speech recognition allows 3Play to quickly transcribe spoken words in television shows and movies. The platform time-synchronizes the transcript with the video. From there, 3Play checks the transcript for accuracy and then uses the time-synchronized text to create closed captioning or subtitles, as well as translation when needed.
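The step from a time-synchronized transcript to captions can be sketched in a few lines of Python. This is not 3Play's software, which is proprietary; it is a minimal illustration that assumes a speech recognizer has already produced a start and end time for each word. The `Word` and `Cue` types, the 32-character limit and the one-second pause threshold are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the video
    end: float


@dataclass
class Cue:
    start: float
    end: float
    text: str


def _close(group):
    """Turn a run of timed words into one caption cue."""
    return Cue(group[0].start, group[-1].end, " ".join(w.text for w in group))


def words_to_cues(words, max_chars=32, max_gap=1.0):
    """Group timed words into caption cues.

    A new cue starts when adding the next word would exceed max_chars,
    or when the pause before that word is longer than max_gap seconds.
    """
    cues = []
    current = []
    for word in words:
        if current:
            candidate = " ".join(w.text for w in current) + " " + word.text
            gap = word.start - current[-1].end
            if len(candidate) > max_chars or gap > max_gap:
                cues.append(_close(current))
                current = []
        current.append(word)
    if current:
        cues.append(_close(current))
    return cues


if __name__ == "__main__":
    transcript = [
        Word("Pity", 0.0, 0.3), Word("the", 0.35, 0.5), Word("caption", 0.55, 1.0),
        Word("Getting", 2.5, 2.9), Word("it", 2.95, 3.0), Word("right", 3.05, 3.4),
    ]
    for cue in words_to_cues(transcript):
        print(f"{cue.start:.2f}-{cue.end:.2f}: {cue.text}")
```

A human reviewer would still need to check the recognized words themselves, which is the accuracy step 3Play describes.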

While speech-to-text captioning varies in quality--there's an entire page on KnowYourMeme dedicated to YouTube's automatic captioning fails--3Play keeps the human element in play, reviewing its captions for accuracy before finalizing the files.

Launched in 2008, 3Play gained its first angel investor in 2010 and completed a second round of funding in 2011. The company currently has 17 core employees and engages some 800 U.S.-based contractors to do accuracy checks on its transcripts.

Like Dotsub, 3Play Media sees the potential value of online captions' metadata. It's "more usable, searchable, accessible, and SEO-friendly," the company says.

3Play focuses on the education and government verticals but also serves enterprises, most notably Netflix (NASDAQ: NFLX). 3Play Media played a key role in bringing the subscription video-on-demand provider into 100 percent compliance with the FCC's captioning mandate, well within the commission's deadline.

Miller feels that broadcasters' complaints about the cost of captioning clips are unfounded. "The cost argument is silly," he said. "(Getting captions right) is more about timing and workflow."

Even established providers acknowledge that prices are dropping. For example, Screen Systems, a unit of SDI Media that handles captions and subtitles for media and entertainment companies, employed stenographers in past years to type in captions. But "price pressure" has driven many trained stenographers out of the business, Andrew Lambourne, Screen Systems' business development director, wrote in an article for InBroadcast. Screen Systems currently uses "Voicewriters," who listen to TV programs or movies and repeat the lines being delivered into a speech-to-text program, Q-Live.

Training to be a Voicewriter takes much less time than training to be a stenographer--three months versus two years. But it's still a demanding job to do accurately, Lambourne noted. "It takes intense training to reach that level of concentration and skill," he wrote. "…And of course if you slip up or get tongue-tied the viewers are left with inaccurate captions."

While captioning is now subject to stricter mandates, the $2-per-minute price that broadcasters are citing is one that new entrants to the segment can easily underbid.

"Because of players like us, the price has dropped like 50 percent," said Dotsub's Crosby. "These companies are scrambling to build the same platform as Dotsub, and competing with each other. They're going from a high service model ... to a low service model."

That creates some serious competition, but going for the lowest price could impact quality, 3Play's Miller notes. Content producers need to do their homework before choosing a captioning provider.

"The assumption is that this process should be cheaper," he said. "But the quality will dictate that price. Also, are (providers) being honest about what they offer?"

With that in mind, are the days of expensive-to-produce captions really gone? That remains to be seen, but as companies develop new ways to add captions to IP video and various formats are standardized, the market is destined to go in new directions.

Updated Aug. 13 with additional information for Dotsub.
