nytlabs


EDITOR (2015)
Tagging and annotation have long been some of the most important tasks that a news organization undertakes. The tags that we attach to articles enable nearly everything that happens to that article after publication: how we recommend related content to readers, how search engines index our site, how ads are targeted and more.

Currently, at The New York Times, those tags are applied at the article level. Yet when we look at an article we can see that it actually contains many smaller component parts, like a fact, a person, a recipe or an event. If we could begin to annotate and tag these components, it would enable us to do so much more with that information. New devices, especially those with smaller screens, could make use of smaller chunks of content. New products could be created by extracting components from their original article context and recombining them to create collections or new kinds of experiences. And rather than the archive being a file cabinet full of articles, it would become a corpus of structured news information that could be interrogated and reasoned across.
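To make component-level tagging more concrete, here is a minimal Python sketch of one way span annotations could be stored and then pulled out of their articles to build a collection. The `Annotation` and `Article` classes, the `collect` function and the tag strings are hypothetical illustrations, not The New York Times' actual schema. Offsets are assumed to be character positions into the article text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A tag on one component of an article (hypothetical schema)."""
    start: int  # inclusive character offset into Article.text
    end: int    # exclusive character offset
    kind: str   # component type, e.g. "person", "fact", "recipe", "event"
    tag: str    # normalized tag value (illustrative format)


@dataclass
class Article:
    article_id: str
    text: str
    annotations: list[Annotation]

    def components(self, kind: str) -> list[str]:
        """Return the text of every component of the given kind in this article."""
        return [self.text[a.start:a.end] for a in self.annotations if a.kind == kind]


def collect(articles: list[Article], kind: str, tag: str) -> list[tuple[str, str]]:
    """Recombine components across articles into a collection of (article_id, snippet)."""
    return [
        (article.article_id, article.text[a.start:a.end])
        for article in articles
        for a in article.annotations
        if a.kind == kind and a.tag == tag
    ]


story = Article(
    "example-1",
    "Jane Doe was elected mayor.",
    [Annotation(0, 8, "person", "Doe, Jane"), Annotation(0, 27, "event", "election")],
)
print(collect([story], "person", "Doe, Jane"))  # [('example-1', 'Jane Doe')]
```

Each annotation points back into its source article, so a collection built this way keeps track of where every snippet came from.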

Fine-grained annotation within an article is a difficult problem. It has historically been approached in two ways, and each has its own challenges. One approach is computational: building rule sets or machine learning models that make best guesses about where to apply tags. These methods can be quite successful, but they are still not nearly accurate enough to stand on their own. The other approach is to have people do the tagging. The person writing the article knows the relevant information with a high degree of accuracy, but highlighting and annotating every significant phrase by hand is an untenable amount of work.
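The Python sketch below is a rough illustration of the rule-based side of the computational approach. It tags text by matching phrases from a hand-built dictionary and prefers the longest match. The `RULES` dictionary, its tag strings and `tag_phrases` are hypothetical. The second call shows why such rules cannot stand on their own: a dictionary has no way to tell the person Washington from the city.

```python
import re

# Hypothetical rule set: lower-case surface phrase -> tag.
RULES = {
    "george washington": "person:George Washington",
    "washington": "place:Washington, D.C.",
    "new york times": "org:The New York Times",
}


def tag_phrases(text: str, rules: dict[str, str]) -> list[tuple[int, int, str]]:
    """Return sorted (start, end, tag) spans; longer phrases win over overlapping shorter ones."""
    taken = [False] * len(text)
    spans = []
    # Try longer phrases first so "george washington" is preferred over "washington".
    for phrase in sorted(rules, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            if any(taken[m.start():m.end()]):
                continue  # overlaps a longer match that was already accepted
            taken[m.start():m.end()] = [True] * (m.end() - m.start())
            spans.append((m.start(), m.end(), rules[phrase]))
    return sorted(spans)


print(tag_phrases("George Washington never read The New York Times in Washington.", RULES))
# [(0, 17, 'person:George Washington'), (33, 47, 'org:The New York Times'), (51, 61, 'place:Washington, D.C.')]

# The rule fires, but this "Washington" is the person, not the city:
print(tag_phrases("Washington crossed the Delaware.", RULES))
# [(0, 10, 'place:Washington, D.C.')]
```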

Editor is an experimental text editing interface. It explores how collaboration between machine learning systems and journalists could enable fine-grained annotation and tagging of news articles. Our approach applies machine learning techniques interactively, as part of the writing process, rather than retroactively. Computational processes carry the burden of the work, and journalists get ways to augment, edit and correct the results with their own knowledge.
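One way to picture this division of labor is as a review step. Machine suggestions stand unless the journalist corrects or removes them, and the journalist can add annotations that no model would make. The Python sketch below assumes that policy, so unreviewed suggestions are kept unchanged. It uses hypothetical names (`Suggestion`, `apply_review`) and does not describe how Editor itself records decisions.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Suggestion:
    start: int   # character offsets of the annotated span
    end: int
    tag: str
    source: str  # "machine" or "journalist" (illustrative values)


def apply_review(
    suggestions: list[Suggestion],
    corrections: dict[int, Optional[str]],
    additions: list[Suggestion],
) -> list[Suggestion]:
    """Merge machine suggestions with a journalist's review.

    corrections maps a suggestion's index to a corrected tag, or to None to remove it.
    Suggestions the journalist did not touch are kept unchanged (an assumed policy).
    """
    reviewed = []
    for i, s in enumerate(suggestions):
        if i in corrections:
            new_tag = corrections[i]
            if new_tag is None:
                continue  # journalist removed a wrong suggestion
            s = replace(s, tag=new_tag, source="journalist")  # journalist corrected the tag
        reviewed.append(s)
    reviewed.extend(additions)  # annotations only a person could judge
    return sorted(reviewed, key=lambda a: (a.start, a.end))


# Suggestions for the text "Washington crossed the Delaware."
machine = [
    Suggestion(0, 10, "place:Washington, D.C.", "machine"),
    Suggestion(23, 31, "place:Delaware River", "machine"),
]
result = apply_review(
    machine,
    corrections={0: "person:George Washington"},
    additions=[Suggestion(0, 32, "key point", "journalist")],
)
for a in result:
    print(a.start, a.end, a.tag, a.source)
```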

This prototype consists of a simple text editor (shown on the left), supported by a set of networked microservices (visualized on the right). The microservices shown here are recurrent neural networks (using word2vec: https://code.google.com/p/word2vec/) that are trained to apply New York Times tags to free text. You can imagine a host of other services as well, such as ones that try to attribute quotes or ones that know about specific domains like food or sports. As the journalist writes in the text editor, every word, phrase and sentence is emitted onto the network. Any microservice can then process that text and send relevant metadata back to the editor interface. Annotated phrases are highlighted in the text as it is written.

When journalists finish writing, they can review the suggested annotations with about as little effort as a spell check requires, correcting, verifying or removing tags where needed. Editor also has a contextual menu that lets the journalist make annotations that only a person could judge, such as identifying a pull quote, a fact or a key point.
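The emit-and-respond loop described above can be sketched in a single Python process. In this sketch a `Bus` class stands in for the network. The editor emits words, short word n-grams and sentences; treating n-grams as "phrases" is our assumption. A plain dictionary lookup stands in for the recurrent neural network taggers. All names, the n-gram length and the vocabulary are hypothetical. A real deployment would reach its services over network calls rather than function calls.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TextUnit:
    kind: str   # "word", "phrase" or "sentence"
    start: int  # character offset in the document
    text: str


@dataclass(frozen=True)
class Annotation:
    start: int
    end: int
    tag: str
    service: str  # which service suggested it


Handler = Callable[[TextUnit], list[Annotation]]


class Bus:
    """In-process stand-in for the network that connects editor and services."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, kind: str, handler: Handler) -> None:
        self._handlers[kind].append(handler)

    def publish(self, unit: TextUnit) -> list[Annotation]:
        # Every service subscribed to this kind of unit sees it; replies go back to the editor.
        return [a for handler in self._handlers[unit.kind] for a in handler(unit)]


def emit_units(text: str, max_phrase_words: int = 3) -> list[TextUnit]:
    """Split text into words, word n-grams ("phrases") and sentences."""
    words = [(m.start(), m.group()) for m in re.finditer(r"\w+", text)]
    units = [TextUnit("word", start, word) for start, word in words]
    for n in range(2, max_phrase_words + 1):
        for i in range(len(words) - n + 1):
            start = words[i][0]
            last_start, last_word = words[i + n - 1]
            units.append(TextUnit("phrase", start, text[start:last_start + len(last_word)]))
    for m in re.finditer(r"\S[^.!?]*[.!?]?", text):
        units.append(TextUnit("sentence", m.start(), m.group()))
    return units


def dictionary_service(name: str, vocab: dict[str, str]) -> Handler:
    """Hypothetical tagging service: looks each unit up in a fixed vocabulary."""

    def handle(unit: TextUnit) -> list[Annotation]:
        tag = vocab.get(unit.text.lower())
        return [Annotation(unit.start, unit.start + len(unit.text), tag, name)] if tag else []

    return handle


def annotate(text: str, bus: Bus) -> list[Annotation]:
    """Emit every unit of the text and gather the replies the editor would highlight."""
    found = {a for unit in emit_units(text) for a in bus.publish(unit)}
    return sorted(found, key=lambda a: (a.start, a.end, a.tag))


bus = Bus()
tagger = dictionary_service("tagger", {"new york times": "org:The New York Times", "brooklyn": "place:Brooklyn"})
bus.subscribe("word", tagger)
bus.subscribe("phrase", tagger)
for a in annotate("A reporter from The New York Times visited Brooklyn.", bus):
    print(a.start, a.end, a.tag)  # prints "20 34 org:The New York Times", then "43 51 place:Brooklyn"
```

With this publish/subscribe design, a new service, such as a quote-attribution model, only needs to subscribe to the unit kinds it cares about. The editor itself does not change.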
