PLOS BLOGS EveryONE

A Way with Words: Data Mining Uncloaks Authors’ Stylistic Flair

As any writer or wordsmith knows, searching for the right word can be a painful struggle. Here’s comforting news: word choice may be the key to understanding your stylistic flair.

New research in the field of text mining suggests that distinct writing styles are discernible by word selection and frequency. Even the use of common words, such as “you” and “say,” can help distinguish one writer from another. To learn more about style, the authors of a recent PLOS ONE paper turned to the famed lord of language, William Shakespeare.

The researchers assembled a pool of 168 plays written during the 16th and 17th centuries. After accounting for duplicates, 55,055 unique words were identified and then cross-referenced against the work of four writers from that time period: William Shakespeare, Ben Jonson, Thomas Middleton, and John Fletcher. The researchers counted how often these writers used words from the pool and ranked words by their frequency. Lists of twenty of the most-used and least-used words were then compiled for each writer and considered “markers” of their individual styles.

Fletcher, for one, frequently used the word “ye” in his plays, so a relatively high frequency of “ye” would be a strong marker of Fletcher’s particular writing style. Similarly, Middleton often used “that” in the demonstrative sense, and Jonson favored the word “or.” Shakespeare himself used “thou” the most frequently, and the word “all” the least.
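To make the counting-and-ranking idea concrete, here is a minimal sketch in Python. It is not the paper's mathematical method: it simply compares a word's share of one writer's text with its share of the whole pool and takes the largest gaps as overused and underused markers. The function names, the tokenizing rule, and the toy sentences are all assumptions made up for illustration.

```python
import re
from collections import Counter


def relative_freqs(texts):
    """Return each word's share of all tokens across the given texts."""
    counts = Counter(
        word for text in texts for word in re.findall(r"[a-z']+", text.lower())
    )
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def marker_words(author_texts, pool_texts, k=20):
    """Return (overused, underused) lists of k candidate marker words.

    Assumption: "marker" here means a large gap between the author's
    relative frequency and the pool's. The published study uses a
    different, more rigorous selection method.
    """
    author = relative_freqs(author_texts)
    pool = relative_freqs(pool_texts)
    # Positive gap: the author uses the word more than the pool does.
    gap = {w: author.get(w, 0.0) - pool[w] for w in pool}
    # Sort by descending gap, breaking ties alphabetically.
    ranked = sorted(gap, key=lambda w: (-gap[w], w))
    return ranked[:k], ranked[-k:][::-1]


# Toy data, invented for this example (not lines from any real play).
fletcher_like = ["ye know ye say", "ye come"]
others = ["you know you say", "you come all"]

overused, underused = marker_words(fletcher_like, fletcher_like + others, k=1)
print(overused, underused)
```

On this toy input the sketch picks "ye" as the overused marker and "you" as the underused one, mirroring the kind of contrast described above. Real plays would need careful handling of spelling variants and text length, which this sketch ignores.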

In addition to looking at individual word use, the researchers analyzed specific works in which a writer’s style changed significantly, such as Middleton’s political satire “A Game at Chess,” which was notably different from his other works. They also compared word choice between writers. Their findings indicate that, unlike the styles of his contemporaries, Shakespeare’s style was marked more by his underuse of words than by his overuse. Take, for example, Shakespeare’s use of “ye.” Fletcher used this word liberally, but “ye” is one of Shakespeare’s least frequently used words.

Such analyses, the researchers suggest, may help resolve authorship controversies and disputes, but they can also address other concerns. In a post in The Conversation, the authors of this paper suggest that the mathematical method used to identify words as markers of style may also help identify biomarkers in medical research. In fact, the research team currently uses these methods to study cancer and the selection of therapeutic combinations, multiple sclerosis, and Alzheimer’s disease.

 

Citation: Marsden J, Budden D, Craig H, Moscato P (2013) Language Individuation and Marker Words: Shakespeare and His Maxwell’s Demon. PLoS ONE 8(6): e66813. doi:10.1371/journal.pone.0066813

Image: First Folio – Folger Shakespeare Library – DSC09660, Wikimedia Commons
