SciTech

Political speech patterns analyzed

Readability analyses are typically used by teachers to select appropriate reading materials for students, but researchers at Carnegie Mellon’s Language Technologies Institute, including principal systems scientist Maxine Eskenazi and graduate research assistant Elliot Schumacher, are applying these methods to another area of natural language: presidential campaign speeches.

The readability analyses of campaign speeches from the 2016 United States presidential election show that the candidates’ choice of words generally ranged from an eighth-grade level to a 10th-grade level. The exception was businessman Donald Trump, who speaks at a seventh-grade level. When comparing the speeches of past presidents, the researchers found that George W. Bush had the lowest grammatical level, with a fifth-grade average, and that Abraham Lincoln had the highest, ranking far above a 10th-grade level. Among the candidates, grade-level speech patterns fall between sixth and seventh grade, except for Trump. In their paper, the researchers define the readability of a document as its reading level, from grade one to grade 12.

“[A candidate’s grade level] is determined by looking at the lexical contents and the grammatical structure of the sentences in a document,” Eskenazi said in a Language Technologies Institute press release. “It is based on the observation that some words (and grammatical structures) appear with greater frequency at one grade level than another. For example, we would expect that we could see the word ‘win’ fairly frequently in third grade documents while the word ‘successful’ would be more frequent in, say, seventh grade documents. We would not see dependent clauses very often at the second grade level, whereas they would be quite frequent at the seventh grade level.”

The researchers used a readability model called REAP for this analysis. The model analyzes two aspects of readability, lexical components (word choice) and grammatical constructions (syntactic complexity), to estimate the grade level of a speaker’s language. It is based on a database that contains sets of texts for each grade level. Most of the texts are written assignments by students in each grade that teachers have published on their websites.

The lexical reading difficulty measure is based on the smoothed individual probabilities of words occurring at each reading level. The grammar reading difficulty measure is based on typical grammatical constructions in the sentences of each grade level.
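As a rough illustration of the lexical measure, the sketch below scores a text against one smoothed word-frequency model per grade and picks the best-fitting grade. It is not the REAP implementation: the two-sentence training corpus is invented, the add-one smoothing is an assumption (the article does not say which smoothing method is used), and the function names are hypothetical.

```python
import math
from collections import Counter

# Toy, made-up training texts keyed by grade level (purely illustrative).
TOY_CORPUS = {
    3: "we win the game and we win the race",
    7: "a successful plan needs a successful team",
}

def train_unigram_models(corpus):
    """Count word occurrences separately for each grade level."""
    return {grade: Counter(text.lower().split()) for grade, text in corpus.items()}

def log_likelihood(words, counts, vocab_size, alpha=1.0):
    """Log-probability of the words under one grade's add-alpha smoothed model."""
    total = sum(counts.values())
    return sum(
        math.log((counts[w] + alpha) / (total + alpha * vocab_size))
        for w in words
    )

def predict_grade(text, models):
    """Return the grade whose smoothed word model fits the text best."""
    vocab = set().union(*models.values())
    vocab_size = len(vocab) + 1  # one extra slot for words never seen in training
    words = text.lower().split()
    return max(models, key=lambda g: log_likelihood(words, models[g], vocab_size))

models = train_unigram_models(TOY_CORPUS)
print(predict_grade("we win", models))             # best fit: the grade 3 model
print(predict_grade("a successful team", models))  # best fit: the grade 7 model
```

Smoothing matters here because a word that never appears in one grade's texts would otherwise get a probability of zero and rule that grade out entirely.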

“Assessing the readability of campaign speeches is a little tricky because most measures are geared to the written word, yet text is very different from the spoken word,” Eskenazi said. “When we speak, we usually use less structured language with shorter sentences. The readability measure we use looks at how often words and grammatical structures occur at each level (Flesch-Kincaid looked at length of sentences and words). In this way, the measures we used are a bit more reflective of any type of language, written or spoken.” One of the main issues was the gap between the grade levels of written language and those of spoken language, which tends to be more conversational and thus less complex.
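For contrast, the Flesch-Kincaid grade level that Eskenazi mentions depends only on sentence length and word length. The sketch below uses the standard published formula. The function name is hypothetical, and the word, sentence and syllable counts are assumed to be supplied by the caller, since counting syllables reliably is a separate problem.

```python
def flesch_kincaid_grade(total_words, total_sentences, total_syllables):
    """Standard Flesch-Kincaid grade level from raw counts."""
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59

# Invented counts: 100 words in 10 sentences with 150 syllables.
print(round(flesch_kincaid_grade(100, 10, 150), 2))
```

Because the formula never looks at which words or constructions are used, short spoken sentences pull the score down regardless of vocabulary.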

To standardize their results, the researchers made adjustments to reflect language as a whole, incorporating both spoken and written language in their analyses.

The researchers also computed the standard deviation of the lexical and grammatical grade levels for each candidate. The results reveal how much each candidate’s word choice and grammar change from one speech to another. The researchers note that Hillary Clinton, for words, and Trump, for grammar, show the most variable levels across speeches.
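The variability calculation can be sketched in a few lines. The candidate labels and grade-level numbers below are invented for illustration and are not the study's data, and the use of the sample standard deviation is an assumption, since the article does not say which form the researchers used.

```python
from statistics import mean, stdev

# Hypothetical per-speech grade-level estimates (invented, not the study's data).
speech_levels = {
    "Candidate A": [7.0, 8.0, 9.0],
    "Candidate B": [8.0, 8.0, 8.0],
}

def summarize(levels_by_candidate):
    """Return (mean, sample standard deviation) of grade level per candidate."""
    return {
        name: (mean(levels), stdev(levels))
        for name, levels in levels_by_candidate.items()
    }

summary = summarize(speech_levels)
for name, (avg, spread) in summary.items():
    print(f"{name}: mean {avg:.1f}, standard deviation {spread:.1f}")
```

Both hypothetical candidates average the same grade level, but only the first varies from speech to speech, which is the kind of difference the standard deviation is meant to surface.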

This variation could indicate that they are the candidates trying hardest to be well understood by their audience. The researchers are continuing to add more speeches to their database. “With more speeches and speeches of different candidates before the same audience, we may learn something more,” Eskenazi said.