To say South Park isn't the most family-friendly of TV shows is putting it lightly -- it aims to offend as much as humanly possible. It has its fair share of swearing, but which character has the least control over their language? You might be surprised... or not, depending on how perceptive you are.
This post was originally published on Gizmodo Australia.
Kaylin Walker decided to text mine the show's transcripts -- 18 seasons' worth, to be exact -- to find out which words and phrases are most characteristic of each of the show's characters, including Cartman, Stan, Kyle, Butters, Randy and even Kenny.
How did Walker do it? For the technically minded:
The programming language R and packages tm, RWeka and stringr were used to scrape South Park episode transcripts from the internet, attribute them to a certain character, break them into ngrams, calculate the log likelihood for each ngram/character pair, and rank them to create a list of most characteristic words/phrases for each character. The results were visualized using ggplot2, wordcloud and RColorBrewer.
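For the curious, here is a rough sketch of the scoring step described above: split each character's lines into ngrams, compute a log likelihood score for each ngram/character pair, and rank them. It is written in Python rather than the R that Walker used, and it is not her code. The function names, the tokenizer, the signed form of the score and the tiny made-up input are all assumptions for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase words made of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    # Join each run of n consecutive tokens into one space-separated string.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def log_likelihood(a, b, c, d):
    """Signed log likelihood (G2) score for one ngram.

    a: times the speaker used the ngram
    b: times everyone else used it
    c: total ngrams spoken by the speaker
    d: total ngrams spoken by everyone else
    """
    expected_speaker = c * (a + b) / (c + d)
    expected_others = d * (a + b) / (c + d)
    score = 0.0
    if a > 0:
        score += a * math.log(a / expected_speaker)
    if b > 0:
        score += b * math.log(b / expected_others)
    score *= 2
    # Positive when the speaker overuses the ngram, negative when they underuse it.
    return score if a / c >= b / d else -score


def characteristic_ngrams(lines_by_speaker, speaker, n=1, top=5):
    """Rank the ngrams the given speaker uses, most characteristic first."""
    own, others = Counter(), Counter()
    for name, lines in lines_by_speaker.items():
        target = own if name == speaker else others
        for line in lines:
            target.update(ngrams(tokenize(line), n))
    c, d = sum(own.values()), sum(others.values())
    if c == 0 or d == 0:
        return []  # nothing to compare against
    scored = [(gram, log_likelihood(a, others[gram], c, d)) for gram, a in own.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]


# Made-up placeholder lines, not taken from any transcript.
toy = {
    "speaker_a": ["oh boy oh boy", "oh hamburgers"],
    "speaker_b": ["dude this is bad", "dude come on"],
}
print(characteristic_ngrams(toy, "speaker_a", n=1, top=3))
```

A word that one speaker uses often and nobody else uses at all gets a high positive score, which is what pushes it to the top of that speaker's list.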
There are a number of graphs and charts in Walker's analysis, but here are the big ones:
Source: Kaylin Walker
Yep, Kenny is the swear king, which probably has to do with the fact that you can barely understand him. I was surprised to see Kenny with such a commanding lead, but there you go.
Text mining South Park [Kaylin Walker]
Top image: Comedy Central