To say South Park isn’t the most family-friendly of TV shows is putting it lightly: it aims to offend as much as humanly possible. It has its fair share of swearing, but which character has the least control over their language? You might be surprised… or not, depending on how perceptive you are.
This post was originally published on Gizmodo Australia.
Kaylin Walker decided to text mine the show’s transcripts (18 seasons’ worth, to be exact) to find out which words and phrases are most characteristic of each character, including Cartman, Stan, Kyle, Butters, Randy and even Kenny.
How did Walker do it? For the technically minded:
The programming language R and packages tm, RWeka and stringr were used to scrape South Park episode transcripts from the internet, attribute them to a certain character, break them into ngrams, calculate the log likelihood for each ngram/character pair, and rank them to create a list of most characteristic words/phrases for each character. The results were visualized using ggplot2, wordcloud and RColorBrewer.
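For the curious, here is a rough sketch of the ngram and log-likelihood ranking step, written in Python rather than R. It is not Walker's code. The sample lines are invented toy text rather than real transcript data, the function names are hypothetical, and the scoring uses one common two-term log-likelihood formula for comparing a character's lines against everyone else's, which may differ from the exact calculation in the original analysis.

```python
import math
import re
from collections import Counter, defaultdict

# Invented toy lines for illustration only (NOT real transcript text).
LINES = [
    ("Cartman", "respect my authority"),
    ("Cartman", "you will respect my authority"),
    ("Kyle", "dude that is not cool"),
    ("Stan", "dude this is pretty messed up"),
    ("Kyle", "you are not my authority on anything"),
]


def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(lines, max_n=2):
    """Count the 1..max_n-grams spoken by each character."""
    counts = defaultdict(Counter)
    for speaker, text in lines:
        tokens = re.findall(r"[a-z']+", text.lower())
        for n in range(1, max_n + 1):
            counts[speaker].update(ngrams(tokens, n))
    return counts


def log_likelihood(a, b, c, d):
    """Signed log-likelihood score for one ngram/character pair.

    a: ngram count for the character, b: ngram count for everyone else,
    c: total ngrams for the character, d: total ngrams for everyone else.
    Positive means the character overuses the ngram, negative underuses.
    """
    e1 = c * (a + b) / (c + d)  # expected count for the character
    e2 = d * (a + b) / (c + d)  # expected count for everyone else
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    ll *= 2
    # Compare relative frequencies a/c and b/d without dividing.
    return ll if a * d >= b * c else -ll


def characteristic(counts, speaker, top=5):
    """Rank a character's ngrams from most to least characteristic."""
    own = counts[speaker]
    rest = Counter()
    for other, other_counts in counts.items():
        if other != speaker:
            rest.update(other_counts)
    c_total = sum(own.values())
    d_total = sum(rest.values())
    scored = [
        (gram, log_likelihood(own[gram], rest[gram], c_total, d_total))
        for gram in own
    ]
    # Highest score first; ties are broken alphabetically.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top]


counts = count_ngrams(LINES)
for gram, score in characteristic(counts, "Cartman", top=3):
    print(f"{gram}: {score:.2f}")
```

On the toy lines, phrases that only one speaker uses float to the top of that speaker's list, while words shared with other speakers score lower. That is the general idea behind a "most characteristic phrases" ranking.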
There are a number of graphs and charts in Walker’s analysis, but here are the big ones:
Source: Kaylin Walker
Yep, Kenny is the swear king, which probably has to do with the fact you can barely understand him. I was surprised to see Kenny with such a commanding lead, but there you go.
Text mining South Park [Kaylin Walker]
Top image: Comedy Central
Comments
4 responses to “A Revealing Analysis Of Word Frequency On South Park”
Wow, Chef must have had a pretty big mouth if he’s still pretty high despite having been gone for a long time.
It looks like it’s profanity per 1000 words. So it’s not like he swore more times than Kyle, just that his swear density is higher.
How revealing.
Cartman doesn’t say ‘Stan’, and Kyle talks about Mr Hankey a lot even though Hankey was only in a few episodes in the early seasons.
Randy’s songwriting skills really bumped Lorde up the list.