A Revealing Analysis Of Word Frequency On South Park

To say South Park isn’t the most family-friendly of TV shows is putting it lightly: it aims to offend as much as humanly possible. It has its fair share of swearing, but which character has the least control over their language? You might be surprised… or not, depending on how perceptive you are.

This post was originally published on Gizmodo Australia.

Kaylin Walker decided to text mine the show’s transcripts (18 seasons’ worth, to be exact) to analyse the word usage of the show’s characters, including Cartman, Stan, Kyle, Butters, Randy and even Kenny.

How did Walker do it? For the technically minded:

The programming language R and packages tm, RWeka and stringr were used to scrape South Park episode transcripts from the internet, attribute them to a certain character, break them into ngrams, calculate the log likelihood for each ngram/character pair, and rank them to create a list of most characteristic words/phrases for each character. The results were visualized using ggplot2, wordcloud and RColorBrewer.
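For a rough feel of what "calculate the log likelihood for each ngram/character pair" can look like, here is a minimal Python sketch. It is not Walker's code (hers was written in R), and her exact formula isn't given here, so this assumes one common two-corpus form of the log-likelihood statistic: compare how often an ngram appears in one character's lines against everyone else's. The function names and the `(speaker, text)` input format are made up for illustration.

```python
import math
import re
from collections import Counter


def ngrams(tokens, n):
    """Return the n-word phrases in a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def log_likelihood(a, b, c, d):
    """Signed log likelihood for one ngram (assumed two-corpus form).

    a: count of the ngram in the character's lines
    b: count of the ngram in everyone else's lines
    c: total ngrams spoken by the character
    d: total ngrams spoken by everyone else
    Positive means the character over-uses the ngram, negative means under-uses.
    """
    e1 = c * (a + b) / (c + d)  # expected count for the character
    e2 = d * (a + b) / (c + d)  # expected count for everyone else
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    ll *= 2
    return ll if a * d >= b * c else -ll


def characteristic_ngrams(lines, speaker, n=1, top=10):
    """Rank ngrams by how characteristic they are of one speaker.

    lines: iterable of (speaker, text) pairs (hypothetical input format).
    """
    mine, others = Counter(), Counter()
    for who, text in lines:
        tokens = re.findall(r"[a-z']+", text.lower())
        (mine if who == speaker else others).update(ngrams(tokens, n))
    c, d = sum(mine.values()), sum(others.values())
    if c == 0 or d == 0:
        return []  # nothing to compare against
    scored = [
        (gram, log_likelihood(mine[gram], others[gram], c, d))
        for gram in set(mine) | set(others)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]
```

Feed it a list of `(speaker, text)` pairs and a character name, and the phrases at the top of the returned list are the ones that character uses far more than the rest of the cast. Setting `n=2` or `n=3` ranks two- and three-word phrases instead of single words.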

There are a number of graphs and charts in Walker’s analysis, but here are the big ones:


Source: Kaylin Walker

Yep, Kenny is the swear king, which probably has to do with the fact that you can barely understand him. I was surprised to see him with such a commanding lead, but there you go.

Text mining South Park [Kaylin Walker]

Top image: Comedy Central

