Text Analysis - Moby Dick

Moby Dick is Herman Melville’s epic novel of 1851. It tells the story of Captain Ahab’s pursuit of a great white sperm whale that has bitten off his leg. He risks his own life and that of his crew on the whaling ship Pequod. He is gripped by a narcissistic rage in his single-minded voyage of revenge. The tale is narrated by Ishmael, taking part in his first whaling expedition, and we encounter a multi-national crew including Queequeg, Starbuck, Stubb, Tashtego, Flask and Daggoo. The story is interspersed with detailed chapters on whales, almost in the form of a mini encyclopedia.
Below is an analysis of the text:

First, download the book from Project Gutenberg and display the output:

## # A tibble: 23,571 x 3
##    text               linenumber chapter
##    <chr>                   <int>   <int>
##  1 MOBY DICK;                  1       0
##  2 OR THE WHALE                2       0
##  3 ""                          3       0
##  4 by Herman Melville          4       0
##  5 ""                          5       0
##  6 ""                          6       0
##  7 ""                          7       0
##  8 ""                          8       0
##  9 "  CHAPTER 1"               9       0
## 10 ""                         10       0
## # ... with 23,561 more rows
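
As a rough sketch of how a table like this could be built (not necessarily the code behind the output above), the snippet below numbers each line and derives a chapter counter from chapter headings. The sample lines, the heading pattern and the name `book` are assumptions made for illustration; the real text would be downloaded from Project Gutenberg, for example with the gutenbergr package.

```r
library(dplyr)
library(stringr)
library(tibble)

# A few sample lines standing in for the downloaded book (assumption:
# the full text would come from Project Gutenberg).
raw_lines <- c(
  "MOBY DICK;",
  "OR THE WHALE",
  "",
  "by Herman Melville",
  "",
  "CHAPTER 1. Loomings.",
  "",
  "Call me Ishmael."
)

book <- tibble(text = raw_lines) %>%
  mutate(
    linenumber = row_number(),
    # A running count of chapter headings assigns each line to a chapter;
    # lines before the first heading get chapter 0.
    # The heading pattern is an assumption about how headings are written.
    chapter = cumsum(str_detect(text, regex("^chapter [0-9]+", ignore_case = TRUE)))
  )

print(book)
```

Keeping `linenumber` and `chapter` alongside the text means every word can later be traced back to where it appears in the book.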

Then, tidy the text into a more manageable format, with one word per row:

linenumber chapter word
1 0 moby
1 0 dick
2 0 or
2 0 the
2 0 whale
4 0 by
4 0 herman
4 0 melville
9 0 chapter
9 0 1
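
One common way to get this one-word-per-row layout is `unnest_tokens()` from the tidytext package. The sketch below assumes a small hand-made `book` tibble in place of the full text; the names `book` and `tidy_book` are illustrative.

```r
library(dplyr)
library(tibble)
library(tidytext)

# Hand-made stand-in for the first lines of the book (assumption).
book <- tibble(
  text = c("MOBY DICK;", "OR THE WHALE", "", "by Herman Melville"),
  linenumber = 1:4,
  chapter = 0L
)

# unnest_tokens() splits the text column into one lowercase word per row,
# strips punctuation, and drops lines that contain no words.
tidy_book <- book %>%
  unnest_tokens(word, text)

print(tidy_book)
```

The blank line 3 yields no rows, which matches the gaps in the line numbers in the table above.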

Next, remove the stop words – the uninteresting, common words, such as: I, me, my, myself, we, our, ours, ourselves, you…
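
A minimal sketch of this step, assuming the `stop_words` table that ships with tidytext is the stop-word list in use (the post does not say which list was used). The sample words and the names `tidy_book` and `clean_book` are illustrative.

```r
library(dplyr)
library(tibble)
library(tidytext)

# Illustrative sample of tokenised words (assumption).
tidy_book <- tibble(word = c("me", "ishmael", "the", "whale", "of", "ahab"))

# anti_join() keeps only the rows of tidy_book whose word
# does not appear in the stop_words table.
clean_book <- tidy_book %>%
  anti_join(stop_words, by = "word")

print(clean_book)
```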

Of the remaining words, find the book’s most frequent and display the top 10 in a table and then in a word cloud:

word n
whale 1094
sea 451
ahab 436
ship 431
ye 430
head 343
time 332
captain 308
boat 291
white 282
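
A frequency table of this kind can be produced with `count()` from dplyr. The sketch below uses a tiny made-up word list, so its counts are not the book's; `clean_book` and `top_words` are illustrative names.

```r
library(dplyr)
library(tibble)

# Made-up sample of words left after stop-word removal (assumption).
clean_book <- tibble(word = c("whale", "sea", "whale", "ahab", "whale", "sea"))

# count() tallies each distinct word; sort = TRUE puts the most frequent first.
top_words <- clean_book %>%
  count(word, sort = TRUE) %>%
  head(10)

print(top_words)
```

The resulting `word` and `n` columns are the inputs a word cloud needs; one option, assuming the wordcloud package is installed, would be `wordcloud::wordcloud(top_words$word, top_words$n)`.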

The R code used is available on GitHub.