Book Analytics

A good novel can be read on many levels. There is always a surface layer: the story itself. A compelling story can be followed and enjoyed by the widest target audience. Beneath the surface, there are often layers of complexity and literary devices at play. Metaphor, theme and satire can be cloaked or laid bare. We all learn about this in grade school, and some go further at university, dissecting books for everything the author intended (or perhaps didn’t intend) to present to the reader. Here is a slightly more data-driven way to dig into a book. I loaded the entire text of both On Swift Wings and Gulliver’s Travels into a data analytics workflow to compare and contrast their styles and content. The tools used here include sentiment flow, word correlation, word complexity and vocabulary measures. They reveal some fascinating details. I hope you’ll enjoy this data analysis of the two novels.

By the way, the script I wrote takes about 5 seconds to run once I have the manuscript, whether it comes from Project Gutenberg or a plain-text or Word file. If you’d like to see the same analysis of your book, or of a favourite public domain book, just let me know.

Comparing Sentiment Flow

Sentiment Flow Analysis in On Swift Wings
Sentiment Flow Analysis in Gulliver’s Travels

I think these two figures are particularly interesting. The top two bar charts show the sentiment in On Swift Wings (my book), and the bottom two show the same for Gulliver’s Travels. You’ll note that I’ve blocked out the end of On Swift Wings; I don’t wish to spoil whether the ending is happy or sad.

For background, the BING lexicon simply labels each word as “Positive” or “Negative,” and the chart counts them. If a bar is above the line, the corresponding 1% of the book has more positive words than negative ones. The AFINN lexicon instead scores words on a scale from -5 (very negative) to +5 (very positive), so each word contributes according to its strength. In this way, AFINN measures not just which emotions are used but how strongly. Words like “Torture” and “Ecstasy” carry more weight than “Good” or “Bad.”
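To illustrate the difference between the two measures, here is a minimal Python sketch, not the script used for the charts. It assumes the book is split into 100 equal slices of words, and the `BING` and `AFINN` dictionaries are tiny hypothetical stand-ins for the published word lists.

```python
import re

# Hypothetical toy lexicons, NOT the real BING and AFINN word lists.
BING = {"good": "positive", "great": "positive", "bad": "negative", "dead": "negative"}
AFINN = {"good": 3, "great": 3, "bad": -3, "dead": -3, "torture": -4}

def tokenize(text):
    """Lower-case the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def chunk(tokens, n_chunks=100):
    """Split tokens into exactly n_chunks near-equal slices (each ~1% of the book)."""
    n = len(tokens)
    return [tokens[i * n // n_chunks:(i + 1) * n // n_chunks] for i in range(n_chunks)]

def bing_score(words):
    """Raw count: number of positive words minus number of negative words."""
    pos = sum(1 for w in words if BING.get(w) == "positive")
    neg = sum(1 for w in words if BING.get(w) == "negative")
    return pos - neg

def afinn_score(words):
    """Weighted sum: each word contributes its AFINN value (unknown words count 0)."""
    return sum(AFINN.get(w, 0) for w in words)

def sentiment_flow(text, n_chunks=100):
    """Return one (BING, AFINN) score pair per slice of the text."""
    return [(bing_score(c), afinn_score(c)) for c in chunk(tokenize(text), n_chunks)]
```

For example, `sentiment_flow("good good bad torture", n_chunks=2)` returns `[(2, 6), (-1, -7)]`: the second half is only mildly negative by BING count but strongly negative by AFINN weight, because “torture” is weighted -4 (and, in this toy BING list, not counted at all).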

The first interesting finding is that, in general, I use quite a few more negative words than Jonathan Swift. In raw counts, On Swift Wings flows from positive at the beginning of the novel to more negative in the later stages. Swift, in contrast, tends to be more positive throughout and actually uses more positive words near the end of Gulliver’s Travels. (Note: I’m still not talking about the actual end of On Swift Wings.)

While I tend to use more negative words than positive ones, On Swift Wings has a weighted (AFINN) score similar to that of Gulliver’s Travels: most parts of the book are positive, and to a similar degree. I think this is particularly interesting. Evidently, I use stronger, more impactful positive words to counterbalance a general negativity.

Sentiment Word Maps

On Swift Wings Sentiment Cloud
Gulliver’s Travels Sentiment Cloud

To the point about the strength of the words used, these word clouds show, for each book, how often each sentiment-bearing word is used (font size) and how strong its sentiment is (lighter colour = weaker). Both books share many words (Great, Like, Good, No, Dead), but there are differences. On Swift Wings uses a greater number and variety of sentiment words, and uses them more often. Both images were generated with the same code, so the difference in shape comes from a difference in style. I invite you to look at the words and compare them yourself. I could look at these two figures for hours.
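As a rough sketch of the data behind such a cloud, the Python below counts each sentiment-bearing word (for font size) and scales the magnitude of its score to 0..1 (for colour strength). The `AFINN` dictionary is a hypothetical subset, and the mapping to font and colour is my assumption about how the figures were drawn.

```python
import re
from collections import Counter

# Hypothetical AFINN-style subset; the real lexicon is much larger.
AFINN = {"great": 3, "good": 3, "like": 2, "no": -1, "dead": -3, "torture": -4}

def cloud_data(text, max_score=5):
    """Return {word: (frequency, intensity)} for sentiment-bearing words.

    frequency -> font size in the cloud
    intensity -> colour strength: |score| / max_score, so weaker words are lighter
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w in AFINN)
    return {w: (n, abs(AFINN[w]) / max_score) for w, n in counts.items()}
```

The resulting frequencies and intensities can then be handed to any word-cloud plotting tool; running the same function on both books keeps the comparison like for like.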

Again, if you’d like to see your favourite public domain novel given the same treatment, let me know and I’ll run the script and send you the results. (I’ll probably put the code on GitHub soon, too.)

Word Correlation Map

On Swift Wings Correlations
Gulliver’s Travels Correlations

These two figures show word combinations. Words that are frequently used together are connected, and the more often they appear together, the thicker and brighter the line. Again, there are many differences between the two works. I tend to use a handful of word pairs frequently, while Mr. Swift has a few clusters of interconnected words and a few other patterns he repeats.
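Maps like these are commonly built from a pairwise correlation such as the phi coefficient, which measures how often two words appear in the same section of text compared with chance. The post doesn’t say which measure the script uses, so treat this Python version as an assumption; the 100-word section size and the 0.5 threshold are arbitrary illustrative choices.

```python
import math
import re
from itertools import combinations

def sections(text, size=100):
    """Split the text into consecutive sections of `size` words, each kept as a set."""
    words = re.findall(r"[a-z']+", text.lower())
    return [set(words[i:i + size]) for i in range(0, len(words), size)]

def phi(secs, a, b):
    """Phi coefficient for the co-occurrence of words a and b across sections."""
    n11 = sum(1 for s in secs if a in s and b in s)      # both present
    n10 = sum(1 for s in secs if a in s and b not in s)  # only a
    n01 = sum(1 for s in secs if a not in s and b in s)  # only b
    n00 = len(secs) - n11 - n10 - n01                    # neither
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0

def top_pairs(text, vocab, size=100, threshold=0.5):
    """Word pairs from vocab whose phi is at least threshold, strongest first."""
    secs = sections(text, size)
    pairs = [(a, b, phi(secs, a, b)) for a, b in combinations(sorted(vocab), 2)]
    return sorted((p for p in pairs if p[2] >= threshold), key=lambda p: p[2], reverse=True)
```

Each pair above the threshold would become one line in the graph, with its thickness scaled by the phi value.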

Word Summary Statistics

Measure                       On Swift Wings   Gulliver’s Travels   Comparison (OSW as % of GT)
Word Count                    121,426          104,280              116%
Unique Words                  11,848           8,359                142%
Unique Word Ratio (%)         9.75             8.02                 122%
Average Word Length (chars)   6.39             6.21                 103%

Here’s a quick analysis counting the number of words, how many of them are unique, the ratio of unique to total words, and the average word length. It isn’t a valid measure of quality, but On Swift Wings is 16% longer than Gulliver’s Travels, it contains 42% more unique words, and its words are on average 3% longer. Reading On Swift Wings, you’ll encounter a new word approximately 22% more often than when reading Gulliver’s Travels.
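Here is a minimal Python sketch of how these four measures can be computed. It assumes a “word” is any run of letters and apostrophes, and that uniqueness ignores case; the original script may tokenize differently, which would shift the exact figures.

```python
import re

def summary_stats(text):
    """Word count, unique words, unique-word ratio (%) and mean word length."""
    words = re.findall(r"[A-Za-z']+", text)    # assumption: letters and apostrophes
    unique = {w.lower() for w in words}         # case-insensitive uniqueness
    total = len(words)
    return {
        "word_count": total,
        "unique_words": len(unique),
        "unique_ratio_pct": 100 * len(unique) / total if total else 0.0,
        "avg_word_length": sum(len(w) for w in words) / total if total else 0.0,
    }

def compare(a, b):
    """Express each measure of book a as a rounded percentage of book b."""
    return {k: round(100 * a[k] / b[k]) for k in a if b[k]}
```

Feeding the table’s figures into `compare` reproduces its last column: 116, 142, 122 and 103.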

Before the hate rains down, please remember that this is all good fun. Gulliver’s Travels is a great book, and I strongly recommend it. I only hope that On Swift Wings will be intriguing and entertaining as well.

Weekly Review Section

Thank You, Stewart Adams

I received my first review on Amazon this week! As hoped, the reviewer found the book a challenging but rewarding read. Please keep the reviews coming on Amazon, Goodreads or Indigo. Reviews are desperately needed to spread the word and get the book in front of more readers. Please.

An interesting modernization of Gulliver’s Travels. There are some great concepts in the book including “perfect” societies and how one person can make a difference.
It is not an easy read due to the meaty sentences, but I am glad I read it.

Worth your time.
– Stewart Adams, Amazon Review

Cash and a Cold Start

It has been an interesting couple of weeks. My book has now been out for just under three months, which means that I’m starting to receive my first royalty payments. In a typically convenient moment, during a span of two hours today, I ran into two things related to the top-of-mind issue I’m dealing with right now. (Reviews – Please Review On Swift Wings)

  • The first relates to one of my favourite cartoonists, Brian Gordon, who is releasing his third book shortly. If you’re a parent, I guarantee that you’ll find his work funny, and I’d definitely recommend his books. He posted about the importance of pre-orders for a struggling author. Pre-orders also help with the problem raised by the second encounter.
  • The second came while I was working on a data science course as part of my other job, the one that keeps me from struggling. The course covered recommender systems like those used by Netflix and Amazon, and the “Cold Start” problem: until an item has received a certain number of reviews, from enough different people, recommender systems are generally unable to recommend it.
Cold Start
Thawing out the cold start
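To make the cold-start problem concrete, here is a toy item-based collaborative filter in Python. The users, books and ratings are invented, and real systems at Netflix or Amazon are far more elaborate; this only shows the core effect. A book with no ratings has an empty rating vector, so its similarity to every other book is zero and it never gets recommended.

```python
import math

# Hypothetical ratings: user -> {book: stars}. "on_swift_wings" has none yet.
ratings = {
    "ann": {"gulliver": 5, "crusoe": 4},
    "ben": {"gulliver": 4, "crusoe": 5},
    "cat": {"gulliver": 5},
}
items = ["gulliver", "crusoe", "on_swift_wings"]

def item_vector(item, ratings):
    """All ratings an item has received, keyed by user."""
    return {u: r[item] for u, r in ratings.items() if item in r}

def cosine(v, w):
    """Cosine similarity of two sparse rating vectors (0.0 if either is empty)."""
    dot = sum(v[u] * w[u] for u in v.keys() & w.keys())
    norm = math.sqrt(sum(x * x for x in v.values())) * math.sqrt(sum(x * x for x in w.values()))
    return dot / norm if norm else 0.0

def recommend(user, items, ratings, min_score=0.1):
    """Rank unseen items by similarity to the items the user has already rated."""
    seen = ratings.get(user, {})
    scores = {}
    for item in items:
        if item in seen:
            continue
        vec = item_vector(item, ratings)
        score = sum(cosine(vec, item_vector(s, ratings)) * r for s, r in seen.items())
        if score >= min_score:
            scores[item] = score
    return sorted(scores, key=scores.get, reverse=True)
```

Here `recommend("cat", items, ratings)` suggests only `"crusoe"`. Once a single reader who also liked Gulliver’s Travels rates `"on_swift_wings"`, it starts appearing for similar readers, which is exactly why those first reviews matter.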

Anyway, I’m trying to come up with an incentive to get reviews online that doesn’t fall foul of the rules set by Amazon and co. I’m not allowed to buy reviews or have family members review the book, and I don’t intend to risk it.

The other cool thing, as mentioned earlier, is that I got my first royalty payments this week. They are for the few pre-orders I did receive. Since I didn’t really try to drive pre-orders for my first book, I didn’t expect (or get) many, but it is pretty cool to earn a little money. Now I get to watch it trickle in.

A little update on the Immortals, book #2: I’m working on the plan for the book again. I had put it down for a couple of weeks to focus on other things, but I’m back at it and currently have about twenty pages of notes. In a subsequent post, I might show how data science-y I am by demonstrating my tabular approach to planning, which makes sure I handle all of the themes, characters and plotlines appropriately throughout the novel. I’ll also show some of the natural language analysis, particularly the sentiment analysis, that I did on the first book as it neared completion, compared with Gulliver’s Travels.

Coming Soon: Data Science and Novel Writing