Monday, 18 May 2020

The Treehouse of Horror

This time I'm gonna have a look at a dataset provided by PromptCloud on Kaggle. They scraped IMDb for horror movies released between January 2012 and October 2017.


Source: IMDb





As databases are constantly updated, some information in this dataset might be out of date (e.g. review ratings).
There are also inconsistencies in release countries and dates. For example, while Ti West's The House of the Devil was released in the US back in 2009, this dataset lists its May 2012 release in Taiwan.
I have omitted the country; however, for the purpose of this blog, I kept the (sometimes imprecise) release dates.
Many movies were also missing information such as budget or age rating.



It appears that horror is still seen as a rather trashy genre, where even (subjective) masterpieces like Mike Flanagan's Oculus or the already mentioned The House of the Devil rarely score above 7/10.
The mean rating stands at 5.1, while the median is 5.0 on the dot.
If we split our movies into two groups, English-language and non-English-language, we can see that the non-English films are rated slightly higher: 5.5 vs 4.9.
English is the sole language spoken in 2421 movies, followed by Spanish (96), Japanese (77) and Hindi (37).
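For anyone who wants to reproduce numbers like these, here is a minimal sketch in plain Python. It assumes each movie is a record with hypothetical `rating` and `languages` fields (not necessarily the Kaggle file's real column names), and it treats "English-language" as "English is the only language listed", which is an assumption about how the split was made. The ratings are made up.

```python
from statistics import mean, median

# Hypothetical toy records; the real dataset has thousands of rows.
movies = [
    {"title": "A", "rating": 4.0, "languages": ["English"]},
    {"title": "B", "rating": 5.0, "languages": ["English"]},
    {"title": "C", "rating": 6.0, "languages": ["Spanish"]},
    {"title": "D", "rating": 7.0, "languages": ["Japanese", "English"]},
]

ratings = [m["rating"] for m in movies]
print(f"mean {mean(ratings):.2f}, median {median(ratings):.2f}")

# Assumption: a movie counts as English-language only if English is its sole language.
def is_english_only(movie):
    return movie["languages"] == ["English"]

english = [m["rating"] for m in movies if is_english_only(m)]
other = [m["rating"] for m in movies if not is_english_only(m)]
print(f"English {mean(english):.1f} vs non-English {mean(other):.1f}")
```

With a skewed rating distribution the mean and median drift apart, which is why it is worth printing both.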



"Emily and Eden Stevens escape one violent situation only to dive head first into another. Terrified and alone they are stranded in the dark woods only to be chased into a horrific scene in a house or horrors. They must work together to get out alive. But what is worse? What is on the inside or out?"

Sounds pretty tense, right? You are absolutely correct to assume that this is the plot of the highest rated movie in our dataset, Bonehill Road, directed by Todd Sheets and rated at 9.8.
Shared silver goes to The Carmilla Movie (Spencer Maybee) and The Theta Girl (Christopher Bickel), both 9.6.
The best-rated non-English-language movie is the Philippine urban legend Teniente Gimo, rated at 9.1.

Source: IMDb


After having to abort my mission of finding a connection between budget and review ratings (due to insufficient data), I checked whether the length of a movie affects its rating.
The answer is no, there doesn't seem to be a connection; however, I still find the graph quite visually pleasing. You can see that the ratings are fairly evenly distributed across different run times.
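The conclusion above comes from eyeballing a graph. One common way to put a number on "no connection" is Pearson's correlation coefficient, which is close to 0 when there is no linear relationship. The sketch below spells out the formula by hand; the run times and ratings are hypothetical, and this is not necessarily how the graph above was produced.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical run times (minutes) and ratings, made up for illustration.
runtimes = [80, 85, 90, 95, 100, 105]
ratings = [5.0, 4.0, 6.0, 5.0, 4.0, 6.0]
print(round(pearson(runtimes, ratings), 2))  # weak: about 0.24
```

On Python 3.10 or newer, `statistics.correlation` computes the same coefficient.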




The length of scary movies seems to be pretty standardised too, with the vast majority falling somewhere between 75 and 100 minutes.
While we're on the subject of length, sometimes we all just want a bite-size piece of terror. I took a look at movies lasting 45 minutes or less, and these are the 3 highest-rated scary nibbles:

RIP (directed by Caye Casas and Albert Pintó, run time 16 minutes, rating 8.7)
Never Hike Alone (Vincente DiSanti, 22 min, 7.8)
Ovulation (Michael Wade Johnson, 45 min, 7.5)
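Picking the "scary nibbles" is a filter followed by a sort. A minimal sketch, using the three shorts listed above plus two made-up entries (marked hypothetical) to show that long films and lower ratings drop out:

```python
# (title, run time in minutes, rating); the last two entries are hypothetical.
films = [
    ("RIP", 16, 8.7),
    ("Never Hike Alone", 22, 7.8),
    ("Ovulation", 45, 7.5),
    ("Hypothetical Short", 30, 6.1),
    ("Hypothetical Feature", 90, 9.0),
]

MAX_MINUTES = 45  # "45 minutes or less"

shorts = [f for f in films if f[1] <= MAX_MINUTES]
top3 = sorted(shorts, key=lambda f: f[2], reverse=True)[:3]
for title, minutes, rating in top3:
    print(f"{title}: {minutes} min, {rating}")
```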





Some genres (such as soaps) have their dedicated actors and horror movies are no different. So who were the busiest actors between 2012 and 2017?
After a simple count, I found out that the main man Lloyd Kaufman appeared in 29 movies. I am assuming these were not major parts as that would be near impossible.
So I worked on the premise that actors are listed by the importance of their role and counted only the first 3 for each movie. And there are still some true champs out there:

Eric Roberts, 20 movies
Debbie Rochon, 17
Kane Hodder, 13
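The "first 3 billed" rule can be expressed by slicing each cast list before counting. A minimal sketch with hypothetical cast lists, assuming (as above) that each list is already ordered by importance of role:

```python
from collections import Counter

# Hypothetical cast lists, assumed to be in billing order (most important role first).
casts = [
    ["Actor A", "Actor B", "Actor C", "Actor D"],
    ["Actor A", "Actor E", "Actor B"],
    ["Actor F", "Actor G", "Actor H", "Actor A"],  # Actor A is billed 4th here
]

TOP_N = 3  # only the first three billed actors count as main roles

all_credits = Counter(actor for cast in casts for actor in cast)
lead_credits = Counter(actor for cast in casts for actor in cast[:TOP_N])

print(all_credits["Actor A"])   # 3: every appearance
print(lead_credits["Actor A"])  # 2: top-3 billings only
```

The gap between the two counts is exactly the Lloyd Kaufman effect: plenty of appearances, fewer leading parts.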


Debbie Rochon, Source: IMDb




One of the annoying parts of being a horror movie fan is that sometimes, after watching a good movie, one finds out that it was a debut after which the director either stopped filming or moved into the waters of more profitable genres. I found the highest-rated directors with more than one movie under their belt. Usually I'd name one or three, but this time I'll name four of them, as they evenly represent the USA and Japan and also (not shockingly) live-action and animation.
Ladies and gentlemen, without further ado, let me introduce you to the Big Four:

Emir Skalonja, average rating 8.5
Michael Wade Johnson, 8.1
Hiroaki Andô, 8.1
Toshiyuki Kubooka, 7.9
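Ranking directors "with more than one movie under their belt" means grouping ratings by director, dropping one-hit directors, and sorting by the average. A minimal sketch with hypothetical (director, rating) pairs:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (director, rating) pairs.
credits = [
    ("Director X", 8.0), ("Director X", 9.0),
    ("Director Y", 9.9),  # only one movie, so excluded
    ("Director Z", 7.0), ("Director Z", 8.0), ("Director Z", 9.0),
]

by_director = defaultdict(list)
for director, rating in credits:
    by_director[director].append(rating)

# Keep directors with more than one movie, highest average first.
ranking = sorted(
    ((d, mean(r)) for d, r in by_director.items() if len(r) > 1),
    key=lambda pair: pair[1],
    reverse=True,
)
print(ranking)
```

Without the `len(r) > 1` filter, Director Y's single 9.9 would top the list, which is exactly the debut problem described above.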


That brings us to subgenres. It's very common for a movie to fall into more than one genre, and some horror movies even have more than one subgenre, such as Infidus (Crime, Drama).
Thriller is the most popular, with 1378 movies, which is no surprise as most teen slashers would probably be found in this group.
I am, however, surprised to see Drama (531) and Comedy (513) taking silver and bronze while pushing Mystery (453) off the podium. Ghost stories might not be quite as popular.




Please note that the graph above uses a logarithmic scale, so that I could fit in Thriller with over 1300 movies while still letting you see subgenres with only one.
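Because one movie can carry several genres, counting subgenres means splitting each movie's genre string and counting every piece except Horror itself. A minimal sketch, assuming genres arrive as one delimited string such as "Crime| Drama| Horror" (the delimiter is an assumption about the file format; the strings are hypothetical):

```python
from collections import Counter

# Hypothetical genre strings; the "|" delimiter is an assumed file format.
genre_strings = [
    "Horror| Thriller",
    "Crime| Drama| Horror",
    "Comedy| Horror",
    "Horror| Mystery| Thriller",
]

subgenres = Counter(
    genre.strip()
    for s in genre_strings
    for genre in s.split("|")
    if genre.strip() != "Horror"  # every movie here is horror, so skip it
)
print(subgenres.most_common())
```

Note that the counts add up to more than the number of movies, since a movie like the second one is counted once under Crime and once under Drama.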

I know you noticed that one movie falls into the Adult category and you were too shy to ask. It's Moonshine Meat Market Mayhem, of course, and judging by the trailer, it seems to be a real delicacy for low-to-no-budget B-movie aficionados.





Horror movies often recycle old material and scary stories in general have been doing so for centuries, if not longer. But how repetitive are we really?
I had a look at the most common words used in the titles, and after discarding function words I was left surrounded by such beautiful clichés that my horror-loving heart started bouncing with joy!
107 movies have the word "dead" in their title, "house" appears in 71 of them, "blood" in 58 and "night" in 57.
55 movies have the digit 2 in their title, suggesting sequels are still a thing.
Other popular words are dark, devil, evil, zombie, last and massacre.

But what about actual movie titles? There are 74 titles that appear more than once; however, only one of them appears three times, and that is The Bride.
Brides are apparently not to be messed with in Taiwan, Japan and Russia.
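Both counts in this section can be sketched with `Counter`: word frequencies (counting each word once per title, since the figures above are numbers of movies) and exact-title duplicates. The titles and the tiny stop list below are hypothetical stand-ins for the real data and a proper list of function words.

```python
import re
from collections import Counter

# Hypothetical titles.
titles = ["Night of the Dead", "The Dead House", "Dead Dead Dead 2",
          "The Bride", "The Bride", "The Bride"]

# A tiny, hypothetical stop list standing in for "function words".
STOP_WORDS = {"the", "of", "a", "an", "and", "in"}

word_counts = Counter()
for title in titles:
    # A set, so "Dead Dead Dead 2" counts "dead" only once for that movie.
    words = set(re.findall(r"[a-z0-9]+", title.lower()))
    word_counts.update(words - STOP_WORDS)

# Exact titles that occur more than once.
title_counts = Counter(titles)
repeated = {t: n for t, n in title_counts.items() if n > 1}

print(word_counts.most_common(3))
print(repeated)
```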





If you remember the very first graph showing missing data, a big chunk of filming locations was missing, so please take this paragraph with a pinch of salt.
Not surprisingly, the most popular location to shoot a horror movie is Los Angeles, followed by London and Vancouver.

And finally, let's have a look at when to expect a horror movie in your local cinema.
Horror movie production seems to be on the rise, starting with 345 movies in 2012 and climbing to 780 in 2017, even though the data for 2017 only run until October.
This trend might reflect growing interest in horror, but it might just as well be present across all genres, so further study would be needed.



Not very surprisingly, most horror titles are released in October, in time for All Hallows' Eve. The second busiest month is January.
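The yearly and monthly tallies are simple counts over release dates. A minimal sketch with hypothetical dates standing in for the dataset's release-date column:

```python
import calendar
from collections import Counter
from datetime import date

# Hypothetical release dates.
releases = [
    date(2012, 10, 31),
    date(2013, 1, 4),
    date(2017, 1, 20),
    date(2017, 10, 13),
    date(2017, 10, 27),
]

per_year = Counter(d.year for d in releases)
per_month = Counter(d.month for d in releases)

for year in sorted(per_year):
    print(year, per_year[year])
for month, count in per_month.most_common():
    print(calendar.month_name[month], count)
```

Since the dataset stops in October 2017, the 2017 total covers only ten months; dividing each year's count by the number of months covered would make the years more comparable.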











Tuesday, 5 May 2020

Keswick Parkrun

For my very first blog entry, I decided to look at something we all love and cherish: my local Parkrun.
This Saturday morning institution produces a huge amount of data. I tried to retrieve the (subjectively) most interesting numbers and here's what I found.
So double-knot your laces and let me show you the ins and outs of running your weekly 5k.

Source: Keswick Parkrun archive

Before we move on, one thing to clarify. Parkrun needs you to have your own unique barcode to record your time. If you are unable to present a barcode, your finishing position is recorded as 'Unknown' and no time is allocated to it.
For the purpose of this analysis, where the winner was such a John Doe, I used the fastest known time as the winning time (while the winner's name remains 'Unknown').
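The John Doe rule can be written as a small function. A minimal sketch, assuming each result is a hypothetical (position, name, seconds) tuple where an unbarcoded finisher has the name "Unknown" and a time of `None`:

```python
def winning_time(results):
    """Return (winner_name, seconds): the name in position 1, and the fastest
    recorded time in the event (used when the winner has no time)."""
    results = sorted(results, key=lambda r: r[0])
    winner_name = results[0][1]
    known = [secs for _, _, secs in results if secs is not None]
    return winner_name, (min(known) if known else None)

# Hypothetical event where the winner had no barcode.
event = [(1, "Unknown", None), (2, "Runner B", 1075), (3, "Runner C", 1102)]
print(winning_time(event))  # ('Unknown', 1075)
```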

To date, 303 Parkruns have taken place in Keswick, with the number of parkrunners being as few as 54 or as numerous as a party of 500 during the New Year's Parkrun of 2019.
With Keswick being a very touristy and seasonal town, I expected to see significant peaks and valleys repeating every 52 events. While the seasonality isn't quite as visible as I believed, the patterns clearly emerge when looking at the rolling average: high summers and low winters, with the Christmas and New Year Parkruns becoming increasingly popular.
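A rolling average smooths out week-to-week noise so that slower patterns like seasons show up. The post doesn't say how wide the window was, so the sketch below leaves it as a parameter; something like 8 events (roughly two months) would be one hypothetical choice.

```python
def rolling_mean(values, window):
    """Trailing rolling mean; the first window-1 points have no value (None)."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        out.append(running / window if i >= window - 1 else None)
    return out

print(rolling_mean([1, 2, 3, 4, 5], 3))  # [None, None, 2.0, 3.0, 4.0]
```

Applied to the weekly attendance series, `rolling_mean(attendance, 8)` would give the smoothed line; a wider window smooths more but blurs short peaks such as the New Year event.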



On average, 149 runners cross the finish line every week, of whom 65 are ladies and 74 are gentlemen. If you can't figure out how 65 + 74 = 149, that's because it doesn't. The remaining 10 are the non-barcoded runners shrouded in mystery.
It's not too unusual, however, for the ladies to outnumber the men: out of 303 Parkruns, this happened 51 times. The most significant powerplay occurred when 79 women were accompanied by only 51 men.
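Counting the events where women outnumbered men, and finding the biggest gap, takes one filter and one `max`. A minimal sketch with hypothetical per-event (women, men) counts; reading "most significant" as "largest margin" is an assumption.

```python
# Hypothetical per-event finisher counts: (women, men), barcoded runners only.
events = [(65, 74), (79, 51), (60, 58), (40, 70)]

women_ahead = [(w, m) for w, m in events if w > m]
print(len(women_ahead))  # 2 events where women outnumbered men

# Assumption: "most significant" means the largest women-minus-men margin.
biggest = max(women_ahead, key=lambda e: e[0] - e[1])
print(biggest)  # (79, 51)
```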

On average, 22 runners beat their personal best and 7 speedos finish under 20 minutes every week.



The age distribution is fairly similar in the male and female fields: Parkrun seems to be popular with people aged 40+.
The most entries from the men were in the 50-54 age bracket (followed by 45-49 and 55-59). For the ladies it was the 45-49 (followed by 50-54 and 40-44). Please note these are not unique entries, so had someone entered all Parkruns, they would've been counted 303 times.
The event has been completed 44 times by John M. in the 90-94 category.



If you want to finish in the first half of the field, aim for a time better than 27:37; that will get you there at an average Parkrun.
However, if you're going for a win, you should get those legs up to speed and aim for 17:44. That of course might not be enough as the record stands at 15:29 (Sam S.).
The event has been so far won by a woman 5 times (Rosie S., Sarah T., Hayley C., Rebecca R. and Emily R.).
Although the sample is small, the ladies' winning time is on average half a minute slower than the gents': 18:11 vs. 17:38.
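All of these figures are mm:ss times, so comparing them means converting to seconds and back. A minimal sketch: the gap uses the two average winning times quoted above, while the "first half of the field" threshold is taken as the median of one event's finish times, which is an assumption (the post doesn't say exactly how 27:37 was averaged across events). The field times are hypothetical.

```python
from statistics import median

def to_seconds(mmss):
    """'18:11' -> 1091"""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

def to_mmss(total):
    """1091 -> '18:11'"""
    return f"{total // 60}:{total % 60:02d}"

# Gap between the ladies' and gents' average winning times quoted above.
gap = to_seconds("18:11") - to_seconds("17:38")
print(gap)  # 33

# Hypothetical finish times for one event; half the field is faster than the median.
field = ["17:44", "22:10", "26:50", "31:05", "35:20"]
threshold = int(median(to_seconds(t) for t in field))  # int() truncates even-sized medians
print(to_mmss(threshold))  # 26:50
```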

The event has also been won twice by a runner in the 11-14 age category (Nathan S. and Robin R.) and once by a runner in the 55-59 category (Greg P.).

31 athletes have won Keswick Parkrun more than once, the big three being Sam S. (42 wins), Mark L. (31) and Carl B. (13).





Keswick Parkrun has recently changed venue from a scenic out-and-back gravel course along the river with a juicy climb in both directions to a flat tarmac path in a local park.
Going through the archives, I estimated that this change took effect from event #274.
Strangely enough, the overall average time is 21 seconds slower after the change, while the winning times have improved by 22 seconds.
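The before/after comparison splits the events at the estimated changeover and compares the means. A minimal sketch, assuming hypothetical per-event records of (event number, average time in seconds, winning time in seconds); positive differences mean slower after the move.

```python
from statistics import mean

CHANGE_EVENT = 274  # first event on the new course, as estimated above

def compare(events):
    """Return (average-time difference, winning-time difference) in seconds,
    computed as after-change mean minus before-change mean."""
    before = [e for e in events if e[0] < CHANGE_EVENT]
    after = [e for e in events if e[0] >= CHANGE_EVENT]
    avg_diff = mean(e[1] for e in after) - mean(e[1] for e in before)
    win_diff = mean(e[2] for e in after) - mean(e[2] for e in before)
    return avg_diff, win_diff

# Hypothetical events straddling the change.
events = [(272, 1700, 1070), (273, 1710, 1080), (274, 1720, 1060), (275, 1740, 1064)]
print(compare(events))  # averages slower, winners faster
```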

One question I was asking myself is whether the number of people attending has an effect on the average finishing time.
My thought here was that Parkruns with more runners would be slower on average, as the less well attended events would mostly consist of hard-core athletes ready to brave the elements.
However, when I looked at a graph comparing the number of runners with the average time, it turned out that events with more runners tend to be more, well, average!
You can clearly see that while quiet events tend to go both ways (faster and slower), the busier events tend to gravitate towards the average finishing time (28:30).
Notice at the very top: the huge New Year's party with 500 runners saw an average time of 28:59. Not bad going on 3 hours of sleep, eh?


And last but not least, what is a good friendly race without a sprint finish?
For the purpose of our Saturday morning shake-out, I looked at the difference between the first 2 known times.
If the gap was 2 seconds or less, I classified it as a "sprint finish".
Out of 303 events, this has occurred 35 times, out of which 19 were "super sprints" when the gap was less than a second.

That's something Keswick Parkrunners might want to work on if they want to get the attention of the international press.
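The sprint-finish rule from above fits in a few lines. A minimal sketch, assuming the gap is measured in whole seconds between the two fastest known times; since Parkrun records whole seconds, a "super sprint" (under a second) in practice means both runners got the same time. The event pairs are hypothetical.

```python
def classify_finish(first, second):
    """Classify the gap in seconds between the two fastest known times."""
    gap = second - first
    if gap < 1:
        return "super sprint"
    if gap <= 2:
        return "sprint"
    return "no sprint"

# Hypothetical (fastest, second fastest) times in seconds for three events.
events = [(1000, 1000), (1000, 1002), (1000, 1010)]
labels = [classify_finish(a, b) for a, b in events]

sprints = sum(label != "no sprint" for label in labels)  # super sprints included
super_sprints = labels.count("super sprint")
print(sprints, super_sprints)  # 2 1
```

Counting super sprints inside the sprint total mirrors the "35 sprints, 19 of them super sprints" breakdown above.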