Over the past few weeks I’ve been working on combining statistics and big data with film theory. I started out trying to find trends or arcs in a director’s career. Can you clearly see a director improve? Get worse? Stay consistent?
That is proving to be infeasible, so instead I shifted to simply calculating the average rating of each director’s films. It would be interesting to see who rates higher: Martin Scorsese or Alfred Hitchcock, Spielberg or Nolan. Any director can now have a score attached to them.
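For what it’s worth, the averaging itself is the easy part. Here is a minimal Python sketch, assuming you already have a list of (director, rating) pairs. The function name and the ratings are hypothetical placeholders, not real data:

```python
from collections import defaultdict
from statistics import mean


def average_rating_by_director(films):
    """Return {director: mean rating} from a list of (director, rating) pairs."""
    ratings = defaultdict(list)
    for director, rating in films:
        ratings[director].append(rating)
    # Average each director's list of ratings.
    return {director: mean(scores) for director, scores in ratings.items()}


# Hypothetical placeholder ratings, not real data.
films = [
    ("Director A", 8.0),
    ("Director A", 7.0),
    ("Director B", 6.5),
    ("Director B", 9.5),
    ("Director B", 8.0),
]
print(average_rating_by_director(films))
```

The hard part is everything this skips: deciding which rating source to trust and whether a director with three films should be compared directly with one who made thirty.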
Introducing math into film isn’t anything new, but it is somewhat rare. Cinemetrics studies film by measuring the length of each individual shot and charting the results. What is to be gained from this, though? Well, it’s useful for studying trends in the pace of editing, and it also serves as a way to demonstrate the development of early cinema. Other, arguably grander, applications remain vague and hidden away in the halls of academia. I attended a conference on the subject, and while it was interesting, I couldn’t help but notice two groups of people: the mathematicians, who didn’t think much about content or film theory, and the film theorists, who didn’t seem to consider the mathematical side when postulating theories.
I think other data sets need to be brought in, but cinemetrics isn’t completely useless. In theory, you could look at the work of individual directors and editors, as well as the films they made together, to figure out who tends toward which kind of editing and to measure how much influence an editor has on a film. The same could be done for directors. For example, Quentin Tarantino used one editor for most of his films: Sally Menke. Unfortunately, she passed away in 2010, leaving his later films without her influence. By measuring shot length, both overall and in specific scenes, across the films she worked on and comparing those figures with the same measurements from the Tarantino films she didn’t work on, we could find out how much influence she had on what are normally considered “auteur” films.
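A common cinemetrics measure is average shot length (ASL): a film’s total running time divided by its number of shots. As a rough sketch of the Menke comparison, assuming each film is stored with its editor and a list of shot lengths in seconds (the titles, editor labels, and numbers below are hypothetical):

```python
from statistics import mean


def average_shot_length(shot_lengths):
    """Average shot length (ASL) in seconds: total duration / number of shots."""
    return sum(shot_lengths) / len(shot_lengths)


def compare_editor_influence(films, editor):
    """Return (mean ASL of films cut by `editor`, mean ASL of all other films).

    Both groups must be non-empty, or statistics.mean raises StatisticsError.
    """
    with_editor = [average_shot_length(f["shots"]) for f in films if f["editor"] == editor]
    without_editor = [average_shot_length(f["shots"]) for f in films if f["editor"] != editor]
    return mean(with_editor), mean(without_editor)


# Hypothetical films with made-up shot lengths, not real measurements.
films = [
    {"title": "Film 1", "editor": "Editor X", "shots": [4.0, 6.0, 5.0]},
    {"title": "Film 2", "editor": "Editor X", "shots": [3.0, 3.0]},
    {"title": "Film 3", "editor": "Editor Y", "shots": [8.0, 6.0]},
]
print(compare_editor_influence(films, "Editor X"))
```

A real study would want more than a group mean. You might prefer the median, since a few very long takes can drag the mean up, and you would repeat the comparison scene by scene rather than only per film.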
Continuing to fall down the mathematical rabbit hole, let’s talk about big data. Big data is… well, just that: massive collections of data about any and all subjects, for which we can thank the internet. Big data has many real-world applications, but you can watch a TED Talk on that. Film-wise, our best sources of data would be mass-aggregate review sites like Rotten Tomatoes, IMDb, and Metacritic. For instance, you could compile all the reviews from these sites, both critic and audience, into a giant searchable database. From there you could search for certain keywords and, by tying them to the related scores, find out which factors matter most to us. You might determine, say, that bad CGI is worse than plot holes because mentions of bad CGI were associated with low scores 75% of the time, compared with 55% for plot holes. On top of that, if the films criticized for bad CGI also received lower average ratings than those criticized for plot holes, you could find out by how much.
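The keyword idea could start out as simple as this Python sketch. It assumes reviews are (text, score) pairs on a 10-point scale and treats anything under 5 as a “low” score; both the threshold and the sample reviews are made up for illustration. Plain substring matching is a big simplification, since a review saying “no plot holes” would still count as a mention.

```python
from statistics import mean


def keyword_stats(reviews, keyword, low_threshold=5.0):
    """For reviews mentioning `keyword`, return (share of low scores, mean score).

    Returns None if no review mentions the keyword. Matching is a
    case-insensitive substring check, with no handling of negation.
    """
    keyword = keyword.lower()
    scores = [score for text, score in reviews if keyword in text.lower()]
    if not scores:
        return None
    low_share = sum(1 for s in scores if s < low_threshold) / len(scores)
    return low_share, mean(scores)


# Made-up sample reviews (text, score out of 10).
reviews = [
    ("The bad CGI ruined it", 3.0),
    ("Bad CGI, but a fun ride", 6.0),
    ("Too many plot holes", 4.0),
    ("Plot holes aside, a great cast", 8.0),
    ("Plot holes everywhere", 7.0),
]
for kw in ("bad CGI", "plot holes"):
    print(kw, keyword_stats(reviews, kw))
```

The first number is the “associated with low scores X% of the time” figure; the second is the average rating that tells you by how much.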
Imagine charting trends in the relationship between critical scores and audience scores and then connecting that back to keywords. You could use the critical reaction to predict how a film will be received by audiences before it comes out.
Remember Sally Menke and her influence on Tarantino? Using big data, we could determine what scores the films she worked on usually got (even narrowing it down by genre or type). If you combined that with the same scores for the director, actors, producers, and so on (each weighted by their potential influence), then you might, MIGHT, be able to calculate a film’s score before it comes out.
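The “weighted by their potential influence” part could be sketched as a weighted average. This assumes you already have each contributor’s historical average score and some influence weight for them; the roles, scores, and weights below are hypothetical, and choosing the weights is the genuinely hard part this sketch skips.

```python
def predict_score(contributors):
    """Weighted average of each contributor's historical average score.

    `contributors` maps a role to (historical_average, influence_weight).
    Weights don't need to sum to 1; they are normalized by their total.
    """
    total_weight = sum(weight for _, weight in contributors.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(avg * weight for avg, weight in contributors.values())
    return weighted_sum / total_weight


# Hypothetical contributors: (historical average score, influence weight).
contributors = {
    "director": (80.0, 5),
    "editor": (70.0, 2),
    "lead actor": (60.0, 3),
}
print(predict_score(contributors))
```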
That last one is pretty unlikely, but so are all the rest of these theoretical calculations. Art is subjective, and a thousand factors go into one film. It’s entirely possible that none of these calculations would produce reliable trends. That being said, we have to try. I should note that I didn’t get these ideas from anyone; I’m going off vague memories of high school statistics and weird spreadsheet hallucinations. Math, you may be a cold field, but you have some seriously hot applications to film theory (beyond just measuring shot length).