Leroi and big data analysis of music notes



The history of popular music has long been debated by philosophers, sociologists, journalists, bloggers and pop stars [1–7]. Their accounts, though rich in vivid musical lore and aesthetic judgements, lack what scientists want: rigorous tests of clear hypotheses based on quantitative data and statistics. Economics-minded social scientists studying the history of music have done better, but they are less interested in music than the means by which it is marketed [



We obtained 30-s-long segments of 17 094 songs covering 86% of the Hot 100, with a small bias towards missing songs in the earlier years.

  • Problems with the database.


However, where these early studies focused on technical aspects of audio such as loudness, vocabulary statistics and sequential complexity, we have attempted to identify musically meaningful features.


To relate the T-lexicon to semantic labels in plain English, we carried out expert annotations (electronic supplementary

  • Assignment of meaning. Coding the database.


Inherently dissonant (because of the tritone interval between the third and the minor-seventh), these chords are commonly used in Jazz to create tensions that are eventually resolved to consonant chords; in Blues music, the dissonances are typically not resolved and thus add to the characteristic ‘dirty’ colour. Accordingly, we find that songs tagged BLUES or JAZZ have a high frequency of H1

  • All this data crunching to tell you that the 7th chord is used in jazz.


After 1990, the frequency of T1 declines: the reign of the drum machine

  • Shows how wrongheaded conclusions can be drawn from bad data. Are drum machines really not being used any more? Could quantized live drums be reinterpreted as drum machines?


Popular music is classified into genres such as COUNTRY ... (R‘N’B) as well as a multitude of subgenres (DANCEPOP, SYNTHPOP, HEARTLAND ROCK, ROOTS ROCK etc.). Such genres are, however, but imperfect reflections of musical qualities.

  • Misunderstands the importance of genre for the sake of creating an easy database.



Relies on Last.fm tags to categorize songs. Are Last.fm users representative? Unbiased?


The history of popular music is often seen as a succession of distinct eras, e.g. the ‘Rock Era’, separated by revolutions [3,6,14]. Against this, some scholars have argued that musical eras and revolutions are illusory [5]. Even among those who see discontinuities, there is little agreement about when they occurred. The problem, again, is that data have been scarce, and objective criteria for deciding what constitutes a break in a historical sequence scarcer yet.


  • Was the point of these debates and histories actually to answer the question definitively? This guy appears to think so. I see them more as a way to create new narratives and new ways of interpreting culture.



Those who wish to make claims about how and when popular music changed can no longer appeal to anecdote, connoisseurship and theory unadorned by data.


  • Ways of knowing


Acknowledges two limitations:

1.) Classifications are based only on a partial extract of each song. Songs are more complex than that. Says this is justified by the fucking Last.fm data. Another algorithm at work. Strawman.

2.) Database is limited to the Hot 100. Just argues for more data.


Can’t explain causes. Example of MTV raps is lame. Shows the problem of data capture: rap had been around for almost a decade before becoming significant in the charts.


We anticipate that the study of cultural trends based upon such datasets will soon constrain and inspire theories about the evolution of culture just as the fossil record has for the evolution of life.