Goodreads and algorithms, part the fourth

In this (probably) last instalment of experimenting with the Goodreads algorithm, I’m looking specifically at bias. Joy Buolamwini, in the TED talk I just watched (and posted), says this:

Algorithmic bias, like human bias, results in unfairness.

It would be hard, I think, to really test the biases in Goodreads, and one experiment is certainly not enough to draw conclusions from, but let’s see what happens. I’ve removed from my ‘to-read’ shelf all books written by men. I’ve added, instead, 70 new books, mostly but not exclusively from lists on Goodreads of ‘feminist’ books or ‘glbt’ books [their version of the acronym, not mine]. Every single book on my ‘to-read’ shelf is written by someone who self-identifies as female.

And after a little while (processing time again), my recommendations were updated:

Of the top five recommendations, 1 is written by a man (20%); of the fifty recommendations in total, 13 are written by men (26%).

I then reversed the experiment. I cleared out the whole of the ‘to-read’ shelf, and instead added 70 books, nearly exclusively fiction, and all written by people who identify as male.

And again, a slight pause for processing, and the recommendations update. Here are my top five:

Two of the top five recommendations are written by women (40%); of the fifty recommendations in total, 7 are written by women (14%).

So when the parameters are roughly the same, and with the very big caveat that this may be a one-off, it seems that Goodreads recommends more books by men than by women: the women-only shelf produced 13 recommendations by men, while the men-only shelf produced only 7 by women. Is this bias? Or just coincidence? Probably quite difficult to tell with just one experiment, but it may be worth repeating to learn more.
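For anyone who wants to check the arithmetic, here is a rough sketch in Python. It recomputes the two percentages from the counts reported above and then runs a Fisher exact test on them. The test is my addition, not something Goodreads provides or part of the original experiment, and the names `share` and `fisher_exact_two_sided` are made up for illustration.

```python
from math import comb

TOTAL = 50  # recommendations shown in each experiment

# Counts reported in the two experiments above.
MEN_FROM_WOMEN_ONLY_SHELF = 13
WOMEN_FROM_MEN_ONLY_SHELF = 7


def share(count, total=TOTAL):
    """Return count as a percentage of total."""
    return 100 * count / total


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    ways = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / ways

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table at least as unlikely as the observed one
    # (the small tolerance guards against floating-point ties).
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# Rows are the two experiments; columns are "other gender" vs "same gender".
p_value = fisher_exact_two_sided(
    MEN_FROM_WOMEN_ONLY_SHELF, TOTAL - MEN_FROM_WOMEN_ONLY_SHELF,
    WOMEN_FROM_MEN_ONLY_SHELF, TOTAL - WOMEN_FROM_MEN_ONLY_SHELF,
)

print(f"By men, from the women-only shelf: {share(MEN_FROM_WOMEN_ONLY_SHELF):.0f}%")
print(f"By women, from the men-only shelf: {share(WOMEN_FROM_MEN_ONLY_SHELF):.0f}%")
print(f"Fisher exact p-value: {p_value:.2f}")
```

With these counts the p-value comes out well above the usual 0.05 cut-off, so a gap of this size could plausibly be down to chance, which fits the caveat about reading too much into a single experiment.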

Finally, one weird thing. Two books appeared on the full recommendations list in both experiments. One is Anthony Powell’s A Dance to the Music of Time which, given the general gravitas of the books I added in both experiments, is fairly understandable. The other, though, is this:

 

Bill Cosby’s ‘easy-to-read’ story, aimed at children, is included because I added John Steinbeck’s East of Eden? Unfortunately I have no idea why it was in the women-only list, because I didn’t check at the time, but that feels like a really, really peculiar addition.

Goodreads and algorithms, part trois

So far, the Goodreads recommendations based on my ‘to-read’ pile haven’t been that great, so I’ve done a few more experiments.

First, I removed from my ‘to-read’ list anything that didn’t strictly fall into the category of literary fiction or reasonably highbrow non-fiction, and I added to it six books, along a similar theme: Ulysses by James Joyce, Finnegans Wake, also by Joyce, Infinite Jest by David Foster Wallace, The Trial by Kafka, À la recherche du temps perdu by Proust (the French version, no less), and The Brothers Karamazov by Dostoevsky.

And not much changed. Mainly because it doesn’t update automatically – again I’m noticing a delay in the algorithm working. But I noticed something else when deleting things from the list. Goodreads automatically ranks the books you add to the list, in the order that you’ve added them. This makes complete sense – I expect many people choose their reading in a far less haphazard way than I do. And in any case, this explains why books about climate change were so prominent in the recommendations – This Changes Everything was first on my list.

Goodreads also allows you to edit the ranking, so I’ve moved the two James Joyce books I added to positions #1 and #2, and I’ve moved the climate change book to #20.

Again, nothing happened. The recommendations were still based on books that I had now removed from the list. I refreshed the page, logged in and out, and no change. So I went back, and added a 7th book: Robin Stevens’ Murder Most Unladylike, which is aimed at 10-year-olds. And new recommendations appeared.

ALL of them are based on the items I added earlier (not the most recent addition) – you can see the first two are about Proust, and yet NONE of them are based on the James Joyce books I moved to top ranking on the list.

Goodreads and algorithms, part 2

Earlier today I went through all fifty recommendations based on my ‘to-read’ list, and tidied them up: things that genuinely suited my interests I added (seven books in total), and things that didn’t suit, I deleted.

Since then – and it’s been about four hours – my ‘to-read’ recommendations have vanished.

I’m guessing I fall into the last category here, and they’re ‘in the process of generating recommendations’. I would have expected the algorithm to work instantaneously, desperate to populate, but clearly it’s a slower process than that.

So, anyway, I then went and added three more items to the list, bringing the total up to twenty. And immediately it came back.

None of the titles above are listed here based on the three new books I just added, and three of them are a result of the same book, Under the Udala Trees by Chinelo Okparanta.

Goodreads and algorithms

I’ve been using Goodreads to track what I’m reading for the past three years, and I thought I’d investigate the algorithm that drives its recommendations. I’ve added almost 300 books to Goodreads since I started using it, nearly all of which I have read and rated, so there’s a lot of data there on my reading habits. However, I don’t use Goodreads to plan what I’m going to read next – I don’t use it as a wishlist but as a way to record things. Consequently, I currently have just ten items on my ‘to-read’ list:

It’s not the most eclectic list of literature – two non-fiction (on Russian history and climate change), and eight novels which would probably just about fall into the genre of ‘literary fiction’ (as meaningless as that is). But I feel, at least, that this list roughly reflects my reading habits.

The recommendations right now are based on three of the ten books listed: This Changes Everything by Naomi Klein accounts for three of the books, The Romanovs by Simon Sebag Montefiore is the reason why there’s now a picture of Stalin on this blog, and Amy Tan’s The Joy Luck Club is responsible for the fifth one. Interestingly (maybe), I’ve only heard of one of these books. I’m an English Literature librarian, my partner is an English Literature teacher, and my idea of a fun day out is a trip to a bookshop.

Goodreads lets you know why the recommendation is included, which is pretty helpful.

And it also gives some guidance on how to improve your recommendations.

 

The trouble is that I’m not really that interested in reading any of these books. My goal, using the guidance above, is to get my top five recommendations to actually be helpful, to suggest books that I want to read right now (I can think of at least a dozen off the top of my head). I’m going to try to fix it so that the algorithm reflects what I want, rather than the other way around.

With my librarian hat on, it might also be useful to compare how Goodreads recommends books with how a discovery layer (also known as a library catalogue) can recommend articles and other titles – if I have time I’ll look at that too.