Analysing the tweetorial, or why we shouldn’t focus on subjectivity

Two disclaimers before getting started:

  1. I mentioned in a blog post last week that regrettably I’d had to miss the tweetorial, and was only able to cursorily glance through some of the later tweets once it was all over. This absence, and my subsequent uncertainty about how it unfolded, strongly influenced this blog post as well as the padlet I created.
  2. I’ve noticed that I tend to lean a little too heavily on the literature, critiquing others rather than using my own critical voice. I know that’s normally OK, and it’s just a question of balance, but here I’m trying to counter that. No references!

OK. Here goes.

The literal answer to Jeremy and James’ first question: “how has the Twitter archive represented our tweetorial?” is reasonably simple. The archive has stored tweets which used a predetermined hashtag, and specific tweet metadata, in a way which is linear and yet unfinished. It has used the tweets – or, at least, specific elements of them – to quantify behaviour and activity. This might allow us (or a computer) to extrapolate and draw conclusions. In this sense, it all seems rather objective.

And yet it isn’t objective. The choices about which data was collected and attached, and which was not, were subjective. They were subjective regardless of who made them – human, computer or both. The visual representation of the data is also mediated and subjective – the clue is in the word ‘representation’. It’s necessarily, inescapably reductive. The key point is that this isn’t fundamentally bad. The data being subjective doesn’t make it meaningless or inaccurate or untrustworthy. Why privilege impartiality anyway?

And, moreover, the charge of subjectivity is easily dealt with. The quantified facts the archive presents are of course not the ‘whole picture’ (whatever that is). The conclusions we draw ought to be questioned. We should ensure that the non-quantifiable (tiredness, workload, scepticism) is considered too. There is scope for multiple interpretations, all at the same time (as I tried to show in the padlet). Whether the way the analytics are presented has educational value depends on the individual, so we can’t say conclusively either way. The analytics will motivate some while demotivating others. They will give some confidence while causing others to question themselves. There is space for all of these attitudes concurrently. The archive can’t tell us whether learning happened, or didn’t happen, or the quality of it: it was never intended to do so.

So, for me, the problem – the danger, even – with analytics like this isn’t that they’re subjective. It lies instead in their inescapable finality, even while data collection is ongoing. That finality easily becomes ‘authority’, and the platform doesn’t particularly lend itself to that authority being questioned. Given the sheer number of tweets, searching and retrieving them is not simple. You can’t retrospectively change the choices made about which data is collected, if you can change them at all. The platform does not allow it. That’s another choice, by the way. And again, it doesn’t really matter who made it.

Our ability to answer the questions set by Jeremy and James (or in my case, inability) is so fundamentally predicated on the fact that it happened last week. Our ability to identify where the data collected is subjective, and where or why this is problematic, is based on the same thing. We were there, we can remember it, so we can interpret it. And yet the fixedness, the finality, and the stability of the archive has to be compared with the fleetingness of the qualitative information and individual interpretation that we’re using to gloss it. Right at this moment, we can question the archive. Right at this moment, we know better. We have authority. But it’s temporary. After all, the data will last longer.


Analytics padlet

I had a go at making a padlet as a way of commenting on the tweetorial analytics. I’ve taken five of the separate ‘analytics’, and offered sometimes conflicting and sometimes totally contradictory interpretations. Most of them are reasonable, though, if a little tongue-in-cheek. Some of them are complimentary, some less so and some potentially rather damaging.

This is born of my absence during the tweetorial, and the subsequent and fundamental decontextualisation, for me, of the data provided. But I also don’t want to suggest that the analytics are objective and that only the interpretation is subjective – I take this argument up later in my blog post.

So click the image above to see it, or go here.

Goodreads and algorithms, part the definite last

Good recommendation algorithms are really (really!) difficult to do right. We built Goodreads so that you could find new books based on what your friends are reading, and now we want to take the next step to make that process even more fruitful.

This quotation is from the Goodreads blog, in a post written by Otis Chandler, the Goodreads CEO. The “next step” to which he refers is Goodreads’ acquisition of the small start-up Discovereads, which was developing algorithms for book recommendations. Discovereads used multiple algorithms, based on book ratings from millions of users and on tracked patterns of how people read, how they rate, the choices they make and what might influence them.

Its approach is roughly based on the sorts of algorithms that drive Netflix, though there’s an obvious difference between the two platforms, and it’s not the type of content. Goodreads is neither a publisher nor a producer of its own content; it isn’t promoting its own creations but can instead influence users to spend money in a way that Netflix, which works to a different economic model, may not. Chandler admits this: one of the goals in adopting the Discovereads algorithm is to improve marketing strategies, ensuring that sponsored content (books promoted to users) will be more up their street.

Given this, then, it’s possible to say that the way recommendations work in Goodreads is based on at least three things:

  1. The ratings provided by an individual at the point they sign up – part of the process of getting a Goodreads account is adding genres you’re interested in, and “rating” a (computer-generated) series of books
  2. The algorithms at play are monitoring human patterns of reading and rating and, presumably, analytics and big data collected on what might encourage a person to add a recommended book to their lists (and perhaps, too, to their shopping basket)
  3. The Amazon connection: the fact that Goodreads isn’t providing its own content, and that it’s owned by Amazon, creates a particular sort of economic link. Not only does it incentivise Goodreads to promote specific commercial content, but it means that Goodreads can influence how and where consumers’ money is spent. Presumably, analytics on how often Goodreads’ recommendations lead to a purchase are fed back into the recommendation system to improve it.

Knox (2015) suggests that actor-network theory might account for the “layers of activity involved” in the complex, often hidden, and often automated ways in which humans and non-humans interact in the development and deployment of algorithms. One of the principal benefits of this approach (and there are many) is that it inherently assumes that the human and non-human are working together. This is not always self-evident, and the quotation at the top of this post suggests that the two are seen to be in opposition. The incorporation of the Discovereads algorithm, it is implied, will lead to a fundamentally different way of generating recommendations. It signals a move from human-generated recommendations (what your friends are reading) to computer-generated ones, based on this algorithm.

The responses to Chandler’s blog post suggest that Goodreads users presuppose this binary as well. The posts below, for example, clearly espouse the benefits of both ‘routes’ to recommendations. But they suggest that recommendations are either human- or computer-generated: there’s no acknowledgement of non-human interference in existing friend-generated recommendations, nor of any human influence in the computer-generated ones. It’s a code-based version of the binary we’ve encountered a lot in the past eight weeks: the perception that technological instrumentalism and technological determinism are the only options.

The reality, of course, is that it’s a false binary. It’s not a choice of human or non-human but – as Knox outlines – both are present. The difference, then, to which Chandler refers, the change heralded by the acquisition of Discovereads, isn’t necessarily in the source of the content, but in the perception of that source. It’s in the perceived transparency or hiddenness of the algorithm.

References

Chandler, O. (2011). Recommendations And Discovering Good Reads. Retrieved 11 March 2017, from http://www.goodreads.com/blog/show/271-recommendations-and-discovering-good-reads
Knox, J. (2015). Critical Education and Digital Cultures. In M. Peters (Ed.), Encyclopedia of Educational Philosophy and Theory (pp. 1–6). Singapore: Springer Singapore. https://doi.org/10.1007/978-981-287-532-7_124-1

Goodreads and algorithms, part the fourth

In this (probably) last instalment of experimenting with the Goodreads algorithm, I’m playing specifically with biases. Joy Buolamwini, in the TED talk I just watched (and posted), says this:

Algorithmic bias, like human bias, results in unfairness.

It would be hard, I think, to really test the biases in Goodreads, and one experiment is certainly not enough to draw conclusions from, but let’s see what happens. I’ve removed from my ‘to-read’ shelf all books written by men. I’ve added, instead, 70 new books, mostly but not exclusively from Goodreads lists of ‘feminist’ or ‘glbt’ books [their version of the acronym, not mine]. Every single book on my ‘to-read’ shelf is now written by someone who self-identifies as female.

And after a little while (processing time again), my recommendations were updated:

Of the top five recommendations, 1 is written by a man (20%); of the fifty recommendations in total, 13 are written by men (26%).

I then reversed the experiment. I cleared out the whole of the ‘to-read’ shelf, and instead added 70 books, nearly exclusively fiction, and all written by people who identify as male.

And again, a slight pause for processing, and the recommendations update. Here are my top five:

Two of the top five recommended books are written by women (40%), and of the 50 in total, 7 are by women (14%).

So when the parameters are roughly the same, and with the very big caveat that this may be a one-off, it seems that Goodreads recommends more books by men than by women. Is this bias, or just coincidence? It’s difficult to tell from one experiment, but it may be worth repeating to learn more.
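To make that comparison a little more concrete, here is a rough Python sketch that recomputes the share of ‘cross-over’ recommendations (books by authors of the other gender to everything on the shelf) from the counts reported above, and then runs a pooled two-proportion z-test. The test is my own addition, not something Goodreads provides, and it assumes each list of 50 recommendations behaves like independent random draws, which it almost certainly doesn’t, since every recommendation comes from the same shelf and the same algorithm. Treat it as a sanity check on the caveat, not as evidence either way.

```python
from math import erfc, sqrt

# Counts reported in the two experiments above (50 recommendations each).
# "cross_over" is an illustrative label: recommendations by authors of the
# other gender to every book on the 'to-read' shelf.
experiments = {
    "women-only shelf": {"cross_over": 13, "total": 50},  # books by men
    "men-only shelf": {"cross_over": 7, "total": 50},     # books by women
}


def share(cross_over: int, total: int) -> float:
    """Percentage of recommendations that cross over."""
    return 100 * cross_over / total


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z statistic and two-sided p-value.

    Uses the normal approximation and assumes independent draws,
    which recommendation lists do not really satisfy.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


for name, counts in experiments.items():
    print(f"{name}: {share(counts['cross_over'], counts['total']):.0f}% cross-over")

z, p = two_proportion_z(13, 50, 7, 50)
print(f"z = {z:.2f}, two-sided p = {p:.2f}")
```

Run as written, this prints 26% and 14%, then z = 1.50 and p = 0.13. Under those (generous) assumptions, a gap that size could easily arise by chance with only 50 recommendations per list, which fits the hunch that the experiment needs repeating before calling it bias.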

Finally, one weird thing. Two books appeared on the full recommendations list in both experiments. One is Anthony Powell’s A Dance to the Music of Time which, given the general gravitas of the books I added in both experiments, is fairly understandable. The other, though, is this:


Bill Cosby’s ‘easy-to-read’ story, aimed at children, is included because I added John Steinbeck’s East of Eden? Unfortunately I have no idea why it was in the women-only list, because I didn’t check at the time, but that feels like a really, really peculiar addition.

Goodreads and algorithms, part trois

So far, the Goodreads recommendations based on my ‘to-read’ pile haven’t been that great, so I’ve done a few more experiments.

First, I removed from my ‘to-read’ list anything that didn’t strictly fall into the category of literary fiction or reasonably highbrow non-fiction, and I added six books along a similar theme: Ulysses by James Joyce, Finnegans Wake, also by Joyce, Infinite Jest by David Foster Wallace, The Trial by Kafka, À la recherche du temps perdu by Proust (the French version, no less), and The Brothers Karamazov by Dostoevsky.

Not much changed, mainly because the recommendations don’t update immediately – again I’m noticing a delay in the algorithm working. But I noticed something else when deleting things from the list. Goodreads automatically ranks the books on the list in the order in which you added them. This makes complete sense – I expect many people choose their reading in a far less haphazard way than I do. And in any case, it explains why books about climate change were so prominent in the recommendations – This Changes Everything was first on my list.

Goodreads also allows you to edit the ranking, so I’ve moved the two James Joyce books I added to positions #1 and #2, and I’ve moved the climate change book to #20.

Again, nothing happened. The recommendations were still based on books that I had now removed from the list. I refreshed the page and logged out and back in, but nothing changed. So I went back and added a seventh book: Robin Stevens’ Murder Most Unladylike, which is aimed at 10-year-olds. And new recommendations appeared.

ALL of them are based on the items I added earlier (not the most recent addition) – you can see the first two are about Proust, and yet NONE of them are based on the James Joyce books I moved to top ranking on the list.

Goodreads and algorithms, part 2

Earlier today I went through all fifty recommendations based on my ‘to-read’ list, and tidied them up: books that genuinely suited my interests I added (seven in total), and those that didn’t, I deleted.

Since then – and it’s been about four hours – my ‘to-read’ recommendations have vanished.

I’m guessing I fall into the last category here, and they’re ‘in the process of generating recommendations’. I would have expected the algorithm to work instantaneously, desperate to populate, but clearly it’s a slower process than that.

So, anyway, I then added three more items to the list, bringing the total up to twenty. The recommendations came back immediately.

None of the titles above were recommended on the basis of the three new books I had just added, and three of them stem from the same book, Under the Udala Trees by Chinelo Okparanta.

Goodreads and algorithms

I’ve been using Goodreads to track what I’m reading for the past three years, and I thought I’d investigate the algorithm that drives its recommendations. I’ve added almost 300 books to Goodreads since I started using it, nearly all of which I have read and rated, so there’s a lot of data there on my reading habits. However, I don’t use Goodreads to plan what I’m going to read next – I don’t use it as a wishlist but as a way to record things. Consequently, I currently have just ten items on my ‘to-read’ list:

It’s not the most eclectic list of literature – two non-fiction (on Russian history and climate change), and eight novels which would probably just about fall into the genre of ‘literary fiction’ (as meaningless as that is). But I feel, at least, that this list roughly reflects my reading habits.

The recommendations right now are based on three of the ten books listed: This Changes Everything by Naomi Klein accounts for three of them, The Romanovs by Simon Sebag Montefiore is the reason why there’s now a picture of Stalin on this blog, and Amy Tan’s The Joy Luck Club is responsible for the fifth. Interestingly (maybe), I’ve only heard of one of these books, even though I’m an English Literature librarian, my partner is an English Literature teacher, and my idea of a fun day out is a trip to a bookshop.

Goodreads lets you know why the recommendation is included, which is pretty helpful.

And it also gives some guidance on how to improve your recommendations.


The trouble is that I’m not really that interested in reading any of these books. My goal, using the guidance above, is to get my top five recommendations to actually be helpful, to suggest books that I want to read right now (I can think of at least a dozen off the top of my head). I’m going to try to fix it so that the algorithm reflects what I want, rather than the other way around.

With my librarian hat on, it might also be useful to compare how Goodreads recommends books to how a discovery layer (also known as a library catalogue) can recommend articles and other titles – if I have time I’ll look at that too.

Betty Sneezes

So here’s my digital artefact! I must apologise for its poor quality – not only have I been very stretched for time this week, but I’m a horribly unvisual person and I can’t do images. Which explains why I cheated and put a voiceover on it. Sorry about that.

Just in case you can’t see the video, it’s meant to be a commercial for a product called Betty Sneezes, which I’ve totally made up. Betty is a robot who can detect airborne rhinovirus, which causes up to 80% of common colds. Betty can alert you to this, allowing you to make a swift exit and therefore remain healthy. The end line of the commercial is: “you’ll never miss work again”.

I’ve wanted right from the start to make a commercial for a product – I haven’t personally given much thought to the intersection between cybercultures and consumerism but I suspect it is totally inescapable. Both from a practical perspective and an ethical one, technology can’t be economically neutral. This too is raised in the final line – while a common cold detector sounds pretty magnificent to me, I wanted to temper this with a slightly more pernicious message about human productivity.

Betty is a skeuomorph, rather than a cyborg. I did consider ‘inventing’ a chip or something that could be inserted into humans instead, but (a) it didn’t work as well visually and (b) I wanted to make a point about infection. One of the themes in this cybercultures block is what makes humans human, and whether cyborgs render debates over the differences between humans and technology completely redundant. Cathy Hills did an ingenious mentimeter poll to see what we thought the difference was, and I found the split of the results really interesting. But I wondered if, ultimately, the difference is going to be how diseases spread between us, so I was playing a little bit on the word ‘virus’.

Finally, I just wanted to make it really clear that everything I’ve used in the video is available from Pixabay or Pexels and licensed under CC-0. No copyright infringements here 🙂 And, very much a hat tip to Cathy for her excellent poll!