
Artificial Intelligence and marking: pitfalls (2022 Update)

[Image: Assessment machine, by Terry Freedman]


Introduction

Things have moved on a bit since I wrote the article below in 2018. When I say “moved on”, I mean the technology has become better at generating intelligent-looking text; at least, better in the sense of producing text that looks convincingly human. The pedagogical issues remain, as encapsulated in the quote from John Warner (below). Mike Sharples has written recently about his experiments with AI essay generation, and how it might be used in a teaching and learning context. He has a good stab at that, though I’m not entirely convinced, at least not from the Twitter thread (which I’ve embedded at the end of this article). In it he says that anyone can sign up for the GPT-3 transformer. From what I can see, you need to be a researcher, or have a lot of money to spend, or a massive amount of computer RAM. I may be wrong, of course, and there are open source alternatives available, although even then the computing requirements seem quite demanding.

A big difference between the essay generator I found and the AI approach is that the essay generator does its thing with no input whatsoever. The AI, on the other hand, takes a prompt you supply as its starting point.

The original article: computer-generated articles assessed by artificial intelligence

If, like me, you misspent your youth reading superhero comics, you’ll probably know that Spider-Man had a personal philosophy: with great power comes great responsibility. These days, we might amend that slightly: with great computing power comes great responsibility.

AI marking

One such development is artificial intelligence (AI) that can mark student essays accurately. It can do this quite easily, once it has been fed enough correctly graded essays to be able to judge an essay it has never seen before. (As it happens, automated marking is available now, but (a) only for questions to which the program has been given the answer, and (b) for essays, where the marking can probably be gamed by using a random essay generator. More on that in a moment.)
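To make that concrete, here is a minimal sketch of the general supervised-learning approach such systems broadly take: train a model on essays that human markers have already graded, then ask it to predict marks for essays it has never seen. Everything in it (the data, the features, the choice of model) is an illustrative assumption of mine, not any real marking product’s method.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Hypothetical training set: essays with marks awarded by human examiners
train_essays = [
    "Supply and demand interact to determine prices in competitive markets.",
    "Prices is when you buy things and stuff happens to the money.",
    "Market equilibrium occurs where the supply and demand curves intersect.",
]
train_marks = [85, 40, 90]

# Turn each essay into word-frequency features and fit a linear model
model = make_pipeline(TfidfVectorizer(), Ridge())
model.fit(train_essays, train_marks)

# Predict a mark for an unseen essay: the output is a number,
# with no explanation attached
print(model.predict(["Prices emerge from the interaction of buyers and sellers."]))
```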

This all sounds wonderful, but there are potential problems that we really ought to be talking about now.

The black box problem

The first is that AI as it works at the moment is a black box. It reaches conclusions in a way that is hidden from view. In other words, we often don’t know how the program produced the result it did. Indeed, as Rose Luckin points out in her latest book, Machine Learning and Human Intelligence, the program itself doesn’t know how it reached the conclusion. It has no self-awareness or meta-cognition: it doesn’t actually know how it ‘thinks’.

This means that, from a philosophical point of view, we are prepared to take the word of a program that can process data much quicker than we ever could, but which has no idea what it’s doing. Unfortunately, even if you have little time or patience for philosophical considerations, there are practical pitfalls too.
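To see what ‘black box’ means in practice, we can open up the toy scorer sketched above. Even with complete access to the fitted model, what we find is one numeric weight per vocabulary term: numbers, not reasons. (Again, this is my illustrative sketch, not a real marking engine.)

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Fit a tiny scorer, as in the earlier sketch (made-up data again)
essays = [
    "Supply and demand interact to determine prices.",
    "Prices is when you buy things and stuff happens.",
]
model = make_pipeline(TfidfVectorizer(), Ridge()).fit(essays, [85, 40])

# The model's entire 'reasoning': a weight for each vocabulary term.
# We can print them all, but they are not an explanation a teacher
# could act on or a student could learn from.
vocab = model.named_steps["tfidfvectorizer"].get_feature_names_out()
for term, weight in zip(vocab, model.named_steps["ridge"].coef_):
    print(f"{term!r}: {weight:+.4f}")
```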

Automation bias

Allow me to introduce something known as ‘automation bias’. This is where people trust technology more than they trust a human being. I came across a good example of this a few years ago, when I was inspecting the computing department of a school. The assessment program they were using took the students’ answers to test questions, and then told the teacher what ‘level’ each student was on. There was no indication of how it had worked those levels out.

A teacher showed me two graphs of his students’ achievement, as measured at the start and the end of a term, using that program:

[Image: before-and-after assessment graphs. Caption: “See, it’s gone up.” Picture credit: Assessment graphs, by Terry Freedman]

“See?” he said. “The numbers have gone up!”

“Yes,” I said, “but what do the numbers actually mean?”

He looked incredulous that someone could actually ask such a stupid question. “Who cares? They’re higher, aren’t they?”

That’s a great example of automation bias. When it comes to AI, when the computer tells you an essay is worth a B+, you are inclined to believe it without question. After all, the AI has ‘learnt’ what a good essay looks like, so it must be right. This attitude dramatically undermines the usefulness of an AI system that marks essays. As unlikely as it sounds, one of your students could come up with a completely new theory about, say, Economics. (It has been known: when J. M. Keynes was asked why he had failed his Economics examination at Cambridge, he replied that it was because he knew more about Economics than his professors.) Since the AI has learnt what the ‘correct’ answer is, it will mark the student’s essay as wrong. Imagine what would have happened (or not happened) had Newton, Copernicus or Darwin been assessed by an automated essay marker.

What are the students’ misconceptions?

A related danger is that, if the AI is correctly marking the essay without any input from a teacher, the latter has no opportunity to see what misconceptions the student has developed. If you believe, as I do, that the purpose of education is learning stuff, then this process entirely misses the point. Of course, if the purpose of ‘education’ is to give students’ work grades, I suppose it’s fine. (In which case, I think you’ll enjoy, and find useful, 6 Ways To Respond To Requests For Pointless Data.)

I’m reading a book at the moment called Why They Can’t Write, by John Warner. In it he poses a question that goes right to the heart of the matter:

If an essay is written and no one is there to read it, can it be considered an act of communication?

I think most people would have to answer “No”, which kind of renders the whole exercise pointless. Indeed, to prove the point (or not), I looked for a random essay generator on the web. There seem to be plenty, and what most of them do is the following:

  1. Invite you to type in a few parameters, such as subject matter and preferred length.

  2. Find bits of text on the subject.

  3. Change a load of words using synonyms, in order to get round plagiarism checkers. (See the sketch after this list, and the note on plagiarism checkers below.)

  4. Find citations to use.
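Here is a minimal sketch of step 3, the synonym-swapping ‘spinning’ these generators appear to rely on. The SYNONYMS table and the spin function are my own inventions for illustration; real spinners presumably draw on a full thesaurus or a language model.

```python
import random

# Toy synonym table, purely for illustration
SYNONYMS = {
    "important": ["significant", "crucial", "vital"],
    "shows": ["demonstrates", "indicates", "reveals"],
    "use": ["utilise", "employ", "apply"],
    "big": ["large", "substantial", "considerable"],
}

def spin(text: str) -> str:
    """Swap known words for random synonyms, keeping trailing punctuation."""
    out = []
    for token in text.split():
        word = token.rstrip(".,;:!?").lower()
        if word in SYNONYMS:
            trailing = token[len(word):]  # punctuation, if any
            out.append(random.choice(SYNONYMS[word]) + trailing)
        else:
            out.append(token)
    return " ".join(out)

print(spin("The evidence shows that it is important to use big datasets."))
```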

I can’t reproduce the essays those sites generate because they’re behind a paywall. However, I found a computer science essay generator, created apparently for amusement, and this is what it came up with:

"‘My’ essay

See this content in the original post

‘My’ grade for the essay

I then ran it through an automated essay marker, which awarded it a grade of 94%.

To combat automation bias, and to avoid this kind of nonsense, schools need to ensure that the role of human beings is not diminished to the extent that AI rules with no questions asked. Teachers and senior leaders must feel they have the confidence to question what the AI program is saying. Unlike people, computers don’t have empathy, and they don’t understand nuance.

My views on automated marking

Am I against all forms of automated marking? No. But it should be used as a conversation starter with students, not as an alternative to a conversation. In my article From AM To AI -- Or Why Teachers Should Embrace The Robot Revolution, published in February 2018, I wrote:

I was reading an article by Matthew Syed recently entitled Artificial intelligence will change the world, but it can’t win at darts. The article is behind a paywall unfortunately, but the nub of what he was saying was that darts looks like exactly the kind of thing that can be automated. So many things are fixed -- the size and weight of the dart, the position of the dartboard, the distance to the dartboard -- that it shouldn't take AI long to work out the optimum trajectory and velocity and so on when throwing the dart. It turns out, however, that what human darts players do is make very subtle adjustments according to variations in temperature, pressure and the slipperiness of the dart.

I believe that teachers, good ones at least, possess an analogous ability to judge a situation and respond accordingly. And in my opinion, they'd be able to do so even better if they had access to the sort of wide-ranging and deep analysis that AI is able to provide.


A note on plagiarism checkers

I wonder why people think that teachers need plagiarism checkers. As a teacher I could always tell if students had copied things out of books.

For one thing, the dramatic departure from their usual writing style was a bit of a give-away.

Secondly, I was familiar with the conventional wisdom promulgated by the standard textbooks in my subject.
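For what it’s worth, the simplest plagiarism checkers appear to work by comparing overlapping word sequences between two documents, which is exactly why the synonym-swapping trick described earlier slips past them. A minimal sketch, with made-up example texts:

```python
def shingles(text: str, n: int = 3) -> set:
    """Return all n-word sequences ('shingles') in the text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two texts' shingle sets."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 0.0

source = "the theory of supply and demand explains market prices"
copied = "the theory of supply and demand explains market prices"
spun   = "the model of supply and demand clarifies market costs"

print(similarity(source, copied))  # 1.0 -- an exact copy is flagged
print(similarity(source, spun))    # about 0.17 -- the spun version slips through
```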


In the case of grades, teachers need to feel they have the right to question unexpectedly bad marks (or unexpectedly good marks). If the student whose essay is marked as grade F is usually a high-flier, then it’s better to look into it rather than meekly accept the computer’s decision.
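That sanity check is simple enough to automate as a first-line flag, though the whole point is that a human, not the flag, makes the final call. A minimal sketch, with a tolerance I have made up for illustration:

```python
def needs_human_review(new_mark: float, past_marks: list[float],
                       tolerance: float = 15.0) -> bool:
    """Flag a mark that departs sharply from the student's track record."""
    if not past_marks:
        return True  # no history, so a teacher should look anyway
    average = sum(past_marks) / len(past_marks)
    return abs(new_mark - average) > tolerance

# A usually high-flying student suddenly gets an F-grade mark:
print(needs_human_review(35, [78, 82, 75]))  # True -- worth looking into
```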

The era of autonomous AI in schools may be some way off, but is probably closer than we might think. In a situation in which the computer is crucial to many key decisions, how will you ensure that those decisions can be questioned?

Mike Sharples’ Twitter thread:

This is the thread I mentioned right at the start of this article. It is embedded in the original post.



If you found this article interesting or useful (or both), why not subscribe to my free newsletter, Digital Education? It’s been going since the year 2000, and has slow news, informed views and honest reviews for Computing and ed tech teachers — and useful experience-based tips.