[This is a copy of a post on my blog PhysicsStop, sci.waikato.ac.nz/physicsstop, 10 December 2013]
Recently there’s been a bit of discussion in our Faculty about how to get a reliable evaluation of people’s teaching. The traditional approach is the student appraisal. At the end of each paper the students answer various questions about the teacher’s performance on a five-point Likert scale (‘Always’, ‘Usually’, ‘Sometimes’, ‘Seldom’, ‘Never’). For example: “The teacher made it clear what they expected of me.” The response ‘Always’ is given a score of 1, ‘Usually’ is given 2, and so on down to ‘Never’, which is given a score of 5. Averaging the responses to the questions across students gives some measure of teaching success, ranging in theory from 1.0 (perfect) to 5.0 (which we really, really don’t want to see happening).
We’ve also got a general question – “Overall, this teacher was effective”. This is also given a score on the same scale.
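As a minimal sketch of the scoring described above, the Python below maps each response to its score and averages over students. The names `LIKERT_SCORES`, `mean_score` and `appraisal_score` are hypothetical, and the combined score assumes each question’s mean counts equally. That is an assumption for illustration, not necessarily how the University computes its figures.

```python
# Hypothetical mapping of the five Likert responses to scores (1 = best, 5 = worst).
LIKERT_SCORES = {
    "Always": 1,
    "Usually": 2,
    "Sometimes": 3,
    "Seldom": 4,
    "Never": 5,
}


def mean_score(responses):
    """Average score for one question across all students who answered it."""
    if not responses:
        raise ValueError("need at least one response")
    scores = [LIKERT_SCORES[r] for r in responses]
    return sum(scores) / len(scores)


def appraisal_score(responses_by_question):
    """Average of the per-question means.

    responses_by_question: dict mapping question text -> list of responses.
    Assumption: every question is weighted equally, whatever its response count.
    """
    if not responses_by_question:
        raise ValueError("need at least one question")
    means = [mean_score(rs) for rs in responses_by_question.values()]
    return sum(means) / len(means)
```

A paper where every student answers ‘Always’ to every question comes out at exactly 1.0, the “perfect” score mentioned above.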
A question that’s been raised is: Does the “Overall, this teacher was effective” score correlate well with the average of the others?
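One way to put a number on “correlate well” is Pearson’s correlation coefficient across papers, using one (overall score, average of the other questions) pair per paper. The sketch below is an illustration under that assumption, not the Faculty’s method, and `pearson_r` and `mean_abs_difference` are hypothetical names. Correlation only says the two scores rise and fall together, so the mean absolute difference is included as a check that they are also similar in size.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # e.g. every paper scored exactly 1.0: no spread, so r is undefined
        raise ValueError("correlation undefined when a sequence is constant")
    return cov / math.sqrt(var_x * var_y)


def mean_abs_difference(xs, ys):
    """Average size of the gap between paired scores (0 means identical)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty sequences of the same length")
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)
```

Here `xs` would hold each paper’s ‘overall’ score and `ys` the same paper’s average over the other questions. An r close to 1 together with a small mean absolute difference would support the “clear yes” below.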
I’ve been teaching for several years now and have a whole heap of data to draw from. So I’ve been analyzing it (from 2008 onwards), and, in the interests of transparency, I’m happy for people to see it. For me, the answer to “does a single ‘overall’ question get a similar mark to the averaged response of the other questions?” is a clear yes. The graph below shows the two scores plotted against each other for different papers that I have taught. For some papers I’ve had a perfect score: 1.0 from every student for every question. For a couple, the scores have been dismal (above 2 on average):
What does this mean? That’s a good question. Maybe it’s simply that a single question is as good as a multitude of questions if all we are going to do is to take the average of something. More interesting is to look at each question in turn. The questions start with “the teacher…” and then carry on as in the chart below, which shows the responses I’ve had averaged over papers and years.
Remember, low scores are good. So what does this tell me? Probably not much that I don’t already know. For example, anecdotally at any rate, “The teacher gave me helpful feedback” is a question on which many lecturers get their poorest scores (highest numbers). This may well be because students don’t realize they are getting feedback. I have colleagues who, when they give oral feedback, will prefix what they say with “I am now giving you feedback on how you have done” so that it’s recognized for what it is.
So, another question: how much have I improved in recent years? Surely I am a better teacher now than I was in 2008. I really believe that I am, so my scores should be heading towards 1. Well, um, maybe not. Here they are. There are two lines: the blue line is the response to the question ‘Overall, this teacher was effective’, averaged over all the papers I took in a given year; the red line is the average of the other questions, averaged over all the papers. The red line closely tracks the blue, which shows the same effect as the first graph: the two correlate well.
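The yearly lines can be sketched as a simple group-by-year average over papers. This is a hypothetical sketch: `yearly_means` is an illustrative name, and it assumes each paper counts equally in its year’s average, regardless of how many students responded.

```python
from collections import defaultdict


def yearly_means(records):
    """Average scores per year over all papers taught that year.

    records: iterable of (year, overall_score, others_mean) tuples, one per paper.
    Returns {year: (mean overall score, mean of the other questions)}.
    Assumption: every paper has equal weight within its year.
    """
    # Running totals per year: [sum of overall, sum of others, paper count]
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for year, overall, others in records:
        acc = totals[year]
        acc[0] += overall
        acc[1] += others
        acc[2] += 1
    return {
        year: (sum_overall / n, sum_others / n)
        for year, (sum_overall, sum_others, n) in sorted(totals.items())
    }
```

Plotting the first element of each tuple against the year would give the blue line, and the second element the red line.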
So what’s happening? I did something well around 2010, but since then it’s gone backwards (with a bit of a gain this year, though not all of this year’s data has been returned to me yet). There are a couple of comments to make. In 2010 I started on a Post Graduate Certificate of Tertiary Teaching, and I put a lot of effort into it. A couple of the major tasks involved implementing and assessing a teaching intervention aimed at improving student performance. I finished the PGCert in 2011. That seems to have helped my scores, in 2010 at least. A quick perusal of my CV, however, will tell you that this came at the expense of research outputs: not a lot of research was going on in my office or lab during that time. And what happened in 2012? I had a period of study leave (hooray for research outputs!) followed immediately by a period of parental leave. Unfortunately, I still had the same amount of teaching to do, and it got squashed into the rest of the year. Same amount of material, less time to do it, poorer student opinions. It seems a logical explanation, anyway.
Does all this say anything about whether I am an effective teacher? Can one use a single number to describe it? These are questions that are being considered. Does my data help anyone to answer these questions? You decide.