Maths CAA Series Mar 2002: Assessing ICT Assessment in Mathematics
by G. R. McGuire (g.r.mcguire@ma.hw.ac.uk) and M. A. Youngson (m.youngson@ma.hw.ac.uk), Department of Mathematics, Heriot-Watt University, Edinburgh EH14 4AS.
The growth of ICT (information and communication technology) assessment in recent years means that it is now being carefully examined by examination boards for possible use [1, Harding and Raikes]. If ICT assessment is to become a significant part of public examinations, both candidates and examiners will want reassurance that ICT examinations and their results are comparable with those of traditional examinations taken on paper. The most obvious first question is whether it is possible to take a paper-based examination and set and mark exactly the same examination by computer. At present, the examinations that come closest to satisfying this criterion are those consisting of multiple choice questions, and there have been several studies comparing performance in computer-based and paper-based multiple choice examinations, for example [2, Lee and Weerakoon]. However, most mathematics examinations do not contain multiple choice questions. Typically, a question in a paper-based examination requires the candidate to carry out some computations to obtain an answer in the form of a mathematical expression. Converting such a question for an ICT examination may require rewording it to fit the computer assessment package, and as a result the marking process may also change. Therefore, at present, the answer to the obvious first question is no. Nevertheless, if the question is modified to ask whether the marks obtained in assessment by computer can be compared with those obtained in assessment on paper, some progress can be made. This article shows how this revised question might be tackled for mathematics examinations and summarises some results found so far in exploring it. The CUE assessment package was used to construct the ICT examinations [3, Beevers; 4, CALM Group].
To illustrate some differences between paper examinations and ICT examinations, let us look step by step at what is involved in the process of a mathematics examination. We assume that the syllabus, learning outcomes, performance criteria and other objectives have already been identified, and start at the point where questions are to be set to test these objectives. The first step in our analysis is therefore to set questions that test the candidates' knowledge of the syllabus. These are then packaged together in some format to form an examination, which is delivered to the candidates. The candidates have to read and understand the questions and answer as many of them as they can. The answers are marked and the results are used to inform each candidate of their success or otherwise in the examination. Although it may look as if we are starting with the well-known product (the paper examination) and comparing the new product (the ICT examination) with it, much of what follows is equally valid for comparing the two products on their own merits. If an ICT examination were being constructed without reference to a paper-based examination, some ground rules for its construction have already been established [5, Bull and McKenna; 6, Beevers et al.; 7, Lawson].
2.1 Setting the questions
There is usually little difficulty in producing on the computer screen any mathematical symbols or diagrams that could appear in a paper-based examination. However, questions may sometimes need to be changed because the computer cannot mark the required form of answer, often because of limitations of the assessment package. For example, consider the following question on paper.
"The points (1,3) and (2,5) lie on a straight line. What is the equation of the line?"
The candidate can give the answer not only as y = 2x + 1, but also as y - 1 = 2x, or as any non-zero multiple of the expected equation, such as 2y = 4x + 2. To avoid such additional possibilities, the computer question could be phrased as follows.
"The points (1,3) and (2,5) lie on a straight line. What is the equation of the line, in the form y = ?"
Many examinations will contain similar questions for which some rewording is likely to be needed.
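The article does not describe how CUE decides that a typed expression matches the expected one. One common technique, assumed here only for illustration, is to compare the candidate's expression with the expected one at randomly chosen values of the variables. The Python sketch below (all function names are hypothetical, and answers are passed as Python functions rather than typed text, to avoid writing a parser) shows why the form "y = ?" is easier to mark: a single comparison of right-hand sides suffices, whereas accepting any equation of the line means testing whether the candidate's equation is a non-zero multiple of the reference.

```python
import math
import random

def reference_rhs(x):
    # Expected answer to the reworded question: the line through (1, 3) and (2, 5)
    return 2 * x + 1

def mark_y_form(candidate_rhs, trials=20, tol=1e-9):
    """Right/wrong marking of an answer given as y = f(x): compare f with the
    reference at random sample points (an assumed technique, not CUE's)."""
    rng = random.Random(0)
    for _ in range(trials):
        x = rng.uniform(-10, 10)
        if not math.isclose(candidate_rhs(x), reference_rhs(x), rel_tol=tol, abs_tol=tol):
            return False
    return True

def mark_general_equation(candidate_g, trials=20):
    """Right/wrong marking of an answer given as g(x, y) = 0, accepted if g is a
    non-zero constant multiple of y - (2x + 1). This covers rearrangements and
    multiples such as y - 1 = 2x or 2y = 4x + 2, but not every equation that
    describes the same line."""
    rng = random.Random(1)
    ratio = None
    for _ in range(trials):
        x, y = rng.uniform(-10, 10), rng.uniform(-10, 10)
        ref = y - reference_rhs(x)
        if abs(ref) < 1e-6:
            continue  # too close to the line for a stable ratio
        r = candidate_g(x, y) / ref
        if ratio is None:
            if abs(r) < 1e-9:
                return False  # g vanishes where the reference does not
            ratio = r
        elif not math.isclose(r, ratio, rel_tol=1e-6):
            return False
    return ratio is not None

print(mark_y_form(lambda x: 2 * x + 1))                         # y = 2x + 1
print(mark_y_form(lambda x: 2 * x - 1))                         # y = 2x - 1
print(mark_general_equation(lambda x, y: (y - 1) - 2 * x))      # y - 1 = 2x
print(mark_general_equation(lambda x, y: 2 * y - (4 * x + 2)))  # 2y = 4x + 2
```

The general checker needs an extra idea (proportionality) and still misses some valid forms, which is one reason rewording the question can be simpler than extending the marker.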
2.2 Packaging the questions to form the examination
For a paper-based examination, several questions usually appear on each page of the question paper, with the working and answers written in a separate answer booklet. An alternative format has fewer questions per page, with space for rough working and answers on the question paper itself. In ICT examinations, less information can be seen at any one time than in paper-based examinations because of the size of the computer screen. Typically only one question is visible to the candidate at a time, and the candidate types the answer somewhere on the same screen as the question.
2.3 Answering the questions
In paper-based examinations, candidates use pen or pencil to write out their attempts at the questions, using mathematical symbols that are familiar to them, and the answer (in the same notation) comes at the end of the working. In ICT examinations, candidates again use pen or pencil to write out their rough working on paper, but at the end they have to translate their answer into mathematical symbols that the computer recognises and type it on the screen.
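The translation step can change the meaning of an answer if brackets are omitted. The article does not describe CUE's input syntax; the Python sketch below assumes a generic calculator-style linear syntax (^ for powers, * for multiplication, sqrt(...) for square roots), and the function name is hypothetical. It uses `eval` only to keep the sketch short; a real package would use a proper expression parser.

```python
import math

def evaluate_typed_answer(typed, x):
    """Evaluate an answer typed in a hypothetical linear syntax at a given x.
    eval() keeps the sketch short; it is not safe for untrusted input."""
    expr = typed.replace("^", "**")
    # Only x and sqrt are visible to the expression; builtins are disabled.
    return eval(expr, {"__builtins__": {}}, {"x": x, "sqrt": math.sqrt})

# The handwritten answer 1/(2x), typed with and without brackets, at x = 4:
print(evaluate_typed_answer("1/(2*x)", 4))      # 0.125
print(evaluate_typed_answer("1/2*x", 4))        # 2.0, read as (1/2)*x
print(evaluate_typed_answer("sqrt(x^2+9)", 4))  # 5.0
```

A candidate who writes the fraction correctly on paper but types "1/2*x" would be marked wrong, which is the kind of medium effect the experiment below was designed to detect.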
2.4 Marking the answers
In an ICT examination the computer marks an answer either right (awarding full marks) or wrong (awarding no marks). A human marker gives full marks to a correct answer but can also give partial credit to a partially correct answer. Some possible ways of introducing partial credit in ICT examinations are given in [6, Beevers et al.], and these issues are discussed further in [7, Lawson].
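The contrast between the two marking regimes can be made concrete with the line question from section 2.1. The mark scheme in this Python sketch (one mark each for the gradient, the intercept and the final equation) is hypothetical, as are the names `Part`, `all_or_nothing` and `partial_credit`; it is not the scheme used in the experiments or in CUE.

```python
from dataclasses import dataclass

@dataclass
class Part:
    description: str
    marks: int
    correct: bool

def all_or_nothing(total_marks, final_answer_correct):
    # Computer marking as described: full marks for a right final answer, else zero.
    return total_marks if final_answer_correct else 0

def partial_credit(parts):
    # Hypothetical human-style scheme: each step carries its own marks, so a
    # slip in the last step still earns the marks for the earlier steps.
    return sum(p.marks for p in parts if p.correct)

attempt = [
    Part("gradient m = (5 - 3)/(2 - 1) = 2", 1, True),
    Part("substitute (1, 3) to find the intercept", 1, True),
    Part("final equation y = 2x + 1", 1, False),  # arithmetic slip at the end
]
print(all_or_nothing(3, attempt[-1].correct))  # 0
print(partial_credit(attempt))                 # 2
```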
2.5 Assessing the candidates
Because several different factors are involved in comparing paper-based and ICT examinations, it is difficult to obtain a satisfactory comparison between their results: any difference could be due to any one or more of these factors.
However, by breaking the examination process down into its steps and changing the variables one at a time, it is possible to identify which of them, if any, lead to differences in results. In the project that we are currently undertaking, we have carried out several experiments to investigate which variables introduce discrepancies between the results of ICT and paper-based assessment. At present we have data from two experiments. The first deals with the effects of reformatting and of the medium and has been fully analysed; the second concerns partial credit, and its analysis has yet to be completed. In the next section we summarise the results of the first experiment; full details appear in [8, Fiddes et al.].
In the first experiment, we looked at the effect of rewording the questions and the effect of changing the medium. These correspond to the changes discussed in sections 2.1 and 2.2 (rewording) and section 2.3 (the medium). The participants were pupils from two schools who were about to sit the Scottish Qualifications Authority (SQA) Higher Mathematics examination.
3.1 The tests
The pupils took three different tests, supplied by the Scottish Qualifications Authority, each covering most topics in the syllabus. Each contained short-response questions similar to those in Paper 1 of Higher Mathematics. The time allowed for each test was 30 minutes, but all candidates were expected to finish in less time than this, since we wished to ensure that time was not a factor in the experiment. The answers required general mathematical expressions as well as numbers. These tests formed the paper-based tests for our experiment. The questions in each test were then converted into computer test questions in the CUE assessment package [3, Beevers; 4, CALM Group], giving the ICT versions of the tests. Some of the questions had to be reworded, so, in order to distinguish the effects of the change in medium from the effects of rewording, a third type of test was produced. For each test, this was simply a screen dump of the corresponding ICT test, to be done on paper. We called these the reverse translation tests.
3.2 How the tests were to be marked
Since the reverse translation tests had exactly the same wording in each question and exactly the same layout for inserting answers as the ICT tests, comparing the marks from the ICT and reverse translation tests (both marked in the same way) would show whether the medium had a significant effect. Both the original paper-based tests and the reverse translation tests were done on paper, so there was no change of medium between them. However, in marking paper-based tests, working is taken into account and partial credit is given for answers that are partially but not completely correct, as noted in section 2.4. Something similar could be done for the reverse translation tests, since space was provided opposite each question for rough working that could be taken into account in marking. The reverse translation tests were therefore marked in two ways. The first was to mark only the answer, as in the ICT tests; this gave marks (called computer reverse translation marks) that could be compared with the ICT test marks. The second was to mark the rough working as well, giving marks (called reverse translation marks with working) that could be compared with the paper-based test marks. Hence we could investigate the effect of the medium by comparing ICT marks with computer reverse translation marks, and the effect of the rewording by comparing paper-based test marks with reverse translation marks with working.
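The design can be summarised by recording, for each set of marks, the factors from section 2 that it involves. The Python sketch below encodes our reading of sections 3.1 and 3.2; the factor names, their values and the helper `differing_factors` are our own labels, not terminology from the study. It shows that ICT marks and computer reverse translation marks differ only in the medium, whereas paper-based marks and reverse translation marks with working differ in both wording and packaging.

```python
# Factors of section 2, recorded for each set of marks as described in
# sections 3.1 and 3.2 (factor names and values are our own labels).
FORMATS = {
    "paper-based": {
        "wording": "original", "packaging": "paper layout",
        "medium": "paper", "marking": "working",
    },
    "reverse translation with working": {
        "wording": "reworded", "packaging": "ICT screen",
        "medium": "paper", "marking": "working",
    },
    "computer reverse translation": {
        "wording": "reworded", "packaging": "ICT screen",
        "medium": "paper", "marking": "answer only",
    },
    "ICT": {
        "wording": "reworded", "packaging": "ICT screen",
        "medium": "computer", "marking": "answer only",
    },
}

def differing_factors(a, b):
    """Return, sorted, the factors whose values differ between marks a and b."""
    return sorted(k for k in FORMATS[a] if FORMATS[a][k] != FORMATS[b][k])

print(differing_factors("ICT", "computer reverse translation"))
print(differing_factors("paper-based", "reverse translation with working"))
```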
3.3 Running the experiment
Pupils from two schools took part in the experiment. Several weeks beforehand, we set up a trial ICT test (containing questions easier than Higher standard) and visited both schools to give the pupils some practice in entering mathematical answers. The trial test, which contained random parameters, was available to them from the day of the visit until the day of the experiment, so that they could practise as often as they wished. This was the only prior practice the pupils had with ICT tests. At each school, the pupils were divided into three groups with approximately the same mix of gender and mathematical ability, estimated from their previous SQA examination or Higher preliminary examination results. During the experiment, each candidate sat all three tests, each in a different one of the three formats, within a 90-minute period.
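The article does not give the exact allocation of tests to formats. One allocation consistent with the description, in which every group sits every test and every format exactly once, is a cyclic Latin square; the Python sketch below (names hypothetical) builds it for the three groups at one school. Across groups, each test is also taken in each format exactly once, so pupils in different groups can be compared on the same test in different formats.

```python
TESTS = ["Test 1", "Test 2", "Test 3"]
FORMATS = ["paper-based", "ICT", "reverse translation"]

def latin_square_allocation(tests, formats):
    """Cyclic rotation (an assumed allocation): group g sits test t in
    format (g + t) mod n."""
    n = len(formats)
    return {
        f"Group {g + 1}": {tests[t]: formats[(g + t) % n] for t in range(n)}
        for g in range(n)
    }

allocation = latin_square_allocation(TESTS, FORMATS)
for group, plan in allocation.items():
    print(group, plan)
```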
3.4 Marking and analysis
All the tests in all formats were marked, and two comparisons were made: paper-based marks versus reverse translation marks with working, and ICT marks versus computer reverse translation marks. The statistical analysis used matched pairs, formed by pairing pupils from different groups on the basis of their prior mathematical ability. There were 62 matched pairs in total: 59 were matched for both ability and gender, and 3 for ability only. This was more than enough pairs for the analysis to be valid without further assumptions about the data. For each matched pair, the quantity used was the difference between the two pupils' marks on the same test taken in different formats. For example, one matched pair consisted of two female pupils, one of whom sat the ICT version of test 3 while the other sat the reverse translation version of the same test. The value used from their results was the first pupil's ICT mark on test 3 minus the second pupil's computer reverse translation mark on test 3. Each of the two statistical analyses was based on the 62 differences in marks for the matched pairs, either (paper-based mark minus reverse translation mark with working) or (ICT mark minus computer reverse translation mark). In each case the null hypothesis was that the true mean difference was zero, against a two-sided alternative.
For paper-based marks minus reverse translation marks with working, the 95% confidence interval was (-2.745, -0.674), so the hypothesised mean difference of zero lay clearly outside the interval; since the whole interval is negative, paper-based marks were on average lower than reverse translation marks with working. This indicated very strong evidence of a difference due to the rewording. For ICT marks minus computer reverse translation marks, the 95% confidence interval was (-1.043, 1.872). This time zero lay well inside the interval, so there was no evidence of a difference due to the medium.
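The interval calculation for matched pairs can be sketched in Python as follows: form the differences, then take the mean difference plus or minus a critical value times s/sqrt(n). The article does not state whether a t or a normal critical value was used; this sketch uses the normal value 1.96 (with 61 degrees of freedom the t value is about 2.00, giving a slightly wider interval). No pupil-level data are reproduced; the only figures used are the two reported intervals, whose midpoints give the estimated mean differences (about -1.71 and 0.41 marks). Function names are hypothetical.

```python
import math
import statistics

def pair_differences(pairs):
    """pairs: (mark of pupil in format A, mark of matched pupil in format B)."""
    return [a - b for a, b in pairs]

def paired_ci(differences, z=1.96):
    """Approximate 95% interval for the mean difference:
    mean +/- z * s / sqrt(n) (normal approximation)."""
    n = len(differences)
    mean = statistics.fmean(differences)
    half_width = z * statistics.stdev(differences) / math.sqrt(n)
    return (mean - half_width, mean + half_width)

def contains_zero(ci):
    low, high = ci
    return low <= 0 <= high

def mean_from_ci(ci):
    # A symmetric interval is centred on the sample mean difference.
    low, high = ci
    return (low + high) / 2

# The two intervals reported in the article:
reported = {
    "paper-based minus reverse translation with working": (-2.745, -0.674),
    "ICT minus computer reverse translation": (-1.043, 1.872),
}
for label, ci in reported.items():
    print(f"{label}: mean difference {mean_from_ci(ci):.4f}, contains 0: {contains_zero(ci)}")
```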
Similar analyses were done separately for the males and females (as all but 3 pairs were matched for gender): the same conclusions were obtained.
The change from the paper-based format to the reverse translation format with working involved both rewording and repackaging, so the experiment could not show which of these (if either) was responsible for the difference in marks. A further possibility is that, because the candidates did not know that their rough working would be marked, they were less inhibited in what they wrote there and so gained more partial credit. A further experiment is needed to clarify this. The candidates had little experience of sitting computer tests, yet, despite this and despite having to use different mathematical notation when typing answers into the computer, there was no evidence of a difference in results due to the change of medium. This experiment should provide reassurance that, at least in this respect, ICT examinations are comparable with paper-based examinations.
This resolved the question raised in section 2.3, but the medium was only one of several differences identified in section 2. Our second experiment, whose data have yet to be fully analysed, explored the issue of partial credit mentioned in section 2.4. It compared the marks obtained in paper-based examinations that included partial credit with those obtained in ICT examinations with Steps. Steps provide a means of awarding partial credit for questions in ICT examinations [6, Beevers et al.; 7, Lawson]. Even if these marks turn out to be comparable, the question of what is being assessed remains. Steps may give the candidate a strategy for solving a question in the ICT examination, which is not normally given in paper-based examinations, so some ICT examinations would not be an exact alternative to paper-based ones. However, by providing a strategy, Steps may allow some candidates to show skills that would not otherwise be uncovered. Perhaps the question that should be asked is which skills are best examined by an ICT examination and which by a paper-based examination, so that both forms of assessment are used to maximum benefit.
[1] Harding, R. and Raikes, N. (2002) ICT in Assessment and Learning: The Evolving Role of an External Examinations Board. http://ltsn.mathstore.ac.uk/articles/maths-caa-series/
[2] Lee, G. and Weerakoon, P. (2001) The Role of Computer-Aided Assessment in Health Professional Education: a Comparison of Student Performance in Computer-Based and Paper-and-Pen Multiple-Choice Tests, Medical Teacher 23, 152-157.
[3] Beevers, C.E. (2000) Computer Aided Assessment in Mathematics at Heriot-Watt University, Maths, Stats and OR Newsletter 1, 17-19.
[4] CALM Group (2001) CUE Assessment System, http://www.calm.hw.ac.uk/cue.shtmll/
[5] Bull, J. and McKenna, C. (2001) Blueprint for Computer-assisted Assessment, http://www.caacentre.ac.uk/bp
[6] Beevers, C.E., Youngson, M.A., McGuire, G.R., Wild, D.G. and Fiddes, D.J. (1999) Issues of Partial Credit in Mathematical Assessment by Computer, Alt-J 7, 26-32.
[7] Lawson, D. (2001) Computer-Aided Assessment in relation to Learning Outcomes, http://ltsn.mathstore.ac.uk/articles/maths-caa-series/
[8] Fiddes, D.J., Korabinski, A.A., McGuire, G.R., Youngson, M.A. and McMillan, D. (2002) Are Mathematics Exam Results affected by the Mode of
©Copyright, Higher Education Academy - Maths, Stats & OR Network
Maintained by R.L.Surowiec@bham.ac.uk
Last revised: Friday, 04-Jul-2003 10:27:00 BST