These articles have been included in the documentation pendrive.

Assessment of molecular construction in undergraduate biochemistry

Deborah Booth, Robert C. Bateman, Jr., Rudy Sirochman, David C. Richardson, Jane S. Richardson, Steven W. Weiner, Mary Farwell and Cindy Putnam-Evans
J. Chem. Educ., 2005, 82 (12), 1854-1858.
doi: 10.1021/ed082p1854

Undergraduate students in nine classes at eastern and southeastern universities in the U.S. were evaluated regarding their attitudes towards the use of molecular visualization in biochemistry lecture courses. All classes used the same visualization software (Kinemage) in lecture and homework. Approximately two-thirds of these students, the treatment group, constructed or "authored" a series of annotated images of a macromolecular topic, usually a protein complex or protein family. Both quantitative and qualitative analysis revealed that the student authors felt they learned more than the control group students and that constructing molecular images was an effective learning tool. Perceived difficulties included the time commitment of the student authors and the challenge of learning unfamiliar software. In addition to the attitudinal assessment, a molecular graphics-based performance assessment in one class (which had both treatment group and control group students) showed no significant difference between the treatment and control group. In summary, students believe that actually constructing a molecular illustration is a more effective vehicle of student learning than viewing and manipulating molecular images. Evaluating the effectiveness regarding student learning of these new pedagogical approaches to teaching with molecular visualization will require design of new performance assessment instruments.

Student difficulties with the interpretation of a textbook diagram of immunoglobulin G (IgG)

Konrad J. Schönborn, Trevor R. Anderson, Diane J. Grayson
Biochemistry and Molecular Biology Education, 2002, 30, 93–97.
doi: 10.1002/bmb.2002.494030020036

Diagrams are considered to be invaluable teaching and learning tools in biochemistry, because they help learners build mental models of phenomena, which allows for comprehension and integration of scientific concepts. Sometimes, however, students experience difficulties with the interpretation of diagrams, which may have a negative effect on their learning of science. This paper reports on three categories of difficulties encountered by students with the interpretation of a stylized textbook diagram of the structure of immunoglobulin G (IgG). The difficulties were identified and classified using the four-level framework of Grayson et al. [1]. Possible factors affecting the ability of students to interpret the diagram, and various teaching and learning strategies that might remediate the difficulties are also discussed.

Assessment methods in medical education

John J. Norcini, Danette W. McKinley
Teaching and Teacher Education, 2007, 23, 239-250.
doi: 10.1016/j.tate.2006.12.021

Since the 1950s, there has been rapid and extensive change in the way assessment is conducted in medical education. Several new methods of assessment have been developed and implemented over this time and they have focused on clinical skills (taking a history from a patient and performing a physical examination), communication skills, procedural skills, and professionalism. In this paper, we provide examples of performance-based assessments in medical education, detailing the benefits and challenges associated with different approaches. While advances in psychometric theory and technology have been paralleled by the development of assessment instruments that improve the evaluation of these skills, additional research is needed, particularly if the assessment is used to make high stake decisions (e.g., promotion and licensure).

Validity: on the meaningful interpretation of assessment data

Steven M Downing
Medical Education, 2003, 37, 830-837.
doi: 10.1046/j.1365-2923.2003.01594.x

Context: All assessments in medical education require evidence of validity to be interpreted meaningfully. In contemporary usage, all validity is construct validity, which requires multiple sources of evidence; construct validity is the whole of validity, but has multiple facets. Five sources – content, response process, internal structure, relationship to other variables and consequences – are noted by the Standards for Educational and Psychological Testing as fruitful areas to seek validity evidence. Purpose: The purpose of this article is to discuss construct validity in the context of medical education and to summarize, through example, some typical sources of validity evidence for a written and a performance examination. Summary: Assessments are not valid or invalid; rather, the scores or outcomes of assessments have more or less evidence to support (or refute) a specific interpretation (such as passing or failing a course). Validity is approached as hypothesis and uses theory, logic and the scientific method to collect and assemble data to support or fail to support the proposed score interpretations, at a given point in time. Data and logic are assembled into arguments – pro and con – for some specific interpretation of assessment data. Examples of types of validity evidence, data and information from each source are discussed in the context of a high-stakes written and performance examination in medical education. Conclusion: All assessments require evidence of the reasonableness of the proposed interpretation, as test data in education have little or no intrinsic meaning. The constructs purported to be measured by our assessments are important to students, faculty, administrators, patients and society and require solid scientific evidence of their meaning.

Different written assessment methods: what can be said about their strengths and weaknesses?

Lambert W T Schuwirth & Cees P M van der Vleuten
Medical Education, 2004, 38, 974–979.
doi: 10.1111/j.1365-2929.2004.01916.x

Written assessment techniques can be subdivided according to their stimulus format – what the question asks – and their response format – how the answer is recorded. The former is more important in determining the type of competence being asked for than the latter. It is nevertheless important to consider both when selecting the most appropriate types. Some major elements to consider when making such a selection are cueing effect, reliability, validity, educational impact and resource-intensiveness. Open-ended questions should be used solely to test aspects that cannot be tested with multiple-choice questions. In all other cases the loss of reliability and the higher resource-intensiveness represent a significant downside. In such cases, multiple-choice questions are not less valid than open-ended questions. When making this distinction, it is important to consider whether the question is embedded within a relevant case or context and cannot be answered without the case, or not. This appears to be more or less essential according to what is being tested by the question. Context-rich questions test other cognitive skills than do context-free questions. If knowledge alone is the purpose of the test, context-free questions may be useful, but if it is the application of knowledge or knowledge as a part of problem solving that is being tested, then context is indispensable. Every format has its (dis)advantages and a combination of formats based on rational selection is more useful than trying to find or develop a panacea. The response format is less important in this respect than the stimulus.

Constructing written test questions for the basic and clinical sciences

Susan Case and David Swanson, National Board of Medical Examiners
Printed copies of the manual are not available from the NBME. It is available in PDF as a resource for faculty members and others interested in how to write better quality test questions. Permission to copy and distribute this document is granted by the National Board of Medical Examiners provided that (1) the copyright and permission notices appear on all reproductions, (2) use of the document is for non-commercial educational and scientific purposes only, and (3) the document is not modified in any way.

Preface to the 3rd edition (1998-2002):
This manual was written to help faculty members improve the quality of the multiple-choice questions written for their examinations. The manual provides an overview of item formats, concentrating on the traditional one-best-answer and matching formats. It reviews issues related to technical item flaws and issues related to item content. The manual also provides basic information to help faculty review statistical indices of item quality after test administration. An overview of standard-setting techniques is also provided. Issues related to exam blueprinting are not addressed in any detail. We have focused almost exclusively on the item level, leaving exam level planning for another manuscript.
We anticipate that this manual will be useful primarily by faculty who are teaching medical students in basic science courses and clinical clerkships. The examples focus on undergraduate medical education, though the general approach to item writing may be useful for assessing examinees at other levels. This manual reflects lessons that we have learned in developing items and tests over the past 20 years. During this period, we have reviewed (quite literally) tens of thousands of multiple-choice questions and have conducted item-writing workshops for thousands of item writers preparing USMLE, NBME, and specialty board examinations as well as faculty at more than 60 medical schools developing test questions for their own examinations. Each workshop attendee has helped us to frame our thoughts regarding how to write better quality test questions, and, over the years, we have become better able (we believe) to articulate the why’s and wherefore’s. We hope this manual helps to communicate these thoughts.
Susan M. Case, PhD, David B. Swanson, PhD, January 1998

Changing education, changing assessment, changing research?

Lambert W T Schuwirth & Cees P M van der Vleuten
Medical Education, 2004, 38, 805-812.

In medical education, assessment of medical competence and performance, important changes have taken place in the last 5 decades. These changes have affected the basic concepts in all 3 domains. In education constructivism has provided a completely new view on how students learn best. In assessment the change from trait-orientated to competency- or role-orientated thinking has given rise to a whole range of new approaches. Certain methods of education, such as problem-based learning (PBL), and assessment, however, are often seen as almost synonymous with the underlying concepts, and one tends to forget that it is the concept that is important and that a particular method is but 1 way of using a concept. When doing this, one runs the risk of confusing means and ends, which may hamper or slow down new developments. A similar problem seems to occur often in research of medical education. Here too, methods – or, rather, methodologies – are confused with research questions. This may lead to an overemphasis on research that fits well known methodologies (e.g. the randomised controlled trial) and neglect of what are sometimes even more important research questions because they do not fit well known methodologies. In this paper we advocate a return to the underlying concepts and a careful reflection of their use in various situations.