Performance assessments
Generative artificial intelligence is increasingly influencing teaching and learning. It presents everyone involved with new challenges in assessing performance and in designing examination formats that integrate AI. This page offers practical solutions for dealing with these changes.
Assessment in a world with AI
AI tools are already capable of performing complex tasks. In addition to specialised knowledge and skills, the focus is therefore shifting to the ability to use AI sensibly, question it critically and think creatively beyond it.
As a result, existing performance assessments need to be reconsidered and adapted to the new conditions. This makes it possible to deliberately foster these crucial new skills and to assess them at the same time.

It is essential for all forms of examination that students are informed transparently about whether and in what way generative artificial intelligence may be used for performance assessments and how the use of GenAI must be declared.
The use of generative artificial intelligence (GenAI) cannot be verified with legal certainty. Neither software nor manual testing procedures can determine beyond doubt whether a work has been created or optimised entirely or partially with the help of GenAI.
AI-integrating tasks
When designing and working on AI-integrating tasks, the focus is not on whether or which content was generated by AI tools. Instead, the emphasis is on solving problems with the help of AI tools and on the resulting gain in competence. The decisive factor is how AI is used sensibly to achieve learning goals and develop new skills.
This integration brings about a significant change in performance assessment: the primary focus is no longer on evaluating the end product, but on intensively monitoring and assessing the entire learning process. This reorientation shapes a changed examination culture, the facets of which are reflected in the following approaches, among others:
Authentic assessments reflect real-life tasks and problems that students will encounter in their future careers. The aim is not only to test knowledge, but also to evaluate the application of skills, critical thinking and problem-solving abilities in a practical context. This approach promotes deep learning and provides optimal preparation for the realities of professional life.
Why GenAI requires this
Generative AI can easily answer pure knowledge questions. Authentic exams are therefore necessary to assess students' actual skills, such as analysing information, reflecting on it critically and applying it creatively. These skills are becoming increasingly important in a world shaped by digital transformation.
Specific implementation options
Some ways to create authentic exam settings are:
- Case study analysis: Students work on complex, realistic case studies and present their solutions and the underlying considerations.
- Project-based exams: Assessment based on a semester project that includes planning, implementation and presentation (→ project-based teaching at ETH).
- Simulated counselling sessions: Role-play scenarios in which students take on an advisory role and apply their specialist knowledge in a practical manner.
- Creation of artefacts: The examination consists of developing a real product, plan or report.
- Reflection e-portfolio: Students document and reflect on their learning process and the application of skills over a longer period of time.
Metacognitive strategies are techniques that students use to monitor and control their own thinking and learning processes. Such exercises help them become more aware of their strengths and weaknesses, as well as the learning processes that work best for them. By actively planning, monitoring and evaluating their learning, students can specifically improve their understanding and problem-solving skills.
Typical elements of metacognitive strategies:
- Planning: Setting goals before starting a task.
- Monitoring: Self-checking understanding during the work process.
- Evaluation: Reflecting on the result and what could be done differently in the future.
Why GenAI requires this
The integration of metacognitive exercises into examinations is necessary because the focus is shifting from simply finding the correct answer to demonstrating critical skills, self-regulation and evaluation. As students are able to generate information using AI, assessments must evolve to measure their ability to question, verify and meaningfully adapt this content.
Specific implementation options
The following examples show how metacognitive strategies can be integrated into examination and assessment situations:
- Self-assessment and peer assessment: Students assess their own work and that of a fellow student based on clearly defined criteria.
- Reflection portfolio: Students create a portfolio of their work and write a final reflection on their learning process, the challenges they faced and their personal development.
- ‘Show your thinking’ tasks: The exam tasks require not only a solution, but also step-by-step documentation of the thought process, including the reasoning behind the chosen strategies.
- Process-oriented tasks: The assessment focuses primarily on the solution process. For example, in a case study, students describe the phases of problem solving (analysis, planning, implementation) in detail.
- Reflection after correction: Students analyse their mistakes in a corrected paper, explain their causes and develop a plan for how to avoid them in the future.
A modern error culture in evaluation does not view errors as failures, but rather as an indispensable part of the learning process. From a didactic perspective, a constructive error culture forms a central foundation, as it promotes learning processes and supports the development of skills. With the advent of GenAI, this attitude is becoming increasingly important: instead of evaluating only the end result, the focus is now shifting to the process, decision-making and critical interaction with technology. This creates an assessment culture that specifically promotes skills such as critical thinking, problem solving and adaptability in dealing with digital content.
Why GenAI requires this
As GenAI accelerates the creation process, decision-making, the way AI is prompted, and how the results are handled become more important than the final product alone. An assessment that focuses exclusively on the end result misses key areas of learning and competence associated with the use of GenAI.
Specific implementation options
- Errors as a basis for discussion: Errors should not be punished, but used as a starting point for learning processes in feedback discussions.
- ‘Fail-forward’ mentality: Establish a culture in which controlled failure is understood as a path to innovation and improvement. See also Productive Failure as learning design.
- Process-oriented rather than results-oriented: Evaluation focuses not only on the end result, but also on the creative and iterative solution process.
The integration of AI into tasks and performance assessments is changing the teaching and examination culture. The learning process and learning itself are becoming more central. This makes the integration of interdisciplinary skills into teaching essential, for which the ETH Competence Framework offers valuable guidance.
AI tools in the assessment context
In the context of digital tools based on generative AI, three main categories can be distinguished. The landscape of these tools is developing rapidly and offers a wide range of possibilities, from established solutions to tailor-made developments.
- AI tools & licences lists the standard tools that have proven to be useful aids in the assessment context.
- In addition, there is a steadily growing number of tools that are based on artificial intelligence or integrate it into specific application contexts. See, for example, the Collection AI Tools (Teaching Tools, UZH).
- Finally, a wide variety of in-house developments are emerging that explore new applications for AI in the field of examinations. A good overview of this can be found in the section Projects in the field of AI in education.
GenAI can support lecturers in two key areas in particular:
The use of generative AI in the creation of performance assessments offers a wide range of potential that goes far beyond the mere generation of content. GenAI can efficiently support lecturers in various phases of planning examination and assessment scenarios while opening up new learning opportunities for students.
Specific application ideas include:
- Brainstorming and optimising exam questions: From simple tasks to multiple-choice questions, cloze texts and true/false statements through to more complex formats such as case studies or problem descriptions. GenAI can help generate ideas for a wide variety of exercise and exam formats, question them critically and develop them further in a targeted manner (see the sketch after this list).
- Support with assessment planning: GenAI can make a significant contribution to developing detailed assessment rubrics, assessment criteria and sample solutions. This promotes consistency and transparency in assessment and improves the quality of feedback to students.
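To illustrate the first point: one possible workflow is to let GenAI draft questions, review and refine them manually, and then import them into Moodle in bulk using Moodle's GIFT text format. The following is a minimal sketch; the question content is an invented placeholder, not an ETH example.

    // Multiple-choice question: '=' marks the correct answer, '~' marks distractors.
    ::HashLookup:: Which data structure offers average-case O(1) lookup by key? {
        =Hash table
        ~Sorted array
        ~Linked list
        ~Binary search tree
    }

    // True/false statement, answered with {TRUE} or {FALSE}.
    ::StackOrder:: A stack returns its elements in first-in, first-out order. {FALSE}

Drafts like these can be generated quickly with GenAI, but they should always be checked by the lecturer for correctness, difficulty and fit with the learning goals before being imported.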
A key objective is to enable students to actively use AI tools for their own learning. GenAI can provide valuable support in this regard – for example, in self-assessment, generating their own quiz questions, preparing for oral examinations, or clarifying learning content. In this way, students develop important skills for autonomous and effective learning.
GenAI-based tools open up a wide range of possibilities for providing feedback and supporting corrections. Even standard AI tools can be an effective aid in this regard. In addition, specific projects are being pursued that use GenAI to perform analyses and evaluations.
The following aspects are key when using these tools:
- Transparent communication: The use of GenAI for assessments and feedback must be communicated clearly and openly to students, as outlined in the Guidelines for the Use of Generative AI in Teaching. Openness builds trust and enables students to better understand the role of AI in the learning process.
- Human control: The ‘human-in-the-loop’ principle must be applied consistently: the final assessment must be carried out under human supervision to avoid bias or misjudgement. This ensures the pedagogical and ethical responsibility of the lecturers.
- Empowerment for self-correction and learning: Students should be actively empowered to use AI tools independently for self-correction and learning. The tools can not only provide feedback, but also point out specific areas for improvement, whether grammatical errors, logical inconsistencies or gaps in knowledge.
Evaluation concepts
The shift from exclusively assessing products to more intensive monitoring and evaluation of learning processes also requires new assessment concepts. It is no longer just a matter of assessing the final result of a performance, but rather of accompanying, supporting and evaluating the entire learning path of the students. This paradigm shift necessitates innovative approaches to performance assessment that take greater account of both the development of skills and the individual learning journey.
Three alternative evaluation concepts* that can support this paradigm shift are presented below:
With labour-based grading (also known as contract grading), grading is based primarily on the amount of work, or ‘labour’, that students put in, rather than solely on the ‘quality’ of the final product. The aim is for students to complete the agreed tasks and activities to a specified extent. This encourages active participation, commitment and risk-taking in the learning process, as it reduces the fear of receiving a poor grade due to perceived ‘poor quality’.
Practical implementation:
- At the beginning of the semester, a ‘contract’ is established that clearly defines the required scope of work for different grade levels (e.g., number of drafts, revisions, active participation).
- Feedback focuses on the learning process, engagement and skill development rather than evaluating the final products according to traditional quality criteria.
Specifications grading focuses on mastering specific learning objectives or ‘specifications’. Instead of awarding points for individual tasks, these are assessed on a pass/fail basis, depending on whether they meet all the specified criteria (‘specifications’). Students can often make several attempts to achieve the specifications. The final grade is based on the number of successfully completed ‘bundles’ of specifications. This creates transparency, allows for individual learning pace and promotes understanding of the content.
Practical implementation:
- Detailed ‘specifications’ (criteria catalogue) are provided for each task, which precisely describe what is required to pass without evaluating partial performance.
- Students receive ‘tokens’ or resubmission opportunities so that they can revise and resubmit assignments that do not yet meet the specifications until the requirements are met.
Standards-based grading measures students' progress against clearly defined learning standards or competency goals. Instead of a single overall grade, students' performance is assessed against each individual standard. This provides more detailed feedback on what students have already mastered and where they still need support. The focus is on continuous improvement and gradual mastery of the standards over time, rather than on averaging marks from different tasks.
Practical implementation:
- The course content is divided into specific, measurable learning standards or competency goals that are communicated transparently to students and form the basis for all assessments.
- Regular formative assessments document students' progress in mastering these standards, often using a scale that indicates the degree of mastery (e.g., ‘Basic,’ ‘Advanced,’ ‘Comprehensive’).
Digital examinations have been established at ETH Zurich for some time now and take place in a controlled examination setting. Proven systems are used to ensure both security and functional diversity during the examination.
Digital examinations at ETH Zurich are based on established software solutions that are used on both ETH devices and personal devices (BYOD). The Safe Exam Browser (SEB) locks down the examination computers by blocking access to unwanted resources while simultaneously establishing a connection to the Moodle exam module. Moodle serves as ETH's central examination platform and is specially optimised for this purpose. It supports the entire examination process from question creation to correction and handles a range of question types and integrations for complex tasks.
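For orientation: SEB is controlled via settings files (.seb), which are XML property lists. The fragment below is a minimal, simplified sketch, not an official ETH configuration; the start URL is a hypothetical placeholder, and productive exam settings contain many more keys and are distributed centrally.

    <?xml version="1.0" encoding="UTF-8"?>
    <plist version="1.0">
    <dict>
        <!-- Hypothetical placeholder; real settings point to the ETH exam Moodle -->
        <key>startURL</key>
        <string>https://exam-moodle.example.org</string>
        <!-- Prevent candidates from quitting SEB during the examination -->
        <key>allowQuit</key>
        <false/>
    </dict>
    </plist>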
Further detailed information on digital examinations at ETH Zurich and the systems used can be found on the EduIT website under digital examinations.
Oral examinations enable direct interaction between examiners and examinees. Personal attendance allows answers to be questioned in a targeted manner and evaluated according to the situation, while ensuring that the examination rules are observed.
This form of examination can also be used effectively in conjunction with other formats to enable a precise assessment of the technical skills actually acquired, as understanding and transfer skills become immediately apparent. In addition, oral examinations also allow interdisciplinary skills such as problem-solving and communication skills to be assessed.
GenAI can also be used effectively in supervised assessments. The controlled setting makes it possible to integrate students' use of GenAI in a targeted way, while at the same time promoting their skills and assessing how meaningfully they integrate GenAI into their work processes.
* Further reading on alternative grading concepts:
- Clark, D., & Talbert, R. (2023). Grading for Growth: A Guide to Alternative Grading Practices that Promote Authentic Learning and Student Engagement in Higher Education. Routledge.
- Tomlin, A. D., & Nowik, C. M. (Eds.). (2024). Effective alternative assessment practices in higher education: Research, theory, and practice within academic affairs. Information Age Publishing.