Unit information: Reinforcement Learning in 2026/27

Please note: Programme and unit information may change as the relevant academic field develops. We may also make changes to the structure of programmes and assessments to improve the student experience.

Unit name	Reinforcement Learning
Unit code	SEMT20007
Credit points	20
Level of study	I/5
Teaching block(s)	Teaching Block 2 (weeks 13 - 24)
Unit director	Professor Richards
Open unit status	Not open
Units you must take before you take this one (pre-requisite units)	None
Units you must take alongside this one (co-requisite units)	None
Units you may not take alongside this one	None
School/department	School of Engineering Mathematics and Technology
Faculty	Faculty of Engineering

Unit Information

Why is this unit important?

Reinforcement learning combines machine learning and optimal control to allow intelligent agents to learn how to make real-time decisions in a dynamic environment. Learning is based on feedback from past action choices in the form of numerical values drawn from a pre-defined reward function. In this way an agent develops an effective policy for optimising reward by balancing exploration (where the agent explores its environment and experiments with different action choices) and exploitation (where the agent makes best use of its current knowledge). Reinforcement learning is widely applied across many domains, for both embodied and software agents, and provides a mechanism for tackling complex decision and control problems for which more traditional methods are impractical.
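
As a minimal sketch of these ideas, the following Python example applies tabular Q-learning with an epsilon-greedy policy to a toy problem. The five-cell corridor environment, the function names and the parameter values are all hypothetical and chosen for illustration only; they are not taken from the unit materials.

```python
import random

# Hypothetical toy environment: a corridor of 5 cells. The agent starts in
# cell 0 and receives a reward of 1.0 only on reaching cell 4, which ends
# the episode. Every other step gives a reward of 0.0.
N_STATES = 5
ACTIONS = (-1, +1)  # step left, step right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Exploration: try a random action.
                a = rng.randrange(len(ACTIONS))
            else:
                # Exploitation: take a best-known action, ties broken at random.
                best = max(q[state])
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Move the estimate towards the reward plus discounted future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Return the highest-valued action for each non-terminal state."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: row[i])] for row in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this sketch, epsilon sets the fraction of steps spent exploring, and the reward function alone tells the agent which behaviour is desirable. The agent is never told that moving right is correct.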

How does this unit fit into your programme of study?

Reinforcement learning is a further form of machine learning alongside supervised learning and unsupervised learning. It is particularly suited to learning problems in which the environment not only changes but changes partly as a result of the agent's own decisions. An understanding of reinforcement learning therefore allows students to tackle a range of problems in autonomous systems that have these properties.

Your learning on this unit

An overview of content

Topics covered in this unit will include:

The language of reinforcement learning: Markov Decision Processes, actions, states, rewards, policies, exploration, exploitation, model-based, model-free

Established methods and algorithms such as Q-learning

Advanced and emerging methods such as deep reinforcement learning and multi-agent reinforcement learning

Engineering approaches and challenges such as reward shaping, curriculum learning and verification

How will students, personally, be different as a result of the unit?

Throughout this unit there is a focus on students developing their skills in applying machine learning to agent-based decision-making in dynamic environments. Learning will be driven by experimentation using open-source reinforcement learning environments, so that students will be better equipped to tackle real-world decision and control problems. Furthermore, the language of Markov Decision Processes will give them a general formal framework in which to express such problems and potential solutions.

Learning outcomes

On successful completion of this unit, students will be able to:

Formulate decision and control problems in the language of reinforcement learning.
Identify and motivate appropriate reward functions for a given problem.
Implement established reinforcement learning methods using appropriate software tools and environments.
Analyse and evaluate the performance of a reinforcement learning agent using suitable performance metrics.
Critically appraise emerging methods and ideas in reinforcement learning.

How you will learn

Teaching will be delivered through a combination of synchronous and asynchronous sessions, including lectures and computer laboratory sessions.

How you will be assessed

Tasks which help you learn and prepare you for summative tasks (formative):

Reading assignments, signposts to online examples and tutorials, and computer-based exercises to develop the skills needed for the summative assignments, such as using the software tools.

Tasks which count towards your unit mark (summative):

The unit will be assessed by two coursework assignments:

An independent written report describing the solution to a specified reinforcement learning problem using established methods, assessing ILOs 1, 3, 4. (50%)
A report on an independent computer-based case study, evaluating an advanced or emerging reinforcement learning approach, in the context of a problem of the student’s choosing, assessing ILOs 1, 2, 4, 5. (50%)

When assessment does not go to plan:

Re-assessment takes the same form as the original summative assessment. If you pass one of the summative assessments, then your mark for this can be carried forward towards your final mark and you will only have to be reassessed on the assessment that you did not pass.

Resources

If this unit has a Resource List, you will normally find a link to it in the Blackboard area for the unit. Sometimes there will be a separate link for each weekly topic.

If you are unable to access a list through Blackboard, you can also find it via the Resource Lists homepage. Search for the list by the unit name or code (e.g. SEMT20007).

How much time the unit requires
Each credit equates to 10 hours of total student input. For example, a 20-credit unit will take you 200 hours of study to complete. Your total learning time is made up of contact time, directed learning tasks, independent learning and assessment activity.

See the University Workload statement relating to this unit for more information.

Assessment
The assessment methods listed in this unit specification are designed to enable students to demonstrate the named learning outcomes (LOs). Where a disability prevents a student from undertaking a specific method of assessment, schools will make reasonable adjustments to support a student to demonstrate the LO by an alternative method or with additional resources.

The Board of Examiners will consider all cases where students have failed or not completed the assessments required for credit. The Board considers each student's outcomes across all the units which contribute to each year's programme of study. For appropriate assessments, if you have self-certificated your absence, you will normally be required to complete the assessment the next time it runs (for assessments at the end of TB1 and TB2 this is usually in the next re-assessment period).
The Board of Examiners will take into account any exceptional circumstances and operates within the Regulations and Code of Practice for Taught Programmes.
