Unit information: Data Engineering at Scale in 2026/27

Please note: Programme and unit information may change as the relevant academic field develops. We may also make changes to the structure of programmes and assessments to improve the student experience.

Unit name Data Engineering at Scale
Unit code SEMT10004
Credit points 20
Level of study C/4
Teaching block(s) Teaching Block 1 (weeks 1 - 12)
Unit director Dr. Liu
Open unit status Not open
Units you must take before you take this one (pre-requisite units)

None

Units you must take alongside this one (co-requisite units)

None

Units you may not take alongside this one

None

School/department School of Engineering Mathematics and Technology
Faculty Faculty of Engineering

Unit Information

Why is this unit important?

This unit provides a comprehensive overview of elastically scalable, remotely accessed "cloud" computing services, such as those offered by Amazon, Google and Microsoft, and of the associated technologies for dealing with very-large-scale bodies of data.

The unit begins with an overview of the economics that have driven the rapid development and adoption of cloud computing in a variety of industries. It then explores the provisioning of cloud services, moving from infrastructure-as-a-service (IaaS) through platform-as-a-service (PaaS) and software-as-a-service (SaaS) to "serverless" functions-as-a-service (FaaS). The open-source Hadoop "ecosystem" of cloud service projects is introduced, and various cloud data-storage and data-processing technologies are surveyed, with an evaluation of their strengths and weaknesses. The unit closes with an overview of best practices in the use and management of Big Data.

How does this unit fit into your programme of study?

Modern AI and machine learning depend on the handling and processing of very large data sets. It is therefore important that AI engineers are familiar with the tools and services that make this viable. Central amongst these are the various cloud computing services that are now available and the data-processing and compression technologies that they utilise. The material presented in this unit will enable students to apply machine learning tools to large-scale data applications.

Your learning on this unit

An overview of content

The topics covered in this unit will include:

  • The economics of cloud computing across several industries
  • The provision of cloud services by infrastructure-as-a-service (IaaS)
  • The provision of cloud services by platform-as-a-service (PaaS)
  • The provision of cloud services by software-as-a-service (SaaS)
  • The provision of cloud services by "serverless" functions-as-a-service (FaaS)
  • The open-source Hadoop ecosystem
  • An evaluation of the strengths and weaknesses of various cloud data-storage and data-processing technologies

How will students, personally, be different as a result of the unit?

Throughout this unit there is a focus on students developing practical skills in the use of cloud computing to manage and process large data sets. They will explore these ideas through real-world examples and by experimenting with different approaches. This will give them the confidence to deploy cloud computing in machine learning applications.

Learning outcomes

On successful completion of the unit, students will be able to:

  1. Explain the economic factors and economies of scale that have driven the development of cloud computing;
  2. Compare and appropriately select among the various cloud computing services offered by major providers such as Amazon, Google and Microsoft, drawing on direct experience of initiating, running, managing and closing remotely accessed computational resources via X-as-a-Service access models;
  3. Demonstrate competence as a practitioner of cloud computing architecture with reference to fundamental concepts such as availability, reliability, scalability, elasticity, security, cost effectiveness and automation;
  4. Demonstrate the combination and use of cloud computing technologies such as in-memory compute and stream-processing in high-performance and high-throughput applications;
  5. Apply effective methods to store, manage, process and secure data at very large scale (‘Big Data’).

How you will learn

Teaching will be delivered through a combination of synchronous and asynchronous sessions, including lectures, practical activities and self-directed exercises. Computer laboratories will give students the opportunity to develop their skills, with easy access to direct feedback and support from teaching assistants.

How you will be assessed

Tasks which help you learn and prepare you for summative tasks (formative):

Workshop discussions; online quizzes and multiple-choice tests.

Tasks which count towards your unit mark (summative):

This unit will be assessed by a single piece of coursework: design, implement and optimise an effective cloud architecture for an existing data processing application. (ILOs 1-5; 100%)

When assessment does not go to plan:

Re-assessment takes the same form as the original summative assessment.

Resources

If this unit has a Resource List, you will normally find a link to it in the Blackboard area for the unit. Sometimes there will be a separate link for each weekly topic.

If you are unable to access a list through Blackboard, you can also find it via the Resource Lists homepage. Search for the list by the unit name or code (e.g. SEMT10004).

How much time the unit requires
Each credit equates to 10 hours of total student input. For example, a 20-credit unit will take you 200 hours of study to complete. Your total learning time is made up of contact time, directed learning tasks, independent learning and assessment activity.

See the University Workload statement relating to this unit for more information.
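
As an illustration only, the credit-to-hours rule stated above can be written as a short Python calculation. The function name total_study_hours and the constant HOURS_PER_CREDIT are hypothetical names chosen for this sketch; the only facts taken from this specification are the 10 hours per credit and this unit's 20 credit points.

```python
# Illustrative sketch of the workload rule described in this unit information.
# HOURS_PER_CREDIT and total_study_hours are hypothetical names for this example.
HOURS_PER_CREDIT = 10  # each credit equates to 10 hours of total student input


def total_study_hours(credit_points: int) -> int:
    """Return the expected total student input, in hours, for a unit."""
    if credit_points < 0:
        raise ValueError("credit_points must be non-negative")
    return credit_points * HOURS_PER_CREDIT


# This unit carries 20 credit points.
print(total_study_hours(20))  # prints 200
```

The figure covers all learning time (contact time, directed tasks, independent learning and assessment), not contact hours alone.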

Assessment
The assessment methods listed in this unit specification are designed to enable students to demonstrate the named learning outcomes (LOs). Where a disability prevents a student from undertaking a specific method of assessment, schools will make reasonable adjustments to support a student to demonstrate the LO by an alternative method or with additional resources.

The Board of Examiners will consider all cases where students have failed or not completed the assessments required for credit. The Board considers each student's outcomes across all the units which contribute to each year's programme of study. For appropriate assessments, if you have self-certificated your absence, you will normally be required to complete the assessment the next time it runs (for assessments at the end of TB1 and TB2 this is usually in the next re-assessment period).
The Board of Examiners will take into account any exceptional circumstances and operates within the Regulations and Code of Practice for Taught Programmes.