Cloud Systems Engineering

Course Information

  • Language: English
  • Type: Practical Course (Lab)
  • Module: IN0012, IN2106, IN2128
  • SWS (weekly contact hours): 6
  • ECTS Credits: 10
  • Prerequisites: Knowledge equivalent to the following lectures:
    • IN0009: Basic Principles: Operating Systems and System Software
    • IN0010: Introduction to Computer Networking and Distributed Systems
    • IN2259: Distributed Systems OR IN2258: Middleware and Distributed Systems
    • Practical course (Praktikum): Systems Programming in C++ OR Systems Programming (C/C++/Rust)
  • TUM Online: We will enroll you before the lab starts. Do not forget to register in the matching system to obtain a seat in the lab!
  • Course Material: For each programming assignment, we will cover the necessary background in a short lecture at the beginning of the corresponding module, presenting the relevant details along with comprehensive references.
  • Time and Location:
    • 3-5 lectures on YouTube, each preceding the next stage of the project


Cloud engineering involves building scalable and fault-tolerant cloud systems in a cost-effective manner. In this lab, we will investigate how to build cloud systems from the ground up, starting from a single-node deployment and ending with a fully replicated, distributed transactional system.
Over the semester, we will cover a range of topics through a set of lectures that provide the necessary background, each accompanied by a programming assignment. Note that the programming assignments build the complete system stack incrementally: each assignment builds on the previous stage.

More specifically, we will cover the following topics:

  • Stage #0: Hello world (containers, cluster orchestrators, cloud storage): How do we build and deploy applications using containers in the cloud? How do we use cluster orchestrators to deploy jobs in the cloud?
  • Stage #1: Single-node storage system: How do we implement and evaluate a single-node cloud storage system, specifically a key-value store (KVS) with transactional support?
  • Stage #2: Distributed storage system: The second stage builds on the single-node storage system from Stage #1 and extends it to a distributed setting, where the key space of the KVS is partitioned across multiple nodes.
  • Stage #3: Replication protocol: We will extend the distributed KVS from Stage #2 with fault tolerance. In particular, we will employ a replication protocol, e.g., Raft, for fault-tolerant cloud computing.

Previous knowledge expected

There are no compulsory prerequisites, but we prefer students who are proficient in the basic concepts of operating systems, distributed systems, and systems programming (C/C++/Rust), or who have an equivalent background.

Objectives

  • Introduction to the cloud computing system stack.
  • Practical knowledge of building distributed systems.
  • Building and deploying state-of-the-art systems at scale.
  • Skills in performance analysis, an understanding of system design, and familiarity with workflows in cloud environments.

Teaching and Learning Methods

This course consists of a set of programming modules covering different aspects of building distributed systems in cloud environments. For each module, we will first present the necessary background in a lecture. This is followed by a dedicated assignment that helps students dig deeper into these concepts and become familiar with them through practical, hands-on tasks. In addition, a dedicated Slack channel is available for students to ask questions and clarify aspects of the programming tasks. Students must complete each assignment within a fixed time frame (3-4 weeks, depending on its difficulty and workload) and submit their work. The submissions are then evaluated by an automated grading system and by the instructors, and each assignment is graded based on this evaluation.

Online information

  • None

Preliminary meeting slides

Jörg Thalheim
Simon Ellmann
Masanori Misono
Evgeny Volynsky