Distributed Systems

Course Information

Language: English
Type: Lecture
Module: IN2259
ECTS Credits: 5
Prerequisites: Algorithms and data structures (searching, sorting, hash tables, lists, trees, graphs, basic notions of complexity (O-notation)), basic programing skills and tools (Java, C/C++, concurrency, multithreading, synchronization, code versioning and concurrent development (e.g., CVS or SVN)), and operating systems concepts, computer network basics, and database systems basics.
TUM Online: You must register for this course in TUM Online before the course starts
Media:
- Powerpoint slides
- Blackboard
- Assignments
- E-learning platform (Moodle)
Reading List:
- Distributed Systems: Principles and Paradigms. Andrew S. Tanenbaum & Maarten van Steen. Prentice Hall (a recent edition).
- Distributed Systems: Concepts and Design. George Coulouris, Jean Dollimore, Tim Kindberg. Addison Wesley (a recent edition).
- Selected reading material on systems, algorithms and protocols as announced in class.
Time and Location:
- Q&A session every week on Wednesday starting from 16:00 on zoom

Content

The module covers the principles and basic building blocks of large-scale distributed systems. This includes algorithms and protocols for the core challenges in building distributed systems. The module starts with a review of basic networking mechanisms and an introduction of protocols for synchronous and asynchronous process communication in networked environments. In this context, different communication architectures and paradigms are analyzed. Next, various aspects of time in distributed systems are discussed and concepts, as well as algorithms, for logical time (Lamport timestamps and Vector timestamps), and synchronization of physical clocks are presented. These concepts are important to introduce advanced communication protocols such as totally-ordered multicast or other process coordination mechanisms. The module discusses the challenges in coordination and presents algorithms for mutual exclusion, leader election, and consensus. A major part of the module focuses on the PAXOS algorithm and its variations for distributed consensus. Another important part of the module covers fault tolerance and service availability in distributed systems. Therefore, several consistency models are analyzed, and techniques and algorithms for replication, such as active replication, primary backup replication, and gossip-based replication, are presented. Besides these rather theoretical and generic challenges, the module also covers more practical aspects such as web caching with consistent hashing, distributed filesystems, and large-scale data processing with Map Reduce. Towards the end of the module, peer-to-peer systems are investigated in detail. First, structured peer-to-peer approaches like distributed hash tables (DHTs) are introduced (e.g., CHORD, CAN, PASTRY); then peer-to-peer applications are presented and discussed. All contents are supported by examples and case studies.

Objectives

At the end of the module, students know the properties of common building blocks applicable to the design of distributed systems (e.g., for communication, coordination, fault-tolerance, and replication). They are able to understand the complexities involved in developing a distributed system and can analyze common failure sources. Students understand the basics of distributed filesystems, can evaluate peer-to-peer systems, and apply MapReduce-style processing on large data collections. In general, students are able to evaluate common design decisions for building a distributed system and apply these principle characteristics to develop a distributed application. This involves the application and development of communication systems, coordination algorithms, and replication mechanisms.

Teaching and Learning Methods

The module is organized as a combination of lectures with PowerPoint slides and blackboard notes, invited talks, and small group tutorials to discuss homework assignments. The lecture motivates challenges in distributed system design by examples and presents theory, algorithms, and protocols to solve these challenges. The theoretical concepts and algorithms are applied and practiced individually by students through homework assignments. Assignments are discussed in a weekly tutorial to strengthen the learning outcome.