[WS24/25] Scalable Data Management

Institut für Systemarchitektur | Winter Semester 2024/2025

News

  • [2024-12-01] There will be no exercises from 2.12. to 5.12.
  • [2024-11-17] There will be no exercises from 18.11. to 22.11.
  • [2024-11-05] The room for the Tuesday lecture has changed to APB-E067!
  • [2024-10-21] Sorry for the delay; the videos will be available tomorrow.

Lecturer:

Prof. Dr.-Ing. habil. Dirk Habich

 

Time and Place:

Lecture: self-study using pre-recorded videos (provided on Mondays)

Exercise #1: Monday, 09:20 - 10:50, Room APB/E007 (only in-person)

Exercise #2: Tuesday, 13:00 - 14:30, Room APB-E067 (only in-person)

- Exercises start on 28.10.2024

Description

"Data is the new oil": this phrase highlights the relevance of structured data and, implicitly, of scalable database systems as a fundamental technology for the analytical and transactional processing of typically large data sets. In this course, we discuss concepts and methods that enable distributed data processing with respect to two essential properties. The first is performance: we examine questions of scalability in scale-out architectures, using systems such as Apache Spark. The second is consistency: we present different methods for synchronizing concurrent read and write operations on the same data set.

In general, the goal of this course is to give insight into scalable techniques and methods of database technology. The course requires basic knowledge of databases. Attending other advanced courses is not necessary, but is helpful for some topics. The course exercises consist of tasks that are integrated into the lecture and practical exercises with "real" systems.

Content

  • Basic Database Architectures
    • Relational Foundation
    • SQL
    • Traditional Architecture for Transactional Workloads (OLTP)
    • Modern Architecture for Analytical Workloads (OLAP)
    • Modern Architecture for Hybrid Workloads (HTAP)
  • Parallel and Distributed Databases
    • Different Kinds of Parallelism
    • Parallel Operators
    • Data Partitioning
    • Data Replication
    • Distributed Transactions
  • Cloud Databases
    • OLTP in the Cloud
    • OLAP in the Cloud
  • MapReduce and Hadoop
  • Spark and its Ecosystem

Information

  • Please register in OPAL, because all materials and all communication are restricted to registered course members.