Course for international guest/part-time students
- Faculty
- Faculty of Science
- Organization
- TTK Department of the Physics of Complex Systems
- Code
- dsmodelsf20vm
- Title
- Data Models and Databases in Science
- Usual semester
- Autumn
- Published semester
- 2026/27/1
- ECTS
- 6
- Language
- en
- Learning outcomes
- The purpose of the course is to introduce students to the fundamentals of data handling, data models and indexing methods in data-intensive research. During computer labs closely tied to the theoretical topics, students solve problems using real data from various fields of science. The course does not require prior knowledge of advanced informatics, but it does require basic experience in programming and data visualization, which can be acquired by taking the class “Data exploration and visualization”. a) Knowledge: The student has an overview of the importance of data in scientific problems of physics and is aware of the current possibilities, development directions and limits of modern database methods. b) Abilities: The student is able to recognize the physical principles of natural phenomena, analyse related data, interpret the results and compare them to theoretical expectations. c) Attitude: The student constantly strives to expand his/her knowledge and acquire new skills. d) Autonomy and responsibility: The student is aware of the importance of scientific thinking and precise conceptualization, and forms his/her opinions taking these into account.
- Course content
- 1. Science and data, exponential growth, the fourth paradigm
  2. Memory, CPU, I/O, sequential and associative data access, parallelization, Amdahl’s law
  3. Accessing data, networks and protocols, data formats, data warehouses
  4. Data files: text and binary files, hierarchical data (JSON, XML), images and arrays (FITS, HDF5)
  5. Data compression, dimensionality reduction, noise filtering
  6. Relational databases: relational data model, SQL language, imperative and declarative programming, queries, indexes and statistics
  7. Implementation of the relational data model: B-trees, logical and physical operators, relational database management systems, column stores and in-memory databases, query optimization
  8. The object-oriented data model, object-relational mapping
  9. Hierarchical data: JSON and XML databases, directories, XPath, LDAP, managing hierarchical data with relational databases
  10. Networks and graphs, graph traversal, statistics of graphs, graph queries, triple stores, RDF, managing graphs with relational databases
  11. Images and data cubes, array databases, slicing and dicing, indexing array databases, querying array databases
  12. Multidimensional point clouds and spatial databases, indexing the Euclidean space and the sphere, geographic and astronomical databases
  13. Handling textual data, grammatical analysis, dictionaries and indexes, full-text search
  14. Distributed databases, the NoSQL and MapReduce paradigms, streaming data
  15. Metadata, data provenance, ontologies and dynamic data models
- Assessment method
- The grade is based on the assignments submitted for the computer labs and as homework, and on the result of an oral or written exam at the end of the semester.
- Bibliography
- • Hellerstein and Stonebraker (eds.): Readings in Database Systems (MIT Press, 2005)
  • Hanan Samet: Foundations of Multidimensional and Metric Data Structures (Morgan Kaufmann Publishers Inc., 2005)
  • Joe Celko: Joe Celko's SQL Puzzles and Answers
  • Byron Francis: SQL: The Complete Beginner's Guide - Step By Step Instructions (Byron Francis, 2016, ISBN: 1535355697)