Lecture: Introduction to Data Science
(Summer Semester 2017)
M-TIDS, ATDI
Lecturers: Dr. Jochen L. Leidner and Kim Hee
Course start: Monday, May 29, 2017
Time and Location:
Lecture week 1: May 29-31, 2017, 13:00-16:00; Hörsaaltrakt Bockenheim – H IV
Exercise week 1: June 6-7, 2017, 13:00-16:00; Hörsaaltrakt Bockenheim – H IV
Lecture week 2: June 12-14, 2017, 13:00-16:00; Hörsaaltrakt Bockenheim – H IV
Exercise week 2: June 19-20, 2017, 13:00-16:00; Hörsaaltrakt Bockenheim – H IV
Exam: June 30, 14:00-15:30, Hörsaaltrakt Bockenheim – H III
Resit exam (Nachklausur): September 15, 08:30-10:00, Room 501, Robert-Mayer-Str. 10
Language: The lecture is held in English.
Credit Points: Students can receive 5 credit points (CPs). Link in QIS/LSF
Assessment: Written exam.
Eligibility: Master's students in Computer Science, Bioinformatics and Business Informatics (Wirtschaftsinformatik, Vertiefungsbereich Informatik) are encouraged to attend.
Prerequisites: programming skills, knowledge of Python, algorithms and data structures
Course Description: The goal of this compact course is to give participants a gentle first introduction to, and a solid conceptual grounding in, what has been called 'data science', i.e. experimental work that is data-driven and empirical. The focus is on methodology: defining an experimental protocol, devising hypotheses, and thinking about how to measure success. The course also covers more practical material, such as basic machine learning methods (both supervised and unsupervised), natural language processing approaches (part-of-speech tagging, named entity recognition/classification/resolution, and parsing), and an introduction to popular tools. It demonstrates some practical applications of the techniques shown and deepens the students' skills through practical exercises.
The lecture is delivered over 4 weeks of calendar time and consists of 2 three-day blocks of 3 hours of lectures, each followed by 2 days of 2.5 hours of exercises/tutorials. It targets Master’s level students. By the end of the course, participants will be able to analyze data sets and to create their own predictive classifiers and visualizations.
Course Schedule (preliminary)
Date | Topic | Materials |
29.05.2017 – 13:00-16:00 | structured and unstructured data; profiling data sets; pre-processing | Notes will be available after the lecture day |
30.05.2017 – 13:00-16:00 | hypothesis testing; descriptive vs. predictive analytics; machine learning I: clustering | Notes will be available after the lecture day |
31.05.2017 – 13:00-16:00 | machine learning II: classification; machine learning III: regression; Web crawling & mining | Notes will be available after the lecture day |
06.06.2017 – 13:00-16:00 | Exercise 1: getting started; Exercise 2: profiling data; Exercise 3: pre-processing data; Exercise 4: clustering data; Exercise 5: visualizing data | Notes will be available after the lecture day |
07.06.2017 – 13:00-16:00 | Exercise 1: classifying data; Exercise 2: annotating textual data; Exercise 3: rule-based extraction; Exercise 4: market basket analysis | Notes will be available after the lecture day |
12.06.2017 – 13:00-16:00 | experimental protocol; evaluation measures; data science tools | Notes will be available after the lecture day |
13.06.2017 – 13:00-16:00 | inter-rater agreement; applications; data science economics: value creation | Notes will be available after the lecture day |
14.06.2017 – 13:00-16:00 | visualization & presentation; planning your data science project; data science & ethics | Notes will be available after the lecture day |
19.06.2017 – 13:00-16:00 | Exercise 1: Project. Form groups of at most 3 team members. Think of a group name and register your team on kaggle.com. Your team’s challenge is to predict house prices. Build a predictive model and evaluate it using 10-fold cross-validation. Exercise 2: Documentation. Document your work in a report using the template documentation/idsreport.tex (compile this into a PDF report using the make or pdflatex idsreport commands). | Notes will be available after the lecture day |
20.06.2017 – 13:00-16:00 | Presentation. Present the results, strategy and lessons learned of your team project. (Hint: Start by devising an outline for a 20-minute presentation, and budget your presentation time carefully while you are still writing your slides.) | Notes will be available after the lecture day |
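The 10-fold cross-validation asked for in the project exercise of 19.06.2017 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the course's reference solution: it uses the Python standard library only, a one-feature least-squares line as a stand-in model, and synthetic data instead of the Kaggle house-price data. The helper names (k_fold_indices, fit_line, rmse, cross_validate) are hypothetical and chosen for this example.

```python
import random
import statistics


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single feature."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def rmse(model, xs, ys):
    """Root mean squared error of the fitted line on the given points."""
    a, b = model
    return (sum((a + b * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5


def cross_validate(xs, ys, k=10, seed=0):
    """Train on k-1 folds, score on the held-out fold, once per fold."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        scores.append(rmse(model,
                           [xs[i] for i in held_out],
                           [ys[i] for i in held_out]))
    return scores


# Synthetic stand-in data (an assumption, NOT the Kaggle data):
# price grows linearly with living area, plus Gaussian noise.
rng = random.Random(42)
areas = [rng.uniform(50, 250) for _ in range(200)]
prices = [50_000 + 1_500 * a + rng.gauss(0, 20_000) for a in areas]

scores = cross_validate(areas, prices, k=10)
print(f"mean RMSE over {len(scores)} folds: {statistics.fmean(scores):.0f}")
```

Every data point is held out exactly once, so the mean of the ten fold scores estimates the error on unseen data more reliably than a single train/test split would. In a real project the stand-in model and data would be replaced by the team's own model and the competition data.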
Resources
– MOOC on Coursera: Introduction to Data Science in Python by the University of Michigan
– MOOC in Data Science Teaching Initiative
– A good article for beginners: http://dl.acm.org/citation.cfm?id=2347755