Lecturers
Dr. Nikolaos Korfiatis, Todor Ivanov, Sead Izberovic
Announcements
05.05.2014 – From this Wednesday, 07.05, onwards the class will meet in Room 308, Robert-Mayer-Str. 6-8 (3rd floor, entrance from Robert-Mayer-Str. 6 only).
Target Group:
Students who want to learn how to derive useful insights from vast amounts of data, build innovative tools, and integrate various data sources.
Prerequisites
Although a lot of introductory material will be provided, students need basic knowledge of stochastics and of database technology.
Goals:
The objectives of this course are:
- To present the basic techniques for extracting information from large datasets such as the web, social-network graphs, and large document repositories.
- To introduce students to the theoretical and practical tools and techniques for mining massive datasets, through practical applications in predictive analytics.
- To help students familiarize themselves with modern data science toolkits and platforms and the “big data” ecosystem.
Course Material:
Course Book
- Rajaraman, Anand, and Jeffrey David Ullman. Mining of massive datasets. Cambridge University Press, 2012.
Articles
- Lee, K. C., Orten, B., Dasdan, A., & Li, W. (2012). Estimating conversion rate in display advertising from past performance data. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 768-776). ACM.
- Tagami, Y., Ono, S., Yamamoto, K., Tsukamoto, K., & Tajima, A. (2013). CTR prediction for contextual advertising: learning-to-rank approach. In Proceedings of the Seventh International Workshop on Data Mining for Online Advertising (p. 4). ACM.
Links
You are encouraged to study the resources on ODBMS.org.
Particulars
The class meets every Wednesday at 16.00 in Room 308 (Robert-Mayer-Str. 6-8, 3rd floor, entrance from Robert-Mayer-Str. 6 only!). Note that the schedule content is subject to change.
Exam
A written exam will take place on 16.07.2014, 16.00-17.00, in Room 308. You must register with the Prüfungsamt (examination office); registration can be done online through the QIS system.
Course Schedule
Date | Title | Material |
23.04.2014 | Introductory concepts: Hadoop, data mining, statistical techniques, predictive analytics, text mining | |
30.04.2014 | Map-Reduce and Distributed File Systems. Distributed file systems: introduction to Hadoop, compute and data nodes, large-scale file system organization. Map-Reduce: mappers, reducers, and combiners | |
07.05.2014 | Practical Hands-on Lab with Hadoop. You must bring your own laptop in order to attend. You need to form a group of 3 before this hands-on lab. | |
14.05.2014 | Similarity Mining: applications of nearest neighbor search, k-item sets | |
21.05.2014 | Similarity Mining: practical hands-on lab with Hadoop and Apache Mahout. Application to document mining, application to financial news using the Reuters NIST corpus | |
28.05.2014 | Frequent Itemsets: the market-basket model and frequent itemsets, association rule mining, the market-basket problem on big data sets, memory-based and limited-pass algorithms | |
04.06.2014 | | |
11.06.2014 | Link analysis and PageRank – 1: search engine essentials, PageRank computation, topic-specific models for PageRank | |
18.06.2014 | Link analysis and PageRank – 2: link spam and TrustRank, Hubs and Authorities (HITS) | |
25.06.2014 | Hands-on Session: Gephi | |
02.07.2014 | Application workshop 1: Recommender Systems. The recommendation problem: utility matrix and the long tail, content-based recommendations, collaborative filtering and dimensionality reduction, applications on the Netflix challenge corpus | |
09.07.2014 | Application workshop 2: Attribution in Advertising. The problem of channel and referral attribution, the TURN model, direct and indirect placement of ads, CPC computation and bidding models, matching algorithms for displaying ads, AdWords and competitive ratio for balance, application with the Amazon ad-attribution corpus | |
16.07.2014 | Exam | |
Additional material (in German) with exercises and slides (reading is mandatory!)
Big Data Engineering Lecture – Lars George (EMEA Chief Architect @ Cloudera)
Books on Hadoop