Lecturers 


Dr. Nikolaos Korfiatis, Todor Ivanov, Sead Izberovic


Announcements


05.05.2014 – Starting this Wednesday (07.05), the class will meet in Room 308, Robert-Mayer-Str. 6-8 (3rd floor, entrance from Robert-Mayer-Str. 6 only).

Target Group:

Students who want to learn how to extract insights from vast amounts of data, build innovative tools, and integrate various data sources.

Prerequisites

Although a lot of introductory material will be provided, students need basic knowledge of stochastics and of database technology.

Goals:

The objectives of this course are:

  • To present the basic techniques for extracting information from large datasets such as the web, social-network graphs, and large document repositories.
  • To introduce students to the theoretical and practical tools and techniques for mining massive datasets, through practical applications in predictive analytics.
  • To help students become familiar with modern data science toolkits and platforms and the “big data” ecosystem.

Course Material:

Course Book

Articles

  • Lee, K. C., Orten, B., Dasdan, A., & Li, W. (2012). Estimating conversion rate in display advertising from past performance data. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 768-776). ACM.
  • Tagami, Y., Ono, S., Yamamoto, K., Tsukamoto, K., & Tajima, A. (2013). CTR prediction for contextual advertising: learning-to-rank approach. In Proceedings of the Seventh International Workshop on Data Mining for Online Advertising (p. 4). ACM.

Links

You are encouraged to study the resources on ODBMS.org.

Particulars

The class meets every Wednesday at 16.00 in Room 308 (Robert-Mayer-Str. 6-8, 3rd floor, entrance from Robert-Mayer-Str. 6 only!). Note that the schedule content is subject to change.

Exam

A written exam will take place on 16.07.2014, 16.00-17.00, in Room 308. You must register with the Prüfungsamt (examination office); registration can be done online through the QIS system.

Course Schedule


Date Title Material
23.04.2014 Introductory concepts:
Hadoop, data mining, statistical techniques, predictive analytics, text mining
30.04.2014 Map-Reduce and Distributed file systems
Distributed file systems: introduction to Hadoop, compute and data nodes, large-scale file system organization. Map-Reduce: Mappers, Reducers and Combiners
07.05.2014 = Practical Hands-on Lab with Hadoop =
You need to bring your own laptop in order to attend. You also need to form a group of three before this hands-on lab.
14.05.2014 Similarity Mining
Applications of nearest neighbor search, k-item sets
21.05.2014 Similarity Mining
Practical hands-on lab with Hadoop and Apache Mahout. Application to document mining, application to financial news using the Reuters NIST corpus
28.05.2014 Frequent Itemsets
The market-basket model and frequent itemsets, association rule mining, the market-basket problem on big data sets, memory-based and limited-pass algorithms
04.06.2014
11.06.2014 Link analysis and PageRank – 1
Search engine essentials, PageRank computation, topic-specific models for PageRank
18.06.2014 Link analysis and PageRank – 2
Link spam and TrustRank, Hubs and Authorities (HITS)
25.06.2014 Hands-on Session: Gephi
02.07.2014 Application workshop 1: Recommender Systems
The recommendation problem: utility matrix and the long tail, content-based recommendations, collaborative filtering and dimensionality reduction, applications on the Netflix challenge corpus.
09.07.2014 Application workshop 2: Attribution in Advertising
The problem of channel and referral attribution, the TURN model, direct and indirect placement of ads, CPC computation and bidding models, matching algorithms for displaying ads, AdWords and competitive ratio for balance, application with the Amazon ad-attribution corpus
16.07.2014 ===== Exam ============

Additional Material (in German) with exercises and Slides (Reading is Mandatory!)


Big Data Engineering Lecture – Lars George (EMEA Chief Architect @ Cloudera)


 Books on Hadoop