Goethe University Frankfurt

Data Challenges WS 2019/2020


(Winter Semester 2019/2020)


Lecturers: Prof. Dott. Ing. Roberto V. Zicari, Dr. Karsten Tolle, Todor Ivanov, Timo Eichhorn and Naveed Mushtaq

Course start: October 2019 (kickoff)

Time and Location:

Wednesday 12:00 – 14:00, Hörsaaltrakt Bockenheim – H V, Gräfstraße 50-54, Frankfurt (Campus Bockenheim), Goethe University Frankfurt. Map: http://www.bigdata.uni-frankfurt.de/about/

Thursday 10:00 – 12:00, Hörsaaltrakt Bockenheim – H III, Gräfstraße 50-54, Frankfurt (Campus Bockenheim), Goethe University Frankfurt. Map: http://www.bigdata.uni-frankfurt.de/about/

Languages: The languages of the lecture are English and German.

Credit Points: Students can earn 6 CP. The course is listed in QIS/LSF.

Course Description: Students will take part in one of two Data Challenges offered by BearingPoint.

Eligibility: Bachelor's, Master's and PhD students from multiple disciplines are encouraged to attend the kickoff and to sign up for one Data Challenge.

Students in Computer Science, Data Science, Information Systems, Business Computer Science, Mathematics, Economics, Marketing, Psychology and other disciplines will form teams of two to explore the questions posed. Team members are required to attend the kickoff lecture to sign up for this project. It is highly recommended that participants have basic knowledge of Machine Learning!

Online registration will be available by mid-September 2019 at the latest!


Course Description:

The course consists of two phases, Phase I and Phase II, both held during the Winter Semester 2019/2020. The proposed timeline and details of the two phases are as follows:

Phase I:

- Teams will be asked to address one of the Data Challenges offered.

Details will be covered in the introductory lectures. Teams will then work independently on a proposal for a novel idea that addresses the chosen Data Challenge.

- Deliverable: a mid-term presentation of the project idea, in which teams are required to:

  1. clearly state their objectives;
  2. give a general description of how they intend to implement the idea using the data available for the chosen challenge.

Phase II:

Teams whose Phase I presentation is successful will then be asked to implement their idea and present it at the end of Phase II. (Exact dates and a detailed agenda will be reviewed at the kickoff.)



BearingPoint and Frankfurt Big Data Lab Data Challenges 2019/2020


1. Mobility Service E-Scooter – station-based rental service for electric scooters
To expand its business, the company wants to identify its main business drivers and develop a growth strategy. It therefore wants to analyze its usage data to identify patterns and derive strategic initiatives to increase revenue and reduce operating costs. Data set is available; high complexity.

2. Online shoe returns – an online shoe retailer wants to investigate the drivers of its return rate
The retailer's current return rate across all orders is a major factor in its profitability. Ways to decrease the return rate must be identified by clustering the customer base into segments and labeling them, by analyzing the product configurations with high return rates, and by building a predictive model that estimates the probability of a product being returned. Data set is available; some data preparation is required; medium complexity.


Challenge Prizes (details will follow):


Course Schedule (preliminary)

Date               Time      Topic                                     Material
16.10.2019 (Wed.)  12 – 14   Kickoff – Introduction to the Challenges
17.10.2019 (Thu.)  14 – 16
23.10.2019 (Wed.)  12 – 14
24.10.2019 (Thu.)  14 – 16
30.10.2019 (Wed.)  12 – 14
31.10.2019 (Thu.)  14 – 16
06.11.2019 (Wed.)  12 – 14
07.11.2019 (Thu.)  14 – 16
13.11.2019 (Wed.)  12 – 14
14.11.2019 (Thu.)  14 – 16
20.11.2019 (Wed.)  12 – 14
21.11.2019 (Thu.)  14 – 16
27.11.2019 (Wed.)  12 – 14   Phase I – Presentations / Elevator Pitch
28.11.2019 (Thu.)  14 – 16
04.12.2019 (Wed.)  12 – 14
05.12.2019 (Thu.)  14 – 16
11.12.2019 (Wed.)  12 – 14
12.12.2019 (Thu.)  14 – 16
18.12.2019 (Wed.)  12 – 14
19.12.2019 (Thu.)  14 – 16
25.12.2019 (Wed.)            No lectures – Christmas break
26.12.2019 (Thu.)            No lectures – Christmas break
01.01.2020 (Wed.)            No lectures – Christmas break
02.01.2020 (Thu.)            No lectures – Christmas break
08.01.2020 (Wed.)            No lectures – Christmas break
09.01.2020 (Thu.)            No lectures – Christmas break
15.01.2020 (Wed.)  12 – 14
16.01.2020 (Thu.)  14 – 16
22.01.2020 (Wed.)  12 – 14
23.01.2020 (Thu.)  14 – 16
29.01.2020 (Wed.)  12 – 14
30.01.2020 (Thu.)  14 – 16
05.02.2020 (Wed.)  12 – 14
06.02.2020 (Thu.)  14 – 16
12.02.2020 (Wed.)  12 – 14   Final Presentations – Phase II
13.02.2020 (Thu.)  14 – 16   Reserve




UC Berkeley DATA resources – a large collection of course materials on Python, NumPy, pandas, scikit-learn, Matplotlib, TensorFlow, machine learning and more


Ethics and Data

Legal Implications of Data

Data Privacy

Elevator Pitch

Elevator Pitch – 5-Minute Presentation

Machine Learning

  • Machine Learning Course at Stanford by Andrew Ng, former Chief Scientist at Baidu; Chairman and Co-Founder of Coursera; Stanford CS faculty.
  • Non-technical five-part series on introductory machine learning by Alex Castrounis, Product Leader and Technologist.
    • Part 1 – definition of machine learning and the most widely used machine learning algorithms.
    • Part 2 – model performance, data selection, pre-processing, splitting, feature selection and feature engineering.
    • Part 3 – model variance and bias, overfitting, model complexity and dimensionality reduction, model evaluation and performance, tuning, validation, ensemble learning, and resampling methods.
    • Part 4 – model performance and error analysis.
    • Part 5 – unsupervised learning, predictive analytics, artificial intelligence, statistical learning, and data mining.
  • Free downloadable CRC Press book “Explorations in Artificial Intelligence and Machine Learning” (link to the CRC website; registration required), with 7 chapters:
    • An Introduction to Machine Learning
    • The Bayesian Approach to Machine Learning
    • A Revealing Introduction to Hidden Markov Models
    • Introduction to Reinforcement Learning
    • Deep Learning for Feature Representation
    • Neural Networks and Deep Learning
    • AI-Completeness: The Problem Domain of Super-intelligent Machines

Open Source Tools

  • Apache Hadoop is a project developing open-source software for reliable, scalable, distributed computing.
  • Apache Spark is a fast and general engine for large-scale data processing.
  • Apache Flink is an open-source platform for distributed stream and batch data processing.

Advanced AI Tools

  • TensorFlow is an open-source software library for numerical computation using data flow graphs.
  • The Microsoft Cognitive Toolkit (CNTK) is a free, easy-to-use, open-source, commercial-grade toolkit for training deep learning models.

Making Apps

Chatbots



(C) Big Data Laboratory.