Data Challenges WS 2019/2020

News:

The HyperCube training on Wednesday, 23.10., will take place as an online course, so there is no meeting in the lecture hall!

Information on HyperCube: https://www.hcube.io/en/

The registered teams will receive the dial-in details by e-mail (expected on Monday)!

Students who have not yet registered and are interested in participating, please contact: data.analytics@bearingpoint.com


(Winter Semester 2019/2020)

DC, M-DS-ADS, B-WB, M-WB, PoE, M-SIW-I1A, M-SIW-I1B

Lecturers: Prof. Dott. Ing. Roberto V. Zicari, Dr. Karsten Tolle, Todor Ivanov, Timo Eichhorn and Naveed Mushtaq

Course start: October 2019 (kickoff)

Time and Location:

Wednesday 12:00 – 14:00 Hörsaaltrakt Bockenheim – H III, Gräfstraße 50-54, Frankfurt. (Campus Bockenheim), Goethe University Frankfurt. See map here: http://www.bigdata.uni-frankfurt.de/about/

Thursday 14:00 – 16:00 Hörsaaltrakt Bockenheim – H III, Gräfstraße 50-54, Frankfurt. (Campus Bockenheim), Goethe University Frankfurt. See map here: http://www.bigdata.uni-frankfurt.de/about/

Languages: The languages of the lecture are English and German.

Credit Points: Students can receive 6 CPs. Link in QIS/LSF

Course Description: Students will take part in one of two Data Challenges offered by BearingPoint.

Eligibility: Bachelor Students, Master Students, and PhD students across multiple disciplines are encouraged to attend the kickoff and to sign up for one Data Challenge.

Students in Computer Science, Data Science, Information Systems, Business Computer Science, Mathematics, Economics, Marketing, Psychology, and other disciplines will form teams of two to explore the questions posed. Team members are required to attend the kickoff lecture to sign up for this project. It is highly recommended that participants have basic knowledge of Machine Learning!

If you still want to register and participate in the Data Challenge, please send us an e-mail at dc@dbis.cs.uni-frankfurt.de

Course Description:

The course consists of two phases, Phase I and Phase II, both held during the Winter Semester 2019/2020. The proposed timeline and details of these phases are:

Phase I:

- Teams will be asked to address one of the Data Challenges offered. Specifics will be addressed at the introductory lectures. Teams will then work independently to create a proposal for a novel idea that satisfies the chosen data challenge.
- Deliverable: A mid-term presentation of the project idea, in which teams are required to:
  - clearly state their objectives,
  - give a general description of how they intend to implement the idea using the data available for the chosen challenge.

Phase II:

Teams that gave a successful presentation in Phase I will then be asked to implement the idea and present it at the end of Phase II. (Exact dates and the detailed agenda will be reviewed at the kickoff.)


BearingPoint and Frankfurt Big Data Lab Data Challenges 2019/2020

1. Mobility Service E-Scooter – station-based rental service for electric scooters
To expand its business, the company wants to identify its main business drivers and develop a growth strategy. To that end, it wants to analyze the usage data to identify patterns and derive strategic initiatives that increase revenue and reduce operating costs. Data set is available; high complexity.

2. Online shoe return – online shoe retailer wants to investigate return rate drivers
The retailer's current return rate across all orders is a major factor influencing profitability. Ways to decrease the return rate are to be identified by clustering the customer base into segments and labeling them, by analyzing product configurations with high return rates, and by building a predictive model that estimates the probability of a product being returned. Data set is available; students must do some data preparation; medium complexity.
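
As a minimal sketch of one of the analyses named above (finding product configurations with high return rates), the return rate per configuration could be computed in Python as follows. The order records and the field names `model`, `size` and `returned` are hypothetical and are not taken from the actual challenge data set.

```python
from collections import defaultdict

# Hypothetical order records; the real challenge data set has its own schema.
orders = [
    {"model": "sneaker", "size": 42, "returned": True},
    {"model": "sneaker", "size": 42, "returned": False},
    {"model": "sneaker", "size": 38, "returned": True},
    {"model": "boot", "size": 42, "returned": False},
    {"model": "boot", "size": 42, "returned": False},
    {"model": "sneaker", "size": 38, "returned": True},
]


def return_rates(records, keys):
    """Return {configuration: share of orders returned}, grouped by `keys`."""
    totals = defaultdict(int)
    returned = defaultdict(int)
    for rec in records:
        config = tuple(rec[k] for k in keys)
        totals[config] += 1
        returned[config] += int(rec["returned"])
    return {config: returned[config] / totals[config] for config in totals}


rates = return_rates(orders, keys=("model", "size"))

# List configurations from highest to lowest return rate.
for config, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(config, f"{rate:.0%}")
```

Grouping by a different tuple of keys, for example `keys=("model",)`, gives the return rate at a coarser level. Configurations with very few orders yield unreliable rates and would need a minimum-count filter in practice.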

BearingPoint Challenge Prizes:

Visit to the HyperCube/BearingPoint office in Paris, including hotel and 2nd-class train ticket
Visit to the Wayra office in Munich, including hotel, 2nd-class train ticket, and dinner at a Wayra office event
Zalando or Jochen Schweizer voucher (worth 80 €)

Course Schedule (preliminary)

Date	Time	Topic	Material
16.10.2019	Wed 12 – 14	Kickoff – Introduction to the Challenges	Kickoff_Presentation_DBIS; Slides BearingPoint
17.10.2019	Thu 14 – 16	Workshop – How to start the project?	Tools: Sqlectron & RazorSQL
23.10.2019	Wed 12 – 14	HyperCube Training – online! No meeting in the lecture hall	Information on HyperCube: https://www.hcube.io/en/ The registered teams will receive the dial-in details by e-mail (expected on Monday)! Students who have not yet registered and are interested in participating, please contact: data.analytics@bearingpoint.com
24.10.2019	Thu 14 – 16	A closer look into the data …	RapidMiner; Databricks Community Edition; DataGrip
30.10.2019	Wed 12 – 14	Workshop / Q&A / Mentoring
31.10.2019	Thu 14 – 16	Workshop / Q&A / Mentoring; Introduction to OpenRefine	Slides OpenRefine
06.11.2019	Wed 12 – 14	Skype Call with BearingPoint	… see invitation by e-mail; Slides_Q&A_Session1
~~07.11.2019~~	Thu 14 – 16	~~Data and Ethics~~	… due to illness, this lecture will be moved to another date; no lecture on 7.11.!
13.11.2019	Wed 12 – 14	Q&A / Mentoring
14.11.2019	Thu 14 – 16	Data and Ethics	Slides Ethics
20.11.2019	Wed 12 – 14	Skype Call with BearingPoint	Slides are in the MS Teams Room! The slides also contain the passwords for the two videos on HyperCube that can be found in the MS Teams Room.
21.11.2019	Thu 14 – 16	How to Make a Presentation
27.11.2019	Wed 12 – 14	Phase I – Presentations / Elevator Pitch
~~28.11.2019~~	Thu 14 – 16		no lecture
04.12.2019	Wed 12 – 14	Mentoring Meetings per Team	RMS 10, Room 501
05.12.2019	Thu 14 – 16	Mentoring Meetings per Team	RMS 10, Room 501
11.12.2019	Wed 12 – 14	Individual coaching sessions per team	details by e-mail
12.12.2019	Thu 14 – 16	Individual coaching sessions per team	details by e-mail
18.12.2019	Wed 12 – 14	Individual coaching sessions per team	details by e-mail
19.12.2019	Thu 14 – 16	Individual coaching sessions per team	details by e-mail
25.12.2019		No lectures – Christmas break
26.12.2019		No lectures – Christmas break
01.01.2020		No lectures – Christmas break
02.01.2020		No lectures – Christmas break
08.01.2020		No lectures – Christmas break
09.01.2020		No lectures – Christmas break
15.01.2020	Wed 12 – 14	Mentoring Meetings per Team (on demand)	RMS 10, Room 501
16.01.2020	Thu 14 – 16	Data protection under the GDPR (DSGVO): requirements and applications – talk by Ms. Selma Gebhardt
22.01.2020	Wed 12 – 14	Mentoring Meetings per Team (on demand)	RMS 10, Room 501
23.01.2020	Thu 14 – 16	Mentoring Meetings per Team (on demand)	RMS 10, Room 501
~~29.01.2020~~	~~Wed 12 – 14~~	Lecture cancelled
30.01.2020	Thu 14 – 16	Status Check	H III
05.02.2020	Wed 12 – 14	Mentoring Meetings per Team	RMS 10, Room 501
06.02.2020	Thu 14 – 16	Mentoring Meetings per Team	RMS 10, Room 501
12.02.2020	Wed 12 – 14	Final Presentations – Phase II
13.02.2020	Thu 14 – 16	Reserve

Resources

UC Berkeley DATA-resources – many course materials on Python, NumPy, pandas, scikit-learn, Matplotlib, TensorFlow, Machine Learning and more

Mobility

Project Shared Streets creates data standards around public resources such as curbs and streets (ranging from pick-up and drop-off volumes by hour to permitted uses), helping both city planners and the private sector innovate more quickly and with a common definition of the urban environment. Autonomous vehicles, whether fleet-operated or privately owned, will rely heavily on these curated data sources to be good urban citizens by complying with regulations on permitted uses, times, and speeds. – http://sharedstreets.io
The HubCab project gathered data on 170 million taxi trips made by over 13,000 medallion taxis in New York City, with GPS coordinates of all pick-up and drop-off points and the corresponding times. – http://hubcab.org/#13.00/40.7219/-73.9484
Turo Extras, a set of features that enables Turo hosts to provide additional items along with their cars, ranging from outdoor and recreation equipment to convenience services: https://explore.turo.com/discover-extras/
On Smart Cities and Mobility. Q&A with Praveen Subramani: http://www.odbms.org/2018/05/on-smart-cities-and-mobility-qa-with-praveen-subramani/
On Data and Transportation. Q&A with Carlo Ratti: http://www.odbms.org/2018/04/on-data-and-transportation-qa-with-carlo-ratti/

Ethics and Data

Perspectives on Big Data, Ethics, and Society. May 23, 2016 / By Jacob Metcalf, Emily F. Keller, and Danah Boyd
Council for Big Data, Ethics, and Society – In collaboration with the National Science Foundation, the Council for Big Data, Ethics, and Society was started in 2014 to provide critical social and cultural perspectives on big data initiatives. The Council brings together researchers from diverse disciplines, from anthropology and philosophy to economics and law, to address issues such as security, privacy, equality, and access in order to help guard against the repetition of known mistakes and inadequate preparation. Through public commentary, events, white papers, and direct engagement with data analytics projects, the Council will develop frameworks to help researchers, practitioners, and the public understand the social, ethical, legal, and policy issues that underpin the big data phenomenon.
Ethical Issues in the Big Data Industry, MIS Quarterly Executive
The Social, Cultural, & Ethical Dimensions of “Big Data”, March 17, 2014 – New York, NY

Legal Implications of Data

Data Privacy

Big Data and Large Numbers of People: the Need for Group Privacy by Prof. Luciano Floridi, Oxford Internet Institute, University of Oxford
Navigating US-EMEA Data Privacy Rules By Kevin Petrie, Technology Evangelist at Attunity
Big Data Privacy Isn’t Just for Data Geeks and Privacy Freaks Anymore by Tamara Dull, Director of Emerging Technologies for SAS Best Practices
Privacy considerations & responsibilities in the era of Big Data & Internet of Things by Ramkumar Ravichandran, Director, Analytics, Visa Inc.
Enabling Big Data Through Europe’s New Data Protection Regulation by Viktor Mayer-Schönberger, Professor of Internet Governance and Regulation, University of Oxford & Yann Padova, Former Secretary General of the French Data Protection Authority (CNIL), now Commissioner with the French Energy Regulator (CRE).

Elevator Pitch

Elevator Pitch – 5-Minute Presentation

Machine Learning

Machine Learning Course at Stanford by Andrew Ng, Chief Scientist of Baidu; Chairman and Co-Founder of Coursera; Stanford CS faculty.
Non-technical 5-part series on introductory machine learning by Alex Castrounis, Product Leader and Technologist.
- Part 1 – definition of machine learning and most widely used machine learning algorithms.
- Part 2 – model performance, data selection, pre-processing, splitting, feature selection and feature engineering.
- Part 3 – model variance, bias, overfitting, model complexity, dimensionality reduction, model evaluation, performance, tuning, validation, ensemble learning, and resampling methods.
- Part 4 – model performance and error analysis
- Part 5 – unsupervised learning, predictive analytics, artificial intelligence, statistical learning, and data mining.
Downloadable free CRC Press book “Explorations in Artificial Intelligence and Machine Learning” (link to the CRC website; registration required) with 7 chapters:
- An Introduction to Machine Learning
- The Bayesian Approach to Machine Learning
- A Revealing Introduction to Hidden Markov Models
- Introduction to Reinforcement Learning
- Deep Learning for Feature Representation
- Neural Networks and Deep Learning
- AI-Completeness: The Problem Domain of Super-intelligent Machines

Open Source Tools

Apache Hadoop is a project developing open-source software for reliable, scalable, distributed computing.
Apache Spark is a fast and general engine for large-scale data processing.
Apache Flink is an open-source platform for distributed stream and batch data processing.

Advanced AI Tools

TensorFlow is an open source software library for numerical computation using data flow graphs.
The Microsoft Cognitive Toolkit: A free, easy-to-use, open-source, commercial-grade toolkit that trains deep learning algorithms to learn like the human brain.

Making an App

Making a Flask app using a PostgreSQL database and deploying to Heroku

Chat Bot

Hartley Brody: Facebook Messenger Bot Tutorial: Step-by-Step Instructions for Building a Basic Facebook Chat Bot. 15 June 2016

Entrepreneurship

The Top 10 Mistakes of Entrepreneurs video, Guy Kawasaki, former chief evangelist of Apple and co-founder of Garage Technology Ventures.