Talks/Seminars – Big Data Lab

Monday, December 16th, 2019, 1:45 pm

Title: Trusted AI – Building Reproducible, Unbiased and Robust AI Pipelines using the python OpenSource stack

Speaker: Romeo Kienzler is the Chief Data Scientist of the IBM Center for Open Source Data and AI Technologies (CODAIT) in San Francisco. He holds an M.Sc. (ETH) in Computer Science with specialisation in Information Systems, Bioinformatics and Applied Statistics from the Swiss Federal Institute of Technology Zurich. He works as Associate Professor for Artificial Intelligence at the Swiss University of Applied Sciences Berne and Adjunct Professor for Information Security at the Swiss University of Applied Sciences Northwestern Switzerland (FHNW). His current research focus is on cloud-scale machine learning and deep learning using open source technologies including TensorFlow, Keras, DeepLearning4J, Apache SystemML and the Apache Spark stack. He also contributes to various open source projects. He regularly speaks at international conferences and has significant publications in the areas of data mining, machine learning and Blockchain technologies. Romeo is lead instructor of the Advanced Data Science specialisation on Coursera (https://www.coursera.org/specializations/advanced-data-science-ibm) with courses on Scalable Data Science, Advanced Machine Learning, Signal Processing and Applied AI with DeepLearning. His latest book, Mastering Apache Spark V2.X (http://amzn.to/2vUHkGl), has recently been translated into Chinese (http://www.flag.com.tw/books/product/FT363). Romeo Kienzler is a member of the IBM Technical Expert Council and the IBM Academy of Technology – IBM’s leading brain trusts. #ibmaot

Abstract: We are just in the middle of the DeepLearning hype. A lot of things are done, but production deployments are still rare. One of the reasons is that untrusted AI doesn’t make it into production. The concerns are just too high. In this talk we’ll show how data lineage, bias detection, adversarial robustness and model explainability can be achieved using an open source stack.

Time and Location: Monday, December 16th, 1:45 pm, Big Data Lab Frankfurt at the Chair for Databases and Information Systems (DBIS), Goethe-University Frankfurt

Tuesday, April 16th, 2019, 10:00 am

Title: Visualisation as a means to tackle some ethical issues raised by Machine Learning

Speaker: Dr Ir Benoît Otjacques obtained his PhD in Computer Science from the University of Namur (Belgium) with a thesis related to information visualisation. He also holds an Engineering degree in Computational Mechanics from the University of Louvain (Belgium). He currently leads the „Environmental Informatics“ Unit of the Luxembourg Institute of Science and Technology. This team investigates how data sciences can be used in conjunction with physics-based approaches to solve challenging problems, with a focus on issues related to the environmental transition. Combining AI-based surrogate models with advanced visualisation and interaction techniques has proven relevant for making progress in domains like renewable energies, biotechnologies, engineering, crisis management or smart agriculture. More recently, Benoît Otjacques has developed a further interest in the Ethics of AI. In particular, the question of how to increase trust in the so-called „black box“ models used in science and technology is rising on his list of priorities.

Abstract: The growing use of Artificial Intelligence makes it look like the new Holy Grail for solving many concrete problems. However, we should admit that biases and troubling results arise when AI techniques are used without properly cleaning the datasets and/or checking their relevance with respect to the application case. Black box models can be very efficient at supporting tasks like computer vision, but they raise the issue of not being (easily) explainable. Visualisation may be a possible answer to this problem, which will be illustrated by some examples.

Time and Location: Tuesday, April 16th, 10:00 am, Big Data Lab Frankfurt at the Chair for Databases and Information Systems (DBIS), Goethe-University Frankfurt

Download presentation slides here!

Tuesday, January 15th, 2019, 3:00 pm

Title: Toward Explainable Artificial Intelligence – RFEX: Improving Random Forest Explainability

Speaker: Prof. Dr. Dragutin Petkovic obtained his Ph.D. at UC Irvine, in the area of biomedical image processing. He spent over 15 years at IBM Almaden Research Center as a scientist and in various management roles. His contributions ranged from the use of computer vision for inspection to multimedia and content management systems. He is the founder of IBM’s well-known QBIC (query by image content) project, which significantly influenced the content-based retrieval field. Dr. Petkovic received numerous IBM awards for his work and became an IEEE Fellow in 1998 and an IEEE LIFE Fellow in 2018 for leadership in the content-based retrieval area. Dr. Petkovic also held various technical management roles in Silicon Valley startups. In 2003 Dr. Petkovic joined the SFSU Department of Computer Science as Chair, and in 2005 he founded the SFSU Center for Computing for Life Sciences. Currently, Dr. Petkovic is the Associate Chair of the SFSU Department of Computer Science and Director of the Center for Computing for Life Sciences, as well as co-PI on two NIH sub-grants with Stanford University. Research and teaching interests of Prof. Petkovic include Machine Learning with emphasis on Explainability, teaching methods for Global SW Engineering and engineering teamwork, and the design and development of easy-to-use systems.

Abstract: Artificial Intelligence (AI) methods are rapidly gaining in importance in many critical applications in medicine, business, autonomous cars, banking, law, etc. However, these AI methods are inherently complex and often difficult to understand and explain, resulting in barriers to their adoption and validation. These concerns have increased significantly in the US among scientists, but also among the general public and politicians, as evidenced by increased press and public attention to these issues. We define explainability in AI as easy-to-use information explaining why and how the AI approach made its decisions. We believe that much greater effort is needed to address the issue of AI explainability because of the ever-increasing use of and dependence on AI in many applications and the need for increased adoption by non-AI experts. In this talk we will first address the issues and current problems, give examples of increased concern among the public and decision makers in the US, and give an overview of the recent Workshop on AI Explainability at the Pacific Symposium on Biocomputing (PSB), January 2018 (jointly with Profs. L. Kobzik and C. Re). We will then present our work on Random Forest Explainability (RFEX) (joint work with Prof. R. Altman, M. Wong and A. Vigil). RFEX focuses on enhancing Random Forest (RF) classifier explainability by developing easy-to-interpret explainability summary reports from trained RF classifiers as a way to improve explainability for (often non-expert) users. RFEX is implemented and extensively tested on Stanford FEATURE data, where RF is tasked with predicting functional sites in 3D molecules based on their electrochemical signatures (features). In developing the RFEX method we apply a user-centered approach driven by explainability questions and requirements collected in discussions with interested practitioners. We also performed formal usability testing with 13 expert and non-expert users to verify RFEX usefulness.
Analysis of the RFEX explainability report and user feedback indicates its usefulness in significantly increasing explainability and user confidence in RF classification on FEATURE data. Notably, RFEX summary reports easily reveal that one needs very few (2 to 6, depending on the model) top-ranked features to achieve 90% or better of the accuracy obtained when all 480 features are used.

The workshop and RFEX research were supported by NIH grant R01 LM005652, Stanford Mobilize Center and SFSU Center for Computing for Life Sciences.

Download presentation slides here!

Time and Location: Tuesday, January 15th, 2019, 3:00 pm, Big Data Lab Frankfurt at the Chair for Databases and Information Systems (DBIS), Goethe-University Frankfurt

Wednesday, June 20th, 2018, 7:00 pm

Title: DATA FOR THE PEOPLE – How we win back power over our data!

Speaker: Dr. Andreas Weigend, former Chief Scientist of Amazon, describes in his book “Data for the People” how data improve the quality and tailoring of individual services. At the same time, he opposes state controls such as those now being enforced by the Communist Party of the People's Republic of China. Weigend lives and teaches in Shanghai and San Francisco, among other places. He knows that we will not get the data genie back into the bottle. But can we reclaim power over our data, and can we gain and keep control of it? How does the individual freedom not to be a data source fit with the economic and socio-political benefits that the processing of Big Data brings? Must there not be a fundamental right to private seclusion even in a transparent knowledge society?

Time and Location: Wednesday, June 20th, 2018, 7:00 pm, Goethe University, Campus Bockenheim, lecture hall (Hörsaal) H II, entrance on Gräfstraße, #m2data

Attendance is free of charge.

Registration via the Eventbrite event page is mandatory, because a valid event ticket must be shown to enter the event. The ticket can be shown at the entrance either digitally on a smartphone or as a printout.

Tuesday, April 17th, 2018, 11:00 am

Title: AI-Kindergarten: What to do when the training data become too big?

Speaker: Danko Nikolic is a brain and mind scientist, as well as an AI practitioner and visionary. His work as a senior data scientist at Teradata focuses on helping customers with AI and data science problems. In his free time, he continues working on closing the mind-body explanatory gap, and using that knowledge to improve machine learning and artificial intelligence.

Abstract: To create human-level AI with today's AI technology, huge amounts of training data would be needed. This is not necessarily going to be possible. Rather, novel technological approaches will be needed to i) organise AI technology and ii) provide training sets for that AI. I will describe a way to ‚compress‘ data into a form of intelligent knowledge, which can then be used to train other intelligent systems. A metaphor for that training approach is a kindergarten. Here, the adult educator has knowledge of the world in which the kids need to live. The educator's knowledge is stored in a condensed form–from huge amounts of data to a limited number of rules. This knowledge is then transferred to the kids by means of interacting with them. Similarly, the only way to achieve high levels of intelligence in machines is through offering such condensed knowledge rather than providing ‚raw‘ training datasets, as is the case today.

Time and Location: Tuesday, April 17th, 11:00 am, Big Data Lab Frankfurt at the Chair for Databases and Information Systems (DBIS), Goethe-University Frankfurt.

Wednesday, March 7th, 2018, 11:00 am

Title: Towards Interactive Data Exploration

Speaker: Carsten Binnig is a Full Professor in the Computer Science department at TU Darmstadt and an Adjunct Associate Professor in the Computer Science department at Brown University. Carsten received his PhD at the University of Heidelberg in 2008. Afterwards, he spent time as a postdoctoral researcher in the Systems Group at ETH Zurich and at SAP working on in-memory databases. Currently, his research focus is on the design of data management systems for modern hardware as well as modern workloads such as interactive data exploration and machine learning. He has recently been awarded a Google Faculty Award and a VLDB Best Demo Award for his research.

Abstract: Technology has been the key enabler of the current Big Data movement. Without open-source tools like R and Hadoop, as well as the advent of cheap, abundant computing and storage in the cloud, the ongoing trend toward datafication of almost every research field and industry could never have occurred. However, the current Big Data tool set is ill-suited for interactive data exploration making the knowledge discovery process a major bottleneck in our data-driven society. In this talk, we will first give an overview of challenges for interactive data exploration on large data sets and then present current research results that revisit the design of existing data management systems, from the query interface to the storage and the underlying hardware, to enable interactive data exploration.

Time and Location: Wednesday, March 7th, 11:00 am, Big Data Lab Frankfurt at the Chair for Databases and Information Systems (DBIS), Goethe-University Frankfurt.

Thursday, February 1st, 2018, 11:00 am

Title: Full conditional probabilities lead to indeterminacy in probability values

Speaker: Prof. Dr. Gregory Wheeler, Professor of Philosophy and Computer Science, Head of Philosophy and Law Department, Frankfurt School of Finance & Management

Abstract: The purpose of this talk is to show that if one adopts conditional probabilities as the primitive concept of probability, then one must accept that at least some probability values may be indeterminate, and that some probability questions may fail to have numerically precise answers. (Joint with Fabio Cozman.)

Time and Location: Thursday, February 1st, 11:00 am, Big Data Lab Frankfurt at the Chair for Databases and Information Systems (DBIS), Goethe-University Frankfurt.

Monday, November 20th, 2017, 7:00 pm at The Science Innovation Union.

Title: The Human Side of AI

Speaker: Prof. Roberto V. Zicari, Frankfurt Big Data Lab, (Johann Wolfgang Goethe-Universität Frankfurt am Main)

Event flyer

Time and Location: Monday, November 20th, 7:00 pm at The Science Innovation Union, Campus Westend in Room 1.811 of the Casino Building.

Monday, November 13, 2017, 11:00 am

Title: Deep Learning (m)eats Databases

Speaker: Jens Dittrich is a Full Professor of Computer Science in the area of Databases, Data Management, and Big Data at Saarland University, Germany. Previous affiliations include U Marburg, SAP AG, and ETH Zurich. He received an Outrageous Ideas and Vision Paper Award at CIDR 2011, a BMBF VIP Grant in 2011, a best paper award at VLDB 2014 (the second ever given to an E&A paper), two CS teaching awards in 2011 and 2013, as well as several presentation awards including a qualification for the interdisciplinary German science slam finals in 2012 and three presentation awards at CIDR (2011, 2013, and 2015). He has been a PC member and area chair/group leader of prestigious international database conferences and journals such as PVLDB/VLDB, SIGMOD, ICDE, and VLDB Journal. At Saarland University he co-organizes the Data Science Summer School (http://datasciencemaster.de).
Since 2013 he has been teaching some of his classes on data management as flipped classrooms. See http://datenbankenlernen.de or http://youtube.com/jensdit for a list of freely available videos on database technology in German (introduction to databases) and English (database architectures and implementation techniques). He is also the author of a „flipped textbook“ on databases. Since 2016 he has been working on a start-up at the intersection of deep learning and databases (http://daimond.ai).
His research focuses on fast access to big data, including in particular: data analytics on large datasets, main-memory databases, database indexing, reproducibility, and deep learning.

Abstract: Imagine a machine that is able to compose music and write poems; paint realistic artificial images and dream up video from textual descriptions; paint pictures or entire videos in the style of any artist; translate in-between any pair of natural languages. A machine that can recognize any content in images and videos; diagnose diseases, imitate spoken language — in any voice. A machine that wins games thought to be exclusive to human intelligence. All of that with superhuman performance of course.
Sounds like science fiction? Well, then welcome to the year 2017!
Currently we are witnessing the biggest revolution in computer science since the invention of the Internet. Deep Learning is shaking the world of computer science and overrunning entire (sub-)disciplines.
In this talk I will briefly sketch some of the recent advances in deep learning and what they have to do with databases. Where are the synergies? Where should we be looking? This talk will have a particular focus on recent technical developments in the intersection of databases and/or deep learning in Europe.

Time and Location: Monday, November 13, 2017, 11:00 am, Big Data Lab Frankfurt at the Chair for Databases and Information Systems (DBIS), Goethe-University Frankfurt.

Wednesday, May 17, 2017, 2:30 pm

Title: Introducing the DFG project „Corpus Nummorum Thracorum Klassifizierung der Münztypen und semantische Vernetzung über Nomisma.org“

Speaker: Dr. Karsten Tolle, Director, Frankfurt Big Data Lab.

Abstract: The talk will introduce the project, the current situation of the existing data, and our current first ideas on how to reach a typology definition within the project.
The portal Corpus Nummorum Thracorum (CNT) is a virtual tool for the collection and categorisation of coins from the ancient land of Thrace (set up in a previous DFG-project).
The project will make use of the potential offered by this existing information infrastructure by utilising the data recorded in CNT, together with other sources that likewise follow the approach of Linked Open Data, to present a typology of the coins of ancient Thrace.
The types recorded as norm data in the form of URIs on Nomisma.org will be able to be used as references and research resources by numerous other projects in the Semantic Web.
In this project we plan to set up an analysis database to be fed with CNT-data and additional sources, and run experiments to cluster or classify coin data, based on different methodologies/features.
The project will last 36 months and is carried out in cooperation with the Berlin-Brandenburgische Akademie der Wissenschaften (Prof. Dr. Dr. hc. mult. Martin Grötschel) and the Münzkabinett – Staatliche Museen zu Berlin (Prof. Dr. Bernhard Weisser).

Time and Location: Wednesday, May 17, 2017, 2:30 pm, Big Data Lab Frankfurt at the Chair for Databases and Information Systems (DBIS), Goethe-University Frankfurt.

Thursday, March 16, 2017, 2:30 pm

Title: 5000 Years: Tax, Technology & Analytics

Speaker: Dr. Dirk Tassilo Hettich, Senior Consultant Tax Technology & Analytics

Bio: After researching brain-computer interfacing for communication and control for almost 10 years, Dirk decided it was time to see how big data and advanced analytics apply in an economic context and joined the Tax Technology & Analytics team at EY Stuttgart, led by Florian Buschbacher, in March of 2016. Since then he has applied his software development, machine learning, and visualization expertise in multiple client projects and is still amazed by all the real-world potential of artificial intelligence.

Abstract: New tools for new requirements in tax – from paper, calculators, and spreadsheets towards real-time tax including advanced analytics and machine learning. Digital transformation is happening in all aspects of a company, and taxation intersects with almost all of them. The Tax Technology & Analytics team resolves complexity by applying state-of-the-art software development, database, and analytics technologies on a daily basis. This talk aims to give insight into where big data and advanced analytics apply in the supposedly dusty topic of taxation, including working examples.

Time and Location: Thursday, March 16, 2017, 2:30 pm, Big Data Lab Frankfurt at the Chair for Databases and Information Systems (DBIS), Goethe-University Frankfurt.

Thursday, March 9, 2017, 1:45 pm

Title: MariaDB, MySQL and Four Decades of RDBMS Theory and Practice

Speaker: Kaj Arnö, Chief Evangelist at MariaDB Corporation

Bio: Software industry generalist, having served as VP Professional Services, VP Engineering, CIO and VP Community Relations of MySQL AB prior to the acquisition by Sun. At Sun, served as MySQL Ambassador to Sun and Sun VP of Database Community. Board member of Carus Ltd Ab (Åland) and Footbalance Systems Oy (Helsinki, Finland). Past founder, CEO and 14-year main entrepreneur of Polycon Ab (Finland). Founder of what is now MariaDB Corporation Ab in 2010. Founded Green Elk (Outdoors Community) in 2014. Now serving as Chief Evangelist of MariaDB Corporation.

Abstract: Plus ça change, plus c’est la même chose: Using databases requires developers to be able to combine theory and practice in a way that has changed its form surprisingly little since the 1980s. Underlying themes have remained and seem cyclic. Central control moves to decentralised and back to central; memory constraints get relieved, only to take effect again in microservices. Complex pre-relational structures get a revival in NoSQL, only to go back to relational again. Kaj Arnö takes an architectural look at RDBMSes spanning the time he has been exposed to databases, since the early 1980s.

Time and Location: Thursday, March 9, 2017, 1:45 pm, Big Data Lab Frankfurt at the Chair for Databases and Information Systems (DBIS), Goethe-University Frankfurt.

Thursday, February 9, 2017, 2:45 pm at the 7. Konferenz für Sozial- und Wirtschaftsdaten in Berlin, organised by RatSWD.

Title: Big Data and The Great A.I. Awakening.

Speaker: Prof. Roberto V. Zicari, Frankfurt Big Data Lab, (Johann Wolfgang Goethe-Universität Frankfurt am Main)

Abstract:

Companies with big data pools can have great economic power. Today, that shortlist includes Google, Microsoft, Facebook, Amazon, Apple and Baidu.

I think we’re just beginning to understand the implications of data as an economic asset.

Steve Lohr (a journalist from The New York Times) had a recent conversation with Andrew Ng, a Stanford professor who worked at Google X, co-founded Coursera and is now chief scientist at Baidu. He asked him why Baidu, and Ng replied that there were only a few places to go to be a leader in A.I. Superior software algorithms, he explained, may give you an advantage for months, but probably no more. Instead, Ng said, you look for companies with two things: lots of capital and lots of data. “No one can replicate your data,” he said. “It’s the defensible barrier, not algorithms.”

I asked myself the following question: Technology is moving beyond increasing the odds of making a sale, to being used in higher-stakes decisions like medical diagnosis, loan approvals, hiring and crime prevention. What are the societal implications of this?
Steve Lohr argues that the new decisions that data science and AI tools are increasingly being used to make (or assist in making) are fundamentally different from those in marketing and advertising. In marketing and advertising, a decision that is better on average is plenty good enough. You’ve increased sales and made more money.

But the other decisions are practically and ethically very different. These are crucial decisions about individual people’s lives. For these kinds of decisions, issues of accuracy, fairness and discrimination come into play.
What we probably need is some sort of auditing tool; the technology has to be able to explain itself, to explain how a data-driven algorithm came to the decision or recommendation that it did. And it would be important that a “human remains in the loop” for most of these kinds of decisions for the foreseeable future.

Time and Location: February 9, 2:45 pm at the 7. Konferenz für Sozial- und Wirtschaftsdaten in Berlin, organised by RatSWD (Rat für Sozial- und Wirtschaftsdaten: https://www.ratswd.de)

Download presentation here!

Talks/Seminars 2016

Talks/Seminars 2015

Talks/Seminars 2013-2014