Day 1

Registration & Light Breakfast

Chairperson Overview

Pedro Alves CEO Ople

Customer Engagement in the 21st Century

Tim Wong Solutions Consultant Couchbase

Data Visualization, Fast and Slow

A data visualization workshop or a best-selling manual on
data visualization offers practical techniques for quickly making data
visualizations. But visual communication, like all forms of communication, does
not happen in just one mode. This talk will explore the traditional stereotype
of data visualization as a report for busy executives and expand into
analytical applications that demand time and investment. This affects how we
design data visualization products, what tools we use to create them, the role
of the data visualization creator in relation to their product, and how we
envision engaging with data visualization readers.

Elijah Meeks Senior Data Visualization Engineer Netflix

The Importance of Data Literacy

With the volume and velocity of data available in the world
today, data is becoming the foundation for the new analytics economy. 
Unfortunately, as data has grown at incredible speeds, there has followed a
real and growing data literacy skills gap.  The inability to read, work
with, analyze and argue with data can lead to major issues within
organizations.  This session will focus on what exactly data literacy is
and why it is a critical skill for organizations to be successful.

Michael Distler Director, Product Marketing Qlik

Improve Customer Experience through Multi-arm Bandit

 A Reinforcement Learning-based
To accelerate innovation and learning, the data science team at Uber is
looking to optimize the Driver, Rider, Eater, Restaurant and Courier
experience through reinforcement learning methods. The team has implemented
bandit methods of optimization, which learn iteratively and rapidly from a
continuous evaluation of the performance of related metrics. Recently, we
completed an AI-powered experiment using bandit techniques for content
optimization to improve customer engagement. The technique improved customer
experience compared with classic hypothesis testing methods. In this session,
we will explain various use cases at Uber where this technique has proven its
value, and how bandits have helped optimize and improve customer experience
and engagement at Uber.

In probability theory, the multi-armed bandit problem is a problem in which a
fixed, limited set of resources must be allocated between several choices in a
way that maximizes the expected gain (or minimizes regret). In artificial
intelligence, Thompson sampling, named after William R. Thompson, is a
heuristic for choosing actions that addresses the exploration-exploitation
dilemma in the multi-armed bandit problem.
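
As a rough illustration of the Thompson sampling heuristic described above, the sketch below runs a Beta-Bernoulli bandit over a few content variants. It is not Uber's implementation: the function name, the uniform Beta(1, 1) prior, and the click-through rates are all assumptions made for the example.

```python
import random


def thompson_sampling(true_rates, rounds, seed=0):
    """Simulate Beta-Bernoulli Thompson sampling; return pulls per arm.

    true_rates are hypothetical success probabilities (e.g. click-through
    rates) that the algorithm cannot see and must learn by trial.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    for _ in range(rounds):
        # Draw one plausible success rate per arm from its Beta posterior
        # (Beta(1, 1) prior, an assumption of this sketch).
        sampled = [
            rng.betavariate(successes[arm] + 1, failures[arm] + 1)
            for arm in range(n_arms)
        ]
        # Play the arm whose sampled rate is highest. Uncertain arms still
        # get picked sometimes (exploration); good arms get picked most
        # of the time (exploitation).
        chosen = max(range(n_arms), key=lambda arm: sampled[arm])

        # Observe a simulated binary reward and update that arm's posterior.
        if rng.random() < true_rates[chosen]:
            successes[chosen] += 1
        else:
            failures[chosen] += 1
        pulls[chosen] += 1

    return pulls


if __name__ == "__main__":
    # Three hypothetical content variants with made-up click-through rates.
    print(thompson_sampling([0.04, 0.05, 0.10], rounds=5000))
```

Unlike a fixed-split hypothesis test, the allocation shifts toward the better-performing variant while the experiment is still running, which is the property the abstract attributes to bandit methods.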

Jeremy Gu, Senior Data Scientist, Uber

Anirban Deb, Data Science Lead, Uber

Jeremy Gu Senior Data Scientist Uber

Networking Break

ETL vs ELT for Big Data

Artyom Keydunov CEO Statsbot

Rocking the Big Data World

What is data science all about?
How data is eating the retail world
IoT: living a better life with data
Malaria & Machine Learning
Making meaningful predictions in real time

Mohammad Shokoohi-Yekta Senior Data Scientist and Adjunct Faculty Apple

Self-Service Analytics has Arrived – But for Who?

In their bid to digitally transform and compete in today’s economy, the
average enterprise has increased spending on data & analytics technology to
$14M to make data-driven decisions a reality. Yet while vendors claim
self-service capabilities, adoption rates hover at an abysmal 22%. Why?
Because, according to Gartner, “no analytics vendor is fully addressing both
enterprises' IT-driven requirements and business users' requirements for ease
of use.” Business users don’t have the time or inclination to learn a complex,
IT-approved analytics tool.

This session will focus on why advancements in search & AI-driven analytics
are driving the “Third Wave of Analytics,” eliminating the need for technical
training while equipping every businessperson with the ability to analyze data
quickly and efficiently.

Sean Zinsmeister Head of Product Marketing ThoughtSpot


Understanding Product Images in E-commerce: Challenges and Lessons Learnt

Images are valuable components of any product catalog. It is crucial to understand the product images and to optimize the presentation of a product to the customer based on image content. This talk outlines the range of computer vision and machine learning based techniques that are generally used to enrich the product data and the user experience through understanding images better.

Abon Chaudhuri Sr. Applied Researcher Walmart Labs

Improving and Automating Sleep Study Interpretation

Sleep disorders impact over 100 million
Americans, yet the current method for reviewing overnight sleep studies is
cumbersome and outdated. Applying signal processing and machine learning
techniques to sleep will both standardize the analytic process and uncover
biomarkers beyond the traditional metrics. Expanding our understanding of sleep
disorders will improve patient care and the diagnostic process.

Eileen Leary Senior Manager of Clinical Research Stanford University

Demystifying the Attribution Myth

In a Multi-Marketing Channel environment, Marketing Attribution, i.e. giving
credit to a marketing channel for a transaction, has always been at the
epicenter for all ecommerce companies. Attribution determines how the
marketing budget is allocated. In this session, you will learn the challenges
faced in the current environment, the tradeoffs companies make, and how
Expedia is solving this challenging but rewarding puzzle. You will leave with
ideas on managing large data volumes at scale and on shifting from a batch
process to a microservice architecture to build flexibility and resiliency
into the platform.

Santosh Iyer Product Manager Expedia

Networking Break

Reconciling Production Data (OLTP) with the Analytics Data Stack

Joining clickstream data (facts) with production data (dimensions) yields
powerful analytics. Unfortunately, production data often has an architecture
where many updates and deletes are performed in the relational database.
Common ETL patterns reflect production updates and deletes into the analytics
data stack. Because of how analytic databases store data, updates and deletes
are very expensive operations that can degrade analytic database performance.
This talk presents ETL patterns that circumvent this issue, without having to
re-architect the production application. The premise is that updates and
deletes should never be propagated to analytic databases. This results in
tables having their own change history log that can be queried. A generic,
pure SQL technique for efficiently creating “latest snapshot” views that works
in almost all analytic databases will be presented, as well as a specific
technique for Vertica using Top-K projections. The talk will also touch on
Etsy's Kafka data pipeline and why these ETL patterns make data ingestion
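
The append-only pattern described above can be sketched with Python's built-in sqlite3 module standing in for an analytic database. This is one common way to build a "latest snapshot" view, not necessarily the technique presented in the talk; the table, column, and function names are hypothetical, and the sketch assumes each key has at most one row per `changed_at` value.

```python
import sqlite3


def build_log():
    """Create an in-memory change log plus a 'latest snapshot' view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Every production insert, update or delete becomes a new row here;
        -- rows in the analytic table are never updated or deleted.
        CREATE TABLE listings_log (
            listing_id INTEGER,
            price      REAL,
            is_deleted INTEGER,  -- 1 marks a delete in production
            changed_at INTEGER   -- change sequence number or timestamp
        );

        -- Latest snapshot: the newest row per key, hiding deleted keys.
        CREATE VIEW listings_latest AS
        SELECT l.listing_id, l.price
        FROM listings_log AS l
        JOIN (
            SELECT listing_id, MAX(changed_at) AS changed_at
            FROM listings_log
            GROUP BY listing_id
        ) AS newest
          ON newest.listing_id = l.listing_id
         AND newest.changed_at = l.changed_at
        WHERE l.is_deleted = 0;
        """
    )
    return conn


def append_changes(conn, changes):
    """Append (listing_id, price, is_deleted, changed_at) rows to the log."""
    conn.executemany("INSERT INTO listings_log VALUES (?, ?, ?, ?)", changes)


def latest_snapshot(conn):
    """Return the current state of every live listing, ordered by key."""
    query = "SELECT listing_id, price FROM listings_latest ORDER BY listing_id"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = build_log()
    append_changes(conn, [
        (1, 10.0, 0, 1),  # listing 1 created
        (1, 12.5, 0, 2),  # listing 1 price updated
        (2, 5.0, 0, 1),   # listing 2 created
        (2, 5.0, 1, 3),   # listing 2 deleted in production
        (3, 7.0, 0, 1),   # listing 3 created
    ])
    print(latest_snapshot(conn))  # [(1, 12.5), (3, 7.0)]
```

Because the log is only ever appended to, the full change history of each listing stays queryable alongside the snapshot view.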

Chris Bohn Senior Database Engineer Etsy

Govern and Manage Your Data Lake

The data lake has become an attractive concept over the past several years. Big data technology today enables IT to process and store huge amounts of data in the cloud for people to use, so building a data lake to quickly ingest all the data and let others self-serve sounds like a beautiful idea. But is it that easy and beautiful in reality?

Here we will review eBay's experience over the past several years of managing and purifying the data lake to enable disciplined innovation through:
Understand what you have in the lake

How is the quality, what is wrong
When to expect the data to be available
Where the data is coming from
How the data is generated
Who is using the data
What business value the data is generating
Production management policy

Alex Liang Director of Data Programs and Strategy eBay

Making Sense of Unstructured Data: From Traditional ML to Deep Learning

Structured data only accounts for about 20 percent of stored
information. The rest is unstructured data, which includes texts, blogs,
documents, photos, videos, etc. In this presentation, I will talk about the
analytical methods and tools for unstructured data that data scientists may
use to gather and analyze information that doesn’t have a pre-defined model or

Traditional analytical processes are not adequate to fully understand
unstructured data and as such, I want to dwell on some of the newer methods
such as semantic analysis and natural language processing to analyze
unstructured data. I will talk about the best practices that have worked for me
in my quest to untangle unstructured data, as well as do shallow dives into
Recurrent Neural Networks (RNN) and Convolutional Neural Networks and how deep
learning is helping to identify patterns in unstructured data.

Nav Kesher Head of Marketplace Data Sciences Facebook

Applying a Decision Framework to Prescriptive Analytics: Avoiding Paralysis by Analysis

With over 6 million annual patient
visits, Vituity has significant healthcare data and in a short period of time
has built several real time prescriptive analytics applications.  

The learnings along this journey
from retrospective analytics to predictive and prescriptive tools are
tremendous – what worked, what can be done differently, how does one
Join this illuminating
discussion of the stages necessary to build prescriptive tools:

Identify the clear business goals and how to measure their value

Include what leaders should do, invest, build, organize and align, in order to gain access to the next level of analytics maturity

Define the return on investment

David Yue Senior Data Engineer Vituity

Cocktail Reception

Day 2

Registration & Light Breakfast

Chairperson Overview

Andy Mantis SVP Data Insights 1010data

Deep Learning for Predicting Customer Behaviour

Deep Learning has made remarkable progress in
fields such as Computer Vision and Natural Language Processing.  It has
excelled at problems where the data is largely unstructured and human
performance is close to the upper bound.  In the domain of predicting
customer behavior (e.g., customer lifetime value, player retention) we often
have largely structured data and human performance is far below the upper bound. 
This talk will detail a project comparing deep neural network models (using
Keras and TensorFlow) and more “traditional” tree-based ensemble models (using
scikit-learn) for predicting player behavior.  We will discuss cases where
a deep neural network shines and other cases where simpler is better.

Dennis O'Brien Director, Data Science GSN Games

P’s of Data Science: Planning Collaborations to Create Products from Data

Our lives, as well as every field of business and society, are continuously
transformed by our ability to collect meaningful data in a systematic fashion
and turn it into value. The opportunities created by this change come with
challenges that push not only for new and innovative data management and
analytical methods, but also for translating these new methods into impactful
applications and generating valuable products from data. In a
multi-disciplinary data science team, focusing on collaboration and
communication from the beginning of any activity improves the team's ability
to bring together the best of its knowledge in a variety of fields, including
business, statistics, data management, programming, and computing, which is
vital for impactful solutions. This talk will give an overview of how focusing
on some P’s in the planning phases of a data science activity, and creating a
measurable process that spans multiple perspectives and success metrics, can
lead to a more effective solution.

Ilkay Altintas Chief Data Science Officer San Diego Supercomputer Center

Joint Presentation: OpenTable Data Engineering

This session will discuss:
Data Eng Architecture
Data Pipelines
Data Lake
Spark Streaming
Real Time APIs
Presto

Rahul Bhatia, Senior Data Engineer, OpenTable
Raman Marya, Director, Data Engineering and Analytics, OpenTable

Raman Marya Director, Data Engineering & Analytics OpenTable

Networking Break

E-commerce Search using Big Data and AI

One of the main drivers behind the phenomenal growth of e-commerce is that it
is able to offer a much broader assortment of products than a brick and mortar
store. The online catalog of a big retailer like Walmart, Amazon, eBay etc.
typically contains hundreds of millions of products. "Product search" on an
e-commerce website is the most important tool for customers to find the right
item in a large catalog. Product search, much like web search, has always been
a classic problem to solve using Big Data and AI.

In this talk, I'll highlight the key Big Data and AI technologies that are
powering today's product search. I'll also discuss how the revolution in AI is
possibly going to shape the future of product search and, in turn, the future
of retail. The audience can expect to get a good understanding of why and how
Big Data and AI play an extremely critical role in product search and why it
will continue to remain a fascinating area of Big Data and AI innovation.

Somnath Banerjee Director of Machine Learning Walmart Labs

Panel Discussion: Big Data, Big Value

Driving holistic decision making with business
Promoting a proactive, innovative culture in leveraging big data to decision making processes
Translating data into actionable consumer insights and better decision making
Utilizing today's latest technologies to translate data into organizational value
Methods in data science, predictive analytics, text
Aligning your organization's strategy and long term goals to your data analytics roadmap

Moderator: Andy Mantis, SVP Data Insights, 1010data

Panelists:
Payel Chowdhury, Associate Director - Data Science, The Clorox Company
Gary Griffin, Senior Vice President, Database Marketing, Bank of America
Deep Varma, Vice President, Data Engineering, Trulia

Panelists Cross-Industry Experts Big Data Innovation Summit

Mission Analytics: Common pitfalls and how to avoid them

Data is in fashion, and rightly so. However, many organizations struggle to “carry” it properly. The promise of data and data analytics is immense, but its actual implementation needs more than just data science PhDs and Hadoop clusters. It requires a mindset shift. What is the right mix of talent to make that happen? What kinds of projects need to be undertaken, and how should they be phased? How do we separate the hype of advanced techniques like machine learning from what will work for the business in the here and now? Why is scaling important, and how does it usually get undermined? As you have already realized while solving this for your organizations, the approach requires a mix of EQ and IQ. While there is no silver bullet, in this session we will discuss how we can be proactively aware of the common pitfalls and avoid being blindsided by them on our journey.

Neeraj Arora Global Head of Decision Science and Data Automation, Personal Insurance AIG


Predictive Analytics: Developing Service Recommendation Systems

Over the last 10 years, Chegg has evolved from a retail company delivering
low cost textbook rentals to a major brand in ed-tech. Several of our business
lines now provide services in addition to rental services and static content.
Expertise in predictive analytics around P2P educational experiences is
something that we have had to develop to maintain product differentiation
while scaling. We see service recommendation systems as the next evolution of
content recommendation systems (think traditional search). In this talk, we
will discuss some of our experiences and learnings.

William Ford Director, Data Science Chegg

Personalizing Guest Booking Experience at Airbnb

Airbnb is a global
platform that connects travelers and hosts from over 191 countries. In this
talk, we will present how we approach personalization of travelers’ booking
experience. We will start from the cold start problem when the data is limited.
We will then show how personalized features are used to accommodate wide
differences in our traveler & host attributes.  We will then discuss
how we deploy models in production with real-time features.

Kapil Gupta Data Science Lead Airbnb

Scaling LinkedIn’s Machine Learning & Data Pipelines with Workflow Engine Platform Azkaban

At LinkedIn, we have built a massively scalable open source workflow engine
platform (Azkaban) which handles and orchestrates almost all of our offline
data infrastructure. The jobs vary across a broad spectrum of workloads, from
simple metrics to deep learning, and use different infrastructure components
such as Apache Hadoop and Apache Spark.

There are massive benefits in having one powerful workflow engine to power all
the flows. However, as companies scale and workloads range from machine
learning to analytics, a simple workflow engine does not scale. LinkedIn is
solving this challenge by building a "workflow engine platform": a highly
pluggable and extensible open source workflow engine. The centralized and
extensible nature of the system allows for additional leverage such as
enforcing compliance, security, data lineage, monitoring and

Azkaban is fully open source and in the process of becoming an Apache project.
In this talk we cover how we built a very pluggable and extensible system,
rich API support, support for multiple authoring tools from code driven to
config driven, as well as integrations into wider tools and development
systems, allowing flow developers to focus primarily on application logic.

Ameya Kanitkar Manager, Engineering LinkedIn