Day 1

Registration & Light Breakfast

Chairperson Overview

Pedro Alves CEO Ople

Customer Engagement in the 21st Century

Tim Wong Solutions Consultant Couchbase

Data Visualization, Fast and Slow

A data visualization workshop or a best-selling manual on data visualization offers practical techniques for quickly making data visualizations. But visual communication, like all forms of communication, does not happen in just one mode. This talk will explore the traditional stereotype of data visualization as a report for busy executives and expand into analytical applications that demand time and investment. This distinction affects how we design data visualization products, what tools we use to create them, the role of the data visualization creator in relation to their product, and how we envision engaging with data visualization readers.

Elijah Meeks Senior Data Visualization Engineer Netflix

The Importance of Data Literacy

With the volume and velocity of data available in the world today, data is becoming the foundation for the new analytics economy.  Unfortunately, as data has grown at incredible speeds, there has followed a real and growing data literacy skills gap.  The inability to read, work with, analyze and argue with data can lead to major issues within organizations.  This session will focus on what exactly data literacy is and why it is a critical skill for organizations to be successful.

Michael Distler Director, Product Marketing Qlik

Improve Customer Experience through Multi-arm Bandit

A Reinforcement Learning-based Optimization

To accelerate innovation and learning, the data science team at Uber is looking to optimize the Driver, Rider, Eater, Restaurant and Courier experience through reinforcement learning methods. The team has implemented bandit methods of optimization, which learn iteratively and rapidly from continuous evaluation of the performance of related metrics. Recently, we completed an AI-powered experiment using bandit techniques for content optimization to improve customer engagement. The technique helped improve customer experience compared with classic hypothesis testing methods. In this session, we will explain various use cases at Uber where this technique has proven its value and how bandits have helped optimize and improve customer experience and engagement.

In probability theory, the multi-armed bandit problem is a problem in which a fixed, limited set of resources must be allocated between several choices in a way that maximizes the expected gain (or minimizes regret). In artificial intelligence, Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem.

Jeremy Gu, Senior Data Scientist, Uber
Anirban Deb, Data Science Lead, Uber

Jeremy Gu Senior Data Scientist Uber
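
The Thompson sampling heuristic mentioned in the abstract above can be illustrated with a minimal sketch. It assumes Bernoulli rewards (for example, click or no click) with a uniform Beta(1, 1) prior per arm; the function name, the simulated success rates and the seed are hypothetical and are not taken from the talk or from Uber's implementation.

```python
import random


def thompson_sampling(true_rates, rounds, seed=0):
    """Simulate Beta-Bernoulli Thompson sampling.

    true_rates holds the hidden success probability of each arm and is used
    only to simulate feedback; the sampler never reads it when choosing.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms

    for _ in range(rounds):
        # Draw one plausible success rate per arm from its Beta posterior.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Play the arm whose sampled rate is highest. Uncertain arms still
        # win some draws (exploration), while arms with strong evidence win
        # most of them (exploitation).
        arm = max(range(n_arms), key=samples.__getitem__)

        # Simulated feedback: success with the arm's hidden probability.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

    pulls = [s + f for s, f in zip(successes, failures)]
    return pulls, successes


if __name__ == "__main__":
    pulls, successes = thompson_sampling([0.05, 0.10, 0.30], rounds=5000)
    print("pulls per arm:", pulls)
    print("successes per arm:", successes)
```

In this simulation most of the pulls end up on the arm with the highest hidden rate, which is the sense in which a bandit shifts traffic toward the better option while the experiment is still running rather than after it ends.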

Networking Break

ETL vs ELT for Big Data

Artyom Keydunov CEO Statsbot

Rocking the Big Data World

• What is data science all about?
• How data is eating the retail world
• IoT: living a better life with data
• Malaria & Machine Learning
• Making meaningful predictions in real time

Mohammad Shokoohi-Yekta Senior Data Scientist and Adjunct Faculty Stanford University

Self-Service Analytics has Arrived – But for Who?

In its bid to digitally transform and compete in today’s economy, the average enterprise has increased spending on data & analytics technology to $14M to make data-driven decisions a reality. Yet while vendors claim self-service capabilities, adoption rates hover at an abysmal 22%. Why? Because, according to Gartner, “no analytics vendor is fully addressing both enterprises' IT-driven requirements and business users' requirements for ease of use.” Business users don’t have the time or inclination to learn a complex, IT-approved analytics tool. This session will focus on why advancements in search & AI-driven analytics are driving the “Third Wave of Analytics,” eliminating the need for technical training while equipping every businessperson with the ability to analyze data quickly and efficiently.

Sean Zinsmeister Head of Product Marketing ThoughtSpot


Understanding Product Images in E-commerce: Challenges and Lessons Learnt

Images are valuable components of any product catalog. It is crucial to understand the product images and to optimize the presentation of a product to the customer based on image content. This talk outlines the range of computer vision and machine learning based techniques that are generally used to enrich the product data and the user experience through understanding images better.

Abon Chaudhuri Sr. Applied Researcher Walmart Labs

Improving and Automating Sleep Study Interpretation

Sleep disorders impact over 100 million Americans, yet the current method for reviewing overnight sleep studies is cumbersome and outdated. Applying signal processing and machine learning techniques to sleep will both standardize the analytic process and uncover biomarkers beyond the traditional metrics. Expanding our understanding of sleep disorders will improve patient care and the diagnostic process.

Eileen Leary Senior Manager of Clinical Research Stanford University

Demystifying the Attribution Myth

In a multi-channel marketing environment, marketing attribution, i.e. giving credit to a marketing channel for a transaction, has always been a central concern for ecommerce companies. Attribution determines how the marketing budget is allocated. In this session, you will learn about the challenges faced in the current environment, the tradeoffs companies make, and how Expedia is solving this challenging but rewarding puzzle. You will leave with ideas on managing large data volumes at scale and on shifting from a batch process to a microservice architecture to build flexibility and resiliency into the platform.

Santosh Iyer Product Manager Expedia

Networking Break

Reconciling Production Data (OLTP) with the Analytics Data Stack

Joining clickstream data (facts) with production data (dimensions) yields powerful analytics. Unfortunately, production data often has an architecture where many updates and deletes are performed in the relational database. Common ETL patterns reflect production updates and deletes into the analytics data stack. Because of how analytic databases store data, updates and deletes are very expensive operations that can degrade analytic database performance. This talk presents ETL patterns that circumvent this issue without having to re-architect the production application. The premise is that updates and deletes should never be propagated to analytic databases. This results in tables having their own change history log that can be queried. A generic pure SQL technique for efficiently creating “latest snapshot” views that works in almost all analytic databases will be presented, as well as a specific technique for Vertica using Top-K projections. The talk will also touch on Etsy's Kafka data pipeline and why these ETL patterns make data ingestion easier.

Chris Bohn Senior Database Engineer Etsy
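
To make the “latest snapshot” idea from the abstract above concrete, here is a small sketch using Python's built-in sqlite3 module. It is only one common way to build such a view (a join against the per-key maximum timestamp) and is not necessarily the technique presented in the talk; it does not cover Vertica Top-K projections. The table, view and column names (`listing_changes`, `listing_latest`, `is_deleted`, `changed_at`) are hypothetical.

```python
import sqlite3


def build_demo():
    """Create an append-only change log and a latest-snapshot view over it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Every production insert, update or delete is appended as a new row;
        -- nothing in this table is ever updated or deleted in place.
        CREATE TABLE listing_changes (
            listing_id INTEGER,
            price      REAL,
            is_deleted INTEGER,  -- 1 marks a production delete
            changed_at INTEGER   -- change timestamp from the source system
        );

        INSERT INTO listing_changes VALUES
            (1, 10.0, 0, 100),
            (1, 12.5, 0, 200),   -- price update for listing 1
            (2, 30.0, 0, 150),
            (2, 30.0, 1, 300),   -- listing 2 deleted in production
            (3,  7.0, 0, 120);

        -- Latest snapshot: keep only the most recent row per key and hide
        -- keys whose most recent change was a delete.
        CREATE VIEW listing_latest AS
        SELECT c.listing_id, c.price, c.changed_at
        FROM listing_changes AS c
        JOIN (
            SELECT listing_id, MAX(changed_at) AS last_changed_at
            FROM listing_changes
            GROUP BY listing_id
        ) AS latest
          ON latest.listing_id = c.listing_id
         AND latest.last_changed_at = c.changed_at
        WHERE c.is_deleted = 0;
        """
    )
    return conn


if __name__ == "__main__":
    conn = build_demo()
    rows = conn.execute(
        "SELECT listing_id, price FROM listing_latest ORDER BY listing_id"
    ).fetchall()
    print(rows)  # [(1, 12.5), (3, 7.0)]
```

The full history stays queryable in `listing_changes`, while the view answers "what does production look like now" without the analytic store ever running an UPDATE or DELETE.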

Govern and Manage Your Data Lake

The data lake has become an attractive concept over the past several years. Big data technology today enables IT to process and store huge amounts of data in the cloud for people to use, so building a data lake that quickly ingests all the data and lets others self-serve sounds like a beautiful idea. But is it that easy and beautiful in reality? Here we will walk through eBay's experience from the past several years on how to manage and purify the data lake and enable disciplined innovation through:
• Understanding what you have in the lake
• How good the quality is and what is wrong
• When to expect the data to be available
• Where the data is coming from
• How the data is generated
• Who is using the data
• What business value the data is generating
• Production management policy, etc.

Alex Liang Director of Data Programs and Strategy eBay

Making Sense of Unstructured Data: From Traditional ML to Deep Learning

Structured data accounts for only about 20 percent of stored information. The rest is unstructured data, which includes texts, blogs, documents, photos, videos, etc. In this presentation, I will talk about analytical methods and tools that data scientists may use to gather and analyze information that doesn’t have a pre-defined model or structure. Traditional analytical processes are not adequate to fully understand unstructured data, so I want to dwell on some of the newer methods, such as semantic analysis and natural language processing. I will talk about the best practices that have worked for me in my quest to untangle unstructured data, and do shallow dives into Recurrent Neural Networks (RNN) and Convolutional Neural Networks (CNN) and how deep learning is helping to identify patterns in unstructured data.

Nav Kesher Head of Marketplace Data Sciences Facebook

Applying a Decision Framework to Prescriptive Analytics: Avoiding Paralysis by Analysis

With over 6 million annual patient visits, Vituity has significant healthcare data and in a short period of time has built several real-time prescriptive analytics applications. The learnings along this journey from retrospective analytics to predictive and prescriptive tools are tremendous: what worked, what can be done differently, how does one start? Join this illuminating discussion as we discuss the stages necessary to build prescriptive tools:
• Identify the clear business goals and how to measure their value
• Include what leaders should do, invest, build, organize and align in order to gain access to the next level of analytics maturity
• Define the return on investment

David Yue Senior Data Engineer Vituity

Cocktail Reception

Day 2

Registration & Light Breakfast

Chairperson Overview

Andy Mantis SVP Data Insights 1010data

Deep Learning for Predicting Customer Behaviour

Deep Learning has made remarkable progress in fields such as Computer Vision and Natural Language Processing.  It has excelled at problems where the data is largely unstructured and human performance is close to the upper bound.  In the domain of predicting customer behavior (e.g., customer lifetime value, player retention) we often have largely structured data and human performance is far below the upper bound.  This talk will detail a project comparing deep neural network models (using Keras and TensorFlow) and more “traditional” tree-based ensemble models (using scikit-learn) for predicting player behavior.  We will discuss cases where a deep neural network shines and other cases where simpler is better.

Dennis O'Brien Director, Data Science GSN Games

P’s of Data Science: Planning Collaborations to Create Products from Data

Our lives, as well as every field of business and society, are continuously transformed by our ability to collect meaningful data in a systematic fashion and turn it into value. The opportunities created by this change come with challenges that push not only for new and innovative data management and analytical methods, but also for translating these new methods into impactful applications and generating valuable products from data. In a multi-disciplinary data science team, focusing on collaboration and communication from the beginning of any activity improves the team's ability to bring together the best of its knowledge in a variety of fields, including business, statistics, data management, programming, and computing, which is vital for impactful solutions. This talk will give an overview of how focusing on some P’s in the planning phases of a data science activity, and creating a measurable process that spans multiple perspectives and success metrics, can lead to a more effective solution.

Ilkay Altintas Chief Data Science Officer San Diego Supercomputer Center

Joint Presentation: OpenTable Data Engineering

This session will discuss:
• Data Eng Architecture
• Data Pipelines
• Data Lake
• Spark Streaming
• Real Time APIs
• Presto

Rahul Bhatia, Senior Data Engineer, OpenTable
Raman Marya, Director, Data Engineering and Analytics, OpenTable

Raman Marya Director, Data Engineering & Analytics OpenTable

Networking Break

E-commerce Search using Big Data and AI

One of the main drivers behind the phenomenal growth of e-commerce is that it is able to offer a much broader assortment of products than a brick-and-mortar store. The online catalog of a big retailer like Walmart, Amazon or eBay typically contains hundreds of millions of products. "Product search" on an e-commerce website is the most important tool for customers to find the right item in a large catalog. Product search, much like web search, has always been a classic problem to solve using Big Data and AI. In this talk, I'll highlight the key Big Data and AI technologies that are powering today's product search. I'll also discuss how the revolution in AI is likely to shape the future of product search and, in turn, the future of retail. The audience can expect to get a good understanding of why and how Big Data and AI play an extremely critical role in product search and why it will remain a fascinating area of Big Data and AI innovation.

Somnath Banerjee Director of Machine Learning Walmart Labs

Panel Discussion: Big Data, Big Value

• Driving holistic decision making with business analytics
• Promoting a proactive, innovative culture in leveraging big data to decision making processes
• Translating data into actionable consumer insights and better decision making
• Utilizing today's latest technologies to translate data into organizational value
• Methods in data science, predictive analytics, text analytics
• Aligning your organization's strategy and long term goals to your data analytics roadmap

Moderator:
Andy Mantis, SVP Data Insights, 1010data

Panelists:
Payel Chowdhury, Associate Director - Data Science, The Clorox Company
Gary Griffin, Senior Vice President, Database Marketing, Bank of America
Deep Varma, Vice President, Data Engineering, Trulia

Panelists Cross-Industry Experts Big Data Innovation Summit

Mission Analytics: Common pitfalls and how to avoid them

Data is in fashion, and rightly so. However, many organizations struggle to “carry” it properly. The promise of data and data analytics is immense, but its actual implementation needs more than just data science PhDs and Hadoop clusters. It requires a mindset shift. What is the right mix of talent to make that happen? What kinds of projects need to be undertaken, and how should they be phased? How do we separate the hype of advanced techniques like machine learning from what will work for the business here and now? Why is scaling important, and how does it usually get undermined? As you may already have realized while solving this for your organizations, the approach requires a mix of EQ and IQ. While there is no silver bullet, in this session we will discuss how we can be proactively aware of the common pitfalls and avoid being blindsided by them on our journey.

Neeraj Arora Global Head of Decision Science and Data Automation, Personal Insurance AIG


Predictive Analytics: Developing Service Recommendation Systems

Over the last 10 years, Chegg has evolved from a retail company delivering low-cost textbook rentals to a major brand in ed-tech. Several of our business lines now provide services in addition to rental services and static content. Expertise in predictive analytics around P2P educational experiences is something that we have had to develop to maintain product differentiation while scaling. We see service recommendation systems as the next evolution of content recommendation systems (think traditional search). In this talk, we will discuss some of our experiences and learnings.

William Ford Director, Data Science Chegg

Personalizing Guest Booking Experience at Airbnb

Airbnb is a global platform that connects travelers and hosts from over 191 countries. In this talk, we will present how we approach personalization of travelers’ booking experience. We will start from the cold start problem when the data is limited. We will then show how personalized features are used to accommodate wide differences in our traveler & host attributes.  We will then discuss how we deploy models in production with real-time features.

Kapil Gupta Data Science Lead Airbnb

Scaling LinkedIn’s Machine Learning & Data Pipelines with Workflow Engine Platform Azkaban

At LinkedIn, we have built a massively scalable open source workflow engine platform (Azkaban) which handles and orchestrates almost all of our offline data infrastructure. The jobs vary across a broad spectrum of workloads, from simple metrics to deep learning, and use different infrastructure components such as Apache Hadoop and Apache Spark. There are massive benefits in having one powerful workflow engine to power all the flows. However, as companies scale and workloads range from machine learning to analytics, a simple workflow engine does not scale. LinkedIn is solving this challenge by building a "workflow engine platform": a highly pluggable and extensible open source workflow engine. The centralized and extensible nature of the system provides additional leverage, such as enforcing compliance, security, data lineage, monitoring and alerting. Azkaban is fully open source and in the process of becoming an Apache project. In this talk we cover how we built a very pluggable and extensible system, rich API support, support for multiple authoring tools from code-driven to config-driven, as well as integrations into wider tools and development systems, allowing flow developers to focus primarily on application logic.

Ameya Kanitkar Manager, Engineering LinkedIn