Big Data 101
Getting Started with Hadoop and Big Data
Brian Keller, Data Scientist, Booz Allen Hamilton
Getting started with big data analytics can be a daunting task because of the complexity of the technologies and the fast evolution of the software ecosystem. This talk will expose analytics professionals to established open source big data technologies and provide them with practical methods to prototype and develop big data analytics without requiring deep knowledge of software development or distributed computing. Come learn the secrets that hardcore software developers don't want you to know! Keller will cover:
• An overview of Hadoop and MapReduce
• Alternatives to writing Java code to develop analytics on Hadoop
• The big data analytic development lifecycle: how to go from data to actions
• A worked example using the principles discussed
• How you can get started with big data technologies today
An Introduction to NOSQL Databases and Their Analytic Uses
Paul Brown, CEO and Founder, Koverse, Inc.
NOSQL technology has the potential to provide powerful data storage and analytic capabilities, but the space is just emerging, crowded with many approaches, and difficult to navigate. This talk will expose analytics professionals to some of the core principles that make NOSQL unique and dive into the specifics of several of the more established efforts. Included will be sample use cases and lessons learned. Specifically, this talk will cover:
• An overview of NOSQL principles and history
• An overview of core analytic use cases for NOSQL
• An overview of leading NOSQL solutions and analytic use cases
• Ideas and tools for getting started with NOSQL technology
Python, R, and SQL in MPP Databases
Anton J. Mobley, Leading Healthcare Organization
Python, R, and SQL are some of a data scientist’s favorite tools for exploring data. Massively Parallel Processing (MPP) databases provide one way to leverage these tools at scale. This talk introduces MPP databases and shows how to combine them with analytic tools to start developing models.
• A brief introduction to how MPP databases work from the data scientist’s perspective
• Which types of data science problems MPP databases are well suited to solve, and why they are used even when HDFS is likely cheaper
• How to use Python and R natively in MPP databases
• Sample problems that leverage Python, R, and SQL in MPP
What Is Data Science, and Is a “Data Scientist” a Myth?
Steve Mills, Senior Associate, Booz Allen Hamilton
Data Scientist was named the sexiest job of the 21st century. Given the rapid increase in the use of the term “Data Science” over the last year, many organizations are left to figure out who a data scientist is and what they do, leading many to think that the role is a “purple unicorn”. Many organizations believe they can simply turn an existing analytics team into a “data science team” or that they can hire a “data scientist”. In addition, many organizations are evaluating Big Data systems and are challenged to deliver value from these deployments.
Through client examples, this talk will discuss the terminology around operations research (OR) and data science; discuss the next generation of data science technology, Big Data, and the skills they require; showcase data science organizational design, deployment, and tips for building the team; and present case studies on how businesses and governments have used data science to gain competitive advantage and to better serve citizens.