Unstructured Data Analytics

Ellick Chan

Exponent, Inc.

Analytical Approaches to Detecting Buried Objects in Cluttered Environments

Detecting and identifying buried objects using ground penetrating radar (GPR) in real time can be a challenging task due to clutter, uneven ground surfaces, and electromagnetic noise. In this work, we discuss algorithmic approaches to overcoming some of these challenges by using a combination of techniques from machine learning, computer vision, and signal processing. In particular, we highlight new algorithms that use machine learning techniques to accurately estimate the ground distance under the GPR antenna and the radar calibration parameters, all under a real-time constraint.

We discuss some applications of this work to detecting buried explosive hazards, such as land mines and improvised explosive devices (IEDs), in remote environments. Our work has achieved state-of-the-art performance such that our algorithms are able to detect targets that even highly trained human spotters can miss. The use of our software with the guidance of a human operator has significantly increased the accuracy of current GPR systems.

Bio

Dr. Chan has expertise in machine learning, computer vision, data analytics, anomaly detection, computer security/privacy, and algorithms. At Exponent, Dr. Chan has been exploring experimental algorithms for detecting anomalies in GPR data using deep learning techniques. He has also written high-performance software to support fast indexing of large document collections and worked on the analysis of large data sets. Prior to joining Exponent, Dr. Chan performed post-doctoral research at Stanford on methodologies to prevent de-anonymization of medical record data. For his Ph.D. work, Dr. Chan developed software and algorithms to address many facets of computer security and privacy. His work included development of computer forensic and recovery tools for analyzing live memory dumps of devices operating on critical infrastructures such as power grid monitoring equipment. He also identified security vulnerabilities in embedded microprocessor architectures and the operating systems running on them.

http://www.exponent.com/ellick_chan

John Irvine

Charles Stark Draper Laboratory, Inc.

Estimating Economic and Social Indicators from Imagery

Many policy and national security challenges require understanding the social, cultural, and economic characteristics of a country or region. However, such information is difficult to gather in remote, inaccessible, or denied areas. To address this problem, we combine processing of satellite imagery with advanced modeling techniques to develop methods for inferring measures of well-being, governance, and related socio-cultural factors. Applying these models to sequestered imagery provides new predictions, which we compare to survey data to quantify the performance. Using data from Afghanistan, across 76 survey-based indicators, prediction accuracy was approximately 70-90% on the test data. Extending these methods to sub-Saharan Africa provides an assessment of robustness for the methodology. This presentation will discuss the theoretical foundation for our work, the image processing methods, model development, performance results, and avenues for further research.

Bio

John M. Irvine is the Chief Scientist for Data Analytics at Draper Laboratory. He was the Principal Investigator (PI) for “Remote Sensing and Indicators of Well-being and Governance” under the Human Social Culture Behavior (HSCB) Program sponsored by the Office of Naval Research. Previously, Dr. Irvine was a PI for IARPA’s ACE Program, the DARPA HumanID Program, and multiple programs in image analysis and exploitation. He serves on planning committees for IEEE and SPIE, and has served on several advisory panels for the Departments of Defense and Energy. He has authored over a hundred journal and conference papers and holds a PhD in Mathematical Statistics from Yale University.

Patrick Lucey

Disney Research

Quantifying Behaviors in Professional Sports using Spatiotemporal Data

At Disney Research, we are attempting to create content automatically from tracking data. Due to its dynamic, unstructured, and noisy nature, the major bottleneck in dealing with tracking data is aligning it correctly (i.e., making sure the right comparisons are made), which in turn allows large-scale analysis. In this talk, I’ll show the importance of aligning tracking data and show examples of how it can be used to quantify behaviors using player tracking data from STATS SportVU in basketball, Prozone in soccer, and Hawk-Eye in tennis.

Bio

Dr Patrick Lucey is currently an Associate Research Scientist at Disney Research in Pittsburgh, where he conducts research into understanding and predicting group behavior using large amounts of unstructured spatiotemporal data. In his previous position, he was a post-doc at the Robotics Institute at CMU. Dr Lucey received his BEng(EE) from USQ and his PhD from QUT, Australia, in 2003 and 2008, respectively. He has won best paper awards at the INTERSPEECH (2007) and WACV (2014) international conferences. (website: www.patricklucey.com).

Shawn Mankad

University of Maryland

More Than Just Words: Using Text Analytics to Transform Unstructured Data into Actionable Insights

Leading organizations are integrating growing volumes of unstructured data to create big data ecosystems for actionable insights. Underlying this effort are modern text analytic techniques that can quickly and effectively transform publicly available documents into important structured predictors that greatly enhance forecasting models of fundamental business outcomes. In this talk, I will discuss in detail several key techniques in text analytics, and discuss their application in two examples: (i) using online reviews to improve forecasts of future demand as well as the survival of restaurants and hotels within a major metropolitan area in the United States; (ii) predicting regulatory decisions at a key financial regulatory agency using publicly available text documents generated by the notice-and-comment (public commenting) process.

Bio

Shawn Mankad joined the University of Maryland’s Smith School of Business as an Assistant Professor in Fall 2013 after obtaining a PhD in Statistics from the University of Michigan. His research aims to use analytics, machine learning, and visualization for economic modeling with unstructured and complex structured data. His work on text analytics has been featured in media outlets, such as the Wall Street Journal and Chicago Tribune, and he has consulted for the U.S. Commodity Futures Trading Commission and worked at the Federal Reserve Board on characterizing market activity with visual analytic tools.

Adam Rowell

Exponent, Inc.

Analytical Approaches to Detecting Buried Objects in Cluttered Environments

Detecting and identifying buried objects using ground penetrating radar (GPR) in real time can be a challenging task due to clutter, uneven ground surfaces, and electromagnetic noise. In this work, we discuss algorithmic approaches to overcoming some of these challenges by using a combination of techniques from machine learning, computer vision, and signal processing. In particular, we highlight new algorithms that use machine learning techniques to accurately estimate the ground distance under the GPR antenna and the radar calibration parameters, all under a real-time constraint. We discuss some applications of this work to detecting buried explosive hazards, such as land mines and improvised explosive devices (IEDs), in remote environments. Our work has achieved state-of-the-art performance such that our algorithms are able to detect targets that even highly trained human spotters can miss. The use of our software with the guidance of a human operator has significantly increased the accuracy of current GPR systems.

Bio

Dr. Rowell specializes in the development and analysis of high-performance signal and image processing, machine learning, radar, and software systems. At Exponent, Dr. Rowell works on algorithmic approaches to analyzing buried objects using Ground Penetrating Radar. He also has extensive experience consulting on data analytics projects, such as creating tools to automate the analysis of large corpuses of image, video, and audio data.

Prior to joining Exponent, Dr. Rowell completed his Ph.D. at Stanford University, where he focused on signal processing and analyzing the statistics of extremely rare events in digital systems. Dr. Rowell twice taught Stanford’s graduate course on Digital Signal Processing and lectured on adaptive signal processing, neural networks, and computer networks. In addition to his graduate work, Dr. Rowell worked at The MathWorks for the Signal and Image Processing Toolbox teams, where he designed and implemented numerous product features, including tools for generating and analyzing signal processing C/C++ code from MATLAB source code.

http://www.exponent.com/adam_rowell

Shivakumar Vaithyanathan

IBM Research

SystemT: A System and Language for Natural Language Processing Algorithms

Modern enterprises are performing complex analyses on increasingly large data sets to drive business decisions. Tasks such as root cause analysis from system logs and social media analytics for lead generation, customer retention, and digital marketing are rapidly gaining importance across multiple industry segments ranging from Media & Entertainment to Finance. A critical component of this processing is the analysis of unstructured data, which involves sophisticated analysis over free-flowing text. I will give a brief description of several such applications and use them to motivate the need for a dedicated system and language for natural language processing tasks. I will then describe SystemT, a declarative system and language for NLP tasks, and explain how the declarative nature of the language abstracts away the need for programmer optimization. I will briefly discuss speeds and feeds and describe one optimization in detail. I will then describe the motivation for SystemT’s choice of language syntax and our experiences with this choice. I will end with an open question: what is the appropriate user-interaction mechanism for such a language?

Bio

Shivakumar Vaithyanathan is an IBM Fellow and IBM Chief Scientist for Big Data Analytics. He manages the Machine Learning Systems Group at the IBM Almaden Research Center. Since joining IBM in 1998, he has been involved in multiple research areas. His department is currently involved in building systems for scalable text analytics, enterprise search, and large-scale machine learning. Multiple technologies developed in his department currently ship with several IBM products, including IBM’s Big Data Products. Prior to IBM, Shivakumar was part of the newly formed AltaVista Group at Digital. Shivakumar has co-authored more than 40 publications and was an invited keynote speaker at the 2011 German Database Conference and the 2011 ACM SIGIR Industrial Track.
