Predictive Analytics


Kaveh Bastani

Data Scientist
Recovery Decision Science

Credit and Profit Scoring in Peer-to-Peer (P2P) Lending: Techniques, Challenges, and Proposed Solutions

Traditional peer-to-peer (P2P) lending decision support systems tend to focus on credit scoring. Credit scoring is formulated as a classification problem in which the response is a binary variable that assigns “0” to failed loans and “1” to non-failed loans. Various machine learning techniques can be used to solve this problem by predicting the borrower’s probability of default. However, the default probability alone cannot assess the profit that a loan is likely to yield. Hence, there is a need for a scoring approach whose output is a profitability index. This type of analysis is known as profit scoring, and it aims to identify the most profitable borrowers. Recent studies have explored the utility of profit scoring over credit scoring in P2P lending decision support systems. This talk provides: (1) a review of existing credit and profit scoring techniques; (2) a detailed discussion of the challenges of these systems; (3) a novel profit scoring technique, a two-stage model based on machine learning algorithms, developed to overcome those challenges; and (4) a case study using real data from Lending Club (one of the largest U.S. P2P lending platforms).
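The difference between ranking loans by default probability and ranking them by profitability can be illustrated with a minimal sketch. This is not the two-stage model from the talk, whose details the abstract does not give. It assumes a simple one-period expected-return formula as the profitability index, and the loan names, interest rates, default probabilities, and recovery rate are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Loan:
    name: str
    interest_rate: float   # return on principal if the loan is repaid in full
    p_default: float       # predicted default probability (credit scoring output)
    recovery_rate: float   # fraction of principal recovered after a default


def expected_return(loan: Loan) -> float:
    """Expected profit per unit of principal (assumed profitability index)."""
    gain_if_repaid = loan.interest_rate
    loss_if_default = 1.0 - loan.recovery_rate
    return (1.0 - loan.p_default) * gain_if_repaid - loan.p_default * loss_if_default


# Hypothetical loans, for illustration only.
loans = [
    Loan("A", interest_rate=0.06, p_default=0.02, recovery_rate=0.40),
    Loan("B", interest_rate=0.18, p_default=0.08, recovery_rate=0.40),
    Loan("C", interest_rate=0.25, p_default=0.30, recovery_rate=0.40),
]

# Credit scoring view: safest loans first.
by_credit = sorted(loans, key=lambda loan: loan.p_default)
# Profit scoring view: most profitable loans first.
by_profit = sorted(loans, key=expected_return, reverse=True)

print([loan.name for loan in by_credit])
print([loan.name for loan in by_profit])
```

Under these assumed numbers the safest loan (A) is not the most profitable one (B), which is the gap that profit scoring is meant to address.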


Kaveh Bastani is a data scientist at Recovery Decision Science, OH. He received his PhD in Industrial and Systems Engineering from Virginia Tech in 2016. His current research interests are predictive modeling, risk analysis, and text mining with applications to financial services. His previous research included sparse learning and Bayesian modeling with applications to real-time monitoring and fault diagnosis in manufacturing and service systems. His research has appeared in high-quality journals including IIE Transactions, Decision Support Systems, IEEE Transactions on Human-Machine Systems, and IEEE Transactions on Automation Science and Engineering. He has been invited to present his research at international conferences such as the INFORMS Annual Meetings (2011-2016) and the Industrial and Systems Engineering Research Conference (ISERC) (2015, 2017). In recent years, he has given a number of seminars on his research findings at universities including the University of Cincinnati, Purdue University, the University of Washington (Seattle), Arizona State University, and the University of Arkansas.


Cara Curtland

Strategist, Strategic Planning & Modeling (SPaM)
Hewlett Packard, Inc.

HP’s Instant Ink Delivers One-of-a-kind Internet of Things Business Model

HP’s Instant Ink program is a subscription service that provides automatic home delivery of replacement cartridges before the customer runs out of ink, saving customers up to 50% versus traditional ink purchases. Delivery of this new business model required collaboration and invention across the entire ecosystem, including the printer & ink cartridge hardware, supply chain, sales & marketing, finance & billing, and the supporting software & firmware.
This talk is primarily intended for business practitioners interested in a real-world case study of the challenges to be addressed and the holistic changes required to deploy analytics into successful IoT business models, including how OR professionals can steer the entire solution.


Cara is a Strategist in the Strategic Planning & Modeling (SPaM) team at HP Inc., tasked with evaluating supply chain investment and improvement opportunities, generating strategic business recommendations and driving senior executive alignment. She provides technical and thought leadership to a global team of analytical business consultants and partners with senior executives to set strategic direction. Cara is a 20-year HP veteran with experience in manufacturing, R&D, planning, inventory & working capital optimization, network design, and complexity management. She has championed supply chain innovation across all HP product categories. Cara earned B.S. and M.S. degrees in Industrial Engineering from Purdue University.


Erik Jensen

Analytics Consultant
Pricing Solutions

Innovation in Forecasting the Impact of Price Changes Across a Large Portfolio of Products with Application to the Food Service Industry

In this session, Erik will share an innovative approach to modeling the effect of price changes on a large portfolio of products that includes both complements and substitutes. The approach sidesteps the data difficulties inherent in such large-scale projects by leveraging the most efficient statistical techniques. This is joint work with Dr. Frederic Puech.
The method consists of three stages:
1. First, a prediction model for each product is developed using only information for that product. In the scenario for which this method was designed, this type of simple model was necessary because of massive multicollinearity in the data. The model that was used originally was a combination of time series and regression methods, but different situations may call for different models at this stage.
2. The second stage is an analysis of the substitution patterns between products. This can take the form of a nesting structure which may be obtained by several different methods. The direct effect of a price change on a product will affect the sales of substitute products via the nesting structure.
3. The third stage involves finding the conditional probabilities of purchase between complementary products. This was accomplished using standard data mining techniques. The change in volume of the sales of products from stage two affects the sales of the complementary products via the conditional probabilities.
These three stages form a framework for building a predictive model in any situation that involves a large portfolio of products with possibly complex interactions. This approach can handle data difficulties like multicollinearity that would preclude the use of multinomial logit or econometric models.
Erik will demonstrate an implementation of the model, developed for a recent project, which uncovered several million dollars in incremental revenue for a large US company in the foodservice industry. Erik will also show how the approach can be applied to other industries.
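The three stages described above can be sketched in a few lines. This is an illustrative toy, not the speakers' implementation. It assumes a linear own-price elasticity model for stage one, a fixed diversion fraction that redistributes lost volume to substitutes in the same nest in proportion to their base volume for stage two, and fixed conditional purchase probabilities for stage three. The function name, parameters, and all numbers are hypothetical.

```python
def forecast_volume_change(base_volume, elasticity, price_change,
                           nests, diversion, cond_prob):
    """Return the predicted change in unit volume for every product."""
    # Stage 1: own-price effect, using only each product's own information
    # (assumed linear elasticity model).
    own = {
        p: base_volume[p] * elasticity[p] * price_change.get(p, 0.0)
        for p in base_volume
    }

    # Stage 2: substitution. A share `diversion` of the volume a product loses
    # moves to the other products in its nest, split by base volume.
    delta = dict(own)
    for nest in nests:
        for p in nest:
            others = [q for q in nest if q != p]
            total = sum(base_volume[q] for q in others)
            if total == 0:
                continue
            for q in others:
                delta[q] += diversion * (-own[p]) * base_volume[q] / total

    # Stage 3: complements. A change in sales of product p shifts sales of
    # complement c by P(buy c | buy p) times that change.
    final = dict(delta)
    for (p, c), prob in cond_prob.items():
        final[c] += prob * delta[p]
    return final


# Hypothetical foodservice example: two substitute burgers and a side of fries.
change = forecast_volume_change(
    base_volume={"burger_a": 1000.0, "burger_b": 500.0, "fries": 800.0},
    elasticity={"burger_a": -2.0, "burger_b": -2.0, "fries": -1.0},
    price_change={"burger_a": 0.10},          # raise burger_a price by 10%
    nests=[["burger_a", "burger_b"]],
    diversion=0.5,
    cond_prob={("burger_a", "fries"): 0.6, ("burger_b", "fries"): 0.6},
)
print(change)
```

In this toy setup, the price increase lowers sales of burger_a, part of that volume moves to burger_b, and fries sales fall because the net burger volume falls.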


Dr. Erik Jensen is an Analytics Consultant at Pricing Solutions. He develops models of customer behaviour and leverages advanced statistics to simulate and optimize the effects of price changes. A former university professor, he applies analytics to deliver clear and effective business solutions for clients. Erik holds a PhD in Mathematics and Statistics and a Masters of Management Analytics from Queen’s University, where he studied with some of the leading experts in the field of pricing. He is a member of INFORMS and the Professional Pricing Society.


Ivan Mura

Director, M.S. Program in Analytics
Universidad de los Andes

A Multidisciplinary Analytics Effort to Support Public Health Policies Against Cervical Cancer Epidemics in Colombia

Cervical cancer (CC) is the fourth leading cause of cancer death in women worldwide. Developing countries bear the highest burden in terms of incidence, prevalence and death rate, and Colombia is no exception. According to the Colombian Ministry of Health, CC is the second leading cause of cancer death among women, and the leading one for women between 15 and 44 years of age. The standardized death rate (deaths per 100,000 women) is around 8, over 3 times that of the USA. Encouragingly, the etiology of CC is much better understood than that of other cancer types. CC has an identified cause: 99% of cases are due to cell malignancies induced by sexually transmitted Human Papilloma Virus (HPV) infections. Moreover, when treated in its early stages, CC is easily eradicated. These facts leave room for control actions such as HPV vaccination, screening tests and educational campaigns. Quantifying the cost-effectiveness of interventions is of paramount importance in defining public health policies, particularly in a developing country such as Colombia, where resources are limited. However, the effects of CC control actions are only visible in the long term. Evaluating interventions therefore requires the ability to generate reliable predictions about the dynamics of the health state of the Colombian population (48.2 million people, 2016 estimate). It also entails challenging analytics efforts, in which diverse data sources (biological, clinical, socio-economic) and multiple modeling views (demographic, epidemic, social, public health) need to be integrated to properly quantify the impact of interventions. This paper describes an ambitious, multidisciplinary, ongoing project.
A team of Colombian researchers with expertise in medicine, engineering and public health, from both academia and public research institutions, is joining efforts to design and implement a data-driven computational tool that assists decision-makers in shaping national public health policies. To this end, we are employing multiple analytics techniques and integrating them into a coherent framework. We are developing a descriptive, predictive and prescriptive solution that, for a given set of resource constraints, determines the time-dependent deployment of CC control interventions that yields the best predicted results for the Colombian population. This research endeavor poses numerous challenges. It requires acquiring and modeling data to characterize population dynamics, HPV infection epidemics, the progression of CC lesions from pre-cancerous to advanced stages, patient preferences, and the specificity and sensitivity of screening examinations. Furthermore, it is necessary to consider the expected costs and savings for the health system and for citizens, keeping in mind the social costs of the disease and the uncertainties introduced by the post-conflict rearrangements in the country. Valid predictive models must be in place to properly estimate the number of individuals in different age ranges for the growing Colombian population, to project the spread of HPV infection among women of different ages, regions and socio-economic strata, and to calculate the expected incidence, prevalence and death rate due to CC over long-term horizons. Finally, complex optimization problems must be solved to complete the prescriptive step and provide decision-makers with effective intervention roadmaps.


Dr. Ivan Mura received an Italian Laurea university degree (equivalent to B.Sc. + M.Sc.) with honors in Computer Science in 1994 and a Ph.D. degree in Electronics, Informatics and Telecommunications Engineering in 1999, both from the University of Pisa, Italy. After completing his Ph.D. studies, he worked briefly at the Italian National Research Council as a Junior Researcher, and then joined the Motorola R&D Center in Turin, Italy, where he led the Modeling & Simulation team for 5 years. From 2000 to 2005 he led the involvement of Motorola Italy in the R&D projects CAUTION, CAUTION++ and DEGAS of the IV and V Framework Programme of the European Commission. While at Motorola, he acted as Project Manager for several software development projects and continued his formal education, obtaining in 2005 an M.Sc. degree in Information Technology Project Management from the Business School of George Washington University. He was a Motorola internal trainer in Software Project Management and CMMI topics from 2000 to 2006. In 2007, he joined COSBI, a joint-venture research center on computational and systems biology located in Rovereto, Italy, as a senior researcher. At COSBI, he applied modeling and simulation approaches to answer questions about the functional organization and emergent properties of complex biological systems, as well as to elucidate the etiology of diseases and the possibilities of controlling biological networks. He joined Universidad EAN in Bogotá, Colombia, in 2012 as a full professor, teaching Information Technology Project Management, Modeling and Simulation, Technology and Knowledge Management, Quantitative Research Methods, and System Thinking in undergraduate and graduate programs.
He continued his research work on systems modeling and computational biology topics, strengthening and expanding a network of collaborations that includes members of renowned institutions (King’s College, London, Duke University, BBSRC Institute for Food Research, University of Luxembourg). In January 2016 he joined the Department of Industrial Engineering at Universidad de los Andes as a Visiting Professor. He is currently the Director of the M.S. program of Analytics at Universidad de los Andes. His teaching commitments include the undergraduate-level course Probabilistic Modeling, and the graduate-level courses Stochastic Processes, Advanced Simulation Techniques and Software Tools for Data Analysis, Thesis Seminar I and Applications Seminars II, the latter three taught in the M.Sc. Analytics program. He is a member of the COPA research group and has been establishing a consistent network of collaborations, creating focus areas where teams join efforts around research projects such as decision-making policies for cervical cancer, data analytics of air contamination in Bogotá, optimization of chemical production plant layouts, and sustainable development. He is a member of the INFORMS, IISE and IEEE societies.


Michael Zargham

Director, Data Science, Cadent
Cross MediaWorks

Reinventing Cadent’s Unwired Network: Experience in Research, Development, Deployment and Measurement of Empirical Business Software

The profitability of Cadent Network is dictated by its ability to estimate the availability and impression value of local cable inventory. Cadent predicts the available impressions two years out across 70 different cable networks and 200 MVPD partners with over 2,600 local order insertion points. This talk follows the challenges Cadent Network overcame in its efforts to modernize its advertising inventory forecasts and to manage the presale of cable advertising inventory whose true supply is uncertain and volatile. We present a case study detailing the following aspects of our data-driven transformation: 1) our empirical research & development methodology, 2) system identification and solution architecture, 3) our team building & tool selection process, 4) an application-specific machine learning algorithm, 5) direct integration of advanced analytics with business software, 6) predictive model accountability visualization tools, and 7) the top-level business impact of the transition to empirical business practices.


Dr. Michael Zargham is the Director of Data Science at Cadent, a leading provider of media, advertising technology and data solutions for the pay-TV industry. Michael received his PhD in Optimization and Decision Science from the University of Pennsylvania with a focus on constrained resource allocation problems. Michael currently leads the Data Science and Engineering initiatives for Cadent’s media services and technology divisions. When Michael isn’t innovating Cadent’s data solutions, he can be found in the classroom teaching Convex Optimization at UPenn. Michael has been a practicing data-driven business architect since 2005, working on various subcontracts during his undergraduate and graduate work.