
Poster Sessions

Poster presentations were scheduled in four sessions on Monday, April 12, from 11:00am to 12:15pm EDT. The poster presentations were the only event on the program during this time so that all conference participants could attend.

Poster Session Winners

First Place

Understanding and Predicting Project Payment Latency

Theo Ginting, Erika Ergart, Rassul Yeshpayev, Sheng Yang Chou, Matthew A. Lanham from Purdue University

Second Place

Assessing Attitudes Toward Brands Across Languages

David DeFranza, Arul Mishra, Himanshu Mishra, University of Utah

Third Place

Feature Engineering for Sparse Demand Prediction

Hsiao Yu Hsu, Robyn Campbell, Stefanie Walsh, Zinnia Arshad, Matthew A. Lanham, Purdue University

Session 1

Modified Measures of Evaluating the Value of Money

Main Arif Adnan, Silvey Shamsi, Bowling Green State University, Bowling Green, OH

Abstract
All previously derived measures of evaluating the value of money, such as Net Present Value, the Payback Method, the Discounted Payback Method, Internal Rate of Return, and Modified Internal Rate of Return, behave differently. Moreover, if the units of money change from project to project, comparability among the measures is lost. Modified forms of each of these measures are suggested to raise their level of comparability.

Risk Analytics for Renewal of Purchase Orders

Shubhi Asthana, Pawan Chowdhary, Taiga Nakamura, Roberta J. Mac Fadden, IBM Research – Almaden, San Jose, CA

Abstract
Transactions of goods and services between large businesses are often driven by purchase orders (POs). A PO is a document that details the sale of products and services to be delivered. This is also true for cloud service providers, with the added complexity of managing dynamic billing for services and invoicing against the purchase order. The demand-driven usage of cloud infrastructure and services makes costs unpredictable, so invoices may end up in dispute when the PO amount is exhausted well ahead of the contract end date. One needs to keep tabs on all such POs so that cash flow is not impacted by such disputes. In a large enterprise with many POs, managing billing against each PO becomes a large undertaking with increased cost. There is a need for a PO dispute management system that can proactively identify potential disputes in a timely manner and provide guided service to the workers managing such POs. In the current state of the art, a few solutions have targeted a cognitive approach to handling PO data. These include building a data-visualization dashboard for purchase order flow with various alert systems; another involves focusing on high-value POs or POs with enterprise customers. However, no significant machine learning models are transforming this process by strategizing the billing process based on client usage. The key is to utilize features of the PO along with customer data, market demand, and historical usage to identify the high-risk POs that require urgent attention and would benefit from successful renewal. In this poster presentation, we provide an approach to PO analysis that identifies the POs that run a risk of getting into a dispute and recommends those that have a higher chance of being renewed with a larger impact on cash flow.
In our approach, we first formulate the risk analytics model on future billing cycles and the status of each PO. Next, we use a time-series forecasting model to predict the customer's future usage of services. Lastly, we build a recommendation system that calibrates PO durations and their renewal so as to maximize the cash-flow impact. Additionally, we provide details of the implementation and results of our approach, illustrating its efficiency. We run our approach on a real-world services dataset from a global IT service provider with thousands of POs and more than a million invoice records. We cross-validated the performance of our model against actual invoices and human-in-the-loop (HitL) review and evaluated the risk factor. Our results are encouraging and promising. From this poster presentation, attendees will learn our approach to building a risk analytics model along with a recommender engine. The objective is for attendees to gain knowledge on how to recommend the renewal of POs using high-dimensional data influenced by various internal and external factors.
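The three-stage pipeline described in the abstract (risk scoring, usage forecasting, renewal recommendation) could be sketched roughly as follows. This is an illustrative reconstruction, not the authors' implementation: the `PurchaseOrder` fields, the moving-average forecaster standing in for their time-series model, and the risk formula are all simplifying assumptions.

```python
# Hypothetical sketch of the three-stage approach: (1) score each PO's risk
# of exhausting its budget before contract end, (2) forecast future usage
# with a simple moving average, (3) rank POs for renewal by risk x size.
from dataclasses import dataclass


@dataclass
class PurchaseOrder:
    po_id: str
    budget: float             # total PO amount
    billed_to_date: float     # invoices billed so far
    monthly_usage: list       # recent monthly billing history
    months_remaining: int     # months until contract end


def forecast_usage(history, horizon):
    """Naive moving-average forecast standing in for a real time-series model."""
    avg = sum(history[-3:]) / min(len(history), 3)
    return [avg] * horizon


def dispute_risk(po):
    """Risk (0..1) that the PO budget is exhausted before the contract ends."""
    projected = po.billed_to_date + sum(
        forecast_usage(po.monthly_usage, po.months_remaining))
    overrun = projected - po.budget
    return max(0.0, min(1.0, overrun / po.budget)) if po.budget else 1.0


def rank_for_renewal(pos):
    """Recommend high-risk POs first, weighted by their cash-flow size."""
    return sorted(pos, key=lambda p: dispute_risk(p) * p.budget, reverse=True)
```

Under these assumptions, a PO with 80 units billed against a 100-unit budget and roughly 10 units of usage per month over four remaining months projects to 120 units of billing and is flagged with a 20% overrun risk.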

Understanding and Predicting Project Payment Latency

Theo Ginting, Erika Ergart, Rassul Yeshpayev, Sheng Yang Chou, Matthew A. Lanham, Purdue University, West Lafayette, IN

Abstract
This study develops an order-to-cash process map and predictive solution to better understand construction project payment latency pre- and post-COVID-19. In construction projects, the window from the day a project is sold (and past rescission) to the time payment is received or the first installment is accepted is defined as "order-to-cash." This window often has many sequential or overlapping tasks that must be performed before the company receives its payment. The motivation for our study is that while order-to-cash was often a challenge to predict and minimize prior to COVID-19, it has been even more challenging for businesses to estimate since the pandemic. The risks of delayed processes and delayed customer payments can hurt the company's solvency and financial stability. In collaboration with a national construction company, we develop an order-to-cash process map and redesign their predictive modeling approach to show where the most uncertainty is coming from, and we provide empirically based operational recommendations showing how they could reduce order-to-cash not only prior to the pandemic but also during it. Our solution improved predictive accuracy during all time periods in our study. We believe practitioners and scholars alike focused on pre- and post-pandemic forecasting, particularly related to accounts receivable or queuing-based problems, would find our work valuable.

Session 2

Predicting Cannibalization Rate on the Jingdong Platform for Spend Optimization

Kai-Wei Yeh, Hsin Yu Pan, Xuan-Mai Nguyen, Niuying Cao, Matthew A. Lanham, Purdue University, West Lafayette, IN

Abstract
This study helps estimate and predict paid and organic marketing spend on JD.com, one of China's leading e-commerce platforms. With our model, sellers can better understand their advertising investments on this platform, such as in-site banner ads, cross-platform ads, advanced cross-platform ads, and search ads. We show how these different ads yield common KPIs such as click-through rate, conversion rate, and conversion value. Lastly, we formulate our predictive model into a simple optimization model that provides the seller with future marketing-spend recommendations to minimize organic purchase cannibalization.

The Price of Bureaucracy: Modeling the Matching of Affordable Housing Programs

Neha R. Gupta, Duke University, Durham, NC 

Abstract
Approximately 4.5 million American households receive some form of low-income housing assistance, which provides relief by ensuring federally imposed living standards and helping households save money. In most areas, eligible low-income households have one of two options. One is public housing, which places them in a unit that is part of a complex owned and maintained by the government. The second is a housing voucher, which provides a household with payments toward rent at a privately owned complex. Due to a severe supply limitation, local public housing authorities, which distribute vouchers and public housing units, maintain waitlists for eligible households. The majority of housing authorities do not incorporate household preferences, and if a household refuses a housing option that becomes available, the household is removed from the waitlist. Households have heterogeneous preferences over housing options. They can also fail to find a house using a voucher. I discuss the importance of a joint mechanism involving these two government programs, and discuss issues in public housing allocation research that neglects voucher programs. I define desirable properties of a joint mechanism and propose three possible algorithms. Simulating the three mechanisms, I illustrate the impact that relative preference for vouchers over public housing has on the utility of agents. I also show a shifting cost of searching for a voucher-accepting rental unit and the resulting impact on agents' utility. These simulations are policy-relevant, as jurisdictions consider imposing supply shocks to the amount of available public housing, or regulations protecting voucher-holders against discrimination when searching for rentals.
This problem is distinct from the “dorm allocation” problem studied in the mechanism design literature because of three key features: excess demand, the stochastic arrival of the goods that need to be allocated, and a conceptually different type of housing option.

Using the Generalized Random Forest Method to Assess Heterogeneous Effects of Airport Hubs on Flight Delays

Youngran Choi, Embry-Riddle Aeronautical University, Daytona Beach, FL


Session 3

Visualizing Demand Forecasts for More Effective Decision-Maker Decision-Facilitator Communication

Craig Mc Iver, Dawson McMahon, Kristian Komlenic, Oladimeji Adekoya, Matthew A. Lanham, Purdue University, West Lafayette, IN

Abstract
This case study provides a way to visualize and explain product forecasts to merchant decision-makers. In data science and merchandising decision support, a team of technical specialists (e.g., data scientists, analytics consultants, etc.) often develops predictive models and provides analytics to help the decision-maker perform their job. Analyses are often supported with statistical and business performance metrics to help level-set the decision-maker on how reliable the analytics they are receiving are. The motivation for our study lies in improving communication between the decision-facilitator and the decision-maker for a common analytical deliverable: product demand forecasts. Rather than focus on common statistical metrics that might not resonate with the decision-maker, we design and develop a visual tool in Tableau that allows bi-directional engagement between these stakeholders, showing how the forecast for any product was obtained, how it relates to similar products in similar locations, as well as other less obvious ways to group these forecasts and compare them to what has occurred in the past. We found this provides the ability to have a richer conversation among team members, which could lead to a better forecasting process for products that do not predict well, or improve confidence in the analytics the decision-maker(s) are receiving. We show how our tool helps explain the demand forecasting process to merchants, provides effective visuals to see how a forecast was derived, as well as visuals to judge whether the forecast makes sense compared to common decision-maker queries sent to decision-facilitators. Where forecasts do not make business sense and/or lack statistical predictive performance, our tool offers a simple-to-use post-model forecast-correction functionality that allows the decision-maker to be more engaged in the analytical process.

Assessing Attitudes Toward Brands Across Languages

David DeFranza, Arul Mishra, Himanshu Mishra, University of Utah, Salt Lake City, UT

Abstract
As consumer communications have moved online, social listening has become an increasingly important tactic for marketing and brand managers. However, extracting attitudes and sentiment from unstructured social media text remains a challenge. This problem is compounded for global brands, which must monitor sentiment across many languages. In this work, we introduce a flexible, robust method for measuring attitudes in unstructured text amenable to multilingual applications. We illustrate the utility of this method by measuring prejudicial sentiment associated with global brand names that receive a feminine versus masculine gender class, demonstrating that the act of translation itself may influence subsequent consumer attitudes.

Error and Optimism Bias Regularization in Machine Learning Models

Nassim Sohaee, University of Texas at Dallas, Richardson, TX

Abstract
In machine learning, the quality of a prediction is usually measured using various techniques and evaluation methods. In regression models, the goal is to minimize error. However, in many applications, just minimizing model error is not enough: the model should exert more systematic control over a specific type of error, such as overcasting or undercasting. This paper introduces a simple regularization term to manage the number of overcast (or undercast) instances in a regression model.
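One way such a term could look, as a minimal sketch and not the paper's actual formulation, is an extra weighted penalty applied only to over-predicted (overcast) instances, with a coefficient `lam` (an assumed name) controlling how strongly overcasting is discouraged:

```python
# Illustrative asymmetric regression loss: squared error plus an extra
# regularization penalty on overcast (over-predicted) instances only.
def asymmetric_loss(y_true, y_pred, lam=2.0):
    """Mean squared error with an additional lam-weighted penalty when
    the prediction exceeds the true value (an overcast instance)."""
    total = 0.0
    for yt, yp in zip(y_true, y_pred):
        err = yp - yt
        total += err ** 2              # base squared error
        if err > 0:                    # overcast instance
            total += lam * err ** 2    # extra regularization penalty
    return total / len(y_true)
```

With `lam=2.0`, an overestimate of one unit costs three times as much as an underestimate of the same size, pushing a model trained on this loss toward fewer overcast instances; swapping the `err > 0` condition penalizes undercasting instead.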

Session 4

A Life Insurance Policy Bundling Recommendation System

Mengwei Li, Shashi Pingolia, Tianyi Yang, Matthew A Lanham, Purdue University, West Lafayette, IN

Abstract
We have researched and developed a life insurance bundling recommendation system that identifies which current home or auto insurance policyholders are most likely to add a life insurance policy to their existing plans, as well as when to optimally recommend a life product to the customer. The motivation for this study is that insurance product bundling is a common practice in this industry. However, the implementation process of matching customers to the right products is not widely known and could likely be improved using analytical frameworks found in other domains. Generally, the life insurance business does not have integrated predictive analytics that can recommend and price policies in the same way as other insurance areas. For example, the property and casualty industry often utilizes a combination of generalized linear models, credibility techniques, and credit scoring models as part of its modeling techniques for driving business decisions (Abrokwah, 2016). However, we posit that an empirically validated methodological design for the cross-product bundling recommendation process in the insurance industry is an area that necessitates deeper analytical investigation. In collaboration with a major insurance company, we develop and deploy a recommendation engine that uses current policyholder information as features in an ensemble of predictive models to identify when to offer the life policy (single premium, term, or whole life) bundle recommendation that is most likely to be purchased. Our solution has provided the insurance company a more efficient, analytically driven, and scalable approach to selling additional products that their customers really want, increasing business revenue. We believe our methodology connects the recommendation-system literature to the insurance industry and can be easily adapted by practitioners in this field.

Feature Engineering for Sparse Demand Prediction

Hsiao Yu Hsu, Robyn Campbell, Stefanie Walsh, Zinnia Arshad, Matthew A. Lanham, Purdue University, West Lafayette, IN

Abstract
This study provides feature engineering recommendations for predictive modelers, data scientists, and analytics practitioners on how to improve demand forecasts for sparsely demanded specialized products based on collaborative experiments with a national auto parts retailer. Any seasoned modeler knows that predictive modeling is a process and there are many possibilities on how one might clean, pre-process, and format their data prior to training a model. Additional complexity often arises based on the problem characteristics (e.g., temporal response, intermittent demand, sparse demand, etc.) that can make identifying the signal from the noise even more challenging. In the field there is often some discussion of “the art with the science” on best practices to perform to achieve model accuracy good enough to support major business decisions. While there are many suggestions on methodologies for problem types, and general feature engineering ideas, there is no large-scale study to date that provides an in-depth empirical investigation of feature engineering approaches and their associated predictive gains when trying to predict sparse demand – which is one of the most challenging prediction problem classes one can encounter in practice. In collaboration with a large national auto parts retailer, we develop predictive models to predict demand for 47k+ products where 26k of them have less than five units sold in a year. Problems such as these are common in medicines, specialty products, and auto and military spares. What is novel about our study is that we run thousands of feature engineering experiments to identify where we see cross-validated predictive gains for a set of common predictive modeling algorithms. 
These experiments include various categorical encoding schemes (one-hot, frequency, label, hash, and target encodings), various scaling and transformation techniques, outlier handling for numeric data types, as well as variable-fusion strategies such as interactions, powers, and ratios. This work is unique because much of the literature focuses on predicting product demand at larger quantities (non-sparse demand), supervised learning methods, or general feature engineering ideas. We show how to implement a similar large-scale feature engineering study, provide empirical insights into where we achieved noticeable gains, and explain why what we observed with our data could likely work with your sparse demand problem.
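Two of the encodings named above, frequency and target encoding, can be sketched in a few lines. This is an illustrative sketch, not the authors' code; in a real study the target encoder would be fit on training folds only to avoid target leakage, typically via a library such as scikit-learn or category_encoders:

```python
# Minimal sketches of frequency encoding and target encoding for a
# single categorical column, using only the standard library.
from collections import Counter, defaultdict


def frequency_encode(values):
    """Replace each category with its relative frequency in the column."""
    counts = Counter(values)
    n = len(values)
    return [counts[v] / n for v in values]


def target_encode(values, target):
    """Replace each category with the mean target value for that category."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, t in zip(values, target):
        sums[v] += t
        counts[v] += 1
    return [sums[v] / counts[v] for v in values]
```

For sparse-demand data, encodings like these keep rare categories informative (a rare part number maps to a small frequency or its own demand mean) where one-hot encoding would produce thousands of nearly empty columns.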

Registration and Awards

Each track will be judged separately, with 1st, 2nd, and 3rd place awards. Winners will be announced and notified before the conference is over based on (1) novelty of application, (2) results (or potential results) from implementation, and (3) presentation of work.

  • The INFORMS Data Mining Section is sponsoring a $500 award for the top student poster. Topics need not concentrate on data mining to be eligible. Scoring will be based on content, layout, and presentation. Certificates will be awarded for 2nd and 3rd place student posters. The top three winners will also receive a free voucher for a SAS certification. The judging panel is posted below.
  • 2nd and 3rd place winners in both tracks will receive a certificate of recognition.

Poster Judges

Lynn Letukas, PhD
Director, Global Academic Programs & Certifications
SAS
https://www.linkedin.com/in/lynn-letukas/

Daniel Steeneck, PhD
Co-Founder & Chief Scientist
Akuret Solutions
https://www.linkedin.com/in/daniel-steeneck-0b5b9755/

Shaun Doheney, PMP, CAP
Sr Data and Analytics Strategy Advisor
Amazon Web Services
https://www.linkedin.com/in/shaundoheney/

John Colias, PhD
Affiliate Assistant Professor
Director of Business Analytics Program
University of Dallas
https://www.linkedin.com/in/john-colias-62619b2/

Norm Reitter
Chief Analytics Officer & Sr VP of Analytics Operations
CANA Advisors
https://www.linkedin.com/in/norm-reitter/