Poster presentations will be scheduled in two sessions held after lunch on Monday and Tuesday. The poster presentations will be the only events on the program during these times so that all conference participants can attend.
Poster submission has closed.
New This Year
The poster competition will feature two separate tracks: (1) Practitioners and (2) Students.
- Optimize the London Bike Sharing System through Advanced Analytics
Filippo Focacci, Decision Brain, Paris, France
Based on machine learning and mathematical optimization models, DecisionBrain’s applications are used every day to optimize the workforce of market leaders in the services industry such as ISS, JLL, and Serco. In this session, we will discuss how we optimize operations and the mobile workforce for the London Cycle Hire Scheme (LCHS), one of the largest bike-sharing systems in the world.
- Assessing Risk of Heart Attack in Diabetes Patients: A Linear Programming Based Classification Approach
Mira Shukla, Rema Padman, Carnegie Mellon University, Pittsburgh, PA
More than 400 million adults are living with diabetes around the world. Heart disease that leads to heart attack or stroke is the leading cause of death in this population. Specifically, they are two to four times more likely to die from heart disease than those without diabetes. This study formulates the assessment of heart attack risk for diabetic patients as a non-parametric, computationally efficient Linear Programming (LP) model, minimizing the sum of deviations from a decision boundary to classify patients into high and low risk groups. We evaluate this model using a labeled dataset of 588 diabetic patients with Archimedes model-based heart attack risk predictions. Preliminary results indicate that the LP model is comparable to traditional statistical models such as Logistic Regression and Linear Discriminant Analysis, and shows promise in providing scalable and generalizable clinical decision support for early detection and efficient management of patients with chronic conditions.
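A minimal sketch of the minimize-sum-of-deviations LP classifier described above follows; the PuLP solver, the unit margin used to normalize the boundary, and the toy patient data are illustrative assumptions rather than details from the study.

```python
# Minimal sketch of a minimize-sum-of-deviations LP classifier in the spirit
# of the abstract above. PuLP, the unit margin, and the toy data are
# illustrative assumptions, not the authors' implementation.
import pulp

# Toy feature rows (patients) and labels: 1 = high risk, 0 = low risk
X = [[0.9, 0.7], [0.8, 0.9], [0.2, 0.1], [0.3, 0.2]]
y = [1, 1, 0, 0]
n, p = len(X), len(X[0])

prob = pulp.LpProblem("msd_classifier", pulp.LpMinimize)
w = [pulp.LpVariable(f"w{j}") for j in range(p)]               # boundary weights
b = pulp.LpVariable("b")                                        # cutoff value
d = [pulp.LpVariable(f"d{i}", lowBound=0) for i in range(n)]    # deviations

prob += pulp.lpSum(d)  # objective: minimize total deviation from the boundary
for i in range(n):
    score = pulp.lpSum(w[j] * X[i][j] for j in range(p))
    if y[i] == 1:
        prob += score + d[i] >= b + 1   # high-risk cases should score above b
    else:
        prob += score - d[i] <= b - 1   # low-risk cases should score below b
prob.solve(pulp.PULP_CBC_CMD(msg=False))

print("weights:", [v.value() for v in w], "cutoff:", b.value())
```

The +1/-1 margin simply avoids the trivial all-zero solution; a new patient would be classified by comparing the weighted score against the fitted cutoff.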
- Aircraft Utilization Comparison and Prediction Between Legacy Carriers and Low-Cost Carriers
Daniel X. Zhou, Iowa State University, Ames, IA
After recent mechanical malfunction incidents involving Allegiant and Southwest Airlines, we investigate whether low-cost carriers (LCCs) are taking an overly aggressive stance with regard to the utilization of aircraft within their fleets. Based on summary reports obtained from incident logs generated by the FAA (Federal Aviation Administration) and NTSB (National Transportation Safety Board), Allegiant Airlines was almost three and a half times as likely to encounter a mid-air breakdown as legacy carriers. On the economic front, the fallout Southwest Airlines faced from the Flight 1380 incident, in which an engine fan blade sheared, may have contributed to an immediate decline in ticket reservations. From a cost-savings perspective, a 2015 forecast analysis by ICF International predicts a 40 percent increase in total worldwide fleet size across all airlines between 2015 and 2025. With a global fleet approaching 40,000 aircraft by 2025, historical utilization data could play a key role in profit maximization for strategic forecasting of airline maintenance and fleet planning; such data could help airlines make more informed fleet planning and maintenance scheduling decisions by taking past patterns and seasonality effects into account. This study examines airplane utilization by legacy carriers and LCCs in their short-haul and medium-haul operations and draws comparisons between airlines, focusing on aircraft types dedicated to flying such routes. The objective is to examine whether LCCs are utilizing their fleets too aggressively, and to find common patterns and seasonality across all airlines that can serve as a general guideline for maintenance scheduling and fleet planning that incorporates safe flying practices while maximizing profits. In addition, the study uses machine learning algorithms to explain the utilization patterns observed in historical data, testing their feasibility and effectiveness as accurate forecasting tools for airplane utilization models. Current research in the airline industry addresses problems ranging from forecasting future fleet demand to forecasting passenger load factors, but places less emphasis on problems that could benefit from an accurate utilization model, such as fleet planning and maintenance scheduling. This study aims to fill that gap by analyzing airplane utilization using historical airline data and various machine learning algorithms.
- Supply Chain Demand Clustering and Forecasting for Consumer Goods
Michael Prokle, Fortune Brands GPG, Boston, MA
Cindy Hulse, Robert Swidarski, Marie Gaudard, David Mintz
Accurately predicting demand across different time horizons for a product portfolio consisting of thousands of stock keeping units across multiple sales channels worldwide is inherently difficult. Demand patterns vary greatly over the product portfolio, sales channels, and life stage, showing lumpy, intermittent, and seasonal demand patterns influenced by key factors such as regional consumer behavior, sales targets, promotions, and economic indicators. Forecasting methodologies vary over the operational, tactical, and strategic forecast time horizons and require the integration of various data sources available across an organization and its supply chain. We take the example of a consumer goods manufacturer that sells thousands of stock keeping units on the US market through three major sales channels (wholesale, retail, and e-commerce). Aiming toward a demand-driven value network, we highlight the practical complexity and the research steps necessary to model the problem. We show preliminary results that highlight the reduction in dimensionality achieved by clustering demand patterns based on global features extracted from the demand time series, and describe meaningful cluster insights. We highlight approaches to determine the best feature set and test various forecasting methodologies for the resulting clusters against current best practice. We end with future research steps and practical implications.
- A New Model for Supply Chain Finance
Grace Lin, Asia University, Wufeng, Taiwan
The prosperity of SMEs is key to overall economic development. In this new era of the FinTech economy, serving the financial needs of small and medium businesses (SMBs) becomes increasingly important, yet information regarding SMBs is very limited. It has been extremely challenging for SMBs to get financial support due to the lack of credit rating information and mechanisms. In this research, we leverage supply chain network information as well as advanced NLP, multi-source learning, and stochastic network modeling, analysis, and optimization to better assess SMBs’ credit ratings and risks and to provide financial services through loans, P2P lending, or crowdsourcing as needed. We will share a few industry scenarios and our experience working with several major companies to support their suppliers’ financial needs.
- A Predictive Analytics Approach to Understand, Predict, and Reduce Distribution Center Stockouts
Ankit Pandey, Purdue University, West Lafayette, IN
Nikeeta Brijwasi, Mayank Daga, Amjad Hasnian, Matthew A. Lanham
This study provides a solution to understand, predict, and reduce distribution center stockouts for a major CPG conglomerate. The motivation for this study is that distribution center (DC) stockouts have a direct impact on the revenue and profitability of the CPG as well as the retailers they supply. The ability to accurately predict and identify stockout root causes at the DC can lead to better business performance measures and improve customer satisfaction downstream. This study highlights the need for better prediction for planning purposes, but is novel in identifying and understanding the causal drivers of why such stockouts occurred. In collaboration with a national CPG partner, we explored different time series forecasting approaches such as naïve methods (moving averages), Holt-Winters, and ARIMA, and compared the insights gained with more sophisticated machine learning approaches such as long short-term memory (LSTM) recurrent neural networks. A root-cause analysis model was developed to ascertain whether stockouts are caused by variability in demand-side parameters, such as safety stock calculations and stock allocation rules, or on the supply side, such as production schedules and material procurement policies. Our solution categorizes different products into seasonal, stable, and erratic groups based on their demand patterns. Specific predictive models are then implemented to predict stockouts for each category. We found that while ARIMA and Holt-Winters methods were highly accurate in predicting stockouts for seasonal products, LSTM recurrent neural networks performed better on products with erratic demand patterns. We then used the model predictions to determine safety stock levels for each product. We discuss how business decisions such as production schedules, material procurement policies, shipping, and replenishment policies can affect stockouts beyond demand patterns, and recommend business actions to mitigate stockouts resulting from such factors. Overall, our cross-validated solution is expected to reduce DC stockout incidents and provide a better understanding of where improvements can be made based on root-cause identification.
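As a small illustration of the kind of comparison described above, the sketch below fits Holt-Winters and ARIMA models to a synthetic weekly demand series with statsmodels and compares holdout error; the series, the model orders, and the holdout length are assumptions, not the partner's data or the study's exact setup.

```python
# Illustrative sketch (not the authors' code): fit Holt-Winters and ARIMA on a
# synthetic weekly demand series and compare forecast error on a holdout.
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
idx = pd.date_range("2017-01-01", periods=156, freq="W")
demand = pd.Series(100 + 20 * np.sin(2 * np.pi * np.arange(156) / 52)
                   + rng.normal(0, 5, 156), index=idx)

train, test = demand[:-13], demand[-13:]   # hold out the last quarter

hw = ExponentialSmoothing(train, trend="add", seasonal="add",
                          seasonal_periods=52).fit()
arima = ARIMA(train, order=(1, 1, 1)).fit()

hw_fc = hw.forecast(len(test))
arima_fc = arima.forecast(len(test))

mae = lambda fc: np.mean(np.abs(fc.values - test.values))
print(f"Holt-Winters MAE: {mae(hw_fc):.2f}  ARIMA MAE: {mae(arima_fc):.2f}")
```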
- Data Wrangling to Data Science in Revenue Management: A Dynamic Pricing Model for the Vacation Rental Industry
Changxuan Liu, Purdue University, West Lafayette, IN
Yachu Liu, Jiangtao Xie, Nelson Chou, Matthew A. Lanham
We develop a dynamic pricing model for the vacation rental industry that fuses public and proprietary data to optimally price a vacation rental while maximizing the firm’s key performance indicator (KPI).
The motivation for this research is the increased interest in expanding data science and analytics capabilities within the revenue management business area. More broadly, it is believed that revenue management will increasingly influence industries beyond the traditional service sector wherever perishability of revenue exists. Examples include cloud computing platform providers, financial service companies, and telecom service providers. Data-driven dynamic pricing strategies leveraging external information about the business ecosystem play an essential role in profitability.
In collaboration with a world-wide vacation rental company we fuse the firm’s proprietary vacation rental data with other rentals on the market using customized web crawlers that capture the listed price and features of those rentals. Our model ingests this data to make pricing recommendations that optimize the firm’s KPI. Our pricing model uses price, distribution channels, unit features, lead time, stay length, and other features, and provides a propensity of unit purchase given a posted price, and the current market competition.
Lastly, we can recommend the price and distribution channels that maximize the overall KPI. This not only lets the organization plan better, but also allows the firm greater flexibility to improve and price more aggressively in geographical areas where it is expanding.
- Semantic Structure Identification using Image Detection Algorithms
Udaiveer Singh Chauhan, Purdue University, West Lafayette, IN
Nipun Diwan, Ananth Nath, Daniel Whitenack
In this project, we aim to develop a tool capable of identifying useful semantic structures from various file formats such as PDFs and images. The tool attempts to identify cells of semantic continuity (e.g., table cells) within files using image recognition and table identification from images. The tool first converts each page of the document to an image in order to detect these cells. The second stage detects semantic structures and linkages within the data (such as data present in tables). To perform this task, we collected over 400 PDF files that contain textual as well as tabular data. These files are used in our model as part of the training data and are converted to JPEG format using OpenCV. Tools such as PyTesseract are used to identify cells and create bounding boxes around them, and machine learning/artificial intelligence frameworks such as TensorFlow and Luminoth are utilized to extract tables from the images by training deep learning algorithms. Finally, arrays of these bounding boxes are analyzed to detect semantic structures.
- Using Conditional Random Fields to Forecast Product Promotions
Aniketh Kulkarni, Purdue University, West Lafayette, IN
Sudeep Kurian, Ananthapadmanabhan Sivasankaran, Matthew A. Lanham
This study examines the efficacy of Conditional Random Fields (CRFs) to estimate the demand for products on promotion. The study involves understanding the historical performance of promotions, products or product groups, and clustering (by volume of product groups). The primary goal is to generate a predictive model that estimates demand for different promotions and to provide a working tool that uses these forecasts to help merchants and category managers estimate the promotional effect for their categories. The motivation for this research is that, in practice, data is often not recorded in its entirety, so predictive models built from it must deal with some level of uncertainty. In sequence modeling, a graph is usually represented as a chain where a sequence of observed variables X_i represents the observations and Y is a hidden state variable that must be inferred. The sequential outputs Y_{i-1}, Y_i, ..., Y_{i+k} form a chain that allows one to determine the most likely label sequence given the observations. This lends itself to promotional retail problems where products might have been on sale during a time window but were not recorded as such. For example, a pair of shoes sized 9 might have sold on promotion, but the same pair sized 12 was not shown as sold on promotion, even though the price was the same and both sold within the same store. We posit that CRFs might first help identify true promotional sales versus unknown promotional sales, which can lead to a more complete dataset when making inferences regarding the promotional effect. In our study, we collaborate with the data science team within a major business application company and use time series transaction data, which is plagued with many missing values. We implement CRFs to model promotions for a particular good-store combination based on the historical data of the product in conjunction with other similar products. We provide some evidence that retailers can fill in missing entries in the promotions data using CRFs, which can also yield more accurate predictions of promotions.
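The sketch below shows a linear-chain CRF of the kind described, labeling weeks of a toy sales sequence as promotional or not; the sklearn-crfsuite library, the hand-crafted features, and the toy sequences are illustrative assumptions rather than the study's implementation.

```python
# Illustrative linear-chain CRF sketch (sklearn-crfsuite assumed; toy data),
# labeling each week in a sales sequence as "promo" or "no_promo".
import sklearn_crfsuite

def week_features(seq, i):
    """Simple per-week features derived from the observed sales sequence."""
    return {
        "units": seq[i],
        "prev_units": seq[i - 1] if i > 0 else 0,
        "jump": seq[i] - (seq[i - 1] if i > 0 else seq[i]),
    }

# Two toy store-item sequences of weekly unit sales with known promo labels
sequences = [[10, 11, 40, 42, 12], [9, 35, 36, 10, 11]]
labels    = [["no_promo", "no_promo", "promo", "promo", "no_promo"],
             ["no_promo", "promo", "promo", "no_promo", "no_promo"]]

X = [[week_features(s, i) for i in range(len(s))] for s in sequences]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1,
                           max_iterations=100)
crf.fit(X, labels)

# Infer promo weeks for a new, unlabeled sequence
new_seq = [10, 12, 38, 39, 11]
print(crf.predict([[week_features(new_seq, i) for i in range(len(new_seq))]]))
```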
- Flexible Named Entity Recognition for Date-related Entities in Financial Documents
Jaideep Dutta, Purdue University, West Lafayette, IN
Deepika Jindal, Shubham Arora, Daniel Whitenack
Currently, financial firms perform categorization of financial documents, such as Annual Reports, Audit Reports, and Risk Assessment Reports, manually based on relevant date-related entities in the documents (e.g., date, timestamp, or interval). That manual process is slow and error-prone. This study examines Named Entity Recognition (NER) of date-related entities using flexible deep learning models. These flexible models allow for the automatic detection of various date and time formats, which could help firms reduce the manual effort required to categorize documents. At the same time, the models could help create a well-organized, automated process that reduces manual parsing error. In collaboration with an innovative tech company in the financial space, we analyzed financial documents to find and annotate various date-related entities. This annotated dataset was then utilized as training data to create a deep learning model with TensorFlow for NER. We highlight the various accuracy measures associated with our methodology to showcase the effectiveness of our model.
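The abstract does not specify the network architecture, so the sketch below is only an assumed baseline: a BiLSTM token classifier in TensorFlow/Keras for date-related entity tags, with an illustrative vocabulary size, tag set, and random stand-in data in place of the annotated documents.

```python
# Minimal sketch of a BiLSTM token-classification model of the kind described,
# using TensorFlow/Keras. Vocabulary size, tag set, and the random stand-in
# data are illustrative assumptions, not details from the study.
import numpy as np
import tensorflow as tf

VOCAB, MAXLEN, N_TAGS = 5000, 60, 3   # tags: O, B-DATE, I-DATE (assumed)

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB, 64, mask_zero=True),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
    tf.keras.layers.Dense(N_TAGS, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Toy integer-encoded sentences and per-token tag ids stand in for the
# annotated financial documents.
X = np.random.randint(1, VOCAB, size=(32, MAXLEN))
y = np.random.randint(0, N_TAGS, size=(32, MAXLEN))
model.fit(X, y, epochs=1, batch_size=8, verbose=0)

pred_tags = model.predict(X[:1]).argmax(-1)   # predicted tag id per token
print(pred_tags.shape)  # (1, 60)
```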
- An Intelligent Framework For Optimal Management Of Data Collection Process
Kannan Balaji, Nielsen Company, Oldsmar, FL
Sreeraman K. Krishnan, Reddy Shivampet
Retail data collection across various countries of the world requires massive planning and execution of data collection activities in various types of retail stores such as grocery, pharmacy, and convenience stores. When planning data collection activities, one has to take into account resource (data collector) availability and store availability, then optimally allocate and route resources to stores for various data collection activities based on data collection time, travel time between stores, and various other constraints related to stores, resources, and processes. After an optimal data collection plan is generated and while it is being executed, operational restrictions such as temporary unavailability of stores or data collectors trigger adjustments to the plan. Finally, during data collection, compliance with the plan must be ensured and the quality of the data collected needs to be tracked. In this work we propose an analytics framework, based on optimization and machine learning methodologies, to optimally manage the data collection process end to end: from optimal planning of data collection activities to ensuring that quality data is collected by data collectors complying with the optimal plan.
- Finding the Best Production Forecasting Method
Xiaoyu Wu, Duke University, Durham, NC
Forecasting industrial production is important because it enables firms to understand the distribution of their customers and analyze economic development, yet very little research has been conducted on the best forecasting model for production. This study fills that gap. Several forecasting methods are applied to several decades of quarterly clay brick production data, and a large-scale comparison is presented. The models we consider are ARIMA, seasonal decomposition, and spline regression. Seasonal decomposition produced the lowest error and is, therefore, our preferred model. In addition, we present tests of these models on quarterly production data of Australian beer; seasonal decomposition remained the most accurate forecasting method when applied to this different dataset.
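The exact decomposition-based forecasting procedure is not given in the abstract; the sketch below is one plausible version of the idea, decomposing a synthetic quarterly series with statsmodels, extrapolating the trend with a simple drift, and adding back the last observed seasonal pattern.

```python
# Illustrative sketch (not the study's code) of a seasonal-decomposition
# forecast: decompose a quarterly series, extrapolate the trend with a drift,
# and add back the last observed seasonal pattern.
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

rng = np.random.default_rng(1)
idx = pd.period_range("1970Q1", periods=120, freq="Q").to_timestamp()
y = pd.Series(200 + np.arange(120) + 30 * np.tile([1, -1, 2, -2], 30)
              + rng.normal(0, 5, 120), index=idx)

dec = seasonal_decompose(y, model="additive", period=4)
trend = dec.trend.dropna()

h = 8  # forecast horizon in quarters
drift = (trend.iloc[-1] - trend.iloc[0]) / (len(trend) - 1)
future_trend = trend.iloc[-1] + drift * np.arange(1, h + 1)
future_seasonal = np.tile(dec.seasonal.iloc[-4:].values, h // 4 + 1)[:h]

forecast = future_trend + future_seasonal
print(np.round(forecast, 1))
```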
Xiaoyu Wu received her Bachelor’s degree in Mechanical Engineering from Shanghai Donghua University (China) in 2018. Currently, she is a student in the Engineering Management program at Duke University. She participated in technology and innovation projects as a research assistant from 2014 to 2018. From 2016 to 2018, she won awards in the National 3D Printing Contest, the Mechanical Innovation Contest, and the Innovation Contest, and gained start-up capital for the Shanghai Undergraduate Entrepreneurship Project. In 2017, her paper was accepted by the 2017 2nd International Seminar on Advances in Materials Science and Engineering (ISAMSE 2017) and published in IOP Conference Series: Materials Science and Engineering (MSE) (ISSN: 1757-8981). In 2019, she published her patent application. Her current interests are in operations management, data analysis, and forecasting.
- An Unsupervised Clickstream Solution to Better Understand the Customer Journey
Nitin Sahai, Purdue University, West Lafayette, IN
Ankit Anand, Rohit Kata, Srinija Vobugari
We develop an unsupervised learning solution to capture user behavior from clickstream data and map a customer’s journey based on click events and search tables, in order to understand what customers have done before and after a search. The motivation for this solution is that online services are increasingly dependent on user participation to understand their customers’ needs and potential wants. This study focuses specifically on understanding the customer journey through the lens of website state-to-state clicks (often only part of the journey). Our solution provides: (1) a way to understand a feasible number of journeys taken by a customer group, which helps in predicting the frequency and accuracy of searches, and (2) identification of likely future states that could help streamline marketing efforts or improve the customer experience. We developed our solution in collaboration with a healthcare navigation company that has accumulated a massive volume of health-related product searches. We found that the sheer number of users and potential paths to traverse on the web portal meant that almost no two users had identical journeys. To address this, we developed a model to estimate whether two journeys were similar and then grouped all users with similar journeys into their own segment. Next, we devised an approach to encode the customer journey into a pattern of likely sequence strings. Then, we used a combination of N-gram frequencies and Hidden Markov Models (HMMs) to estimate state-to-state probability transitions.
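The final step mentioned above, estimating state-to-state transition probabilities from encoded journeys, can be sketched as follows; the journey strings and the state alphabet are toy assumptions, not the partner company's encoding.

```python
# Minimal sketch of estimating state-to-state transition probabilities from
# encoded customer journeys (toy journey strings, not the partner's encoding).
import numpy as np

journeys = ["HSRP", "HSRS", "HSP", "HRSP"]   # e.g. H=home, S=search, R=results, P=purchase
states = sorted({c for j in journeys for c in j})
index = {s: i for i, s in enumerate(states)}

counts = np.zeros((len(states), len(states)))
for j in journeys:
    for a, b in zip(j, j[1:]):
        counts[index[a], index[b]] += 1

# Row-normalize to transition probabilities (rows with no transitions stay 0)
row_sums = counts.sum(axis=1, keepdims=True)
P = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

print(states)
print(np.round(P, 2))   # P[i, j] = probability of moving from state i to state j
```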
- Community Detection in the Customer-Product Bipartite Graph
Lili Zhang, Kennesaw State University, Kennesaw, GA
Customer segmentation and market basket analysis are foundational applications of a CRM strategy. However, customer segmentation becomes challenging on massive datasets due to the computational complexity of traditional clustering methods, and market basket analysis may suffer from association rules too general to be relevant for important segments. We propose to partition a large number of customers and discover associated products simultaneously by detecting communities in the customer-product bipartite graph using the Louvain algorithm. Through post-clustering analysis, we show that the clusters have distinct and statistically significant characteristics from the perspectives of both customers and products. Our analysis provides greater insight into customer purchase behavior, potentially supporting personalization strategy (e.g., customized product recommendations) and increasing profitability, and our case study of a large U.S. retailer generates useful management insights. Moreover, the graph analysis of 800,000 sales transactions finished in 7.5 seconds on a standard PC, demonstrating its computational efficiency and suitability for massive datasets.
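A minimal sketch of Louvain community detection on a customer-product bipartite graph is shown below, using networkx (version 3 or later assumed for louvain_communities); the transactions are toy data, not the retailer's.

```python
# Illustrative sketch of Louvain community detection on a customer-product
# bipartite graph using networkx (>= 3 assumed); toy transactions only.
import networkx as nx

transactions = [("cust1", "milk"), ("cust1", "bread"), ("cust2", "milk"),
                ("cust3", "beer"), ("cust3", "chips"), ("cust4", "beer")]

G = nx.Graph()
for customer, product in transactions:
    # Edge weight counts repeat purchases of the same product by the same customer
    if G.has_edge(customer, product):
        G[customer][product]["weight"] += 1
    else:
        G.add_edge(customer, product, weight=1)

communities = nx.community.louvain_communities(G, weight="weight", seed=42)
for k, nodes in enumerate(communities):
    print(f"community {k}: {sorted(nodes)}")
```

Each detected community mixes customer and product nodes, which is what lets segmentation and product association fall out of a single partition.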
- Probabilistic Logic in Massively Customized Products
Matthew Colvin, General Motors, Austin, TX
Many industries, such as the automotive industry, offer a very large variety of customer-facing options. With the associated engineering features, the number of possible product offerings grows exponentially and is managed through configurability rules. A bill of materials calculation is a logical process executed against a complete and valid configuration to determine the parts used. Orders for parts must often be placed months or years before any customer orders are placed, making accurate probabilistic calculations of need a key element of profitability. Forecasts will often exist for individual options and even small combinations of critical options, but not for full configurations or even large fractions of configurations. These forecasts can be logically inconsistent and understated, leading to ambiguity about how many parts are needed. This talk looks at applying satisfiability and mathematical programming to identify these gaps precisely in a computationally fast manner.
- Carrier Optimization for Ecommerce Deliveries at Nordstrom
Nazanin Zinouri, Nordstrom, Seattle, WA
Rao Panchalavarapu, Loren VandenBerghe, Michael Manzano
Nordstrom’s transportation network is complex, with deliveries destined to stores and e-commerce customers. E-commerce deliveries are made from multiple origins to several 3-zip locations in the US and to international destinations. The e-commerce network is significant, with millions of shipments delivered to customers and substantial transportation cost incurred on an annual basis. In addition, the significant growth rate of e-commerce deliveries requires innovative approaches to plan and manage the network to improve service level and reduce delivery cost. Deliveries in the e-commerce network are accomplished through a combination of national, regional, and linehaul carriers. Based on the cost structure, service level, and delivery region offered by the variety of carriers, there is an opportunity to identify an optimal mix of carriers at a strategic level. At a tactical to operational level, transportation carriers require shippers to provide an estimate of the number of packages delivered in order to estimate the required ground and air capacity at each shipping location. Providing these estimates is critical during holidays and special events. Historically, Excel-based analysis has been the basis for evaluating the strategic and tactical decisions described above. This presentation provides an overview of a methodology based on integer programming to address the carrier selection problem at a strategic level and the freight transportation planning problem at a tactical to operational level. The model incorporates strategic and operational constraints and minimizes freight transportation cost. The model is implemented in Python and solved using Gurobi. The Nordstrom Supply Chain and Transportation team was able to utilize the results of this analysis to facilitate strategic and operational decisions. The optimization methodology described in this presentation has resulted in savings in freight transportation cost while retaining the desired service level. Based on the implementation experience and feedback from the leadership team, we have identified opportunities for various extensions of the model.
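The abstract notes the model is implemented in Python and solved with Gurobi; the sketch below is a heavily simplified carrier-selection integer program in gurobipy with toy carriers, regions, costs, and capacities, omitting the many service-level and operational constraints of the real model.

```python
# Much-simplified carrier-selection integer program in gurobipy. Carriers,
# regions, demands, costs, and capacities are toy assumptions, not Nordstrom
# data; the real model carries many additional constraints.
import gurobipy as gp
from gurobipy import GRB

regions = ["980", "945", "100"]                  # destination 3-zip regions
carriers = ["national", "regional", "linehaul"]
demand = {"980": 1200, "945": 800, "100": 1500}   # packages per region
capacity = {"national": 3000, "regional": 1200, "linehaul": 1000}
cost = {("national", "980"): 6, ("national", "945"): 6, ("national", "100"): 7,
        ("regional", "980"): 4, ("regional", "945"): 5, ("regional", "100"): 9,
        ("linehaul", "980"): 5, ("linehaul", "945"): 4, ("linehaul", "100"): 8}

m = gp.Model("carrier_selection")
x = m.addVars(carriers, regions, vtype=GRB.INTEGER, name="pkgs")

m.setObjective(gp.quicksum(cost[c, r] * x[c, r]
                           for c in carriers for r in regions), GRB.MINIMIZE)
m.addConstrs((x.sum("*", r) == demand[r] for r in regions), name="serve")
m.addConstrs((x.sum(c, "*") <= capacity[c] for c in carriers), name="cap")
m.optimize()

for c in carriers:
    for r in regions:
        if x[c, r].X > 0.5:
            print(f"{c} delivers {x[c, r].X:.0f} packages to region {r}")
```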
- Zone Skipping Methodologies within Nordstrom’s Ecommerce Delivery Network
Ethan Malinowski, Nordstrom, Seattle, WA
Rao Panchalavarapu, Loren VandenBerghe, Michael Manzano
Zone skipping is a strategic logistics initiative utilized in parcel networks. The strategy consolidates individual packages into full truckload (FTL) quantities and ships them to a hub location closer to the destination region of the deliveries. Packages received at the hub location are further sorted and subsequently delivered to end consumers (last-mile delivery). Freight cost reduction and reduced delivery lead time are the key potential benefits of adopting a zone skipping strategy. We present a two-phase methodology for long-term strategic planning as well as realistic operational implementation. During Phase I, a model is developed to optimize region selection. Given various freight cost trade-offs, service requirements, demand concentrations, and existing network infrastructure, we develop a mixed-integer linear program (MILP) that minimizes total costs throughout the network. This strategic model also provides insight into advantageous areas of the network for further examination. During Phase II, the zone skipping strategy or strategies identified in Phase I are further evaluated to identify operationally implementable solutions. A second MILP was developed to handle this problem, considering business-specific characteristics and validating against several naïve operational alternatives. Adopting both phases of the proposed methodology, we were able to provide cost reductions for some of the most advantageous zone skips in comparison with current transportation costs to these regions. Further, our proposed methodology provided a total average service improvement of over two days for consumers served via zone-skip strategies.
- Role of Big Data Analytics Infrastructure Capabilities in SC Resilience Development
Santanu Mandal, Amrita Vishwa Vidyapeetham, Coimbatore, India
Studies have acknowledged the importance of big data analytics (BDA) in the efficient functioning of supply chains and the development of process-oriented capabilities (Wamba et al., 2017). With growing uncertainties on a routine basis, firms have focused on developing risk mitigation capabilities. Supply chain (SC) resilience is one such capability, as it helps restore supply chain operations when they are disrupted. Extant literature classifies BDA capabilities into infrastructure, management, and personnel capabilities. However, the influence of BDA infrastructure capabilities on the development of SC resilience remains unexplored, as do the inter-relationships among these BDA infrastructure flexibility capabilities. Hence, in this study we explore these two research gaps. Data were collected from 176 analytics professionals engaged in SC management through an online survey. The collected responses were then analysed using partial least squares in SmartPLS 2.0 M3. Results suggest BDA connectivity, BDA compatibility, and BDA modularity are prominent enablers of SC resilience. Furthermore, the study found BDA modularity to be a prominent enabler of BDA connectivity and BDA compatibility. Implications are also provided. The study is among the first to explore the importance of big data analytics infrastructure capabilities in the development of supply chain resilience, as well as the inter-relationships among the BDA infrastructure capabilities.
- Using Analytics to Increase the Number of Exam Requests Granted to Students with Disabilities
Nickolas K. Freeman, University of Alabama, Tuscaloosa, AL
As enrollment in degree-granting postsecondary institutions has increased, so has the number of students with disabilities who require testing accommodations. The final exam period at the end of each academic term can be particularly challenging due to the concentrated surge of exam requests. This research describes a data-driven application that was developed for final exam scheduling at the Office of Disability Services (ODS) testing facility at the University of Alabama in Tuscaloosa. The application uses historical data to approximate the distribution of exam length requirements for students requesting capacity at the facility. The predictions become inputs to an optimization model that prescribes an appointment schedule for the upcoming exams. The application allowed ODS to increase the number of exams scheduled during the Fall 2018 semester by approximately 35%. The project represents an end-to-end model for using analytics to address a resource-constrained scheduling problem under uncertainty.
- Exploring the Relationship between Social Determinants of Health (SDH) and Stroke Severity
Himasagar Molakapuri, Carnegie Mellon University, Pittsburgh, PA
This exploratory study combines patient-level hospital discharge data with ZIP code-level sociodemographic data for a richer analysis of Florida stroke patients. Starting with over 400 social determinants of health (SDH) variables, we reduced the dimensionality of the SDH data based on a literature review and by applying unsupervised learning methods, specifically Principal Component Analysis and Factor Analysis. Random Forest and multinomial Logistic Regression models were applied for feature selection from the combined data to predict the severity level of stroke admissions at Florida hospitals. Our analysis identified 15 SDH features that are significantly associated with stroke severity; these can be broadly categorized into individual, household, and occupation categories. If available at the time of admission, these variables may be used to improve clinical decision-making and optimize resource allocation.
- Matching IP Buyers with Sellers: An Intellectual Property Recommendation System
Vaibhav Diyora, Purdue University, West Lafayette, IN
Akshay Kurapaty, Gautam Harinarayanan, Matthew A. Lanham, Mayank Gulati
This study creates an intellectual property (IP) recommendation system that provides a list of firms most likely to buy patents to those trying to patent their IP. The motivation for this study is that matching IP buyers with sellers is not a trivial task. For starters, a property can be a physical as well as a virtual asset. The government plays a major role in protecting both physical and virtual assets: while physical assets are protected by legal documents and trespassing rules, virtual assets, such as ideas, are protected by issuing a patent to the person who owns the idea. Such patents constitute intellectual property. Exclusive access to develop and reap the benefits of patents creates a competitive advantage for firms, allowing them to capture market share. However, not every individual or company is comfortable defending against patent infringement and incurring the often heavy costs associated with protecting a patent. To avoid this, firms and individuals try to sell their patents to companies that have similar interests and that would complement their existing capabilities. In collaboration with an innovative patent-search software provider, we develop a novel recommendation system that helps match IP sellers with likely buyers. Our recommendation algorithm uses natural language processing techniques involving both syntactic and semantic analysis. The algorithm also utilizes other quantitative data, such as the target firm’s history of patent purchases, its financial health, and other proprietary features, to increase match accuracy. To find relevant patents, feature and keyword extraction, stemming, POS tagging, lemmatization, and chunking methods were used, followed by TF-IDF vectorization and concept extraction to find similarities between the patents. Each patent is given a similarity score on a scale of 0 to 1, and all the patents are ranked based on this score. Quantitative measures such as patent buying frequency and relevancy are used to re-rank the list. The ranked list acts as a recommendation list, with the firms having the greatest likelihood of buying at the top. We cross-validated our recommendation system using previous buyers and sellers, which gave highly promising results. Our partnering firm is considering using our recommendation system as a new feature within its existing software.
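The TF-IDF similarity step described above can be sketched as follows with scikit-learn; the patent text and the candidate buyer portfolios are toy placeholders, and the re-ranking on purchase history and financial health is omitted.

```python
# Minimal sketch of the TF-IDF similarity step (scikit-learn assumed); the
# patent abstract and buyer portfolio texts below are toy placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

seller_patent = ["method for wireless charging of electric vehicles"]
candidate_buyer_portfolios = [
    "battery management systems and fast charging for electric cars",
    "image compression codec for streaming video",
    "inductive power transfer coils for consumer electronics",
]

vec = TfidfVectorizer(stop_words="english")
tfidf = vec.fit_transform(seller_patent + candidate_buyer_portfolios)

# Similarity of the seller's patent (row 0) to each candidate buyer's text
scores = cosine_similarity(tfidf[0], tfidf[1:]).ravel()
ranking = sorted(zip(scores, candidate_buyer_portfolios), reverse=True)
for score, text in ranking:
    print(f"{score:.2f}  {text}")
```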
- Holiday Forecasting: A Machine Learning Approach to Forecast Shipment Volume During Holidays
Vijaya Rani, Purdue University, West Lafayette, IN
Hemanth Kumar Devavarapu, Syed Almas Rizvi, Arjun Chakraborty, Matthew A. Lanham
This study provides an approach to predict the estimated number of packages for various origin-destination combinations during the holiday season. The motivation for this study is that retailers need accurate demand forecasts for tactical and operational planning within the business. The ability to plan reliably for these special days during the holiday season gives third-party logistics partners visibility into the right number of shipments being shipped to customers. In collaboration with a leading fashion retailer, we researched and developed a methodology that captures these unusual holiday shipment volume peaks using machine learning algorithms and then estimates the number of packages for various origin-destination combinations.
- An Analysis Of The U.S. Hospitals Industry Using A Non-parametric Shape-constrained Estimation Technique
Kevin Layer, Texas A&M University, College Station, TX
Andrew L. Johnson, Robin C. Sickles, Gary Ferrier
Healthcare expenses are an important concern in the US, as they are substantially higher than in any other country. State-of-the-art big data analytics techniques are applied to the US hospital industry to provide a productivity analysis. One of the contributions is to create a dataset using the AHA Annual Survey to determine total expenses for each hospital and the HCUP National Inpatient Sample to gather the exhaustive list of procedures performed on inpatients for 400 to 600 hospitals for the years 2007 to 2009. These procedures are split into four output categories and a cost function with four outputs is estimated. The method used is non-parametric regression with shape constraints, which allows for flexibility and avoids misspecification while enforcing shape restrictions to improve interpretability. The cost function can be generalized to a Stochastic Directional Distance Function to allow for measurement error in potentially all variables. Finally, the results obtained show that the largest hospitals are considerably larger than the most productive scale size, implying that losses in productivity due to managing the increasing span of control outweigh the potential economies of scale.
- A Framework to Define, Predict, and Evaluate Slow-Moving Grocery Items
Anant Gupta, Purdue University, West Lafayette, IN
Vivek Avlani, Rahul Gaadhe, John Kalathil, Matthew A. Lanham
The project aims to evaluate the demand for slow-moving grocery items at a national grocery chain. Most companies rely on sophisticated machine learning algorithms to predict future sales. In this paper, a set of rules is defined to categorize products as slow-moving, and appropriate forecasting techniques such as SARIMA, LSTM, LightGBM, and the time series methods in the Prophet package are implemented. Models were evaluated based on their SMAPE values. It was observed that model performance varied with the type of past demand of the products. This framework helped the retail company understand slow-moving products and improve its response to demand.
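The abstract does not state the exact SMAPE variant used to evaluate the models; the sketch below assumes the standard symmetric definition and skips periods where both the actual and the forecast are zero, which matters for slow movers.

```python
# Sketch of the SMAPE metric used to evaluate the models (a standard
# definition is assumed; the abstract does not give its exact formula).
import numpy as np

def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    denom = (np.abs(actual) + np.abs(forecast)) / 2.0
    mask = denom != 0  # ignore periods where both actual and forecast are zero
    return 100.0 * np.mean(np.abs(forecast - actual)[mask] / denom[mask])

print(smape([0, 3, 0, 5, 2], [0, 2, 1, 5, 4]))  # e.g. slow-moving item demand
```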
- Activity Assignment and Routing of Field Personnel in Zillow Offers Business
Suleyman Karabuk, Zillow, Seattle, WA
Wei Xia, David Fagnan, Kyle McQueary, Adriana Baron, Barrett Rodgers
The Zillow Offers business buys and sells homes as a service. An acquired home is usually renovated before it is put on the market under the Zillow brand name. Each renovation project is managed by an assigned member of the field personnel, referred to as a Superintendent, who visits the home several times throughout the renovation process. On a daily basis, Zillow routes a team of Superintendents to visit homes subject to strict and preferred time constraints during the day. These decisions are heavily constrained by the prior assignment of homes to Superintendents, because the same Superintendent has to cover all visits associated with any renovation. We formulate and solve a combined multi-day assignment and daily routing problem to create daily schedules for all Superintendents. Compared to manual decision making, our algorithm achieved 15%-35% savings in miles driven. An implementation is in progress.
- Quality Risk Alerts Via Principal Component Analysis
David Bayba, Chee Hoe Cheok, INTEL, Chandler, AZ
Prior to acceptance of a material for use, it has to pass quality control checks on a variety of measurements. However, being within specification limits on individual measurements does not guarantee that complicated interactions across parameters will not result in a part that ultimately tests out to be bad. Principal component analysis (PCA) is an excellent method for displaying high-dimensional data in a low-dimensional space, where it is often visually clear when a new set of inputs lies in an unknown region. This work demonstrates how such a PCA model and the PCA-based distance matrix of known good parts can be used analytically to predict that a part lies in an unexplored parameter region, generating an alert to closely monitor that part’s performance.
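A minimal sketch of the alerting idea, flagging a new part whose position in PCA space is far from every known-good part, is shown below; the synthetic measurements, the three retained components, and the nearest-neighbor distance threshold are assumptions, not Intel's implementation.

```python
# Illustrative sketch (not Intel's implementation) of flagging a new part whose
# position in PCA space is far from all known-good parts.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(7)
good_parts = rng.normal(0, 1, size=(500, 12))                    # 12 QC measurements
new_part_ok = good_parts[0] + rng.normal(0, 0.1, 12)             # near a good part
new_part_odd = good_parts[0] + np.array([0, 0, 4, -4] + [0] * 8)  # unusual combination

pca = PCA(n_components=3).fit(good_parts)
good_scores = pca.transform(good_parts)

# Reference distribution: each good part's distance to its nearest other good part
pairwise = np.linalg.norm(good_scores[:, None] - good_scores[None], axis=2)
np.fill_diagonal(pairwise, np.inf)
threshold = np.quantile(pairwise.min(axis=1), 0.99)

def alert(part):
    """Alert if the part's nearest known-good neighbor in PC space is unusually far."""
    score = pca.transform(part.reshape(1, -1))
    return np.linalg.norm(good_scores - score, axis=1).min() > threshold

print("ok part alert:", alert(new_part_ok))     # likely False
print("odd part alert:", alert(new_part_odd))   # likely True
```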
- Data Analysis for Recall Patterns In Pharmaceutical Supply Chains
Rana Azghandi, Northeastern University, Boston, MA
Jacqueline Griffin
Pharmaceutical supply chains are vulnerable to various disruptions, and closer investigation of this type of supply chain reveals that the risk of disruption is usually unavoidable. Meanwhile, traditional supply chain models fail to recover from these disruptions within a reasonable time. In this research, we aim to predict recall disruptions as a mitigation effort. Using pattern recognition methods based on recall data, we develop a sophisticated disruption model to help decision-makers anticipate future risk in the supply chain.
- Assessing Cancer Patent Portfolios Using Analytics
Aurelie Thiele, Southern Methodist University, Dallas, TX
We investigate the cancer patent portfolios of pharmaceutical companies using the “cancer moonshot” data set made available by the U.S. Patent and Trademark Office. We use various descriptive and predictive analytical techniques to assess overall trends in fighting cancer as well as the competitive advantage of the pharmaceutical companies and universities with the most patents based on information on the patent title, patent categories, year filed, year granted, name of applicants, National Institutes of Health funding and Food and Drug Administration approval.
- Forecasting the Inflow of ER Patients to Support the Scheduling of Doctors
Cody Baldwin, Brigham Young University-Hawaii, Laie, HI
The flow of patients into emergency rooms can, at times, be unpredictable – particularly during flu season. This variability is difficult to manage, and it often results in under- or overstaffing of doctors, which is problematic. Understaffing affects patient care and doctor job satisfaction, and overstaffing hurts the financial well-being of the hospital. An ER department in Hawaii sought a solution to address these issues. Thus, this study was conducted, where more than two years of data was collected. The goal was to (1) understand the factors that influence patient flow, (2) build a forecasting model to predict that flow, and (3) use the results to support the scheduling of doctors and physician’s assistants.
- A Comparative Analysis Of Economic And Environmental Tradeoffs Of Roof-mounted Solar Plants For Manufacturing Locations In The U.S.
Amir T. Namin, Northeastern University, Boston, MA
Manufacturing is responsible for approximately one-third of primary energy use and 37% of carbon dioxide emissions globally. This study considers the economic feasibility and environmental implications of installing onsite roof-mounted solar PV systems on a case study manufacturing facility in five U.S. states (California, Florida, Indiana, New Jersey, and Texas), which have varying levels of solar irradiance, different incentives, solar policies, and manufacturing incentives at both the federal and state level. In these five cases, a combination of high-efficiency SunPower solar panels (monocrystalline) with sun-tracking technology is considered. The objective of this research is to compare the impact of state incentives and regulatory policies, as well as physical and locational differences, on the economic and environmental performance of high efficiency monocrystalline solar PV panels used for powering manufacturing processes. Using NREL’s System Adviser Model (SAM), common financial metrics such as the economic payback period, Net Present Value (NPV) and Levelized Cost of Energy (LCOE) are investigated considering the federal and local incentive policies for the selected states. Energy Pay Back Time (EPBT) and Greenhouse Gas emissions (GHG) as common environmental performance metrics for the life cycle of PVs are compared for different cases. The results indicate that EPBT is approximately one year for all five selected states. The abatement cost is in the range of $2 – $150 per ton of CO2 equivalent. The economic payback time has a range of 3 to 23 years depending on the existing incentives and policies.
- Operational Constraints For Real-world VRP
Heather Moe, ESRI, Redlands, CA
Shubhada Kshirsagar, Wee-Liang Heng, Peter Yan
Solving a standard Vehicle Routing Problem (VRP) to optimality is an extremely difficult challenge. However, the optimal solution is often unrealistic and infeasible in the real world, and additional constraints and information must be added to the model to make the solution operational. In this analysis, we examine qualitative constraints such as how best to cluster a solution; balancing workloads while remaining reasonably efficient; sequencing orders on a “neighborhood” basis to facilitate repeat attempts; and prioritizing high-revenue orders but visiting lower-priority ones if they are nearby. We look at the multitude of ways these can be defined quantitatively to fit a company’s business rules, and the implications of the different interpretations. Although hard to define, these types of constraints are critical for producing an operational solution.
- Hourly Forecasting of Clear-Sky Index: A Markov-Chain Probability Distribution Mixture Approach
Leo Degon, Savannah State University, Savannah, GA
Abhinandan Chowdhury, Suman Niranjan
In this research, hourly clear-sky index (CSI) is modeled by a Markov-chain probability distribution mixture. The method uses measured data from the National Solar Radiation Database to construct a Markov chain with states corresponding to broad states of atmospheric interference, with variation within the state modeled by a probability distribution. Parameters used in the model are determined by examining the measured Global Horizontal Irradiance (GHI) from the summers of 1995-2002 in Tallahassee, FL and calculating the mean and expected time spent in a state. A transition matrix was constructed by considering the steady state. MATLAB auto fit functions were used to construct the probability distributions associated with each state. The 2-state and 3-state models are compared for goodness of fit and autocorrelation. Both 2-state and 3-state models produce high goodness of fit and positive autocorrelation, with the 3-state model producing a more accurate model.
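A small sketch of the core construction, estimating a transition matrix from hourly state labels and computing its steady state, is shown below in Python; the three-state labeling and the toy sequence are assumptions standing in for the NSRDB measurements, and the study itself uses MATLAB.

```python
# Minimal sketch (toy hourly state labels, not NSRDB data) of estimating a
# clear-sky index transition matrix and its steady-state distribution.
import numpy as np

# Hourly CSI states: 0 = clear, 1 = partly cloudy, 2 = overcast (assumed labels)
states = np.array([0, 0, 1, 1, 2, 1, 0, 0, 0, 2, 2, 1, 0, 1, 1, 0])
n = 3

counts = np.zeros((n, n))
for a, b in zip(states[:-1], states[1:]):
    counts[a, b] += 1
P = counts / counts.sum(axis=1, keepdims=True)   # row-stochastic transition matrix

# Steady-state distribution: left eigenvector of P associated with eigenvalue 1
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi = pi / pi.sum()

print(np.round(P, 2))
print("steady state:", np.round(pi, 3))
```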
- Computing Cluster Usage and Profitability Optimization from Job Logs
Melvyn Thomas Daniel Sagayaraj, Purdue University, West Lafayette, IN
Chaitali J Pawar, Saurabh Suman, Daniel L Whitenack
Typical computing clusters, such as those utilized by companies and academic institutions for research and development, perform a variety of disparate tasks for multiple users. However, the potential of these clusters is only truly harnessed when their configuration and hardware are optimized for specific research activities. By way of example, Purdue University is funded for various disparate research activities every year, and the computational environment for them is provided by Purdue-managed computing clusters with a lifespan of five years. Ideally, the purchase and configuration of these clusters should align with the funded research for the length of time during which the clusters are utilized. Unfortunately, however, the applications running in the clusters are often used carelessly, resulting in very low cluster utilization. Our project performs a detailed analysis of the Purdue server logs to study past cluster performance at a granular level and proposes actions based on the results. This analysis will help the Purdue supercomputing team maintain maximum utilization of cluster capability and ensure a high resale value for the hardware after decommissioning. Our results motivate a cost-effective method in which the supercomputing team maintains profitability by delivering services to research teams only for the resources they use, creating a win-win situation for both the buyers (researchers) and the seller (Purdue Supercomputing).
- Scientific Advances to Continuous Insider Threat Evaluation (SCITE) for Inference Enterprise Models (IEM)
Sean Tatman, Innovative Decisions Inc, Vienna, VA
Dennis M. Buede
The Intelligence Advanced Research Projects Activity (IARPA) Scientific advances to Continuous Insider Threat Evaluation (SCITE) program seeks to develop and test Inference Enterprise Models (IEMs) to detect insider threats—individuals with privileged access within an organization who (1) are (or intend to be) engaged in malicious behaviors such as espionage or sabotage, or (2) are susceptible to malicious behaviors such as phishing attacks. To perform its function, an inference enterprise (1) gathers and processes data, (2) generates alerts when the target behavior is suspected, and (3) follows up with more thorough investigation of alert cases. This poster presents insights from developing IEMs that forecast the accuracy of existing and proposed systems for detecting insider threats. One valuable insight from this research is that one can use aggregate data (e.g., histograms or summary statistics) of an entire organization or types of employees within an organization to evaluate performance of detection systems.
- IoT in the Insurance Industry: Using Telematics Data to Strategically Manage Risks and Price Competitively
Simon Jones, Purdue University, West Lafayette, IN
Miao Wang, Ho-Min Liu, Himanshu Premchandani, Matthew A. Lanham
We provide detailed risk analysis associated with driving styles and conditions using telematics data. Our solution estimates the reduction in uncertainty about insured drivers that telematics data can achieve, and the business benefits an insurer might realize as more customers enroll in such a “big brother” program. The motivation for this study is that premium revenue in the property and casualty insurance industry was $558.2 billion last year, and in 2016 the average expenditure for auto insurance for an adult in the U.S. was $935.80. Traditionally, business models for insurance companies are built on complex actuarial calculations and theoretical assumptions. Nowadays, the Internet of Things (IoT) allows us to view the world through a microscope, to eliminate the gaps between facts and assumptions, and to further improve the profitability of pricing models. We collaborated with a domestic insurance company that had collected a vast amount of telematics data. Telematics data measure the driving behavior of customers: the benefit to the customer is a premium reduction, while the theoretical benefit to the insurer is reduced risk via improved understanding of the customer’s driving behavior. The dataset we investigated consisted of three separate levels of information: (1) claims data, including claim date, driver ID, and amount of claim payment; (2) drivers’ data, consisting of basic information pertaining to each driver; and (3) trip data, consisting of start time and date, as well as distance and speed traveled. Our research investigated the differences between the driving behavior of drivers who have and have not filed claims. We examined whether there is any variability between trips associated with claims and those without, to see whether certain relationships exist (e.g., a city driver traveling at interstate speeds having a higher risk than drivers who typically travel at those speeds). We linked the claims data to individual trip data and explored micro factors that might lead directly to an accident. We also looked for other aspects that could affect driving risk, such as inclement weather. From our analysis and deep academic research in this area, we developed an empirically based strategy that allows the insurer to estimate the reduction in uncertainty about insured drivers that telematics data can achieve, and the business benefits it might realize as more customers enroll in such a program.
- Applying Advanced Analytics to Fight the Opioid Epidemic in Appalachia
Jana Laks, Carnegie Mellon University, Pittsburgh, PA
The Heinz College Opioid Action Team, comprised of Master’s candidates from the Health Care, Public Policy, and Information Sciences Management programs, was tasked with applying advanced analytics to the fight against the opioid epidemic in the Appalachian region. The team, in partnership with SAS Institute: 1) built an interactive dashboard to explore and analyze structured Centers for Medicare and Medicaid Services (CMS) claims data using SAS Viya, 2) augmented synthetic CMS claims data in Python to create new tools and processes for CMS end users, and 3) pulled unstructured public comment data from regulations.gov via API call and processed it, removing duplicate comments and applying topic modeling. The resulting three workflows enable end users to utilize open-source and SAS platforms to analyze, propose, and model solutions addressing the opioid epidemic in Appalachia.
- Contracts Analytics on Cognitive Enterprise Data Platform
Pitipong JS Lin, IBM Corporation, Cambridge, MA
This poster lays out business requirements and identifies today’s technology to support the consolidation of contracts into a cognitive data platform and the application of analytics to quickly gain insights into sales contracts. Companies are seeking better, faster ways to analyze contracts to understand obligations and risks, which will help close deals faster during contract negotiation. A significant root cause of revenue leakage is risky language in contracts. Often this is because clients require special contract clauses that deviate from the standard template, such as “bank guarantee”, or send contracts written by their legal team for signing. This not only introduces risk but also requires a tremendous amount of time in iterative contract legal reviews. Advanced analytics and cognitive/artificial intelligence technologies add value both before and after contract signing. However, there are many gaps to be addressed throughout the technology pipeline. Consolidating contracts from fragmented repositories into one, converting image-based PDF contracts into text for processing, enhancing the metadata associated with each contract, and making contracts easily searchable by sellers are examples of today’s challenges and prerequisites that must be met before any text analytics can be run. Furthermore, we need technologies to extract metadata and compare clauses for the various use cases from the legal, procurement, delivery, and accounting perspectives to reduce risk exposure and speed up contract signing.
- A Sequence Analysis Approach to Improve New Product Forecasts
Sudarshana Singh, Purdue University, West Lafayette, IN
Shubham Gupta, Muthuraja Palaniappan, Thuy Nguyen, Matthew A. Lanham
This study investigates a novel application of sequence analysis, which has been widely used in the bioinformatics field for protein sequencing, to forecast new SKU demand. The motivation for this study is that demand forecasting is an important application in the field of predictive analytics. With demand forecasting, analysts try to understand consumer demand for goods or services using information on past behavior patterns and continuing present trends. Knowledge of how demand will fluctuate enables stores and distribution centers to manage inventory and utilize their shelf space more efficiently, creating a competitive advantage for the company. As new products are introduced to the market, new replacement spares are added to stores to serve them. This creates a knapsack problem: there are many possible SKUs to choose from to stock in stores and hubs, but space and purchasing budgets are limited. Therefore, the retailer has to decide which SKUs should have stocking precedence and how many should be kept on hand to meet local demand. For most SKUs, retailers rely on past sales history, stocking information from other locations, market data, and lifecycle curves to adequately stock stores. For new SKUs, however, this information does not exist, so demand profiles from similar products and spares are often used as a proxy, which is frequently highly inaccurate. Collaborating with a U.S. national retailer, we developed a novel sequence analysis approach to forecasting demand, particularly for new SKUs, and compared it to other commonly used approaches for forecasting new SKUs. We found our approach to be highly competitive for sparse-demand replacement products.
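The study's alignment-based method is not spelled out in the abstract; the sketch below is a simplified stand-in that encodes weekly demand as symbols and uses difflib's similarity ratio to pick the existing SKU whose demand sequence best matches a new SKU's early history.

```python
# Simplified stand-in (not the authors' alignment method): encode weekly demand
# as symbols and use difflib's similarity ratio to find the existing SKU whose
# demand sequence is closest to the new SKU's early history.
from difflib import SequenceMatcher

def encode(demand):
    """Map weekly units to coarse symbols: 0 -> 'z', 1-2 -> 'l', 3+ -> 'h'."""
    return "".join("z" if d == 0 else "l" if d <= 2 else "h" for d in demand)

existing_skus = {
    "SKU_A": [0, 0, 1, 0, 2, 0, 0, 1],
    "SKU_B": [4, 5, 3, 6, 4, 5, 4, 6],
    "SKU_C": [0, 1, 0, 0, 0, 2, 1, 0],
}
new_sku_history = [0, 0, 2, 0, 1]

new_seq = encode(new_sku_history)
best = max(existing_skus,
           key=lambda s: SequenceMatcher(None, encode(existing_skus[s]), new_seq).ratio())
print("closest demand profile:", best)   # use its profile as the new SKU's proxy
```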
- Integrating Descriptive Analytics into the Marine Corps Depot Maintenance
Anthony Giunipero, United States Marine Corps, Albany, GA
The purpose of this project was to develop metrics that measure the Marine Depot Maintenance Command’s (MDMC) operational performance in the execution of $400 million in repairs to over 400 types of ground military equipment, annually. Previous metrics focused on meeting budgeted revenue and labor hour goals vice schedule compliance and production goals. Project deliverables were to: identify metrics to be used by MDMC, Maintenance Management Center, and Marine Corps Logistics Command to capture performance and to create a semi-automated visualization tool, digestible by technical and non-technical users. Aspects of this problem included: identifying the organization’s information requirements, learning the data culture of each organization, exploring available data sources and IT infrastructure, establishing privileged information among organizations while maintaining a concurrent big picture, and identifying and creating deliverables based on current requirements and to shape the organization’s direction. The desired end state was to improve the depot’s operating efficiency and production performance through data driven decisions.
- Pair Match Algorithm to Design Pseudo Control - A Business Application in Banking Analytics
The effectiveness of any marketing promotion is measured by the incremental revenue generated by the test group in comparison with a holdout control group designed through unbiased random sampling. However, measuring the performance of marketing campaigns can become challenging when compliance regulation prevents the business from keeping a holdout control population. Overlapping marketing promotions can also make it difficult to measure the true incremental value of individual campaigns if the control groups are not mutually exclusive. This poster presents two such business case studies and demonstrates how a pair match algorithm can be applied in these scenarios to create a pseudo (synthetic) control group and measure the performance of the test group against it. The first case study concerns measuring the incremental value of a card reissue by a bank following the global migration from traditional magnetic-stripe cards to EMV chip cards, when compliance regulation mandated that the bank reissue the new chip card to every existing cardmember. This is a case of an absent control group. The traditional business approach was to measure year-over-year lift in spend and revenue from the reissue population. However, this approach does not take into account endogenous consumer behavioral factors, exogenous macroeconomic factors, and seasonality, all of which are likely to influence the year-over-year performance of the same group. The analytics team therefore proposed a lookalike sampling method to create a pseudo control group within the reissue population and compare performance year over year only within pair-matched groups that are essentially lookalikes based on their behavioral attributes, enabling the business to measure the true impact of the chip card reissue. The second case study concerns measuring and isolating the incremental value driven by individual marketing campaigns when a major US bank had to implement a card reissue program while a popular marketing promotion was still ongoing. This is a case of overlapping control groups, where a segment of the control group from the popular campaign became part of the reissue population. The analytics solution was to design new test and pair-matched control groups based on consumer behavioral attributes after excluding the overlapping control between the two interventions. In both cases, the business identified the card reissue program not only as a compliance-driven customer communication, but also as a major marketing channel for reminding customers of the card in their wallet, with a newly reissued card expected to drive additional customer spend. The analytics solutions shown in this presentation continue the author’s research on pair match algorithms and their business applications in marketing, first introduced by the author at SAS Global Forum 2016.
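One common way to build such pair-matched pseudo controls is nearest-neighbor matching on standardized behavioral attributes; the sketch below is an illustrative assumption with simulated data, not the author’s actual implementation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import NearestNeighbors

# Hypothetical behavioral attributes (e.g., spend, transactions, tenure) for
# treated (reissued) customers and candidate controls.
rng = np.random.default_rng(0)
treated = rng.normal(size=(100, 3))
candidates = rng.normal(size=(1000, 3))

# Standardize on the pooled data so distances are comparable across attributes.
scaler = StandardScaler().fit(np.vstack([treated, candidates]))
nn = NearestNeighbors(n_neighbors=1).fit(scaler.transform(candidates))
_, idx = nn.kneighbors(scaler.transform(treated))

# Each treated customer is paired with its nearest lookalike control.
pseudo_control = candidates[idx.ravel()]
```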
- Leveraging Kubernetes on Azure for ETL, Data Validation, and Business Intelligence
Roshan N. Lalwani, Purdue University, West Lafayette, IN
Karma Y. Patel, Rajat Mittal, Yuvraj Gupta, Daniel L. Whitenack
Our client is one of the largest public accounting, consulting, and technology firms in the U.S. with a need to process massive amounts of healthcare data. In particular, the client’s “Revenue Analytics” product gathers data from over 1,000 hospitals. We facilitate the commoditization of this data in the cloud using Kubernetes and blob storage. This approach allows us to create an easily accessible, robust, and centralized data and analysis platform meeting the needs of various stakeholders. Moreover, this new data architecture overcomes the challenges of the client’s current architecture, in which a series of ETL transformations push data into various database locations. Previously, the client could not track the points of failure in its data pipelines. We solve this problem via Azure cloud services along with Kubernetes and Docker to design, develop, and track all data pipelines. These data pipelines feed a centralized “SQL Data Hub” that drives all decision-making processes and powers enhanced BI reporting.
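A minimal sketch of one containerized pipeline step is shown below, assuming Azure Blob Storage via the azure-storage-blob package; the connection string, container name, and logging conventions are placeholders rather than the client’s actual configuration.

```python
import logging
from azure.storage.blob import BlobServiceClient  # assumes azure-storage-blob is installed

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("etl_step")

# Placeholder configuration, not the client's actual setup.
CONN_STR = "<azure-storage-connection-string>"
CONTAINER = "hospital-revenue-raw"

def run_step():
    """One containerized ETL step: list raw files and record success/failure,
    so a failed step is visible via Kubernetes job status and logs."""
    try:
        svc = BlobServiceClient.from_connection_string(CONN_STR)
        container = svc.get_container_client(CONTAINER)
        blobs = [b.name for b in container.list_blobs()]
        log.info("step succeeded: %d raw files found", len(blobs))
    except Exception:
        log.exception("step failed")
        raise

if __name__ == "__main__":
    run_step()
```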
- Interpretable Neural Network for Survival Analysis by Integrating Genomic and Clinical Data
Jie Hao, Kennesaw State University, Kennesaw, GA
Dissecting complex biological processes associated with clinical outcomes (e.g., patient survival time) at the cellular and molecular level provides in-depth biological insights, not only for developing new treatments but also for accurate prediction of clinical outcomes. However, highly nonlinear and high-dimension, low-sample-size (HDLSS) data cause computational challenges in survival analysis. We developed a novel pathway-based, sparse deep neural network, called Cox-PASNet, for survival analysis by integrating high-dimensional gene expression data and clinical data. Cox-PASNet is a biologically interpretable neural network model in which nodes correspond to specific genes and pathways, while capturing nonlinear and hierarchical effects of biological pathways on a patient’s survival. We also provide a solution to train the deep neural network model with HDLSS data. Cox-PASNet was evaluated by comparing its performance against cutting-edge survival methods such as Cox-nnet, SurvivalNet, and Cox elastic net (Cox-EN). Cox-PASNet significantly outperformed the benchmarking methods, and the outstanding performance was statistically assessed. We provide open-source software implemented in PyTorch (https://github.com/DataX-JieHao/Cox-PASNet) that enables automatic training, evaluation, and interpretation of Cox-PASNet.
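The authors’ full implementation is available at the GitHub link above; purely as a sketch of the survival objective such models optimize, a minimal PyTorch negative Cox partial log-likelihood (Breslow-style, ignoring tie corrections) might look like the following.

```python
import torch

def cox_partial_likelihood_loss(risk_scores, times, events):
    """Negative Cox partial log-likelihood (Breslow approximation).

    risk_scores: (n,) model outputs (log hazard ratios)
    times:       (n,) observed survival or censoring times
    events:      (n,) 1 if the event occurred, 0 if censored
    """
    order = torch.argsort(times, descending=True)   # risk set = subjects with equal or later times
    risk = risk_scores[order]
    ev = events[order].float()
    log_cum_hazard = torch.logcumsumexp(risk, dim=0)
    # Sum over observed events only; assumes at least one event in the batch.
    return -torch.sum((risk - log_cum_hazard) * ev) / ev.sum()
```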
- Improving Sales and Marketing Effectiveness Using Customer Segmentation
Shefali Jain, Purdue University, West Lafayette, IN
Akash Kashyap, Sagar Pradhan, Aditi Vatse, Matthew A. Lanham
The study examines different analytical methods of customer profiling in the rapidly evolving agro-science industry. The motivation is to enable sales and marketing teams to tailor optimal offers that combine the right products for different customer segments. Choices of crop productivity level and crop protection management are customer- and geography-specific, and in many instances affected by temporary market conditions. Understanding individual customer needs in this complex setting can therefore help the firm design optimal combinations of seed and crop protection products for distinct customer segments. In collaboration with a major agro-science company, we build and assess traditional choice models; segment customers based on attributes such as distributors, soil type, products, and transactions using unsupervised machine learning algorithms such as clustering and market-basket analysis to understand consumer buying patterns and increase farm share (i.e., the percentage of farm area on which the firm’s products are used) among the existing customer base; and apply supervised classification algorithms such as decision trees and random forests to sharpen the segmentation, with the goal of increasing the firm’s share of the market.
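As a minimal sketch of the unsupervised segmentation step, assuming hypothetical standardized customer features rather than the firm’s actual data, a clustering pass might look like this.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Hypothetical customer features: spend by product line, transaction counts,
# and encoded soil-type / distributor attributes.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 6))

X_std = StandardScaler().fit_transform(X)
segments = KMeans(n_clusters=4, n_init=10, random_state=1).fit_predict(X_std)
# Each segment can then be profiled to design a tailored seed + crop-protection offer.
```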
- Pay Now, Gain Later: Analyzing the Effects of Higher Pay Scales in Business Performance
Liye Sun, Purdue University, West Lafayette, IN
Lorena Veronica Bustamante, Doga Sayiner, Kiran Samayam, Matthew A. Lanham
This study examines supervised learning methodologies that retailers can employ to gauge the effectiveness of different monetary compensation packages on overall store performance. The study is motivated by the high turnover rates experienced in the retail industry over the last few years, which place the industry second in turnover (13%), behind only software/technology. Forty-five percent of departing employees cite lack of growth opportunities as the main reason for leaving a company, and 34% indicate dissatisfaction with the compensation/benefits package. In such an increasingly competitive environment, retailers are constantly evaluating their compensation and benefits offerings to avoid lower sales, higher turnover rates, and difficulty in hiring and retaining staff. In collaboration with a national retailer with over 400 stores in the US, we built and assessed traditional inferential models to analyze the effects of higher pay scales. We compared the treatment (pilot) stores against a control set and determined whether the Store Pay Pilot project has had positive results (e.g., better sales performance by store, increase in store team retention) compared to traditional stores (i.e., stores without increased pay scales). We highlight the magnitude of the effects so that HR managers can use these estimates as a decision-support mechanism for their upcoming planning horizon.
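The abstract does not specify the estimator, but one standard way to compare pilot and control stores is a difference-in-differences regression; the sketch below is an assumption-laden illustration with hypothetical file and column names, not the study’s actual model.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical store-month panel: 'pilot' marks treated stores (0/1), 'post'
# marks months after the pay increase (0/1); the interaction is the effect of interest.
df = pd.read_csv("store_panel.csv")  # placeholder file name

model = smf.ols("sales ~ pilot * post + C(region)", data=df).fit()
print(model.params["pilot:post"])    # estimated sales lift attributable to the pay pilot
```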
- Reducing Receipt Invoices Mismatch Costs Using Categorical Variable Imputation
Shubhansh Jain, Purdue University, West Lafayette, IN
Meera Govindan, Sushil Achamwad, Aniket Banerjee, Daniel L. Whitenack
This study investigates the likelihood of a mismatch in receipt and invoice matching processes for payables, called ‘receipt grief.’ An American Fortune 100 company, also the world’s largest construction equipment manufacturer, was the subject of the study. Our study demonstrates how receipt grief costs can be reduced without manual intervention via a model that predicts the likelihood of receipt grief. The likelihood prediction is based on invoice attributes including supplier codes, invoice number, locations, charges, and receiving facilities. If the likelihood of grief is sufficiently large, an invoice can be sent back to the receiving facility for review and resolution. This path requires less overall effort and time while avoiding charges associated with manual review by payables analysts. We implement our model and analysis of real freight data using the R programming language, with packages such as caret for predictive modeling. Additionally, we use visualization tools including Tableau to study the grief data patterns and generate insights.
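The poster’s implementation is in R with caret; as an analogous illustration only, a Python sketch of predicting grief likelihood from invoice attributes and routing high-risk invoices back for review might look like this (file names, column names, and the threshold are hypothetical).

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

df = pd.read_csv("invoices.csv")  # placeholder
X, y = df[["supplier_code", "facility", "charge_amount"]], df["grief"]

pipe = Pipeline([
    ("encode", ColumnTransformer(
        [("cat", OneHotEncoder(handle_unknown="ignore"), ["supplier_code", "facility"])],
        remainder="passthrough")),
    ("clf", LogisticRegression(max_iter=1000)),
]).fit(X, y)

# Route invoices with high predicted grief likelihood back to the receiving facility.
route_back = pipe.predict_proba(X)[:, 1] > 0.7
```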
- An Innovative Prototype to Create Real-Time Commodity Product Market Intelligence and Move the Market
Fan Lu, Purdue University, West Lafayette, IN
Shengye Guo, Charul Bagla, Abhijit Kumar, Matthew A. Lanham
The project aims to build and evaluate a predictive model, with a commodities market-intelligence business partner, that estimates the aggregate exports of three agricultural commodities (corn, soybeans, and wheat) from major US export regions listed in the weekly USDA grain inspection report. The first phase of modeling estimates the report’s export numbers for the following week; the second phase adds initial estimates over a longer horizon. Our business partner, having built expertise in its current domain, is following a vertical integration strategy to translate that expertise into the new market of grains. Agricultural exports are among the top five US exports, with soybeans, corn, and wheat (37B USD) accounting for 25% of total agricultural export revenue, making this a largely untapped market with substantial potential for our business partner. The business value depends mainly on obtaining the right information before it becomes public, gaining an information edge through an effective predictive model. Early availability of weekly numbers would provide traders and marketers in the grain industry with vital insights into constantly evolving export trends, patterns, and flows, which traders could leverage to pre-empt demand and adjust their weekly investments accordingly. The project develops an efficient crawler to obtain relevant data from USDA and builds time series models to predict agricultural commodity inspection export numbers. The major components of our model are: 1) time series modeling that uses historical export inspection report numbers to predict the future trajectory of grain exports from major US ports; 2) seasonality analysis to accommodate the highly seasonal nature of grain exports; and 3) incorporation of the actual loading/unloading capacities of major ports, based on USDA data on the number of available empty vessels, which sets a maximum cap on port loading capacities and yields baseline estimates of grain exports. Based on the research results, we developed a working tool to automate this process internally, supporting our client’s business. Our model aggregates data from comprehensive sources and provides timely, seasonally aware, and scientifically grounded results to support agricultural commodity trading decisions, which is innovative and could help move the market. Sources: https://www.fas.usda.gov/data/top-us-agricultural-exports-2017
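As an illustrative sketch of the time-series component with a capacity cap, assuming a hypothetical weekly series and a placeholder port capacity, a seasonal ARIMA pass might look like the following; this is not the team’s actual model specification.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Hypothetical weekly export-inspection series for one commodity; the seasonal
# period of 52 reflects the yearly cycle discussed above.
y = pd.read_csv("corn_inspections.csv", index_col=0, parse_dates=True)["tons"]

results = SARIMAX(y, order=(1, 1, 1), seasonal_order=(1, 1, 1, 52)).fit(disp=False)
forecast = results.forecast(steps=4)

# Cap forecasts at the loading capacity implied by available empty vessels.
port_capacity = 500_000  # placeholder value
forecast = np.minimum(forecast, port_capacity)
```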
- AGRO: An AGri-business Recommendation Optimization Engine for Sales Growth Decision-Support
Varsha Prabhakar, Purdue University, West Lafayette, IN
Rudraksh Syal, Yash Sharma, Zhi Dou, Matthew A. Lanham
We develop an intelligent agribusiness recommendation optimization engine called AGRO to support sales team members as they identify potential customers, understand their needs, and provide accurate product recommendations.
The motivation for this research is that the United States is one of the world’s leading agricultural producers and suppliers. California alone is responsible for over one-third of the country’s vegetables and over two-thirds of the country’s fruits and nuts. Growing more than 400 commodities of various crops all-year round, approximately 47 billion dollars of total farm and ranch value is attributed to the state of California alone.
For agribusiness companies, a deep understanding of farmers’ needs is critical to market their products and compete effectively. However, such understanding has usually been achieved through experience or costly market surveys. AGRO provides a novel case of using sophisticated data analytics to improve decision-support in this domain.
We expect applications of data analytics in this area to grow as data transparency continues to evolve. For example, in 1990 California became the first state to mandate full reporting of agricultural pesticide use. Since then, every farmer (permittee) has had to report the planting of each crop as well as the pesticide, insecticide, or herbicide used for that crop. Historically, this data became accessible to the public only 16-18 months after it was reported. Now, however, the same comprehensive data is readily available within six months, with all-encompassing details of the crop, the area on which it is grown, the time of year at which it is grown, and the type and amount of pesticide used. To provide an estimate, the California data for a single year is close to 2 gigabytes in size.
In collaboration with an agri-science company, we combine this public data with the firm’s proprietary data to develop a cross-validated algorithm that provides strategic decision-support to their sales team on whom to target and which assortment offering to provide among an otherwise infeasible number of combinations. Using a combination of algorithms, this cross-validated engine ensures product recommendations are optimized, minimizing inaccurate suggestions.
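A minimal sketch of how such a cross-validated recommendation model might be evaluated is shown below, with hypothetical file and column names and features assumed to be already encoded numerically; the actual AGRO engine combines multiple algorithms and proprietary data.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical grower-product training table built from public pesticide-use
# reports joined with the firm's proprietary sales data; 'adopted' is the label.
df = pd.read_csv("grower_product_pairs.csv")  # placeholder
X, y = df.drop(columns=["adopted"]), df["adopted"]

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
# Held-out AUC guards against recommending products a grower is unlikely to adopt.
print(scores.mean())
```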
- Dynamic Server Placement In Cloud Data Centers With A Hybrid Metaheuristic Algorithm
Wanyi Zhu, Alibaba Inc, Hangzhou, China
Efficient use of data center infrastructure is a pressing issue for the scalability of the IT industry. To optimize the multi-modality resource utilization, we introduce an adaptive simulated annealing algorithm for server placement in the data center. Up to 30% improvement in resource utilization can be obtained with the proposed approach.
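A toy sketch of simulated annealing for server-to-rack placement is shown below; the capacities, cost function, and fixed geometric cooling schedule are illustrative only (an adaptive variant would additionally tune the temperature from the observed acceptance rate, as the poster’s algorithm does).

```python
import math
import random

# Toy instance: servers have (cpu, mem) demands; racks have fixed capacity.
# The cost penalizes overload heavily and rewards balanced utilization.
random.seed(0)
servers = [(random.randint(2, 16), random.randint(4, 64)) for _ in range(40)]
RACKS, CPU_CAP, MEM_CAP = 8, 64, 256

def cost(assign):
    total = 0.0
    for r in range(RACKS):
        cpu = sum(c for i, (c, m) in enumerate(servers) if assign[i] == r)
        mem = sum(m for i, (c, m) in enumerate(servers) if assign[i] == r)
        total += 1000 * (max(0, cpu - CPU_CAP) + max(0, mem - MEM_CAP))
        total += (CPU_CAP - cpu) ** 2 + (MEM_CAP - mem) ** 2
    return total

assign = [random.randrange(RACKS) for _ in servers]
current, T = cost(assign), 1000.0
while T > 1e-3:
    i = random.randrange(len(servers))
    old, assign[i] = assign[i], random.randrange(RACKS)   # propose moving one server
    c = cost(assign)
    if c < current or random.random() < math.exp((current - c) / T):
        current = c                                       # accept the move
    else:
        assign[i] = old                                   # revert
    T *= 0.995                                            # geometric cooling
```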
- Analyzing Product Mix, Purchase Record and Growth Trends to Improve E-Tailer Performance
Yuntong Lin, Purdue University, West Lafayette, IN
Mingjen Yeh, Abhishek Talwar, Daniel Whitenack
The steadily increasing use of the Internet as a medium for buying and selling retail goods has led to the creation of a mammoth electronic marketplace. E-commerce is growing faster than any other retail sector, and this has resulted in a quest for competitive advantage and a more detailed understanding of the consumer base. At the same time, the electronic nature of the marketplace provides new opportunities to understand the online consumer. Through an empirical study of a premier e-tailer, this project assesses how advanced analytics can be applied to improve business performance. The project focuses on: (i) analyzing purchase records to help develop better marketing and advertising programs, and (ii) conducting a granular assessment of the firm’s product mix along with corresponding growth trends. We also demonstrate how scalable purchase record parsing can be completed automatically via column identification and customer matching.
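As a rough sketch of the column-identification step, one might fuzzy-match raw headers to a canonical schema; the schema, matching cutoff, and sample headers below are hypothetical, not the project’s parser.

```python
from difflib import get_close_matches

# Hypothetical canonical schema for purchase records whose column names
# vary across sources ("Customer Name", "Order-Date", "ProductID", ...).
CANONICAL = ["customer_name", "order_date", "product_id", "quantity", "price"]

def identify_columns(raw_headers):
    """Map raw headers to canonical column names by fuzzy string matching."""
    mapping = {}
    for h in raw_headers:
        match = get_close_matches(h.lower().replace(" ", "_"), CANONICAL, n=1, cutoff=0.6)
        if match:
            mapping[h] = match[0]
    return mapping

print(identify_columns(["Customer Name", "Order-Date", "ProductID"]))
```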
- End to End Data Cleaning Workflow for NYPL Restaurant Menu Data
Reshma Lal Jagadheesh, University of Illinois Urbana Champaign, Urbana, IL
Nithish Kaviyan Dhayananda Ganesh
About 80% of the time and resources in any data project are spent on data cleaning. In this poster we present an end-to-end data cleaning workflow that semi-automates the process of cleaning large volumes of data, using a case study of New York Public Library (NYPL) data. The workflow combines several advanced data cleaning methodologies available today. The data is cleaned using tools such as OpenRefine, SQL, and Pandas (Python). The objective of our data cleaning model was to ensure that the final cleaned data was fit for the dataset’s intended use case. Lastly, the YesWorkflow tool was used to map the step-by-step procedure involved in the data cleaning process. Analytics and inferences were also drawn from the final cleaned data to check whether the dataset satisfied its use case. The NYPL menu data contains a collection of over 45,000 menus and is updated twice a month. The workflow model used in the poster can be applied in industry to data that is updated regularly, saving considerable time in identifying the repetitive cleaning tasks to be performed. This project also used a novel workflow modelling tool, YesWorkflow, which can model a data cleaning workflow directly from Python and R scripts alongside other data cleaning tools such as OpenRefine. To make the purpose of the cleaning operations more concrete, data visualization was performed to show valuable insights obtained from the cleaned data.
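A minimal Pandas sketch of the kind of repeatable cleaning steps such a workflow captures is shown below; the file and column names are assumptions loosely based on the public NYPL exports, not the exact scripts used.

```python
import pandas as pd

# Assumed file and column names for illustration only.
df = pd.read_csv("Dish.csv")

# Typical cleaning steps: trim whitespace, normalize case, drop exact duplicates,
# coerce numeric fields, and drop rows with missing key values.
df["name"] = df["name"].astype(str).str.strip().str.lower()
df = df.drop_duplicates()
df["times_appeared"] = pd.to_numeric(df["times_appeared"], errors="coerce")
df = df.dropna(subset=["name"])

df.to_csv("Dish_clean.csv", index=False)
```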
- Application Of Operations Research To Marketing In A Gas Company ~Optimization Of Direct Marketing Using Customer Information~
Yu Shibata, Tokyo Gas Co., Ltd, Tokyo, Japan
Miho Udagawa, Toshinori Sasaya, Ryotatu Arikawa, Sho Ouchi, Noriko Furuta, Kanako Nakayama
Tokyo Gas is expanding its business into the residential power supply field and is working to acquire more electricity contracts amid energy liberalization in Japan. We, the Tokyo Gas operations research team, analyzed the past results of our marketing actions (e.g., direct-mail marketing, phone sales) together with each customer’s data (e.g., postcode, average gas consumption volume) so that the expected response rate to each marketing action can be estimated for every customer. Based on this analysis, we set out to maximize the total expected profit from increasing electricity customer accounts, taking customer acquisition cost into consideration, and applied the optimization method in practice to a test marketing trial last year. This experimental marketing brought successful results and indicated the potential effectiveness of the optimization.
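A toy sketch of the optimization step, selecting the best marketing action per customer under an acquisition-cost budget, is shown below; response rates, margins, costs, and the greedy budget rule are illustrative placeholders, not Tokyo Gas’s actual model.

```python
import numpy as np

# Illustrative inputs: predicted response rate per customer-action pair,
# expected profit per acquired contract, and cost per contact by action.
rng = np.random.default_rng(0)
n_customers, actions = 1000, ["direct_mail", "phone"]
p = rng.uniform(0.01, 0.10, size=(n_customers, len(actions)))
margin = 8000.0                               # placeholder profit per acquisition
cost = np.array([120.0, 600.0])               # placeholder contact costs

expected_profit = p * margin - cost           # per customer-action pair
best_action = expected_profit.argmax(axis=1)
best_value = expected_profit.max(axis=1)

# Greedy selection under a budget: contact the most profitable customers first.
budget, spend, selected = 150_000.0, 0.0, []
for i in np.argsort(-best_value):
    if best_value[i] <= 0:
        break
    c = cost[best_action[i]]
    if spend + c <= budget:
        selected.append((i, actions[best_action[i]]))
        spend += c
```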
- Experience Modeling in Underwriting Farm Insurance
Tanvi Singhal, Deloitte, Hyderabad, India
Effective underwriting risk segmentation is one of the major challenges that most insurance companies face. Many insurance companies provide farm insurance covering farm barns, buildings, and structures, including coverage for most necessary livestock, machinery, supplies, and tools. Given the high risk of the insured liability and the richness of data for each coverage, drawing correct inferences from the data, identifying the key drivers of farm policy performance, and enhancing the underwriting management process become critical. Deloitte’s ‘Experience Modeling’ solution was developed for our clients based on Deloitte’s industry experience, corporate strategy, farm-specific requirements, organizational readiness, and short- and long-term strategic underwriting objectives. In the proposed poster, we display inferences from the data (internal and external) and show how they can be used to address some critical underwriting management issues. The poster also provides a taste of the end-to-end development and deployment process, concluding with the potential benefits to the client of implementing our solutions.
- Reducing Retail Sales Forecast Error with Hierarchical Bayes
Guang Zhao, John V. Colias, University of Dallas, Irving, TX
We apply Hierarchical Bayes (HB) regression to forecast multiple time series. The HB model uses a Gibbs sampler, a Markov chain Monte Carlo (MCMC) algorithm, to model seasonality and trend for each individual retail sales time series across multiple sectors and regions of a nation. We compare forecast accuracy for three forecasting models:
- Traditional ARIMA model
- HB regression
- Bayesian gradient boosting method adapted for time series data
HB regression potentially improves the forecast of each time series by borrowing information from other time series that share similar seasonality and trend patterns.
Our Bayesian gradient boosting method begins with HB regression for each time series. Then, a second HB regression models multiple observations of forecast error for each individual time series (corresponding to different subsamples ending in different seasons). Final forecasts sum the initial regression’s fitted values and the predicted forecast errors from the second HB regression.
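As an illustrative sketch of the partial-pooling structure (using PyMC’s default sampler rather than the Gibbs sampler described above, and with hypothetical data files and priors), an HB regression with shared seasonality and trend hyperpriors might look like the following.

```python
import numpy as np
import pymc as pm

# Hypothetical data layout: y[i] is a (log) sales observation for series s[i]
# in season q[i] at time t[i]; seasonal and trend coefficients are partially
# pooled across series, so sparse series borrow strength from similar ones.
y = np.load("y.npy")
s = np.load("series.npy")   # integer series index
q = np.load("season.npy")   # integer season index
t = np.load("time.npy")     # time index (e.g., months since start)
n_series, n_seasons = int(s.max()) + 1, int(q.max()) + 1

with pm.Model() as hb_model:
    mu_season = pm.Normal("mu_season", 0.0, 1.0, shape=n_seasons)   # shared seasonal pattern
    sd_season = pm.HalfNormal("sd_season", 1.0)
    season = pm.Normal("season", mu_season, sd_season, shape=(n_series, n_seasons))

    mu_trend = pm.Normal("mu_trend", 0.0, 0.1)                       # shared trend
    sd_trend = pm.HalfNormal("sd_trend", 0.1)
    trend = pm.Normal("trend", mu_trend, sd_trend, shape=n_series)

    sigma = pm.HalfNormal("sigma", 1.0)
    mu = season[s, q] + trend[s] * t
    pm.Normal("obs", mu, sigma, observed=y)

    trace = pm.sample(1000, tune=1000)
```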
Registration and Awards
Each track will be judged separately, with awards of 1st, 2nd, and 3rd place in each track. Posters will be judged on (1) novelty of application, (2) results (or potential results) from implementation, and (3) presentation of work. Winners will be announced and notified before the conference ends.
- $500 award from the INFORMS Data Mining section will be given to the top student poster over the two-day event.
- $500 award from Purdue’s Business Information & Analytics Center (BIAC) will be given to the top practitioner poster over the two-day event.
- 2nd and 3rd place winners in both tracks will receive a certificate of recognition.
Production & Presentation Guidelines
Posters must be produced as a single-sheet exposition that can fit on the bulletin board provided by INFORMS. The bulletin board measures 90” wide by 43” tall. We recommend a poster size of 72” wide by 36” tall or 48” wide by 36” tall. However, other sizes are acceptable as long as they do not exceed the bulletin board dimensions of 90” wide by 43” tall.
In preparing your poster, you may want to reference these online sites that provide templates, as well as printing services. Please see below for our guidelines on poster content and design.
We strongly recommend that you have your poster printed locally or by an online site before you travel to the meeting, rather than attempt to have it printed onsite in Austin.