Marketing & Retail Analytics

Setareh Borjian

Retail Science Team

Extracting Product Attributes from Free-form Product Description Strings Using Machine Learning

Entity extraction is a widely used machine learning technique in text mining, with applications in social media analysis and business intelligence, where the goal is to extract relevant information from unstructured text. Specifically, it identifies key elements in unstructured text and classifies them into predefined categories. In this presentation, we will talk about Product Attribute Extraction, an innovative application of entity extraction in the retail industry. Product attributes are seeing ever wider use in retail because retailers increasingly rely on software that requires attribute information to increase sales and profits. In fact, many retail applications, such as customer decision trees, assortment optimization, and recommendation systems, are based on product similarity and therefore require product attributes. Because these attributes are provided by retailers only in the form of description strings, and extracting them manually is tedious and extremely time-consuming, a system is needed that extracts the attributes by parsing the description strings. Despite the increasing importance of product attributes, little attention has so far been paid to actually obtaining attribute information for products. Existing software tools frequently take a rules-based approach to parsing description strings and thus require a fair amount of manual effort and per-retailer customization; moreover, they are not specifically geared toward extracting attributes from description strings. The Attribute Extraction (AE) software developed by Oracle Retail Science uses machine learning algorithms to extract attributes from the description strings of products sold by grocery and fashion retailers.
It consists of two main modules: (1) an interactive system for rapid annotation and bootstrap training, which prepares training data for the machine learning algorithm; and (2) a system for detecting user errors, together with a linking and spell-correction system that standardizes the attribute values. Together, these modules greatly reduce the manual effort involved in attribute extraction.
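The abstract does not describe how the AE linking and spell-correction system works internally. As a rough illustration only, the sketch below assumes a hypothetical, hand-curated list of canonical brand names and uses fuzzy string matching from Python's standard-library `difflib` to map a misspelled extracted value to its standardized form. The vocabulary, the `standardize` helper, and the 0.75 similarity cutoff are all illustrative assumptions, not part of the AE software.

```python
from difflib import get_close_matches

# Hypothetical canonical vocabulary for one attribute (brand); in practice
# this would come from curated reference data, not a hard-coded list.
CANONICAL_BRANDS = ["Kellogg's", "General Mills", "Quaker", "Post"]


def standardize(value, vocabulary, cutoff=0.75):
    """Map a raw extracted attribute value to its closest canonical form.

    Returns None when no vocabulary entry is similar enough, so the value
    can be routed to a human annotator instead of being guessed.
    """
    # Compare case-insensitively, but return the canonical spelling.
    lowered = {v.lower(): v for v in vocabulary}
    matches = get_close_matches(value.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None


print(standardize("kelogs", CANONICAL_BRANDS))  # Kellogg's
print(standardize("zzz", CANONICAL_BRANDS))     # None
```

Returning `None` below the cutoff, rather than always forcing the nearest match, is a deliberate choice in this sketch: low-confidence values are the ones a human reviewer should see.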


At Oracle, Setareh Borjian is a member of the Retail Science Team, applying her expertise in machine learning, mathematical modelling, and optimization to a variety of problems in the retail industry. Before joining Oracle, she received a dual Master's degree in Operations Research and Transportation from the Massachusetts Institute of Technology.

Michael Ketzenberg

Associate Professor of Information & Operations Management
Mays Business School, Texas A&M University

Managing Return Abuse with Data Analytics

Retailers often provide lenient, consumer-friendly return policies to reduce customers' perceived shopping risk and increase demand. Unfortunately for retailers, empirical findings show that lenient return policies lead some customers to abuse them through opportunistic and even fraudulent behavior. Customers can abuse return policies by making purchases with the full intention of returning the products, or by returning a product long after extracting most of its market value. In doing so, abusive customers extract utility (physical, experiential, or financial) from these purchases at little or no cost to themselves. Retailers, however, incur significant costs from such abuse, with estimates topping $6.8 billion annually in the U.S. alone. Identifying the behaviors of customers who commit return abuse therefore remains a critical problem. This talk is grounded in an analysis of a secondary transactional data set covering more than one million customers and more than seventy-five million transactions from a national U.S.-based retailer. The analysis generates new empirical insights that characterize the observable actions of abusive returners, legitimate returners, and non-returners. The talk will also introduce a predictive model that enables actionable managerial intervention and offers the opportunity to recapture significant return costs that might otherwise be lost to avoidable return abuse. The analysis also highlights the need for a more holistic perspective on predicting, managing, and preventing returns.


Michael Ketzenberg is an Associate Professor of Information and Operations Management at the Mays Business School at Texas A&M University. His research falls under the umbrella of supply chain management and focuses on consumer returns and their management, as well as the value and use of information for inventory management. Dr. Ketzenberg's research has been published in several journals, including Harvard Business Review, Production and Operations Management, European Journal of Operational Research, and Journal of Operations Management. Dr. Ketzenberg holds a B.S. in Information Decision Systems from Carnegie Mellon University, an M.B.A. in Operations Management from Vanderbilt University, and a Ph.D. in Operations Management from the University of North Carolina, Chapel Hill. Prior to joining the Mays faculty, Professor Ketzenberg held academic positions at Colorado State University and George Mason University. He has over eight years of professional work experience as a project manager, systems developer, and research analyst. He has engaged in a number of research and consulting projects with firms in a variety of industries, including retail, finance, advertising, manufacturing, distribution, and education.

Brian Quanz


Optimizing E-commerce Order Sourcing: A Big Data Analytics and Total Cost Optimization Solution

To meet rising e-commerce demand and expectations, retailers are shifting to an omni-channel fulfillment approach, i.e., using all nodes in their fulfillment network, including brick-and-mortar stores, rather than only specialized e-commerce warehouses. This creates new challenges: fulfillment networks can now contain thousands of diverse nodes, and retailers must weigh multiple, conflicting business objectives when deciding how to fulfill an order, such as minimizing shipping cost versus balancing network load or inventory. To address these challenges, we have developed a cloud-based order fulfillment solution that uses predictive analytics and multi-objective optimization to make optimal total-cost fulfillment decisions. The first version of this solution is in production at a large US retailer and yielded significant estimated cost savings during the 2016 peak season. In this talk, I will present key aspects of our solution and some of the results we have seen.
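The abstract does not specify how the solution combines its objectives. One common way to trade shipping cost against network load is weighted-sum scalarization, sketched below for a single order and a single item. The `Node` fields, the quadratic utilization penalty, and the `load_weight` value are hypothetical choices made for illustration; they are not taken from the solution described in the talk.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A fulfillment node (store or warehouse); all fields are illustrative."""
    name: str
    shipping_cost: float  # estimated cost to ship this order from the node
    utilization: float    # current workload as a fraction of capacity (0 to 1)
    on_hand: int          # units of the ordered item available at the node


def total_cost(node, load_weight):
    # Weighted-sum scalarization: shipping cost plus a penalty that grows
    # quickly as the node approaches capacity (trading cost vs. load balance).
    return node.shipping_cost + load_weight * node.utilization ** 2


def choose_node(nodes, qty, load_weight=10.0):
    """Return the feasible node with the lowest combined cost, or None."""
    feasible = [n for n in nodes if n.on_hand >= qty]
    if not feasible:
        return None
    return min(feasible, key=lambda n: total_cost(n, load_weight))


nodes = [
    Node("DC-East", 8.0, 0.95, 100),  # cheapest feasible shipper, but nearly full
    Node("Store-12", 9.5, 0.20, 3),   # slightly costlier, lightly loaded
    Node("Store-40", 7.0, 0.50, 0),   # cheapest overall, but out of stock
]
print(choose_node(nodes, 2).name)                   # Store-12
print(choose_node(nodes, 2, load_weight=0.0).name)  # DC-East
```

Setting `load_weight` to zero recovers pure shipping-cost minimization; raising it shifts orders away from congested nodes. A production system would typically optimize over many orders and split shipments jointly rather than greedily per order.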


Dr. Brian Quanz is a researcher in the cognitive commerce research group at IBM, where he develops innovative applications in the area of commerce. For the past two years, his main focus at IBM Research has been e-commerce, particularly order fulfillment, and he has worked closely with large retailers to understand their problems in this area and develop software solutions to address them. He was part of an IBM research project on e-commerce order fulfillment optimization from the start of its development; the project has recently been productized as Watson Order Optimizer. Additionally, he serves as a director on the board of the Computer and Information Systems division of the Institute of Industrial and Systems Engineers. Over his career he has worked on a wide variety of analytics applications, filing more than a dozen patents and co-authoring more than a dozen technical papers.


Maarten Oosten

Senior Manager, Advanced Analytics Optimization Solutions Team
SAS Institute, Inc.

Optimizing Rates for Service Agreements by Shaping Loss Functions

One of the optimization challenges in Business-to-Business pricing is setting rates for a set of services. A common practice is to model the likelihood of winning a service within the agreement by means of a win-rate curve. It may be tempting to treat this curve as a regular demand curve; however, the variability of demand around its expected value differs greatly depending on the price point. This risk needs to be addressed in the optimization model.
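A short worked formula shows why the spread around the expected value depends on the price point. It rests on a simplifying assumption that is ours, not necessarily the speaker's model: winning a single service at rate $r$ is a Bernoulli event with probability $w(r)$ given by the win-rate curve, and revenue $R(r)$ is $r$ if the service is won and zero otherwise.

```latex
\begin{aligned}
R(r) &= r\,X, \qquad X \sim \mathrm{Bernoulli}\big(w(r)\big),\\
\mathbb{E}[R(r)] &= r\,w(r), \qquad
\operatorname{Var}[R(r)] = r^{2}\,w(r)\,\big(1 - w(r)\big),\\
\frac{\sqrt{\operatorname{Var}[R(r)]}}{\mathbb{E}[R(r)]} &= \sqrt{\frac{1 - w(r)}{w(r)}}.
\end{aligned}
```

Because a win-rate curve typically decreases in $r$, the coefficient of variation grows as the rate rises: a high rate can look attractive in expectation while carrying a much larger chance of winning nothing at all.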

The financial services industry faces similar challenges of managing risk, for example in the context of portfolio optimization. Recently, value-at-risk and conditional value-at-risk have become popular procedures for shaping loss functions.
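As a reminder of what these two risk measures compute, here is a minimal sketch on a sample of simulated or historical losses. It uses one common empirical convention (VaR as the α-quantile, CVaR in the Rockafellar-Uryasev form); other discrete conventions exist, and the `var_cvar` helper is hypothetical rather than part of any SAS product.

```python
import math


def var_cvar(losses, alpha=0.95):
    """Empirical value-at-risk and conditional value-at-risk of a loss sample.

    VaR is the alpha-quantile of the losses. CVaR uses the Rockafellar-Uryasev
    form  VaR + E[max(L - VaR, 0)] / (1 - alpha),  which is roughly the
    average loss in the worst (1 - alpha) fraction of outcomes.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must be in (0, 1)")
    ordered = sorted(losses)
    n = len(ordered)
    # Smallest loss whose empirical CDF reaches alpha.
    var = ordered[math.ceil(alpha * n) - 1]
    # Average excess loss beyond VaR, rescaled to the tail probability.
    excess = sum(max(x - var, 0.0) for x in ordered) / n
    return var, var + excess / (1 - alpha)


# Ten equally likely losses 1..10: the worst 20% are {9, 10}, with mean 9.5.
var, cvar = var_cvar(list(range(1, 11)), alpha=0.8)
print(var, round(cvar, 6))  # 8 9.5
```

The Rockafellar-Uryasev form is convenient because the same expression can be written with linear constraints, which is what makes CVaR practical for shaping the loss distribution inside an optimization model.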

In this presentation, we will explore opportunities to apply these procedures to manage the risk inherent to setting rates for service agreements. We will cast the examples in the context of rate optimization for cargo rail services.


Maarten Oosten is a Senior Manager in the Advanced Analytics Optimization Solutions team at SAS. Maarten brings close to 20 years of experience designing and implementing pricing solutions to the team. These solutions include advanced pricing analytics across industries such as distribution, manufacturing, express shipping, cargo, and travel and transportation.

Maarten has a Ph.D. in Mathematics from the University of Maastricht and an M.S. in Econometrics from the University of Groningen. He has held positions as a visiting assistant professor at GSIA, the business school of Carnegie Mellon University, and as a post-doctoral fellow at the University of British Columbia. He is also an active member of the INFORMS Section on Revenue Management and Pricing and of the Professional Pricing Society.

Dirk Van den Poel

Full Professor, Data Analytics/Big Data
Ghent University

Does Voice Matter in Customer Service Calls? Harnessing the Power of Voice Analysis in Big Data Analytics

Traditionally, an important aspect of telephone conversations in customer contact centers has been overlooked: the voice characteristics of the service call. In this presentation we discuss the results of processing millions of actual customer service calls to a real-life contact center. We show that audio voice analysis can be used effectively in various applications. First, we demonstrate how voice features (tempo, pitch, loudness, and timbre) can be used to discriminate between call types; for example, we identified vivid conversations, i.e., interactions with many highs and lows in which both parties speak with a lot of intonation. Second, we augment the existing CRM database with voice characteristics. Third, we elaborate on the most important features (out of 200+ available audio analysis variables) and their relationship with customer satisfaction. Finally, we discuss in detail how our approach can be integrated into the day-to-day operations of a call center. For example, call-center agents can be evaluated in a much more systematic (Big Data) way by analyzing ALL calls, instead of just a small sample of calls evaluated by humans.


Dirk Van den Poel (PhD) is Full Professor of Data Analytics/Big Data at Ghent University, Belgium. He teaches courses such as Statistical Computing, Big Data, Analytical Customer Relationship Management, Advanced Predictive Analytics, and Predictive and Prescriptive Analytics. He co-founded the advanced Master of Science in Marketing Analysis, the first (predictive) analytics master's program in the world, as well as the Master of Science in Statistical Data Analysis and the Master of Science in Business Engineering/Data Analytics. His major research interests are in the field of analytical CRM (Customer Relationship Management), including customer acquisition, churn, upsell/cross-sell, and win-back modeling. His methodological interests include ensemble classification methods and big data analytics. He has co-authored 80+ international peer-reviewed (ISI-indexed) publications in journals such as Journal of Statistical Software, Journal of Applied Econometrics, Applied Geography, European Journal of Operational Research, IEEE Transactions on Power Systems, and Decision Support Systems.