The O.R. Dog That Did Not Bark
“Is there any point to which you would wish to draw my attention?”
“To the curious incident of the dog in the night-time.”
“The dog did nothing in the night-time.”
“That was the curious incident,” remarked Sherlock Holmes.
– Arthur Conan Doyle, Silver Blaze (1892).
Illustration by Sidney Paget (1860-1908), http://www.sshf.com/encyclopedia/index.php/The_Adventure_of_Silver_Blaze, Public Domain
In data science, we are often given datasets that record only the ‘winning’ instances, i.e., ‘when the dog barked’. This is fairly common. Retail stores usually record all successful sales, but not the cases where customers were interested in buying their products and then walked away to a competitor, or simply chose not to purchase at all. The situation is a little better in e-commerce, where we have some partial data about lost transactions. Of course, one could use only the ‘wins’ to build a win-prediction model, and this is how traditional forecasting models work. However, if we dig a little deeper, it is quite interesting to try to train an ‘O.R. dog’ that would also let us know when it did not bark, using only ‘bark data’.
This may seem an impossible task at first glance, but the problem is quite tractable in practice. An interesting application of advanced O.R. methods is the optimal reconstruction of such censored ‘zero’ signals to build better prediction and decision optimization models. Exploring this problem eventually led us to practical breakthroughs in forecasting and prediction methods, which helped solve challenging revenue management, pricing, forecasting, and inventory management problems in multiple industries, including retail, B2B, the futures market, and AI applications in travel and transportation.
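To make the idea of reconstructing unrecorded ‘zeros’ concrete, here is a minimal sketch, not the method used in the work described above. It assumes, purely for illustration, that each customer's purchase count follows a Poisson distribution and that customers who bought nothing were never recorded. Under that assumption, the recorded counts follow a zero-truncated Poisson, and the rate can be recovered by maximum likelihood. The function names and the sample data are hypothetical.

```python
import math

def fit_zero_truncated_poisson(counts, tol=1e-10):
    """MLE of the Poisson rate when every zero count went unrecorded.

    The likelihood equation is lam / (1 - exp(-lam)) = mean(counts).
    Its left-hand side increases with lam, so bisection finds the root.
    """
    if not counts or min(counts) < 1:
        raise ValueError("counts must be a non-empty list of positive integers")
    mean = sum(counts) / len(counts)
    if mean <= 1.0:
        raise ValueError("rate is not identifiable when every count equals 1")
    lo, hi = 0.0, mean  # the root always lies strictly below the observed mean
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # -expm1(-mid) equals 1 - exp(-mid), computed accurately for small mid
        if mid / -math.expm1(-mid) < mean:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def estimate_silent_cases(counts):
    """Estimated number of unrecorded zero-count cases ('the dog did not bark')."""
    lam = fit_zero_truncated_poisson(counts)
    p_zero = math.exp(-lam)
    return len(counts) * p_zero / (1 - p_zero)

# Hypothetical recorded purchases per customer (buyers only).
recorded = [1, 1, 2, 3, 1, 2, 1, 4]
rate = fit_zero_truncated_poisson(recorded)
silent = estimate_silent_cases(recorded)
print(f"estimated rate: {rate:.3f}, estimated walk-aways: {silent:.1f}")
```

The estimate is only as good as the Poisson assumption. Real lost-sales problems typically involve covariates, competitors, and stockouts that this sketch ignores.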
As we welcome Operations Researchers to Seattle and wish them a happy October, it is worth revisiting an Econometrica article from October 1951, authored by Prof. Oskar Morgenstern. This article was a tribute to Prof. Abraham Wald, a pioneer of Operations Research. I learned from this article that Dr. Wald’s life was tragically cut short in December 1950 when his plane crashed into the Nilgiri mountains, miles from my native town in Tamil Nadu, India. Eight of Dr. Wald’s family members were killed in WW2, and yet his spirit would not be crushed, and his research work yielded several important contributions, as noted by Prof. Morgenstern.
A challenging problem solved by Prof. Wald was to identify an optimal distribution of armor on WW2 Allied aircraft. After looking at the recorded bullet-hole patterns on aircraft returning from sorties, he rejected the official suggestion of fortifying the most frequently hit areas. The famous story of ‘the missing bullet holes’ explains that he suggested the exact opposite: to strengthen the area that showed the fewest hits, the engine, since aircraft hit there were the ones least likely to return. The dataset was silent about the aircraft that did not make it back. However, looking at the aircraft that survived revealed something about the ones that were lost. Clearly, predicting with censored data is not a new thing.
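The reasoning in the bullet-hole story can be sketched with a toy calculation. Every number below is hypothetical and chosen only for illustration; none of it is Wald's data or his actual method. The sketch assumes that each aircraft takes a single hit, that hits land on each section with a known share, and that survival depends on the section hit. It then shows how the hole pattern on returning aircraft understates engine hits, and how comparing the observed pattern with the assumed hit pattern recovers relative survival odds.

```python
# Hypothetical inputs: where bullets land, and the chance of returning if hit there.
HIT_SHARE = {"engine": 0.25, "fuselage": 0.25, "wings": 0.25, "tail": 0.25}
SURVIVAL_IF_HIT = {"engine": 0.4, "fuselage": 0.9, "wings": 0.8, "tail": 0.9}

def observed_hole_share(hit_share, survival_if_hit):
    """Share of holes per section among aircraft that returned (one hit each)."""
    returned = {s: hit_share[s] * survival_if_hit[s] for s in hit_share}
    total = sum(returned.values())
    return {s: v / total for s, v in returned.items()}

def relative_survival(observed, hit_share):
    """Invert the survivor filter: survival odds per section, scaled so the safest is 1."""
    ratio = {s: observed[s] / hit_share[s] for s in observed}
    best = max(ratio.values())
    return {s: r / best for s, r in ratio.items()}

observed = observed_hole_share(HIT_SHARE, SURVIVAL_IF_HIT)
recovered = relative_survival(observed, HIT_SHARE)
for section in HIT_SHARE:
    print(f"{section:9s} observed share {observed[section]:.3f}  "
          f"relative survival {recovered[section]:.3f}")
```

The section with the fewest observed holes is the one with the lowest recovered survival odds, which is the section to armor.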
I’ll close with an interesting potential application of censored data estimation models: tackling the frustrating problem of fake news. I picked a leading English newspaper in my home state of Tamil Nadu at random and compared its top news reports from a few years ago with those in other Tamil and English newspapers. The results were interesting. The fake news that is published is, in a sense, a lesser problem, as it can be debunked quickly by fact-checks. However, key news that is not published is fake news too. The public watchdog did not bark, misleading its readers and viewers into concluding that nothing happened. And hidden inside these sounds of silence are curious incidents.