Simulating A/B tests offline with counterfactual inference

A classic information retrieval way to evaluate offline how well such a system performs is something akin to mean average precision (MAP) or normalized discounted cumulative gain (NDCG): store historical user interactions, and validate the system by verifying whether the algorithm places the items a user interacted with at the top of the recommendations. In user-interactive systems, it is generally easy to collect a dataset where $x$ is what the system knows about the user and their intent, $a$ is the item recommended to user $x$ by some "algorithm" $\pi_0$, and $r$ is the reward (like/view/dislike) that the user provides to the system. With this data, inverse propensity scoring (IPS) gives us an unbiased estimate of the performance of any new algorithm $\pi$ with:

$$\hat{V}_{\text{IPS}}(\pi) = \frac{1}{n} \sum_{i=1}^{n} r_i \, \frac{\pi(a_i \mid x_i)}{\pi_0(a_i \mid x_i)}$$

where $\pi(a_i \mid x_i)$ is the propensity for item $a_i$ to be recommended to user $x_i$ by a new, different algorithm $\pi$, and $\pi_0(a_i \mid x_i)$ is the propensity under the logging algorithm $\pi_0$ that collected the data.
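The estimator above can be sketched in a few lines of NumPy. This is a minimal illustration, not the post's own code; the function name `ips_estimate` is made up for the example, and it assumes you have logged rewards along with both policies' propensities for each logged recommendation:

```python
import numpy as np

def ips_estimate(rewards, new_probs, logging_probs):
    """IPS estimate of a new policy's average reward from logged data.

    rewards       : observed rewards r_i for the logged (user, item) pairs
    new_probs     : pi(a_i | x_i), the probability the NEW policy would
                    have recommended the logged item
    logging_probs : pi_0(a_i | x_i), the propensity with which the logging
                    policy actually recommended it
    """
    # Reweight each logged reward by how much more (or less) likely the
    # new policy is to make that same recommendation.
    weights = np.asarray(new_probs) / np.asarray(logging_probs)
    return float(np.mean(np.asarray(rewards) * weights))

# Toy example: the logging policy picked uniformly between two items
# (propensity 0.5 each); the new policy we want to evaluate would have
# recommended the first and third logged items with the given probabilities.
estimate = ips_estimate(
    rewards=[1.0, 0.0, 1.0],
    new_probs=[1.0, 0.0, 0.5],
    logging_probs=[0.5, 0.5, 0.5],
)
```

Note that logged interactions the new policy would never have produced (`new_probs` of 0) contribute nothing, while interactions it favors more strongly than the logging policy are upweighted.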
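For contrast, the interaction-based offline evaluation mentioned earlier (NDCG-style) can be sketched as follows. This is a minimal, illustrative implementation of the standard NDCG formula, not code from the post; it scores a ranked list by how high the items a user interacted with appear:

```python
import math

def dcg(relevances):
    # Discounted cumulative gain: relevant items count for more the
    # closer they are to the top of the ranking (rank is 0-based here).
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(ranked_relevances):
    # Normalize by the DCG of the ideal ordering, so a perfect ranking
    # scores 1.0 regardless of how many relevant items exist.
    ideal = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal if ideal > 0 else 0.0

# Relevance 1 marks an item the user interacted with; the algorithm that
# ranks it first gets a higher score than one that buries it.
perfect = ndcg([1, 0, 0])
buried = ndcg([0, 0, 1])
```

Unlike IPS, this kind of metric can only reward re-ranking items the user was already shown, which is exactly the limitation that motivates counterfactual evaluation.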

Source: abhadury.com