Subscribe to Vincent Granville's Weekly Digest:

Preserving metric and score consistency over time and across clients when data sets change

Changes can come from multiple sources. The definition of a visit or of a web site visitor changes, resulting in visitor counts suddenly dropping or exploding. Internet log files change: for instance, the full user agent string is no longer recorded, which impacts traffic quality scores. Or one client has data fields that differ from, or only partially overlap with, those of other clients.


How do you handle this issue?

The answer is simple: when a change in scores is detected (whether your scoring algorithm or your data has changed), apply the new scores backward to at least two weeks before the change. Compare the old and new scores over these two weeks of overlap, then re-calibrate the new scores using these two weeks' worth of data to make them consistent with the old ones (e.g. same median, same variance).
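
As a rough illustration of the re-calibration step, here is a minimal Python sketch. It assumes a plain linear transform (scale and shift) is enough to match the median and variance of the old scores over the overlap window; the function names fit_recalibration and recalibrate are hypothetical, and other transforms (for instance quantile matching) could be used instead.

```python
import statistics


def fit_recalibration(old_overlap, new_overlap):
    """Fit (scale, shift) so that scale * new + shift has the same
    median and variance as the old scores on the overlap window.

    Assumption: a linear transform is adequate for the re-calibration.
    """
    new_sd = statistics.pstdev(new_overlap)
    if new_sd == 0:
        raise ValueError("new scores are constant on the overlap window")
    # Matching standard deviations is equivalent to matching variances.
    scale = statistics.pstdev(old_overlap) / new_sd
    # With scale fixed, choose shift so that the medians coincide.
    shift = statistics.median(old_overlap) - scale * statistics.median(new_overlap)
    return scale, shift


def recalibrate(scores, scale, shift):
    """Apply the fitted transform to any list of new scores."""
    return [scale * s + shift for s in scores]


# Hypothetical overlap window: old and new scores for the same days.
old_scores = [10, 12, 14, 16, 18]
new_scores = [100, 104, 108, 112, 116]
scale, shift = fit_recalibration(old_scores, new_scores)
calibrated = recalibrate(new_scores, scale, shift)
```

The transform is fitted once on the overlap window, then applied to all new scores going forward.
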
If the issue is not temporal but rather that different clients have different data sets, then use a subset of the two data sets where the data fields are compatible, and compute scores for both clients on these reduced data sets (and compare them with the scores computed on the full data sets). These four sets of scores (two clients, each with reduced data and full data) are then used for re-calibration.
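
The text does not say exactly how the four sets of scores are combined, so the following Python sketch shows only one possible reading. It assumes that scores computed on the shared fields are comparable across clients, and it maps each client's full-data scores onto the scale of that client's reduced-data scores by matching median and standard deviation. The names shared_fields and to_common_scale, and the field names in the example, are hypothetical.

```python
import statistics


def shared_fields(client_a_fields, client_b_fields):
    """Fields available for both clients, used to build the reduced data sets."""
    return sorted(set(client_a_fields) & set(client_b_fields))


def to_common_scale(full_scores, reduced_scores):
    """Map one client's full-data scores onto the scale of its reduced-data
    scores, by matching median and standard deviation.

    Assumption: reduced-data scores are comparable across clients, so
    applying this to each client puts both on a common scale.
    """
    full_sd = statistics.pstdev(full_scores)
    if full_sd == 0:
        raise ValueError("full-data scores are constant")
    scale = statistics.pstdev(reduced_scores) / full_sd
    shift = statistics.median(reduced_scores) - scale * statistics.median(full_scores)
    return [scale * s + shift for s in full_scores]


# Hypothetical field lists for two clients.
common = shared_fields(["ip", "user_agent", "referrer"],
                       ["ip", "referrer", "country"])
```

Each client is calibrated separately with to_common_scale, using its own full-data and reduced-data scores.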

Notes

  • Use change-point, trend-reversal or slope-change detection algorithms to detect changes. However, the changes I am talking about here are usually abrupt and clearly visible to the naked eye, even to a non-statistician (and in many cases, unfortunately, to one of your clients).
  • When you improve a scoring algorithm, if it improves scores on one data segment A but makes them worse on another segment B, then create a hybrid, blended score consisting of the old score for B and the new score for A.
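
The two notes above can be sketched in Python as follows. This is a deliberately crude illustration, not a full change-point method: detect_level_shift compares the mean of adjacent windows and only catches abrupt level shifts of the kind described here, and blended_score assumes you already know on which segments the new algorithm performs better. Both function names, the window size and the threshold are assumptions.

```python
import statistics


def detect_level_shift(series, window=7, threshold=3.0):
    """Return the index where an abrupt level shift most likely starts,
    or None if no shift stands out.

    For each split point, compare the mean of the `window` values before
    it with the mean of the `window` values after it. Keep the split with
    the largest gap, and flag it only if the gap exceeds `threshold` times
    the standard deviation of the values before the split.
    """
    best_index, best_gap = None, 0.0
    for i in range(window, len(series) - window + 1):
        gap = abs(statistics.mean(series[i:i + window])
                  - statistics.mean(series[i - window:i]))
        if gap > best_gap:
            best_index, best_gap = i, gap
    if best_index is None:
        return None
    noise = statistics.pstdev(series[best_index - window:best_index])
    return best_index if best_gap > threshold * max(noise, 1e-12) else None


def blended_score(old_score, new_score, segment, improved_segments):
    """Use the new score where the new algorithm helps, the old one elsewhere."""
    return new_score if segment in improved_segments else old_score


# Hypothetical daily scores with a sudden drop.
daily = [100, 101, 99, 100, 102, 98, 100, 100, 101, 99,
         50, 51, 49, 50, 52, 48, 50, 51, 49, 50]
change_at = detect_level_shift(daily, window=5)
```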


© 2013   AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC
