Data Intelligence, Business Analytics
Changes can come from multiple sources. The definition of a visit or a website visitor changes, causing visitor counts to suddenly drop or explode. Internet log files change: for instance, the full user agent string is no longer recorded, which affects traffic quality scores. Or one client has data fields that differ from, or only partially overlap with, those of other clients.
How do you handle this issue?
The answer is simple: when a change in scores is detected (whether your scoring algorithm or your data has changed), apply the new scoring backward to at least two weeks before the change, compare the old and new scores over these two weeks of overlap, then re-calibrate the new scores using these two weeks' worth of data so that they are consistent with the old ones (e.g. same median, same variance).
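As a rough illustration of the re-calibration step, here is a minimal Python sketch. It assumes a simple linear rescaling is good enough to match the median and variance of the old scores over the overlap window; the function names and the daily score values are hypothetical, made up for the example.

```python
from statistics import median, pstdev

def fit_recalibration(old_overlap, new_overlap):
    """Return (scale, shift) such that scale * new + shift has the same
    median and standard deviation as the old scores on the overlap window."""
    new_sd = pstdev(new_overlap)
    if new_sd == 0:
        raise ValueError("new scores are constant on the overlap window")
    scale = pstdev(old_overlap) / new_sd
    shift = median(old_overlap) - scale * median(new_overlap)
    return scale, shift

def recalibrate(scores, scale, shift):
    """Apply the fitted linear map to any list of new scores."""
    return [scale * s + shift for s in scores]

# Hypothetical daily scores for a 14-day overlap window (illustrative only)
old_overlap = [52, 55, 49, 60, 58, 51, 47, 53, 56, 50, 61, 57, 52, 48]
new_overlap = [31, 35, 27, 41, 39, 30, 25, 32, 36, 29, 43, 38, 31, 26]

scale, shift = fit_recalibration(old_overlap, new_overlap)
adjusted = recalibrate(new_overlap, scale, shift)
```

The same `scale` and `shift` would then be applied to every score produced after the change. If the score distributions differ in shape and not just in location and spread, a linear map may not be enough, and something like quantile matching could be considered instead.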
If the issue is not temporal but rather that different clients have different data sets, then restrict both data sets to the subset of fields that are compatible, compute scores for both clients on these reduced data sets, and compare them with the scores computed on the full data sets. These four sets of scores (two clients, reduced data and full data) are then used for re-calibration.
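The text does not spell out how the four sets of scores are combined, so the following Python sketch is only one possible reading. It assumes that scores computed on the reduced (shared-field) data are comparable across clients, and rescales each client's full-data scores to match the median and spread of that client's reduced-data scores. The function name and all score values are hypothetical.

```python
from statistics import median, pstdev

def match_median_and_spread(source, target):
    """Linearly rescale `source` so it has the median and standard
    deviation of `target` (assumption: a linear map is adequate)."""
    src_sd = pstdev(source)
    if src_sd == 0:
        raise ValueError("source scores are constant")
    scale = pstdev(target) / src_sd
    shift = median(target) - scale * median(source)
    return [scale * s + shift for s in source]

# Hypothetical scores for the same records, computed on the full data
# and on the reduced data restricted to fields both clients share.
client_a_full    = [70, 64, 81, 59, 75, 68, 90, 62]
client_a_reduced = [55, 50, 63, 47, 58, 53, 69, 49]
client_b_full    = [40, 35, 52, 31, 46, 38, 57, 33]
client_b_reduced = [54, 49, 66, 45, 60, 52, 71, 47]

# Put each client's full-data scores on the common reduced-data scale.
a_calibrated = match_median_and_spread(client_a_full, client_a_reduced)
b_calibrated = match_median_and_spread(client_b_full, client_b_reduced)
```

After this step the two clients' calibrated scores share a common reference scale while still using the ranking information from their full data sets.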