Data Intelligence, Business Analytics
Interesting article published in the New York Times, discussing how statistical scores are regulated by the government and how they could be used in different contexts. They discuss a company, eBureau, which (they claim) has created new types of scores.
A few highlights:
Related article: An alternative to FICO scores?
Here's the article:

AMERICANS are obsessed with their scores. Credit scores, G.P.A.’s, SAT’s, blood pressure and cholesterol levels — you name it.
So here’s a new score to obsess about: the e-score, an online calculation that is assuming an increasingly important, and controversial, role in e-commerce.
These digital scores, known broadly as consumer valuation or buying-power scores, measure our potential value as customers. What’s your e-score? You’ll probably never know. That’s because they are largely invisible to the public. But they are highly valuable to companies that want — or in some cases, don’t want — to have you as their customer.
Online consumer scores are calculated by a handful of start-ups, as well as a few financial services stalwarts, that specialize in the flourishing field of predictive consumer analytics. It is a Google-esque business, one fueled by almost unimaginable amounts of data and powered by complex computer algorithms. The result is a private, digital ranking of American society unlike anything that has come before.
Read full article at http://www.nytimes.com/2012/08/19/business/electronic-scores-rank-c...
Comment by Amy on August 26, 2012 at 9:48am
Comment by Douglas Dame on August 26, 2012 at 2:31am
I haven't looked for more information on the algorithms and modeling approaches used by eBureau. I'm willing to assume they know much more than I do about industrial-scale modeling and are not oblivious to computationally efficient techniques. As do you, of course. So let's assume that your simulated annealing implementation at least matches their modeling approach(es), whatever they are.
One of the intriguing aspects of their business model is that they don't know today what dependent (target) variables they'll be asked to work with tomorrow. "Many and highly diverse" is the impression.
Surely your ability to "match results with 500 predictors" would depend somewhat on what additional information content their 50,000 - 500 = 49,500 additional potential predictors could bring to the problem of modeling the next unanticipated problem that hits the in-box.
Or, to state this another way: modeling problems have irreducible error (the inherent variance in Y) and reducible error (what the modeling process leaves on the table). Using the optimal algorithms and transformations for a given problem, if they can be discovered, would in theory drive the reducible error to zero. But in practical terms, reducible error is always conditional: it's floored by the information content of the predictors considered.
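Under squared-error loss, the split between reducible and irreducible error described above can be written out as a standard textbook decomposition. This sketch assumes Y = f(X) + ε with E[ε | X] = 0 and constant noise variance σ², a fitted model f̂ that is independent of the new observation's noise, X the full attribute set (e.g. all 50,000 attributes) and X_S a subset of it (e.g. 500 attributes):

```latex
Y = f(X) + \varepsilon, \qquad \mathbb{E}[\varepsilon \mid X] = 0, \qquad \operatorname{Var}(\varepsilon) = \sigma^2

\mathbb{E}\big[(Y - \hat f(X))^2\big]
  = \underbrace{\mathbb{E}\big[(f(X) - \hat f(X))^2\big]}_{\text{reducible}}
  + \underbrace{\sigma^2}_{\text{irreducible}}

\min_{g}\, \mathbb{E}\big[(Y - g(X_S))^2\big]
  = \mathbb{E}\big[(Y - \mathbb{E}[Y \mid X_S])^2\big]
  \;\ge\; \mathbb{E}\big[(Y - \mathbb{E}[Y \mid X])^2\big] = \sigma^2
```

The last line is the "floor" in the comment: no algorithm restricted to the attributes in X_S can beat E[Y | X_S], and dropping attributes from the pool can only raise that floor, never lower it.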
"Better ingredients, better pizza." - Papa John's.
"We don’t have better algorithms than everyone else; we just have more data." Google’s Chief Scientist Peter Norvig, 3-21-10
What am I missing or misunderstanding? (Feel free to point me to any references I need to add to my self-study efforts.)
Comment by Vincent Granville on August 25, 2012 at 11:20pm
@Douglas: If you collect (say) 500 attributes on each user, you can derive 500*499 combinations of 2 metrics (e.g., income per age) and 500*499*498 combinations of 3 metrics (e.g., income per age and zip code). This is far more than the 50,000 attributes used by eBureau. Indeed, we are dealing with a computational optimization problem involving trillions of trillions of trillions of (compound) metrics. There are automated techniques, such as simulated annealing, that will quickly give you a robust (local) optimum to your problem using fewer than 50 metrics (out of, say, 10^30 compound metric combinations), with a lift superior to the one obtained using 50,000 attributes and barely below the lift provided by using all 10^30 metric combinations mentioned above (which would give the global optimum). There's no hand-crafting involved in my solution; indeed, I plan to offer it as AaaS (Analytics as a Service), which means that the solution would be obtained without any human interaction of any kind (just machines talking to machines).
I'll soon publish a paper about "how to test trillions of attribute combinations at once to identify great predictors in a robust way". More on this once I have completed the paper.
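For scale, the arithmetic in the comment can be checked with a few lines of Python. This sketch assumes, as the comment's 500*499 implies, that order matters (income per age is not the same metric as age per income), so it counts k-permutations. The last quantity is an illustration added here, not a figure from the comment: it counts how many distinct 50-metric models could be drawn from the pool of compounds of order at most 3, which is why exhaustive search is out of the question and some heuristic search is needed.

```python
from math import comb, perm


def compound_metric_counts(n_attributes: int, max_order: int = 3) -> dict[int, int]:
    """Count compound metrics built from k distinct base attributes, k = 1..max_order.

    Order matters (income/age differs from age/income), so each count is the
    k-permutation n * (n-1) * ... * (n-k+1), e.g. 500*499 and 500*499*498.
    """
    return {k: perm(n_attributes, k) for k in range(1, max_order + 1)}


counts = compound_metric_counts(500)
total = sum(counts.values())  # all compound metrics of order 1, 2 and 3
# Number of distinct 50-metric subsets of that pool (a huge integer).
models_of_size_50 = comb(total, 50)

for k, c in counts.items():
    print(f"order {k}: {c:,}")
print(f"total compound metrics: {total:,}")
print(f"50-metric subsets: about 10^{len(str(models_of_size_50)) - 1}")
```

With 500 base attributes this gives 249,500 ordered pairs and 124,251,000 ordered triples, for 124,501,000 compound metrics in total once the 500 originals are included.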
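The comment names simulated annealing but does not describe the implementation, so what follows is only a generic sketch of the technique applied to choosing a subset of at most 50 metrics, not the author's method. The function name, the toggle-one-feature neighbor move, the geometric cooling schedule and the toy scoring function (standing in for a real lift measure) are all assumptions made for illustration.

```python
import math
import random
from typing import Callable, FrozenSet, Tuple


def anneal_feature_subset(
    n_features: int,
    score: Callable[[FrozenSet[int]], float],
    max_size: int = 50,
    n_steps: int = 5000,
    t_start: float = 1.0,
    t_end: float = 1e-3,
    seed: int = 0,
) -> Tuple[FrozenSet[int], float]:
    """Maximize `score` over feature subsets of size <= max_size by simulated annealing."""
    rng = random.Random(seed)
    current: FrozenSet[int] = frozenset()  # start from the empty model
    current_score = score(current)
    best, best_score = current, current_score
    # Geometric cooling: temperature goes from t_start to t_end over n_steps.
    alpha = (t_end / t_start) ** (1.0 / max(n_steps - 1, 1))
    temperature = t_start
    for _ in range(n_steps):
        # Neighbor move: toggle one randomly chosen feature in or out of the model.
        j = rng.randrange(n_features)
        candidate = current - {j} if j in current else current | {j}
        if len(candidate) <= max_size:
            cand_score = score(candidate)
            delta = cand_score - current_score
            # Metropolis rule: always accept improvements; accept a worse
            # candidate with probability exp(delta / temperature).
            if delta >= 0 or rng.random() < math.exp(delta / temperature):
                current, current_score = candidate, cand_score
                if current_score > best_score:
                    best, best_score = current, current_score
        temperature *= alpha
    return best, best_score


# Hypothetical toy objective: features 3, 7, 11 and 19 carry signal, and every
# selected feature costs a small penalty, so noise features lower the score.
SIGNAL = {3: 1.0, 7: 0.8, 11: 0.6, 19: 0.4}


def toy_score(subset: FrozenSet[int]) -> float:
    return sum(SIGNAL.get(j, 0.0) for j in subset) - 0.05 * len(subset)


best, best_score = anneal_feature_subset(n_features=200, score=toy_score, max_size=50)
print(sorted(best), round(best_score, 3))
```

Early on, the high temperature lets the search wander through many subsets; as it cools, moves that lower the score are almost never accepted, so the search settles into a (local) optimum. The best subset seen so far is tracked separately, so the returned model is never worse than the starting point.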
Comment by Douglas Dame on August 25, 2012 at 2:42pm
Vincent:
I can readily imagine that with (your skills and) your data, your target variables, and your methods, you can show that well-designed small models can equal or out-perform what I'll call massively wide models.
But will that prove that with eBureau's target variables (? scores for propensity to buy, financial quality, potential customer-lifetime-value, etc. ?), their data, their methods ... including any business requirements for turnaround times for training and/or scoring ... a small-model approach would also work as well or better for them? I'm inferring from the article that they're doing mongo-scale industrial modeling, highly automated, so careful "hand-crafting" is not a viable option. (But that's just my guess.)
Per eBureau's website, the eScore is not a single, pre-defined metric. It is specialized to the purpose at hand. ("eScores are highly effective because they are developed and customized for the particular needs of your business.") So as a customer I might have one flavor of eScore for my personal computer purchases, one for my yacht purchases (that would be a zero), another for mail-order dehydrated mangos, and another for the probability that I will renew my PO Box next year, just to make up a few random things.
The cleverness of their approach is that with a humongously wide collection of consumer attributes, and mega-horsepower, they can (apparently) throw everything against the wall as a brute-force approach and see what sticks.
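Neither the article nor the comment describes how eBureau actually selects predictors. Purely as an illustration of the "throw everything against the wall and see what sticks" idea, here is a minimal univariate screen that ranks every attribute by the absolute value of its Pearson correlation with the target; the function names and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def screen_attributes(
    columns: Dict[str, List[float]], target: List[float], top_k: int
) -> List[Tuple[str, float]]:
    """Rank every attribute by |correlation with target| and keep the top_k."""
    scored = [(name, abs(pearson(values, target))) for name, values in columns.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy data: six customers, three candidate attributes.
target = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
columns = {
    "income": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
    "age": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
    "constant_flag": [7.0] * 6,
}
top = screen_attributes(columns, target, top_k=2)
print(top)
```

A screen like this is cheap (one pass per attribute), which is what makes it feasible on very wide data, but it sees each attribute only in isolation and will miss predictors that matter only in combination.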
The value of information is a function of its usefulness, cost, and timeliness. At the very least, you have to say they've developed an interesting approach.
(If I were a credit regulator or consumer advocate, I would be very concerned with any scoring schemes for POTENTIAL customers that had the de facto effect of triaging them onto a good/cheap credit or price path vs a bad/expensive credit path. Clearly sometime in the near future society is going to need to revisit the issue of what is, or is not, discriminatory or predatory pricing, because the ability to do those things, instantaneously, is already highly advanced. In one of the cited examples, call-in customers were scored AND triaged before the phone was even answered.)
I'm just thinking on paper here; I don't recall ever having heard of eScores before.
Comment by Vincent Granville on August 23, 2012 at 2:47pm
Click here to read a potential application to improving ad relevancy.
Comment by Lance Olson on August 23, 2012 at 1:50pm
Nice find.
I would like to contribute to your statement "...scores are not just used to predict credit worthiness..."
Not long ago (March 2012), I bought a book titled "Who's #1?: The Science of Rating and Ranking" by Amy Langville and Carl Meyer. These are the same authors who wrote about Google's PageRank. The book gives a good introduction to rating and ranking, which is pretty much scoring. Its primary focus is on sports teams, but, as the authors mention, the mathematical methods could be applied to many other fields. The book also talks about how Qbert is related to ranking. I love the book.
I thought I would post a partial list of applications of scoring, rating, and ranking:
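As one concrete example of the mathematics behind rating and ranking, here is a minimal pure-Python sketch of the Colley matrix method, one of the rating methods Langville and Meyer cover. Ratings r solve C r = b, where C = 2I + diag(games played) minus the number of games between each pair, and b_i = 1 + (wins_i - losses_i)/2. The function names and the three-team season are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Tuple


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def colley_ratings(teams: List[str], games: List[Tuple[str, str]]) -> Dict[str, float]:
    """Colley ratings from a list of (winner, loser) results."""
    idx = {t: i for i, t in enumerate(teams)}
    n = len(teams)
    # Colley matrix C = 2I + diag(games played) - (games between each pair);
    # right-hand side b_i = 1 + (wins_i - losses_i) / 2.
    c = [[2.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    b = [1.0] * n
    for winner, loser in games:
        wi, li = idx[winner], idx[loser]
        c[wi][wi] += 1
        c[li][li] += 1
        c[wi][li] -= 1
        c[li][wi] -= 1
        b[wi] += 0.5
        b[li] -= 0.5
    r = solve_linear(c, b)
    return {t: r[idx[t]] for t in teams}


ratings = colley_ratings(["A", "B", "C"], [("A", "B"), ("A", "C"), ("B", "C")])
print({t: round(v, 3) for t, v in ratings.items()})  # {'A': 0.7, 'B': 0.5, 'C': 0.3}
```

Colley ratings always average 1/2, and sorting teams by rating gives the ranking; the same machinery works for any set of pairwise "wins", such as one product being preferred over another.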
... the list goes on and on.
Maybe others have more ideas to post here.
© 2013 AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC