Large number of data-driven, distribution-free confidence intervals

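A well-known result from extreme value theory says that for any set of independent observations x1, ..., xn from the same continuous distribution, if you draw a new observation x* from the same population, then P[x* < min(x1, ..., xn)] = P[x* > max(x1, ..., xn)] = 1/(n+1). Equivalently, x* falls between the sample minimum and maximum with probability (n-1)/(n+1), whatever the underlying distribution.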
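A quick way to convince yourself of the 1/(n+1) figure is a small simulation. The sketch below is only an illustration: the function name `tail_frequencies`, the exponential distribution, and the choice n = 9 are my own assumptions, and any continuous distribution should give the same frequencies.

```python
import random

def tail_frequencies(n, trials=50_000, seed=0):
    """Estimate P[x* < min(sample)] and P[x* > max(sample)] by simulation.

    Hypothetical helper: draws n i.i.d. exponential values plus one new
    value per trial, and counts how often the new value is a new extreme.
    """
    rng = random.Random(seed)
    below = above = 0
    for _ in range(trials):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        new = rng.expovariate(1.0)  # the new observation x*
        if new < min(sample):
            below += 1
        elif new > max(sample):
            above += 1
    return below / trials, above / trials

n = 9
below, above = tail_frequencies(n)
# Both frequencies should be close to the theoretical value 1/(n+1).
print(below, above, 1 / (n + 1))
```

With n = 9 the theoretical value is 0.1 on each side, so the range [min, max] should cover a new observation about 80% of the time.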

Has anyone used this result (or generalizations of it) to build a large number of confidence intervals, e.g. when you create a rule system with a large number of attribute combinations and want to assess the statistical significance of some parameter associated with each combination (such as the probability of paying back a loan if you are male, over 60, and a renter in NYC)?

To put it in a different perspective: you want to compute very conservative confidence intervals, very fast, with a simple methodology, and build millions of them. People familiar with scoring large data sets (e.g. credit card transactions) might have an answer to this question.
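To make the idea concrete, here is a rough sketch of how such intervals might be built in a single pass over the data. Everything in it is an assumption for illustration: the function name `range_intervals`, the `(key, value)` record layout, and the sample figures are hypothetical. Strictly speaking, the [min, max] range is an interval for a new observation drawn from the same group, with coverage (n-1)/(n+1), rather than for a population parameter, and it assumes the values within each group are i.i.d. from a continuous distribution.

```python
from collections import defaultdict

def range_intervals(records, min_size=2):
    """Return {key: (low, high, coverage)} for each attribute combination.

    records  : iterable of (key, value) pairs, where key identifies an
               attribute combination (hypothetical layout).
    coverage : (n - 1) / (n + 1), the chance that a new observation from
               the same group falls inside [low, high].
    Only a running min, max and count are kept per group, so memory stays
    constant per group even with millions of groups.
    """
    stats = defaultdict(lambda: [float("inf"), float("-inf"), 0])
    for key, value in records:
        s = stats[key]
        if value < s[0]:
            s[0] = value
        if value > s[1]:
            s[1] = value
        s[2] += 1
    return {
        key: (low, high, (n - 1) / (n + 1))
        for key, (low, high, n) in stats.items()
        if n >= min_size  # groups that are too small give no usable interval
    }

# Hypothetical repayment rates observed per attribute combination.
records = [
    (("male", "60+", "renter"), 0.82),
    (("male", "60+", "renter"), 0.91),
    (("male", "60+", "renter"), 0.87),
    (("female", "30-39", "owner"), 0.95),
]
intervals = range_intervals(records)
print(intervals)
```

A natural generalization, if the full range is too wide, is to use inner order statistics instead of the extremes: the interval between the i-th and j-th smallest values covers a new observation with probability (j-i)/(n+1), at the cost of keeping more values per group.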

Tags: analysis of variance, attributes, confidence intervals, cross validation, extreme value theory, false positives, features, scores, scoring


© 2013   AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC
