A well-known result from extreme value theory says that for any set of independent observations x1, ..., xn from the same continuous distribution, if you draw a new observation x* from the same population, then P[x* < min(x1, ..., xn)] = P[x* > max(x1, ..., xn)] = 1/(n+1).
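For anyone who wants to check the 1/(n+1) claim numerically, here is a minimal simulation sketch. The function name `tail_frequencies`, the choice of a standard normal distribution, and the values n = 9 and 50,000 trials are my own assumptions for illustration; the result itself should not depend on which continuous distribution is used.

```python
import random

def tail_frequencies(n, trials=50_000, seed=0):
    """Estimate P[x* < min] and P[x* > max] for a sample of size n.

    Assumption: independent draws from a standard normal distribution.
    """
    rng = random.Random(seed)
    below = 0
    above = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        x_new = rng.gauss(0.0, 1.0)  # the new observation x*
        if x_new < min(sample):
            below += 1
        elif x_new > max(sample):
            above += 1
    return below / trials, above / trials

# With n = 9, both frequencies should come out close to 1/(9+1) = 0.1.
below, above = tail_frequencies(n=9)
print(below, above)
```

The two printed frequencies are Monte Carlo estimates, so they will only match 0.1 up to sampling noise.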
Has anyone used this result (or generalizations of it) to build a large number of confidence intervals? One use case is a rule system with a large number of attribute combinations, where you want to assess the statistical significance of some parameter associated with each attribute combination (such as the probability of paying back a loan if you are male, over 60, and a renter in NYC).
To put it in a different perspective: you want to compute very conservative confidence intervals very fast, with a simple methodology, and build millions of them. People familiar with scoring large data sets (e.g. credit card transactions) might have an answer to this question.
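To make the idea concrete, here is one possible reading of it as a sketch, not an established method. For each attribute combination, a single pass over the data keeps the minimum, maximum, and count n of a continuous score; by the result above, a new observation from the same group falls inside [min, max] with probability 1 - 2/(n+1) = (n-1)/(n+1). Strictly speaking this is an interval for a new observation rather than for a parameter, and it assumes a continuous score (a binary outcome such as paid/not paid would produce ties). The function name `minmax_intervals` and the sample records are hypothetical.

```python
def minmax_intervals(records):
    """Build [min, max] intervals per attribute combination in one pass.

    records: iterable of (key, value) pairs, where key is a tuple of
    attribute values and value is a continuous score.
    Returns {key: (low, high, coverage)} with coverage = (n-1)/(n+1).
    """
    stats = {}  # key -> [min, max, count]
    for key, value in records:
        if key not in stats:
            stats[key] = [value, value, 1]
        else:
            s = stats[key]
            if value < s[0]:
                s[0] = value
            if value > s[1]:
                s[1] = value
            s[2] += 1
    return {
        key: (low, high, (n - 1) / (n + 1))
        for key, (low, high, n) in stats.items()
    }

# Hypothetical records: (attribute combination, score).
records = [
    (("male", "60+", "renter"), 0.62),
    (("male", "60+", "renter"), 0.71),
    (("male", "60+", "renter"), 0.55),
    (("female", "30-39", "owner"), 0.80),
    (("female", "30-39", "owner"), 0.90),
]
intervals = minmax_intervals(records)
for key, (low, high, coverage) in intervals.items():
    print(key, low, high, coverage)
```

Because each group only needs three numbers and one comparison per record, this scales to millions of attribute combinations, though groups with few observations get very low coverage (for n = 3 it is only 0.5).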
Tags: analysis of variance, attributes, confidence intervals, cross validation, extreme value theory, false positives, features, scores, scoring