In my experience, datasets are resources, not simply subjects of summary. Also, summaries are goal-oriented, at least in terms of the measures used to assess how lossy a summary is.
I have used repeated subsampling to assess stability of a conclusion t…
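For anyone who wants to try the repeated-subsampling idea, here is a minimal sketch. It assumes the "conclusion" can be boiled down to a single numeric statistic. The function name `subsample_stability` and the parameters `frac` and `n_rounds` are made up for illustration, not taken from any particular library.

```python
import random
import statistics

def subsample_stability(data, statistic, frac=0.5, n_rounds=200, seed=0):
    """Recompute `statistic` on repeated random subsamples (without replacement).

    Returns the mean and standard deviation of the statistic across rounds.
    A small spread relative to the mean suggests the conclusion is stable.
    `frac` and `n_rounds` are illustrative defaults, not recommendations.
    """
    rng = random.Random(seed)              # fixed seed so runs are repeatable
    k = max(1, int(len(data) * frac))      # size of each subsample
    estimates = [statistic(rng.sample(data, k)) for _ in range(n_rounds)]
    return statistics.mean(estimates), statistics.stdev(estimates)

if __name__ == "__main__":
    values = list(range(1000))
    center, spread = subsample_stability(values, statistics.mean)
    print(center, spread)
```

If the spread is large compared with the effect you care about, the conclusion probably depends on which records happened to be included.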
Sure, and by coincidence I just discovered a neat conceptual thread for this. The theory of the birthday problem can be cast into a form suitable for counting populations, the so-called Schnabel census of population biology. That has been gen…
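For reference, a sketch of the textbook Schnabel estimator, N = sum(C_t * M_t) / sum(R_t), where C_t is the catch in sample t, M_t is the number of marked individuals at large just before sample t, and R_t is the number of recaptures in sample t. Recaptures play the role of "collisions" in the birthday problem. This is only the basic estimator, assuming a closed population in which every unmarked individual caught is marked and released. It is not the generalization mentioned above, and the function name is hypothetical.

```python
def schnabel_estimate(catches, recaptures):
    """Basic Schnabel population estimate.

    catches[t]    -- number of individuals caught in sample t (C_t)
    recaptures[t] -- how many of those were already marked (R_t)

    Assumes a closed population and that every unmarked individual
    caught is marked before release.
    """
    marked = 0          # M_t: marked individuals at large before sample t
    numerator = 0       # running sum of C_t * M_t
    total_recaps = 0    # running sum of R_t
    for c, r in zip(catches, recaptures):
        numerator += c * marked
        total_recaps += r
        marked += c - r  # newly marked individuals join the marked pool
    if total_recaps == 0:
        raise ValueError("no recaptures, so the estimate is undefined")
    return numerator / total_recaps

if __name__ == "__main__":
    # Two samples reduce to the Lincoln-Petersen estimate: 50 * 40 / 10
    print(schnabel_estimate([50, 40], [0, 10]))
```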
My blog is at www.ekzept.net and it is sometimes devoted to numerical and statistical subjects, and discussions of data. It also dabbles in politics and other matters.
Anyone looked at qualitative techniques for identifying "social groups" behind Web proxies? Anyone looked at using spatial correlation from position traces of wireless mobile devices for the same?
Joint interest of users of mobile wireless devices can sometimes be inferred by observing common visits to the same locales. With the advent of time series of positions for such devices, inferences of common purpose by correlated tracks are also pos…
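As a rough illustration of the "common visits" idea, here is a sketch that scores pairs of devices by the overlap of the (locale, time bucket) cells they were seen in, using Jaccard similarity. The data layout, the device and locale names, and the choice of Jaccard are all assumptions for the example. A real analysis of position traces would need proper spatial and temporal correlation rather than exact cell matches.

```python
from itertools import combinations

def covisit_scores(visits):
    """Score every pair of devices by shared (locale, time_bucket) visits.

    visits -- dict mapping a device id to a set of (locale, time_bucket)
              tuples (a hypothetical, pre-binned form of the traces)

    Returns a dict mapping (device_a, device_b) to the Jaccard similarity
    of their visit sets, from 0.0 (nothing shared) to 1.0 (identical).
    """
    scores = {}
    for a, b in combinations(sorted(visits), 2):
        union = visits[a] | visits[b]
        if union:  # skip pairs with no recorded visits at all
            scores[(a, b)] = len(visits[a] & visits[b]) / len(union)
    return scores

if __name__ == "__main__":
    traces = {
        "d1": {("cafe", 9), ("gym", 18)},
        "d2": {("cafe", 9), ("gym", 18), ("office", 10)},
        "d3": {("park", 12)},
    }
    for pair, score in covisit_scores(traces).items():
        print(pair, round(score, 3))
```

High-scoring pairs are only candidates for a shared purpose. Busy locales will produce many coincidental overlaps, so some baseline for chance co-visits is needed before reading much into the scores.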
When these kinds of datasets are attempted, what do you do? Do you put data into huge relational databases, accepting the startup cost of designing a good schema? Do you work with flat files and UNIX sort? Do you sample from your dataset of 20 billi…
Well, you can do categorical regression. Gelman and Hill have a good, recent book that's more practically minded than Agresti or Christensen.
You might also look at latent space analysis, as is done in recommender systems work.