Hm ...
If I understand this chart correctly, then the lift is calculated for each decile. Could it be that your ranking of scores is so good that most positive class items have indeed been ranked to the top, so that fewer positive class items are av…
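To make the per-decile reading concrete, here is a minimal sketch in Python. It assumes the chart shows non-cumulative lift, i.e. the positive rate inside each decile divided by the overall positive rate. The function name `lift_per_decile` and the toy data are illustrative only and are not taken from the chart under discussion.

```python
def lift_per_decile(scores, labels, n_bins=10):
    """Return (per-bin lift, cumulative lift) for items ranked by score.

    Assumes labels are 0/1, at least one label is 1, and there are
    at least n_bins items, so that no bin is empty.
    """
    # Rank items from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [labels[i] for i in order]
    n = len(ranked)
    base_rate = sum(ranked) / n  # overall share of positives

    per_bin, cumulative = [], []
    positives_so_far = 0
    for b in range(n_bins):
        start = b * n // n_bins
        end = (b + 1) * n // n_bins
        chunk = ranked[start:end]
        positives_so_far += sum(chunk)
        # Lift inside this bin only.
        per_bin.append((sum(chunk) / len(chunk)) / base_rate)
        # Lift over everything ranked so far.
        cumulative.append((positives_so_far / end) / base_rate)
    return per_bin, cumulative


# Toy data: 100 items, 10 positives, all ranked at the very top.
scores = list(range(100, 0, -1))
labels = [1] * 10 + [0] * 90
per_bin, cumulative = lift_per_decile(scores, labels)
print(per_bin)
print(cumulative)
```

With a perfect ranking like this toy one, all positives fall into the first decile, so the per-decile lift is high there and drops to zero afterwards, while the cumulative lift decays smoothly towards 1.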
@Patrick:
Ouch, that implication hurts. As a technical person, I of course put function over form. But as we know, the world of recruiters and references, which allows one to express the worst things in elegant words, is completely different. My i…
Please take no offense, I just want to help:
Your website looks like pure Web 1.0, which (I am not exaggerating) can be created today by high-school kids with a basic HTML book. The worst thing is the color usage. I suggest using one of the free a…
I agree, too. But one thing keeps bothering me: At first glance it is correct that the response rate should be low so that the noise caused by treating response as no-response is small. But:
Given an arbitrary response rate, a set of responses and…
Just a remark:
For the sake of Occam's razor I would try a k-nearest-neighbor approach (ignoring the labels, of course) first. This approach has been ennobled by collaborative filtering. Of course the performance will be worse compared to SOM and slow (if your…
I do not agree.
Of course, such small errors happen and yes, they are annoying. The question is: Can it be ensured (regarding your example) that if the code compiles after the semicolon was added, the semantics are exactly as the coder planned it…
I do not have much time currently, but I cannot hold back from remarking on this:
Kohonen is equal to k-Means if a) k = number of neurons used in the map and b) k-means is implemented in such a way that the center is adjusted instead of simply calculating the…
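As a rough illustration of condition b), here is a sketch of k-means with incremental center adjustment. The comment above is cut off, so this is only one reading of it: it assumes "the center is adjusted" means moving the winning center a small step towards each sample, which is what a Kohonen map does to its winning neuron when the neighborhood is shrunk to that neuron alone. The name `online_kmeans` and the parameters `learning_rate`, `epochs` and `seed` are hypothetical.

```python
import random


def online_kmeans(points, k, learning_rate=0.1, epochs=10, seed=0):
    """Online k-means sketch: nudge the nearest center towards each sample.

    Assumes 0 <= learning_rate <= 1 and k <= len(points).
    """
    rng = random.Random(seed)
    # Start from k distinct samples (an arbitrary initialization choice).
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(epochs):
        for p in points:
            # Winner: the center with the smallest squared distance to p.
            winner = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(centers[c], p)),
            )
            # Adjust only the winner, as a SOM does with neighborhood size zero.
            centers[winner] = [
                c + learning_rate * (a - c) for c, a in zip(centers[winner], p)
            ]
    return centers


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(online_kmeans(points, k=2))
```

Batch k-means would instead recompute each center as the mean of its assigned points after a full pass; the sketch replaces that step with the per-sample adjustment.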
I am a little confused: Do you already have a predictor variable or not?
A short remark:
I did not know that Kohonen Clustering is directly applicable to discrete data. I guess it works for the same reason as in the case of Logistic Regression: Tr…
Data mining is a huge mystery to almost everyone who isn't a practitioner.
I definitely agree. Even practitioners are surprised from time to time by black-box effects, which cannot be understood because e.g. the model allows only restricted ins…
Here is a paper which helped me to get a better feeling for the challenges in this area:
http://www.sigkdd.org/explorations/issues/4-2-2002-12/lo.pdf
Note: I studied this paper a year ago and did not do any research since then. There may be some "…
Hm. In my opinion this statement is too general to be true.
I recently used Logistic Regression in combination with genetic attribute construction and selection. It indeed improved the results significantly (yes, variables were removed).
Despite t…
Thanks both of you for your remarks.
@Ralph: Yes, source control is essential. Tag the code and (please) accept the policy not to change tagged code and (*gasp*) save it back to the tag. I am still wondering why some systems (at least Subversion)…
Pretty funny picture, I hope it's not your office!
I have lots of tricks, but lists are death for me. I spend lots of time making them, and then never refer to them again. Whiteboards and chalkboards work (somewhat) for me, but paper and electronic…
Just like any software project, it is best to "version" both your code and the data you are using. The code should be no problem; you can always maintain multiple versions of the code. It can get tricky if you are maintaining multiple…
I recently finished my Diplom in Computer Science and Mathematics and am now looking for an area of Data Mining to specialize in.
I support and believe in Open Source Tools for Data Analysis. Currently I am using R and RapidMiner, but I am always looking for new languages and technologies to keep improving my processes.
Thanks a lot, Steffen. That was really helpful on node impurity etc.
At 12:00am on November 2, 2009, DeeptiSaxena said…
Hi,
Nice to hear from you.
Yeah! This site definitely gives you a feeling that there is a bunch of mad heads (for numbers) and few want to hire them as well. ;P
I am a post grad in stats and currently working on web analytics, SEM and statistical modelling.