Using design of experiments techniques and statistical data mining, I identified a botnet generating more than 10 million dollars in click fraud revenue annually. I worked out the mechanism the botnet used to generate click fraud. I also found that, with very little additional effort and intelligence, its operators could potentially have generated 100 million dollars in fraud revenue while being considerably more difficult to detect.
The botnet, a low-frequency botnet hitting advertisers no more than twice per day to avoid detection, was initially discovered in a small dataset used for testing purposes. The dataset in question was part of a design of experiments.
Outlier detection techniques applied to millions of multivariate and compound metrics found that the click-to-conversion ratio was consistently outside a very conservative confidence interval. This triggered an investigation in which many other abnormalities were found:
very low variance
very short visits
inability to generate fake user agents due to the technology used by the botnet
targeting 50% of all advertisers, working with keyword lists
relatively good IP distribution, with some government IPs over-represented
poor user agent distribution
triggering CSS, JS, GIF and JPEG HTTP requests (behaving a little bit like a real human)
generating a very large volume of bogus conversions
erroneously identified by Alexa and other web analytics companies as real users
associated with the largest search engine
discovered before Google engineers found it (I am not sure if they ever found it)
clicks generated by the botnet were charged as good clicks
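The kind of check that first exposed the botnet, a conversion ratio falling outside a very conservative confidence interval, can be sketched roughly as follows. This is only an illustration, not the method actually used: the normal-approximation interval, the z = 4 cutoff, the baseline rate, the function names, and the segment data are all assumptions made up for the example.

```python
import math

def conservative_interval(baseline_rate, n_clicks, z=4.0):
    """Interval for the conversion rate expected from n_clicks clicks.

    Uses a normal approximation to the binomial; z=4 is an assumed,
    deliberately wide ("very conservative") cutoff.
    """
    se = math.sqrt(baseline_rate * (1.0 - baseline_rate) / n_clicks)
    low = max(0.0, baseline_rate - z * se)
    high = min(1.0, baseline_rate + z * se)
    return low, high

def flag_outliers(segments, baseline_rate, z=4.0):
    """Return the segments whose conversion rate lies outside the interval.

    segments: iterable of (name, clicks, conversions) tuples.
    """
    flagged = []
    for name, clicks, conversions in segments:
        if clicks == 0:
            continue  # no traffic, nothing to test
        rate = conversions / clicks
        low, high = conservative_interval(baseline_rate, clicks, z)
        if rate < low or rate > high:
            flagged.append((name, rate, low, high))
    return flagged

# Hypothetical traffic segments: (name, clicks, conversions).
segments = [
    ("segment_a", 10000, 210),
    ("segment_b", 10000, 195),
    ("segment_c", 10000, 900),  # suspiciously many conversions
]

flagged = flag_outliers(segments, baseline_rate=0.02)
for name, rate, low, high in flagged:
    print(f"{name}: rate={rate:.4f} outside [{low:.4f}, {high:.4f}]")
```

In practice the same test would be repeated across many metrics and time windows, since the signal here was that the ratio was consistently out of range rather than a one-off spike.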
After understanding the mechanisms at play, I was able to identify additional botnets associated with other search engines.
Tags: botnet, fraud detection, web analytics, web mining