Subscribe to Vincent Granville's Weekly Digest:

10+ Great Metrics and Strategies for Fraud Detection

The emphasis here is on web log data. An alarm should fire only when more than one rule is triggered. You may use a system such as hidden decision trees to assign a specific weight to each rule.
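Below is a minimal sketch of the "more than one rule" requirement, in Python. It is not hidden decision trees. It simply sums hand-assigned weights for the rules that fire, and it raises an alarm only when at least two rules fire and the total reaches a threshold. The rule names, weights and threshold are hypothetical.

```python
from typing import Callable, Dict

# A rule looks at one event (a dict of fields) and returns True if it fires.
Rule = Callable[[dict], bool]


def fraud_alarm(event: dict, rules: Dict[str, Rule], weights: Dict[str, float],
                min_rules: int = 2, threshold: float = 1.0) -> bool:
    """Fire only if at least `min_rules` rules trigger AND the summed
    weight of the triggered rules reaches `threshold`."""
    fired = [name for name, rule in rules.items() if rule(event)]
    score = sum(weights.get(name, 0.0) for name in fired)
    return len(fired) >= min_rules and score >= threshold


# Hypothetical rules and hand-picked weights, for illustration only.
RULES = {
    "short_session": lambda e: e["session_seconds"] < 15,
    "unknown_referral": lambda e: e["referral"] not in {"google.com", "facebook.com", "yahoo.com"},
    "edu_proxy": lambda e: e["ip_class"] == "edu_proxy",
}
WEIGHTS = {"short_session": 0.4, "unknown_referral": 0.5, "edu_proxy": 0.6}

event = {"session_seconds": 6, "referral": "xxx-and-yyy.com", "ip_class": "static"}
# Two rules fire, but 0.4 + 0.5 = 0.9 is below the 1.0 threshold.
print(fraud_alarm(event, RULES, WEIGHTS))  # False
```

A learned model would replace the hand-set weights. The two-rule minimum still stops a single heavily weighted rule from firing on its own.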

  1. Monte Carlo simulations to detect extreme events. Example: a large cluster of non-proxy IP addresses that each have exactly 8 clicks per day, day after day. What is the chance of this happening naturally?
  2. IP address or referral domain belongs to a particular type of blacklist or whitelist. Classify the space of IP addresses into major clusters: static IP, anonymous proxy, corporate proxy (whitelisted), edu proxy (high risk), highly recycled IP (higher risk), etc.
  3. Referral domain statistics: time to load with variance (based on 3 measurements), page size with variance (based on 3 measurements), and text strings found on the web page (either in the HTML or the JavaScript code). Create a list of suspicious terms (viagra, online casino, etc.). Create a list of suspicious JavaScript tags or code, but use a whitelist of referral domains (e.g. top publishers) to eliminate false positives.
  4. Analyze domain name patterns. Example: a cluster of domain names with identical fraud scores, all of the form xxx-and-yyy.com, whose web pages all have the same size (1 char).
  5. Association analysis: buckets of traffic with a huge proportion (>30%) of very short (<15 seconds) sessions that have two or more unknown referrals (that is, referrals other than Facebook, Google, Yahoo or a top 500 domain). Aggregate all these mysterious referrals across these sessions: chances are that they are all part of the same botnet scheme (used e.g. for click fraud).
  6. Mismatch in credit card fields: the phone number is in one country, the email or IP address comes from a proxy domain owned by someone located in another country, the physical address is in yet another state, the name (e.g. Amy) and email address (e.g. joy431232@hotmail.com) look very different, and a Google search on the email address reveals previous scams operated from the same account, or nothing at all.
  7. Referral web page or search keyword attached to a paid click contains gibberish or text strings made of letters that are very close on the keyboard, such as fgdfrffrft. 
  8. Email address contains digits other than an area code, a year (e.g. 73) or a zip code (except if from someone in India or China)
  9. Time to 1st transaction after sign-up is very short
  10. Abnormal purchase pattern (Sunday at 2am, buying the most expensive product on your e-store, from an IP outside the US, on a B2B e-store targeted at US clients)
  11. Same small popular dollar amount (e.g. $9.99) across multiple merchants with same merchant category, with one or two transactions per cardholder
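Item 1 above can be sketched as a Monte Carlo check. The sketch assumes that each IP's daily click count is an independent Poisson variable with mean 8. That model is our assumption, not the author's. Simulation works for short streaks. For long streaks across many IPs, the event is so rare that simulation will never produce it. In that case the closed-form probability is more practical, and it is computed in log space to avoid underflow.

```python
import math
import random

# Assumption: daily clicks per IP ~ Poisson(lam), independent across days and IPs.


def poisson_sample(rng: random.Random, lam: float) -> int:
    """Knuth's multiplication method; adequate for small means such as 8."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_streak_rate(n_ips: int, n_days: int, clicks: int = 8,
                         lam: float = 8.0, seed: int = 0) -> float:
    """Share of simulated IPs with exactly `clicks` clicks on every day."""
    rng = random.Random(seed)
    hits = sum(
        all(poisson_sample(rng, lam) == clicks for _ in range(n_days))
        for _ in range(n_ips)
    )
    return hits / n_ips


def log10_cluster_probability(n_ips: int, n_days: int, clicks: int = 8,
                              lam: float = 8.0) -> float:
    """log10 of the chance that n_ips independent IPs all show the streak."""
    p_day = math.exp(-lam) * lam ** clicks / math.factorial(clicks)
    return n_ips * n_days * math.log10(p_day)


# Short horizon: simulation and closed form should roughly agree.
print(simulate_streak_rate(20_000, 2, seed=1))
print(10 ** log10_cluster_probability(1, 2))
# 50 IPs for 30 days: far too rare to ever observe by simulation.
print(log10_cluster_probability(50, 30))
```

A hugely negative log10 probability for the observed cluster says it is very unlikely to be natural under this model. A production test would use the empirical click distribution rather than a Poisson guess.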
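For item 2, here is a sketch of range-based IP classification using Python's standard `ipaddress` module. The ranges are RFC 5737 documentation blocks that stand in for real lists. The cluster labels and risk levels are hypothetical. Detecting highly recycled IPs needs behavioral data (e.g. many distinct users per IP), so it is not shown.

```python
import ipaddress
from typing import Tuple

# Hypothetical reference ranges (RFC 5737 documentation blocks), standing in
# for real blacklists/whitelists or a commercial IP-intelligence feed.
IP_CLUSTERS = [
    ("corporate_proxy", "whitelisted", ["192.0.2.0/24"]),
    ("edu_proxy", "high", ["198.51.100.0/25"]),
    ("anonymous_proxy", "high", ["203.0.113.0/24"]),
]

NETWORKS = [
    (label, risk, ipaddress.ip_network(cidr))
    for label, risk, cidrs in IP_CLUSTERS
    for cidr in cidrs
]


def classify_ip(ip: str) -> Tuple[str, str]:
    """Return (cluster, risk) for the first range containing `ip`."""
    addr = ipaddress.ip_address(ip)
    for label, risk, net in NETWORKS:
        if addr in net:
            return label, risk
    return "unclassified", "unknown"


print(classify_ip("198.51.100.7"))    # ('edu_proxy', 'high')
print(classify_ip("198.51.100.200"))  # ('unclassified', 'unknown'): outside the /25
```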
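For item 3, the sketch below summarizes repeated fetches of a referral page. It computes the mean and sample variance of load time and page size, then scans the page source for suspicious terms and JavaScript. Fetching the page is omitted. The JavaScript pattern and the whitelist entry are hypothetical. In this sketch the whitelist suppresses both content checks.

```python
import re
import statistics

SUSPICIOUS_TERMS = ["viagra", "online casino"]              # examples from the rule
SUSPICIOUS_JS = [re.compile(r"eval\s*\(\s*unescape\s*\(")]  # hypothetical pattern
WHITELIST = {"example.org"}                                 # hypothetical top publishers


def referral_profile(domain, load_times, page_sizes, page_source):
    """Summarize repeated fetches (e.g. 3) of one referral domain's page."""
    text = page_source.lower()
    whitelisted = domain in WHITELIST
    return {
        "domain": domain,
        "load_mean": statistics.mean(load_times),
        "load_var": statistics.variance(load_times),   # sample variance
        "size_mean": statistics.mean(page_sizes),
        "size_var": statistics.variance(page_sizes),
        # Whitelisted domains skip the content checks to avoid false positives.
        "suspicious_terms": [] if whitelisted else [t for t in SUSPICIOUS_TERMS if t in text],
        "suspicious_js": False if whitelisted else any(p.search(text) for p in SUSPICIOUS_JS),
    }


print(referral_profile("xxx-and-yyy.com", [1.2, 0.9, 1.5], [1, 1, 1],
                       "<script>eval(unescape('%61'))</script> cheap VIAGRA"))
```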
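For item 4, a sketch that groups domains matching the xxx-and-yyy.com shape by identical (fraud score, page size). The minimum cluster size of 3 is an assumption. The fraud scores are assumed to come from an upstream scoring step, and the domain names in the usage example are made up.

```python
import re
from collections import defaultdict

# Matches the "xxx-and-yyy.com" shape from the example above.
AND_PATTERN = re.compile(r"^[a-z0-9]+-and-[a-z0-9]+\.com$")


def pattern_clusters(records, min_size=3):
    """records: (domain, fraud_score, page_size_chars) tuples.

    Group pattern-matching domains by identical (score, page size) and keep
    groups with at least `min_size` members (an assumed cutoff)."""
    groups = defaultdict(list)
    for domain, score, size in records:
        if AND_PATTERN.match(domain.lower()):
            groups[(score, size)].append(domain)
    return {key: doms for key, doms in groups.items() if len(doms) >= min_size}


records = [
    ("cats-and-dogs.com", 0.87, 1),
    ("salt-and-pepper.com", 0.87, 1),
    ("rock-and-roll.com", 0.87, 1),
    ("bread-and-butter.com", 0.12, 5400),
    ("example.com", 0.87, 1),
]
print(pattern_clusters(records))
# {(0.87, 1): ['cats-and-dogs.com', 'salt-and-pepper.com', 'rock-and-roll.com']}
```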
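For item 5, a sketch of the association analysis using the thresholds from the text: more than 30% of sessions, under 15 seconds, and two or more unknown referrals. How traffic is bucketed (by affiliate, keyword, hour, etc.) is left open. The known-referral set only lists the three named sites, and a top-500 list would be added in practice.

```python
from collections import Counter, defaultdict

# Facebook, Google, Yahoo; extend with your own top-500 domain list.
KNOWN_REFERRALS = {"facebook.com", "google.com", "yahoo.com"}


def botnet_candidates(sessions, short_secs=15, min_unknown=2, min_share=0.30):
    """sessions: dicts with 'bucket', 'duration' (seconds) and 'referrals'.

    Returns {bucket: Counter of unknown referrals} for buckets where more than
    `min_share` of sessions are short and carry >= `min_unknown` unknown referrals."""
    by_bucket = defaultdict(list)
    for s in sessions:
        by_bucket[s["bucket"]].append(s)

    flagged = {}
    for bucket, items in by_bucket.items():
        hits = []
        for s in items:
            unknown = {r for r in s["referrals"] if r not in KNOWN_REFERRALS}
            if s["duration"] < short_secs and len(unknown) >= min_unknown:
                hits.append(unknown)
        if len(hits) / len(items) > min_share:
            # Aggregate the mysterious referrals across the flagged sessions.
            flagged[bucket] = Counter(r for unknown in hits for r in unknown)
    return flagged


sessions = [
    {"bucket": "A", "duration": 5, "referrals": ["a.biz", "b.biz", "google.com"]},
    {"bucket": "A", "duration": 8, "referrals": ["a.biz", "c.biz"]},
    {"bucket": "A", "duration": 120, "referrals": ["google.com"]},
    {"bucket": "B", "duration": 3, "referrals": ["d.biz", "e.biz"]},
    {"bucket": "B", "duration": 60, "referrals": ["google.com"]},
    {"bucket": "B", "duration": 60, "referrals": ["yahoo.com"]},
    {"bucket": "B", "duration": 60, "referrals": ["facebook.com"]},
]
print(botnet_candidates(sessions))
```

Bucket B is not flagged: only 1 of its 4 sessions (25%) qualifies.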
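For item 6, a sketch of two of the mismatch checks. The first flags disagreement between the countries of the phone, IP and billing fields. The second is a crude test of whether any part of the cardholder's name appears in the email's local part. The field names are hypothetical. The state-level address check and the Google search on the email are not automated here.

```python
import re


def card_field_mismatches(txn):
    """Return a list of mismatch flags for one card transaction (dict)."""
    flags = []
    countries = {txn["phone_country"], txn["ip_country"], txn["billing_country"]}
    if len(countries) > 1:
        flags.append(f"{len(countries)} different countries")
    # Crude name/email consistency check: does any name token appear in the
    # letters of the email's local part?
    local_letters = re.sub(r"[^a-z]", "", txn["email"].split("@")[0].lower())
    if not any(part.lower() in local_letters for part in txn["name"].split()):
        flags.append("email unrelated to name")
    return flags


txn = {"name": "Amy", "email": "joy431232@hotmail.com",
       "phone_country": "US", "ip_country": "RO", "billing_country": "GB"}
print(card_field_mismatches(txn))  # ['3 different countries', 'email unrelated to name']
```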
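For item 7, one way to score keyboard gibberish is to place QWERTY keys on a grid with an approximate row stagger. The score is then the share of consecutive letters that are the same key or neighboring keys. The row offsets and the 1.3-key distance cutoff are assumptions. They were chosen only so that, for example, f counts as a neighbor of d, g, r, t, c and v but not of e or b.

```python
import math

ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
OFFSETS = [0.0, 0.25, 0.75]  # assumed horizontal stagger of each row, in key widths
POS = {ch: (r, c + OFFSETS[r]) for r, row in enumerate(ROWS) for c, ch in enumerate(row)}


def keyboard_closeness(text, max_dist=1.3):
    """Share of consecutive letter pairs that are the same or neighboring keys.
    Non-letters are dropped before pairing."""
    letters = [ch for ch in text.lower() if ch in POS]
    if len(letters) < 2:
        return 0.0
    close = 0
    for a, b in zip(letters, letters[1:]):
        (ra, xa), (rb, xb) = POS[a], POS[b]
        if math.hypot(ra - rb, xa - xb) <= max_dist:
            close += 1
    return close / (len(letters) - 1)


print(round(keyboard_closeness("fgdfrffrft"), 2))  # 0.89
print(round(keyboard_closeness("fraudulent"), 2))  # 0.11
```

A flagging cutoff such as 0.6 would be a guess to calibrate on real paid-click keywords.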
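For item 8, a heuristic sketch that treats some digit runs as plausible: length 2 (a year such as 73), length 3 (an area code), length 5 (a US zip code), and length 4 starting with 19 or 20 (a full year, which is our addition). Any other run is flagged. Matching digits by length alone is a simplification. The country codes for the India/China exception are assumed to be ISO 3166 alpha-2.

```python
import re

EXEMPT_COUNTRIES = {"IN", "CN"}  # India, China (exception stated in the rule)


def suspicious_email_digits(email, country=None):
    """True if the local part holds a digit run that doesn't look like a
    year, area code or zip code. Length-based matching is a simplification."""
    if country in EXEMPT_COUNTRIES:
        return False
    local = email.split("@")[0]
    for run in re.findall(r"\d+", local):
        plausible = (
            len(run) == 2                                          # year, e.g. 73
            or len(run) == 3                                       # area code
            or len(run) == 5                                       # US zip code
            or (len(run) == 4 and run.startswith(("19", "20")))   # full year (our addition)
        )
        if not plausible:
            return True
    return False


print(suspicious_email_digits("joy431232@hotmail.com"))  # True
print(suspicious_email_digits("amy73@example.com"))      # False
```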
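For item 11, a sketch that groups small transactions by (merchant category code, exact amount). It flags an amount when it appears at several merchants and no card appears more than twice. Amounts are in integer cents to avoid float equality issues. The $20 ceiling, the three-merchant minimum and the field names are assumptions.

```python
from collections import Counter, defaultdict


def repeated_small_amounts(txns, max_cents=2000, min_merchants=3, max_per_card=2):
    """txns: dicts with 'card', 'merchant', 'mcc' and 'amount_cents'.

    Flags (mcc, amount) pairs seen at >= `min_merchants` merchants in the same
    category where no card has more than `max_per_card` such transactions."""
    groups = defaultdict(list)
    for t in txns:
        if t["amount_cents"] <= max_cents:
            groups[(t["mcc"], t["amount_cents"])].append(t)

    flagged = {}
    for key, items in groups.items():
        merchants = {t["merchant"] for t in items}
        per_card = Counter(t["card"] for t in items)
        if len(merchants) >= min_merchants and max(per_card.values()) <= max_per_card:
            flagged[key] = sorted(merchants)
    return flagged


txns = [
    {"card": "c1", "merchant": "m1", "mcc": "5999", "amount_cents": 999},
    {"card": "c2", "merchant": "m2", "mcc": "5999", "amount_cents": 999},
    {"card": "c3", "merchant": "m3", "mcc": "5999", "amount_cents": 999},
    {"card": "c1", "merchant": "m2", "mcc": "5999", "amount_cents": 999},
    {"card": "c4", "merchant": "m1", "mcc": "5999", "amount_cents": 1500},
]
print(repeated_small_amounts(txns))  # {('5999', 999): ['m1', 'm2', 'm3']}
```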


Comment by Amy on June 1, 2012 at 8:26am

Gregory: possibly a bigger issue is how frequently lookup tables are updated. For instance, how often do you update your blacklist of IP addresses? Better blacklists should have several fields:
- IP address or range
- date when flagged
- date when the flag should automatically clear
- reason code for flagging
- severity or score
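The blacklist record described in this comment maps naturally onto a small data structure. The Python sketch below assumes one entry per IP or CIDR range and a hypothetical 1-5 severity scale. Lookups ignore expired flags and return the most severe active match. The reason codes are made up.

```python
import ipaddress
from dataclasses import dataclass
from datetime import date
from typing import Optional, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


@dataclass
class BlacklistEntry:
    network: Network            # IP address or range (single IP = /32)
    flagged_on: date            # date when flagged
    clears_on: Optional[date]   # date when the flag clears; None = never
    reason_code: str            # reason code for flagging
    severity: int               # severity or score (hypothetical 1-5 scale)

    def active(self, today: date) -> bool:
        return self.flagged_on <= today and (self.clears_on is None or today < self.clears_on)


def lookup(entries, ip: str, today: date) -> Optional[BlacklistEntry]:
    """Return the most severe active entry covering `ip`, or None."""
    addr = ipaddress.ip_address(ip)
    hits = [e for e in entries if e.active(today) and addr in e.network]
    return max(hits, key=lambda e: e.severity, default=None)


entries = [
    BlacklistEntry(ipaddress.ip_network("203.0.113.0/24"), date(2012, 5, 1), date(2012, 6, 1), "CLICKBOT", 3),
    BlacklistEntry(ipaddress.ip_network("203.0.113.9/32"), date(2012, 5, 15), None, "CARD_FRAUD", 5),
]
print(lookup(entries, "203.0.113.10", date(2012, 6, 2)))  # None: the range flag has cleared
```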
Comment by Gregory Piatetsky-Shapiro on June 1, 2012 at 7:47am

Very useful list! I wonder how rapidly fraud patterns change. How long would these patterns hold?

Comment by Angela Waner on May 31, 2012 at 9:21am

When I read the title of this post, I thought you would be talking about insurance fraud. I have been working on projects related to insurance companies recently, which is why I jumped to this conclusion.

I have found number 6 to be highly useful for website fraud detection.

Comment by Amy on May 31, 2012 at 1:10am

If you analyze referral data, scrape all referral domains (good and bad) and create a data dictionary containing all the terms found across these domains' web pages, with a fraud score attached to each term.
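Here is a minimal sketch of this term dictionary, assuming the scraped pages already carry a good/bad label. Each term's score is a smoothed share of fraudulent pages among the pages that contain it. A new page is scored by the mean score of its known terms. The tokenizer (lowercase words of 3+ letters) and the smoothing prior are assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Distinct lowercase words of 3+ letters (an assumed tokenizer)."""
    return set(re.findall(r"[a-z]{3,}", text.lower()))


def build_term_dictionary(pages, prior=1.0):
    """pages: (page_text, is_fraud) pairs from scraped referral domains.
    Score = (fraud pages with term + prior/2) / (pages with term + prior)."""
    bad, total = Counter(), Counter()
    for text, is_fraud in pages:
        for term in tokenize(text):
            total[term] += 1
            if is_fraud:
                bad[term] += 1
    return {t: (bad[t] + prior * 0.5) / (total[t] + prior) for t in total}


def score_page(text, dictionary, default=0.5):
    """Mean score of the page's known terms; `default` if none are known."""
    scores = [dictionary[t] for t in tokenize(text) if t in dictionary]
    return sum(scores) / len(scores) if scores else default


pages = [("cheap viagra casino", True), ("online casino bonus", True),
         ("weather news today", False), ("sports news scores", False)]
terms = build_term_dictionary(pages)
print(score_page("viagra casino", terms), score_page("weather sports", terms))
```

The smoothing pulls terms seen on only a few pages toward a neutral 0.5, so one stray page cannot make a term look certain.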


© 2013   AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC
