
Seven questions about real time analytics

  1. What types of data structures are routinely used for in-memory, real-time transaction scoring? I've used circular doubly linked lists to store (say) the 20 most recent transactions, with time stamp and other attributes, per merchant / per customer.
  2. What kind of metrics work well in this context? Among many metrics, I've used time since the last transaction and time since the 5th previous transaction.
  3. Do you use a lot of rather small lookup tables that you can load into memory to store historical data, such as merchant summary statistics broken down per day for the last 3 months (one entry per merchant per day)?
  4. How do you optimize server performance? For instance, at 2am, when the volume of transactions is 5 times lower than at peak time, do you use the analytic servers for other tasks, such as end-of-day re-scoring?
  5. At peak time (severe peaks), do you use a simplified model that requires less memory, if you lack bandwidth?
  6. Has anybody used the Hadoop environment to feed into a true real-time processing system (that is, with no latency), such as credit card processing?
  7. For data science ROI to be positive, should advanced analytics / data science costs (in terms of people, extra hardware and software) represent less than 10% of the cost of general computer architecture (servers, engineers, basic data processing and reporting)? Is there a magic number, and if it is not 10%, what would it be?
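
To make questions 1 and 2 concrete, here is a minimal sketch in Python. It is not the circular doubly linked list described above: it swaps in the standard library's collections.deque with maxlen, which gives the same "keep only the N most recent" behavior. The class name, method names, and the (timestamp, amount) record layout are assumptions made for illustration only.

```python
from collections import defaultdict, deque

WINDOW = 20  # most recent transactions kept per key (the figure from question 1)


class RecentTransactions:
    """Fixed-size, per-key history of (timestamp, amount) pairs, newest last.

    Hypothetical helper: the key could be a merchant ID or a customer ID.
    """

    def __init__(self, window=WINDOW):
        # One bounded deque per key; a full deque drops its oldest entry
        # automatically when a new one is appended.
        self._history = defaultdict(lambda: deque(maxlen=window))

    def add(self, key, timestamp, amount):
        self._history[key].append((timestamp, amount))

    def time_since_kth_previous(self, key, now, k):
        """Time elapsed between `now` and the k-th most recent stored transaction.

        Returns None when fewer than k transactions are stored for the key.
        """
        history = self._history.get(key)
        if history is None or len(history) < k:
            return None
        return now - history[-k][0]


store = RecentTransactions()
for t in range(0, 250, 10):  # 25 transactions, one every 10 seconds
    store.add("merchant_42", t, 9.99)

last_gap = store.time_since_kth_previous("merchant_42", now=250, k=1)
fifth_gap = store.time_since_kth_previous("merchant_42", now=250, k=5)
```

With k=1 the metric is the time since the last transaction, and with k=5 it is the time since the 5th previous one. Memory per key stays bounded by the window size, which is the property that matters for in-memory scoring.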



Comment by Mike O'Neil on May 12, 2012 at 5:27pm

Re Q7: there is no magic number, only the simple arithmetic that for any ROI to be > 1, R just needs to be bigger than I (ROI = R/I).

Is this question really about how to attribute the contribution of data science, relative to the other contributions, when calculating I? What matters for that is establishing the marginal R on marginal I for each component contributing to total I. If the marginal return on marginal I is greater for data science inputs, then you would spend the extra on them; if it is not, then clearly the marginal investment should go to the other inputs.
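
The marginal comparison can be sketched in a few lines of Python. The function names and the dollar figures are made up purely for illustration; nothing here is data from the thread.

```python
def marginal_return(extra_return, extra_investment):
    """Return gained per additional unit of investment (marginal R / marginal I)."""
    if extra_investment <= 0:
        raise ValueError("extra_investment must be positive")
    return extra_return / extra_investment


def best_component(options):
    """Pick the component whose next increment of spend returns the most.

    `options` maps a component name to (extra_return, extra_investment).
    """
    return max(options, key=lambda name: marginal_return(*options[name]))


# Hypothetical figures: the next 10,000 spent on each component and the
# extra return it is expected to produce.
options = {
    "data science": (30_000, 10_000),
    "infrastructure": (20_000, 10_000),
}
choice = best_component(options)
```

Under these invented numbers the next increment would go to data science, since it returns 3 per unit spent against 2 for infrastructure. The decision rests on the marginal ratios, not on any fixed share of the total budget.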


© 2013   AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC
