Featured Blog Posts (1,420)

New Article on Data Mining: Classifiers & the ROC Curve

Many customer behaviors have the flavor of a choice between two alternatives:  Yes or no.  Buy or sell.  Renew or cancel.  Suppose software called a “classifier” is available to predict customer choices in advance.  Would you use it?  Perhaps you’d like to test it to see how well it performs before you commit.  In this installment of my series on the nuts and bolts of data mining, I discuss the use of classifiers and questions about their performance.  Regarding performance, we specifically…


Added by Daniel Graettinger on February 27, 2012 at 10:29am — No Comments
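
For readers who want to see what testing a yes/no classifier can look like in practice, here is a minimal sketch of one common performance check, the ROC curve and the area under it. It is not taken from the article above: the function names `roc_points` and `auc` and the toy labels and scores are assumptions made up for illustration.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points.

    labels: 1 for "yes" customers, 0 for "no" customers (hypothetical data).
    scores: classifier scores, higher meaning "more likely yes".
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")

    # Sweep the decision threshold from the highest score downward.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve, computed with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy example: six customers, three of whom actually said "yes".
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
print(curve)
print(auc(curve))
```

An area of 1.0 means the scores rank every "yes" above every "no"; an area near 0.5 is roughly what random guessing would give.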

Quantifying of Extreme Events

Vicky Fasen, Claudia Kluppelberg, Annette Menzel

September 28, 2011

abstract / summary

Understanding and managing risks due to extreme events is one of the most demanding topics of our society. We consider this problem as a statistical problem and present some of the probabilistic and statistical theory that was developed to model and quantify extreme events. By the very nature of an extreme event…


Added by John A Morrison on February 24, 2012 at 9:10am — No Comments

Attensity, Teradata and Anderson Analytics talk on the future of text analytics

The text analytics market is set to exceed £635 million as businesses look to capture customer sentiment to gain competitive advantage.

Companies from industries as diverse as financial services, pharmaceuticals and online retail are today looking to harness the voice of the customer across social networks to improve their services.

The technology to capture customer sentiment is becoming increasingly sophisticated, responsive, and flexible to distinct business needs. Despite the…


Added by Vincent Granville on February 21, 2012 at 5:23pm — No Comments

Are 4% mortgage interest rates a mirage?

I believe so. Here are some interesting thoughts on this:

You talk to a mortgage adviser at (say) Wells Fargo bank. You are interested in financing, own more than 50%, have 2 salaries (your wife + yourself) that represent more than 50% of the amount you want to refinance, can make a 30% down payment and have an external income…


Added by Vincent Granville on February 21, 2012 at 6:30pm — 4 Comments

Detecting Economic Events Using a Semantics-Based Pipeline

Alexander Hogenboom, Frederik Hogenboom, Flavius Frasincar, Uzay Kaymak, Otto van der Meer, and Kim Schouten

Erasmus University Rotterdam


In today's information-driven global economy, breaking news on economic…


Added by John A Morrison on February 21, 2012 at 8:10am — No Comments

From Semantic Search & Integration to Analytics

Amit Sheth 

LSDIS lab, University of Georgia, 415 Graduate Studies Research Center, Athens, GA 30602-7404

Semagix Inc., 297 Prince Avenue, Athens, GA 30601


Semantics is seen as the key ingredient in the next phase of the Web infrastructure as well as the next generation of enterprise content management. Ontology is the centerpiece of the most prevalent semantic technologies…


Added by John A Morrison on February 21, 2012 at 6:30am — No Comments

Document Classification: latent semantic vs bag of words. Who is the best?

A few posts ago, we saw an approach to extracting meta "concepts" from text based on the latent semantic paradigm.

In this post we apply this approach to classify documents and compare it with the canonical bag of words.

The comparison test will be done through the ensemble method already shown in the last post.

To read the entire post click …


Added by Cristian Mesiano on February 20, 2012 at 7:22am — No Comments

Example of Bad Analytics and How to Remedy it

This arrived in my mailbox as a sales pitch from Autobox, but I thought it was interesting:

Since we are always interested in learning about how others do time series and testing how our approaches work vis-à-vis other dated procedures, we pursued the data and would like to share our results. 

Sometimes in an…


Added by Vincent Granville on February 16, 2012 at 10:00pm — No Comments

10+ Great Metrics and Strategies for Email Campaign Optimization

This is our first article in a series about good, actionable KPIs to optimize various kinds of ROI. Future articles will focus on metrics for fraud detection, user engagement, etc. This one focuses on newsletter optimization.

If you run an online newsletter, here are a number of metrics you need to track:…


Added by Vincent Granville on February 12, 2012 at 8:00pm — 1 Comment

The Age of Big Data | New York Times

GOOD with numbers? Fascinated by data? The sound you hear is opportunity knocking.…


Added by Vincent Granville on February 12, 2012 at 10:29am — No Comments

J.D. Opdyke, Author: Bootstraps, Permutation Tests, and Sampling With and Without Replacement Orders of Magnitude Faster Using SAS®

A very efficient approach to random sampling in SAS® runs orders of magnitude faster than the relevant "built-in" SAS® procedures. For sampling with replacement as applied to bootstraps, seven algorithms are compared, and the fastest ("OPDY"), based on the new approach, runs over 220x faster than Proc SurveySelect. OPDY also handles datasets many times larger than those on which two hashing algorithms crash. For sampling without replacement as applied…


Added by J.D. Opdyke on February 12, 2012 at 9:30am — No Comments
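
As background on the technique being sped up, here is a minimal sketch of a bootstrap, that is, repeated sampling with replacement to estimate the variability of a statistic. It is a generic illustration in Python, not the OPDY algorithm and not SAS code; the function names `bootstrap_means` and `percentile_interval` and the sample data are assumptions made up for this example.

```python
import random


def bootstrap_means(data, n_resamples=1000, seed=0):
    """Draw n_resamples samples with replacement and return their means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    means = []
    for _ in range(n_resamples):
        # Sampling WITH replacement: the same observation may appear twice.
        resample = rng.choices(data, k=n)
        means.append(sum(resample) / n)
    return means


def percentile_interval(values, alpha=0.05):
    """Simple percentile interval from the bootstrap distribution."""
    ordered = sorted(values)
    low_index = int((alpha / 2) * len(ordered))
    high_index = max(low_index, int((1 - alpha / 2) * len(ordered)) - 1)
    return ordered[low_index], ordered[high_index]


# Hypothetical observations, for illustration only.
data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]
means = bootstrap_means(data)
print(percentile_interval(means))
```

The inner loop above touches every resampled observation, which is the cost that a faster sampling algorithm would aim to reduce on large datasets.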

Estimating Operational Risk Capital: the Challenges of Truncation, the Hazards of MLE, and the Promise of Robust Statistics

J.D. Opdyke and Alex Cavallo

In operational risk measurement, the estimation of severity distribution parameters is the main driver of capital estimates, yet this remains a non-trivial challenge for many reasons.  Maximum likelihood estimation (MLE) does not adequately meet this challenge because of its well-documented non-robustness to modest violations of idealized textbook model assumptions, specifically that the data are independent and identically distributed (i.i.d.), which is…


Added by J.D. Opdyke on February 10, 2012 at 3:56pm — No Comments

Monitoring Financial Stability in a Complex World

Mark D. Flood, Office of Financial Research

Allan I. Mendelowitz, Committee to Establish the Office of Financial Research

William Nichols, National Institute of Finance

Version 10 / January 19, 2012

Copyright 2012, M. Flood, A. Mendelowitz and W. Nichols


We offer a tour d’horizon of the data management issues facing…


Added by John A Morrison on February 9, 2012 at 10:54pm — No Comments

Australian website allows you to sell your financial spreadsheet template

We recently launched Vumero - a marketplace for Finance and Financial Modeling expertise -…


Added by Vincent Granville on February 9, 2012 at 7:30pm — No Comments

Sports Analytics – Featured Case Studies at PAW – March 4-10, San Francisco

Check out these sessions featuring sports analytics at …


Added by Vincent Granville on February 9, 2012 at 6:30pm — No Comments

Pentaho Cited as a Big Data Strong Performer by Independent Research Firm

Pentaho’s Kettle data integration product cited for ‘richest functionality and most extensive integration with open source Apache Hadoop’


Orlando, Fla. – February 8, 2012 – Delivering the …


Added by Vincent Granville on February 9, 2012 at 6:56pm — No Comments

Bayesian Outlier Detection with Dirichlet Process Mixtures

Matthew S. Shotwell and Elizabeth H. Slate


We introduce a Bayesian inference mechanism for outlier detection using the augmented Dirichlet process mixture. Outliers are detected by forming a maximum a posteriori (MAP) estimate of the data partition. Observations that comprise small or singleton clusters in the estimated partition are considered outliers. We offer a novel interpretation of the Dirichlet process precision parameter, and…


Added by John A Morrison on February 9, 2012 at 12:37am — No Comments

Interview with Drew Rockwell, CEO of Lavastorm

1. Short Bio

I started my career in the communications industry, where I spent 20 years with a Tier 1 carrier in probably 15 different jobs across the entire organization: Marketing, Advertising, Product Management, Operations, Sales, General Management, Strategy and Business Development. I basically…


Added by Vincent Granville on February 9, 2012 at 4:00pm — No Comments

Request For Proposal: Financial Market Analysis Algorithms


I am contacting you to make you aware of NineSigma Request, RFP# 67977, "Financial Market Analysis Algorithms."

NineSigma, representing a multi-billion dollar IT company, invites proposals…


Added by Vincent Granville on February 9, 2012 at 3:30pm — No Comments

One million web sites scored by how does Compete eliminate bias, blend multiple data sources and standardize unique counts?

Bigger, more diverse, more actionable online data

Since we started Compete, we have been continuously updating the quality and consistency of our data. With clickstream data available since 2002, and 10 terabytes of new data arriving monthly, we have amassed and organized hundreds of terabytes of daily consumer digital…


Added by Vincent Granville on February 9, 2012 at 3:00pm — 1 Comment
