Featured Blog Posts (1,501)

Document Classification: latent semantic vs bag of words. Who is the best?

A few posts ago, we saw an approach for extracting meta "concepts" from text based on the latent semantic paradigm.

In this post we apply that approach to document classification and compare it with the canonical bag-of-words approach.

The comparison is run using the ensemble method already shown in the last post.
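
As a rough illustration of the bag-of-words baseline mentioned above, here is a minimal sketch using only the Python standard library. It is not the author's code: the function names, the toy training sentences, and the choice of a nearest-neighbor rule with cosine similarity are all assumptions made for this example, and it does not cover the latent semantic or ensemble parts of the post.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split it into word tokens, and count each token."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text, labeled_docs):
    """Return the label of the most similar labeled document (1-nearest neighbor)."""
    query = bag_of_words(text)
    scored = (
        (label, cosine_similarity(query, bag_of_words(doc)))
        for doc, label in labeled_docs
    )
    best_label, _ = max(scored, key=lambda pair: pair[1])
    return best_label


# Toy, made-up training data, purely for illustration.
training = [
    ("the stock market fell as bond yields rose", "finance"),
    ("the team won the match with a late goal", "sports"),
]

print(classify("bond yields and the stock market", training))
```

Because bag of words only counts tokens, two documents that discuss the same topic with different vocabulary score as dissimilar; that limitation is the usual motivation for comparing it against a latent semantic representation.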

To read the entire post click …

Added by Cristian Mesiano on February 20, 2012 at 7:22am — No Comments

Example of Bad Analytics and How to Remedy it

This arrived in my mailbox as a sales pitch from Autobox; however, I thought it was interesting:

Since we are always interested in learning about how others do time series and testing how our approaches work vis-à-vis other dated procedures, we pursued the data and would like to share our results. 



Sometimes in an…

Added by Vincent Granville on February 16, 2012 at 10:00pm — No Comments

10+ Great Metrics and Strategies for Email Campaign Optimization

This is our first article in a series about good, actionable KPIs for optimizing ROI in various areas. Future articles will focus on metrics for fraud detection, user engagement, etc. This one focuses on newsletter optimization.

If you run an online newsletter, here are a number of metrics you need to track:…

Added by Vincent Granville on February 12, 2012 at 8:00pm — 2 Comments

The Age of Big Data | New York Times

GOOD with numbers? Fascinated by data? The sound you hear is opportunity knocking.…

Added by Vincent Granville on February 12, 2012 at 10:29am — No Comments

J.D. Opdyke, Author: Bootstraps, Permutation Tests, and Sampling With and Without Replacement Orders of Magnitude Faster Using SAS®

A very efficient approach to random sampling in SAS® achieves speed increases orders of magnitude faster than the relevant "built-in" SAS® procedures. For sampling with replacement as applied to bootstraps, seven algorithms are compared, and the fastest ("OPDY"), based on the new approach, achieves speed increases over 220x faster than Proc SurveySelect. OPDY also handles datasets many times larger than those on which two hashing algorithms crash. For sampling without replacement as applied…
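
To make the two sampling schemes named above concrete, here is a minimal Python sketch of sampling with replacement (as used in a bootstrap) and sampling without replacement. It is unrelated to the OPDY algorithm or to any SAS® procedure; the function names, defaults, and the percentile interval are assumptions chosen for this illustration, and no performance claim is implied.

```python
import random
import statistics


def bootstrap_means(data, n_resamples=1000, seed=0):
    """Draw n_resamples samples WITH replacement, each the same size as data,
    and return the mean of every resample."""
    rng = random.Random(seed)
    n = len(data)
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)]


def sample_without_replacement(data, k, seed=0):
    """Draw k distinct positions from data (no element is picked twice)."""
    rng = random.Random(seed)
    return rng.sample(data, k)


def percentile_interval(values, alpha=0.05):
    """Simple percentile interval from the sorted resample statistics."""
    ordered = sorted(values)
    lower_index = int((alpha / 2) * len(ordered))
    upper_index = min(len(ordered) - 1, int((1 - alpha / 2) * len(ordered)))
    return ordered[lower_index], ordered[upper_index]


# Made-up observations, purely for illustration.
observations = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
means = bootstrap_means(observations, n_resamples=500, seed=1)
low, high = percentile_interval(means)
print(f"approximate 95% interval for the mean: [{low:.2f}, {high:.2f}]")
```

Fixing the seed makes the resamples reproducible, which is convenient when comparing implementations against each other.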

Added by J.D. Opdyke on February 12, 2012 at 9:30am — No Comments

Estimating Operational Risk Capital: the Challenges of Truncation, the Hazards of MLE, and the Promise of Robust Statistics

J.D. Opdyke and Alex Cavallo

In operational risk measurement, the estimation of severity distribution parameters is the main driver of capital estimates, yet this remains a non-trivial challenge for many reasons.  Maximum likelihood estimation (MLE) does not adequately meet this challenge because of its well-documented non-robustness to modest violations of idealized textbook model assumptions, specifically that the data are independent and identically distributed (i.i.d.), which is…

Added by J.D. Opdyke on February 10, 2012 at 3:56pm — No Comments

Monitoring Financial Stability in a Complex World



Mark D. Flood

Office of Financial Research

Allan I. Mendelowitz

Committee to Establish the Office of Financial Research

William Nichols

National Institute of Finance



Version 10 / January 19, 2012



Copyright 2012, M. Flood, A. Mendelowitz and W. Nichols

Abstract



We offer a tour d’horizon of the data management issues facing…

Added by John A Morrison on February 9, 2012 at 10:54pm — No Comments

Australian website allows you to sell your financial spreadsheet template

We recently launched Vumero - a marketplace for Finance and Financial Modeling expertise - www.vumero.com…

Added by Vincent Granville on February 9, 2012 at 7:30pm — No Comments

Sports Analytics – Featured Case Studies at PAW – March 4-10, San Francisco

Check out these sessions featuring sports analytics at …

Added by Vincent Granville on February 9, 2012 at 6:30pm — No Comments

Pentaho Cited as a Big Data Strong Performer by Independent Research Firm

Pentaho’s Kettle data integration product cited for ‘richest functionality and most extensive integration with open source Apache Hadoop’

 

Orlando, Fla. – February 8, 2012 – Delivering the …

Added by Vincent Granville on February 9, 2012 at 6:56pm — No Comments

Bayesian Outlier Detection with Dirichlet Process Mixtures

Matthew S. Shotwell and Elizabeth H. Slate

Abstract.

We introduce a Bayesian inference mechanism for outlier detection using the augmented Dirichlet process mixture. Outliers are detected by forming a maximum a posteriori (MAP) estimate of the data partition. Observations that comprise small or singleton clusters in the estimated partition are considered outliers. We offer a novel interpretation of the Dirichlet process precision parameter, and…

Added by John A Morrison on February 9, 2012 at 12:37am — No Comments

Interview with Drew Rockwell, CEO of Lavastorm

1. Short Bio

I started my career in the communications industry, where I spent 20 years with a Tier 1 carrier in probably 15 different jobs across the entire organization: Marketing, Advertising, Product Management, Operations, Sales, General Management, Strategy and Business Development. I basically…

Added by Vincent Granville on February 9, 2012 at 4:00pm — No Comments

Request For Proposal: Financial Market Analysis Algorithms

Hello- 



I am contacting you to make you aware of NineSigma Request, RFP# 67977, "Financial Market Analysis Algorithms."

NineSigma, representing a multi-billion dollar IT company, invites proposals…

Added by Vincent Granville on February 9, 2012 at 3:30pm — No Comments

One million web sites scored by Compete.com: how does Compete eliminate bias, blend multiple data sources and standardize unique counts?

Bigger, more diverse, more actionable online data

Since we started Compete, we have been continuously updating the quality and consistency of our data. With clickstream data available since 2002, and 10 terabytes of new data arriving monthly, we have amassed and organized hundreds of terabytes of daily consumer digital…

Added by Vincent Granville on February 9, 2012 at 3:00pm — 2 Comments

Monte Carlo Evaluation of Consistency and Normality of Dichotomous Logistic and Multinomial Logistic Regression Models

Naima Shifa & Mamunur Rashid

Abstract



The dichotomous logistic regression model is one of the popular mathematical models for the analysis of binary data with applications in physical, biomedical, and behavioral sciences, among others. The feature of this model is to quantify the effects of several explanatory variables on one dichotomous outcome variable. Multinomial logistic regression model, on the other hand, handles the categorical dependent…
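
For reference, the standard textbook form of the dichotomous logistic model described above can be written as follows. The notation (outcome Y, explanatory variables x, intercept β₀, coefficient vector β) is chosen here for illustration and is not taken from the paper.

```latex
\[
  \Pr(Y = 1 \mid \mathbf{x})
    = \frac{1}{1 + \exp\!\bigl(-(\beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x})\bigr)},
  \qquad
  \log\frac{\Pr(Y = 1 \mid \mathbf{x})}{1 - \Pr(Y = 1 \mid \mathbf{x})}
    = \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}.
\]
```

Each coefficient is therefore the change in the log-odds of the outcome for a one-unit change in the corresponding explanatory variable, holding the others fixed.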

Added by John A Morrison on February 9, 2012 at 12:00am — No Comments

Your house might be worth more than you think: questioning Zillow estimates and their statistical methodology

When you are looking to refinance your mortgage, your banker will first look at a Zillow estimate to check the value of your home. These estimates are based on statistical models that were trained on data available when sales volumes were high and patterns were either "strong growth" or "strong decline".…

Added by Amy on February 5, 2012 at 10:00pm — 4 Comments

How to prevent future 9/11 attacks

One of the things that I don't understand about 9/11 is the fact that two planes crashed into a very visible obstacle in very good weather, despite having sophisticated obstacle-avoidance systems to prevent such collisions. These are known as "ground proximity warning systems", and they were designed long ago, mostly by Don Bateman; they have saved many lives by preventing functioning airplanes from crashing into mountains. These systems are a perfect example of a highly successful…

Added by Amy on February 5, 2012 at 9:30pm — 1 Comment

Is my car spying on me? | Orato.com

There is a little computer in your car that records information that you would not have imagined. It knows the speed you're racing down the highway, it knows whether or not you're pounding on the gas or the brakes, it knows if you're wearing a seatbelt, and so much more.

The memory is stored in this little information box, safe and sound, until you get into an accident. Very similar to the black boxes on airplanes, this box holds some very important data about a driver's…

Added by Vincent Granville on February 4, 2012 at 4:59pm — No Comments

What Data Mining Can and Can't Do | CIO Insight

Peter Fader, professor of marketing at the University of Pennsylvania's Wharton School, is the ultimate marketing quant, a world-class, award-winning expert on using behavioral data in sales forecasting and customer relationship management. He's perhaps best known for his July 2000 …

Added by Vincent Granville on February 3, 2012 at 7:56pm — No Comments

Who Are The Top 20 Influencers in Big Data? | Forbes

We are in the top 20 list :-)

A month back I used Traackr to look at influencers in mobile (look out for an upcoming piece on Kred if you are…

Added by Vincent Granville on February 3, 2012 at 6:30pm — No Comments

© 2014   AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC