# AnalyticBridge

# May 2011 Blog Posts (34)

### O(n) clustering algorithm for very large, unstructured data

Let's say that you have a large number n of elements a, b, c, etc. and you want to group them into clusters. Each cluster is supposed to contain few elements, say O(1).

You have one similarity metric d(a,b) to compare any two elements a, b. Also, you have a list of all pairs where d(a,b) > threshold, or in other words, all pairs (a,b) where a and b belong to the same cluster. The n x…
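
The excerpt is cut off before the algorithm itself, so the following is only a minimal sketch of one common way to turn such a list of pairs into clusters, not necessarily the method in the full post. It assumes that similarity chains (if a is paired with b and b with c, all three share a cluster) and uses a union-find structure; `cluster_from_pairs`, `elements`, and `similar_pairs` are illustrative names.

```python
from collections import defaultdict


def cluster_from_pairs(elements, similar_pairs):
    """Group elements into clusters given the pairs known to be similar.

    Assumption (not from the post): two elements share a cluster when a
    chain of similar pairs connects them, i.e. clusters are the connected
    components of the similarity graph.
    """
    parent = {e: e for e in elements}

    def find(x):
        # Walk up to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = defaultdict(list)
    for e in elements:
        clusters[find(e)].append(e)
    return list(clusters.values())


# Toy data: only the pairs above the threshold are listed.
elements = ["a", "b", "c", "d", "e", "f"]
similar_pairs = [("a", "b"), ("b", "c"), ("d", "e")]
clusters = cluster_from_pairs(elements, similar_pairs)
print(clusters)
```

Each listed pair is visited once, so in practice the work grows roughly with the number of elements plus the number of listed pairs, and the full n x n similarity matrix is never built.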

Added by Vincent Granville on May 30, 2011 at 11:00pm — 1 Comment

### SEO Strategies- The Secrets to boost your Online Visibility

For a business, a website serves as a medium to earn money and establish an online presence. But is it enough to create a website and leave it as it is? It would then resemble a stationary car whose engine was started but whose accelerator nobody bothered to press.

Many a time, you may come across a good-looking website that delivers no results for its owner. It could be heart-breaking! It usually happens…

Added by Manish Mohan on May 30, 2011 at 12:38pm — No Comments

### The Analyticbridge Theorem (AKA the Fundamental Business Analytics Theorem)

See attached document, including the theorem, its proof and applications to business analytics (e.g. to produce model-free, data-driven confidence intervals for predictive scores). More explanations coming soon, in particular about how to leverage this deep statistical result when computing metrics against very large data sets.…

Added by Vincent Granville on May 29, 2011 at 7:00pm — 1 Comment

### What causes predictive models to fail - and how to fix it?

• Over-fitting. If you perform a regression with 200 predictors (with strong cross-correlations among predictors), use meta regression coefficients: that is, use coefficients of the form f[Corr(Var, Response), a, b, c] where a, b, c are three meta-parameters (e.g. priors in a Bayesian framework). This will reduce your number of parameters from 200 to 3, and eliminate most of the over-fitting.
• Perform the right type of cross-validation. If your training set has…
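
The post does not say what the function f looks like, so the sketch below uses a purely hypothetical form to illustrate the idea in the first bullet: every regression coefficient is derived from the predictor's correlation with the response through the same three meta-parameters. Here a is assumed to be a scale, b an exponent, and c a cutoff below which a predictor is ignored; these roles and all function names are assumptions for illustration only.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def meta_coefficient(r, a, b, c):
    """Hypothetical f(r; a, b, c): scale a, exponent b > 0, cutoff c.

    Correlations weaker than c map to zero; the sign of r is preserved.
    """
    strength = max(abs(r) - c, 0.0)
    return math.copysign(a * strength ** b, r)


def meta_coefficients(columns, response, a, b, c):
    """One coefficient per predictor, all driven by the same (a, b, c)."""
    return [meta_coefficient(pearson(col, response), a, b, c) for col in columns]


# Toy data: three predictors, one response.
columns = [[1, 2, 3, 4], [4, 3, 2, 1], [1, -1, 1, -1]]
response = [2, 4, 6, 8]
coefs = meta_coefficients(columns, response, a=2.0, b=1.0, c=0.5)
print(coefs)
```

However many predictors there are, only a, b, and c would need to be tuned (for example by cross-validation), which is the reduction from 200 parameters to 3 that the bullet describes.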

Added by Vincent Granville on May 28, 2011 at 8:00pm — 8 Comments

If you have more than 100 friends on Facebook, you've probably noticed that Facebook always shows the same 20 friends on your profile page, day after day. FB actually shows up to 10 friends at a time, but they rotate from a list of 20 friends that, according to FB data mining algorithms, are deemed to be your best friends.

What makes a connection become one of your FB best friends is how frequently she visits your profile. You can influence this list to some extent, by posting comments…

Added by Vincent Granville on May 28, 2011 at 6:30pm — No Comments

### IBM Commits \$100 Million to Massive Scale Analytics Research

ARMONK, N.Y., May 20, 2011 /PRNewswire/ -- As companies seek to gain real-time insight from diverse types of data, IBM (NYSE: IBM) today unveiled new software and services to help clients more effectively gain competitive insight, optimize infrastructure and better manage resources to address Internet-scale data. For the first time, organizations can…

Added by Vincent Granville on May 28, 2011 at 10:58am — No Comments

### New Startup in Predictive Analytics Market

Big data predictive analytics provider Alpine Data Labs secures large capital boost; enters U.S. market

Information Management Online, May 12, 2011, Valerie Valentine

May 12, 2011 - Big data predictive analytics developer Alpine Data Labs got a \$7.5 million funding boost this week, at the same time as it announced entry to the market after 15 months of product development of the Big Data…

Added by Titus on May 25, 2011 at 12:52pm — No Comments

### Analytics Driving Customer Engagements

Marketing has traditionally been perceived as a cost centre, and defining an optimum marketing spend has never been easy. Big companies spend heavily on brand promotions or ATL activities. BTL managers are usually under pressure to justify the ROI of each penny spent. The fact that BTL activities also promote the brand is very often ignored and all you have…

Added by Rakesh Ranjan on May 25, 2011 at 10:00am — No Comments

### RapidMiner voted most popular data mining / analytic software on KDNuggets

The poll had a record participation (over 1,100 voters). Among them, 43% used only commercial software, 32% only free software, and 25% both. The average number of tools per user was 2.2.

RapidMiner, R, and Excel were again the most popular tools, with SAS remaining the top commercial tool. Pie chart shows the breakdown of voters by region. We also note that W. European data miners had the highest % of free tool use (due to popularity of tools like RapidMiner and KNIME…

Added by Vincent Granville on May 24, 2011 at 6:15pm — No Comments

The position would be located at eBay’s Whitman Campus (Campbell, CA).

Business Title – Sr. Manager – Vertical Analytics

Position Type – Full Time Employee

Description - The Site Analytics group is responsible for delivering business insights and high impact analyses to the Global eBay Marketplace businesses (e.g., eBay.com, eBay.co.uk). Within this group, teams partner with business unit clients to address strategic and operational questions…

Added by Heena Tripathi on May 24, 2011 at 5:49pm — No Comments

### Beyond Predictive Analytics: Prescriptive Analytics

I came across an interesting read about prescriptive analytics that I wanted to share with the community. The author describes prescriptive as the next evolution after predictive analytics. This paragraph summarizes the differences between Predictive and Prescriptive:

While predictive analytics helps you model…

Added by Mike Kennedy on May 24, 2011 at 10:08am — 2 Comments

### Influence Score

Hi, I am trying to build an "influence score" for web sites and journals, and I was wondering if someone can point me in the right direction as to the model and technique I should use. Can I model the "influence score" similarly to a credit score? Is the Klout score for measuring the influence of bloggers a good example of an "influence score"? Can I use logistic regression to model an "influence score", or should I use other models? If anyone has experience in modeling "influence score" or knows of any…

### Credit Score Cards

Information Technology as an industry has grown by leaps and bounds. You may not find any organization on the planet that does not have some IT involved. This has given rise to a lot of jobs supporting IT functions. Salaries have increased tremendously in IT compared to other business areas. The overall economy has grown, which has made people more able and inclined to buy more and more.
This has increased the usage of credit in everyday life. “Buy now pay later”…

Added by Sandeep Raut on May 22, 2011 at 8:40pm — No Comments

### Google introduces WebP: New image format for the web

We are already familiar with WebM, which was introduced last year and successfully implemented on YouTube last month. Now Google has announced a new image format called WebP. WebP lets you reduce file size by up to 40% without any change in the original resolution, and it is said to improve on the picture quality of other formats such as JPEG or PNG.

Google officially announced that they have improved the compression algorithm in WebP…

Added by Manish Mohan on May 22, 2011 at 2:24pm — No Comments

### GOOGLE goes Social by introducing +1 Button

A few days ago I was searching for something on Google. When the results came up, I saw a blue button labeled +1 at the right end of the search result. I clicked the button, thought nothing more of it, and went back to the results I had searched for. But a few days later, when I went through my Google profile, I noticed that a new +1’s tab had appeared in my profile. When I…

Added by Manish Mohan on May 22, 2011 at 11:37am — 1 Comment

### Ethics of Graph-Making: Originally posted at StatSoft.com

In a few political and data-visualization blogs the past several days, there has been a kerfuffle concerning this bar chart that the Wall Street Journal published. The gist of the chart is that the bulk of the taxable income in this country…

Added by Amanda Shankle-Knowlton on May 20, 2011 at 7:30am — No Comments

### ASA and CHANCE Magazine Sponsor Blog to Foster Discussions of Probability, Statistics

The American Statistical Association and CHANCE magazine have debuted The Statistics Forum, a blog to provide everyone the opportunity to participate in discussions about probability and statistics and their role in important and interesting topics. The blog, which is located on the CHANCE web site at chance.amstat.org, is edited by Andrew Gelman. Everyone is invited to read and comment on the…

Added by Vincent Granville on May 19, 2011 at 5:43pm — No Comments

### American Statistical Association Urges Support of Statistical Literacy Bill

The American Statistical Association (ASA), the nation's preeminent statistical society, urges members of the House of Representatives to support the Statistics Teaching, Aptitude and Training Act of 2011 (STAT Act of 2011), which was introduced today by Congressman Dave Loebsack (D-Iowa). A copy of the bill may be viewed at…

Added by Vincent Granville on May 19, 2011 at 5:41pm — No Comments

### humans.txt: New Idea for Human not for Robots

This new creation is brought to you by humanstxt.org, to make you aware of all the individuals who brainstorm day and night for their website. It’s a new era where the creator doesn’t take all the credit himself but also acknowledges the other helping hands.

Nowadays, robots.txt is often used. It prevents the robots of search engines from…

Added by Manish Mohan on May 17, 2011 at 5:56am — No Comments

### Statistics Academic Journal Pulls Climate Denialist Study After Charges of Plagiarism

"Evidence of plagiarism and complaints about the peer-review process have led a statistics journal to retract a federally funded study that condemned scientific support for global warming.

The study, which appeared in 2008 in the journal Computational Statistics and Data Analysis, was headed by statistician Edward Wegman of George Mason University in Fairfax, Va. Its analysis was an outgrowth of a controversial congressional report that Wegman headed in 2006. The 'Wegman Report'…

Added by Richard on May 16, 2011 at 7:20pm — No Comments
