Data Intelligence, Business Analytics
Or blending data science with the art of search engine optimization (SEO). Here we propose a statistical methodology to increase the amount of organic traffic that a web site receives from Google for specific keywords, leveraging SEO principles to make it a real science, not just an art.
Traditionally, SEO (when implemented by statisticians) is just about A/B, multivariate or Taguchi testing, and other similar schemes, sometimes involving fractional factorial designs. Here's my proposal for a high-level, generic SEO engine to find out what drives page rank (that is, whether the page in question is listed in position #1, #2, etc.) on Google search result pages for a specific search keyword:
Step 1:
Gather page rank data for 1,000 high-value keywords (from 3 or 4 different keyword categories) across multiple web pages and web sites.
Step 2:
For each webpage and keyword combination, gather the following statistics (broken down per day, over the last 4 weeks), using a web crawler:
Step 3:
Build a predictive model (e.g. regression) based on the data/metrics gathered in step 2.
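As a rough illustration of step 3, here is a minimal sketch of a least-squares regression of search position on page metrics. Everything specific in it is an assumption made for the example: the two features (keyword occurrence count and a keyword-in-URL flag, both borrowed from the comment discussion), the synthetic data, and the helper names `fit_ols` and `predict`. Real data would come from the crawl described in steps 1 and 2, and a real model would likely use many more metrics.

```python
import random

def fit_ols(rows, targets):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def predict(coefs, row):
    """Predicted search position for one page/keyword feature row."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))

# Synthetic data (assumption): position improves with keyword occurrences
# and with the keyword appearing in the URL, plus random noise.
random.seed(0)
rows, ranks = [], []
for _ in range(200):
    occurrences = random.randint(0, 20)   # hypothetical metric
    in_url = random.randint(0, 1)         # hypothetical metric
    rank = 30.0 - 1.0 * occurrences - 5.0 * in_url + random.gauss(0, 1)
    rows.append((occurrences, in_url))
    ranks.append(rank)

coefs = fit_ols(rows, ranks)
print("intercept, occurrences, in_url:", [round(c, 2) for c in coefs])
```

A negative coefficient here means the metric is associated with a better (numerically lower) position. Comparing coefficients on standardized metrics is one simple way to see which metrics matter and which can be ignored.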
Note
This is a good project for someone who wants to become a data scientist. The same methodology can be used to predict generic Google page rank or web domain rank. If the page has been updated, it is better to compute the metrics on Google's cache version of the page. All the metrics mentioned above can automatically be computed with a web crawler, using multiple IP addresses from multiple locations (in case Google serves different content based on location), and multiple daily downloads for each page/keyword.
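Since each page/keyword pair may be downloaded several times a day from several locations, the raw observations need to be collapsed into one value per day before modeling. The sketch below shows one possible way to do that, using the median position per page, keyword and day. The record layout, the sample values and the function name `daily_rank` are all hypothetical, and the median is just one reasonable choice of summary.

```python
from collections import defaultdict
from statistics import median

def daily_rank(observations):
    """Collapse repeated observations into one median rank per
    (page, keyword, day), ignoring which location produced them."""
    grouped = defaultdict(list)
    for page, keyword, day, location, rank in observations:
        grouped[(page, keyword, day)].append(rank)
    return {key: median(ranks) for key, ranks in grouped.items()}

# Hypothetical crawl records: (page, keyword, day, crawler location, rank)
observations = [
    ("example.com/a", "data science", "2012-10-01", "us-east", 3),
    ("example.com/a", "data science", "2012-10-01", "eu-west", 5),
    ("example.com/a", "data science", "2012-10-01", "us-east", 4),
    ("example.com/a", "data science", "2012-10-02", "us-east", 2),
]

summary = daily_rank(observations)
for key, value in sorted(summary.items()):
    print(key, value)
```

If Google does serve different results by location, keeping the location in the grouping key (instead of pooling across locations) would preserve that signal as an additional variable.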
Comment
Comment by Vincent Granville on October 26, 2012 at 10:22am Reg, Number of occurrences of keyword in question in landing page and Keyword density are two different things: one is an absolute number, the other one is a relative number.
KW in URL only helps in some circumstances, thus the idea to test many metrics and see how they interact, and which ones can be ignored. A decision tree type of approach could work better than regression, or better, use boosted / blended predictive models to discover patterns.
Comment by Reg Charie on October 26, 2012 at 10:15am There is too much in the list if you consider that Google only uses the visible portion of a page (and its markup) to determine search results.
I have done most of these tests and here are my findings.
These are not considered. Page size, unless it is slowing down loading, is immaterial.
Static or dynamic is also not a factor.
Are both the same thing. Google has told us that KW density is not a ranking factor.
This factor does not impact Google. It is an independent metric.
The same as above.
These are not factors. Google does not care how much code you use, or ratios.
The key factors are:
Google considers how people read, where they read, and how much they read, and one would do well to study these factors.
Since indexing is based on relevance, the designer should understand the theory of relevance.
Best of luck.
Reg
Comment by Jose Fernandes on October 26, 2012 at 9:48am Beautiful framing of the problem!!!
Could this be valuable for SEO specialists to assess where they should devote their time to have maximum impact? By measuring variable importance and combining it with their experience of what could be improved, they would be able to improve their decision-making process... (just a thought)
© 2013 AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC