When you see Google ads on Google search result pages or elsewhere, the ads displayed in front of your eyes have (or should have) been carefully selected to maximize the chance that you convert and generate ad revenue for Google. The same applies on Facebook, Yahoo, Bing, LinkedIn and all other ad networks.
If you think that you see irrelevant ads, either they are priced very cheaply, or Google's ad relevancy algorithm is not working well.
Ad scoring algorithms used to be very simple: the score was a function of the max bid paid by the advertiser and the click-through rate (CTR). This led to abuses: an advertiser could generate bogus impressions to dilute a competitor's CTR, or clicks on its own ads to boost its own CTR, or a combination of both, typically using proxies or botnets to hide the scheme, thus gaining an unfair competitive advantage on Google.
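To make the abuse concrete, here is a minimal sketch that assumes the old-style score is simply max bid times CTR. The exact function used by any ad network is not given here, so the function name simple_ad_score and all the numbers are hypothetical. It shows how bogus impressions drag down a competitor's score even though the bid and the number of real clicks are unchanged.

```python
def ctr(clicks, impressions):
    """CTR = clicks / impressions (0 if the ad was never shown)."""
    return clicks / impressions if impressions else 0.0


def simple_ad_score(max_bid, clicks, impressions):
    # Assumption: score = max bid * CTR. Real networks use their own,
    # undisclosed variants of this idea.
    return max_bid * ctr(clicks, impressions)


# Competitor's ad before the attack: 50 clicks out of 1,000 impressions.
honest = simple_ad_score(max_bid=2.0, clicks=50, impressions=1000)

# Same ad after 4,000 bogus impressions were generated against it.
diluted = simple_ad_score(max_bid=2.0, clicks=50, impressions=5000)

print(f"score before dilution: {honest:.3f}")
print(f"score after dilution:  {diluted:.3f}")
```

With a score of this form, the attacker never needs to outbid the competitor: inflating the denominator of the competitor's CTR is enough.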
Recently, in addition to CTR and max bid, ad networks have added ad relevancy to their ad scoring mix (that is, to the algorithm used to determine which ads you will see, and in which order). In short, ad networks don't want to display ads that will frustrate the user: it's all about improving user experience and reducing churn to boost long-term profits.
How does ad relevancy scoring work?
Here's our solution. There are three components in the mix:
The solution is as follows.
First create three taxonomies:
The two important taxonomies are B and C, unless the ad is displayed on a very generic web page, in which case A is more important than B. So let's ignore taxonomy A for now. The goal is to match a category from Taxonomy B with one from Taxonomy C. The taxonomies might or might not have the same categories, so in general it will be a fuzzy match: for instance, the page hosting the ad is attached to categories Finance / Stock Market in Taxonomy B, while the ad is attached to categories Investing / Finance in Taxonomy C. So you need a system in place to measure distances between categories belonging to two different taxonomies.
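One possible way to measure such distances, sketched below, is to describe every category by a set of keywords and to use the Jaccard distance between keyword sets. This is an assumption made for illustration, not necessarily the method used by any ad network. The keyword sets and the names taxonomy_b, taxonomy_c and closest_pair are hypothetical.

```python
def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|: 0 for identical keyword sets, 1 for disjoint ones."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Hypothetical keyword sets describing categories in each taxonomy.
taxonomy_b = {  # categories attached to pages hosting ads
    "Finance": {"bank", "loan", "stock", "bond", "invest"},
    "Stock Market": {"stock", "share", "ticker", "dividend", "invest"},
}
taxonomy_c = {  # categories attached to ads
    "Investing": {"invest", "stock", "fund", "dividend", "portfolio"},
    "Finance": {"bank", "loan", "credit", "stock", "invest"},
}


def closest_pair(page_categories, ad_categories):
    """Return (distance, page category, ad category) for the best fuzzy match."""
    return min(
        (jaccard_distance(taxonomy_b[p], taxonomy_c[a]), p, a)
        for p in page_categories
        for a in ad_categories
    )


distance, page_cat, ad_cat = closest_pair(
    ["Finance", "Stock Market"], ["Investing", "Finance"]
)
print(f"best match: {page_cat} (B) ~ {ad_cat} (C), distance {distance:.2f}")
```

The smaller the best distance between a page category and an ad category, the more relevant the ad is to the page. The distance could then be fed into the scoring mix alongside CTR and max bid.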
How do I build a taxonomy?
There are a lot of vendors and open source solutions available on the market, but if you really want to build your own taxonomies from scratch, here's one way to do it:
Let's say that (X, Y) is such a pair. Compute n1 = # of occurrences of X in your table, n2 = # of occurrences of Y in your table, and n12 = # of occurrences where X and Y are associated (e.g. found on the same web page). A metric that tells you how close X and Y are to each other would be R = n12 / SQRT(n1 * n2). With this similarity metric (used e.g. at http://www.frenchlane.com/kw8.html), or the dissimilarity 1 - R derived from it, you can cluster keywords via hierarchical clustering and eventually build a taxonomy, which is nothing more than an unsupervised clustering of the keyword universe, with labels manually assigned to (say) the top 20 clusters, each representing a category.
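The computation of R and the clustering step can be sketched as follows. This is a toy illustration under stated assumptions: the pages list is made-up data, the "table" is taken to be one keyword set per web page, and the clustering is a naive average-linkage agglomerative procedure with a hypothetical stopping threshold. A real keyword universe would call for a proper clustering library.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical toy data: each set holds the keywords found on one web page.
pages = [
    {"stock", "dividend", "broker"},
    {"stock", "dividend"},
    {"stock", "broker"},
    {"flight", "hotel"},
    {"flight", "hotel", "visa"},
    {"hotel", "visa"},
]


def closeness_table(pages):
    """Return {(X, Y): R} with R = n12 / sqrt(n1 * n2), where X < Y alphabetically."""
    n = Counter(kw for page in pages for kw in page)
    n12 = Counter(
        pair for page in pages for pair in combinations(sorted(page), 2)
    )
    return {(x, y): c / sqrt(n[x] * n[y]) for (x, y), c in n12.items()}


def cluster(keywords, table, threshold):
    """Merge the two closest clusters (average R) until none reaches threshold."""
    clusters = [frozenset([k]) for k in keywords]

    def link(a, b):
        # Average R over all cross-cluster keyword pairs; unseen pairs count as 0.
        total = sum(table.get(tuple(sorted((x, y))), 0.0) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        score, a, b = max(
            ((link(a, b), a, b) for a, b in combinations(clusters, 2)),
            key=lambda t: t[0],
        )
        if score < threshold:
            break
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters


R = closeness_table(pages)
keywords = sorted({kw for page in pages for kw in page})
clusters = cluster(keywords, R, threshold=0.3)  # threshold is an arbitrary choice
for c in clusters:
    print(sorted(c))
```

On this toy data the finance keywords and the travel keywords end up in two separate clusters, which you would then label by hand (say "Investing" and "Travel") to obtain two categories of the taxonomy.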