As a responsible SEO, I’m supposed to fall in line with my SEO brethren and repeatedly chant the following mantra: “Don’t chase the algorithm! Focus on long-term strategy!! Do real company sh*t!!!”
That’s all great advice, but you know what? Being responsible is boring… and I love a good chase! Plus, if no one chases the algorithm, Matt Cutts will get bored.
No one wants Matt Cutts to be bored, so this post offers 7 resources to help you chase Google’s algorithm like a champ…
Detecting Algorithm Changes
First things first. If you don’t know when the algorithm changes direction, you’ll never catch it! With that in mind, these resources will help you detect algorithm changes.
1. SERP Fluctuations
When Google updates their algorithm, numerous sites rank differently (some sites move up in the rankings; other sites move down). These ranking changes create fluctuations in the SERPs that can be monitored to approximate the magnitude of the update.
(a) MozCast
This tool provides a Google weather report, which describes recent (and historical) fluctuations in the Google SERPs for 1,000 keywords. If the reported Google weather is hot and stormy, the SERPs have been highly volatile. On the other hand, if the reported Google weather is cold and clear, the SERPs have been relatively stable.
The weather is calculated based on organic search results for a hand-selected set of 1,000 keywords. Every day, the SERP for each keyword is compared against the previous day’s SERP. The changes for each keyword are calculated, and then, all of the calculations are averaged and normalized to appear as a Fahrenheit temperature.
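To make the mechanics more concrete, here’s a toy sketch of a MozCast-style flux calculation. MozCast’s exact formula isn’t public beyond the About page description, so the change scoring and temperature calibration below are my own assumptions:

```python
# A toy approximation of a MozCast-style "weather" metric.
# The change scoring and calibration here are assumptions, not
# MozCast's actual formula.

def serp_change(yesterday, today):
    """Score the turnover between two top-10 result lists (lists of URLs).

    A URL that moves from position i to position j contributes |i - j|;
    a URL that drops out of the top 10 contributes a maximum penalty.
    """
    max_penalty = len(yesterday)
    score = 0
    for i, url in enumerate(yesterday):
        score += abs(i - today.index(url)) if url in today else max_penalty
    return score / (max_penalty * len(yesterday))  # normalize to [0, 1]

def temperature(serps_yesterday, serps_today, baseline=70.0):
    """Average the per-keyword flux and scale it to a Fahrenheit-style reading."""
    changes = [serp_change(serps_yesterday[kw], serps_today[kw])
               for kw in serps_yesterday]
    avg_flux = sum(changes) / len(changes)
    # Assumed calibration: flux of ~0.5 maps to the 70-degree baseline.
    return baseline * (avg_flux / 0.5)
```

Feed it two dictionaries mapping each keyword to its ordered top-10 URLs (one for yesterday, one for today), and a stormy day will read hotter than a quiet one.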
In addition to the daily report, MozCast also provides a 30-day historical view of the Google weather and a summary of major weather events (e.g., Panda, Penguin, and mysterious updates). For more information about the tool and how the weather is calculated, check out MozCast’s About page.
(b) SERPmetrics US Flux Charts
This chart shows the recent SERP fluctuations for the three most popular search engines: Google, Bing, and Yahoo. The SERPmetrics chart isn’t as pretty as MozCast’s weather-inspired graphics, but it’s slightly more useful because it draws from a larger sample size.
The daily flux is calculated based on organic search results for over 100,000 keywords (compared to 1,000 for MozCast), and the top 100 search results are evaluated for each keyword (compared to the top 10 for MozCast). Similar to MozCast, the changes for each keyword are calculated, and then, those calculations are averaged and plotted on the chart.
To provide an even better comparison of these two SERP monitoring tools, the following graph plots the Google fluctuations for both tools using data from July 8 through July 30. For comparison purposes, each tool’s data was normalized by scaling it to fall between 0 and 1.
As the graph shows, the tools are somewhat correlated (Spearman’s rho is 0.43); however, notable divergences have occurred. The most interesting differences can be observed between July 15 and July 29.
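If you’d like to run this kind of comparison yourself, here’s a minimal sketch using SciPy; the daily values below are made-up placeholders, not the tools’ actual data:

```python
# Min-max normalize each tool's flux series to [0, 1], then measure
# their agreement with Spearman's rho. The values here are hypothetical.
from scipy.stats import spearmanr

def min_max_normalize(series):
    lo, hi = min(series), max(series)
    return [(x - lo) / (hi - lo) for x in series]

mozcast = [68.1, 72.4, 95.0, 70.2, 66.8, 88.3, 91.5]      # daily "temperatures"
serpmetrics = [0.31, 0.36, 0.52, 0.44, 0.40, 0.49, 0.55]  # daily flux values

rho, p_value = spearmanr(min_max_normalize(mozcast),
                         min_max_normalize(serpmetrics))
print(f"Spearman's rho: {rho:.2f} (p = {p_value:.3f})")
```

One subtlety worth noting: Spearman’s rho only looks at ranks, so the min-max scaling affects the plot but not the correlation value itself.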
During that time period (July 15 – July 29), both tools identify a significant amount of flux on July 16 and around July 23. However, SERPmetrics observes this flux as two prolonged periods, while MozCast observes it as two quick bursts and one prolonged period.
The most obvious explanation for these differences is the simple fact that each tool uses an independent collection of keywords. Additionally, the fact that SERPmetrics uses a significantly larger sample size (more than 100,000 keywords and 100 results for each keyword) helps explain why the tool’s data identifies longer periods of flux (July 15 – July 18) and earlier signs of flux (July 22).
It’s important to note that these differences are NOT a bad thing. Each tool identifies flux in a sample of the SERP universe; therefore, when one tool identifies flux and the other doesn’t, it simply means that one collection of keywords is experiencing heavy turnover (while another collection is remaining relatively stable).
Thus, when either of the tools reports significant fluctuations, we can assume that something noteworthy is happening. Continuing with that train of thought, when both tools report significant fluctuations, we can assume that a major update is happening.
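To encode that train of thought as a simple alerting rule (the thresholds here are pure assumptions; calibrate them against each tool’s history):

```python
# A toy alerting rule based on the reasoning above. The thresholds are
# arbitrary assumptions, not published values from either tool.
MOZCAST_HOT = 90.0     # degrees Fahrenheit
SERPMETRICS_HOT = 0.5  # normalized daily flux

def classify_day(mozcast_temp, serpmetrics_flux):
    hot = (mozcast_temp >= MOZCAST_HOT, serpmetrics_flux >= SERPMETRICS_HOT)
    if all(hot):
        return "likely major update"
    if any(hot):
        return "something noteworthy in one keyword sample"
    return "business as usual"

print(classify_day(95.2, 0.38))  # one sample running hot
```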
2. Webmaster Forums
The previous tools are great for identifying fluctuations in the SERPs, but they don’t tell us why those fluctuations are happening. For that information, we need to rely on the power of the crowd.
When SERPs are fluctuating, sites are dropping in the search rankings. And when sites start dropping, webmasters start talking. Thus, to learn more about SERP fluctuations, we need to monitor webmaster conversations, and the best place to do that is in webmaster forums.
Here are a few of the best forums for observing webmaster conversations:
Alternatively, if you don’t have time to monitor these forums, you can read Barry Schwartz‘s blog: Search Engine Roundtable. Each day, Barry covers the most important search stories as they are happening in the search forums. It’s like having your own tour guide for the forums.
3. Google Announcements
Another source of information about Google’s algorithm is… drum roll, please… Google! Unfortunately, since they’re the ones we’re chasing, we have to take their guidance with a huge grain of salt.
(a) Google Blogs
Many of the major search-related announcements can be found on the Webmaster Central Blog or the Inside Search Blog.
(b) Google Social Media Accounts
Google has a Google+ account as well as a Twitter account; however, the latter typically provides more useful information for chasing purposes. For example, here’s a recent Google tweet about Panda 3.9:
New data refresh of Panda starts rolling out tonight. ~1% of search results change enough to notice. More context: goo.gl/huekf
— A Googler (@google) July 24, 2012
(c) Matt Cutts
As the de facto Google spokesman for all things search, Matt is a good person to follow. Here is his Google+ account and his Twitter account.
(d) SEOmoz’s Google Algorithm Change History
If you miss one of the official announcements from Google, SEOmoz has you covered. This excellent resource documents major updates dating all the way back to 2000.
Algorithm Details
The previous section is full of resources that help us identify algorithm changes. This section presents resources that help us determine the ranking factors used by Google’s algorithm.
4. Algorithm Surveys
Surveys are an excellent way to pool a community’s collective knowledge about a given topic. Since everyone has an opinion about the importance of various ranking factors, a survey helps identify factors that are generally accepted as being important.
(a) SEOmoz Search Engine Ranking Factors
Every other year, SEOmoz surveys the biggest names in the industry about their opinions on ranking factors. In 2011, 134 SEO professionals were surveyed to identify Google’s most likely ranking factors.
One example from the results is a chart showing the survey data for domain-level keyword usage.
For more results, check out SEOmoz’s complete list of Survey Data.
(b) The Periodic Table Of SEO Ranking Factors
This graphic summarizes the major ranking factors used by Google to determine which pages belong at the top of the SERPs.
The table breaks the factors into 9 categories (content, HTML, architecture, links, social, trust, personal, violations, and blocking), and each factor is given a weight (-3 for strong penalty potential up to +3 for strong improvement potential).
For more details about these factors, read Search Engine Land’s Guide to SEO.
(c) The Search for 200
Back in 2009, Matt Cutts commented that the Google algorithm incorporated more than 200 variables, and the remark immediately prompted fellow chasers to start itemizing those variables.
The first effort to list the 200 variables began in this WebmasterWorld thread, and it was quickly picked up by Ann Smarty in this Search Engine Journal article.
5. Correlation Studies
Although I respect Michael Martinez’s stance on correlation studies (Google Correlation Studies are Sham Search Engine Optimization), I ultimately have to side with Rand (Why the Marketing World Needs More Correlation Research).
Yes. Correlation is not causation. But I still think it’s incredibly interesting to identify ranking factors that are strongly correlated to higher rankings. So… save the flame war, and just keep reading.
(a) SEOmoz Search Engine Ranking Factors
In 2011, SEOmoz augmented their ranking factors survey with a correlation-based analysis. Specifically, they collected 10,271 keyword search results from Google. Then, they analyzed the relationship between those search results and various proposed algorithm components using Spearman’s rank correlation coefficient.
One illustration from the study shows the correlation data for page-level social metrics.
For more results, check out SEOmoz’s complete list of Correlation Data.
(b) The Open Algorithm Project
Mark Collier started this project in an effort to identify the ranking factors used by search engine algorithms. To date, Mark’s work has focused on quantifying the correlation between various potential ranking factors and actual Google rankings.
Specifically, Mark collected a data set of the top 100 organic search results from Google for 12,573 keywords. Then, he used Spearman’s rank correlation coefficient to measure the relationship between Google rankings and more than 150 potential ranking factors. You can find Mark’s correlation data here.
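Both correlation studies boil down to the same basic recipe: for each keyword’s SERP, rank-correlate a candidate factor against result position, then average across keywords. Here’s a hedged sketch of that recipe; the data layout and sign convention are my assumptions, not either project’s actual pipeline:

```python
# Average per-keyword Spearman correlation between a candidate ranking
# factor and SERP position. Illustrative only; not SEOmoz's or Mark's
# actual code.
from scipy.stats import spearmanr

def mean_factor_correlation(serps):
    """serps: {keyword: [(position, factor_value), ...]}, sorted by position."""
    rhos = []
    for results in serps.values():
        positions = [pos for pos, _ in results]
        factor_values = [val for _, val in results]
        rho, _ = spearmanr(positions, factor_values)
        # Negate rho so a factor that grows as rankings improve (i.e., as
        # the position number shrinks) comes out positive.
        rhos.append(-rho)
    return sum(rhos) / len(rhos)
```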
6. Search Patents
How do search engines protect their most important intellectual property? Patents! Lots and lots of patents! Thus, if you want to learn more about this intellectual property, you should read search engine patent filings.
Or… you can just read Bill Slawski‘s blog: SEO by the Sea. The blog covers a wide range of search-related patents and articles, and it’s full of invaluable observations and advice.
If you’re new to the patent game, I strongly recommend Bill’s article about the 10 Most Important SEO Patents.
I’m also a huge fan of his posts about the 100 Best SEO Documents of All Time. In addition to patents, these SEO documents also include academic research papers, which is an excellent segue to our final algorithm chasing resource…
7. Academic Research
Most search-related academic research is focused on improving the search experience (e.g., optimizing IR systems, duplicate content detection, Web spam identification, etc.). All of this work is valuable, but for our algorithm chase, we’re interested in papers that attempt to identify ranking factors.
(a) An Analysis of Factors Used in Search Engine Ranking
In this 2005 paper, Bifet et al. attempted to approximate Google’s ranking algorithm using query results. Specifically, they used popular machine learning algorithms (e.g., logistic regression, support vector machines, and binary classification trees) to build models based on various page features (e.g., content features, formatting features, link features, etc.).
Then, they used the models to predict Google search rankings. Unfortunately, the models only marginally outperformed the strongest individual feature (i.e., the feature with the most predictive power) for a given keyword category. Based on this result, the authors concluded that Google uses numerous ranking factors that are “hidden” (i.e., not directly observable outside of Google).
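For the curious, here’s a heavily simplified sketch of that general approach: represent each result with a handful of page features and train a model to predict whether it lands in the top half of the SERP. The features and data below are illustrative stand-ins, not the paper’s actual feature set:

```python
# Train a simple model to predict SERP placement from page features,
# in the spirit of Bifet et al. All data here is made up.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: [keyword_in_title, keyword_density, n_backlinks, page_length]
X = np.array([[1, 0.021, 340, 1800],
              [0, 0.008,  12, 2400],
              [1, 0.034,  95,  900],
              [0, 0.015,   4, 3100]])
y = np.array([1, 0, 1, 0])  # 1 = ranked in the top half of the SERP

model = LogisticRegression(max_iter=1000).fit(X, y)
# Inspect the coefficients to see which features the model leaned on.
print(dict(zip(["title", "density", "backlinks", "length"], model.coef_[0])))
```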
(b) Removing Web Spam Links from Search Engine Results
In this 2009 paper, Egele et al. developed a classification technique that identifies spam in search engine results. However, this paper’s approach to identifying important ranking factors is far more interesting than its classification techniques.
Specifically, the authors identified 10 important features (e.g., keyword in title, number of backlinks, keyword in domain, etc.). Then, they created numerous test pages with various combinations of these features (all optimized for the same fake keyword).
Based on the relative rankings of these pages in the search engine results, they observed that the keyword’s presence in the title and text body had the strongest positive influence on the ranking. However, since the paper is 3 years old, the actual results of its experiments are not nearly as important as its general methodology for identifying ranking factors.
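If you wanted to replicate that methodology, the page-generation step is essentially an exhaustive sweep over on/off feature combinations. Here’s a tiny sketch; the feature names are illustrative, not the paper’s exact list:

```python
# Enumerate every combination of a handful of binary page features,
# one test page per combination (all targeting the same fake keyword).
from itertools import product

FEATURES = ["keyword_in_title", "keyword_in_domain", "keyword_in_h1"]

def test_page_specs():
    """Yield one feature spec per page, covering all 2^n combinations."""
    for flags in product([False, True], repeat=len(FEATURES)):
        yield dict(zip(FEATURES, flags))

for spec in test_page_specs():
    print(spec)  # each spec drives the generation of one test page
```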
(c) How to Improve Your Google Ranking: Myths and Reality
In this 2010 paper, Su et al. claimed to successfully reverse engineer Google’s algorithm using linear learning models and a recursive partitioning ranking scheme. Using their approach, they were able to correctly predict 7 out of a SERP’s 10 pages for 78% of their evaluated keywords.
The authors used 17 features to represent each page (e.g., PageRank, keyword in hostname, keyword in title, etc.), and based on their experiments, they consistently identified PageRank, keyword in hostname, and keyword in title as the most important ranking factors.
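As a rough illustration of the flavor of this approach (not the paper’s actual models), here’s a sketch that fits a regression tree mapping page features onto SERP positions and then orders pages by predicted position:

```python
# Fit a shallow regression tree from page features to SERP position,
# then reconstruct a predicted result ordering. Data is made up.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Each row: [pagerank, keyword_in_hostname, keyword_in_title]
X = np.array([[7, 1, 1], [5, 0, 1], [3, 0, 0], [6, 1, 0], [2, 0, 0]])
y = np.array([1, 2, 4, 3, 5])  # observed SERP positions

tree = DecisionTreeRegressor(max_depth=2).fit(X, y)
predicted = tree.predict(X)
# Pages sorted by predicted position = the model's predicted SERP.
print(np.argsort(predicted) + 1)  # 1-indexed page IDs in predicted order
```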
Unfortunately, the general validity of this paper’s results is highly suspect due to the limited data set that was used in the experiments (only 60 keywords were evaluated – 15 in the training set and 45 in the test set). However, despite this caveat, the paper is still encouraging because it gives us hope that we can actually catch the big G!
Happy Chasing!
I would love to hear from you in the comments. What are some of your favorite algorithm chasing resources? Do you have any exciting chase stories you’d like to share?
Hi Steve,
Thank you for including seobythesea.com in your list.
I’m not sure that I would call what I do on my blog “chasing” algorithms. I do look at a lot of algorithms while picking through and deconstructing patents, but I consider that more of a way to build an understanding and awareness of some of the approaches that search engines might use.
I like looking at patents and papers because they are primary sources from the search engines themselves, and that gives those documents some credibility and authority that we don’t necessarily always get from writers who don’t work at one of the search engines. They allow us the chance to gain the perspectives of search engineers, and the assumptions they make about the Web, and about search and searchers.
Search engines are black boxes, and a lot of the research search engineers do becomes proprietary information retrieval knowledge that isn’t necessarily shared with academics, with marketers, and with people who work on and rely upon the Web. We do sometimes see papers from Google or Microsoft or Yahoo sharing some information, but most of the research they conduct stays behind their corporate walls. Fortunately, the patent process does enable us to get some hints and peek behind those corporate walls.
It’s likely that some percentage of the research that Google or Microsoft or Yahoo conducts becomes trade secrets that we may never get any insight into. But knowing something about the information that does become public gives us the ability to be somewhat proactive in how we approach things like SEO.
Bill
Hi Bill,
Thanks for the comment!
First off, I just want to clarify that I am not trying to minimize the contributions of your blog by including them in this “algorithm chasing” discussion. The primary objective of this post is to raise awareness about excellent resources in the industry (e.g., your blog) that can be used to gain deeper insights into how search engine algorithms operate. The “chase” was meant to be more tongue-in-cheek than anything 😉
I completely agree with your assessment of patents and academic papers. They don’t offer unrestricted access to internal search engine operations, but their insights are still incredibly valuable. The Hadoop Project is an excellent example of this trade-off. Its core systems (e.g., MapReduce, HDFS, HBase, etc.) were all initially modeled after papers that described somewhat outdated internal Google systems (e.g., MapReduce, GFS, Bigtable, etc.). Although those Google systems had evolved significantly (or been replaced completely) by the time the papers were published, the papers still offered truly valuable insights into the inner workings of the “black box.”
You’re also spot on with your assessment of the trade secret nature of the search industry. I could envision a scenario where an up-and-comer (e.g., blekko) reveals significant details about their ranking algorithm, but we’ll never see full disclosure from one of the big boys.
Thanks again for reading and commenting!
Just wanted to comment on the 1,000-keyword sample we’re using. We actually talked about tapping into the much larger SEOmoz rank-tracking data, but decided against it for a couple of reasons. While big data has its advantages, any client rank-tracking data is going to change daily (as clients add and remove keywords). Those keywords have a lot of commercial intent, may have local intent, etc.
So, we chose to hand pick a sample of keywords, distribute it evenly across volume, and very tightly control it. Each keyword is crawled at roughly the same time of day, from the same location. Rankings naturally have a ton of background noise, so our primary goal was to reduce any variations. We’re going to shoot for a larger keyword set (10K-25K), but the goal will still be to screen that set and try to make it a representative sample.
I completely agree with the decision to focus on a collection of tightly controlled keywords. This is obviously a challenging problem, and the 1,000-keyword sample is an excellent starting point. It’ll be interesting to see how the results change when the keyword count jumps an order of magnitude.
On a more general note, you made a very important observation on Twitter: we’re much better off today (with 2 flux monitoring tools) than we were a few years ago (with 0 flux monitoring tools). I’m very excited to see how MozCast evolves over time, and I really appreciate the hard work you’ve put into making it a reality. If there’s anything I can do to help, please let me know 🙂
Absolutely – I’m glad there are two sets of numbers, and I’m glad we’re approaching the problem from different angles. We’re trying to validate our numbers against a black box, and that’s incredibly tough.
Although I have seen several of these tools before, there are quite a few that I haven’t seen. The weather report would be a good reference for those clients who love to send you their current rankings from an unclean browser over the weekend. I have been sent some fairly weird SERPs from weekend browsing which have been hard to explain other than “it’s a test”.
Hi Chris,
Thanks for the comment! I’m really glad you found some new tools 🙂
And you’re absolutely right… MozCast and SERPmetrics are both great for educating clients about the fluctuations that happen in the SERPs.
What about http://www.NotProvidedCount.com?
Hi Graham,
(Not Provided) Count is a very valuable resource for tracking the prevalence of the (not provided) keyword, but that’s a little different from chasing the ranking algorithm. The former deals with the disappearance of Google keyword referral data, and the latter focuses on analyzing the various ranking factors that dictate the ordering of pages in the SERPs. Hopefully, that distinction makes sense.
Thanks for the comment!
-Steve