As a responsible SEO, I’m supposed to fall in line with my SEO brethren and repeatedly chant the following mantra: “Don’t chase the algorithm! Focus on long-term strategy!! Do real company sh*t!!!”
That’s all great advice, but you know what? Being responsible is boring… and I love a good chase! Plus, if no one chases the algorithm, Matt Cutts will get bored.
No one wants Matt Cutts to be bored, so this post offers 7 resources to help you chase Google’s algorithm like a champ…
Detecting Algorithm Changes
First things first. If you don’t know when the algorithm changes direction, you’ll never catch it! With that in mind, these resources will help you detect algorithm changes.
1. SERP Fluctuations
When Google updates their algorithm, numerous sites rank differently (some sites move up in the rankings; other sites move down). These ranking changes create fluctuations in the SERPs that can be monitored to approximate the magnitude of the update.
MozCast provides a Google weather report, which describes recent (and historic) fluctuations in the Google SERPs for 1,000 keywords. If the reported Google weather is hot and stormy, the SERPs have been highly volatile. On the other hand, if the reported Google weather is cold and clear, the SERPs have been relatively stable.
The weather is calculated based on organic search results for a hand-selected set of 1,000 keywords. Every day, the SERP for each keyword is compared against the previous day’s SERP. The changes for each keyword are calculated, and then, all of the calculations are averaged and normalized to appear as a Fahrenheit temperature.
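MozCast’s exact formula isn’t public, but the pipeline described above (compare each keyword’s SERP against the previous day’s, average the changes, scale to a temperature) can be sketched roughly as follows. The keyword data, the change metric, and the 70°F baseline are all assumptions for illustration, not MozCast’s actual numbers:

```python
# Toy sketch of a MozCast-style "weather" metric. The real tool's change
# metric, weighting, and temperature scaling are not public; everything
# here is an illustrative assumption.

def serp_change(yesterday, today):
    """Fraction of today's positions that differ from yesterday's."""
    changed = sum(1 for a, b in zip(yesterday, today) if a != b)
    return changed / len(today)

def weather(serps_yesterday, serps_today, base_temp=70.0):
    """Average per-keyword change, scaled to a Fahrenheit-like temperature.

    The 70-degree 'normal turnover' baseline is an assumed constant.
    """
    changes = [serp_change(serps_yesterday[k], serps_today[k])
               for k in serps_today]
    avg = sum(changes) / len(changes)
    return base_temp * (1 + avg)  # hotter = more volatile SERPs

# Hypothetical top-3 results for two keywords on consecutive days
yesterday = {"seo tools": ["a.com", "b.com", "c.com"],
             "link building": ["x.com", "y.com", "z.com"]}
today     = {"seo tools": ["a.com", "c.com", "b.com"],   # two positions swapped
             "link building": ["x.com", "y.com", "z.com"]}  # unchanged

print(round(weather(yesterday, today), 1))  # → 93.3
```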
In addition to the daily report, MozCast also provides a 30-day historical view of the Google weather and a summary of major weather events (e.g., Panda, Penguin, and mysterious updates). For more information about the tool and how the weather is calculated, check out MozCast’s About page.
SERPmetrics offers a flux chart that shows recent SERP fluctuations for the three most popular search engines: Google, Bing, and Yahoo. The chart isn’t as pretty as MozCast’s weather-inspired graphics, but it’s slightly more useful because it draws from a larger sample size.
The daily flux is calculated based on organic search results for over 100,000 keywords (compared to 1,000 for MozCast), and the top 100 search results are evaluated for each keyword (compared to the top 10 for MozCast). Similar to MozCast, the changes for each keyword are calculated, and then, those calculations are averaged and plotted on the chart.
To provide an even better comparison of these two SERP monitoring tools, the following graph plots the Google fluctuations for both tools using data from July 8 through July 30. For comparison purposes, each tool’s data was normalized by scaling it to fall between 0 and 1.
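Scaling each series to fall between 0 and 1 is just min-max normalization. A minimal sketch (the daily values below are made up; MozCast reports temperatures, SERPmetrics reports flux scores, so the raw scales differ):

```python
# Min-max normalization: rescale a series so its minimum maps to 0 and its
# maximum maps to 1, allowing two differently-scaled series to be compared.

def normalize(series):
    lo, hi = min(series), max(series)
    return [(x - lo) / (hi - lo) for x in series]

mozcast     = [68, 72, 95, 70, 110, 66]               # hypothetical temperatures
serpmetrics = [0.12, 0.15, 0.40, 0.13, 0.55, 0.11]    # hypothetical flux scores

print(normalize(mozcast))
print(normalize(serpmetrics))
```

After normalization, both series span exactly [0, 1], so their peaks and troughs can be plotted on the same axis.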
As the graph shows, the tools are somewhat correlated (Spearman’s rho is 0.43); however, notable divergences have occurred. The most interesting differences can be observed between July 15 and July 29.
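Spearman’s rho measures how well the rank orderings of two series agree. For series without tied values it reduces to a simple formula over squared rank differences, sketched here with fabricated daily flux values (not the actual July data):

```python
# Hand-rolled Spearman's rank correlation coefficient (no tie handling).
# rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the per-item
# difference between the two series' ranks.

def ranks(xs):
    """Rank of each value within its series (1 = smallest)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(xs, ys):
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

a = [0.1, 0.9, 0.4, 0.7, 0.2]  # hypothetical tool-A daily flux
b = [0.2, 0.8, 0.3, 0.9, 0.1]  # hypothetical tool-B daily flux
print(round(spearman(a, b), 2))  # → 0.8
```

A rho of 0.43 between MozCast and SERPmetrics means the tools’ daily rankings of volatility agree only moderately.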
During this time period, both tools identify a significant amount of flux on July 16 and around July 23. However, SERPmetrics observes this flux as two prolonged periods, while MozCast observes it as two quick bursts and one prolonged period.
The most obvious explanation for these differences is that the two tools use independent collections of keywords. Additionally, SERPmetrics’ significantly larger sample size (more than 100,000 keywords and 100 results for each keyword) helps explain why its data identifies longer periods of flux (July 15 – July 18) and earlier signs of flux (July 22).
It’s important to note that these differences are NOT a bad thing. Each tool identifies flux in a sample of the SERP universe; therefore, when one tool identifies flux and the other doesn’t, it simply means that one collection of keywords is experiencing heavy turnover (while another collection is remaining relatively stable).
Thus, when either of the tools reports significant fluctuations, we can assume that something noteworthy is happening. Continuing with that train of thought, when both tools report significant fluctuations, we can assume that a major update is happening.
2. Webmaster Forums
The previous tools are great for identifying fluctuations in the SERPs, but they don’t tell us why those fluctuations are happening. For that information, we need to rely on the power of the crowd.
When SERPs are fluctuating, sites are dropping in the search rankings. And when sites start dropping, webmasters start talking. Thus, to learn more about SERP fluctuations, we need to monitor webmaster conversations, and the best place to do that is in webmaster forums.
Here are a few of the best forums for observing webmaster conversations:
Alternatively, if you don’t have time to monitor these forums, you can read Barry Schwartz’s blog: Search Engine Roundtable. Each day, Barry covers the most important search stories as they are happening in the search forums. It’s like having your own tour guide for the forums.
3. Google Announcements
Another source of information about Google’s algorithm is… drum roll, please… Google! Unfortunately, since they’re the ones we’re chasing, we have to take their guidance with a huge grain of salt.
(a) Google Blogs
(b) Google Social Media Accounts
Google has a Google+ account as well as a Twitter account; however, the latter typically provides more useful information for chasing purposes. For example, here’s a recent Google tweet about Panda 3.9:
New data refresh of Panda starts rolling out tonight. ~1% of search results change enough to notice. More context: goo.gl/huekf
— A Googler (@google) July 24, 2012
(c) Matt Cutts
(d) SEOmoz’s Google Algorithm Change History
If you miss one of the official announcements from Google, SEOmoz has you covered. This excellent resource documents major updates dating all the way back to 2000.
Determining Ranking Factors
The previous section is full of resources that help us identify algorithm changes. This section presents resources that help us determine the ranking factors used by Google’s algorithm.
4. Algorithm Surveys
Surveys are an excellent way to pool a community’s collective knowledge about a given topic. Since everyone has an opinion about the importance of various ranking factors, a survey helps identify factors that are generally accepted as being important.
Every other year, SEOmoz surveys the biggest names in the industry about their opinions on ranking factors. In 2011, 134 SEO professionals were surveyed to identify Google’s most likely ranking factors.
Here is a chart that shows the survey results for domain level keyword usage:
For more results, check out SEOmoz’s complete list of Survey Data.
This Search Engine Land graphic summarizes the major ranking factors used by Google to determine which pages belong at the top of the SERPs.
The table breaks the factors into 9 categories (content, HTML, architecture, links, social, trust, personal, violations, and blocking), and each factor is given a weight (-3 for strong penalty potential up to +3 for strong improvement potential).
For more details about these factors, read Search Engine Land’s Guide to SEO.
(c) The Search for 200
Back in 2009, Matt Cutts commented that Google’s algorithm incorporated more than 200 variables, a remark that immediately prompted fellow chasers to start itemizing those variables.
5. Correlation Studies
Although I respect Michael Martinez’s stance on correlation studies (Google Correlation Studies are Sham Search Engine Optimization), I ultimately have to side with Rand (Why the Marketing World Needs More Correlation Research).
Yes. Correlation is not causation. But I still think it’s incredibly interesting to identify ranking factors that are strongly correlated to higher rankings. So… save the flame war, and just keep reading.
In 2011, SEOmoz augmented their ranking factors survey with a correlation-based analysis. Specifically, they collected 10,271 keyword search results from Google. Then, they analyzed the relationship between those search results and various proposed algorithm components using Spearman’s rank correlation coefficient.
Here is an illustration of the correlation data for page level social metrics:
For more results, check out SEOmoz’s complete list of Correlation Data.
Mark Collier started this project in an effort to identify the ranking factors used by search engine algorithms. To date, Mark’s work has focused on quantifying the correlation between various potential ranking factors and actual Google rankings.
Specifically, Mark collected a data set of the top 100 organic search results from Google for 12,573 keywords. Then, he used Spearman’s rank correlation coefficient to measure the relationship between Google rankings and more than 150 potential ranking factors.
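The methodology shared by these correlation studies can be sketched in a few lines: for each keyword, rank-correlate a candidate factor (backlink counts, say) against the SERP positions, then average the per-keyword coefficients. All data below is fabricated for illustration; the real studies used thousands of keywords and many factors:

```python
# Sketch of the correlation-study methodology: one Spearman coefficient per
# keyword SERP, averaged across keywords. Data is invented for illustration.

def spearman(xs, ys):
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical backlink counts for the results at positions 1..5, per keyword
serps = {
    "seo tools":     [900, 400, 500, 120, 80],
    "link building": [700, 650, 300, 310, 90],
}
positions = [1, 2, 3, 4, 5]

# Negate so that "more backlinks at better positions" reads as positive
coeffs = [-spearman(backlinks, positions) for backlinks in serps.values()]
mean_rho = sum(coeffs) / len(coeffs)
print(round(mean_rho, 2))  # → 0.9
```

A high mean coefficient says the factor tracks rankings closely; as the next section notes, it says nothing about causation.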
Here is a chart of the correlation data for page backlinks:
To view more of Mark’s results, check out his Correlation Data.
6. Search Patents
How do search engines protect their most important intellectual property? Patents! Lots and lots of patents! Thus, if you want to learn more about this intellectual property, you should read search engine patent filings.
If you’re new to the patent game, I strongly recommend Bill’s article about the 10 Most Important SEO Patents.
I’m also a huge fan of his posts about the 100 Best SEO Documents of All Time. In addition to patents, these SEO documents also include academic research papers, which is an excellent segue to our final algorithm chasing resource…
7. Academic Research
Most search-related academic research is focused on improving the search experience (e.g., optimizing IR systems, duplicate content detection, Web spam identification, etc.). All of this work is valuable, but for our algorithm chase, we’re interested in papers that attempt to identify ranking factors.
In this 2005 paper, Bifet et al. attempted to approximate Google’s ranking algorithm using query results. Specifically, they used popular machine learning algorithms (e.g., logistic regression, support vector machines, and binary classification trees) to build models based on various page features (e.g., content features, formatting features, link features, etc.).
Then, they used the models to predict Google search rankings. Unfortunately, the models only marginally outperformed the strongest individual feature (i.e., the feature with the most predictive power) for a given keyword category. Based on this result, the authors concluded that Google uses numerous ranking factors that are “hidden” (i.e., not directly observable outside of Google).
In this 2009 paper, Egele et al. developed a classification technique that identifies spam in search engine results. However, this paper’s approach to identifying important ranking factors is far more interesting than its classification techniques.
Specifically, the authors identified 10 important features (e.g., keyword in title, number of backlinks, keyword in domain, etc.). Then, they created numerous test pages with various combinations of these features (all optimized for the same fake keyword).
Based on the relative rankings of these pages in the search engine results, they observed that the keyword’s presence in the title and text body had the strongest positive influence on the ranking. However, since the paper is 3 years old, the actual results of its experiments are not nearly as important as its general methodology for identifying ranking factors.
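The generation step of that methodology is essentially enumerating every on/off combination of the candidate features, one test page per combination, all targeting the same fake keyword. A minimal sketch (the feature names follow the paper’s examples; actual page creation and indexing are out of scope):

```python
# Sketch of the Egele et al. test-page setup: one page per on/off combination
# of candidate ranking features, all optimized for the same fake keyword.
from itertools import product

features = ["keyword_in_title", "keyword_in_body", "keyword_in_domain"]

test_pages = []
for combo in product([False, True], repeat=len(features)):
    page = dict(zip(features, combo))  # e.g. {"keyword_in_title": True, ...}
    test_pages.append(page)

print(len(test_pages))  # → 8 (2^3 combinations)
# After the pages are indexed, comparing their relative rankings for the fake
# keyword reveals which feature combinations the engine rewards.
```

With 10 features, as in the paper, this grows to 1,024 combinations, which is why the authors limited themselves to a small feature set.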
In this 2010 paper, Su et al. claimed to successfully reverse engineer Google’s algorithm using linear learning models and a recursive partitioning ranking scheme. Using their approach, they were able to correctly predict 7 out of a SERP’s 10 pages for 78% of their evaluated keywords.
The authors used 17 features to represent each page (e.g., PageRank, keyword in hostname, keyword in title, etc.), and based on their experiments, they consistently identified PageRank, keyword in hostname, and keyword in title as the most important ranking factors.
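The “7 out of 10” figure is just the overlap between the model’s predicted top-10 and Google’s actual top-10, which is trivial to compute as a set intersection. A sketch with invented URLs:

```python
# The Su et al. accuracy metric: how many URLs the predicted top-10 shares
# with the actual top-10. All URLs below are invented for illustration.

def top10_overlap(predicted, actual):
    """Number of URLs common to both top-10 lists (order ignored)."""
    return len(set(predicted[:10]) & set(actual[:10]))

actual    = [f"site{i}.com" for i in range(10)]
predicted = [f"site{i}.com" for i in [0, 1, 2, 9, 4, 5, 6, 11, 12, 13]]

print(top10_overlap(predicted, actual))  # → 7
```

A model clearing 7 of 10 for 78% of keywords is recovering most of a SERP from only 17 observable features.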
Unfortunately, the general validity of this paper’s results is highly suspect due to the limited data set used in the experiments (only 60 keywords were evaluated – 15 in the training set and 45 in the test set). However, despite this disclaimer, the paper is still encouraging because it gives us hope that we can actually catch the big G!
I would love to hear from you in the comments. What are some of your favorite algorithm chasing resources? Do you have any exciting chase stories you’d like to share?