A Panda Wandering in the Siberian Steppe...

Following the recent shake-up in Google’s organic results after the Panda update, SEOs and website owners alike have become increasingly concerned about the future of natural search. There is no doubt that most of what works today for SEO is not going to work for very long. After all, most of what was effective 10 or even 5 years ago can no longer deliver ranking gains.

According to Google’s official blog post, the Panda update “was designed to reduce ranking for low quality sites – sites which are low-value add for users, copy content from other websites or sites that are just not very useful.” But as anyone in the field would expect, SEOs and website owners who suffered heavy traffic losses from the update strongly criticised it.

Applying Experts’ Opinions Algorithmically to Define Quality

Probably the most interesting aspect of the Panda update was that, for the first time, Google admitted to wired.com that they used an evaluation system based on specific questions put to human participants:

“We used our standard evaluation system that we’ve developed, where we basically sent out documents to outside testers. Then we asked the raters questions like: ‘Would you be comfortable giving this site your credit card? Would you be comfortable giving medicine prescribed by this site to your kids?’”

Based on the answers to such questions, Google defined what humans would consider low quality content and tried to apply that definition algorithmically. Whether they succeeded or not is questionable, but judging from the various follow-up Panda updates, it seems that it was rolled out without being sufficiently tested, as many sites have been hit rather unfairly.


One thing is certain: if Google could genuinely distinguish between low and high quality content, then Panda should hit specific pages rather than entire sites. It seems harsh to devalue content algorithmically without introducing any kind of adaptive machine learning. Not all verticals are the same, so it is unfair to treat all sites the same way. Low quality content on a news site, for instance, could be content that hasn’t been written in a journalistic way, or rumours presented as news. On a photographer’s site it may be a badly taken shot, whilst on a recipe site it could mean a lack of descriptive cooking instructions or of high quality images.


Quality, Usefulness and Subjectivity

Another important aspect is that of subjectivity. Quality and usefulness are very subjective attributes, so what may be high quality for one person may be low quality for another. Isn’t it humiliating to rely on Google to decide what is and what isn’t quality content? How could an algorithm make a decision on what is useful for a human being? Isn’t it a kind of censorship that a group of people have defined what they consider to be low quality and then passed their judgements on to another, more technically minded group, which then attempted to propagate all those judgements programmatically? Can anyone imagine what the reaction would be if the BBC were to arbitrarily define type A programmes (e.g. documentaries) as high quality and type B programmes (e.g. football games) as low quality, and hence broadcast only type A? And what about those who find football more interesting?

In order to define quality content Google should first define quality. It sounds rather arrogant of them to define quality content, while at the same time they don’t reveal how that is being measured. Who are the actual evaluators? What is their background? Are they scientists or artists? Male or female? Americans, French or Japanese? Are there any intellectuals and philosophers among them? Quality cannot be imposed or taught, nor can it be measured. Why should we rely on an algorithm that sorts web pages in a rather simplistic Boolean way in its attempt to bring quality content to the front?


Usability as a Ranking Factor?

Speaking of quality on the web, usability should be a key factor. Although Matt Cutts has recently admitted that site usability is not a ranking factor and that Panda isn’t targeted at usability, there is no reason to rule it out as a future addition to Google’s search engine algorithm. Once Google figure out how to measure usability algorithmically, usability evaluation will no doubt be incorporated into the ranking signals.


MatrixNet: An Algorithm That Learns

Unlike Google, who follow the ‘one size fits all’ dogma, Yandex have taken a totally different approach in order to return the best possible results. Machine learning sits at the core of MatrixNet, the engine that powers Yandex, Russia’s most successful search engine:

“...a search engine has to be able to make decisions based on the previous experience, that is, it has to learn...After a search engine has found dependencies between web pages in the learning sample and their properties, it can choose the best ranking formula for the search results it can deliver to a specific user’s query and return the most relevant of them on top of all the rest.”


In order to meet users’ expectations, Yandex sampled several user queries along with the desired results, as judged by human assessment. The ‘assessors’, similar to Google’s evaluators, decide what the ideal result for a given search query should be. This process is repeated several times until a sufficiently large sample has been built, which is then fed into the search engine. With the use of machine learning and artificial intelligence, MatrixNet identifies the dependencies between the pages that make up the good results.

By analysing the properties of the relevant result pages, the algorithm can learn which other pages could be of equally high quality. The more properties are evaluated, the better the quality of the results. Extending the number of assessed attributes helps avoid erroneous dependencies. For instance, two sites which have both been assessed as good quality may both happen to have a red background colour and a white font, but this doesn’t mean that these are the attributes responsible for the relevant results. According to Yandex, the MatrixNet formula is based on tens of thousands of ranking factors, as opposed to a few hundred signals in Google’s algorithm.
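
To make the idea a little more concrete, here is a toy sketch in Python of learning a ranking formula from an assessed sample. It is not how MatrixNet works (Yandex’s actual method is not described here); it uses a deliberately crude stand-in, weighting each attribute by the difference between its average value on pages the assessors marked relevant and on pages they marked irrelevant. The attribute names, the numbers and the candidate pages are all made up for illustration.

```python
from statistics import mean

# Hypothetical page attributes, each scaled to the range 0..1.
FEATURES = ["text_depth", "red_background", "inbound_links"]

# Made-up assessor sample: (attribute values, 1 = relevant, 0 = not relevant).
ASSESSED = [
    ([0.9, 1.0, 0.8], 1),
    ([0.8, 1.0, 0.6], 1),
    ([0.7, 0.0, 0.7], 1),
    ([0.8, 0.0, 0.9], 1),
    ([0.2, 1.0, 0.1], 0),
    ([0.3, 0.0, 0.3], 0),
    ([0.1, 1.0, 0.2], 0),
    ([0.2, 0.0, 0.2], 0),
]


def learn_weights(sample):
    """Weight each attribute by (mean on relevant) - (mean on irrelevant)."""
    relevant = [feats for feats, label in sample if label == 1]
    irrelevant = [feats for feats, label in sample if label == 0]
    # zip(*rows) transposes the rows into one column per attribute.
    return [
        mean(rel_col) - mean(irr_col)
        for rel_col, irr_col in zip(zip(*relevant), zip(*irrelevant))
    ]


def score(weights, feats):
    """Linear ranking formula: weighted sum of the attribute values."""
    return sum(w * x for w, x in zip(weights, feats))


def rank(weights, pages):
    """Return page names ordered from highest to lowest score."""
    return sorted(pages, key=lambda name: score(weights, pages[name]), reverse=True)


weights = learn_weights(ASSESSED)

# Unseen pages the learned formula has to order (also made up).
candidates = {
    "red_but_thin": [0.1, 1.0, 0.1],
    "plain_but_thorough": [0.9, 0.0, 0.5],
    "average": [0.5, 0.0, 0.4],
}
ranking = rank(weights, candidates)

print(dict(zip(FEATURES, weights)))
print(ranking)
```

Because the red background turns up just as often on the bad pages in this sample as on the good ones, it ends up with no weight at all, so a thin red page gains nothing from its colour scheme. A real system would use far more attributes and a far more capable learning method, but the principle of finding dependencies in an assessed sample is the same.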

The launch of MatrixNet in 2009 was followed by Yandex becoming the fastest-growing search engine, ahead of Google, Yahoo and Baidu. The reason behind Yandex’s success isn’t so much the fact that it has dealt very successfully with the peculiarities of Russian, an inflected language. The main reason behind the success story is Yandex’s search technology and, in particular, the quality of the results powered by MatrixNet.


Was Panda Inspired by MatrixNet?

Yandex is still beating Google in Russia, and it doesn’t seem like Google stand a good chance of taking over in the near future. With the recent Panda update, it seems that Google have tried to introduce some of Yandex’s MatrixNet innovation into their algorithm, with questionable success. However, Google still have a long way to go as far as machine learning and artificial intelligence are concerned.

Given that Yandex focus exclusively on Slavic-speaking countries, there doesn’t seem to be any threat to Google in the foreseeable future. However, this lack of competition may be a great missed opportunity to enhance the quality of natural search results more swiftly by introducing some advanced artificial intelligence. Is there any food for Pandas in Siberia?


* This post was written by Modi (AKA Modi Mann). You can follow Modi on Twitter @macmodi