The following article was written ages ago… it may not be accurate today. But read it anyway; I put a heap of work into it back in the day.
Google is a search engine; everyone knows that. But how does it actually work? In this article, we will go behind the scenes at Google, from how it crawls the web to how results are shown to you.
Part I: The Googlebot.
Like most search engines, Google ‘crawls’ the web with a ‘bot’ known as the Googlebot. It starts at a page submitted to Google and follows all the links on that page, then the links on the pages it reaches, and so on. Bots of this type are known as ‘spiders’ because of the way they spread across the web. By doing this, Google builds a database of websites and their content. This database is updated fairly regularly, but not instantly, which is why changes to a website do not always appear on Google straight away. The delay can be anywhere from a few hours to weeks, or even longer.
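To make the crawling idea concrete, here is a minimal sketch in Python. It is not how the Googlebot is actually written; it only shows the "start at one page, follow every link" loop. The names crawl, LinkExtractor and TOY_WEB are made up for this example, and the tiny in-memory "web" stands in for real HTTP requests.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch):
    """Breadth-first crawl; fetch(url) returns the page HTML or None."""
    database = {}                # url -> page content
    queue = deque([start_url])   # pages still to visit
    seen = {start_url}           # pages already queued, to avoid loops
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:         # page could not be fetched; skip it
            continue
        database[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return database


# A made-up three-page web (assumption: real crawlers fetch over HTTP).
TOY_WEB = {
    "a.example": '<a href="b.example">B</a> <a href="c.example">C</a>',
    "b.example": '<a href="a.example">A</a>',
    "c.example": '<a href="d.example">D</a>',
}

pages = crawl("a.example", TOY_WEB.get)
print(sorted(pages))
```

Starting from a.example, the sketch reaches b.example and c.example by following links. The link to d.example is followed too, but that page does not exist in the toy web, so nothing is stored for it.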
Part II: The Database.
Google’s database contains millions, if not billions, of websites. Because of the sheer amount of data, Google strips away what it does not need: common words such as ‘and’, ‘if’, ‘or’ and ‘but’ are dropped, pages are converted to lower case, and punctuation is ignored. Google then creates an index of words and the web pages that contain those words, much like the index in the back of a book.
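The steps above (lower-casing, dropping punctuation and common words, then mapping each word to the pages that contain it) can be sketched in a few lines of Python. This is only an illustration of the idea, usually called an inverted index. The stop-word list is just the four words mentioned above, and the page names and text are made up.

```python
import string

# Assumption: only the four common words named in the article are dropped.
STOP_WORDS = {"and", "if", "or", "but"}


def tokenize(text):
    """Lower-case the text, strip ASCII punctuation, drop stop words."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return [word for word in text.split() if word not in STOP_WORDS]


def build_index(pages):
    """Map each remaining word to the set of pages that contain it."""
    index = {}
    for url, text in pages.items():
        for word in tokenize(text):
            index.setdefault(word, set()).add(url)
    return index


# Made-up pages standing in for the crawled database.
PAGES = {
    "cats.example": "Cats and dogs, but mostly cats!",
    "dogs.example": "Dogs are loyal.",
}

index = build_index(PAGES)
print(sorted(index["dogs"]))
```

Looking up a word is now a single dictionary access rather than a scan of every page, which is the whole point of building the index ahead of time.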
Part III: The Search.
When a search is run on Google, it looks up the words of the query in the index. A list of the web pages that contain those words is sent to the document servers, where the original database of websites is stored. More information is then gathered, such as the title of each web page and the excerpt displayed below the result. This information is then collated and sent back to the browser in a fraction of a second.
Google receives more than 1.5 billion hits a day.
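The lookup described in this part can be sketched as two steps: find the matching pages in the index, then fetch the title and excerpt from the document store. The sketch below assumes that a page must contain every word of the query, which is a simplification of my own. INDEX, DOCUMENTS and search are hypothetical names with made-up data.

```python
# Hypothetical index: word -> pages containing it.
INDEX = {
    "cats": {"cats.example"},
    "dogs": {"cats.example", "dogs.example"},
    "loyal": {"dogs.example"},
}

# Hypothetical document store: page -> title and full text.
DOCUMENTS = {
    "cats.example": {"title": "All About Cats",
                     "text": "Cats and dogs, but mostly cats!"},
    "dogs.example": {"title": "Dog Facts",
                     "text": "Dogs are loyal."},
}


def search(query, index, documents, excerpt_length=20):
    """Return title and excerpt for every page containing all query words."""
    words = query.lower().split()
    if not words:
        return []
    # Step 1: intersect the page sets of each query word.
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    # Step 2: gather display information from the document store.
    results = []
    for url in sorted(matches):
        doc = documents[url]
        results.append({
            "url": url,
            "title": doc["title"],
            "excerpt": doc["text"][:excerpt_length],
        })
    return results


for hit in search("dogs loyal", INDEX, DOCUMENTS):
    print(hit["title"], "|", hit["url"], "|", hit["excerpt"])
```

The results here come back in alphabetical order of URL, which is only a placeholder; deciding the real order is the job of the ranking step.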
Part IV: The Rank.
The order in which Google displays results is not random, nor is it based on the number of times the query appears in the page. Instead, Google ranks web pages using a system known as PageRank. The details of PageRank are mostly a company secret; however, Google’s Corporate Information website says that Google uses “more than 500 million variables and 2 billion terms” to rank websites. The main ranking is based on how many links point to a page. For example, if a page is linked to from another site, its value increases. The amount by which it increases is determined by the value of the page that links to it, whose own value depends on the pages linking to it, and so on.
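The "value flows along links" idea can be sketched with the simplified, publicly described form of PageRank, in which each page repeatedly shares its value among the pages it links to. This is not Google's real ranking code, which uses far more signals. The damping factor of 0.85, the iteration count and the link graph below are assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank over a dict of page -> list of linked pages."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}   # start with equal values
    for _ in range(iterations):
        # Every page keeps a small base value regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Share this page's value equally among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its value evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Made-up link graph: a and b both link to c, and c links back to a.
LINKS = {
    "a.example": ["c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
}

ranks = pagerank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph c.example comes out on top because two pages link to it. a.example comes second because its single link comes from the highly valued c.example, and b.example comes last because nothing links to it.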
So, that’s pretty much the basis of Google’s search process. Of course, this isn’t everything, but I hope it has given you at least a rough idea of how Google works.
