Chapter 13. PageRank in Google

As the volume of information on the Internet grew, along with the share of it indexed by search engines, search engine developers ran into a problem: it became difficult to rank search results correctly because the number of documents equally relevant to a query was large. In addition, ranking methods that had been developed for controlled document collections were unprotected against simple manipulation. To secure a good position, one could simply copy the keyword placement structure of a text that already ranked well for the query. It became necessary to separate more reliable information from less reliable and to take into account the «importance» and «authoritativeness» of the resources providing it. How can this be done? A natural basis is data on page popularity, for example, how often a page is visited.

A model was developed that emulates a user moving through the documents of the network by following links from one document to another. The user is assumed to follow any link on a page with equal probability. The probability that the user arrives at a particular document therefore depends on the number of links pointing to it. It also depends on the probability that the user is on each linking document and on the number of outgoing links in that document. This probability was adopted as the indicator of authoritativeness, or PageRank:

PRa = (1 - d) + d · Σ (PRi / Ci)

where the sum is taken over all pages i that link to page a, and:

PRa – the PageRank of the page under consideration,
d – the attenuation (damping) factor: the probability that a user who has arrived at a page follows one of the links on it rather than stops travelling through the network; it is usually set to 0.85,
PRi – the PageRank of page i, which links to page a,
Ci – the total number of links on page i.
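
The user-movement model described above can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the four-page link graph `toy_web`, the function name `simulate_surfer`, the step count and the random seed are all hypothetical, and a user who "stops" is assumed to start again from a page chosen uniformly at random. The share of visits each page receives serves as a rough estimate of the probability that the user is on that page.

```python
import random


def simulate_surfer(links, d=0.85, steps=200_000, seed=0):
    """Estimate how often a random user lands on each page.

    links: dict mapping each page to the list of pages it links to.
           Assumption: every page has at least one outgoing link.
    d:     probability of following a link from the current page.
    """
    rng = random.Random(seed)          # fixed seed so runs are repeatable
    pages = sorted(links)
    visits = {page: 0 for page in pages}
    current = rng.choice(pages)
    for _ in range(steps):
        visits[current] += 1
        if rng.random() < d:
            # Follow one of the links on the page, all equally likely.
            current = rng.choice(links[current])
        else:
            # Assumption: the user stops and restarts on a random page.
            current = rng.choice(pages)
    return {page: visits[page] / steps for page in pages}


# Hypothetical link graph: A links to B and C, B to C, C to A, D to C.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
visit_share = simulate_surfer(toy_web)
for page, share in sorted(visit_share.items()):
    print(page, round(share, 3))
```

In this toy graph no page links to D, so the user reaches it only after a restart, and its share of visits stays small compared with the pages that receive links.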

One of the main misconceptions is that PageRank can be calculated with this formula for a single document from the known PageRank values of the documents that link to it. This should not be done. To calculate the PageRank of any document, you need to set up a system of N linear equations of this type, one for each document in the search base, where N is the number of documents in the search base. The sum of the PageRank values over all documents (i.e. the probability that the user is on some page) is equal to 1; for this, the free term (1 - d) is multiplied by 1/N. The system has N unknowns. Solving it gives the PageRank value of every document known to the search engine. The search bases of many search engines contain a huge number of documents. Even though the matrix of the system of equations is sparse, solving the system numerically requires huge computing resources. The search engine therefore has to simplify the computation. The specific details of how the classic PageRank formula is implemented are a commercial secret of the search engines.
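
For a very small search base, the system of equations can be solved directly by repeated substitution. The sketch below is a minimal illustration, not the method any search engine actually uses: the function name `pagerank`, the four-page graph `toy_web`, the tolerance and the iteration limit are assumptions made for the example. It uses the normalized form of the formula, with the free term (1 - d) multiplied by 1/N, and assumes every page has at least one outgoing link.

```python
def pagerank(links, d=0.85, tol=1e-12, max_iter=1000):
    """Solve the N PageRank equations for a small link graph by iteration.

    links: dict mapping each page to the list of pages it links to.
           Assumption: every page has at least one outgoing link and
           every link target is itself a key of the dict.
    """
    pages = sorted(links)
    n = len(pages)
    pr = {page: 1.0 / n for page in pages}   # start from equal values
    for _ in range(max_iter):
        # Free term (1 - d) multiplied by 1/N for every page.
        new = {page: (1.0 - d) / n for page in pages}
        # Each page i passes d * PRi / Ci to every page it links to.
        for src, targets in links.items():
            share = d * pr[src] / len(targets)
            for dst in targets:
                new[dst] += share
        change = sum(abs(new[page] - pr[page]) for page in pages)
        pr = new
        if change < tol:
            break
    return pr


# Hypothetical link graph: A links to B and C, B to C, C to A, D to C.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page, value in sorted(ranks.items()):
    print(page, round(value, 4))
```

In this example the values sum to 1, page C (which receives the most links) gets the highest value, and page D (which no page links to) gets exactly (1 - d)/N. The values for A, B and C depend on one another, which is why they cannot be computed one document at a time.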

For a particular document loaded in the browser, you can find out its normalized PageRank value by downloading and installing Google ToolBar, a special toolbar for working with this search engine.

Obviously, the size of a search engine's database (DB) is a very important factor, but the quality of the DB, together with how frequently it is updated, also plays a major role. The aim of every search engine (SE) is to provide the most relevant response to a search query. A search algorithm is a mathematical formula that takes the user's query as input. Of the many possible variants of output for a query, the SE delivers only one, according to the formula. The search algorithm provides relevant results to users by comparing the keywords of the query with the information in the DB.