How search engines work

Finding Information on the Web:

To find pages on the World Wide Web, search engines employ a special kind of program called a ‘spider’. Spiders start with a set of very popular, heavily used websites and follow every link on those pages. In keeping with the arachnid naming of the World Wide Web, this process of following links is called crawling. By following every link, spiders spread out from website to website and index the pages they encounter. Most search engines also let webmasters submit their sites manually, in case a site is not linked to often enough to be found this way. Together, crawling and manual submission let search engines reach an almost exhaustive network of the information located on the WWW.
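
The crawling process can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the link graph LINKS, the example site names and the crawl function are all hypothetical, and a small in-memory dictionary stands in for the Web so that the sketch runs without network access. A real spider would fetch and parse each page instead.

```python
from collections import deque

# Hypothetical link graph standing in for the Web: page -> pages it links to.
LINKS = {
    "popular.example": ["news.example", "shop.example"],
    "news.example": ["popular.example", "blog.example"],
    "shop.example": ["blog.example"],
    "blog.example": [],
    "orphan.example": ["popular.example"],  # no other page links here
}

def crawl(seeds, links):
    """Visit pages breadth-first from the seeds; return them in visit order."""
    visited = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        visited.append(page)  # a real spider would fetch and index the page here
        for target in links.get(page, []):
            if target not in seen:  # never queue the same page twice
                seen.add(target)
                queue.append(target)
    return visited

# Crawling from one popular site never reaches the unlinked page.
found = crawl(["popular.example"], LINKS)
print(found)

# A manual submission amounts to adding one more starting point.
found_with_submission = crawl(["popular.example", "orphan.example"], LINKS)
print(found_with_submission)
```

The second call shows why manual submission matters: a page that nothing links to is only found when it is handed to the spider directly.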

Indexing Web Pages:

Different search engines index the information they find on Web pages in different ways. We will look into these differences in detail in coming sections. In general, however, a search engine may choose to index (that is, store in a database the information and keywords it finds) either part or all of a page. Google, for example, is known to index every word in a document, leaving out only the articles (‘a’, ‘an’ and ‘the’). Some other search engines, such as Lycos, index the page header, link text, sub-headings, the first 20 words of the document and the 100 most commonly used keywords.
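
The difference between indexing all of a page and indexing part of it can be sketched as follows. Both functions are hypothetical simplifications written for this example, not the actual method of any search engine: the first keeps every word except the articles, and the second keeps only the title words, the first few words of the body and the most frequent words.

```python
import re
from collections import Counter

ARTICLES = {"a", "an", "the"}

def tokenize(text):
    """Split text into lowercase words, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def full_index_terms(text):
    """Whole-page approach: every distinct word except the articles."""
    return {word for word in tokenize(text) if word not in ARTICLES}

def partial_index_terms(title, body, first_n=20, top_k=100):
    """Partial approach: title words, first_n body words, top_k frequent words."""
    words = tokenize(body)
    terms = set(tokenize(title))
    terms.update(words[:first_n])
    terms.update(word for word, _ in Counter(words).most_common(top_k))
    return terms

full_terms = full_index_terms("The spider follows a link to an index")
partial_terms = partial_index_terms(
    "Web spiders",
    "crawling the web is how a spider finds pages pages pages",
    first_n=2,
    top_k=1,
)
print(sorted(full_terms))
print(sorted(partial_terms))
```

The small values of first_n and top_k in the usage example are chosen only to make the effect visible on a short text.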
Once a search engine has indexed a page, it needs some mechanism for storing this information in its database. Simply storing the keywords alongside the URLs would not be enough, because the engine would then have no way of telling which words matter most on which pages. To overcome this, search engines assign weights to keywords depending on where they appear in the document and how frequently they are used. Each search engine takes its own approach to weighting, which is why the results for the same search can differ widely from one engine to another.
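
A minimal sketch of such a weighting scheme is shown below. The field names and the numbers in FIELD_WEIGHTS are assumptions made up for this example, since search engines do not publish the weights they use. Each occurrence of a word adds the weight of the field it appears in, so both position and frequency raise the total.

```python
import re
from collections import defaultdict

# Hypothetical position weights: a word in the title counts for more
# than the same word in a heading, which counts for more than body text.
FIELD_WEIGHTS = {"title": 3.0, "heading": 2.0, "body": 1.0}

def keyword_weights(page):
    """Map each word to the sum of field weights over all its occurrences."""
    weights = defaultdict(float)
    for field, text in page.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            weights[word] += FIELD_WEIGHTS.get(field, 1.0)
    return dict(weights)

page = {
    "title": "Web spiders",
    "heading": "How spiders crawl",
    "body": "Spiders follow links. Links lead to new pages.",
}
weights = keyword_weights(page)

# Print the words from heaviest to lightest.
for word, weight in sorted(weights.items(), key=lambda item: (-item[1], item[0])):
    print(word, weight)
```

Here ‘spiders’ scores highest because it appears in the title, the heading and the body, while ‘links’ appears twice but only in the body.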
Search engines need to maintain a lookup table (a data structure) for the keywords they index. A hash table is among the most effective data structures for this purpose, because it can find the entry for a keyword in roughly constant time on average, however many keywords are stored.
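
As an illustration, Python's built-in dict is a hash table, so a keyword lookup table can be sketched with it directly. The page names, weights and function names below are hypothetical. The table maps each keyword to the pages that contain it, together with the keyword's weight on each page.

```python
from collections import defaultdict

def build_index(pages):
    """Turn url -> {word: weight} into a lookup table word -> {url: weight}."""
    index = defaultdict(dict)
    for url, word_weights in pages.items():
        for word, weight in word_weights.items():
            index[word][url] = weight
    return index

def lookup(index, word):
    """Return the pages containing the word, or an empty dict if none do."""
    return index.get(word.lower(), {})

# Hypothetical per-page keyword weights, as a weighting step might produce.
pages = {
    "a.example": {"spiders": 6.0, "links": 2.0},
    "b.example": {"spiders": 1.0, "hashing": 4.0},
}
index = build_index(pages)

print(lookup(index, "spiders"))
print(lookup(index, "hashing"))
print(lookup(index, "unknown"))
```

Each call to lookup is a single hash-table access, so its cost does not grow with the number of keywords in the table.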

Preparing and Ordering Search Results:

A simple query on a search engine may match a great many websites. You may have noticed this yourself when a search returns millions of results. How, then, do search engines order these results? One parameter is of course the keyword weighting system discussed above. However, because search engines want users to be satisfied with their results, they also order the results for a query according to a quality score that they assign to each website. This also helps them guard against misuse by unscrupulous webmasters who deliberately stuff keywords into important places on a page. Different search engines have evolved widely different techniques for ordering and displaying results. Some engines' algorithms are also highly dynamic: they apply large amounts of analytics to users’ behaviour patterns, such as which sites are clicked, and change continuously based on the feedback from every query.
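
The interplay between keyword weight and quality score can be sketched as below. The formula (keyword weight multiplied by quality score), the site names and all the numbers are assumptions chosen for illustration; real ranking formulas are not public and are far more elaborate.

```python
def rank(candidates, quality, default_quality=0.5):
    """Order pages by keyword weight times site quality, best first.

    candidates: url -> keyword weight for the query
    quality:    url -> quality score between 0 and 1
    """
    scored = [
        (weight * quality.get(url, default_quality), url)
        for url, weight in candidates.items()
    ]
    # Highest score first; ties are broken alphabetically by URL.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in scored]

# Hypothetical data: the keyword-stuffed page has the highest keyword
# weight but a very low quality score.
candidates = {"stuffed.example": 9.0, "guide.example": 6.0, "forum.example": 2.0}
quality = {"stuffed.example": 0.1, "guide.example": 0.9, "forum.example": 0.6}

ranked = rank(candidates, quality)
print(ranked)
```

With keyword weight alone, the stuffed page would come first. Once the quality score is factored in, it drops to last place, which is the effect the quality score is meant to achieve.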