How Search Engines Work
Search engines are run by automated algorithms. They use programs called ‘spiders’ (also known as crawlers) to go out, follow links and find websites. The spiders read the content of each site and take the information back to base, where it is stored. Separate software, usually called an indexer, later processes this data, assessing your content and rating it against similar pages. Your page is then ranked within the search index, along with other factors described elsewhere on this site. One of the tricks to getting the search engine spiders to visit your site is to have lots of backlinks (links from other sites to yours); generally, the more the better. Regular updates to your site also encourage the spiders to return regularly.
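To make the spider's job concrete, here is a minimal Python sketch of link-following as a breadth-first crawl. It assumes a tiny, hypothetical in-memory `WEB` dictionary in place of real HTTP fetching, and the `crawl` function name is illustrative only. Note that `e.example`, which no other page links to, is never discovered, which is why backlinks matter.

```python
from collections import deque

# A tiny, hypothetical "web": each URL maps to (page text, outgoing links).
# A real spider would fetch pages over HTTP; this dict stands in for that.
WEB = {
    "a.example": ("Welcome to site A", ["b.example", "c.example"]),
    "b.example": ("Site B links back to A", ["a.example"]),
    "c.example": ("Site C links to D", ["d.example"]),
    "d.example": ("Site D has no outgoing links", []),
    "e.example": ("Site E: nothing links here", []),
}


def crawl(seed: str) -> dict[str, str]:
    """Breadth-first crawl: follow links from `seed`, storing each page's text once."""
    store: dict[str, str] = {}  # the data "taken back to base"
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        if url in store or url not in WEB:
            continue  # skip pages already stored or not reachable
        text, links = WEB[url]
        store[url] = text
        queue.extend(links)  # discover new pages through their links
    return store


if __name__ == "__main__":
    print(sorted(crawl("a.example")))
    # ['a.example', 'b.example', 'c.example', 'd.example']
```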
A search engine is a database of resources extracted from the Internet through an automated "crawling" process. Users search this database by entering queries.
How does a search engine work? Words or phrases you enter in the search box are matched to resources in the search engine's database that contain your terms. These matches are then automatically sorted by their probable relevance, and the most "relevant" sites appear first.
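Matching query words against the database is commonly done with an inverted index, a map from each word to the pages that contain it. The Python sketch below is a simplified illustration, not any particular engine's method: `build_index` and `search` are hypothetical names, only pages containing every query word count as hits, and "relevance" is approximated crudely by how often the query words occur on the page.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, dict[str, int]]:
    """Inverted index: word -> {url: times the word appears on that page}."""
    index: dict[str, dict[str, int]] = defaultdict(dict)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word][url] = index[word].get(url, 0) + 1
    return index


def search(index: dict[str, dict[str, int]], query: str) -> list[str]:
    """Return pages containing every query word, most occurrences first."""
    words = tokenize(query)
    if not words:
        return []
    # Hits: pages that contain all of the query terms.
    hits = set(index.get(words[0], {}))
    for word in words[1:]:
        hits &= set(index.get(word, {}))

    def score(url: str) -> int:
        # Crude relevance: total occurrences of the query terms on the page.
        return sum(index[word][url] for word in words)

    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(hits, key=lambda url: (-score(url), url))


if __name__ == "__main__":
    pages = {
        "garden.example": "Garden tips: roses, roses and more roses",
        "shop.example": "Buy roses and tulips",
        "news.example": "Local news",
    }
    index = build_index(pages)
    print(search(index, "roses"))         # ['garden.example', 'shop.example']
    print(search(index, "roses tulips"))  # ['shop.example']
```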
How search results are organized
Once a search engine has used your search terms to gather "hits" from its database, it lists, or "ranks", the resulting sites in order of its own estimate of their relevance. The procedures and factors used to create this ranking are often trade secrets, so it is difficult to understand exactly why one hit is listed above another.
The following is a survey of some of the factors search engines use to automatically sort web sites for presentation to the user.
Relevance Prediction
Currently, search engines predict relevance based on two sets of factors: those based on a site's content and those external to the site.
Factors based on a web site's content
* Word frequency (How often search terms occur in a page relative to the other text)
* Location of search terms in the document (Are they in the title? Are they near the top of the page?)
* Relational clustering (How many pages in the site contain the search terms?)
* The site's design (Does it use frames? How fast does it load?)
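The first two content factors, word frequency and location, can be combined into a single on-page score. This minimal Python sketch assumes a single-word search term and made-up weights (`TITLE_BONUS`, `EARLY_BONUS` and `EARLY_WORDS` are hypothetical); it leaves out relational clustering and site design, and it is not how any specific search engine weighs these factors.

```python
import re

# Hypothetical weights for illustration only; real engines do not publish theirs.
TITLE_BONUS = 2.0  # term appears in the page title
EARLY_BONUS = 1.0  # term appears near the top of the body text
EARLY_WORDS = 50   # how many opening words count as "near the top"


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def content_score(title: str, body: str, term: str) -> float:
    """Score one page for one search term using on-page factors only."""
    term = term.lower()
    words = tokenize(body)
    # Word frequency: occurrences relative to the total amount of text.
    score = words.count(term) / len(words) if words else 0.0
    # Location: reward the term in the title or near the top of the page.
    if term in tokenize(title):
        score += TITLE_BONUS
    if term in words[:EARLY_WORDS]:
        score += EARLY_BONUS
    return score


if __name__ == "__main__":
    focused = content_score("Growing roses", "Roses need sun and water.", "roses")
    buried = content_score("Garden diary", " ".join(["filler"] * 60 + ["roses"]), "roses")
    print(focused > buried)  # True
```

A page that names the term in its title and opening words outscores one that mentions it once, deep in a long body.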
Factors external to the site
* Link popularity -- Sites with more links pointing to them are prioritized
* Click popularity -- Sites visited more often are prioritized
* "Sector" popularity -- Sites visited by certain demographic or social groups are prioritized (Note: This system requires user-provided information)
* Business alliances among services -- Results from a partner search service are ranked higher
* Pay-for-placement rankings -- Site owners pay for high rankings
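Link popularity, the simplest of the external factors to illustrate, can be sketched by counting backlinks. In this hypothetical Python example, each distinct site linking to a page adds one point and self-links are ignored; `link_popularity` and `rerank` are illustrative names. Real engines use more elaborate link analysis (Google's PageRank, for example, also weighs how popular the linking pages are), so treat this only as an illustration.

```python
from collections import Counter


def link_popularity(links: dict[str, list[str]]) -> Counter:
    """Count how many distinct other sites link to each site (its backlinks)."""
    counts: Counter = Counter()
    for source, targets in links.items():
        for target in set(targets):  # each linking site counts once
            if target != source:     # ignore links a site makes to itself
                counts[target] += 1
    return counts


def rerank(results: list[str], links: dict[str, list[str]]) -> list[str]:
    """Put sites with more backlinks first; ties keep their original order."""
    popularity = link_popularity(links)
    # sorted() is stable, so equally popular sites stay in their input order.
    return sorted(results, key=lambda site: -popularity[site])


if __name__ == "__main__":
    links = {
        "a.example": ["c.example"],
        "b.example": ["c.example", "d.example"],
        "d.example": ["d.example"],  # a self-link, which is ignored
    }
    print(rerank(["d.example", "c.example", "e.example"], links))
    # ['c.example', 'd.example', 'e.example']
```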