Search Technologies - An Introduction
Search is a technology that is barely two decades old, yet it has profoundly empowered people and organizations around the world. Its importance lies in what it provides: effective knowledge management, discovery of new information, improved productivity, more transparency in the marketplace, and the ability to connect with the right people and companies. On top of that come the huge number of searches made worldwide every year and the advertising revenue they generate. If you are trying to understand the basic concepts of Search, this article is meant for you.
The ever-growing Internet accumulates an enormous flow of unstructured web data, and every enterprise produces huge volumes of unorganized machine data. Generating business value from such large, unmanageable data is therefore very challenging. A key part of an effective knowledge management strategy is a robust search solution: it offers a pragmatic way to extract business value from huge amounts of data by organizing that data and presenting the right information in the right place. Of the various types of search (desktop search and others), two major forms matter most for understanding the full purpose of search technology:
- Web Search
- Enterprise Search
In this post, I explain the important features of Web Search; Enterprise Search will follow in my next blog post.
Web search engines are built to find information on the World Wide Web, often loosely called the Internet. The moment you hear the term web search, the name that probably comes to mind is "Google". Google has topped the list of most-used web search engines for the decade since 2000, with the help of its innovation called PageRank. Data on the Internet is largely disorganized, and web search engines try to organize it. Web search engines use one of the following methods to collect information about web pages and other files such as images, PDFs and Word documents:
- Crawler-powered search engines
- Human-powered search engines
- Hybrid search engines
Crawler-powered search engines gather information about web pages and other documents using automated software called crawlers, which follow a particular algorithm to retrieve content and metadata from the HTML code of web pages. This content is then analyzed to build an index structure in which the web pages are tagged. When a user searches for a keyword, the engine first looks it up in the index and displays the web pages indexed under that keyword. Crawling is repeated periodically to detect changes in the web pages. Google is the best example of this type.
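To make the crawl, index and lookup cycle concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how Google or any real engine is implemented: the in-memory `WEB` dictionary stands in for fetching pages over HTTP, and the names `WEB`, `PageParser`, `crawl` and `search` are hypothetical. The crawler follows links breadth-first, extracts text with the standard library's `html.parser`, and builds an inverted index that maps each keyword to the pages containing it.

```python
import re
from collections import defaultdict, deque
from html.parser import HTMLParser

# Hypothetical in-memory "web": URL -> HTML. A real crawler would fetch
# pages over HTTP (and respect robots.txt); this dict stands in for that step.
WEB = {
    "http://a.example/": '<html><head><title>Search basics</title></head>'
                         '<body>Crawlers build an index. <a href="http://b.example/">next</a></body></html>',
    "http://b.example/": '<html><body>An index maps each keyword to pages. '
                         '<a href="http://a.example/">back</a></body></html>',
}

class PageParser(HTMLParser):
    """Collects the text content and outgoing links of one HTML page."""
    def __init__(self):
        super().__init__()
        self.text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text.append(data)

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def crawl(seed):
    """Breadth-first crawl from a seed URL, building an inverted index."""
    index = defaultdict(set)        # keyword -> set of URLs containing it
    seen, queue = {seed}, deque([seed])
    while queue:
        url = queue.popleft()
        html = WEB.get(url)
        if html is None:            # unreachable page: skip it
            continue
        parser = PageParser()
        parser.feed(html)
        for word in tokenize(" ".join(parser.text)):
            index[word].add(url)
        for link in parser.links:   # queue each newly discovered link once
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index

def search(index, keyword):
    """Answer a query with an index lookup instead of scanning every page."""
    return sorted(index.get(keyword.lower(), set()))
```

For example, searching the index built by `crawl("http://a.example/")` for "index" returns both URLs, while "crawlers" returns only the first page. The point of the inverted index is that a query becomes a dictionary lookup rather than a scan of every page; a real crawler would also revisit pages periodically to pick up changes.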
Human-powered search engines rely on information that users submit, which is then indexed and tagged. These search engines usually have an 'Add URL' link that lets people submit their websites to large directories. Because listings come from submissions rather than periodic crawling, later changes to the content of a web page have no effect on its listing. Yahoo!, DMOZ and LookSmart are examples of this kind.
Hybrid search engines combine crawler-based information with human-powered listings. For example, MSN used human-powered indexes from LookSmart together with its own crawler-based information to produce search results. There are also so-called meta search engines, which send a query to several top search engines at the same time and display the combined results. Dogpile and MetaCrawler are examples of meta search engines.
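A meta search engine has to merge several ranked result lists into one. How Dogpile or MetaCrawler actually do this is not covered here, so the sketch below swaps in one well-known merging technique, reciprocal rank fusion, in which each URL scores 1/(k + rank) for every engine that returns it. The functions `engine_a` and `engine_b` are hypothetical stand-ins for calls to real search engines, and `k=60` is a commonly used default, not a value taken from any particular product.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real search engines: each returns a ranked list of URLs.
def engine_a(query):
    return ["x.example", "y.example", "z.example"]

def engine_b(query):
    return ["y.example", "w.example", "x.example"]

def meta_search(query, engines, k=60):
    """Query all engines concurrently and merge results with reciprocal rank fusion."""
    with ThreadPoolExecutor() as pool:
        # pool.map returns results in the same order as `engines`.
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    scores = defaultdict(float)
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / (k + rank)   # better rank (smaller number) -> larger score
    # Best combined score first; ties broken alphabetically.
    return sorted(scores, key=lambda url: (-scores[url], url))

print(meta_search("search engines", [engine_a, engine_b]))
# ['y.example', 'x.example', 'w.example', 'z.example']
```

URLs returned by several engines accumulate score from each one, so `y.example`, ranked high by both engines, ends up first, and a URL returned by more than one engine is shown only once.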
Another important aspect of web search technology is page ranking. Each web search engine ranks web pages by its own criteria. One such factor is keyword scanning: the search engine counts the occurrences of a keyword in a web page, and a page with more occurrences gets a higher relevancy ranking. However, if you stuff your web page with thousands of occurrences of a keyword just to get a higher rank, a form of spamdexing known as keyword stuffing, your page risks being removed from the index.
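Keyword scanning can be sketched as counting occurrences of the query term in each page. The density cut-off that flags likely keyword stuffing is a hypothetical rule chosen only for illustration (`max_density=0.2`, meaning more than 20% of a page's words are the keyword); real engines use far more sophisticated and undisclosed spam detection. The page names and texts are made up.

```python
import re

def term_frequency(keyword, text):
    """Return (occurrences of keyword, total word count) for a page's text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return words.count(keyword.lower()), len(words)

def rank_by_keyword(keyword, pages, max_density=0.2):
    """Rank pages by keyword count and flag pages whose keyword density looks like stuffing.

    max_density is a hypothetical threshold used only for illustration.
    """
    ranked, flagged = [], []
    for url, text in pages.items():
        count, total = term_frequency(keyword, text)
        if total and count / total > max_density:
            flagged.append(url)          # too dense: treat as possible keyword stuffing
        elif count:
            ranked.append((count, url))
    # More occurrences first; ties broken alphabetically for a stable order.
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in ranked], sorted(flagged)

# Hypothetical example pages.
PAGES = {
    "a": "Search engines index pages so that search is fast for users everywhere",
    "b": "Search engines help people find pages on the web",
    "c": "search search search search search cheap pills",
    "d": "Cooking recipes for busy weeknights",
}

ranked, flagged = rank_by_keyword("search", PAGES)
# ranked == ["a", "b"], flagged == ["c"]
```

Page `c` repeats the keyword so often that it is flagged instead of ranked first, which is the effect that penalties against spamdexing aim for.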
Another common ranking technique is link analysis. By analyzing how web pages link to each other, a search engine learns what a page is about and how important it is, and therefore how high it should rank in the search results. Search engines compute their rankings in many other ways too; I will probably cover those in a separate post later.
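Link analysis is best known through PageRank. Below is a simplified textbook version computed by power iteration: each page splits its rank evenly across its outgoing links, and a damping factor (0.85, the value commonly cited for PageRank) mixes in a small uniform share so the computation converges. Pages with no outgoing links spread their rank over all pages. This is a sketch of the classic formulation under those assumptions, not Google's production ranking, which combines many other signals; the link graph is hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}        # start from a uniform distribution
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share       # pass rank along each outgoing link
            else:
                share = damping * rank[page] / n    # dangling page: spread rank over all pages
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical link graph: a -> c, b -> c, c -> a.
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this example `c` ranks highest because two pages link to it, `a` benefits from being linked by the highly ranked `c`, and `b`, which no page links to, receives only the baseline share.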