What is Enterprise Search?

Enterprise search is a search solution that brings together all the information held within an enterprise, in files, databases, content management systems, SharePoint systems and so on, and makes it searchable for the company's employees. I explained what web search is in my previous post. In contrast to web search, which finds pages and documents on the internet, enterprise search is meant for searching documents, files and other forms of data inside an organization. Without it, employees would have to search each of the company's data sources separately to find what they need, which is difficult and often practically impossible.

Every enterprise that is overloaded with information needs an enterprise search solution that integrates data from its various sources and presents it to employees in response to a single query. Many open source and proprietary enterprise search solutions are available from vendors and projects such as Microsoft, Oracle, IBM and Apache. Each has its own features, procedures and architecture for delivering the best search results to its users.

At a high level, content passes through the following phases on its way from the source to the search results:

1. Data Acquisition or Crawling
2. Data Refinement
3. Indexing
4. Query Processing
5. Search Results Display on a web browser

Data acquisition is the process of collecting structured and unstructured data from the various data sources, using a connector for each type of source, such as file systems, SharePoint or content management systems. Connectors are available for almost every data format and application, and they can be customized as needed.
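To make the connector idea concrete, here is a minimal sketch in Python. It is not taken from any particular product: the names RawDocument, Connector, FileSystemConnector, RecordConnector and crawl are all hypothetical, and the second connector only stands in for a database or CMS source by accepting its rows directly.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Iterator, Protocol


@dataclass
class RawDocument:
    source: str      # which connector produced the document
    location: str    # path or record id inside that source
    content: bytes   # unprocessed payload, handed on to data refinement


class Connector(Protocol):
    """Anything with a fetch() method that yields RawDocument objects."""

    def fetch(self) -> Iterator[RawDocument]: ...


class FileSystemConnector:
    """Walks a directory tree and yields every regular file it finds."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)

    def fetch(self) -> Iterator[RawDocument]:
        for path in sorted(self.root.rglob("*")):
            if path.is_file():
                yield RawDocument("filesystem", str(path), path.read_bytes())


class RecordConnector:
    """Stand-in for a database or CMS connector (assumption: rows are given)."""

    def __init__(self, name: str, rows: dict[str, str]) -> None:
        self.name = name
        self.rows = rows

    def fetch(self) -> Iterator[RawDocument]:
        for record_id, text in self.rows.items():
            yield RawDocument(self.name, record_id, text.encode("utf-8"))


def crawl(connectors: list[Connector]) -> Iterator[RawDocument]:
    """Pull documents from every configured source into one stream."""
    for connector in connectors:
        yield from connector.fetch()
```

Because every connector exposes the same fetch() method, supporting a new data source means writing one more small class, while the rest of the pipeline stays unchanged.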

The next phase is data refinement, generally referred to as document processing, in which document filter software extracts the relevant and meaningful text from each document. This phase also gathers all available metadata, such as type, name, size, title and author. Some document filters also read the ACL (Access Control List) information from the documents so that the search results can enforce security constraints on who is authorized to view each one.
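As a rough illustration of document processing, the sketch below filters an HTML document using Python's standard html.parser module. The names ProcessedDocument and refine_document are made up for this example, and the ACL is simply passed in as a list of group names, since reading real permissions depends on the source system.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import FrozenSet, Iterable, List


@dataclass
class ProcessedDocument:
    name: str
    doc_type: str
    size: int                      # size of the raw document in bytes
    title: str
    author: str
    text: str                      # extracted body text, ready for indexing
    allowed_groups: FrozenSet[str]  # simplified ACL: who may see the result


class _TextAndMetadata(HTMLParser):
    """Collects body text, the <title> and the author <meta> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []
        self.title = ""
        self.author = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            if (attr.get("name") or "").lower() == "author":
                self.author = attr.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        else:
            self.chunks.append(data)


def refine_document(name: str, raw: bytes, acl: Iterable[str]) -> ProcessedDocument:
    parser = _TextAndMetadata()
    parser.feed(raw.decode("utf-8", errors="replace"))
    parser.close()
    text = " ".join(" ".join(parser.chunks).split())  # collapse whitespace
    doc_type = name.rsplit(".", 1)[-1].lower() if "." in name else ""
    return ProcessedDocument(
        name=name,
        doc_type=doc_type,
        size=len(raw),
        title=parser.title.strip(),
        author=parser.author,
        text=text,
        allowed_groups=frozenset(acl),
    )
```

Real document filters handle many more formats (PDF, office documents, e-mail), but the output has the same shape: clean text plus metadata and access information.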

The most important part of an enterprise search architecture is indexing. Indexing is the process in which the text or keywords retrieved by the document processor are stored in an index. When a user searches for a document, this index is looked up to produce the relevant documents for the search results. The index may contain a group of meaningful words, sometimes called keywords, along with information related to page ranking and other metadata that facilitates faceted search.
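One common way to build such an index is an inverted index, which maps each keyword to the documents that contain it. The sketch below is only an illustration under that assumption; the class and method names are hypothetical, and real search engines store far more (positions, ranking signals, compressed postings).

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional

_TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric keywords."""
    return _TOKEN.findall(text.lower())


class InvertedIndex:
    def __init__(self) -> None:
        # keyword -> {document id: number of occurrences in that document}
        self.postings: Dict[str, Dict[str, int]] = defaultdict(dict)
        # document id -> metadata kept for ranking and faceted search
        self.metadata: Dict[str, Dict[str, str]] = {}

    def add(self, doc_id: str, text: str,
            metadata: Optional[Dict[str, str]] = None) -> None:
        for term in tokenize(text):
            self.postings[term][doc_id] = self.postings[term].get(doc_id, 0) + 1
        self.metadata[doc_id] = dict(metadata or {})

    def lookup(self, term: str) -> Dict[str, int]:
        """Return the documents containing the keyword, with their counts."""
        return dict(self.postings.get(term.lower(), {}))
```

Looking up a keyword is then a single dictionary access instead of a scan over every document, which is what makes searching a large collection fast.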

End users are given a web page that usually contains a search box. When a user types a query into the search box, the request is handled by a query processor that formats it so that the search engine can understand it. The search engine receives the formatted query, searches the index created during the indexing phase, and sends the results back to the browser, where the user can view them.
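The query path can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the index is a plain dictionary of keyword to per-document counts, the stop word list is arbitrary, a document must contain every query term, and results are ranked simply by how often the terms occur.

```python
import re
from typing import Dict, List

# Illustrative stop word list; real engines use configurable ones.
STOPWORDS = {"a", "an", "the", "of", "for", "in", "to", "and"}


def process_query(raw_query: str) -> List[str]:
    """Normalize the user's input into the keywords the engine looks up."""
    terms = re.findall(r"[a-z0-9]+", raw_query.lower())
    return [term for term in terms if term not in STOPWORDS]


def search(index: Dict[str, Dict[str, int]], raw_query: str) -> List[str]:
    """Return ids of documents containing every query term, best match first."""
    terms = process_query(raw_query)
    if not terms:
        return []
    postings = [index.get(term, {}) for term in terms]
    # Keep only the documents that appear in the postings of every term.
    matching = set(postings[0]).intersection(*postings[1:])
    scores = {doc: sum(p[doc] for p in postings) for doc in matching}
    # Highest score first; ties are broken by document id for stable output.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


index = {
    "travel": {"d1": 2, "d2": 1},
    "policy": {"d1": 1, "d2": 1, "d3": 4},
}
results = search(index, "The Travel Policy")
```

With this sample index, results is ["d1", "d2"]: both documents contain "travel" and "policy", and d1 mentions the terms more often. Document d3 is left out because it never mentions "travel".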

Almost all enterprise search solutions offer other features as well, such as relevancy ranking, synonyms, performance analysis, security and many more. One important feature that makes enterprise search an efficient information access tool is faceted search. With faceted search, users get more advanced options to refine the search results and get closer to the exact document they need. This is made possible by the metadata gathered during the data refinement phase. For example, if a user knows the file name or author but does not know where the file actually resides, he or she can refine the results with an option such as 'refine results by file name / file author', which presents a drop-down list of all available file names or authors to select from.
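Here is a small sketch of how the author facet from that example could work. It assumes each search result is a dictionary carrying the metadata collected earlier; the function names facet_counts and refine_results and the sample data are invented for illustration.

```python
from collections import Counter
from typing import Dict, List

Result = Dict[str, str]


def facet_counts(results: List[Result], field: str) -> Counter:
    """Count how many results carry each value of a metadata field."""
    return Counter(doc[field] for doc in results if field in doc)


def refine_results(results: List[Result], field: str, value: str) -> List[Result]:
    """Keep only the results whose metadata field equals the chosen value."""
    return [doc for doc in results if doc.get(field) == value]


results = [
    {"name": "leave_policy.docx", "author": "Asha"},
    {"name": "travel_policy.pdf", "author": "Ravi"},
    {"name": "expense_policy.pdf", "author": "Asha"},
]

# Values for the drop-down list, each with the number of matching results.
authors = facet_counts(results, "author")

# The user picks "Asha" from the list to narrow the results.
by_asha = refine_results(results, "author", "Asha")
```

Here authors holds a count of 2 for "Asha" and 1 for "Ravi", and by_asha contains the two documents written by Asha, so the user finds the file without knowing where it is stored.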

In my upcoming posts, I plan to write more about these key features, which every enterprise search solution relies on to build an efficient, highly scalable and well-performing search application. Please leave your comments and questions about this post in the comments section; they help me improve my writing and publish more useful posts.



