Finding files and documents with Recoll

Desktop search engines are all the rage these days. While Beagle may be the most popular desktop search engine for Linux, there are alternatives. If you are looking for a lightweight, easy-to-use, yet powerful desktop search engine, you might want to try Recoll. Unlike Beagle, Recoll doesn’t require Mono, and it is both fast and highly configurable. Recoll is based on Xapian, a mature open source search engine library that supports advanced features such as phrase and proximity search, relevance feedback, document categorization, boolean queries, and wildcard search.


Figure 1: Recoll in action

Recoll’s Web site provides binary packages for most major Linux distributions, including Fedora, SUSE, Ubuntu, and Debian, so you can install it easily using your distro’s package manager. Support for document types that rely on external helpers requires installing those helpers separately (a list of the required helpers is available at Recoll’s Web site). You can then launch Recoll by choosing Recoll from the Applications → Accessories menu (in Ubuntu) or by running the recoll command in a terminal window.

Recoll can handle plain text, HTML, OpenOffice.org documents, Mozilla Thunderbird and Evolution email messages, and LyX and Scribus files. In addition to those native formats, Recoll can work with other file types by using external helper applications. For example, the Xpdf software provides support for PDF files, while Word, PowerPoint, and Excel documents are handled by antiword and catdoc. Recoll stores all internal data in Unicode UTF-8 format, but it can index files with different character sets, encodings, and languages into the same index.
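
The idea of folding differently encoded files into one UTF-8 index can be illustrated with a small Python sketch. This is not Recoll's own code: the to_utf8 helper and the sample strings are made up for illustration, and the source encoding of each input is assumed to be known in advance.

```python
def to_utf8(raw: bytes, encoding: str) -> bytes:
    """Decode bytes from a known source encoding and re-encode them as UTF-8."""
    return raw.decode(encoding).encode("utf-8")


# The same word as it might be stored in two files with different
# encodings (made-up sample data).
latin1_bytes = "café".encode("latin-1")
utf16_bytes = "café".encode("utf-16")

# The raw bytes differ, but the normalized UTF-8 forms are identical,
# so both files can share a single index term.
normalized = {
    to_utf8(latin1_bytes, "latin-1"),
    to_utf8(utf16_bytes, "utf-16"),
}

if __name__ == "__main__":
    print(latin1_bytes != utf16_bytes)
    print(len(normalized))
```

Once everything is normalized to one encoding, a search term typed in the GUI only needs to be compared against a single representation of each word.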

During the first run, you will be prompted to create a default set of configuration files that will contain all of Recoll’s settings. Recoll doesn’t provide a GUI configuration tool, so you have to edit the configuration files manually. Fortunately, Recoll’s user manual provides a detailed description of the configuration options that you can tweak. However, since Recoll’s default settings cover all the basics, you might not need to edit them.

Like any desktop search engine, Recoll must index documents before it can search them. By default, Recoll indexes the files in your home directory, but you can specify a different location or add further ones. During the first run, Recoll builds a full index, which can take some time. Once the index exists, you can update it manually using the recollindex command, or schedule recollindex as a cron job. Alternatively, you can run the recollindex -m command, which runs as a daemon and indexes modified files in real time.
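
The three ways of keeping the index current can be sketched in Python. This is only an illustration: the helper names are hypothetical, the 03:00 schedule in the crontab line is an arbitrary choice, and the sketch assumes that recollindex is on your PATH.

```python
import shutil
import subprocess


def recollindex_command(monitor=False):
    """Build the argument list for recollindex.

    With monitor=True the -m option is added, which starts recollindex
    as a daemon that indexes modified files in real time.
    """
    cmd = ["recollindex"]
    if monitor:
        cmd.append("-m")
    return cmd


def cron_line(minute=0, hour=3):
    """Return a crontab line that runs recollindex once a day.

    The default of 03:00 is an arbitrary example, not a recommendation
    taken from Recoll's documentation.
    """
    return f"{minute} {hour} * * * recollindex"


def update_index():
    """Run a one-off index update; return False if recollindex is missing or fails."""
    exe = shutil.which("recollindex")
    if exe is None:
        return False
    return subprocess.run([exe], check=False).returncode == 0


if __name__ == "__main__":
    print(" ".join(recollindex_command()))
    print(" ".join(recollindex_command(monitor=True)))
    print(cron_line())
```

The line returned by cron_line() can be pasted into the file opened by crontab -e. Use either the cron job or the daemon, since running both would do the same work twice.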

Once the files have been indexed, Recoll is ready to go. To perform a simple search, enter one or more search terms into the search field and press the Search button. Besides searching for all or any of the specified terms, Recoll also lets you search for file names and perform more advanced searches using wildcards and boolean operators.

Recoll supports three types of wildcards. The * wildcard matches any sequence of characters, including none (e.g., writ* returns writer, written, and writing). The ? wildcard matches exactly one character (e.g., b?ll returns ball, bull, and bell). The [] wildcard lets you specify a set of matching characters, e.g., [a-h] or [1-5].

To perform a boolean search, select the Query Language item from the drop-down menu next to the search field. You can then use boolean operators to construct more complex searches. For example, the search from:"tristram shandy" linux AND openoffice -windows finds documents that contain the phrase tristram shandy in the from field (useful when searching email messages) as well as the words linux and openoffice, but not the word windows.
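
The three wildcard forms described above behave like shell-style patterns, so Python's standard fnmatch module can be used to try them out. This sketch only illustrates the matching rules; it does not call Recoll, and the TERMS list is a made-up stand-in for the terms stored in a real index.

```python
from fnmatch import fnmatchcase

# Made-up sample of index terms; a real index would hold many more.
TERMS = [
    "ball", "bell", "bill", "bull",
    "writer", "writing", "written", "wrote",
]


def expand(pattern, terms=TERMS):
    """Return the terms that match a wildcard pattern, in list order."""
    return [term for term in terms if fnmatchcase(term, pattern)]


if __name__ == "__main__":
    print(expand("writ*"))     # * stands for a run of characters
    print(expand("b?ll"))      # ? stands for exactly one character
    print(expand("b[a-e]ll"))  # [] stands for one character from the set
```

Note that wrote is not returned for writ*, because the characters before the wildcard must match literally.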

The Advanced Search feature can be used to create even more elaborate queries. The default fields (called Clauses) let you specify a wide range of criteria, such as proximity, excluded words, wildcards, and any number of search terms (you can add extra fields by pressing the Add clause button). You can also narrow your search to specific file types or a specific directory.

When you perform a search, Recoll displays the results in the main window. Each search result contains a file type icon, a relevance rating expressed as a percentage, and the context surrounding the search term. There are also two links: the Preview link lets you quickly preview the document in a separate window, while the Edit link opens the file for editing in an appropriate application.

Finally, Recoll also features a Term Explorer tool (Tools → Term Explorer) that can come in handy when you don’t remember the exact spelling of a particular search term. Basically, it acts as a mini search engine that searches the index. This allows you to see all the derivatives of the entered search terms and select the one you need.

Recoll looks deceptively simple, but it is a powerful desktop search engine. To get the most out of it, make sure to read Recoll’s user manual, paying particular attention to the tips and tricks section.


articles/recoll.txt · Last modified: 2009/11/29 17:23 by Dmitri Popov
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 3.0 Unported