Google Desktop may finally be on Linux, but how does it fare against the competition?
The traditional way to find files on a Linux system has been the appropriately named “find” command: give it part of a filename and it will hunt through a whole tree of folders for it. The rest of the world has moved on from such brute-force searching, though: Spotlight in Mac OS X and Instant Search in Vista, along with third-party options like Google Desktop, give users super-fast access to desktop search results.
Linux isn’t without its own desktop search tools, with Novell’s Beagle project leading the charge. There’s competition waiting in the wings, however, from both the king of search and a young, lightweight alternative.
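To make the contrast with indexed search concrete, here is a minimal Python sketch of the brute-force approach: walk every folder under a starting point and compare each filename as you go. The function name brute_force_find is purely illustrative, and this is not how the “find” command itself is implemented; it only shows why the cost grows with the size of the tree.

```python
import os


def brute_force_find(root, fragment):
    """Yield the path of every file under root whose name contains fragment.

    Hypothetical helper for illustration: nothing is indexed, so every
    search has to visit every directory in the tree again.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if fragment in name:
                yield os.path.join(dirpath, name)


if __name__ == "__main__":
    # Example: look for anything with "report" in its name under the home folder.
    for path in brute_force_find(os.path.expanduser("~"), "report"):
        print(path)
```

A desktop search tool avoids this repeated walk by building an index once and answering later queries from it.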
Beagle
Beagle is the de facto standard in desktop search on Linux, mainly because it’s the oldest serious option. It’s been backed by Novell for a long time, and it is now included in most Linux distributions, some of which even enable it by default.
The first time you run Beagle, it needs to build its initial index from scratch, so you won’t be able to find many files at first. Because the indexing task is quite conservative, you’re very unlikely to see Beagle interfere with your system performance. However, you may have to wait a day or more until the index is complete and all of your files are searchable.
The search interface is quite neat, with a central pane breaking search results down by category, and a preview pane below this, showing the section of the highlighted document that matched your search term. If you’re running GNOME, you also get automatic acceleration of all searching in Nautilus, along with the ability to add live search results to your panel using the Deskbar applet. The command-line “beagle-query” tool is also available.
Beagle is often criticised for its size: it depends on a lot of different libraries to get its job done, and if you don’t have the Mono .NET runtime installed, you’ll need to install that as well. However, much of this is simply because of the remarkable breadth of files that Beagle can handle. Not only can it index the contents of almost every imaginable document format, but it also includes backends that pull data from specific applications, such as email from Evolution, Thunderbird, or KMail.
Arguably the biggest problem with Beagle is its memory usage. It’s not uncommon for Beagle to use 50MB of RAM or more, and in certain circumstances it can briefly peak much higher. On my test system with 2GB of RAM it’s not a big deal, but even on 1GB systems, that’s a significant chunk of RAM that could be put to better use.
Tracker
Tracker is the opposite of Beagle in many ways: it’s a young project that’s small and light, making it a natural fit for low-memory systems. This small size comes at the cost of a number of features, primarily in back-end application support – Tracker doesn’t yet support indexing email messages or other application data, though support for Evolution mail is in the works. It does, however, manage to index the contents of many document formats, and it can extract metadata from audio, video, and image files.
Continuing the trend, Tracker performs its initial indexing as quickly as it can, rather than waiting for idle time. It does run its processes at a low CPU priority, but it still hammers your hard drive, so any other job that needs to access files is likely to be slowed down while the indexing occurs. Even after the initial indexing, Tracker causes a flurry of disk activity when it’s started on subsequent logins, though this usually subsides in under a minute.
Though it lacks Beagle’s ability to index application data, such as emails, Tracker does live up to the performance claims, typically using less than 10MB of RAM while running. It also has a “low-memory” mode that further reduces the memory requirements, at the expense of some indexing performance.
Tracker’s search interface is a simpler affair than Beagle’s, but it works well; the only notable thing missing is an indication of the number of search results returned. The results aren’t broken down by category, but you can choose to show only the results that match a particular category. Tracker also offers live search results in the Deskbar, and in recent GNOME versions it can accelerate Nautilus searches as well. For command-line users, the “tracker-search” tool works well and is incredibly quick: typical search times are under 100 milliseconds.
Google Desktop
Google is arguably the name in search, and the Google Desktop search engine has been available for Windows for several years. The Linux version is brand-spanking-new, but it already works quite well. After installation (Google has packages for a range of distributions available) and the initial launch, a Google Desktop tool appears in the notification area. Double-clicking on this brings up the Quick Search Box, while a right-click opens a menu with further options.
The Quick Search Box is a neatly shaped window that appears in the middle of the screen, giving you quick access to the top few search results. To access further results, Google Desktop opens its web-based interface – a version of the traditional Google interface – in your default browser. It’s easy to page through the results with the mouse, just as you would with a Google search, but it lacks the easy keyboard navigation of a native interface. Most other parts of the user interface, such as the settings dialogs, are also web-based.
The initial indexing takes a slow-and-steady approach like Beagle, keeping system activity low. There’s even a progress meter of sorts: the “Index status” option in the right-click menu opens a page that gives you a completion percentage, and while I’m not sure exactly how accurate it is, it’s certainly better than nothing. The indexer supports many document formats, and can also index mail from Thunderbird.
The main area where Google Desktop comes up short is the desktop integration. Partly because of its web-based interface, it feels like an add-on rather than a natural part of the desktop. The Quick Search Box is a good start, but it’s also annoying: the (currently unconfigurable) hotkey to bring it up is tapping Ctrl twice, which I found myself accidentally hitting while switching between desktops. It also lacks a command-line search tool.
Memory usage may be a problem as well. It’s more slender than Beagle, but it’s still not uncommon to see its various tasks using a combined 40MB or so.
However, Google Desktop definitely has solid foundations, and given that this is the first release, it could soon mature into a fine application.
Privacy concerns
Of course, by making it easier to find your files, you may inadvertently help others find them as well. However, all of the desktop search engines we’ve looked at today do make some effort to help ensure your privacy.
The simplest precaution is to ensure that your most sensitive files aren’t indexed. All of these search tools let you specify directories to exclude from indexing: for Beagle and Google Desktop, the options are in their respective configuration tools, while you’ll have to manually edit the “tracker.cfg” file for Tracker. Google Desktop can also exclude individual files, while Beagle can exclude arbitrary file patterns, as well as individual mail folders in any of the supported mail applications.
Another risk is the availability of your search indexes to other users. Because the indexes contain much of your file contents, albeit in a heavily processed form, the index itself can hold sensitive information. However, all of the search tools restrict the file permissions on the index folders they create, ensuring that only the user who owns the files can get access.
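If you want to confirm this for yourself, the check is simple: the index folder should grant no permissions at all to the group or to other users. The short Python sketch below does that for any path you give it. The function name is_private_to_owner is made up for this example, and you will need to supply the index folder location for your own search tool, since the article does not list them.

```python
import os
import stat


def is_private_to_owner(path):
    """Return True if neither group nor others have any access to path.

    Hypothetical helper for illustration: it only inspects the permission
    bits on the path itself, not on anything stored inside it.
    """
    mode = os.stat(path).st_mode
    # S_IRWXG and S_IRWXO are the read/write/execute bits for group and others.
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


if __name__ == "__main__":
    import sys

    # Usage: python check_index.py /path/to/index/folder
    for folder in sys.argv[1:]:
        verdict = "private" if is_private_to_owner(folder) else "readable by others"
        print(f"{folder}: {verdict}")
```

A folder with mode 700 passes this check, while one with mode 755 does not.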
I was worried that Google Desktop’s web-based interface would open some loopholes, but that doesn’t appear to be a problem. The local web interface will only open if you pass it an appropriate session identifier, and the only way to get one is through the notification area tool or search results box.
There’s also the possibility that the search software itself may be malicious, sending sensitive data out to the world, but I don’t think this is a serious possibility. Anyone can check the source code to Beagle or Tracker to make sure they play nice, and while we can’t do the same with the closed-source Google Desktop, it’s definitely possible to analyse its network traffic to ensure that it’s not misbehaving. Google Desktop does have an optional feature that transmits summary usage information to the Google servers, but this is easily disabled.
Overall, all three options do the job, but some do a nicer job of it than others. Google Desktop needs some user interface work, but it returns quality results, and if you’re not using GNOME then you won’t miss the desktop integration offered by Beagle and Tracker. Otherwise, both Beagle and Tracker offer about the same level of desktop integration, but Tracker is simple and streamlined, while Beagle has broad application data support. I run Beagle, because with 2GB of RAM I can afford to, but I’ll be keeping a close eye on Tracker. If it improves, then its size and its lack of a dependency on Mono may see it become the new standard.