Tips & tricks

Running DocFetcher without installing Java

Option 1: PortableApps

There's a PortableApps version of DocFetcher that works with portable Java. See this thread. You can get portable Java from here.

You may want to increase the amount of memory available to PortableApps DocFetcher. To do so, follow these steps:

  • Open the PortableApps DocFetcher folder.
  • Open this file in a text editor: App\AppInfo\Launcher\DocFetcherPortable.ini
  • In the .ini file, go to the line starting with CommandLineArguments.
  • The line contains a parameter starting with -Xmx, e.g. -Xmx512m. This controls the amount of memory. For 1 GB of memory, change it to -Xmx1g.
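
The last step can also be scripted. The following is a minimal Python sketch, not part of DocFetcher, that swaps the value of the -Xmx parameter in a single line of text. The sample line is hypothetical; the real CommandLineArguments line in your .ini file will contain other arguments.

```python
import re


def set_max_heap(line: str, new_size: str) -> str:
    """Replace the value of the first -Xmx parameter in a line, e.g. 512m -> 1g."""
    # -Xmx is followed by digits and an optional unit suffix (k, m or g).
    return re.sub(r"-Xmx\d+[kKmMgG]?", "-Xmx" + new_size, line, count=1)


# Hypothetical sample line, for illustration only.
sample = "CommandLineArguments=-Xmx512m -jar example.jar"
print(set_max_heap(sample, "1g"))  # prints: CommandLineArguments=-Xmx1g -jar example.jar
```

A line without an -Xmx parameter is returned unchanged, so the function is safe to apply to every line of the file.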

Option 2: Modifying Portable DocFetcher

You can modify the "official" portable version of DocFetcher to make it work with a portable Java runtime. Instructions:

  • Get portable Java from here.
  • Download and unpack the latest version of portable DocFetcher.
  • Open the DocFetcher folder. Move the file misc\DocFetcher.bat from the misc folder one level up into the DocFetcher folder.
  • Edit the last line in DocFetcher.bat so that it points to the portable Java runtime. For example, if the Java runtime folder is named Java and resides in the same parent folder as the DocFetcher folder, you can replace the java keyword with start /b ..\Java\bin\javaw.exe. (The start /b and use of javaw instead of java will hide the black command prompt window after launch.)
  • From now on, always start DocFetcher by double-clicking on the DocFetcher.bat file.
  • The last line in DocFetcher.bat contains a parameter starting with -Xmx, e.g. -Xmx512m. This controls the amount of memory available to DocFetcher. For 1 GB of memory, change it to -Xmx1g.
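
The edit to the last line of DocFetcher.bat can be sketched in Python as follows. This is only an illustration of the text substitution described above: the sample line is hypothetical, and the default path assumes a folder named Java next to the DocFetcher folder, as in the example.

```python
def use_portable_java(last_line: str, javaw_path: str = r"..\Java\bin\javaw.exe") -> str:
    """Replace a leading 'java' keyword with 'start /b <path to javaw.exe>'."""
    stripped = last_line.lstrip()
    if not stripped.startswith("java "):
        # Leave lines alone that do not look like the expected launch line.
        return last_line
    return "start /b " + javaw_path + stripped[len("java"):]


# Hypothetical launch line; the real one in DocFetcher.bat has different arguments.
print(use_portable_java("java -Xmx512m -cp lib Example"))
# prints: start /b ..\Java\bin\javaw.exe -Xmx512m -cp lib Example
```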

For further discussion, see this thread.

Option 3: DocFetcher Pro

DocFetcher Pro, the commercial big brother of DocFetcher, comes bundled with an internal Java runtime, so it can be run without installing Java.

Using DocFetcher's advanced search features

There's a lot more you can do with DocFetcher's search field than just typing in some words to search for. For instance:

  • Wildcards: docfetch*, docf?tcher
  • Phrase search: "dog cat"
  • Boolean operators: dog AND cat
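
To get a feel for which words the wildcard patterns above would match, here is a small Python sketch. It uses Python's fnmatch module as a stand-in and is not how DocFetcher evaluates queries. It assumes the usual wildcard meaning, where * stands for any run of characters (including none) and ? for exactly one character.

```python
from fnmatch import fnmatchcase

# Each tuple is (word, wildcard pattern).
examples = [
    ("docfetcher", "docfetch*"),   # * matches the trailing "er"
    ("docfetch", "docfetch*"),     # * may also match nothing
    ("docfetcher", "docf?tcher"),  # ? matches the single "e"
    ("docftcher", "docf?tcher"),   # ? needs exactly one character, so no match
]

for word, pattern in examples:
    print(word, pattern, fnmatchcase(word, pattern))
```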

All this and more is covered in the built-in program manual under the section "Query Syntax".

Setting up DocFetcher in a multi-user environment

A preliminary note: DocFetcher was originally not designed for simultaneous use by multiple users, and the multi-user support described below was only tacked on as an afterthought. As a result, it suffers from various known problems, such as poor performance and program instability. If you can afford to spend some money on a modern, proper implementation of multi-user support, consider purchasing DocFetcher Server.

Multi-user support in DocFetcher can be realized by setting up a central index to be shared among multiple DocFetcher instances. Here's how it works:

  • You'll need the portable version of DocFetcher.
  • Use portable DocFetcher to create one or more indexes. If the indexed files are modified frequently, it's probably a good idea to untick the checkbox "Watch folders for file changes" when creating the indexes. After having created all indexes, move the indexes folder inside the program folder to the new shared location.
  • On the computer of each user, set up an instance of portable DocFetcher and configure it as follows.
  • The file misc/paths.txt allows you to change the location of the indexes and settings. In a multi-user environment, you should probably leave the location of the settings as is, so that each user keeps their own settings. Change only the indexes path so that it points to the new shared indexes.
  • On the preferences dialog, there's an "Advanced Settings" link that points to the file conf/program-conf.txt. The following entries in that file might be of interest to you:
    • AppName - change the program window title
    • AllowIndexCreation - set to false to prevent users from creating indexes
    • AllowIndexUpdate - set to false to prevent users from updating indexes
    • AllowIndexRenaming - set to false to prevent users from renaming indexes
    • AllowIndexRebuild - set to false to prevent users from rebuilding indexes
    • AllowIndexDeletion - set to false to prevent users from deleting indexes
    • SaveSettings - set to false to disable writing the program settings to disk
    • ShowAdvancedSettingsLink - set to false to hide the "Advanced Settings" link on the preferences dialog
    • MaxResultsTotal - lowering this value might give you better search performance, especially if your index is on a remote filesystem
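
If you need to apply the same entries to many DocFetcher instances, the change can be scripted. The following Python sketch assumes that conf/program-conf.txt stores one "Key = value" entry per line; that format is an assumption, so check your own file before relying on it. The function name and the sample text are made up for illustration.

```python
import re


def set_entries(text: str, updates: dict) -> str:
    """Return text with the value of each listed key replaced; other lines are kept."""
    out = []
    for line in text.splitlines():
        # Assumed format: a key made of word characters, followed by "=".
        match = re.match(r"\s*(\w+)\s*=", line)
        if match and match.group(1) in updates:
            key = match.group(1)
            line = f"{key} = {updates[key]}"
        out.append(line)
    return "\n".join(out)


# Hypothetical file content, for illustration only.
sample = "AppName = DocFetcher\nAllowIndexCreation = true\nAllowIndexDeletion = true"
print(set_entries(sample, {"AllowIndexCreation": "false", "AllowIndexDeletion": "false"}))
```

Keys that do not appear in the text are ignored rather than appended, so a typo in a key name results in no change instead of a stray entry.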

Performance will be especially poor if a large number of computers have write access to the same indexes. You should therefore give write access only to the one or few DocFetcher instances in charge of updating the indexes. Another option is to give each computer a separate copy of the indexes to search with.

Accessing DocFetcher via its Python scripting API

In DocFetcher 1.1.20 and later versions, DocFetcher supports Python-based scripting. This can be used to programmatically execute searches and retrieve the results. For an example of how this is done, see the explanation at the top of the file search.py, which can be found in the DocFetcher program folder.

Making DocFetcher faster

Depending on your hardware, DocFetcher's indexing and searching might get (a lot) faster if you increase the so-called initial heap size and/or the maximum heap size of the Java runtime on which DocFetcher runs. A good starting point for further testing would be an initial heap size of 512 MB and a maximum heap size of 4 GB. Both values are limited by the amount of RAM available.

Instructions on how to increase the maximum heap size can be found in the FAQ. Increasing the initial heap size works the same way, namely by adding the parameter -Xms (e.g. -Xms512m). For further information on tuning the Java runtime, see this StackOverflow post.
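
As a small illustration of how the two parameters are written, the following Python sketch builds the -Xms and -Xmx flags from sizes given in megabytes. The helper name is made up for this example. The sketch only formats the flags; it does not check them against the RAM of your machine.

```python
def heap_flags(initial_mb: int, maximum_mb: int) -> list:
    """Build the -Xms/-Xmx flags for the given heap sizes in megabytes."""
    if initial_mb > maximum_mb:
        # The Java runtime refuses an initial heap larger than the maximum heap.
        raise ValueError("initial heap size must not exceed maximum heap size")

    def fmt(mb: int) -> str:
        # Use the g suffix for whole gigabytes, otherwise stay with megabytes.
        return f"{mb // 1024}g" if mb % 1024 == 0 else f"{mb}m"

    return ["-Xms" + fmt(initial_mb), "-Xmx" + fmt(maximum_mb)]


print(heap_flags(512, 4096))  # prints: ['-Xms512m', '-Xmx4g']
```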

Indexing large folders

Indexing large folders is more likely to fail, because such folders tend to be nested more deeply and to contain more large files that might cause DocFetcher to run out of memory. If you need to index large folders anyway, keep the following tips in mind:

  • On the top right of the indexing dialog, there's a "+" button that allows you to put multiple folders in a queue for indexing.
  • Instead of indexing the entire folder, try to index its subfolders one by one, using the aforementioned indexing queue. This will isolate possible failures so that if indexing of one folder fails, the other indexing processes won't be affected.
  • Giving DocFetcher more RAM decreases the risk of DocFetcher dying on large files during indexing. See the section "How to raise the memory limit" in the manual. DocFetcher comes with launchers that allow up to 8 GB of RAM; anything above 1 GB requires a 64-bit Java runtime.
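
To prepare the list of subfolders for the indexing queue, a short Python sketch like the following can help. It only lists the direct subfolders of a folder; adding them to the queue is still done by hand in the indexing dialog. The function name is made up for this example.

```python
from pathlib import Path


def subfolders_to_queue(root: str) -> list:
    """Return the direct subfolders of root, sorted by path."""
    return sorted(str(p) for p in Path(root).iterdir() if p.is_dir())


# Example usage (replace the path with your own large folder):
# for folder in subfolders_to_queue(r"D:\Documents"):
#     print(folder)
```

Files that sit directly in the large folder are not covered by any subfolder, so they would need to be indexed separately.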

Indexing Thunderbird emails

According to a user suggestion on the DocFetcher forum, here's a workaround to get some limited indexing support for Thunderbird emails:

  • Make sure you're using the latest Thunderbird version.
  • In the Thunderbird settings, go to the Advanced tab and check Allow Windows to search messages.
  • In DocFetcher, index the folder containing your Thunderbird profile. In Windows 7, the path looks like this: C:\Users\[username]\AppData\Roaming\Thunderbird\Profiles\
  • Before starting the indexing process, add wdseml to the list of plain text file formats.
  • Downside: By setting mail as searchable in Thunderbird, it is stored unencrypted.

Various indexing tricks

  • Queueing directories for indexing: The "+" button on the top right of the indexing dialog allows you to put directories in a queue for indexing.
  • Treating unsupported file formats as plain text: Even if a certain file format is not directly supported by DocFetcher, you may be able to search in files with that format in a limited way: On the indexing dialog, add the file format's extension to the list of plain text files. This seems to work with Outlook MSG files, for example, as reported in this feature request.
  • Embedding data in filenames: Since DocFetcher can search in filenames as well as file contents, you can add some extra data in the filenames. This is particularly useful for unsupported file formats, which wouldn't show up in the search results otherwise.

Updating indexes from the command-line

To update all indexes from the command-line, start the DocFetcher executable with the --update-indexes parameter.

Note that on Windows, DocFetcher.exe will quit immediately after starting an update process in the background, so you won't get any feedback on the console about what it is doing. As a workaround, you can move the file misc/DocFetcher.bat one level up into the DocFetcher folder and run the index update with the DocFetcher.bat file instead. Unlike DocFetcher.exe, the .bat file does not fork into the background.
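
If you want to trigger the update from a script, for example from a scheduled task, the call could look like the following Python sketch. It assumes that DocFetcher.bat has already been moved into the DocFetcher folder as described above. The function names are made up for this example.

```python
import subprocess
from pathlib import Path


def update_command(docfetcher_dir: str) -> list:
    """Build the command line that runs the index update via DocFetcher.bat."""
    launcher = Path(docfetcher_dir) / "DocFetcher.bat"
    return [str(launcher), "--update-indexes"]


def run_update(docfetcher_dir: str) -> int:
    """Run the update in the DocFetcher folder, wait for it, return its exit code."""
    # cwd is set to the program folder in case the .bat file uses relative paths.
    completed = subprocess.run(update_command(docfetcher_dir), cwd=docfetcher_dir)
    return completed.returncode


# Example usage (replace the path with your own DocFetcher folder):
# run_update(r"C:\Tools\DocFetcher")
```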

Linux issue: 100% CPU usage caused by DocFetcher daemon

A known issue with the DocFetcher daemon on Linux, which has the filename docfetcher-daemon-linux, is that when it is automatically launched on startup, it can permanently use 100% of one CPU core. In that case, check whether the folder containing your indexes (e.g. ~/.docfetcher) contains a file named .indexes.txt. If not, create an empty file with that name.
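
The check can be sketched in Python as follows. The function name is made up for this example, and the folder ~/.docfetcher is only the example location mentioned above; pass the folder that actually contains your indexes.

```python
from pathlib import Path


def ensure_indexes_file(index_dir: str) -> bool:
    """Create an empty .indexes.txt in index_dir if missing; return True if created."""
    marker = Path(index_dir).expanduser() / ".indexes.txt"
    if marker.exists():
        return False
    # touch() raises FileNotFoundError if the folder itself does not exist.
    marker.touch()
    return True


# Example usage:
# ensure_indexes_file("~/.docfetcher")
```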


Related

Wiki: FAQ