Project Gutenberg is a real treasure trove for bookworms and casual readers alike, but turning etext files into a readable form is not as easy as it may seem. In theory, since etexts are just plain text files, you should be able to open and read them on any platform without any tweaking. In practice, however, this approach rarely works. Hard line breaks, for example, ruin the text flow, making the book very hard to read on a mobile device. Another problem is that most books are stored as single files, so locating a particular chapter or section in a lengthy book can quickly become a serious nuisance. Then there are minor but still annoying formatting quirks, such as inconsistent handling of italicized text, the use of straight quotes instead of smart ones, and so on.
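To get a feel for what undoing hard line breaks involves, here is a crude shell sketch that rejoins wrapped lines so that each paragraph becomes a single line. The function name unwrap_paragraphs is a hypothetical helper, not part of GutenMark, and it assumes paragraphs are separated by blank lines. It ignores poetry, tables, quotes and italics, which is exactly the kind of detail a dedicated converter has to deal with on top of this.

```shell
#!/bin/sh
# unwrap_paragraphs (hypothetical helper): rejoin hard-wrapped lines so that
# each paragraph becomes a single line. Illustrative only: assumes paragraphs
# are separated by blank lines and ignores poetry, tables, quotes and italics.
unwrap_paragraphs() {
  awk '
    { sub(/\r$/, "") }                # strip DOS line endings (many etexts use CRLF)
    /^[ \t]*$/ {                      # a blank line ends the current paragraph
      if (para != "") print para
      print ""
      para = ""
      next
    }
    {
      sub(/^[ \t]+/, "")              # drop leading indentation
      sub(/[ \t]+$/, "")              # drop trailing spaces
      if (para == "") para = $0
      else para = para " " $0
    }
    END { if (para != "") print para }
  ' "$@"
}
```

For example, unwrap_paragraphs book.txt > book-unwrapped.txt writes a copy in which every paragraph sits on one line and can reflow freely on a small screen.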
Fixing all these and other issues by hand to make an etext readable, or even printable, is a daunting task, especially with longer texts. Thankfully, the GutenMark tool can take most of the burden off your shoulders. The utility converts Project Gutenberg etexts into neatly formatted HTML or LaTeX files. The goal of the GutenMark project is a tool that produces files requiring no additional cleanup or tweaking. While there is still some way to go before that goal is reached, GutenMark already does a remarkable job of turning etexts into readable and printable files.
Initially, GutenMark was a command-line tool, but the latest version comes with GUItenMark, a graphical interface, and GutenSplit, a tool that splits a single file into separate chapters. All three tools come in a single installer, but before you download and run it, make sure your system has the required packages: glitz, libpng, and libtiff. On Ubuntu, you can install them using the sudo apt-get install libglitz1 libpng libtiff command (package names vary between Ubuntu releases, so if apt-get can't find one of them, look up the current name with apt-cache search). You also need to create a couple of symbolic links, since GutenMark apparently expects older versions of the libtiff and expat libraries:
sudo ln --symbolic /usr/lib/libtiff.so.4 /usr/lib/libtiff.so.3
sudo ln --symbolic /usr/lib/libexpat.so.1 /usr/lib/libexpat.so.0
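If you would rather script the two links, here is a cautious sketch in plain sh (ln -s is the short form of ln --symbolic). It only creates a link when the newer library is actually present and the older name is still free, so rerunning it does no harm. The function name make_compat_links and the LIBDIR override are illustrative additions, not part of GutenMark. Also note that newer 64-bit Ubuntu releases keep these libraries under /usr/lib/x86_64-linux-gnu rather than /usr/lib, so check where yours live first.

```shell
#!/bin/sh
# make_compat_links (hypothetical helper): create the two compatibility
# symlinks. LIBDIR defaults to /usr/lib; run as root for the real thing.
# A link is only made if its target exists and the link name is still free.
make_compat_links() {
  libdir=${LIBDIR:-/usr/lib}
  for pair in "libtiff.so.4 libtiff.so.3" "libexpat.so.1 libexpat.so.0"; do
    set -- $pair            # $1 = installed library, $2 = older name expected
    if [ ! -e "$libdir/$1" ]; then
      echo "missing $libdir/$1; install the package that provides it first" >&2
      continue
    fi
    if [ -e "$libdir/$2" ] || [ -L "$libdir/$2" ]; then
      echo "$libdir/$2 already exists, skipping" >&2
      continue
    fi
    ln -s "$libdir/$1" "$libdir/$2"
  done
}
```

For example, save it as compat-links.sh and run sudo sh -c '. ./compat-links.sh && make_compat_links'.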
Then download the GutenMark installer and make it executable. Rather than using the chmod command, the installation instructions on GutenMark’s Web site recommend right-clicking the installer and ticking the Execute check box. Run the installer, and GutenMark is ready to go.
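If you do prefer the terminal, the chmod route amounts to adding the execute permission bit, which is roughly what the Execute check box does. The sketch below wraps it in a small helper; the function name make_runnable and the installer filename in the example are placeholders, not names taken from the GutenMark site.

```shell
#!/bin/sh
# make_runnable (hypothetical helper): give the file's owner permission to
# execute it, the command-line counterpart of ticking the Execute check box.
make_runnable() {
  if [ ! -f "$1" ]; then
    echo "no such file: $1" >&2
    return 1
  fi
  chmod u+x "$1"
}
```

For example, make_runnable ./GutenMark-installer followed by ./GutenMark-installer, substituting the name of the file you actually downloaded.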
Using the GUI version of GutenMark to convert etexts is rather straightforward. Use the Input Files pane on the left to add one or more etexts, and then configure the conversion options by ticking the desired check boxes. Most of the options are self-explanatory, and you can experiment with different settings to achieve the best results. GutenMark lets you save different settings as profiles. You can, for example, create separate profiles for converting etexts to HTML and to LaTeX, or set up different profiles for different languages.
When converting etexts to HTML, you can split the source file into multiple chapters, which is useful for long books. To enable this feature, tick the Split at headings check box and specify the splitting points. Usually, ticking the H1 (Heading 1) check box works just fine, but you can chop the etext into smaller pieces by also enabling lower heading levels. If you choose to split the etext, make sure you enable the Table of contents option, which creates a separate HTML file with links to the resulting chapters.
To convert the selected etext, press the Arrow button, and the converted files appear in the Output Files pane. You can then open them directly from within GutenMark by double-clicking them. If you prefer to use GutenMark from the command line, the Usage page provides a detailed description of the available command-line options. Even if you stick to the GUI, that page can help you figure out what each option does.
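Before settling on which heading levels to split at, it can help to see how many headings of each level a converted book actually contains. The hypothetical helper count_headings below counts h1, h2 and h3 start tags in an HTML file converted without splitting. It assumes GutenMark marks chapter titles with ordinary HTML heading elements, which is worth checking against your own output.

```shell
#!/bin/sh
# count_headings (hypothetical helper): count <h1>, <h2> and <h3> start tags
# in an HTML file as a rough preview of how many pieces splitting at each
# heading level would produce. Closing tags such as </h1> are not counted.
count_headings() {
  for level in 1 2 3; do
    n=$(grep -o -i "<h$level[ >]" "$1" | wc -l | tr -d ' ')
    echo "H$level: $n"
  done
}
```

For example, count_headings book.html prints one line per level; if H1 yields only a handful of headings in a long book, enabling H2 as well may give more manageable chapters.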
Final Word
Although GutenMark does a formidable job of converting etexts to HTML, a format that can be read on virtually any device, the converted files might still need some manual tweaking. So it’s a good idea to go through each converted file and fix any remaining issues before you load it onto your device. Still, this is a minor nuisance compared to converting an entire etext by hand.
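One quick post-conversion check is to look for straight double quotes that were not turned into smart ones. The hypothetical helper count_straight_quotes below strips tags so that attribute values are not counted, turns &quot; entities back into plain quotes, and counts what remains. It assumes no tag is split across lines, so treat the number as a rough guide rather than an exact figure.

```shell
#!/bin/sh
# count_straight_quotes (hypothetical helper): count straight double quotes
# left in the visible text of an HTML file. Tags are removed first so that
# attribute values like class="x" are ignored; &quot; entities are counted too.
# Crude: assumes no tag spans more than one line.
count_straight_quotes() {
  sed -e 's/<[^>]*>//g' -e 's/&quot;/"/g' "$1" | tr -cd '"' | wc -c | tr -d ' '
}
```

A result of 0 suggests the quotes were all converted; anything else tells you it is worth searching the file for the leftovers before copying it to your reader.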