This series is written by a representative of the latter group, which is composed mostly of what might be called "productivity users" (perhaps "tinkerly productivity users"?). Though my lack of training precludes me from writing code or improving anyone else's, I can nonetheless try to figure out creative ways of utilizing open source programs. And, again because of that lack of expertise, though I may be capable of deploying open source programs in creative ways, my modest technical acumen hinders me from utilizing those programs in the most optimal ways. The open-source character of this series, then, consists in my presenting to the community of open source users and programmers my own crude and halting attempts at accomplishing computing tasks, in the hope that those who are more knowledgeable than I can offer advice, alternatives, and corrections. The desired end result is the discovery, through a communal process, of optimal and/or alternate ways of accomplishing the sorts of tasks that I and other open source productivity users need to perform.

Sunday, March 27, 2016

Miscellaneous Monday quickies: manipulating and excising pages from pdfs

So you've gotten, via inter-library loan, the pdf's you requested to aid you in researching the article you're writing. For purposes of putting them on your e-reading device, the form they're in is probably perfectly suitable. But what if you'd like to do other things with one or more of them, such as printing them out? There is quite a range of pdf manipulation tools that can help you put them in a form more congenial to such aims.

One that I recently discovered, for example, allows you to "n-up" your pdf document, i.e., to put more than one page per sheet of paper. Should you wish to print the document, this can help you lessen paper waste. The utility is called pdfnup, and the relevant command for accomplishing this end is pdfnup --nup 2x2 myfile1.pdf myfile2.pdf, which lays out four pages--a two-by-two grid--on each sheet. Substituting 2x1 for 2x2 gives two pages per sheet, while 4x4 would yield sixteen.
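
For the common case of placing two pages side by side on a landscape-oriented sheet, something along the following lines should, if I understand the utility correctly, do the trick (the file name, of course, is just a placeholder):

pdfnup --nup 2x1 --landscape myfile.pdf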

This utility gives results similar to those of psnup, a utility I have used in the past (and previously written about in this blog) for making booklets comprised of multiple pages per sheet of paper, though pdfnup likely lacks psnup's more advanced collating options. psnup does, however, involve greater complexity in that it operates on postscript files, which usually need to be converted to or from some other format.
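
To illustrate that extra complexity, my psnup routine has looked roughly like the following--converting the pdf to postscript, 2-upping it, then converting back--with hypothetical file names and with the exact options depending on the job at hand:

pdf2ps article.pdf article.ps
psnup -2 article.ps article-2up.ps
ps2pdf article-2up.ps article-2up.pdf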

Getting back to the task at hand, should you wish to print out any of your pdf's with the aim of minimizing paper waste, you may well wish to eliminate extraneous pages from your document. In my experience, for example, inter-library loan pdf documents routinely include one or more copyright notice pages. Before printing such documents, I almost always try to exclude those pages--simple enough if you send them directly to the printer from a pdf reader. But what if you're taking the additional step of n-upping multiple pages per sheet?

As it turns out, pdfnup is actually part of a larger pdf-manipulation suite called pdfjam. And that suite enables you not only to n-up your pdf document, but also to eliminate extraneous pages as part of the same process. To give an example, if you have a fifteen-page document wherein the first and last pages are copyright notices that you wish to exclude from your 2-upped version, you'd use the command

pdfjam MyDoc.pdf '2-14' --nup 2x1 --landscape --outfile MyDoc_2-up.pdf

The meaning of the various command switches will, I think, be fairly obvious: '2-14' selects pages two through fourteen, --nup 2x1 places two pages side by side, --landscape orients each sheet accordingly, and --outfile names the resulting file.
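
It's perhaps worth adding that, if I'm reading the pdfjam documentation correctly, the page specification also accepts comma-separated ranges. So, were there an additional notice page lodged at, say, page eight of that same hypothetical document, a command along the lines of

pdfjam MyDoc.pdf '2-7,9-14' --nup 2x1 --landscape --outfile MyDoc_2-up.pdf

should exclude it along with the first and last pages.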

This is just a thin slice of the capabilities offered by one suite of pdf-manipulation tools available under GNU/Linux. I have used other tools, such as pdfedit, pdftk, and flpsed, to good effect as well.

LATER EDIT: I just discovered a new and interesting functionality of pdfjam: it can convert pdf's from A4 to letter format (and vice versa). The relevant command is pdfjam --paper letterpaper --outfile out.pdf in.pdf (note that the paper name follows LaTeX conventions; substitute a4paper to go in the other direction).

Monday, March 21, 2016

Addendum to the seventh installment: imagemagick as a resource for the budget-constrained researcher

In this installment, I'll cover concatenating multiple image files into a multi-page pdf--a very handy trick the imagemagick utility convert makes possible. But first, a bit of grousing on the subject of academia, budget-constrained researching, and academic publishing.

Pricing for on-line academic resources tends, not surprisingly, to be linked to the budgetary allowances of large academic institutions: what institutions can afford to pay for electronic access to some journal or other, for example, will influence the fee charged to anyone wishing to gain such access. If one is affiliated with such an institution--whether in an ongoing way, such as by being a student, staff, or faculty member, or in a more ephemeral way, such as by physically paying a visit to one's local academic library--one typically need pay nothing at all for such access: the institution pays some annual fee that enables these users to utilize the electronic resource.

But for someone who has no long-term affiliation with such an institution and who may find it difficult to be physically present in its library, some sort of payment may be required. To give an example, while doing some research recently for an article I'm writing, I found several related articles I needed to read. I should mention that, since I am still in the early stages of writing my article, there are bound to be several additional articles to which I will need ongoing access. But consider two journal articles in particular, both of which were published between twenty and thirty years ago.

I discovered that both articles were available through an on-line digital library. I was offered the option of downloading the articles at between twenty and forty dollars apiece. At that rate--since one of my article's main topics has received fairly scant attention in modern times and I might need to review only another twenty or so articles--it could cost me well over six hundred dollars just to research and provide references for this topic. The time I spend actually writing and revising the article--a less tangible cost, and one that will substantially exceed the time spent researching--is an additional "expense" of producing the material.

There are, of course, ways to reduce the more tangible costs. Inter-library loan often proves a valuable resource in this regard, since even non-academic libraries that lack subscriptions to academic publishers or digital libraries can nonetheless request journals or books containing relevant articles or, better yet, obtain such articles for their patrons in electronic format--typically pdf files, often created by scanning paper-and-ink journals or books.

Some digital libraries even offer free--though quite limited--access to their materials. In researching my project I found three articles available from such a site. Upon registration, the site offered free on-line viewing, as low-resolution scans, of just a couple of articles at a time, each remaining viewable for only a few days. Once that limit was reached, no further articles could be viewed until those few days had elapsed. For the budget-constrained researcher, while this is a promising development, it's not an entirely practicable one.

Being able to view an article on a computer screen is a lot better than having no electronic access to it at all. But it is of no help in circumstances where one is without an internet connection. Having the ability to save the article to an e-reader would be preferable and far more flexible than reading it, one page at a time, in a browser window. But the service seems calculated to preclude that option without payment of the twenty- to forty-dollar per-article fee. It turns out, however, that ways around such restrictions can sometimes be discovered. And that, finally, is where the tools mentioned in the first paragraph of this article enter in. Thus, without further ado, on to the technical details.

Some digital libraries actually display each page of the article, as you go about reading it in a web browser window, as a low-resolution image of the scanned page. As I discovered, one can right-click on that image and choose to save it to the local drive. The saved file may have, instead of a humanly-comprehensible name, just a long string of characters and/or numbers, and it may lack any file extension. But I discovered that the page images could, in my case, be saved as png files. Those png files, appropriately named so as to preserve their proper order, could then, using imagemagick tools, be concatenated into a multi-page pdf. That multi-page pdf can then be transferred to the reading device of choice. I found that, although the image quality is quite poor, it is nonetheless sufficient to allow deciphering of even the smaller fonts one typically finds in footnotes. Though it involves a bit of additional time and labor, this tactic can further reduce the budget-constrained researcher's more tangible costs.

For reasons that will become obvious, the image files should be saved to a directory empty of other png files. How the images should be named is essentially a numerical question, dependent on the total number of pages in the article. If the total number of pages is in the single digits, it is a simple matter of naming them, for example, 1.png, 2.png, 3.png, and so forth. If the number of pages reaches double digits--from ten through ninety-nine--leading zeros must be introduced so that every file name consists of a pair of digits, for example 01.png, 02.png, 03.png, and so forth; otherwise a name like 10.png would sort ahead of 2.png and the pages would end up out of order. The same formula would hold for--God forbid, since the task would become quite tedious--articles whose page counts reach triple digits.

Provided imagemagick is already installed, once the saving is done, the very simple formula convert *.png full-article.pdf can be used to produce the pdf of concatenated image files. Since the files have zero-padded numerical names, the shell expands the wildcard in the proper order and convert concatenates the pages accordingly.
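
By way of a concrete run-through (the directory name here being, again, hypothetical), the whole final step amounts to no more than

cd ~/scans/article-pages
ls *.png
convert *.png full-article.pdf

the ls being there merely to confirm, before converting, that the zero-padded names list in the intended order.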

In the next installment I will be covering manipulation of pdf's provided through inter-library loan services--focusing on removal of extraneous pages (e.g., copyright-notice pages) routinely included by those services.