This series is written by a representative of the latter group, which is comprised mostly of what might be called "productivity users" (perhaps "tinkerly productivity users?"). Though my lack of training precludes me from writing code or improving anyone else's, I can, nonetheless, try and figure out creative ways of utilizing open source programs. And again, because of my lack of expertise, though I may be capable of deploying open source programs in creative ways, my modest technical acumen hinders me from utilizing those programs in what may be the most optimal ways. The open-source character, then, of this series, consists in my presentation to the community of open source users and programmers of my own crude and halting attempts at accomplishing computing tasks, in the hope that those who are more knowledgeable than me can offer advice, alternatives, and corrections. The desired end result is the discovery, through a communal process, of optimal and/or alternate ways of accomplishing the sorts of tasks that I and other open source productivity users need to perform.

Wednesday, November 27, 2013

14th installment: home-brewed WOD by e-mail daily

I used to subscribe to an on-line dictionary's word-of-the-day (WOD) program. That entailed signing up, using a valid e-mail address, on their web site so that they would, each day, send a different WOD along with its definition to that address. The service proved to be a bit flaky, however, and the e-mails would sometimes get caught up in my spam filter. So, somewhere along the line--perhaps owing to an e-mail address change--I stopped receiving those educational e-mails.

I'd had in the back of my mind going back to using that service but hadn't signed up again--all the while having a nagging suspicion that it must be possible, using open source tools, to cobble together some way of doing this sort of thing from my own computer, thereby obviating the need to sign up for some service. But could I, with my modest technical acumen, actually pull this off? Read on to find out the result.

Meantime, as I've continued learning my way around GNU/Linux and computing in general, I made some headway in learning how to use the program remind to help me keep track of scheduled appointments and to-do items, even progressing so far as puzzling out how to get my computer to e-mail me my schedule on a regular basis. Perhaps I'll write more about that accomplishment--of which I'm quite proud--in a future entry.

The relevance of that observation to the present post is that I learned how to use the program mail, along with the small msmtp, for sending--when triggered by cron--to myself automated reminder e-mails from my system. So some major ingredients were actually already in place that would allow me finally to implement my own, home-brewed WOD-by-e-mail solution.

This was perhaps the final piece of the puzzle for me, although another crucial piece had been under my nose recently as well, something I ran across while investigating bash functions (I wrote about that a few installments earlier, as you can see here). By adapting one of the bash functions I'd found, I was first able to see the WOD from the command line by simply issuing wod from a command prompt. But I soon began forgetting to do that, which spurred me to consider once again having the WOD somehow e-mailed to me.

Finally, putting two and two together, I realized I could adapt the thrust of that function to my needs by having its output placed into the body of an e-mail that would be automatically sent to me each day at 6 A.M. Following is a description of how I did that.

A key ingredient I have not yet mentioned is the text-mode browser lynx, which produces an html file that gets parsed for material that will be inserted into the e-mail body: and I didn't mention it because lynx and me go back a long, long ways--clear back to the close of the twentieth century, to be precise. The line, swiped straight from the bash function I found on the web, is as follows: lynx -dump http://www.wordthink.com/. That simply "dumps the formatted output of the default document or those specified on the command line to standard output," as the man page tells us--obviously not enough to get a WOD into an e-mail body, but fairly close.

What's needed, then, is, like the bash function, to pipe that output through grep, searching for a certain pattern, then to extract from it the relevant lines which belong in the body of the e-mail. Those results then get piped to mail, which inserts the lines into the body of an e-mail. Below is the full line that I inserted into my crontab file, minus the bit that tells the line to be executed at 6 A.M. daily:

lynx -dump -nonumbers "http://www.wordthink.com/" | grep -A 10 -m 1 "Today's Word of the Day" | mail -s WOD my-addy@my-mail.com

This cron entry tells lynx to dump the page found at the specified link to standard output (whatever that means), then to pipe that through grep, searching for the phrase "Today's Word of the Day." Once that phrase is found, grep is to stop searching (the -m 1 switch--it's to look for only one instance) and to "print" the ten lines preceding (the -A 10 switch), which actually then get piped to mail and become the body of the message. The -s switch specifies what the subject line of the e-mail should be. The -nonumbers switch just tells lynx not to preface the links it finds in the page with numerals between square brackets, which it would otherwise do.

That's about it for this entry. I really do need to write up a remind entry, since I had hoped long and hard to find some scheduling utility that would be no-frills, yet powerful enough to issue me reminders on a regular basis. So that may be next on the agenda for this blog.

Some afterthoughts: piping the output of lynx -dump through grep to extract target text is not ideal, since--if I've read its man page correctly--you are limited to extracting text by line. A problem arises here because the number of lines for the target entries for the WOD can vary day-by-day. As a result, it is likely that either extraneous line(s) will be included on many days, or that some target line(s) will get cut off on other days. Perhaps piping the lynx -dump output through sed or awk--which, as I understand it are both far more flexible when it comes to identifying target text--might be a better solution. But because I am not well-versed in either of those utilities and because extracting the WOD from web sites whose layout may change at any time is a moving target, I am presently not attempting to improve on the method I've described here. I do, on the other hand, welcome suggestions for improving this WOD-by-e-mail solution from any reader who may know those--grep included--or other utilities better than I.

No comments:

Post a Comment