One of the neatest features of the gnus mail and newsreading package for emacs is its ability to expand digests into individual messages that can be read with the full power of the newsreader. What's really cool about the feature is that it's extensible: you can write rules to describe new digest formats. That's especially handy in the modern world, where too many publishers think that RFC 1153 shouldn't apply to them.
The downside is that it's not at all easy to write these rules. The documentation is terse, to say the least, and when you screw up, the error messages are monumentally unhelpful. This Web page is an attempt to rectify that situation by teaching you how to write and debug digest rules. I also provide links to all the rulesets I've written.
If you want to undigestify something, the easiest approach is to use somebody else's work. :-) The second easiest is to adapt something that already exists. If one of the following rulesets matches your needs, just slap it into your .gnus file and you're done. I like to put my rulesets into an nndoc subdirectory in my search path, and then use the following code in my .gnus file to pick them up:
    (require 'nndoc-generic-functions "nndoc/nndoc-generic-functions.el")
    ;; Load every ruleset (*.el file) found in ~/elisp/nndoc.
    (dolist (file (directory-files "~/elisp/nndoc" nil "\\.el\\'"))
      (load-library (concat "nndoc/" (file-name-sans-extension file))))
If no ruleset fits, try adapting something that's close before you start from scratch. (Note that some of the following rulesets are early efforts that don't do as much as some later ones. See the rulesets for Yahoo! Groups and Crypto-Gram for some useful techniques.)
Some rulesets are no longer maintained. I apologize if they don't work; I've stopped receiving those lists and so I'm not able to fix them.
Anyone who wishes to contribute additional rulesets is welcome to e-mail them to me. Please name them .el and include only one ruleset per file.
This section is intended as a supplement to the GNUS documentation. Before you read this, you should familiarize yourself with what the Texinfo files have to say about adding new digests. If there's something you don't understand there, I suggest you don't try to puzzle it out, because it may become clearer here. It may be useful to reread the documentation after reading this page.
The basic idea of a new ruleset is that you must describe to nndoc how to find the beginning and ending of each article in the digest. Ideally, this is done with a few regular expressions. Sometimes (all too often, it seems) you will also have to write code that converts a badly formatted article into a more readable one.
How nndoc Parses a Digest
The most important part of writing a ruleset is understanding the exact way gnus (i.e., the nndoc package) goes about turning a digest into individual messages. This process is very complex because it has tons of options. You need to know about all of the options, though, because they are the key to getting your ruleset to work correctly.
Digest processing is divided into two parts: dissection and display. During dissection, nndoc figures out exactly where each message starts and ends in the digest. The output of this process is an association list ("alist") that describes each individual message as a set of offsets. See the comments about nndoc-dissection-alist in the nndoc source code for more information. This step is usually the killer; it's very hard to get it exactly right.
The second processing step happens during display. Here, the message is extracted from the digest (which is easy because of the offsets generated in step 1) and then reformatted for display. This is where you can make things look nice.
Dissection is performed by the function nndoc-dissect-buffer. Understanding this function is key to writing correct rulesets. If you have problems, this is also the function to step through in the debugger. The output of nndoc-dissect-buffer is the alist mentioned above.
The steps performed by nndoc-dissect-buffer are as follows:
Preparation is performed once per digest:
If dissection-function is defined, call it and return the result, skipping all the other steps listed below.
If a file-begin pattern is defined, search for it.
Dissection is performed in a loop, until there are no more messages (articles) in the digest. In all cases, the term "bol-search" means "Search for the given regular expression, and set point to the beginning of the line containing it. If the regular expression is not found, set point to the beginning of the current line." The dissection loop is:
1. Find the beginning of the next article:
   (1) If this is the first article and first-article is defined, bol-search for it.
   (2) Otherwise, if article-begin-function is defined, call it. Note that there is no first-article-function. However, the free variable first is available to article-begin-function and is t for the first article, so the effect of a first-article-function can be achieved by testing first.
   (3) Otherwise, bol-search for article-begin.
2. If there is a head-begin-function, call it. Otherwise, if head-begin is defined, bol-search for it.
3. If file-end is defined and we are looking at file-end, terminate the loop. (Note that this means file-end must always match from the beginning of a line, no matter how the digest is formatted.)
4. Bol-search for head-end (default is "^$", i.e., a blank line). Save this as the end of the article header.
5. If body-begin-function is defined, call it to find the beginning of the body. Otherwise, bol-search for body-begin (default "^\n"). Save the result as the beginning of the article body. Note that this step can potentially cause information to be ignored between the article header and body. Also note that because the pattern includes a newline instead of a dollar sign, the position saved is after the blank line rather than at it.
6. Find the end of the article body:
   (1) If body-end-function is defined, call it and use the resulting value of point. Note that body-end-function must return a non-nil value or the following steps will be executed.
   (2) Otherwise, if body-end is defined, bol-search for it.
   (3) Otherwise (no body-end-function or body-end), search for the beginning of the following article using the procedure in Step 1 above, subparts (2) and (3).
   (4) If that fails and file-end is defined, search backwards for it and go to the beginning of that line.
7. If generate-head-function is defined, call it to generate fake headers for the article. Otherwise, simply grab the lines between the beginning and end of the article header and call them the headers. In either case, add a "Lines:" header with a calculated line count. (Note: the important header material depends on what you show in your summary buffer. Typically, "Subject:", "From:", and maybe "Date:" are useful things to generate.)
Whew! That's a complicated mess. Fortunately, you often don't need to understand it in detail. It's documented above in case you need to debug something. But the general summary is:
nndoc prefers a -function over a pattern.
nndoc uses first-article as the pattern for article #1.
That makes it much simpler, right?
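To make the loop concrete, here is a deliberately simplified sketch of dissection. It is not the real nndoc code: my-bol-search and my-dissect are hypothetical names, only a plain article-begin pattern is supported (no -function options, no first-article, file-begin, file-end, or body-end), and the shape of each result entry is an assumption based on how the alist is described above.

```elisp
;; Simplified sketch of the dissection loop; all my- names are hypothetical.
(defun my-bol-search (regexp)
  "Search forward for REGEXP, then move to the beginning of the line.
Return non-nil if REGEXP was found."
  (prog1 (re-search-forward regexp nil t)
    (beginning-of-line)))

(defun my-dissect (article-begin)
  "Dissect the current buffer using the regexp ARTICLE-BEGIN.
Return a list of (N HEAD-BEGIN HEAD-END BODY-BEGIN BODY-END LINES)."
  (let ((n 0) result head-begin head-end body-begin body-end)
    (save-excursion
      (goto-char (point-min))
      (while (and (my-bol-search article-begin)
                  ;; Guard: stop if we found the same article again.
                  (not (eql (point) head-begin)))
        (setq head-begin (point))
        (my-bol-search "^$")            ; default head-end
        (setq head-end (point))
        (my-bol-search "^\n")           ; default body-begin
        (setq body-begin (point))
        ;; No body-end: the body runs to the next article or to the
        ;; end of the buffer.
        (unless (my-bol-search article-begin)
          (goto-char (point-max)))
        (setq body-end (point))
        (setq n (1+ n))
        (push (list n head-begin head-end body-begin body-end
                    (count-lines body-begin body-end))
              result)))
    (nreverse result)))
```

Calling (my-dissect "^From: ") in a buffer holding a simple digest returns one entry per message. Note that the body of each article ends exactly where the next article-begin line starts, which is why a multi-line article-begin pattern causes trouble when there is no body-end.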
The second layer of processing comes when it's time to display the article. This is much simpler:
If prepare-body-function is defined, call it.
If article-transform-function is defined, call it.
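The display step can be sketched the same way. This is a simplified illustration, not the real nndoc code: my-display-article and its arguments are hypothetical, the entry layout (NUMBER HEAD-BEGIN HEAD-END BODY-BEGIN BODY-END LINES) is an assumption, and the two optional function arguments merely stand in for prepare-body-function and article-transform-function.

```elisp
;; Simplified sketch of the display step; my-display-article is hypothetical.
(defun my-display-article (digest-buffer entry &optional prepare-body transform)
  "Return the text of the article described by ENTRY in DIGEST-BUFFER.
PREPARE-BODY is called with no arguments, with point at the start of
the body.  TRANSFORM is then called with the article number."
  (with-temp-buffer
    ;; Copy the header, a separating blank line, then the body.
    (insert-buffer-substring digest-buffer (nth 1 entry) (nth 2 entry))
    (insert "\n")
    (let ((body-start (point)))
      (insert-buffer-substring digest-buffer (nth 3 entry) (nth 4 entry))
      (goto-char body-start))
    (when prepare-body (funcall prepare-body))
    (when transform (funcall transform (car entry)))
    (buffer-string)))
```

Because the article is rebuilt from offsets each time, anything the two functions do affects only the displayed copy, not the digest itself.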
I've found that the most important detail is that article-transform-function needs to produce "proper" headers. For example, the subject should be preceded by "Subject: " (including the blank). I also find it very useful to create "From:", "Cc:", and "Reply-To:" lines designed so that I can just use the "reply" and "wide reply" features to reply to article authors or the entire mailing list. Thus, for example, when I recognize Yahoo! group digests I save the group name in nndoc-yahoo-groups-cc, and the nndoc-transform-yahoo-groups-article function inserts a CC: line to that group. The result is that I can reply to an individual or wide-reply to the entire group, as needed.
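As an illustration of the idea (this is not the Yahoo! Groups code), here is a minimal sketch of a transform function. The names my-nndoc-example-cc and my-nndoc-transform-example-article, and the list address, are made up. The sketch assumes that the function is called with the article number as its only argument and that the article being displayed is the current buffer.

```elisp
;; Hypothetical sketch of an article-transform-function.
(defvar my-nndoc-example-cc "list@example.invalid"
  "Address of the mailing list the current digest came from (made up).")

(defun my-nndoc-transform-example-article (_article)
  "Give the article in the current buffer Subject: and Cc: headers."
  (save-excursion
    (goto-char (point-min))
    ;; Assume the first line is the bare subject unless it already
    ;; starts with "Subject: ".
    (unless (looking-at "Subject: ")
      (insert "Subject: "))
    ;; Add the list address just before the blank line ending the headers.
    (when (re-search-forward "^$" nil t)
      (insert "Cc: " my-nndoc-example-cc "\n"))))
```

With a Cc: line in place, a wide reply goes to both the author and the list without any editing of the headers.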
Here's a summary of all the options you can set for an nndoc digest type. All "find" functions can leave point anywhere in the line found; nndoc will move to the beginning of that line before proceeding. Unless otherwise specified, all options are "if defined"; the default is to simply do nothing. Also, all patterns and functions are used during dissection, with the exception of prepare-body-function and article-transform-function, which are used during display.
point should be somewhere on the first meaningful line of the article. NOTE: it may be necessary for this pattern to also match nndoc-file-end, so that the EOF check in step 3 above can work.
prepare-body-function. Note that if necessary, you can extract information from the original unparsed article; see the Google Groups code for an example.
first-article. The difference is that first-article can stand entirely alone, while file-begin is followed by a search for either first-article (if defined) or
file-end will only work properly if either (a)
body-end are undefined, or (b) the
first-article unset and use file-begin to skip past the garbage at the front of the file.
nndoc-current-buffer to extract relevant information, then return to the original buffer and insert generated headers there. This function must modify the article buffer. Use an existing one as a guide for writing your own.
nndoc-file-end, so that the EOF check in step 3 above can work.
point-min if it wants to muck with the article headers as well; in this sense it duplicates
Rulesets are hard to write correctly. No matter how hard you try, you'll make mistakes, and then you're stuck with figuring out what went wrong.
One thing to remember is that nndoc caches some information for speed. Whenever you change your rulesets, go to a different article than the one you're working on, and type "C-d" to enter it. It doesn't matter if it's a digest or not; the point is to get nndoc to clear its cache. Then return to the article in question and try it again.
Some mistakes happen over and over again. Here are some common problems and suggested solutions:
Go into the *Article* buffer and check your patterns. Every option listed above is saved in nndoc-option-name. For example, the head-begin pattern is in nndoc-head-begin. You can use ESC : to execute an Elisp expression that experiments with those patterns. For example, use ESC : (re-search-forward nndoc-first-article) RET to see if you're correctly finding the first article in the digest. Remember that point must wind up on the first line of the article header (unless head-begin-function is going to correct it).
head-begin pattern that skips past the article beginning found by
head-begin should be unset.
An article-begin pattern that matches multiple lines, but no body-end pattern. The result is that the end of the body extends into the beginning of the following article, so that a subsequent article-begin search won't find the beginning of that article. The solution is to define a body-end pattern that matches only the first line of the article-begin pattern, or to define a body-end-function that finds the beginning of the proper area. I often use the following:
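    (defun nndoc-generic-body-end ()
      ;; Find the next article (or the end of the digest), back up over
      ;; trailing whitespace, and leave point just after the body's last
      ;; newline.
      (and (re-search-forward (concat nndoc-article-begin "\\|" nndoc-file-end)
                              nil t)
           (goto-char (match-beginning 0))
           (skip-chars-backward " \t\n")
           (if (eq (following-char) ?\n)
               (forward-char 1)))
      ;; Always return non-nil so nndoc uses the resulting value of point.
      t)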
head-begin pattern that skips past the article beginning found by
head-begin should be unset.
The head-end pattern takes you into the article body, so that the body-begin pattern matches the blank line at the end of the article. Then body-end matches that same place.
generate-head-function isn't creating plausible RFC-compliant headers.
article-transform-function. In the absence of proper headers, gnus guesses that the first line of the article is a subject. But if the subject has a colon in it, gnus gets confused. The solution is simple: insert "Subject: " (with the blank) in front of the first line.
If the above hints don't get you going, you're kind of up a creek. It would be nice if there were some special functions to help debugging. For example, it would be really cool to be able to go into an article buffer, type M-x nndoc-show-markers RET, and see colorization that describes how nndoc parsed the buffer.
Until then, you have two tools: experimenting with individual parameters, and stepping through the relevant code.
The very first thing to do is to verify that your -type-p function works. Go into *Article* and type ESC : (nndoc-foo-type-p) RET where foo is the name of your added type (e.g., technews-summary). It should return t. If not, fix that function so that it correctly recognizes your digest. Be as selective as possible; you don't want your TechNews recognizer to try to parse RFC 1153-compliant digests.
If your type-recognition function seems to work, double-check it by looking at the contents of nndoc-article-type. If that's wrong, some other type may have beaten you to the punch. Use the second argument of nndoc-add-type to control this problem. Also, remember that if the type-recognition function returns a number, it's taken as a priority, so be sure it returns t if it's certain it's found the correct type.
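Here is a minimal sketch of a complete (if tiny) type definition. Everything specific in it is made up for illustration: the example-digest type, the "X-Example-Digest:" header that the recognizer looks for, and the "--- Message N ---" separator lines. It assumes that nndoc finds the recognizer by the name nndoc-TYPE-type-p and that a second argument of first to nndoc-add-type puts the definition at the front of the list, which is how I read the nndoc-add-type docstring.

```elisp
(require 'nndoc)

;; Hypothetical recognizer: be selective, and return t only when certain.
(defun nndoc-example-digest-type-p ()
  "Return t if the current buffer looks like a (made-up) Example digest."
  (save-excursion
    (goto-char (point-min))
    (and (re-search-forward "^X-Example-Digest: " nil t)
         t)))

;; Hypothetical definition.  article-begin ends in a newline so that point
;; lands on the first header line; body-end matches only the separator
;; line itself, so the separator stays out of the article body.
(nndoc-add-type
 '(example-digest
   (article-begin . "^--- Message [0-9]+ ---\n")
   (body-end . "^--- Message [0-9]+ ---$"))
 'first)
```

After evaluating this, ESC : (nndoc-example-digest-type-p) RET in a buffer holding such a digest should return t, and nil everywhere else.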
The next step is to check all your patterns. In *Article*, search for each pattern you defined. If the type recognizer succeeded, each pattern will be saved in a variable with the same name, preceded by nndoc-. So, for example, type ESC : (re-search-forward nndoc-first-article) RET. Make sure each pattern matches what it's supposed to, and that it leaves point somewhere in the line that's at the beginning or end of the header or body, as appropriate.
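Checking patterns one at a time gets tedious, so a small helper can do the searching in one go. The following is a hypothetical sketch, not part of nndoc. It searches for each pattern from the top of the buffer, which is cruder than what nndoc does (nndoc searches from wherever the previous step left point), but it quickly shows patterns that never match at all.

```elisp
;; Hypothetical debugging helper; not part of nndoc.
(defun my-nndoc-check-patterns (patterns)
  "Report where each regexp in PATTERNS first matches in the current buffer.
PATTERNS is an alist of (NAME . REGEXP).  Return an alist of
\(NAME . LINE), where LINE is the number of the line on which point
ends up after the match, or nil if the regexp is nil or does not match."
  (mapcar (lambda (pair)
            (save-excursion
              (goto-char (point-min))
              (cons (car pair)
                    (and (cdr pair)
                         (re-search-forward (cdr pair) nil t)
                         (line-number-at-pos)))))
          patterns))
```

In the *Article* buffer you might call it as (my-nndoc-check-patterns (list (cons 'first-article nndoc-first-article) (cons 'head-end nndoc-head-end))), adding whichever options your ruleset defines.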
If none of this helps, you need the debugger. Before you start debugging, make sure you have non-compiled code by explicitly loading the file "nndoc.el" (M-x locate-library can help you find it). In the group summary buffer, select the digest and use C-u g to get the "raw" version that nndoc looks at. Then use M-x debug-on-entry RET nndoc-dissect-buffer RET to set a breakpoint. Type C-d to enter the digest, hit "d", then "c". At this point the buffer should have been dissected, and the results are available in the variable nndoc-dissection-alist. You can look at the values with ESC : nndoc-dissection-alist RET or (better) go into the *scratch* buffer to look at it. The alist will be too long to see all of it, but you can check some of the values to see if they look reasonable. Copy those values into another window (I like to copy and paste into "cat >/dev/null" in a shell window to record this sort of information). You can then go into the *Article* buffer and use M-x goto-char RET to go to the various places in the buffer and see if they seem reasonable.
If you have trouble generating the alist, or if it looks very wrong, you can step through your dissection functions (if any) or nndoc-dissect-buffer itself. While stepping, the command ESC : (switch-to-buffer nndoc-current-buffer) RET will put you into the buffer that is being dissected, so you can look at what the functions are seeing. Likewise, ESC : (switch-to-buffer nntp-server-buffer) RET will put you into the article buffer that is being built. Note that the latter includes special NNTP codes; those aren't a mistake.
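Before stepping through code, it can help to check the alist mechanically for offsets that can't be right. The helper below is a hypothetical sketch; it assumes each entry has the form (NUMBER HEAD-BEGIN HEAD-END BODY-BEGIN BODY-END LINES) and simply flags entries whose offsets are out of order or that start before the previous article's body ends.

```elisp
;; Hypothetical helper; not part of nndoc.
(defun my-nndoc-suspicious-entries (alist)
  "Return the entries of ALIST whose offsets look wrong.
An entry is flagged if its offsets are not in increasing order, or
if it begins before the previous entry's body ends."
  (let ((prev-end 1) bad)
    (dolist (entry alist)
      (unless (<= prev-end (nth 1 entry) (nth 2 entry)
                  (nth 3 entry) (nth 4 entry))
        (push entry bad))
      (setq prev-end (nth 4 entry)))
    (nreverse bad)))
```

Evaluating (my-nndoc-suspicious-entries nndoc-dissection-alist) after dissection returns nil when nothing is obviously wrong; any entries it does return are good places to start looking with M-x goto-char.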
If the alist looks OK and you can get a group summary, but can't see an individual article correctly, you probably have display-related problems. Use M-x cancel-debug-on-entry RET nndoc-dissect-buffer RET to turn off debugging, then M-x debug-on-entry RET nndoc-request-article RET to set a new breakpoint. This time, use only d to step through the function. After the second time insert-buffer-substring is called, you can use ESC : (switch-to-buffer buffer) RET to temporarily get into the scratch buffer where the article is being built. This will let you see what the transformation functions are about to work on. Use C-x b RET to return to the debugger buffer, and step through your own code with d. At any time, you can see the current state of the article (and the location of point) by repeating the ESC : (switch-to-buffer buffer) RET command. (A handy shortcut is C-x ESC ESC, which repeats the last command; often, that will be the switch-to-buffer command, or at least you can get there with a few M-p keystrokes.)
Finally, if you are getting inexplicable behavior (i.e., the changes you make don't seem to take effect, or you breakpoint on a function that you know is being called and the debugger isn't entered), try exiting GNUS and reentering. Sometimes, stuff gets cached in weird places.
It's fair to say that the debugging process is sometimes painful. However, the end result is well worth it: you type C-d on a big digest with tons of messages, and they're nicely broken up (and even threaded) for your reading convenience.
This text is explicitly placed in the public domain. Feel free to use it, extend it, modify it, abuse it, or destroy it as you wish.