
6.5 Browsing the Web

Web-based discussion forums are getting more and more popular. On many subjects, the web-based forums have become the most important forums, eclipsing the importance of mailing lists and news groups. The reason is easy to understand—they are friendly to new users; you just point and click, and there’s the discussion. With mailing lists, you have to go through a cumbersome subscription procedure, and most people don’t even know what a news group is.

The problem with this scenario is that web browsers are not very good at being newsreaders. They do not keep track of what articles you’ve read; they do not allow you to score on subjects you’re interested in; they do not allow off-line browsing; they require you to click around and drive you mad in the end.

So—if web browsers suck at reading discussion forums, why not use Gnus to do it instead?

Gnus has been getting a bit of a collection of back ends for providing interfaces to these sources.

The main caveat with all these web sources is that they probably won’t work for a very long time. Gleaning information from the HTML data is guesswork at best, and when the layout is altered, the Gnus back end will fail. If you have reasonably new versions of these back ends, though, you should be ok.

One thing all these Web methods have in common is that the Web sources are often down, unavailable or just plain too slow to be fun. In those cases, it makes a lot of sense to let the Gnus Agent (see section Gnus Unplugged) handle downloading articles, and then you can read them at leisure from your local disk. No more World Wide Wait for you.



6.5.1 Archiving Mail

Some of the back ends, notably nnml, nnfolder, and nnmaildir, now actually store the article marks with each group. For these servers, archiving and restoring a group while preserving marks is fairly simple.

(Preserving the group level and group parameters as well still requires ritual dancing and sacrifices to the ‘.newsrc.eld’ deity though.)

To archive an entire nnml, nnfolder, or nnmaildir server, take a recursive copy of the server directory. There is no need to shut down Gnus, so archiving may be invoked by cron or similar. You restore the data by restoring the directory tree, and adding a server definition pointing to that directory in Gnus. The Article Backlog, Asynchronous Article Fetching and other things might interfere with overwriting data, so you may want to shut down Gnus before you restore the data.
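As a sketch, restoring such an archive can be as simple as copying the tree back and adding a server definition that points at it, e.g. in ‘~/.gnus.el’ (the server name and directory below are made-up examples):

;; Hypothetical server definition for a restored nnml archive.
(add-to-list 'gnus-secondary-select-methods
             '(nnml "restored-archive"
                    (nnml-directory "~/News/backup/")
                    (nnml-active-file "~/News/backup/active")))

After restarting Gnus, the groups from the restored tree should show up under this server.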



6.5.2 Web Searches

It’s, like, too neat to search the Usenet for articles that match a string, but it, like, totally sucks, like, totally, to use one of those, like, Web browsers, and you, like, have to, rilly, like, look at the commercials, so, like, with Gnus you can do rad, rilly, searches without having to use a browser.

The nnweb back end allows an easy interface to the mighty search engine. You create an nnweb group, enter a search pattern, and then enter the group and read the articles like you would any normal group. The G w command in the group buffer (see section Foreign Groups) will do this in an easy-to-use fashion.

nnweb groups don’t really lend themselves to being solid groups—they have a very fleeting idea of article numbers. In fact, each time you enter an nnweb group (not even changing the search pattern), you are likely to get the articles ordered in a different manner. Not even using duplicate suppression (see section Duplicate Suppression) will help, since nnweb doesn’t even know the Message-ID of the articles before reading them using some search engines (Google, for instance). The only possible way to keep track of which articles you’ve read is by scoring on the Date header—mark all articles posted before the last date you read the group as read.
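As a sketch of that Date trick, a score-file entry along the following lines could be used. The exact syntax is an assumption here (it relies on the ‘before’ date match type; see section Score File Format), and the cut-off date and score are made up; pair it with something like gnus-summary-mark-below so that low-scored articles end up marked as read:

;; Hypothetical score-file contents for an nnweb group: score down
;; everything posted before the given date.
(("date"
  ("Sat, 01 Jan 2005 00:00:00 +0000" -1000 nil before)))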

If the search engine changes its output substantially, nnweb won’t be able to parse it and will fail. One could hardly fault the Web providers if they were to do this—their raison d’être is to make money off of advertisements, not to provide services to the community. Since nnweb washes the ads off all the articles, one might think that the providers might be somewhat miffed. We’ll see.

Virtual server variables (a combined example follows the list):

nnweb-type

What search engine type is being used. The currently supported types are google, dejanews, and gmane. Note that dejanews is an alias to google.

nnweb-search

The search string to feed to the search engine.

nnweb-max-hits

Advisory maximum number of hits per search to display. The default is 999.

nnweb-type-definition

Type-to-definition alist. This alist says what nnweb should do with the various search engine types. The following elements must be present:

article

Function to decode the article and provide something that Gnus understands.

map

Function to create an article number to message header and URL alist.

search

Function to send the search string to the search engine.

address

The address the aforementioned function should send the search string to.

id

Format string URL to fetch an article by Message-ID.
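Putting these variables together, a hand-written virtual server definition might look like the sketch below; the server name and search string are made-up examples, and in practice you would normally let G w set this up for you:

;; Hypothetical nnweb virtual server definition.
(add-to-list 'gnus-secondary-select-methods
             '(nnweb "web-search"
                     (nnweb-type google)
                     (nnweb-search "gnus nnweb")
                     (nnweb-max-hits 100)))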



6.5.3 RSS

Some web sites have an RDF Site Summary (RSS). RSS is a format for summarizing headlines from news-related sites (such as BBC or CNN). But basically anything list-like can be presented as an RSS feed: weblogs, changelogs or recent changes to a wiki (e.g., http://cliki.net/site/recent-changes).

RSS has a quite regular and nice interface, and it’s possible to get the information Gnus needs to keep groups updated.

Note: you had better use an Emacs that supports the utf-8 coding system, because RSS uses UTF-8 for encoding non-ASCII text by default. UTF-8 is also used by default for non-ASCII group names.

Use G R from the group buffer to subscribe to a feed—you will be prompted for the location, the title and the description of the feed. The title, which allows any characters, will be used for the group name and the name of the group data file. The description can be omitted.

An easy way to get started with nnrss is to say something like the following in the group buffer: B nnrss RET RET y, then subscribe to groups.

The nnrss back end saves the group data file for each nnrss group in nnrss-directory (see below). File names containing non-ASCII characters will be encoded using the coding system given by the nnmail-pathname-coding-system variable (or a related one). See section Accessing groups of non-English names for more information.

The nnrss back end generates ‘multipart/alternative’ MIME articles, each of which contains a ‘text/plain’ part and a ‘text/html’ part.

You can also use the following commands to import and export your subscriptions in OPML format (Outline Processor Markup Language).

Function: nnrss-opml-import file

Prompt for an OPML file, and subscribe to each feed in the file.

Function: nnrss-opml-export

Write your current RSS subscriptions to a buffer in OPML format.
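For non-interactive use, a minimal sketch (the file name is a made-up example):

;; Subscribe to every feed listed in a hypothetical OPML file.
(nnrss-opml-import "~/my-feeds.opml")
;; Write the current subscriptions to a buffer in OPML format.
(nnrss-opml-export)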

The following nnrss variables can be altered:

nnrss-directory

The directory where nnrss stores its files. The default is ‘~/News/rss/’.

nnrss-file-coding-system

The coding system used when reading and writing the nnrss groups data files. The default is the value of mm-universal-coding-system (which defaults to emacs-mule in Emacs or escape-quoted in XEmacs).

nnrss-ignore-article-fields

Some feeds constantly update certain article fields, e.g., to indicate the number of comments. However, if the local article differs from the remote one in any field, the remote one is considered new. To avoid treating such updates as new articles, set this variable to the list of fields to be ignored in the comparison. The default is '(slash:comments).

nnrss-use-local

If you set nnrss-use-local to t, nnrss will read the feeds from local files in nnrss-directory. You can use the command nnrss-generate-download-script to generate a download script using wget.
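For example, to switch to local files and then create such a script (a minimal sketch; arranging for the generated script to run from cron is up to you):

;; Read feeds from files already present in nnrss-directory
;; rather than fetching them over the network.
(setq nnrss-use-local t)
;; Generate a wget-based download script interactively:
;;   M-x nnrss-generate-download-script RET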

The following code may be helpful if you want to show the description in the summary buffer.

 
;; Carry the feed description along as an extra header.
(add-to-list 'nnmail-extra-headers nnrss-description-field)
;; %uX in the summary line calls gnus-user-format-function-X below.
(setq gnus-summary-line-format "%U%R%z%I%(%[%4L: %-15,15f%]%) %s%uX\n")

(defun gnus-user-format-function-X (header)
  ;; Return the description from the extra headers, if present,
  ;; indented on a line of its own.
  (let ((descr
         (assq nnrss-description-field (mail-header-extra header))))
    (if descr (concat "\n\t" (cdr descr)) "")))

The following code may be useful to open an nnrss URL directly from the summary buffer.

 
(require 'browse-url)

(defun browse-nnrss-url (arg)
  "Browse the URL of the current article, or scroll up if it has none."
  (interactive "p")
  (let ((url (assq nnrss-url-field
                   (mail-header-extra
                    (gnus-data-header
                     (assq (gnus-summary-article-number)
                           gnus-newsgroup-data))))))
    (if url
        (progn
          (browse-url (cdr url))
          (gnus-summary-mark-as-read-forward 1))
      (gnus-summary-scroll-up arg))))

(eval-after-load "gnus"
  '(define-key gnus-summary-mode-map
     (kbd "RET") 'browse-nnrss-url))
(add-to-list 'nnmail-extra-headers nnrss-url-field)

Even if you have added ‘text/html’ to the mm-discouraged-alternatives variable (see section ‘Display Customization’ in The Emacs MIME Manual) because you don’t want to see HTML parts elsewhere, it can still be useful to display ‘text/html’ parts in nnrss groups. Here’s an example of setting mm-discouraged-alternatives as a group parameter (see section Group Parameters) in order to display ‘text/html’ parts only in nnrss groups:

 
;; Set the default value of mm-discouraged-alternatives.
(eval-after-load "gnus-sum"
  '(add-to-list
    'gnus-newsgroup-variables
    '(mm-discouraged-alternatives
      . '("text/html" "image/.*"))))

;; Display ‘text/html’ parts in nnrss groups.
(add-to-list
 'gnus-parameters
 '("\\`nnrss:" (mm-discouraged-alternatives nil)))

