
InterNet News How-To
| Abstract | This document is an attempt to plot out a deployment road map for INN 2.1. It needs to be revised upon an upgrade. New documentation for INN 2.3 and higher can be found here; it is not 100% complete yet, but there's more there than here. |
| Author | Elena Samsonova - World Access / Planet Internet |
| Date | December 18, 1998 |
| Version | 1.1 |
Since I only run one particular configuration and often follow plain vanilla choices, I cannot know all the intricacies of other sites; therefore, contributions are welcome! Authors' rights will be preserved, in the sense that each contributor's name will appear next to his/her contribution.
Parts of the document that are not yet completed are marked in blue. They ask for a contribution!
This document is not a transcript of the manuals, and by no means a replacement for them! Although this may be a great disappointment to those who don't like to RTFM :), I'm still not going to copy and paste manuals into here. Instead, every section that talks about programs or scripts lists the relevant manual pages on the right. As one of the greatest difficulties in installing INN is finding the right man page, this is supposed to help.
One of the best sources I have found so far, and the one I used most, is presented by Forrest J. Cavalier, Mib Software (http://www.mibsoftware.com/userkt).
Other nice pages???
Many thanks to the following people who sent me very useful comments:
Some functionality is not covered in this How-To simply because I have insufficient knowledge of it, so please contribute!! At least the following parts are not covered:
System Architecture
INN is a package of various programs and scripts meant for different purposes, and different system and server configurations may require different programs. This section describes several of the most commonly used configurations, as far as I have received input on those that I don't run myself.
For now, the section contains two configurations: centralized, and distributed with multiple separate readers and a single feeder. Input about other configurations is welcome.
In a centralized system, one machine runs a set of programs that handle incoming feed, outgoing feed and user connections for reading and posting. A stand-alone configuration is also possible when the server is used for internal purposes only and no incoming or outgoing feed is needed.
In a distributed system not every machine runs the same programs. INN, being a package of multiple programs working together, offers many mix-and-match possibilities.
Figure 1 depicts the overall architecture of a centralized system. This section is meant to shed some light on the interaction of the processes on the system; it does not explain how to get those processes to behave this way. See section Implementation Guide for further details.
| The system runs an innd daemon, which handles incoming feeds, manages the active and history files as well as the article spool, and listens on port 119 to accept user connections. For each accepted connection it spawns a child nnrpd process, which handles further interaction with the user. |
| Each nnrpd process reads the active and history files to find article information, fetches requested articles from the spool and sends them to the user. It also accepts user postings. |
| User postings are first pulled through a filter, filter_nnrpd, which is a Perl script. It is loaded when nnrpd starts up for subsequent use. The filter may reject certain postings, in which case the user gets an error back (see the note in the reader description below). If a posting passes through the filter, nnrpd passes it on to innd. nnrpd does not attempt to store user postings in the spool. |
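To make the posting path concrete, here is a minimal Python model of it. It is a sketch under stated assumptions, not nnrpd's code: the real filter_nnrpd is a Perl script, and the function names, the banned-phrase rule and the convention of returning an empty string to accept are hypothetical here. The status codes 240 and 441 are the standard NNTP posting responses from RFC 977.

```python
# Hypothetical Python model of nnrpd's posting path. The real filter
# (filter_nnrpd) is a Perl script; all names here are illustrative only.

BANNED_PHRASES = ("MAKE MONEY FAST",)  # hypothetical local policy rule


def posting_filter(headers: dict, body: str) -> str:
    """Return "" to accept, or a rejection reason (assumed convention)."""
    if "Newsgroups" not in headers:
        return "Missing Newsgroups header"
    if any(phrase in body.upper() for phrase in BANNED_PHRASES):
        return "Posting rejected by local policy"
    return ""


def handle_post(headers: dict, body: str, forward) -> str:
    """Run the filter, then hand accepted postings to innd via `forward`.

    Returns the NNTP status line sent back to the user:
    240 on success, 441 on failure (RFC 977 codes).
    """
    reason = posting_filter(headers, body)
    if reason:
        return f"441 {reason}"  # the user sees the error
    forward(headers, body)      # nnrpd never writes to the spool itself
    return "240 Article posted"
```

The forward callable stands in for the connection to innd, which keeps the model free of any networking.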
| innd is configured to accept incoming feed from several external peers. All incoming articles are first pulled through a filter, which is loaded at startup. One of the popular filter scripts is cleanfeed. Contributions about other filters are welcome! The filter drops rejected articles silently; it does, however, log relevant information. |
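A minimal sketch of the incoming-feed side, again in Python and purely illustrative: incoming_filter is a hypothetical stand-in for cleanfeed (whose real rules are far richer), and the crossposting limit of 10 is an invented example value. The point it shows is that a rejected article is simply not stored, no user is told, and the only trace is a log line.

```python
import logging

log = logging.getLogger("innd-sketch")

MAX_CROSSPOSTS = 10  # hypothetical limit, not a cleanfeed default


def incoming_filter(article: dict) -> str:
    """Hypothetical stand-in for a filter such as cleanfeed:
    return "" to accept, or a reason to reject."""
    groups = [g for g in article.get("Newsgroups", "").split(",") if g]
    if len(groups) > MAX_CROSSPOSTS:
        return "excessive crossposting"
    return ""


def accept_from_peer(article: dict, peer: str, store) -> bool:
    """Store accepted articles via `store`; drop rejected ones with a log line."""
    reason = incoming_filter(article)
    if reason:
        # No user is notified; only the log records the rejection.
        log.info("rejected %s from %s: %s",
                 article.get("Message-ID"), peer, reason)
        return False
    store(article)
    return True
```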
| When an article makes it through cleanfeed, innd registers it in the active and history files and stores it in the article spool. If configured, innd also sends the article to the corresponding external peer, either via a channel or via a batch. |
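The bookkeeping innd does when it accepts an article can be modelled roughly as follows. This is a simplified sketch under my own assumptions, not INN's file formats: the active file is reduced to a high-water mark per newsgroup, the history to a dictionary keyed by Message-ID, and the spool to a dictionary keyed by (group, article number).

```python
class NewsServerState:
    """Toy model of innd's bookkeeping (simplified, hypothetical structures):
    active  : newsgroup -> highest article number assigned so far
    history : Message-ID -> list of (group, number) the article was filed as
    spool   : (group, number) -> article text
    """

    def __init__(self, groups):
        self.active = {g: 0 for g in groups}
        self.history = {}
        self.spool = {}

    def store(self, message_id, newsgroups, text):
        """Register and file an article; return False for a duplicate."""
        if message_id in self.history:
            return False  # already registered in history
        placed = []
        for group in newsgroups:
            if group not in self.active:
                continue  # group not carried locally
            self.active[group] += 1  # next article number in this group
            number = self.active[group]
            self.spool[(group, number)] = text
            placed.append((group, number))
        self.history[message_id] = placed
        return True
```

Checking history before anything else is what lets the server refuse the same article when it arrives again from a second peer.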
| For peers that receive a low-volume feed, a news administrator can choose to use the batch method. With this method, innd spools relevant articles to batch files (one per peer) for further processing. nntpsend is called on a regular basis from cron; it examines the batch files and spawns one innxmit process per peer, according to the peer configuration. innxmit establishes a connection with the peer, transfers the articles and closes the connection when done. Note that when a peer goes down ungracefully (without closing the connection), innxmit hangs. It is possible to install a script on the server which checks for peers and kills hanging innxmit processes if necessary. |
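A hung-innxmit watchdog of the kind mentioned above could be sketched as follows. Everything here is hypothetical: the XmitProcess record, the 30-minute timeout, and above all how last_progress is obtained (for example from log timestamps or batch file offsets) are assumptions left to the site. By default the sketch only reports what it would kill.

```python
import os
import signal
import time
from dataclasses import dataclass


@dataclass
class XmitProcess:
    pid: int
    peer: str
    # Epoch seconds of the last observed activity; how this is measured
    # is site-specific and assumed here.
    last_progress: float


def find_hung(procs, now, timeout):
    """Return processes that made no progress for more than `timeout` seconds."""
    return [p for p in procs if now - p.last_progress > timeout]


def kill_hung(procs, timeout=1800.0, now=None, dry_run=True):
    """Report (dry run) or SIGTERM innxmit processes considered hung."""
    now = time.time() if now is None else now
    hung = find_hung(procs, now, timeout)
    for p in hung:
        if dry_run:
            print(f"would kill innxmit pid {p.pid} (peer {p.peer})")
        else:
            os.kill(p.pid, signal.SIGTERM)
    return hung
```

Run from cron next to nntpsend, with dry_run switched off only once the progress measure is trusted.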
| For peers that receive a high-volume feed, as well as for peers that receive an identical feed, a news administrator can choose to use the channel method. With this method, innd spawns innfeed at startup and opens a channel to it. Every time innd finds an article to be fed to the peers, it sends it to the innfeed channel. innfeed is configured to feed multiple peers with the same articles from the channel. It manages connections to the peers and writes backlogs in case a peer is unavailable or too slow. innfeed writes one backlog file per peer. The backlog is truncated to a specified length in order to prevent disk space overflow; when this happens, the peer is said to miss articles. innfeed does not process backlogs itself; a separate program (e.g. innxmit) should be called to do that afterwards. |
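The truncated per-peer backlog can be pictured as a bounded queue. This sketch assumes that the oldest entries are the ones discarded when the limit is reached, and counts each discarded entry as a missed article; innfeed's actual backlog files and limits are configured differently, so treat the class and its names as illustrative only.

```python
from collections import deque


class PeerBacklog:
    """Sketch of a per-peer backlog truncated to a fixed length.

    Assumption: when the backlog is full, the oldest entry is discarded,
    and each discarded entry counts as an article the peer has missed.
    """

    def __init__(self, peer, max_len):
        self.peer = peer
        self.entries = deque(maxlen=max_len)
        self.missed = 0

    def add(self, message_id):
        if len(self.entries) == self.entries.maxlen:
            self.missed += 1  # the append below pushes out the oldest entry
        self.entries.append(message_id)

    def drain(self):
        """Hand the backlog over to a separate sender (e.g. innxmit)."""
        items = list(self.entries)
        self.entries.clear()
        return items
```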
| news.daily is run daily for article expiration, log file rotation and reporting purposes. For article expiration, news.daily spawns expire, which processes the history database and purges entries for articles to be expired. It produces a list of articles to be removed from the spool and renumbers the active file to reflect the changes. expire calls fastrm to actually remove the articles on the expire list from the spool. |
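The expire step can be sketched as a split of the history into entries to keep and entries to expire, plus a recomputed low-water mark per group, which is what the renumbered active file reflects. This is a simplified Python model: real expire decides per group according to its own configuration and keeps history entries around for a while to catch duplicates, and the spool root and the traditional group/number path layout used for the fastrm list are assumptions.

```python
from dataclasses import dataclass

DAY = 86400  # seconds


@dataclass(frozen=True)
class HistoryEntry:
    message_id: str
    arrived: float  # arrival time, seconds since the epoch
    group: str
    number: int     # article number within `group`


def expire(history, now, keep_days):
    """Split history into kept and expired entries, and compute the new
    low-water mark (lowest surviving article number) per group.
    Simplification: expired entries are dropped from history at once."""
    cutoff = now - keep_days * DAY
    kept = [e for e in history if e.arrived >= cutoff]
    expired = [e for e in history if e.arrived < cutoff]
    low = {}
    for e in kept:
        low[e.group] = min(low.get(e.group, e.number), e.number)
    return kept, expired, low


def fastrm_paths(expired, spool_root="/var/spool/news"):
    """Spool paths to remove, assuming the traditional layout where
    misc.test article 1 lives at <spool_root>/misc/test/1."""
    return [f"{spool_root}/{e.group.replace('.', '/')}/{e.number}"
            for e in expired]
```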
| For log rotation and reporting purposes, news.daily calls scanlogs, which rotates the log files and calls innreport to process them, create a report and mail it to the news administrator. |
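Log rotation of the sort scanlogs performs can be sketched as shifting numbered generations. This is a generic scheme (news.log to news.log.0, news.log.0 to news.log.1, and so on) chosen for illustration; scanlogs' actual file names, directory and number of generations may differ.

```python
from pathlib import Path


def rotate(log: Path, generations: int = 3) -> None:
    """Shift log -> log.0 -> log.1 ..., keeping `generations` old copies,
    then start an empty current log. Generic scheme, not scanlogs' own."""
    if generations < 1:
        raise ValueError("need at least one old generation")
    oldest = log.with_name(f"{log.name}.{generations - 1}")
    if oldest.exists():
        oldest.unlink()  # drop the oldest copy to make room
    for i in range(generations - 2, -1, -1):
        src = log.with_name(f"{log.name}.{i}")
        if src.exists():
            src.rename(log.with_name(f"{log.name}.{i + 1}"))
    if log.exists():
        log.rename(log.with_name(f"{log.name}.0"))
    log.touch()  # fresh, empty current log
```

After rotation, a reporting step such as innreport would read the just-closed generation (news.log.0 in this scheme).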
| There is a separate program that controls innd, called ctlinnd, and another special program that watches over innd, called innwatch. Newsgroup maintenance is also done with ctlinnd. See Implementation Guide for further details. |
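The kind of decision innwatch takes can be sketched as a free-space check with hysteresis. The thresholds and the reason text are hypothetical, and innwatch's real checks come from its own control file; the sketch only builds the ctlinnd throttle and go commands as argument lists instead of running them.

```python
import shutil

REASON = "low disk space"  # hypothetical reason text


def innwatch_action(free_bytes, low_water, high_water, throttled):
    """Return a ctlinnd command (argv list) or None.

    Hysteresis: throttle when free space drops below low_water, and
    resume only once it climbs back above high_water.
    """
    if not throttled and free_bytes < low_water:
        return ["ctlinnd", "throttle", REASON]
    if throttled and free_bytes > high_water:
        return ["ctlinnd", "go", REASON]
    return None


def check_spool(path, low_water, high_water, throttled):
    """Apply the rule to the filesystem holding `path`."""
    free = shutil.disk_usage(path).free
    return innwatch_action(free, low_water, high_water, throttled)
```

The gap between the two thresholds keeps the server from flapping between throttled and running when free space hovers around a single limit.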
Figure 2 depicts the overall architecture at the server level. All readers perform identical functions, so there may be as many of them as necessary to cope with the load; this provides for horizontal scaling.
The readers accept user connections, read articles from the spool and deliver them to the users, and accept user postings and forward them to the feeder. The readers write neither to the spool nor to the database files (in ~news/db).
The feeder accepts incoming feeds from external peers and user postings from the readers, writes them to the spool, and sends them out to the external peers on the Internet. Note that the feeder replicates the external newsfeed.
The newsstore is merely a filer which hosts shared data.
Because of this functional split, the readers and the feeder are called the frontend, and the newsstore is called the backend.
Figure 3 depicts the overall architecture at the process level. The figure shows only one reader because all readers have an identical architecture.
The readers run nnrpd, which handles user connections and spawns one process per user. It reads article information from the active and history files and the articles from the spool, and delivers them to the users. It also accepts user postings and stores them in a batch. rnews is run periodically; it reads user postings from the batch and sends them to the feeder for propagation. The readers do not store user postings in the spool, since they do not register them in the database.
The feeder runs innd, which handles newsfeeds. It accepts incoming newsfeeds from external peers and user postings from the readers, stores them in the spool and updates the active and history files accordingly. It also propagates the newsfeed to external peers and sends out user postings. The feeder runs expire daily to purge old articles from the spool.
All frontend machines also run innreport daily which scans the log files and creates a daily report which is then mailed to the news administrator.
This section is meant to shed some light on the interaction of the processes on the reader and feeder systems; it does not explain how to get those processes to behave this way. See section Implementation Guide for further details.
| Figure 4 depicts the INN architecture on a reader. |
| The system runs an nnrpd daemon (started up with the -D switch), which listens on port 119 and accepts user connections. For each accepted connection it spawns a child nnrpd process which handles further interaction with the user. |
| Alternatively, nnrpd can be started by inetd, by configuring it for port 119 in /etc/inetd.conf and /etc/services. This ensures that the mother daemon will never die, since in this case there is no mother daemon. However, if inetd dies, you're still in trouble. This approach is equivalent to running a mother daemon with nnrpd -D, because inetd simply forks a new process for each incoming user. |
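Whether the forking is done by nnrpd -D or by inetd, the pattern is the same: one listener, one fresh process per connection. The Python sketch below uses the standard socketserver.ForkingMixIn (Unix only) to show that pattern; the handler speaks only a toy subset of NNTP, and port 1119 is a hypothetical unprivileged stand-in for 119. Under inetd, the difference is that inetd does the accept and the fork, and the program sees the connection on its standard input and output.

```python
import socketserver


class NNRPSketchHandler(socketserver.StreamRequestHandler):
    """Runs in a freshly forked child for each connection, like the
    per-user nnrpd process. Speaks only a tiny subset of NNTP."""

    def handle(self):
        self.wfile.write(b"200 sketch server ready\r\n")
        for line in self.rfile:
            if line.strip().upper() == b"QUIT":
                self.wfile.write(b"205 closing connection\r\n")
                return
            self.wfile.write(b"500 command not recognized\r\n")


class MotherDaemon(socketserver.ForkingMixIn, socketserver.TCPServer):
    """Accepts connections and forks one child per connection."""
    allow_reuse_address = True


def run(port=1119):
    # Port 119 needs root; 1119 is a hypothetical unprivileged test port.
    with MotherDaemon(("127.0.0.1", port), NNRPSketchHandler) as server:
        server.serve_forever()
```

Calling run() and connecting with telnet to 127.0.0.1 port 1119 should show the greeting; each telnet session is served by its own child process.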
| Each nnrpd process reads the active and history files to find article information, fetches requested articles from the spool and sends them to the user. It also accepts user postings. |
| User postings are first pulled through a filter, filter_nnrpd, which is a Perl script. It is loaded when nnrpd starts up for subsequent use. The filter may reject certain postings, in which case the user gets an error back (see note). |
Note: I am currently evaluating a patch to nnrpd which will allow certain postings to be rejected without returning an error message to the user.
If a posting passes through the filter, two configurations are possible: either nnrpd immediately connects to the feeder and forwards the posting, or nnrpd stores it in a batch to be sent to the feeder. In either case, however, nnrpd does not attempt to store user postings in the spool. The first option has the following properties:
Note: you may not always want to notify your users that their spam has been dropped, as that would present a perfect way to find a work-around for your anti-spam filter. The second option has the following properties:
| When the second option is used, rnews is run on a regular basis from cron to send user postings to the feeder. It processes the batch created by nnrpd and attempts to make a connection to the feeder. If the feeder is temporarily down or does not accept connections for some other reason, rnews leaves the articles in the batch. Next time it is started, it will try again. |
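The batch-and-retry behaviour of rnews can be sketched as below, assuming the classic news batch format in which each article is preceded by a "#! rnews <byte count>" line. send is a hypothetical callable standing in for the connection to the feeder; on the first connection failure, the unsent articles are kept as a batch for the next cron run.

```python
def write_batch(articles):
    """Encode articles in the classic news batch format: each article is
    preceded by a '#! rnews <byte count>' line."""
    out = bytearray()
    for art in articles:
        out += b"#! rnews %d\n" % len(art)
        out += art
    return bytes(out)


def read_batch(data):
    """Split a batch back into its articles."""
    articles, pos = [], 0
    while pos < len(data):
        nl = data.index(b"\n", pos)
        header = data[pos:nl].split()
        if len(header) != 3 or header[:2] != [b"#!", b"rnews"]:
            raise ValueError(f"bad batch header at offset {pos}")
        size = int(header[2])
        start = nl + 1
        if start + size > len(data):
            raise ValueError(f"truncated article at offset {start}")
        articles.append(data[start:start + size])
        pos = start + size
    return articles


def flush_batch(data, send):
    """Offer each article to the feeder via `send` (hypothetical callable).
    On the first connection failure, stop and return the unsent articles
    as a new batch, so the next run can retry them; b"" means all sent."""
    pending = read_batch(data)
    for i, art in enumerate(pending):
        try:
            send(art)
        except ConnectionError:
            return write_batch(pending[i:])
    return b""
```

Stopping at the first failure, rather than skipping the failed article, matches the behaviour described above: if the feeder is down, everything from that point stays in the batch.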
| For log file rotation and reporting purposes, news.daily is run daily. news.daily on the readers does not run expire. It spawns scanlogs which rotates the logs and calls innreport which analyses them, creates a report and mails it to the news administrator. |
| Figure 5 depicts the INN architecture on the feeder. |
| The system runs an innd daemon, which handles incoming feeds and manages the active and history files as well as the article spool. |
| innd is configured to accept incoming feed from several external peers and from the readers. Note that the feeder does not see any difference between external feed and user postings from the readers. All incoming articles are first pulled through a filter, which is loaded at startup. One of the popular filter scripts is cleanfeed. Contributions about other filters are welcome! The filter drops rejected articles silently, as there is no user to issue the error to. It does, however, log relevant information. |
| When an article makes it through cleanfeed, innd registers it in the active and history files and stores it in the article spool. If configured, innd also sends the article to the corresponding external peer, either via a channel or via a batch. |
| For peers that receive a low-volume feed, innd uses the batch method: it spools relevant articles to batch files (one per peer) for further processing. nntpsend is called on a regular basis from cron; it examines the batch files and spawns one innxmit process per peer, according to the peer configuration. innxmit establishes a connection with the peer, transfers the articles and closes the connection when done. Note that when a peer goes down ungracefully (without closing the connection), innxmit hangs. It is possible to install a script on the feeder which checks for peers and kills hanging innxmit processes if necessary. |
| For peers that receive a high-volume feed, as well as for peers that receive an identical feed, innd uses the channel method: it spawns innfeed at startup and opens a channel to it. Every time innd finds an article to be fed to the peers, it sends it to the innfeed channel. innfeed is configured to feed multiple peers with the same articles from the channel. It manages connections to the peers and writes backlogs in case a peer is unavailable or too slow. innfeed writes one backlog file per peer. The backlog is truncated to a specified length in order to prevent disk space overflow; when this happens, the peer is said to miss articles. innfeed does not process backlogs itself; a separate program (e.g. innxmit) should be called to do that afterwards. |
| news.daily is run daily for article expiration, log file rotation and reporting purposes. For article expiration, news.daily spawns expire, which processes the history database and purges entries for articles to be expired. It produces a list of articles to be removed from the spool and renumbers the active file to reflect the changes. expire calls fastrm to actually remove the articles on the expire list from the spool. |
| For log rotation and reporting purposes, news.daily calls scanlogs, which, as on the readers, rotates the log files and calls innreport to process them, create a report and mail it to the news administrator. |
| There is a separate program that controls innd, called ctlinnd, and another special program that watches over innd, called innwatch. Newsgroup maintenance is also done with ctlinnd. See Implementation Guide for further details. |
Implementation Guide
There are a couple of things to pay attention to when configuring machines in a distributed system, and this section describes them. Note that I only describe deviations from the norm, because explanations for standard values can be found in the manuals. However, the meaning and use of the various configuration files are outlined here.