INN Architecture Guide

 $Id: architectures.pod,v 1.4 1999/11/10 14:47:02 esamsono Exp esamsono $
 By Elena Samsonova, <elena.inn@inter.nl.net>.

Position among the INN documentation

From most general to most detailed, the INN documents are structured in the following way:

Relation to other INN documentation

This is the INN Architecture Guide. This document gives you an overview and explanation of the major parts of INN and how they all fit together in various configurations. Several distinct configurations are described, although many more are possible. When you are ready to implement an architecture of your choice, switch to the INN Implementation Guide for configuration details. For greater detail yet please refer to the Install document and manual pages.

For help in choosing a suitable architecture, please refer to the INN Cookbook. For something more general still, refer to the Readme.

How to use it?

Read the section that describes your chosen architecture; that should make you familiar with your future setup. Then continue with the INN Implementation Guide to put it into practice.


Introduction

INN is a package of various programs and scripts meant for different purposes, and different system and server configurations may require different programs. This section describes several of the most commonly used configurations. Other configurations are possible too, of course, as far as your imagination goes.


Structure of this document

The sections below give detailed descriptions of three architectures: centralized, distributed with a shared spool, and distributed based on article replication. Some aspects of INN remain the same for all architectures; they are outlined in section Common Architectural Aspects at the end of this document. However, they do feature on each of the architecture-specific diagrams in order to show their place in it.

The document sheds some light on the interaction of the processes on the system, but it does not explain how to get those processes to behave this way; the INN Implementation Guide does that.

The configurations presented here are the minimal ones that will get a given architecture to work. More complex aspects that may be used to improve performance are discussed in section Advanced Configuration. These too are common to all architectures.

The sections below contain references to relevant manual pages, although it is advisable to look into the INN Implementation Guide first.


Key to the Diagrams

The diagrams in this section use a color scheme to show different types of data flow within a news system. By following the arrows one can trace the path of news articles within it. The color codes are as follows:

Each diagram contains its name and identification so that the diagrams can be used separately (e.g. if your browser does not support graphics, you can display or print the diagrams with a different program). Please use the diagram identification when asking questions about the diagrams. This document uses the following diagrams:

        centr.processes.jpg
        distr.shared_spool.servers.jpg
        distr.shared_spool.processes.jpg
        distr.shared_spool.reader.jpg
        distr.shared_spool.feeder.jpg
        distr.replication.servers.jpg
        distr.replication.processes.jpg
        distr.replication.reader.jpg
        distr.replication.feeder.jpg


Overview of Architectures

Hopefully you have looked in the INN Cookbook to determine the architecture best suited to your situation. If you have not done that, please be warned that you cannot make your choice using this document alone: you may pick the wrong architecture.

Now that the responsibility is off my shoulders, let's look at some architectures.

As outlined in the INN Cookbook, we are basically looking at two architectural types: centralized and distributed. While the centralized architecture is fairly fixed, the distributed architecture gives your imagination free rein. This guide describes the two most commonly used distributed architectures: one with a shared spool and one based on article replication.

Centralized architecture

This is basically a single server providing all the functions of a news server: it handles the incoming feed, serves articles to users, and accepts and sends out their postings.

Application: small servers.

Advantages:

Disadvantages:

 

Distributed architecture with a shared spool

This system separates the feeder and reader functions of a news server, allocating a separate machine to handle the incoming and outgoing feed. The system can have multiple reader machines that handle user connections and serve articles to them. The article spool itself is located on a shared medium, either a dedicated NFS server or a disk array of the feeder machine accessed by the readers via NFS.

Application: large systems.

Advantages:

Disadvantages:

 

Distributed architecture based on article replication

This system builds a hierarchy of news servers, each of which contains an identical copy of the entire article spool. The readers are fed from the feeder, which ensures transparency to the user (i.e. all the readers behave in the same way).

Application: large systems.

Advantages:

Disadvantages:

Distributed architecture based on caching

This architecture uses a news server of centralized architecture and an array of news caches (e.g. with nntpcache). Note that the caches are not the same as readers: they do not consult the spool themselves but instead query the news server for articles. They then cache the articles in much the same way as web caches cache pages.

This architecture is not described in further detail in this version of the document; I shall add it later on.


Centralized (Single Server) Architecture

In a centralized system, one machine runs a set of programs that handle incoming feed, outgoing feed and user connections for reading and posting. A stand-alone configuration is also possible when the server is used for internal purposes only and no incoming or outgoing feed is needed.

The core of the system

The system runs an innd daemon which handles incoming feeds and manages the active and history files as well as the article spool. It also listens on port 119 and accepts user connections. For each accepted connection it spawns a child nnrpd process which handles further interaction with the user.

        innd(8), nnrpd(8), inn.conf(5), readers.conf(5)  

Handling user connections

Each nnrpd process reads the active and history files to find article information, fetches requested articles from the spool and sends them to the user. It also accepts user postings.

        active(5), history(5) 

User postings are first pulled through a filter, filter_nnrpd, which is a Perl or a Tcl/Tk script. It is loaded when nnrpd starts up for subsequent use. The filter may reject certain postings, in which case the user gets an error back. If a posting passes through the filter, nnrpd passes it on to innd, which pulls it through its own filter, filter_innd, and returns an error in case of problems. nnrpd then forwards the error message to the user. nnrpd does not attempt to store user postings in the spool.

It is also possible to break up the innd-nnrpd coupling here, which may offer performance benefits at the cost of increased complexity. In this case we are really talking about a distributed system where the feeder and the reader just happen to run on the same physical computer. If you want to experiment with that, read about distributed architectures.

Using anti-spam filters

See section Using anti-spam filters.

Handling news feed

See section Handling news feed.

Article expiration and reporting

See section Article expiration and reporting.

Watching over the system

See section Watching over the system.


Distributed Architecture with a Shared Spool

In a distributed architecture with a shared spool there is one feeder machine that handles the incoming and outgoing news feed, multiple reader machines that handle user connections, and a news store which contains all shared data. The main idea is to store the data only once so that the readers can remain reasonably lightweight machines.


Server Level Overview

The figure below depicts the overall architecture at the server level. The functions of the readers are all identical, so there may be as many of them as necessary to cope with the load, which provides for horizontal scaling.

Readers

The readers accept user connections, read articles from the spool and deliver them to the users, and accept user postings and forward them to the feeder. The readers do not write either to the spool or to the database files located on the news store.

Feeder

The feeder accepts incoming feeds from external peers and user postings from the readers, writes them to the spool and sends them out to the Internet to the external peers. Note that the feeder replicates external news feed.

Shared store

The news store is merely a filer which hosts shared data using NFS. It can be a dedicated machine or a set of disks physically connected to one of the other machines in the architecture (the feeder would seem the most logical candidate for that).

Because of this functional split, the readers and the feeder are called the frontend, and the news store is called the backend.


Process Level Overview

The figure below depicts the overall architecture at the process level. The figure shows only one reader because all the readers have an identical architecture.

Readers

The readers run nnrpd which handles user connections and spawns one process per user. It reads article information from the active and history files and the articles from the spool, and delivers them to the users. It accepts user postings and stores them in a batch. rnews is run periodically; it reads user postings from the batch and sends them to the feeder for propagation. The readers do not store user postings in the spool, as they don't register them in the database.

The readers run innreport daily which scans the log files and creates a daily usage report which is then mailed to the news administrator. The report reflects the users' behavior in reading news and posting.

Feeder

The feeder runs innd which handles news feeds. It accepts incoming news feeds from external peers and user postings from the readers, stores them in the spool and updates the active and history files accordingly. It also propagates news feed to external peers and sends out user postings. The feeder runs expire daily to purge old articles from the spool.

The feeder runs innreport daily which scans the log files and creates a daily feed report which is then mailed to the news administrator. The report contains statistics on incoming and outgoing feeds and article expiration.


Architectural Details

This section is meant to shed some light on the interaction of the processes on the reader and feeder systems; it does not explain how to get those processes to behave this way. See the INN Implementation Guide for further details.


Readers

Handling user connections

The figure above depicts the INN architecture on a reader. The system runs an nnrpd daemon (started up with the -D switch), which listens on port 119 and accepts user connections. For each accepted connection it spawns a child nnrpd process which handles further interaction with the user.

        nnrpd(8), inn.conf(5), readers.conf(5), moderators(5) 

Alternatively, nnrpd could be started by inetd from /etc/inetd.conf and /etc/services by specifying it for port 119. This ensures that the mother daemon will never die since there's no mother daemon in this case. However, if inetd dies, you're still in trouble. This approach is equivalent to running a mother daemon nnrpd -D because the program simply forks a new process for each incoming user.

        inetd(8), /etc/inetd.conf(5), /etc/services(5)

Serving articles

Each nnrpd process reads the active and history files to find article information, fetches requested articles from the spool and sends them to the user.

        active(5), history(5) 

Accepting postings

See section Accepting user postings.

Log rotation and reporting

For log file rotation and reporting purposes, news.daily is run daily. news.daily on the readers does not run expire. It spawns scanlogs which rotates the logs and calls innreport which analyzes them, creates a report and mails it to the news administrator.

        news.daily(8), scanlogs(8), innreport(8), innreport.conf(5)  


Feeder

The figure above depicts the INN architecture on the feeder. The system runs an innd daemon which handles incoming feeds and manages the active and history files, as well as the article spool.

        innd(8), active(5), history(5) 

Handling news feed

See section Handling news feed. The feeder handles all the incoming feed and makes all the necessary changes to the spool, including control and cancel article processing.

Article expiration and reporting

See section Article expiration and reporting.

Watching over the system

See section Watching over the system.


Distributed Architecture with Replication

In a distributed architecture with replication there is one feeder machine which handles incoming and outgoing news feed, and multiple readers which handle user connections. The feeder also replicates the news feed to each of the readers which store it on their local disk arrays. The feeder makes sure that the readers have synchronized data. This configuration is an alternative to a shared data store.


Server Level Overview

The figure below depicts the server level overview of the architecture. Functions of the readers are identical, so there may be as many of them as necessary to provide for horizontal scaling.

Readers

The readers accept user connections, read articles from their local spools and deliver them to the users, and accept user postings and forward them to the feeder. The readers do not update their own spools as those are being updated by the feeder via replication.

Feeder

The feeder accepts incoming feeds from external peers and user postings from the readers, writes them to its local spool, replicates them to the spools of the readers and sends them out to the Internet to the external peers. Note that the feeder replicates external news feed.


Process Level Overview

The figure below depicts the overall architecture at the process level. The figure shows only one reader because all the readers have an identical architecture.

Feeder

The feeder runs innd which handles news feeds. It accepts incoming news feeds from external peers and user postings from the readers, stores them in its local spool and updates the active and history files accordingly. It also propagates news feed and user postings to the readers and external peers. The feeder runs expire daily to purge old articles from the spool.

The feeder runs innreport daily which scans the log files and creates a daily feed report which is then mailed to the news administrator. The report contains statistics on incoming and outgoing feeds and article expiration.

Readers

The readers run two separate news processes: innd, which handles the news feed coming in from the feeder, and an nnrpd daemon, which listens on port 119 and spawns one child nnrpd process per incoming user connection. Note that innd and the nnrpd daemon run on different ports in order to avoid conflicts.

An nnrpd process reads article information from the active and history files and the articles from the spool, and delivers them to the users. It accepts user postings and stores them in a batch. rnews is run periodically; it reads user postings from the batch and sends them to the feeder for propagation. The readers neither store user postings in the spool nor register them in the database, which ensures that each posting gets the same article number on all the readers.

The readers run their own expire daily to purge old articles from the spool. This way the feeder's spool can be significantly smaller than those of the readers.

Note that if you are deploying the reader group as one big server (i.e. transparent to the user), it is smart to configure all the readers the same way.

The readers run innreport daily which scans the log files and creates a daily usage report which is then mailed to the news administrator. The report reflects the users' behavior in reading news and posting.


Architectural Details

This section is meant to shed some light on the interaction of the processes on the reader and feeder systems; it does not explain how to get those processes to behave this way. See the INN Implementation Guide for further details.


Readers

Handling incoming feed

The system runs an innd daemon on a feed port which is not 119. This port is determined by the system architect and serves for transmitting news feed from the feeder to the readers. It must not be 119, in order to avoid conflicts with the nnrpd daemon also run on the readers (see below).

innd accepts incoming feed, stores it in the local spool and updates active and history files accordingly.

        innd(8), inn.conf(5)

Handling user connections

The system runs an nnrpd daemon (started with the -D switch), which listens on port 119 and accepts user connections. For each accepted connection it spawns a child nnrpd process which handles further interaction with the user.

        nnrpd(8), inn.conf(5), readers.conf(5), moderators(5) 

Alternatively, nnrpd could be started by inetd from /etc/inetd.conf and /etc/services by specifying it for port 119. This ensures that the mother daemon will never die since there's no mother daemon in this case. However, if inetd dies, you're still in trouble. This approach is equivalent to running a mother daemon nnrpd -D because the program simply forks a new process for each incoming user.

        inetd(8), /etc/inetd.conf(5), /etc/services(5)

Serving articles

Each nnrpd process reads the active and history files to find article information, fetches requested articles from the spool and sends them to the user.

        active(5), history(5) 

Accepting postings

See section Accepting user postings.

Article expiration and reporting

See section Article expiration and reporting. Note that the readers run their own expire to purge the spool. Since the articles coming in are the same, the spools should also be the same. If you want to keep all the readers identical, you should also configure expire the same way on each of them.

Control articles processing

The readers do not process control articles because they do not get any. The feeder alone selects and processes these articles. The readers synchronize their active files with that of the feeder on a daily basis using actsync. This ensures that they have the same groups as the feeder.

The disadvantage of this method is that the readers refuse the articles of the groups that have not yet been created there but were already created on the feeder. Therefore you miss the first day of postings.
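
To make that gap concrete, here is a minimal sketch in Python of the comparison that has to happen between the feeder's active file and a reader's. It is not how actsync is implemented. The function names are hypothetical, and the four-field line layout (group name, high mark, low mark, flag) is an assumption taken from active(5).

```python
def parse_active(text):
    """Return {group: flag} from active-file text.

    Assumes each line has four whitespace-separated fields:
    group name, high mark, low mark, flag. Other lines are skipped.
    """
    groups = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 4:
            name, _high, _low, flag = fields
            groups[name] = flag
    return groups


def missing_groups(feeder_active, reader_active):
    """Groups the feeder carries that the reader does not know yet."""
    feeder = parse_active(feeder_active)
    reader = parse_active(reader_active)
    return sorted(set(feeder) - set(reader))
```

Until the next synchronization run, articles for every group in that list would be refused by the reader.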

Watching over the system

See section Watching over the system.


Feeder

The figure above depicts the INN architecture on the feeder. The system runs an innd daemon which handles incoming feeds, manages its own active and history files and the article spool, and propagates the news feed to the readers.

        innd(8), active(5), history(5) 

Handling news feed

See section Handling news feed. The feeder propagates news feed to the readers except for control articles which the feeder processes. Cancel articles however are both processed by the feeder and passed on to the readers so that they may purge canceled articles from their spools as well.

Article expiration and reporting

See section Article expiration and reporting.

Watching over the system

See section Watching over the system.


Common Architectural Aspects


Handling news feeds

This section is applicable to any of the architectures described in this document, and probably to a good deal of architectures that you think up yourself.

Accepting incoming feed

innd can be configured to accept incoming feed from several external peers. User postings forwarded by the nnrpd processes (see above) are also handled as incoming feed. All the incoming articles are first pulled through a filter, filter_innd, which is loaded at startup; see the anti-spam filter discussion in section Using anti-spam filters. The filter logs relevant information about the articles it drops in the files news.log and news. An overview of anti-spam filters can be found at <http://www.exit109.com/~jeremy/news/antispam.html>. Some of the most popular filters include cleanfeed and Spam Hippo.

        incoming.conf(5), news, news.log, control.ctl(5)  

When an article makes it through the filter, innd registers it in the active and history files and stores it in the article spool. If configured, innd also sends the article to the corresponding external peer, either via a channel or via a batch (see below).

        newsfeeds(5), moderators(5)  

Processing control articles

A special type of article is meant for news group management, i.e. creation and deletion of news groups. Such articles are called control articles and are always posted in the hierarchy control.*. The articles are processed by ctlinnd invoked either manually by the news administrator or automatically by innd. See the INN Implementation Guide for further details.

        ctlinnd(8), control.ctl(5)

Beware that some wicked people stage so-called ``control article attacks'' on Usenet every so often. They send out tens of thousands of control articles for group creation and/or deletion which, if processed, cause extreme load on news servers, both on the INN software and on the file system. It is therefore very important to configure your control article processing in a robust way.

Processing cancel articles

Yet another special type of article is meant to cancel posted articles, i.e. cause their removal from the article spools of all news servers all over the world. Well, as far as you can believe in this idyll. :-) Such articles are always posted in the cancel.* hierarchy and when processed, they cause innd to remove the corresponding articles from the spool and history.

Beware that some wicked people stage so-called ``cancel attacks'' every so often. They engage powerful computers to send out cancel articles for every single article currently on Usenet, which causes exceptionally high load on news servers worldwide and may in fact take down some of the heavily loaded ones with little reserve capacity. That, not to speak of the fact that if everything is canceled, there is no more news in the world. :-(

Sending outgoing feed: low volume

For peers that receive low volume feed, a news administrator can choose to use the batch method. In that case innd spools relevant articles to batch files (one per peer) for further processing. nntpsend is called on a regular basis from cron; it examines the batch files and spawns one innxmit process per peer, according to peer configuration. innxmit establishes a connection with the peer, transfers the articles and closes the connection when done.

        nntpsend(8), nntpsend.ctl(5), passwd.nntp(5), innxmit(8), cron(8)  
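
The one-batch-per-peer idea can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not of what innd or nntpsend actually do internally. The peer names, the article identifiers and the use of shell-style patterns to describe what a peer wants are all assumptions made for the example.

```python
from collections import defaultdict
from fnmatch import fnmatchcase


def build_batches(articles, peers):
    """Sort articles into one batch per peer.

    articles: iterable of (article_id, newsgroup) pairs.
    peers:    {peer_name: [group_pattern, ...]} (hypothetical layout).
    Returns   {peer_name: [article_id, ...]} in arrival order.
    """
    batches = defaultdict(list)
    for article_id, group in articles:
        for peer, patterns in peers.items():
            if any(fnmatchcase(group, pattern) for pattern in patterns):
                batches[peer].append(article_id)
    return dict(batches)
```

A periodic job would then walk over the returned dictionary and start one transfer per peer, which is the role innxmit plays above.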

Sending outgoing feed: high volume

For peers that receive high volume feed, as well as for peers that receive identical feed, a news administrator can choose to use the channel method. innd spawns innfeed at startup and opens a channel to it. Every time innd finds an article to be fed to the peers, it sends it to the innfeed channel. innfeed is configured to feed multiple peers with the same articles from the channel. It manages connections to the peers and writes backlogs in case a peer is unavailable or too slow. innfeed writes one backlog file per peer. The backlog is truncated to a specified length in order to prevent disk space overflow. When this happens, the peer is said to miss articles. innfeed processes its backlogs when the peer comes back on-line.

        innfeed(8), innfeed.conf(5)  
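
A bounded backlog of this kind can be modeled roughly as follows. The class and its methods are hypothetical, and I am assuming here that truncation drops the oldest entries first; the text above does not say which end of the backlog is cut, so check innfeed(8) before relying on that.

```python
from collections import deque


class PeerBacklog:
    """Per-peer backlog with a fixed maximum length (illustrative model)."""

    def __init__(self, limit):
        self.limit = limit
        self.entries = deque()
        self.missed = 0  # articles the peer will never receive

    def defer(self, article_id):
        """Queue an article for later; truncate if the backlog overflows."""
        self.entries.append(article_id)
        while len(self.entries) > self.limit:
            self.entries.popleft()  # assumption: oldest entry is dropped
            self.missed += 1

    def drain(self, send):
        """Replay the backlog in order; stop at the first failed send."""
        sent = 0
        while self.entries:
            if not send(self.entries[0]):
                break
            self.entries.popleft()
            sent += 1
        return sent
```

The missed counter corresponds to the situation described above where the peer is said to miss articles.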


Accepting user postings

This section applies to distributed architectures. It may also be used for a centralized system but then you are really stretching it and should probably be thinking of rebuilding it into one of the distributed forms.

nnrpd accepts user postings. They are first pulled through a filter, filter_nnrpd, which is a Perl or a Tcl/Tk script. It is loaded when nnrpd starts up for subsequent use. The filter may reject certain postings, in which case the user gets an error back. See section Using anti-spam filters for more information.

If a posting passes through the filter, there are two configurations possible: either nnrpd immediately connects to the feeder and forwards the posting, or nnrpd stores it in a batch to be sent to the feeder. In either case however nnrpd does not attempt to store user postings in the spool. The first option has the following properties:

The second option has the following properties:

When the second option is used, rnews is run on a regular basis from cron to send user postings to the feeder. It processes the batch created by nnrpd and attempts to make a connection to the feeder. If the feeder is temporarily down or does not accept connections for some other reason, rnews leaves the articles in the batch. Next time it is started, it will try again.

        rnews(8), cron(8) 
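
The retry behavior of the second option amounts to "send what you can, keep the rest". Here is a minimal Python sketch of that logic. The function name and the send callback are hypothetical stand-ins, and the real rnews works on batch files rather than on in-memory lists.

```python
def flush_batch(batch, send):
    """Try to forward every posting; return those left for the next run.

    batch: list of postings in the order they were stored.
    send:  callable that forwards one posting and raises
           ConnectionError when the feeder cannot be reached.
    """
    remaining = []
    feeder_up = True
    for posting in batch:
        if feeder_up:
            try:
                send(posting)
                continue  # forwarded, nothing to keep
            except ConnectionError:
                feeder_up = False  # stop trying for the rest of this run
        remaining.append(posting)
    return remaining
```

Whatever the function returns becomes the batch for the next cron run, so no posting is lost while the feeder is down.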


Using anti-spam filters

I made several references to anti-spam filters throughout this document. We see two filter hooks: filter_nnrpd and filter_innd. Note that it is not compulsory to actually hang filters on them. The hooks have the following purposes:

Depending on your anti-spam policies, you can either use one of the filters, or both of them, or none at all. There are a few important remarks to make about filter usage.

  1. When filter_nnrpd rejects a posting, nnrpd automatically notifies the user. This may not always be your intention since it allows spammers to locate holes in your filter and thus avoid it.

  2. When filter_innd rejects a news article, it may or may not send a notification to the user depending on the method nnrpd uses to forward postings to innd, see section Accepting user postings. In the light of the previous item, not sending error reports to the users may be the very behavior that you need in order to fight spammers.

  3. Anti-spam filters require considerable processing power to run, especially so on nnrpd where each child process needs to read in and interpret the filter. nnrpd does that at start up, so even if the user is not going to post anything, the filter is still processed. On innd on the other hand, performance requirements are caused by the sheer volume of news feed that needs to be scanned.

  4. In view of the previous items, it may not be a good idea to run both filters on a single server architecture. Distributed architectures however may benefit from using both filters if the filters are orthogonal and the machines have sufficient processing power. In this case, filter_nnrpd could be used to teach well-meaning but illiterate users what is and what is not allowed in Usenet (e.g. reject postings with subjects like ``Make money fast''), whereas filter_innd could be deployed against conscious spammers.
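
The division of labor between the two hooks can be sketched as follows. This is Python for illustration only: the real hooks are scripts loaded by nnrpd and innd, and every function name and rule below is hypothetical.

```python
def reader_filter(article):
    """Stand-in for a filter_nnrpd hook: educate well-meaning users."""
    if "make money fast" in article.get("Subject", "").lower():
        return "subject not allowed here"
    return None  # None means the posting is accepted


def feeder_filter(article):
    """Stand-in for a filter_innd hook: an arbitrary anti-spam rule."""
    groups = article.get("Newsgroups", "").split(",")
    if len(groups) > 10:
        return "excessive crossposting"
    return None


PIPELINE = [("filter_nnrpd", reader_filter), ("filter_innd", feeder_filter)]


def submit(article, pipeline=PIPELINE):
    """Run a posting through the hooks in order.

    Returns None if accepted, else (stage, reason) for the first rejection.
    """
    for stage, check in pipeline:
        reason = check(article)
        if reason is not None:
            return stage, reason
    return None
```

Knowing which stage rejected a posting matters because, as remarks 1 and 2 explain, only some rejections are reported back to the user.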


Article expiration and reporting

Article expiration

news.daily is run daily for article expiration, log file rotation and reporting purposes. For article expiration news.daily spawns expire which processes the history database purging entries for articles to be expired. It produces a list of articles to be removed from the spool, and renumbers the active file to reflect changes. For a traditional spool, expire calls fastrm to actually remove the articles on the expire list from the spool.

        news.daily(8), expire(8), expire.ctl(5), fastrm(8)  
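
The two results of an expiration run (a trimmed history and a removal list) can be illustrated with a small Python sketch. It assumes a single retention period for all groups and an in-memory history, both of which are simplifications of mine; the real policy comes from expire.ctl(5), and the renumbering of the active file is left out entirely.

```python
DAY = 86400  # seconds


def expire(history, now, retention_days):
    """Split history into surviving entries and a list of files to remove.

    history: {message_id: (arrival_time, spool_path)} (hypothetical layout).
    Returns  (kept_history, removal_list).
    """
    cutoff = now - retention_days * DAY
    kept, removal_list = {}, []
    for message_id, (arrived, path) in history.items():
        if arrived < cutoff:
            removal_list.append(path)  # handed to the remover afterwards
        else:
            kept[message_id] = (arrived, path)
    return kept, removal_list
```

The removal list plays the part of the expire list that fastrm works through for a traditional spool.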

Log rotation and reporting

For log rotation and reporting purposes, news.daily calls scanlogs which, analogously to the one on the readers, rotates the log files and calls innreport to process them, create a report and mail it to the news administrator.

        scanlogs(8), innreport(8)


Watching over the system

There is a separate program that maintains innd, called ctlinnd, and another special program that watches over innd, called innwatch. News group maintenance is also done with ctlinnd. See the INN Implementation Guide for further details.

        control.ctl(5), ctlinnd(8), innwatch(8), innwatch.ctl(5)  


Advanced Configuration


Using Different Ports

I've been using port 119 as the standard news port throughout the document. Indeed, port 119 is reserved for news. However, this does not mean that a different port cannot be used. For example, the distributed architecture with replication relies on this to run innd on the readers on a different port than nnrpd.

Generally, if you have a regular news server, you will want to run at least nnrpd on port 119 so that your users do not need to reconfigure their news readers. In all other situations, decide for yourself.


Using Overview

FIXME: to do.


Configuring Article Storage

The INN Implementation Guide and Install documents have a detailed description of different storage methods and the ways to configure them. In this section, I want to give a general overview of the possibilities with their pros and cons so that you could plan your article spool.

INN supports the following article storage methods:

Traditional spool

This is a very simple method which stores articles in individual text files in a directory structure. For example, article 12345 in news.software.nntp is stored as news/software/nntp/12345 relative to the root of the article spool.
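
The mapping from group name and article number to a file name is simple enough to write down in a couple of lines of Python. The function name is my own; only the mapping itself comes from the description above.

```python
from pathlib import PurePosixPath


def article_path(newsgroup, number):
    """Path of an article relative to the root of a traditional spool."""
    # Each dot-separated component of the group name becomes a directory.
    return PurePosixPath(*newsgroup.split(".")) / str(number)
```

For the example above, str(article_path("news.software.nntp", 12345)) gives news/software/nntp/12345.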

This method presents a major challenge for most UNIX file systems because they usually limit the number of files that one directory can handle with adequate performance. With the current news volume some groups can have over 10,000 articles, which far exceeds the limits of most file systems. However, there exist file systems that use a different structure and no longer have problems with large directory sizes. In this case this method can be used successfully.

Time hashed spool

Articles are stored as individual files as in a traditional spool but are divided into directories based on the arrival time to ensure that no single directory contains so many files as to cause a bottleneck.

Although this method no longer presents large directory challenges to the file system, it requires a higher overhead to maintain time hashed directories. Also, it is no longer possible to easily find articles in the physical spool by hand as is the case with the traditional spool.
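
To show why hashing on arrival time keeps directories small, here is a Python sketch of one possible scheme. The layout below (hexadecimal arrival time split into directory levels) is invented for this example and is not claimed to be INN's actual on-disk format; see the INN Implementation Guide for that.

```python
def time_hashed_path(arrival, seq):
    """Spool-relative path derived from arrival time (hypothetical layout).

    arrival: arrival time in seconds since the epoch.
    seq:     sequence number to tell apart articles arriving together.
    """
    stamp = format(arrival & 0xFFFFFFFF, "08x")  # 8 hex digits
    directory = [stamp[0:2], stamp[2:4], stamp[4:6]]
    filename = f"{stamp[6:8]}-{seq:04d}"
    return "/".join(directory + [filename])
```

With this layout every lowest-level directory covers a window of 256 seconds, so the number of files in it is bounded by the arrival rate rather than by the size of any group. It also shows the drawback mentioned above: the path no longer tells you which group the article belongs to.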

Time hashed buffered spool (timecaf)

Similar to the time hashed spool described above, articles are stored by arrival time but instead of writing a separate file for each article, multiple articles are put in the same file. This is where the name comes from: ``caf'' stands for crunched article file.

This method shows roughly four times better performance than time hashed spool by reducing file system overhead for file creation.

Cyclic buffer spool (CNFS)

A cyclic buffer spool is a pre-configured buffer file where articles are stored. When the end of the buffer is reached, new articles overwrite the oldest articles. This is where the name comes from: CNFS stands for cyclic news file system.

Perhaps the most important property of this method, besides its excellent performance, is that your server can never run out of disk space. A direct consequence of this, however, is that if your buffer is too small, you are not retaining the articles long enough. And because the buffer is cyclic and the oldest articles are overwritten automatically, you have no control over the period of time that you want to keep the articles in the spool. See also the quality of service discussion in the INN Cookbook.
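
The trade-off between fixed disk usage and uncontrolled retention is easy to see in a toy model. The Python class below is a simplified sketch under my own assumptions: it tracks article sizes and evicts the oldest articles to make room, whereas a real cyclic buffer overwrites by position in the buffer file.

```python
from collections import deque


class CyclicBuffer:
    """Toy model of a fixed-size article buffer that overwrites the oldest."""

    def __init__(self, capacity):
        self.capacity = capacity  # total bytes available, never exceeded
        self.articles = deque()   # (article_id, size), oldest first
        self.used = 0

    def store(self, article_id, size):
        """Store an article; return the ids that were overwritten for it."""
        if size > self.capacity:
            raise ValueError("article larger than the whole buffer")
        overwritten = []
        while self.used + size > self.capacity:
            old_id, old_size = self.articles.popleft()
            self.used -= old_size
            overwritten.append(old_id)
        self.articles.append((article_id, size))
        self.used += size
        return overwritten
```

Note that nothing in store() looks at the age of an article: how long an article survives depends only on how fast new ones arrive, which is exactly the lack of control described above.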

To give you an example, you may want to put high volume explosive groups in a cyclic buffer especially if you don't need to guarantee retention time on them.

So which method to choose? The good news is that you don't really have to choose one single method for the whole spool but instead can use different methods for different parts of the spool. Please refer to the INN Implementation Guide for the details on the storage methods.

In order to design a storage system, you need to look at the groups you plan to carry and allocate sub-hierarchies to different methods. Note that you can have several cyclic buffers in the system for different retention criteria.
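
When planning such an allocation it can help to write it down as an ordered list of rules and check where a given group would land. The Python sketch below does that on paper, so to speak. The patterns, the choice of methods per hierarchy and the first-match-wins rule are all assumptions for the example and say nothing about how INN itself is configured.

```python
from fnmatch import fnmatchcase

# Hypothetical plan: (group pattern, storage method), checked in order.
PLAN = [
    ("alt.binaries.*", "cyclic buffer"),
    ("comp.*", "time hashed buffered spool"),
    ("*", "time hashed spool"),
]


def storage_method(group, plan=PLAN):
    """Return the storage method of the first rule matching the group."""
    for pattern, method in plan:
        if fnmatchcase(group, pattern):
            return method
    raise LookupError(f"no storage rule matches {group}")
```

Ending the plan with a catch-all pattern guarantees that every group gets a storage method, at the price of hiding groups you forgot to think about.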