$Id: cookbook.pod,v 1.3 1999/11/10 14:47:02 esamsono Exp esamsono $ By Elena Samsonova, <elena.inn@inter.nl.net>.
From most general to most detailed, the INN documents are structured in the following way:
Readme
INN Cookbook
INN Architecture Guide
INN Implementation Guide
Install
manual pages
This is the INN Cookbook. This document is meant to help you choose one of the various possible configurations that INN can assume, depending on your needs and possibilities. Once you have chosen a configuration, look it up in the INN Architecture Guide for an overview and explanation of its major parts. This knowledge will help you read the INN Implementation Guide, which describes various configuration aspects in detail. For even greater detail, please refer to the Install document and the manual pages. For something more general than what is in this document, please refer to Readme.
Simply answer the questions below and find the most likely best configuration for your situation. Then continue with the INN Architecture Guide.
The Cookbook does not relieve you from using your brain at all times. :-)
Those of you who already know what news is and how Usenet works may safely skip this section. If, however, you start having difficulties with the rest of the document, it may help to come back here.
When your boss calls you into his office and tells you: ``We want a news server and you are going to set it up'', you may find yourself wondering what the heck news is, let alone a news server. Well, read on!
News is a way to allow every single individual to tell the rest of the world what he thinks about it without having to kidnap CNN's CEO. News is a network of servers that exchange users' postings so that each of them ends up with the same copy of the whole mass of them. Well, more or less. It is really pretty much like buying your favorite newspaper in the East of the country and in the West of the country and discovering that the same issue contains exactly the same articles, so that you don't have to rush to your local newsstand but can get it anywhere.
Now, in order to organize this mass of articles and make it easier for people to find postings that interest them, articles are divided into news groups. The whole collection of news groups, together with the rules for their creation and use, is called Usenet.
When you look at each particular news server in the news network, you can see its article spool as a file system with directories, one separate directory for each news group. You can see separate articles as files in those directories.
As with any file system, the news group structure is not flat. Indeed, a flat structure would produce a list of tens of thousands of news groups that would still be too hard to browse. Therefore news groups are organized in branched hierarchies. On the top level, the division is as follows. First, there are the so-called Big Eight hierarchies, together with alt.*:
comp.* - computers
humanities.* - humanities
misc.* - miscellaneous stuff
news.* - everything about news
rec.* - recreation
sci.* - science
soc.* - society
talk.* - serious talk
alt.* - alternative, everything is allowed here
Next come the national hierarchies, named after country codes, for example:
at.* - Austria
de.* - Germany
nl.* - Netherlands
tw.* - Taiwan
uk.* - UK
Other hierarchies are devoted to particular subjects or networks, for example:
bionet.* - biology
compuserve.* - old CompuServe network news
Finally, company hierarchies are set up by specific companies, for example:
borland.*
microsoft.*
In addition to these hierarchies, every news server can have its own local hierarchy, usually called local.*, whose articles do not leave the news server.
As you have most probably already guessed, the dot-asterisks (.*) in the news group names indicate that there are branches in these hierarchies. For example, the news group that discusses news servers and offers help on INN is called news.software.nntp (and you can access it at http://www.dejanews.com if you don't have your own news server set up yet).
So, how do you decide which hierarchies to carry? Typically, you either carry a full feed, that is, everything that exists in the world, or some subset of it. It is usually a good idea to carry the Big Eight hierarchies as well as your own country's hierarchy and perhaps those of your neighbors. The alt.* hierarchy is a rather controversial one. It is by far the largest of all, both in the number of groups and in the volume of postings, and easily constitutes 1/3 to 1/2 of Usenet's full feed. There is a lot of junk in it, but there are a lot of good things too. It is certainly one of the most popular hierarchies, and you are likely to get complaints if your users don't find it on your server. Don't worry, there are ways to exclude parts of hierarchies that you do not want to carry.
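To illustrate how excluding part of a hierarchy can work, here is a minimal Python sketch modelled on the wildmat-style pattern lists INN uses in its configuration files, where a leading "!" excludes matching groups. The sketch assumes that the last matching pattern decides and that a group matching nothing is not carried; the function name `carry_group` and the sample pattern list are hypothetical, and INN's manual pages remain the authority on the exact matching rules.

```python
from __future__ import annotations

from fnmatch import fnmatchcase

# Hypothetical selection: the Big Eight, alt.* without its binaries
# groups, and one national hierarchy (nl.*).
PATTERNS = [
    "comp.*", "humanities.*", "misc.*", "news.*",
    "rec.*", "sci.*", "soc.*", "talk.*",
    "alt.*", "!alt.binaries.*",
    "nl.*",
]


def carry_group(newsgroup: str, patterns: list[str]) -> bool:
    """Return True if `newsgroup` should be carried.

    Patterns are checked in order; a leading "!" turns a pattern into an
    exclusion. The last pattern that matches decides, and a group that
    matches no pattern at all is not carried.
    """
    decision = False
    for pattern in patterns:
        exclude = pattern.startswith("!")
        glob = pattern[1:] if exclude else pattern
        # In fnmatch, "*" also matches dots, so "alt.*" covers all sub-branches.
        if fnmatchcase(newsgroup, glob):
            decision = not exclude
    return decision


for group in ("news.software.nntp", "alt.binaries.pictures",
              "alt.folklore.computers", "de.comp.lang.c"):
    print(group, carry_group(group, PATTERNS))
```

Placing "!alt.binaries.*" after "alt.*" is what carves the binaries out; with a last-match-wins rule, the order of the list matters.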
As for connecting to the outside world, it is really quite simple. You either have a stand-alone news server or one connected to Usenet. If you are stand-alone, you can only carry your own local hierarchies and cannot exchange any information with the outside world. This may be appropriate for corporate news servers.
If you are part of Usenet, you are connected to one or more other news servers with which you exchange articles: they send you articles and you send them articles as well. You can be connected to Usenet in one of two modes: either as an internal node or as a leaf node. Consider the figure below:

Here the blue rectangles are internal nodes and the green ellipses are leaf nodes. Internal nodes exchange high-volume news feeds, that is, they pass the news around the world and thus make sure that everyone gets his copy. That is why the lines between them have no arrows. Leaf nodes only receive a high-volume news feed but do not pass it on. They only send out their users' postings, which are a tiny amount compared to all the news. That is why their lines have arrows pointing to them.
So what you need to decide is whether you want to become an internal node and pass the whole mass of news around, or a leaf node and only send out your own users' postings.
News group creation and management issues are discussed in the INN Implementation Guide, but if you already have burning questions, you can read a document about the Big Eight creation process at http://www.eyrie.org/~eagle/faqs/big-eight.html . Other hierarchies have more or less similar rules and procedures.
If you intend to run a stand-alone news server or an isolated network of news servers without connecting to Usenet, you don't need anything else.
If you want to exchange news with the rest of the world, you need to get connected to Usenet. The other news servers with which you exchange news are called news peers. As an absolute minimum, you need to find a peer who will accept your users' postings and forward them to the network. No, I did not forget the feed: these days you can get that out of the sky from a news satellite. There are companies that provide these services.
If you don't want to deal with satellites, you will also need one or more (preferably more than one!) peers who will feed you news. The only requirement for setting up peers is that you have network connectivity to them and enough bandwidth to send and receive the feed. For a full news feed, count on some 3-4 Mb/s of sustained traffic these days, and this number is growing rapidly.
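To relate this bandwidth figure to the daily feed volumes used in the scoring later on, here is a small Python sketch that converts between sustained Mb/s and GB per day. It assumes decimal units (1 Mb = 10^6 bits, 1 GB = 10^9 bytes) and traffic spread evenly over 24 hours; real traffic is burstier, so peak bandwidth has to be higher. The function names are hypothetical.

```python
# Decimal units assumed: 1 Mb = 10**6 bits, 1 GB = 10**9 bytes.
SECONDS_PER_DAY = 24 * 60 * 60


def mbps_to_gb_per_day(mbps: float) -> float:
    """Return the GB per day carried by a link running at `mbps` Mb/s all day."""
    bits_per_day = mbps * 10**6 * SECONDS_PER_DAY
    return bits_per_day / 8 / 10**9


def gb_per_day_to_mbps(gb_per_day: float) -> float:
    """Return the sustained Mb/s needed to move `gb_per_day` GB in one day."""
    bits_per_day = gb_per_day * 10**9 * 8
    return bits_per_day / SECONDS_PER_DAY / 10**6


for rate in (3, 4):
    print(f"{rate} Mb/s ~ {mbps_to_gb_per_day(rate):.1f} GB/day")
print(f"50 GB/day ~ {gb_per_day_to_mbps(50):.1f} Mb/s")
```

By this arithmetic, 3-4 Mb/s sustained corresponds to roughly 32-43 GB a day, and a 50 GB daily feed needs about 4.6 Mb/s on average.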
So, if you still want to get into this, start defining your system in the next section.
For each of the sections that follow, calculate your score by adding up the points you get for each item. You will find your section verdict at the end of the section. After you have gone through all sections, you can derive the final answer by combining the results of all sections.
Please note that the news feed size is growing rapidly, so it is impossible for me to give precise numbers here. The figures should still be a fairly good indication, though. Expected feed size:
1 - non-binaries feed: less than 10 GB a day
2 - major hierarchies feed: around 30 GB a day
3 - full feed from the whole world: more than 50 GB a day
1 - less than 5
2 - 5 to 15
3 - more than 15
1 - less than 5
2 - 5 to 15
3 - more than 15
1 - filtering only your users' postings
2 - filtering all feed, incoming and outgoing
4 to 5: light feed requirements
6 to 8: medium feed requirements
9 to 11: heavy feed requirements
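Adding up the points and reading off the verdict is simple arithmetic; the Python sketch below only makes the bands explicit. The argument names `peers_first` and `peers_second` are hypothetical labels for the two items scored "less than 5 / 5 to 15 / more than 15", and the point ranges are taken from the lists.

```python
def feed_verdict(feed_size: int, peers_first: int, peers_second: int,
                 filtering: int) -> str:
    """Add up the feed-requirement points and return the section verdict."""
    answers = [
        (feed_size, 1, 3),      # 1 = non-binaries feed ... 3 = full feed
        (peers_first, 1, 3),    # first "less than 5 / 5 to 15 / more than 15" item
        (peers_second, 1, 3),   # second item on the same scale
        (filtering, 1, 2),      # 1 = users' postings only, 2 = all feed
    ]
    for points, low, high in answers:
        if not low <= points <= high:
            raise ValueError(f"{points} is not a valid score (expected {low}..{high})")
    total = sum(points for points, _, _ in answers)  # always 4..11
    if total <= 5:
        return "light"
    if total <= 8:
        return "medium"
    return "heavy"


# Major hierarchies feed, peers on the first two scales, filtering users' postings only.
print(feed_verdict(2, 1, 2, 1))
```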
This section analyzes the type and intensity of usage that you expect.
0 - no readers at all
1 - less than 500
2 - 500 to 2000
3 - above 2000
0 - no usage
1 - reading text articles via a modem connection
2 - reading (err, watching) binary articles one by one via a fast modem connection or ISDN
3 - sucking binaries (downloading many articles in one batch) via ISDN or a fat fixed line
See also section Authentication Methods.
0 - no authentication (everyone who connects gets access)
0 - IP-based authentication (access granted according to rules based on the users' origin IP addresses)
1 - less than 1000 users for local authentication (using an INN configuration file or the system password file)
2 - more than 1000 users for local authentication
2 - authentication using an external database
Please note that this estimation is very coarse; your own situation may be different.
0 to 1: no news server usage
2 to 3: light news server usage requirements
4 to 5: medium news server usage requirements
6 to 8: heavy news server usage requirements
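The same tallying works for the usage section. Since the three items can add up to at most 3 + 3 + 2 = 8 points, this Python sketch treats every total of 6 or more as heavy; the function and argument names are hypothetical.

```python
def usage_verdict(readers: int, usage_type: int, authentication: int) -> str:
    """Add up the usage points and return the section verdict."""
    for points, high in ((readers, 3), (usage_type, 3), (authentication, 2)):
        if not 0 <= points <= high:
            raise ValueError(f"{points} is not a valid score (expected 0..{high})")
    total = readers + usage_type + authentication  # 0..8
    if total <= 1:
        return "none"
    if total <= 3:
        return "light"
    if total <= 5:
        return "medium"
    return "heavy"  # 6 and above


# 500 to 2000 readers, binaries one by one over ISDN, IP-based authentication.
print(usage_verdict(2, 2, 0))
```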
In this section you need to look at your scalability and quality of service requirements over some period of time. It is not always true that, when in doubt, you had better take a larger system: larger systems require more maintenance and are more complex. On the other hand, not planning ahead may force you to start from scratch when it turns out that your existing configuration cannot handle growing requirements. Finding a good trade-off is the hardest part.
1 - the level will remain constant
Note that in absolute numbers this still means that your news feed will increase steadily. The tendency today is for it to nearly double every 6 months, and by the time you read this, it may be growing even faster.
2 - number of peers will double
3 - number of peers will increase tenfold or more
(please take the closest value here)
3 - number of hierarchies will double (or more)
0 - during busy hours, the server may refuse incoming connections
1 - during busy hours, users may experience a slow news server, with as much as 30 seconds delay before the article starts appearing on the PC
2 - during busy hours, users may experience a slight performance degradation, with at most 10 seconds delay before the article starts appearing on the PC
3 - during busy hours, users must not experience any performance degradation; there must be no noticeable delay in article retrieval
0 - will not grow
1 - will double
2 - hard to predict (but you assume it will grow really fast)
1 to 2: limited scalability requirements
3 to 5: regular scalability requirements
6 to 8: heavy scalability requirements, consider building a larger system right away
If your scalability requirements are heavy, you should consider building a larger system right away, because the one you would choose according to your current requirements is guaranteed not to be able to cope with your new requirements. Alternatively, you can opt for a medium-size system that is too large for your current requirements and somewhat too small for your new requirements, but which scales better.
If you have limited scalability requirements, you can stick to the smallest option. If you have regular requirements, you should probably read about the smallest option and the next larger one and decide which way to go.
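The scalability verdict and the advice above can be combined in a short Python sketch. The scores come from the three items of this section (their total ranges from 1 to 8); the ordering of options in `LADDER`, with a distributed setup as the step beyond a single heavy server, is an assumption made for illustration, loosely based on the architecture table that follows. All names are hypothetical.

```python
from __future__ import annotations

# Assumed ordering of options, smallest first.
LADDER = [
    "single light server",
    "single medium server",
    "single heavy server",
    "distributed",
]


def scalability_verdict(feed_growth: int, service_level: int, reader_growth: int) -> str:
    """Add up the scalability points (1..8) and return the section verdict."""
    total = feed_growth + service_level + reader_growth
    if not 1 <= total <= 8:
        raise ValueError(f"total {total} is outside the possible range 1..8")
    if total <= 2:
        return "limited"
    if total <= 5:
        return "regular"
    return "heavy"


def options_to_consider(current: str, verdict: str) -> list[str]:
    """Return the options worth reading about, following the advice above."""
    i = LADDER.index(current)
    larger = LADDER[min(i + 1, len(LADDER) - 1)]  # nothing beyond the top rung
    if verdict == "limited":
        return [current]
    if verdict == "regular":
        return [current] if larger == current else [current, larger]
    if verdict == "heavy":
        return [larger]
    raise ValueError(f"unknown verdict {verdict!r}")


verdict = scalability_verdict(2, 2, 1)
print(verdict, options_to_consider("single medium server", verdict))
```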
The table below gives an indication of the type of architecture most suitable for your requirements. There exist two basic types of architecture: single server and distributed (consisting of multiple servers). Please refer to the INN Architecture Guide for details on these architectures.
Server types are rated as light, medium, and heavy. Regardless of the actual computer manufacturer and OS (as long as it is a Unix), they are graded as follows:
light: single CPU, 128 MB RAM
medium: single powerful CPU or two less powerful ones, 512 MB RAM
heavy: four CPUs, 1 GB RAM
Disk sizes are not included in the server rating because they depend on the amount of articles that you want to keep and are discussed in section Article Spool Considerations.
usage \ feed  light                   medium                  heavy
.................................................................................
none          single light server     single medium server    single heavy server
light         single medium server    single heavy server     single heavy server
medium        single heavy server     distributed:            distributed:
                                      medium feeder +         heavy feeder +
                                      single medium reader    single medium reader
                                      or multiple light       or multiple light
                                      readers                 readers
heavy         distributed:            distributed:            distributed:
              light feeder +          medium feeder +         heavy feeder +
              multiple readers        multiple readers        multiple readers
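Combining the section verdicts then amounts to a table lookup, sketched below in Python. The verdict strings ("none", "light", "medium", "heavy") are labels chosen for this sketch, and it reads the reader options in the medium-usage row as applying to both distributed cells.

```python
# The architecture table, keyed by (usage verdict, feed verdict).
ARCHITECTURE = {
    ("none", "light"): "single light server",
    ("none", "medium"): "single medium server",
    ("none", "heavy"): "single heavy server",
    ("light", "light"): "single medium server",
    ("light", "medium"): "single heavy server",
    ("light", "heavy"): "single heavy server",
    ("medium", "light"): "single heavy server",
    ("medium", "medium"): "distributed: medium feeder, single medium reader or multiple light readers",
    ("medium", "heavy"): "distributed: heavy feeder, single medium reader or multiple light readers",
    ("heavy", "light"): "distributed: light feeder, multiple readers",
    ("heavy", "medium"): "distributed: medium feeder, multiple readers",
    ("heavy", "heavy"): "distributed: heavy feeder, multiple readers",
}


def suggest_architecture(usage: str, feed: str) -> str:
    """Return the suggested architecture for a pair of section verdicts."""
    try:
        return ARCHITECTURE[(usage, feed)]
    except KeyError:
        raise ValueError(f"no table entry for usage={usage!r}, feed={feed!r}") from None


print(suggest_architecture("medium", "heavy"))
```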
This section helps you determine the amount of disk space you will need for your system, depending on the feed you expect to receive. The INN Implementation Guide and the Install document contain detailed information on the choice of a spooling method and its configuration.
First, you need to decide on the quality level of the news service you are going to provide. Among other concerns, the period of time that you keep articles on your server is an important parameter. Consider that many people only really have time to read news during the weekend, so if you keep articles for less than 7 days, your users will miss things. 10 days would give them a nice overlap, while 15 days would ensure that they can miss a weekend and still get all the news.
On the other hand, the longer you want to keep articles on the server, the more disk space you need. For example, keeping 10 days' worth of binary pictures would require anywhere between 200 GB and 500 GB of disk space, and this is probably not what you want. So what to do?
The good news is that you don't have to keep all articles for the same period of time: you can set up a fairly fine-grained configuration that specifies, down to the individual newsgroup, how long articles should be kept. This allows you to keep text groups longer than binaries, for example. See the INN Implementation Guide for sample configurations that can give you ideas.
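As an illustration of such per-newsgroup retention, here is a minimal Python sketch. The patterns and day counts are hypothetical, and so is the rule that the longest (most specific) matching pattern wins; INN's own expiration configuration has its own syntax and matching rules, described in the INN Implementation Guide and the manual pages.

```python
from fnmatch import fnmatchcase

# Hypothetical policy: (newsgroup pattern, days to keep articles).
RETENTION_RULES = [
    ("*", 10),               # default for everything
    ("alt.binaries.*", 2),   # binaries eat disk space: keep them briefly
    ("comp.*", 15),          # text groups kept longer
    ("local.*", 60),         # local discussions kept longest
]


def retention_days(newsgroup: str, rules=RETENTION_RULES) -> int:
    """Return how many days to keep articles posted to `newsgroup`.

    Assumption of this sketch: the longest matching pattern wins; on a tie
    in pattern length, the longer retention is used.
    """
    matches = [(len(pattern), days) for pattern, days in rules
               if fnmatchcase(newsgroup, pattern)]
    if not matches:
        raise LookupError(f"no retention rule matches {newsgroup!r}")
    return max(matches)[1]


for group in ("comp.lang.c", "alt.binaries.pictures", "news.software.nntp"):
    print(group, retention_days(group))
```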
So, to determine the disk space you will need, use one of the following simple formulas. If your feed includes binaries:
total disk space = ( GB per day * 1/4 ) * days you keep text articles +
                   ( GB per day * 3/4 ) * days you keep binaries +
                   10 GB for supporting files
If you carry text articles only:
total disk space = GB per day * days you keep text articles +
                   6 GB for supporting files
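The two formulas translate directly into code. In this Python sketch the function names are hypothetical; the 1/4 text and 3/4 binaries split and the 10 GB and 6 GB allowances for supporting files are taken from the formulas themselves.

```python
def disk_space_with_binaries(gb_per_day: float, text_days: float,
                             binary_days: float) -> float:
    """First formula: a quarter of the feed is text, three quarters binaries."""
    text_gb = (gb_per_day * 1 / 4) * text_days
    binary_gb = (gb_per_day * 3 / 4) * binary_days
    return text_gb + binary_gb + 10  # + 10 GB for supporting files


def disk_space_text_only(gb_per_day: float, text_days: float) -> float:
    """Second formula: the whole feed is text."""
    return gb_per_day * text_days + 6  # + 6 GB for supporting files


# Major hierarchies (about 30 GB a day): text kept 15 days, binaries 5 days.
print(disk_space_with_binaries(30, 15, 5))
# Non-binaries feed (about 10 GB a day): text kept 15 days.
print(disk_space_text_only(10, 15))
```

With these inputs the sketch gives 235 GB and 156 GB respectively, which shows how strongly the binaries retention period drives the total.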
Note that the values you get here are only approximate and not significantly better than an educated guess, but they do give you an indication. Make sure that you can add disk space as needed in case your estimate was wrong.
Authentication can require some serious resources. Therefore it is important to determine whether you will need authentication and, if so, on what scale.
No authentication is the lightest option you can have: it permits anyone to use your service. If you are setting up a news server within a well-defined network segment, you can disable authentication on the server itself and instead enable routing filters and/or firewalls on your network, which will ensure that only your own users can access the server.
If you cannot deploy routers and firewalls to achieve this, but your users always connect from a known range of IP addresses, you can use IP address authentication on the news server. This method uses the users' IP addresses to determine who is allowed in. It is lightweight and will not impair performance.
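The idea behind IP address authentication can be sketched in a few lines of Python using the standard `ipaddress` module. This is only an illustration of the check, not how INN implements it; the address ranges are hypothetical documentation prefixes.

```python
import ipaddress

# Hypothetical ranges your users connect from (documentation prefixes).
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
    ipaddress.ip_network("2001:db8::/32"),
]


def ip_allowed(address: str, networks=ALLOWED_NETWORKS) -> bool:
    """Return True if `address` falls inside one of the allowed networks."""
    ip = ipaddress.ip_address(address)
    # Membership tests are cheap; an IPv4 address is never inside an IPv6 network.
    return any(ip in network for network in networks)


for client in ("192.0.2.17", "198.51.100.200", "2001:db8::1", "203.0.113.5"):
    print(client, ip_allowed(client))
```

No DNS lookups are involved, which is why this method is so light.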
A variant of IP address authentication is to resolve the users' IP addresses to host names and use domain patterns to determine access. This is handy if your IP addresses do not fall into one cluster and listing them all would create a messy configuration. This method is slightly heavier than the previous one because of the DNS lookups.
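Here is a minimal Python sketch of the domain-based variant. It is an illustration, not INN's code; the domain patterns are hypothetical. Because the reverse DNS name of an address is controlled by whoever owns that address block, the sketch also does a forward lookup to confirm that the name resolves back to the same address, a common precaution that also shows where the extra DNS cost comes from. The `resolve` parameter lets you try the matching logic without touching DNS.

```python
from __future__ import annotations

import socket
from fnmatch import fnmatchcase
from typing import Callable, Optional

# Hypothetical domain patterns for your users' machines.
ALLOWED_DOMAINS = ["*.example.com", "*.example.net"]


def confirmed_hostname(address: str) -> Optional[str]:
    """Reverse-resolve `address`, then check that the name resolves back to it."""
    try:
        hostname = socket.gethostbyaddr(address)[0]
        forward = {info[4][0] for info in socket.getaddrinfo(hostname, None)}
    except OSError:  # covers socket.herror and socket.gaierror
        return None
    return hostname if address in forward else None


def domain_allowed(address: str, patterns=ALLOWED_DOMAINS,
                   resolve: Callable[[str], Optional[str]] = confirmed_hostname) -> bool:
    """Return True if the host name behind `address` matches an allowed pattern."""
    hostname = resolve(address)
    if hostname is None:
        return False
    return any(fnmatchcase(hostname.lower(), pattern) for pattern in patterns)


# Offline check with a stand-in resolver instead of real DNS.
print(domain_allowed("192.0.2.1", resolve=lambda addr: "pc17.Example.COM"))
```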
If, however, you require that users be able to connect from any IP address, you cannot use the two methods listed above and need to look at user name and password authentication. Here you can consider maintaining a configuration file on your server, adding all users to its local password file, or having it connect to an external database instead. The latter option does not come out of the box with INN, but it can be installed with reasonably small effort, so don't discard it right away.
Each of these authentication methods requires some extra performance from your machine. It is extremely difficult to predict just how much CPU time and memory it will need, so be prepared to scale your system if necessary.
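To make the local-configuration-file option concrete, here is a minimal Python sketch of checking a user name and password against a table of salted password hashes. It is not INN's own mechanism or file format; the user table is hypothetical, and PBKDF2 is used simply because it is a standard, deliberately expensive password hash available in Python's `hashlib`. That deliberate expense is also a small illustration of why password-based authentication costs CPU time on a busy server.

```python
from __future__ import annotations

import hashlib
import hmac
import os

ITERATIONS = 100_000  # deliberately slow to resist guessing; costs CPU per login


def hash_password(password: str, salt: bytes | None = None) -> tuple[bytes, bytes]:
    """Return (salt, digest) for `password` using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def check_password(users: dict[str, tuple[bytes, bytes]], user: str, password: str) -> bool:
    """Return True if `user` exists in `users` and `password` matches."""
    record = users.get(user)
    if record is None:
        return False
    salt, expected = record
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)  # constant-time comparison


# Hypothetical user table; a real server would load it from a file or database.
USERS = {"elena": hash_password("correct horse")}
print(check_password(USERS, "elena", "correct horse"))
print(check_password(USERS, "elena", "wrong"))
```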
The INN Implementation Guide provides some samples of different configurations.
Now that you have determined the scale of your future system and given some thought to the service that you are going to provide, go on with the INN Architecture Guide and select the proper architecture for your system.