From: Jens Thoms Toerring (no email)
Date: Mon Sep 15 2003 - 13:00:53 EDT
Hi,
Sorry for the empty mail, here's the real one...
Since I had a bit of a problem understanding some finer points of
the documentation for aspseek.conf I tried to read the sources.
Unfortunately, the way config.cpp was written made it less than
easy to understand what was going on. So I started to refactor
config.cpp. The result you can find at the end of this email
(sorry it's not a diff but the new file itself, a diff would have
been even longer because nearly all of both the old and the new
file would have ended up in it and the new version alone is already
long enough...).
The most important change is that I took apart the huge if-else
construct where the different keywords are tested for. Instead I
put all keywords into a table, together with pointers to handler
functions. Hopefully, that makes it more readable (and easier to
maintain in case new keywords need to be added or removed).
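
To illustrate the idea, here is a minimal, self-contained sketch of such a keyword table. It is not the code from the attached file: the handlers, the `keywords` table, `starts_with_nocase()` and `dispatch()` are hypothetical names made up for this example, and the case-insensitive prefix comparison is an assumption about how the lookup works.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Name of the keyword whose handler ran last (only here to make the
// sketch observable)
static std::string last_called;

// Hypothetical handlers standing in for the real per-keyword functions;
// each returns 0 on success and 1 on error
static int handle_DataDir( const char *option, char *, int )
{
    last_called = option;
    return 0;
}

static int handle_DisallowNoMatch( const char *option, char *, int )
{
    last_called = option;
    return 0;
}

static int handle_Disallow( const char *option, char *, int )
{
    last_called = option;
    return 0;
}

struct keyword_entry {
    const char *option;
    int ( *fnct )( const char *, char *, int );
};

// Longer keywords come before keywords that are a prefix of them
static const keyword_entry keywords[ ] = {
    { "DataDir",         handle_DataDir         },
    { "DisallowNoMatch", handle_DisallowNoMatch },
    { "Disallow",        handle_Disallow        }
};

// Returns true if 's' starts with 'prefix', ignoring case
static bool starts_with_nocase( const char *s, const char *prefix )
{
    for ( ; *prefix; s++, prefix++ )
        if ( std::tolower( ( unsigned char ) *s ) !=
             std::tolower( ( unsigned char ) *prefix ) )
            return false;
    return true;
}

// Returns the handler's result (0 or 1), or -1 for an unknown keyword
static int dispatch( char *line, int lino )
{
    assert( line != 0 );

    const std::size_t n = sizeof keywords / sizeof keywords[ 0 ];

    for ( std::size_t i = 0; i < n; i++ )
        if ( starts_with_nocase( line, keywords[ i ].option ) )
            return keywords[ i ].fnct( keywords[ i ].option, line, lino );
    return -1;
}
```

Adding a keyword then only means writing one handler and adding one table row. With a prefix comparison like the one assumed here, the order of the rows matters whenever one keyword is the beginning of another.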
Unfortunately I had to use two additional global variables (but
with scope restricted to config.cpp), but I hope you don't object
too much.
Of course there's now a huge set of functions, one for each keyword,
plus a few additional helper functions. In the process of setting
them up I also found some inconsistencies and several potential
buffer overruns, which I hopefully managed to get rid of.
I also started to make the syntax check for the configuration file
much more picky - until now wrong arguments often were simply
discarded and the default taken instead. I don't think that this
was a very good idea, because it's against the principle of least
surprise: when a user accidentally mistypes an argument he/she
should be told so instead of having the program silently discard
the user input and work in an unexpected way. So now the arguments
to most keywords are checked (as far as it was possible without
changing code in other files) and on errors a message is printed
and parsing of the configuration file is abandoned.
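
As a rough sketch of what "picky" means here, a numeric argument can be rejected outright instead of being replaced by a default. The helper below is hypothetical (its name, signature and error text are assumptions for this example, not the functions in the attached file); it only uses the standard `strtol()` and `snprintf()`.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: parse 'arg' as a decimal int. Returns 0 and stores
// the value in '*result' on success. Returns 1, leaves '*result' untouched
// and writes a message into 'err' when 'arg' is not a valid integer.
static int parse_int_strict( const char *option, const char *arg, int lino,
                             int *result, char *err, std::size_t err_size )
{
    assert( option && arg && result && err );

    char *end;
    errno = 0;
    long val = std::strtol( arg, &end, 10 );

    // Reject empty input, trailing garbage and out-of-range values
    if ( end == arg || *end != '\0' || errno == ERANGE ||
         val < INT_MIN || val > INT_MAX )
    {
        std::snprintf( err, err_size, "Error: in config file at line %d: "
                       "Invalid argument for %s\n", lino, option );
        return 1;
    }

    *result = ( int ) val;
    return 0;
}
```

A caller would stop parsing the configuration file as soon as the helper returns 1 and report the message to the user.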
Another point is that there are now some additional checks to be
able to warn the user when he/she uses keywords that make no sense
with or without Unicode support.
All uses of alloca() are thrown out and replaced by new/delete calls
- I am currently in the process of trying to get aspseek to run on
IRIX and there's no alloca() function. More about this and the
required changes in a different mail...
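
The kind of replacement meant here looks roughly like the following sketch. The function `join_path()` is a made-up example, not code from aspseek; it only shows a temporary buffer that would formerly have come from alloca() being allocated with new[] and released explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical example: build "<dir>/<name>" in a temporary buffer.
// A buffer obtained from alloca() would vanish on return by itself;
// one obtained from new[] has to be released on every path.
static std::string join_path( const char *dir, const char *name )
{
    assert( dir != 0 && name != 0 );

    std::size_t len = std::strlen( dir ) + 1 + std::strlen( name ) + 1;
    char *buf = new char[ len ];        // was: ( char * ) alloca( len )

    std::strcpy( buf, dir );
    std::strcat( buf, "/" );
    std::strcat( buf, name );

    std::string result( buf );

    delete [ ] buf;                     // not needed with alloca()
    return result;
}
```

If anything between new[] and delete[] can throw, a std::vector<char> as the buffer would avoid the leak, at the cost of looking less like the original C-style code.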
Finally, while comparing the code with the man page for aspseek.conf
I found some inconsistencies:
1. For DBAddr the man page says
DBAddr DBType:[//[User[:Pass]@]Host[:Port]]/DBName
while in config.cpp we're looking for DBUser, DBPass, DBHost and
DBPort. Fortunately, all of them should be obsolete, but it would
be better to either remove them completely from config.cpp or to
change the man page.
2. In the description for 'Server' one reads:
Add URL as an URL to start indexing from. You can
specify many Server commands, and set the different
options for different sites - see below. Note that
if URL contains path, the whole site will be
indexed nevertheless, so to limit indexing to some
subdirectory of site use Disallow parameter
described below.
This could be read as if 'Disallow' were a server-specific keyword,
which it isn't. As far as I can see, excluding subdirectories applies
to all servers, not just one. I think that's a major drawback, and
'Allow', 'Disallow', etc. should be made server-specific keywords
instead of global ones (as should several other keywords). Since I
will probably need to restrict the search to certain subdirectories
for some of the servers I have to index, implementing this is on my
todo list.
3. The keywords
AuthBasic
Alias
NoIndex (but meaning is obvious)
NoFollow (but meaning is obvious)
OnlineGeo
are not documented.
4. The keywords
HTDBList
HTDBDoc
IspellCorrectFactor
IspellIncorrectFactor
NumberFactor
AlnumFactor
MirrorRoot
MirrorHeadersRoot
MirrorPeriod
are recognized by the program but are neither documented nor seem
to do anything useful at the moment.
Can someone tell me what they are supposed to do or if some or all
of them are just left over from some older versions of aspseek and
should be removed?
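
Regarding point 1: to show what the single DBAddr line has to carry (and why the separate DBUser, DBPass, DBHost and DBPort keywords become redundant), here is a small sketch that splits a string of the form given in the man page. It is only an illustration; `db_addr` and `parse_db_addr()` are hypothetical and are not the ParseDBAddr() function that config.cpp calls.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

struct db_addr {
    std::string type, user, pass, host, name;
    int port;                     // 0 means "use the server's default port"

    db_addr( ) : port( 0 ) { }
};

// Splits "DBType:[//[User[:Pass]@]Host[:Port]]/DBName" into its parts.
// Returns true on success and false if the string doesn't fit the pattern.
static bool parse_db_addr( const std::string &addr, db_addr &out )
{
    typedef std::string::size_type pos_t;
    const pos_t npos = std::string::npos;

    out = db_addr( );

    pos_t colon = addr.find( ':' );
    if ( colon == npos || colon == 0 )
        return false;
    out.type = addr.substr( 0, colon );

    std::string rest = addr.substr( colon + 1 );

    if ( rest.compare( 0, 2, "//" ) == 0 )     // optional host part
    {
        pos_t slash = rest.find( '/', 2 );
        if ( slash == npos )
            return false;

        std::string authority = rest.substr( 2, slash - 2 );
        rest = rest.substr( slash );           // now "/DBName"

        pos_t at = authority.rfind( '@' );
        if ( at != npos )                      // optional User[:Pass]
        {
            std::string cred = authority.substr( 0, at );
            authority = authority.substr( at + 1 );

            pos_t c = cred.find( ':' );
            out.user = cred.substr( 0, c );
            if ( c != npos )
                out.pass = cred.substr( c + 1 );
        }

        pos_t c = authority.find( ':' );
        out.host = authority.substr( 0, c );
        if ( c != npos )                       // optional Port
            out.port = std::atoi( authority.c_str( ) + c + 1 );
        if ( out.host.empty( ) )
            return false;
    }

    if ( rest.empty( ) || rest[ 0 ] != '/' )
        return false;
    out.name = rest.substr( 1 );

    // Tolerate a trailing slash after the database name
    if ( ! out.name.empty( ) && out.name[ out.name.size( ) - 1 ] == '/' )
        out.name.erase( out.name.size( ) - 1 );

    return ! out.name.empty( );
}
```

For example, "mysql://aspseek:secret@localhost:3306/aspseek" would yield type "mysql", user "aspseek", password "secret", host "localhost", port 3306 and database name "aspseek".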
I hope you like the new version of config.cpp. Please excuse my
idiosyncrasies when it comes to indentation - I like to have lots
of white space both vertically and horizontally (perhaps it's a
sign of my age, my eyes aren't getting any better). I also can't
stand lines longer than 80 characters.
Unfortunately, I am not much of a C++-programmer, feeling much more
at home with C. Thus some of the code may look quite a bit awkward
to real C++-programmers, but there's not too much I can do about it
at the moment. I hope you can cope.
Regards, Jens
--
Freie Universitaet Berlin Jens Thoms Toerring
Universitaetsbibliothek
Webteam Tel: 0049 30 838 56055
Garystrasse 39 Fax: 0049 30 838 53738
14195 Berlin e-mail:
----------8<-------------------------------------------------------
/* Copyright (C) 2000, 2001, 2002 by SWsoft
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
/* $Id: config.cpp,v 1.52 2002/10/08 08:18:12 kir Exp $
Author : Alexander F. Avdonkin
Uses parts of UdmSearch code
*/
#include "aspseek-cfg.h"
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#include <errno.h>
#include <string>
#include <vector>
#include "config.h"
#include "parse.h"
#include "sqldb.h"
#include "defines.h"
#include "charsets.h"
#include "index.h"
#include "filters.h"
#include "paths.h"
#include "misc.h"
#include "logger.h"
#include "stopwords.h"
#include "datetime.h"
#include "geo.h"
#ifdef UNICODE
#include "ucharset.h"
#endif
#include "mimeconv.h"
static int config_trim_line( char *line, char *cur_line, char **start_of_line,
int lino );
static int config_boolean( const char *option, char *line, int lino,
int *param );
static int config_time_val( const char *option, char *line, int lino,
time_t *param );
static int config_Filter( const char *option, char *line, int lino,
int filter_type, int reverse );
static char *config_charp_arg( const char *option, char *line, int lino );
static int config_int_arg( const char *option, char *line, int lino,
int *param );
static int config_DBHost( const char *option, char *line, int lino );
static int config_DBWordDir( const char *option, char *line, int lino );
static int config_DataDir( const char *option, char *line, int lino );
static int config_DBName( const char *option, char *line, int lino );
static int config_DBUser( const char *option, char *line, int lino );
static int config_DBPass( const char *option, char *line, int lino );
static int config_DBPort( const char *option, char *line, int lino );
static int config_DBAddr( const char *option, char *line, int lino );
static int config_DBLibDir( const char *option, char *line, int lino );
static int config_DebugLevel( const char *option, char *line, int lino );
static int config_LocalCharset( const char *option, char *line, int lino );
static int config_DBType( const char *option, char *line, int lino );
static int config_AllowCountries( const char *option, char *line, int lino );
static int config_DisallowNoMatch( const char *option, char *line, int lino );
static int config_Disallow( const char *option, char *line, int lino );
static int config_AllowNoMatch( const char *option, char *line, int lino );
static int config_Allow( const char *option, char *line, int lino );
static int config_CheckOnlyNoMatch( const char *option, char *line, int lino );
static int config_CheckOnly( const char *option, char *line, int lino );
static int config_AddType( const char *option, char *line, int lino );
static int config_StopwordFile( const char *option, char *line, int lino );
static int config_CharsetAlias( const char *option, char *line, int lino );
static int config_CharsetTableU1( const char *option, char *line, int lino );
static int config_CharsetTableU2( const char *option, char *line, int lino );
static int config_Dictionary2( const char *option, char *line, int lino );
static int config_CharsetTable( const char *option, char *line, int lino );
static int config_Charset( const char *option, char *line, int lino );
static int config_Proxy( const char *option, char *line, int lino );
static int config_HTTPHeader( const char *option, char *line, int lino );
static int config_AuthBasic( const char *option, char *line, int lino );
static int config_HTDBList( const char *option, char *line, int lino );
static int config_HTDBDoc( const char *option, char *line, int lino );
static int config_Converter( const char *option, char *line, int lino );
static int config_Alias( const char *option, char *line, int lino );
static int config_Server( const char *option, char *line, int lino );
static int config_MaxBandwidth( const char *option, char *line, int lino );
static int config_FollowOutside( const char *option, char *line, int lino );
static int config_Index( const char *option, char *line, int lino );
static int config_Follow( const char *option, char *line, int lino );
static int config_Robots( const char *option, char *line, int lino );
static int config_DeleteBad( const char *option, char *line, int lino );
static int config_DeleteNoServer( const char *option, char *line, int lino );
static int config_Clones( const char *option, char *line, int lino );
static int config_AddressExpiry( const char *option, char *line, int lino );
static int config_NextDocLimit( const char *option, char *line, int lino );
static int config_WordCacheSize( const char *option, char *line, int lino );
static int config_HrefCacheSize( const char *option, char *line, int lino );
static int config_DeltaBufferSize( const char *option, char *line, int lino );
static int config_UrlBufferSize( const char *option, char *line, int lino );
static int config_Tag( const char *option, char *line, int lino );
static int config_ReadTimeOut( const char *option, char *line, int lino );
static int config_Period( const char *option, char *line, int lino );
static int config_MaxHops( const char *option, char *line, int lino );
static int config_MaxDocsPerServer( const char *option, char *line, int lino );
static int config_IncrementHopsOnRedirect( const char *option,
char *line, int lino );
static int config_RedirectLoopLimit( const char *option, char *line,
int lino );
static int config_MinDelay( const char *option, char *line, int lino );
static int config_IspellCorrectFactor( const char *option, char *line,
int lino );
static int config_IspellIncorrectFactor( const char *option, char *line,
int lino );
static int config_NumberFactor( const char *option, char *line, int lino );
static int config_AlnumFactor( const char *option, char *line, int lino );
static int config_MinWordLength( const char *option, char *line, int lino );
static int config_MaxWordLength( const char *option, char *line, int lino );
static int config_MaxDocSize( const char *option, char *line, int lino );
static int config_MaxDocsAtOnce( const char *option, char *line, int lino );
static int config_NoIndex( const char *option, char *line, int lino );
static int config_NoFollow( const char *option, char *line, int lino );
static int config_OnlineGeo( const char *option, char *line, int lino );
static int config_IncrementalCitations( const char *option, char *line,
int lino );
static int config_CompactStorage( const char *option, char *line, int lino );
static int config_HiByteFirst( const char *option, char *line, int lino );
static int config_UtfStorage( const char *option, char *line, int lino );
static int config_Include( const char *option, char *line, int lino );
static int config_Countries( const char *option, char *line, int lino );
static int config_MirrorRoot( const char *option, char *line, int lino );
static int config_MirrorHeadersRoot( const char *option, char *line,
int lino );
static int config_MirrorPeriod( const char *option, char *line, int lino );
static int config_Replace( const char *option, char *line, int lino );
using std::string;
using std::vector;
char conf_err_str[ STRSIZ ] = "";
static char _user_agent[ STRSIZ ] = "";
static char _extra_headers[ STRSIZ ] = "";
int _max_doc_size = MAXDOCSIZE;
string MirrorRoot,
MirrorHeadersRoot;
string DataDir( DATA_DIR );
string ConfDir( CONF_DIR );
vector<string> dblib_paths;
CBWSchedule bwSchedule;
CMimes Mimes;
ULONG MaxMem = 10000000; // seems to be unused !
ULONG WordCacheSize = 50000;
ULONG HrefCacheSize = 10000;
int IncrementalCitations = 1;
#define BASE64_LEN( len ) ( 4 * ( ( ( len ) + 2 ) / 3 ) + 2 )
static char base64[ ] =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
/*
* The following array of structures contains a list of all keywords that can
* be used in the configuration file (usually aspseek.conf) and the address of
* the function to be called when the keyword is found in the configuration
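* file. In order to create a new keyword just add a new element to the
* array of structures and it will be included into the handling automatically.
* The lookup compares only the leading characters of the line against each
* keyword, so a keyword that starts with another keyword (for example
* "DisallowNoMatch" and "Disallow") must be listed before the shorter one.
*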
* Note: The functions for handling keywords always have to return an integer,
* which must be 0 on success and 1 on error (in which case the function should
* print a message explaining the problem into the 'conf_err_str' char array).
* The functions all receive three arguments, first the name of the keyword,
* a pointer to a char array which contains the complete line with the keyword
* and the arguments (with the pointer pointing to the first character of the
* keyword and the line guaranteed to end with a non-white-space character) and
* an integer with the number of the line in the configuration file.
*/
static struct cfg_fcnts {
const char *option;
int ( * fnct )( const char *, char *, int );
} config_Functions[ ] = {
{ "DBHost", config_DBHost },
{ "DBWordDir", config_DBWordDir },
{ "DataDir", config_DataDir },
{ "DBName", config_DBName },
{ "DBUser", config_DBUser },
{ "DBPass", config_DBPass },
{ "DBPort", config_DBPort },
{ "DBAddr", config_DBAddr },
{ "DBLibDir", config_DBLibDir },
{ "DebugLevel", config_DebugLevel },
{ "LocalCharset", config_LocalCharset },
{ "DBType", config_DBType },
{ "AllowCountries", config_AllowCountries },
{ "DisallowNoMatch", config_DisallowNoMatch },
{ "Disallow", config_Disallow },
{ "AllowNoMatch", config_AllowNoMatch },
{ "Allow", config_Allow },
{ "CheckOnlyNoMatch", config_CheckOnlyNoMatch },
{ "CheckOnly", config_CheckOnly },
{ "AddType", config_AddType },
{ "StopwordFile", config_StopwordFile },
{ "CharsetAlias", config_CharsetAlias },
{ "CharsetTableU1", config_CharsetTableU1 },
{ "CharsetTableU2", config_CharsetTableU2 },
{ "Dictionary2", config_Dictionary2 },
{ "CharsetTable", config_CharsetTable },
{ "Charset", config_Charset },
{ "Proxy", config_Proxy },
{ "HTTPHeader", config_HTTPHeader },
{ "AuthBasic", config_AuthBasic },
{ "HTDBList", config_HTDBList },
{ "HTDBDoc", config_HTDBDoc },
{ "Converter", config_Converter },
{ "Alias", config_Alias },
{ "Server", config_Server },
{ "MaxBandwidth", config_MaxBandwidth },
{ "FollowOutside", config_FollowOutside },
{ "Index", config_Index },
{ "Follow", config_Follow },
{ "Robots", config_Robots },
{ "DeleteBad", config_DeleteBad },
{ "DeleteNoServer", config_DeleteNoServer },
{ "Clones", config_Clones },
{ "AddressExpiry", config_AddressExpiry },
{ "NextDocLimit", config_NextDocLimit },
{ "WordCacheSize", config_WordCacheSize },
{ "HrefCacheSize", config_HrefCacheSize },
{ "DeltaBufferSize", config_DeltaBufferSize },
{ "UrlBufferSize", config_UrlBufferSize },
{ "Tag", config_Tag },
{ "ReadTimeOut", config_ReadTimeOut },
{ "Period", config_Period },
{ "MaxHops", config_MaxHops },
{ "MaxDocsPerServer", config_MaxDocsPerServer },
{ "IncrementHopsOnRedirect", config_IncrementHopsOnRedirect },
{ "RedirectLoopLimit", config_RedirectLoopLimit },
{ "MinDelay", config_MinDelay },
{ "IspellCorrectFactor", config_IspellCorrectFactor },
{ "IspellIncorrectFactor", config_IspellIncorrectFactor },
{ "NumberFactor", config_NumberFactor },
{ "AlnumFactor", config_AlnumFactor },
{ "MinWordLength", config_MinWordLength },
{ "MaxWordLength", config_MaxWordLength },
{ "MaxDocSize", config_MaxDocSize },
{ "MaxDocsAtOnce", config_MaxDocsAtOnce },
{ "NoIndex", config_NoIndex },
{ "NoFollow", config_NoFollow },
{ "OnlineGeo", config_OnlineGeo },
{ "IncrementalCitations", config_IncrementalCitations },
{ "CompactStorage", config_CompactStorage },
{ "HiByteFirst", config_HiByteFirst },
{ "UtfStorage", config_UtfStorage },
{ "Include", config_Include },
{ "Countries", config_Countries },
{ "MirrorRoot", config_MirrorRoot },
{ "MirrorHeadersRoot", config_MirrorHeadersRoot },
{ "MirrorPeriod", config_MirrorPeriod },
{ "Replace", config_Replace }
};
static const size_t num_config_keywords = sizeof config_Functions /
sizeof config_Functions[ 0 ];
/*--------------------------------------------------------------------------*
*--------------------------------------------------------------------------*/
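static void base64_encode( const char *s, char *store, int length )
{
    int i;
    const unsigned char *u = ( const unsigned char * ) s;
    unsigned char *p = ( unsigned char * ) store;

    // Work on unsigned bytes so that characters above 0x7f can't result in
    // negative table indices, and never read beyond the end of the input

    for ( i = 0; i < length; u += 3, i += 3 )
    {
        unsigned int b0 = u[ 0 ],
                     b1 = i + 1 < length ? u[ 1 ] : 0,
                     b2 = i + 2 < length ? u[ 2 ] : 0;

        *p++ = base64[ b0 >> 2 ];
        *p++ = base64[ ( ( b0 & 3 ) << 4 ) + ( b1 >> 4 ) ];
        *p++ = base64[ ( ( b1 & 0xf ) << 2 ) + ( b2 >> 6 ) ];
        *p++ = base64[ b2 & 0x3f ];
    }

    // Pad the result

    if ( i == length + 1 )
        *( p - 1 ) = '=';
    else if ( i == length + 2 )
        *( p - 1 ) = *( p - 2 ) = '=';

    *p = '\0';
}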
/*--------------------------------------------------------------------------*
*--------------------------------------------------------------------------*/
char* UserAgent( )
{
return _user_agent;
}
/*--------------------------------------------------------------------------*
*--------------------------------------------------------------------------*/
char* ExtraHeaders( )
{
return _extra_headers;
}
/*--------------------------------------------------------------------------*
*--------------------------------------------------------------------------*/
int AddType( char *mime_type, char *reg, char *errstr )
{
CMime* m = new CMime;
m->SetType( reg, mime_type );
if ( m->m_mime_type.size( ) > 0 )
{
Mimes.push_back( m );
return 0;
}
else
{
delete m;
return 1;
}
}
/*--------------------------------------------------------------------------*
*--------------------------------------------------------------------------*/
void CServer::AddReplacement( char* str )
{
char find[ STRSIZ ] = "",
replace[ STRSIZ ] = "";
int n = sscanf( str, "%s%s", find, replace );
if ( n > 0 &&
( strlen( find ) > MAX_URL_LEN || strlen( replace ) > MAX_URL_LEN ) )
{
sprintf( conf_err_str, "Error: in config file: URL is too long "
"for Replace\n" );
m_replace = 0;
return;
}
switch ( n )
{
case 0 : case -1 :
m_replace = 0;
break;
case 1 : case 2 :
CReplacement* repl = new CReplacement;
if ( repl->SetFindReplace( find, replace ) )
{
delete repl;
}
else
{
if ( m_replace == 0 )
m_replace = new CReplaceVec;
m_replace->push_back( repl );
}
break;
}
}
/*--------------------------------------------------------------------*
* Main function for parsing the configuration file
*--------------------------------------------------------------------*/
CServer csrv; // the server that server-specific arguments are applied to
// These local variables (scope is restricted to this file) are required for
// a few handler functions that need additional arguments beside keyword,
// line and line number
static string localcharset;
static int local_load_flags;
static int local_config_level;
int LoadConfig( char *conf_name, int config_level, int load_flags )
{
int line_number = 0;
FILE *config;
char line[ STRSIZ ] = "";
char cur_line[ STRSIZ ];
char *start;
local_config_level = config_level;
if ( config_level == 0 ) // Do some initialization
{
sprintf( _user_agent, "%s/%s", USER_AGENT, VERSION );
_extra_headers[ 0 ] = 0;
_max_doc_size = MAXDOCSIZE;
DBPort = 0;
SetDefaultCharset( CHARSET_USASCII );
local_load_flags = load_flags;
}
string config_file_name;
// check if the path is absolute
if ( isAbsolutePath( conf_name ) )
config_file_name = conf_name;
else
config_file_name = ConfDir + "/" + conf_name;
// Open config
if ( ! ( config = fopen( config_file_name.c_str( ), "r" ) ) )
{
snprintf( conf_err_str, sizeof conf_err_str,
"Error: can't open config file '%s': %s",
config_file_name.c_str( ), strerror( errno ) );
local_config_level--;
return 1;
}
logger.log( CAT_FILE, L_INFO, "Loading configuration from %s\n",
config_file_name.c_str( ) );
// Read lines and parse
while ( fgets( cur_line, sizeof cur_line, config ) )
{
switch ( config_trim_line( line, cur_line, &start, ++line_number ) )
{
case -1 :
fclose( config );
local_config_level--;
return 1;
case 1 : // line ended with a backslash, was
continue; // empty or a comment
}
// Now that we have a full line evaluate it (we could get a bit faster
// by sorting the array of keyword/function structures and then doing a
// binary search, but since this would speed things up by no more than
// a few milliseconds it's probably not worth the hassle)
size_t i;
for ( i = 0; i < num_config_keywords; i++ )
if ( ! STRNCASECMP( start, config_Functions[ i ].option ) )
{
if ( config_Functions[ i ].fnct( config_Functions[ i ].option,
start, line_number ) )
{
fclose( config );
local_config_level--;
return 1;
}
break;
}
if ( i == num_config_keywords ) // unknown option ?
{
snprintf( conf_err_str, sizeof conf_err_str, "Unknown keyword in "
"config file at line %d: %s\n", line_number, start );
fclose( config );
local_config_level--;
return 1;
}
*line = '\0';
}
fclose( config );
// Test that we weren't in a continued line on end of file
if ( *line )
{
sprintf( conf_err_str, "Error: in config file: Premature end of "
"file\n" );
local_config_level--;
return 1;
}
#ifndef UNICODE
if ( ! GetDefaultCharset( ) )
{
SetDefaultCharset( GetCharset(localcharset.c_str( ) ) );
logger.log(CAT_ALL, L_DEBUG, "Set localcharset to [%s]\n",
localcharset.c_str( ) );
}
#endif // UNICODE
if ( DBWordDir.empty( ) )
DBWordDir = DataDir + "/" + DBName;
// On level0 : Free some variables, prepare others, etc
if ( config_level == 0 )
{
// Add one virtual server if we want FollowOutside
// or DeleteNoServer no
if ( csrv.m_outside || ! csrv.m_delete_no_server )
{
csrv.m_url = "";
AddServer( csrv );
}
else
csrv.m_url = "";
if ( UrlBufferSize == 0 )
UrlBufferSize = DeltaBufferSize << 3;
#ifdef UNICODE
FixLangs( );
#endif
if ( *conf_err_str )
logger.log( CAT_ALL, L_WARN, "Warnings loading config: %s\n",
conf_err_str );
}
local_config_level--;
return 0;
}
/*------------------------------------------------------------------------*
* Returns -1 on error, 1 if the line is either empty or is a comment line
* or if it ends with a backslash, and 0 when we got a complete line ready
* for parsing.
*------------------------------------------------------------------------*/
static int config_trim_line( char *line, char *cur_line, char **start_of_line,
                             int lino )
{
    char *start, *end;
    int is_continued = 0;

    // Find first non-white-space character in current line

    for ( start = cur_line; *start && isspace( ( unsigned char ) *start );
          start++ )
        /* empty */ ;

    // Remove white-space from end of current line ('end' points to the
    // terminating '\0' of the trimmed line afterwards)

    for ( end = start + strlen( start );
          end > start && isspace( ( unsigned char ) end[ -1 ] ); end-- )
        /* empty */ ;
    *end = '\0';

    // The assembled line always lives in 'line' ('cur_line' gets overwritten
    // when the next line is read), so that's what the caller has to parse

    *start_of_line = line;

    // Return when line is empty or a comment line (but make sure we get
    // handling of continuation lines right)

    if ( *start == '\0' )
        return *line ? 0 : 1;

    if ( *start == '#' )
    {
        if ( *line != '\0' )
        {
            sprintf( conf_err_str, "Error: in config file at line %d: "
                     "Comment within continued line\n", lino );
            return -1;
        }
        return 1;
    }

    // Test if the line ends in a backslash, in which case we need to read the
    // next line. Remove the backslash and all white-space before it now, so
    // that neither ends up in the whole line

    if ( end[ -1 ] == '\\' )
    {
        is_continued = 1;
        for ( --end; end > start && isspace( ( unsigned char ) end[ -1 ] );
              end-- )
            /* empty */ ;
        *end = '\0';
    }

    // Make sure we have enough space left before we try to append the
    // current line to the whole line

    if ( strlen( line ) + strlen( start ) + 3 > STRSIZ )
    {
        sprintf( conf_err_str, "Error: in config file at line %d: Line is "
                 "too long\n", lino );
        return -1;
    }

    // Append the current line to the whole line (make sure to also insert
    // a space when the current line is a continuation)

    if ( *line && *start )
        strcat( line, " " );
    strcat( line, start );

    return is_continued;
}
/*--------------------------------------------------------------------------*
* obsolete keyword
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_DBHost( const char *option, char *line, int lino )
{
if ( ( line = config_charp_arg( option, line, lino ) ) == 0 )
return 1;
DBHost = line;
return 0;
}
/*--------------------------------------------------------------------------*
* obsolete keyword
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_DBWordDir( const char *option, char *line, int lino )
{
if ( ( line = config_charp_arg( option, line, lino ) ) == 0 )
return 1;
DBWordDir = line;
return 0;
}
/*--------------------------------------------------------------------------*
* DataDir /some/dir
* Sets directory in which delta files and files with
* information about words, subsets, spaces will be
* stored. Default is @localstatedir@.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_DataDir( const char *option, char *line, int lino )
{
if ( ( line = config_charp_arg( option, line, lino ) ) == 0 )
return 1;
DataDir = line;
return 0;
}
/*--------------------------------------------------------------------------*
* obsolete keyword
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_DBName( const char *option, char *line, int lino )
{
if ( ( line = config_charp_arg( option, line, lino ) ) == 0 )
return 1;
DBName = line;
return 0;
}
/*--------------------------------------------------------------------------*
* obsolete keyword
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_DBUser( const char *option, char *line, int lino )
{
if ( ( line = config_charp_arg( option, line, lino ) ) == 0 )
return 1;
DBUser = line;
return 0;
}
/*--------------------------------------------------------------------------*
* obsolete keyword
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_DBPass( const char *option, char *line, int lino )
{
if ( ( line = config_charp_arg( option, line, lino ) ) == 0 )
return 1;
DBPass = line;
return 0;
}
/*--------------------------------------------------------------------------*
* obsolete keyword
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_DBPort( const char *option, char *line, int lino )
{
return config_int_arg( option, line, lino, &DBPort );
}
/*--------------------------------------------------------------------------*
* DBAddr DBType:[//[User[:Pass]@]Host[:Port]]/DBName/
* Defines SQL server connection parameters.
* DBType is SQL server type, it can be mysql or oracle8
* for now.
* User is a SQL server's user to connect as.
* Pass is a User's password. If this field is omitted,
* no password is used.
* Host is a host name or IP address of host to connect
* to. If you are running SQL server on the same
* machine, use localhost.
* Port is a port number on which SQL server is listening.
* Default is the same as default port of
* used SQL server.
* DBName is a name of the database used.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_DBAddr( const char *option, char *line, int lino )
{
if ( ( line = config_charp_arg( option, line, lino ) ) == 0 )
return 1;
ParseDBAddr( line );
return 0;
}
/*--------------------------------------------------------------------------*
* DBLibDir /some/dir
* Adds /some/dir to list of directories to search for
* database backend library (libdbname-version.so).
* Default library search path is @libdir@. Several
* such options can be used, each adding one more
* directory to the list. Last added directory is used
* first; compiled in path is last.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_DBLibDir( const char *option, char *line, int lino )
{
if ( ( line = config_charp_arg( option, line, lino ) ) == 0 )
return 1;
dblib_paths.push_back( string( line ) );
return 0;
}
/*--------------------------------------------------------------------------*
* DebugLevel none | error | warning | info | debug
* Sets the level of debugging. If set to none, nothing
* will be logged. If set to debug, you will get a
* bunch of messages. Default value is info.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_DebugLevel( const char *option, char *line, int lino )
{
if ( ( line = config_charp_arg( option, line, lino ) ) == 0 )
return 1;
logger.setloglevel( line );
if ( logger.getLevel( ) == L_NONE && STRNCASECMP( line, "none" ) )
{
sprintf( conf_err_str, "Error: in config file at line %d: Invalid "
"argument for %s\n", lino, option );
return 1;
}
return 0;
}
/*--------------------------------------------------------------------------*
* LocalCharset charset
* Sets the local charset for ASPseek, so all data in
* the database is assumed to be in that charset.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_LocalCharset( const char *option, char *line, int lino )
{
if ( ( line = config_charp_arg( option, line, lino ) ) == 0 )
return 1;
localcharset = line;
return 0;
}
/*--------------------------------------------------------------------------*
* obsolete keyword
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_DBType( const char *option, char *line, int lino )
{
if ( ( line = config_charp_arg( option, line, lino ) ) == 0 )
return 1;
DBType = line;
return 0;
}
/*--------------------------------------------------------------------------*
* AllowCountries cc1 [cc2...]
* Specifies to index only sites from countries specified
* by cc1, cc2, etc. Should be used together with
* the Countries keyword.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_AllowCountries( const char *option, char *line, int lino )
{
AddCountries( line );
return 0;
}
/*--------------------------------------------------------------------------*
* DisallowNoMatch regexp [regexp...]
* Disallows to index URLs not matching regexp.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_DisallowNoMatch( const char *option, char *line, int lino )
{
return config_Filter( option, line, lino, DISALLOW, 1 );
}
/*--------------------------------------------------------------------------*
* Disallow regexp [regexp...]
* Disallows indexing of URLs matching regexp.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_Disallow( const char *option, char *line, int lino )
{
return config_Filter( option, line, lino, DISALLOW, 0 );
}
/*--------------------------------------------------------------------------*
* AllowNoMatch regexp [regexp...]
* Allows indexing of URLs not matching regexp.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_AllowNoMatch( const char *option, char *line, int lino )
{
return config_Filter( option, line, lino, ALLOW, 1 );
}
/*--------------------------------------------------------------------------*
* Allow regexp [regexp...]
* Allows indexing of URLs matching regexp.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_Allow( const char *option, char *line, int lino )
{
return config_Filter( option, line, lino, ALLOW, 0 );
}
/*--------------------------------------------------------------------------*
* CheckOnlyNoMatch regexp [regexp...]
* Use HEAD request instead of GET for URLs not match-
* ing regexp. So, such URLs will not be downloaded,
* just information about them will be stored in url-
* word table.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_CheckOnlyNoMatch( const char *option, char *line, int lino )
{
return config_Filter( option, line, lino, HEAD, 1 );
}
/*--------------------------------------------------------------------------*
* CheckOnly regexp [regexp...]
* Use HEAD request instead of GET for URLs matching
* regexp. So, such URLs will not be downloaded, just
* information about them will be stored in urlword
* table.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_CheckOnly( const char *option, char *line, int lino )
{
return config_Filter( option, line, lino, HEAD, 0 );
}
/*--------------------------------------------------------------------------*
* undocumented keyword
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_AddType( const char *option, char *line, int lino )
{
char *s,
*s1,
*lt;
if ( ( s1 = GetToken( line + strlen( option ), " \t", &lt ) ) )
{
// The braces make sure the else below belongs to this outer if,
// not to the if in the loop
while ( ( s = GetToken( 0, " \t", &lt ) ) )
if ( AddType( s1, s, conf_err_str ) )
sprintf( conf_err_str, "Problem in config file at line %d: "
"Can't add %s for MIME type %s\n", lino, s, s1 );
}
else
{
sprintf( conf_err_str, "Error: in config file at line %d: Missing "
"argument(s) for %s\n", lino, option );
return 1;
}
return 0;
}
/*--------------------------------------------------------------------------*
* StopwordFile lang file [charset]
* Loads stopwords for language lang from file. If
* charset is not specified, file contents is assumed
* to be in LocalCharset, otherwise it is in charset.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_StopwordFile( const char *option, char *line, int lino )
{
int len = strlen( line ) - strlen( option ) + 1;
char lang[ 3 ];
char *file = new char[ len ];
char* encoding = new char[ len ];
int n = sscanf( line + strlen( option ), "%2s%s%s", lang,
file, encoding );
#ifdef UNICODE
if ( n >= 2 )
{
if ( Stopwords.Load( file, lang,
n == 3 ? encoding : localcharset.c_str( ) ) < 0 )
{
sprintf( conf_err_str, "Error: in config file at line %d: Can't "
"load stopword file '%s'\n", lino, file );
delete [ ] file;
delete [ ] encoding;
return 1;
}
}
#else
if ( n >= 2 )
{
if ( Stopwords.Load( file, lang ) < 0 )
{
sprintf( conf_err_str, "Error: in config file at line %d: Can't "
"load stopword file '%s'\n", lino, file );
delete [ ] file;
delete [ ] encoding;
return 1;
}
if ( n == 3 )
sprintf( conf_err_str, "Warning: in config file at line %d: "
"Option %s doesn't accept charset argument without "
"Unicode support\n", lino, option );
}
#endif
else
{
sprintf( conf_err_str, "Error: in config file at line %d: Missing "
"argument(s) for %s\n", lino, option );
delete [ ] file;
delete [ ] encoding;
return 1;
}
delete [ ] file;
delete [ ] encoding;
return 0;
}
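/*--------------------------------------------------------------------------*
* Sketch only, not part of config.cpp: the handlers in this part of the
* file allocate scratch buffers with new[] and have to delete them on
* every return path. Assuming it is acceptable to use the standard
* library here, the arguments could be split into std::string tokens
* instead, which clean up after themselves. The name sketch_split_args()
* is made up for this illustration, and the sketch assumes that line
* starts with the keyword.
*--------------------------------------------------------------------------*/

```cpp
#include <cstring>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the whitespace-separated arguments that
// follow the keyword in a line of the configuration file
static std::vector< std::string > sketch_split_args( const char *option,
                                                     const char *line )
{
    std::vector< std::string > args;
    std::size_t skip = std::strlen( option );

    // Nothing to split if the line is shorter than the keyword
    if ( std::strlen( line ) < skip )
        return args;

    std::string rest( line + skip );
    std::istringstream in( rest );
    std::string tok;

    while ( in >> tok )
        args.push_back( tok );

    return args;
}
```

// A handler could then test args.size( ) where it now tests the return
// value of sscanf(), and no delete [ ] would be needed on the error paths.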
/*--------------------------------------------------------------------------*
* CharsetAlias charset alias1 [alias2...]
* Defines alias1, alias2, ... as aliases (alternative
* names) for charset. This is needed because in many
* cases there is no "one true name" for the charset -
* different web servers and page authors use differ-
* ent names.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_CharsetAlias( const char *option, char *line, int lino )
{
int len = strlen( line ) - strlen( option ) + 1;
char *name = new char[ len ];
char *aliases = new char[ len ];
if ( sscanf( line + strlen( option ), "%s%s",
name, aliases ) == 2 )
AddAlias( name, aliases );
else
{
sprintf( conf_err_str, "Error: in config file at line %d: Missing "
"argument(s) for %s\n", lino, option );
delete [ ] name;
delete [ ] aliases;
return 1;
}
delete [ ] name;
delete [ ] aliases;
return 0;
}
/*--------------------------------------------------------------------------*
* CharsetTableU1 charset lang file [lmfile]
* Loads the Unicode mapping for charset of language
* lang from file. Optionally load langmap file
* lmfile, which is used for charset guesser.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_CharsetTableU1( const char *option, char *line, int lino )
{
#ifdef UNICODE
int len = strlen( line ) - strlen( option ) + 1;
char *name = new char[ len ];
char *dir = new char[ len ];
char *lang = new char[ len ];
char *lmdir = new char[ len ];
int param,
charset_id = 0;
param = sscanf( line + strlen( option ), "%s%s%s%s", name, lang,
dir, lmdir );
if ( param >= 3 &&
( charset_id = LoadCharsetU1( lang, name, dir ) ) == -1 )
{
logger.log( CAT_FILE, L_WARN, "Charset %s has not been loaded\n",
name );
delete [ ] name;
delete [ ] dir;
delete [ ] lang;
delete [ ] lmdir;
return 1;
}
#ifdef USE_CHARSET_GUESSER
if ( param == 4 && charset_id > 0 )
langs.AddLang( charset_id, lmdir );
#endif
if ( param < 3 )
{
sprintf( conf_err_str, "Error: in config file at line %d: Missing "
"argument(s) for %s\n", lino, option );
delete [ ] name;
delete [ ] dir;
delete [ ] lang;
delete [ ] lmdir;
return 1;
}
delete [ ] name;
delete [ ] dir;
delete [ ] lang;
delete [ ] lmdir;
return 0;
#else
sprintf( conf_err_str, "Warning: in config file at line %d: Invalid "
"keyword %s, Unicode support not compiled into aspseek\n",
lino, option );
return 0;
#endif
}
/*--------------------------------------------------------------------------*
* CharsetTableU2 charset lang file [lmfile]
* Loads the Unicode mapping for multibyte charset of
* language lang from file. Optionally load langmap
* file lmfile, which is used for charset guesser.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_CharsetTableU2( const char *option, char *line, int lino )
{
#ifdef UNICODE
int len = strlen( line ) - strlen( option ) + 1;
char *name = new char[ len ];
char *dir = new char[ len ];
char *lang = new char[ len ];
char *lmdir = new char[ len ];
int param,
charset_id = 0;
param = sscanf( line + strlen( option ), "%s%s%s%s", name, lang,
dir, lmdir );
if ( param >= 3 &&
( charset_id = LoadCharsetU2V( lang, name, dir ) ) == -1 )
{
logger.log( CAT_FILE, L_WARN, "Charset %s has not been loaded\n",
name );
delete [ ] name;
delete [ ] dir;
delete [ ] lang;
delete [ ] lmdir;
return 1;
}
#ifdef USE_CHARSET_GUESSER
if ( param == 4 && charset_id > 0 )
langs.AddLang( charset_id, lmdir );
#endif
if ( param < 3 )
{
sprintf( conf_err_str, "Error: in config file at line %d: Missing "
"argument(s) for %s\n", lino, option );
delete [ ] name;
delete [ ] dir;
delete [ ] lang;
delete [ ] lmdir;
return 1;
}
delete [ ] name;
delete [ ] dir;
delete [ ] lang;
delete [ ] lmdir;
return 0;
#else
sprintf( conf_err_str, "Warning: in config file at line %d: Invalid "
"keyword %s, Unicode support not compiled into aspseek\n",
lino, option );
return 0;
#endif
}
/*--------------------------------------------------------------------------*
* Dictionary2 lang file [charset]
* Loads dictionary for lang from file. If charset is
* not specified, it is assumed that the file is in
* Unicode. Dictionary is used for tokenizing of text
* in Chinese, Japanese and Korean languages.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_Dictionary2( const char *option, char *line, int lino )
{
#ifdef UNICODE
int len = strlen( line ) - strlen( option ) + 1;
char *dir = new char[ len ];
char *lang = new char[ len ];
char *charset = new char[ len ];
int param;
if ( ( param = sscanf( line + strlen( option ), "%s%s%s", lang,
dir, charset ) ) >= 2 )
LoadDictionary2( lang, dir, param > 2 ? charset : 0 );
else
{
sprintf( conf_err_str, "Error: in config file at line %d: Missing "
"argument(s) for %s\n", lino, option );
delete [ ] dir;
delete [ ] lang;
delete [ ] charset;
return 1;
}
delete [ ] dir;
delete [ ] lang;
delete [ ] charset;
return 0;
#else
sprintf( conf_err_str, "Warning: in config file at line %d: Invalid "
"keyword %s, Unicode support not compiled into aspseek\n",
lino, option );
return 0;
#endif
}
/*--------------------------------------------------------------------------*
* CharsetTable charset lang file [lmfile]
* Loads the table for charset of language lang from
* file. Optionally load langmap file lmfile, which
* is used for charset guesser.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_CharsetTable( const char *option, char *line, int lino )
{
#ifndef UNICODE
int len = strlen( line ) - strlen( option ) + 1;
char *name = new char[ len ];
char *dir = new char[ len ];
char *lang = new char[ len ];
char *lmdir = new char[ len ];
int charset_id = 0;
int param = sscanf( line + strlen( option ), "%s%s%s%s", name, lang,
dir, lmdir );
if ( param >= 3 &&
( charset_id = LoadCharset( lang, name, dir ) ) == -1 )
{
logger.log( CAT_FILE, L_WARN, "Charset %s has not been loaded\n",
name );
delete [ ] name;
delete [ ] dir;
delete [ ] lang;
delete [ ] lmdir;
return 1;
}
#ifdef USE_CHARSET_GUESSER
if ( param == 4 && charset_id > 0 )
langs.AddLang( charset_id, lmdir );
#endif
if ( param < 3 )
{
sprintf( conf_err_str, "Error: in config file at line %d: Missing "
"argument(s) for %s\n", lino, option );
delete [ ] name;
delete [ ] dir;
delete [ ] lang;
delete [ ] lmdir;
return 1;
}
delete [ ] name;
delete [ ] dir;
delete [ ] lang;
delete [ ] lmdir;
return 0;
#else
sprintf( conf_err_str, "Warning: in config file at line %d: Option %s is "
"superfluous with Unicode support\n", lino, option );
return 0;
#endif
}
/*--------------------------------------------------------------------------*
* Charset charset
* Usable to set charset for the servers that do not
* return it. Argument should be known charset name
* (see below for charset configuration). Alterna-
* tively, you can use charset guesser feature of
* index(1).
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_Charset( const char *option, char *line, int lino )
{
if ( ( line = config_charp_arg( option, line, lino ) ) == 0 )
return 1;
csrv.m_charset = line;
return 0;
}
/*--------------------------------------------------------------------------*
* Proxy [host.com[:port]]
* Use proxy rather than direct connection. You can
* also index FTP sites via proxy. If port is not
* specified, default is 3128 (squid). Proxy without
* arguments disables proxy.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_Proxy( const char *option, char *line, int lino )
{
long val;
char *s,
*ep;
if ( ( line = config_charp_arg( option, line, lino ) ) == 0 )
return 1;
if ( ( s = strrchr( line, ':' ) ) != 0 )
*s++ = '\0';
csrv.m_proxy = line;
if ( s == 0 )
return 0;
if ( *s == '\0' )
{
sprintf( conf_err_str, "Error: in config file at line %d: Missing "
"port number after ':' for %s\n", lino, option );
return 1;
}
errno = 0;     // strtol() sets errno only on failure, so clear it first
val = strtol( s, &ep, 10 );
// Check if we got a number
if ( s == ep )
{
sprintf( conf_err_str, "Error: in config file at line %d: Argument is "
"not a number for %s\n", lino, option );
return 1;
}
// Check if there was junk after the number
if ( ep && *ep != 0 )
{
sprintf( conf_err_str, "Error: in config file at line %d: Junk after "
"argument for %s\n", lino, option );
return 1;
}
// Check that the value didn't overflow and is a positive port number
if ( errno == ERANGE || val < 1 )
{
sprintf( conf_err_str, "Error: in config file at line %d: Argument to "
"%s is not a valid port number\n", lino, option );
return 1;
}
if ( val > 65535 )
{
sprintf( conf_err_str, "Error: in config file at line %d: Invalid "
"port number, must be below 65536\n",
lino );
return 1;
}
csrv.m_proxy_port = ( int ) val;
return 0;
}
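/*--------------------------------------------------------------------------*
* Sketch only, not part of config.cpp: the port checks made above can be
* collected in a small stand-alone function. The name sketch_parse_port()
* and the convention of returning -1 on error are assumptions made for
* this illustration. Note that strtol() itself accepts leading white
* space and a sign, so the sketch accepts them as well.
*--------------------------------------------------------------------------*/

```cpp
#include <cerrno>
#include <cstdlib>

// Hypothetical helper: returns the port number (1 to 65535) or -1 if the
// string isn't a valid port number
static long sketch_parse_port( const char *s )
{
    char *ep;
    long val;

    if ( s == 0 || *s == '\0' )
        return -1;

    errno = 0;                       // errno is only set on failure
    val = std::strtol( s, &ep, 10 );

    if ( ep == s || *ep != '\0' )    // no digits at all or trailing junk
        return -1;

    if ( errno == ERANGE || val < 1 || val > 65535 )
        return -1;

    return val;
}
```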
/*--------------------------------------------------------------------------*
* HTTPHeader header
* Add header to headers that index(1) sends in HTTP
* request. You should not use If-Modified-Since or
* Accept-Charset headers here, as index(1) sends it
* anyway. Header User-Agent: aspseek/@VERSION@ is
* sent too, but you may override it.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_HTTPHeader( const char *option, char *line, int lino )
{
if ( ( line = config_charp_arg( option, line, lino ) ) == 0 )
return 1;
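if ( ! STRNCMP( line, "User-Agent: " ) )
{
strncpy( _user_agent, line + strlen( "User-Agent: " ),
sizeof _user_agent - 1 );
// strncpy() doesn't terminate the string if the source is too long
_user_agent[ sizeof _user_agent - 1 ] = '\0';
}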
else
{
strcat( _extra_headers, line );
strcat( _extra_headers, "\r\n" );
}
return 0;
}
/*--------------------------------------------------------------------------*
* undocumented keyword
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_AuthBasic( const char *option, char *line, int lino )
{
if ( ( line = config_charp_arg( option, line, lino ) ) == 0 )
return 1;
char* s = new char[ BASE64_LEN( strlen( line ) ) + 1 ];
base64_encode( line, s, strlen( line ) );
csrv.m_basic_auth = s;
delete [ ] s;
return 0;
}
/*--------------------------------------------------------------------------*
* undocumented keyword, doesn't seem to be used
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_HTDBList( const char *option, char *line, int lino )
{
if ( ( line = config_charp_arg( option, line, lino ) ) == 0 )
return 1;
csrv.m_htdb_list = line;
return 0;
}
/*--------------------------------------------------------------------------*
* undocumented keyword, doesn't seem to be used
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_HTDBDoc( const char *option, char *line, int lino )
{
if ( ( line = config_charp_arg( option, line, lino ) ) == 0 )
return 1;
csrv.m_htdb_doc = line;
return 0;
}
/*--------------------------------------------------------------------------*
* Converter from/type to/type[; charset=cset] command line
* Specifies that for converting documents with MIME-
* type from/type to MIME-type to/type the command
* specified by command line will be used. Argument
* from/type can be any type returned by Web server.
* Argument to/type can be either text/plain or
* text/html. If you add ;charset=cset string after
* to/type, index will know that resulting document
* has a charset cset, otherwise it is assumed to be
* us-ascii.
*
* In the command line you usually specify program or
* script to run, together with its options. Program
* is expected to read from stdin and write the
* converted document to stdout.
*
* If your program can't deal with stdin/stdout
* streams, you should use $in and $out strings in
* command line, and they will be substituted with two
* file names in /tmp directory. index(1) will create
* files with unique names, write the document down-
* loaded to the first file (referenced as $in), run
* the /bin/prog, read the second file (referenced as
* $out) into memory, and then delete both files.
*
* You can also use $url in command line, it will be
* substituted with the actual URL of the downloaded
* document. You can use it in your own scripts to
* distinguish between different document variations, or
* to be able to write one script for many different
* MIME-types.
*
* Please note that index(1) relies on a Content-Type
* header returned by a Web server. Some Web-servers
* are misconfigured and give wrong info (for example,
* return header Content-Type: audio/x-pn-realaudio-
* plugin for .rpm files).
*
* Examples:
* Converter app/ps text/plain; charset=iso8859-1 ps2ascii
* # ps2ascii can't deal with PDF files from stdin
* Converter application/pdf text/plain ps2ascii $in $out
*
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_Converter( const char *option, char *line, int lino )
{
#ifndef USE_EXT_CONV
sprintf( conf_err_str, "Warning: in config file at line %d: aspseek was "
"not compiled with support for external converters.\n", lino );
return 0;
#else
// Arguments are: from/type to/type[;charset=some] command
// Example:
// application/msword text/plain;charset=windows-1251 catdoc -a $in
char* from;
char* to;
char* charset;
char* cmd;
char* lt;
// parse the args
from = GetToken( line + strlen( option ), " \t", &lt );
to = GetToken( 0, "; \t", &lt );
cmd = GetToken( 0, "\r\n", &lt );
if ( ! cmd )
{
sprintf( conf_err_str, "Error: in config file at line %d: Too few "
"arguments for %s\n", lino, option );
return 1;
}
else
{
if ( ( charset = strstr( cmd, "charset=" ) ) != 0 )
{
charset += 8;
cmd = strchr( charset + 1, ' ' );
// strchr() returns a null pointer if no space follows the charset
if ( cmd != 0 )
{
*cmd = '\0';
cmd++;
while ( *cmd == ' ' || *cmd == '\t' )
cmd++;
}
else
cmd = 0;
}
else
charset = 0;
if ( ! cmd )
{
sprintf( conf_err_str, "Error: in config file at line %d: Too few "
"arguments for %s\n", lino, option );
return 1;
}
else
converters[ from ] = new CExtConv( from, to, charset, cmd );
}
return 0;
#endif
}
/*--------------------------------------------------------------------------*
* undocumented keyword, does nothing (but there's an AddAlias() function
* in charsets.cpp which could be called...)
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_Alias( const char *option, char *line, int lino )
{
char buf1[ STRSIZ ];
char buf2[ STRSIZ ];
if ( sscanf( line + strlen( option ), "%s%s", buf1, buf2 ) == 2 )
{
// AddAlias( buf1, buf2 );
}
else
{
sprintf( conf_err_str, "Error: in config file at line %d: Invalid "
"argument for %s\n", lino, option );
return 1;
}
return 0;
}
/*--------------------------------------------------------------------------*
* Server URL
* Add URL as a URL to start indexing from. You can
* specify many Server commands, and set the different
* options for different sites - see below. Note that
* if URL contains path, the whole site will be
* indexed nevertheless, so to limit indexing to some
* subdirectory of site use Disallow parameter
* described below.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_Server( const char *option, char *line, int lino )
{
CUrl from;
if ( ( line = config_charp_arg( option, line, lino ) ) == 0 )
return 1;
if ( from.ParseURL( line ) )
{
sprintf( conf_err_str, "Error: in config file at line %d: Invalid "
"URL: %s\n", lino, line );
return 1;
}
csrv.m_url = line;
AddServer( csrv );
if ( local_load_flags & FLAG_ADD_SERV )
{
string srv = from.m_schema + string( "://" )
+ from.m_hostinfo + string( "/" );
if ( ! strcasecmp( from.m_schema, "http" ) && csrv.m_userobots )
{
char *robots = new char[ strlen( from.m_schema )
+ strlen( from.m_hostinfo ) + 15 ];
sprintf( robots, "%s://%s/robots.txt",
from.m_schema, from.m_hostinfo );
AddHref( robots, 0, 0, srv.c_str( ), 1 );
delete [ ] robots;
}
AddHref( line, 0, 0, srv.c_str( ), 0 );
}
return 0;
}
/*--------------------------------------------------------------------------*
* MaxBandwidth bytes [starttime [endtime]]
* Sets maximum used bandwidth for incoming traffic to
* bytes per second for the specified period of time
* of day. Arguments starttime and endtime are in sec-
* onds from midnight (0:00). If endtime is omitted,
* then it is implied to be the end of the day
* (86400). If both starttime and endtime are omitted,
* then the limit is for the whole day. You can use
* several MaxBandwidth commands. Note that if end-
* time is less than starttime, index(1) will handle
* it correctly, setting two intervals from starttime
* to midnight and from midnight to endtime. By
* default bandwidth is not limited.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_MaxBandwidth( const char *option, char *line, int lino )
{
long int bandwidth;
int start, finish = 0;
switch ( sscanf( line + strlen( option ), "%li%i%i",
&bandwidth, &start, &finish ) )
{
case 1 :
bwSchedule.m_defaultBandwidth = bandwidth;
break;
case 2 : case 3 :
bwSchedule.AddInterval( start, finish, bandwidth );
break;
default :
sprintf( conf_err_str, "Error: in config file at line %d: Missing "
"argument for %s\n", lino, option );
return 1;
}
return 0;
}
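/*--------------------------------------------------------------------------*
* Sketch only, not part of config.cpp: the documentation above says that
* an interval with endtime less than starttime is handled as two
* intervals, one from starttime to midnight and one from midnight to
* endtime. How bwSchedule.AddInterval() really does that isn't visible
* here, so the following is just one way it could be done. The names
* SketchInterval and sketch_split_interval() are made up, and treating
* equal start and end times as "the whole day" is an assumption.
*--------------------------------------------------------------------------*/

```cpp
#include <utility>
#include <vector>

// Start and end of an interval in seconds since midnight
typedef std::pair< int, int > SketchInterval;

// Hypothetical helper: splits an interval that wraps around midnight
// into at most two intervals that don't
static std::vector< SketchInterval > sketch_split_interval( int start,
                                                            int finish )
{
    const int day_end = 86400;
    std::vector< SketchInterval > out;

    if ( start < finish )
        out.push_back( SketchInterval( start, finish ) );
    else
    {
        // The interval wraps around midnight, skip parts that are empty
        if ( start < day_end )
            out.push_back( SketchInterval( start, day_end ) );
        if ( finish > 0 )
            out.push_back( SketchInterval( 0, finish ) );
    }

    return out;
}
```

// With this, an omitted endtime (left at 0 by the handler above) and an
// endtime of 86400 both give a single interval up to the end of the day.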
/*--------------------------------------------------------------------------*
* FollowOutside yes | no
* Sets whether index(1) should index outside sites
* defined in Server directives. Default is no. If you
* set it to yes, be sure to limit the scope of index-
* ing in some other way (for example, with MaxHops).
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_FollowOutside( const char *option, char *line, int lino )
{
return config_boolean( option, line, lino, &csrv.m_outside );
}
/*--------------------------------------------------------------------------*
* Index yes | no
* Sets whether to store words into database. Default
* value is yes.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_Index( const char *option, char *line, int lino )
{
return config_boolean( option, line, lino, &csrv.m_gindex );
}
/*--------------------------------------------------------------------------*
* Follow yes | no
* Sets whether to store links found into database.
* Default value is yes.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_Follow( const char *option, char *line, int lino )
{
return config_boolean( option, line, lino, &csrv.m_gfollow );
}
/*--------------------------------------------------------------------------*
* Robots yes | no
* Sets whether the robot exclusion standard
* (robots.txt file and META NAME="robots") will be
* honored. Default is yes.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_Robots( const char *option, char *line, int lino )
{
return config_boolean( option, line, lino, &csrv.m_userobots );
}
/*--------------------------------------------------------------------------*
* DeleteBad yes | no
* Sets whether to delete bad (not found, forbidden
* etc.) URLs from the database. Default value is no.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_DeleteBad( const char *option, char *line, int lino )
{
return config_boolean( option, line, lino, &csrv.m_deletebad );
}
/*--------------------------------------------------------------------------*
* DeleteNoServer yes | no
* Sets whether to delete URLs which have no corresponding
* "Server" commands. Default value is yes.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_DeleteNoServer( const char *option, char *line, int lino )
{
return config_boolean( option, line, lino, &csrv.m_delete_no_server );
}
/*--------------------------------------------------------------------------*
* Clones yes | no
* Sets whether to enable clone elimination. A clone is
* a document which is absolutely the same as another
* document. If this is set to yes, a clone is not
* parsed/stored in the database, instead word infor-
* mation for original document is used. Default value
* is yes.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_Clones( const char *option, char *line, int lino )
{
return config_boolean( option, line, lino, &csrv.m_use_clones );
}
/*--------------------------------------------------------------------------*
* AddressExpiry time
* Sets expiration time for "DNS name -> IP" entry in
* address cache. After entry is expired, resolver
* will make DNS lookup again. Argument time can be
* set in seconds, or the same way as in Period com-
* mand below. Default value is 1 hour.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_AddressExpiry( const char *option, char *line, int lino )
{
return config_time_val( option, line, lino, &AddressExpiry );
}
/*--------------------------------------------------------------------------*
* NextDocLimit number
* Maximum number of URLs loaded from database at each
* request. Default value is 1000.
*
* This option is used only if URLs to be indexed are
* ordered by next index time; otherwise, if the -o option
* to index(1) is used, all URLs for the current hop are
* taken at once.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_NextDocLimit( const char *option, char *line, int lino )
{
#if ( SIZEOF_LONG_INT == 4 )
if ( sscanf( line + strlen( option ), "%lu", &NextDocLimit ) != 1 )
#elif ( SIZEOF_INT == 4 )
if ( sscanf( line + strlen( option ), "%u", &NextDocLimit ) != 1 )
#endif
{
sprintf( conf_err_str, "Error: in config file at line %d: Invalid "
"argument for %s\n", lino, option );
return 1;
}
return 0;
}
/*--------------------------------------------------------------------------*
* WordCacheSize number
* Maximum word count in the word cache. Word cache is
* used to reduce database load for converting word to
* its word ID. Default value is 50000.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_WordCacheSize( const char *option, char *line, int lino )
{
#if ( SIZEOF_LONG_INT == 4 )
if ( sscanf( line + strlen( option ), "%lu", &WordCacheSize ) != 1 )
#elif ( SIZEOF_INT == 4 )
if ( sscanf( line + strlen( option ), "%u", &WordCacheSize ) != 1 )
#endif
{
sprintf( conf_err_str, "Error: in config file at line %d: Invalid "
"argument for %s\n", lino, option );
return 1;
}
return 0;
}
/*--------------------------------------------------------------------------*
* HrefCacheSize number
* Maximum URL count in the href cache. Href cache is
* used to reduce database load for converting URL of
* outgoing hyperlink to its URL ID. Default value is
* 10000.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_HrefCacheSize( const char *option, char *line, int lino )
{
#if ( SIZEOF_LONG_INT == 4 )
if ( sscanf( line + strlen( option ), "%lu",
#elif ( SIZEOF_INT == 4 )
if ( sscanf( line + strlen( option ), "%u",
#endif
&StoredHrefs.m_maxSize ) != 1 )
{
sprintf( conf_err_str, "Error: in config file at line %d: Invalid "
"argument for %s\n", lino, option );
return 1;
}
return 0;
}
/*--------------------------------------------------------------------------*
* DeltaBufferSize kilobytes
* Size of buffer for each of 100 delta files, in
* kilobytes. Setting of low value for this parameter
* can result in big fragmentation of delta files.
* Value of this parameter affects used memory. If
* default value is used, then 50 Mb of memory is used
* for buffers. Default value is 512.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_DeltaBufferSize( const char *option, char *line, int lino )
{
#if ( SIZEOF_LONG_INT == 4 )
if ( sscanf( line + strlen( option ), "%lu", &DeltaBufferSize ) != 1 )
#elif ( SIZEOF_INT == 4 )
if ( sscanf( line + strlen( option ), "%u", &DeltaBufferSize ) != 1 )
#endif
{
sprintf( conf_err_str, "Error: in config file at line %d: Invalid "
"argument for %s\n", lino, option );
return 1;
}
return 0;
}
/*--------------------------------------------------------------------------*
* UrlBufferSize kilobytes
* Size of read and write buffer allocated during
* inverted index merging for ind files, in kilobytes.
* Value of this parameter affects used memory during
* inverted index merging. Default value is
* DeltaBufferSize * 8.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_UrlBufferSize( const char *option, char *line, int lino )
{
#if ( SIZEOF_LONG_INT == 4 )
if ( sscanf( line + strlen( option ), "%lu", &UrlBufferSize ) != 1 )
#elif ( SIZEOF_INT == 4 )
if ( sscanf( line + strlen( option ), "%u", &UrlBufferSize ) != 1 )
#endif
{
sprintf( conf_err_str, "Error: in config file at line %d: Invalid "
"argument for %s\n", lino, option );
return 1;
}
return 0;
}
/*--------------------------------------------------------------------------*
* Tag number
* Use this field to "tag" several Servers with value
* number, which can later be used with option -t num-
* ber of index(1) command. Note that if you want to
* group several sites together for searching pur-
* poses, you should use "spaces" or "subsets" fea-
* tures of ASPseek, not tag.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_Tag( const char *option, char *line, int lino )
{
return config_int_arg( option, line, lino, &csrv.m_hint );
}
/*--------------------------------------------------------------------------*
* ReadTimeOut time
* Sets the maximum timeout to time for downloading a
* document from site. Argument can be expressed in
* seconds, or in the same form as in Period command
* above. Default value is 90 seconds.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_ReadTimeOut( const char *option, char *line, int lino )
{
return config_time_val( option, line, lino,
( time_t * ) &csrv.m_read_timeout );
}
/*--------------------------------------------------------------------------*
* Period time
* Sets the re-index period to time. Value can be set
* just in seconds, or using special characters
* right after the number (no spaces allowed): s for
* seconds, M for minutes, h for hours, d for days, m
* for months and y for years. You can combine several
* values together, for example string 1m12d means
* "one month and twelve days". You can also specify
* negative numbers, say 1m-10d stands for "one month
* minus ten days". Default value is 7d.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_Period( const char *option, char *line, int lino )
{
return config_time_val( option, line, lino,
( time_t * ) &csrv.m_period );
}
/*--------------------------------------------------------------------------*
* MaxHops number
* Sets the maximum hops ("mouse clicks") from URL
* specified by Server command, so documents that are
* "deeper" will not be indexed. Default value is
* 256.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_MaxHops( const char *option, char *line, int lino )
{
if ( config_int_arg( option, line, lino, &csrv.m_maxhops ) )
return 1;
if ( csrv.m_maxhops < 0 )
{
sprintf( conf_err_str, "Error: in config file at line %d: Invalid "
"argument to %s, minimum is 0\n", lino, option );
return 1;
}
return 0;
}
/*--------------------------------------------------------------------------*
* MaxDocsPerServer number
* Sets that no more than number of documents will be
* indexed from one site during one run of index(1).
* Default value is -1, which means no limits.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_MaxDocsPerServer( const char *option, char *line, int lino )
{
if ( config_int_arg( option, line, lino, &csrv.m_server_maxdocs ) )
return 1;
if ( csrv.m_server_maxdocs < -1 )
csrv.m_server_maxdocs = -1;
return 0;
}
/*--------------------------------------------------------------------------*
* IncrementHopsOnRedirect yes | no
* Sets whether index(1) should increment hops value
* when HTTP redirect is encountered. Applies only to
* redirects generated by "Location:" HTTP headers.
* Setting this option to no allows a greater number
* of documents to be indexed for sites that redirect
* frequently (e.g. for cookie testing, typically on
* each page). Default value is yes.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_IncrementHopsOnRedirect( const char *option, char *line,
int lino )
{
return config_boolean( option, line, lino,
&csrv.m_increment_redir_hops );
}
/*--------------------------------------------------------------------------*
* RedirectLoopLimit number
* Allow no more than number of contiguous redirects.
* This option is especially useful if you set Incre-
* mentHopsOnRedirect to no, because index(1) can fall
* in an endless redirect loop. Limiting the number of
* redirects prevents index from such redirect loops.
* Default value is 8.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_RedirectLoopLimit( const char *option, char *line, int lino )
{
if ( config_int_arg( option, line, lino, &csrv.m_redir_loop_limit ) )
return 1;
if ( csrv.m_redir_loop_limit < 0 )
{
sprintf( conf_err_str, "Error: in config file at line %d: Invalid "
"argument to %s, minimum is 0\n", lino, option );
return 1;
}
return 0;
}
/*--------------------------------------------------------------------------*
* MinDelay time
* Sets minimum time between finishing of access to
* server and beginning of next access to the server.
* This is useful if site owner blames you for "bomb-
* ing" his site with your index(1) queries. Argument
* time can be set in seconds, or in the same way as
* described in Period command above. Default value is
* 0.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_MinDelay( const char *option, char *line, int lino )
{
return config_time_val( option, line, lino,
( time_t * ) &csrv.m_minDelay );
}
/*--------------------------------------------------------------------------*
* undocumented keyword, doesn't seem to be used
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_IspellCorrectFactor( const char *option, char *line,
int lino )
{
return config_int_arg( option, line, lino, &csrv.m_correct_factor );
}
/*--------------------------------------------------------------------------*
* undocumented keyword, doesn't seem to be used
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_IspellIncorrectFactor( const char *option, char *line,
int lino )
{
return config_int_arg( option, line, lino, &csrv.m_incorrect_factor );
}
/*--------------------------------------------------------------------------*
* undocumented keyword, doesn't seem to be used
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_NumberFactor( const char *option, char *line, int lino )
{
return config_int_arg( option, line, lino, &csrv.m_number_factor );
}
/*--------------------------------------------------------------------------*
* undocumented keyword, doesn't seem to be used
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_AlnumFactor( const char *option, char *line, int lino )
{
return config_int_arg( option, line, lino, &csrv.m_alnum_factor );
}
/*--------------------------------------------------------------------------*
* MinWordLength number
* Sets the minimum length of word to be stored in the
* database, so words shorter than number are not
* stored. Default value is 1.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_MinWordLength( const char *option, char *line, int lino )
{
if ( config_int_arg( option, line, lino, &csrv.m_min_word_length ) )
return 1;
if ( csrv.m_min_word_length < 1 )
{
sprintf( conf_err_str, "Error: in config file at line %d: Invalid "
"argument to %s, minimum is 1\n", lino, option );
return 1;
}
return 0;
}
/*--------------------------------------------------------------------------*
* MaxWordLength number
* Sets the maximum length of word to be stored in the
* database, so words longer than number are not
* stored. Default value is 32. Note that you can't
* set the value higher than 32.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_MaxWordLength( const char *option, char *line, int lino )
{
if ( config_int_arg( option, line, lino, &csrv.m_max_word_length ) )
return 1;
if ( csrv.m_max_word_length > 32 )
{
sprintf( conf_err_str, "Error: in config file at line %d: Invalid "
"argument to %s, maximum is 32\n", lino, option );
return 1;
}
return 0;
}
/*--------------------------------------------------------------------------*
* MaxDocSize bytes
* Sets the maximum document size in bytes, so if doc-
* ument size is bigger than bytes, only the first
* bytes of the document will be processed. Default
* value is 1048576 bytes (1Mb).
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_MaxDocSize( const char *option, char *line, int lino )
{
if ( config_int_arg( option, line, lino, &_max_doc_size ) )
return 1;
if ( _max_doc_size < 1 )
{
sprintf( conf_err_str, "Error: in config file at line %d: Invalid "
"argument to %s\n", lino, option );
return 1;
}
return 0;
}
/*--------------------------------------------------------------------------*
* MaxDocsAtOnce number
* Sets the maximum number of pages to be downloaded
* from the same host before switching to the next
* host. Large values are believed to increase index-
* ing performance when number of indexed sites is
* large. Default value is 1.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_MaxDocsAtOnce( const char *option, char *line, int lino )
{
if ( config_int_arg( option, line, lino, &csrv.m_maxDocs ) )
return 1;
if ( csrv.m_maxDocs < 1 )
{
sprintf( conf_err_str, "Error: in config file at line %d: Invalid "
"argument to %s\n", lino, option );
return 1;
}
return 0;
}
/*--------------------------------------------------------------------------*
* undocumented keyword
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_NoIndex( const char *option, char *line, int lino )
{
if ( * ( line + strlen( option ) ) != '\0' )
{
sprintf( conf_err_str, "Error: in config file at line %d: Option "
"%s does not allow arguments\n", lino, option );
return 1;
}
csrv.m_gindex = 0;
return 0;
}
/*--------------------------------------------------------------------------*
* undocumented keyword
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_NoFollow( const char *option, char *line, int lino )
{
if ( * ( line + strlen( option ) ) != '\0' )
{
sprintf( conf_err_str, "Error: in config file at line %d: Option "
"%s does not allow arguments\n", lino, option );
return 1;
}
csrv.m_gfollow = 0;
return 0;
}
/*--------------------------------------------------------------------------*
* undocumented keyword, seems to be used
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_OnlineGeo( const char *option, char *line, int lino )
{
return config_boolean( option, line, lino, &OnlineGeo );
}
/*--------------------------------------------------------------------------*
* IncrementalCitations yes | no
* Sets whether to build citation index, ranks of
* pages and lastmod incrementally. If value of this
* parameter is set to yes, then calculating of cita-
* tions, ranks of pages and lastmod file will require
* less memory and take less time on large databases.
* So it is very handy if you want to index large num-
* ber of URLs and have relatively small amount of
* memory. Default is yes.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_IncrementalCitations( const char *option, char *line,
int lino )
{
return config_boolean( option, line, lino, &IncrementalCitations );
}
/*--------------------------------------------------------------------------*
* CompactStorage yes | no
* Sets the storage mode of reverse index. In compact
* storage mode, file/BLOB is not created for each
* word. Instead, information about all words is
* stored in 300 files. In this mode, updating of
* reverse index is generally much faster and requires
* a bit less memory than in the old mode. Default is
* yes.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_CompactStorage( const char *option, char *line, int lino )
{
return config_boolean( option, line, lino, &CompactStorage );
}
/*--------------------------------------------------------------------------*
* HiByteFirst yes | no
* Sets the byte ordering used in field wor-
* durl[1].word (only in Unicode version). Default is
* no.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_HiByteFirst( const char *option, char *line, int lino )
{
#ifdef UNICODE
return config_boolean( option, line, lino, &HiLo );
#else
sprintf( conf_err_str, "Warning: in config file at line %d: Invalid "
"option %s, Unicode support not compiled into aspseek\n",
lino, option );
return 0;
#endif
}
/*--------------------------------------------------------------------------*
* UtfStorage yes | no
* This parameter makes sense only in Unicode version
* and only for MySQL back-end. In UTF8 storage mode
* fields wordurl[1].word are stored in UTF8 encoding.
* This mode can reduce sizes of data and index files
* for wordurl table. To convert existing Unicode
* database to this mode, run index -b. Default value
* is no.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_UtfStorage( const char *option, char *line, int lino )
{
#ifdef UNICODE
return config_boolean( option, line, lino, &UtfStorage );
#else
sprintf( conf_err_str, "Warning: in config file at line %d: Invalid "
"option %s, Unicode support not compiled into aspseek\n",
lino, option );
return 0;
#endif
}
/*--------------------------------------------------------------------------*
* Include file
* Includes the contents of file at this point, so you
* can specify some parameters in that included file.
* File name is relative to ASPseek etc directory
* (@sysconfdir@).
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_Include( const char *option, char *line, int lino )
{
if ( local_config_level >= 5 )
{
sprintf( conf_err_str, "Error: in config file at line %d: Too deeply "
"nested Includes\n", lino );
return 1;
}
if ( ( line = config_charp_arg( option, line, lino ) ) == 0 )
return 1;
if ( LoadConfig( line, local_config_level + 1, local_load_flags ) )
return 1;
return 0;
}
/*--------------------------------------------------------------------------*
* Countries file
* Loads countries IP information from file. File con-
* sists of lines in the form "sss.sss.sss.sss -
* eee.eee.eee.eee cc", where sss.sss.sss.sss is
* starting IP address, eee.eee.eee.eee is ending IP
* address, and cc is a country code (like ru, de,
* etc.). Note that value of ending address should be
* more than starting address.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_Countries( const char *option, char *line, int lino )
{
if ( ( line = config_charp_arg( option, line, lino ) ) == 0 )
return 1;
string file;
if ( isAbsolutePath( line ) )
file = line;
else
file = ConfDir + line;
ImportCountries( file.c_str( ) );
return 0;
}
/*--------------------------------------------------------------------------*
* undocumented keyword, doesn't seem to be used
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_MirrorRoot( const char *option, char *line, int lino )
{
sprintf( conf_err_str, "Warning: in config file at line %d: Option %s "
"not yet implemented\n", lino, option );
return 0;
#if 0
line += strlen( option );
while ( *line && isspace( *line ) )
line++;
if ( *line )
{
if ( isAbsolutePath( line ) )
MirrorRoot = line;
else
MirrorRoot = ConfDir + line;
}
else
MirrorRoot = ConfDir + "/mirrors";
csrv.m_use_mirror = csrv.m_use_mirror > 0 ? csrv.m_use_mirror : 0;
return 0;
#endif
}
/*--------------------------------------------------------------------------*
* undocumented keyword, doesn't seem to be used
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_MirrorHeadersRoot( const char *option, char *line, int lino )
{
sprintf( conf_err_str, "Warning: in config file at line %d: Option %s "
"not yet implemented\n", lino, option );
return 0;
#if 0
line += strlen( option );
while ( *line && isspace( *line ) )
line++;
if ( *line )
{
if ( isAbsolutePath( line ) )
MirrorHeadersRoot = line;
else
MirrorHeadersRoot = ConfDir + line;
}
else
MirrorHeadersRoot = ConfDir + "/headers";
csrv.m_use_mirror = csrv.m_use_mirror > 0 ? csrv.m_use_mirror : 0;
return 0;
#endif
}
/*--------------------------------------------------------------------------*
* undocumented keyword, doesn't seem to be used
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_MirrorPeriod( const char *option, char *line, int lino )
{
sprintf( conf_err_str, "Warning: in config file at line %d: Option %s "
"not yet implemented\n", lino, option );
return 0;
}
/*--------------------------------------------------------------------------*
* Replace [regexp [replacement]]
* This parameter allows replacing a URL matching
* regexp with replacement, or with an empty string if
* replacement is not specified. This is useful for sites
* with dynamic contents where the same information
* can be obtained by many different URLs. Replace
* without arguments disables any replacements for
* subsequent Server commands.
*
* As in sed(1) command s, the replacement can contain
* \N (N being a number from 1 to 9, inclusive) refer-
* ences, which refer to the portion of the match
* which is contained between Nth '\(' and its match-
* ing '\)'. To include a literal '\', precede it
* with another '\'.
* Function returns 0 on success and 1 on error
*--------------------------------------------------------------------------*/
static int config_Replace( const char *option, char *line, int lino )
{
csrv.AddReplacement( line + strlen( option ) );
return 0;
}
/*-------------------------------------------------------------------------*
* Function for dealing with boolean arguments - they must be either "yes"
* or "no" (case insensitive). Returns 0 on success and 1 on error.
*-------------------------------------------------------------------------*/
static int config_boolean( const char *option, char *line, int lino,
int *param )
{
for ( line += strlen( option ); *line && isspace( *line ); line++ )
/* empty */ ;
if ( ! *line ||
( STRNCASECMP( line, "yes" ) && STRNCASECMP( line, "no" ) ) )
{
sprintf( conf_err_str, "Error: in config file at line %d: Invalid "
"or missing argument for %s\n", lino, option );
return 1;
}
*param = ! STRNCASECMP( line, "yes" );
return 0;
}
/*--------------------------------------------------------------------------*
* Function for dealing with time arguments - they must be in a form
* recognized by tstr2time_t(). Returns 0 on success and 1 on error.
*--------------------------------------------------------------------------*/
static int config_time_val( const char *option, char *line, int lino,
time_t *param )
{
if ( ( *param = tstr2time_t( line + strlen( option ) ) ) == BAD_DATE )
{
sprintf( conf_err_str, "Error: in config file at line %d: Invalid "
"argument for %s\n", lino, option );
return 1;
}
return 0;
}
/*--------------------------------------------------------------------------*
* Function for dealing with the argument for filter settings - it
* must be a reasonable regexp. Returns 0 on success and 1 on error.
*--------------------------------------------------------------------------*/
static int config_Filter( const char *option, char *line, int lino,
int filter_type, int reverse )
{
char *lt;
line += strlen( option );
if ( ! ( line = GetToken( line, " \t", &lt ) ) )
{
sprintf( conf_err_str, "Error: in config file at line %d: Missing "
"argument(s) for %s\n", lino, option );
return 1;
}
do
{
if ( AddFilter( line, filter_type, reverse ) )
return 1;
} while ( ( line = GetToken( 0, " \t", &lt ) ) );
return 0;
}
/*--------------------------------------------------------------------------*
* Function for dealing with string arguments, returns a pointer to the
* start of the argument on success and 0 (a null pointer) on error.
*--------------------------------------------------------------------------*/
static char *config_charp_arg( const char *option, char *line, int lino )
{
line += strlen( option );
while ( *line && isspace( *line ) )
line++;
if ( ! *line )
{
sprintf( conf_err_str, "Error: in config file at line %d: Missing "
"argument for %s\n", lino, option );
return 0;
}
return line;
}
/*--------------------------------------------------------------------------*
* Function for dealing with integer arguments, returns 0 on success and
* 1 on error.
*--------------------------------------------------------------------------*/
static int config_int_arg( const char *option, char *line, int lino,
int *param )
{
char *ep;
long val;
line += strlen( option );
errno = 0; // strtol() only sets errno on failure, so clear it first
val = strtol( line, &ep, 10 );
// Check if we got a number
if ( line == ep )
{
sprintf( conf_err_str, "Error: in config file at line %d: Argument is "
"not a number for %s\n", lino, option );
return 1;
}
// Check if the value can be stored in an integer
if ( errno == ERANGE || val > INT_MAX || val < INT_MIN )
{
sprintf( conf_err_str, "Error: in config file at line %d: Argument is "
"too large for %s\n", lino, option );
return 1;
}
// Check if there was junk after the number
if ( ep && *ep != 0 )
{
sprintf( conf_err_str, "Error: in config file at line %d: Junk after "
"argument for %s\n", lino, option );
return 1;
}
*param = ( int ) val;
return 0;
}
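
For anyone who wants to try the integer check outside of aspseek, here is a
minimal stand-alone sketch of the same idea as config_int_arg() above. The
name parse_int_arg() is made up for this illustration and is not part of
config.cpp; the error messages and the option-name offset are left out, and
the sketch assumes the caller passes just the argument text. It shows the
three checks in order: was there a number at all, does it fit into an int,
and is there anything left over after it.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper for illustration only (not part of config.cpp).
// Parses a decimal integer from 'text' and stores it in '*out'.
// Returns 0 on success and 1 on error, like the config_* functions.
static int parse_int_arg( const char *text, int *out )
{
    char *ep;
    long val;

    errno = 0;                     // strtol() only sets errno on failure
    val = strtol( text, &ep, 10 );

    // Check if we got a number at all
    if ( ep == text )
        return 1;

    // Check if the value can be stored in an int
    if ( errno == ERANGE || val > INT_MAX || val < INT_MIN )
        return 1;

    // Check if there was junk after the number
    if ( *ep != '\0' )
        return 1;

    *out = ( int ) val;
    return 0;
}
```

Leading white space is accepted because strtol() skips it, but trailing
white space counts as junk, so a caller would have to strip it beforehand
if that is not wanted.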