[aseek-devel] Refactoring of config.cpp

From: Jens Thoms Toerring (no email)
Date: Mon Sep 15 2003 - 13:00:53 EDT


Hi,

    Sorry for the empty mail, here's the real one...

Since I had a bit of a problem understanding some finer points of
the documentation for aspseek.conf I tried to read the sources.
Unfortunately, the way config.cpp was written made it less than
easy to understand what was going on. So I started to refactor
config.cpp. The result you can find at the end of this email
(sorry it's not a diff but the new file itself, a diff would have
been even longer because nearly all of both the old and the new
file would have ended up in it and the new version alone is already
long enough...).

The most important change is that I took apart the huge if-else
construct where the different keywords are tested for. Instead I
put all keywords into a table, together with pointers to handler
functions. Hopefully, that makes it more readable (and easier to
maintain in case new keywords need to be added or removed).
Unfortunately I had to use two additional global variables (but
with scope restricted to config.cpp), but I hope you don't object
too much.
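
To illustrate the table-driven approach described above, here is a
minimal, self-contained sketch. It is not the code from config.cpp:
the names demo_MaxHops, demo_Tag, demo_table and demo_dispatch are
made up for this example, the handlers take only two arguments, and
the prefix comparison via strncasecmp() (POSIX) is an assumption about
how a keyword is matched against the start of a line.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>   // strncasecmp() (POSIX)

// Hypothetical settings filled in by the handlers (names are made up)
static int  demo_max_hops = 0;
static char demo_tag[ 64 ] = "";

// Skip the keyword and the white-space after it, return the argument part
static const char *demo_skip_keyword( const char *option, const char *line )
{
    line += strlen( option );
    while ( *line == ' ' || *line == '\t' )
        line++;
    return line;
}

// Handlers return 0 on success and 1 on error
static int demo_MaxHops( const char *option, const char *line )
{
    const char *arg = demo_skip_keyword( option, line );
    if ( *arg == '\0' )
        return 1;
    demo_max_hops = atoi( arg );
    return 0;
}

static int demo_Tag( const char *option, const char *line )
{
    const char *arg = demo_skip_keyword( option, line );
    if ( *arg == '\0' || strlen( arg ) >= sizeof demo_tag )
        return 1;
    strcpy( demo_tag, arg );
    return 0;
}

// One table entry per keyword: its name and the function handling it
static const struct demo_entry {
    const char *option;
    int ( * fnct )( const char *, const char * );
} demo_table[ ] = {
    { "MaxHops", demo_MaxHops },
    { "Tag",     demo_Tag     }
};

// Returns 0 on success, 1 if the handler failed, -1 for an unknown keyword
static int demo_dispatch( const char *line )
{
    for ( size_t i = 0; i < sizeof demo_table / sizeof demo_table[ 0 ]; i++ )
    {
        const char *opt = demo_table[ i ].option;
        if ( ! strncasecmp( line, opt, strlen( opt ) ) )
            return demo_table[ i ].fnct( opt, line );
    }
    return -1;
}

// Small usage example
void demo_dispatch_example( )
{
    assert( demo_dispatch( "MaxHops 5" ) == 0 && demo_max_hops == 5 );
    assert( demo_dispatch( "tag news" ) == 0 && ! strcmp( demo_tag, "news" ) );
    assert( demo_dispatch( "Bogus 1" ) == -1 );
}
```

Adding a keyword then only means writing one handler and adding one
line to the table. With a prefix comparison like the one assumed here,
a keyword that is a prefix of another one (e.g. 'Allow' and
'AllowNoMatch') has to come after the longer one in the table.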

Of course there's now a huge set of functions, one for each keyword,
plus a few additional helper functions. In the process of setting
them up I also found some inconsistencies and several potential
buffer overruns, which I hopefully managed to get rid of.

I also started to make the syntax check for the configuration file
much more picky - until now wrong arguments often were simply
discarded and the default taken instead. I don't think that this
was a very good idea, because it's against the principle of least
surprise: when a user accidentally mistypes an argument he/she
should be told so instead of having the program silently discard
the user input and work in an unexpected way. So now the arguments
to most keywords are checked (as far as it was possible without
changing code in other files) and on errors a message is printed
and parsing of the configuration file is abandoned.
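
As an illustration of what such a stricter check can look like, here
is a small sketch for an integer argument. The function name
demo_strict_int, the buffer demo_err and the wording of the message
are assumptions made for this example only; the real helpers in
config.cpp may well work differently.

```cpp
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical error buffer, standing in for something like conf_err_str
static char demo_err[ 256 ] = "";

// Parses 'arg' as a decimal integer. Returns 0 and stores the value in
// *param on success; returns 1, leaves *param alone and writes a message
// to demo_err otherwise.
static int demo_strict_int( const char *option, const char *arg, int lino,
                            int *param )
{
    char *end;
    errno = 0;
    long val = strtol( arg, &end, 10 );
    // Reject empty input, trailing garbage and out-of-range values
    if ( end == arg || *end != '\0' || errno == ERANGE
         || val < INT_MIN || val > INT_MAX )
    {
        snprintf( demo_err, sizeof demo_err, "Error: in config file at line "
                  "%d: Invalid argument for %s: '%s'", lino, option, arg );
        return 1;
    }
    *param = ( int ) val;
    return 0;
}

// Small usage example
void demo_strict_int_example( )
{
    int hops = 3;                            // pretend default value
    assert( demo_strict_int( "MaxHops", "12", 7, &hops ) == 0 && hops == 12 );
    // A mistyped argument is reported instead of silently being ignored
    assert( demo_strict_int( "MaxHops", "12x", 8, &hops ) == 1 && hops == 12 );
    assert( strstr( demo_err, "line 8" ) != 0 );
}
```

The point is that "12x" is neither accepted as 12 nor replaced by the
default without a word: the caller gets an error and can stop parsing.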

Another point is that there are now some additional checks to be
able to warn the user when he/she uses keywords that make no sense
with or without Unicode support.

All uses of alloca() are thrown out and replaced by new/delete calls
- I am currently in the process of trying to get aspseek to run on
IRIX and there's no alloca() function. More about this and the
required changes in a different mail...
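
For readers who haven't run into this: a sketch of how a temporary
buffer that might otherwise come from alloca() can be obtained in a
portable way. Both functions and their names are invented for this
example and are not taken from config.cpp; the second variant with
std::vector is an alternative to plain new/delete, not something the
new file necessarily uses.

```cpp
#include <assert.h>
#include <ctype.h>
#include <string.h>
#include <string>
#include <vector>

// Variant 1: new[]/delete[]; the buffer has to be freed on every return path
std::string demo_upper_new( const char *src )
{
    size_t len = strlen( src );
    char *buf = new char[ len + 1 ];
    for ( size_t i = 0; i <= len; i++ )          // also copies the '\0'
        buf[ i ] = ( char ) toupper( ( unsigned char ) src[ i ] );
    std::string result( buf );
    delete [ ] buf;
    return result;
}

// Variant 2: std::vector<char> releases its memory automatically
std::string demo_upper_vector( const char *src )
{
    std::vector<char> buf( src, src + strlen( src ) + 1 );
    for ( size_t i = 0; i + 1 < buf.size( ); i++ )
        buf[ i ] = ( char ) toupper( ( unsigned char ) buf[ i ] );
    return std::string( &buf[ 0 ] );
}

// Small usage example
void demo_upper_example( )
{
    assert( demo_upper_new( "aspseek" ) == "ASPSEEK" );
    assert( demo_upper_vector( "aspseek" ) == "ASPSEEK" );
}
```

Unlike alloca(), neither variant depends on the platform and neither
can overflow the stack when the input is unexpectedly long.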

Finally, while comparing the code with the man page for aspseek.conf
I found some inconsistencies:

1. For DBAddr the man page says

   DBAddr DBType:[//[User[:Pass]@]Host[:Port]]/DBName

   while in config.cpp we're looking for DBUser, DBPass, DBHost and
   DBPort. Fortunately, all of them should be obsolete, but it would
   be better to either remove them completely from config.cpp or to
   change the man page.

2. In the description for 'Server' one reads:

      Add URL as an URL to start indexing from. You can
      specify many Server commands, and set the different
      options for different sites - see below. Note that
      if URL contains path, the whole site will be
      indexed nevertheless, so to limit indexing to some
      subdirectory of site use Disallow parameter
      described below.

   This could be interpreted as if 'Disallow' were a server-specific
   keyword, which it isn't. As far as I can see, excluding
   subdirectories applies to all servers, not just one. I think
   that's a major drawback and 'Allow', 'Disallow', etc. should be
   made server-specific keywords instead of global ones (as should
   several other keywords). Since I will probably need the ability
   to restrict the search to certain subdirectories for some of the
   servers I have to index, implementing this is on my todo-list.

3. The keywords

     AuthBasic
     Alias
     NoIndex (but meaning is obvious)
     NoFollow (but meaning is obvious)
     OnlineGeo

   are not documented.

4. The keywords

     HTDBList
     HTDBDoc
     IspellCorrectFactor
     IspellIncorrectFactor
     NumberFactor
     AlnumFactor
     MirrorRoot
     MirrorHeadersRoot
     MirrorPeriod

   are recognized by the program but are neither documented nor seem
   to do anything useful at the moment.

   Can someone tell me what they are supposed to do or if some or all
   of them are just left over from some older versions of aspseek and
   should be removed?
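
As an aside to item 1: the DBAddr syntax quoted from the man page can
be taken apart roughly as in the following sketch. This is only an
illustration of the documented syntax, not the parser from config.cpp;
the struct DemoDBAddr and the function demo_parse_dbaddr are made-up
names, and tolerating a trailing slash after DBName is an assumption
based on the form shown in the comment above config_DBAddr.

```cpp
#include <assert.h>
#include <stdlib.h>
#include <string>

// Hypothetical container for the parts of a DBAddr argument
struct DemoDBAddr {
    std::string type, user, pass, host, name;
    int port;
};

// Parses "DBType:[//[User[:Pass]@]Host[:Port]]/DBName".
// Returns 0 on success and 1 if 'addr' doesn't match that syntax.
int demo_parse_dbaddr( const std::string &addr, DemoDBAddr &out )
{
    out = DemoDBAddr( );
    out.port = 0;                                 // 0 means "use default"
    std::string::size_type colon = addr.find( ':' );
    if ( colon == std::string::npos || colon == 0 )
        return 1;
    out.type = addr.substr( 0, colon );
    std::string rest = addr.substr( colon + 1 );
    if ( rest.compare( 0, 2, "//" ) == 0 )        // optional host part
    {
        std::string::size_type slash = rest.find( '/', 2 );
        if ( slash == std::string::npos )
            return 1;
        std::string auth = rest.substr( 2, slash - 2 );
        rest = rest.substr( slash );
        std::string::size_type at = auth.rfind( '@' );
        if ( at != std::string::npos )            // User[:Pass]@
        {
            std::string cred = auth.substr( 0, at );
            auth = auth.substr( at + 1 );
            std::string::size_type c = cred.find( ':' );
            out.user = cred.substr( 0, c );
            if ( c != std::string::npos )
                out.pass = cred.substr( c + 1 );
        }
        std::string::size_type p = auth.find( ':' );
        out.host = auth.substr( 0, p );           // Host[:Port]
        if ( p != std::string::npos )
            out.port = atoi( auth.substr( p + 1 ).c_str( ) );
        if ( out.host.empty( ) )
            return 1;
    }
    if ( rest.empty( ) || rest[ 0 ] != '/' )
        return 1;
    out.name = rest.substr( 1 );
    // Assumption: a single trailing slash after DBName is tolerated
    if ( ! out.name.empty( ) && out.name[ out.name.size( ) - 1 ] == '/' )
        out.name.erase( out.name.size( ) - 1 );
    return out.name.empty( ) ? 1 : 0;
}

// Small usage example
void demo_dbaddr_example( )
{
    DemoDBAddr a;
    assert( demo_parse_dbaddr( "mysql://user:pw@localhost:3306/aspseek/",
                               a ) == 0 );
    assert( a.type == "mysql" && a.user == "user" && a.pass == "pw" );
    assert( a.host == "localhost" && a.port == 3306 && a.name == "aspseek" );
}
```

With everything in one DBAddr line there would be no need for the
separate DBUser, DBPass, DBHost and DBPort keywords.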

I hope you like the new version of config.cpp. Please excuse my
idiosyncrasies when it comes to indentation - I like to have lots
of white space both vertically and horizontally (perhaps it's a
sign of my age, my eyes aren't getting any better). I also can't
stand lines longer than 80 characters.

Unfortunately, I am not much of a C++-programmer, feeling much more
at home with C. Thus some of the code may look quite a bit awkward
to real C++-programmers, but there's not too much I can do about it
at the moment. I hope you can cope.

                                   Regards, Jens

-- 
 Freie Universitaet Berlin     Jens Thoms Toerring
 Universitaetsbibliothek
 Webteam                       Tel: 0049 30 838 56055
 Garystrasse 39                Fax: 0049 30 838 53738
 14195 Berlin                  e-mail: 
----------8<-------------------------------------------------------
/* Copyright (C) 2000, 2001, 2002 by SWsoft
 *
 *  This program is free software; you can redistribute it and/or modify
 *  it under the terms of the GNU General Public License as published by
 *  the Free Software Foundation; either version 2 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU General Public License for more details.
 *
 *  You should have received a copy of the GNU General Public License
 *  along with this program; if not, write to the Free Software
 *  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 */
/*  $Id: config.cpp,v 1.52 2002/10/08 08:18:12 kir Exp $
    Author : Alexander F. Avdonkin
    Uses parts of UdmSearch code
*/
#include "aspseek-cfg.h"
#include <stdio.h>
#include <errno.h>
#include <string>
#include <vector>
#include "config.h"
#include "parse.h"
#include "sqldb.h"
#include "defines.h"
#include "charsets.h"
#include "index.h"
#include "filters.h"
#include "paths.h"
#include "misc.h"
#include "logger.h"
#include "stopwords.h"
#include "datetime.h"
#include "geo.h"
#ifdef UNICODE
#include "ucharset.h"
#endif
#include "mimeconv.h"
static int config_trim_line( char *line, char *cur_line, char **start_of_line,
                             int lino );
static int config_boolean( const char *option, char *line, int lino,
                           int *param );
static int config_time_val( const char *option, char *line, int lino, 
                            time_t *param );
static int config_Filter( const char *option, char *line, int lino,
                          int filter_type, int reverse );
static char *config_charp_arg( const char *option, char *line, int lino );
static int config_int_arg( const char *option, char *line, int lino,
                           int *param );
static int config_DBHost( const char *option, char *line, int lino );
static int config_DBWordDir( const char *option, char *line, int lino );
static int config_DataDir( const char *option, char *line, int lino );
static int config_DBName( const char *option, char *line, int lino );
static int config_DBUser( const char *option, char *line, int lino );
static int config_DBPass( const char *option, char *line, int lino );
static int config_DBPort( const char *option, char *line, int lino );
static int config_DBAddr( const char *option, char *line, int lino );
static int config_DBLibDir( const char *option, char *line, int lino );
static int config_DebugLevel( const char *option, char *line, int lino );
static int config_LocalCharset( const char *option, char *line, int lino );
static int config_DBType( const char *option, char *line, int lino );
static int config_AllowCountries( const char *option, char *line, int lino );
static int config_DisallowNoMatch( const char *option, char *line, int lino );
static int config_Disallow( const char *option, char *line, int lino );
static int config_AllowNoMatch( const char *option, char *line, int lino );
static int config_Allow( const char *option, char *line, int lino );
static int config_CheckOnlyNoMatch( const char *option, char *line, int lino );
static int config_CheckOnly( const char *option, char *line, int lino );
static int config_AddType( const char *option, char *line, int lino );
static int config_StopwordFile( const char *option, char *line, int lino );
static int config_CharsetAlias( const char *option, char *line, int lino );
static int config_CharsetTableU1( const char *option, char *line, int lino );
static int config_CharsetTableU2( const char *option, char *line, int lino );
static int config_Dictionary2( const char *option, char *line, int lino );
static int config_CharsetTable( const char *option, char *line, int lino );
static int config_Charset( const char *option, char *line, int lino );
static int config_Proxy( const char *option, char *line, int lino );
static int config_HTTPHeader( const char *option, char *line, int lino );
static int config_AuthBasic( const char *option, char *line, int lino );
static int config_HTDBList( const char *option, char *line, int lino );
static int config_HTDBDoc( const char *option, char *line, int lino );
static int config_Converter( const char *option, char *line, int lino );
static int config_Alias( const char *option, char *line, int lino );
static int config_Server( const char *option, char *line, int lino );
static int config_MaxBandwidth( const char *option, char *line, int lino );
static int config_FollowOutside( const char *option, char *line, int lino );
static int config_Index( const char *option, char *line, int lino );
static int config_Follow( const char *option, char *line, int lino );
static int config_Robots( const char *option, char *line, int lino );
static int config_DeleteBad( const char *option, char *line, int lino );
static int config_DeleteNoServer( const char *option, char *line, int lino );
static int config_Clones( const char *option, char *line, int lino );
static int config_AddressExpiry( const char *option, char *line, int lino );
static int config_NextDocLimit( const char *option, char *line, int lino );
static int config_WordCacheSize( const char *option, char *line, int lino );
static int config_HrefCacheSize( const char *option, char *line, int lino );
static int config_DeltaBufferSize( const char *option, char *line, int lino );
static int config_UrlBufferSize( const char *option, char *line, int lino );
static int config_Tag( const char *option, char *line, int lino );
static int config_ReadTimeOut( const char *option, char *line, int lino );
static int config_Period( const char *option, char *line, int lino );
static int config_MaxHops( const char *option, char *line, int lino );
static int config_MaxDocsPerServer( const char *option, char *line, int lino );
static int config_IncrementHopsOnRedirect( const char *option,
                                           char *line, int lino );
static int config_RedirectLoopLimit( const char *option, char *line,
                                     int lino );
static int config_MinDelay( const char *option, char *line, int lino );
static int config_IspellCorrectFactor( const char *option, char *line,
                                       int lino );
static int config_IspellIncorrectFactor( const char *option, char *line,
                                         int lino );
static int config_NumberFactor( const char *option, char *line, int lino );
static int config_AlnumFactor( const char *option, char *line, int lino );
static int config_MinWordLength( const char *option, char *line, int lino );
static int config_MaxWordLength( const char *option, char *line, int lino );
static int config_MaxDocSize( const char *option, char *line, int lino );
static int config_MaxDocsAtOnce( const char *option, char *line, int lino );
static int config_NoIndex( const char *option, char *line, int lino );
static int config_NoFollow( const char *option, char *line, int lino );
static int config_OnlineGeo( const char *option, char *line, int lino );
static int config_IncrementalCitations( const char *option, char *line,
                                        int lino );
static int config_CompactStorage( const char *option, char *line, int lino );
static int config_HiByteFirst( const char *option, char *line, int lino );
static int config_UtfStorage( const char *option, char *line, int lino );
static int config_Include( const char *option, char *line, int lino );
static int config_Countries( const char *option, char *line, int lino );
static int config_MirrorRoot( const char *option, char *line, int lino );
static int config_MirrorHeadersRoot( const char *option, char *line,
                                     int lino );
static int config_MirrorPeriod( const char *option, char *line, int lino );
static int config_Replace( const char *option, char *line, int lino );
using std::string;
using std::vector;
char conf_err_str[ STRSIZ ] = "";
static char _user_agent[ STRSIZ ] = "";
static char _extra_headers[ STRSIZ ] = "";
int _max_doc_size = MAXDOCSIZE;
string MirrorRoot,
       MirrorHeadersRoot;
string DataDir( DATA_DIR );
string ConfDir( CONF_DIR );
vector<string> dblib_paths;
CBWSchedule bwSchedule;
CMimes Mimes;
ULONG MaxMem = 10000000;          // seems to be unused !
ULONG WordCacheSize = 50000;
ULONG HrefCacheSize = 10000;
int IncrementalCitations = 1;
#define BASE64_LEN( len )  ( 4 * ( ( ( len ) + 2 ) / 3 ) + 2 )
static char base64[ ] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
/* 
 * The following array of structures contains a list of all keywords that can
 * be used in the configuration file (usually aspseek.conf) and the address of
 * the function to be called when the keyword is found in the configuration
 * file. In order to create a new keyword just append a new element to the
 * array of structures and it will be included into the handling automatically.
 *
 * Note: The functions for handling keywords always have to return an integer,
 * which must be 0 on success and 1 on error (in which case the function should
 * print a message explaining the problem into the 'conf_err_str' char array).
 * The functions all receive three arguments, first the name of the keyword,
 * a pointer to a char array which contains the complete line with the keyword
 * and the arguments (with the pointer pointing to the first character of the
 * keyword and the line guaranteed to end with a non-white-space character) and
 * an integer with the number of the line in the configuration file.
 */
static struct cfg_fcnts {
    const char *option;
    int ( * fnct )( const char *, char *, int );
} config_Functions[ ] = {
    { "DBHost",                  config_DBHost                  },
    { "DBWordDir",               config_DBWordDir               },
    { "DataDir",                 config_DataDir                 },
    { "DBName",                  config_DBName                  },
    { "DBUser",                  config_DBUser                  },
    { "DBPass",                  config_DBPass                  },
    { "DBPort",                  config_DBPort                  },
    { "DBAddr",                  config_DBAddr                  },
    { "DBLibDir",                config_DBLibDir                },
    { "DebugLevel",              config_DebugLevel              },
    { "LocalCharset",            config_LocalCharset            },
    { "DBType",                  config_DBType                  },
    { "AllowCountries",          config_AllowCountries          },
    { "DisallowNoMatch",         config_DisallowNoMatch         },
    { "Disallow",                config_Disallow                },
    { "AllowNoMatch",            config_AllowNoMatch            },
    { "Allow",                   config_Allow                   },
    { "CheckOnlyNoMatch",        config_CheckOnlyNoMatch        },
    { "CheckOnly",               config_CheckOnly               },
    { "AddType",                 config_AddType                 },
    { "StopwordFile",            config_StopwordFile            },
    { "CharsetAlias",            config_CharsetAlias            },
    { "CharsetTableU1",          config_CharsetTableU1          },
    { "CharsetTableU2",          config_CharsetTableU2          },
    { "Dictionary2",             config_Dictionary2             },
    { "CharsetTable",            config_CharsetTable            },
    { "Charset",                 config_Charset                 },
    { "Proxy",                   config_Proxy                   },
    { "HTTPHeader",              config_HTTPHeader              },
    { "AuthBasic",               config_AuthBasic               },
    { "HTDBList",                config_HTDBList                },
    { "HTDBDoc",                 config_HTDBDoc                 },
    { "Converter",               config_Converter               },
    { "Alias",                   config_Alias                   },
    { "Server",                  config_Server                  },
    { "MaxBandwidth",            config_MaxBandwidth            },
    { "FollowOutside",           config_FollowOutside           },
    { "Index",                   config_Index                   },
    { "Follow",                  config_Follow                  },
    { "Robots",                  config_Robots                  },
    { "DeleteBad",               config_DeleteBad               },
    { "DeleteNoServer",          config_DeleteNoServer          },
    { "Clones",                  config_Clones                  },
    { "AddressExpiry",           config_AddressExpiry           },
    { "NextDocLimit",            config_NextDocLimit            },
    { "WordCacheSize",           config_WordCacheSize           },
    { "HrefCacheSize",           config_HrefCacheSize           },
    { "DeltaBufferSize",         config_DeltaBufferSize         },
    { "UrlBufferSize",           config_UrlBufferSize           },
    { "Tag",                     config_Tag                     },
    { "ReadTimeOut",             config_ReadTimeOut             },
    { "Period",                  config_Period                  },
    { "MaxHops",                 config_MaxHops                 },
    { "MaxDocsPerServer",        config_MaxDocsPerServer        },
    { "IncrementHopsOnRedirect", config_IncrementHopsOnRedirect },
    { "RedirectLoopLimit",       config_RedirectLoopLimit       },
    { "MinDelay",                config_MinDelay                },
    { "IspellCorrectFactor",     config_IspellCorrectFactor     },
    { "IspellIncorrectFactor",   config_IspellIncorrectFactor   },
    { "NumberFactor",            config_NumberFactor            },
    { "AlnumFactor",             config_AlnumFactor             },
    { "MinWordLength",           config_MinWordLength           },
    { "MaxWordLength",           config_MaxWordLength           },
    { "MaxDocSize",              config_MaxDocSize              },
    { "MaxDocsAtOnce",           config_MaxDocsAtOnce           },
    { "NoIndex",                 config_NoIndex                 },
    { "NoFollow",                config_NoFollow                },
    { "OnlineGeo",               config_OnlineGeo               },
    { "IncrementalCitations",    config_IncrementalCitations    },
    { "CompactStorage",          config_CompactStorage          },
    { "HiByteFirst",             config_HiByteFirst             },
    { "UtfStorage",              config_UtfStorage              },
    { "Include",                 config_Include                 },
    { "Countries",               config_Countries               },
    { "MirrorRoot",              config_MirrorRoot              },
    { "MirrorHeadersRoot",       config_MirrorHeadersRoot       },
    { "MirrorPeriod",            config_MirrorPeriod            },
    { "Replace",                 config_Replace                 }
};
static const size_t num_config_keywords = sizeof config_Functions /
                                          sizeof config_Functions[ 0 ];
/*--------------------------------------------------------------------------*
 *--------------------------------------------------------------------------*/
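static void base64_encode( const char *s, char *store, int length )
{
    int i;
    // Work on unsigned bytes so that characters >= 0x80 can't result in
    // negative indices into the base64 table
    const unsigned char *u = ( const unsigned char * ) s;
    unsigned char *p = ( unsigned char * ) store;
    for ( i = 0; i < length; u += 3, i += 3 )
    {
        // Treat bytes beyond the end of the input as 0 instead of reading
        // past the end of the buffer
        unsigned char b0 = u[ 0 ],
                      b1 = i + 1 < length ? u[ 1 ] : 0,
                      b2 = i + 2 < length ? u[ 2 ] : 0;
        *p++ = base64[ b0 >> 2 ];
        *p++ = base64[ ( ( b0 & 3 ) << 4 ) + ( b1 >> 4 ) ];
        *p++ = base64[ ( ( b1 & 0xf ) << 2 ) + ( b2 >> 6 ) ];
        *p++ = base64[ b2 & 0x3f ];
    }
    // Pad the result
    if ( i == length + 1 )
        *( p - 1 ) = '=';
    else if ( i == length + 2 )
        *( p - 1 ) = *( p - 2 ) = '=';
    *p = '\0';
}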
/*--------------------------------------------------------------------------*
 *--------------------------------------------------------------------------*/
char* UserAgent( )
{
    return _user_agent;
}
/*--------------------------------------------------------------------------*
 *--------------------------------------------------------------------------*/
char* ExtraHeaders( )
{
    return _extra_headers;
}
/*--------------------------------------------------------------------------*
 *--------------------------------------------------------------------------*/
int AddType( char *mime_type, char *reg, char *errstr )
{
    CMime* m = new CMime;
    m->SetType( reg, mime_type );
    if ( m->m_mime_type.size( ) > 0 )
    {
        Mimes.push_back( m );
        return 0;
    }
    else
    {
        delete m;
        return 1;
    }
}
/*--------------------------------------------------------------------------*
 *--------------------------------------------------------------------------*/
void CServer::AddReplacement( char* str )
{
    char find[ STRSIZ ] = "",
         replace[ STRSIZ ] = "";
    int n = sscanf( str, "%s%s", find, replace );
    if ( n > 0 && 
         ( strlen( find ) > MAX_URL_LEN || strlen( replace ) > MAX_URL_LEN ) )
    {
        sprintf( conf_err_str, "Error: in config file: URL is too long "
                 "for Replace\n" );
        m_replace = 0;
        return;
    }
    switch ( n )
    {
        case 0 : case -1 :
            m_replace = 0;
            break;
        case 1 : case 2 :
            CReplacement* repl = new CReplacement;
            if ( repl->SetFindReplace( find, replace ) )
            {
                delete repl;
            }
            else
            {
                if ( m_replace == 0 )
                    m_replace = new CReplaceVec;
                m_replace->push_back( repl );
            }
            break;
    }
}
/*--------------------------------------------------------------------*
 * Main function for parsing the configuration file
 *--------------------------------------------------------------------*/
CServer csrv;         // server that server-specific arguments are applied to
// These local variables (scope is restricted to this file) are required for
// a few handler functions that need additional arguments beside keyword,
// line and line number
static string localcharset;
static int local_load_flags;
static int local_config_level;
int LoadConfig( char *conf_name, int config_level, int load_flags )
{
    int line_number = 0;
    FILE *config;
    char line[ STRSIZ ] = "";
    char cur_line[ STRSIZ ];
    char *start;
    local_config_level = config_level;
    if ( config_level == 0 )   // Do some initialization
    {
        sprintf( _user_agent, "%s/%s", USER_AGENT, VERSION );
        _extra_headers[ 0 ] = 0;
        _max_doc_size = MAXDOCSIZE;
        DBPort = 0;
        SetDefaultCharset( CHARSET_USASCII );
        local_load_flags = load_flags;
    }
    string config_file_name;
    // check if the path is absolute
    if ( isAbsolutePath( conf_name ) )
        config_file_name = conf_name;
    else
        config_file_name = ConfDir + "/" + conf_name;
    // Open config
    if ( ! ( config = fopen( config_file_name.c_str( ), "r" ) ) )
    {
        snprintf( conf_err_str, sizeof conf_err_str,
                  "Error: can't open config file '%s': %s",
                  config_file_name.c_str( ), strerror( errno ) );
        local_config_level--;
        return 1;
    }
    logger.log( CAT_FILE, L_INFO, "Loading configuration from %s\n",
                config_file_name.c_str( ) );
    // Read lines and parse
    while ( fgets( cur_line, sizeof cur_line, config ) )
    {
        switch ( config_trim_line( line, cur_line, &start, ++line_number ) )
        {
            case -1 :
                fclose( config );
                local_config_level--;
                return 1;
            case 1 :                 // line ended with a backslash, was
                continue;            // empty or a comment
        }
        // Now that we have a full line evaluate it (we could get a bit faster
        // by sorting the array of keyword/function structures and then doing
        // a binary search, but since this would speed things up by no more
        // than a few milliseconds it's probably not worth the hassle)
        size_t i;
        for ( i = 0; i < num_config_keywords; i++ )
            if ( ! STRNCASECMP( start, config_Functions[ i ].option ) )
            {
                if ( config_Functions[ i ].fnct( config_Functions[ i ].option,
                                                 start, line_number ) )
                {
                    fclose( config );
                    local_config_level--;
                    return 1;
                }
                break;
            }
        if ( i == num_config_keywords )         // unknown option ?
        {
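            // 'start' can be nearly STRSIZ characters long, so limit what
            // gets written into conf_err_str
            snprintf( conf_err_str, sizeof conf_err_str, "Unknown keyword in "
                      "config file at line %d: %s\n", line_number, start );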
            fclose( config );
            local_config_level--;
            return 1;
        }
        *line = '\0';
    }
    fclose( config );
    // Test that we weren't in a continued line on end of file
    if ( *line )
    {
        sprintf( conf_err_str, "Error: in config file: Premature end of "
                 "file\n" );
        local_config_level--;
        return 1;
    }
#ifndef UNICODE
    if ( ! GetDefaultCharset( ) )
    {
        SetDefaultCharset( GetCharset( localcharset.c_str( ) ) );
        logger.log( CAT_ALL, L_DEBUG, "Set localcharset to [%s]\n",
                    localcharset.c_str( ) );
    }
#endif // UNICODE
    if ( DBWordDir.empty( ) )
        DBWordDir = DataDir + "/" + DBName;
    // On level0 : Free some variables, prepare others, etc
    if ( config_level == 0 )
    {
        // Add one virtual server if FollowOutside is enabled or
        // DeleteNoServer is set to "no"
        csrv.m_url = "";
        if ( csrv.m_outside || ! csrv.m_delete_no_server )
            AddServer( csrv );
        if ( UrlBufferSize == 0 )
            UrlBufferSize = DeltaBufferSize << 3;
#ifdef UNICODE
        FixLangs( );
#endif
        if ( *conf_err_str )
            logger.log( CAT_ALL, L_WARN, "Warnings loading config: %s\n",
                        conf_err_str );
    }
    local_config_level--;
    return 0;
}
/*------------------------------------------------------------------------*
 * Returns -1 on error, 1 if the line is either empty or is a comment line
 * or if it ends with a backslash, and 0 when we got a complete line ready
 * for parsing.
 *------------------------------------------------------------------------*/
static int config_trim_line( char *line, char *cur_line, char **start_of_line,
                             int lino )
{
    char *start, *end;
    int continued = 0;
    // Handlers always get the assembled line, not the buffer the last
    // physical line was read into (which is overwritten by the next read)
    *start_of_line = line;
    // Find first non-white-space character in current line
    for ( start = cur_line; *start && isspace( ( unsigned char ) *start );
          start++ )
        /* empty */ ;
    // Remove white-space from end of current line
    for ( end = start + strlen( start );
          end > start && isspace( ( unsigned char ) end[ -1 ] ); end-- )
        /* empty */ ;
    *end = '\0';
    // Return when line is empty or a comment line (but make sure we get
    // handling of continuation lines right)
    if ( *start == '\0' )
        return *line ? 0 : 1;
    if ( *start == '#' )
    {
        if ( *line != '\0' )
        {
            sprintf( conf_err_str, "Error: in config file at line %d: "
                     "Comment within continued line\n", lino );
            return -1;
        }
        return 1;
    }
    // Test if the current line ends in a backslash, in which case we need
    // to read the next line. Remove the backslash and all white-space
    // before it so that neither ends up in the assembled line.
    if ( end[ -1 ] == '\\' )
    {
        continued = 1;
        for ( --end; end > start && isspace( ( unsigned char ) end[ -1 ] );
              end-- )
            /* empty */ ;
        *end = '\0';
    }
    // Make sure we have enough space left before we try to append the
    // current line to the whole line
    if ( strlen( line ) + strlen( start ) + 3 > STRSIZ )
    {
        sprintf( conf_err_str, "Error: in config file at line %d: Line is "
                 "too long\n", lino );
        return -1;
    }
    // Append the current line to the whole line (make sure to also insert
    // a space when the current line is a continuation)
    if ( *start )
    {
        if ( *line )
            strcat( line, " " );
        strcat( line, start );
    }
    return continued;
}
/*--------------------------------------------------------------------------*
 * obsolete keyword
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_DBHost( const char *option, char *line, int lino )
{
    if ( ( line = config_charp_arg( option, line, lino ) ) == 0 )
        return 1;
    DBHost = line;
    return 0;
}
/*--------------------------------------------------------------------------*
 * obsolete keyword
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_DBWordDir( const char *option, char *line, int lino )
{
    if ( ( line = config_charp_arg( option, line, lino ) ) == 0 )
        return 1;
    DBWordDir = line;
    return 0;
}
/*--------------------------------------------------------------------------*
 * DataDir  /some/dir
 *        Sets directory in which delta files and files with
 *        information about words, subsets, spaces will be
 *        stored. Default is @localstatedir@.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_DataDir( const char *option, char *line, int lino )
{
    if ( ( line = config_charp_arg( option, line, lino ) ) == 0 )
        return 1;
    DataDir = line;
    return 0;
}
/*--------------------------------------------------------------------------*
 * obsolete keyword
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_DBName( const char *option, char *line, int lino )
{
    if ( ( line = config_charp_arg( option, line, lino ) ) == 0 )
        return 1;
    DBName = line;
    return 0;
}
/*--------------------------------------------------------------------------*
 * obsolete keyword
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_DBUser( const char *option, char *line, int lino )
{
    if ( ( line = config_charp_arg( option, line, lino ) ) == 0 )
        return 1;
    DBUser = line;
    return 0;
}
/*--------------------------------------------------------------------------*
 * obsolete keyword
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_DBPass( const char *option, char *line, int lino )
{
    if ( ( line = config_charp_arg( option, line, lino ) ) == 0 )
        return 1;
    DBPass = line;
    return 0;
}
/*--------------------------------------------------------------------------*
 * obsolete keyword
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_DBPort( const char *option, char *line, int lino )
{
    return config_int_arg( option, line, lino, &DBPort );
}
/*--------------------------------------------------------------------------*
 * DBAddr DBType:[//[User[:Pass]@]Host[:Port]]/DBName/
 *        Defines SQL server connection parameters.
 *        DBType  is SQL server type, it can be mysql or ora-
 *        cle8 for now.
 *        User is a SQL server's user to connect as.
 *        Pass is a User's password. If this field  is  omit-
 *        ted, no password is used.
 *        Host  is  a host name or IP address of host to con-
 *        nect to. If you are running SQL server on the  same
 *        machine, use localhost.
 *        Port  is  a port number on which SQL server is lis-
 *        tening at.  Default is the same as default port  of
 *        used SQL server.
 *        DBName is a name of the database used.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_DBAddr( const char *option, char *line, int lino )
{
    if ( ( line = config_charp_arg( option, line, lino ) ) == 0 )
        return 1;
    ParseDBAddr( line );
    return 0;
}
/*--------------------------------------------------------------------------*
 * DBLibDir /some/dir
 *        Adds /some/dir to list of directories to search for
 *        database  backend  library  (libdbname-version.so).
 *        Default  library  search  path is @libdir@.  Several
 *        such options can be  used,  each  adding  one  more
 *        directory to the list. Last added directory is used
 *        first; compiled in path is last.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_DBLibDir( const char *option, char *line, int lino )
{
    if ( ( line = config_charp_arg( option, line, lino ) ) == 0 )
        return 1;
    dblib_paths.push_back( string( line ) );
    return 0;
}
/*--------------------------------------------------------------------------*
 * DebugLevel none | error | warning | info | debug
 *        Sets the level of debugging. If set to none,  noth-
 *        ing will be logged. If set to debug, you will get a
 *        bunch of messages. Default value is info.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_DebugLevel( const char *option, char *line, int lino )
{
    if ( ( line = config_charp_arg( option, line, lino ) ) == 0 )
        return 1;
    logger.setloglevel( line );
    if ( logger.getLevel( ) == L_NONE && STRNCASECMP( line, "none" ) )
    {
        sprintf( conf_err_str, "Error: in config file at line %d: Invalid "
                 "argument for %s\n", lino, option );
        return 1;
    }
    return 0;
}
/*--------------------------------------------------------------------------*
 * LocalCharset charset
 *        Sets  the local charset for ASPseek, so all data in
 *        the database is assumed to be in that charset.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_LocalCharset( const char *option, char *line, int lino )
{
    if ( ( line = config_charp_arg( option, line, lino ) ) == 0 )
        return 1;
    localcharset = line;
    return 0;
}
/*--------------------------------------------------------------------------*
 * obsolete keyword
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_DBType( const char *option, char *line, int lino )
{
    if ( ( line = config_charp_arg( option, line, lino ) ) == 0 )
        return 1;
    DBType = line;
    return 0;
}
/*--------------------------------------------------------------------------*
 * AllowCountries cc1 [cc2...]
 *        Specifies to index only sites from countries speci-
 *        fied by cc1, cc2, etc. Should be used together with
 *        the Countries.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_AllowCountries( const char *option, char *line, int lino )
{
    // Skip the keyword itself so that only the country codes get passed on
    AddCountries( line + strlen( option ) );
    return 0;
}
/*--------------------------------------------------------------------------*
 * DisallowNoMatch regexp [regexp...]
 *        Disallows indexing of URLs not matching regexp.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_DisallowNoMatch( const char *option, char *line, int lino )
{
    return config_Filter( option, line, lino, DISALLOW, 1 );
}
/*--------------------------------------------------------------------------*
 * Disallow regexp [regexp...]
 *        Disallows indexing of URLs matching regexp.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_Disallow( const char *option, char *line, int lino )
{
    return config_Filter( option, line, lino, DISALLOW, 0 );
}
/*--------------------------------------------------------------------------*
 * AllowNoMatch regexp [regexp...]
 *        Allows indexing of URLs not matching regexp.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_AllowNoMatch( const char *option, char *line, int lino )
{
    return config_Filter( option, line, lino, ALLOW, 1 );
}
/*--------------------------------------------------------------------------*
 * Allow regexp [regexp...]
 *        Allows indexing of URLs matching regexp.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_Allow( const char *option, char *line, int lino )
{
    return config_Filter( option, line, lino, ALLOW, 0 );
}
/*--------------------------------------------------------------------------*
 * CheckOnlyNoMatch regexp [regexp...]
 *        Use HEAD request instead of GET for URLs not match-
 *        ing  regexp.  So, such URLs will not be downloaded,
 *        just information about them will be stored in  url-
 *        word table.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_CheckOnlyNoMatch( const char *option, char *line, int lino )
{
    return config_Filter( option, line, lino, HEAD, 1 );
}
/*--------------------------------------------------------------------------*
 * CheckOnly regexp [regexp...]
 *        Use  HEAD  request instead of GET for URLs matching
 *        regexp.  So, such URLs will not be downloaded, just
 *        information  about  them  will be stored in urlword
 *        table.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_CheckOnly( const char *option, char *line, int lino )
{
    return config_Filter( option, line, lino, HEAD, 0 );
}
/*--------------------------------------------------------------------------*
 * undocumented keyword
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_AddType( const char *option, char *line, int lino )
{
    char *s,
         *s1,
         *lt;
    if ( ( s1 = GetToken( line + strlen( option ), " \t", &lt ) ) )
    {
        // The braces make sure the 'else' below pairs with this 'if' and
        // not with the inner 'if ( AddType( ... ) )'
        while ( ( s = GetToken( 0, " \t", &lt ) ) )
            if ( AddType( s1, s, conf_err_str ) )
                sprintf( conf_err_str, "Problem in config file at line %d: "
                         "Can't add %s for MIME type %s\n", lino, s, s1 );
    }
    else
    {
        sprintf( conf_err_str, "Error: in config file at line %d: Missing "
                 "argument(s) for %s\n", lino, option );
        return 1;
    }
    return 0;
}
/*--------------------------------------------------------------------------*
 * StopwordFile lang file [charset]
 *        Loads  stopwords  for  language  lang from file. If
 *        charset is not specified, file contents is  assumed
 *        to  be in LocalCharset, otherwise it is in charset.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_StopwordFile( const char *option, char *line, int lino )
{
    int len = strlen( line ) - strlen( option ) + 1;
    char lang[ 3 ];
    char *file = new char[ len ];
    char* encoding = new char[ len ];
    int n = sscanf( line + strlen( option ), "%2s%s%s", lang,
                    file, encoding );
#ifdef UNICODE
    if ( n >= 2 )
    {
        if ( Stopwords.Load( file, lang,
                             n == 3 ? encoding : localcharset.c_str( ) ) < 0 )
        {
            sprintf( conf_err_str, "Error: in config file at line %d: Can't "
                     "load stopword file '%s'\n", lino, file );
            delete [ ] file;
            delete [ ] encoding;
            return 1;
        }
    }
#else
    if ( n >= 2 )
    {
        if ( Stopwords.Load( file, lang ) < 0 )
        {
            sprintf( conf_err_str, "Error: in config file at line %d: Can't "
                     "load stopword file '%s'\n", lino, file );
            delete [ ] file;
            delete [ ] encoding;
            return 1;
        }
        if ( n == 3 )
            sprintf( conf_err_str, "Warning: in config file at line %d: "
                     "Option %s doesn't accept charset argument without "
                     "Unicode support\n", lino, option );
    }
#endif
    else
    {
        sprintf( conf_err_str, "Error: in config file at line %d: Missing "
                 "argument(s) for %s\n", lino, option );
        delete [ ] file;
        delete [ ] encoding;
        return 1;
    }
    delete [ ] file;
    delete [ ] encoding;
    return 0;
}
/*--------------------------------------------------------------------------*
 * CharsetAlias charset alias1 [alias2...]
 *        Defines alias1, alias2, ... as aliases (alternative
 *        names) for charset. This is needed because in  many
 *        cases there is no "one true name" for the charset -
 *        different web servers and page authors use  differ-
 *        ent names.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_CharsetAlias( const char *option, char *line, int lino )
{
    int len = strlen( line ) - strlen( option ) + 1;
    char *name = new char[ len ];
    char *aliases = new char[ len ];
    if ( sscanf( line + strlen( option ), "%s%s",
                 name, aliases ) == 2 )
        AddAlias( name, aliases );
    else
    {
        sprintf( conf_err_str, "Error: in config file at line %d: Missing "
                 "argument(s) for %s\n", lino, option );
        delete [ ] name;
        delete [ ] aliases;
        return 1;
    }
    delete [ ] name;
    delete [ ] aliases;
    return 0;
}
/*--------------------------------------------------------------------------*
 * CharsetTableU1 charset lang file [lmfile]
 *        Loads  the  Unicode mapping for charset of language
 *        lang  from  file.  Optionally  load  langmap   file
 *        lmfile, which is used for charset guesser.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_CharsetTableU1( const char *option, char *line, int lino )
{
#ifdef UNICODE
    int len = strlen( line ) - strlen( option ) + 1;
    char *name = new char[ len ];
    char *dir = new char[ len ];
    char *lang = new char[ len ];
    char *lmdir = new char[ len ];
    int param,
        charset_id = 0;
    param = sscanf( line + strlen( option ), "%s%s%s%s", name, lang,
                    dir, lmdir );
    if ( param >= 3 &&
         ( charset_id = LoadCharsetU1( lang, name, dir ) ) == -1 )
    {
        logger.log( CAT_FILE, L_WARN, "Charset %s has not been loaded\n",
                    name);
        delete [ ] name;
        delete [ ] dir;
        delete [ ] lang;
        delete [ ] lmdir;
        return 1;
    }
#ifdef USE_CHARSET_GUESSER
    if ( param == 4 && charset_id > 0 )
        langs.AddLang( charset_id, lmdir );
#endif
    if ( param < 3 )
    {
        sprintf( conf_err_str, "Error: in config file at line %d: Missing "
                 "argument(s) for %s\n", lino, option );
        delete [ ] name;
        delete [ ] dir;
        delete [ ] lang;
        delete [ ] lmdir;
        return 1;
    }
    delete [ ] name;
    delete [ ] dir;
    delete [ ] lang;
    delete [ ] lmdir;
    return 0;
#else
    sprintf( conf_err_str, "Warning: in config file at line %d: Invalid "
             "keyword %s, Unicode support not compiled into aspseek\n",
             lino, option );
    return 0;
#endif
}
/*--------------------------------------------------------------------------*
 * CharsetTableU2 charset lang file [lmfile]
 *        Loads  the Unicode mapping for multibyte charset of
 *        language lang from file.  Optionally  load  langmap
 *        file lmfile, which is used for charset guesser.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_CharsetTableU2( const char *option, char *line, int lino )
{
#ifdef UNICODE
    int len = strlen( line ) - strlen( option ) + 1;
    char *name = new char[ len ];
    char *dir = new char[ len ];
    char *lang = new char[ len ];
    char *lmdir = new char[ len ];
    int param,
        charset_id = 0;
    param = sscanf( line + strlen( option ), "%s%s%s%s", name, lang,
                    dir, lmdir);
    if ( param >= 3 &&
         ( charset_id = LoadCharsetU2V( lang, name, dir ) ) == -1 )
    {
        logger.log( CAT_FILE, L_WARN, "Charset %s has not been loaded\n",
                    name );
        delete [ ] name;
        delete [ ] dir;
        delete [ ] lang;
        delete [ ] lmdir;
        return 1;
    }
#ifdef USE_CHARSET_GUESSER
    if ( param == 4 && charset_id > 0 )
        langs.AddLang( charset_id, lmdir );
#endif
    if ( param < 3 )
    {
        sprintf( conf_err_str, "Error: in config file at line %d: Missing "
                 "argument(s) for %s\n", lino, option );
        delete [ ] name;
        delete [ ] dir;
        delete [ ] lang;
        delete [ ] lmdir;
        return 1;
    }
    delete [ ] name;
    delete [ ] dir;
    delete [ ] lang;
    delete [ ] lmdir;
    return 0;
#else
    sprintf( conf_err_str, "Warning: in config file at line %d: Invalid "
             "keyword %s, Unicode support not compiled into aspseek\n",
             lino, option );
    return 0;
#endif
}
/*--------------------------------------------------------------------------*
 * Dictionary2 lang file [charset]
 *        Loads  dictionary for lang from file. If charset is
 *        not specified, it is assumed that the  file  is  in
 *        Unicode.  Dictionary is used for tokenizing of text
 *        in Chinese, Japanese and Korean languages.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_Dictionary2( const char *option, char *line, int lino )
{
#ifdef UNICODE
    int len = strlen( line ) - strlen( option ) + 1;
    char *dir = new char[ len ];
    char *lang = new char[ len ];
    char *charset = new char[ len ];
    int param;
    if ( ( param = sscanf( line + strlen( option ), "%s%s%s", lang,
                           dir, charset ) ) >= 2 )
        LoadDictionary2( lang, dir, param > 2 ? charset : 0 );
    else
    {
        sprintf( conf_err_str, "Error: in config file at line %d: Missing "
                 "argument(s) for %s\n", lino, option );
        delete [ ] dir;
        delete [ ] lang;
        delete [ ] charset;
        return 1;
    }
    delete [ ] dir;
    delete [ ] lang;
    delete [ ] charset;
    return 0;
#else
    sprintf( conf_err_str, "Warning: in config file at line %d: Invalid "
             "keyword %s, Unicode support not compiled into aspseek\n",
             lino, option );
    return 0;
#endif
}
/*--------------------------------------------------------------------------*
 * CharsetTable charset lang file [lmfile]
 *        Loads  the  table for charset of language lang from
 *        file.  Optionally load langmap file  lmfile,  which
 *        is used for charset guesser.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_CharsetTable( const char *option, char *line, int lino )
{
#ifndef UNICODE
    int len = strlen( line ) - strlen( option ) + 1;
    char *name = new char[ len ];
    char *dir = new char[ len ];
    char *lang = new char[ len ];
    char *lmdir = new char[ len ];
    int charset_id = 0;
    int param = sscanf( line + strlen( option ), "%s%s%s%s", name, lang,
                        dir, lmdir );
    if ( param >= 3 &&
         ( charset_id = LoadCharset( lang, name, dir ) ) == -1 )
    {
        logger.log(CAT_FILE, L_WARN, "Charset %s has not been loaded\n",
                   name);
        delete [ ] name;
        delete [ ] dir;
        delete [ ] lang;
        delete [ ] lmdir;
        return 1;
    }
#ifdef USE_CHARSET_GUESSER
    if ( param == 4 && charset_id > 0 )
        langs.AddLang( charset_id, lmdir );
#endif
    if ( param < 3 )
    {
        sprintf( conf_err_str, "Error: in config file at line %d: Missing "
                 "argument(s) for %s\n", lino, option );
        delete [ ] name;
        delete [ ] dir;
        delete [ ] lang;
        delete [ ] lmdir;
        return 1;
    }
    delete [ ] name;
    delete [ ] dir;
    delete [ ] lang;
    delete [ ] lmdir;
    return 0;
#else
    sprintf( conf_err_str, "Warning: in config file at line %d: Option %s is "
             "superfluous with Unicode support\n", lino, option );
    return 0;
#endif
}
/*--------------------------------------------------------------------------*
 * Charset charset
 *        Usable to set charset for the servers that  do  not
 *        return  it.  Argument  should be known charset name
 *        (see below  for  charset  configuration).  Alterna-
 *        tively,  you  can  use  charset  guesser feature of
 *        index(1).
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_Charset( const char *option, char *line, int lino )
{
    if ( ( line = config_charp_arg( option, line, lino ) ) == 0 )
        return 1;
    csrv.m_charset = line;
    return 0;
}
/*--------------------------------------------------------------------------*
 * Proxy [host.com[:port]]
 *        Use  proxy  rather  than direct connection. You can
 *        also index FTP sites via  proxy.  If  port  is  not
 *        specified,  default  is 3128 (squid). Proxy without
 *        arguments disables proxy.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_Proxy( const char *option, char *line, int lino )
{
    long val;
    char *s,
         *ep;
    if ( ( line = config_charp_arg( option, line, lino ) ) == 0 )
        return 1;
    if ( ( s = strrchr( line, ':' ) ) != 0 )
        *s++ = '\0';
    csrv.m_proxy = line;
    if ( s == 0 )
        return 0;
    if ( *s == '\0' )
    {
        sprintf( conf_err_str, "Error: in config file at line %d: Missing "
                 "port number after ':' for %s\n", lino, option );
        return 1;
    }
    errno = 0;      // strtol() sets errno only on failure, so clear it first
    val = strtol( s, &ep, 10 );
    // Check if we got a number
    if ( s == ep )
    {
        sprintf( conf_err_str, "Error: in config file at line %d: Argument is "
                 "not a number for %s\n", lino, option );
        return 1;
    }
    // Check if there was junk after the number
    if ( ep && *ep != 0 )
    {
        sprintf( conf_err_str, "Error: in config file at line %d: Junk after "
                 "argument for %s\n", lino, option );
        return 1;
    }
    // Check that the value is a valid port number (1 to 65535)
    if ( errno == ERANGE || val < 1 )
    {
        sprintf( conf_err_str, "Error: in config file at line %d: Argument to "
                 "%s is not a valid port number\n", lino, option );
        return 1;
    }
    if ( val > 65535 )
    {
        sprintf( conf_err_str, "Error: in config file at line %d: Invalid "
                 "port number, must be below 65536\n",
                 lino );
        return 1;
    }
    csrv.m_proxy_port = ( int ) val;
    return 0;
}
/*--------------------------------------------------------------------------*
 * HTTPHeader header
 *        Add header to headers that index(1) sends  in  HTTP
 *        request.  You  should  not use If-Modified-Since or
 *        Accept-Charset headers here, as index(1)  sends  it
 *        anyway.  Header  User-Agent:  aspseek/@VERSION@  is
 *        sent too, but you may override it.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_HTTPHeader( const char *option, char *line, int lino )
{
    if ( ( line = config_charp_arg( option, line, lino ) ) == 0 )
        return 1;
    
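    if ( ! STRNCMP( line, "User-Agent: " ) )
    {
        strncpy( _user_agent, line + strlen( "User-Agent: " ),
                 sizeof _user_agent - 1 );
        // strncpy() doesn't terminate the string if the source is too long
        _user_agent[ sizeof _user_agent - 1 ] = '\0';
    }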
    else
    {
        strcat( _extra_headers, line );
        strcat( _extra_headers, "\r\n" );
    }
    return 0;
}
/*--------------------------------------------------------------------------*
 * undocumented keyword
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_AuthBasic( const char *option, char *line, int lino )
{
    if ( ( line = config_charp_arg( option, line, lino ) ) == 0 )
        return 1;
    char* s = new char[ BASE64_LEN( strlen( line ) ) + 1 ];
    base64_encode( line, s, strlen( line ) );
    csrv.m_basic_auth = s;
    delete [ ] s;
    return 0;
}
/*--------------------------------------------------------------------------*
 * undocumented keyword, doesn't seem to be used
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_HTDBList( const char *option, char *line, int lino )
{
    if ( ( line = config_charp_arg( option, line, lino ) ) == 0 )
        return 1;
    csrv.m_htdb_list = line;
    return 0;
}
/*--------------------------------------------------------------------------*
 * undocumented keyword, doesn't seem to be used
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_HTDBDoc( const char *option, char *line, int lino )
{
    if ( ( line = config_charp_arg( option, line, lino ) ) == 0 )
        return 1;
    csrv.m_htdb_doc = line;
    return 0;
}
/*--------------------------------------------------------------------------*
 * Converter from/type to/type[; charset=cset] command line
 *        Specifies that for converting documents with  MIME-
 *        type  from/type  to  MIME-type  to/type the command
 *        specified by command line will  be  used.  Argument
 *        from/type  can  be any type returned by Web server.
 *        Argument  to/type  can  be  either  text/plain   or
 *        text/html.   If  you add ;charset=cset string after
 *        to/type, index will know  that  resulting  document
 *        has  a  charset cset, otherwise it is assumed to be
 *        us-ascii.
 *
 *        In the command line you usually specify program  or
 *        script  to  run, together with its options. Program
 *        is expected to read from  stdin  and  write  the
 *        converted document to stdout.
 *
 *        If   your  program  can't  deal  with  stdin/stdout
 *        streams, you should use $in  and  $out  strings  in
 *        command line, and they will be substituted with two
 *        file names in /tmp directory. index(1) will  create
 *        files  with  unique names, write the document down-
 *        loaded to the first file (referenced as  $in),  run
 *        the  /bin/prog, read the second file (referenced as
 *        $out) into memory, and then delete both files.
 *
 *        You can also use $url in command line, it  will  be
 *        substituted with the actual URL of the downloaded
 *        document. You can use it in your own scripts to
 *        distinguish between different document variations, or
 *        to be able to write one script for  many  different
 *        MIME-types.
 *
 *        Please  note that index(1) relies on a Content-Type
 *        header returned by a Web server.  Some  Web-servers
 *        are misconfigured and give wrong info (for example,
 *        return header  Content-Type:  audio/x-pn-realaudio-
 *        plugin for .rpm files).
 *
 *        Examples:
 *        Converter app/ps text/plain; charset=iso8859-1 ps2ascii
 *        # ps2ascii can't deal with PDF files from stdin
 *        Converter application/pdf text/plain ps2ascii $in $out
 *
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_Converter( const char *option, char *line, int lino )
{
#ifndef USE_EXT_CONV
    sprintf( conf_err_str, "Warning: in config file at line %d: aspseek was "
             "not compiled with support for external converters.\n", lino );
    return 0;
#else
    // Arguments are: from/type to/type[;charset=some] command
    // Example:
    // application/msword text/plain;charset=windows-1251 catdoc -a $in
    char* from;
    char* to;
    char* charset;
    char* cmd;
    char* lt;
    // parse the args
    from = GetToken( line + strlen( option ), " \t", &lt );
    to = GetToken( 0, "; \t", &lt );
    cmd = GetToken( 0, "\r\n", &lt );
    if ( ! cmd )
    {
        sprintf( conf_err_str, "Error: in config file at line %d: Too few "
                 "arguments for %s\n", lino, option );
        return 1;
    }
    else
    {
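        if ( ( charset = strstr( cmd, "charset=" ) ) != 0 )
        {
            charset += 8;
            // Look for the white space that separates the charset from the
            // command; if there is none the command is missing (cmd is 0)
            cmd = strpbrk( charset, " \t" );
            if ( cmd != 0 )
            {
                *cmd = '\0';
                cmd++;
                while ( *cmd == ' ' || *cmd == '\t' )
                    cmd++;
                if ( *cmd == '\0' )
                    cmd = 0;
            }
        }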
        else
            charset = 0;
        if ( ! cmd )
        {
            sprintf( conf_err_str, "Error: in config file at line %d: Too few "
                     "arguments for %s\n", lino, option );
            return 1;
        }
        else
            converters[ from ] = new CExtConv( from, to, charset, cmd );
    }
    return 0;
#endif
}
/*--------------------------------------------------------------------------*
 * undocumented keyword, does nothing (but there's an AddAlias() function
 * in charsets.cpp which could be called...)
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_Alias( const char *option, char *line, int lino )
{
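    // Size the buffers according to the line length so that sscanf()
    // can't write past their ends
    int len = strlen( line ) - strlen( option ) + 1;
    char *buf1 = new char[ len ];
    char *buf2 = new char[ len ];
    int n = sscanf( line + strlen( option ), "%s%s", buf1, buf2 );
//  if ( n == 2 )
//      AddAlias( buf1, buf2 );
    delete [ ] buf1;
    delete [ ] buf2;
    if ( n != 2 )
    {
        sprintf( conf_err_str, "Error: in config file at line %d: Invalid "
                 "argument for %s\n", lino, option );
        return 1;
    }
    return 0;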
}
/*--------------------------------------------------------------------------*
 * Server URL
 *        Add  URL  as an URL to start indexing from. You can
 *        specify many Server commands, and set the different
 *        options  for different sites - see below. Note that
 *        if URL  contains  path,  the  whole  site  will  be
 *        indexed  nevertheless, so to limit indexing to some
 *        subdirectory  of  site   use   Disallow   parameter
 *        described below.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_Server( const char *option, char *line, int lino )
{
    CUrl from;
    if ( ( line = config_charp_arg( option, line, lino ) ) == 0 )
        return 1;
    if ( from.ParseURL( line ) )
    {
        sprintf( conf_err_str, "Error: in config file at line %d: Invalid "
                 "URL: %s\n", lino, line );
        return 1;
    }
    csrv.m_url = line;
    AddServer( csrv );
    if ( local_load_flags & FLAG_ADD_SERV )
    {
        string srv =   from.m_schema + string( "://" )
                     + from.m_hostinfo + string( "/" );
        // FLAG_ADD_SERV has already been tested by the enclosing 'if'
        if ( ! strcasecmp( from.m_schema, "http" ) && csrv.m_userobots )
        {
            char *robots = new char[   strlen( from.m_schema )
                                     + strlen( from.m_hostinfo ) + 15 ];
            sprintf( robots, "%s://%s/robots.txt",
                     from.m_schema, from.m_hostinfo );
            AddHref( robots, 0, 0, srv.c_str( ), 1 );
            delete [ ] robots;
        }
        AddHref( line, 0, 0, srv.c_str( ), 0 );
    }
    return 0;
}
/*--------------------------------------------------------------------------*
 * MaxBandwidth bytes [starttime [endtime]]
 *        Sets maximum used bandwidth for incoming traffic to
 *        bytes  per  second for the specified period of time
 *        of day. Arguments starttime and endtime are in sec-
 *        onds  from  midnight (0:00). If endtime is omitted,
 *        then it is  implied  to  be  the  end  of  the  day
 *        (86400). If both starttime and endtime are omitted,
 *        then the limit is for the whole day.  You  can  use
 *        several  MaxBandwidth  commands.  Note that if end-
 *        time is less than starttime, index(1)  will  handle
 *        it  correctly, setting two intervals from starttime
 *        to  midnight  and  from  midnight  to  endtime.  By
 *        default bandwidth is not limited.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_MaxBandwidth( const char *option, char *line, int lino )
{
    long int bandwidth;
    int start, finish = 0;
    switch ( sscanf( line + strlen( option ), "%li%i%i",
                     &bandwidth, &start, &finish ) )
    {
        case 1 :
            bwSchedule.m_defaultBandwidth = bandwidth;
            break;
        case 2 : case 3 :
            bwSchedule.AddInterval( start, finish, bandwidth );
            break;
        default :
            sprintf( conf_err_str, "Error: in config file at line %d: Missing "
                     "argument for %s\n", lino, option );
            return 1;
    }
    return 0;
}
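/*--------------------------------------------------------------------------*
 * Illustrative aspseek.conf lines for MaxBandwidth (the values are made up
 * for this comment, they are not taken from the ASPseek documentation):
 *
 *     MaxBandwidth 50000
 *         one number scanned: sets bwSchedule.m_defaultBandwidth
 *     MaxBandwidth 20000 28800 64800
 *         three numbers scanned: 20000 bytes per second between
 *         8:00 (8 * 3600 s) and 18:00 (18 * 3600 s)
 *     MaxBandwidth 80000 79200 21600
 *         endtime less than starttime: 22:00 to 6:00, passed to
 *         AddInterval() as is, which per the description above is
 *         expected to split it at midnight
 *     MaxBandwidth
 *         nothing scanned: error, function returns 1
 *--------------------------------------------------------------------------*/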
/*--------------------------------------------------------------------------*
 * FollowOutside yes | no
 *        Sets  whether  index(1)  should index outside sites
 *        defined in Server directives. Default is no. If you
 *        set it to yes, be sure to limit the scope of index-
 *        ing in some other way (for example, with  MaxHops).
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_FollowOutside( const char *option, char *line, int lino )
{
    return config_boolean( option, line, lino, &csrv.m_outside );
}
/*--------------------------------------------------------------------------*
 * Index yes | no
 *        Sets whether to store words into database.  Default
 *        value is yes.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_Index( const char *option, char *line, int lino )
{
    return config_boolean( option, line, lino, &csrv.m_gindex );
}
/*--------------------------------------------------------------------------*
 * Follow yes | no
 *        Sets  whether  to  store links found into database.
 *        Default value is yes.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_Follow( const char *option, char *line, int lino )
{
    return config_boolean( option, line, lino, &csrv.m_gfollow );
}
/*--------------------------------------------------------------------------*
 * Robots yes | no
 *        Sets   whether   the   robot   exclusion   standard
 *        (robots.txt  file  and  META NAME="robots") will be
 *        honored. Default is yes.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_Robots( const char *option, char *line, int lino )
{
    return config_boolean( option, line, lino, &csrv.m_userobots );
}
/*--------------------------------------------------------------------------*
 * DeleteBad yes | no
 *        Sets whether to delete bad  (not  found,  forbidden
 *        etc.) URLs from the database.  Default value is no.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_DeleteBad( const char *option, char *line, int lino )
{
    return config_boolean( option, line, lino, &csrv.m_deletebad );
}
/*--------------------------------------------------------------------------*
 * DeleteNoServer yes | no
 *        Sets whether to delete URLs which  have  no  corre-
 *        spondent  "Server" commands.  Default value is yes.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_DeleteNoServer( const char *option, char *line, int lino )
{
    return config_boolean( option, line, lino, &csrv.m_delete_no_server );
}
/*--------------------------------------------------------------------------*
 * Clones yes | no
 *        Sets whether to enable clone elimination. A clone is
 *        a  document  that  is  exactly  the same as another
 *        document. If this is set to yes,  a  clone  is  not
 *        parsed/stored in the database;  instead,  the  word
 *        information  for  the  original  document  is used.
 *        Default value is yes.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_Clones( const char *option, char *line, int lino )
{
    return config_boolean( option, line, lino, &csrv.m_use_clones );
}
/*--------------------------------------------------------------------------*
 * AddressExpiry time
 *        Sets expiration time for "DNS name -> IP" entry  in
 *        address  cache.   After  entry is expired, resolver
 *        will make DNS lookup again.  Argument time  can  be
 *        set  in  seconds, or the same way as in Period com-
 *        mand below. Default value is 1 hour.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_AddressExpiry( const char *option, char *line, int lino )
{
    return config_time_val( option, line, lino, &AddressExpiry );
}
/*--------------------------------------------------------------------------*
 * NextDocLimit number
 *        Maximum number of URLs loaded from database at each
 *        request.  Default value is 1000.
 *
 *        This option is used only if URLs to be indexed  are
 *        ordered by next index time; otherwise, if -o option
 *        to index(1) is used, all URLs for  current  hop are
 *        taken at once.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_NextDocLimit( const char *option, char *line, int lino )
{
#if ( SIZEOF_LONG_INT == 4 )
    if ( sscanf( line + strlen( option ), "%lu", &NextDocLimit ) != 1 )
#elif ( SIZEOF_INT == 4 )
    if ( sscanf( line + strlen( option ), "%u", &NextDocLimit ) != 1 )
#endif
    {
        sprintf( conf_err_str, "Error: in config file at line %d: Invalid "
                 "argument for %s\n", lino, option );
        return 1;
    }
    return 0;
}
/*--------------------------------------------------------------------------*
 * WordCacheSize number
 *        Maximum word count in the word cache. Word cache is
 *        used to reduce database load for converting word to
 *        its word ID. Default value is 50000.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_WordCacheSize( const char *option, char *line, int lino )
{
#if ( SIZEOF_LONG_INT == 4 )
    if ( sscanf( line + strlen( option ), "%lu", &WordCacheSize ) != 1 )
#elif ( SIZEOF_INT == 4 )
    if ( sscanf( line + strlen( option ), "%u", &WordCacheSize ) != 1 )
#endif
    {
        sprintf( conf_err_str, "Error: in config file at line %d: Invalid "
                 "argument for %s\n", lino, option );
        return 1;
    }
    return 0;
}
/*--------------------------------------------------------------------------*
 * HrefCacheSize number
 *        Maximum URL count in the href cache. Href cache  is
 *        used  to reduce database load for converting URL of
 *        outgoing hyperlink to its URL ID.  Default value is
 *        10000.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_HrefCacheSize( const char *option, char *line, int lino )
{
#if ( SIZEOF_LONG_INT == 4 )
    if ( sscanf( line + strlen( option ), "%lu",
#elif ( SIZEOF_INT == 4 )
    if ( sscanf( line + strlen( option ), "%u",
#endif
                 &StoredHrefs.m_maxSize ) != 1 )
    {
        sprintf( conf_err_str, "Error: in config file at line %d: Invalid "
                 "argument for %s\n", lino, option );
        return 1;
    }
    return 0;
}
/*--------------------------------------------------------------------------*
 * DeltaBufferSize kilobytes
 *        Size  of  buffer  for  each  of 100 delta files, in
 *        kilobytes. Setting of low value for this  parameter
 *        can  result  in  big  fragmentation of delta files.
 *        Value of this parameter  affects  used  memory.  If
 *        default value is used, then 50 Mb of memory is used
 *        for buffers. Default value is 512.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_DeltaBufferSize( const char *option, char *line, int lino )
{
#if ( SIZEOF_LONG_INT == 4 )
    if ( sscanf( line + strlen( option ), "%lu", &DeltaBufferSize ) != 1 )
#elif ( SIZEOF_INT == 4 )
    if ( sscanf( line + strlen( option ), "%u", &DeltaBufferSize ) != 1 )
#endif
    {
        sprintf( conf_err_str, "Error: in config file at line %d: Invalid "
                 "argument for %s\n", lino, option );
        return 1;
    }
    return 0;
}
/*--------------------------------------------------------------------------*
 * UrlBufferSize kilobytes
 *        Size of read  and  write  buffer  allocated  during
 *        inverted index merging for ind files, in kilobytes.
 *        Value of this parameter affects used memory  during
 *        inverted    index   merging.   Default   value   is
 *        DeltaBufferSize * 8.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_UrlBufferSize( const char *option, char *line, int lino )
{
#if ( SIZEOF_LONG_INT == 4 )
    if ( sscanf( line + strlen( option ), "%lu", &UrlBufferSize ) != 1 )
#elif ( SIZEOF_INT == 4 )
    if ( sscanf( line + strlen( option ), "%u", &UrlBufferSize ) != 1 )
#endif
    {
        sprintf( conf_err_str, "Error: in config file at line %d: Invalid "
                 "argument for %s\n", lino, option );
        return 1;
    }
    return 0;
}
/*--------------------------------------------------------------------------*
 * Tag number
 *        Use this field to "tag" several Servers with  value
 *        number, which can later be used with option -t num-
 *        ber of index(1) command. Note that if you  want  to
 *        group  several  sites  together  for searching pur-
 *        poses, you should use "spaces"  or  "subsets"  fea-
 *        tures of ASPseek, not tag.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_Tag( const char *option, char *line, int lino )
{
    return config_int_arg( option, line, lino, &csrv.m_hint );
}
/*--------------------------------------------------------------------------*
 * ReadTimeOut time
 *        Sets  the maximum timeout to time for downloading a
 *        document from site.  Argument can be  expressed  in
 *        seconds,  or  in the same form as in Period command
 *        below. Default value is 90 seconds.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_ReadTimeOut( const char *option, char *line, int lino )
{
    return config_time_val( option, line, lino,
                            ( time_t * ) &csrv.m_read_timeout );
}
/*--------------------------------------------------------------------------*
 * Period time
 *        Sets the re-index period to time. Value can be  set
 *        just  in  seconds,  or  using  special   characters
 *        right after the number (no spaces allowed):  s  for
 *        seconds,  M for minutes, h for hours, d for days, m
 *        for months and y for years. You can combine several
 *        values  together,  for  example  string 1m12d means
 *        "one month and twelve days".  You can also  specify
 *        negative  numbers, say 1m-10d stands for "one month
 *        minus ten days". Default value is 7d.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_Period( const char *option, char *line, int lino )
{
    return config_time_val( option, line, lino,
                            ( time_t * ) &csrv.m_period );
}
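/*--------------------------------------------------------------------------*
 * Illustrative lines for keywords taking a time argument (made-up values,
 * following the description of Period above; how many seconds a month or
 * a year counts for is decided by tstr2time_t() and is not shown here):
 *
 *     ReadTimeOut 90        plain number, 90 seconds
 *     MinDelay 2s           two seconds
 *     AddressExpiry 30M     thirty minutes (upper case M)
 *     Period 30m            thirty months (lower case m), easily confused
 *                           with the line before
 *     Period 1m12d          one month and twelve days
 *     Period 1m-10d         one month minus ten days
 *     Period 1m 12d         not a combined value, since no spaces are
 *                           allowed between the parts
 *--------------------------------------------------------------------------*/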
/*--------------------------------------------------------------------------*
 * MaxHops number
 *        Sets  the  maximum  hops  ("mouse clicks") from URL
 *        specified by Server command, so documents that  are
 *        "deeper"  will  not  be  indexed.  Default value is
 *        256.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_MaxHops( const char *option, char *line, int lino )
{
    if ( config_int_arg( option, line, lino, &csrv.m_maxhops ) )
        return 1;
    if ( csrv.m_maxhops < 0 )
    {
        sprintf( conf_err_str, "Error: in config file at line %d: Invalid "
                 "argument to %s, minimum is 0\n", lino, option );
        return 1;
    }
    return 0;
}
/*--------------------------------------------------------------------------*
 * MaxDocsPerServer number
 *        Sets  that no more than number of documents will be
 *        indexed from one site during one run  of  index(1).
 *        Default value is -1, which means no limits.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_MaxDocsPerServer( const char *option, char *line, int lino )
{
    if ( config_int_arg( option, line, lino, &csrv.m_server_maxdocs ) )
        return 1;
    if ( csrv.m_server_maxdocs < -1 )
        csrv.m_server_maxdocs = -1;
    return 0;
}
/*--------------------------------------------------------------------------*
 * IncrementHopsOnRedirect yes | no
 *        Sets whether index(1) should increment  hops  value
 *        when  HTTP redirect is encountered. Applies only to
 *        redirects generated by  "Location:"  HTTP  headers.
 *        Setting  this  option to no allows a greater number
 *        of documents to be indexed for sites that  redirect
 *        frequently  (e.g.  for cookie testing, typically on
 *        each page). Default value is yes.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_IncrementHopsOnRedirect( const char *option, char *line,
                                           int lino )
{
    return config_boolean( option, line, lino,
                           &csrv.m_increment_redir_hops );
}
/*--------------------------------------------------------------------------*
 * RedirectLoopLimit number
 *        Allow no more than number of contiguous  redirects.
 *        This  option is especially useful if you set Incre-
 *        mentHopsOnRedirect to no, because index(1) can fall
 *        in an endless redirect loop. Limiting the number of
 *        redirects prevents index from such redirect  loops.
 *        Default value is 8.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_RedirectLoopLimit( const char *option, char *line, int lino )
{
    if ( config_int_arg( option, line, lino, &csrv.m_redir_loop_limit ) )
        return 1;
    if ( csrv.m_redir_loop_limit < 0 )
    {
        sprintf( conf_err_str, "Error: in config file at line %d: Invalid "
                 "argument to %s, minimum is 0\n", lino, option );
        return 1;
    }
    return 0;
}
/*--------------------------------------------------------------------------*
 * MinDelay time
 *        Sets  minimum  time  between finishing of access to
 *        server and beginning of next access to the  server.
 *        This  is useful if site owner blames you for "bomb-
 *        ing" his site with your index(1) queries.  Argument
 *        time  can  be set in seconds, or in the same way as
 *        described in Period command above. Default value is
 *        0.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_MinDelay( const char *option, char *line, int lino )
{
    return config_time_val( option, line, lino,
                            ( time_t * ) &csrv.m_minDelay );
}
/*--------------------------------------------------------------------------*
 * undocumented keyword, doesn't seem to be used
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_IspellCorrectFactor( const char *option, char *line,
                                       int lino )
{
    return config_int_arg( option, line, lino, &csrv.m_correct_factor );
}
/*--------------------------------------------------------------------------*
 * undocumented keyword, doesn't seem to be used
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_IspellIncorrectFactor( const char *option, char *line,
                                         int lino )
{
    return config_int_arg( option, line, lino, &csrv.m_incorrect_factor );
}
/*--------------------------------------------------------------------------*
 * undocumented keyword, doesn't seem to be used
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_NumberFactor( const char *option, char *line, int lino )
{
    return config_int_arg( option, line, lino, &csrv.m_number_factor );
}
/*--------------------------------------------------------------------------*
 * undocumented keyword, doesn't seem to be used
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_AlnumFactor( const char *option, char *line, int lino )
{
    return config_int_arg( option, line, lino, &csrv.m_alnum_factor );
}
/*--------------------------------------------------------------------------*
 * MinWordLength number
 *        Sets the minimum length of word to be stored in the
 *        database, so  words  shorter  than  number are  not
 *        stored. Default value is 1.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_MinWordLength( const char *option, char *line, int lino )
{
    if ( config_int_arg( option, line, lino, &csrv.m_min_word_length ) )
        return 1;
    if ( csrv.m_min_word_length < 1 )
    {
        sprintf( conf_err_str, "Error: in config file at line %d: Invalid "
                 "argument to %s, minimum is 1\n", lino, option );
        return 1;
    }
    return 0;
}
/*--------------------------------------------------------------------------*
 * MaxWordLength number
 *        Sets the maximum length of word to be stored in the
 *        database,  so  words  longer  than  number are  not
 *        stored.  Default  value  is 32. Note that you can't
 *        set the value higher than 32.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_MaxWordLength( const char *option, char *line, int lino )
{
    if ( config_int_arg( option, line, lino, &csrv.m_max_word_length ) )
        return 1;
    if ( csrv.m_max_word_length > 32 )
    {
        sprintf( conf_err_str, "Error: in config file at line %d: Invalid "
                 "argument to %s, maximum is 32\n", lino, option );
        return 1;
    }
    return 0;
}
/*--------------------------------------------------------------------------*
 * MaxDocSize bytes
 *        Sets the maximum document size in bytes, so if doc-
 *        ument size is bigger than  bytes,  only  the  first
 *        bytes  of  the document will be processed.  Default
 *        value is 1048576 bytes (1Mb).
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_MaxDocSize( const char *option, char *line, int lino )
{
    if ( config_int_arg( option, line, lino, &_max_doc_size ) )
        return 1;
    if ( _max_doc_size < 1 )
    {
        sprintf( conf_err_str, "Error: in config file at line %d: Invalid "
                 "argument to %s\n", lino, option );
        return 1;
    }
    return 0;
}
/*--------------------------------------------------------------------------*
 * MaxDocsAtOnce number
 *        Sets  the  maximum number of pages to be downloaded
 *        from the same host before  switching  to  the  next
 *        host.  Large values are believed to increase index-
 *        ing performance when number  of  indexed  sites  is
 *        large. Default value is 1.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_MaxDocsAtOnce( const char *option, char *line, int lino )
{
    if ( config_int_arg( option, line, lino, &csrv.m_maxDocs ) )
        return 1;
    if ( csrv.m_maxDocs < 1 )
    {
        sprintf( conf_err_str, "Error: in config file at line %d: Invalid "
                 "argument to %s\n", lino, option );
        return 1;
    }
    return 0;
}
/*--------------------------------------------------------------------------*
 * undocumented keyword
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_NoIndex( const char *option, char *line, int lino )
{
    if ( * ( line + strlen( option ) ) != '\0' )
    {
        sprintf( conf_err_str, "Error: in config file at line %d: Option "
                 "%s does not allow arguments\n", lino, option );
        return 1;
    }
    csrv.m_gindex = 0;
    return 0;
}
/*--------------------------------------------------------------------------*
 * undocumented keyword
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_NoFollow( const char *option, char *line, int lino )
{
    if ( * ( line + strlen( option ) ) != '\0' )
    {
        sprintf( conf_err_str, "Error: in config file at line %d: Option "
                 "%s does not allow arguments\n", lino, option );
        return 1;
    }
    csrv.m_gfollow = 0;
    return 0;
}
/*--------------------------------------------------------------------------*
 * undocumented keyword, seems to be used
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_OnlineGeo( const char *option, char *line, int lino )
{
    return config_boolean( option, line, lino, &OnlineGeo );
}
/*--------------------------------------------------------------------------*
 * IncrementalCitations yes | no
 *        Sets  whether  to  build  citation  index, ranks of
 *        pages and lastmod incrementally. If value  of  this
 *        parameter  is set to yes, then calculating of cita-
 *        tions, ranks of pages and lastmod file will require
 *        less  memory and take less time on large databases.
 *        So it is very handy if you want to index large num-
 *        ber  of  URLs  and  have relatively small amount of
 *        memory. Default is yes.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_IncrementalCitations( const char *option, char *line,
                                        int lino )
{
    return config_boolean( option, line, lino, &IncrementalCitations );
}
/*--------------------------------------------------------------------------*
 * CompactStorage yes | no
 *        Sets the storage mode of reverse index. In  compact
 *        storage  mode,  file/BLOB  is  not created for each
 *        word.  Instead,  information  about  all  words  is
 *        stored  in  300  files.  In  this mode, updating of
 *        reverse index is generally much faster and requires
 *        a bit less memory than in the old mode.  Default is
 *        yes.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_CompactStorage( const char *option, char *line, int lino )
{
    return config_boolean( option, line, lino, &CompactStorage );
}
/*--------------------------------------------------------------------------*
 * HiByteFirst yes | no
 *        Sets   the   byte   ordering  used  in  field  wor-
 *        durl[1].word (only in Unicode version). Default  is
 *        no.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_HiByteFirst( const char *option, char *line, int lino )
{
#ifdef UNICODE
    return config_boolean( option, line, lino, &HiLo );
#else
    sprintf( conf_err_str, "Warning: in config file at line %d: Invalid "
             "option %s, Unicode support not compiled into aspseek\n",
             lino, option );
    return 0;
#endif
}
/*--------------------------------------------------------------------------*
 * UtfStorage yes | no
 *        This parameter makes sense only in Unicode  version
 *        and  only for MySQL back-end.  In UTF8 storage mode
 *        fields wordurl[1].word are stored in UTF8 encoding.
 *        This  mode can reduce sizes of data and index files
 *        for wordurl table.   To  convert  existing  Unicode
 *        database to this mode, run index -b.  Default value
 *        is no.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_UtfStorage( const char *option, char *line, int lino )
{
#ifdef UNICODE
    return config_boolean( option, line, lino, &UtfStorage );
#else
    sprintf( conf_err_str, "Warning: in config file at line %d: Invalid "
             "option %s, Unicode support not compiled into aspseek\n",
             lino, option );
    return 0;
#endif
}
/*--------------------------------------------------------------------------*
 * Include file
 *        Includes the contents of file at this point, so you
 *        can  specify some parameters in that included file.
 *        File name is  relative  to  ASPseek  etc  directory
 *        (@sysconfdir@).
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_Include( const char *option, char *line, int lino )
{
    if ( local_config_level >= 5 )
    {
        sprintf( conf_err_str, "Error: in config file at line %d: Too deeply "
                 "nested Includes\n", lino );
        return 1;
    }
    if ( ( line = config_charp_arg( option, line, lino ) ) == 0 )
        return 1;
    if ( LoadConfig( line, local_config_level + 1, local_load_flags ) )
        return 1;
    return 0;
}
/*--------------------------------------------------------------------------*
 * Countries file
 *        Loads countries IP information from file. File con-
 *        sists of  lines  in  the  form  "sss.sss.sss.sss  -
 *        eee.eee.eee.eee   cc",   where  sss.sss.sss.sss  is
 *        starting IP address, eee.eee.eee.eee is  ending  IP
 *        address,  and  cc  is  a country code (like ru, de,
 *        etc.).  Note that value of ending address should be
 *        more than starting address.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_Countries( const char *option, char *line, int lino )
{   
    if ( ( line = config_charp_arg( option, line, lino ) ) == 0 )
        return 1;
    string file;
    if ( isAbsolutePath( line ) )
        file = line;
    else
        file = ConfDir + line;
    ImportCountries( file.c_str( ) );
    return 0;
}
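/*--------------------------------------------------------------------------*
 * Illustrative example for Countries (file name and contents are made up;
 * the addresses are taken from the ranges reserved for documentation and
 * the country codes assigned to them are arbitrary):
 *
 *     aspseek.conf:
 *         Countries countries.txt
 *
 *     countries.txt, one range per line in the form described above:
 *         192.0.2.0    - 192.0.2.255    de
 *         198.51.100.0 - 198.51.100.255 ru
 *
 * A file name that is not an absolute path is appended to ConfDir before
 * it is passed to ImportCountries().
 *--------------------------------------------------------------------------*/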
/*--------------------------------------------------------------------------*
 * undocumented keyword, doesn't seem to be used
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_MirrorRoot( const char *option, char *line, int lino )
{
    sprintf( conf_err_str, "Warning: in config file at line %d: Option %s "
             "not yet implemented\n", lino, option );
    return 0;
#if 0
    line += strlen( option );
    while ( *line && isspace( *line ) )
            line++;
    if ( *line )
    {
        if ( isAbsolutePath( line ) )
            MirrorRoot = line;
        else
            MirrorRoot = ConfDir +  line;
    }
    else
        MirrorRoot = ConfDir + "/mirrors";
    csrv.m_use_mirror = csrv.m_use_mirror > 0 ? csrv.m_use_mirror : 0;
    return 0;
#endif
}
/*--------------------------------------------------------------------------*
 * undocumented keyword, doesn't seem to be used
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_MirrorHeadersRoot( const char *option, char *line, int lino )
{
    sprintf( conf_err_str, "Warning: in config file at line %d: Option %s "
             "not yet implemented\n", lino, option );
    return 0;
#if 0
    line += strlen( option );
    while ( *line && isspace( *line ) )
            line++;
    if ( *line )
    {
        if ( isAbsolutePath( line ) )
            MirrorHeadersRoot = line;
        else
            MirrorHeadersRoot = ConfDir + line;
    }
    else
        MirrorHeadersRoot = ConfDir + "/headers";
    csrv.m_use_mirror = csrv.m_use_mirror > 0 ? csrv.m_use_mirror : 0;
    return 0;
#endif
}
/*--------------------------------------------------------------------------*
 * undocumented keyword, doesn't seem to be used
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_MirrorPeriod( const char *option, char *line, int lino )
{
    sprintf( conf_err_str, "Warning: in config file at line %d: Option %s "
             "not yet implemented\n", lino, option );
    return 0;
}
/*--------------------------------------------------------------------------*
 * Replace [regexp [replacement]]
 *        This parameter allows to replace URL matching  reg-
 *        exp  by replacement, or by empty string if replace-
 *        ment is not specified. This  is  useful  for  sites
 *        with  dynamic  contents  where the same information
 *        can be obtained by  many  different  URLs.  Replace
 *        without  arguments  disables  any  replacements for
 *        subsequent Server commands.
 *
 *        As in sed(1) command s, the replacement can contain
 *        \N (N being a number from 1 to 9, inclusive) refer-
 *        ences, which refer to  the  portion  of  the  match
 *        which  is contained between Nth '\(' and its match-
 *        ing '\)'.  To include a  literal  '\',  precede  it
 *        with another '\'.
 * Function returns 0 on success and 1 on error
 *--------------------------------------------------------------------------*/
static int config_Replace( const char *option, char *line, int lino )
{
    csrv.AddReplacement( line + strlen( option ) );
    return 0;
}
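/*--------------------------------------------------------------------------*
 * Illustrative lines for Replace (made up for this comment; they assume
 * POSIX basic regular expression syntax, as the '\(' and '\)' notation
 * in the description above suggests, and example.com is a placeholder):
 *
 *     Replace ^http://www\.example\.com/\(.*\)$ http://example.com/\1
 *         \1 stands for the text matched between '\(' and '\)', so
 *         the path is kept while the host name is rewritten
 *     Replace [?&]sessionid=[0-9a-f]*
 *         no replacement given: the match is replaced by the empty
 *         string (presumably only the matching part, as with the s
 *         command of sed(1))
 *     Replace
 *         no arguments: no replacements for the Server commands
 *         that follow
 *
 * Note that the function hands everything after the keyword to
 * AddReplacement() unchecked, all parsing is done there.
 *--------------------------------------------------------------------------*/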
/*-------------------------------------------------------------------------*
 * Function for dealing with boolean arguments - they must be either "yes"
 * or "no" (case insensitive). Returns 0 on success and 1 on error.
 *-------------------------------------------------------------------------*/
static int config_boolean( const char *option, char *line, int lino, 
                           int *param )
{
    for ( line += strlen( option ); *line && isspace( *line ); line++ )
        /* empty */ ;
    if ( ! *line ||
         ( STRNCASECMP( line, "yes" ) && STRNCASECMP( line, "no" ) ) )
    {
        sprintf( conf_err_str, "Error: in config file at line %d: Invalid "
                 "or missing argument for %s\n", lino, option );
        return 1;
    }
    *param = ! STRNCASECMP( line, "yes" );
    return 0;
}
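/*--------------------------------------------------------------------------*
 * Illustration of what config_boolean() accepts, using the Robots keyword
 * (the lines are made-up examples):
 *
 *     Robots yes       *param is set to 1
 *     Robots NO        *param is set to 0, the comparison ignores case
 *     Robots           error, argument is missing
 *     Robots maybe     error, argument is neither "yes" nor "no"
 *
 * Assumption: STRNCASECMP( a, b ) compares only the first strlen( b )
 * characters of a. If that is so, text following "yes" or "no" on the
 * line is not diagnosed by this function.
 *--------------------------------------------------------------------------*/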
/*--------------------------------------------------------------------------*
 * Function for dealing with time arguments - they must be in a form
 * recognized by tstr2time_t(). Returns 0 on success and 1 on error.
 *--------------------------------------------------------------------------*/
static int config_time_val( const char *option, char *line, int lino,
                            time_t *param )
{
    if ( ( *param = tstr2time_t( line + strlen( option ) ) ) == BAD_DATE )
    {
        sprintf( conf_err_str, "Error: in config file at line %d: Invalid "
                 "argument for %s\n", lino, option );
        return 1;
    }
    return 0;
}
/*--------------------------------------------------------------------------*
 * Function for dealing with the argument for filter settings - it
 * must be a reasonable regexp. Returns 0 on success and 1 on error.
 *--------------------------------------------------------------------------*/
static int config_Filter( const char *option, char *line, int lino,
                          int filter_type, int reverse )
{
    char *lt;
    line += strlen( option );
    if ( ! ( line = GetToken( line, " \t", &lt ) ) )
    {
        sprintf( conf_err_str, "Error: in config file at line %d: Missing "
                 "argument(s) for %s\n", lino, option );
        return 1;
    }
    do
    {
        if ( AddFilter( line, filter_type, reverse ) )
            return 1;
    } while ( ( line = GetToken( 0, " \t", &lt ) ) );
    return 0;
}
/*--------------------------------------------------------------------------*
 * Function for dealing with string arguments, returns a pointer to the
 * start of the argument on success and 0 (a null pointer) on error.
 *--------------------------------------------------------------------------*/
static char *config_charp_arg( const char *option, char *line, int lino )
{
    line += strlen( option );
    while ( *line && isspace( ( unsigned char ) *line ) )
        line++;
    if ( ! *line )
    {
        sprintf( conf_err_str, "Error: in config file at line %d: Missing "
                 "argument for %s\n", lino, option );
        return 0;
    }
    return line;
}
/*--------------------------------------------------------------------------*
 * Function for dealing with integer arguments, returns 0 on success and
 * 1 on error.
 *--------------------------------------------------------------------------*/
static int config_int_arg( const char *option, char *line, int lino,
                           int *param )
{
    char *ep;
    long val;
    line += strlen( option );
    // strtol() sets errno only on failure, so clear any stale value first
    errno = 0;
    val = strtol( line, &ep, 10 );
    // Check if we got a number
    if ( line == ep )
    {
        sprintf( conf_err_str, "Error: in config file at line %d: Argument is "
                 "not a number for %s\n", lino, option );
        return 1;
    }
    // Check if the value can be stored in an integer
    if ( errno == ERANGE || val > INT_MAX || val < INT_MIN )
    {
        sprintf( conf_err_str, "Error: in config file at line %d: Argument is "
                 "out of range for %s\n", lino, option );
        return 1;
    }
    // Check if there was junk after the number
    if ( ep && *ep != 0 )
    {
        sprintf( conf_err_str, "Error: in config file at line %d: Junk after "
                 "argument for %s\n", lino, option );
        return 1;
    }
    *param = ( int ) val;
    return 0;
}
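
In case someone wants to try the integer check of config_int_arg()
in isolation, here is a minimal stand-alone sketch. Note that
parse_int() is a hypothetical helper made up for this illustration,
it is not part of config.cpp, and it only returns a status instead of
writing to conf_err_str. It shows the three tests in the order used
above (no digits, out of range, trailing junk), assuming errno gets
cleared before strtol() is called.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper (not part of config.cpp): parses a decimal int
// from str. Returns 0 on success and 1 on error, like the handlers above.
static int parse_int( const char *str, int *out )
{
    char *ep;
    long val;

    errno = 0;                       // strtol() sets errno only on failure
    val = strtol( str, &ep, 10 );

    if ( ep == str )                 // no digits found at all
        return 1;
    if ( errno == ERANGE || val > INT_MAX || val < INT_MIN )
        return 1;                    // value does not fit into an int
    if ( *ep != '\0' )               // junk after the number
        return 1;

    *out = ( int ) val;
    return 0;
}
```

With this, parse_int( " 42", &v ) succeeds and stores 42 (strtol()
skips leading white space), while "abc", "12x" and numbers too large
for an int are rejected.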






