From: Kir Kolyshkin (no email)
Date: Wed Aug 07 2002 - 05:35:07 EDT
Radosław Maciaszek wrote:
>
> Kir Kolyshkin wrote:
>
> >A more correct variant is:
> >
> >Disallow http://podatki\.interia\.pl/urzedy/
> >
> >because this is a regular expression, and . means "any character",
> >so if you want a literal '.', you should write it as '\.'
> >
> >In other words, the first expression also disallows
> >http://podatki-interia.pl/urzedy/, while the second does not.
> >
> Yes! :))) Thanks for the help. I had forgotten about the 'Indexing Scope'
> section of the manual.
>
> I have been thinking about the indexing process. Could it work this way:
> the indexer downloads a document and extracts the URLs from it, then
> indexes the pages those URLs point to, but not the document itself?
> This is for indexing some small sites found through other search engines.
> I don't want the search result pages in my database, but I do want the
> sites listed on those result pages. Is that possible?
Yes, and there are several ways to do it.
1). If you are the webmaster, you can surround the text you do
not want indexed with <!--noindex--> ... <!--/noindex-->
tags; nothing between these tags will be indexed.
The tags <!--htdig_noindex--> and <!--UdmComment--> (and their
closing counterparts) are also supported for compatibility
with ht://Dig and MnogoSearch.
[Oops, looking at the code I have just found that the closing
tag <!--/UdmComment--> is not processed correctly, so don't
use it for now. I have just fixed it in CVS; the fix will be
available in both 1.2.11 and 1.3.0.]
2). You can also put <META NAME="robots" CONTENT="noindex">
in the HEAD section of the HTML page. For more details see
http://www.robotstxt.org/wc/meta-user.html
3). If you are _not_ the webmaster of the site you index, you can
set "Index no" for one particular server, say:
Index no
Server http://www.bestlinks.net/
Index yes
Server http://www.myserver.com/
This way, words from www.bestlinks.net will not be put into the database;
only the links on those pages will be found and stored.
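
To make the effect of option 3 concrete, here is a rough Python sketch of
the behaviour described: links are always collected, while words are kept
only for servers where indexing is switched on. It is an illustration only,
not the indexer's actual code; INDEX_WORDS, PageParser and process_page are
hypothetical names made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical stand-in for the "Index no" / "Index yes" settings above:
# host name -> whether words from that host are stored.
INDEX_WORDS = {
    "www.bestlinks.net": False,
    "www.myserver.com": True,
}


class PageParser(HTMLParser):
    """Collects the text words and the absolute link targets of one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        # Record the target of every <a href="..."> as an absolute URL.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))

    def handle_data(self, data):
        # Split text content on whitespace into words.
        self.words.extend(data.split())


def process_page(url, html):
    """Return (words_to_store, links_to_follow) for one downloaded page."""
    parser = PageParser(url)
    parser.feed(html)
    parser.close()
    host = urlparse(url).netloc
    # Hosts not listed are assumed to be indexed normally.
    words = parser.words if INDEX_WORDS.get(host, True) else []
    return words, parser.links
```

Calling process_page() on a page from www.bestlinks.net gives an empty word
list but still returns every link on the page, so those links can be queued
and indexed in the usual way.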
-- ICQ UIN 7551596 Phone +7 903 6722750 --
Guinness a Day Keeps a Doctor Away (people's wisdom)