
CONFIGURING AND ADMINISTERING COLDFUSION 9
Indexing Collections with Verity Spider
Last updated 2/21/2012
The default value is unlimited. If you see many more documents in a collection than you expect, consider experimenting with this option, along with the Content options, to pare down your collection.
-nodocrobo
Specifies to ignore ROBOT META tag directives.
In HTML 3.0 and earlier, robot directives could only be given as the file robots.txt under the root directory of a website.
In HTML 4.0, every document can have robot directives embedded in the META field. Use this option to ignore them.
Use this option with discretion.
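To illustrate what -nodocrobo suppresses, the following is a minimal Python sketch (not Verity Spider code) that extracts the embedded ROBOTS META directive from an HTML document; the class and sample page are hypothetical:

```python
# Illustrative sketch (not Verity Spider code): extracting the ROBOTS META
# directive that -nodocrobo tells the spider to ignore.
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the content values of <meta name="robots" ...> tags."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            d = dict(attrs)
            if d.get("name", "").lower() == "robots":
                self.directives.append(d.get("content", ""))

page = '<html><head><meta name="robots" content="noindex,nofollow"></head></html>'
parser = RobotsMetaParser()
parser.feed(page)
print(parser.directives)  # a directive an honoring indexer would obey
```

By default an indexer that honors these directives would neither index nor follow links from such a page; with -nodocrobo the directive is read but disregarded.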
-nofollow
Type
Web crawling only
Syntax
-nofollow "exp"
Specifies that Verity Spider cannot follow any URLs that match the exp expression. If you do not specify an exp value for the -nofollow option, Verity Spider assumes a value of "*", so that no documents are followed.
You can use wildcard expressions, where the asterisk (*) is for text strings and the question mark (?) is for single
characters. Always encapsulate the exp values in double-quotation marks to ensure that they are properly interpreted.
If you use backslashes, double them so that they are properly escaped; for example:
C:\\test\\docs\\path
To use regular expressions, also specify the -regexp option.
Earlier versions of Verity Spider did not allow the use of an expression. This meant that for each starting point URL,
only the first document would be indexed. With the addition of the expression functionality, you can now selectively
skip URLs, even within documents.
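The wildcard semantics described above can be sketched with Python's shell-style matcher, which treats * and ? the same way; this is an illustration of the matching behavior, not Verity Spider's actual engine, and the URLs are hypothetical:

```python
# Illustrative sketch of the -nofollow wildcard semantics:
# '*' matches any text string, '?' matches a single character.
from fnmatch import fnmatch

urls = [
    "http://example.com/docs/intro.htm",
    "http://example.com/cgi-bin/search",
    "http://example.com/docs/ch1.htm",
]

# With -nofollow "*cgi-bin*", URLs matching the expression are not followed.
followed = [u for u in urls if not fnmatch(u, "*cgi-bin*")]
print(followed)

# '?' matches exactly one character: "*ch?.htm" matches ch1.htm
# but not intro.htm.
print([u for u in urls if fnmatch(u, "*ch?.htm")])
```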
See also
-regexp
-norobo
Type
Web crawling only
Specifies to ignore any robots.txt files encountered. The robots.txt file is used on many websites to specify what parts
of the site indexers should avoid. The default is to honor any robots.txt files.
If you are reindexing a site and the robots.txt file has changed, Verity Spider deletes documents that have been newly
disallowed by the robots.txt file.
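The rules that the spider honors by default, and that -norobo tells it to ignore, can be sketched with Python's standard robots.txt parser; the robots.txt content and URLs here are hypothetical:

```python
# Illustrative sketch (not Verity Spider code) of evaluating robots.txt
# rules, which the spider honors by default and -norobo ignores.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Honoring robots.txt (the default): a disallowed path is skipped,
# an allowed path is indexed.
print(rp.can_fetch("*", "http://example.com/cgi-bin/search"))   # disallowed
print(rp.can_fetch("*", "http://example.com/docs/intro.htm"))   # allowed
```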
Use this option with discretion and extreme care, especially with the -cgiok option.
See also
-nodocrobo