Index Server was designed to be relatively free of administrative chores. In fact, because Index Server handles many administrative functions automatically, it can support operations 24 hours a day, 7 days a week, which is a must for most Web and intranet sites. All this can be accomplished with minimal human intervention.
While Index Server performs many self-administrative tasks, site and system administrators must be aware of a number of issues. There are also a number of administrative functions that can be performed manually to either augment or override Index Server's self-administrative functions.
This chapter begins with an explanation of how to create and modify virtual roots, followed by a discussion of directory-scanning methods. Next, the chapter discusses and demonstrates how to develop administrative tools and utilities using query and administrative script files. The remainder of the chapter presents information about security issues and error detection and recovery, and covers several methods for monitoring Index Server's performance using tools you develop as well as using the functionality built into Windows NT.
As you have learned in previous chapters, the use of virtual roots plays a large part in how documents are scanned, filtered, and indexed by Index Server. Therefore, it is important to understand how to create virtual roots and set their directory properties. Fortunately, this is easy to accomplish using the IIS Internet Service Manager application as follows:
Figure 12.6. The entry for the newly created virtual root.
Note that virtual directories can also be added by using Microsoft FrontPage to create new webs on your site (provided you are using the FrontPage server extensions for IIS). For example, we used FrontPage's web-creation wizards to create a sample web (called fpweb) to be administered by IIS. Figure 12.7 demonstrates how the physical paths to directories created by FrontPage and their corresponding virtual paths (aliases) have been added to the list of those used by the WWW service. Any content added to this web will subsequently be scanned and indexed by Index Server.
Figure 12.7. Physical- and virtual-path entries for directories created by Microsoft FrontPage.
To maintain an up-to-date index that reflects the most recent state of document content and properties on your site, it is necessary to perform a periodic inventory of the corpus. Directory scanning is the process by which this inventory is performed.
During a scan, all the files and subdirectories pointed to by virtual roots are checked for files that have been modified and should be indexed. All readable virtual roots are indexed by default (and thus undergo scanning). However, indexing of virtual roots can be explicitly enabled or disabled. Enabling indexing on virtual roots will be covered later in this chapter.
Scanning is a process that is performed in the background. Therefore, queries can still be executed during a scan, but query execution may run more slowly and result sets won't necessarily include recently added or modified documents.
Index Server performs two types of directory scanning:
Both methods of scanning can be performed automatically. Index Server determines the appropriate type of scan to perform and does it without any human intervention. It is also possible to perform these scans manually (that is, to force the scan). Forcing scans will be covered later in this chapter.
Full scans occur when every document (in virtual roots to be indexed) is added to the list of documents that are to be filtered and subsequently added to the index. Documents are added to this list regardless of whether they have been changed or not. As a result, the index for the corpus is completely rebuilt.
Full scans occur under the following situations:
Unlike full scans, which add all documents to the list of files to be filtered, incremental scans target only those documents that have been modified since the last time they were filtered. Only these modified documents are added to the list of files to be filtered. As a result, incremental scans are typically much quicker and less resource intensive. The goal of incremental scans is to update the current index rather than completely rebuild it.
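The difference between the two scan types can be sketched in a few lines of Python. This is only an illustration of the selection rule described above, not Index Server's actual implementation; the Doc record, its fields, and the sample corpus are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    path: str
    modified: float       # last modification time (hypothetical units)
    last_filtered: float  # time the document was last filtered (0 = never)


def full_scan(docs):
    """Queue every document for filtering, changed or not."""
    return [d.path for d in docs]


def incremental_scan(docs):
    """Queue only documents modified since they were last filtered."""
    return [d.path for d in docs if d.modified > d.last_filtered]


# Hypothetical corpus: a.htm is unchanged, b.doc was edited after filtering,
# and c.txt has never been filtered.
corpus = [
    Doc("a.htm", modified=100.0, last_filtered=200.0),
    Doc("b.doc", modified=300.0, last_filtered=200.0),
    Doc("c.txt", modified=50.0, last_filtered=0.0),
]
```

Here full_scan(corpus) queues all three documents, whereas incremental_scan(corpus) queues only b.doc and c.txt, which is why incremental scans are usually cheaper.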
Incremental scans occur under the following situations:
Figure 12.8. Default entry for the ForcedNetPathScanInterval registry entry, which is used to force incremental scans on remote shares without change notification.
Back in Chapter 7, "Internet Data Query Files," you were introduced to Internet Data Query (.idq) files. .idq files are used to help interpret, convert, and process queries received from HTML query forms. These .idq files provide a mechanism by which users can interact with HTML forms, enter query parameters, and pass variables that are ultimately used within the .idq file to establish the query parameters Index Server uses to resolve the user's request.
So what does this have to do with .ida scripts? Well, .ida scripts are really nothing more than specialized instances of .idq files. Like .idq files, .ida scripts can utilize variables defined in and passed from HTML forms as well as specify a desired .htx report template file. The primary difference between .ida scripts and .idq files is that .ida scripts are geared toward administrative operations, whereas .idq files are used primarily to satisfy user queries. As such, .ida scripts provide a few additional parameters specific to administrative tasks.
.ida files allow administrators to define a variety of parameters, such as:
These parameters can be set directly within the script or passed from HTML forms and substituted within the scripts using the same %. . .% syntax employed by .idq files. The latter method of setting script parameters provides a powerful mechanism for administrators to develop their own customized administrative support tools that integrate the functionality of .idq files, .ida scripts, and .htx report templates.
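To make the %. . .% substitution mechanism concrete, the following Python sketch mimics the idea outside of Index Server. It is an illustration only: the form variable names TemplateName and Operation are hypothetical, and leaving unknown names untouched is a choice made for this sketch rather than documented Index Server behavior.

```python
import re


def substitute(script_text, form_vars):
    """Replace each %name% with the matching form value.

    Names that have no matching form variable are left as-is
    (an assumption made for this sketch).
    """
    def repl(match):
        name = match.group(1)
        return form_vars.get(name, match.group(0))

    return re.sub(r"%(\w+)%", repl, script_text)


# Hypothetical script text and form variables.
script = "[Admin]\nCiTemplate=%TemplateName%\nCiAdminOperation=%Operation%\n"
result = substitute(
    script,
    {"TemplateName": "/scripts/test_web/showstate.htx", "Operation": "GetState"},
)
```

After substitution, result holds a script whose CiTemplate and CiAdminOperation lines carry the values supplied by the form.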
Because some operations performed by .ida scripts can cause changes in the state of the index, administrative operations are restricted based on the Windows NT access-control list (ACL) settings. Security issues are covered in more detail in subsequent sections of this chapter.
Note that .ida script files should not be placed on any virtual root that points to a remote share named using the uniform naming convention (UNC).
.ida scripts are composed of a single section, which is used to set parameters for performing administrative functions. The general template of an .ida script is shown in the following:
# .ida script
[Admin]
entry for required report template parameter
entries for other desired admin parameters
# end of .ida file
As previously stated, .ida scripts are used to support administrative requests. Variables defined in HTML forms can be passed (and used for substitutions within the script or "pass-throughs" to .htx files). Additionally, parameters can be explicitly set within the script. Table 12.1 lists the parameters available within .ida scripts.
Parameter Name | Parameter Description | Required? |
CiTemplate | Specifies the desired report template (.htx) file if the administrative operation is successful. See Chapter 8, "HTML Extension Files," for a detailed discussion on the use of HTML extension files. | YES |
CiAdminOperation | Used to specify the desired administrative operation. Valid values are GetState, ForceMerge, ScanRoots and UpdateRoots. GetState is the default if this parameter is not set in the script. | NO |
CiCatalog | Used to specify the location of the desired catalog. The value specified in the registry is the default if this parameter is not set in the script. | NO |
CiLocale | Used to specify the locale used for issuing queries and requests. Note that HTML locale encoding is supported. | NO |
Listing 12.1 shows a sample .ida script and demonstrates just how simple these scripts really are to develop. In this case, the script specifies that you want to obtain the current state of the Index Server and format the results with the specified .htx file.
Listing 12.1. A sample .ida script.
# sample .ida script
[Admin]
# Required report template parameter
CiTemplate=/scripts/test_web/showstate.htx
# Other parameters
CiCatalog=D:\
CiAdminOperation=GetState
# Use default locale
#CiLocale=
You have seen that .idq and .ida files can be employed to develop powerful, easy-to-use form-based query and administrative applications. In this section, you will look at some of the administrative tasks that can be performed using these files. These include:
Sometimes files are not filtered properly because they are corrupt or because there is a problem with a filter DLL in use. Administrators can obtain a listing of these files by invoking an .idq file that meets the following requirements:
For example, the following code snippet shows you how to create a form on an HTML page.
<FORM ACTION="/scripts/test_web/list_unfiltered.idq" METHOD="GET">
<INPUT TYPE="SUBMIT" VALUE="List Unfiltered Files">
</FORM>
When submitted by clicking the pushbutton, this form invokes the list_unfiltered.idq file shown in Listing 12.2.
Listing 12.2. The list_unfiltered.idq file invoked by the form.
[Names]
Unfiltered (DBTYPE_BOOL) = 49691c90-7e17-101a-a91c-08002b2ecda9 7
[Query]
CiScope=/
CiColumns=vpath, path, write
# Restriction required to list unfiltered files
CiRestriction=@unfiltered=TRUE
CiTemplate=/scripts/test_web/list_unfiltered.htx
CiMaxRecordsPerPage=100
# Don't assume index is up-to-date
CiForceUseCi=FALSE
Any unfiltered files that exist in indexed virtual roots would be listed using the specified .htx file. This allows administrators to identify and fix problem files and perhaps trace problems with filter DLLs.
While the Unfiltered property is not restricted to use by administrators, the fact that it must be explicitly included in the [Names] section of the .idq file prevents users from executing this type of query.
The Unfiltered property cannot be used with other properties when making a query. For example, the following query is not valid:
@Unfiltered=TRUE & @DocAuthor=Drew Kittel
Obtaining a list of virtual roots and the state of indexing on those roots is a simple matter of executing a special query using an .idq file and displaying the results with an .htx report template. Performing this special query requires the following:
Following is a brief example of how to implement a script that lists virtual roots and their state of indexing. The following code snippet can be used to create a form with a single pushbutton in an HTML page. When the pushbutton is clicked, the form invokes the listvroots.idq file.
<FORM ACTION="/scripts/test_web/listvroots.idq" METHOD="GET">
<INPUT TYPE="SUBMIT" VALUE="List Virtual Roots">
</FORM>
The listvroots.idq file code, which satisfies the special query requirements outlined earlier, is shown in Listing 12.3.
Listing 12.3. The listvroots.idq file code.
[Names]
# Query Metadata propset
MetaVRootUsed(DBTYPE_BOOL, 1) = 624c9360-93d0-11cf-a787-00004c752752 2
MetaVRootAuto(DBTYPE_BOOL, 1) = 624c9360-93d0-11cf-a787-00004c752752 3
MetaVRootManual(DBTYPE_BOOL, 1) = 624c9360-93d0-11cf-a787-00004c752752 4
MetaPropertyGuid(DBTYPE_GUID, 36) = 624c9360-93d0-11cf-a787-00004c752752 5
MetaPropertyDispId(DBTYPE_I4, 1) = 624c9360-93d0-11cf-a787-00004c752752 6
MetaPropertyName(DBTYPE_WSTR, 15) = 624c9360-93d0-11cf-a787-00004c752752 7
[Query]
# CiCatalog=D:\
# Columns to use in the .htx file
CiColumns=vpath, path, metavrootused, metavrootauto, metavrootmanual
# Restriction gets all virtual roots
CiRestriction=#vpath *
CiMaxRecordsInResultSet=20
CiMaxRecordsPerPage=20
# Indicate this is a special query
CiScope=VIRTUAL_ROOTS
# .htx file
CiTemplate=/scripts/test_web/listvroots.htx
# Don't assume the index is up-to-date
CiForceUseCi=FALSE
The detail section of the listvroots.htx file is shown in the following code snippet:
<%begindetail%>
<INPUT TYPE="HIDDEN" NAME="PROOT_<%vpath%>" VALUE="<%path%>">
<tr>
<td><%vpath%></td>
<td><%path%></td>
<td><%if metavrootused ne 0%>YES<%else%>NO<%endif%></td>
</tr>
<%enddetail%>
This code results in query results being formatted into a table as seen in Figure 12.9. Note that conditional logic is used in the detail section to check the value of the metavrootused property. If the value is non-zero (TRUE), the virtual root is currently indexed and YES is displayed for that root.
Figure 12.9. Using a special query to obtain a list of virtual roots.
There may come a time when you need to explicitly disable or enable indexing on a virtual root or roots. To do so, the following is required:
After changes have been submitted to Index Server, the new virtual-root information is compared with the existing virtual-root information. If the indexing state for the new root matches the existing indexing state, nothing is done. However, if they do not match, changes are made to the index as appropriate. Several changes can take place:
Following is a brief example of how virtual root enabling/disabling can be implemented. In this case, a single pushbutton HTML form (enablevroots.htm) invokes enablevroots.idq, which uses an .htx template to list virtual roots in a manner similar to the listvroots example. In this case, the enablevroots.htx file is used to list the virtual roots and to build and present a secondary form. For each virtual root listed, the detail section of the .htx file creates a hidden variable using NAME=PROOT_<%vpath%> and VALUE=<%path%>, and creates a check-box variable using NAME=INDEX_<%vpath%>. Conditional logic is used to determine whether the box should be checked when initially displayed. If the virtual root has indexing enabled, the box is checked. Finally, a pushbutton has been added, which invokes the .ida script enablevroots.ida when the form is submitted.
The pertinent code from the enablevroots.htx file is as follows:
<form action="/scripts/test_web/enablevroots.ida" METHOD=GET>
<table>
<tr>
<th width=150 align="left">Virtual Root</th>
<th width=150 align="left">Physical Root</th>
<th width=150 align="left">Currently Indexed?</th>
</tr>
<%begindetail%>
<INPUT TYPE="HIDDEN" NAME="PROOT_<%vpath%>" VALUE="<%path%>">
<tr>
<td><%vpath%></td>
<td><%path%></td>
<td align="center"><input type=checkbox <%if metavrootused ne 0%> checked <%endif%> NAME="INDEX_<%vpath%>"></td>
</tr>
<%enddetail%>
</table>
<input type="submit" value="Update Root Indexing">
</form>
The results of this code are shown in Figure 12.10.
Figure 12.10. .htx form used to enable/disable virtual roots. In this case, indexing for all virtual roots is enabled except for /e_books, which was previously indexed but has had indexing disabled by the administrator.
Indexing of virtual roots is enabled or disabled by checking or unchecking the box for each respective root and then clicking the pushbutton to submit the form. When the form is submitted, PROOT_virtual_root name/value pairs are submitted for all roots. However, INDEX_virtual_root variables are submitted only for those virtual roots selected to be indexed (that is, those whose check box is checked). If a box is left unchecked, no INDEX_virtual_root variable is submitted and the virtual root's indexing is disabled. The code for the enablevroots.ida script invoked is shown in Listing 12.4.
Listing 12.4. The code for the enablevroots.ida script.
[Admin]
CiTemplate=/test_web/enablevroots.htm
CiAdminOperation=UpdateRoots
This script specifies that the special variables sent from the form should be used to update the virtual roots used by Index Server to build the index.
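The way the PROOT_ and INDEX_ variables encode the requested indexing state can be illustrated with a short Python sketch that decodes a submitted query string. This is not how Index Server itself processes the request; it is only a model of the convention described above. The physical paths are hypothetical, and the value "on" is what browsers conventionally send for a checked box that has no VALUE attribute.

```python
from urllib.parse import parse_qs


def indexing_state(query_string):
    """Map each virtual root to its physical path and requested indexing state."""
    params = parse_qs(query_string, keep_blank_values=True)
    state = {}
    for name, values in params.items():
        if name.startswith("PROOT_"):
            vroot = name[len("PROOT_"):]
            state[vroot] = {
                "path": values[0],
                # A root is indexed only if its INDEX_ variable was submitted.
                "indexed": ("INDEX_" + vroot) in params,
            }
    return state


# Hypothetical submission: the box for / is checked, the box for /e_books is not.
qs = "PROOT_%2F=d%3A%5Cinetpub%5Cwwwroot&INDEX_%2F=on&PROOT_%2Fe_books=d%3A%5Ce_books"
roots = indexing_state(qs)
```

In this sample, / is reported as indexed and /e_books as not indexed, simply because no INDEX_/e_books variable accompanied its PROOT_ entry.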
There may occasionally be times when you want to force full or incremental scans on specific virtual roots. For example, installation of a new filter DLL might require virtual roots to be fully scanned, or an incremental scan might be forced on a remote share (one that does not support change notifications) if you decide not to wait for the automatic periodic scan. Forcing virtual-root scans requires the following:
Following is a brief example of how forced scans of virtual roots can be implemented. In this case, you use a single pushbutton HTML form (scanvroots.htm) to invoke scanvroots.idq, which uses an .htx template to list virtual roots in a manner similar to the enablevroots example in the previous section. In this case, the scanvroots.htx file is used to list the virtual roots and to build and present a secondary form. Conditional logic is used to list only those virtual roots with indexing enabled. For each indexed virtual root listed, the detail section of the .htx file creates a hidden variable using NAME=PROOT_<%vpath%> and VALUE=<%path%>, and creates a set of radio buttons (using NAME=SCAN_<%vpath%>) and selections (with values of NoScan, IncrementalScan, and FullScan representing the types of scans that can be performed). A pushbutton has been added, which invokes the .ida script scanvroots.ida when the form is submitted.
The pertinent code from the scanvroots.htx file is as follows:
<form action="/scripts/test_web/scanvroots.ida" METHOD=GET>
<table>
<tr>
<th width=150 align="left">Virtual Root</th>
<th width=150 align="left">Physical Root</th>
<th colspan=3 align="left">Type of Scan</th>
</tr>
<%begindetail%>
<INPUT TYPE="HIDDEN" NAME="PROOT_<%vpath%>" VALUE="<%path%>">
<%if metavrootused ne 0%>
<tr>
<td><%vpath%></td>
<td><%path%></td>
<td><input type=radio checked NAME="SCAN_<%vpath%>" value="NoScan">None</td>
<td><input type=radio NAME="SCAN_<%vpath%>" value="IncrementalScan">Incremental</td>
<td><input type=radio NAME="SCAN_<%vpath%>" value="FullScan">Full</td>
</tr>
<%endif%>
<%enddetail%>
</table>
<input type="submit" value="Scan Roots">
</form>
The results of this .htx file are shown in Figure 12.11.
Figure 12.11. .htx form used to force scanning of virtual roots.
You can force scanning on a listed virtual root by clicking the type of scanning you prefer and then clicking the pushbutton to submit the form. When the form is submitted, PROOT_virtual_root and SCAN_virtual_root name/value pairs are submitted for all roots. The code for the scanvroots.ida script invoked is shown in Listing 12.5.
Listing 12.5. The code for the invoked scanvroots.ida script.
[Admin]
CiTemplate=/test_web/scanvroots.htm
CiAdminOperation=ScanRoots
This script specifies that the special variables sent from the form should be used to force Index Server to perform the specified type of scan for each virtual root.
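Because the form uses METHOD=GET, the same request could in principle be assembled by a small utility rather than a browser. The Python sketch below builds such a URL from PROOT_ and SCAN_ pairs. It is a sketch under stated assumptions: the helper name scan_request_url and the physical path are hypothetical, and the URL is only constructed here, not sent.

```python
from urllib.parse import urlencode

# Scan types offered by the radio buttons in the scanvroots.htx form.
VALID_SCANS = {"NoScan", "IncrementalScan", "FullScan"}


def scan_request_url(ida_path, roots):
    """Build a GET URL for a scan request.

    roots maps a virtual root to a (physical path, scan type) tuple.
    """
    pairs = []
    for vroot, (path, scan) in roots.items():
        if scan not in VALID_SCANS:
            raise ValueError("unknown scan type: %r" % scan)
        pairs.append(("PROOT_" + vroot, path))
        pairs.append(("SCAN_" + vroot, scan))
    return ida_path + "?" + urlencode(pairs)


# Hypothetical request: force a full scan of /e_books.
url = scan_request_url(
    "/scripts/test_web/scanvroots.ida",
    {"/e_books": ("d:\\e_books", "FullScan")},
)
```

Validating the scan type before building the URL keeps a typo from silently producing a request with an unrecognized value.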
Typically, Index Server automatically performs master merges at the time specified by the registry entry MasterMergeTime or when the number of documents that have changed since the last merge exceeds the registry entry MaxFreshCount. To maintain a site that provides optimal query response, you might decide to force a master merge before Index Server institutes one automatically. Doing so consolidates any existing shadow indexes and the current master index into a single master, frees resources, removes redundant data from the indexes, and results in improved query performance and response.
Master-merge operations can be extremely CPU intensive and slow query response on your system for some period of time while the merge occurs. This is especially true for large document corpuses, which incur large numbers of changes.
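The two automatic triggers described above amount to a simple decision rule, sketched here in Python. This is a simplified model rather than Index Server's actual scheduling logic; in particular, treating MasterMergeTime as minutes after midnight is an assumption of this sketch, and the values used in any call are hypothetical.

```python
def master_merge_due(minutes_since_midnight, changed_docs,
                     master_merge_time, max_fresh_count):
    """Return True if either automatic master-merge trigger has fired.

    master_merge_time: scheduled time, assumed here to be minutes after midnight.
    max_fresh_count:   threshold on documents changed since the last merge.
    """
    scheduled = minutes_since_midnight == master_merge_time
    too_many_changes = changed_docs > max_fresh_count
    return scheduled or too_many_changes
```

For example, with a hypothetical threshold of 500, a count of 501 changed documents triggers a merge regardless of the time of day, whereas a count of exactly 500 does not.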
Forcing a master merge only requires that an .ida script be invoked and that the CiAdminOperation parameter be set to ForceMerge. For example, the following code snippet illustrates how an HTML form can be created.
<FORM ACTION="/scripts/test_web/forcemerge.ida" METHOD="GET">
<INPUT TYPE="SUBMIT" VALUE="Force Master Merge">
</FORM>
This form invokes the forcemerge.ida script when the pushbutton is clicked and the form is submitted. The code in the forcemerge.ida script is shown in Listing 12.6.
Listing 12.6. The code in the forcemerge.ida script.
[Admin]
CiTemplate=/test_web/forcemerge.htm
CiAdminOperation=ForceMerge
After the merge has commenced, its progress can be monitored using other .ida scripts, which you can develop, or by using the Windows NT Performance Monitor. These methods are discussed in the next section.
As previously stated, Index Server was designed to ease the administrative burden placed upon system and Web administrators. It does so by automatically performing many common administrative tasks such as scanning virtual roots, forcing merges, and recovering from many error conditions. There are, however, always circumstances and situations in which the system or Web administrator needs the ability to personally monitor the state and performance of Index Server. For example, you, as an administrator, might be interested in monitoring the following:
This and other information about the use and performance of Index Server can help you make decisions regarding:
Such decisions can ultimately profoundly affect how well your site services user requests as well as the level of traffic you can expect to adequately handle in the future.
There are a number of ways in which the state and performance of Index Server can be monitored, including:
A wealth of information about the state of Index Server is available by running an .ida script and specifying an .htx file that formats and displays the value of some special variables. To obtain this information, the .ida script must include the following lines of code:
CiAdminOperation=GetState
CiTemplate=path_to_desired_report_template
These lines of code simply tell Index Server to retrieve information about its current state and to make this information available in the form of special state variables, which are displayed using the specified .htx template file. A variety of variables are available. Most of these correspond with fields that are also available in the NT Performance Monitor Content Index and HTTP Content Index objects. Use of the NT Performance Monitor is covered later in this chapter. Available state variables are listed with descriptions in Tables 12.2 and 12.3.
.htx variable | Corresponding NT Performance Monitor field | Variable description |
CiAdminCacheActive | Active queries | Indicates the number of queries currently being executed. |
CiAdminCacheCount | Cache items | Indicates the number of queries cached. |
CiAdminCacheHits | % Cache hits | Indicates the percentage of HTTP requests that use an existing cached query. |
CiAdminCacheMisses | % Cache misses | Indicates the percentage of HTTP requests that execute a new query. |
CiAdminCachePending | Current requests queued | Indicates the number of pending queries that await execution. |
CiAdminCacheRate | Queries per minute | Indicates the rate at which queries are being serviced. |
CiAdminCacheRejected | Total requests rejected | Indicates the number of queries that were rejected because the query engine was too busy. |
CiAdminCacheTotal | Total queries | Indicates the number of queries that have been executed since the Web server was started. |
.htx variable | Corresponding NT Performance Monitor field | Variable description |
CiAdminIndexCountDeltas | Not applicable | Indicates the number of documents that have been indexed or deleted since the last merge occurred. |
CiAdminIndexCountFiltered | # Documents filtered | Indicates the number of documents that have been filtered since Index Server was started. |
CiAdminIndexCountPersIndex | Persistent indexes | Indicates the total number of shadow indexes and master indexes in the catalog. |
CiAdminIndexCountQueries | Running queries | This is a count of queries that have open cursors against the catalog. This count can differ from the number of active queries in the cache because 1) some cached queries may be enumerated (that is, non-indexed) and 2) some quiescent cached queries may still have cursors open. |
CiAdminIndexCountToFilter | Files to be filtered | Indicates the number of documents that have been added or modified since the last time they were filtered and thus require filtering. |
CiAdminIndexCountTotal | Total # documents | This indicates the total number of documents in the catalog. |
CiAdminIndexCountUnique | Unique Keys | This indicates the number of unique words (keys) in the catalog. This count is only updated after the completion of master-merge operations. |
CiAdminIndexCountWordlists | Word list | Indicates the current number of temporary word lists in the catalog. |
CiAdminIndexMergeProgress | Merge progress | Indicates the status of current merge operations. The current merge type can be ascertained by checking the value of CiAdminIndexStateAnnealingMerge, CiAdminIndexStateMasterMerge, or CiAdminIndexStateShadowMerge. A value of 100% indicates a completed merge. |
CiAdminIndexSize | Index size (in megabytes) | Indicates the size of the index and includes temporary word lists in memory as well as persistent shadow and master indexes on disk. This value does not, however, include the property cache. |
CiAdminIndexCountPendingScans | Not applicable | Indicates the number of directories remaining to be scanned. |
CiAdminIndexStateScanning | Not applicable | A value of TRUE indicates a directory scan is in progress. |
CiAdminIndexStateRecovering | Not applicable | A value of TRUE indicates that an index-recovery operation is in progress. |
CiAdminIndexStateAnnealingMerge | Not applicable | A value of TRUE indicates that an annealing merge is in progress. |
CiAdminIndexStateMasterMerge | Not applicable | A value of TRUE indicates that a master merge is in progress. |
CiAdminIndexStateScanRequired | Not applicable | A value of TRUE indicates that the catalog needs to be rebuilt. Index Server will automatically perform a rebuild when appropriate. |
CiAdminIndexStateShadowMerge | Not applicable | A value of TRUE indicates that a shadow merge is in progress. |
Using .ida scripts to obtain the Index Server state yields a "snapshot" of the server's state at a moment in time. Therefore, if you want to manually monitor the progression of server activities, you must run the script numerous times to display refreshed variable values.
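The repeated invocation can be automated with a small polling loop. The Python sketch below is one possible approach and is not part of Index Server; the function names are hypothetical, and the URL shown in the comment uses a made-up host name.

```python
import time
import urllib.request


def fetch_state(url):
    """Request the page produced by a GetState .ida script and return its bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def poll_state(fetch, interval_seconds, samples, sleep=time.sleep):
    """Call fetch() `samples` times, pausing between calls, and collect the results."""
    results = []
    for i in range(samples):
        results.append(fetch())
        if i < samples - 1:
            sleep(interval_seconds)  # no pause after the final sample
    return results


# Example usage (not executed here; the host name is hypothetical):
# url = "http://myserver/scripts/test_web/showstate.ida"
# snapshots = poll_state(lambda: fetch_state(url), interval_seconds=30, samples=10)
```

Passing the fetch and sleep functions as arguments keeps the loop easy to try out without a live server.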
While the need to perform a manual refresh of the server state might seem disadvantageous, using .ida scripts does provide a couple of distinct advantages:
Let's look at a brief example of using .ida scripts and .htx files to get a glimpse of the state of Index Server. Listing 12.7 illustrates a minimal HTML form, consisting of a single pushbutton, that is used to invoke an .ida script. Clicking the pushbutton on this form invokes the script shown in Listing 12.8.
Listing 12.7. A minimal HTML form consisting of a single pushbutton.
<HTML>
<HEAD>
<TITLE>Get Server State</TITLE>
</HEAD>
<BODY>
<FORM ACTION="/scripts/test_web/showstate.ida" METHOD="GET">
<INPUT TYPE="SUBMIT" VALUE="Show Server State">
</FORM>
</BODY>
</HTML>
Listing 12.8. Clicking the pushbutton on the form in Listing 12.7 invokes this .ida script.
[Admin]
# Required report template parameter
CiTemplate=/scripts/test_web/showstate.htx
This script simply specifies which report template to use. The administrative operation performed is set to GetState by default. Listing 12.9 shows the code for the indicated report template, showstate.htx.
Listing 12.9. The code for the indicated report template, showstate.htx.
<html>
<head>
<title>Index Server Stats</title>
</head>
<body>
<em>Documents in Catalog <%CiCatalog%></em>: <%CiAdminIndexCountTotal%><br>
<em>Size of Index (MB)</em>: <%CiAdminIndexSize%><br>
<em>Date/Time of Query</em>: <%CiQueryDate%> <%CiQueryTime%>
</body>
</html>
The results of this minimal report template can be seen in Figure 12.12. The template simply substitutes a few variable values (from the previously presented tables), then formats and displays the number of documents in the index, the index size, and the date/time the administrative request was made.
Figure 12.12. Results of an .ida script that obtains the current state of Index Server.
In the previous section, you learned how to develop .ida scripts for monitoring Index Server. The Windows NT Performance Monitor also provides a convenient method for monitoring Index Server. Using the Performance Monitor rather than .ida scripts allows the automatic refreshment of information, the presentation of state information in the form of graphs and charts, and the ability to log information.
NT Performance Monitor provides you with the ability to select the information you want to display from fields in an expansive list of performance objects. You can then customize how the information is presented and save desired configurations as "chart" files, which can then be re-used whenever you want to monitor Index Server. By selecting desired fields within the Performance Monitor Content Index and HTTP Content Index objects, you can monitor almost exactly the same information as when you use .ida scripts and the previously presented tables of .htx file variables.
The following steps detail how to set up and save a Performance Monitor configuration for monitoring Index Server:
Figure 12.17. Making selections in the Chart Options dialog.
Figure 12.18. Customized histogram chart used to monitor Index Server during scan and merge operations.
Once configured, customized charts can be saved to a file and used at a later time. To save a custom chart configuration, click File on the main menu bar and select Save Chart Settings As. You will be presented with a file browser dialog allowing you to name the chart file (the default extension is .pmc) and store it in a desired location. This capability is handy because several chart files can be constructed at one time. More importantly, each chart can be customized to meet your specific needs and requirements. For example, specific charts can be constructed to monitor the instantaneous load on the query engine (that is, the running and queued queries), %CPU in use, merge operations and status, index attributes (size, documents, filtering completed and in-progress), and so on. These files can then be recalled at a later time by clicking File on the main menu and selecting Open. The file browser dialog can then be used to load the desired chart file and render specific charts for monitoring Index Server.
System and Web administrators should periodically review the Windows NT Event Log entries as part of their everyday activities. The Windows NT Event Viewer allows you to review system-, security-, and application-event messages written to the log. This allows you to keep an eye on overall system health, and can provide additional insights, foreshadow impending system failures, serve as an audit trail of events that lead to significant failures, and provide status information about subsequent recovery attempts.
Index Server system errors and other pertinent events are reported in the Windows NT application event log under the CiFilterService category. System errors and events that are logged include:
A complete listing of Index Server messages written to the Windows NT application event log can be found in Appendix B, "Internet Data Query (.idq) File and HTML Extension (.htx) File Variables."
The following steps detail how to use the NT Event Viewer to view Index Server-related events:
Figure 12.21. Using the Event Viewer to show details of Index Server-specific events. The event-log message indicates that indexing has been initiated on d:\catalog.wci.
Figure 12.22. Using the Event Viewer to show details of Index Server-specific events. The event-log message indicates that a master merge on d:\catalog.wci has completed.
All queries to Index Server are logged through the standard IIS HTTP logging mechanism. IIS provides the capability to log information about HTTP requests to files on a daily, weekly or monthly basis. IIS also provides the capability to log this information directly to ODBC data sources.
IIS logging can provide you with important information such as:
Figure 12.23 illustrates Index Server query entries to an IIS daily log file.
Figure 12.23. Index Server query entries in an IIS log file. The highlighted entry shows invocation of a .ida script along with the query parameters sent.
This information allows you to infer the peak query times as well as which queries are candidates for optimization. It also helps you determine how the system might be tuned to improve overall performance.
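As a rough illustration of how such an analysis might be automated, the following Python sketch tallies Index Server requests per hour and records the longest processing time seen for each script. It assumes a comma-separated IIS log layout with the time in the fourth field, the processing time (in milliseconds) in the eighth, and the target in the fourteenth; verify these positions against your own log files before relying on it. The sample log lines are invented for illustration only.

```python
import csv
from collections import Counter

# Assumed 0-based field positions in the comma-separated IIS log format.
TIME_FIELD = 3      # e.g. "14:02:47"
ELAPSED_FIELD = 7   # processing time in milliseconds
TARGET_FIELD = 13   # e.g. "/scripts/query.idq"

# Script types that indicate an Index Server query or administrative request.
QUERY_EXTENSIONS = (".idq", ".ida")

# Invented sample entries; real logs will differ.
SAMPLE_LOG = [
    "10.0.0.5, -, 3/12/97, 9:15:02, W3SVC, WEB1, 10.0.0.1, 220, 310, 4100, 200, 0, GET, /scripts/query.idq, CiRestriction=merge,",
    "10.0.0.7, -, 3/12/97, 9:40:31, W3SVC, WEB1, 10.0.0.1, 850, 298, 9200, 200, 0, GET, /scripts/query.idq, CiRestriction=index,",
    "10.0.0.5, -, 3/12/97, 9:41:10, W3SVC, WEB1, 10.0.0.1, 15, 250, 1800, 200, 0, GET, /default.htm, -,",
    "10.0.0.9, -, 3/12/97, 14:02:47, W3SVC, WEB1, 10.0.0.1, 120, 305, 2100, 200, 0, GET, /scripts/admin.ida, -,",
]


def summarize_queries(lines):
    """Return (requests per hour, slowest request per target) for .idq/.ida requests."""
    per_hour = Counter()
    slowest = {}
    for row in csv.reader(lines, skipinitialspace=True):
        if len(row) <= TARGET_FIELD:
            continue  # skip blank or truncated lines
        target = row[TARGET_FIELD].strip().lower()
        if not target.endswith(QUERY_EXTENSIONS):
            continue  # not an Index Server script
        hour = int(row[TIME_FIELD].strip().split(":")[0])
        per_hour[hour] += 1
        elapsed = int(row[ELAPSED_FIELD].strip())
        slowest[target] = max(slowest.get(target, 0), elapsed)
    return per_hour, slowest


if __name__ == "__main__":
    per_hour, slowest = summarize_queries(SAMPLE_LOG)
    for hour in sorted(per_hour):
        print(f"{hour:02d}:00  {per_hour[hour]} request(s)")
    for target, elapsed in sorted(slowest.items(), key=lambda item: -item[1]):
        print(f"{target}  slowest request: {elapsed} ms")
```

Hours with the highest counts point to peak query times, and targets with the longest processing times are the first candidates for optimization.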
See your IIS documentation for more details about logging to files and ODBC data sources.
Advanced administrators can use IIS ODBC logging to augment administrative utilities they have developed using HTML forms, .idq files and .ida scripts, and .htx report templates. IIS provides a mechanism called the Internet Database Connector (IDC), which uses .idc files (which are very similar in concept to .idq files) to query ODBC data sources. The now familiar .htx report template file is then used to format and display results as HTML pages.
Rather than reviewing log files, you can develop .idc applications to extract specific Index Server query information logged to an ODBC data source. Because .idc applications can be invoked from HTML forms, they can be directly integrated with your .idq and .ida applications and invoked from the same HTML pages.
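To give a feel for the kind of SQL statement such an application might issue, the following Python sketch runs a summary query against an in-memory SQLite database that stands in for the ODBC log data source. The table name (inetlog) and column names (logtime, target, parameters, processingtime) are assumptions made for this example; use the names defined by your own IIS ODBC logging configuration. The sample rows are invented.

```python
import sqlite3

# Hypothetical table and column names; the real layout comes from
# your IIS ODBC logging setup.
SLOW_QUERY_SQL = """
    SELECT target, COUNT(*) AS hits, MAX(processingtime) AS worst_ms
    FROM inetlog
    WHERE target LIKE '%.idq' OR target LIKE '%.ida'
    GROUP BY target
    ORDER BY worst_ms DESC
"""


def make_sample_db():
    """Build an in-memory stand-in for the ODBC log table with invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inetlog ("
        "logtime TEXT, target TEXT, parameters TEXT, processingtime INTEGER)"
    )
    conn.executemany(
        "INSERT INTO inetlog VALUES (?, ?, ?, ?)",
        [
            ("1997-03-12 09:15:02", "/scripts/query.idq", "CiRestriction=merge", 220),
            ("1997-03-12 09:40:31", "/scripts/query.idq", "CiRestriction=index", 850),
            ("1997-03-12 09:41:10", "/default.htm", "-", 15),
            ("1997-03-12 14:02:47", "/scripts/admin.ida", "-", 120),
        ],
    )
    return conn


def slow_query_report(conn):
    """Return (target, hits, worst_ms) rows for Index Server scripts, slowest first."""
    return conn.execute(SLOW_QUERY_SQL).fetchall()


if __name__ == "__main__":
    for target, hits, worst_ms in slow_query_report(make_sample_db()):
        print(f"{target}: {hits} request(s), slowest {worst_ms} ms")
```

In an .idc application, a statement along these lines would be placed in the .idc file and its result columns formatted by an .htx template.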
Please consult the online IIS documentation for more detailed information about the use of IDC.
Index Server automatically detects and attempts to recover from several types of errors. Only two situations require significant human involvement: hardware failures and insufficient disk space on the catalog drive.
Following are some of the error conditions detected by Index Server and the resulting automatic recovery operations attempted (when possible):
There is no way to stop only Index Server. It is only stopped when Internet Information Server or Peer Web Services is stopped. Also, Index Server is not started when IIS or PWS is started. Instead, a query must be issued against the particular catalog to restart indexing.
When automatic recoveries occur, check event-log messages to make sure that the appropriate virtual roots are enabled for indexing. If some virtual roots previously had indexing disabled, you will need to explicitly disable indexing on these roots again.
The fact that security is covered in the final sections of this chapter is not meant to imply that security issues regarding Index Server are not important. In fact, the opposite is true. Security is a major concern for all system and Web administrators. This is especially true for sites connected to the Internet and/or sites that contain sensitive information, some of which may be selectively available to privileged users. Safeguarding system security, ensuring data privacy, and maintaining the integrity of content on a site is of vital importance.
This section briefly discusses security issues as they relate to Index Server. Because Index Server is so tightly integrated with Internet Information Server and Windows NT security, we highly recommend that you read the system documentation for these products.
The following sections cover security issues such as:
Index Server is tightly integrated with NT security, and can thus take advantage of NT security features such as access control lists (ACLs). To use these features, place any information to be published, indexed, and queried by Index Server on an NTFS storage volume. These features are NOT available for files stored on FAT-formatted volumes.
It is extremely important that appropriate ACLs be placed on the following files as well as in the directories in which these files reside:
Securing the initial query form is not enough. If .idq, .ida, and .htx files are not adequately protected with appropriate file and directory ACLs, sophisticated users can bypass the query form and access these other files directly. The results can be especially damaging if malevolent users are able to gain access to .ida administrative scripts.
The authentication of Index Server clients depends on IIS authentication methods. Implementing some form of client authentication is often the easiest way to place some form of access control on HTML forms used to issue queries against Index Server. Three forms of authentication are supported by IIS:
Catalog and registry files should also be placed on NTFS volumes and should have ACL entries set to ensure that only the administrator has the ability to peer into or modify these files. When Index Server is installed, the ACL for the catalog is set up to allow access only by system administrators and Windows NT system services. This is done for the following reasons:
If multiple catalog configurations are used, take great care in ensuring that these additional catalogs and their index files are given the appropriate level of access control. Catalog directories should be given access for administrators and for the system account.
Access-control information for all indexed documents is placed in the catalog and checked against the user's permissions when a query is submitted. Any document to which a user does not have access will NOT be reported as part of the result set returned to the user. To these unauthorized users, it appears as though the documents do not exist. Additionally, document audit records can be generated as follows:
Be aware that when using Index Server to index and access documents on a virtual root pointing to a remote share, access to the virtual root and any documents in it is controlled by the account configured to access the remote share. Any documents that are accessible by the account will be indexed and are thus available to any client that connects to IIS and Index Server. Access checks against query results are explicitly disabled in this case. This may allow unintended access to some documents on the remote share, so be cognizant of the access privileges held by the account that performs the remote share access.
If you configure Index Server to index remote virtual roots, documents on the remote root may be disclosed to unauthorized clients unintentionally. This can occur because the username/password that the Web administrator specified in the Internet Service Manager Directories property sheet is used for all access to the remote share.
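The difference between local and remote virtual roots can be pictured with a small model. The Python sketch below is a deliberate simplification, not a description of how Index Server is implemented: the IndexedDoc class, its fields, and the visible_results function are hypothetical names invented for this example. It shows only the behavior described above, in which local documents are filtered by the user's permissions while documents indexed through a remote share are reported to every client.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedDoc:
    """Hypothetical stand-in for a catalog entry."""
    path: str
    readers: frozenset      # users the document's ACL allows to read it
    remote: bool = False    # True if indexed through a remote virtual root


def visible_results(hits, user):
    """Return the paths a query by `user` would report under this simplified model."""
    visible = []
    for doc in hits:
        # Local documents are checked against the user's permissions.
        # Remote documents skip the check, as described in the text above.
        if doc.remote or user in doc.readers:
            visible.append(doc.path)
    return visible


if __name__ == "__main__":
    hits = [
        IndexedDoc("/docs/public.htm", frozenset({"alice", "bob"})),
        IndexedDoc("/docs/payroll.doc", frozenset({"alice"})),
        IndexedDoc("/remote/plans.doc", frozenset({"alice"}), remote=True),
    ]
    print(visible_results(hits, "bob"))
```

In this model, bob cannot see payroll.doc on the local root but does see plans.doc on the remote root, even though its ACL names only alice. This is the unintended disclosure to guard against when choosing the account used for the remote share.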
Because certain Index Server administrative functions can be performed over the Web, it is necessary to adequately control all administrative accesses. Index Server administrative operations are controlled by the ACL that is placed on the following registry key:
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\ContentIndex
You should therefore be certain that the ACL on this key is appropriately set.
As previously mentioned, you should also place appropriate additional ACL access controls on .ida, .idq, and .htx files used for the administration of your site.
You certainly covered a lot of ground in this chapter! Hopefully, it has given you a much greater understanding of issues pertaining to Index Server administration. In this chapter, you learned how to create virtual roots, and you learned about the types of directory scanning Index Server performs. You also discovered the power of .ida scripts and how they can be used in conjunction with knowledge of .idq and .htx files to create your own customized administrative tools and utilities. After that, you were introduced to the variety of methods that can be employed to monitor the state and performance of Index Server. Finally, you learned about Index Server error detection and recovery as well as a variety of security issues of which you should be aware as an administrator.