Google Search Appliance Documentation

Managing Search for Controlled-Access Content
Use Cases with Public and Secure Serve for Multiple Authentication Mechanisms

This section explains in more detail how to set up crawl for controlled-access content and how to configure the Google Search Appliance to centralize serve-time authentication.

Use Case 1: HTTP Basic or NTLM HTTP Controlled-Access Content with Public Serve

The ABC Company wants to make its controlled-access content discoverable using intranet search. The content is stored on these internal servers:

events.abc.int is a simple web server that uses HTTP Basic authentication. This server contains information about internal company events.
announce.abc.int is a Microsoft IIS web server that uses Integrated Windows Authentication over NTLM HTTP. This server contains announcements for employees.
directory.abc.int is another Microsoft IIS server. This server provides phone and office location information about employees. For the purpose of this example, let’s suppose that content from this server is best provided by a web feed.

All these servers are located on the same domain, abc_corp. Although authentication is required by each of these servers, this information isn’t sensitive. ABC Company wants to serve the snippet results as public content, viewable by any employee. There is no reason to require the search appliance to perform document-level authentication when serving results.

ABC Company has these people who interact with this content:

Setting up Crawl and Index

First, the system administrator creates a user account for the search appliance, called ABCsearch, and sets up access policies that authorize the ABCsearch account to view all files on events.abc.int and announce.abc.int. The feed process on directory.abc.int has its own account with similar permissions, called ABCfeeder.

Next, Sandra, the search appliance administrator, logs in to the Admin Console and performs these actions:

1. To provide the search appliance with credentials for crawl and index, Sandra opens Content Sources > Web Crawl > Secure Crawl > Crawler Access, and adds rows using the account names and passwords given to her by the system administrator:

URL pattern                  Username    In Domain   Password   Confirm password   Make Public
https://events.abc.int/      ABCsearch   (empty)     ******     ******             X
https://announce.abc.int/    ABCsearch   abc_corp    ******     ******             X
https://directory.abc.int/   ABCfeeder   abc_corp    ******     ******             X

Here, omitting the domain for events.abc.int instructs the search appliance to authenticate using HTTP Basic. For the other servers in this example, the domain entry tells the search appliance to authenticate against a Microsoft IIS server using NTLM HTTP.
Because Basic Authentication sends credentials as Base64-encoded text, which is trivially decoded, the pattern for events.abc.int uses HTTPS to protect user names and passwords. Although HTTPS is recommended for Basic Authentication, the search appliance can also authenticate over HTTP. Make Public is selected for all URL patterns.
2. Under Content Sources > Web Crawl > Start and Block URLs, Sandra clicks Add under Start URLs and adds the URL patterns "https://events.abc.int/" and "https://announce.abc.int/".
3. Sandra also adds the URL patterns "https://events.abc.int/", "https://announce.abc.int/", and "https://directory.abc.int/" under Follow Patterns.
4. Finally, she clicks Save to save the changes.
5. She pushes a web feed to the appliance that includes the URLs from directory.abc.int, using the following syntax:
<record url="https://directory.abc.int/" authmethod="ntlm">
Because the record has authmethod="ntlm", the search appliance attempts to authenticate using NTLM HTTP when crawling this content.
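
For illustration, a feed document containing such records could be generated with a short script. This is a minimal sketch, assuming the gsafeed layout (header, datasource, feedtype, group, record) described in the Feeds Protocol Developer's Guide. The data source name abc_directory and the two page URLs are hypothetical. The URLs use HTTPS so that they match the Crawler Access pattern for directory.abc.int.

```python
import xml.etree.ElementTree as ET


def build_web_feed(datasource, urls, authmethod="ntlm"):
    """Build a feed document whose records all carry the given authmethod."""
    feed = ET.Element("gsafeed")
    header = ET.SubElement(feed, "header")
    ET.SubElement(header, "datasource").text = datasource
    # Assumption: "metadata-and-url" is the feed type used for a web feed.
    ET.SubElement(header, "feedtype").text = "metadata-and-url"
    group = ET.SubElement(feed, "group")
    for url in urls:
        ET.SubElement(
            group,
            "record",
            {"url": url, "mimetype": "text/html", "authmethod": authmethod},
        )
    return ET.tostring(feed, encoding="unicode")


# Hypothetical data source name and page URLs, for illustration only.
feed_xml = build_web_feed(
    "abc_directory",
    [
        "https://directory.abc.int/phones.html",
        "https://directory.abc.int/offices.html",
    ],
)
print(feed_xml)
```

Setting authmethod once for the whole feed keeps every record consistent with the NTLM credentials entered for ABCfeeder.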

Now that the search appliance has access to all of ABC Company’s controlled-access content, the search appliance administrator starts the crawl and waits for that content to appear in the index.

Populating the Index for Controlled-Access Content

During crawl, the search appliance goes through each of the content sources that have been configured, and uses the credentials under Crawler Access to obtain the controlled-access content.

The search appliance can use multiple protocols to crawl and index controlled-access content.

The search appliance connects to events.abc.int over HTTPS. The web server asks for credentials using HTTP Basic Authentication: the search appliance provides the username “ABCsearch” and the password entered in the Admin Console. The web server verifies that ABCsearch has access to view documents on events.abc.int. The search appliance crawls through all documents on events.abc.int and adds them to the index.
The search appliance connects to announce.abc.int over HTTPS. The Microsoft IIS server asks for credentials using Windows Authentication: the search appliance provides an NTLM HTTP message that contains the username “ABCsearch” and a response based on the password entered in the Admin Console. The IIS server verifies that ABCsearch has access to view documents on announce.abc.int. The search appliance crawls through all documents on announce.abc.int and adds them to the index.
The search appliance receives a web feed that directs it to directory.abc.int with authmethod=ntlm. It connects to directory.abc.int over HTTPS. The Microsoft IIS server asks for credentials using Windows Authentication: the search appliance provides an NTLM HTTP message that contains the username “ABCfeeder” and a response based on the password entered in the Admin Console. The IIS server verifies that ABCfeeder has access to view documents on directory.abc.int. The search appliance crawls through all documents on directory.abc.int and adds them to the index.
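
The way a Crawler Access row determines the authentication method can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the appliance's implementation: it treats each URL pattern as a plain prefix, picks the longest matching prefix, and uses a placeholder password. The class and function names are hypothetical. It also shows why HTTPS matters for HTTP Basic: the header value decodes straight back to the user name and password.

```python
import base64
from dataclasses import dataclass


@dataclass
class CrawlerAccessRule:
    pattern: str               # URL prefix, for example "https://events.abc.int/"
    username: str
    password: str
    domain: str = ""           # empty means HTTP Basic; a value means NTLM HTTP
    make_public: bool = False


def find_rule(url, rules):
    """Return the rule with the longest matching prefix, or None."""
    matches = [r for r in rules if url.startswith(r.pattern)]
    return max(matches, key=lambda r: len(r.pattern), default=None)


def auth_method(rule):
    """An empty domain selects HTTP Basic; otherwise NTLM HTTP is used."""
    return "ntlm" if rule.domain else "basic"


def basic_auth_header(rule):
    """Build the Authorization header value for HTTP Basic."""
    raw = f"{rule.username}:{rule.password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


# Placeholder passwords; the real ones are entered in the Admin Console.
rules = [
    CrawlerAccessRule("https://events.abc.int/", "ABCsearch", "example-password",
                      make_public=True),
    CrawlerAccessRule("https://announce.abc.int/", "ABCsearch", "example-password",
                      domain="abc_corp", make_public=True),
    CrawlerAccessRule("https://directory.abc.int/", "ABCfeeder", "example-password",
                      domain="abc_corp", make_public=True),
]

events_rule = find_rule("https://events.abc.int/all-hands.html", rules)
announce_rule = find_rule("https://announce.abc.int/promotions.html", rules)
print(auth_method(events_rule), auth_method(announce_rule))

# Base64 is an encoding, not encryption: decoding needs no key.
header = basic_auth_header(events_rule)
decoded = base64.b64decode(header.split(" ", 1)[1]).decode("utf-8")
print(decoded)
```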

Serving Controlled-Access Content to the User as Public Content

ABC Company has decided to make the search results public: the events, announce, and directory servers control access to their content, but employees can discover the information they need by performing a search query.

Eric is an employee of ABC Company. He wants to find an announcement about a colleague’s recent promotion to Director. Eric opens the search page in a web browser and enters the query “Maria Jones director”. The search appliance performs the following steps before sending Eric to the search results page:

1. The search appliance checks to see whether any of the content sources require authorization. Although the search appliance had to provide credentials to index the content, the Make Public? checkbox is selected for all of ABC Company’s content sources. All content in the index is labeled as public: no authorization check is required.
3. Eric sees search results from events.abc.int, announce.abc.int, and directory.abc.int that match the query “Maria Jones director”. For instance, Eric finds an all-hands meeting that Maria scheduled from events, a notice about her promotion from announce, and her office phone number and location from directory.

When Eric clicks on one of the links in the search results page, the server that hosts the page requests a response that includes an authentication header. If Eric hasn’t logged in elsewhere, he’ll have to enter a username and password on a login form. Although the search appliance indexed the content as “public,” the server still requires credentials before it displays the full document.

The next time that Eric clicks a link on his search results page, however, his browser forwards an authentication header based on his user name and password to the server. If all the servers in this example are on the same domain and accept the same credentials, Eric shouldn’t have to log in again for as long as he keeps the browser open and the session time hasn’t expired.

Use Case 2: One Set of Credentials for Multiple Authentication Mechanisms

AlphaLyon is a multinational corporation with several content servers that use different authentication mechanisms.

http://insidealpha.com is the URL for content protected by a single sign-on (SSO) server.
apacheserver.alphainside.com is a server for content protected by a custom Apache script that uses cookies from the SSO system.
comp.alpha.int is a simple web server that uses HTTP Basic authentication. This server hosts some personnel information from North America.
pers.def.int is a Microsoft IIS web server that uses NTLM v2 HTTP. This server hosts global personnel information, excluding North America.
AlphaLCM is a connector manager with one connector instance that is used to traverse and index information (including some global personnel information) from AlphaLyon’s Documentum content management system.

There is a single corporate-wide set of credentials for each employee.

Currently, when employees search for protected personnel information, they are prompted for their credentials by each authentication mechanism separately. AlphaLyon’s Information Technology department has set an objective to centralize serve-time authentication for the various servers hosting personnel information. This way, users need to provide their credentials only once for content protected by several authentication mechanisms.

AlphaLyon has these people who interact with this content:

This use case is based on the assumption that Tanya has added a connector for Documentum and the content from the CMS has been traversed and fed into the search appliance. For information about adding connectors, see Introducing Connectors.

Setting Up Crawl and Index

Ashish, the system administrator, creates a user account for the search appliance, called ALSearch, and sets up access policies that authorize the ALSearch account to view all files on comp.alpha.int and pers.def.int.

Next, Tanya sets up crawl and index of the controlled-access content by performing the following steps:

1. To provide the search appliance with credentials for crawling and indexing comp.alpha.int, which is protected by HTTP Basic Authentication, and pers.def.int, which uses NTLM HTTP, Tanya opens Content Sources > Web Crawl > Secure Crawl > Crawler Access.

URL pattern              Username   In Domain       Password   Confirm password   Make Public
http://comp.alpha.int/   ALSearch   (empty)         *******    *******            (cleared)
https://pers.def.int/    ALSearch   aphalyon_corp   *******    *******            (cleared)

Tanya uses the account name and password for ALSearch that Ashish, the system administrator, provided. Note that, for http://comp.alpha.int/, the In Domain text box is empty. The empty domain instructs the search appliance to authenticate using HTTP Basic. For https://pers.def.int/, Tanya supplies the domain, which tells the search appliance to authenticate against the server using NTLM HTTP.
The Make Public checkbox is also cleared. The search appliance has full access to the servers, but labels any results from them as “secure” and requires authentication and authorization checks before displaying secure content in the search results.
3.
4. Next, Tanya needs to provide the search appliance with credentials for crawling and indexing content protected by single sign-on systems (http://insidealpha.com and apacheserver.alphainside.com), so she opens Content Sources > Web Crawl > Secure Crawl > Forms Authentication.
5. In the Sample Forms Authentication protected URL box, Tanya enters http://insidealpha.com/inside.html.
6. In the URL Pattern for this rule box, Tanya enters http://insidealpha.com/ and clicks Create a New Forms Authentication Rule.
The search appliance stores the rule for use in crawl for all content under http://insidealpha.com/. When a cookie expires, the search appliance uses the stored crawler account to request a new session cookie.
8. Next, Tanya uses the Content Sources > Web Crawl > Secure Crawl > Forms Authentication page to add credentials for crawling and indexing apacheserver.alphainside.com. In the Sample Forms Authentication protected URL box, Tanya enters apacheserver.alphainside.com/alphainsider.html.
9. In the URL Pattern for this rule box, Tanya enters apacheserver.alphainside.com/ and clicks Create.
The search appliance stores the rule for use in crawl for all content under apacheserver.alphainside.com/. When a cookie expires, the search appliance uses the stored crawler account to request a new session cookie.
12.
13. Tanya clicks Add under Start URLs and adds the following URL patterns:
14. Tanya also adds these URL patterns in the Follow Patterns box and clicks Save.
15. To check that the crawling system is currently running, Tanya opens Content Sources > Diagnostics > Crawl Status. The crawl status indicates that the crawl system is running.

Now that the search appliance has access to all this protected content, it can populate the index, as described in the following section.

Populating the Index with Controlled-Access Content

During crawl, the search appliance goes through each of the content sources that have been configured, and obtains the controlled-access content by using the HTTP Basic and NTLM HTTP credentials configured on Content Sources > Web Crawl > Secure Crawl > Crawler Access and the forms authentication credentials configured on Content Sources > Web Crawl > Secure Crawl > Forms Authentication.

For content on comp.alpha.int, which is protected by HTTP Basic Authentication:

1. The search appliance connects to http://comp.alpha.int/.
3. The search appliance provides the username “ALSearch” and the password entered in the Admin Console.
4. The web server verifies that ALSearch has access to view documents on comp.alpha.int.
5. The search appliance crawls through all documents on comp.alpha.int and adds them to the index.

For content on pers.def.int, which is protected by NTLM HTTP:

1. The search appliance connects to pers.def.int over HTTPS.
3. The search appliance provides an NTLM HTTP message that contains the username “ALSearch” and a response based on the password entered in the Admin Console.
4. The IIS server verifies that ALSearch has access to view documents on pers.def.int. The search appliance crawls through all documents on pers.def.int and adds them to the index.

For content on http://insidealpha.com and apacheserver.alphainside.com, which are protected by forms authentication:

1.
4. The web server verifies that the crawler account has access to view documents in the controlled-access directory.
5. The search appliance crawls through all documents on http://insidealpha.com/ and adds them to the index. Because these documents were accessed through a forms authentication rule with Make Public cleared, they are labeled as secure in the index.
6. Next, the search appliance connects to apacheserver.alphainside.com/ and repeats steps 2 through 5 by interacting with the Apache server.

When the crawl completes, the index contains content from the sources.
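
The cookie renewal behavior described for forms authentication rules (when a cookie expires, the stored crawler account is used to request a new session cookie) can be sketched as follows. This is a simplified model, not the appliance's code: the class name, the login callable, and the 60-second cookie lifetime are all hypothetical, and a fake clock replaces real time so that the example is deterministic.

```python
import time
from typing import Callable, Optional, Tuple


class FormsAuthSession:
    """Caches a session cookie and logs in again once it has expired."""

    def __init__(self, login: Callable[[], Tuple[str, float]],
                 clock: Callable[[], float] = time.time):
        self._login = login          # returns (cookie value, lifetime in seconds)
        self._clock = clock
        self._cookie: Optional[str] = None
        self._expires_at = 0.0
        self.logins = 0

    def cookie(self) -> str:
        """Return a valid cookie, requesting a new one only when needed."""
        if self._cookie is None or self._clock() >= self._expires_at:
            self._cookie, lifetime = self._login()
            self._expires_at = self._clock() + lifetime
            self.logins += 1
        return self._cookie


# Fake login and clock, for illustration only.
now = [0.0]
issued = [0]


def fake_login():
    issued[0] += 1
    return f"session-{issued[0]}", 60.0


session = FormsAuthSession(fake_login, clock=lambda: now[0])
first = session.cookie()     # no cookie yet, so the crawler account logs in
now[0] = 30.0
second = session.cookie()    # cookie still valid, so it is reused
now[0] = 61.0
third = session.cookie()     # cookie expired, so a new login is performed
print(first, second, third, session.logins)
```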

Setting Up Serve

To centralize serve-time authentication for the protected content, Tanya, the search appliance administrator, configures the Default credential group:

1. First, to add the single sign-on server http://insidealpha.com to the credential group, Tanya opens Search > Secure Search > Universal Login Auth Mechanisms > Cookie.
2. Tanya types http://insidealpha.com/inside.html, a sample URL for the site, in the Sample URL box. Options for adding another cookie-based domain appear on the page. The Default credential group is already selected.
3.
4. Next, to add apacheserver.alphainside.com, Tanya types apacheserver.alphainside.com/alphainsider.html, a sample URL for the content protected by a custom Apache script, in the Sample URL box and clicks Save.
5. Next, to add the comp.alpha.int web server, which uses HTTP Basic authentication, to the credential group, Tanya clicks the HTTP tab.
6. Tanya types http://comp.alpha.int/na.html, a sample URL for the site, in the Sample URL box and clicks Save.
7. To add pers.def.int, which uses NTLM HTTP authentication, Tanya clicks the NTLM checkbox, types pers.def.int/emea.html in the Sample URL box, and clicks Save.

Serving Controlled-Access Content to a User with One Set of Credentials

Joseph is a manager who wants to gather all the personnel records for Pat Smith, an employee who recently joined Joseph’s group from another department. Several systems in the Default credential group contain information about Pat Smith.

The following steps give an overview of the process of serving controlled-access content with the Default credential group configured.

1.
2. The Universal Login Form checks the existing cookies that Joseph already has to see whether the credential group is already satisfied.
4. Joseph enters his username and password on the Universal Login Form and clicks Login.
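
The idea behind a single credential group, where the user is prompted once and the same credentials are checked against every mechanism in the group, can be sketched as below. This is a conceptual illustration only. The function names, the mechanism labels, and the in-memory account table are hypothetical stand-ins for the real SSO, HTTP Basic, and NTLM checks.

```python
from typing import Callable, Dict, Tuple

Verifier = Callable[[str, str], bool]


def login_once(prompt: Callable[[], Tuple[str, str]],
               mechanisms: Dict[str, Verifier]) -> Dict[str, bool]:
    """Ask for credentials a single time and verify them with each mechanism."""
    username, password = prompt()
    return {name: verify(username, password) for name, verify in mechanisms.items()}


def make_verifier(accounts: Dict[str, str]) -> Verifier:
    """Stand-in for a real mechanism: accept only known user/password pairs."""
    return lambda user, pw: accounts.get(user) == pw


# Hypothetical corporate-wide account shared by all four mechanisms.
corporate_accounts = {"joseph": "example-password"}
default_group = {
    "cookie:insidealpha.com": make_verifier(corporate_accounts),
    "cookie:apacheserver.alphainside.com": make_verifier(corporate_accounts),
    "http-basic:comp.alpha.int": make_verifier(corporate_accounts),
    "ntlm:pers.def.int": make_verifier(corporate_accounts),
}

prompts_shown = []


def prompt():
    prompts_shown.append("login form")
    return "joseph", "example-password"


results = login_once(prompt, default_group)
print(len(prompts_shown), results)
```

Because there is one corporate-wide set of credentials per employee, a single prompt is enough for all four mechanisms in this sketch.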

Use Case 3: Two Sets of Credentials for Two Connectors

AlphaLyon, from use case 2 (see Use Case 2: One Set of Credentials for Multiple Authentication Mechanisms), has acquired ABC Company, from use case 1 (see Use Case 1: HTTP Basic or NTLM HTTP Controlled-Access Content with Public Serve). Content for the merged companies is managed by two different content management systems (CMSs).

Employees of the merged companies have two corporate-wide sets of credentials:

AlphaLyon’s IT department wants to centralize serve-time authentication for both systems, using both sets of credentials.

AlphaLyon has these people who interact with this content:

This use case assumes that Tanya has added connectors for Documentum and Livelink and the content from the CMSs has been traversed and fed into the search appliance. For information about adding connectors, see Introducing Connectors.

Creating a Credential Group

Tanya needs to configure two credential groups, one credential group for each of the connectors. However, because she is going to configure the Default credential group for Documentum, she only needs to create one additional credential group, for Livelink.

1. Tanya opens Search > Secure Search > Universal Login.
4. Tanya does not click Require a user-name for this credential group? because no ACLs need it.
5. Tanya checks Group is optional? because not everyone has a login to this credential group.
6.

Adding Connectors to the Credential Groups

Next, Tanya configures the Default credential group and the ABCLivelink credential group by adding the connectors to each group:

1. First, to add the Documentum connector to the Default credential group, Tanya clicks Search > Secure Search > Universal Login Auth Mechanisms > Connectors.
2. Tanya types AlphaCM, a mechanism name for this entry, in the Mechanism Name box.
3.
4. Next, to add the Livelink connector to the ABCLivelink credential group, Tanya creates a new entry by selecting the ABCLivelink credential group from the pull-down menu, typing a Mechanism Name, and clicking Save.

Serving Controlled-Access Content to a User with Two Sets of Credentials

Leslie is an employee who works on the “Island” project. She began working on this project at ABC Company and continues to work on it after the merger. Both the Documentum and Livelink CMSs have information about this project. Leslie wants to view information about project Island from both systems.

The following steps give an overview of the process of serving controlled-access content with two credential groups (Default and ABCLivelink) configured.

1.
2. The Universal Login Form checks to see whether the two credential groups are already satisfied.
3. The search appliance prompts Leslie for her user credentials (user name and password) for both systems by presenting the Universal Login Form with two logins: one for the system in the Default credential group and one for the system in the ABCLivelink credential group.
4. Leslie enters her two usernames and passwords on the Universal Login Form and clicks Login.
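
The interaction between the two credential groups and the Group is optional? setting can be modeled with a small sketch. The rules encoded here are assumptions made for illustration: a group that is not yet satisfied gets a login on the form, and the login counts as complete once every non-optional group is satisfied. Treating the Default group as required is also an assumption, and the names CredentialGroup, groups_to_prompt, and login_complete are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class CredentialGroup:
    name: str
    optional: bool = False


def groups_to_prompt(groups: Iterable[CredentialGroup], satisfied: Set[str]) -> List[str]:
    """Groups that still need a login on the Universal Login Form."""
    return [g.name for g in groups if g.name not in satisfied]


def login_complete(groups: Iterable[CredentialGroup], satisfied: Set[str]) -> bool:
    """Assumed rule: every non-optional group must be satisfied."""
    return all(g.optional or g.name in satisfied for g in groups)


groups = [
    CredentialGroup("Default"),                      # assumed to be required
    CredentialGroup("ABCLivelink", optional=True),   # Group is optional? checked
]

# Leslie has no session yet, so the form shows both logins.
print(groups_to_prompt(groups, set()))

# A user with only the Default login can still proceed.
print(login_complete(groups, {"Default"}))

# A user who supplies only the optional group's credentials cannot.
print(login_complete(groups, {"ABCLivelink"}))
```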

Use Case 4: Windows Authentication with Kerberos Tickets for Secure Serve

AlphaLyon has decided to upgrade older servers and implement a new security policy that uses Integrated Windows Authentication (IWA) on all machines throughout its internal domain. The domain controller is a Windows server named hal.alphalyon.com.

AlphaLyon is going to upgrade the following servers:

products.alphalyon.int is a simple web server that uses HTTP Basic authentication. This server contains information about the company’s products.
news.alphalyon.int is a Microsoft IIS web server that uses NTLM HTTP. This server contains news announcements.
emp.alphalyon.int is another Microsoft IIS server that uses NTLM HTTP. It provides internal information about employees, such as email addresses and phone numbers.
sales.alphalyon.int is a web server that uses HTTP Basic authentication. This server stores general information used by everyone on the sales team.
customers.alphalyon.int is a Microsoft IIS server that uses NTLM HTTP. It stores customer directory information, such as phone numbers and addresses.

Our search appliance administrator, Tanya, wants to use Kerberos authentication to enable the search appliance to silently authenticate the user without requiring an HTTP Basic login box.

This use case is based on the following assumptions:

The following two servers have been crawled with Make Public selected, so their content is public: products.alphalyon.int and news.alphalyon.int. Content on the other servers is secure.

Once again, AlphaLyon has these people who interact with this content:

Obtaining a keytab File

Before configuring and activating Kerberos support, Tanya must obtain a Kerberos Service Key Table (keytab) file from the domain controller.

Tanya performs the following actions:

2.

Configuring and Activating Kerberos Support

Now, Tanya needs to configure the search appliance to check for a user’s session ticket during serve. She also needs to activate Kerberos support:

1. Tanya opens Search > Secure Search > Universal Login Auth Mechanisms > Kerberos.
2. Under Specify a Kerberos Key Distribution Center (KDC) / Windows Domain Controller (DC), Tanya enters hal.alphalyon.com in the Kerberos KDC Hostname box, and clicks Save to save the change.
3. Under Import a Kerberos Service Key Table (keytab) File, Tanya clicks Choose File and navigates to her Desktop folder.
4. She selects the keytab file, searchappliance.keytab, and clicks OK to upload the Kerberos key table file to the search appliance.
5. She clicks Import Kerberos Keytab File to save the change.
6. In the section labeled Activate IWA (Integrated Windows Authentication) / Kerberos Authentication, she clicks Enable Kerberos support, and clicks Save. Because she is configuring Kerberos support for the Default credential group, she does not need to select a credential group from the pull-down menu.

Now that the search appliance is configured to use Kerberos authentication, any time a user requests secure content, the search appliance attempts to authenticate with the user’s Kerberos session key. No additional setup is needed for secure serve.

Serving Controlled-Access Content to the User as Secure Content with Kerberos Authentication

AlphaLyon now has public and secure search results available on the search appliance, and the search appliance is able to authenticate users against a Windows Domain Controller.

Search by an Authorized User

Salim is looking for a detailed report that discusses sales figures for the new “AlphaLyon Product” release. Salim opens the search page in a web browser and enters a query for “AlphaLyon Product fall sales report”.

The search appliance performs the following steps before sending Salim’s browser to the search results page:

2. The search appliance filters the list of results as specified by the front end that applies to Salim’s search. It applies Filters defined in Search > Search Features > Front Ends > Filters and excludes all URLs listed in Search > Search Features > Front Ends > Remove URLs.
The URL is public or Salim has authorization to view the URL.

When Salim clicks on one of the links in his search results page, the browser provides his Kerberos ticket in the authentication header. The next time that Salim performs a search, the search appliance recognizes his session cookie and skips directly to the HTTP HEAD request in step 8. The session cookie set by the search appliance remains valid as long as he keeps the browser open.

The search results page doesn’t tell Salim how many search results match his query or display “Goooooogle” links, because either would reveal how many secure documents exist in the index.

Search by an Unauthorized User

Eric isn’t a member of the sales team, but he’s also interested in the new AlphaLyon Product release and wants to know when the sales figures will be posted. Eric opens the search page in a web browser and enters the same query for AlphaLyon Product fall sales report. The search appliance performs the following steps before sending Eric’s browser to the search results page:

2. The search appliance filters the list of results as specified by the front end that applies to Eric’s search. It applies Filters defined in Search > Search Features > Front Ends > Filters and excludes all URLs listed in Search > Search Features > Front Ends > Remove URLs.
The URL is public or Eric has authorization to view the URL.
10. The search appliance directs Eric’s browser to the search results page that contains all public documents that match the query “AlphaLyon product”. Eric should see results from products.alphalyon.int and news.alphalyon.int, but unlike Salim, he doesn’t see any results from emp.alphalyon.int, sales.alphalyon.int or customers.alphalyon.int.

The search results page doesn’t tell Eric how many search results match his query or display “Goooooogle” links, because either would reveal how many secure documents exist in the index.
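
The rule that decides which results each user sees (a URL is shown only if it is public or the user is authorized to view it) can be sketched as a simple filter. This is a minimal illustration, not the appliance's logic: the lookup table stands in for the authorization check that the appliance performs against the content servers, and the document paths and the per-user access table are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Result:
    url: str
    public: bool


def visible_results(results: Iterable[Result], user: str,
                    is_authorized: Callable[[str, str], bool]) -> List[Result]:
    """Keep a result only if it is public or the user is authorized for it."""
    return [r for r in results if r.public or is_authorized(user, r.url)]


# Hypothetical access table standing in for the real authorization check.
SECURE_HOST_ACCESS = {
    "salim": {"emp.alphalyon.int", "sales.alphalyon.int", "customers.alphalyon.int"},
    "eric": set(),
}


def is_authorized(user: str, url: str) -> bool:
    return urlsplit(url).hostname in SECURE_HOST_ACCESS.get(user, set())


# Hypothetical documents matching the query; only the first two are public.
matches = [
    Result("http://products.alphalyon.int/product.html", True),
    Result("http://news.alphalyon.int/launch.html", True),
    Result("http://emp.alphalyon.int/contacts.html", False),
    Result("http://sales.alphalyon.int/fall-report.html", False),
    Result("http://customers.alphalyon.int/directory.html", False),
]

salim_hosts = [urlsplit(r.url).hostname for r in visible_results(matches, "salim", is_authorized)]
eric_hosts = [urlsplit(r.url).hostname for r in visible_results(matches, "eric", is_authorized)]
print(salim_hosts)
print(eric_hosts)
```

In this sketch Eric's list contains only the two public servers, which mirrors the outcome described for his search.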