Search Appliance Internationalization
About this Document
Which Languages are Searchable and Indexable?
To improve search quality for your language, upload synonym files on the Query Settings page. If the search appliance includes a synonym file for your language, you can improve search quality by providing synonyms for your business’s internal abbreviations, code names, and other terms particular to the business. For more information on synonym files, see Best Practices in Creating the Search Experience.
Which Character Encodings can be Crawled and Indexed?
What is the Recommended Character Encoding for Documents?
How are Bi-Directional Languages Handled?
How are Chinese, Japanese, and Korean Handled?
How is Segmentation Used?
How Does the Search Appliance Detect the Language of a Document?
Which Character Encodings can be used to Enter Queries and Serve Results?
To support searching documents in multiple languages and character encodings, Google provides the ie and oe parameters:
• The ie parameter indicates how to interpret characters in the search request.
• The oe parameter indicates how to encode characters in the search results.
To appropriately decode the search query and correctly encode the search results, supply the correct ie and oe parameters, respectively, in the search request. When you are providing search for multiple languages, Google recommends using the utf8 encoding value for the ie and oe parameters. To view a list of supported character encodings, see the Internationalization section of the Search Protocol Reference.
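As a rough illustration, a search request that declares UTF-8 for both the query and the results could be assembled as shown here. The host name gsa.example.com, the /search path, and the q parameter name are assumptions made for this sketch; only the ie and oe parameters and the utf8 value come from the description above.

```python
from urllib.parse import urlencode

def build_search_url(host, query, ie="utf8", oe="utf8"):
    """Return a search URL whose ie/oe parameters declare the encodings.

    host and the /search path are hypothetical placeholders.
    """
    # urlencode percent-encodes non-ASCII text as UTF-8 bytes by default,
    # which is consistent with declaring ie=utf8.
    params = urlencode({"q": query, "ie": ie, "oe": oe})
    return f"http://{host}/search?{params}"

url = build_search_url("gsa.example.com", "distribuição")
print(url)
```

Keeping ie and oe as explicit arguments makes it harder to send a query encoded one way while declaring another.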
How Does the Search Appliance Detect the Language of a Query?
In What Languages are the Admin Console and Help System Displayed?
This section lists the languages in which the Admin Console and help system are displayed.
How Does the Search Appliance Determine the Display Language for the Admin Console and Help System?
For information on changing language settings, refer to the help system for your browser.
What Languages and Character Sets can be Typed into the Admin Console?
The default character encoding in modern browsers is UTF-8, a Unicode encoding capable of displaying characters in most writing systems. (For a complete listing of the writing systems that can be displayed with Unicode encodings, see the “Unicode Character Code Charts By Script,” at http://www.unicode.org/charts/.) The UTF-8 encoding supports a full range of characters in any language.
There are some restrictions on collection names and other search appliance parameters and fields. For more information on these parameters, see Are there Restrictions on Using Non-ASCII Characters in the Admin Console Fields?.
Are there Restrictions on Using Non-ASCII Characters in the Admin Console Fields?
The following fields are restricted to alphanumeric ASCII characters, underscores, and hyphens only:
• The name you assign on the Admin Console to a Query Expansion synonym file. This is not the name of the file to which you browse in the File field, which may contain any UTF-8 characters, but the name you assign in the Search > Search Features > Query Settings > Name field.
• The name you assign on the Admin Console to a Query Expansion blacklist file. This is not the name of the file to which you browse in the File field, which may contain any UTF-8 characters, but the name you assign in the Search > Search Features > Query Settings > Name field.
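If you generate these names in a script, a check like the following can catch invalid characters before you enter them. This is a minimal sketch of the character restriction stated above. The function name is hypothetical, and rejecting an empty name is an assumption, since any length limits are not stated here.

```python
import re

# ASCII letters, digits, underscores, and hyphens only.
# Requiring at least one character is an assumption of this sketch.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")

def is_valid_name(name):
    """Return True if name uses only the permitted ASCII characters."""
    return NAME_PATTERN.fullmatch(name) is not None

for candidate in ["synonyms_pt-BR", "sinônimos", "my synonyms"]:
    print(candidate, is_valid_name(candidate))
```

The explicit A-Za-z0-9 character class is used instead of \w because \w also matches non-ASCII letters in Python 3.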
Can Searches be Made Accent-Insensitive?
By default, searches on the Google Search Appliance are accent-sensitive. For example, if you search for the term distribuicao, it will not match the Portuguese word distribuição. Accent-sensitivity works for most commonly-occurring words that contain diacritical marks in each language.
There are two ways to make searches accent-insensitive: by using the lr parameter or by using the Accept-Language HTTP header.
Most of the time, accent-insensitive search works in both directions. For example, when accent-insensitive search is enabled, searching for distribuicao returns results containing distribuição, and searching for distribuição returns results containing distribuicao.
You use the lr parameter in search URLs. For example, if you set the lr parameter to the value lang_pt, for Portuguese (lr=lang_pt), a search for distribuicao will match distribuição.
Accent-insensitive search with the lr parameter is available in German, French, Spanish, Finnish, Norwegian, Swedish, and Portuguese. For more information on how to use the lr parameter in constructing search URLs, see the Search Protocol Reference.
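The following sketch shows one way a search URL with the lr parameter might be built. The host name, the /search path, and the q parameter name are placeholders. Only lang_pt appears in this document; the other lr values are assumed to follow the same lang_ pattern and should be confirmed against the Search Protocol Reference.

```python
from urllib.parse import urlencode

# Languages listed above as supporting accent-insensitive search via lr.
# Only lang_pt is taken from the text; the remaining values are assumed.
LR_VALUES = {
    "German": "lang_de",
    "French": "lang_fr",
    "Spanish": "lang_es",
    "Finnish": "lang_fi",
    "Norwegian": "lang_no",
    "Swedish": "lang_sv",
    "Portuguese": "lang_pt",
}

def build_lr_search_url(host, query, language):
    """Return a search URL restricted to one language with lr."""
    if language not in LR_VALUES:
        raise ValueError(f"no accent-insensitive lr value for {language!r}")
    params = urlencode(
        {"q": query, "lr": LR_VALUES[language], "ie": "utf8", "oe": "utf8"}
    )
    return f"http://{host}/search?{params}"

lr_url = build_lr_search_url("gsa.example.com", "distribuicao", "Portuguese")
print(lr_url)
```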
Use the Accept-Language HTTP header to enable accent-insensitive search for the most common words containing diacritical marks in the specified language. For example, Accept-Language: pt in the request results in accent-insensitive search in Portuguese. The Accept-Language header is set in end-user browsers. Refer to the documentation for your browser for more information.
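When a script rather than a browser sends the search request, it can set the header itself. The sketch below only constructs the request object and does not send it. The host name, the /search path, and the q parameter name are assumptions for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_search_request(host, query, language="pt"):
    """Build (but do not send) a search request with Accept-Language set."""
    params = urlencode({"q": query, "ie": "utf8", "oe": "utf8"})
    url = f"http://{host}/search?{params}"
    return Request(url, headers={"Accept-Language": language})

req = build_search_request("gsa.example.com", "distribuicao")
# urllib stores header names with only the first letter capitalized,
# so the lookup key is "Accept-language".
print(req.full_url)
print(req.get_header("Accept-language"))
```

Passing the request to urllib.request.urlopen would send it; that step is omitted because it depends on a reachable search appliance.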
Note that accent-insensitive search works only for languages that are part of a language bundle that is currently enabled. For more information about language bundles, see How do Language Bundles Work?.
Which Language-Related Settings are User‑Configurable?
Search users can configure the preferred language setting in their browsers.
Administrators can modify the XSLT style sheet to control various language-related features. For example, if you use older web servers, you can submit queries to and receive results from the search appliance in a National Character Set rather than UTF-8. For more information, see the Internationalization and Language Filters sections of the Search Protocol Reference.
Administrators can configure the following front-end settings and features:
For more information on query expansion, front ends, and language filtering, see Creating the Search Experience.
Which Languages are Spelling-Checked?
How Does the Search Appliance Make Spelling Suggestions?
Which Languages can use Query Expansion?
Search appliance users do not see the effects of query expansion on the original search terms.
You can also create a local query expansion policy using preinstalled and custom synonym and blacklist files for languages that use the Latin-1 (ISO_8859-1) alphabet, provided that files containing accented characters are UTF-8 encoded. For more information on query expansion, see Using Query Expansion to Widen Searches in Creating the Search Experience.
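If an existing synonym or blacklist file was saved in Latin-1, it can be re-encoded before upload. The helper below is a minimal sketch of that conversion and says nothing about the file format itself. The function name is hypothetical, and falling back to Latin-1 when the bytes are not valid UTF-8 is an assumption that suits only files known to come from Latin-1 sources.

```python
def to_utf8(raw, source_encoding="latin-1"):
    """Return raw re-encoded as UTF-8 bytes.

    Bytes that already decode as UTF-8 (including plain ASCII) are
    returned unchanged; otherwise they are interpreted as source_encoding.
    """
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        return raw.decode(source_encoding).encode("utf-8")

latin1_bytes = "distribuição".encode("latin-1")
utf8_bytes = to_utf8(latin1_bytes)
print(utf8_bytes.decode("utf-8"))
```

To convert a file, read it in binary mode, pass the bytes through to_utf8, and write the result back in binary mode.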
Which Languages can use Dynamic Result Clustering?
For more information on dynamic result clusters, see Best Practices in Creating the Search Experience.
How do Language Bundles Work?
For more information on language bundles, see Changing Languages for Query Expansion and Spelling Suggestions in Creating the Search Experience.