Class Reference
%Library.Text
Private Storage |
ODBC Type: VARCHAR
The %Text data type class represents a content-addressable document that supports word-based searching
and relevance ranking. When you specify %Text as the type class of a property, you must also specify
the maximum length of the document in the MAXLEN parameter.
For detailed usage information, see the class documentation for the
Efficient content-based document retrieval requires the use of an index. The type of index you create depends on
the type of query that the application requires. The simplest type of content-based query is the Boolean query.
A Boolean query consists of a set of search terms, or words, that are combined with AND/OR/NOT operations
to identify the documents of interest. Caché SQL provides the %CONTAINS operator to search for an ANDed list
of search terms. %CONTAINS operations may be combined with OR and NOT to specify any Boolean text query. The terms
need not be adjacent in the document, although queries can be restricted to adjacent terms such as "White House" by
setting
To create an English Text property named myDocument with a full text index suitable for Boolean queries you could specify:
    PROPERTY myDocument As %Text (MAXLEN = 256, LANGUAGECLASS = "%Text.English");
    INDEX myIndex ON myDocument(KEYS) [ TYPE=BITMAP ];
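As a conceptual illustration of the Boolean queries described above, the following Python sketch builds a small term-to-document index and evaluates an ANDed list of terms. It is not how Caché implements %CONTAINS or bitmap indexes; the function names (`tokenize`, `build_index`, `contains_all`) and the sample documents are hypothetical, and the tokenizer is a deliberate simplification that ignores language-specific processing.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words (a simplification)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def contains_all(index, terms):
    """Return the IDs of documents that contain every term (an ANDed list)."""
    sets = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*sets) if sets else set()

# Hypothetical sample documents, keyed by ID.
docs = {
    1: "SQL query syntax for full text search",
    2: "Bitmap index storage",
    3: "Writing an SQL query",
}
index = build_index(docs)

# AND: documents that contain both terms.
matches = contains_all(index, ["SQL", "query"])

# OR and NOT correspond to set union and set difference.
sql_not_text = contains_all(index, ["sql"]) - contains_all(index, ["text"])
```

Because each term maps to a plain set of document IDs, AND, OR, and NOT reduce to set intersection, union, and difference, which is the kind of operation a bitmap representation makes cheap.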
An issue with Boolean queries is that it can be difficult to specify a query that returns all of the relevant documents, but only those documents. Almost invariably the results will either omit some of the relevant documents because the query is too specific, or will include some non-relevant documents because the query is too general. For example, to locate information about full text indexing in the Caché documentation, the terms
text document search query SQL SELECT index similarity ranking Boolean %CONTAINS 'full text'
    SELECT TOP 20 document
    FROM OnlineDocs
    WHERE document %CONTAINS ('SQL', 'full text', 'query')
    ORDER BY %SIMILARITY (document, 'text document search query SQL SELECT index similarity ranking Boolean %CONTAINS') DESC
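The query above first filters with a Boolean condition and then orders the survivors by similarity. The Python sketch below mimics that filter-then-rank shape. The scoring function is an assumption: it uses cosine similarity over raw term counts, which is not necessarily the measure Caché computes for %SIMILARITY. The names `term_counts`, `cosine`, and `search` are hypothetical, and only single-word terms are handled (a phrase such as 'full text' is out of scope here).

```python
import math
import re
from collections import Counter

def term_counts(text):
    """Count word occurrences in lowercased text (a simplified tokenizer)."""
    return Counter(re.findall(r"[a-z0-9%]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two term-count vectors (assumed scoring measure)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def search(docs, required, ranking_terms, top=20):
    """Keep documents containing every required term, then rank them by similarity."""
    query = term_counts(ranking_terms)
    required = [t.lower() for t in required]
    hits = []
    for doc_id, text in docs.items():
        counts = term_counts(text)
        if all(t in counts for t in required):        # Boolean filter
            hits.append((cosine(counts, query), doc_id))
    hits.sort(key=lambda h: (-h[0], h[1]))            # highest score first
    return [doc_id for _, doc_id in hits[:top]]

# Hypothetical sample documents, keyed by ID.
docs = {
    1: "SQL query basics",
    2: "SQL text search query with similarity ranking",
    3: "Bitmap index internals",
}
ranked = search(docs, ["SQL", "query"], "text search query SQL similarity ranking")
```

Document 3 is dropped by the Boolean filter; documents 1 and 2 both pass it, and document 2 ranks first because it shares more of the ranking terms.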
Similarity queries are much more computationally expensive than Boolean queries. Performing similarity queries
efficiently requires an index that contains additional information with each indexed term, and so bitmap indexes
cannot be used. Beginning with Caché 2007.1, the structure of an index that can be used for similarity
queries is determined by the
    PROPERTY myDocument As %Text (MAXLEN = 256, LANGUAGECLASS = "%Text.English", SIMILARITYINDEX = "mySimilarityIndex");
    INDEX mySimilarityIndex ON myDocument(KEYS) [ DATA = myDocument(ELEMENTS) ];
| Methods | | | |
|---|---|---|---|
| BuildValueArray | ChooseSearchKey | CreateQList | DisplayToLogical |
| IsValid | LogicalToDisplay | LogicalToOdbc | LogicalToXSD |
| MakeSearchTerms | Normalize | Similarity | SimilarityIdx |
| Standardize | XSDToLogical | | |
The LANGUAGECLASS parameter specifies the fully qualified name of the language implementation class. Optionally, the LANGUAGECLASS may be set to the name of a global that indirectly defines the language class name. If a global name is specified, then the global must be defined and available at index build time and at SQL query execution time.
The MAXLEN parameter specifies the maximum length of the %Text property in bytes. Note that, unlike the %String class, the MAXLEN parameter must be explicitly set to a positive integer on each %Text property.
Beginning with Caché 2007.1, the SIMILARITYINDEX parameter specifies the name of an index on the current property that has the structure expected by the SimilarityIdx class method of the class specified in the LANGUAGECLASS parameter. The SimilarityIdx class method in the %Text.Text class requires the index global to have the structure: ^textIndexGlobal([constantSubscripts,]key,ID) = value. An index with this structure can be created by compiling an index specification such as:
    PROPERTY myDocument As %Text (MAXLEN = 256, LANGUAGECLASS = "%Text.English", SIMILARITYINDEX = "myIndex");
    INDEX myIndex ON myDocument(KEYS) [ DATA = myDocument(VALUES) ];
The SimilarityIdx method of the %Text.Text class requires the index specified in the SIMILARITYINDEX parameter to have exactly this structure. The index may not be a bitmap index, additional subscripts or data values may not be added to the Index specification, and the index must inherit the collation of the property.
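To visualize the ^textIndexGlobal(key,ID) = value layout, the Python sketch below models it as a nested dictionary, with the optional constant subscripts omitted. This is only an analogy: what Caché actually stores as the value is not stated here, so the sketch assumes a term frequency, and the additive scoring in `score` is likewise an assumption rather than the SimilarityIdx algorithm. The function names and sample documents are hypothetical.

```python
from collections import Counter, defaultdict

def build_similarity_index(docs):
    """Return {key: {doc_id: value}}, mirroring ^textIndexGlobal(key, ID) = value."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for key, count in Counter(text.lower().split()).items():
            index[key][doc_id] = count  # assumption: value is the term frequency
    return index

def score(index, query_terms):
    """Accumulate a per-document score by visiting only the keys in the query."""
    scores = defaultdict(int)
    for key in query_terms:
        for doc_id, value in index.get(key, {}).items():
            scores[doc_id] += value
    return dict(scores)

# Hypothetical sample documents, keyed by ID.
docs = {1: "index query index", 2: "query ranking"}
idx = build_similarity_index(docs)
totals = score(idx, ["index", "query"])
```

Storing a value per (key, ID) pair lets a ranking routine score documents from the index alone, without rereading the documents. A bitmap index records only whether a term is present, which is consistent with the requirement that the similarity index not be a bitmap index.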
Tests if the logical value %val, which is a string, is valid. The validation is based on the class parameter settings used for the class attribute this data type is associated with. In this case, MINLEN, MAXLEN, VALUELIST, and PATTERN.