Class Reference
%Regex.Matcher
The Class %Regex.Matcher creates an object that does pattern matching using regular expressions. The regular expressions come from the International Components for Unicode (ICU). The ICU maintains web pages at http://www.icu-project.org.
The definition and features of the ICU regular expression package
can be found in http://userguide.icu-project.org/strings/regexp.
In particular, the definition and syntax of a regular expression
are found in:
http://userguide.icu-project.org/strings/regexp#TOC-Regular-Expression-Metacharacters,
http://userguide.icu-project.org/strings/regexp#TOC-Regular-Expression-Operators,
http://userguide.icu-project.org/strings/regexp#TOC-Replacement-Text and
http://userguide.icu-project.org/strings/regexp#TOC-Flag-Options.
On most platforms, installing Caché will also install an appropriate version of the ICU libraries. On platforms that do not have an ICU library available, evaluating any regular expression function or method will result in an <UNIMPLEMENTED> error.
A %Regex.Matcher object can be created by evaluating
##class(%Regex.Matcher).%New(pattern) or
##class(%Regex.Matcher).%New(pattern,text).
The first parameter to
If x is a %Regex.Matcher object then the built-in method
None of the methods or operations in the %Regex.Matcher package return
a
The methods and operations in a %Regex.Matcher object will catch any <REGULAR EXPRESSION> system error and will generate a %Status value describing it. That value can be displayed with:
do $system.Status.DisplayError(%objlasterror)
Some other system errors, like <STRING STACK>, are passed through the %Regex.Matcher methods without modification.
Note that some ICU operation errors are not considered errors by the
%Regex.Matcher package. Examples are evaluating the
| Properties | | | | |
|---|---|---|---|---|
| End | Group | GroupCount | HitEnd | OperationLimit |
| Pattern | Start | Status | Text | |
The property End without a subscript contains the character position in the property Text one beyond the final character of the string found by the last match.
The value of End(i), when subscripted with an integer i between 1 and GroupCount, is the character position one beyond the last character of the last string successfully captured by capture group i.
The value of End(i) is -1 if capture group i did not participate in the last match. The values of End and End(i) are -2 if the last match attempt failed.
Note: In addition to integer subscripts between 1 and GroupCount, the value of End(0) is identical to the value of End without a subscript. When the property End(...) is subscripted with values not described above, the result of evaluating End(...) is undefined.
The property Group without a subscript contains the string found by the last match.
The value of Group(i), when subscripted with an integer i between 1 and GroupCount, is the last string successfully captured by capture group i.
If the last match operation was unsuccessful, or if the specified capture group was not used during the last match operation, then Group and Group(i) contain the empty string. Note that End and End(i) have negative values when the last match operation did not use the specified capture group or did not succeed in matching.
Note: In addition to integer subscripts between 1 and GroupCount, the value of Group(0) is identical to the value of Group without a subscript. When the property Group(...) is subscripted with values not described above, the result of evaluating Group(...) is undefined.
The property GroupCount contains the number of capturing groups in the regular expression Pattern.
The property HitEnd is true if the most recent matching operation touched the end of the property Text at any point during its processing. In this case, appending additional input characters to the Text property could change the result of that match attempt.
The property OperationLimit provides a way to limit the time taken by a regular expression match. The default value for OperationLimit is 0, which indicates that there is no limit. Setting OperationLimit to a positive integer will cause a match operation to signal a TimeOut error after the specified number of clusters of steps by the match engine.
Correspondence with actual processor time will depend on the speed of the processor and the details of the specific pattern, but cluster size is chosen such that each cluster's execution time will typically be on the order of milliseconds.
The property Pattern is the string representation of the regular expression of the Matcher. Assigning to Pattern resets all saved state concerning the last matching operation.
On an installation using an NLS 8-bit character set different from Latin-1, you must be careful with patterns using a character class of the form [x-y] where x or y are national usage characters not in Latin-1. All regular expression matching is done in Unicode, so the characters x and y are converted to Unicode. The character class [x-y] represents all characters between the Unicode translations of x and y, not the NLS 8-bit characters between x and y.
The property Start without a subscript contains the character position in the property Text of the first character of the string found by the last match. If the matched string is the empty string then Start is the character position one beyond where the empty string was located (and the property Start equals the property End).
The value of Start(i), when subscripted with an integer i between 1 and GroupCount, is the character position of the first character of the last string successfully captured by capture group i. If the captured string is the empty string then Start(i) is the character position one beyond where the empty string was captured (and the property Start(i) equals the property End(i)).
The value of Start(i) is -1 if capture group i did not participate in the last match. The values of Start and Start(i) are -2 if the last match attempt failed.
Note: In addition to integer subscripts between 1 and GroupCount, the value of Start(0) is identical to the value of Start without a subscript. When the property Start(...) is subscripted with values not described above, the result of evaluating Start(...) is undefined.
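The relationship between Start, End, and Group can be illustrated with a short sketch. The variable name matcher, the pattern, and the text are illustrative values chosen for this example; the position comments follow the 1-based, one-beyond-the-end conventions described above.

```objectscript
 // Sketch: inspect Start, End and Group after a successful Locate().
 // Text positions: "mail " occupies 1-5, "bob" 6-8, "@" 9, "example" 10-16.
 Set matcher=##class(%Regex.Matcher).%New("(\w+)@(\w+)","mail bob@example now")
 If matcher.Locate() {
     // Whole match "bob@example": Start is 6, End is 17 (one beyond the last character)
     Write matcher.Start," ",matcher.End," ",matcher.Group,!
     // Capture groups are numbered from 1 to GroupCount
     For i=1:1:matcher.GroupCount {
         Write i,": ",matcher.Start(i)," ",matcher.End(i)," ",matcher.Group(i),!
     }
 }
```

Because End is one beyond the match, End minus Start gives the length of the matched string, and $Extract(matcher.Text,matcher.Start,matcher.End-1) reproduces Group.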
The property Status contains a %Status value which may provide more information about the last system exception thrown by this object. It is initially $$$OK. Its value remains unchanged by any successful operation. The Status property is changed only when an error is thrown by the kernel functions implementing %Regex.Matcher or by a COS Set assignment to the Status property done by the user.
The Property Text is the string to which the regular expression will be applied. Assigning to Text resets all saved state resulting from the most recent match operation. On installations using an 8-bit character code, the internal representation of Text is converted to Unicode. Therefore, on an installation using 8-bit characters the maximum length of the Text property is only half the maximum string length supported by that installation.
The EndGet method implements the End property.
The GroupCountGet method implements the GroupCount property.
The GroupGet method implements the Group property.
The HitEndGet method implements the HitEnd property.
The class method LastStatus returns the %Status value containing additional details about the most recent <REGULAR EXPRESSION> system error. If a %Regex.Matcher object encounters a <REGULAR EXPRESSION> error then this status is already available in the Status property of the object. Executing
Do $SYSTEM.Status.DisplayError(##class(%Regex.Matcher).LastStatus())
is useful when debugging a <REGULAR EXPRESSION> error following a call on $MATCH, $LOCATE or ##class(%Regex.Matcher).%New(x) where a %Regex.Matcher oref value is not available.
The method Locate finds a match for the regular expression Pattern in the text string Text.
If the optional argument position is defined as an integer 1 or greater then the search for a match begins at that character position of Text.
If the argument position is not defined then the search for the match begins at the character position following the previous match.
Locate returns 1 if the match is found; 0 otherwise.
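Since each call to Locate without an argument resumes after the previous match, a loop can visit every match in Text. The following is a minimal sketch; the variable name, pattern, and text are illustrative.

```objectscript
 // Sketch: iterate over all matches of a pattern in Text.
 Set matcher=##class(%Regex.Matcher).%New("\d+","a1 b22 c333")
 While matcher.Locate() {
     // Reports "1" at position 2, "22" at position 5, "333" at position 9
     Write "Found ",matcher.Group," at position ",matcher.Start,!
 }
```

The loop ends when Locate returns 0, that is, when no further match exists after the previous one.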
The method LookingAt attempts to find a match in the property Text that must start at a particular character position. The match need not extend to the end of Text.
The argument position gives the starting character position of the attempted match.
LookingAt returns 1 if the match is found; 0 otherwise.
The method Match returns true if the entire string Text is matched by Pattern; it returns false if it does not match.
The argument text is optional. If the argument text is defined then the property Text is set to its value before the match is executed.
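The difference between Match, which must cover the entire string, and LookingAt, which is anchored only at its starting position, can be seen in a small sketch. The variable name, pattern, and text below are illustrative.

```objectscript
 // Sketch: Match requires the whole Text to match; LookingAt only anchors the start.
 Set matcher=##class(%Regex.Matcher).%New("\d+","2024-01")
 Write matcher.Match(),!         // 0: "\d+" does not cover the entire string "2024-01"
 Write matcher.LookingAt(1),!    // 1: digits begin at position 1
 Write matcher.LookingAt(5),!    // 0: position 5 holds "-"
 Write matcher.Match("2024"),!   // 1: Text is first set to "2024", which matches entirely
```

Note that the last call changes the Text property, so later operations on this object apply to "2024".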
The OperationLimitSet method implements the side effects of doing a Set assignment to change the value of the OperationLimit property.
The PatternSet method implements Set assignments to the Pattern property.
The method ReplaceAll returns a modified copy of the property Text. It replaces every substring of Text that matches the Pattern with a replacement string. Portions of Text that are not matched are copied without change. The value of ReplaceAll is the resulting string. The property Text is not modified.
The argument replacement supplies the string to replace each matched region. The replacement string may contain references to capture groups which take the form of $1, $2, etc. The replacement string may reference the entire matched region with $0.
The method ReplaceFirst returns a modified copy of the property Text. It replaces the first substring of Text that matches the Pattern with a replacement string. Portions of Text that are not matched are copied without change. The value of ReplaceFirst is the resulting string. The property Text is not modified.
The argument replacement supplies the string to replace the matched region. The replacement string may contain references to capture groups which take the form of $1, $2, etc. The replacement string may reference the entire matched region with $0.
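As a rough illustration of the two replacement methods and of the $0, $1, $2 references, consider the following sketch. The variable name, pattern, text, and replacement strings are illustrative.

```objectscript
 // Sketch: swap key and value with ReplaceAll, bracket the first pair with ReplaceFirst.
 Set matcher=##class(%Regex.Matcher).%New("(\w+)=(\w+)","a=1, b=2")
 Write matcher.ReplaceAll("$2=$1"),!    // 1=a, 2=b
 Write matcher.ReplaceFirst("[$0]"),!   // [a=1], b=2
 Write matcher.Text,!                   // a=1, b=2 (the Text property is unchanged)
```

Both methods return a new string, so the result must be stored or written by the caller.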
The method ResetPosition resets any saved state from the previous match. It also causes the next call to the method Locate() without an argument to begin at the specified character position.
The argument position is the character position from which the next call to Locate() without an argument will begin match attempts.
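A minimal sketch of ResetPosition followed by Locate without an argument is shown here; the variable name, pattern, text, and starting position are illustrative.

```objectscript
 // Sketch: skip the first match by starting the next search at position 4.
 // Text positions: "a1" occupies 1-2, "b22" 4-6, "c333" 8-11.
 Set matcher=##class(%Regex.Matcher).%New("\d+","a1 b22 c333")
 Do matcher.ResetPosition(4)
 If matcher.Locate() {
     Write matcher.Group,!   // 22
 }
```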
The StartGet method implements the Start property.
The method SubstituteIn returns the string that results from substituting capturing groups from the most recent regular expression match into components of the argument Text. The result of this method is undefined if the most recent regular expression match operation was not successful.
This method can be used as a low-level step in regular expression replacement. It does not modify the property Text. For example, the method ..ReplaceFirst(x) is equivalent to:
Quit:'..Locate(1) ..Text
Quit $Extract(..Text,1,..Start-1)_..SubstituteIn(x)_$Extract(..Text,..End,*)
The argument Text supplies the string that will be modified by the matched region and then returned. The string may contain references to capture groups which take the form of $1, $2, etc. The string may reference the entire matched region with $0.
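Used on its own, SubstituteIn can build a string from the captured groups without touching the rest of Text. A minimal sketch, with an illustrative variable name, pattern, text, and template:

```objectscript
 // Sketch: format the captured groups of the most recent match into a template.
 Set matcher=##class(%Regex.Matcher).%New("(\d+)-(\d+)","range 10-20")
 If matcher.Locate() {
     Write matcher.SubstituteIn("from $1 to $2"),!   // from 10 to 20
 }
```

The call is guarded by Locate because the result is undefined when the most recent match attempt failed.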
The TextSet method implements Set assignments to the Text property.