Diffstat (limited to 'luni/src/main/java/java/util/regex/Pattern.java')
-rw-r--r-- luni/src/main/java/java/util/regex/Pattern.java | 21
1 file changed, 14 insertions, 7 deletions
diff --git a/luni/src/main/java/java/util/regex/Pattern.java b/luni/src/main/java/java/util/regex/Pattern.java
index 46984b9..cbd5965 100644
--- a/luni/src/main/java/java/util/regex/Pattern.java
+++ b/luni/src/main/java/java/util/regex/Pattern.java
@@ -82,16 +82,23 @@ import java.io.Serializable;
  * </table>
  * <p>Most of the time, the built-in character classes are more useful:
  * <table>
- * <tr> <td> \d </td> <td>Any digit character.</td> </tr>
- * <tr> <td> \D </td> <td>Any non-digit character.</td> </tr>
- * <tr> <td> \s </td> <td>Any whitespace character.</td> </tr>
- * <tr> <td> \S </td> <td>Any non-whitespace character.</td> </tr>
- * <tr> <td> \w </td> <td>Any word character.</td> </tr>
- * <tr> <td> \W </td> <td>Any non-word character.</td> </tr>
+ * <tr> <td> \d </td> <td>Any digit character (see note below).</td> </tr>
+ * <tr> <td> \D </td> <td>Any non-digit character (see note below).</td> </tr>
+ * <tr> <td> \s </td> <td>Any whitespace character (see note below).</td> </tr>
+ * <tr> <td> \S </td> <td>Any non-whitespace character (see note below).</td> </tr>
+ * <tr> <td> \w </td> <td>Any word character (see note below).</td> </tr>
+ * <tr> <td> \W </td> <td>Any non-word character (see note below).</td> </tr>
  * <tr> <td> \p{<i>NAME</i>} </td> <td> Any character in the class with the given <i>NAME</i>. </td> </tr>
  * <tr> <td> \P{<i>NAME</i>} </td> <td> Any character <i>not</i> in the named class. </td> </tr>
  * </table>
- * <p>There are a variety of named classes:
+ * <p>Note that these built-in classes don't just cover the traditional ASCII range. For example,
+ * <code>\w</code> is equivalent to the character class <code>[\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}]</code>.
+ * For more details see <a href="http://www.unicode.org/reports/tr18/#Compatibility_Properties">Unicode TR-18</a>,
+ * and bear in mind that the set of characters in each class can vary between Unicode releases.
+ * If you actually want to match only ASCII characters, specify the explicit characters you want;
+ * if you mean 0-9 use <code>[0-9]</code> rather than <code>\d</code>, which would also include
+ * Gurmukhi digits and so forth.
+ * <p>There are also a variety of named classes:
  * <ul>
  * <li><a href="../../lang/Character.html#unicode_categories">Unicode category names</a>,
  * prefixed by {@code Is}. For example {@code \p{IsLu}} for all uppercase letters.
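The sketch below illustrates the note's `[0-9]` versus `\d` advice. It assumes a standard OpenJDK runtime (Java 7 or later). On that runtime `\d` matches only ASCII digits by default, so the sketch requests Unicode semantics explicitly with `Pattern.UNICODE_CHARACTER_CLASS`. On the ICU-backed implementation this commit documents, the note says `\d` is Unicode-aware without any flag. The class `DigitClassDemo` and its helper methods are hypothetical names chosen for illustration. U+0A67 (GURMUKHI DIGIT ONE) stands in for the "Gurmukhi digits" the note mentions.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: contrasts a Unicode-aware \d with an explicit [0-9].
public class DigitClassDemo {
    // Unicode-aware digit class. Assumes OpenJDK, where this flag is needed;
    // without it, OpenJDK's \d behaves like [0-9].
    private static final Pattern UNICODE_DIGIT =
            Pattern.compile("\\d", Pattern.UNICODE_CHARACTER_CLASS);

    // Explicit ASCII range, as the note recommends: only '0' through '9'.
    private static final Pattern ASCII_DIGIT = Pattern.compile("[0-9]");

    // True if the whole string is a single digit under Unicode semantics.
    static boolean isUnicodeDigit(String s) {
        return UNICODE_DIGIT.matcher(s).matches();
    }

    // True if the whole string is a single ASCII digit.
    static boolean isAsciiDigit(String s) {
        return ASCII_DIGIT.matcher(s).matches();
    }

    public static void main(String[] args) {
        String gurmukhiOne = "\u0A67"; // U+0A67 GURMUKHI DIGIT ONE, category Nd
        System.out.println(isUnicodeDigit(gurmukhiOne)); // true
        System.out.println(isAsciiDigit(gurmukhiOne));   // false
        System.out.println(isAsciiDigit("7"));           // true
    }
}
```

Writing `[0-9]` puts the ASCII restriction in the pattern itself. That makes it the portable choice regardless of which runtime's defaults apply to `\d`.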