Diffstat (limited to 'luni/src/main/java/java/util/regex/Pattern.java')
-rw-r--r-- luni/src/main/java/java/util/regex/Pattern.java | 21
1 files changed, 14 insertions, 7 deletions
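As a rough illustration of the Javadoc note added in the diff below, which says that the built-in classes are not limited to ASCII and that `[0-9]` is the safer choice when only 0-9 is meant, here is a minimal sketch. The class name `DigitClassDemo` and its helper methods are hypothetical and not part of the patch. The sketch compares `[0-9]` with the named class `\p{Nd}` and deliberately asserts nothing about plain `\d`, because whether `\d` matches non-ASCII digits depends on the regex implementation in use.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Pattern.java or of this commit.
public class DigitClassDemo {
    // Explicit ASCII range: matches only the characters '0' through '9'.
    private static final Pattern ASCII_DIGITS = Pattern.compile("[0-9]+");

    // Unicode general category Nd: decimal digits from any script.
    private static final Pattern UNICODE_DIGITS = Pattern.compile("\\p{Nd}+");

    public static boolean isAsciiDigits(String s) {
        return ASCII_DIGITS.matcher(s).matches();
    }

    public static boolean isUnicodeDigits(String s) {
        return UNICODE_DIGITS.matcher(s).matches();
    }

    public static void main(String[] args) {
        // GURMUKHI DIGIT ONE, TWO, THREE (U+0A67..U+0A69).
        String gurmukhi = "\u0A67\u0A68\u0A69";

        System.out.println(isAsciiDigits("123"));       // true
        System.out.println(isAsciiDigits(gurmukhi));    // false
        System.out.println(isUnicodeDigits("123"));     // true
        System.out.println(isUnicodeDigits(gurmukhi));  // true

        // Implementation-dependent: true where \d follows the Unicode
        // definition described in the Javadoc note, false where \d is
        // ASCII-only by default.
        System.out.println(Pattern.matches("\\d+", gurmukhi));
    }
}
```

Input validation that must accept only 0-9 (for example before calling a parser that rejects other digits) would use the `[0-9]` form; `\p{Nd}` states the Unicode intent explicitly when that is what is wanted.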
diff --git a/luni/src/main/java/java/util/regex/Pattern.java b/luni/src/main/java/java/util/regex/Pattern.java
index 46984b9..cbd5965 100644
--- a/luni/src/main/java/java/util/regex/Pattern.java
+++ b/luni/src/main/java/java/util/regex/Pattern.java
@@ -82,16 +82,23 @@ import java.io.Serializable;
  * </table>
  * <p>Most of the time, the built-in character classes are more useful:
  * <table>
- * <tr> <td> \d </td> <td>Any digit character.</td> </tr>
- * <tr> <td> \D </td> <td>Any non-digit character.</td> </tr>
- * <tr> <td> \s </td> <td>Any whitespace character.</td> </tr>
- * <tr> <td> \S </td> <td>Any non-whitespace character.</td> </tr>
- * <tr> <td> \w </td> <td>Any word character.</td> </tr>
- * <tr> <td> \W </td> <td>Any non-word character.</td> </tr>
+ * <tr> <td> \d </td> <td>Any digit character (see note below).</td> </tr>
+ * <tr> <td> \D </td> <td>Any non-digit character (see note below).</td> </tr>
+ * <tr> <td> \s </td> <td>Any whitespace character (see note below).</td> </tr>
+ * <tr> <td> \S </td> <td>Any non-whitespace character (see note below).</td> </tr>
+ * <tr> <td> \w </td> <td>Any word character (see note below).</td> </tr>
+ * <tr> <td> \W </td> <td>Any non-word character (see note below).</td> </tr>
  * <tr> <td> \p{<i>NAME</i>} </td> <td> Any character in the class with the given <i>NAME</i>. </td> </tr>
  * <tr> <td> \P{<i>NAME</i>} </td> <td> Any character <i>not</i> in the named class. </td> </tr>
  * </table>
- * <p>There are a variety of named classes:
+ * <p>Note that these built-in classes don't just cover the traditional ASCII range. For example,
+ * <code>\w</code> is equivalent to the character class <code>[\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}]</code>.
+ * For more details see <a href="http://www.unicode.org/reports/tr18/#Compatibility_Properties">Unicode TR-18</a>,
+ * and bear in mind that the set of characters in each class can vary between Unicode releases.
+ * If you actually want to match only ASCII characters, specify the explicit characters you want;
+ * if you mean 0-9 use <code>[0-9]</code> rather than <code>\d</code>, which would also include
+ * Gurmukhi digits and so forth.
+ * <p>There are also a variety of named classes:
  * <ul>
  * <li><a href="../../lang/Character.html#unicode_categories">Unicode category names</a>,
  * prefixed by {@code Is}. For example {@code \p{IsLu}} for all uppercase letters.