Unicode Character Classes
These are the Unicode “General Category” character class names used in regular expression matching. For example, in Perl, \pP or \p{Punctuation} matches every Unicode character that has the “punctuation” property.
| Expression | Syntax | Long Name | Description |
|---|---|---|---|
| Letter | :L | Letter | Matches any letter: Ll, Lm, Lo, Lt, or Lu. |
| Uppercase letter | :Lu | Uppercase_Letter | Matches any one capital letter. For example, :Luhe matches “The” but not “the”. |
| Lowercase letter | :Ll | Lowercase_Letter | Matches any one lower case letter. For example, :Llhe matches “the” but not “The”. |
| Title case letter | :Lt | Titlecase_Letter | Matches single digraph characters that combine an uppercase letter with a lowercase letter, such as ǋ (U+01CB) and ǲ (U+01F2). |
| Modifier letter | :Lm | Modifier_Letter | Matches modifier letters, such as the modifier letter turned comma (U+02BB) and the modifier letter double prime (U+02BA), which indicate modifications to the preceding letter. |
| Other letter | :Lo | Other_Letter | Matches other letters, such as gothic letter ahsa. |
| Cased letter | :LC | Cased_Letter | Matches any letter with case: Ll, Lt, or Lu. |
| Mark | :M | Mark | Matches any mark: Mc, Me, or Mn. |
| Non-spacing mark | :Mn | Nonspacing_Mark | Matches non-spacing marks. |
| Spacing combining mark | :Mc | Spacing_Mark | Matches spacing combining marks, which combine with a base character but occupy their own width. |
| Enclosing mark | :Me | Enclosing_Mark | Matches enclosing marks. |
| Number | :N | Number | Matches any number: Nd, Nl, or No. |
| Decimal digit | :Nd | Decimal_Number | Matches decimal digits such as 0-9 and their full-width equivalents. |
| Letter digit | :Nl | Letter_Number | Matches letter digits such as roman numerals and ideographic number zero. |
| Other digit | :No | Other_Number | Matches other digits, such as Old Italic numeral one. |
| Punctuation | :P | Punctuation | Matches any punctuation: Pc, Pd, Pe, Pf, Pi, Po, or Ps. |
| Connector punctuation | :Pc | Connector_Punctuation | Matches connector punctuation, such as the underscore (_). |
| Dash punctuation | :Pd | Dash_Punctuation | Matches dashes and hyphens, such as the hyphen-minus (-), en dash, and em dash. |
| Open punctuation | :Ps | Open_Punctuation | Matches opening punctuation such as open brackets and braces. |
| Close punctuation | :Pe | Close_Punctuation | Matches closing punctuation such as closing brackets and braces. |
| Initial quote punctuation | :Pi | Initial_Punctuation | Matches opening quotation marks, such as “ and ‘. |
| Final quote punctuation | :Pf | Final_Punctuation | Matches closing quotation marks, such as ” and ’. |
| Other punctuation | :Po | Other_Punctuation | Matches commas (,), ?, ", !, @, #, %, &, *, \, colons (:), semicolons (;), ', and /. |
| Symbol | :S | Symbol | Matches any symbol: Sc, Sk, Sm, or So. |
| Math symbol | :Sm | Math_Symbol | Matches +, =, ~, \|, <, and >. |
| Currency symbol | :Sc | Currency_Symbol | Matches $ and other currency symbols. |
| Modifier symbol | :Sk | Modifier_Symbol | Matches modifier symbols such as circumflex accent, grave accent, and macron. |
| Other symbol | :So | Other_Symbol | Matches other symbols, such as the copyright sign, registered sign, and degree sign. |
| Separator | :Z | Separator | Matches any separator: Zl, Zp, or Zs. |
| Paragraph separator | :Zp | Paragraph_Separator | Matches the Unicode character U+2029. |
| Space separator | :Zs | Space_Separator | Matches space characters, such as the space (U+0020) and the no-break space (U+00A0). |
| Line separator | :Zl | Line_Separator | Matches the Unicode character U+2028. |
| Other control | :Cc | Control | Matches control characters, such as tab, line feed, and carriage return. |
| Other format | :Cf | Format | Matches formatting control characters, such as the bidirectional control characters. |
| Surrogate | :Cs | Surrogate | Matches one half of a surrogate pair. |
| Other private-use | :Co | Private_Use | Matches any character from the private-use area. |
| Other not assigned | :Cn | Unassigned | Matches code points that are not assigned to any character. |
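
The categories in the table can be checked programmatically. The following is a minimal sketch in Python, assuming only the standard library's `unicodedata.category`, which returns the two-letter General Category of a character. The helper names `in_class` and `find_all` are hypothetical and exist only for illustration; they emulate a class test one character at a time and are not a regular expression engine.

```python
import unicodedata

# LC (Cased_Letter) is a union that does not follow the first-letter rule,
# so it is handled as a special case (assumption: same set as in the table).
CASED_LETTER = {"Lu", "Ll", "Lt"}


def in_class(ch, cls):
    """Return True if the single character ch belongs to the class cls.

    cls may be a two-letter category such as "Lu", a one-letter group
    such as "L" (any category starting with that letter), or "LC".
    """
    cat = unicodedata.category(ch)
    if cls == "LC":
        return cat in CASED_LETTER
    if len(cls) == 1:
        return cat.startswith(cls)
    return cat == cls


def find_all(text, cls):
    """Return every character of text that belongs to the class cls."""
    return [ch for ch in text if in_class(ch, cls)]


print(in_class("T", "Lu"))             # True: uppercase letter
print(in_class("\u01c5", "Lt"))        # True: titlecase digraph
print(in_class('"', "Po"))             # True: ASCII quotation mark
print(in_class("\u201c", "Pi"))        # True: left double quotation mark
print(find_all("The $5 fee_ok!", "P"))  # ['_', '!']
```

Note that `$` is not reported by the last call because it is a currency symbol (Sc), not punctuation. The result for a given character depends on the Unicode version bundled with the Python interpreter.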