Collation & text
Locale-aware comparison, sorting, substring search, word/sentence segmentation, truncation, case mapping, and quotation — all backed by ICU's collator and break iterators.
Availability
Everything on this page works identically in PHP, JavaScript, Python, and Java,
except quote(), which is unavailable in JavaScript (see the last section). The
collation and segmentation methods were added to the PHP port in v3.
Compare & sort
sort() takes an optional key accessor (Python) for sorting objects or dicts by a
field; in Java, the collator is itself a Comparator and can be passed directly.
To tailor collation in any port, pass options such as numeric (so that "file2"
sorts before "file10") or caseFirst.
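To make the effect of the numeric option concrete, here is a minimal sketch in plain Python. It does not call this library or ICU: numeric_key is a hypothetical helper that approximates numeric collation by comparing digit runs as integers, and it is passed as a key accessor in the same spirit as sorting dicts by a field.

```python
import re

def numeric_key(text):
    """Hypothetical helper: split into digit and non-digit runs.

    re.split with a capturing group alternates non-digit (even index)
    and digit (odd index) parts, so equal positions always hold the
    same type and the resulting lists compare safely.
    """
    parts = re.split(r"(\d+)", text)
    return [int(p) if i % 2 else p.casefold() for i, p in enumerate(parts)]

names = ["file10", "file2", "file1"]

# Plain code-point order puts "file10" before "file2".
plain = sorted(names)

# Numeric-aware order compares 2 and 10 as numbers.
numeric = sorted(names, key=numeric_key)

# The same key works as an accessor when sorting dicts by a field.
records = [{"name": "file10"}, {"name": "file2"}]
by_name = sorted(records, key=lambda r: numeric_key(r["name"]))
```

A real collator also applies locale-specific letter ordering, which this approximation ignores.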
Substring search
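Collation-aware contains() can ignore case and accents.
Sensitivity levels: base (ignore case and accents; the default), accent (distinguish accents, ignore case), case (distinguish case, ignore accents), variant (distinguish both).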
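As a rough illustration of what base sensitivity means, the sketch below approximates it with the Python standard library only. fold_base and contains_base are hypothetical names, and the technique (NFD decomposition, dropping combining marks, case folding) is a stand-in for a real collator, not how contains() is implemented.

```python
import unicodedata

def fold_base(text):
    """Hypothetical helper: reduce text to case-folded base letters."""
    decomposed = unicodedata.normalize("NFD", text)
    # Drop combining marks such as U+0301 COMBINING ACUTE ACCENT.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

def contains_base(haystack, needle):
    """Approximate a base-sensitivity substring test."""
    return fold_base(needle) in fold_base(haystack)

found = contains_base("Crème Brûlée", "BRULEE")
missing = contains_base("Crème Brûlée", "flan")
```

A locale-tailored collator can disagree with this approximation, for example in locales where an accented letter counts as a separate base letter.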
Word & sentence segmentation
splitWords() keeps only word-like segments (drops whitespace and punctuation),
following the locale's boundary rules — essential for languages without spaces.
splitGraphemes() breaks on user-perceived characters (emoji/ZWJ sequences stay
whole).
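The difference between code points and user-perceived characters can be sketched without ICU. The helpers below are hypothetical and deliberately simplified: split_graphemes_sketch only attaches combining marks, variation selectors, and ZWJ-joined characters to the preceding cluster, and split_words_sketch relies on a regular expression, so it works only for scripts that separate words with spaces or punctuation. Real break iterators follow the full Unicode segmentation rules.

```python
import re
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER

def split_graphemes_sketch(text):
    """Simplified clustering: marks and ZWJ sequences stay with their base."""
    out = []
    for ch in text:
        joins_previous = bool(out) and (
            unicodedata.category(ch).startswith("M")  # marks, variation selectors
            or ch == ZWJ
            or out[-1].endswith(ZWJ)
        )
        if joins_previous:
            out[-1] += ch
        else:
            out.append(ch)
    return out

def split_words_sketch(text):
    """Keep word-like runs; whitespace and punctuation are dropped."""
    return re.findall(r"\w+", text)

# "e" + combining acute, followed by a man-ZWJ-woman-ZWJ-girl family emoji.
sample = "e\u0301\U0001F468\u200d\U0001F469\u200d\U0001F467"
graphemes = split_graphemes_sketch(sample)
words = split_words_sketch("Hello, world!")
```

Here sample contains seven code points but only two clusters, which is the distinction splitGraphemes() is described as preserving.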
Grapheme-safe truncation
Truncates to at most N graphemes, breaking on a word boundary and never splitting a combining sequence.
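The truncation rule can be sketched in a few lines, under simplifying assumptions. truncate_sketch and clusters are hypothetical names; a cluster here is just a base character plus its trailing combining marks, and a word boundary is approximated by whitespace. When no whitespace exists before the cut, this sketch falls back to a hard cut at a cluster boundary, which may differ from the library's behaviour.

```python
import unicodedata

def clusters(text):
    """Group each base character with the combining marks that follow it."""
    out = []
    for ch in text:
        if out and unicodedata.combining(ch):
            out[-1] += ch
        else:
            out.append(ch)
    return out

def truncate_sketch(text, limit):
    """Keep at most `limit` clusters, backing up to whitespace if mid-word."""
    parts = clusters(text)
    if len(parts) <= limit:
        return text
    kept = parts[:limit]
    # If the next cluster is not whitespace, the cut landed inside a word.
    if not parts[limit].isspace():
        for i in range(len(kept) - 1, -1, -1):
            if kept[i].isspace():
                kept = kept[:i]
                break
    return "".join(kept).rstrip()

text = "cafe\u0301 au lait"  # "café" spelled with a combining acute accent
short = truncate_sketch(text, 6)
longer = truncate_sketch(text, 7)
```

With a limit of 6 the cut would land inside "au", so the result backs up to the whole word "café" and keeps the accent attached to its base letter.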
Locale-aware case
Unlike a plain strtoupper/toUpperCase, these honour locale rules — Turkish
dotted/dotless I, German ß, Lithuanian accents, and so on.
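To show why locale matters, the sketch below contrasts Python's locale-independent str.upper() and str.lower() with a hand-written Turkish tailoring. upper_tr and lower_tr are hypothetical helpers written for this example; they cover only the dotted and dotless I pairs and are not the library's implementation.

```python
def upper_tr(text):
    """Turkish-style uppercase: i -> İ (dotted), ı -> I (dotless)."""
    return text.replace("i", "\u0130").replace("\u0131", "I").upper()

def lower_tr(text):
    """Turkish-style lowercase: İ -> i, I -> ı."""
    return text.replace("\u0130", "i").replace("I", "\u0131").lower()

# Locale-independent mapping loses the dot on the capital letter.
plain_upper = "istanbul".upper()

# The Turkish tailoring keeps it: capital I with dot above (U+0130).
turkish_upper = upper_tr("istanbul")

# Capital I without a dot lowercases to dotless ı (U+0131) in Turkish.
turkish_lower = lower_tr("ISPARTA")

# German ß has no single-letter uppercase in the default mapping.
german_upper = "straße".upper()
```

The replacements run before the built-in mapping so that the default rules never see the letters that Turkish treats differently.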
Quotation marks
Wrap text in the locale's own quotation marks, straight from CLDR delimiter data.
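The idea can be sketched with a small lookup table. QUOTES is a hand-copied subset of the primary CLDR quotation delimiters for four languages, and quote_sketch is a hypothetical function for illustration; the library itself reads the full CLDR data, and its actual signature may differ.

```python
# Primary quotation delimiters for a few languages (hand-copied subset).
QUOTES = {
    "en": ("\u201c", "\u201d"),  # “ ”
    "de": ("\u201e", "\u201c"),  # „ “
    "fr": ("\u00ab", "\u00bb"),  # « »
    "ja": ("\u300c", "\u300d"),  # 「 」
}

def quote_sketch(text, locale):
    """Wrap text in the delimiters for the locale's language subtag."""
    lang = locale.replace("_", "-").split("-")[0].lower()
    # Assumption for this sketch: unknown languages fall back to English marks.
    start, end = QUOTES.get(lang, QUOTES["en"])
    return f"{start}{text}{end}"

german = quote_sketch("Hallo", "de-DE")
french = quote_sketch("Bonjour", "fr_FR")
japanese = quote_sketch("こんにちは", "ja")
```

Only the language subtag is consulted here; regional variants and nested (alternate) quotation marks are left out of the sketch.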
quote() is PHP, Python & Java
The CLDR delimiter data isn't exposed by the JavaScript Intl API, so quote()
is omitted there (these tabs show no JS). See
Platform notes.