GHC.Unicode

Copyright (c) The University of Glasgow 2003
License see libraries/base/LICENSE
Maintainer [email protected]
Stability internal
Portability non-portable (GHC extensions)
Safe Haskell Trustworthy
Language Haskell2010

Description

Implementations for the character predicates (isLower, isUpper, etc.) and the conversions (toUpper, toLower). The implementation uses libunicode on Unix systems if that is available.

data GeneralCategory Source

Unicode General Categories (column 2 of the UnicodeData table) in the order they are listed in the Unicode standard (the Unicode Character Database, in particular).

Examples
Expand

Basic usage:

>>> :t OtherLetter
OtherLetter :: GeneralCategory

Eq instance:

>>> UppercaseLetter == UppercaseLetter
True
>>> UppercaseLetter == LowercaseLetter
False

Ord instance:

>>> NonSpacingMark <= MathSymbol
True

Enum instance:

>>> enumFromTo ModifierLetter SpacingCombiningMark
[ModifierLetter,OtherLetter,NonSpacingMark,SpacingCombiningMark]

Read instance:

>>> read "DashPunctuation" :: GeneralCategory
DashPunctuation
>>> read "17" :: GeneralCategory
*** Exception: Prelude.read: no parse

Show instance:

>>> show EnclosingMark
"EnclosingMark"

Bounded instance:

>>> minBound :: GeneralCategory
UppercaseLetter
>>> maxBound :: GeneralCategory
NotAssigned

Ix instance:

>>> import Data.Ix ( index )
>>> index (OtherLetter,Control) FinalQuote
12
>>> index (OtherLetter,Control) Format
*** Exception: Error in array index

Constructors

UppercaseLetter

Lu: Letter, Uppercase

LowercaseLetter

Ll: Letter, Lowercase

TitlecaseLetter

Lt: Letter, Titlecase

ModifierLetter

Lm: Letter, Modifier

OtherLetter

Lo: Letter, Other

NonSpacingMark

Mn: Mark, Non-Spacing

SpacingCombiningMark

Mc: Mark, Spacing Combining

EnclosingMark

Me: Mark, Enclosing

DecimalNumber

Nd: Number, Decimal

LetterNumber

Nl: Number, Letter

OtherNumber

No: Number, Other

ConnectorPunctuation

Pc: Punctuation, Connector

DashPunctuation

Pd: Punctuation, Dash

OpenPunctuation

Ps: Punctuation, Open

ClosePunctuation

Pe: Punctuation, Close

InitialQuote

Pi: Punctuation, Initial quote

FinalQuote

Pf: Punctuation, Final quote

OtherPunctuation

Po: Punctuation, Other

MathSymbol

Sm: Symbol, Math

CurrencySymbol

Sc: Symbol, Currency

ModifierSymbol

Sk: Symbol, Modifier

OtherSymbol

So: Symbol, Other

Space

Zs: Separator, Space

LineSeparator

Zl: Separator, Line

ParagraphSeparator

Zp: Separator, Paragraph

Control

Cc: Other, Control

Format

Cf: Other, Format

Surrogate

Cs: Other, Surrogate

PrivateUse

Co: Other, Private Use

NotAssigned

Cn: Other, Not Assigned

Instances
Instances details
Bounded GeneralCategory

Since: base-2.1

Instance details

Defined in GHC.Unicode

Enum GeneralCategory

Since: base-2.1

Instance details

Defined in GHC.Unicode

Eq GeneralCategory

Since: base-2.1

Instance details

Defined in GHC.Unicode

Ord GeneralCategory

Since: base-2.1

Instance details

Defined in GHC.Unicode

Read GeneralCategory

Since: base-2.1

Instance details

Defined in GHC.Read

Show GeneralCategory

Since: base-2.1

Instance details

Defined in GHC.Unicode

Ix GeneralCategory

Since: base-2.1

Instance details

Defined in GHC.Unicode

generalCategory :: Char -> GeneralCategory Source

The Unicode general category of the character. This relies on the Enum instance of GeneralCategory, which must remain in the same order as the categories are presented in the Unicode standard.

Examples
Expand

Basic usage:

>>> generalCategory 'a'
LowercaseLetter
>>> generalCategory 'A'
UppercaseLetter
>>> generalCategory '0'
DecimalNumber
>>> generalCategory '%'
OtherPunctuation
>>> generalCategory '♥'
OtherSymbol
>>> generalCategory '\31'
Control
>>> generalCategory ' '
Space

isAscii :: Char -> Bool Source

Selects the first 128 characters of the Unicode character set, corresponding to the ASCII character set.

isLatin1 :: Char -> Bool Source

Selects the first 256 characters of the Unicode character set, corresponding to the ISO 8859-1 (Latin-1) character set.

isControl :: Char -> Bool Source

Selects control characters, which are the non-printing characters of the Latin-1 subset of Unicode.

isAsciiUpper :: Char -> Bool Source

Selects ASCII upper-case letters, i.e. characters satisfying both isAscii and isUpper.

isAsciiLower :: Char -> Bool Source

Selects ASCII lower-case letters, i.e. characters satisfying both isAscii and isLower.

isPrint :: Char -> Bool Source

Selects printable Unicode characters (letters, numbers, marks, punctuation, symbols and spaces).

isSpace :: Char -> Bool Source

Returns True for any Unicode space character, and the control characters \t, \n, \r, \f, \v.

isUpper :: Char -> Bool Source

Selects upper-case or title-case alphabetic Unicode characters (letters). Title case is used by a small number of letter ligatures like the single-character form of Lj.

isLower :: Char -> Bool Source

Selects lower-case alphabetic Unicode characters (letters).

isAlpha :: Char -> Bool Source

Selects alphabetic Unicode characters (lower-case, upper-case and title-case letters, plus letters of caseless scripts and modifiers letters). This function is equivalent to isLetter.

isDigit :: Char -> Bool Source

Selects ASCII digits, i.e. '0'..'9'.

isOctDigit :: Char -> Bool Source

Selects ASCII octal digits, i.e. '0'..'7'.

isHexDigit :: Char -> Bool Source

Selects ASCII hexadecimal digits, i.e. '0'..'9', 'a'..'f', 'A'..'F'.

isAlphaNum :: Char -> Bool Source

Selects alphabetic or numeric Unicode characters.

Note that numeric digits outside the ASCII range, as well as numeric characters which aren't digits, are selected by this function but not by isDigit. Such characters may be part of identifiers but are not used by the printer and reader to represent numbers.

isPunctuation :: Char -> Bool Source

Selects Unicode punctuation characters, including various kinds of connectors, brackets and quotes.

This function returns True if its argument has one of the following GeneralCategorys, or False otherwise:

These classes are defined in the Unicode Character Database, part of the Unicode standard. The same document defines what is and is not a "Punctuation".

Examples
Expand

Basic usage:

>>> isPunctuation 'a'
False
>>> isPunctuation '7'
False
>>> isPunctuation '♥'
False
>>> isPunctuation '"'
True
>>> isPunctuation '?'
True
>>> isPunctuation '—'
True

isSymbol :: Char -> Bool Source

Selects Unicode symbol characters, including mathematical and currency symbols.

This function returns True if its argument has one of the following GeneralCategorys, or False otherwise:

These classes are defined in the Unicode Character Database, part of the Unicode standard. The same document defines what is and is not a "Symbol".

Examples
Expand

Basic usage:

>>> isSymbol 'a'
False
>>> isSymbol '6'
False
>>> isSymbol '='
True

The definition of "math symbol" may be a little counter-intuitive depending on one's background:

>>> isSymbol '+'
True
>>> isSymbol '-'
False

toUpper :: Char -> Char Source

Convert a letter to the corresponding upper-case letter, if any. Any other character is returned unchanged.

toLower :: Char -> Char Source

Convert a letter to the corresponding lower-case letter, if any. Any other character is returned unchanged.

toTitle :: Char -> Char Source

Convert a letter to the corresponding title-case or upper-case letter, if any. (Title case differs from upper case only for a small number of ligature letters.) Any other character is returned unchanged.

wgencat :: Int -> Int Source

© The University of Glasgow and others
Licensed under a BSD-style license (see top of the page).
https://downloads.haskell.org/~ghc/8.10.2/docs/html/libraries/base-4.14.1.0/GHC-Unicode.html