libdonut  2.3.2
Application framework for cross-platform game development in C++20
donut::unicode Namespace Reference

Classes

class  UTF8Iterator
 Iterator type for decoding Unicode code points from a UTF-8 string, wrapping an existing iterator for UTF-8 code units.
 
struct  EncodeUTF8FromCodePointResult
 Result of the encodeUTF8FromCodePoint() function.
 
struct  UTF8Sentinel
 Sentinel type for UTF8Iterator.
 
class  UTF8Iterator< It, Sentinel >
 Specialization of UTF8Iterator that works even for input iterators.
 
class  UTF8View
 Non-owning view type for decoding Unicode code points from a contiguous UTF-8 string.
 

Functions

constexpr bool isValidCodePoint (char32_t codePoint) noexcept
 Check if a 32-bit unsigned integer value falls within the valid ranges for a Unicode code point.
 
template<typename InputIt , typename Sentinel >
constexpr std::pair< char32_t, InputIt > decodeCodePointFromUTF8 (InputIt it, Sentinel end)
 Decode a single Unicode code point from an iterator of UTF-8 code units in a UTF-8-encoded string.
 
constexpr EncodeUTF8FromCodePointResult encodeUTF8FromCodePoint (char32_t codePoint) noexcept
 Encode a Unicode code point into a sequence of UTF-8 code units.
 

Variables

constexpr char32_t CODE_POINT_ERROR {0xFFFFFFFF}
 Invalid code point value, used as a return value in Unicode decoding algorithms for conveying encoding errors.
 

Function Documentation

◆ isValidCodePoint()

constexpr bool donut::unicode::isValidCodePoint (char32_t codePoint) noexcept

Check if a 32-bit unsigned integer value falls within the valid ranges for a Unicode code point.

Parameters
codePoint  32-bit code point value to check.
Returns
true if the value is a valid code point, false otherwise.
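
As a rough illustration of what such a range check usually involves, the sketch below assumes the conventional definition of a Unicode scalar value: at most U+10FFFF and outside the surrogate block U+D800 to U+DFFF. The name isValidCodePointSketch is hypothetical, and this page does not state whether libdonut applies exactly these ranges.

```cpp
// Hypothetical stand-in for illustration only; not libdonut's implementation.
// Assumption: "valid" means a Unicode scalar value.
constexpr bool isValidCodePointSketch(char32_t codePoint) noexcept {
    // Code points run from U+0000 to U+10FFFF; U+D800..U+DFFF are reserved
    // for UTF-16 surrogates and cannot be encoded as well-formed UTF-8.
    return codePoint <= 0x10FFFF && (codePoint < 0xD800 || codePoint > 0xDFFF);
}

// Compile-time checks of the assumed ranges.
static_assert(isValidCodePointSketch(U'A'));
static_assert(isValidCodePointSketch(0x10FFFF));
static_assert(!isValidCodePointSketch(0xD800));
static_assert(!isValidCodePointSketch(0x110000));
static_assert(!isValidCodePointSketch(0xFFFFFFFF)); // the documented CODE_POINT_ERROR value
```

Under these assumptions, the documented CODE_POINT_ERROR value of 0xFFFFFFFF can never collide with a real code point, which is what makes it usable as an error marker.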

◆ decodeCodePointFromUTF8()

template<typename InputIt , typename Sentinel >
constexpr std::pair<char32_t, InputIt> donut::unicode::decodeCodePointFromUTF8 (InputIt it, Sentinel end)

Decode a single Unicode code point from an iterator of UTF-8 code units in a UTF-8-encoded string.

Parameters
it  input iterator to a sequence of UTF-8 code units. The expression *it++ must be convertible to char8_t.
end  end iterator or sentinel that marks the end of the UTF-8 code unit sequence.
Returns
a pair where:
  • the first element contains the decoded Unicode code point, or CODE_POINT_ERROR on failure to decode a code point due to an encoding error in the UTF-8 string, and
  • the second element contains the input iterator, positioned at the start of the next UTF-8 code unit after the parsed code point sequence.
Exceptions
any  exception thrown by the iterator implementation.
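
The pair-returning contract described above lends itself to a simple decode loop. The following is a minimal sketch of that pattern using a hypothetical stand-in decoder (decodeSketch, kErrorSketch and decodeAllSketch are illustrative names, not libdonut symbols). Its handling of an empty range and the number of code units it consumes on an error are assumptions; this page does not specify either for the real function.

```cpp
#include <string_view>
#include <utility>
#include <vector>

// Mirrors the documented value of CODE_POINT_ERROR; the name is hypothetical.
constexpr char32_t kErrorSketch = 0xFFFFFFFF;

// Hypothetical stand-in with the same shape as the documented signature.
template <typename InputIt, typename Sentinel>
constexpr std::pair<char32_t, InputIt> decodeSketch(InputIt it, Sentinel end) {
    if (it == end) return {kErrorSketch, it}; // assumption: empty input is an error
    const auto lead = static_cast<unsigned char>(*it);
    ++it;
    if (lead < 0x80) return {static_cast<char32_t>(lead), it}; // ASCII
    int extra = 0;        // number of continuation units expected
    char32_t cp = 0;      // accumulated code point bits
    char32_t minimum = 0; // smallest value allowed for this length (rejects overlong forms)
    if ((lead & 0xE0) == 0xC0) { extra = 1; cp = static_cast<char32_t>(lead & 0x1F); minimum = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = static_cast<char32_t>(lead & 0x0F); minimum = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = static_cast<char32_t>(lead & 0x07); minimum = 0x10000; }
    else return {kErrorSketch, it}; // stray continuation unit or invalid lead
    for (int i = 0; i < extra; ++i) {
        if (it == end) return {kErrorSketch, it}; // truncated sequence
        const auto unit = static_cast<unsigned char>(*it);
        if ((unit & 0xC0) != 0x80) return {kErrorSketch, it}; // leave the offending unit unconsumed
        ++it;
        cp = (cp << 6) | static_cast<char32_t>(unit & 0x3F);
    }
    if (cp < minimum || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return {kErrorSketch, it};
    return {cp, it};
}

// Decode a whole string by feeding the returned iterator back into the decoder.
inline std::vector<char32_t> decodeAllSketch(std::string_view utf8) {
    std::vector<char32_t> out;
    auto it = utf8.begin();
    while (it != utf8.end()) {
        const auto [codePoint, next] = decodeSketch(it, utf8.end());
        out.push_back(codePoint); // may be kErrorSketch; the caller decides how to react
        it = next;                // the decoder always advances past at least one unit here
    }
    return out;
}
```

Because the error value is returned alongside an advanced iterator, a caller can keep decoding after a malformed sequence instead of aborting, for example by substituting U+FFFD for each error.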

◆ encodeUTF8FromCodePoint()

constexpr EncodeUTF8FromCodePointResult donut::unicode::encodeUTF8FromCodePoint (char32_t codePoint) noexcept

Encode a Unicode code point into a sequence of UTF-8 code units.

Parameters
codePoint  code point to encode.
Returns
a struct containing an array of up to 4 UTF-8 code units along with a size that defines the length of the sequence in the array, starting at index 0.
Note
The returned array of code units is NOT guaranteed to be null-terminated. The size value must be used to determine the actual length of the code unit sequence.
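
To show why the size matters, here is a minimal sketch of an encoder that returns a fixed four-element array plus a length, and of a caller that copies only the used prefix. EncodeResultSketch, its member names, encodeSketch and appendUtf8Sketch are all hypothetical; the member names of EncodeUTF8FromCodePointResult are not given on this page. The sketch stores code units as unsigned char and returns a size of 0 for out-of-range input, both of which are assumptions rather than documented libdonut behavior.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical result type: up to 4 code units, valid in [0, size).
struct EncodeResultSketch {
    std::array<unsigned char, 4> codeUnits{};
    std::size_t size = 0;
};

// Hypothetical stand-in encoder; not libdonut's implementation.
constexpr EncodeResultSketch encodeSketch(char32_t cp) noexcept {
    EncodeResultSketch r{};
    if (cp < 0x80) { // 1 unit: 0xxxxxxx
        r.codeUnits[0] = static_cast<unsigned char>(cp);
        r.size = 1;
    } else if (cp < 0x800) { // 2 units: 110xxxxx 10xxxxxx
        r.codeUnits[0] = static_cast<unsigned char>(0xC0 | (cp >> 6));
        r.codeUnits[1] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
        r.size = 2;
    } else if (cp < 0x10000) { // 3 units, excluding surrogates
        if (cp >= 0xD800 && cp <= 0xDFFF) return r; // assumption: size 0 signals failure
        r.codeUnits[0] = static_cast<unsigned char>(0xE0 | (cp >> 12));
        r.codeUnits[1] = static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F));
        r.codeUnits[2] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
        r.size = 3;
    } else if (cp <= 0x10FFFF) { // 4 units
        r.codeUnits[0] = static_cast<unsigned char>(0xF0 | (cp >> 18));
        r.codeUnits[1] = static_cast<unsigned char>(0x80 | ((cp >> 12) & 0x3F));
        r.codeUnits[2] = static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F));
        r.codeUnits[3] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
        r.size = 4;
    }
    return r; // size stays 0 for values above U+10FFFF
}

// Append exactly `size` code units; never treat the array as a C string.
inline void appendUtf8Sketch(std::string& out, char32_t cp) {
    const EncodeResultSketch r = encodeSketch(cp);
    out.append(reinterpret_cast<const char*>(r.codeUnits.data()), r.size);
}
```

A four-unit encoding fills the whole array, leaving no room for a terminator, so passing the array to an API that expects a null-terminated string would read past its end.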

Variable Documentation

◆ CODE_POINT_ERROR

inline constexpr char32_t donut::unicode::CODE_POINT_ERROR {0xFFFFFFFF}

Invalid code point value, used as a return value in Unicode decoding algorithms for conveying encoding errors.