Repertorium of Old Bulgarian Literature and Letters

Maintained by: David J. Birnbaum (djbpitt@gmail.com)

Last modified: 2016-07-07T09:35:51+0000

Unicode report

This report identifies Private Use Area (PUA) characters and superscript characters in manuscript description files. It is sorted by filename and, within each file, lists only the items that include PUA or superscript characters, and only the textual snippets that contain those characters.

PUA characters are in red and their Unicode codepoint values are reported in parentheses after the sample text, along with a count (in square brackets) of how many times each PUA character occurs in that sample.
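
The per-snippet listing described above can be approximated as follows. This is a minimal sketch, not the code that generates this report: the function name `pua_report` is hypothetical, and it assumes that "PUA character" means any character whose Unicode general category is `Co` (private use).

```python
import unicodedata
from collections import Counter


def pua_report(snippet: str) -> str:
    """Return PUA code points found in snippet, each with its count.

    Hypothetical helper: formats each entry as 'U+XXXX [n]', mirroring
    the 'codepoint plus count in square brackets' layout of the report.
    """
    # General category "Co" covers the BMP Private Use Area and the two
    # supplementary private use planes.
    counts = Counter(ch for ch in snippet if unicodedata.category(ch) == "Co")
    return ", ".join(f"U+{ord(ch):04X} [{n}]" for ch, n in sorted(counts.items()))


print(pua_report("а\uE001б\uE001\uE000"))
```

A snippet with no PUA characters yields an empty string, so a caller could use that to skip the snippet entirely.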

As of Unicode 9.0, superscript Cyrillic letters are in Cyrillic Extended-A U+2DE0–U+2DFF and Cyrillic Extended-B U+A674–U+A67B. The superscript characters are in blue and their Unicode codepoint values are reported in parentheses after the sample text, along with a count (in square brackets) of how many times each superscript character occurs in that sample.

Unicode does not provide combining superscript versions of all Cyrillic letters. Even if we used the ones that are available, we would have to fall back on an alternative for the others, which would introduce inconsistencies into the representation of superscription in the corpus. For that reason, our policy is to represent all instances of superscription by wrapping markup around regular Cyrillic letters, so that, for example, е<seg rend="sup">г</seg> would be rendered as е^г. If a titlo, pokrytie, or an accentual or breathing diacritic appears over the superscript letter, it should be included with the letter inside the same <seg> element.
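
One possible way to bring a file into line with this policy is sketched below. It is an illustration under stated assumptions, not the project's own tooling: the names `is_superscript`, `base_letter`, and `to_markup` are hypothetical, the regular letter is guessed from the Unicode character name, and diacritics that follow a superscript letter are not moved inside the <seg> element, so that step would still need manual attention.

```python
import unicodedata
from typing import Optional

# Code point ranges quoted in the report text (assumed complete for this sketch).
SUPERSCRIPT_RANGES = ((0x2DE0, 0x2DFF), (0xA674, 0xA67B))


def is_superscript(ch: str) -> bool:
    """True if ch falls in one of the superscript Cyrillic ranges above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in SUPERSCRIPT_RANGES)


def base_letter(ch: str) -> Optional[str]:
    """Guess the regular lowercase letter from the character name, or None."""
    prefix = "COMBINING CYRILLIC LETTER "
    name = unicodedata.name(ch, "")
    if not name.startswith(prefix):
        return None
    try:
        return unicodedata.lookup("CYRILLIC SMALL LETTER " + name[len(prefix):])
    except KeyError:
        # No regular letter with a matching name (e.g. a ligature).
        return None


def to_markup(text: str) -> str:
    """Replace superscript letters with <seg rend="sup"> around regular letters."""
    out = []
    for ch in text:
        base = base_letter(ch) if is_superscript(ch) else None
        # Characters without a known regular counterpart are left unchanged.
        out.append(f'<seg rend="sup">{base}</seg>' if base else ch)
    return "".join(out)


print(to_markup("е\u2DE2"))
```

Superscript characters for which no regular letter is found by name are passed through untouched, so they would still show up in a report like this one and could be handled by hand.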