Repertorium of Old Bulgarian Literature and Letters

Last modified: 2016-07-06T17:10:02+0000

Unicode superscript report

This report identifies Unicode superscript Cyrillic characters in manuscript description files. As of Unicode 9.0, these characters are located in Cyrillic Extended-A (U+2DE0–U+2DFF) and Cyrillic Extended-B (U+A674–U+A67B). The report is sorted by filename. Within each file, it lists only the items that include superscript characters, and for each item only the textual snippets that contain them. The superscript characters are in red, and their Unicode code point values are reported in parentheses after the sample text, along with a count (in square brackets) of how many times each superscript character occurs in that sample.

Because Unicode does not provide combining superscript versions of all Cyrillic letters, using the ones that are available would still force us to fall back on an alternative for the others, which would introduce inconsistencies into the representation of superscription in the corpus. For that reason, our policy is to represent all instances of superscription by wrapping markup around regular Cyrillic letters, so that, for example, е<seg rend="sup">г</seg> would be rendered as е followed by a superscript г. If a titlo, pokrytie, or an accentual or breathing diacritic appears over the superscript letter, it should be included with the letter inside the same <seg> element.
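The check described above can be approximated with a short script. The following is a minimal sketch in Python, not the code that generates this report: the function names superscript_counts and format_sample are hypothetical, and the output format only imitates the layout described above (code points in parentheses, counts in square brackets). The two character ranges are the ones named in this report.

```python
import re
from collections import Counter

# Ranges named in the report: Cyrillic Extended-A U+2DE0-U+2DFF and
# Cyrillic Extended-B U+A674-U+A67B.
SUPERSCRIPT_RE = re.compile("[\u2DE0-\u2DFF\uA674-\uA67B]")


def superscript_counts(snippet):
    """Map each superscript character in snippet to its count.

    Keys are code points written as "U+XXXX" (hypothetical helper).
    """
    counts = Counter(SUPERSCRIPT_RE.findall(snippet))
    return {f"U+{ord(ch):04X}": n for ch, n in sorted(counts.items())}


def format_sample(snippet):
    """Return the snippet followed by its code points and counts,
    or None when the snippet has no superscript characters.

    The layout is an assumption modeled on the report's description.
    """
    counts = superscript_counts(snippet)
    if not counts:
        return None
    details = ", ".join(f"{cp} [{n}]" for cp, n in counts.items())
    return f"{snippet} ({details})"


if __name__ == "__main__":
    # е followed by U+2DE2 COMBINING CYRILLIC LETTER GHE
    print(format_sample("\u0435\u2DE2"))
```

A snippet that yields None here contains no characters from the two ranges and would not appear in a report of this kind.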