Shibboleth User Manual
Shibboleth is designed to assist in the text entry of foreign languages by providing a visual interface to help you select the correct characters. At the time of writing, Shibboleth supports Hebrew, Aramaic, Greek, Syriac, Arabic, Ethiopic, Armenian, Coptic, Ugaritic, South Arabian, Egyptian Hieroglyphs, transliteration and Akkadian, Hittite and Old Persian cuneiform.
Shibboleth is designed to output the foreign languages in Unicode, the international standard for multi-lingual encoding. If you don’t know much about Unicode, the next section provides a brief overview. If you’re Unicode savvy, skip it.
In the bad old days, each character in a given font had a unique 1-byte code, a byte consisting of 8 bits. A bit is a unit that can store a one or a zero. So 8 bits allowed one font to have a maximum of 256 characters (2 to the 8th power), and every font shared the same 256 available code points. So that computer users could switch fonts with impunity, standards organizations assigned each letter of the Roman alphabet, as well as European numerals and common punctuation marks, to set code points. In this 8-bit world, the common solution to multilingual encoding was to reuse the code points assigned to Roman characters to encode foreign scripts. For example, the ASCII code point for the letter ‘A’ (01000001 in base 2, which can be expressed as ‘41’ in base 16 or ‘65’ in base 10) would be used for a Greek Alpha. Then if one wanted to see an Alpha instead of an A, one would change the font to a Greek font.
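If you want to check the arithmetic in the paragraph above, a few lines of Python will do it. This is only an illustration of the base conversions and has nothing to do with how Shibboleth itself works.

```python
# The letter 'A' in ASCII, expressed in base 10, base 2 and base 16.
code = ord("A")
print(code)                 # 65
print(format(code, "08b"))  # 01000001 (eight bits)
print(format(code, "02X"))  # 41 (hexadecimal)

# Eight bits give 2**8 distinct values, so one 8-bit font holds at most 256 characters.
print(2 ** 8)               # 256
```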
There were four major problems with this approach: 1) Some languages needed more than 256 characters and thus required font changes even within words to encode all the marks. 2) There was no standardization for the encoding of the marks. For example, one Greek font might place Chi on the ASCII code point for X, and another might place it on C. Then if a user switched from one Greek font to another, the Greek text would break. 3) There was no way to write universally useful searching tools for working in foreign languages, because to a search engine, an Alpha was indistinguishable from an ‘A’ (and from a Hebrew Aleph, and so forth). These letters all used the same code point internally. 4) There was no universal solution for right-to-left languages. For years many computer platforms required users to type Hebrew backwards!
The solution to these problems was to switch from 8-bit encoding to 16-bit encoding: 2 bytes per character instead of one. This provides 65,536 unique code points (2 to the 16th power) for character assignments. Now a Greek Alpha can have a separate code point from those used to encode an English A and a Hebrew Aleph. The Unicode Consortium is responsible for publishing the standard code points for each script, so users can switch at will between any Unicode fonts that have the characters they need without having to change their encoding scheme. Like the humble 8-bit byte of yesteryear, 16-bit Unicode code points are often described using hexadecimal digits with values from 0 to f (in base 16, the digit after 9 is ‘a’, and ‘10’ means sixteen). The Roman A is still assigned to 41 (hexadecimal) for backwards compatibility, but in Unicode we’ll more often call this 0041 (often written U+0041) to reflect the use of 16 bits. The Greek Alpha is assigned to 0391 and the Hebrew Aleph is 05D0.
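To see these assignments for yourself, you can ask Python’s standard unicodedata module for the official character names. The names come from the Unicode Character Database; the snippet is just a lookup, not part of Shibboleth.

```python
import unicodedata

# One code point per character, regardless of script or font.
for ch in ("A", "\u0391", "\u05D0"):
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
# U+0041  LATIN CAPITAL LETTER A
# U+0391  GREEK CAPITAL LETTER ALPHA
# U+05D0  HEBREW LETTER ALEF

print(2 ** 16)  # 65536 values addressable with 16 bits
```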
Over time, the 65,536 code points available with 16 bits began to get crowded, but Unicode’s code space was extended beyond 16 bits: it now reaches U+10FFFF, which takes 21 bits to express. So many of the newer additions to Unicode, such as Ugaritic, Egyptian Hieroglyphs and various cuneiform scripts, actually use more than 16 bits for each character.
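Shibboleth runs on .NET, whose strings are stored as UTF-16, so a character above U+FFFF is held as two 16-bit code units called a surrogate pair. The sketch below shows the standard UTF-16 calculation for the first Ugaritic letter (U+10380); it is an illustration of the encoding rule, not code taken from Shibboleth.

```python
def utf16_code_units(code_point: int) -> list[int]:
    """Split a code point into UTF-16 code units (a surrogate pair above U+FFFF)."""
    if code_point <= 0xFFFF:
        return [code_point]
    offset = code_point - 0x10000       # 20 bits remain after removing the first 64K
    high = 0xD800 + (offset >> 10)      # top 10 bits go in the high surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits go in the low surrogate
    return [high, low]

UGARITIC_ALPA = 0x10380
print([f"{u:04X}" for u in utf16_code_units(UGARITIC_ALPA)])  # ['D800', 'DF80']
print([f"{u:04X}" for u in utf16_code_units(0x05D0)])         # ['05D0']
```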
Shibboleth is designed for Windows 7, Windows Vista and Windows XP. It requires the installation of version 4 of the .NET framework.
If you intend to copy text into Microsoft Word, version 2010 is recommended.
The installation of Shibboleth is fairly straightforward, but a number of supporting files must also be installed for full functionality.
- If you haven't done so already, install version 4 of the .NET framework. There is a link to Download .NET on the Shibboleth installation page. It can also be found on Microsoft.com. In order for the Shibboleth install to work, you may need to reboot after installing .NET, even if the .NET installer doesn't prompt you to do so.
- Click the Download Shibboleth button on www.logos.com/Shibboleth.
- Shibboleth uses embedded versions of all the necessary fonts, but if you are copying text into a word processor, you will need to download and install the various fonts. Users of the current version of Logos Bible Software may already have all of these fonts installed. There are links to download the fonts at www.logos.com/Shibboleth. These fonts are typically copied to c:\Windows\Fonts. At the time of writing, the fonts used by Shibboleth are: SBL Hebrew, Ezra SIL, Gentium, GentiumAlt, BibliaLS, SBL Greek, Serto Jerusalem, East Syriac Adiabene, Sinaiticus, Abyssinica SIL, Antinoou, Musnad Sabaic Unicode, Charis SIL, Scheherazade, Zebul Open, Aegyptus, CuneiformComposite, Assurbanipal, CuneiformOB, Santakku, SantakkuM, Akkadian, UllikummiA, B and C, Bisitun and Persepolis. Some of these fonts have undergone extensive revisions, so it is important to have the latest version. Shibboleth also uses some fonts that come with Windows, such as Estrangelo Edessa, Sylfaen and Palatino Linotype.
To launch Shibboleth, click on the short-cut that was added to your Start menu during installation. The default location for the short-cut is Start | All Programs | Logos Bible Software | Shibboleth.
You’ll see something like this:
The main ribbon, stretching across the top of Shibboleth, is divided into three sections. The first section with the green button, the Language Selector, consists of two drop-down boxes. The first drop down (which does the same thing as clicking the green button) allows you to choose which language you intend to encode. You’ll notice that both Aramaic and Hebrew are listed, even though they use the same script and the same Unicode code points. The difference is that when these are copied, they will be wrapped in different XML tags that are used in Logos books to allow one to search Aramaic or Hebrew separately or KeyLink to different lexicons.
The second drop-down allows one to select between different views for the Text Input Box. Many languages only have one option here, but for Ethiopic and Armenian, you can choose between seeing Unicode in the Text Input Box or seeing the names of the syllables typed out in Roman characters. For Hebrew and Syriac, you can choose between seeing Unicode in the Text Input Box as you type or seeing an old-school ASCII equivalent. Regardless of what you see as you type in the Text Input Box, you will see Unicode in the Preview Pane.
The middle section of the ribbon, the Copy Options, is dominated by a yellow button. This is the button you click to copy the contents of the Preview Pane to your Windows clipboard. From there, you can paste the text into other applications. In the current release, the shortcut for copy is Alt+c rather than the Windows standard of Ctrl+c, because we are using the Control shift state for some rare Coptic characters. In the next release, we will likely free up Ctrl+c and Ctrl+v (for pasting) so that the standard Windows shortcuts work. For now, use Alt+c and Alt+v.
The options to the left of the Copy Button allow you to choose the format of the text to be copied. The default option is XML, which copies Unicode characters as plain text, wrapped in XML tags that identify the language of the enclosed string. This is the format Logos uses to store text in its XML source files.
‘Escaped’ indicates the use of hexadecimal codes to represent the Unicode text in a way that is sometimes easier for some programs to manipulate. In this mode, for example, a Hebrew aleph would copy out as &#x05D0; instead of the literal character א. Escaped mode also wraps the text in XML tags.
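A minimal sketch of this kind of escaping, assuming hexadecimal XML character references of the form &#xHHHH;. The exact form Shibboleth produces (letter case, zero padding, whether &, < and > are also escaped) is an assumption here, and to_escaped is a hypothetical helper, not a Shibboleth function.

```python
import html
from xml.sax.saxutils import escape

def to_escaped(text: str) -> str:
    """Hypothetical helper: escape XML markup, then write non-ASCII characters as &#xHHHH;."""
    text = escape(text)  # & < > become &amp; &lt; &gt;
    return "".join(
        ch if ord(ch) < 0x80 else f"&#x{ord(ch):04X};"
        for ch in text
    )

print(to_escaped("\u05D0"))                          # &#x05D0;
print(to_escaped("\U00010380"))                      # &#x10380;
print(html.unescape(to_escaped("\u05D0")) == "\u05D0")  # True: the escape round-trips
```

Any program that understands XML character references can turn the escaped text back into the literal characters.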
The ASCII option allows one to copy text out in our older, pre-Unicode encoding formats. These are useful for us at Logos, because they allow us to use Shibboleth to maintain older resources. (However, for the transliteration ‘language’ only Unicode and Escaped are really supported, so Shibboleth may not be used to edit transliterations encoded in SemiticaDict, our old transliteration solution.)
The final Copy option, RTF, puts a Rich Text string in the clipboard that preserves the chosen font and doesn't wrap the text in XML tags. This option was added to make pasting into Microsoft Word, Logos Bible Software or other Unicode applications easier.
The right-most section on the ribbon is dominated by a blue Paste Button. You can copy text into your Windows clipboard from an external document and paste it into Shibboleth for editing by clicking the Paste Button. Shibboleth needs to know if what is being pasted in is Unicode or ASCII, so that it will treat the string properly. If you are working on Logos XML documents, you may copy the XML language tags as well as the language string itself. These tags will be stripped when pasted into Shibboleth, but added again when you click the Copy button, making it easy to paste right over the old word, tags and all.
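The strip-on-paste, re-wrap-on-copy behavior can be sketched as follows. The element name foreign and its lang attribute are hypothetical placeholders, since the actual Logos language tags are not described in this manual; the functions are illustrations, not Shibboleth’s code.

```python
import re
from xml.sax.saxutils import escape, unescape

TAG = re.compile(r"<[^>]+>")

def paste_from_xml(clipboard: str) -> str:
    """Drop any XML tags and decode &amp; &lt; &gt; so only the language string is edited."""
    return unescape(TAG.sub("", clipboard))

def copy_as_xml(text: str, lang: str) -> str:
    """Re-wrap the edited string in a hypothetical language tag."""
    return f'<foreign lang="{lang}">{escape(text)}</foreign>'

pasted = '<foreign lang="heb">\u05E9\u05DC\u05D5\u05DD</foreign>'
word = paste_from_xml(pasted)
print(word == "\u05E9\u05DC\u05D5\u05DD")   # True: tags removed for editing
print(copy_as_xml(word, "heb") == pasted)   # True: tags restored on copy
```

Because the tags come back on copy, the result can be pasted straight over the original tagged word.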
The Application Window
Below the ribbon, the main Shibboleth screen is divided into 4 sections: three stacked vertically along the left side, and a long 4th section covering the right side of the application window.
The top left pane is the Preview Pane. This allows you to see the foreign language text previewed in the chosen font. The name of the font being used for display is listed above the Preview Pane. If more than one font is listed, you can click on a font name to change fonts. This is particularly useful in languages like Syriac, where there are many different scripts used to write the same language. In this case, switching fonts is not merely cosmetic but vitally important to helping you select the correct characters.
Beneath the Preview Pane is the Text Edit Box. Shibboleth supports many methods of text entry (described in the next section), but they all put characters into the Text Edit Box. This is where you edit the string of text you are encoding.
This window always uses unescaped characters. This is an important change from previous encoding methods used by our contractors, where high ASCII values often had to be typed in escaped form: for example, instead of a literal broken vertical bar (¦), the escaped high ASCII value &#166; was used. Escaped high ASCII was convenient because there were no standardized keys for the high ASCII values. However, Shibboleth does not allow one to type the escaped form: such a form will be treated as the literal text &#xxx; instead of being converted to the proper character.
The principal benefit of using literal high ASCII characters instead of escaped high ASCII in the ASCII versions of the Text Edit Box is that if one types the wrong key, one only has to press Backspace once, rather than six times (once for each character of &#166;), to get rid of the incorrect character. When users were typing escaped forms in LGM, we often encountered doubled or missing semicolons and ampersands for this and similar reasons, so input accuracy should improve now that escaped high ASCII is no longer typed.
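The difference in Backspace presses can be seen by comparing the escaped and literal forms. Python’s html.unescape is used here only to show what the escape stands for; Shibboleth itself does not convert escapes typed into the Text Edit Box, and the sketch assumes each Backspace removes one character.

```python
import html

escaped = "&#166;"                  # what was typed in the old workflow
literal = html.unescape(escaped)    # the single character it represents

print(literal == "\u00A6")          # True: U+00A6 BROKEN BAR
print(len(escaped), len(literal))   # 6 1: characters a Backspace would have to remove
```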
The x in the upper right corner of the Text Input Box clears the box.
Below the Text Edit Box is the Virtual Keyboard. On the right side of the application window are the Character Palettes. These will be described in the next section.
Text Entry Methods
There are a number of different ways to enter text into Shibboleth.
- Type text directly into the Text Edit Box. The Keyboard Pane at the bottom of the screen shows the currently selected encoding system, mapped to keys on a keyboard. The Shibboleth keyboards use many shift states to access a wide range of characters. The main shift states are: normal (unshifted), Shift, AltGr (the right Alt key or Ctrl+Alt), Shift+AltGr and Ctrl+Shift. As you hold down the various keys that toggle the shift states, the keyboard down below will change to show what is encoded on the various shift planes. (Other less commonly used shift states that are supported include the Ctrl state, the SGCaps state and the Shift+SGCaps state. SGCaps is entered by toggling the CapsLock key for languages that support Swiss German Caps instead of using CapsLock to toggle to the Shift state.)
- The keys on the on-screen Virtual Keyboard can also be clicked on with a mouse to insert the character. The keys that toggle the shift states can be clicked as well, and they will un-toggle when either clicked again, or the physical key is pressed. You can also hover your mouse cursor over the keys on the onscreen keyboard to get a pop-up with the name for the character, which can be useful in identifying marks that are very similar in appearance. The keyboard and the mouse can be used in combination, for example by holding the shift key to toggle to the shift plane, but clicking on the key with the mouse.
- The right-most pane contains one or more character palettes with characters organized in a logical fashion. The Hebrew palette, for example, is sorted to assist in encoding marks in the correct order: consonants are listed at the top, followed by consonant distinctors, then vowel marks, then accent marks, Hebrew punctuation marks, and commonly used non-Hebrew characters. Some languages have more than one palette, selectable by the tabs at the top of the pane. For example, Hebrew and Armenian each have one palette that sorts the characters by shape, making it easier for people unfamiliar with the scripts to select the correct character, and another that sorts the letters alphabetically, a more intuitive layout for trained eyes familiar with the script. As with the Virtual Keyboard in the lower-left pane, hovering the mouse cursor over the characters in the palette will display the character name to help identify the correct character. Some of the cuneiform languages support character palettes that organize the marks by their numbers in various published sign lists. In these cases, hovering over a character gives you its sign list number.
- To reduce clutter, and reduce the potential of mistaking a very common character for a similarly shaped rare character, some extremely rare marks are placed on the keyboard but not on the palettes. For this and other reasons, users may sometimes wish to find marks in the printed documentation of the encoding standards. If you are using one of the encodings where you see Roman characters in the Text Edit Box, you can make use of the escaped high ASCII characters listed in the documentation, but you cannot type the escaped form (&#166;, for example) directly. However, for ease of entry, it is possible to use the left Alt key and the Number Pad to enter high ASCII values. Simply hold the LeftAlt key, type 0 followed by the high ASCII value on the Number Pad, and then release the LeftAlt key. It is important not to forget the 0. So if the print documentation has &#166;, you can type LeftAlt+0166 and a literal ¦ will be inserted into the Text Edit Box, and the Unicode character it converts to will display in the Preview Pane.
- For those encoding methods that expect Unicode in the Text Edit Box, you can type a 4-digit hexadecimal Unicode value, such as 05D0 for the Hebrew letter Aleph, and then press Alt+x to convert it into the literal Unicode character. This can be especially helpful when inputting a very rare character that the font supports but that is not on the character palette. It also lets you find the character you need in the official Unicode documentation and use its code to enter the correct character in Shibboleth. This method works reliably only if the encoding option in the drop-down next to the language selector is one of the Unicode methods, and the language doesn’t have a keyboard that converts the letters a through f into a foreign script as they are typed. (This will be most useful for Transliteration, since the Charis SIL font supports a wide range of characters. But it can also be used with Ethiopic and Armenian in the utf8 encoding mode.) Alt+x always grabs exactly the previous 4 characters when converting the code you type to Unicode, which means it does not work for Ugaritic, which needs more than 4 hexadecimal digits to describe its characters. However, this limitation was deemed preferable to the behavior of Alt+x in MS Word: if you wanted to put a macron over the letter a and typed a0304, Word treats the ‘a’ as a valid hexadecimal digit, grabs five characters back and returns a gibberish character.
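The two number-based entry methods in the list above can be modelled in a few lines. This is a sketch of the described behavior, not Shibboleth’s .NET code. It assumes the Windows ANSI code page is Windows-1252 (cp1252), as on typical Western systems, for LeftAlt+0nnn. greedy_alt_x is a hypothetical illustration of the problem attributed to Word, not Word’s actual algorithm.

```python
import string

def alt_numpad(digits: str, codepage: str = "cp1252") -> str:
    """Model of LeftAlt+0nnn: the leading 0 selects the ANSI code page (assumed cp1252)."""
    if not digits.startswith("0"):
        # Not modelled: without the leading 0, Windows uses the OEM code page instead.
        raise ValueError("leading 0 required")
    value = int(digits)
    if value > 0xFF:
        raise ValueError("Alt+0nnn covers single-byte values only")
    return bytes([value]).decode(codepage)

def alt_x(text: str) -> str:
    """Alt+x as described above: always read exactly the last 4 characters as hex."""
    if len(text) < 4:
        raise ValueError("need 4 hexadecimal digits")
    return text[:-4] + chr(int(text[-4:], 16))

def greedy_alt_x(text: str, max_digits: int = 5) -> str:
    """Hypothetical greedy variant: consume every trailing hex digit, up to max_digits."""
    n = 0
    while n < min(max_digits, len(text)) and text[-1 - n] in string.hexdigits:
        n += 1
    if n == 0:
        raise ValueError("no hexadecimal digits before the cursor")
    return text[:-n] + chr(int(text[-n:], 16))

print(alt_numpad("0166") == "\u00A6")           # True: broken bar
print(alt_x("05D0") == "\u05D0")                # True: Hebrew aleph
print(alt_x("a0304") == "a\u0304")              # True: a + combining macron
print(greedy_alt_x("a0304") == "\U000A0304")    # True: one unassigned code point
```

The fixed 4-digit rule is what keeps the ‘a’ in a0304 safe, and it is also why a 5-digit Ugaritic code such as 10380 cannot be entered this way.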
Greek, Hebrew and Aramaic Notes
Our contract partners will need to read the language-specific documentation found in the following documents: ‘grc - Logos Biblical Greek Encoding.pdf’ and ‘heb - Logos Biblical Hebrew Encoding.pdf’. Consult these manuals before and while keying these languages to encode them properly.
The Hebrew and Aramaic character palettes now allow you to choose between seeing the letters in alphabetic order or seeing them grouped by shape to assist in selecting between commonly confused letters.
Arabic and Syriac Notes
For Arabic and Syriac, each letter can have up to four shapes, depending on whether it stands alone or occurs at the beginning, middle or end of a word. In Shibboleth, all four forms are shown in the Character Palettes, making it easier to select the right character no matter which shape the letter is taking.
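In Unicode, the four shapes are normally produced by the font from a single stored letter. For Arabic only, Unicode also keeps separate “presentation form” code points for compatibility, and NFKC normalization folds them back to the base letter. The sketch below shows this for beh. This manual does not say which code point Shibboleth inserts for each palette shape, so treat this as background rather than a description of Shibboleth’s behavior.

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH, the letter normally stored in text
# Compatibility presentation forms: isolated, final, initial, medial.
forms = ["\uFE8F", "\uFE90", "\uFE91", "\uFE92"]

for form in forms:
    base = unicodedata.normalize("NFKC", form)
    print(f"U+{ord(form):04X} {unicodedata.name(form)} -> U+{ord(base):04X}")

print(all(unicodedata.normalize("NFKC", f) == BEH for f in forms))  # True
```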
The Syriac abbreviation mark does not currently render properly in .NET 4. The Syriac character palette now allows you to choose between seeing the letters in alphabetic order or seeing them grouped by shape to assist in selecting between commonly confused letters.
Our contractors should read ‘ara – Buckwalter Writing System.pdf’ and ‘syr - Logos Syriac Encoding.pdf’ for more information on keying these languages.
Ethiopic and Armenian Notes
The ASCII text edit modes of Ethiopic and Armenian are handled somewhat differently. Ethiopic is based on one mark per syllable. Since there are more than 40 consonants and 8 vowel sounds, there is no good way to use one ASCII character per Ethiopic symbol. Instead, each Ethiopic symbol is typed out with 2 to 4 letters (1 or 2 letters for the consonant value and 1 or 2 letters for the vowel), with a forward slash placed between symbols. Since there isn’t a one-to-one relationship between the characters in the Preview Pane and the characters in the Text Edit Box, removing a character completely requires pressing the Backspace key several times. The forward slashes provide useful visual cues for how far to delete.
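The slash-delimited scheme can be sketched as a small converter. For the basic consonants used here, the Unicode Ethiopic block places each consonant’s vowel orders in consecutive code points, so a syllable is a consonant base plus a vowel offset. The romanized spellings below are hypothetical and are not Shibboleth’s actual scheme (see ‘eth - Geez Writing System.pdf’ for that). The sketch also assumes the slash separates whole syllables.

```python
# Hypothetical romanization: illustrative spellings, not Shibboleth's actual Ethiopic scheme.
CONSONANT_BASE = {"h": 0x1200, "l": 0x1208, "m": 0x1218, "r": 0x1228, "s": 0x1230}
# Offset of each vowel order within a consonant's row in the Ethiopic block.
VOWEL_ORDER = {"a": 0, "u": 1, "i": 2, "aa": 3, "ee": 4, "e": 5, "o": 6}

def syllable(spelling: str) -> str:
    """Convert one spelling such as 'mu' to a single Ethiopic syllable."""
    for vowel in sorted(VOWEL_ORDER, key=len, reverse=True):  # try 'aa' before 'a'
        consonant = spelling[: -len(vowel)]
        if spelling.endswith(vowel) and consonant in CONSONANT_BASE:
            return chr(CONSONANT_BASE[consonant] + VOWEL_ORDER[vowel])
    raise ValueError(f"unknown syllable: {spelling!r}")

def to_ethiopic(entry: str) -> str:
    """Slashes separate syllables, so the edit-box string splits cleanly."""
    return "".join(syllable(part) for part in entry.split("/") if part)

def delete_last_syllable(entry: str) -> str:
    """Everything after the last slash belongs to the final syllable."""
    return entry.rsplit("/", 1)[0] if "/" in entry else ""

print(to_ethiopic("la/mu") == "\u1208\u1219")  # True
print(delete_last_syllable("la/mu/ra"))        # la/mu
```

Deleting back to the previous slash removes exactly one Ethiopic character from the preview, which is why the slashes are a useful guide.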
Armenian is alphabetic rather than syllabic, but because there are so many consonants, and we encounter Armenian so infrequently, no serious thought was given to the best encoding scheme – instead a scheme similar to Ethiopic was designed, with each letter spelled out and separated by forward slashes. As with Ethiopic, using the character palettes will be the easiest input method, and the forward slashes provide a guide for when deletion is necessary. Armenian is a good candidate for getting a better encoding system eventually.
Both Ethiopic and Armenian support alternate encoding systems where you see Unicode in both the Preview Pane and the Text Input Box. These methods are selected in the drop-down menu next to the language selector.
Contractors, please consult ‘eth - Geez Writing System.pdf’ and the ‘arm - Armenian Writing System.pdf’ for more information on encoding these languages.
Coptic Notes
Version 0.9b of Shibboleth now uses the Antinoou font for Coptic, the first font to support the newer combining macrons designed to encode Coptic abbreviations properly. Contract partners should consult the document entitled Logos Coptic Encoding.pdf to learn about the issues when encoding Coptic, including the differences between the different overline characters.
Ugaritic, South Arabian and Old Persian Cuneiform Notes
Ugaritic cuneiform and South Arabian both use a wide variety of very similar looking shapes. The character palettes for Ugaritic and South Arabian group symbols based on physical features so that commonly confused characters appear in close proximity to each other, making it easier to find the right glyph.
Logos uses many characters in the Private Use Area to encode the wonderful variety of Ugaritic glyphs. We then have special indexing code that allows one to search on the standard Unicode code point and find the variant glyphs that are located in the Private Use Area. But if you are using Shibboleth’s Ugaritic support for other applications, or with fonts other than our own Zebul Open, you may wish to switch your Character Palette to the Simple Alphabet, which only contains the official Unicode code points.
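One simple way to let a search on a standard code point find Private Use Area variants is to fold each known variant to its standard letter before comparing. The sketch below illustrates that idea only. The PUA assignments are hypothetical, not the code points Logos or Zebul Open actually use, and Logos’s real indexing code is not described here.

```python
import unicodedata

UGARITIC_ALPA = "\U00010380"  # standard Unicode code point (UGARITIC LETTER ALPA)
# Hypothetical Private Use Area variants (not the actual Logos assignments).
VARIANT_TO_STANDARD = {"\uE000": UGARITIC_ALPA, "\uE001": UGARITIC_ALPA}

def is_private_use(ch: str) -> bool:
    """Private Use Area characters have the general category 'Co'."""
    return unicodedata.category(ch) == "Co"

def fold_variants(text: str) -> str:
    """Map each known PUA variant to its standard code point before indexing or searching."""
    return "".join(VARIANT_TO_STANDARD.get(ch, ch) for ch in text)

def matches(document: str, query: str) -> bool:
    return fold_variants(query) in fold_variants(document)

print(is_private_use("\uE001"), is_private_use(UGARITIC_ALPA))  # True False
print(matches("\uE001", UGARITIC_ALPA))  # True: the variant is found via the standard letter
```

Text that uses only the official code points, as the Simple Alphabet palette produces, needs no such folding, which is why it travels better to other applications and fonts.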
Our contractor partners should consult the documents entitled ‘uga - Logos Ugaritic Encoding.pdf’ and ‘sab – Logos South Arabian Encoding.pdf’ for detailed information on encoding Ugaritic and South Arabian.
Hittite and Akkadian Cuneiform Notes
The default glyph shapes for Akkadian in the Unicode standard are more or less Old Babylonian. But the same code points are used for Neo-Assyrian and Hittite, even though the glyphs can look quite different. (This may change in future versions of Unicode.) To copy Neo-Assyrian glyphs, choose the Akkadian language and then switch fonts. Since Hittite uses a separate language code, change to the Hittite language (rather than just changing fonts) to get to the Hittite glyphs.
Hittite and Akkadian cuneiform have several character palettes that sort the glyphs according to different sign lists. Because the Unicode specification does not yet encode each cuneiform dialect separately, in many cases a symbol in one dialect must be represented by 2 symbols from another dialect; together they should be treated as one new glyph, but that glyph has no code point of its own in the current Unicode specification. There will be times when selecting one character in the sign list will actually put several characters in the edit pane at once to account for these sorts of discrepancies. That means if you enter the wrong glyph, you’ll have to check carefully how far to delete; it may not be as simple as pressing Backspace once.
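The deletion problem can be illustrated with a toy palette in which one entry inserts two code points. The labels and code point sequences are hypothetical placeholders, not real sign-list numbers or Shibboleth’s mappings. The sketch also assumes that each Backspace removes one code point.

```python
# Hypothetical palette: placeholder labels and code point sequences.
PALETTE = {
    "sign 1": "\U00012000",
    "sign 2 (composite)": "\U00012000\U00012001",  # one click, two code points
}

def click(text: str, label: str) -> str:
    """Append the code point sequence behind a palette entry."""
    return text + PALETTE[label]

def backspace(text: str) -> str:
    """Assumes Backspace removes one code point, as Python slicing does."""
    return text[:-1]

text = click("", "sign 2 (composite)")
print(len(text))                              # 2 code points from a single click
print(backspace(text) == PALETTE["sign 1"])   # True: one Backspace leaves a different, valid-looking sign
```

After a single Backspace, the leftover is a legitimate sign on its own, so nothing in the Preview Pane looks broken. That is why it pays to check exactly how much to delete.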
The Hittite fonts represent variant glyphs. At this time, Shibboleth only supports one font at a time, so if you need a variant glyph from UllikummiC, you'll want to copy and paste out what you've done with UllikummiA first, then clear the edit box and get the glyph you need from UllikummiC, then clear the box and switch back to UllikummiA. It is planned that the next release of Shibboleth will support font changes and the kinds of simple formatting useful in transliterations, like superscript and subscript.
We have included support for the Private Use (i.e. non-standard) glyphs from the Ullikummi fonts as well, though these may be replaced by official code points if Unicode gets around to giving Hittite its own official code page instead of recycling Akkadian.