Shibboleth User Manual
Version 0.9
Introduction
Shibboleth is designed to assist in the text entry of foreign languages by providing a visual interface to help you select the correct characters. At the time of writing, Shibboleth supports Hebrew, Aramaic, Greek, Syriac, Arabic, Ethiopic, Armenian, Coptic, Ugaritic, South Arabian and Transliteration.
Shibboleth is designed to output the foreign languages in Unicode, the international standard for multi-lingual encoding. If you don’t know much about Unicode, the next section provides a brief overview. If you’re Unicode savvy, skip it.
Unicode
In the bad old days, each character in a given font had a unique 8-bit code. A bit is a unit that can store a one or a zero, so 8 bits allowed one font to have a maximum of 256 characters (2 to the 8th power), and every font shared the same 256 available code points. So that computer users could switch fonts with impunity, standards organizations assigned each letter of the Roman alphabet, as well as European numerals and common punctuation marks, to set code points. In this 8-bit world, the common solution to multilingual encoding was to reuse the code points assigned to Roman characters to encode foreign scripts. For example, the ASCII code point for the letter ‘A’ (01000001 in base 2 or bits, which can be expressed as ‘41’ in base 16 or ‘65’ in base 10) would be used for a Greek Alpha. Then anyone who wanted to see an Alpha instead of an A would change the font to a Greek font.
There were four major problems with this approach: 1) Some languages needed more than 256 characters and thus required font changes even within words to encode all the marks. 2) There was no standardization for the encoding of the marks. For example, one Greek font might place Chi on the ASCII code point for X, and another might place it on C. Then if a user switched from one Greek font to another, the Greek text would break. 3) There was no way to write universally useful searching tools for working in foreign languages, because to a search engine, an Alpha was indistinguishable from an ‘A’ (and from a Hebrew Aleph, and so forth). These letters all used the same code point internally. 4) There was no universal solution for right-to-left languages. For years many computer platforms have required users to type Hebrew backwards!
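The three ways of writing a code point can be checked with a few lines of code. What follows is only an illustrative sketch in Python, which is not part of Shibboleth; the variable names are our own.

```python
# Show one 8-bit code point in base 2, base 16 and base 10.
code = ord("A")                  # numeric code point of the Roman letter A
binary = format(code, "08b")     # base 2, padded to 8 bits
hexadecimal = format(code, "X")  # base 16
print(binary, hexadecimal, code)

# An 8-bit code can take 2 to the 8th power distinct values.
print(2 ** 8)
```

Running this prints 01000001 41 65, followed by 256.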
The solution to these problems was to switch from 8-bit encoding to 16-bit encoding. This provides over 65,000 code points to assign characters to. Now a Greek Alpha can have a separate code point from those used to encode an English A and a Hebrew Aleph. The Unicode Consortium is responsible for publishing the standard code points for each script, so users can switch at will between any Unicode fonts that have the characters they need without having to change their encoding scheme. Like the humble 8-bit byte of yesteryear, the 16-bit Unicode code points are often described using 4 hexadecimal digits with values from zero to f (in base 16 the next value after 9 is ‘a’, rather than ‘10’). The Roman A is still assigned to 41 for backwards compatibility, but in Unicode we’ll more often call this 0041 to reflect the use of 16 bits. The Greek Alpha is assigned to 0391 and the Hebrew Aleph is 05D0.
(Unicode also allows for characters that need more than 16 bits; however, very few software packages, including present editions of Microsoft software, handle characters with more than 16 bits gracefully. Ugaritic, supported in the latest edition of Shibboleth, is one such script: its code points run from 10380 to 1039F, so each one needs five hexadecimal digits and more than 16 bits.)
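To see that these three letters really do have separate code points, here is a small illustrative sketch in Python (used only for demonstration; it is not part of Shibboleth). The list name `letters` is our own.

```python
import unicodedata

# Each letter has its own code point, whatever font is used to draw it.
letters = ["A", "\u0391", "\u05D0"]  # Roman A, Greek Alpha, Hebrew Aleph
for letter in letters:
    # Print the 4-digit hexadecimal code point and the official Unicode name.
    print(f"{ord(letter):04X}", unicodedata.name(letter))
```

Because the code points differ, a search for 0391 can no longer be confused with a search for 0041.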
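As a rough illustration of why such characters trip up older software, the Python sketch below (not part of Shibboleth) takes the first Ugaritic letter, whose code point is 10380, and shows how it is stored in two common Unicode encodings. The variable names are our own.

```python
# UGARITIC LETTER ALPA, the first letter of the Unicode Ugaritic block.
alpa = "\U00010380"

code_point = ord(alpa)
utf16 = alpa.encode("utf-16-be")   # big-endian UTF-16, no byte-order mark
utf8 = alpa.encode("utf-8")

print(f"code point: {code_point:X}")             # five hexadecimal digits
print("fits in 16 bits:", code_point <= 0xFFFF)  # False
print("UTF-16 bytes:", utf16.hex())
print("UTF-8 bytes:", utf8.hex())
```

In UTF-16 the letter is stored as two 16-bit units (d800 and df80), so software that assumes one 16-bit unit per character will miscount or split it.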
System Requirements
Shibboleth is designed for Windows XP and Windows Vista. It requires version 3 of the .NET framework, which is included in Windows Vista and can be downloaded by Windows XP users.
Ugaritic is one of the first languages to use more than 16 bits to encode each letter, so many older applications (and some current ones) will not support it well. If you plan to use Shibboleth to copy Ugaritic text to Microsoft Word, you will want to be running Word 2007 (or a later version).
Installation Instructions
The installation of Shibboleth is fairly straightforward, but a number of supporting files must also be installed for full functionality.
- If you are running Windows XP, you must download version 3 of the .NET framework. There is a link to Download .NET on the Shibboleth installation page. It can also be found on Microsoft.com.
- Click the Download Shibboleth button on www.logos.com/Shibboleth.
- Shibboleth uses embedded versions of all the necessary fonts, but if you are copying text into a word processor, you will need to download and install the various fonts. Users of the current version of Logos Bible Software may already have all of these fonts installed. There are links to download the fonts at www.logos.com/Shibboleth. These fonts are typically copied to c:\Windows\Fonts. At the time of writing, the fonts required are: SBL Hebrew, Ezra SIL, Gentium, GentiumAlt, BibliaLS, Serto Jerusalem, East Syriac Adiabene, Sinaiticus, Abyssinica SIL, New Athena Unicode, Musnad Sabaic, Charis SIL, Scheherazade, and Zebul Open. Some of these fonts have undergone extensive revisions, so it is important to have the latest version. Shibboleth also uses some fonts that come with Windows, such as Times New Roman, Estrangelo Edessa and Palatino Linotype.
Getting Started
To launch Shibboleth, click on the short-cut that was added to your Start menu during installation. The default location for the short-cut is Start | All Programs | Logos Bible Software | Shibboleth.
You’ll see something like this (but with no red labels or text entered):
The Ribbon
The main ribbon, stretching across the top of Shibboleth, is divided into three sections. The first section with the green button, the Language Selector, consists of two drop-down boxes. The first drop down (which does the same thing as clicking the green button) allows you to choose which language you intend to encode. You’ll notice that both Aramaic and Hebrew are listed, even though they use the same script and the same Unicode code points. The difference is that when these are copied, they will be wrapped in different XML tags that are used in Logos books to allow one to search Aramaic or Hebrew separately or KeyLink to different lexicons.
The second drop down allows one to select between different views for the text input box. Many languages only have one option here, but for Ethiopic and Armenian, you can choose between seeing Unicode in the edit box or seeing the names of the syllables typed out in Roman characters. For Hebrew and Syriac, you can choose between seeing Unicode in the Text Input Box as you type or seeing an old-school ASCII equivalent. Regardless of what you see as you type in the Text Input Box, you will see Unicode in the Preview Pane.
The middle section of ribbon, the Copy Options, is dominated by a yellow button. This is the button you click to copy the contents of the Preview Pane to your Windows clipboard. From there, you can paste the text into other applications. In the current release, we do support a short-cut of Alt+c for copy. This is different than the Windows standard of Ctrl+c. We happen to be using the Control shift state for some rare Coptic characters. For the next release, we will likely free up Ctrl+c and Ctrl+v (for pasting) so that the Windows standard shortcuts work. But for now, Alt+c and Alt+v are available.
The options to the left of the Copy Button allow you to choose the format of the text to be copied. The default option is Unicode, which copies Unicode characters in plain text, wrapped with XML tags. ‘Escaped’ indicates the use of hexadecimal codes to represent the Unicode text in a way that is sometimes easier for some programs to manipulate. In this mode, for example, a Hebrew aleph would copy out as an escaped code such as &#x5D0; instead of the literal character א.
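The idea behind the Escaped option can be sketched in a few lines of Python. This is an illustration only, not Shibboleth’s actual code: the function name `escape` is hypothetical, and the exact output format (the `&#x...;` style, no zero padding, ASCII left untouched) is an assumption.

```python
def escape(text):
    """Replace every non-ASCII character with a hexadecimal reference."""
    parts = []
    for ch in text:
        if ord(ch) < 128:
            parts.append(ch)                  # plain ASCII passes through
        else:
            # ASSUMED format: &#x followed by the hex code point and a semicolon.
            parts.append(f"&#x{ord(ch):X};")
    return "".join(parts)

print(escape("\u05D0"))   # a Hebrew aleph
```

The escaped result contains only ASCII characters, which is why some programs find it easier to manipulate.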
The Ascii option allows one to copy text out in our older, pre-Unicode encoding formats. These are useful for us at Logos, because they allow us to use Shibboleth to maintain older resources. (However, for the transliteration ‘language’ only Unicode and Escaped are really supported, so Shibboleth may not be used to edit transliterations encoded in SemiticaDict, our old transliteration solution.)
The right-most section on the ribbon is dominated by a blue Paste Button. You can copy text into your Windows clipboard from an external document and paste it into Shibboleth for editing by clicking the Paste Button. Shibboleth needs to know if what is being pasted in is Unicode or ASCII, so that it will treat the string properly. If you are working on Logos XML documents, you may copy the XML language tags as well as the language string itself. These tags will be stripped when pasted into Shibboleth, but added again when you click the Copy button, making it easy to paste right over the old word, tags and all.
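The paste-then-copy round trip can be pictured with the Python sketch below. It is only an illustration: the tag format `<lang code="...">` is hypothetical (the real Logos XML tags are not shown in this manual), and the function names `paste` and `copy` are our own.

```python
import re

# HYPOTHETICAL tag format; the real Logos XML language tags may differ.
TAG_PATTERN = re.compile(r'^<lang code="(\w+)">(.*)</lang>$', re.DOTALL)

def paste(clipboard_text):
    """Strip a wrapping language tag, returning (language, bare text)."""
    match = TAG_PATTERN.match(clipboard_text)
    if match:
        return match.group(1), match.group(2)
    return None, clipboard_text  # no tag found: keep the text unchanged

def copy(language, text):
    """Wrap the edited text in the language tag again."""
    return f'<lang code="{language}">{text}</lang>'

language, word = paste('<lang code="heb">\u05D0\u05D1</lang>')
edited = word + "\u05D0"          # edit the bare word
print(copy(language, edited))     # tags are restored on copy
```

Because the tags come back on copy, the result can be pasted straight over the old word, tags and all.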
The Application Window
Below the ribbon, the main Shibboleth screen is divided into 4 sections: three stacked one above another along the left side, and a long 4th section covering the right side of the application window.
The top left pane is the Preview Pane. This allows you to see the foreign language texts previewed in the chosen font. Above the Preview Pane the font being used for display is listed. If more than one font is listed, you can click on the font name to change fonts. This is particularly useful in languages like Syriac where there are many different scripts that encode the same language. In this case, switching fonts is not merely cosmetic, but vitally important to helping you select the correct characters.
Beneath the Preview Pane is the Text Edit Box. Shibboleth supports many methods of text entry (described in the next section), but they all put characters into the Text Edit Box. This is where you edit the string of text you are encoding.
This window always uses unescaped characters. This is an important change from previous encoding methods used by our contractors, where we were often required to type high ASCII values in escaped form; for example, instead of a literal broken vertical bar (¦), the escaped form &#166; was used. Escaped high ASCII was easy to work with because there were no standardized keys to place the high ASCII values on. However, Shibboleth does not allow one to type the escaped form: such a form will be treated as a literal &#xxx; instead of being converted to the proper character.
The principal benefit of using the literal high ASCII characters instead of the escaped high ASCII in the ASCII versions of the Text Edit Box is that if one types the wrong key, one only has to hit the backspace once, rather than six times, to get rid of the incorrect character in the Text Edit Box. When users were typing escaped forms in LGM, we often encountered doubled or missing semi-colons and ampersands for this and similar reasons, so the accuracy of the input should improve now that escaped high ASCII is no longer typed.
The x in the upper right corner of the Text Input Box clears the box.
Below the Text Edit Box is the Virtual Keyboard. On the right side of the application window are the Character Palettes. These will be described in the next section.
Text Entry Methods
There are a number of different ways to enter text into Shibboleth.
- Type text directly into the Text Edit Box. The Keyboard Pane at the bottom of the screen shows the currently selected encoding system, mapped to keys on a keyboard. The Shibboleth keyboards use many shift states to access a wide range of characters. The main shift states are: normal (unshifted), Shift, AltGr (the right Alt key or Ctrl+Alt), Shift+AltGr and Ctrl+Shift. As you hold down the various keys that toggle the shift states, the keyboard down below will change to show what is encoded on the various shift planes. (Other less commonly used shift states that are supported include the Ctrl state, the SGCaps state and the Shift+SGCaps state. SGCaps is entered by toggling the CapsLock key for languages that support Swiss German Caps instead of using CapsLock to toggle to the Shift state.)
- The keys on the on-screen Virtual Keyboard can also be clicked on with a mouse to insert the character. The keys that toggle the shift states can be clicked as well, and they will un-toggle when either clicked again, or the physical key is pressed. You can also hover your mouse cursor over the keys on the onscreen keyboard to get a pop-up with the name for the character, which can be useful in identifying marks that are very similar in appearance. The keyboard and the mouse can be used in combination, for example by holding the shift key to toggle to the shift plane, but clicking on the key with the mouse.
- The right-most pane contains one or more character palettes with characters organized in a logical fashion. The Hebrew palette, for example, is sorted to assist in encoding marks in the correct order, so consonants are listed at the top, followed by consonant distinctors, then vowel marks, then accent marks, Hebrew punctuation marks, and commonly used non-Hebrew characters. Some languages have more than one palette, selectable by the tabs at the top of the pane. Armenian, for example, has one palette that mixes capital letters and lower case letters, and another palette that separates capitals from lower case, so the user can choose which layout is most desirable. As with the keyboards on the lower-left pane, hovering the mouse cursor over the characters in the palette will display the character name to help identify the correct character.
- To reduce clutter, and to reduce the potential for mistaking a very common character for a similarly shaped rare character, some extremely rare marks are placed on the keyboard but not on the palettes. For this and other reasons, users may sometimes wish to look marks up in the printed documentation of the encoding standards. If you are using one of the encodings where you see Roman characters in the Text Edit Box, you can make use of the escaped high ASCII characters listed in the documentation, but you cannot type the escaped form itself (&#166;, for example). Instead, for ease of entry, you can use the left Alt key and the Number Pad to enter high ASCII values. Simply hold the LeftAlt key, type 0 followed by the high ASCII value, and then release the LeftAlt key. It is important not to forget the 0. So if the print documentation has &#166;, you can type LeftAlt+0166 and a literal ¦ will be entered into the Text Edit Box, and the Unicode value that it converts to will display in the Preview Pane.
- For those encoding methods that expect Unicode in the Text Edit Box, you can type in a 4 digit hexadecimal Unicode value, such as 05D0 for the Hebrew letter Aleph, and then type Alt+x, and it will be converted into the literal Unicode character. This can be especially helpful when inputting a very rare character that the font supports but which is not on the character palette. It also lets you find the character you need in the official Unicode documentation and use its code to enter the correct character in Shibboleth. This method works reliably only if the encoding option in the dropdown next to the language selector is one of the Unicode methods, and the language doesn’t have a keyboard that prevents the letters a through f from being typed without converting to a foreign script. (This will be most useful for Transliteration, since the Charis SIL font supports a wide range of characters. But it can also be used with Ethiopic and Armenian if in the utf8 encoding mode.) Alt+x just grabs the previous 4 characters when transforming the code you type to Unicode, which means it does not work for Ugaritic, which uses more than 4 hexadecimal digits to describe its characters. However, this was deemed preferable to MS Word’s use of Alt+x: there, if you want to put a macron over the letter a and type a0304, Word grabs five characters back, since ‘a’ is a valid hexadecimal digit, and returns a gibberish character.
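The four-character rule for Alt+x, and the number-pad method for high ASCII, can be sketched as follows. This is an illustrative Python sketch under our own assumptions, not Shibboleth’s actual code; the function name `alt_x` is hypothetical.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")

def alt_x(text):
    """Convert the last four characters, if all are hex digits, to one character."""
    code = text[-4:]
    if len(code) == 4 and all(ch in HEX_DIGITS for ch in code):
        return text[:-4] + chr(int(code, 16))
    return text  # not a four-digit hexadecimal code: leave the text alone

print(alt_x("05D0"))    # Hebrew aleph
print(alt_x("a0304"))   # the letter a followed by a combining macron

# The number-pad method: LeftAlt+0166 corresponds to character code 166.
print(chr(166))         # the broken vertical bar
```

Because only four characters are taken, the leading a in a0304 is left alone and receives the macron, at the cost of not reaching characters that need five hexadecimal digits.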
Greek, Hebrew and Aramaic Notes
Hebrew has a few marks that will not display properly using the current version of Microsoft’s C# rendering functions, but will show a white box or dotted circle instead. You can hover over these boxes to get a name of the character that is not being shown. The left-shifted Masora Circle (the one that appears between consonants instead of directly over them) has this problem, and the Masora circle displays over a dotted circle. These marks will look fine when viewed within the Libronix DLS application, and they will look fine in MS Word, which uses a different rendering engine than Microsoft’s software development tools.
Our contract partners will need to read the language-specific documentation found in the following documents: ‘grc - Logos Biblical Greek Encoding.pdf’ and ‘heb - Logos Biblical Hebrew Encoding.pdf’. Consult these manuals before and while keying these languages to properly encode them.
Arabic and Syriac Notes
For Arabic and Syriac, each letter can have up to four shapes, depending on whether it stands alone or occurs at the beginning, middle or end of a word. In Shibboleth, all four forms are shown in the Character Palettes, making it easier to select the right character no matter which shape the letter is taking.
Our contractors should read ‘ara – Buckwalter Writing System.pdf’ and ‘syr - Logos Syriac Encoding.pdf’ for more information on keying these languages.
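The four shapes are all the same letter as far as the encoding is concerned. For Arabic, Unicode happens to list the positional shapes as separate compatibility characters, which makes this easy to demonstrate. The Python sketch below is an illustration only and says nothing about how Shibboleth builds its palettes; the variable names are our own.

```python
import unicodedata

base = unicodedata.lookup("ARABIC LETTER BEH")

# Unicode lists each positional shape as a separate "presentation form".
forms = []
for position in ["ISOLATED", "INITIAL", "MEDIAL", "FINAL"]:
    form = unicodedata.lookup(f"ARABIC LETTER BEH {position} FORM")
    forms.append(form)
    # Compatibility normalization maps each shape back to the one letter.
    same_letter = unicodedata.normalize("NFKC", form) == base
    print(f"{ord(form):04X}", position.lower(), same_letter)
```

In ordinary text only the base letter is encoded, and the font chooses the correct shape from the letter’s position in the word.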
Ethiopic and Armenian Notes
Ethiopic and Armenian are handled somewhat differently. Ethiopic is based on one mark per syllable. Since there are more than 40 consonants and 8 vowel sounds, there is no good way to use one ASCII character per Ethiopic symbol. Instead, each Ethiopic symbol is typed out with 2 to 4 letters (1 or 2 letters for the consonant value and 1 or 2 letters for the vowel) with a forward slash placed in between. Since there isn’t a one-to-one relationship between the character in the Preview Pane and the characters in the LGM Entry Pane, removing a character completely requires pressing the backspace key several times. The forward slashes then provide useful visual cues to know how far to delete.
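The general idea of spelling each syllable out in Roman letters can be sketched in Python as follows. This is an illustration under loud assumptions: it uses the official Unicode syllable names (such as HA, LA, MA) as the spellings and puts a slash between symbols, which is NOT necessarily the Logos scheme; consult the encoding documentation for the real spellings. The function name `decode` is hypothetical.

```python
import unicodedata

def decode(entry):
    """Turn slash-separated syllable names into Ethiopic characters."""
    symbols = []
    for token in entry.split("/"):
        if token:  # ignore empty pieces, such as after a trailing slash
            # ASSUMPTION: tokens are Unicode syllable names, not Logos codes.
            name = "ETHIOPIC SYLLABLE " + token.upper()
            symbols.append(unicodedata.lookup(name))
    return "".join(symbols)

entry = "ha/la/ma"
preview = decode(entry)
print(preview, len(entry), len(preview))
```

Here eight typed characters produce three Ethiopic symbols, which is why deleting one symbol takes several presses of the backspace key.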
Armenian is alphabetic rather than syllabic, but because there are so many consonants, and we encounter Armenian so infrequently, no serious thought was given to the best encoding scheme – instead a scheme similar to Ethiopic was designed, with each letter spelled out and separated by forward slashes. As with Ethiopic, using the character palettes will be the easiest input method, and the forward slashes provide a guide for when deletion is necessary. Armenian is a good candidate for getting a better encoding system eventually.
Both Ethiopic and Armenian support alternate encoding systems where you see Unicode in both the Preview Pane and the Text Input Box. These methods are selected in the drop-down menu next to the language selector.
Contractors, please consult ‘eth - Geez Writing System.pdf’ and the ‘arm - Armenian Writing System.pdf’ for more information on encoding these languages.
Coptic Notes
The New Athena Unicode font isn’t perfect, especially when it comes to diacritical marks functioning properly. In particular, the macron appears very low, often colliding with the letter, while the combining overline appears high, but does not always (ever?) join with adjacent combining overlines. Until a more suitable Coptic font replaces New Athena Unicode, you will have to look at the height of the line, rather than whether it joins adjacent lines, when proofing to see if the correct mark has been chosen.
Contract partners should consult the document entitled Logos Coptic Encoding.pdf to learn about the issues when encoding Coptic, including the differences between the different overline characters.
Ugaritic and South Arabian Notes
Ugaritic cuneiform and South Arabian both use a wide variety of very similar looking shapes. The character palettes for Ugaritic and South Arabian group symbols based on physical features so that commonly confused characters appear in close proximity to each other, making it easier to find the right glyph.
Logos uses many characters in the Private Use Area to encode the wonderful variety of Ugaritic glyphs. We then have special indexing code that allows one to search on the standard Unicode code point and find the variant glyphs that are located in the Private Use Area. But if you are using Shibboleth Ugaritic for other applications, or with fonts other than our own Zebul Open, you may wish to switch your Character Palette to the Simple Alphabet, which only contains the official Unicode code points.
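If you need to check whether a copied string contains only official code points, the distinction can be tested programmatically. The Python sketch below is an illustration only; the function name `describe` is hypothetical, and the private use code point shown is simply the first one in the Private Use Area, not necessarily one that Logos uses.

```python
import unicodedata

def describe(ch):
    """Classify a character as standard Ugaritic, private use, or other."""
    if unicodedata.category(ch) == "Co":    # "Co" is the private use category
        return "private use"
    if 0x10380 <= ord(ch) <= 0x1039F:       # the Unicode Ugaritic block
        return "standard Ugaritic"
    return "other"

print(describe("\U00010380"))   # UGARITIC LETTER ALPA
print(describe("\uE000"))       # first code point of the Private Use Area
```

Text made up only of standard Ugaritic characters should display in any font that covers the Ugaritic block, whereas private use characters depend on the font that defines them.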
Note that at the time of writing, South Arabian is encoded from left to right, even though it is a right to left language. This will be fixed when Unicode is updated to support South Arabian correctly. Until then, we must encode it backwards.
Our contract partners should consult the documents entitled ‘uga - Logos Ugaritic Encoding.pdf’ and ‘sab – Logos South Arabian Encoding.pdf’ for detailed information on encoding Ugaritic and South Arabian.
Last Updated: 12/20/2007