The vBulletin Unicode How-To

Heathen Dawn · Sep 21, 2003

Here begins the vBulletin Unicode How-To

What is Unicode?

Your US PC keyboard contains the following characters:

  ! " # $ % & ' ( ) * + , - . /
0 1 2 3 4 5 6 7 8 9 : ; < = > ?
@ A B C D E F G H I J K L M N O
P Q R S T U V W X Y Z [ \ ] ^ _
` a b c d e f g h i j k l m n o
p q r s t u v w x y z { | } ~

Those are the graphical characters of the ASCII character set, a character set capable of encoding 128 characters (the 33 that don’t appear are control characters). ASCII is adequate only for English and for a few other Latin-script languages that don’t use any diacritics (accents, umlauts etc) on the letters. Once you step outside of English, ASCII isn’t enough. For that purpose, a new character set called Unicode was devised. Unicode can encode 1,112,064 characters, enough for all living languages and for many historical scripts. Each character has its own unique, unambiguous number in Unicode.

With Unicode, one can write, for example, mathematical symbols, polytonic Greek as in the New Testament, Biblical Hebrew with cantillation marks and the Arabic of the Qur’an. The ability of Unicode is therefore an asset in debates on science and religion, which are so common on the boards.

Using Unicode on vBulletin boards

The way to use Unicode on boards based on vBulletin software (and also UltimateBB, but not EZBoard) is to use numeric character references, or NCRs for short. NCRs are escape sequences for representing Unicode characters in web pages. A valid NCR consists of the following sequence: ampersand (&), hash mark (#), the character's Unicode number in decimal, and semicolon (;). For example, the Unicode number of Hebrew Letter Alef is 1488 in decimal, so it is written as &#1488; which gives the following: א
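For those who have Python at hand, the NCR recipe above can be sketched in a few lines. The function name ncr is made up for this illustration; ord() is Python's built-in for getting a character's Unicode number.

```python
def ncr(char: str) -> str:
    """Build a decimal NCR: ampersand, hash mark, decimal number, semicolon."""
    # ord() returns the Unicode number of a single character as an integer.
    return f"&#{ord(char)};"


# Hebrew Letter Alef, written as an escape so the source file stays ASCII.
print(ncr("\u05d0"))  # prints &#1488;
```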

Looking up the Unicode number

For an individual character it is efficient to look up its number and type it manually. This can be done either with a character map utility, such as the one available on Windows XP (charmap.exe), or by consulting the code charts at http://www.unicode.org/charts/. Both sources give the Unicode number in hexadecimal (base 16); to convert it to decimal, use a calculator (such as Windows’ calc.exe). For example, 05D0 is hexadecimal for 1488. Another good resource is Alan Wood’s pages on Unicode, at http://www.alanwood.net/unicode/, which give the decimal value as well.
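If a calculator is not handy, the hexadecimal-to-decimal step can also be done in Python. This is only a small sketch; the helper name hex_to_ncr is an invention for this example, and it assumes the number is given either bare (05D0) or in the U+05D0 form used by the charts.

```python
def hex_to_ncr(hex_code: str) -> str:
    """Turn a hexadecimal Unicode number such as '05D0' into a decimal NCR."""
    # Accept the 'U+05D0' notation as well as the bare hexadecimal digits.
    if hex_code.upper().startswith("U+"):
        hex_code = hex_code[2:]
    code_point = int(hex_code, 16)  # base-16 string to integer
    return f"&#{code_point};"


print(int("05D0", 16))       # prints 1488
print(hex_to_ncr("U+05D0"))  # prints &#1488;
```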

Batch converting

For a long run of characters, manually typing the numeric character references is too tedious and time-consuming. If you have an operating system that supports Unicode, such as Windows 2000 or XP or Linux (eg Red Hat 7.0 and onwards), it is best to input the characters the natural way, or to copy them from a Web source. Then it is possible to convert them all to NCRs by software.

On Linux there are utilities such as iconv or recode enabling one to convert from Unicode to HTML notation, which means NCRs. All that is necessary is to run the Unicode text file through such a utility and produce a new file in which there are NCRs to be copied and pasted on the boards.
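Where neither utility is installed, the same file-to-file conversion can be sketched in Python. This is not what iconv or recode do internally, just an equivalent under stated assumptions: the function name convert_file and the file names in the usage comment are hypothetical, and the input is assumed to be UTF-8 unless another encoding is passed (a file saved as "Unicode" from Windows Notepad would typically need "utf-16").

```python
from pathlib import Path


def convert_file(src: str, dst: str, src_encoding: str = "utf-8") -> None:
    """Read a Unicode text file and write a pure-ASCII copy full of NCRs."""
    text = Path(src).read_text(encoding=src_encoding)
    # The "xmlcharrefreplace" error handler replaces every character that
    # ASCII cannot represent with its decimal NCR, e.g. &#1488;
    Path(dst).write_bytes(text.encode("ascii", errors="xmlcharrefreplace"))


# Hypothetical usage:
#   convert_file("post.txt", "post_ncr.txt")
#   convert_file("notepad_unicode.txt", "post_ncr.txt", src_encoding="utf-16")
```

The output file is plain ASCII, so it can be opened in any editor and its contents pasted on the boards.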

On Windows XP (or 2000), such utilities are usually lacking. The best way to convert Unicode to NCRs there is to use Internet Explorer. Follow these steps:

1. Write your text file and save in Unicode.
2. Drag the text file icon into an Internet Explorer window.
3. Choose “Save As” from the file menu.
4. Save as a text file, but in a different encoding, such as “Baltic (ISO)”.

The new text file will contain your Unicode text in NCRs. It is important to save the text file in an encoding that does not cover the characters in the file, since only characters the chosen encoding cannot represent are written out as NCRs. For example, if your Unicode text file contains Hebrew characters, don’t save as “Hebrew (Windows)”, but as a different encoding, such as “Greek (ISO)” or “Baltic (ISO)”.

For example, the following Hebrew text, Genesis 1:1:

בראשית ברא אלהים את השמים ואת הארץ

can be converted into NCRs:

&#1489;&#1512;&#1488;&#1513;&#1497;&#1514; &#1489;&#1512;&#1488; &#1488;&#1500;&#1492;&#1497;&#1501; &#1488;&#1514; &#1492;&#1513;&#1502;&#1497;&#1501; &#1493;&#1488;&#1514; &#1492;&#1488;&#1512;&#1509;

Pasting the above NCRs into the input box of the board will result in the Hebrew text.
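Before pasting, it can be reassuring to check that a string of NCRs really decodes back to the intended text. The sketch below does such a round trip in Python; to_ncrs is a name made up for this example, while html.unescape is the standard library call that turns NCRs back into characters.

```python
import html


def to_ncrs(text: str) -> str:
    """Replace every non-ASCII character in text with its decimal NCR."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


verse = "בראשית ברא אלהים את השמים ואת הארץ"  # Genesis 1:1, as above
encoded = to_ncrs(verse)
print(encoded)  # compare with the NCR line given above

# Decoding the NCRs must give back exactly the original text.
assert html.unescape(encoded) == verse
```

Spaces and other ASCII characters pass through unchanged, which is why the word breaks survive in the NCR form.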

Font issues

For a character to be displayed correctly, there needs to be not only a Unicode number—that is the easy part—but also a matching font. Here is where things get tricky. For modern monotonic Greek, modern Hebrew without cantillation marks and Arabic without Qur’anic marks, most fonts suffice, so setting a font isn’t necessary. However, for polytonic NT Greek or Hebrew with cantillation marks, a special font must be specified. The problem is that the special font is not always available on the viewer’s computer.

For polytonic NT Greek, three suitable fonts are Palatino Linotype, Athena and Arial Unicode MS. On Linux there is no problem, because Linux comes complete with polytonic Greek in its system fonts. To specify a font, surround the NCR text with font markup:

[font="Palatino Linotype, Athena, Arial Unicode MS"]&#7977; &#7936;&#947;&#8049;&#960;&#951; &#956;&#945;&#954;&#961;&#959;&#952;&#965;&#956;&#949;&#8150;, &#967;&#961;&#951;&#963;&#964;&#949;&#8059;&#949;&#964;&#945;&#953; &#7969; &#7936;&#947;&#8049;&#960;&#951;[/font]

giving

[font="Palatino Linotype, Athena, Arial Unicode MS"]Ἡ ἀγάπη μακροθυμεῖ, χρηστεύεται ἡ ἀγάπη[/font]

which should be viewable on computers with Windows 2000 or XP or Linux, but not in Windows 98.
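When many passages need the same font list, the wrapping can be automated. The following is only a sketch: with_font is a hypothetical helper, and it assumes the board accepts the [font="..."] markup exactly as shown in the example above.

```python
from typing import Sequence


def with_font(ncr_text: str, fonts: Sequence[str]) -> str:
    """Wrap NCR text in font markup listing the fonts in order of preference."""
    font_list = ", ".join(fonts)
    return f'[font="{font_list}"]{ncr_text}[/font]'


greek_fonts = ["Palatino Linotype", "Athena", "Arial Unicode MS"]
# &#7977; is Greek Capital Letter Eta with dasia (rough breathing).
print(with_font("&#7977; &#7936;&#947;&#8049;&#960;&#951;", greek_fonts))
```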

For Hebrew with cantillation marks the situation is harder. Fonts containing it are mostly special downloads; the only free one is Arial Unicode MS, which is present on the system only if Office 2000 or later has been installed. It is therefore best to avoid Hebrew with cantillation marks, as well as Ethiopic and Runic, which have no fonts available even in Windows XP. Indic scripts, Georgian, Armenian, Thai, Chinese and Japanese can be used, but they will be viewed properly only on those Windows XP installations where the user has installed support for them. Linux users can view all those except Indic scripts, for which support on Linux is very difficult.

For special symbols, such as signs of the zodiac, a font such as Lucida Sans Unicode or Arial Unicode MS must be specified. The former font is available on Windows 2000 or XP. Linux users already have those symbols in their system fonts.

Browser support

Proper display of Unicode characters is dependent on the browser. Internet Explorer beginning with version 5.0 can display Unicode, as can Mozilla from version 1.3 upwards. Version 4.0 browsers of Netscape and Microsoft don’t display Unicode properly. On Linux, Konqueror and Mozilla 1.3+ can display Unicode, including Arabic. See Alan Wood’s pages on Unicode to learn about setting up browsers for Unicode display.

Happy Unicoding! Please contribute to this How-To with questions for me to answer.

Here ends the vBulletin Unicode How-To.

Heathen Dawn · Sep 24, 2003

The source text file for the vBulletin Unicode How-To is on my website, here. Anyone who wishes to post the How-To on another vBulletin board can copy from the text file and paste on the board. Caution: when posting on vBulletin 3.0, use the Standard Toolbar instead of the Enhanced (WYSIWYG) Toolbar.

ashibaka · Sep 24, 2003

This is useful. Maybe it should be in the CF Help section.

Heathen Dawn · Sep 25, 2003

ashibaka said:
This is useful. Maybe it should be in the CF Help section.

It already is in the CF Help section. It’s now also part of the FAQs on Internet Infidels Discussion Boards (iidb.org).
