Persian Input Methods
For Emacs And More Broadly Speaking

شیوه‌هایِ درج به فارسی‌


Document #PLPC-120036
Version 0.4
September 19, 2012


Mohsen BANAN محسن بنان
Contact:
http://mohsen.1.banan.byname.net/contact





Copyright ©2012 Mohsen BANAN

Permission is granted to make and distribute complete (not partial)
verbatim copies of this document provided that the copyright notice
and this permission notice are preserved on all copies.


اجازِه چاپ مجدد وجود دارد تا هنگامى که اين اعلام اجازه روى همه کپيها موجود باشد.



1  Introduction

There are several things we want to accomplish with this document.

1.1  Goal: Widespread Usage of Persian in Emacs

Our first goal in providing this document is to facilitate writing in Persian in the Halaal/Convivial quadrant.

That begins with the promotion of use of Emacs for writing in Persian. Emacs is the ne plus ultra Halaal/Convivial multi-lingual user environment in existence today. It is quite simply the best, far surpassing any other currently available toolset. For complete information about Emacs see http://www.gnu.org/software/emacs/.

The word “Halaal” is very strong and very loaded. For our usage and meaning of this word, see our document titled Introducing Halaal and Haraam into Globish – Based on Moral Philosophy of Abstract Halaal [4] (معرفیِ حلال و حرام به بقیه‌یِ دنیا).

For a definition of Halaal Software, see our document titled Defining Halaal Software and Defining Halaal Internet Services [3]. This document is also available in Farsi as [5] (تعريف نرم افزار حلال و تعريف خدمات اينترنتى حلال).

Our use of the terms “convivial” and “conviviality” is based on Ivan Illich’s Tools For Conviviality. For our use of these terms see our document titled, Introducing Convivial into Globish [1]. In that document we also define the “Halaal/Convivial Quadrant.”

Emacs has had full Unicode support for many years. Starting with Emacs 24, full native bidi (bidirectional) support is now also available. Multiple Persian input methods are part of the Emacs 24 distribution. These input methods are documented here.

Emacs comes with a rich mail reader, a personal planner, an address book, a calendar, spell checkers for English and Persian, multi-lingual dictionary interfaces and many other tools and packages; all integrated together. Because Emacs supports Persian, all these tools and packages also support Persian.

Most Iranians today use Microsoft Windows products such as MS Word and MS PowerPoint in the Haraam/Industrial quadrant. Microsoft Windows is closed, proprietary software made by an American corporation.

Our goal is to enable and encourage the transition of Iranians from the proprietary Microsoft Windows products in the Haraam/Industrial quadrant, to the far superior Emacs in the Halaal/Convivial quadrant.

This document provides enough information to enable anyone to obtain Emacs and begin using it as her/his Persian user environment.

1.2  Goal: Widespread Adoption of Persian Blee

Our second goal is to promote the use of Blee (the ByStar Libre Emacs Environment [8]) among Persian speakers in general, and Iranians in particular.

Blee is a layer above Emacs that integrates GNU/Linux capabilities into Emacs, and provides close integration with the ByStar Services. The ByStar Federation of Autonomous Libre Services is a unified Halaal services model, unifying and making consistent a large number of services that currently exist in functional isolation. It is a coherent, integrated family of services, providing the user with a comprehensive, all-encompassing Internet experience. For information about Libre Services see our document titled, Libre Services: A non-proprietary model for delivery of Internet services [7]. For information about the ByStar Federation see our document titled, The ByStar Federation of Autonomous Libre Services [2].

The present document provides enough information to allow a ByStar Autonomous Libre Service owner to use Blee as her/his Persian Halaal Software-Service Continuum.

1.3  Goal: Evaluation of Applicability of farsi-transliterate-banan to other User Environments

Our third goal in producing this document is to encourage adoption of the Multi-Character Persian Reverse Transliteration in other Halaal Digital environments in general, and in Gnome in particular.

This document provides enough information so that in addition to Emacs, implementation of the “Banan Multi-Character Transliteration Persian Input Method” is possible in other user environments.

2  About this Document

The primary URL for this document is: http://www.persoarabic.org/PLPC/120036. The pdf format is authoritative.

Distribution of this document is unrestricted. We encourage you to forward it to others.

We can benefit from your feedback. Please let us know your thoughts. You can send us your comments, criticisms and corrections via the URL http://mohsen.1.banan.byname.net/contact, or by email to feedback@ our base domain, which is mohsen.1.banan.byname.net.

We thank you for your assistance.

3  Scope and Context

Use of the Latin character keyboard to input Persian text into machines, and more generally use of the Latin alphabet for writing in Persian, is an old topic with a lot of history. We reference some of this history and prior work later in this document. See Section 8 for more information.

The terminology in this area is often ambiguous or misused, causing confusion when addressing this topic. In this document we will be consistent in our own terminology, taking pains to define the more ambiguous or problematic terms carefully. We do this by providing our own definitions, or by referencing external definitions.

3.1  Persian vs Farsi

Our use of the terms “Persian” and “Farsi” is consistent with the definitions of these terms established by the Society of Iranian Linguists. Their definitions are available at:
http://www.iranianlinguistics.org/page.cgi?page=persian

In the section below titled “Persian Language” we reproduce the relevant parts of their text.

The current implementation of Persian input methods for Emacs is for Farsi only. Thus in the current implementation the terms Persian and Farsi may be considered equivalent, and in the present version of this document we use these terms interchangeably.

We plan to expand the implementation in the future to include other Persian language variations.

3.1.1  The Persian Language

Persian is an Iranian language within the Indo-Iranian branch of the Indo-European languages. It is spoken in Iran, Afghanistan, and Tajikistan and has official-language status in these three countries.

There are three modern varieties of standard Persian:

  • The Persian variety spoken in Iran has also been called Iranian Persian or Farsi. The writing system is an extended version of the Arabic script.
  • Dari Persian has been used to refer to the Persian language spoken in Afghanistan and Uzbekistan. It uses the same writing system as Iranian Persian.
  • Tajik or Tajiki Persian is the variety used in Tajikistan, Uzbekistan and Russia. Unlike the Persian used in Iran and Afghanistan, it is written in an extended version of the Cyrillic script.

3.2  Terminology

Here we reference the external definitions of various words we will use. Note that our reference to a Wikipedia article in the list below does not necessarily mean that we endorse or conform to their definition; it means only that it exists as an external definition and that we have made the trade-off of mentioning it.

Transliteration – نویسه گردانی:
http://en.wikipedia.org/wiki/Transliteration
Romanization:
In the context of Persian, this amounts to the same thing as transliteration.
http://en.wikipedia.org/wiki/Romanization_of_Persian
Latin vs Roman:
In the context of alphabets, we use these terms interchangeably.
Transcription:
This term is not used in this document.
http://en.wikipedia.org/wiki/Transcription_(linguistics)
Pinglish/Finglish:
An informal and loose transliteration for human-to-human communication. Pinglish is word oriented. The Multi-Character Transliteration Input method is character oriented.
Pinglish Web Services:
For example, behnevis.
Persian Multi-Character/Composite Transliteration:
Synonymous with Multi-Character Reverse Transliteration Input Method.
Persian Multi-Character/Composite Reverse Transliteration:
Transliteration is the process by which خ becomes “kh”. The route by which “kh” becomes خ is reverse transliteration, but we continue to refer to it simply as transliteration. farsi-transliterate-banan, defined in this document, is an example of the Multi-Character Reverse Transliteration Input Method. See Section 5.2 for details.
Input Method / Emacs Input Method:
An "input method" is a kind of character conversion designed specifically for interactive input.
Mapping Input Method:
The simplest kind of input method works by mapping ASCII letters into another alphabet; this allows you to use one other alphabet instead of ASCII.
Composite Input Method:
A more powerful technique is composition: converting sequences of characters into one letter. For example “kh” becomes خ.
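
In Emacs, mapping and composition rules of this kind are typically declared with the Quail library. The following is a minimal sketch only, not the definition shipped with Emacs: the package name my-farsi-demo and its handful of rules are hypothetical and chosen purely for illustration.

```elisp
(require 'quail)

;; Hypothetical demo package; the name, title and docstring are assumptions.
(quail-define-package
 "my-farsi-demo" "Persian" "دمو" nil
 "Demo input method with a few mapping and composition rules.")

;; Single-key rules are plain mappings; two-key rules are compositions.
(quail-define-rules
 ("s" ?س)
 ("h" ?ه)
 ("k" ?ک)
 ("m" ?م)
 ("sh" ?ش)
 ("kh" ?خ))
```

After evaluating these forms, the demo method can be selected in a buffer with (set-input-method "my-farsi-demo") and switched off again with C-\.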

3.3  Overview of the Full Picture: The By* Halaal Digital Ecosystem

This document is part of a bigger picture.

We want the world to move towards Halaal Software, and Halaal Internet Services.

The totality of our work is directed towards creation of the ByStar Halaal Digital Ecosystem, as a moral alternative to the proprietary American digital ecosystem. An overview of this is provided in our document titled, The ByStar Halaal/Libre Digital Ecosystem: A Moral Alternative to the Proprietary American Digital Ecosystem [6], available on-line at: http://www.persoarabic.org/PLPC/180016. In that document we present a complete picture for establishing a model and process that can redirect the manner of existence of software and Internet services towards safeguarding humanity. We also describe the framework that is already in place for collaboration and we invite you to participate in this work.

4  Persian With Emacs

This information applies to emacs version 24.2.50.1 or higher.

Enabling Persian in Emacs is very simple.

If you already are an emacs user, you can skip over to section 4.3 and continue reading from there.

If you are completely new to emacs, the information below is sufficient to permit you to install emacs, enable Persian and start using emacs as your Persian user environment.

4.1  About Emacs

Emacs is the world’s most potent multilingual editor-centered user experience platform. Emacs comes with a rich mail reader, a personal planner, an address book, a calendar, spell checkers for English and Persian, multi-lingual dictionary interfaces and many other tools and packages; all integrated together. Because Emacs supports Persian, all these tools and packages also support Persian.

Some useful links to emacs related resources are included below:

4.2  Obtaining Emacs

Emacs is halaal/libre/free software.

The primary access page for emacs is: http://www.gnu.org/software/emacs/

You can obtain the sources for emacs and build it yourself or you can obtain pre-built binaries.

Instructions for obtaining emacs in various forms and for various platforms are also included below.

4.2.1  Obtaining Emacs Sources

When Emacs 24.3 is released, you can obtain the source for emacs 24.3 with:

The latest version from the repository trunk can be obtained with:

git clone git://git.savannah.gnu.org/emacs.git

Then you can build emacs from sources by following the instructions.

4.2.2  Binaries For Debian GNU/Linux and Ubuntu

Snapshots of the repository trunk are regularly built for Debian and Ubuntu. You can obtain these from:

Once Emacs 24 is included in distributions of Debian and Ubuntu, all you have to do is:

sudo apt-get install emacs

4.2.3  Binaries For MS Windows

We do not encourage use of any software on the proprietary/haraam Microsoft Windows platform. From the Halaal Software perspective, use of any software under Windows is at best makruh – مکروه . Use GNU/Linux instead.

Snapshots of the repository trunk are regularly built for MS Windows. You can obtain these from:

4.3  Obtaining Persian Blee

Blee (the ByStar Libre Emacs Environment [8]) is a layer above Emacs that integrates GNU/Linux capabilities into Emacs, and provides close integration with the ByStar Services. The ByStar Federation of Autonomous Libre Services [2] is a unified Halaal services model, unifying and making consistent a large number of services that currently exist in functional isolation.

Information about obtaining Blee can be found at: http://www.persoarabic.org/PLPC/180004

4.4  Selecting Persian Language

Using Emacs menus, select:
“Options” - “Multilingual Environment” - “Set Language Environment” - “Persian”.

Or you can select the Persian language with Emacs commands.

The notation “M-:” in the following commands means that you hold down the “Meta” key (usually Alt) while typing “:”, or press and release the Esc key and then type “:”. The “M-:” is then followed by the elisp form. For some commands the “M-:” does not appear; in this case you just need to eval the elisp form.

M-: (set-language-environment "Persian")

To see language environment settings, using Emacs menus, select:
“Options” - “Multilingual Environment” - “Describe Language Environment” - “Persian”.

or invoke the Emacs command:

M-: (describe-language-environment "Persian")

4.5  Selecting Persian Input Methods

Emacs comes with two built-in Persian input methods:

farsi-isiri-9147:
A Persian keyboard based on the Islamic Republic of Iran’s ISIRI-9147 specification. See Section 5.1 for details.
farsi-transliterate-banan:
An intuitive transliteration keyboard for Farsi. See Section 5.2 for details.

With Plain Emacs

With the language environment set to “Persian”, using Emacs menus, select:
“Options” - “Multilingual Environment” - “Toggle Input Method”.

Now, your keyboard is configured for Persian as farsi-transliterate-banan.

To activate the ISIRI-9147 keyboard, enter the command:

M-: (set-input-method 'farsi-isiri-9147)

To activate the transliterate keyboard, enter the command:

M-: (set-input-method 'farsi-transliterate-banan)

Alternatively you can select these options from the “Options-Multilingual Environment” menu.

To toggle back to the English keyboard type C-\ (hold down the Ctrl key while typing the character \).

To see a description of either input method, use the commands:

(describe-input-method 'farsi-transliterate-banan)
(describe-input-method 'farsi-isiri-9147)
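
The selections above can also be made persistent. The following is a minimal sketch for an init file, assuming the input method names used in Section 5; the helper command my-use-farsi-isiri and the F7 key binding are hypothetical choices made only for this example.

```elisp
;; Sketch for an init file such as ~/.emacs (the file name is an assumption).
(set-language-environment "Persian")

;; C-\ (toggle-input-method) activates `default-input-method'.
;; Uncomment the next line to make C-\ select the ISIRI layout
;; instead of the transliteration keyboard.
;; (setq default-input-method "farsi-isiri-9147")

(defun my-use-farsi-isiri ()
  "Hypothetical command: activate farsi-isiri-9147 in the current buffer."
  (interactive)
  (set-input-method "farsi-isiri-9147"))

;; The key choice is arbitrary.
(global-set-key (kbd "<f7>") #'my-use-farsi-isiri)
```

With this in place, C-\ toggles the default Persian input method and F7 switches the current buffer to the ISIRI layout.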

With Persian Blee

Using Persian Blee, just press the F6 key twice. Your input method and language environment (spell checking, dictionaries, etc.) are then all set to Persian.

Press F6 twice again to toggle back to the English keyboard.

4.6  A Sample Farsi Editing Session

Let’s start from scratch and walk through the steps involved in writing a simple sentence both in Farsi and in English.

  • Install Emacs 24 on your system based on the information in section 4.2.
  • Open a file: (for example “example.fa”)
    Menu: “File” - “Visit New File” - “example.fa”
  • Select Persian Language (section 4.4)
    Menu: “Options” - “Multilingual Environment” - “Set Language Environment” - “Persian”.
  • Select the farsi-transliterate-banan Persian Input Method (section 4.5)
    Menu: “Options” - “Multilingual Environment” - “Toggle Input Method”
  • Consider that we want to write:
    حالا، با نرم افزار حلال میتوانیم به فارسی سالم و خوش بنویسیم.
  • Note that we are not writing in pinglish. Ignore the vowels and think of the Persian writing above letter-by-letter.
    Now type:
    Hala, ba nrm afzar Hlal mitvanim bh farsi salm v khush bnvisim.
    
  • Toggle back to English with C-\ or
    Menu: “Options” - “Multilingual Environment” - “Toggle Input Method”
  • Now enter something in English, for example:
    Now, with Halaal software we can write well in Persian.
    
    Note that the empty line between the Farsi paragraph and the English paragraph properly took care of directionality.
  • We are done, so let’s save the file and close this buffer.
    Menu: “File” - “Save”.
    Menu: “File” - “Close”.

Kool!

With Emacs, you are using the world’s most potent multilingual editor-centered user experience platform. And it is Halaal/Libre/Free. And it is Gratis/Free-of-Charge. And it has everything – a Persian spell checker, an email interface, calendar, address book, personal planner, ...

To learn more and explore more, you can try:
Menu: “Help” - “Read the Emacs Manual”.
and
Menu: “Help” - “Tutorial”.

Also, some Persian specific help is included below.

4.7  Hints for Persian Characters (Unicode) Usage

As you are writing in Persian, you may want to know exactly what Unicode character is at the cursor. To do that place point on the character, then enter the following commands:

ctl+x =
meta+x describe-char
ctl+u ctl+x =

For example, to verify consistency between this document and code, place the cursor on the character and with “ctl+x =” verify that the Unicode hex numbers match.

To enter a Unicode character directly in decimal or hex:

ctl+x 8 enter
(ucs-insert #x0635)
(ucs-insert (string-to-number "0635" 16))
(ucs-insert 1589)
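
The same checks can be scripted. The following is a small sketch, assuming only functions that ship with Emacs; the helper name my-char-info-at-point is hypothetical. Newer Emacs releases treat ucs-insert as an obsolete alias for insert-char, so the sketch uses insert-char with an explicit count, which is accepted by both older and newer versions.

```elisp
(defun my-char-info-at-point ()
  "Hypothetical helper: echo the code point and Unicode name at point."
  (interactive)
  (let ((ch (char-after)))
    (if ch
        (message "U+%04X %s" ch (get-char-code-property ch 'name))
      (message "No character at point"))))

;; Insert ARABIC LETTER SAD once at point.
;; #x0635 (hex) and 1589 (decimal) denote the same code point.
(insert-char #x0635 1)
```

Invoked with meta+x my-char-info-at-point while point is on ص, the helper reports the code point in the same hex form used in the tables of Section 5.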

4.8  Hints for bidi Emacs Usage

Sometimes you may want to specify the directionality explicitly (i.e. left-to-right or right-to-left).

With Plain Emacs

Here are some of the basic Emacs bidi controls:

(setq bidi-display-reordering t)
(setq bidi-display-reordering nil)
(setq bidi-paragraph-direction 'right-to-left)
(setq bidi-paragraph-direction 'left-to-right)

See the Emacs documentation for more.
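
As an illustration of how these variables can be used, here is a minimal sketch of a command that flips the paragraph direction of the current buffer. The command name my-toggle-paragraph-direction is hypothetical. When bidi-paragraph-direction is nil, which is its default, Emacs determines the direction of each paragraph from its text.

```elisp
(defun my-toggle-paragraph-direction ()
  "Hypothetical command: flip the paragraph direction of the current buffer."
  (interactive)
  ;; Setting this variable affects only the current buffer.
  (setq bidi-paragraph-direction
        (if (eq bidi-paragraph-direction 'right-to-left)
            'left-to-right
          'right-to-left))
  (message "Paragraph direction: %s" bidi-paragraph-direction))
```

Starting from the default of nil, the first invocation forces right-to-left; evaluating (setq bidi-paragraph-direction nil) restores automatic detection.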

With Persian Blee

The keystroke combinations F6-1 and F6-2 are bound to toggle display-reordering.

4.9  Multilingualization (M17n) of Spelling Dictionaries

Debian/Ubuntu includes a Persian dictionary that can also be used with Emacs.

With Plain Emacs

First you need to obtain the spelling dictionary. Enter the following command:

sudo apt-get -y install aspell-fa

Next you need to let Emacs know that you want to use the Persian spelling dictionary.
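
One possible way to do this is sketched below. It assumes that the aspell program is installed, that the aspell-fa package registers its dictionary under the name "fa", and that Emacs is told to use aspell as its spelling program. The command name my-use-persian-dictionary is hypothetical.

```elisp
;; Assumption: aspell is installed and is the desired spelling backend.
(setq ispell-program-name "aspell")

(defun my-use-persian-dictionary ()
  "Hypothetical command: spell-check the current buffer in Persian."
  (interactive)
  ;; Assumption: the aspell-fa dictionary is named "fa".
  (ispell-change-dictionary "fa"))
```

After meta+x my-use-persian-dictionary, the usual ispell commands in that buffer should consult the Persian dictionary.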

With Persian Blee

As already noted, pressing the F6 key twice toggles your input method. This also toggles your language environment. ispell/aspell is then configured to work with multiple dictionaries.

So there is nothing else you need to do.

5  Emacs Persian Input Methods

At this time there are two Persian input methods supported in Emacs:

farsi-isiri-9147:
A Persian keyboard based on the Islamic Republic of Iran’s ISIRI-9147 specification.
farsi-transliterate-banan:
An intuitive transliteration keyboard for Farsi.

These are described in the following sections.

5.1  farsi-isiri-9147 Persian Input Method

In Emacs this input method is labeled farsi-isiri-9147. It is based on the ISIRI 9147 – 1st edition. ISIRI-9147 defines the layout of Iran’s Persian keyboard. See section 6.3 and section 6.1 for more information.

Layers 1, 2 and 3 of ISIRI-9147 are fully implemented with the exception of the Backslash ’\’ , Alt-Backslash, Shift-Space and Alt-Space keys.

The Backslash key is used to replace کلید با دگر ساز راست‌ (the Alt or Meta key).

Layer 3 is then entered with the Backslash key, and Layer 3 is implemented as two-letter key combinations as specified in ISIRI-9147.

The character corresponding to Backslash is entered with Backslash-Backslash. Alt-Backslash has been moved to Backslash-r. Shift-Space has been moved to Backslash-y. Alt-Space has been moved to Backslash-t.

With these modifications farsi-isiri-9147 is a full implementation of ISIRI-9147. In addition, with these modifications this implementation is based on an ASCII input stream, as well as being a keyboard layout.

If a key on Layer 1 were reserved to replace دگر ساز راست‌ (the Alt or Meta key), then farsi-isiri-9147 would be fully compliant, without needing the above description/modifications.

Perhaps this can be considered a defect in the base ISIRI-9147 specification, to be addressed in the next revision.

All inputs for each Persian letter Unicode for farsi-isiri-9147 are shown in Table 3, Table 4, Table 5, Table 6, Table 7, Table 8, Table 9 and Table 10.

5.2  farsi-transliterate-banan Persian Input Method

In Emacs this input method is labeled farsi-transliterate-banan.

The ISIRI-9147 Persian keyboard is not well suited to Iranian expatriates living in the West. Persian-speaking expatriates are usually already completely familiar and accustomed to the standard qwerty keyboard, and they don’t want to have to learn and adapt to ISIRI-9147. Rather, they expect software to adapt to them.

This is what the farsi-transliterate-banan – “Banan Multi-Character (Reverse) Transliteration Persian Input Method” – accomplishes. This input method addresses the needs of a user who:

  • Can write in Farsi (not just speak it).
  • Is familiar with and accustomed to the qwerty Latin keyboard.
  • Is unfamiliar with ISIRI-9147 and does not wish to learn it.
  • Writes and otherwise communicates in mixed Globish/Persian, not pure Persian.
  • Is intuitively familiar with the transliteration of Farsi/Persian into Latin based on two-letter phonetic mapping to Persian characters. (For example: gh ق – kh خ – sh ش – ch چ – zh ژ ).

The transliteration keyboard is intuitive in design, so that the mappings are natural and easy to remember for a Persian writer. It provides equivalent capability to farsi-isiri-9147, allowing input of all characters enumerated in ISIRI-6219.

farsi-transliterate-banan is phonetically oriented. But it is very different from Pinglish. Pinglish is word-oriented, where you sound out the word using Latin letters, including the vowels. farsi-transliterate-banan is letter-oriented, where you type the Latin letter(s) closest to the Persian letter, and usually omit vowels.

For some Persian characters there are multiple ways of inputting the same character. For example both “i” and “y” produce ی. For یک “yk”, “y” is more natural, and for این “ain”, “i” is more natural.

The more commonly used letters are mapped to lower case; the less commonly used letters are mapped to upper case. For example “s” is س while “S” is ص. And “h” is ه while “H” is ح. Table 1 shows these mappings.

Postfix composition is based on “h”. The letter “h” is used as a postfix for the following two-character mappings: gh ق – kh خ – sh ش – ch چ – zh ژ – Th ة – Yh ى. Table 2 shows these mappings.

Prefix composition is based on the prefix characters \, & and /.

Prefix letter \ is used for two-character inputs when an alternative form of a letter is desired. For example \- is “÷” while - is “−”.

Prefix letter & is used for multi-character inputs when special characters are desired based on their abbreviated name. For example you can type &lrm; to enter the “LEFT-TO-RIGHT MARK” character.

Prefix letter / is used to provide two specific characters. / is “ZERO WIDTH NON-JOINER” and // is /.

The letter “h” is used in a number of two-character postfix mappings; for example “sh” ش. So if you need the sequence “s” then “h” you have to repeat the “s”. For example: سهم = ’s’ ’s’ ’h’ ’m’.
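
The effect of the “h” postfix rule, and of doubling a letter to escape it, can be sketched outside of any input method framework. The following is an illustrative greedy lookup only, not the Emacs implementation: the rule list is a hypothetical subset of Tables 1 to 3, and the names my-banan-sample-rules and my-banan-transliterate are invented for this example.

```elisp
;; Hypothetical subset of the rules in Tables 1 to 3; not the full method.
(defvar my-banan-sample-rules
  '(("sh" . "ش") ("kh" . "خ") ("ss" . "س")
    ("s" . "س") ("k" . "ک") ("h" . "ه") ("m" . "م"))
  "Alist mapping Latin key sequences to Persian letters.")

(defun my-banan-transliterate (input)
  "Convert INPUT to Persian, trying two-key rules before one-key rules."
  (let ((pos 0)
        (len (length input))
        (out '()))
    (while (< pos len)
      (let* ((two (and (<= (+ pos 2) len)
                       (assoc (substring input pos (+ pos 2))
                              my-banan-sample-rules)))
             (one (assoc (substring input pos (1+ pos))
                         my-banan-sample-rules)))
        (cond (two (push (cdr two) out)
                   (setq pos (+ pos 2)))
              (one (push (cdr one) out)
                   (setq pos (1+ pos)))
              ;; Characters without a rule pass through unchanged.
              (t (push (substring input pos (1+ pos)) out)
                 (setq pos (1+ pos))))))
    (apply #'concat (nreverse out))))

(my-banan-transliterate "shm")   ; "sh" is consumed as one unit: شم
(my-banan-transliterate "sshm")  ; doubled "s" keeps "h" separate: سهم
```

A port to another user environment would replace the sample alist with the complete mappings from the tables below and would also need to handle the prefix characters \, & and /.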

Table 1 shows the single-character keyboard layout for farsi-transliterate-banan. It is based on the results of (describe-input-method 'farsi-transliterate-banan).

Table 2 shows the multi-character mappings for farsi-transliterate-banan. It is based on the results of (describe-input-method 'farsi-transliterate-banan).

All inputs for each Persian letter Unicode for farsi-transliterate-banan are shown in Table 3, Table 4, Table 5, Table 6, Table 7, Table 8, Table 9 and Table 10.


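Each cell lists a key and the character it produces, first unshifted and then shifted.
Row 1: 1 ۱ ! ! | 2 ۲ @ ْ | 3 ۳ # ً | 4 ۴ $ ٰ | 5 ۵ % ٪ | 6 ۶ ^ َ | 7 ۷ & & | 8 ۸ * * | 9 ۹ ( ( | 0 ۰ ) ) | = = + + | ` ٔ ~ ّ
Row 2: q غ Q ق | w ع W ء | e ِ E ٍ | r ر R R | t ت T ط | y ی Y ي | u و U ٓ | i ی I ئ | o ُ O ٌ | p پ P P | [ [ { { | ] ] } }
Row 3: a ا A آ | s س S ص | d د D ٱ | f ف F إ | g گ G غ | h ه H ح | j ج J (ZWJ) | k ک K ك | l ل L L | ; ؛ : : | ' ’ " "
Row 4: z ز Z ذ | x ض X ظ | c ث C ٕ | v و V ؤ | b ب B B | n ن N « | m م M » | , ، < < | . . > >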
Table 1: Banan Transliteration Keyboard Layout for Single Keys


ch چ – kh خ – sh ش – zh ژ – gh ق – Gh غ – hh ح – Yh ى – Th ة
Table 2: Banan Transliteration of “h” Postfix Multi Keys Mappings


فارسى | farsi-transliterate-banan | Emacs ISIRI 9147 | Unicode Hex | Unicode Name | نامِ نویسه
ء | W | M | 0621 | ARABIC LETTER HAMZA | حرف فارسی همزه
آ | A | H | 0622 | ARABIC LETTER ALEF WITH MADDA ABOVE | حرف فارسی الف با کلاه
ا | a | h | 0627 | ARABIC LETTER ALEF | حرف فارسی الف
أ | \a | G | 0623 | ARABIC LETTER ALEF WITH HAMZA ABOVE | حرف فارسی الف با همزه بالا
ب | b | f | 0628 | ARABIC LETTER BEH | حرف فارسی ب
پ | p | m | 067e | ARABIC LETTER PEH | حرف فارسی پ
ت | t tt | j | 062a | ARABIC LETTER TEH | حرف فارسی ت
ث | c cc | e | 062b | ARABIC LETTER THEH | حرف فارسی ث
ج | j | [ | 062c | ARABIC LETTER JEEM | حرف فارسی جیم
چ | ch | ] | 0686 | ARABIC LETTER TCHEH | حرف فارسی چ
ح | H hh | p | 062d | ARABIC LETTER HAH | حرف فارسی ح
خ | kh | o | 062e | ARABIC LETTER KHAH | حرف فارسی خ
د | d | n | 062f | ARABIC LETTER DAL | حرف فارسی دال
ذ | Z | b | 0630 | ARABIC LETTER THAL | حرف فارسی ذال
ر | r | v | 0631 | ARABIC LETTER REH | حرف فارسی ر
ز | z zz | c | 0632 | ARABIC LETTER ZAIN | حرف فارسی ز
ژ | zh | C | 0698 | ARABIC LETTER JEH | حرف فارسی ژ
س | s ss | s | 0633 | ARABIC LETTER SEEN | حرف فارسی سین
ش | sh | a | 0634 | ARABIC LETTER SHEEN | حرف فارسی شین
ص | S | w | 0635 | ARABIC LETTER SAD | حرف فارسی صاد
ض | x | q | 0636 | ARABIC LETTER DAD | حرف فارسی ضاد
ط | T TT | x | 0637 | ARABIC LETTER TAH | حرف فارسی طا
ظ | X | z | 0638 | ARABIC LETTER ZAH | حرف فارسی ظا
ع | w | u | 0639 | ARABIC LETTER AIN | حرف فارسی عین
غ | q Gh G GG | y | 063a | ARABIC LETTER GHAIN | حرف فارسی غین
ف | f | t | 0641 | ARABIC LETTER FEH | حرف فارسی ف
ق | gh Q | r | 0642 | ARABIC LETTER QAF | حرف فارسی قاف
ک | k kk | ; | 06a9 | ARABIC LETTER KEHEH | حرف فارسی کاف
گ | g gg | ' | 06af | ARABIC LETTER GAF | حرف فارسی گاف
ل | l | g | 0644 | ARABIC LETTER LAM | حرف فارسی لام
م | m | l | 0645 | ARABIC LETTER MEEM | حرف فارسی میم
ن | n | k | 0646 | ARABIC LETTER NOON | حرف فارسی نون
و | u v | , | 0648 | ARABIC LETTER WAW | حرف فارسی واو
ؤ | V | A | 0624 | ARABIC LETTER WAW WITH HAMZA ABOVE | حرف فارسی واو با همزه بالا
ه | h Hh | i | 0647 | ARABIC LETTER HEH | حرف فارسی ه
ی | i y | d | 06cc | ARABIC LETTER FARSI YEH | حرف فارسی ی
ئ | I | S | 0626 | ARABIC LETTER YEH WITH HAMZA ABOVE | حرف فارسی ی با همزه بالا
Table 3: Main Letters: Mapping of Persian Unicode to farsi-transliterate-banan – Matching Table 5 of isiri-6219


فارسى | farsi-transliterate-banan | Emacs ISIRI 9147 | Unicode Hex | Unicode Name | نامِ نویسه
إ | F | F | 0625 | ARABIC LETTER ALEF WITH HAMZA BELOW | حرف فارسی الف با همزه پایین
ٱ | D | \h | 0671 | ARABIC LETTER ALEF WASLA | حرفِ الفِ وصل
ك | K | Z | 0643 | ARABIC LETTER KAF | حرف کاف عربی
ة | Th | Z | 0629 | ARABIC LETTER TEH MARBUTA | حرف ت گرد
ي | Y YY | D | 064a | ARABIC LETTER YEH | حرف ی عربی نقطه دار
ى | Yh | V | 0649 | ARABIC LETTER ALEF MAKSURA | حرف ی عربی بی نقطه
Table 4: Arabic Letters: Mapping of Persian Unicode to farsi-transliterate-banan – Matching Table 6 of isiri-6219


فارسى | farsi-transliterate-banan | Emacs ISIRI 9147 | Unicode Hex | Unicode Name | نامِ نویسه
۰ | 0 | 0 | 06f0 | EXTENDED ARABIC-INDIC DIGIT ZERO | رقم فارسی صفر
۱ | 1 | 1 | 06f1 | EXTENDED ARABIC-INDIC DIGIT ONE | رقم فارسی یک
۲ | 2 | 2 | 06f2 | EXTENDED ARABIC-INDIC DIGIT TWO | رقم فارسی دو
۳ | 3 | 3 | 06f3 | EXTENDED ARABIC-INDIC DIGIT THREE | رقم فارسی سه
۴ | 4 | 4 | 06f4 | EXTENDED ARABIC-INDIC DIGIT FOUR | رقم فارسی چهار
۵ | 5 | 5 | 06f5 | EXTENDED ARABIC-INDIC DIGIT FIVE | رقم فارسی پنج
۶ | 6 | 6 | 06f6 | EXTENDED ARABIC-INDIC DIGIT SIX | رقم فارسی شش
۷ | 7 | 7 | 06f7 | EXTENDED ARABIC-INDIC DIGIT SEVEN | رقم فارسی هفت
۸ | 8 | 8 | 06f8 | EXTENDED ARABIC-INDIC DIGIT EIGHT | رقم فارسی هشت
۹ | 9 | 9 | 06f9 | EXTENDED ARABIC-INDIC DIGIT NINE | رقم فارسی نه
٫ | @ | / | 066b | ARABIC DECIMAL SEPARATOR | ممیز فارسی
٬ | \, | U | 066c | ARABIC THOUSAND SEPARATOR | جدا کننده هزارهای فارسی
٪ | % | % | 066a | ARABIC PERCENT SIGN | درصد فارسی
+ | + | + | 002b | PLUS SIGN | علامت به اضافه
− | - | - | 2212 | MINUS SIGN | علامت منها
× | \* | ^ | 00d7 | MULTIPLICATION SIGN | علامت ضرب
÷ | \- | ~ | 00f7 | DIVISION SIGN | علامت تقسیم
< | < | > | 003c | ARABIC LESS THAN SIGN | علامت کوچکتر
= | = | = | 003d | EQUAL SIGN | علامت مساوی
> | > | < | 003e | ARABIC GREATER THAN SIGN | علامت بزرگتر
Table 5: Digits and Math Signs: Mapping of Persian Unicode to farsi-transliterate-banan – Matching Table 4 of isiri-6219


فارسى | farsi-transliterate-banan | Emacs ISIRI 9147 | Unicode Hex | Unicode Name | نامِ نویسه
(space) | (space) | (space) | 0020 | SPACE | فاصله
. | . | . | 002e | FULL STOP | نقطه
: | : | : | 003a | COLON | دونقطه
! | ! | ! | 0021 | EXCLAMATION POINT | علامت تعجب
… | \. | \m | 2026 | HORIZONTAL ELLIPSIS | سه نقطه فارسی
‐ | - | - | 2010 | HYPHEN | خط تیره
- | - | - | 002d | MINUS OR HYPHEN | تیره منها
| | | | | | 007c | VERTICAL BAR | خط عمودی
/ | // | / | 002f | SLASH | خط اریب
\ | \ \\ | \\ | 005c | BACKSLASH | خط اریب وارو
* | * | * | 002a | ASTERISK | ستاره
) | ) | ( | 0028 | ARABIC OPENING PARENTHESIS | پرانتز باز
( | ( | ) | 0029 | ARABIC CLOSING PARENTHESIS | پرانتز بسته
] | ] | O | 005b | ARABIC OPENING BRACKET | کروشه باز
[ | [ | P | 005d | ARABIC CLOSING BRACKET | کروشه بسته
} | } | { | 007b | ARABIC OPENING BRACE | آکولاد باز
{ | { | } | 007d | ARABIC CLOSING BRACE | آکولاد بسته
» | \> M | K | 00bb | RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK | گیومه باز
« | \< N | L | 00ab | LEFT-POINTING DOUBLE ANGLE QUOTATION MARK | گیومه بسته
Table 6: Common Punctuation Marks: Mapping of Persian Unicode to farsi-transliterate-banan – Matching Table 2 of isiri-6219


فارسى | farsi-transliterate-banan | Emacs ISIRI 9147 | Unicode Hex | Unicode Name | نامِ نویسه
، | , | T | 060c | ARABIC COMMA | ویرگول فارسی
؛ | ; | Y | 061b | ARABIC SEMICOLON | نقطه ویرگول فارسی
؟ | ? | ? | 061f | ARABIC QUESTION MARK | علامت سوال فارسی
ـ | _ | J | 0640 | ARABIC TATWEEL | کشیدگی فارسی
Table 7: Persian Punctuation Marks: Mapping of Persian Unicode to farsi-transliterate-banan – Matching Table 3 of isiri-6219


The characters in this table are invisible, so the first column is left empty.
فارسى | farsi-transliterate-banan | Emacs ISIRI 9147 | Unicode Hex | Unicode Name | نامِ نویسه
 | J &zwj; | | 200d | ZERO WIDTH JOINER | اتصال مجازی
 | &lrm; | | 200e | LEFT-TO-RIGHT MARK | نشانه چپ به راست
 | &rlm; | | 200f | RIGHT-TO-LEFT MARK | نشانه راست به چپ
 | &ls; | | 2028 | LINE SEPARATOR | جدا کننده سطرها
 | &ps; | | 2029 | PARAGRAPH SEPARATOR | جدا کننده بندها
 | &lre; | | 202a | LEFT-TO-RIGHT EMBEDDING | زیر متن چپ به راست
 | &rle; | | 202b | RIGHT-TO-LEFT EMBEDDING | زیر متن راست به چپ
 | &pdf; | | 202c | POP DIRECTIONAL FORMATTING | پایان زیر متن
 | &lro; | | 202d | LEFT-TO-RIGHT OVERRIDE | زیر متن اکیداْ چپ به راست
 | &rlo; | | 202e | RIGHT-TO-LEFT OVERRIDE | زیر متن اکیداْ راست به چپ
 | &bom; | | feff | BYTE ORDER MARK | نشانه ترتیب بایت‌ها
Table 8: Control Mark Ups: Mapping of Persian Unicode to farsi-transliterate-banan – Matching Table 1 of isiri-6219


فارسى | farsi-transliterate-banan | Emacs ISIRI 9147 | Unicode Hex | Unicode Name | نامِ نویسه
َ | ^ | U | 064e | ARABIC FATHA | زبر
ِ | e | Y | 0650 | ARABIC KASRA | زير
ُ | o | T | 064f | ARABIC DAMMA | پيش – ضمه
ً | # | R | 064b | ARABIC FATHATAN | دو زبر
ٍ | E | E | 064d | ARABIC KASRATAN | دو زير
ٌ | O | W | 064c | ARABIC DAMMATAN | دو پیش
ّ | ~ | I | 0651 | ARABIC SHADDA | تشديد
ْ | @ | Q | 0652 | ARABIC SUKUN | ساکن
ٓ | U | X | 0653 | ARABIC MADDAH ABOVE | مد
ٔ | ` | N | 0654 | ARABIC HAMZA ABOVE | همزه فارسی بالا
ٕ | C | \n | 0655 | ARABIC HAMZA BELOW | همزه فارسی پایین
ٰ | $ | V | 0670 | ARABIC LETTER SUPERSCRIPT ALEF | الف مقصوره
Table 9: Persian Signs: Mapping of Persian Unicode to farsi-transliterate-banan – Matching Table 7 of isiri-6219


فارسى | farsi-transliterate-banan | Emacs ISIRI 9147 | Unicode Hex | Unicode Name | نامِ نویسه
@ | \@ | | 0040 | COMMERCIAL AT | علامت در
0 | \0 | | 0030 | DIGIT ZERO | رقم صفر لاتین
1 | \1 | | 0031 | DIGIT ONE | رقم یک لاتین
2 | \2 | | 0032 | DIGIT TWO | رقم دو لاتین
3 | \3 | | 0033 | DIGIT THREE | رقم سه لاتین
4 | \4 | | 0034 | DIGIT FOUR | رقم چهار لاتین
5 | \5 | | 0035 | DIGIT FIVE | رقم پنج لاتین
6 | \6 | | 0036 | DIGIT SIX | رقم شش لاتین
7 | \7 | | 0037 | DIGIT SEVEN | رقم هفت لاتین
8 | \8 | | 0038 | DIGIT EIGHT | رقم هشت لاتین
9 | \9 | | 0039 | DIGIT NINE | رقم نه لاتین
Table 10: Extensions: Mapping of Persian Unicode to farsi-transliterate-banan


فارسى | Banan Reverse Translit | Emacs ISIRI 9147 | Unicode Hex | Unicode Name | نامِ نویسه عربی
ۀ | | | 06c0 | ARABIC LETTER HEH WITH YEH ABOVE | حرفِ هِ اردو با همزه‌ی بالا
٠ | | | 0660 | ARABIC-INDIC DIGIT ZERO | رقم صفر عربی
١ | | | 0661 | ARABIC-INDIC DIGIT ONE | رقم یک عربی
٢ | | | 0662 | ARABIC-INDIC DIGIT TWO | رقم دو عربی
٣ | | | 0663 | ARABIC-INDIC DIGIT THREE | رقم سه عربی
٤ | | | 0664 | ARABIC-INDIC DIGIT FOUR | رقم چهار عربی
٥ | | | 0665 | ARABIC-INDIC DIGIT FIVE | رقم پنج عربی
٦ | | | 0666 | ARABIC-INDIC DIGIT SIX | رقم شش عربی
٧ | | | 0667 | ARABIC-INDIC DIGIT SEVEN | رقم هفت عربی
٨ | | | 0668 | ARABIC-INDIC DIGIT EIGHT | رقم هشت عربی
٩ | | | 0669 | ARABIC-INDIC DIGIT NINE | رقم نه عربی
Table 11: Forbidden Characters: Mapping of Persian Unicode to farsi-transliterate-banan – Matching Table 8 of isiri-6219

6  Relevant Standards/Specifications

We have put together a repository of standards/specifications which are relevant to Persian input methods. That repository is at: http://www.persoarabic.org/standards

Legitimacy of any of these documents as standards is not our focus or concern. We have included them here because they are relevant and useful.

6.1  ISIRI-6219

Based on Unicode, ISIRI-6219 defines the Farsi Character Set. Its full title is:

فنّاوریِ اطلاعات – تبادل و شیوه‌ی نمایش اطلاعاتِ فارسی بر اساس یونی کُد
استاندارد ملی ایران ۶۲۱۹ −− نسخهی نهایی
Institute of Standards and Industrial Research of Iran
Information Technology – Persian Information Interchange and Display Mechanism, using Unicode
ISIRI-6219
Final Version

Published at:
http://www.isiri.org/portal/files/std/6219.htm
and republished at:
http://www.persoarabic.org/Repub/fpf-isiri-6219

6.2  Suggested Enhancements For ISIRI-6219

While developing farsi-transliterate-banan we studied ISIRI-6219. Our comments and suggestions follow.

6.2.1  Clear labeling of ISIRI-6219 as the definition of Farsi Character Set

ISIRI-6219 does many things: it defines the Farsi Character Set, and it also includes translations of various global specifications.

However, it does not clearly state that its primary purpose is to define Iran’s Farsi Character Set.

The title page and the early part of the specification should state explicitly that ISIRI-6219 defines the Farsi Character Set for Iran, with wording along the lines of:

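مجموعه نویسهٔ استاندارد ایران برای تبادل اطلاعات، استاندارد ملی ۶۲۱۹ مؤسسهٔ استاندارد و تحقیقات صنعتی ایران است که مبتنی بر یونی کد است.

(In English: Iran’s standard character set for information interchange is National Standard 6219 of the Institute of Standards and Industrial Research of Iran, which is based on Unicode.)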

As the definition of the Farsi Character Set, ISIRI-6219 should then require every Farsi input method to state clearly that it fully supports the Farsi Character Set. If an input method provides anything beyond ISIRI-6219, those extensions should be explicitly marked as extensions. This is not the case between ISIRI-9147 and ISIRI-6219 today. The specification of the farsi-transliterate-banan input method in this document is based on the ISIRI-6219 Farsi Character Set tables: its conformance is stated explicitly and its extensions are explicitly marked.

6.2.2  Missing At Sign – ’@’

ISIRI-6219 does not include ’@’ as part of the Farsi Character Set.

The move towards Internationalized Domain Names (IDN) and the use of .ایران requires ’@’ for email addresses. This alone makes ’@’ important enough for inclusion in ISIRI-6219.
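To illustrate why ’@’ remains necessary even when the domain itself is written in Persian, here is a minimal Python sketch that converts an internationalized domain to its ASCII form and joins it to a local part with ’@’. The function name to_ascii_address, the local part "info" and the label "مثال" are hypothetical, chosen only for illustration. Python's built-in "idna" codec implements the older IDNA 2003 rules, so a production system may need a different library.

```python
def to_ascii_address(local_part: str, domain: str) -> str:
    """Join a local part and an internationalized domain with '@'.

    The domain is converted label by label to its ASCII-compatible
    ("xn--") form using Python's built-in IDNA 2003 codec.
    """
    ascii_domain = domain.encode("idna").decode("ascii")
    return local_part + "@" + ascii_domain


# Hypothetical address under the .ایران domain.
print(to_ascii_address("info", "مثال.ایران"))
```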

6.3  ISIRI-9147

ISIRI-9147 defines the layout of Iran’s Persian keyboard. Its full title is:

فنّاوریِ اطلاعات - چیدمان حروف و علائم فارسی بر صفحه کلید رایانه
استاندارد ملی ایران ۹۱۴۷ − چاپ اول
Institute of Standards and Industrial Research of Iran
Information Technology – Layout of Persian Letters and Symbols on Computer Keyboards
ISIRI 9147 -- 1st edition

Published at:
http://www.isiri.org/portal/files/std/9147.pdf
and republished at:
http://www.persoarabic.org/Repub/fpf-isiri-9147

6.4  Suggested Enhancements For ISIRI-9147

The design and specification of ISIRI-9147 are overly tactical. While ISIRI-9147 specifies a keyboard layout, it should strategically leave the door open to more.

Today, a keyboard specification needs to be more than a layout for a physical keyboard. It should not be viewed as the sole input method, and it should therefore consider how it coexists in harmony with other input methods.

ISIRI-9147 has the following difficulties in fitting well into a multilingual editor such as Emacs:

6.4.1  Entry into Layer 3 with a Layer 1 Key instead of Alt

ISIRI-9147 specifies access to layer 3 through the Alt key.

The Alt key may not be available in some environments, as it is often an integral part of multilingual editors such as Emacs. When the Alt key is not available and the input model supports two-letter compositions, entry into layer 3 can be made through a reserved layer 1 key.

We therefore suggest reserving the Backslash key to replace the Alt key in such environments, and moving Alt-Backslash to Backslash-r.

6.4.2  Alternates For Shift-Space and Alt-Space

We suggest providing equivalents for Shift-Space and Alt-Space. In our implementation we have placed them at layer 3 as Backslash-y and Backslash-t.
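The prefix-key idea of sections 6.4.1 and 6.4.2 can be sketched as a small key-by-key state machine. The Python below is only an illustration under stated assumptions. The table LAYER3 is hypothetical and its values are descriptive placeholders rather than the characters ISIRI-9147 assigns. The pairing of Backslash-y with Shift-Space and Backslash-t with Alt-Space assumes the two lists above are in corresponding order.

```python
PREFIX = "\\"

# Hypothetical layer-3 table. The values are placeholders naming the
# ISIRI-9147 key combination being replaced, not actual characters.
LAYER3 = {
    "r": "<Alt-Backslash>",
    "y": "<Shift-Space>",  # assumed pairing
    "t": "<Alt-Space>",    # assumed pairing
}


def feed_keys(keys):
    """Process keys one at a time. A Backslash arms layer 3 for the
    next key only, taking the place of holding down Alt."""
    out = []
    armed = False
    for key in keys:
        if armed:
            # A second key with no layer-3 entry is emitted together
            # with its prefix, unchanged.
            out.append(LAYER3.get(key, PREFIX + key))
            armed = False
        elif key == PREFIX:
            armed = True
        else:
            out.append(key)
    if armed:
        # A trailing Backslash with no following key is kept as-is.
        out.append(PREFIX)
    return out


print(feed_keys("a\\rb"))
```

Because the prefix is an ordinary layer 1 key, this model needs no modifier keys and so does not compete with the editor's own use of Alt.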

6.4.3  Explicit Identification of Extensions Beyond ISIRI-6219

In its layer 3, ISIRI-9147 goes well beyond ISIRI-6219 without explicitly identifying the extensions. This undermines the purpose of ISIRI-6219.
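Identifying extensions explicitly amounts to a set difference between what an input method can produce and the base character set. The following Python sketch shows the idea. The function name classify_output is hypothetical, and the sets in the usage line are tiny stand-ins, not the actual ISIRI-6219 or ISIRI-9147 repertoires.

```python
def classify_output(produced, base_set):
    """Split the characters an input method can produce into those
    inside the base character set and those that are extensions."""
    conforming = sorted(ch for ch in produced if ch in base_set)
    extensions = sorted(ch for ch in produced if ch not in base_set)
    return conforming, extensions


# Stand-in data: ALEF and BEH as the base set; the input method also
# produces '@', which is therefore reported as an extension.
print(classify_output({"\u0627", "@"}, {"\u0627", "\u0628"}))
```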

7  The Broader Scope Of farsi-transliterate-banan

Aside from farsi-transliterate-banan, all Persian input methods today are either keyboard-layout oriented or single-character transliteration mappings. Increasingly, the keyboard layouts conform to ISIRI-9147.

That convergence is welcome, but more powerful input method models are also available.

It makes good sense today to adopt the more powerful composition input method in place of the simple mapping method. We propose that farsi-transliterate-banan, as defined in Tables 3 through 10, be considered a convergence point for Persian composition input methods.

For example, in GNOME, where we currently have only file:///usr/share/X11/xkb/symbols/ir, it would be nice to also implement the equivalent of farsi-transliterate-banan.

We would very much like to collaborate towards that goal.

8  History and Previous Work

Use of the Latin character keyboard to input Persian text into machines, and more generally use of the Latin alphabet for writing in Persian, is an old topic with a lot of history.

Jon Dehdari has assembled a table that summarizes previous work in this area. We have reproduced it here as Table 12.


فارسى  Dehdari  Buckwal  ArabTeX       Uni-Dec  Uni-Hex  UTF-8   Isi3342  CP1256  Uni-Name
ا      A        A        A             1575     0627     d8a7    c1       c7      ARABIC LETTER ALEF
ب      b        b        b             1576     0628     d8a8    c3       c8      ARABIC LETTER BEH
پ      p        P        p             1662     067e     d9be    c4       81      ARABIC LETTER PEH
ت      t        t        t             1578     062a     d8aa    c5       ca      ARABIC LETTER TEH
ث      V        v        _t            1579     062b     d8ab    c6       cb      ARABIC LETTER THEH
ج      j        j        j             1580     062c     d8ac    c7       cc      ARABIC LETTER JEEM
چ      c        J        ^c            1670     0686     da86    c8       8d      ARABIC LETTER TCHEH
ح      H        H        .h            1581     062d     d8ad    c9       cd      ARABIC LETTER HAH
خ      x        x        x             1582     062e     d8ae    ca       ce      ARABIC LETTER KHAH
د      d        d        d             1583     062f     d8af    cb       cf      ARABIC LETTER DAL
ذ      L        *        _d            1584     0630     d8b0    cc       d0      ARABIC LETTER THAL
ر      r        r        r             1585     0631     d8b1    cd       d1      ARABIC LETTER REH
ز      z        z        z             1586     0632     d8b2    ce       d2      ARABIC LETTER ZAIN
ژ      J                 ^z            1688     0698     da98    cf       8e      ARABIC LETTER JEH
س      s        s        s             1587     0633     d8b3    d0       d3      ARABIC LETTER SEEN
ش      C        $        ^s            1588     0634     d8b4    d1       d4      ARABIC LETTER SHEEN
ص      S        S        .s            1589     0635     d8b5    d2       d5      ARABIC LETTER SAD
ض      D        D        .d            1590     0636     d8b6    d3       d6      ARABIC LETTER DAD
ط      T        T        .t            1591     0637     d8b7    d4       d8      ARABIC LETTER TAH
ظ      Z        Z        .z            1592     0638     d8b8    d5       d9      ARABIC LETTER ZAH
ع      E        E                      1593     0639     d8b9    d6       da      ARABIC LETTER AIN
غ      G        g        .g            1594     063a     d8ba    d7       db      ARABIC LETTER GHAIN
ف      f        f        f             1601     0641     d981    d8       dd      ARABIC LETTER FEH
ق      q        q        q             1602     0642     d982    d9       de      ARABIC LETTER QAF
ك      K        k        k             1603     0643     d983    fd       df      ARABIC LETTER KAF
گ      g        G        g             1711     06af     daaf    db       90      ARABIC LETTER GAF
ل      l        l        l             1604     0644     d984    dc       e1      ARABIC LETTER LAM
م      m        m        m             1605     0645     d985    dd       e3      ARABIC LETTER MEEM
ن      n        n        n             1606     0646     d986    de       e4      ARABIC LETTER NOON
و      u        w        U             1608     0648     d988    df       e6      ARABIC LETTER WAW
ه      h        h        h             1607     0647     d987    e0       e5      ARABIC LETTER HEH
ي      y        y        I             1610     064a     d98a    fe       ed      ARABIC LETTER YEH
َ      a        a        a             1614     064e     d98e    f0       f3      ARABIC FATHA
ُ      o        u        o             1615     064f     d98f    f2       f5      ARABIC DAMMA
ِ      e        i        e             1616     0650     d990    f1       f6      ARABIC KASRA
آ      ]        |        'A            1570     0622     d8a2    c0       c2      ARABIC LETTER ALEF WITH MADDA ABOVE
ا      |        A        a             1575     0627     d8a7    c1       c7      ARABIC LETTER ALEF # Initial
ة      P        p        T             1577     0629     d8a9    fc       c9      ARABIC LETTER TEH MARBUTA
ک      k        k        k             1705     06a9     daa9    da       98      ARABIC LETTER KEHEH
ی      i        i        I             1740     06cc     db8c    e1               ARABIC LETTER FARSI YEH
ء      M                 |             1569     0621     d8a1    c2       c1      ARABIC LETTER HAMZA
ۀ      X                 H-i           1728     06c0     db80             c0      ARABIC LETTER HEH WITH YEH ABOVE
ئ      I        }        'y            1574     0626     d8a6    fb       c6      ARABIC LETTER YEH WITH HAMZA ABOVE
ؤ      U        &        U'            1572     0624     d8a4    fa       c4      ARABIC LETTER WAW WITH HAMZA ABOVE
ً      N        F        aN            1611     064b     d98b    f3       f0      ARABIC FATHATAN
ّ                        xx            1617     0651     d991    f6       f8      ARABIC SHADDA
،      ,        ,        ,             1548     060c     d88c    ac       a1      ARABIC COMMA
؛      ;        ;        ;             1563     061b     d89b    bb       ba      ARABIC SEMICOLON
؟      ?        ?        ?             1567     061f     d89f    bf       bf      ARABIC QUESTION MARK
٪      %        %        %             1642     066a     d9aa    a5               ARABIC PERCENT SIGN
                                       0032     0020     20      a0       20      SPACE
.      .        .        .             0046     002e     2e      a6       2e      FULL STOP
                         \\            0010     000a     0a      0a       0a      LINEFEED
«      {                 \lq           0171     00ab     ab      e7       ab      LEFT-POINTING DOUBLE ANGLE …
»      }                 \rq           0187     00bb     bb      e6       bb      RIGHT-POINTING DOUBLE ANGLE …
       -                 \hspace{0ex}  8204     200c     e2808c  a1       9d      ZERO WIDTH NON-JOINER
Table 12: Jon Dehdari’s Pre-Unicode Character Set Mappings
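Several columns of Table 12 are different encodings of the same character and can be derived from one another. The Python sketch below shows the relationship for the Uni-Dec, Uni-Hex, UTF-8 and CP1256 columns. The function name encoding_columns is hypothetical. The Isi3342 column is omitted because Python ships no codec for it, and a character absent from CP1256 raises UnicodeEncodeError.

```python
def encoding_columns(ch):
    """Compute the Uni-Dec, Uni-Hex, UTF-8 and CP1256 columns of
    Table 12 for a single character."""
    return {
        "Uni-Dec": ord(ch),
        "Uni-Hex": f"{ord(ch):04x}",
        "UTF-8": ch.encode("utf-8").hex(),
        "CP1256": ch.encode("cp1256").hex(),
    }


# ARABIC LETTER ALEF, the first row of Table 12.
print(encoding_columns("\u0627"))
```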

Because of the widespread adoption of Unicode, this previous work is now largely obsolete.

Most previous transliteration work (Lagally’s ArabTeX, Buckwalter, Dehdari, ...) consists mainly of single-character mappings.

The farsi-transliterate-banan input method documented here differs distinctly from these earlier transliteration methods in its wide use of compositions in general, and of the “h” postfix composition in particular.
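The difference between a single-character mapping and a composition with an “h” postfix can be sketched in a few lines of Python. The entries in SINGLE and COMPOSED below are illustrative assumptions, not copied from the tables of this document, and the greedy longest-match loop is one possible way to resolve compositions rather than a description of the Emacs implementation.

```python
# Illustrative entries only (assumed for this sketch).
SINGLE = {"s": "\u0633", "z": "\u0632", "k": "\u06a9"}       # seen, zain, keheh
COMPOSED = {"sh": "\u0634", "zh": "\u0698", "kh": "\u062e"}  # sheen, jeh, khah


def transliterate(keys):
    """Greedy longest-match: try the longest key sequence first, so
    that a letter followed by 'h' composes into a single character."""
    table = {**SINGLE, **COMPOSED}
    longest = max(len(k) for k in table)
    out = []
    i = 0
    while i < len(keys):
        for size in range(longest, 0, -1):
            chunk = keys[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += len(chunk)
                break
        else:
            # Keys with no entry are copied through unchanged.
            out.append(keys[i])
            i += 1
    return "".join(out)


print(transliterate("sh"), transliterate("s"))
```

A pure single-character mapping would need a separate Latin key for every Persian letter, which is why the earlier schemes in Table 12 resort to symbols and unusual capitals.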

9  Colophon

This document was produced entirely with Libre-Halaal Software, and is published using Libre-Halaal Internet Services. All tools used to produce and distribute this document conform fully to the definition of Libre-Halaal Software and Libre-Halaal Internet Application Services as specified in [3] and [5].

9.1  Our Libre-Halaal Software Tools

This document has been created based exclusively on the use of Libre-Halaal software tools. We make use of a comprehensive and well-integrated set of tools, including:

  • Debian GNU/Linux is our base platform
  • Emacs is our editor-based user environment
  • TeX, LaTeX, XeTeX and XeLaTeX are our document processors
  • The Emacs bidi (bidirectional) capability is used to write in mixed Persian and Globish
  • The xepersian LaTeX package is used to process Persian documents
  • The LaTeX beamer package is used to prepare presentation slides
  • The Emacs auctex mode is used to create documents in LaTeX
  • Aspell via Emacs is used for spell checking in Persian/Farsi and Globish/English
  • Dict via Emacs is used for dictionary and thesaurus lookup in multiple languages
  • Conversion from LaTeX to HTML is accomplished through HeVeA and tex4ht
  • LibreOffice is used for creating figures and illustrations
  • CVS via Emacs is used for version control
  • The Emacs Gnus and qmail facilities are used for emailing out drafts and receiving feedback
  • Integration with ByStar Services is through BLEE (the ByStar Libre Emacs Environment)

These Libre-Halaal software tools collectively represent a deeply integrated environment that is far superior in capability to any Haraam software. We question why so many people continue to use the clumsy and ineffective Microsoft Proprietary-Haraam software when such a vastly superior alternative is available.

9.2  Our Libre-Halaal Internet Services

The publication and distribution of this document has been accomplished exclusively by means of Libre-Halaal Internet Application Services. We make use of a comprehensive and well-integrated set of services, including:

  • The ByName Autonomous Libre Service (part of the By* family) is used for autonomous web publication of this document by the author himself
  • The ByContent Federated Libre Service (part of the By* family) is used for web re-publication/distribution of this document
  • All By* Services are based on the Debian GNU/Linux platform
  • Apache2 and Plone3 are used to provide By* Web Services
  • All By* Services related to this document are hosted at LibreCenter.net, a physical data center built exclusively with Halaal software. All routers, servers and other hardware infrastructure at LibreCenter.net run Halaal Software exclusively.
  • The By* Self Publication Facilities, fully integrated with BLEE, are used for publication of this document
  • The By* Library Facilities are used for managing this document in the context of multiple other related documents

These Libre-Halaal Internet Services are comparable in capability to the most high-profile Haraam Internet Services presently available, such as Google or Facebook.

The deep integration between Libre-Halaal Software and Libre-Halaal Internet Services creates a Libre-Halaal Software-Service continuum, which is far superior in capability to any Proprietary-Haraam Software/Service combination.

References

[1]
Mohsen BANAN. Introducing Convivial into Globish. Permanent Libre Published Content 120037, Autonomously Self-Published, July 2011. http://mohsen.banan.1.byname.net/PLPC/120037.
[2]
Mohsen BANAN. Overview of ByStar Digital Ecosystem Concepts, Models and Offerings. Permanent Libre Published Content 180011, Autonomously Self-Published, February 2011. http://www.by-star.net/PLPC/180011.
[3]
Mohsen BANAN. Defining Halaal Manner-of-Existence of Software and Defining Halaal Internet Application Services. Permanent Libre Published Content 120041, Autonomously Self-Published, September 2012. http://mohsen.1.banan.byname.net/PLPC/120041.
[4]
Mohsen BANAN. Introducing Halaal and Haraam into Globish Based on Moral Philosophy of Abstract Halaal (معرفیِ حلال و حرام به بقیه‌یِ دنیا). Permanent Libre Published Content 120039, Autonomously Self-Published, September 2012. http://mohsen.1.banan.byname.net/120039.
[5]
Mohsen BANAN. تعريف نرم افزار حلال و تعريف خدمات اينترنتى حلال (Defining Halaal Software and Defining Halaal Internet Services, in Farsi). Permanent Libre Published Content 120035, Autonomously Self-Published, July 2013. http://mohsen.1.banan.byname.net/PLPC/120035.
[6]
Mohsen BANAN. The Libre-Halaal ByStar Digital Ecosystem: A Unified and Non-Proprietary Model for Autonomous Internet Services, a Moral Alternative to the Proprietary American Digital Ecosystem. Permanent Libre Published Content 180016, Autonomously Self-Published, June 2013. http://www.persoarabic.org/PLPC/180016.
[7]
Andrew Hammoude and Mohsen BANAN. Libre Services: A Non-Proprietary Model for Delivery of Internet Services. Permanent Libre Published Content 100101, Autonomously Self-Published, March 2006. http://www.freeprotocols.org/PLPC/100101.
[8]
Neda Communications, Inc. BLEE and bxgnome: ByStar Software-Service Continuum Based Convivial User Environments. Permanent Libre Published Content 180004, Autonomously Self-Published, September 2012. http://www.persoarabic.org/PLPC/180004.



This web site has been created based exclusively on the use of Halaal Software and Halaal Internet Application Services. It is part of the By* Federation of Autonomous Libre Services which in turn are part of the Halaal/Libre By* Digitial Ecosystem which incorporate the following software components: