Let's type Japanese
SCIM-anthy: getting started


What makes things complicated

There are three types of characters in Japanese: Kanji (漢字), Hiragana (ひらがな) and Katakana (カタカナ). As you can imagine, Kanji were borrowed from Chinese, and each character carries one or more meanings of its own. Hiragana and Katakana both developed from Manyogana (万葉仮名), which came into use in the fifth century to represent sounds. We also use Romaji (ローマ字), i.e. the Latin alphabet, for various purposes.

Hiragana alone has more than fifty letters, and there are about 2,000 Kanji characters authorized by the Ministry of Education for common use. It is possible to write Japanese using only Hiragana, but such sentences would be very difficult to read, partly because Japanese does not put spaces between words. So, in order to write easy-to-understand Japanese on computers, we need a special input method, which usually converts Romaji into Hiragana and then into Kanji or Katakana when necessary. This page explains how to do it using scim-1.4.2 and scim-anthy-0.6.1 in Mandriva Linux 2006 (due on September 15th).

For scim-anthy-0.3.1 in Mandriva Linux LE2005, please refer to the older version of this page.

Contents

  1. Preparations
    1. Installation
    2. Environmental Variables
    3. SCIM Global Setup
  2. Let's type Japanese
    1. How to write Hiragana
    2. How to write Katakana
    3. Convert Hiragana into Kanji
    4. Convert a sentence
  3. Customize scim-anthy
    1. Common Options
    2. Key Bindings
    3. Romaji Typing
    4. Learning
    5. Candidate Window
    6. Toolbar
    7. Appearance
  4. Kasumi - Dictionary management tool
  5. Tomoe - Handwriting recognition
  6. Links

1. Preparations

1. Installation

First off, you need to install the following packages on your system.

[Note]Note
If you choose Japanese for your language when you install/upgrade the distribution, the first three packages and scim-input-pad will be installed by default. If you choose KDE, scim-qtimm will be installed as well. Kasumi is available from contrib media.

[Warning]Warning
If you have ~/.scim created by an older version of scim, it is recommended to delete it and reconfigure scim/scim-anthy from scratch.

2. Environmental Variables

As with other input methods, you need to set some environmental variables to make SCIM work. Adding the following four lines to ~/.i18n or /etc/sysconfig/i18n makes SCIM start with X and lets you activate it in any application simply by pressing Ctrl+Space, whether the application uses the GTK immodule, the Qt immodule or the XIM (X Input Method) protocol.

GTK_IM_MODULE=scim
QT_IM_MODULE=scim
XIM_PROGRAM="scim -d"
XMODIFIERS=@im=SCIM
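
If you prefer to script this step, here is a minimal Python sketch that appends whichever of the four lines are still missing from an i18n file. The helper name ensure_scim_env is made up for this page, and the sketch assumes the file is a plain list of KEY=value lines; it does not touch a variable that is already set to a different value.

```python
from pathlib import Path

# The four settings quoted above, exactly as they appear in the file.
SCIM_ENV_LINES = [
    "GTK_IM_MODULE=scim",
    "QT_IM_MODULE=scim",
    'XIM_PROGRAM="scim -d"',
    "XMODIFIERS=@im=SCIM",
]


def ensure_scim_env(path):
    """Append the SCIM settings missing from `path`; return the lines added.

    Hypothetical helper: only exact line matches count as "already present".
    """
    target = Path(path).expanduser()
    existing = target.read_text().splitlines() if target.exists() else []
    present = {line.strip() for line in existing}
    missing = [line for line in SCIM_ENV_LINES if line not in present]
    if missing:
        # Keep the existing lines untouched and add the new ones at the end.
        target.write_text("\n".join(existing + missing) + "\n")
    return missing
```

Calling ensure_scim_env("~/.i18n") a second time returns an empty list, since nothing is left to add.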

[Note]Note
If you select Japanese as the language or one of the SCIM input methods in LocaleDrake (User), those lines will be automatically added to ~/.i18n. LocaleDrake (System) will add them to /etc/sysconfig/i18n, which will be applied to the whole system unless overridden by ~/.i18n. The latter will also install required packages for the selected input method if they are not present on your system.

[Warning]Warning
In order to use SCIM input methods in Qt/KDE applications without scim-qtimm being installed, QT_IM_MODULE should be set to xim in ~/.i18n or /etc/sysconfig/i18n.

In applications in which SCIM uses the GTK/Qt immodule, you can use all input methods regardless of your locale. In applications in which SCIM uses the XIM protocol, such as OpenOffice.org, however, input methods are filtered by LC_CTYPE. In other words, if LC_CTYPE is set to en_US, you cannot use Japanese input methods in such applications. To make scim-anthy available in all applications in non-Japanese environments, a UTF-8 locale is required.

[Tip]Tip
To set up an account with French user interface with the ability to type Japanese, run LocaleDrake (User) from the menu, select French as the language, tick 'Use Unicode by default' in the advanced mode, go to the next step and select your country and SCIM+ANTHY as the input method.

[Note]Note

SCIM supports en_US.UTF-8 by default. If you use another UTF-8 locale such as fr_FR.UTF-8, you need to add it to /etc/scim/global or ~/.scim/global as follows.

/SupportedUnicodeLocales = en_US.UTF-8,fr_FR.UTF-8
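
The same edit can be sketched in Python. The function below takes the text of the global file and returns it with the locale added to the list; it assumes one "key = value" pair per line, as in the example above, and the name add_unicode_locale is hypothetical.

```python
KEY = "/SupportedUnicodeLocales"


def add_unicode_locale(config_text, locale):
    """Return `config_text` with `locale` listed under /SupportedUnicodeLocales."""
    lines = config_text.splitlines()
    for i, line in enumerate(lines):
        key, sep, value = line.partition("=")
        if sep and key.strip() == KEY:
            # Existing entry: split the comma-separated list of locales.
            locales = [item.strip() for item in value.split(",") if item.strip()]
            break
    else:
        # No entry yet: start from the default locale mentioned above.
        i = len(lines)
        lines.append("")
        locales = ["en_US.UTF-8"]
    if locale not in locales:
        locales.append(locale)
    lines[i] = KEY + " = " + ",".join(locales)
    return "\n".join(lines) + "\n"
```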

Ready. When you start a new session, you will see a keyboard icon in your systray.

3. SCIM Global Setup

Before starting to type Japanese, you might want to configure some global options in the SCIM setup panel.

[Tip]Tip

In applications in which SCIM uses the XIM protocol, however, you still need to activate SCIM to use multi/dead keys. From what I was told, this is because Dynamic Event Flow is enabled for the XIM protocol to satisfy some applications which do not support Static Event Flow. As a result, key events are not sent from clients to SCIM when it is not activated. It is not configurable in the setup panel of scim-1.4.2 but you can disable it by editing ~/.scim/config as follows, which will enable you to use multi/dead keys without activating SCIM.

/FrontEnd/X11/Dynamic = false

In order to make all changes take effect, you may need to start a new session.


2. Let's type Japanese

1. How to write Hiragana

Open GEdit and press Ctrl+Space. It activates SCIM and you will see the SCIM toolbar at the bottom right corner of your screen. If you have other SCIM input methods installed, left-click on the input method label and select Japanese -> Anthy from the menu. scim-anthy starts in the Hiragana input mode by default and the toolbar looks like this.

Hiragana Mode

There are five vowels in Japanese: あいうえお. In the Hiragana input mode, 'aiueo' turns into 'あいうえお' as you type. All other Hiragana letters are generated by combining one or more consonants with one of the five vowels. 'kakikukeko' turns into 'かきくけこ', 'sashisuseso' turns into 'さしすせそ', and so on.

Hiragana

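'ん' is exceptional. A single 'n' turns into 'ん' when followed by a consonant. For example, 'genki' turns into 'げんき (doing well)'. When 'ん' comes before 'n' or a vowel, however, a single 'n' would be read as the start of the next syllable, so you need to type 'nn'. To write 'こんにちは (hello)', for example, type 'konnnichiha' or 'konnnitiha'.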

The small tsu (っ) is defined as 'xtsu', 'xtu', 'ltsu' and 'ltu' in the default Romaji table of scim-anthy, but you can usually get it more easily by doubling the following consonant as in 'nipponn', which turns into 'にっぽん'.
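
To make these rules concrete, here is a small Python sketch of romaji-to-Hiragana conversion. It is not scim-anthy's code and the table is only a subset of the real one; the names ROWS, TABLE and to_hiragana are made up for illustration. It covers the vowels, the consonant + vowel syllables, the two rules for 'ん' and the doubled consonant for 'っ'.

```python
# Consonant rows of the kana chart; each row is paired with "aiueo".
ROWS = {
    "": "あいうえお", "k": "かきくけこ", "s": "さしすせそ", "t": "たちつてと",
    "n": "なにぬねの", "h": "はひふへほ", "m": "まみむめも", "r": "らりるれろ",
    "g": "がぎぐげご", "z": "ざじずぜぞ", "d": "だぢづでど", "b": "ばびぶべぼ",
    "p": "ぱぴぷぺぽ",
}
TABLE = {c + v: kana for c, row in ROWS.items() for v, kana in zip("aiueo", row)}
# A few extra spellings, plus the hyphen as the prolonged sound mark.
TABLE.update({"shi": "し", "chi": "ち", "tsu": "つ", "fu": "ふ", "ji": "じ",
              "ya": "や", "yu": "ゆ", "yo": "よ", "wa": "わ", "wo": "を",
              "-": "ー"})
CONSONANTS = "bcdfghjklmpqrstvwxyz"  # 'n' is handled separately


def to_hiragana(romaji):
    out, i = [], 0
    while i < len(romaji):
        # Try the longest table entry first ('shi' before 's' + 'hi').
        for size in (3, 2, 1):
            chunk = romaji[i:i + size]
            if chunk in TABLE:
                out.append(TABLE[chunk])
                i += len(chunk)
                break
        else:
            cur, nxt = romaji[i], romaji[i + 1:i + 2]
            if cur == "n" and nxt == "n":
                out.append("ん")          # 'nn' always gives ん
                i += 2
            elif cur == "n" and nxt and nxt not in "aiueoy":
                out.append("ん")          # single 'n' before a consonant
                i += 1
            elif cur == nxt and cur in CONSONANTS:
                out.append("っ")          # doubled consonant gives small tsu
                i += 1
            else:
                out.append(cur)           # leave anything else as typed
                i += 1
    return "".join(out)
```

With this sketch, to_hiragana('genki') gives 'げんき', to_hiragana('konnnichiha') gives 'こんにちは' and to_hiragana('nipponn') gives 'にっぽん'.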

For more details about romaji-kana conversion, please refer to this page.

Enter or Ctrl+j commits typed characters and the underline disappears. Esc or Ctrl+g cancels it.

When you need to correct part of the underlined string (preedit string), you can move the caret using Left, Right, Home and End.

As in normal texts, Delete deletes a character on the right; BackSpace deletes a character on the left.

2. How to write Katakana

For words borrowed from other languages, we usually use Katakana. Type 'aisukuri-mu' in the Hiragana input mode, then press F7. It will be converted into 'アイスクリーム (ice cream)'.

Katakana1

Katakana2

Press Enter or Ctrl+j to commit the word, or simply start typing the next word. Esc or Ctrl+g reverts it to Hiragana.

As you see, a hyphen turns into a prolonged sound mark.

Similarly, F8 converts a string into half-width characters (half-width Katakana or regular Latin letters, depending on the previous state), Shift+F8 into half-width Katakana, F9 into wide Latin letters, F10 into regular Latin letters and F6 into Hiragana. Pressing F9 or F10 repeatedly also cycles through letter case: home > HOME > Home.
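
For the curious, some of these conversions are simple shifts in Unicode. The Python sketch below shows the idea for F7 (Hiragana to Katakana), F9 (wide Latin) and the case cycle. The function names are made up, this is not how scim-anthy is implemented, and the step from 'Home' back to 'home' is my assumption since the sequence above stops at 'Home'.

```python
def to_katakana(text):
    """F7-style: Hiragana (U+3041..U+3096) sits 0x60 below Katakana."""
    return "".join(
        chr(ord(ch) + 0x60) if "\u3041" <= ch <= "\u3096" else ch for ch in text
    )


def to_wide_latin(text):
    """F9-style: printable ASCII maps to its wide form 0xFEE0 higher."""
    return "".join(
        chr(ord(ch) + 0xFEE0) if "!" <= ch <= "~" else ch for ch in text
    )


def next_case(word):
    """Case cycle on repeated presses: home > HOME > Home (> home, assumed)."""
    if word.islower():
        return word.upper()
    if word.isupper():
        return word.capitalize()
    return word.lower()
```

The prolonged sound mark 'ー' is outside the Hiragana range, so to_katakana leaves it as it is.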

You can also get 'アイスクリーム' simply by pressing Space after typing 'aisukuri-mu' since most common loan words are included in Anthy.

When you want to type all words in Katakana, click on the 'あ' on the toolbar and select Katakana from the menu.

Katakana Mode

In the Katakana input mode, 'tachitsuteto' turns into 'タチツテト' instead of 'たちつてと' as you type.

You can change input modes more easily using keyboard shortcuts. Ctrl+period cycles through Japanese input modes i.e. Hiragana, Katakana and Half-width Katakana. Ctrl+comma or Ctrl+j switches between Japanese and Latin input modes.

3. Convert Hiragana into Kanji

There are lots of homonyms in Japanese, so you sometimes need to select the right word from multiple candidates. Type 'kikai' for example and press Space. It will turn into the first candidate.

Kanji1

Kanji2

If that is the one you expected, press Enter or Ctrl+j to commit it, or simply start typing the next word. Otherwise, press Space again to see other candidates in the lookup table.

Kanji3

Space, Ctrl+n or Down selects the next candidate; Ctrl+p or Up selects the previous candidate. You can also select a candidate directly by the number shown in the lookup table. (For your information, the five candidates mean machine, opportunity, grotesqueness, appliance/instrument and community of Go/Shogi players, respectively.)

PageDown shows the next page of candidates and PageUp shows the previous page. You can jump to the last candidate of the last page with End and to the first candidate of the first page with Home.
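
The navigation keys above can be modeled with a small Python class. This is only a sketch: the class and method names are hypothetical, the wrap-around from the last candidate to the first is my assumption, and the Kanji in the usage note are the words I would expect for the five meanings listed above.

```python
class LookupTable:
    """Toy model of the candidate window and its navigation keys."""

    def __init__(self, candidates, page_size=10):
        self.candidates = list(candidates)
        self.page_size = page_size
        self.index = 0

    @property
    def current(self):
        return self.candidates[self.index]

    def _page_start(self):
        return self.index - self.index % self.page_size

    def page(self):
        """Candidates shown on the page that holds the current one."""
        start = self._page_start()
        return self.candidates[start:start + self.page_size]

    def next(self):       # Space, Ctrl+n, Down
        self.index = (self.index + 1) % len(self.candidates)

    def previous(self):   # Ctrl+p, Up
        self.index = (self.index - 1) % len(self.candidates)

    def page_down(self):  # PageDown
        self.index = min(self.index + self.page_size, len(self.candidates) - 1)

    def page_up(self):    # PageUp
        self.index = max(self.index - self.page_size, 0)

    def first(self):      # Home
        self.index = 0

    def last(self):       # End
        self.index = len(self.candidates) - 1

    def select(self, number):
        """Pick a candidate by the 1-based number shown on the current page."""
        if 1 <= number <= len(self.page()):
            self.index = self._page_start() + number - 1
```

For example, after table = LookupTable(['機械', '機会', '奇怪', '器械', '棋界']), calling table.next() moves from '機械' to '機会', and table.select(3) picks '奇怪' directly.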

Esc or Ctrl+g restores the previous state. For example, when the lookup table is open, it first closes the window, reverts the converted string to Hiragana, then cancels it.
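
In other words, Esc steps back through the states one at a time. A minimal Python sketch, with state names that I made up for illustration:

```python
# Each press of Esc (or Ctrl+g) moves one step back toward an empty buffer.
ESC_TRANSITIONS = {
    "lookup_open": "converted",  # close the lookup table
    "converted": "preedit",      # revert the converted string to Hiragana
    "preedit": "empty",          # cancel the typed characters
}


def press_esc(state):
    """Return the state after one press of Esc; 'empty' stays 'empty'."""
    return ESC_TRANSITIONS.get(state, state)
```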

4. Convert a sentence

Anthy is a kana-kanji conversion engine that can also convert a whole phrase or sentence consisting of multiple segments.

Type 'karehamisoshirugadaisukidesu. (He likes miso soup very much.)' and press Space.

Phrase1

Phrase2

If the whole string is converted as expected, press Enter or Ctrl+j to commit it, or simply start typing the next word.

When you need to change part of the converted string, select the segment you want to edit using Right or Ctrl+f and press Space.

Phrase3

Other candidates will be shown in the lookup table as above. Select the one you want to use and press Ctrl+Down, which commits the selected segment and those preceding it.

Phrase4

Enter or Ctrl+j commits the whole string no matter which segment is selected.

You can select a previous segment with Left or Ctrl+b. Home or Ctrl+a selects the first segment; End or Ctrl+e selects the last segment.

When you need to shrink or broaden a segment to adjust conversion, use Shift+Left or Ctrl+i to shrink it and Shift+Right or Ctrl+o to broaden it.
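
To picture what shrinking and broadening do, think of the preedit string as a list of segment lengths. The Python sketch below moves one character across the boundary on the right of the selected segment. It is a simplification under my own assumptions: the function names are hypothetical, the segmentation in the usage note is invented, and the real Anthy also re-analyzes the following text after each change.

```python
def resize_segment(lengths, index, delta):
    """Return new segment lengths after growing segment `index` by `delta`.

    `delta` is +1 (broaden) or -1 (shrink); the character is taken from
    or given to the segment on the right.
    """
    if delta not in (1, -1):
        raise ValueError("delta must be +1 or -1")
    new = list(lengths)
    if new[index] + delta < 1:
        return new                 # a segment keeps at least one character
    if index == len(new) - 1:
        if delta > 0:
            return new             # nothing on the right to absorb
        new.append(0)              # shrinking the last segment opens a new one
    new[index] += delta
    new[index + 1] -= delta
    if new[index + 1] < 1:
        del new[index + 1]         # the right-hand segment was used up
    return new


def split(text, lengths):
    """Cut `text` into pieces of the given lengths."""
    pieces, pos = [], 0
    for n in lengths:
        pieces.append(text[pos:pos + n])
        pos += n
    return pieces
```

For example, split('かれはみそしるが', resize_segment([3, 5], 0, -1)) turns 'かれは / みそしるが' into 'かれ / はみそしるが'.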

A period and a comma turn into double-byte Japanese punctuation marks (。、) by default. A left bracket ([) and a right bracket (]) turn into Japanese quotation marks (「」).
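
These default substitutions amount to a small character mapping. A Python sketch (the names are made up; scim-anthy keeps its own tables for this):

```python
# Default substitutions described above.
JAPANESE_PUNCTUATION = str.maketrans({
    ".": "。",
    ",": "、",
    "[": "「",
    "]": "」",
})


def to_japanese_punctuation(text):
    """Replace ASCII punctuation with its Japanese counterpart."""
    return text.translate(JAPANESE_PUNCTUATION)
```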


3. Customize scim-anthy

1. Common Options

scim-anthy has a bunch of options which allow you to fine-tune its behaviors to suit your typing/writing habits.

Option

2. Key Bindings

scim-anthy-0.6.1 provides three themes (Default, ATOK and Microsoft IME) and you can select one from the pulldown menu. All key bindings are customizable and your modifications to one of the themes will be saved as 'User defined'.

Hotkey1

Let's suppose that you use the default theme and want to bring up a next page of candidates with Ctrl+n. First, make sure that the key bindings theme is set to Default, then select 'Candidate keys' from the pulldown menu for 'Group' and double click on 'Page down' (or select 'Page down' and press 'Choose keys...'). Then the hotkey editor opens as below.

Hotkey2

Press the square button on the right of 'Key Code', then you will be prompted to grab a key.

Hotkey3

Press n, then it will appear in the Key Code field. Tick Ctrl as the modifier, then click 'Add'.

Hotkey4

Note that Ctrl+n is already assigned to 'Next candidate' in the default theme, so you need to delete it from that action for the new binding to take effect.
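
When you customize many bindings, it helps to check that no key is assigned to two actions at once. The Python sketch below does that for a dictionary of bindings; the function name and the way keys are written ('Control+n' etc.) are assumptions for illustration, not the format of scim-anthy's configuration.

```python
from collections import defaultdict


def find_conflicts(bindings):
    """Return {key: [actions]} for every key bound to more than one action."""
    owners = defaultdict(list)
    for action, keys in bindings.items():
        for key in keys:
            owners[key].append(action)
    return {key: actions for key, actions in owners.items() if len(actions) > 1}
```

With bindings = {'Next candidate': ['space', 'Control+n', 'Down'], 'Page down': ['Page_Down', 'Control+n']}, the function reports 'Control+n' as shared by both actions.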

[Tip]Tip
If you are using a French AZERTY keyboard, it may be a good idea to customize direct selection keys like this. Numbers on the numeric pad (KP_*) and those in the upper case (Shift+*) cannot be used for candidate selection.

[Warning]Warning
If you reselect a theme from the pulldown menu and apply it, the user defined theme will be lost!

3. Romaji Typing

scim-anthy-0.6.1 provides four romaji tables (Default, ATOK, Microsoft IME and AZIK) and you can select one from the pulldown menu. You can see all definitions in each table by pressing the Customize button. As with key bindings, you can customize them and your modifications to one of the tables will be saved as 'User defined'.

Romaji_typing

I changed 'la/li/lu/le/lo' in the default table to 'ら/り/る/れ/ろ' so that I can type either 'remon' or 'lemon' to write 'レモン (lemon)'.

Romaji

I also added several symbols to my table as below. Those symbols can be input via scim-input-pad.

symbol

[Tip]Tip
You can sort sequences/results in ascending/descending order by clicking the relevant label.

[Warning]Warning
If you reselect a romaji table from the pulldown menu and apply it, the user defined table will be lost!

There are three options that you can enable/disable according to your preference.

4. Learning

With the default settings, Anthy learns from your choice of candidates whether you explicitly commit them with Enter or Ctrl+j (manual committing) or simply start typing the next word after the conversion comes out right (auto committing). If you untick the two options, Anthy stops learning and your choices no longer affect the priority of candidates, i.e. the order in which they are presented. When both options are enabled (the default), you can still tell Anthy not to learn specific words by committing them with Shift+Enter. Conversely, you can disable learning and use Shift+Enter only for the conversion results that you want Anthy to learn. You can define key bindings to commit the first/selected segment in the same manner if you need them.
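
Put differently, Shift+Enter commits with the opposite of the configured learning preference. A minimal Python sketch of that rule, with parameter names that are mine rather than scim-anthy's option names:

```python
def should_learn(commit_kind, learn_on_manual=True, learn_on_auto=True,
                 reverse=False):
    """Decide whether Anthy learns from a commit.

    commit_kind is "manual" (Enter, Ctrl+j) or "auto" (typing the next word).
    reverse=True stands for a commit with Shift+Enter, which flips the
    configured preference.
    """
    enabled = learn_on_manual if commit_kind == "manual" else learn_on_auto
    return enabled != reverse
```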

Learning

5. Candidate Window

The behavior of the candidate window (lookup table) can be configured in this panel.

Candidates_window

6. Toolbar

You can specify which buttons to put on the toolbar depending on your needs.

Toolbar

When all the options except 'Show typing method label' are enabled, the toolbar looks like this.

Toolbar

7. Appearance

You can also customize the appearance of strings in preedit/conversion although those settings are not respected in Qt/KDE and some other applications.

Appearance

Below is a screenshot after changing 'Selected segment' to 'BG color' and setting it to #87CEFA in the built-in color chooser.

BGcolor


4. Kasumi - Dictionary management tool

scim-anthy is pre-configured to use kasumi as the dictionary management tool.

Let's suppose that you have a friend named 春花 (haruka) but it is not found in Anthy's dictionary. You can get the two characters one by one i.e. '春 (haru)' and '花 (ka or hana)' but it is not fun. In order to be able to get her name as one segment, you need to add it to your private dictionary.

When scim-anthy is activated, pressing F12 launches kasumi in 'Adding mode'. Fill in her name in Kanji in the Spelling field and its pronunciation in Hiragana in the Sound field. Select 'Person's name' for the word class, adjust frequency if necessary and click 'Add'.

kasumi

[Tip]Tip
If you copy a word and then press F12, it will be automatically added to the Spelling field.

When you want to see/modify other entries in your dictionary, click 'Manage Mode'. Pressing F11 launches kasumi in this mode. Alternatively, you can start kasumi from the toolbar if the 'Show dictionary menu label' option is enabled under the Toolbar tab of the Anthy panel.


5. Tomoe - Handwriting recognition

Tomoe is a Japanese handwriting recognition engine and scim-tomoe is a module for SCIM to make use of it. The first official version of scim-tomoe was released just a few days after the version freeze of the Mandriva Linux 2006 development cycle, so it is not included in the distribution. Still, you can try out a pre-release CVS version which is available from main media or 0.1.0 from Cooker (at your own risk) when it becomes available after the opening of development for 2007. Descriptions on this page are based on the latter. :)

Open GEdit and select 'Handwriting recognition' from the SCIM menu. When the 'Use auto find' option is enabled in Preferences, candidates will appear as you start drawing a character. When disabled, candidates will be presented when you press the Find button. The Back button undoes a stroke and the Clear button erases the canvas.

tomoe

Click on a candidate, then it will be input in GEdit. There is also an option to clear the canvas when you select a candidate ('clear the canvas when select a candidate'). The three buttons under the separator (Space, BackSpace and Enter) can be used to edit text in the editor. Note that input into Qt/KDE applications is not supported yet.

[Warning]Warning
Stroke order and number of strokes count! If you draw a square in a single stroke, you will not get '口 (mouth)'. It should be written in three strokes.

Move the focus in GEdit and press Ctrl+Space, then you will see the tomoe icon on the SCIM toolbar as below. The icon will stay there until you close the application and you can show/hide the pad by clicking on it. The keyboard icon on the right is for scim-input-pad.

scim-tomoe


6. Links


Thanks to developers/contributors at SCIM, SCIM IMEngine, Anthy, Kasumi, Tomoe and Mandriva.

August 12th, 2005 by Yukiko BANDO --- Most recently updated: September 10th, 2005