| Let's type Japanese |
|---|
| SCIM-anthy: getting started |
There are three types of characters in Japanese: Kanji (漢字), Hiragana (ひらがな) and Katakana (カタカナ). As you can imagine, Kanji were borrowed from Chinese, and each character carries one or more meanings of its own. Hiragana and Katakana both developed from Manyogana (万葉仮名), which came into use in the fifth century to represent sounds. We also use Romaji (ローマ字), the Latin alphabet, for various purposes.
Hiragana alone has more than fifty letters, and there are about 2,000 Kanji characters authorized by the Ministry of Education for common use. It is possible to write Japanese using only Hiragana, but such sentences would be very difficult to read, partly because Japanese does not put spaces between words. So, in order to write easy-to-understand Japanese on computers, we need a special input method, which usually converts Romaji into Hiragana and then into Kanji or Katakana when necessary. This page explains how to do it using scim-1.4.2 and scim-anthy-0.6.1 in Mandriva Linux 2006 (due on September 15th).
For scim-anthy-0.3.1 in Mandriva Linux LE2005, please refer to the older version of this page.
1. Preparation
First off, you need to install the following packages on your system.
![]() | Note |
|---|---|
| If you choose Japanese for your language when you install/upgrade the distribution, the first three packages and scim-input-pad will be installed by default. If you choose KDE, scim-qtimm will be installed as well. Kasumi is available from contrib media. | |
![]() | Warning |
|---|---|
| If you have ~/.scim created by an older version of scim, it is recommended to delete it and reconfigure scim/scim-anthy from scratch. | |
As with other input methods, you need to set some environment variables to make SCIM work. Adding the following four lines to ~/.i18n or /etc/sysconfig/i18n will make SCIM start with X and let you activate it in various applications simply by pressing Ctrl+Space, whether the application uses GTK-immodule, Qt-immodule or the XIM (X input method) protocol.
    GTK_IM_MODULE=scim
    QT_IM_MODULE=scim
    XIM_PROGRAM="scim -d"
    XMODIFIERS=@im=SCIM
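If you want to double-check that a session actually picked up these values, a small script along the following lines may help. It is only a sketch: the function name `check_scim_env` is made up for this example, and it assumes that all four variables are exported to the session environment. XIM_PROGRAM in particular may be consumed by the startup scripts without being exported, so a mismatch there is not necessarily a problem.

```python
import os

# Expected values, copied from the four lines above.
EXPECTED = {
    "GTK_IM_MODULE": "scim",
    "QT_IM_MODULE": "scim",
    "XIM_PROGRAM": "scim -d",
    "XMODIFIERS": "@im=SCIM",
}


def check_scim_env(env):
    """Return {name: (expected, actual)} for every variable that differs.

    `env` is any mapping of variable names to values, e.g. os.environ.
    A missing variable is reported with actual=None.
    """
    problems = {}
    for name, expected in EXPECTED.items():
        actual = env.get(name)
        if actual != expected:
            problems[name] = (expected, actual)
    return problems


if __name__ == "__main__":
    for name, (expected, actual) in check_scim_env(os.environ).items():
        print(f"{name}: expected {expected!r}, found {actual!r}")
```

Run it from a terminal inside the X session. No output means that all four variables match. If you deliberately set QT_IM_MODULE to xim, that variable will of course be reported as a mismatch.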
![]() | Note |
|---|---|
| If you select Japanese as the language or one of the SCIM input methods in LocaleDrake (User), those lines will be automatically added to ~/.i18n. LocaleDrake (System) will add them to /etc/sysconfig/i18n, which will be applied to the whole system unless overridden by ~/.i18n. The latter will also install required packages for the selected input method if they are not present on your system. | |
![]() | Warning |
|---|---|
| In order to use SCIM input methods in Qt/KDE applications without scim-qtimm being installed, QT_IM_MODULE should be set to xim in ~/.i18n or /etc/sysconfig/i18n. | |
In applications in which SCIM uses the GTK/Qt immodule, you can use all input methods regardless of your locale. However, they are filtered by LC_CTYPE in applications in which SCIM uses the XIM protocol such as OpenOffice.org. In other words, if LC_CTYPE is set to en_US, you cannot use Japanese input methods in such applications. To make scim-anthy available in all applications in non-Japanese environments, a UTF-8 locale is required.
![]() | Tip |
|---|---|
| To set up an account with a French user interface and the ability to type Japanese, run LocaleDrake (User) from the menu, select French as the language, tick 'Use Unicode by default' in the advanced mode, go to the next step and select your country and SCIM+ANTHY as the input method. | |
![]() | Note | |
|---|---|---|
SCIM supports en_US.UTF-8 by default. If you use another UTF-8 locale such as fr_FR.UTF-8, you need to add it to /etc/scim/global or ~/.scim/global as follows.
| ||
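The locale rule described above can be written down as a small decision function. This is only a model of the rule as stated on this page, not code taken from SCIM: the function name, the `supported` parameter and the way locale names are normalized are all assumptions made for illustration.

```python
# UTF-8 locales that SCIM accepts out of the box, according to the note above.
DEFAULT_SUPPORTED = ("en_US.UTF-8",)


def _normalize(locale_name):
    """Split 'fr_FR.UTF-8@euro' into ('fr_FR', 'utf8')."""
    locale_name = locale_name.split("@")[0]        # drop any modifier
    name, _, codeset = locale_name.partition(".")
    return name, codeset.replace("-", "").lower()


def japanese_available_over_xim(lc_ctype, supported=DEFAULT_SUPPORTED):
    """Model of the LC_CTYPE filter for applications that talk XIM."""
    lang, codeset = _normalize(lc_ctype)
    if lang.split("_")[0] == "ja":
        return True                                # Japanese locale
    if codeset != "utf8":
        return False                               # e.g. plain en_US
    # A non-Japanese UTF-8 locale must be one that SCIM has been told about.
    return any(_normalize(s) == (lang, codeset) for s in supported)
```

For example, `japanese_available_over_xim("en_US")` is False, while `japanese_available_over_xim("fr_FR.UTF-8", DEFAULT_SUPPORTED + ("fr_FR.UTF-8",))` is True. Applications that use the GTK/Qt immodule are not affected by this filter.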
Ready. When you start a new session, you will see a keyboard icon in your systray.
Before starting to type Japanese, you might want to configure some global options in the SCIM setup panel.
Embed Preedit String into client window
If you prefer 'On The Spot', open the FrontEnd Global Setup panel and make sure that this option is enabled. If you prefer 'Over The Spot', untick the checkbox. The input style specified here will be applied to all applications as long as they support it. In either case, remember to run qtconfig and make sure that the same style is selected for 'XIM Input style' under the Interface tab.
On The Spot: Typed characters immediately appear in the client window and candidates are shown in the lookup table as below.

Over The Spot: Typed characters first appear in the input table as below and are sent to an application when committed.

Share the same input method among all applications
When this option is enabled, you can use the same input method in all applications without pressing Ctrl+Space for each of them. This feature should be very useful especially when you use Find (Ctrl+F) repeatedly on Japanese texts.
Vertical or horizontal lookup table
If you prefer the vertical lookup table as below, open the GTK panel and tick 'Vertical lookup table'. SCIM uses the horizontal lookup table by default.

Compose key support
If you use multi/dead keys, make sure that English/European is enabled in the IMEngine Global Setup panel. If you disable it, SCIM's built-in compose key support will be turned off where Qt-immodule or the XIM protocol is used and dead/multi keys will stop working due to a fault in XIM.
![]() | Tip | |
|---|---|---|
In applications in which SCIM uses the XIM protocol, however, you still need to activate SCIM to use multi/dead keys. From what I was told, this is because Dynamic Event Flow is enabled for the XIM protocol to satisfy some applications which do not support Static Event Flow. As a result, key events are not sent from clients to SCIM when it is not activated. It is not configurable in the setup panel of scim-1.4.2 but you can disable it by editing ~/.scim/config as follows, which will enable you to use multi/dead keys without activating SCIM.
| ||
In order to make all changes take effect, you may need to start a new session.
2. Let's type Japanese
Open GEdit and press Ctrl+Space. It activates SCIM and you will see the SCIM toolbar at the bottom right corner of your screen. If you have other SCIM input methods installed, left-click on the input method label and select Japanese -> Anthy from the menu. scim-anthy starts in the Hiragana input mode by default and the toolbar looks like this.

There are five vowels in Japanese: あいうえお. In the Hiragana input mode, 'aiueo' turns into 'あいうえお' as you type. All other Hiragana letters are generated by combining one or more consonants with one of the five vowels: 'kakikukeko' turns into 'かきくけこ', 'sashisuseso' turns into 'さしすせそ', and so on.

'ん' is exceptional. A single 'n' turns into 'ん' when followed by a consonant. For example, 'genki' turns into 'げんき (doing well)'. To write 'こんにちは (hello)', however, you need to type 'konnnichiha' or 'konnnitiha': the first two 'n's produce 'ん' and the third one starts 'に'.
The small tsu (っ) is defined as 'xtsu', 'xtu', 'ltsu' and 'ltu' in the default Romaji table of scim-anthy, but you can usually get it more easily by doubling the following consonant as in 'nipponn', which turns into 'にっぽん'.
For more details about romaji-kana conversion, please refer to this page.
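To make the rules above concrete, here is a minimal sketch of romaji-to-Hiragana conversion in Python. It is not how scim-anthy is implemented, and its table is a tiny hand-made subset (a few consonant rows plus the special cases mentioned above), so treat it purely as an illustration. It tries the longest match first, then applies the single-'n' rule and the doubled-consonant rule.

```python
VOWELS = "aiueo"

# A few consonant rows of the kana chart; "" is the bare vowel row.
ROWS = {
    "": "あいうえお", "k": "かきくけこ", "s": "さしすせそ",
    "t": "たちつてと", "n": "なにぬねの", "h": "はひふへほ",
    "m": "まみむめも", "r": "らりるれろ", "g": "がぎぐげご",
    "p": "ぱぴぷぺぽ",
}
TABLE = {c + v: kana for c, row in ROWS.items() for v, kana in zip(VOWELS, row)}
TABLE.update({
    "shi": "し", "chi": "ち", "tsu": "つ",   # alternative spellings
    "nn": "ん", "-": "ー",                   # 'nn' and the prolonged sound mark
    "xtsu": "っ", "xtu": "っ", "ltsu": "っ", "ltu": "っ",
})


def romaji_to_hiragana(text):
    out = []
    i = 0
    while i < len(text):
        # Longest match first, so that 'shi' wins over 's' + 'hi'.
        for length in (4, 3, 2, 1):
            chunk = text[i:i + length]
            if chunk in TABLE:
                out.append(TABLE[chunk])
                i += len(chunk)
                break
        else:
            ch, nxt = text[i], text[i + 1:i + 2]
            if ch == "n" and nxt and nxt not in VOWELS + "ny":
                out.append("ん")      # single 'n' before a consonant
            elif nxt == ch and ch not in VOWELS + "n":
                out.append("っ")      # doubled consonant gives small tsu
            else:
                out.append(ch)        # unknown or still pending: keep as typed
            i += 1
    return "".join(out)


print(romaji_to_hiragana("genki"))        # げんき
print(romaji_to_hiragana("konnnichiha"))  # こんにちは
print(romaji_to_hiragana("konnichiha"))   # こんいちは (one 'n' short)
print(romaji_to_hiragana("nipponn"))      # にっぽん
```

The third line shows why 'こんにちは' needs three 'n's: 'nn' is consumed as 'ん', so the following 'i' is left on its own.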
Enter or Ctrl+j commits the typed characters and the underline disappears. Esc or Ctrl+g cancels them.
When you need to correct part of the underlined string (preedit string), you can move the caret using Left, Right, Home and End.
As in normal texts, Delete deletes a character on the right; BackSpace deletes a character on the left.
For words borrowed from other languages, we usually use Katakana. Type 'aisukuri-mu' in the Hiragana input mode, then press F7. It will be converted into 'アイスクリーム (ice cream)'.


Press Enter or Ctrl+j to commit the word, or simply start typing the next word. Esc or Ctrl+g reverts it to Hiragana.
As you see, a hyphen turns into a prolonged sound mark.
Similarly, F8 converts a string into half-width characters (half-width Katakana or regular Latin alphabets depending on the previous state), Shift+F8 into half-width Katakana, F9 into full-width Latin alphabets, F10 into regular Latin alphabets and F6 into Hiragana. Also, pressing F9 or F10 repeatedly cycles through letter cases: home > HOME > Home.
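The Latin conversions can be pictured with a few lines of Python. This is a sketch under assumptions, not scim-anthy's code: it relies on the fact that the full-width forms of the printable ASCII characters sit at a fixed offset (0xFEE0) in Unicode, and it assumes that the case cycle wraps back to lower case after 'Home', which the text above does not state.

```python
OFFSET = 0xFEE0  # distance from '!' (U+0021) to its full-width form (U+FF01)


def to_fullwidth(text):
    """Regular Latin -> full-width Latin (roughly what F9 does)."""
    return "".join(chr(ord(c) + OFFSET) if "!" <= c <= "~" else c for c in text)


def to_halfwidth(text):
    """Full-width Latin -> regular Latin (roughly what F10 does)."""
    return "".join(
        chr(ord(c) - OFFSET) if "\uff01" <= c <= "\uff5e" else c for c in text
    )


def next_case(word):
    """One step of the cycle home > HOME > Home (> home, assumed)."""
    if word.islower():
        return word.upper()
    if word.isupper():
        return word.capitalize()
    return word.lower()


word = "home"
for _ in range(3):
    print(word, to_fullwidth(word))
    word = next_case(word)
```

The loop prints the three case variants together with their full-width forms.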
You can also get 'アイスクリーム' simply by pressing Space after typing 'aisukuri-mu' since most common loan words are included in Anthy.
When you want to type all words in Katakana, click on the 'あ' on the toolbar and select Katakana from the menu.

In the Katakana input mode, 'tachitsuteto' turns into 'タチツテト' instead of 'たちつてと' as you type.
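The relationship between the two kana sets is very regular: in Unicode, each Katakana letter sits exactly 0x60 code points after the corresponding Hiragana letter. The following sketch uses that fact to imitate what F7 and F6 do to a string. The function names are made up for this example and it says nothing about how scim-anthy itself performs the conversion.

```python
KANA_OFFSET = 0x60  # 'ア' (U+30A2) - 'あ' (U+3042)


def hiragana_to_katakana(text):
    """Shift every Hiragana letter (U+3041..U+3096) into the Katakana block."""
    return "".join(
        chr(ord(c) + KANA_OFFSET) if "ぁ" <= c <= "ゖ" else c for c in text
    )


def katakana_to_hiragana(text):
    """Shift every Katakana letter (U+30A1..U+30F6) back to Hiragana."""
    return "".join(
        chr(ord(c) - KANA_OFFSET) if "ァ" <= c <= "ヶ" else c for c in text
    )


print(hiragana_to_katakana("あいすくりーむ"))  # アイスクリーム
print(hiragana_to_katakana("たちつてと"))      # タチツテト
```

The prolonged sound mark (ー) lies outside both ranges, so it is left untouched in either direction.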
You can change input modes more easily using keyboard shortcuts. Ctrl+period cycles through Japanese input modes i.e. Hiragana, Katakana and Half-width Katakana. Ctrl+comma or Ctrl+j switches between Japanese and Latin input modes.
There are lots of homonyms in Japanese, so you sometimes need to select the right word from multiple candidates. Type 'kikai', for example, and press Space. It will turn into the first candidate.


If that is the one you expected, press Enter or Ctrl+j to commit it, or simply start typing the next word. Otherwise, press Space again to see other candidates in the lookup table.

Space, Ctrl+n or Down selects the next candidate; Ctrl+p or Up selects the previous candidate. You can also select a candidate directly by the number shown in the lookup table. (For your information, the five candidates mean machine, opportunity, grotesqueness, appliance/instrument and community of Go/Shogi players, respectively.)
PageDown shows the next page of candidates and PageUp shows the previous page. You can jump to the last candidate of the last page with End and to the first candidate of the first page with Home.
Esc or Ctrl+g restores the previous state, one step per press. For example, when the lookup table is open, the first press closes the window, the second reverts the converted string to Hiragana, and the third cancels it.
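If it helps to think of the lookup table as a data structure, the navigation keys above map onto a handful of operations on a list and a cursor. The class below is a toy model written for this page, not SCIM's lookup table. Two points are assumptions: that moving past the last candidate wraps around to the first, and that paging lands on the first candidate of the new page. The five kanji words are inferred from the meanings listed above.

```python
class LookupTable:
    """Toy model of a paged candidate list with a cursor."""

    def __init__(self, candidates, page_size=10):
        self.candidates = list(candidates)
        self.page_size = page_size
        self.index = 0                      # currently highlighted candidate

    @property
    def selected(self):
        return self.candidates[self.index]

    def _page_start(self):
        return self.index - self.index % self.page_size

    def page(self):
        start = self._page_start()
        return self.candidates[start:start + self.page_size]

    def next(self):                         # Space, Ctrl+n, Down
        self.index = (self.index + 1) % len(self.candidates)

    def previous(self):                     # Ctrl+p, Up
        self.index = (self.index - 1) % len(self.candidates)

    def page_down(self):                    # PageDown
        start = self._page_start() + self.page_size
        if start < len(self.candidates):
            self.index = start

    def page_up(self):                      # PageUp
        start = self._page_start() - self.page_size
        if start >= 0:
            self.index = start

    def first(self):                        # Home
        self.index = 0

    def last(self):                         # End
        self.index = len(self.candidates) - 1

    def select_number(self, n):             # direct selection, 1-based
        target = self._page_start() + n - 1
        if 1 <= n <= self.page_size and target < len(self.candidates):
            self.index = target


table = LookupTable(["機械", "機会", "奇怪", "器械", "棋界"], page_size=2)
table.page_down()
table.select_number(2)
print(table.selected)                       # 器械
```

A page size of 2 is used here only to make paging visible with five candidates.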
Anthy is a kana-kanji conversion engine that can also convert a whole phrase or sentence consisting of multiple segments.
Type 'karehamisoshirugadaisukidesu. (He likes miso soup very much.)' and press Space.


If the whole string is converted as expected, press Enter or Ctrl+j to commit it, or simply start typing the next word.
When you need to change part of the converted string, select the segment you want to edit using Right or Ctrl+f and press Space.

Other candidates will be shown in the lookup table as above. Select the one you want to use and press Ctrl+Down, which commits the selected segment and those preceding it.

Enter or Ctrl+j commits the whole string no matter which segment is selected.
You can select a previous segment with Left or Ctrl+b. Home or Ctrl+a selects the first segment; End or Ctrl+e selects the last segment.
When you need to shrink or broaden a segment to adjust conversion, use Shift+Left or Ctrl+i to shrink it and Shift+Right or Ctrl+o to broaden it.
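Shrinking and broadening a segment simply moves the boundary between it and the segment that follows. The sketch below models that on a list of Hiragana strings. It is a deliberate simplification: the segmentation used in the example is hypothetical, and the real engine presumably re-analyzes the text after the boundary, which this code does not attempt.

```python
def shrink(segments, i):
    """Move the last character of segment i to the front of the next one."""
    segs = list(segments)
    if len(segs[i]) <= 1:
        return segs                      # nothing left to give away
    moved, segs[i] = segs[i][-1], segs[i][:-1]
    if i + 1 < len(segs):
        segs[i + 1] = moved + segs[i + 1]
    else:
        segs.append(moved)               # last segment: open a new one
    return segs


def broaden(segments, i):
    """Pull the first character of the next segment into segment i."""
    segs = list(segments)
    if i + 1 >= len(segs):
        return segs                      # no following segment to take from
    segs[i] += segs[i + 1][0]
    segs[i + 1] = segs[i + 1][1:]
    if not segs[i + 1]:
        del segs[i + 1]                  # the neighbour was used up
    return segs


# Hypothetical segmentation of the first part of the sample sentence.
segments = ["かれは", "みそしるが"]
print(shrink(segments, 0))               # ['かれ', 'はみそしるが']
print(broaden(shrink(segments, 0), 0))   # ['かれは', 'みそしるが']
```

Both functions return a new list and leave the original untouched, which makes it easy to try a boundary and go back.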
A period and a comma turn into double-byte Japanese punctuation marks (。、) by default. A left bracket ([) and a right bracket (]) turn into Japanese quotation marks (「」).
3. Customize scim-anthy
scim-anthy has a number of options that allow you to fine-tune its behavior to suit your typing/writing habits.

Input mode: You can specify the default input mode in which scim-anthy starts.
Typing method: Romaji typing method uses romaji-kana conversion as described in 'How to write Hiragana'. scim-anthy supports Kana and Thumb-shift typing methods as well but they require a Japanese keyboard with Hiragana glyphs printed on key tops.
Conversion mode: When set to 'Multi segment', a preedit string automatically gets divided into segments when you start conversion and Anthy presents the first candidate for each of them as described in 'Convert a sentence'. When set to 'Single segment', no division occurs and Anthy regards the whole string as one segment and presents candidates for it. 'Convert as you type' is available in both Multi segment and Single segment modes. You can switch among the four modes from the toolbar if the 'Show conversion mode label' option is enabled under the Toolbar tab. Those modes are represented as 連 (Multi segment), 単 (Single segment), 逐 連 (Convert as you type - Multi segment) and 逐 単 (Convert as you type - Single segment) on the toolbar.
Style of comma and period: Four styles are available. You can switch among them from the toolbar if the 'Show period style label' option is enabled under the Toolbar tab.
Space type: When set to 'Wide', Space always puts a double-byte space regardless of the input mode; when set to 'Half', it always puts a single-byte space; when set to 'Follow input mode', it puts a double-byte space in Hiragana and Katakana modes and a single-byte space in Latin and Half-width Katakana modes. Shift+Space puts the alternative type of space. For example, when Space type is set to 'Half', Shift+Space always puts a double-byte space.
Input from ten key: As with Space type, three styles are available. This is applied to numbers on the numeric pad that are defined as KP_1, KP_2 etc.
Behavior on a comma or a period: When set to 'Start conversion', scim-anthy automatically starts conversion when you type a comma or a period. When set to 'Commit', preedit strings get committed with a comma or a period without being converted.
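Of the options above, 'Space type' has the most cases to keep in mind, so here is the rule written as a small function. It only restates the description on this page. The option and mode names are spelled as lower-case strings chosen for this sketch, not identifiers taken from scim-anthy.

```python
WIDE_SPACE = "\u3000"   # double-byte (ideographic) space
HALF_SPACE = " "        # ordinary single-byte space

# Input modes that get a wide space under 'Follow input mode'.
WIDE_MODES = {"hiragana", "katakana"}


def space_for(space_type, input_mode, shift=False):
    """Return the space character produced by Space (or Shift+Space)."""
    if space_type == "wide":
        wide = True
    elif space_type == "half":
        wide = False
    elif space_type == "follow":
        wide = input_mode in WIDE_MODES
    else:
        raise ValueError(f"unknown space type: {space_type!r}")
    if shift:
        wide = not wide                  # Shift+Space gives the other kind
    return WIDE_SPACE if wide else HALF_SPACE
```

For instance, `space_for("half", "hiragana", shift=True)` returns the double-byte space, matching the example given for 'Space type' above.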
scim-anthy-0.6.1 provides three themes (Default, ATOK and Microsoft IME) and you can select one from the pulldown menu. All key bindings are customizable and your modifications to one of the themes will be saved as 'User defined'.

Let's suppose that you use the default theme and want to bring up the next page of candidates with Ctrl+n. First, make sure that the key bindings theme is set to Default, then select 'Candidate keys' from the pulldown menu for 'Group' and double click on 'Page down' (or select 'Page down' and press 'Choose keys...'). Then the hotkey editor opens as below.

Press the square button on the right of 'Key Code', then you will be prompted to grab a key.

Press n, then it will appear in the Key Code field. Tick Ctrl as the modifier, then click 'Add'.

Note that in order to make it work for that action, you need to delete the same definition from 'Next candidate'.
![]() | Tip |
|---|---|
| If you are using a French AZERTY keyboard, it may be a good idea to customize direct selection keys like this. Numbers on the numeric pad (KP_*) and those in the upper case (Shift+*) cannot be used for candidate selection. | |
![]() | Warning |
|---|---|
| If you reselect a theme from the pulldown menu and apply it, the user defined theme will be lost! | |
scim-anthy-0.6.1 provides four romaji tables (Default, ATOK, Microsoft IME and AZIK) and you can select one from the pulldown menu. You can see all definitions in each table by pressing the Customize button. As with key bindings, you can customize them and your modifications to one of the tables will be saved as 'User defined'.

I changed 'la/li/lu/le/lo' in the default table to 'ら/り/る/れ/ろ' so that I can type either 'remon' or 'lemon' to write 'レモン (lemon)'.

I also added several symbols to my table as below. Those symbols can be input via scim-input-pad.

![]() | Tip |
|---|---|
| You can sort sequences/results in ascending/descending order by clicking the relevant label. | |
![]() | Warning |
|---|---|
| If you reselect a romaji table from the pulldown menu and apply it, the user defined table will be lost! | |
There are three options that you can enable/disable according to your preference.
Allow splitting romaji on editing preedit string: This option affects the way you can edit preedit strings. Let me give you an example. 'kyo' turns into 'きょ' in the Hiragana input mode. When splitting is enabled, Left moves the caret one character to the left, i.e. before 'ょ'; when disabled, it moves before 'き'. Likewise, BackSpace deletes only 'ょ' when enabled and the whole 'きょ' when disabled.
Use half-width characters for symbols: If this option is enabled, symbols (question mark, exclamation mark, parenthesis etc.) will always be input in half-width regardless of the input mode.
Use half-width characters for numbers: If this option is enabled, numbers on the upper row of your keyboard will always be input in half-width regardless of the input mode. If you leave this option disabled and set 'Input from ten key' to 'Half', you can use numbers on the upper row of your keyboard for full-width numbers and those on the numeric pad for half-width numbers in Hiragana and Katakana modes.
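The first of these options, 'Allow splitting romaji on editing preedit string', is easier to picture if you think of the preedit string as a list of units, each remembering the romaji it was typed from. The sketch below models the caret stops and BackSpace for both settings. The unit representation and function names are invented for this illustration and are not scim-anthy's internals.

```python
def preedit_text(units):
    """Join the kana of all (romaji, kana) units."""
    return "".join(kana for _, kana in units)


def caret_stops(units, allow_split):
    """Offsets at which the caret may rest, counted in kana characters."""
    stops, pos = [0], 0
    for _, kana in units:
        if allow_split:
            stops.extend(range(pos + 1, pos + len(kana) + 1))
        else:
            stops.append(pos + len(kana))   # unit boundaries only
        pos += len(kana)
    return stops


def backspace(units, allow_split):
    """Delete to the left of a caret placed at the end of the preedit."""
    units = list(units)
    if not units:
        return units
    _, kana = units[-1]
    if allow_split and len(kana) > 1:
        # The remaining kana no longer corresponds to the typed romaji.
        units[-1] = (None, kana[:-1])
    else:
        units.pop()
    return units


units = [("kyo", "きょ")]
print(caret_stops(units, True), preedit_text(backspace(units, True)))
print(caret_stops(units, False), preedit_text(backspace(units, False)))
```

With splitting enabled the caret can stop between 'き' and 'ょ' and BackSpace leaves 'き'; with it disabled 'きょ' behaves as a single block.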
With the default settings, Anthy learns from your selection of candidates whether you explicitly commit them with Enter or Ctrl+j (manual committing) or simply start typing the next word after getting the conversion you want (auto committing). If you untick the two options, Anthy will stop learning and your selection will not affect the priority of candidates, i.e. the order in which candidates are presented. When both options are enabled (default), you can still tell Anthy not to learn specific words by committing them with Shift+Enter. It is also possible to disable learning and use Shift+Enter only for conversion results that you want Anthy to learn. You can define key bindings to commit the first/selected segment in the same manner if you need them.

The behavior of the candidate window (lookup table) can be configured in this panel.

Show "Candidates" label: If you disable this option, the upper part of the lookup table where the total number of candidates is indicated will disappear.
Close candidate window when select a candidate directly: When this option is enabled, the lookup table closes when you select a candidate by number. When disabled, it stays open, from where you can reselect a candidate by number or using next/previous candidate keys.
Number of candidates to show in a page: If you use the horizontal lookup table, you do not always get the specified number of candidates in a page because of the maximum width of the lookup table set by scim.
Number of triggers until show: This value specifies how many times you are supposed to press Space on a segment in preedit before the lookup table appears. The descriptions in the previous chapter are based on the default setting i.e. 2.
You can specify which buttons to put on the toolbar depending on your needs.

When all the options except 'Show typing method label' are enabled, the toolbar looks like this.

You can also customize the appearance of strings in preedit/conversion although those settings are not respected in Qt/KDE and some other applications.

Below is a screenshot after changing 'Selected segment' to 'BG color' and setting it to #87CEFA in the built-in color chooser.

4. Kasumi - Dictionary management tool
scim-anthy is pre-configured to use kasumi as the dictionary management tool.
Let's suppose that you have a friend named 春花 (haruka) but her name is not found in Anthy's dictionary. You can get the two characters one by one, i.e. '春 (haru)' and '花 (ka or hana)', but that is tedious. In order to get her name as one segment, you need to add it to your private dictionary.
When scim-anthy is activated, pressing F12 launches kasumi in 'Adding mode'. Fill in her name in Kanji in the Spelling field and its pronunciation in Hiragana in the Sound field. Select 'Person's name' for the word class, adjust frequency if necessary and click 'Add'.

![]() | Tip |
|---|---|
| If you copy a word and then press F12, it will be automatically added to the Spelling field. | |
When you want to see/modify other entries in your dictionary, click 'Manage Mode'. Pressing F11 launches kasumi in this mode. Alternatively, you can start kasumi from the toolbar if the 'Show dictionary menu label' option is enabled under the Toolbar tab of the Anthy panel.
5. Tomoe - Handwriting recognition
Tomoe is a Japanese handwriting recognition engine and scim-tomoe is a module for SCIM to make use of it. The first official version of scim-tomoe was released just a few days after the version freeze of the Mandriva Linux 2006 development cycle, so it is not included in the distribution. Still, you can try out a pre-release CVS version which is available from main media or 0.1.0 from Cooker (at your own risk) when it becomes available after the opening of development for 2007. Descriptions on this page are based on the latter. :)
Open GEdit and select 'Handwriting recognition' from the SCIM menu. When the 'Use auto find' option is enabled in Preferences, candidates will appear as you start drawing a character. When disabled, candidates will be presented when you press the Find button. The Back button undoes a stroke and the Clear button erases the canvas.

Click on a candidate, then it will be input in GEdit. The 'clear the canvas when select a candidate' option is available. The three buttons under the separator (Space, BackSpace and Enter) can be used to edit text in the editor. Note that input into Qt/KDE applications is not supported yet.
![]() | Warning |
|---|---|
| Stroke order and number of strokes count! If you draw a square in a single stroke, you will not get '口 (mouth)'. It should be written in three strokes. | |
Move the focus in GEdit and press Ctrl+Space, then you will see the tomoe icon on the SCIM toolbar as below. The icon will stay there until you close the application and you can show/hide the pad by clicking on it. The keyboard icon on the right is for scim-input-pad.

6. Links
August 12th, 2005 by Yukiko BANDO --- Most recently updated: September 10th, 2005