Guessing game with languages, keyboards and timezones

For the last three years, I’ve been working as a developer of the Anaconda installer, the installer of Fedora, RHEL and other related GNU/Linux distributions. Among other things, I’ve been working on language, keyboard and timezone configuration in the installation process. Since there have been maybe 20 bug reports and complaints about how pre-selection of these parameters works, I’d like to explain the background of that process a bit.

The very first thing the installer has to provide is language selection so that everything else may be localized based on the user’s choice. Then timezone and keyboard configuration follow, in the case of the Anaconda installer as two screens (spokes) reachable from the summary screen (hub). To preselect the language (or more precisely the locale) the installer may use GeoIP information if it’s available. So we can preselect the most likely locale and let the user decide whether it is the right one or whether it should be changed to something else. Easy, isn’t it? But what’s next? What is the right keyboard and timezone?

At first glance it may seem easy. The user has chosen the “Czech” language, so we simply preset the “Czech” keyboard layout (cz) and “Europe/Prague” as the timezone. No problem, right? But what about other languages? E.g. there is the “French” keyboard layout (fr), but you know what? French people don’t use it very often; the ‘fr (oss)’ layout is the one that is usually used. How do we find out something like that? And how do we recognize “Greek” and “Greek, modern 1453-” as the same language (the former being the official language name, the latter the one used by the X server configuration files and ISO codes)? That’s why the langtable project [1] was born: to provide this kind of information and help not only Anaconda with preselecting sane defaults. Still, what about non-ASCII keyboard layouts like “Russian” (ru)? Should the “English” (us) layout be configured as well? And should “ru” or “us” be the default? What should be the default switching option?
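To make the name-matching problem mentioned above a bit more concrete, here is a minimal Python sketch of how two spellings of a language name can be mapped to one code. This is not how langtable works internally; the alias table and the function names are made up for illustration, and only the ISO 639-1 codes (el, cs) are real.

```python
import re
from typing import Optional

# Hypothetical alias table, written by hand for this example. The keys are
# language names in normalized form, the values are ISO 639-1 codes.
LANGUAGE_ALIASES = {
    "greek": "el",
    "greek modern 1453": "el",
    "czech": "cs",
}


def normalize_name(name: str) -> str:
    """Lowercase the name and collapse punctuation and spaces into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def language_code(name: str) -> Optional[str]:
    """Return the language code for a known name, or None if it is unknown."""
    return LANGUAGE_ALIASES.get(normalize_name(name))
```

With this, language_code("Greek") and language_code("Greek, modern 1453-") both give "el", while an unknown name gives None. The hard part is of course not the lookup but collecting and maintaining the data.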

The situation with timezones is even more complicated. Formerly, users were asked to choose from languages, not locales. If the user chose “French” as the language, which timezone should have been preselected? “Europe/Paris”, “Europe/Zurich” or “America/Montreal”? Things got much better once the user could choose the locale, thus one of “French (France)”, “French (Switzerland)”, “French (Canada)”, etc. Still, is the chosen language the best source of information for timezone preselection? If the user chooses the “Czech” language, it’s probably sane to expect they will type in Czech and thus use the “cz” keyboard layout even if GeoIP tells us they are e.g. in Japan. But in the case of timezone preselection, isn’t the GeoIP information a better hint? Of course, if there is no GeoIP information (no network), the chosen language is the only hint the installer has.

Wait, what about doing timezone preselection based on the chosen keyboard layout? One of the bug reports [2] says the following:

Anaconda deduces timezone from language.  Should use keyboard instead.  Many people, including me, whose native language is not English, prefer to use English as main distro language instead of dealing with inaccurate or bad translations along with glitschs when displaying non-AsII characters.  So if user selects US English as primary language it does not mean he lives in America.  The real clue is his keyboard.  Even for laptops: if he selects french keybord, it means the laptop has been bought in France and there is a 99,9% chance user lives in France.  So Anaconda should propose France's timezone not US East-Coast timezone.

Maybe this looks like a good idea. However, there are many wrong assumptions hidden in it. First of all, if the user selects the French keyboard, are they from France, Switzerland, Canada or some other French-speaking country? There is no clue. The other issue is the fact that keyboard selection is done in a spoke reachable from the summary hub. Thus timezone preselection based on the chosen keyboard would cause hidden changes in the configuration that the user would hardly notice. Let’s say the user visits the summary hub and sees e.g. the “America/Montreal” timezone in the Date&Time spoke’s status, then visits the Keyboard spoke and selects the “Czech” keyboard layout. If the timezone changed to “Europe/Prague” behind the scenes at that moment, the user would hardly notice the change in the Date&Time spoke’s status and would probably end up with the wrong configuration of the installed system.

So this is how the Anaconda installer plays the guessing game with languages, keyboards and timezones. To sum it up, hints are processed for pre-selections as follows:

  • language: GeoIP
  • keyboard: langtable using the selected language
  • timezone: GeoIP; if that is not available, langtable using the selected language
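
The order of hints above can be sketched in a few lines of Python. This is only an illustration, not Anaconda’s actual code: the lookup tables are hypothetical sample data standing in for a GeoIP service and langtable, and the function names and the en_US fallback are assumptions made for this example.

```python
from typing import Optional

# Hypothetical sample data for illustration only. The real installer asks a
# GeoIP service and langtable instead of using hard-coded tables.
LOCALE_BY_COUNTRY = {"CZ": "cs_CZ", "FR": "fr_FR", "US": "en_US"}
KEYBOARD_BY_LOCALE = {"cs_CZ": "cz", "fr_FR": "fr (oss)", "en_US": "us"}
TIMEZONE_BY_LOCALE = {
    "cs_CZ": "Europe/Prague",
    "fr_FR": "Europe/Paris",
    "fr_CH": "Europe/Zurich",
    "fr_CA": "America/Montreal",
}
FALLBACK_LOCALE = "en_US"  # assumed default when GeoIP gives no hint


def preselect_locale(geoip_country: Optional[str]) -> str:
    """Language: use the GeoIP country if known, else the assumed fallback."""
    return LOCALE_BY_COUNTRY.get(geoip_country or "", FALLBACK_LOCALE)


def preselect_keyboard(locale: str) -> Optional[str]:
    """Keyboard: derived from the selected locale only, never from GeoIP."""
    return KEYBOARD_BY_LOCALE.get(locale)


def preselect_timezone(locale: str, geoip_timezone: Optional[str]) -> Optional[str]:
    """Timezone: prefer GeoIP, fall back to the selected locale."""
    if geoip_timezone:
        return geoip_timezone
    return TIMEZONE_BY_LOCALE.get(locale)
```

For a user who selects Czech while sitting in Japan, preselect_keyboard("cs_CZ") gives "cz" and preselect_timezone("cs_CZ", "Asia/Tokyo") gives "Asia/Tokyo". Without network access, preselect_timezone("cs_CZ", None) falls back to "Europe/Prague".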

Have an idea how to improve this process? Let me know in the comments! I’m truly open to changing the implementation to work better. But don’t forget that the Anaconda installer is not used only by you and by people from your country who use your keyboard layout and live in your timezone.


4 thoughts on “Guessing game with languages, keyboards and timezones”

  1. Actually the Canadian French keyboard has, except for diacritics, more in common with the British, Canadian and US keyboards than with the French keyboard. The Belgian layout is identical to the French one for alphanumerics, but some non-alphanumerics are “out of place”. One of them is @. With Spanish you have about the same thing: the Spanish layout is different from, say, the Mexican layout, so if a user selects Spanish keyboard

    Since it is not uncommon for IT professionals (see below for why it is a good idea) to select US English as the distribution language even if they are not native English speakers, perhaps the rule should be “if the user selects US English as the language, then don’t trust it, look at the keyboard”.

    BTW: I will tell you a personal anecdote about languages on Linux. The only time since I began using Linux in 1994 that I didn’t choose US English as the language was several years ago, on a machine I shared with another guy. This machine was supposed to run Oracle. He would be the DBA. Since I was not sure about his ability to read English, I selected French as the distribution language. Then I installed Oracle and… the installer went into an infinite loop. I retried a couple of times, same thing. I triple-checked my system config, all was right. I googled the problem and everyone was reporting success running that very same version of Oracle on that very same version of RHEL. I began wondering what was different between me and the others. Then suddenly there was light in my mind: I set the language-related environment variables to American values, reran the Oracle installer and it worked.

    1. It’s not at all easy to set up the heuristics right, and with basically any change somebody starts complaining about something. I myself use English as the system language and English + Czech (qwerty) as keyboard layouts. Since I install my systems in the Czech Republic, the GeoIP-suggested timezone is the best choice for me. Anyway, I think that operating system installation is not something one does so often that two more clicks to correct the presets would be a big deal. :-)

      Nice story, I’m glad those times are long gone. I believe Anaconda is now really good at dealing with languages, keyboards and timezones. And some more things are on the horizon. Stay tuned!

  2. I always choose English for the system language to stay international when searching for errors and US keyboard for system tasks. And I have a laptop which I move between countries. I wonder how anyone wants to guess my timezone from that.

    1. By the way, Gentoo is now forcing English for some of its system-level CLI tools, exactly because it’s an internationally used distribution and the developers want to have bug report attachments in English. Few of them would understand an error message in Czech.
