Author Topic: Making unit text strings translateable  (Read 9009 times)

PolitikerNEU

  • Guest
Re: Making unit text strings translateable
« Reply #25 on: 22 September 2009, 12:05:25 »
Quote
No, it's the same... nothing can be hard-coded anyway, the 'translatable' strings need to be read from the XML, then translations read in to override them... from XML or Lua it's only slightly different, and I don't think harder in either. You need to determine the translatables at run-time in either case, for the Lua solution you then just try to read each translatable from the Lua state, eg if you've determined you have a translatable string for the archmage's 'ice_nova' command, you query the Lua table for magic.archmage.commands.ice_nova, if there's a string there, you have your translation, if there isn't it hasn't been translated, and you fall back as normal.
May be true, but you at least need code to know that if you need the archmage's 'ice_nova' command, you need to look up magic.archmage.commands.ice_nova and not anything else.

Quote
Well, taking the idea to the extreme, you could eliminate XML from the game completely
True, but IMHO Lua should be used for dynamic things and XML for static ones - since again: XML is far easier to parse than Lua when it comes to tool support (like a unit editor).

silnarm

  • Local Moderator
  • Behemoth
  • ********
  • Posts: 1,373
    • View Profile
Re: Making unit text strings translateable
« Reply #26 on: 22 September 2009, 13:00:56 »
... but you at least need code to know that if you need the archmage's 'ice_nova' command, you need to look up magic.archmage.commands.ice_nova and not anything else
You're loading the 'magic' faction and the 'archmage' unit, and you have just read the 'ice_nova' command. All the information is at hand, so do your lookup. Simple :)
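A minimal sketch of that lookup, assuming the translations sit in a plain std::map instead of a Lua table. The names TranslationTable, commandKey and lookup are made up for illustration and are not from the Glest or GAE source:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the Lua table (or any other translation source).
typedef std::map<std::string, std::string> TranslationTable;

// Build the lookup key from what the loader already knows at this point.
std::string commandKey(const std::string &faction, const std::string &unit,
                       const std::string &command) {
    assert(!faction.empty() && !unit.empty() && !command.empty());
    return faction + "." + unit + ".commands." + command;
}

// Return the translation if one exists, otherwise fall back to the default.
std::string lookup(const TranslationTable &table, const std::string &key,
                   const std::string &fallback) {
    TranslationTable::const_iterator it = table.find(key);
    return (it != table.end() && !it->second.empty()) ? it->second : fallback;
}
```

While loading the archmage's ice_nova command, the loader would call lookup(table, commandKey("magic", "archmage", "ice_nova"), "ice_nova") and get either the translation or the untranslated name back.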

Quote
Quote
Well, taking the idea to the extreme, you could eliminate XML from the game completely
True, but IMHO Lua should be used for dynamic things and XML for static ones - since again: XML is far easier to parse than Lua when it comes to tool support (like a unit editor).
Yeah, I'm not suggesting we actually do that.  And yes, for utility programs XML is nice...
Glest Advanced Engine - Code Monkey

Timeline | Downloads

silnarm

  • Local Moderator
  • Behemoth
  • ********
  • Posts: 1,373
    • View Profile
Re: Making unit text strings translateable
« Reply #27 on: 23 September 2009, 02:06:10 »
Excuse my cross-quote...
Quote from: PolitikerNEU
silnarm doesn't like it and wants to have another system.

I never said that ;) I was probably pushing for Lua too hard and came across wrong.  The whole Lua vs XML discussion is actually quite irrelevant.

My main concern is that you may be making it harder for yourself than it needs to be.  I know I did that a hell of a lot in my younger days, and still do occasionally now ;)  Designing a whole sub-system might be more 'fun' but I'd prefer to look at what we have, what needs translating, and how we can get the job done as easily as possible.  Don't start with a 'wish-list', start with a 'need-this-list'.

So anyway, I took the liberty of strolling through the xml loading code again...

These are the 'translatables' we need to collect:
Code: [Select]
tech-tree: [tech/tech.xml]
  Translatables:
    Tech-Tree name, directory name (or filename without extension)
    Attack-Types, from xml
    Armour-Types, from xml

resources: [tech/resources/]
  Translatables:
    Resource names, sub-directory names
  Individual resource XMLs have no translatables

factions: [tech/factions/faction/faction.xml]
  Translatables:
    Faction name, directory name
  XML contains no translatables

upgrades: [tech/factions/faction/upgrades/]
  Translatables:
    Upgrade names, sub-directory names
  Individual upgrade XMLs have no translatables

units: [tech/factions/faction/units/unit/unit.xml]
  Translatables:
    Unit name, directory name
    Levels, from xml
    Selection-Sounds, from xml
    Command-Sounds, from xml
    Skills:
      Skill name, from xml*
      Skill sounds, from xml
    Commands:
      Command name, from xml

* Are skill names actually ever displayed in game??

Here's where/how we collect it... (excuse my semi-pseudo-code)
[NB: this removes the OO style references to translatables I was using earlier, so there is no
duplication of strings in the translatables table.]
Code: [Select]
// 'Global' (probably in 'Lang', which is a singleton)
map<string,string> translatables;

TechTree::load( string &path ) {
    string techname = getNameFromPath( path );
    translatables[techname] = techname; // put in translation tables, with default value

    // code gets directory names from /tech/resources
    foreach ( string name in filenames ) {
        translatables[name] = name;
    }

    // loads tech-tree Xml

    foreach ( XmlNode node in AttackTypeNode.children ) {
        translatables[node["name"]] = node["name"];
    }
    // same for armour types...

    // factions... names were passed in as a parameter, in GAE this is a set, I think vanilla Glest
    // uses a vector.
    foreach ( string name in factionNames ) {
        translatables[name] = name;
    }
    // code loads factions...
}

FactionType::load() {
    // code pre-loads unit and upgrade names...
    foreach ( string name in (unitNames + upgradeNames) ) {
        translatables[name] = name;
    }
    // code loads units
    // code loads upgrades
}

UnitType::load() {
    // code starts loading parameters
    if ( levelsNode ) {
        foreach ( XmlNode node in levelsNode.children ) {
            translatables[node["name"]] = node["name"];
        }
    }
    // code loads more parameters...

    // do something with command and selection sounds

    // Code loads skills and commands
}

SkillType::load() {
    // code gets stuff from xml
    translatables[name] = name;
    // sounds?
}

CommandType::load() {
    // code gets stuff from xml
    translatables[name] = name;
}

and so then you have all your translatables, you could write out a template file...
Code: [Select]
// create translation template...
FILE *fp = fopen( "translation_template.ini", "w" );
if ( fp ) { // fopen returns NULL if the file could not be opened
    for ( map<string,string>::iterator it = translatables.begin(); it != translatables.end(); ++it ) {
        fprintf( fp, "%s=\n", it->first.c_str() );
    }
    fclose( fp );
}

or if this is a game, translate...
Code: [Select]
// or load translation...
for ( map<string,string>::iterator it = translatables.begin(); it != translatables.end(); ++it ) {
    string translation = Lang::getTranslation( it->first );
    if ( translation.size() ) { // if not empty string
        it->second = translation;
    }
}

That's the best I could come up with.  I think it's fairly minimal, clean, and perhaps most importantly, doesn't require modifying any existing XML.
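The piece left open above is where the translated strings come from. A rough sketch of one possibility, assuming the filled-in template is a plain key=value file: parseLanguageFile and applyTranslations are hypothetical names, and the real code would presumably go through the existing Lang class instead.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

typedef std::map<std::string, std::string> StringMap;

// Parse "key=value" lines. Lines without '=', with an empty key, or with an
// empty value are skipped, so untranslated template lines are ignored.
StringMap parseLanguageFile(std::istream &in) {
    StringMap result;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line[line.size() - 1] == '\r')
            line.erase(line.size() - 1); // tolerate Windows line endings
        const std::string::size_type eq = line.find('=');
        if (eq == std::string::npos || eq == 0 || eq + 1 == line.size())
            continue;
        const std::string key = line.substr(0, eq);
        assert(!key.empty()); // guaranteed by the eq == 0 check above
        result[key] = line.substr(eq + 1);
    }
    return result;
}

// Override the default values with whatever the language file provides.
void applyTranslations(StringMap &translatables, const StringMap &loaded) {
    for (StringMap::iterator it = translatables.begin(); it != translatables.end(); ++it) {
        StringMap::const_iterator found = loaded.find(it->first);
        if (found != loaded.end())
            it->second = found->second;
    }
}
```

Anything the translator left blank keeps its default value, which matches the fall-back behaviour described earlier in the thread.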

daniel.santos

  • Guest
Re: Making unit text strings translateable
« Reply #28 on: 23 September 2009, 07:41:17 »
Ooh, lots of discussion here!  I haven't read everything posted yet, but I propose we don't re-invent any wheels that don't really need it.  There's a GNU utility called gettext.  Out of the box, it will do the translations for stuff in your code, but that won't work completely for us; we'll still need a mechanism to generate a language file from scenarios and tech trees.  However, the nice thing about buying into gettext is that you get to use the INSANE assortment of tools that exist for creating translations.  I haven't figured out KBabel yet, but it appears that you can download translation databases for a lot of different languages to automatically produce translations (which I presume one would prefer to later have an actual native speaker of that language edit).  These get spit out in .po files.

As I said, it won't work for us out of the box; we're still going to have to hack it some to get what we want.  Nonetheless, I would much prefer to re-use what the community has done than to write my own, provided it doesn't suck of course. :)

And KBabel's stupid web site is down (how's that for instilling confidence?), but here's a screenshot from freshmeat:



Anyway, I presume that this isn't the only tool like this.  Also, the entire field is called "i18n", which means "internationalization" (see http://en.wikipedia.org/wiki/Internationalization_and_localization for an explanation).
« Last Edit: 23 September 2009, 07:53:03 by daniel.santos »

daniel.santos

  • Guest
Re: Making unit text strings translateable
« Reply #29 on: 23 September 2009, 08:26:16 »
Aside from it being (somewhat of) a standard, we would get to experience the full joy of automated translations, like these (Big OT!!!):

Code: [Select]
[img]http://failblog.files.wordpress.com/2008/07/fail-owned-translation-fail.jpg[/img]

[img]http://failblog.files.wordpress.com/2009/08/fail-owned-translating-fail.jpg[/img]

[img]http://failblog.files.wordpress.com/2008/07/fail-owned-translation-fail1.jpg[/img]

[img]http://failblog.files.wordpress.com/2008/08/fail-owned-engrish-fail.jpg[/img]

And the fun is that when doing this in a language none of us know, we won't know what we're *really* saying until somebody posts a bug (in a language we can speak/read) telling us that we just insulted their entire country!
« Last Edit: 7 October 2016, 23:10:47 by filux »

PolitikerNEU

  • Guest
Re: Making unit text strings translateable
« Reply #30 on: 23 September 2009, 14:34:32 »
I have made a simple Directory --> .po-File converter as you requested (I hope it is the thing you wanted; of course I can change everything). It doesn't do much except search for all "interesting" strings using XPath queries (there is one file to select interesting "filenames" and one to select interesting attributes).

I made it in Java since it is in my opinion far easier to work with than e.g. C++ (and since I am not using Windows, I won't use C#).

Here are some screenshots - the "Gui" isn't really great, just one FileChooser for selecting the directory and one to place the .po-File
Code: [Select]
[img]http://img89.imageshack.us/img89/3808/openfolder.png[/img]
You can select a folder
Code: [Select]
[img]http://img225.imageshack.us/img225/9177/savepo.png[/img]
You can choose where you want to place the .po-File
Code: [Select]
[img]http://img225.imageshack.us/img225/8092/pofile.png[/img]
The resulting .po-File

As soon as I manage to create a running jar, I'll edit my post

You currently can't read any PO-Files (which is a must IMHO - to just add new strings and remove old ones) and yeah - it can't do much (btw: The string issue "directories" instead of ".po-Files" is already fixed now :-( )

SVN-Url is:
Code: [Select]
svn co http://subversion.assembla.com/svn/rangliste/TransHelper
(will be imported in a few minutes)

Hmm ... I got some problems generating the .jar - the files I need aren't looked up correctly (they should be in the folder where the .jar is)

EDIT: And here is the .jar:
Code: [Select]
[url=http://billhome.at/glest/PoGenerator.zip]http://billhome.at/glest/PoGenerator.zip[/url]
Extract and run. (And install Java.)

EDIT: My webspace is available again. You can download the file from the link above.
« Last Edit: 7 October 2016, 23:11:30 by filux »

daniel.santos

  • Guest
Re: Making unit text strings translateable
« Reply #31 on: 24 September 2009, 07:15:27 »
How awesome! =)  Yea, almost everything I learned about internationalizing apps was for Java, it's super-easy in Java because it was designed into the core of the language & VM (for instance, all Strings in Java are wide character -- that doesn't mean that every VM *implements* them that way, but it's never something you need to worry about).  I definitely like this, I'll have to give it a spin tomorrow, thanks PolitikerNEU!

Oh yea, I was reading that you can convert your .po files back & forth to Java-style .properties files too, which is essentially the file format of Glest's .lng files.  I am accustomed to that format, having never worked with .po files before.  Either way, this may work out really well! :)

PolitikerNEU

  • Guest
Re: Making unit text strings translateable
« Reply #32 on: 24 September 2009, 07:40:49 »
Hmm ... sorry, but the Glest .lng format is so simple that I think it'll be easier to just write another getStringLng() method for the PO-Entry class that returns only msgid=msgstr (or a new IniEntry class ... doesn't matter).

Now I have added an IniEntry class (only saving, no loading possible currently, but I might add it later today).
Whenever you save a .po file, the corresponding .lng file is written to the same directory, with .lng in place of .po. Be warned: the "algorithm" that generates the filename simply replaces ".po" with ".lng", so make sure the saved file's extension is .po and that the string ".po" doesn't occur anywhere else in the file path. I might fix this bug later.
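The tool itself is written in Java, so the following is only a C++ sketch of how the filename could be derived without that limitation: it touches ".po" only when it is the suffix of the path. The function name lngPathFor is invented for this example.

```cpp
#include <cassert>
#include <string>

// Replace a trailing ".po" with ".lng"; any other ".po" in the path is left
// alone. If the path does not end in ".po", ".lng" is simply appended.
std::string lngPathFor(const std::string &poPath) {
    assert(!poPath.empty());
    const std::string from = ".po";
    const std::string to = ".lng";
    if (poPath.size() >= from.size() &&
        poPath.compare(poPath.size() - from.size(), from.size(), from) == 0) {
        return poPath.substr(0, poPath.size() - from.size()) + to;
    }
    return poPath + to;
}
```

With this, a path such as "mods/my.pots/german.po" keeps its directory name intact and only the final extension changes.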

Screenshot of the generated .lng-File (not really interesting):
Code: [Select]
[img]http://img42.imageshack.us/img42/864/savelng.png[/img]
Note that there is an "error" in:
Code: [Select]
found %d fatal error=s'ha trobat %d error fatal
found %d fatal errors=s'han trobat %d errors fatals
This comes from the test .po files I converted from the GNU PO site. While Glest would still recognize these entries correctly, the "ID" is wrong because it would have to consist of A-Za-z0-9_- only. A warning is emitted in this case (but currently only on stderr; I might add some "real" error message display).
Of course not every .po entry can be converted to .ini because the .po format supports far more than this simple .ini format, but it is sufficient for the .po files this utility generates.
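For illustration, the ID rule above could be checked roughly like this. This is a C++ sketch, not the Java code of the tool, and isValidLngId and toLngLine are hypothetical names:

```cpp
#include <cassert>
#include <string>

// True if the ID is non-empty and consists only of A-Z, a-z, 0-9, '_' and '-'.
bool isValidLngId(const std::string &id) {
    if (id.empty())
        return false;
    for (std::string::size_type i = 0; i < id.size(); ++i) {
        const char c = id[i];
        const bool ok = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                        (c >= '0' && c <= '9') || c == '_' || c == '-';
        if (!ok)
            return false;
    }
    return true;
}

// Format one entry as a "msgid=msgstr" line for the .lng file.
std::string toLngLine(const std::string &msgid, const std::string &msgstr) {
    return msgid + "=" + msgstr;
}

// Usage: a Glest-style ID passes, a sentence-style msgid does not.
void example() {
    assert(isValidLngId("ice_nova"));
    assert(!isValidLngId("found %d fatal error"));
    assert(toLngLine("ice_nova", "Eisnova") == "ice_nova=Eisnova");
}
```

A caller would write the line either way and emit the warning whenever isValidLngId returns false, which mirrors the behaviour described above.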
« Last Edit: 7 October 2016, 23:12:02 by filux »

daniel.santos

  • Guest
Re: Making unit text strings translateable
« Reply #33 on: 24 September 2009, 20:04:57 »
Hmm, so what is your opinion then?  And before you answer that, let me recap a few issues:
  • Afaik the gettext library manages all of the encoding complexities, which would remove this complexity from GAE, although we may have already addressed the majority of those.
  • Using gettext out of the box, you are supposed to compile .po files into some binary format that gettext can then access quickly.  If we continue using the current mechanism, this will not be necessary.  I expect gettext to be slightly faster, but I don't think it matters enough to be a serious consideration (skip the rest of this bullet if you don't care about performance details).  The current Property class uses a std::map<std::string, std::string>, thus relying upon the speed of std::string::compare(const string&) const (on the surface, it's more complicated than that, but that's what it compiles down to).  I believe that std::string::compare() does a character-by-character comparison, and I'm guessing that gettext creates a 32-bit hash code for its strings so that less processing is needed.  Then again, gettext is comparing the entire text string and we're just using message identifiers, which are shorter.
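To make the comparison concrete: a std::map lookup does O(log n) string comparisons, while a hash table such as std::unordered_map hashes the key once and then usually compares against very few candidates. The sketch below is not a benchmark and says nothing about what gettext does internally; it only shows that a lookup helper (getString here is a made-up free function) can be written so that either container could be swapped in and measured.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>

// Works with either container: both provide find() and end().
// Falls back to the ID itself when no entry exists.
template <typename Table>
std::string getString(const Table &table, const std::string &id) {
    typename Table::const_iterator it = table.find(id);
    return it != table.end() ? it->second : id;
}

// Usage: both containers give the same answers for the same data.
void example() {
    std::map<std::string, std::string> ordered;
    std::unordered_map<std::string, std::string> hashed;
    ordered["Archmage"] = "Erzmagier";
    hashed["Archmage"] = "Erzmagier";
    assert(getString(ordered, "Archmage") == getString(hashed, "Archmage"));
    assert(getString(hashed, "Missing") == "Missing");
}
```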

So, we could (sorry for the formatting here, I can't figure out how to get it to do a numbered list :( )
  • 1. Continue using the Property class as-is (possibly with further enhancements to better manage various character encoding).  This solution is more in-line with the standard Java mechanism (isolating all of your messages/text to a single class and sticking the actual messages in a .properties file).  Then we can use PolitikerNEU's tool to generate and manage language files for techs, scenarios and whatever other .xml, converting between the .properties/.ini format and .po format to use other translation tools like KBabel.
  • 2. Convert to the .po format and convert the Lang class to call gettext instead of our Properties class, but continue to use message IDs instead of actual English text embedded in the code.  This would eliminate the need to further improve the Properties class to manage encoding and leave us directly in the .po format, which a wide assortment of tools have support for.  This will also require modders to compile their .po files prior to release (which, again, can be encapsulated in a tool we distribute and maybe even have PolitikerNEU's tool do it).
  • 3. Entirely eliminate the Lang class and replace it with direct calls to gettext using the _() macro (not my choice personally, but it's used by a lot of software).

With solutions 1 and 2, we'll still need to add some extra glue to the xgettext program to get it to properly strip out our language strings (probably looking for Lang::getInstance().getString("messageID") or lang.getString("messageID")), however, I've learned that this isn't terribly difficult to do, and it even supports parsing the C and C++ languages! :)  We should probably also translate error messages, not all of them, but at least those that may be meaningful to a user.  Those that only a developer would normally understand can be left in hard-coded English as long as all of our development team speaks English.

Any other ideas for how to approach this?  I'm personally leaning towards solution #1, but I'm not ruling out #2 altogether.

Final thoughts: I wouldn't mind terribly if we came up with some way to "script" an automatic translation process.  Apparently, these translation databases are large-ish (200-ish MB each), so maybe we can do this on some server or something, so modders don't have to download 2GB of data to do translations.  Also (and this is thinking ahead a little bit), I hope we can have some kind of mechanism to mark a particular message translation as being human-made or -validated, so that later runs through the translator don't attempt to change it.  Lastly, to take i18n all the way, if we really want to do it right, we'll have to have support for languages whose text doesn't read from left to right.  Probably, what we have already is enough to bite off for now.
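The "human-validated" marker could be as small as one flag per entry. A sketch under that assumption follows; the Translation struct and mergeMachineTranslations are hypothetical. For what it's worth, the PO format has a related "fuzzy" flag for entries that still need review by a translator, which might be usable for the same purpose.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical record for one translated message.
struct Translation {
    std::string text;
    bool humanValidated; // true once a person has written or checked the text
};

typedef std::map<std::string, Translation> Catalog;

// Merge machine output into the catalog, skipping human-validated entries.
// Returns the number of entries that were written.
int mergeMachineTranslations(Catalog &catalog,
                             const std::map<std::string, std::string> &machine) {
    int written = 0;
    for (std::map<std::string, std::string>::const_iterator it = machine.begin();
         it != machine.end(); ++it) {
        Catalog::iterator existing = catalog.find(it->first);
        if (existing != catalog.end() && existing->second.humanValidated)
            continue; // a person has signed off on this one, leave it alone
        Translation t;
        t.text = it->second;
        t.humanValidated = false;
        catalog[it->first] = t;
        ++written;
    }
    assert(written <= static_cast<int>(machine.size()));
    return written;
}
```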


PolitikerNEU

  • Guest
Re: Making unit text strings translateable
« Reply #34 on: 24 September 2009, 20:31:00 »
I too think that gettext won't be much faster than a "normal" map<string,string> lookup, and I haven't found (after searching for a very short time) a good way to compile .po from Java, so I would have to create the binary format from scratch in Java - which I don't like.

I don't really know if solution 1 or 2 would be better, but since 2 looks rather complicated, I'd prefer solution 1 - but if virtual functions are fast enough, we could just create a Lang-"interface" and implement this using either .po or .mo - that way, you could use .po for simply creating a mod and .mo if you got a compiled one.
One problem of using gettext may be - I don't know it - that you cannot change the language on the fly. If that is true, I think we need to support method 1. (Changing language on the fly is useful if you are playing e.g. together with a player using another language - both could switch to e.g. English for a short time to be able to know the correct unit name - but this could be tricky maybe)
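A rough sketch of what such a Lang "interface" might look like, with one in-memory backend and on-the-fly switching. The names LangBackend and MapBackend are invented here, and this Lang is a simplified stand-in rather than the existing singleton; a .mo-based backend would be a second class implementing the same interface.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical interface: one implementation per file format.
class LangBackend {
public:
    virtual ~LangBackend() {}
    // Return the translation for id, or id itself when none is known.
    virtual std::string getString(const std::string &id) const = 0;
};

// Backend holding key=value pairs in memory, as a .lng or .po loader might fill it.
class MapBackend : public LangBackend {
public:
    explicit MapBackend(const std::map<std::string, std::string> &strings)
        : strings_(strings) {}
    std::string getString(const std::string &id) const override {
        std::map<std::string, std::string>::const_iterator it = strings_.find(id);
        return it != strings_.end() ? it->second : id;
    }
private:
    std::map<std::string, std::string> strings_;
};

// Front end the game would call; swapping the backend switches the language.
class Lang {
public:
    void setBackend(std::shared_ptr<LangBackend> backend) {
        assert(backend != nullptr);
        backend_ = backend;
    }
    std::string getString(const std::string &id) const {
        return backend_ ? backend_->getString(id) : id;
    }
private:
    std::shared_ptr<LangBackend> backend_;
};
```

Switching language mid-game then amounts to calling setBackend with a different backend. The cost is one virtual call per lookup, which is likely negligible next to the string lookup itself.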

I don't know anything about xgettext, but I hope this will be possible - maybe using some macro if nothing else is possible? (for example:
#ifdef xgettext
#define GETSTRING(x)
#else
#define GETSTRING(x) Lang::getInstance().getString(x)
#endif
)

But actually I don't think these translation databases would be of much use for glest players since "normal" programs use strings like "File" or "Edit", but not "Initiate" (at least not in the meaning of the unit in glest), "archmage Tower" or something like that - additionally since we use IDs I doubt this translation database would find anything.
Using a server would be certainly nice, for example there is this launchpad-thing (is it open source already?) which could be used maybe, but I don't know it.

(Sorry, I am rather tired right now :-/ )
« Last Edit: 24 September 2009, 20:32:43 by PolitikerNEU »

daniel.santos

  • Guest
Re: Making unit text strings translateable
« Reply #35 on: 25 September 2009, 17:28:37 »
(Sorry, I am rather tired right now :-/ )

hehe, I know that feeling! :)

You can switch languages dynamically with gettext, and now I'm personally leaning towards using our own stuff, and converting back & forth to .po to make use of all of the translation tools.  As far as language IDs, we can just put the English version in the .po files when sending them through translation stuff.  And as far as accuracy, each language database is about 200 MB, so I'm betting it's better than "File, Edit, etc."
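One way the English text could travel with the ID is sketched below: the English string becomes the msgid and the Glest ID goes into msgctxt, so it can be recovered when converting back. Whether the translation tools in question handle msgctxt well is an assumption that would need checking, and poEscape and poEntry are made-up helper names.

```cpp
#include <cassert>
#include <string>

// Escape a string for use inside a double-quoted PO string.
std::string poEscape(const std::string &s) {
    std::string out;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        switch (s[i]) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            default:   out += s[i];   break;
        }
    }
    return out;
}

// Emit one untranslated entry: ID as msgctxt, English text as msgid.
std::string poEntry(const std::string &id, const std::string &english) {
    assert(!id.empty());
    return "msgctxt \"" + poEscape(id) + "\"\n" +
           "msgid \"" + poEscape(english) + "\"\n" +
           "msgstr \"\"\n";
}
```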

I'm still open to feedback on this.  Also, I dunno about the Launchpad thing, I don't think I've ever heard of it.

Omega

  • MegaGlest Team
  • Dragon
  • ********
  • Posts: 6,167
  • Professional bug writer
    • View Profile
    • Personal site
Re: Making unit text strings translateable
« Reply #36 on: 26 September 2009, 10:56:38 »
Looking back at what I've been missing, I really liked silnarm's idea for the translations! Although I'm not so sure if it would work well for sounds...?!?

As well, there's no need to translate skill names. Those are just references for the commands so they know which skills to use. If you tried to translate it, you'd probably end up with no working commands! :D
Edit the MegaGlest wiki: http://docs.megaglest.org/

My personal projects: http://github.com/KatrinaHoffert