Author Topic: UTF-8 Support and Font Rendering  (Read 3686 times)

hailstone

  • GAE Team
  • Battle Machine
  • ********
  • Posts: 1,569
    • View Profile
    • http://blog.nturn.net/
UTF-8 Support and Font Rendering
« on: 13 February 2011, 09:15:19 »
To help me add Unicode support: roughly how many additional characters beyond ASCII would be needed for the different languages?

Edit: Changed subject
« Last Edit: 15 February 2011, 23:08:14 by hailstone »
Glest Advanced Engine - Admin/Programmer
http://sourceforge.net/apps/trac/glestae/

will

  • Golem
  • ******
  • Posts: 783
    • View Profile
Re: How many additional characters are needed?
« Reply #1 on: 13 February 2011, 09:52:38 »
Joel on Software covers this very well.

I recommend keeping all strings as UTF-8, both on disk and in memory; using wchar_t might rule out Klingon translations and such.

Then you want to use simple font-chaining - if the font doesn't have the glyph, look for it in a fall-back font and so on, until you give up and draw a ? - and you're all set.
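
For illustration, the font-chaining idea could be sketched roughly like this in C++. The `Font` struct and the `find_glyph` function are hypothetical names made up for this sketch, not taken from any of the engines discussed, and a real font would hold glyph metrics rather than a bare set of code points.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical font: a name plus the set of code points it has glyphs for.
struct Font {
    std::string name;
    std::set<char32_t> glyphs;
    bool has(char32_t cp) const { return glyphs.count(cp) != 0; }
};

// Result of a lookup: which font to draw with and which code point to draw.
struct GlyphRef {
    const Font* font;    // nullptr if no font in the chain can draw anything
    char32_t codepoint;  // the requested code point, or U'?' after giving up
};

// Walk the chain in order and return the first font that has the glyph.
// If none has it, give up and look for a '?' glyph instead.
GlyphRef find_glyph(const std::vector<Font>& chain, char32_t cp) {
    for (const Font& f : chain)
        if (f.has(cp)) return GlyphRef{&f, cp};
    for (const Font& f : chain)
        if (f.has(U'?')) return GlyphRef{&f, U'?'};
    return GlyphRef{nullptr, U'?'};
}
```

The order of the vector is the fallback order, so the preferred font goes first and the widest-coverage font goes last.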

hailstone

  • GAE Team
  • Battle Machine
  • ********
  • Posts: 1,569
    • View Profile
    • http://blog.nturn.net/
Re: How many additional characters are needed?
« Reply #2 on: 15 February 2011, 10:22:28 »
The article is good but it doesn't really go into using UTF-8 in C++. I found these useful: utfcpp, the UTF-8 and Unicode FAQ, and another UTF FAQ. The Last Resort font is useful for spotting missing glyphs.

If someone is trying to learn Unicode I suggest not looking at forums or help sites. They did more harm than good for me. Also, if you're using Visual Studio and want to hardcode test strings, remember to change the source file encoding to "Unicode (UTF-8 without signature) - Codepage 65001" (in File->Advanced Save Options).

Now that I understand how to use it, I've modified the NeHe FreeType tutorial to work with UTF-8. I'm fairly confident I can get it working in GAE but I'm going to wait until after 0.4 is released. There are around 170 instances of internationalised text being used and I'd need to go through each one to make sure the strings are being handled correctly.

My original idea was more like code pages. The ASCII characters would be loaded and the rest would be remapped to handle the characters loaded from the lang files. The problem is it would still be limited to 256 characters which is why I was asking the question but hopefully it shouldn't be a problem now.

Font chaining can be added later. Just having UTF-8 support is a big step forward. Keyboard input needs to be considered too.

I'll upload the modified tutorial code for peer review after I've cleaned it up a bit.
« Last Edit: 15 February 2011, 10:26:31 by hailstone »

titi

  • MegaGlest Team
  • Airship
  • ********
  • Posts: 3,981
    • View Profile
    • http://www.titusgames.de
Re: How many additional characters are needed?
« Reply #3 on: 15 February 2011, 10:32:38 »
Does the Last Resort font's license really fit? No export to .....
I don't think this is CC-BY-SA v3 compatible.
Try Megaglest! Improved Engine / New factions / New tilesets / New maps / New scenarios

will

  • Golem
  • ******
  • Posts: 783
    • View Profile
Re: How many additional characters are needed?
« Reply #4 on: 15 February 2011, 11:48:48 »
I recently wrote font rendering for my engine: https://github.com/williame/GlestNG/blob/master/font.cpp

It loads the normal bitmap fonts as produced by AngelCode's Bitmap Font Generator.

It puts the basic ASCII set into a quick lookup table and any other codepoints into an array that it can search with a binary search.  Oh, and it does kerning too.
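
A minimal sketch of that lookup scheme, written independently of the linked font.cpp: the `Glyph` and `GlyphTable` names are hypothetical, kerning is left out, and a real glyph record would also carry texture coordinates.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical glyph record for this sketch.
struct Glyph {
    char32_t codepoint;
    int advance;
};

class GlyphTable {
public:
    explicit GlyphTable(std::vector<Glyph> glyphs) : glyphs_(std::move(glyphs)) {
        // Sort once by code point so non-ASCII lookups can binary search.
        std::sort(glyphs_.begin(), glyphs_.end(),
                  [](const Glyph& a, const Glyph& b) { return a.codepoint < b.codepoint; });
        // Direct index table for the 128 ASCII code points; -1 means "no glyph".
        ascii_.fill(-1);
        for (std::size_t i = 0; i < glyphs_.size(); ++i)
            if (glyphs_[i].codepoint < 128)
                ascii_[glyphs_[i].codepoint] = static_cast<int>(i);
    }

    // Returns nullptr when the font has no glyph for cp.
    const Glyph* get(char32_t cp) const {
        if (cp < 128) {
            int i = ascii_[cp];
            return i < 0 ? nullptr : &glyphs_[static_cast<std::size_t>(i)];
        }
        auto it = std::lower_bound(glyphs_.begin(), glyphs_.end(), cp,
                                   [](const Glyph& g, char32_t c) { return g.codepoint < c; });
        return (it != glyphs_.end() && it->codepoint == cp) ? &*it : nullptr;
    }

private:
    std::vector<Glyph> glyphs_;
    std::array<int, 128> ascii_;
};
```

ASCII lookups cost one array index; everything else costs a binary search over the sorted array, which stays cheap even with a few thousand glyphs.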

What is missing is a UTF-8 decoder on the input strings, but that's a trivial addition.
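
As a rough idea of what such a decoder involves, here is a sketch that turns a UTF-8 `std::string` into code points. The function name `decode_utf8` is an assumption for this example; a library such as utfcpp, mentioned earlier in the thread, covers the same ground and is probably the safer choice in real code. Malformed input is replaced with U+FFFD rather than rejected, which is one possible policy among several.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// U+FFFD REPLACEMENT CHARACTER, emitted for each byte that cannot be decoded.
const char32_t kReplacement = 0xFFFD;

std::vector<char32_t> decode_utf8(const std::string& s) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;  // number of continuation bytes expected
        char32_t cp = 0;        // code point being assembled
        char32_t lowest = 0;    // smallest value this length may encode
        if (b < 0x80)                { cp = b;        extra = 0; lowest = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; lowest = 0x80; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; lowest = 0x800; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; lowest = 0x10000; }
        else { out.push_back(kReplacement); ++i; continue; }  // stray or invalid lead byte

        bool ok = i + extra < s.size();  // the whole sequence must be present
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;  // not a continuation byte
            else cp = (cp << 6) | (c & 0x3F);
        }
        // Reject truncated, overlong, surrogate and out-of-range sequences.
        if (!ok || cp < lowest || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            out.push_back(kReplacement);
            ++i;  // resynchronise one byte at a time
            continue;
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

The resulting code points can be fed straight into whatever glyph lookup the font uses.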

You could imagine the 'get' function going through a chain of fallback fonts.

Feel free to borrow/adapt that code.

Bitmap fonts are the norm in OpenGL and DirectX games.  Given the inappropriate choice of truetype fonts on Linux by the Glest engines, I'm doubtful that truetype conversion on load is really getting the kind of control that bundling your own fonts - which might as well be bitmapped - gives you.

AngelFonts can include outlines, which is essential for getting good contrast on all screens.

Oh, and with bitmap fonts you can easily add features like making the first capital of a paragraph a pretty picture glyph in keeping with a medieval theme.  Hopefully the font can be selectable per total conversion, with a per-faction override.
« Last Edit: 15 February 2011, 11:52:25 by will »

Yggdrasil

  • GAE Team
  • Ornithopter
  • ********
  • Posts: 409
    • View Profile
Re: How many additional characters are needed?
« Reply #5 on: 15 February 2011, 12:47:19 »
Just throwing in that physfs uses UTF-8 internally, see bottom:
http://icculus.org/physfs/docs/html/

Not only translation needs unicode support but also file and folder names. At least the path to config dir because it contains the user name. The problem currently is just how we obtain the home directory (getenv()) and save it in GAE. Physfs is able to mount it.
« Last Edit: 15 February 2011, 12:58:00 by Yggdrasil »

hailstone

  • GAE Team
  • Battle Machine
  • ********
  • Posts: 1,569
    • View Profile
    • http://blog.nturn.net/
Re: UTF-8 Support and Font Rendering
« Reply #6 on: 16 February 2011, 11:50:37 »
Quote from: titi
Does the Last Resort font's license really fit? No export to .....
I don't think this is CC-BY-SA v3 compatible.
Good thing I wasn't planning on packaging it :P. The GNU FreeFonts seem like a better fit. FreeSerif.ttf has more characters and glyphs than Microsoft's arial.ttf but doesn't have any kern pairs.

Quote from: will
Bitmap fonts are the norm in OpenGL and DirectX games. Given the inappropriate choice of truetype fonts on Linux by the Glest engines, I'm doubtful that truetype conversion on load is really getting the kind of control that bundling your own fonts - which might as well be bitmapped - gives you.
You should really provide at least one reference if you're going to make such a claim. According to NeHe tutorial 43, Blizzard uses the FreeType library. Fonts can be changed, so I'm not terribly worried about that right now.  It might just be the terminology, but going by the FTGL tutorial, bitmap fonts look to be one of the least desirable choices, whereas pixmap fonts seem to be as good as, if not better than, texture fonts, judging from playing around with their demo.

Quote from: Yggdrasil
Not only translation needs unicode support but also file and folder names. At least the path to config dir because it contains the user name. The problem currently is just how we obtain the home directory (getenv()) and save it in GAE. Physfs is able to mount it.
I hadn't thought about that. getenv() might be OK as long as it doesn't mangle the string internally, since it returns a char pointer. If there are problems we'll have to convert to/from the wchar_t versions of the functions (for Windows there are functions that do it, or we can use utfcpp).
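
To give an idea of what the conversion step involves: wchar_t is 16 bits wide on Windows, so the wide functions there expect UTF-16, where code points above U+FFFF become surrogate pairs. The sketch below hand-rolls that encoding for already-decoded code points. The name `to_utf16` is made up for this example, and in practice the Windows conversion functions or utfcpp would normally be used instead.

```cpp
#include <string>

// Encode code points as UTF-16 code units.
// Invalid input (surrogates, values above U+10FFFF) becomes U+FFFD.
std::u16string to_utf16(const std::u32string& codepoints) {
    std::u16string out;
    for (char32_t cp : codepoints) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) cp = 0xFFFD;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;  // 20 bits remain, split into two 10-bit halves
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));    // high surrogate
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));  // low surrogate
        }
    }
    return out;
}
```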

I've uploaded the modified font tutorial. It has a problem with glyph positioning and it's not the font that causes it.

will

  • Golem
  • ******
  • Posts: 783
    • View Profile
Re: UTF-8 Support and Font Rendering
« Reply #7 on: 17 February 2011, 07:45:20 »
[Removed by Admin]

Second, let's return to fonts.

The article you cited is old, but it's relevant; basically, you mixed up bitmaps and textures.

There are two main types of font - "bitmap fonts" and "vector fonts" (truetype are vector).

The bitmap fonts (WGL fonts) they pick on in that NeHe tutorial are monochrome, and they observe that they look bad because of that.  That's because those bitmap fonts have no alpha, not because they are bitmapped, if you can see the distinction.

You can't use vector fonts directly in OpenGL.  You have to render them to a texture - with alpha - and then draw quads with them on it.  So you have to rasterise your vector font to actually draw it.  So to use a vector font you have to turn it into a bitmap font, and you want to have alpha when you do so.

The tutorial code links against FreeType, rasterises glyphs from real fonts on demand and caches them.  I imagine GLEST/GAE/MG does exactly this, as they use my fonts on Linux and get the sizes completely wrong (there have been work-arounds specifying general scaling for fonts in the INI, but it's not the best fix).  You can see that this defers size, weight and other properties to runtime, so it can give lots of control if you want.  It's a very 'proper' approach.

The tutorial code goes making individual textures for each glyph.  Think about that...

It is perhaps more common, however, to pre-rasterise the font: create the bitmap font (with alpha) ahead of time.  I linked to the definitive tool for doing just that, and my code that reads the font files it produces.  With this approach you typically get a single TGA holding all the glyphs, tightly packed, so you can turn your text into a list of quads and texture coords and draw it in a single op.
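
A sketch of the "list of quads and texture coords" step, kept free of any OpenGL calls so it stands alone. `AtlasGlyph`, `Vertex` and `build_quads` are hypothetical names, the atlas is assumed to be described by pixel rectangles, and kerning and line breaks are ignored.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Where a glyph sits in the atlas image (in pixels) and how far to advance the pen.
struct AtlasGlyph { int x, y, w, h, advance; };

// One corner of a quad: screen position plus texture coordinate.
struct Vertex { float x, y, u, v; };

// Emits four vertices per drawable character, laid out left to right from the pen.
std::vector<Vertex> build_quads(const std::map<char32_t, AtlasGlyph>& atlas,
                                int atlas_w, int atlas_h,
                                const std::u32string& text,
                                float pen_x, float pen_y) {
    assert(atlas_w > 0 && atlas_h > 0);
    std::vector<Vertex> verts;
    verts.reserve(text.size() * 4);
    for (char32_t cp : text) {
        auto it = atlas.find(cp);
        if (it == atlas.end()) continue;  // a real renderer would substitute a fallback glyph
        const AtlasGlyph& g = it->second;
        // Normalise the pixel rectangle to 0..1 texture coordinates.
        float u0 = static_cast<float>(g.x) / atlas_w;
        float v0 = static_cast<float>(g.y) / atlas_h;
        float u1 = static_cast<float>(g.x + g.w) / atlas_w;
        float v1 = static_cast<float>(g.y + g.h) / atlas_h;
        verts.push_back(Vertex{pen_x,       pen_y,       u0, v0});
        verts.push_back(Vertex{pen_x + g.w, pen_y,       u1, v0});
        verts.push_back(Vertex{pen_x + g.w, pen_y + g.h, u1, v1});
        verts.push_back(Vertex{pen_x,       pen_y + g.h, u0, v1});
        pen_x += g.advance;
    }
    return verts;
}
```

With the atlas texture bound, the whole vector can then be submitted in one draw call instead of one call per glyph.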

The ultimate answer is of course to have it all - doing your rasterisation on demand, but rasterising US-ASCII printables early on, and doing dynamic packing of new glyphs into a texture that you update on the fly so as to avoid having a gazillion small textures floating about.  This is tedious code to get right, hence me suggesting sticking with pre-rasterised fonts.  You can happily include the few thousand glyphs for various oriental languages and such and still only be using 3 or 5 MB of texture...
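
The dynamic packing part can start out very simple. Below is a sketch of a "shelf" packer, which fills the texture row by row; this is a swapped-in, deliberately basic technique, not what any of the engines here do, and `ShelfPacker` and `Rect` are hypothetical names. It wastes some space compared with smarter packers but is easy to get right.

```cpp
#include <cassert>

// A reserved region of the atlas, in pixels.
struct Rect { int x, y, w, h; };

// Fills the atlas left to right in horizontal shelves, top to bottom.
class ShelfPacker {
public:
    ShelfPacker(int width, int height) : width_(width), height_(height) {
        assert(width > 0 && height > 0);
    }

    // Tries to reserve a w x h region; returns false if it does not fit.
    bool pack(int w, int h, Rect& out) {
        if (w <= 0 || h <= 0 || w > width_) return false;
        if (cursor_x_ + w > width_) {  // current shelf is full: open a new one below it
            shelf_y_ += shelf_h_;
            cursor_x_ = 0;
            shelf_h_ = 0;
        }
        if (shelf_y_ + h > height_) return false;  // no vertical room left
        out = Rect{cursor_x_, shelf_y_, w, h};
        cursor_x_ += w;
        if (h > shelf_h_) shelf_h_ = h;  // a shelf is as tall as its tallest glyph
        return true;
    }

private:
    int width_, height_;
    int cursor_x_ = 0;  // next free x on the current shelf
    int shelf_y_ = 0;   // top of the current shelf
    int shelf_h_ = 0;   // height of the current shelf so far
};
```

Each time `pack` succeeds, the freshly rasterised glyph would be copied into the returned rectangle of the existing texture (in OpenGL, `glTexSubImage2D` updates a sub-region), so no new texture is created per glyph.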

Once you have the font rasterised you can do cool things that you see in game text but not on normal word-processing applications - you can use the font bitmap as an alpha channel whilst drawing the font using another texture to get different colours for the actual text e.g. a rainbow pattern is common.  Or you can actually have illustrations for the characters, with different colours - a nice bitmapped glest font with the yellow outline and the silver innards would be very pleasing.

Edit by Omega: See below.
« Last Edit: 19 February 2011, 04:46:38 by Omega »

Omega

  • MegaGlest Team
  • Dragon
  • ********
  • Posts: 6,124
  • I'm totally not a robot
    • View Profile
    • Glest Wiki
    • Email
Re: UTF-8 Support and Font Rendering
« Reply #8 on: 19 February 2011, 04:48:24 »
Notice: All attack posts removed. Please bear in mind that one of the board's most important rules is not to attack other users. If you disagree with their ideas, attack the ideas. Don't insult the poster.

Moving on...
Edit the MegaGlest wiki: http://docs.megaglest.org/

Comprehensive documentation is our goal. Help us reach it!

 
