Yeah, I'm not sure what went wrong with the i686 Windows build. It may have been some other MSVC setting; I should probably just rebuild it and make sure the problems are fixed. Either way, you should always use the SSE2 build if your CPU supports it (Intel Pentium 4 or later, AMD Opteron or later), because it will (or should) perform better.
I've been working on 0.2.11 lately. After extensive profiling and experimentation, I have a lot of performance improvements in so far. The biggest one, oddly enough, came from limiting how often the "stop" skill (not the command) is allowed to update: it's now capped at 4 times per second (this is where the auto-attack/repair/flee calculations are made). Having 500-600 units in the game should NOT cause lag or make it unplayable. I had around 450 units at one point in a test, but I'm on a Phenom 9850, so that's not a good measurement (I mean 500-600 units on an older processor with the graphics settings tuned appropriately). I also fixed the last two outstanding bugs: the meeting point button not working, and units on the client side of a network game (appearing) not to respond to your commands and then later "jumping" to where you told them to go.
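For anyone curious, the throttling idea boils down to something like the following. This is only a minimal C++ sketch assuming a millisecond game clock; `UpdateThrottle` and `countEvaluations` are hypothetical names for illustration, not the engine's actual code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not the engine's real class): lets an expensive
// per-unit check run at most `hz` times per second, however often the
// game loop calls it.
class UpdateThrottle {
public:
    explicit UpdateThrottle(unsigned hz) {
        assert(hz > 0 && hz <= 1000);  // need a non-zero interval in ms
        intervalMs_ = 1000 / hz;       // 4 Hz -> one update every 250 ms
    }

    // Returns true if enough time has passed since the last accepted update.
    bool shouldUpdate(std::int64_t nowMs) {
        if (hasRun_ && nowMs - lastMs_ < intervalMs_) {
            return false;  // too soon: skip the expensive work this frame
        }
        hasRun_ = true;
        lastMs_ = nowMs;
        return true;
    }

private:
    std::int64_t intervalMs_ = 0;
    std::int64_t lastMs_ = 0;
    bool hasRun_ = false;
};

// Runs a fake game loop for `durationMs` with one frame every `frameMs`
// and counts how many frames actually perform the throttled check.
int countEvaluations(unsigned hz, std::int64_t frameMs, std::int64_t durationMs) {
    UpdateThrottle throttle(hz);
    int evaluations = 0;
    for (std::int64_t t = 0; t < durationMs; t += frameMs) {
        if (throttle.shouldUpdate(t)) {
            ++evaluations;  // auto-attack/repair/flee checks would go here
        }
    }
    return evaluations;
}
```

At 40 fps (a frame every 25 ms), `countEvaluations(4, 25, 1000)` runs the check on only 4 of the 40 frames in that second. With hundreds of units, that difference adds up quickly.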
Having taken a bite out of CPU usage, I'm now completely revamping the keyboard input to support a keymap defined in an .ini file. I know this is the stable branch, but this is something I really feel is needed, and since I'm making very drastic changes to the engine in the 0.3 branch, I figured this is probably the best place to put it. The current way keystrokes are processed has a lot of limitations, which is why I'm revamping that layer.
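To give an idea of what a keymap in an .ini file could look like, here's a rough C++ sketch of reading `Action = Key` entries from a `[keymap]` section. The section name, the action and key names, and `loadKeymap` itself are all assumptions for illustration; the real file format hasn't been settled here.

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Strips leading and trailing whitespace.
static std::string trim(const std::string &s) {
    std::size_t b = 0, e = s.size();
    while (b < e && std::isspace(static_cast<unsigned char>(s[b]))) ++b;
    while (e > b && std::isspace(static_cast<unsigned char>(s[e - 1]))) --e;
    return s.substr(b, e - b);
}

// Hypothetical loader: collects "Action = Key" lines from the [keymap]
// section of an ini stream, ignoring comments and other sections.
std::map<std::string, std::string> loadKeymap(std::istream &in) {
    std::map<std::string, std::string> keymap;
    std::string line, section;
    while (std::getline(in, line)) {
        line = trim(line);
        if (line.empty() || line[0] == ';' || line[0] == '#') continue;
        if (line.front() == '[' && line.back() == ']') {
            section = trim(line.substr(1, line.size() - 2));
            continue;
        }
        if (section != "keymap") continue;
        std::size_t eq = line.find('=');
        if (eq == std::string::npos) continue;  // malformed line: skip it
        std::string action = trim(line.substr(0, eq));
        std::string key = trim(line.substr(eq + 1));
        if (!action.empty()) keymap[action] = key;  // later entries win
    }
    return keymap;
}
```

You'd call it with an `std::ifstream` opened on the config file (or an `std::istringstream` when testing); the input layer would then translate each key string into whatever key code it uses internally.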
Lastly, I'm still planning to implement a change to the rendering system that will improve performance. In the glestadv.ini file, there will be a "low memory usage" setting that dictates how interpolation is performed. If it's set, there will be a single interpolation buffer (instead of one per model), which gives the same CPU usage as now but lower memory consumption. If it's not set, each unit will have its own interpolation buffer (usually using more memory), but CPU usage will be greatly reduced. I won't get into the details of that now. I'm not sure whether that will land in 0.2.11 or 0.2.12, but I want it in the 0.2 branch because I want to improve performance as much as possible in the stable branch while we (hopefully with some contributions from others, otherwise just "I") work on the 0.3 branch. That branch will integrate Lua and all the other Glest 3.2 goodies, add new skills, add more flexible particle systems, add LOD scaling for better rendering at far distances, integrate ODE (a physics engine), improve animations and death sequences (maybe even skeletal animation with bone weight support), etc.
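Without going into the actual details, here is one plausible way to picture the memory/CPU trade-off, as a hypothetical C++ sketch. It assumes the per-unit saving comes from caching the interpolated vertices and only recomputing when the animation progress changes. That is an assumption, not necessarily how the engine will do it, and all names here are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Linearly interpolates vertex positions between two keyframes.
void interpolate(const std::vector<float> &a, const std::vector<float> &b,
                 float t, std::vector<float> &out) {
    assert(a.size() == b.size());  // keyframes must match in vertex count
    out.resize(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = a[i] + (b[i] - a[i]) * t;
    }
}

// "Low memory usage" mode: one scratch buffer shared by every unit, so the
// interpolation has to be redone every time any unit is drawn.
struct SharedInterpolationBuffer {
    std::vector<float> vertices;
    int recomputes = 0;

    const std::vector<float> &get(const std::vector<float> &a,
                                  const std::vector<float> &b, float t) {
        interpolate(a, b, t, vertices);
        ++recomputes;
        return vertices;
    }
};

// Per-unit mode: each unit keeps its own buffer and only recomputes when its
// animation progress changes (assumes `t` identifies the pose for this unit).
struct UnitInterpolationCache {
    std::vector<float> vertices;
    float cachedT = -1.0f;  // -1 means nothing cached yet
    int recomputes = 0;

    const std::vector<float> &get(const std::vector<float> &a,
                                  const std::vector<float> &b, float t) {
        if (t != cachedT) {
            interpolate(a, b, t, vertices);
            cachedT = t;
            ++recomputes;
        }
        return vertices;
    }
};
```

The shared buffer costs one vertex array in total; the per-unit cache costs one per unit but skips the work whenever a unit's pose hasn't changed between draws (e.g. idle units, or several renders within one animation step).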