Author Topic: gae 0.2.13-development  (Read 9581 times)

softcoder

  • MegaGlest Team
  • Battle Machine
  • ********
  • Posts: 2,239
    • View Profile
Re: gae 0.2.13-development
« Reply #50 on: 18 February 2010, 21:03:25 »
You may be right, one of the boys did NOT use the updated build (I merged your latest updates into my local code).

So far the boys have been playing network mode all morning without ANY synch issues at all! THIS IS AWESOME!

The remaining problems that I know of are:

- Auto-repair seems to crashes GAE (single or network player mode)
- Cancel command crashes GAE (single or network player mode)

Like you said, you believe these are fixed so I'll re-check my changes with a clean 3.2.2_network branch and see what I may have merged wrong.

But the great news is that the multi-player code really seems to work in GAE now, just a few last things to weed out.

Good work Silnarm, your persistence is showing :)

EDIT:

Silnarm, could you please tell me which files to look at for the cancel and auto-repair fix? I don't see a difference between the clean SVN code and my changes (outside of network code) that would affect these items?

Thanks
« Last Edit: 18 February 2010, 21:37:00 by softcoder »

silnarm

  • Local Moderator
  • Behemoth
  • ********
  • Posts: 1,373
    • View Profile
Re: gae 0.2.13-development
« Reply #51 on: 19 February 2010, 01:00:14 »
So far the boys have been playing network mode all morning without ANY synch issues at all! THIS IS AWESOME!

You don't know how happy I am to read that... another seg-fault or two I can handle, but if the game was going out of sync again, well, I think I may have cried ;)

Quote
Silnarm, could you please tell me which files to look at for the cancel and auto-repair fix? I don't see a difference between the clean SVN code and my changes (outside of network code) that would affect these items?

Cancel Command: for this problem there is changed code in GameNetworkInterface::requestCommand() [was defined in header, is now in cpp] which constructs cancel commands with a different (new) NetworkCommand constructor.  This new constructor is actually only used for cancel commands, at first I thought 'set meeting point' would need to be handled that way too, hence the extra Vec2i param, which actually isn't needed.

Auto-Repair: Ooops... I only disabled this for the stop command... commenting out the body of UnitUpdater::doAutoRepair() [except the return NULL;] would be the quickest and probably best solution for this problem.

Additional Sync problems: There was another sync problem waiting to bite me, attack-stopped had the same 'real time' update interval problem that the stop command had... as did guard (that one probably wouldn't have been found for a while...)

I've committed additional fixes, very small changes to unit_updater.cpp and command_type.cpp, so you should be able to update without issues.

For checking changes in a revision I recommend using trac, it gives very pretty output showing code removed in red and code added in green :)

ie, yesterdays fixes, http://sourceforge.net/apps/trac/glestae/changeset/478
and the two small fixes just then, http://sourceforge.net/apps/trac/glestae/changeset/480

Thanks to you and your boys for the help here, it's difficult to do a lot of testing with multiplayer when you have to play each faction yourself ;)  Your input is greatly appreciated.  Hopefully I'll be able to get a game in against hailstone soon!
Glest Advanced Engine - Code Monkey

Timeline | Downloads

softcoder

  • MegaGlest Team
  • Battle Machine
  • ********
  • Posts: 2,239
    • View Profile
Re: gae 0.2.13-development
« Reply #52 on: 19 February 2010, 05:57:37 »
Ok making progress. Cancel is fixed but auto-repair still crashes (even when I make doAutoRepair return NULL)... which leads to the next observation... doAutoRepair always had the possibility to return NULL.. which if it does.. crashes GAE since it passes a NULL Command object and there is no NULL guards for Command objects in numerous places.

Recap:

- Cancel command fixed
- Auto-repair broken (I'll try to see if I can find a fix for auto-repair)

Thanks

silnarm

  • Local Moderator
  • Behemoth
  • ********
  • Posts: 1,373
    • View Profile
Re: gae 0.2.13-development
« Reply #53 on: 19 February 2010, 06:16:09 »
which leads to the next observation... doAutoRepair always had the possibility to return NULL.. which if it does.. crashes GAE since it passes a NULL Command object and there is no NULL guards for Command objects in numerous places.

There are guards in place, I think you are coming from a Java background, yes? They wont seem obvious, because Java will only accept an expression that evaluates to a bool in a logical test, this is not the case in C/C++.

Code: [Select]
//we can repair any ally => repair it
if ((autoCmd = doAutoRepair(unit))) {
unit->giveCommand(autoCmd);
return;
}

This assigns the value returned by doAutoRepair() to the variable autoCmd, and the expression [(autoCmd = doAutoRepair(unit))] evaluates to the same, and in C++, NULL == 0 and any integral value can be used as a logical test, 0 == false, anything else == true, so a non-null pointer is true, and a null one is false.

Probably takes some getting used to, but each call to doAutoRepair() is in such a test, and is therefore safe.  So I'm at a loss to explain the problems you are experiencing with it.

I can only hope that helps, happy hunting, and good luck!
Glest Advanced Engine - Code Monkey

Timeline | Downloads

softcoder

  • MegaGlest Team
  • Battle Machine
  • ********
  • Posts: 2,239
    • View Profile
Re: gae 0.2.13-development
« Reply #54 on: 19 February 2010, 07:51:28 »
Ok I found a fix for the auto-repair and cancel is working fine now.. ultimately there was an assert in network_types that didn't like the command type auto-repair (nor did it like cancel for that matter).

I'll get the boys to test tomorrow, but so far all main "show stopper" bugs seem to be dealt with.

P.S. I've got 15+ years of C/C++ (only worked in Java for the last 5) :)

silnarm

  • Local Moderator
  • Behemoth
  • ********
  • Posts: 1,373
    • View Profile
Re: gae 0.2.13-development
« Reply #55 on: 19 February 2010, 09:33:01 »
Ok I found a fix for the auto-repair and cancel is working fine now.. ultimately there was an assert in network_types that didn't like the command type auto-repair (nor did it like cancel for that matter).
Those are things I should have noticed, since I templated the path-finding algorithm I have yet to find an appropriate collection of optimisations for debug configuration, and only fall back to it as a last resort (yes, I debug optimised code), the old networking code clearly didn't want cancel commands going through the channels I am now sending them through, and I never saw the asserts  :-[.

Were you running a full debug (non optimised) build, or was it optimised but still with asserts? I found that without the function objects (in the path-finder) being inlined it was incredibly painful (slow), if you were running a non optimised build, how was performance?

Quote from: softcoder
P.S. I've got 15+ years of C/C++ (only worked in Java for the last 5) :)
My apologies, you know what they say about assumptions...
Glest Advanced Engine - Code Monkey

Timeline | Downloads

softcoder

  • MegaGlest Team
  • Battle Machine
  • ********
  • Posts: 2,239
    • View Profile
Re: gae 0.2.13-development
« Reply #56 on: 19 February 2010, 17:26:25 »
No apologies required :)

I always build glest using the defaults from doing ./autogen etc.

So I do code changes then just run jam. I assume its always building release mode (as that is likely the default for everyone?)

I never looked into how to change compiler settings, and all my debugging thus far is via console output (I did not figure out how to make a codeblocks project so I can step through using the debugger.. that will come one day)

Anyways, let the testing begin.

softcoder

  • MegaGlest Team
  • Battle Machine
  • ********
  • Posts: 2,239
    • View Profile
Re: gae 0.2.13-development
« Reply #57 on: 19 February 2010, 22:06:10 »
Aye carumba! There was a debug variable in the network message packet (totalB4this) which was doing a buffer overflow! Every command sent would count it up one.. after playing a while it overflows and seems to trash memory (and modify the contents of the network packet).

Removed it and we are retesting (had to do a few changes to get cancel and auto-repair fully working over network play buts its working well now)

New updates coming as we find things.

silnarm

  • Local Moderator
  • Behemoth
  • ********
  • Posts: 1,373
    • View Profile
Re: gae 0.2.13-development
« Reply #58 on: 20 February 2010, 03:27:17 »
Hehe, didn't clean up very well did I, it is a 32 bit integer though, you'd have to send a ridiculous amount of commands to get it to overflow... do you have the CircularBuffer disabled still? It could be the problem  ::)

I've just about finished merging the changes to date in 3.2.2_network back to 0.2.x ... I've had too many headaches in the past, so now I'm really paranoid about this, I had been updating the network branch with changes from 0.2.x, so it might've have been easier to do it that way again, but I think I've caused you enough problems there already

Anyway, I'll write tests for the CircularBuffer when I'm done with that, if you are currently using it, I'd be very suspicious ;)
Glest Advanced Engine - Code Monkey

Timeline | Downloads

Yggdrasil

  • Local Moderator
  • Ornithopter
  • ********
  • Posts: 408
    • View Profile
Re: gae 0.2.13-development
« Reply #59 on: 20 February 2010, 11:20:56 »
With the network branch merged back, i think we should make a beta release to get multiplayer and the other features more widely tested. As titi pointed out, there are not so many testers building from svn.

@softcoder: If you made any improvements in your working copy then please commit (now in branch 0.2.x).

wciow

  • Behemoth
  • *******
  • Posts: 968
    • View Profile
Re: gae 0.2.13-development
« Reply #60 on: 20 February 2010, 12:36:27 »
BTW is physfs going to be integrated in the next release?
Check out my new Goblin faction - https://forum.megaglest.org/index.php?topic=9658.0

silnarm

  • Local Moderator
  • Behemoth
  • ********
  • Posts: 1,373
    • View Profile
Re: gae 0.2.13-development
« Reply #61 on: 20 February 2010, 23:23:40 »
With the network branch merged back, i think we should make a beta release to get multiplayer and the other features more widely tested...

This is a probably a good idea, the set auto-repair button is currently crashing the game, I'll get that sorted out, and we need to modify the Lang object to be case insensitive (because of our ENUMS), neither of those things should take too long, so a release today should be doable.

BTW is physfs going to be integrated in the next release?

Alas, this will not be apart of 0.2.13, however I am confident the turn around time between releases should drop drastically now, so 0.3 shouldn't be more than a couple of months away.

Glest Advanced Engine - Code Monkey

Timeline | Downloads

softcoder

  • MegaGlest Team
  • Battle Machine
  • ********
  • Posts: 2,239
    • View Profile
Re: gae 0.2.13-development
« Reply #62 on: 21 February 2010, 03:52:35 »
Ok I have posted my work in a 7zip archive at:

http://www.soft-haus.com/glest/code/gae_3.2.2_network_source.7z

I am fearful at trying to attempt to merge this into the 0.2.x branch (since the differences between 0.2.x and 3.2.2_network alone are quite significant let alone my changes on top). If any of the GAE team has a desire to snag this archive and merge locally on your system (then code review.. the code isn't "clean", its more an attempt to be functional and does not represent what level I usually would do before checking into svn)

EDIT:

*NOTE: After spending the past week trying to get stable play I was not able to. This code does let multi-player work (For a long time) but there appears to be a bug after playing for a while (I use Elimnator and Tiger.. sometimes Piggy as my QA :) ) where one end of the network cannot find a unitid sent by the other end! I saw this behaviour over over again all week and I cannot see why? I also don't know if this is an old bug or a new one introduced by my changes. I know that fundamentally my changes work 100% in the mega-glest code, but porting it over to GAE wasn't so easy since I was trying to port and make changes to some game-play in GAE so that I could test the network changes.

I have painstakingly checked over and over and verified that there is a unitid on one client that does NOT exist on the other client. I don't know why or how but I know it does happen in this code, so beware! Aside from that the network plays fine (smooth) and auto-repair and cancel work fine.

Thanks
« Last Edit: 21 February 2010, 05:17:29 by softcoder »

Super Tom

  • Draco Rider
  • *****
  • Posts: 311
    • View Profile
Re: gae 0.2.13-development
« Reply #63 on: 21 February 2010, 05:25:32 »
Network instability problem is to solve the problem??? Please tell me! Thank you!

softcoder

  • MegaGlest Team
  • Battle Machine
  • ********
  • Posts: 2,239
    • View Profile
Re: gae 0.2.13-development
« Reply #64 on: 21 February 2010, 05:49:00 »
People of glest... sick him!

silnarm

  • Local Moderator
  • Behemoth
  • ********
  • Posts: 1,373
    • View Profile
Re: gae 0.2.13-development
« Reply #65 on: 21 February 2010, 06:04:25 »
Ok I have posted my work in a 7zip archive at:

Thanks softcoder, I'll take a look tonight and see if I can get it merged into 0.2.x

Quote
I am fearful at trying to attempt to merge this into the 0.2.x branch (since the differences between 0.2.x and 3.2.2_network alone are quite significant let alone my changes on top).

It's actually merged back now, so they should look very similar, but I've got your archive now, so you can leave it with me and take a well deserved break :)

Quote
*NOTE: After spending the past week trying to get stable play I was not able to. This code does let multi-player work (For a long time) but there appears to be a bug after playing for a while (I use Elimnator and Tiger.. sometimes Piggy as my QA :) ) where one end of the network cannot find a unitid sent by the other end! I saw this behaviour over over again all week and I cannot see why?

More synchronisation problems  :'(

Thanks again for all your work and testing, I just hope something obvious jumps out when reviewing your changes, if the synchronisation problem doesn't manifest itself until late in the game, it could be very difficult indeed to track down  :-\
Glest Advanced Engine - Code Monkey

Timeline | Downloads

softcoder

  • MegaGlest Team
  • Battle Machine
  • ********
  • Posts: 2,239
    • View Profile
Re: gae 0.2.13-development
« Reply #66 on: 21 February 2010, 14:15:15 »
Just a note on the synch problem. I don't think it is a "synch" problem like what we saw in the past with network packet synch issues. Somehow I think this is a newly introduced error in GAE only (but I could be wrong).

My 2 theories are:

#1: Somewhere the unit id is incorrectly calculated or assigned so that one client uses the wrong id but the other client expects the right id and therefore at the point in time where the packet is sent you get a "out of synch" problem. (I think it is likely something of this nature since I cannot reproduce any such behaviour in mega-glest)

#2: It could be that there is a problem that has been in glest multi-player all along. Perhaps when sending a "group" of network commands (commandlist) that one command "creates" a unit and another command in that same batch "moves" or attacks using that same unit. Perhaps somewhere in the glest logic it does validation when receiving this "batch" of commands and thinks that it can't find the unit ID because it hasn't created it yet because its in a batch of commands in a loop?

Those are my thoughts for now. Thanks.
« Last Edit: 21 February 2010, 14:17:56 by softcoder »