It appears that right now all MG headless servers are missing again. I checked the VPS and everything seems to be running as usual. I'm at work right now so I can't do any further tests, but maybe some of you could check whether it's possible to connect directly to one of the engineer servers.
78.47.225.60:61357/61457.
I can confirm this. I just tried to connect the game to 78.47.225.60:61357 and, after some seconds, got bumped. Same on summoner. netcat to all three servers returns the MegaGlest protocol greeting quickly. So this situation is the same as before. Summoner is still logging to debug log (now at 6.8 GB) as well as to standard output (server.log).
And the backtrace on the running process looks the same as before again:
megaglest@rms:~/megaglest$ gdb -q -n -ex bt -batch megaglest `pidof megaglest`
warning: Selected architecture i386:x86-64 is not compatible with reported target architecture i386
warning: Architecture rejected target-supplied description
[Thread debugging using libthread_db enabled]
[New Thread 0x2ba1fa510700 (LWP 5863)]
[New Thread 0x2ba1f9d0f700 (LWP 5862)]
[New Thread 0x2ba1f950e700 (LWP 5861)]
[New Thread 0x2ba1f8d0d700 (LWP 5860)]
[New Thread 0x2ba1f850c700 (LWP 5859)]
[New Thread 0x2ba1f7d0b700 (LWP 5858)]
[New Thread 0x2ba1f6af2700 (LWP 5856)]
[New Thread 0x2ba1f62aa700 (LWP 5855)]
[New Thread 0x2ba1f4ea0700 (LWP 5853)]
Can't read symbols from system-supplied DSO at 0x7fff93fe5000: File truncated
0x00002ba1ee1004bd in nanosleep () from /lib/x86_64-linux-gnu/libpthread.so.0
#0 0x00002ba1ee1004bd in nanosleep () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00002ba1edeb19eb in SDL_Delay () from /usr/lib/libSDL-1.2.so.0
#2 0x00000000006f1038 in Glest::Game::ProgramState::canRender (this=0x11c17d20, sleepIfCannotRender=true) at /home/softcoder/Code/megaglest/branches/release-3.6.0.3/source/glest_game/main/program.cpp:84
#3 0x00000000007adb8f in Glest::Game::MainMenu::render (this=0x11c17d20) at /home/softcoder/Code/megaglest/branches/release-3.6.0.3/source/glest_game/menu/main_menu.cpp:103
#4 0x00000000006f1a02 in Glest::Game::Program::loopWorker (this=0x11b75290) at /home/softcoder/Code/megaglest/branches/release-3.6.0.3/source/glest_game/main/program.cpp:360
#5 0x00000000006e8cf3 in Glest::Game::glestMain (argc=<value optimized out>, argv=<value optimized out>) at /home/softcoder/Code/megaglest/branches/release-3.6.0.3/source/glest_game/main/main.cpp:3628
#6 0x00000000006eb323 in Glest::Game::glestMainWrapper (argc=5, argv=0x7fff93f5a628) at /home/softcoder/Code/megaglest/branches/release-3.6.0.3/source/glest_game/main/main.cpp:3784
#7 0x00002ba1f0ae8eff in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#8 0x00000000005bffb9 in _start () at ../sysdeps/x86_64/elf/start.S:113
TCP connections to the first FTP port, 61358, are accepted but immediately dropped. To all other FTP ports they are rejected. I believe this is standard procedure as long as you have not 'authenticated' via the game protocol.
However, if you use the game (client) to connect to the game server, and also connect via FTP at the same time, you can actually get an FTP server greeting (220 Hi), authenticate via FTP, and browse directories until the connection on the game protocol gets dropped.
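For anyone who wants to repeat the port checks without netcat, here is a minimal Python sketch of the same idea: connect, wait briefly, and report whether the port refused the connection, accepted and then dropped it, or sent a greeting. The function name `probe_port` and the status strings are made up for this example and are not part of MegaGlest; the 256-byte read size and the timeout are arbitrary assumptions.

```python
import socket


def probe_port(host, port, timeout=5.0):
    """Classify how a TCP port reacts to a plain connection attempt.

    Returns a (status, data) tuple where status is one of:
      "refused"     - the connection was rejected outright
      "unreachable" - connecting failed for another reason (e.g. timeout)
      "closed"      - accepted, then dropped without sending anything
      "silent"      - accepted, but nothing arrived within the timeout
      "greeting"    - accepted, and the server sent some bytes first
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return ("refused", b"")
    except OSError:
        # Covers connect timeouts, unreachable hosts, DNS failures, etc.
        return ("unreachable", b"")

    with sock:
        try:
            data = sock.recv(256)
        except socket.timeout:
            return ("silent", b"")
        except ConnectionResetError:
            return ("closed", b"")

    if not data:
        # An empty read means the peer closed the connection.
        return ("closed", b"")
    return ("greeting", data)
```

Calling `probe_port("78.47.225.60", 61357)` should then report a greeting if the game port behaves as described above, while the first FTP port would be expected to come back as "closed" and the other FTP ports as "refused", as long as no game client is connected.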
Strangely, the Annex server (which is hosted on the same VPS) is still available on the Annex master list (http://annex.megaglest.org/).
That's true. We need to keep in mind, though, that the MegaGlest servers are used far more heavily than the Annex one. I assume the situation we see here is triggered by some client with sketchy Internet access, causing packets to be received with broken checksums, or in an order the game does not expect (due to one packet being retransmitted but not others), or with a delay the game does not expect. We will need to wait for Softcoder to have a chance to look at the logs.
Edit:
@tomreyn: if possible could you please check the last announcement times again?
The last announcement received from engineer.megaglest.org with externalconnectport=61357 was at [23/Mar/2012:03:36:35 +0000].
The last announcement received from engineer.megaglest.org with externalconnectport=61457 was at [23/Mar/2012:03:35:54 +0000].
The last announcement received from summoner.megaglest.org with externalconnectport=61357 was at [22/Mar/2012:18:46:48 +0000].
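To compare these timestamps without doing the arithmetic by hand, a small Python sketch like the following could parse them and report how long a server has been silent. The helper names and the 15-minute staleness threshold are assumptions made for illustration; the actual announcement interval used by the master server is not stated in this thread. Note that `%b` relies on an English or C locale for month names.

```python
from datetime import datetime, timedelta

# Timestamp layout as it appears in the announcement times quoted above.
LOG_TIME_FORMAT = "[%d/%b/%Y:%H:%M:%S %z]"


def parse_announcement_time(stamp):
    """Turn e.g. '[23/Mar/2012:03:36:35 +0000]' into an aware datetime."""
    return datetime.strptime(stamp, LOG_TIME_FORMAT)


def silence(last_seen, now):
    """Return how much time passed between two log-style timestamps."""
    return parse_announcement_time(now) - parse_announcement_time(last_seen)


def is_stale(last_seen, now, max_gap=timedelta(minutes=15)):
    """True if the last announcement is older than max_gap.

    The 15-minute default is a hypothetical threshold, not a MegaGlest value.
    """
    return silence(last_seen, now) > max_gap


# How long summoner had been silent when engineer (port 61357) last announced.
gap = silence("[22/Mar/2012:18:46:48 +0000]", "[23/Mar/2012:03:36:35 +0000]")
print(gap)  # 8:49:47
```

So summoner stopped announcing almost nine hours before the two engineer servers did, which suggests the servers went quiet independently rather than all at once.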
Edit2:
Just remembered that, AFAIK, the restart script treba wrote some time ago is still active for the Annex server, so it's no wonder that one is still up.
OK, that's a good explanation, too. Feel free to reactivate it on the engineer MegaGlest game servers for now. I think we have all the information we can get about this situation for now; recreating it won't be interesting again until we have an attempted fix. Unless we also make a new release, I'm afraid that fix will need to be based on 3.6.0.3 to be useful for immediate testing, since we do not yet know what exactly triggers this situation.
Edit: I've just restarted summoner so people have something to play on. I've also started a tcpdump listening on TCP port 61357 only; maybe analyzing this later will provide more information on what's going wrong.