Topic: [fixed] 3.6.0.3: Server stops accepting network players, announcing to MS

tomreyn

  • Local Moderator
  • Airship
  • ********
  • Posts: 2,764
    • View Profile
    • MegaGlest - the free and open source cross platform 3D real-time strategy game
This is again about summoner, the server I've reported about previously, which is a sometimes rather stressed VM.
This time it did not crash, but, like a couple of times before (I just never reported this, not knowing where to start), it stopped accepting inbound connections (even attempts to connect directly by specifying its IP address failed after some 10 seconds) and failed to get, or stay, listed on the masterserver.

I'm not sure why this happened or what exactly triggers it. The server log indicates that there were some peek errors before it stopped working. I also ran gdb several times, each time attaching to the process, getting a backtrace and detaching again, and always got the same trace.
« Last Edit: 12 November 2012, 00:53:43 by tomreyn »
atibox: Ryzen 1800X (8 cores @3.6GHz), 32 GB RAM, MSI Radeon RX 580 Gaming X 8G, PCI subsystem ID [1462:3417], (Radeon RX 580 chipset, POLARIS10) @3440x1440; latest stable Ubuntu release, (open source) radeon (amdgpu) / mesa video driver
atibox (old): Core2Quad Q9400 (4 cores @2.66GHz), 8 GB RAM, XFX HD-467X-DDF2, PCI subsystem ID [1682:2931], (Radeon HD 4670, RV730 XT) @1680x1050; latest stable Ubuntu release, (open source) radeon / mesa video driver
notebook: HP envy13d020ng
internet access: VDSL2+

· · · How YOU can contribute to MG · Latest development snapshot · How to build yourself · Megapack techtree · Currently hosted MG games · · ·

softcoder

  • MegaGlest Team
  • Battle Machine
  • ********
  • Posts: 2,239
    • View Profile
Re: 3.6.0.3: Server stops to accept connections
« Reply #1 on: 21 March 2012, 05:13:06 »
If possible, I would like to see:

Code: [Select]
debugNetwork=true
on this server to hopefully log network information. Perhaps it will give more detail.

x211321

  • Guest
Re: 3.6.0.3: Server stops to accept connections
« Reply #2 on: 21 March 2012, 07:27:34 »
This also happened to engineer 1 and 2 yesterday: they were not listed on the master all day long until I restarted them. Now they are doing fine again. I didn't do any further tests for lack of time, but in general all headless servers seemed to be missing when I checked the master list.

tomreyn

Re: 3.6.0.3: Server stops to accept connections
« Reply #3 on: 21 March 2012, 10:45:07 »
Hmm this sounds more like a network outage at Hetzner then - it's been this way before. None has been documented, though. Still this should not stop inbound connections to summoner.

I'll enable the network debug log on summoner but it can take some days or a week until this situation occurs again (but then it's fine, since the logfile is overwritten on the next run). Maybe we can catch it next time. I'm not sure whether it matters, but IF it does then the INI option should be spelled in camel case:
Code: [Select]
DebugNetwork=true
For what it's worth, I've looked up the announcements by the three gameservers on the masterserver logs:

The last announcement received by the masterserver from the Summoner server was at [20/Mar/2012:16:20:05 +0000]. After this gap, the next announcement received was at [21/Mar/2012:02:41:52 +0000].
The last announcement received by the masterserver from the Engineer server listening on port 61357 was at [20/Mar/2012:03:59:46 +0000]. After this gap, the next announcement received was at [20/Mar/2012:19:12:39 +0000].
The last announcement received by the masterserver from the Engineer server listening on port 61457 was at [20/Mar/2012:03:57:53 +0000]. After this gap, the next announcement received was at [20/Mar/2012:19:12:40 +0000].

So the times when these servers' announcements stopped being received by the masterserver differ quite a bit, probably enough to rule out a relation between these events, such as a network outage on the masterserver side.
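To make the comparison above concrete, the length of each gap and the spread between the moments the servers went silent can be computed from the quoted log times. This is only a minimal sketch: the labels and function names are made up for illustration, and only the timestamps come from the masterserver log excerpts.

```python
from datetime import datetime

# Timestamps copied from the masterserver log excerpts quoted above.
# Each entry: (label, last announcement before the gap, first one after it).
GAPS = [
    ("summoner", "20/Mar/2012:16:20:05 +0000", "21/Mar/2012:02:41:52 +0000"),
    ("engineer:61357", "20/Mar/2012:03:59:46 +0000", "20/Mar/2012:19:12:39 +0000"),
    ("engineer:61457", "20/Mar/2012:03:57:53 +0000", "20/Mar/2012:19:12:40 +0000"),
]

# Access-log style timestamp; %b expects English month names (C/English locale).
LOG_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def parse_log_time(text):
    """Turn a log timestamp such as '20/Mar/2012:16:20:05 +0000' into a datetime."""
    return datetime.strptime(text, LOG_TIME_FORMAT)


def gap_lengths(entries):
    """Return {label: timedelta} for the silence between two announcements."""
    return {
        label: parse_log_time(after) - parse_log_time(before)
        for label, before, after in entries
    }


def outage_start_spread(entries):
    """Return the time between the earliest and the latest start of an outage."""
    starts = [parse_log_time(before) for _, before, _ in entries]
    return max(starts) - min(starts)


if __name__ == "__main__":
    for label, length in gap_lengths(GAPS).items():
        print(f"{label}: silent for {length}")
    print(f"outages began up to {outage_start_spread(GAPS)} apart")
```

With these values, summoner was silent for 10:21:47 and the two engineer servers for 15:12:53 and 15:14:47, while the outages began up to 12:22:12 apart.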

x211321: Two questions for you:
  • When the engineer server stopped announcing to the master server, did you also try to connect to it by running something like this? ./start_megaglest --connecthost=78.47.225.60
  • The times of the last received announcements before the gap on the engineer servers are close to 04:00 AM - cronjob time?
« Last Edit: 21 March 2012, 11:39:47 by tomreyn »

tomreyn

Re: 3.6.0.3: Server stops to accept connections
« Reply #4 on: 21 March 2012, 15:02:37 »
The exact same situation has occurred again today with summoner: the masterserver no longer receives its announcements and directly connecting a MegaGlest client to the IP address of the server by running ./start_megaglest --connecthost=173.0.51.246 fails after some seconds (during which you only get to see the background scene, and when it fails you're bumped to the LAN screen). Setting up a TCP connection to the server works, however:
Code: [Select]
$ nc -vv 173.0.51.246 61357 | hd
Connection to 173.0.51.246 61357 port [tcp/*] succeeded!
00000000  01 2a 41 00 00 76 33 2e  36 2e 30 2e 33 2d 47 4e  |.*A..v3.6.0.3-GN|
00000010  55 43 3a 20 34 30 34 30  31 20 5b 36 34 62 69 74  |UC: 40401 [64bit|
00000020  5d 2d 52 65 76 3a 20 33  30 38 30 00 00 00 00 00  |]-Rev: 3080.....|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000080  00 00 00 00 00 72 6d 73  00 00 00 00 00 00 00 00  |.....rms........|
00000090  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000a0  00 00 00 00 00 03 00 01  00 00 00 00 ae ef 00 00  |................|
000000b0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000000ec
The connection remained open at this point and I pressed Ctrl-D to end it.
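The netcat check above can also be scripted. The sketch below is an assumption-laden illustration, not part of MegaGlest: `read_greeting` and `printable_runs` are hypothetical helper names, and nothing here interprets the protocol. It only pulls the readable ASCII runs out of whatever bytes the server sends first. `SAMPLE` holds the first 48 bytes of the hexdump shown above.

```python
import socket


def read_greeting(host, port, timeout=5.0, max_bytes=4096):
    """Open a TCP connection and return the first chunk the server sends.

    A single recv() may return less than the full greeting; that is enough
    to tell "the server answers" apart from "the server is silent".
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        return sock.recv(max_bytes)


def printable_runs(data, min_len=3):
    """Return runs of printable ASCII of at least min_len bytes (like `strings`)."""
    runs = []
    current = bytearray()
    for byte in data:
        if 0x20 <= byte <= 0x7E:
            current.append(byte)
            continue
        if len(current) >= min_len:
            runs.append(current.decode("ascii"))
        current = bytearray()
    if len(current) >= min_len:
        runs.append(current.decode("ascii"))
    return runs


# First three lines of the hexdump above, copied byte for byte.
SAMPLE = bytes.fromhex(
    "01 2a 41 00 00 76 33 2e 36 2e 30 2e 33 2d 47 4e"
    "55 43 3a 20 34 30 34 30 31 20 5b 36 34 62 69 74"
    "5d 2d 52 65 76 3a 20 33 30 38 30 00 00 00 00 00"
)

if __name__ == "__main__":
    print(printable_runs(SAMPLE))
```

Against a live server one would call `printable_runs(read_greeting(host, 61357))` with the host to be checked. A non-empty result only shows that the greeting is still sent, which is exactly the state described here: TCP works, but the game login does not complete.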

Obviously the process is still running on summoner:
Code: [Select]
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
10xx     28xxx  1.5  1.0 512940 10652 ?        Sl   15:03   2:15 ./megaglest --verbose --ini-path=./ --data-path=./ --headless-server-mode=vps,exit
(I've redacted PID and UID.)

The server's timezone is MSK, which seems to be UTC+0400. The clock there is currently 1.5 minutes behind; apparently it's not really synced to a timeserver (even though ntpd is installed).

And this time we have, in addition to the verbose terminal log, also (network) debug logs (XZ, not TAR, compressed file sized 3.5 MB, expands to 1.1 GB). The last lines of the terminal log (peek error) make me think that when I connected to the server (the clientIP listed there was mine then) the problem which occurred there was related to the FTP server.

For what it's worth, the file system has not run full; there is plenty of space left. It clearly seems like an issue with either the code or the server's virtualization layer/hardware.

Code: [Select]
megaglest@rms:~$ uname -a
Linux rms 2.6.18-274.7.1.el5.028stab095.1 #1 SMP Mon Oct 24 20:49:24 MSD 2011 x86_64 x86_64 x86_64 GNU/Linux

This is an Ubuntu 11.04 x86_64 system running in an OpenVZ container with a current Redhat Enterprise Linux 5 based kernel image (which is based on a stone age Linux version).

softcoder

Re: 3.6.0.3: Server stops to accept connections
« Reply #5 on: 21 March 2012, 15:06:55 »
I'll look into the logs and see if I can find anything. The fact that you can connect (even from the game client) is good to know. After connecting, the server sends an authentication packet back, and if the client does not get it within about 10 seconds it hangs up. This makes me think we have an issue with the TCP/IP layer, perhaps relating to sending packets.
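The "hang up if nothing arrives within about 10 seconds" behaviour described here is a common pattern. The following is a generic sketch of it, not MegaGlest's actual (C++) client code: `recv_exact` is a hypothetical helper, and the deadline is a parameter rather than the game's real value.

```python
import socket
import time


def recv_exact(sock, size, deadline_seconds):
    """Read exactly `size` bytes, or raise TimeoutError once the deadline passes.

    The deadline covers the whole read, not each recv() call, so a peer that
    trickles in one byte at a time cannot keep the caller waiting forever.
    """
    deadline = time.monotonic() + deadline_seconds
    received = bytearray()
    while len(received) < size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("no complete packet before the deadline")
        sock.settimeout(remaining)
        try:
            chunk = sock.recv(size - len(received))
        except socket.timeout:
            raise TimeoutError("no complete packet before the deadline") from None
        if not chunk:
            raise ConnectionError("peer closed the connection")
        received += chunk
    return bytes(received)


if __name__ == "__main__":
    # Demonstration on a local socket pair: the "server" side stays silent,
    # so the "client" side gives up after the deadline.
    client, server = socket.socketpair()
    try:
        recv_exact(client, 4, deadline_seconds=0.5)
    except TimeoutError as error:
        print("client hangs up:", error)
    finally:
        client.close()
        server.close()
```

A client built this way would show the symptom reported in this thread whenever the server accepts the TCP connection but never delivers the expected packet.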

tomreyn

Re: 3.6.0.3: Server stops to accept connections
« Reply #6 on: 21 March 2012, 15:21:09 »
Just to ensure we're talking about the same thing: the game client can connect to the server in terms of setting up a TCP connection and does not get booted immediately, that's correct. And as the netcat output shows, the server does send a greeting. But the client never reaches the point where it is fully logged into the server so it never displays the server screen (the one with the list of connected players, the game setup, the map preview etc.). But this is probably the result of the authentication failing, which is what you suggest may be happening. So we're probably talking about the same thing in the end.

Edit: fixed spelling (sorry)
« Last Edit: 21 March 2012, 19:10:44 by tomreyn »

x211321

Re: 3.6.0.3: Server stops to accept connections
« Reply #7 on: 21 March 2012, 18:18:52 »
    Quote
    x211321: Two questions for you:
    • When the engineer server stopped announcing to the master server, did you also try to connect to it by running something like this? ./start_megaglest --connecthost=78.47.225.60
    No, I didn't have much time then and just wanted them up again, so all I did was restart them.
    Quote
    • The times of the last received announcements before the gap on the engineer servers is close to 04:00 AM - cronjob time?
    No, there is no cronjob at that time. At least none that would explain such behaviour.

    Usually treba is the one in charge of the MG servers, though, so I might be missing something here; I don't have that much of an overview.

    tomreyn

    It just happened again (on summoner).

    Again we've got the verbose terminal log as well as the debug logs (both XZ, not TAR, compressed, and the debug log again expands to something ridiculously big).

    I also did the gdb trick again and the backtrace came out exactly the same (same files, line numbers) as seen above.

    x211321

    It appears that right now all MG headless servers are missing again. I checked the VPS and everything seems to be running as usual. I'm at work right now so I can't do any further tests, but maybe some of you could check whether it's possible to connect directly to one of the engineers.

    78.47.225.60:61357/61457.

    Strangely the Annex server (which is hosted on the same vps) is still available on the annex master list (http://annex.megaglest.org/)

    Edit:
    @tomreyn: if possible could you please check the last announcement times again?

    Edit2:
    Just remembered that afaik the restart script treba wrote some time ago is still active for the annex server so it's no wonder that that one is still up.
    « Last Edit: 23 March 2012, 10:31:29 by x211321 »

    tomreyn

    Quote
    It appears that right now all MG headless servers are missing again. I checked the VPS and everything seems to be running as usual. I'm at work right now so I can't do any further tests, but maybe some of you could check whether it's possible to connect directly to one of the engineers.

    78.47.225.60:61357/61457.

    I can confirm this. I just tried to connect the game to 78.47.225.60:61357 and, after some seconds, got bumped. Same on summoner. netcat to all three servers returns the MegaGlest protocol greeting quickly. So this situation is the same as before. Summoner is still logging to debug log (now at 6.8 GB) as well as to standard output (server.log).

    And the backtrace on the running process looks the same as before again:
    Code: [Select]
    megaglest@rms:~/megaglest$ gdb -q -n -ex bt -batch megaglest `pidof megaglest`

    warning: Selected architecture i386:x86-64 is not compatible with reported target architecture i386

    warning: Architecture rejected target-supplied description
    [Thread debugging using libthread_db enabled]
    [New Thread 0x2ba1fa510700 (LWP 5863)]
    [New Thread 0x2ba1f9d0f700 (LWP 5862)]
    [New Thread 0x2ba1f950e700 (LWP 5861)]
    [New Thread 0x2ba1f8d0d700 (LWP 5860)]
    [New Thread 0x2ba1f850c700 (LWP 5859)]
    [New Thread 0x2ba1f7d0b700 (LWP 5858)]
    [New Thread 0x2ba1f6af2700 (LWP 5856)]
    [New Thread 0x2ba1f62aa700 (LWP 5855)]
    [New Thread 0x2ba1f4ea0700 (LWP 5853)]
    Can't read symbols from system-supplied DSO at 0x7fff93fe5000: File truncated
    0x00002ba1ee1004bd in nanosleep () from /lib/x86_64-linux-gnu/libpthread.so.0
    #0  0x00002ba1ee1004bd in nanosleep () from /lib/x86_64-linux-gnu/libpthread.so.0
    #1  0x00002ba1edeb19eb in SDL_Delay () from /usr/lib/libSDL-1.2.so.0
    #2  0x00000000006f1038 in Glest::Game::ProgramState::canRender (this=0x11c17d20, sleepIfCannotRender=true) at /home/softcoder/Code/megaglest/branches/release-3.6.0.3/source/glest_game/main/program.cpp:84
    #3  0x00000000007adb8f in Glest::Game::MainMenu::render (this=0x11c17d20) at /home/softcoder/Code/megaglest/branches/release-3.6.0.3/source/glest_game/menu/main_menu.cpp:103
    #4  0x00000000006f1a02 in Glest::Game::Program::loopWorker (this=0x11b75290) at /home/softcoder/Code/megaglest/branches/release-3.6.0.3/source/glest_game/main/program.cpp:360
    #5  0x00000000006e8cf3 in Glest::Game::glestMain (argc=<value optimized out>, argv=<value optimized out>) at /home/softcoder/Code/megaglest/branches/release-3.6.0.3/source/glest_game/main/main.cpp:3628
    #6  0x00000000006eb323 in Glest::Game::glestMainWrapper (argc=5, argv=0x7fff93f5a628) at /home/softcoder/Code/megaglest/branches/release-3.6.0.3/source/glest_game/main/main.cpp:3784
    #7  0x00002ba1f0ae8eff in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
    #8  0x00000000005bffb9 in _start () at ../sysdeps/x86_64/elf/start.S:113
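
Comparing repeated backtraces by eye is error-prone because addresses and thread IDs can differ between runs. As a rough sketch (the helper names `frame_signature` and `same_backtrace` are invented here, and the regular expression only targets the gdb frame layout seen in the output above), each frame can be reduced to its function name plus source location or library:

```python
import re

# Matches gdb frame lines such as:
#   #0  0x00002ba1ee1004bd in nanosleep () from /lib/.../libpthread.so.0
#   #2  0x00000000006f1038 in Ns::Class::method (this=0x1) at /path/file.cpp:84
FRAME_RE = re.compile(
    r"^\s*#(?P<index>\d+)\s+"
    r"(?:0x[0-9a-fA-F]+\s+in\s+)?"
    r"(?P<function>[^\s(]+)\s*\(.*?\)"
    r"(?:\s+at\s+(?P<location>\S+:\d+)|\s+from\s+(?P<library>\S+))?\s*$"
)


def frame_signature(backtrace_text):
    """Return [(function, source location or library), ...] for each frame.

    Addresses, argument values and non-frame lines (warnings, thread
    notices) are ignored so that two runs can be compared directly.
    """
    frames = []
    for line in backtrace_text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            where = match.group("location") or match.group("library")
            frames.append((match.group("function"), where))
    return frames


def same_backtrace(first, second):
    """True if both backtraces pass through the same functions and lines."""
    return frame_signature(first) == frame_signature(second)
```

Feeding two saved gdb outputs to `same_backtrace` confirms "same files, same line numbers" mechanically. Note that `bt` prints only the currently selected thread, which is the main thread in the output above; gdb's `thread apply all bt` would be needed to also see what the other nine threads are doing.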

    TCP connections to the first FTP port, 61358, are accepted but immediately dropped. To all other FTP ports they are rejected. I believe this is standard procedure as long as you have not 'authenticated' via the game protocol.
    However, if you use the game (client) to connect to the game server, and also connect via FTP at the same time, you can actually get an FTP server greeting (220 Hi), authenticate via FTP and browse directories until the connection on the game protocol gets dropped.

    Quote
    Strangely the Annex server (which is hosted on the same vps) is still available on the annex master list (http://annex.megaglest.org/)

    That's true. We need to keep in mind, though, that the MegaGlest servers are way more used than the Annex one. I assume this situation we see here is triggered by some client with sketchy Internet access, causing packets with broken checksums to be received, or in an order not expected by the game (due to one packet being retransmitted but not others), or with a delay not expected by the game. We will need to wait for Softcoder to have a chance to look at the logs.

    Quote
    Edit:
    @tomreyn: if possible could you please check the last announcement times again?

    The last announcement received from engineer.megaglest.org with externalconnectport=61357 was at [23/Mar/2012:03:36:35 +0000].
    The last announcement received from engineer.megaglest.org with externalconnectport=61457 was at [23/Mar/2012:03:35:54 +0000].
    The last announcement received from summoner.megaglest.org with externalconnectport=61357 was at [22/Mar/2012:18:46:48 +0000].

    Quote
    Edit2:
    Just remembered that afaik the restart script treba wrote some time ago is still active for the annex server so it's no wonder that that one is still up.

    OK, that's a good explanation, too. Feel free to reactivate it on the engineer MegaGlest gameservers for now. I think we have gathered all the information we can about this situation for now. Recreating it won't become interesting again until we have an attempted fix. Since we do not yet know what exactly triggers the situation, that fix will (I'm afraid) need to be based on 3.6.0.3 to be useful for immediate testing, unless we also make a new release.

    Edit: I've just restarted summoner so people have something to play on. I've also started a tcpdump listening on TCP port 61357 only; maybe analyzing this later will provide more information on what's going wrong.
    « Last Edit: 23 March 2012, 12:48:10 by tomreyn »

    softcoder

    Do we have this problem on svn head? Can someone test this as there have been a number of changes around server stability and sockets which may have resolved this?

    tomreyn

    We have done ~ 0 testing of post 3.6.0.3 headless servers so far, this is really something which needs way more testing, and so I'll tag this as [testing]. That's more as a general reminder to test headless servers, since the server this report was about is now gone (and the log files saved there are lost).

    Edit (tomreyn): fixed spelling
    « Last Edit: 29 July 2012, 11:46:55 by tomreyn »

    tomreyn

    Just a little update: I've done a little bit of headless testing, but that's not comparable to having a headless server which various people (with different network connectivities) actually connect to and play games on.

    I have loose plans to organise a sneak pre-release multi-player session soon, where all (Windows and Linux) players would download a readily built game we would provide (a snapshot build of a fixed revision) and play on headless servers running the same builds. This should both allow for some useful feedback as well as allow for some proper testing before a release.

    tomreyn

    While it wasn't as organized, we've played various games including on headless servers now. Maybe not enough to have triggered this issue, but this is hard to test pre-release. So unless we learn differently, let's consider this fixed.

    Update: Actually, I tried to simulate the situation where the masterserver becomes temporarily unavailable (using tcpkill -9 host `dig master.megaglest.org +short` and a 127.0.0.1 master.megaglest.org entry in /etc/hosts). While the server got delisted on the masterserver after a while, it was re-added to the server list once communication was re-established. So this would seem to work reliably now. :)
    « Last Edit: 14 November 2012, 07:13:15 by tomreyn »