Time Nick Message
00:08 kahrl does anyone else get weird crosses above all dry shrubs when fsaa >= 2? http://i.imgur.com/zvhhjwf.png
00:08 kahrl (this screenshot is with fsaa = 16, but it happens with anything but fsaa = 1 or 0)
00:09 kahrl strangely enough it doesn't seem to happen with other plantlike nodes
00:13 kahrl this is with an ATI Radeon HD 6850 using the opensource ati driver on gentoo
00:14 VanessaE I saw that effect once before with something RBA was working on
00:14 VanessaE at the time I think it was the texture "wrapping through" the bottom of the mesh and back to the top
00:14 ShadowNinja kahrl: I can reproduce, also a line above the hand.
00:15 kahrl VanessaE: seems plausible
00:15 kahrl maybe it happens with other plantlike nodes too but I can't see it because it's too short
00:17 kahrl ShadowNinja: not getting that here
00:17 ShadowNinja kahrl: Got it with fsaa=64 or so.
00:18 kahrl ah, let me try
00:18 kahrl at fsaa=16 the extruded wield meshes are already all broken up through (lines between the pixels)
00:19 kahrl not happening with fsaa=64 either, but I think my card can't handle that setting
00:24 OldCoder I'd really appreciate help with the RE-SEND RELIABLE bug. This is a lockup that seems to be a fundamental issue.
00:24 OldCoder All obvious patches including one by Zeno` are failing
00:26 kahrl OldCoder: only sapier (and maybe c55) are likely to be able to help with that, I think
00:27 OldCoder It is regrettable; this bug is completely fatal. There is no workaround at all.
00:27 OldCoder I will need to shut down and do not wish to do so. I am looking at the code again.
00:27 OldCoder It should be a simple fix.
00:27 OldCoder But it escapes me.
00:28 kahrl OldCoder: issue #?
00:28 OldCoder It has been reported in the wild about 4 times but I don't know if it has an issue number yet
00:28 OldCoder Miner_48 did the research
00:28 OldCoder He forwarded a number of web pages which showed that other people have the issue
00:29 OldCoder Zeno` knows the code and may help when he wakes up
00:29 OldCoder He offered a patch yesterday but it had no effect
00:29 OldCoder I am experimenting with other patches now
00:29 OldCoder Essentially, connection.cpp locks up in the code that says RE-SEND RELIABLE
00:30 OldCoder Millions and millions of the lines are printed
00:30 kahrl writing an issue with all the relevant information might help sapier to get up to speed
00:30 OldCoder You are correct
00:30 OldCoder If Zeno` returns I will seek his advice regarding what to say; as he has now worked on the same code
00:32 VanessaE *looks at clock* Zeno should be here within the next hour or so
00:32 OldCoder Yes, we will see
00:32 OldCoder Now that I have started some new worlds I wish to keep them up; but am now on time IRL. I will speak again with him.
00:33 OldCoder Lockups are the worst type of issue. With crashes, at least you can restart.
00:37 kahrl I can think of something worse: map corruption :P
00:40 OldCoder It is correct
00:40 ShadowNinja OldCoder: You can write a scipt that checks the logs for those re-send messages, and restarts the server if it finds too many of them.
00:40 OldCoder ShadowNinja, the ultimate kludge :-)
00:40 ShadowNinja You'll have to pipe stdout/stderr to it.
00:40 OldCoder No
00:41 OldCoder The messages appear in debug.txt
00:41 OldCoder A daemon could watch there
00:41 OldCoder But the problem occurs often sometimes
00:41 OldCoder Restarting every few minutes is not viable
00:41 ShadowNinja OldCoder: I didn't say it wasn't hacky. ;-)
00:41 OldCoder Never mind hacky
00:41 OldCoder Is a game that shuts down every few minutes going to work?
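
The log-watching kludge ShadowNinja suggests above could be sketched roughly as follows. This is a minimal illustration, not anything posted in the channel: the function name `resend_flood`, the window size, and the threshold are all hypothetical, and the only detail taken from the log is the "RE-SEND RELIABLE" text that OldCoder reports.

```python
from collections import deque

MARKER = "RE-SEND RELIABLE"  # text OldCoder reports seeing in debug.txt


def resend_flood(lines, window=1000, threshold=500):
    """Return True once more than `threshold` of the last `window` lines contain MARKER.

    `window` and `threshold` are illustrative values, not tuned ones.
    """
    recent = deque(maxlen=window)  # 1 for a resend line, 0 for anything else
    hits = 0
    for line in lines:
        if len(recent) == window:
            hits -= recent[0]  # the oldest entry is about to be evicted
        flag = 1 if MARKER in line else 0
        recent.append(flag)
        hits += flag
        if hits > threshold:
            return True
    return False
```

Passing `sys.stdin` as `lines` would cover the pipe variant, and an open handle on debug.txt would cover the daemon variant. What to do when it returns True (restart, alert) is left out on purpose, and as OldCoder points out, a restart every few minutes is not a real fix.
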
00:41 OldCoder It appears that I have attracted a lot of mobile devices with one world
00:42 OldCoder The mobiles are confusing the code
00:42 OldCoder In short, Kindle equals Minedeath
00:42 ShadowNinja OldCoder: Yes, that works too, but I'd use the pipe so the high-verbosity messages are just kept in memory (debug logs can baloon to GBs in size with high enough log levels).
00:42 OldCoder Indeed. Not viable either way due to frequency of lockups.
00:43 OldCoder Sometimes hours, but sometimes every few minutes. This has been happening for about four days now. I have just made another experimental patch.
00:44 OldCoder This is probably a high-value issue to address, with the rise of mobiles in popularity
00:45 * kahrl tries https://gist.github.com/kahrl/9f28eca10f3d62c9bd63 right now to reproduce the problem maybe
00:47 ShadowNinja OldCoder: Is this on all of your servers?
00:48 OldCoder On two worlds so far. Ones that have recently become popular.
00:48 OldCoder You are familiar with one of them.
00:48 * OldCoder reviews the gist
00:49 OldCoder kahrl, that is interesting, what does the random disconnect do?
00:49 OldCoder random drops packets? clever
00:49 kahrl ok, doesn't seem to be enough to get it to lock up
00:49 OldCoder So it simply doesn't send them and they pile up. Will they be classified in the RE-SEND RELIABLE group?
00:50 OldCoder What constitutes a RELIABLE packet?
00:50 kahrl yeah it resends them but since the likelihood is still high that the resent ones arrive, it won't lock up
00:50 OldCoder I'm wondering if there is a bug elsewhere. If it helps, sequence numbers sometimes jump from 500 to 65000
00:50 OldCoder Would this be normal?
00:50 kahrl a reliable packet is one that one can't afford to drop
00:51 kahrl e.g. position updates can be dropped (they will be resent soon anyway), but chat messages can't
00:51 OldCoder Not RELIABLE then but ESSENTIAL ?
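
On OldCoder's question about sequence numbers jumping from 500 to 65000: nobody answers it in the log, but one possible reading is counter wrap-around. The sketch below assumes the sequence numbers are 16-bit counters (0 to 65535), which the log does not state; the names `SEQNUM_MODULUS`, `seq_distance` and `is_newer` are hypothetical. Under that assumption 65000 and 500 are only 1036 steps apart, so seeing both close together would not by itself indicate corruption.

```python
SEQNUM_MODULUS = 65536  # assumption: 16-bit sequence numbers, 0..65535


def seq_distance(older, newer):
    """Steps needed to count forward from `older` to `newer`, wrapping after 65535."""
    return (newer - older) % SEQNUM_MODULUS


def is_newer(a, b):
    """True if `a` comes after `b`, taking the shorter way around the ring."""
    d = seq_distance(b, a)
    return 0 < d < SEQNUM_MODULUS // 2
```

For example, `seq_distance(65000, 500)` counts forward through the wrap at 65535 rather than backward through 64500 values. Whether the server compares sequence numbers this way is not something the log confirms.
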
00:51 kahrl reliable = essential
00:51 OldCoder Got it, thanks
00:51 OldCoder So what might trigger millions of resends?
00:51 OldCoder If we look at it differently
00:52 kahrl a peer suddenly disappearing?
00:52 OldCoder Hm
00:52 OldCoder In this case
00:52 kahrl although it wouldn't be literally millions
00:52 OldCoder It *is*
00:52 OldCoder This is what has caused the lockups
00:52 OldCoder Millions of lines (literally) of RE-SEND RELIABLE
00:52 OldCoder Is this a clue?
00:52 kahrl over what time period?
00:53 OldCoder 1062624 of those lines today
00:53 OldCoder And that is *with* patches intended to limit them
00:53 OldCoder I suspect it was 5 million yesterday
00:55 kahrl perhaps the server fails to time out such peers for some odd reason
00:55 OldCoder kahrl, I'm guessing that something corrupts sequence numbers or queues
00:56 OldCoder The default code drops the packets, reliable or not, if the resend count exceeds 5
00:56 OldCoder Yet millions of lines are printed
00:56 OldCoder What might this imply?
00:56 OldCoder I think the resend count might be ending up as -50000 or something
00:56 OldCoder I'm adding a kludge to address this if so
00:57 * OldCoder rests briefly and thanks you
00:57 ShadowNinja OldCoder: Or reliable messages are continually being added to the queue.
00:57 OldCoder Hm
00:57 kahrl could you make a histogram of how many resends there are each minute?
00:57 OldCoder But what would add so many of them?
00:57 OldCoder Yes. If the problem occurs after the latest patch.
00:58 OldCoder Resetting the log file now
00:58 kahrl I wonder if it stays at about 12 re-sends per second all the time or if it suddenly piles up
00:58 OldCoder Let me look at yesterday's
00:59 OldCoder http://minetest.org/lockup.txt
00:59 OldCoder kahrl, if you are curious, kindly glance at this file ^
00:59 ShadowNinja OldCoder: is your `if k->resend_count > 5) break;` patch before the print line? Also, does breaking cause the packet to be removed from the queue?
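
The per-minute histogram kahrl asks for above could be produced along these lines. His "about 12 re-sends per second" figure matches 1062624 lines spread evenly over 24 hours (1062624 / 86400 ≈ 12.3), and a histogram would show whether the rate is really that flat. This is only a sketch: the log line layout is an assumption (an HH:MM:SS timestamp somewhere in each line), and `resends_per_minute` and `render` are hypothetical names.

```python
import re
from collections import Counter

MARKER = "RE-SEND RELIABLE"
# Assumption: every log line carries an HH:MM:SS timestamp; only HH:MM is kept.
TIMESTAMP = re.compile(r"\b(\d{2}:\d{2}):\d{2}\b")


def resends_per_minute(lines):
    """Count MARKER lines per HH:MM bucket; lines without a timestamp are skipped."""
    counts = Counter()
    for line in lines:
        if MARKER not in line:
            continue
        match = TIMESTAMP.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def render(counts, width=50):
    """Draw one text bar per minute, scaled so the busiest minute is `width` wide."""
    peak = max(counts.values(), default=0)
    rows = []
    for minute in sorted(counts):
        bar = "#" * max(1, counts[minute] * width // peak)
        rows.append(f"{minute} {counts[minute]:>8} {bar}")
    return "\n".join(rows)
```

A flat row of bars would point to a steady resend rate, while a sudden spike would fit the "suddenly piles up" case. Buckets are sorted as plain HH:MM strings, so a log that crosses midnight would need the date included as well.
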
01:00 ShadowNinja OldCoder: You can try to add a DisconnectPeer line there instead.
01:00 OldCoder It was before the print line. And I don't think it caused removal. Zeno` produced a patch that was to prevent this but it also did not work. I tried the DisconnectPeer an hour ago.
01:00 OldCoder I think something is corrupted else. Well, it will become clear in time.
01:01 OldCoder Thank you for remarks. I might go over all of the patches tried for more ideas.
01:01 * OldCoder rests briefly
01:01 OldCoder Zzz
01:01 proller yes, connection.cpp corrupted
01:01 OldCoder Hm? Code is believed to need work?
01:02 OldCoder I will fiddle with it further in a few minutes after a nap
01:02 kahrl that was a very brief rest :P
01:02 ShadowNinja OldCoder: proller is a troller, ignore him.
01:02 OldCoder proller troller?
01:02 OldCoder "The wheels on the troll go round and round..."
01:02 ShadowNinja OldCoder: Yep, in fact "troller" is his alt nick.
01:02 OldCoder Um, O.K.
01:03 proller yes, but sometimes i can said too true things
01:03 OldCoder I thought C55 disliked, um, trolling in this channel
01:03 * OldCoder must rest for 5 to 10 minutes
01:04 proller reverting connection.cpp to year ago state can help
01:04 VanessaE /kick proller stop trolling
01:04 VanessaE :P
01:04 ShadowNinja OC: I told him, he agreed to quiet him but said that he wouldn't bother doing it himself.
01:04 proller but i'm trying to help
01:05 VanessaE proller: reverting to code that's 10x slower and far less reliable is no solution and you KNOW it.
01:05 VanessaE strike that, more like 100x slower
01:05 proller old code was tunable to sppeds higher than now
01:05 proller and 100x stable
01:06 VanessaE then how about you offer up a patch that actually fixes the problems you perceive.
01:06 ShadowNinja Seems like no ops are arround now, other than possible kwolekr.
01:07 ShadowNinja I've PMd sfan though.
01:07 proller ShadowNinja, you need to change some spaces around
01:12 OldCoder J,
01:12 OldCoder Hm
01:12 OldCoder What if I put
01:12 OldCoder a few milliseconds delay in the loop?
01:13 OldCoder Maybe the resends are too fast
01:13 kahrl it'll probably lock up even faster
01:14 kahrl but I guess you could try
01:18 OldCoder Hm
01:18 OldCoder Even faster?
01:18 * OldCoder does not think that sounds desirable
01:22 OldCoder kahrl, it appears to be 100s of resends per second if that is what you were curious about
01:22 OldCoder But never on every packet
01:23 OldCoder Perhaps dozens instead of 100s
01:32 OldCoder ShadowNinja, disconnectPeer, forceTimeout, or both?
01:34 ShadowNinja OldCoder: forceTimeout sounds like the right function to use.
01:36 OldCoder Experimenting
01:37 OldCoder Didn't work before but still playing with it
01:37 OldCoder Added a 20ms delay on resend
01:37 OldCoder We'll see if that makes it worse
01:47 OldCoder Hm. The 20ms delay may have helped or may be coincidence.
16:01 OldCoder A problem world stayed up overnight and so did my client. I may have a patch for the RE-SEND RELIABLE problem.
16:01 OldCoder Sokomine, VanessaE, sfan5, ShadowBot, celeron55 ^
16:02 sfan5 I'd suggest pasting the patch somewhere :)
16:02 OldCoder Of course. But testing is required and I'll also ask you or others 1 or 2 key questions.
16:03 OldCoder Wished to indicate progress on an intractable problem.
16:03 OldCoder Will tweak it further and then post.
16:04 OldCoder sfan5, just 1 question for now. What are negative consequences of increasing minimum resend timeout from 0.1 to 0.5 ?
16:04 OldCoder A question for anybody else as well ^
16:05 sfan5 OldCoder: a client will experience even more delay if a packet gets lost
16:05 sfan5 s/client/socket/
16:05 OldCoder sfan5, without the increase, the game locks up
16:06 OldCoder Not the client, but the entire game, it appears
16:06 sfan5 like not being able to look around?
16:06 OldCoder The world goes dead entirely; a lockup
16:07 OldCoder For everybody
16:07 sfan5 hm
16:07 OldCoder I have spent about 5 days on this
16:07 sfan5 someone should debug that
16:07 OldCoder <- did
16:07 OldCoder But no explanation
16:07 sfan5 ..by looking at the traces of all threads when the server is locked up
16:07 OldCoder It continued to run
16:07 OldCoder But was busy with millions of RE-SEND RELIABLs
16:08 OldCoder Literally, millions of them
16:08 OldCoder My guess is corruption somewhere. Zeno` gave me a patch to force timeouts but it was insufficient.
16:08 sfan5 maybe someone can mistake the error in your patch
16:08 sfan5 wat
16:08 OldCoder He didn't make a mistake
16:08 sfan5 maybe someone can find the mistake in the patch*
16:08 OldCoder I said, millions
16:09 OldCoder There was no mistake in the patch; it was simply incomplete
16:09 OldCoder connection.cpp as it stands is not functional
16:09 OldCoder Worlds can easily fall into a state where millions of RE-SEND RELIABLEs occur
16:09 OldCoder When I say, millions, I refer to 10 to the sixth power
16:09 OldCoder That is a lot of zeroes!
16:10 OldCoder sfan5, review if you wish: http://minetest.org/lockup.txt
16:11 OldCoder ^ That file was produced by the unpatched server (i.e., upstream as it was)
16:11 OldCoder
16:11 sfan5 "WARNING: ACKed packet not in outgoing queue" o.o
16:11 OldCoder Indeed. Theories?
16:11 sfan5 lemme look at connection.cpp
16:11 OldCoder I was supposed to speak with Sapier but he has not been here for days
16:12 ShadowNinja [NickServ] Last seen : Oct 13 23:16:21 2014 (18 hours, 55 minutes, 59 seconds ago)
16:12 OldCoder I have missed him since this started
16:12 sfan5 OldCoder: is 70KB/s the expected download speed for minetest.org?
16:12 ShadowNinja OldCoder: /monitor + sapier and wait for him to come online.
16:13 OldCoder Hardly
16:13 OldCoder ShadowNinja, thank you!
16:13 Calinou is it Web download or from-server download?
16:14 OldCoder Zeno` added a forced timeout. Did not work. I added a 50ms delay on resends, code to handle corruption in resend counter field, and increased timeout periods
16:14 OldCoder These changes seem to have improved the situation
16:14 OldCoder Web download from the server can be quite fast. In fact, we are going to gigabit.
16:16 sfan5 ow
16:16 sfan5 the code style in connection.cpp does not follow the guidelines
16:18 sfan5 and why is dynamic_cast used everywhere?
16:21 OldCoder sfan5, proller and VanessaE seem to have a debate about this
16:21 OldCoder But the ACKed packet not in outgoing queue error concerns me
16:21 OldCoder As do the millions of resends
16:21 VanessaE I've no real opinion, except that I seem to have no trouble at all with the network code
16:21 OldCoder Usually millions of resends are not needed
16:21 OldCoder VanessaE, that is a clue as your worlds are busy. Yet, see the log file that I posted.
16:22 VanessaE I saw that
16:22 OldCoder My copy of 0.4.10 is probably a few weeks old. Perhaps a temporary issue?
16:22 VanessaE I went looking in my logs for such events and couldn't find anything similar in the busiest of my servers
16:23 OldCoder So, it is a combination of circumstances
16:23 VanessaE though mine aren't all *that* busy these days
16:23 OldCoder Very well
16:23 OldCoder I had a very busy few days
16:32 sfan5 OldCoder: speculation: it got an ACK for a packet it doesn't remember because the outgoing queue is probably cleaned, "UpdatePacketTooLateCounter()" seems to suggest this is not critical
16:32 OldCoder All right
16:32 sfan5 OldCoder: more speculation: the outgoing packet queue is cleared before it receives the ack for that packet and then it resends it, receives the ack "too late" .. (cycle continues)
16:32 OldCoder But the millions of RE-SENDs?
20:06 ninnghazad|2 so i tried to do a pull request, but travis will not compile my patched version, complaining about a missing function in irrlicht, while it compiles and works fine for me here. which irr-version does he use?
20:58 ShadowNinja ninnghazad|2: Probably 1.7 or 1.8, whichever one you're not using.
21:02 ninnghazad|2 1.7 it must be, already got it compiling