Time |
Nick |
Message |
00:04 |
|
alexxss joined #minetest-dev |
00:06 |
|
rickmcfarley joined #minetest-dev |
00:08 |
kahrl |
does anyone else get weird crosses above all dry shrubs when fsaa >= 2? http://i.imgur.com/zvhhjwf.png |
00:08 |
kahrl |
(this screenshot is with fsaa = 16, but it happens with anything but fsaa = 1 or 0) |
00:09 |
kahrl |
strangely enough it doesn't seem to happen with other plantlike nodes |
00:13 |
kahrl |
this is with an ATI Radeon HD 6850 using the opensource ati driver on gentoo |
00:14 |
VanessaE |
I saw that effect once before with something RBA was working on |
00:14 |
VanessaE |
at the time I think it was the texture "wrapping through" the bottom of the mesh and back to the top |
00:14 |
ShadowNinja |
kahrl: I can reproduce, also a line above the hand. |
00:15 |
kahrl |
VanessaE: seems plausible |
00:15 |
kahrl |
maybe it happens with other plantlike nodes too but I can't see it because it's too short |
00:17 |
kahrl |
ShadowNinja: not getting that here |
00:17 |
ShadowNinja |
kahrl: Got it with fsaa=64 or so. |
00:18 |
kahrl |
ah, let me try |
00:18 |
kahrl |
at fsaa=16 the extruded wield meshes are already all broken up through (lines between the pixels) |
00:19 |
kahrl |
not happening with fsaa=64 either, but I think my card can't handle that setting |
00:24 |
OldCoder |
I'd really appreciate help with the RE-SEND RELIABLE bug. This is a lockup that seems to be a fundamental issue. |
00:24 |
OldCoder |
All obvious patches including one by Zeno` are failing |
00:26 |
kahrl |
OldCoder: only sapier (and maybe c55) are likely to be able to help with that, I think |
00:27 |
OldCoder |
It is regrettable; this bug is completely fatal. There is no workaround at all. |
00:27 |
OldCoder |
I will need to shut down and do not wish to do so. I am looking at the code again. |
00:27 |
OldCoder |
It should be a simple fix. |
00:27 |
OldCoder |
But it escapes me. |
00:28 |
kahrl |
OldCoder: issue #? |
00:28 |
|
eeew joined #minetest-dev |
00:28 |
OldCoder |
It has been reported in the wild about 4 times but I don't know if it has an issue number yet |
00:28 |
OldCoder |
Miner_48 did the research |
00:28 |
OldCoder |
He forwarded a number of web pages which showed that other people have the issue |
00:29 |
OldCoder |
Zeno` knows the code and may help when he wakes up |
00:29 |
OldCoder |
He offered a patch yesterday but it had no effect |
00:29 |
OldCoder |
I am experimenting with other patches now |
00:29 |
OldCoder |
Essentially, connection.cpp locks up in the code that says RE-SEND RELIABLE |
00:30 |
OldCoder |
Millions and millions of the lines are printed |
00:30 |
kahrl |
writing an issue with all the relevant information might help sapier to get up to speed |
00:30 |
OldCoder |
You are correct |
00:30 |
OldCoder |
If Zeno` returns I will seek his advice regarding what to say; as he has now worked on the same code |
00:32 |
VanessaE |
*looks at clock* Zeno should be here within the next hour or so |
00:32 |
OldCoder |
Yes, we will see |
00:32 |
OldCoder |
Now that I have started some new worlds I wish to keep them up; but am now on time IRL. I will speak again with him. |
00:33 |
OldCoder |
Lockups are the worst type of issue. With crashes, at least you can restart. |
00:37 |
kahrl |
I can think of something worse: map corruption :P |
00:40 |
OldCoder |
It is correct |
00:40 |
ShadowNinja |
OldCoder: You can write a script that checks the logs for those re-send messages, and restarts the server if it finds too many of them. |
00:40 |
OldCoder |
ShadowNinja, the ultimate kludge :-) |
00:40 |
ShadowNinja |
You'll have to pipe stdout/stderr to it. |
00:40 |
OldCoder |
No |
00:41 |
OldCoder |
The messages appear in debug.txt |
00:41 |
OldCoder |
A daemon could watch there |
00:41 |
OldCoder |
But sometimes the problem occurs often |
00:41 |
OldCoder |
Restarting every few minutes is not viable |
00:41 |
ShadowNinja |
OldCoder: I didn't say it wasn't hacky. ;-) |
00:41 |
OldCoder |
Never mind hacky |
00:41 |
OldCoder |
Is a game that shuts down every few minutes going to work? |
00:41 |
OldCoder |
It appears that I have attracted a lot of mobile devices with one world |
00:42 |
OldCoder |
The mobiles are confusing the code |
00:42 |
OldCoder |
In short, Kindle equals Minedeath |
00:42 |
ShadowNinja |
OldCoder: Yes, that works too, but I'd use the pipe so the high-verbosity messages are just kept in memory (debug logs can balloon to GBs in size with high enough log levels). |
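The watchdog ShadowNinja describes could look roughly like the sketch below. It counts lines containing the RE-SEND RELIABLE text inside a sliding time window and reports when a threshold is crossed. The class name, function name, window length and threshold are all hypothetical; only the marker text comes from the discussion.

```python
import time
from collections import deque

MARKER = "RE-SEND RELIABLE"  # log text quoted in the discussion


class ResendWatch:
    """Sliding-window counter for resend log lines (hypothetical helper)."""

    def __init__(self, window_s, max_hits):
        self.window_s = window_s  # window length in seconds (assumed value)
        self.max_hits = max_hits  # allowed hits per window (assumed value)
        self.hits = deque()       # timestamps of matching lines

    def feed(self, line, now):
        """Record one log line; return True once the threshold is exceeded."""
        if MARKER in line:
            self.hits.append(now)
        # Forget hits that have fallen out of the window.
        while self.hits and now - self.hits[0] > self.window_s:
            self.hits.popleft()
        return len(self.hits) > self.max_hits


def watch_stream(lines, watch, clock=time.monotonic):
    """Feed an iterable of lines; return True if a restart looks warranted."""
    for line in lines:
        if watch.feed(line, clock()):
            return True
    return False
```

Piping the server output into a script that calls `watch_stream(sys.stdin, ResendWatch(60.0, 1000))` keeps the verbose messages in memory only, as suggested. It does nothing about the objection that restarting every few minutes is not viable.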
00:42 |
OldCoder |
Indeed. Not viable either way due to frequency of lockups. |
00:43 |
OldCoder |
Sometimes hours, but sometimes every few minutes. This has been happening for about four days now. I have just made another experimental patch. |
00:44 |
OldCoder |
This is probably a high-value issue to address, with the rise of mobiles in popularity |
00:45 |
* kahrl |
tries https://gist.github.com/kahrl/9f28eca10f3d62c9bd63 right now to reproduce the problem maybe |
00:47 |
ShadowNinja |
OldCoder: Is this on all of your servers? |
00:48 |
OldCoder |
On two worlds so far. Ones that have recently become popular. |
00:48 |
OldCoder |
You are familiar with one of them. |
00:48 |
* OldCoder |
reviews the gist |
00:49 |
OldCoder |
kahrl, that is interesting, what does the random disconnect do? |
00:49 |
OldCoder |
random drops packets? clever |
00:49 |
kahrl |
ok, doesn't seem to be enough to get it to lock up |
00:49 |
OldCoder |
So it simply doesn't send them and they pile up. Will they be classified in the RE-SEND RELIABLE group? |
00:50 |
OldCoder |
What constitutes a RELIABLE packet? |
00:50 |
kahrl |
yeah it resends them but since the likelihood is still high that the resent ones arrive, it won't lock up |
00:50 |
OldCoder |
I'm wondering if there is a bug elsewhere. If it helps, sequence numbers sometimes jump from 500 to 65000 |
00:50 |
OldCoder |
Would this be normal? |
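Whether a jump from 500 to 65000 is normal depends on how wide the sequence numbers are. The sketch below assumes 16-bit wrapping counters, which the discussion does not confirm. Under that assumption 65000 is only 1036 steps behind 500, so it could be an old packet from before the counter wrapped rather than a sign of corruption. The function names are hypothetical, and this is generic wrap-around arithmetic, not the code in connection.cpp.

```python
SEQNUM_MODULUS = 1 << 16  # assumption: sequence numbers are 16 bits wide


def seqnum_distance(older, newer):
    """Forward steps from `older` to `newer` on a wrapping counter."""
    return (newer - older) % SEQNUM_MODULUS


def is_newer(candidate, reference):
    """True if `candidate` is ahead of `reference` by less than half the range."""
    d = seqnum_distance(reference, candidate)
    return 0 < d < SEQNUM_MODULUS // 2
```

With these definitions `is_newer(500, 65000)` is true and `is_newer(65000, 500)` is false, which is what a wrap-around from the 65000s to the low hundreds would look like.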
00:50 |
kahrl |
a reliable packet is one that one can't afford to drop |
00:51 |
kahrl |
e.g. position updates can be dropped (they will be resent soon anyway), but chat messages can't |
00:51 |
OldCoder |
Not RELIABLE then but ESSENTIAL ? |
00:51 |
kahrl |
reliable = essential |
00:51 |
OldCoder |
Got it, thanks |
00:51 |
OldCoder |
So what might trigger millions of resends? |
00:51 |
OldCoder |
If we look at it differently |
00:52 |
kahrl |
a peer suddenly disappearing? |
00:52 |
OldCoder |
Hm |
00:52 |
OldCoder |
In this case |
00:52 |
kahrl |
although it wouldn't be literally millions |
00:52 |
OldCoder |
It *is* |
00:52 |
OldCoder |
This is what has caused the lockups |
00:52 |
OldCoder |
Millions of lines (literally) of RE-SEND RELIABLE |
00:52 |
OldCoder |
Is this a clue? |
00:52 |
kahrl |
over what time period? |
00:53 |
OldCoder |
1062624 of those lines today |
00:53 |
OldCoder |
And that is *with* patches intended to limit them |
00:53 |
OldCoder |
I suspect it was 5 million yesterday |
00:55 |
kahrl |
perhaps the server fails to time out such peers for some odd reason |
00:55 |
OldCoder |
kahrl, I'm guessing that something corrupts sequence numbers or queues |
00:56 |
OldCoder |
The default code drops the packets, reliable or not, if the resend count exceeds 5 |
00:56 |
OldCoder |
Yet millions of lines are printed |
00:56 |
OldCoder |
What might this imply? |
00:56 |
OldCoder |
I think the resend count might be ending up as -50000 or something |
00:56 |
OldCoder |
I'm adding a kludge to address this if so |
00:57 |
* OldCoder |
rests briefly and thanks you |
00:57 |
ShadowNinja |
OldCoder: Or reliable messages are continually being added to the queue. |
00:57 |
OldCoder |
Hm |
00:57 |
kahrl |
could you make a histogram of how many resends there are each minute? |
00:57 |
OldCoder |
But what would add so many of them? |
00:57 |
OldCoder |
Yes. If the problem occurs after the latest patch. |
00:58 |
OldCoder |
Resetting the log file now |
00:58 |
kahrl |
I wonder if it stays at about 12 re-sends per second all the time or if it suddenly piles up |
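The histogram kahrl asks for could be produced with something like the sketch below. About 12 per second is what 1062624 lines spread evenly over a full day would give (1062624 / 86400 ≈ 12.3), so per-minute buckets would show whether the rate is really steady or arrives in bursts. The log line format (a line starting with HH:MM:SS, optionally preceded by a date) is an assumption, and the function names are hypothetical.

```python
import re
from collections import Counter

MARKER = "RE-SEND RELIABLE"  # log text quoted in the discussion
# Assumption: each line starts with HH:MM:SS, optionally preceded by a date.
TIME_RE = re.compile(r"^(?:\d{4}-\d{2}-\d{2} )?(\d{2}:\d{2}):\d{2}")


def resends_per_minute(lines):
    """Count marker lines per HH:MM bucket (days are merged together)."""
    counts = Counter()
    for line in lines:
        if MARKER not in line:
            continue
        match = TIME_RE.match(line)
        if match:  # lines without a recognisable timestamp are skipped
            counts[match.group(1)] += 1
    return counts


def render(counts, width=50):
    """Render the counts as a text bar chart, one row per minute."""
    if not counts:
        return ""
    peak = max(counts.values())
    rows = []
    for minute in sorted(counts):
        bar = "#" * max(1, counts[minute] * width // peak)
        rows.append(f"{minute} {counts[minute]:>8} {bar}")
    return "\n".join(rows)
```

Calling `print(render(resends_per_minute(open("debug.txt"))))` would print one bar per minute; a flat profile and a sudden spike point to different causes.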
00:58 |
OldCoder |
Let me look at yesterday's |
00:58 |
|
rickmcfarley joined #minetest-dev |
00:59 |
OldCoder |
http://minetest.org/lockup.txt |
00:59 |
OldCoder |
kahrl, if you are curious, kindly glance at this file ^ |
00:59 |
ShadowNinja |
OldCoder: is your `if (k->resend_count > 5) break;` patch before the print line? Also, does breaking cause the packet to be removed from the queue? |
01:00 |
ShadowNinja |
OldCoder: You can try to add a DisconnectPeer line there instead. |
01:00 |
OldCoder |
It was before the print line. And I don't think it caused removal. Zeno` produced a patch that was to prevent this but it also did not work. I tried the DisconnectPeer an hour ago. |
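ShadowNinja's second question, whether hitting the limit also removes the packet, can be illustrated with a toy queue. This is a sketch with hypothetical names, not the logic of connection.cpp; the limit of 5 is the figure quoted in the discussion. It shows only that skipping an over-limit packet without removing it leaves the queue permanently non-empty. It does not explain the millions of log lines.

```python
from dataclasses import dataclass

MAX_RESENDS = 5  # limit quoted in the discussion


@dataclass
class Packet:
    seqnum: int
    resend_count: int = 0


def resend_pass(queue, drop_expired):
    """One timeout pass over the outgoing queue; returns the seqnums re-sent.

    With drop_expired=False, packets over the limit are skipped but stay
    queued, so every later pass still has to visit them.
    """
    resent = []
    survivors = []
    for pkt in queue:
        if pkt.resend_count > MAX_RESENDS:
            if not drop_expired:
                survivors.append(pkt)  # skipped, but never leaves the queue
            continue
        pkt.resend_count += 1
        resent.append(pkt.seqnum)
        survivors.append(pkt)
    queue[:] = survivors  # update the caller's list in place
    return resent
```

In both modes a packet that is never acknowledged is re-sent six times, but only with `drop_expired=True` does the queue ever become empty again.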
01:00 |
OldCoder |
I think something else is corrupted. Well, it will become clear in time. |
01:01 |
OldCoder |
Thank you for the remarks. I might go over all of the patches tried for more ideas. |
01:01 |
* OldCoder |
rests briefly |
01:01 |
OldCoder |
Zzz |
01:01 |
proller |
yes, connection.cpp corrupted |
01:01 |
OldCoder |
Hm? Code is believed to need work? |
01:02 |
OldCoder |
I will fiddle with it further in a few minutes after a nap |
01:02 |
kahrl |
that was a very brief rest :P |
01:02 |
ShadowNinja |
OldCoder: proller is a troller, ignore him. |
01:02 |
OldCoder |
proller troller? |
01:02 |
OldCoder |
"The wheels on the troll go round and round..." |
01:02 |
ShadowNinja |
OldCoder: Yep, in fact "troller" is his alt nick. |
01:02 |
OldCoder |
Um, O.K. |
01:03 |
proller |
yes, but sometimes I can say true things too |
01:03 |
OldCoder |
I thought C55 disliked, um, trolling in this channel |
01:03 |
* OldCoder |
must rest for 5 to 10 minutes |
01:04 |
proller |
reverting connection.cpp to year ago state can help |
01:04 |
VanessaE |
/kick proller stop trolling |
01:04 |
VanessaE |
:P |
01:04 |
ShadowNinja |
OC: I told him, he agreed to quiet him but said that he wouldn't bother doing it himself. |
01:04 |
proller |
but i'm trying to help |
01:05 |
VanessaE |
proller: reverting to code that's 10x slower and far less reliable is no solution and you KNOW it. |
01:05 |
VanessaE |
strike that, more like 100x slower |
01:05 |
proller |
old code was tunable to speeds higher than now |
01:05 |
proller |
and 100x stable |
01:06 |
VanessaE |
then how about you offer up a patch that actually fixes the problems you perceive. |
01:06 |
ShadowNinja |
Seems like no ops are around now, other than possibly kwolekr. |
01:07 |
ShadowNinja |
I've PMd sfan though. |
01:07 |
proller |
ShadowNinja, you need to change some spaces around |
01:12 |
OldCoder |
J, |
01:12 |
OldCoder |
Hm |
01:12 |
OldCoder |
What if I put |
01:12 |
OldCoder |
a few milliseconds delay in the loop? |
01:13 |
OldCoder |
Maybe the resends are too fast |
01:13 |
kahrl |
it'll probably lock up even faster |
01:14 |
kahrl |
but I guess you could try |
01:18 |
OldCoder |
Hm |
01:18 |
OldCoder |
Even faster? |
01:18 |
* OldCoder |
does not think that sounds desirable |
01:22 |
OldCoder |
kahrl, it appears to be 100s of resends per second if that is what you were curious about |
01:22 |
OldCoder |
But never on every packet |
01:23 |
OldCoder |
Perhaps dozens instead of 100s |
01:32 |
OldCoder |
ShadowNinja, disconnectPeer, forceTimeout, or both? |
01:34 |
ShadowNinja |
OldCoder: forceTimeout sounds like the right function to use. |
01:36 |
OldCoder |
Experimenting |
01:37 |
OldCoder |
Didn't work before but still playing with it |
01:37 |
OldCoder |
Added a 20ms delay on resend |
01:37 |
OldCoder |
We'll see if that makes it worse |
01:38 |
|
kaeza joined #minetest-dev |
01:47 |
OldCoder |
Hm. The 20ms delay may have helped or may be coincidence. |
02:04 |
|
zat joined #minetest-dev |
02:43 |
|
NakedFury joined #minetest-dev |
02:45 |
|
mos_basik__ joined #minetest-dev |
02:49 |
|
MikeFair_ joined #minetest-dev |
02:52 |
|
monte joined #minetest-dev |
03:13 |
|
GrimKriegor joined #minetest-dev |
03:21 |
|
rmilan joined #minetest-dev |
03:21 |
|
Robby joined #minetest-dev |
04:02 |
|
kaeza joined #minetest-dev |
04:31 |
|
sol_invictus joined #minetest-dev |
04:41 |
|
MikeFair joined #minetest-dev |
04:47 |
|
Miner_48er joined #minetest-dev |
05:00 |
|
kaeza joined #minetest-dev |
05:17 |
|
werwerwer joined #minetest-dev |
05:46 |
|
kaeza joined #minetest-dev |
05:46 |
|
HLuaBot joined #minetest-dev |
05:46 |
|
harrison joined #minetest-dev |
05:46 |
|
rickmcfarley joined #minetest-dev |
05:53 |
|
Hunterz joined #minetest-dev |
06:09 |
|
mos_basik joined #minetest-dev |
06:27 |
|
darkrose joined #minetest-dev |
07:17 |
|
ninnghazad joined #minetest-dev |
08:04 |
|
shmanceloticus joined #minetest-dev |
09:11 |
|
PenguinDad joined #minetest-dev |
09:53 |
|
chchjesus joined #minetest-dev |
09:56 |
|
Amaz joined #minetest-dev |
10:13 |
|
FR^2 joined #minetest-dev |
10:28 |
|
ImQ009 joined #minetest-dev |
12:53 |
|
Hunterz joined #minetest-dev |
13:03 |
|
ImQ009 joined #minetest-dev |
13:15 |
|
ImQ009 joined #minetest-dev |
13:22 |
|
iqualfragile joined #minetest-dev |
14:00 |
|
VanessaE joined #minetest-dev |
14:07 |
|
AnotherBrick joined #minetest-dev |
14:33 |
|
rickmcfarley joined #minetest-dev |
14:45 |
|
dhasenan joined #minetest-dev |
15:49 |
|
NakedFury joined #minetest-dev |
15:56 |
|
zat joined #minetest-dev |
15:57 |
|
Calinou joined #minetest-dev |
16:01 |
|
Sokomine joined #minetest-dev |
16:01 |
OldCoder |
A problem world stayed up overnight and so did my client. I may have a patch for the RE-SEND RELIABLE problem. |
16:01 |
OldCoder |
Sokomine, VanessaE, sfan5, ShadowBot, celeron55 ^ |
16:02 |
sfan5 |
I'd suggest pasting the patch somewhere :) |
16:02 |
OldCoder |
Of course. But testing is required and I'll also ask you or others 1 or 2 key questions. |
16:03 |
OldCoder |
Wished to indicate progress on an intractable problem. |
16:03 |
OldCoder |
Will tweak it further and then post. |
16:03 |
|
proller joined #minetest-dev |
16:04 |
OldCoder |
sfan5, just 1 question for now. What are negative consequences of increasing minimum resend timeout from 0.1 to 0.5 ? |
16:04 |
OldCoder |
A question for anybody else as well ^ |
16:04 |
|
RealBadAngel joined #minetest-dev |
16:05 |
sfan5 |
OldCoder: a client will experience even more delay if a packet gets lost |
16:05 |
sfan5 |
s/client/socket/ |
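sfan5's point about the extra delay can be put into numbers under a deliberately simple model: each transmission is lost independently with the same probability, and every loss costs exactly one resend timeout with no back-off. Neither assumption is confirmed for connection.cpp, the unit of 0.1 and 0.5 is taken to be seconds, and the function name is hypothetical.

```python
def expected_extra_delay(timeout_s, loss_prob):
    """Mean extra delivery delay from retransmissions, in seconds.

    Assumes independent losses and a fixed timeout per lost transmission:
    the mean number of lost transmissions before the first success is
    loss_prob / (1 - loss_prob), and each one costs timeout_s.
    """
    if not 0.0 <= loss_prob < 1.0:
        raise ValueError("loss_prob must be in [0, 1)")
    return timeout_s * loss_prob / (1.0 - loss_prob)
```

Under this model the delay scales linearly with the timeout, so going from 0.1 to 0.5 makes every lost packet cost five times as long to recover, whatever the loss rate is.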
16:05 |
OldCoder |
sfan5, without the increase, the game locks up |
16:06 |
OldCoder |
Not the client, but the entire game, it appears |
16:06 |
sfan5 |
like not being able to look around? |
16:06 |
OldCoder |
The world goes dead entirely; a lockup |
16:07 |
OldCoder |
For everybody |
16:07 |
sfan5 |
hm |
16:07 |
OldCoder |
I have spent about 5 days on this |
16:07 |
sfan5 |
someone should debug that |
16:07 |
OldCoder |
<- did |
16:07 |
OldCoder |
But no explanation |
16:07 |
sfan5 |
..by looking at the traces of all threads when the server is locked up |
16:07 |
OldCoder |
It continued to run |
16:07 |
OldCoder |
But was busy with millions of RE-SEND RELIABLEs |
16:08 |
OldCoder |
Literally, millions of them |
16:08 |
OldCoder |
My guess is corruption somewhere. Zeno` gave me a patch to force timeouts but it was insufficient. |
16:08 |
sfan5 |
maybe someone can mistake the error in your patch |
16:08 |
sfan5 |
wat |
16:08 |
OldCoder |
He didn't make a mistake |
16:08 |
sfan5 |
maybe someone can find the mistake in the patch* |
16:08 |
OldCoder |
I said, millions |
16:09 |
OldCoder |
There was no mistake in the patch; it was simply incomplete |
16:09 |
OldCoder |
connection.cpp as it stands is not functional |
16:09 |
OldCoder |
Worlds can easily fall into a state where millions of RE-SEND RELIABLEs occur |
16:09 |
OldCoder |
When I say, millions, I refer to 10 to the sixth power |
16:09 |
OldCoder |
That is a lot of zeroes! |
16:10 |
OldCoder |
sfan5, review if you wish: http://minetest.org/lockup.txt |
16:11 |
OldCoder |
^ That file was produced by the unpatched server (i.e., upstream as it was) |
16:11 |
sfan5 |
"WARNING: ACKed packet not in outgoing queue" o.o |
16:11 |
OldCoder |
Indeed. Theories? |
16:11 |
sfan5 |
lemme look at connection.cpp |
16:11 |
OldCoder |
I was supposed to speak with Sapier but he has not been here for days |
16:12 |
ShadowNinja |
[NickServ] Last seen : Oct 13 23:16:21 2014 (18 hours, 55 minutes, 59 seconds ago) |
16:12 |
OldCoder |
I have missed him since this started |
16:12 |
sfan5 |
OldCoder: is 70KB/s the expected download speed for minetest.org? |
16:12 |
ShadowNinja |
OldCoder: /monitor + sapier and wait for him to come online. |
16:13 |
OldCoder |
Hardly |
16:13 |
OldCoder |
ShadowNinja, thank you! |
16:13 |
Calinou |
is it Web download or from-server download? |
16:14 |
OldCoder |
Zeno` added a forced timeout. Did not work. I added a 50ms delay on resends, code to handle corruption in resend counter field, and increased timeout periods |
16:14 |
OldCoder |
These changes seem to have improved the situation |
16:14 |
OldCoder |
Web download from the server can be quite fast. In fact, we are going to gigabit. |
16:16 |
sfan5 |
ow |
16:16 |
sfan5 |
the code style in connection.cpp does not follow the guidelines |
16:18 |
sfan5 |
and why is dynamic_cast used everywhere? |
16:21 |
OldCoder |
sfan5, proller and VanessaE seem to have a debate about this |
16:21 |
|
rubenwardy joined #minetest-dev |
16:21 |
OldCoder |
But the ACKed packet not in outgoing queue error concerns me |
16:21 |
OldCoder |
As do the millions of resends |
16:21 |
VanessaE |
I've no real opinion, except that I seem to have no trouble at all with the network code |
16:21 |
OldCoder |
Usually millions of resends are not needed |
16:21 |
OldCoder |
VanessaE, that is a clue as your worlds are busy. Yet, see the log file that I posted. |
16:22 |
VanessaE |
I saw that |
16:22 |
OldCoder |
My copy of 0.4.10 is probably a few weeks old. Perhaps a temporary issue? |
16:22 |
VanessaE |
I went looking in my logs for such events and couldn't find anything similar in the busiest of my servers |
16:23 |
OldCoder |
So, it is a combination of circumstances |
16:23 |
VanessaE |
though mine aren't all *that* busy these days |
16:23 |
OldCoder |
Very well |
16:23 |
OldCoder |
I had a very busy few days |
16:26 |
|
ImQ009 joined #minetest-dev |
16:32 |
sfan5 |
OldCoder: speculation: it got an ACK for a packet it doesn't remember because the outgoing queue is probably cleaned, "UpdatePacketTooLateCounter()" seems to suggest this is not critical |
16:32 |
OldCoder |
All right |
16:32 |
sfan5 |
OldCoder: more speculation: the outgoing packet queue is cleared before it receives the ack for that packet and then it resends it, receives the ack "too late" .. (cycle continues) |
16:32 |
OldCoder |
But the millions of RE-SENDs? |
16:39 |
|
rubenwardy joined #minetest-dev |
16:41 |
|
GrimKriegor joined #minetest-dev |
16:42 |
|
proller joined #minetest-dev |
16:56 |
|
SudoAptGetPlay joined #minetest-dev |
16:56 |
|
kilbith joined #minetest-dev |
16:56 |
|
SudoAptGetPlay left #minetest-dev |
17:14 |
|
Krock joined #minetest-dev |
17:36 |
|
rickmcfarley joined #minetest-dev |
17:46 |
|
Miner_48er joined #minetest-dev |
17:49 |
|
chchjesus joined #minetest-dev |
18:14 |
|
Du_Draig joined #minetest-dev |
18:19 |
|
kaeza joined #minetest-dev |
18:33 |
|
kahrl joined #minetest-dev |
18:33 |
|
kaeza joined #minetest-dev |
18:48 |
|
asl joined #minetest-dev |
19:44 |
|
DuDraig joined #minetest-dev |
20:04 |
|
ninnghazad|2 joined #minetest-dev |
20:06 |
ninnghazad|2 |
so i tried to do a pull request, but travis will not compile my patched version, complaining about a missing function in irrlicht, while it compiles and works fine for me here. which irr-version does he use? |
20:07 |
|
werwerwer joined #minetest-dev |
20:08 |
|
Amaz joined #minetest-dev |
20:15 |
|
AnotherBrick joined #minetest-dev |
20:26 |
|
werwerwer joined #minetest-dev |
20:32 |
|
kilbith joined #minetest-dev |
20:38 |
|
PilzAdam joined #minetest-dev |
20:41 |
|
AnotherBrick joined #minetest-dev |
20:58 |
ShadowNinja |
ninnghazad|2: Probably 1.7 or 1.8, whichever one you're not using. |
21:02 |
ninnghazad|2 |
1.7 it must be, already got it compiling |
21:37 |
|
kaeza joined #minetest-dev |
21:39 |
|
Fritigern joined #minetest-dev |
21:47 |
|
diemartin joined #minetest-dev |
22:01 |
|
diemartin joined #minetest-dev |
22:42 |
|
proller joined #minetest-dev |
22:52 |
|
twoelk joined #minetest-dev |
23:05 |
|
proller joined #minetest-dev |
23:20 |
|
mos_basik joined #minetest-dev |
23:41 |
|
exio4 joined #minetest-dev |
23:54 |
|
twoelk left #minetest-dev |