Time Nick Message
01:15 MTDiscord Note to devs/Zughy for discussion in the near future for the rename: It may be a good idea to officialize the core namespace rather than add another namespace.
01:39 v-rob I agree. The core namespace already exists, and it also has the benefit of being fork-agnostic. No need to have three namespaces--core, minetest, and newname
01:41 v-rob (It will also make it so I don't have to configure my brain in and out of builtin mode--I frequently use minetest by accident when I should use core)
01:46 MTDiscord Especially since "the former game and aspiring engine formerly known as Minetest" would just be too long to type out each time I wanted to make an API call.
03:56 MTDiscord Call it nodecore
03:57 MTDiscord https://tenor.com/view/beluga-the-cat-hakosh1307-hakosh-beluga-cat-hug-gif-22532913
11:32 MTDiscord Core devs, I would like to know whether the idea itself of having the texture atlas in MT (what my #15061 does) is generally approved by you or not? If it is not, I would like to know what the reasons for that could be. It is also tagged now as "on the roadmap". I ask because it distracts other reviewers from reviewing the PR, since they don't know what state it is in, as Desour found out in the PR.
11:32 ShadowBot https://github.com/minetest/minetest/issues/15061 -- Texture atlas for mapblocks meshes by Andrey2470T
12:00 celeron55 added a comment
12:00 celeron55 it's an "if" for me. in short, it must be clearly better than the previous one which was removed
12:00 celeron55 if it is, then it's a "yes"
12:27 [MatrxMT] have we changed globalsteps somehow with 5.9.X? If I play locally everything works smoothly, but if I play online things go slower
12:27 [MatrxMT] Like, Block League stamina bar stutters and Block League submachine gun shoots more slowly. Try it on AES
12:28 [MatrxMT] those are both operation that should happen every 0.1s
12:28 [MatrxMT] *operations
13:59 MTDiscord Would be great if somebody could take a look at #15176
13:59 ShadowBot https://github.com/minetest/minetest/issues/15176 -- Decouple Ambient Occlusion from Shadows
15:16 [MatrxMT] "fun" bug, I don't know if Minetest related: I had left the client open for a while, online. I come back to find out the server crashed more than hour ago and when I reconnect, the mouse starts lagging everywhere on my PC. Like, it moves smoothly but clicks weren't always working. I close the client and it starts behaving correctly
15:16 [MatrxMT] *an hour ago
15:44 [MatrxMT] well #15193
15:44 ShadowBot https://github.com/minetest/minetest/issues/15193 -- Different globalstep compared to <5.9?
15:49 [MatrxMT] just to be clear: this thing completely breaks my minigame, so if any core dev could look into it, I'd really appreciate it
17:41 sfan5 since 5.9.1 or 5.9.0?
17:43 sfan5 okay that's answered in there
17:44 sfan5 have you tried measuring if the delta time is actually bigger than 0.1?
18:32 [MatrxMT] sfan5: we'll run a few tests in a couple of hours. Best way to do that?
18:49 MTDiscord Possibly by checking register_globalstep()'s dtime var
18:50 sfan5 core.get_time_us() can measure the actually elapsed time without relying on the server step
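For reference, a minimal sketch of the kind of measurement sfan5 suggests here, assuming a small standalone diagnostic mod; the 0.01 s tolerance and the log format are arbitrary choices, not anything from the engine:

    -- Log how much wall-clock time actually passes between globalsteps,
    -- independent of the dtime value the server hands to the callback.
    local last_us = core.get_time_us()
    core.register_globalstep(function(dtime)
        local now_us = core.get_time_us()
        local real_dtime = (now_us - last_us) / 1e6
        last_us = now_us
        if real_dtime > dtime + 0.01 then
            core.log("action", string.format(
                "globalstep: dtime=%.3f, real elapsed=%.3f", dtime, real_dtime))
        end
    end)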
19:14 Krock node timers too, I think
19:14 sfan5 planning to merge #15151 later. if there are any concerns please scream
19:14 ShadowBot https://github.com/minetest/minetest/issues/15151 -- [manual squash] Divorce map database locking from env lock by sfan5
19:17 Krock AAAAAAAAAAAAAAAAAAAAAAAAAAA
19:19 Krock I did have a look at the PR a few times - the concept makes sense overall, although I do not fully understand by how much the locking situation is now improved.
19:21 sfan5 previously while waiting for the database to return the block data into an std::string, we'd be holding the env lock
19:21 sfan5 that is now no longer the case
19:23 sfan5 it would be a simple but massive improvement if it weren't for the lock contention that starts being an issue on highly loaded servers
19:23 sfan5 (not a new issue per se but more severe now because the emergethread releases the lock for a longer time)
19:27 Krock I see. Thanks for the explanation. This is also largely documented in yieldToOtherThreads, which is nice to see.
19:28 Krock no concerns. the code looks good. feel free to merge later
19:33 MTDiscord That PR increases lag on servers, apparently by design, i.e. we're essentially trying to fix starvation of the emerge thread by starving the main thread. It seems to mitigate the problem when disk performance is bad, but creates a problem when it isn't and MT is CPU-limited.
19:35 MTDiscord If we decide to merge it, it may help some people, and may hurt others. In my case, I've already basically had to mitigate the disk issue by changing backup processes to avoid making the disk too busy, and testing so far showed that trade-off to be worse for me.
19:35 MTDiscord I think we're going to want a more permanent solution really soon, with like proper queues or something.
19:38 MTDiscord As of right now, server performance is a big obstacle for me, and this may affect my ability to adopt 5.10 when it's released, if it remains in the state it's in now. If it'll be fixed by 5.11 or 5.10.1 or something then that's fine, I can wait a bit ... but if it ends up trapping me on 5.9, that's something to consider I suppose.
19:41 MTDiscord The whole thing arose from the fact that MT fails to achieve single-threaded performance, and that even fully saturated it might only be using ~0.85 cores. It's weird that we reached the conclusion that reducing MT's utilization of available resources somehow solves this.
19:43 sfan5 I can't tell if this is supposed to be an insult toward my work
19:44 MTDiscord That's what you got from that?
19:44 MTDiscord Not the "I have many concerns about broken performance due to this PR"?
19:46 sfan5 it's "the situation is already bad and you are intentionally making it worse" paraphrased
19:47 Krock why is less locking worse?
19:47 MTDiscord I see
19:47 sfan5 i think the sensitive point is the fix-lock-contention-by-sleeping workaround
19:47 Krock if the overall server performance is an issue, then the emergethread will have to be paused until it catches up
19:48 Krock (as mapblock loading and generating does come with additional callbacks and traffic)
19:50 MTDiscord I'm not criticizing your work, I'm criticizing the suggestion that we're ready to merge it when discussion is still ongoing about consequences.
19:50 MTDiscord I was asked to scream, I think I did so quite politely.
19:50 Krock hmm. isn't the 1 ms sleep far faster than what the emerge thread can process? is the loop executed more than once?
19:50 sfan5 to be clear I am not happy with that either, it's awful. in my (admittedly artificial) tests it performed okay. I don't want to bloat up the PR into a total emerge code rewrite, so I'm willing to merge it now and see whether it turns out to be an unfortunate problem specific to Warr1024's setup or a flood of complaints about 5.10.0-dev shows up next week
19:51 sfan5 not implying your setup does not matter, but that would make the problem lower priority
19:51 sfan5 Krock: with 50ms of server thread lag, as few as 4ms of sleep were enough to get map loading up to speed
19:53 MTDiscord I'd actually really like to see progress made on this problem, whether it's a "proper" solution, a workaround, or just some experiments to gather data. I've just had a lot of trouble reproducing this "in the lab" and want to make sure that if I'm using "production" servers to get field data, I can mitigate risks.
19:53 sfan5 wouldn't it be possible to capture a lag profile on a prod server and replicate that in a lab using sleeps?
19:55 sfan5 this idea seems so simple that it shouldn't work
19:55 MTDiscord I've actually had pretty lousy reliability simulating lag with sleeps. The problem is I don't have a good way to "sleep until the current step takes X time" because I don't actually know how long the current step has been running for by the time my code hits. I also don't understand well enough the various things happening in what order within the step loop.
19:56 MTDiscord My szutil_lag mod can't even trust dtime, it basically just busy-waits until the current time is at least the time it reached the previous point in the step loop, plus whatever lag time is configured.
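Not the actual szutil_lag code, just a rough sketch of the busy-wait approach described above; target_lag is a stand-in for whatever setting the real mod exposes:

    -- Artificially stretch each server step: busy-wait until at least
    -- target_lag seconds have passed since this point in the previous step.
    local target_lag = 0.5  -- seconds of simulated lag (illustrative value)
    local last_us = core.get_time_us()
    core.register_globalstep(function(dtime)
        local deadline = last_us + target_lag * 1e6
        while core.get_time_us() < deadline do
            -- spin: there is no sleep available to mods, so burn CPU instead
        end
        last_us = core.get_time_us()
    end)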
19:57 MTDiscord What really kills the experience isn't the average step time but large spikes, and if those lead to large extra sleep times in proportion, that exacerbates the problem ... but the emerge thread should only need a certain amount of time to acquire a lock, and waiting longer may not necessarily help it.
19:58 sfan5 it needs to acquire the lock once or twice *per* emerged block
19:59 sfan5 this isn't any different from before either, but it holds it for much shorter. which somehow leads to this
19:59 MTDiscord Yeah, that's ... unfortunate. If it could just acquire the lock once, and hang onto it for all the blocks it's got queued up, that'd be so much better ... but I guess "queued up" is sort of the root of the issue here, and what we don't have time right now to make happen.
19:59 Krock sfan5: random question. do you happen to know whether the emerge manager is smart enough to cancel out duplicate requests, or ones that are outdated (i.e. block already loaded)?
20:00 sfan5 short answer: yes
20:00 Krock okay good
20:00 MTDiscord haha, that's a shame, because if it weren't, that sounds like it'd be an easy win 👤
20:00 MTDiscord 😄
20:02 sfan5 perhaps I could cap the sleep time so that if max_simultaneous_block_sends_per_client (40) blocks are emerged it stops
20:02 sfan5 that could reduce the lag impact and still provide reasonable map loading for users
20:02 sfan5 but could also do nothing
20:03 MTDiscord you could also sprinkle verbose logging all over the place to look for whether it WOULD have an impact, or whether things are already not peaking past 40.
20:05 MTDiscord tbh when dealing with "lag spike" problems, all the steps that AREN'T lag spikes really mess up the statistics. I got some value out of making a version of the jitprofiler that instead of writing to a file, records samples to an array, and only flushes that array to a file if the previous step dtime is > 0.25. That showed me immediately that "lag spikes" have a different performance profile than "normal" steps, and at least some
20:05 MTDiscord things that stood out on the unfiltered graph were minuscule when problems were happening.
20:05 MTDiscord Sadly it hasn't yet pointed me to what IS the problem, just what ISN'T that looked like it might have been.
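A toy version of that filtering idea, recording plain dtime values rather than real profiler samples; the file name, buffer cap, and 0.25 s threshold are illustrative only:

    -- Buffer per-step timings and only dump them when a lag spike occurs,
    -- so the output isn't dominated by the uninteresting "normal" steps.
    local samples = {}
    local SPIKE = 0.25  -- seconds; dump the buffer when dtime exceeds this
    core.register_globalstep(function(dtime)
        samples[#samples + 1] = string.format("%.4f", dtime)
        if dtime > SPIKE then
            local f = io.open(core.get_worldpath() .. "/lagspikes.txt", "a")
            if f then
                f:write(table.concat(samples, "\n") .. "\n---\n")
                f:close()
            end
            samples = {}
        elseif #samples > 10000 then
            samples = {}  -- no spike for a long time; discard to bound memory
        end
    end)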
20:06 Krock yieldToOtherThreads is currently written such that sleeps are proportional to dtime. But this dtime originates from the server thread that you're blocking
20:06 sfan5 to be exact it is proportional if the emerge queue is reasonably filled and it makes progress
20:07 sfan5 I guess an upper limit of 10 would really make sense before merging
20:07 MTDiscord Hmm, I didn't know that there was actually progress being made.
20:07 Krock indeed
20:08 sfan5 well we can't judge which progress is good so that can still cause problems
20:09 sfan5 it could go like 150 300 301 302 303 304 306 310 312 315 and we should have stopped after the second one
20:09 MTDiscord It does kind of suck to have 1ms per mapblock though ... it's not hard to queue up hundreds of mapblocks by traveling, and having the main thread sleep hundreds of ms is noticeable when the server is already running into lag spike issues ... which it will, since saves are also still happening, are a can of worms we haven't opened yet, and will now be contending with all the load activity.
20:09 sfan5 it's not 1ms per block
20:10 sfan5 1ms for the emerge thread to do as many blocks as it can
20:10 sfan5 limited by the response time of your db, of course
20:10 sfan5 this workaround doesn't work when both the CPU and db are slow
20:10 MTDiscord oic, so it just doesn't try to reacquire the lock during that time, but the emerge thread can repeatedly release and reacquire it?
20:10 Krock I'd expect it to process at most 2 per step
20:10 Krock (step = 1 ms)
20:11 sfan5 yes, the lock is simply left unlocked for that time
20:12 MTDiscord So the emerge thread has to acquire some lock to check the queue and pull at least one item from it, then it does some loading work in background until it has a "ready to use" mapblock, and then just needs to acquire the lock to connect that mapblock to the env? Or is there something more complex it's doing with that lock?
20:12 MTDiscord er I mean while holding that lock?
20:12 sfan5 no, that's a good explanation of what happens
20:13 Krock and then there's the mapgen part which makes the code execution somewhat non-predictable in time
20:14 MTDiscord Yeah, though in both my lab and my field test cases, mapgen shouldn't have been a factor as all the mapblocks in question were existing and saved.
20:15 MTDiscord I suppose to be thorough we should test mapgen scenarios too, just to be certain they are no worse.
20:15 sfan5 otoh mapgen still holds the lock for longer (due to on_generated) so maybe it just happens to not be affected
20:16 MTDiscord I'm starting to think that I should just run more non-trivial but repeatable setups in my lab tests, like snapshots of a developed world and complex game, and then just trust the lab results then.
20:17 sfan5 that would be worthwhile
20:17 MTDiscord I ran my tests in #15125 against a vanilla MTG world, where it was basically all just loading and saving mapblocks, because I was looking for a "minimal repro" setup, but now that the problem has been "proven" it's probably more useful to be looking at "real-world-like" scenarios so we can get a good idea of the impact of changes.
20:17 ShadowBot https://github.com/minetest/minetest/issues/15125 -- Busy Disks Cause Lag
20:18 sfan5 fwiw all my tests are with either minetest_classic or minetest_game, both very careful with performance
20:18 sfan5 I expect nodecore to be quite stressful for the engine due to all the entities alone
20:19 MTDiscord I might just have to start recording every single dtime to a huge array, and then flush it out to disk every minute or so, and start crunching the statistics on every server step. Looking at averages or other window-limited aggregates can be deceptive if something happens in the background that throws a huge outlier into the data.
20:20 MTDiscord When it comes to server performance, from what I can tell, NodeCore's biggest impact is ABM saturation. Entities, on the other hand, have minimal impact; I see no measurable impact on dtime from having like 8000 of them loaded, and the only real consequence of it seems to be client-side rendering, if they're within visible range.
20:21 MTDiscord If entities are not physical, they don't seem to process collision, and if they don't have on_step callbacks, they don't impact the lua tick.
20:21 sfan5 you don't use get_objects_in_range often then?
20:21 MTDiscord I've completely eliminated that function, actually recently, though I was never using it very heavily.
20:22 MTDiscord most of the entities are tied to a specific node position, so I can have a node position "key" I can index them by, and I can just loop over luaentities each tick.
20:23 sfan5 good :)
20:23 sfan5 it would be another bottleneck otherwise
20:24 MTDiscord Since virtually none of my entities are static_save either, it's kind of necessary to loop over them externally instead of using on_step, because they can get deleted pretty much any time. I used to have serious problems with the player wield attachments disappearing but I'm starting to think I solved them by switching to the "inside out" globalstep approach.
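Roughly what that pattern can look like, sketched here rather than taken from NodeCore; the entity name "mymod:anchor" and the node_key helper are invented for illustration:

    -- Index entities by the node position they belong to, and drive them from
    -- one globalstep instead of per-entity on_step or get_objects_in_range().
    local by_pos = {}
    local function node_key(pos)
        return core.hash_node_position(vector.round(pos))
    end
    core.register_globalstep(function(dtime)
        by_pos = {}
        for _, ent in pairs(core.luaentities) do
            if ent.name == "mymod:anchor" then
                local pos = ent.object:get_pos()
                if pos then
                    by_pos[node_key(pos)] = ent
                    -- per-entity work runs here, "inside out", not in on_step
                end
            end
        end
    end)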
20:33 sfan5 2024-09-25 22:32:06: [Main]: Server::AsyncRunStep() [ms] _________________ 52x 207.634
20:33 sfan5 2024-09-25 22:32:06: [Main]: Server::SendBlocks(): Collect list [ms] _____ 1x 12
20:33 sfan5 2024-09-25 22:32:06: [Main]: Server::SendBlocks(): Send to clients [ms] __ 1x 21
20:33 sfan5 this is with 200ms of server thread lag
20:33 sfan5 whoops
20:33 sfan5 2024-09-25 22:32:06: [Main]: Server::yieldTo...() progress [#] ___________ 36x 50.972
20:33 sfan5 2024-09-25 22:32:06: [Main]: Server::yieldTo...() sleep [ms] _____________ 36x 7.833
20:34 sfan5 these two
20:34 sfan5 takeaways: the sleep workaround wasn't activated every server step; sleep was 7ms on average *when* activated
20:34 sfan5 and for every ms slept the emerge thread managed to process 6.5 blocks
20:37 MTDiscord that sounds pretty good
20:37 MTDiscord I wonder if we can set an upper bound, like "if the server step spikes past 1000ms, don't do the sleep on this step" or something.
20:38 MTDiscord If a 600ms step turns into 610, that's no problem. If it turns into 700, that could be a problem. If a 1200ms step turns into like 1400 then that's pretty bad.
20:38 sfan5 even with 800ms of server lag the map loads very slowly, but not unplayably so. the yield workaround always sleeps 10ms, but this shouldn't be so bad
20:39 sfan5 Warr1024: I think the server could benefit from an upper limit in general, where upon reaching it, it defers non-essential things to the next server step
20:40 MTDiscord Yeah, I've actually got a number of mechanics that do that already, but it's not something that we could do at like the engine level, it requires each game/mod to make that call itself, and there's no clean way to preempt lua.
20:40 MTDiscord The thing is, at this point, it's basically just not my lua code that's running most of the time, unless there's some new serious bug.
20:41 MTDiscord The biggest performance issues I have are in the C++ code with things like loading/saving data and running ABMs, and in some cases it can be really non-obvious how decisions made on the lua side impact behaviour on the C++ side.
20:42 MTDiscord Like, Lars just discovered that if you delete a field that was marked private, it marks the block as dirty, which triggers a mapblock send packet later. I haven't even asked yet about whether that would affect whether the block is considered dirty for save-to-disk purposes.
20:42 sfan5 "considered dirty for save-to-disk purposes" <- obviously, yes
20:43 MTDiscord There are a lot of things in the engine where if something has a value X, and you set it to value X, it's still marked as "dirty" and sent somewhere, which may have nontrivial cost ... but then sometimes it's impractical to ask the engine whether it was already set to X, or there's a cost associated with that itself.
20:43 MTDiscord For some things, I can maintain a cache, but for others it's totally impractical ... like, I can't cache every potential node metadata, for example.
20:44 sfan5 nodemeta is unaffected, setting a value that is already present is a no-op
20:44 MTDiscord In my case I'm more concerned about clearing a value that may already be not-present.
20:45 MTDiscord Though I guess this conversation already gave me like 2 or 3 ideas for things I can do to try to reduce marking blocks as dirty.
20:46 MTDiscord It'd be nice if there were some way to tell why a mapblock got marked as dirty, like, did a param0 get written, or a param2, or a value in meta...
20:46 sfan5 also applies to set_string(k, "") to clear something
20:46 sfan5 does *not* apply to mark_as_private
20:47 MTDiscord So if I want to set_string on a node meta, am I better off doing a get_string first and checking to make sure the value isn't already equal?
20:47 sfan5 no, that's wasted time
20:47 MTDiscord Oh, wait, I got it backwards, okay that makes sense
20:48 MTDiscord so I should blindly set the value and trust that the engine won't mark something dirty, but I SHOULDN'T blindly mark as private
20:48 sfan5 yes
20:48 sfan5 this could just be fixed for mark_as_private too
20:48 MTDiscord heh, that'd be nice
20:49 MTDiscord I was wondering if I need to somehow check if it's already marked ... but I don't know if there's a getter for that.
20:49 MTDiscord One thing I can do though at least is not mark as private if I'm clearing a meta field, since that's redundant and I'm pretty sure that trips the dirty flag based on what Lars said.
20:56 MTDiscord does set_float(key, 0) do the same thing as set_string(key, "") or does it actually write a 0 or something? 🤔
20:57 sfan5 puts a "0"
20:59 MTDiscord Thanks. So that means set_string(key, "") is the only way to "delete" a field? I've been doing stuff like if value ~= 0 then meta:set_float(key, value) else meta:set_string(key, "") end to try to keep meta "compact" and avoid extra keys. get_float(key) returns 0 whether the field is "0" or absent, it seems...
21:02 MTDiscord 'puts a "0"' ... feels kinda like a bug 😅 unless there's some really good reason why it shouldn't remove the field that I just don't understand.
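A small helper along those lines, folding in what sfan5 confirmed above (setting an unchanged value is a no-op, set_string(key, "") removes the field, mark_as_private is not deduplicated); set_compact_float is just an illustrative name:

    -- Write a numeric meta field, deleting it instead of storing "0",
    -- and only mark it private when a value is actually kept.
    local function set_compact_float(meta, key, value, private)
        if value ~= 0 then
            meta:set_float(key, value)   -- no-op if the stored value is unchanged
            if private then
                -- note: per the discussion, this currently dirties the block even
                -- if the key was already private, so avoid calling it redundantly
                meta:mark_as_private(key)
            end
        else
            meta:set_string(key, "")     -- the only way to actually remove the field
        end
    end

    -- usage: set_compact_float(core.get_meta(pos), "charge", charge, true)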
21:47 [MatrxMT] sfan5: dtime goes between 0.09 and 0.1
21:47 [MatrxMT] a user tested on their server, same issue
21:48 sfan5 isn't 0.1 what you want to see?
21:50 [MatrxMT] yes, but it's definitely not 0.1 when you shoot
21:50 [MatrxMT] it is locally
21:50 [MatrxMT] I'd say it's around 0.25
21:51 [MatrxMT] even weapons requiring 0.2s are a little bit slower
21:51 sfan5 you mean the measured value is not 0.1 when shooting or it doesn't "feel" like that?
21:52 [MatrxMT] the value is 0.1 but when shooting it doesn't feel like that
21:52 sfan5 sounds like it might be network then
21:52 sfan5 can you attach a verbose log to the issue?
21:53 [MatrxMT] anything in particular? Like, when shooting, when the server starts..?
21:54 sfan5 my suggestion is to start the server, join, shoot for 5 seconds, wait for 5 seconds, stop server
21:55 sfan5 no hurry btw, tomorrow suffices
22:16 [MatrxMT] sfan5: published
22:33 [MatrxMT] if it's not enough, we'll try to include the start and shutdown as well tomorrow
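One way to check whether the firing really happens every 0.1 s on the server side, independent of how it feels on the client; shoot() here is a stand-in for whatever Block League's actual fire handler is called:

    -- Log the real interval between shots server-side, to separate a step/timer
    -- problem from network or client-side perception.
    local last_shot_us = nil
    local function shoot(player)  -- hypothetical fire handler
        local now_us = core.get_time_us()
        if last_shot_us then
            core.log("action", string.format("shot interval: %.3f s",
                (now_us - last_shot_us) / 1e6))
        end
        last_shot_us = now_us
        -- ... actual shooting logic ...
    end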