SLua geek gadgets demo/fullperm copies
Frio Belmonte
I've been converting some of my LSL projects into SLua and set them up at the SLua sandboxes for a demonstration.
At the time of this writing, what's set up:
- A computer (emulates a 6502 CPU and custom devices to run a computer within SL), which can be poked at and played with. The SLua implementation runs maybe 3x or more faster than the LSL one, thanks to easier data handling with buffers and other improvements, clocking in at roughly the equivalent of a 1.5 kHz real CPU. There's a short explanation and a game to play on it.
- Mandelbrot/Julia fractal plotters, just the classic ones on a 40x40 pixel grid. These are fullperm including the script, right click+buy to get a copy.
- A complex number class, to enable doing complex number arithmetic with normal SLua equations, compatible with built-in numbers. Not actually used by the fractal plotters since it's slower than optimized manual computation, but good for prototyping or just having a complex number calculator. The demo is rather minimal, it just compares Euler's formula results to the built-in sine and cosine. Fullperm copy via click.
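For readers who want a feel for what such a class and its Euler's formula demo involve, here is a minimal sketch in Python rather than SLua. It is not the code from the demo objects: the `Complex` class, the `exp_series` helper and the term count are all assumptions made for illustration. It evaluates e^(iθ) with a power series using only the class's own arithmetic, then compares the result to the built-in cosine and sine.

```python
import math


class Complex:
    """Hypothetical minimal complex number type that mixes with plain numbers."""

    def __init__(self, re, im=0.0):
        self.re = float(re)
        self.im = float(im)

    @staticmethod
    def _coerce(value):
        # Treat built-in numbers as complex numbers with a zero imaginary part.
        return value if isinstance(value, Complex) else Complex(value)

    def __add__(self, other):
        o = Complex._coerce(other)
        return Complex(self.re + o.re, self.im + o.im)

    __radd__ = __add__  # addition is commutative, so 2 + z works too

    def __mul__(self, other):
        o = Complex._coerce(other)
        return Complex(self.re * o.re - self.im * o.im,
                       self.re * o.im + self.im * o.re)

    __rmul__ = __mul__  # multiplication is commutative, so 2 * z works too


def exp_series(z, terms=30):
    """Approximate e**z with the first `terms` terms of its power series."""
    total = Complex(1.0)
    term = Complex(1.0)
    for n in range(1, terms):
        term = term * z * (1.0 / n)  # z**n / n!, built up incrementally
        total = total + term
    return total


theta = 1.0
euler = exp_series(Complex(0.0, theta))
print(euler.re, math.cos(theta))  # real part vs built-in cosine
print(euler.im, math.sin(theta))  # imaginary part vs built-in sine
```

The operator overloads are what make "normal equations" possible, and they are also where the overhead comes from: every operation allocates a new object, which is consistent with the fractal plotters preferring hand-written real arithmetic.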
Will probably add more things still, as inspiration strikes.
Location:
Frio Belmonte
Added a chiptune player meant as a music device for the computer, but not integrated with it at this time. It's not yet converted to SLua, only given preliminary testing on LSL-Luau for now: it runs at the desired rate and uses 25% less memory, which is great. Internally it's a small specialized computer with its own word-wise accessed memory and CPU, so it's demanding on both performance and memory. Since it isn't connected to the computer, it also includes a song parser script that can load "mostly human readable" notecards to set up the memory. The test song is included for giggles.
Note that it will miss notes on the first play or two because we can't just pitch-shift sounds or preload over 100 tiny samples all the time, but once they're cached by the viewer it's fine. Firestorm's aggressive sound spam blocker will get annoyed, but that's a viewer setting that can be turned off; the serverside spam limit is not exceeded.
Frio Belmonte
Spent a couple of hours rewriting it, and it seems okay in SLua now. Sadly, Harold Linden, with how much you've improved basic math performance, the bit32 functions are starting to fall behind again... for raw speed I had to go back to divide+modulo instead of bit32.extract and such, heh! Of course the difference between "almost nothing" and "2x almost nothing" isn't a big deal, and that's about the numbers I'm getting. Might need to do that for the computer as well to squeeze in a couple more cycles/sec. Thankfully much less bit wrangling is required altogether, since a buffer can simply be read in 8, 16 and 32 bit modes (all of which are used by the music engine) without having to use a clunky quaternion list.
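To make the divide+modulo trade concrete, here is a small sketch of the two equivalent ways to pull a bit field out of an unsigned integer. It is written in Python rather than SLua and is not taken from the music engine. The function names are hypothetical, and the argument order (value, field, width) merely mirrors that of bit32.extract.

```python
def extract_bits(value, field, width):
    """Shift-and-mask form, the same idea bit32.extract implements."""
    return (value >> field) & ((1 << width) - 1)


def extract_divmod(value, field, width):
    """Arithmetic form: floor-divide away the low bits, then take the modulo."""
    return (value // 2 ** field) % 2 ** width


word = 0xDEADBEEF
# Both forms read the 8-bit field starting at bit 8, which is 0xBE.
print(hex(extract_bits(word, 8, 8)), hex(extract_divmod(word, 8, 8)))
```

When the field position and width are constants, the powers of two can be precomputed, leaving one division and one modulo per extraction. The two forms only agree for non-negative values, which is an assumption of this sketch.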
Kind of crazy that in over 80% of cases the playback processing doesn't register any measurable time at all (os.clock has only so much precision, I suppose), despite much more detailed song debug info. The playback itself isn't too much of a snooze either, but the worst cases take 2 frames, which means there might be audible glitches... that's just the nature of LL library calls, especially functions that change prim params like sound, I suppose, and perfection can't be expected for something this nonsensical.
Full sample song included, since this time I could be bothered to fire up OBS.
Harold Linden
Frio Belmonte Yeah, I expect that the difference will be much less stark once we stop using instrumented debug builds; bit32.extract() is much faster than math when using a release build. A lot of current performance results are impacted by how many LUAU_ASSERT()s live in between you doing the call and getting what you wanted. Using my example from https://feedback.secondlife.com/slua-alpha/p/relax-interrupt-checks-for-constant-time-cheap-library-functions with a release build with asserts disabled (note these are on MacOS AArch64 so take the absolute numbers with a grain of salt):
% ./build/release/luau --sl bench_math.lua
Starting
math: 0.009963250020518899
bit32.extract: 0.00798375008162111
bit32.rshift(bit32.band()): 0.009976124973036349
with asserts:
% ./build/release/luau --sl bench_math.lua
Starting
math: 0.01467616658192128
bit32.extract: 0.011877166572958231
bit32.rshift(bit32.band()): 0.01465191668830812
The LUAU_ASSERT() perf penalty is even more heavily felt in functions that deal with heap types, and can account for 70% of the runtime, but the feedback is super helpful for debugging issues with the beta.
I'd be interested to see the actual SLua code, there may be something in the compiler that's better able to optimize and collapse your particular div and mod operations than bit32.extract().
Frio Belmonte
Harold Linden I feel we've had this conversation before and I found out I had an error in the synthetic test that did indeed lead to a compiler optimization for the math case. It could be happening here too, but I won't worry about finetuning for now since things are so fast already, thanks for all the good work under the hood.
Harold Linden
I should also mention that in SL itself, os.clock() is limited to a resolution of 100 microsecond intervals after ll.GetTime() to prevent the more obvious timing attacks that you have to deal with due to SPECTRE and inherent SL jank. Browsers do it for that reason as well, see performance.now() resolution in browsers.
The code in the server looks like
F64 getClockElapsedPerformance() const override {
    // Time the frame started
    auto elapsed = getClockElapsed();
    // Time since the frame started
    auto frame_elapsed = LLFrameTimer::getCurrentFrameTimeF64();
    // Clamp frame_elapsed to 100 microsecond precision
    // This is necessary to mitigate timing attacks since the timer is so high-resolution.
    // We don't care about clamping `elapsed` itself since that's just microseconds when the frame started,
    // and that's far enough away from when this code would be running to not be an issue for timing attacks.
    constexpr int DESIRED_RESOLUTION = 10000;
    auto clamped_frame_elapsed = std::floor(frame_elapsed * DESIRED_RESOLUTION) / DESIRED_RESOLUTION;
    return elapsed + clamped_frame_elapsed;
};
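The effect of that clamp on script-side measurements can be illustrated with a short Python sketch. It simplifies the server code above by clamping a whole timestamp rather than only the in-frame part, and the `clamp_clock` and `measured_duration` names are made up for the example.

```python
import math

RESOLUTION = 10_000  # steps per second, i.e. 100 microsecond buckets


def clamp_clock(seconds):
    """Round a timestamp down to the start of its 100 microsecond bucket."""
    return math.floor(seconds * RESOLUTION) / RESOLUTION


def measured_duration(start, end):
    """What a script would see when subtracting two clamped clock readings."""
    return clamp_clock(end) - clamp_clock(start)


# Both readings fall into the same bucket, so 70 microseconds of work reads as zero.
print(measured_duration(0.00031, 0.00038))
# Readings two buckets apart report about 200 microseconds for 210 microseconds of work.
print(measured_duration(0.00031, 0.00052))
```

Anything that starts and finishes inside one bucket reads as exactly zero, so a zero reading only means "under 0.1 ms", and any single measurement can be off by up to one bucket in either direction.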
Frio Belmonte
Harold Linden That explains the undetectable timings I'm getting, just hitting < 0.1 ms for the processing. Adjusting my readout accordingly.
Frio Belmonte
No new developments, but everything updated for the new SLua beta and functional once again.
If my estimate of ~2.3 real Hz per emulated cycle is good, the computer can now run at up to 2.7 - 3 kHz, so that's nice.
Frio Belmonte
Tinkered together something that would've been painful to do with LSL: a node-based "filter simulator". It sets up a graph of node objects consisting of signal sources, wires that induce a signal delay and signal combiners, which will result in reasonably realistic looking filtering. Signals are drawn with particles. I remember taking a class on this some 20 years ago and refused to look up any information to see if I'm doing it right, for exploration's sake.
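As a rough illustration of why delays plus combiners behave like a filter, here is a minimal Python sketch of the simplest such graph: one source feeding a combiner both directly and through a delaying wire, which is a feedforward comb filter. This is a guess at the general idea, not the code of the in-world gadget. The `Wire` class and `run_comb` function are hypothetical names.

```python
import math
from collections import deque


class Wire:
    """Hypothetical wire node that delays a signal by a fixed number of samples."""

    def __init__(self, delay):
        self.buffer = deque([0.0] * delay)

    def push(self, sample):
        # Feed one sample in and get back the one that entered `delay` steps ago.
        self.buffer.append(sample)
        return self.buffer.popleft()


def run_comb(signal, delay):
    """Combiner node: sum the direct signal with a delayed copy of itself."""
    wire = Wire(delay)
    return [x + wire.push(x) for x in signal]


# An impulse comes out twice: once directly and once after the delay.
print(run_comb([1.0, 0.0, 0.0, 0.0, 0.0, 0.0], delay=3))

# A sine whose period is twice the delay cancels itself out once the wire has filled.
sine = [math.sin(2 * math.pi * n / 6) for n in range(24)]
print(max(abs(y) for y in run_comb(sine, delay=3)[3:]))
```

Frequencies whose half period matches the delay are cancelled while others pass or are reinforced, which is the filtering effect a graph of delays and combiners produces.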
Frio Belmonte
Converted my free-rule cellular automaton to SLua too. Unfortunately it relies a bit heavily on bit ops (it's based on storing bitfields in integers) and speed could be better, so it will need a deeper rework to be more SLua-friendly. Also put out a demo gadget with hue-space to RGB color conversions, can grab the fullperm code as a copy.
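For context on why such an automaton leans on bit operations, here is a small Python sketch of one way to store both the grid and a freely chosen life-like rule as bitfields in integers. The layout is an assumption made for illustration and may differ from the actual script: each row is an integer with one bit per cell, and the rule is two masks where bit n says whether a cell with n live neighbours is born or survives.

```python
def step(rows, width, birth_mask, survive_mask):
    """Advance a bounded grid one generation; cells outside the grid count as dead."""
    height = len(rows)

    def cell(y, x):
        if 0 <= y < height and 0 <= x < width:
            return (rows[y] >> x) & 1
        return 0

    new_rows = []
    for y in range(height):
        new_row = 0
        for x in range(width):
            neighbours = sum(cell(y + dy, x + dx)
                             for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                             if (dy, dx) != (0, 0))
            # Pick the mask that applies to this cell and test the neighbour-count bit.
            mask = survive_mask if cell(y, x) else birth_mask
            if (mask >> neighbours) & 1:
                new_row |= 1 << x
        new_rows.append(new_row)
    return new_rows


# Conway's Life (B3/S23) written as rule bitfields.
BIRTH = 1 << 3
SURVIVE = (1 << 2) | (1 << 3)

blinker = [0b00000, 0b00000, 0b01110, 0b00000, 0b00000]
print([bin(r) for r in step(blinker, 5, BIRTH, SURVIVE)])
```

Every cell read and write here is a shift and a mask, which shows how quickly the per-cell cost adds up when bit operations are comparatively slow.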