Matching All The Things
closed
Bleuhazenfurfle Resident
I want to make another call for something I Jira'd a while back, because it really seems to be VERY important (it was flagged as accepted, then seemingly promptly forgotten):
It was intended as an extensible grab-bag of all the searching methods people need, while also acknowledging the varying skills (and lack thereof) of those people. The intention is that, once implemented, it could be applied liberally, and adding a new method to the core matcher would automatically add it to every place matching is done throughout LSL, without the proliferation of matching functions we're getting now. At its core, I want xxxFindEx functions with just a single extra integer "match type" argument: llLinksetDataFindEx, llListFindListEx, etc. (Not general string matching; that requires something more, especially with regex groups, match indexes, and the like, though a simple llStringTestEx would likely work out rather nicely. Start offsets and all that are also not covered here, though definitely desired, and an Ex(tended) function is a great place for them, too.) And it's not an "either or" issue, either: the current matcher functions are simply deemed "convenience functions for the common case", alongside the longer-winded, more general FindEx variant.
I started off with simple regular exact matching, then added anchoring (prefix, postfix, inner, and exact), with options for case insensitivity and numeric matching (those annoying numbers that get added to inventory names). Then, with the advent of regex in linkset data (though I was thinking about regex as a "future addition" before linkset data was a thing), that's just another search option added to the matcher, and instantly available everywhere. And the way I suggested it could be implemented in the Jira I'd written also opened the door for one exceedingly handy option: reverse matching (you've probably all done it, just maybe not realised it could be an actual thing).
Much of the reason for all this, also, is that often a simple string match _is_ all that you need, and it's a heck of a lot faster and easier on the sim than building a regex to do it — and, as a first pass implementation, it _could_ simply internally generate and apply the corresponding (sanitised) regex, and then more specific functions get swapped in behind the scenes as the data indicates.
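To make the "first pass just generates a sanitised regex" idea concrete, here is a rough sketch of what the backend might do. It is written in Python purely as a stand-in for the server side, and the MATCH_* names and values are made up for illustration; nothing here is an existing LSL constant or function.

```python
import re

# Hypothetical match-type constants; names and values are illustrative only.
MATCH_EXACT, MATCH_PREFIX, MATCH_SUFFIX, MATCH_INNER = range(4)
MATCH_NOCASE = 0x100  # hypothetical flag OR'd onto a base type


def build_regex(pattern, match_type):
    """Turn a plain-text pattern plus match type into a compiled regex."""
    base = match_type & 0xFF
    body = re.escape(pattern)  # sanitise: every character is matched literally
    if base == MATCH_EXACT:
        source = r"\A" + body + r"\Z"
    elif base == MATCH_PREFIX:
        source = r"\A" + body
    elif base == MATCH_SUFFIX:
        source = body + r"\Z"
    elif base == MATCH_INNER:
        source = body
    else:
        raise ValueError("unknown match type: %d" % base)
    flags = re.IGNORECASE if match_type & MATCH_NOCASE else 0
    return re.compile(source, flags)


def matches(value, pattern, match_type):
    """Single backend entry point that every xxxFindEx function would share."""
    return build_regex(pattern, match_type).search(value) is not None
```

Swapping any one branch for a dedicated string comparison later would not change what callers see, which is the point of routing everything through one matcher.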
Which brings up the other point (and half of why I asked for glob matching as a bonus feature): proper regex sanitisation is _hard_, especially in LSL. We can do a hacky pass with llParseStringKeepNulls, removing the worst offending candidates, but it's still a trial (slow and painful) and easy to get wrong (and we desperately need a built-in function that does it properly). That first pass implementation could also be exposed as a "regex builder", which would probably be handy to have, even if I too don't really like it: asking it for the corresponding "exact match" regex would do double service as a string sanitiser for dropping into an actual regex.
The other reason for glob is that many people (noobs especially) just find regex _hard_. Many are already struggling to learn LSL, and regex means trying to learn what is essentially an entire second language on top, one with an array of hidden pitfalls as well. Glob is all over the place already: anyone who's used Windows has seen the little *.ext thing, some will know about ? and [], and that's all that's typically needed. I would also suggest # (from the MQTT protocol, among other things), which is basically equivalent to the regex "\w+". (I suggested a matcher flag for "extended glob" that adds other stuff, but at that point it's probably not worth the effort since we have regex on hand. I have seen them offered in globbing, though; I first encountered them back in 4DOS.COM and AmigaDOS filename matching, and of course there's Linux…) Plus, a proper glob matcher is just generally less burdensome on the sim than regex would be, and it's still simple enough that you don't need to precompile the expression. (I did one 25 years ago, that also did numeric matching, in assembler.) But, as with the rest, a first pass implementation can just produce the matching regex, until the data indicates you need to do it better. In the meantime, scripters don't have to panic about (or just ignore, as they mostly do now) the intricacies of regex sanitisation.
Over all of that, though, the idea is to vastly reduce the friction to adding new matching types; having even the almighty regex be "just another matching type", right alongside the standard exact matching presently offered, is a powerful and extensible concept. LSL doesn't have function references and lambdas (I actually raised the possibility that something of that sort could be added too, where the "pattern" is actually the name of an LSL function to call for each value to be tested), so a general-purpose function like this is definitely the way to go, until we get a more capable language.
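On the glob point above, the "first pass just produces the matching regex" approach could look roughly like this. Again this is Python standing in for the backend, and the function names are hypothetical. It covers only *, ?, [] (with a leading ! for negation, which is my assumption) and the suggested #, and it does not validate ranges inside [].

```python
import re

# Glob wildcards and the regex fragments they translate to.
# "#" follows the suggestion above of treating it like \w+.
WILDCARDS = {"*": ".*", "?": ".", "#": r"\w+"}


def _translate_class(glob, start):
    """Translate a [...] class starting at glob[start].

    Returns (regex_fragment, next_index), or None when the class is
    unterminated or empty and "[" should be taken literally.
    """
    end = glob.find("]", start + 1)
    if end == -1:
        return None
    members = glob[start + 1:end]
    negate = members.startswith("!")
    if negate:
        members = members[1:]
    if not members:
        return None
    # Escape what is special inside a regex class; keep "-" so ranges work.
    safe = "".join("\\" + c if c in "\\^[" else c for c in members)
    return "[" + ("^" if negate else "") + safe + "]", end + 1


def glob_to_regex(glob):
    """Build an anchored regex source string from a glob pattern."""
    out = []
    i = 0
    while i < len(glob):
        ch = glob[i]
        if ch == "[":
            translated = _translate_class(glob, i)
            if translated:
                piece, i = translated
                out.append(piece)
                continue
        # Wildcards map to regex fragments; everything else is sanitised.
        out.append(WILDCARDS.get(ch) or re.escape(ch))
        i += 1
    return r"\A" + "".join(out) + r"\Z"


def glob_match(value, glob):
    """True when the whole of value matches the glob."""
    return re.compile(glob_to_regex(glob), re.DOTALL).match(value) is not None
```

The scripter never sees or escapes a regex here: a literal + or . in the glob is sanitised on the way through.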
Signal Linden
closed
Hey, Bleuhazenfurfle Resident. Thanks for the detailed request. However, we need some more specific recommendations in order to triage requests. If you'd like to pursue this feature request, could you write up some specific LSL functions as an example? Also, please try to keep issues concise so that we can more quickly grok them!
Bleuhazenfurfle Resident
Signal Linden: https://github.com/secondlife/jira-archive/issues/10230
TL;DR: I did, and it's a _simple_ plan that covers almost all bases for the foreseeable future, can be rolled out progressively in a simplified form, then improved incrementally as needed, and its need is painfully obvious.
Also… I did; I gave an exact specific recommendation: for each of the current string search functions, we add a new function with "Ex" on the name (for "Extended"), and a single additional "match type" integer argument.
This post was mostly a renewed call that we _need_ this to be a thing, whatever exact form it takes, universally across LSL (i.e. not regex here, exact there, who knows what somewhere else), and flexibly (one consistent interface to rule them all, and drop the barrier to adding more in the future). Sometimes (I'd even argue mostly) you want to search LSD by an exact key, and sometimes you want to search a list by regex, maybe even find an inventory item that contains a term somewhere within its name, or just find a list item case-insensitively. This would make all that possible, and any variation thereof.
The old function then becomes a "convenience" version (on the assumption that it was implemented first because it's the most common use case), with the new ones (just one for each instance of string searching across LSL) providing a consistent portal to all present string matching methods, and _vastly_ dropping the barrier to adding new ones in the future, including small variations like prefix/suffix/infix exact matching, or case-insensitive matches. (You add the new match type to the backend match function, and it's instantly available right across the board with no additional impact on LSL.)
I also mentioned in the Jira post how I envisioned it would be implemented behind the scenes, including allowing for _reverse_ matching (where the haystack is full of patterns, and you're looking for the first one that matches your needle; mostly useful with lists), and how that implementation allows a quick first-run implementation to support all the features, with behind-the-scenes refinement as it's found to be necessary (e.g. they could all just convert to regex, and go through the regex engine, until LL sees the need to implement more specific functions, with zero impact on existing scripts).
This plan can also be extended to "find all" situations with the addition of a start cursor (integer for lists, key name for LSD, etc.), though that is a separate enough topic that it should get its own request.
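For anyone who wants to see the shape of it, here is a small model of the "one backend matcher, many Ex entry points" idea, including reverse matching. It is Python rather than LSL, and every name in it (the MATCH_* flags, list_find_ex, linkset_data_find_ex) is hypothetical and only loosely mirrors the proposed llListFindListEx and llLinksetDataFindEx. The return conventions (-1 for a list miss, an empty string for a key miss) are also my assumptions.

```python
import re

# Hypothetical match types and flags; names and values are illustrative only.
MATCH_EXACT, MATCH_PREFIX, MATCH_SUFFIX, MATCH_INNER, MATCH_REGEX = range(5)
MATCH_NOCASE = 0x100   # compare case-insensitively
MATCH_REVERSE = 0x200  # haystack holds the patterns, needle is the value


def _matches(value, pattern, match_type):
    """The single backend matcher shared by every entry point."""
    base = match_type & 0xFF
    if base == MATCH_REGEX:
        flags = re.IGNORECASE if match_type & MATCH_NOCASE else 0
        return re.search(pattern, value, flags) is not None
    if match_type & MATCH_NOCASE:
        value, pattern = value.casefold(), pattern.casefold()
    if base == MATCH_EXACT:
        return value == pattern
    if base == MATCH_PREFIX:
        return value.startswith(pattern)
    if base == MATCH_SUFFIX:
        return value.endswith(pattern)
    if base == MATCH_INNER:
        return pattern in value
    raise ValueError("unknown match type: %d" % base)


def list_find_ex(haystack, needle, match_type):
    """Index of the first matching entry, or -1 when nothing matches."""
    reverse = bool(match_type & MATCH_REVERSE)
    for index, entry in enumerate(haystack):
        # In reverse mode each list entry is the pattern and the needle
        # is the value being tested against it.
        value, pattern = (needle, entry) if reverse else (entry, needle)
        if _matches(value, pattern, match_type):
            return index
    return -1


def linkset_data_find_ex(store, needle, match_type):
    """First matching key of a dict standing in for linkset data, or ""."""
    keys = list(store)
    index = list_find_ex(keys, needle, match_type)
    return keys[index] if index != -1 else ""
```

A typical reverse-matching use is command routing: with a list of prefixes such as ["cmd/", "chat/", "sys/"], calling list_find_ex(prefixes, "chat/hello", MATCH_PREFIX | MATCH_REVERSE) finds which prefix the incoming message starts with. Adding a new base type means touching only _matches.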