Discussion (11 Comments), originally on HackerNews

myroon5 10 minutes ago
JSON schema's docs also have a recommended regular expression subset:

https://json-schema.org/understanding-json-schema/reference/...

JdeBP about 2 hours ago
The author is circling around, but not quite reaching, a statement that POSIX Basic Regular Expressions work everywhere, with the caveat that not everyone has caught up with version 8 of the Single Unix Specification, which has slightly changed BREs.
MathMonkeyMan about 3 hours ago
I've always been a stickler for being specific about which regex language your thing accepts, and whether it is to match any substring, or a prefix, or a suffix, or the whole thing, or a line, or a substring of a line, or whatever.

Here are some of the [more popular][1] ones, and then there are PCRE and Python.

It took me a while to learn that some of the older ones you see in e.g. grep are [specified by POSIX][2].

[1]: https://cppreference.com/cpp/regex#Regular_expression_gramma...

[2]: https://pubs.opengroup.org/onlinepubs/009696899/basedefs/xbd...
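
To make the substring / prefix / whole-string distinction concrete, here is a minimal sketch using Python's standard `re` module. The pattern and the helper name `match_modes` are made up for illustration; the point is only that the same pattern gives different answers depending on which anchoring mode the tool silently picks.

```python
import re

PATTERN = r"\d+"  # hypothetical example pattern: one or more digits


def match_modes(text: str) -> dict:
    """Report whether PATTERN matches text under three anchoring modes."""
    return {
        # re.search: the pattern may match anywhere in the string
        "substring": re.search(PATTERN, text) is not None,
        # re.match: the match must start at the beginning of the string
        "prefix": re.match(PATTERN, text) is not None,
        # re.fullmatch: the match must cover the entire string
        "whole": re.fullmatch(PATTERN, text) is not None,
    }


if __name__ == "__main__":
    for sample in ("id=12345;", "123abc", "12345"):
        print(sample, match_modes(sample))
```

A tool that documents only "accepts a regex" leaves the caller guessing which of these three rows applies.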

agnishom about 2 hours ago
A while ago, we wrote a paper about finding regexes which match the same way in both the greedy semantics and the leftmost maximal semantics.

https://par.nsf.gov/servlets/purl/10534654
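
As a rough illustration of how the two semantics can disagree (not taken from the paper), the sketch below compares Python's `re`, which picks the first alternative that succeeds, with a hand-rolled leftmost-longest choice. Both helper names are hypothetical, and the emulation assumes the alternatives are plain literal strings, so it is not a general POSIX matcher.

```python
import re


def leftmost_first(literals, text):
    """Python's re semantics: at the leftmost position, the first listed alternative that matches wins."""
    pattern = "|".join(re.escape(lit) for lit in literals)
    m = re.search(pattern, text)
    return m.group(0) if m else None


def leftmost_longest(literals, text):
    """Emulated leftmost-longest choice: earliest start, then the longest literal at that start.

    Assumption: alternatives are literal strings only.
    """
    for start in range(len(text) + 1):
        hits = [lit for lit in literals if text.startswith(lit, start)]
        if hits:
            return max(hits, key=len)
    return None


if __name__ == "__main__":
    print(leftmost_first(["a", "ab"], "ab"))    # order-sensitive
    print(leftmost_longest(["a", "ab"], "ab"))  # order-insensitive
```

With the alternatives written as `a|ab`, the two functions return different matches on the input `ab`; reordering to `ab|a` makes them agree, which is the kind of pattern that behaves the same under both semantics.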

quotemstr about 2 hours ago
It drives me nuts when a developer documents something or other as being a "regex" but doesn't mention which dialect of regular expression he's talking about. This habit is particularly common in the Rust, JavaScript, and Python communities, which seem to forget that their language's regular expression language isn't universal.
zahlman about 1 hour ago
Why? Of course it means the dialect that is most directly supported by that language (by builtins or the standard library). And why should they have to consider other dialects? They aren't reading regexes from user input (or they'd be a lot more concerned about sanitization, catastrophic backtracking etc.), and their fellow developers all grok the conventions.
bartread 4 minutes ago
I’d imagine precisely because they might be collecting regexes from user input such as parameter values or search terms, and the user may not know or care which technology your tool or service is built with. However, they will need to know which regex dialect(s) you support.

And I’d further bet that people who are casual about specifying that are relatively strongly correlated with people who are casual about sanitization, catastrophic backtracking, etc. (At least based on code I’ve seen over the decades.)

jonstewart about 1 hour ago
Then there’s not just the issue of whether the engine supports a particular syntactical feature but the issue of matching semantics. Perl/PCRE’s semantics are far different from POSIX’s, and some implementations have different semantics altogether (and quite reasonably).
LoganDark about 2 hours ago
> the special characters . * ^ $

These already do not work in many tools which require those special characters to be escaped to have any meaning. An easy example is GNU grep, sed, etc. which use BRE ("Basic Regular Expressions") by default. The article mentions GNU coreutils but does not explain that `-E` is required to fix that behavior.

Resonix 3 days ago
why I built this
greazy about 2 hours ago
I think you forgot to post a link?