Discussion (7 Comments)

cscheid 9 minutes ago
If you stare at the CYK algorithm long enough and see it for the dynamic programming approach it is, you'll then realize that you can do the same parallelization trick for any context-free grammar!
ww520 25 minutes ago
This is an interesting read.

You can solve the same problem with a Range Min-Max Tree (RMMTree). The query for balanced or unbalanced parentheses is O(log N). If you represent the parentheses as a bit vector (1 for an open paren, 0 for a close paren) and apply a succinct data structure to it, you can use RMMTree + succinct_bit_vector to speed things up. A two- or three-level RMMTree can already represent billions of parentheses (two levels with a branching factor of 65536 = 4B bits, or 4B parentheses). The query is O(65536 + 65536), or effectively O(1), since the bound is a constant independent of N. For a four-level tree with a branching factor of 256, the query is O(256 + 256 + 256 + 256), again O(1). It becomes a trade-off between memory accesses and the number of entries to process per level.

Since the bits can be constructed from the parentheses segment by segment, the input can be broken up into multiple bit vectors, one per segment, and processed in parallel. Multiple RMMTrees can be used, one per bit segment, and the trees can be processed in parallel as well.
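
A minimal sketch of the idea in this comment, under stated assumptions: it is a single-level blocked summary, not a real succinct RMMTree, and the names `summarize`, `combine`, and `BlockedParens` are hypothetical. Each block stores its total excess and its minimum prefix excess; a range is balanced when the combined total is 0 and the combined minimum never drops below 0. Because `combine` is associative, per-segment summaries could be computed independently (for example, one per worker) and merged afterwards.

```python
from typing import List, Tuple

# (total excess, minimum prefix excess); the empty prefix counts, so min <= 0.
Summary = Tuple[int, int]


def summarize(bits: List[int]) -> Summary:
    """Scan one segment of bits (1 = open paren, 0 = close paren)."""
    total, lowest = 0, 0
    for b in bits:
        total += 1 if b else -1
        lowest = min(lowest, total)
    return total, lowest


def combine(a: Summary, b: Summary) -> Summary:
    """Merge the summaries of two adjacent segments (associative)."""
    return a[0] + b[0], min(a[1], a[0] + b[1])


class BlockedParens:
    """One level of per-block summaries over a parenthesis bit vector.

    Assumes the input contains only '(' and ')'.
    """

    def __init__(self, text: str, block_size: int = 4) -> None:
        self.bits = [1 if c == "(" else 0 for c in text]
        self.block_size = block_size
        self.blocks = [
            summarize(self.bits[i:i + block_size])
            for i in range(0, len(self.bits), block_size)
        ]

    def is_balanced(self, lo: int, hi: int) -> bool:
        """Return True if bits[lo:hi] is a balanced parenthesis string."""
        acc: Summary = (0, 0)
        i = lo
        while i < hi:
            if i % self.block_size == 0 and i + self.block_size <= hi:
                # A whole block lies inside the range: use its stored summary.
                acc = combine(acc, self.blocks[i // self.block_size])
                i += self.block_size
            else:
                # Partial block at either end: fall back to single bits.
                acc = combine(acc, summarize(self.bits[i:i + 1]))
                i += 1
        return acc == (0, 0)
```

A real RMMTree would stack further summary levels on top of `blocks`, so that a query touches a bounded number of entries per level instead of walking every full block in the range.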

raphlinus about 2 hours ago
Also see Fast GPU bounding boxes on tree-structured scenes[1] (unpublished paper) and notes toward a blog post[2]. This is a highly tuned GPU implementation of parentheses matching. It's actually used in Vello (the classic version in which we offload basically all the work to the GPU, not the newer CPU-GPU hybrid version in which tracking the blend stack is done on the CPU).

Earlier versions of the work were featured on HN [3][4] (plus a few more zero-comment submissions), but this is much more sophisticated.

The basic idea (bicyclic semigroup and binary search) is the same as in the submission. I think the earliest attribution is to Bar-On and Vishkin[5] from 1985. Another implementation of this idea is in pareas[6], an experimental GPU-accelerated compiler.
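
For readers unfamiliar with the bicyclic semigroup, here is a rough sketch of the monoid part only; it is not taken from Vello, pareas, or the paper, and the names `lift`, `bicyclic_mul`, and `tree_reduce` are hypothetical. Each substring is summarized as a pair (unmatched close parens, unmatched open parens), and two adjacent summaries merge by cancelling the left side's opens against the right side's closes. The operation is associative, so it can be reduced pairwise in logarithmically many rounds; the binary search used to locate a specific matching paren is omitted here.

```python
from typing import List, Tuple

# (number of unmatched ")", number of unmatched "(") for a substring.
Elem = Tuple[int, int]


def lift(c: str) -> Elem:
    """Map one character to its summary; assumes c is '(' or ')'."""
    return (0, 1) if c == "(" else (1, 0)


def bicyclic_mul(x: Elem, y: Elem) -> Elem:
    """Merge summaries of adjacent substrings x then y."""
    a, b = x
    c, d = y
    m = min(b, c)  # opens left over from x that are closed by y
    return (a + c - m, b + d - m)


def tree_reduce(elems: List[Elem]) -> Elem:
    """Pairwise reduction; each round's merges are independent of each other."""
    if not elems:
        return (0, 0)
    level = elems
    while len(level) > 1:
        nxt = [
            bicyclic_mul(level[i], level[i + 1])
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element carries over unchanged
        level = nxt
    return level[0]


def is_balanced(text: str) -> bool:
    """A string is balanced when nothing is left unmatched."""
    return tree_reduce([lift(c) for c in text]) == (0, 0)
```

On a GPU, each round of `tree_reduce` would correspond to one parallel step, which is why associativity is the property that matters.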

I believe this work is publishable and would love to work with a student to resubmit it. Especially if you're a student or prof in Sydney, please reach out.

[1]: https://arxiv.org/abs/2205.11659

[2]: https://github.com/raphlinus/raphlinus.github.io/issues/66

[3]: https://news.ycombinator.com/item?id=24385095

[4]: https://news.ycombinator.com/item?id=27164009

[5]: https://dl.acm.org/doi/10.1145/3318.3478

[6]: https://github.com/Snektron/pareas

solomonb about 1 hour ago
Getting to discover Oleg Kiselyov's work for the first time is such a treat. His web archive is incredible! I'm envious of the author and anyone else discovering it today.

https://okmij.org/ftp/

dmkolobov 34 minutes ago
Nice. I remember being exposed to this idea as Dyck languages, via Ed Kmett's recorded Monoidal Parsing talk.
pantsforbirds about 2 hours ago
Fun article and worth the read, but sadly none of the LaTeX was rendered for me (assuming it was supposed to).
munk-a about 2 hours ago
The rendering appears to be done specifically by the js hosted on jsdelivr so if you've blocked that as a script source you'll just get the raw LaTeX (which I assume we're all fluent in anyways, of course!)