I rewrote PostHog's SQL parser, 70x faster, while barely looking at the code

robbie-c • about 3 hours ago • 36 comments • Article on posthog.com

Discussion (36 comments) from Hacker News

jakewins•8 minutes ago

I’ve had very good success in similar setups where you have some sort of “oracle” and can generate enormous corpuses of test data, such that you really, really trust that the LLM code works for every input you expect it will ever need to handle.

Makes me think of all the algorithms we specify in proof languages and then hand-implement in production languages; this setup could maybe let you just specify the proof of an algorithm and then let LLMs derive efficient implementations, with the (slow) proof as an oracle.
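A minimal sketch of the oracle setup described above, assuming a toy tokenizer in Python rather than PostHog's actual SQL parser. `oracle_tokenize` plays the slow but trusted reference, `candidate_tokenize` stands in for the generated fast version, and `differential_test` compares the two on thousands of random inputs. All names here are hypothetical and for illustration only.

```python
import random
import re
import string

# Oracle (hypothetical): a slow but obviously correct tokenizer built on a regex.
# Tokens are numbers, identifiers, or single punctuation characters. Whitespace
# matches none of the alternatives, so finditer simply skips over it.
_ORACLE_RE = re.compile(
    r"(?P<num>[0-9]+)|(?P<ident>[A-Za-z_][A-Za-z0-9_]*)|(?P<punct>\S)"
)


def oracle_tokenize(src):
    return [(m.lastgroup, m.group()) for m in _ORACLE_RE.finditer(src)]


# Candidate (hypothetical): a hand-written scanner standing in for the fast code.
DIGITS = set(string.digits)
IDENT_START = set(string.ascii_letters + "_")
IDENT_CONT = IDENT_START | DIGITS
WHITESPACE = set(" \t\n")


def candidate_tokenize(src):
    tokens = []
    i, n = 0, len(src)
    while i < n:
        c = src[i]
        if c in WHITESPACE:
            i += 1
            continue
        start = i
        if c in DIGITS:
            while i < n and src[i] in DIGITS:
                i += 1
            tokens.append(("num", src[start:i]))
        elif c in IDENT_START:
            i += 1
            while i < n and src[i] in IDENT_CONT:
                i += 1
            tokens.append(("ident", src[start:i]))
        else:
            i += 1
            tokens.append(("punct", c))
    return tokens


# A small alphabet makes edge cases (digits next to letters, empty input,
# runs of punctuation and whitespace) show up often in random inputs.
ALPHABET = "abXY_019 \t\n(),*=<>'"


def differential_test(trials=10_000, seed=0):
    """Return the first input on which the two tokenizers disagree, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        src = "".join(rng.choice(ALPHABET) for _ in range(rng.randint(0, 30)))
        if candidate_tokenize(src) != oracle_tokenize(src):
            return src
    return None


if __name__ == "__main__":
    print(differential_test())
```

A `None` result only means no disagreement was found on the sampled inputs. It is evidence rather than proof, which is why the size of the corpus, and how well it covers the inputs you actually expect, matters so much in this setup.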

duendefm•about 2 hours ago

Well, despite my current anti-AI sentiment, I have to admit that after reading the article, it was a good use of AI, done by someone with good technical skills. Still I have the feeling that this only works because of the vast accumulated knowledge pre-AI, and if everybody keeps going down this path, people will stop advancing their knowledge at the pace they did before. I feel that this AI immersion is really about selling our soul to the devil for short-term gains.

bitlad•about 1 hour ago

I think AI is a power tool. Period. If you give it to people who are unskilled, it will create a mess.

I think the democratization of intelligence is going to be interesting. You could say the same about the internet. I think it is part of evolution. Maybe intelligence or expertise is not what makes us special. Maybe it is that we are ingenious and creative with tools, and that's how we evolve.

choilive•about 1 hour ago

There are some studies that suggest human brain sizes have been shrinking over the last 20,000 years. The theory is that as civilization developed the demand for individual humans to be independently intelligent has weakened because we developed a "collective brain" and also self-domesticated to be more cooperative.

veidelis•16 minutes ago

Honestly there might be truth to it, I don't get the downvotes - why?

steve_adams_86•39 minutes ago

> democratization of intelligence

I'm not trying to be pedantic; I think this is an interesting topic and there's a worthwhile distinction to make here. It isn't really being democratized for a couple reasons (at least).

One, access to information isn't truly knowledge in and of itself. People allowing information from LLMs to pass through their brains are not necessarily retaining any of it, and their ability to synthesize and utilize disparate information from LLMs isn't inherently improved by this technology. So the premise of knowledge isn't very sturdy in my mind.

Two, LLMs function across very broad fields of capability, accuracy, content, and so on, and the best models are not accessible to many people. I find people tend to mean the technology is widely available and accessible when they say 'democratization', but that's not necessarily true nor what that word means to begin with.

True democratization would mean something more like "everyone participates in, shapes, regulates, and grows this technology with their own inputs". I don't think that's what happens at all, and in fact, it has been quite the inversion of that so far.

I mention all of this because I agree that it will be interesting to watch what happens, but I don't agree that it will be for the same reasons. I worry about it specifically because there is not an egalitarian distribution of knowledge, and it is not democratically built or shared.

ekidd•about 1 hour ago

> Maybe it is that we are ingenious and creative with tools, and that's how we evolve.

And every time you use the AI to be ingenious or creative, that will be added to the training data. Then someday the AI can be ingenious and creative without you! (It might take a few more breakthroughs. But investors will literally spend trillions chasing those breakthroughs.)

The endgame here is to replace all human intelligence and labor with machines that are smarter and work cheaper. But who controls the machines?

bitlad•38 minutes ago

> And every time you use the AI to be ingenious or creative, that will be added to the training data

That's part of evolution.

We as humans have always outsmarted the tools.

Daishiman•about 1 hour ago

> Still I have the feeling that this only works because of the vast accumulated knowledge pre-AI

I'm not about to say that there's nothing new under the sun, but parsers are a really well-understood problem where 99.9% of people don't need frontier knowledge and wouldn't be in a position to use it anyway.

And I don't think that people doing research on parsers would ever rely on LLMs, for precisely that reason. But we're not parser researchers, right?

theLiminator•30 minutes ago

This is the type of problem LLM generation is great for.

If you have an oracle, and your problem is largely just a pure function, it's pretty good at generating something that both works and is fast.

mikkelam•about 2 hours ago

I cannot believe they're sticking to their guns on this website design. It's awful.

joshmarinacci•7 minutes ago

I love that it doesn't feel like every other vibe coded VC backed startup.

my-next-account•about 1 hour ago

It's awesome!

kg•about 1 hour ago

Try clicking 'switch to website mode' on the left side

nektro•9 minutes ago

thank you!

pixel_popping•about 1 hour ago

They have excellent branding and the balls to pull it off. It shows passion, and I highly trust it, even in company settings.

softboyled•35 minutes ago

Yeah. It locked up my browser. What a pile.

noja•about 1 hour ago

I love it. So different. Slightly BeOS.

russellthehippo•about 1 hour ago

The key part of this is how not vibecoded it is. It feels like a model of how you should do software with AI. Now that we can easily set up property testing, fuzzing, etc., there's almost no reason not to.
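Property testing can be as simple as a round-trip check. The sketch below assumes a toy arithmetic grammar (not PostHog's SQL) and uses only the standard library: generate random syntax trees, pretty-print them, parse them back, and require the original tree. `show`, `parse`, `random_ast`, and `check_round_trip` are hypothetical names for illustration.

```python
import random
import re

# Hypothetical toy grammar: non-negative integers with left-associative + - *.
PREC = {"+": 1, "-": 1, "*": 2}


def show(node, min_prec=0):
    """Pretty-print an AST, adding parentheses only where they are required."""
    if node[0] == "num":
        return str(node[1])
    _, op, left, right = node
    p = PREC[op]
    # The right operand demands a higher precedence because the operators are
    # left-associative: a - (b - c) must keep its parentheses.
    text = f"{show(left, p)} {op} {show(right, p + 1)}"
    return f"({text})" if p < min_prec else text


def parse(src):
    """Precedence-climbing parser for the same grammar."""
    tokens = re.findall(r"\d+|[-+*()]", src)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_atom():
        tok = advance()
        if tok == "(":
            node = parse_expr(1)
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        return ("num", int(tok))

    def parse_expr(min_prec):
        left = parse_atom()
        while peek() in PREC and PREC[peek()] >= min_prec:
            op = advance()
            left = ("bin", op, left, parse_expr(PREC[op] + 1))
        return left

    node = parse_expr(1)
    if pos != len(tokens):
        raise SyntaxError("trailing tokens")
    return node


def random_ast(rng, depth=4):
    if depth == 0 or rng.random() < 0.3:
        return ("num", rng.randint(0, 99))
    op = rng.choice("+-*")
    return ("bin", op, random_ast(rng, depth - 1), random_ast(rng, depth - 1))


def check_round_trip(trials=5_000, seed=1):
    """Property: parse(show(tree)) == tree. Returns a counterexample or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        tree = random_ast(rng)
        if parse(show(tree)) != tree:
            return tree
    return None


if __name__ == "__main__":
    print(check_round_trip())
```

Libraries such as Hypothesis build on the same idea and add input shrinking, so a failing tree is reduced to a minimal counterexample before it is reported.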

spullara•about 1 hour ago

that is vibecoding these days

lovasoa•about 1 hour ago

The thing I would have liked to know is why they don't use an existing fast SQL parser. Was being slightly incompatible with all existing SQL dialects a product requirement?

robbie-c•about 1 hour ago

Our SQL is very similar to ClickHouse SQL, in that we used ClickHouse SQL as a starting point as that's what our underlying DB is. We needed to have our own parser so that we could add additional language features on top.

-warren•30 minutes ago

I think that's exactly what indirectly happened. This guy didn't optimize the parser. Someone else did, years ago. That work was pulled into the LLM and made it look like magic.

sayrer•23 minutes ago

ha, try to keep going. Run it under samply and Gungraun (need AMD64 for this)

sam_lowry_•about 2 hours ago

Dunno about the parser, but you broke scrolling on your fancy website without noticing it also ;-)

sscaryterry•37 minutes ago

Good read, but "70x" is always misleading.

robbie-c•18 minutes ago

In what way? This was a geometric mean of the improvements from a small test corpus. In production, where it only parses longer SQL that didn't hit the parser cache, the mean parse time went down by 454x, across millions of parses.
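The two figures in this reply aggregate differently: a geometric mean of per-query speedups on a test corpus, and a ratio of mean parse times in production. A minimal sketch with made-up timings (not PostHog's data) shows how far apart the common summaries can land on the same measurements; `speedup_summaries` is a hypothetical helper.

```python
from statistics import fmean, geometric_mean


def speedup_summaries(old_ms, new_ms):
    """Summarize per-query timings three common ways.

    old_ms and new_ms are parallel lists of parse times for the same queries.
    """
    per_query = [o / n for o, n in zip(old_ms, new_ms)]
    return {
        # Geometric mean of per-query ratios: every query counts equally.
        "geomean_of_ratios": geometric_mean(per_query),
        # Arithmetic mean of ratios: pulled up by the largest speedups.
        "mean_of_ratios": fmean(per_query),
        # Ratio of mean times: weighted toward the queries that were slowest
        # before, i.e. "how much total parse time did we save".
        "ratio_of_means": fmean(old_ms) / fmean(new_ms),
    }


# Made-up timings, purely illustrative.
old_ms = [3.0, 12.0, 500.0]
new_ms = [1.0, 2.0, 5.0]
for name, value in speedup_summaries(old_ms, new_ms).items():
    print(f"{name:>18}: {value:6.2f}x")
```

None of the three is wrong; they answer different questions. The geometric mean treats every query as equally important, while the ratio of mean times tracks how much total parse time was saved, which is closer to what matters for uncached production traffic.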

duke_of_vandals•about 1 hour ago

How long did this take?

robbie-c•about 1 hour ago

It took about 2 days to get a proof of concept, and about a week to get something I could ship to production.

I skipped a few features for the PoC (like XML tag support, token positions), so most of the delta was adding those back in!

duke_of_vandals•37 minutes ago

If you didn’t need to look at the code at all, why not write it in asm instead of Rust, and make it even faster?

robbie-c•34 minutes ago

Ha, I did consider that! But 70x is plenty fast enough (we still have to query an actual database!), and the parser runs in a shared process on untrusted input, so it wasn't worth the security risk.

orsorna•22 minutes ago

About 1/1000 of the duration of their interview process where they gloat about wasting your time.

CrzyLngPwd•21 minutes ago

"I didn't rewrite"

speedgoose•41 minutes ago

There is no such thing as a legally-required cookie banner. You can read the GDPR, or ask an LLM to read it for you if you can’t read anymore.

sscaryterry•35 minutes ago

This is true and unkind at the same time.

elmean•about 1 hour ago

why do I need to download multiple bibles' worth of JavaScript to read a blog post