
Discussion (55 Comments), from HackerNews
The lead developer didn't like to bother with formatting code, so I wrote a tool called makenice to format his nasty spaghetti gibberish into something with good indents and layout to make it easier for us normal people to parse.
He was furious, literally spun in circles about it right in the office in front of everyone, so I wrote makenasty to format code into the way he appeared to like.
I only shared makenasty/nice with a couple of the team, who loved it, as it allowed easy conversion between something readable and something the team lead liked.
He never knew about makenasty.
There are often limitations (like manually added indentation/spacing for alignment) but as long as you're very intentional about what changes you'll allow and have a good understanding of the language it can be an extremely safe operation.
I had to introduce a formatter in a few sizeable codebases in the past (few 100k to few million LOC), and I always did it incrementally via a script that reformatted all files that were not touched in any open PR. The initial run reformatted 95% of all files. Then I ran the script every day for ~two weeks and got up to 99.5% of all files, and after that I ran it manually each time one of the remaining ~dozen long-running WIP PRs was merged.
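A minimal sketch of the selection step in that kind of script, assuming the list of files touched by each open PR has already been fetched from the code host (how that is done is left out here). The function name and arguments are hypothetical, not taken from any real tool:

```python
from typing import Iterable


def files_safe_to_format(
    all_files: Iterable[str],
    open_pr_files: Iterable[Iterable[str]],
) -> list[str]:
    """Return the files that no open PR touches, in sorted order.

    all_files: every source file in the repo (hypothetical input).
    open_pr_files: one collection of touched paths per open PR (hypothetical input).
    """
    touched: set[str] = set()
    for pr_files in open_pr_files:
        touched.update(pr_files)
    # Only files outside every open PR are reformatted on this run.
    return sorted(path for path in all_files if path not in touched)


if __name__ == "__main__":
    repo = ["a.rb", "b.rb", "c.rb"]
    prs = [["b.rb"], ["c.rb", "d.rb"]]
    print(files_safe_to_format(repo, prs))
```

Re-running this daily picks up files as their PRs merge, which is how the share of formatted files could creep upward over time without creating conflicts in open work.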
[1] https://github.com/diffplug/spotless/tree/main/plugin-gradle...
[2] https://git-scm.com/docs/git-blame#Documentation/git-blame.t...
Unfortunately, I find that code bases lacking auto-formatting are often littered with non-functional changes, as developers temporarily instrument code, remove it, but leave whitespace changes behind.
In terms of tracking code changes, one really would have to rewrite the entire history with each commit reformatted.
My rough blueprint for introducing a formatter or linter nowadays would be:
- Recorded knowledge share session around how to set up the tools for local use 1-2 weeks before the initial rollout, and outline how the process will take place
- On the day of the initial rollout send out a reminder + the recording again
- Do the initial PR
- Incrementally do the rest of the migration, and subscribe to the PRs that drag out the process
The dart formatter has an internal sanity check. It walks through the unformatted and formatted strings in parallel skipping any whitespace. If any non-whitespace characters don't match, it immediately aborts. This ensures that the only thing the formatter changes is whitespace, and makes it much less spooky to run it blind on a huge codebase.
That sanity check has saved my ass a couple of times when weird bugs crept in, usually around unusual combinations of language features around new syntax.
(Unfortunately, the formatter in the past year has gotten a little more flexible about the kinds of changes it makes, including sometimes moving comments relative to commas and brackets, so this sanity check skips some punctuation characters too, making it a little less reliable.)
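The whitespace-only check described above can be sketched roughly as follows. This is an illustration of the idea in Python, not the Dart formatter's actual code, and the function name is made up. It covers only the strict version of the check and does not skip any punctuation:

```python
def only_whitespace_changed(before: str, after: str) -> bool:
    """Return True if the two strings are identical once whitespace is ignored."""
    i, j = 0, 0
    while True:
        # Advance each cursor past any run of whitespace, independently.
        while i < len(before) and before[i].isspace():
            i += 1
        while j < len(after) and after[j].isspace():
            j += 1
        # Both inputs exhausted together: every non-whitespace character matched.
        if i == len(before) and j == len(after):
            return True
        # One input ended early, or the characters differ: report a mismatch.
        if i == len(before) or j == len(after) or before[i] != after[j]:
            return False
        i += 1
        j += 1


if __name__ == "__main__":
    print(only_whitespace_changed("foo( a,b )", "foo(a, b)"))
    print(only_whitespace_changed("x = 1", "x = 2"))
```

Note that a check this simple also ignores whitespace inside string literals, so it would not notice a formatter bug that altered spacing within a string.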
$ find chromium-149.0.7826.1/ -name "*.cc" -exec cat {} + | wc
21640925 55715244 833460441
And that took less than 6 minutes on a single E5-2696 v3 from 2014:
$ time find chromium-149.0.7826.1/ -name "*.cc" | parallel -j 16 clang-format {} >/dev/null
real 0m5.666s
user 1m13.964s
sys 0m13.373s
That’s orders of magnitude faster, especially if we assume they’re not running their workloads on potatoes like mine. Is Ruby’s syntax really that much more complicated than C++, or is this a tooling problem?
https://research.google/pubs/why-google-stores-billions-of-l...
AI has been a huge problem here: the amount of code is just exploding. Quality of the produced code is another matter.
I recently wrote a very esoteric Python script. 100 lines of code. No classes, no functions, but yes argparse.
I've tried out the latest open source models on the task. They go bananas. It's like Enterprise fizzbuzz (https://github.com/enterprisequalitycoding/fizzbuzzenterpris...). They love classes and imports and reinventing the wheel. A great way for me to tell trash AI slop code is it'll define a useful constant then 15 lines later do it again with a different name.
They love making code that looks impressive. "Wow look at all the classes and functions. It's so scalable. It's so dynamic. It validates every minutiae against multiple schemas and solves a problem I never thought about." But it was trash code. One really was 400 lines and it didn't even look like it would work. Can't even imagine what it means for 4.5M moderately good human lines to become what? 27M fluffy filler repeat lines that don't even make sense?
PCI-related/vaulting code lived in its own locked-down repo. I think that was a mix of Go and Ruby.
Once you have the foundations in place for account balances and the ledger, processing a payment isn’t that daunting. Those foundations, however, took a lot to build and evolve.
Always nice to see. I've seen people fall into the trap of designing for the common case, not realizing most of the code will be to deal with the less common cases.
Why bother formatting 25m lines of slop, and why is AI wasting tokens on making code look human-readable anyway?
Terrifying.
shows you never worked at "big successful companies".
What I'm curious about is this: I don't suppose Stripe always had this many LOC, so as the codebase grew, did they ever look at newer languages such as Golang or Rust that might have been better suited to their work, and what was their thinking behind continuing to use Ruby?
Stripe has dabbled in Golang. There is also a growing Java monorepo.
Skippy the Intern, now retired these thirty years...
That insight might seem obvious - but if you stay cognizant of it as you work, you can invent some pretty amazing tooling for yourself & your team.