Discussion (500 Comments)
Much more verbose, closer to programming than shell scripting. But fewer flags to remember.

Not meaning to offend anyone: Nix is cool, but adds complexity. And as a disclaimer: I used jujutsu for a few months and went back to git. Mostly because git is wired in my fingers, and git is everywhere. Those examples of what jujutsu can do and not git sound nice, but in those few months I never remotely had a need for them, so it felt overkill for me.
The most frequent "complex" command I use is to find commits in my name that are unsigned, and then sign them (this is owing to my workflow with agents that commit on my behalf but I'm not going to give agents my private key!)
I hadn't even spared a moment to consider the git equivalent but I would humbly expect it to be quite obtuse.

You can find this pattern again and again. How many redditors say 120fps is essential for gaming or absolutely require a mechanical keyboard?
I don't get the mechanical keyboard one, though. I am fine with any keyboard, I just like my mechanical keyboard at home. Just like I am fine with any chair, but ideally I would have a chair I like at home.
120fps I have no experience with, but I would imagine it's closer to video quality. Once you're used to watching everything in 4K, probably it feels frustrating to watch a 1080p video. But when 4K did not exist, it was not a need. I actively try to not get used to 4K because I don't want to "create the need" for it :-).
Those don't fit the others - they don't really require the user to adapt or are more complex in any way but are just nicer versions of the standard.
That's not been my experience with jj, which, after the initial hurdle, is a breeze.
I don't use aliases, I guess I'm insane?
Also 99.9% of the time, git "just works" for me. If I need to do something special once a year, I can search for it. Like I would with jujutsu.
Managing a flake.nix can be a bit more complex than a package.json in practice, due to the flexibility of the format and some quirks around Nix's default caching behavior, but working with it is a breath of fresh air compared to relying on globally installed tools. Having said that, you might want to check out Devbox. I haven't used it myself, but found it recently and thought it looked like a nice abstraction over raw Nix.
At this point perhaps a million person-years have been sacrificed to the semantically incoherent shit UX of git. I have loathed git from the beginning but there's effectively no other choice.
That said, the OP's commands are useful, I am copying them (because obviously I won't ever memorize them).
I wrote a cheat sheet in my notes of common commands, until they stuck in my head and I haven't needed it now for a decade or more. I also lean heavily on aliases and "self-documenting" things in my .bashrc file. Curious how others handle it. A search every time I need to do something would be too much friction for me to stand.
Yes! We mostly wouldn't tolerate the complexity and the terrible UX of a tool we use every day--but there's enough Stockholm Syndrome out there that most of us are willing to tolerate it.
Partly that, but for me at least, I have a bunch of simple bash scripts and aliases for things I do frequently. Git makes this really easy because you can set aliases for lots of custom commands in the .gitconfig file.
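For example (illustrative alias names, not anyone's canonical set):

[alias]
    lg = log --oneline --graph --decorate --all
    authors = shortlog -sn --no-merges

Then `git lg` and `git authors` just work.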
FWIW I too was once a "memorised a few commands and that was it" type of dev, then I read 3 chapters of the Git book https://git-scm.com/book/en/v2 (well really two, the first chapter was a "these are things you already know") and wow did my life with git change.
Or, perhaps better yet, defining your own functions/helpers as you go for things you might care about, which, by virtue of having been named you, are much easier to remember (and still compose nicely).
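A minimal sketch of the kind of helper I mean (hypothetical name, goes in .bashrc):

# who has touched a path the most, descending
git-owners() { git log --format='%an' -- "${1:-.}" | sort | uniq -c | sort -rn | head; }

`git-owners src/` then composes fine with the usual pipeline tools.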
Don't feel bad - no one remembers them all, we just remember a few idioms we use...
We can't.
Why do you think the `man` command exists?
But jq I use maybe once a week, and it just won't stick. Same for any git features beyond basic wrangling of the history tree (but, on the flip side, that basic wrangling has eliminated 99% of the times I have to look things up).
[1] https://github.com/jacobdeichert/mask
If you have multiple machines (/must have), just apply your user config to current machine?
and when a new branch is one keystroke away from being created, I am more inclined to use it.
And yes, I'm also ecstatic when I manage to iterate over anything in `jq` without giving up and reaching for online reference. For `git`, functionality I use divides neatly into "things I do at least every week or two" and "things that make me reach for the git book every single time".
I mean, that was true until a year or so ago. Now, I just have an LLM on speed dial. `howto do xyz in $tool`, `wtf is git --blah`, `oneliner for frobbing the widget`, etc.
To me, the verbosity of both Git and JJ commands to do these things is an indication that neither of these tools is meant to do them.
I often feel I need to setup bots to make superfluous commits just to make it look like my useful and stable repos are “active”
One example (not mine) is a QR-code generator library. Hasn't been updated in 10 years. It's perfect as is. It just provides the size and the bits. You convert those bits to any representation you want. It has no need to be updated.
It's not impossible, of course, but if I saw even a QR library that hadn't changed in 10 years I would worry that it wouldn't build on current systems (due to dependencies) and that nobody was actually using it (due to lack of bug reports).
A QR (or barcode) library is exactly the type of thing I’d assume would still work fine, since there’s nothing new to do, the parsing rules don’t change, it’s a static, known, solved problem.
Agreed. Assuming there are no open issues and PRs. When I find a project, if the date of the last commit is old, I next look at the issues and PRs. If there are simple-to-deal-with issues (e.g. a short question or spam) and easy-to-merge PRs (e.g. fixing a typo in the README) which have been left lingering for years, it’s probably abandoned. Looking at the maintainer’s GitHub activity graph should provide more clues.
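If the project is on GitHub, a quick terminal version of that triage (assuming the `gh` CLI is installed):

gh issue list --state open --limit 10
gh pr list --state open --limit 10

Years-old trivial items sitting at the top of either list are the tell.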
> I often feel I need to setup bots to make superfluous commits just to make it look like my useful and stable repos are “active”
I have never done it, but a few times thought about making a “maintenance” release to bump the version number and release date, especially since I often use a variant of calendar versioning.
A language that properly maps to the data model, and has readable identifiers is a boon. Git is a database, a database needs a proper query language.
Disclaimer: I love jq too :)
`git log --color --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr) %C(bold blue)<%an>%Creset' --abbrev-commit --`
Which is something I see a lot of people alias in Git for viewing logs.
If you remove the rainbow specification, it should be `git log --graph --pretty=format:'%h -%d %s (%cr) <%an>' --abbrev-commit`
And most programmers are used to C style formatting.

All joking aside, it really is a chronic problem in the corporate world. Most codebases I encounter just have "changed stuff" or "hope this works now".
It's a small minority of developers (myself included) who consider the git commit log to be important enough to spend time writing something meaningful.
AI-generated commit messages help with this a lot, if developers would actually use them (I hope they will).
If I joined a company where people committed their code with "stuff" or "made some changes" or "asdfhlfo;ejfo;ae," that would be a red flag that I might have joined the wrong company, and I'd start to wonder what else the developers here do carelessly.
These things never take much time but people dismiss them because of that. Because each commit and each comment in isolation isn't very valuable but they are very helpful in aggregate. I'm not sure why this bias exists though, since the same is true for lines of code. It's also true about a ton of things. All the little things add up. Just because it's little now doesn't mean it's not important
More likely you'd already know by this point because it would be staring you in the face
So we told him to commit at least once every day, with a relevant commit message, or else fail his internship.
He worked 21 more days. There were 21 commits: "17:00, time to go home".
The thing with commit messages is that they are mostly never going to get a lot of scrutiny. But there are exceptions to this; especially if there are audits involved or external contributors. And of course when making a pull request to an OSS project, it is good form to make a bit of an effort. It depends on the context. I tend to focus more on diffs and pull requests. Not on the cryptic one liners that may or may not describe some of the changes. The right unit of change is a pull request, not individual commits.
And all this of course was when I was still able to keep on top of the massive volume of change. With AI that's simply no longer the case and the volume of change is only going to increase over time. Human reviews are now the main bottleneck to getting code merged. AIs probably should be doing a lot of the reviewing, gatekeeping testing, vetting, etc. Especially when AIs also produce most of the change. It's likely a lot of things will slip through unless you get your house in order on guard rails and process that your AI agents need to follow. As a CTO, guarding quality without becoming a human bottleneck is now the main challenge and removing bottlenecks responsibly is part of the job.
BTW, making AI tools write good commit messages is actually a bit expensive. Many AI tools default to just summarizing the first message of a chat session under the assumption that just one thing changed over the course of a session. Making the AI look at the actual diff is of course possible and not that hard (just ask). And it definitely yields better descriptions when you do that. But it also takes more time and the token cost goes up as well. I'm not sure that's actually worth the expense in tokens. I tend to not bother with this. But again; depends on the context.
In the company I work for, there is a team that has isolated itself to some extent from other teams and works at a furious pace to keep their particular section of the business happy. We're lucky enough that they spun up their own repo to do their work on, so they don't actually impact other teams, but if the quality of the commit messages is anything to go by, I am 100% certain they're going to end up in a huge mess, if they aren't already. The team lead encourages this, and certainly doesn't care about commit messages etc.
Developers who care about other developers tend to write better quality code, because they care what other developers think of them. If you care about other developers, you will most likely write decent quality commit messages too.
I have seen over many years the types of developers who only care about moving their own code into production as fast as possible and getting to the next thing. There is a very high correlation with a mess at the end, which they inevitably won't have to tidy up because they'll be doing the next thing. These types of developers hate owning stuff in production, so they don't do it, so they don't actually care how maintainable their "clever" code is. I am very certain that a number of people reading this will be those types of developers.
As a manager, one of the first things I do is make sure that the PR titles (the PR text becomes the commit messages in squash-merging workflows) at minimum begin with a ticket number. Then we can later read both the intention and the commentary on it.
To the point of other commenters however, I wouldn't lay something this micro at the foot of the CTO in all but the smallest of organizations.
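A minimal way to enforce that at commit time is a commit-msg hook; a sketch (the ABC-123 pattern is just an example):

#!/bin/sh
# .git/hooks/commit-msg: git passes the path to the message file as $1
head -1 "$1" | grep -qE '^[A-Z]+-[0-9]+' || {
    echo "commit message must start with a ticket id, e.g. ABC-123" >&2
    exit 1
}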
(This is not an endorsement to do that, he's a good developer in spite of his shitty commit messages)
So, before AI came and saved me from writing commit messages, I had an alias that ran the whole git add . && git commit -m ... && git push with a fixed commit message. But of course we had squashing, so the PR title was the one to eventually go there, so maybe that part is fine. But all my side projects had just that.
I thought we were talking about good commit messages.
I understand this isn't inline with traditional git scm, but it's a very powerful workflow if you are OK with some hybridization.
The gold standard is rebased linear unsquashed history with literary commits, but I'll take merged and squashed PR commits with sensible commit messages without complaint.
The parent comment stated "Most codebases I encounter just have "changed stuff" or "hope this works now"." I worked at 6 tech companies in my career and a slew of contracting gigs, and I literally never encountered the problem of commit messages being uninformative. Most of the companies developed strict rules for commit comments like always including an issue number (with occasional [NO-ISSUE] tags allowed for minor changes) or something like Conventional Commits, https://www.conventionalcommits.org/en/v1.0.0/ .
And I really like that because it leaves room to let the developer do whatever kind of commit messages they want that make sense to them. Because nobody's really ever going to read those again after it's squashed and merged.
They might not include anything but the Jira ticket number, if the environment is truly lacking.
Is it really a small minority? I have never worked on a project that didn't have commit messages that at least tried to be descriptive (sometimes people fail at it but it's very different to an outright "changed stuff").
I don't remember any friend mentioning to me them encountering a work project where the messages were totally neglected either.
The people who don't write commit messages for us are the non developers.
Writing commit messages shouldn't take any time at all. If it does, then you probably have a range of other professional issues.
Non-Core repos are absolute free-form anything goes. You can commit 8 times a day without a PR as long as you're not deploying to production. This is excellent for things like test harnesses, CI/CD nonsense, one-off experiments, demos, prototypes, etc.
My last Core commit had something like 20 to 1 ratio between lines of commit message to lines of code (small change touching something deep that required a lot of explanation). My last non-Core commit message was "hope this works" (it did not).
I have actually seen proper capitalization and correct conventional-commit types correlate very well with the author being intentional and the patch being of good quality.
e.g.
- (a) chore: update some_module to include new_func
- (b) feat: Add new_func to handle XYZ case
Where:
(a) is not a chore, as it changes functionality, is uncapitalized and is so low-signal I can probably write a 10 line script to reliably generate similar titles.
(b) is using the correct "feat" commit type, is capitalized and describes what this is for. I expect the body to explain "why", as well, and not to reiterate the "how" in natural language.
This is just my experience, but I've seen that commit messages where people actually put in some effort usually come with a good patch, and vice versa.
I have been told off several times at different jobs for writing commit messages that are "too big". Also for writing too much commentary in my code changes. Also also for complaining that other people aren't doing these things.
(Not that it stops me, mind, but it does make working relationships fractious.)
I work on a lot of 2D systems, and the only way to debug is often to plot 1000s of results and visually check it behaves as expected. Sometimes I will fix an issue, look at the results, and it seems resolved (it was present in, say, 100 cases) only to realize that actually there are still 5 cases where it is still present. Sure, I could amend the last commit, but I actually keep it as a trace of "careful, this first version mostly did the job but actually not quite".
main branch is advanced on PR level, with squashed commits.
So the "." should never make it to main, and have PR description as commit message.
Showing my Git ignorance here, of course - does "ancestors(trunk)" pull in all the commit messages?
relying on git commit messages assumes they're correct by convention since there is no technical constraint to enforce it. and it assumes no work in progress commits, sometimes it's just necessary to hit the save button real quick or move a workspace from one device to another.
my point is: git is a way of storing and loading files at its core.
git log --oneline and a sprinkle of your personal sauce on .claude goes a long way :)
For small personal projects I often write one phrase messages with `-m`, but if you're working with other people you should be writing good commit messages.
Random, subjective, or written in a state of mental exhaustion commit messages.
I also love the switcheroo the author made: git not logs. But hey :)
> git shortlog -sn --no-merges
Is the most egregious. In one codebase there is a developer's name at the top of the list who outpaced the number 2 by almost 3x the number of commits. That developer no longer works at the company? Crisis? Nope, the opposite. The developer was a net-negative to the team in more ways than one, didn't understand the codebase very well at all, and just happened to commit every time they turned around for some reason.
This leaves developers to commit locally and comment as much or little as they like.
As you identified there is a lot of parallel sentence structure.
They also have a bunch of lists of 3-5 items which is a classic.
It doesn't have the anodyne, rambling, never-getting-to-the-point style common to LLMs
Also important to have a plan for code review and merges when a lead is out of town. I found out I needed to reevaluate how that's handled when I was gone for two weeks.
If you look at a code base that's not really in trouble, these commands don't reveal the source of the trouble, because there might be none.
Assuming I'm not ego-mad, I like to think this is because I built the project from the ground up before handing it over to the rest of the team.
These days other people commit more often than I do, but my name is still dominant, and probably will be for some time.
If someone came to me and said "I ran these and I see XXX was the most prolific committer and they left X months ago, what will we do???" I'd have to work hard not to laugh.
Since these snippets are self-described as ways to get familiar with the code/projects, I wanted to provide the counterpoint. Most of those snippets do not at all paint the real picture, and for all the repos I tested them on they paint the opposite of reality.
I know these codebases like the back of my hand, the purported purpose of these snippets is to better understand the codebase, I can tell you they don't work for anything I tested them on. Maybe they work for other codebases but the sample size I have access to says they don't work for me.
Modern problems require modern solutions and all...
The most changed file is the one people are afraid of touching?
Generation is precompilation.
Entry point is interesting. Lots of scope to become a scary file.
I ran this on the repo I have open and after I filtered out the non code files it really can only tell me which features we worked on in the last year. It says more about how we decided to split up the features into increments than anything to do with bugs and “churn”.
Very different from just counting commits - https://vectree.io/c/delta-compression-heuristics-and-packfi...
It shows places that are problematic much better. High churn, low complexity: fine. It's recognized and optimized for, since it's worked on a lot (e.g. some mapping file, a DSL, business rules, etc.). Low churn, high complexity: fine too. It's a mess, but no one has to be there. But both? That's probably where most bugs originate, where PRs block, where test coverage is poor and where everyone knows time is needed to refactor.
In fact, quite often I found that a team's call "to rewrite the app from scratch" was really about those few high-churn-high-complexity modules, files or classes.
Complexity is a deep topic, but even simple checks, like how nested something is or how many statements it has, can do.
otherwise you're right, it could be a long linear list of appends where people are happy to contribute.
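A rough shell sketch for eyeballing that quadrant, using current line count as a (very) crude complexity stand-in:

git log --format= --name-only | sort | uniq -c | sort -rn | head -20 |
    while read n f; do [ -f "$f" ] && echo "$n commits, $(wc -l < "$f") lines: $f"; done

Anything near the top on both counts is a candidate for the high-churn-high-complexity bucket.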
Nobody is afraid of changing it.
Edit. https://github.com/mattrighetti/dotfiles/blob/master/.gitcon...
git log -i -E --grep="\b(fix|fixed|fixes|bug|broken)\b" --name-only --format='' | sort | uniq -c | sort -nr | head -20
I have a project with a large package named "debugger". The presence of "bug" within "debugger" causes the original command to go crazy.
git log -i -P --grep="\b(fix|fixed|fixes|bug|broken)\b" --name-only --format='' | sort | uniq -c | gsort -nr | head -20
LLM's make it even easier; "Commit all the outstanding code in as many commits as you can, as long as the tests pass after each one". (Sometimes that second clause is omitted, too.)
In my experience, when the team doesn't squash, this will reflect the messiest members of the team.
The top committer on the repository I maintain has 8x more commits than the second one. They were fired before I joined and nobody even remembers what they did. Git itself says: not much, just changing the same few files over and over.
Of course if nobody is making a mess in their own commits, this is not an issue. But if they are, squash can be quite more truthful.
Another one I do, is:
This way I can see right away which branches are 'ahead' of the pack, what 'the pack' looks like, and what is up and coming for future reference ... in fact I use the 'gss' alias to find out what's going on, regularly, i.e. "git fetch --all && gss" - doing this regularly, and even historically logging it to a file on login, helps see activity in the repo without too much digging. I just watch the hashes.

If the commit frequency goes down, does it really mean that the project is dying? Maybe it is just becoming stable?
git log --format='%ad' --date=format:'%Y-%m' | sort | uniq -c | awk '{printf $2" "; for (i=1;i<=$1;i++){printf "-";} print ""; }'
If you work with service oriented software, the projects that are "dying" may very well be the most successful if it's a key component. Even from a business perspective having to write less code can also be a sign of success.
I don't know why this was overlooked when the churn metric is right there.
Whenever we initiated a new (internal) SW project, it had to go through an audit. One of the items in the checklist for any dependency was "Must have releases in the last 2 years"
I think the rationale was the risk of security vulnerabilities not being addressed, but still ...
I (largely) wrote a corporate application 8 years ago, with 2 others. There was one change 2 years ago from another dev.
Lots of programs are functionally done in a relatively short amount of time.
"Accelerating or Dying" sounds like private equity's lazy way to describe opportunity, not as a metric to describe software.
I know people win the lottery every week, but I also believe that buying a lottery ticket is essentially the same as losing. It's the same principle.
Also, very tangentially, to the notion of the Developer's Legacy Index: https://www.javaadvent.com/2021/12/using-jgit-to-analyse-the...
Let's NOT jump to conclusions; it could mean many things. For example, a period with other priorities, different urgencies, other issues external to the project itself and beyond our control, vacations, illnesses, or anything else that could impact the commit history.
I think these considerations and the others expressed in this article can easily lead to hasty conclusions and erroneous deductions, too simplistic.
Coding flow, like business needs, cannot always be objectively and deterministically measured.
git clone --depth 1 --branch $SomeReleaseTag $SomeRepoURL
If you only want to build something, it only downloads what you need to build it. I've probably saved a few terabytes at this point!
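(And if you later decide you want the full history after all, `git fetch --unshallow` should backfill it in place.)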
[1]: https://github.com/marmelab/ArcheoloGit
Most good projects end up solving a problem permanently, and if there is no salary to protect with bogus new features, it is then to be considered final?
> How Often Is the Team Firefighting
> git log --oneline --since="1 year ago" | grep -iE 'revert|hotfix|emergency|rollback'
> Crisis patterns are easy to read. Either they’re there or they’re not.
I disagree with the last two quoted sentences, and also, they sound like an LLM.
The grep for bugs is not particularly comprehensive: it will pick up some things that aren't bugs, and will miss a bunch of things too.
The "project accelerating or dying" seems odd to me. By definition, the bulk of commits/changes will be at the very beginning of history. And regardless, "stability" doesn't mean "dying".
This list is also one of many arguments for maintaining good Git discipline.
For the "bus factor", there's one guy and then there's me, but I stopped being a primary contributor to this project nearly two years ago, lol.
Well, isn't it typical that the person who wrote the code is also the person that merged it? I have never worked in a place where that is not the norm for application code.
Even if you are one of those insane teams that do not squash merge because keeping everyone's spelling fixes and "try CI again" commits is important for some reason, you will still not see who _wrote_ the code, you will only see who committed the code. And if the person that wrote the code is not also the person that merges the code, I see no reason to trust that the person making commits is also the person writing the code.
I've got my Emacs set up to display next to every file that is versioned the number of commits that file has been modified in (for the curious: using a modified all-the-icons-ivy-rich + custom elisp code + custom Bash scripts I wrote, and it's trickier than it seems to do in a way that doesn't slow everything down). For example in the menu to open a file or open a recently visited file etc.: basically in every file list, in addition to its size, owner, permissions, etc. I also add the number of commits if it's a versioned file.
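The count itself is the easy part; a one-line sketch of the kind of script involved ($file being the path in question):

git rev-list --count HEAD -- "$file"

The hard part, as said, is caching the results so every file listing doesn't fork a git process per file.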
I like the fix/bug/broken search in TFA to see where the bugs gather.
The takeaway from my experiment is that you can really tell a lot by how / when / what people commit, but conclusions are very hard to generalize.
For example, I've also stumbled upon the "merge vs squash" issue, where squashes compress and mostly hide big chunks of history, so drawing conclusions from a squashed commit is basically just wild guessing.
(The author of course has also flagged this. But I just wanted to add my voice: yeah, careful to generalize.)
¹ Nothing is ever finished.
and it touches in detail what exactly commit standards should be, and even how to automate this on CI level.
And then I also have an idea/vision of how to connect commits to actual product/technical/infra specs, and how to make it all granular and maintainable, and also IDE support.
I would love to see any feedback on my efforts. If you decide to go through all 3 posts I wrote, thank you
Squash-merge workflows are stupid (you lose information without gaining anything in return, as it was easily filterable at retrieval anyway) and are only useful as a workaround for people not knowing how to use git. But git stores the author and committer names separately, so it doesn't matter who merged; what matters is whether the squashed patchset consisted of commits with multiple authors (and even then you could store that with Co-authored-by trailers, but those are harder to use in such oneliners).
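You can see the two fields side by side with, for example:

git log --format='%h  author:%an  committer:%cn'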
With a squash merge one PR is one commit, simple, clean and easy to roll back or cherry-pick to another branch.
1. Open PR page in whatever tool you're using
2. Read title + description to see what's up
3. Swap to diff and start reading through the changes
4. Comment and/or approve
I've never heard anyone bothering to read the previous commit messages for a second, why would they care?
For other cases, you lose the information about why things are this way. It's too verbose to // comment on every line with how it came to be this way, but on (non-rare in total, but rare per line) occasions it's useful to see what the change was that made the line be like this, or even just who to potentially ask for help (when >1 person worked on a feature branch, which I'd say is common)
I use it like that too and yet the reviewers don't get to see these commits. Git has very powerful tools for manipulating the commit graph that many people just don't bother to learn. Imagine if I sent a patchset to the Linux Kernel Mailing List containing such "fix typo", "please work now", "wtf" patches - my shamelessness has its limits!
Can you explain to me what an avid squash-merger puts into the commit message of the squashed commit composed of commits "argh, let's see if this works", "crap, the CI is failing again, small fix to see if it works", and "pushing before leaving for vacation" ?
Usually pretty close to what the PR title + description are actually, just without the videos and screenshots.
Example:
feat(ui): Add support for tagging users
* Users can be tagged via the user page
* User tags visible in search results (configurable)
etc..
I don't need to spend extra time cleaning up my git commits and force-pushing on the PR branch, losing context for code reviews etc. Nor does anyone have to see my shitty angry commits when I tried to figure out why Playwright tests ran on my machine and failed in the CI for 10 commits.
These are all bad commits IMHO. Aside from the CI one, I understand that message. I have commits like that on personal projects but for professional projects I'd be frustrated if people were committing messages like that.
Personally I'm a "one commit" type of guy, I don't like committing things in a broken state even on a side branch unless I have to (to share the code or test a CI). Occasionally I will make multiple commits at the very end to make review easier, or once I have everything working but I want to try something different. But I have a bunch of options for saving code that don't involve committing:
- Stash
- Shelve (IDEA)
- Backblaze
- Time Machine
- Local History (IDEA)
The idea of committing WIP before leaving for a vacation just feels so wrong to me.
I once worked for someone who wanted developers to commit code before the end of every day as a safety measure. His reasoning was in case the developer's computer died or similar. I found that silly at the time and still do now. That's what backups are for, I dislike when people use git as a backup like that in a professional setting.
Part of a new feature that you had working in an intermediate commit, but that broke somewhere along the way and is not working in your last commit when you squashed.
If you catch it early enough, I suppose it's in your reflog, but otherwise you're screwed.
It sounds like a silly example, but I bet most developers have run into this at some point.
With mercurial/jujutsu, you get the best of both worlds: The "argh, let's see if this works" commits are what I call "microcommits", and the squashed versions are the real/public commits. With jujutsu, you get both. Your log shows only the "real" commits (equivalent of squashing all the commits between that and the prior "real" commit). But if you want to drill down into the microcommits, the information is always there.
Let's acknowledge the reality. Many people use git not just for version control, but for backup ("Let me commit this so I don't lose it"). Let's ensure the VC tool supports both and doesn't force you to pick one over the other.
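For reference, a rough sketch of that microcommit loop in jj (from memory, so treat as a sketch and double-check the details):

jj new -m 'feat: the real, public change'         # start the change that will appear in the log
jj new -m 'wip: argh, let us see if this works'   # microcommit on top
jj squash                                         # fold the microcommit back into its parent

And jj's operation log (jj op log) keeps the intermediate states recoverable even after squashing.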
If you update a PR with review feedback, you shouldn’t change existing commits because GitHub’s tools for showing you what has changed since your last review assume you are pushing new commits.
But then you don’t want those multiple commits addressing PR feedback to merge as they’re noise.
So sure, there’s workflows with Git that doesn’t need squashing. But they’re incompatible with GitHub, which is at least where I keep my code today.
Is it perfect? No. But neither is git, and I live in the world I am given.
And to counter some specific points:
* In a github PR, you write the main commit msg and description once per PR, then you tack on as many commits as you want, and everyone knows they're all just pieces of work towards the main goal of the eventually squashed commit
* Forcing a clean up every time you make a new commit is not only annoying extra work, but it also overwrites history that might be important for the review of that PR (but not important for what ends up in main branch).
* When follow up is requested, you can just tack on new commits, and reviewers can easily see what new code was added since their last review. If you had to force overwrite your whole commit chain for the PR, this becomes very annoying and not useful to reviewers.
* In the end, squash merge means you clean up things once, instead of potentially many times
I'd say that optimizing for reviewers who don't know "git range-diff" exists has even less merit than optimizing for contributors who don't know how to edit their commit graph.
I'm regularly using GitHub, GitLab, Gerrit, Phabricator and mailing lists and while some make it more pleasant than others, none of them really put any actual obstacles on your path when manipulating the commit graph mid-review.
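For anyone who hasn't seen range-diff, the shape of the command (old-tip and new-tip being the before and after of the force-push):

git range-diff main old-tip new-tip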
If you want to dig down into the subcommits from a merge, then you still can. This is useful if you are going back and bisecting to find a bug, as those individual commits may hold value.
You can also cherry pick or rollback the single merge commit, as it holds everything under it as a single unit.
This avoids changing history, and importantly, allows stacked PRs to exist cleanly.
I think this is all Github's fault, in the end, but I think we need to get Github to change and until then will keep using squash-merges.
They're not noise, they tell future maintainers why something is the way it is. If it's done in an unusual way they can see in the blame that a couple of lines were changed separately from the rest and immediately get an explanation by checking that commit.
Yeah, I can imagine it being annoying that squashing in that case wipes the author attribution, when not everybody is doing PRs against the main branch.
However, calling all squash-merge workflows "stupid" without any nuance.. well that's "stupid" :)
Also I can understand not squashing if the contribution comes from outside the organization. But in that case, I would expect a cleaned up history. But if every contribution is from members of the team, who can merge their own PR, squash merge is an easy way to get a clean history. Especially when most PR should be a single commit.
For really large PRs, I’m more inclined to agree with you, but those should probably have their own small-PR-and-squash-merge flow that naturally cleans up their git history, anyway.
I categorically disagree that squash-merge is “stupid” but agree there are many ways to skin this cat.
(and if you hadn't squashed on merge, short-lived branches wouldn't even "disappear" in the first place as you would still see them decorated in "git log" on commits that were merged in)
If you've worked on a large team without squashing and without increasing frustration I'd be greatly interested to hear about it.
That probably isn’t a good sign
Plus, adding an extra point: When you run git log --oneline --graph and the pattern on the left is more complex than Persian carpet patterns or the Ancient Egyptian writings in the Great Pyramid of Giza, you know it's an engineering & process quality issue rather than the code itself...
What a weird check and assumption.
I mean, surely most of the "20 most-changed files" will be README and docs, plus language-specific lock-files etc. ?
So if you're not accounting for those in your git/jj syntax you're going to end up with an awful lot of false-positive noise.
You're right about package.json, pnpm-lock etc though, but those are easy to filter out if the project in question uses them.
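e.g. with git's exclude pathspecs (extend the list to taste):

git log --name-only --format= -- . ':(exclude)*.lock' ':(exclude)package-lock.json' | sort | uniq -c | sort -rn | head -20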
You're right, perhaps I should have said CHANGELOG etc.
Although some projects e.g. bump version numbers in README or add extra one-liner examples ....
This post is about exploring code, not documentation. Nobody is going to warn you about the README unless it is super outdated.
Of course the two most useful ones would never be useful in the code base I am currently working on.
"fix" might be the single most common commit message, and after that comes "."
Trying for two years, to get people to include at least some information in their commit messages, has exhausted me.
A truth no one wants to hear. Everyone in denial.
Anyway, I can glean a lot of this information in a few minutes scrolling through and filtering the log in magit, and it doesn't require memorizing a bunch of command line arguments.
For most, I added some filters and slightly changed the regex, and it showed the reality of the codebase (I already knew the reality, I just wanted to see if it matched, and it did).
On main:
2020-01-01: "Changes"
2020-01-05: "Changes"
2020-01-06: "merge <ref to jira/gh issue>"
2020-01-07: "revert <ref to unrelated jira/gh issue from 2 yrs ago>"
Then there’s the people that include merge commits despite agreeing on rebasing.
Occasionally see sprinkles of decent, consistently formatted commit messages.
I think this is only useful on medium to large _open source_ projects. Clearly established CONTRIBUTING.md/README.md and commit formatting/merging guide.
https://gist.github.com/aeimer/8edc0b25f3197c0986d3f2618f036...
I abhor squash merging for this and a few other reasons. I literally have to go out of my way to re-check out a branch. Someone who wants to use my current branch cannot do so if I merge my changes a month later, because the squash rewrites history, and now git is very confused. I don't get the obsession with "cleaning up the history" as if we're all always constantly running out of storage over 2 more commits.
And GitHub at least sets the author of the squashed commit as the one who opened the PR, not the one who merged it.
I can definitely see where it wouldn't work well for other workflows but I've had it work well on several teams and it seems easier than trying to get everyone to clean up their commits into nice, clean, well-titled histories before putting up a PR.
I wouldn't describe it as "cleaning up the history". And the goal isn't to save space, it's to keep a linear history where things ought to be working at each commit (to enable tools like git bisect and similar).
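For reference, the loop that a working-at-every-commit history enables (v1.2.0 and ./test.sh are placeholders):

git bisect start HEAD v1.2.0    # bad tip, then last known good
git bisect run ./test.sh        # let git drive the search
git bisect reset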
I personally don't ensure everything is working every time I commit - That's what CI is for. The exact process I work through while writing a PR shouldn't impact other people's workflows, so when I merge back into a central branch it should really only reveal the granularity at which I assert 'this code is working and good', which is NOT every intermediate commit I make. Squash merge is a way to do that that fits nicely with existing engineering workflows, like code review.
The git blame tip is underrated. People treat it like a gotcha tool but it's maybe the fastest way to find the PR/ticket that explains a weird decision.
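A variant worth knowing (-w ignores whitespace noise, -C follows copied/moved code, -L restricts the line range; the path is a placeholder):

git blame -w -C -L 100,140 src/thing.c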
Thank you.
Diagnostics function, colorized (I tried to add guards so it is portable with terminals that do not support color):
Uncolorized diagnostics function (same, but without the colors):

*Mainline linux*
Most changed files: pretty much what I expected for 1 and 2... the "cutting edge" of Linux development over other OSes -- bpf and containers. The bpf verifier and AMD GPU driver might get a boost in this list due to sheer LoCs in those files (26K and 14K respectively). An intel equivalent of amdgpu_dm is #21 in the list (drivers/gpu/drm/i915/display/intel_display.c) and nvidia is nowhere to be seen (presumably due to out-of-tree modules/blobs?).
Bus factor: obviously none.

Buggy files: The top 4: Intel comes out on top of GPU drivers this time (twice), along with KVM for x86(64), the main allocator, and BTRFS.

*GCC*

Most changed files: IR autovectorization code, riscv heuristics tables, and C++ template handling (pt.c is "parameterized types").
Buggy files: DWARF debuginfo generation, x86 heuristics tables, RS6000(?!) heuristic tables. I had to look up RS6000, it's an IBM instruction set from the 90s lol. cp-tree.h is an interesting file, it seems to be the main C(++) AST datastructures.

*xfwm4*

Most changed files: the list is dominated by *.po localizations. I filtered these out. Even after this, I discovered there is very little active development in the last few years. If I extend to 4 years ago, I get:

1. src/client.c - Realizing this project is too "small" to glean much from this. client.c is just the core X client management code. Makes sense.
2. src/placement.c - Other core window management code.

This has not told me much other than where most of the functionality of this project lies.
Bus factor: Pretty huge. Not really an issue in this case due to lack of development I guess.
Files with bug commits: Very similar distribution to most changed files. Not enough datapoints in this one to draw any big conclusions.

I think these massive open projects (excl xfwm) generally have pretty consistent code quality across the heavily trodden areas because of the amount of manpower available to refactor the pain points. I've yet to see an example of "god help you if you have to change that file" in e.g. linux, but I have of course seen that situation many times in large proprietary codebases.
Wtf is happening to this website
Look at the rest of this blog:
https://piechowski.io/post/
Almost no posts since 2020, then a swarm of LLM-style clickbait titles…
Oh wait, the 2020 one also has a clickbait title… and it was substantially rewritten in 2026!!
https://alexhans.github.io/
No need to read them, just a vibe check would be insightful. It's weird how branding, even before AI, had a lot of the same catchy patterns, and now it's hard to define what the right prose is (engineers might want one thing and other role families others), and sometimes you're trying to write almost with everyone else in mind, because those are the personal connections you link your "here's what I often repeat in written form" to.
git rm -rf .