Ask HN: How do you separate intentional test boilerplate from real duplication?
This seems like a very hard problem to solve:
-Tests repeat the same scenario on purpose. A structural detector flags this as duplication, but tests are not something people want to delete from their codebases.
-Intentional repetition in tests ends up looking like unwanted code duplication, and the tools cannot tell which is which.
-One way to solve this would be a human in the loop (similar to how linters let the user accept a finding once, while keeping the default first run zero-config).
I wonder how you have seen this handled, and whether anyone has any ideas.
Here is the repo: https://github.com/Rafaelpta/dupehound
And here is the issue with more detail: https://github.com/Rafaelpta/dupehound/issues/23

Discussion (7 Comments)
In Rust or Go there are very clear test markers or filenames.
In JavaScript it would have to detect the framework in use, then detect test files and tests embedded in program files.
And so forth.
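A rough sketch of what the path-based part of that detection could look like. The patterns cover a few widely used conventions (Go's `_test.go` suffix, Jest-style `.test.js`/`.spec.ts` and `__tests__/`, the Maven/Gradle `src/test/` layout, Cargo's `tests/` directory); the function names are made up for illustration and this is not how dupehound does it. Tests embedded in program files, like Rust's `#[cfg(test)]` modules, need a look at the file content, which is only handled very naively here.

```python
import re

# Path conventions only. This list is illustrative, not exhaustive.
TEST_PATH_PATTERNS = [
    re.compile(r"_test\.go$"),                    # Go: files ending in _test.go
    re.compile(r"\.(test|spec)\.[cm]?[jt]sx?$"),  # JS/TS: foo.test.js, foo.spec.ts
    re.compile(r"(^|/)__tests__/"),               # Jest-style test directory
    re.compile(r"(^|/)src/test/"),                # Maven/Gradle source layout
    re.compile(r"(^|/)tests/[^/]+\.rs$"),         # Cargo integration tests
]

# Naive content check: also matches inside comments and strings,
# and misses framework attributes such as #[tokio::test].
RUST_TEST_MARKER = re.compile(r"#\[\s*(cfg\(\s*test\s*\)|test)\s*\]")


def looks_like_test_path(path: str) -> bool:
    """Guess from the path alone whether a file holds tests."""
    normalized = path.replace("\\", "/")
    return any(p.search(normalized) for p in TEST_PATH_PATTERNS)


def has_rust_test_marker(source: str) -> bool:
    """Detect tests embedded in a regular Rust source file."""
    return RUST_TEST_MARKER.search(source) is not None
```

The gap between the two functions is the problem described above: the path check is cheap but misses embedded tests, and the content check needs per-language knowledge.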
Are you doing any call sequencing heuristics? Like if the same 5 calls (with different args) appear in the same order in multiple places (even in test files) that might be a strong signal for deduplication. Or even if the same 5 calls are in the same order with a couple different interleaved calls - the fuzziness of the heuristic might be something tunable to a language, or particular codebase, or framework, etc.
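To make the idea concrete, here is a minimal sketch for Python sources using the standard `ast` module. It collects the call names of each function in source order, ignores the arguments, and compares functions by longest common subsequence, so a few interleaved extra calls do not break the match. The names `call_names`, `lcs_length`, `similar_call_sequences` and the `min_shared` threshold are assumptions for illustration, not anything dupehound exposes.

```python
import ast
from itertools import combinations


def call_names(func):
    """Names of the calls inside func, in source order; arguments are ignored."""
    calls = []
    for node in ast.walk(func):
        if isinstance(node, ast.Call):
            target = node.func
            if isinstance(target, ast.Name):
                name = target.id
            elif isinstance(target, ast.Attribute):
                name = target.attr
            else:
                continue
            # ast.walk is breadth-first, so keep positions and sort afterwards.
            pos = (node.lineno, node.col_offset, node.end_lineno, node.end_col_offset)
            calls.append((pos, name))
    return [name for _, name in sorted(calls)]


def lcs_length(a, b):
    """Length of the longest common subsequence of two lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def similar_call_sequences(source, min_shared=5):
    """Pairs of functions sharing at least min_shared calls in the same order."""
    tree = ast.parse(source)
    funcs = [n for n in ast.walk(tree)
             if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]
    # Keyed by name for brevity; same-named functions would overwrite each other.
    seqs = {f.name: call_names(f) for f in funcs}
    hits = []
    for (n1, s1), (n2, s2) in combinations(seqs.items(), 2):
        shared = lcs_length(s1, s2)
        if shared >= min_shared:
            hits.append((n1, n2, shared))
    return hits


SAMPLE = '''
def test_a():
    db = connect("a")
    user = create_user(db, "ann")
    login(user)
    add_item(user, 1)
    checkout(user)

def test_b():
    db = connect("b")
    user = create_user(db, "bob")
    log("extra")
    login(user)
    add_item(user, 2)
    checkout(user)
'''
```

Calling `similar_call_sequences(SAMPLE)` pairs `test_a` with `test_b` even though `test_b` has an extra `log` call in the middle. `min_shared` is the tunable part: raising it makes the heuristic stricter, and it could reasonably differ per language or codebase.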
SonarQube or CodeQL reports might tell me what percentage of a repo is duplicated code, and a large percentage of that is in src/test/java.
I find that a lot of the time this is not just some flippant observation but a clue that I should be using a mechanism like @ParameterizedTest instead of @Test, so I rewrite those tests in a way that makes them easier to set up and that makes the parameters/constraints, inputs, and outputs explicit. Sometimes it does get a little convoluted, as you either use a lot of naked Arguments.of() or define test-class-scoped nested records to encapsulate test parameters, inputs, expected outputs, etc.
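The comment above is about JUnit. For readers who want the shape of that rewrite without the JUnit specifics, here is the same table-driven idea sketched with Python's standard `unittest.subTest`, with a small frozen dataclass playing the role of the nested record. The function under test (`normalize_email`) and the `Case` type are invented for the example.

```python
import unittest
from dataclasses import dataclass


def normalize_email(raw: str) -> str:
    """Hypothetical function under test."""
    return raw.strip().lower()


@dataclass(frozen=True)
class Case:
    """One row of the table: a label, the input, and the expected output."""
    label: str
    raw: str
    expected: str


CASES = [
    Case("already clean", "a@example.com", "a@example.com"),
    Case("upper case", "A@Example.COM", "a@example.com"),
    Case("surrounding spaces", "  a@example.com ", "a@example.com"),
]


class NormalizeEmailTest(unittest.TestCase):
    def test_cases(self):
        # One test body; each row is reported separately if it fails.
        for case in CASES:
            with self.subTest(case.label):
                self.assertEqual(normalize_email(case.raw), case.expected)
```

Three near-identical test methods collapse into one body plus a data table, which is also the kind of change that removes the structural duplication a detector would flag.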
Some languages, like Rust as you mentioned, have a clear tag that says "this is a test," but others do not, so the tool has to guess from file names and ends up missing some tests and skipping too much.
Also, as I mentioned in the answer below, sometimes you actually do want to see the repeats inside tests, and normal code sometimes repeats on purpose too. So I am leaning toward letting users wave off one specific case by hand instead of skipping everything blindly.
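One possible shape for that per-case waiver, as a sketch only: fingerprint each flagged snippet and keep the accepted fingerprints in a small JSON file next to the code. The `Waivers` class, the file format, and the whitespace-only normalization are all assumptions made for this example, not dupehound's actual design.

```python
import hashlib
import json
from pathlib import Path


def fingerprint(snippet: str) -> str:
    """Short hash of the snippet with whitespace collapsed,
    so re-indenting the code keeps an existing waiver valid."""
    normalized = " ".join(snippet.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


class Waivers:
    """Accepted duplicates, stored as {fingerprint: reason} in a JSON file."""

    def __init__(self, path: Path):
        self.path = path
        # No file yet means nothing is waived, so the first run needs no config.
        self.entries = json.loads(path.read_text()) if path.exists() else {}

    def accept(self, snippet: str, reason: str) -> None:
        """Record one specific finding as intentional and persist it."""
        self.entries[fingerprint(snippet)] = reason
        self.path.write_text(json.dumps(self.entries, indent=2, sort_keys=True))

    def is_waived(self, snippet: str) -> bool:
        return fingerprint(snippet) in self.entries


def unwaived(findings, waivers):
    """Keep only the flagged snippets that nobody has accepted yet."""
    return [f for f in findings if not waivers.is_waived(f)]
```

Because the waiver is tied to the content of one finding rather than to a path or a file type, new duplication in the same test file still gets reported. The trade-off is that any real edit to the snippet invalidates the waiver and the finding comes back.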