Discussion (45 Comments), originally on HackerNews
To generalize by saying "incorrect" is incorrect. The correct answer is that it depends on the requirements of the given implementation. Making such generalizations will just lead to endless arguing. If there is still any debate, then a group must vote to deprecate the existing RFC and replace it with a new RFC that requires that merging slashes MUST either be always enabled or always disabled, using verbiage per RFC 2119 [2] and optionally RFC 6919 [3]. Even then one may violate an RFC if there is a need to do so and everyone has verified, documented and signed off that doing so has not introduced any security or other risks in the given implementation, and that if such a risk is identified it will be remediated or mitigated in a timely manner.
[Edit] For clarification, the reason I am linking to RFC 3986 is that it only defines path characteristics and does not explicitly say what to do or not to do. Arguments will persist until a new RFC is created; blog and Stack Overflow posts will not settle them. Even then people may violate the RFC if they feel it is safe to do so. I do not know how to reword this to make it less confusing.
[1] - https://datatracker.ietf.org/doc/html/rfc3986
[2] - https://datatracker.ietf.org/doc/html/rfc2119
[3] - https://datatracker.ietf.org/doc/html/rfc6919
The difference between the support for your argument and theirs is that they call out the specific sections in the RFC that they claim are relevant to the issue at hand, while your comment only broadly references the RFC by name. In any case, even if they, too, had merely gestured to its existence and claimed that it supports their position, appearing here with a bare claim that RFC 3986 supports the opposing side, without further elaboration, is not exactly a strong candidate for a path to a fruitful resolution.
And looking around I found this SO answer noting nothing in the RFC:
https://stackoverflow.com/a/24661288
That is entirely my point. If the author wants to disable merge slashes then they need to replace the RFC I linked to with one that explicitly says what to do or not do using strong verbiage that is explicit as I explained. Blog articles and Stack Overflow threads will not set a standard.
But 80% of all programming blog posts on the internet rely on being able to make sweeping generalizations across the ecosystem! Without this, we basically have nothing left to argue about.
Caring about tradeoffs, contexts, nuance and not just cargoculting our way into a distributed architecture for an app with 10 users just sounds so 90s and early 00s. We're now in the future and we're all outputting the same ̶t̶o̶k̶e̶n̶s̶ code, so obviously what is the solution in my case surely must be the solution in your case too.
My theory is that the codex [1] was created not to stop arguments but rather to shorten them so that we can find a path forward, get back to work and accomplish some mission.
[1] - https://www.youtube.com/watch?v=nfKFHTaGzuU
> nginx with merge_slashes
How can it be wrong if it is server-side? If the server wants to treat those paths equally, it can if it wants to.
It would only be wrong if a client does it and requests a different URL than the user entered, right?
It matters where the normalization happens, and server-side behavior is out-of-scope of these identifier RFCs.
> Therefore, collapsing // to / in HTTP URL path segments is not correct normalization. It produces a different, non-equivalent identifier unless the origin explicitly defines those two paths as equivalent.
And at least according to this, the default setting is off so nginx actually is compliant unless you manually make it not be:
https://www.oreilly.com/library/view/nginx-http-server/97817...
EDIT: Actually it seems to be on by default:
https://nginx.org/en/docs/http/ngx_http_core_module.html#mer...
It appears to not default to off on my install (AlmaLinux 10).
I just tested now. Cloudflare normalises ../ and ./ paths and then the nginx proxy appears to normalise // to /:
nginx log:
lighttpd log:
When it's the default, it's not a case of someone having configured nginx to do the thing described, as is their prerogative. It's nginx defaulting to doing the wrong thing and requiring specific configuration to do the right thing. The author's position is that this violates the RFCs.
> and has nothing to do with web standards though
Yes it does. Prescriptions for how intermediate servers are or are not to munge data before passing it to the origin server are written directly into the HTTP RFCs. They're filled with references to this.
> There’s nothing that says that the internal communication on the server needs to follow the standards for user agents.
And is there anyone arguing that that's the case here?
It gets worse if you are mapping URLs to a filesystem (e.g. for serving files). Even though they look similar, URL paths have different capabilities and rules than filesystems, and different filesystems also vary. This is also an example of that (I don't think most filesystems support empty directory names).
Nothing on web is "correct", deal with it
If you're proxying to another server that just assumes relative paths and doesn't do any kind of validation, I guess an extra / might cause reading files outside of the expected area? That'd be an extremely weird and awful setup that I don't think makes any sense in the context of Spring Boot.
Because maybe you use S3, which treats `foo/bar.txt` and `foo//bar.txt` as entirely separate things. Because to S3, directories don't exist and those are literally the exact names of the keys under which data is stored.
So you have script A concatenate "foo" + "/bar" and script B concatenate "foo/" + "/bar", and suddenly you have a weird problem.
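A minimal sketch of that failure mode, using a plain Python dict as a stand-in for an object store with opaque string keys (this is not the S3 API, and `naive_key` and `join_key` are hypothetical helpers made up for illustration):

```python
# A plain dict stands in for an object store whose keys are opaque strings.
store: dict[str, bytes] = {}

def naive_key(prefix: str, name: str) -> str:
    """Plain concatenation, as in the two scripts described above."""
    return prefix + name

def join_key(prefix: str, name: str) -> str:
    """Join with exactly one slash at the seam; other slashes are untouched.

    Assumption: no key segment legitimately starts or ends with a slash.
    """
    return prefix.rstrip("/") + "/" + name.lstrip("/")

key_a = naive_key("foo", "/bar.txt")    # script A -> "foo/bar.txt"
key_b = naive_key("foo/", "/bar.txt")   # script B -> "foo//bar.txt"

store[key_a] = b"written by script A"
found = store.get(key_b)                # None: the two keys are different strings

print(key_a, key_b, found)
print(join_key("foo/", "/bar.txt"))
```

Whether a helper like `join_key` is the right fix depends on the data: it bakes in the assumption that a leading slash never carries meaning, which does not hold for every key scheme.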
I can't imagine a real use case where you'd think this is desirable.
Not S3, but here's a literal real use case: the entry for the Iraqw word /ameeni (woman) in Wiktionary.
https://en.wiktionary.org/wiki//ameeni
If for whatever reason your S3 keys contained English words and their translations separated by a slash, you would have a real problem if one of your scripts were to concatenate woman, / and /ameeni as woman/ameeni instead of woman//ameeni in the English/Iraqw case.
W3C says:
> The slash ("/", ASCII 2F hex) character is reserved for the delimiting of substrings whose relationship is hierarchical.
Can they not just use a 3 like in Arabic?
Of course you shouldn't assume that in a client. If you are implementing against an API don't deviate regarding // and trailing / from the API documentation.
- URL parsing/normalization; and
- Mapping URLs to resources (e.g. file paths or database entries) to be served from the server, and whether you ever map two distinct URLs to the same resource (either via redirects or just serving the same content).
The former has a good spec these days: https://url.spec.whatwg.org/ tells you precisely how to turn a string (e.g., sent over the network via HTTP requests) into a normalized data structure [1] of (scheme, username, password, host, port, path, query, fragment). The article is correct insofar that the spec's path (which is a list of strings, for HTTP URLs) can contain empty string segments.
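As a rough illustration of empty path segments surviving parsing, here is a sketch using Python's `urllib.parse`. Note the assumption: `urllib.parse` is not an implementation of the WHATWG URL Standard, so it only stands in for the idea of splitting a URL string into components.

```python
from urllib.parse import urlsplit

parts = urlsplit("https://example.com/foo//bar?x=1#frag")

# The double slash is preserved in the path component as-is.
print(parts.scheme, parts.hostname, parts.path, parts.query, parts.fragment)

# Splitting on "/" and dropping the leading empty string gives the segment
# list; the empty string in the middle is the "empty segment" between the
# two slashes.
segments = parts.path.split("/")[1:]
print(segments)
```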
But the latter is much more wild-west, and I don't know of any attempt being made to standardize it. There are tons of possible choices you can make here:
- Should `https://example.com/foo//bar` serve the same resource as `https://example.com/foo/bar`? (What the article focuses on.)
- `https://example.com/foo/` vs. `https://example.com/foo`
- `https://example.com/foo/` vs. `https://example.com/FOO`
- `https://example.com/foo` vs. `https://example.com/fo%6f%` vs. `https://example.com/fo%6F%`
- `https://example.com/foo%2Fbar` vs. `https://example.com/foo/bar`
- `https://example.com/foo/` vs. `https://example.com/foo.html`
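To make the percent-encoding cases in the list above concrete, here is a small sketch of one possible policy, loosely following the percent-encoding normalization described in RFC 3986 section 6.2.2: decode escapes of unreserved characters and uppercase the hex digits of the rest. The function name `normalize_pct` and the sample paths are my own, and a server is free to pick a different policy.

```python
import re
import string

# Unreserved characters per RFC 3986 section 2.3.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
_PCT = re.compile(r"%([0-9A-Fa-f]{2})")

def normalize_pct(path: str) -> str:
    """Decode %XX only when it encodes an unreserved character;
    otherwise keep the escape and uppercase its hex digits."""
    def fix(match: re.Match) -> str:
        ch = chr(int(match.group(1), 16))
        if ch in UNRESERVED:
            return ch
        return "%" + match.group(1).upper()
    return _PCT.sub(fix, path)

print(normalize_pct("/fo%6f"))      # escape of "o" is decoded
print(normalize_pct("/fo%6F"))      # same result regardless of hex case
print(normalize_pct("/foo%2fbar"))  # "%2F" stays encoded, so it is not "/foo/bar"
```

Under this policy the encoded slash remains distinct from a literal slash, which is exactly the kind of choice the list is about.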
Note that some things are normalized during parsing, e.g. `/foo\bar` -> `/foo/bar`, and `/foo/baz/../bar` -> `/foo/bar`. But for paths, very few.
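For the dot-segment case, a simplified sketch in the spirit of the `remove_dot_segments` step from RFC 3986 section 5.2.4 might look like the following. It assumes an absolute path (one starting with `/`), does not handle the backslash rewriting mentioned above, and is not the WHATWG parser; the point is only that `..` is resolved while an empty segment is left alone.

```python
def remove_dot_segments(path: str) -> str:
    """Resolve "." and ".." in an absolute path, keeping empty segments."""
    segments = path.split("/")
    out: list[str] = []
    for i, seg in enumerate(segments):
        is_last = i == len(segments) - 1
        if seg == ".":
            if is_last:
                out.append("")      # "/a/." ends up as "/a/"
        elif seg == "..":
            if len(out) > 1:        # never pop the leading "" (the root)
                out.pop()
            if is_last:
                out.append("")      # "/a/b/.." ends up as "/a/"
        else:
            out.append(seg)         # includes "" for a doubled slash
    return "/".join(out)

print(remove_dot_segments("/foo/baz/../bar"))
print(remove_dot_segments("/foo//bar"))
```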
Relatedly:
- For hosts, many more things are normalized during parsing. (This makes some sense, for security reasons.)
- For query, very little is normalized during parsing. But unlike for pathname, there is a standardized format and parser, application/x-www-form-urlencoded [2], that can be used to go further and canonicalize from the raw query string into a list of (name, value) string pairs.
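A quick sketch of that second step for queries, using Python's `urllib.parse.parse_qsl`. This is an assumption-laden stand-in: Python's parser handles the same general format but is not guaranteed to match the WHATWG application/x-www-form-urlencoded parser in every edge case.

```python
from urllib.parse import parse_qsl

raw_query = "q=hello+world&tag=a&tag=b&empty="

# keep_blank_values=True keeps pairs whose value is the empty string.
pairs = parse_qsl(raw_query, keep_blank_values=True)
print(pairs)
```

The result is an ordered list of (name, value) pairs, so repeated names such as `tag` are preserved rather than collapsed.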
Some discussions on the topic of path normalization, especially in terms of mapping the filesystem, in the URL Standard repo:
- https://github.com/whatwg/url/issues/552
- https://github.com/whatwg/url/issues/606
- https://github.com/whatwg/url/issues/565
- https://github.com/whatwg/url/issues/729
-----
[1]: https://url.spec.whatwg.org/#url-representation
[2]: https://url.spec.whatwg.org/#application/x-www-form-urlencod...
Neither has much to do with / normalization, which applies to the path part of a valid uri.
Not doing it is like punishing people for not using Oxford commas, or entering an hour-long debate each time someone writes “would of” instead of “would have”. It grinds my gears too, but I have different hills to die on.