ZH version is available. Content is displayed in original English for accuracy.
Advertisement
Advertisement
⚡ Community Insights
Discussion Sentiment
15% Positive
Analyzed from 1136 words in the discussion.
Trending Topics
#string#null#strings#why#pointers#data#array#size#length#end

Discussion (22 Comments)Read Original on HackerNews
C pretends types exist with you, but once bytes hit the road, it's all real-life and segmentation faults.
I wouldn't call "cause of bugs and security issues" "no cost".
> it's quite rare for a pointer in C to have the ability to be NULL
As a C programmer for more than 25 years, that is the exact opposite of my experience.
But at the same time, I think blaming the software was kind of a cop out. Devs were in a hurry and simply didnt respect the rules. Given todays software engineer at large. Nerfing programming languages so they cant destroy things might not be a bad idea. But AI will nerf everything.
Why do you assume that AI is gonna nerf everything?
The limitations were brutal. Initially you could only have 255 bytes in a string. The length of a string and the size of the allocation are now separate and you may need to think about that unused memory in your design. The problem now doubles with the introduction of UTF-8. Your string size is in bytes and you need to track characters separately.
If you want to create an array of strings you either need to specify the length of all strings and accept the memory overhead or have an array of pointers to strings. If you use an array of pointers you may end up choosing to use the 'nil' value as a sentinel that means "end of list." So we're right back where we started.
--
Because someone decided to downvote this HN has limited the speed at which I can reply. This site is tragic and I'm fully done with it now. You can spread propaganda and poorly sourced zeitgeist and be among friends but if you try to have a genuine conversation about programming languages you are made to be unwelcome immediately. Screw this.
--
> No other data structure works like this.
The linked list.
> You can't mess this up in an array
C happily decomposes arrays into pointers. You can erase your length information from the type. This was an intentional decision.
> Strings are the only data structure that assume there will be a NULL at end.
Which is why almost every string API has a version that allows you to specify the maximum length. The fact that you can use a NUL doesn't mean you have to. Which is why the concept of "sentinel values" is broadly used in many types of applications you haven't considered here.
That isn't really a problem.
The problem with null-terminated strings is specifically what happens when you reach the end of the allocated array and there ISN'T a NULL character.
Every string function is designed to keep going until it finds the NULL character, so if a hacker gets rid of the NULL character, he can exploit pretty much any standard string manipulation function being used elsewhere in the program to manipulate whatever memory comes AFTER the string data structure.
No other data structure works like this. You can't mess this up in an array, because no function that manipulates arrays is just going to keep going until there is a null. That would be stupid because it would require users of the function to add a NULL to the end of their arrays before passing it to the function, so instead we just pass the size of the array to everything. Strings are the only data structure that assume there will be a NULL at end.
By the way, I read once that if you use UTF-32 every code point will be 4 bytes, constantly, but even then a single code point isn't necessarily a single character. Text is just complicated.
In C most data structures work like this, you keep going until you find NUL (character) or NULL (pointer). E.g. Strings, array of pointers, linked lists, etc. Of course you can add length to most of those, but it isn't the canonical/traditional way of doing things.
[1] https://man.openbsd.org/strlcpy
I’m not sure why this was - the source base was so old it might have had its origins in Pascal struct behaviour.
unfortunately as time goes by, the linux api surface gets larger and more convoluted. so there's going to be some coverage you're just never going to get.
but in the abstract, definitely. linux is so bloated at this point that its not clear that it can ever be 'made safe'.