Discussion (131 comments) on Hacker News
I had a go at retrofitting C with slices over a decade ago.[1] Too much political hassle.
[1] https://www.animats.com/papers/languages/safearraysforc43.pd...
[1] https://www.open-std.org/jtc1/sc22/wg14/www/wg14_document_lo...
If you see a[i][j] it could mean two completely different things:
1) "a" is a contiguous chunk of memory of N*M bytes, so it behaves as char*; a[i][j] == *(a + i*M + j)
2) "a" is an array of char* pointers that point to N completely distinct memory chunks of size M, so it behaves as char**; a[i][j] == *(*(a + i) + j)
With flat arrays the difference between an array as a variable and a pointer to the first element is literally negligible because you won't even see the difference in the assembly. This is why the automatic decay-to-pointer makes a lot of sense.
But that breaks completely with multiple dimensions. You definitely see the difference in the assembly because the memory layout is so different.
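Below is a minimal C sketch of the two layouts. It assumes small illustrative sizes, N = 3 and M = 4, and the helper names are hypothetical. The same `a[i][j]` syntax reaches element (i, j) in both layouts. The difference is in how the address is found:

- The contiguous array needs one flat offset from the start of the block. Note the `(char *)` cast.
- The array of pointers must first load row pointer i.

```c
#include <assert.h>
#include <stdlib.h>

enum { N = 3, M = 4 };  /* illustrative dimensions */

/* Layout 1: one contiguous block of N*M chars. */
static char contiguous[N][M];

/* Layout 2: N separately allocated rows reached through pointers. */
static char *rows[N];

/* Fill both layouts with the same values; returns 0 on allocation failure. */
static int build_layouts(void)
{
    for (int i = 0; i < N; i++) {
        rows[i] = malloc(M);
        if (rows[i] == NULL)
            return 0;
        for (int j = 0; j < M; j++) {
            contiguous[i][j] = (char)(i * M + j);
            rows[i][j] = (char)(i * M + j);
        }
    }
    return 1;
}

/* Layout 1: a single flat offset from the start of the block. */
static char flat_get(int i, int j)
{
    return *((char *)contiguous + i * M + j);
}

/* Layout 2: load the row pointer first, then offset into that row. */
static char pointer_get(int i, int j)
{
    return *(*(rows + i) + j);
}

/* Release the separately allocated rows (free(NULL) is a no-op). */
static void free_layouts(void)
{
    for (int i = 0; i < N; i++) {
        free(rows[i]);
        rows[i] = NULL;
    }
}
```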
> 1) ... a[i][j] == *((char*)a + i*M + j) // I added the char* cast to make it correct
> 2) ... a[i][j] == *(*(a + i) + j)
You may already understand this, but even in case (1) you still have a[i][j] == *(*(a + i) + j).
(It has to; that's what operator[] means in C.) It's just that, in this case, `a + i` is applying pointer arithmetic to a `char (*)[M]`, so it adds M * i bytes to a's address.
This is similar to how `a + i`, if a is int32_t*, will give you an address 4 * i bytes bigger than a.
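Here is a small sketch of the stride point, assuming illustrative sizes ROWS = 3 and COLS = 5. When a `char[ROWS][COLS]` decays to `char (*)[COLS]`, adding 1 moves the address by COLS bytes. In the same way, adding 1 to an `int32_t *` moves it by 4 bytes, assuming 8-bit chars. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { ROWS = 3, COLS = 5 };  /* illustrative sizes */

/* Bytes between p and p + 1 when a char[ROWS][COLS] decays to char (*)[COLS]. */
static ptrdiff_t row_stride_bytes(void)
{
    static char a[ROWS][COLS];
    char (*p)[COLS] = a;          /* a decays to a pointer to its first row */
    return (char *)(p + 1) - (char *)p;
}

/* Bytes between q and q + 1 when q is an int32_t *. */
static ptrdiff_t int32_stride_bytes(void)
{
    static int32_t v[2];
    int32_t *q = v;
    return (char *)(q + 1) - (char *)q;
}
```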
Really, the confusing part of this is that *(a + i), which is an array value, i.e. has type char[M], decays to char* when you add an integer to it (or dereference it). This is a pretty crazy hack, really. Imagine if, in C++, you could do this
Yuck.
I would rather say it works nicely for auto-generating the complex indexing operation for n-dimensional arrays, which makes such code a lot more convenient and less error-prone to write. The compiler may also flatten a loop.
The array-of-pointers hack previously used to simulate 2D arrays (an array of pointers to arrays) should not be used outside of special algorithms, as it is error-prone and slow.
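One common replacement for the array-of-pointers approach is to allocate a single contiguous block and access it through a pointer to a row. This sketch assumes a compiler with C99-style variable-length array support, which GCC and Clang provide. With this approach:

- `a[i][j]` still works.
- The compiler generates the `i*m + j` arithmetic.
- There is one allocation and one free.

`sum_matrix` is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate an n-by-m int matrix as ONE contiguous block, fill it, and sum it.
   Returns -1 if the allocation fails. */
static long sum_matrix(size_t n, size_t m)
{
    int (*a)[m] = malloc(n * sizeof(int[m]));  /* pointer to rows of m ints */
    if (a == NULL)
        return -1;

    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < m; j++)
            a[i][j] = (int)(i * m + j);        /* compiler computes i*m + j */

    long total = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < m; j++)
            total += a[i][j];

    free(a);                                   /* one free for the whole matrix */
    return total;
}
```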
http://c2.com/cgi/wiki?SufficientlySmartCompiler
In practice, C compilers are still notoriously bad at loop optimizations.
Polyhedral optimizations provided some hope, but no compiler managed to adopt it in production.
(a) If the type of a is “array of length N of pointer to (say) char” (declaration: char *a[N]), then a[i][j] means the jth char in the contiguous block pointed to by the ith pointer. In C#, this is what you get with an array of arrays.
(b) If the type of a is “array of length N of array of length M of char” (declaration: char a[N][M] — sic!), then a[i][j] means the jth element of the ith element, aka the (i*M+j)th char in the single contiguous memory block. In C#, this is what you get with a two-dimensional array.
The way this happens is a bit subtle:
(a) The value a, of type “array of size N of pointer to char”, first decays into “pointer to pointer to char”, then a[i] retrieves the ith “pointer to char” starting from it as a base, then in turn a[i][j] retrieves the jth “char” starting from that as a base.
(b) The value a, of type “array of length N of array of length M of char”, first decays into “pointer to array of length M of char” (sic!), then a[i] retrieves the ith “array of length M of char” starting from it as a base, which then decays into “pointer to char”, then a[i][j] retrieves the jth “char” starting from that as a base.
NB: There are no implicit references here, unlike in C#; in part (b), a is an N*M-byte chunk of memory and a[i] is an M-byte piece of it.
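The difference between (a) and (b) is visible through `sizeof`, which reports the size of each expression's type without evaluating it. This sketch assumes illustrative sizes N = 4 and M = 6. In case (b), `a[i]` is an M-byte piece of one N*M-byte object. In case (a), it is just a pointer.

```c
#include <assert.h>
#include <stddef.h>

enum { N = 4, M = 6 };   /* illustrative sizes */

static char grid[N][M];  /* case (b): char a[N][M], one N*M-byte object */
static char *ptrs[N];    /* case (a): char *a[N], N pointers, data elsewhere */

/* sizeof only reports each expression's type size; nothing is dereferenced. */
static size_t grid_row_size(void)   { return sizeof grid[0]; }  /* char[M]: M bytes */
static size_t grid_total_size(void) { return sizeof grid; }     /* N*M bytes */
static size_t ptrs_row_size(void)   { return sizeof ptrs[0]; }  /* one char * */
static size_t ptrs_total_size(void) { return sizeof ptrs; }     /* N pointers, no chars */
```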
So, you really can't tell what's going on behind the scenes.
I wanted to pull my hair out seeing some 'enterprise' code use
for some kind of logging where i was the severity level. There were even instances of state[i++], where the severity was incremented. I hope someone has rewritten that codebase with AI by now.
Sorry, hard for me to relate, as I've overloaded [] (in, say, Python) to make life easy on everyone. People loved it.
I hope you're aware that there is a long-standing debate on whether overloading operators is good or bad, and that it comes down to personal preference?
No, I'm not sure how you got that impression. Overloading is great.
It's also confusing when it does something completely different from what you intuitively expect.
Array memory is on the stack. The size of that array is actually not known at run time; it's only known at compile time, where any reference to that length gets resolved by the compiler.
If your 2d array sits on the stack, then inferring memory layout is pretty easy. If you are dealing with a pointer that was passed to a function, then you can't assume anything about data size or limits, which is why many functions that take pointers take a size parameter as well.
Right, but 2d arrays come into this picture with their own quirks again. You're not just passing the size as the parameter, you can pass it as a "special" parameter that influences how the compiler will interpret other parameters. E.g. in C99 you can do this:
Here "y" plays the critical role because it will be used to compute offsets in the a[i][j] expression. For 1d arrays this doesn't happen.
Of course, it's still generalizable as "all but the outermost dimensions should be known", and for a 1d array the outermost dimension is the only dimension. Still, this whole thing always felt a bit odd to me.
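The next block sketches the kind of C99 signature described. The names are hypothetical, and the sketch assumes variable-length array support. A few points about it:

- `y` must be declared before `a` so it can appear in `a`'s type.
- It is `y`, not `x`, that the compiler uses to compute the address of `a[i][j]`.
- The outermost dimension `x` is adjusted away, so the parameter really has type `int (*)[y]`. `x` serves only as documentation.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of the diagonal of an x-by-y int matrix passed as a VLA parameter.
   The address of a[i][i] is computed as base + (i*y + i) * sizeof(int). */
static int trace(size_t x, size_t y, int a[x][y])
{
    int sum = 0;
    for (size_t i = 0; i < x && i < y; i++)
        sum += a[i][i];
    return sum;
}
```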
But my recommendation is to always give the size and then everything is regular and the compiler can use the information for warnings.
Array memory can sit on either the stack or the heap.
> The size of that array is actually not known at run time; it's only known at compile time, where any reference to that length gets resolved by the compiler.
This is also a bit misleading, in two ways. First, it's not clear what you mean by "size" here - the size of the memory block(s), or the shape of the array?
Second, many people think that the C runtime doesn't know the amount of memory allocated to an array, but this is actually false. It's just the C abstract model that for some reason chose to not expose this information - but the size is actually always stored and accessible, and this is virtually mandated by the standard: otherwise, `free(arr)` couldn't realistically work, it would have to be `free(arr, size)`. This is one of the weirdest inefficiencies of C, in fact - it requires you to store the size of arrays twice - once in user code, and another time in the internal logic of the allocator.
Edit: and as a fun extra, C++ not only inherited this mistake from C, but reproduced it again, meaning that a C++ array allocated with new[] actually stores the size twice, at least with typical implementations - once in the C++ runtime and again in the allocator - and still requires the user-space code to store it a third time. This is because `delete[]` needs to call the destructors of all of the elements of the array, regardless of where and how the array was allocated, so the number of array elements needs to be stored alongside the object itself.
There are some counterpoints:
1) Conceptually, an allocated memory block and the data structure / array in it are not related. You can allocate a memory block and then subdivide it into multiple different structures / arrays. You can implement sub-allocators.
2) A heap allocator does not need to store the exact length of an allocated object. For example, it could have several fixed-length slab allocators for smaller objects, select the matching one during malloc() and use the address range to find the slab during free().
3) An array can also be on the stack (VLA or alloca()).
4) Arrays can also live in memory allocated outside of the C library allocator (e.g. mmap()).
No, if we are using the definition of an array like int c[] = ..., that is always going to be on the stack. Contiguous heap memory =/= array. You can use the [] operator to access it like an array, but fundamentally, as far as the C language is concerned, the two are different, because the compiler treats them differently.
>but the size is actually always stored and accessible, and this is virtually mandated by the standard: otherwise, `free(arr)` couldn't realistically work,
That would only be true if each element in the array was a char.
The allocator's internal data structure records the total amount of memory allocated at each address, but it has no information about the size of an element, so it can't infer the actual number of items at runtime. You could write your own malloc that does this, but generally that is left to the user for flexibility. For example, a really good practice in C that basically eliminates double frees is a mempool that allocates all the memory up front. That way, you never really have to call free, and the memory you allocate can be partitioned dynamically any way you choose.
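Below is a minimal sketch of the up-front pool idea, written as a bump ("arena") allocator over a fixed buffer. The names `arena`, `arena_alloc` and `arena_reset` and the 1024-byte capacity are hypothetical. Alignments are assumed to be powers of two no larger than that of `max_align_t`. Individual blocks are never freed; instead, the whole pool is reset at once, so a double free cannot occur.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

enum { ARENA_CAPACITY = 1024 };  /* hypothetical pool size */

struct arena {
    alignas(max_align_t) unsigned char buf[ARENA_CAPACITY];
    size_t used;                 /* bytes handed out so far */
};

/* Hand out `size` bytes aligned to `align` (a power of two), or NULL if the
   pool is exhausted. Nothing is ever freed individually. */
static void *arena_alloc(struct arena *ar, size_t size, size_t align)
{
    size_t start = (ar->used + align - 1) & ~(align - 1);  /* round up */
    if (start > ARENA_CAPACITY || size > ARENA_CAPACITY - start)
        return NULL;
    ar->used = start + size;
    return ar->buf + start;
}

/* Release every allocation at once. */
static void arena_reset(struct arena *ar)
{
    ar->used = 0;
}
```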
https://godbolt.org/z/PzcjW4zKK
And while the (*array_ptr)[3] notation takes a moment to get used to, it is very logical. If you have a pointer to an array, you dereference it first and then index into it. Again, useful for bounds checking: https://godbolt.org/z/ao1so9KP7
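Here is a sketch of the pointer-to-array idea, using a hypothetical `last_of_four` function. The parameter type `int (*)[4]` carries the length, so `sizeof *array_ptr` yields the size of the whole array. Passing the address of an array with a different length draws an incompatible-pointer diagnostic from the compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Only a pointer to an int[4] fits this parameter; the length is in the type. */
static int last_of_four(int (*array_ptr)[4])
{
    size_t n = sizeof *array_ptr / sizeof (*array_ptr)[0];  /* 4, from the type */
    return (*array_ptr)[n - 1];  /* dereference first, then index */
}
```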
Not sure why, maybe it doesn't feel like C anymore, maybe it feels hacky?
Typically, if you're passed an array you'd want to get more anyway, so you'd get passed a struct. Not sure.
Some languages have extended the C declaration syntax such that the type derivators can be moved from the declarator part to the "stem". For instance, as an alternative to:
you can write This is how we could get as a declarator stem indicating an array of 3 pointers to pointers to int. But it's not in C.
So in short, the bad design (array values produce pointers) was informed by conceptual compatibility with an earlier design in which that was literally happening.
The language B was evolved in-place by adding new features, then editing the compiler source to make use of those new features, then repeating. They simply started calling it "New B". At some point the language had evolved sufficiently that they decided to call it C.
The semantics of arrays were inherited from B and simply never changed. Part of me suspects this was also because it was seen as "clever" at the time. Look ma, we let arrays turn into pointers! Isn't that clever?
When you look at pre-ANSI C function prototypes you wonder "where are the parameter types?" because there are none. The compiler didn't bother to check. Part of that was perhaps for implementation reasons but a big part of that was the feeling or culture inherited from B: in that language you just had words of memory. You were free to interpret any word of memory as any data type you liked. So duh of course it is up to you to decide how many parameters your function received and of what type. If the caller supplied a different number or different types? Don't do that.
If you are coming from that sort of world clever tricks like arrays decaying to pointers or automatically converting between data types and sizes seems perfectly natural. Anything C offers above and beyond that is an improvement from B after all.
It was intentional and functional. The idea was basically a primitive kind of polymorphism, which allowed functions intended to act on arrays to accept an array of any size. It was redundant with pointer arithmetic, but allowed for communication of intent without accidentally incurring a semantic unit of meaning. There's an interview where Ritchie talked about this.
Pascal's biggest failing was that it went the complete opposite route: pointer arithmetic was disallowed and arrays did not decay. It also lacked any kind of polymorphism, so one of the biggest ergonomic pain points was that if your problem domain had non-uniform array sizes, you were in for a lot of annoying rewriting.
> When you look at pre-ANSI C function prototypes you wonder "where are the parameter types?" because there are none.
Actually, pre-ANSI C technically didn't have function prototypes; ANSI C introduced them, and it got them from C-with-classes. It did have function declarations, though (which aren't the same thing).
Pedantics aside,
This is fully typed; the parameters and return type default to int.
Fun fact:
Does not declare a function with no parameters, but it does declare a function with an unknown number of parameters of unknown types. An empty parameter list in C is:
Thanks, I completely forgot they weren't called prototypes originally.
But if you designed a language in the era where Fortran, THE array language, reigned supreme, nobody would use your language. The mindshare Fortran had is difficult to convey now, half a century later.
Think of it like making a chatbot today and not mentioning AI or LLMs, that's what making a language without arrays would have felt like in 1970.
The "restrict" keyword was invented to solve this but it still has weaker semantics than original Fortran arrays. It can still solve a big share of problems, but it never got proper adoption and never even made it into C++.
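This is a minimal sketch of `restrict` on a hypothetical `add_arrays` function. The qualifier is a promise that the compiler does not check: calling the function with overlapping `dst`, `a`, or `b` is undefined behavior. In exchange, the compiler may assume that stores to `dst` never change `a` or `b`. That is similar in spirit to the no-aliasing assumption Fortran compilers can make for array arguments.

```c
#include <assert.h>
#include <stddef.h>

/* restrict: within this call, dst, a and b are assumed not to overlap, so the
   compiler may keep loaded values in registers and vectorize freely. */
static void add_arrays(size_t n, float *restrict dst,
                       const float *restrict a, const float *restrict b)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```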
(It's not as portable as C though, and the compilers have more bugs.)
I used Fortran recently to see how "slow" Python is; I did matrix multiplies by hand in .c and .py. Now, I didn't write the Fortran, the AI did, but I remember enough to verify that what it did was sane, and the other two I wrote agreed with its results.
for the same matmuls.
Anyhow, 1996-ish. Crazy.
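The commenter's code is not shown. For reference, a "by hand" C matrix multiply is typically a naive triple loop like this hypothetical sketch. It uses flat row-major storage, with element (i, j) at `i*n + j`. It also uses the i-k-j loop order, so the inner loop walks `b` and `c` along contiguous rows.

```c
#include <assert.h>
#include <stddef.h>

/* c = a * b for n-by-n row-major matrices stored as flat arrays. */
static void matmul(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n * n; i++)
        c[i] = 0.0;
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++) {
            double aik = a[i * n + k];      /* reused across the whole j loop */
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += aik * b[k * n + j];
        }
}
```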
C compilers don't really do this.
I would phrase that differently: "The main feature of arrays (performing the `base + index * size` address computation) is already provided by the C pointer type via the `ptr[N]` syntax sugar, so having a separate array type might have felt redundant at the time".
I think having "proper" array types in a language (where the type carries both the array item type and the compile-time length) only really makes sense when there's also a slice type (e.g. a runtime ptr/length pair). And I guess at any point during C's development this was too big a language change for the committee to swallow.
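Here is a sketch of the pointer/length pair that C lacks as a built-in slice type. `struct int_slice` and its helpers are hypothetical names. In the usage, the array `v` has the fixed-length type `int[5]`. A slice, by contrast, carries its length at run time and can view any sub-range.

```c
#include <assert.h>
#include <stddef.h>

/* A runtime pointer/length pair. */
struct int_slice {
    int *ptr;
    size_t len;
};

/* View elements [start, start + count) of base; empty slice if out of range. */
static struct int_slice slice_of(int *base, size_t base_len,
                                 size_t start, size_t count)
{
    struct int_slice s = { NULL, 0 };
    if (start <= base_len && count <= base_len - start) {
        s.ptr = base + start;
        s.len = count;
    }
    return s;
}

/* Read with a bounds check (assert is compiled out under NDEBUG). */
static int slice_get(struct int_slice s, size_t i)
{
    assert(i < s.len);
    return s.ptr[i];
}

/* The length travels with the pointer, so no separate size argument. */
static long slice_sum(struct int_slice s)
{
    long total = 0;
    for (size_t i = 0; i < s.len; i++)
        total += s.ptr[i];
    return total;
}
```

Unlike a bare `int *` parameter, a function taking `struct int_slice` cannot lose track of the length.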
They should pay programmers less. Get rid of all these money grabs.
I love C more than I should.
Tangent: I have a pet theory that part of Zig's raison d'etre is to fix some of the problems with C, while accommodating its pointer-based data structures, and the resulting patterns.
https://www.hytradboi.com/2025/05c72e39-c07e-41bc-ac40-85e83...
Then it (understandably) becomes UB to attempt to get the pointer.
(It also probably isn't stored in a register, since the keyword is just asking the compiler nicely.)
https://www.godbolt.org/z/TKq9rWzP1
I don't know what the idea is behind not allowing you to take the address of a register variable, though.
Thinking about it, storing arrays in registers would possibly make sense on systems like the 8051, where you actually have a bunch of general-purpose register banks, but x86 has nothing like them.
Compilers got so good at optimization that there is little point using it.
If a variable is held in a register you can't access it with a pointer. So if your intention is it should be in a register you can't take the address.
https://stackoverflow.com/questions/79897621
struct A { int size; char data[]; };
struct B { int size; char *data; };
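This sketch shows the practical difference, using hypothetical `make_a`/`make_b`/`free_b` helpers. `struct A` ends in a C99 flexible array member, so the header and the characters come from one `malloc` and are freed with one `free`. `struct B` holds a pointer to a separate buffer, which needs its own allocation and release.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct A { int size; char data[]; };  /* flexible array member (C99) */
struct B { int size; char *data; };   /* separate buffer behind a pointer */

/* One malloc: header and payload are contiguous; release with a single free(). */
static struct A *make_a(const char *s)
{
    size_t len = strlen(s);
    struct A *a = malloc(sizeof *a + len + 1);
    if (a == NULL)
        return NULL;
    a->size = (int)len;
    memcpy(a->data, s, len + 1);
    return a;
}

/* Two mallocs: the struct and the buffer it points to. */
static struct B *make_b(const char *s)
{
    size_t len = strlen(s);
    struct B *b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    b->data = malloc(len + 1);
    if (b->data == NULL) {
        free(b);
        return NULL;
    }
    b->size = (int)len;
    memcpy(b->data, s, len + 1);
    return b;
}

/* Both allocations behind a struct B must be released. */
static void free_b(struct B *b)
{
    if (b != NULL) {
        free(b->data);
        free(b);
    }
}
```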
https://www.digitalmars.com/articles/C-biggest-mistake.html
But in other news, most people don't know that a[3] == 3[a].
https://stackoverflow.com/a/16163840
In C, a[i] is converted to *(a+i) internally, and i[a] is converted to *(i+a). Array names also act as pointers in C, so (a+i) or (i+a) give an address (using pointer arithmetic) that is dereferenced using
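Here is a quick check of the claim, using a hypothetical `index_both_ways` function. `a[i]` is defined as `*((a) + (i))`, and addition is commutative, so `3[a]` names the same element as `a[3]`. The same trick works on a string literal, e.g. `1["hello"]`.

```c
#include <assert.h>

/* Returns 1 if both spellings reach the same element at the same address. */
static int index_both_ways(void)
{
    int a[5] = {10, 20, 30, 40, 50};
    return a[3] == 3[a] && *(a + 3) == *(3 + a) && &a[3] == &3[a];
}
```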