Getting silly with C, part (void*)2

(lcamtuf.substack.com)

175 points | by justmarc 439 days ago

18 comments

sylware 439 days ago
C syntax is already way too rich and complex.
We need a C- or µC:
No implicit casts except for literals and void* (explicit compile-time/runtime casts), one loop statement (loop{}), no switch/enum/generic/_thread/typeof/etc., no integer promotion, only sized primitive types (u64, s32, f32, etc.), no anonymous code blocks, real hard compile-time constant declarations, and many operators have to go (--, ++, a?b:c, etc.)... plus everything I am forgetting right now (the dangerous struct pack attribute...). But we do need inline keywords for memory barriers and atomics for modern hardware architecture programming.
- wongarsu 439 days ago
  There is C0, a stripped-down version of C popular in academia [1]. Great for teaching because it's conceptually simple and easy to write a compiler for. But with a couple of additions (like sized primitive types) it might match what you are imagining.
  1: https://c0.cs.cmu.edu/docs/c0-reference.pdf
- glouwbug 439 days ago
  C really just needs if / else / while / and void functions. Function inputs should be in/out (const type* or type*).
  - bregma 439 days ago
    So, FORTRAN IV except for the else.
- butterisgood 439 days ago
  Pre-scheme?
- accelbred 439 days ago
  Does Zig fit your bill?
  - sylware 439 days ago
    Dunno, I should have a look though. But I recall something about a garbage collector, right or wrong?
    - nick__m 438 days ago
      I doubt that; Zig is allocator land. Even stdlib data structures require an allocator to be instantiated. Have a look at the selection of allocators: https://zig.guide/standard-library/allocators
      - sylware 438 days ago
        I had a look; Zig seems to require a runtime, even a small one, for basic syntax support. So it doesn't seem suitable.
        You should be able to generate machine code without the need of any runtime, like you can with C.
        mlugg 438 days ago
        I'm unsure what you're referring to here -- Zig doesn't have any runtime, it doesn't even depend on libc.
        The only thing I can think of that you might be referring to is compiler-rt: if so, this is a thing in C too! It's just a small collection of implementations for operations the code generator wants to call into (e.g. memset, arithmetic for integers larger than CPU word size). Clang uses compiler-rt when compiling C code, and GCC's equivalent is libgcc. Nonetheless, Zig lets you disable it using `-fno-compiler-rt`, in which case you'll need to provide the relevant symbols yourself somehow.
        sylware 438 days ago
        That's not what I understood. There is some kind of flag required for pointers or something, which requires a data section for basic primitive types.
        mlugg 438 days ago
        I'm afraid I'm not sure what you're referring to. For instance, I can build a simple Hello World in Zig using `zig build-exe`, and get a static executable, on which I can use `nm` to confirm that there aren't symbols from any kind of runtime. I can even trivially build the actual Zig compiler to a static binary.
        (For context, by the way, I'm on the Zig "core team"; I'm a notable contributor to the project.)
        sylware 438 days ago
        Mmmh... basically: it generates machine code that only requires a stack (which could also be used by code paths not written in Zig), contained in memory pages with read and execute permission only. Of course, this machine code would interact with other machine code (written in other languages) via the architecture's calling convention (the C ABI).
        accelbred 435 days ago
        That's what Zig does. It compiles down to the same stuff as C. It does not have a runtime.
        butterisgood 435 days ago
        It also has been very much trying to get rid of things like "undefined behavior".
- short_sells_poo 439 days ago
  Sooo, assembly :)
  - poincaredisk 439 days ago
    No. Assembly is not portable, not typed and not structured (as in structured programming).
    Another use case for microC: most decompilers decompile to a custom C-like language. It pretends to be C, but in reality it is just a representation of a recovered AST that is often, but not always, valid C code. MicroC would be a better target, because it would presumably have fewer weird edge cases.
    - pjmlp 438 days ago
      Hence why Macro Assemblers have existed for almost as long as raw Assembly.
      MASM and TASM were already far beyond the features of K&R C, if we overlook the issue of being bound to 80x86 assembly.
      TI has some DSPs whose assembly is basically bare-bones C-like, in an SSA kind of approach.
    - uecker 439 days ago
      I don't think any weird edge case is a problem when targeting C. You just do not produce such cases when emitting the code.
    - jcranmer 439 days ago
      I'd argue that unsafe Rust is a better target here (although I don't know if &raw has made it into stable Rust yet, which you need for doing an address-of that doesn't have a risk of UB like & does). Rust's type system is less silly (there's only one multiple-types-for-same-value, namely usize), there are no implicit conversions, the core CFG primitives are a little bit richer (labeled break/continue, although those have now been added in C2y), and there's a decent usable intrinsic class for common operators-not-in-C like rotations or popcount.
      If your goal is a simple language to compile, well, Rust won't ever fit the bill; but as a target for a decompiler, I think the unsafe subset is worth exploring.
      - steveklabnik 439 days ago
        &raw landed in the release just before this latest one.
        And even before that release, there were macros to polyfill it, so you could have used those to target an earlier version.
      - uecker 439 days ago
        Why would you target a language that is less portable and less stable?
        jcranmer 439 days ago
        For the specific question of decompiling, as noted in the GP comment, you're already forced to decompile to a not-quite-C language because C's semantics just aren't conducive to accurately representing the hardware semantics. Not that Rust doesn't have its own foibles, but I suspect its core semantics are easier to map to than C's semantics without having to resort as much to a not-quite-Rust target.
        It's definitely something I would like to see at least explored!
        uecker 439 days ago
        I am pretty sure somebody will do this. My guess is that it will make basically no difference, but the end result is likely less useful...
        sylware 438 days ago
        Dude... we say C's syntax is already too rich and complex, and you bring Rust to the table? Please...
    - 9rx 439 days ago
      > Assembly [...] not typed and not structured (as in structured programming).
      That depends on the assembly language. Some have structure constructs, some are typed. Portability is out.
      But if you accept a slightly higher abstraction, WebAssembly is portable, typed, and structured.
      - PittleyDunkin 439 days ago
        > But if you accept a slightly higher abstraction, WebAssembly is portable, typed, and structured.
        Does WebAssembly support AOT compilation to native binaries? I thought it was just a VM.
        9rx 439 days ago
        It does if you implement it! You can do anything if you get around to doing it.
        Or use an existing implementation like WasmEdge, I guess.
  - sylware 439 days ago
    Yep, assembly... but a royalty free, brutally simple ISA...
    Wait... we have it... RISC-V.
    But we need performant µArchitectures for all major use cases (server/desktop/mobile/etc.), on the best silicon process.
    If RISC-V is a success, there is no need for a µC; just go RISC-V assembly, BUT... do not abuse the macro preprocessor, because if it just moves complexity from the C syntax into a macro preprocessor, that would be pointless.
  - Frenchgeek 439 days ago
    Sphinx C-- maybe?
    - sylware 439 days ago
      Is that microsoft C--?
      - Frenchgeek 438 days ago
        https://bkhome.org/archive/goosee/cmm/
        sylware 437 days ago
        Had a look; its syntax is already too rich: I saw "while", "switch", etc.
        I guess we are looking for something even simpler than Sphinx C--.
        I'm starting to wonder if I could express such a simple C syntax using a powerful assembler macro preprocessor (like the one in fasm2). Until there is an "expression" processor, it should be somewhat easier.
- mystified5016 439 days ago
  You want assembly with some sugar.
  Read up on Forth languages. It's pretty much exactly what you're after.
  - mananaysiempre 439 days ago
    Forth is kind of weak dealing with value types of unknown size. For example, suppose you're writing a cross-compiler, and an address on the target machine might take one or two cells on the host machine depending on the host bitness. Now suppose you need to accept a (host-machine) cell and a target-machine address (e.g. an opcode and an immediate) and manipulate them a bit. Trivial in C, definitely possible in Forth, but supremely annoying and the elegance kind of falls apart.
    - mystified5016 435 days ago
      Assembly isn't portable? Wow! You must be some type of genius! I'll start calling newspapers
      - mananaysiempre 435 days ago
        Assembly language isn’t, but assemblers usually are. If you want a (cross-)assembler for 32-bit x86, you can build GNU as or Nasm on any reasonable platform with a C implementation, because, ultimately, bytes are bytes, and you can write
        void emitd(struct buf *buf, int opcode, uint_least32_t address);
        or however it looks inside your assembler without caring what sizeof(int) is (assuming CHAR_BIT is 8). By comparison, in Forth that will be
        emitd ( buf op adr -- )      ( 32-bit host )
        emitd ( buf op ahi alo -- )  ( 16-bit host )
        depending on the bitness of where your assembler runs, even if the machine it assembles for is exactly the same in both cases. You cannot hide the platform difference behind a typedef for uint_least32_t or whatnot, unless you’re willing to drastically reshape the entirety of Forth from inside (which it does allow).
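        A minimal sketch of the C side of that argument, assuming a hypothetical fixed-size `struct buf` (not taken from GNU as or NASM) and a variant of the prototype above that returns -1 when the buffer is full. The address is split into bytes with shifts and masks, so the same source builds on 16-, 32- or 64-bit hosts:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical output buffer for illustration only. */
struct buf {
    unsigned char bytes[64];
    size_t len;
};

/* Emit an opcode byte followed by a 32-bit little-endian address, as
   32-bit x86 expects. uint_least32_t is at least 32 bits on every host,
   so nothing here depends on sizeof(int). Returns 0 on success, or -1
   (writing nothing) if fewer than 5 bytes are free. */
int emitd(struct buf *buf, int opcode, uint_least32_t address)
{
    if (sizeof buf->bytes - buf->len < 5)   /* opcode + 4 address bytes */
        return -1;
    buf->bytes[buf->len++] = (unsigned char)(opcode & 0xFF);
    for (int shift = 0; shift < 32; shift += 8)
        buf->bytes[buf->len++] = (unsigned char)((address >> shift) & 0xFFu);
    return 0;
}
```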
    - sylware 438 days ago
      Endian support is like memory barrier and atomic support: it should be an inline function/keyword.
mhandley 439 days ago
I expect many people know this one, but it's a useful teaching aid when understanding the relationship between arrays and pointers
```
  int array[10];
  *(array+1) = 56;
  array[2] = 4;
  3[array] = 27;
```
The first two are obvious, but the third is also legal. It works because array indexing is just sugar for pointer arithmetic, so array[2]=4 is identical in meaning to *(array+2)=4. Therefore 3[array]=27 is identical to *(3+array)=27 and so is legal. But just because you can doesn't mean you should.
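A small runnable check of the equivalence described above; `index_forms_agree` is a hypothetical helper written for illustration. All three spellings read the same element because `E1[E2]` is defined as `*((E1)+(E2))`, and addition commutes:

```c
#include <stddef.h>

/* Read element i of arr three ways and report whether they agree.
   Returns 1 if arr[i], *(arr + i) and i[arr] are all equal. */
int index_forms_agree(const int *arr, size_t i)
{
    int a = arr[i];       /* conventional subscript          */
    int b = *(arr + i);   /* the pointer arithmetic it means */
    int c = i[arr];       /* legal, but please don't         */
    return a == b && b == c;
}
```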
- macintux 439 days ago
  The best, most entertaining book I've ever read on C covered that (unless I'm misremembering, but I doubt it): Expert C Programming.
  https://www.goodreads.com/book/show/198207.Expert_C_Programm...
  - dualogy 438 days ago
    I'm already liking that one! Page 5 quote:
    > There is one other convention — sometimes we repeat a key point to emphasize it. In addition, we sometimes repeat a key point to emphasize it.
    One more quote and I'll stop:
    > ctime() converts its argument into local time, which will vary from GMT, depending on where you are. California, where this book was written, is eight hours behind London, and several years ahead
- WalterBright 439 days ago
  > The first two are obvious, but the third is also legal.
  D doesn't have that bug!
  In 44 years of C programming, I've never encountered a legitimate use for the 3rd (other than Obfuscated C, that is).
  - WolfeReader 439 days ago
    It's not a bug. You're seeing the difference between "this is how you're taught to access arrays" and "this is how array access actually works".
    - WalterBright 439 days ago
      Since the Standard specifies what that does, pedantically it is not a bug. Ok.
      But I call it a bug because it has no use and just pointlessly confuses people.
    - im3w1l 438 days ago
      Well it could (and I agree with WalterBright that it should) have been disallowed. a[b] being implemented as an early stage rewrite rule expanding to *(a+b) is an uninteresting implementation detail. And I doubt it is even implemented that way in modern compilers anyway. It certainly can't be in C++ as a[b] and b[a] mean different things when [] is overloaded.
      - WolfeReader 436 days ago
        That "uninteresting implementation detail" is actually of grave importance when it comes to understanding how buffer overflow attacks work. I hate to think anyone would put C code into production without understanding this.
    - kragen 437 days ago
      You seem to be lecturing the author of one of the most prominent early C compilers on how array access actually works in C.
      - WolfeReader 436 days ago
        Yep.
  - mhandley 439 days ago
    Agreed - I've only been programming C for 38 years but I've also never found a legitimate use. However I have used it to illustrate a point when teaching C to beginners - it looks so odd they tend to remember it.
matheusmoreira 439 days ago
Note that this is GNU C, not standard C. GNU has extended the normal C language with features such as forward parameter declarations and numeric ranges in switch cases. Lots of people don't know about these things.
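For readers who haven't seen the GNU case-range syntax, a minimal sketch (`char_class` is a hypothetical name). It builds with GCC and Clang but is not ISO C, and `-Wpedantic` will flag it:

```c
/* GNU C extension: "case low ... high:" matches every value in the
   inclusive range. The spaces around "..." matter for integer constants. */
int char_class(char c)
{
    switch (c) {
    case '0' ... '9': return 1;   /* decimal digit          */
    case 'a' ... 'z': return 2;   /* lowercase ASCII letter */
    default:          return 0;   /* anything else          */
    }
}
```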
- dzaima 439 days ago
  Note that switch case ranges might be coming in C2y though.
  - mananaysiempre 439 days ago
    Also forward parameter declarations, or is that proposal dead?
    - wahern 439 days ago
      Basically dead. The main motivation would be to make it easier to use variably modified types in function parameters, where the (length) identifier is declared after the variably modified type, as in
      > void foo(int a[m][m], int m)
      Currently you can only do:
      > void foo(int m, int a[m][m])
      The holy grail is being able to update the prototypes of functions like snprintf to something like:
      > int snprintf(char buf[bufsiz], size_t bufsiz, const char *, ...);
      However, array pointer decay means that foo above is actually:
      > void foo(int (*a)[m], int m)
      Likewise, the snprintf example above would be little different than the current definition.
      There's related syntax, like
      > foo (int m, int a[static m])
      But a is still just a pointer, and while it can help some static analyzers to detect mismatched buffer size arguments at the call site, the extent of the analysis is very limited as decay semantics effectively prevent tracing the propagation of buffer sizes across call chains, even statically.
      There's no active proposal at the moment to make it possible to pass VM arrays (or rather, array references) directly to functions--you can only pass pointers to VM array types. That actually works (sizeof *a == sizeof (int) * m when declaring int (*a)[m] in the prototype), but the code in the function body becomes very stilted with all the syntactical dereferencing--and it's just syntactical, as the same code is generated for a function parameter of `int (*a)[m]` as for `int *a` (underneath it's the same pointer value rather than an extra level of memory indirection). There are older proposals but they all lost steam because there aren't any existing implementation examples in any major production C compilers. Without that ability, the value of forward declarations is greatly diminished. Because passing VM array types to functions already requires significant refactoring, most of WG14 felt it wasn't worth the risk of adopting GCC's syntax when everybody could (and should?) just start declaring size parameters before their respective buffer parameters in new code.
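      A minimal sketch of the pointer-to-VLA parameter style discussed above, assuming a C99 compiler (or a C11/C17 one that supports the optional VLA feature). The size comes first so the parameter's type can refer to it, and `sizeof *a` is computed at run time; the function names are hypothetical:

```c
#include <stddef.h>

/* a points to the first row of an m-by-m matrix. *a has type int[m],
   so sizeof *a is the size of one whole row: sizeof(int) * m. */
size_t row_bytes(int m, int (*a)[m])
{
    return sizeof *a;
}

/* Sum the diagonal; a[i][i] indexes through the VLA pointer type. */
long trace(int m, int (*a)[m])
{
    long sum = 0;
    for (int i = 0; i < m; ++i)
        sum += a[i][i];
    return sum;
}
```

Passing an ordinary `int mat[3][3]` works because it decays to `int (*)[3]`, which is compatible with `int (*)[m]` when `m` is 3.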
      - uecker 439 days ago
        I hope it is not "basically" dead. I just resubmitted it at the request of several people.
        And yes, for new APIs you could just change the order, but it does help also with legacy APIs. It does even when not using pointers to arrays: https://godbolt.org/z/TM5Mn95qK (I agree that new APIs should pass a pointer to a VLA).
        (edited because I am agreeing with most of what you said)
      - mananaysiempre 439 days ago
        > everybody could (and should?) just start declaring size parameters before their respective buffer parameters in new code
        I know that was a common opinion pre-C23, but it feels like the committee is trying to reshape the world to their desires (and their designs). It's a longstanding convention that C APIs accept (address, length) pairs in that order. So changing that will already get you a score of -4 on the Hard to Misuse List[1], for "Follow common convention and you'll get it wrong". (The sole old exception in the standard is the signature of main(), but that's somewhat vindicated by the fact that nobody really needs to call main(); there is a new exception in the standard in the form of Meneide's conversion APIs[2], which I seriously dislike for that reason.)
        The reason I was asking is that 'uecker said it was requested at the committee draft stage for C23 by some of the national standards orgs. That's already ancient history of course, but I hoped the idea itself was still alive, specifically because I don't want to end up in the world where half of C APIs are (address, length) and half are (length, address), when the former is one of the few C conventions most everyone agrees on currently.
        [1] https://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html
        [2] https://thephd.dev/_vendor/future_cxx/papers/C%20-%20Restart...
      - Gibbon1 439 days ago
        That's related to something I would like, which is to be able to set the number of elements in a struct with a flexible array member:
```
        struct foo { size_t elements; int data[]; };
        struct foo foo123 = {.elements = array_size(data), .data = {1, 2, 3}};
```
        or
```
        struct str { size_t sz; char str[]; };
        struct str s123 = {.sz = strlen(.str), .str = "123"};
```
        uecker 439 days ago
        Clang and GCC just got the [[counted_by()]] attribute to help protect such structs in the kernel. But yes, native syntax for this would be nice.
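        For comparison, a minimal sketch of what standard C offers today: a flexible array member whose length is stored in a preceding member and set by hand at allocation time. The names are hypothetical, and the counted_by attribute mentioned above is left out so the sketch does not depend on a particular compiler version:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct ints {
    size_t elements;   /* number of valid entries in data[] */
    int data[];        /* flexible array member (C99)       */
};

/* Allocate a struct ints holding a copy of src[0..n).
   Returns NULL on overflow or allocation failure; free() the result. */
struct ints *ints_from(const int *src, size_t n)
{
    if (n > (SIZE_MAX - sizeof(struct ints)) / sizeof(int))
        return NULL;   /* size computation would overflow */
    struct ints *p = malloc(sizeof(struct ints) + n * sizeof(int));
    if (p == NULL)
        return NULL;
    p->elements = n;
    if (n != 0)
        memcpy(p->data, src, n * sizeof(int));
    return p;
}
```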
      - dfawcus 438 days ago
        Note that GCC does (sometimes) detect the misuse of the "int a[static 3]" case, but maybe that is only when the length is a compile time constant; and possibly only with char arrays.
```
        $ make texe
        cc -g -O2 -std=c11 -Wall -Wextra -Wpedantic -Werror -c -o test.o test.c
        test.c: In function ‘do_test_formatSmallElem’:
        test.c:108:9: error: ‘matSmallElemFormat’ accessing 8 bytes in a region of size 2 [-Werror=stringop-overflow=]
          108 |         matSmallElemFormat(elem, buffer);
              |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        test.c:108:9: note: referencing argument 2 of type ‘char *’
        In file included from test.c:8:
        mat/display.h:17:6: note: in a call to function ‘matSmallElemFormat’
           17 | void matSmallElemFormat(mElem elem, char buffer[static matSmallElemLen]);
              |      ^~~~~~~~~~~~~~~~~~~~~
        cc1: all warnings being treated as errors
        make: *** [<builtin>: test.o] Error 1
```
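        A minimal sketch of the `[static N]` parameter form, with a hypothetical `SMALL_ELEM_LEN` standing in for `matSmallElemLen`. The declaration documents the minimum buffer size and gives GCC something to check at call sites, but inside the function `buffer` is still a plain `char *`:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for matSmallElemLen. */
enum { SMALL_ELEM_LEN = 8 };

/* "static SMALL_ELEM_LEN" promises the caller passes a pointer to at
   least that many chars. Returns snprintf's result (chars needed). */
int format_small(int value, char buffer[static SMALL_ELEM_LEN])
{
    return snprintf(buffer, SMALL_ELEM_LEN, "%d", value);
}
```

        Passing a two-byte array instead, as in the error above, is the kind of call GCC may reject when warnings are errors.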
dfawcus 438 days ago
I'd have to argue the function typedefs are not useless, I've come across two uses.
The obvious one is using it instead of a function pointer typedef, so that its subsequent use in a struct is visibly a pointer. This helps when others are first reading unfamiliar structures.
```
  typedef int handler_ty(int a);

  struct foo {
    handler_ty *handler;
    /* ... */
  };

  struct foo table[] = { { /* init fields */ }, { /* init fields */ }, };
```
The other case can be somewhat related, namely as an assertion / check when writing such handler functions, and more importantly updating them.
```
  handler_ty some_handler;
  int some_handler(int a) { /* ... */ }
```
When updating code, it made compiler errors easier to decode if the expected type of handler_ty was changed and some specific handler was incorrectly updated, or not updated at all.
Basically, the error would generally call out the inconsistency with the prior line directly, rather than with the distant use in the initialisation of 'table'.
As I recall, this mechanism has been around since at least C89; I don't recall using it in K&R C.
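A self-contained version of the pattern, with hypothetical handler names. Each `handler_ty name;` line makes the compiler check the definition that follows it, and the table member is visibly a pointer:

```c
/* A function type, not a pointer type. */
typedef int handler_ty(int a);

/* Declare with the typedef, then define: a mismatch is reported here,
   next to the handler, rather than at the table initializer. */
handler_ty double_it;
int double_it(int a) { return 2 * a; }

handler_ty negate;
int negate(int a) { return -a; }

struct foo {
    const char *name;
    handler_ty *handler;   /* visibly a pointer at the point of use */
};

static const struct foo table[] = {
    { "double", double_it },
    { "negate", negate },
};

/* Call the i-th handler; i must be less than the table length (2). */
int run_handler(unsigned i, int arg)
{
    return table[i].handler(arg);
}
```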
WalterBright 439 days ago
I'm going to speculate a bit on why these silly things are in C.
C was developed on a PDP-11 that had 64KB of memory. That's hardly anything at all. Therefore, the compiler had to be extremely tightly coded.
The fundamental rules of the C language are pretty simple. But articles like these expose consequences of such simple rules. Fixing them requires adding more code. Adding more code means less room for the code being compiled.
Therefore, if the intended use of the language works, the pragmatic approach would be to simply not worry about the quirky consequences.
A more interesting question would be "why do these characteristics persist in modern C compilers?"
The stock answer is "backwards compatibility", "Obfuscated C Code contests" and "gotcha job interview questions". My argument would be that there is no reason for the persistence of such "junk DNA" and it should be deprecated and removed.
I've done my part. D doesn't support that stuff, even though the basic use of the language is easily confused with C.
For example:
```
    #include <stdio.h>
    void main()
    {
        int i;
        for (i = 0; i < 10; ++i);
            printf("%d\n", i);
    }
```
I've died on that hill. I know others who lost an entire day staring at it wondering what's wrong with it. I saw it on X recently as "99% of C programmers will not be able to find the bug."
The equivalent D code:
```
    import core.stdc.stdio;
    void main()
    {
        int i;
        for (i = 0; i < 10; ++i);
            printf("%d\n", i);
    }
```
gets you:
```
    test.d(5): Error: use `{ }` for an empty statement, not `;`
```
C'mon, Standard C! Fix that!
- moefh 438 days ago
  > I know others who lost an entire day staring at it wondering what's wrong with it. I saw it on X recently as "99% of C programmers will not be able to find the bug."
  Both gcc and clang give a warning[1] for that code with just "-Wall", so it's hard to imagine it being a real problem these days.
  [1] https://godbolt.org/z/vfPzhc596
  - WalterBright 438 days ago
    I know modern compilers do it, too. But still the language needs to be fixed. The proof is C programmers still get victimized by this useless feature.
    Compiler warnings are a good source of material for things that need to be fixed in the language. Unfortunately, every compiler has their own set of warnings, and sometimes warnings from different compilers contradict each other. That encourages programmers to not use the warning feature. That's another reason why the language should be fixed.
    - uecker 438 days ago
      In my experience the warnings work quite well for the programmers I know.
      Anyway, ranting on HackerNews does not get anything fixed: https://www.open-std.org/jtc1/sc22/wg14/www/contributing.htm...
      - pjmlp 438 days ago
        Working in offshoring projects for the lowest bid changes one's point of view, regarding "programmers I know" approach.
        uecker 438 days ago
        I work a lot with students who cannot program well in C. I would say turning on -Wall is not a difficult problem for them. If it is, then using another programming language does not help either.
        pjmlp 438 days ago
        That is already progress, compared with the quality of delivery in many offshoring projects.
        It can go so badly that after a couple of years, new management ends up onshoring it all over again.
        You might turn on -Wall, but then who fixes the warnings?
        And you'd better be prepared to fight management if warnings break the build, as it "slows down sprint velocity with useless coding efforts".
        Unless required by some kind of certification law or quality assessment on project delivery.
- HeliumHydride 439 days ago
  I was able to find it instantly, but that's because I always use curly braces for my if/while/for loops.
  - WalterBright 439 days ago
    I added a warning for it in my C compiler back in the mid 1980s.
    Fun story. A friend of mine (Eric Engstrom!) bought himself a backhoe. I'd never driven one before and he offered to let me drive it. Sure!
    The clutch pedal works backwards from that in a car. Press on the clutch to engage it, release the pedal to disengage it. After some struggling with my reflexes being all wrong, I came within a couple feet of taking out the side of his barn - by switching off the key.
    There was nothing wrong with that user interface, other than being insane.
  - binaryturtle 438 days ago
    Indeed, enforcing brackets here would be the proper fix, IMHO. :)
    I always use brackets too. That's simply a less error-prone style.
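    For reference, the braced form of the loop from the top of this thread, wrapped in a hypothetical function that also counts how many times the body ran (ten, instead of the single `printf` the unbraced version performs after the loop):

```c
#include <stdio.h>

/* With braces, printf is unambiguously the loop body. Returns the
   number of lines printed. */
int print_counts(void)
{
    int printed = 0;
    for (int i = 0; i < 10; ++i) {
        printf("%d\n", i);
        ++printed;
    }
    return printed;
}
```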
- dfawcus 438 days ago
  clang and gcc now warn of that.
```
  $ gcc-12 -g -O2 -std=c11 -Wall -Wextra -Wpedantic -Werror c-error.c
  c-error.c:2:10: error: return type of ‘main’ is not ‘int’ [-Werror=main]
      2 |     void main()
        |          ^~~~
  c-error.c: In function ‘main’:
  c-error.c:5:9: error: this ‘for’ clause does not guard... [-Werror=misleading-indentation]
      5 |         for (i = 0; i < 10; ++i);
        |         ^~~
  c-error.c:6:13: note: ...this statement, but the latter is misleadingly indented as if it were guarded by the ‘for’
      6 |             printf("%d\n", i);
        |             ^~~~~~
  cc1: all warnings being treated as errors
```
  and:
```
  $ clang-14 -g -O2 -std=c11 -Wall -Wextra -Wpedantic -Werror c-error.c
  c-error.c:2:5: error: 'main' must return 'int'
      void main()
      ^~~~
      int
  c-error.c:5:33: error: for loop has empty body [-Werror,-Wempty-body]
          for (i = 0; i < 10; ++i);
                                  ^
  c-error.c:5:33: note: put the semicolon on a separate line to silence this warning
  2 errors generated.
```
  Now granted, those are specific implementations, not things mandated by language changes.
- WalterBright 438 days ago
  > I've died on that hill.
  But I'm feeling much better.
mystified5016 439 days ago
Forward parameter declaration is an insane feature. It makes perfect sense in the context of C's other forward declarations, but it's just bonkers.
I can't wait to slip this into some production code to confuse the hell out of some intern in a few years
svilen_dobrev 439 days ago
hehe. similar to
How to Get Fired Using Switch Statements & Statement Expressions:
https://blog.robertelder.org/switch-statements-statement-exp...
kazinator 439 days ago
Without information about how identifiers are declared, you do not know how to parse this:
```
  (A)(B);
```
It could be a cast of B to type A, or function A being called with argument B.
Or this (like the puts(puts) in the article):
```
  A(B);
```
Could be a declaration of B as an identifier of type A, or a call to a function A with argument B.
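A small sketch showing that the same tokens change meaning with the declarations in scope. The function `A` and the three wrappers are hypothetical, chosen only so each reading can be compiled and checked:

```c
static int last_arg;

/* Hypothetical function named A. */
static void A(int b) { last_arg = b; }

/* A names a function here, so (A)(B) is a call. */
int parse_as_call(void)
{
    int B = 42;
    (A)(B);
    return last_arg;
}

/* A local typedef hides the function: (A)(B) is now a cast to int. */
int parse_as_cast(void)
{
    typedef int A;
    double B = 3.7;
    return (A)(B);   /* (int)3.7 == 3 */
}

/* With A naming a type, A(B); declares B (the parentheses are redundant). */
int parse_as_declaration(void)
{
    typedef int A;
    A(B);
    B = 7;
    return B;
}
```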
Back in 1999 I made a small C module called "sfx" (side effects) which parses and identifies C expressions that could plausibly contain side effects. This is one of the bits provided in a small collection called Kazlib.
This can be used to make macros safer; it lets you write a #define macro that inserts an argument multiple times into the expansion. Such a macro could be unsafe if the argument has side effects. With this module, you can write the macro in such a way that it will catch the situation (albeit at run time!). It's like a valgrind for side effects in macros, so to speak.
https://git.savannah.gnu.org/cgit/kazlib.git/tree/sfx.c
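This is not kazlib's sfx API, just a minimal illustration of the hazard it guards against: a hypothetical `SQUARE` macro expands its argument twice, so a side-effecting argument runs twice, while an inline function evaluates it once:

```c
/* Expands x twice: any side effects in x happen twice. */
#define SQUARE(x) ((x) * (x))

static int calls;
static int next_value(void) { return ++calls; }

/* Function version: the argument is evaluated exactly once. */
static inline int square(int x) { return x * x; }

int calls_via_macro(void)
{
    calls = 0;
    (void)SQUARE(next_value());    /* next_value() runs twice */
    return calls;
}

int calls_via_function(void)
{
    calls = 0;
    (void)square(next_value());    /* next_value() runs once */
    return calls;
}
```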
In the sfx.c module, there is a rudimentary C expression parser which has to work in the absence of declaration info. In other words it has to make sense of an input like (A)(B).
I made it so that when the parser encounters an ambiguity, it will try parsing it both ways, using backtracking via exception handling (provided by except.c). When it hits a syntax error, it can backtrack to an earlier point and parse alternatively.
Consider (A)(A+B). When we are looking at the left part (A), that could plausibly be a cast or declaration. In recursive descent mode, we are going left to right and looking at left derivations. If we parse it as a declaration, we will hit a syntax error on the +, because there is no such operator in the declarator grammar. So we backtrack and parse it as a cast expression, and then we are good.
Hard to believe that was 26 years ago now. I think I was just on the verge of getting into Lisp.
I see the sfx.c code assumes it would never deal with negative character values, so it cheerfully uses the <ctype.h> functions without a cast to unsigned char. It's a reasonable assumption there since the inputs under the intended use case would be expressions in the user's program, stringified by the preprocessor. Funny bytes would only occur in a multi-byte string literal (e.g. UTF-8). When I review code today, this kind of potential issue immediately stands out.
The same exception module is (still?) used in the Ethereal/Wireshark packet capture and analysis tool. It's used to abort "dissecting" packets that are corrupt or truncated.
jwilk 439 days ago
First part discussed on HN:
https://news.ycombinator.com/item?id=40835274 (113 comments)
zzo38computer 438 days ago
I had read the GCC documentation and I did not know about the forward parameter declaration. I did know about the other stuff that is mentioned there (and in the first part).
Declarations in for loops is something that I had only ever used in macros (I had not found it useful in other circumstances), such as:
```
  #define lpt_document() for(int lpt_document_=lpt_begin();lpt_document_;lpt_document_=(lpt_end(),0))
  #define win_form(xxx) for(win_memo win_mem=win_begin_();;win_step_(&win_mem,xxx))
```
(The compiler will optimize out the loop and the declared variable in the use of the lpt_document macro; I had tested this.)
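A hedged, self-contained sketch of the same for-loop-as-scope-guard idea, with hypothetical `doc_begin`/`doc_end` functions standing in for `lpt_begin`/`lpt_end`. The third clause uses the comma operator to run the cleanup and then clear the flag, so the body executes exactly once:

```c
static int depth;   /* hypothetical state: > 0 while a "document" is open */

static int doc_begin(void) { ++depth; return 1; }
static void doc_end(void) { --depth; }

/* Runs the following statement once, between doc_begin() and doc_end().
   Caveat: a break or return inside the body skips doc_end(). */
#define WITH_DOCUMENT() \
    for (int doc_open_ = doc_begin(); doc_open_; doc_open_ = (doc_end(), 0))

/* Records the depth seen inside the body; returns how often it ran. */
int run_document(int *depth_inside)
{
    int runs = 0;
    WITH_DOCUMENT() {
        *depth_inside = depth;
        ++runs;
    }
    return runs;
}
```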
teddyh 438 days ago
The comp.lang.c Frequently Asked Questions <https://c-faq.com/> should be required reading for every serious C programmer.
- hulitu 438 days ago
  Also the C infrequently asked questions https://www.seebs.net/faqs/c-iaq.html
GrantMoyer 439 days ago
I should keep this link handy for when people claim C is a simple language. Even without the GNU extensions, the examples here are pretty wretched.
- SAI_Peregrinus 438 days ago
  C is a small language. People confuse simple with small quite often. As languages get smaller, using them gets more difficult once below a certain size. The "Turing tarpit" languages like Brainfuck are extremely difficult to write complex programs in, mostly because they're so small.
  C is clearly too small to be simple. C++ is too large to be simple. Somewhere in between, there may exist a simple language waiting to be invented.
- tpoacher 438 days ago
  C is simple in the same way Conway's Game of Life is simple.
  That's not to say you can't create interesting monstrosities out of it!
betimsl 439 days ago
Where can I find more about this BASIC compatibility mode? Thnx
- hackyhacky 439 days ago
  Of course it isn't really BASIC compatibility mode. In reality, it's a lesser-known array initialization syntax. It's explained here: https://jameshfisher.com/2016/12/25/c-array-literal-explicit...
  In short, you can initialize an array like this, by specifying each element in order:
```
   int foo[] = {10,20,30}; // initialize elements 0, 1, and 2
```
  However, you can also initialize specific array elements:
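```
   int foo[] = {[50] = 10, [51] = 20, [52] = 30}; // initialize elements 50, 51, 52 (standard C99)
   int bar[] = {[50] 10, [51] 20, [52] 30};       // same, in GCC's obsolete form without '='
```
  "BASIC compatibility" mode uses this syntax in its '='-less form, which makes the bracketed indices look like BASIC line numbers.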
  - betimsl 438 days ago
    I never knew you can do this. Wow.
GranPC 439 days ago
Can anyone explain how the last snippet works?
- andreyv 439 days ago
  It creates a compound literal [1] of type array of int, and initializes the specified array positions using designated initializers [2] with the results of calls to puts().
  Using designated initializers without the = symbol is an obsolete GCC extension.
  [1] https://gcc.gnu.org/onlinedocs/gcc/Compound-Literals.html [2] https://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html
- a12k 439 days ago
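  Basically, the initializers execute in the order they are written, not in index order, so it outputs "hello cruel world" rather than "cruel world hello". (Strictly speaking, the C standard leaves the evaluation order of initializer expressions unspecified; GCC simply evaluates them in source order.)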
utopcell 439 days ago
C[++] never ceases to amaze me. Just yesterday I saw this, totally valid, code snippet:
void g(); void f() { return g(); }
- dzaima 439 days ago
  That's not valid standard C; gcc and clang give a warning with '-pedantic'. It's valid C++ though.
  And IMO it's quite a nice feature, sometimes useful for reducing boilerplate in early returns. It's the obvious consequence if you don't treat void as some extremely special syntax, but rather as just another type, perhaps like an empty struct (though that's not valid C either ¯\_(ツ)_/¯), that's implicitly returned at the end of a void-returning function, and a "return;" statement implicitly "creates" a value of void.
  In fact, in Rust (and probably a bunch of other languages that I'm too lazy to remember), functions without a meaningful result return (), the 0-tuple.
  - utopcell 439 days ago
    > warning: ISO C forbids 'return' with expression, in function returning void [-Wpedantic]
    Sanity restored.
- gpderetta 439 days ago
  What's wrong with that?
  - gizmo686 439 days ago
    In C function prototypes with no arguments are strange.
```
  void g();
```
    Means (before C23) that g is a function that takes an unspecified number of arguments (but not a variable number of arguments). C23 finally made empty parentheses mean the same as (void).
    Almost always what you want to do is
```
  void g(void)
```
    Which says that g takes 0 arguments.
    Having said that, declaring it as g() should work fine as long as you always invoke it with the correct arguments. If you invoke it with the wrong arguments, the compiler will let you, and your program may break in fun and exciting ways.
    Edit: looking closer, it looks like the intent might have been to alias f and g. But, as discussed above, it does so in a way that will break horribly if g expects any arguments.
  - utopcell 439 days ago
    Well, at the risk of stating the obvious: `return expr;` is meant to return whatever expr evaluates to.
    Here, g()'s return type is void, so there's no value to return and at the same time, f()'s return type is void, so return should not have an expr to begin with.
    This statement is effectively equivalent to: "g(); return;".
    - josephg 439 days ago
      > so return should not have an expr to begin with.
      That's one way to think about it. Another is that void is a type, which is obviously true given you can have void* pointers and functions can return void. In this example, g() is a void expression, so that's a perfectly fine thing to return from f.
      - utopcell 439 days ago
        void* is fundamentally different from void. It is a complete object type: a variable of that type has storage, and you can actually assign something to it. You can't have a variable of type void; "void var;" is meaningless. A function doesn't "return" void; void simply denotes that a function doesn't return anything.
        - int_19h 439 days ago
          You can't have a variable of type void, but you can have an expression of type void: calling a function that returns void does just that (in C++, "throw" is also a void-typed expression).
    - Narishma 439 days ago
      Not the parent but I too don't see what's wrong with that.
    - frizlab 439 days ago
      so?
AKluge 439 days ago
An old classic: Remember, C was never intended to be taken seriously. https://www.cs.cmu.edu/~jbruce/humor/unix_hoax.html
- rramadass 438 days ago
  Ha, Ha! Didn't know of this.
  Money quote;
  We stopped when we got a clean compile on the following syntax:
  for(;P("\n"),R--;P("|"))for(e=C;e--;P("_"+(u++/8)%2))P("|"+(u/4)%2);
  I am NOT going to try it out.
psychoslave 439 days ago
Usually I'm more in the camp of "let's preserve everything we can as cultural heritage, yes even those awful Nazi propaganda material" and I'm confident that some distant archeologist (or current close neighbor) will be glad we did.
But as time passes, I'm more and more convinced that wiping out every piece of C ever produced would be one of the greatest possible gestures for the future of humanity.
I also have a theory that the universe exists so we can have some opportunities to taste chocolate. Surely, from that perspective, even C can be an unfortunate but acceptable byproduct.
- H8crilA 439 days ago
  Too many people still don't understand C's greatest failure: undefined behavior. Most people assume that if you write past an array then the result may be a program crash; but actually undefined behavior includes other wonderful options such as stealing all your money, encrypting all your files and extorting you for some bitcoin, or even partially destroying a nuclear isotopes processing facility. Undefined really means undefined, theoretically some demons may start flying out of your nose and it would be completely up to spec. If you think that this is justified by "performance gains" or some other nonsense then I really don't know what to tell you!
  - inopinatus 439 days ago
    No, it doesn't. It means the C standard imposes no requirements. It does not mean the compiler becomes unconstrained by statutory law or physics, and if your compiler is actually malware then it is not politely waiting for undefined behaviour before injecting a payload.
    The notorious nasal demons may not be in conflict with the C standard, but they are not going to actually happen, because they only exist in the imagination. The example is given to illustrate by absurdity that the scope of consequential defects is greater than "your program may crash", that's all. If you do wish to produce a similar effect then I suggest consuming a bowl of Buldak instant noodles whilst inducing a sneeze during compilation. Warning: your sinuses will not thank you. And cover your keyboard.
    The biggest hazard with undefined behaviour is that the compiler is not required to issue warnings or errors when encountered.
    - rramadass 438 days ago
      Thanks for straightening out people's absurd ideas about UB.
      It is instructive to read what Ritchie himself thought of various UBs "specified" by the ANSI committee - https://news.ycombinator.com/item?id=20171616
    - saagarjha 438 days ago
      Typically this view is proposed because a malicious attacker can actually make your program do all sorts of stuff you probably did not expect, including adding "functionality" that you did not include in your program.
  - astrange 438 days ago
    You don't have to implement the compiler that way if you don't want to. UBSan is fast enough to ship with it on and there's -fbounds-safety.
- adonovan 439 days ago
  Remember that C's contemporary languages were either inefficient (e.g. ALGOL 68, PL/1, Lisp), functionally obsolete (e.g. FORTRAN didn't have recursion or heap allocation), or even lower level (Assembly, B). C eliminated the need for assembly in programs that were low level (like OS kernels) or high performance (math, graphics, signal processing), and that was surely a huge improvement in type safety and expressiveness.
  - psychoslave 438 days ago
    Well, BASIC and Pascal were already around, I guess, and Modula arrived in the same "era" as C. So, to my mind, C's weight is not so much due to it shining out of the crowd of alternatives for general programming. Instead, my perception is that it's mostly due to a conjunction of where it was born (Bell Labs), its initial focus on building the low-level layers (kernel/OS), and how software stacks tend to leak.
    That doesn't completely void what C achieved at a technical level, of course. But it certainly changes how much of its spread can be credited to its technical benefits.
  - anticensor 438 days ago
    You missed one important category: too verbose (COBOL, Pascal).
- uecker 439 days ago
  I don't get the C hate. That C's terse syntax can be misused to produce unreadable code does not change the fact that I usually find it more readable than more verbose syntax.
  - psychoslave 438 days ago
    Probably hate is a bit too strong a word here, at least to describe what I personally feel about this programming language, or any programming language really. More importantly, it looks like I failed to set the humor dial at the level I intended in this comment. Sorry about that.
    I'm not sure what you mean by "terse syntax" here. To my mind, what this article covers is more about convoluted constructions permitted by the language. The C-user community tends to make more abundant use of terse identifiers, which I personally find detrimental to readability with no sound benefit; but this has nothing to do with syntax. Another thing the article points to is how overloaded reserved tokens like parentheses and the asterisk are, and here too syntax is marginally involved at best. That is, we could use `schtroumpf` and `schtroumpfly` instead of `(` and `)`, and `schtroumpfing` instead of `*`, without changing anything about the nub of the ergonomic issues this implies. What you can infer from a line of code about how the compiler will interpret it is not a question of terseness; it's a matter of how context-sensitive the language is, and how assiduously the community follows idioms that allow cognitively cheaper, correct inferences most of the time.
    All programming languages have their pitfalls; it just happens that C comes with many original, surprising ones, where the path of least cognitive resistance easily leads to big trouble ahead. In a nutshell, C has terrible ergonomics, so it's no wonder it is despised by some who have to use it reluctantly. But of course C will also receive criticism, harsh and gentle, in proportion to the attention it gets in the industry.
    - uecker 438 days ago
      My point is that any kind of formal notation using symbols can be used to write incomprehensible gibberish. That this is possible does not tell you much about the quality of the notation. I would even say it is a sign of a good notation that you can write incomprehensible gibberish, because it means the notation is flexible enough and not too constrained. A compact formal notation can clearly express complicated things that one cannot easily express clearly otherwise. This is why mathematics uses a lot of formulas, and it is also very easy to write mathematical formulas no one can understand anymore. But that is not the point; the point is that you can write mathematical formulas that express complicated things well. IMHO C has a very good trade-off that lets you write complicated programs in a clear way. It should be judged on how good C code looks, not on how incomprehensible code that intentionally misuses the notation can be.
  - rramadass 438 days ago
    It is not so much "C hate" as folks indulging in a bit of self-aggrandizing flexing.
    These type of constructs are just intellectual curiosities and not really related to actual usage.
- chasil 439 days ago
  But how interesting would your life become if SQLite ceased to function anywhere around you?
  Should this misfortune befall you, please don't get on an airplane (with me).
  - psychoslave 438 days ago
    An alternative world without the quirks of C is obviously not necessarily one without any database, just one where databases rely on another programming language (carrying its own quirks, of course :D)