
> If the type system is uncomputable, then the type system will never be able to resolve all uses of function pointers everywhere.

Can you elaborate on what you mean here, and the problems this might cause? Do you mean that some function pointers cannot be resolved to concrete functions? Or that the process of evaluating comptime may be infinite? Or some other problem where the compiler can't determine whether a function pointer is needed for a given function?


These are great questions.

All of the above, actually.

Turing-completeness, and the uncomputability that comes with it, is widespread, easy to build, hard to keep out, and affects everything that might happen at runtime.

The list of things this makes undecidable is infinite, but it absolutely includes:

* Whether a particular function pointer resolves to certain functions,

* Whether evaluating the comptime portion of a Zig program will take forever,

* Whether the compiler fails to exclude a particular function from being used as a function pointer at a particular place.

Turing-completeness is a wildly powerful property, but that power is not free; the inability to determine ahead of time what might happen in a program is just the biggest and most apparent cost.

The problems this might cause can be summarized like this: you can never know what a program will do for a given set of inputs until you run the program on those inputs, and even then, if the program never halts, you will never find out.

Does that make sense?


> Whether a particular function pointer resolves to certain functions

This is only a Turing-completeness issue if the list of functions is infinite. It is not, in Zig.


You don't need an infinite list of functions to run into the problems with Turing-completeness. You only need effectively infinite behavior, which means all you need is effectively infinite possible inputs.
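
A contrived sketch of the point, where the Collatz loop stands in for any computation whose behavior the compiler cannot decide for every input:

```zig
fn evenHandler() void {}
fn oddHandler() void {}

// Only two candidate functions exist, but deciding which one `select`
// returns for all possible inputs requires deciding the behavior of the
// loop below, which is not known to be possible for every input.
fn select(input: u64) *const fn () void {
    var n = input;
    var steps: u64 = 0;
    while (n != 1) : (steps += 1) {
        n = if (n % 2 == 0) n / 2 else 3 * n + 1;
    }
    return if (steps % 2 == 0) &evenHandler else &oddHandler;
}
```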


This is simply untrue. I'm sorry. I recommend studying math a bit more rigorously before making claims in the future.


> What kind of tests are these "behavior test"?

Snippets of Zig code that use language features and then make sure those features did the right thing. You can find them here: https://github.com/ziglang/zig/tree/master/test/behavior
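
For a flavor of what those look like, here's an illustrative sketch in the same shape (not a test copied from the repo):

```zig
const std = @import("std");

test "while loop with continue expression" {
    var i: u32 = 0;
    var sum: u32 = 0;
    while (i < 5) : (i += 1) {
        sum += i;
    }
    try std.testing.expect(sum == 10);
}
```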

> Is that a list of compilation targets?

Mostly. Pedantically, it's a list of code generation backends, each of which may have multiple compilation targets; the LLVM backend, for example, can target many architectures. The architecture-specific backends are currently debug-only and cannot do optimization.

> If not all behavior tests pass, does that not mean that the compiler fails to compile programs correctly?

Some tests fail because they produce an incorrect compile error; others compile but have incorrect behavior (miscompilation). Don't use Zig in production yet ;)

(edit: fix formatting)


To add to what Spex said: some of those tests also check language features that the compiler's own code doesn't exercise, like async/await. This means that the compiler is able to build itself, but is not able to build every possible valid Zig program. We're getting there though :^)


The compiler adds a tag in debug modes, but not in release modes.
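
A minimal sketch of what that means in practice (field names here are arbitrary):

```zig
const Value = union {
    int: i32,
    float: f32,
};

// In Debug and ReleaseSafe builds the compiler stores a hidden tag next to
// the payload and panics on a wrong-field access; in ReleaseFast and
// ReleaseSmall the union is bare, like a C union, and a wrong-field access
// is undefined behavior.
fn readFloat(v: Value) f32 {
    return v.float; // safety-checked if v was set via .int
}
```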


Neat, that's somewhere in between a `union` and a `std::variant`. You could build your own, but it's cool that it's a first-class member of the language.


Parsing this out of a UTF-8 encoding requires no knowledge of Unicode or even UTF-8. All of the relevant characters (reverse solidus, quotation mark, and the control characters) are single-byte characters in the ASCII subset. Those bytes cannot appear inside multi-byte characters in UTF-8, by design of the encoding. Converting the Unicode character escape codes to UTF-8 would require knowledge of the UTF-8 encoding, but this unescaping is not a feature that would be provided by the language regardless.
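
As a sketch of why (hypothetical helper, not from any stdlib): finding the end of a JSON string can operate on raw bytes.

```zig
// Hypothetical helper: find the closing quote of a JSON string, given the
// index just past the opening quote. No decoding is needed: '"' (0x22) and
// '\\' (0x5C) are ASCII, and every byte of a multi-byte UTF-8 sequence has
// its high bit set, so false positives are impossible.
fn findStringEnd(bytes: []const u8, start: usize) ?usize {
    var i = start;
    while (i < bytes.len) : (i += 1) {
        switch (bytes[i]) {
            '"' => return i,
            '\\' => i += 1, // skip whatever byte is escaped
            else => {},
        }
    }
    return null;
}
```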


> Parsing this out of a UTF-8 encoding requires no knowledge of Unicode or even UTF-8.

If you have valid UTF-8 already, then yes, the task is a lot easier. But depending on the level at which you're parsing, this might not be the case — i.e., if you're writing a JSON parser from the ground up, you do need to know what UTF-8 and Unicode are, and will need to validate the input data.

> Converting the Unicode character escape codes to UTF-8 would require knowledge of the UTF-8 encoding

Agreed. Even if you're not working at the "array-of-bytes" level, you will need to be able to parse and translate "\u..."-style strings into the appropriate output character encoding.
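
A sketch of that translation using Zig's standard library (assuming `std.fmt.parseInt` and `std.unicode.utf8Encode`; surrogate pairs for non-BMP characters are ignored here):

```zig
const std = @import("std");

// Decode a single "\uXXXX" escape into UTF-8. `hex` is the four hex digits
// after "\u"; returns the number of bytes written to `out`.
// (Surrogate-pair handling for non-BMP characters is omitted.)
fn decodeUnicodeEscape(hex: *const [4]u8, out: *[4]u8) !usize {
    const codepoint = try std.fmt.parseInt(u21, hex, 16);
    return try std.unicode.utf8Encode(codepoint, out);
}
```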

> but this unescaping is not a feature that would be provided by the language regardless.

I'm not sure we're talking about this being handled at the language level. This translation is something that would likely be offered at the parser level (working with the features offered by the standard library), but the parser does need to know about it — and does need to be able to work with strings at a granular level to be able to parse it out. By definition, it cannot leave the input data as an undecoded bag of bytes.

Note, too, that the JSON spec does not specifically require UTF-8. UTF-16 is a completely valid encoding for JSON (though much less common than UTF-8), in which case none of these characters are encoded as single ASCII bytes, and more encoding awareness is needed to handle them.


> it cannot leave the input data as an undecoded bag of bytes

But all it's doing here is taking a hex string (which is entirely ASCII) and converting it into the value it represents. Since ASCII translates unambiguously to bytes, it doesn't really matter whether `str[0]` is operating on a byte stream, codepoint stream, or grapheme stream, because in UTF-8 they're all the same thing as long as we're within the ASCII range.

Where things get hairy is stuff like `str.reverse()` over arbitrary strings that may or may not be ASCII. This repo[0] talks about some of the challenges associated with conflating characters with either bytes or codepoints. The problem is that programming languages often approach strings from the wrong angle: you can't just tack multi-byte codepoint handling on top of ASCII handling; you lose O(1) random access, and you don't actually model the linguistic domain properly, because humans think of characters not in terms of bytes or codepoints but in terms of grapheme clusters. Clustering correctness falls deep in the realm of linguistics, and is therefore arguably more suitable for a library than a programming language.

[0] https://github.com/mathiasbynens/esrever
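
A quick sketch of the reversal hazard described above (naive byte reversal on a non-ASCII string):

```zig
const std = @import("std");

test "naive byte reversal breaks UTF-8" {
    var s = "héllo".*; // 'é' is two bytes in UTF-8
    std.mem.reverse(u8, &s);
    // The two bytes of 'é' are now in the wrong order: invalid UTF-8.
    try std.testing.expect(!std.unicode.utf8ValidateSlice(&s));
}
```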


I agree entirely with your second paragraph, but regarding this:

> hex string (which is entirely ASCII)

My point is that JSON doesn't need to be UTF-8 or a superset of ASCII to be valid. It can be any representation of Unicode, including UTF-16, UTF-32, GB 18030, etc.; so long as the text is composed of Unicode code points in some Unicode transformation format, the JSON is valid.

As I said in the parent comment: if you are working within UTF-8 exclusively, and can assume valid UTF-8, then great! But this isn't necessarily true, and in some cases, you will still need to care about the encoding.

(Either way, this starts straying slightly from the more general discussion at hand: regardless of the encoding of the string, you will still need an ergonomic way of interacting with the contents of the data in order to meaningfully parse the contents — even past the hurdle of decoding from arbitrary bytes, you still need to manipulate the data reasonably. In some cases, this means working with a buffer of bytes; in others, it makes sense to manipulate the data as a string... In which case, you may run into some of the string manipulation ergonomic considerations being discussed around these comments.)


> JSON doesn't need to be UTF-8 or a superset of ASCII to be valid. It can be any representation of Unicode, including UTF-16, UTF-32, GB 18030, etc

Sure, it can also be gzipped, encrypted, etc., but that goes back to the point that there's nothing inherently special about JSON as it relates to encoding to a byte stream. All there is to it is that somewhere in a program there's an encode/decode contract to extract meaning out of the byte stream, and at the protocol level one most likely only looks at byte streams as sequences of bytes (because performance-wise, it doesn't make sense to measure payload size in codepoints or graphemes at that level).


The UTF-8 encoding is designed so that this is usually not a problem. If you search a UTF-8 encoded byte array for an ASCII character, for example, you can never get a false positive: multi-byte UTF-8 sequences always have the most significant bit set in every byte, and ASCII characters always have it unset. Additionally, treating the string as an array of Unicode codepoints doesn't solve the problem -- now you have people screwing around with individual codepoints inside grapheme clusters :P


> Additionally, treating the string as an array of unicode codepoints

I suggested no such thing.

> individual codepoints inside grapheme clusters

That's less severe than invalid codepoints.

Perhaps the whole thing, however it is represented, should not be mutable, given that there's no way to make it mutable in a sensible way?


I can't comment for all the other people who are posting and voting for those posts, but at least for me Zig has quickly become my language of choice for side projects. Its cross compilation features alone are enough for it to replace the system C/C++ compiler toolchains I used to use, and the language itself is everything I'm looking for. Readable (IMO) syntax, proper namespaces, order-independent declarations, powerful metaprogramming, and an unmatched level of internal consistency all make it stand out to me. It feels like a massively simplified C++, a native language that improves significantly on C without introducing a massive set of unrelated features.


It seems strange to compare it to C++ when it has none of the features that most define C++, like RAII, OOP, and templates. It's not really "simplified"; it's something totally different.


Presumably the idea is that lots of things that you basically have to use C++ for today because C alone is weak could be served by Zig instead, not that Zig and C++ are comparable languages.


I really like Zig, but the lack of RAII means we're back to malloc/free style programming, and I would never opt into that unless Rust could not be used (e.g. binary size too large). Having said that, it's way better than C and I hope it does well.
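
For what it's worth, the Zig idiom isn't quite bare malloc/free; cleanup is usually tied to scope with `defer`. A minimal sketch, assuming a recent managed `std.ArrayList` API:

```zig
const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();

    var list = std.ArrayList(u8).init(gpa.allocator());
    defer list.deinit(); // explicit, but still scope-tied, unlike a raw free()

    try list.appendSlice("hello");
    std.debug.print("{s}\n", .{list.items});
}
```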


Unfortunately, malloc/free is unavoidable if you're writing code that needs to allocate memory carefully (e.g. guaranteeing no OOMs; Rust has many situations where it will silently heap-allocate and then panic on OOM). It looks like Zig has accepted that it'll be used for those situations and decided to make that experience really good, instead of deciding to be a Rust-alike or insisting on RAII. I think that's an appropriate choice! It makes me excited to use C less.
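
Concretely, every allocation in Zig is explicit and fallible, and OOM is an ordinary error value rather than a hidden panic. A sketch with a hypothetical `duplicate` helper, assuming a recent Zig:

```zig
const std = @import("std");

// Every allocation goes through an explicit allocator and can fail with
// error.OutOfMemory, which the caller must handle or propagate.
fn duplicate(allocator: std.mem.Allocator, data: []const u8) ![]u8 {
    const copy = try allocator.alloc(u8, data.len);
    @memcpy(copy, data);
    return copy;
}
```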


RAII and explicit allocators are independent concerns. That Rust chose a bad default in having 1) a static global allocator, and 2) an expectation of infallible allocations from said global allocator that was then made pervasive in its libstd and third-party ecosystem, has nothing to do with the fact that it has lifetime-based destructors. Having explicit allocators does not mean you need to `free` manually instead of via an automatically-called destructor. A type that allocates needs to ensure a corresponding free in its dtor, and then all other code that uses the type gets lifetime-based cleanup for free.


I imagine that Zig has a lot more focus on "I can't use X in my environment" types of situation. It seems that for many such situations it might be a better fit than Rust.


There is, for practical purposes, no place where one can't use C++, except where gatekeepers exercise power to keep it out. Typically it would be only a day's work to get such a codebase building with a C++ compiler, after which one could begin modernizing.

There are plenty of loadable modules in C++ for Linux and BSDs, and plugins for Postgres, in places where there is no expectation of upstreaming them.

Zig is in a similar boat.


Perhaps you're correct about C++, but I was more referring to the Zig vs. Rust situation.


Since the Zig tool chain can also compile C, and Zig can use C headers without translation, the cases are more similar than one might otherwise suspect.

But of course it would be less of an upgrade, and the Zig parts would have to stay clearly segregated.


Not really back to that. Zig just chooses a different way to deal with memory, namely different compilation modes and faster write-compile-test cycles.

It’s a different tradeoff which may be worth it for some use-cases. But it’s definitely not obvious whether it is worse, as the alternative comes with huge complexity on the language side.


It is possible they will add a type level resource release obligation at some point. It would not be anything quite like the Rust borrow checker but I think would be a big help. https://github.com/ziglang/zig/issues/782


Well, Zig’s compile-time metaprogramming capabilities do rival those of C++ in a much simpler way, IMO.


It does have comptime, which covers a lot of the same ground as templates (and generics, in other languages).
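
For instance, where C++ would use a class template, Zig uses an ordinary function that takes a type and returns a type. A minimal sketch:

```zig
const std = @import("std");

// A function from type to type: this is the whole generics mechanism.
fn Pair(comptime T: type) type {
    return struct {
        first: T,
        second: T,

        fn swap(self: *@This()) void {
            std.mem.swap(T, &self.first, &self.second);
        }
    };
}

test "Pair" {
    var p = Pair(u32){ .first = 1, .second = 2 };
    p.swap();
    try std.testing.expect(p.first == 2 and p.second == 1);
}
```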


comptime seems like a superset of templates. The target audience was probably not using OOP. People do come by the Discord to chat about how nice RAII is, though.


Hard to say; 1.0 for us comes when everything is fully stable in both the language and standard library, and we believe that it can last at least 20 years without modification. My estimate is at least 5 years from now, possibly more.


Will there be a Zig 0.10 release?



We currently don't advise using the Zig language in production. The compiler has known bugs and even some miscompilations, and we sometimes make breaking changes to the std lib and even the language. However, `zig cc` and `zig c++` are currently considered stable enough for production use.


This is a worry, but it's not as bad as you might initially think. The first thing to notice is that even though the "interface pointer" got fatter, the implementation got much leaner, as it no longer contains the vtable. Vtables are now shared between instances of the same type, so total memory use has gone down, and implementations can be packed more densely. If you're worried about overall cache usage, this is a net positive. The first load, from the fat pointer, is very likely to be in cache. The second load, from the vtable, will be in cache if you have used the same type recently, which is likely: if you have thousands of objects, you probably do not have thousands of implementations.

There is some additional latency because the virtual function load is now two pointer dereferences instead of one. However, C++ and Go both use double-dereference models like this, and it seems to be working fine for them. Additionally, if virtual calls like this are on your critical fast path, you have bigger problems :P
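
For reference, a sketch of the fat-pointer layout being described, modeled loosely on how `std.mem.Allocator` is structured (names here are illustrative):

```zig
// The interface value is two words: a pointer to the instance and a
// pointer to a per-type vtable shared by all instances of that type.
const Writer = struct {
    ptr: *anyopaque,
    vtable: *const VTable,

    const VTable = struct {
        write: *const fn (ptr: *anyopaque, bytes: []const u8) usize,
    };

    fn write(self: Writer, bytes: []const u8) usize {
        // Two dependent loads: self.vtable, then the function pointer.
        return self.vtable.write(self.ptr, bytes);
    }
};
```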


It's true that language-specific IRs are more powerful, but that isn't the problem in this example. Interfaces aren't part of Zig at the language level, nor are object lifetimes over which to make the vtables immutable. Having a memory model where memory is not innately typed is what makes this problem difficult to optimize; the IR has very little to do with it.
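
A sketch of what "not innately typed" means here, in recent Zig syntax:

```zig
test "memory has no innate type" {
    var buffer: [8]u8 align(8) = undefined;
    const as_u64: *u64 = @ptrCast(&buffer);
    as_u64.* = 42;
    const as_f64: *f64 = @ptrCast(&buffer);
    as_f64.* = 3.14; // same bytes, new type; nothing "retires" the old one
}
```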


> a memory model where memory is not innately typed

This is intriguing. Do you have examples of systems where memory is typed? Or some ideas?


Memory is typed in C++. You must use placement new to imbue memory with a non-POD type, and that memory continues to have that type until its destructor is invoked. This is what allows the optimizer to assume that vtable pointers don't change while an object is alive.

Effects of the typedness of memory can also be seen in the strict aliasing rules. Once a piece of memory has taken on a type, it may no longer be aliased by pointers of different types, until the memory is relinquished back into typelessness by a destructor.


thanks!


BitC used a strictly typed memory model.

