cafxx's comments | Hacker News

That's not how it's implemented (it returns false if you're inside a Do() on an unsupported platform), although I agree the wording should be clearer.


Filed a CL for this, hopefully it gets merged ~soon.

https://go-review.googlesource.com/c/go/+/729920


I find this example mildly infuriating/amusing:

    func Encrypt(message []byte) ([]byte, error) {
        var ciphertext []byte
        var encErr error
    
        secret.Do(func() {
            // ...
        })
        
        return ciphertext, encErr
    }
As it suggests that for PFS it is somehow critical that the ephemeral key (not the long-term one) is zeroed out, while the plaintext message, i.e. the thing that in the example we allegedly want secrecy for, is apparently totally fine to live outside of the whole `secret` machinery and remain in memory potentially "forever".

I get that the example is simplified (because what it should actually do is protect the long-term key, not the ephemeral one)... so, yeah, it's just a bad example.


PFS is just one of many desirable properties, and getting access to plaintext is just one of many kinds of threat. Getting access to ephemeral keys and other sensitive state can enable session hijacking. It's still not a great example, though, because it doesn't illustrate that threat model either.


> If you really want an arena like behavior you could allocate a byte slice and use unsafe to cast it to literally any type.

A word of caution: if you do this and then store pointers into that slice, the GC will likely not see them (as if you were just storing them as `uintptr`s).
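To illustrate the hazard, here's a toy sketch (the `node`/`viewAsNode` names are hypothetical) of how a pointer stored through such a cast becomes invisible to the GC:

```go
package main

import (
	"fmt"
	"unsafe"
)

// node contains a real Go pointer.
type node struct {
	next *node
	val  int
}

// viewAsNode reinterprets the start of a byte slice as a *node.
// The GC still sees buf as pointer-free bytes, so any pointer we
// store through the returned *node is invisible to it.
func viewAsNode(buf []byte) *node {
	return (*node)(unsafe.Pointer(&buf[0]))
}

func main() {
	// "Arena": a plain byte slice, allocated as pointerless memory.
	buf := make([]byte, unsafe.Sizeof(node{}))
	n := viewAsNode(buf)

	// DANGER: as far as the GC is concerned, this is equivalent to
	// storing the pointer in a uintptr. If nothing else references
	// the target, it may be collected out from under us.
	n.next = &node{val: 42}
	fmt.Println(n.next.val)
}
```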


You need to ensure that everything you put in the arena only references stuff in the same arena.

No out pointers. If you can do that, you're fine.


I still would be wary, even in that case. Go does not guarantee that the address of an allocation won't change over the lifetime of the allocation (although current implementations do not make use of this).

If you really store only references into the same arena, it's better to use an offset from the start of the arena. Then it does not matter whether allocations are moved around.
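A toy sketch of the offset-based approach (all names hypothetical): records reference each other by byte offset from the arena's start, so the backing array can be moved or reallocated without invalidating anything.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// arena stores records that reference each other by byte offset
// from the start of the arena, never by pointer.
type arena struct {
	buf []byte
}

// alloc reserves n bytes and returns the offset of the allocation.
// Note that append may move the backing array: offsets stay valid,
// pointers into buf would not.
func (a *arena) alloc(n int) int {
	off := len(a.buf)
	a.buf = append(a.buf, make([]byte, n)...)
	return off
}

// putU64/getU64 write and read a uint64 at a given offset.
func (a *arena) putU64(off int, v uint64) {
	binary.LittleEndian.PutUint64(a.buf[off:off+8], v)
}
func (a *arena) getU64(off int) uint64 {
	return binary.LittleEndian.Uint64(a.buf[off : off+8])
}

func main() {
	var a arena
	head := a.alloc(16) // record layout: [next offset | value]
	tail := a.alloc(16)
	a.putU64(head, uint64(tail)) // "pointer" stored as an offset
	a.putU64(head+8, 1)
	a.putU64(tail+8, 2)

	next := int(a.getU64(head))
	fmt.Println(a.getU64(head+8), a.getU64(next+8)) // prints "1 2"
}
```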


There's a bunch of activity ongoing to improve memory allocation/collection in Go. GreenTeaGC is one improvement that has already landed, but there are others, like the RuntimeFree experiment that aims to progressively reduce the amount of garbage generated by enabling safe reuse of heap allocations, as well as other plans to move more allocations to the stack.

Somehow concluding that "By killing Memory Arenas, Go effectively capped its performance ceiling" seems quite misguided.


That one is kind of interesting given the past criticism of Java and .NET having too many GCs and knobs.

With time Go is also getting knobs, and turns out various GC algorithms are actually useful.


Not sure what you are referring to. There are no knobs involved in the things I mentioned (aside from the one to enable the experiment, but that's just temporary until the experiment completes - one way or the other).


The knobs are the values that can be given to GOGC environment variable.

Also, I foresee that they will discover there are reasons why multiple GC algorithms are desired (and used in other programming ecosystems), so the older one might stay anyway.


Your previous message was referring to Go "getting" knobs, but GOGC has always been there.

The older GC algorithm won't stay; IIRC the plan is for it to be removed in 1.27 (it's kept for now just to provide a fallback in case of bugs in the first release).


GOGC was introduced in Go 1.5, and I thought the problem was solved.

https://go.dev/blog/go15gc

> Go 1.5’s GC ushers in a future where stop-the-world pauses are no longer a barrier to moving to a safe and secure language. It is a future where applications scale effortlessly along with hardware and as hardware becomes more powerful the GC will not be an impediment to better, more scalable software. It’s a good place to be for the next decade and beyond.


> GOGC was introduced in Go 1.5

yes, that's quite literally what I meant by "GOGC has always been there". 1.5 was released 10 years ago, just 3 years after 1.0.

So to summarize: there is one knob (that has been there basically from the beginning), plus a second one (if you squint hard enough: GOMEMLIMIT), and absolutely no plans to add further ones, or to add alternative GCs.


That would be since Go exists, literally.


The point is that no-one is thinking to add knobs, or allow alternative GCs.


That past criticism was and is correct, proven by the fact new Java GCs like ZGC were deliberately designed to offer few knobs.

Go isn't getting any new knobs; there are only two. That's nothing compared to the hundreds of options that old Java GCs had. Completely incomparable.

> and turns out various GC algorithms are actually useful.

I don't know what you're trying to say here, but I think I know why — you don't know either. Stop spitballing.

There are no "various GC algorithms" at play here at all. There is just a new algorithm that performs better. You can read all about it here: https://go.dev/blog/greenteagc. It's not an optional alternative GC, but a successor.


I think the idea of not initiating writeback immediately derives mostly from the days of spinning rust, when read latencies would be noticeably impacted if you initiated writeback too aggressively: reads, contrary to writes, are synchronous by default, and spinning rust rarely allowed high (by modern standards) IOPS, so it made a lot of sense to buffer writes as much as possible to minimize the number of I/O operations spent on writes, leaving as many IOPS as possible available for reads.

This is probably much less of a concern today, as NVMe drives, besides having many orders of magnitude higher IOPS capacity, also have (at least on paper) much better hardware support for high I/O concurrency. It may still make sense, even today, if your hardware (or stack) limits IOPS.


As I mention elsewhere in the thread, reading the kernel documentation for the various flags suggests that the kernel devs are also concerned about multiple writes to the same piece of data, and thus buffering lets you potentially elide unnecessary disk I/O.


Would be nice also if they fixed the ubiquitous "network errors" that happen approximately every single time...


Right?! Those then give you no way to interact except to force a regeneration. I don't want to waste messages or time repeating a long generation; I want to be able to either continue it or, if it's sufficient as is, just respond to it in place. I don't understand why you're forced to hit regenerate after those.


Not to be the devil's advocate here, but it can almost certainly be the case that data was used to define heuristics (potentially using automated statistical methods) that an engineer then formalized as code. Without that data, that specific heuristic wouldn't exist, at least very likely not in that form. Yet that data does not have to be included in any open source release. And obviously you, as a recipient of the release, can modify the heuristic (or at least the version that was codified), but you cannot reconstruct it from the original data.

I know my example is not exactly what is happening here, but the two sound pretty similar to me, and there seems to be a fairly blurry line dividing them... so I would argue that where "this must be included in an open source release" ends and "this does not need to be included in an open source release" starts is not always so cut and dried.

(A variant of this, that happens fairly frequently, is when you find a commit that says something along the lines of "this change was made because it made an internal, non-public workload X% faster"; if the data that measurement is based upon did not exist, or if the workload itself didn't exist, that change wouldn't have been made, or maybe it would have been made differently... so again you end up with logic due to data that is not in the open source release)

If we want to go one step further, we could even ask: what about static assets (e.g. images, photographs, other datasets, etc.) included in an open-source release? Maybe I'm dead wrong here, but I have never heard that such assets must themselves be "reproducible from source" (what even is, in this context, the "source" of a photograph?).

That being said, I sure wish the training data used for all of these models was available to everyone...


...or maybe Windows should just offer an API for marking a file for deletion once it's not in use anymore (I understand unlink semantics may not be possible, but that's not what my suggestion above is saying)


Windows does have this API, NtDeleteFile, and it can even be used to delete the currently running exe (https://twitter.com/jonaslyk/status/1345167613643661312), but it is undocumented...


I thought I'd be the guy to point out that once again mandatory file locking is to blame, but you beat me to it.

I never dug into the question, but why is it used? What benefits does it provide over the UNIX unlink behaviour?


> I never dug into the question, but why is it used? What benefits does it provide over the UNIX unlink behaviour?

How do you defragment/move files that are unreachable on the file system? How do you shrink volumes when you can't move files that need to be moved?

Edit: Actually, hmm... as I type this, I suddenly recall you can also open a file by its ID on NTFS. And you can enumerate its streams as well. So presumably this could work on NTFS if you loop through all file IDs? Though then that would make these files still accessible, not truly unreachable.


You don't? Those are either free space, or held open by a running process, so you just leave them be and assume they will be released sooner or later. Worst case, you defragment on boot.

https://unix.stackexchange.com/questions/68523/find-and-remo...

This is how it works on UNIX. Generally better than apps randomly failing because a file(name) is held open somewhere by something.


> You don't? Those are either free space, or held by handle by a running process, so you just leave them be and assume they will be released sooner or later.

Well that's what I was getting at, it would suck to not be able to move around file blocks just because a process is using the file. That "sooner or later" might well be "until the next reboot". The current strategy makes it possible to live-shrink and live-defragment volumes on Windows - ironically, saving you a reboot in those cases compared to Linux.

But actually, maybe not - see the edit in my original comment.


I've yet to want to defrag my computer while worrying about still-open deleted files.

Builds failing because I have a terminal open in a build output directory, or a text file open in an editor, is something I face far more often, and it annoys me more. (Or being unable to replace the binary of a service being developed/tested, needing to stop the service, replace the binary, and start it again. Or log rotations failing because a logfile is open in Notepad. Or...)

Also see my link for a solution on UNIX, where you can indeed fix this problem, or simply kill the process holding the file. I haven't needed to defrag my computer in the last 20 years, neither on Linux nor on Windows, but hey, it makes me happy that my daily work is hindered for this hypothetical possibility (which can be, and is, solved in other OSes with appropriate APIs for the job).

Also the original post is about Windows installers... don't get me started on the topic (or windows services), please.


I wasn't just talking about defragging. I was also talking about live volume shrinking.

> Also see my link for a solution on unix, where you can indeed fix this problem

Looping through every FD of every process just to find ones that reside in your volume of interest is... a hack. From the user's perspective, sure, it might work when you don't have something better. From the vendor's perspective, it's not the kind of solution you design for & tell people to use.

In fact, I think that "solution" is buggy. Every time you open an object that doesn't belong to you, you extend its lifetime. I think that can break stuff. Imagine you open a socket in some server, then the server closes it. Then that server (or another one) starts up again and tries to bind to the same port. But you're still holding it open, so now it can't do that, and it errors out.

> or simply kill the process holding the file.

That process might be a long-running process you want to keep running, or a system process. At that point you might as well not support live volume shrinking or defrags, and just tell people to reboot.

> Also the original post is about Windows installers... don't get me started on the topic (or windows services), please.

This seems pretty irrelevant to the point? It's not like they would design the kernel to say "we'll let you do this if you promise you're an installer".

> I face builds failing because I have a terminal open in a build output directory or a textfile in an editor open is far more often [...]

Yes, I agree it's frustrating. But have you considered the UX issues here? The user has C:\blah\foo.txt open, and you delete C:\blah\. The user saves foo.txt happily, then reopens it and... their data is gone? You: "Yeah, because I deleted it." User: "Wait but I was still using it??!"


I have considered it. I never had any serious problem with it during 15 years of using desktop Linux as a developer machine. Grandma would have no more problems than after unplugging the pendrive her file was opened from and then trying to save it, for example... Modern operating systems have far worse and more user-hostile patterns.

And as for live volume shrinking: the kernel can solve this problem if there is a need for it; this invariant is not required for that feature, as it is possible to do it via the same APIs offered for ordinary basic file-manipulation gruntwork. On UNIX a filename is basically disassociated from the inode, but AFAIK the inode holding the block list still exists and will be cleaned up later, so it can be updated if its blocks are moved underneath the high-level filesystem APIs.

You just made a strawman, and you are sticking to it.


> Never had any serious problem about it during 15 years of desktop linux use as a developer machine.

You're not the typical customer of Windows.

> Grandma would not have more problems than unplugging the pendrive where the file was opened from, and trying to save it, for example

Actually she would, because in that case writing to the same file handle would error, not happily write into the ether.

Also, you have one tech-savvy grandma. I don't think mine even knows what a "pendrive" is (though she's seen one), let alone try to open a file on one, let alone try to save her files on it, let alone use pen drives on any regular basis.

> You just made a strawman you are sticking to.

The only strawman I see here is your grandma using pen drives to save files.

What I'm pointing at are real issues for some people or in some situations. Some of them you might be able to solve differently at a higher investment/cost, or with hacks. Some of them (like the UX issue) are just trade-offs that don't automatically make sense for every other user just because they make sense for you. Right now Windows supports some things Linux doesn't, and vice-versa. Could they be doing something better? Perhaps with more effort maybe they could both support a common superset of what they support, but it's not without costs.


'Sooner or later' means 'until the file is no longer open'.


Yes? And that might not happen until you log off or shut down the OS.


But it doesn't have to. Space is freed up deterministically, not "sooner or later".


What? Space can't be freed up while the file is in use. The process is using the file, the data needs to be there...


Using the same API that lets you move file blocks around at will.


Huh? That API requires a file handle. Which you get by opening a file. Which you can't do because you can't find it on the filesystem when it's not there.

Edit: Actually, hmm... see edit above.


While a process still has an unlinked file open, /proc/<pid>/fd can be used to obtain a handle to the file so that you can mess around with it.
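A minimal Go sketch of that trick (Linux-only, and it assumes procfs is mounted at /proc):

```go
package main

import (
	"fmt"
	"os"
)

// readUnlinked creates a file, writes to it, unlinks it while keeping
// it open, and then reads it back through /proc/self/fd.
func readUnlinked() (string, error) {
	f, err := os.CreateTemp("", "ghost-*")
	if err != nil {
		return "", err
	}
	defer f.Close()

	if _, err := f.WriteString("still here"); err != nil {
		return "", err
	}
	// Unlink: the name is gone, but the inode survives while the
	// descriptor is open.
	if err := os.Remove(f.Name()); err != nil {
		return "", err
	}

	// The open descriptor is still reachable via procfs; opening
	// this magic symlink reopens the underlying (deleted) inode.
	b, err := os.ReadFile(fmt.Sprintf("/proc/self/fd/%d", f.Fd()))
	return string(b), err
}

func main() {
	s, err := readUnlinked()
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // the unlinked file's contents are still readable
}
```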


You're suggesting opening every single FD of every single process (which might not even point to a file, let alone a file on that volume) and querying it just to do this? I mean, sure, I guess that's usually not physically impossible (unless e.g. /proc is unavailable/unmounted)... but it's clearly a hack.

In fact, I think it's not just a (slow!) hack, but a buggy one too. Every time you open an object that doesn't belong to you, you extend its lifetime. I think that can break stuff. Imagine you open a socket in some server, then the server closes it. Then that server (or another one) starts up again and tries to bind to the same port. But you're still holding it open, so now it can't do that, and it errors out.


No, I'm just saying it's possible. I can count on the fingers of zero hands the number of times I've needed to edit a deleted file out from under a process holding the only reference to it, so at least in my experience it's merely academic knowledge!


A locking mechanism that actually works, like in any sane OS besides UNIX.

And with it, less data corruption issues.


Why not just allocate the blobs off-heap? (That is something you probably want to do anyway if it's cryptographic material, to avoid being at the mercy of the GC leaving copies around.)

ByteBuffer.allocateDirect should do that, IIRC. This allows you to use the standard ConcurrentHashMap while being able to get a stable pointer for use by the Rust logic.


I could use DirectByteBuffer instances as CHM values. But Java deallocates the backing memory of DirectByteBuffers during object finalization. If there is no on-heap memory pressure then there is no GC and thus no finalization. So it would leak offheap memory. I could also use Unsafe to hack into DirectByteBuffer and call the Cleaner explicitly. Many libraries do that anyway. But then I would still need some kind of reference counting to make sure I won't deallocate a buffer with active readers.


Or you could simply invoke a GC periodically (or every N times a key is removed from the map, or similar schemes).

Another simple way, if we don't like the idea of triggering GCs manually, is to allocate the same buffer both off-heap and on-heap: use the off-heap one for actual key storage, and the on-heap one just to generate heap memory pressure.


Is there any regret about choosing Java for DB development and needing to work around this and likely many other issues?


Is this the equivalent of directly asking the OS for more pages, or does it work via some other heap-like mechanism that simply isn't garbage collected?



