We’re not yet using git, but we have one repository with multiple branches, one for each released version (on-prem software, not a hosted service). We are in regular active development on ~3 branches at a time. Most developers dealing with this have multiple clones checked out rather than switching branches in the same clone. The reason is that we often make significant structural changes to the project in each release, and switching branches (which updates thousands of files and restructures the tree) often causes most IDEs to combust. Pointing the IDE at a different cloned workspace works much more smoothly. After we switch to git, I anticipate our guides will cover using worktrees for the same purpose.
Except, in your example, I’ve had numerous cases where bugs were introduced because the developer’s primary criteria for picking a set were “fast” and “no duplicates”, only to realize later that the order of items matters. And looking back at the choice, with 5-20 items, using a list and checking contains() not only had no performance impact but better reflected the intent of “no duplicates”. Instead the set is left in place, and the dev fixing the bug usually copies the results into a list and re-sorts them into the order they had before going into the set...
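For illustration, here's a minimal sketch of the list-based approach (names are mine, not from any codebase):

def add_unique(items, item):
    # "No duplicates" stated directly; for 5-20 items the O(n)
    # membership check has no measurable cost.
    if item not in items:
        items.append(item)

items = []
for name in ["b", "a", "b", "c", "a"]:
    add_unique(items, name)
print(items)  # prints ['b', 'a', 'c'], first-seen order preserved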
I've never seen this case. There are set implementations which maintain order, but in my experience it's fairly rare to need both "no duplicates" and "items maintain insertion order".
It's fairly easy to remove the set and replace it with something else if this does come up.
Maybe I’m misunderstanding, but isn’t it pretty common to e.g. have a list of non-duplicate items on screen with a specific display order, which can change as a result of user actions (like adding new items or editing existing ones)?
Maybe? Perhaps that's just a part of daily dev I'm not exposed to. I generally work on the backend, not the frontend. I can see how keeping things in the same order in a UI would be important.
Doesn't seem like it'd be the general case though. And it seems like it'd be fairly obvious when something like this applies.
Hash tables already contain a list internally, and sets are hash tables storing just the keys, so in a sense they are already lists. With a level of indirection it's easy to make them insertion-ordered just like normal lists: make the hash table a map from keys to list indexes. That way the table can be rehashed freely while the internal list stays in insertion order.
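A minimal sketch of that indirection (my own naming; Python dicts happen to do something similar internally already):

class OrderedSet:
    def __init__(self):
        self._index = {}   # key -> position in self._items
        self._items = []   # keys, in insertion order

    def add(self, key):
        if key not in self._index:        # O(1) average hash lookup
            self._index[key] = len(self._items)
            self._items.append(key)

    def __contains__(self, key):
        return key in self._index

    def __iter__(self):
        return iter(self._items)          # iteration follows insertion

s = OrderedSet()
for x in [3, 1, 3, 2]:
    s.add(x)
print(list(s))  # [3, 1, 2]

Deletion is where it gets interesting, since removing from the middle of the list either leaves a hole or shifts the stored indexes.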
The effect of that is what people are referring to here. How is one supposed to know that a tech video they watched once is the reason videos made by someone else entirely, on the topic of ADHD, are being recommended? No one is going to make that connection and clean up their watch history accordingly. Additionally, tying recommendations to watch history maybe needs a step removed: what if I want to see the history of everything I've watched without it affecting my recommendations?
A few months ago I must’ve been digging through settings and turned off watch history, since I now get only a blank page with no recommendations. I don’t discover content as much as I used to, but it’s been a good change for me: I just see updates from the channels I subscribe to. Stumbling across content is left to sites like HN and other communities.
I don’t have an answer, but the article’s description made it sound like clients assume a default of 100 until they receive the server’s value in a SETTINGS frame. So there would only be a brief window where the client would appear not to respect the server’s configuration. I would assume the SETTINGS frame is sent pretty quickly.
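If that reading is right, the client-side logic would be roughly this (hypothetical names, not any real HTTP/2 library):

ASSUMED_MAX_CONCURRENT_STREAMS = 100   # used until the server's SETTINGS arrives

class StreamLimiter:
    def __init__(self):
        self.limit = ASSUMED_MAX_CONCURRENT_STREAMS
        self.in_flight = 0

    def on_settings_frame(self, max_concurrent_streams):
        # The server's value replaces the assumed default; anything
        # opened during the window before this may exceed the real limit.
        self.limit = max_concurrent_streams

    def try_open_stream(self):
        if self.in_flight >= self.limit:
            return False   # queue the request instead
        self.in_flight += 1
        return True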
Conjecture on my part, but their example of a photo album loading 100 images at once might come up where the page’s html/js/css are served from one domain and the page immediately tries to load images from a separate content server with the lower limit. Maybe try updating your test to use two servers: one serving a page with 100 img tags, all pointing at different image resources on the second server, and the second server configured with the low concurrency limit. That might result in the browser issuing 100 immediate requests to server 2 without awaiting its SETTINGS frame.
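A rough sketch of the first server under those assumptions (the ports and paths are made up; the page itself can be plain HTTP/1.1, since only the image server needs to enforce the HTTP/2 limit):

from http.server import BaseHTTPRequestHandler, HTTPServer

IMG_HOST = "http://localhost:8001"   # assumed address of server 2

PAGE = ("<html><body>" + "".join(
    f'<img src="{IMG_HOST}/img/{i}.jpg">' for i in range(100)
) + "</body></html>").encode()

class PageHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve a page whose 100 img tags all point at server 2,
        # so the browser fires those requests as fast as it's willing to.
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

HTTPServer(("localhost", 8000), PageHandler).serve_forever()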
I think the example is code for a child process which uses a separate thread to block on stdin for the lifetime of the process. As soon as the parent process dies (there’s no example code?), the child process’ stdin will unblock, causing the monitor thread to detect this and panic.
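In Python the child's side of that pattern would look something like this (a sketch, assuming the parent spawned the child with a pipe wired to its stdin; the original presumably being Rust is where the panic comes in):

import os, sys, threading, time

def monitor_parent():
    # read() blocks until EOF, which occurs when the parent (the only
    # writer holding the pipe) exits and closes its end.
    sys.stdin.buffer.read()
    print("parent went away, shutting down", file=sys.stderr)
    os._exit(1)   # rough analogue of the panic described above

threading.Thread(target=monitor_parent, daemon=True).start()

while True:
    time.sleep(1)   # the child's real work would go here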
Check out CodeBook. It’s not open source, but it’s a one-time fee per device type (Windows, Mac, iPhone, Android), with up to five installs each. I’ve purchased it for phone, MacBook, and a Windows PC, have been using it for the past 5+ years, and am satisfied with it. The product itself isn’t open source, but the company that makes it develops an open-source SQLite extension for encrypted databases. All syncing is done manually, over Wi-Fi, or it can use Dropbox or Google Drive.
This can be problematic when working on a project where updating the Rust version is difficult, limiting which libraries can be used, or in CI/CD where devs may not have direct control of the environment. It’s less a language-compatibility issue and more a deployment-logistics issue.
Did Google cite patents as one of the reasons they removed support initially? I thought it was all about the lack of benefits and the difficulty of maintenance.
My thinking is the same. I doubt this is an oversight; making the String constructor thread-safe would likely slow things down significantly. Great point about things being assumed not thread-safe: the JDK is pretty thorough about documenting thread-safety.
By what means? The only approaches I would expect are either unconditionally duplicating the array or a mutex, and I don’t think the mutex would be simple. Adding a synchronized block on the input could be done, but that assumes nothing else locks on it (and if something does, that probably points to a race condition elsewhere...).
Unconditionally duplicating the array would use more memory and wouldn’t be faster.
Inlining `StringUTF16.compress` and `StringUTF16.toBytes`, what the code currently does is essentially this (using Python as pseudocode; input is a sequence of 16-bit char codes, like Java's char[]):
LATIN1, UTF16 = 0, 1                   # coder values, as in the JDK
HI_BYTE_SHIFT, LO_BYTE_SHIFT = 8, 0    # big-endian layout, for simplicity

def encode(input):
    # First pass: try to compress every char into one latin1 byte.
    latin1 = bytearray(len(input))
    for i in range(len(input)):
        c = input[i]
        if c > 255:
            latin1 = None   # bail out: at least one char needs two bytes
            break
        latin1[i] = c
    if latin1 is not None:
        return LATIN1, latin1
    # Second pass: re-read input and store two bytes per char.
    utf16 = bytearray(len(input) * 2)
    for i in range(len(input)):
        c = input[i]
        utf16[2*i] = (c >> HI_BYTE_SHIFT) & 0xFF
        utf16[2*i + 1] = (c >> LO_BYTE_SHIFT) & 0xFF
    return UTF16, utf16
The issue occurs because `input` can be mutated between the moment it finds a char that's above 255 in the first loop and the moment it visits that same char in the second loop.
The solution is to not do that, and instead do something like:
def encode_fixed(input):
    latin1 = bytearray(len(input))
    for i in range(len(input)):
        c = input[i]
        if c <= 255:
            latin1[i] = c
            continue
        # Found a char above 255: build the utf16 array from the bytes
        # already copied, without ever re-reading input[0..i].
        utf16 = bytearray(len(input) * 2)
        for j in range(i):
            utf16[2*j + 1] = latin1[j]   # low byte; high byte is already 0
        latin1 = None
        utf16[2*i] = (c >> 8) & 0xFF     # write the exact char we tested
        utf16[2*i + 1] = c & 0xFF
        for j in range(i + 1, len(input)):
            c = input[j]
            utf16[2*j] = (c >> 8) & 0xFF
            utf16[2*j + 1] = c & 0xFF
        return UTF16, utf16
    return LATIN1, latin1
This means that if you find a char above 255, you always append that exact char to the utf16 array; there's no possibility that someone changes it under you, because you append the same char you tested. So you cannot get into the situation the essay describes: a utf16 string will always contain at least one non-latin1 code unit.
Non-vectorised performance should see an improvement, as latin1 to utf16 is a trivial operation (just copy every byte of the input to every other byte of the output).
Though if you vectorised the char-to-utf16 conversion, you now vectorise two loops on bailout (latin1 -> utf16 up to i, then char -> utf16), which is probably less efficient. I don't know if the JDK has vectorised optimisations; the source has `HotSpotIntrinsicCandidate` annotations, but I don't know how far the intrinsics go.