More

klmr · on March 2, 2024

> But in all honesty for the pharmaceutical industry it’s mostly momentum that keeps R on top

I can’t agree with this: especially in PK/PD, R is only just now taking over from the previous (closed-source) systems. Momentum would keep R out, not in.

klmr · on March 2, 2024

> I can’t remember the last time a library incompatibility led to a show stopper.

Oh, it’s very common unless you basically only use < 5 packages that are completely stable and no longer actively developed: packages break backwards compatibility all the time, in small and in big ways, and version pinning in R categorically does not work as well as in Python, despite all the issues with the latter. People joke about the complex packaging ecosystem in Python but at least there is such a thing. R has no equivalent. In Python, if you have a versioned lockfile, anybody can redeploy your code unless a system dependency broke. In R, even with an ‘renv’ lockfile, installing the correct packages version is a crapshoot, and will frequently fail. Don’t get me wrong, ‘renv’ has made things much better (and ‘rig’ and PPM also help in small but important ways). But it’s still dire. At work we are facing these issues every other week on some code base.

hadley · on March 2, 2024

I'd love to hear more about this because from my perspective renv does seem to solve 95% of the challenges the folks face in practice. I wonder what makes your situation different? What are we missing in renv?

klmr · on March 4, 2024

Oh, I totally agree that ‘renv’ probably solves 95% of problems. But those pesky 5%…

I think that most problems are ultimately caused by the fact that R packages cannot really declare versioned dependencies (most packages only declare `>=` dependency, even though they could also give upper bounds [1]; and that is woefully insufficient), and installing a package’s dependencies will (almost?) always install the latest versions, which may be incompatible with other packages. But at any rate ‘renv’ currently seems to ignore upper bounds: e.g. if I specify `Imports: dplyr (>= 0.8), dplyr (< 1.0)` it will blithely install v1.1.3.

The single one thing that causes most issues for us at work is a binary package compilation issue: the `configure` file for ‘httpuv’ clashes with our environment configuration, which is based on Gentoo Prefix and environment modules. Even though the `configure` file doesn’t hard-code any paths, it consistently finds the wrong paths for some system dependencies (including autotools). According to the system administrators of our compute cluster this is a bug in ‘httpuv’ (I don’t understand the details, and the configuration files look superficially correct to me, but I haven’t tried debugging them in detail, due to their complexity). But even if it were fixed, the issue would obviously persist for ‘renv’ projects requiring old versions.

(We are in the process of introducing a shared ‘renv’ package cache; once that’s done, the particular issue with ‘httpuv’ will be alleviated, since we can manually add precompiled versions of ‘httpuv’, built using our workaround, to that cache.)

Another issue is that ‘renv’ attempts to infer dependencies rather than having the user declare them explicitly (a la pyproject.toml dependencies), and this is inherently error-prone. I know this behaviour can be changed via `settings$snapshot.type("explicit")` but I think some of the issues we’re having are exacerbated by this default, since `renv::status()` doesn’t show which ones are direct and which are transitive dependencies.

Lastly, we’ve had to deactivate ‘renv’ sandboxing since our default library is rather beefy and resides on NFS, and initialising the sandbox makes loading ‘renv’ projects prohibitively slow — every R start takes well over a minute. Of course this is really a configuration issue: as far as I am concerned, the default R library should only include base and recommended packages. But it in my experience it is incredibly common for shared compute environments to push lots of packages into the default library. :-(

---

[1] R-exts: “A package or ‘R’ can appear more than once in the ‘Depends’ field, for example to give upper and lower bounds on acceptable versions.”

apwheele · on March 2, 2024

Agree with this, I am pretty agnostic to the pandas vs R whatever stuff (I prefer base R to tidyverse, and I like pandas, but realize I am old and probably not in majority based on comments online). But many teams who are "R adherent" folks I talk to are not deploying software in varying environments so much as reporting shops doing ad-hoc analytics.

For those whom want to use both R/python, I have notes on using conda for R environments, https://andrewpwheeler.com/2022/04/08/managing-r-environment....

disgruntledphd2 · on March 2, 2024

Can you not just build your own code as a package and specify exact dependencies?

It's a bit of faff but that seems like it should work (but maybe I'm missing something).

getoffmycase · on March 2, 2024

I basically don’t use anything outside of tidyverse or base R because of the package dependency issues.

wodenokoto · on March 2, 2024

At my old job we snapshotted CRAN and pinned versions of package dependencies _against_ CRAN.

hadley · on March 2, 2024

We now provide snapshotted CRAN binaries (for many platforms) at https://packagemanager.posit.co.

klmr · on Jan 23, 2024

> That's not a lot

Indeed, that wouldn’t be a lot: the framing in the article is grossly misleading. The actual number of protesters this weekend was hundreds of thousands, and plausibly >1M (not “tens of thousands”), according to unanimous reporting in various German media and official sources (see e.g. https://www.tagesschau.de/inland/demos-gegen-rechts-bilanz-1...).

klmr · on Jan 22, 2024

I’m not saying this doesn’t happen but I can’t remember ever having been asked to install a root certificate when joining an airport wifi. And I am confident that this has never happened when I’ve flown out from Gatwick.

llacb47 · on Jan 23, 2024

Perhaps the GCHQ have a backdoor in snapchat then, which is comforting

klmr · on Jan 22, 2024

> why are there not mass protests about this?

Because we hadn’t heard about this case before. — At all. If the vague information in this article is actually corroborated there’ll be an outcry alright.

neom · on Jan 22, 2024

I learned about it 2 years ago:

https://www.dailymail.co.uk/news/article-11002673/Chess-play...

"The message was picked up by their mobiles on Gatwick airport's Wi-Fi servers and immediately triggered alarm bells with security because of the sensitive words used."

I thought that was weird then and I still think it's weird now, either there is a pretty big problem or the daily mail is inaccurate. This whole article is lala land from a tech perspective and I'm quite curious about it.

ghaff · on Jan 22, 2024

From that article it seems like it's the kid who assumed it was the WiFi that caused the leak.

'But I was using the data on my phone and they were using the Wi-Fi at Gatwick and so the message was picked up by the security people.'

As far as I know, it doesn't seem like anyone in the UK has said where they got the info from. And the Daily Mail may just be assuming the kid's right and is jumping to a conclusion.

klmr · on Jan 22, 2024

The entire point of TLS (let alone E2EE!) is to make something like this particular scenario safe.

klmr · on Oct 20, 2023

Why? Because all typical, existing applications written for POSIX systems use the POSIX API (usually indirectly) to interact with the filesystem and perform IO. Being able to use that vast ecosystem of existing applications seamlessly with object storage is such a common requirement that a good dozen different object storage-to-POSIX abstraction layers exist (many of them on top of FUSE).

Even if you build your scalable production environment from scratch with native object storage support (which is rare!), it’s eminently useful to be able to use coreutils to interact with said object storage for devops: nothing beats e.g. running grep on a list of log “files” in an S3 bucket in terms of convenience.

klmr · on Oct 19, 2023

> this is not a recipe for stability of your binaries

I see the point in theory, but it works incredibly well in practice. The specific technology has been used in production by major companies for years. It even carefully works around buggy software that makes incorrect assumptions for undefined behaviour.

(COI disclaimer: I used to work on this product; but I no longer have any stakes in it, financial or otherwise. I do still use the product, because it’s vastly superior to the alternatives.)

DannyBee · on Oct 19, 2023

Certainly it can be made to work, I spent years on things like it, etc.

I actually did similar in production at Google, which i suspect is one of the companies you are referring to.

I'm even overall a fan of using JITted code in production for normally "static" languages.

But - i've never seen this sort of thing break through the significant resistance/feeling of taboo that often exists around doing that kind of thing in production, long term.

You will find companies here and there willing, sure. But writ large? It eventually goes away, even at companies willing to try it.

I do hope they get past all that, i just ... am skeptical.

klmr · on Oct 19, 2023

> … at Google, which i suspect is one of the companies you are referring to.

Just to clarify: no, I was referring to companies which are using products from the company behind cunoFS, and which share the actual code base of the functional interposition. My point is that while this technique is complex and brittle in general, this specific code-base is incredibly battle-tested and has proved itself even in fairly arcane configurations.

You’re definitely right about there being some amount of resistance, but functional interposition offers some compelling advantages over all alternative solutions in terms of ease of use and unparalleled performance.

cuno · on Oct 19, 2023

Hi, author here.

Yes we have many large companies (Fortune Global 500) down to small organisations using our software with this kind of interception (see for example https://cuno.io/about-us/). It took us a decent sized team a lot of years to get right, because it is so very hard a problem to crack. But we think it is worth it. And for those who don't want to use such interception, we do offer a FUSE layer as well that still offers much higher performance than alternatives.

klmr · on Oct 19, 2023

> but what has this to do with storage systems?

The article is about POSIX compliance (or lack thereof) of storage systems.

And the article does not imply that “POSIX … sucks”. On the contrary: the answer to the rhetoric question in the title is obviously “no”.

> It's almost always a better idea … to write your own thin specialized glue layer between your application code and operating system APIs.

Oh, definitely. But writing good abstractions that work equally well with POSIX-compliant filesystems and with object storage is basically impossible without massive trade-offs. That’s the entire point of the product that’s advertised here: it provides a link between object storage and POSIX-compliant file access that manages these trade-offs extremely well, and it allows users to use their existing glue code for POSIX without having to deal with object storage altogether.

(COI disclaimer: I used to work on this product; but I no longer have any stakes in it, financial or otherwise.)

klmr · on Oct 19, 2023

It’s pretty clear that the article is only comparing object storage with POSIX in the context of filesystems, not with all of POSIX.

tgv · on Oct 19, 2023

Well, Amazon is not very interested in making its APIs compatible with other cloud providers. But the whole thing is unrelated to POSIX itself. It makes just as much sense when you replace POSIX with e.g. HDMI cables.

adolph · on Oct 19, 2023

Hard to say what Amazon is or isn’t interested in. The success of S3 is such that significant downstream client libraries are used by people who pay for S3, enough to provide a measure of api stability, simply out of customer inertia.