Hacker News | ebb_earl_co's comments

> This was costing us ~$300K/year in compute, and the number kept growing as more customers and detection rules were added.

Maybe I’m out of touch, but I cannot fathom this level of cost for custom lambda functions operating on JSON objects.


They said in the article that they were running up to 200 pods at a time. Doing some back-of-the-envelope math, 200 pods at $300,000/year is about $0.17/hour, which is exactly what an EC2 c5.xlarge costs per hour (on demand). That has 4 vCPUs, so about 800 vCPUs during peak, at $0.0425/vCPU-hour.
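That back-of-the-envelope math checks out if you run it (a quick sketch; the pod count and annual cost are the figures quoted above, and the 4-vCPU instance size is the c5.xlarge assumption):

```python
# Sanity-check the numbers: 200 pods, $300K/year, running 24x7x365.
pods = 200
annual_cost = 300_000          # USD/year, per the article
hours_per_year = 24 * 365      # 8760

cost_per_pod_hour = annual_cost / (pods * hours_per_year)
print(f"${cost_per_pod_hour:.2f}/pod-hour")  # ~$0.17, the c5.xlarge on-demand rate

vcpus_per_pod = 4              # c5.xlarge has 4 vCPUs
peak_vcpus = pods * vcpus_per_pod
cost_per_vcpu_hour = cost_per_pod_hour / vcpus_per_pod
print(f"{peak_vcpus} vCPUs at peak, ${cost_per_vcpu_hour:.4f}/vCPU-hour")
```

Note this assumes the fleet runs at peak all year, which is exactly the first question below.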

I do have some questions like:

* Did they estimate cost savings based on peak capacity, as though it were running 24x7x365?

* Did they use auto scaling to keep costs low?

* Were they wasting capacity by running a single-threaded app (Node-based) on multi-CPU hardware? (My guess is no, but anything is possible)


This is a helpful breakdown, thanks, @otterley.

It is, by orders of magnitude, larger than any deployment that I have been a part of in my work experience, as a 10-year data scientist/Python developer.


This is larger than the resources I have available at Medium-Size-Fabless-Semi-Inc, and larger than when I had two racks of C++ build farm. It is of course way larger than StackOverflow, which ran for years on two large machines.

All for .. a meta-SaaS?


This is where the cost came from.

>The reference implementation is JavaScript, whereas our pipeline is in Go. So for years we’ve been running a fleet of jsonata-js pods on Kubernetes - Node.js processes that our Go services call over RPC. That meant that for every event (and expression) we had to serialize, send over the network, evaluate, serialize the result, and finally send it back.

But either way, we're talking $25k/mo. That's not even remotely difficult to believe.
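The per-event overhead the quote describes (serialize, send, evaluate, serialize, send back) is easy to illustrate. A toy Python sketch, with a made-up event shape and a trivial lookup standing in for the actual JSONata evaluation — the point is just the four extra JSON passes per event that an in-process call avoids:

```python
import json
import time

# Hypothetical detection event; real payloads would be larger and more varied.
event = {"user": "u1", "rule": "r42", "payload": {"k": list(range(100))}}

def rpc_eval(evt):
    # Each RPC hop costs four JSON passes: serialize the request,
    # deserialize it on the Node side, serialize the result,
    # deserialize it back in the caller. (Network latency not modeled.)
    wire = json.dumps(evt)
    received = json.loads(wire)
    result = {"matched": len(received["payload"]["k"])}  # stand-in for the expression eval
    return json.loads(json.dumps(result))

def in_process_eval(evt):
    # In-process call: zero JSON passes.
    return {"matched": len(evt["payload"]["k"])}

n = 10_000
t0 = time.perf_counter()
for _ in range(n):
    rpc_eval(event)
rpc_s = time.perf_counter() - t0

t0 = time.perf_counter()
for _ in range(n):
    in_process_eval(event)
local_s = time.perf_counter() - t0

print(f"RPC-style: {rpc_s:.3f}s, in-process: {local_s:.3f}s for {n} events")
```

Multiply that per-event gap by billions of events and $25k/mo stops looking strange.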


At first I thought they were AWS Lambda functions; if those were over-provisioned for very high concurrency or something similar, $25k/month is in the realm of possibility.

But no, the post is talking about plain RPC calls to k8s pods running Docker images. To be saving $300k/year on that, their compute bill should be well above $100M/year.

Perhaps if it were Google-scale events for billions of users daily, paired with the poorest, most inefficient processing engine, zero caching, and very badly written rules, maybe it is possible.

Feels like it is just an SEO article designed to catch readers' attention.


It has to be satire, right? Like, you aren't out of touch on this. I get engineers maybe making the argument that $300k/year on cloud is the same as 1.5 DevOps engineers managing an in-house solution, but for just JSON parsing????

For numbers like that, I can never tell whether it's just a vastly larger-scale dataset than any that I've seen as a non-FAANG engineer, OR a hilariously wasteful application of "mAnAgEd cLoUd sErViCeS" to a job that I could do on a $200/month EC2 instance with one Sinatra app running per core. This is a made-up comparison of course, not a specific claim. But I've definitely run little $40 k8s clusters that replaced $800/month paid services and never even hit 60% CPU.

Right, this is roughly my mental situation, too. I guess that streaming JSON things can eat up compute way faster than I had any intuition for!

I wonder if you've ever worked on a web service at scale. JSON serialization and deserialization is notoriously expensive.

It can be, but $500k/year is absurd. It's like they went from the most inefficient system one could possibly create to a regular, normal system that an average programmer could manage.

I have no idea if they are doing orders of magnitude more processing, but I crunch through 60GB of JSON data in about 3000 files regularly on my local 20-thread machine using nodejs workers to do deep and sometimes complicated queries and data manipulation. It's not exactly lightning fast, but it's free and it crunches through any task in about 3 or 4 minutes or less.

The main cost is downloading the compressed files from S3, but if I really wanted to I could process it all in AWS. It also could go much faster on better hardware. If I have a really big task I want done quickly, I can start up dozens or hundreds of EC2 instances to run the task, and it would take practically no time at all... seconds. Still has to be cheaper than what they were doing.
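The local-crunching setup described above can be sketched in a few lines — fan the files out to worker processes and run a query against each. This is a minimal illustration, not the commenter's actual tooling (they use Node.js workers); the `active` flag and the worker count are invented:

```python
import json
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

def count_matches(path):
    """Parse one JSON file and run a simple query against it."""
    data = json.loads(Path(path).read_text())
    # Stand-in query: count records with a hypothetical flag set.
    return sum(1 for rec in data if rec.get("active"))

def crunch(paths, workers=20):
    """Fan files out across CPU cores; each worker parses independently."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_matches, paths))

# Usage (guard required for process pools on spawn-based platforms):
# if __name__ == "__main__":
#     print(crunch(sorted(Path("data/").glob("*.json"))))
```

Since each file parses independently, this parallelizes almost perfectly — which is why throwing a 20-thread machine (or a fleet of EC2 instances) at it works so well.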


Curious about the workload. As I'm trying to build a tool around JSON: what are those files compressed with? What is the average file size? What is their structure (ndjson? A dict with some huge data structure a few levels deep?)

In S3 the JSON is stored in plain-old .zip files. While downloading to local the files are unzipped to plain old JSON. It's basically an object containing tons of data about each website I manage including all fragments of HTML and metadata used on the sites. It can get quite large, some sites have thousands of pages. We often need to find things stored many levels deep in the JSON that may be tricky to find, it isn't usually a specific path, and lots of iterable arrays and objects are involved. The files range from ~20MB to ~400MB, depending on how much content each site has. And we have ~9000 total sites.
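The "find things many levels deep, not at a specific path" part can be handled with a small recursive walker. A sketch — the sample structure and predicate are invented for illustration, not taken from the commenter's data:

```python
def deep_find(node, predicate, path=()):
    """Yield (path, value) pairs anywhere in a nested JSON structure
    for which predicate(value) is true — no fixed path required."""
    if predicate(node):
        yield path, node
    if isinstance(node, dict):
        for key, value in node.items():
            yield from deep_find(value, predicate, path + (key,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from deep_find(value, predicate, path + (index,))

# Hypothetical site object with HTML fragments nested a few levels deep:
site = {"pages": [{"fragments": [{"html": "<h1>Hi</h1>", "meta": {"id": 1}}]}]}
hits = list(deep_find(site, lambda v: isinstance(v, str) and "<h1" in v))
print(hits)  # [(('pages', 0, 'fragments', 0, 'html'), '<h1>Hi</h1>')]
```

Returning the path alongside the value is handy when the match could live under any of several iterable arrays and objects.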

They got a 1000x speedup just by switching languages.

I highly doubt the issue was serialization latency, unless they were doing something stupid like reserializing the same payload over and over again.


Well, for starters, they replace the RPC call with an in-process function call. But my point is anybody who's surprised that working with JSON at scale is expensive (because hey it's just JSON!) shouldn't be surprised.

Well, everything is expensive at scale, and any deserialization/serialization step is going to be expensive if you do it enough. However, yes, I would be surprised. JSON parsing is pretty optimized now; I suspect most "JSON parsing at scale is expensive" is really the fault of other parts of the stack.

Would it be better or worse if I had that experience and still said it's stupid?

You didn't say it was stupid. If you had, I would have just ignored the comment. But you expressed a level of surprise that led me to believe you're unfamiliar with how much of a pain in the ass JSON parsing is.

I think OP’s point was surprise that a company would spend so much on such inefficient json parsing. I’m agreeing. I get that JSON is not the fastest format to parse, but the overarching point is that you would expect changes to be made well before you’re spending $300k on it. Or in a slightly more ideal world, you wouldn't architect something so inefficient in the first place.

But it's common for engineers to blow insane amounts of money unnecessarily on inefficient solutions for "reasons". Sort of reminds me of SaaSes offering 100 concurrent "serverless" WS connections for like $50/month - some devs buy into this nonsense.


Is it incontrovertibly built into macOS? I have an iPhone and have never enabled it or Siri, so maybe there is a similar off switch for macOS.

It’s like Siri, or spell check, if you don’t use it you turn it off and it doesn’t bother you again.

The AirPods Max 2 are primarily Bluetooth headphones, but support lossless audio over USB-C cable, for what it’s worth.

That goes through an ADC and then a DAC, at least for the previous iteration; analog directly to the drivers was not supported. You would be compounding distortion, and largely throwing away what the external DAC+amp had on offer.

No, the ultimate beneficiary of LLM-created code is the toll collectors who stole as much intellectual property as they could (and continue to do so), fleecing everyone else that they are Promethean for having done so and for continuing to do so.


Check out the instructions from Tailscale: https://tailscale.com/kb/1280/appletv


I wish there was a way to use the tailscale app to connect to my own vanilla WireGuard endpoint at home. I don’t want to use and pay for tailscale when I can run WireGuard myself. But there seems to be no good WireGuard app for tvOS (there is for iOS and macOS though) and if the TS app works as well as it says, I’m jealous I can’t use it with my setup.

(There’s another really shitty VPN app for tvOS that I tried, but it also costs money so screw that. It’s also buggy as hell and crashes all the time.)

I should add that my use case is the occasional trip where we take the Apple TV with us places and want to access my media library. Or being able to share my media library with extended family (setting their Apple TV up with a VPN to my house). More complex things like travel routers can work, but are more hassle than I want, although I’m increasingly leaning towards taking the plunge there…


Personal-level Tailscale is free for up to 3 users. So your immediate family is covered even on trips.

You could create an account with any one of their identity providers (or roll your own OIDC, it's possible) and just have it not have a linked credit card. The account you use to authenticate Tailscale doesn't have to be the Apple account that you use to log into the hardware device itself - my wife's laptop, phone, and iPads are logged in under my Tailscale account but separate Apple/iCloud accounts (we have family sharing for our apps, etc., but the TS is usually going to be up to me, so I haven't created another account for her). Free gets you 100 devices, so we're nowhere close to running out of those.


I’m reading that from a departure lounge.

Wish I’d read this a few hours ago and the AppleTV would be coming with me.


How do you manage this? Just copy the stream URL from Jellyfin Web or…?


Yup, the stream URL is valid HTTP media!


This is why I still prefer Signal; this practice seems to be their modus operandi, even though they, too, were affected by the AWS us-east-1 catastrophe.


Signal used to never collect data on users, but they changed that a while ago, and now they keep users' names, photos, phone numbers, and lists of their contacts permanently in the cloud, protected from the government by nothing except a leaky enclave and a PIN (https://web.archive.org/web/20250117232443/https://www.vice....)

More recently they've started collecting the contents of messages into the cloud too, yet to this very day their privacy policy opens with the lie: "Signal is designed to never collect or store any sensitive information," which hasn't been true for a very, very long time. I consider their refusal to update their privacy policy to be a massive dead canary warning people that the service has already been compromised, but feel free to take your chances.


You're able to disable the pin feature to prevent that data from being saved though, so it definitely isn't a requirement.

I'm also not sure where you've read that they collect the contents of messages, because as far as I'm aware they still aren't doing that and I can't find any info online that indicates that they are (other than their secure backup feature that's opt-in only I suppose)


Actually you can't. If you choose not to set a pin, Signal just chooses one for you and uses that to upload all your data, only you won't be able to access it. There is no way to prevent your data from being sent to the cloud. For more info see here: https://old.reddit.com/r/signal/comments/htmzrr/psa_disablin... and https://community.signalusers.org/t/what-contact-info-does-t...

The fact that Signal users are still unaware of where their data is going and when should tell you all you need to know about how trustworthy the service is. Not being 100% clear about the risks people take when using software which is promoted for use by people whose freedom and/or lives depend on it being secure is a very bad look for Signal.

As for message backups they are at least opt-in (for now anyway) and you can learn more about them here: https://signal.org/blog/introducing-secure-backups/


Well shit.

Alternatives?

Was hard enough getting my circle on signal.


I don't have a good one, sorry. I'm currently using Silence for unsecured texting and Jami for secure communication. Neither is something I'd recommend to regular people the way Signal used to be, back when it let you get secure and insecure texts in one place.


> This is why I still prefer Signal;

You do realize that Signal's CEO is a "former" CIA asset, don't you /s


I am really only familiar with Python, in which I’m pretty sure that the .py becomes .pyc and then CPython translates .pyc into machine instructions.

How does this differ? Is an IR the same idea as Python’s .pyc?


> and then CPython translates .pyc into machine instructions.

What do you mean? CPython is a bytecode compiler and a virtual machine interpreting that bytecode. Or are you talking about the new experimental JIT?


Strictly speaking, bytecode isn't IR, because typically it's not further transformed; IRs are designed to be further transformed. As with all things, these aren't hard-and-fast rules (plenty of compilers run transformations on bytecode, and there are plenty of interpreters for some IRs).
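To make the CPython side of this concrete: a .pyc file caches bytecode for CPython's virtual machine, not machine instructions, and the standard-library `dis` module will show that bytecode for any function:

```python
import dis

def add(a, b):
    return a + b

# Prints the VM opcodes CPython actually interprets — LOAD_FAST and friends,
# not native machine code. (Exact opcode names vary by version; e.g. the
# addition is BINARY_ADD before 3.11 and BINARY_OP after.)
dis.dis(add)
```

The experimental JIT is the only path by which CPython currently emits real machine instructions, and even that starts from this bytecode.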


What is the name of Maltese in Maltese? Like “el español” in Spanish, it’s neat to know what languages call themselves


A term for that concept, by the way, is "endonym":

https://en.wikipedia.org/wiki/Endonym_and_exonym


Wikipedia says it's "Malti"


Il-Malti to be precise. Il- means "the" and changes its meaning to that of the language. Malti alone would mean a Maltese person.

Source: I'm also Maltese.


The "Il" in Il-Malti is like "al" in Arabic, which Maltese is closely related to as was pointed out above.

Arabic (language): al-‘arabiyyah (الْعَرَبِيَّة).


'ish' is a pretty universal English suffix. So Spanish is just "españ-ish".


I’m in the same boat, and what tipped me there is the ethical non-starter that OpenAI and Anthropic represent. They strip-mined the Web, ripped off copyrighted works in meatspace, admitting that going through the proper channels was a waste of business resources.

They believe that the entirety of human ingenuity should be theirs at no cost, and then they have the audacity to SELL their ill-gotten collation of that knowledge back to you? All the while persuading world governments that their technology is the new operating system of the 21st century.

Give me a dystopian break, honestly.


On top of which, the most popular systems are proprietary applications running on someone else's machines. After everything GNU showed us for 40 years, I'm surprised programmers are so quick to hand off so much of their process to non-free SaaSS.

