divideby0's comments | Hacker News

While this is clearly xenophobic and just plain awful, the most sensible workaround seems to be for universities to offer very small in-person classes (four students or fewer per session) specifically for their new international students. These could work essentially like office hours, with faculty leading small group discussions to help acclimate international students to their new lives abroad. I can imagine anyone moving to the US in the year 2020 (the peak of our crazy) could use some emotional support.


I'm not quite certain I understand that last part. To the best of my understanding, Kafka's design decouples consumer ingestion completely from producer throughput. All messages from the producer are first written to disk and then zero-copied from disk to the network socket as raw byte arrays. If a consumer falls behind, its own independent offset, stored on the broker, tracks where in the log it left off. This, by design, lets Kafka handle consumers with very different consumption profiles, and even lets consumers drop off entirely and rejoin later to catch up. But perhaps I'm missing something about what you're saying.
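To make that concrete, here's a rough sketch of a consumer using the kafka-python client (the broker address, topic name, and group id are made up for illustration). The committed offset belongs to the consumer group, not the producer, so a slow or restarted consumer simply resumes from wherever it left off:

    # Rough sketch with kafka-python; broker, topic, and group id are assumptions.
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        'events',                           # hypothetical topic
        bootstrap_servers='localhost:9092',
        group_id='slow-analytics',          # offsets are tracked per group on the broker
        auto_offset_reset='earliest',       # a new group starts at the oldest retained message
        enable_auto_commit=False,
    )

    for msg in consumer:
        print(msg.partition, msg.offset, msg.value)
        consumer.commit()                   # advance this group's offset; the producer never waits on this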


That works for some use cases, but for others (Elasticsearch, Zookeeper, Kafka, etc.) the service inside the container needs to bind to an interface whose IP is also addressable from the host. Even in host networking mode, eth0 inside a DFM-powered container will be bound to something like 192.168.x.y, but that 192.168.x.0 subnet is completely unreachable from the host.


I believe it's called a Sankey diagram:

http://bl.ocks.org/d3noob/5028304


us-east-1 is where they typically deploy new features/hardware first (with the exception of EFS, which went to us-west first for some reason). It's also by far the largest region, with the most tenants and the heaviest traffic, so it's approaching the limit of what's physically possible to do in a public data center.


Sign-ins to the AWS console also appear to be timing out:

https://www.evernote.com/l/ABkKLgp3RjRDe5uV4pMlyVg1uzkW41DG4...


That's completely untrue. There are many ways to do fault tolerance in AWS. It's expensive, but it's possible. Netflix even goes as far as simulating the failure of entire AWS regions in their Simian Army testing suite:

http://techblog.netflix.com/2011/07/netflix-simian-army.html

That's why Netflix stays up when us-east or us-west goes down.


It is true. Amazon offloads a decent amount of fault tolerance to the application provider, as you point out here. I will also mention that Netflix does not solely rely on Amazon for running their services. They run their own decentralized caching layer: https://openconnect.netflix.com/deliveryOptions/


Netflix is down.


Works fine for me


In addition to some of the other replies, Consul also has a writeup on their website comparing itself with Zookeeper, etcd, etc:

https://consul.io/intro/vs/zookeeper.html


So one of the challenges in a containerized environment is that services end up on random host ports. If I'm running a Postgres container with the default Docker networking mode, for example, the container's internal port 5432 may be mapped to host port 12345. This lets me spin up multiple Postgres instances on the same machine for greater service density.

However, in a distributed environment, services can also spin up on different machines: the Postgres instance my application needs could be on server1:12345 or server2:23456. So you need a cross-cutting discovery service available to all servers, so that if my app is running on server1 it can still find the right Postgres instance running on server2.
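As a rough illustration of that random binding (using the Docker Python SDK; the image name, credentials, and ports here are just placeholders):

    # Sketch with docker-py; image, password, and port numbers are illustrative.
    import docker

    client = docker.from_env()

    # ports={'5432/tcp': None} asks Docker to publish the container's 5432
    # on a randomly chosen free host port.
    container = client.containers.run(
        'postgres',
        environment={'POSTGRES_PASSWORD': 'example'},
        ports={'5432/tcp': None},
        detach=True,
    )

    container.reload()  # refresh attrs so the assigned host port is visible
    mapping = container.attrs['NetworkSettings']['Ports']['5432/tcp'][0]
    print('postgres is reachable on host port', mapping['HostPort'])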

I'm not an expert on etcd, but my understanding is that the most common use case is to run etcd on each host machine. When services start up, their supervisor registers the service's hostname, port, etc. with etcd's key-value store. This registration is then propagated to the other etcd nodes in a consistent manner, using a consensus algorithm called Raft:

http://thesecretlivesofdata.com/raft/
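For a feel of what that registration might look like, here's a toy sketch against etcd's v2 HTTP keys API (the key path, address, and TTL are made up); the replication and consensus all happen inside etcd:

    # Toy service registration via etcd's v2 keys API; paths and values are illustrative.
    import requests

    # The supervisor writes where its Postgres instance lives, with a TTL so
    # the entry expires if the supervisor stops refreshing it.
    requests.put(
        'http://127.0.0.1:2379/v2/keys/services/postgres/node-1',
        data={'value': 'server2:23456', 'ttl': 30},
    )

    # A client on any other host reads the same key to discover the instance.
    resp = requests.get('http://127.0.0.1:2379/v2/keys/services/postgres/node-1')
    print(resp.json()['node']['value'])    # -> 'server2:23456'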

Consensus actually turns out to be one of the harder problems in distributed system design. If a network partition prevents etcd instances from seeing each other, you don't want one instance reporting incorrect or stale data; otherwise my application could end up writing to the wrong service and losing data.

Etcd does consensus extremely well, and in a way that scales to support hundreds of nodes. It's one of the two distributed systems I'm aware of that have (mostly) passed Jepsen testing:

https://aphyr.com/posts/316-call-me-maybe-etcd-and-consul

There are alternatives like Consul and Zookeeper, but in Zookeeper's case you have to do a lot of the heavy lifting yourself to support service discovery, and there are some well-documented caveats:

https://tech.knewton.com/blog/2014/12/eureka-shouldnt-use-zo...

Consul also has a pretty fair writeup (IMHO) on the tradeoffs of each solution on their website:

https://consul.io/intro/vs/zookeeper.html
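For contrast with rolling your own discovery on top of Zookeeper, here's a minimal sketch of registering and looking up a service with the python-consul client (the service name, address, and port are made up):

    # Minimal sketch with python-consul; names, addresses, and ports are illustrative.
    import consul

    c = consul.Consul()  # assumes a local Consul agent on the default port

    # Register this node's Postgres with the local agent; Consul propagates it
    # to the cluster and exposes it via DNS and the HTTP API.
    c.agent.service.register(
        'postgres',
        service_id='postgres-server2',
        address='server2',
        port=23456,
    )

    # Elsewhere, ask for healthy instances of the service.
    index, nodes = c.health.service('postgres', passing=True)
    for node in nodes:
        print(node['Service']['Address'], node['Service']['Port'])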


Charlie Nutter gave a great talk covering this and a lot of other new JVM features here:

https://vimeo.com/114187541

Edit: The JNR stuff is covered around 42 minutes in.

