It's all fun and games until some RPC gets in the mix and you need to start dropping requests, retrying, throttling, load balancing, and protecting against cascading failure across multiple nodes.
for {
    select {
    case val := <-someChan:
        doSomething(val)
    }
}
for single-channel reads is very common. I don't like this pattern because closing the channel doesn't end the loop, and close is a useful way of signalling that you want the goroutine to end. I guess you could add a signal channel to that select block too, but the language already supports exiting the loop on close, by ranging over the channel.
Instead:
for val := range someChan {
    doSomething(val)
}
It might be subjective but
1. It reads better
2. It leaves the loop when you call close(someChan), which is neat
But that's just my take. I'd be curious if anyone has any reasons for the single-channel-read-in-a-select-block approach
The downside to that less idiomatic, "not really Go" example is that you lose useful stack traces, and it complicates contexts (timeouts, credentials, tracing, etc.).
For this reason I'd say the article's approach is also suboptimal and non-idiomatic. It's something you'd do in other languages, but it should be avoided in Go.
In my opinion you should just start a goroutine. The goroutine could block on a semaphore/channel to limit concurrency, but there's nothing inherently more costly about having the goroutines themselves be the queue instead of keeping a list, like the article does.
Yes, it'll probably take a bit more memory to create a goroutine than to add to a list, but almost always I'll take that to get more correct behavior.
But also, your suggestion doesn't handle the requirement "Calling the service should not block the caller", does it?
The article is overengineered, by far. You don't need infrastructure for this. It's just:
for work := range workGenerator() {
    go process(work)
}
To limit concurrency, create a semaphore and block on it, either before starting the goroutine (thereby blocking the caller):
sem := semaphore.NewWeighted(int64(runtime.NumCPU())) // golang.org/x/sync/semaphore
for work := range workGenerator() {
    work := work
    _ = sem.Acquire(context.Background(), 1)
    go func() {
        defer sem.Release(1)
        process(work)
    }()
}
Or in the goroutine, to not block the caller:
sem := semaphore.NewWeighted(int64(runtime.NumCPU()))
for work := range workGenerator() {
    work := work
    go func() {
        _ = sem.Acquire(context.Background(), 1)
        defer sem.Release(1)
        process(work)
    }()
}
You don't need the infrastructure from the article and, as I described, it's actually hurting.
Not blocking the caller is usually bad: the program can run out of memory if the producer is much faster than the consumers.
Also, in more realistic scenarios "process" can error; one would want to use the errgroup package.
I presented both solutions because "it depends". Though I agree with "usually".
Yes, if in doubt then block the caller. I mainly provided it as an example because the article had it as an explicit requirement.
You're right about errgroup. One can fit only so much in a HN comment. And I didn't want to distract by making it look like "no, you should use my favourite library instead".
But again it depends. If an error is handled by doing log.Fatal(), then there's no point in using errgroup.
I'm also not passing ctx, which much (most?) nontrivial code should pass.
I have done a more generic version of this with success.
You have N workers, each reading from a separate input channel for work to do.
Then you have an outer "channel of channels", holding those worker channels. When you want to submit work, select a worker channel from the outer channel, submit, and push the worker channel back on the outer channel when done.
It's not necessarily idiomatic or nonidiomatic, it's just a pattern that's appropriate in some circumstances and not appropriate in others.
If you want best-effort asynchronous job processing semantics, then it's totally appropriate to use something like what's described in the article -- ideally with a lot less code :)
If you want request-response, then, yeah, this isn't appropriate, and I agree that you should do whatever work in the request goroutine, blocking as necessary.
I guess it's because on one hand the context is handled by the function directly (with the usual `process(ctx, job)` pattern), and on the other hand the context needs to be part of the job, which is less idiomatic? Functions vs. types. Interesting.
The issue with spawning goroutines all the time is that spawning, while cheap individually, can become expensive in large numbers (like 1 million times a minute, as in the article). Is that not considered problematic?
Yeah, in the longer comment I said that it "complicates" contexts, because you can put the context in the work unit. But it's very easy to get it wrong.
E.g. the godoc for the context package itself says "Do not store Contexts inside a struct type". I understand the reason to be that it's easy to make the lifetime nonobvious, and make it "strange" (hard to read and reason about) if it's not very strictly kept under control. In other words it becomes a foot-gun as the code base grows.
And even then the stack is still unhelpful.
Benchmarking for your own workload beats anything else no matter what the language, of course. But it may change in the future, too. Maybe the next version of Go actually makes it faster? In other words it's not wrong, but great care should be used.
One reason it can be faster to spawn goroutines is that, IIRC, spawning a goroutine actually makes the current OS thread start running it, and if it completes quickly the thread can jump back to the spawner. In other words, spawning a goroutine can save thread creations and context switches, while a goroutine pool will likely incur an OS context switch, with cache implications and other complex interactions.
But yes, measure with your actual workload is king.
Agreed. Separating the signal from the request queue seems like an anti-pattern. Not sure who the author is, or why this has reached the front page, so we probably missed something.
The article doesn't seem to do a great job of explaining the why of things, from small details all the way up to why I would use this over simpler, more common patterns.
Doesn't this mean the caller may block if there are already the maximum number of inflight requests being run? I use the same type of pattern, but the core loop has a queue of input objects. The loop blocks waiting for input from either a caller or a go routine that completed. It enqueues the request from a caller, or lowers the inflight count from the complete signal. Then, it checks if we have capacity to run the request, and, if we do, dequeues the oldest request and submits it.
This means that the queue could grow unboundedly, but that is ok, as this type of structure is not meant for constant request handling but rather for bursts of requests that can happen asynchronously (and in parallel), and allows us to limit the maximum number that can run while also never blocking callers if there are no available go routines to process their request when submitted.
Here is a minimally viable example of how I do it:
Yes, that is my solution which uses the queue. I mean what the author of the post was doing would block, and I offered the code as how I do it in a non-blocking way. Sorry for the confusion!
Parallelism is easy; joint parallelism (on the same memory) is hard, and you can't do it efficiently without a complex memory model, so Erlang, Go and Rust are out of the picture.
Hell, even C/C++ is going to have a hard time; you need a VM with a garbage collector!
>Java or C# (if the non-blocking, concurrency and GC are good enough from M$, I wouldn't bet on it).
I have a hard time seeing how these two languages could possibly be considered best-in-class for concurrency, or their memory models. Loads of concurrent code has been written in these languages despite their concurrency and (shared!) memory models, not thanks to them. Imperative, shared-memory concurrency is the hardest way to do concurrency. In that respect, Go at least provides some syntactic sugar to encourage good practices. Either we're talking past each other or one of us is missing something.
>Coding without types is not productive if you want to do meaningful work.
That's questionable. And not just because it's hand-wavy (what does "meaningful work" even mean, here?). It's trivial to come up with counter-examples to even the most charitable interpretation of "types are required for meaningful work". If your statement were true, Rails, Django, Clojure, JavaScript and hell, even PHP wouldn't be such raging industrial successes.
>Clojure does not exist without the Java VM.
That's false. It's a hosted language, yes, but it's specifically designed not to be JVM-dependent (see, for example: ClojureScript).
If there is a mutex involved it will always be blocking in some way. The best way, I think, would be to package the client's request with a response channel: loop through incoming requests and pass them to workers (goroutines), which send the response to the individual client and close said response channel. Then it is merely a decision whether to go with N workers processing the request channel, or to give each request its own worker via a goroutine.
Please use the stdlib with Go. Try to refrain from recommending "patterns" as something like "building blocks". Go code design works better without "patterns". Thinking and refactoring are encouraged.
Say, here's an example explaining why the different components compose into good/bad design, performance, whatever. Sometimes (usually) fully idiomatic code makes sense, other times not. What's idiomatic might change too.
There might be very valid or invalid reasons companies use internal patterns. They all tend to turn sour after commoditization. If a stable library fully encapsulates a separable need, might be good to use it.