It's all fun and games until some RPC gets in the mix and you need to start dropping requests, retrying, throttling, load balancing, and protecting against cascading failure across multiple nodes.
for {
    select {
    case val := <-someChan:
        doSomething(val)
    }
}
for single-channel reads is very common. I don't like this pattern because closing the channel doesn't end the loop, and close is a useful way of signalling that you want the goroutine to end. I guess you could add a signal channel to that select block too, but the language already supports exiting the loop on close, by ranging over the channel.
Instead:
for val := range someChan {
    doSomething(val)
}
It might be subjective but
1. It reads better
2. It leaves the loop when you call close(someChan), which is neat
But that's just my take. I'd be curious if anyone has any reasons for the single-channel-read-in-a-select-block approach
The downside to that less idiomatic, "not really Go" example is that you lose useful stack traces, and it complicates contexts (timeouts, credentials, tracing, etc.).
For this reason I'd say the article's approach is also suboptimal and non-idiomatic. It's something you'd do in other languages, but it should be avoided in Go.
In my opinion you should just start a goroutine. The goroutine could block on a semaphore/channel to limit concurrency, but there's nothing inherently more costly about having the goroutines themselves be the queue instead of keeping a list, like the article does.
Yes, it'll probably take a bit more memory to create a goroutine than to add to a list, but almost always I'll take that to get more correct behavior.
But also, your suggestion doesn't handle the requirement "Calling the service should not block the caller", does it?
The article is overengineered, by far. You don't need infrastructure for this. It's just:
for work := range workGenerator() {
    go process(work)
}
To limit concurrency, create a semaphore and block on it, either before starting the goroutine (thereby blocking the caller):
sem := semaphore.NewWeighted(int64(runtime.NumCPU())) // golang.org/x/sync/semaphore
for work := range workGenerator() {
    work := work
    _ = sem.Acquire(context.Background(), 1)
    go func() {
        defer sem.Release(1)
        process(work)
    }()
}
Or in the goroutine, to not block the caller:
sem := semaphore.NewWeighted(int64(runtime.NumCPU()))
for work := range workGenerator() {
    work := work
    go func() {
        _ = sem.Acquire(context.Background(), 1)
        defer sem.Release(1)
        process(work)
    }()
}
You don't need the infrastructure from the article and, as I described, it's actually hurting.
Not blocking the caller is usually bad: the program can run out of memory if the producer is much faster than the consumers.
Also, in more realistic scenarios "process" can error; one would want to use the errgroup package.
I presented both solutions because "it depends". Though I agree with "usually".
Yes, if in doubt then block the caller. I mainly provided it as an example because the article had it as an explicit requirement.
You're right about errgroup. One can fit only so much in a HN comment. And I didn't want to distract by making it look like "no, you should use my favourite library instead".
But again it depends. If an error is handled by doing log.Fatal(), then there's no point in using errgroup.
I'm also not passing ctx, which much (most?) nontrivial code should pass.
I have done a more generic version of this with success.
You have N workers, each reading from a separate input channel for work to do.
Then you have an outer "channel of channels", holding those worker channels. When you want to submit work, select a worker channel from the outer channel, submit, and push the worker channel back on the outer channel when done.
It's not necessarily idiomatic or nonidiomatic, it's just a pattern that's appropriate in some circumstances and not appropriate in others.
If you want best-effort asynchronous job processing semantics, then it's totally appropriate to use something like what's described in the article -- ideally with a lot less code :)
If you want request-response, then, yeah, this isn't appropriate, and I agree that you should do whatever work in the request goroutine, blocking as necessary.
I guess it's because on one hand the context is handled by the function directly (with the usual `process(ctx, job)` pattern), and on the other hand the context needs to be part of the job, which is less idiomatic? Functions vs. types. Interesting.
The issue with spawning goroutines all the time is that spawning, while cheap individually, can become expensive in large numbers (like 1 million times a minute, as in the article). Is that not considered problematic?
Yeah, in the longer comment I said that it "complicates" contexts, because you can put the context in the work unit. But it's very easy to get it wrong.
E.g. the godoc for the context package itself says "Do not store Contexts inside a struct type". I understand the reason to be that it's easy to make the lifetime nonobvious, and make it "strange" (hard to read and reason about) if it's not very strictly kept under control. In other words it becomes a foot-gun as the code base grows.
And even then the stack is still unhelpful.
Benchmarking for your own workload beats anything else no matter what the language, of course. But it may change in the future, too. Maybe the next version of Go actually makes it faster? In other words it's not wrong, but great care should be used.
One reason it can be faster to spawn goroutines is that, IIRC, spawning a goroutine actually makes the current OS thread start running it, and if it completes quickly the thread can jump back to the spawner. In other words, spawning a goroutine can save thread creations and context switches, while a goroutine pool will likely incur an OS context switch, with cache implications and other complex interactions.
But yes, measure with your actual workload is king.
Agreed. Separating the signal from the request queue seems like an anti-pattern. Not sure who the author is, or why this has reached the front page, so we probably missed something.
The article doesn't seem to do a great job of explaining the why of things, from small details all the way up to why I would use this over simpler, more common patterns.
Doesn't this mean the caller may block if there are already the maximum number of inflight requests being run? I use the same type of pattern, but the core loop has a queue of input objects. The loop blocks waiting for input from either a caller or a go routine that completed. It enqueues the request from a caller, or lowers the inflight count from the complete signal. Then, it checks if we have capacity to run the request, and, if we do, dequeues the oldest request and submits it.
This means that the queue could grow unboundedly, but that is ok, as this type of structure is not meant for constant request handling but rather for bursts of requests that can happen asynchronously (and in parallel), and allows us to limit the maximum number that can run while also never blocking callers if there are no available go routines to process their request when submitted.
Here is a minimally viable example of how I do it:
Yes, that is my solution which uses the queue. I mean what the author of the post was doing would block, and I offered the code as how I do it in a non-blocking way. Sorry for the confusion!
Parallelism is easy; joint parallelism (on the same memory) is hard, and you can't do it efficiently without a complex memory model, so Erlang, Go and Rust are out of the picture.
Hell, even C/C++ is going to have a hard time; you need a VM with a garbage collector!
>Java or C# (if the non-blocking, concurrency and GC are good enough from M$, I wouldn't bet on it).
I have a hard time seeing how these two languages could possibly be considered best-in-class for concurrency, or their memory models. Loads of concurrent code has been written in these languages despite their concurrency and (shared!) memory models, not thanks to them. Imperative, shared-memory concurrency is the hardest way to do concurrency. In that respect, Go at least provides some syntactic sugar to encourage good practices. Either we're talking past each other or one of us is missing something.
>Coding without types is not productive if you want to do meaningful work.
That's questionable. And not just because it's hand-wavy (what does "meaningful work" even mean, here?). It's trivial to come up with counter-examples to even the most charitable interpretation of "types are required for meaningful work". If your statement were true, Rails, Django, Clojure, JavaScript and hell, even PHP wouldn't be such raging industrial successes.
>Clojure does not exist without the Java VM.
That's false. It's a hosted language, yes, but it's specifically designed not to be JVM-dependent (see, for example: ClojureScript).
If there is a mutex involved it will always be blocking in some way. The best way, I think, would be to package the client's request with a response channel: loop through incoming requests and pass them to workers (goroutines), which send the response to the individual client and close said response channel. Then it is merely a decision whether to go with N workers processing the request channel, or to give each request its own worker via a goroutine.
Please use the stdlib with Go. Try to refrain from recommending "patterns" as something like "building blocks". Go code design works better without "patterns". Thinking and refactoring are encouraged.
Say, here's an example explaining why the different components compose into good/bad design, performance, whatever. Sometimes (usually) fully idiomatic code makes sense, other times not. What's idiomatic might change too.
There might be very valid or invalid reasons companies use internal patterns. They all tend to turn sour after commoditization. If a stable library fully encapsulates a separable need, might be good to use it.