Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

It's true that this would perform better, and greatly reduce allocations. But:

  - Messages (especially opaque ones) are not supposed to be copied. The
    recommendation is to use `m := &mypb.Message{}`.
  - This would make migrating to use the opaque API more difficult, if the
    getters don't return the same type as the old open API fields, much more
    code needs to be rewritten, or some wrapper that allocates a new slice on
    every get.
  - Users expect that `subm := m.GetSubMessages()[2] ;
    m.SetSubMessages(append(m.GetSubMessages(), anotherm))) ; subm.SetInt(42) ;
    assert(subm.GetInt() == m.GetSubMessages()[2].GetInt())`. This would not be
    the case if the API returned a slice of values.
  - ...
Effectively, a slice of pointers is baked into the API, and the way people use protocol buffers in Go. For these reasons, it's not clear to me this would end up performing better or causing less work.

If we had returned an iterator (new in Go 1.23) instead of an actual slice, then it would've been possible to vary the underlying representation (slice-of-pointers, slice-of-value-chunks, ...). But there are other downsides to that too:

  - Allocations when passing iterators to functions that expect a slice.
  - Extra API surface for modifying the list (append, getn, len, ...).
Not that clear of a win either.

Another thing that could be considered is: when decoding, allocate a slice of values ([]mypb.Message), *and* a slice with pointers (or do it lazily): []*mypb.Message. Then initialize:

  for i := range valuel {
    ptrl[i] = &valuel[i] // TODO: verify that this escape doesn't cause disjoint allocations.
  }
That might be beneficial due to grouping allocations, and the user would be none the wiser.


> - Messages (especially opaque ones) are not supposed to be copied.

So?

> - This would make migrating to use the opaque API more difficult

The opaque API is stupid to begin with. Now the objects are no longer threadsafe. You can't just read a message in one thread and process it in two different threads.

> Users expect

Then don't expect this. If you're breaking the API, then at least break it in a way that makes it better afterwards.


> Now the objects are no longer threadsafe. You can't just read a message in one thread and process it in two different threads.

This is not correct. The Opaque API provides the same guarantees as before, meaning you can read a message in one goroutine and then access it (but not modify it) from other goroutines concurrently.


> > - Messages (especially opaque ones) are not supposed to be copied.

> So?

If you have a []mypb.Message, and range over it in the normal way:

  if _, m := range msgs {
    // Use.
  }
That makes a copy of the struct. This is not supported in general for the opaque API, even though it appears to work for standard use cases. The representation is meant to be opaque.


You start with the wrong premise that the messages shouldn't be copied in the first place.

Why?


Copying makes for surprising semantics, and prevents some representation changes.

An example w.r.t. the surprising semantics:

  var ms []mypb.Message = ppb.GetMessages() // A repeated submessage field
  for i, m := range ms {
    m.SetMyInt(i)
  }
  assert(ppb.GetMessages()[1].GetMyInt(1) == 1) // This would fail in general, due to SetMyInt acting on a copy.
This would not work as expected, as I highlighted in the comment. Basically, acting on value types means being very careful about identity. It makes it easy to make mistakes. I like data-driven code, but working around this (sometimes you'd want a copy, sometimes you wouldn't) would be a painful excercise.

You may have noticed that changing a heavily pointerized tree of types into value types often compiles with just a few changes, because Go automatically dereferences when needed. But it often won't work from a semantic point of view because the most intuitive way to modify such types uses copies (the range loop is a good example).

Now imagine changing the representation such that it carries a mutex, or another nocopy type. That would lead to issues unless those nocopy types would be encapsulated in a pointer. But then you get issues with initialization:

  var m mypb.Message // Value type, but what about the *sync.Mutex contained deep within?
Also consider laziness

  func process(lm mypb.LazyMessage) {
    if lm.GetSubMessage().GetInt() != 42 {
      panic("user is not enlightened")
    }
  }

  var lm mypb.LazyMessage
  process(lm) // Copy a fairly large struct.
  ln.GetSubMessage().GetInt() // Does lazy unmarshaling of sub_message redundantly.
  
If you want to make the argument that individual messages should be pointers, but slices should still be value slices. Then I have the following for you:

  ms := m.GetSubMessages() // []mypb.Message
  el := &ms[0]
  anotherEl := new(mypb.Message)
  ms.SetSubMessages(append(ms.GetSubMessages(), anotherEl)) // Can cause reallocation, now el no longer references to m.GetSubMessages()[0]. But it no reallocation happened, it does.
In practice, value typing leads to a bunch of issues.

Since you seem so sure of your position, I'm actually curious. How would you design the API, how would you use it? Do you have any examples I can look at of this style being used in practice?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: