Rust has a well-known borrow checker and a whole programming model that ensures there will be no data races. In particular, it is only possible to have one mutable reference or many read-only references but not both types at the same time. Technically, you might think that because Go is pass-by-value i.e. the arguments that you’re giving to a function are copied into some memory location before calling the function, it is impossible to have races in Go too. However, some types like slices and maps are implemented as references so you have to take a lot of care when returning them from a function because the caller might modify them.
This is especially important when using RPC frameworks like gRPC.
I use quite a bit of grpc-go
at my work and for Thanos as well, and I recently ran into this “fun” quirk. Take a look at the following service and the corresponding server’s code:
syntax = "proto3";
package helloworld;
service Greeter {
rpc SayHello (HelloRequest) returns (HelloReply) {}
}
message HelloRequest {
repeated string names = 1;
}
// The response message containing the greetings for each given name.
message HelloReply {
map<string, string> replies = 1;
}
And the example code:
type server struct {
pb.UnimplementedGreeterServer
replies map[string]string
repliesLock sync.Mutex
}
// SayHello implements helloworld.GreeterServer
func (s *server) SayHello(_ context.Context, in *pb.HelloRequest) (*pb.HelloReply, error) {
log.Printf("Received: %v", in.GetName())
s.repliesLock.Lock()
defer s.repliesLock.Unlock()
return &pb.HelloReply{
Replies: s.replies,
}, nil
}
Imagine that the map is being updated concurrently by another goroutine with code like this:
for i := 0; i < 100; i++ {
go func() {
for {
srv.repliesLock.Lock()
for k := range srv.replies {
srv.replies[k] = genRandomString(10)
}
srv.repliesLock.Unlock()
}
}()
}
It’s all protected with a mutex so there shouldn’t be any problems, right? … right? No. grpc-go
actually “takes ownership” (that’s technically not even possible in Go) of the returned value by sending the returned value (message type) to the client at some unspecified future time. So, the programmer’s function that is called by gRPC must always allocate memory. Otherwise, you get a fun race condition like this:
WARNING: DATA RACE
Write at 0x00c0001d9ec0 by goroutine 78:
runtime.mapassign_faststr()
/usr/local/go/src/runtime/map_faststr.go:223 +0x0
main.main.func1()
/home/giedriusstatkevicius/dev/grpc-go/examples/helloworld/greeter_server/main.go:89 +0x146
Previous read at 0x00c0001d9ec0 by goroutine 128:
runtime.mapiternext()
/usr/local/go/src/runtime/map.go:937 +0x0
reflect.mapiternext()
/usr/local/go/src/runtime/map.go:1537 +0x12
google.golang.org/protobuf/internal/impl.appendMap()
/home/giedriusstatkevicius/go/pkg/mod/google.golang.org/[email protected]/internal/impl/codec_map.go:274 +0x2eb
google.golang.org/protobuf/internal/impl.encoderFuncsForMap.func2()
/home/giedriusstatkevicius/go/pkg/mod/google.golang.org/[email protected]/internal/impl/codec_map.go:56 +0xc4
google.golang.org/protobuf/internal/impl.(*MessageInfo).marshalAppendPointer()
/home/giedriusstatkevicius/go/pkg/mod/google.golang.org/[email protected]/internal/impl/encode.go:139 +0x4ac
google.golang.org/protobuf/internal/impl.(*MessageInfo).marshal()
/home/giedriusstatkevicius/go/pkg/mod/google.golang.org/[email protected]/internal/impl/encode.go:107 +0xd0
google.golang.org/protobuf/internal/impl.(*MessageInfo).marshal-fm()
<autogenerated>:1 +0xc4
google.golang.org/protobuf/proto.MarshalOptions.marshal()
It’s such a hidden foot gun that even the official grpc-go
example almost contains a bug. Here is what the code looks like:
// ListFeatures lists all features contained within the given bounding Rectangle.
func (s *routeGuideServer) ListFeatures(rect *pb.Rectangle, stream pb.RouteGuide_ListFeaturesServer) error {
for _, feature := range s.savedFeatures {
if inRange(feature.Location, rect) {
if err := stream.Send(feature); err != nil {
return err
}
}
}
return nil
}
Since feature
ends up on the wire at some unspecified time in the future and because feature
is a variable that gets reused between iterations (until Go 1.22), means there is a race. The only saving grace is that s.savedFeatures
is read-only i.e. it is set once and never modified.
It’s even more fun if you are using data where []byte
is mmaped
. At least my understanding is that Prometheus is free to munmap
once Select() is over so Thanos needs to make a copy of each []byte
. Here it is happening in Receive: https://github.com/thanos-io/thanos/blob/03c96d05a02425421e0d0b80814c2cd9c765b371/pkg/store/tsdb.go#L334-L335 (https://github.com/thanos-io/thanos/pull/6203).
So, overall, even if a programming language passes everything by value, if the value is a reference, unexpected races can occur. I hope grpc-go will improve the interface in the future. The recent pooling changes is a good step.