Jsonnet is not so bad

Okay, working with jsonnet is not such a bad experience nowadays – there is a pretty good LSP server implementation thanks to the team at Grafana, there’s jsonnetfmt that keeps the jsonnet code style consistent across projects, and we have LLMs that make it much easier to learn any new language.

Also, since it is an interpreted language that simply yields JSON, it is very easy to prototype and test changes. This was especially apparent during the recent hackathon where our team used jsonnet to implement a “dynamic dashboard generator” in Grafana, and we won the prize. Since all Grafana dashboards are JSON, it is very easy to generate them with a language specifically geared towards producing it.

The problem with JSON, though, is that by itself it has no schema – that gap is covered by the JSON Schema project. And even then, it is not ideal because it doesn’t fully express all constraints that you might need, e.g. if some value is X, then another value can only be between 0 and 42. Plus, jsonnet has no relation to JSON Schema except that both deal with JSON. Grafana also famously doesn’t have any (strict) JSON schema for its dashboards. There has been some movement towards it, but no formal specification exists as far as I know.

Mirages of data ownership in Go

Rust has a well-known borrow checker and a whole programming model that ensures there will be no data races. In particular, it is only possible to have either one mutable reference or many read-only references, but not both at the same time. You might think that because Go is pass-by-value, i.e. the arguments that you’re giving to a function are copied into some memory location before calling the function, it is impossible to have such races in Go too. However, some types, like slices and maps, are references under the hood, so you have to take a lot of care when returning them from a function because the caller can modify the data they point to.
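
To make that concrete, here is a minimal sketch (the type and the names are made up for illustration) of how a map returned by value still aliases the callee’s data:

package main

import "fmt"

type config struct {
	settings map[string]string
}

// Settings returns the internal map header by value, but both the copy and
// the original point at the same underlying hash table.
func (c *config) Settings() map[string]string {
	return c.settings
}

func main() {
	c := &config{settings: map[string]string{"mode": "safe"}}

	s := c.Settings() // s is a "copy", yet it shares storage with c.settings
	s["mode"] = "unsafe"

	fmt.Println(c.settings["mode"]) // prints "unsafe"
}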

This is especially important when using RPC frameworks like gRPC.

I use grpc-go quite a bit at work and in Thanos as well, and I recently ran into this “fun” quirk. Take a look at the following service definition and the corresponding server code:

syntax = "proto3";

package helloworld;

service Greeter {
  rpc SayHello (HelloRequest) returns (HelloReply) {}
}

message HelloRequest {
  repeated string names = 1;
}

// The response message containing the greetings for each given name.
message HelloReply {
  map<string, string> replies = 1;
}

And the example code:

type server struct {
	pb.UnimplementedGreeterServer

	replies     map[string]string
	repliesLock sync.Mutex
}

// SayHello implements helloworld.GreeterServer
func (s *server) SayHello(_ context.Context, in *pb.HelloRequest) (*pb.HelloReply, error) {
	log.Printf("Received: %v", in.GetNames())
	s.repliesLock.Lock()
	defer s.repliesLock.Unlock()

	return &pb.HelloReply{
		Replies: s.replies,
	}, nil
}

Imagine that the map is being updated concurrently by other goroutines with code like this:

	for i := 0; i < 100; i++ {
		go func() {
			for {
				srv.repliesLock.Lock()
				for k := range srv.replies {
					srv.replies[k] = genRandomString(10)
				}
				srv.repliesLock.Unlock()
			}
		}()
	}

It’s all protected with a mutex, so there shouldn’t be any problems, right? … right? No. grpc-go effectively “takes ownership” of the returned message (even though ownership isn’t something Go can actually express) and serializes it to the client at some unspecified future time – after your handler has returned and released the lock. So the function that gRPC calls must always return freshly allocated data that nothing else will touch afterwards. Otherwise, you get a fun race condition like this:

WARNING: DATA RACE
Write at 0x00c0001d9ec0 by goroutine 78:
  runtime.mapassign_faststr()
      /usr/local/go/src/runtime/map_faststr.go:223 +0x0
  main.main.func1()
      /home/giedriusstatkevicius/dev/grpc-go/examples/helloworld/greeter_server/main.go:89 +0x146

Previous read at 0x00c0001d9ec0 by goroutine 128:
  runtime.mapiternext()
      /usr/local/go/src/runtime/map.go:937 +0x0
  reflect.mapiternext()
      /usr/local/go/src/runtime/map.go:1537 +0x12
  google.golang.org/protobuf/internal/impl.appendMap()
      /home/giedriusstatkevicius/go/pkg/mod/google.golang.org/protobuf@…/internal/impl/codec_map.go:274 +0x2eb
  google.golang.org/protobuf/internal/impl.encoderFuncsForMap.func2()
      /home/giedriusstatkevicius/go/pkg/mod/google.golang.org/protobuf@…/internal/impl/codec_map.go:56 +0xc4
  google.golang.org/protobuf/internal/impl.(*MessageInfo).marshalAppendPointer()
      /home/giedriusstatkevicius/go/pkg/mod/google.golang.org/protobuf@…/internal/impl/encode.go:139 +0x4ac
  google.golang.org/protobuf/internal/impl.(*MessageInfo).marshal()
      /home/giedriusstatkevicius/go/pkg/mod/google.golang.org/protobuf@…/internal/impl/encode.go:107 +0xd0
  google.golang.org/protobuf/internal/impl.(*MessageInfo).marshal-fm()
      <autogenerated>:1 +0xc4
  google.golang.org/protobuf/proto.MarshalOptions.marshal()
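
One way to avoid the race in the SayHello handler above – a sketch of the idea, not an official grpc-go recommendation – is to copy the map while still holding the lock and hand the copy to grpc-go:

// SayHello implements helloworld.GreeterServer.
func (s *server) SayHello(_ context.Context, in *pb.HelloRequest) (*pb.HelloReply, error) {
	log.Printf("Received: %v", in.GetNames())
	s.repliesLock.Lock()
	defer s.repliesLock.Unlock()

	// Copy under the lock so that grpc-go later marshals a private snapshot
	// that no other goroutine can modify.
	replies := make(map[string]string, len(s.replies))
	for k, v := range s.replies {
		replies[k] = v
	}

	return &pb.HelloReply{
		Replies: replies,
	}, nil
}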

It’s such a hidden foot gun that even the official grpc-go example almost contains a bug. Here is what the code looks like:

// ListFeatures lists all features contained within the given bounding Rectangle.
func (s *routeGuideServer) ListFeatures(rect *pb.Rectangle, stream pb.RouteGuide_ListFeaturesServer) error {
	for _, feature := range s.savedFeatures {
		if inRange(feature.Location, rect) {
			if err := stream.Send(feature); err != nil {
				return err
			}
		}
	}
	return nil
}

Since feature ends up on the wire at some unspecified time in the future, and because feature is a loop variable that gets reused between iterations (until Go 1.22), there is a race. The only saving grace is that s.savedFeatures is read-only, i.e. it is set once and never modified.
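
If s.savedFeatures were ever mutated, one defensive option (my sketch, not something the example actually does) would be to hand grpc-go its own deep copy of each message using proto.Clone from google.golang.org/protobuf/proto:

	for _, feature := range s.savedFeatures {
		if inRange(feature.Location, rect) {
			// proto.Clone deep-copies the message, so the later marshalling
			// cannot race with anything mutating the original feature.
			if err := stream.Send(proto.Clone(feature).(*pb.Feature)); err != nil {
				return err
			}
		}
	}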

It’s even more fun if the data you are returning contains mmaped []byte slices. At least my understanding is that Prometheus is free to munmap once Select() is over, so Thanos needs to make a copy of each []byte before returning it. Here it is happening in Receive: https://github.com/thanos-io/thanos/blob/03c96d05a02425421e0d0b80814c2cd9c765b371/pkg/store/tsdb.go#L334-L335 (https://github.com/thanos-io/thanos/pull/6203).
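
The copy itself is trivial to write; a rough sketch (chunkData is a made-up name here, not the actual Thanos code) looks like this:

	// chunkData may point into an mmaped region that Prometheus can unmap
	// once Select() is over, so take a private copy before handing it to gRPC.
	safe := make([]byte, len(chunkData))
	copy(safe, chunkData)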

So, overall, even if a programming language passes everything by value, unexpected races can still occur when that value is a reference. I hope grpc-go will improve the interface in the future. The recent pooling changes are a good step.