My gRPC Annoyances

We have been using gRPC in Thanos since the project’s inception. It has served us well: it has a ton of useful functionality, it solves a lot of problems, and it is easy to use. However, I feel that some things are lacking, especially around performance, and they will most likely never be fixed (or, I should say, changed). The framework just does not cover 100% of what the Thanos project needs right now. Let’s go through the list of my pet peeves.

Sophisticated compression

A huge part of network traffic between servers in Thanos is used for sending sets of labels, essentially a map<string, string>. Typically, consecutive maps differ in only two values.

Also, we really need streaming to avoid having to buffer the whole response in memory beforehand. Strictly speaking, streaming might not be required because we buffer everything on the querier side either way right now, but there needs to be a way of building the message incrementally.

However, because gRPC is message-based, it is impossible to have a compression scheme that spans multiple messages. What I would like to have is a string table that is shared over the whole stream. The limitation might also be in part because gRPC uses HTTP as its transport, where it is impossible to negotiate compression parameters “out of band”. While researching this topic, I stumbled upon https://github.com/WICG/compression-dictionary-transport. Perhaps this will get through at some point and gRPC will be able to leverage this work.
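To make the string-table idea a bit more concrete, here is a minimal sketch in Go. Nothing in it is a gRPC or Thanos API: symbolTable, frame, encode, and decode are hypothetical names, and labels are modelled as ordered name/value pairs rather than a map so that the output is deterministic. Each side of the stream keeps its own table, and a message only carries the strings the peer has not seen yet.

```go
package main

import "fmt"

// symbolTable is a hypothetical per-stream string table: each distinct
// string gets a small integer reference the first time it is seen.
type symbolTable struct {
	ids     map[string]uint32
	strings []string
}

func newSymbolTable() *symbolTable {
	return &symbolTable{ids: make(map[string]uint32)}
}

// intern returns the reference for s and reports whether s is new, i.e.
// whether the string itself still has to be sent to the peer.
func (t *symbolTable) intern(s string) (uint32, bool) {
	if id, ok := t.ids[s]; ok {
		return id, false
	}
	id := uint32(len(t.strings))
	t.ids[s] = id
	t.strings = append(t.strings, s)
	return id, true
}

// frame is a hypothetical stream message: the strings introduced by this
// message plus the label set expressed as (name ref, value ref) pairs.
type frame struct {
	newStrings []string
	labelRefs  [][2]uint32
}

// encode turns one label set into a frame, shipping only unseen strings.
func (t *symbolTable) encode(labels [][2]string) frame {
	var f frame
	for _, l := range labels {
		var pair [2]uint32
		for i, s := range l {
			id, isNew := t.intern(s)
			if isNew {
				f.newStrings = append(f.newStrings, s)
			}
			pair[i] = id
		}
		f.labelRefs = append(f.labelRefs, pair)
	}
	return f
}

// decode is the receiving side: it extends its own copy of the table with
// the new strings and resolves the references back into labels.
func (t *symbolTable) decode(f frame) [][2]string {
	for _, s := range f.newStrings {
		t.intern(s)
	}
	out := make([][2]string, 0, len(f.labelRefs))
	for _, p := range f.labelRefs {
		out = append(out, [2]string{t.strings[p[0]], t.strings[p[1]]})
	}
	return out
}

func main() {
	sender, receiver := newSymbolTable(), newSymbolTable()
	first := sender.encode([][2]string{{"job", "node"}, {"instance", "a:9100"}})
	second := sender.encode([][2]string{{"job", "node"}, {"instance", "b:9100"}})
	// The second frame only has to carry the one string that changed.
	fmt.Println(len(first.newStrings), len(second.newStrings))
	fmt.Println(receiver.decode(first))
	fmt.Println(receiver.decode(second))
}
```

The catch is that both tables must see every frame in order, which is exactly the kind of cross-message state that a per-message compressor cannot keep.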

Compression helps a little bit here but not a lot. The gRPC codec interface takes a []byte as input, meaning that the input needs to be a contiguous array of bytes in memory. Generally speaking, compression replaces repeated sequences of bytes with references to an earlier occurrence, so it would be possible to avoid allocating memory for those repeated sequences if the protobuf unmarshaling interface accepted an io.Reader instead of a []byte. https://github.com/grpc/grpc-go/issues/499 misses a huge point: the unmarshaling, I believe, could still accept an io.Reader instead of a []byte. The point about Marshal() is valid, though.

Fortunately, there is some recent movement to fix this: https://github.com/grpc/grpc-go/issues/6619.

Mirages of ownership

gRPC-Go presents an interesting conundrum to its users: when the user-written code returns from a function that serves a remote procedure call, gRPC-Go takes the returned value and marshals it at some point in the future. This is done for performance reasons, but it also means that the user gives away ownership of the data and must never modify it after returning. This can become a foot gun when slices or maps are involved, because values of those types refer to shared underlying data, making it easy to mutate them accidentally.

Here’s how the “hello world” example looks in grpc-go:

// SayHello implements helloworld.GreeterServer
func (s *server) SayHello(ctx context.Context, in *pb.HelloRequest) (*pb.HelloReply, error) {
	log.Printf("Received: %v", in.GetName())
	return &pb.HelloReply{Message: "Hello " + in.GetName()}, nil
}

In this case, grpc-go takes ownership of &pb.HelloReply{Message: "Hello " + in.GetName()}. This obviously presents no problems but what if there were some slice or map?

type server struct {
  names []string
  ...
}

// Constantly running in the background.
func (s *server) updateNames() {
  for {
    for i := 0; i < len(s.names); i++ {
      s.names[i] = generateRandomName()
    }
    time.Sleep(1*time.Second)
  }
}

// SayHello implements helloworld.GreeterServer
func (s *server) SayHello(ctx context.Context, srv pb.HelloServer) error {
  return srv.SendMsg(&pb.HelloReply{Names: s.names})
}

Now this is a problem because grpc-go might be marshaling s.names while it is being updated in the background, which is a data race. You can find some more context here.
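One way to sidestep the race is to hand gRPC a private copy of the data. The sketch below shows the idea in plain Go under a few assumptions: reply stands in for the generated pb.HelloReply, the send callback stands in for srv.SendMsg, and setName and snapshot are hypothetical helpers. It is not how Thanos does it, just the simplest form of the pattern.

```go
package main

import (
	"fmt"
	"sync"
)

type server struct {
	mu    sync.RWMutex
	names []string
}

// setName mutates the shared slice under the write lock.
func (s *server) setName(i int, name string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.names[i] = name
}

// snapshot returns a copy of names with its own backing array, so later
// calls to setName cannot change what the caller received.
func (s *server) snapshot() []string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return append([]string(nil), s.names...)
}

// reply stands in for the generated pb.HelloReply message.
type reply struct {
	Names []string
}

// sayHello builds the reply from a snapshot. send stands in for
// srv.SendMsg, which may marshal the message after this function returns.
func (s *server) sayHello(send func(*reply) error) error {
	return send(&reply{Names: s.snapshot()})
}

func main() {
	s := &server{names: []string{"alice", "bob"}}
	var sent *reply
	if err := s.sayHello(func(r *reply) error { sent = r; return nil }); err != nil {
		panic(err)
	}
	s.setName(0, "mallory") // happens after the handler has returned
	fmt.Println(sent.Names[0] == "alice")
}
```

The copy costs an allocation per reply, which is precisely the overhead that pooling or sharing buffers tries to avoid, so this trades performance for safety.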

Of course, this is a contrived example and you might wonder how this could even happen in practice. I had a botched attempt at adding pooling to the marshaling side in Thanos: https://github.com/thanos-io/thanos/pull/4609/files. Fortunately, the SendMsg documentation now includes a helpful note:

	// It is not safe to modify the message after calling SendMsg. Tracing
	// libraries and stats handlers may use the message lazily.

Hopefully, all of these problems will be fixed at some point. I am confident that the Codec interface will soon change for the better, but the compression work will take longer. Fixing both would allow us to really reduce the memory and CPU usage of Thanos.

Perhaps we will also look at other RPC frameworks in the future. dRPC is a recent project that piqued my interest. I have only dabbled with it for a few hours so I don’t have any opinion on it so far. That’s something for future posts!

How I Started Programming

It all started in, I believe, the eventful year of 2001. I was 6 years old. Our family was lucky enough to get its first personal computer! It was entirely ours and was used by every member of the family back then. I consider that to be incredible luck given that Lithuania had only recently regained its independence after the Soviet occupation. I think we got an internet connection at home around 2003, so for a few years we were offline. My mom was still studying back then and she used the computer a lot for her studies. But my brother and I mostly used it for entertainment: music, games, videos. Someone had left some pre-installed games like Dave Mirra Freestyle BMX and Worms Blast on the computer, and we played the heck out of them.

My brother at the same time was also learning to program at school. The pupils were programming using Comenius Logo. If you have never heard of it, the best way to describe it is that you can control a turtle using commands, loops, functions, and so on. You can make it draw fractals, create games, and much more! It’s deeply entertaining. I would compare it to something like Scratch.

Here’s what Komenskio Logo, the Lithuanian version of Comenius Logo, looks like on my Linux machine with WINE:

I saw my brother programming some kind of assignment back then and it caught my attention. I remember that I was instantly hooked. You can think of some random commands and then the turtle goes ahead and executes them? Wow, that’s so cool!

Fast-forward a few years and we were on the internet. IRC was very popular back then, and so was finding other people to play games with on its channels. The channels for finding people to play Counter-Strike 1.6 with were very populous. I played Counter-Strike 1.6 for many years. I remember trying to automate some things in mIRCScript because I was curious about how other people made those bots that responded to what people typed.

And all of this led me to find another thing – Linux. I wanted to try to run my own CS server. This combined with my interest in how mods for CS are made fanned the programming fire inside of me.

This was a disc of Ubuntu that I burned. Ubuntu 8.04 was the first GNU/Linux distribution I ever tried. I had an ATi card back then and it was a horrible experience. fglrx worked somewhat but there was no video acceleration, so watching videos was painful. I was constantly switching back and forth between Windows and GNU/Linux: Windows for gaming and videos, Linux for everything else.

As far as I remember, the rest is history. I picked up C++ for competitive programming and started dabbling in other computer stuff like fiddling with routers and so on.

Remember that there is no ideal path; everyone is different. I wanted to share mine in case it inspires someone to also pick up programming. I believe the key part is that you should find something that interests you. If something does interest you, keep going at it and you will succeed.