Things Learned From Trying to Migrate To Protobuf V2 API from gogoprotobuf (So Far)

During the most recent iteration of the LFX mentorship program, I had the honor of working on an attempt to migrate the Thanos project from gogoprotobuf to version 2 of the protobuf API, together with my one and only awesome mentee Rahul Sawra and another mentor, Lucas Serven, who is also a co-maintainer of Thanos. I wanted to share my technical learnings from this project.

LFX mentorship program’s logo

First of all, let’s quickly look at what protocol buffers are and what the different jargon terms mean. Protocol buffers are a way of serializing structured data, originally developed by Google. It is a popular library, and Thanos uses it practically everywhere. Thanos also uses gRPC for communication between its components. gRPC is a remote procedure call framework: with it, your (micro)services can expose methods that other services can call.
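To give a feel for what “serializing” means here, the following minimal Go sketch hand-encodes a single protobuf field using only the standard library. It relies on the fact that protobuf varints use the same base-128 encoding as Go’s `encoding/binary` uvarints. The function name `encodeVarintField` is made up for illustration; real code would use generated marshaling code instead.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeVarintField hand-encodes one protobuf field of wire type 0 (varint).
// On the wire, a field is a key followed by its value, where the key is
// (fieldNumber << 3) | wireType, itself encoded as a varint.
func encodeVarintField(fieldNumber, value uint64) []byte {
	const wireTypeVarint = 0
	tmp := make([]byte, binary.MaxVarintLen64)
	out := make([]byte, 0, 2*binary.MaxVarintLen64)

	n := binary.PutUvarint(tmp, fieldNumber<<3|wireTypeVarint) // key
	out = append(out, tmp[:n]...)
	n = binary.PutUvarint(tmp, value) // value
	out = append(out, tmp[:n]...)
	return out
}

func main() {
	// Field 1 set to 150, the classic example from the protobuf encoding docs.
	fmt.Printf("% x\n", encodeVarintField(1, 150)) // 08 96 01
}
```

A schema-aware decoder maps field number 1 back to a named struct member; without the `.proto` file the bytes are just numbered fields, which is why both sides must share the schema.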

Since both were made by Google originally, it is not surprising that gRPC is most commonly used with protocol buffers even though there is no critical dependency between them.

gogoprotobuf is a fork of the original protocol buffers compiler for Go that has (had?) some improvements over the original. However, it is not without downsides. Over time we’ve accumulated random hacks to make the generated code compile and work; for example, we edit import statements with sed. This looks like an opportunity for improving code generation tools – perhaps more checks are needed? What’s more, it turns the whole code generation into a “house of cards”: remove one small part and the whole thing crumbles. On the other hand, it is not surprising that an unmaintained tool has a bug here and there.

Thanos started using gogoprotobuf at some point in the past, but after a while gogoprotobuf became unmaintained. Later, the fine Vitess folks came up with their own protocol buffers compiler for the V2 API, which has some nice optimizations that bring it up to par with gogoprotobuf’s performance. In addition, it supports pooling memory on the unmarshaling side, i.e. the receiver. Unfortunately, the sender’s side still cannot use pooling: gRPC’s SendMsg() method can return before the message gets on the wire, and it is not safe to modify (and therefore reuse) a message after passing it to SendMsg(). I feel like this is a serious performance issue, and I’m surprised that the gRPC developers still haven’t fixed it. This is the first learning that I wanted to share.
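The receiver-side pattern looks roughly like the following stdlib-only sketch: decode into a message taken from a `sync.Pool`, and hand it back once the handler is completely done with it. The `Series` type, its `Reset` method and the `decodeInto` helper are hypothetical stand-ins for generated code; vtprotobuf generates its own pooling helpers, whose exact API is not reproduced here. The same trick is unsafe on the sending side for the reason given above: the message may still be in use after SendMsg() returns.

```go
package main

import (
	"fmt"
	"sync"
)

// Series is a hypothetical stand-in for a generated protobuf message.
type Series struct {
	Labels  []string
	Samples []float64
}

// Reset empties the message but keeps the slices' capacity,
// so the next decode can reuse the memory that is already allocated.
func (s *Series) Reset() {
	s.Labels = s.Labels[:0]
	s.Samples = s.Samples[:0]
}

var seriesPool = sync.Pool{
	New: func() interface{} { return new(Series) },
}

// decodeInto is a hypothetical decoder; a real one would parse wire bytes.
func decodeInto(s *Series, labels []string, samples []float64) {
	s.Labels = append(s.Labels, labels...)
	s.Samples = append(s.Samples, samples...)
}

// handle receives one message, processes it, and returns it to the pool.
// It must not keep any reference to s (or its slices) after returning.
func handle(labels []string, samples []float64) float64 {
	s := seriesPool.Get().(*Series)
	defer func() {
		s.Reset()
		seriesPool.Put(s)
	}()

	decodeInto(s, labels, samples)
	sum := 0.0
	for _, v := range s.Samples {
		sum += v
	}
	return sum
}

func main() {
	fmt.Println(handle([]string{`job="thanos"`}, []float64{1, 2, 3})) // 6
}
```

The whole benefit depends on the `Reset` before `Put`: without it, a reused message would still contain the previous request’s data.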

Another thing is about copying generated code. Sometimes the generated code is not perfect, so the easiest and most straightforward fix is to copy it, change the parts that you don’t like, and commit the result to Git. However, that is far from ideal. We made this mistake in the Thanos project: we copied a generated struct and its methods to another file and added our optimization. We call it the ZLabelSet. Here is its code. As you can see, it is an optimization to avoid an allocation. However, as a result, the struct members of the generated code became tightly coupled with the rest of this custom code; they effectively became an interface. Now it is much more painful to change their types, and the migration forces us to do exactly that, because the v2 API does not support non-nullable struct members (repeated message fields are always generated as slices of pointers).
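The kind of allocation-avoiding trick that creates this coupling can be sketched as follows. This is a simplified illustration, not the actual ZLabelSet code: `GeneratedLabel` and `Label` are hypothetical types, and the conversion reinterprets one slice as the other with `unsafe`, so no new slice is allocated and no elements are copied. It is only correct while both structs have exactly the same field types and order.

```go
package main

import (
	"fmt"
	"unsafe"
)

// GeneratedLabel is a hypothetical stand-in for a protoc-generated struct
// stored by value (the gogoproto "non-nullable" style).
type GeneratedLabel struct {
	Name  string
	Value string
}

// Label is a hypothetical application-side type with the same memory layout.
type Label struct {
	Name  string
	Value string
}

// toLabels reinterprets the slice header: no allocation, no copying.
// The result shares its backing array with the input.
func toLabels(in []GeneratedLabel) []Label {
	return *(*[]Label)(unsafe.Pointer(&in))
}

func main() {
	gen := []GeneratedLabel{{Name: "job", Value: "thanos"}}
	lbls := toLabels(gen)
	lbls[0].Value = "prometheus"
	fmt.Println(gen[0].Value) // prometheus: both slices share memory
}
```

The catch is that the compiler cannot check the layout invariant. If the generated field becomes `[]*GeneratedLabel`, as it does with the v2 API, the same cast still compiles but silently reinterprets pointers as structs, which is why changing those generated types became so painful.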

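On the other hand, interfaces in Go have a performance cost of their own (dynamic dispatch, and values stored in interfaces often escape to the heap), so hiding everything behind interfaces is not free either. Don’t try to optimize too heavily up front: profile, and always pick your battles.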

This is the second lesson: try not to copy generated code; instead, write your own protocol buffers compiler plugin or something similar. It is actually quite easy to do.

Last but not least, I also wanted to talk about goals and focus. Ever since we divided the whole project into as many small parts as possible, the main focus has been on getting the existing test suite to pass. However, that is not always the best idea. We ran into a problem: gogoprotobuf has an extension that lets struct fields use a type that is more natural for Go programmers, time.Time, but the same extension doesn’t exist in vanilla protocol buffers for Go, which has its own separate type, google.protobuf.Timestamp. Because timestamps are used all over the Thanos codebase, we ended up accidentally defining a bunch of conversion functions between those two types, and they weren’t identical. So, we had to take a step back and look at the invariants. To be more exact, a time.Time is an instant that also carries a location (time zone) and possibly a monotonic clock reading, whereas google.protobuf.Timestamp only stores the seconds and nanoseconds elapsed since the Unix epoch, in UTC. Only after unifying the conversion functions did everything work correctly. Keep in mind that those “small” parts of this project are thousands of lines added or removed, so it’s really easy to get lost. For example, this is one pull request that got merged:

Screenshot of https://github.com/rahulii/thanos/pull/1 showing the diffstat

In conclusion, the third, more general learning is that sometimes it is better to take a step back and look at how everything should work together instead of fixating on one small part.

Perhaps in the future, generics (introduced in Go 1.18) will replace some parts of code generation. That should make life easier. I also hope that we will pick up this work again soon and that I will be able to announce that we have finally switched to the upstream version of protocol buffers for Go. There seems to be an appetite for that in our community, so the future looks bright. We’ve already removed the gogoproto extensions from our .proto files, and we are in the middle of removing the gogoproto compiler – https://github.com/rahulii/thanos/pull/2. We just need someone to finish all of this up, and to start using the pooling functionality in Thanos Query. Will it be you who helps us finish this work? 😍🍻
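As a small illustration of how generics could replace some per-message generated code, a single type-safe wrapper around `sync.Pool` can serve every message type that has a `Reset` method. This is a hypothetical sketch (`Pool` and `Series` are made-up names) that needs Go 1.18 or newer; it is not an existing Thanos or vtprotobuf API.

```go
package main

import (
	"fmt"
	"sync"
)

// Pool is a hypothetical generic, type-safe wrapper around sync.Pool.
// PT is constrained to be *T and to have a Reset method, which is what
// generated protobuf messages typically provide.
type Pool[T any, PT interface {
	*T
	Reset()
}] struct {
	p sync.Pool
}

// Get returns a pooled message, or a freshly allocated one.
func (p *Pool[T, PT]) Get() PT {
	if v, ok := p.p.Get().(PT); ok {
		return v
	}
	return PT(new(T))
}

// Put resets the message and makes it available for reuse.
func (p *Pool[T, PT]) Put(m PT) {
	m.Reset()
	p.p.Put(m)
}

// Series is a hypothetical stand-in for a generated message.
type Series struct{ Samples []float64 }

func (s *Series) Reset() { s.Samples = s.Samples[:0] }

func main() {
	var pool Pool[Series, *Series]
	s := pool.Get()
	s.Samples = append(s.Samples, 1, 2, 3)
	pool.Put(s)
	// Inspecting s after Put is for demonstration only; real code must not touch it.
	fmt.Println(len(s.Samples)) // 0
}
```

One generic type like this replaces a hand-written or generated `Get`/`Put` pair per message, which is the kind of boilerplate generics are well suited for.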

Things Learned From Speaking at a Physical Conference The First Time

DevoxxUK logo

Devoxx UK 2021 was a great conference that has just taken place. It was my first time ever speaking at a physical conference; I had only spoken at virtual conferences before. I had the great honor of representing Thanos at Devoxx. I was supposed to give this presentation with another Thanos maintainer, Prem Saraswat, but he could not make it. Since it was my first time speaking live, I learned a bunch of lessons for the next time I get to do this. This post is about my journey there and my takeaways from it. I hope that you will be able to learn something from it as well.

Me talking at Devoxx UK 2021, a screenshot

Journey

My journey began in Lithuania. Since I traveled to the UK before Omicron, everyone was still fairly relaxed, and not many people were wearing masks. Funnily enough, fewer people were wearing masks in the UK than in Lithuania. My guess is that elderly people are less vaccinated in Lithuania than in the UK, which could explain the difference in attitude: older people in Lithuania are more likely to get seriously ill and thus take up hospital beds. Anyway, this post is not about that.

London now has a cool public transportation system (I’m not sure when it was introduced; the last time I was in London was a few years earlier) where it is no longer necessary to buy an Oyster card. One can just tap their debit or credit card when boarding a bus or a tram. With so many people living in London, I imagine that this was quite an improvement for reducing congestion, i.e. more convenience means more people on public transportation, which means fewer cars on the roads.

Picture of the business design centre showing the DevoxxUK logo

Also, it was easy to take the COVID test right after landing since everything is conveniently located in London Luton Airport. The whole process was actually smooth; I expected everything to take much longer. Note that this was when an antigen test upon landing was enough.

Be careful with night buses, though. Our flight home was in the very early morning, so we had to take a night bus. Even though it was the middle of the night, people were still out enjoying the nightlife, so don’t expect to find a seat. And if you will be traveling back to the airport with a private company in the middle of the night, I’d recommend booking tickets in advance. We ran into a problem where all of the seats were taken on the bus that was supposed to take us to the airport. The next one was an hour later, which was too late, so we had to hail a taxi with Bolt, which was quite expensive.

Now let’s move on to the lessons that I have learned during this trip.

Lesson 1 – Take Care of Everything in Advance

It is better to sort out as many things as possible before your talk to reduce stress. In my case, I had taken care of more or less everything before the day arrived except one thing: the electrical plug converter. My hotel had “schuko” type sockets, i.e. the same ones used in Lithuania, so I always charged everything there and used battery packs to keep my phone charged during the day. However, I needed a converter for the presentation itself, and I had some issues with that. I didn’t know about the “schuko” type beforehand and was misled by false advertising. I bought a “Europe to UK” plug converter, but it didn’t work because it was designed only for the type E plugs used in parts of Western Europe, which rely on an extra earthing pin. So, I had to make two trips to a shop to get the correct converter. This led to extra, unneeded stress. So, always do your homework and come prepared.

Lesson 2 – Do Not Think Too Much About What Your Audience Thinks While Presenting

During my presentation, I had three jokes. After the first one, I felt the natural urge to check whether the listeners understood it and were enjoying the presentation. However, staring at people’s faces is fraught with peril. In my experience, it is way too easy to start overthinking what the audience thinks. I started staring at people’s faces way too much, which a few times made me lose confidence in what I was saying. This escalated into a bit of stuttering.

Picture of the room showing my point of view before the presentation

I think the lesson here is that it is OK to look at your audience a little, but do not get nervous or think too much about it. Always keep the topic in mind and do not let your mind jump to other things.

Lesson 3 – Understand Your Audience

My talk was aimed at the intermediate skill level: people who are not too new to the Prometheus ecosystem but are not experts yet. I feel like some of the technical concepts that I explained went over the listeners’ heads. I would say it is probably always better to err on the side of simplicity and focus instead of trying to fit too much into one presentation. My mistake was probably that the talk had two parts: an introduction to Thanos and what we had been working on. If the talk is aimed at moderately knowledgeable people, then maybe it would have been better to skip the introductory part. On the other hand, it is a Java-oriented (or at least it was?) conference, so perhaps it would’ve been better to avoid the advanced stuff altogether and focus only on the introductory part?

This is just anecdotal data, but another fact points in the same direction: some of the questions after the presentation were really about core stuff, i.e. how Thanos works. I think this means that I failed to properly explain to the listeners what the StoreAPI is and so on.

All in all, I can’t say right now which option would have been better, but it certainly would have been better to focus on either the introductory part or the advanced part instead of both at the same time.

Lesson 4 – Always Turn Off Night Light and Disable Notifications

Unless you want everything to look like this:

Snapshot of presentation’s video

I suggest turning off the screen’s color temperature adjustment. In GNOME, you can do that via the status bar (“Night Light”). I would also strongly recommend turning off notifications in case someone texts you in the middle of the presentation. You don’t want that to end up in the video as well.

Lesson 5 – It Is Okay Not To Know Something

After the presentation, someone asked me what the preferred way to deploy Thanos on Kubernetes is. Truth be told, there are quite a lot of different options. Just to name a few: kube-prometheus-stack, goatlas, prometheus-operator, etc. I haven’t tried all of them, so I cannot reasonably tell anyone which one is the best. That is exactly what I told them. I also told them to use the one that fits their use case best. And I think that is perfectly fine. It is OK not to know everything.

If I were planning to start using Kubernetes for deploying Thanos in the near future, I could have told them that I would be able to give an informed opinion later and that we could talk about it then. But that’s not the case: so far I have only deployed Thanos on bare metal. Hence it is better not to lie and not to pretend that you are an expert in something.


That’s all from me on this post! Let me know if you have any comments or suggestions. I hope you’ve learned something.