Starting up GitHub Sponsors and some recent postings work

Hello everyone! I am happy to announce that I’ve set up GitHub Sponsors on my profile. If you want to support my blog or my work on Thanos/Prometheus and you have some money to spare, you now have a way to throw it at these projects. Let’s see if I get even one sponsor. I was thinking that maybe I should work on some custom features that could sit behind a paywall. Let’s see when I will have some time to work on them.

I haven’t written anything on my blog for quite some time, and I think it’s high time I revived it. I probably had a case of writer’s block: I kept thinking about many topics but was afraid of writing about them and clicking “Publish”. But now it’s time to not be afraid and do that 🙂

Probably the most exciting thing that I have been working on recently (and still am) is postings encoding improvements in Prometheus & Thanos. It’s now possible to specify a custom postings encoder in the Prometheus compactor: https://github.com/prometheus/prometheus/pull/13242. After https://github.com/prometheus/prometheus/pull/13567 it will even be possible to use a custom postings decoder. The postings data structure sits at the core of the Prometheus TSDB, where it is used for storing sorted sets of integers. Whenever someone specifies a label matcher in a query, e.g. {foo="bar"}, Prometheus goes through the set of series (postings) which have foo="bar" in their labels. So, it is paramount to make this data structure as efficient as possible.
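To make the idea concrete, here is a minimal sketch of a postings index as a map from a label pair to a sorted list of series IDs. It is a toy model and not the actual Prometheus code: the names Label, Index, Add, Sort and Postings are made up for this illustration, and the real TSDB keeps this data in an on-disk index rather than a Go map.

```go
package main

import (
	"fmt"
	"sort"
)

// Label is a single name/value pair attached to a series.
type Label struct {
	Name, Value string
}

// Index maps each label pair to the IDs of the series that carry it.
// It is a hypothetical, in-memory stand-in for a postings index.
type Index map[Label][]uint32

// Add registers a series ID under every one of its labels.
func (ix Index) Add(id uint32, labels ...Label) {
	for _, l := range labels {
		ix[l] = append(ix[l], id)
	}
}

// Sort puts every postings list into ascending order.
func (ix Index) Sort() {
	for _, ids := range ix {
		sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	}
}

// Postings returns the sorted series IDs matching name="value".
func (ix Index) Postings(name, value string) []uint32 {
	return ix[Label{name, value}]
}

func main() {
	ix := Index{}
	ix.Add(7, Label{"foo", "bar"}, Label{"job", "api"})
	ix.Add(2, Label{"foo", "bar"})
	ix.Add(5, Label{"foo", "baz"})
	ix.Sort()

	// The matcher {foo="bar"} resolves to one postings list.
	fmt.Println(ix.Postings("foo", "bar")) // [2 7]
}
```

Every equality matcher in a query turns into one such lookup, which is why the size and decoding speed of these lists matter so much.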

Currently, each integer is simply stored using 4 bytes. It’s possible to do much better than that. For example, if you have the set of integers 1, 2, 3, 4, ..., 10 then it’s enough to say that there’s a run of 10 integers starting from 1. Over time, many more compression techniques have been invented.
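The run example can be sketched in a few lines of Go. This is only an illustration of plain run-length encoding under my own assumptions: the Run type and the 8 bytes per run (two 4-byte fields) are choices made for this sketch, not the format of any real encoder.

```go
package main

import "fmt"

// Run describes Length consecutive integers beginning at Start.
type Run struct {
	Start, Length uint32
}

// EncodeRuns collapses a strictly increasing list into runs of
// consecutive values.
func EncodeRuns(sorted []uint32) []Run {
	var runs []Run
	for _, v := range sorted {
		// Extend the last run if v directly follows it.
		if n := len(runs); n > 0 && runs[n-1].Start+runs[n-1].Length == v {
			runs[n-1].Length++
			continue
		}
		runs = append(runs, Run{Start: v, Length: 1})
	}
	return runs
}

// DecodeRuns expands runs back into the original list.
func DecodeRuns(runs []Run) []uint32 {
	var out []uint32
	for _, r := range runs {
		for i := uint32(0); i < r.Length; i++ {
			out = append(out, r.Start+i)
		}
	}
	return out
}

func main() {
	ids := []uint32{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 42}
	runs := EncodeRuns(ids)
	fmt.Println(runs) // [{1 10} {42 1}]
	fmt.Println(len(ids)*4, "bytes raw vs", len(runs)*8, "bytes as runs")
}
```

Note that the isolated value 42 costs 8 bytes as a run instead of 4 as a raw integer, so this scheme only pays off when the data really contains runs.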

I researched what is available and found that probably the most popular paper is this one: https://arxiv.org/abs/1401.6399 by Daniel Lemire & others. I love his work in particular because he always publishes the source code for his papers. It’s a huge help! I wish more people did that.

We have a few constraints in the Thanos/Prometheus world:

  • We should read posting lists in only one direction, i.e. we shouldn’t need to read them twice. Some encoding formats, like the patched frame-of-reference variants, force the reader to read twice. This constraint lets us avoid allocating memory for the whole postings list, which saves a lot of memory. In the Thanos world, a list can easily contain hundreds of millions of elements.
  • The intersection must be very fast. Prometheus/Thanos will do intersections many more times than they encode/decode data. It’s not uncommon to have 3+ label matchers in a single query.
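Both constraints can be illustrated with a small single-pass intersection over two sorted streams. This is a sketch under my own assumptions, not the Prometheus or Thanos implementation: the Iterator interface, sliceIter and Intersect are hypothetical names, and a real decoder would read from encoded bytes instead of a slice.

```go
package main

import "fmt"

// Iterator yields strictly increasing series IDs one at a time.
// It is a hypothetical, simplified interface for this sketch.
type Iterator interface {
	Next() bool // advance; returns false when exhausted
	At() uint32 // current value, valid after Next returned true
}

// sliceIter adapts an in-memory sorted slice to Iterator.
type sliceIter struct {
	ids []uint32
	pos int
}

func (s *sliceIter) Next() bool { s.pos++; return s.pos <= len(s.ids) }
func (s *sliceIter) At() uint32 { return s.ids[s.pos-1] }

// Intersect walks both iterators once, in lockstep, and calls emit for
// every ID present in both. Neither input is buffered or rewound.
func Intersect(a, b Iterator, emit func(uint32)) {
	okA, okB := a.Next(), b.Next()
	for okA && okB {
		switch x, y := a.At(), b.At(); {
		case x < y:
			okA = a.Next()
		case x > y:
			okB = b.Next()
		default:
			emit(x)
			okA, okB = a.Next(), b.Next()
		}
	}
}

func main() {
	foo := &sliceIter{ids: []uint32{1, 4, 7, 9, 12}}
	job := &sliceIter{ids: []uint32{4, 5, 9, 10, 12, 20}}

	var out []uint32
	Intersect(foo, job, func(id uint32) { out = append(out, id) })
	fmt.Println(out) // [4 9 12]
}
```

The memory used here is independent of the list length, which is the point of the first constraint. The cost is that every element of both lists is visited, and an encoding that requires a second pass would not fit this shape at all.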

Of all the things I’ve looked at, S4-BP128-D4 and roaring bitmaps look the most promising. The latter is already used by a lot of similar projects, like M3DB. The former might not be as popular but it is specifically designed for SIMD, which gives us very fast encoding/decoding.

I even started writing a Go version of S4-BP128-D4 but I haven’t finished it yet: https://github.com/GiedriusS/go-bp. So, I am opting to try roaring bitmaps first. Even that would be a huge improvement because bitmaps allow VERY fast intersection through the bitwise AND operation. The current intersection algorithm needs to step through each element in the given postings.
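Here is a rough sketch of why bitwise AND is so attractive. It uses a plain, uncompressed bitset, which is a big simplification: roaring bitmaps split the integer space into containers and pick a representation per container, so this only shows the core idea. The names Bitmap, FromIDs, And and IDs are made up for this example.

```go
package main

import (
	"fmt"
	"math/bits"
)

// Bitmap is a plain, uncompressed bitset: bit i is set when ID i is present.
type Bitmap []uint64

// FromIDs builds a bitmap that contains the given IDs.
func FromIDs(ids ...uint32) Bitmap {
	var b Bitmap
	for _, id := range ids {
		word := int(id / 64)
		for len(b) <= word {
			b = append(b, 0)
		}
		b[word] |= uint64(1) << (id % 64)
	}
	return b
}

// And intersects two bitmaps with one AND per 64-bit word,
// i.e. it handles 64 candidate IDs per operation.
func And(a, b Bitmap) Bitmap {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	out := make(Bitmap, n)
	for i := 0; i < n; i++ {
		out[i] = a[i] & b[i]
	}
	return out
}

// IDs lists the set bits in ascending order.
func (b Bitmap) IDs() []uint32 {
	var ids []uint32
	for i, w := range b {
		for w != 0 {
			ids = append(ids, uint32(i*64+bits.TrailingZeros64(w)))
			w &= w - 1 // clear the lowest set bit
		}
	}
	return ids
}

func main() {
	a := FromIDs(1, 4, 7, 9, 12, 130)
	b := FromIDs(4, 5, 9, 10, 12, 130, 200)
	fmt.Println(And(a, b).IDs()) // [4 9 12 130]
}
```

Compared with stepping through the elements one by one, the loop in And does no comparisons or branches per element, which is the kind of work CPUs are very good at.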

I recently wrote a small program to compare postings compression on Prometheus index files: https://github.com/thanos-io/postings-analyzer. You can see that it is possible to save ~70% in postings size using S4-BP128-D4 and ~47% using roaring bitmaps. These numbers were consistent in my tests using index files from production. In my case, this would lead to shaving about 30% off the whole index file. Notably, my index files didn’t have any runs of numbers, so run-length encoding wasn’t used in roaring bitmaps, and one could argue that I don’t have a diverse data set in these tests. Perhaps there is some weird setup out there where RLE would be useful? I tried to gather sample index files on CNCF Slack to no avail: no one stepped up to upload them for me.

Either way, all of this work is very promising and I hope to have a feature flag in Thanos soon which would allow using roaring bitmaps!

Cheat Sheet Of How To Get Accepted Into LFX

Hello everyone! I’ve been a mentor in the LFX program for quite a few semesters and I have been participating in it since the very beginning, when it was still called CommunityBridge. I also participated in GSoC once as a mentor. Since I get questions about how to get accepted into LFX, I thought I would write an article about this topic. Please note that even if you follow everything to the letter, you still might not get accepted. There is some luck involved but I believe that my suggestions greatly increase your chance of getting accepted as a mentee.

I would divide all suggestions into two parts: technical and non-technical. Both are important, and the non-technical part is perhaps even more so.

Let’s start with the ability to work independently. This is a broad topic that encompasses many things. First of all, most mentors already have full-time jobs, meaning that they won’t be able to give you lots of time. It is completely fine and encouraged to ask questions but it is expected that you will be doing a lot of research and reading yourself. Also, depending on the project, some experience might be expected from you. It’s not a necessity but it would be a clear signal that you can tackle more complex problems relatively easily. At least from what I have seen, mentors are looking for a combination of the following, which serves as an indicator of success:

– Do you have any prior contributions to the project in question? It doesn’t have to be something huge but anything helps, especially those contributions that show that you understand what that project is about.

– Perhaps you have some other contributions to similar projects? A GitHub or similar profile is always very nice.

– Last but not least, for example, if the project that you are applying to uses Go, it would be nice to see examples of Go projects on your resume.

Most if not all mentors are also looking for someone who will stay around even after LFX. For us, it is one of the primary opportunities to attract more contributors to the community and ensure the longevity of our project. It is hard to put into words how to show this to potential mentors but it probably needs to be reflected in your cover letter. In my personal opinion, enthusiasm is contagious and it just seeps through written words. For instance, if your cover letter is completely generic and has a bunch of sentences copied from the project’s website then it doesn’t show that you are interested in it. It’s better to talk about what you think of the original problem and how you think you could solve it. There are no right answers here but I think it’s better to spend some time on a few projects that interest you than to send a bunch of generic applications to many more.

There have also been a few instances where someone has clearly used ChatGPT to generate cover letters. Please don’t do that: it is dehumanizing to the mentors and it shows what you think of the whole process. It might seem tedious and pointless from the mentee’s point of view but mentors aren’t robots. They are volunteers, and they actually want to help you become a better version of yourself and to improve themselves too.

When it comes to choosing a resume template, don’t worry too much about the aesthetics. The focus should be on the substance within. Since mentors may only have a few minutes per applicant, it’s advisable to include concise yet impactful information that sets you apart. Have experience in a relevant programming language? Highlight it on your resume! Have you written an impressive blog? Showcase it! While I don’t have direct experience as a recruiter or in human resources, this approach likely holds value in professional environments too. Your resume should emphasize the key selling points that differentiate you, rather than delving into pages of your entire life’s story. A short resume also saves the mentor’s or recruiter’s time, and there’s less to update in the future when circumstances change.

Mentors also typically look at how busy a potential mentee is. It’s not uncommon to receive applications from individuals who are already engaged in full-time employment. While we appreciate your dedication, it’s important to note that our projects typically require a commitment of 20-40 hours per week. Considering this workload, we would advise against participating in a mentorship program alongside a full-time job. Burnout is a significant concern, and we aim to discourage such practices. Hence, the typical “ideal” LFX mentee is probably a student, someone who is currently without a job, or someone looking to change careers.

Finally, diversity. It’s quite a sensitive topic but in my opinion, projects need not only a continuous stream of newcomers and contributors but also a diverse collection of viewpoints and opinions to keep thriving. There are many ways to approach this problem. A good book that is tangentially related is “The Wisdom of Crowds” by James Surowiecki. I would encourage you to read it if you are interested in topics such as this. One simple method of evaluating the situation is to look at gender diversity. In my anecdotal experience, software engineering has a serious problem with that. For example, during the most recent LFX iteration, the Thanos project received only 1 application from a woman out of 45 applicants. That’s a huge imbalance. It’s not always like this, and sometimes it was a bit better, but this example stuck out, perhaps due to recency bias but maybe also because it illustrates the point well. Note that it’s a broader problem in the space. The Linux Foundation puts a lot of effort into trying to solve this. Let’s hope that the situation will improve in the future. If you are reading this then I would encourage you to take the leap and apply to an LFX mentorship program. Mentors care about this and about all of their applicants. We want to have an inclusive, diverse, and welcoming ecosystem.

All in all, applying to a mentorship program can be a tough and frustrating time but I would suggest you try applying nonetheless. Don’t undervalue yourself: you are doing better than you think. Also, remember that if you get accepted then it’s a very rewarding activity. You will learn so much and it’s going to be a ton of fun! Mentors love this process too because they get to meet new people, check and improve their knowledge, and strengthen their soft skills, among a plethora of other things.