Vertical compaction + hashmod foot gun in Thanos Compactor

I recently ran into a “fun” problem where the Compactor was stuck compacting the same blocks repeatedly. Only after lots of trial and error did I find out that the culprit was sharding with hashmod on labels that are also deduplication labels in vertical compaction. Vertical compaction is like horizontal compaction, except that it compacts blocks that overlap in time; it also removes the specified replica labels. Thanos Compactor depends on having a “global” view of all blocks to filter out blocks that have already been compacted into higher-compaction-level blocks. The interesting part is this filter: https://github.com/thanos-io/thanos/blob/f7ba14066fc35095e13ca3f675915c509310f476/cmd/thanos/compact.go#L235. It filters out blocks that can be formed from two or more overlapping blocks whose sources fully match the source blocks of the older blocks. It uses the data in each block’s meta.json file to determine that block’s sources.
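A simplified sketch of the idea behind that filter (my own illustration, not the actual Thanos code; the block IDs are hypothetical): a block is considered already compacted, and is filtered out, when some other block’s sources fully cover its own sources.

```go
package main

import "fmt"

// meta mirrors the relevant part of a block's meta.json:
// the IDs of the source blocks it was compacted from.
type meta struct {
	id      string
	sources []string
}

// isDuplicate reports whether block b was already compacted into
// another block, i.e. some other block's sources include all of b's.
func isDuplicate(b meta, all []meta) bool {
	for _, other := range all {
		if other.id == b.id {
			continue
		}
		set := map[string]bool{}
		for _, s := range other.sources {
			set[s] = true
		}
		covered := 0
		for _, s := range b.sources {
			if set[s] {
				covered++
			}
		}
		if covered == len(b.sources) && len(other.sources) > len(b.sources) {
			return true
		}
	}
	return false
}

func main() {
	blocks := []meta{
		{id: "A", sources: []string{"A"}},
		{id: "B", sources: []string{"B"}},
		{id: "C", sources: []string{"A", "B"}}, // vertical compaction of A and B
	}
	for _, b := range blocks {
		fmt.Println(b.id, isDuplicate(b, blocks))
	}
	// A and B are filtered out as duplicates; C is kept.
}
```

This also shows why sharding breaks things: if the shard that produced block C cannot see C at all, blocks A and B are never recognized as duplicates.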

Example of a buggy configuration:

---
- action: hashmod
  source_labels:
  - foo
  modulus: 2
  target_label: shard
- action: keep
  source_labels:
  - shard
  regex: 0

Combined with --deduplication.replica-label="foo", this configuration triggers the issue: vertical compaction removes the foo label, so the resulting block hashes into a different shard (or gets dropped by the keep rule), and the shard that produced it never sees it again and keeps re-compacting the same source blocks. So, my suggestion would be to stop adding any of the replica labels to the source_labels part of hashmod in Thanos Compactor selector relabel configs. I don’t think there’s a use case for that. We could even add a check for this.
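For comparison, a safe variant of the configuration above shards on a label that is not a deduplication label and therefore survives vertical compaction (cluster here is a hypothetical external label; substitute whatever stable, non-replica label your setup has):

```yaml
---
- action: hashmod
  source_labels:
  - cluster
  modulus: 2
  target_label: shard
- action: keep
  source_labels:
  - shard
  regex: 0
```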

There is one other label that doesn’t make sense for Thanos Compactor: __block_id. We already have the whole mark-no-downsample/mark-no-compact functionality for that – for excluding blocks from downsampling and compaction respectively. __block_id really only fits Thanos Store, and even there it is iffy: Thanos Store also kind of needs a global view of blocks because it has this functionality where it does not download data from non-downsampled blocks if max_source_resolution allows this.
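For completeness, this is roughly how __block_id gets used in a Thanos Store selector relabel config – dropping one specific block by its ULID (the ULID below is a made-up placeholder):

```yaml
---
- action: drop
  source_labels:
  - __block_id
  regex: 01HXYZEXAMPLEULID000000000
```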

I gathered from talking with other users that the majority of them opt for time-based sharding, which seems to work well. Perhaps __block_id will be fixed at some point in the future by a “global” store of block metadata – that way we wouldn’t have to depend on each fetcher having the full view. I remember some people discussing this in the past but I cannot find the issue anymore, sorry.

Perils of /metrics data assertions

In the Thanos project the e2e tests are written in part using assertions on the metrics data. I have encountered a few challenges with those assertions that I wanted to share with you.

Always check for the maximum possible value

Writing tests on /metrics data means your testing harness repeatedly checks the exposed metrics to see whether some metric equals some value. If the metric takes some time to reach its maximum value, you might erroneously write an equals check for a smaller, intermediate value. Later on, this leads to flaky tests.

To fix this, I’d suggest running the flaky test locally and writing the assertion against the bigger, final value, then running the same test again to see what happens. Also, look at the logic inside your code – perhaps the maximum value could be calculated rather than hard-coded as a constant?

Check for specific metrics

By default, the end-to-end testing framework only accepts a metric name and sums up all matching series. I would encourage you to be maximally specific because metrics are usually not transactional. In other words, the user (in this case, the e2e testing framework) might observe results that are in the middle of being updated. For instance, if the state is constantly fluctuating, the sum will keep changing, and it might never equal the desired value. Imagine that we have some code like this (pseudo-code):

foometric.set(123)
// <--- E2E testing framework request for /metrics page comes in here.
barmetric.set(5000)

If the metrics weren’t set previously then the sum can either be 123 or 5123. Hence, the best practice is usually to be as specific as possible with the label matchers. You can use this function to do that.
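A minimal simulation of that scrape-in-the-middle scenario (plain Go standing in for a real Prometheus registry, so the types here are illustrative): summing all series can observe the intermediate state, while reading one specific series is unambiguous.

```go
package main

import "fmt"

// registry is a stand-in for a /metrics page: series name -> value.
type registry map[string]float64

// sum adds up every series, like an unspecific e2e assertion would.
func sum(r registry) float64 {
	var total float64
	for _, v := range r {
		total += v
	}
	return total
}

func main() {
	r := registry{}

	r["foometric"] = 123
	// A scrape arriving here sees only foometric.
	mid := sum(r)

	r["barmetric"] = 5000
	// A scrape arriving here sees both metrics.
	after := sum(r)

	fmt.Println(mid, after)     // 123 5123
	fmt.Println(r["barmetric"]) // 5000: a specific matcher is unambiguous
}
```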