How using ioutil.WriteFile() with inotify in tests might make them flaky
Recently I worked on fixing a flaky test of the “reloader” component in the Thanos project. It was a long-standing one: the issue took almost a whole year to fix. That is not surprising, as it is quite tricky. But before zooming into the details, let’s talk about what this test does and what other systems come into play.
Simply put, Thanos Sidecar runs alongside Prometheus. It proxies requests to Prometheus, captures the blocks that Prometheus produces, and uploads them to remote object storage. The Sidecar can also automatically reload your Prometheus instance when its configuration files change. For that, it uses the inotify mechanism of the Linux kernel. You can read more about inotify itself here. Long story short, it lets you watch files and get notified when something changes, e.g. when new data gets written to them.
The test in question exercises that reloader component. It checks whether the reloader sends “reload” HTTP requests successfully in response to certain simulated events, and whether it properly retries failed requests. Before the fix, the test emulated changed files with the ioutil.WriteFile() call. However, the number of HTTP calls received sometimes did not match the expected number. I then looked at the events that the watcher had received via inotify and, surprisingly enough, sometimes some writes were missing and sometimes there were duplicates. Here is how it looked during two different runs:
You can see that one time there were two writes, the other time – only one. Apparently, inotify is permitted to coalesce two or more write events into one if they happen consecutively “very fast”:
If successive output inotify events produced on the inotify file descriptor are identical (same wd, mask, cookie, and name), then they are coalesced into a single event if the older event has not yet been read (but see BUGS). This reduces the amount of kernel memory required for the event queue, but also means that an application can’t use inotify to reliably count file events.
Then I started looking into the code of ioutil.WriteFile(), because that is what we had been using to do the writes. Inside of it, I found this:
Now this explains everything: because ioutil.WriteFile() opens the file with O_TRUNC and only then writes the data, it modifies the file twice (the truncation and the write). The watcher therefore sees either one or two inotify events, depending on whether it reads the first event before the second one is queued; if it does not, the two are coalesced. It is easy to avoid this issue: one simple way is to create a temporary file with ioutil.TempDir() and ioutil.TempFile(), and then move it into place with os.Rename().
Testing software is ubiquitous and people naturally expect it to be a part of any kind of software development process. There are many different kinds of forms it can take:
at the most rudimentary level: ad-hoc testing;
integration testing;
synthetic testing;
and many others
One form that is quite novel is property-based testing. Essentially, the idea is to check whether the software that you’ve produced exhibits certain characteristics for all inputs that have certain distinctive qualities. It sounds very similar to ordinary unit tests; the catch is that the inputs come from a random number generator instead of being written by hand. You can also think of it as fuzzing, but for ordinary code rather than binary interfaces. It lets you run a bunch of tests very quickly and find the edge cases under which your code might not work as expected.
Unfortunately, just pushing random data to your functions is not very useful by itself. That is why the process of “shrinking” has been invented. It is a mechanism by which random data is reduced to a minimal test case which shows what characteristics are failing on what input.
There is quite a large book on this topic called “PropEr Testing” by Fred Hebert about QuickCheck. I recommend it; you can find a lot of information there. However, here we will focus on how to do this in the Go programming language. For this, we will use the featureful gopter library, which comes with all the batteries needed for property-based testing. You can still use the book as a reference because that library tries “to bring the goodness of QuickCheck to Go”. Let’s begin by running through the parlance of property-based testing.
Terminology
Generators are simply things which generate data for the functions under test. gopter has a bunch of generators ready for you to use in the gen package. You can probably find anything that you would ever want to generate in there.
At the bottom of that page there is even what is called a “weighted generator”: you pass it a bunch of generators, each with its own weight, and the weight determines how likely it is that a given generator will be used. It is useful when your function, for example, accepts an interface{} argument and does a type assertion inside.
The same package contains shrinkers. They were briefly described in the previous section. To repeat: shrinkers reduce a failing random input towards a minimal input that still fails. For example, the uint64 shrinker first tries 0 and then tries the original value minus an offset that is repeatedly halved with a bit-wise shift, so the candidates move from 0 back towards the original value. Thus, we eventually land on a small value which still shows the problem, if there is one.
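The candidate sequence described above can be sketched like this. It is an illustration of the idea rather than gopter’s actual shrinker code, and shrinkCandidates is a hypothetical name. The first candidate is 0, and each following candidate is the original value minus an offset that is halved with a right shift.

```go
package main

import "fmt"

// shrinkCandidates lists the values a halving shrinker would try for a
// failing input: original-original, original-original/2, original-original/4,
// and so on, until the offset reaches zero.
func shrinkCandidates(original uint64) []uint64 {
	var out []uint64
	for offset := original; offset > 0; offset >>= 1 {
		out = append(out, original-offset)
	}
	return out
}

func main() {
	fmt.Println(shrinkCandidates(100))
}
```

A property-based testing library would re-run the property on each candidate and continue shrinking from the first candidate that still fails.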
Another interesting part of gopter is the commands package. It helps you implement stateful property-based tests. Ordinarily, you would be testing functions which do not store any state. However, as we know, in real life that is not always the case. For such code, the package contains nice and easy-to-use helpers such as ProtoCommands. You can find more information here.
Then, the arbitrary package provides ways to combine multiple generators together using reflection, so that you can work with a whole set of different generators at once.
Finally, the main gopter package combines all of these fun things together so that you can use them in your tests. It has some other niche features that I will not look at in this article, but you should try them if needed. These include bidirectional mapping, the combination of different generators, and the meaning of the different test outcomes (undecided, exhausted, etc.).
Examples
The documentation of gopter has a lot of examples already so please feel free to explore them. With all of the knowledge that you have now, it should be easy to explore. Still, let me give you two examples so that you could hit the ground running and start using it in your projects in no time!
Fibonacci numbers
Perhaps you have some code in your program which calculates the Fibonacci numbers. As a reminder, in the Fibonacci sequence each number is the sum of the two previous ones. The sequence starts with 1 and 1.
Let’s get back to the code. For example, there could be a function fib(n uint) []int which returns a slice of length n which contains the first n Fibonacci numbers. It could look something like this:
func fib(n uint) []int {
	ret := []int{}
	a, b := 1, 1
	for n > 0 {
		ret = append(ret, a)
		a, b = b, a+b
		n--
	}
	return ret
}
Such code lends itself very nicely to property-based testing, since the returned data has to satisfy the property mentioned before. Let’s use a uint generator and the main gopter package to write a simple property-based test:
func TestFib(t *testing.T) {
	parameters := gopter.DefaultTestParameters()
	parameters.Rng.Seed(2000)
	parameters.MinSuccessfulTests = 20000
	properties := gopter.NewProperties(parameters)
	properties.Property("correct data", prop.ForAll(
		func(n uint) bool {
			r := fib(n)
			switch len(r) {
			case 0:
				return true
			case 1:
				return r[0] == 1
			case 2:
				return r[0] == 1 && r[1] == 1
			default:
				for i := 2; i < len(r); i++ {
					if r[i] != r[i-1]+r[i-2] {
						return false
					}
				}
				return true
			}
		},
		gen.UIntRange(0, 5555),
	))
	properties.TestingRun(t)
}
There is a lot to unpack. First, we set up the test parameters object: we set the random number generator seed to a constant so that we get the same results on each run, and we bump up the minimum number of successful tests so that our function is exercised with more inputs. Without that, the default minimum number of successful tests is 100.
Next, the properties are set up. There is only one: that correct data is returned. For it, we write a function which takes the generated value and returns a bool: true if the returned data satisfies the property, and false if it does not.
A generator which produces uint values in the range from 0 to 5555 has been used. If the property holds for all of these inputs, it most likely holds for larger ones as well.
In the end, we run the property tests. You can execute all of this by putting the content of these two blocks into one file, adding a package clause and the appropriate imports (testing plus the gopter, gen, and prop packages from github.com/leanovate/gopter), and running: go test -v fib_test.go.
I have been a maintainer of Thanos for some time now – it’s quite a big Golang project. Recently I have successfully used gopter to catch a bug in one function. That function is pretty important – it selects which blocks of data to download depending on the selected maximum resolution of data, and the time range (minimum/maximum time).
I have written two different property-based tests there:
As the test data, a typical production-grade state has been embedded to test out all possible cases. And it caught one serious error: sometimes the function didn’t select some blocks that it should have. Essentially, the getFor() function (the one under test) only selected the lowest-resolution data and only then went “to the sides” (in terms of time, if we imagine it running from left to right) to get the higher-resolution data, but not in the middle. The property-based tests quickly caught this mistake because the results sometimes did not satisfy the “fullness” criterion.
Example stateful test
Last but not least, let’s look at stateful tests. You will not always be lucky enough to have stateless functions like the previous one, which are easy to test. That’s why the gopter library has a nice commands package with the functions needed to test stateful code as well.
Imagine that you have some code in your program which decides whether to give out a pizza to someone who has requested it, and that there is also another action: someone can make a new pizza. The code could look like this:
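type Pizza struct{}

type Pizzeria struct {
	pizzasLeft int // pizzas currently available
	n          int // number of GetOut() calls so far
}

func NewPizzeria(n int) *Pizzeria {
	return &Pizzeria{pizzasLeft: n, n: 0}
}

// GetOut hands out one pizza if any are left, or returns nil otherwise.
func (p *Pizzeria) GetOut() *Pizza {
	p.n++
	if p.n > 3 { // deliberate bug: nothing is handed out after three calls
		return nil
	}
	if p.pizzasLeft > 0 {
		p.pizzasLeft-- // the pizza that is handed out is no longer available
		return &Pizza{}
	}
	return nil
}

// Bake adds one pizza to the stock.
func (p *Pizzeria) Bake() {
	p.pizzasLeft++
}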
You can spot the bug in the GetOut() code: it will not give out pizzas anymore after it has been called three times. We will try to catch it with property-based tests. We will test the property that we can always take out a pizza as long as some have been baked and are still left.
Let’s say that there are two commands: a BAKE command, which calls Bake() to bake a new pizza, and a GET command, which calls GetOut() to get one out (if possible).
The commands are defined by using the commands.ProtoCommand struct. Here is their code:
That is a lot to unpack. Most of the struct members are self-explanatory, but here are their descriptions:
RunFunc executes the command against the system under test
PostConditionFunc gets called when gopter wants to check if the conditions are still true after executing it. In our case, we check that we have gotten a pizza if we have baked something
NextStateFunc gets executed when gopter wants to get the next state of the system under test – in this case the state is increased or decreased by 1 because we just baked or got out one pizza
commands.ProtoCommands lets us define something which must be true at the beginning, before executing tests, lets us define how the system under test object must be constructed, and what commands are available
Then, finally, let’s bind everything together and run the tests like the following:
! pizzeria: Falsified after 11 passed tests.
ARG_0: initialState=0 sequential=[BAKE BAKE GET BAKE BAKE GET BAKE GET GET]
ARG_0_ORIGINAL (2 shrinks): initialState=0 sequential=[BAKE BAKE GET BAKE
BAKE GET BAKE GET GET BAKE BAKE]
We can see that we got the commands BAKE BAKE GET BAKE BAKE GET BAKE GET GET after shrinking the original argument 2 times. Indeed, on the 4th GET, we did not get a pizza as we expected, even though we had baked 5 pizzas before 😢
Conclusion
As you can see, property-based testing is a really powerful concept that you should leverage in your own projects, if appropriate. Please do comment if you have found any mistakes or want to discuss it. Thanks for reading this far!