What is Actually True and False in Python?

Intro

Did you know that in Python 2.x you can do the following?

$ python2
Python 2.7.14 (default, Sep 20 2017, 01:25:59) 
[GCC 7.2.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> True = False
>>> False
False
>>> True
False
>>> not True
True
>>> True == False
True
>>> True != False
False

How can it be that not True is True and True is equal to False? Why is it even possible to do this? Isn’t the meaning of True and False defined by the language to be constant and unchangeable? What sense does it make to change it? In any case, to fix this bug in the matrix, do this:

$ python2
Python 2.7.14 (default, Sep 20 2017, 01:25:59) 
[GCC 7.2.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> True = False
>>> True = not False
>>> True
True
>>> False
False
>>> True == False
False

After this, everything is back to normal. As you can see, you do not need to worry: you can use various operators to assign a sane value to True again after you change it. Besides not, you could use ==, != or any other operator that returns a boolean value.

This article delves into this behavior and, first of all, explains why it is possible. The behavior also raises some apt questions:

  • What about Python 1.x or 3.x? Can you do the same?
  • How did the programming language developers miss this?
  • What could be the rationale behind these language design decisions?

I will try my best to answer them. This is definitely an interesting piece of the history of the Python programming language.

True and False in Python 1.x

The oldest major version of the Python programming language, 1.x, does not have True or False at all, as this example shows:

try:
  print True
except NameError:
  print 'True not found'

This yields the text ‘True not found’ in the standard output:

$ docker run -it dahlia/python-1.5.2-docker
Python 1.5.2 (#1, Aug 11 2017, 14:21:33)  [GCC 4.8.4] on linux4
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> try:
...   print True
... except NameError:
...   print 'True not found'
... 
True not found

NameError is raised when you use True in Python 1.x because Python tries to look up a variable called True and no such variable exists. The same goes for False, which does not exist either:

try:
  print False
except NameError:
  print 'False not found'

And we get:

$ docker run -it dahlia/python-1.5.2-docker
Python 1.5.2 (#1, Aug 11 2017, 14:21:33)  [GCC 4.8.4] on linux4
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> try:
...   print False
... except NameError:
...   print 'False not found'
... 
False not found

In Python 1.x, as in some other languages, what is true or false is defined in terms of evaluation rules. Anything other than None, numeric zero of all types, empty sequences, and empty mappings is considered true (https://docs.python.org/release/1.6/ref/lambda.html):

In the context of Boolean operations, and also when expressions are used by control flow statements, the following values are interpreted as false: None, numeric zero of all types, empty sequences (strings, tuples and lists), and empty mappings (dictionaries). All other values are interpreted as true.

Thus, we could make our own True and False like this:

$ docker run -it dahlia/python-1.5.2-docker
Python 1.5.2 (#1, Aug 11 2017, 14:21:33)  [GCC 4.8.4] on linux4
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> False = 0
>>> True = not False
>>> print False, True
0 1

But, obviously, they are not protected from modification and they are not available in all Python programs, so having them is rather pointless. The one exception is a large piece of software written in Python where you want to maintain a unified definition of True and False across all of it. That also makes the code more future-proof, because only one small section has to be changed to change the meaning of True and/or False.

A keen reader will notice that this also means that there is no dedicated boolean type. This section of the official language specification lists all of the types: https://docs.python.org/release/1.6/ref/types.html. The officially supported types in Python 1.x are these:

  • None
  • Ellipsis
  • Numbers
  • Sequences
  • Mappings
  • Callable types
  • Modules
  • Classes and class instances
  • Files
  • Internal types

As you can see, there really is no dedicated type for boolean expressions. The situation improved significantly in the next major version of Python, 2.x, although not all of the negative aspects were fixed there. Let’s talk about Python 2.x.

True and False in Python 2.x

A dedicated ‘bool’ type appeared in the 2.x line of the Python programming language, as this example shows:

$ python2
Python 2.7.14 (default, Sep 20 2017, 01:25:59) 
[GCC 7.2.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> type(True)
<type 'bool'>
>>> type(False)
<type 'bool'>

The type() function returns the type of its argument. As you can see, the type of both True and False is bool. A boolean type was finally added in Python 2.x, and it belongs to the integer family of types (https://docs.python.org/2/reference/datamodel.html):

Booleans

These represent the truth values False and True. The two objects representing the values False and True are the only Boolean objects. The Boolean type is a subtype of plain integers, and Boolean values behave like the values 0 and 1, respectively, in almost all contexts, the exception being that when converted to a string, the strings “False” or “True” are returned, respectively.

However, even though a dedicated boolean type was added, the new values True and False were not made keywords, so their values could be changed. Instead, they were defined as “constants” that live in the built-in namespace (https://docs.python.org/2/library/constants.html):

A small number of constants live in the built-in namespace. They are:

False
The false value of the bool type.

New in version 2.3.

True
The true value of the bool type.

New in version 2.3.

Unfortunately, the fact that they are “constants” does not mean that they are immutable in Python 2.x. All of this verbiage essentially just means that some variables of certain types and values are pre-loaded into every Python program, and the program itself is free to change their meaning. Before 2.4, you could even assign a new value to None; from 2.4 on, attempting that raises a SyntaxError. Why they did not do that for True and False as well, I don’t know. I seriously wonder what insane use cases or existing code they were accommodating by not making the same change for True and False at the same time in 2.4.

Also, notice that the Python 2.x documentation makes a separation between “true” and “false” constants. “true” constants are those to which you cannot assign a new value because it raises an exception, and “false” constants are those to which you can. The official documentation even puts those words in quotes, I am not making it up. This could really make you say: “wat”.

If you ask me, it is a huge inconsistency in language design. It makes no sense to me that the same change was not made from Python 2.4 onwards, making it illegal to assign new values to True and False as well and removing the whole notion of “false” constants. Perhaps the developers were afraid of making such a backwards-incompatible change and so waited until 3.x?

True and False in Python 3.x

This mess was finally and permanently fixed in the next major version of Python, 3.x. True and False, along with other constants like None, were turned into keywords. True was defined to be equal to the number 1 and False was defined to be equal to the number 0. There is no longer any such thing as “false” or “true” constants. You can see this from the error messages in the following session:

$ python
Python 3.6.3 (default, Oct 24 2017, 14:48:20) 
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> type(True)
<class 'bool'>
>>> type(False)
<class 'bool'>
>>> True = False
  File "<stdin>", line 1
SyntaxError: can't assign to keyword
>>> None = False
  File "<stdin>", line 1
SyntaxError: can't assign to keyword

As you can see, True and None (and others) became keywords and thus can officially no longer be reassigned. They are listed here: https://docs.python.org/3/reference/lexical_analysis.html#keywords. The boolean type is still a sub-type of the Number types: https://docs.python.org/3/reference/datamodel.html#the-standard-type-hierarchy. The text that describes the boolean type is identical to the 2.x version.

Defining True and False to be constants that cannot be reassigned also brings some performance improvements, because implementations of Python no longer have to look up the value of True or False every time it is used, for example in the condition of an infinite loop. Things were finally completely fixed by version 3.x, and it only took almost 15 years to properly implement True and False.

What were they thinking?

From what I can tell, Guido did not care at the beginning about adding a separate boolean type, just as, for example, the C programming language had no boolean type until the C99 version of the standard. All of the boolean operations were simply expressed in terms of the evaluation rules.

The boolean type and the True/False constants were proposed in Python Enhancement Proposal (PEP) number 0285 (https://www.python.org/dev/peps/pep-0285/). However, it seems to me that at that time Python 2.x had a bunch of those constants that were mutable, and these two new constants were added with the same unclear, floating status as the others. After a bit of time, someone noticed that it does not make much sense to override the value of None, True, False and the others. In the end they were converted into keywords, which made it impossible to reassign them. The fix in version 2.4 for None looks like a band-aid applied by the developers, because it did not fix the problem completely. I guess that they waited until the next major version bump for the rest because it is a backwards-incompatible change.

It’s kind of humorous, because some Python developers even decided to include lines like these at the beginning of their programs (https://github.com/riptideio/pymodbus/issues/43):

True = 1 == 1
False = 1 == 0

It is crazy that they had to be afraid of users of their libraries messing with the values of True and False. Such is the fun story of True and False in the Python world.

EDIT:

2018 January 10 – changed the words to say that in Python 3.x True and False were defined to be equal to 1 and 0 respectively, they are not the actual numbers.

Making Unwinding Functions in C Simple: Do Not be Afraid of Using Gotos

Intro

Today I wanted to talk about unwinding and releasing resources in C functions. Let’s begin by stating that there are three main techniques for handling errors in the C programming language. Sometimes more than one technique may be used. Here is a list of them:

  • You test the value that a function returns. An abnormal value indicates that some kind of error has happened and a normal value indicates success;
  • There is an external variable whose value you must check. The POSIX variant of this is a variable called errno, which a failing function sets to a nonzero error code. Successful calls do not reset it to 0, so it should only be inspected after a function has reported failure (or after you have set it to 0 yourself before the call);
  • You pass a pointer to a function. Depending on the result, the function changes the value of the variable it points to or, if it is a function pointer, calls it with certain arguments.
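
The three techniques often appear together in a single function. The following is only a minimal sketch of that, not code from any particular project: parse_long is a hypothetical helper name, and it relies on the standard strtol(3), which reports range errors through errno.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper, for illustration only.
 * Technique 1: the return value signals success (0) or failure (-1).
 * Technique 3: the parsed number is delivered through the out pointer. */
int parse_long(const char *text, long *out)
{
    char *end;
    long value;

    /* Technique 2: strtol() never resets errno on success, so clear it
     * before the call and inspect it afterwards. */
    errno = 0;
    value = strtol(text, &end, 10);

    if (errno == ERANGE)                 /* value does not fit in a long */
        return -1;
    if (end == text || *end != '\0')     /* no digits, or trailing garbage */
        return -1;

    *out = value;                        /* only written on success */
    return 0;
}
```

A caller checks the return value first and reads the out parameter only when the call succeeded, for example: if (parse_long(arg, &n) != 0) { /* handle the error */ }.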

There is one method I have not listed: some people use atexit(3) to register functions that are called when the program exits and release the resources then. This is unusual, so I have left it out of the list.

This is very much related to our topic because when an error occurs, you will have to handle it. That process includes releasing the resources which were acquired before in the function. Especially if you are deep down in your function and then an error occurred, the choice that you make in how to release the resources will matter a lot so it is important to make the correct decision.

In C++ you have destructors and so on, but how are you going to do that in C? Are you going to sprinkle all of your error paths with:

free(foo);
free(bar);

and so on? Going down this route might be your first choice, but I think a viable and preferable alternative is to use gotos and labels. Obviously, they should be used very cautiously. goto is a very powerful tool, so there are many ways to abuse it and you have to be careful. For simple cases where you don’t have to release any resources a plain return works well, but the situation is different with multiple resource acquisitions. Compared with other methods, using gotos doesn’t force you to duplicate the error paths, the code distracts less from the normal path, and it is more readable. You can’t imagine how this could be true and you don’t believe me? Let me prove to you that you should use gotos in these more complex situations!

Tutorial: using gotos for cleaning up

First, name each goto label after the resource that it frees. You want to be able to tell exactly which resource is going to be freed. Also, because goto labels may be used for purposes other than resource clean-up, it is a good idea to prefix these labels with “err_” to indicate that they release resources when an error occurs. Because each label corresponds to one resource, it should be followed by only one statement before the next label or the final return, and that statement should do only what the label says.

Some good examples of names: err_release_view, err_free_list, err_close_lsocket, and so on.

Order the labels so that the resources which were acquired first are released at the bottom. In other words, the labels should appear in the inverse of the order in which the resources were acquired.

Now, whenever an error occurs, use goto to jump to the label which releases the resources that have already been acquired. The rule is easy to remember: always jump to the label which releases the most recently acquired resource.
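
Here is a minimal sketch of these rules applied to a function that acquires more than one resource. The struct and the function names (table, table_create, table_destroy) are hypothetical and exist only for illustration; the point is the order of the labels and which label each error path jumps to.

```c
#include <stdlib.h>

/* Hypothetical structure that owns two separately allocated arrays. */
struct table {
    int *keys;
    int *values;
    size_t capacity;
};

/* Returns a new table, or NULL if any allocation fails. */
struct table *table_create(size_t capacity)
{
    struct table *t;

    t = malloc(sizeof(*t));                            /* acquired 1st */
    if (!t)
        return NULL;              /* nothing to release yet: plain return */

    t->keys = calloc(capacity, sizeof(*t->keys));      /* acquired 2nd */
    if (!t->keys)
        goto err_free_table;      /* most recent resource held is t */

    t->values = calloc(capacity, sizeof(*t->values));  /* acquired 3rd */
    if (!t->values)
        goto err_free_keys;       /* most recent resource held is keys */

    t->capacity = capacity;
    return t;

    /* Labels are in the inverse order of acquisition and fall through,
     * so jumping to one also releases everything acquired before it. */
err_free_keys:
    free(t->keys);
err_free_table:
    free(t);
    return NULL;
}

/* Releases a table returned by table_create(). */
void table_destroy(struct table *t)
{
    free(t->values);
    free(t->keys);
    free(t);
}
```

Each label is followed by exactly one statement, and adding a fourth resource later only requires one new label at the top of the clean-up section and one new goto.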

It may remind you of the defer mechanism in Go and other programming languages, where the programmer can specify a list of function calls with certain arguments which will run when the surrounding function returns. We are essentially emulating the same thing with gotos. The C version just requires a bit more attention and care.

Example code comparison

To show how readability could be improved by using this method I will present one function from the Linux kernel source code and how it was changed. This function was improved courtesy of Tobin C. Harding. Thanks!

Here is the first version which does not use gotos at all:

static int enqueue_txdev(struct ks_wlan_private *priv, unsigned char *p, unsigned long size, void (*complete_handler)(void *arg1, void *arg2), void *arg1, void *arg2)
{
  struct tx_device_buffer *sp;

  if (priv->dev_state < DEVICE_STATE_BOOT) {
    kfree(p);
    if (complete_handler)
      (*complete_handler) (arg1, arg2);
    return 1;
  }

  if ((TX_DEVICE_BUFF_SIZE - 1) <= cnt_txqbody(priv)) {
    /* in case of buffer overflow */
    DPRINTK(1, "tx buffer overflow\n");
    kfree(p);
    if (complete_handler)
      (*complete_handler) (arg1, arg2);
    return 1;
  }

  sp = &priv->tx_dev.tx_dev_buff[priv->tx_dev.qtail];
  sp->sendp = p;
  sp->size = size;
  sp->complete_handler = complete_handler;
  sp->arg1 = arg1;
  sp->arg2 = arg2;
  inc_txqtail(priv);

  return 0;
}

The version with goto:

static int enqueue_txdev(struct ks_wlan_private *priv, unsigned char *p, unsigned long size, void (*complete_handler)(void *arg1, void *arg2), void *arg1, void *arg2)
{
  struct tx_device_buffer *sp;
  int rc;

  if (priv->dev_state < DEVICE_STATE_BOOT) {
    rc = -EPERM;
    goto err_complete;
  }

  if ((TX_DEVICE_BUFF_SIZE - 1) <= cnt_txqbody(priv)) {
    /* in case of buffer overflow */
    DPRINTK(1, "tx buffer overflow\n");
    rc = -EOVERFLOW;
    goto err_complete;
  }

  sp = &priv->tx_dev.tx_dev_buff[priv->tx_dev.qtail];
  sp->sendp = p;
  sp->size = size;
  sp->complete_handler = complete_handler;
  sp->arg1 = arg1;
  sp->arg2 = arg2;
  inc_txqtail(priv);

  return 0;

err_complete:
  kfree(p);
  if (complete_handler)
    (*complete_handler) (arg1, arg2);
  return rc;
}

As we can see, the code is much more readable and the two error paths are not duplicated. The judicious use of gotos avoids the perils of producing spaghetti code. Also, don’t worry: this is not the only case. The Linux kernel source has countless examples like this. It makes the code much more readable once you get used to the convention. Not to mention that the Linux kernel is one of the biggest, most complex C projects around, so its developers would hardly choose a code construct that increased the complexity of the code even more.

One more thing – this cleanup code is simple and clean but imagine a situation where it is much more complex. What if something extra was done in the error path if, for example, closing a socket failed and some extra sub-system had to be informed or some other actions had to be performed? That would be quite some extra code in each path. In this case, the goto method would be so much more attractive.

Conclusion

Using gotos in your C code to clean up after errors have occurred is similar to the defer mechanism in Go. Keeping the clean-up code in one place that every error path can jump to gets rid of code duplication in error paths. This makes the code more readable because the reader won’t be distracted by error handling code which could obscure the normal path. Also, there is less room for mistakes because much less code is duplicated. Gotos can be abused easily, so you should be very cautious and follow the tips given in this article.

Bonus: your compiler might have an extension to help out with this

Some C compilers have extensions which help with resource cleanup. For example, the popular gcc supports the cleanup attribute, which applies to variables that have automatic storage duration. If you apply this attribute, gcc will call the given function with a pointer to that variable as the argument when the variable goes out of scope. Any return value is ignored. Example usage:

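#include <stdlib.h>

/* gcc passes a pointer to the annotated variable, so p is really a char **. */
void cleanup_free(void *p)
{
  free(*(void **)p);
}

void foo(void)
{
  /* cleanup_free(&bar) runs automatically when bar goes out of scope. */
  char __attribute__((cleanup(cleanup_free))) *bar;
  bar = malloc(128);
}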

This extra function is needed because if plain free(3) were given as the cleanup function, it would receive a char ** and, obviously, free(3) doesn’t know that the pointer has to be dereferenced once first. If you compile this function and run it under valgrind, you will see that no memory is leaked. The attribute is also useful with close(2) and other similar functions. However, there is one downside: if you want more granular control over what happens when, for example, close(2) fails, that is impossible here because any return value is silently ignored. Check your compiler’s documentation to see whether it supports this kind of thing. Obviously, you should consider the alternative of writing portable code first.
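
To see that the handler really runs on every path out of the scope, here is a small sketch with two early returns. It assumes a compiler that supports the cleanup attribute (gcc, for example); the names buffers_freed, cleanup_free_counted and copy_checked are made up for this illustration, and the counter exists only to make the effect observable.

```c
#include <stdlib.h>
#include <string.h>

/* Instrumentation for this sketch only: counts handler invocations. */
int buffers_freed = 0;

/* Receives a pointer to the annotated variable, not the variable itself. */
void cleanup_free_counted(void *p)
{
    free(*(void **)p);            /* free(NULL) is a harmless no-op */
    buffers_freed++;
}

/* Hypothetical function: copies src into a temporary buffer of `limit`
 * bytes and returns its length, or -1 if it does not fit. */
int copy_checked(const char *src, size_t limit)
{
    char *buf __attribute__((cleanup(cleanup_free_counted))) = malloc(limit);
    size_t len = strlen(src);

    if (!buf)
        return -1;                /* handler runs here */
    if (len >= limit)
        return -1;                /* ...and on this error path */

    memcpy(buf, src, len + 1);    /* includes the terminating '\0' */
    return (int)len;              /* ...and on the success path */
}
```

No error path mentions free() at all, which is the same benefit the goto approach gives, at the price of portability.
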
Please comment if you find any errors or just want to discuss this.