scanf(3) is a newbie trap, use fgets(3) with sscanf(3) instead

If you have not known before, scanf(3) and fgets(3) are both functions intended for reading something from standard input and doing something with the result – in the former case it is interpreted and the results might possibly be stored in specified arguments and in the latter case the result is simply put into a buffer. The first one is commonly recommended to beginner C programmers for reading something from the user and parsing it. However, it should be avoided mainly because it is very error-prone and it is difficult to understand how it actually works for new people. Instead of scanf(3), sscanf(3) should be used in combination with fgets(3). That way, you can easily guess in what state your standard input stream is after using those functions and you get other benefits. Let me give some examples and explain more.

The following two examples are functionally the same – they both read two numbers from standard input and print them. However, the first one uses scanf(3) and the second one uses fgets(3). Here they are:

#include <stdio.h>

int main(int argc, char *argv[])
{
    int ret, num;

    ret = scanf("%d", &num);
    if (ret == 1)
        printf("%d\n", num);
    return 0;
}
#include <stdio.h>

int main(int argc, char *argv[])
{
    char buf[100];

    if (buf == fgets(buf, 100, stdin)) {
        int num, ret;

        ret = sscanf(buf, "%d", &num);
        if (ret == 1)
            printf("%d\n", num);
    }
    
    return 0;
}

You can try them out for yourself. Let us go through the list of differences. I think the best way to illustrate them is to go through a list of different kinds of inputs that the user maybe provide to these programs and see what are the differences:

  • Valid input. Let us say the user entered “123\n”. scanf would read “123” from stdin and it would leave the “\n” in the stream. However, fgets would eat the “\n” as well. This is where the first problem occurs: new C programmers tend to think that scanf would read the “\n” as well or they do not realise that it is still there. This might not pose an issue if your stdin is not line buffered (e.g. when piping a file into a program) however most of the time it is – as far as I know most of the terminals wait for the user to press “Enter” before sending the user’s input to a program in the foreground. On the other hand, fgets would leave stdin in a predictable state – you will always know that if ‘\n’ existed in stdin then it was eaten (unless your buffer is too small). And it is very easy to check this hypothesis – just check if the last character in your buffer is ‘\n’. Also, you know that you need to read more characters (digits) from stdin to get the full number if the last character is not ‘\n’ and you still have something to read from stdin.
  • Valid input but with some whitespace at the beginning. scanfautomagically skips over it but however fgets presents an opportunity for you to see what kind of whitespace was at the beginning, before the actual number. Buffer size becomes an important question in this case. As always, read until you got the full line. This might or might not be useful depending on your case. In general, fgets in this case introduces more transparency in the process.
  • Invalid input. This is the case when it does not match the format specifier, characters are read from stdin but they are not brought back. scanfmight accidentally eat a character from stdin and it would be gone into the dark abyss unless it was stored in a buffer somewhere. If I remember correctly, it disappears as long as you read it from stdin but maybe on some Unix you can read it back again. This raises a confusion for the user because the user does not know how much was read from stdin exactly. The other function combination lets you know exactly what was read from stdin. This might cause even more confusion when two or more pairs of scanf are used one after the other because it is hard to know what is left in stdin after the preceding calls to scanf.

In general, my recommendation is this: avoid using scanf unless you can absolutely control what the standard input is going to be and what will be its format. Also, besides all of the aforementioned problems, scanf does present some security issues. For example, the format "%s" lets the user input any length string into a specified char *. A malicious user can easily use this to over-run the buffer and write any arbitrary data to memory. I hope you will take this into account the next time you will write code such as this.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.