Ancient specification “bug”: you cannot use colons in your username with the HTTP basic authentication method

There are many authentication schemes out there in the wild, but one of them, which is relatively popular because most popular browsers implement it, has a peculiar “bug” in its specification: you cannot use a colon (‘:’) in the username field. If you have ever seen a window such as this:

HTTP Basic Auth window on Chrome

then that website is probably using HTTP basic auth, and you cannot use a colon in your username on that site because the scheme simply does not allow it.

Why, you might ask? Because in this authentication scheme a colon separates the username from the password. The credentials are transmitted in the format username:password, so if the username contained a colon, the HTTP server could not tell where the username ends and the password begins.
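To make the ambiguity concrete, here is a minimal Python 3 sketch of how a client might build the header value. The function name encode_basic_credentials is made up for illustration, and the sketch assumes UTF-8 credentials; the base64 step is the encoding the RFCs prescribe for the username:password string.

```python
import base64


def encode_basic_credentials(username: str, password: str) -> str:
    """Build an Authorization header value for HTTP basic auth (illustrative helper)."""
    # The scheme simply joins both fields with a colon before encoding them.
    user_pass = f"{username}:{password}"
    token = base64.b64encode(user_pass.encode("utf-8")).decode("ascii")
    return f"Basic {token}"


# Two different credential pairs collapse into the very same header,
# so the server has no way to tell which pair the client meant.
header_a = encode_basic_credentials("alice:smith", "secret")
header_b = encode_basic_credentials("alice", "smith:secret")
print(header_a == header_b)
```

Both calls encode the string alice:smith:secret, which is why the scheme has to declare one of the two readings invalid.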

This scheme is defined in RFC 7617, which obsoletes the older RFC 2617. As the RFCs themselves say:

Furthermore, a user-id containing a colon character is invalid, as the first colon in a user-pass string separates user-id and password from one another; text after the first colon is part of the password. User-ids containing colons cannot be encoded in user-pass strings.

That is an excerpt from RFC 7617. This is from RFC 2617:

To receive authorization, the client sends the userid and password, separated by a single colon (":") character, within a base64 encoded string in the credentials.

As you can see for yourself, this sentence from the older RFC (2617 is from June 1999, whereas 7617 is from September 2015) does not spell out that a colon cannot be used in the username; it is only implied. The formal grammar in RFC 2617 does define the userid as text excluding ":", but that is easy to overlook.

You might be surprised, but a lot of software gets this wrong. For example, I recently looked into using ml2grow/GAEPyPI to run a simple PyPI on Google App Engine and reduce costs. The code is nice overall; however, the username and password parsing is a bit broken. It all happens here:

(username, password) = base64.b64decode(auth_header.split(' ')[1]).split(':')

As you can see, this code breaks when .split(':') returns more than two results, that is, when the username or password field contains a colon itself: the unpacking into two variables then fails. This could be mitigated by using the first result as the username and joining all following results back together with colons to form the password, or, more simply, by splitting only on the first colon. I will open a pull request soon to fix this issue. There are probably many more examples such as this.
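A minimal sketch of the more tolerant parsing, written for Python 3, could look like the following. The function name parse_basic_auth and the error handling are my own assumptions for illustration; this is not the GAEPyPI code or the actual patch.

```python
import base64
from typing import Tuple


def parse_basic_auth(auth_header: str) -> Tuple[str, str]:
    """Split a 'Basic <token>' header into (username, password) (illustrative helper)."""
    scheme, _, token = auth_header.partition(" ")
    if scheme.lower() != "basic" or not token:
        raise ValueError("not a Basic Authorization header")
    # Assumes the credentials are UTF-8; other encodings need extra care.
    user_pass = base64.b64decode(token.strip()).decode("utf-8")
    # partition() splits on the FIRST colon only, so any further colons
    # stay in the password, as RFC 7617 describes.
    username, separator, password = user_pass.partition(":")
    if not separator:
        raise ValueError("credentials contain no colon")
    return username, password
```

With this approach a password such as smith:secret survives intact, while a colon in the username is still impossible by definition of the scheme.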

As far as I know, we can only speculate about why this decision was made. My first thought was that, because the Internet and computers were not so fast back in June 1999, the authors of RFC 2617 decided to put everything into one field. The problem could easily have been avoided by having two separate fields for the username and the password. Perhaps this would have been too costly? I do not know.

Do you know of any other historical “bugs” in specifications that are widely used nowadays? Or maybe you know why this RFC was designed this way? Please let everyone know in the comments section below. Thanks for reading and happy hacking!

Turbo-charging git-diff and making reviews easier

Are you working with git repositories that see a lot of code churn? For example, do you have a repository with Puppet hierarchical data where many pull requests merely move code between layers of hieradata? A diff ordinarily only distinguishes added and removed lines, typically shown in green and red (the user can of course change those colours). However, at the end of 2017 git-diff(1) gained a new feature that highlights text which has only been moved in different colours. This is exactly my situation, and the feature was a godsend because I did not want to spend a lot of time reviewing syntax when the hieradata is just being moved. That is why I wanted to share it: it may help you too, and perhaps someone has even better tips on how to make the reviewing process more effective.

This is how a diff ordinarily looks in git-diff(1):

With the default diff.colorMoved zebra mode you can get something like this in git-diff(1):

And with the dimmed_zebra mode:

This feature is controlled by the --color-moved command-line option or the diff.colorMoved setting in the gitconfig file. If you enable it without naming a mode, the default at the moment is the zebra mode that you saw in the screenshots before. In zebra mode git is sensible enough to only treat blocks of at least 20 alphanumeric characters as moved. That is also illustrated in those screenshots. Adjacent moved blocks are painted in alternating colours, which is why it is called zebra mode.

Two other modes exist as well: plain and dimmed_zebra. The plain mode is not so useful for our use case because it cannot tell whether a block was moved intact or its lines were permuted a little along the way. As its name says, it only checks whether a line that was removed in one place was added somewhere else. dimmed_zebra is a bit more interesting: it works like zebra but dims the uninteresting parts of the moved text, so that only the lines where one moved block borders another stand out. You can try it out and see if it is useful to you. In my opinion, this is the best mode.
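To illustrate the idea behind the plain mode, here is a deliberately simplified Python sketch that reports lines which are removed in one place of a unified diff and added in another. It is only an approximation of the concept, not how git implements it; the function name find_moved_lines and the sample hieradata are made up.

```python
from typing import List


def find_moved_lines(diff_text: str) -> List[str]:
    """Return lines that are both removed and added somewhere in a unified diff."""
    removed = []
    added = set()
    for line in diff_text.splitlines():
        # Skip file headers, which also start with '-' or '+'.
        # (Simplification: a changed line beginning with '-- ' or '++ ' is skipped too.)
        if line.startswith("--- ") or line.startswith("+++ "):
            continue
        if line.startswith("-"):
            removed.append(line[1:])
        elif line.startswith("+"):
            added.add(line[1:])
    # Blank lines are ignored because they "move" all the time.
    return [text for text in removed if text.strip() and text in added]


EXAMPLE_DIFF = "\n".join([
    "--- a/common.yaml",
    "+++ b/common.yaml",
    "@@ -1,2 +1,1 @@",
    " timezone: UTC",
    "-ntp::server: ntp1.example.com",
    "--- a/node.yaml",
    "+++ b/node.yaml",
    "@@ -1,1 +1,3 @@",
    " role: web",
    "+ntp::server: ntp1.example.com",
    "+extra: true",
])

print(find_moved_lines(EXAMPLE_DIFF))
```

Because every line is matched on its own, this kind of check says nothing about whether a whole block arrived in the same order, which is the weakness of plain mode described above.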

Finally, let me introduce another project that makes git-diff(1) output even more beautiful. It can add extra highlights and perform some other munging of the changed lines to make it even clearer what is happening. It is called diff-so-fancy. You can start using it simply by executing this command:

git config --global core.pager "diff-so-fancy | less --tabs=4 -RFX"

Obviously, the diff-so-fancy executable needs to be available in $PATH. This is how the end result looks if we apply it to our previous diff:

Happy hacking! Find more information here: