Surprising Optimization When Appending Elements to an Array in ZSH

Recently I ran into a surprising ZSH performance bottleneck while editing my .zshrc. Because it gets loaded every time a new shell comes up, I have noticed the issue pretty fast. I quickly started digging into the problem. Apparently, this surprising behavior concerns appending new elements to an array. I did not have a huge file that I was trying to read, only about 6000 elements or so. But still, it had a significant impact on how fast the Z shell was able to start up.

Imagine a simple loop such as this:

#!/bin/zsh
while read line; do
  echo "${line}"
done < ~/somefile

And ~/somefile has been prepared with:

#!/bin/zsh
for i in `seq 1 8873`; do
    UUID=$(cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 32 | head -n 1)
    echo "${UUID}" >> ~/somefile
done

Then, let’s read the same file into an array with two different scripts:

#!/bin/zsh
arr=()
while read line; do
    arr=(${arr} ${line})
done < ~/somefile
#!/bin/zsh
arr=()
while read line; do
    arr+=(${line})
done <~/somefile

Could you guess which one is faster? If your answer is the 2nd one then you are correct. At least on my machine:

$ time ~/testcase_fast.sh
~/testcase_fast.sh  0,17s user 0,11s system 98% cpu 0,279 total
$ time ~/testcase.sh
~/testcase.sh  15,68s user 0,95s system 98% cpu 16,881 total

A simple ltrace(1) comparison reveals where the time is actually spent:

% time     seconds  usecs/call     calls      function
------ ----------- ----------- --------- --------------------
 34.49   17.657872          60    292810 read  
 25.07   12.833696          43    292815 mbrtowc
  8.41    4.304812          60     71076 sigprocmask
  7.80    3.994101          43     91362 strlen     
  6.99    3.580468          43     82299 strcpy
  4.04    2.066031          43     47979 malloc
  3.78    1.936972          42     45209 free
...

And the “slow” version (not the full run here since it takes a very long time however the picture is clear):

% time     seconds  usecs/call     calls      function
------ ----------- ----------- --------- --------------------
 28.21  177.575768          44   4006596 strlen     
 28.11  176.938217          44   4000420 strcpy
 14.16   89.127826          44   2002599 malloc
 14.01   88.177566          44   1996322 strchr
 13.99   88.058532          44   1999451 free  
  0.62    3.915208          59     65835 read   
  0.45    2.844029          43     65841 mbrtowc  
...

My original assumption was that ZSH would “understand” that I am simply adding a new element at the end and thus avoid copying. To be honest, I haven’t even suspected this part when I had started investigating this. However, clearly, that’s the culprit. It seems that in the slow version the Z shell is constantly allocating, copying, and freeing memory which takes most of the time – more than 50% of the time. 😱

Truth be told, I have copied this code at first from StackOverflow and adapted it to my own needs:

https://stackoverflow.com/questions/54309712/zsh-doesnt-autocomplete-correctly-my-ssh-command

So, this is yet another reminder not to have too many presumptions about how something works and that snippets of code on the Internet are not necessarily of the best quality.

Strictly speaking, we couldn’t probably even say that we are appending an array in the slower version because the += operator is not being used. On the other hand, the effect is the same so I think it is OK to use the word append here in the general sense.

All in all, you should always use += when appending elements to an array! I hope this helps! Let me know if you have any comments or suggestions!

Confusion With File Redirection And sudo

Introduction

I have noticed that newbies sometimes fail to understand why, for example, sudo echo “hi” > /tmp/test will create a file /tmp/test with whatever effective user ID and effective group ID the command is ran with. They fall into the trap thinking that root:root will be the owner of /tmp/test. However, that is certainly not true. This blog post will try to clarify why this happens and what the user can do to avoid this issue. This occurs due to peculiar parsing rules defined in POSIX. We will examine the standards which detail this behaviour, the source code of the popular shells bash/zsh, and some methods on how to avoid this problem with, for example, tee(1).

Why it happens

You, my dear reader, have to understand first of all that sudo is just a command, a binary just like any else. It is a special binary, though, because it runs whatever command you pass as arguments. However, the redirection part (> /tmp/test) is not part of the command. This special file redirection syntax is interpreted by the shell, not passed to sudo for it to be executed later. So the shell that you are running gets special instructions to run that command with file descriptor number 1 which will be opened to write to a newly created file /tmp/test.

Notice that at this point command has not been run yet. The shell, before starting the command, begins to prepare the first fd. The shell does this with whatever EUID/EGID that shell is running with so it creates that new file with not the root:root rights but with whatever EUID/EGID the shell is running with. Most shells probably prepare the file descriptor 1 with the dup*(2) family of functions or fcntl(2), and open(2) before doing a fork(2) and exec*(3) just after it.

This functionality is mandated by a section of POSIX: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_07. All POSIX-compatible shells follow that section. Obviously, some shells are not compatible with POSIX. This article targets POSIX-compatible shells, however. In either case, to find out all of the gory details of how file redirection works, either refer to that section, read the fine manual, and/or read the source code of the shell that you are using. The latter option gives you the most detail and is the most trustworthy whereas the former one is more accessible. I will not go on a tangent here and I will redirect you to this article that I wrote before if you want to find more about this.

Next we will review the code of two popular shells: GNU/Bash and the Z shell, to see how they realize this functionality in code.

How do the popular shells implement this

GNU/Bash uses the GNU/Bison project for specifying the input command syntax. The syntax file is written in the yacc format. You can find the special syntax described in the parse.y file, in the root directory of GNU/Bash source. There are two special types defined: redirection and redirection_list. Here is an excerpt from parse.y file which defines how redirecting stdout to a file works or how passing a file to stdin works:

redirection:	'>' WORD
			{
			  source.dest = 1;
			  redir.filename = $2;
			  $$ = make_redirection (source, r_output_direction, redir, 0);
			}
	|	'<' WORD 
			{
			  source.dest = 0;
			  redir.filename = $2;
			  $$ = make_redirection (source, r_input_direction, redir, 0);
			}
        | ...

And here is the excerpt which shows that the redirection list goes after a command:

command:	simple_command
			{ $$ = clean_simple_command ($1); }
	|	shell_command
			{ $$ = $1; }
	|	shell_command redirection_list
			{
			  ...
			}
	|	...

My point is that you can see that GNU/Bash really has a special syntax for file redirection, it is not passed to sudo. The shell command is separate from the redirection list. With the Z shell it is a bit different because there is no one single place where the grammar is defined. Instead, it has a big file dedicated for the hand-rolled parsing engine that is written in C. Thus, it will not be possible to present the whole file but here are the excerpts from the file which prove to you once again that the file redirection feature is indeed based on a special syntax, it is not passed to the actual binary as arguments. In the Z shell source code, Src/parse.c we can find this comment:

/*
 * cmd : { redir } ( for | case | if | while | repeat |
 * subsh | funcdef | time | dinbrack | dinpar | simple ) { redir }
 *
 * zsh_construct is passed through to par_subsh(), q.v.
 */

Indeed, redirection is parsed later in the function that is responsible for parsing the command:

static int
par_cmd(int *cmplx, int zsh_construct)
{
    int r, nr = 0;

    r = ecused;
    if (IS_REDIROP(tok)) {
        *cmplx = 1;
        while (IS_REDIROP(tok)) {
            nr += par_redir(&r, NULL);
        }
    }

    switch (tok) {
    case FOR:
          ...
    case FOREACH:
          ...
    case SELECT:
          ...
    case CASE:
          ...
    case IF:
          ...
    case WHILE:
          ...
    case UNTIL:
          ...
    case REPEAT:
          ...
    ...
    }

    if (IS_REDIROP(tok)) {
        *cmplx = 1;
        while (IS_REDIROP(tok))
            (void)par_redir(&r, NULL);
    }

    ...
    return 1;
}

As you can see, the redirections may be at either the end or the beginning of the command. Let me repeat again: file redirection is special syntax and it is not passed to the actual thing being run. The GNU/Bash shell supports the file redirection syntax at the beginning of the line as well but I have just decided to not include it for brevity.

So how to actually redirect output to a file as root?

I guess the most popular solution to this is to simply use a pipe to redirect output to a file. I am not sure but probably the program, tee(1), was made for this purpose. Or it was made as an extension of the tee system call but still it is the perfect tool to solve our issue. So, the solution to this problem would be:

echo "hi" | sudo tee /tmp/test >&-

This will at first (probably, depending on your set up) ask for your password. Then, “hi” will be written to a file descriptor which will get passed on to tee(1). Then, tee(1) will dump everything into /tmp/test. The >&- (or >/dev/null) is needed so that tee(1) would not output anything. tee(1), unfortunately, copies the content from standard input to standard output as well.

Another way to solve this is to pass more commands to run via root either by using the -s sudo option. For example, this works:

sudo -s <<EOF
exec >/tmp/test
echo "hi"
EOF

Or, run another shell using sudo which will redirect everything to /tmp/test:

sudo /bin/bash -c 'exec >/tmp/test; echo "hi"'

Or, run a script with sudo which will internally redirect stdout to /tmp/test:

cat >./script.sh <<EOF
exec >/tmp/test
echo "hi"
EOF
chmod +x ./script.sh
sudo ./script.sh

Obviously, there are more (complex) ways how you could achieve this but these are the main methods. You are free to adopt these examples to your own case. Please comment if you find any errors or if you want to add anything about this topic.