Surprising Optimization When Appending Elements to an Array in ZSH

Recently I ran into a surprising ZSH performance bottleneck while editing my .zshrc. Because it gets loaded every time a new shell comes up, I noticed the issue pretty quickly and started digging into it. It turned out that the surprising behavior concerns appending new elements to an array. The file I was reading was not huge, only about 6000 elements or so, but it still had a significant impact on how fast the Z shell was able to start up.

Imagine a simple loop such as this:

#!/bin/zsh
while read line; do
  echo "${line}"
done < ~/somefile

And ~/somefile has been prepared with:

#!/bin/zsh
for i in {1..8873}; do
    # Take 32 random alphanumeric characters from /dev/urandom
    UUID=$(tr -dc 'a-zA-Z0-9' < /dev/urandom | fold -w 32 | head -n 1)
    echo "${UUID}" >> ~/somefile
done

Then, let’s read the same file into an array with two different scripts:

#!/bin/zsh
# testcase.sh: rebuild the whole array on every iteration
arr=()
while read line; do
    arr=(${arr} ${line})
done < ~/somefile

#!/bin/zsh
# testcase_fast.sh: append with +=
arr=()
while read line; do
    arr+=(${line})
done < ~/somefile

Can you guess which one is faster? If your answer is the second one, then you are correct. At least on my machine:

$ time ~/testcase_fast.sh
~/testcase_fast.sh  0,17s user 0,11s system 98% cpu 0,279 total
$ time ~/testcase.sh
~/testcase.sh  15,68s user 0,95s system 98% cpu 16,881 total

A simple ltrace(1) comparison reveals where the time is actually spent. First, the fast version:

% time     seconds  usecs/call     calls      function
------ ----------- ----------- --------- --------------------
 34.49   17.657872          60    292810 read  
 25.07   12.833696          43    292815 mbrtowc
  8.41    4.304812          60     71076 sigprocmask
  7.80    3.994101          43     91362 strlen     
  6.99    3.580468          43     82299 strcpy
  4.04    2.066031          43     47979 malloc
  3.78    1.936972          42     45209 free
...

And the “slow” version (not the full run, since it takes a very long time, but the picture is clear):

% time     seconds  usecs/call     calls      function
------ ----------- ----------- --------- --------------------
 28.21  177.575768          44   4006596 strlen     
 28.11  176.938217          44   4000420 strcpy
 14.16   89.127826          44   2002599 malloc
 14.01   88.177566          44   1996322 strchr
 13.99   88.058532          44   1999451 free  
  0.62    3.915208          59     65835 read   
  0.45    2.844029          43     65841 mbrtowc  
...

My original assumption was that ZSH would “understand” that I am simply adding a new element at the end and thus avoid copying. To be honest, I didn’t even suspect this part when I started investigating. However, that is clearly the culprit. In the slow version the Z shell is constantly allocating, copying, and freeing memory: strlen, strcpy, malloc, strchr, and free together account for about 98% of the traced time. 😱
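To get a feel for how quickly that copying adds up, here is a rough back-of-the-envelope sketch. It assumes that every rebuild copies each already stored element exactly once, which is my simplification and not something I verified in the ZSH source. The function name copies_for_rebuild is made up for this illustration, and the code is written so that it runs under both bash and zsh.

```shell
# Count how many existing elements get re-copied while building an array
# of n elements by full reassignment (arr=(${arr} ${line})).
# Assumption: one copy per already stored element per iteration.
copies_for_rebuild() {
  local n=$1 total=0 i
  for (( i = 0; i < n; i++ )); do
    total=$(( total + i ))   # iteration i re-copies the i elements stored so far
  done
  echo "$total"
}

copies_for_rebuild 8873   # prints 39360628
```

Under the same assumption, the += version copies nothing that is already in the array, so the work grows linearly with the number of lines instead of quadratically.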

Truth be told, I have copied this code at first from StackOverflow and adapted it to my own needs:

https://stackoverflow.com/questions/54309712/zsh-doesnt-autocomplete-correctly-my-ssh-command

So, this is yet another reminder not to have too many presumptions about how something works and that snippets of code on the Internet are not necessarily of the best quality.

Strictly speaking, we probably can’t even say that we are appending to an array in the slower version because the += operator is not being used. On the other hand, the effect is the same, so I think it is OK to use the word “append” here in the general sense.
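A small sketch can back up the claim that the effect is the same. The helper names build_by_rebuild and build_by_append and the three-line sample file are hypothetical, made up for this illustration. I also quoted the expansions so that the snippet behaves the same in bash and zsh, which differs slightly from the unquoted ZSH-only originals.

```shell
# Both helpers fill the global array "arr" from the file given as $1.
build_by_rebuild() {
  arr=()
  while read -r line; do
    arr=("${arr[@]}" "$line")   # re-expands every existing element each time
  done < "$1"
}

build_by_append() {
  arr=()
  while read -r line; do
    arr+=("$line")              # adds a single element at the end
  done < "$1"
}

# Tiny stand-in for ~/somefile
sample=$(mktemp)
printf '%s\n' alpha beta gamma > "$sample"

build_by_rebuild "$sample"; rebuilt="${arr[*]}"
build_by_append "$sample";  appended="${arr[*]}"
rm -f "$sample"

[ "$rebuilt" = "$appended" ] && echo "same contents: $appended"
```

The resulting arrays are identical, so the only difference between the two styles is how much work the shell does to get there.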

All in all, you should always use += when appending elements to an array! I hope this helps! Let me know if you have any comments or suggestions!

Equivocal Puppet Class Parameters

While working with Puppet recently, I noticed that there is some funny business going on with the rules of parameter naming. Puppet’s documentation states:

Parameter names begin with a dollar sign prefix ($). The parameter name after the prefix:

Must begin with a lowercase letter.

Can include lowercase letters.

Can include digits.

Can include underscores.

Let’s see if this is true. I tried applying this manifest with Puppet 5.5.10, which is available on Ubuntu Focal Fossa (20.04):

class test(
  String $_content
){
  file {'/tmp/helloworld':
    content =>  "${_content}\n",
  }
}


class { 'test':
  _content => "123"
}

And, as expected, it does not work:

$ puppet apply manifest/hellofile.pp
Error: Could not parse for environment production: Syntax error at '_content' (file: /home/gstatkevicius/dev/puppet_testing/manifest/hellofile.pp, line: 11, column: 3) on node gstatkevicius-desktop

Now, let’s try creating a simple hiera configuration:

---
:backends:
  - yaml
:yaml:
  :datadir: /home/gstatkevicius/dev/puppet_testing/hieradata
:hierarchy:
  - test
:logger: console

$ cat hieradata/test.yaml
test::_a: "Hello World!"

It looks like the hiera part works:

$ hiera -c ./hiera.yaml -d test::_a
DEBUG: 2020-11-03 23:39:00 +0200: Hiera YAML backend starting
DEBUG: 2020-11-03 23:39:00 +0200: Looking up test::_a in YAML backend
DEBUG: 2020-11-03 23:39:00 +0200: Looking for data source test
DEBUG: 2020-11-03 23:39:00 +0200: Found test::_a in test
Hello World!

Now let’s see if applying the manifest with Puppet works:

$ puppet apply --hiera_config=hiera.yaml manifest/hellofile.pp
Notice: Applied catalog in 0.01 seconds
$ cat /tmp/helloworld
Hello World!

Oh, so now everything is OK? It seems that for APL (automatic parameter lookup) the rules are a bit different. My guess at this point is that such parameters are treated as regular variables instead. I personally haven’t found a way to pass an argument explicitly when instantiating a class where one parameter starts with an underscore. Thus, I think we can formulate one lesson:

To prevent your class from ever being instantiated by other classes in Puppet with explicit arguments, start at least one class parameter with an underscore.

But, one question remains – why is it actually considered a syntax error? What makes the underscore character forbidden in names of class parameters whereas it works for regular variables?

Now, I’m not an expert in how Puppet parsing works, but let’s take a short trip down Puppet’s code-base and see what’s happening.

A quick grep for “Syntax error at” shows that there is some kind of SYNTAX_ERROR function that gets used whenever a message about a syntax error needs to be printed.

Digging a bit deeper, it seems that the parser is generated from a grammar. Puppet developers have kindly documented this process for us in docs/parser_work.md. From there, we are finally able to find the grammar in lib/puppet/pops/parser/egrammar.ra.

A valid variable name seems to be expressed here in lexer2.rb:

PATTERN_DOLLAR_VAR     = %r{\$(::)?(\w+::)*\w+}

I believe that what is a valid argument passed to a class is defined here:

#---ATTRIBUTE OPERATIONS (Not an expression)
#
attribute_operations
  :                                                { result = [] }
  | attribute_operation                            { result = [val[0]] }
  | attribute_operations COMMA attribute_operation { result = val[0].push(val[2]) }

  # Produces String
  # QUESTION: Why is BOOLEAN valid as an attribute name?
  #
  attribute_name
    : NAME
    | keyword

A valid NAME is defined here in lib/puppet/pops/patterns.rb:

NAME = %r{\A((::)?[a-z]\w*)(::[a-z]\w*)*\z}

So, it seems like the argument’s name is rejected because it does not follow this regular expression even though it is accepted by the lexer. To be fair, Puppet’s documentation also states:
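The mismatch between the two patterns can be checked from the shell. The sketch below is only an approximation: I translated the Ruby regular expressions into POSIX extended syntax for grep -E by writing \w as [A-Za-z0-9_] and replacing \A and \z with ^ and $, and I anchored the variable pattern so that it has to match the whole string. The helper name matches is made up for this illustration.

```shell
# ASCII translations of the two Ruby patterns quoted above (approximation).
name_re='^((::)?[a-z][A-Za-z0-9_]*)(::[a-z][A-Za-z0-9_]*)*$'
var_re='^\$(::)?([A-Za-z0-9_]+::)*[A-Za-z0-9_]+$'

# usage: matches REGEX STRING; succeeds if STRING matches REGEX
matches() {
  printf '%s\n' "$2" | grep -Eq -- "$1"
}

for candidate in content _content; do
  if matches "$name_re" "$candidate"; then name_ok=yes; else name_ok=no; fi
  if matches "$var_re" "\$$candidate"; then var_ok=yes; else var_ok=no; fi
  echo "$candidate: attribute name=$name_ok, variable=$var_ok"
done
```

With these translated patterns, content passes both checks, while _content is fine as a variable but fails as an attribute name, which lines up with the syntax error shown earlier.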

CAUTION: In some cases, names containing unsupported characters might still work. Such cases are bugs and could cease to work at any time. Removal of these bug cases is not limited to major releases.

All in all, it’s probably best to follow the letter of the law laid out in Puppet’s documentation, but if you want to prevent the users of your class from passing arguments explicitly, start one of them with an underscore 🙃.