Why os.move() Sometimes Does Not Work And Why shutil.move() Is The Savior?

Recently I ran into an issue where this, for example, code fails:

import os

os.rename('/foo/a.txt', '/bar/b.txt')

Traceback (most recent call last):
File "", line 1, in 
OSError: [Errno 18] Invalid cross-device link

The documentation for the os module says that sometimes it might fail when the source and the destination are on different file-systems:

os.rename(srcdst*src_dir_fd=Nonedst_dir_fd=None)

Rename the file or directory src to dst. If dst is a directory, OSError will be raised. On Unix, if dst exists and is a file, it will be replaced silently if the user has permission. The operation may fail on some Unix flavors if src and dst are on different filesystems. If successful, the renaming will be an atomic operation (this is a POSIX requirement). On Windows, if dst already exists, OSError will be raised even if it is a file.

How could it fail? Renaming (moving) a file seems like such a rudimentary operation. Let’s try to investigate and find out the exact reasons…

Reason why it might fail

The fact that the move function is inside the os module implies that it uses the facilities provided by the operating system. As the Python documentation puts it:

This module provides a portable way of using operating system dependent functionality.

In this case, it (probably, depends on the implementation, obviously) uses the rename function in the C language because that what moving is and that in turn calls the rename system call. The system call is even kind of defined in the POSIX collection of standards. It is an extension of the function rename in the C standard but it sort of implies that an official system call exists as well. As the official text of POSIX says, rename can fail if:

[EXDEV] The links named by new and old are on different file systems and the implementation does not support links between file systems.

Even Linux does not support this so this error message is not that uncommon. As the rename(2) manual page says:

EXDEV oldpath and newpath are not on the same mounted filesystem. (Linux permits a filesystem to be mounted at multiple points, but rename() does not work across different mount points, even if the same filesystem is mounted on both.)

The curious case of Linux

The Linux kernel has, as we guessed, an official system call for renaming files. Actually, it even has a family of system calls related to this operation: renameat2, renameat, and rename. Internally, they all call the function sys_renameat2 with different actual parameters. And inside of it the code checks if src and dst are at the same mounted file-system. If not, –EXDEV is returned which is then propagated to the user-space program:

/* ... */
        error = -EXDEV;                                                                                                           
        if (old_path.mnt != new_path.mnt)                                                                                         
                goto exit2;  
/* ... */

Then the error is returned:

/* ... */
exit2:                                                                                                                            
        if (retry_estale(error, lookup_flags))                                                                                    
                should_retry = true;                                                                                              
        path_put(&new_path);                                                                                                      
        putname(to);                                                                                                              
exit1:                                                                                                                            
        path_put(&old_path);                                                                                                      
        putname(from);                                                                                                            
        if (should_retry) {                                                                                                       
                should_retry = false;                                                                                             
                lookup_flags |= LOOKUP_REVAL;                                                                                     
                goto retry;                                                                                                       
        }                                                                                                                         
exit:                                                                                                                             
        return error;                                                                                                             
}

This explicit check has been for forever in this function. Why is it there, though? I guess Linux just took an simpler (and more elegant, should I say) path here and added this constraint from the beginning just so that the code for the file-systems would not have to accompany for this case. The code of different file-systems is complex as it is right now. You could find the whole source code of this function in fs/namei.c.

Indeed, old_dir->i_op->rename() is later called in sys_renameat2old_dir is of type struct inode *, i_op is a pointer to a const struct inode_operations. That structure defines a bunch of pointers to functions that perform various operations with inodes. Then different file-systems define their own variable of type struct inode_operations and pass it to the kernel. It seems to me that it would be indeed a lot of work to make each file-systems rename() inode operation work with every other file-system. Plus, how would any file-system ensure future compatibility with other file-systems that the user could use by loading some custom made kernel module?

Fortunately, we could implement renaming files by other means, not just by directly calling the rename system call. This is where shutil.move() comes in…

Difference between os.move() and shutil.move()

shutil.move() side-steps this issue by being a bit more high-level. Before moving it checks whether src and dst reside on the same file-system. The different thing is here what it does in the case where they do not. os.move() would blindly fall on its face here but shutil.move() is “smarter” and it does not just call the system call with the expectation that it will succeed. Instead, shutil.move() copies the content of the src file and writes it to the dst file. Afterwards, the src file is removed. As the Python documentation puts it:

If the destination is on the current filesystem, then os.rename() is used. Otherwise, src is copied (using shutil.copy2()) to dst and then removed.

So not only it copies the content of the file but shutil.copy2() ensures that the meta-data is copied as well.This presents an issue because the operation might get interrupted between the actual copying of the content and before src is removed so potentially you might end up with two copies of the same file. Thus, shutil.move() is the preferred solution to the original problem presented at the start of the article but however be wary of the possible problems and make sure that your code handles that case if it might pose a problem to your program.

2 thoughts on “Why os.move() Sometimes Does Not Work And Why shutil.move() Is The Savior?”

  1. Very good article. I ran exactly on this problem today and found here more than the a solution, found a good explanation about why i was getting the error.

    Thanks!

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.