Prevent visit method from dying on junctions in Windows #160

Open
ghost opened this issue Dec 6, 2015 · 2 comments

Comments

ghost commented Dec 6, 2015

In Windows, NTFS supports directory junctions, which are implemented as a kind of reparse point. They always point to directories and behave much like directory symlinks in Unix. Unfortunately, when the visit method encounters a junction, it tries to open it as a regular directory and dies:

Error opendir on '/Users/LushKava/Application Data': Invalid argument at backup line 121.

This is particularly awkward when traversing one's home directory because Windows makes extensive use of junctions, for reasons of backward compatibility. There are two potential solutions:

  1. Follow junctions as if they were directory symlinks (where follow_symlinks => 1)
  2. Ignore junctions altogether

It is rare for a Windows user to create a junction of his or her own accord, especially given that NTFS also supports genuine file/directory symlinks. In practice, junctions exist only as a form of backward compatibility goop. Therefore, I don't think that ignoring them would be so bad. In fact, I needed to do exactly that in a backup tool that I am writing.

Note that, while Win32::Symlink exists on CPAN, it appears to have some serious flaws. Firstly, despite the name, it doesn't appear to support actual symlinks. Secondly, its readlink function often returns undef for perfectly valid junctions.

So, I decided to go for the second option. I did it by creating a function that can determine whether a given path is actually a junction.

use v5.10;    # enables the 'state' feature used in is_junction
use strict;
use warnings;
use Path::Tiny;

for my $p (glob 'C:/Users/LushKava/*') {
    print is_junction($p) ? "* $p\n" : "  $p\n";
}

# True if $dir is a junction (or a symlink, since dir /AL lists both).
sub is_junction {
    my ($dir) = @_;
    state $last_parent;
    state $junction_by;
    my $path = path($dir);
    if (! $path->is_dir || $path->is_rootdir) {
        return 0;
    }
    if (! defined $last_parent || $path->parent ne $last_parent) {
        $junction_by = { map { $_ => 1 } list_junctions($path->parent) };
        $last_parent = $path->parent;
    }
    return exists $junction_by->{$path->basename};
}

# Returns the names of all junctions and symlinks directly inside $dir.
sub list_junctions {
    my ($dir) = @_;
    my $path = path($dir);
    if (! $path->is_dir) {
        return ();
    }
    my $cmd = sprintf 'dir /AL /B "%s" 2>&1', $path->canonpath;
    my @lines = `$cmd`;
    chomp @lines;
    if ($? >> 8) {
        if (@lines && $lines[0] eq 'File Not Found') {
            return ();
        } else {
            die "Failed to execute: $cmd";
        }
    }
    return @lines;
}

There are a few things to note here. Firstly, the /AL switch ensures that only junctions and symlinks are listed. Secondly, if none are listed, it is normal for the command to return an exit status of 1 and print "File Not Found" to STDERR. Thirdly, is_junction caches the results of list_junctions, only updating the cache if asked to check a path whose parent is different from the last time that it was called. This speeds it up significantly during recursion, without unduly wasting memory.

Further, it would be trivial to adapt this code so as to map the targets of the junctions/links, in case one wanted to properly support the follow_symlinks option.
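
As a rough sketch of that adaptation: without the /B switch, dir /AL prints each entry with a type tag and its target in square brackets, along the lines of "<JUNCTION>     Application Data [C:\Users\LushKava\AppData\Roaming]". The helpers below (parse_dir_al_line and map_link_targets are made-up names) pull the name and target out of such lines. The column layout is an assumption based on English-locale output, so treat the regex as illustrative rather than robust.

```perl
use strict;
use warnings;

# Parse one line of plain `dir /AL` output (no /B switch).
# Returns (type, name, target), or an empty list if the line is not a link entry.
sub parse_dir_al_line {
    my ($line) = @_;
    # Assumed layout: "<date> <time> <JUNCTION|SYMLINKD|SYMLINK> name [target]"
    if ($line =~ /<(JUNCTION|SYMLINKD|SYMLINK)>\s+(.+?)\s+\[(.*)\]\s*$/) {
        return ($1, $2, $3);
    }
    return;
}

# Build a name => target map from a list of output lines,
# silently skipping headers, totals, and anything else that does not match.
sub map_link_targets {
    my (@lines) = @_;
    my %target_of;
    for my $line (@lines) {
        my ($type, $name, $target) = parse_dir_al_line($line) or next;
        $target_of{$name} = $target;
    }
    return \%target_of;
}
```

In list_junctions, this would mean dropping /B from the command and passing the captured lines to map_link_targets instead of returning them directly.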

In any case, the above approach is working reliably for me. Here is some sample output:

  C:/Users/LushKava/AppData
* C:/Users/LushKava/Application Data
  C:/Users/LushKava/Contacts
* C:/Users/LushKava/Cookies
  C:/Users/LushKava/Desktop
  C:/Users/LushKava/Documents
  C:/Users/LushKava/Downloads
  C:/Users/LushKava/Dropbox
  C:/Users/LushKava/Favorites
  C:/Users/LushKava/Links
* C:/Users/LushKava/Local Settings
  C:/Users/LushKava/Music
* C:/Users/LushKava/My Documents
* C:/Users/LushKava/NetHood
  C:/Users/LushKava/NTUSER.DAT
  C:/Users/LushKava/ntuser.dat.LOG1
  C:/Users/LushKava/ntuser.dat.LOG2
  C:/Users/LushKava/ntuser.ini
  C:/Users/LushKava/OneDrive
  C:/Users/LushKava/Pictures
* C:/Users/LushKava/PrintHood
* C:/Users/LushKava/Recent
  C:/Users/LushKava/Saved Games
  C:/Users/LushKava/Searches
* C:/Users/LushKava/SendTo
* C:/Users/LushKava/Start Menu
* C:/Users/LushKava/Templates
  C:/Users/LushKava/Videos

Ideally, I'd like to see something like this implemented as part of the visit method in Path::Tiny, so that I can go back to using it in Windows.

Contributor

xdg commented Dec 6, 2015

Let me see if I understand: on Win32 NTFS, a directory junction has -d true, and -l false, but can be resolved with readlink?

I wouldn't want to shell out to find reparse points. I'm more inclined to add Win32API::File as a prerequisite on Win32. I think it ships with ActiveState and Strawberry (would need to check that), so it's effectively a "core" Win32 module. It has GetFileAttributes that can detect a reparse point.
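
For illustration, a check along those lines might look like the sketch below. It assumes Win32API::File provides GetFileAttributes as described above. The helper names has_reparse_flag and is_reparse_point are hypothetical, and the two constants are the numeric Win32 values written out by hand rather than imported from the module. The reparse bit is set for symlinks as well as junctions, so this alone does not tell the two apart.

```perl
use strict;
use warnings;

# Win32 attribute values, written out by hand (assumption: not imported).
use constant FILE_ATTRIBUTE_REPARSE_POINT => 0x400;
use constant INVALID_FILE_ATTRIBUTES      => 0xFFFFFFFF;

# Pure bit test, kept separate so it can be exercised on any platform.
sub has_reparse_flag {
    my ($attrs) = @_;
    # Treat undef, -1, and 0xFFFFFFFF as "lookup failed".
    return 0 if !defined $attrs || $attrs < 0 || $attrs == INVALID_FILE_ATTRIBUTES;
    return ($attrs & FILE_ATTRIBUTE_REPARSE_POINT) ? 1 : 0;
}

# True if $path is a reparse point (junction or symlink) on Windows.
# Always false elsewhere, where -l already covers symlinks.
sub is_reparse_point {
    my ($path) = @_;
    return 0 unless $^O eq 'MSWin32';
    require Win32API::File;    # loaded lazily, only on Win32
    return has_reparse_flag(Win32API::File::GetFileAttributes($path));
}
```

Loading the module with require inside the Win32 branch keeps it out of the way on other platforms, which matches the idea of making it a prerequisite on Win32 only.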

In my quick reading about directory junctions, they seem more like symbolic links, so I'd like to treat them that way.

As an alternative, or in addition to the above, iterator and visit could possibly get an ignore_errors option that would skip directories that can't be opened.
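
To make that concrete, here is a sketch of the behaviour such an option might have, written with core opendir rather than Path::Tiny internals. ignore_errors is only the name floated above, not an existing option, and walk_dirs is a hypothetical helper. Skipped directories are returned to the caller so that failures are not lost entirely.

```perl
use strict;
use warnings;
use File::Spec;

# Breadth-first walk that calls $cb->($path) for every entry under $root.
# With ignore_errors => 1, directories that fail opendir are recorded and
# skipped instead of aborting the walk. Returns the list of skipped directories.
sub walk_dirs {
    my ($root, $cb, %opt) = @_;
    my @queue = ($root);
    my @skipped;
    while (defined(my $dir = shift @queue)) {
        my $dh;
        if (!opendir($dh, $dir)) {
            die "Error opendir on '$dir': $!" unless $opt{ignore_errors};
            push @skipped, $dir;
            next;
        }
        for my $name (sort grep { $_ ne '.' && $_ ne '..' } readdir $dh) {
            my $path = File::Spec->catfile($dir, $name);
            $cb->($path);
            # Do not descend into symlinks, mirroring follow_symlinks => 0.
            push @queue, $path if -d $path && !-l $path;
        }
        closedir $dh;
    }
    return @skipped;
}
```

A caller could then log or inspect the returned list, for example to confirm that only the expected junctions were skipped.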

Author

ghost commented Dec 6, 2015

A junction has -d true and -l false and cannot be resolved with readlink, as provided by the Strawberry Perl core. As mentioned, Win32::Symlink is unreliable and its own readlink method is not going to be a solution.

Alas, the same appears to be true of symlinks. I suppose that could be considered as a bug in Strawberry (I haven't tested ActiveState yet).

I don't know why I forgot about the existence of Win32API::File. It seems ideal so I'm going to test it on both symlinks and junctions and will let you know how that pans out.

While an ignore_errors option might have its own utility, I would not want to mask errors in order to silence this particular issue.
