A little over a year ago I got into a discussion about the rimraf package. At the time I was using a mix of Mac and Windows, and I was looking for a quick and easy way to remove the whole node_modules directory from an Angular project. My search led me to rimraf, which was touted as a cross-platform alternative to the traditional rm -rf, written in Node.
Why Delete node_modules Anyway?
The initial problem stemmed from the Angular project I was working on at the time. While updating packages, I ran into an issue with some sub-modules not being updated correctly, hence my need to wipe out node_modules and install afresh. The standard canned response for 90% of all npm issues is to delete node_modules and reinstall; reminiscent of the typical advice to "turn it off and on again" for basically any hardware issue. It's annoying, and it only deals with the symptoms without giving any idea what the actual cause is. However, it does work, so it's what I needed to do.
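For reference, the canned fix amounts to the following (a self-contained sketch: a stand-in node_modules is created so the snippet runs anywhere, and the npm install step is commented out since it needs a package.json and a network connection):

```shell
# Create a stand-in node_modules so the snippet is self-contained
mkdir -p node_modules/some-package
# The canned fix: wipe the directory, then reinstall from package.json
rm -rf node_modules
# npm install
test ! -d node_modules && echo "node_modules removed"
```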
The Initial Use
When I ran rimraf I noticed it was incredibly slow, much slower than running rm -rf in a Bash shell. Naturally, I wanted to know why, so I dove into the code for rimraf.
The code wasn't making use of native shell calls where it could for efficiency; instead, it was manually unlinking each file, one by one. It also assumed that every item it was dealing with was a file, in the full knowledge that this would fail in some situations (not always a bad thing, but read on).
This is all highlighted succinctly with this comment block found in the source code:
// Two possible strategies.
// 1. Assume it's a file. unlink it, then do the dir stuff on EPERM or EISDIR
// 2. Assume it's a directory. readdir, then do the file stuff on ENOTDIR
//
// Both result in an extra syscall when you guess wrong. However, there
// are likely far more normal files in the world than directories. This
// is based on the assumption that a the average number of files per
// directory is >= 1.
//
// If anyone ever complains about this, then I guess the strategy could
// be made configurable somehow. But until then, YAGNI.
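Strategy 1's "guessed wrong" branch is easy to reproduce from a shell: the unlink(2) syscall refuses to remove a directory (EISDIR on Linux; POSIX also allows EPERM), so the caller burns a syscall and then falls back to the directory code path. A minimal demonstration using the coreutils unlink(1) wrapper:

```shell
# unlink(1) is a thin wrapper around the unlink(2) syscall
mkdir -p demo_dir
if ! unlink demo_dir 2>/dev/null; then
    # The "guessed wrong" branch: one wasted syscall, then the directory path
    echo "unlink failed, falling back to rmdir"
    rmdir demo_dir
fi
```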
So, Why is This Bad?
I can understand the decision to assume each file (file in the Unix sense, where everything is a file, even a directory) really is a file being made in a very low-level language like C or C++, where a few extra syscalls won't make a huge difference overall; but for a language that is only JIT compiled, like JavaScript, those tens of thousands of extra calls cost.
To compound the speed problems, the JavaScript implementation is very similar to the C implementation of rm, where each file is unlinked one at a time. Again, this is fine when you're writing in C, but when your code isn't even fully compiled, you'll take a performance hit every time you try to implement the same algorithm.
The decision to do it this way seems pretty indicative, unfortunately, of the last couple of decades of JavaScript: if it can be done in JavaScript, it will be, even if there are better ways. The readme for the project even references a JavaScript implementation of mkdir -p, another shell tool that's been cross-platform for far longer.
What is the Alternative?
Given that rimraf is a lot slower than the most obvious alternative, rm -rf, the obvious solution would be to use that instead. Recalling my conversation last year, one main point raised was that rimraf was written specifically for cross-platform usage. I do wonder how useful that actually is: how many developers really work across both Unix-like and Windows operating systems, and, further, how many don't have Bash available on their system? Are there any Windows-based Node developers who forgo Bash for the standard Windows command line yet don't know how to use rd (the successor to deltree after that was removed)?
Regardless, that's not the actual issue. A cross-platform tool isn't necessarily a bad idea. It reduces cognitive load (you need only remember how to use one set of tools) and allows you to spend your time on more important tasks. What would be ideal is if the tool itself implemented the best solution depending on what it found available (although this is less of an issue these days, as Windows has plenty of good terminal applications that offer Bash as the shell).
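A sketch of what that kind of dispatch could look like in a POSIX shell (the nuke function name is invented for illustration; a real tool would also branch on the operating system):

```shell
# Hypothetical dispatcher: prefer the native rm -rf when available,
# fall back to a portable find-based delete otherwise
nuke() {
    if command -v rm >/dev/null 2>&1; then
        rm -rf "$1"
    else
        find "$1" -delete
    fi
}

mkdir -p scratch_modules/nested
nuke scratch_modules
test ! -d scratch_modules && echo "gone"
```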
Which Alternative is Best?
I wanted to test a few different methods of emptying out that node_modules directory for myself and measure the speeds. For my test, I ran this on an out-of-the-box Angular 6 install, on a 64-bit Linux system.
I tested three of the most common ways a recursive delete would be performed, alongside rimraf.
If you're interested, each test looked a little like this:
#!/bin/bash
for i in {1..20}
do
    npm install
    { time rm -rf node_modules 2>&1 ; } >> test_rmrf_times.txt 2>&1
done
Each iteration of the test runs an npm install and then performs my chosen delete action, all within a loop run 20 times, appending the output of the time call for each delete to a text log.
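Averaging the logged times is a one-liner in awk. A hedged sketch, assuming bash's built-in time format of "real 0m1.293s" (the external /usr/bin/time formats differently) and sub-minute runs, fed here with a small inline sample:

```shell
# Mean of the "real" lines from a bash time log (format assumed: real 0m1.293s)
printf 'real\t0m1.293s\nreal\t0m0.453s\n' > sample_times.txt
MEAN=$(awk '/^real/ { gsub(/^[0-9]+m|s$/, "", $2); sum += $2; n++ }
            END { printf "%.3f", sum / n }' sample_times.txt)
echo "$MEAN"   # prints: 0.873
rm -f sample_times.txt
```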
The Results
Test Number | rm -rf node_modules | rimraf node_modules | find node_modules/ -delete | find node_modules/ -type f -exec rm {} \; |
---|---|---|---|---|
1 | 1.293 | 3.756 | 5.432 | 23.954 |
2 | 0.453 | 5.208 | 1.008 | 26.685 |
3 | 4.704 | 4.227 | 2.59 | 27.864 |
4 | 0.612 | 3.691 | 3.775 | 46.974 |
5 | 4.959 | 3.638 | 3.002 | 50.665 |
6 | 3.01 | 4.464 | 3.971 | 40.942 |
7 | 1.71 | 1.626 | 3.748 | 27.362 |
8 | 1.338 | 1.677 | 0.671 | 23.362 |
9 | 2.325 | 2.576 | 2.457 | 23.984 |
10 | 3.21 | 1.568 | 3.459 | 25.658 |
11 | 2.722 | 3.9 | 1.541 | 30.9 |
12 | 1.401 | 1.584 | 4.004 | 29.204 |
13 | 0.561 | 1.56 | 0.526 | 35.986 |
14 | 0.799 | 1.629 | 0.526 | 26.417 |
15 | 0.553 | 1.566 | 6.827 | 34.995 |
16 | 1.698 | 1.968 | 4.977 | 39.723 |
17 | 1.066 | 3.237 | 2.065 | 38.732 |
18 | 1.135 | 4.267 | 5.523 | 33.835 |
19 | 1.234 | 10.52 | 1.725 | 40.108 |
20 | 1.823 | 2.173 | 1.163 | 24.921 |
Mean average | 1.8303 | 3.24175 | 2.9495 | 32.61355 |
Obviously the find that pipes each result through to rm is the slowest, and for good reason: it's doing the same sort of thing that rimraf does (building a list of all the files to delete and then passing them on to another command to be deleted), but it also spawns a separate rm process for every single file. Its advantage is that it offers the ability to filter files more specifically by things like date, size, and name. While that's not useful for removing node_modules, it is useful in many situations, so I've tested it here anyway for completeness.
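For example, find can restrict a delete to files matching a name pattern and a modification age, something rm -rf can't do on its own (this sketch uses GNU touch's -d flag to fake an old file):

```shell
# Set up two files, one "old" and one fresh
mkdir -p logs_demo
touch logs_demo/new.log
touch -d '10 days ago' logs_demo/old.log
# Delete only .log files modified more than 7 days ago
find logs_demo -name '*.log' -mtime +7 -delete
ls logs_demo   # prints: new.log
```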
Of the other three, rimraf performs the worst, although not really that far behind the find -delete method. I would expect the difference to be more pronounced on a larger Node project, rather than this barebones Angular install, but this isn't unexpected: the two are essentially doing the same thing in the same way.
The clear winner here, though, is the classic rm -rf, which is efficient and the same number of keypresses as rimraf. It's twice as fast and is available wherever you're running Bash (or a similar shell). Even if you're on Windows, you have plenty of options, from simple ones like Cygwin and Git Bash right up to the WSL on Windows 10.
The Conclusion
Without doubt, rm is the way to go. It's cross-platform, by far the fastest method tested, doesn't need anything else installed (unless you're on Windows, where you'd need to install a Bash-like shell, though you'd have needed to install Node to use rimraf anyway), and is a tool that anyone familiar with Bash already knows.
It's pretty obvious that rimraf is not a useful option, and wasn't really ever a better one. It has only existed for about seven years, whereas Bash for Windows has been available in one form or another for far longer. It's not really even clear why it exists. It's common to implement toy projects in languages they aren't really suited to, but rimraf was never treated as a toy, and is a dependency of nearly 7,000 other projects!
There's a problem in the Node world: things are being created with no thought as to why they should be. As developers, we need to write good code, and part of that is not reinventing wheels with flat tires. Another part is to stop using flat tires wherever possible.