npm-follower: A Complete Dataset Tracking the NPM Ecosystem

Nov 30, 2023 · Donald Pinckney, Federico Cassano, Arjun Guha, Jonathan Bell
Publication: ESEC/FSE 2023

Package managers and their massive open-source ecosystems are the foundation for how practical software is built. However, how developers share and consume packages at scale is not well understood. How do developers specify their dependencies? How do developers tag new releases of their own packages? Do common practices lead to issues in dependency management, such as out-of-date dependencies or security issues?

To answer these questions, we built a system that archives a replica of the entire NPM ecosystem in real time, including both package metadata and code (20+ TB). The dataset is now public and can be used for a wide variety of NPM dependency graph queries or big-code analyses. Our analysis of the data yields some surprising findings, including an asymmetry between how developers tag the updates they publish and how developers write dependency constraints.
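
To make the two sides of that asymmetry concrete, here is a minimal sketch of how one might label a published update (major, minor, or patch) and a dependency constraint (caret, tilde, exact, and so on). It is an illustration only, not the analysis pipeline or the schema of the dataset: the function names `parse_version`, `update_type`, and `constraint_kind` are hypothetical, the sample inputs are made up, and the parser assumes plain `MAJOR.MINOR.PATCH` versions with no pre-release or build tags.

```python
import re
from collections import Counter

# Assumption: only plain MAJOR.MINOR.PATCH versions are handled here.
SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def parse_version(version: str) -> tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of integers."""
    match = SEMVER.match(version.strip())
    if match is None:
        raise ValueError(f"not a plain MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def update_type(old: str, new: str) -> str:
    """Label the publisher's side: which component did a release bump?"""
    old_v, new_v = parse_version(old), parse_version(new)
    if new_v <= old_v:
        return "none"  # not a forward update
    if new_v[0] != old_v[0]:
        return "major"
    if new_v[1] != old_v[1]:
        return "minor"
    return "patch"


def constraint_kind(constraint: str) -> str:
    """Label the consumer's side: what shape of constraint was written?"""
    text = constraint.strip()
    if text in ("", "*"):
        return "any"    # matches every version
    if text.startswith("^"):
        return "caret"  # e.g. ^1.2.3 allows >=1.2.3 <2.0.0
    if text.startswith("~"):
        return "tilde"  # e.g. ~1.2.3 allows >=1.2.3 <1.3.0
    if SEMVER.match(text):
        return "exact"  # pinned to a single version
    return "other"      # ranges, tags, URLs, and anything else


# Made-up inputs, purely to show the shape of the two tallies.
releases = [("1.2.3", "1.2.4"), ("1.2.4", "1.3.0"), ("1.3.0", "2.0.0")]
constraints = ["^1.2.3", "~1.2.3", "1.2.3", "*", ">=1.0.0 <2.0.0"]

update_counts = Counter(update_type(old, new) for old, new in releases)
constraint_counts = Counter(constraint_kind(c) for c in constraints)
```

Comparing the two tallies over real data is the kind of question the dataset is meant to support. A real analysis would need a full semver range parser rather than the prefix checks used here.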