Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(resolution): Fix resolution of querystringed imports in ESM files #322

Open
wants to merge 11 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -4,3 +4,4 @@ out
coverage
test/**/dist
test/**/actual.js
test/unit/cjs-querystring/who?what?idk!.js
97 changes: 76 additions & 21 deletions src/node-file-trace.ts
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,30 @@ const fsReadFile = fs.promises.readFile;
const fsReadlink = fs.promises.readlink;
const fsStat = fs.promises.stat;

type ParseSpecifierResult = {
path: string;
queryString: string | null
}

// Splits an ESM specifier into path and querystring (including the leading `?`). (If the specifier is CJS,
// it is passed through untouched.)
export function parseSpecifier(specifier: string, cjsResolve: boolean = true): ParseSpecifierResult {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
export function parseSpecifier(specifier: string, cjsResolve: boolean = true): ParseSpecifierResult {
export function parseSpecifier(specifier: string, cjsResolve: boolean): ParseSpecifierResult {

Lets not use a default param or else the caller might forget

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we need the default to be there, because the first time this is called (at the top of analyzeAndEmitDependency when it's called by emitDependency), we don't have a cjsResolve value. Having it default to true works in that case, though, because that only happens at the root of the tree, when nodeFileTrace calls emitDependency on an actual file (not an import specifier), which we know won't have a querystring.

let path = specifier;
let queryString = null;

if (!cjsResolve) {
// Regex which splits a specifier into path and querystring, inspired by that in `enhanced-resolve`
// https://github.com/webpack/enhanced-resolve/blob/157ed9bcc381857d979e56d2f20a5a17c6362bff/lib/util/identifier.js#L8
const match = /^(#?(?:\0.|[^?#\0])*)(\?(?:\0.|[^\0])*)?$/.exec(specifier);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks slightly different from the webpack regex.

Can we instead use url.parse() or new URL() to safely parse?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks slightly different from the webpack regex.

Can we instead use url.parse() or new URL() to safely parse?

In my testing, node itself seemed to treat import * as something from './someFile.js?things#stuff, import * as something from './someFile.js?things, and import * as something from './someFile.js the same way, namely that it stripped the ? and everything after it off before trying to resolve the file. I therefore wanted to do the same thing. Before landing on this regex (which is the original regex, minus treating the fragment as a separate entity - see the screenshot below), I considered:

  • Using indexOf on '?' and then using slice to cut the string off at that point. Straightforward, but brute-force-y, and I've had brute force bite me enough times when handling edge cases I hadn't thought of that I wanted to out-source it to something purpose-built and presumably battle-tested.

  • Using url.parse. This splits the querystring and fragment off into separate parts I'd just have to put back together again. Also, my linter got mad because it's deprecated.

  • Using new URL(). This similarly splits the querystring and fragment off separately. It also errors out when given a path without http:// (or similar) at the beginning.

None of them seemed ideal, and so I decided to see how webpack does it (since I was in the nextjs nft webpack plugin with the debugger already), figuring that the most likely context for someone having the querystring problem in the first place was them using a webpack loader with options. After a lot of stepping through code, eventually I landed in the enhanced-resolve file I linked to in the comment. As I mentioned above, you're right that it's been changed, but only to combine the querystring and fragment into a single group.

image

(I gave the regex I'm using above a name the same length as the original regex's name in enhanced-resolve so it's easier to compare.)

I can certainly make url.parse or new URL() work if you'd like, but given the above IMHO the regex is still the best option. WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

url.parse() is not really deprecated because its the only safe way to parse a relative url since new URL() only works for absolute urls. The docs mention the deprecation was revoked
image

Also, I trust node's own url parsing more than a modified regex which might suffer from ReDoS.

Now that there are tests, we should be able to swap this implementation between regex, url.parse() or new URL() easily.

if (match) {
path = match[1]
queryString = match[2]
}
}

return {path, queryString};
}

function inPath (path: string, parent: string) {
const pathWithSep = join(parent, sep);
return path.startsWith(pathWithSep) && path !== pathWithSep;
Expand Down Expand Up @@ -215,19 +239,21 @@ export class Job {
}

private maybeEmitDep = async (dep: string, path: string, cjsResolve: boolean) => {
// Only affects ESM dependencies
const { path: strippedDep, queryString = '' } = parseSpecifier(dep, cjsResolve)
let resolved: string | string[] = '';
let error: Error | undefined;
try {
resolved = await this.resolve(dep, path, this, cjsResolve);
} catch (e1: any) {
error = e1;
try {
if (this.ts && dep.endsWith('.js') && e1 instanceof NotFoundError) {
if (this.ts && strippedDep.endsWith('.js') && e1 instanceof NotFoundError) {
// TS with ESM relative import paths need full extensions
// (we have to write import "./foo.js" instead of import "./foo")
// See https://www.typescriptlang.org/docs/handbook/esm-node.html
const depTS = dep.slice(0, -3) + '.ts';
resolved = await this.resolve(depTS, path, this, cjsResolve);
const strippedDepTS = strippedDep.slice(0, -3) + '.ts';
resolved = await this.resolve(strippedDepTS + queryString, path, this, cjsResolve);
error = undefined;
}
} catch (e2: any) {
Expand All @@ -240,16 +266,19 @@ export class Job {
return;
}

if (Array.isArray(resolved)) {
for (const item of resolved) {
// ignore builtins
if (item.startsWith('node:')) return;
await this.emitDependency(item, path);
// For simplicity, force `resolved` to be an array
resolved = Array.isArray(resolved) ? resolved : [resolved];
for (let item of resolved) {
// ignore builtins for the purposes of both tracing and querystring handling (neither Node
// nor Webpack can handle querystrings on `node:xxx` imports).
if (item.startsWith('node:')) return;

// If querystring was stripped during resolution, restore it
if (queryString && !item.endsWith(queryString)) {
item += queryString;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the query string exist on node built-ins?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, as it turns out. I did some testing:

(Note that this is all for ESM imports, since we know that CJS just treats ? like any other character, and there are no built-ins with ? in their name.)

  • For node on its own:

    • Querystrings on relative imports (import * as file1 from './file1.js?things') work (they get stripped).
    • Querystrings on regular node modules (import * as semver from 'semver?stuff') error out with Error [ERR_MODULE_NOT_FOUND]: Cannot find package 'semver?stuff'.
    • Querystrings on un-prefixed built-ins (import * as fs from 'fs?morestuff') error out with Error [ERR_MODULE_NOT_FOUND]: Cannot find package 'fs?morestuff'. (In other words, it doesn't recognize it as a built-in.)
    • Querystrings on prefixed built-ins (import * as fs from 'node:fs?morestuff') error out with Error [ERR_UNKNOWN_BUILTIN_MODULE]: No such built-in module: node:fs?morestuff.
  • For a webpack build:

    • Querystrings on relative imports (import * as file1 from './file1.js?things') work (they get stripped).
    • Querystrings on regular node modules (import * as semver from 'semver?stuff') work (they get stripped).
    • Querystrings on un-prefixed built-ins (import * as fs from 'fs?morestuff') error out with Module not found: Can't resolve 'fs?morestuff'. (In other words, it doesn't recognize it as a built-in, strips it, and then errors when it doesn't find it in node_modules.)
    • Querystrings on prefixed built-ins (import * as fs from 'node:fs?morestuff') error out with Error [ERR_UNKNOWN_BUILTIN_MODULE]: No such built-in module: node:fs?morestuff.

So not something we have to worry about. Might be worth saying so in a comment, though. I'll do that.

} else {
// ignore builtins
if (resolved.startsWith('node:')) return;
await this.emitDependency(resolved, path);

await this.analyzeAndEmitDependency(item, path, cjsResolve);
}
}

Expand Down Expand Up @@ -343,13 +372,23 @@ export class Job {
}

async emitDependency (path: string, parent?: string) {
if (this.processed.has(path)) {
return this.analyzeAndEmitDependency(path, parent)
}

private async analyzeAndEmitDependency(rawPath: string, parent?: string, cjsResolve?: boolean) {

// Strip the querystring, if any. (Only affects ESM dependencies.)
const { path } = parseSpecifier(rawPath, cjsResolve)

// Since different querystrings may lead to different results, include the full path
// when noting whether or not we've already seen this path
if (this.processed.has(rawPath)) {
if (parent) {
await this.emitFile(path, 'dependency', parent)
}
return
};
this.processed.add(path);
this.processed.add(rawPath);

const emitted = await this.emitFile(path, 'dependency', parent);
if (!emitted) return;
Expand All @@ -368,18 +407,34 @@ export class Job {

let analyzeResult: AnalyzeResult;

const cachedAnalysis = this.analysisCache.get(path);
// Since different querystrings may lead to analyses, include the full path when caching
const cachedAnalysis = this.analysisCache.get(rawPath);
if (cachedAnalysis) {
analyzeResult = cachedAnalysis;
}
else {
const source = await this.readFile(path);
if (source === null) throw new Error('File ' + path + ' does not exist.');
// By default, `this.readFile` is the regular `fs.readFile`, but a different implementation
// can be specified via a user option in the main `nodeFileTrace` entrypoint. Depending on
// that implementation, having a querystring on the end of the path may either a) be necessary
// in order to specify the right module from which to read, or b) lead to an `ENOENT: no such
// file or directory` error because it thinks the querystring is part of the filename. We
// therefore try it with the querystring first, but have the non-querystringed version as a
// fallback. (If there's no `?` in the given path, `rawPath` will equal `path`, so order is a
// moot point.)
const source = await this.readFile(rawPath) || await this.readFile(path) ;

if (source === null) {
const errorMessage = path === rawPath
? 'File ' + path + ' does not exist.'
: 'Neither ' + path + ' nor ' + rawPath + ' exists.'
throw new Error(errorMessage);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there is a test missing for this error message

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was missing for the original version of the error message also. If you run coverage on main, none of the errors are tested. (I think possibly because with the current two kinds of tests, it's not obvious how one would.)

image

}

// analyze should not have any side-effects e.g. calling `job.emitFile`
// directly as this will not be included in the cachedAnalysis and won't
// be emit for successive runs that leverage the cache
analyzeResult = await analyze(path, source.toString(), this);
this.analysisCache.set(path, analyzeResult);
this.analysisCache.set(rawPath, analyzeResult);
}

const { deps, imports, assets, isESM } = analyzeResult;
Expand All @@ -393,12 +448,12 @@ export class Job {
const ext = extname(asset);
if (ext === '.js' || ext === '.mjs' || ext === '.node' || ext === '' ||
this.ts && (ext === '.ts' || ext === '.tsx') && asset.startsWith(this.base) && asset.slice(this.base.length).indexOf(sep + 'node_modules' + sep) === -1)
await this.emitDependency(asset, path);
await this.analyzeAndEmitDependency(asset, path, !isESM);
else
await this.emitFile(asset, 'asset', path);
}),
...[...deps].map(async dep => this.maybeEmitDep(dep, path, !isESM)),
...[...imports].map(async dep => this.maybeEmitDep(dep, path, false)),
...[...deps].map(async dep => this.maybeEmitDep(dep, rawPath, !isESM)),
...[...imports].map(async dep => this.maybeEmitDep(dep, rawPath, false)),
]);
}
}
7 changes: 6 additions & 1 deletion src/resolve-dependency.ts
Original file line number Diff line number Diff line change
@@ -1,11 +1,16 @@
import { isAbsolute, resolve, sep } from 'path';
import { Job } from './node-file-trace';
import { Job, parseSpecifier } from './node-file-trace';

// node resolver
// custom implementation to emit only needed package.json files for resolver
// (package.json files are emitted as they are hit)
export default async function resolveDependency (specifier: string, parent: string, job: Job, cjsResolve = true): Promise<string | string[]> {
let resolved: string | string[];

// ESM imports are allowed to have querystrings, but the native Node behavior is to ignore them when doing
// file resolution, so emulate that here by stripping any querystring off before continuing
specifier = parseSpecifier(specifier, cjsResolve).path

if (isAbsolute(specifier) || specifier === '.' || specifier === '..' || specifier.startsWith('./') || specifier.startsWith('../')) {
const trailingSlash = specifier.endsWith('/');
resolved = await resolvePath(resolve(parent, '..', specifier) + (trailingSlash ? '/' : ''), parent, job);
Expand Down
55 changes: 45 additions & 10 deletions test/unit.test.js
Original file line number Diff line number Diff line change
@@ -1,23 +1,39 @@
const fs = require('fs');
const { join, relative } = require('path');
const { join, relative, sep } = require('path');
const { nodeFileTrace } = require('../out/node-file-trace');

global._unit = true;

const skipOnWindows = ['yarn-workspaces', 'yarn-workspaces-base-root', 'yarn-workspace-esm', 'asset-symlink', 'require-symlink'];
const skipOnWindows = ['yarn-workspaces', 'yarn-workspaces-base-root', 'yarn-workspace-esm', 'asset-symlink', 'require-symlink', 'cjs-querystring'];
const unitTestDirs = fs.readdirSync(join(__dirname, 'unit'));
const unitTests = [
...unitTestDirs.map(testName => ({testName, isRoot: false})),
...unitTestDirs.map(testName => ({testName, isRoot: true})),
...unitTestDirs.map(testName => ({testName, isRoot: false})),
];

for (const { testName, isRoot } of unitTests) {
const testSuffix = `${testName} from ${isRoot ? 'root' : 'cwd'}`;
if (process.platform === 'win32' && (isRoot || skipOnWindows.includes(testName))) {
console.log(`Skipping unit test on Windows: ${testSuffix}`);
continue;
};
const unitPath = join(__dirname, 'unit', testName);

if (process.platform === 'win32') {
if (isRoot || skipOnWindows.includes(testName)) {
console.log(`Skipping unit test on Windows: ${testSuffix}`);
continue;
}
} else {
if (testName === 'cjs-querystring') {
// Create (a git-ignored copy of) the file we need, since committing it
// breaks CI on Windows. See https://github.com/vercel/nft/pull/322.
const currentFilepath = join(unitPath, 'noPunctuation', 'whowhatidk.js');
const newFilepath = currentFilepath.replace(
'noPunctuation' + sep + 'whowhatidk.js',
'who?what?idk!.js'
);
if (!fs.existsSync(newFilepath)) {
fs.copyFileSync(currentFilepath, newFilepath);
}
}
}

it(`should correctly trace ${testSuffix}`, async () => {

Expand All @@ -41,6 +57,25 @@ for (const { testName, isRoot } of unitTests) {
return null
}
}

// mock an in-memory module store (such as webpack's) where the same filename with
// two different querystrings can correspond to two different modules, one importing
// the other
if (testName === 'querystring-self-import') {
if (id.endsWith('input.js') || id.endsWith('base.js') || id.endsWith('dep.js')) {
return fs.readFileSync(id).toString()
}

if (id.endsWith('base.js?__withQuery')) {
return `
import * as origBase from './base';
export const dogs = origBase.dogs.concat('Cory', 'Bodhi');
export const cats = origBase.cats.concat('Teaberry', 'Sassafras', 'Persephone');
export const rats = origBase.rats;
`;
}
}

return this.constructor.prototype.readFile.apply(this, arguments);
});

Expand All @@ -67,8 +102,8 @@ for (const { testName, isRoot } of unitTests) {
if (testName === 'multi-input') {
inputFileNames.push('input-2.js', 'input-3.js', 'input-4.js');
}
const { fileList, reasons } = await nodeFileTrace(

const { fileList, reasons, warnings } = await nodeFileTrace(
inputFileNames.map(file => join(unitPath, file)),
{
base: isRoot ? '/' : `${__dirname}/../`,
Expand Down Expand Up @@ -193,7 +228,7 @@ for (const { testName, isRoot } of unitTests) {
expect(sortedFileList).toEqual(expected);
}
catch (e) {
console.warn(reasons);
console.warn({reasons, warnings});
fs.writeFileSync(join(unitPath, 'actual.js'), JSON.stringify(sortedFileList, null, 2));
throw e;
}
Expand Down
7 changes: 7 additions & 0 deletions test/unit/cjs-querystring/input.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
// Test that CJS files treat question marks in filenames as any other character,
// matching Node behavior

// https://www.youtube.com/watch?v=2ve20PVNZ18

const baseball = require('./who?what?idk!');
console.log(baseball.players);
12 changes: 12 additions & 0 deletions test/unit/cjs-querystring/noPunctuation/whowhatidk.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
module.exports = {
players: {
first: 'Who',
second: 'What',
third: "I Don't Know",
left: 'Why',
center: 'Because',
pitcher: 'Tomorrow',
catcher: 'Today',
shortstop: "I Don't Give a Damn!",
},
};
5 changes: 5 additions & 0 deletions test/unit/cjs-querystring/output.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
[
"package.json",
"test/unit/cjs-querystring/input.js",
"test/unit/cjs-querystring/who?what?idk!.js"
]
4 changes: 4 additions & 0 deletions test/unit/esm-querystring-mjs/animalFacts/aardvark.mjs
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
import { numSpecies } from "./bear.mjs?beaver?bison";
console.log(`There are ${numSpecies} species of bears.`);

export const food = "termites";
4 changes: 4 additions & 0 deletions test/unit/esm-querystring-mjs/animalFacts/bear.mjs
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
import * as cheetah from "./cheetah.mjs?cow=chipmunk";
console.log(`Cheetahs can run ${cheetah.topSpeed} mph.`);

export const numSpecies = 8;
1 change: 1 addition & 0 deletions test/unit/esm-querystring-mjs/animalFacts/cheetah.mjs
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
export const topSpeed = 65;
6 changes: 6 additions & 0 deletions test/unit/esm-querystring-mjs/input.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
// Test that querystrings of various forms get stripped from esm imports when those
// imports contain the `.mjs` file extension

import * as aardvark from "./animalFacts/aardvark.mjs?anteater";

console.log(`Aardvarks eat ${aardvark.food}.`);
7 changes: 7 additions & 0 deletions test/unit/esm-querystring-mjs/output.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
[
"test/unit/esm-querystring-mjs/animalFacts/aardvark.mjs",
"test/unit/esm-querystring-mjs/animalFacts/bear.mjs",
"test/unit/esm-querystring-mjs/animalFacts/cheetah.mjs",
"test/unit/esm-querystring-mjs/input.js",
"test/unit/esm-querystring-mjs/package.json"
]
4 changes: 4 additions & 0 deletions test/unit/esm-querystring-mjs/package.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
{
"private": true,
"type": "module"
}
4 changes: 4 additions & 0 deletions test/unit/esm-querystring/animalFacts/aardvark.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
import { numSpecies } from './bear?beaver?bison';
console.log(`There are ${numSpecies} species of bears.`);

export const food = 'termites';
4 changes: 4 additions & 0 deletions test/unit/esm-querystring/animalFacts/bear.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
import * as cheetah from './cheetah?cow=chipmunk';
console.log(`Cheetahs can run ${cheetah.topSpeed} mph.`);

export const numSpecies = 8;
1 change: 1 addition & 0 deletions test/unit/esm-querystring/animalFacts/cheetah.js
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
export const topSpeed = 65;
5 changes: 5 additions & 0 deletions test/unit/esm-querystring/input.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
// Test that querystrings of various forms get stripped from esm imports
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets add package.json to this dir with "type": "module", otherwise its wont be valid for Node.js


import * as aardvark from './animalFacts/aardvark?anteater';

console.log(`Aardvarks eat ${aardvark.food}.`);
7 changes: 7 additions & 0 deletions test/unit/esm-querystring/output.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
[
"test/unit/esm-querystring/animalFacts/aardvark.js",
"test/unit/esm-querystring/animalFacts/bear.js",
"test/unit/esm-querystring/animalFacts/cheetah.js",
"test/unit/esm-querystring/input.js",
"test/unit/esm-querystring/package.json"
]
4 changes: 4 additions & 0 deletions test/unit/esm-querystring/package.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
{
"private": true,
"type": "module"
}
5 changes: 5 additions & 0 deletions test/unit/querystring-self-import/base.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
import * as dep from './dep';

export const dogs = ['Charlie', 'Maisey'];
export const cats = ['Piper'];
export const rats = dep.rats;
1 change: 1 addition & 0 deletions test/unit/querystring-self-import/dep.js
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
export const rats = ['Debra'];
10 changes: 10 additions & 0 deletions test/unit/querystring-self-import/input.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
// Test that if a file and the same file with a querystring correspond to different
// modules in memory, one can successfully import the other. The import chain
// goes `input` (this file) -> `base?__withQuery` -> `base` -> `dep`, which means
// that if `dep` shows up in `output`, we know that both `base?__withQuery` and
// `base` have been loaded successfully.

import * as baseWithQuery from './base?__withQuery';
console.log('Dogs:', baseWithQuery.dogs);
console.log('Cats:', baseWithQuery.cats);
console.log('Rats:', baseWithQuery.rats);
Loading