Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow reserved words in unquoted imports but disallow whitespace. #4035

Merged
merged 2 commits into from
Aug 21, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
235 changes: 155 additions & 80 deletions working/unquoted-imports/feature-specification.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ Author: Bob Nystrom

Status: In-progress

Version 0.3 (see [CHANGELOG](#CHANGELOG) at end)
Version 0.4 (see [CHANGELOG](#CHANGELOG) at end)

Experiment flag: unquoted-imports

Expand Down Expand Up @@ -107,18 +107,18 @@ import widget.tla.proto/client/component;
```

You can probably infer what's going on from the before and after, but the basic
idea is that the library is a slash-separated series of dotted identifier
segments. The first segment is the name of the package. The rest is the path to
the library within that package. A `.dart` extension is implicitly added to the
end. If there is only a single segment, it is treated as the package name and
its last dotted component is the path. If the package name is `dart`, it's a
"dart:" library import.
idea is that the library is a slash-separated series path segments, each of
which is a dotted-separated identifier component. The first segment is the name
of the package. The rest is the path to the library within that package. A
`.dart` extension is implicitly added to the end. If there is only a single
segment, it is treated as the package name and its last dotted component is the
path. If the package name is `dart`, it's a "dart:" library import.

The way I think about the proposed syntax is that relative imports are
*physical* in that they specify the actual relative path on the file system from
the current library to another library *file*. Because those are physical file
paths, they use string literals and file extensions as they do today. SDK and
package imports are *logical* in that you don't know where the library your
package imports are *logical* in that you don't know where the library you're
importing lives on your disk. What you know is it's *logical name* and the
relative location of the library you want inside that package. Since these are
abstract references to a *library*, they are unquoted and omit the file
Expand Down Expand Up @@ -160,7 +160,7 @@ the reasons for the choices this proposal makes:

### Path separator

An import shorthand syntax that only supported a single identifier would work
A package shorthand syntax that only supported a single identifier would work
for packages like `test` and `args` that only expose a single library, but
would fail for even very common libraries like `package:flutter/material.dart`.
So we need some notion of a package name and a path within the that package.
Expand Down Expand Up @@ -220,27 +220,29 @@ import flutter/material;
Is the `flutter/material` part a single token or three (`flutter`, `/`, and
`material`)? The main advantage of tokenizing it as a single monolithic token is
that we could potentially allow characters or identifiers in there aren't
otherwise valid Dart. For example, we could let you use reserved words as path
segments:
otherwise valid Dart. For example, we could let you use hyphens as word
separators as in:

```dart
import weird_package/for/if/ok;
import weird-package/but-ok;
```

The disadvantage is that the tokenizer doesn't generally have enough context to
know when it should tokenize `foo/bar` as a single import path token versus
know when it should tokenize `foo/bar` as a single package path token versus
three tokens that are presumably dividing two variables named `foo` and `bar`.

Unlike Lasse's [earlier proposal][lasse], this proposal does *not* tokenize an
import path as a single token. Instead, it's tokenized using Dart's current
Unlike Lasse's [earlier proposal][lasse], this proposal does *not* tokenize a
package path as a single token. Instead, it's tokenized using Dart's current
lexical grammar.

This means you can't have a path segment that's a reserved word or is otherwise
not a valid Dart identifier. Fortunately, our published guidance has *always*
told users that [package names][name guideline] and [directories][directory
guideline] should be valid Dart identifiers. Pub will complain if you try to
publish a package whose name isn't a valid identifier. Likewise, the linter will
flag directory or library names that aren't identifiers.
This means you can't have a path component that uses some combination of
characters that isn't currently a single token in Dart, like `hyphen-separated`
or `123LeadingDigits`. A path component must be an identifier (including
built-in identifiers) or a reserved word. Fortunately, our published guidance
has *always* told users that [package names][name guideline] and
[directories][directory guideline] should be valid Dart identifiers. Pub will
complain if you try to publish a package whose name isn't a valid identifier.
Likewise, the linter will flag file names that aren't identifiers.

[name guideline]: https://dart.dev/tools/pub/pubspec#name
[directory guideline]: https://dart.dev/effective-dart/style#do-name-packages-and-file-system-entities-using-lowercase-with-underscores
Expand All @@ -258,18 +260,52 @@ in a large corpus of pub packages and open source widgets:
69 ( 0.010%): dotted with non-identifiers =
```

This splits every "package:" import's path into segments separated by `/`. Then
for each segment, it reports whether the segment is a valid identifier, a
built-in identifier like `dynamic` or `covariant`, etc. Almost all segments are
either valid identifiers, or dotted identifiers where each subcomponent is a
valid identifier.
This splits every "package:" path into segments separated by `/`. Then it splits
segments into components separated by `.` For each component, the analysis
reports whether the component is a valid identifier, a built-in identifier like
`dynamic` or `covariant`, or a reserved word like `for` or `if`.

(For the very small number that aren't, they can continue to use the old quoted
"package:" import syntax to import the library.)
Components that are not some kind of identifier (regular, reserved, or built-in)
are vanishingly rare. In those few cases, if a user can't simply rename the
file, they can continue to use the old quoted "package:" syntax to refer to the
file.

I think this approach is much simpler than trying to add special lexing rules.
It's consistent with how Java, C# and other languages parse their imports. It
does mean users can do silly things like:
### Reserved words and semi-reserved words

One confusing area of Dart that the previous table hints at is that Dart has
several categories of identifiers that vary in how user-accessible they are:

* Reserved words like `for` and `class` can never be used by a user as a
regular identifier in any context.

* Built-in identifiers like `abstract` and `interface` can't be used as *type*
names but can be used as other kinds of identifiers.

* Contextual keywords like `await` and `show` behave like keywords in some
specific contexts but are usable as regular identifiers everywhere else.

This leads to confusion about which of these flavors of identifiers can be used
as package paths. Which of these, if any, are valid:

```dart
import if/else;
import abstract/interface;
import show/hide;
```

Many Dart users (including experts, some of whom may be members of the Dart
language team) don't know the full list of reserved or semi-reserved words. We
don't want users to run into problems determining which identifiers work in
package paths. To that end, we allow *all* reserved words and identifiers,
including built-in identifiers and contextual keywords as path components.

### Whitespace and comments

Even though the unquoted path is tokenized as separate tokens, we don't allow
whitespace or comments to appear between them as we do in most other places in
the language.

We could allow users to write code like:

```dart
import strange /* comment */ . but
Expand All @@ -281,7 +317,37 @@ import strange /* comment */ . but
fine;
```

But they can also choose to *not* do that.
This wouldn't cause any problems for a Dart implementation. It would simply
discard the whitespace and comments as it does elsewhere and the resulting path
is `strange.but/another/fine`.

However, it likely causes problems for Dart *users* and other simpler tools and
scripts that work with Dart code. In particular, we often see homegrown tools
that want to "parse" a Dart file to find its package references and traverse the
dependency graph. While these tools ideally should use a full Dart parser (like
the one in the [analyzer package][], which is freely available), the reality is
that users often cobble together simple scripts using regex to do this kind of
parsing, or they need to write these tools in a language other than Dart. In
those cases, if the package path happens to contain whitespace or comments, the
tool will likely silently fail to recognize the package path.

[analyzer package]: https://pub.dev/packages/analyzer

Also, we find no compelling *use* for whitespace and comments inside package
paths. To that end, this proposal makes it an error. All of the tokens in the
munificent marked this conversation as resolved.
Show resolved Hide resolved
path must be directly adjacent with no whitespace, newlines, or comments between
them. The previous import is an error. However, we still allow comments in or
after the directives outside of the path. These are all valid:

```dart
import /* Weird but OK. */ some/path;
export some/path; // Hi there.
part some/path // Before the semicolon? Really?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a thought... (TL;DR: Maybe don't allow unquoted URIs for part directives.)

We should not allow unquoted URIs in part of declarations, because we havent' had time to migrate people off of part of library.name yet, and that would mean part of foo.bar; would go directly from one valid meaning to another different valid meaning.

The part of library.name won't be disallowed until "parts with import", or maybe this feature, whichever comes first. (Not unless we can remove it in a release before either, but that sounds unlikely. And it looks better to remove a feature as part of another feature that seem to bring value that compensates.)

I don't think it's a problem to not allow unquoted URIs in part of declarations, because the URI of part of is always going to be a relative path, and relative paths cannot be unquoted anyway. (A part must be in the same package as the file it is part of, "parts with imports" makes that explicit, but it was always at least implicitly assumed.)

With that reasoning, maybe we should also never use an unquoted URI in a part directive.
Should we choose to not allow it, in order to highlight that unquoted URIs are for (other) packages, local relative references should/must always be quoted paths, and part URIs should always be relative references?

(That a part is in the same package as its parent is not a language requirement for any particular language version, as much as it's a language versioning requirement, so our tools that accept multiple language versions require parts and parent files to be in the same package, so they have the same default language version. Most likely also require both to be inside lib/ or both outside of lib/, although that's more debatable. But if not, I can make the same part file part of two different libraries.)

At least unless/until we allow unquoted relative paths like ./foo or ../foo. But that's another feature 😁.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be filed as an issue so that we don't lose the discussion. :)

For now, I'll leave the proposal as-is, but I'm definitely keen to talk about this more. For what it's worth, I would ultimately like to support unquoted parts. It's true that we don't support relative unquoted references. But my guess is that users will end up preferring unquoted package references even to files in the same package versus quoted relative imports. At least, that's what I think I'll end up doing.

If that's the case, then unquoted part directives will make sense too.

Copy link
Member

@lrhn lrhn Aug 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

preferring unquoted package references even to files in the same package versus quoted relative imports

I find it unlikely that most people will prefers part mypkg/src/subfeature/featurepart; to part "featurepart.dart"; in mypkg/src/subfeature/feature.dart.

We don't even have a way to omit the package name, and parts will always at least be below src/ too, and almost always in the same directory as the library file.

It also only works for libraries inside lib/. If you have part files in test/, bin/, tool/ or (now) hook/, you can't use unquoted package URIs for those.

But we'll see 😁 . (I expect users will ask for ways to not quote relative paths, and we can give them that, as part ./featurepart;. I still think we should use : to separate package name from path, because that makes it harder to typo a relative path into a package path. But the most likely mistake is to do import foo; intending it to mean import "./foo.dart";, which doesn't have any separator, so the separator probably won't make a big difference in anything. Maybe I just like the look of : better.)

;
```

The syntax that results from the above few sections is simple to tokenize and
parse while looking like a single opaque "unquoted string" to users and tools.
munificent marked this conversation as resolved.
Show resolved Hide resolved

## Syntax

Expand All @@ -291,54 +357,57 @@ We add a new rule and hang it off the existing `uri` rule already used by import
and export directives:

```
uri ::= stringLiteral | packagePath
packagePath ::= packagePathSegment ( '/' packagePathSegment )*
packagePathSegment ::= dottedIdentifierList
dottedIdentifierList ::= identifier ('.' identifier)*
uri ::= stringLiteral | packagePath
packagePath ::= pathSegment ( '/' pathSegment )*
pathSegment ::= segmentComponent ( '.' segmentComponent )*
munificent marked this conversation as resolved.
Show resolved Hide resolved
segmentComponent ::= IDENTIFIER
| RESERVED_WORD
| BUILT_IN_IDENTIFIER
| OTHER_IDENTIFIER
```

An import or export can continue to use a `stringLiteral` for the quoted form
(which is what they will do for relative imports). But they can also use a
`packagePath`, which is a slash-separated series of segments, each of which is a
series of dot-separated identifiers. *(The `dottedIdentifierList` rule is
already in the grammar and is shown here for clarity.)*
It is a compile-time error if any whitespace, newlines, or comments occur
between any of the `segmentComponent`, `/`, or `.` tokens in a `packagePath`.
munificent marked this conversation as resolved.
Show resolved Hide resolved
*In other words, there can be nothing except the terminals themselves from the
first `segmentComponent` in the `packagePath` to the last.*

*An import, export, or part directive can continue to use a `stringLiteral` for
the quoted form (which is what they will do for relative references). But they
can also use a `packagePath`, which is a slash-separated series of segments,
each of which is a series of dot-separated components.*

### Part directive lookahead

*There are two directives for working with part files, `part` and `part of`. The
`of` identifier is not a reserved word in Dart. This means that when the parser
sees `part of`, it doesn't immediately know if it is looking at a `part`
directive followed by an unquoted identifier like `part of;` or `part
of.some/other.thing;` versus a `part of` directive like `part of thing;` or
`part of 'uri.dart';` It must lookahead past the `of` identifier to see if the
next token is `;`, `.`, `/`, or another identifier.*
*There are two directives for working with part files, `part` and `part of`.
This means that when the parser sees `part of`, it doesn't immediately know if
it is looking at a `part` directive followed by an unquoted identifier like
`part of;` or `part of.some/other.thing;` versus a `part of` directive like
`part of thing;` or `part of 'uri.dart';` It must lookahead past the `of`
identifier to see if the next token is `;`, `.`, `/`, or another identifier.*
Copy link
Member

@lrhn lrhn Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requiring lookahead does mean we can't allow more words after part of.
Consider part of foo as f; which would be a part of the foo file, but which makes the imports inherited from foo available under f instead of in the top-level scope.
If we allow that, then part of deferred as f; would be ambiguous, if seen by itself. (EDIT: Well, no, it's a part, not an import, so deferred is not allowed on parts. We'd have to allow more on part directives too to get an ambiguity. /EDIT)

(In practice, a part file is almost always included by a parent file, so we know it's a part file. Only the analyzer and formatter sees files by themselves, and the formatter probably won't care. Also, why is this a part of package:deferred/deferred.dart?)

Or we could disallow packagePath for path directives. Shouldn't be a loss.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we allow that, then part of deferred as f; would be ambiguous, if seen by itself.

So you're saying this could be either:

  • A part of directive for a file named deferred which should be locally prefixed as f.
  • A part directive for a file named of which is deferred loaded and available under prefix f.

Is that right? If so, I don't think it's ambiguous since the latter isn't valid. We only allowed deferred on imports, not part, right? Does enhanced parts change that?

Copy link
Member

@lrhn lrhn Aug 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was what I thought when I wrote it, but you can't have deferred parts (well, yet! ... but likely ever) so that's not really ambiguous.

So what I should have said is: If we allow something after part <uri> in the future, and that something can be a single identifier or dotted identifier, then part of something; is ambiguous.

And if we allow things after both part <uri> and part of <uri>, then if any thing after part <uri> has the form validUnquotedUri otherThing where otherThing is valid after part of <uri>, then part of *thing* is ambiguous.

A lot of "if"s.

The things I have actually considered so far are, during "parts with modifiers" discussions:

  • as id modifiers on part of and part to contain the identifiers they introduce.
  • sealed modifier on part of to just not inherit any imports from the parent file.

But, as Erik just pointed out to me, until we have a concrete ambiguity, we shouldn't be saying what an incorrect program means. That's a job for recovery and error messaging. We should say what allowed programs mean, and we should not allow programs that we don't want the user to write.

As currently specified, a valid program can have part id.id/id.id/id; with no spaces between the ids and the . and /s.
If it doesn't follow that format, it's not a valid Dart program, and we don't have to say how it's wrong, just that it is wrong.

That does mean that part of; is valid and means the same as part "package:of/of.dart";.
This is not parsing recovery if we allow the syntax and then give a "file/package not found" error. (It's incredibly unlikely that package:of/of.dart is a part file, and the current file is inside package:foo's lib directory, which is the only way it can not be an error.)

It's still surprising. It may be error-prone, if someone writes part of; and forgets to write a URI. Maybe they removed the library foo; because we tell them to, and then change part of foo; to just part of;. (It's obvious which file it's part of, right?)

If we want to prevent that, we can:

  • Not allow unquited URIs in part directives, or
  • Not consider a single identifier of as an unquoted URI in a part directive. (Basically the same as saying that of is a contextually reserved word, and contextually reserved words take precedence over everything else if they occur where we reserved them.)

Or we can do nothing, and have our tools recognize and give a tailored error message if someone does part of; and package:of/of.dart fails to exist or be a valid part of the current file. ("You wrote..., you probably meant to write ...".)


*This may add some complexity to parsing, but should be minor. Dart's grammar
has other places that require much more (sometimes unbounded) lookahead.*

## Static semantics

The semantics of the new syntax are defined by taking the `packagePath` and
converting it to a string. The directive then behaves as if the user had written
a string literal containing that string. The process is:
converting it to a URI string. The directive then behaves as if the user had
written a string literal containing that URI. The process is:

1. Let the *segment* for a `packagePathSegment` be a string defined by the
ordered concatenation of the `identifier` and `.` terminals in the
`packagePathSegment`, with all whitespace and comments removed. *So if
`packagePathSegment` is `a . b /* comment */ . c`, then its *segment* is
1. Let the *segment* for a `pathSegment` be a string defined by the ordered
concatenation of the `segmentComponent` and `.` terminals in the
`pathSegment`. *So if `pathSegment` is `a.b.c`, then its *segment* is
"a.b.c".*

2. Let *segments* be an ordered list of the segments of each
`packagePathSegment` in `packagePath`. *In other words, this and the
preceding step take the `packagePath` and convert it to a list of segment
strings while discarding whitespace and comments. So if `packagePathSegment`
is `a . b /* comment */ / c / d . e`, then *segments* is ["a.b", "c",
"d.e"].*
2. Let *segments* be an ordered list of the segments of each `pathSegment` in
`packagePath`. *In other words, this and the preceding step take the
`packagePath` and convert it to a list of segment strings. So if
`pathSegment` is `a.b/c/d.e`, then *segments* is ["a.b", "c", "d.e"].*

3. If the first segment in *segments* is "dart":

1. It is a compile error if there are no subsequent segments. *There's no
"dart:dart" or "package:dart/dart.dart" library. We reserve the right
1. It is a compile-time error if there are no subsequent segments. *There's
no "dart:dart" or "package:dart/dart.dart" library. We reserve the right
to use `import dart;` in the future to mean something useful.*

2. Let *path* be the concatenation of the remaining segments, separated
Expand All @@ -347,38 +416,38 @@ a string literal containing that string. The process is:
imports. But a custom Dart embedder or future version of Dart could in
theory introduce directories for SDK libraries.*

3. The URI is "dart:*path*". *So `import dart/async;` desugars to
`import "dart:async";`.*
3. The URI is "dart:*path*". *So `import dart/async;` imports the library
`"dart:async"`.*

4. Else if there is only a single segment:

munificent marked this conversation as resolved.
Show resolved Hide resolved
1. Let *name* be the segment.

2. Let *path* be the last identifier in the segment. *If the segment is
only a single identifier, this is the entire segment. Otherwise, it's
the last identifier after the last `.`. So in `foo`, *path* is `foo`.
In `foo.bar.baz`, it's `baz`.*
2. Let *path* be the last `segmentComponent` in the segment. *If the
segment is only a single `segmentComponent`, this is the entire segment.
Otherwise, it's the last identifier after the last `.`. So in `foo`,
*path* is `foo`. In `foo.bar.baz`, it's `baz`.*

3. The URI is "package:*name*/*path*.dart". *So `import test;` desugars to
`import "package:test/test.dart";`, and `import server.api;` desugars
to `import "package:server.api/api.dart";`.*
3. The URI is "package:*name*/*path*.dart". *So `import test;` imports the
library `"package:test/test.dart"`, and `import server.api;` imports
`"package:server.api/api.dart"`.*

5. Else:

munificent marked this conversation as resolved.
Show resolved Hide resolved
1. Let *path* be the concatenation of the segments, separated by `/`.

3. The URI is "package:*path*.dart". *So `import a/b/c/d;` desugars to
`import "package:a/b/c/d.dart";`.
2. The URI is "package:*path*.dart". *So `import a/b/c/d;` imports
`"package:a/b/c/d.dart"`.

Once the `packagePath` has been converted to a string, the directive behaves
exactly as if the user had written a `stringLiteral` containing that same
string.

Given the list of segments, here is a complete implementation of the desugaring
logic in Dart:
Given the list of segments, here is a complete Dart implementation of the logic
to convert an unquoted path to the effective URI it refers to:

```dart
String desugar(List<String> segments) => switch (segments) {
String toUri(List<String> segments) => switch (segments) {
['dart'] => 'ERROR. Not allowed to import just "dart"',
['dart', ...var rest] => 'dart:${rest.join('/')}',
[var name] => 'package:$name/${name.split('.').last}.dart',
Expand Down Expand Up @@ -409,15 +478,15 @@ may make a breaking change and remove support for the old syntax.

The `part of` directive allows a library name after `of` instead of a string
literal. With this proposal, that syntax is now ambiguous. Is it interpreted
as a library name, or as an unquoted URI that should be desugared to a URI?
as a library name, or as an unquoted URI that should be converted to a URI?
In other words, given:

```dart
part of foo.bar;
```

Is the file saying it's a part of the library containing `library foo.bar;` or
that it's part of the library found at URI `package:foo/bar.dart`?
that it's part of the library found at URI `package:foo.bar/bar.dart`?

Library names in `part of` directives have been deprecated for many years
because the syntax doesn't work well with many tools. How is a given tool
Expand Down Expand Up @@ -463,7 +532,7 @@ this proposal's semantics. In other words, `part of foo.bar;` is part of the
library at `package:foo/bar.dart`, not part of the library with name `foo.bar`.
munificent marked this conversation as resolved.
Show resolved Hide resolved

Users affected by the breakage can and should update their `part of` directive
to point to the URI of the library that the file is a part, using either the
to point to the URI of the library that the file is a part of, using either the
quoted or unquoted syntax.

### Language versioning
Expand All @@ -487,7 +556,7 @@ Since the static semantics are so simple, it is trivial to write a `dart fix`
that automatically converts existing "dart:" and "package:" string-based
directives to the new syntax. A handful of regexes are sufficient to break an
existing import into a series of slash-separated segments which are
dot-separated identifiers. Then the above snippet of Dart code will convert that
dot-separated components. Then the above snippet of Dart code will convert that
to the new syntax.

### Lint
Expand All @@ -501,6 +570,12 @@ new unquoted style whenever an existing directive could use it.

## Changelog

### 0.4

- Allow reserved words and built-in identifiers as path components (#3984).

- Disallow whitespace and comments inside package paths (#3983).

### 0.3

- Address breaking change in `part of` directives with library names.
Expand Down
Loading