Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: AIP-193 – Errors #3

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
117 changes: 117 additions & 0 deletions aip/general/0193/aip.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,117 @@
# Errors

Error handling is an important part of designing simple and intuitive APIs.
Consistent error handling allows developers to know how to expect to receive
errors, and to reduce boilerplate by having common error-handling logic, rather
than being expected to constantly add verbose error handling everywhere.

## Guidance

Services **must** clearly distinguish successful responses from error responses
by using appropriate HTTP codes:

- Successful responses **must** use HTTP status codes between 200 and 399.
- Errors indicating a problem with the user's request **must** use HTTP status
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We talked about this briefly in the WG, and I'm posting the question here again.

For parts of the proposal that do not map well with one or more of the "implementations" (say GraphQL in this case). How do we want to represent divergence?

For this specific case, in GraphQL there is a distinction between how to represent before GraphQL execution errors (e.g: malformed HTTP request to the server should yield 400) and say validation/business logic errors (e.g: GraphQL query syntax is invalid would yield a 200 status code with errors populated in the errors block of the body of the HTTP response)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect a GraphQL AIP to override this section.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey @innou is what you're saying how GraphQL actually does it? returning success codes for cases where there are syntax errors in the request? that sounds like a bad idea to me, but if that's how it's done, there's little that can be done. in my mind, that would not be a good pattern to recommend. RFC 6231 is pretty clear on that one, i think:

400 Bad Request:
  The 400 (Bad Request) status code indicates that the server cannot or
  will not process the request due to something that is perceived to be
  a client error (e.g., malformed request syntax, invalid request
  message framing, or deceptive request routing).

codes between 400 and 499.
- Errors indicating a problem with the server's handling of an valid request
**must** use HTTP status codes between 500 and 599.

### Structure

Error responses **should** conform to the following interface:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing this is where you were expecting most of the comments :)

My recommendation would be to use RFC 7807

The short of it would be this:

interface Error {
  //URL formated identifier. Does not have to be a functioning URL
  type: string;

  //A human readable description of the problem. Should not change from occurrence to occurrence.
  title?: string

  //The HTTP status code
  status?: number

  //A human readable description of the problem that is unique to this occurrence.
  detail?: string

  //A URI reference unique to the occurrence of the problem. May or may not reveal additional information about the problem.
  instance?: string
}

Additional properties may be added to the root level of the object to convey information unique to the error.

Note that status is optional. That's because it is expected to already be a part of the response. Adding it here is redundant (and IMHO should be discouraged).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although we based our AIP 193 on RFC 7807, you might be interested in seeing how ours differs from the RFC as an extension use case: https://github.com/rhamiltonsf/sf.aip/blob/master/aip/0193.md

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one of the reasons why status is in there is to allow RFC 7807 to be used for representing problem reports in a self-contained way. and btw, type is optional, too: everything is optional.


```typescript
interface Error {
// The HTTP status code.
code: number;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would very much appreciate it if this were renamed, both to align with RFC 7807 and also to not conflict with OCI Errors (which use code to communicate the specific error type).

Suggested change
code: number;
status: number;


// A developer-facing error message, in English.
message: string;

// The source of the error (usually the registered service address of the
// tool or product that generates the error).
// Example: "pubsub.googleapis.com"
domain: string;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name also seems very strange to me, as opposed to e.g. source or scope or namespace or some suffixed variant thereof.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And shouldn't it be optional?


// The type of the error. This is a constant value that identified the
// cause of the error, unique within a given domain, that developers write
// code against.
type: string;

// An array of additional error details.
details: ?any[];

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we care about recommending a structure for details. Example:

details: [
 { 
   message: string;
   type: string;
   ...
 }
]

}
```

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about describing code here?


  • The code field is the HTTP status code value returned as number. It MUST match the status code in the HTTP response.

- The `message` field is intended for consumption by humans, and therefore
**may** change, even within a single version.
- The `type` field is intended to have code written against it, and therefore
**must not** change. Values for this field should be 0-63 characters, and use
only lower-case letters, numbers, and the `-` character.
- The `domain` field is intended to provide a logical grouping for `type`
values, and **should** usually be set to the registered domain where the API
is served (such as `pubsub.googleapis.com`).
- If the error is generated by common internal infrastructure, the error
domain **must** be a globally-unique value that identifies the
infrastructure.
- The `details` field **may** contain any additional details that are useful,
including additional metadata or localized error messages.

### Messages

Error messages **should** help a reasonably technical user understand and

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hudlow Who is the intended audience of these errors?

  • @rhamiltonsf discussed whether we should use RFC 7807, which may satisfy both requirements.

resolve the issue, and **should not** assume that the user is an expert in the
particular API. Additionally, error messages **must not** assume that the user
will know anything about its underlying implementation.

Error messages **should** be brief but actionable. Any extra information
**should** be provided in a `details` field. If even more information is
necessary, the service **should** provide a link where a reader can get more
information or ask questions to help resolve the issue.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding some examples like 👇 here?


Below are some examples of good errors and not so good errors:

    ❌  Invalid Book Name.
    ✅  Book name must be between 5 and 50 characters.

    ❌  Access is denied
    ✅  Only admin users have access to this resource.

    ❌  Bad input
    ✅  'ID' must be provided in the input

### Localization

Error messages **must** be in American English. If a localized error message is
also required, the service **should** provide the following structure within
its `details`:

```typescript
interface LocalizedMessage {
// The locale for this error message.
// Follows the spec defined at http://www.rfc-editor.org/rfc/bcp/bcp47.txt.
// Examples: 'en-US', 'de-CH', 'es-MX'
locale: string;

// The localized error message in the above locale.
message: string;
}
```

### Partial errors

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dhudlow and @saurabhsahni, should we allow for the return of multiple errors?

  • @hudlow, @rhamiltonsf Probably not disparate issues (i.e. Authentication AND invalid body), however, this might need to be explored more for bulk and batch operations.
  • @saurabhsahni, @hudlow errors that are of the same type can be returned together.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rhamiltonsf: We got blocked on the batch/bulk API recommendations last time when we chatted about this

  • @hudlow and @rhamiltonsf: In interest of moving this AIP forward, we may want to redact recommendation around batch/bulk APIs for now.


APIs **should not** support partial errors. Partial errors add significant
complexity for users, because they usually sidestep the use of error codes, or
move those error codes into the response message, where the user must write
specialized error handling logic to address the problem.

However, occasionally partial errors are unavoidable, particularly in bulk
operations where it would be hostile to users to fail an entire large request
because of a problem with a single entry.

Methods that require partial errors **should** use long-running operations (as
described in AIP-151), and the method **should** put partial failure
information in the metadata message. The errors themselves **must** still be
represented with an error object.

## Further reading

- For which error codes to retry, see AIP-194.

## Changelog

- **2020-09-02**: Refactored errors AIP to be more generic.
- **2020-01-22**: Added a reference to the `ErrorInfo` message in gRPC.
- **2019-10-14**: Added guidance restricting error message mutability to if
there is a machine-readable identifier present.
- **2019-09-23**: Added guidance about error message strings being able to
change.
7 changes: 7 additions & 0 deletions aip/general/0193/aip.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
---
id: 193
state: approved
created: 2019-07-26
placement:
category: polish
order: 30