Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spanned values don't play well with dotted notation: getting "invalid type: string [..], expected a borrowed string" #798

Open
yannham opened this issue Oct 2, 2024 · 4 comments

Comments

@yannham
Copy link

yannham commented Oct 2, 2024

I'm using Spanned to deserialize TOML to Nickel (a configuration language) while preserving spans as much as possible, as Nickel adds validation capabilities and we'd like to link back validation errors to the precise piece of TOML data that failed.

To do so, we define a bespoke datastructure that is more or less like a TOML value but with Spanned appropriately sprinkled, and write a custom deserializer using serde_untagged. You can find the type definition and the deserializer here: https://github.com/tweag/nickel/blob/927ee23993747b7851e51bcfe3eb3e685ba4ebb1/core/src/serialize.rs#L491-L582

However, when deserializing the following file:

[foo.bar]
baz = "qux"

This gives the following surprising error:

error: toml parse error: TOML parse error at line 1, column 1
  |
1 | [foo.bar]
  | ^
invalid type: string "bar", expected a borrowed string



in `foo`

It's surprising because we never try to deserialize a borrowed string: all strings, both as terminal values and keys, are owned in SpannedValue (NickelString is a simple wrapper around String). Also, any TOML file without dotted notation is parsed fine. After some experimentation, it seems that this happens when trying to deserialize the value (and not the key) of the outer map, that is the value associated to foo.

I suspect that there are some shenanigans around getting the location of the nested map {bar = {baz = "qux"}}. It seems that the spanned deserializer of toml-rs tries to deserialize markers as borrowed string (

impl<'de, T> serde::de::Deserialize<'de> for Spanned<T>
) but it also expects a very precise structure, so I'm not entirely sure what's going on here.

The issue is that I don't see any easy work-around: once we've tried to deserialize the content of a map as spanned (which is entirely legit for files that don't have the dotted notation), there doesn't seem to be anyway to retry the same deserialization at a different type.

@yannham
Copy link
Author

yannham commented Oct 2, 2024

(For the record, the explicitly nested version foo = {bar = {baz = "qux"}} parses correctly, so it really seems to be around dotted field syntax)

@epage
Copy link
Member

epage commented Oct 2, 2024

Do you have an isolated reproduction case for this?

@yannham
Copy link
Author

yannham commented Oct 2, 2024

I can try to make one, yes.

@jneem
Copy link

jneem commented Oct 21, 2024

Here's a smallish reproduction, which crashes with "invalid type: string "bar", expected a borrowed string". Removing the Spanned on line 16 makes it succeed.

#!/usr/bin/env -S cargo -Zscript
---cargo
[dependencies]
serde = { version = "1", features = ["derive"] }
serde-untagged = "0.1.6"
toml = "0.8.19"
---

use serde_untagged::UntaggedEnumVisitor;
use serde::de::{Deserializer, MapAccess};
use toml::Spanned;

#[derive(Debug)]
pub enum SpannedValue {
    String(String),
    Map(Vec<(String, Spanned<SpannedValue>)>)
}

impl<'de> serde::Deserialize<'de> for SpannedValue {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        let data = UntaggedEnumVisitor::new()
            .string(|str| Ok(SpannedValue::String(str.into())))
            .map(|mut map| {
                let mut result = Vec::new();

                while let Some((k, v)) = map.next_entry()? {
                    result.push((k, v));
                }

                Ok(SpannedValue::Map(result))
            })
            .deserialize(deserializer)?;

        Ok(data)
    }
}

const INPUT: &str = r#"
[foo.bar]
baz = "qux"
"#;

fn main() {
    let val: SpannedValue = toml::from_str(INPUT).unwrap();
    dbg!(val);
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants