Explicitly remove surrounding spaces in parsed u-* values #48

Open
aaronpk opened this issue Mar 15, 2020 · 4 comments

aaronpk commented Mar 15, 2020

There is currently an inconsistency in the PHP, Ruby and Python parsers regarding spaces in u-* values. The PHP and Ruby parsers will remove surrounding spaces from the value returned in u-* properties, but the Python parser does not.

Given this HTML:

<div class="h-card">
  <a href="  https://example.com/  " class="u-url p-name">Test</a>
</div>

PHP:

    {
      "type": [
        "h-card"
      ],
      "properties": {
        "name": [
          "Test"
        ],
        "url": [
          "https://example.com/"
        ]
      }
    }

Ruby:

    {
      "type": [
        "h-card"
      ],
      "properties": {
        "url": [
          "https://example.com/"
        ],
        "name": [
          "Test"
        ]
      }
    }

Python:

    {
      "type": [
        "h-card"
      ],
      "properties": {
        "name": [
          "Test"
        ],
        "url": [
          "  https://example.com/  "
        ]
      }
    }

The HTML spec says:

The href attribute on a and area elements must have a value that is a valid URL potentially surrounded by spaces.

Since the Microformats parser is trying to return a URL value, it seems like removing the spaces is the correct behavior, even though that is not currently in the Microformats spec, which just says:

if a.u-x[href] or area.u-x[href] or link.u-x[href], then get the href attribute

http://microformats.org/wiki/microformats2-parsing#parsing_a_u-_property

I would like to propose a spec change to make it explicit that the parser should remove any surrounding spaces from the href attribute.

if a.u-x[href] or area.u-x[href] or link.u-x[href], then get the href attribute after removing all leading/trailing space characters
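For illustration, a minimal sketch of the proposed trimming step. The helper name `trim_url_value` is hypothetical and is not taken from any of the parsers mentioned above. It assumes "space characters" means the ASCII whitespace set (tab, line feed, form feed, carriage return, space); the proposal itself does not pin down the exact set.

```python
# Assumption: "space characters" = ASCII whitespace
# (tab, line feed, form feed, carriage return, space).
ASCII_WHITESPACE = "\t\n\f\r "


def trim_url_value(raw: str) -> str:
    """Hypothetical helper: strip leading/trailing ASCII whitespace
    from an attribute value such as href. Interior characters are
    left untouched."""
    return raw.strip(ASCII_WHITESPACE)


if __name__ == "__main__":
    # The href from the h-card example in this issue.
    print(repr(trim_url_value("  https://example.com/  ")))
```

With this step applied, the Python output for the example above would match the PHP and Ruby output.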

sknebel commented Mar 15, 2020

The same applies to other URL attributes: <img src=, <video src=, …

(Originally published at: https://www.svenknebel.de/posts/2020/3/2/)

@gRegorLove

Related discussion about what the mf2 spec means by "normalized": #9

I'm +1 for trimming the whitespace, though the spec change might need to be in the last bullet point ("return the normalized absolute URL...") to ensure it applies to all cases.
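A rough sketch of what trimming in the final "return the normalized absolute URL" step could look like, so that it covers every attribute source (href, src, and so on) in one place. The function name `normalize_u_value` and the base URL are hypothetical, the whitespace set is an assumption, and `urljoin` from the Python standard library stands in for whatever URL resolution a real parser uses.

```python
from urllib.parse import urljoin

# Assumption: trim the ASCII whitespace set (tab, LF, FF, CR, space).
ASCII_WHITESPACE = "\t\n\f\r "


def normalize_u_value(raw: str, base_url: str) -> str:
    """Hypothetical final step for a u-* value taken from an attribute:
    trim surrounding whitespace first, then resolve against the
    document's base URL."""
    trimmed = raw.strip(ASCII_WHITESPACE)
    return urljoin(base_url, trimmed)


if __name__ == "__main__":
    base = "https://example.org/page"  # hypothetical document URL
    print(normalize_u_value("  https://example.com/  ", base))
    print(normalize_u_value("\n /photo.jpg ", base))
```

Because the trim happens in the shared last step, the individual attribute rules would not each need their own wording about whitespace.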

@willnorris

+1 from me. I don't recall what the Go library does in this regard, but I'm happy to update it to match this spec change.

willnorris added a commit to willnorris/microformats that referenced this issue Mar 16, 2020
@jgarber623

+1 to @gRegorLove's note. I think the last bullet in the "parsing a u-* property" section should be updated:

return the normalized absolute URL of the gotten value, following the containing document's language's rules for resolving relative URLs (e.g. in HTML, use the current URL context as determined by the page, and first <base> element, if any).

…and/or whitespace stripping is implied in the existing text? I'd rather we be explicit, though.
