Improve handling of HTML special cases #312

Merged
32 commits merged on Feb 22, 2022

Commits on Jan 26, 2022

  1. Aggressively try to retain markup on words if it appears on one of its source tokens
    
    I do need those continuation delimiters for that, even though I really don't like them since they're so character set focussed!
    jelmervdl committed Jan 26, 2022
    40eabc1
  2. Outdated todo

    🎉
    jelmervdl committed Jan 26, 2022
    723e725
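
A minimal C++ sketch of the first commit's idea, under heavy assumptions: each target token lists the source tokens it aligns to, and a whole target word receives every tag found on any source token aligned to any of its pieces. `TagSet`, `TargetToken`, `retainWordMarkup` and `continuesWord` are hypothetical, and the rule in `continuesWord` (no leading space means the token continues the previous word) only stands in for the continuation delimiters the commit message mentions.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical representation: the set of tag names wrapping a token.
using TagSet = std::set<std::string>;

// Hypothetical target token: its text plus the source tokens it aligns to.
struct TargetToken {
  std::string text;
  std::vector<std::size_t> alignedSource;
};

// Assumed word-boundary rule for this sketch: a token that does not begin with a
// space continues the word started by the previous token (e.g. "Hel" + "lo").
bool continuesWord(const std::string &token) {
  return !token.empty() && token.front() != ' ';
}

// Give every token of a target word the union of the tags found on any source
// token aligned to any token of that word, so markup is kept for the whole word.
std::vector<TagSet> retainWordMarkup(const std::vector<TagSet> &sourceTags,
                                     const std::vector<TargetToken> &target) {
  std::vector<TagSet> result(target.size());
  std::size_t wordStart = 0;
  for (std::size_t i = 1; i <= target.size(); ++i) {
    if (i < target.size() && continuesWord(target[i].text))
      continue;  // token i still belongs to the current word
    // Tokens [wordStart, i) form one word: collect tags from all of their alignments.
    TagSet wordTags;
    for (std::size_t t = wordStart; t < i; ++t)
      for (std::size_t s : target[t].alignedSource)
        wordTags.insert(sourceTags.at(s).begin(), sourceTags.at(s).end());
    for (std::size_t t = wordStart; t < i; ++t)
      result[t] = wordTags;
    wordStart = i;
  }
  return result;
}
```

For example, if a word is split into `Hel` + `lo` and only `lo` aligns to a bold source token, both pieces come out bold, so the tag is never split in the middle of the word.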

Commits on Feb 8, 2022

  1. 9600c70
  2. Make HTML tags case insensitive

    Tag case is retained in the output, though. Well, for the opening tag at least: the closing tag always matches the opening tag.
    jelmervdl committed Feb 8, 2022
    3d6673c
  3. Treat <wbr> specially

    Fixes #339
    jelmervdl committed Feb 8, 2022
    5634c40
  4. Add support for ignoring tags

    Fixes #313
    jelmervdl committed Feb 8, 2022
    e516dbd
  5. 19acb54
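
The case rule from "Make HTML tags case insensitive" can be sketched as follows: tag names are compared ASCII case-insensitively, while the opening tag keeps its original spelling and the closing tag is generated from it. `tagNameEquals`, `Tag`, `closes` and `closingTag` are illustrative names for this sketch, not the project's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <string_view>

// ASCII case-insensitive comparison of tag names ("DIV" matches "div").
bool tagNameEquals(std::string_view a, std::string_view b) {
  return a.size() == b.size() &&
         std::equal(a.begin(), a.end(), b.begin(), [](char x, char y) {
           return std::tolower(static_cast<unsigned char>(x)) ==
                  std::tolower(static_cast<unsigned char>(y));
         });
}

// Hypothetical open element: the name is kept exactly as spelled in the input.
struct Tag {
  std::string name;
};

// Matching is case-insensitive, so "</b>" closes an element opened as "<B>".
bool closes(const Tag &open, std::string_view closingName) {
  return tagNameEquals(open.name, closingName);
}

// The closing tag is written from the opening tag's spelling, whatever the input used.
std::string closingTag(const Tag &open) { return "</" + open.name + ">"; }
```

Under this sketch, `<B>bold</b>` is accepted as a matching pair and written back out as `<B>bold</B>`.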

Commits on Feb 9, 2022

  1. Add test for regression in ignored element code path

    `std::bad_alloc` :( Also expand tests to make sure we're recording the full ignored tag contents.
    jelmervdl committed Feb 9, 2022
    46159ba
  2. Fix bad_alloc in consumeIgnoredTag

    The trouble was that `Scanner::scanEntity()` returns a `value()` that does not point into the HTML input stream (it points to a *decoded* entity instead). So we need another API, `Scanner::start()`, to figure out where a token starts in the HTML.
    jelmervdl committed Feb 9, 2022
    af39c75
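
Why a separate start position is needed can be shown with a small sketch. Here a hypothetical `Token` carries a `value`, which for an entity views decoded text held elsewhere, plus its `start` offset and `length` in the raw input. This is not the project's `Scanner`: `scanText`, `scanEntity` and `rawSpan` are made up for the illustration, and only `&amp;` is decoded.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical token: `value` may view decoded text (for entities), while
// `start` and `length` always locate the token in the raw HTML input.
struct Token {
  std::string_view value;
  std::size_t start;
  std::size_t length;
};

// Plain text: the value is a view into the input itself.
Token scanText(std::string_view html, std::size_t pos, std::size_t len) {
  return {html.substr(pos, len), pos, len};
}

// Only "&amp;" is decoded in this sketch. Its value views static storage holding
// the decoded "&", so value.data() does NOT point into `html`.
Token scanEntity(std::string_view html, std::size_t pos) {
  static const std::string decodedAmp = "&";
  constexpr std::string_view kAmp = "&amp;";
  if (html.substr(pos, kAmp.size()) == kAmp)
    return {decodedAmp, pos, kAmp.size()};
  return scanText(html, pos, 1);  // unknown entity: pass the '&' through as text
}

// Raw HTML covered by tokens first..last, computed from input offsets. Taking
// `value.data() - html.data()` instead is undefined for entity tokens (the
// pointers belong to different objects) and can produce a nonsensical length.
std::string_view rawSpan(std::string_view html, const Token &first, const Token &last) {
  return html.substr(first.start, last.start + last.length - first.start);
}
```

Recovering the span from offsets returns `a &amp;` for the text `a ` followed by the entity, whereas the entity's `value` has no position in the input to measure from.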

Commits on Feb 11, 2022

  1. Prevent straggler void elements from showing up twice

    When a word near the end of a translated sentence aligns with one at the beginning, it pushes `prevIt` back to the beginning. Then the next translated token will insert all straggler void elements between `prevIt` and `it` again. Instead of using `prevIt` to track where we were with inserting stragglers, we keep our own iterator that never moves backwards.
    jelmervdl committed Feb 11, 2022
    f595c51
  2. Use isContinuation function to check whether we need to insert a space after a tag
    
    The main reason for using this instead of `std::isspace` is to prevent a space from being inserted between the closing tag and the full stop in `This is a <b>test</b>.`, because that has been bothering me a lot.
    jelmervdl committed Feb 11, 2022
    32f403a
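
The straggler fix comes down to a cursor that only moves forward. A minimal sketch, assuming a flattened model in which each void element is attached to the source token it precedes and each target token has exactly one aligned source position (both simplifications; `insertStragglers` and its parameters are hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: voidBefore[i] lists void elements (e.g. "<br>") that sit just
// before source token i; alignment[j] is the source token aligned to target token j.
std::string insertStragglers(const std::vector<std::vector<std::string>> &voidBefore,
                             const std::vector<std::string> &target,
                             const std::vector<std::size_t> &alignment) {
  std::string out;
  // First source position whose void elements have not been emitted yet. Unlike
  // the aligned position, this cursor never moves backwards.
  std::size_t next = 0;
  for (std::size_t j = 0; j < target.size(); ++j) {
    // Emit void elements up to and including the aligned position, skipping any
    // position already handled even if the alignment jumped back.
    for (; next <= alignment.at(j) && next < voidBefore.size(); ++next)
      for (const std::string &element : voidBefore[next])
        out += element;
    out += target[j];
  }
  // Void elements past the last aligned position still need to be emitted.
  for (; next < voidBefore.size(); ++next)
    for (const std::string &element : voidBefore[next])
      out += element;
  return out;
}
```

With `alignment = {2, 0, 1}` and a `<br>` before source token 1, the first target token already emits the `<br>` and the output is `<br>ABC`. A cursor that followed the alignment back to position 0 would emit it a second time later in the sentence.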

Commits on Feb 14, 2022

  1. Merge branch 'main' into html-improvements

    # Conflicts:
    #	src/translator/html.cpp
    jelmervdl committed Feb 14, 2022
    afc75f0
  2. Treat more elements as opaque when parsing

    These are all elements that Firefox treats as opaque in its HTML5 parser. As a consequence, requesting `noscriptElement.innerHTML` gives you the raw text content of the element rather than a serialized tree. So invalid HTML? Just passed on as is! Well, we're going to do the same then. Besides, if there's no script, there's probably no extension either.
    jelmervdl committed Feb 14, 2022
    72e54f8
  3. Do not skip <title> for now

    This tag is a bit difficult. No HTML is allowed inside of it (similar to `<textarea>`), but we do want to capture its text content as text (decoding entities, etc.) so we can translate it. So for now I'll just trust that nobody is insane enough to use HTML inside the title tag. And if they do, we'll be as insane back and try to maintain that (very much not allowed) structure.
    jelmervdl committed Feb 14, 2022
    ea244d2
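
Treating an element as opaque means its content is passed through without being parsed, up to the matching end tag. A hedged sketch of that idea: the element list is an assumption loosely based on the HTML5 parsing spec's raw-text elements, not the list from the commit, and `isOpaque`, `rawContent` and `toLower` are hypothetical helpers. `<title>` is left out, in line with "Do not skip <title> for now".

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Assumed list, loosely based on the raw-text elements of the HTML5 parsing spec;
// not necessarily the translator's exact list. <title> is deliberately left out.
constexpr std::array<std::string_view, 7> kOpaqueElements = {
    "script", "style", "noscript", "iframe", "noembed", "noframes", "xmp"};

std::string toLower(std::string_view s) {
  std::string out(s);
  std::transform(out.begin(), out.end(), out.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  return out;
}

// Element names are matched case-insensitively.
bool isOpaque(std::string_view name) {
  const std::string lower = toLower(name);
  return std::find(kOpaqueElements.begin(), kOpaqueElements.end(),
                   std::string_view(lower)) != kOpaqueElements.end();
}

// Content of an opaque element whose start tag ends at `contentStart`: everything
// up to the first case-insensitive "</name", passed through unparsed. Simplified:
// the spec also checks what follows the name, and a missing end tag here simply
// consumes the rest of the input.
std::string_view rawContent(std::string_view html, std::size_t contentStart,
                            std::string_view name) {
  const std::string haystack = toLower(html.substr(contentStart));
  const std::string needle = "</" + toLower(name);
  const std::size_t end = haystack.find(needle);
  return html.substr(contentStart, end);  // end == npos keeps the rest of the input
}
```

Given `<noscript><b>unclosed</NOSCRIPT>`, the content `<b>unclosed` is returned verbatim; the unclosed `<b>` is never interpreted as markup.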

Commits on Feb 16, 2022

  1. Follow clang-tidy advice

    jelmervdl committed Feb 16, 2022
    dda9860
  2. Fix missing \n\n?

    I don't know what happened here.
    jelmervdl committed Feb 16, 2022
    d7e1c07
  3. Add more comments and less creative variable names

    Hopefully this will make the overall code more readable, provided you're familiar with the concept it tries to implement…
    jelmervdl committed Feb 16, 2022
    203ba0a
  4. a1ee8e9

Commits on Feb 21, 2022

  1. ac83e50
  2. Update tests

    jelmervdl committed Feb 21, 2022
    6a7bd21
  3. c90d00f
  4. ad612e4
  5. f451983
  6. c891eda
  7. 54be426
  8. 279462c
  9. 346821b

Commits on Feb 22, 2022

  1. 8cc695b
  2. 48cfc00
  3. Remark about 'taint'

    jelmervdl committed Feb 22, 2022
    a81dfdf
  4. bbfa4e3
  5. ea10e91