Stray non-breaking space in BRF output #82

Open
rbeezer opened this issue Nov 3, 2022 · 3 comments

Comments

rbeezer commented Nov 3, 2022

I'm getting what I think is a stray non-breaking space in BRF output.

  1. I apply file2brl (Version 2.11.0) to an HTML file purpose-built for translation via this method.

  2. HTML contains

<div data-braille="tableofcontents">Contents</div>

  3. Semantic file contains

contentsheader div,data-braille,tableofcontents

  4. Output BRF has

,3t5ts

as the ToC header, where there is a single U+00A0 after the final "s" and before the newline. Clearly visible in my pager (less) and by other means.

I looked through the source but couldn't see where to make a change that I could test and then turn into a pull request.

Thanks for any help you can provide. This is causing me to use an incorrect encoding in a Python program that parses the BRF.

https://github.com/PreTeXtBook/pretext/blob/d402bdb3613d95984708150abe2fdb33123f565a/pretext/pretext.py#L2209
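As an interim workaround on the parsing side, a minimal sketch follows. It is not the code at the link above; the function name read_brf is hypothetical, and it assumes the stray character arrives either as the single byte 0xA0 or as the UTF-8 pair 0xC2 0xA0, with everything else in the BRF being plain ASCII.

```python
def read_brf(raw: bytes) -> str:
    """Decode BRF bytes as ASCII after dropping stray non-breaking spaces.

    Assumption: U+00A0 is the only non-ASCII character in the file, and it
    carries no meaning, so it is safe to discard.
    """
    # Remove the two-byte UTF-8 form first; removing the lone 0xA0 byte
    # first would leave an orphaned 0xC2 behind.
    cleaned = raw.replace(b"\xc2\xa0", b"").replace(b"\xa0", b"")
    # Any other unexpected byte raises UnicodeDecodeError here, which is
    # deliberate: it surfaces new problems instead of hiding them.
    return cleaned.decode("ascii")


if __name__ == "__main__":
    print(repr(read_brf(b",3t5ts\xa0\n")))
```

Decoding strictly as ASCII after the cleanup means the parser no longer needs a wider encoding just to tolerate this one character.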

@bertfrees bertfrees transferred this issue from liblouis/liblouis Nov 4, 2022
@bertfrees
Member

Hi Rob, I've transferred this issue to the liblouisutdml repository because I think it's unlikely that this is a Liblouis issue.

Perhaps what would help to track down this issue is a (minimal) test with input HTML, configuration files (ini, cfg and sem files), translation tables and command line arguments.

rbeezer commented Nov 4, 2022

Thanks, Bert. I forgot there are two repositories. :-( Of course, I should have been poking around in this one.

I'll dig a bit deeper, and as a last resort construct a minimal example.

rbeezer commented Feb 24, 2023

It is not visible here, but there is a non-breaking space (U+00A0) that is output immediately after Contents. So you will need to produce the output and examine the nature of the "extra" character.

Looks like the format centered in style contentsheader is to blame.

Minimal example attached.

contents-space.zip

Use

file2brl -f minimal.cfg source.html

Output is

                ,3t5ts 
,f/ ,divi.n






















                                      #a
  ,f/ ,divi.n
,"s 3t5t4






















                                      #a
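
Since the extra character does not show up when the output is pasted here, one way to examine it is to scan the raw bytes of the BRF file. The sketch below is only an illustration: find_non_ascii is a hypothetical helper, and the sample bytes are modeled on the first two lines of the output above under the assumption that the character is written as the single byte 0xA0.

```python
def find_non_ascii(raw: bytes):
    """Return (line_number, column, byte_value) for every byte >= 0x80."""
    hits = []
    for line_no, line in enumerate(raw.split(b"\n"), start=1):
        # Iterating over a bytes object yields integers in Python 3.
        for col, byte in enumerate(line, start=1):
            if byte >= 0x80:
                hits.append((line_no, col, byte))
    return hits


if __name__ == "__main__":
    # Hypothetical sample: 16 spaces of centering, the header, then 0xA0.
    sample = b" " * 16 + b",3t5ts\xa0\n,f/ ,divi.n\n"
    for line_no, col, byte in find_non_ascii(sample):
        print(f"line {line_no}, column {col}: 0x{byte:02X}")
```

To check a real file, pass it the result of open("source.brf", "rb").read(), where the file name is whatever file2brl was asked to write. Two adjacent hits of 0xC2 and 0xA0 would indicate UTF-8 output rather than a single-byte encoding.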
