3 proposals for the POT exporter #102

Open
CodeSpartan opened this issue Nov 28, 2021 · 3 comments
Labels
third-party feature This feature was contributed by someone else and depends on the contributor to respond/fix

Comments


CodeSpartan commented Nov 28, 2021

I came across a few cases where the POT exporter lacked functionality that I needed. I'm going to try to implement it myself, although I haven't yet researched whether everything listed below is technically possible.
Still, before I proceed, I'd like to lay it all out in the open, so it can be discussed and altered if necessary.

Proposal 1

Right now it's unclear how to proceed if one English word has two different translations depending on context.
First, there will be no context at all once issue #100 is fixed.
Second, even while we still have it, the exporter skips repeated occurrences of the same string.

A tentative solution is to add the following command line options:
--add-comments[=keyword]
--add-context[=keyword]

If the gathered string is directly preceded by comment lines in the Dart source, we'll check whether they begin with the translator or context keyword. By default, it should look like:

// translator: ...
// context: ...

Same words with different context should be exported separately.

The keywords can be altered in the command line.

This would also fix #95
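To make the idea concrete, here is a minimal sketch of the keyword matching only, assuming the comment lines directly above the string have already been collected. The names `ExtractedNotes` and `parseNotes` are hypothetical and not part of the exporter; the keyword defaults mirror the proposal above.

```dart
/// Notes found in the comment lines directly above a translatable string.
class ExtractedNotes {
  final String? translatorComment;
  final String? context;
  const ExtractedNotes({this.translatorComment, this.context});
}

/// Reads `// <keyword>: text` lines. The two keyword parameters stand in for
/// the proposed --add-comments and --add-context options (hypothetical).
ExtractedNotes parseNotes(
  List<String> commentLines, {
  String commentKeyword = 'translator',
  String contextKeyword = 'context',
}) {
  String? comment;
  String? context;
  for (final raw in commentLines) {
    final line = raw.trim();
    if (!line.startsWith('//')) continue;
    final body = line.substring(2).trim();
    if (body.startsWith('$commentKeyword:')) {
      comment = body.substring(commentKeyword.length + 1).trim();
    } else if (body.startsWith('$contextKeyword:')) {
      context = body.substring(contextKeyword.length + 1).trim();
    }
  }
  return ExtractedNotes(translatorComment: comment, context: context);
}
```

Lines that don't start with one of the two keywords are simply ignored, so ordinary code comments above a string wouldn't leak into the POT file.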

Proposal 2

"Mob: %s".plural(length).fill([length]) currently exports the following:

#: ./lib\peace_screen_widgets\mob_widgets.dart:33
msgid "Mob: %s"
msgid_plural "Mob: %s"
msgstr[0] ""

While what it should be able to export is this:

#: ./lib\peace_screen_widgets\mob_widgets.dart:33
msgid "Mob: %s"
msgid_plural "Mobs: %s"
msgstr[0] ""

POEdit supports plurals, but whatever is gathered is set in stone and the English plural form can't be changed, so it has to be set in the code. In POEdit it looks like this when the exports are manually corrected:

'poedit image'

Importing this back works correctly with other locales, but the English version keeps displaying "Mob" and not "Mobs". I'm assuming it's a bug?

Proposed syntax:
"Mob: %s".plural("Mobs: %s", length)

This would fix the problem if you're translating from English.

(Obviously, if we're developing the software in a language other than English, one with different plural forms, we'll eventually need to be able to define those plural forms directly in the code. But my suggestion for now is to at least make English-to-anything localizations work. Anything-to-anything can be taken care of later.)
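On the export side, the desired output could be produced by something like the sketch below. It only renders the entry text and follows the shape of the export shown above (a single empty msgstr[0]). `poEscape` and `potPluralEntry` are hypothetical names, not existing exporter functions.

```dart
/// Escapes a string for use inside a double-quoted PO/POT value.
String poEscape(String s) => s
    .replaceAll('\\', r'\\')
    .replaceAll('"', r'\"')
    .replaceAll('\n', r'\n');

/// Renders one plural entry with distinct singular and plural source strings.
String potPluralEntry({
  required String reference,
  required String singular,
  required String plural,
}) {
  return [
    '#: $reference',
    'msgid "${poEscape(singular)}"',
    'msgid_plural "${poEscape(plural)}"',
    'msgstr[0] ""',
  ].join('\n');
}
```

With the proposed syntax, the exporter would pass the receiver string as `singular` and the first argument of `.plural(...)` as `plural`.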

Proposal 3

Gender is not supported for export. POT has no notion of gender, but we can make do. Most gendered languages need two to four forms, e.g. masculine, feminine, neuter and plural.

My proposal is to add a .gender() suffix. When it's encountered, the string gets exported once per gender, with the gender prepended to the context.
The number of genders (and their names) can be defined in a command line parameter as follows: --genders[=gen1,gen2]
Default values are [masculine, feminine]

This means that we need to be able to define English genders in the code, too.
My proposal is to use the first one as default, and the other ones are specified as parameters by order of appearance.

Example 1:

"He's doing his job".gender("She's doing her job", "They're doing their job", genderIndex)

With --genders=masculine,feminine,plural, it would produce:

#: ./lib\peace_screen_widgets\mob_widgets.dart:33
msgctxt "masculine"
msgid "He's doing his job"

#: ./lib\peace_screen_widgets\mob_widgets.dart:33
msgctxt "feminine"
msgid "She's doing her job"

#: ./lib\peace_screen_widgets\mob_widgets.dart:33
msgctxt "plural"
msgid "They're doing their job"

Example 2:

"strong".gender()

With three genders configured (here masculine, feminine and neuter), this would produce the same English text three times, but it lets us localize it differently in languages where adjectives are inflected for gender.

#: ./lib\peace_screen_widgets\mob_widgets.dart:33
msgctxt "masculine"
msgid "strong"

#: ./lib\peace_screen_widgets\mob_widgets.dart:33
msgctxt "feminine"
msgid "strong"

#: ./lib\peace_screen_widgets\mob_widgets.dart:33
msgctxt "neuter"
msgid "strong"
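Both examples boil down to the same expansion step, which could be sketched roughly as follows. `genderEntries` is a hypothetical helper; it assumes the source forms are given in the same order as the configured genders and reuses the first form when fewer forms than genders are supplied. Escaping of quotes inside the strings is left out for brevity.

```dart
/// Builds one POT entry per gender, using the gender name as msgctxt.
/// [forms] holds the source-language variants in the same order as [genders];
/// when fewer forms than genders are given, the first form is reused.
List<String> genderEntries({
  required String reference,
  required List<String> forms,
  List<String> genders = const ['masculine', 'feminine'],
}) {
  if (forms.isEmpty) {
    throw ArgumentError('at least one source form is required');
  }
  return [
    for (var i = 0; i < genders.length; i++)
      [
        '#: $reference',
        'msgctxt "${genders[i]}"',
        'msgid "${i < forms.length ? forms[i] : forms.first}"',
        'msgstr ""',
      ].join('\n'),
  ];
}
```

Calling it with a single form and three genders reproduces Example 2; calling it with three forms reproduces Example 1.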

Let me know what you think.

Contributor

bauerj commented Nov 29, 2021

Hey,

thanks for bringing this up! I know gettext has a lot of features that we currently don't use and it probably makes sense to implement them.

Proposal 1

I don't really see a problem here, although of course the code structure would have to be changed in some cases:

List? choices;
// this
choices = ["can".i18n, "bottle".i18n];
// becomes this:
choices = [
  // translators: Like a can of soda, not like "Yes we can".
  "can".i18n, 
  "bottle".i18n,
];
// or (although a bit ugly)
choices = [/* translators: Like a can of soda, not like "Yes we can". */"can".i18n, "bottle".i18n];

The implementation would have to make sure that stuff like this works too:

// translators: xyz
var a = "abc".i18n;

So ideally, the exporter would scan for the next translatable string after the comment and print warnings if too many lines (more than 3?) are in between.

These features should probably be enabled by default with a sensible comment prefix (like translators: and translation-context:).
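As a rough illustration of the "scan ahead, warn if too far" idea, here is a line-based sketch. It uses a simplified regular expression rather than the Dart AST, so it is only an approximation (no handling of escaped quotes or multi-line strings), and `nextTranslatableLine` is a hypothetical name.

```dart
/// Matches a quoted string literal followed by `.i18n` (simplified).
final _i18nLiteral = RegExp(r'''(["'])(.*?)\1\.i18n''');

/// Returns the 0-based index of the first line at or after [commentLine]
/// that contains a translatable string, or null if none is found within
/// [maxGap] lines of the comment. A null result is where the exporter
/// could print a warning.
int? nextTranslatableLine(
  List<String> lines,
  int commentLine, {
  int maxGap = 3,
}) {
  final last = commentLine + maxGap;
  for (var i = commentLine; i < lines.length && i <= last; i++) {
    if (_i18nLiteral.hasMatch(lines[i])) return i;
  }
  return null;
}
```

Starting the scan at the comment line itself also covers the inline `/* translators: ... */"can".i18n` form.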

Proposal 2

I don't understand the bug you're describing. Could you please elaborate? Do you import the translation template back as a translation for English?

Also, if you propose to change the library syntax @marcglasberg should comment on this as well.

Proposal 3

I guess you mean something like "strong".gender(genderIndex)? Having to remember the order in which genders are defined is not very intuitive. Although I don't have a better idea offhand.


If you plan on tackling this, please send an individual pull request for each proposal.

Author

CodeSpartan commented Nov 29, 2021

Regarding proposal 1

A problem here is that UnifyingAstVisitor's void visitComment(Comment node) doesn't actually visit non-documentation comments, as discussed here. TL;DR of their discussion:

class Test {
   /// Documentation comment VISITED by `visitComment`
   void testFunc() {
      // Non-documentation comment NOT visited by `visitComment`
      print('hello'); // end-of-line comment NOT visited by `visitComment`
    }
}

And the only way to reach them is through a token's precedingComments. Not just any token on the line will do: only the token that begins the uppermost node on that line carries them.
Because the AST has no notion of a source line, retrieving these comments would be so hacky and computationally expensive that I just don't think it's worth it.

So here's where we're at:

appBar: AppBar(title: Text(/* we can get this comment */ "i18n Demo".i18n))

appBar: AppBar(
          title: Text(
              // we can get this comment, too
              "i18n Demo".i18n))

// we can't get this one
appBar: AppBar(title: Text("i18n Demo".i18n))

To make the last one work would require something along the lines of:

  • working from a literal string, we recursively check if its parent is still on our line using the hacky
    var lineNo = "\n".allMatches(source.substring(0, node.offset)).length + 1; many times.
  • once we're no longer on our line, the previous node was the uppermost parent on our line, and we can finally check whether it has parent.beginToken.precedingComments
  • And somewhere in there, we must ALSO check if our literal string was the FIRST literal string gathered by i18n on this particular line... Yeah, that's some gymnastics.

I think the algorithm described above could slow down string gathering, so I'd prefer to avoid it. Let me know what you think.
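For what it's worth, the repeated substring scan is the expensive part, and that part could be avoided by indexing line starts once per file. The sketch below is plain Dart with a hypothetical `LineIndex` class; it returns the same 1-based line number as the `allMatches` expression above. The analyzer package's own LineInfo, reachable from the compilation unit, appears to offer an equivalent lookup, but I haven't verified how it fits into the exporter.

```dart
/// Maps character offsets to 1-based line numbers after a single pass
/// over the source, instead of re-scanning the prefix for every node.
class LineIndex {
  final List<int> _lineStarts = [0];

  LineIndex(String source) {
    for (var i = 0; i < source.length; i++) {
      // 0x0A is '\n'; the next line starts right after it.
      if (source.codeUnitAt(i) == 0x0A) _lineStarts.add(i + 1);
    }
  }

  /// Binary search for the last line start that is <= [offset].
  int lineOf(int offset) {
    var low = 0;
    var high = _lineStarts.length - 1;
    while (low < high) {
      final mid = (low + high + 1) ~/ 2;
      if (_lineStarts[mid] <= offset) {
        low = mid;
      } else {
        high = mid - 1;
      }
    }
    return low + 1;
  }
}
```

Each lookup is then logarithmic in the number of lines, so walking up the parents of every gathered string should stay cheap. It doesn't remove the "first string on the line" gymnastics, though.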

Regarding proposal 2

Maybe it's not a bug, maybe I'm just doing something wrong?

I have this line in my code

"Mob: %s".plural(length).fill([length])

Currently, it exports this:

#: ./lib\peace_screen_widgets\mob_widgets.dart:33
msgid "Mob: %s"
msgid_plural "Mob: %s"
msgstr[0] ""

In POEdit it looks like this, where the English plural is defined incorrectly and it can't be changed in POEdit.
poedit_screenshot1

Now let's go into strings.pot and manually fix the plural form.

#: ./lib\peace_screen_widgets\mob_widgets.dart:33
msgid "Mob: %s"
msgid_plural "Mobs: %s"
msgstr[0] ""

Now in POEdit it looks correct:
poedit_screenshot2

The behavior in the app is this: if we switch to Russian, it correctly uses the singular form or one of the two plural forms depending on the passed number. But in English it always uses the singular form. This discrepancy is what I call a bug.

So there are two problems here:

  1. The exporter can't export the English plural form to the POT file.
  2. Even when we manually edit the POT file, the code doesn't use the English plural form. (well, duh, it doesn't re-import the POT file)

I can define English plural elsewhere, but it doesn't get gathered:

  static final _temp = Translations("en") +
      {
        "en": "Mob: %s".one("Mob: %s").many("Mobs: %s")
      };

Maybe it's not a bug after all. It's just not POT friendly.

Author

CodeSpartan commented Nov 30, 2021

I've run into an issue with the first proposal.

I am correctly exporting comments and context from code to POT.

Text(
	// context: some context
	"i18n Demo".i18n),
Text(
	// no context!
	"i18n Demo".i18n),

According to the POT specs, these are two distinct strings, because the uniqueness of a string is a combination of string+context.

So I export them separately:

#: ./lib\example1\main.dart:79
msgctxt "some context"
msgid "i18n Demo"
msgstr ""

#: ./lib\example1\main.dart:82
msgid "i18n Demo"
msgstr ""

And then I translate them. In fr.po, they're like this:

#: lib\example1\main.dart:79
msgctxt "some context"
msgid "i18n Demo"
msgstr "i18 demo (some context)"

#: lib\example1\main.dart:82
msgid "i18n Demo"
msgstr "i18 demo (no context)"

But in the app, after loading the fr.po file and setting the locale, the result is this:

Unsurprising, now that I think about it. I'm not very good at reading other people's complicated code, but I'm guessing that this library simply doesn't support this kind of string uniqueness? The key here is always the string itself, and the library can't even be modified to accommodate the needs of the POT format. @marcglasberg can you please confirm my thoughts? Or is it something that can be worked around somehow?
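For reference, GNU gettext handles this by joining msgctxt and msgid with an EOT character (U+0004) to form the lookup key in compiled catalogs. A minimal sketch of that keying scheme is below. `messageKey` and `lookup` are hypothetical names, and whether the library's translation map could adopt such keys is exactly the open question.

```dart
/// Separator gettext places between context and msgid in compiled catalogs.
const contextSeparator = '\u0004';

/// Builds the lookup key for a message, with or without a context.
String messageKey(String msgid, {String? context}) =>
    context == null ? msgid : '$context$contextSeparator$msgid';

/// Looks up a translation by (context, msgid), falling back to the
/// untranslated msgid when the catalog has no matching entry.
String lookup(Map<String, String> catalog, String msgid, {String? context}) =>
    catalog[messageKey(msgid, context: context)] ?? msgid;
```

With keys built this way, the two "i18n Demo" entries from fr.po would no longer collide, because only one of them carries the "some context" prefix.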

@marcglasberg marcglasberg added the third-party feature This feature was contributed by someone else and depends on the contributor to respond/fix label Nov 15, 2022