Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement po_create()/po_update() for creating/updating translations #235

Merged
merged 36 commits into from
Nov 11, 2021

Conversation

hadley
Copy link
Collaborator

@hadley hadley commented Nov 1, 2021

I don't love the name tr_add(), and it's making me doubt that the developer/translator distinction is that useful, because this function might be run by the developer or an R-packages savvy translator.

I tweaked the .Rproj to make my usual roxygen2 workflow a little smoother; let me know if you want to revert.

I'm little surprised that potools doesn't already use msginit, so maybe I should instead copy the approach used in translate_package()?

R/msgmerge.R Outdated Show resolved Hide resolved
@hadley
Copy link
Collaborator Author

hadley commented Nov 1, 2021

Changed it to add or update, depending on whether or not the .po file already exists. That makes tr_add() and even worse name.

@hadley hadley changed the title Implement tr_add() for adding new translations Implement po_create()/po_update() for creating/updating translations Nov 4, 2021
@hadley
Copy link
Collaborator Author

hadley commented Nov 4, 2021

I think this is better as po_create() and po_update(). I think po_update() should update all languages if none supplied; but that needs code from #234, so I'll wait until you've have a chance to review that.

potools.Rproj Outdated Show resolved Hide resolved
@MichaelChirico
Copy link
Owner

I'm little surprised that potools doesn't already use msginit

There are a couple of gettext tools I would do well to integrate better with / provide good wrappers for...

R/msgmerge.R Outdated
#' new messages are added; and translations for deleted message are marked
#' as deprecated and moved to the bottom of the file.
#'
#' @param lang Language identifiers. These are typically two letters (e.g.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This basically repeats the description in ?translate_package... @inheritParams doesn't feel quite right because of how detailed we want to be... Maybe we need a ?language-code .Rd file we can refer to? @param languages Language identifiers. See [language-code] for details.?

@codecov-commenter

This comment has been minimized.

R/msgmerge.R Outdated Show resolved Hide resolved
R/msgmerge.R Outdated Show resolved Hide resolved
Merge commit '3849f477c94b95af34764c5c96b7c0fce284eedd'
And fundamentally change approach
@hadley
Copy link
Collaborator Author

hadley commented Nov 9, 2021

Ok, reimplemented basically from scratch; I'll add a bunch of tests shortly.

po_create("jp", verbose = TRUE)
Message <simpleMessage>
Updating 'jp' R translation
Running msgmerge on './po/R-jp.po' succeeded:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied this from #257, but it feels a bit wordy to me. What do you think about not showing the call by default?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm it's coming from verbose=TRUE (now the default), are you suggesting we flip the default, or that we implement verbose=0,1,2,...?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's necessary to routinely display a message for success. Adding levels of verbosity is unlikely to be worth the effort IMO, so I'd be in favour of verbose = FALSE by default and not passing in verbose from the wrapping function.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh that's not quite right; I think on success we want to see the stdout/stderr from the command line tool, but not the call.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tweaked the code to behave the way that I want just to give you a concrete proposal to look at. Let me know what you think.

@@ -300,7 +300,7 @@ format.po_metadata = function(x, template = FALSE, use_plurals = FALSE, ...) {
x$email = "EMAIL@ADDRESS"
x$language = ''
x$language_team = "LANGUAGE <[email protected]>"
x$charset = 'CHARSET'
x$charset = 'UTF-8'
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why this was CHARSET before, but I was getting an error from tools::checkPoFile() that it was failing to iconv CHARSET to UTF-8.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm note that .pot is not mentioned in ?tools::checkPoFile. CHARSET shows up in templates in base:

https://github.com/wch/r-source/blob/bf5885960808d7c539e74b238575f54de0faba04/src/library/base/po/R.pot#L17

https://github.com/wch/r-source/blob/bf5885960808d7c539e74b238575f54de0faba04/src/library/base/po/R-base.pot#L10

Here's what I'm seeing in the xgettext docs:

https://www.gnu.org/software/gettext/manual/gettext.html

MIME-Version, Content-Type, Content-Transfer-Encoding
These values are set according to the content of the POT file and the current locale. If the POT file contains charset=UTF-8, it means that the POT file contains non-ASCII characters, and we keep the UTF-8 encoding. Otherwise, when the POT file is plain ASCII, we use the locale’s encoding.

and

Content-Type
Replace ‘CHARSET’ with the character encoding used for your language, in your locale, or UTF-8. This field is needed for correct operation of the msgmerge and msgfmt programs, as well as for users whose locale’s character encoding differs from yours (see Charset conversion).

In the gettext sources, here is where xgettext initially populates the default CHARSET:

https://github.com/autotools-mirror/gettext/blob/030c0341a4a0ba6ad7fe62e83ff663bdc76cbe4d/gettext-tools/src/xgettext.c#L2015-L2028

Here write-po treats CHARSET as ASCII:

https://github.com/autotools-mirror/gettext/blob/030c0341a4a0ba6ad7fe62e83ff663bdc76cbe4d/gettext-tools/src/write-po.c#L1650-L1653

This snippet makes me thing setting UTF-8 is probably OK, since the default CHARSET has the probably-in-general faulty assumption that the strings in the .pot file are all-ASCII:

https://github.com/autotools-mirror/gettext/blob/030c0341a4a0ba6ad7fe62e83ff663bdc76cbe4d/gettext-tools/src/po-charset.c#L490-L509

That said, current r-devel .pot are all-ASCII:

grep -rP "[\x80-\xFF]" ~/svn/R-devel/ --include=*.pot
# <empty>

CRAN would require this to hold true for all packages at the R level, and probably the C level?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could probably patch tools::checkPoTools to accept the CHARSET ➡️ ASCII logic, but I'm not sure that would solve all of the issues of using checkPoTools() on .pot files when it was designed for .po.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'll back this change out here and we can discuss more thoroughly in a separate issue. I think potools should assume all .pot and .po files are UTF-8 and then fix any downstream problems that causes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, backing out this change doesn't break any of my tests (or interactive usage) so definitely doesn't need to be here.

R/msgmerge.R Outdated Show resolved Hide resolved
R/msgmerge.R Show resolved Hide resolved
R/po_create.R Outdated
#'
#' `po_create()` creates a new `po/{languages}.po` containing the messages to be
#' translated. If a translation already exists, it'll be updated with any
#' changes to the `.pot` since it was last touched.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we clarify what happens to existing translations here too? Details may be overkill, but mentioning fuzzying seems appropriate.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll put most of the details in po_update() and then link to there from here.

I think the updating here is mostly incidental, just because if you use msginit without checking, it will reset the existing translations, which I don't think you ever want. But at the same time, the main point of this function is not the updating, so I don't want to draw too much attention to it.

R/msgmerge.R Show resolved Hide resolved
@@ -300,7 +300,7 @@ format.po_metadata = function(x, template = FALSE, use_plurals = FALSE, ...) {
x$email = "EMAIL@ADDRESS"
x$language = ''
x$language_team = "LANGUAGE <[email protected]>"
x$charset = 'CHARSET'
x$charset = 'UTF-8'
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm note that .pot is not mentioned in ?tools::checkPoFile. CHARSET shows up in templates in base:

https://github.com/wch/r-source/blob/bf5885960808d7c539e74b238575f54de0faba04/src/library/base/po/R.pot#L17

https://github.com/wch/r-source/blob/bf5885960808d7c539e74b238575f54de0faba04/src/library/base/po/R-base.pot#L10

Here's what I'm seeing in the xgettext docs:

https://www.gnu.org/software/gettext/manual/gettext.html

MIME-Version, Content-Type, Content-Transfer-Encoding
These values are set according to the content of the POT file and the current locale. If the POT file contains charset=UTF-8, it means that the POT file contains non-ASCII characters, and we keep the UTF-8 encoding. Otherwise, when the POT file is plain ASCII, we use the locale’s encoding.

and

Content-Type
Replace ‘CHARSET’ with the character encoding used for your language, in your locale, or UTF-8. This field is needed for correct operation of the msgmerge and msgfmt programs, as well as for users whose locale’s character encoding differs from yours (see Charset conversion).

In the gettext sources, here is where xgettext initially populates the default CHARSET:

https://github.com/autotools-mirror/gettext/blob/030c0341a4a0ba6ad7fe62e83ff663bdc76cbe4d/gettext-tools/src/xgettext.c#L2015-L2028

Here write-po treats CHARSET as ASCII:

https://github.com/autotools-mirror/gettext/blob/030c0341a4a0ba6ad7fe62e83ff663bdc76cbe4d/gettext-tools/src/write-po.c#L1650-L1653

This snippet makes me thing setting UTF-8 is probably OK, since the default CHARSET has the probably-in-general faulty assumption that the strings in the .pot file are all-ASCII:

https://github.com/autotools-mirror/gettext/blob/030c0341a4a0ba6ad7fe62e83ff663bdc76cbe4d/gettext-tools/src/po-charset.c#L490-L509

That said, current r-devel .pot are all-ASCII:

grep -rP "[\x80-\xFF]" ~/svn/R-devel/ --include=*.pot
# <empty>

CRAN would require this to hold true for all packages at the R level, and probably the C level?

@@ -300,7 +300,7 @@ format.po_metadata = function(x, template = FALSE, use_plurals = FALSE, ...) {
x$email = "EMAIL@ADDRESS"
x$language = ''
x$language_team = "LANGUAGE <[email protected]>"
x$charset = 'CHARSET'
x$charset = 'UTF-8'
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could probably patch tools::checkPoTools to accept the CHARSET ➡️ ASCII logic, but I'm not sure that would solve all of the issues of using checkPoTools() on .pot files when it was designed for .po.

po_create("jp", verbose = TRUE)
Message <simpleMessage>
Updating 'jp' R translation
Running msgmerge on './po/R-jp.po' succeeded:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm it's coming from verbose=TRUE (now the default), are you suggesting we flip the default, or that we implement verbose=0,1,2,...?

R/utils.R Outdated
@@ -167,3 +167,19 @@ is_outdated <- function(src, dst) {
}

is_testing = function() identical(Sys.getenv("TESTTHAT"), "true")

local_test_package <- function(..., .envir = parent.frame()) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

love the look of this, and looks like a great redesign of generating test packages (the current suite is pretty unwieldy/arbitrarily organized). a good companion would be local_translation_conn() to keep translations right next to where they'll be used too.

but feels like it should be in tests/testthat/helper.R, no?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've generally been moving away from using helper files, because it seems more natural just to keep all code in one place. Additionally, as these local_ type helpers grow more complex, it's not crazy that you might also want to test them.

That said, I don't feel particularly strongly about it, and since you already have helper.R, that's a more sensible home, so I'll move it there.

}

}
po_prefix <- function(type = c("R", "src")) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the one place this will break is for base, which uses R-base.pot for R and R.pot for src. I'll just add a TODO for now to make sure that case is handled later.

#' @export
po_create <- function(languages, dir = ".", verbose = !is_testing()) {
package <- get_desc_data(dir, "Package")
po_files <- po_language_files(languages, dir)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might be nice to warn() here if e.g. R-pkg.pot is present but pkg.pot is not & the package has a src/ directory.

OTOH, if the translation is handled entirely by potools, this should only happen if src/ doesn't have any messages to translate, right? So maybe it will be a warning highly prone to false positives...

skipping for now, but file a follow-up issue if you think the warning is worthwhile.

@MichaelChirico MichaelChirico merged commit e9f85ff into MichaelChirico:master Nov 11, 2021
@MichaelChirico
Copy link
Owner

Looking great, thanks!

@hadley hadley deleted the add-translation branch November 11, 2021 12:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants