File names for translated vignettes? #6221

Open
tdhock opened this issue Jul 4, 2024 · 14 comments
Assignees
Labels
translation issues/PRs related to message translation projects

Comments

@tdhock
Member

tdhock commented Jul 4, 2024

Hi! @Rdatatable/french is planning to translate the vignettes into French.
What should the file names for the translated vignettes be?

in the existing directory:
datatable-intro.Rmd
datatable-intro-fr.Rmd
datatable-intro-de.Rmd
datatable-intro-pl.Rmd
....

or:
vignettes/fr
datatable-intro.Rmd
...
vignettes/de
datatable-intro.Rmd
...
vignettes/pl
datatable-intro.Rmd
...

or:
vignettes/po/fr
datatable-intro.Rmd
...
vignettes/po/de
datatable-intro.Rmd
...

vignettes/po/pl
datatable-intro.Rmd
...

or: other ???

My tendency would be to do it in the existing directory. I'm not sure if sub-directories are possible?
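
To make the first two options concrete, here is a minimal R sketch that maps a vignette and a language code to a path under each layout, and recovers the language from a suffixed file name. The helper names (`path_suffix`, `path_subdir`, `lang_from_suffix`) are made up for illustration and are not part of data.table or R; the sketch also assumes two-letter lowercase language codes.

```r
# Hypothetical helpers, for illustration only.

# Layout 1: locale suffix in the existing vignettes/ directory.
path_suffix <- function(vignette, lang = NULL) {
  if (is.null(lang)) return(file.path("vignettes", vignette))
  stem <- tools::file_path_sans_ext(vignette)
  ext <- tools::file_ext(vignette)
  file.path("vignettes", sprintf("%s-%s.%s", stem, lang, ext))
}

# Layout 2: one sub-directory per language.
path_subdir <- function(vignette, lang = NULL) {
  if (is.null(lang)) return(file.path("vignettes", vignette))
  file.path("vignettes", lang, vignette)
}

# Recover the language from a suffixed file name; NA for the original.
# Assumes two-letter lowercase codes, so an untranslated vignette whose
# name happens to end in "-xx.Rmd" would be misread as a translation.
lang_from_suffix <- function(path) {
  name <- basename(path)
  m <- regmatches(name, regexec("-([a-z]{2})\\.Rmd$", name))[[1]]
  if (length(m) == 2) m[2] else NA_character_
}

path_suffix("datatable-intro.Rmd", "fr")
path_subdir("datatable-intro.Rmd", "fr")
lang_from_suffix("vignettes/datatable-intro-fr.Rmd")
```

The suffix layout needs a naming rule that cannot collide with existing vignette names, whereas the sub-directory layout keeps the original file names unchanged.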

@ChristianWia

An advantage of that would be having the locale directly in the file name.

@tdhock tdhock added the translation issues/PRs related to message translation projects label Jul 5, 2024
@tdhock tdhock self-assigned this Jul 5, 2024
@MichaelChirico
Member

@eliocamp is this something in scope for the documentation working group? Are there any preliminary recommendations you can suggest here?

@eliocamp
Contributor

eliocamp commented Jul 5, 2024

Yes, it is very much in scope. No, I have no recommendations yet 🥲.

@phgrosjean
Contributor

Perhaps we should also test the various options with {pkgdown} to see which one presents best. With everything in the existing directory, I am afraid we would end up with a long list of vignettes.

@leofontenelle
Contributor

On Linux, it's usually something like help/appname/(C|fr|zh_CN|pt_BR)/the actual documentation. Examples: LibreOffice documentation, GNOME user documentation, a KDE app complete with its own documentation, GNU/Linux man pages. For the C-locale man pages, there's no C, en or en-US directory.

If there were some way to make .pot files out of vignettes, I guess each vignette would be its own domain, all .po files would be in po/ in the source code, and the directory structure above would be created at compile time.
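
As a rough illustration of how such a help/appname/locale tree is typically consulted, the R sketch below derives a fallback chain from a locale string and returns the first directory that exists. Both function names are hypothetical, and the fallback order (full locale, then bare language, then C) is an assumption modelled on common Linux practice rather than anything R does for vignettes today.

```r
# Hypothetical helper: candidate directory names for a locale string,
# most specific first, ending with the untranslated "C" fallback.
locale_candidates <- function(locale) {
  # Drop encoding and modifier, e.g. "pt_BR.UTF-8@euro" -> "pt_BR"
  base <- sub("[.@].*$", "", locale)
  if (!nzchar(base) || base %in% c("C", "POSIX")) return("C")
  lang <- sub("_.*$", "", base)  # "pt_BR" -> "pt"
  unique(c(base, lang, "C"))
}

# Return the first existing <root>/help/<appname>/<locale> directory, or NA.
find_doc_dir <- function(root, appname, locale = Sys.getenv("LANG")) {
  dirs <- file.path(root, "help", appname, locale_candidates(locale))
  hit <- dirs[dir.exists(dirs)]
  if (length(hit)) hit[1] else NA_character_
}

locale_candidates("pt_BR.UTF-8")
```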

@ChristianWia

ChristianWia commented Jul 5, 2024

Maybe a solution could be built around the Python package mdpo (pip install mdpo), which provides md2po and the reverse po2md to get the translated vignette back. I still need to investigate how much information we lose.

Commands:

md2po datatable-intro-fr.Rmd --quiet --save --po-filepath e:/datatable-intro-fr.po
po2md e:/datatable-intro-fr.Rmd --pofiles e:/datatable-intro-fr.po --save e:/datatable-intro-fr2.Rmd

Issue: mondeja/md-ulb-pwrap#7

My tests are ongoing: https://github.com/ChristianWia/vignettes

@eliocamp
Contributor

eliocamp commented Jul 6, 2024

In our work to get multilingual documentation, we are thinking that the translated documentation would live in its own translation module. So the French helpfiles for data.table would be in a package called data.table.fr (or whatever, the name is not important) and any user who would like to see the documentation in that language would install it. The idea is that this would decouple translations from the original package and users won't need to get translations in a language they don't need.

I think that model might also work for vignettes.

@phgrosjean
Contributor

OK, I see. Then, in the meantime, we could place the French vignette translations in an 'fr' subdirectory. Once that new mechanism is available, we could easily create the corresponding repository and transfer these files.
Two questions to @eliocamp:

  1. {data.table.fr} should be versioned, right? There should be a version for each version of the original {data.table} package. Otherwise, there is a risk of a wrong man page or vignette. This multiplies the maintenance work on many packages, but it is currently the case for the translation teams anyway. What happens when there is no version concordance between {data.table} and {data.table.fr}? A fallback to the closest available version and a warning on the top of the translated man page, or what?

  2. Should the end user call library(data.table.fr) instead of library(data.table)? (this could be a problem with code shared in a multinational context), or {data.table.fr} is detected and used automatically by {data.table}, depending on something like Sys.getenv("LANG")?
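
For what it's worth, the second alternative in question 2 could be sketched in R as below. This is only a guess at the mechanism: `translation_module` is a hypothetical helper, the `<pkg>.<lang>` naming follows the {data.table.fr} example above, and treating `en` as "no translation needed" is my assumption.

```r
# Hypothetical: name of the translation module for a package, or NULL when
# no translation applies or none is installed.
translation_module <- function(pkg,
                               lang = Sys.getenv("LANG"),
                               is_installed = function(p) nzchar(system.file(package = p))) {
  code <- tolower(sub("[_.@].*$", "", lang))  # "fr_FR.UTF-8" -> "fr"
  if (!nzchar(code) || code %in% c("c", "posix", "en")) return(NULL)
  module <- paste(pkg, code, sep = ".")
  if (is_installed(module)) module else NULL
}

# The user still calls library(data.table); only the help lookup would
# consult something like this behind the scenes.
translation_module("data.table", "fr_FR.UTF-8")
```

With this shape, shared code stays identical across countries, and a missing translation module silently falls back to the original documentation.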

@eliocamp
Contributor

eliocamp commented Jul 6, 2024

  1. For documentation, there's no need to version the translation module with the original package, because the replacement is done based on the strings and the structure of the Rd file. So as long as the documentation doesn't change, the translation stays current and useful. Still, we might want to state which version is being translated (with a special field in the DESCRIPTION). I don't know if vignettes can be translated using exactly this system, but the translation module version shouldn't be tied to the original package version (the package version could change without meaningful changes in the documentation, and the translation module might be updated independently).

  2. The latter. The user never has to load the translation module directly.
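
To illustrate why string-based replacement is insensitive to the package version, here is a minimal R sketch. It is not the actual rhelpi18n mechanism: `translate_strings` and the example strings are invented, and falling back to the original text for unmatched strings is an assumption.

```r
# Hypothetical translation table: original string -> translated string.
translations <- c(
  "Fast data manipulation" = "Manipulation rapide des donnees",
  "See also"               = "Voir aussi"
)

# Replace every string that has an exact match in the table and leave
# the others untouched (assumed fallback: show the original text).
translate_strings <- function(strings, table) {
  hit <- match(strings, names(table))
  out <- strings
  out[!is.na(hit)] <- unname(table[hit[!is.na(hit)]])
  out
}

# A later package release where one sentence was reworded: the unchanged
# strings stay translated, only the reworded one falls back.
doc_new_release <- c("Fast data manipulation", "A reworded sentence", "See also")
translate_strings(doc_new_release, translations)
```

Under this model a version bump that leaves the text alone needs no new release of the translation module, which matches point 1.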

@eliocamp eliocamp closed this as completed Jul 6, 2024
@eliocamp eliocamp reopened this Jul 6, 2024
@MichaelChirico
Member

As long as it actually works, I agree that putting it in subdirectories is the way to go for now. Please pass along any lessons learned in this process to the R documentation working group team. @eliocamp, would https://github.com/RConsortium/multilingual-documentation-wg or https://github.com/eliocamp/rhelpi18n be the better place? (I assume the former.)

@eliocamp
Contributor

Yes, I think these high-level discussions are better had in https://github.com/RConsortium/multilingual-documentation-wg

@ChristianWia

Impact of LOCALE on vignette translation

If we agree the translated vignette is a clone of the EN one (same YAML, same skeleton), several elements should be considered during translation.

1 Vignettes not using common resources:
No problem: translate only the .Rmd.
The vignette can live at any path (once the directories are defined).

2 Vignettes using common resources:
Ex: datatable-sd-usage.Rmd uses the directories ./css and ./plots.
In this case the translated vignette should sit alongside the others to benefit from the same directory structure.

2.1 Access to CSS:
CSS is shared between vignettes to get the menu and section numbering (today).
It is independent of the locale, but perhaps not always, if we consider LTR and RTL scripts.

2.2 Access to images and other media:

2.2.1 Media not relying on the locale:
This is the case for images without EN text.
No problem: keep the existing EN transclusions.

2.2.2 Media depending on the locale:
This is the case for diagrams, architectures, interfaces, flows, spreadsheets... containing EN text.
Moreover, if the .Rmd describes what is on the image, both must be coherent.

2.2.2.1 The .Rmd does not describe the image:
No problem: keep the EN image.

2.2.2.2 .Rmd describes the image

2.2.2.2.1 either we keep the EN image :
In this case the translated text should use the EN terms for coherence.

2.2.2.2.2 or we create a Locale image (*) :
In this case the .Rmd should use the Locale terms of the translated image for coherence.

(*) I think it is possible to modify the contents of an .svg to translate the displayed text (to investigate)
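
One way cases 2.2.2.2.1 and 2.2.2.2.2 could coexist is a lookup that prefers a localized image and otherwise keeps the EN one. The R sketch below assumes a hypothetical layout in which `plots/<lang>/<file>` holds the localized copy of `plots/<file>`; neither the layout nor the `localized_image` helper exists in data.table.

```r
# Hypothetical layout: <dir>/<lang>/<file> is the localized copy of <dir>/<file>.
# Returns the localized path when that file exists under `root`,
# otherwise the original (EN) path.
localized_image <- function(path, lang, root = ".") {
  candidate <- file.path(dirname(path), lang, basename(path))
  if (file.exists(file.path(root, candidate))) candidate else path
}

# In a translated .Rmd one could then write, for example:
# knitr::include_graphics(localized_image("plots/some-figure.png", "fr"))
```

The translated text then has to use the terms of whichever image is actually found, which is the coherence constraint described above.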

@aitap
Contributor

aitap commented Sep 4, 2024

Part of the appeal of vignettes is that they are already part of the package and accessible offline.

Some very approximate testing shows that enabling the French vignettes (#6455) to render adds more than 30% to the R CMD build time (which may not be that much of a problem because make build skips the vignettes) and less than 15% to the R CMD check time (which takes away from the gains for #6400). The absolute increase for R CMD check is larger because it both weaves the vignettes and re-runs the tangled scripts, but not twice as large because not all vignettes have code in them. The relative increase for R CMD build will be much more dramatic with MAKEFLAGS=-j$(nproc).

[Figure: french-vignettes (bar chart of build and check times)]

Caching the R results in the English vignettes for reuse in the translations would be hard to implement and is unlikely to help much: time is also spent inside knitr, rmarkdown and pandoc.

If the translated vignettes are included in the data.table package, the sorting order of the vignettes may also become a problem. (Should the translation be sorted near the original? Should the vignettes in the same language be sorted together?) Without \VignetteIndex, vignette files would have to be renamed to achieve the desired order.
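
The two candidate orderings can be written down explicitly. The R sketch below uses an invented file list (the `-fr` names are illustrative, not existing files) and only shows the orderings themselves, not how R or pkgdown would be made to honour them.

```r
# Illustrative vignette list; the "-fr" files are hypothetical.
vignettes <- data.frame(
  file = c("datatable-intro.Rmd", "datatable-intro-fr.Rmd",
           "datatable-faq.Rmd", "datatable-faq-fr.Rmd"),
  lang = c("en", "fr", "en", "fr")
)
# Name of the original vignette each file belongs to.
vignettes$stem <- sub("(-fr)?\\.Rmd$", "", vignettes$file)

# Option A: each translation sorted right after its original.
by_original <- vignettes$file[order(vignettes$stem, vignettes$lang, method = "radix")]

# Option B: all vignettes of one language sorted together.
by_language <- vignettes$file[order(vignettes$lang, vignettes$stem, method = "radix")]

by_original
by_language
```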

raw script
# this was without OPENBLAS_NUM_THREADS=1, so the CPU load is above normal
# most of the process doesn't use linear algebra anyway
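# Extract every "M:SS.ss elapsed" field from the time output and convert it to seconds
elapsed <- \(s) {
 # rows 2:3 of the match matrix are the minutes and seconds capture groups
 s <- regmatches(s, gregexec('(\\d+):([\\d.]+)elapsed', s, perl = TRUE))[[1]][2:3,]
 as.numeric(s[1,])*60 + as.numeric(s[2,])
}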
d <- rbind(
 cbind(kind = 'vignettes/fr/*', rbind(
  data.frame(process = 'build', time = elapsed('
52.84user 14.42system 0:38.99elapsed 172%CPU (0avgtext+0avgdata 1270424maxresident)k
53.32user 14.27system 0:38.61elapsed 175%CPU (0avgtext+0avgdata 1270228maxresident)k
53.36user 14.05system 0:38.53elapsed 174%CPU (0avgtext+0avgdata 1270548maxresident)k
  ')),
  data.frame(process = 'check', time = elapsed('
138.89user 44.84system 2:06.32elapsed 145%CPU (0avgtext+0avgdata 1270412maxresident)k
137.14user 44.95system 2:04.29elapsed 146%CPU (0avgtext+0avgdata 1270192maxresident)k
143.14user 48.11system 2:10.45elapsed 146%CPU (0avgtext+0avgdata 1270584maxresident)k
  '))
 )),
 cbind(kind = 'vignettes/*-fr*', rbind(
  data.frame(process = 'build', time = elapsed('
82.31user 23.16system 0:51.61elapsed 204%CPU (0avgtext+0avgdata 1313952maxresident)k
82.02user 23.47system 0:51.31elapsed 205%CPU (0avgtext+0avgdata 1314172maxresident)k
81.63user 23.42system 0:51.24elapsed 205%CPU (0avgtext+0avgdata 1313888maxresident)k
  ')),
  data.frame(process = 'check', time = elapsed('
171.63user 59.06system 2:21.92elapsed 162%CPU (0avgtext+0avgdata 1313928maxresident)k
173.07user 63.13system 2:23.48elapsed 164%CPU (0avgtext+0avgdata 1313184maxresident)k
175.47user 58.57system 2:26.24elapsed 160%CPU (0avgtext+0avgdata 1313056maxresident)k
  '))
 ))
)
lattice::barchart(time ~ kind | process, aggregate(time ~ kind + process, d, mean), ylim = c(0, max(d$time)))
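
As a cross-check of the percentages quoted above, the relative increases can be computed directly from the same measurements. This is a small R sketch of my own, with the elapsed times transcribed into seconds from the script; it takes the `vignettes/fr/*` runs as the baseline.

```r
# Elapsed seconds transcribed from the timings in the script above.
timings <- data.frame(
  kind = rep(c("vignettes/fr/*", "vignettes/*-fr*"), each = 6),
  process = rep(rep(c("build", "check"), each = 3), times = 2),
  time = c(38.99, 38.61, 38.53, 126.32, 124.29, 130.45,
           51.61, 51.31, 51.24, 141.92, 143.48, 146.24)
)

# Mean time per process (rows) and layout (columns).
means <- tapply(timings$time, list(timings$process, timings$kind), mean)

# Percentage increase of the in-directory layout over the sub-directory one.
increase <- 100 * (means[, "vignettes/*-fr*"] / means[, "vignettes/fr/*"] - 1)
round(increase, 1)
# build check
#  32.7  13.3
```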

@phgrosjean
Contributor

phgrosjean commented Sep 4, 2024

It would be great if translations had minimal impact on package check/compilation/installation. If we had a convention that the -lang- translation of the vignettes lives in the Rdatatable/data.table.-lang- GitHub repo, and that these repos mainly serve to build a localized {pkgdown} site, it would be relatively easy to compute links to the corresponding pages in the original vignettes. Also, a link back to the original English vignettes can be added in the translations.

Installing the {data.table.-lang-} package is not necessary unless users want offline versions of the vignettes.

It seems to me to be a relatively simple solution to this problem for now.
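
Computing the cross-links could be as simple as the R sketch below. The base URLs are placeholders, `article_url` is a hypothetical helper, and the sketch assumes both sites keep the default {pkgdown} layout in which a vignette is served at `articles/<name>.html`.

```r
# Hypothetical helper: URL of a vignette on a pkgdown site,
# assuming the default articles/<name>.html layout.
article_url <- function(base, vignette) {
  name <- tools::file_path_sans_ext(basename(vignette))
  sprintf("%s/articles/%s.html", sub("/+$", "", base), name)
}

# Placeholder base URLs, not real sites.
original_site   <- "https://example.org/data.table"
translated_site <- "https://example.org/data.table.fr"

# Link from the French page back to the original, and the other way round.
article_url(original_site, "vignettes/datatable-intro.Rmd")
article_url(translated_site, "vignettes/datatable-intro.Rmd")
```

This only works if the translated vignettes keep the original file names, which is one argument for not putting the locale into the file name in that setup.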


7 participants