
Is --addNamespaces="100" no longer working? (to add Wikipedia's portal pages to ZIM files for schools!) #1784

Closed
holta opened this issue Feb 16, 2023 · 11 comments · Fixed by #1787

Comments

@holta

holta commented Feb 16, 2023

See --addNamespaces="100" at the bottom of these 2 recipes:

I believe this is supposed to add Wikipedia's portal pages to a ZIM file, and if I'm understanding @kelson42 correctly, this was working in the past. Questions:

Schools are always urgently asking for far more colorful pages to animate Wikipedia-centric learning, especially among younger people, who commonly turn against Wikipedia (possibly for life?) when elders too often present it as text-heavy + ponderous 🐢

So the question is: can colorful + youth-friendly portal pages like the following (more graphical, i.e. with much less text, very intentionally more inviting!) be made available + easily findable offline, ideally in many of the 87 Wikipedia languages[*] that offer portals?

[*] 87 languages appear to have Wikipedia portals, as indicated in the top-right of https://en.wikipedia.org/wiki/Wikipedia:Contents/Portals

Brief code context: https://github.com/openzim/mwoffliner/search?q=addnamespaces

@holta holta changed the title Is --addNamespaces="100" no longer working? (to add Wikipedia's portal pages to a ZIM file for schools!) Is --addNamespaces="100" no longer working? (to add Wikipedia's portal pages to a ZIM file for schools!) Feb 16, 2023
@holta holta changed the title Is --addNamespaces="100" no longer working? (to add Wikipedia's portal pages to a ZIM file for schools!) Is --addNamespaces="100" no longer working? (to add Wikipedia's portal pages to ZIM files for schools!) Feb 16, 2023
@kelson42 kelson42 added this to the 1.13.0 milestone Feb 16, 2023
@kelson42
Collaborator

@pavel-karatsiuba Please verify that this ticket is valid; I'm really surprised it does not work.

@holta
Author

holta commented Feb 17, 2023

Thank you @pavel-karatsiuba:

  1. Discoverability (how to find these 540 portal pages) is the key UX question: do you have suggestions for teachers? (After we hopefully find them!)

  2. Side question: since --addNamespaces does not yet work with --articleList, is it possible to include all portal pages by listing them manually (e.g. listing all 540 portal pages) within --articleList?
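To make the side question concrete, here is a minimal sketch (an assumption, not something confirmed in this thread) of how one might generate such a list: it pages through the MediaWiki Action API's `list=allpages` module for namespace 100 (Portal on English Wikipedia) and joins the titles one per line, the shape assumed here for an --articleList file (check `mwoffliner --help` for what your version accepts). The helper names (`buildAllPagesUrl`, `parseAllPages`, `listNamespace`, `toArticleList`) and the User-Agent string are hypothetical. Whether MWoffliner then accepts `Portal:` titles through --articleList is exactly the open question.

```typescript
// Continuation parameters returned by the MediaWiki Action API ("continue" object).
type Cont = Record<string, string>;

interface AllPagesResponse {
  query?: { allpages: { pageid: number; ns: number; title: string }[] };
  continue?: Cont;
}

// Build one request URL for list=allpages restricted to a single namespace.
function buildAllPagesUrl(apiBase: string, namespace: number, cont: Cont = {}): string {
  const params = new URLSearchParams({
    action: "query",
    list: "allpages",
    apnamespace: String(namespace),
    aplimit: "max",
    format: "json",
  });
  // Merge the whole "continue" object from the previous response, as the API docs recommend.
  for (const [key, value] of Object.entries(cont)) params.set(key, value);
  return `${apiBase}?${params.toString()}`;
}

// Pull the titles out of one response; "next" is undefined once the listing is complete.
function parseAllPages(res: AllPagesResponse): { titles: string[]; next?: Cont } {
  const titles = (res.query?.allpages ?? []).map((p) => p.title);
  return { titles, next: res.continue };
}

// Follow continuation until every page in the namespace has been listed (Node 18+ fetch).
async function listNamespace(apiBase: string, namespace: number): Promise<string[]> {
  const all: string[] = [];
  let cont: Cont | undefined = {};
  while (cont !== undefined) {
    const res = await fetch(buildAllPagesUrl(apiBase, namespace, cont), {
      headers: { "User-Agent": "portal-list-sketch/0.1 (hypothetical example)" },
    });
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    const page = parseAllPages((await res.json()) as AllPagesResponse);
    all.push(...page.titles);
    cont = page.next;
  }
  return all;
}

// One title per line, with a trailing newline.
function toArticleList(titles: string[]): string {
  return titles.join("\n") + "\n";
}

// Example (needs network access):
// listNamespace("https://en.wikipedia.org/w/api.php", 100)
//   .then((titles) => console.log(toArticleList(titles)));
```

Merging the entire `continue` object, rather than only `apcontinue`, keeps the loop correct if the query is later extended with other modules.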

CONTEXT: Students need these colorful/introductory pages in ZIM files like the 1 Million article Wikipedia here: #1756

@kelson42
Collaborator

@pavel-karatsiuba Please make MWoffliner die at startup if --articleList and --addNamespaces are specified together.
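A minimal sketch of the requested fail-fast check, assuming the parsed command-line options arrive as a plain object. `ParsedArgs`, `findOptionConflict`, and `assertCompatibleOptions` are hypothetical names, not MWoffliner's actual implementation.

```typescript
// Hypothetical shape of the parsed CLI options (only the two that conflict).
interface ParsedArgs {
  articleList?: string;
  addNamespaces?: string; // e.g. "100"
}

// Returns an error message when the options conflict, otherwise undefined.
function findOptionConflict(args: ParsedArgs): string | undefined {
  if (args.articleList !== undefined && args.addNamespaces !== undefined) {
    return "--articleList and --addNamespaces cannot be used together";
  }
  return undefined;
}

// Call once, before any scraping starts. An uncaught Error makes Node exit
// with a non-zero status, which is the "die at start" behavior requested.
function assertCompatibleOptions(args: ParsedArgs): void {
  const problem = findOptionConflict(args);
  if (problem !== undefined) throw new Error(problem);
}
```

Keeping the check as a pure function separate from the throwing wrapper makes it easy to cover with an automated test.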

@kelson42
Collaborator

kelson42 commented Feb 18, 2023

The given ZIM file indeed lacks the Portal pages, but the current version of MWoffliner seems to work fine when tested on a smaller wiki with another namespace. We should clarify what goes wrong here, because even newer versions such as wikipedia_en_all_mini_2023-01.zim lack them. However, https://library.kiwix.org/viewer#wikipedia_ru_all_maxi/A/%D0%9F%D0%BE%D1%80%D1%82%D0%B0%D0%BB:%D0%9C%D0%BE%D1%80%D1%81%D0%BA%D0%BE%D0%B9_%D0%BF%D0%BE%D1%80%D1%82%D0%B0%D0%BB has the portals, so this seems to be a problem specific to English Wikipedia (WPEN).

@kelson42
Collaborator

I propose waiting until https://farm.openzim.org/pipeline/d51dec8e90efcd0ad3f9be36 has finished before continuing the investigation.

@holta
Copy link
Author

holta commented Feb 18, 2023

Progress 🙏

Very compact/quick/revealing test cases will hopefully be possible after that 😄

@Inbefortus

From the most recent English Wikipedia ZIM:

[Three screenshots, taken 2023-02-20 in Samsung Internet Beta]

@kelson42
Collaborator

@Inbefortus So basically there is no problem (anymore). Thank you for the check. I will merge the new automated test and close the ticket.

@holta
Author

holta commented Feb 21, 2023

@Inbefortus great news: Would you happen to know why Portals are now restored?

  1. Discoverability (how to find these 540 portal pages) is the key UX question: do you have suggestions for teachers? (After we hopefully find them!)

A very partial answer to the above question is below. Unfortunately, 540,497 pages is a flood rather than a clean UX, but it is still a start:

https://library.kiwix.org/viewer#search?books.name=wikipedia_en_all_maxi_2023-02&pattern=Portal%3A


@Inbefortus

great news: Would you happen to know why Portals are now restored?

@holta I'm not even sure if portals ever existed in the first place. I don't recall them being in the 2021-03 ZIM. They must have been added around the end of 2022 or something like that.

@kelson42
Collaborator

kelson42 commented Mar 1, 2023

great news: Would you happen to know why Portals are now restored?

@holta I'm not even sure if portals ever existed in the first place. I don't recall them being in the 2021-03 ZIM. They must have been added around the end of 2022 or something like that.

This is actually the only logical explanation, but I don't remember that and would really be surprised by it. As long as we cannot track changes on Zimfarm precisely, we will keep facing this kind of problem. See openzim/zimfarm#498

4 participants