How to derive the actual number of words per line for each chapter? #36

AlGantori · 2020-05-02T10:07:45Z

If I understand the main page description, these "scripts" render an image of the page from a font and then build the rectangle bounds for each word (glyph) generated. Is that correct?

Does it also build one line bitmap at a time for a 15-lines-per-page Madina mushaf?

That sounds overly complex if all I want is the word count per line for each chapter,
something like the following (showing word counts for Fatiha and Baqara):

   "1": [4,5,4,4,4,5,3],
   "2": [7,5,4,8,6,6,
        9,9,7,9,8,8,9, 8,9,10,9,10,8,7, 7, ..... ],
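
One way to arrive at that shape, sketched under assumptions: suppose the per-word rows are already available as `(chapter, page, line)` tuples in reading order. That tuple layout and the function name `words_per_line` are hypothetical, not something this repo provides. Counting consecutive rows that share the same chapter, page and line then gives one count per line:

```python
import json
from itertools import groupby


def words_per_line(rows):
    """Map chapter number (as a string) to a list of word counts, one per line.

    `rows` is an iterable of (chapter, page, line) tuples, one tuple per word,
    already sorted in reading order. This input layout is an assumption.
    """
    result = {}
    # groupby without a key groups runs of identical consecutive tuples,
    # i.e. all words that sit on the same line of the same chapter.
    for (chapter, _page, _line), group in groupby(rows):
        result.setdefault(str(chapter), []).append(sum(1 for _ in group))
    return result


# Made-up input: chapter 1 has a 4-word line and a 5-word line on page 1,
# chapter 2 has a single 7-word line on page 2.
rows = [(1, 1, 2)] * 4 + [(1, 1, 3)] * 5 + [(2, 2, 3)] * 7
print(json.dumps(words_per_line(rows)))  # {"1": [4, 5], "2": [7]}
```

A physical line shared by two chapters would be split at the chapter boundary, so each chapter gets its own count for that line.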


ahmedre · 2020-05-02T10:22:38Z

yes, your understanding is correct. and yes, it builds one line at a time.
for what you want to do, i'd download and get the database from this repo and then get this data with a query (or with a script that just does this for each page). if i recall, the table here should contain the line information as well.

AlGantori · 2020-05-02T11:00:05Z

Are you sure this kind of data is not already available in some XML/JSON resource?

I have done some Node.js-based development, indirectly, but I don't recognize installation commands like the following:

ppm install dmake
ppm install dbd-mysql
ppm install yaml

Are these expected to be executed inside some CLI, or on some Linux distro?
Thanks for helping out, because at this point I am clueless.
I am running Windows 7.

ahmedre · 2020-05-02T11:09:05Z

you don't need to do any of those commands nor run this script itself - just download the database and import it and write a script yourself.

AlGantori · 2020-05-02T11:53:39Z

By database, do you mean downloading the sql folder in this repo?

I have MySQL Workbench. It's a beast, and I never got acquainted with all of its terms (Open Model, ???).
It seems oriented toward opening databases over a network connection, and I am having a hard time making it open a local file. It managed to open schema.mwb and throws me into the EER diagram mode, but I want to see the tables and data.

Which of these files should I be attempting to open?

Would you suggest a better tool than MySQL Workbench 5.2.44 CE?

By me writing scripts, do you mean writing SQL queries to retrieve the info, perhaps from the glyph_page_line table?

Thank you for holding my hand through this.

AlGantori · 2020-05-02T22:17:08Z

So after upgrading MySQL Workbench to 8.0.20 it quit working with an error. These companies (Oracle, Microsoft) just ruin any piece of software they grab onto and turn it into a monster. In the end, I ditched this garbage/ziballa software that installs so much junk on my system.
I have not worked with SQL since around 1987 (Sybase SQL, prior to MS SQL) 👎
I guess you have to have a local server installed; HeidiSQL insisted on a connection.
So I installed XAMPP as the server.
I was able to open 02-database.sql.
Is glyph_ayah the table I should be deriving the word count from?
Since I am purely guessing, can you confirm that the row below refers to the first verse of Fatiha (in this case the Basmallah), and that the row is actually the verse number, which in my case I won't be counting as a word?

Will tajweed markings (e.g. small meem) appear as separate rows in this table, or be lumped with the previous word (as a single glyph)?

This feels terrible: would I have to query, group and count on ayah_number, then subtract one for the verse-number glyph (the Hindi numeral thingy) to get my word count?

I have a feeling I am going about this the wrong/difficult way.

AlGantori · 2020-05-02T23:00:47Z

Oh, that table does not give the line_number; glyph_page_line is what I should be looking into for the line wrapping (word/token count) I am after.
Take the 1st line of chapter 2 (numbered as the 3rd line, because the sura title and basmallah count as page lines/rows):
we have 7 words + 2 tajweed markers + 1 verse number = 10 tokens.

This would be its data, matching those 10 tokens.

How can I detect that glyph_id = 264 is a verse number, which I do not want to count?

AlGantori · 2020-05-02T23:57:07Z

Specifically for page 2, this database describes this particular layout.

Matching query:

SELECT COUNT(line_number) FROM `glyph_page_line` WHERE page_number = 2 GROUP BY line_number;
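
The query above returns only the counts, so it is hard to tell which count belongs to which line. A minimal sketch of a variant that also selects `line_number` and orders by it follows. It runs against a toy in-memory SQLite table rather than the real MySQL database; the three columns are the ones named in this thread, but the rows are made up and the helper name `tokens_per_line` is hypothetical.

```python
import sqlite3

# Toy stand-in for the real table: only the columns mentioned in this
# thread are modelled, and the rows below are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE glyph_page_line "
    "(glyph_id INTEGER, page_number INTEGER, line_number INTEGER)"
)
conn.executemany(
    "INSERT INTO glyph_page_line VALUES (?, ?, ?)",
    [(1, 2, 3), (2, 2, 3), (3, 2, 3), (4, 2, 4), (5, 2, 4), (6, 3, 1)],
)


def tokens_per_line(conn, page_number):
    """Return [(line_number, token_count), ...] for one page, in line order."""
    cur = conn.execute(
        "SELECT line_number, COUNT(*) FROM glyph_page_line "
        "WHERE page_number = ? GROUP BY line_number ORDER BY line_number",
        (page_number,),
    )
    return cur.fetchall()


print(tokens_per_line(conn, 2))  # [(3, 3), (4, 2)]
```

These are raw token counts: verse numbers and tajweed markers are still included at this stage.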

The raw/net count of tokens per line follows:

I happen to be working with the Tajweed version, where page 2 is a bit different. That's alright, I will handle that.

Again, my current roadblock is detecting whether a token is a verse number.

ahmedre · 2020-05-03T00:17:44Z

the glyph table will tell you what "type" the glyph is - so you can exclude the ayah markers that way.

AlGantori · 2020-05-03T00:58:47Z

Wow, I can't believe I am doing a 3-way join to get this. It appears that all verse numbers are typed as "end".

Mission almost accomplished !!! ALLAHU AKBAR !!!

ahmedre · 2020-05-03T01:00:25Z

awesome al7amdulillah! make sure to not include other things like pauses (so just include words).
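
A minimal sketch of what such a words-only count could look like, again on a toy in-memory SQLite database. The thread does not show the real schema, so everything here is an assumption: that `glyph` carries a `glyph_type_id`, that a `glyph_type` table maps it to a name, and that the names are `word`, `end` and `pause` (only "end" is confirmed above). The helper name `word_counts_for_page` is hypothetical as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: column and type names are guesses, not the repo's actual DDL.
conn.executescript("""
CREATE TABLE glyph_type (glyph_type_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE glyph (glyph_id INTEGER PRIMARY KEY, glyph_type_id INTEGER);
CREATE TABLE glyph_page_line (glyph_id INTEGER, page_number INTEGER, line_number INTEGER);
INSERT INTO glyph_type VALUES (1, 'word'), (2, 'end'), (3, 'pause');
""")

# Made-up page 2: line 3 holds 7 words + 2 pauses + 1 verse number (10 tokens),
# line 4 holds 3 words + 1 verse number.
line3 = [1, 1, 1, 3, 1, 1, 3, 1, 1, 2]
line4 = [1, 1, 1, 2]
glyph_id = 0
for line_number, type_ids in ((3, line3), (4, line4)):
    for type_id in type_ids:
        glyph_id += 1
        conn.execute("INSERT INTO glyph VALUES (?, ?)", (glyph_id, type_id))
        conn.execute(
            "INSERT INTO glyph_page_line VALUES (?, 2, ?)", (glyph_id, line_number)
        )


def word_counts_for_page(conn, page_number):
    """Return [(line_number, word_count), ...], counting only 'word' glyphs."""
    cur = conn.execute(
        """
        SELECT gpl.line_number, COUNT(*)
        FROM glyph_page_line AS gpl
        JOIN glyph AS g ON g.glyph_id = gpl.glyph_id
        JOIN glyph_type AS gt ON gt.glyph_type_id = g.glyph_type_id
        WHERE gpl.page_number = ? AND gt.name = 'word'
        GROUP BY gpl.line_number
        ORDER BY gpl.line_number
        """,
        (page_number,),
    )
    return cur.fetchall()


print(word_counts_for_page(conn, 2))  # [(3, 7), (4, 3)]
```

Filtering on `name = 'word'` rather than `name <> 'end'` drops pauses as well as verse numbers. Note that a line containing no word glyphs at all (a sura title, say) would simply be absent from the result.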

AlGantori · 2020-05-03T01:17:12Z

In my current hacking I am including the Tajweed markers (here, pauses) and the verse numbers in the rendering as well. It's just that my algorithm is based on word count and on the Uthmani script (word/token sequences), which has the Tajweed markers inline, in this case the two "pause" markers.
I should probably use a similar approach in my JSON, perhaps expanding it to tag the Tajweed tokens with their type and maybe even a sub-type.
I don't see a verse number as belonging to a hard-coded line position but rather a floating one. I would not call the verse number "end", as I want to flow it either as an ayah prefix (default) or as a suffix in future renderings.
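
A rough sketch of what such a tagged structure might look like, purely as an illustration: the field names `words`, `tokens`, `t` and `type`, the type labels, and the helper `line_summary` are all hypothetical, and the token texts are placeholders rather than real glyph data.

```python
import json


def line_summary(tokens):
    """Summarise one line given as a list of (text, kind) pairs.

    Keeps every token with its type tag, and stores the word-only count
    separately so layout code can still wrap on word count alone.
    """
    return {
        "words": sum(1 for _text, kind in tokens if kind == "word"),
        "tokens": [{"t": text, "type": kind} for text, kind in tokens],
    }


# Placeholder tokens: two words, one pause marker, one verse number.
line = [("w1", "word"), ("w2", "word"), ("p1", "pause"), ("n1", "end")]
print(json.dumps(line_summary(line)))
```

Since each token carries its own tag, a renderer could decide at draw time whether a verse-number token flows as a prefix or a suffix instead of treating it as a fixed line position.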