Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cost adjust #1352

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from
Draft

Cost adjust #1352

wants to merge 3 commits into from

Conversation

ampli
Copy link
Member

@ampli ampli commented Nov 4, 2022

(EDIT: Since I have already written it in the wrong box, I leave it here...)

It seems there are several problems in this code.
Also, there are several other problems related to it.

Regarding condesc_t::cncost - this is an improper place to put a multi-connector cost: There is a single condesc_t element for each connector string (i.e. all multi and non-multi connectors with the same connector string share a single condesc_t element). However, each multi-connector in the dict may have its own cost, and these costs may be different.

There is also a problem to store the cost in the connector due to the 64 bytes limitation.
A multi-connector cost table could be added to disjuncts, but there is a space constraint there too.
So the only good place for multi-connector costs seems to be in Sentence_s - a hash table for that can be added.

Among the problems that are related:

  1. It seems there is a need to adjust not only the linkage cost but also the disjunct costs, so reading disjunct costs by the API would lead to consistent results. However, it seems it is not a good idea to update the disjunct costs in their Disjunct_struct because this will create a cost mess if relinking is needed (e.g. for implementing phantoms). To that end, an adjust_cost field could be added to Disjunct_struct, but again there is a space problem there.
  2. The disjunct string (!as shown by !disjunct) should be updated, to be consistent with the adjusted disjunct costs. However, it is not clear how to textually represent multi-matches of multi-connectors. (I also discovered a bug there regarding multi-connectors that has no implications in the current dict/corpora and in your optional multi-connector examples).

Regarding the code, it seems it has several things which are wrong (or that I don't understand).
In any case, the code can be simplified and this will save fixing these things.
I will add my comments to the code itself.

@ampli
Copy link
Member Author

ampli commented Nov 4, 2022

IT seems I answered in the wrong box and pressed the wrong button!
Hopefully, no damage has resulted...

if (NULL == dj) continue;

// Look at the right-going connectors.
for (Connector* rcon = dj->right; rcon; rcon=rcon->next)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the left connectors are ignored?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this was a prototype and I did not do those.

for (Connector* wrc = lkg->link_array[l].rc; wrc; wrc=wrc->next)
{
// Skip if no match.
if (strcmp(rcon->desc->string, wrc->desc->string)) continue;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I wrote in the general response (in the wrong box...), there is 1-1 match between condesct_t elements and connector strings, disregarding if the connectors are multi or not. Thus the condesc_t strings don't include@.
So there is a need for a different comparison.

@ampli
Copy link
Member Author

ampli commented Nov 4, 2022

I propose a simpler algo:

  1. For each word (like now).
  2. For each multi-connector of a chosen disjunct (like now, but also left if needed).
  3. Scan the link array and count the number N of rc (or lc, correspondingly to right/left at (2)) with the same address as found in (2). An address comparison can be used because connectors are copied by reference. Alternatively, a tracon_id comparison can be done.
  4. Adjust by (N-1) * the-multi-connector-cost.

The multi-connector costs can be taken from a hash table stored in Sentence_s. I have no good idea where to store the adjusted disjunct costs (for API fetch if needed and for !disjunct display) - maybe in an array in Linkage_s (with num_words elements).

@linas
Copy link
Member

linas commented Nov 7, 2022

When you say

The multi-connector costs can be taken from a hash table stored in Sentence_s.

Does this table currently exist, or would it have to be created?

@linas linas marked this pull request as draft November 7, 2022 00:20
@ampli
Copy link
Member Author

ampli commented Nov 7, 2022

Does this table currently exist, or would it have to be created?

It doesn't exist.
You can use string_id() on the connector name (with or without the @) to get a hash table index, but you need to resize this table on the fly as needed.

(BTW, I already used string_id() in a similar way in several places. Using a generic hash-table function would be better - added to my to-do list...)

@linas
Copy link
Member

linas commented Nov 7, 2022

Does this table currently exist, or would it have to be created?

It doesn't exist.

OK, thanks. I'm going to keep this open, as a to-do item. Finishing it would not be hard - an afternoon of creating the table and hooking it all up. But nothing depends on it right now, so I don't feel like cluttering up the current code with this new stuff.

@ampli ampli mentioned this pull request Jan 15, 2023
@linas
Copy link
Member

linas commented Feb 21, 2023

FYI, this did eventually become urgent, and an alternate work-around was done in commit 1dcdd15 which is adequate for now.

@ampli
Copy link
Member Author

ampli commented Feb 22, 2023

In any case I added this to my TODO list.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants