Control list filters #323

Open: wants to merge 43 commits into dev

Changes from 7 commits

Commits (43):
7034e27 (Juliettejns, Mar 21, 2024): Revert "Add more details about the lemmatizer at the new corpus phase…
6e202f0 (Juliettejns, Jun 13, 2024): Add token correction filters
49016d0 (Juliettejns, Jun 13, 2024): Revert "Add token correction filters"
4308ecc (Juliettejns, Jun 13, 2024): Add a function to filter tokens to be corrected
09768ec (PonteIneptique, Mar 20, 2024): Add more details about the lemmatizer at the new corpus phase. (#317)
0acb09d (Juliettejns, Jun 24, 2024): Add token modification filters + start of display + templates c…
04d080c (Juliettejns, Jun 24, 2024): Remove old modifications
61f5957 (Juliettejns, Jun 25, 2024): Add invalid token filters + corpus link
9d5844f (Juliettejns, Jun 26, 2024): Fix Ignore values in the menu lists of Control Lists
f5517e9 (Juliettejns, Jun 27, 2024): Fix tests corpus_init // get_unallowed function
b5a2355 (Juliettejns, Jun 28, 2024): Fix test errors - new corpus bug
eaf0822 (Juliettejns, Jun 28, 2024): Remove test comment
797d484 (Juliettejns, Jul 1, 2024): Add first drafts of tests
006fca8 (Juliettejns, Jul 2, 2024): Fix filter update test
2b11ebc (Juliettejns, Jul 2, 2024): Fix corpus filter registration test
ee34646 (Juliettejns, Jul 2, 2024): Add test: edit token with filter
0fe9659 (Juliettejns, Jul 9, 2024): Enlarge varchar in corpus models + print logs
f97e432 (Juliettejns, Jul 12, 2024): Test: add user creation for CL filter
b8bbd51 (Juliettejns, Jul 12, 2024): Test control filters bug - add users
89e9aae (Juliettejns, Aug 27, 2024): CL tests - change find element by ID > NAME
666ff6d (Juliettejns, Aug 27, 2024): find element by ID>NAME
56bac05 (PonteIneptique, Aug 27, 2024): Correct tests and clean up the way regex are applied (#329)
a2fb107 (Juliettejns, Aug 28, 2024): Change CLS filters ControlListUser>controlList
8e0ef97 (Juliettejns, Aug 28, 2024): Remove added count
aa1fb7d (Juliettejns, Aug 28, 2024): Change corpus.id => self.id + get_unallowed attributes
dcd28a6 (PonteIneptique, Sep 3, 2024): Adding tests back to control list for changing filter
41039bd (Juliettejns, Sep 3, 2024): Add base filter test
1c9e66c (PonteIneptique, Sep 3, 2024): Creating combinatory tests
dbf15fa (Juliettejns, Sep 3, 2024): Add filter combination test assert + change punctuation filter
775f7f4 (Juliettejns, Sep 3, 2024): Add none filter
f76e9a1 (PonteIneptique, Sep 3, 2024): Fix a condition on lemma
f4e925c (PonteIneptique, Sep 3, 2024): Better message
06332f7 (Juliettejns, Sep 9, 2024): Change regex test + add special condition Sans test
d482cf3 (Juliettejns, Sep 11, 2024): Apply the metadata filter to form rather than lemma + fix unallowed
1f4b80a (Juliettejns, Sep 11, 2024): Change filter tests with metadata
e00e28f (Juliettejns, Sep 11, 2024): Remove user_id from get_unallowed calls
71a920d (Juliettejns, Sep 11, 2024): Add SQLite or PostgreSQL choice for unallowed
11caf39 (Juliettejns, Sep 11, 2024): Add SQLite/Postgres difference for get_unallowed filters
668fd90 (Juliettejns, Sep 11, 2024): Move logging
fbabbcf (Juliettejns, Sep 12, 2024): Move logging
c6c0a15 (PonteIneptique, Sep 17, 2024): Change the way the control list filter view is shown
db60b6a (Juliettejns, Sep 17, 2024): Change metadata validity + tests + presentation of CL filters in inform…
360a560 (Juliettejns, Sep 17, 2024): Delete tests/test_selenium/download_temp/wauchier.xml
28 changes: 28 additions & 0 deletions app/control_lists/views.py
@@ -374,3 +374,31 @@ def information_edit(control_list_id, control_list):
def information_read(control_list_id):
control_list, is_owner = ControlLists.get_linked_or_404(control_list_id=control_list_id, user=current_user)
return render_template_with_nav_info('control_lists/information_read.html', control_list=control_list)


@control_lists_bp.route("/controls/<int:control_list_id>/ignore_terms", methods=["POST", "GET"])
@login_required
def ignore_terms_filter(control_list_id):
    current_control_list = ControlLists.query.filter_by(**{"id": control_list_id}).first_or_404()
    list_filter = []
    if request.method == "POST":
        list_filter.append(request.form.get("punct"))
        list_filter.append(request.form.get("numeral"))
        list_filter.append(request.form.get('ignore'))
        list_filter.append(request.form.get('metadata'))
        filtered_filter = []
        for el in list_filter:
            if el != None:
                filtered_filter.append(el)
        filter = " ".join(filtered_filter)
        current_control_list.filter_punct = 'punct' in filter
        current_control_list.filter_metadata = 'metadata' in filter
        current_control_list.filter_numeral = 'numeral' in filter
        current_control_list.filter_ignore='ignore' in filter

        db.session.commit()

        flash('The filters have been updated.', 'success')
        current_control_list = ControlLists.query.filter_by(**{"id": control_list_id}).first_or_404()
        return render_template_with_nav_info('control_lists/ignore_filter.html', control_list_id=control_list_id, current_control_list=current_control_list)
    return render_template_with_nav_info('control_lists/ignore_filter.html', control_list_id=control_list_id, current_control_list=current_control_list)
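The POST branch of `ignore_terms_filter` only needs to know which checkboxes were submitted: browsers leave unchecked checkboxes out of the form data entirely, so each flag reduces to a presence test. A minimal sketch of that mapping, assuming `form` behaves like a read-only mapping (Werkzeug's `request.form` is a dict subclass, so `in` works on it); the helper `filter_flags_from_form` and the `FILTER_NAMES` tuple are hypothetical names, not part of the PR.

```python
from typing import Dict, Mapping, Tuple

# Hypothetical constant: the four checkbox names used by ignore_filter.html.
FILTER_NAMES: Tuple[str, ...] = ("punct", "numeral", "ignore", "metadata")


def filter_flags_from_form(form: Mapping[str, str]) -> Dict[str, bool]:
    """Map each filter name to True when its checkbox was submitted.

    Unchecked checkboxes are not sent by the browser, so the presence of
    the key is enough; the submitted value itself is not inspected.
    """
    return {name: name in form for name in FILTER_NAMES}


if __name__ == "__main__":
    submitted = {"punct": "punct", "ignore": "ignore"}
    print(filter_flags_from_form(submitted))
```

In the view, the resulting dict could then be applied with `setattr(current_control_list, "filter_" + name, enabled)` for each item, which would avoid both the `None`-filtering loop and the substring test on a joined string.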
13 changes: 12 additions & 1 deletion app/main/views/corpus.py
@@ -36,7 +36,6 @@ def _get_available():
lists[cl.str_public].append(cl)
return lists


@main.route('/corpus/new', methods=["POST", "GET"])
@login_required
def corpus_new():
@@ -110,6 +109,18 @@ def error():
form_kwargs.update({"word_tokens_dict": tokens, "allowed_lemma": allowed_lemma,
"allowed_POS": allowed_POS, "allowed_morph": allowed_morph})

list_filter = []
if request.form.get("ignoreforms"):
list_filter.append(request.form.get("punct"))
list_filter.append(request.form.get("numeral"))
list_filter.append(request.form.get("ignore"))
list_filter.append(request.form.get("metadata"))
filtered_filter = []
for el in list_filter:
if el != None:
filtered_filter.append(el)
filter = " ".join(filtered_filter)
form_kwargs.update({"filter":filter})
try:
corpus: Corpus = Corpus.create(**form_kwargs)
db.session.add(CorpusUser(corpus=corpus, user=current_user, is_owner=True))
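In `corpus_new`, the checked names are passed to `Corpus.create` as one space-separated `filter` string. A small sketch of that round trip, assuming the submitted values equal the checkbox names (as they do in `ignore_filter.html`); `serialize_filters` and `parse_filters` are hypothetical helpers. Splitting on whitespace compares whole tokens, so it stays correct even if one filter name ever contained another as a substring, which a `'punct' in filter` style test would not.

```python
from typing import Mapping, Optional, Set, Tuple

# Hypothetical constant: the checkbox names read in corpus_new().
FILTER_NAMES: Tuple[str, ...] = ("punct", "numeral", "ignore", "metadata")


def serialize_filters(form: Mapping[str, Optional[str]]) -> str:
    """Join the submitted filter values into one space-separated string."""
    values = [form.get(name) for name in FILTER_NAMES]
    return " ".join(value for value in values if value is not None)


def parse_filters(filter_string: str) -> Set[str]:
    """Split the string back into whole filter names (no substring matching)."""
    return set(filter_string.split())


if __name__ == "__main__":
    encoded = serialize_filters({"punct": "punct", "metadata": "metadata"})
    print(encoded, parse_filters(encoded))
```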
2 changes: 1 addition & 1 deletion app/main/views/tokens.py
@@ -67,6 +67,7 @@ def tokens_correct_unallowed(corpus_id, allowed_type):
)



@main.route('/corpus/<int:corpus_id>/tokens/changes/similar/<int:record_id>')
@login_required
@requires_corpus_access("corpus_id")
@@ -137,7 +138,6 @@ def tokens_correct_single(corpus_id, token_id):
)
if "similar" in corpus.displayed_columns_by_name:
similar = {
"count": change_record.similar_remaining,
"link": url_for(".tokens_similar_to_record", corpus_id=corpus_id, record_id=change_record.id)
}
else:
10 changes: 9 additions & 1 deletion app/models/control_lists.py
@@ -43,6 +43,10 @@ class ControlLists(db.Model):
    bibliography = db.Column(db.Text, nullable=True)
    language = db.Column(db.String(10), nullable=True)
    notes = db.Column(db.Text, nullable=True)
    filter_punct = db.Column(db.Boolean, unique=False, default=True)
    filter_numeral = db.Column(db.Boolean, unique=False, default=False)
    filter_metadata = db.Column(db.Boolean, unique=False, default=False)
    filter_ignore = db.Column(db.Boolean, unique=False, default=False)

    # For caching purposes, we record the last time these fields were edited
    #last_lemma_edit = db.Column(db.DateTime, default=datetime.datetime.utcnow)
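The four new columns are plain Booleans, and both `get_unallowed` and `is_valid` later rebuild the same `dict_filter` from them. A sketch of a small value object that mirrors the column defaults and produces that dict in one place; `FilterSettings` is a hypothetical name, and it reads from any object exposing `filter_punct`, `filter_numeral`, `filter_metadata` and `filter_ignore`. SQLAlchemy applies `default=` on the client side at INSERT time, so rows that already existed when the columns were added may hold NULL; the sketch coerces with `bool()`, which turns such a NULL into False.

```python
from dataclasses import asdict, dataclass
from types import SimpleNamespace
from typing import Any, Dict


@dataclass(frozen=True)
class FilterSettings:
    """Mirror of the filter_* columns, with the same defaults as the model."""

    punct: bool = True
    numeral: bool = False
    metadata: bool = False
    ignore: bool = False

    @classmethod
    def from_control_list(cls, control_list: Any) -> "FilterSettings":
        """Read the filter_* attributes; bool() maps a NULL (None) column to False."""
        return cls(
            punct=bool(control_list.filter_punct),
            numeral=bool(control_list.filter_numeral),
            metadata=bool(control_list.filter_metadata),
            ignore=bool(control_list.filter_ignore),
        )

    def as_dict(self) -> Dict[str, bool]:
        """Same keys as the dict_filter built in get_unallowed and is_valid."""
        return asdict(self)


if __name__ == "__main__":
    row = SimpleNamespace(filter_punct=True, filter_numeral=None,
                          filter_metadata=True, filter_ignore=False)
    print(FilterSettings.from_control_list(row).as_dict())
```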
@@ -240,6 +244,7 @@ def has_list(self, allowed_type):
).exists()
).scalar()


@staticmethod
def add_default_lists(path=None):
""" Loads the default lists from the config folder
@@ -255,7 +260,7 @@ def add_default_lists(path=None):
print("[ControlLists] Adding %s " % data["name"])
cl = ControlLists(**data, public=PublicationStatus.public)
db.session.add(cl)
db.session.flush() # Get the AutoIncrement ID
db.session.flush() # Get the AutoIncrement ID/home/jjanes
configs = [
("lemma.txt", AllowedLemma, read_input_lemma),
("POS.txt", AllowedPOS, read_input_POS),
@@ -346,6 +351,9 @@ def to_input_format(query):
)





class AllowedPOS(db.Model):
""" An allowed POS is a POS that is accepted

75 changes: 67 additions & 8 deletions app/models/corpus.py
@@ -7,6 +7,7 @@
# PIP Packages
import unidecode
import sqlalchemy.exc
import re
from sqlalchemy.ext.associationproxy import association_proxy
from sqlalchemy.orm import backref
from sqlalchemy import func, literal, not_, or_, and_
@@ -326,12 +327,50 @@ def get_unallowed(self, allowed_type="lemma"):
CorpusCustomDictionary.category == allowed_type,
CorpusCustomDictionary.label == prop
)

list_darguments = [
WordToken.corpus == self.id,
not_(allowed.exists()),
not_(custom_dict.exists())
]

current_control_list = ControlLists.query.filter_by(**{"id": self.control_lists_id}).first_or_404()

allowed = db.session.query(cls).filter(
cls.control_list == self.control_lists_id,
cls.label == prop
)
custom_dict = db.session.query(CorpusCustomDictionary).filter(
CorpusCustomDictionary.corpus == self.id,
CorpusCustomDictionary.category == allowed_type,
CorpusCustomDictionary.label == prop
)

list_darguments = [
WordToken.corpus == self.id,
not_(allowed.exists()),
not_(custom_dict.exists())
]

dict_filter = {'punct': current_control_list.filter_punct,
'metadata': current_control_list.filter_metadata,
'ignore': current_control_list.filter_ignore,
'numeral': current_control_list.filter_numeral}

if True in dict_filter.values():
regex_liste = []
if dict_filter['metadata']:
regex_liste.append(r'^(?!\[[^\]]+:[^\]]*\]$).*')
if dict_filter['ignore']:
regex_liste.append(r'^(?!^\[IGNORE\]$)')
if dict_filter['punct']:
regex_liste.append(r"(?!^[^\w\s]$).")
if dict_filter["numeral"]:
regex_liste.append(r'(?!^\d+$).+')
list_darguments.append(WordToken.form.op('~')("".join(regex_liste)))

return db.session.query(WordToken).filter(
db.and_(
WordToken.corpus == self.id,
not_(allowed.exists()),
not_(custom_dict.exists())
)
db.and_(*list_darguments)
).order_by(WordToken.order_id)

@property
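`get_unallowed` keeps a token only when its form matches the concatenated negative-lookahead patterns, applied through the PostgreSQL `~` operator. As an alternative formulation (not the one used in the PR), the same intent can be written as one positive pattern describing what to ignore, built by alternation and then negated. A Python sketch of that idea with `re.fullmatch`, reusing the pattern bodies from `is_valid`; `build_ignore_regex` and `keep_candidates` are hypothetical names, and the SQL side is not shown.

```python
import re
from typing import Dict, Iterable, List, Optional

# Pattern bodies taken from is_valid(); each one describes a form to ignore.
IGNORE_PATTERNS: Dict[str, str] = {
    "metadata": r"\[[^\]]+:[^\]]*\]",  # e.g. [REF:1.2.3]
    "ignore": r"\[IGNORE\]",
    "punct": r"[^\w\s]",               # a single punctuation character
    "numeral": r"\d+",
}


def build_ignore_regex(flags: Dict[str, bool]) -> Optional[re.Pattern]:
    """Compile one alternation of the enabled patterns, or None if none is enabled."""
    parts = [IGNORE_PATTERNS[name] for name, enabled in flags.items() if enabled]
    if not parts:
        return None
    return re.compile("(?:" + "|".join(parts) + ")")


def keep_candidates(forms: Iterable[str], flags: Dict[str, bool]) -> List[str]:
    """Drop the forms that entirely match an enabled ignore pattern."""
    regex = build_ignore_regex(flags)
    if regex is None:
        return list(forms)
    return [form for form in forms if regex.fullmatch(form) is None]


if __name__ == "__main__":
    forms = ["chevalier", ",", "12", "[IGNORE]", "[REF:1.2.3]", "l'"]
    all_on = {"metadata": True, "ignore": True, "punct": True, "numeral": True}
    print(keep_candidates(forms, all_on))
```

Note that `punct` only covers single-character forms, so a form such as `l'` is kept even with every filter enabled.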
@@ -1097,13 +1136,31 @@ def is_valid(lemma, POS, morph, corpus):
}

allowed_column = corpus.displayed_columns_by_name

current_control_list = ControlLists.query.filter_by(**{"id": corpus.control_lists_id}).first_or_404()
dict_filter ={'punct': current_control_list.filter_punct,
'metadata': current_control_list.filter_metadata,
'ignore':current_control_list.filter_ignore,
'numeral':current_control_list.filter_numeral}
print(dict_filter)

regex_liste = []
if True in dict_filter.values():
if dict_filter['metadata']:
regex_liste.append(r'(\[[^\]]+:[^\]]*\]$)')
if dict_filter['ignore']:
regex_liste.append(r'(^\[IGNORE\])')
if dict_filter['punct']:
regex_liste.append(r"(^[^\w\s]$)")
if dict_filter['numeral']:
regex_liste.append(r'(^\d+$)')
regex = "|".join(regex_liste)
if lemma is not None \
and "lemma" in allowed_column \
and allowed_lemma.count() > 0 \
and corpus.get_allowed_values("lemma", label=lemma).count() == 0:
if not corpus.has_custom_dictionary_value("lemma", lemma):
statuses["lemma"] = False
if not re.match(regex,lemma):
if not corpus.has_custom_dictionary_value("lemma", lemma):
statuses["lemma"] = False

if POS is not None \
and "POS" in allowed_column \
@@ -1118,6 +1175,8 @@ def is_valid(lemma, POS, morph, corpus):
and corpus.get_allowed_values("morph", label=morph).count() == 0:
if not corpus.has_custom_dictionary_value("morph", morph):
statuses["morph"] = False


return statuses

@staticmethod
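One edge case in `is_valid`: when no filter is enabled, `regex_liste` stays empty, and depending on where `regex = "|".join(regex_liste)` sits, `regex` is either undefined or the empty string. In the latter case `re.match("", lemma)` returns an empty (and truthy) match for any string, so `not re.match(regex, lemma)` is always False and no lemma would be flagged. A sketch of an explicit guard; `is_ignored_lemma` is a hypothetical helper, and it uses `re.fullmatch` on each pattern, which is slightly stricter than the PR's mix of `^`/`$` anchors.

```python
import re
from typing import Sequence


def is_ignored_lemma(lemma: str, patterns: Sequence[str]) -> bool:
    """Return True if the lemma fully matches one of the enabled patterns.

    An empty pattern list means "no filter enabled" and returns False
    explicitly, instead of compiling "" (which matches every string).
    """
    if not patterns:
        return False
    return any(re.fullmatch(pattern, lemma) for pattern in patterns)


if __name__ == "__main__":
    # The pitfall: an empty regex matches anything.
    print(bool(re.match("", "chevalier")))
    print(is_ignored_lemma("chevalier", []))
    print(is_ignored_lemma("12", [r"[^\w\s]", r"\d+"]))
```

Such a helper could replace the `not re.match(regex, lemma)` condition with `not is_ignored_lemma(lemma, regex_liste)`.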
43 changes: 43 additions & 0 deletions app/templates/control_lists/ignore_filter.html
@@ -0,0 +1,43 @@
{% extends 'layouts/base.html' %}
{% import 'macros/form_macros.html' as f %}

{% block content %}
<div class="container">
<div class="row">
<h2>
<i class="fa fa-edit mr-2"></i>{{ _('Change the filters for Control List') }}
</h2>
</div>
<div class="row ">
<form method="POST">
<ul>
<li>
<label>
<input type="checkbox" name="punct" value="punct" {% if current_control_list.filter_punct %}checked{% endif %}>
Punctuation
</label>
</li>
<li>
<label>
<input type="checkbox" name="numeral" value="numeral" {% if current_control_list.filter_numeral %}checked{% endif %}>
Numeral
</label>
</li>
<li>
<label>
<input type="checkbox" name="metadata" value="metadata" {% if current_control_list.filter_metadata %}checked{% endif %}>
Metadata
</label>
</li>
<li>
<label>
<input type="checkbox" name="ignore" value="ignore" {% if current_control_list.filter_ignore %}checked{% endif %}>
Ignore
</label>
</li>
</ul>
<button type="submit">Submit</button>
</form>
</div>
</div>
{% endblock %}
16 changes: 10 additions & 6 deletions app/templates/control_lists/macros.html
@@ -13,18 +13,22 @@
<li class="nav-item"><a class="nav-link" href="{{url_for("control_lists_bp.read_allowed_values", control_list_id=control_list.id, allowed_type='lemma')}}">{{ _('Lemma') }}</a></li>
<li class="nav-item"><a class="nav-link" href="{{url_for("control_lists_bp.read_allowed_values", control_list_id=control_list.id, allowed_type='POS')}}">{{ _('POS') }}</a></li>
<li class="nav-item"><a class="nav-link" href="{{url_for("control_lists_bp.read_allowed_values", control_list_id=control_list.id, allowed_type='morph')}}">{{ _('Morphologies') }}</a></li>
{% if control_list.can_edit() or current_user.is_admin() %}
<li class="nav-item"><a class="nav-link" href="{{url_for("control_lists_bp.rename", control_list_id=control_list.id)}}"><i class="fa fa-edit"></i> {{ _('Rename') }}</a></li>
{% endif %}
{% if control_list.can_edit() or current_user.is_admin() %}
<li class="nav-item"><a class="nav-link" href="{{url_for("control_lists_bp.information_edit", control_list_id=control_list.id)}}"><i class="fa fa-edit"></i> {{ _('Edit informations') }}</a></li>
{% endif %}
{% if control_list.can_edit() or current_user.is_admin() %}
<li class="nav-item"><a class="nav-link" href="{{url_for("control_lists_bp.ignore_terms_filter", control_list_id=control_list.id)}}"><i class="fa fa-edit"></i> {{ _('Ignore values') }}</a></li>
{% endif %}
</ul>
{{ _('Others') }}
<ul class="nav flex-column">
<li class="nav-item"><a class="nav-link" href="{{url_for("control_lists_bp.contact", control_list_id=control_list.id)}}"><i class="fa fa-envelope"></i> {{ _('Propose changes') }}</a></li>
{% if control_list.can_edit() and not current_user.is_admin() %}
<li class="nav-item"><a class="nav-link" href="{{url_for("control_lists_bp.propose_as_public", control_list_id=control_list.id)}}"><i class="fa fa-share-alt"></i> {{ _('Make public') }}</a></li>
{% endif %}
{% if control_list.can_edit() or current_user.is_admin() %}
<li class="nav-item"><a class="nav-link" href="{{url_for("control_lists_bp.rename", control_list_id=control_list.id)}}"><i class="fa fa-edit"></i> {{ _('Rename') }}</a></li>
{% endif %}
{% if control_list.can_edit() or current_user.is_admin() %}
<li class="nav-item"><a class="nav-link" href="{{url_for("control_lists_bp.information_edit", control_list_id=control_list.id)}}"><i class="fa fa-edit"></i> {{ _('Edit informations') }}</a></li>
{% endif %}

</ul>
{% endmacro -%}
1 change: 1 addition & 0 deletions app/templates/macros/nav_macros.html
@@ -261,6 +261,7 @@
</a>
{% endif %}
</div>

</div></div>
</div>
<hr />
36 changes: 34 additions & 2 deletions app/templates/main/corpus_new.html
@@ -134,7 +134,9 @@ <h1>{{ _('Create a new corpus') }}</h1>
</fieldset>
<fieldset class="form-fieldset">
<legend>{{ _('Control Lists') }}</legend>

<div class="row">

<div class="col-md-3">
<div class="form-check form-check-inline">
<input class="form-check-input" id="checkbox_reuse" type="radio" name="control_list" value="reuse" checked/>
@@ -145,6 +147,8 @@ <h1>{{ _('Create a new corpus') }}</h1>
the academic community. You will be able to propose new values to the administrators of control lists.') }}
</p>
</div>
<div>

<div class="col-md-9" id="use_public">
<select class="form-control" id="control_list_select" name="control_list_select">
{%- for class, cls_ in public_control_lists.items() %}
@@ -195,6 +199,7 @@ <h1>{{ _('Create a new corpus') }}</h1>
</div>
</div>
</div>

<div class="form-group">
<label for="allowed_lemma">{{ _('Allowed lemma') }}</label>
<small id="allowed_lemma_help" class="form-text text-muted">{{ _('This should be formatted as a list of lemma separated by new line') }}</small>
@@ -211,10 +216,37 @@ <h1>{{ _('Create a new corpus') }}</h1>
<textarea aria-describedby="allowed_morph_help" class="form-control" id="allowed_morph" name="allowed_morph">{% if allowed_morph %}{{allowed_morph}}{%endif%}</textarea>
</div>
</div>


</div>
<div class="form-group row">
<div class="col">
<label for="ignoreforms" id="ignoreforms" class="form-text text-muted">
Ignore Elements in Control List:
</label>
</div>
<div class="col-md-6">
<div class="form-check">
<input type="checkbox" class="form-check-input" id="Numeral">
<label class="form-check-label" for="hyphens-remove">{{ _('Remove Numeral') }}</label>
</div>
<div class="form-check">
<input type="checkbox" class="form-check-input" id="Punctuation">
<label class="form-check-label" for="punct-keep">{{ _('Remove Punctuation') }}</label>
</div>
<div class="form-check">
<input type="checkbox" class="form-check-input" id="Metadata">
<label class="form-check-label" for="punct-keep">{{ _('Remove Metadata such as [METADATA:something] or [REF:1.2.3]') }}</label>
</div>
<div class="form-check">
<input type="checkbox" class="form-check-input" id="ignore_ignore">
<label class="form-check-label" for="punct-keep">{{ _('Remove [IGNORE]') }}</label>
</div>
</div>
</div>
</div>
</fieldset>
<button type="submit" id="submit" class="btn btn-primary">{{ _('Submit') }}</button>
</form>
<button type="submit" id="submit" class="btn btn-primary">Submit</button>
<script type="text/javascript">

$(document).ready(function() {
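In the `corpus_new.html` hunk, the four new checkboxes carry `id` attributes but no `name`, and in the lines shown `ignoreforms` is the `id` of a `<label>` rather than of an input. Browsers only submit form controls that have a `name`, so as rendered here `request.form.get("ignoreforms")` and `request.form.get("punct")` in `corpus_new` would both return `None`. A sketch of what the view would receive, assuming inputs named after the filters (for example `name="punct" value="punct"`) plus a hypothetical `ignoreforms` toggle; it parses an `application/x-www-form-urlencoded` body with the standard library.

```python
from typing import List
from urllib.parse import parse_qs

# Checkbox names the corpus_new() view reads with request.form.get().
FILTER_NAMES = ("punct", "numeral", "ignore", "metadata")


def selected_filters(body: str) -> List[str]:
    """Return the filter names present in a urlencoded form body.

    Nothing is returned unless the (hypothetical) ignoreforms field was sent,
    mirroring the `if request.form.get("ignoreforms"):` guard in the view.
    """
    fields = parse_qs(body)
    if "ignoreforms" not in fields:
        return []
    return [name for name in FILTER_NAMES if name in fields]


if __name__ == "__main__":
    body = "control_list=reuse&ignoreforms=on&punct=punct&numeral=numeral"
    print(selected_filters(body))
```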