I need to determine the grammatical case of terms in the texts of a large dataset. I found that memory usage grows by 0.3 to 0.7 MB on virtually every call of forms = predictor.predict(terms).
Consider a simple example:
import re

def findCase(termNumber, text):  # find the case of the term with the given number in the text
    terms = text.split()
    forms = predictor.predict(terms)
    myTag = forms[termNumber].tag
    parts = re.split('\\|', myTag)
    for part in parts:
        subparts = re.split('=', part)
        if len(subparts) < 2:
            continue
        if subparts[0] == 'Case':
            return subparts[1].upper()
    return 'UNDEF'
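The tag-parsing part of findCase can be exercised on its own, independently of the model. Below is a minimal sketch, assuming the predictor emits Universal-Dependencies-style feature strings; the sample tags are made up for illustration:

```python
import re

def extract_case(tag):
    # Split a "Key=Value|Key=Value|..." tag string and return the Case value, if any.
    for part in re.split(r'\|', tag):
        subparts = re.split('=', part)
        if len(subparts) < 2:
            continue
        if subparts[0] == 'Case':
            return subparts[1].upper()
    return 'UNDEF'

print(extract_case('Case=Nom|Gender=Fem|Number=Sing'))  # NOM
print(extract_case('_'))                                # UNDEF
```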
Then, for a collection of texts, I can run:
myDict = {}
for i, text in enumerate(texts):
    myDict[i] = findCase(0, text)
I have 12,500 texts averaging about 700 characters each. Processing my whole dataset consumed an extra 1.5 GB of memory because of the predictor.predict(terms) calls.
It seems as if my local variable forms remains in memory after the function returns — or is your RNNMorphPredictor model perhaps accumulating state (self-training?) in this scenario? How can I free this memory?
Update: there is no noticeable dependence on the length of each individual text. I reduced the input texts to 10 tokens (roughly 80 characters) each, and memory usage is unchanged: 1.5 GB per 12,500 texts. This makes my question even more pressing.
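To check whether the growth comes from retained Python objects (such as the forms lists) or from the model's native backend, the loop can be wrapped in a tracemalloc measurement. This is only a sketch: predict_stub is a hypothetical stand-in for predictor.predict, not the real rnnmorph call. If a real run shows little Python-heap growth while process RSS still climbs, the memory is held outside the Python heap and deleting forms or calling gc.collect() alone will not release it.

```python
import tracemalloc

def predict_stub(terms):
    # Hypothetical stand-in for predictor.predict (assumption, not the real model).
    return [t.upper() for t in terms]

texts = ['пример короткого текста'] * 100  # toy corpus for the sketch

tracemalloc.start()
before = tracemalloc.take_snapshot()
for text in texts:
    forms = predict_stub(text.split())
after = tracemalloc.take_snapshot()

# Net Python-heap growth across all calls; native allocations are not counted.
growth = sum(stat.size_diff for stat in after.compare_to(before, 'lineno'))
print(f'net Python-heap growth after {len(texts)} calls: {growth} bytes')
```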