messytables guesses wrong type for decimal number #190

Open
wrinklenose opened this issue Dec 1, 2020 · 1 comment

@wrinklenose

Describe the bug
Messytables should guess decimal numbers correctly, respecting the locale configuration.
For example, in Germany the comma (,) is used as the decimal separator, but a value such as 1,200 is guessed as type "text".

This issue was originally reported as CKAN issue ckan/ckan#5769, which is where I noticed it.

The type guessing seems to happen in https://github.com/okfn/messytables/blob/51b736892a48e420ab313675f54901c77b446dec/messytables/types.py and appears to be locale-specific. I think the relevant call is `value = locale.atof(value)` on line 100.
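To make the failure mode concrete, here is a minimal sketch, assuming the guesser simply tries `locale.atof` and falls back to text when it raises `ValueError`. `guess_cell_type` is a hypothetical name and this is not messytables' actual code. Under the default "C" numeric locale, `1.5` parses while `1,200` raises and is therefore classified as text.

```python
import locale


def guess_cell_type(value):
    """Hypothetical, simplified guesser (not messytables' real code).

    Returns "decimal" if the string parses as a number under the current
    LC_NUMERIC locale, otherwise falls back to "text".
    """
    try:
        # atof strips localeconv()["thousands_sep"] and maps
        # localeconv()["decimal_point"] to "." before calling float().
        locale.atof(value)
    except ValueError:
        return "text"
    return "decimal"


if __name__ == "__main__":
    # LC_NUMERIC stays "C" unless setlocale() is called for it, so ","
    # is not treated as a decimal separator here.
    for cell in ["1.5", "1,200"]:
        print(cell, "->", guess_cell_type(cell))
```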

Unfortunately, Python seems to recognize only the dot as the decimal point even when a German locale is set, which I could reproduce in my local environment:

```pycon
>>> locale.getlocale()
('de_DE', 'cp1252')
>>> locale.atof('1,200')
Traceback (most recent call last):
  File "<pyshell#35>", line 1, in <module>
    locale.atof('1,200')
  File "C:\Program Files\Python27\lib\locale.py", line 318, in atof
    return func(string)
ValueError: invalid literal for float(): 1,200
>>> locale.localeconv()
{'mon_decimal_point': '', 'int_frac_digits': 127, 'p_sep_by_space': 127, 'frac_digits': 127, 'thousands_sep': '', 'n_sign_posn': 127, 'decimal_point': '.', 'int_curr_symbol': '', 'n_cs_precedes': 127, 'p_sign_posn': 127, 'mon_thousands_sep': '', 'negative_sign': '', 'currency_symbol': '', 'n_sep_by_space': 127, 'mon_grouping': [], 'p_cs_precedes': 127, 'positive_sign': '', 'grouping': []}
```
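The `localeconv()` output above contains the "C" locale defaults (`decimal_point` is '.', `thousands_sep` is empty, and 127 is CHAR_MAX, meaning "not available"). That suggests only LC_CTYPE was German in that session: `locale.getlocale()` without arguments reports LC_CTYPE, whereas `atof` goes through `localeconv()`, which reflects LC_NUMERIC. The diagnostic sketch below checks both. `numeric_conventions` and the candidate locale names are assumptions; which names exist depends on the platform.

```python
import locale

# Assumed candidate names for a German locale: glibc spellings first,
# then a Windows-style name. Availability is platform-dependent.
CANDIDATES = ["de_DE.UTF-8", "de_DE.utf8", "de_DE", "German_Germany.1252"]


def numeric_conventions(candidates=CANDIDATES):
    """Hypothetical helper: try to activate a German LC_NUMERIC and report
    (locale_name, decimal_point, thousands_sep), or None if no candidate
    can be set. The previous LC_NUMERIC setting is always restored.
    """
    previous = locale.setlocale(locale.LC_NUMERIC)  # query, don't change
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_NUMERIC, name)
            except locale.Error:
                continue  # this locale is not installed/supported here
            conv = locale.localeconv()
            return name, conv["decimal_point"], conv["thousands_sep"]
        return None
    finally:
        locale.setlocale(locale.LC_NUMERIC, previous)


if __name__ == "__main__":
    # getlocale() with no argument reports LC_CTYPE, not LC_NUMERIC.
    print("LC_CTYPE:  ", locale.getlocale())
    print("LC_NUMERIC:", locale.getlocale(locale.LC_NUMERIC))
    print("German numeric conventions:", numeric_conventions())
```

On musl-based images such as Alpine, `setlocale` may succeed while the separators stay '.' and '', so inspecting the returned separators is more informative than only checking whether `setlocale` raised.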
@pazepaze

pazepaze commented Dec 3, 2020

Whether locale.atof works seems to be system-dependent.

On my Ubuntu 20.04 machine this works:

```pycon
>>> locale.setlocale(locale.LC_ALL, 'de_DE.UTF-8')
'de_DE.UTF-8'
>>> locale.atof('1,200')
1.2
```

It doesn't work when running on an Alpine image in Docker, though; see https://stackoverflow.com/questions/61761085/python-locale-not-working-on-alpine-linux

Is there perhaps another way to do this that is less system-dependent?
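One way to avoid depending on the operating system's locale data is to make the separators explicit (for example, from a per-dataset setting) and parse without the `locale` module at all. The sketch below assumes that approach; `parse_localized_decimal` and its `decimal_sep`/`thousands_sep` parameters are hypothetical, not part of messytables. It returns a `Decimal`, or `None` when the string is not a number in the given notation, so a type guesser could fall back to text in that case.

```python
import re
from decimal import Decimal


def parse_localized_decimal(text, decimal_sep=",", thousands_sep="."):
    """Parse a number written with caller-supplied separators.

    Hypothetical helper: it never consults the C library locale, so it
    behaves the same on glibc, musl (Alpine) and Windows. Returns a
    Decimal, or None if ``text`` is not a number in this notation.
    """
    s = text.strip()
    g = re.escape(thousands_sep)
    d = re.escape(decimal_sep)
    # Optional sign, then either properly grouped digits (1.234.567)
    # or an ungrouped run of digits, then an optional fractional part.
    pattern = r"[+-]?(?:\d{1,3}(?:%s\d{3})+|\d+)(?:%s\d+)?" % (g, d)
    if not re.fullmatch(pattern, s):
        return None
    # Drop grouping separators and map the decimal separator to ".".
    normalized = s.replace(thousands_sep, "").replace(decimal_sep, ".")
    return Decimal(normalized)


if __name__ == "__main__":
    print(parse_localized_decimal("1,200"))  # German notation (default)
    print(parse_localized_decimal("1.234,5"))
    print(parse_localized_decimal("1,200", decimal_sep=".", thousands_sep=","))
```

The same string parses differently under the two conventions (`1,200` is 1.2 with German separators and 1200 with English ones), which is why the separators would have to come from configuration rather than be guessed. If an extra dependency is acceptable, Babel's `babel.numbers.parse_decimal` does locale-aware parsing using its own bundled CLDR data instead of the operating system's locales.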
