.write does not save None values #673

Open
WeilerP opened this issue Jan 6, 2022 · 10 comments · May be fixed by #999
Comments

@WeilerP
Contributor

WeilerP commented Jan 6, 2022

Description

When saving an AnnData object to disk, keys of a dictionary whose value is None seem not to be saved.

import numpy as np
import scanpy as sc
from anndata import AnnData

adata = AnnData(X=np.eye(3), uns={'key_1': 0, 'key_2': None})
adata.write('adata.h5ad')

_adata = sc.read('adata.h5ad')

gives

>>> _adata
AnnData object with n_obs × n_vars = 3 × 3
    uns: 'key_1'
@ivirshup
Member

Did this ever work? I recall thinking about it when I implemented the write_none function, but was probably going for backwards compat then.

Do you have a suggested way to save these? I think hdf5 may have an appropriate type, but I'm not sure zarr does.

@WeilerP
Contributor Author

WeilerP commented Jan 11, 2022

I'm not super happy with or convinced by this, but how about saving it as the string 'None' and then converting it back to None when reading the file? We would have to make sure that actual 'None' strings are not converted to None.
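
The ambiguity mentioned here can be made concrete with a minimal sketch in plain Python. This is not anndata code: `encode_sentinel` and `decode_sentinel` are hypothetical helpers standing in for a writer and a reader that follow the 'None' string convention.

```python
# Hypothetical helpers, not part of anndata: a naive string-sentinel scheme.
SENTINEL = "None"


def encode_sentinel(uns):
    """Replace None values with the sentinel string before writing."""
    return {k: SENTINEL if v is None else v for k, v in uns.items()}


def decode_sentinel(stored):
    """Turn the sentinel string back into None after reading."""
    return {k: None if v == SENTINEL else v for k, v in stored.items()}


uns = {"key_1": 0, "key_2": None, "key_3": "None"}
roundtrip = decode_sentinel(encode_sentinel(uns))

# key_2 survives, but the genuine string stored under key_3 also comes back as None.
print(roundtrip)
```

Without extra bookkeeping, the reader cannot tell which of the two 'None' strings was originally a real string.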

@WeilerP
Contributor Author

WeilerP commented Jan 11, 2022

BTW, this is also an issue if you have None in one of your columns:

import numpy as np
import pandas as pd
import scanpy as sc
from anndata import AnnData

adata = AnnData(X=np.eye(3), uns={'key_1': 0, 'key_2': None, 'key_3': pd.DataFrame({'col_0': ['string', None]})})
# Alternative failure
# adata = AnnData(X=np.eye(3), uns={'key_1': 0, 'key_2': None}, obs={'col_0': [None]})
adata.write('adata.h5ad')
Traceback
Traceback (most recent call last):
  File "/miniconda3/envs/anndata_bug/lib/python3.8/site-packages/anndata/_io/utils.py", line 209, in func_wrapper
    return func(elem, key, val, *args, **kwargs)
  File "/miniconda3/envs/anndata_bug/lib/python3.8/site-packages/anndata/_io/h5ad.py", line 270, in write_series
    group.create_dataset(
  File "/miniconda3/envs/anndata_bug/lib/python3.8/site-packages/h5py/_hl/group.py", line 148, in create_dataset
    dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
  File "/miniconda3/envs/anndata_bug/lib/python3.8/site-packages/h5py/_hl/dataset.py", line 140, in make_new_dset
    dset_id.write(h5s.ALL, h5s.ALL, data)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5d.pyx", line 232, in h5py.h5d.DatasetID.write
  File "h5py/_proxy.pyx", line 145, in h5py._proxy.dset_rw
  File "h5py/_conv.pyx", line 444, in h5py._conv.str2vlen
  File "h5py/_conv.pyx", line 95, in h5py._conv.generic_converter
  File "h5py/_conv.pyx", line 249, in h5py._conv.conv_str2vlen
TypeError: Can't implicitly convert non-string objects to strings

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/miniconda3/envs/anndata_bug/lib/python3.8/site-packages/anndata/_io/utils.py", line 209, in func_wrapper
    return func(elem, key, val, *args, **kwargs)
  File "/miniconda3/envs/anndata_bug/lib/python3.8/site-packages/anndata/_io/h5ad.py", line 263, in write_dataframe
    write_series(group, col_name, series, dataset_kwargs=dataset_kwargs)
  File "/miniconda3/envs/anndata_bug/lib/python3.8/site-packages/anndata/_io/utils.py", line 212, in func_wrapper
    raise type(e)(
TypeError: Can't implicitly convert non-string objects to strings

Above error raised while writing key 'col_0' of <class 'h5py._hl.group.Group'> from /.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/miniconda3/envs/anndata_bug/lib/python3.8/site-packages/anndata/_core/anndata.py", line 1912, in write_h5ad
    _write_h5ad(
  File "/miniconda3/envs/anndata_bug/lib/python3.8/site-packages/anndata/_io/h5ad.py", line 118, in write_h5ad
    write_attribute(f, "uns", adata.uns, dataset_kwargs=dataset_kwargs)
  File "/miniconda3/envs/anndata_bug/lib/python3.8/functools.py", line 875, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/miniconda3/envs/anndata_bug/lib/python3.8/site-packages/anndata/_io/h5ad.py", line 130, in write_attribute_h5ad
    _write_method(type(value))(f, key, value, *args, **kwargs)
  File "/miniconda3/envs/anndata_bug/lib/python3.8/site-packages/anndata/_io/h5ad.py", line 294, in write_mapping
    write_attribute(f, f"{key}/{sub_key}", sub_value, dataset_kwargs=dataset_kwargs)
  File "/miniconda3/envs/anndata_bug/lib/python3.8/functools.py", line 875, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/miniconda3/envs/anndata_bug/lib/python3.8/site-packages/anndata/_io/h5ad.py", line 130, in write_attribute_h5ad
    _write_method(type(value))(f, key, value, *args, **kwargs)
  File "/miniconda3/envs/anndata_bug/lib/python3.8/site-packages/anndata/_io/utils.py", line 212, in func_wrapper
    raise type(e)(
TypeError: Can't implicitly convert non-string objects to strings

Above error raised while writing key 'col_0' of <class 'h5py._hl.group.Group'> from /.

Above error raised while writing key 'uns/key_3' of <class 'h5py._hl.files.File'> from /.

Though it does work for

adata = AnnData(X=np.eye(3), uns={'key_1': 0, 'key_2': None}, obs={'col_0': ['string', 'string', None]})

@ivirshup
Member

ivirshup commented Jan 11, 2022

I wouldn't like a string None, but we could encode a null type. E.g. missing_el.attrs["encoding_type"] = "null".
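
A rough sketch of what such a tagged encoding buys, using JSON as a stand-in store since it has a native null. The `encode` and `decode` helpers and the element layout are hypothetical and only mirror the `encoding_type` idea; they are not anndata's on-disk format.

```python
import json


# Hypothetical toy encoding: every value is stored next to an
# "encoding_type" tag, so a missing value gets its own tag instead of a
# magic string.
def encode(value):
    if value is None:
        return {"encoding_type": "null"}
    return {"encoding_type": type(value).__name__, "value": value}


def decode(element):
    if element["encoding_type"] == "null":
        return None
    return element["value"]


uns = {"key_1": 0, "key_2": None, "key_3": "None"}
on_disk = json.dumps({k: encode(v) for k, v in uns.items()})
restored = {k: decode(el) for k, el in json.loads(on_disk).items()}

# The None value and the literal string "None" stay distinct after the round trip.
print(restored == uns)
```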

For now, I would say the typical way we handle this in scanpy is just adata.uns.get("maybe_none_key", None) for any parameter that could be None.
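
As a small illustration of that workaround, plain dicts can stand in for `adata.uns` before writing and after reading back. The dict contents here are made up for the example.

```python
# Plain dicts standing in for adata.uns; the contents are illustrative only.
uns_written = {"key_1": 0, "maybe_none_key": None}
uns_read = {"key_1": 0}  # the None-valued key was dropped on write

# Direct indexing fails once the key is gone ...
try:
    uns_read["maybe_none_key"]
    indexing_failed = False
except KeyError:
    indexing_failed = True

# ... while .get treats a missing key the same as an explicit None.
before = uns_written.get("maybe_none_key", None)
after = uns_read.get("maybe_none_key", None)
print(indexing_failed, before is None, after is None)
```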


The cases for columns in a dataframe are a bit different, since those have to be values in an array.

obs={'col_0': [None]}

This fails because neither anndata, numpy, nor pandas can infer a type for that array beyond object.

pd.DataFrame({'col_0': ['string', None]})

We could potentially infer this as a string array and then add support for nullable string arrays (see #504 and #669). I'm not sure pandas' string representation is mature enough to do this yet.

obs={'col_0': ['string', 'string', None]}

This works since we cast the column to a categorical, which we support null values for.
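
To show why a categorical can carry the missing entry, here is a short pandas-only sketch. It illustrates how pandas represents the null, not how or when anndata decides to cast a column.

```python
import pandas as pd

# A categorical keeps the distinct values in `categories` and stores each
# entry as an integer code; a missing entry gets the code -1 rather than a
# category of its own.
cat = pd.Categorical(["string", "string", None])

categories = list(cat.categories)
codes = list(cat.codes)
print(categories)  # ['string']
print(codes)       # [0, 0, -1]
```

Only the string categories and an integer code array need to be written, so no string form of None is ever required.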

@ivirshup
Member

@WeilerP if you wanted to look into this, I would appreciate some info on how other systems handle this. For instance json has null, but I'm not so sure about zarr, hdf5, or arrow.

@LustigePerson

Just for the sake of documenting this somewhere: I ran into this issue when I used the log1p function, which by default writes {"base": None} to uns. However, after saving and reloading the object, rank_genes_groups threw an error because it looks for the base key, which is no longer present.

@wangjiawen2013

I ran into this issue too. See aristoteleo/dynamo-release#440.

@flying-sheep
Member

People are running into this in the wild (scverse/scanpy#2497, scverse/scanpy-tutorials#65). I’ll see if I can implement this.

@flying-sheep
Member

> @WeilerP if you wanted to look into this, I would appreciate some info on how other systems handle this. For instance json has null, but I'm not so sure about zarr, hdf5, or arrow.

hdf5 has null attributes and null datasets, zarr doesn’t seem to have anything. #999 seems to work well.
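
For reference, a minimal h5py sketch of an HDF5 null dataset. The group and key names and the `encoding_type` tag are assumptions made for illustration; this is not necessarily what #999 implements.

```python
import h5py

# In-memory HDF5 file; with backing_store=False nothing is written to disk.
with h5py.File("null_demo.h5", "w", driver="core", backing_store=False) as f:
    uns = f.create_group("uns")
    # h5py.Empty creates a dataset with a null dataspace; the dtype "f" is arbitrary.
    ds = uns.create_dataset("key_2", data=h5py.Empty("f"))
    # Hypothetical tag so a reader could map this element back to None.
    ds.attrs["encoding_type"] = "null"

    # A null dataset reports shape None and reads back as an h5py.Empty instance.
    is_null = uns["key_2"].shape is None
    value = uns["key_2"][()]
    tag = uns["key_2"].attrs["encoding_type"]

print(is_null, tag)
```

Zarr would need a different convention, since it has no equivalent null element.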

@flying-sheep
Member

flying-sheep commented Oct 17, 2024
