Handling missing data #98
Comments
Using As for point 3, the values to be converted to |
@khinsen There is a |
Indeed you should check the API because using nil will force people to check. |
@AtharvaKhare No. |
@khinsen I'm not sure I understand why Pharo does not support IEEE-754: http://pharobooks.gforge.inria.fr/PharoByExampleTwo-Eng/latest/Float.pdf |
@AtharvaKhare For reading DataFrames, here's what I'd do to deal with missing values:
|
@SergeStinckwich Pharo implements a subset of IEEE-754. That's probably true of all higher-level languages, but in the context of this discussion (NaN), a missing feature that many languages do implement is an option to turn off exceptions and have operations such as division by zero or sqrt of a negative number return |
OK @khinsen, maybe we should make a list of what is missing. What would be the benefit of turning off exceptions and obtaining |
The best summary I know of the rationale behind introducing NaN is from lecture notes by William Kahan (who was the principal designer of IEEE-754):
Today the main reason for using no-exception mode is performance when treating large datasets, but there are cases where Kahan's argument still applies, i.e. algorithms where having to deal with an invalid result immediately is not convenient. In Pharo, another motivation may be prototyping algorithms that do use NaNs for whatever reason, including performance when turned into low-level code. On the other hand, if I wanted to improve number crunching in Pharo, I'd start with more important issues, such as the precision mismatch between |
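To make the trade-off concrete, here is a minimal sketch in Python rather than Pharo, purely as an illustration. The helper names `checked_sqrt` and `quiet_sqrt` are hypothetical, and the quiet mode is only emulated by returning NaN by hand; it is not a real floating-point control setting in either language.

```python
import math

def checked_sqrt(x):
    """Exception mode: math.sqrt raises ValueError for a negative input."""
    return math.sqrt(x)

def quiet_sqrt(x):
    """Emulated no-exception mode (hypothetical helper): return NaN instead of raising."""
    return math.sqrt(x) if x >= 0 else math.nan

data = [4.0, -1.0, 9.0]

# Exception mode: the first invalid input interrupts the whole computation.
try:
    strict = [checked_sqrt(x) for x in data]
except ValueError:
    strict = None

# Quiet mode: the invalid result travels along as NaN and is inspected at the end.
quiet = [quiet_sqrt(x) for x in data]
total = sum(quiet)  # NaN propagates through the arithmetic
invalid = [i for i, r in enumerate(quiet) if math.isnan(r)]
```

In the quiet variant the valid results for the other elements are still available, and the invalid positions can be located afterwards with `math.isnan`.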
@AtharvaKhare The methods you added for removing rows/columns that contain missing values look good. That's one popular way to deal with missing data. However, it is often desirable to ignore missing data only for one particular operation, without removing them completely. For example, one might want to compute the average over a column. R has a boolean parameter "ignore missing data" in functions that somehow iterate over the data, which provides just that functionality. I am not sure how this would best be done in Pharo, but it's worth thinking about. |
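As a rough, language-neutral illustration of the per-operation approach, here is a small Python sketch. It assumes missing values are represented by `None` (standing in for nil); the function `mean` and its `ignore_missing` parameter are hypothetical names modeled on R's `na.rm` argument, not part of the Pharo DataFrame API.

```python
def mean(values, ignore_missing=False):
    """Average of a column; None marks a missing value (assumption)."""
    if ignore_missing:
        # Skip missing values for this one operation only.
        values = [v for v in values if v is not None]
    elif any(v is None for v in values):
        # Without the flag, a missing input makes the result missing,
        # similar to R returning NA.
        return None
    if not values:
        return None  # nothing left to average
    return sum(values) / len(values)

column = [1.0, None, 3.0]
default_result = mean(column)                        # missing result
skipped_result = mean(column, ignore_missing=True)   # average of 1.0 and 3.0
```

The column itself is left untouched, so the missing entry is still there for later operations that may want to treat it differently.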
@khinsen I think we can define a new message like Would love others' thoughts on this. |
Putting PRs here for reference:
|
All the related issues are now fixed. :) |
This issue is to consolidate related missing-data issues and to discuss potential solutions.
We can use nil to represent missing data in a DataFrame (df) / DataSeries (ds). Any better representation is welcome.
The following methods need to be added:
?, NA, nan, null, nil as values should get converted to nil
By solving this, #14 and #66 should get solved, and reading files (CSV) with missing data should become possible.
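The conversion step could look roughly like the following Python sketch, shown only to pin down the intended behavior. `MISSING_TOKENS` and `read_rows` are hypothetical names, `None` stands in for nil, and the token set is exactly the five values listed above; whether empty cells should also count as missing is left open here.

```python
import csv
import io

# The five tokens listed in this issue (assumption: matched after trimming whitespace).
MISSING_TOKENS = {"?", "NA", "nan", "null", "nil"}

def read_rows(text):
    """Parse CSV text and replace missing-value tokens with None."""
    reader = csv.reader(io.StringIO(text))
    return [
        [None if cell.strip() in MISSING_TOKENS else cell for cell in row]
        for row in reader
    ]

sample = "name,age,city\nAna,31,?\nBo,NA,Lyon\nCy,nan,null\n"
rows = read_rows(sample)
```

Cells are kept as strings in this sketch; type conversion of the remaining values would be a separate step.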