
Package has a dependency on pkgconfig but it isn't in the setup.py #24

Closed
brettc opened this issue Jun 18, 2021 · 5 comments

Comments

@brettc

brettc commented Jun 18, 2021

Hi -- thanks for writing this. I just tried installing it, but needed to install another package to get it to run: pkgconfig. (An easy fix, but I thought you might like to know.)

@tdegeus
Owner

tdegeus commented Jun 18, 2021

Thanks! I will do it soon, but if you have the time, a PR would also be very welcome.

Note that I recently also ported the capabilities of this library to https://github.com/xtensor-stack/xtensor-python, which now has a wide array of capabilities for wrapping your library (a templated library allowing in-place use of NumPy arrays, or a non-templated library with minimal wrapping code (but a copy)).

@brettc
Author

brettc commented Jun 20, 2021

I'll try and submit a PR soon. I've decided just to copy in the headers for now, as it simplifies my build.

As for xtensor-python, I've had a look at it, and there are definitely some nice/clever features, but I don't understand why it requires a different type (xt::pytensor rather than just xt::xtensor), nor why it needs a dependency on NumPy (which pybind11 does not require). The approach described here looks like a simpler option if I don't need copies: xtensor-stack/xtensor-python#254.

I'm sure I'm missing some reasons, but for now I like the simple approach in your library.

@tdegeus
Owner

tdegeus commented Jun 22, 2021

So the idea behind xt::pytensor is that it is a NumPy array. There is no conversion or copying involved when a C++ function (wrapped with pybind11) is called. This can make things quite efficient. It is also the reason for the NumPy dependency.

Instead, in this library, when a function void foo(const xt::xtensor<double, 1>& a) is called from Python, a temporary copy of the input NumPy array is made and supplied to foo. So when a is enormous this can be quite costly. This limitation is inherent: xt::xtensor has to own its data, so it cannot use a pointer to some external data.

That is the reason why xt::pytensor was created, which does not own its data as such.
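A minimal sketch of the contrast (`sum1d` is just an illustrative function, not part of either library):

```cpp
// With this library's casters, the signature would simply be
//
//     double sum1d(const xt::xtensor<double, 1>& a);  // 'a' is a temporary copy
//
// With xtensor-python, xt::pytensor wraps the NumPy buffer directly, so no
// copy is made when the function is called from Python:

#include <pybind11/pybind11.h>
#define FORCE_IMPORT_ARRAY
#include <xtensor-python/pytensor.hpp>
#include <xtensor/xmath.hpp>

double sum1d(const xt::pytensor<double, 1>& a)  // 'a' aliases the NumPy data
{
    return xt::sum(a)();
}

PYBIND11_MODULE(example, m)
{
    xt::import_numpy();
    m.def("sum1d", &sum1d);
}
```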

Indeed, writing a library with templates allowing both xt::pytensor and xt::xtensor can be a little more labour intensive (though one also gains possibilities). That is the reason why I offer these casters. I just wanted to say: I have now also implemented these casters in xtensor-python, so that you can do a hybrid setup in which you template some functions but keep xt::xtensor in others.

I hope I was clear ;)

@brettc
Author

brettc commented Jun 22, 2021

Thanks for taking the time to explain.

I understand the difference between the two libraries, and I can see the advantages of xt::pytensor, but I'm not sure that this approach is the best way to do it. Here is the problem with the approach in xtensor-python: your C++ code is now dependent on a Python-specific library. What if I now want to wrap that same code for R, or just use it in a C++ executable? It seems I'm stuck with a dependency on Python. So I think introducing a dependency on xt::pytensor into my C++ code is not a good idea, and indeed it seems to go against some of QuantStack's stated goals about writing code that can be shared across Python / R / Julia.

As I understand it, there are two problems with trying to wrap xtensor. The first is: how can we share an array across the C++ / Python boundary without copying? The mechanism for doing this is the buffer interface provided by Python. I know that this is possible without having a specialised C++ class, as I have wrapped Boost's MultiArray using Cython (before I discovered pybind11), and the link I included above shows how this is possible with xtensor as well.
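Something like this rough sketch (untested; `sum1d` and the glue lambda are just for illustration) is what I have in mind, using pybind11's `py::array_t` together with `xt::adapt`, so the core C++ code never sees a Python-specific type:

```cpp
#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>
#include <xtensor/xadapt.hpp>
#include <xtensor/xmath.hpp>
#include <vector>

namespace py = pybind11;

// Core C++ code: templated, knows nothing about Python.
template <class E>
double sum1d(const E& a)
{
    return xt::sum(a)();
}

PYBIND11_MODULE(example, m)
{
    // Thin glue layer: adapt the NumPy buffer in place (no copy).
    m.def("sum1d", [](py::array_t<double, py::array::c_style> a) {
        std::vector<std::size_t> shape(a.shape(), a.shape() + a.ndim());
        auto view = xt::adapt(
            a.mutable_data(), static_cast<std::size_t>(a.size()),
            xt::no_ownership(), shape);
        return sum1d(view);
    });
}
```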

The second problem is: how do we decide who maintains ownership of the shared memory? This is a hard one, and I'm less sure about it. It will depend on the details (who created the array, for example), but I think some settings in NumPy and the buffer interface, together with the ownership policies provided by pybind11, can be used to solve the problem.

It's also entirely possible that I've missed something important, or misunderstood the constraints!

In any case, for now I'm happy to use your much smaller / simpler package, as the memory copying is not a problem for my current requirements (Thanks!). I've actually decided to vendor it in (copy the headers), rather than use pip install. But I'll still try and send a PR.

@tdegeus
Owner

tdegeus commented Jun 24, 2021

> Thanks for taking the time to explain.

You are welcome!

> I understand the difference between the two libraries, and I can see the advantages of xt::pytensor, but I'm not sure that this approach is the best way to do it. Here is the problem with the approach in xtensor-python: your C++ code is now dependent on a Python-specific library. What if I now want to wrap that same code for R, or just use it in a C++ executable? It seems I'm stuck with a dependency on Python. So I think introducing a dependency on xt::pytensor into my C++ code is not a good idea, and indeed it seems to go against some of QuantStack's stated goals about writing code that can be shared across Python / R / Julia.

So I agree that introducing xt::pytensor in a C++ library is a terrible idea. But the point is that there are API designs that avoid doing so while still allowing the use of xt::pytensor. In particular, if you use template arguments you have exactly this freedom. E.g. a function

```cpp
template <class T>
void foo(T& a);
```

can be used with xt::xtensor from C++ just as you want (or even with other types that provide the same methods, like xt::xarray, and in some cases even std::vector).

In your Python API you can then just do e.g.

m.def("foo", &foo<xt::pytensor<double, 1>>)

> As I understand it, there are two problems with trying to wrap xtensor. The first is: how can we share an array across the C++ / Python boundary without copying? The mechanism for doing this is the buffer interface provided by Python. I know that this is possible without having a specialised C++ class, as I have wrapped Boost's MultiArray using Cython (before I discovered pybind11), and the link I included above shows how this is possible with xtensor as well.

I'm not entirely sure in this case. I think that xt::xtensor owns its own memory unless you use a different template argument than the default. So you'd still have to template your function, or have more than one overload. Personally, I'm not very happy about this design choice, though there might be reasons that I fail to see.

> The second problem is: how do we decide who maintains ownership of the shared memory? This is a hard one, and I'm less sure about it. It will depend on the details (who created the array, for example), but I think some settings in NumPy and the buffer interface, together with the ownership policies provided by pybind11, can be used to solve the problem.

Not sure if this is ever a problem. If you pass a reference you are responsible for keeping it alive; that is also true in pure C++. This is no different for the Python API, and I think not hard to achieve (you simply don't change the reference count). Probably this is all arranged in xtensor-python / pybind11.
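For illustration only (this is plain pybind11, not something specific to either library), a sketch of exposing C++-owned memory to NumPy without a copy, where the base handle keeps the owner alive:

```cpp
#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>
#include <vector>

namespace py = pybind11;

struct Holder
{
    std::vector<double> data{1.0, 2.0, 3.0};
};

PYBIND11_MODULE(example, m)
{
    py::class_<Holder>(m, "Holder")
        .def(py::init<>())
        .def("as_numpy", [](py::object self) {
            auto& h = self.cast<Holder&>();
            // Passing 'self' as the base object: the returned NumPy array
            // holds a reference to the Holder, so the memory stays valid
            // (and no copy is made) for as long as the array is alive.
            return py::array_t<double>(
                {static_cast<py::ssize_t>(h.data.size())},   // shape
                {static_cast<py::ssize_t>(sizeof(double))},  // strides (bytes)
                h.data.data(),
                self);
        });
}
```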

> It's also entirely possible that I've missed something important, or misunderstood the constraints!

> In any case, for now I'm happy to use your much smaller / simpler package, as the memory copying is not a problem for my current requirements (Thanks!). I've actually decided to vendor it in (copy the headers), rather than use pip install. But I'll still try and send a PR.

Agreed, that's why I created it, and I have been using it often. I only recently switched to xtensor-python as I have more serious use-cases for the Python API.

tdegeus closed this as completed Apr 19, 2023