[Documentation]: Structure of docs considered harmful to a project that i love! #1482
Comments
Updated the title to better reflect my intention.
@sneakers-the-rat thanks for describing your experience with the documentation and suggestions for improvements. Creating good documentation is hard and we continuously strive to make our documentation better. Thank you also for your willingness to contribute and help make things better. Having this issue as a point for discussion is useful, but I think it would help to start creating more specific issues from this discussion, to give us and the community more concrete items that we can address. For example, your suggestion to improve the ordering of tutorials is a great example of a concrete improvement that we could create an issue for. Just a few quick comments:
The NWB team created this page recently as a result of our recent Documentation Hackathon event. The intent is to create a more approachable entry point for NWB users. I agree that the interlinking between PyNWB and this page still needs significant improvement.
We have created a new tutorial for this (https://nwb-overview.readthedocs.io/en/latest/intro_to_nwb/1_intro_to_nwb.html). I think one way to address this issue is to add a "Getting Started" page to the PyNWB docs to help guide users through the documentation and point to the resources on the NWB Overview page as well.
I hope we have been able to address part of this issue with our new conversion tutorial https://nwb-overview.readthedocs.io/en/latest/conversion_tutorial/user_guide.html
I agree that creating a more meaningful order for the tutorials in the various sections will be useful, as well as possibly adding some text to provide guidance on how to navigate the tutorials. Again, this is in part a reflection of natural growth. Over the last few years we have added many tutorials, and while a basic alphabetical ordering was fine at the beginning, it clearly is no longer sufficient.
One main reason we had chosen
This is in part a question of whether the library structure should be motivated by developer or user structure. The organization of PyNWB is intended to organize classes into modules based on their application areas (e.g., electrical physiology or optical physiology) and to match the organization of types in the nwb-schema rather than reflecting inheritance structure (as shown in pyreverse).
I agree that adding and enhancing docstrings will be very useful to do. I had started (at least for HDMF) to try and use
Thanks @sneakers-the-rat. You bring up some good points. Let's start with the tutorials order. There is some order, as you indicate. The NWB Basics tutorial really should be first. But on the other hand I like that some of them are unordered: the domain-specific tutorials really should not have an order within that category and should be treated more a la carte. I think the best solution here would be to separate out the "NWB Basics" into its own gallery so that it can be first. I also think we should include links at the end of that tutorial to encourage users to go from there to one of the domain-specific tutorials. Note: @oruebel, you mention that the gallery is used so we can CI test the tutorials. My team found another way to do this by using pydoc, as illustrated here and here. The advantage of this approach is that it lets you enhance the test beyond simply "does it run" to testing simple outputs. It also allows you to get around having to be a gallery. The disadvantage is that we couldn't find a way to avoid having to use "
Fabulous!
To be clear, completely agree and definitely am not doc shaming. having to imagine what other people might or might not understand is extremely hard!
Also agree. I'll try and sketch some below and then if they sound good I'll open them and can try my hand at some PRs.
Yes! Agree! This tutorial is helpful. Understanding it's new, I think it could be worth going through the other tutorials and condensing down some additional basic concepts: dependency between multiple tables (like electrodegroup -> device), how certain objects that seem to be of different "levels" or "modalities" like how epochs can apply to other timeseries, etc.
These are a good start! Given the diversity of conversion practices, it might be good to make a wiki to make it easier than a pull request to the official docs to add your example to a library, will write more details below.
Yes! pydoc and also sphinx's doctest does this as well. It's possible to avoid
Completely fair! I was mostly reflecting on how the code structure makes the doc structure less automatic. I think there's something to be said for giving a bit of hierarchy to the code: core NWB types and etc., then instead of all things for a particular modality being in one file they can be split out into a package with separate modules: eg. for ecephys the timeseries classes could go in one module and the 'analysis' classes like clustering could go in another? I think this is probably a larger conversation, but where i'm coming from here isn't so much a stylistic or functionality question as scaffolding understanding for new users and potential developers: To browse the library I just always open it in an IDE so I can use static analysis to jump to usages and definitions because the package is flat, but having some grouping structure to it would make it clearer how things are related. I think it's possible to do both without a dramatic restructuring with some visualizations, will describe below.
Yes, autosummary is great! I use autodocsumm to make it automatic. With autodocsumm you just add
Spinoff Issue Ideas
Just sketching these before opening them to check if we're on the same page, so intending these as provisional & welcome your input on whether any should be changed/axed.
Explicit Order for Beginner Tutorials
Agreed with what @bendichter says above about splitting these out. I think what would be nice here is to expand them into more top-level TOCTree elements so when a new user comes along they see a clear path they should follow to learn. As is I think that everything being nested inside Tutorials obscures that a bit and requires some discovery. If the tutorials are to remain as galleries, I think these could be done with
I think it would also be nice to put links at the bottom like 'choose your own adventure' - go to the next tutorial in the sequence, or 'see also' this tutorial for additional information about x, y, z, concept?
Conceptual Overview Tutorial
It looks like this is happening in
If these kind of tutorials are going to be in
Links to additional documentation and demos
There are about a half bazillion different learning materials, demos, workshop materials, notebooks that y'all and the rest of the userbase have put together over time. These are great and super valuable for example and learning, but at the moment they're sort of scattered and need a decent amount of hunting around and finding them, and even then it's unclear if they demonstrate current best practices or are just some neat trick that someone has discovered that works for them. It sounds extremely labor intensive to maintain such a directory of links to examples within the docs as such, and would require people doing PRs to this repo directly, which is an undesirably high barrier. I think the right tool (as I often do) here is a wiki! I use and love mediawiki specifically, it's relatively simple to set up and extremely powerful.
With an example wiki it could be "self serve" for people to link out to their own example, and it would be a lot easier for a larger team to maintain than being .rst in the sphinx docs. If you like programmatic organization it's possible to do pretty much all you would want to do with semantic mediawiki like annotating which examples demonstrate which concepts, use which objects, etc. It would also let you do stuff like annotate which are "official" workshops/demos/etc. and which are community examples, and put all the multimodal learning materials (videos, notebooks, discussions, etc.) in the same place with pretty low friction!
Library and Dependency Graph Visualizations
To help people get a view of NWB 'at a glance,' as well as navigate the library a bit more easily, I think it would be lovely to make some graphviz/etc. visualizations of the basic structure of the library. This can be done in a few ways:
I think there are two separate things here:
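As one possible illustration of the graphviz idea, here is a minimal sketch that walks a class hierarchy with Python's built-in `__subclasses__()` and emits Graphviz DOT text. It is not how pyreverse works internally, and the helper name `class_graph_dot` and the four toy classes are assumptions made up for the example; the real PyNWB hierarchy has more layers than these stand-ins.

```python
def class_graph_dot(root, name="classes"):
    """Return Graphviz DOT text for `root` and every subclass currently loaded."""
    edges = set()
    seen = {root}
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in parent.__subclasses__():
            # Store edges by class name so duplicates collapse naturally.
            edges.add((child.__name__, parent.__name__))
            if child not in seen:
                seen.add(child)
                stack.append(child)
    lines = [f"digraph {name} {{", "  rankdir=BT;"]
    for child, parent in sorted(edges):
        lines.append(f'  "{child}" -> "{parent}";')
    lines.append("}")
    return "\n".join(lines)


# Toy stand-ins (hypothetical, simplified): not the real PyNWB classes.
class Container:
    pass


class Device(Container):
    pass


class TimeSeries(Container):
    pass


class ElectricalSeries(TimeSeries):
    pass


dot_text = class_graph_dot(Container)
print(dot_text)
```

The resulting text can be saved to a `.dot` file and rendered with the Graphviz command-line tools. Only classes that have already been imported show up, so a real run would need to import the relevant modules first.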
Convert gallery to doctests?
Should we convert some/all of the basic tutorials to use doctests rather than gallery? I don't have a strong opinion here, I think some of the relevant considerations include
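For a sense of what "testing simple outputs" rather than only "does it run" could look like, here is a minimal sketch using only the standard-library `doctest` module. The function `mean_firing_rate` and the helper `check_examples` are hypothetical names invented for the example, not part of PyNWB, and a Sphinx-based setup would wire this up differently.

```python
import doctest


def mean_firing_rate(spike_times, duration):
    """Return the mean firing rate in Hz for spikes seen over `duration` seconds.

    >>> mean_firing_rate([0.1, 0.5, 0.9, 1.4], duration=2.0)
    2.0
    >>> mean_firing_rate([], duration=2.0)
    0.0
    """
    return len(spike_times) / duration


def check_examples(func):
    """Run the examples embedded in `func`'s docstring and return the results."""
    parser = doctest.DocTestParser()
    # The globals dict makes the function callable from inside its own examples.
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)  # TestResults(failed=..., attempted=...)


results = check_examples(mean_firing_rate)
print(results)
```

Each `>>>` line is executed and its printed result is compared with the line below it, so a tutorial written this way fails CI when the documented output drifts from what the code actually returns.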
Explicit API Docs
To make the API docs more user-friendly, we could use explicit api doc stubs with automodule calls. Considerations:
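A minimal sketch of what generating such stubs could look like, assuming one `.rst` file per module with a short hand-written introduction above an `automodule` directive. The helper names `render_stub` and `write_stubs`, the output layout, and the placeholder introductions are all assumptions for illustration; only the `automodule` directive and its options come from Sphinx autodoc.

```python
from pathlib import Path

# Placeholder introductions (hypothetical text); in practice these would be
# written by hand for each module.
MODULE_INTROS = {
    "pynwb.device": "Placeholder: what a device is and which objects refer to it.",
    "pynwb.ecephys": "Placeholder: how the extracellular ephys types fit together.",
}


def render_stub(module, intro):
    """Return reStructuredText for one module page: title, intro, automodule."""
    underline = "=" * len(module)
    return (
        f"{module}\n{underline}\n\n"
        f"{intro}\n\n"
        f".. automodule:: {module}\n"
        "   :members:\n"
        "   :undoc-members:\n"
        "   :show-inheritance:\n"
    )


def write_stubs(out_dir, intros=MODULE_INTROS):
    """Write one stub file per module into `out_dir` and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for module, intro in intros.items():
        path = out / f"{module}.rst"
        path.write_text(render_stub(module, intro), encoding="utf-8")
        paths.append(path)
    return paths


print(render_stub("pynwb.device", MODULE_INTROS["pynwb.device"]))
```

Because the stubs are ordinary files checked into the docs folder, their order in a toctree and the narrative text above each directive can be edited by hand instead of being regenerated on every build.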
Additional Documentation Needed?
Catchall for 'nice to haves' that involve writing additional documentation:
What would you like changed or added to the documentation and why?
Meta-context: looking at the most recent (i think?) version of the docs here: https://pynwb.readthedocs.io/en/dev/index.html according to #1478
I am here with love to raise a conversation that I feel is one that must have been being had internally but externally feels overdue: there is a lot of information in the docs, yes. there has been a lot of work done on them, clearly. but at the moment they are not approachable for most people.
I've read I think most of pynwb and HDMF at this point, wrote my first conversion guide in 2019, and have spent the last year or so specifically studying software accessibility in a number of domains (including data formats! which I am a fan of!) and I believe the docs are the single greatest hindrance to adoption. I, again, with love, as someone who shares the goal of realizing the benefits of standards, with the intention of making this tool more accessible, will try and articulate why from an outside perspective.
UX as in the Experience of the User
If I am a neuroscientist interested in converting to NWB, this is how I am greeted to the docs:
All but two of the entries in the TOC are not relevant to me. Hopefully the installation is just `pip install pynwb`, so that leaves just the tutorials.
The pages are sorted alphabetically! As far as I can tell as a naive user, the only thing that seems relevant to me to getting my bearings here seems to be NWB File Basics. OK!
NWB File Basics
If I spend a while reading through that tutorial, I come away with the notion that I need to make an NWB file and add things to it, a sort of smattering of different data types (which are interesting and sound like what I needed!), and some hints at reading the file. Great! To a programmer, this is useful: I know how inheritance works, so I know what it means for things to be a subclass and how that gives them shared functionality. I know that I can click through to the API docs and read them. But to a neuroscientist who is not a programmer, I still am not really sure how this all works! There are timeseries, yes I know that one, but then I need to add epochs to a timeseries? What does that look like? I am not sure what tutorial should come next for me, none of the other general tutorials are obvious places to go next.
Well the next entry in the TOC is Domain-Specific tutorials, maybe I can use those.
Extracellular Ephys
tbh you lost me!
I know that I have electrophysiological data, I know I used electrodes to record it, I don't know how to read that diagram! I don't know what a device is or how it's related to an electrode group aside from the fact that I need to add them yet. I'm not clear about the reasoning behind why I'm doing any of this yet either, for example this took me awhile to parse even as someone who knows the library a bit:
When I go through to the `add_electrode` method to try and understand it (which is helpfully linked!), I get linked to the `ElectrodeGroup` object which doesn't have a docstring, and so I have no idea what it actually means, or why `add_electrode` needs it!
again as a programmer I think it might be relatively easy to read the inheritance hierarchy, click through to the source, even make a live object and inspect it directly, which is why I'm completely sympathetic to thinking I'm being pedantic here or overly critical. I know you have talked people through the structure of the library a thousand times so you know what I'm missing and that I'm being overly naive. im not trying to say the library is bad, just trying to describe that it's hard to learn as is.
Same thing with `Device`, from the API docs, I have no notion of what other objects it might be used with (links to other objects like from before would be nice!), and since it's in its own module in a very flat namespace, I am not sure where I might go to learn more!
Going through the rest of that tutorial, I'm not really sure how my units relate to my electrodes, my raw timeseries, and etc.
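To make the missing relationship concrete, here is a toy model of the reference chain as I understand it: each electrode row points at a group, and each group points at a device. These are plain dataclasses written for illustration, deliberately named after the PyNWB concepts but not the PyNWB classes themselves; the real objects take more arguments, and `ElectrodeTable` and `device_of` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Device:
    """Toy stand-in: the physical hardware used to record."""
    name: str


@dataclass(frozen=True)
class ElectrodeGroup:
    """Toy stand-in: a set of electrodes that share one device and location."""
    name: str
    location: str
    device: Device


@dataclass
class ElectrodeTable:
    """Toy stand-in: one row per electrode, each row referencing its group."""
    rows: list = field(default_factory=list)

    def add_electrode(self, location, group):
        # A row cannot exist without a group, which is why the group is required.
        if not isinstance(group, ElectrodeGroup):
            raise TypeError("group must be an ElectrodeGroup")
        electrode_id = len(self.rows)
        self.rows.append({"id": electrode_id, "location": location, "group": group})
        return electrode_id

    def device_of(self, electrode_id):
        """Follow the chain electrode -> group -> device."""
        return self.rows[electrode_id]["group"].device


probe = Device(name="probe0")
shank = ElectrodeGroup(name="shank0", location="CA1", device=probe)
table = ElectrodeTable()
for _ in range(4):
    table.add_electrode(location="CA1", group=shank)

print(table.device_of(3).name)
```

Seen this way, the group is the link that lets every electrode row say which hardware it came from without repeating the device description on each row.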
Where Am I Now?
From here, I can browse through the other tutorials, but what I'm missing at this point is a basic lay of the land: how can these various things interact with each other? What even are the basic objects here? I learned in the general tutorial that there were only three things: timeseries, processing modules, and metadata. But what are these electrode groups? What is a dynamictable? If I forget a value from my electrode, what do I do? If I want to add my data, how would I go about it aside from following exactly what is in the tutorials? If my data is slightly different than what is in the tutorials, how would I go about fixing that?
Role of Code Structure
I'll make this very brief (as some people in this group have had me come in and make PRs drastically restructuring their libraries before lol) in order to limit this issue to the structure of the documentation and how can we scaffold this process better, but I think in the long run what is really needed is a refactoring of the library: most of the code is entirely flat in the base `pynwb` namespace, there's one `io` submodule that has duplicates of a lot of the same file names in the top-level namespace, and so as a result the documentation doesn't structure itself and has to be done manually. Similar things should be grouped together, and honestly a very simple pyreverse diagram demonstrates that that structure already exists (and looks pretty dang reasonable!), it just isn't reflected in the code structure (when read by an outsider):
Ideas for Docs Structure Refactoring
The first goal here should be to make a clear pathway for someone interested in converting their data to NWB to do so! which I think we can agree on and work towards. They shouldn't need to come to a workshop (as much as I love them), they ideally shouldn't need an additional library, and they shouldn't have to resort to using their grant funding to pay someone to convert it for them. Aspirations yno?
What that should look like, as in literally visually look like, to a new user is to have much much more of the TOC and homepage devoted to them.
Tutorials Gallery
Starting from the way the docs are implemented: I think one very simple fix is to fix the way `sphinx_gallery` is being used. From the index, the tutorials are linked as `tutorials/index`. I'm not really sure what the `sphinx_gallery` really adds here, but what it subtracts is explicit control over the presentation of the tutorials.
There is explicit order to the tutorial groups (pynwb/docs/source/conf.py, line 69 in e05b553), but then within a group they are sorted alphabetically (pynwb/docs/source/conf.py, line 73 in e05b553).
This makes browsing them very challenging! There is no scaffolding, I have to discover it for myself. There really isn't a good way to learn about the structure of the library from the docs page (I know there is more elsewhere!) -- I'm not talking about learning about it from a developer POV for contributing, as the software structure is described in the developer docs below, I mean just knowing at a basic level what exists as a casual neuroscientist wanting to freshen up their data.
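One way explicit ordering within a group might be sketched is a small sort key that ranks listed files first and falls back to alphabetical order for the rest. The class name `ExplicitFileOrder` and the tutorial file names are hypothetical, and whether a given sphinx-gallery version accepts a custom key class like this for `within_subsection_order` is an assumption that should be checked against its documentation.

```python
import os

# Hypothetical file names; the real tutorial files may be named differently.
TUTORIAL_ORDER = ["nwb_basics.py", "reading_files.py", "extending_nwb.py"]


class ExplicitFileOrder:
    """Sort key: listed tutorials first, in list order; the rest alphabetically."""

    def __init__(self, src_dir):
        # Kept only because gallery sort keys are typically built per directory.
        self.src_dir = src_dir

    def __call__(self, filename):
        name = os.path.basename(filename)
        if name in TUTORIAL_ORDER:
            return (TUTORIAL_ORDER.index(name), name)
        # Unlisted files share one rank after all listed ones.
        return (len(TUTORIAL_ORDER), name)


# Assumed wiring in conf.py (not verified against a specific version):
# sphinx_gallery_conf = {"within_subsection_order": ExplicitFileOrder}

files = ["reading_files.py", "zzz_other.py", "nwb_basics.py", "aaa_other.py"]
ordered = sorted(files, key=ExplicitFileOrder("unused"))
print(ordered)
```

Keeping the order in one list means a new tutorial still shows up without any configuration change, it simply lands after the curated sequence.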
I feel like part of this might be the relative brittleness of the format of `sphinx_gallery` -- that looks great for short examples, I know sklearn and scipy use it to great effect, but it looks like a real pain in the ass to write docs as RST within comments! It also is very much programmer-centric in the amount of literal code that is included in the documentation. myst has made it dramatically easier to use sphinx and i can't recommend it enough. In either case a few more explicit steps up the ladder would I think be a good change.
Introductions to the API
Tutorials are great! They are not the most straightforward way to teach the structure of a library, and the rest of the API documentation needs to be able to speak for itself if a new user is expected to go from tutorials -> API on their own. As is, the API documentation feels like it's in dire need of dogfooding (something I know well personally and always mess up) - it is written, understandably, from the perspective of someone who understands the library but does not use the API documentation in their own work. There is a lot of missing context about what role any particular object plays, and given the lack of hierarchical structure in the documentation and code, there is relatively little way for someone to infer it without reading the source code.
It looks like all the API-level documentation is just generated with sphinx-apidoc at the time of building the docs. Doing that means that for the API docs to be useful the code needs to be structured in a way that supports readable documentation. As is, however, most of the modules don't have top-level docstrings explaining what they are, and many of the objects and functions lack or have only barebones docstrings.
What's missing is narrative API documentation -- Even a few short sentences introducing what each of the modules are and how they relate to one another would go a long way in helping someone understand the library. Using sphinx-apidoc is fine, but until the library has browsable structure it should be used to generate doc stubs that live in the `/docs` folder that you can then give explicit structure to by handwriting some of the autodoc directives.
As is, it's relatively clear that the developers don't use these docs because when I click on any of the headings in the API documentation tab I am actually led to some sub sub sublink in the API Documentation > PyNWB > Submodules > `<literal module name>` page -- and hopefully y'all wouldn't do that to yourselves on purpose! I don't mean to be harsh just to say this doesn't seem intentional!
Using all the existing material!
I know y'all do about a billion workshops, have a ton of users, and probably have a ton of teaching material. That is not reflected in the docs! The main NWB page links to this separate documentation page (https://nwb-overview.readthedocs.io/en/latest/index.html#), which is not linked anywhere from the pynwb docs! The `nwb-overview` docs themselves also largely seem to be overview docs with links out to pynwb and `nwb_conversion_tools` and don't reveal any additional structure to the format or library.
If the documentation was structured in a more limber way, maybe by using myst, maybe by using a wiki, maybe by figuring out some other way to incorporate all the materials that I know exist, then that would probably improve the documentation tenfold without writing anything new! I'm talking about all this stuff as a start! https://neurodatawithoutborders.github.io/nwb_hackathons/ and I also have seen a few dozen lab-specific conversion repos that would be great to link to from an "examples" subpage!
I will stop there for now, and am more than happy to PR, but I hope hope hope this is received in the spirit I am writing it, as someone who is interested in the same things, that wants to see NWB thrive, that has coached several labs through conversion, and likes and respects what y'all do here. I think that all the cool next-level stuff I see happening like linked analyses and widgets and all that simply won't have the same impact if most people (without the funds to hire a staff programmer et al. to do it for them) simply cannot fathom converting their data in the first place. I have just been sort of confused by the docs for a long time and feel like it was worth saying something, and am again very very very happy to do some of the work of restructuring and rewriting the docs with some guidance for what the team would accept.
Do you have any interest in helping write or edit the documentation?
Yes.
Code of Conduct
edit 1: some grammar
edit 2: when I try and be funny and friendly online I speak in hyperbole like as a joke but then realize that it comes across as serious and so I made more explicit annotations of uh tone lol