[Documentation]: Structure of docs considered harmful to a project that i love! #1482

Open
3 tasks done
sneakers-the-rat opened this issue May 20, 2022 · 4 comments
Labels
category: proposal proposed enhancements or new features help wanted: good first issue request for community contributions that are good for new contributors topic: docs issues related to documentation

Comments

@sneakers-the-rat
Contributor

sneakers-the-rat commented May 20, 2022

What would you like changed or added to the documentation and why?

Meta-context: looking at the most recent (i think?) version of the docs here: https://pynwb.readthedocs.io/en/dev/index.html according to #1478

I am here with love to raise a conversation that I feel is one that must have been being had internally but externally feels overdue: there is a lot of information in the docs, yes. there has been a lot of work done on them, clearly. but at the moment they are not approachable for most people.

I've read I think most of pynwb and HDMF at this point, wrote my first conversion guide in 2019, and have spent the last year or so specifically studying software accessibility in a number of domains (including data formats! which I am a fan of!) and I believe the docs are the single greatest hindrance to adoption. I, again, with love, as someone who shares the goal of realizing the benefits of standards, with the intention of making this tool more accessible, will try and articulate why from an outside perspective.

UX as in the Experience of the User

If I am a neuroscientist interested in converting to NWB, this is how I am greeted to the docs:

Screen Shot 2022-05-19 at 8 26 51 PM

All but two of the entries in the TOC are not relevant to me. Hopefully the installation is just pip install pynwb, so that leaves just the tutorials.

Screen Shot 2022-05-19 at 8 48 41 PM

The pages are sorted alphabetically! As far as I can tell as a naive user, the only thing that seems relevant to me to getting my bearings here seems to be NWB File Basics. OK!

NWB File Basics

If I spend a while reading through that tutorial, I come away with the notion that I need to make a file and add things to an NWB file, a sort of smattering of different data types (which are interesting and sound like what I needed!), and some hints at reading the file. Great! To a programmer, this is useful: I know how inheritance works, so I know what it means for things to be a subclass and how that gives them shared functionality. I know that I can click through to the API docs and read them. But as a neuroscientist who is not a programmer, I am still not really sure how this all works! There are timeseries, yes I know that one, but then I need to add epochs to a timeseries? What does that look like? I am not sure what tutorial should come next for me; none of the other general tutorials are obvious places to go next.

Well the next entry in the TOC is Domain-Specific tutorials, maybe I can use those.

Extracellular Ephys

tbh you lost me!

Screen Shot 2022-05-19 at 9 00 23 PM

I know that I have electrophysiological data, I know I used electrodes to record it, I don't know how to read that diagram! I don't know what a device is or how it's related to an electrode group aside from the fact that I need to add them yet. I'm not clear about the reasoning behind why I'm doing any of this yet either, for example this took me awhile to parse even as someone who knows the library a bit:

Screen Shot 2022-05-19 at 9 04 29 PM

When I go through to the add_electrode method to try and understand it (which is helpfully linked!), I get linked to the ElectrodeGroup object which doesn't have a docstring, and so I have no idea what it actually means, or why add_electrode needs it!

Screen Shot 2022-05-19 at 9 08 18 PM

again as a programmer I think it might be relatively easy to read the inheritance hierarchy, click through to the source, even make a live object and inspect it directly, which is why I'm completely sympathetic to thinking I'm being pedantic here or overly critical. I know you have talked people through the structure of the library a thousand times so you know what I'm missing and that I'm being overly naive. im not trying to say the library is bad, just trying to describe that it's hard to learn as is.

Same thing with Device, from the API docs, I have no notion of what other objects it might be used with (links to other objects like from before would be nice!), and since it's in its own module in a very flat namespace, I am not sure where I might go to learn more!

Screen Shot 2022-05-19 at 9 13 49 PM

Going through the rest of that tutorial, I'm not really sure how my units relate to my electrodes, my raw timeseries, and etc.

Where Am I Now?

From here, I can browse through the other tutorials, but what I'm missing at this point is a basic lay of the land. How can these various things interact with each other? What even are the basic objects here? I learned in the general tutorial that there were only three things: timeseries, processing modules, and metadata. But what are these electrode groups? What is a dynamictable? If I forget a value from my electrode what do I do? If I want to add my data, how would I go about it aside from following exactly what is in the tutorials? If my data is slightly different than what is in the tutorials, how would I go about fixing that?

Role of Code Structure

I'll make this very brief (as some people in this group have had me come in and make PRs drastically restructuring their libraries before lol) in order to limit this issue to the structure of the documentation and how can we scaffold this process better, but I think in the long run what is really needed is a refactoring of the library: most of the code is entirely flat in the base pynwb namespace, there's one io submodule that has duplicates of a lot of the same file names in the top-level namespace, and so as a result the documentation doesn't structure itself and has to be done manually. Similar things should be grouped together, and honestly a very simple pyreverse diagram demonstrates that that structure already exists (and looks pretty dang reasonable!), it just isn't reflected in the code structure (when read by an outsider):

classes

Ideas for Docs Structure Refactoring

The first goal here should be to make a clear pathway for someone interested in converting their data to NWB to do so! which I think we can agree on and work towards. They shouldn't need to come to a workshop (as much as I love them), they ideally shouldn't need an additional library, and they shouldn't have to resort to using their grant funding to pay someone to convert it for them. Aspirations yno?

What that should look like, as in literally visually look like, to a new user is to have much much more of the TOC and homepage devoted to them.

Tutorials Gallery

Starting from the way the docs are implemented: I think one very simple fix is to fix the way sphinx_gallery is being used. From the index, the tutorials are linked as tutorials/index. I'm not really sure what the sphinx_gallery really adds here, but what it subtracts is explicit control over the presentation of the tutorials.

There is explicit order to the tutorial groups:

'subsection_order': ExplicitOrder(['../gallery/general', '../gallery/domain', '../gallery/advanced_io']),

but then within a group they are sorted alphabetically:

'within_subsection_order': ExampleTitleSortKey

This makes browsing them very challenging! There is no scaffolding, I have to discover it for myself. There really isn't a good way to learn about the structure of the library from the docs page (I know there is more elsewhere!) -- I'm not talking about learning about it from a developer POV for contributing, as the software structure is described in the developer docs below, I mean just knowing at a basic level what exists as a casual neuroscientist wanting to freshen up their data.
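To make the suggestion concrete, here is a minimal sketch of an explicit within-group order. It assumes the class-based sort-key interface that ExampleTitleSortKey follows (the key is instantiated with the gallery source directory, then called with each file name), which is worth checking against the installed sphinx_gallery version. The file names and the class name ExplicitTutorialOrder are hypothetical placeholders, not the real tutorial scripts.

```python
import os

# Hypothetical file names and order; substitute the real tutorial scripts.
TUTORIAL_ORDER = [
    "file.py",         # e.g. the NWB File Basics tutorial
    "read_basics.py",
    "extensions.py",
]


class ExplicitTutorialOrder:
    """Sort key that puts listed tutorials first, in the listed order.

    Assumption: sphinx_gallery instantiates the key with the gallery source
    directory and then calls it with each example's file name.
    """

    def __init__(self, src_dir):
        self.src_dir = src_dir

    def __call__(self, filename):
        name = os.path.basename(filename)
        if name in TUTORIAL_ORDER:
            return (0, TUTORIAL_ORDER.index(name), name)
        # Unlisted tutorials come after the listed ones, alphabetically.
        return (1, 0, name)


# In conf.py this would take the place of ExampleTitleSortKey:
sphinx_gallery_conf = {
    "within_subsection_order": ExplicitTutorialOrder,
}
```

Unlisted files still sort alphabetically after the listed ones, so adding a new tutorial would not break the build; it would just land at the end until someone places it.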

I feel like part of this might be the relative brittleness of the format of sphinx_gallery -- that looks great for short examples, I know sklearn and scipy use it to great effect, but it looks like a real pain in the ass to write docs as RST within comments! It also is very much programmer-centric in the amount of literal code that is included in the documentation. myst has made it dramatically easier to use sphinx and i can't recommend it enough. In either case a few more explicit steps up the ladder would I think be a good change.

Introductions to the API

Tutorials are great! They are not the most straightforward way to teach the structure of a library, and the rest of the API documentation needs to be able to speak for itself if a new user is expected to go from tutorials -> API on their own. As is, the API documentation feels like it's in dire need of dogfooding (something I know well personally and always mess up) - it is written, understandably, from the perspective of someone who understands the library but does not use the API documentation in their own work. There is a lot of missing context about what the role of any particular object is, and given the lack of hierarchical structure in the documentation and code, there is relatively little way for someone to infer it without reading the source code.

It looks like all the API-level documentation is just generated with sphinx-apidoc at the time of building the docs. Doing that means that for the API docs to be useful the code needs to be structured in a way that supports readable documentation. As is, however, most of the modules don't have top-level docstrings explaining what they are, and many of the objects and functions lack or have only barebones docstrings.

What's missing is narrative API documentation -- Even a few short sentences introducing what each of the modules are and how they relate to one another would go a long way in helping someone understand the library. Using sphinx-apidoc is fine, but until the library has browsable structure it should be used to generate doc stubs that live in the /docs folder that you can then give explicit structure to by handwriting some of the autodoc directives.
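As a rough sketch of what handwritten stubs could look like, the script below writes one .rst page per module, each with a short narrative intro followed by an automodule directive. The page titles, intro sentences, output file names, and the helper names render_stub and write_stubs are all placeholders of mine, not anything that exists in the repo; pynwb.device and pynwb.ecephys are used only as example module paths.

```python
from pathlib import Path

# Hypothetical grouping: (page title, module path, placeholder intro text).
API_PAGES = [
    ("Devices", "pynwb.device",
     "Placeholder intro: what a device is and which objects refer to it."),
    ("Extracellular electrophysiology", "pynwb.ecephys",
     "Placeholder intro: how electrode groups relate to recorded series."),
]


def render_stub(title, module, intro):
    """Return the reST source for one handwritten API page."""
    underline = "=" * len(title)
    return (
        f"{title}\n{underline}\n\n"
        f"{intro}\n\n"
        f".. automodule:: {module}\n"
        f"   :members:\n"
        f"   :undoc-members:\n"
        f"   :show-inheritance:\n"
    )


def write_stubs(out_dir):
    """Write one .rst stub per entry in API_PAGES and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for title, module, intro in API_PAGES:
        path = out_dir / (module.replace(".", "_") + ".rst")
        path.write_text(render_stub(title, module, intro), encoding="utf-8")
        paths.append(path)
    return paths
```

Once generated, the stubs would be committed and edited by hand, so the grouping, titles, and intros stay under explicit control instead of being regenerated on every build.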

As is, it's relatively clear that the developers don't use these docs because when I click on any of the headings in the API documentation tab I am actually led to some sub sub sublink in the API Documentation > PyNWB > Submodules > <literal module name> page -- and hopefully y'all wouldn't do that to yourselves on purpose! I don't mean to be harsh just to say this doesn't seem intentional!

Screen Shot 2022-05-19 at 10 00 25 PM

Using all the existing material!

I know y'all do about a billion workshops, have a ton of users, and probably have a ton of teaching material. That is not reflected in the docs! The main NWB page links to this separate documentation page: https://nwb-overview.readthedocs.io/en/latest/index.html#

which is not linked anywhere from the pynwb docs! The nwb-overview docs themselves also largely seem to be overview docs with links out to pynwb and nwb_conversion_tools and don't reveal any additional structure to the format or library.

If the documentation was structured in a more limber way, maybe by using myst, maybe by using a wiki, maybe by figuring out some other way to incorporate all the materials that I know exist, then that would probably improve the documentation tenfold without writing anything new! I'm talking about all this stuff as a start! https://neurodatawithoutborders.github.io/nwb_hackathons/
and I also have seen a few dozen lab-specific conversion repos that would be great to link to from an "examples" subpage!

I will stop there for now, and am more than happy to PR, but I hope hope hope this is received in the spirit I am writing it, as someone who is interested in the same things, that wants to see NWB thrive, that has coached several labs through conversion, and likes and respects what y'all do here. I think that all the cool next-level stuff I see happening like linked analyses and widgets and all that simply won't have the same impact if most people (without the funds to hire a staff programmer et al. to do it for them) simply cannot fathom converting their data in the first place. I have just been sort of confused by the docs for a long time and feel like it was worth saying something, and am again very very very happy to do some of the work of restructuring and rewriting the docs with some guidance for what the team would accept.

Do you have any interest in helping write or edit the documentation?

Yes.

Code of Conduct

edit 1: some grammar
edit 2: when I try and be funny and friendly online I speak in hyperbole like as a joke but then realize that it comes across as serious and so I made more explicit annotations of uh tone lol

@sneakers-the-rat sneakers-the-rat changed the title from "[Documentation]: Structure of docs considered harmful" to "[Documentation]: Structure of docs considered harmful to a project that i love!" May 20, 2022
@sneakers-the-rat
Contributor Author

updated the title to better reflect my intention

@oruebel
Contributor

oruebel commented May 20, 2022

@sneakers-the-rat thanks for describing your experience with the documentation and suggestions for improvements. Creating good documentation is hard and we continuously strive to make our documentation better. Thank you also for your willingness to contribute and help make things better. Having this issue as a point for discussion is useful, but I think it will be useful if we can start to create more specific issues from this discussion to give us and the community more concrete items that we can address. For example, your suggestion to improve the ordering of tutorials is a great example for a concrete improvement that we could create an issue for. Just a few quick comments:

The main NWB page links to this separate documentation page: https://nwb-overview.readthedocs.io/en/latest/index.html#

The NWB team created this page recently as a result of our recent Documentation Hackathon event. The intent is to create a more approachable entry-point for NWB users. I agree that the interlinking between the PyNWB docs and this page still needs significant improvement.

but what I'm missing at this point is a basic lay of the land,

We have created a new tutorial for this here https://nwb-overview.readthedocs.io/en/latest/intro_to_nwb/1_intro_to_nwb.html . I think one way to maybe address this issue is to add a "Getting Started" page to the PyNWB docs to help guide users through the documentation and point to the resources on the NWB Overview page as well.

If I am a neuroscientist interested in converting to NWB ...

I hope we have been able to address part of this issue with our new conversion tutorial https://nwb-overview.readthedocs.io/en/latest/conversion_tutorial/user_guide.html

The pages are sorted alphabetically!

I agree that creating a more meaningful order for the tutorials in the various sections will be useful as well as possibly adding some text to provide guidance on how to navigate the tutorials. Again, this is in part a reflection of natural growth. Over the last few years we have added many tutorials and while a basic alphabetical ordering was fine at the beginning it clearly is no longer sufficient.

I feel like part of this might be the relative brittleness of the format of sphinx_gallery

One main reason we had chosen sphinx_gallery is because it allows us to execute the tutorials and test them as part of our continuous integration pipelines. This helps us ensure that the code in the tutorials works correctly and allows us to identify when changes in the library break or require updates to tutorials.

simple pyreverse diagram demonstrates that that structure already exists,

This is in part a question of whether the library structure should be motivated by developer or user structure. The organization of PyNWB is intended to organize classes into modules based on their application areas (e.g., electrical physiology or optical physiology) and to match the organization of types in the nwb-schema rather than reflecting inheritance structure (as shown in pyreverse).

As is, however, most of the modules don't have top-level docstrings explaining what they are, and many of the objects and functions lack or have only barebones docstrings

I agree that adding and enhancing docstrings will be very useful to do. I had started (at least for HDMF) to try and use sphinx.ext.autosummary to create a more approachable overview of modules in hdmf-dev/hdmf#654 and improve the navigation of the API docs but I have not gotten around to fully finishing this. However, I think this will also be useful for PyNWB.

@oruebel oruebel added category: proposal proposed enhancements or new features topic: docs issues related to documentation help wanted: good first issue request for community contributions that are good for new contributors labels May 20, 2022
@bendichter
Contributor

Thanks @sneakers-the-rat. You bring up some good points. Let's start with the tutorials order.

There is some order, as you indicate. The NWB Basics tutorial really should be first. But on the other hand I like that some of them are unordered- the domain-specific tutorials really should not have an order within that category and should be treated more a la carte.

I think the best solution here would be to separate out the "NWB Basics" into its own gallery so that it can be first. I also think we should include links at the end of that tutorial to encourage users to go from there to one of the domain-specific tutorials.

note: @oruebel, you mention that the gallery is used so we can CI test the tutorials. My team found another way to do this by using pydoc, as illustrated here and here. The advantage of this approach is it lets you enhance the test beyond simply "does it run" to testing simple outputs. It also allows you to get around having to be a gallery. The disadvantage is that we couldn't find a way to avoid having to use ">>>" before every line, which is annoying. We got around that a bit with a toggle for the display, but it might not be ideal. I don't think it's worth refactoring all of the tutorials to this format, but it's a good tool to have in our arsenal :-)
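For readers who haven't seen the ">>>" style, here is a minimal sketch using Python's standard doctest module. That choice is an assumption: the linked examples may rely on different tooling, and mean_rate is a made-up function used only for illustration.

```python
import doctest


def mean_rate(spike_times, duration):
    """Return the mean firing rate in Hz.

    >>> mean_rate([0.1, 0.4, 0.9], duration=2.0)
    1.5
    """
    return len(spike_times) / duration


if __name__ == "__main__":
    # Fails if a ">>>" line raises or prints something other than the
    # expected output written underneath it.
    print(doctest.testmod())
```

The check goes beyond "does it run": changing the expected value under the ">>>" line makes the test fail.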

@sneakers-the-rat
Contributor Author

sneakers-the-rat commented May 20, 2022

Fabulous!

Creating good documentation is hard

Again, this is in part a reflection of natural growth. Over the last few years we have added many tutorials and while a basic alphabetical ordering was fine at the beginning it clearly is no longer sufficient.

To be clear, completely agree and definitely am not doc shaming. having to imagine what other people might or might not understand is extremely hard!

Having this issue as a point for discussion is useful, but I think it will be useful if we can start to create more specific issues from this discussion to give us and the community more concrete items that we can address.

Also agree. I'll try and sketch some below and then if they sound good I'll open them and can try my hand at some PRs.

We have created a new tutorial for this here https://nwb-overview.readthedocs.io/en/latest/intro_to_nwb/1_intro_to_nwb.html . I think one way to maybe address this issue is to add a "Getting Started" page to the PyNWB docs to help guide users through the documentation and point to the resources on the NWB Overview page as well.

Yes! Agree! This tutorial is helpful. Understanding it's new, I think it could be worth going through the other tutorials and condensing down some additional basic concepts: dependencies between multiple tables (like electrodegroup -> device), how objects that seem to be of different "levels" or "modalities" relate to each other (like how epochs can apply to other timeseries), etc.

I hope we have been able to address part of this issue with our new conversion tutorial https://nwb-overview.readthedocs.io/en/latest/conversion_tutorial/user_guide.html

These are a good start! Given the diversity of conversion practices, it might be good to set up a wiki, so that adding your example to a library of examples is easier than opening a pull request against the official docs. I'll write more details below.

One main reason we had chosen sphinx_gallery is because it allows us to execute the tutorials and test them as part of our continuous integration pipelines.

My team found another way to do this by using pydoc, as illustrated here and here. The advantage of this approach is it lets you enhance the test beyond simply "does it run" to testing simple outputs.

Yes! pydoc and also sphinx's doctest does this as well. It's possible to avoid >>> while still testing specific output by using the testcode and testoutput directives, and it's also possible to avoid having to print the imports and other auxiliary setup code that can clutter documentation with testsetup. We can talk about whether it would be worth changing the syntax -- it wouldn't be too hard, just wrapping the code blocks in some directives, but also whether that's desirable vs. gallery.
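A small sketch of what that could look like, with the directives living in a docstring that autodoc pulls into the page. The function count_spikes and the module name tutorial_utils are hypothetical; the directives themselves come from sphinx.ext.doctest.

```python
def count_spikes(spike_times, start, stop):
    """Count the spikes that fall in the half-open window [start, stop).

    .. testsetup::

        # Not shown in the rendered docs; `tutorial_utils` is a
        # hypothetical module name.
        from tutorial_utils import count_spikes

    .. testcode::

        print(count_spikes([0.1, 0.4, 0.9], start=0.0, stop=0.5))

    .. testoutput::

        2
    """
    return sum(1 for t in spike_times if start <= t < stop)


# conf.py: enable the directives, then run the doctest builder,
# e.g. `sphinx-build -b doctest <sourcedir> <outdir>`.
extensions = ["sphinx.ext.autodoc", "sphinx.ext.doctest"]
```

The rendered page shows only the testcode and testoutput blocks, with no ">>>" prompts, while the doctest builder still compares the printed output against the expected value.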

The organization of PyNWB is intended to organize classes into modules based on their application areas (e.g., electrical physiology or optical physiology) and to match the organization of types in the nwb-schema rather than reflecting inheritance structure (as shown in pyreverse).

Completely fair! I was mostly reflecting on how the code structure makes the doc structure less automatic. I think there's something to be said for giving a bit of hierarchy to the code: core NWB types and etc., then instead of all things for a particular modality being in one file they can be split out into a package with separate modules: eg. for ecephys the timeseries classes could go in one module and the 'analysis' classes like clustering could go in another? I think this is probably a larger conversation, but where i'm coming from here isn't so much a stylistic or functionality question as scaffolding understanding for new users and potential developers: To browse the library I just always open it in an IDE so I can use static analysis to jump to usages and definitions because the package is flat, but having some grouping structure to it would make it clearer how things are related.

I think it's possible to do both without a dramatic restructuring with some visualizations, will describe below.

I agree that adding and enhancing docstrings will be very useful to do. I had started (at least for HDMF) to try and use sphinx.ext.autosummary to create a more approachable overview of modules in hdmf-dev/hdmf#654 and improve the navigation of the API docs but I have not gotten around to fully finishing this. However, I think this will also be useful for PyNWB.

Yes, autosummary is great! I use autodocsumm to make it automatic. With autodocsumm you just add 'autosummary':True to autodoc_default_options and it should make them for everything you autodoc.
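Concretely, the conf.py change would be roughly the following. This is a sketch: the "members" entry is my addition for illustration, and the existing extensions list would of course keep whatever is already in it.

```python
# conf.py (sketch)
extensions = [
    "sphinx.ext.autodoc",
    "autodocsumm",
]

autodoc_default_options = {
    "members": True,      # assumption: document members by default
    "autosummary": True,  # ask autodocsumm for a summary table per autodoc'd object
}
```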

Spinoff Issue Ideas

Just sketching these before opening them to check if we're on the same page, so intending these as provisional & welcome your input on whether any should be changed/axed.

Explicit Order for Beginner Tutorials

Agreed with what @bendichter says above about splitting these out. I think what would be nice here is to expand them into more top-level TOCTree elements so when a new user comes along they see a clear path they should follow to learn. As is I think that everything being nested inside Tutorials obscures that a bit and requires some discovery.

If the tutorials are to remain as galleries, I think these could be done with ExplicitOrder. It's unclear to me whether it's possible to put them at the top level of a TOCTree like that, but it also looks like it should be possible to explicitly refer to the tutorials in a toctree like tutorials/general/...?

I think it would also be nice to put links at the bottom like 'choose your own adventure' - go to the next tutorial in the sequence, or 'see also' this tutorial for additional information about x, y, z, concept?

Conceptual Overview Tutorial

It looks like this is happening in nwb-overview, but in addition to the NWB File Basics tutorial, it would be good to have more scaffolding on NWB concepts. I think what is here in the neurodata types is what I'm thinking of, but it needs some additional information about how different objects relate to one another: dependent objects like Device, as well as objects that can structure others (eg. epochs applied to other timeseries).

If these kinds of tutorials are going to be in nwb-overview rather than the base docs, then they probably deserve their own top-level links in the TOCTree (as above), as well as narrative description from the index ('new users go here!')

Links to additional documentation and demos

There are about a half bazillion different learning materials, demos, workshop materials, notebooks that y'all and the rest of the userbase have put together over time. These are great and super valuable for example and learning, but at the moment they're sort of scattered and need a decent amount of hunting around and finding them, and even then it's unclear if they demonstrate current best practices or are just some neat trick that someone has discovered that works for them.

It sounds extremely labor intensive to maintain such a directory of links to examples within the docs as such, and would require people doing PRs to this repo directly, which is an undesirably high barrier. I think the right tool (as I often do) here is a wiki! I use and love mediawiki specifically, it's relatively simple to set up and extremely powerful. With an example wiki it could be "self serve" for people to link out to their own example, and it would be a lot easier for a larger team to maintain than being .rst in the sphinx docs. If you like programmatic organization it's possible to do pretty much all you would want to do with semantic mediawiki like annotating which examples demonstrate which concepts, use which objects, etc. It would also let you do stuff like annotate which are "official" workshops/demos/etc. and which are community examples, and put all the multimodal learning materials (videos, notebooks, discussions, etc.) in the same place with pretty low friction!

Library and Dependency Graph Visualizations

To help people get a view of NWB 'at a glance,' as well as navigate the library a bit more easily, I think it would be lovely to make some graphviz/etc. visualizations of the basic structure of the library. This can be done in a few ways:

  • explicitly with graphviz,
  • I've also used d3 (for example) to make a clickable and styled visualization that autogenerates from some structural description of the graph
  • pregenerated SVGs that can also link, example from PVP
  • for inheritance: inheritance_diagram

I think there are two separate things here:

  • visual representation of the macrostructure of NWB: these would be in tutorials, overview documentation, and at the top level of modules to let users know how things fit together at a high level
  • visual representation of relationship between individual/sets of objects. These would be in individual class docstrings, so for example you would have a diagram like these to show which other classes you might use. I can imagine a graph that shows required dependencies, but also one that shows maybe 'commonly used' linked classes. To autogenerate these it would be possible to write an extension to autodoc to inspect the class for attributes that are typed as being other NWB objects, but it seems like using something a bit more explicit like graphviz (which supports record nodes like those used in the docs already) might also be useful.
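As a sketch of the explicit route, a small table of "needs" relationships can be rendered to Graphviz DOT source with nothing but the standard library. The table below is illustrative only: the two edges reflect the electrode -> ElectrodeGroup -> Device chain discussed earlier in this thread, the node labels are my own wording, and to_dot is a hypothetical helper, not part of any library.

```python
# Illustrative relationship table: {object: [objects it needs]}.
DEPENDS_ON = {
    "electrode (table row)": ["ElectrodeGroup"],
    "ElectrodeGroup": ["Device"],
}


def to_dot(depends_on, name="nwb_objects"):
    """Render a {object: [required objects]} mapping as Graphviz DOT source."""
    lines = [
        f"digraph {name} {{",
        "    rankdir=LR;",
        "    node [shape=box];",
    ]
    for source in sorted(depends_on):
        for target in depends_on[source]:
            lines.append(f'    "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot(DEPENDS_ON))
```

The resulting text can be pasted into a graphviz directive (from sphinx.ext.graphviz) in a tutorial or docstring, or written to a .dot file and rendered to SVG ahead of time.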

Convert gallery to doctests?

Should we convert some/all of the basic tutorials to use doctests rather than gallery? I don't have a strong opinion here, I think some of the relevant considerations include

  • doing something rather than nothing is nonzero labor
  • maintainability: The gallery format is a little cumbersome (to me) and requires some additional text editing to write. Markdown is very straightforward to use, and so might be easier to maintain than having to go and edit a lot of .rst embedded within docstrings and comments?
  • readability: doctests let you hide setup code, does gallery also let you do that?
  • test verbosity: doctests let you do more traditional tests that check whether an output value matches some expected value. does gallery also support this?

Explicit API Docs

To make the API docs more user-friendly, we could use explicit api doc stubs with automodule calls. Considerations:

  • Possible to give additional structure and organization not found in the structure of the code itself (ie. API docs don't need to be isomorphic to code structure, but can group different classes/modules together as makes sense)
  • Possible to give more readable names (currently names are the literal module name)
  • Possible to have more control over things like eg. autosummary
  • Possible to get out of sync with the code (though this is pretty easy to spot and fix)
  • Not as automatic.

Additional Documentation Needed?

Catchall for 'nice to haves' that involve writing additional documentation:

  • Module-level docstrings
  • Class docstrings - an ongoing project if there ever was one
    • Narrative links to all related/dependent objects
    • Narrative description of how it's used
    • links to relevant tutorials
    • Usage examples


3 participants