Proposal: expand supported license data #7

Closed
migurski opened this issue Sep 19, 2015 · 32 comments

Comments

@migurski
Member

Based on the license principles discussion, I’d like to recommend a formal change to the contribution guidelines and our parsing code to support an optional extended description.

Currently, the license is documented as “a URL or string”, and supports both explicit links and implicit short strings:

"license": "http://geonb.snb.ca/downloads/documents/geonb_license_e.pdf"

Also valid:

"license": "CC-BY-SA"

We should support an additional expanded version of the license data, with true/false flags for license properties such as required attribution or share-alike:

"license": {
    "url": "http://geonb.snb.ca/downloads/documents/geonb_license_e.pdf",
    "attribution-string": "GeoNB – www.snb.ca/geonb",
    "attribution": true,
    "share-alike": false
}

The old forms will still be acceptable. In cases where attribution or share-alike are not explicitly defined, we would assume both are required.
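
A minimal sketch of how the three forms could be normalized into one structure, using the defaults proposed above. This is illustrative only and not the Machine code; the function name `normalize_license` and the `text` key for short strings are assumptions.

```python
from typing import Any, Dict, Union


def normalize_license(value: Union[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Turn a URL, a short string, or an expanded dictionary into one dictionary."""
    if isinstance(value, str):
        # Old forms: an explicit link or an implicit short string such as "CC-BY-SA".
        if value.startswith(("http://", "https://")):
            result = {"url": value}
        else:
            result = {"text": value}  # hypothetical key, not part of the proposal
    else:
        result = dict(value)  # copy so the caller's dictionary is not modified

    # Per the proposal, flags that are not explicitly defined are assumed required.
    result.setdefault("attribution", True)
    result.setdefault("share-alike", True)
    return result


print(normalize_license("CC-BY-SA"))
print(normalize_license({"url": "http://example.com/license.pdf", "share-alike": False}))
```

Old-style strings pass through the same code path as the expanded form, so downstream code only ever sees a dictionary.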

If this proposal were accepted, here are the next steps:

  1. Write and test machine code to support the structure above.
  2. Deploy new machine code.
  3. Update contribution guide to reflect newly-supported structure.
  4. Research licenses of existing sources to determine their properties.
  5. Expand existing sources to use new license structure.
  6. Determine whether to deprecate older URL/string structure.
@NelsonMinar
Contributor

I'm in favor of this! I don't feel strongly about the format of the JSON blob, but this looks good.

Is there some commonly accepted definition of what "Attribution" and "Share-Alike" mean that we're implying here? Perhaps the Creative Commons definitions? http://creativecommons.org/licenses/

Are there different degrees of Share-Alike? I'm wondering if there are some Share-Alike licenses that basically mean "your copy of our database has to be shared" but don't extend to a more viral "anything you make with this data must also be Share-Alike".

@migurski
Member Author

Attribution seems really clear to me.

Share-alike is much more slippery—I’m still not sure if it seems safer to assume yes or no on this one.

@migurski
Member Author

Tagging @sbma44 and @iandees for particular input on this.

@NelsonMinar
Contributor

Perhaps we need a "license unknown" category in the output files.

@iandees
Member

iandees commented Sep 19, 2015

I like this idea and the format of the JSON blob. I'm a tad worried about us interpreting licenses and boiling them down into a couple of attributes. Maybe it's fine if we're clear in the docs that this is our interpretation and might not be your interpretation?

@migurski
Member Author

Yes, I agree with the idea that this is our interpretation.

@migurski
Member Author

Would it be fair to say that anything in OA can be used for derived works? That’s really the crux of the SA flag: it governs what you can do with those works, but we assert that anything in OA should be usable for new data products.

@NelsonMinar
Contributor

I don't think there's any value in us collecting data that cannot be used at all for derived works. (Is there any?) The challenge is what restrictions the license might place on derived works. Share-Alike provisions (sometimes?) require that the whole derived work be share-alike. Non-Commercial provisions forbid commercial use. I think we should include sources with SA or NC provisions but very clearly delimit them.

@NelsonMinar
Contributor

Calling out Non-Commercial explicitly, that's also a common license provision in some circles. Do we need it for OA data sources?

@migurski
Member Author

Sounds like we might, and it would map cleanly to the three flags in CC licenses. There are eight possible combinations, but CC documents just six.

@migurski
Member Author

…and I see that two of them include No Derivatives, which I think we can exclude. We would have five possible kinds of downloads:

  • No requirements.
  • Attribution (BY)
  • Attribution-ShareAlike (BY-SA)
  • Attribution-NonCommercial (BY-NC)
  • Attribution-NonCommercial-ShareAlike (BY-NC-SA)

Three without NC:

  • No requirements.
  • Attribution (BY)
  • Attribution-ShareAlike (BY-SA)
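
The two lists above can be reproduced by enumerating the flag combinations. This is a rough sketch under one assumption taken from the CC license suite: NC and SA only ever appear together with BY. The function name `download_kinds` is hypothetical.

```python
from itertools import product


def download_kinds(include_nc=True):
    """List the license kinds built from the BY, NC and SA flags."""
    kinds = []
    for by, nc, sa in product([False, True], repeat=3):
        # Assumption: NC and SA are only offered together with BY.
        if (nc or sa) and not by:
            continue
        # Optionally leave out every kind that carries NC.
        if nc and not include_nc:
            continue
        parts = [name for name, on in (("BY", by), ("NC", nc), ("SA", sa)) if on]
        kinds.append("-".join(parts) or "No requirements")
    return kinds


print(download_kinds())
print(download_kinds(include_nc=False))
```

Of the eight raw combinations, three are dropped because they would carry NC or SA without BY, which leaves the five kinds listed above.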

@NelsonMinar
Contributor

Yeah, three flags in the source documents (one for each feature: BY, SA, NC). Then we can present a list of collections however makes sense based on which license features are most common.

@migurski
Member Author

What do we think about presenting them as positives in the download descriptions:

  1. Share-alike → Any License Allowed.
  2. Noncommercial → Commercial Use Allowed.
  3. Attribution → [whatever the opposite of attribution would be]
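
One way the positive phrasing might be generated from the flags, as a sketch only. The function name `positive_labels` is hypothetical, and attribution is left out on purpose because its positive wording is still undecided in the list above.

```python
def positive_labels(share_alike, noncommercial):
    """Describe a download by what it allows rather than what it restricts."""
    labels = []
    if not share_alike:
        labels.append("Any License Allowed")
    if not noncommercial:
        labels.append("Commercial Use Allowed")
    return labels


print(positive_labels(share_alike=False, noncommercial=False))
print(positive_labels(share_alike=True, noncommercial=False))
```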

@ajturner

Big fan of this.

By the way, see the Creative Commons Rights Relation, ccREL for W3C, and OKFN Open Licenses.

So using CreativeCommons NS would perhaps be:

"license": {
 "cc:permits": ["cc:Reproduction", "cc:Distribution"],
 "cc:prohibits": ["cc:CommercialUse"]
}
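
For comparison, here is a hedged sketch of how three boolean flags could be translated into this vocabulary. The `cc:requires` property and the `cc:Attribution`, `cc:ShareAlike` and `cc:DerivativeWorks` terms come from the ccREL vocabulary rather than from this thread, and the function name `to_cc_terms` is hypothetical.

```python
def to_cc_terms(attribution, share_alike, noncommercial):
    """Express boolean license flags as permits/requires/prohibits lists."""
    # Assumption: every source may be copied, redistributed and used in derived works.
    terms = {
        "cc:permits": ["cc:Reproduction", "cc:Distribution", "cc:DerivativeWorks"]
    }
    requires = []
    if attribution:
        requires.append("cc:Attribution")
    if share_alike:
        requires.append("cc:ShareAlike")
    if requires:
        terms["cc:requires"] = requires
    if noncommercial:
        terms["cc:prohibits"] = ["cc:CommercialUse"]
    return terms


print(to_cc_terms(attribution=True, share_alike=False, noncommercial=True))
```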

@migurski
Member Author

I like the “permits” vs. “prohibits” language, that’s great. The cc: namespaces might not be entirely appropriate since we’re not technically dealing with CC, but linking to them for the spirit could be enough.

@sbma44

sbma44 commented Sep 28, 2015

Super-late to this, but will say:

  • I like the idea of nudging the world toward CC taxonomies
  • I think noncommercial licenses are likely to be common enough to be worth noting, but the difficulty of defining "commercial" in a unified way across licenses makes this of limited practical value
  • ditto the thresholds under which sharealike attaches. attribution is relatively straightforward, and even when requirements vary (for instance: the Austrians are touchy about having the snapshot date specified), the plausible damages for a violation are minimal. when you ponder the limitless mutability of the ODbL "substantial" threshold you can get a sense of just how slippery this will be in practice.

With all that said I think proceeding is great, but we should make the disclaimers totally unavoidable. I'd hate for anyone to think we're taking formal positions on the usability of the data/offering legal advice.

@migurski
Member Author

Makes sense, thank you! I’ll move forward, and I’ll make sure that disclaimers are reflected in the download page design.

migurski added a commit to openaddresses/openaddresses.io that referenced this issue Sep 29, 2015
We are going to have disclaimers there about license status (openaddresses/openaddresses-ops#7), so a simple get-everything zip file link is no longer appropriate on the front page.
@migurski
Member Author

migurski commented Oct 4, 2015

I’m making a series of changes here that introduce the new dictionary syntax, with backwards-compatible support for simple strings. It’s just URLs and strings so far; nothing about attribution or license properties yet. The new behavior is released in Machine 2.6.0.

@migurski
Member Author

migurski commented Oct 5, 2015

Next step: there are a large number of existing sources with attribution tags. That’s the first explicit license flag we should support. I think we can use the presence of the tag as an implicit hint about the required attribution of the source, going back to the sources later on with explicit flags.

@migurski
Member Author

migurski commented Oct 9, 2015

Here’s where we are at the moment with license tag documentation, FYI: https://github.com/openaddresses/openaddresses/blob/633cd4c/CONTRIBUTING.md#optional-tags

@geobrando
Contributor

A couple things:

> Next step: there are a large number of existing sources with attribution tags. That’s the first explicit license flag we should support. I think we can use the presence of the tag as an implicit hint about the required attribution of the source, going back to the sources later on with explicit flags.

  1. @migurski: On at least a few occasions I know that I've added an attribution tag as a simple courtesy to the data owners and not because attribution was required under the license terms. I believe others have done the same. I would recommend against blindly converting all existing sources with this tag to the new license tag structure, unless you're OK with this.
  2. Since license text is sometimes included with the source data, shouldn't the license tag structure allow for paths in the data file that would allow machine to extract this from a single download?

@migurski
Member Author

@geobrando: I’m treating the attribution tag as an implied requirement only in the absence of other information, and I’m not updating any of the sources to make this explicit. It should affect only collections without a clear flag, and I believe it will be safe. Does that sound okay to you?
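
Roughly, the precedence described here could look like the sketch below. It is not the Machine implementation: the function name `attribution_required` is hypothetical, the top-level `attribution` key stands in for the existing attribution tag, and returning `None` when nothing is known is my own choice.

```python
from typing import Any, Dict, Optional


def attribution_required(source: Dict[str, Any]) -> Optional[bool]:
    """Decide whether a source requires attribution, or None if unknown."""
    license_info = source.get("license")

    # An explicit flag in the expanded license always wins.
    if isinstance(license_info, dict) and "attribution" in license_info:
        return bool(license_info["attribution"])

    # Without an explicit flag, an attribution tag is treated as an implied requirement.
    if source.get("attribution"):
        return True

    # Nothing is known either way.
    return None


courtesy = {"attribution": "Example County GIS", "license": {"attribution": False}}
print(attribution_required(courtesy))
```

A courtesy attribution tag combined with an explicit `attribution: false` is therefore not treated as a requirement.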

Say more about the paths in the data. Are you thinking that we might point to some text file included in a zip archive?

@migurski
Member Author

migurski commented Nov 8, 2015

Comments from @NelsonMinar suggest that splitting attribution downloads doesn’t make sense, but that splitting share-alike ones does. I’m going to put openaddresses/machine#236 and openaddresses/machine#248 on ice for a little while, and introduce a share-alike flag first.

@migurski
Member Author

migurski commented Nov 8, 2015

In openaddresses/machine#254, missing share-alike license information is assumed to mean false. Is this safe, or should it default to true to be more cautious?
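
To make the question concrete, here is a small sketch of splitting sources into two collections with a missing flag treated as false. It illustrates the default being discussed and is not the code in openaddresses/machine#254; the function name and the source names are made up.

```python
def split_by_share_alike(sources):
    """Partition source names into (no share-alike, share-alike) lists."""
    plain, share_alike = [], []
    for name, source in sources.items():
        license_info = source.get("license")
        # Missing share-alike information is assumed to mean False.
        if isinstance(license_info, dict):
            flag = bool(license_info.get("share-alike", False))
        else:
            flag = False
        (share_alike if flag else plain).append(name)
    return sorted(plain), sorted(share_alike)


sources = {
    "xx/explicit-sa": {"license": {"share-alike": True}},
    "xx/explicit-no-sa": {"license": {"share-alike": False}},
    "xx/old-style-string": {"license": "CC-BY-SA"},
    "xx/no-license": {},
}
print(split_by_share_alike(sources))
```

Note that the old-style `"CC-BY-SA"` string lands in the non-share-alike list under this default, which is the risk the question is about.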

@NelsonMinar
Contributor

My gut reaction is to assume false, simply because share-alike is so rare in the world we're dealing in. Right now do we have any sources that require it? Six months ago I bet we were explicitly not including them at all.

Better yet would be to not assume anything, and either reject a source that doesn't specify or else have some lint tool that's reporting sources missing this info.
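
A lint check along those lines could be as small as the sketch below. The function name `lint_missing_share_alike` and the source names are hypothetical.

```python
def lint_missing_share_alike(sources):
    """Return the names of sources that do not state share-alike explicitly."""
    missing = []
    for name, source in sources.items():
        license_info = source.get("license")
        if not (isinstance(license_info, dict) and "share-alike" in license_info):
            missing.append(name)
    return sorted(missing)


sources = {
    "xx/explicit": {"license": {"url": "http://example.com", "share-alike": False}},
    "xx/url-only": {"license": "http://example.com/license.pdf"},
    "xx/no-license": {},
}
for name in lint_missing_share_alike(sources):
    print("missing share-alike flag:", name)
```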

@migurski
Member Author

migurski commented Nov 9, 2015

I’m thinking false as well. There are a few sources that appear to have SA. I’ll merge the machine changes as they are, and get to work on a set of OA changes that will formally document this and modify some sources.

@geobrando
Contributor

@migurski My concern was mainly fully deprecating the standalone attribution tag and converting existing sources to license.attribution = true, but in general I worry about using the presence of an attribution name tag to imply that attribution is required. Maybe it's just a matter of the documentation making this clear.

> Say more about the paths in the data. Are you thinking that we might point to some text file included in a zip archive?

Yeah. I think I got confused and thought there were plans to extract license text for dissemination using license.url. But that isn't really feasible or typically necessary. But I believe I have seen some sources with terms that state that a copy of the license should be included whenever the data is disseminated. Can't recall where I saw this though.

@migurski
Member Author

Hm good point. I’ve augmented some of the data sets with explicit attribution: false where possible, based on common licenses in openaddresses/openaddresses#1408.

Right now, the only place where the attribution requirement appears is the collection license file. Should I maybe have it default to false instead? Is this dangerous? It’s a softer license term than share-alike.

@migurski
Member Author

After a few conversations, realizing that SA is the right license requirement to split downloads on. Going to stop openaddresses/machine#236 and openaddresses/machine#248 and create new issues to reflect this.

@migurski
Member Author

With the completion of these issues, I’d like to close this ticket:

Some remaining things that can be done separately:

@iandees
Member

iandees commented Nov 17, 2015

I agree. Thanks for all your work on this, Mike!

@migurski
Member Author

💥


6 participants