Jonathan Gorman

Recently a newcomer to the Code4Lib mailing list, Cliff, posted a question asking for information about sharing code and about possible ethical considerations, since some of the shared code might be based on others' efforts.

I wrote a short response that focused mostly on the first part of his query, covering some thoughts about code sharing in the Code4Lib community, which I'm cleaning up and posting here.

There seems to have been a push over the past few years in Code4Lib to share more and more code, even for small projects. There are a lot of individuals scattered about the library world writing code to accomplish similar tasks, small and large. One common example is the glue between certain academic enterprise systems and our catalogs. This code, particularly in the past, got developed in little pockets without ever getting shared. Occasionally code sharing flourishes as a gated community surrounding a particular vendor, but I think these communities suffer from simply not being large enough. There seems to be a conscious push against the tendency toward isolated development, by releasing often and without regard to project size. GitHub in particular has made it easy and painless to share smaller chunks of code and offer patches to projects.

I have been bad about releasing and sharing source myself. This has been a hindrance as I find myself creating similar code in different internal projects instead of taking a step back and generalizing the code. If I did, not only could the code be shared among my projects, it could be shared with the community.

There is also a barrier in our lawyers. I have not put in the energy needed to get the attention of the office that decides whether or not to release code as open source. That office also does not make it easy or comfortable to ask questions. I suspect from what I've heard that one really needs to call or try to visit in person, something I tend to subconsciously avoid in my typical approaches to communication.

On a community level, it feels like Code4Lib is starting to see tension over releasing small projects and lots of code, and that tension manifests in a variety of ways:

There is the perception that projects have been abandoned or just don't have the level of support and community necessary to sustain development.
Large-scale adoption of code and projects by people who don't have the technical skills to contribute patches and who need help to use the project.
Competition among projects that share goals and need to compete with each other for community. I think choices are good, but choices introduce tension, and too many choices can lead to people choosing nothing. I don't think the library software world has hit that point, but I can see a future not too far away where this is more of a problem.

There have been a couple of articles over the years on these topics in the Code4Lib Journal that are worth reading. They describe the issues in more detail than the general approach I've taken here.

First, Dale Askey's argument for why we should just put stuff out there, and why we so often seem to fail to: COLUMN: We Love Open Source Software. No, You Can’t Have Our Code

On the other hand, see Terry Reese's excellent article in the latest issue, which argues that one should be prepared to support the code one publishes: Purposeful Development: Being Ready When Your Project Moves From ‘Hobby’ to Mission Critical

Finally, Michael Doran gave an excellent talk a few years back that really stuck in my head, on the very issue I've been reluctant to put more effort into, lawyers and code: The Intellectual Property Disclosure: Open Source in Academia. (PowerPoint slides)

In re-reading the original post, I realized I glossed over the ethical part, which is a shame. There are some fascinating issues concerning the ethical dimension of sharing code that was based on or inspired by other code. Of course, on one level there are the legal issues of copyright and derivative works, depending on exactly what "based on" entails.

However, I'm more interested in the learning and sharing aspect of code development. It is extremely useful for me to read code developed by others. As with critical reading of prose, you can learn a lot not just by trying to figure out what the code does, but by thinking about how the code you are reading communicates to the reader. Does it flow? Does it jump around? Are abstractions employed that make it easier to conceptualize? It's a fascinating topic and really deserves longer treatment in another post.

My thanks go out to Peter Murray (aka @DataG), who shared a link to my email, and to Becky Yoose (aka @yo_bj) for retweeting. In doing so they made me realize it might be worth revising the email and posting it as a blog post.

Sharing Code