Writing Jupyter notebooks: avoiding legal issues

Inge Halilovic
3 min read · May 8, 2019

I’ve edited and tested a number of Jupyter notebooks. My #1 tip for writing notebooks is to make sure you don’t violate copyright laws or terms of use in the way you use text, images, or data sets. You might not think copyright infringement is a big deal. After all, what’s the worst that could happen? You probably don’t have to worry about going to prison, but you could get into trouble with your company’s Legal department (prison might be preferable). So how do you know what’s okay and what’s not? I’ll give you some tips.

Disclaimer: (what? I’m writing about legal stuff — of course there’s a disclaimer!) I’m not a lawyer. If you follow my tips, I can’t guarantee that your notebook will be completely legal and follow all terms of use correctly. Be safe: ask your Legal department to review it!


Nobody forgets that books are copyrighted because when you open a physical book, you flip past the copyright page on the way to the table of contents. But online, it’s a lot easier to overlook that almost everything that you see is copyrighted or otherwise protected by law. Which means you can’t just use text, images, icons, fonts, or data sets that you find online without investigating whether your use is legal and correct. Although these guidelines are especially important if you’re publishing your notebook externally, they still apply, legally, if your notebook is internal to your company.

Reusing your company’s content

If you’re writing or publishing a notebook that belongs to your company (meaning your company owns the copyright to it), then you might be able to reuse other content that belongs to your company without quoting or citing it. But check with your Legal department!

Reusing third-party text

Third-party means not you or your company. It can be very tempting to copy text from third-party sources. For example, doesn’t it make sense that you should copy the sentence from the original website that perfectly describes the algorithm you’re using? Why rewrite perfection? Because that text is almost certainly copyrighted, which means you can’t legally copy it without permission. And yes, passing it off as your own is plagiarism, too.

How can you tell if text on the web is covered by copyright? First, look for a copyright notice on the webpage. If you don’t see one, check for a link to terms, terms of use, or legal. Many websites have a link to their terms at the bottom of their pages or somewhere on an About page. Those terms almost always include a statement that prohibits copying content.
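If you check a lot of pages, you could automate the first pass of that hunt. Here is a minimal sketch, assuming you already have the page's HTML as a string. The keyword list, the `TermsLinkFinder` class, the `find_terms_links` function, and the sample footer are all made up for illustration; a real page might word its links differently, and you still need to read the terms yourself.

```python
from html.parser import HTMLParser

# Keywords are an assumption; adjust them for the sites you check.
TERMS_KEYWORDS = ("terms", "legal", "copyright", "license")


class TermsLinkFinder(HTMLParser):
    """Collects (link text, href) pairs whose text or URL mentions a keyword."""

    def __init__(self):
        super().__init__()
        self.matches = []
        self._href = None  # href of the <a> tag currently open, if any
        self._text = []    # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            haystack = (text + " " + self._href).lower()
            if any(word in haystack for word in TERMS_KEYWORDS):
                self.matches.append((text, self._href))
            self._href = None


def find_terms_links(html):
    """Return links in the HTML that look like they point to legal terms."""
    finder = TermsLinkFinder()
    finder.feed(html)
    return finder.matches


# Made-up page footer used only for illustration.
sample_html = """
<footer>
  <a href="/about">About</a>
  <a href="/legal/terms-of-use">Terms of use</a>
  <a href="/contact">Contact</a>
</footer>
"""

print(find_terms_links(sample_html))
# prints [('Terms of use', '/legal/terms-of-use')]
```

An empty result doesn't mean the content is free to use. It only means this simple keyword match didn't find a link.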

Occasionally, the terms allow free use of content as long as you quote it. You might also need to include a citation. For example, Wikipedia allows free reuse of its text under a Creative Commons Attribution-ShareAlike license, which requires attribution. See their terms about content reuse: https://meta.wikimedia.org/wiki/Terms_of_use#7._Licensing_of_Content

What if you get text from someone else and you’re not sure if that person plagiarized? Try pasting the text in a browser search field and see if you get a match. If you get a match, you should rewrite that text. But no match is not a guarantee that the text is original. If some text seems suspiciously better written than the surrounding text, you should question it.
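If you already suspect a particular source, you can also compare the two texts locally. The sketch below is one possible approach, not a plagiarism detector: it uses Python's standard `difflib` module to find the longest run of consecutive words that two texts share. The `longest_shared_run` function and both sample sentences are invented for illustration.

```python
from difflib import SequenceMatcher


def longest_shared_run(draft, source):
    """Return the longest run of consecutive words that appears in both texts."""
    draft_words = draft.lower().split()
    source_words = source.lower().split()
    # autojunk=False keeps common words from being ignored in longer texts.
    matcher = SequenceMatcher(None, draft_words, source_words, autojunk=False)
    match = matcher.find_longest_match(0, len(draft_words), 0, len(source_words))
    return " ".join(draft_words[match.a:match.a + match.size])


# Both sentences are made up for this example.
draft = ("This notebook uses gradient boosting which builds an ensemble "
         "of weak learners in sequence")
source = ("Gradient boosting builds an ensemble of weak learners in "
          "sequence to reduce error")

print(longest_shared_run(draft, source))
# prints: builds an ensemble of weak learners in sequence
```

A long shared run is a hint to rewrite or quote the passage. A short one proves nothing either way, for the same reason that no search match is not a guarantee.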

Reusing third-party images

Even images that are free to use come with terms of use. Check the website’s terms to see if you can use the image and whether you need to include a citation.

Using open data sets

You might assume that you can use open data sets any way you want, but they all have terms of use. For example, one of the biggest collections of open data sets, www.data.gov, has an Access & Use section for each data set. The terms of use might require you to include a citation. For example, the UCI Machine Learning Repository requires citations: https://archive.ics.uci.edu/ml/citation_policy.html.
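One way to keep track of this is to record each data set's terms next to the code that loads it, then generate a sources cell from those records. The following is a minimal sketch under my own assumptions: the `SourceRecord` fields, the `attribution_markdown` function, and the example.com entries are hypothetical, and the citation text is a placeholder. Use the wording that each provider's terms actually require.

```python
from dataclasses import dataclass


@dataclass
class SourceRecord:
    """One third-party item used in the notebook (all fields are illustrative)."""
    name: str
    url: str
    terms_url: str
    citation_required: bool
    citation: str = ""


def attribution_markdown(records):
    """Return Markdown for a data sources cell and flag missing citations."""
    lines = ["## Data sources and terms of use", ""]
    for rec in records:
        line = f"- [{rec.name}]({rec.url}) ([terms]({rec.terms_url}))"
        if rec.citation_required:
            # Make a missing citation hard to overlook in the rendered cell.
            line += f": {rec.citation}" if rec.citation else ": CITATION MISSING"
        lines.append(line)
    return "\n".join(lines)


# Placeholder records; the names, URLs, and citation are made up.
records = [
    SourceRecord(
        name="Example data set A",
        url="https://example.com/data/a",
        terms_url="https://example.com/terms",
        citation_required=True,
        citation="Example Org. Example data set A.",
    ),
    SourceRecord(
        name="Example data set B",
        url="https://example.com/data/b",
        terms_url="https://example.com/terms",
        citation_required=True,
    ),
]

print(attribution_markdown(records))
```

In a notebook, you could pass the returned string to `IPython.display.Markdown` to render it as a formatted cell. The second record has no citation on purpose, so the output flags it.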

Linking to websites

Some websites have linking policies — who knew? For example, see https://www.data.gov/privacy-policy#linking and https://www.ibm.com/legal (scroll down to the “Linking to this site” section).

The bottom line

You can’t use ANY content from a third party (that is, content that you or your company didn’t create or don’t have a copyright for) unless you follow the terms of use or get permission from the copyright holder.

Additional resources

https://www.copyright.gov/ (This website is actually more fun than you might think. Read about how you can copyright your Elvis sighting.)

Markdown for Jupyter notebooks cheatsheet

Inge Halilovic

I’m a content strategist at IBM. I architect the documentation for watsonx.ai and Cloud Pak for Data as a Service.