Creating citable Code

From HPC Wiki

Citability and reproducibility of scientific results have always been an integral part of research. More traditional aspects of research, including theoretical verification and experimental validation, have a long history of peer-reviewed publications, which enable referencing and reproduction by other parties. However, this often does not apply to self-written code, which is increasingly common in every part of research, from small evaluation scripts to whole software packages. This page therefore gives a short overview of how to make your code citable and how to contribute to its reproducibility.


Benefits of citable Code

An unavoidable part of creating citable code is making it publicly available. The first step should therefore always be to find out whether your chair or institute agrees to an open publication of the code. Publication does not necessarily mean free use by other parties, so you should look into the licenses that could be suitable for your code. While making your code available opens it to criticism by your peers, it also contributes to the advancement of science in several ways [1][2][3]:

  • transparency on how published results were achieved
  • significant improvement of reproducibility
  • fostering of collaborations among researchers across institutions
  • sustainability of the software (through prolonged support / updates)
  • peer review and validation
  • proper attribution and credit for the code development
  • depending on the license, permission for other researchers to use the code for their own research, including modification and redistribution

Making Code citable

As mentioned above, the first step is to make your software publicly available on a platform of your choice. Platforms with built-in revision control, such as GitLab or GitHub, are preferable, since revision control also benefits the overall development of your code. The publication should come with a clear license (not just "open access" or "open source"), preferably as a license file, together with a description of the preferred way of referencing / citing the code. Creative Commons licenses should generally be avoided for code; if the code is intended to be free to use, the GNU GPL can be a good choice. Once the code has been published with an appropriate license, its citability can be further improved by adding metadata. These include [4][5]:

  • code version
  • unique persistent identifier such as a DOI (digital object identifier)
  • code description
  • programming language and language standard used
  • code requirements and dependencies (e.g., libraries or hardware)
  • small data set or example to confirm correct functionality
  • related publications
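
The metadata items listed above can be kept in a small machine-readable record next to the code. The following is a minimal sketch in Python, not a prescribed format: the class `CodeMetadata`, its field names, and the helper `missing_fields` are hypothetical and do not follow any particular metadata standard, and all values are placeholders.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class CodeMetadata:
    """Hypothetical container for the metadata items listed above."""
    version: str = ""
    doi: str = ""
    description: str = ""
    language: str = ""
    dependencies: list = field(default_factory=list)
    example_data: str = ""
    related_publications: list = field(default_factory=list)


def missing_fields(meta: CodeMetadata) -> list:
    """Return the names of all fields that are still empty."""
    return [name for name, value in asdict(meta).items() if not value]


# Placeholder values for illustration only.
meta = CodeMetadata(
    version="1.2.0",
    description="Placeholder description of an evaluation script",
    language="Python 3.10",
    dependencies=["numpy"],
)

# Serialize to JSON so the record can be stored alongside the code.
metadata_json = json.dumps(asdict(meta), indent=2)
print(metadata_json)
print("Still missing:", missing_fields(meta))
```

A check like `missing_fields` could be run before each release to see which items (here the DOI, the example data set, and the related publications) have not been filled in yet.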

One way to obtain a DOI for your code is the GitHub integration of Zenodo, which requires your code to be published on GitHub. Please keep in mind that such a DOI is only valid for one particular version of your code.
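
Since a DOI belongs to one particular version, a citation hint in the repository should state the version and the DOI together. The sketch below assumes the Citation File Format (a `CITATION.cff` file in the repository root), which this page does not mention; it is one possible way to describe the preferred citation, not the only one. The function `render_citation_cff` is hypothetical, and the title, author, date, and DOI are placeholders.

```python
import json


def render_citation_cff(title, version, doi, date_released, authors):
    """Render a minimal CITATION.cff file as a string.

    `authors` is a list of (family_name, given_name) tuples.
    json.dumps handles the quoting, since JSON strings are also
    valid double-quoted YAML strings.
    """
    def q(text):
        return json.dumps(text, ensure_ascii=False)

    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f"title: {q(title)}",
        f"version: {q(version)}",
        f"doi: {q(doi)}",
        f"date-released: {q(date_released)}",
        "authors:",
    ]
    for family, given in authors:
        lines.append(f"  - family-names: {q(family)}")
        lines.append(f"    given-names: {q(given)}")
    return "\n".join(lines) + "\n"


# Placeholder values; the DOI below is NOT a real identifier.
cff_text = render_citation_cff(
    title="Example Evaluation Scripts",
    version="1.2.0",
    doi="10.5281/zenodo.0000000",
    date_released="2024-01-15",
    authors=[("Doe", "Jane")],
)
print(cff_text)
```

With this approach, both the `version` and the `doi` entries would need to be updated for every new release so that the file keeps pointing at the version that was actually used.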

Links and more Information

References

  1. https://www.researchsoft.org/guidelines/
  2. https://datascience.nih.gov/tools-and-analytics/best-practices-for-sharing-research-software-faq
  3. D. S. Katz et al. “Recognizing the value of software: a software citation guide [version 2; peer review: 2 approved]”. In: F1000Research 9.1257 (2021). doi: 10.12688/f1000research.26932.2.
  4. Greg Wilson et al. “Good enough practices in scientific computing”. In: PLOS Computational Biology 13 (June 2017), pp. 1–20. doi: 10.1371/journal.pcbi.1005510.
  5. Neil P. Chue Hong et al. “Software Citation Checklist for Developers”. Version 0.9.0. Oct. 2019. doi: 10.5281/zenodo.3482769.