Computational Notebooks and Reproducible Biological Research

Last Updated May 28, 2026

Computational notebooks have become one of the central instruments of reproducible biological research because they connect data, code, explanation, visualization, and scientific interpretation in a single auditable research object. In the life sciences, where research workflows increasingly combine genomics, transcriptomics, microscopy, ecology, epidemiology, environmental monitoring, clinical measurements, statistical modeling, and machine learning, notebooks provide a practical way to show not only the result of an analysis, but the path by which that result was produced.

Yet notebooks are not automatically reproducible. A notebook can be transparent, executable, annotated, versioned, and reusable. It can also be fragile, nonlinear, undocumented, manually edited, dependent on hidden files, polluted by execution-order errors, or impossible to rerun outside the original researcher’s machine. The scientific value of a notebook therefore depends on how it is designed, documented, tested, shared, and connected to data provenance.

Computational notebooks help biological researchers connect data, code, provenance, validation, visualization, and interpretation into reproducible scientific workflows.

This article introduces computational notebooks as research infrastructure for biology. It explains how notebooks support reproducible analysis, where they fail, how they should be organized, and how they connect to FAIR data, literate programming, workflow documentation, version control, computational environments, and responsible biological interpretation.

The article is written for biologists, computational biologists, bioinformaticians, ecologists, marine biologists, biodiversity researchers, medical and environmental-health scientists, data engineers, scientific software developers, and research groups building reproducible life-science workflows. It emphasizes notebooks not as informal scratchpads, but as structured scientific records that can support transparency, review, reuse, teaching, and computational rigor.

Why computational notebooks matter in biology

Computational notebooks matter because biological research increasingly depends on workflows that are too complex to be communicated by prose alone. A published figure may summarize differential expression, species distribution, microscopy segmentation, pathway enrichment, microbial diversity, survival analysis, ecological forecasting, protein prediction, or machine-learning classification. But the figure does not reveal every data filter, transformation, parameter choice, normalization step, random seed, software version, sample exclusion, metadata join, model decision, or validation check that produced it.

A computational notebook can make these steps visible. It can combine narrative explanation, executable code, intermediate outputs, diagnostic plots, tables, assumptions, and interpretation. In this sense, the notebook is both a research instrument and a communication medium. It allows a scientist to say: this is what was done, this is why it was done, this is what happened at each step, and this is how the conclusion should be interpreted.

This is especially important in biology because biological data are often layered and conditional. Genomic data require reference genomes, annotation versions, quality filters, and alignment decisions. Microscopy data require segmentation thresholds, illumination correction, and image metadata. Ecological data require site identifiers, sampling effort, detection probability, spatial coordinates, and temporal context. Clinical and environmental-health data require privacy protection, cohort definitions, missingness handling, and governance constraints.

A notebook can help preserve this context, but only if it is written as a reproducible scientific document rather than a private workspace.

Notebooks as literate scientific workflows

Computational notebooks belong to the broader tradition of literate programming and dynamic documents. The basic idea is that code and explanation should appear together so that the reasoning behind a computation is visible. In biological research, this matters because analysis is not merely execution. It is interpretation.

A well-designed biological notebook usually contains several layers:

  • Scientific question: the biological or ecological problem being investigated.
  • Data description: source, structure, metadata, identifiers, units, and limitations.
  • Preprocessing: cleaning, filtering, normalization, joins, transformations, and exclusions.
  • Analysis: statistical models, simulations, machine-learning workflows, visualizations, or biological summaries.
  • Diagnostics: quality checks, missingness checks, validation metrics, residuals, sensitivity tests, and sanity checks.
  • Interpretation: biological meaning, uncertainty, constraints, and limitations.
  • Provenance: software versions, environment details, data hashes, random seeds, and output artifacts.

This structure turns a notebook into more than a container for code. It becomes a research narrative with executable evidence. The best notebooks can be read by humans, rerun by machines, inspected by collaborators, adapted by future researchers, and connected to publications, repositories, teaching materials, and computational pipelines.

The notebook should not merely show a final result. It should preserve the chain of reasoning.

Biological research use cases

Computational notebooks are useful across many areas of biology because they can preserve both analytical steps and interpretive context. Their value is especially strong where data processing choices influence biological conclusions.

Genomics and bioinformatics

In genomics, notebooks can document quality-control summaries, read-depth distributions, variant filtering, gene annotation, expression normalization, differential expression, pathway enrichment, sequence-feature engineering, and exploratory visualization. They are especially useful for communicating why specific thresholds or annotations were selected.

Single-cell and spatial biology

Single-cell and spatial workflows often involve complex preprocessing: filtering cells, removing low-quality observations, normalizing counts, reducing dimensionality, clustering, assigning cell types, integrating batches, and visualizing spatial relationships. Notebooks help preserve this sequence and make interpretation more transparent.

Microscopy and image analysis

Image-analysis notebooks can document segmentation parameters, feature extraction, object counting, fluorescence intensity measurement, morphological summaries, quality-control thumbnails, and model outputs. They are useful when biological interpretation depends heavily on image-processing choices.

Ecology and biodiversity science

Ecological notebooks can document species-observation records, sampling effort, site metadata, detection/non-detection matrices, diversity indices, occupancy models, remote-sensing covariates, climate overlays, and reproducible maps. They are particularly important when results depend on spatial and temporal context.

Marine, freshwater, and environmental biology

Notebooks can integrate water-quality data, sensor streams, satellite observations, plankton imaging, acoustic monitoring, species records, and environmental covariates. They help researchers track how raw observations become interpretable indicators of ecosystem condition.

Epidemiology and environmental health

Notebooks can document case definitions, exposure windows, cohort filtering, surveillance data, model assumptions, sensitivity analyses, and uncertainty intervals. Because these analyses may influence public understanding or policy, provenance and transparency are especially important.

Machine learning in biology

Machine-learning notebooks can document train-test splits, feature engineering, model training, validation, calibration, error analysis, subgroup analysis, and interpretation. They are useful for exploration, but high-stakes models usually require additional pipeline engineering, testing, audit logs, and external validation.

Reproducibility versus transparency

Transparency and reproducibility are related but not identical. A transparent notebook explains what was done. A reproducible notebook can be rerun to produce the same or equivalent results. A notebook may be easy to read but impossible to execute. Another may execute successfully but fail to explain its scientific assumptions.

For biological research, both are needed.

A transparent notebook should explain the biological question, data source, sample structure, preprocessing choices, model assumptions, and interpretation. It should not leave readers guessing why a threshold was chosen, why samples were excluded, why a transformation was applied, or why a result is biologically meaningful.

A reproducible notebook should run from a clean state. It should not depend on hidden variables, manual steps, local absolute paths, private files, unrecorded package versions, or cells that must be executed in an undocumented order. Ideally, it should state its computational environment and create outputs from source data with minimal manual intervention.

Reproducibility also has degrees. A notebook may be computationally reproducible if it can regenerate outputs from the same data and software. It may be analytically reproducible if another researcher can understand and rerun the analysis. It may be scientifically reproducible if independent data support the same biological conclusion. Notebooks are powerful, but they mainly support computational and analytical reproducibility. They do not replace independent biological replication.

Data provenance and metadata

Data provenance records where data came from, how they were processed, and how they changed. Metadata describe the data: sample identifiers, species, tissue type, site, time, instrument, units, measurement method, sequencing platform, imaging protocol, environmental conditions, batch, operator, and other context.

In biological research, provenance and metadata are not administrative extras. They are part of the scientific evidence. A gene-expression matrix without sample metadata is difficult to interpret. A species-observation table without sampling effort can be misleading. A microscopy dataset without acquisition settings may be impossible to compare. A clinical dataset without cohort definitions and missingness documentation may be scientifically fragile.

A reproducible notebook should therefore begin by describing data structure and provenance. It should make clear which files are raw, which are processed, which are generated outputs, and which are external references. It should preserve identifiers and avoid silent transformations that make records difficult to trace.

A useful notebook often includes:

  • file paths relative to the project root;
  • data dictionaries;
  • units and measurement definitions;
  • source descriptions;
  • checksums or file hashes;
  • software versions;
  • date of data access or extraction;
  • license or reuse conditions;
  • privacy or governance constraints when relevant.

Provenance makes biological computation auditable.
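
One way to keep records traceable is to compare identifier sets before joining measurements to sample metadata, so that unmatched records are reported rather than silently dropped. The following is a minimal sketch using only the standard library; the function name `unmatched_identifiers` and the table contents are hypothetical.

```python
"""Compare identifiers between two tables before joining them."""


def unmatched_identifiers(measurements, metadata, key="sample_id"):
    """Return identifiers present in one table but missing from the other."""
    measured = {row[key] for row in measurements}
    described = {row[key] for row in metadata}
    return {
        "missing_metadata": sorted(measured - described),
        "missing_measurements": sorted(described - measured),
    }


# Hypothetical synthetic tables, used only for illustration.
measurements = [
    {"sample_id": "BIO001", "response_value": 1.2},
    {"sample_id": "BIO002", "response_value": 1.4},
    {"sample_id": "BIO005", "response_value": 2.0},
]
metadata = [
    {"sample_id": "BIO001", "treatment": "control"},
    {"sample_id": "BIO002", "treatment": "control"},
    {"sample_id": "BIO003", "treatment": "exposed"},
]

report = unmatched_identifiers(measurements, metadata)
print(report)
```

Reporting both directions of the mismatch makes it clear whether a measurement lacks metadata or a described sample was never measured.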

Execution order, environments, and hidden state

One of the greatest risks in computational notebooks is hidden state. A notebook may appear correct because a variable exists in memory from an earlier execution, even though the cell defining it has been deleted or moved. A plot may reflect an old version of a dataset. A table may come from a cell executed out of order. A result may depend on a random seed that was not recorded.

Execution-order problems are common because notebooks support interactive exploration. This flexibility is valuable during discovery, but dangerous when notebooks are presented as scientific records.

Good practice includes:

  • restarting the kernel and running all cells before sharing;
  • keeping cells in logical order;
  • removing abandoned exploratory code;
  • recording random seeds;
  • using relative paths rather than local absolute paths;
  • saving generated outputs to known locations;
  • recording package versions;
  • separating reusable functions into scripts or modules;
  • testing that the notebook runs from a clean checkout.
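
A restart-and-run-all leaves code cells with execution counts 1, 2, 3, and so on, in document order. The sketch below checks a notebook for that pattern. It assumes the nbformat 4 JSON layout (a top-level `cells` list whose code cells carry `execution_count`); the function name and the example notebook are hypothetical.

```python
"""Check whether code cells were executed once, top to bottom."""


def execution_order_issues(notebook):
    """Return messages describing execution-order problems.

    Assumes the nbformat 4 layout: a top-level "cells" list whose code
    cells carry an "execution_count" field.
    """
    issues = []
    expected = 1
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        # "source" may be a string or a list of strings; join handles both.
        if not "".join(cell.get("source", [])).strip():
            continue  # assumption: empty code cells keep a null count
        count = cell.get("execution_count")
        if count is None:
            issues.append(f"cell {index}: never executed")
        elif count != expected:
            issues.append(f"cell {index}: count {count}, expected {expected}")
        expected += 1
    return issues


# Hypothetical notebook in which the last two code cells ran in reverse order.
notebook = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Load data"]},
        {"cell_type": "code", "source": ["x = 1"], "execution_count": 1},
        {"cell_type": "code", "source": ["y = x + 1"], "execution_count": 3},
        {"cell_type": "code", "source": ["x = 10"], "execution_count": 2},
    ]
}

for issue in execution_order_issues(notebook):
    print(issue)
```

A real notebook file could be loaded into the same structure with `json.load`. An empty result does not prove correctness, but a non-empty one is a strong hint that hidden state may be involved.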

Computational environments are equally important. A notebook that ran under one version of Python, R, Julia, Bioconductor, NumPy, pandas, scikit-learn, tidyverse, Seurat, DESeq2, or a geospatial package may fail or change results under another. Environment files, container recipes, package lock files, or documented installation steps help preserve reproducibility.

A notebook is only as reproducible as the environment that supports it.
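
As a lightweight complement to environment files, a notebook can record the interpreter, platform, and package versions it actually ran under. This is a sketch using the standard-library `importlib.metadata` module; the package list passed in is an assumption and should match whatever the notebook imports.

```python
"""Record interpreter, platform, and package versions for a notebook run."""

import platform
from importlib import metadata


def environment_record(packages):
    """Return a dictionary describing the current computational environment."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


# The package list is an assumption; list whatever the notebook imports.
record = environment_record(["numpy", "pandas", "scikit-learn"])
for key, value in record.items():
    print(key, value)
```

Writing this record next to the generated outputs ties each figure or table to the environment that produced it.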

Notebooks, pipelines, and production workflows

Notebooks are excellent for exploration, explanation, teaching, and transparent analysis. They are less suitable as the sole container for large production workflows. Biological research often requires repeated processing of many samples, strict quality control, automated reports, scheduled runs, high-performance computing, containerized execution, or multi-step pipelines.

For these cases, notebooks should be connected to scripts, modules, and workflow systems. A good pattern is to use notebooks for explanation and review, while placing reusable logic in versioned code files. The notebook can call tested functions, show intermediate results, produce figures, and document interpretation. The pipeline can handle scale, automation, and formal execution.
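
The pattern of keeping reusable logic in versioned code files might look like the sketch below: a small, pure function with its own unit test, which a notebook would import rather than redefine. The function name `filter_by_depth`, the field names, and the threshold are all hypothetical.

```python
"""Reusable filtering logic that a notebook can import instead of redefining."""


def filter_by_depth(samples, min_depth):
    """Keep samples whose sequencing depth meets the threshold.

    Returns the retained samples and the identifiers that were excluded,
    so the exclusion is visible in the notebook rather than silent.
    """
    kept = [s for s in samples if s["depth"] >= min_depth]
    excluded = [s["sample_id"] for s in samples if s["depth"] < min_depth]
    return kept, excluded


def test_filter_by_depth():
    """A small unit test that a test runner such as pytest could collect."""
    samples = [
        {"sample_id": "BIO001", "depth": 12_000_000},
        {"sample_id": "BIO002", "depth": 800_000},
    ]
    kept, excluded = filter_by_depth(samples, min_depth=1_000_000)
    assert [s["sample_id"] for s in kept] == ["BIO001"]
    assert excluded == ["BIO002"]


test_filter_by_depth()
```

Because the function returns the excluded identifiers, the notebook can display them and explain the exclusion, while the test guards the logic itself.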

For example, a genomics project might use a workflow manager for alignment, quantification, and quality control, then use notebooks for exploratory analysis, differential expression summaries, pathway interpretation, and publication figures. An ecological monitoring project might use scripts to ingest sensor data and notebooks to review trends, missingness, site-level summaries, and model diagnostics. A machine-learning project might use notebooks for exploratory model comparison, while training and evaluation are executed by tested scripts.

Notebooks should not be forced to do everything. Their strongest role is to connect computation and scientific explanation.

Version control and collaboration

Version control is essential for reproducible notebooks. It allows researchers to track changes, recover prior versions, review edits, and connect notebook outputs to specific states of code and data.

Notebook files can be difficult to review because they are often stored as JSON and may include large output cells, embedded images, execution counts, and metadata changes. Good practice includes clearing unnecessary outputs before committing, using clean naming conventions, storing figures and tables as separate artifacts when appropriate, and keeping notebooks focused.
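
Clearing outputs before committing can be sketched directly on the notebook's JSON structure. The example assumes the nbformat 4 layout, in which code cells have `outputs` and `execution_count` fields; the function name is hypothetical, and dedicated tools such as nbstripout automate the same idea.

```python
"""Remove outputs and execution counts from a notebook before committing."""

import copy
import json


def strip_outputs(notebook):
    """Return a copy of the notebook with code-cell outputs cleared."""
    cleaned = copy.deepcopy(notebook)
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# Hypothetical notebook with one executed code cell.
notebook = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Summary"]},
        {
            "cell_type": "code",
            "source": ["print(1 + 1)"],
            "execution_count": 7,
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["2\n"]}],
        },
    ]
}

cleaned = strip_outputs(notebook)
print(json.dumps(cleaned["cells"][1], indent=2))
```

Working on a copy leaves the in-memory notebook untouched, so the rendered version with outputs can still be exported as a separate artifact.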

Collaborative notebook projects benefit from a clear repository structure:

  • data/raw/ for immutable source data or documented placeholders;
  • data/processed/ for derived data;
  • notebooks/ for explanatory analysis;
  • python/, r/, or src/ for reusable code;
  • outputs/ for figures, tables, and reports;
  • docs/ for methodology and reproducibility notes;
  • environment.yml, requirements.txt, renv.lock, or container files for computational environments.

A repository should also include a README that explains what the project does, how to run it, what data are included, what data are excluded, and what limitations apply.
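
A small check can confirm that a checkout follows the agreed layout. The sketch below assumes `src/` for reusable code and a file named `README.md`; both are illustrative choices, and the directory list should be adapted to the project.

```python
"""Report which expected project directories or files are missing."""

import tempfile
from pathlib import Path

# Assumed layout; adapt the names to the project's own conventions.
EXPECTED_DIRECTORIES = [
    "data/raw", "data/processed", "notebooks", "src", "outputs", "docs",
]


def missing_project_paths(project_root, expected=EXPECTED_DIRECTORIES, readme="README.md"):
    """Return expected directories (and the README) that do not exist."""
    root = Path(project_root)
    missing = [name for name in expected if not (root / name).is_dir()]
    if not (root / readme).is_file():
        missing.append(readme)
    return missing


# Demonstrate on a temporary, partially populated project.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data" / "raw").mkdir(parents=True)
    (root / "notebooks").mkdir()
    (root / "README.md").write_text("Project description\n")
    missing = missing_project_paths(root)

print(missing)
```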

Quality control, validation, and review

A notebook should contain explicit quality-control checks. In biological research, these may include sample counts, missingness summaries, outlier checks, batch distributions, sequencing depth summaries, image-quality metrics, site-level coverage, class balance, taxonomic completeness, coordinate checks, unit checks, or validation against expected ranges.

Quality control should happen early and repeatedly. A notebook that jumps directly from loading data to final conclusions is difficult to trust. Readers need to see whether the input data behave plausibly.
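
An early plausibility check can be as simple as comparing each value with an expected range and flagging missing values. In this sketch the readings are synthetic and the limits are illustrative assumptions, not regulatory or instrument standards.

```python
"""Flag values that are missing or fall outside expected ranges."""


def out_of_range(records, limits):
    """Return (sample_id, field, value) for missing or implausible values."""
    problems = []
    for row in records:
        for field, (low, high) in limits.items():
            value = row.get(field)
            if value is None or not (low <= value <= high):
                problems.append((row["sample_id"], field, value))
    return problems


# Hypothetical water-quality readings; limits are illustrative only.
readings = [
    {"sample_id": "W01", "ph": 7.4, "temperature_c": 12.5},
    {"sample_id": "W02", "ph": 74.0, "temperature_c": 13.1},  # likely decimal error
    {"sample_id": "W03", "ph": 6.9, "temperature_c": None},   # missing value
]
limits = {"ph": (0.0, 14.0), "temperature_c": (-2.0, 40.0)}

problems = out_of_range(readings, limits)
for problem in problems:
    print(problem)
```

Showing the flagged records in the notebook, rather than silently dropping them, lets readers judge whether the inputs behave plausibly.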

Validation depends on the research domain. A machine-learning notebook should show training and validation separation, external validation when possible, calibration checks, and error analysis. A statistical notebook should show assumptions, residuals, uncertainty intervals, and sensitivity analysis. An ecological notebook should show sampling effort, spatial coverage, temporal coverage, and detection limitations. A genomics notebook should show quality metrics, normalization diagnostics, batch effects, and annotation versions.
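
For the machine-learning case, one concrete check on training and validation separation is that no biological unit (for example a subject, site, or plate) contributes rows to both sets. The sketch below assumes a grouping column named `subject_id`; the name and the example rows are hypothetical.

```python
"""Detect groups that appear in both the training and the test set."""


def shared_groups(train, test, group_key="subject_id"):
    """Return group identifiers present in both splits (possible leakage)."""
    train_groups = {row[group_key] for row in train}
    test_groups = {row[group_key] for row in test}
    return sorted(train_groups & test_groups)


# Hypothetical rows: subject S2 has measurements in both splits.
train = [
    {"subject_id": "S1", "label": 0},
    {"subject_id": "S2", "label": 1},
]
test = [
    {"subject_id": "S2", "label": 1},
    {"subject_id": "S3", "label": 0},
]

leaked = shared_groups(train, test)
print(leaked)
```

An empty list is the desired outcome; any shared group suggests that reported validation performance may be optimistic.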

Peer review of notebooks should include both scientific and computational review. Reviewers should ask: Can the notebook run? Are data available or clearly described? Are assumptions visible? Are outputs generated from code? Are key decisions justified? Are conclusions limited to what the analysis supports?

A reproducible notebook invites inspection.

FAIR data and computational reuse

The FAIR principles (findability, accessibility, interoperability, and reusability) apply not only to biological data, but also to code, workflows, computational environments, and notebooks. A notebook is more reusable when it has a clear title, identifiers, metadata, documented dependencies, stable data links, licensing information, and explicit instructions.

Findability requires that notebooks and associated data be locatable. Accessibility requires that users know how to obtain the notebook, data, and dependencies. Interoperability requires formats and metadata that can be used across tools. Reusability requires documentation, provenance, licenses, and context sufficient for future researchers to understand the work.

For biology, reuse also requires domain clarity. A notebook should not merely say “load data.” It should say what the data represent: species observations, RNA-seq counts, microscopy measurements, clinical lab values, environmental sensor readings, pathway annotations, or simulated examples. It should describe units, identifiers, biological scope, limitations, and ethical constraints.

Computational reuse is strongest when notebooks are paired with repositories, environment files, test data, scripts, documentation, and clear assumptions.
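
A reuse checklist can be made machine-checkable by storing descriptive metadata alongside the notebook and testing it for gaps. The field names below are assumptions loosely based on the elements listed in this section, not a formal FAIR schema, and the example record is synthetic.

```python
"""Check a notebook's descriptive metadata for missing reuse information."""

# Assumed field names; a real project might follow a community schema instead.
REQUIRED_FIELDS = [
    "title", "identifier", "license", "data_sources", "dependencies", "instructions",
]


def missing_reuse_fields(record, required=REQUIRED_FIELDS):
    """Return required fields that are absent or empty."""
    return [field for field in required if not record.get(field)]


# Hypothetical metadata record with two gaps.
record = {
    "title": "Synthetic zebrafish exposure summary",
    "identifier": "",  # no persistent identifier assigned yet
    "license": "",
    "data_sources": ["data/raw/samples.csv"],
    "dependencies": "environment.yml",
    "instructions": "README.md",
}

missing = missing_reuse_fields(record)
print(missing)
```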

Mathematical lens: reproducibility and workflow integrity

Several simple mathematical ideas help clarify reproducible notebook practice. These expressions do not replace scientific judgment, but they help make workflow dependencies, provenance, completeness, execution failure, and output drift easier to reason about.

Analysis as a function

\[
Y = f(D, \theta, E)
\]

Interpretation: Analytical output \(Y\) is a function of dataset \(D\), analysis parameters \(\theta\), and computational environment \(E\). Reproducibility requires documenting all three.

Reproducibility condition

\[
Y_{\text{rerun}} = f(D, \theta, E) \approx Y_{\text{reference}}
\]

Interpretation: A reproducible workflow should generate a rerun output \(Y_{\text{rerun}}\) that is the same as, or acceptably equivalent to, the reference output \(Y_{\text{reference}}\) when it is rerun under the same documented dataset, parameters, and environment. Approximate equivalence may be necessary when stochastic algorithms, floating-point differences, or external dependencies are involved.
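
The role of a recorded random seed can be illustrated with a small stochastic analysis. In this sketch the seed is treated as part of the parameters \(\theta\); the bootstrap procedure, the function name, and the tolerance are illustrative assumptions, and the values are synthetic.

```python
"""Show exact and approximate reproducibility for a seeded stochastic analysis."""

import math
import random


def bootstrap_mean(values, n_resamples, seed):
    """Return the average of bootstrap-resample means for a given seed."""
    rng = random.Random(seed)  # local generator; global random state is untouched
    means = []
    for _ in range(n_resamples):
        resample = [rng.choice(values) for _ in values]
        means.append(sum(resample) / len(resample))
    return sum(means) / len(means)


values = [1.2, 1.4, 2.1, 2.4]  # synthetic response values

reference = bootstrap_mean(values, n_resamples=200, seed=42)
rerun = bootstrap_mean(values, n_resamples=200, seed=42)
other_seed = bootstrap_mean(values, n_resamples=200, seed=7)

# Same data, parameters, environment, and seed: exact agreement.
print(rerun == reference)

# Different seed: agreement only within a documented tolerance.
print(math.isclose(other_seed, reference, rel_tol=0.1))
```

Recording the seed supports exact reruns, while the tolerance documents how much variation is considered acceptable when the seed or platform differs.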

Provenance chain

\[
D_{\text{raw}} \rightarrow D_{\text{clean}} \rightarrow M \rightarrow Y
\]

Interpretation: Raw data are transformed into clean data, analyzed by a method or model \(M\), and converted into outputs \(Y\). A notebook should make this chain visible and auditable.

Checksum identity

\[
h(D_1)=h(D_2)
\]

Interpretation: If a cryptographic hash \(h\) gives the same digest for two files, the files are identical with overwhelming probability; if the digests differ, the files differ. Checksums help researchers confirm that data files have not changed silently.
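
A short sketch of the checksum comparison, using SHA-256 from the standard library on in-memory bytes. The CSV contents are synthetic, and the single changed digit stands in for a silent edit to a data file.

```python
"""Compare SHA-256 digests to detect a silent change in file contents."""

import hashlib


def sha256_bytes(data: bytes) -> str:
    """Return the SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


original = b"sample_id,response_value\nBIO001,1.2\n"
copy_of_original = bytes(original)
edited = b"sample_id,response_value\nBIO001,1.3\n"  # one digit changed

# Identical content gives identical digests.
print(sha256_bytes(original) == sha256_bytes(copy_of_original))

# A single changed character gives a different digest.
print(sha256_bytes(original) == sha256_bytes(edited))
```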

Workflow completeness

\[
C=\frac{n_{\text{documented steps}}}{n_{\text{required steps}}}
\]

Interpretation: Workflow completeness compares documented steps with required steps. This is not a universal metric, but it illustrates an important principle: hidden steps reduce reproducibility.

Execution failure rate

\[
F=\frac{n_{\text{failed cells}}}{n_{\text{executed cells}}}
\]

Interpretation: Execution failure rate records the fraction of cells that fail during execution. A notebook intended for reuse should have a failure rate of zero under its documented environment.
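
The failure rate \(F\) can be computed from a saved notebook by counting executed code cells that contain an error output. The sketch assumes the nbformat 4 layout, in which failed cells carry an output with `output_type` equal to `"error"`; the function name and the example notebook are hypothetical.

```python
"""Compute the fraction of executed code cells that ended in an error."""


def execution_failure_rate(notebook):
    """Return failed / executed for code cells with an execution count."""
    executed = 0
    failed = 0
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code" or cell.get("execution_count") is None:
            continue
        executed += 1
        outputs = cell.get("outputs", [])
        if any(output.get("output_type") == "error" for output in outputs):
            failed += 1
    if executed == 0:
        raise ValueError("No executed code cells; the failure rate is undefined.")
    return failed / executed


# Hypothetical notebook: four executed code cells, one of which failed.
notebook = {
    "cells": [
        {"cell_type": "code", "execution_count": 1, "outputs": []},
        {"cell_type": "code", "execution_count": 2, "outputs": [{"output_type": "stream"}]},
        {"cell_type": "code", "execution_count": 3,
         "outputs": [{"output_type": "error", "ename": "KeyError"}]},
        {"cell_type": "code", "execution_count": 4, "outputs": []},
        {"cell_type": "markdown"},
    ]
}

rate = execution_failure_rate(notebook)
print(rate)
```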

Output drift

\[
\Delta Y = Y_{\text{rerun}} - Y_{\text{reference}}
\]

Interpretation: Output drift compares rerun output with a reference output. Drift should be explained by stochasticity, dependency changes, data changes, or documented tolerances.
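
Output drift can be checked by comparing named summary values from a rerun with stored reference values and flagging differences beyond a documented tolerance. In this sketch the reference means correspond to the synthetic response values used in the R example later in the article; the rerun values and the tolerance are hypothetical.

```python
"""Compare rerun outputs with reference outputs and flag drift."""


def output_drift(rerun, reference):
    """Return Y_rerun - Y_reference for every key present in both outputs."""
    return {key: rerun[key] - reference[key] for key in reference if key in rerun}


def drift_exceeding(drift, tolerance):
    """Return the entries whose absolute drift exceeds the tolerance."""
    return {key: delta for key, delta in drift.items() if abs(delta) > tolerance}


reference = {"control_mean": 1.30, "exposed_mean": 2.25}
rerun = {"control_mean": 1.30, "exposed_mean": 2.31}  # hypothetical rerun values

drift = output_drift(rerun, reference)
flagged = drift_exceeding(drift, tolerance=0.01)

print(drift)
print(flagged)
```

Any flagged entry should be traceable to a documented cause such as a dependency update, a data change, or stochastic variation.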

Python and R workflows

The following compact examples demonstrate notebook-adjacent reproducibility checks for biological research. The full GitHub repository expands these examples into a broader full-stack workflow with Python, R, Julia, Fortran, Rust, Go, C, C++, SQL, notebooks, data files, documentation, provenance tables, and reproducibility notes.

Python example: notebook provenance manifest

"""
Create a simple provenance manifest for a biological notebook workflow.

This example records input data files, their SHA-256 checksums, and a short note.
It is intentionally compact for article display.
"""

from pathlib import Path
import hashlib
import pandas as pd

project_dir = Path(".")
data_dir = project_dir / "data"
output_dir = project_dir / "outputs"
output_dir.mkdir(exist_ok=True)

def sha256_file(path: Path) -> str:
    """Return a SHA-256 checksum for a file."""
    digest = hashlib.sha256()

    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)

    return digest.hexdigest()

records = []

for path in sorted(data_dir.glob("*.csv")):
    records.append(
        {
            "artifact": path.name,
            "artifact_type": "input_data",
            "relative_path": path.as_posix(),  # forward slashes on every platform
            "sha256": sha256_file(path),
            "note": "Synthetic biological workflow input",
        }
    )

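# Fix the columns so that an empty data directory still yields a valid header.
manifest = pd.DataFrame(
    records,
    columns=["artifact", "artifact_type", "relative_path", "sha256", "note"],
)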
manifest.to_csv(output_dir / "notebook_provenance_manifest.csv", index=False)

print(manifest.to_string(index=False))

Python example: reproducibility check for sample metadata

"""
Perform basic reproducibility checks on biological sample metadata.

The checks are deliberately simple:
- sample identifiers must be unique
- required columns must exist
- missing values must be counted
- expected biological groups must be present
"""

import pandas as pd

samples = pd.DataFrame(
    {
        "sample_id": ["BIO001", "BIO002", "BIO003", "BIO004"],
        "species": ["Danio rerio", "Danio rerio", "Danio rerio", "Danio rerio"],
        "treatment": ["control", "control", "exposed", "exposed"],
        "batch": ["B1", "B1", "B2", "B2"],
    }
)

required_columns = {"sample_id", "species", "treatment", "batch"}
missing_columns = required_columns.difference(samples.columns)

if missing_columns:
    raise ValueError(f"Missing required columns: {sorted(missing_columns)}")

if not samples["sample_id"].is_unique:
    raise ValueError("Sample identifiers must be unique.")

missingness = samples.isna().sum().reset_index()
missingness.columns = ["column", "n_missing"]

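# Confirm that every expected treatment group is present.
expected_treatments = {"control", "exposed"}
missing_treatments = expected_treatments.difference(samples["treatment"])

if missing_treatments:
    raise ValueError(f"Missing expected treatment groups: {sorted(missing_treatments)}")

group_counts = samples.groupby(["treatment", "batch"]).size().reset_index(name="n_samples")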

print("Missingness")
print(missingness.to_string(index=False))

print("\nGroup counts")
print(group_counts.to_string(index=False))

R example: reproducible biological summary table

# Compact R example for a reproducible biological notebook summary.

samples <- data.frame(
  sample_id = c("BIO001", "BIO002", "BIO003", "BIO004"),
  species = c("Danio rerio", "Danio rerio", "Danio rerio", "Danio rerio"),
  treatment = c("control", "control", "exposed", "exposed"),
  batch = c("B1", "B1", "B2", "B2"),
  response_value = c(1.2, 1.4, 2.1, 2.4)
)

required_columns <- c("sample_id", "species", "treatment", "batch", "response_value")
missing_columns <- setdiff(required_columns, names(samples))

if (length(missing_columns) > 0) {
  stop(paste("Missing required columns:", paste(missing_columns, collapse = ", ")))
}

if (any(duplicated(samples$sample_id))) {
  stop("Sample identifiers must be unique.")
}

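# aggregate() stores the summary statistics in a single matrix column, so
# do.call(data.frame, ...) flattens them into ordinary columns.
summary_table <- do.call(
  data.frame,
  aggregate(
    response_value ~ treatment,
    data = samples,
    FUN = function(x) c(mean = mean(x), sd = sd(x), n = length(x))
  )
)

print(summary_table)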

R example: session information for reproducibility

# Record session information so that future readers can inspect
# the R version, platform, and attached packages used in the workflow.

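# Create the output directory if it does not already exist.
dir.create("outputs", showWarnings = FALSE)

session_information <- capture.output(sessionInfo())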

writeLines(
  session_information,
  con = file.path("outputs", "r_session_info.txt")
)

cat("Session information written to outputs/r_session_info.txt\n")

GitHub repository

The companion repository provides a reproducible technical scaffold for the article’s computational examples, including notebook provenance manifests, sample-metadata validation, biological summary tables, session-information capture, data files, documentation, provenance tables, and reproducibility notes.

Limits, ethics, and responsible interpretation

Computational notebooks can make biological research more transparent, but they can also create false confidence. A notebook that runs successfully may still contain flawed assumptions, biased data, inappropriate models, hidden confounding, unrepresentative samples, or unsupported biological conclusions. Executability is not validity.

Notebooks also raise ethical and governance issues. Human-derived biological data may require de-identification, access controls, consent constraints, institutional review, and careful handling of small cell counts or rare disease categories. Ecological data may include sensitive species locations, endangered habitats, Indigenous knowledge, or conservation-risk information. Environmental-health data may affect communities already burdened by pollution, disease, or weak institutional protection.

Responsible notebook practice should therefore distinguish between open reproducibility and appropriate access control. Not all data should be public. But even restricted data can be accompanied by metadata, synthetic examples, code templates, provenance documentation, and clear descriptions of methods.
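
As one illustration of handling small cell counts before a summary table is shared, counts below a threshold can be replaced with a marker. The threshold of 5 and the category names are assumptions for illustration; the appropriate rule depends on the governing data-access policy, and further safeguards may be needed if suppressed values could be inferred from totals.

```python
"""Suppress small counts in a summary table before sharing it."""


def suppress_small_counts(counts, threshold):
    """Replace counts below the threshold with a '<threshold' marker."""
    return {
        category: (n if n >= threshold else f"<{threshold}")
        for category, n in counts.items()
    }


# Hypothetical category counts; the threshold is policy dependent.
counts = {"condition_a": 42, "condition_b": 3, "condition_c": 17}

shareable = suppress_small_counts(counts, threshold=5)
print(shareable)
```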

A responsible notebook should also avoid overstating conclusions. Exploratory analysis should be labeled exploratory. Simulated data should be labeled simulated. Machine-learning outputs should not be presented as mechanism. Statistical associations should not be presented as causality without appropriate design and evidence.

The goal is not merely to share code. The goal is to strengthen scientific accountability.

Why this matters now

Biological research is becoming more computational, more collaborative, and more dependent on reusable workflows. Scientific claims increasingly emerge from chains of data processing, code execution, model fitting, visualization, and interpretation. If those chains are invisible, research becomes harder to verify, teach, extend, or correct.

Computational notebooks offer one practical response. They allow research groups to document how biological evidence moves from raw observation to analytical conclusion. They support training, collaboration, publication, review, and reuse. They can help connect wet-lab biology, field biology, clinical research, ecology, and computational science.

But notebooks must mature from informal analysis documents into reproducible research objects. That means cleaner structure, better provenance, stronger environment documentation, version control, quality checks, and explicit limits.

The future of biological research will not depend only on more data or more powerful models. It will depend on whether scientific workflows can be trusted.

Conclusion

Computational notebooks are among the most useful tools for reproducible biological research because they connect code, data, explanation, and interpretation. They make workflows visible. They help researchers communicate not only what they found, but how they found it.

Their value, however, depends on discipline. A notebook should run from a clean state, document its data, preserve provenance, record its environment, explain assumptions, show quality checks, and distinguish computation from biological truth. It should be written for future readers, including collaborators, reviewers, students, and the researcher’s own future self.

In the life sciences, reproducibility is not a purely technical concern. It is part of scientific credibility. Computational notebooks can strengthen that credibility when they are designed as transparent, executable, and responsibly interpreted research records.
