Data Licensing

Effective Date: March 2026

The SoilCensus aggregate dataset is a growing, georeferenced collection of soil microbiome data contributed by citizen scientists across the United States. This page explains what the dataset contains, how it is licensed, and how researchers and institutions can access it.

1. What the Dataset Contains

The SoilCensus aggregate dataset includes the following for each sample:

  • GPS coordinates of the sample collection site
  • Land use classification (e.g., residential garden, agricultural field, park, forest)
  • Collection date and season
  • Microbial species identification — a complete list of taxa detected via DNA sequencing
  • Diversity metrics — alpha diversity (within-sample richness) and beta diversity (cross-sample comparisons)
  • Functional group analysis — relative abundance of nitrogen fixers, decomposers, mycorrhizal fungi, carbon cyclers, and other functional categories
  • Pathogen detection — presence and relative abundance of known soil pathogens
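
For readers new to the diversity metrics listed above, the sketch below shows one common way to compute them from per-taxon read counts. It is an illustration only: observed richness, the Shannon index, and Bray-Curtis dissimilarity are standard choices, but this page does not say which indices the dataset reports, and the function names and taxon labels here are hypothetical.

```python
import math

def richness(abundances):
    """Alpha diversity as observed richness: taxa with a nonzero count."""
    return sum(1 for count in abundances.values() if count > 0)

def shannon_index(abundances):
    """Shannon index H = -sum(p * ln p) over taxa with nonzero counts."""
    total = sum(abundances.values())
    if total == 0:
        return 0.0
    h = 0.0
    for count in abundances.values():
        if count > 0:
            p = count / total
            h -= p * math.log(p)
    return h

def bray_curtis(a, b):
    """Beta diversity as Bray-Curtis dissimilarity between two samples.

    0.0 means identical composition; 1.0 means no shared taxa.
    """
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    denominator = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    return numerator / denominator if denominator else 0.0

# Hypothetical read counts for two samples (labels are placeholders).
garden = {"taxon_a": 50, "taxon_b": 50, "taxon_c": 0}
forest = {"taxon_a": 50, "taxon_d": 50}

garden_richness = richness(garden)
garden_shannon = shannon_index(garden)
garden_vs_forest = bray_curtis(garden, forest)
```

Richness and the Shannon index describe a single sample, while Bray-Curtis compares a pair, which mirrors the within-sample and cross-sample distinction above.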

The aggregate dataset does not contain any personally identifiable information. No names, email addresses, mailing addresses, or account information are included. Sample locations are GPS coordinates only — they are not linked to individual identities.

2. Licensing Model

The SoilCensus dataset is available under two access tiers:

Research License. Available to accredited universities, government agencies, and nonprofit research institutions. The Research License grants access to the full aggregate dataset for non-commercial research purposes, including academic publication. Pricing is scaled to institution size and may be subsidized or waived for qualifying projects. Contact us to discuss your research needs.

Commercial License. Available to private companies, consultancies, and commercial organizations that wish to use the dataset for product development, commercial analysis, or integration into proprietary platforms. Commercial licensing terms and pricing are negotiated on a case-by-case basis.

3. What You May Do with Licensed Data

Under both license types, you may:

  • Analyze the data for your licensed purposes
  • Publish findings, visualizations, and derivative analyses based on the data
  • Combine the data with other datasets for research or analysis
  • Store the data on your institutional servers for the duration of your license

4. What You May Not Do

Under both license types, you may not:

  • Redistribute the raw dataset. You may not share, resell, sublicense, or make the raw dataset available to third parties without written permission from Starrfly Labs LLC.
  • Attempt to re-identify contributors. You may not attempt to link sample locations to individual contributors, addresses, or identities.
  • Misrepresent the source. Publications and products that use SoilCensus data must credit SoilCensus, a product of Starrfly Labs LLC, as the data source, using the attribution text in Section 6.

5. API Access

Licensed institutions may access the dataset through the SoilCensus API, which provides programmatic access to sample data, spatial queries, time-series filtering, and diversity metrics. API documentation and credentials are provided upon license activation.

API rate limits and data refresh schedules are specified in your license agreement. The dataset is updated on a rolling basis as new samples are processed.
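
To make "spatial queries" and "time-series filtering" concrete, here is a minimal sketch of the same two filters applied locally to records a licensee has already retrieved. It does not call the SoilCensus API, whose endpoints and parameters are documented only upon license activation. The Sample class, its field names, and the bounding-box convention are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Sample:
    """Hypothetical local record; field names are not from the real API."""
    sample_id: str
    lat: float
    lon: float
    collected_on: date
    land_use: str

def in_bounding_box(sample, south, west, north, east):
    """Spatial filter: True if the sample lies inside the box.

    Assumes the box does not cross the antimeridian.
    """
    return south <= sample.lat <= north and west <= sample.lon <= east

def filter_samples(samples, bbox, start, end):
    """Keep samples inside bbox and collected between start and end (inclusive)."""
    south, west, north, east = bbox
    return [
        s for s in samples
        if in_bounding_box(s, south, west, north, east)
        and start <= s.collected_on <= end
    ]

# Usage with made-up records: box is (south, west, north, east) in degrees.
records = [
    Sample("s1", 39.7, -105.0, date(2025, 6, 1), "residential garden"),
    Sample("s2", 39.7, -105.0, date(2024, 1, 15), "park"),
    Sample("s3", 45.0, -93.0, date(2025, 6, 1), "forest"),
]
selected = filter_samples(
    records, (37.0, -109.0, 41.0, -102.0), date(2025, 1, 1), date(2025, 12, 31)
)
```

Filtering locally like this only makes sense for data your license already allows you to store. For large extracts, server-side queries through the API are likely the better fit.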

6. Attribution

Any published work, analysis, or product that incorporates SoilCensus data must include the following attribution:

"Data provided by SoilCensus (soilcensus.com), a product of Starrfly Labs LLC. The SoilCensus dataset is generated through citizen science contributions and crowdsourced soil sampling."

7. Data Quality and Limitations

SoilCensus data is collected by citizen scientists following standardized but simplified collection protocols. While we take steps to ensure data quality — including GPS verification, sample integrity checks, and lab processing standards — the dataset has inherent limitations:

  • Sample locations are self-reported and GPS-captured; positional accuracy may vary
  • Sampling depth and technique may vary slightly between contributors
  • Geographic coverage depends on contributor participation and is not uniform
  • Taxonomic identification depends on current reference databases, which are continually updated
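
Because coverage is not uniform, one simple first check a licensee might run is to count samples per latitude/longitude grid cell and look for sparse or empty regions. The sketch below is one way to do that. The function names and the one-degree default cell size are assumptions, not part of any SoilCensus tooling.

```python
import math
from collections import Counter

def grid_cell(lat, lon, cell_size_deg=1.0):
    """Map a coordinate to the index of the grid cell that contains it."""
    return (math.floor(lat / cell_size_deg), math.floor(lon / cell_size_deg))

def samples_per_cell(coords, cell_size_deg=1.0):
    """Count samples in each grid cell; coords is an iterable of (lat, lon)."""
    return Counter(grid_cell(lat, lon, cell_size_deg) for lat, lon in coords)

# Usage with made-up coordinates.
coords = [(40.2, -105.3), (40.9, -105.9), (35.1, -90.0)]
counts = samples_per_cell(coords)
```

Cells defined in degrees are not equal in area (they narrow toward the poles), so this is a rough screen rather than a rigorous coverage measure.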

We provide the dataset "as is" without warranty of fitness for any particular analytical purpose. Licensees are responsible for evaluating data quality within the context of their own research.

8. Contributor Privacy

The data licensing program is designed to ensure that individual contributors cannot be identified from the aggregate dataset. We do not include any personally identifiable information in licensed data. Our Privacy Policy describes in detail how contributor data is handled and protected.

9. Pricing and Terms

Licensing fees vary by institution type, dataset scope, and intended use. We offer:

  • Subsidized or no-cost access for qualifying academic and nonprofit research
  • Pilot access for institutions evaluating the dataset before committing to a full license
  • Custom agreements for large-scale commercial applications

10. Contact

To discuss data licensing, request a sample dataset, or ask questions about the program, contact us at:

Starrfly Labs LLC
hello@soilcensus.com

We're particularly interested in conversations with researchers, land managers, university extension offices, conservation organizations, and anyone who thinks citizen-generated soil data can help solve a problem they're working on.
