Frequently asked questions

Common questions about VERITIR pilots, data readiness, governance, and pricing.

What is this pilot?

A short, structured engagement that assesses whether a research dataset can be productized for foundation-model evaluation, benchmarking, and (where appropriate) licensing.

What do you actually deliver?

A lightweight "readiness pack" that includes:

  • Data quality + completeness assessment
  • Labeling review (explicit + implicit outcome labels)
  • Governance and rights checklist (what's allowed, what isn't)
  • A sanitized sample and a simple schema/metadata template
  • Practical licensing guidelines (plain-English terms and risk tiers)
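As a rough illustration of what the schema/metadata template covers, here is a minimal sketch in Python. All field names and tier labels are hypothetical assumptions for this example, not a VERITIR standard:

```python
# Hypothetical, minimal metadata template for a packaged dataset.
# Field names and license tiers are illustrative, not a fixed standard.
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    name: str
    description: str
    provenance: str          # who generated the data, with what instruments/process
    license_tier: str        # assumed tiers: "evaluation-only" or "training-permitted"
    record_count: int
    columns: dict[str, str] = field(default_factory=dict)  # column name -> type/unit


def validate(meta: DatasetMetadata) -> list[str]:
    """Return a list of readiness gaps; an empty list means the template is filled in."""
    problems = []
    if not meta.provenance:
        problems.append("missing provenance")
    if meta.license_tier not in {"evaluation-only", "training-permitted"}:
        problems.append(f"unknown license tier: {meta.license_tier!r}")
    if not meta.columns:
        problems.append("no columns documented")
    return problems
```

The point is not the specific fields but the habit: every dataset ships with documented provenance, an explicit license tier, and per-column descriptions, so a buyer can assess value and constraints without opening the raw data.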

Who is this for?

Research labs, institutes, data owners, and R&D organizations that have valuable machine or experimental data and want to turn it into a repeatable revenue stream without creating compliance or reputational risk.

What kinds of data are a good fit?

High-signal datasets that are hard to reproduce, such as:

  • Experimental measurements + outcomes (success/failure, yield, binding, stability, performance metrics)
  • Instrument outputs (imaging, spectroscopy, sequencing, microscopy, sensor logs)
  • Structured annotations, scoring rubrics, or decision labels tied to outcomes

How do you handle sensitive or regulated data?

We start with strict guardrails: sensitive data requires clear consent/authorization, strong access controls, and often a restricted "evaluation-only" model. If the rights or governance can't be made clean, it's out of scope for licensing.

Why not just send the data directly to an AI lab?

Because "direct" deals often fail on the basics: unclear rights, mixed sponsorship/collaboration terms, missing provenance, privacy exposure, and inconsistent pricing. A governance layer makes the dataset safe, contract-ready, and repeatable.

Are you selling patents or inventions?

No. This is about data assets (and their evaluation access / licensing). If patent rights are involved, they're handled separately under the appropriate IP process.

Are you training foundation models on the data?

Not by default. The typical starting point is benchmarking and evaluation (lower risk, high value). Training or fine-tuning rights, if allowed, are treated as a higher tier with tighter controls.

What does "labels" mean here?

Anything that connects inputs to outcomes—e.g., "worked/didn't," yields, scores, phenotypes, performance metrics, pass/fail gates, or expert judgments embedded in spreadsheets and lab workflows.

What does success look like?

A dataset that is:

  • Clearly scoped and documented
  • Legally and ethically usable for defined purposes
  • Structured enough to evaluate models or license with confidence
  • Packaged so buyers can quickly understand value and constraints

How long does a pilot take?

Usually 4–8 weeks, depending on data readiness, governance complexity, and stakeholder availability.

What happens after the pilot?

You can choose one of three paths:

  • Publish an evaluation benchmark (recurring access)
  • Offer controlled dataset licenses (selective buyers/uses)
  • Build a longer-term data product pipeline (multiple datasets, standard terms)

How do you price it?

Pilot pricing is typically fixed-fee. Ongoing commercialization can be structured as subscription, per-evaluation access, licensing fees, and/or revenue share—depending on the asset and constraints.

Have a question not covered here? Reach out directly.