Guidance for data providers integrating into OpenSAFELY
Danger
This page discusses the new OpenSAFELY ehrQL (previously named Data Builder) for accessing OpenSAFELY data sources.
Use OpenSAFELY cohort-extractor, unless you are specifically involved in the development or testing of ehrQL.
OpenSAFELY ehrQL and its documentation are still undergoing extensive development. We will announce when ehrQL is ready for general use on the Platform News page.
Warning
This page is not yet complete.
More complete guides on working with OpenSAFELY Contracts for external data providers will be added here.
Introduction
For a data provider to offer new data for OpenSAFELY, there are two technical requirements:
- The data being offered must satisfy existing OpenSAFELY Contracts.
- The data backend must have an implementation in OpenSAFELY ehrQL (previously named Data Builder).
Implementing existing OpenSAFELY Contracts
OpenSAFELY Contracts are data specifications. An OpenSAFELY backend implements one or more of them, and each Contract covers a specific healthcare data domain.
The structure of a Contract is explained in the Contracts introduction.
The Contracts reference provides the existing data specifications for both OpenSAFELY users and data providers.
Refer to those specifications when preparing data tables for integration with OpenSAFELY.
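As a rough illustration of what checking a prepared table against a specification can involve, the sketch below validates rows against a Contract-like description. The `ColumnSpec` and `ContractSpec` classes, the `check_rows` function, and the `patient_demographics` contract with its columns are all hypothetical names invented for this example. They are not taken from the Contracts reference or from any OpenSAFELY code.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple


@dataclass(frozen=True)
class ColumnSpec:
    """One column required by a (hypothetical) contract."""
    name: str
    py_type: type
    nullable: bool = True


@dataclass(frozen=True)
class ContractSpec:
    """A named data specification: a fixed set of typed columns."""
    name: str
    columns: Tuple[ColumnSpec, ...]


def check_rows(contract, rows):
    """Return a list of problems found; an empty list means the rows conform."""
    problems = []
    expected = {col.name for col in contract.columns}
    for index, row in enumerate(rows):
        present = set(row)
        # Report columns the contract requires but the row lacks, and vice versa.
        for name in sorted(expected - present):
            problems.append(f"row {index}: missing column {name!r}")
        for name in sorted(present - expected):
            problems.append(f"row {index}: unexpected column {name!r}")
        # Check nullability and type for the columns that are present.
        for col in contract.columns:
            if col.name not in row:
                continue
            value = row[col.name]
            if value is None:
                if not col.nullable:
                    problems.append(f"row {index}: {col.name!r} must not be null")
            elif not isinstance(value, col.py_type):
                problems.append(
                    f"row {index}: {col.name!r} should be {col.py_type.__name__}"
                )
    return problems


# Hypothetical contract, for illustration only.
PATIENT_DEMOGRAPHICS = ContractSpec(
    name="patient_demographics",
    columns=(
        ColumnSpec("patient_id", int, nullable=False),
        ColumnSpec("date_of_birth", date),
        ColumnSpec("sex", str),
    ),
)

rows = [
    {"patient_id": 1, "date_of_birth": date(1980, 1, 1), "sex": "female"},
    {"patient_id": None, "date_of_birth": "1990-05-05", "sex": "male"},
]
problems = check_rows(PATIENT_DEMOGRAPHICS, rows)
```

Here the first row conforms, while the second is reported twice: once for a null `patient_id` and once for a `date_of_birth` held as a string rather than a date. The real Contracts define their own column names, types, and constraints, so treat the structure above only as a way of thinking about conformance.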
What if the available Contracts are unsuitable for a data provider?
Structuring data in line with OpenSAFELY Contracts makes it easier for researchers using OpenSAFELY to run studies across multiple data backends.
However, data providers may have:
- data in a considerably different structure from existing Contracts
- data not covered at all by existing Contracts
In these cases, a data provider could propose:
- amendments to existing OpenSAFELY Contracts, if appropriate
- an entirely new OpenSAFELY Contract, which may be namespaced to the data provider's organisation or backend
Note
We see the development of OpenSAFELY Contracts as an ongoing process. Each discussion that we have with data providers informs the design of the Contracts. We aim to continue to iterate and improve on the designs of Contracts, while providing stability through versioning.
Proposing changes to OpenSAFELY Contracts
Note
OpenSAFELY Contracts are still in an initial design and implementation phase. We already have designs for additional Contracts that will be implemented in future.
If no existing Contract corresponds to the healthcare data domain that your data covers, please contact our technical team to discuss how we can help.
Integrating a data backend into OpenSAFELY ehrQL
OpenSAFELY ehrQL is the software component that researchers use to extract datasets of interest from healthcare data providers in OpenSAFELY. ehrQL is written in Python.
ehrQL abstracts away the details of writing queries. Researchers need only specify the data they want, not how to access it.
ehrQL integration requirements
Supporting a new backend in ehrQL has two requirements:
- ehrQL must have a query engine compatible with the backend.
- ehrQL must have code describing how tables in the backend satisfy the supported OpenSAFELY Contracts.
Note
Currently, ehrQL has the following query engines:
- Microsoft SQL Server
Support for another data store will require adding a new query engine.
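To make the two requirements more concrete, here is a minimal sketch of how a query engine and a backend's table mappings might fit together. It is not ehrQL's actual API: `QueryEngine`, `BracketQuotingEngine`, `TableMapping`, `Backend`, and the table and column names are all hypothetical and chosen only for illustration. The one borrowed detail is that SQL Server accepts identifiers quoted in square brackets.

```python
from dataclasses import dataclass
from typing import Dict


class QueryEngine:
    """Hypothetical base class: turns a table mapping into SQL text."""

    def quote(self, identifier):
        raise NotImplementedError

    def select_sql(self, mapping):
        # Select each backend column under the name the contract expects.
        columns = ", ".join(
            f"{self.quote(source)} AS {self.quote(contract_col)}"
            for contract_col, source in mapping.columns.items()
        )
        return f"SELECT {columns} FROM {self.quote(mapping.source_table)}"


class BracketQuotingEngine(QueryEngine):
    """Quotes identifiers in square brackets; a ']' inside a name is doubled."""

    def quote(self, identifier):
        return "[" + identifier.replace("]", "]]") + "]"


@dataclass(frozen=True)
class TableMapping:
    source_table: str
    columns: Dict[str, str]  # contract column name -> backend column name


@dataclass(frozen=True)
class Backend:
    name: str
    engine: QueryEngine
    tables: Dict[str, TableMapping]  # contract name -> how the backend satisfies it

    def sql_for(self, contract_name):
        if contract_name not in self.tables:
            raise KeyError(f"{self.name} does not implement {contract_name!r}")
        return self.engine.select_sql(self.tables[contract_name])


# Hypothetical backend definition, for illustration only.
example_backend = Backend(
    name="example_provider",
    engine=BracketQuotingEngine(),
    tables={
        "patient_demographics": TableMapping(
            source_table="Patient",
            columns={"patient_id": "Patient_ID", "date_of_birth": "DateOfBirth"},
        )
    },
)
sql = example_backend.sql_for("patient_demographics")
```

In this sketch, supporting a different data store would mean writing another `QueryEngine` subclass, while supporting a new data provider on an existing store would mean writing another `Backend` definition. That split mirrors the two requirements listed above, though the real division of responsibilities in ehrQL may differ.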
If you are a new data provider, please contact our technical team to discuss integration with ehrQL and OpenSAFELY.