4 The unique health data landscape

Access to health data requires a lot of initial consideration, possibly more than accessing data from other industries. Some key points to consider include:

Knowing what data you’re able to access is critical to understanding what is possible, as is understanding the question or problem you are trying to solve.
Data access can be one of the trickier and more time-consuming aspects of a data science project, so it should be addressed at a very early stage. It can take a lot of time to understand what data is available, agree on access, create a data sharing agreement, and receive the data (see Accessing data).
Health data is extremely sensitive and has restrictions and controls due to the close linkage with individual lives. Those who have access to data are in a privileged position and should treat this taonga (treasure) with utmost care.
New Zealand’s health system provides a unique individual identifier - the National Health Index (NHI) number - which lets data from different health sources be linked together, in contrast to systems used in other countries which make this linkage more difficult.
The timeliness of health data varies, and this can significantly affect model evaluation. There can be significant lags between data being collected, processed and refreshed. For example, a specific chain of events leads to updates to national health data collections, and this can take a significant amount of time - often many months. This also affects planning for evaluations, as a model that goes live today may not be able to be effectively evaluated until several months later.
Healthcare data is often complex and ‘dirty’ (inaccurate, incomplete or inconsistent). When possible, liaise with analysts who work within the organisation to gain a local understanding of the data. Clinical expertise is required to put the information in context.
All of us have obligations as Te Tiriti partners to improve equity for Māori, including health equity. Depending on your background, this concept may be less familiar, but it is imperative to understand when working with health data and/or answering health questions.
There are also significant considerations related to ethics, privacy, consent and social licence. The sections that follow cover these areas.
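
The linkage point above (a shared identifier such as the NHI number lets records from different sources be joined) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the records are made up, the identifiers are placeholders rather than real NHI numbers, and `link_by_identifier` is a hypothetical helper, not part of any national system. Real linkage only happens inside an approved environment, under a data sharing agreement.

```python
from collections import defaultdict

# Entirely made-up records; the identifiers are placeholders, not real NHI numbers.
admissions = [
    {"nhi": "ZZZ0001", "admitted": "2023-03-01", "ward": "Cardiology"},
    {"nhi": "ZZZ0002", "admitted": "2023-03-04", "ward": "Orthopaedics"},
    {"nhi": "ZZZ0001", "admitted": "2023-06-12", "ward": "Cardiology"},
]
dispensings = [
    {"nhi": "ZZZ0001", "dispensed": "2023-03-05", "medicine": "metoprolol"},
    {"nhi": "ZZZ0003", "dispensed": "2023-04-20", "medicine": "amoxicillin"},
]


def link_by_identifier(key, **sources):
    """Group records from several named sources under their shared identifier."""
    linked = defaultdict(lambda: {name: [] for name in sources})
    for name, records in sources.items():
        for record in records:
            linked[record[key]][name].append(record)
    return dict(linked)


linked = link_by_identifier("nhi", admissions=admissions, dispensings=dispensings)

# Summarise how many records each person has in each source.
for nhi, by_source in sorted(linked.items()):
    print(nhi, {name: len(rows) for name, rows in by_source.items()})
```

People who appear in only one source still get an entry, with an empty list for the other source. That makes coverage gaps between collections visible rather than silently dropping them.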

4.1 Finding data

Data for health data science projects is everywhere! There’s no shortage of available data in New Zealand. However, keep the following concepts in mind when you are considering how to find data, or access data you may already be interested in:

Data can only be used for the purpose for which it was collected; any other use is called “secondary purpose” and requires additional consent (see Use and re-use of data section).
Health data used for research purposes needs thought around how the research will be conducted efficiently, ethically, and with privacy and safety front-of-mind.

To understand population trends and context, aggregated data sets and web tools are publicly available via the Ministry of Health and Statistics New Zealand (Stats NZ). Microdata (at the level of the individual) is available to researchers on application to National Collections, or to Stats NZ for Confidentialised Unit Record Files (CURFs). Microdata is disseminated in accordance with the Privacy Act, health legislation and contracts, and access is strictly controlled according to use.

Useful resources:

Through a review of Aotearoa New Zealand health datasets, PDH produced an interactive and updatable list of data available in New Zealand. This includes data from Figure.nz and many other sources. (Precision Driven Health 2022)
A recent research project which undertook a local algorithm scan produced a whitepaper report on The future of healthcare algorithms in Aotearoa New Zealand (New Zealand health sector algorithm scan 2021).
Where can I find health information? (Manatū Hauora): Data sources commonly used when analysing the health of New Zealand populations
Health statistics and data sets (Manatū Hauora)

Precision Driven Health. 2022. “Aotearoa NZ data sources review”. https://data.precisiondrivenhealth.com.

New Zealand health sector algorithm scan. 2021. https://precisiondrivenhealth.com/new-zealand-health-sector-algorithm-scan/.

4.1.1 The Integrated Data Infrastructure

Supported by Statistics NZ, the Integrated Data Infrastructure (IDI) is a repository of individual-level data from multiple government sources, able to be linked together, anonymised, and used for research. It’s a massive resource which is accessible through a prescribed process with very high safety and privacy requirements. It may not be possible to derive individual-level insights from IDI data.

The Virtual Health Information Network includes many researchers who are using the IDI. VHIN’s IDI guides can help you understand what centralised data is available and how it could be used in your project.

If you’re interested in what you could do with the IDI, it’s best to make contact with a researcher who already has experience in using this platform. Refer to the list of research using Stats NZ microdata.

4.2 Understand the data

Knowing all the data you’re able to access is critical to understanding what is possible, in addition to understanding the question or problem you are trying to solve.

Tip

Healthcare data is often complex and ‘dirty’ (inaccurate, incomplete or inconsistent). When possible, liaise with analysts who work within the organisation to gain a local understanding of the data.
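
As a first pass before talking to local analysts, it can help to count how often each field is missing or fails a basic validity check. The sketch below is a minimal illustration only: the extract, the column names (`dob`, `hba1c_mmol_mol`) and the checks are all made up, and real validity rules should come from the data dictionary and from clinical and analyst advice.

```python
import csv
import io
from datetime import date

# Made-up extract; column names and values are illustrative assumptions.
raw = """patient_id,dob,hba1c_mmol_mol
P1,1980-02-11,48
P2,,
P3,1975-13-40,52
P4,1990-07-30,abc
"""


def is_iso_date(text):
    """True if text is a valid YYYY-MM-DD calendar date."""
    try:
        date.fromisoformat(text)
        return True
    except ValueError:
        return False


def is_number(text):
    """True if text parses as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


# Hypothetical per-column validity checks.
CHECKS = {"dob": is_iso_date, "hba1c_mmol_mol": is_number}


def profile(rows, checks):
    """Count missing and invalid values for each checked column."""
    report = {col: {"missing": 0, "invalid": 0} for col in checks}
    for row in rows:
        for col, is_valid in checks.items():
            value = (row.get(col) or "").strip()
            if not value:
                report[col]["missing"] += 1
            elif not is_valid(value):
                report[col]["invalid"] += 1
    return report


rows = list(csv.DictReader(io.StringIO(raw)))
report = profile(rows, CHECKS)
for col, counts in report.items():
    print(col, counts)
```

A profile like this shows where the problems are, not why they occur. The explanation (a system migration, a coding change, a free-text field) usually has to come from people who know the source system.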

Health data is often only available after a significant lag time, and with a slow refresh rate. For example, a specific chain of events leads to updates to national health data collections, and this can take a significant amount of time - often many months.
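
To see how such lags affect planning, here is a rough sketch of the earliest date on which a model could be evaluated. It adds together the period over which predictions are collected, the outcome follow-up window, and the reporting lag. The function name and all the numbers (90 days of predictions, a 30-day outcome window, a 180-day lag) are hypothetical; the real figures depend on the specific collection and should be confirmed with the data custodian.

```python
from datetime import date, timedelta


def earliest_evaluation_date(go_live, prediction_days, outcome_window_days, data_lag_days):
    """Estimate when outcomes for an initial batch of predictions become available.

    prediction_days:     how long predictions are collected after go-live
    outcome_window_days: follow-up needed to observe the outcome (e.g. 30-day readmission)
    data_lag_days:       assumed delay before the outcome data is refreshed
    """
    last_prediction = go_live + timedelta(days=prediction_days)
    outcome_observed = last_prediction + timedelta(days=outcome_window_days)
    return outcome_observed + timedelta(days=data_lag_days)


go_live = date(2024, 1, 15)
ready = earliest_evaluation_date(
    go_live,
    prediction_days=90,       # hypothetical
    outcome_window_days=30,   # hypothetical
    data_lag_days=180,        # hypothetical
)
print(f"Go-live {go_live}, earliest evaluation {ready} ({(ready - go_live).days} days later)")
```

Even with these modest assumptions, the first meaningful evaluation lands about ten months after go-live, so it is worth building this delay into the project plan from the start.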

At an early stage, it is valuable to consider:

Data landscape - What data is collected? How is access managed? Does it help address the problem you are trying to solve?
Characteristics - What is the format, type and size of data? When was the data collected? How often will you receive it?
Availability and quality - How much historical data is available, and what is the quality? Use the minimum necessary!
Purpose - What purpose was the data collected for, and does that influence your interpretation?
Consent - Is the use of data covered by existing patient consent?
Data collection, maintenance, publication - Distinguish between data already collected and new data created by the study. How do you plan to maintain and/or publish this data?
Personally identifiable information - Is de-identification required? Who will do this? (See Data identifiability)
Data availability - How long will it take to source and are there any reporting or system lags?
Labelling/annotation - Does the project need a human-in-the-loop (HITL) mechanism for annotating and validating the input and output data?
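
Two of the considerations above, using the minimum necessary data and handling personally identifiable information, can be sketched together. The example below replaces an identifier with a keyed hash (HMAC-SHA256) and keeps only an explicit allow-list of fields. It is a sketch under stated assumptions, not a prescribed method: the field names, the `pseudonymise` helper and the key are hypothetical. Pseudonymised data can still be re-identifiable, so whether this is sufficient is a decision for the data custodian and the relevant approval processes.

```python
import hashlib
import hmac

# Hypothetical list of fields treated as direct identifiers.
DIRECT_IDENTIFIERS = {"name", "address", "phone"}


def pseudonymise(record, secret_key, id_field="nhi", keep_fields=()):
    """Return a copy holding a keyed-hash pseudonym plus only the allowed fields."""
    token = hmac.new(secret_key, record[id_field].encode("utf-8"), hashlib.sha256).hexdigest()
    out = {"pseudo_id": token}
    for field in keep_fields:
        if field == id_field or field in DIRECT_IDENTIFIERS:
            raise ValueError(f"{field!r} is a direct identifier and cannot be kept")
        out[field] = record[field]
    return out


# Made-up record; the identifier is a placeholder, not a real NHI number.
record = {
    "nhi": "ZZZ0001",
    "name": "Example Person",
    "address": "1 Example Street",
    "age_band": "40-49",
    "hba1c_mmol_mol": 48,
}

# In practice the key would be generated and held securely by the data custodian.
key = b"replace-with-a-secret-held-by-the-data-custodian"

safe = pseudonymise(record, key, keep_fields=("age_band", "hba1c_mmol_mol"))
print(sorted(safe))
```

Because the same key always produces the same pseudonym for a given identifier, records can still be linked across extracts without the analyst seeing the identifier. A keyed hash is used rather than a plain hash so that someone who can enumerate possible identifiers cannot simply recompute the mapping without the key.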

4.3 Ethics & privacy

Concepts around the legal, privacy, and ethical dimensions of health data projects are often interlinked. Legal perspectives consider the fit of the project with the laws of the data source jurisdiction as well as any legal requirements of the analysis location, if these aren’t the same place. Privacy perspectives are around what data is collected, how it’s collected safely, where it is stored, and its lifecycle. Ethical conduct of a project includes ensuring the risks to participants of data use are outweighed by the benefits of the project.

All research in New Zealand which uses data from humans needs to be undertaken in an ethical manner. In some cases, approval from a registered ethics committee is required before the project can be started. In many cases, the organisation undertaking the research (such as your employer) may also have its own research or ethical approval processes.

When dealing with health data, be careful when assessing if a data science project requires ethical approval. Ethical approval is typically required for any evidence-generating studies, or studies which propose changes to current standard of care (before any model is evaluated/validated) and should always be sought prior to accessing patient data. Ethical approval is given through a written application process and gives permission for the research to be conducted by named investigators during a specific time period.

Ethics approvals in New Zealand are provided through committees which have had centralised approval to follow appropriate standards. The ethical standards are set by the National Ethical Standards for Health and Disability Research and Quality Improvement and apply to researchers, health service providers and disability service providers, regardless of whether or not an additional approval process is required. Think of these like a set of best practices for undertaking work with health data or human participants - follow them regardless of whether anyone is requiring you to do so!

Most formal approvals for health data research are provided at a national level by the Health and Disability Ethics Committees (HDEC); in some cases a more localised approval is required instead or in addition (for example, the Auckland Health Research Ethics Committee).

On top of this, hospitals or clinical organisations usually have their own research offices which require separate notification (such as Research Office, Hauora a Toi Bay of Plenty, Te Whatu Ora).

Find out if your study requires HDEC review on the HDEC website.

Many of the questions asked in an ethics application and approval process are important in evaluating the risks and benefits of the intended research. This process forces researchers and organisations to consider whether their work has a net benefit for society. These questions can also indicate whether there is “social licence”, or societal acceptance, for the intended use of health data (see Social license - use of health data).

Important

Seek appropriate ethics approvals prior to accessing patient data.

Assuming that ethics approval is not required can have significant downstream effects on a project, such as reputational damage or delays. HDEC can provide ‘Scope of Review’ services to advise if the research you are doing is considered exempt from requiring an ethics application. Consider seeking written evidence of this as an assurance for stakeholders. See the HDEC website for the current process.

Important

Some ethics applications can take months before an approval is granted, so allow adequate time for this process, factoring in when review meetings are held. It’s also likely that further questions may be asked at this review stage.

Important

Consider if existing patient consent is sufficient to cover use of the data. See Consent below.

Generally, all use of administrative health data for research purposes will need to go through the HDEC process. Consent for use of administrative data mainly covers its use for improving the care of that individual within the health service, or for quality improvement processes (defined as audits or other activities). Unless the work is specifically with business intelligence within a healthcare organisation, you should seek confirmation of whether ethical approval from HDEC is required.

In New Zealand, health information follows the Health Information Privacy Code 2020. The rules of this code can be summarised for health agencies as below. These principles are also highly relevant to the use of health data for research purposes.

Only collect health information if you really need it.
Get it straight from the people concerned where possible.
Tell them what you’re going to do with it.
Be considerate when you’re getting it.
Take care of it once you’ve got it.
People can see their health information if they want to.
They can correct it if it’s wrong.
Make sure health information is correct before you use it.
Get rid of it when you’re done with it.
Use it for the purpose you got it.
Only disclose it if you have a good reason.
Make sure that health information sent overseas is adequately protected.
Only assign unique identifiers where permitted.

Privacy Impact Assessments (PIAs) should also be conducted at an early stage to identify potential risks to the data of the individuals included, and measures should be adopted to eliminate or mitigate those risks. The Office of the Privacy Commissioner offers guidance on PIAs.

4.3.2 Use and re-use of data

‘Use’ of data relates to using data for the purpose for which it was consented and collected. Re-use of data (or secondary use) is when you use data collected for another purpose.

When you re-use data, you should be mindful of its purpose, coverage, bias, timeliness, and applicability to the secondary use. For instance, the National Minimum Data Set is gathered for policy formation, performance monitoring, research and review. It may be useful for understanding hospitalisations, but does not provide a complete picture of an individual’s health journey.

4.4 Data sovereignty

Data sovereignty refers to the understanding that data is subject to the laws of the nation within which it is collected and stored. In New Zealand, there is also a focus on where the data is stored and processed. Data agreements should take care to address these points so they are clear to all parties. Data handling and ethical dimensions of the project also need to ensure data sovereignty is at front of mind.

Māori data sovereignty recognises that Māori data should be subject to Māori governance. Māori data sovereignty supports tribal sovereignty and the realisation of Māori and iwi aspirations. Māori must be included in any work about Māori data. An equity lens is required and ideally should include Māori in the research team as well as in external review/advisor roles.

Useful resources:

Te Mana Raraunga (Māori Data Sovereignty Network) has helpful resources and guidance.
The Health Research Council of NZ (HRC) also produces Guidelines for Researchers on Health Research Involving Māori.
NZ government’s Principles for the safe and effective use of data and analytics (Government Chief Data Steward and Office of the Privacy Commissioner 2018)

Government Chief Data Steward and Office of the Privacy Commissioner. 2018. “Principles for the safe and effective use of data and analytics”. https://www.stats.govt.nz/assets/Uploads/Data-leadership-fact-sheets/Principles-safe-and-effective-data-and-analytics-May-2018.pdf.
