4 The unique health data landscape
Access to health data requires a lot of initial consideration, possibly more than accessing data from other industries. Some key points to consider include:
Knowing the data you’re able to access is critical to understanding what is possible, in addition to understanding the question or problem you are trying to solve
Data access can be one of the more tricky and time consuming aspects of a data science project. Access should be addressed at a very early stage. It can take a lot of time to understand what data is available, agree on access, create a data sharing agreement, and to receive the data (see Accessing data).
Health data is extremely sensitive and has restrictions and controls due to the close linkage with individual lives. Those who have access to data are in a privileged position and should treat this taonga (treasure) with utmost care.
New Zealand’s health system provides a unique individual identifier - the National Health Index (NHI) number - which lets data from different health sources be linked together, in contrast to systems used in other countries which make this linkage more difficult.
Timeliness of health data can be different and this can also significantly impact model evaluation. There can be significant lags between data being collected, processed and refreshed. For example, there is a specific chain of events that lead to updates to national health data collections, and this can take a significant amount of time - often many months. This can also affect planning for evaluations, as a model that goes live today may not be able to be effectively evaluated until several months later.
Healthcare data is often complex and ‘dirty’ (inaccurate, incomplete or inconsistent). When possible, liaise with analysts who work within the organisation to gain a local understanding of the data. Clinical expertise is required to put the information in context
All of us have obligations as Te Tiriti partners to improve equity for Māori, including health equity. Depending on your background, this concept may be less familiar, but it is imperative to understand when working with health data and/or answering health questions.
There are also significant considerations related to ethics, privacy, consent and social licence. The sections that follow cover these areas.
4.1 Finding data
Data for health data science projects is everywhere! There’s no shortage of available data in New Zealand. However, keep the following concepts in mind when you are considering how to find data, or access data you may already be interested in:
Data can only be used for the purpose for which it was collected; any other use is called “secondary purpose” and requires additional consent (see Use and re-use of data section)
Health data used for research purposes needs thought around how the research will be conducted efficiently, ethically, and with privacy and safety front-of-mind.
To understand population trends and context, aggregated data sets and web tools are publicly available via the Ministry of Health and Statistics New Zealand (Stats NZ). Micro data (at the level of the individual) is available to researchers on application to National Collections or Stats NZ for Confidentialised Unit Record Files (CURFs). Dissemination of micro data is in accordance with the Privacy Act, health legislation and contracts and access is strictly controlled according to use.
Useful resources:
Through a review of Aotearoa New Zealand health datasets, PDH produced an interactive and updatable list of data available in New Zealand. This includes data from Figure.nz and many other sources. (Precision Driven Health 2022)
A recent research project which undertook a local algorithm scan produced a whitepaper report on The future of healthcare algorithms in Aotearoa New Zealand (New zealand health sector algorithm scan 2021).
Where can I find health information? (Manatū Hauora): Data sources commonly used when analysing the health of New Zealand populations
Health statistics and data sets (Manatū Hauora)
4.1.1 The Integrated Data Infrastructure
Supported by Statistics NZ, the Integrated Data Infrastructure (IDI) is a repository of individual-level data from multiple government sources, able to be linked together, anonymised, and used for research. It’s a massive resource which is accessible through a prescribed process with very high safety and privacy requirements. It may not be possible to derive individual-level insights from IDI data.
The Virtual Health Information Network includes many researchers who are using the IDI. VHIN’s IDI guides can help you understand what centralised data is available and how it could be used in your project.
If you’re interested in what you could do with the IDI, it’s best to make contact with a researcher who already has experience in using this platform. Refer to the list of research using Stats NZ microdata.
4.2 Understand the data
Knowing all the data you’re able to access is critical to understanding what is possible, in addition to understanding the question or problem you are trying to solve.
Healthcare data is often complex and ‘dirty’ (inaccurate, incomplete or inconsistent). When possible, liaise with analysts who work within the organisation to gain a local understanding of the data.
Health data is often only available after a significant lag time, and with a slow refresh rate. For example, there is a specific chain of events that lead to updates to national health data collections, and this can take a significant amount of time - often many months.
At an early stage, it is valuable to consider:
Data landscape - What data is collected? How is access managed? Does it help address the problem you are trying to solve?
Characteristics - What is the format, type and size of data? When was the data collected? How often will you receive it?
Availability and quality - How much historical data is available, and what is the quality? Use the minimum necessary!
Purpose - What purpose was the data collected for, and does that influence your interpretation?
Consent - Is the use of data covered by existing patient consent?
Data collection, maintenance, publication - Distinguish between data already collected and new data created by the study. How do you plan to maintain and/or publish this data?
Personally identifiable information - Is de-identification required? Who will do this? (See Data identifability)
Data availability - How long will it take to source and are there any reporting or system lags?
Labelling/annotation - Does the project need Human In the Loop (HITL) mechanism for annotating and validating the input and output data?
4.3 Ethics & privacy
Concepts around the legal, privacy, and ethical dimensions of health data projects are often interlinked. Legal perspectives consider the fit of the project with the laws of the data source jurisdiction as well as any legal requirements of the analysis location, if these aren’t the same place. Privacy perspectives are around what data is collected, how it’s collected safely, where it is stored, and its lifecycle. Ethical conduct of a project includes ensuring the risks to participants of data use are outweighed by the benefits of the project.
All research in New Zealand which uses data from humans needs to be undertaken in an ethical manner. In some cases, approval from a registered ethics committee is required before the project can be started. In many cases, the organisation undertaking the research (such as your employer) may also have specific research or ethical approval processes as well.
When dealing with health data, be careful when assessing if a data science project requires ethical approval. Ethical approval is typically required for any evidence-generating studies, or studies which propose changes to current standard of care (before any model is evaluated/validated) and should always be sought prior to accessing patient data. Ethical approval is given through a written application process and gives permission for the research to be conducted by named investigators during a specific time period.
Ethics approvals in New Zealand are provided through committees which have had centralised approval to follow appropriate standards. The ethical standards are set by the National Ethical Standards for Health and Disability Research and Quality Improvement and apply to researchers, health service providers and disability service providers, regardless of whether or not an additional approval process is required. Think of these like a set of best practices for undertaking work with health data or human participants - follow them regardless of whether anyone is requiring you to do so!
Most formal approvals for health data research are provided at a national level by the Health and Disability Ethics Committees (HDEC); in some cases a more localised approval is required instead or in addition (for example, the Auckland Health Research Ethics Committee).
On top of this, hospitals or clinical organisations usually have their own research offices which require separate notification (such as Research Office, Hauora a Toi Bay of Plenty, Te Whatu Ora).
Find out if your study requires HDEC review on the HDEC website.
Many of the questions asked in an ethics application and approval process are important in evaluating the risks and benefits and can indicate if there is social licence for the intended research. This process forces researchers and organisations to consider whether their work has a net benefit for society. The concept of “social license” or societal acceptance for use of health data is a related idea (see Social license - use of health data).
Seek appropriate ethics approvals prior to accessing patient data.
The implications of assuming that ethics is not required can have significant downstream effects on a project, such as reputation issues or delays. HDEC can provide ‘Scope of Review’ services to advise if the research you are doing is considered exempt from requiring an ethics application. Consider seeking written evidence of this as an assurance for stakeholders. See the HDEC website for the current process.
Some ethics applications can take months before an approval is granted, so allow adequate time for this process, factoring in when review meetings are held. It’s also likely that further questions may be asked at this review stage.
Consider if existing patient consent is sufficient to cover use of the data. See Consent below.
Generally all use of administrative health data for research purposes will need to go through the HDEC process. Consent for use of administrative data is mainly around its use for improving the care of that individual within the health service or quality improvement processes (defined as audits or other activities). Unless the work is specifically with business intelligence within a healthcare organisation, you should ask for assurance to confirm if ethical approval from HDEC is required, or not.
In New Zealand, health information follows the Health Information Privacy Code 2020. The rules of this code can be summarised for health agencies as below. These principles are also highly relevant to the use of health data for research purposes.
- Only collect health information if you really need it.
- Get it straight from the people concerned where possible.
- Tell them what you’re going to do with it.
- Be considerate when you’re getting it.
- Take care of it once you’ve got it.
- People can see their health information if they want to.
- They can correct it if it’s wrong.
- Make sure health information is correct before you use it.
- Get rid of it when you’re done with it.
- Use it for the purpose you got it.
- Only disclose it if you have a good reason.
- Make sure that health information sent overseas is adequately protected.
- Only assign unique identifiers where permitted.
Privacy Impact Assessments (PIAs) should also be conducted at an early stage to identify potential data protection risks on the data of the individuals included. Measures should be adopted to eliminate or mitigate risks. The Office of the Privacy Commissioner offers some guidance on PIAs.
Useful resources:
4.3.1 Consent
You need consent before you can use data - particularly data that contains sensitive information. In the General Data Protection Regulation (GDPR), a regulation in EU law on data protection and privacy, consent is defined as:
“any freely given, specific, informed and unambiguous indication of the data subject’s wishes by which he or she, by a statement or by a clear affirmative action, signifies agreement to the processing of personal data relating to him or her.”
Some health providers give patients the opportunity to opt in for their data to be used for research and operational purposes, or to improve their care. Opting in may or may not explicitly allow for the processing of their de-identified data by third parties. More commonly, the patient hasn’t provided explicit consent.
Whether it’s appropriate to use this data will depend on context (for example public good vs. commercial gain), the degree to which the data is anonymised, whether data is provided in aggregated form, and the purpose the data is being used for.
4.3.2 Use and re-use of data
‘Use’ of data relates to using data for the purpose for which it was consented and collected. Re-use of data (or secondary use) is when you use data collected for another purpose.
When you re-use data, you should be mindful of its purpose, coverage, bias, timeliness, and applicability to the secondary use. For instance, the National Minimum Data Set is gathered for policy formation, performance monitoring, research and review. It may be useful for understanding hospitalisations, but does not provide a complete picture of an individual’s health journey.
4.4 Data sovereignty
Data sovereignty refers to the understanding that data is subject to the laws of the nation within which it is collected and stored. In New Zealand, there is also a focus on where the data is stored and processed. Data agreements should take care to address these points so they are clear to all parties. Data handling and ethical dimensions of the project also need to ensure data sovereignty is at front of mind.
Māori data sovereignty recognises that Māori data should be subject to Māori governance. Māori data sovereignty supports tribal sovereignty and the realisation of Māori and iwi aspirations. Māori must be included in any work about Māori data. An equity lens is required and ideally should include Māori in the research team as well as in external review/advisor roles.
Useful resources:
Te Mana Raraunga (Māori Data Sovereignty Network) has helpful resources and guidance.
The Health Research Council of NZ (HRC) also produces Guidelines for Researchers on Health Research Involving Māori.
NZ government’s Principles for the safe and effective use of data and analytics (Government Chief Data Steward and Office of the Privacy Commissioner 2018)
4.3.3 Social license - use of health data
Social licence is the implied permission to make decisions about the management and use of the public data in a way that ensures trust and confidence in the way that data is managed. Organisations who store health data are stewards.
Any use of data, outside of that which has been explicitly consented, is subject to ethics and requires consideration of social licence.
Conceptually, the level of engagement required to gain social license is higher for decisions that are more likely to affect an individual and are fully automated, compared to those that are manual and affect an entire population. Social licence isn’t ‘gained’ or ‘approved’; organisations need to gauge people’s thoughts, feelings, perceptions on the use of their data. These attitudes are dynamic and constantly evolving.
Assurance to stakeholders should be provided through transparency and governance to limit reputational harm, particularly when using patient data in a commercial context. Consider what data you’re handling, how identifiable it is and if patient consent is specifically required. This is particularly important for data that may be re-used for secondary purposes - the original consent should be carefully reviewed.
Data should be used for public good in a way that is equitable, with consideration for any unintended consequences, such as increased clinician workload (due to workflow disruption), potential for misuse of the model, and perpetuating biases that exist in the data. Data needs to be treated with care and a suitable data management plan can help.
In our experience, privacy, confidentiality, security, transparency, communication and the purpose for which data is used, are often raised as concerns by clinicians, administrators, legal experts, and other stakeholders.
Useful resources:
A Path to Social Licence Guidelines for Trusted Data Use - Data Futures Partnership 2017 (PDF)
Manatū Hauora research on social licence for health data re-use