Finnish health data infrastructure is among the most comprehensive in the world. Kanta, THL registries, and Kela compensation data form one of the broadest national health data repositories in existence. Yet the same system routinely produces situations where an individual patient's complete clinical picture — multiple chronic conditions, medication regimens, diagnostic findings — does not coalesce into a coordinated care decision. This paper argues that this is not a technical failure but a structural one: an integration gap between data collection and clinical utility. The Health Data Continuity Index (HDCI) is introduced as a constructed measurement instrument for this gap. HDCI does not describe how sick a population is — it describes how well the system converts what it knows into what it does.
Health system performance is conventionally measured through cost efficiency, queue lengths, and population-level morbidity statistics. These metrics describe the system as a production process — a ratio of inputs to outputs. This framing systematically omits one critical variable: how much of the collected health data converts into coordinated, integrated care.
Finland is internationally exceptional in health data coverage. Yet this same system routinely produces a structurally predictable failure: the patient with multiple simultaneous conditions, several medications from different prescribers, chronic findings across multiple specialties — whose complete picture no single clinician holds or is prompted to synthesise. Data exists for each component. The integration does not occur.
This is not a data volume problem. It is not a digitisation problem. It is an integration gap — a structural discontinuity between what the system records and what it acts upon.
The analogy to ACI's energy analysis is precise: the Nordic power system does not lack capacity data — it lacks instruments that make capacity adequacy visible under compound stress. The Finnish health system does not lack patient data — it lacks instruments that make integration failure visible under multi-morbidity conditions.
HDCI is the instrument for the second problem.
Health Data Continuity Index is a constructed index: it derives new indicators from existing data sources rather than aggregating pre-existing official metrics. This is methodologically equivalent to ACI's energy instruments (EPP, FS(p), SP_cluster) — which also derive new diagnostic quantities from publicly available grid and market data.
HDCI consists of five components:
DCI (Data Capture Index). Measures the coverage of recorded health data relative to the documentation expected given the population's morbidity profile.
IAI. Measures the fraction of care contacts in which multi-specialty integration is achieved for patients with multiple simultaneous conditions. This is HDCI's central component: a direct measure of coordination rather than volume.
RVI. Measures the interval between the initial clinical finding and the first responsive care action. It is defined from the first recorded finding (diagnosis code, laboratory result, risk flag) rather than from the referral letter, so that it captures the full latency of institutional response.
RKI. Measures the prevalence of uncoordinated polypharmacy: patients with five or more medications from two or more prescribers and no shared care plan entry.
PAI (Prevention Activity Index). Measures the fraction of total care contacts that are preventive in nature: screenings, risk assessments, lifestyle counselling.
HDCI_v1 is a three-component core index. RKI_adjusted, IAI, and RVI are the three components with direct outcome validation pathways and the lowest endogeneity risk. DCI and PAI are retained as diagnostic side-indicators but excluded from the composite (see §6).
DCI (Data Capture Index) has a structural endogeneity problem — it shares causal ancestors with the care contact rate it uses as a predictor. It is reported separately as a documentation completeness indicator, not included in HDCI_v1. PAI (Prevention Activity Index) is an input-effort indicator that measures care system activity, not integration effectiveness. It is reported as dashboard metadata.
All HDCI_v1 weights are Bayesian priors subject to outcome-constrained empirical re-estimation. The pilot (DT-006) will report both prior-weighted and outcome-optimised HDCI_v1.
HDCI is not a theoretical proposal. The data required to calculate it exists. The obstacle is access structure, not data existence.
| Component | Data Source | Availability |
|---|---|---|
| DCI | THL Avohilmo · Sotkanet Morbidity Index | Partial — morbidity index open; Kanta entry volume requires Findata licence |
| IAI | THL Avohilmo + Terveys-Hilmo (linked) | Available — requires patient-level record linkage |
| RVI | THL Hilmo · hoitoonpääsyn seuranta | Available — referral and procedure timestamps linkable |
| RKI | Kela lääkekorvausrekisteri | Available — prescriber source included; requires Findata request |
| PAI | THL Avohilmo (procedure codes) | Available — requires preventive code classification |
The structural situation parallels ACI's energy analysis: Fingrid's production data is open, while NVE's reservoir data requires API access and proxy construction. In HDCI's case, all required data exists within Finnish health registries, but patient-level linkage requires formal secondary-use licensing through Findata.
THL and Kela jointly produce the Kansallinen terveysindeksi — a composite index combining records from THL, Kela, the Pension Insurance Centre, and Statistics Finland. This demonstrates that multi-registrar health data linkage is already operationally established in Finland. HDCI does not require a new technical or legislative framework. It requires a decision to calculate a different quantity from an already-functioning linkage infrastructure.
The National Health Index measures how sick the population is. HDCI measures how well the system responds to what it knows about the sick population. Both use the same data infrastructure. Neither replaces the other.
THL is prepared to assume responsibility for the Finnish Health Data Space (FHDS) data governance and licensing function. FHDS will consolidate health data infrastructure, licensing, and research facilitation — and enable more individually tailored care, cost containment, and accelerated research.
HDCI is a natural first use case for FHDS. It does not require new legislation, new agencies, or new data collection. It requires a decision: calculate the integration index from already-available registry linkages and publish it as part of national health system monitoring.
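Theoretical weights are calibrated against known outcome indicators: preventable hospitalisations (H1), adverse medication events (H2), and inter-municipality variation (H3).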
HDCI operates at two levels with different temporal properties:
Registry-level HDCI — calculated from FHDS-linked THL and Kela registries. Suitable for population-level monitoring and inter-welfare-area comparison. Update frequency: quarterly to annual. This level is operable now with Findata licensing.
Field-level HDCI — a conceptual architecture in which HDCI components are calculated in near-real-time from continuously collected individual-level data, processed locally on a sovereign device without requiring central registry access. This is the operating model proposed in the AetherOne™ / Debian Quantum AI framework: explainable, offline-capable, auditable health intelligence at the point of care. This level describes a design target for future development, not a deployed capability.
HDCI's introduction is not a measurement addition — it is a framing shift. The current welfare system governance framework relies on three primary performance measures:
| Current metric | What it measures | What it omits |
|---|---|---|
| Cost per capita | Total expenditure | Whether expenditure reflects integration or fragmentation |
| Queue length | Access latency | Whether access leads to coordinated or siloed care |
| Morbidity index | Population illness burden | How well the system responds to known illness burden |
These metrics describe the system as a production process. HDCI describes it as a coordination process. The distinction matters because a system can score well on all three current metrics while systematically failing to integrate care for its most complex patients.
| Actor | Role under HDCI framework |
|---|---|
| THL | Calculate and publish HDCI from registry linkages; maintain calibration methodology |
| Kela | Provide RKI data; target polypharmacy coordination services to high-RKI populations |
| Welfare areas | Use HDCI for internal integration diagnostics; identify lowest-IAI patient populations |
| STM | Include HDCI in welfare area performance monitoring alongside cost and queue metrics |
HDCI also makes visible a structural problem in the current governance logic: welfare areas that achieve integration and thereby reduce future care episodes receive less future funding. The incentive structure penalises the outcome HDCI is designed to encourage. Integration accounting — measuring coordination rather than consumption — is a prerequisite for incentive reform.
Integration failure is not a Finnish peculiarity — it is a structural property of health systems that evolved from episodic acute-care models into complex multi-morbidity environments. What is comparatively unusual about Finland is the gap between data richness and integration measurement.
| Country | Integration measurement | Gap |
|---|---|---|
| Sweden | Vårdanalys produces systematic integration assessments; sammanhållen vård is a monitored indicator. Information-Driven Healthcare programme (Vinnova 2019–2024) united all 21 regional health authorities around shared AI-assisted data use — the closest Nordic equivalent to HDCI in practice. | No composite integration index comparable to HDCI — but institutional infrastructure for one is already operational. |
| Denmark | Nationale Kvalitetsprogram (launched 2016) sets eight national healthcare goals agreed between government, Danske Regioner, and KL. One goal is explicitly sammenhængende patientforløb — coordinated patient pathways. Annual status reports use traffic-light indicators per region against each goal. Learning and quality teams operate across regional and municipal boundaries to spread coordinated care practices. | Region-level tracking of coordination indicators exists, but no composite integration index. Indicators are measured separately, not aggregated into a single integration score. |
| Netherlands | Bundled payments for diabetes, COPD, and vascular risk management were introduced in 2010. Health insurers pay a single annual fee to a care group (typically 4–150 GPs in a region) covering all primary care for the condition. Initial evaluation found improved organisation and coordination of care, better collaboration, and better adherence to care protocols. | Two structural limitations have emerged: (1) costs increased 13–52% per disease group over seven years, and savings in secondary care were not recaptured; (2) the model is disease-specific and works less well for multimorbid patients, whose care crosses disease-bundle boundaries. The Netherlands is now developing person-centred bundled payments combined with shared savings to address multimorbidity, moving toward whole-system integration logic. |
| Estonia | X-tee platform enables real-time data exchange between health actors | Infrastructure without integration measurement |
| Finland | World-class registry coverage; Kansallinen terveysindeksi established | No integration measurement instrument — HDCI fills this gap |
HDCI's IAI, RVI, and RKI components directly correspond to OECD Patient Safety and Quality care coordination indicators. A Nordic HDCI comparison — calculated from Finnish, Swedish, and Danish registry data — would constitute the first composite integration index in the Nordic health policy space.
The Netherlands experience is particularly instructive for HDCI's design rationale. Bundled payments improved coordination within disease categories but produced cost increases and struggled with multimorbid patients — precisely the population where integration failure is most consequential. The Dutch are now evolving toward whole-system person-centred payment. HDCI's architecture starts where the Dutch are heading: a whole-system integration index covering all multi-morbid patients, not organised by disease category. This is not a coincidence — it reflects the same structural finding that disease-specific integration instruments are insufficient for the multi-morbidity challenge.
The structural analogy to ACI's energy analysis extends to the health domain. Just as Stockholm Exergi validated the SGFA energy architecture by making a final investment decision in March 2025, Sweden has already deployed the institutional infrastructure that HDCI requires — before Finland has formally decided to build it.
Vårdanalys — Sweden's independent health analysis authority — produces systematic annual assessments of care integration, including sammanhållen vård (coordinated care), transitions between care levels, and polypharmacy management. These are the IAI, RVI, and RKI components of HDCI, measured separately rather than as a composite. Vårdanalys demonstrates that independent, systematic integration measurement is institutionally achievable within a Nordic health system — without requiring new data collection, only a mandate and an analytical framework.
Information-Driven Healthcare (Vinnova, 2019–2024) — a national programme that united all 21 Swedish regional health authorities with academia and industry to deploy AI-assisted care at scale. The programme's explicit objective — offering more information-driven, personalised, and scalable healthcare by systematically utilising all existing data — is precisely the institutional commitment that HDCI operationalises as a measurement. Sweden built the institutional coalition first, then deployed the tools. The programme is now concluded; its infrastructure remains.
Five structural observations emerge from this analysis:
1. Data existence does not imply data use. Finland's registry infrastructure is internationally exceptional; its integration measurement is not. The gap is structural, not technical.
2. HDCI is calculable. All five components have identified data sources within existing Finnish health registries. The path from theoretical framework to empirical index requires Findata licensing, not new data collection.
3. FHDS is the institutional home. THL's readiness to assume FHDS governance creates a natural institutional location for HDCI — an index calculated from the same registry linkages FHDS will facilitate.
4. Current metrics are insufficient but not wrong. Cost, queue, and morbidity metrics are valid. They are insufficient because they do not measure the coordination dimension. HDCI complements rather than replaces existing monitoring.
5. The incentive structure requires reform. Integration accounting is necessary but not sufficient — incentive reform must follow. Measuring integration is the diagnostic precondition for correcting the governance dynamics that currently penalise it.
| Priority | Description | Timeline |
|---|---|---|
| 1. Pilot calculation | HDCI calculated for one welfare area from existing registry data; correlation with known outcome measures validated | 6–12 months |
| 2. Weight calibration | Prior weights (HDCI_v1: 0.40 RKI_adjusted / 0.35 IAI / 0.25 RVI) calibrated empirically against preventable hospitalisation and adverse medication event data | 12–18 months |
| 3. FHDS integration | HDCI calculation documented as FHDS first use case; technical specification for registry linkage published | 12–24 months |
| 4. Nordic comparison | HDCI calculated comparably from Finnish, Swedish, and Danish registries; first Nordic integration index produced | 24–36 months |
| 5. Incentive modelling | HDCI-linked welfare area funding adjustment modelled, analogous to the Dutch bundled payment model but applied to integration achievement | 36–48 months |
HDCI is not a solution. It is a diagnostic instrument. Its purpose is to make visible what the system already knows but does not currently measure: how much of its data becomes care.
HDCI is a constructed measurement instrument. Its analytical value depends on whether its components measure what they claim to measure, and whether observed variation reflects genuine integration differences rather than artefacts of data structure or documentation practice. This section identifies the principal validity threats and specifies how they are addressed in the pilot design.
DCI has an endogeneity problem that must be stated explicitly: morbidity drives care contacts, which drive Kanta entries, which constitute the DCI numerator. The same causal structure produces both the predictor and the outcome. Without an exogenous instrument, DCI risks becoming a residual measure of documentation intensity rather than a measure of data capture quality.
The pilot addresses this by adding Avohilmo contact rate as an exogenous instrument in the expected entry model:
expected_entries_i = f(morbidity_index, age, sex, Avohilmo_contact_rate_i)
Avohilmo contact rate is independently measured and separates care contact frequency (how often the patient enters the system) from Kanta entry rate (how often those contacts produce structured records). A patient with high contact rate but low Kanta entry rate has a genuine documentation gap — not a care frequency artefact. A patient with low contact rate and proportionally low Kanta entries does not. This distinction is what DCI is intended to capture.
Model specification: negative binomial regression (appropriate for count data with overdispersion). Sensitivity analysis: Poisson specification and stratum-mean alternative. If rankings across specifications are unstable, DCI requires further instrument development before use as a composite component.
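The stratum-mean alternative mentioned above can be sketched with the Python standard library. This is an illustrative sketch under stated assumptions, not the pilot's model: the record type and field names are hypothetical, strata are formed from morbidity band, age band, and sex, and the Avohilmo contact count enters by scaling a reference stratum's entries-per-contact rate to each patient's own contacts. The negative binomial specification would replace this stratum mean with a fitted count model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientPeriod:
    """Hypothetical per-patient summary for one reporting period."""
    morbidity_band: str
    age_band: str
    sex: str
    avohilmo_contacts: int  # recorded care contacts in the period
    kanta_entries: int      # structured Kanta entries in the period

    def stratum(self) -> tuple:
        return (self.morbidity_band, self.age_band, self.sex)


def entries_per_contact(reference: list) -> dict:
    """Stratum-mean Kanta entries per care contact in a reference population."""
    entries = defaultdict(int)
    contacts = defaultdict(int)
    for p in reference:
        entries[p.stratum()] += p.kanta_entries
        contacts[p.stratum()] += p.avohilmo_contacts
    return {s: entries[s] / contacts[s] for s in contacts if contacts[s] > 0}


def dci(target: list, rates: dict) -> float:
    """DCI = actual entries / expected entries for a target population.

    Expected entries = stratum rate x the patient's own contact count, so a
    patient who is rarely seen is not expected to generate many entries.
    Patients whose stratum has no reference rate are skipped.
    """
    actual = 0.0
    expected = 0.0
    for p in target:
        rate = rates.get(p.stratum())
        if rate is None:
            continue
        actual += p.kanta_entries
        expected += rate * p.avohilmo_contacts
    return actual / expected if expected > 0 else float("nan")


# Made-up numbers: the reference stratum documents 1 entry per contact.
reference = [
    PatientPeriod("high", "65-74", "F", avohilmo_contacts=10, kanta_entries=8),
    PatientPeriod("high", "65-74", "F", avohilmo_contacts=10, kanta_entries=12),
]
target = [PatientPeriod("high", "65-74", "F", avohilmo_contacts=10, kanta_entries=5)]
rates = entries_per_contact(reference)
print(dci(target, rates))  # 5 actual vs 10 expected entries -> 0.5
```

Computed on the reference population itself, the ratio is 1 by construction, which is why this sketch estimates rates on one population and applies them to another (for example, a sub-region compared against its wider welfare area).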
IAI is a structural proxy for integration, not a clinically invariant measure. The 30-day episode window treats all care episodes as having the same temporal integration structure. This is false in clinical reality: a post-surgical patient may require multi-specialty contact within days; a stable diabetic may have meaningful integrated episodes spanning months; a psychiatric patient's integration needs operate on a different timescale entirely. A single window cannot be neutral across these clinical contexts.
IAI should therefore be interpreted as a system-level diagnostic, not as a patient-level truth claim about whether care was integrated. What it measures is whether the system produces multi-specialty contact within a defined time window — a structural observation about system organisation, not about care quality.
Sensitivity analysis: 15, 30, and 90-day windows. If sub-region IAI rankings are stable across windows, the 30-day default is robust as a structural indicator. If rankings change materially, disease-stratified analysis is required and IAI reverts to a domain-specific measure rather than a composite component. This is a methodological finding, not a failure of the instrument.
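One way to run the window sensitivity check is to compute IAI per sub-region at each window and compare the resulting rankings with a rank correlation. The sketch below is a hypothetical illustration: it operationalises IAI as the share of episodes involving two or more distinct specialties, opens an episode at a contact and closes it once the window has elapsed, and uses Spearman's rank correlation as the stability measure. None of these choices are specified by the pilot design, and the sub-region data are invented.

```python
from math import sqrt


def episodes(contacts: list, window_days: int) -> list:
    """Group one patient's (day, specialty) contacts into episodes.

    Assumption: an episode opens at a contact and absorbs every later contact
    falling within window_days of the opening day.
    """
    result = []
    start = None
    for day, specialty in sorted(contacts):
        if start is None or day >= start + window_days:
            result.append(set())
            start = day
        result[-1].add(specialty)
    return result


def iai(patients: dict, window_days: int) -> float:
    """Share of episodes that involve two or more distinct specialties."""
    eps = [ep for c in patients.values() for ep in episodes(c, window_days)]
    if not eps:
        return float("nan")
    return sum(1 for ep in eps if len(ep) >= 2) / len(eps)


def average_ranks(values: list) -> list:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: list, y: list) -> float:
    """Spearman rank correlation as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx > 0 and sy > 0 else float("nan")


# Hypothetical sub-regions: patient id -> list of (day, specialty) contacts.
regions = {
    "A": {"p1": [(0, "gp"), (10, "cardio"), (40, "gp"), (50, "endo")]},
    "B": {"q1": [(0, "gp"), (20, "cardio")]},
    "C": {"r1": [(0, "gp"), (5, "gp")]},
}
names = sorted(regions)
by_window = {w: [iai(regions[n], w) for n in names] for w in (15, 30, 90)}
for a, b in [(15, 30), (30, 90)]:
    print(a, b, spearman(by_window[a], by_window[b]))
```

In this toy data, region B's two contacts fall into one integrated episode at 30 days but into two single-specialty episodes at 15 days, so the 15/30 rank correlation drops to 0.5 while the 30/90 correlation stays at 1.0: the kind of ranking change the sensitivity analysis is meant to detect.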
RKI is the strongest HDCI component — prescriber multiplicity combined with absence of a shared care plan is the most direct structural proxy for coordination failure available in Finnish registry data. However, its validity depends on Kanta care plan adoption rates, which vary by municipality and care setting and are not under patient or clinician control.
The pilot will report two RKI values:
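RKI_raw = N_risk / N_poly
RKI_adjusted = RKI_raw / Kanta_care_plan_adoption_rate
where N_poly is the number of patients with five or more medications and N_risk is the subset of those with two or more prescribers and no shared care plan entry.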
RKI_adjusted isolates genuine coordination failure from documentation infrastructure variation. A high RKI_adjusted in a care setting with high Kanta adoption is a stronger signal of coordination failure than a high RKI_raw in a setting with low adoption. Both are reported; composite HDCI uses RKI_adjusted.
Cross-validation against adverse medication event outcomes (H2) will test whether RKI_adjusted captures genuine coordination failure: if RKI_adjusted predicts adverse events independently of adoption rate, the adjustment is valid. If residual adoption-rate effects persist after adjustment, further modelling is required.
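Both RKI values can be computed directly from per-patient medication summaries. The sketch below assumes a hypothetical record type holding each prescriber's drug codes and a shared-care-plan flag, counts distinct drugs across prescribers, and takes the Kanta care plan adoption rate as an externally supplied number for the care setting. It implements the two formulas as stated; note that RKI_adjusted is not bounded by 1 when the adoption rate is lower than RKI_raw.

```python
from dataclasses import dataclass


@dataclass
class MedicationProfile:
    """Hypothetical per-patient summary; not a Kela or Kanta register schema."""
    drugs_by_prescriber: dict  # prescriber id -> set of drug codes
    has_shared_care_plan: bool

    def n_drugs(self) -> int:
        # Distinct drugs across all prescribers (a drug from two prescribers counts once).
        return len(set().union(*self.drugs_by_prescriber.values()))

    def n_prescribers(self) -> int:
        return sum(1 for drugs in self.drugs_by_prescriber.values() if drugs)


def rki(profiles: list, adoption_rate: float) -> tuple:
    """Return (RKI_raw, RKI_adjusted) as defined in the text."""
    if not 0 < adoption_rate <= 1:
        raise ValueError("adoption_rate must be in (0, 1]")
    poly = [p for p in profiles if p.n_drugs() >= 5]  # N_poly
    if not poly:
        return float("nan"), float("nan")
    at_risk = [p for p in poly
               if p.n_prescribers() >= 2 and not p.has_shared_care_plan]  # N_risk
    raw = len(at_risk) / len(poly)
    return raw, raw / adoption_rate


profiles = [
    MedicationProfile({"A": {"d1", "d2", "d3"}, "B": {"d4", "d5"}}, False),  # at risk
    MedicationProfile({"A": {"d1", "d2", "d3", "d4", "d5"}}, False),         # one prescriber
    MedicationProfile({"A": {"d1", "d2"}, "B": {"d3", "d4", "d5"}}, True),   # shared plan
    MedicationProfile({"A": {"d1", "d2"}}, False),                          # not polypharmacy
]
raw, adjusted = rki(profiles, adoption_rate=0.5)
print(raw, adjusted)
```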
RVI measures the latency between the system's first recorded awareness of a clinical finding and its first recorded response, scaled so that a shorter latency gives a higher score. It is a measure of institutional throughput (how quickly the system converts observation into action), not of clinical appropriateness. A high RVI (fast response) may reflect efficient care, aggressive diagnostic coding, or high system sensitivity to marginal findings. These are not equivalent.
The "first finding" is not a neutral observation: it depends on the testing frequency, coding practices, and diagnostic sensitivity of the care setting. A care setting that orders more tests will record "first findings" earlier, which mechanically lengthens the measured latency and lowers RVI independently of actual response quality. RVI should be interpreted as a system-level indicator of response velocity, not as a patient-level measure of care quality.
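A minimal sketch of the per-finding RVI calculation, using RVI = 1 − latency / max_latency_threshold from the formula summary. The 180-day threshold, the capping of long latencies, the zero score for findings with no recorded action, and the mean as the population aggregate are all illustrative assumptions rather than pilot specifications. The two usage lines hold the action date fixed and move only the recorded finding date, which is the route by which testing intensity enters RVI.

```python
from datetime import date
from statistics import mean
from typing import Optional


def rvi(first_finding: date, first_action: Optional[date],
        max_latency_days: int = 180) -> float:
    """RVI = 1 - latency / max_latency_threshold for one finding.

    Assumptions (not from the pilot design): latencies beyond the threshold
    are capped so RVI stays in [0, 1], and a finding with no recorded action
    scores 0.
    """
    if first_action is None:
        return 0.0
    latency = (first_action - first_finding).days
    if latency < 0:
        raise ValueError("first action precedes first recorded finding")
    return 1.0 - min(latency, max_latency_days) / max_latency_days


def population_rvi(pairs: list, max_latency_days: int = 180) -> float:
    """Mean RVI over (first_finding, first_action) pairs."""
    return mean(rvi(f, a, max_latency_days) for f, a in pairs)


action = date(2024, 1, 31)
print(rvi(date(2024, 1, 1), action))   # 30-day latency
print(rvi(date(2023, 12, 2), action))  # same action, finding recorded 30 days earlier
```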
HDCI does not assume that "integration" is a unitary latent variable. Integration in healthcare is multidimensional: information flow, care responsibility coordination, and clinical decision consistency are distinct constructs that do not necessarily co-vary. HDCI aggregates observable proxies for these dimensions into a single diagnostic score. It is a composite proxy index, not a structural equation model for integration as a latent entity.
This distinction matters for interpretation: HDCI scores are diagnostic signals about where integration-relevant observables are weakest, not measurements of an underlying integration quantity. The instrument is analogous to a credit score or an air quality index — practically useful as a signal, but not a direct measurement of the phenomenon it proxies.
Based on the identifiability analysis above, weights are revised as follows:
HDCI = 0.15·DCI + 0.30·IAI + 0.25·RVI + 0.25·RKI + 0.05·PAI
Changes from v0.5: RKI weight increased 0.20 → 0.25 (strongest component, direct outcome validation pathway). PAI weight reduced 0.10 → 0.05 (input-effort indicator only; does not measure prevention effectiveness). PAI is retained as a process indicator but its composite contribution is minimised to reflect its limited outcome validity.
All weights remain Bayesian priors subject to outcome-constrained re-estimation. The pilot will report both prior-weighted HDCI (v0.6 weights) and outcome-optimised HDCI. Divergence between the two is a finding requiring examination, not a failure.
All five components are derived from administrative documentation systems. A welfare area with higher documentation quality will score higher on HDCI independently of care integration quality. Three mitigations are built into the pilot design: outcome validation tests real health effects; within-Pohjois-Savo comparison controls for regional documentation culture; DCI uses Avohilmo contact rate as an exogenous instrument to separate documentation completeness from care frequency. Residual documentation bias cannot be eliminated — it must be acknowledged in all publications.
The following directed acyclic graph makes explicit the causal relationships between inputs, components, and outcomes. It identifies the endogeneity risks and the exogenous instruments used to address them.
EXOGENOUS INPUTS
─────────────────────────────────────────────────────
Morbidity (Sotkanet)            Age / Sex
          │                         │
          └────────────┬────────────┘
                       ▼
                 Care Contacts  ◄── Avohilmo Contact Rate
                 (endogenous)       (exogenous instrument)
                       │
          ┌────────────┼────────────┐
          ▼            ▼            ▼
        Kanta     Multi-spec  First finding
       entries     episodes  → First action
        (DCI)        (IAI)        (RVI)
          │                         │
          └────────────┬────────────┘
                       │
          ┌────────────┤
          ▼            ▼
     Prescriber   Kanta care
    multiplicity  plan entries
          │            ▼
          └───────► RKI_raw
                       │
     Kanta adoption ───┤
     rate (control)    ▼
                 RKI_adjusted ◄── strongest component
PREVENTIVE
contacts / total ─► PAI (input-effort only)
COMPOSITE
─────────────────────────────────────────────────────
HDCI = 0.15·DCI + 0.30·IAI + 0.25·RVI + 0.25·RKI_adj + 0.05·PAI
VALIDATION OUTCOMES
─────────────────────────────────────────────────────
H1: Preventable hospitalisations  ◄── HDCI composite
H2: Adverse medication events     ◄── RKI_adjusted (primary)
H3: Inter-municipality variation  ◄── HDCI components
The DAG makes visible the two main endogeneity risks: (1) DCI shares causal ancestors with care contact rate — addressed by the Avohilmo instrument; (2) RKI shares causal ancestors with Kanta adoption rate — addressed by the adjusted/raw decomposition. RVI's endogeneity (diagnostic system sensitivity affecting "first finding") is acknowledged but not fully resolved in v0.6 — it is a constraint on RVI interpretation, not a reason to exclude it.
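The DAG can also be encoded as data so that ancestor sets are computed rather than read off the diagram. The sketch below is a simplified, hypothetical transcription: node names are shortened, the edges into prescriber multiplicity and care plan entries are attached to care contacts as an assumption, and `graphlib.TopologicalSorter` from the Python standard library is used only to confirm that the graph has no cycles.

```python
from graphlib import TopologicalSorter

# Each node maps to its direct causes (parents), simplified from the DAG.
PARENTS = {
    "care_contacts": {"morbidity", "age_sex", "avohilmo_contact_rate"},
    "kanta_entries": {"care_contacts"},
    "multispecialty_episodes": {"care_contacts"},
    "finding_to_action": {"care_contacts"},
    "prescriber_multiplicity": {"care_contacts"},  # assumed edge
    "care_plan_entries": {"care_contacts"},        # assumed edge
    "DCI": {"kanta_entries"},
    "IAI": {"multispecialty_episodes"},
    "RVI": {"finding_to_action"},
    "RKI_raw": {"prescriber_multiplicity", "care_plan_entries"},
    "RKI_adjusted": {"RKI_raw", "kanta_adoption_rate"},
    "PAI": {"preventive_contacts"},
    "HDCI": {"DCI", "IAI", "RVI", "RKI_adjusted", "PAI"},
}


def ancestors(node: str, parents: dict = PARENTS) -> set:
    """All direct and indirect causes of a node."""
    seen = set()
    stack = list(parents.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


# static_order() raises graphlib.CycleError if the graph is not acyclic.
order = list(TopologicalSorter(PARENTS).static_order())

# Endogeneity risk (1): DCI and care contacts share causal ancestors.
shared = ancestors("DCI") & ancestors("care_contacts")
print(sorted(shared))
```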
DCI = (actual_kanta_entries) / (expected_kanta_entries)
expected = f(morbidity_index, age, sex)
Source: Sotkanet 5642 · THL Avohilmo · Findata licence
IAI = (multi_specialty_entries_per_episode) / (total_entries)
Source: THL Avohilmo + Terveys-Hilmo (patient-level linkage)
RVI = 1 - (latency_days / max_latency_threshold)
latency = first_care_action - first_recorded_finding
Source: THL Hilmo · Avohilmo timestamps
RKI = (patients: ≥5 drugs, ≥2 prescribers, no shared care plan) /
(patients: ≥5 drugs)
Source: Kela lääkekorvausrekisteri · Kanta care plan register
PAI = (preventive_contacts) / (total_contacts)
preventive = screenings + risk assessments + lifestyle counselling
Source: THL Avohilmo procedure codes
Weight: 0.05 (input-effort indicator only)
RKI_raw = (patients: ≥5 drugs, ≥2 prescribers, no shared care plan) /
(patients: ≥5 drugs)
RKI_adjusted = RKI_raw / Kanta_care_plan_adoption_rate
Use RKI_adjusted in composite
HDCI_v1 = 0.40·RKI_adjusted + 0.35·IAI + 0.25·RVI
v0.7 three-component core — Bayesian priors subject to outcome-constrained re-estimation
DCI = actual_Kanta_entries / expected_Kanta_entries [side indicator, not in composite]
PAI = preventive_contacts / total_contacts [dashboard metadata, not in composite]
v0.7 revision: DCI and PAI removed from composite due to endogeneity (DCI) and input-effort-only status (PAI). HDCI_v1 is the three-component integration stress index. DCI and PAI reported separately. See §6 for full identifiability analysis.
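A minimal sketch of the prior-weighted composite, assuming the component values have already been computed for one welfare area (the values below are invented). It reproduces the weighted sums exactly as written for HDCI_v1 and for the earlier v0.6 five-component weighting, so that the two can be reported side by side. As written, RKI_adjusted (where higher means more uncoordinated polypharmacy) enters with a positive weight alongside IAI and RVI (where higher means more integration and faster response); the sketch keeps the stated formula and does not reorient components.

```python
import math

# Prior weights from the formulas above; both vectors sum to 1.
HDCI_V1_PRIORS = {"RKI_adjusted": 0.40, "IAI": 0.35, "RVI": 0.25}
HDCI_V06_PRIORS = {"DCI": 0.15, "IAI": 0.30, "RVI": 0.25,
                   "RKI_adjusted": 0.25, "PAI": 0.05}


def composite(components: dict, weights: dict) -> float:
    """Weighted sum of component values, as written in the HDCI formulas.

    Component values are used as supplied; no rescaling or reorientation
    is applied.
    """
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1")
    missing = set(weights) - set(components)
    if missing:
        raise KeyError(f"missing components: {sorted(missing)}")
    return sum(w * components[name] for name, w in weights.items())


# Hypothetical component values for one welfare area.
area = {"RKI_adjusted": 0.5, "IAI": 0.4, "RVI": 0.8, "DCI": 0.9, "PAI": 0.2}
print(composite(area, HDCI_V1_PRIORS))   # prior-weighted HDCI_v1
print(composite(area, HDCI_V06_PRIORS))  # v0.6 five-component weighting
```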