How can machine learning serve patients and enable value-based healthcare?

18 May 2021

Machine learning to identify and characterise metastatic breast cancer patients in Sweden: a population-based study

The application of artificial intelligence (AI), and specifically machine learning, in scientific research has become increasingly prevalent. Its uses range from processing large amounts of data and making accurate predictions to helping scientists make discoveries more efficiently. Recently, machine learning was applied as a novel approach in an epidemiological study, an approach that holds promise for the future of precision medicine.

In Sweden, a consortium of researchers from Uppsala and Örebro University, PAREXEL International, Nordic Market Access AB, and Novartis Oncology joined forces to use machine learning in a study of metastatic breast cancer (mBC). Treatment decisions for mBC are increasingly complex, and the patient subset is poorly characterised despite the availability of national health registers. As a result, it has traditionally been difficult to identify the true prevalence and characteristics of mBC patients at the national level. This is mostly attributed to the lack of variables or missing information on recurrence from early to late disease. This gap is a barrier to the timely identification of the patients who would benefit most from new, innovative therapies.

To address this problem, researchers developed an algorithm trained to identify mBC patients in Swedish national health registers. The aim was to estimate the number of mBC patients and to describe their characteristics and survival outcomes. To do so, researchers conducted a retrospective database study on Swedish national data (National Patient Register, Prescribed Drug Register, Cancer Register and the Cause of Death Register), which were linked with metastatic status, outcome and biomarker data from a regional breast cancer register (Uppsala University Hospital) via a unique personal identification number (PIN). These national registers contain complete data on all Swedish residents, and the PIN is used throughout the Swedish system, enabling direct data linkage whilst keeping the data anonymised. The study included patients with a breast cancer diagnosis during 2009-2016.
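The linkage step described above can be pictured with a minimal sketch. It assumes each register is a list of records carrying a shared key field; the function name `link_registers`, the field names and the toy records are all hypothetical and are not taken from the study or from any real register.

```python
from collections import defaultdict


def link_registers(registers):
    """Group records from several registers under their shared key.

    `registers` maps a register name to a list of record dicts, each
    holding a "pin" field (a stand-in for the linkage key).
    """
    linked = defaultdict(dict)
    for register_name, records in registers.items():
        for record in records:
            key = record["pin"]
            # Keep every field except the key itself.
            fields = {k: v for k, v in record.items() if k != "pin"}
            linked[key][register_name] = fields
    return dict(linked)


# Made-up toy records, for illustration only.
registers = {
    "cancer_register": [
        {"pin": "A1", "diagnosis_year": 2010},
        {"pin": "B2", "diagnosis_year": 2014},
    ],
    "regional_register": [
        {"pin": "A1", "metastatic": True},
    ],
}

linked = link_registers(registers)
for key, data in linked.items():
    print(key, data)
```

In this toy example only patient "A1" appears in the regional register, so only that patient carries a known metastatic status; patient "B2" has national data alone.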

The results of the study showed that it was possible to describe the epidemiology and survival of the full national Swedish mBC population. Although previous studies have used machine learning for cancer detection and prognostication, this is the first study to apply machine learning algorithms to identify patient subsets in national population health registers. The study found that between 2009 and 2016 there were a total of 150,235 patients alive with a breast cancer diagnosis. Of these, 13,826 patients (9.2%) were identified as having mBC. Median age at mBC diagnosis was 67.5 years and median survival was estimated at 29.8 months. Advanced age at the time of diagnosis and hormone receptor (HR) negative tumours were associated with worse overall survival.
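As a quick check of the arithmetic, the reported share of mBC patients follows directly from the two counts given above. The variable names below are illustrative only.

```python
# Counts reported in the study for 2009-2016.
total_bc_patients = 150_235
identified_mbc = 13_826

# Share of breast cancer patients identified as having mBC.
share = identified_mbc / total_bc_patients
print(f"{share:.1%}")  # 9.2%
```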

This study has provided a clear demonstration of the value of AI and machine learning in scientific applications, including in making important predictions about mBC populations at the national level. By facilitating and speeding up the identification of the correct patient population, machine learning has shown its potential to enable a value-based approach to cancer care. For payers, it enables the quantification of health and societal gains from innovative care for specific patient populations. For healthcare professionals, the healthcare system and authorities, it could facilitate a more accurate projection of care needs as well as greater efficiency in providing that care.

A value-based healthcare system is a win-win for patients and for the healthcare system. Firstly, patients' quality of life improves because they can access the right treatment; secondly, the healthcare system can ensure an efficient use of resources by providing the right treatments to the right patients. This study shows how machine learning can facilitate a value-based approach to cancer care, and it underlines the importance of access to linkable health data and of collaboration between public and private stakeholders in making this a reality.

Jelena Duza

Global Public Affairs, Novartis


  • Limitations of this case study relate to the type of data collected. As this was a retrospective registry study, the data utilised were not specifically collected for the purposes of the study and were limited to the variables collected in the registries.
  • An unselected breast cancer cohort of this size is rare in Europe. Future research should focus on further validating the model and on developing its classification ability by incorporating important prognostic factors.