Can we trust machine learned models to keep our secrets?

What are the risks?

Machine-learned (ML) models learn by generalising from patterns in data. Because these models are derived from data, an interesting question arises: can we trust ML models to keep their training data secret, or is there a risk that this data could be inadvertently leaked to an adversary? An adversary is an individual or organisation seeking access to the data for an illegitimate purpose, for example to obtain sensitive personal information used to train the model.

Data confidentiality

Organisations already put policies in place to ensure that the handling of their data conforms to data confidentiality requirements and the GDPR. However, if data can leak from ML models, what security policies should we apply to the models themselves? Should models inherit the policies of their training data?

Let’s consider some examples. Extracting training data, or characteristics of it, from a model is called model inversion. For example, researchers have shown that it is possible to recover an approximate image of a face used to train a facial recognition algorithm, given only the model and the individual’s name. Model inversion clearly has implications whenever sensitive or personal data was used in training.
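
To make model inversion concrete, the sketch below assumes white box access to a toy linear classifier and inverts it by gradient ascent on the input: starting from a neutral input, it repeatedly nudges each feature in the direction that raises the model's confidence in a target class, so the result drifts towards what the model has learned about that class. It is a minimal illustration under stated assumptions, not the published facial recognition attack: the model, its weights and the function names (`invert`, `predict_proba`) are hypothetical, and features are assumed to be pixel-like values in [0, 1].

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Confidence the (hypothetical) linear model assigns to the target class."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def invert(weights: Sequence[float], bias: float, n_features: int,
           steps: int = 200, lr: float = 0.5, l2: float = 0.01) -> list[float]:
    """White box model inversion by gradient ascent on log p(target | x).

    Starts from a neutral input (0.5 everywhere), keeps every feature in [0, 1]
    and applies a small L2 penalty so that features the model ignores fade out.
    """
    x = [0.5] * n_features
    for _ in range(steps):
        p = predict_proba(weights, bias, x)
        # d/dx log sigmoid(w.x + b) = (1 - p) * w, minus the gradient of l2 * ||x||^2
        grad = [(1.0 - p) * w - 2.0 * l2 * xi for w, xi in zip(weights, x)]
        x = [min(1.0, max(0.0, xi + lr * g)) for xi, g in zip(x, grad)]
    return x


# Usage: hypothetical weights for one identity in a toy recognition model.
weights, bias = [2.0, -1.5, 0.0, 3.0], -1.0
reconstruction = invert(weights, bias, n_features=4)
```

Features with positive weights are pushed towards 1, the feature with a negative weight towards 0, and the feature the model ignores fades away. Applied to an image classifier with one output per identity, the same loop yields an approximate image of the target face. The sketch relies on white box gradients; a black box attacker could, in principle, estimate them from repeated queries instead.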

Alternatively, establishing whether a given data instance was in a model’s training set is known as membership inference. Consider a model trained to predict individuals’ susceptibility to a specific medical condition from data such as age, gender and lifestyle information. With access to the model and to a specific individual’s private data, it may be possible to infer whether that individual’s data was part of the model’s training data. If the training set was drawn from people known to have the condition, membership alone reveals that the person suffers from it.
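
A simple black box membership inference test exploits the tendency of overfitted models to be more confident on records they were trained on. The sketch below, a minimal illustration rather than any specific published attack, guesses "member" whenever the model's confidence in a record's true label exceeds a threshold. The memorising toy model, the feature encoding (age, gender, lifestyle score) and the 0.9 threshold are all hypothetical; in practice an attacker would calibrate the threshold, for example with shadow models trained on similar data.

```python
import math
from typing import Callable, Sequence

Record = tuple[float, ...]


def make_memorising_model(train: Sequence[tuple[Record, int]]) -> Callable[[Record, int], float]:
    """Toy, deliberately overfitted classifier: its confidence in a label decays
    with the distance to the nearest training record carrying that label."""
    def confidence(x: Record, label: int) -> float:
        dists = [math.dist(x, r) for r, y in train if y == label]
        return math.exp(-min(dists)) if dists else 0.0
    return confidence


def infer_membership(confidence: Callable[[Record, int], float],
                     x: Record, label: int, threshold: float = 0.9) -> bool:
    """Black box attack: only queries the model, and guesses 'member'
    when its confidence in the true label is unusually high."""
    return confidence(x, label) >= threshold


# Usage: records are (age, gender, lifestyle score); label 1 = has the condition.
train = [((34.0, 1.0, 3.0), 1), ((61.0, 0.0, 1.0), 1), ((45.0, 1.0, 2.0), 0)]
model = make_memorising_model(train)
print(infer_membership(model, (34.0, 1.0, 3.0), 1))  # True: in the training set
print(infer_membership(model, (36.0, 1.0, 3.0), 1))  # False: similar, but not a member
```

If the training set contained only people diagnosed with the condition, the first `True` is exactly the leak described above.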

Restricting access

With the potential for model inversion or membership inference, we need to consider how to protect the data used to train a model. Fundamentally, it’s important to restrict the level of access to ML models by untrusted parties to the minimum possible. ‘Access’ could mean that the adversary has got hold of the model itself, or that they are able to query the model’s behaviour:

Having access to the model itself is akin to having the source code of the algorithm and potentially enables what is referred to as a white box attack. In this case, an adversary can analyse the model mathematically to extract information about the training data directly from its parameters. Many ML models are shared by organisations, either for collaboration or for profit, making white box attacks possible.
With no access to the model itself, an attacker can instead query the model with carefully crafted inputs and examine the outputs. This reveals how the model behaves under different conditions and allows the attacker to infer information about the training data. This is known as a black box attack.
Querying a model’s behaviour requires some form of Application Programming Interface (API) to access the model. Machine-learning-as-a-service offerings on major cloud platforms, such as Google’s Prediction API, Amazon Machine Learning (Amazon ML) and Microsoft’s Azure Machine Learning (Azure ML), can open the door to black box attacks. These platforms have done a great deal to democratise machine learning, enabling customers to upload their data and train ML models. However, the model owner might then open API access to their model to untrusted sources, unaware of the risk this poses to sensitive training data.
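
One way to apply the principle of minimum access to a hosted model is a thin wrapper between the model and untrusted callers. The sketch below assumes a hypothetical `predict_proba` function returning class probabilities; the wrapper exposes only the predicted label, never the confidence scores that many black box attacks rely on, and enforces a fixed query budget per client. The class name and the default budget of 1,000 queries are illustrative placeholders. This narrows what each query reveals but does not, on its own, prevent black box attacks: label-only membership inference is also possible.

```python
from collections import defaultdict
from typing import Callable, Sequence


class RestrictedModelAPI:
    """Hypothetical wrapper that gives untrusted clients minimal access to a model:
    label-only answers and a fixed number of queries per client."""

    def __init__(self, predict_proba: Callable[[Sequence[float]], Sequence[float]],
                 max_queries_per_client: int = 1000) -> None:
        self._predict_proba = predict_proba
        self._budget = max_queries_per_client
        self._used: dict[str, int] = defaultdict(int)

    def query(self, client_id: str, x: Sequence[float]) -> int:
        if self._used[client_id] >= self._budget:
            raise PermissionError(f"query budget exhausted for client {client_id!r}")
        self._used[client_id] += 1
        probs = self._predict_proba(x)
        # Return only the index of the most likely class, never the scores themselves.
        return max(range(len(probs)), key=lambda i: probs[i])


# Usage with a stand-in model that always returns the same three class probabilities.
api = RestrictedModelAPI(lambda x: [0.2, 0.7, 0.1], max_queries_per_client=2)
print(api.query("client-a", [0.0, 1.0]))  # 1
```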

The need to share

Sometimes there is a legitimate need to share an ML model or to share access to its API with an untrusted party. There may also be occasions where a model’s security cannot be guaranteed. In such cases, we need to consider approaches to strengthen the model itself against attacks – to better hide its data.

A good starting point is to consider the nature and spread of the training data itself. Model inversion and membership inference become increasingly feasible when there is only a limited amount of training data for a particular aspect of the model’s learning; this causes the model to memorise rather than generalise. Deep learning models, in particular, have a remarkable capacity to memorise individual characteristics of their training data.
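
A rough, practical signal of this kind of memorisation is the gap between a model's accuracy on its training data and on held-out data it has never seen. The sketch below computes that gap for a deliberately memorising one-nearest-neighbour classifier on hypothetical one-feature data containing one mislabelled training point. The functions and data are illustrative assumptions, and a small gap reduces, but does not rule out, the risk of leakage.

```python
import math
from typing import Callable, Sequence

Example = tuple[tuple[float, ...], int]


def accuracy(predict: Callable[[tuple[float, ...]], int], data: Sequence[Example]) -> float:
    return sum(predict(x) == y for x, y in data) / len(data)


def generalisation_gap(predict, train: Sequence[Example], holdout: Sequence[Example]) -> float:
    """Training accuracy minus held-out accuracy. A large positive gap suggests the
    model is remembering its training examples rather than generalising from them."""
    return accuracy(predict, train) - accuracy(predict, holdout)


def one_nearest_neighbour(train: Sequence[Example]) -> Callable[[tuple[float, ...]], int]:
    """Toy model that memorises every training example, including noisy labels."""
    def predict(x: tuple[float, ...]) -> int:
        return min(train, key=lambda ex: math.dist(x, ex[0]))[1]
    return predict


# Usage: the true rule is "label 1 if x >= 2", but the point at x = 1.0 is mislabelled.
train = [((0.0,), 0), ((1.0,), 1), ((2.0,), 1), ((3.0,), 1)]
holdout = [((0.9,), 0), ((1.2,), 0), ((2.4,), 1)]
gap = generalisation_gap(one_nearest_neighbour(train), train, holdout)
print(round(gap, 2))  # 0.67
```

The model scores perfectly on its training data because it has memorised the mislabelled point, and that same memorisation costs it accuracy on nearby unseen inputs.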

Comprehensive and balanced training data sets can reduce the ‘uniqueness’ of individual training instances and may go some way towards reducing the risk of information being leaked. This does not guarantee that the model is protected, however, as individual training features may still be memorised.
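
One way to find the ‘unique’ instances described above is to borrow a check from k-anonymity: count how many training records share each combination of potentially identifying attributes, and flag combinations that occur fewer than k times. Flagged records are candidates for collecting more data, coarsening features (age bands instead of exact ages, for example) or removal. The attribute names and the choice of k below are hypothetical, and a record that passes the check can still be memorised.

```python
from collections import Counter
from typing import Any, Mapping, Sequence


def rare_records(records: Sequence[Mapping[str, Any]],
                 quasi_identifiers: Sequence[str], k: int = 5) -> list[int]:
    """Indices of records whose combination of quasi-identifier values appears
    fewer than k times in the data set (a k-anonymity style uniqueness check)."""
    def key(record: Mapping[str, Any]) -> tuple:
        return tuple(record[q] for q in quasi_identifiers)

    counts = Counter(key(r) for r in records)
    return [i for i, r in enumerate(records) if counts[key(r)] < k]


# Usage: three similar patients and one who is unique on these attributes.
records = [
    {"age_band": "30-39", "gender": "F", "smoker": False},
    {"age_band": "30-39", "gender": "F", "smoker": False},
    {"age_band": "30-39", "gender": "F", "smoker": False},
    {"age_band": "80-89", "gender": "M", "smoker": True},
]
print(rare_records(records, ["age_band", "gender", "smoker"], k=3))  # [3]
```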

Another approach is to introduce noise into the model’s training data, its training algorithm, or its outputs. You can think of this as ‘blurring’ the model and therefore its training data. Differential privacy, a statistical framework, measures the privacy provided to training data when noise is introduced in this way. This is useful because it gives a real, quantifiable assurance of data privacy. Unsurprisingly, introducing noise into machine learning creates a trade-off between privacy and utility: stronger privacy guarantees can reduce model accuracy.
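
Formally, a randomised mechanism M satisfies ε-differential privacy if, for any two data sets D and D′ that differ in a single record and any set of possible outputs S, Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S]; a smaller ε means stronger privacy. The simplest way to achieve this is the Laplace mechanism, which adds Laplace noise whose scale is the output's sensitivity (the most one record can change it) divided by ε. The sketch below applies it to a count, whose sensitivity is 1. It illustrates noise on outputs rather than a full private training pipeline, and the function names are hypothetical.

```python
import random
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale): the difference of two independent
    Exponential(1) draws is Laplace(0, 1)."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def private_count(records: Iterable[T], predicate: Callable[[T], bool],
                  epsilon: float, rng: Optional[random.Random] = None) -> float:
    """Release a count with epsilon-differential privacy via the Laplace mechanism.

    Adding or removing one record changes a count by at most 1 (sensitivity 1),
    so the noise scale is 1 / epsilon: halving epsilon doubles the typical error.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Usage: how many (hypothetical) patients in a data set are over 60? The true count is 3.
ages = [34, 61, 45, 72, 58, 66]
rng = random.Random(0)
print(private_count(ages, lambda a: a > 60, epsilon=1.0, rng=rng))  # noise scale 1
print(private_count(ages, lambda a: a > 60, epsilon=0.1, rng=rng))  # noise scale 10
```

The two calls show the privacy versus utility trade-off directly: the stronger guarantee (ε = 0.1) comes with ten times the expected error.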

Understanding the risks

So, we can’t trust ML models to keep their training data secret, but we can understand the risks and introduce appropriate mitigations. It’s encouraging that differential privacy is now being introduced into the libraries and tools used by data scientists – an indication that the risk of data leakage from ML models is becoming a mainstream concern. Ultimately, however, unless we can be certain of the confidentiality of information embedded within models, it may be prudent to treat them with the same sensitivity and policies as the data that they have been trained on. This may impact the model’s lifecycle, the security measures placed around the model and the extent to which it is shared.
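
Differential privacy support in machine learning libraries and tools commonly takes the form of differentially private stochastic gradient descent (DP-SGD). Each training example's gradient is clipped to a maximum norm, so that no single record can move the model too far, and Gaussian noise calibrated to that norm is added before the update. The plain-Python sketch below shows one such update, assuming the per-example gradients have already been computed. It is not any particular library's API, the parameter names are hypothetical, and converting `noise_multiplier` and the number of training steps into an overall ε requires a privacy accountant, which is omitted here.

```python
import math
import random
from typing import Sequence


def clip(grad: Sequence[float], max_norm: float) -> list[float]:
    """Scale a per-example gradient down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    factor = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * factor for g in grad]


def dp_sgd_step(params: Sequence[float], per_example_grads: Sequence[Sequence[float]],
                lr: float, max_norm: float, noise_multiplier: float,
                rng: random.Random) -> list[float]:
    """One DP-SGD update: clip each example's gradient, sum the clipped gradients,
    add Gaussian noise with standard deviation noise_multiplier * max_norm to the sum,
    then average over the batch and take a gradient descent step."""
    batch_size = len(per_example_grads)
    clipped = [clip(g, max_norm) for g in per_example_grads]
    summed = [sum(column) for column in zip(*clipped)]
    noisy = [s + rng.gauss(0.0, noise_multiplier * max_norm) for s in summed]
    return [p - lr * g / batch_size for p, g in zip(params, noisy)]


# Usage: one outlier example with a large gradient cannot dominate the update.
rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # the first has norm 5, the second norm 0.5
new_params = dp_sgd_step([0.0, 0.0], grads, lr=0.1, max_norm=1.0,
                         noise_multiplier=1.1, rng=rng)
```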

In summary, to keep sensitive training data safe:

Prevent unnecessary sharing of, or access to, the model. Apply the same policies to the model as you would to the data on which it was trained.
If the model must be shared, or may be accessed by untrusted parties, make its sensitive training data harder to extract: use comprehensive and balanced training data, and apply techniques such as differential privacy.
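
The first recommendation can be made mechanical in a model registry: record which data sets each model was trained on, let the model inherit the most restrictive label among them, and check that label before any sharing. The sketch below is a hypothetical illustration; the four sensitivity levels are placeholders for an organisation's own classification scheme, and `may_share` stands in for whatever access-control process actually applies.

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    # Hypothetical ordered labels; substitute your organisation's own scheme.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class Dataset:
    name: str
    sensitivity: Sensitivity


@dataclass(frozen=True)
class Model:
    name: str
    training_data: tuple[Dataset, ...]

    @property
    def sensitivity(self) -> Sensitivity:
        """A model inherits the most restrictive label of any data it was trained on."""
        return max((d.sensitivity for d in self.training_data), default=Sensitivity.PUBLIC)


def may_share(model: Model, recipient_clearance: Sensitivity) -> bool:
    """Share only with recipients cleared for the model's inherited sensitivity."""
    return recipient_clearance >= model.sensitivity


# Usage: a model trained partly on medical records inherits their label.
model = Model("risk-score", (Dataset("hr-survey", Sensitivity.INTERNAL),
                             Dataset("patient-records", Sensitivity.CONFIDENTIAL)))
print(model.sensitivity.name)                  # CONFIDENTIAL
print(may_share(model, Sensitivity.INTERNAL))  # False
```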

Conclusion

Roke helps organisations safeguard their machine-learned models with end-to-end security, from zero-trust governance and differential privacy to confidential computing, model watermarking and adversarial red-teaming, so that ML systems can be trusted to keep secrets, resist extraction and operate securely in sensitive government and defence applications.