Logs for Causal Deep Learning
Jul 9, 2023
Mar 19, 2024
The following survey papers are (partially) quoted to illustrate the concepts of this article: arXiv, 2206.15475; arXiv, 2209.08860; arXiv, 2211.03374; arXiv, 2303.02186. Many thanks to the original authors.
 
Recently I’ve dived into the field of deep causal learning (DCL), eager to find a new solution for whole-slide image processing problems. This page will be updated with my findings.
 
 
 

#1 Background of Causal Learning

#1.1 What is Causal Learning?

In general, causality refers to the connection between an effect and the cause of it. - arXiv, 2209.08860
 
In a word, Causal Learning = learning causal models/features from data.
Causal research encompasses two important aspects: Causal learning and Causal reasoning, as described in Jonas Peters et al.'s book. Causal learning focuses on discovering causal models, causal relationships, and causal features from data. On the other hand, causal reasoning aims to study phenomena or infer causal changes of variables based on causal models. This includes causal effect analysis, intervention analysis, counterfactual analysis, and more.
In the field of artificial intelligence, causal learning shares the same purpose as traditional causal learning. However, there may be differences in the tools and data dimensions used to address the problem. Artificial intelligence tends to utilize machine learning or deep learning tools to tackle high-dimensional causal learning tasks, while traditional causal inference may rely on statistical tools to address low-dimensional causal learning tasks.
notion image
Causal learning has different ultimate goals depending on the application context. One important goal is to address the problem of Out-of-Distribution Generalization (OODG) in machine learning. OODG refers to the ability of a machine learning model to make accurate predictions or inferences on data that falls outside the distribution of the training data.

#1.2 A Straightforward Example to Demonstrate the Significance of Causal Learning

 
By incorporating causal learning into machine learning algorithms, researchers aim to improve the generalization capabilities of models beyond the observed data distribution. Causal learning enables the discovery of causal relationships and mechanisms that can help models reason about and generalize to unseen or different data distributions. This can lead to more robust and reliable predictions in real-world scenarios where the data distribution may change or differ from the training data.
Therefore, one of the important goals of causal learning is to enhance the out-of-distribution generalization performance of machine learning models. However, it is worth noting that causal learning has broader implications beyond OODG and can be applied in various fields and applications to understand causal relationships and make informed decisions.
Take an image classification task as an example. Assume we have a large number of images like the following as the training dataset for a classification task between camels and cows:
notion image
In our training data, camels are mostly found in desert areas (with a predominantly yellow background), while cows are predominantly found in vegetation-rich areas (with a predominantly green background). Consequently, since the background often occupies the main content of the images, our model may learn features that are related to the background, such as background color, and use them to classify the images. Even if we achieve good results on the training set, there can be issues if the distribution of the test set differs from that of the training set. That is, if the new test set also consists of mostly yellow backgrounds for camels and mostly green backgrounds for cows, our model may still perform fairly well. However, if the situation is as depicted in the example below, our model is likely to perform poorly.
notion image
It's important to ensure that our model is trained on a diverse range of backgrounds to improve its ability to generalize to different scenarios. This can be achieved by including images with various background colors and textures in our training dataset.
Furthermore, we can even freely use photo editing software to replace the background with any desired form, like the situation depicted in the image below.
notion image
In the given image, even if we replace the background with a situation that does not exist in real life, humans can still easily identify the animals in the picture. One possible reason for this is that humans rely on causal features, such as the shape of the animals, for classification rather than correlated or spurious features like the yellow or green background. Therefore, if our model can learn causal features, wouldn't it be able to accurately identify the animals in the images, just like humans do? This is indeed one of the ultimate goals of causal learning.
 
By training models to learn and leverage causal features, we can improve their ability to generalize to novel situations, handle variations in the data distribution, and make more reliable predictions. However, it is important to note that causal learning is a challenging task, and developing models that can truly understand and reason about causal relationships is an active area of research in machine learning and artificial intelligence.
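To make the camel/cow story concrete, here is a minimal toy simulation of my own (the numbers and feature names are made up, not taken from the quoted papers): a "background" feature that tracks the label almost perfectly during training but flips at test time, and a noisier "shape" feature that is genuinely causal. The classifier that uses both features collapses under the shift, while the one restricted to the causal feature does not.

```python
# Toy sketch (my own, hypothetical) of spurious vs. causal features in the camel/cow example.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n, p_bg_matches_label):
    y = rng.integers(0, 2, size=n)                        # 0 = camel, 1 = cow
    shape = y + 0.5 * rng.normal(size=n)                  # causal feature: noisy animal shape
    bg_matches = rng.random(n) < p_bg_matches_label
    background = np.where(bg_matches, y, 1 - y) + 0.05 * rng.normal(size=n)  # spurious feature
    return np.column_stack([shape, background]), y

X_tr, y_tr = make_data(5000, p_bg_matches_label=0.98)     # train: background almost always matches label
X_te, y_te = make_data(5000, p_bg_matches_label=0.02)     # test: the correlation is reversed (OOD)

clf_all = LogisticRegression().fit(X_tr, y_tr)            # uses shape + spurious background
clf_causal = LogisticRegression().fit(X_tr[:, :1], y_tr)  # uses the causal feature only

print("shape + background  train/test acc:", clf_all.score(X_tr, y_tr), clf_all.score(X_te, y_te))
print("shape only          train/test acc:", clf_causal.score(X_tr[:, :1], y_tr),
      clf_causal.score(X_te[:, :1], y_te))
```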

#1.3 How is a Causal Feature Defined?

 
In the previous example, we can roughly consider the shape of an animal as a causal feature and the yellow-green background as a correlated feature. But how can we differentiate between causal and correlated features? Let's start with the definition of causality. Causality can be seen as a process where an influence is unidirectionally exerted. In other words, a cause variable can affect an effect variable, but the effect cannot influence the cause. The mechanism from cause to effect is known as a causal mechanism. This unidirectional process can be represented by a causal graph, which is a directed arrow from cause to effect.
If the effect variable is influenced by external interference or intervention, the current causal mechanism will no longer exist because the effect variable is determined by the intervention rather than the previous cause variable. However, if the intervention is applied to the cause variable, the causal mechanism will still exist because the effect variable is originally determined by the value of the causal variable. This can be understood as the cause variable being the input of a function, the causal mechanism being a unidirectional assignment function, and the effect variable being the output of the function. Therefore, no matter how we change the value of the input, it will not change the form of the function, but only affect the output value. The causality between the input and output remains unchanged. With these concepts in mind, we can depict a possible causal graph for the previous example of classifying camels and cows as follows:
notion image
The solid arrows in the graph represent causal associations, while the dashed lines represent correlations. These correlations are non-causal and may be influenced by selection bias or confounders (common causes). We can apply interventions to test whether the graph is correct. For example, if we change the shape of the animal in the graph, it will directly affect our ability to classify the type of animal. This indicates that shape is a causal feature. On the other hand, if we change the background, it will not affect our classification. This suggests that the background color is only a correlated feature. Therefore, the presented graph is reasonable.
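As a sanity check on this graph, here is a tiny hand-written structural model of my own (the mechanisms are deliberately over-simplified): background and label are correlated only through a common cause, so intervening on the background leaves the label distribution untouched, while intervening on the shape changes it.

```python
# Toy structural model for the graph above (my own illustration, assumptions hypothetical):
# environment -> background, environment -> animal -> shape -> label.
import numpy as np

rng = np.random.default_rng(0)

def sample(n, do_background=None, do_shape=None):
    env = rng.integers(0, 2, size=n)                       # 0 = desert, 1 = grassland (common cause)
    animal = env                                           # in this toy world, habitat determines the animal
    background = env if do_background is None else np.full(n, do_background)
    shape = animal if do_shape is None else np.full(n, do_shape)
    label = shape                                          # the label is caused by shape only
    return background, shape, label

bg, sh, y = sample(100_000)
print("observational corr(background, label):", np.corrcoef(bg, y)[0, 1])  # ~1.0, purely spurious

_, _, y = sample(100_000, do_background=1)
print("P(label=1 | do(background=1)):", y.mean())   # ~0.5, unchanged: background is not causal

_, _, y = sample(100_000, do_shape=1)
print("P(label=1 | do(shape=1)):", y.mean())        # 1.0: shape is a causal feature
```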

#2 How is Deep Causal Learning Introduced?

notion image
Courtesy: arXiv, 2211.03374
 
The above image illustrates the differences between causal learning and deep causal learning.
 
The comparison between (a) and (b) shows the theoretical advantages of deep causal learning. In the framework of deep causal learning, unstructured data can be processed with the representational power of neural networks. With the modeling capabilities of neural networks, in causal discovery, observational data and (known or unknown) intervention data can be comprehensively used in the presence of unobserved confounders to obtain a causal graph that is closer to the facts. With the fitting ability of neural networks, the estimation bias of causal effects can be reduced in causal inference. The 4 orange arrows represent the neural network's empowerment of representation, discovery, and inference.
 
(c) and (d) demonstrate the advantages of deep causal learning in more detail by exploring an example of the effect of exercise on blood pressure. We assume that the ground-truth effect of exercise on blood pressure is known.
 
 
 
 
 

#3 General Deep Causal Learning

💡
For more detailed information on this section, please refer to arXiv, 2211.03374. Here I will just list the key points.

#3.1 Preliminaries

There are two mainstream frameworks for causal inference: the structural causal model (SCM) and the potential outcome model (POM).
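As a quick illustration of the two views (a synthetic example of mine, not taken from the survey): an SCM writes every variable as a structural equation of its parents plus independent noise, while a POM endows each unit with two potential outcomes Y(0) and Y(1), only one of which is observed. On the same data-generating process, the confounded naive comparison overestimates the true average treatment effect.

```python
# Rough sketch (assumptions mine) of SCM vs. POM on one synthetic data-generating process.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# SCM view: each variable is assigned by a structural equation with independent noise.
#   X := N_x                  (confounder)
#   T := 1[X + N_t > 0]       (treatment depends on the confounder)
#   Y := 2*T + X + N_y        (outcome; the structural effect of T on Y is 2)
x = rng.normal(size=n)
t = (x + rng.normal(size=n) > 0).astype(int)
noise_y = rng.normal(size=n)

# POM view: each unit carries potential outcomes Y(0) and Y(1); only one is observed.
y0 = 2 * 0 + x + noise_y
y1 = 2 * 1 + x + noise_y
y = np.where(t == 1, y1, y0)

print("true ATE  E[Y(1) - Y(0)]       :", (y1 - y0).mean())                     # = 2 by construction
print("naive diff E[Y|T=1] - E[Y|T=0] :", y[t == 1].mean() - y[t == 0].mean())  # biased upward by X
```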

#3.2-3.5 Deep Representation Learning for Causal Variables, Causal Discovery, and Deep Causal Inference

These sections are omitted due to the page size limitation. Please refer to arXiv, 2211.03374 for further details.
 

#3.6 Exemplar Papers

Idea: The proposed method considers data as a combination of style and content, with the two being independent of each other. The style is not relevant to us; only the content is useful for downstream tasks. Therefore, we aim to make changes in style that do not affect the conditional distribution of the desired output.
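A minimal sketch of how such an invariance constraint might look, assuming a PyTorch classifier and some style-only augmentation (e.g. background or color jitter); the loss form and its weight are placeholders of mine, not the paper's exact method.

```python
# Hypothetical sketch: keep the predictive distribution unchanged under style-only changes.
import torch.nn.functional as F

def style_invariance_loss(model, x, x_styled, y, weight=1.0):
    """Task loss plus a KL penalty between predictions on original and style-augmented inputs."""
    logits = model(x)
    logits_styled = model(x_styled)            # same content, different style
    task_loss = F.cross_entropy(logits, y)
    invariance = F.kl_div(F.log_softmax(logits_styled, dim=-1),
                          F.softmax(logits, dim=-1),
                          reduction="batchmean")
    return task_loss + weight * invariance
```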

Idea: Raised the concept of causal invariant transformations to address OOD generalization (OODG) problems.

Idea: The paper proposes a causal minimax learning approach which aims to minimize the worst-case risk across all possible dataset shifts. The authors build their method by introducing some basic assumptions commonly made in causal inference and learning such as the structural causal model. They also provide a graphical condition for the whole stable set to be optimal, and an efficient algorithm to search for the optimal subset when the condition fails.
 
(The mathematical concepts are pretty hard to understand; I need to dig deeper.)
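To make the "minimize the worst-case risk" idea concrete, here is a drastically simplified sketch of my own (a brute-force subset search, not the authors' graphical condition or efficient algorithm): fit a predictor on each feature subset and keep the subset whose largest risk over the training environments is smallest.

```python
# Over-simplified sketch (mine, not the paper's algorithm) of causal minimax subset selection.
from itertools import combinations

import numpy as np
from sklearn.linear_model import LogisticRegression

def worst_case_risk(environments, subset):
    """Fit on pooled data restricted to `subset`; return the max 0-1 risk over environments."""
    X_pool = np.vstack([X[:, subset] for X, _ in environments])
    y_pool = np.concatenate([y for _, y in environments])
    clf = LogisticRegression().fit(X_pool, y_pool)
    return max(1 - clf.score(X[:, subset], y) for X, y in environments)

def causal_minimax_subset(environments, n_features):
    """Exhaustively search all non-empty feature subsets for the minimax-optimal one."""
    subsets = [list(c) for r in range(1, n_features + 1)
               for c in combinations(range(n_features), r)]
    return min(subsets, key=lambda s: worst_case_risk(environments, s))
```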

#4 Deep Causal Learning in MIA

Since the papers are quite long, I will only list some of the ones that I find interesting.

Foundation:
notion image
Key challenges in machine learning for medical imaging. (a) Data scarcity and (b-d) data mismatch. X represents images and Y, annotations (e.g. diagnosis labels). P_train(X, Y) refers to the distribution of data available for training a predictive model, and P_test(X, Y) is the test distribution, i.e. data that will be encountered once the model is deployed. Dots represent data points with any label, while circles and crosses indicate images with different labels (e.g. cases vs. controls).

Exemplar article:
Idea: simply plug the concept of backdoor causal inference (backdoor adjustment) into MIL.
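For reference, the backdoor adjustment itself is just a marginalization over the confounder, P(y | do(x)) = Σ_c P(y | x, c) P(c), instead of conditioning on it. Below is a tiny numerical sketch of mine; in the interventional-MIL papers the confounder is typically approximated by a dictionary of instance/bag prototypes, while here it is simply a small discrete variable.

```python
# Minimal numerical sketch (mine) of the backdoor adjustment formula.
import numpy as np

def backdoor_adjustment(p_y_given_x_c, p_c):
    """p_y_given_x_c: array [n_x, n_c, n_y]; p_c: array [n_c]. Returns P(y | do(x)) as [n_x, n_y]."""
    return np.einsum("xcy,c->xy", p_y_given_x_c, p_c)

# Made-up numbers: 2 values of x, 2 confounder strata, binary y.
p_y_given_x_c = np.array([[[0.9, 0.1], [0.4, 0.6]],
                          [[0.8, 0.2], [0.3, 0.7]]])
p_c = np.array([0.5, 0.5])
print(backdoor_adjustment(p_y_given_x_c, p_c))   # each row sums to 1
```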
 

#5 Conclusion

Causal Learning, as a newly emerging field of deep learning:
  • requires a solid understanding of causality, including both causal learning and causal reasoning.
  • aims to improve the generalization capabilities of machine learning models beyond the observed data distribution by discovering causal relationships and mechanisms that can help models reason about and generalize to unseen or different data distributions.
  • is a challenging task and developing models that can truly understand and reason about causal relationships is an active area of research in machine learning and artificial intelligence.
 
 
Fin.
