When discussing **generative models**, it's essential to understand how machine learning approaches a classification task, such as distinguishing elephants from dogs. There are two main strategies:

1. **Discriminative Modeling:** This approach involves building a model that directly predicts classification labels or identifies the decision boundary between elephants and dogs.
2. **Generative Modeling:** This approach entails constructing separate models for elephants and dogs, capturing their respective characteristics. A new animal is then compared against each model to determine which it resembles more closely.

In discriminative modeling, the focus is on learning the conditional probability of labels given the input data, denoted as $$p(y \mid x)$$. Techniques like logistic regression exemplify this by modeling the probability of a label based on input features. Alternatively, methods such as the perceptron algorithm aim to find a decision boundary that maps new observations to specific labels $$\{0,1\}$$, such as $$0$$ for dogs and $$1$$ for elephants.
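
For instance, once its parameters have been learned, a perceptron-style classifier reduces to a single decision rule. The sketch below is only an illustration with made-up parameter values (the training procedure is omitted); it shows how a discriminative model maps a new observation directly to a label in $$\{0,1\}$$.

```python
import numpy as np

def perceptron_predict(theta, x):
    # Discriminative decision rule: the sign of theta^T x places x on one side
    # of the decision boundary (1 = elephant) or the other (0 = dog).
    return 1 if theta @ x >= 0 else 0

# Made-up parameters and a new observation; x[0] = 1 is an intercept feature.
theta = np.array([-0.5, 1.2, -0.7])
x_new = np.array([1.0, 0.9, 0.3])

print(perceptron_predict(theta, x_new))  # -> 1 for these illustrative numbers
```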

Conversely, generative modeling focuses on understanding how the data is generated by learning the joint probability distribution $$p(x,y)$$, or the likelihood $$p(x \mid y)$$ along with the prior probability $$p(y)$$. This approach models the distribution of the input data for each class, enabling the generation of new data points and facilitating classification by applying Bayes' theorem to compute the posterior probability:

$$
p(y \mid x) = \frac{p(x \mid y)\,p(y)}{p(x)}
$$

The denominator $$p(x)$$ is the marginal probability, obtained by summing the joint probability $$p(x,y)$$ over all possible labels $$y$$:

$$
\begin{aligned}
p(x) &= \sum_{y} p(x,y) \\
     &= \sum_{y} p(x \mid y)\,p(y) \\
     &= p(x \mid y=0)\,p(y=0) + p(x \mid y=1)\,p(y=1)
\end{aligned}
$$

In fact, $$p(x)$$ acts as a normalization constant: it does not depend on the label $$y$$, so it takes the same value for every candidate label. When calculating the most likely label for a given $$x$$, we can therefore drop the denominator entirely:

$$
\arg\max_{y} p(y \mid x) = \arg\max_{y} \frac{p(x \mid y)\,p(y)}{p(x)} = \arg\max_{y} p(x \mid y)\,p(y)
$$
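
As a minimal sketch of how a generative classifier uses this fact, suppose (purely for illustration) that each class-conditional density $$p(x \mid y)$$ is a one-dimensional Gaussian and the prior $$p(y)$$ is known. The numbers below are made up; the point is that the prediction compares $$p(x \mid y)\,p(y)$$ across labels and never needs to evaluate $$p(x)$$.

```python
import numpy as np

# Hypothetical 1-D generative model: one Gaussian per class plus a class prior.
# p(x | y=0) = N(0, 1),  p(x | y=1) = N(3, 1),  p(y=1) = 0.3
class_params = {0: (0.0, 1.0), 1: (3.0, 1.0)}   # label -> (mean, std)
prior = {0: 0.7, 1: 0.3}

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def predict(x):
    # Compare p(x | y) * p(y) for each label; p(x) is skipped because it is
    # the same for every label and therefore cannot change the argmax.
    scores = {y: gaussian_pdf(x, mu, sigma) * prior[y]
              for y, (mu, sigma) in class_params.items()}
    return max(scores, key=scores.get)

print(predict(0.5))  # -> 0: far more likely under the class-0 Gaussian
print(predict(2.8))  # -> 1: far more likely under the class-1 Gaussian
```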

Let's consider a binary classification task: labeling emails as spam or not spam. Here, $$x^{(i)}$$ is the feature vector of the $$i$$-th email, and $$y^{(i)}$$ is the label indicating whether the email is spam ($$1$$) or not spam ($$0$$). The following examples show how discriminative and generative models approach the same problem differently.

### Example 1: Logistic Regression as a Discriminative Model

Since the label $$y$$ can only take on the values $$0$$ or $$1$$, it makes sense to choose a hypothesis $$h_{\theta}(x)$$ that ranges in $$[0,1]$$ and represents the probability $$p(y=1 \mid x)$$. We can then threshold $$h_{\theta}(x)$$ at $$0.5$$ to predict whether an email is spam or not. The logistic (sigmoid) function fits this case well, as it is bounded between $$0$$ and $$1$$ for all $$z\in(-\infty, +\infty)$$:

$$
h_{\theta}(x) = g(\theta^T x) = \frac{1}{1+e^{-\theta^T x}}, \qquad g(z) = \frac{1}{1+e^{-z}}
$$

From the plot of $$g(z)$$, we can see that $$g(z)$$ tends to $$0$$ as $$z\to-\infty$$ and tends to $$1$$ as $$z\to+\infty$$; when $$z=0$$, $$g(z)=0.5$$. Thus $$g(z)$$, and hence $$h_{\theta}(x)$$, is always bounded between $$0$$ and $$1$$. Keeping the convention of letting $$x_0=1$$, we can write the argument of the hypothesis as $$z = \theta^T x = \theta_0 + \sum_{j=1}^n \theta_j x_j$$, where $$\theta_0$$ is the bias term and $$\theta_j$$ is the weight of the $$j$$-th feature $$x_j$$. Note that other functions that increase smoothly and monotonically from $$0$$ to $$1$$ could also be considered for $$h_{\theta}(x)$$.
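
As a small sketch (with made-up, unfitted parameter values), the hypothesis can be evaluated directly from $$\theta$$ and a feature vector that includes $$x_0 = 1$$, and the $$0.5$$ threshold turns the resulting probability into a spam/not-spam prediction.

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^{-z}), bounded between 0 and 1
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x):
    # Hypothesis h_theta(x) = g(theta^T x), interpreted as p(y=1 | x).
    return sigmoid(theta @ x)

# Hypothetical parameters and features for one email (x[0] = 1 is the bias term).
theta = np.array([-1.0, 2.5, 0.8])   # [theta_0, theta_1, theta_2]
x = np.array([1.0, 0.6, 0.2])        # [x_0, x_1, x_2]

p_spam = h(theta, x)
prediction = 1 if p_spam >= 0.5 else 0   # threshold at 0.5
print(p_spam, prediction)
```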

<!-- to check:notes page22 -->

### Example 2: Gaussian Discriminant Analysis as a Generative Model
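
As a preview of the idea, here is a minimal sketch of GDA under its usual assumptions (binary labels, one Gaussian per class with a shared covariance matrix, maximum-likelihood estimates); the function names are illustrative, not a reference implementation. Classification follows exactly the $$\arg\max_y p(x \mid y)\,p(y)$$ rule derived above.

```python
import numpy as np

def fit_gda(X, y):
    # Fit one Gaussian per class with a shared covariance matrix, plus the prior p(y=1).
    phi = y.mean()                                   # p(y=1)
    mu0 = X[y == 0].mean(axis=0)
    mu1 = X[y == 1].mean(axis=0)
    centered = X - np.where(y[:, None] == 1, mu1, mu0)
    sigma = centered.T @ centered / len(y)           # shared covariance
    return phi, mu0, mu1, sigma

def gaussian_density(x, mu, sigma):
    # Multivariate normal density p(x | y) for one class.
    d = len(mu)
    diff = x - mu
    norm_const = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(sigma))
    return norm_const * np.exp(-0.5 * diff @ np.linalg.solve(sigma, diff))

def predict(x, phi, mu0, mu1, sigma):
    # argmax over y of p(x | y) p(y); p(x) is omitted as before.
    score0 = gaussian_density(x, mu0, sigma) * (1 - phi)
    score1 = gaussian_density(x, mu1, sigma) * phi
    return int(score1 > score0)
```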