
Commit ef06bb3

Author: liuzi
Commit message: comparison
1 parent: 576fb0d

File tree

3 files changed: +74 −1 lines changed

README.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 # [Liuzi's GitHub Pages Site](https://liuzi.github.io)

-Welcome to Liuzi's GitHub Pages site, powered by Jekyll and the Chirpy theme. This site is a collection of my learning notes, projects, and tutorials, with a focus on GenAI (e.g., LLMs, Diffusion Models, GAN, etc.) and its applications.
+Welcome to Liuzi's GitHub Pages site, powered by Jekyll and the Chirpy theme. This site is a collection of my learning notes, projects, and tutorials, with a focus on GenAI (e.g., Transformer, Diffusion Models, GAN, etc.) and its applications.

 ## About

_posts/2025-02-15-generative-models.md

Lines changed: 73 additions & 0 deletions
@@ -191,6 +191,79 @@ p(x|y=1) &= \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}}exp\left(-\frac{1}{2}(x-\mu_1)^T
\end{aligned}
$$

Therefore, the log-likelihood of the data is:

$$
\begin{aligned}
\ell(\phi, \mu_0, \mu_1, \Sigma) &= \log\prod_{i=1}^n p(x^{(i)}, y^{(i)};\phi, \mu_0, \mu_1, \Sigma) \\
&= \log\prod_{i=1}^n p(x^{(i)}|y^{(i)};\mu_0, \mu_1, \Sigma)p(y^{(i)};\phi)
\end{aligned}
$$
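
To make this objective concrete, here is a minimal NumPy/SciPy sketch that evaluates the joint log-likelihood for a given parameter setting. The helper name `gda_log_likelihood`, the toy data, and the parameter values are made up for illustration; they are not part of the derivation above.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gda_log_likelihood(X, y, phi, mu0, mu1, Sigma):
    """Joint log-likelihood sum_i [ log p(x_i | y_i) + log p(y_i) ] for GDA."""
    ll = 0.0
    for x_i, y_i in zip(X, y):
        mu = mu1 if y_i == 1 else mu0
        ll += multivariate_normal.logpdf(x_i, mean=mu, cov=Sigma)  # log p(x | y)
        ll += np.log(phi if y_i == 1 else 1.0 - phi)               # log p(y)
    return ll

# Tiny made-up 2-D example.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
y = np.array([0, 0, 0, 1, 1, 1])
print(gda_log_likelihood(X, y, phi=0.5,
                         mu0=np.zeros(2), mu1=np.ones(2), Sigma=np.eye(2)))
```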

By maximizing the log-likelihood, we can derive the parameters of the model. The MLE estimates are:

$$
\begin{aligned}
\phi &= \frac{1}{n}\sum_{i=1}^n 1\{y^{(i)}=1\} \\
\mu_0 &= \frac{\sum_{i=1}^n 1\{y^{(i)}=0\}x^{(i)}}{\sum_{i=1}^n 1\{y^{(i)}=0\}} \\
\mu_1 &= \frac{\sum_{i=1}^n 1\{y^{(i)}=1\}x^{(i)}}{\sum_{i=1}^n 1\{y^{(i)}=1\}} \\
\Sigma &= \frac{1}{n}\sum_{i=1}^n (x^{(i)}-\mu_{y^{(i)}})(x^{(i)}-\mu_{y^{(i)}})^T
\end{aligned}
$$
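
As a quick sanity check on these closed-form estimates, the following sketch computes them with NumPy on synthetic data. The function name `fit_gda`, the sample sizes, and the true means and covariance are arbitrary choices for illustration only.

```python
import numpy as np

def fit_gda(X, y):
    """Closed-form MLE for GDA with a shared covariance matrix."""
    n = X.shape[0]
    phi = np.mean(y == 1)                      # fraction of y = 1
    mu0 = X[y == 0].mean(axis=0)               # class-0 mean
    mu1 = X[y == 1].mean(axis=0)               # class-1 mean
    mus = np.where((y == 1)[:, None], mu1, mu0)
    diff = X - mus
    Sigma = diff.T @ diff / n                  # pooled covariance
    return phi, mu0, mu1, Sigma

# Synthetic data drawn from two Gaussians with a shared covariance.
rng = np.random.default_rng(0)
Sigma_true = np.array([[1.0, 0.3], [0.3, 0.5]])
X0 = rng.multivariate_normal([0.0, 0.0], Sigma_true, size=200)
X1 = rng.multivariate_normal([2.0, 1.0], Sigma_true, size=200)
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 200)

phi, mu0, mu1, Sigma = fit_gda(X, y)
print(phi, mu0, mu1, Sigma, sep="\n")
```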

The following figure shows the training set and the contours of the two Gaussian distributions. The two Gaussians share the same covariance matrix $$\Sigma$$, so their contours have the same shape and orientation, but their different means $$\mu_0$$ and $$\mu_1$$ place the contours at different positions. The straight line in the figure is the decision boundary at which
$$p(y=1|x) = 0.5$$: on the left side of the line the model predicts $$y=0$$, and on the right side it predicts $$y=1$$.

![Gaussian Discriminant Analysis](/assets/img/posts/gaussian_discriminant_analysis.png){: width="450" height="300" }

**Figure 1:** Gaussian Discriminant Analysis. Image source: Section 4.1.2 on page 40 of the [Stanford CS229 Notes](https://cs229.stanford.edu/main_notes.pdf).
{: .text-center .small}

### Comparison between Discriminative and Generative Models

Applying Bayes' theorem to the generative GDA (Gaussian Discriminant Analysis) model, we have:

$$
\begin{aligned}
p(y=1|x) &= \frac{p(x|y=1)p(y=1)}{p(x|y=1)p(y=1) + p(x|y=0)p(y=0)} \\
&= \frac{\exp\left\{ -\frac{1}{2} (x - \mu_1)^T \Sigma^{-1}(x - \mu_1) \right\} \phi}
{\exp\left\{ -\frac{1}{2} (x - \mu_1)^T \Sigma^{-1} (x - \mu_1) \right\} \phi
+ \exp\left\{ -\frac{1}{2} (x - \mu_0)^T \Sigma^{-1} (x - \mu_0) \right\} (1 - \phi)} \\
&= \frac{1}{1+\exp\left\{ \frac{1}{2} (x - \mu_1)^T \Sigma^{-1} (x - \mu_1) - \frac{1}{2} (x - \mu_0)^T \Sigma^{-1} (x - \mu_0) \right\} \frac{1 - \phi}{\phi}} \\
&= \frac{1}{1 + \exp\left\{ -\left[ (\Sigma^{-1} (\mu_1 - \mu_0))^T x + \frac{1}{2} (\mu_0 + \mu_1)^T \Sigma^{-1} (\mu_0 - \mu_1)
- \ln \left( \frac{1 - \phi}{\phi} \right) \right]\right\} }
\end{aligned}
$$
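
Numerically, the first line of this derivation is just the two shared-covariance Gaussian densities weighted by the class prior. The short sketch below evaluates it directly with `scipy.stats.multivariate_normal`; the parameter values are made up for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Made-up GDA parameters for a 2-D example.
phi = 0.4
mu0, mu1 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])

def posterior_bayes(x):
    """p(y=1 | x) via Bayes' rule with shared-covariance Gaussians."""
    p_x_given_1 = multivariate_normal.pdf(x, mean=mu1, cov=Sigma)
    p_x_given_0 = multivariate_normal.pdf(x, mean=mu0, cov=Sigma)
    return p_x_given_1 * phi / (p_x_given_1 * phi + p_x_given_0 * (1 - phi))

print(posterior_bayes(np.array([1.0, 0.5])))
```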

To simplify the last expression, define

$$
\begin{aligned}
\theta &= \Sigma^{-1} (\mu_1 - \mu_0) \\
\theta_0 &= \frac{1}{2} (\mu_0 + \mu_1)^T \Sigma^{-1} (\mu_0 - \mu_1) - \ln \left( \frac{1 - \phi}{\phi} \right)
\end{aligned}
$$

From the derivation above, we find that
$$p(y=1|x;\phi, \mu_0, \mu_1, \Sigma)$$ can be viewed as a function of $$x$$ of the following form:

$$
p(y=1|x;\phi, \mu_0, \mu_1, \Sigma) = \frac{1}{1+\exp\left(-(\theta^T x + \theta_0)\right)}
$$

This shows an interesting connection between generative and discriminative models: $$\theta$$ and $$\theta_0$$ are functions of the GDA parameters $$\phi, \mu_0, \mu_1, \Sigma$$, and the resulting form is exactly the hypothesis function of the logistic regression model, which models the conditional probability
$$p(y=1|x)$$ directly in a discriminative way (with the usual convention of appending a constant feature $$x_0 = 1$$, $$\theta_0$$ is absorbed into $$\theta$$).
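
To see this equivalence numerically, here is a short sketch (again with made-up parameters) that builds $$\theta$$ and $$\theta_0$$ from the GDA parameters and checks that the sigmoid form reproduces the Bayes-rule posterior at an arbitrary test point.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Made-up GDA parameters (2-D).
phi = 0.4
mu0, mu1 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])
Sigma_inv = np.linalg.inv(Sigma)

# Logistic-regression-style parameters implied by GDA.
theta = Sigma_inv @ (mu1 - mu0)
theta0 = 0.5 * (mu0 + mu1) @ Sigma_inv @ (mu0 - mu1) - np.log((1 - phi) / phi)

def posterior_sigmoid(x):
    return 1.0 / (1.0 + np.exp(-(theta @ x + theta0)))

def posterior_bayes(x):
    p1 = multivariate_normal.pdf(x, mean=mu1, cov=Sigma) * phi
    p0 = multivariate_normal.pdf(x, mean=mu0, cov=Sigma) * (1 - phi)
    return p1 / (p1 + p0)

x = np.array([1.0, 0.5])
print(posterior_sigmoid(x), posterior_bayes(x))  # the two values agree
```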

Generally, generative models and discriminative models give different decision boundaries when trained on the same dataset. The following highlights the difference between the generative GDA model and the discriminative logistic regression model (a small numerical comparison follows the list):

- For GDA, if
$$p(x|y)$$ is multivariate Gaussian with a shared covariance matrix, then $$p(y=1|x)$$ necessarily has the form of a sigmoid function. But the converse is not true: there exist discriminative models that do not have a generative counterpart.
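
The comparison above can be sketched as follows: fit GDA in closed form and fit logistic regression on the same synthetic dataset (which happens to satisfy the GDA assumption), then compare the two linear boundaries. The use of `sklearn.linear_model.LogisticRegression`, the weak-regularization setting `C=1e6`, and the synthetic data are illustrative choices; the two boundaries are typically close on such data but not identical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic data that satisfies the GDA assumption (shared covariance).
Sigma_true = np.array([[1.0, 0.3], [0.3, 0.5]])
X0 = rng.multivariate_normal([0.0, 0.0], Sigma_true, size=300)
X1 = rng.multivariate_normal([2.0, 1.0], Sigma_true, size=300)
X = np.vstack([X0, X1])
y = np.array([0] * 300 + [1] * 300)

# GDA: closed-form MLE, then convert to a linear decision boundary.
phi = np.mean(y == 1)
mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
diff = X - np.where((y == 1)[:, None], mu1, mu0)
Sigma = diff.T @ diff / len(y)
Sigma_inv = np.linalg.inv(Sigma)
theta_gda = Sigma_inv @ (mu1 - mu0)
theta0_gda = 0.5 * (mu0 + mu1) @ Sigma_inv @ (mu0 - mu1) - np.log((1 - phi) / phi)

# Logistic regression: fits the boundary directly from p(y|x).
clf = LogisticRegression(C=1e6, max_iter=1000).fit(X, y)  # large C ~ little regularization

print("GDA boundary:     ", theta_gda, theta0_gda)
print("Logistic boundary:", clf.coef_[0], clf.intercept_[0])
```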

assets/img/posts/gaussian_discriminant_analysis.png

66.9 KB (binary image added)
