When discussing **generative models**, it's essential to understand how machine learning approaches a classification task, such as distinguishing elephants from dogs. There are two main strategies:

1. **Discriminative Modeling:** This approach involves building a model that directly predicts classification labels or identifies the decision boundary between elephants and dogs.
2. **Generative Modeling:** This approach entails constructing separate models for elephants and dogs, capturing their respective characteristics. A new animal is then compared against each model to determine which it resembles more closely.

In discriminative modeling, the focus is on learning the conditional probability of labels given the input data, denoted as $$p(y \mid x)$$. Techniques like logistic regression exemplify this by modeling the probability of a label based on input features. Alternatively, methods such as the perceptron algorithm aim to find a decision boundary that maps new observations to specific labels $$\{0,1\}$$, such as $$0$$ for dogs and $$1$$ for elephants.
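
For instance, once its parameters have been learned, a perceptron-style classifier reduces to a single decision rule. The sketch below is only an illustration with made-up parameter values (the training procedure is omitted); it shows how a discriminative model maps a new observation directly to a label in $$\{0,1\}$$.

```python
import numpy as np

def perceptron_predict(theta, x):
    # Discriminative decision rule: the sign of theta^T x places x on one side
    # of the decision boundary (1 = elephant) or the other (0 = dog).
    return 1 if theta @ x >= 0 else 0

# Made-up parameters and a new observation; x[0] = 1 is an intercept feature.
theta = np.array([-0.5, 1.2, -0.7])
x_new = np.array([1.0, 0.9, 0.3])

print(perceptron_predict(theta, x_new))  # -> 1 for these illustrative numbers
```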

Conversely, generative modeling focuses on understanding how the data is generated by learning the joint probability distribution $$p(x,y)$$, or the likelihood $$p(x \mid y)$$ along with the prior probability $$p(y)$$. This approach models the distribution of the input data for each class, enabling the generation of new data points and facilitating classification by applying Bayes' theorem to compute the posterior probability:

$$
p(y \mid x) = \frac{p(x \mid y)\,p(y)}{p(x)}
$$

The denominator $$p(x)$$ is the marginal probability, obtained by summing the joint probability $$p(x,y)$$ over all possible labels $$y$$:

$$
\begin{aligned}
p(x) &= \sum_{y} p(x,y) \\
     &= \sum_{y} p(x \mid y)\,p(y) \\
     &= p(x \mid y=0)\,p(y=0) + p(x \mid y=1)\,p(y=1)
\end{aligned}
$$

In fact, $$p(x)$$ acts as a normalization constant: it does not depend on the label $$y$$, so it takes the same value for every candidate label. When calculating the most likely label for a given $$x$$, we can therefore drop the denominator entirely:

$$
\arg\max_{y} p(y \mid x) = \arg\max_{y} \frac{p(x \mid y)\,p(y)}{p(x)} = \arg\max_{y} p(x \mid y)\,p(y)
$$
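
As a minimal sketch of how a generative classifier uses this fact, suppose (purely for illustration) that each class-conditional density $$p(x \mid y)$$ is a one-dimensional Gaussian and the prior $$p(y)$$ is known. The numbers below are made up; the point is that the prediction compares $$p(x \mid y)\,p(y)$$ across labels and never needs to evaluate $$p(x)$$.

```python
import numpy as np

# Hypothetical 1-D generative model: one Gaussian per class plus a class prior.
# p(x | y=0) = N(0, 1),  p(x | y=1) = N(3, 1),  p(y=1) = 0.3
class_params = {0: (0.0, 1.0), 1: (3.0, 1.0)}   # label -> (mean, std)
prior = {0: 0.7, 1: 0.3}

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def predict(x):
    # Compare p(x | y) * p(y) for each label; p(x) is skipped because it is
    # the same for every label and therefore cannot change the argmax.
    scores = {y: gaussian_pdf(x, mu, sigma) * prior[y]
              for y, (mu, sigma) in class_params.items()}
    return max(scores, key=scores.get)

print(predict(0.5))  # -> 0: far more likely under the class-0 Gaussian
print(predict(2.8))  # -> 1: far more likely under the class-1 Gaussian
```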

Let's consider a binary classification task: labeling emails as spam or not spam. Here, $$x^{(i)}$$ is the feature vector of the $$i$$-th email, and $$y^{(i)}$$ is the label indicating whether the email is spam ($$1$$) or not spam ($$0$$). The following examples show how discriminative and generative models approach the same problem differently.

### Example 1: Logistic Regression as a Discriminative Model

Since the label $$y$$ can only take on the values $$0$$ or $$1$$, it makes sense to choose a hypothesis $$h_{\theta}(x)$$ that ranges in $$[0,1]$$ and represents the probability $$p(y=1 \mid x)$$. We can then threshold $$h_{\theta}(x)$$ at $$0.5$$ to predict whether an email is spam or not. The logistic (sigmoid) function fits this case well, as it is bounded between $$0$$ and $$1$$ for all $$z\in(-\infty, +\infty)$$:

$$
h_{\theta}(x) = g(\theta^T x) = \frac{1}{1+e^{-\theta^T x}}, \qquad g(z) = \frac{1}{1+e^{-z}}
$$

From the plot of $$g(z)$$, we can see that $$g(z)$$ tends to $$0$$ as $$z\to-\infty$$ and tends to $$1$$ as $$z\to+\infty$$; when $$z=0$$, $$g(z)=0.5$$. Thus $$g(z)$$, and hence $$h_{\theta}(x)$$, is always bounded between $$0$$ and $$1$$. Keeping the convention of letting $$x_0=1$$, we can write the argument of the hypothesis as $$z = \theta^T x = \theta_0 + \sum_{j=1}^n \theta_j x_j$$, where $$\theta_0$$ is the bias term and $$\theta_j$$ is the weight of the $$j$$-th feature $$x_j$$. Note that other functions that increase smoothly and monotonically from $$0$$ to $$1$$ could also be considered for $$h_{\theta}(x)$$.
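
As a small sketch (with made-up, unfitted parameter values), the hypothesis can be evaluated directly from $$\theta$$ and a feature vector that includes $$x_0 = 1$$, and the $$0.5$$ threshold turns the resulting probability into a spam/not-spam prediction.

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^{-z}), bounded between 0 and 1
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x):
    # Hypothesis h_theta(x) = g(theta^T x), interpreted as p(y=1 | x).
    return sigmoid(theta @ x)

# Hypothetical parameters and features for one email (x[0] = 1 is the bias term).
theta = np.array([-1.0, 2.5, 0.8])   # [theta_0, theta_1, theta_2]
x = np.array([1.0, 0.6, 0.2])        # [x_0, x_1, x_2]

p_spam = h(theta, x)
prediction = 1 if p_spam >= 0.5 else 0   # threshold at 0.5
print(p_spam, prediction)
```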

<!-- to check:notes page22 -->

### Example 2: Gaussian Discriminant Analysis as a Generative Model
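
As a preview of the idea, here is a minimal sketch of GDA under its usual assumptions (binary labels, one Gaussian per class with a shared covariance matrix, maximum-likelihood estimates); the function names are illustrative, not a reference implementation. Classification follows exactly the $$\arg\max_y p(x \mid y)\,p(y)$$ rule derived above.

```python
import numpy as np

def fit_gda(X, y):
    # Fit one Gaussian per class with a shared covariance matrix, plus the prior p(y=1).
    phi = y.mean()                                   # p(y=1)
    mu0 = X[y == 0].mean(axis=0)
    mu1 = X[y == 1].mean(axis=0)
    centered = X - np.where(y[:, None] == 1, mu1, mu0)
    sigma = centered.T @ centered / len(y)           # shared covariance
    return phi, mu0, mu1, sigma

def gaussian_density(x, mu, sigma):
    # Multivariate normal density p(x | y) for one class.
    d = len(mu)
    diff = x - mu
    norm_const = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(sigma))
    return norm_const * np.exp(-0.5 * diff @ np.linalg.solve(sigma, diff))

def predict(x, phi, mu0, mu1, sigma):
    # argmax over y of p(x | y) p(y); p(x) is omitted as before.
    score0 = gaussian_density(x, mu0, sigma) * (1 - phi)
    score1 = gaussian_density(x, mu1, sigma) * phi
    return int(score1 > score0)
```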