When discussing **generative models**, it's essential to understand how machine learning approaches a task. Consider a scenario where we aim to distinguish between elephants and dogs; there are two main modeling approaches, discriminative and generative:

1. **Discriminative Modeling:** This approach involves building a model that directly predicts **classification labels** or identifies the **decision boundary** between elephants and dogs.
2. **Generative Modeling:** This approach entails constructing separate models for elephants and dogs, capturing their **respective characteristics**. A new animal is then compared against each model to determine which it resembles more closely.
In discriminative modeling, the focus is on learning the conditional probability of labels given the input data, denoted as $$p(y|x)$$. Techniques like logistic regression exemplify this by modeling the probability of a label based on input features. Alternatively, methods such as the perceptron algorithm aim to find a decision boundary that maps new observations to specific labels $$\{0,1\}$$, such as $$0$$ for dogs and $$1$$ for elephants.
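To make the discriminative view concrete, here is a minimal perceptron sketch (my own illustration; the toy feature values, labels, and learning rate are made up): it searches directly for a decision boundary and maps each new input to a label in $$\{0,1\}$$, without modeling what dogs or elephants look like.

```python
import numpy as np

def perceptron_train(X, y, lr=0.1, epochs=20):
    """Learn a linear decision boundary directly; no density over x is modeled."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            pred = 1 if w @ x_i + b > 0 else 0   # hard threshold -> label in {0, 1}
            w += lr * (y_i - pred) * x_i         # no change when the prediction is correct
            b += lr * (y_i - pred)
    return w, b

# Toy features (weight in tonnes, shoulder height in metres): 0 = dog, 1 = elephant
X = np.array([[0.03, 0.5], [0.01, 0.3], [4.0, 3.0], [5.5, 3.2]])
y = np.array([0, 0, 1, 1])
w, b = perceptron_train(X, y)
print([1 if w @ x + b > 0 else 0 for x in X])    # predicted labels for the training points
```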
Conversely, generative modeling focuses on understanding how the data is generated by learning the joint probability distribution $$p(x,y)$$, or the likelihood $$p(x|y)$$ along with the prior probability $$p(y)$$. This approach models the distribution of the input data for each class, enabling the generation of new data points and facilitating classification by applying Bayes' theorem to compute the posterior probability:

$$
p(y|x) = \frac{p(x|y)\,p(y)}{p(x)}
$$

where the evidence $$p(x)$$ is obtained by marginalizing the joint distribution over $$y$$:

$$
\begin{aligned}
p(x) &= \sum_{y} p(x,y) \\
&= \sum_{y} p(x|y)\,p(y)
\end{aligned}
$$
In fact, $$p(x)$$ acts as a **normalization constant**: it does not depend on the label $$y$$, so it stays the same no matter how $$y$$ varies. Therefore, when calculating $$p(y|x)$$ for classification, we do not need to compute $$p(x)$$:
$$
\begin{aligned}
\arg\max_{y} p(y|x) &= \arg\max_{y} \frac{p(x|y)\,p(y)}{p(x)} \\
&= \arg\max_{y} p(x|y)\,p(y)
\end{aligned}
$$
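To make this concrete, here is a minimal generative-classifier sketch (my own illustration; the toy data are made up, and I model $$p(x|y)$$ with simple independent per-feature Gaussians, which is only one possible choice): we fit a separate model for each class plus the prior $$p(y)$$, and classify by maximizing $$p(x|y)\,p(y)$$ without ever computing $$p(x)$$.

```python
import numpy as np

def fit_generative(X, y):
    """Fit p(x|y) as independent per-feature Gaussians and p(y) as class frequencies."""
    models = {}
    for c in np.unique(y):
        Xc = X[y == c]
        models[c] = {
            "prior": len(Xc) / len(X),          # p(y = c)
            "mean": Xc.mean(axis=0),
            "var": Xc.var(axis=0) + 1e-6,       # small floor to avoid zero variance
        }
    return models

def log_score(m, x):
    """log p(x|y) + log p(y); the shared term log p(x) is simply dropped."""
    log_lik = -0.5 * np.sum(np.log(2 * np.pi * m["var"]) + (x - m["mean"]) ** 2 / m["var"])
    return np.log(m["prior"]) + log_lik

def predict(models, x):
    return max(models, key=lambda c: log_score(models[c], x))

# Toy data (weight in tonnes, shoulder height in metres): 0 = dog, 1 = elephant
X = np.array([[0.03, 0.5], [0.01, 0.3], [4.0, 3.0], [5.5, 3.2]])
y = np.array([0, 0, 1, 1])
models = fit_generative(X, y)
print(predict(models, np.array([4.5, 3.1])))    # -> 1: resembles the elephant model more
```

Because each class has its own explicit model of the data, the fitted Gaussians could also be sampled to generate new synthetic feature vectors, which is exactly what makes this approach "generative".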
For logistic regression with hypothesis $$h_{\theta}(x)$$, differentiating the log-likelihood $$\ell(\theta)$$ with respect to $$\theta_j$$ on a single training example gives

$$
\frac{\partial}{\partial \theta_j} \ell(\theta) = \left(y - h_{\theta}(x)\right) x_j
$$
This yields the stochastic gradient ascent rule, where $$(y^{(i)}-h_{\theta}(x^{(i)}))x_j^{(i)}$$ is the gradient of the log-likelihood on the $$i$$-th training example:

$$
\theta_j := \theta_j + \alpha \left(y^{(i)} - h_{\theta}(x^{(i)})\right) x_j^{(i)}
$$
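As a rough sketch of this update in code (my own illustration; the synthetic data, learning rate $$\alpha$$, and number of epochs are made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_ascent(X, y, alpha=0.1, epochs=100):
    """Stochastic gradient ascent on the logistic-regression log-likelihood."""
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            h = sigmoid(theta @ x_i)            # h_theta(x^(i))
            theta += alpha * (y_i - h) * x_i    # theta_j += alpha * (y^(i) - h) * x_j^(i)
    return theta

# Synthetic toy data; the first column is a constant intercept feature
X = np.array([[1.0, 0.03, 0.5], [1.0, 0.01, 0.3], [1.0, 4.0, 3.0], [1.0, 5.5, 3.2]])
y = np.array([0, 0, 1, 1])
theta = sgd_ascent(X, y)
print(np.round(sigmoid(X @ theta)))             # predicted probabilities rounded to 0/1
```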
### Example 2: Gaussian Discriminant Analysis as a Generative Model
Let's say the feature vector $$x$$ of an email is built using TF-IDF[[2]](#references), which measures the importance of words in the email. TF-IDF (Term Frequency-Inverse Document Frequency) is a numerical statistic that reflects how important a word is to a document relative to a collection of documents (a corpus). It is calculated as:

$$
\text{tf-idf}_{t,d} = \text{tf}_{t,d} \times \log\frac{N}{\text{df}_t}
$$

where $$\text{tf}_{t,d}$$ is the frequency of term $$t$$ in document $$d$$, $$N$$ is the total number of documents in the corpus, and $$\text{df}_t$$ is the number of documents containing $$t$$.
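As a rough sketch of how these weights could be computed (my own illustration; the toy corpus is made up, this uses the plain log-scaled IDF from the formula above, and practical implementations usually add smoothing):

```python
import math
from collections import Counter

corpus = [
    "win money now",            # toy spam-like email
    "meeting schedule today",   # toy ham-like email
    "win the lottery money",
]
docs = [doc.split() for doc in corpus]
N = len(docs)

def tf_idf(term, doc):
    tf = Counter(doc)[term] / len(doc)              # term frequency within this email
    df = sum(1 for d in docs if term in d)          # number of emails containing the term
    return tf * math.log(N / df) if df else 0.0     # weight = tf * idf

print(tf_idf("money", docs[0]))   # appears in several emails -> lower weight
print(tf_idf("now", docs[0]))     # rare in the corpus -> higher weight
```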
[2] Manning, Christopher D., et al. "[Introduction to Information Retrieval](https://nlp.stanford.edu/IR-book/pdf/irbookprint.pdf)". Stanford University, 2009.