diff --git a/docs/1. Initializing/1.4. git.md b/docs/1. Initializing/1.4. git.md
index 79db059e..91aaf2dc 100644
--- a/docs/1. Initializing/1.4. git.md
+++ b/docs/1. Initializing/1.4. git.md
@@ -2,18 +2,18 @@
 description: Learn how to use Git for version control and collaboration in MLOps projects, enabling you to track changes, revert to previous versions, and work with others effectively.
 ---

-# 1.4. Git
+# 1.4. Git 🌿

-## What is Git?
+## What is Git? 🤔

 [Git](https://git-scm.com/) is a distributed version control system that is integral for managing both small and large projects effectively. It excels in tracking source code changes during software development, enabling multiple developers to collaborate on the same project without conflicts. Git is highly regarded for its robust data integrity, versatility, and support for complex, nonlinear development workflows.
-[Figure: Git from XKCD]
+[Figure: Git from XKCD]
 [Figure caption: Git (source)]
-## Why do you need Git?
+## Why is Git Essential? 🎯

 Git serves several critical purposes in software development:

@@ -22,41 +22,123 @@ Git serves several critical purposes in software development:
 - **Backup and Restore**: With changes stored in a repository, Git acts as a backup mechanism. You can revert your project to a prior state or retrieve lost data as needed.
 - **Branching and Merging**: Git enables you to create branches for experimenting or developing new features independently of the main project, which can later be merged back into the mainline without disrupting the ongoing development.

-## How can you install Git?
+## Installation & Setup 🛠️

-To install Git, consult the [Git Installation Guide](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git), which offers comprehensive instructions for a variety of operating systems. This ensures you can efficiently set up Git on any development environment.
+### Installation
+
+To install Git, consult the [Git Installation Guide](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git), which offers comprehensive instructions for a variety of operating systems.

 ```bash
-# installation on mac with brew
+# Installation on macOS with Homebrew
 brew install git
-# install on linux with apt
+# Installation on Debian/Ubuntu with APT
+sudo apt update
 sudo apt install git
 ```

-## How should you use Git for your project?
+### Initial Configuration
+
+Before you start using Git, configure it with your name and email address. This matters because Git records this information in every commit you create.
+
+```bash
+git config --global user.name "Your Name"
+git config --global user.email "youremail@example.com"
+```
+
+You can verify your configuration with:
+```bash
+git config --list
+```
+
+## The Git Workflow: A Practical Guide 🚀
+
+Understanding the basic Git workflow is key to using it effectively. It revolves around three main areas:
+
+1. **Working Directory**: This is your local project folder with all its files.
+2. **Staging Area (Index)**: This is an intermediate area where you build up your next commit. It lets you choose which changes to include.
+3. **Repository (`.git` directory)**: This is where Git stores the project's history and metadata. A commit takes the files from the staging area and stores that snapshot permanently in your repository.
+
+Here's a step-by-step guide to a common workflow:
+
+1. **Initialize a Repository**: In your project directory, run this command to create a new local Git repository.
+   ```bash
+   git init
+   ```
+
+2. **Check the Status**: Use `git status` to see the state of your working directory and staging area. It shows which files are new, modified, or staged.
+   ```bash
+   git status
+   ```
+
+3. **Stage Files**: Add files to the staging area to prepare them for a commit. You can add one file or all of them.
+   ```bash
+   # Stage a specific file
+   git add README.md

-For Git beginners, starting with a foundational tutorial, such as the one provided by [GitHub's Git Tutorial](https://docs.github.com/en/get-started/using-git/about-git), is recommended. Here's a simplified guide to using Git in your project:
+   # Stage all changes (new, modified, and deleted files) under the current directory
+   git add .
+   ```

-1. **Initialize Git**: In your project directory, execute `git init` to create a new local Git repository.
-2. **Stage Files**: Stage files for your next commit with `git add `, for instance, `git add README.md LICENSE.txt`.
-3. **Check Status**: Use `git status` to view staged changes, unstaged changes, and untracked files.
-4. **Commit Changes**: With `git commit -m "Initial Commit"`, commit your staged changes to the repository, including a descriptive message about the changes.
+4. **Commit Changes**: Commit the staged files to your repository. The commit message should be a short, descriptive summary of the changes.
+   ```bash
+   git commit -m "feat: Add initial project files"
+   ```

-## Should you commit every file in your project?
+5. **View History**: Use `git log` to see the history of commits in your repository.
+   ```bash
+   git log
+   ```

-When using Git, it's important to selectively track files. Consider the following guidelines:
+## Branching: Working in Parallel 🌿

-- **Exclude Secrets**: Sensitive data, such as API keys and passwords, should never be committed to your repository.
-- **Manage Large Files**: For files exceeding 100MB (e.g., dataset files), use Git Large File Storage ([git-lfs](https://git-lfs.github.com/)) instead of directly committing them to your Git repository.
-- **Omit Cache Files**: Do not track temporary or environment-specific files (e.g., `.venv`, `mlruns`, log files) that don't contribute to the project's primary function.
+Branches allow you to work on different features or experiments without affecting the main codebase (often called the `main` or `master` branch; the commands below assume `main`).

-To exclude certain files and directories from being tracked, create a `.gitignore` file in your project's root directory. This file should list patterns to match filenames you wish to exclude, for example:
+- **Create a new branch and switch to it**:
+  ```bash
+  git checkout -b new-feature-branch
+  ```
+- **Switch back to the main branch**:
+  ```bash
+  git checkout main
+  ```
+- **Merge the new feature branch into main**:
+  ```bash
+  git merge new-feature-branch
+  ```
+
+## Connecting to the World: Remote Repositories 🌐
+
+So far, everything has been on your local machine. To collaborate or to have a backup of your code, you use remote repositories (like on GitHub).
+
+1. **Add a remote repository**:
+   ```bash
+   git remote add origin
+   ```
+2. **Push your changes to the remote**:
+   ```bash
+   git push -u origin main
+   ```
+The `-u` flag sets the upstream branch, so next time you can just run `git push`.
+
+## What Not to Commit: The `.gitignore` File 🚫
+
+You don't want to commit every file to your repository. A `.gitignore` file tells Git which files or folders to ignore.
+
+- **Exclude Secrets**: Sensitive data like API keys and passwords should never be committed.
+- **Manage Large Files**: Use [Git Large File Storage (git-lfs)](https://git-lfs.github.com/) for files over 100MB (like datasets).
+- **Omit Temporary Files**: Ignore environment-specific files (`.venv`), cache (`.pytest_cache`), logs, and build artifacts (`/dist`).
+
+Here is an example `.gitignore` file:

 ```text
 # https://git-scm.com/docs/gitignore

-# Build
+# Environments
+.env
+.venv/
+
+# Build artifacts
 /dist/
 /build/

@@ -66,30 +148,32 @@ To exclude certain files and directories from being tracked, create a `.gitignor
 .mypy_cache/
 .ruff_cache/
 .pytest_cache/
+__pycache__/

-# Editor
-/.idea/
-/.vscode/
+# Editor-specific
+.vscode/
+.idea/
 .ipynb_checkpoints/

-# Environs
-.env
-/.venv/
-
-# Project
-/docs/*
+# Project-specific
 /mlruns/*
 /outputs/*
 !**/.gitkeep

-# Python
+# Python files
 *.py[cod]
-__pycache__/
 ```
+Adhering to these practices ensures your repository remains streamlined and secure.
+
+## 🔑 Key Takeaways

-Adhering to these practices ensures your repository remains streamlined, containing only pertinent project files and thus enhancing the clarity and efficiency of your development process.
+- **Version Control is Key**: Git is indispensable for tracking changes, collaborating, and maintaining project history.
+- **Understand the Workflow**: Master the cycle of modifying, staging, and committing changes.
+- **Branching for Parallel Development**: Use branches to isolate new features or experiments without disrupting the main codebase.
+- **Remote Repositories for Collaboration and Backup**: Push your local changes to platforms like GitHub for sharing and safekeeping.
+- **`.gitignore` for Clean Repositories**: Properly configure `.gitignore` to exclude unnecessary or sensitive files from your repository.

-## Git additional resources
+## Additional Resources 📚

 - **[`.gitignore` example from the MLOps Python Package](https://github.com/fmind/mlops-python-package/blob/main/.gitignore)**
 - [About Git](https://docs.github.com/en/get-started/using-git/about-git)
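The workflow this diff documents (initialize, stage, commit, ignore, branch, merge, push) can be exercised end to end in a throwaway directory. The script below is a minimal sketch and not part of the PR. It assumes Git 2.28 or later (for `git init -b`) and a bash shell, and it uses a local bare repository named `remote.git` as a stand-in for a hosted remote such as GitHub. The identity (`Example User`, `user@example.com`), the file `FEATURE.md`, and the two-line `.gitignore` are hypothetical values chosen for illustration. All settings are repository-local, so your global configuration is left untouched.

```bash
#!/usr/bin/env bash
# Hedged sketch (not from the PR): run the documented workflow in a temp directory.
# Assumes Git >= 2.28 for `git init -b`; identity values and file names are hypothetical.
set -euo pipefail

workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT   # remove the throwaway directory on exit
cd "$workdir"

# A local bare repository stands in for a hosted remote such as GitHub.
git init --quiet --bare -b main remote.git

# Initialize a repository whose first branch is named `main`.
git init --quiet -b main project
cd project

# Repository-local settings, so global configuration is not modified.
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

# Create files, inspect the status, stage them, and commit.
echo "# Example project" > README.md
printf '.venv/\n*.py[cod]\n' > .gitignore   # two patterns from the example .gitignore
git status --short
git add README.md .gitignore
git commit --quiet -m "feat: Add initial project files"

# Ignored files never show up as untracked.
mkdir -p .venv
touch .venv/marker module.pyc
git check-ignore -v module.pyc               # reports the .gitignore rule that matched
test -z "$(git status --porcelain)"          # nothing new, modified, or untracked

# Branching: commit on a feature branch, then merge it back into main.
git checkout --quiet -b new-feature-branch
echo "Feature notes" > FEATURE.md
git add FEATURE.md
git commit --quiet -m "feat: Add feature notes"
git checkout --quiet main
git merge --quiet new-feature-branch         # fast-forward merge

# View the history, one line per commit.
git --no-pager log --oneline

# Remotes: register the bare repository and push with upstream tracking.
git remote add origin "$workdir/remote.git"
git push --quiet -u origin main
git rev-parse --abbrev-ref --symbolic-full-name '@{u}'   # prints origin/main
```

Because the remote is a bare repository on disk, the `push -u` step and the upstream check should behave as they would against a hosted remote, without network access or credentials.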