feat(docs): Improve section 1.0 of the course (#17)
* I've made a commit to enhance the "System" section (1.0) of the course with the following improvements:
- Included emojis in the title sections to make them more engaging.
- Enriched the content to be easier to follow, with more structured information and clearer explanations.
- Added a new section on "Why System Setup Matters for MLOps" to provide more value to you.
- Refined the "System additional resources" section with descriptions for each link.
* update
---------
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: Médéric Hurier (Fmind) <[email protected]>
Changed file: docs/1. Initializing/1.0. System.md (50 additions & 22 deletions)
@@ -4,41 +4,69 @@ description: This section ensures your system is adequately prepared, outlining
4
4
5
5
# 1.0. System
6
6
7
-
## What system is recommended for this course?
7
+
Before we dive into the exciting world of machine learning operations, it's crucial to ensure that your system is properly set up. A well-configured environment is foundational for a smooth learning experience, enabling you to focus on the concepts rather than troubleshooting technical issues. This section will guide you through the essential prerequisites, from operating system compatibility to hardware specifications and software installations.
8
8
9
-
This course is tailored to be accessible on a wide range of operating systems including Linux, Chromebook, macOS, and Windows. Although there are no stringent hardware prerequisites, having a computer with enough CPU and RAM specifications is crucial for processing datasets efficiently. This ensures that you can fully engage with the course activities, regardless of your system preferences or the devices you have at your disposal.
9
+
## 💻 What system is recommended for this course?
10
10
11
-
## Can you use [JupyterLab](https://jupyterlab.readthedocs.io/en/latest/) or [Google Colab](https://colab.google/) for this course?
11
+
This course is designed to be compatible with a variety of operating systems to ensure broad accessibility. Here are the recommended setups:
12
12
13
-
While [JupyterLab](https://jupyterlab.readthedocs.io/en/latest/) is an acceptable environment for this course, it's worth noting that we've optimized the course content for [Visual Studio Code (VS Code)](https://code.visualstudio.com/). VS Code offers a broader set of features that are specifically designed to enhance your learning experience in this course. Although Jupyter notebooks and Google Colab can be suitable for initial stages, like the Prototyping chapter, you may find that later sections of the course require functionalities, such as terminal access and file system navigation, that are more efficiently executed in VS Code.
13
+
- **Operating Systems:**
14
+
    - **Linux**
15
+
    - **macOS**
16
+
    - **Windows** with Windows Subsystem for Linux 2 (WSL 2)
17
+
    - **Chromebook** with a model that supports Linux applications
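To see which of the setups listed above a machine falls under, a small check along these lines may help. This is a minimal sketch, not part of the course material: the function name `describe_system` is hypothetical, and detecting WSL by looking for "microsoft" in the kernel release string is a common heuristic rather than an official API.

```python
import platform


def describe_system(system: str, release: str) -> str:
    """Map platform identifiers to one of the setups listed above (hypothetical helper)."""
    if system == "Linux":
        # Heuristic (assumption): WSL kernels include "microsoft" in the release string.
        if "microsoft" in release.lower():
            return "Windows (WSL)"
        # A Chromebook's Linux environment also reports itself as plain Linux.
        return "Linux"
    if system == "Darwin":
        return "macOS"
    if system == "Windows":
        return "Windows (native, consider WSL 2)"
    return f"Unrecognized ({system})"


if __name__ == "__main__":
    # platform.system() and platform.release() are standard-library calls.
    print(describe_system(platform.system(), platform.release()))
```

Passing the identifiers in as arguments keeps the mapping easy to test on any machine.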
14
18
15
-
## Are additional software installations required?
19
+
## 🧪 Can you use [JupyterLab](https://jupyterlab.readthedocs.io/en/latest/) or [Google Colab](https://colab.google/) for this course?
16
20
17
-
Engaging with this course material necessitates installing several key software packages, including Python, uv, git, and VS Code. These tools form the backbone of your development workflow:
21
+
While notebook environments like JupyterLab and Google Colab are excellent for experimentation and the initial stages of a project (as covered in the Prototyping chapter), this course is optimized for [Visual Studio Code (VS Code)](https://code.visualstudio.com/). VS Code provides a more comprehensive development environment that is better suited for the later stages of the MLOps lifecycle, which involve productionizing, validating, and refining code.
18
22
19
-
- **Python** is indispensable for all course-related coding activities.
20
-
- **uv** offers an efficient way to manage Python package dependencies.
21
-
- **Git** is crucial for version control and collaboration.
22
-
- **VS Code** is recommended for its integrated development environment (IDE) capabilities, although alternatives may be used based on personal preference or specific needs.
23
+
Here's a quick comparison of why VS Code is recommended for the full scope of this course:
23
24
24
-
Detailed instructions for installing these software packages are provided in their respective course chapters.
|**Code Refactoring**| ✅ Extensive support | ❌ Basic or no support |
32
+
|**Full Project Lifecycle**| ✅ Ideal for development to production | ✅ Best for prototyping & analysis |
25
33
26
-
## What are the specific hardware requirements for MLOps projects?
34
+
## 📦 Are additional software installations required?
27
35
28
-
MLOps projects vary significantly in their complexity and demands on hardware, from simple tabular data analyses to complex machine learning models like transformers:
36
+
Yes, a few key software installations are necessary to follow along with the course. These tools form the backbone of a modern development workflow:
29
37
30
-
- **Tabular Data Projects**: Projects utilizing libraries like scikit-learn or XGBoost typically don't require specialized hardware, though an optional GPU could enhance performance for certain tasks.
31
-
- **Multimedia Data Projects**: For projects involving TensorFlow or PyTorch for processing images or video data, access to at least one GPU is beneficial for faster processing.
32
-
- **Large Dataset Projects**: Advanced projects that employ transformers or require extensive parallel processing may need multiple GPUs, possibly distributed across several machines for optimal performance.
38
+
- **Python:** The primary programming language for all coding activities.
39
+
- **uv:** An extremely fast Python package installer and resolver, used for managing project dependencies efficiently.
40
+
- **Git:** The standard for version control, essential for tracking changes and collaborating on code.
41
+
- **VS Code:** The recommended Integrated Development Environment (IDE) for its powerful features and extensions.
33
42
34
-
It's often best to start with a straightforward setup, such as developing models on a local machine with sample data, before scaling to more complex arrangements like cloud-based resources for deployment and broader testing. Cloud platforms also enable running multiple experiments simultaneously, which can expedite the development process.
43
+
Each of these tools is covered in detail in the upcoming sections of this chapter, with step-by-step installation guides.
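Once the tools are installed, a quick way to confirm they are reachable from the command line is to look them up on the `PATH`. The sketch below is illustrative only: the executable names (`python3`, `uv`, `git`, `code`) are assumptions that may differ on your system (for example, `python` on Windows, or `code` missing until the VS Code shell command is installed), and the helper names are hypothetical.

```python
import shutil

# Assumed executable names; adjust them to match your platform.
REQUIRED_TOOLS = ["python3", "uv", "git", "code"]


def find_tools(tools, which=shutil.which):
    """Return a mapping of tool name to its resolved path, or None if not found."""
    return {tool: which(tool) for tool in tools}


def missing_tools(found):
    """List the tools that could not be located on the PATH."""
    return sorted(name for name, path in found.items() if path is None)


if __name__ == "__main__":
    found = find_tools(REQUIRED_TOOLS)
    for name, path in found.items():
        print(f"{name}: {path or 'not found'}")
    if missing_tools(found):
        print("Missing:", ", ".join(missing_tools(found)))
```

The `which` parameter defaults to the standard-library `shutil.which` and exists only so the lookup can be replaced in tests.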
35
44
36
-
## Is it possible to use cloud-based systems?
45
+
## ⚙️ What are the specific hardware requirements for MLOps projects?
46
+
47
+
The hardware required for MLOps projects can vary significantly based on the project's complexity and the type of data involved. Here’s a general guide:
48
+
49
+
- **Basic Projects (e.g., Tabular Data):** For models using libraries like scikit-learn or XGBoost, a standard local machine is usually sufficient. A GPU is generally not required but can be beneficial for certain models.
50
+
- **Intermediate Projects (e.g., Multimedia Data):** When working with images or video, using deep learning frameworks like TensorFlow or PyTorch, access to at least one GPU becomes highly beneficial for faster model training.
51
+
- **Advanced Projects (e.g., Large-Scale Models):** For projects involving large models like transformers or extensive parallel processing, you may need multiple GPUs, potentially distributed across several machines.
52
+
53
+
A practical approach is to start with a local setup for initial development and then scale to cloud-based resources as needed for more intensive training and deployment. Cloud platforms also enable running multiple experiments simultaneously, which can expedite the development process.
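To get a rough idea of what a local machine offers before deciding whether to scale out, something like the following could be used. It is a sketch under stated assumptions: the function name `hardware_summary` is hypothetical, and treating the presence of the `nvidia-smi` executable as a sign of an NVIDIA GPU is only a heuristic (it says nothing about other GPU vendors or whether a framework can actually use the device). The standard library has no portable call for total RAM, so memory is left out.

```python
import os
import shutil


def hardware_summary(which=shutil.which):
    """Collect a coarse view of local compute resources (hypothetical helper)."""
    return {
        # os.cpu_count() may return None when the count cannot be determined.
        "cpu_count": os.cpu_count(),
        # Heuristic (assumption): nvidia-smi on the PATH suggests an NVIDIA driver is installed.
        "nvidia_smi_found": which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for key, value in hardware_summary().items():
        print(f"{key}: {value}")
```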
54
+
55
+
## 🤔 Why System Setup Matters for MLOps
56
+
57
+
A well-configured system is more than just a matter of convenience; it's a cornerstone of effective MLOps. Here’s why:
58
+
59
+
- **Reproducibility:** A consistent and well-documented setup ensures that your experiments are reproducible. This means you (or your teammates) can run the same code and get the same results, which is crucial for validating models and tracking progress.
60
+
- **Scalability:** As your projects grow in complexity, your system needs to be able to scale. A proper setup allows you to transition smoothly from local development to more powerful cloud-based resources without major overhauls.
61
+
- **Collaboration:** When working in a team, a standardized development environment ensures that everyone is on the same page. This minimizes the "it works on my machine" problem and streamlines the collaboration process.
62
+
- **Efficiency:** A properly configured system with the right tools can significantly boost your productivity. From faster package management to integrated debugging, a good setup lets you focus on building and deploying models, not fighting with your tools.
63
+
64
+
## ☁️ Is it possible to use cloud-based systems?
37
65
38
66
This course supports both local and [cloud-based development environments](../6. Sharing/6.5. Workstations.md), including options like [GitHub Codespaces](https://github.com/features/codespaces) and [Cloud Workstations](https://cloud.google.com/workstations). Cloud platforms offer considerable benefits, such as standardized development environments for easier team collaboration and enhanced security measures for your data. Nonetheless, it's crucial to understand any specific setup requirements and to manage resources effectively, especially when navigating the limitations of free tiers or usage quotas on these services.
- [MLOps Landscape in 2024: Top Tools and Platforms](https://neptune.ai/blog/mlops-tools-platforms-landscape)
70
+
- **[GitHub Codespaces](https://github.com/features/codespaces):** A cloud-based development environment that allows you to code directly from your browser, with pre-configured setups for a seamless experience.
71
+
- **[Google Cloud Workstations](https://cloud.google.com/workstations):** A fully managed development environment on Google Cloud, offering a secure and scalable solution for remote development.
72
+
- **[MLOps Landscape in 2024: Top Tools and Platforms](https://neptune.ai/blog/mlops-tools-platforms-landscape):** A comprehensive overview of the current MLOps landscape, covering a wide range of tools and platforms that are shaping the industry.