- Complete the problems using descriptive statistics and Python.
- Then, understand a new dataset.
- Use the knowledge learned to describe it statistically.
Follow the instructions below:
- Create a new repository by forking the Git project or by clicking here.
- Open the newly created repository in Codespaces using the Codespaces button extension.
- Once the Codespaces VS Code editor has finished opening, start your project by following the instructions below.
- Once you start working on the project, you will see a ./notebook/problems.ipynb file containing a list of exercises.
- Before starting, make sure to select the appropriate Kernel.
- When you open the notebook, a message will appear at the top indicating "Select Kernel".
- Click on "Select Kernel" (as shown in the image).
- A list with available options will be displayed. Select "Python Environments" and choose the Python version you want to use.
- Make sure to select the version specified in the devcontainer.json file, as this is the recommended one for the project.
Note: We also included a ./notebook/solutions.ipynb file. We strongly suggest you use it only if you have been stuck for more than 30 minutes, or if you have already finished and want to compare it with your approach.
Once you have finished solving the exercises, be sure to commit your changes, push them to your repository, and go to 4Geeks.com to upload the repository link.
You worked with a real dataset from the IMDb portal and applied descriptive statistics to analyze something uncommon: the length of movie titles. Through measures of central tendency, dispersion, and distribution shape, you discovered how cinema communicates even through its names.
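As a rough illustration of the measures mentioned above, the sketch below computes central tendency, dispersion, and skewness of title lengths using only the standard library. The `titles` list is a small hand-typed sample chosen for illustration, not the IMDb dataset from the notebook, and the variable names are assumptions; the notebook itself may use other tools.

```python
import statistics

# Hypothetical hand-typed sample; the exercise itself works with the IMDb dataset.
titles = [
    "Jaws",
    "Alien",
    "Rocky",
    "Fargo",
    "Psycho",
    "Casablanca",
    "The Godfather",
    "Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb",
]

# The variable under study: number of characters in each title.
lengths = [len(title) for title in titles]

# Central tendency
mean = statistics.mean(lengths)
median = statistics.median(lengths)
mode = statistics.mode(lengths)

# Dispersion (population standard deviation and range)
stdev = statistics.pstdev(lengths)
value_range = max(lengths) - min(lengths)

# Shape: population skewness, computed as the mean of the cubed z-scores.
skewness = sum(((x - mean) / stdev) ** 3 for x in lengths) / len(lengths)

print(f"mean={mean:.1f} median={median} mode={mode}")
print(f"stdev={stdev:.1f} range={value_range} skewness={skewness:.2f}")
```

In this sample a single very long title pulls the mean above the median and produces a positive skewness, which is the same pattern described for the real data.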
Post an interesting insight from your results, such as how long the most common titles are, how dispersed they are, or which title is the longest. Accompany it with a chart.
"What can statistics tell us about movie titles?
I analyzed 1,000 real titles from the IMDb dataset and discovered this with Python:
• Average length: 23 characters
• Mode: 15 characters
• The longest title exceeds 60 characters
Most titles are short, but some extremely long ones create a clear positive skew. Cinema can also be analyzed with data! #DataScience #Python #Statistics #IMDb #Visualization #Storytelling"
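The figures for a post like the one above could be gathered with a short script. The following is a minimal sketch, assuming a hand-typed `titles` list rather than the real IMDb data. It finds the longest title and the most common length, then prints a plain-text histogram as a stand-in for a chart. The 10-character bin width is an arbitrary choice, and in the notebook a plotting library would typically draw the chart instead.

```python
from collections import Counter

# Hypothetical hand-typed sample; replace with the titles loaded in the notebook.
titles = [
    "Jaws",
    "Alien",
    "Rocky",
    "Fargo",
    "Psycho",
    "Casablanca",
    "The Godfather",
    "Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb",
]

lengths = [len(title) for title in titles]

# Longest title and the most frequent title length.
longest_title = max(titles, key=len)
most_common_length, count = Counter(lengths).most_common(1)[0]

# Group lengths into bins of 10 characters (0-9, 10-19, ...).
bin_width = 10
bins = Counter((n // bin_width) * bin_width for n in lengths)

print(f"Longest title ({len(longest_title)} chars): {longest_title}")
print(f"Most common length: {most_common_length} chars ({count} titles)")

# Plain-text histogram: one '#' per title in each bin.
for start in sorted(bins):
    label = f"{start}-{start + bin_width - 1}"
    print(f"{label:>7} | {'#' * bins[start]}")
```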