Email Extractor is a simple Python automation script that scans a text file, extracts all email addresses, removes duplicate entries, and generates a detailed report.
This project was developed as part of the CodeAlpha Python Programming Internship to demonstrate the use of file handling, regular expressions, and basic data analysis in Python.
- Extracts email addresses from a text file
- Removes duplicate email addresses
- Saves extracted emails to a separate file
- Generates a report with statistics
- Displays the total number of emails found
- Provides domain-wise email distribution
- Python 3
- Regular Expressions (re)
- Collections Module (Counter)
- File Handling
CodeAlpha_EmailExtractor/
│
├── email_extractor.py
├── input.txt
├── extracted_emails.txt
├── report.txt
└── README.md
- The user provides a text file containing email addresses.
- The script scans the file using a regular expression.
- All email addresses are extracted.
- Duplicate emails are removed automatically.
- Extracted emails are saved to extracted_emails.txt.
- A summary report is generated in report.txt.
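The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of email_extractor.py: the pattern and the function names (`extract_emails`, `unique_sorted`, `extract_to_file`) are assumptions made for the example. The regular expression is a deliberately simplified matcher and does not validate addresses against the full email specification.

```python
import re

# Assumed pattern: a simplified email matcher, not a full validator.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return every email-like match in the text, duplicates included."""
    return EMAIL_PATTERN.findall(text)


def unique_sorted(emails):
    """Remove duplicates with a set, then sort for a stable order."""
    return sorted(set(emails))


def extract_to_file(input_path, output_path):
    """Read input_path, write one unique email per line to output_path.

    Returns (all matches, unique matches) so a report can use both counts.
    """
    with open(input_path, encoding="utf-8") as f:
        found = extract_emails(f.read())
    unique = unique_sorted(found)
    with open(output_path, "w", encoding="utf-8") as f:
        for email in unique:
            f.write(email + "\n")
    return found, unique
```

Under these assumptions, calling `extract_to_file("input.txt", "extracted_emails.txt")` would produce the output file and return both lists. A set is used for deduplication because it is the simplest approach, but it discards the original order, which is why the result is sorted afterwards.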
Sample input (input.txt):
Hello,
Contact us at support@gmail.com
For internship queries:
shivansh@gmail.com
team@yahoo.com
hr@outlook.com
support@gmail.com
Extracted emails (extracted_emails.txt):
hr@outlook.com
shivansh@gmail.com
support@gmail.com
team@yahoo.com
Generated report (report.txt):
EMAIL EXTRACTION REPORT
==============================
Total Emails Found: 5
Unique Emails: 4
Domain Statistics:
gmail.com : 2
yahoo.com : 1
outlook.com : 1
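A report of this shape could be assembled with `collections.Counter`, as in the sketch below. The function name `build_report` is hypothetical. Counting domains over the unique addresses is an assumption inferred from the sample figures, where the domain counts add up to the 4 unique emails and not to the 5 found. The order of domains with equal counts may differ from the sample.

```python
from collections import Counter


def build_report(found, unique):
    """Build the report text from all matches and the deduplicated list."""
    # Assumption: domain statistics are counted over unique addresses.
    domains = Counter(email.split("@")[1].lower() for email in unique)
    lines = [
        "EMAIL EXTRACTION REPORT",
        "=" * 30,
        f"Total Emails Found: {len(found)}",
        f"Unique Emails: {len(unique)}",
        "Domain Statistics:",
    ]
    # most_common() lists the most frequent domains first.
    for domain, count in domains.most_common():
        lines.append(f"{domain} : {count}")
    return "\n".join(lines)
```

The returned string can then be written to report.txt with an ordinary `open(..., "w")` call.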
git clone https://github.com/your-username/CodeAlpha_EmailExtractor.git
cd CodeAlpha_EmailExtractor
python email_extractor.py
Example:
input.txt
Through this project, I learned:
- File handling in Python
- Working with regular expressions
- Data extraction techniques
- Managing duplicate data using sets
- Generating reports automatically
- Basic automation scripting
Possible future enhancements:
- Support for PDF files
- Support for Word documents
- Export results to CSV format
- Email validation checks
- Graphical User Interface (GUI)
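As an illustration of one of these ideas, CSV export could be added with the standard `csv` module. This is only a sketch of a feature that is not part of the current script. The function name `export_to_csv` and the column names `email` and `domain` are assumptions made for the example.

```python
import csv


def export_to_csv(emails, output_path):
    """Write one row per email, with its domain in a second column."""
    # newline="" lets the csv module control line endings itself.
    with open(output_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["email", "domain"])  # hypothetical header names
        for email in emails:
            writer.writerow([email, email.split("@")[1]])
```

A call such as `export_to_csv(unique_emails, "extracted_emails.csv")` would then produce a file that opens directly in spreadsheet software.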
Shivansh Bajaj
Python Programming Intern – CodeAlpha