Shivanshbajaj1/CodeAlpha_EmailExtractor

📧 Email Extractor

Overview

Email Extractor is a simple Python automation script that scans a text file, extracts all email addresses, removes duplicate entries, and generates a detailed report.

This project was developed as part of the CodeAlpha Python Programming Internship to demonstrate the use of file handling, regular expressions, and basic data analysis in Python.


Features

  • Extracts email addresses from a text file
  • Removes duplicate email addresses
  • Saves extracted emails to a separate file
  • Generates a report with statistics
  • Displays the total number of emails found
  • Reports how many unique addresses belong to each domain

Technologies Used

  • Python 3
  • Regular Expressions (re)
  • Collections Module (Counter)
  • File Handling

Project Structure

CodeAlpha_EmailExtractor/
│
├── email_extractor.py
├── input.txt
├── extracted_emails.txt
├── report.txt
└── README.md

How It Works

  1. The user enters the name of a text file that contains email addresses.
  2. The script scans the file using a regular expression.
  3. All email addresses are extracted.
  4. Duplicate emails are removed automatically.
  5. Extracted emails are saved to extracted_emails.txt.
  6. A summary report is generated in report.txt.
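
The extraction and de-duplication steps above can be sketched as follows. This is a minimal illustration, not the code in email_extractor.py: the regular expression is an assumed, deliberately simple pattern, and the function name extract_emails is hypothetical.

```python
import re

# Assumed pattern: a common, simplified email regex. The pattern used by
# the actual script is not shown in this README and may differ.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return (all matches in order of appearance, sorted unique matches)."""
    found = EMAIL_RE.findall(text)
    # A set drops duplicates; sorting gives a stable, readable order.
    unique = sorted(set(found))
    return found, unique


# Text taken from the Sample Input section of this README.
sample = """Hello,

Contact us at support@gmail.com

For internship queries:
shivansh@gmail.com
team@yahoo.com
hr@outlook.com
support@gmail.com
"""

found, unique = extract_emails(sample)
print(len(found), len(unique))
print("\n".join(unique))
```

Sorting the set is one way to get a repeatable order for extracted_emails.txt; a plain set alone does not guarantee any order.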

Sample Input

Hello,

Contact us at support@gmail.com

For internship queries:
shivansh@gmail.com
team@yahoo.com
hr@outlook.com
support@gmail.com

Sample Output

extracted_emails.txt

hr@outlook.com
shivansh@gmail.com
support@gmail.com
team@yahoo.com

report.txt

EMAIL EXTRACTION REPORT
==============================

Total Emails Found: 5
Unique Emails: 4

Domain Statistics:
gmail.com : 2
yahoo.com : 1
outlook.com : 1
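
A report of this shape could be assembled with Counter, roughly as sketched below. The function name build_report is hypothetical, and counting domains over the unique addresses is an assumption inferred from the sample figures (gmail.com : 2 with four unique emails). Domains with equal counts may appear in a different order than in the sample above.

```python
from collections import Counter


def build_report(found, unique):
    """Build the report text from all matches and the unique matches."""
    # Assumption: domain statistics are counted over unique addresses.
    domains = Counter(email.split("@", 1)[1].lower() for email in unique)
    lines = [
        "EMAIL EXTRACTION REPORT",
        "=" * 30,
        "",
        f"Total Emails Found: {len(found)}",
        f"Unique Emails: {len(unique)}",
        "",
        "Domain Statistics:",
    ]
    # most_common() lists the most frequent domains first.
    lines += [f"{domain} : {count}" for domain, count in domains.most_common()]
    return "\n".join(lines)


# Values matching the Sample Input section of this README.
found = [
    "support@gmail.com",
    "shivansh@gmail.com",
    "team@yahoo.com",
    "hr@outlook.com",
    "support@gmail.com",
]
unique = sorted(set(found))

report = build_report(found, unique)
print(report)
```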

Installation & Usage

Step 1: Clone the Repository

git clone https://github.com/Shivanshbajaj1/CodeAlpha_EmailExtractor.git

Step 2: Navigate to the Project Folder

cd CodeAlpha_EmailExtractor

Step 3: Run the Script

python email_extractor.py

Step 4: Enter the Input File Name

Example:

input.txt

Learning Outcomes

Through this project, I learned:

  • File handling in Python
  • Working with regular expressions
  • Data extraction techniques
  • Managing duplicate data using sets
  • Generating reports automatically
  • Basic automation scripting

Future Improvements

  • Support for PDF files
  • Support for Word documents
  • Export results to CSV format
  • Email validation checks
  • Graphical User Interface (GUI)
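
As an illustration of the CSV export idea, the sketch below writes each unique address with its domain using the standard csv module. It is not part of the current script; the function name write_emails_csv and the column names are hypothetical.

```python
import csv
import io


def write_emails_csv(unique_emails, handle):
    """Write one row per address, with its domain, to an open text handle."""
    writer = csv.writer(handle)
    writer.writerow(["email", "domain"])  # hypothetical column names
    for email in unique_emails:
        writer.writerow([email, email.split("@", 1)[1]])


# An in-memory buffer stands in for a real file in this sketch.
buffer = io.StringIO()
write_emails_csv(["hr@outlook.com", "team@yahoo.com"], buffer)
print(buffer.getvalue())
```

When writing to a real file, the csv documentation recommends opening it with newline="" so that row endings are not translated twice.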

Author

Shivansh Bajaj

Python Programming Intern – CodeAlpha
