The Telegram Snowball Sampling Tool is a Python-based utility designed for conducting comprehensive network analysis of Telegram channels through three main methods:
- Forwarded Messages - Automatically discovers channels through message forwards
- Channel Recommendations - Collects Telegram's built-in channel recommendations
- URL Extraction - Maps external connections by extracting URLs from messages
The tool creates detailed edge lists for network visualization and provides extensive analysis capabilities.
This tool implements multiple discovery methods to map the complex network structure of Telegram channels:
Snowball sampling discovers channels through forwarded messages, starting with a seed channel and expanding outward. This method identifies both the origin and dissemination paths of information, creating a directed network structure.
The tool leverages Telegram's built-in recommendation algorithm to discover topically related channels. This provides additional network insights beyond just forward relationships.
By capturing external URLs shared in messages, the tool maps connections between Telegram channels and external websites, providing a more comprehensive view of the information ecosystem.
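The forward-based expansion described above can be sketched in a few lines. This is a minimal illustration, not the tool's actual implementation: the `snowball` function and the in-memory `forwards` mapping are hypothetical stand-ins for data that would normally come from the Telegram API, and the edge direction (origin channel to forwarding channel) is an assumption.

```python
from collections import Counter


def snowball(seeds, forwards, iterations=3, min_mentions=1):
    """Expand outward from seed channels along forward relationships.

    `forwards` is a hypothetical stand-in for Telegram data: it maps a
    channel to the list of channels its messages were forwarded from.
    """
    discovered = set(seeds)
    frontier = list(seeds)
    edges = Counter()  # (origin, forwarding_channel) -> number of forwards

    for _ in range(iterations):
        mentions = Counter()
        for channel in frontier:
            for origin in forwards.get(channel, []):
                edges[(origin, channel)] += 1
                mentions[origin] += 1
        # Only new channels mentioned often enough are visited next.
        frontier = sorted(
            ch for ch, n in mentions.items()
            if n >= min_mentions and ch not in discovered
        )
        if not frontier:
            break
        discovered.update(frontier)
    return discovered, edges


forwards = {
    "seed": ["a", "a", "b"],
    "a": ["c", "c"],
    "b": ["seed"],
    "c": ["d"],
}
channels, edges = snowball(["seed"], forwards, iterations=2, min_mentions=2)
print(sorted(channels))
```

With a minimum of two mentions, channel `b` is recorded as an edge but never visited, which shows how the mention threshold prunes the search.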
A full run of the Telegram Snowball Sampling Tool can take several days, because the number of channels to process can grow exponentially with each iteration. If every channel leads to three new ones, for example, the first iteration yields 3 channels, the second 9, and the third 27. To keep the runtime manageable:
- Limit Iterations: Keep to 3 iterations or fewer to balance depth and runtime
- Filter Forwards: Focus on channels with multiple mentions to target relevant content
- Limit Posts Per Channel: Set a reasonable maximum for posts to check per channel
- Adjust Feature Settings: Selectively enable/disable recommendations and URL extraction based on your needs
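To get a rough feel for the growth before starting a run, the arithmetic can be sketched as follows. The `estimate_growth` helper is hypothetical, and it assumes the worst case in which every channel yields the same number of previously unseen channels; real runs usually grow more slowly because channels overlap.

```python
def estimate_growth(seeds, new_per_channel, iterations):
    """Return the new channels found per iteration and the overall total,
    assuming every channel yields `new_per_channel` unseen channels."""
    per_iteration = []
    frontier = seeds
    for _ in range(iterations):
        frontier *= new_per_channel
        per_iteration.append(frontier)
    return per_iteration, seeds + sum(per_iteration)


per_iteration, total = estimate_growth(seeds=1, new_per_channel=3, iterations=3)
max_messages = total * 100  # 100 posts checked per channel
print(per_iteration, total, max_messages)
```

Under these assumptions, one seed channel and three iterations give 40 channels in total, or up to 4,000 messages to check at 100 posts per channel.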
- Automated discovery of Telegram channels through three methods:
  - Forwarded message tracking
  - Channel recommendations retrieval
  - URL extraction from messages
- Customizable parameters for depth, frequency thresholds, and scope
- Comprehensive edge list creation for network analysis
- Network visualization ready output for tools like Gephi
- Network metrics calculation and analysis
- Environment-based configuration system
- Detailed logging for monitoring progress
telegram-snowball-sampling/
├── src/
│   └── telegram_snowball_sampling/
│       ├── __init__.py           # Package exports
│       ├── config.py             # Configuration manager
│       ├── edge_list.py          # Handles edge list creation
│       ├── merge_csv_data.py     # CSV merging utility
│       ├── recommendations.py    # Channel recommendations module
│       └── utils.py              # Utility functions
├── example_config.env            # Template environment variables
├── .env                          # Your environment variables (created from example_config.env)
├── main.py                       # Main application script
├── network_analysis.py           # Network analysis script
├── README.md                     # Project documentation
├── requirements.txt              # Python dependencies
├── EdgeList/                     # Created during execution - edge list files
├── merged/                       # Created during execution - merged results
├── network_analysis/             # Created during analysis - network metrics
└── results/                      # Created during execution - individual run results
- Python 3.10 or higher
- Telethon library
- NetworkX and Matplotlib libraries for analysis and visualization
- A registered Telegram application (for API credentials)
- All dependencies listed in requirements.txt
- Clone the repository:
git clone https://github.com/yourusername/telegram-snowball-sampling.git
cd telegram-snowball-sampling
- Create a virtual environment (optional but recommended):
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install the required dependencies:
pip install -r requirements.txt
The tool automatically creates a .env file from the template and will prompt you for your Telegram API credentials when first run. You can also manually configure the following options in the .env file:
Variable | Description | Default |
---|---|---|
TELEGRAM_API_ID | Your Telegram API ID | (required) |
TELEGRAM_API_HASH | Your Telegram API Hash | (required) |
TELEGRAM_SESSION_NAME | Name for the Telegram session | session_name |
DEFAULT_MIN_MENTIONS | Minimum mentions threshold | 5 |
DEFAULT_ITERATIONS | Number of iterations | 3 |
DEFAULT_MAX_POSTS | Maximum posts to check per channel | 100 |
DEFAULT_INCLUDE_RECOMMENDATIONS | Whether to include channel recommendations | True |
DEFAULT_RECOMMENDATIONS_DEPTH | Maximum depth for recommendations | 2 |
DEFAULT_INCLUDE_URLS | Whether to extract URLs from messages | True |
RESULTS_FOLDER | Directory for storing results | results |
MERGED_FOLDER | Directory for merged results | merged |
EDGE_LIST_FOLDER | Directory for edge list files | EdgeList |
EDGE_LIST_FILENAME | Name of the edge list file | Edge_List.csv |
MERGED_FILENAME | Name of the merged file | merged_channels.csv |
DEBUG | Enable debug logging | False |
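
As an illustration of how these variables might be read, the sketch below parses a few of them with the defaults from the table. It is not the tool's config.py: the `load_settings` and `as_bool` helpers and the returned key names are hypothetical, and only the variable names and defaults come from the table above.

```python
import os


def as_bool(value):
    """Interpret common truthy strings such as 'True', 'yes', or '1'."""
    return str(value).strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env=None):
    """Read sampling options from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    return {
        "min_mentions": int(env.get("DEFAULT_MIN_MENTIONS", "5")),
        "iterations": int(env.get("DEFAULT_ITERATIONS", "3")),
        "max_posts": int(env.get("DEFAULT_MAX_POSTS", "100")),
        "include_recommendations": as_bool(
            env.get("DEFAULT_INCLUDE_RECOMMENDATIONS", "True")
        ),
        "recommendations_depth": int(env.get("DEFAULT_RECOMMENDATIONS_DEPTH", "2")),
        "include_urls": as_bool(env.get("DEFAULT_INCLUDE_URLS", "True")),
        "debug": as_bool(env.get("DEBUG", "False")),
    }


# A plain dict stands in for the environment in this example.
settings = load_settings({"DEFAULT_ITERATIONS": "2", "DEFAULT_INCLUDE_URLS": "false"})
print(settings)
```

Passing a dict instead of reading `os.environ` directly keeps the example easy to try without touching a real .env file.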
Run the main script:
python main.py
The script will:
- Prompt for Telegram API credentials if not configured
- Ask for seed channels (comma-separated)
- Request parameters for iterations, minimum mentions, etc.
- Begin the data collection process using all enabled methods
- Save results to CSV and edge list files
- Offer to run network analysis on the collected data
Forwarded message tracking analyzes the messages in each channel to find forwards from other channels. This reveals how information flows between channels.
Channel recommendations retrieval collects Telegram's own channel recommendations for each discovered channel. These recommendations are based on Telegram's algorithm, which considers content similarity and user overlap.
URL extraction collects all URLs shared in messages across channels, creating connections between Telegram channels and external websites.
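The URL method can be illustrated with the standard library alone. The sketch below is an assumption about how such extraction could work, not the tool's own code: the regular expression is deliberately simple, and `url_edges` is a hypothetical helper that counts channel-to-domain links in plain message texts.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Deliberately simple pattern; real-world URL matching has more edge cases.
URL_PATTERN = re.compile(r"https?://[^\s<>()]+")


def url_edges(channel, messages):
    """Count links from `channel` to each external domain in `messages`."""
    counts = Counter()
    for text in messages:
        for url in URL_PATTERN.findall(text):
            # Drop trailing punctuation that belongs to the sentence.
            domain = urlparse(url.rstrip(".,;:!?")).netloc.lower()
            if domain:
                counts[(channel, domain)] += 1
    return counts


messages = [
    "Read this: https://example.org/report.",
    "Mirror at http://Example.org/copy and https://news.example.com/a?b=1",
    "No links here",
]
edges = url_edges("channel_a", messages)
print(edges)
```

Grouping by domain rather than full URL is a design choice made here for readability of the resulting network; keeping full URLs is equally possible.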
The tool generates several outputs:
- Individual Run Results (in the results folder):
  - CSV files containing channel IDs, names, and usernames
  - URL lists from message content
- Edge List (in the EdgeList folder):
  - CSV file with network connections, including:
    - Forward relationships
    - Recommendation relationships
    - URL connections
    - Connection types and weights for advanced analysis
- Merged Results (in the merged folder):
  - Consolidated CSV with all unique channels found across multiple runs
- Network Analysis (in the network_analysis folder, when analysis is run):
  - Network metrics in Excel format
  - Gephi-compatible GEXF file for visualization
  - Basic network visualization image
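
To make the idea of a typed, weighted edge list concrete, here is a small sketch that aggregates repeated observations into weights and writes them as CSV. The column names (`source`, `target`, `type`, `weight`) and both helper functions are assumptions for illustration; the actual layout of Edge_List.csv may differ.

```python
import csv
import io
from collections import Counter


def build_edge_rows(observations):
    """Collapse repeated (source, target, type) observations into weights."""
    weights = Counter(observations)
    return [
        {"source": s, "target": t, "type": kind, "weight": w}
        for (s, t, kind), w in sorted(weights.items())
    ]


def write_edge_list(rows, handle):
    """Write the rows as CSV to any file-like object."""
    writer = csv.DictWriter(handle, fieldnames=["source", "target", "type", "weight"])
    writer.writeheader()
    writer.writerows(rows)


observations = [
    ("channel_a", "channel_b", "forward"),
    ("channel_a", "channel_b", "forward"),
    ("channel_a", "channel_c", "recommendation"),
    ("channel_b", "example.org", "url"),
]
rows = build_edge_rows(observations)
buffer = io.StringIO()  # stands in for a file opened in the EdgeList folder
write_edge_list(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

A file with this shape can be imported into Gephi as an edge table, with the weight column driving edge thickness.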
The included network analysis script (network_analysis.py) provides:
- Basic Network Metrics:
  - Node and edge counts
  - Network density
  - Connected components
  - Average path length
- Key Influencer Identification:
  - Top source channels (with most outgoing connections)
  - Top receiver channels (with most incoming connections)
- Connection Type Analysis:
  - Distribution of connection types (forwards vs. recommendations vs. URLs)
  - Weight distribution analysis
- Visualization:
  - Gephi-compatible GEXF file
  - Basic visualization image
  - Network metrics in Excel format
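
A few of these metrics are simple enough to compute by hand, which can help when sanity-checking the script's output. The sketch below uses only the standard library as a stand-in for the NetworkX calls the script presumably relies on; `basic_metrics` and the edge tuple layout are hypothetical. Density is computed with the directed-graph formula m / (n(n - 1)).

```python
from collections import Counter


def basic_metrics(edges):
    """Compute simple metrics from (source, target, type) tuples."""
    pairs = {(s, t) for s, t, _ in edges}  # unique directed connections
    nodes = {node for pair in pairs for node in pair}
    n = len(nodes)
    density = len(pairs) / (n * (n - 1)) if n > 1 else 0.0
    return {
        "nodes": n,
        "edges": len(pairs),
        "density": density,
        # Out-degree ranks sources, in-degree ranks receivers.
        "top_sources": Counter(s for s, _ in pairs).most_common(3),
        "top_receivers": Counter(t for _, t in pairs).most_common(3),
        "types": Counter(kind for _, _, kind in edges),
    }


edges = [
    ("a", "b", "forward"),
    ("a", "c", "forward"),
    ("b", "c", "recommendation"),
    ("c", "example.org", "url"),
]
metrics = basic_metrics(edges)
print(metrics)
```

In this toy network, `a` is the top source and `c` the top receiver, each with two connections.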
Run network analysis separately:
python network_analysis.py --edge-list EdgeList/Edge_List.csv --output-dir network_analysis
For advanced network visualization:
- Download and install Gephi
- Import the GEXF file from the network_analysis folder
- Apply layouts like ForceAtlas2 to organize the network
- Style nodes based on metrics like degree or betweenness
- Run community detection algorithms to identify clusters
A detailed guide is created in the results folder after each run.
This tool is for educational and research purposes only. Please ensure that you comply with Telegram's terms of service and respect privacy and ethical guidelines when using this tool.
Contributions are welcome! Please feel free to submit a Pull Request.
- Add language detection for message content filtering
- Implement community detection algorithms
- Add multi-API parallel processing for improved performance
- Create live network visualization capabilities