A powerful data extraction tool for collecting in-depth user posts and engagement data from Xueqiu, China’s leading financial social network. It helps analysts and researchers turn real investor discussions into structured datasets for sentiment, trend, and market behavior analysis.
Created by Bitbash, built to showcase our approach to Scraping and Automation!
If you are looking for xueqiu-user-posts-scraper you've just found your team — Let’s Chat. 👆👆
This project extracts detailed post data from Xueqiu user profiles, focusing on investment-related discussions, engagement metrics, and financial context. It solves the challenge of manually collecting large-scale investor sentiment data and is built for analysts, researchers, and quantitative teams working with Chinese market insights.
- Targets individual Xueqiu user timelines with configurable limits
- Captures rich engagement, metadata, and financial references
- Designed for scalable, repeatable data collection
- Supports downstream analytics such as sentiment and correlation studies
| Feature | Description |
|---|---|
| User Profile Targeting | Scrape posts from specific Xueqiu user profiles. |
| Rich Post Metadata | Extracts titles, full text, timestamps, and post types. |
| Engagement Metrics | Collects likes, favorites, comments, and repost counts. |
| Financial Context Mapping | Identifies referenced stocks and symbols. |
| Scalable Collection | Handles multiple profiles with configurable limits. |
| Field Name | Field Description |
|---|---|
| id | Unique post identifier. |
| user_id | Author identifier for behavioral analysis. |
| title | Post headline or subject. |
| text | Full post content. |
| created_at | Publication timestamp. |
| like_count | Number of likes received. |
| fav_count | Bookmark or favorite count. |
| retweet_count | Repost or share count. |
| reply_count | Number of replies or comments. |
| stock_list | Referenced financial symbols. |
| meta_keywords | Contextual and analytical metadata. |
[
{
"id": 345575898,
"user_id": 1821992043,
"title": "股息为盾,成长为矛",
"created_at": 1754361591000,
"like_count": 370,
"fav_count": 190,
"retweet_count": 30,
"reply_count": 356,
"stock_list": [
{ "symbol": "BK2049", "type": "35" },
{ "symbol": "BK2415", "type": "35" }
],
"text": "最近市场上关于红利股和成长股之间的讨论非常激烈..."
}
]
Xueqiu User Posts Scraper/
├── src/
│ ├── main.py
│ ├── collectors/
│ │ ├── profile_collector.py
│ │ └── post_parser.py
│ ├── utils/
│ │ └── time_utils.py
│ └── config/
│ └── settings.example.json
├── data/
│ ├── inputs.sample.json
│ └── sample_output.json
├── requirements.txt
└── README.md
- Financial analysts use it to study investor sentiment, so they can anticipate market trends.
- Quantitative teams use it to enrich trading models, so they can incorporate social signals.
- Academic researchers use it to analyze behavioral finance patterns, so they can publish data-driven insights.
- Market intelligence teams use it to track influential investors, so they can monitor opinion leaders.
Does this scraper work with private profiles? Only publicly accessible profiles are supported. Private or restricted profiles may return incomplete data.
How many posts can be collected per user? The limit is configurable, allowing control over data volume and collection depth.
Is the extracted data suitable for sentiment analysis? Yes, the full text and engagement metrics are structured for NLP and sentiment pipelines.
Can this handle multiple users in one run? Yes, multiple profile URLs can be processed in a single execution.
Primary Metric: Average extraction speed of 20–30 posts per minute per profile under standard conditions.
Reliability Metric: Stable collection with a success rate above 98% for public profiles.
Efficiency Metric: Low overhead processing with structured JSON output optimized for analytics workflows.
Quality Metric: High data completeness, capturing text, engagement, and financial references in a single dataset.
