Apple Stock Prediction with Sentiment Analysis (AAPL)

Predicting whether Apple’s closing price will go up or down on the next trading day by enriching classical time-series features with investor sentiment extracted from StockTwits.
The code accompanies our master's thesis, “Can Sentiment Analysis Improve the Prediction of Stock Price Direction? An Empirical Study on Apple Inc. (AAPL)” (Université Paris-Est Créteil, 2025).


Executive summary

To evaluate the predictive power and economic value of sentiment-enriched models, we simulate a daily long/short strategy: each trading day, the model forecasts the direction of Apple Inc.’s stock price for the next session.
A fully invested position (100 % exposure) is taken according to the signal — long if the model predicts an increase, short otherwise.
Transaction costs are set to zero to isolate pure model performance.
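
The strategy described above can be sketched in a few lines of Python. This is a minimal illustration, not the code in `scripts/trading.py`: the function name `simulate_long_short`, its parameters, and the toy prices are assumptions made for the example, and the fee model (a proportional charge applied every day) is a deliberate simplification that has no effect at the 0 % fee level used here.

```python
from typing import Sequence


def simulate_long_short(
    prices: Sequence[float],
    signals: Sequence[int],
    initial_capital: float = 1000.0,
    exposure: float = 1.0,
    fee_rate: float = 0.0,
) -> list[float]:
    """Return the capital curve of a daily long/short strategy.

    signals[t] is the forecast made at the close of day t for day t + 1:
    +1 means "up" (go long), -1 means "down" (go short).
    """
    if len(signals) != len(prices) - 1:
        raise ValueError("need exactly one signal per day-to-day transition")
    capital = initial_capital
    curve = [capital]
    for t, signal in enumerate(signals):
        daily_return = prices[t + 1] / prices[t] - 1.0
        # A long position earns the daily return, a short position its negative.
        capital *= 1.0 + exposure * signal * daily_return
        # Hypothetical fee model: proportional charge on the exposed capital,
        # applied every day (zero in the setup described above).
        capital *= 1.0 - fee_rate * exposure
        curve.append(capital)
    return curve


# Toy price path (illustrative values only): +2 %, -2 %, +2 %.
prices = [100.0, 102.0, 99.96, 101.9592]
signals = [1, -1, 1]  # every forecast is correct in this toy case
curve = simulate_long_short(prices, signals)
print(round(curve[-1], 2))
```

Passing `signals = [1, 1, 1]` reproduces a buy-and-hold position over the same path, which is a convenient sanity check for the benchmark row.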

The table below reports the weighted F1-score (the per-class F1 scores averaged with weights proportional to each class's support) and the cumulative return of each model-strategy pair during the out-of-sample period:

| Model | Scenario | Weighted F1 | Out-of-sample capital (100 % exposure, 0 % fees) |
|---|---|---|---|
| LSTM + VADER | Price + VADER | 0.599 | +36.9 % |
| LSTM + FinBERT | Price + FinBERT | 0.573 | +111.5 % (from $1 000 → $2 115) |
| Ensemble SVM + RoBERTa | Price + RoBERTa | 0.567 | +19.1 % |
| Buy-&-hold | | | +11.2 % |

These results highlight the added value of sentiment indicators: LSTM models enriched with VADER or FinBERT sentiment consistently outperform price-only baselines and the buy-and-hold benchmark — both in terms of classification performance and capital appreciation.
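
For readers unfamiliar with the metric, here is a small pure-Python sketch of a support-weighted F1 score. It is intended to follow the same definition as scikit-learn's `f1_score(..., average="weighted")`; whether the repository computes the metric this way is an assumption, and the labels below are toy values, not project data.

```python
from collections import Counter
from typing import Sequence


def weighted_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Average the per-class F1 scores, weighting each class by its support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Weight each class by its share of the true labels.
        score += (n_true / total) * f1
    return score


# Toy labels: 1 = price goes up, 0 = price goes down.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
print(weighted_f1(y_true, y_pred))
```

In this toy case the "up" class has F1 = 0.8 with support 5 and the "down" class has F1 = 2/3 with support 3, so the weighted score is 5/8 × 0.8 + 3/8 × 2/3 = 0.75.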

Experimental protocol

  • The training and hyperparameter tuning period spans December 31, 2019 to July 6, 2021, which corresponds to the first 70 % of the dataset (382 trading days).

  • For LSTM models, this period is used both for hyperparameter selection and model training. The resulting architecture is then evaluated chronologically on the remaining 30 % (161 trading days), with no re-training or peeking into future data.

  • For SVM and Ensemble SVM models, only hyperparameters are tuned during the training phase. The models are then retrained daily on a rolling window and tested in a walk-forward fashion, using past data only.

The out-of-sample evaluation window runs from July 7, 2021 to February 24, 2022, covering 161 consecutive trading days.
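
The two evaluation schemes above can be illustrated with index ranges alone. This is a hedged sketch: `chronological_split` and `walk_forward` are hypothetical helper names, and the 250-day rolling window is an arbitrary value chosen for the example (the README does not state the window length). The 382/161 split sizes are taken from the protocol above.

```python
from typing import Iterator


def chronological_split(n_obs: int, n_train: int) -> tuple[range, range]:
    """Split indices 0..n_obs-1 into a past (train) and a future (test) block."""
    if not 0 < n_train < n_obs:
        raise ValueError("n_train must be between 1 and n_obs - 1")
    return range(0, n_train), range(n_train, n_obs)


def walk_forward(
    n_obs: int, first_test: int, window: int
) -> Iterator[tuple[range, int]]:
    """Yield (training indices, test index) pairs for daily retraining.

    Each test day is predicted by a model fitted only on the `window`
    observations that immediately precede it, so no future data is used.
    """
    for test_idx in range(first_test, n_obs):
        start = max(0, test_idx - window)
        yield range(start, test_idx), test_idx


# LSTM-style protocol: train once on the first block, test on the rest.
train_idx, test_idx = chronological_split(n_obs=543, n_train=382)
print(len(train_idx), len(test_idx))  # 382 161

# SVM-style protocol: one rolling-window fit per out-of-sample day.
# window=250 is an assumed value for illustration only.
folds = list(walk_forward(n_obs=543, first_test=382, window=250))
print(len(folds))  # 161
```

Both schemes keep every training index strictly before the corresponding test index, which is the property that prevents look-ahead bias.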


Project structure

.
├── functions/        # Re-usable helper modules (data-prep, features, modelling)
├── scripts/          # Command-line pipelines (training, back-testing, plots)
├── plots/            # Automatically generated figures
├── results/          # CSV + pickle artefacts (metrics, predictions, simulations)
├── main.ipynb        # End-to-end walk-through notebook
├── pyproject.toml    # Poetry/uv dependency manifest
└── Makefile          # One-command recipes (install, lint, test, run)

Quick start

Prerequisites: Python ≥ 3.10 and a recent gcc/clang toolchain for torch.

# 1 Clone
git clone https://github.com/gaeldatascience/apple-stock-prediction.git
cd apple-stock-prediction

# 2 Create an isolated env (uv or venv)
python -m venv .venv
source .venv/bin/activate         # PowerShell ➜ .venv\Scripts\Activate.ps1

# 3 Install core & dev dependencies
pip install --upgrade pip
pip install uv                    # fast resolver (optional)
uv pip install -r requirements.txt     # or: pip install -e .

Key scripts

| File | Purpose |
|---|---|
| scripts/data_collection.py | Download historical OHLCV data via Yahoo Finance (yfinance), aggregate raw StockTwits parquet files into a single DataFrame (data/tweets_aggregated.pq) |
| scripts/compute_sentiment_analysis.py | Clean tweet text, extract and transform like counts, and compute sentiment labels using base keyword mapping, VADER, FinBERT, and a StockTwits-fine-tuned RoBERTa |
| scripts/model_functions.py | Define and evaluate SVM (with optional bagging/SMOTE) and PyTorch LSTM classifiers; implements rolling-window grid search, metrics computation, and test routines |
| scripts/trading.py | Simulate long/short trading strategies for both SVM and LSTM models, with adjustable window sizes, investment fraction, and transaction costs |
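
As a rough illustration of how per-message sentiment can be turned into one popularity-weighted score per trading day (the abstract below mentions daily aggregation weighted by message popularity), consider the following sketch. The function name `daily_weighted_sentiment`, the `(date, score, likes)` tuple layout, and the weight `1 + log(1 + likes)` are all assumptions made for this example; the actual like-count transform used in `scripts/compute_sentiment_analysis.py` may differ.

```python
import math
from collections import defaultdict
from typing import Iterable


def daily_weighted_sentiment(
    messages: Iterable[tuple[str, float, int]],
) -> dict[str, float]:
    """Aggregate (date, score, likes) triples into one score per day.

    Assumed weight: 1 + log(1 + likes), so that messages with zero likes
    still count and a few viral posts do not dominate the daily average.
    """
    weighted_sum: dict[str, float] = defaultdict(float)
    weight_total: dict[str, float] = defaultdict(float)
    for date, score, likes in messages:
        weight = 1.0 + math.log1p(likes)
        weighted_sum[date] += weight * score
        weight_total[date] += weight
    return {d: weighted_sum[d] / weight_total[d] for d in weighted_sum}


# Toy messages (illustrative values only): (date, sentiment score, likes).
messages = [
    ("2021-07-07", 0.8, 0),
    ("2021-07-07", -0.4, 0),
    ("2021-07-08", 0.5, 12),
]
daily = daily_weighted_sentiment(messages)
print(daily)
```

With equal like counts the result reduces to a plain daily mean, which makes the weighting easy to verify on small inputs.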

Paper

The full methodology (feature engineering, SMOTE balancing, walk-forward validation, trading rules) is detailed in the open-access PDF located at thesis.pdf.


Abstract

This article examines the extent to which integrating sentiment signals extracted from the StockTwits platform can improve daily predictions of Apple Inc. (AAPL) stock price movements. The dataset combines 543 stock market observations (closing price, volume, volatility) and approximately 915,000 StockTwits messages related to AAPL for the period December 31, 2019, to February 27, 2022. Four sentiment analysis methods are used: “Bullish/Bearish” self-annotations, VADER, FinBERT, and a RoBERTa model fine-tuned on StockTwits data. Scores are aggregated on a daily basis and weighted by message popularity. Five modeling scenarios, ranging from a simple “price only” model to “price + sentiment” combinations, are evaluated with three algorithms: SVM, Ensemble SVM (bagging of five SVMs), and an LSTM network. The hyperparameters of the SVMs are optimized using a sliding walk-forward procedure, while the LSTM is trained on 70% of the data and then tested chronologically on the remaining 30%. Class imbalance is corrected using SMOTE. Performance is measured using the weighted F1 score.

The results show the systematic superiority of LSTM (average F1 = 57.06%) over the SVM ensemble (55.84%) and the simple SVM (54.53%). The best score is achieved with the combination of LSTM + VADER (F1 = 59.91%, +2.7 points compared to the price-only model). An out-of-sample simulation (July 7, 2021 to February 24, 2022) illustrates the economic value of these signals: by investing all capital without transaction costs, the LSTM + FinBERT strategy increases initial capital from $1,000 to $2,115 (+111.5%), more than 100 percentage points better than a simple buy-and-hold approach, which only achieves +11.2%. LSTM + VADER achieves +36.9% over the period. Simple SVMs, which lack sequential memory, finish the period at a significant loss.

These results highlight the tangible contribution of sentiment indicators and the relevance of deep learning models for capturing the psychological dynamics of markets. However, the study is limited to a single asset and a daily horizon, which suggests that the approach should be extended to other securities, intraday granularities, and multi-asset architectures.


📄 License & citation

This repository is released under the MIT License; see LICENSE.

If you use this work in academic research, please cite:

@mastersthesis{pefourque_traore_2025,
  title  = {Can Sentiment Analysis Improve the Prediction of Stock Price Direction?},
  author = {Pefourque, Gaël and Traore, Djibril},
  school = {Université Paris-Est Créteil},
  year   = {2025},
  url    = {https://github.com/gaeldatascience/apple-stock-prediction}
}
