AriNguyen/SentimentAnalysis

Sentiment Analysis

Status: ongoing

Overview

A repo that implements different ML models for sentiment analysis of 10-K filings.

What are 10-K filings?

  • Extensive financial reports filed by public companies.
  • Filed every year with the U.S. Securities and Exchange Commission (SEC).
  • Typically contain 100+ pages.

How to read a 10-K report?

  • A 10-K has 4 parts. Parts I to III are outlined below; Part IV covers exhibits and financial statement schedules.

Part I contains 4 items:

  1. Business
    • Risk Factors
    • Unresolved Staff Comments
  2. Properties
  3. Legal Proceedings
  4. Mine Safety Disclosures

Part II contains 5 items:

  5. Market for Registrant’s Common Equity, Related Stockholder Matters and Issuer Purchases of Equity Securities
  6. Selected Financial Data
  7. Management’s Discussion and Analysis of Financial Condition and Results of Operations
    • Quantitative and Qualitative Disclosures about Market Risk
  8. Financial Statements and Supplementary Data
  9. Changes in and Disagreements with Accountants on Accounting and Financial Disclosure
    • Controls and Procedures
    • Other Information

Part III contains 5 items:

  10. Directors, Executive Officers and Corporate Governance
  11. Executive Compensation
  12. Security Ownership of Certain Beneficial Owners and Management and Related Stockholder Matters
  13. Certain Relationships and Related Transactions, and Director Independence
  14. Principal Accountant Fees and Services
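
The item structure above suggests a way to pull individual sections (for example Risk Factors) out of a filing before scoring them. The following is a minimal sketch, assuming the filing has already been converted to plain text and that each section starts on its own line with a heading such as "Item 1A."; the `ITEM_HEADING` pattern and the `split_items` helper are hypothetical names, not part of this repo.

```python
import re

# Assumption: section headings look like "Item 1.", "ITEM 1A:" or "Item 7 -"
# at the start of a line. Real filings vary, so this pattern is illustrative.
ITEM_HEADING = re.compile(
    r"^\s*item\s+(\d{1,2}[ab]?)\s*[.:\-]",
    re.IGNORECASE | re.MULTILINE,
)


def split_items(text):
    """Return a dict mapping item ids ("1", "1A", "7", ...) to section text.

    If an id occurs more than once (for example in a table of contents and
    again in the body), the later occurrence overwrites the earlier one.
    """
    matches = list(ITEM_HEADING.finditer(text))
    sections = {}
    for i, match in enumerate(matches):
        # A section runs from the end of its heading to the next heading.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[match.group(1).upper()] = text[match.end():end].strip()
    return sections


# Hypothetical, heavily shortened filing text.
sample = (
    "Item 1. Business\n"
    "We sell widgets.\n"
    "Item 1A. Risk Factors\n"
    "Demand may fall.\n"
    "Item 7. Management's Discussion\n"
    "Revenue grew.\n"
)

sections = split_items(sample)
for item_id, body in sections.items():
    print(item_id, "->", body.splitlines()[0])
```

Actual filings are usually HTML and often repeat the headings in a table of contents, so a real pipeline would need extra cleaning before this step.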

What can we exploit from 10-K data?

Research has shown that the sentiment expressed in 10-K filings is informative for investment decisions.

The research by Azimi and Agrawal, covering 7 billion words and 220 million sentences from the full text of all 10-K filings by U.S. public companies made during 1994-2017, shows that "Positive (negative) sentiment predicts higher (lower) abnormal return and lower (higher) abnormal trading volume around the 10-K filing date. The market overreacts to negative sentiment and underreacts to positive sentiment during the filing period. All of these effects are larger for negative sentiment than for positive sentiment. Positive sentiment also predicts higher future profitability, higher operating cash flow, lower cash holding, and lower financial leverage. Negative sentiment predicts these variables in the opposite direction."

Models

Simple logistic regression model

  • A very simple model that I use for quick experiments with the data; it is not meant to be the final model.
  • Follow this tutorial
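
To show what such a baseline does, here is a from-scratch sketch of bag-of-words logistic regression trained with stochastic gradient descent. It uses only the standard library. The toy sentences, labels, hyperparameters, and all function names are made up for illustration; this is not the tutorial's code or this repo's data.

```python
import math
import re
from collections import Counter


def featurize(text):
    """Bag-of-words features: lowercase word -> count."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(feats, weights, bias):
    """Linear score: bias plus weighted word counts."""
    return bias + sum(weights.get(w, 0.0) * c for w, c in feats.items())


def train(texts, labels, epochs=200, lr=0.5):
    """Fit weights with plain SGD on the log loss (no regularisation)."""
    weights, bias = {}, 0.0
    data = [(featurize(t), y) for t, y in zip(texts, labels)]
    for _ in range(epochs):
        for feats, y in data:
            error = sigmoid(score(feats, weights, bias)) - y
            bias -= lr * error
            for w, c in feats.items():
                weights[w] = weights.get(w, 0.0) - lr * error * c
    return weights, bias


def predict_proba(text, weights, bias):
    """Probability that the text is positive (label 1)."""
    return sigmoid(score(featurize(text), weights, bias))


# Hypothetical toy data: 1 = positive sentiment, 0 = negative sentiment.
texts = [
    "revenue grew and margins improved",
    "strong demand increased profit",
    "losses widened and revenue declined",
    "litigation risk and impairment charges increased losses",
]
labels = [1, 1, 0, 0]

weights, bias = train(texts, labels)
print(predict_proba("profit improved", weights, bias))
print(predict_proba("impairment losses", weights, bias))
```

In practice a library implementation, such as scikit-learn's `LogisticRegression` on TF-IDF features, would replace the hand-written training loop.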

Using BERT and TensorFlow

  • Follow this tutorial
  • Uses transfer learning and fine-tuning

Using Stanford CoreNLP Model

FinBERT model

Preprocess 10-K data

https://towardsdatascience.com/nlp-preprocessing-with-nltk-3c04ee00edc0
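
As a rough illustration of the kind of cleaning involved, the sketch below strips HTML tags, decodes entities, lowercases, tokenizes, and drops stop words. It deliberately uses only the standard library rather than NLTK, and the tiny `STOP_WORDS` set and the `preprocess` function are assumptions made for this example, not the linked article's code.

```python
import html
import re

# Assumption: a very small stop-word list for illustration only. A real
# pipeline would use a fuller list (for example the one shipped with NLTK).
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "in", "is",
    "it", "of", "on", "or", "that", "the", "this", "to", "was", "were",
    "with",
}

TAG = re.compile(r"<[^>]+>")
TOKEN = re.compile(r"[a-z]+(?:'[a-z]+)?")


def preprocess(raw):
    """Turn raw filing markup into a list of lowercase content tokens."""
    # Remove tags first, then decode entities, so that an escaped "&lt;"
    # in the text is not mistaken for the start of a tag.
    text = html.unescape(TAG.sub(" ", raw))
    tokens = TOKEN.findall(text.lower())
    # Drop stop words and single-letter leftovers; numbers are already
    # excluded because TOKEN only matches letters.
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 1]


tokens = preprocess("<p>Revenue &amp; margins increased in 2020.</p>")
print(tokens)
```

Dropping numbers and punctuation is a choice that suits word-level sentiment features; a model that reads raw text, such as BERT, would normally keep them.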

Resolving Errors

Cannot import seaborn in jupyter notebook

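import sys
# Show the directories this kernel searches for packages
print(sys.path)
# Add the virtual environment's site-packages so that seaborn can be imported
sys.path.append('/Users/aringuyen/Desktop/PROJECTS/env_python_3/lib/python3.8/site-packages')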

No module named wordcloud

https://stackoverflow.com/questions/47298070/importerror-no-module-named-wordcloud

Install scikit-learn

/usr/local/opt/python@3.9/bin/python3.9 -m pip install scikit-learn
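
The command above runs pip through one specific interpreter, which is what keeps the package and the notebook kernel in the same environment. The same idea can be sketched from inside a notebook as follows; `pip_install_command` is a hypothetical helper that only builds the command, so nothing is installed until it is run.

```python
import sys


def pip_install_command(package):
    """Build a pip command bound to the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", package]


# The interpreter behind the current kernel; packages must be installed
# into this one to be importable here.
print(sys.executable)
print(" ".join(pip_install_command("scikit-learn")))
```

Passing the resulting list to `subprocess.run` would perform the installation for the running interpreter.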

TypeError: expected string or bytes-like object

  • Resolve: run Jupyter in PyCharm (the problem was in sys.path)
