Sentiment Analysis

Status: ongoing

Overview

A repo implementing different ML models for sentiment analysis on 10-K filings.

What are 10-K filings?

Extensive financial reports filed by public companies.
Filed every year with the U.S. Securities and Exchange Commission (SEC).
Contain 100+ pages.

How to read a 10k report?

There are 4 parts in a 10k.

Part I contains 4 items:

1. Business
- 1A. Risk Factors
- 1B. Unresolved Staff Comments
2. Properties
3. Legal Proceedings
4. Mine Safety Disclosures

Part II contains 5 items:

5. Market for Registrant’s Common Equity, Related Stockholder Matters and Issuer Purchases of Equity Securities
6. Selected Financial Data
7. Management’s Discussion and Analysis of Financial Condition and Results of Operations
- 7A. Quantitative and Qualitative Disclosures about Market Risk
8. Financial Statements and Supplementary Data
9. Changes in and Disagreements with Accountants on Accounting and Financial Disclosure
- 9A. Controls and Procedures
- 9B. Other Information

Part III contains 5 items:

10. Directors, Executive Officers and Corporate Governance
11. Executive Compensation
12. Security Ownership of Certain Beneficial Owners and Management and Related Stockholder Matters
13. Certain Relationships and Related Transactions, and Director Independence
14. Principal Accountant Fees and Services

Part IV covers exhibits and financial statement schedules (Item 15).
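Because every item starts with an "Item N." heading, a filing's plain text can be cut into sections by matching those headings. The following is a minimal sketch, not the code used in this repo: the function name `split_into_items`, the regular expression, and the sample text are all assumptions made for illustration.

```python
import re

# Matches headings such as "Item 1.", "ITEM 1A." or "Item 7A:" at the start of a line.
ITEM_HEADING = re.compile(
    r"^\s*item\s+(\d{1,2}[ab]?)\s*[.:]", re.IGNORECASE | re.MULTILINE
)


def split_into_items(text):
    """Return a dict mapping item labels (e.g. "1A") to the text under that heading."""
    matches = list(ITEM_HEADING.finditer(text))
    sections = {}
    for i, match in enumerate(matches):
        label = match.group(1).upper()
        start = match.end()
        # A section runs until the next heading, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        # If a label occurs twice (e.g. in a table of contents), the later one wins.
        sections[label] = text[start:end].strip()
    return sections


# Made-up sample text, not taken from a real filing.
sample = """Item 1. Business
We sell flowers online.
Item 1A. Risk Factors
Demand is seasonal.
Item 7. Management's Discussion and Analysis
Revenue grew this year.
"""

sections = split_into_items(sample)
print(sorted(sections))
print(sections["1A"])
```

Real filings vary a lot in formatting (HTML tables, page headers, inconsistent capitalisation), so a pattern like this would likely need tuning per document.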

What can we exploit from 10k data?

Research has shown that the sentiment expressed in 10-Ks is informative when making investment decisions.

The research by Azimi and Agrawal, covering 7 billion words and 220 million sentences from the full text of all 10-K filings by U.S. public companies made during 1994-2017, shows that "Positive (negative) sentiment predicts higher (lower) abnormal return and lower (higher) abnormal trading volume around the 10-K filing date. The market overreacts to negative sentiment and underreacts to positive sentiment during the filing period. All of these effects are larger for negative sentiment than for positive sentiment. Positive sentiment also predicts higher future profitability, higher operating cash flow, lower cash holding, and lower financial leverage. Negative sentiment predicts these variables in the opposite direction."

Models

Simple logistic regression model

A very simple baseline model that I use for experimenting with the data. It is not meant to be used as the final model.
Follow this tutorial
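To make the baseline concrete, here is a dependency-free sketch of bag-of-words logistic regression trained with stochastic gradient descent. It is not the notebook's implementation (which is not reproduced in this README and may well use scikit-learn); the function names, hyperparameters, training sentences, and labels are all made up for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(texts, labels, epochs=200, lr=0.5):
    """Fit one weight per word plus a bias, updating after every example (SGD)."""
    weights = Counter()  # missing words read as weight 0
    bias = 0.0
    for _ in range(epochs):
        for text, label in zip(texts, labels):
            counts = Counter(tokenize(text))
            z = bias + sum(weights[w] * c for w, c in counts.items())
            error = sigmoid(z) - label  # gradient of the log loss w.r.t. z
            bias -= lr * error
            for w, c in counts.items():
                weights[w] -= lr * error * c
    return weights, bias


def predict_proba(text, weights, bias):
    """Probability that `text` is positive (label 1)."""
    counts = Counter(tokenize(text))
    z = bias + sum(weights[w] * c for w, c in counts.items())
    return sigmoid(z)


# Hypothetical toy data: 1 = positive sentiment, 0 = negative sentiment.
texts = [
    "revenue growth was strong and profit improved",
    "we achieved record profit and strong growth",
    "losses increased and litigation risk remains",
    "impairment charges and losses hurt results",
]
labels = [1, 1, 0, 0]

weights, bias = train_logistic_regression(texts, labels)
print(predict_proba("strong profit growth", weights, bias))
print(predict_proba("litigation losses", weights, bias))
```

With four training sentences this only shows the mechanics; any real experiment needs labelled sentences from actual filings and a held-out test set.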

Using BERT and TensorFlow

Follow this tutorial
Use transfer learning and fine-tuning methods

Using Stanford CoreNLP Model

Follow this tutorial
Implemented in sentiment-analysis-coreNLP.ipynb
Conclusion:

FinBERT model

Implemented in sentiment-analysis-FinBERT.ipynb
Result
Conclusion

Preprocess 10k data

https://towardsdatascience.com/nlp-preprocessing-with-nltk-3c04ee00edc0
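The usual preprocessing steps (lowercasing, stripping digits and punctuation, tokenizing, removing stop words) can be sketched in plain Python as below. This is an illustration under assumptions, not the contents of 10k-preprocess.ipynb: the `preprocess` function and the tiny stop-word set are hypothetical, and a fuller list such as NLTK's English stop words would normally replace it.

```python
import re

# A tiny illustrative stop-word list (assumption); real pipelines use a fuller one.
STOP_WORDS = {
    "a", "an", "and", "the", "of", "to", "in", "for", "on", "by",
    "is", "are", "was", "were", "our", "we",
}


def preprocess(text):
    """Return a list of lowercase word tokens with stop words removed."""
    text = text.lower()
    # Replace anything that is not a letter or whitespace with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = text.split()
    # Drop stop words and single-letter leftovers (e.g. the "e" in "e-commerce").
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 1]


tokens = preprocess(
    "Our net sales increased 12% in fiscal 2020, driven by the e-commerce segment."
)
print(tokens)
# → ['net', 'sales', 'increased', 'fiscal', 'driven', 'commerce', 'segment']
```

Dropping digits is a choice that suits sentiment features; it would be the wrong choice if the numbers themselves were needed downstream.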

Resolving Errors

Cannot import seaborn in jupyter notebook

import sys
print(sys.path)
sys.path.append('/Users/aringuyen/Desktop/PROJECTS/env_python_3/lib/python3.8/site-packages')
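Before hard-coding a site-packages path, it can help to check which interpreter the notebook kernel is actually running and whether that interpreter can see the package. A small diagnostic sketch follows; the helper name `describe_module` is an assumption made for illustration.

```python
import importlib.util
import sys


def describe_module(name):
    """Report whether top-level module `name` is importable from the running interpreter."""
    spec = importlib.util.find_spec(name)
    return {
        "interpreter": sys.executable,  # the Python binary behind this kernel
        "found": spec is not None,
        "location": getattr(spec, "origin", None),
    }


print(describe_module("seaborn"))
```

If `found` is False, running `-m pip install seaborn` with the interpreter reported above installs the package into the same environment the kernel uses, which avoids editing sys.path by hand.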

No module named wordcloud

https://stackoverflow.com/questions/47298070/importerror-no-module-named-wordcloud

Install scikit-learn

/usr/local/opt/python@3.9/bin/python3.9 -m pip install scikit-learn

TypeError: expected string or bytes-like object

Resolved by running Jupyter in PyCharm (the problem was in sys.path).
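Independently of the environment issue above, Python's `re` functions raise this same TypeError whenever they receive a value that is not a string, for example `None` or a float NaN from an empty spreadsheet cell. If that turns out to be the cause, a guard like the following sketch may help; `clean_text` is a hypothetical helper, not a function from this repo.

```python
import re


def clean_text(value):
    """Collapse whitespace; treat non-string input (None, NaN, numbers) as empty text."""
    if not isinstance(value, str):
        return ""
    return re.sub(r"\s+", " ", value).strip()


# Passing a non-string straight to re.sub reproduces the error.
try:
    re.sub(r"\s+", " ", None)
except TypeError as exc:
    print("TypeError:", exc)

print(clean_text("  Net   sales\nincreased  "))
print(repr(clean_text(float("nan"))))
```

Returning an empty string silently hides missing data, so counting how many rows hit that branch is worth doing before training on the result.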

