Status: ongoing
A repo that implements different ML models for sentiment analysis of 10-K filings.
- An extensive financial report filed by public companies.
- Filed every year with the U.S. Securities and Exchange Commission (SEC).
- Typically contains 100+ pages.
- There are 4 parts in a 10-K.
Part I contains 4 items, plus sub-items 1A and 1B:
- Item 1. Business
- Item 1A. Risk Factors
- Item 1B. Unresolved Staff Comments
- Item 2. Properties
- Item 3. Legal Proceedings
- Item 4. Mine Safety Disclosures
Part II contains 5 items, plus sub-items 7A, 9A, and 9B:
- Item 5. Market for Registrant’s Common Equity, Related Stockholder Matters and Issuer Purchases of Equity Securities
- Item 6. Selected Financial Data
- Item 7. Management’s Discussion and Analysis of Financial Condition and Results of Operations
- Item 7A. Quantitative and Qualitative Disclosures about Market Risk
- Item 8. Financial Statements and Supplementary Data
- Item 9. Changes in and Disagreements with Accountants on Accounting and Financial Disclosure
- Item 9A. Controls and Procedures
- Item 9B. Other Information
Part III contains 5 items:
- Item 10. Directors, Executive Officers and Corporate Governance
- Item 11. Executive Compensation
- Item 12. Security Ownership of Certain Beneficial Owners and Management and Related Stockholder Matters
- Item 13. Certain Relationships and Related Transactions, and Director Independence
- Item 14. Principal Accountant Fees and Services
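Because every part is a sequence of "Item N." headings, the plain text of a filing can be cut into per-item sections before any model sees it. The following is a minimal sketch, assuming the filing is already plain text and that each heading starts a line in a form like "Item 1A."; the names `ITEM_HEADING` and `split_into_items` are hypothetical and not part of this repo. Real filings vary in formatting, so the pattern will likely need tuning.

```python
import re
from typing import Dict

# Assumed heading shape: "Item 1.", "ITEM 1A.", "Item 7A:" at the start of a line.
ITEM_HEADING = re.compile(
    r"^\s*item\s+(\d{1,2}[abc]?)\s*[.:]", re.IGNORECASE | re.MULTILINE
)


def split_into_items(text: str) -> Dict[str, str]:
    """Map each item label (e.g. "1A") to the text after its heading label."""
    matches = list(ITEM_HEADING.finditer(text))
    sections: Dict[str, str] = {}
    for current, following in zip(matches, matches[1:] + [None]):
        label = current.group(1).upper()
        end = following.start() if following else len(text)
        body = text[current.end():end].strip()
        # A table of contents repeats every heading, so keep the longest body.
        if label not in sections or len(body) > len(sections[label]):
            sections[label] = body
    return sections


sample = """Item 1. Business
We make widgets.
Item 1A. Risk Factors
Demand for widgets may fall.
Item 7. Management's Discussion and Analysis
Revenue grew.
"""
items = split_into_items(sample)
print(sorted(items))
print(items["1A"])
```

Each section body still begins with the item title (for example "Risk Factors"), which makes it easy to check that the right section was picked up.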
Research has shown that the sentiment expressed in 10-Ks is informative when making investment decisions.
Azimi and Agrawal analyzed 7 billion words and 220 million sentences from the full text of all 10-K filings made by U.S. public companies during 1994-2017. They report that "Positive (negative) sentiment predicts higher (lower) abnormal return and lower (higher) abnormal trading volume around the 10-K filing date. The market overreacts to negative sentiment and underreacts to positive sentiment during the filing period. All of these effects are larger for negative sentiment than for positive sentiment. Positive sentiment also predicts higher future profitability, higher operating cash flow, lower cash holding, and lower financial leverage. Negative sentiment predicts these variables in the opposite direction."
- A very simple model, used only for some initial experiments with the data. It is not recommended for real use.
- Follow this tutorial
- Follow this tutorial
- Use transfer learning and fine-tuning methods
- Follow this tutorial
- Implemented in sentiment-analysis-coreNLP.ipynb
- Conclusion
- Implemented in sentiment-analysis-FinBERT.ipynb
- Result
- Conclusion
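As an illustration of the FinBERT approach, the sketch below labels sentences with a pretrained model and then summarizes a filing as the share of positive, negative, and neutral sentences. It is not the code from the notebook: the model name `ProsusAI/finbert`, the Hugging Face `transformers` dependency, and the helper names `classify_sentences` and `sentiment_shares` are all assumptions. Positive and negative shares are kept separate rather than netted because the research quoted above finds that their effects are not symmetric.

```python
from collections import Counter
from typing import Dict, List

LABELS = ("positive", "negative", "neutral")


def classify_sentences(sentences: List[str]) -> List[str]:
    """Label each sentence with a pretrained FinBERT model (assumed model name)."""
    # Imported lazily so sentiment_shares() works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model="ProsusAI/finbert")
    return [result["label"] for result in classifier(sentences, truncation=True)]


def sentiment_shares(labels: List[str]) -> Dict[str, float]:
    """Return the fraction of sentences carrying each sentiment label."""
    counts = Counter(labels)
    total = len(labels)
    return {name: (counts[name] / total if total else 0.0) for name in LABELS}


# Hand-written labels stand in for model output, so nothing is downloaded here.
example_labels = ["positive", "neutral", "negative", "neutral"]
shares = sentiment_shares(example_labels)
print(shares)
```

In a notebook, `sentiment_shares(classify_sentences(sentences))` would chain the two steps; the first call downloads the model weights.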
https://towardsdatascience.com/nlp-preprocessing-with-nltk-3c04ee00edc0
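```python
import sys

# Show where this interpreter looks for packages.
print(sys.path)

# Workaround: add the virtual environment's site-packages for this session only.
sys.path.append('/Users/aringuyen/Desktop/PROJECTS/env_python_3/lib/python3.8/site-packages')
```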
https://stackoverflow.com/questions/47298070/importerror-no-module-named-wordcloud
/usr/local/opt/python@3.9/bin/python3.9 -m pip install scikit-learn
- Resolved: run Jupyter in PyCharm (the problem was in sys.path)
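Import errors like these usually mean the notebook kernel runs a different interpreter than the one the package was installed into. The following is a small diagnostic sketch to run inside the notebook; the helper name `is_importable` and the list of packages checked are assumptions chosen for illustration.

```python
import importlib.util
import sys


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


# The interpreter the kernel is actually running.
print("Interpreter:", sys.executable)

# Packages to check (an assumed list; adjust to the notebook's imports).
for name in ("sklearn", "wordcloud", "nltk"):
    status = "importable" if is_importable(name) else "missing"
    print(name, "->", status)
```

If a package is reported missing, running the interpreter path printed above with `-m pip install <package>` installs it into the same environment the notebook uses.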