Issue:
The lines of code added and removed reported for a commit do not match the commit data on GitHub.
Command:
p2o.py --enrich --index sds-lfn-pnda-git-raw --index-enrich sds-lfn-pnda-git -e [redacted] -g --bulk-size 500 --scroll-size 1000 --db-host [redacted] --db-sortinghat [redacted] --db-user [redacted] --db-password [redacted] git https://github.com/pndaproject/platform-salt
Data for a commit (JSON)
Actual Data:
{
..............................
"_source": {
"branches": [],
"title": "Support for Hortonworks HDP",
"author_user_name": "",
"time_to_commit_hours": 0.48,
"author_gender": "Unknown",
"lines_changed": 3266,
....................
"files": 70,
"lines_added": 3040,
"tag": "https://github.com/pndaproject/platform-salt",
"author_uuid": "ce136f854a3309899616dc4583176a89abb7a2f3",
"hash": "c305399c1bccbdb0021395f1d6066419228846b6",
"Commit_name": "James Clarke",
"lines_removed": 226,
...............................
}
Expected Data:
For the following commit:
pndaproject/platform-salt@c305399
lines_added: 3085
lines_removed: 188
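The size of the mismatch can be worked out directly from the two sets of numbers above. The minimal sketch below uses only the values quoted in this issue. The `discrepancy` helper is a hypothetical name for illustration. It also checks that `lines_changed` in this enriched document equals `lines_added + lines_removed`.

```python
# Values copied from this issue: the enriched index document ("reported")
# and the numbers GitHub shows for commit c305399 ("expected").
reported = {"lines_added": 3040, "lines_removed": 226, "lines_changed": 3266}
expected = {"lines_added": 3085, "lines_removed": 188}

# In this enriched document, lines_changed is the sum of added and removed lines.
assert reported["lines_changed"] == reported["lines_added"] + reported["lines_removed"]

# Derive the comparable total for the expected (GitHub) numbers.
expected["lines_changed"] = expected["lines_added"] + expected["lines_removed"]


def discrepancy(reported, expected):
    """Hypothetical helper: reported minus expected, per metric."""
    return {key: reported[key] - expected[key] for key in expected}


diff = discrepancy(reported, expected)
print(diff)  # {'lines_added': -45, 'lines_removed': 38, 'lines_changed': -7}
```

The total churn is off by only 7 lines. However, `lines_added` is 45 too low and `lines_removed` is 38 too high, so any aggregation over either metric alone drifts quickly.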
As you can see, this is a tagged release commit. Are such commits processed differently from the rest? An error of 40+ lines per commit adds up to large discrepancies after aggregation.
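The expected numbers can also be cross-checked against git's own per-file counts. The sketch below sums the output of `git show --numstat --format=`. In that output, each line is `<added>\t<removed>\t<path>`, and binary files appear as `-\t-\t<path>`. It assumes a local clone of platform-salt. The helper names `sum_numstat` and `numstat_for_commit` are hypothetical. Skipping binary files is a choice made here and may not match how other tools count them.

```python
import subprocess


def sum_numstat(numstat_output: str) -> tuple[int, int]:
    """Sum added/removed lines from `git show --numstat --format=` output."""
    added = removed = 0
    for line in numstat_output.splitlines():
        if not line.strip():
            continue  # ignore blank separator lines
        a, r, _path = line.split("\t", 2)
        if a == "-" or r == "-":
            continue  # binary file: git reports no line counts (skipped by assumption)
        added += int(a)
        removed += int(r)
    return added, removed


def numstat_for_commit(repo_dir: str, commit: str) -> str:
    """Hypothetical helper: run git in a local clone and return raw numstat text."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "show", "--numstat", "--format=", commit],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Synthetic sample (not real data from this commit) to show the parsing.
sample = "10\t2\tsalt/a.sls\n-\t-\tdocs/img.png\n5\t0\tsalt/b.sls\n"
print(sum_numstat(sample))  # (15, 2)
```

With a clone available, `sum_numstat(numstat_for_commit("platform-salt", "c305399"))` would give git's count for the commit. That count can then be compared with both the enriched index and GitHub.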