2025-11-25 15:38:53 - INFO - ======================================================================
2025-11-25 15:38:53 - INFO - Logging initialized - Output to console and msr_analysis.log
2025-11-25 15:38:53 - INFO - ======================================================================
2025-11-25 15:39:03 - INFO - ======================================================================
2025-11-25 15:39:03 - INFO - PHASE 1: DATASET EXPLORATION
2025-11-25 15:39:03 - INFO - ======================================================================
2025-11-25 15:39:03 - INFO - This phase will:
2025-11-25 15:39:03 - INFO - 1. Load dataset from HuggingFace (or cache)
2025-11-25 15:39:03 - INFO - 2. Understand the dataset schema
2025-11-25 15:39:03 - INFO - 3. Identify PR metrics fields (additions/deletions/files)
2025-11-25 15:39:03 - INFO - 4. Check data quality
2025-11-25 15:39:03 - INFO -
2025-11-25 15:39:03 - INFO -
----------------------------------------------------------------------
2025-11-25 15:39:03 - INFO - [STEP 1/4] LOAD DATASET FROM HUGGINGFACE
2025-11-25 15:39:03 - INFO - ----------------------------------------------------------------------
2025-11-25 15:39:03 - INFO - Dataset: hao-li/AIDev
2025-11-25 15:39:03 - INFO - Config: pull_request
2025-11-25 15:39:03 - INFO - Cache location: /Users/ryan/Library/Mobile Documents/com~apple~CloudDocs/MySchool/umflint/courses-icloud/swe535/presentations/msr2026/code/v2q1/.cache/huggingface/datasets
2025-11-25 15:39:03 - INFO - Downloading config 'pull_request' from HuggingFace...
2025-11-25 15:39:03 - INFO - (First download may take a minute)
2025-11-25 15:39:03 - INFO - Will be cached to: /Users/ryan/Library/Mobile Documents/com~apple~CloudDocs/MySchool/umflint/courses-icloud/swe535/presentations/msr2026/code/v2q1/.cache/huggingface/datasets
2025-11-25 15:39:03 - INFO -
2025-11-25 15:39:05 - INFO - ✓ Dataset loaded successfully!
2025-11-25 15:39:05 - INFO - Available splits: ['train']
2025-11-25 15:39:05 - INFO - - Split 'train': 33,596 records
2025-11-25 15:39:05 - INFO -
✓ Step 1 Complete - Config 'pull_request' ready (33,596 records)
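
The load step above can probably be reproduced with the Hugging Face `datasets` library. The following is a minimal sketch, not the script that produced this log: the helper names `load_config` and `summarize_splits` are hypothetical, and only the dataset id `hao-li/AIDev` and the config names come from the log.

```python
def load_config(config_name, cache_dir=None):
    """Load one config of hao-li/AIDev; returns a DatasetDict keyed by split name."""
    # Imported lazily so summarize_splits() stays usable without the library installed.
    from datasets import load_dataset

    return load_dataset("hao-li/AIDev", config_name, cache_dir=cache_dir)


def summarize_splits(dataset_dict):
    """Map each split name to its record count (works on any dict of sized values)."""
    return {split: len(dataset_dict[split]) for split in dataset_dict}


# Usage (needs network access on the first run, then reads from the cache):
#   ds = load_config("pull_request")
#   for split, n in summarize_splits(ds).items():
#       print(f"Split '{split}': {n:,} records")
```

Passing `cache_dir` keeps the downloaded files next to the project, which matches the "Cache location" lines in the log.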
2025-11-25 15:39:05 - INFO -
----------------------------------------------------------------------
2025-11-25 15:39:05 - INFO - [STEP 2/4] UNDERSTAND DATASET SCHEMA
2025-11-25 15:39:05 - INFO - ----------------------------------------------------------------------
2025-11-25 15:39:05 - INFO - Analyzing primary split: 'train'
2025-11-25 15:39:05 - INFO - Number of records: 33,596
2025-11-25 15:39:05 - INFO -
--- DATASET FEATURES (COLUMNS) ---
2025-11-25 15:39:05 - INFO - • id
2025-11-25 15:39:05 - INFO - Type: Value(dtype='int64', id=None)
2025-11-25 15:39:05 - INFO - • number
2025-11-25 15:39:05 - INFO - Type: Value(dtype='int64', id=None)
2025-11-25 15:39:05 - INFO - • title
2025-11-25 15:39:05 - INFO - Type: Value(dtype='string', id=None)
2025-11-25 15:39:05 - INFO - • body
2025-11-25 15:39:05 - INFO - Type: Value(dtype='string', id=None)
2025-11-25 15:39:05 - INFO - • agent
2025-11-25 15:39:05 - INFO - Type: Value(dtype='string', id=None)
2025-11-25 15:39:05 - INFO - • user_id
2025-11-25 15:39:05 - INFO - Type: Value(dtype='int64', id=None)
2025-11-25 15:39:05 - INFO - • user
2025-11-25 15:39:05 - INFO - Type: Value(dtype='string', id=None)
2025-11-25 15:39:05 - INFO - • state
2025-11-25 15:39:05 - INFO - Type: Value(dtype='string', id=None)
2025-11-25 15:39:05 - INFO - • created_at
2025-11-25 15:39:05 - INFO - Type: Value(dtype='string', id=None)
2025-11-25 15:39:05 - INFO - • closed_at
2025-11-25 15:39:05 - INFO - Type: Value(dtype='string', id=None)
2025-11-25 15:39:05 - INFO - • merged_at
2025-11-25 15:39:05 - INFO - Type: Value(dtype='string', id=None)
2025-11-25 15:39:05 - INFO - • repo_id
2025-11-25 15:39:05 - INFO - Type: Value(dtype='int64', id=None)
2025-11-25 15:39:05 - INFO - • repo_url
2025-11-25 15:39:05 - INFO - Type: Value(dtype='string', id=None)
2025-11-25 15:39:05 - INFO - • html_url
2025-11-25 15:39:05 - INFO - Type: Value(dtype='string', id=None)
2025-11-25 15:39:05 - INFO -
--- SAMPLE RECORD (First Entry) ---
2025-11-25 15:39:05 - INFO - id: 3264933329
2025-11-25 15:39:05 - INFO - number: 2911
2025-11-25 15:39:05 - INFO - title: Fix: Wait for all partitions in load_collection when some are still loading
2025-11-25 15:39:05 - INFO - body: ## Summary
Fixes an issue where `load_collection` would return success when collection progress reached 100%, even if individual partitions were stil... [truncated]
2025-11-25 15:39:05 - INFO - agent: Claude_Code
2025-11-25 15:39:05 - INFO - user_id: 108661493
2025-11-25 15:39:05 - INFO - user: weiliu1031
2025-11-25 15:39:05 - INFO - state: closed
2025-11-25 15:39:05 - INFO - created_at: 2025-07-26T02:59:01Z
2025-11-25 15:39:05 - INFO - closed_at: 2025-07-29T07:01:20Z
2025-11-25 15:39:05 - INFO - merged_at: None
2025-11-25 15:39:05 - INFO - repo_id: 191751505
2025-11-25 15:39:05 - INFO - repo_url: https://api.github.com/repos/milvus-io/pymilvus
2025-11-25 15:39:05 - INFO - html_url: https://github.com/milvus-io/pymilvus/pull/2911
2025-11-25 15:39:05 - INFO -
✓ Step 2 Complete - Analyzed 14 fields
2025-11-25 15:39:05 - INFO -
----------------------------------------------------------------------
2025-11-25 15:39:05 - INFO - [STEP 3/4] IDENTIFY PR METRICS FIELDS
2025-11-25 15:39:05 - INFO - ----------------------------------------------------------------------
2025-11-25 15:39:05 - INFO - Analyzing config: 'pull_request'
2025-11-25 15:39:05 - INFO - Searching for PR metrics fields...
2025-11-25 15:39:05 - INFO - Target metrics: additions, deletions, files touched
2025-11-25 15:39:05 - INFO -
2025-11-25 15:39:05 - INFO - --- METRICS FIELDS SUMMARY ---
2025-11-25 15:39:05 - INFO - Config analyzed: 'pull_request'
2025-11-25 15:39:05 - INFO - Additions fields: []
2025-11-25 15:39:05 - INFO - Deletions fields: []
2025-11-25 15:39:05 - INFO - Files fields: []
2025-11-25 15:39:05 - INFO - Other relevant: []
2025-11-25 15:39:05 - INFO -
2025-11-25 15:39:05 - WARNING - ======================================================================
2025-11-25 15:39:05 - WARNING - ⚠ NOTICE: No PR metrics fields found in this config
2025-11-25 15:39:05 - WARNING - ======================================================================
2025-11-25 15:39:05 - INFO -
2025-11-25 15:39:05 - INFO - The 'pull_request' config contains PR metadata only:
2025-11-25 15:39:05 - INFO - - PR identification (id, number, title)
2025-11-25 15:39:05 - INFO - - User/agent information
2025-11-25 15:39:05 - INFO - - Timestamps (created, closed, merged)
2025-11-25 15:39:05 - INFO - - Links (html_url, repo_url)
2025-11-25 15:39:05 - INFO -
2025-11-25 15:39:05 - INFO - This is expected for the 'pull_request' config.
2025-11-25 15:39:05 - INFO - Metrics will be extracted from 'pr_commit_details' in Phase 2.
2025-11-25 15:39:05 - INFO -
2025-11-25 15:39:05 - INFO - Available fields in this config:
2025-11-25 15:39:05 - INFO - - id
2025-11-25 15:39:05 - INFO - - number
2025-11-25 15:39:05 - INFO - - title
2025-11-25 15:39:05 - INFO - - body
2025-11-25 15:39:05 - INFO - - agent
2025-11-25 15:39:05 - INFO - - user_id
2025-11-25 15:39:05 - INFO - - user
2025-11-25 15:39:05 - INFO - - state
2025-11-25 15:39:05 - INFO - - created_at
2025-11-25 15:39:05 - INFO - - closed_at
2025-11-25 15:39:05 - INFO - - merged_at
2025-11-25 15:39:05 - INFO - - repo_id
2025-11-25 15:39:05 - INFO - - repo_url
2025-11-25 15:39:05 - INFO - - html_url
2025-11-25 15:39:05 - INFO -
2025-11-25 15:39:05 - INFO - ✓ Step 3 Complete - Metrics field identification done
2025-11-25 15:39:05 - INFO -
----------------------------------------------------------------------
2025-11-25 15:39:05 - INFO - [STEP 4/4] CHECK DATA QUALITY
2025-11-25 15:39:05 - INFO - ----------------------------------------------------------------------
2025-11-25 15:39:05 - INFO - Checking data quality on sample of 1,000 records
2025-11-25 15:39:05 - INFO - (Total dataset: 33,596 records)
2025-11-25 15:39:05 - INFO -
2025-11-25 15:39:05 - INFO - --- MISSING/NULL VALUE ANALYSIS ---
2025-11-25 15:39:05 - INFO - body:
2025-11-25 15:39:05 - INFO - Null values: 1/1000 (0.1%)
2025-11-25 15:39:05 - INFO - Total missing: 1/1000 (0.1%)
2025-11-25 15:39:05 - INFO -
2025-11-25 15:39:06 - INFO - closed_at:
2025-11-25 15:39:06 - INFO - Null values: 196/1000 (19.6%)
2025-11-25 15:39:06 - INFO - Total missing: 196/1000 (19.6%)
2025-11-25 15:39:06 - INFO -
2025-11-25 15:39:06 - INFO - merged_at:
2025-11-25 15:39:06 - INFO - Null values: 497/1000 (49.7%)
2025-11-25 15:39:06 - INFO - Total missing: 497/1000 (49.7%)
2025-11-25 15:39:06 - INFO -
2025-11-25 15:39:06 - INFO - --- METRICS FIELDS QUALITY CHECK ---
2025-11-25 15:39:06 - WARNING - No metrics fields identified yet
2025-11-25 15:39:06 - INFO - --- QUALITY SUMMARY ---
2025-11-25 15:39:06 - INFO - Total fields checked: 14
2025-11-25 15:39:06 - INFO - Fields with missing data: 3
2025-11-25 15:39:06 - INFO - Sample size: 1,000 records
2025-11-25 15:39:06 - INFO -
✓ Step 4 Complete - Data quality check done
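
The missing-value analysis above can be sketched in plain Python. This is an assumption-laden illustration: the function name `missing_value_report` is hypothetical, the sample is taken as the first 1,000 records because the log does not say how the sample was drawn, and only `None` is counted as missing.

```python
def missing_value_report(records, sample_size=1000):
    """Count None values per field over the first `sample_size` records.

    Returns {field: (missing_count, sample_count, percent_missing)} and only
    lists fields that have at least one missing value.
    """
    sample = records[:sample_size]
    counts = {}
    for record in sample:
        for field, value in record.items():
            if value is None:
                counts[field] = counts.get(field, 0) + 1
    n = len(sample)
    return {field: (c, n, round(100.0 * c / n, 1)) for field, c in counts.items()}
```

For the `pull_request` config, nulls in `closed_at` and `merged_at` are not necessarily data defects: a PR that is still open has no close time, and a PR closed without merging has no merge time (the sample record above shows `state: closed` with `merged_at: None`).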
2025-11-25 15:39:06 - INFO -
======================================================================
2025-11-25 15:39:06 - INFO - ✓ PHASE 1 for 'pull_request' COMPLETE - Dataset partially ready
2025-11-25 15:39:06 - INFO - ======================================================================
2025-11-25 15:39:06 - INFO - Dataset loaded: 33596 records
2025-11-25 15:39:06 - INFO - Configuration: pull_request
2025-11-25 15:39:06 - INFO - Primary split: train
2025-11-25 15:39:06 - INFO - Cache status: downloaded
2025-11-25 15:39:06 - INFO - ======================================================================
2025-11-25 15:39:06 - INFO - PHASE 1: DATASET EXPLORATION
2025-11-25 15:39:06 - INFO - ======================================================================
2025-11-25 15:39:06 - INFO - This phase will:
2025-11-25 15:39:06 - INFO - 1. Load dataset from HuggingFace (or cache)
2025-11-25 15:39:06 - INFO - 2. Understand the dataset schema
2025-11-25 15:39:06 - INFO - 3. Identify PR metrics fields (additions/deletions/files)
2025-11-25 15:39:06 - INFO - 4. Check data quality
2025-11-25 15:39:06 - INFO -
2025-11-25 15:39:06 - INFO -
----------------------------------------------------------------------
2025-11-25 15:39:06 - INFO - [STEP 1/4] LOAD DATASET FROM HUGGINGFACE
2025-11-25 15:39:06 - INFO - ----------------------------------------------------------------------
2025-11-25 15:39:06 - INFO - Dataset: hao-li/AIDev
2025-11-25 15:39:06 - INFO - Config: pr_commit_details
2025-11-25 15:39:06 - INFO - Cache location: /Users/ryan/Library/Mobile Documents/com~apple~CloudDocs/MySchool/umflint/courses-icloud/swe535/presentations/msr2026/code/v2q1/.cache/huggingface/datasets
2025-11-25 15:39:06 - INFO - Downloading config 'pr_commit_details' from HuggingFace...
2025-11-25 15:39:06 - INFO - (First download may take a minute)
2025-11-25 15:39:06 - INFO - Will be cached to: /Users/ryan/Library/Mobile Documents/com~apple~CloudDocs/MySchool/umflint/courses-icloud/swe535/presentations/msr2026/code/v2q1/.cache/huggingface/datasets
2025-11-25 15:39:06 - INFO -
2025-11-25 15:39:16 - INFO - ✓ Dataset loaded successfully!
2025-11-25 15:39:16 - INFO - Available splits: ['train']
2025-11-25 15:39:16 - INFO - - Split 'train': 711,923 records
2025-11-25 15:39:16 - INFO -
✓ Step 1 Complete - Config 'pr_commit_details' ready (711,923 records)
2025-11-25 15:39:16 - INFO -
----------------------------------------------------------------------
2025-11-25 15:39:16 - INFO - [STEP 2/4] UNDERSTAND DATASET SCHEMA
2025-11-25 15:39:16 - INFO - ----------------------------------------------------------------------
2025-11-25 15:39:16 - INFO - Analyzing primary split: 'train'
2025-11-25 15:39:16 - INFO - Number of records: 711,923
2025-11-25 15:39:16 - INFO -
--- DATASET FEATURES (COLUMNS) ---
2025-11-25 15:39:16 - INFO - • sha
2025-11-25 15:39:16 - INFO - Type: Value(dtype='string', id=None)
2025-11-25 15:39:16 - INFO - • pr_id
2025-11-25 15:39:16 - INFO - Type: Value(dtype='int64', id=None)
2025-11-25 15:39:16 - INFO - • author
2025-11-25 15:39:16 - INFO - Type: Value(dtype='string', id=None)
2025-11-25 15:39:16 - INFO - • committer
2025-11-25 15:39:16 - INFO - Type: Value(dtype='string', id=None)
2025-11-25 15:39:16 - INFO - • message
2025-11-25 15:39:16 - INFO - Type: Value(dtype='string', id=None)
2025-11-25 15:39:16 - INFO - • commit_stats_total
2025-11-25 15:39:16 - INFO - Type: Value(dtype='int64', id=None)
2025-11-25 15:39:16 - INFO - • commit_stats_additions
2025-11-25 15:39:16 - INFO - Type: Value(dtype='int64', id=None)
2025-11-25 15:39:16 - INFO - • commit_stats_deletions
2025-11-25 15:39:16 - INFO - Type: Value(dtype='int64', id=None)
2025-11-25 15:39:16 - INFO - • filename
2025-11-25 15:39:16 - INFO - Type: Value(dtype='string', id=None)
2025-11-25 15:39:16 - INFO - • status
2025-11-25 15:39:16 - INFO - Type: Value(dtype='string', id=None)
2025-11-25 15:39:16 - INFO - • additions
2025-11-25 15:39:16 - INFO - Type: Value(dtype='float64', id=None)
2025-11-25 15:39:16 - INFO - • deletions
2025-11-25 15:39:16 - INFO - Type: Value(dtype='float64', id=None)
2025-11-25 15:39:16 - INFO - • changes
2025-11-25 15:39:16 - INFO - Type: Value(dtype='float64', id=None)
2025-11-25 15:39:16 - INFO - • patch
2025-11-25 15:39:16 - INFO - Type: Value(dtype='string', id=None)
2025-11-25 15:39:16 - INFO -
--- SAMPLE RECORD (First Entry) ---
2025-11-25 15:39:16 - INFO - sha: 2f9d54dda4f0c87c19e0bbeb9936f525d0587e16
2025-11-25 15:39:16 - INFO - pr_id: 3271196926
2025-11-25 15:39:16 - INFO - author: devin-ai-integration[bot]
2025-11-25 15:39:16 - INFO - committer: devin-ai-integration[bot]
2025-11-25 15:39:16 - INFO - message: Add llms.txt compilation system for AI model documentation
- Create docs/compile_llms_txt.py script to compile all documentation
- Add GitHub Actions... [truncated]
2025-11-25 15:39:16 - INFO - commit_stats_total: 23008
2025-11-25 15:39:16 - INFO - commit_stats_additions: 23008
2025-11-25 15:39:16 - INFO - commit_stats_deletions: 0
2025-11-25 15:39:16 - INFO - filename: .github/workflows/compile-llms-txt.yml
2025-11-25 15:39:16 - INFO - status: added
2025-11-25 15:39:16 - INFO - additions: 38.0
2025-11-25 15:39:16 - INFO - deletions: 0.0
2025-11-25 15:39:16 - INFO - changes: 38.0
2025-11-25 15:39:16 - INFO - patch: @@ -0,0 +1,38 @@
+name: Compile llms.txt
+
+on:
+ push:
+ branches: [ main ]
+ paths:
+ - 'docs/**'
+ - 'README.md'
+ - 'CONTRIB... [truncated]
2025-11-25 15:39:16 - INFO -
✓ Step 2 Complete - Analyzed 14 fields
2025-11-25 15:39:16 - INFO -
----------------------------------------------------------------------
2025-11-25 15:39:16 - INFO - [STEP 3/4] IDENTIFY PR METRICS FIELDS
2025-11-25 15:39:16 - INFO - ----------------------------------------------------------------------
2025-11-25 15:39:16 - INFO - Analyzing config: 'pr_commit_details'
2025-11-25 15:39:16 - INFO - Searching for PR metrics fields...
2025-11-25 15:39:16 - INFO - Target metrics: additions, deletions, files touched
2025-11-25 15:39:16 - INFO -
2025-11-25 15:39:16 - INFO - ✓ Found ADDITIONS field: 'commit_stats_additions'
2025-11-25 15:39:16 - INFO - Sample value: 23008
2025-11-25 15:39:16 - INFO - Type: int
2025-11-25 15:39:16 - INFO -
2025-11-25 15:39:16 - INFO - ✓ Found DELETIONS field: 'commit_stats_deletions'
2025-11-25 15:39:16 - INFO - Sample value: 0
2025-11-25 15:39:16 - INFO - Type: int
2025-11-25 15:39:16 - INFO -
2025-11-25 15:39:16 - INFO - ✓ Found FILES field: 'filename'
2025-11-25 15:39:16 - INFO - Sample value: .github/workflows/compile-llms-txt.yml...
2025-11-25 15:39:16 - INFO - Type: str
2025-11-25 15:39:16 - INFO -
2025-11-25 15:39:16 - INFO - ✓ Found ADDITIONS field: 'additions'
2025-11-25 15:39:16 - INFO - Sample value: 38.0
2025-11-25 15:39:16 - INFO - Type: float
2025-11-25 15:39:16 - INFO -
2025-11-25 15:39:16 - INFO - ✓ Found DELETIONS field: 'deletions'
2025-11-25 15:39:16 - INFO - Sample value: 0.0
2025-11-25 15:39:16 - INFO - Type: float
2025-11-25 15:39:16 - INFO -
2025-11-25 15:39:16 - INFO - Found related field: 'changes'
2025-11-25 15:39:16 - INFO - Sample value: 38.0
2025-11-25 15:39:16 - INFO -
2025-11-25 15:39:16 - INFO - Found related field: 'patch'
2025-11-25 15:39:16 - INFO -
2025-11-25 15:39:16 - INFO - --- METRICS FIELDS SUMMARY ---
2025-11-25 15:39:16 - INFO - Config analyzed: 'pr_commit_details'
2025-11-25 15:39:16 - INFO - Additions fields: ['commit_stats_additions', 'additions']
2025-11-25 15:39:16 - INFO - Deletions fields: ['commit_stats_deletions', 'deletions']
2025-11-25 15:39:16 - INFO - Files fields: ['filename']
2025-11-25 15:39:16 - INFO - Other relevant: ['changes', 'patch']
2025-11-25 15:39:16 - INFO -
2025-11-25 15:39:16 - INFO - ======================================================================
2025-11-25 15:39:16 - INFO - ✓ SUCCESS: Found PR metrics fields!
2025-11-25 15:39:16 - INFO - ======================================================================
2025-11-25 15:39:16 - INFO -
2025-11-25 15:39:16 - INFO - ✓ Additions data: ['commit_stats_additions', 'additions']
2025-11-25 15:39:16 - INFO - ✓ Deletions data: ['commit_stats_deletions', 'deletions']
2025-11-25 15:39:16 - INFO - ✓ Files data: ['filename']
2025-11-25 15:39:16 - INFO -
2025-11-25 15:39:16 - INFO - This config contains the metrics data needed for analysis.
2025-11-25 15:39:16 - INFO -
2025-11-25 15:39:16 - INFO - ✓ Step 3 Complete - Metrics field identification done
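
The field identification in Step 3 looks like a keyword match on column names. The sketch below reproduces the summary lists from this log under that assumption; the function name `classify_metric_fields` and the exact keywords are guesses, not the original implementation.

```python
def classify_metric_fields(field_names):
    """Group column names into additions / deletions / files / other by keyword."""
    found = {"additions": [], "deletions": [], "files": [], "other": []}
    for name in field_names:
        lowered = name.lower()
        if "addition" in lowered:
            found["additions"].append(name)
        elif "deletion" in lowered:
            found["deletions"].append(name)
        elif "file" in lowered:
            found["files"].append(name)
        elif lowered in ("changes", "patch"):
            found["other"].append(name)
    return found


PR_COMMIT_DETAILS_FIELDS = [
    "sha", "pr_id", "author", "committer", "message",
    "commit_stats_total", "commit_stats_additions", "commit_stats_deletions",
    "filename", "status", "additions", "deletions", "changes", "patch",
]
result = classify_metric_fields(PR_COMMIT_DETAILS_FIELDS)
```

Note that the two additions fields are not interchangeable. In the sample record, `commit_stats_additions` is 23008 while `additions` is 38.0, which suggests the `commit_stats_*` columns hold whole-commit totals repeated on every file row, while `additions` and `deletions` are per-file counts.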
2025-11-25 15:39:16 - INFO -
----------------------------------------------------------------------
2025-11-25 15:39:16 - INFO - [STEP 4/4] CHECK DATA QUALITY
2025-11-25 15:39:16 - INFO - ----------------------------------------------------------------------
2025-11-25 15:39:16 - INFO - Checking data quality on sample of 1,000 records
2025-11-25 15:39:16 - INFO - (Total dataset: 711,923 records)
2025-11-25 15:39:16 - INFO -
2025-11-25 15:39:16 - INFO - --- MISSING/NULL VALUE ANALYSIS ---
2025-11-25 15:39:16 - INFO - filename:
2025-11-25 15:39:16 - INFO - Null values: 1/1000 (0.1%)
2025-11-25 15:39:16 - INFO - Total missing: 1/1000 (0.1%)
2025-11-25 15:39:16 - INFO -
2025-11-25 15:39:16 - INFO - status:
2025-11-25 15:39:16 - INFO - Null values: 1/1000 (0.1%)
2025-11-25 15:39:16 - INFO - Total missing: 1/1000 (0.1%)
2025-11-25 15:39:16 - INFO -
2025-11-25 15:39:17 - INFO - additions:
2025-11-25 15:39:17 - INFO - Null values: 1/1000 (0.1%)
2025-11-25 15:39:17 - INFO - Total missing: 1/1000 (0.1%)
2025-11-25 15:39:17 - INFO -
2025-11-25 15:39:17 - INFO - deletions:
2025-11-25 15:39:17 - INFO - Null values: 1/1000 (0.1%)
2025-11-25 15:39:17 - INFO - Total missing: 1/1000 (0.1%)
2025-11-25 15:39:17 - INFO -
2025-11-25 15:39:17 - INFO - changes:
2025-11-25 15:39:17 - INFO - Null values: 1/1000 (0.1%)
2025-11-25 15:39:17 - INFO - Total missing: 1/1000 (0.1%)
2025-11-25 15:39:17 - INFO -
2025-11-25 15:39:17 - INFO - patch:
2025-11-25 15:39:17 - INFO - Null values: 131/1000 (13.1%)
2025-11-25 15:39:17 - INFO - Total missing: 131/1000 (13.1%)
2025-11-25 15:39:17 - INFO -
2025-11-25 15:39:17 - INFO - --- METRICS FIELDS QUALITY CHECK ---
2025-11-25 15:39:17 - INFO - commit_stats_additions:
2025-11-25 15:39:17 - INFO - Missing: 0.0%
2025-11-25 15:39:17 - INFO - Status: ✓ GOOD (< 5% missing)
2025-11-25 15:39:17 - INFO -
2025-11-25 15:39:17 - INFO - additions:
2025-11-25 15:39:17 - INFO - Missing: 0.1%
2025-11-25 15:39:17 - INFO - Status: ✓ GOOD (< 5% missing)
2025-11-25 15:39:17 - INFO -
2025-11-25 15:39:17 - INFO - commit_stats_deletions:
2025-11-25 15:39:17 - INFO - Missing: 0.0%
2025-11-25 15:39:17 - INFO - Status: ✓ GOOD (< 5% missing)
2025-11-25 15:39:17 - INFO -
2025-11-25 15:39:17 - INFO - deletions:
2025-11-25 15:39:17 - INFO - Missing: 0.1%
2025-11-25 15:39:17 - INFO - Status: ✓ GOOD (< 5% missing)
2025-11-25 15:39:17 - INFO -
2025-11-25 15:39:17 - INFO - filename:
2025-11-25 15:39:17 - INFO - Missing: 0.1%
2025-11-25 15:39:17 - INFO - Status: ✓ GOOD (< 5% missing)
2025-11-25 15:39:17 - INFO -
2025-11-25 15:39:17 - INFO - --- QUALITY SUMMARY ---
2025-11-25 15:39:17 - INFO - Total fields checked: 14
2025-11-25 15:39:17 - INFO - Fields with missing data: 6
2025-11-25 15:39:17 - INFO - Sample size: 1,000 records
2025-11-25 15:39:17 - INFO -
✓ Step 4 Complete - Data quality check done
2025-11-25 15:39:17 - INFO -
======================================================================
2025-11-25 15:39:17 - INFO - ✓ PHASE 1 COMPLETE - Dataset ready for extraction
2025-11-25 15:39:17 - INFO - ======================================================================
2025-11-25 15:39:17 - INFO - Dataset loaded: 711923 records
2025-11-25 15:39:17 - INFO - Configuration: pr_commit_details
2025-11-25 15:39:17 - INFO - Primary split: train
2025-11-25 15:39:17 - INFO - Cache status: downloaded
2025-11-25 15:39:21 - INFO - ======================================================================
2025-11-25 15:39:21 - INFO - PHASE 2: DATA EXTRACTION
2025-11-25 15:39:21 - INFO - ======================================================================
2025-11-25 15:39:21 - INFO - Goal: Extract per-PR metrics (additions, deletions, files)
2025-11-25 15:39:21 - INFO -
2025-11-25 15:39:21 - INFO -
----------------------------------------------------------------------
2025-11-25 15:39:21 - INFO - EXTRACTING PER-PR METRICS
2025-11-25 15:39:21 - INFO - ----------------------------------------------------------------------
2025-11-25 15:39:21 - INFO - Source datasets:
2025-11-25 15:39:21 - INFO - - pull_request: 33,596 PRs
2025-11-25 15:39:21 - INFO - - pr_commit_details: 711,923 file-level records
2025-11-25 15:39:21 - INFO -
2025-11-25 15:39:21 - INFO - Step 1: Loading PR metadata...
2025-11-25 15:39:21 - INFO - Loaded 33,596 PRs
2025-11-25 15:39:21 - INFO -
Step 2: Aggregating commit-level data per PR...
2025-11-25 15:39:21 - INFO - This may take a minute for 711k records...
2025-11-25 15:39:42 - INFO - Processed 100,000 / 711,923 records...
2025-11-25 15:40:03 - INFO - Processed 200,000 / 711,923 records...
2025-11-25 15:40:24 - INFO - Processed 300,000 / 711,923 records...
2025-11-25 15:40:45 - INFO - Processed 400,000 / 711,923 records...
2025-11-25 15:41:06 - INFO - Processed 500,000 / 711,923 records...
2025-11-25 15:41:27 - INFO - Processed 600,000 / 711,923 records...
2025-11-25 15:41:48 - INFO - Processed 700,000 / 711,923 records...
2025-11-25 15:41:51 - INFO - ✓ Processed all 711,923 records
2025-11-25 15:41:51 - INFO - ✓ Found metrics for 33,580 unique PRs
2025-11-25 15:41:51 - INFO -
Step 3: Building metrics DataFrame...
2025-11-25 15:41:51 - INFO - ✓ Created metrics DataFrame: 33,580 rows
2025-11-25 15:41:51 - INFO -
Step 4: Merging with PR metadata...
2025-11-25 15:41:51 - INFO - ✓ Final DataFrame: 33,596 PRs with metrics
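
Steps 2 to 4 above can be sketched as a group-by followed by a left join. This is an illustration under stated assumptions, not the original code: the function names are hypothetical, the per-file `additions` and `deletions` columns are summed (rather than the `commit_stats_*` columns), and files are counted as distinct filenames per PR. The log does not confirm any of these choices.

```python
from collections import defaultdict


def aggregate_per_pr(file_rows):
    """Sum per-file additions/deletions and count distinct filenames for each pr_id."""
    totals = defaultdict(lambda: {"additions": 0, "deletions": 0, "files": set()})
    for row in file_rows:
        entry = totals[row["pr_id"]]
        # Per-file counts are floats and occasionally None, so treat None as 0.
        entry["additions"] += int(row["additions"] or 0)
        entry["deletions"] += int(row["deletions"] or 0)
        if row["filename"] is not None:
            entry["files"].add(row["filename"])
    return {
        pr_id: {
            "additions": e["additions"],
            "deletions": e["deletions"],
            "files_touched": len(e["files"]),
        }
        for pr_id, e in totals.items()
    }


def merge_with_metadata(prs, metrics):
    """Left-join metrics onto PR metadata; PRs without file rows get zeros."""
    zero = {"additions": 0, "deletions": 0, "files_touched": 0}
    return [{**pr, **metrics.get(pr["id"], zero)} for pr in prs]
```

A left join of this kind would explain why the final table has 33,596 rows although metrics were found for only 33,580 PRs: the remaining 16 PRs have no rows in `pr_commit_details`, which is also consistent with the reported minimum of 0 files touched.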
2025-11-25 15:41:51 - INFO -
Step 5: Computing derived metrics...
2025-11-25 15:41:51 - INFO - ✓ Added derived metrics:
2025-11-25 15:41:51 - INFO - - net_change (additions - deletions)
2025-11-25 15:41:51 - INFO - - churn_ratio (deletions / additions)
2025-11-25 15:41:51 - INFO - - size_category (XS/S/M/L/XL)
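
The three derived metrics can be computed per PR as in the sketch below. The formulas for `net_change` and `churn_ratio` are taken from the log lines above; the size thresholds are hypothetical, since the log names the categories but not their boundaries, and returning `None` for PRs with zero additions is likewise an assumption.

```python
# Hypothetical upper bounds on total changed lines; the log does not state the real ones.
SIZE_THRESHOLDS = [(10, "XS"), (100, "S"), (500, "M"), (1000, "L")]


def derive_metrics(additions, deletions):
    """Return net_change, churn_ratio and size_category for one PR."""
    net_change = additions - deletions
    # Avoid division by zero for PRs that only delete lines or change nothing.
    churn_ratio = deletions / additions if additions > 0 else None
    total = additions + deletions
    size_category = "XL"  # anything at or above the last threshold
    for upper_bound, label in SIZE_THRESHOLDS:
        if total < upper_bound:
            size_category = label
            break
    return {
        "net_change": net_change,
        "churn_ratio": churn_ratio,
        "size_category": size_category,
    }
```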
2025-11-25 15:41:51 - INFO -
======================================================================
2025-11-25 15:41:51 - INFO - SUMMARY STATISTICS
2025-11-25 15:41:51 - INFO - ======================================================================
2025-11-25 15:41:51 - INFO -
Dataset Totals:
2025-11-25 15:41:51 - INFO - Total PRs: 33,596
2025-11-25 15:41:51 - INFO - Total additions: 69,337,968 lines
2025-11-25 15:41:51 - INFO - Total deletions: 39,601,070 lines
2025-11-25 15:41:51 - INFO - Total files touched: 575,913 files
2025-11-25 15:41:51 - INFO -
Per-PR Statistics:
2025-11-25 15:41:51 - INFO - Additions:
2025-11-25 15:41:51 - INFO - Mean: 2063.9
2025-11-25 15:41:51 - INFO - Median: 72.0
2025-11-25 15:41:51 - INFO - Min: 0
2025-11-25 15:41:51 - INFO - Max: 2591290
2025-11-25 15:41:51 - INFO - Deletions:
2025-11-25 15:41:51 - INFO - Mean: 1178.7
2025-11-25 15:41:51 - INFO - Median: 10.0
2025-11-25 15:41:51 - INFO - Min: 0
2025-11-25 15:41:51 - INFO - Max: 2040393
2025-11-25 15:41:51 - INFO - Files Touched:
2025-11-25 15:41:51 - INFO - Mean: 17.1
2025-11-25 15:41:51 - INFO - Median: 3.0
2025-11-25 15:41:51 - INFO - Min: 0
2025-11-25 15:41:51 - INFO - Max: 1641
2025-11-25 15:41:51 - INFO -
PR Size Distribution:
2025-11-25 15:41:51 - INFO - L: 1,841 (5.5%)
2025-11-25 15:41:51 - INFO - M: 8,818 (26.2%)
2025-11-25 15:41:51 - INFO - S: 13,060 (38.9%)
2025-11-25 15:41:51 - INFO - XL: 3,748 (11.2%)
2025-11-25 15:41:51 - INFO - XS: 6,129 (18.2%)
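
The summary figures above are internally consistent, which the short check below confirms using only numbers copied from this log. The variable names are illustrative.

```python
total_prs = 33_596
totals = {"additions": 69_337_968, "deletions": 39_601_070, "files": 575_913}
reported_means = {"additions": 2063.9, "deletions": 1178.7, "files": 17.1}
size_counts = {"XS": 6_129, "S": 13_060, "M": 8_818, "L": 1_841, "XL": 3_748}
reported_pcts = {"XS": 18.2, "S": 38.9, "M": 26.2, "L": 5.5, "XL": 11.2}

# Per-PR means are the dataset totals divided by the number of PRs.
means = {key: round(value / total_prs, 1) for key, value in totals.items()}

# Each size share is the category count divided by the number of PRs.
pcts = {key: round(100 * value / total_prs, 1) for key, value in size_counts.items()}

# The five size categories should cover every PR exactly once.
covered = sum(size_counts.values())
```

The means sit far above the medians (2063.9 against 72.0 for additions), so a small number of very large PRs dominates the totals; the medians are the better description of a typical PR.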
2025-11-25 15:41:51 - INFO -
2025-11-25 15:41:51 - INFO - ----------------------------------------------------------------------
2025-11-25 15:41:51 - INFO - Saving results to CSV...
2025-11-25 15:41:53 - INFO - ✓ Saved to: pr_metrics.csv
2025-11-25 15:41:53 - INFO -
✓ Extraction complete!
2025-11-25 15:41:53 - INFO -
======================================================================
2025-11-25 15:41:53 - INFO - ✓ PHASE 2 COMPLETE - Per-PR metrics extracted
2025-11-25 15:41:53 - INFO - ======================================================================
2025-11-25 15:41:53 - INFO - Total PRs processed: 33,596
2025-11-25 15:41:53 - INFO - Output file: pr_metrics.csv
2025-11-25 15:42:06 - INFO - ======================================================================
2025-11-25 15:42:06 - INFO - PHASE 3: ANALYSIS & VISUALIZATION
2025-11-25 15:42:06 - INFO - ======================================================================
2025-11-25 15:42:06 - INFO - Goal: Analyze and visualize PR change patterns
2025-11-25 15:42:06 - INFO -
2025-11-25 15:42:06 - INFO - Step 1: Performing statistical analysis...
2025-11-25 15:42:06 - INFO -
----------------------------------------------------------------------
2025-11-25 15:42:06 - INFO - STATISTICAL ANALYSIS
2025-11-25 15:42:06 - INFO - ----------------------------------------------------------------------
2025-11-25 15:42:07 - INFO - Dataset totals: 33,596 PRs, 69,337,968 additions, 39,601,070 deletions, 575,913 files
2025-11-25 15:42:07 - INFO - Additions - Mean: 2063.9, Median: 72.0
2025-11-25 15:42:07 - INFO - Deletions - Mean: 1178.7, Median: 10.0
2025-11-25 15:42:07 - INFO - Files touched - Mean: 17.1, Median: 3.0
2025-11-25 15:42:07 - INFO - Size distribution: XS=6129, S=13060, M=8818, L=1841, XL=3748
2025-11-25 15:42:07 - INFO - ✓ Statistical analysis saved to: analysis_summary.txt
2025-11-25 15:42:07 - INFO -
Step 2: Generating visualizations...
2025-11-25 15:42:07 - INFO -
----------------------------------------------------------------------
2025-11-25 15:42:07 - INFO - GENERATING VISUALIZATIONS
2025-11-25 15:42:07 - INFO - ----------------------------------------------------------------------
2025-11-25 15:42:07 - INFO - Creating: additions_distribution.png
2025-11-25 15:42:08 - INFO - Creating: deletions_distribution.png
2025-11-25 15:42:08 - INFO - Creating: files_distribution.png
2025-11-25 15:42:08 - INFO - Creating: size_distribution.png
2025-11-25 15:42:09 - INFO - Creating: pr_metrics_overview.png
2025-11-25 15:42:09 - INFO - ✓ All visualizations saved to: figures/
2025-11-25 15:42:09 - INFO -
======================================================================
2025-11-25 15:42:09 - INFO - ✓ PHASE 3 COMPLETE
2025-11-25 15:42:09 - INFO - ======================================================================
2025-11-25 15:42:09 - INFO - Analysis outputs saved to:
2025-11-25 15:42:09 - INFO - - figures/ (5 PNG visualizations)
2025-11-25 15:42:09 - INFO - - analysis_summary.txt (text report)
2025-11-25 15:42:20 - INFO - ======================================================================
2025-11-25 15:42:20 - INFO - PHASE 4: DATASET EXPLORATION (Question 2)
2025-11-25 15:42:20 - INFO - ======================================================================
2025-11-25 15:42:20 - INFO - This phase will:
2025-11-25 15:42:20 - INFO - 1. Load dataset from HuggingFace (or cache)
2025-11-25 15:42:20 - INFO - 2. Understand the dataset schema
2025-11-25 15:42:20 - INFO - 3. Identify comment text fields
2025-11-25 15:42:20 - INFO - 4. Check data quality
2025-11-25 15:42:20 - INFO -
2025-11-25 15:42:20 - INFO -
----------------------------------------------------------------------
2025-11-25 15:42:20 - INFO - [STEP 1/4] LOAD DATASET FROM HUGGINGFACE
2025-11-25 15:42:20 - INFO - ----------------------------------------------------------------------
2025-11-25 15:42:20 - INFO - Dataset: hao-li/AIDev
2025-11-25 15:42:20 - INFO - Config: pr_review_comments
2025-11-25 15:42:20 - INFO - Cache location: /Users/ryan/Library/Mobile Documents/com~apple~CloudDocs/MySchool/umflint/courses-icloud/swe535/presentations/msr2026/code/v2q1/.cache/huggingface/datasets
2025-11-25 15:42:20 - INFO - Downloading config 'pr_review_comments' from HuggingFace...
2025-11-25 15:42:20 - INFO - (First download may take a minute)
2025-11-25 15:42:20 - INFO - Will be cached to: /Users/ryan/Library/Mobile Documents/com~apple~CloudDocs/MySchool/umflint/courses-icloud/swe535/presentations/msr2026/code/v2q1/.cache/huggingface/datasets
2025-11-25 15:42:20 - INFO -
2025-11-25 15:42:21 - INFO - ✓ Dataset loaded successfully!
2025-11-25 15:42:21 - INFO - Available splits: ['train']
2025-11-25 15:42:21 - INFO - - Split 'train': 19,450 records
2025-11-25 15:42:21 - INFO -
✓ Step 1 Complete - Config 'pr_review_comments' ready (19,450 records)
2025-11-25 15:42:21 - INFO -
----------------------------------------------------------------------
2025-11-25 15:42:21 - INFO - [STEP 2/4] UNDERSTAND DATASET SCHEMA
2025-11-25 15:42:21 - INFO - ----------------------------------------------------------------------
2025-11-25 15:42:21 - INFO - Analyzing primary split: 'train'
2025-11-25 15:42:21 - INFO - Number of records: 19,450
2025-11-25 15:42:21 - INFO -
--- DATASET FEATURES (COLUMNS) ---
2025-11-25 15:42:21 - INFO - • id
2025-11-25 15:42:21 - INFO - Type: Value(dtype='int64', id=None)
2025-11-25 15:42:21 - INFO - • pull_request_review_id
2025-11-25 15:42:21 - INFO - Type: Value(dtype='int64', id=None)
2025-11-25 15:42:21 - INFO - • user
2025-11-25 15:42:21 - INFO - Type: Value(dtype='string', id=None)
2025-11-25 15:42:21 - INFO - • user_type
2025-11-25 15:42:21 - INFO - Type: Value(dtype='string', id=None)
2025-11-25 15:42:21 - INFO - • diff_hunk
2025-11-25 15:42:21 - INFO - Type: Value(dtype='string', id=None)
2025-11-25 15:42:21 - INFO - • path
2025-11-25 15:42:21 - INFO - Type: Value(dtype='string', id=None)
2025-11-25 15:42:21 - INFO - • position
2025-11-25 15:42:21 - INFO - Type: Value(dtype='float64', id=None)
2025-11-25 15:42:21 - INFO - • original_position
2025-11-25 15:42:21 - INFO - Type: Value(dtype='int64', id=None)
2025-11-25 15:42:21 - INFO - • commit_id
2025-11-25 15:42:21 - INFO - Type: Value(dtype='string', id=None)
2025-11-25 15:42:21 - INFO - • original_commit_id
2025-11-25 15:42:21 - INFO - Type: Value(dtype='string', id=None)
2025-11-25 15:42:21 - INFO - • body
2025-11-25 15:42:21 - INFO - Type: Value(dtype='string', id=None)
2025-11-25 15:42:21 - INFO - • pull_request_url
2025-11-25 15:42:21 - INFO - Type: Value(dtype='string', id=None)
2025-11-25 15:42:21 - INFO - • created_at
2025-11-25 15:42:21 - INFO - Type: Value(dtype='string', id=None)
2025-11-25 15:42:21 - INFO - • updated_at
2025-11-25 15:42:21 - INFO - Type: Value(dtype='string', id=None)
2025-11-25 15:42:21 - INFO - • in_reply_to_id
2025-11-25 15:42:21 - INFO - Type: Value(dtype='float64', id=None)
2025-11-25 15:42:21 - INFO -
--- SAMPLE RECORD (First Entry) ---
2025-11-25 15:42:21 - INFO - id: 2200843160
2025-11-25 15:42:21 - INFO - pull_request_review_id: 3010483959
2025-11-25 15:42:21 - INFO - user: dpgeorge
2025-11-25 15:42:21 - INFO - user_type: User
2025-11-25 15:42:21 - INFO - diff_hunk: @@ -56,6 +56,14 @@
#define PHY_SPEED_100FULL (6)
#define PHY_DUPLEX (4)
+// PHY interrupt registers (common for LAN87xx and DP838xx)
2025-11-25 15:42:21 - INFO - path: ports/stm32/eth_phy.h
2025-11-25 15:42:21 - INFO - position: None
2025-11-25 15:42:21 - INFO - original_position: 4
2025-11-25 15:42:21 - INFO - commit_id: 47bace5680b27e235dc5d06ee5c3adff54079d7d
2025-11-25 15:42:21 - INFO - original_commit_id: 05231c28d4ac24eac705507ce6b50e6e504e76d0
2025-11-25 15:42:21 - INFO - body: These constants aren't used anywhere.
2025-11-25 15:42:21 - INFO - pull_request_url: https://api.github.com/repos/micropython/micropython/pulls/17613
2025-11-25 15:42:21 - INFO - created_at: 2025-07-11T14:09:34Z
2025-11-25 15:42:21 - INFO - updated_at: 2025-07-11T14:09:34Z
2025-11-25 15:42:21 - INFO - in_reply_to_id: None
2025-11-25 15:42:21 - INFO -
✓ Step 2 Complete - Analyzed 15 fields
2025-11-25 15:42:21 - INFO -
----------------------------------------------------------------------
2025-11-25 15:42:21 - INFO - [STEP 3/4] IDENTIFY COMMENT FIELDS
2025-11-25 15:42:21 - INFO - ----------------------------------------------------------------------
2025-11-25 15:42:21 - INFO - Analyzing config: 'pr_review_comments'
2025-11-25 15:42:21 - INFO - Searching for comment text fields...
2025-11-25 15:42:21 - INFO - Target: fields containing review comment text
2025-11-25 15:42:21 - INFO -
2025-11-25 15:42:21 - INFO - ✓ Found comment field: 'body'
2025-11-25 15:42:21 - INFO - Sample: These constants aren't used anywhere....
2025-11-25 15:42:21 - INFO - Type: string
2025-11-25 15:42:21 - INFO -
2025-11-25 15:42:21 - INFO - --- COMMENT FIELDS SUMMARY ---
2025-11-25 15:42:21 - INFO - Config analyzed: 'pr_review_comments'
2025-11-25 15:42:21 - INFO - Comment fields found: ['body']
2025-11-25 15:42:21 - INFO -
2025-11-25 15:42:21 - INFO - ======================================================================
2025-11-25 15:42:21 - INFO - ✓ SUCCESS: Found comment text fields!
2025-11-25 15:42:21 - INFO - ======================================================================
2025-11-25 15:42:21 - INFO -
2025-11-25 15:42:21 - INFO - ✓ Comment text fields: ['body']
2025-11-25 15:42:21 - INFO -
2025-11-25 15:42:21 - INFO - This config contains the comment data needed for analysis.
2025-11-25 15:42:21 - INFO -
2025-11-25 15:42:21 - INFO - ✓ Step 3 Complete - Comment field identification done
2025-11-25 15:42:21 - INFO -
----------------------------------------------------------------------
2025-11-25 15:42:21 - INFO - [STEP 4/4] CHECK DATA QUALITY
2025-11-25 15:42:21 - INFO - ----------------------------------------------------------------------
2025-11-25 15:42:21 - INFO - Checking data quality on sample of 1,000 records
2025-11-25 15:42:21 - INFO - (Total dataset: 19,450 records)
2025-11-25 15:42:21 - INFO -
2025-11-25 15:42:21 - INFO - --- MISSING/NULL VALUE ANALYSIS ---
2025-11-25 15:42:21 - INFO - diff_hunk:
2025-11-25 15:42:21 - INFO - Null values: 55/1000 (5.5%)
2025-11-25 15:42:21 - INFO - Total missing: 55/1000 (5.5%)
2025-11-25 15:42:21 - INFO -
2025-11-25 15:42:21 - INFO - position:
2025-11-25 15:42:21 - INFO - Null values: 612/1000 (61.2%)
2025-11-25 15:42:21 - INFO - Total missing: 612/1000 (61.2%)
2025-11-25 15:42:21 - INFO -
2025-11-25 15:42:21 - INFO - in_reply_to_id:
2025-11-25 15:42:21 - INFO - Null values: 492/1000 (49.2%)
2025-11-25 15:42:21 - INFO - Total missing: 492/1000 (49.2%)
2025-11-25 15:42:21 - INFO -
2025-11-25 15:42:21 - INFO - --- METRICS FIELDS QUALITY CHECK ---
2025-11-25 15:42:21 - INFO - --- QUALITY SUMMARY ---
2025-11-25 15:42:21 - INFO - Total fields checked: 15
2025-11-25 15:42:21 - INFO - Fields with missing data: 3
2025-11-25 15:42:21 - INFO - Sample size: 1,000 records
2025-11-25 15:42:21 - INFO -
✓ Step 4 Complete - Data quality check done
2025-11-25 15:42:21 - INFO -
======================================================================
2025-11-25 15:42:21 - INFO - ✓ PHASE 4 for 'pr_review_comments' COMPLETE
2025-11-25 15:42:21 - INFO - ======================================================================
2025-11-25 15:42:21 - INFO - Dataset loaded: 19450 records
2025-11-25 15:42:21 - INFO - Configuration: pr_review_comments
2025-11-25 15:42:21 - INFO - Primary split: train
2025-11-25 15:42:21 - INFO - Cache status: downloaded
2025-11-25 15:42:21 - INFO - Comment fields: ['body']
2025-11-25 15:42:21 - INFO - ======================================================================
2025-11-25 15:42:21 - INFO - PHASE 4: DATASET EXPLORATION (Question 2)
2025-11-25 15:42:21 - INFO - ======================================================================
2025-11-25 15:42:21 - INFO - This phase will:
2025-11-25 15:42:21 - INFO - 1. Load dataset from HuggingFace (or cache)
2025-11-25 15:42:21 - INFO - 2. Understand the dataset schema
2025-11-25 15:42:21 - INFO - 3. Identify comment text fields
2025-11-25 15:42:21 - INFO - 4. Check data quality
2025-11-25 15:42:21 - INFO -
2025-11-25 15:42:21 - INFO -
----------------------------------------------------------------------
2025-11-25 15:42:21 - INFO - [STEP 1/4] LOAD DATASET FROM HUGGINGFACE
2025-11-25 15:42:21 - INFO - ----------------------------------------------------------------------
2025-11-25 15:42:21 - INFO - Dataset: hao-li/AIDev
2025-11-25 15:42:21 - INFO - Config: pr_reviews
2025-11-25 15:42:21 - INFO - Cache location: /Users/ryan/Library/Mobile Documents/com~apple~CloudDocs/MySchool/umflint/courses-icloud/swe535/presentations/msr2026/code/v2q1/.cache/huggingface/datasets
2025-11-25 15:42:21 - INFO - Downloading config 'pr_reviews' from HuggingFace...
2025-11-25 15:42:21 - INFO - (First download may take a minute)
2025-11-25 15:42:21 - INFO - Will be cached to: /Users/ryan/Library/Mobile Documents/com~apple~CloudDocs/MySchool/umflint/courses-icloud/swe535/presentations/msr2026/code/v2q1/.cache/huggingface/datasets
2025-11-25 15:42:21 - INFO -
2025-11-25 15:42:22 - INFO - ✓ Dataset loaded successfully!
2025-11-25 15:42:22 - INFO - Available splits: ['train']
2025-11-25 15:42:22 - INFO - - Split 'train': 28,875 records
2025-11-25 15:42:22 - INFO -
✓ Step 1 Complete - Config 'pr_reviews' ready (28,875 records)
2025-11-25 15:42:22 - INFO -
----------------------------------------------------------------------
2025-11-25 15:42:22 - INFO - [STEP 2/4] UNDERSTAND DATASET SCHEMA
2025-11-25 15:42:22 - INFO - ----------------------------------------------------------------------
2025-11-25 15:42:22 - INFO - Analyzing primary split: 'train'
2025-11-25 15:42:22 - INFO - Number of records: 28,875
2025-11-25 15:42:22 - INFO -
--- DATASET FEATURES (COLUMNS) ---
2025-11-25 15:42:22 - INFO - • id
2025-11-25 15:42:22 - INFO - Type: Value(dtype='int64', id=None)
2025-11-25 15:42:22 - INFO - • pr_id
2025-11-25 15:42:22 - INFO - Type: Value(dtype='int64', id=None)
2025-11-25 15:42:22 - INFO - • user
2025-11-25 15:42:22 - INFO - Type: Value(dtype='string', id=None)
2025-11-25 15:42:22 - INFO - • user_type
2025-11-25 15:42:22 - INFO - Type: Value(dtype='string', id=None)
2025-11-25 15:42:22 - INFO - • state
2025-11-25 15:42:22 - INFO - Type: Value(dtype='string', id=None)
2025-11-25 15:42:22 - INFO - • submitted_at
2025-11-25 15:42:22 - INFO - Type: Value(dtype='string', id=None)
2025-11-25 15:42:22 - INFO - • body
2025-11-25 15:42:22 - INFO - Type: Value(dtype='string', id=None)
2025-11-25 15:42:22 - INFO -
--- SAMPLE RECORD (First Entry) ---
2025-11-25 15:42:22 - INFO - id: 2885691382
2025-11-25 15:42:22 - INFO - pr_id: 3107321792
2025-11-25 15:42:22 - INFO - user: coderabbitai[bot]
2025-11-25 15:42:22 - INFO - user_type: Bot
2025-11-25 15:42:22 - INFO - state: COMMENTED
2025-11-25 15:42:22 - INFO - submitted_at: 2025-06-01T14:22:22Z
2025-11-25 15:42:22 - INFO - body: **Actionable comments posted: 2**
<details>
<summary>🔭 Outside diff range comments (1)</summary><blockquote>
<details>
<summary>ocode_python/tools/b... [truncated]
2025-11-25 15:42:22 - INFO -
✓ Step 2 Complete - Analyzed 7 fields
2025-11-25 15:42:22 - INFO -
----------------------------------------------------------------------
2025-11-25 15:42:22 - INFO - [STEP 3/4] IDENTIFY COMMENT FIELDS
2025-11-25 15:42:22 - INFO - ----------------------------------------------------------------------
2025-11-25 15:42:22 - INFO - Analyzing config: 'pr_reviews'
2025-11-25 15:42:22 - INFO - Searching for comment text fields...
2025-11-25 15:42:22 - INFO - Target: fields containing review comment text
2025-11-25 15:42:22 - INFO -
2025-11-25 15:42:22 - INFO - ✓ Found comment field: 'body'
2025-11-25 15:42:22 - INFO - Sample: **Actionable comments posted: 2**
<details>
<summary>🔭 Outside diff range comments (1)</summary><blockquote>
<details>
<summary>ocode_python/tools/bash_tool.py (1)</summary><blockquote>
`440-459`: ...
2025-11-25 15:42:22 - INFO - Type: string
2025-11-25 15:42:22 - INFO -
2025-11-25 15:42:22 - INFO - --- COMMENT FIELDS SUMMARY ---
2025-11-25 15:42:22 - INFO - Config analyzed: 'pr_reviews'
2025-11-25 15:42:22 - INFO - Comment fields found: ['body']
2025-11-25 15:42:22 - INFO -
2025-11-25 15:42:22 - INFO - ======================================================================
2025-11-25 15:42:22 - INFO - ✓ SUCCESS: Found comment text fields!
2025-11-25 15:42:22 - INFO - ======================================================================
2025-11-25 15:42:22 - INFO -
2025-11-25 15:42:22 - INFO - ✓ Comment text fields: ['body']
2025-11-25 15:42:22 - INFO -
2025-11-25 15:42:22 - INFO - This config contains the comment data needed for analysis.
2025-11-25 15:42:22 - INFO -
2025-11-25 15:42:22 - INFO - ✓ Step 3 Complete - Comment field identification done
2025-11-25 15:42:22 - INFO -
----------------------------------------------------------------------
2025-11-25 15:42:22 - INFO - [STEP 4/4] CHECK DATA QUALITY
2025-11-25 15:42:22 - INFO - ----------------------------------------------------------------------
2025-11-25 15:42:22 - INFO - Checking data quality on sample of 1,000 records
2025-11-25 15:42:22 - INFO - (Total dataset: 28,875 records)
2025-11-25 15:42:22 - INFO -
2025-11-25 15:42:22 - INFO - --- MISSING/NULL VALUE ANALYSIS ---
2025-11-25 15:42:22 - INFO - body:
2025-11-25 15:42:22 - INFO - Null values: 640/1000 (64.0%)
2025-11-25 15:42:22 - INFO - Total missing: 640/1000 (64.0%)
2025-11-25 15:42:22 - INFO -
2025-11-25 15:42:22 - INFO - --- METRICS FIELDS QUALITY CHECK ---
2025-11-25 15:42:22 - INFO - --- QUALITY SUMMARY ---
2025-11-25 15:42:22 - INFO - Total fields checked: 7
2025-11-25 15:42:22 - INFO - Fields with missing data: 1
2025-11-25 15:42:22 - INFO - Sample size: 1,000 records
2025-11-25 15:42:22 - INFO -
✓ Step 4 Complete - Data quality check done
2025-11-25 15:42:22 - INFO -
======================================================================
2025-11-25 15:42:22 - INFO - ✓ PHASE 4 for 'pr_reviews' COMPLETE
2025-11-25 15:42:22 - INFO - ======================================================================
2025-11-25 15:42:22 - INFO - Dataset loaded: 28875 records
2025-11-25 15:42:22 - INFO - Configuration: pr_reviews
2025-11-25 15:42:22 - INFO - Primary split: train
2025-11-25 15:42:22 - INFO - Cache status: downloaded
2025-11-25 15:42:22 - INFO - Comment fields: ['body']
2025-11-25 15:42:22 - INFO - ======================================================================
2025-11-25 15:42:22 - INFO - PHASE 4: DATASET EXPLORATION (Question 2)
2025-11-25 15:42:22 - INFO - ======================================================================
2025-11-25 15:42:22 - INFO - This phase will:
2025-11-25 15:42:22 - INFO - 1. Load dataset from HuggingFace (or cache)
2025-11-25 15:42:22 - INFO - 2. Understand the dataset schema
2025-11-25 15:42:22 - INFO - 3. Identify comment text fields
2025-11-25 15:42:22 - INFO - 4. Check data quality
2025-11-25 15:42:22 - INFO -
2025-11-25 15:42:22 - INFO -
----------------------------------------------------------------------
2025-11-25 15:42:22 - INFO - [STEP 1/4] LOAD DATASET FROM HUGGINGFACE
2025-11-25 15:42:22 - INFO - ----------------------------------------------------------------------
2025-11-25 15:42:22 - INFO - Dataset: hao-li/AIDev
2025-11-25 15:42:22 - INFO - Config: pr_comments
2025-11-25 15:42:22 - INFO - Cache location: /Users/ryan/Library/Mobile Documents/com~apple~CloudDocs/MySchool/umflint/courses-icloud/swe535/presentations/msr2026/code/v2q1/.cache/huggingface/datasets
2025-11-25 15:42:22 - INFO - Downloading config 'pr_comments' from HuggingFace...
2025-11-25 15:42:22 - INFO - (First download may take a minute)
2025-11-25 15:42:22 - INFO - Will be cached to: /Users/ryan/Library/Mobile Documents/com~apple~CloudDocs/MySchool/umflint/courses-icloud/swe535/presentations/msr2026/code/v2q1/.cache/huggingface/datasets
2025-11-25 15:42:22 - INFO -
2025-11-25 15:42:23 - INFO - ✓ Dataset loaded successfully!
2025-11-25 15:42:23 - INFO - Available splits: ['train']
2025-11-25 15:42:23 - INFO - - Split 'train': 39,122 records
2025-11-25 15:42:23 - INFO -
✓ Step 1 Complete - Config 'pr_comments' ready (39,122 records)
2025-11-25 15:42:23 - INFO -
----------------------------------------------------------------------
2025-11-25 15:42:23 - INFO - [STEP 2/4] UNDERSTAND DATASET SCHEMA
2025-11-25 15:42:23 - INFO - ----------------------------------------------------------------------
2025-11-25 15:42:23 - INFO - Analyzing primary split: 'train'
2025-11-25 15:42:23 - INFO - Number of records: 39,122
2025-11-25 15:42:23 - INFO -
--- DATASET FEATURES (COLUMNS) ---
2025-11-25 15:42:23 - INFO - • id
2025-11-25 15:42:23 - INFO - Type: Value(dtype='int64', id=None)
2025-11-25 15:42:23 - INFO - • pr_id
2025-11-25 15:42:23 - INFO - Type: Value(dtype='int64', id=None)
2025-11-25 15:42:23 - INFO - • user
2025-11-25 15:42:23 - INFO - Type: Value(dtype='string', id=None)
2025-11-25 15:42:23 - INFO - • user_id
2025-11-25 15:42:23 - INFO - Type: Value(dtype='int64', id=None)
2025-11-25 15:42:23 - INFO - • user_type
2025-11-25 15:42:23 - INFO - Type: Value(dtype='string', id=None)
2025-11-25 15:42:23 - INFO - • created_at
2025-11-25 15:42:23 - INFO - Type: Value(dtype='string', id=None)
2025-11-25 15:42:23 - INFO - • body
2025-11-25 15:42:23 - INFO - Type: Value(dtype='string', id=None)
2025-11-25 15:42:23 - INFO -
--- SAMPLE RECORD (First Entry) ---
2025-11-25 15:42:23 - INFO - id: 2927293042
2025-11-25 15:42:23 - INFO - pr_id: 3107321792
2025-11-25 15:42:23 - INFO - user: coderabbitai[bot]
2025-11-25 15:42:23 - INFO - user_id: 136622811
2025-11-25 15:42:23 - INFO - user_type: Bot
2025-11-25 15:42:23 - INFO - created_at: 2025-06-01T14:15:35Z
2025-11-25 15:42:23 - INFO - body: <!-- This is an auto-generated comment: summarize by coderabbit.ai -->
<!-- walkthrough_start -->
## Walkthrough
This update refines shell command p... [truncated]
2025-11-25 15:42:23 - INFO -
✓ Step 2 Complete - Analyzed 7 fields
2025-11-25 15:42:23 - INFO -
----------------------------------------------------------------------
2025-11-25 15:42:23 - INFO - [STEP 3/4] IDENTIFY COMMENT FIELDS
2025-11-25 15:42:23 - INFO - ----------------------------------------------------------------------
2025-11-25 15:42:23 - INFO - Analyzing config: 'pr_comments'
2025-11-25 15:42:23 - INFO - Searching for comment text fields...
2025-11-25 15:42:23 - INFO - Target: fields containing review comment text
2025-11-25 15:42:23 - INFO -
2025-11-25 15:42:23 - INFO - ✓ Found comment field: 'body'
2025-11-25 15:42:23 - INFO - Sample: <!-- This is an auto-generated comment: summarize by coderabbit.ai -->
<!-- walkthrough_start -->
## Walkthrough
This update refines shell command preparation and input handling in `BashTool`, impro...
2025-11-25 15:42:23 - INFO - Type: string
2025-11-25 15:42:23 - INFO -
2025-11-25 15:42:23 - INFO - --- COMMENT FIELDS SUMMARY ---
2025-11-25 15:42:23 - INFO - Config analyzed: 'pr_comments'
2025-11-25 15:42:23 - INFO - Comment fields found: ['body']
2025-11-25 15:42:23 - INFO -
2025-11-25 15:42:23 - INFO - ======================================================================
2025-11-25 15:42:23 - INFO - ✓ SUCCESS: Found comment text fields!
2025-11-25 15:42:23 - INFO - ======================================================================
2025-11-25 15:42:23 - INFO -
2025-11-25 15:42:23 - INFO - ✓ Comment text fields: ['body']
2025-11-25 15:42:23 - INFO -
2025-11-25 15:42:23 - INFO - This config contains the comment data needed for analysis.
2025-11-25 15:42:23 - INFO -
2025-11-25 15:42:23 - INFO - ✓ Step 3 Complete - Comment field identification done
2025-11-25 15:42:23 - INFO -
----------------------------------------------------------------------
2025-11-25 15:42:23 - INFO - [STEP 4/4] CHECK DATA QUALITY
2025-11-25 15:42:23 - INFO - ----------------------------------------------------------------------
2025-11-25 15:42:23 - INFO - Checking data quality on sample of 1,000 records
2025-11-25 15:42:23 - INFO - (Total dataset: 39,122 records)
2025-11-25 15:42:23 - INFO -
2025-11-25 15:42:23 - INFO - --- MISSING/NULL VALUE ANALYSIS ---
2025-11-25 15:42:23 - INFO - --- METRICS FIELDS QUALITY CHECK ---
2025-11-25 15:42:23 - INFO - --- QUALITY SUMMARY ---
2025-11-25 15:42:23 - INFO - Total fields checked: 7
2025-11-25 15:42:23 - INFO - Fields with missing data: 0
2025-11-25 15:42:23 - INFO - Sample size: 1,000 records
2025-11-25 15:42:23 - INFO -
✓ Step 4 Complete - Data quality check done
2025-11-25 15:42:23 - INFO -
======================================================================
2025-11-25 15:42:23 - INFO - ✓ PHASE 4 for 'pr_comments' COMPLETE
2025-11-25 15:42:23 - INFO - ======================================================================
2025-11-25 15:42:23 - INFO - Dataset loaded: 39122 records
2025-11-25 15:42:23 - INFO - Configuration: pr_comments
2025-11-25 15:42:23 - INFO - Primary split: train
2025-11-25 15:42:23 - INFO - Cache status: downloaded
2025-11-25 15:42:23 - INFO - Comment fields: ['body']
2025-11-25 15:42:27 - INFO - ======================================================================
2025-11-25 15:42:27 - INFO - PHASE 5: DATA EXTRACTION (Question 2)
2025-11-25 15:42:27 - INFO - ======================================================================
2025-11-25 15:42:27 - INFO - Goal: Extract and categorize review comments per PR
2025-11-25 15:42:27 - INFO - Categories: correctness, style, security, testing, other
2025-11-25 15:42:27 - INFO -
2025-11-25 15:42:27 - INFO -
----------------------------------------------------------------------
2025-11-25 15:42:27 - INFO - EXTRACTING PER-PR REVIEW METRICS
2025-11-25 15:42:27 - INFO - ----------------------------------------------------------------------
2025-11-25 15:42:27 - INFO - Source datasets:
2025-11-25 15:42:27 - INFO - - pull_request: 33,596 PRs (metadata)
2025-11-25 15:42:27 - INFO - - pr_review_comments: 19,450 records
2025-11-25 15:42:27 - INFO - - pr_reviews: 28,875 records
2025-11-25 15:42:27 - INFO - - pr_comments: 39,122 records
2025-11-25 15:42:27 - INFO - Total comment records: 87,447
2025-11-25 15:42:27 - INFO -
2025-11-25 15:42:27 - INFO - Step 1: Loading PR metadata...
2025-11-25 15:42:27 - INFO - Loaded 33,596 PRs
2025-11-25 15:42:27 - INFO -
Step 2: Discovering schemas and aggregating comments...
2025-11-25 15:42:27 - INFO - This may take a minute for 87k records...
2025-11-25 15:42:27 - INFO -
2025-11-25 15:42:27 - INFO - Processing pr_review_comments...
2025-11-25 15:42:27 - INFO - pr_review_comments fields: ['id', 'pull_request_review_id', 'user', 'user_type', 'diff_hunk', 'path', 'position', 'original_position', 'commit_id', 'original_commit_id', 'body', 'pull_request_url', 'created_at', 'updated_at', 'in_reply_to_id']
2025-11-25 15:42:27 - INFO - → Using 'id' as PR identifier
2025-11-25 15:42:27 - INFO - Processed 5,000 / 19,450...
2025-11-25 15:42:27 - INFO - Processed 10,000 / 19,450...
2025-11-25 15:42:28 - INFO - Processed 15,000 / 19,450...
2025-11-25 15:42:28 - INFO - ✓ Processed 19,450 pr_review_comments
2025-11-25 15:42:28 - INFO -
2025-11-25 15:42:28 - INFO - Processing pr_reviews...
2025-11-25 15:42:28 - INFO - pr_reviews fields: ['id', 'pr_id', 'user', 'user_type', 'state', 'submitted_at', 'body']
2025-11-25 15:42:28 - INFO - → Using 'pr_id' as PR identifier
2025-11-25 15:42:28 - INFO - Processed 5,000 / 28,875...
2025-11-25 15:42:29 - INFO - Processed 10,000 / 28,875...
2025-11-25 15:42:29 - INFO - Processed 15,000 / 28,875...
2025-11-25 15:42:29 - INFO - Processed 20,000 / 28,875...
2025-11-25 15:42:29 - INFO - Processed 25,000 / 28,875...
2025-11-25 15:42:29 - INFO - ✓ Processed 28,875 pr_reviews
2025-11-25 15:42:29 - INFO -
2025-11-25 15:42:29 - INFO - Processing pr_comments...
2025-11-25 15:42:29 - INFO - pr_comments fields: ['id', 'pr_id', 'user', 'user_id', 'user_type', 'created_at', 'body']
2025-11-25 15:42:29 - INFO - → Using 'pr_id' as PR identifier
2025-11-25 15:42:29 - INFO - Processed 5,000 / 39,122...
2025-11-25 15:42:30 - INFO - Processed 10,000 / 39,122...
2025-11-25 15:42:30 - INFO - Processed 15,000 / 39,122...
2025-11-25 15:42:30 - INFO - Processed 20,000 / 39,122...
2025-11-25 15:42:30 - INFO - Processed 25,000 / 39,122...
2025-11-25 15:42:30 - INFO - Processed 30,000 / 39,122...
2025-11-25 15:42:31 - INFO - Processed 35,000 / 39,122...
2025-11-25 15:42:31 - INFO - ✓ Processed 39,122 pr_comments
2025-11-25 15:42:31 - INFO - ✓ Aggregated comments for 33,464 unique PRs
2025-11-25 15:42:31 - INFO - ✓ Total valid comments: 66,511
2025-11-25 15:42:31 - INFO -
2025-11-25 15:42:31 - INFO - Step 3: Classifying comments into categories...
2025-11-25 15:42:32 - INFO - Classifying comments into categories...
2025-11-25 15:42:32 - INFO - Categories: correctness, style, security, testing, other
2025-11-25 15:42:32 - INFO -
2025-11-25 15:42:32 - INFO - Total comments to classify: 66,511
2025-11-25 15:42:32 - INFO -
2025-11-25 15:42:32 - INFO - Classified 10,000 / 66,511 comments...
2025-11-25 15:42:33 - INFO - Classified 20,000 / 66,511 comments...
2025-11-25 15:42:34 - INFO - Classified 30,000 / 66,511 comments...
2025-11-25 15:42:35 - INFO - Classified 40,000 / 66,511 comments...
2025-11-25 15:42:36 - INFO - Classified 50,000 / 66,511 comments...
2025-11-25 15:42:37 - INFO - Classified 60,000 / 66,511 comments...
2025-11-25 15:42:38 - INFO - ✓ Classified all 66,511 comments
2025-11-25 15:42:38 - INFO - ✓ Processed 33,464 PRs
2025-11-25 15:42:38 - INFO -
2025-11-25 15:42:38 - INFO - Category distribution across all comments:
2025-11-25 15:42:38 - INFO - Correctness: 19,186 (28.8%)
2025-11-25 15:42:38 - INFO - Style: 11,500 (17.3%)
2025-11-25 15:42:38 - INFO - Security: 2,814 (4.2%)
2025-11-25 15:42:38 - INFO - Testing: 12,688 (19.1%)
2025-11-25 15:42:38 - INFO - Other: 20,323 (30.6%)
2025-11-25 15:42:38 - INFO -
2025-11-25 15:42:38 - INFO - Step 4: Building metrics DataFrame...
2025-11-25 15:42:38 - INFO - ✓ Created metrics DataFrame: 33,464 rows
2025-11-25 15:42:38 - INFO -
Step 5: Determining primary category per PR...
2025-11-25 15:42:38 - INFO - ✓ Added primary_category column
2025-11-25 15:42:38 - INFO -
Step 6: Merging with PR metadata...
2025-11-25 15:42:38 - INFO - ✓ Final DataFrame: 33,596 PRs total
2025-11-25 15:42:38 - INFO - ✓ PRs with comments: 14,015
2025-11-25 15:42:38 - INFO - ✓ PRs without comments: 19,581
2025-11-25 15:42:38 - INFO -
======================================================================
2025-11-25 15:42:38 - INFO - SUMMARY STATISTICS
2025-11-25 15:42:38 - INFO - ======================================================================
2025-11-25 15:42:38 - INFO -
Dataset Totals:
2025-11-25 15:42:38 - INFO - Total PRs: 33,596
2025-11-25 15:42:38 - INFO - PRs with review comments: 14,015
2025-11-25 15:42:38 - INFO - Total comments: 47,062
2025-11-25 15:42:38 - INFO - Average comments per PR: 3.4
2025-11-25 15:42:38 - INFO -
Category Distribution (Across All Comments):
2025-11-25 15:42:38 - INFO - Correctness: 15,529 (33.0%)
2025-11-25 15:42:38 - INFO - Style: 8,919 (19.0%)
2025-11-25 15:42:38 - INFO - Security: 1,480 (3.1%)
2025-11-25 15:42:38 - INFO - Testing: 9,421 (20.0%)
2025-11-25 15:42:38 - INFO - Other: 11,713 (24.9%)
2025-11-25 15:42:38 - INFO -
Primary Category Distribution (Per PR):
2025-11-25 15:42:38 - INFO - Correctness: 6,538 PRs (46.7%)
2025-11-25 15:42:38 - INFO - Style: 2,865 PRs (20.4%)
2025-11-25 15:42:38 - INFO - Other: 2,218 PRs (15.8%)
2025-11-25 15:42:38 - INFO - Testing: 2,155 PRs (15.4%)
2025-11-25 15:42:38 - INFO - Security: 239 PRs (1.7%)
2025-11-25 15:42:38 - INFO -
2025-11-25 15:42:38 - INFO - ----------------------------------------------------------------------
2025-11-25 15:42:38 - INFO - Saving results to CSV...
2025-11-25 15:42:38 - INFO - ✓ Saved to: review_metrics.csv
2025-11-25 15:42:38 - INFO -
✓ Extraction complete!