feat: parallelization #352
base: develop
Changes from 11 commits
| Original file line number | Diff line number | Diff line change | 
|---|---|---|
| @@ -0,0 +1,59 @@ | ||
| export interface WorkerConfig { | ||
| workerIndex: number; | ||
| startIndex: number; | ||
| endIndex: number; | ||
| batchSize: number; | ||
| pageUrls: string[]; | ||
| listSelector: string; | ||
| fields: any; | ||
| pagination: { | ||
| type: string; | ||
| selector: string; | ||
| }; | ||
| } | ||
| 
     | 
||
| export interface SharedState { | ||
| totalScraped: number; | ||
| results: any[]; | ||
| } | ||
| 
         
Comment on lines +15 to +18
💡 Codebase verification

Thread safety and type safety concerns are valid issues that need addressing.

The codebase shows:

Suggested improvements:

🔗 Analysis chain

Consider thread safety and type safety for SharedState.

The shared state between workers needs careful consideration:

Consider using a thread-safe data structure or implementing proper synchronization mechanisms. Also, define a generic type parameter:

-export interface SharedState {
+export interface SharedState<T = unknown> {
     totalScraped: number;
-    results: any[];
+    results: T[];
 }

🏁 Scripts executed

The following scripts were executed for the analysis:

Error: No shell scripts found

Script:

#!/bin/bash
# Find all files importing or using SharedState
rg "SharedState" -l
# Look for actual usage patterns of SharedState
rg "SharedState" -A 5 -B 5
# Check for any thread synchronization patterns
rg -i "sync|mutex|lock|atomic" maxun-core/src/
# Look for type definitions related to worker results
ast-grep --pattern 'type $_ = $_' maxun-core/src/types/
Length of output: 12076  | 
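
To make the suggestion above concrete, here is a minimal sketch of the generic `SharedState<T>` together with one possible way to sidestep concurrent mutation: each worker hands back a batch, and a single owner folds it into the state. The `mergeBatch` helper and the `Row` type are hypothetical names introduced for illustration; they do not appear in the PR, and this is not necessarily how the PR synchronizes its workers.

```typescript
// Generic version of SharedState, following the diff suggested in the review.
interface SharedState<T = unknown> {
  totalScraped: number;
  results: T[];
}

// Hypothetical helper (not part of the PR): fold one worker's batch into the
// state and return a new object, so no worker mutates a shared value in place.
function mergeBatch<T>(
  state: SharedState<T>,
  batch: readonly T[],
): SharedState<T> {
  return {
    totalScraped: state.totalScraped + batch.length,
    results: [...state.results, ...batch],
  };
}

// Hypothetical row type, for illustration only.
interface Row {
  title: string;
}

const initialState: SharedState<Row> = { totalScraped: 0, results: [] };
const afterFirst = mergeBatch(initialState, [{ title: "a" }, { title: "b" }]);
const afterSecond = mergeBatch(afterFirst, [{ title: "c" }]);

console.log(afterSecond.totalScraped, afterSecond.results.length);
```

Because `mergeBatch` never modifies its input, `initialState` is left untouched, and typing `results` as `T[]` lets the compiler reject batches of the wrong shape.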
||
| 
     | 
||
| export interface WorkerProgressData { | ||
| percentage: number; | ||
| currentUrl: string; | ||
| scrapedItems: number; | ||
| timeElapsed: number; | ||
| estimatedTimeRemaining: number; | ||
| failures: number; | ||
| performance: PerformanceMetrics; | ||
| } | ||
| 
     | 
||
| export interface PerformanceMetrics { | ||
| startTime: number; | ||
| endTime: number; | ||
| duration: number; | ||
| pagesProcessed: number; | ||
| itemsScraped: number; | ||
| failedPages: number; | ||
| averageTimePerPage: number; | ||
| memoryUsage: { | ||
| heapUsed: number; | ||
| heapTotal: number; | ||
| external: number; | ||
| rss: number; | ||
| }; | ||
| cpuUsage: { | ||
| user: number; | ||
| system: number; | ||
| }; | ||
| } | ||
| 
     | 
||
| export interface GlobalMetrics { | ||
| totalPagesProcessed: number; | ||
| totalItemsScraped: number; | ||
| totalFailures: number; | ||
| workersActive: number; | ||
| averageSpeed: number; | ||
| timeElapsed: number; | ||
| memoryUsage: NodeJS.MemoryUsage; | ||
| cpuUsage: NodeJS.CpuUsage; | ||
| } | ||
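
The diff declares fields such as `percentage`, `timeElapsed`, `estimatedTimeRemaining`, and `averageTimePerPage` but does not show how they are computed. The sketch below shows one plausible derivation, assuming millisecond timestamps and a linear extrapolation from the average time per page so far. The `estimateProgress` function, its parameters, and the `ProgressEstimate` type are hypothetical and are not taken from the PR.

```typescript
// Subset of the progress fields declared in the diff, plus the per-page average.
interface ProgressEstimate {
  percentage: number;
  timeElapsed: number;
  averageTimePerPage: number;
  estimatedTimeRemaining: number;
}

// Hypothetical helper: all formulas here are assumptions, not the PR's logic.
// Timestamps are assumed to be in milliseconds (e.g. from Date.now()).
function estimateProgress(
  pagesDone: number,
  totalPages: number,
  startTime: number,
  now: number,
): ProgressEstimate {
  const timeElapsed = now - startTime;
  if (totalPages <= 0) {
    // Nothing to process, so report zero progress rather than dividing by zero.
    return { percentage: 0, timeElapsed, averageTimePerPage: 0, estimatedTimeRemaining: 0 };
  }
  // Clamp so a miscounted worker cannot report more than 100%.
  const done = Math.min(Math.max(pagesDone, 0), totalPages);
  const averageTimePerPage = done > 0 ? timeElapsed / done : 0;
  return {
    percentage: (done / totalPages) * 100,
    timeElapsed,
    averageTimePerPage,
    // Linear extrapolation: remaining pages times the average so far.
    estimatedTimeRemaining: averageTimePerPage * (totalPages - done),
  };
}

const progress = estimateProgress(25, 100, 1000, 6000);
console.log(progress);
```

With no pages finished yet, the average is reported as 0, so the remaining-time estimate is also 0 until the first page completes; a caller may prefer to display that case as "unknown".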
🛠️ Refactor suggestion
Modularize the pagination navigation logic.
The navigation logic is complex and could benefit from being split into smaller, focused functions for better maintainability and testing.
Consider extracting these functionalities:
Example refactor for URL collection: