Skip to content
This repository was archived by the owner on Apr 17, 2018. It is now read-only.

Link parsing is CPU intensive #41

@keith-turner

Description

@keith-turner

While running webindex on EC2 I have noticed the link parsing done by the load task is very CPU intensive. This is usually the bottleneck for loading data when running one load task per node.

For example on a 20 node m3.xlarge EC2 cluster with 20 load task running, the maximum load rate is around 1000 pages/sec. As load increases on the system from having more data (caused by compactions, etc), this takes more CPU and causes the load rate to drop.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions