[RFC] PPL Chart Command #399

@penghuo

Problem Statements

In observability, users often ask top-N questions to find a root cause. For example, a user wants to find the 5 hosts with the highest CPU utilization over the last two weeks and visualize the CPU utilization of those hosts for each day. Currently, PPL does not support this well, for two reasons:

  • No single command. Using stats max(cpu) maxcpu by host | sort 5 maxcpu returns the top 5 hosts with the highest CPU utilization. To visualize each day's max CPU utilization for these 5 hosts, the user must compose another query, e.g. where host in (h1, h2, h3, h4, h5) | stats max(cpu) maxcpu by span(timestamp) host.
  • Response format. PPL currently returns results in row format, but visualization libraries prefer series data. With series data, a visualization library can use the first column as the x-axis and the other columns as series.
# row format
day  host  cpu
1     h1    30
1     h2    20
2     h1    40
2     h2    30

# series format
day  h1    h2
1    30    20
2    40    30
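
The conversion from row format to series format is a pivot. A minimal Python sketch follows; the function name `rows_to_series` and its parameters are hypothetical and not part of PPL, and the sample rows are chosen so that the output matches the series-format table above.

```python
from collections import defaultdict


def rows_to_series(rows, x_field, series_field, value_field):
    """Pivot row-format records into series format.

    Returns (header, body): one body row per x value and one column
    per distinct series value, both in first-seen order.
    """
    series_names = []
    table = defaultdict(dict)
    for row in rows:
        name = row[series_field]
        if name not in series_names:
            series_names.append(name)
        table[row[x_field]][name] = row[value_field]
    header = [x_field] + series_names
    # Cells with no matching input row are filled with None.
    body = [[x] + [cells.get(name) for name in series_names]
            for x, cells in table.items()]
    return header, body


rows = [
    {"day": 1, "host": "h1", "cpu": 30},
    {"day": 1, "host": "h2", "cpu": 20},
    {"day": 2, "host": "h1", "cpu": 40},
    {"day": 2, "host": "h2", "cpu": 30},
]
header, body = rows_to_series(rows, "day", "host", "cpu")
```

Here `header` holds the column names and `body` holds one list per day, which a visualization library could read as the x-axis value followed by one value per series.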

Requirements

Chart Command

Overview

Create a chart with a corresponding table of statistics.

Syntax

chart <aggregation> OVER <row-split> [BY <column-split> [limit=(top|bottom)<int>]] 
  • aggregation: A statistical aggregation function.
  • limit: Specifies a limit on the number of distinct values of the column-split field to return. Defaults to the top 5.
  • row-split: The field that you specify becomes the first column in the results table. The field values become the row labels in the results table. In a chart, the field name is used to label the X-axis. The field values become the X-axis values.
  • column-split: Specifies a field to use as the columns in the results table. By default, when the results are visualized, the columns become the data series in the chart.
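
One possible reading of these parameters is sketched below in Python. It is an illustration, not the proposed implementation: the function `chart` and its arguments are hypothetical, and two behaviors are assumptions the RFC does not spell out. First, column-split values are ranked by applying the aggregation to all of their values in the input. Second, the kept columns are emitted in name order.

```python
from collections import defaultdict


def chart(rows, agg, value_field, over, by, limit=("top", 5)):
    """Aggregate value_field per (over, by) cell, keeping only the
    top or bottom N values of the column-split field `by`."""
    per_column = defaultdict(list)  # values per column-split value, for ranking
    per_cell = defaultdict(list)    # values per (row-split, column-split) cell
    row_labels = []                 # row-split values in first-seen order
    for row in rows:
        per_column[row[by]].append(row[value_field])
        per_cell[(row[over], row[by])].append(row[value_field])
        if row[over] not in row_labels:
            row_labels.append(row[over])

    # Assumption: rank each column-split value by the aggregation
    # applied to all of its values across the whole input.
    direction, n = limit
    ranked = sorted(per_column, key=lambda c: agg(per_column[c]),
                    reverse=(direction == "top"))
    columns = sorted(ranked[:n])  # assumption: output columns in name order

    header = [over] + columns
    body = [
        [label] + [agg(per_cell[(label, c)]) if (label, c) in per_cell else None
                   for c in columns]
        for label in row_labels
    ]
    return header, body


# Hypothetical input rows; host3 is dropped by the top-2 limit.
rows = [
    {"status": 200, "host": "host1", "req": 100},
    {"status": 200, "host": "host2", "req": 200},
    {"status": 200, "host": "host3", "req": 5},
    {"status": 404, "host": "host1", "req": 10},
    {"status": 404, "host": "host2", "req": 20},
    {"status": 503, "host": "host1", "req": 1},
    {"status": 503, "host": "host2", "req": 2},
]
header, body = chart(rows, sum, "req", "status", "host", limit=("top", 2))
```

Ranking on the whole input rather than per row keeps the set of columns identical across all rows, which is what a series-style result needs.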

Example

  • Example 1. For each response status, visualize the request count of the top 2 hosts. Top 2 means the 2 hosts with the highest request count over the queried period.

Note: One aggregation, one row-split field, one column-split field.

chart sum(req) OVER status BY host limit=top2

# sample response
status host1 host2
200    100   200
404    10    20
503    1     2
  • Example 2. For each day, visualize the CPU utilization of the top 2 hosts. Top 2 means the 2 hosts with the highest CPU utilization over the queried period.

Note: One aggregation, one span row-split field, one column-split field.

chart max(cpu) OVER span(timestamp, 1d) BY host limit=top2

# sample response
timestamp host3 host4
Day1      30    20
Day2      20    70
Day3      50    50
  • Example 3. For each day, visualize the CPU utilization and request count of the top 2 hosts. Top 2 means the 2 hosts with the highest CPU utilization over the queried period.

Note: Two aggregations, one span row-split field, one column-split field.

chart max(cpu), sum(req) OVER span(timestamp, 1d) BY host limit=top2

# sample response
max(cpu)
timestamp host3 host4
Day1      30    20
Day2      20    70
Day3      50    50

sum(req)
timestamp host3 host4
Day1      200   100
Day2      100   900
Day3      800   500
  • Example 4. For each response status, visualize the total request count.

Note: One aggregation, one row-split field, no column-split field.

chart sum(req) OVER status

# sample response
status sum
200    300
404    30
503    3
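
Examples 2 and 3 use span(timestamp, 1d) as the row-split, which groups events into fixed-width time buckets before aggregating. A minimal Python sketch of that bucketing follows. The helper names `span_bucket` and `max_by_span` and the sample events are hypothetical, and aligning buckets to the Unix epoch in UTC is an assumption that the RFC does not specify.

```python
from datetime import datetime, timedelta, timezone

# Assumption: buckets are aligned to the Unix epoch in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def span_bucket(ts, width):
    """Return the start of the fixed-width bucket that contains ts."""
    return EPOCH + ((ts - EPOCH) // width) * width


def max_by_span(events, width):
    """Compute max(value) for each (bucket start, host) pair."""
    result = {}
    for ts, host, value in events:
        key = (span_bucket(ts, width), host)
        result[key] = max(value, result.get(key, value))
    return result


utc = timezone.utc
events = [
    (datetime(2024, 1, 1, 3, 0, tzinfo=utc), "host3", 30),
    (datetime(2024, 1, 1, 20, 0, tzinfo=utc), "host3", 25),
    (datetime(2024, 1, 1, 9, 0, tzinfo=utc), "host4", 20),
    (datetime(2024, 1, 2, 1, 0, tzinfo=utc), "host3", 20),
    (datetime(2024, 1, 2, 23, 0, tzinfo=utc), "host4", 70),
]
daily_max = max_by_span(events, timedelta(days=1))
```

Each key in `daily_max` pairs a day's start with a host, so pivoting on the host yields one row per day and one column per host, as in the sample responses.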
