-
Notifications
You must be signed in to change notification settings - Fork 178
chore: Apache HBase modernization #1537
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
chore: Apache HBase modernization #1537
Conversation
…pache-hbase-modernization
Dasomeone
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One teensy comment from my end, otherwise looks fantastic!
aalhour
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few requests for changes, below is a summary:
- Do we want to add
{%(filteringSelector)s}to the alerts below? - There are a few raw signals in regionserver and cluster files that we might want to change to counter type with a
rangeFunction: 'rate'or convert the query torate(...), orincrease(...). - There is a counter metric (
totalRequestRate) that might want arangeFunction: 'rate'added to it - I noticed that the
masterQueueSizemetric has a weird unit ('decbytes') but it's a number (total items in a queue), shouldn't it beshort? - There are a few duplicate signals between regionserver and cluster files with the same key, metric but slightly different queries (outlined below)
| { | ||
| alert: 'HBaseDeadRegionServer', | ||
| expr: ||| | ||
| server_num_dead_region_servers > %(alertsDeadRegionServer)s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this use the filteringSelector?
apache-hbase-mixin/alerts.libsonnet
Outdated
| { | ||
| alert: 'HBaseOldRegionsInTransition', | ||
| expr: ||| | ||
| 100 * assignment_manager_rit_count_over_threshold / clamp_min(assignment_manager_rit_count, 1) > %(alertsOldRegionsInTransition)s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this use the filteringSelector?
apache-hbase-mixin/alerts.libsonnet
Outdated
| { | ||
| alert: 'HBaseHighMasterAuthFailRate', | ||
| expr: ||| | ||
| 100 * rate(master_authentication_failures[5m]) / (clamp_min(rate(master_authentication_successes[5m]), 1) + clamp_min(rate(master_authentication_failures[5m]), 1)) > %(alertsHighMasterAuthFailRate)s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this use the filteringSelector?
apache-hbase-mixin/alerts.libsonnet
Outdated
| { | ||
| alert: 'HBaseHighRSAuthFailRate', | ||
| expr: ||| | ||
| 100 * rate(region_server_authentication_failures[5m]) / (clamp_min(rate(region_server_authentication_successes[5m]), 1) + clamp_min(rate(region_server_authentication_failures[5m]), 1)) > %(alertsHighRSAuthFailRate)s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this use the filteringSelector?
| unit: 'reqps', | ||
| sources: { | ||
| prometheus: { | ||
| expr: 'sum by(' + groupAggList + ') (master_authentication_successes{%(queriesSelector)s})', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Example: This signal reads as "Rate of ..." but the query doesn't use the rate(...) function, which most likely will lead them to be displayed as as raw counter values instead of rates.
| unit: 'reqps', | ||
| sources: { | ||
| prometheus: { | ||
| expr: 'sum by(' + groupAggList + ') (rate(region_server_authentication_successes{%(queriesSelector)s}[$__rate_interval]))', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a raw metric that uses a rate function correctly but it's also a duplicate of the same signal and same metric in cluster.libsonnet file but that one doesn't use the rate function as this one. Can we remove it from the clusters signals?
| unit: 'reqps', | ||
| sources: { | ||
| prometheus: { | ||
| expr: 'sum by(' + groupAggList + ') (region_server_authentication_successes{%(queriesSelector)s})', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicate of the same signal with the same metric (but different query) in regionserver.libsonnet: https://github.com/grafana/jsonnet-libs/pull/1537/files#r2560283881
| unit: 'reqps', | ||
| sources: { | ||
| prometheus: { | ||
| expr: 'sum by(' + groupAggList + ') (region_server_authentication_failures{%(queriesSelector)s})', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another duplicate of the same signal in regionserver.libsonnet file.
| unit: 'reqps', | ||
| sources: { | ||
| prometheus: { | ||
| expr: 'sum by(' + groupAggList + ') (rate(region_server_authentication_failures{%(queriesSelector)s}[$__rate_interval]))', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a correct raw signal that uses a rate function but it's a duplicate of another one in cluster.libsonnet, can we remove the other one from the cluster signals file?
apache-hbase-mixin/panels.libsonnet
Outdated
| g.panel.table.standardOptions.withLinks([ | ||
| { | ||
| title: '', | ||
| url: '/d/apachehbase-regionserver-overview?from=${__from}&to=${__to}&var-instance=${__data.fields["RegionServer"]}', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to use the links from links.libsonnet?
…pache-hbase-modernization
This PR modernizes the Apache HBase mixin to use commonlib and g.libsonnet to generate its dashboards. This will make future stylizations easier to accomplish :)
Feel free to use this PR to generate a Sample environment: grafana/integration-sample-apps#102


Cluster Overview:
RegionServer Overview:


Logs:
