Camel K Operator Monitoring
The Camel K monitoring architecture relies on Prometheus and the eponymous operator. Make sure you’ve checked the Camel K monitoring prerequisites.
Installation
The kamel install command provides the --monitoring option flag, which can be used to automatically create the default resources required to monitor the Camel K operator, e.g.:
$ kamel install --monitoring=true
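This creates a PodMonitor resource and a PrometheusRule resource for the operator, described in the Discovery and Alerting sections.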
The kamel install command also provides the --monitoring-port option, which changes the port of the operator monitoring endpoint, e.g.:
$ kamel install --monitoring=true --monitoring-port=8888
Metrics
The Camel K operator monitoring endpoint exposes the following metrics:
Name | Type | Description | Buckets | Labels |
---|---|---|---|---|
| | Reconciliation request duration | 0.25s, 0.5s, 1s, 5s | |
| | Build duration | 30s, 1m, 1.5m, 2m, 5m, 10m | |
| | Build recovery attempts | 0, 1, 2, 3, 4, 5 | |
| | Build queue duration | 5s, 15s, 30s, 1m, 5m | N/A |
camel_k_integration_first_readiness_seconds | | Time to first integration readiness | 5s, 10s, 30s, 1m, 2m | N/A |
Discovery
A PodMonitor resource must be created for the Prometheus Operator to reconcile, so that the managed Prometheus instance can scrape the Camel K operator metrics endpoint.
As an example, here is the PodMonitor resource that is created when executing the kamel install --monitoring=true command:
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: camel-k-operator
  labels: # (1)
    ...
spec:
  selector:
    matchLabels: # (2)
      app: "camel-k"
      camel.apache.org/component: operator
  podMetricsEndpoints:
    - port: metrics
(1) The labels must match the podMonitorSelector field from the Prometheus resource.
(2) This label selector matches the Camel K operator Deployment labels.
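To illustrate the first callout, the following is a minimal sketch of a Prometheus resource whose podMonitorSelector would select a PodMonitor carrying a matching label. The resource name and the `monitoring-scope: camel-k` label are hypothetical and only serve the illustration; the manifest is trimmed to the fields relevant to discovery and is not a complete Prometheus deployment.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example-prometheus            # hypothetical name
spec:
  # Selects PodMonitor resources by their metadata labels.
  # The PodMonitor above would need this label under metadata.labels.
  podMonitorSelector:
    matchLabels:
      monitoring-scope: camel-k       # hypothetical label, an assumption
  # An empty selector here lets Prometheus look for PodMonitor resources
  # in all namespaces; omit it to restrict discovery to its own namespace.
  podMonitorNamespaceSelector: {}
```

If the labels on the PodMonitor and the selector on the Prometheus resource do not agree, the Prometheus Operator ignores the PodMonitor and the operator metrics are never scraped.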
The Prometheus Operator getting started guide documents the discovery mechanism, as well as the relationship between the operator resources.
If your operator metrics are not discovered, refer to Troubleshooting ServiceMonitor changes, which also applies to troubleshooting PodMonitor resources.
Alerting
The Prometheus Operator declares the Alertmanager resource, which can be used to configure Alertmanager instances alongside Prometheus instances. The following section assumes an Alertmanager resource already exists in your cluster.
A PrometheusRule resource can be created for the Prometheus Operator to reconcile, so that the managed Alertmanager instance can trigger alerts based on the metrics exposed by the Camel K operator.
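As a rough sketch of how these pieces connect, a Prometheus resource typically selects PrometheusRule resources through its ruleSelector field and forwards firing alerts to an Alertmanager through its alerting field. All names, the namespace, and the `role: alert-rules` label below are hypothetical; the fragment is trimmed to the alerting-related fields.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example-prometheus            # hypothetical name
spec:
  # Where Prometheus sends alerts that fire.
  alerting:
    alertmanagers:
      - namespace: monitoring         # hypothetical namespace
        name: alertmanager-example    # hypothetical Alertmanager service name
        port: web
  # Selects PrometheusRule resources by their metadata labels.
  ruleSelector:
    matchLabels:
      role: alert-rules               # hypothetical label, an assumption
```

A PrometheusRule resource is only loaded by this Prometheus instance if its metadata labels match the ruleSelector.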
As an example, here are the alerting rules defined in the PrometheusRule resource that is created when executing the kamel install --monitoring=true command:
Name | Severity | Description |
---|---|---|
| warning | More than 10% of the reconciliation requests have their duration above 0.5s over at least 1 min. |
| warning | More than 1% of the reconciliation requests have failed over at least 10 min. |
| warning | More than 10% of the successful builds have their duration above 2 min over at least 1 min. |
| critical | More than 1% of the successful builds have their duration above 5 min over at least 1 min. |
| critical | More than 1% of the builds have errored over at least 10 min. |
| warning | More than 1% of the builds have been queued for more than 1 min over at least 1 min. |
| critical | More than 1% of the builds have been queued for more than 5 min over at least 1 min. |
You can register your own PrometheusRule resources, which Prometheus evaluates to trigger alerts through the Alertmanager instances, e.g.:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: camel-k-alerts
spec:
  groups:
    - name: camel-k-alerts
      rules:
        - alert: CamelKIntegrationTimeToReadiness
          expr: |
            (
              1 - sum(rate(camel_k_integration_first_readiness_seconds_bucket{le="60"}[5m])) by (job)
              /
              sum(rate(camel_k_integration_first_readiness_seconds_count[5m])) by (job)
            )
            * 100
            > 10
          for: 1m
          labels:
            severity: warning
          annotations:
            message: |
              {{ printf "%0.0f" $value }}% of the integrations
              for {{ $labels.job }} have their first time to readiness above 1m.
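The same pattern can be reused with another bucket boundary and severity. The sketch below is a hypothetical companion rule, not one shipped with Camel K: it assumes the 2m bucket listed in the metrics table is exposed as le="120", and the rule name, resource name, threshold, and `for` duration are illustrative choices.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: camel-k-readiness-critical    # hypothetical name
spec:
  groups:
    - name: camel-k-readiness-critical
      rules:
        # Hypothetical rule: fires when more than 10% of integrations
        # take longer than 2m to become ready for the first time.
        - alert: CamelKIntegrationTimeToReadinessCritical
          expr: |
            (
              1 - sum(rate(camel_k_integration_first_readiness_seconds_bucket{le="120"}[5m])) by (job)
              /
              sum(rate(camel_k_integration_first_readiness_seconds_count[5m])) by (job)
            )
            * 100
            > 10
          for: 5m
          labels:
            severity: critical
          annotations:
            message: |
              {{ printf "%0.0f" $value }}% of the integrations
              for {{ $labels.job }} have their first time to readiness above 2m.
```

The expression divides the rate of observations at or below the bucket boundary by the rate of all observations, so one minus that ratio is the share of integrations slower than the boundary. Only boundaries that exist as histogram buckets give meaningful results.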
More information can be found in the Prometheus Operator Alerting user guide. You can also find more details in Creating alerting rules from the OpenShift documentation.