From 3c19a9308dac63ae8105c27d927146fbd4ce9baa Mon Sep 17 00:00:00 2001 From: Ravind Kumar Date: Thu, 14 Sep 2023 02:23:37 -0400 Subject: [PATCH] DOCS-987: Reorganizing list.md for better RST compatibility (#18027) --- docs/metrics/prometheus/list.md | 176 +++++++++++++++++++++++++++----- 1 file changed, 151 insertions(+), 25 deletions(-) diff --git a/docs/metrics/prometheus/list.md b/docs/metrics/prometheus/list.md index 664dc629a..3f143a0f6 100644 --- a/docs/metrics/prometheus/list.md +++ b/docs/metrics/prometheus/list.md @@ -1,23 +1,40 @@ -# List of metrics reported cluster wide +# Cluster Metrics -Each metric includes a label for the server that calculated the metric. Each metric has a label for the server that generated the metric. +MinIO collects the following metrics at the cluster level. +Metrics may include one or more labels, such as the server that calculated that metric. -These metrics can be obtained from any MinIO server once per collection. +These metrics can be obtained from any MinIO server once per collection by using the following URL: -The replication metrics marked with * are only relevant for site replication, where metrics are published at the cluster level and not at bucket level. If bucket -replication is in use, these metrics are exported at the bucket level. +```shell +https://HOSTNAME:PORT/minio/metrics/v2/cluster +``` + +Replace ``HOSTNAME:PORT`` with the hostname of your MinIO deployment. +For deployments behind a load balancer, use the load balancer hostname instead of a single node hostname. + +## Audit Metrics | Name | Description | |:----------------------------------------------|:----------------------------------------------------------------------------------------------------------------| | `minio_audit_failed_messages` | Total number of messages that failed to send since start. | | `minio_audit_target_queue_length` | Number of unsent messages in queue for target. 
| | `minio_audit_total_messages` | Total number of messages sent since start. | + +## Cache Metrics + +| Name | Description | +|:----------------------------------------------|:----------------------------------------------------------------------------------------------------------------| | `minio_cache_hits_total` | Total number of drive cache hits. | | `minio_cache_missed_total` | Total number of drive cache misses. | | `minio_cache_sent_bytes` | Total number of bytes served from cache. | | `minio_cache_total_bytes` | Total size of cache drive in bytes. | | `minio_cache_usage_info` | Total percentage cache usage, value of 1 indicates high and 0 low, label level is set as well. | | `minio_cache_used_bytes` | Current cache usage in bytes. | + +## Cluster Capacity Metrics + +| Name | Description | +|:----------------------------------------------|:----------------------------------------------------------------------------------------------------------------| | `minio_cluster_capacity_raw_free_bytes` | Total free capacity online in the cluster. | | `minio_cluster_capacity_raw_total_bytes` | Total capacity online in the cluster. | | `minio_cluster_capacity_usable_free_bytes` | Total free usable capacity online in the cluster. | @@ -30,21 +47,49 @@ replication is in use, these metrics are exported at the bucket level. | `minio_cluster_usage_deletemarker_total` | Total number of delete markers in a cluster | | `minio_cluster_usage_total_bytes` | Total cluster usage in bytes | | `minio_cluster_buckets_total` | Total number of buckets in the cluster | + +## Cluster Drive Metrics + +| Name | Description | +|:----------------------------------------------|:----------------------------------------------------------------------------------------------------------------| | `minio_cluster_drive_offline_total` | Total drives offline in this cluster. | | `minio_cluster_drive_online_total` | Total drives online in this cluster. 
| | `minio_cluster_drive_total` | Total drives in this cluster. | + +## Cluster ILM Metrics + +| Name | Description | +|:----------------------------------------------|:----------------------------------------------------------------------------------------------------------------| | `minio_cluster_ilm_transitioned_bytes` | Total bytes transitioned to a tier. | | `minio_cluster_ilm_transitioned_objects` | Total number of objects transitioned to a tier. | | `minio_cluster_ilm_transitioned_versions` | Total number of versions transitioned to a tier. | + +## Cluster KMS Metrics + +| Name | Description | +|:----------------------------------------------|:----------------------------------------------------------------------------------------------------------------| | `minio_cluster_kms_online` | Reports whether the KMS is online (1) or offline (0). | | `minio_cluster_kms_request_error` | Number of KMS requests that failed due to some error. (HTTP 4xx status code). | | `minio_cluster_kms_request_failure` | Number of KMS requests that failed due to some internal failure. (HTTP 5xx status code). | | `minio_cluster_kms_request_success` | Number of KMS requests that succeeded. | | `minio_cluster_kms_uptime` | The time the KMS has been up and running in seconds. | + +## Cluster Health Metrics + +| Name | Description | +|:----------------------------------------------|:----------------------------------------------------------------------------------------------------------------| | `minio_cluster_nodes_offline_total` | Total number of MinIO nodes offline. | | `minio_cluster_nodes_online_total` | Total number of MinIO nodes online. 
| | `minio_cluster_write_quorum` | Maximum write quorum across all pools and sets | | `minio_cluster_health_status` | Get current cluster health status | + +## Cluster Replication Metrics + +Metrics marked as ``Site Replication Only`` only populate on deployments with [Site Replication](https://min.io/docs/minio/linux/operations/install-deploy-manage/multi-site-replication.html) configurations. +For deployments with [bucket](https://min.io/docs/minio/linux/administration/bucket-replication.html) or [batch](https://min.io/docs/minio/linux/administration/batch-framework.html#replicate) configurations, these metrics populate instead under the [Bucket Metrics](#bucket-metrics) endpoint. + +| Name | Description | +|:----------------------------------------------|:----------------------------------------------------------------------------------------------------------------| | `minio_cluster_replication_current_active_workers` | Total number of active replication workers | | `minio_cluster_replication_average_active_workers` | Average number of active replication workers | | `minio_cluster_replication_max_active_workers` | Maximum number of active replication workers seen since server start | @@ -64,29 +109,46 @@ replication is in use, these metrics are exported at the bucket level. | `minio_cluster_replication_max_queued_bytes` | Maximum number of bytes queued for replication seen since server start | | `minio_cluster_replication_max_queued_count` | Maximum number of objects queued for replication seen since server start | | `minio_cluster_replication_recent_backlog_count` | Total number of objects seen in replication backlog in the last 5 minutes | -| `minio_heal_objects_errors_total` | Objects for which healing failed in current self healing run. | | `minio_cluster_replication_last_minute_failed_bytes` | Total number of bytes failed at least once to replicate in the last full minute. 
| | `minio_cluster_replication_last_minute_failed_count` | Total number of objects which failed replication in the last full minute. | -| `minio_cluster_replication_last_hour_failed_bytes` | * Total number of bytes failed at least once to replicate in the last full hour. | -| `minio_cluster_replication_last_hour_failed_count` | * Total number of objects which failed replication in the last full hour. | -| `minio_cluster_replication_total_failed_bytes` | * Total number of bytes failed at least once to replicate since server start. | -| `minio_cluster_replication_total_failed_count` | * Total number of objects which failed replication since server start. | -| `minio_cluster_replication_received_bytes` | * Total number of bytes replicated to this cluster from another source cluster. | -| `minio_cluster_replication_received_count` | * Total number of objects received by this cluster from another source cluster. | -| `minio_cluster_replication_sent_bytes` | * Total number of bytes replicated to the target cluster. | | -| `minio_cluster_replication_sent_count` | * Total number of objects replicated to the target cluster. | | -| `minio_cluster_replication_credential_errors` | * Total number of replication credential errors since server start | +| `minio_cluster_replication_last_hour_failed_bytes` | (_Site Replication Only_) Total number of bytes failed at least once to replicate in the last full hour. | +| `minio_cluster_replication_last_hour_failed_count` | (_Site Replication Only_) Total number of objects which failed replication in the last full hour. | +| `minio_cluster_replication_total_failed_bytes` | (_Site Replication Only_) Total number of bytes failed at least once to replicate since server start. | +| `minio_cluster_replication_total_failed_count` | (_Site Replication Only_) Total number of objects which failed replication since server start. 
| +| `minio_cluster_replication_received_bytes` | (_Site Replication Only_) Total number of bytes replicated to this cluster from another source cluster. | +| `minio_cluster_replication_received_count` | (_Site Replication Only_) Total number of objects received by this cluster from another source cluster. | +| `minio_cluster_replication_sent_bytes` | (_Site Replication Only_) Total number of bytes replicated to the target cluster. | | +| `minio_cluster_replication_sent_count` | (_Site Replication Only_) Total number of objects replicated to the target cluster. | | +| `minio_cluster_replication_credential_errors` | (_Site Replication Only_) Total number of replication credential errors since server start | +## Healing Metrics + +| Name | Description | +|:----------------------------------------------|:----------------------------------------------------------------------------------------------------------------| +| `minio_heal_objects_errors_total` | Objects for which healing failed in current self healing run. | | `minio_heal_objects_heal_total` | Objects healed in current self healing run. | | `minio_heal_objects_total` | Objects scanned in current self healing run. | | `minio_heal_time_last_activity_nano_seconds` | Time elapsed (in nano seconds) since last self healing activity. | + +## Inter Node Metrics + +| Name | Description | +|:----------------------------------------------|:----------------------------------------------------------------------------------------------------------------| | `minio_inter_node_traffic_dial_avg_time` | Average time of internodes TCP dial calls. | | `minio_inter_node_traffic_dial_errors` | Total number of internode TCP dial timeouts and errors. | | `minio_inter_node_traffic_errors_total` | Total number of failed internode calls. | | `minio_inter_node_traffic_received_bytes` | Total number of bytes received from other peer nodes. | | `minio_inter_node_traffic_sent_bytes` | Total number of bytes sent to the other peer nodes. 
| + +## Bucket Notification Metrics + | `minio_notify_current_send_in_progress` | Number of concurrent async Send calls active to all targets. | | `minio_notify_target_queue_length` | Number of unsent notifications in queue for target. | + +## S3 API Request Metrics + +| Name | Description | +|:----------------------------------------------|:----------------------------------------------------------------------------------------------------------------| | `minio_s3_requests_4xx_errors_total` | Total number S3 requests with (4xx) errors. | | `minio_s3_requests_5xx_errors_total` | Total number S3 requests with (5xx) errors. | | `minio_s3_requests_canceled_total` | Total number S3 requests canceled by the client. | @@ -102,9 +164,18 @@ replication is in use, these metrics are exported at the bucket level. | `minio_s3_requests_ttfb_seconds_distribution` | Distribution of the time to first byte across API calls. | | `minio_s3_traffic_received_bytes` | Total number of s3 bytes received. | | `minio_s3_traffic_sent_bytes` | Total number of s3 bytes sent. | + +## Software Metrics + +| Name | Description | +|:----------------------------------------------|:----------------------------------------------------------------------------------------------------------------| | `minio_software_commit_info` | Git commit hash for the MinIO release. | | `minio_software_version_info` | MinIO Release tag for the server. | -| `minio_usage_last_activity_nano_seconds` | Time elapsed (in nano seconds) since last scan activity. | + +## Drive Metrics + +| Name | Description | +|:----------------------------------------------|:----------------------------------------------------------------------------------------------------------------| | `minio_node_drive_free_bytes` | Total storage available on a drive. | | `minio_node_drive_free_inodes` | Total free inodes. | | `minio_node_drive_latency_us` | Average last minute latency in µs for drive API storage operations. 
| @@ -113,17 +184,32 @@ replication is in use, these metrics are exported at the bucket level. | `minio_node_drive_total` | Total drives in this node. | | `minio_node_drive_total_bytes` | Total storage on a drive. | | `minio_node_drive_used_bytes` | Total storage used on a drive. | -| `minio_node_file_descriptor_limit_total` | Limit on total number of open file descriptors for the MinIO Server process. | -| `minio_node_file_descriptor_open_total` | Total number of open file descriptors by the MinIO Server process. | -| `minio_node_go_routine_total` | Total number of go routines running. | + +## Identity and Access Management (IAM) Metrics + +| Name | Description | +|:----------------------------------------------|:----------------------------------------------------------------------------------------------------------------| | `minio_node_iam_last_sync_duration_millis` | Last successful IAM data sync duration in milliseconds. | | `minio_node_iam_since_last_sync_millis` | Time (in milliseconds) since last successful IAM data sync. | | `minio_node_iam_sync_failures` | Number of failed IAM data syncs since server start. | | `minio_node_iam_sync_successes` | Number of successful IAM data syncs since server start. | + +## Information Lifecycle Management (ILM) Metrics + +| Name | Description | +|:----------------------------------------------|:----------------------------------------------------------------------------------------------------------------| | `minio_node_ilm_expiry_pending_tasks` | Number of pending ILM expiry tasks in the queue. | | `minio_node_ilm_transition_active_tasks` | Number of active ILM transition tasks. | | `minio_node_ilm_transition_pending_tasks` | Number of pending ILM transition tasks in the queue. | | `minio_node_ilm_versions_scanned` | Total number of object versions checked for ilm actions since server start. 
| + +## System Metrics + +| Name | Description | +|:----------------------------------------------|:----------------------------------------------------------------------------------------------------------------| +| `minio_node_file_descriptor_limit_total` | Limit on total number of open file descriptors for the MinIO Server process. | +| `minio_node_file_descriptor_open_total` | Total number of open file descriptors by the MinIO Server process. | +| `minio_node_go_routine_total` | Total number of go routines running. | | `minio_node_io_rchar_bytes` | Total bytes read by the process from the underlying storage system including cache, /proc/[pid]/io rchar. | | `minio_node_io_read_bytes` | Total bytes read by the process from the underlying storage system, /proc/[pid]/io read_bytes. | | `minio_node_io_wchar_bytes` | Total bytes written by the process to the underlying storage system including page cache, /proc/[pid]/io wchar. | @@ -132,6 +218,11 @@ replication is in use, these metrics are exported at the bucket level. | `minio_node_process_resident_memory_bytes` | Resident memory size in bytes. | | `minio_node_process_starttime_seconds` | Start time for MinIO process per node, time in seconds since Unix epoch. | | `minio_node_process_uptime_seconds` | Uptime for MinIO process per node in seconds. | + +## Scanner Metrics + +| Name | Description | +|:----------------------------------------------|:----------------------------------------------------------------------------------------------------------------| | `minio_node_scanner_bucket_scans_finished` | Total number of bucket scans finished since server start. | | `minio_node_scanner_bucket_scans_started` | Total number of bucket scans started since server start. | | `minio_node_scanner_directories_scanned` | Total number of directories scanned since server start. | @@ -139,19 +230,38 @@ replication is in use, these metrics are exported at the bucket level. 
| `minio_node_scanner_versions_scanned` | Total number of object versions scanned since server start. | | `minio_node_syscall_read_total` | Total read SysCalls to the kernel. /proc/[pid]/io syscr. | | `minio_node_syscall_write_total` | Total write SysCalls to the kernel. /proc/[pid]/io syscw. | +| `minio_usage_last_activity_nano_seconds` | Time elapsed (in nano seconds) since last scan activity. | -# List of metrics exported per bucket level +# Bucket Metrics -Each metric includes a label for the server that calculated the metric. Each metric has a label for the server that generated the metric. Each -metric has a label that distinguishes the bucket. -These metrics can be obtained from any MinIO server once per collection. +MinIO collects the following metrics at the bucket level. +Each metric includes the ``bucket`` label to identify the corresponding bucket. +Metrics may include one or more additional labels, such as the server that calculated that metric. + +These metrics can be obtained from any MinIO server once per collection by using the following URL: + +```shell +https://HOSTNAME:PORT/minio/metrics/v2/bucket +``` + +Replace ``HOSTNAME:PORT`` with the hostname of your MinIO deployment. +For deployments behind a load balancer, use the load balancer hostname instead of a single node hostname. + +## Distribution Metrics | Name | Description | |:--------------------------------------------------|:--------------------------------------------------------------------------------| | `minio_bucket_objects_size_distribution` | Distribution of object sizes in the bucket, includes label for the bucket name. | | `minio_bucket_objects_version_distribution` | Distribution of object sizes in a bucket, by number of versions | -| `minio_bucket_quota_total_bytes` | Total bucket quota size in bytes. 
| + +## Replication Metrics + +These metrics only populate on deployments with [Bucket Replication](https://min.io/docs/minio/linux/administration/bucket-replication.html) or [Batch Replication](https://min.io/docs/minio/linux/administration/batch-framework.html) configurations. +For deployments with [Site Replication](https://min.io/docs/minio/linux/operations/install-deploy-manage/multi-site-replication.html) configured, select metrics populate under the [Cluster Metrics](#cluster-metrics) endpoint. + +| Name | Description | +|:--------------------------------------------------|:--------------------------------------------------------------------------------| | `minio_bucket_replication_last_minute_failed_bytes` | Total number of bytes failed at least once to replicate in the last full minute. | | `minio_bucket_replication_last_minute_failed_count` | Total number of objects which failed replication in the last full minute. | | `minio_bucket_replication_last_hour_failed_bytes` | Total number of bytes failed at least once to replicate in the last full hour. | @@ -163,16 +273,32 @@ These metrics can be obtained from any MinIO server once per collection. | `minio_bucket_replication_received_count` | Total number of objects received by this bucket from another source bucket. | | `minio_bucket_replication_sent_bytes` | Total number of bytes replicated to the target bucket. | | | `minio_bucket_replication_sent_count` | Total number of objects replicated to the target bucket. | | +| `minio_bucket_replication_credential_errors` | Total number of replication credential errors since server start | + +## Traffic Metrics + +| Name | Description | +|:--------------------------------------------------|:--------------------------------------------------------------------------------| | `minio_bucket_traffic_received_bytes` | Total number of S3 bytes received for this bucket. | | `minio_bucket_traffic_sent_bytes` | Total number of S3 bytes sent for this bucket. 
| + +## Usage Metrics + +| Name | Description | +|:--------------------------------------------------|:--------------------------------------------------------------------------------| | `minio_bucket_usage_object_total` | Total number of objects. | | `minio_bucket_usage_version_total` | Total number of versions (includes delete marker) | | `minio_bucket_usage_deletemarker_total` | Total number of delete markers. | | `minio_bucket_usage_total_bytes` | Total bucket size in bytes. | +| `minio_bucket_quota_total_bytes` | Total bucket quota size in bytes. | + +## Requests Metrics + +| Name | Description | +|:--------------------------------------------------|:--------------------------------------------------------------------------------| | `minio_bucket_requests_4xx_errors_total` | Total number of S3 requests with (4xx) errors on a bucket. | | `minio_bucket_requests_5xx_errors_total` | Total number of S3 requests with (5xx) errors on a bucket. | | `minio_bucket_requests_inflight_total` | Total number of S3 requests currently in flight on a bucket. | | `minio_bucket_requests_total` | Total number of S3 requests on a bucket. | | `minio_bucket_requests_canceled_total` | Total number S3 requests canceled by the client. | | `minio_bucket_requests_ttfb_seconds_distribution` | Distribution of time to first byte across API calls per bucket. | -| `minio_bucket_replication_credential_errors` | Total number of replication credential errors since server start |
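
The patch above documents two scrape URLs, `/minio/metrics/v2/cluster` and `/minio/metrics/v2/bucket`. As a rough illustration of how a client might read them, the following Python sketch builds either URL and parses the Prometheus text exposition format into `(name, labels, value)` tuples. It is a minimal sketch under stated assumptions, not MinIO tooling: the helper names (`metrics_url`, `parse_metrics`, `fetch_metrics`), the `server` label name, the sample hostnames, and the optional bearer-token header are all hypothetical. Whether the endpoint requires authentication depends on how the deployment is configured.

```python
import re
from urllib.request import Request, urlopen

# One sample line of the Prometheus text exposition format looks like:
#   metric_name{label="value",...} 12.5
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def metrics_url(host_port, scope="cluster", scheme="https"):
    """Build the scrape URL; scope is "cluster" or "bucket" as in the patch."""
    return f"{scheme}://{host_port}/minio/metrics/v2/{scope}"


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if not match:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def fetch_metrics(host_port, scope="cluster", token=None):
    """Fetch and parse one endpoint (hypothetical helper, needs a live server)."""
    request = Request(metrics_url(host_port, scope))
    if token:
        # Assumption: the deployment expects a bearer token on this endpoint.
        request.add_header("Authorization", f"Bearer {token}")
    with urlopen(request, timeout=10) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

For example, `fetch_metrics("minio.example.net:9000", scope="bucket")` would return the bucket-level samples, which can then be grouped by the `bucket` label that the patch says each bucket metric carries.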