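Recommended Server Monitoring Tools: htop, Netdata, and Prometheus

Why You Need Monitoring

You only realize how much monitoring matters once a server breaks. I once had a server whose logs filled the disk, and the website was down for several hours before I noticed. Since then, monitoring has been standard on every server I run.

This post covers three monitoring tools at different scales, from simple to professional. Pick whichever fits your needs.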

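htop: Real-Time Monitoring on the Command Line

htop is an enhanced version of top and is practically a must-install tool on every Linux server.

Installation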

# Ubuntu / Debian
$ sudo apt install -y htop

# CentOS (htop is packaged in the EPEL repository)
$ sudo yum install -y epel-release
$ sudo yum install -y htop

Usage

$ htop

  1  [|||||||||||||||                       32.5%]   Tasks: 45, 120 thr; 1 running
  2  [|||||||||                             18.2%]   Load average: 0.52 0.38 0.31
  Mem[|||||||||||||||||||||||||         1.2G/3.8G]   Uptime: 15 days, 03:22:17
  Swp[|                                64M/2.00G]

    PID USER      PRI  NI  VIRT   RES   SHR S CPU%  MEM%   TIME+  Command
   1234 mysql      20   0 1234M  456M  12M  S 12.3   8.5  45:23.4 /usr/sbin/mysqld
   2345 www-data   20   0  256M   89M   5M  S  3.2   2.1  12:34.5 nginx: worker
   3456 root       20   0  123M   45M   3M  S  1.5   1.2   5:67.8 /usr/bin/python3

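htop Keyboard Shortcuts

Key	Action
`F5`	Tree view
`F6`	Choose sort column
`F9`	Kill process
`F2`	Setup screen
`t`	Toggle tree/list view
`M`	Sort by memory
`P`	Sort by CPU
`k`	Kill process
`/`	Search processes
`q`	Quit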

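htop's strengths are that it is lightweight and immediate, which makes it ideal for a quick look at server state right after logging in over SSH.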
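
htop is interactive, so it does not fit well into a script or a login banner. As a rough sketch of the same "quick look" idea, the snippet below prints the load averages and memory usage in one line by reading /proc directly. The function names mem_used_percent and load_averages are my own, it assumes a Linux kernel that exposes MemAvailable, and the percentage will not match htop's memory bar exactly because htop accounts for buffers and cache differently.

```shell
#!/usr/bin/env bash
# Sketch: a non-interactive status line built from /proc (Linux only).

# Percentage of RAM in use, computed as (MemTotal - MemAvailable) / MemTotal.
# Takes an optional meminfo-style file so it can be tried on sample data.
mem_used_percent() {
    local meminfo="${1:-/proc/meminfo}"
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%d\n", (total - avail) * 100 / total }' "$meminfo"
}

# The first three fields of /proc/loadavg are the 1, 5 and 15 minute load averages.
load_averages() {
    cut -d ' ' -f 1-3 "${1:-/proc/loadavg}"
}

# Print the status line only where /proc is available.
if [ -r /proc/meminfo ] && [ -r /proc/loadavg ]; then
    echo "load: $(load_averages)  mem used: $(mem_used_percent)%"
fi
```

Dropping something like this into a shell profile gives a one-line summary on every SSH login. For anything deeper, htop is still the better tool.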

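Netdata: An Out-of-the-Box Web Monitoring Dashboard

Netdata is my favorite lightweight monitoring tool. It installs with a single command, ships with a good-looking web interface, and displays several hundred metrics in real time.

Installation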

$ wget -O /tmp/netdata-kickstart.sh https://get.netdata.cloud/kickstart.sh
$ sh /tmp/netdata-kickstart.sh --stable-channel

# The service starts automatically once installation finishes
$ systemctl status netdata
● netdata.service - Real time performance monitoring
     Active: active (running) since Wed 2026-01-21 10:00:00 UTC

Once installation finishes, open http://SERVER_IP:19999 to see the monitoring dashboard.

Opening the Firewall

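# Allow only a specific IP to reach Netdata (for security)
$ sudo ufw allow from YOUR_IP to any port 19999

# Or put it behind an Nginx reverse proxy with password protection (recommended)

Access Through an Nginx Reverse Proxy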

server {
    listen 80;
    server_name monitor.example.com;

    auth_basic "Monitoring";
    auth_basic_user_file /etc/nginx/.htpasswd;

    location / {
        proxy_pass http://127.0.0.1:19999;
        proxy_set_header Host $host;
    }
}

# Create the password file
$ sudo apt install -y apache2-utils
$ sudo htpasswd -c /etc/nginx/.htpasswd admin
New password:
Re-type new password:
Adding password for user admin

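What Netdata Monitors

Category	Metrics
CPU	Utilization, load, temperature
Memory	Usage, cache, swap
Disk	I/O read/write throughput, utilization
Network	Bandwidth, connection count, packet loss
Processes	Process list, resource usage
Applications	Nginx, MySQL, Docker, and others

Configuring Alerts

Edit the alert configuration: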

$ sudo vim /etc/netdata/health.d/custom.conf

template: disk_space_usage
on: disk.space
lookup: average -1m percentage of used
units: %
every: 1m
warn: $this > 80
crit: $this > 90
info: Disk space usage is high
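
To make the threshold logic concrete, here is a small shell sketch that applies the same 80 / 90 cut-offs to the output of df. It only illustrates what the warn and crit lines express and is not how Netdata evaluates alerts internally. The function names disk_alert_level and disk_used_percent are made up for this example, and the CLEAR / WARNING / CRITICAL labels are borrowed from Netdata's alert statuses.

```shell
#!/usr/bin/env bash
# Sketch: the warn/crit thresholds from the alert above, applied by hand.

# Map a whole-number usage percentage to an alert level.
# The comparisons are strict, matching "$this > 80" and "$this > 90".
disk_alert_level() {
    local used="$1"
    if [ "$used" -gt 90 ]; then
        echo CRITICAL
    elif [ "$used" -gt 80 ]; then
        echo WARNING
    else
        echo CLEAR
    fi
}

# Usage percentage of the filesystem that holds a path.
# With POSIX "df -P", column 5 of the second line is the capacity, e.g. "42%".
disk_used_percent() {
    df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

used="$(disk_used_percent /)"
echo "/ is at ${used}%: $(disk_alert_level "$used")"
```

One difference worth noting: the Netdata rule averages over the last minute, while this sketch looks at a single instant.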

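Prometheus + Grafana: Professional-Grade Monitoring

If you manage multiple servers, or need long-term historical data and flexible alerting, Prometheus is the de facto industry standard.

Architecture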

[Server A: node_exporter] ──┐
[Server B: node_exporter] ──┤── [Prometheus Server] ── [Grafana dashboards]
[Server C: node_exporter] ──┘

Installing Node Exporter (on each monitored server)

# Download
$ wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
$ tar xzf node_exporter-1.7.0.linux-amd64.tar.gz
$ sudo mv node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/

# Create a systemd service
$ sudo tee /etc/systemd/system/node_exporter.service > /dev/null <<EOF
[Unit]
Description=Node Exporter
After=network.target

[Service]
User=nobody
ExecStart=/usr/local/bin/node_exporter
Restart=always

[Install]
WantedBy=multi-user.target
EOF

$ sudo systemctl daemon-reload
$ sudo systemctl enable --now node_exporter

# Verify
$ curl -s http://localhost:9100/metrics | head -3
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 2.5e-05
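
The /metrics output is plain text with one sample per line, so a single value can be pulled out with standard tools when you want to sanity-check the exporter. The sketch below uses a hypothetical helper called metric_value that only handles metrics without labels. node_load1 is one of the metrics node_exporter exposes, and the sample input here is hand-written rather than real output.

```shell
#!/usr/bin/env bash
# Sketch: read one un-labelled metric from Prometheus text-format input on stdin.

metric_value() {
    local name="$1"
    # Comment lines start with "#", so their first field never equals the
    # metric name. Labelled samples such as foo{bar="x"} are skipped as well.
    awk -v name="$name" '$1 == name { print $2; exit }'
}

# Against a live exporter this would be:
#   curl -s http://localhost:9100/metrics | metric_value node_load1
# Here the input is a small hand-written sample instead.
metric_value node_load1 <<'EOF'
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.52
EOF
```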

Installing Prometheus (on the monitoring server)

Docker is the easiest way to install it:

# docker-compose.yml
version: "3.8"

services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    restart: always

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
    restart: always

volumes:
  prometheus_data:
  grafana_data:

The Prometheus configuration file, prometheus.yml:

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets:
        - '203.0.113.10:9100'
        - '203.0.113.11:9100'
        - '203.0.113.12:9100'
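
With more than a handful of servers, typing the target list by hand gets error-prone. As a sketch, the shell function below prints the same scrape job for whatever hosts you pass it. The name node_scrape_job is made up for this example, and it assumes every host runs node_exporter on the default port 9100.

```shell
#!/usr/bin/env bash
# Sketch: emit the 'node' scrape job for a list of hosts.
# The indentation matches the scrape_configs section of prometheus.yml above.

node_scrape_job() {
    local host
    echo "  - job_name: 'node'"
    echo "    static_configs:"
    echo "      - targets:"
    for host in "$@"; do
        echo "        - '${host}:9100'"
    done
}

node_scrape_job 203.0.113.10 203.0.113.11 203.0.113.12
```

The output is meant to be pasted under scrape_configs, not to replace the whole file.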

$ docker compose up -d
[+] Running 2/2
 ✔ Container monitoring-prometheus-1  Started
 ✔ Container monitoring-grafana-1     Started

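Open http://MONITORING_SERVER_IP:3000 and log in with the default credentials admin/admin; Grafana asks you to set a new password on first login. Then add Prometheus as a data source. Inside this Compose network its URL is http://prometheus:9090.

Comparing the Three Options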

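Feature	htop	Netdata	Prometheus + Grafana
Setup difficulty	Minimal	Easy	Moderate
Interface	Terminal	Web	Web
Freshness	Real time	Real time (1 s)	Near real time (15 s)
History	None	Short term	Long term
Multiple servers	Not supported	Supported (Cloud)	Native
Alerting	None	Built in	Powerful
Resource usage	Very low	Low	Moderate
Best for	Quick diagnosis	One or a few servers	Large clusters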

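My Recommendations

1-2 servers: htop + Netdata is enough
3-10 servers: move to Prometheus + Grafana
Production: Prometheus + Grafana + Alertmanager is a must

Whichever option you choose, install htop at the very least.

For server performance tuning, see "Linux Server Performance Optimization". Log analysis is another important way to troubleshoot problems, and "Linux Log Management and Analysis" is recommended reading.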