Zabbix

Monitoring Systems

  • A monitoring system covers hardware, software, and business metrics, and reports problems through alerts.

What to monitor?

  • Devices / Software

    • Servers, routers, switches, I/O systems, etc.
    • Operating System, Networks, Applications, etc.
  • Incidents (unexpected conditions)

    • DB down, replication stopped, server not reachable, etc.
  • Critical Events

    • Disk more than n% full or less than m GB free, replication lagging by more than n seconds, data node down (that is, disk utilization and replica lag in master-slave replication).
  • 100% CPU utilization, etc.

    • Alert, immediate intervention, firefighting
  • Trends (includes time)

    • For example, the time Nginx takes to finish a request, viewed as a graph
  • How long does it take until

    • my disk is full?
    • my Index Memory is filled up?
  • When does it happen?

    • Peak? Backup?
  • How often does it happen? Does it happen periodically?

    • Once a day? Always on Sunday night?
  • How does it correlate with other information?

    • I/O problems during
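
The critical-event and trend questions above ("disk more than n% full or less than m GB free", "how long until my disk is full?") can be sketched in a few lines of Python. This is only an illustration, not how Zabbix itself evaluates triggers: the function names, the default thresholds (90% used, 5 GiB free), and the straight-line growth model are all assumptions made for the example.

```python
from typing import List, Optional, Tuple

GIB = 1024 ** 3  # bytes per GiB


def is_disk_critical(total_bytes: int, free_bytes: int,
                     max_used_pct: float = 90.0,
                     min_free_gib: float = 5.0) -> bool:
    """Return True when the disk is more than max_used_pct full
    or has less than min_free_gib GiB free (hypothetical thresholds)."""
    used_pct = 100.0 * (total_bytes - free_bytes) / total_bytes
    return used_pct > max_used_pct or free_bytes < min_free_gib * GIB


def seconds_until_full(samples: List[Tuple[float, float]],
                       capacity: float) -> Optional[float]:
    """Estimate the seconds from the last sample until usage reaches capacity.

    samples holds (timestamp_seconds, used_bytes) pairs. The growth rate is
    the least-squares slope over all samples; the projection starts from the
    last observed value. Returns None when usage is not growing or when
    there is too little data to fit a line.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / var_t
    if slope <= 0:
        return None
    _, last_used = samples[-1]
    return max(0.0, (capacity - last_used) / slope)
```

On a real host the inputs could come from `shutil.disk_usage("/")`, which returns the total, used, and free bytes of a filesystem. A linear fit is the simplest possible trend model; periodic patterns such as a nightly backup would need something more careful.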

NMS (client) --------- protocol ---------> agent (server)
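
To make the client/server roles concrete, here is a minimal sketch of a management station polling an agent over TCP. The one-line text protocol (send a key, receive a value), the metric names, and the helper names `start_agent` and `poll` are all invented for illustration; this is not the wire protocol that Zabbix or SNMP actually uses.

```python
import socket
import socketserver
import threading

# Hypothetical metrics the agent knows how to report.
METRICS = {"cpu.util": "37.5", "disk.free_gib": "120"}


class AgentHandler(socketserver.StreamRequestHandler):
    """Agent side (server): read one key per connection, answer with its value."""

    def handle(self):
        key = self.rfile.readline().decode().strip()
        value = METRICS.get(key, "UNSUPPORTED")
        self.wfile.write((value + "\n").encode())


def start_agent(host="127.0.0.1", port=0):
    """Start the agent in a background thread; port 0 lets the OS pick a free port."""
    server = socketserver.TCPServer((host, port), AgentHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def poll(host, port, key, timeout=2.0):
    """NMS side (client): connect, send the key, and return the agent's reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall((key + "\n").encode())
        with conn.makefile("r") as reply:
            return reply.readline().strip()


if __name__ == "__main__":
    agent = start_agent()
    host, port = agent.server_address
    print(poll(host, port, "cpu.util"))
    agent.shutdown()
    agent.server_close()
```

The point of the sketch is the direction of the arrow: the agent listens and waits, while the management station initiates each request.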