Diagnosing Linux Server Load Problems with a Simple Script

Date: 2020-01-09 10:38:10

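If you have been a sysadmin for any length of time, you have certainly seen a server's CPU usage, memory utilization, or load level spike. Running "top" does not always give you the answer, either. So how do you find the sneaky processes that are eating system resources, so that you can kill them?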

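The following script may help. It was written for a web server, so some parts look specifically for httpd processes and others deal with MySQL. Depending on what your server runs, simply comment out or delete those parts and add others. Treat it as a starting point.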

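A prerequisite for this version of the script is mytop (available from http://jeremy.zawodny.com/mysql/mytop/), free software released under the GNU General Public License and an excellent tool for checking MySQL performance. It is getting old, but it still works well for our purposes here.
Also, I use mutt as the mailer; you may want to change the script to use the mail utility that ships with Linux instead. I run the script hourly from cron; adjust as needed. Oh, and the script needs to run as root, because it reads from some protected areas of the server.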
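As a rough illustration of the two operational points above (an hourly cron run and root privileges), the sketch below shows one way to guard the script. The install path /usr/local/sbin/checkSystemLoad.sh and the helper name is_root_uid are assumptions made for this example; neither appears in the original script.

```shell
# Hypothetical entry in /etc/crontab format, running as root at the top of
# every hour (the script path is an assumed install location):
#   0 * * * * root /usr/local/sbin/checkSystemLoad.sh

# is_root_uid: succeed only when the given numeric UID is 0 (root).
# Hypothetical helper; taking the UID as an argument keeps it easy to test.
is_root_uid() {
  [ "$1" -eq 0 ]
}

# Possible use at the top of the script:
#   is_root_uid "$(id -u)" || { echo "must be run as root" >&2; exit 1; }
```

Failing early with a clear message is usually easier to debug from cron mail than a string of "Permission denied" errors halfway through the report.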

So, shall we get started?

First, set the script variables:

#!/bin/bash
#
# Script to check system load average levels to try to determine
# what processes are taking it overly high...
#
# 07Jul2010 tjones
#
# set environment
dt=$(date +%d%b%Y-%X)
# Obviously, change the following directories to where your log files actually are kept
tmpfile="/tmp/checkSystemLoad.tmp"
# topfile holds the pared-down top output for the cell phone report;
# the path is an assumed placeholder, change it to suit
topfile="/tmp/checkSystemLoad.top"
logfile="/tmp/checkSystemLoad.log"
msgLog="/var/log/messages"
mysqlLog="/var/log/mysqld.log"
# the first mailstop is standard email for reports. Second one is for cell phone (with a pared down report)
mailstop="[email protected]"
mailstop1="[email protected]"
machine=$(hostname)
# The following three are for mytop use - use a db user that has decent rights
dbusr="username"
dbpw="password"
db="yourdatabasename"
# The following is the load level to check on - 10 is really high, so you might want to lower it.
levelToCheck=10
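Keeping the database password in the script, and passing it with -p, means it can show up in the process list while mytop runs. As far as its documentation describes, mytop also reads key=value settings from a ~/.mytop file, so one option is to move the credentials there. The sketch below is only an illustration of that idea; the helper name write_mytop_config is hypothetical.

```shell
# write_mytop_config: write user/pass/db settings in the key=value form that
# mytop reads from its config file. Hypothetical helper name.
#   $1 = config path, $2 = db user, $3 = db password, $4 = database name
write_mytop_config() {
  (
    umask 077   # a newly created file is readable by its owner only
    printf 'user=%s\npass=%s\ndb=%s\n' "$2" "$3" "$4" > "$1"
  )
}

# Possible use, as the user that runs the script:
#   write_mytop_config "$HOME/.mytop" "username" "password" "yourdatabasename"
#   /usr/bin/mytop -b >> "$tmpfile"
```

The umask runs inside a subshell, so it does not change the permissions of anything else the calling script creates.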

Next, check the load level to see whether the script should continue:

# Set variables from system:
# Field 1 of /proc/loadavg is the 1-minute load average
loadLevel=$(awk '{print $1}' /proc/loadavg)
# Round to a whole number, because [ -gt ] only compares integers
loadLevel=$(printf "%.0f" "$loadLevel")
# if the load level is greater than you want, start the script process. Otherwise, exit 0
if [ "$loadLevel" -gt "$levelToCheck" ]; then
  echo "" > "$tmpfile"
  echo "**************************************" >>"$tmpfile"
  echo "Date: $dt " >>"$tmpfile"
  echo "Check System Load & Processes " >>"$tmpfile"
  echo "**************************************" >>"$tmpfile"
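To make the rounding step easier to reason about, here is a small sketch that isolates it in a function. The name round_load is hypothetical, and the function takes a /proc/loadavg-style line as an argument so it can be tried without touching the live system.

```shell
# round_load: print the 1-minute load average from a /proc/loadavg-style
# line, rounded to a whole number. Hypothetical helper name.
#   $1 = a line such as "12.34 8.10 5.02 3/211 4242"
round_load() {
  local one_min
  one_min=${1%% *}                      # keep everything before the first space
  LC_ALL=C printf '%.0f\n' "$one_min"   # C locale so "." is the decimal point
}

# Possible use:
#   loadLevel=$(round_load "$(cat /proc/loadavg)")
```

One consequence of rounding before an integer -gt test is that a load of 10.4 becomes 10 and does not pass a threshold of 10; lower the threshold by one if you want the report at that level.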

Then carry on with the checks, writing the results to the temp file. Add or remove items here to suit your setup:

  # Get more variables from system:
  httpdProcesses=$(ps -def | grep httpd | grep -v grep | wc -l)
  # Show current load level:
  echo "Load Level Is: $loadLevel" >>"$tmpfile"
  echo "*************************************************" >>"$tmpfile"
  # Show number of httpd processes now running (parent and children alike):
  echo "Number of httpd processes now: $httpdProcesses" >>"$tmpfile"
  echo "*************************************************" >>"$tmpfile"
  echo "" >>"$tmpfile"
  # Show process list (the BSD-style "f" draws the process tree):
  echo "Processes now running:" >>"$tmpfile"
  ps f -ef >>"$tmpfile"
  echo "*************************************************" >>"$tmpfile"
  echo "" >>"$tmpfile"
  # Show current MySQL info (-b runs mytop once in batch mode):
  echo "Results from mytop:" >>"$tmpfile"
  /usr/bin/mytop -u "$dbusr" -p "$dbpw" -b -d "$db" >>"$tmpfile"
  echo "*************************************************" >>"$tmpfile"
  echo "" >>"$tmpfile"
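The ps | grep httpd | grep -v grep pipeline counts every line that mentions httpd anywhere, which would include, for example, an editor that has httpd.conf open. A stricter count can match on the command name alone. The following is a sketch under that assumption; the helper name count_by_name is hypothetical.

```shell
# count_by_name: count the lines on stdin that are exactly the given
# command name. Hypothetical helper name.
#   $1 = command name to count, e.g. httpd
count_by_name() {
  # -x whole-line match, -F fixed string, -c print the count.
  # grep exits non-zero when the count is 0, so "|| true" keeps the
  # function from reporting failure in that case.
  grep -c -x -F -- "$1" || true
}

# Possible use ("comm=" prints bare command names with no header line):
#   httpdProcesses=$(ps -eo comm= | count_by_name httpd)
```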

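Note that for the top command we write to two temp files. One of them holds a much smaller message that goes to the cell phone. If you would rather not get cell phone alerts at three in the morning, you can take this out (and remove the second mail routine later in the script).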

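  # Show current top (-b batch mode, -n1 a single iteration):
  echo "top now shows:" >>"$tmpfile"
  # Start the cell phone file fresh so stale output never gets mailed
  echo "top now shows:" >"$topfile"
  /usr/bin/top -b -n1 >>"$tmpfile"
  /usr/bin/top -b -n1 >>"$topfile"
  echo "*************************************************" >>"$tmpfile"
  echo "" >>"$tmpfile"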
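Because top runs twice, the two reports can show slightly different snapshots. If that matters, one invocation can feed both files through tee. This is a sketch of that variation rather than what the original script does; the helper name append_to_both is hypothetical.

```shell
# append_to_both: append everything on stdin to two files, so both receive
# the identical text. Hypothetical helper name.
#   $1 = first file, $2 = second file
append_to_both() {
  # tee -a appends to the first file and copies stdin to stdout,
  # which is then appended to the second file.
  tee -a "$1" >> "$2"
}

# Possible use:
#   /usr/bin/top -b -n1 | append_to_both "$tmpfile" "$topfile"
```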

More checks:

  # Show current connections (-p adds the owning PID/program name):
  echo "netstat now shows:" >>"$tmpfile"
  /bin/netstat -p >>"$tmpfile"
  echo "*************************************************" >>"$tmpfile"
  echo "" >>"$tmpfile"
  # Check disk space (-k reports sizes in 1K blocks)
  echo "disk space:" >>"$tmpfile"
  /bin/df -k >>"$tmpfile"
  echo "*************************************************" >>"$tmpfile"
  echo "" >>"$tmpfile"
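On a busy web server the raw netstat listing can run to thousands of lines. As an optional addition, the sketch below condenses netstat-style output into a count per TCP state. It assumes the usual Linux column layout, in which the state is the sixth field; the helper name count_tcp_states is hypothetical.

```shell
# count_tcp_states: read netstat-style lines on stdin and print
# "STATE COUNT" for each TCP state, sorted by state name.
# Hypothetical helper name; assumes the state is in column 6.
count_tcp_states() {
  awk '$1 ~ /^tcp/ { seen[$6]++ }
       END { for (state in seen) print state, seen[state] }' | sort
}

# Possible use (-t TCP only, -n numeric addresses, no DNS lookups):
#   /bin/netstat -tn | count_tcp_states >>"$tmpfile"
```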

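Then write the contents of the temp file to a more permanent log file and email the results to the appropriate parties. The second email is the pared-down result, containing only the standard output from top: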

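  # Send results to log file:
  /bin/cat "$tmpfile" >>"$logfile"
  # And email results to sysadmin. The message body is read from stdin;
  # "--" separates the -a attachments from the recipient address.
  /usr/bin/mutt -s "$machine has a high load level! - $dt" -a "$mysqlLog" -a "$msgLog" -- "$mailstop" <"$logfile"
  # Pared-down report (top output only) to the cell phone address:
  /usr/bin/mutt -s "$machine has a high load level! - $dt" -- "$mailstop1" <"$topfile"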
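For servers without mutt, both mutt and the traditional mail command accept a subject with -s, a recipient, and the message body on standard input, so a fallback is straightforward as long as the attachments are dropped. The sketch below picks whichever mailer is installed; the helper name pick_mailer is hypothetical.

```shell
# pick_mailer: print the first command from the argument list that is
# available on PATH; fail if none is. Hypothetical helper name.
pick_mailer() {
  local candidate
  for candidate in "$@"; do
    if command -v "$candidate" >/dev/null 2>&1; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Possible use (body on stdin, no attachments):
#   mailer=$(pick_mailer mutt mail) || exit 1
#   "$mailer" -s "$machine has a high load level! - $dt" "$mailstop" <"$tmpfile"
```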

Then do some housekeeping and exit:

  # And then remove the temp files (-f: no error if one is already gone):
  rm -f "$tmpfile"
  rm -f "$topfile"
fi
#
exit 0
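Since the script runs as root and writes to fixed names under /tmp, a variation worth considering is to create the temp files with mktemp and let a trap remove them even when the script stops early. The following is a minimal sketch of that pattern, not part of the original script; the helper name with_temp_files and the file name template are assumptions.

```shell
# with_temp_files: create two uniquely named temp files, print their names,
# and remove them automatically when the subshell exits.
# Hypothetical helper name; the template is an assumed naming scheme.
with_temp_files() {
  (
    tmpfile=$(mktemp "${TMPDIR:-/tmp}/checkSystemLoad.XXXXXX") || exit 1
    topfile=$(mktemp "${TMPDIR:-/tmp}/checkSystemLoad.XXXXXX") || exit 1
    # The EXIT trap fires however the subshell ends, so cleanup is not skipped.
    trap 'rm -f "$tmpfile" "$topfile"' EXIT
    # ... the checks would write to "$tmpfile" and "$topfile" here ...
    printf '%s %s\n' "$tmpfile" "$topfile"
  )
}
```

In the real script the same two mktemp lines and the trap would simply sit near the top, replacing the fixed tmpfile and topfile paths and the rm commands at the end.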

Hope this helps someone. The fully assembled script is:

#!/bin/bash
#
# Script to check system load average levels to try to determine what processes are
# taking it overly high...
#
# set environment
dt=$(date +%d%b%Y-%X)
# Obviously, change the following directories to where your log files actually are kept
tmpfile="/tmp/checkSystemLoad.tmp"
# topfile holds the pared-down top output for the cell phone report;
# the path is an assumed placeholder, change it to suit
topfile="/tmp/checkSystemLoad.top"
logfile="/tmp/checkSystemLoad.log"
msgLog="/var/log/messages"
mysqlLog="/var/log/mysqld.log"
# the first mailstop is standard email for reports. Second one is for cell phone (with a pared down report)
mailstop="[email protected]"
mailstop1="[email protected]"
machine=$(hostname)
# The following three are for mytop use - use a db user that has decent rights
dbusr="username"
dbpw="password"
db="yourdatabasename"
# The following is the load level to check on - 10 is really high, so you might want to lower it.
levelToCheck=10
# Set variables from system:
# Field 1 of /proc/loadavg is the 1-minute load average
loadLevel=$(awk '{print $1}' /proc/loadavg)
# Round to a whole number, because [ -gt ] only compares integers
loadLevel=$(printf "%.0f" "$loadLevel")
# if the load level is greater than you want, start the script process. Otherwise, exit 0
if [ "$loadLevel" -gt "$levelToCheck" ]; then
  echo "" > "$tmpfile"
  echo "**************************************" >>"$tmpfile"
  echo "Date: $dt " >>"$tmpfile"
  echo "Check System Load & Processes " >>"$tmpfile"
  echo "**************************************" >>"$tmpfile"
  # Get more variables from system:
  httpdProcesses=$(ps -def | grep httpd | grep -v grep | wc -l)
  # Show current load level:
  echo "Load Level Is: $loadLevel" >>"$tmpfile"
  echo "*************************************************" >>"$tmpfile"
  # Show number of httpd processes now running (parent and children alike):
  echo "Number of httpd processes now: $httpdProcesses" >>"$tmpfile"
  echo "*************************************************" >>"$tmpfile"
  echo "" >>"$tmpfile"
  # Show process list (the BSD-style "f" draws the process tree):
  echo "Processes now running:" >>"$tmpfile"
  ps f -ef >>"$tmpfile"
  echo "*************************************************" >>"$tmpfile"
  echo "" >>"$tmpfile"
  # Show current MySQL info (-b runs mytop once in batch mode):
  echo "Results from mytop:" >>"$tmpfile"
  /usr/bin/mytop -u "$dbusr" -p "$dbpw" -b -d "$db" >>"$tmpfile"
  echo "*************************************************" >>"$tmpfile"
  echo "" >>"$tmpfile"
  # Show current top (-b batch mode, -n1 a single iteration):
  echo "top now shows:" >>"$tmpfile"
  # Start the cell phone file fresh so stale output never gets mailed
  echo "top now shows:" >"$topfile"
  /usr/bin/top -b -n1 >>"$tmpfile"
  /usr/bin/top -b -n1 >>"$topfile"
  echo "*************************************************" >>"$tmpfile"
  echo "" >>"$tmpfile"
  # Show current connections (-p adds the owning PID/program name):
  echo "netstat now shows:" >>"$tmpfile"
  /bin/netstat -p >>"$tmpfile"
  echo "*************************************************" >>"$tmpfile"
  echo "" >>"$tmpfile"
  # Check disk space (-k reports sizes in 1K blocks)
  echo "disk space:" >>"$tmpfile"
  /bin/df -k >>"$tmpfile"
  echo "*************************************************" >>"$tmpfile"
  echo "" >>"$tmpfile"
  # Send results to log file:
  /bin/cat "$tmpfile" >>"$logfile"
  # And email results to sysadmin. The message body is read from stdin;
  # "--" separates the -a attachments from the recipient address.
  /usr/bin/mutt -s "$machine has a high load level! - $dt" -a "$mysqlLog" -a "$msgLog" -- "$mailstop" <"$logfile"
  # Pared-down report (top output only) to the cell phone address:
  /usr/bin/mutt -s "$machine has a high load level! - $dt" -- "$mailstop1" <"$topfile"
  # And then remove the temp files (-f: no error if one is already gone):
  rm -f "$tmpfile"
  rm -f "$topfile"
fi
#
exit 0