Diagnosing Linux Server Load Problems with a Simple Script
Date: 2020-01-09 10:38:10 Source: igfitidea
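If you have been a system administrator for any length of time, you have almost certainly seen a server's CPU usage, memory utilization, or load average spike. Running "top" does not always give you the answer. So how do you find the sneaky processes that are eating system resources, so that you can kill them?
The following script may help. It was written for a web server, so some parts look specifically for httpd processes and others deal with MySQL. Depending on how your server is deployed, simply comment out or remove those parts and add others. Treat it as a starting point.
This version of the script requires mytop, free software released under the GNU General Public License (available from http://jeremy.zawodny.com/mysql/mytop/), which is an excellent tool for checking MySQL performance. It is getting old, but it still works well for our purposes here.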
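Also, I use mutt as the mailer; you may want to change the script to use the mail utility built into Linux instead. I run the script hourly from cron; adjust as needed. Oh, and the script needs to run as root, because it reads from some protected areas of the server.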
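As a rough sketch of those two requirements, the snippet below checks for root and shows what an hourly crontab entry might look like. The helper name is_root and the install path /usr/local/sbin/checkSystemLoad.sh are assumptions for illustration, not part of the original script.

```shell
# is_root [UID]: succeed only when the given UID (default: the current user's) is 0.
# Hypothetical helper, not part of the original script.
is_root() {
    [ "${1:-$(id -u)}" -eq 0 ]
}

if ! is_root; then
    echo "warning: not running as root; some logs may be unreadable" >&2
fi

# Example entry for root's crontab (edit with "crontab -e" as root),
# running at the top of every hour. The script path is an assumption:
# 0 * * * * /usr/local/sbin/checkSystemLoad.sh
```

A real deployment would probably exit instead of only warning; the warning keeps the sketch safe to paste into an interactive shell.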
So, shall we get started?
First, set the script variables:
#!/bin/bash
#
# Script to check system load average levels to try to determine
# what processes are taking it overly high...
#
# 07Jul2010 tjones
#
# set environment
dt=$(date +%d%b%Y-%X)
# Obviously, change the following directories to where your log files actually are kept
tmpfile="/tmp/checkSystemLoad.tmp"
# topfile is used later for the pared-down phone report but is never defined
# in the original listing; this path is an assumption.
topfile="/tmp/checkSystemLoad.top"
logfile="/tmp/checkSystemLoad.log"
msgLog="/var/log/messages"
mysqlLog="/var/log/mysqld.log"
# the first mailstop is standard email for reports. Second one is for cell phone (with a pared down report)
# (replace both with your own addresses)
mailstop="[email protected]"
mailstop1="[email protected]"
machine=$(hostname)
# The following three are for mytop use - use a db user that has decent rights
dbusr="username"
dbpw="password"
db="yourdatabasename"
# The following is the load level to check on - 10 is really high, so you might want to lower it.
levelToCheck=10
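A fixed threshold of 10 means very different things on a 2-core and a 32-core machine. One possible approach, sketched below, is to scale the threshold by the core count. The helper name suggest_threshold and the default multiplier of 2 are assumptions; nproc is the GNU coreutils command that prints the number of available processors.

```shell
# suggest_threshold [CORES] [PER_CORE]: print CORES * PER_CORE.
# CORES defaults to the output of nproc, PER_CORE to 2 (an arbitrary choice).
# Hypothetical helper, not part of the original script.
suggest_threshold() {
    cores=${1:-$(nproc)}
    per_core=${2:-2}
    echo $(( cores * per_core ))
}

# Explicit arguments: a 4-core machine with a multiplier of 2.
suggest_threshold 4 2    # prints 8

# In the script, this could replace the hard-coded value:
levelToCheck=$(suggest_threshold)
```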
Next, check the load level to see whether the script should continue:
# Set variables from system:
# (the first field of /proc/loadavg is the 1-minute load average)
loadLevel=$(awk '{print $1}' /proc/loadavg)
# round to a whole number so the integer comparison below works
loadLevel=$(printf "%0.f" "$loadLevel")
# if the load level is greater than you want, start the script process. Otherwise, exit 0
if [ "$loadLevel" -gt "$levelToCheck" ]; then
    echo "" > $tmpfile
    echo "**************************************" >>$tmpfile
    echo "Date: $dt " >>$tmpfile
    echo "Check System Load & Processes " >>$tmpfile
    echo "**************************************" >>$tmpfile
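/proc/loadavg holds the 1-, 5-, and 15-minute load averages as its first three fields, and the script uses the first. The sketch below wraps the read-and-round step in a function that takes the file as a parameter, so it can be tried on sample data. The name read_load is hypothetical.

```shell
# read_load [FILE]: print the first field of FILE (default /proc/loadavg)
# rounded to the nearest integer. Hypothetical helper.
read_load() {
    awk '{ printf "%.0f\n", $1; exit }' "${1:-/proc/loadavg}"
}

# Try it on a sample line in the /proc/loadavg layout.
sample=$(mktemp)
echo "10.52 8.31 5.07 3/412 12345" > "$sample"
read_load "$sample"    # prints 11
rm -f "$sample"
```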
The script then runs its checks, writing the results to the temp file. Add or remove items here to suit your situation:
    # Get more variables from system:
    httpdProcesses=$(ps -def | grep httpd | grep -v grep | wc -l)
    # Show current load level:
    echo "Load Level Is: $loadLevel" >>$tmpfile
    echo "*************************************************" >>$tmpfile
    # Show number of httpd processes now running
    # (this counts every matching process, children included):
    echo "Number of httpd processes now: $httpdProcesses" >>$tmpfile
    echo "*************************************************" >>$tmpfile
    echo "" >>$tmpfile
    # Show process list:
    echo "Processes now running:" >>$tmpfile
    ps f -ef >>$tmpfile
    echo "*************************************************" >>$tmpfile
    echo "" >>$tmpfile
    # Show current MySQL info:
    echo "Results from mytop:" >>$tmpfile
    /usr/bin/mytop -u "$dbusr" -p "$dbpw" -b -d "$db" >>$tmpfile
    echo "*************************************************" >>$tmpfile
    echo "" >>$tmpfile
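The ps | grep | grep -v grep chain counts any line that contains "httpd", including unrelated commands whose arguments happen to mention it. A stricter variant, sketched here, matches the command name exactly. count_procs is a hypothetical helper; on Debian-style systems the process is typically named apache2 rather than httpd.

```shell
# count_procs NAME: print how many running processes have a command name
# exactly equal to NAME. Hypothetical helper.
count_procs() {
    # "comm=" prints only the command name, with no header line;
    # grep -c counts matches, -x requires a whole-line match, -F disables regex.
    ps -e -o comm= | grep -c -x -F -- "$1"
}

httpdProcesses=$(count_procs httpd)
echo "Number of httpd processes now: $httpdProcesses"
```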
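Note that the output of top is written to two temp files. One of them becomes a much smaller message that is sent to a cell phone. If you would rather not get a phone alert at three in the morning, you can remove it (and remove the second mail routine later in the script).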
    # Show current top:
    echo "top now shows:" >>$tmpfile
    echo "top now shows:" >>$topfile
    /usr/bin/top -b -n1 >>$tmpfile
    /usr/bin/top -b -n1 >>$topfile
    echo "*************************************************" >>$tmpfile
    echo "" >>$tmpfile
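The summary header and the first few process rows of top's batch output usually say enough for a phone alert. A sketch of trimming the output, assuming a hypothetical pare_down helper and an arbitrary cutoff of 15 lines:

```shell
# pare_down [N]: pass through only the first N lines of standard input
# (default 15, an arbitrary choice). Hypothetical helper.
pare_down() {
    head -n "${1:-15}"
}

# Small demonstration on fixed input:
printf 'line1\nline2\nline3\n' | pare_down 2

# In the script, the phone report could then be written as:
# /usr/bin/top -b -n1 | pare_down 15 >>$topfile
```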
More checks:
    # Show current connections:
    echo "netstat now shows:" >>$tmpfile
    /bin/netstat -p >>$tmpfile
    echo "*************************************************" >>$tmpfile
    echo "" >>$tmpfile
    # Check disk space
    echo "disk space:" >>$tmpfile
    /bin/df -k >>$tmpfile
    echo "*************************************************" >>$tmpfile
    echo "" >>$tmpfile
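Raw df output is informative, but a filter can pick out just the filesystems that are nearly full. The sketch below reads df -P style output on standard input and prints the mount points at or above a given percentage. The helper name over_threshold and the 90% cutoff are assumptions, and mount points containing spaces are not handled.

```shell
# over_threshold [PERCENT]: read "df -P" output on stdin and print
# "<mount point> <capacity>" for filesystems at or above PERCENT (default 90).
# Hypothetical helper.
over_threshold() {
    awk -v limit="${1:-90}" '
        NR > 1 {                      # skip the header line
            use = $5                  # column 5 is capacity, e.g. "95%"
            sub(/%/, "", use)
            if (use + 0 >= limit + 0) print $6, $5
        }'
}

df -P | over_threshold 90
```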
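Then write the contents of the temp file to a more permanent log file and email the results to the appropriate parties. The second email is the pared-down report, containing only the standard output from top: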
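    # Send results to log file:
    /bin/cat $tmpfile >>$logfile
    # And email results to sysadmin
    # (recent mutt versions need "--" between the -a attachments and the recipient):
    /usr/bin/mutt -s "$machine has a high load level! - $dt" -a $mysqlLog -a $msgLog -- $mailstop <$logfile
    # Pared-down report to the cell phone. This line is missing from the
    # listing and is reconstructed from the description above (an assumption):
    /usr/bin/mutt -s "$machine has a high load level! - $dt" $mailstop1 <$topfile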
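If mutt is not installed, the script could fall back to another mailer. The sketch below picks the first available command from a list using command -v. choose_mailer is a hypothetical helper; note that mail implementations handle attachments differently from mutt, so the attachment options would still need adjusting.

```shell
# choose_mailer CMD...: print the first CMD that exists on this system;
# return non-zero if none do. Hypothetical helper.
choose_mailer() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

mailer=$(choose_mailer mutt mail) || echo "no mailer found" >&2

# Both mutt and mail accept a subject with -s and the body on standard input:
# "$mailer" -s "$machine has a high load level! - $dt" "$mailstop" <"$tmpfile"
```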
Then do a little housekeeping and exit:
    # And then remove the temp files:
    rm $tmpfile
    rm $topfile
fi
#
exit 0
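Fixed file names under /tmp can collide between overlapping runs and are left behind if the script dies early. One common alternative, shown here only as a sketch, is mktemp combined with an EXIT trap. The function name with_workfile is hypothetical.

```shell
# with_workfile: create a uniquely named work file, use it, and let the
# EXIT trap remove it when the subshell ends. Prints the path that was used.
# Hypothetical helper.
with_workfile() {
    (
        workfile=$(mktemp "${TMPDIR:-/tmp}/checkSystemLoad.XXXXXX") || exit 1
        trap 'rm -f "$workfile"' EXIT
        echo "report body goes here" > "$workfile"
        echo "$workfile"
    )
}

used=$(with_workfile)
echo "work file was $used and has been cleaned up"
```

The trap fires however the subshell ends, so an explicit rm at the end of the script is no longer required.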
Hopefully this helps someone. The fully assembled script is:
#!/bin/bash
#
# Script to check system load average levels to try to determine what processes are
# taking it overly high...
#
# set environment
dt=$(date +%d%b%Y-%X)
# Obviously, change the following directories to where your log files actually are kept
tmpfile="/tmp/checkSystemLoad.tmp"
# topfile is used for the pared-down phone report but is never defined
# in the original listing; this path is an assumption.
topfile="/tmp/checkSystemLoad.top"
logfile="/tmp/checkSystemLoad.log"
msgLog="/var/log/messages"
mysqlLog="/var/log/mysqld.log"
# the first mailstop is standard email for reports. Second one is for cell phone (with a pared down report)
# (replace both with your own addresses)
mailstop="[email protected]"
mailstop1="[email protected]"
machine=$(hostname)
# The following three are for mytop use - use a db user that has decent rights
dbusr="username"
dbpw="password"
db="yourdatabasename"
# The following is the load level to check on - 10 is really high, so you might want to lower it.
levelToCheck=10
# Set variables from system:
# (the first field of /proc/loadavg is the 1-minute load average)
loadLevel=$(awk '{print $1}' /proc/loadavg)
# round to a whole number so the integer comparison below works
loadLevel=$(printf "%0.f" "$loadLevel")
# if the load level is greater than you want, start the script process. Otherwise, exit 0
if [ "$loadLevel" -gt "$levelToCheck" ]; then
    echo "" > $tmpfile
    echo "**************************************" >>$tmpfile
    echo "Date: $dt " >>$tmpfile
    echo "Check System Load & Processes " >>$tmpfile
    echo "**************************************" >>$tmpfile
    # Get more variables from system:
    httpdProcesses=$(ps -def | grep httpd | grep -v grep | wc -l)
    # Show current load level:
    echo "Load Level Is: $loadLevel" >>$tmpfile
    echo "*************************************************" >>$tmpfile
    # Show number of httpd processes now running
    # (this counts every matching process, children included):
    echo "Number of httpd processes now: $httpdProcesses" >>$tmpfile
    echo "*************************************************" >>$tmpfile
    echo "" >>$tmpfile
    # Show process list:
    echo "Processes now running:" >>$tmpfile
    ps f -ef >>$tmpfile
    echo "*************************************************" >>$tmpfile
    echo "" >>$tmpfile
    # Show current MySQL info:
    echo "Results from mytop:" >>$tmpfile
    /usr/bin/mytop -u "$dbusr" -p "$dbpw" -b -d "$db" >>$tmpfile
    echo "*************************************************" >>$tmpfile
    echo "" >>$tmpfile
    # Show current top:
    echo "top now shows:" >>$tmpfile
    echo "top now shows:" >>$topfile
    /usr/bin/top -b -n1 >>$tmpfile
    /usr/bin/top -b -n1 >>$topfile
    echo "*************************************************" >>$tmpfile
    echo "" >>$tmpfile
    # Show current connections:
    echo "netstat now shows:" >>$tmpfile
    /bin/netstat -p >>$tmpfile
    echo "*************************************************" >>$tmpfile
    echo "" >>$tmpfile
    # Check disk space
    echo "disk space:" >>$tmpfile
    /bin/df -k >>$tmpfile
    echo "*************************************************" >>$tmpfile
    echo "" >>$tmpfile
    # Send results to log file:
    /bin/cat $tmpfile >>$logfile
    # And email results to sysadmin
    # (recent mutt versions need "--" between the -a attachments and the recipient):
    /usr/bin/mutt -s "$machine has a high load level! - $dt" -a $mysqlLog -a $msgLog -- $mailstop <$logfile
    # Pared-down report to the cell phone. This line is missing from the
    # listing and is reconstructed from the article's description (an assumption):
    /usr/bin/mutt -s "$machine has a high load level! - $dt" $mailstop1 <$topfile
    # And then remove the temp files:
    rm $tmpfile
    rm $topfile
fi
#
exit 0