Understanding Load Average on Linux

One of the things that really confused me back in the last millennium was understanding the results I would get from “top -c” or “uptime”. They showed load averages that seemed to make sense when they were low: “0.32 looks like a low load. Is that 32% and I’m using 1/3 of the server capacity?” It got more confusing when the number was something like 2 and the server seemed like it wasn’t busy at all. That made no sense, because wouldn’t 2 mean 200%?

The load numbers on a Linux system are often confusing for people coming from Windows and other operating systems. The number often looks like a percentage, but it isn’t.

When you evaluate the system load, you have to know how many CPU cores your system has. Otherwise those numbers mean nothing.
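If you want to check both of those quickly, these commands will do it. nproc ships with modern versions of coreutils; on older systems you can count the processor entries in /proc/cpuinfo instead:

nproc
grep -c ^processor /proc/cpuinfo
uptime
cat /proc/loadavg

The first three numbers in /proc/loadavg are the same load averages that uptime and top show; the remaining fields are the running/total process counts and the PID of the most recently created process.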

To describe this in general terms as it was taught to me, I would start with the concept that a CPU core can do 1 thing at a time. Knowing this, the number of items in the system run queue starts to make sense. The run queue holds the jobs that are either being worked on or about to be started.

In a 1 CPU system, any number below 1 means that the system is keeping up with its tasks: it has 1 CPU and, on average, no more than 1 task being worked on (or about to be started). In a 2 CPU/dual core system, if there is 1 item in the run queue, one or the other CPU is working on it or just about to grab it, so you won’t see any delay. If you are on a 16 core server and your load average is creeping up to 14, you are still not in bad shape, because that’s less than your CPU core count and all of the jobs are being worked on or are about to be started.

Now, if your numbers are double your CPU count, then you are definitely overloaded and the system is falling behind. All of the CPUs are busy, and each of them has another job waiting for when it finishes its current one.

After that, you need to watch out for a snowball effect as the system starts to spend more and more time juggling its resources and less and less time processing the demands of its users.

Oh, and why are there three numbers? Those are averages over different time periods: a 1 minute average, a 5 minute average, and a 15 minute average.

(This explanation can get much more complicated. For example, those 3 numbers are exponentially weighted moving averages (EMA or EWMA), meaning that within the 15 minute average, the most recent load measurements carry more weight than the older ones. So a load that was ramping up over that 15 minute period would produce a much higher average than one that was decreasing at the exact same rate over the same period. But since I want to keep this explanation simple, I’m not even going to mention that.)

Hope that helps!

Understanding free memory in Linux

It used to worry me when I found that Linux was using almost all the memory available to a system. However all that worry was for naught. Linux is very good at memory management and making sure it has enough memory to do what it needs to do. You can run out of memory of course, but you are likely in better shape than you think you are.

If you run a command like “top -c” your server will likely tell you almost all the memory is used:


# top -c
top - 12:26:21 up 4 days, 3:09, 2 users, load average: 0.73, 0.58, 0.48
Tasks: 568 total, 1 running, 567 sleeping, 0 stopped, 0 zombie
Cpu(s): 2.4%us, 0.2%sy, 0.1%ni, 97.2%id, 0.1%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8165824k total, 8064488k used, 101336k free, 83604k buffers
Swap: 6088624k total, 1338756k used, 4749868k free, 2445728k cached

Glancing at this result, you would think I only have 101MB free out of 8GB here. However, those numbers are as misleading as the load average on a multi-CPU server.

If you look at the output of another command:

# free -mt
             total       used       free     shared    buffers     cached
Mem:          7974       7921         53          0         27       2107
-/+ buffers/cache:       5786       2187
Swap:         5945        923       5022
Total:       13920       8844       5075

You get a similarly misleading result, but you also get to see the actual server condition.

As you can see, the first free value is very low and that’s what is concerning you.

However, I want to draw your attention to the next line, because that’s really the one you need to watch. On the -/+ buffers/cache line you can see that the used value is 5786MB and the free column shows 2187MB. That free value is Free + Cached + Buffers from the line above (give or take rounding).

From that line we can tell that programs have actually used 5.79GB out of the 7.97GB of total physical memory. We can also see that we have 2.19GB of RAM, most of it sitting in the buffer/cache pool, that is available for use.

As I mentioned before, Linux doesn’t usually let memory go to waste. So you will watch that free number on the first line drop down into the double digits, but even then the cached value will be around 2GB. That means we have roughly 2GB of memory available for programs right now. If a program needs more, it will be pulled out of the cached memory pool, and even after that, the system will use swap space before it is really out of memory, and that is an additional 5GB.
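If you want that “really available” number without doing the math by hand, you can add up the relevant fields from /proc/meminfo yourself. This is a rough sketch; recent kernels also expose a MemAvailable field that does a smarter version of this calculation for you:

awk '/^(MemFree|Buffers|Cached):/ {sum += $2} END {printf "%d MB available\n", sum/1024}' /proc/meminfo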

To look at how much memory each program is using, I use this line:

# ps aux|head -1;ps aux | sort -nr -k 4 | head -20
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
102 4123 0.4 10.8 1196416 885840 ? Ssl Feb18 25:26 memcached [...]
mysql 17929 7.6 3.0 949372 248040 pts/3 Sl 06:05 23:32 /usr/libexec/mysqld [...]
root 7059 0.0 1.6 204360 138040 ? Ssl Feb18 0:28 /usr/sbin/clamd
apache 26931 1.2 0.8 225096 69956 ? S 11:06 0:05 /usr/bin/php-cgi
apache 26631 1.0 0.8 223304 66856 ? S 11:04 0:05 /usr/bin/php-cgi
apache 26458 1.5 0.8 223488 66824 ? S 11:03 0:09 /usr/bin/php-cgi
apache 23879 0.5 0.8 225068 68376 ? S 10:48 0:08 /usr/bin/php-cgi
root 26404 0.0 0.7 131956 58708 ? S Feb20 0:04 spamd child
root 24156 0.1 0.7 136320 63308 ? S 07:04 0:28 spamd child
apache 26937 2.5 0.7 221812 59788 ? S 11:06 0:11 /usr/bin/php-cgi
apache 26567 0.6 0.7 222416 61756 ? S 11:04 0:03 /usr/bin/php-cgi
apache 26405 0.0 0.7 222748 58228 ? S 11:03 0:00 /usr/bin/php-cgi
apache 23890 0.4 0.7 214040 57508 ? S 10:48 0:06 /usr/bin/php-cgi
apache 23851 0.1 0.7 221972 58596 ? S 10:48 0:01 /usr/bin/php-cgi
apache 17990 0.0 0.7 223916 58320 ? S 06:06 0:00 /usr/bin/php-cgi
apache 17164 0.1 0.7 215152 58956 ? S 10:13 0:04 /usr/bin/php-cgi
apache 14406 0.0 0.7 221164 63616 ? S Feb21 0:05 /usr/bin/php-cgi
root 7099 0.0 0.6 124312 49792 ? Ss Feb18 0:03 /usr/bin/spamd [...]
apache 26932 1.3 0.6 212336 52944 ? S 11:06 0:05 /usr/bin/php-cgi
apache 26628 2.2 0.6 213964 55164 ? S 11:04 0:11 /usr/bin/php-cgi

That shows you the 20 most memory intensive programs. Right now, on this server, the top two are memcached and mysqld, as it should be. Then there’s a long list of php-cgi instances prelaunched to handle an influx of connections, and the spam checker shows up a few times. Most of the php-cgi instances are only using about 0.7% of memory (roughly 55-70MB resident) each, which isn’t bad either. So from this, I can see that I have used a lot of memory preparing many instances of PHP that are ready to go as connections come in. I also have APC installed, which allows the use of shared memory and reduces the overall footprint.

All in all, while I am showing a really low free memory value on this server, I know I actually have more memory available than it looks, and that a lot of the memory in use has been taken up in preparation for a much heavier load. As I write this, the server is handling 630 connections quite nicely.


BONUS TIP

To visually monitor memory usage, try this:

watch -n 1 -d free -mt

HOW TO: In PHP the MySQL Client API version doesn’t match the MySQL Server version

This is a fairly common situation, and the short answer is that you usually don’t need to fix anything; it is a non-issue. As long as your MySQL server and your MySQL client library share the same major version, you can, and probably should, just ignore it. There are no compatibility or performance issues involved.

This situation occurs when you use a package installer such as yum to install a precompiled version of PHP rather than compiling it locally yourself. The package creator will have compiled PHP against a version of the MySQL client library that is compatible with that major release of MySQL.
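If you want to see the two versions side by side, something like this will show them. The exact label varies a bit between the mysql, mysqli, and pdo_mysql sections of the phpinfo output:

php -i | grep -i "client api"
mysql --version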

To quote a 2008 article from the IUS Community project:

That said, I can tell you that in the last 4+ years of providing packages in this ‘mixed’ kind of fashion… we’ve never really come across any major issues of using php built against an older version of mysql… talking to a new version of mysql.

Now, if you really had to fix this, you could check whether there is a specially packaged build of PHP compiled against your particular MySQL client, but that’s unlikely. You’re more than likely going to need to recompile PHP yourself from the command line. That slightly mismatched version number is just the price you pay for the convenience of packages.

How to: Install MemCached on CentOS / Redhat using yum

Installing memcached on a server for use with W3 Total Cache can seem daunting if you haven’t done it before. Once you’ve done it enough times to work out a method and know the speed bumps you can come across, you can knock it out in just a few minutes.

Use this command to determine the CentOS/RedHat version. You need this to know whether you are working with version 5, 6, or 7.

cat /etc/redhat-release

Use this command to determine whether the OS is 64 bit or 32 bit (look for x86_64 in the output; if it is not there, 99% of the time you’re on 32 bit.)

uname -a
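If you only want the architecture, uname -m prints just the machine hardware name (x86_64 for 64 bit, i686 or similar for 32 bit):

uname -m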

Retrieve the IUS Community repository installation files to allow an easy install of the memcache module from the IUS repo.

Browse the repo at http://dl.iuscommunity.org/pub/ius/stable/Redhat/ to make certain you have the right files for your server:

This example is for 64 bit Red Hat Enterprise Linux (RHEL) 5 and is current as of 2014-03-04:

cd /usr/src/
mkdir ius
cd ius
wget http://dl.iuscommunity.org/pub/ius/stable/Redhat/5/x86_64/ius-release-1.0-11.ius.el5.noarch.rpm
wget http://dl.iuscommunity.org/pub/ius/stable/Redhat/5/x86_64/epel-release-5-4.noarch.rpm

This example is for 32 bit Red Hat Enterprise Linux (RHEL) 5 and is current as of 2014-03-04:

cd /usr/src/
mkdir ius
cd ius
wget http://dl.iuscommunity.org/pub/ius/stable/Redhat/5/i386/ius-release-1.0-11.ius.el5.noarch.rpm
wget http://dl.iuscommunity.org/pub/ius/stable/Redhat/5/i386/epel-release-5-4.noarch.rpm

This example is for 64 bit Red Hat Enterprise Linux (RHEL) 6 and is current as of 2014-03-04:

cd /usr/src/
mkdir ius
cd ius
wget http://dl.iuscommunity.org/pub/ius/stable/Redhat/6/x86_64/ius-release-1.0-11.ius.el6.noarch.rpm
wget http://dl.iuscommunity.org/pub/ius/stable/Redhat/6/x86_64/epel-release-6-5.noarch.rpm

This example is for 32 bit Red Hat Enterprise Linux (RHEL) 6 and is current as of 2014-03-04:

cd /usr/src/
mkdir ius
cd ius
wget http://dl.iuscommunity.org/pub/ius/stable/Redhat/6/i386/ius-release-1.0-11.ius.el6.noarch.rpm
wget http://dl.iuscommunity.org/pub/ius/stable/Redhat/6/i386/epel-release-6-5.noarch.rpm

This example is for 64 bit CentOS 7:

cd /usr/src/
mkdir ius
cd ius
wget http://dl.iuscommunity.org/pub/ius/stable/Redhat/7/x86_64/ius-release-1.0-13.ius.el7.noarch.rpm
wget http://dl.iuscommunity.org/pub/ius/stable/Redhat/7/x86_64/yum-plugin-replace-0.2.7-1.ius.el7.noarch.rpm
wget http://dl.fedoraproject.org/pub/epel/beta/7/x86_64/epel-release-7-0.2.noarch.rpm


Install the files you downloaded:

rpm -ivh *.rpm

Perform the install:

yum -y install memcached
service memcached start
chkconfig memcached on
pecl install memcache
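Before moving on, you can confirm the memcached daemon itself is answering. This assumes memcached is listening on its default port, 11211, and that nc (netcat) is installed:

printf "stats\nquit\n" | nc 127.0.0.1 11211 | head -5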

Note: If you get a “Can’t compile c code” error, try making your tmp folder executable before running the pecl command:

mount -o remount,rw,exec /var/tmp

Make sure you remount the tmp directory securely with noexec when you are done:

mount -o remount,rw,noexec /var/tmp


Make sure this line has been correctly added to the php.ini file for your site:

extension=memcache.so

Sometimes the php.ini file has been overridden for a specific directory. Look for that if phpinfo() tells you memcache is not active. The picture below shows a site with the default php.ini overridden by a local copy; it required the extension line to be manually added to the overriding ini file.

phpinfo() output
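If you’re not sure which ini file your PHP is actually loading, or whether the extension ended up enabled, the command line will tell you. Keep in mind that command line PHP can load a different php.ini than the web server, so phpinfo() in the browser is the authoritative check for a website:

php --ini
php -m | grep -i memcache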

How to: Refresh your WordPress page unless a comment is being typed

A customer wanted a piece of code that would refresh a page automatically after a visitor had remained on it for a certain amount of time. They’d used a plain refresh command previously, but the problem was that it interrupted anyone who was in the middle of typing a comment.

This is my solution to refresh the page every 5 minutes:

<script type="text/javascript">
    var sURL = unescape(window.location.pathname);
    var intValue = 0;

    // Schedule a refresh for 5 minutes (300 seconds) from now.
    function doLoad()
    {
        intValue = setTimeout(refresh, 300 * 1000);
    }

    function refresh()
    {
        window.location.href = sURL;
    }

    // Cancel the pending refresh as soon as the visitor starts typing.
    // Navigation keys (arrows, Page Up/Down) are ignored so keyboard
    // scrolling doesn't cancel the refresh.
    function noRefresh(e)
    {
        switch (e.keyCode) {
            case 40: // down arrow
            case 39: // right arrow
            case 38: // up arrow
            case 37: // left arrow
            case 34: // Page Down
            case 33: // Page Up
                break;
            default:
                clearTimeout(intValue);
        }
    }

    // $.browser was removed in jQuery 1.9; on newer jQuery either load the
    // jQuery Migrate plugin or simply bind keydown for every browser.
    if ($.browser && $.browser.mozilla) {
        $(document).keypress(noRefresh);
    } else {
        $(document).keydown(noRefresh);
    }

    $(document).ready(doLoad);
</script>
<noscript>
    <meta http-equiv="refresh" content="300">
</noscript>

Fixing mysqldump: Got error: 1016: Can’t open file & mysqldump: Got error: 23

When doing exports of large databases using mysqldump, it is common to get errors that are along the lines of:

mysqldump: Got error: 1016: Can’t open file: ‘./databasename/tablename.frm’ (errno: 24) when using LOCK TABLES

or maybe

mysqldump: Got error: 23: Out of resources when opening file ‘./databasename/tablename.MYD’ (Errcode: 24) when using LOCK TABLES

Both come from the same cause, and don’t worry: your database is not corrupt.

The first action that mysqldump takes is to lock all of the tables so that the database cannot go out of sync between the beginning and the end of the export. Locking every table at once means the server has to open a file handle for every one of them, and errno 24 is the operating system saying it has run out of available file handles. Of course, the locking also means that your users can’t make changes while you are doing the dump, and that’s just one more reason you need to be careful using mysqldump.

The solution to this problem is simple: add --skip-lock-tables to your command line.
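For example, a dump command using that flag might look like this (the user and database names here are placeholders):

mysqldump --skip-lock-tables -u someuser -p databasename > databasename.sql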

At that point the tables won’t be locked and the export will run much faster. There is a slight downside in that people can use the tables while they are being exported, so you could potentially get an update written as you are exporting a table. But it is unlikely that you’ll hit any real world problems when doing this, especially with things like blog exports.

Notty Notty! re:”All my Server CPU is used by root@notty!! Have I been hacked?”

First: breathe. “notty” stands for “no tty” (no teletypewriter). Programs that connect to the server but don’t need their output displayed anywhere use a “no TTY” connection. So if you see “sshd: *@notty” in a task list somewhere, it just means there’s an ssh login on your server that doesn’t have a terminal assigned to it.

This can show up during many relatively common server activities, so it is not the tag of some hacker as you might have feared. One of the most common examples is the scp command, which copies files from one computer to another over ssh. When it connects to the remote computer, it isn’t displaying that session on a screen, so the connection is a notty connection.
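For example, pulling a directory off this server from another machine with something like the command below (the hostname and paths are made up) will make an “sshd: root@notty” process show up on this server, along with the scp helper that actually reads the files:

scp -r root@this-server.example.com:/home2 /local/backups/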

Below is a partial screen scrape of a “top -c” command when scp is running:

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
32706 root      15   0 14284 7116 2336 S 68.1  0.3   6:15.37 sshd: root@notty
32709 root      18   0  6788 1468 1124 R  4.0  0.1   0:20.84 scp -r -f /home2

As you can see, the CPU usage was pretty high, and that’s what gets people worried. They are probably looking at “top” to see how much longer the copy will take. Then they see that scp is using almost nothing while a task named “notty” is taking up huge amounts of CPU, and they think someone is being “naughty”. Now you know what is really happening.

So relax! It’s all good.

How to Add a TXT record to your 1and1 domain & How to use external DNS for a 1and1 hosted site.

Unfortunately, there are lots of registrars that don’t allow you full access to your DNS settings, and 1and1.com is one of them. If you host your site with 1and1.com and you want to add a TXT record to your domain for verification purposes, or to set up an SPF record, or whatever, you simply can’t do it… unless…

If you have access to another DNS server that allows you to edit your DNS zone and add TXT records, you can point your 1and1 domain at that DNS server. However, you must THEN edit the DNS zone on that server and have all of the A records point back to the IP address of your 1and1 account.

Here are the steps:

  1. Ping your site and write down the IP address.
  2. Go to the 1and1 admin domain listing.
  3. Select the row for your domain (it should be the only one checked) and then click DNS->Edit DNS.
  4. Select My DNS Servers.
  5. Enter the hostnames of your other DNS servers (something like ns1.example.com and ns2.example.com).
  6. Close and save, and allow a few hours for this to update before testing it.
  7. In the meantime, go to your other DNS server and set up a new DNS zone for your domain. It needs to have at LEAST an A record pointing to the IP address you wrote down in step one. You probably also want to set up a CNAME for your www subdomain, and this is where you can add your TXT record.
  8. After hitting save, wait a few hours and you should be done.
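Once the new zone has had time to propagate, you can check your work from any machine that has dig installed (example.com and ns1.example.com below are placeholders for your own domain and DNS server):

dig +short A example.com @ns1.example.com
dig +short TXT example.com @ns1.example.com
dig +short NS example.com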


If you are a visual learner, here is a screen cast.

Generating random names in MySQL

I’ve improved my earlier random string generation procedures to better suit my needs. So I created a Random Name Generator for MySQL.

I’ve created two new procedures. They pick from the 100 most popular first names (well actually the 50 most popular male and 50 most popular female first names for the US) and the 100 most popular surnames (for the US).

Using these two procedures, generate_fname() and generate_lname(), you can create realistic random names and email addresses for your tests.

You can download the SQL here.

Cold storage before my best ideas melt away…