Linux: No space left on device – running out of Inodes – PHP sessions

A production Ubuntu server could not create anymore files claiming that no space was left. Strangely enough it was not about storage space but inode availability. The first response might be to increase disk size but it will not help unless you format the drive to assign a new number of inodes. Below is the sample output that highlights the problem.

# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1       17G  7.7G  9.0G  46% /
devtmpfs       1001M  4.0K 1001M   1% /dev
none            201M  168K  201M   1% /run
none            5.0M     0  5.0M   0% /run/lock
none           1001M   76K 1001M   1% /run/shm
/dev/xvdg        18G   11G  5.9G  66% /srv/mobi_BACKUP
# df -i
Filesystem      Inodes   IUsed   IFree IUse% Mounted on
/dev/xvda1     1066240 1061813    4427  100% /
devtmpfs        256189     333  255856    1% /dev
none            256229     250  255979    1% /run
none            256229       2  256227    1% /run/lock
none            256229      29  256200    1% /run/shm
/dev/xvdg      1179648   68692 1110956    6% /srv/mobi_BACKUP

The Problem

It should be very hard to exhaust the inodes unless huge amount of 0-sized or very small files are created. If you look in the image above, more than 1 million inodes have been used.

The Solution

It’s necessary to locate all those small files in the partition to get to the source of the problem. We want to find folders with unusual large number of files. A bash command can be of help starting at the root / (borrowed from Ivan).

for i in /*; do echo $i; find $i | wc -l; done

The command will list the directories with the number of files inside, from that point on you can keep on looking up some sub-folders until you find the source. If a sub-folder has too many files, the script might just hang for a long time, so you might have found the troublemaker.

for i in /var/lib/*; do echo $i; find $i | wc -l; done

In this particular case it turned out that PHP sessions were at the source of the issue.

Reason: PHP Sessions

This server is configured to keep the sessions using files thus the Garbage Collection does not take place. Therefore, the sessions have to be cleaned manually. From the previous Stack Overflow link, a cron task can be used for the cleanup:

# /etc/cron.d/php5: crontab fragment for php5
#  This purges session files older than X, where X is defined in seconds
#  as the largest value of session.gc_maxlifetime from all your php.ini
#  files, or 24 minutes if not defined.  See /usr/lib/php5/maxlifetime

# Look for and purge old sessions every 30 minutes
09,39 *     * * *     root   [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -r -0 rm

In order to work, the previous code has to be located inside the file /etc/cron.d/php5

The time can be adjusted according to the actual PHP configuration.

In this particular case there were about 1 million PHP session files, running the cron script would just take too much time. Therefore, a manual delete did the work.

 

WordPress: Cannot connect to database – MySQL killed

The MySQL server at the SysCrunch server was always being killed, the reason was due to lack of resources (RAM). Even after a memory increase to a total of 3+ GB, the problem persisted.

This blog post describes how to set up a monitoring tool (Monit) under CentOS 6 to monitor processes and to restart them when necessary. It also explains how to avoid the Apache server from exhausting the memory.

The problem

Essentially, the Apache server was consuming most of the RAM leaving little for other processes. Thus, MySQL was eventually killed as well as the ElasticSearch service. This happened as the number of web services and websites in the server increased. This would manifest when bots like googlebot started scanning intensively the server. Yes, Apache and MySQL are running in the same server with the current configuration.

Monit

The steps to install the monitoring tool Monit under CentOS 6 are similar to other service like Apache, you can also follow this tutorial or this one (Monit part) if you are under CentOS 7 and this might help for Ubuntu.

Install Monit

sudo yum install monit

Monit Web Interface

First open the configuration file, you can also use the nano editor.

sudo vi /etc/monit.conf

Find the set httpd port line, uncomment and update the configuration, here is a sample configuration:

set httpd port 2812 and
    use address 192.192.192.192  # Your IP address (private/public) for connections.
    allow 0.0.0.0/0.0.0.0        # What networks are allowed? Any network.
    allow admin:password         # Set username and password
    allow @monit # allow users of group 'monit' to connect
    allow @users readonly # allow users of group 'users' to connect readonly

Add monit to the system startup:

sudo chkconfig monit on

Start the monit service:

sudo service monit start

You can test the web interface in a browser using the provided IP address or hostname (e.g. http://192.192.192.192:8912 or http://hostname.com:8912). It will ask your for the username and password you have defined in the config file.

This is a sample capture of our server web interface:

Monit-web-interface-SysCrunch

Monit Monitoring

Monit can monitor multiple processes and execute actions based on some conditions like CPU and memory usage.

Apache Web Server

The paths might be different in your system, make sure to check them before. Find and uncomment the process httpd line, this is a sample config:

check process httpd with pidfile /var/run/httpd/httpd.pid
    start program = "/etc/init.d/httpd start" with timeout 60 seconds
    stop program  = "/etc/init.d/httpd stop"
    if cpu > 70% for 2 cycles then alert
    if cpu > 80% for 5 cycles then restart
    if totalmem > 1500.0 MB for 5 cycles then restart
    if children > 250 then restart

Essentially, the Apache server cannot take more than 1.5 GB, if it reaches that limit at least during 5 cycles, it gets restarted to free resources. The same happens if too many workers (250) are created. These values depend heavily on your system resources and the number of processes you are running.

MySQL Server

The default monit config file does not include a sample for MySQL. Following the Apache configuration and some samples from stack overflow, this is a proposed config that you can add after the Apache config:

check process mysql with pidfile /var/run/mysqld/mysqld.pid
  group mysql
  start program = "/etc/init.d/mysqld start"
  stop program = "/etc/init.d/mysqld stop"
  if failed host 127.0.0.1 port 3306 then alert
  if failed host 127.0.0.1 port 3306 then restart
  if 5 restarts within 5 cycles then timeout

If the server does not respond on port 3306, it raises an alert first and then does a restart. If too many restarts happen within a short period (5) then it timeouts.

ElasticSearch

Similar to the MySQL config, this is a proposed configuration that you can put after the Apache or MySQL configs.

check process elasticsearch with pidfile /var/run/elasticsearch/elasticsearch.pid
  start program = "/etc/init.d/elasticsearch start"
  stop program = "/etc/init.d/elasticsearch stop"
  if failed host 127.0.0.1 port 9200 then alert
  if failed host 127.0.0.1 port 9200 protocol http then restart
  if 5 restarts within 5 cycles then timeout

I haven’t seen any alerts or restarts of this service yet. Notice the protocol http condition for the restart, it might be necessary as ElasticSearch runs as a http application on that port.

Configuring Alerts

Monit lets you configure alerts by e-mail, in this case we use gmail as SMTP and you can also default to localhost to deliver the e-mails. You can refer to the Monit documentation for more details.

Find and uncomment the set mailserver line to ressemble to:

set mailserver smtp.gmail.com PORT 587 USERNAME "your-user@gmail.com" PASSWORD "your-password" using TLSV1 with timeout 30 seconds

You have to replace the username and password. You can then find the set alert line to define the recipient:

set alert dummy@syscrunch.com not on { instance, action }

This alert is skipping Monit actions like start, stop or performing user actions in order to avoid trivial cases.

IMPORTANT: Gmail will not send e-mails with this configuration unless you enable the less secured Apps option for your account at http://www.google.com/settings/security/lesssecureapps.

Apache Server RAM Solution

After monitoring the processes with Monit, the problem was with the Apache server that was running too many worker instances. This would lead to exhausting the system memory and lead to other processes being killed. The issue is that most of the web services (virtual hosts) were configured with a HTTP Keep Alive header, this allows for faster response times but it consumes more resources as the connection is kept open.

The solution was to remove the Header set Connection keep-alive from the virtual hosts. It can also be set explicitly to close Header set Connection close. Alternatively there are other options to tune the persistent connections like MaxKeepAliveRequests  and KeepAliveTimeout , it’s recommended to have a good understanding since for HTTP 1.1 clients, the default option is to keep connections alive, you can read more in the Apache docs for Keep Alive.

In the end with this configuration, it was possible to launch 3000 thousand requests with 60 concurrent requests using Apache Benchmark (3 parallel calls) and the server handled it without any problems. Here is a sample output of one of the tests:

Server Hostname:        www.syscrunch.com
Server Port:            80

Document Path:          /
Document Length:        47998 bytes

Concurrency Level:      20
Time taken for tests:   122.177 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      48396000 bytes
HTML transferred:       47998000 bytes
Requests per second:    8.18 [#/sec] (mean)
Time per request:       2443.541 [ms] (mean)
Time per request:       122.177 [ms] (mean, across all concurrent requests)
Transfer rate:          386.83 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       55  139  93.7    113     973
Processing:   320 2278 594.2   2249    8164
Waiting:      188 2011 601.7   1987    8009
Total:        380 2417 594.7   2375    8336

Percentage of the requests served within a certain time (ms)
  50%   2375
  66%   2481
  75%   2562
  80%   2616
  90%   2855
  95%   3265
  98%   4264
  99%   4564
 100%   8336 (longest request)

I wrote this article to document the solution to this problem and it might be updated over time.

Hope it’s useful for some other people.

Apache Server: KeepAlive, good practice

The KeepAlive parameter is more important than you might actually think, we learnt the hard way. Many good articles have discussed the KeepAlive topic (e.g. KeepAlive On/Off) with good arguments. Here I’ll present rather a summary.

I personally set up our apache server for development machine in EC2/VPC (Amazon Web Services). I set up the KeepAlive parameter to On because we were concerned with speed more than with memory. The server was running fine with sometimes Apache taking spikes in memory usage. The problem started when our blog began to increase traffic and suddenly it was practically impossible to connect  to the server, it was running low in memory with Apache taking up to 1GB in memory! And as usual in the worst moment! The first time, a graceful restart of Apache fixed the problem but it was just getting worse.

How much traffic did it take? Not that much, a simple Apache Benchmark test with 20 concurrent connections and 50 queries was enough to render the server unusable.

ab -c 20 -n 50 http://our-server/

Just imagine several (smart) spiders hitting your server, that would be it.

Then, I had to set the KeepAlive value to Off, the requests take a bit longer but our server now can take much more. You can fine tune the worker values too.

sudo vi /etc/httpd/conf/httpd.conf; # in CentOS and set: KeepAlive Off

Conclusion, if your going to expose your server to a broad audience set KeepAlive to Off. However, if you rather control the amount of traffic and the authorised host/machines that connect to your server, then you might set KeepAlive to On.

As a final advise, block common attacks to your server. I personally use the script that is also used in phpmyadmin. See below, you can add it to your VirtualHost configuration or to an .htaccess file.

# Allow only GET and POST verbs
RewriteCond %{REQUEST_METHOD} !^(GET|POST)$ [NC,OR]
# Ban Typical Vulnerability Scanners and others
# Kick out Script Kiddies
RewriteCond %{HTTP_USER_AGENT} ^(java|curl|wget).* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*(libwww-perl|curl|wget|python|nikto|wkito|pikto|scan|acunetix).* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*(winhttp|HTTrack|clshttp|archiver|loader|email|harvest|extract|grab|miner).* [NC]
RewriteRule .* - [F]

Hope it was useful and do not forget to optimize your server!