Getting and scaling Magento in the cloud

Posted by Adrian Duke on 26 January 2012

I was recently asked to research getting Magento into the cloud and making it scalable for one of our clients. This decision came about because we believe performance, coupled with server reliability and uptime, is a large factor in conversion. You may ask "Why don't you host your own servers?" Well, we do... but they don't scale on demand, and they are certainly prone to hardware faults, where the response time is probably a couple of hours, not minutes. It has become obvious that the cloud has many advantages over a rack full of servers in one data centre. I'm not writing this post to convert you to the cloud, so let's get on...

Overview

What follows is a set of instructions on how I went about getting Magento onto the cloud in a scalable configuration. That's not to say it's the 'correct' way, but it works. The following are the tools I used:

  • Magento 1.6.1.0 Community Edition (should work with 1.5.0.0+)
  • Amazon AWS (EC2, ELB, RDS, S3/Cloudfront)
  • OnePica Image CDN plugin
  • Scalr (Open source cloud management platform)
  • NFSv3 (you can use v4)
  • s3cmd

Scalr isn't required, but it does provide some handy features like managing a DNS record for each server instance or allowing you to execute scripts across multiple servers. I'm also certain you could translate these instructions to other cloud systems as none of them depend on any particular AWS feature. Here is what we are trying to achieve (pardon the icons, they were the best I could find):

Figure: Server configuration in the Amazon cloud

The image above depicts:

  • (ELB) Load balancer, balancing our web servers
  • (S3) CDN which will host our cached media files and skin (skin/) files (you could use cloudfront)
  • (EC2)x2 Public facing web servers which will contain our Magento code base (I put 2x so you can see how the load balancer and web server scaling will work)
  • (EC2) Admin web server which will be a separate administration-only server
  • (EBS) Persistent storage mounted on the admin server containing our media files (media/) shared out via NFS
  • (RDS) Database server which speaks for itself

Admin Server (EC2)

Overview

Our admin box will serve as the central code base that all web servers will sync from, and also where our media files will be served from. You may be thinking "Well that's the central point of failure" and to an extent you're correct, but what we can do is ensure regular backups as well as using Scalr to manage the box (server goes down, new server goes up).

Scalr Setup 

Create yourself a new farm and call it whatever you like. Go to the roles and create yourself a new application server with Apache (I used Ubuntu 10.04 app-apache64-ubuntu-ebs). Add any scaling options you would like, but make sure minimum instances is 1 (as it's a required server). There's no need for load balancing on this server, but I would suggest you add an elastic IP if you use a payment gateway like SagePay (as they only allow communications from certain IPs). Scalr images all come bundled with EBS volumes; if you're using your own AMI, make sure you add a persistent volume now.

Save the farm and get it started. Within a minute or two Scalr should register that there isn't a server running and will pop up our admin box. Once it's up and running, ssh yourself in (click the terminal icon next to the server); you should now be root.

Please go through the magento requirements, as there are a few things you are going to need... Ones that I can remember are:

  • php5
  • php5-cli
  • php5-mysql
  • php5-mcrypt
  • php5-gd
  • php5-curl

Make sure you restart Apache after installing new modules.
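
To check the list above against what PHP actually has loaded, something like the following might help. It's only a sketch: missing_modules is a made-up helper name, and it assumes the module names printed by `php -m` (mysql, mcrypt, gd, curl) correspond to the packages listed above.

```shell
#!/bin/bash
# Sketch only: missing_modules is a hypothetical helper, not part of PHP or Magento.
# First argument: the output of `php -m`. Remaining arguments: module names to look for.
# Prints every wanted module that does not appear in the loaded list.
missing_modules() {
    local loaded="$1" m
    shift
    for m in "$@"; do
        # -i: ignore case, -x: match the whole line only
        printf '%s\n' "$loaded" | grep -qix "$m" || printf '%s\n' "$m"
    done
}

# Example call on the server (left as a comment so the sketch has no side effects):
#   missing_modules "$(php -m)" mysql mcrypt gd curl
```

If it prints nothing, all of the modules you asked about are loaded.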

Apache 

The first thing we need to do if you're using one of the scalr application server images is to make sure AllowOverride is set to all on /var/www by doing the following:

$ vi /etc/apache2/sites-enabled/000-default
# Change line[11] "AllowOverride None" to "AllowOverride All"

This will allow .htaccess files to work (required for magento). Next make sure to clear out your /var/www folder:

$ rm -rf /var/www/*
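
If you'd rather script the AllowOverride change above than edit the file by hand (handy when you repeat it on the web server image), here is a rough sketch. The helper name enable_overrides is made up, and it assumes your vhost file has a `<Directory /var/www/>` block like the stock Ubuntu one; check yours before running it.

```shell
#!/bin/bash
# Sketch only: enable_overrides is a hypothetical helper name.
# Switches "AllowOverride None" to "AllowOverride All", but only inside
# the <Directory /var/www/> ... </Directory> block of the given vhost file.
enable_overrides() {
    local vhost="$1" tmp status
    tmp="$(mktemp)" || return 1
    sed '/<Directory \/var\/www\/>/,/<\/Directory>/ s/AllowOverride[[:space:]]\{1,\}None/AllowOverride All/' \
        "$vhost" > "$tmp" && cat "$tmp" > "$vhost"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example call (as root), followed by an Apache restart:
#   enable_overrides /etc/apache2/sites-enabled/000-default
```

Other Directory blocks in the same file (such as the one for /) are left alone.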

From here you can use your own method to pull in your code base (or a fresh version of Magento 1.6.1.0) to /var/www... For example:

# This method would require you to download your private key
# for the farm you have created. Tools > SSH Keys...
# (-e tells rsync which ssh command, and therefore which key, to use)
$ rsync -av -e "ssh -i /path/to/private/key.pem" /path/to/local/magento-source/ \
root@<external admin ip>:/var/www/

Next we will need to create an RDS / database instance to use for Magento (see the "RDS / Database" section further down).

Now you will need to modify your Magento local.xml settings to reflect the new database location and settings:

$ vi /var/www/app/etc/local.xml
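
If you want to script that edit rather than use vi, something along these lines could work. It is a sketch under assumptions: set_db_value is a made-up helper, and it relies on local.xml storing each connection value on one line as `<host><![CDATA[...]]></host>` (likewise for username and dbname), which is how a stock Magento 1.x install writes it.

```shell
#!/bin/bash
# Sketch only: set_db_value is a hypothetical helper name.
# Rewrites the CDATA value of a single tag (e.g. host, username, dbname) in local.xml.
# Values containing "|" or "&" would confuse sed, so edit passwords by hand.
set_db_value() {
    local file="$1" tag="$2" value="$3" tmp status
    tmp="$(mktemp)" || return 1
    sed "s|<$tag><!\[CDATA\[.*]]></$tag>|<$tag><![CDATA[$value]]></$tag>|" \
        "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example call, using the same placeholder style as the rest of this post:
#   set_db_value /var/www/app/etc/local.xml host "[database instance hostname]"
```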

If you're installing from fresh, go ahead and run the installer and come back once you're done. Don't forget to:

$ chmod -R a+rw /var/www/media/ /var/www/var/

You should now have a working Magento install running from the RDS / Database instance, good work.

OnePica Image CDN

This plugin is great: it will cache your product media files to S3 / CloudFront and various other sources. We are going to use S3 in this post, so go ahead and install the plugin from Magento Connect... You may need to chown your /var/www so that the user www-data (apache2) can write to it:

$ chown -R www-data: /var/www

Create yourself a new S3 bucket and get your Access Key ID and Secret Access Key at the ready. Now log back into the admin server and go to System > Configuration > Catalog > Image CDN. If it's not there, you've done something wrong (try logging out of the admin and back in again and clearing the cache). Use the following settings:

-- General Settings
Current Adapter: Amazon S3/Cloudfront
File Result Cache: In Database
Defaults for everything else

--Amazon S3/Cloudfront
Access Key ID: [access key]
Secret Access Key: [secret key]
Bucket: [bucket you just created]
Base URL: http://s3-[zone].amazonaws.com/[bucket]
Secure Base URL: https://s3-[zone].amazonaws.com/[bucket]

Save the settings and clear the cache, then try out a category page... Right click one of the product images and inspect the URL; it should be from Amazon S3. Congratulations, you have completed the first part of the CDN.

Skin CDN

Now we are going to place the skin files onto S3 and point Magento that way. I used the s3cmd sync tool to get the skin files up on S3; you can install it like this:

# Import S3tools signing key
$ wget -O- -q http://s3tools.org/repo/deb-all/stable/s3tools.key | sudo apt-key add -

# Add the repo to sources.list
$ sudo wget -O/etc/apt/sources.list.d/s3tools.list http://s3tools.org/repo/deb-all/stable/s3tools.list

# Refresh package cache and install the newest s3cmd
$ sudo apt-get update && sudo apt-get install s3cmd

# You'll want to configure it
$ s3cmd --configure

Now let's get our skin into S3 using the bucket you used before:

# -P = Make public
$ s3cmd sync -P /var/www/skin s3://[bucket]/

Great, now we should have a folder called skin/ in the root of our bucket with all our skin files.

Go back to the Magento admin, System > Configuration > Web. This bit is important: make sure you switch to 'Default Store View', as there is a slight gotcha with CDN'ing your skin files. If you try to upload images via the Magento admin it won't work, because the image uploader is Flash and it would require a cross-domain policy file, which we don't want to set up. You circumvent it by only setting the default store view configuration to use the skin CDN:

-- Unsecure
Base Skin URL: http://s3-[zone].amazonaws.com/[bucket]/skin/
-- Secure
Base Skin URL: https://s3-[zone].amazonaws.com/[bucket]/skin/

Save the settings, clear the cache and go check where the css files are coming from, it should be s3. Done!
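
If you'd rather check this from the command line than from the browser, the sketch below lists the hosts your page assets are loaded from. asset_hosts is a made-up helper, and it only looks at double-quoted src= and href= attributes, so treat it as a rough check rather than a proper HTML parser.

```shell
#!/bin/bash
# Sketch only: asset_hosts is a hypothetical helper name.
# Reads HTML on stdin and prints the unique hostnames found in
# double-quoted src="..." and href="..." attributes.
asset_hosts() {
    grep -oE '(src|href)="https?://[^/"]+' \
        | sed 's#^.*://##' \
        | sort -u
}

# Example call (left as a comment because it needs your running site):
#   curl -s http://[external admin ip]/ | asset_hosts
```

Once both the image CDN and the skin CDN are working, the S3 hostname should show up in the list.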

NFS Server

Next you will want to install NFS:

# Install nfs
$ sudo apt-get install nfs-kernel-server

# Let's add our network share in
$ vi /etc/exports
# /etc/exports: the access control list for filesystems which may be exported
#               to NFS clients.  See exports(5).
#
# Example for NFSv2 and NFSv3:
# /srv/homes       hostname1(rw,sync,no_subtree_check) hostname2(ro,sync,no_subtree_check)
#
# Example for NFSv4:
# /srv/nfs4        gss/krb5i(rw,sync,fsid=0,crossmnt,no_subtree_check)
# /srv/nfs4/homes  gss/krb5i(rw,sync,no_subtree_check)

/var/www/media         *(rw,sync,no_subtree_check)

# Restart NFS so the new export is picked up
$ /etc/init.d/nfs-kernel-server restart

You may be thinking "* WTF?!", but with Amazon all EC2 instances are locked down. Conveniently, Scalr will modify the default security group to allow any connection between servers in this group (all servers will be added to default), so it looks wide open but it's not. Any of your servers that do connect over NFS will be communicating in plain text (granted, it will be over internal IPs). If this is an issue for you, feel free to set up Kerberos.
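
If the `*` still makes you uneasy, you could limit the export to your instances' internal address range instead. This is a sketch, not what I ran: add_export is a made-up helper, and 10.0.0.0/8 is only my assumption about the internal range, so check what your own instances use.

```shell
#!/bin/bash
# Sketch only: add_export is a hypothetical helper name.
# Appends an export line to the given exports file unless the exact line is already there.
add_export() {
    local exports_file="$1" share="$2" clients="$3"
    local line="$share $clients(rw,sync,no_subtree_check)"
    # -x: whole line, -F: fixed string (no regex surprises from the path)
    grep -qxF "$line" "$exports_file" 2>/dev/null || printf '%s\n' "$line" >> "$exports_file"
}

# Example call (as root); 10.0.0.0/8 is an assumed internal range:
#   add_export /etc/exports /var/www/media 10.0.0.0/8
#   exportfs -ra
```

Because the helper checks for the line first, it is safe to run more than once.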

Up until this point we have been doing everything on the Admin box.

It's now time to create and configure our web servers (see the "Web Server (EC2)" section further down). I'd suggest creating an image of the admin box now and giving it some sane name like "Admin" so we have nice DNS entries in Scalr.

Admin URL

So you should now have a fully working web server and admin server. There are a few configuration settings we will need to change to allow for separate admin and web page domains (note this is for 1.6.1.0; there is a fix for < 1.6.1.0... ask me in the comments). This configuration assumes your load balancer is up and running.

Navigate over to the admin external hostname or ip and log into the admin area. Go to System > Configuration > General > Web and set the following settings:

-- Url Options
Auto-redirect to Base URL: Yes
-- Unsecure
Base URL: http://[load balancer]
-- Secure
Base URL: https://[load balancer]

Save settings, now head over to System > Configuration > Advanced > Admin:

-- Admin Base URL
Use Custom Admin URL: Yes
Custom Admin URL: http://[external admin hostname]

Save settings. Now clear the cache on both web server and admin server via ssh. You should now find you can only get to the admin area of the admin box and the whole site on the load balancer.

This is the end of the tutorial and I apologise for making you jump around.

 One more time for the conclusion.

RDS / Database 

This step is fairly straightforward: pop up a new MySQL RDS instance (the default engine is fine). You could use one of Scalr's MySQL AMIs and have it auto scale them, but that's not within the scope of this post.

If you are using a pre-existing database you can use some of these commands to get you going:

# Export your DB from a box running MySQLd
$ mysqldump -u [username] -p [magento db name] -h [host / ip] > [magento db name].sql

# Connect to RDS / Database instance and create a database to import into
$ echo "create database [database name];" | mysql -h [database host / ip] -u [database instance username] -p
# Import database dump into RDS / Database instance
$ cat [magento db name].sql | \
mysql -h [database instance hostname] -u [database instance uname] -p [database name]

Once you're done you can continue with your admin server configuration.

Web Server (EC2) 

Overview

Our web servers will be our public facing servers; they will be load balanced and fairly minimal. The guts of these servers will come from the admin box, with a little help from a startup script.

Scalr Setup

Edit the farm you created for the admin role and add a new role, use the same image as you used before (I used Ubuntu 10.04 app-apache64-ubuntu-ebs).

Scaling Options

Mine are 1 - 5 instances, 5 minute waits, and a 15 minute load average of over: 5 and under: 1.5. Make sure to add yourself a load balancer; your health check target can be 'HTTP:80/', and for the most reliability check all 'Availability Zones'.

Now add your listeners. Mine are as follows (note: if you're going to add SSL, upload your certificate to Tools > AWS > IAM > Certificates > Add New):

Protocol   Load Balancer Port   Instance Port   SSL Certificate
HTTP       80                   80
HTTPS      443                  80              arn:aws:iam:xxxx

Placement and Type

I set my 'Availability Zone' to distribute equally.

I left all the other settings as default. No need for elastic ips or extra EBS.

Now make sure you install all the same modules as before for PHP otherwise you'll have an unhappy Magento.

Apache

This process is the same as the Apache setup for the admin role above. You won't need to grab your code base though.

NFS Client

Now we are going to install the client ready for use but not configure it as we will have our startup script do all the hard work. Note that if you try and use fstab it may get nuked by scalr when it creates images of the server. Go ahead and install nfs-common:

$ sudo apt-get install nfs-common

Now let's try mounting the admin media share before we go on:

# Create folder to mount to
$ mkdir /var/www/media

# Mount (remote export first, then the local mount point)
$ mount -t nfs -o proto=tcp,port=2049 [admin internal ip or hostname]:/var/www/media /var/www/media

# Provided that worked
$ umount /var/www/media

Great, we have a working media share!

Startup Script

This is the fun part of the web servers: auto mounting and auto loading the code base. The script below is a modified version of the one I wrote, so it won't work without modifications. Things to note are:

  • I created an SSH key on the admin box for a user called ubuntu, which I then copied to the web server instance.
  • I also created a user called replicant on the web server box, who owns /var/www/*.
  • I set up postfix and installed mailutils to allow email alerts.

This script will try to ping the admin box and if successful mount the nfs share and copy the Magento code base across. You can run it multiple times.

#!/bin/bash
ADMIN=[internal admin hostname]
LOGFILE=/var/log/webserver.log
MEDIA=/var/www/media
DEVEMAIL=admin@email.com
ROOT_UID=0
# Mount point of the media share (empty if it is not mounted yet)
MEDIA_TEST=`mount|grep $MEDIA|awk '{print $3}'`

if [ "$UID" -ne "$ROOT_UID" ]; then
	echo "`date`: Not running as Root" | mail -s "Not Root!" $DEVEMAIL
	echo "`date`: Not running as Root" >> $LOGFILE
	exit
fi

echo "`date`: Web server started" >> $LOGFILE

# Check if admin box is alive
ping -c 1 $ADMIN &> /dev/null

if [ $? -ne 0 ]; then
	# Try again just in case
	sleep 60
	ping -c 1 $ADMIN &> /dev/null

	if [ $? -ne 0 ]; then
		echo "`date`: ping failed, $ADMIN host is down!" | mail -s "$ADMIN host is down!" $DEVEMAIL
		echo "`date`: Ping failed $ADMIN is down, exiting." >> $LOGFILE
		exit
	fi
fi

if [ ! -e "$MEDIA" ]; then
	echo "`date`: No media folder to mount to" | mail -s "Missing media folder!" $DEVEMAIL
	echo "`date`: No media folder to mount to" >> $LOGFILE
	exit
fi

# Mount media over NFS
if [ "X$MEDIA_TEST" = "X" ]; then # Check if already mounted
	mount -t nfs -o proto=tcp,port=2049 $ADMIN:$MEDIA $MEDIA &> /tmp/mount-media

	if [ $? -ne 0 ]; then
		echo "`date`: Error mounting media from $ADMIN check /tmp/mount-media" | mail -s "Media failed to mount!" $DEVEMAIL
		echo "`date`: Error mounting media from $ADMIN check /tmp/mount-media" >> $LOGFILE
		exit
	fi
fi

# Remove known hosts so no rsync/ssh error about DNS spoofing
sudo -u replicant rm /home/replicant/.ssh/known_hosts
sudo -u replicant touch /home/replicant/.ssh/known_hosts

# Get code up to date
sudo -u replicant rsync -av -e "ssh -o 'StrictHostKeyChecking no' -i /home/replicant/.ssh/id_rsa" --exclude 'media/' --exclude 'var/' --exclude '.svn' root@$ADMIN:/var/www/ /var/www/ &> /tmp/rsync

if [ $? -ne 0 ]; then
	echo "`date`: Error rsync'ing code base from $ADMIN check /tmp/rsync" | mail -s "Rsync error!" $DEVEMAIL
	echo "`date`: Error rsync'ing code base from $ADMIN check /tmp/rsync" >> $LOGFILE
	echo "root@$ADMIN:/var/www /var/www" >> $LOGFILE
	exit
fi

rm -rf /var/www/var/cache/*
rm -rf /var/www/var/session/*

chmod -R a+rw /var/www/var/

echo "`date`: Startup script successful!" >> $LOGFILE
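
One thing worth knowing about the script above: the mount check greps the whole `mount` output for the media path, so any other mount whose line happens to contain that path would also count as "already mounted". A stricter check could look like the sketch below. is_mounted is a made-up helper, and it assumes a Linux-style mount table (/proc/mounts), where the second field is the mount point.

```shell
#!/bin/bash
# Sketch only: is_mounted is a hypothetical helper name.
# Succeeds (exit 0) only if the exact path is listed as a mount point.
# The optional second argument is the mount table to read; it defaults to /proc/mounts.
# Mount points containing spaces are escaped in /proc/mounts and are not handled here.
is_mounted() {
    local target="$1" table="${2:-/proc/mounts}"
    awk -v t="$target" '$2 == t { found = 1 } END { exit !found }' "$table"
}

# Example use inside the startup script:
#   if ! is_mounted "$MEDIA"; then
#       mount -t nfs -o proto=tcp,port=2049 $ADMIN:$MEDIA $MEDIA
#   fi
```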

With your startup script in place, you should be almost ready with your web server. Execute your script and try browsing to the IP address; you should find a working version of Magento (or it may redirect you to the admin box). You're now ready to make an image of the server and modify the Scalr farm to execute your script on HostUp.

ELB / Load Balancer

This is fairly simple with ELB, unless you start using SSL, in which case it gets a little more complicated (but remains doable). I will assume you do want SSL, and the solution I am going to explain will use it up until the load balancer. You could use backend authentication but that is not within the scope of this post. If you're not using ELB, find out which HTTP header is set for SSL; this will be important.

You'll want to set the load balancer's stickiness to Application Based on both HTTP and HTTPS and use the cookie name of: PHPSESSID.

SSL

SSL was the last hurdle I had to tackle and it was painful, but lucky enough for you I have the answers here... Hopefully you have followed all the steps above and configured everything with SSL in mind, now it's time to get your hands dirty.

Scalr Setup

You're going to need to add a virtual host that will accommodate SSL. If you're using Scalr it's pretty simple: go to Websites > Apache Virtual Hosts, create yourself a new host and tick the 'SSL' check box. Upload your certificates and point this VH at your admin server. Your certificate locations will be as follows:

  • Certificate = /etc/aws/keys/ssl/https.crt
  • Private Key = /etc/aws/keys/ssl/https.key
  • Certificate Chain = /etc/aws/keys/ssl/https-ca.crt

When you hit save, the VH will be uploaded to your running web server instances and enabled... Whoop! If you're thinking "Why the admin box?", that's because we are having the load balancer forward HTTPS to HTTP on the web servers so we only need the certificates on the LB and Admin Box.

Magento

Thanks to Magento's yoyo feature set, we can't just use the Offloader Header setting as that would be too easy... Instead you're going to have to create a rewrite rule and this is where you need to know the http header that your load balancer uses to signify HTTPS being on.

Add this to your .htaccess on the admin box:

$ vi /var/www/.htaccess
############################################
## you can put here your magento root folder
## path relative to web root

    #RewriteBase /magento/
    RewriteBase /

    RewriteCond %{HTTP:X-Forwarded-Proto} https
    RewriteRule .* - [E=HTTPS:on]

This condition takes the forwarded header from the load balancer (in this case ELB's X-Forwarded-Proto, which is either 'http' or 'https') and sets HTTPS to 'on' if it equals 'https'. This is needed because of how Magento decides whether a URL should be secure or not: in some places it supports the 'Offloader Header' and in some places (I'm looking at you, Zend Framework) it checks the HTTPS environment variable... Great... But at least we have a fix!

Now head into the admin backend and set System > Configuration > General > Web > Offloader Header to 'HTTPS'. Done!

You can now set your URLs (see the "Admin URL" section above) and you should be all finished.

Conclusion

If you're still with me, well done! That was a lot of information. You should end up with a scalable Magento system. I will end this post with a few points you can research yourself:

  • SSL is only up to the ELB load balancer in this tutorial. If you require encryption from the ELB to the web servers you will have to look into backend authentication.
  • The NFS share is also unencrypted unless you use something like Kerberos. I personally don't think it's necessary as the only things being shared are the public media files anyway.
  • The admin box is your single point of failure. While the web servers will manage for a while on cache (about 2 minutes), you have to consider the possibility of the admin box going down and have something in place to handle it, like Scalr.
  • Persistence on the admin box whilst using Scalr isn't TRUE persistence. Scalr treats EBS volumes as disposable. While not a huge issue if you back up your admin box (s3sync can help there), it does become a problem if it goes down... see point above.
  • This is a big undertaking and isn't cheap. Serving ~100,000 uniques / ~500,000 page views per month, this will rack up a bill of around $700 - 900.

I am sure there are 100 other ways of doing some of the stuff here so let me know if you have any better solutions :).


Comments (4)

  • Adrian on October 18th, 2012

    Hi Marcelo,

    Apologies for the late reply. Yes the only purpose of NFS is to synchronise media data from the admin server to the web servers.

    Admin users will upload new images via the admin server and as the media folder is NFS shared it is instantly available on the web servers to use.

    Subsequently the OnePica module will then cache the images to S3 / CloudFront, which will reduce the hits to the NFS share on web servers (which will always be slower than an actual local hard drive).

    Thanks,
    Adrian

  • John on November 8th, 2012

    Hi Adrian,
    Great article.
    On scalr, how did you determine which box was the admin server. Did you put it on a separate scalr role all together or just install the NFS server on the instance #1? Can you give more details on that?

  • Adrian on November 12th, 2012

    Hi John,

    Yes I created a separate role for it all together. As in our case the admin box is the central location to run cron jobs, maintenance tasks and integrations, which means the server image diverges a fair amount from just a 'web' box.

    I hope this helps!

    Thanks,
    Adrian

  • Gabriel on February 17th, 2013

    Hi Adrian,

    Excellent article, thanks a lot. One doubt comes up to mind, though. How do you treat the session files? Are you using the database to store them? Shouldn't a NFS share be setup for that as well as for the media files?

    Thanks,
    Gabriel
