Kernel Remove / Purge

dpkg -l | egrep -o 'linux-(image|headers)-[^ ]+' | grep '[0-9]' | egrep -v "$(dpkg -l | grep 'linux-image-[0-9]' | tail -n1 | sed 's:.*linux-image-\([0-9\.-]\+\)-.*:\1:')|$(uname -r)" | xargs -r sudo apt-get -y purge
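
Before letting the one-liner purge anything, it can help to see which packages it would select. The sketch below pulls only the filtering step out into a function; the name filter_old_kernels and its two arguments are made up for illustration, and it uses fixed-string matching instead of the regular expression in the original.

```shell
# Hypothetical helper: reads package names on stdin and prints the kernel
# image/header packages that match neither the running kernel release ($1)
# nor the newest installed version ($2).
filter_old_kernels() {
    running="$1"
    newest="$2"
    grep -E '^linux-(image|headers)-[0-9]' \
        | grep -v -F -e "$running" -e "$newest"
}

# Example preview (nothing is removed); the version string is a placeholder:
#   dpkg -l | awk '/^ii/ {print $2}' | filter_old_kernels "$(uname -r)" "3.13.0-100"
```

Only when the printed list looks right would you hand it to apt-get purge.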


Using FreeNAS/ZFS for vSphere NFS Datastores

As we have been in production for almost a year with our current setup, it is time to share our experience.

Our setup consists of 4 FreeNAS heads: 2 for vSphere NFS datastores, 1 for CIFS (Windows filer) and 1 for Disaster Recovery, which holds snapshots of the ZFS datasets.

We started with the FreeNAS build. In the meantime only our CIFS head has been upgraded to FreeNAS 9.3.

From 6 vSphere 5.5 hypervisors, we are driving 225 VMs on the two attached FreeNAS boxes. Our 2 main heads – the NFS heads for vSphere – consist of the following hardware:

– Dell PowerEdge R720
– 2 X Intel Xeon CPU E5-2643 v2 @ 3.50GHz – 12 cores / 24 threads
– 256GB Memory
– 1 X LSI 9207-8i
– 1 X LSI 9207-8e
– 1 X Intel 800GB SSD (PCIe with built-in LSI 2108/SAS-1-controller)
– 1 X Chelsio T520-SO-CR (Dual port SFP+/10Gb)
– 4 X HGST s840Z SSD
– 2 X DNS 1640D (24 X 2.5″ JBOD)
– 48 X HGST 900GB SAS-2

The Intel SAS-1 SSD is not an optimal solution in the above configuration. It is a shame to throttle the L2ARC down to SAS-1 speeds, but our budget at the time was limited.
When we upgrade in the near future, we will install an Intel DC P3700, also 800GB. We hope to see it “fly” with *BSD, but we have not tried it yet.

For random IOPS workloads we have chosen mirrored vdevs: 23 pairs, with 2 disks as spares. The number of vdevs should, however, be an even number if you are a purist.
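
The layout arithmetic can be checked quickly in the shell. This is only a sketch from the numbers in the hardware list (48 disks of 900GB, 2 spares); the variable names are made up for illustration and ZFS overhead is ignored.

```shell
disks=48      # HGST 900GB SAS-2 drives in the two JBODs
spares=2      # disks kept as hot spares
disk_gb=900   # size of one disk in GB

# Each mirrored vdev uses two disks and contributes the capacity of one.
pairs=$(( (disks - spares) / 2 ))
raw_gb=$(( pairs * disk_gb ))

echo "mirrored vdevs: $pairs"
echo "capacity before ZFS overhead: ${raw_gb} GB"
```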

The 4 s840Z SSDs are configured as SLOG devices. We have _not_ mirrored them, as we judged it very unlikely that the LSI adapters and/or the SSDs and the spindle disks would all turn “upside-down” at the same time.
But to be fair, if you cannot tolerate ANY loss whatsoever, you should mirror them!

We have made the following changes to a vanilla FreeNAS:

kern.ipc.maxsockbuf = 33554432
net.inet.tcp.cc.algorithm = cubic
net.inet.tcp.recvbuf_inc = 524288
net.inet.tcp.recvbuf_max = 33554432
net.inet.tcp.sendbuf_inc = 16384
net.inet.tcp.sendbuf_max = 33554432
vfs.nfsd.server_max_nfsvers = 3
vfs.nfsd.server_min_nfsvers = 3
vfs.zfs.l2arc_write_boost = 33554432
vfs.zfs.l2arc_write_max = 33554432

vfs.zfs.scrub_delay = 3
vfs.zfs.txg.timeout = 1
vfs.zfs.vdev.async_read_max_active = 1
vfs.zfs.vdev.async_read_min_active = 1
vfs.zfs.vdev.async_write_max_active = 1
vfs.zfs.vdev.async_write_min_active = 1
vfs.zfs.vdev.max_active = 32
vfs.zfs.vdev.sync_read_max_active = 1
vfs.zfs.vdev.sync_read_min_active = 1
vfs.zfs.vdev.sync_write_max_active = 1
vfs.zfs.vdev.sync_write_min_active = 1

The above changes to a vanilla configuration have been found through a combination of internet sources, our own experience and a lot of testing. We are now able to almost max out the 10Gb transport AND have a very low-latency pool.
On the head/pool itself we can actually drive ~35Gb/s reads and ~25Gb/s writes, which is very consistent with the performance and amount of backend spindle disks.

Bear in mind that our CIFS and DR heads are almost 100% vanilla. The above is only changed on our NFS (vSphere Datastore) heads for improved responsiveness.
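
To confirm that a head really runs with the values listed above, the live sysctl values can be compared against the intended ones. The following is a minimal sketch, not something we run in production: the function name check_tunables and the SYSCTL override variable are made up for illustration, and only two of the tunables are included to keep it short.

```shell
# Intended values, copied from two of the tunables listed above.
expected='vfs.zfs.txg.timeout = 1
net.inet.tcp.cc.algorithm = cubic'

# SYSCTL is a hypothetical override so the check can be dry-run on another box.
SYSCTL="${SYSCTL:-sysctl}"

# Reads "name = value" lines on stdin and prints one line per mismatch.
check_tunables() {
    while IFS=' =' read -r name want; do
        [ -n "$name" ] || continue
        have=$("$SYSCTL" -n "$name" 2>/dev/null)
        [ "$have" = "$want" ] || echo "MISMATCH $name: want=$want have=$have"
    done
}
```

Running printf '%s\n' "$expected" | check_tunables on the head prints nothing when every value matches.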

Early on we also decided to follow the LSI FW/Driver, as we found the built-in custom version “16” to be a bit chatty with a lot of errors being thrown to the console. “Errors” that we, in most cases, found should be suppressed.
However if you choose to follow the “LSI Phase” you will not get a lot of love from the FreeNAS forums 🙂 They stick to the proven FW and driver that is built-in.
We have been running with the LSI Phase 19 for the last year with no glitches – both firmware and driver.

External references to the above, where some of the settings are discussed:


Thanks for reading

ZFS. All your disk are belong to you.


VMware vCenter Server Appliance 5.5 Update 2 – Disk Utilization

From 5.5 Update 2 onwards, the VMware vCenter Server Appliance hits its PostgreSQL logs pretty hard.

The symptom is that your /storage/db partition runs full. You have tweaked all you can on both ‘Task retention’ and ‘Event retention’ to no avail, and you are still seeing extreme growth on the /storage/db partition.

But now you found this post 🙂

VMware has a workaround for this – to stop any future growth:

But what about the 100’s of GB you already have in your logs? They will not be automatically purged by the above tweak.

After you have tested that your log files no longer get hit by the lines “WARNING: there is already a transaction in progress”, it is time to clean up the logs.

Your Postgres-SQL-log-files are either stored in:


or …


I am not sure why mine are stored in /storage/db/serverlog when all the references online suggest the former location; however, there do not seem to be any other differences.

As the dreaded “WARNING: there is already a transaction in progress” will most likely consume more than 99% of your log usage, we will clean those logs.

For every log file (postgresql-YYYY-MM-DD_HHMMSS.log), do the following:

(Actually, deleting every log file older than a year is something you might want to consider first)

# sed --in-place '/there is already a transaction in progress/d' /storage/db/vpostgres/pg_log/postgresql-YYYY-MM-DD_HHMMSS.log

or …

# sed --in-place '/there is already a transaction in progress/d' /storage/db/vpostgres/serverlog

(If your Postgres-SQL-log is in the above _single_ file, expect it to run for a long time. Ours was 200GB and the cleaning took hours!)

The above command will find every line containing ‘there is already a transaction in progress’, and delete that line. All other entries will be maintained.
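
If you want to see the effect before running it on a 200GB file, the same sed expression can be tried on a throwaway sample first. This is just a sketch: the log lines below are invented for illustration, only the warning text is taken from the real logs.

```shell
# Build a small throwaway sample; the log lines are made up for illustration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
2014-11-01 10:00:00 UTC WARNING:  there is already a transaction in progress
2014-11-01 10:00:01 UTC LOG:  checkpoint starting: time
2014-11-01 10:00:02 UTC WARNING:  there is already a transaction in progress
EOF

# Count what would be removed before touching anything.
matches=$(grep -c 'there is already a transaction in progress' "$sample")

# Same sed expression as above, applied to the sample.
sed --in-place '/there is already a transaction in progress/d' "$sample"
remaining=$(wc -l < "$sample")

echo "removed $matches lines, $remaining left"
rm -f "$sample"
```

The same grep -c on a real log file gives a rough idea of how much of it will disappear.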

To wipe all the ‘dirty’ blocks from the truncated file, you can use the following command:

# cat /dev/zero > /storage/db/deleteme; rm /storage/db/deleteme

Now vMotion to a new datastore – and back again – et voila, all your space is reclaimed!

Thanks for reading!


Make your own enterprise mail archive based on freely available software

You have :

– An Enterprise mail system that needs to be archived

You need :

– 2 *NIX – choose your own flavor – we use Ubuntu LTS
– An NFS share with adequate space

We found that we needed to archive 3 – 5 years of mail traffic from our mail system. We scanned the market and found that there are a lot of solutions, but they come at quite a hefty price.

As we have access to all the VMs necessary and quite a lump of disk space at our disposal, we set out to design a solution ourselves.

Our mail domain is mail.domain.tld – domain.local is our local AD controlled DNS.

VM no. 1 / mail.archive.domain.local

On *NIX VM no. 1 we installed a basic Postfix / Dovecot instance. We have chosen ‘Maildir’ as repository, more on that later.

Create 4 users ( both OS and Dovecot ): incoming, outgoing, inside, pickup ( you can get by with only one user for the actual archiving, but we find that splitting the data into 3 almost equal lumps is quite nice ).

Mount 4 NFS shares as follows – excerpts from /etc/fstab

nas.domain.local:/export/inside on /home/inside type nfs (rw,wsize=16384,vers=4)
nas.domain.local:/export/outgoing on /home/outgoing type nfs (rw,wsize=16384,vers=4)
nas.domain.local:/export/pickup on /home/pickup type nfs (rw,rsize=16384,wsize=16384,vers=4)
nas.domain.local:/export/incoming on /home/incoming type nfs (rw,wsize=16384,vers=4)

VM no. 2 / search.archive.domain.local

On *NIX VM no. 2 we installed maildir-utils – http://manpages.ubuntu.com/manpages/oneiric/man1/mu.1.html. It is available from the Ubuntu repositories, so all you need is :

#apt-get install maildir-utils maildir-utils-extra

Mount 4 NFS shares as follows – excerpts from /etc/fstab

nas.domain.local:/export/inside on /home/inside type nfs (ro,rsize=16384,vers=4)
nas.domain.local:/export/outgoing on /home/outgoing type nfs (ro,rsize=16384,vers=4)
nas.domain.local:/export/pickup on /home/pickup type nfs (rw,rsize=16384,wsize=16384,vers=4)
nas.domain.local:/export/incoming on /home/incoming type nfs (ro,rsize=16384,vers=4)

Note that we mount the NFS shares on search.archive.domain.local as ReadOnly! ( except the pickup mount ) This is the indexing / search VM – no need to give it ReadWrite.
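
The excerpts above are in the format printed by mount rather than in /etc/fstab syntax. As a rough sketch, a line in that format can be turned into an fstab-style entry like this. The function name mount_to_fstab is made up for illustration, and the trailing "0 0" ( no dump, no fsck ) is an assumption about what you want for an NFS share; depending on your distribution the filesystem type may need to be nfs4 instead of nfs.

```shell
# Hypothetical helper: converts mount-style lines on stdin,
#   server:/path on /mountpoint type nfs (opt,opt)
# into fstab-style lines,
#   server:/path /mountpoint nfs opt,opt 0 0
mount_to_fstab() {
    awk '$2 == "on" && $4 == "type" {
        opts = $6
        gsub(/[()]/, "", opts)      # strip the surrounding parentheses
        print $1, $3, $5, opts, 0, 0
    }'
}
```

Piping the four lines above through mount_to_fstab gives entries you can review before adding them to /etc/fstab.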

We have created 3 cron jobs – they can be run daily, weekly or monthly ( /etc/cron.daily , etc. ) – as you prefer.

The jobs are as follows :


# Index /mnt/incoming/Maildir with maildir-utils / mu

mu index --quiet --autoupgrade --maildir=/mnt/incoming


# Index /mnt/outgoing/Maildir with maildir-utils / mu

mu index --quiet --autoupgrade --maildir=/mnt/outgoing


# Index /mnt/inside/Maildir with maildir-utils / mu

mu index --quiet --autoupgrade --maildir=/mnt/inside
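
If you would rather keep a single cron script than three, the jobs above can be folded into one loop. This is only a sketch: the function name index_all and the MU override variable are made up for illustration; the mu options are the same ones used above.

```shell
#!/bin/sh
# MU is a hypothetical override, handy for dry-running with a stub command.
MU="${MU:-mu}"

# Index the three archive maildirs one after another and report any failure
# on stderr, so that cron mails it to you.
index_all() {
    for box in incoming outgoing inside; do
        "$MU" index --quiet --autoupgrade --maildir="/mnt/$box" \
            || echo "mu index failed for /mnt/$box" >&2
    done
}
```

Put a call to index_all on the last line of the script before dropping it into /etc/cron.daily or similar.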

Enterprise mail system / mail.domain.local / mail.domain.tld

On the enterprise mail system you create 3 rules for BCC’ing to the following.

Inside senders -> inside recipients – BCC -> inside@mail.archive.domain.local
Outside senders -> inside recipients – BCC -> incoming@mail.archive.domain.local
Inside senders -> outside recipients – BCC -> outgoing@mail.archive.domain.local

( make your own logic / naming, if the above is confusing – it works for us 🙂 )

Now you have a working mail archive. But we still need to be able to find an archived mail. Enter maildir-utils!

On the search.archive.domain.local the following are nifty commands :

#mu find to:user@domain.tld someone --fields "d f t l" (d=date, f=from, t=to, l=location)
Finds all mail to user@domain.tld AND ‘someone’ is present either in sender, subject or body

#mu find from:user@domain.tld someone --fields "d f t l"
Finds all mail from user@domain.tld AND ‘someone’ is present either in sender, subject or body.

#mu find from:user@domain.tld @otherdomain cc:someoneelse@domain.tld --fields "d f t l"
Finds all mail from user@domain.tld to someone @otherdomain.tld AND CC to someoneelse@domain.tld

#mu find to:user@domain.tld date:20140801..20140901 findthistext --fields "d f t l"
Finds all mail to user@domain.tld from 1st of August 2014 to 1st of September 2014 AND contains ‘findthistext’ in either sender, subject or body.

In your output you get the location of the actual mail. You can use this to qualify that you have actually found the right mail.

#mu view /mnt/inside/Maildir/cur/2358755128.M708023P28292.mail,S=3905,W=6436:2, | more

We are now ready to actually retrieve the mail from the archive and send it on its way to newfound glory!

#cp /mnt/inside/Maildir/cur/2358755128.M708023P28292.mail,S=3905,W=6436:2, /mnt/pickup/Maildir/cur/

In a mail client configured with the pickup user towards mail.archive.domain.local, you should now be able to see the retrieved mail, and hence decide its future.

Some notes about performance.

We now have about 2 years of archived mails – about 10 million mails in all. It stresses the above configuration, so here are a few tweaks / tunes that might be needed if you are reaching the system's limits.

– The search.archive.domain.local needs a lot of memory to run the mu-commands. No less than 16 GB.
– There are some Dovecot / Maildir / NFS tunes – have not found them for this blogpost, but will update when I find them.
– Consider letting Dovecot store the mail on local disks initially, and then rsync it to an NFS share instead. Let search.archive.domain.local pick it up there. This should be much faster, but it also doubles the amount of disk space you need to have available.

Feel free to comment on the above, and make suggestions for changes. We aim to improve – always!

Thanks for reading!


Dell PowerEdge R715 ( R815 ) and ESXi 5.5 Stability

We had some initial problems with vSphere ESXi 5.5 on the Dell PowerEdge R715 that led to PSODs under normal load.

The Hypervisor HW was running with the same settings on ESXi 5.1 with no glitches.

We have not 100% identified the root problem, but after reading various blogs regarding best practice and other tunes, we have reached a stable setup. A setup I will share here.

Always have the firmware on your hypervisors at the current level, unless there is a very specific reason otherwise. Without further ado, here is our configuration.

Dell PowerEdge R715 ( Probably also PowerEdge R815 as they share chipset and BIOS version )

– BIOS version : 3.2.1
Processor settings
| HyperTransport Technology : HT3
| HT Assist : Enabled
| Virtualization Technology : Enabled
| DMA Virtualization : Enabled
| DRAM Prefetcher : Enabled
| Hardware Prefetch Training … : Enabled
| Hardware Prefetcher : Enabled
| Execute Disable : Enabled
| Number of Cores per Processor : All
| Core Performance Boost Mode : Enabled
| Processor HPC Mode : Enabled
| C1E : Disabled

Power Management
| Power Management : Maximum Performance

– 2 X AMD Opteron 6348
– 2 X Dual 10 Gb Broadcom
– 1 X Quad 1 Gb Broadcom

– Broadcom firmware / driver : 7.8.53 / 2.2.4f.v55.3 ( SSH and ‘ethtool -i vmnicX’ )

VMware ESXi, 5.5.0, 1746974

The above configuration is running rock steady with normal load. “Normal” in our environment is 25% CPU and 75% Memory usage per hypervisor.

Thanks for reading!


Updating vSphere vCenter SA 5.1 to Build 1065184

We had some initial problems updating our production vCenter (VCSA) to Build 1065184.

Luckily I had made a snap before 🙂

On the console we saw the following :

nohup /opt/vmware/share/vami_sfcb_test > /dev/null 2>&1
./etc/init.d/vami-sfcb: line 238: xxxxx Aborted

… and it seemed to loop forever.

After searching the web, I found a solution :

– Insert Update ISO -> VCSA.
– VMware COS -> VCSA.

mount /dev/cdrom /media/
cd /media/update/package-pool
rpm -Uvh openssl-*
rpm -Uvh libopenssl0_9_8-0.9.8j-0.44.1.x86_64.rpm
rpm -Uvh openssh-5.1p1-41.55.1.x86_64.rpm

– Do the regular update from CD/DVD in WebGUI.
– Reboot VCSA.

Now I am not sure if the above will break anything in the future, so beware !

Also I read that doing the update from an ISO is the recommended way. Not sure where I read it though…

Thanks for reading.


Building our own high performance NAS / SAN

We have been using the Sun / Oracle ZFS Storage 7000 series for about 3 years now. We have grown pretty fond of it. Now it is time to make use of what we have learned.

We want to build our own NAS appliance. A high performing one that is. One to be used as storage for primary Microsoft DAG nodes, that way we can let the secondary Microsoft DAG nodes live on commercially backed FT storage AKA our Oracle ZFS Storage, and still achieve IOPS you would not have dreamed of just a few years back !

A lot of thought has gone into sizing it, but without further ado – here is what we have bought :

Server : Dell PE R715
CPU : 2 X AMD 8 cores @3.4 GHz
Memory : 256 GB
HBA : LSI Logic 9207-8i & LSI Logic 9207-8e
Spindles : 14 X Seagate Savvio 10K.5 600 GB SAS-2
ReadSSD / L2ARC : 5 X Seagate Pulsar 2, 200 GB SAS-2 ( MLC / eMLC )
WriteSSD / Log : 3 X Seagate Pulsar XT 2, 100 GB SAS-2 ( SLC )
JBOD : Dell MD1220

We want to have 10 X spindles in a mirror configuration, that gives us 5 vdevs for the ZFS pool, which should be fine as most of our data in time will travel to the ARC and L2ARC. We “only” have about 750 GB data that needs to be accelerated.

2 spindles are marked as spares and we keep the last 2 in the “drawer”.

4 X ReadSSDs in a striped configuration and 1 in the “drawer”.

2 X WriteSSDs in a striped configuration and 1 in the “drawer”.
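
Put together, the layout described above ( 5 mirrored vdevs, 2 spares, 4 striped cache devices, 2 striped log devices ) maps onto a single zpool create command. The sketch below only builds and prints that command, it does not run it. The pool name tank and the device names da0 to da17 are placeholders; your OS will name the disks differently.

```shell
# Builds the zpool create command for the layout described above.
# Pool name and all device names are hypothetical placeholders.
build_zpool_cmd() {
    cmd="zpool create tank"
    i=0
    # 10 spindles as 5 two-way mirrors
    while [ "$i" -lt 10 ]; do
        cmd="$cmd mirror da$i da$((i + 1))"
        i=$((i + 2))
    done
    cmd="$cmd spare da10 da11"             # 2 hot spares
    cmd="$cmd cache da12 da13 da14 da15"   # 4 ReadSSDs, striped L2ARC
    cmd="$cmd log da16 da17"               # 2 WriteSSDs, striped log
    echo "$cmd"
}

build_zpool_cmd
```

Listing the cache and log devices without the mirror keyword is what makes them striped.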

Initially we are hoping that the above configuration will give us ~ 100K OPS on reads and ~ 50K OPS on writes – both random on the “hot” data in ARC & L2ARC. As we will see later, our hopes were more than met 🙂

We are looking at the following contenders as the NAS OS / appliance software :

– Nexenta / Nexentastor
– SmartOS / OpenIndiana with napp-it.
– NAS4Free
– FreeNAS
– ZFS-guru

All of the above have ZFS built into the kernel. And not “just” the ZFS pool version 28 / file version 5 from the last Open Source Solaris 2009Q4. They all draw from the Illumos project, which aims at maintaining the Open Solaris code and, most important for us, the ZFS code. Although the BSD derivatives are not directly attached to the Illumos codebase – the ZFS is.

Nexenta and napp-it are commercial products, so if your hands are shaking a bit already… you can opt for those. They fully back almost any HW you may choose, and make available both HW FT ( 2 active / active heads ) and a bunch of plugins… but if you want to use the NAS as we are planning, there is no need for the extra safety of a commercially backed product. If our nodes go down, so what. The secondaries will continue to be online. At this point in time we are only trying to get a feel for it, not migrating all our storage needs.

Of the above, FreeNAS has some commercial backing via iX-Systems, which bought the code a few years back. Still keeping the OS free though, thanks for that. As I am running FreeNAS on my home storage appliance, this was my first choice. Unfortunately the underlying FreeBSD is only at 8.3, and was not able to boot on the Dell PE R715 – comments on this are very welcome.

I also tried a clean FreeBSD 9.1 STABLE, but it was too much work to make a ZFS-on-root boot environment and force ZFS to take advantage of the newer 4K optimized disks. At least for me. More FreeBSD-savvy persons would most likely succeed.

NAS4Free was the next we tried. It is built on FreeBSD 9.1, and they have a more aggressive approach to keeping the software current as opposed to FreeNAS. Me like !

NAS4Free has a very nice and clean UI. Not always very logical / intuitive, but nevertheless functional. We ran NAS4Free for 2 days without glitches and it is still a contender for the “win”.

Secondly we registered for a Nexenta Enterprise trial and installed it as zfs-on-root on two built-in SAS drives. Works like a charm. The UI is a bit confusing at times, especially if you are used to the very intuitive and very slick UI of the Oracle ZFS appliances. Also it is not nearly as responsive – but it works !

The Nexenta has some advantages over its competitors, e.g. plugins to make Amazon S3 backups with your ZFS snapshots / ZFS clones – that is pretty sweet. And for that alone, it is still a contender for the win.

For now we will be running the Nexenta until the trial runs out, and hopefully FreeNAS will be ready with a version built on FreeBSD 9.1 in early June 2013.

So are you waiting for some benchmarks ?

We spent 3 days pounding the Nexenta from a VM which reads and writes extensively on the same DB file – a 200 GB one. Now, I know the difference between performance IRL and a synthetic benchmark, but the goal here was to see if we could achieve Read OPS : 100K and Write OPS : 50K.

After running the query ( show max statistics ) on the Nexenta box, it gives me the following :

Peak Read OPS : 472,400 /sec.
Peak Write OPS : 320,800 /sec.

So, is that a lot? Yep !

Please comment if you feel I have left something out or you have some ideas on how we could improve the above setup.

Thanks for reading.

ZFS. All your disk are belong to you.