Saturday, December 15, 2012

ZFS

Finally I managed to get my desktop PC back online. After changing its motherboard, memory and hard drive, it's ready to use. Besides regular use, my idea is to store all my content on it until I get a proper NAS (if that ever happens).

Eventually I came to the point where I had to choose which file system to use. I could stay with ext4, but that's not really the best option out there: it is a very popular and solid file system for general use, but there are other options that provide, for example, better data integrity.

And ZFS has a reputation as one of the safest file systems around. Originally developed by Sun, it's designed to provide excellent data integrity. Unfortunately for Linux users, due to license conflicts it can't be included in the kernel; nonetheless, there is an implementation using FUSE, which I plan to use and show here.
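For reference, installing it is usually just a package away; on a Debian-based system it would look something like this (assuming your distribution ships a zfs-fuse package):
# apt-get install zfs-fuse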

The content that I plan to store is mostly family pictures, videos and music, all of which already use some kind of compression, so a file system with compression will not do much for me. ZFS supports compression but I won't use it; if you plan to store text files or uncompressed formats, then it's probably a good idea to give it a try.
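For the record, enabling it later is a single property change (testpool being the pool I create below):
# zfs set compression=on testpool
# zfs get compression testpool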

After installing it and getting it running (it has a daemon called zfs-fuse), you'll find that ZFS is more than just a file system: it is also a volume manager. That means you don't even need to partition the disk; you can give it a whole disk to use, or a partition, or even a file, and a file is what I will use for testing here.

First let's create the file. I'll use a 512MB file as the data store (131072 blocks of 4096 bytes = 512MB):
# dd if=/dev/zero of=rawdisk.1 bs=4096 count=131072
After that we create a pool, which is like a container that can hold multiple volumes; for this test I will use it as it is:
# zpool create testpool /home/paco/zfs-test/rawdisk.1
This not only creates the file system, it also mounts it on /testpool so it is ready to use. The information about the pool is stored in /var/lib/zfs/.
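You can confirm that the pool is there and mounted with the usual commands:
# zfs list
# df -h /testpool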

The status of the pool can be checked with:
# zpool status
 pool: testpool
 state: ONLINE
 scrub: none requested
config:

    NAME                             STATE     READ WRITE CKSUM
    testpool                         ONLINE       0     0     0
      /home/paco/zfs-test/rawdisk.1  ONLINE       0     0     0

errors: No known data errors
So far so good. Now I have a couple of images that I will copy there.
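Copying them in is nothing special; the source paths below are just an example from my setup (the file names are the same ones that show up later in the scrub output):
# cp ~/pictures/1920-1200-11295.jpg ~/pictures/1920-1200-217.jpg /testpool/
After that, let's unmount the device: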
# zfs unmount /testpool
Now I'll try to simulate data corruption. Since the device backing the pool is a file, I can modify some of its bytes directly and see what happens. The JPEG images I copied contain the marker "JFIF" near the start, so let's find that in the file:
# hexdump -C rawdisk.1 | less 

0040ac00  ff d8 ff e0 00 10 4a 46  49 46 00 01 02 01 00 48  |......JFIF.....H|
0040ac10  00 48 00 00 ff e1 1d 7a  45 78 69 66 00 00 49 49  |.H.....zExif..II|
0040ac20  2a 00 08 00 00 00 0f 00  0f 01 02 00 09 00 00 00  |*...............|
0040ac30  c2 00 00 00 10 01 02 00  10 00 00 00 cb 00 00 00  |................|
0040ac40  12 01 03 00 01 00 00 00  01 00 00 00 1a 01 05 00  |................|

Let's add some disturbance to "JFIF": we'll overwrite the "IF" part with random data:
# echo $(((0x0040ac00)+8))

 4238344

# dd if=/dev/random of=rawdisk.1 seek=4238344  bs=1 count=2 conv=notrunc 

Let me explain this a bit. The first column of the hexdump is the offset; in our case it is 0x0040ac00 (hexadecimal), which is the position of the first byte, "ff". From there we start counting: +1 is d8, +2 is ff, +3 is e0, and so on until we get to "IF", which is bytes 49 and 46 at offsets +8 and +9 (man ascii if you want to double check). With this information we use dd to write two random bytes from /dev/random at that position. You can, for example, hexdump the file again and grep for 0040ac00; you will notice that "JFIF" has become "JF**", where * is something random.
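Spelled out, that verification is:
# hexdump -C rawdisk.1 | grep 0040ac00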

I repeated this a couple of times and then mounted the pool again:
# zfs mount testpool

So far so good. Now let's run a scrub to check the data:
# zpool scrub testpool

And let's check the output:
# zpool status testpool   
  pool: testpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub completed after 0h0m with 2 errors on Fri Dec 14 23:59:27 2012
config:

    NAME                             STATE     READ WRITE CKSUM
    testpool                         ONLINE       0     0     2
      /home/paco/zfs-test/rawdisk.1  ONLINE       0     0     4

errors: 2 data errors, use '-v' for a list

It managed to detect 2 errors. Let's see more detail:
# zpool status -v testpool
  pool: testpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub completed after 0h0m with 2 errors on Fri Dec 14 23:59:27 2012
config:

    NAME                             STATE     READ WRITE CKSUM
    testpool                         ONLINE       0     0     2
      /home/paco/zfs-test/rawdisk.1  ONLINE       0     0     4

errors: Permanent errors have been detected in the following files:

        /testpool/1920-1200-11295.jpg
        /testpool/1920-1200-217.jpg

These are the two image files that we modified. Very impressive! I'm not using a mirror, so it can't repair them, but it is possible to add another disk/partition/file as a mirror, in which case it can recover the files from the good copy.
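A quick sketch of what that would look like, assuming a second 512MB backing file called rawdisk.2 (hypothetical, I haven't run this part):
# dd if=/dev/zero of=rawdisk.2 bs=4096 count=131072
# zpool attach testpool /home/paco/zfs-test/rawdisk.1 /home/paco/zfs-test/rawdisk.2
After the resilver finishes, zpool status should show the two files as a mirror, and a scrub can then repair corrupted blocks from the healthy side. When you're done playing, the whole test pool goes away with zpool destroy testpool.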

References:
  1. Wikipedia's article about ZFS
  2. ZFS implementation for Linux using fuse
  3. Solaris ZFS Administration Guide 
Spanish version here (linuxlatino)