by sloony67 » Fri Jan 22, 2010 10:59 pm
Just kidding .. 
Last edited by sloony67 on Sat Feb 20, 2010 6:40 am, edited 1 time in total.
- sloony67
- Member
-
- Posts: 10
- Joined: Thu Jan 14, 2010 9:55 pm
by nia » Sat Jan 23, 2010 12:51 am
Nice wish actually ... Hardware-based target de-duplication is becoming very popular with enterprise VTLs nowadays ..
~nia
- nia
- Forum Founder
-
- Posts: 273
- Joined: Sat Dec 12, 2009 12:51 pm
- Location: USA
by nia » Fri Feb 19, 2010 1:51 pm
Thanks, I did not know about lessfs. I was still waiting for zfs on Linux to support dedupe which I don't think it does yet.
~nia
by nia » Fri Feb 19, 2010 11:59 pm
I found this also: http://www.tummy.com/journals/entries/j ... 209_050553
Wednesday December 09, 2009 at 05:27
Subject: ZFS dedup Available in ZFS-FUSE Keywords: Dedup, Linux, Technical, ZFS
Posted by: Sean Reifschneider
The 0.6.0 ZFS-FUSE release doesn't include dedup, not surprisingly. I did some digging around and I found this git repository which has a version of ZFS-FUSE that includes the dedup code:
git clone 'http://rainemu.swishparty.co.uk/git/zfs' zfs-fuse-dedupe
I have not tested it yet, but it would be interesting if it works as expected.
~nia
by nia » Sat Feb 20, 2010 6:10 am
I have actually downloaded zfs-fuse, the dedup version, and got it installed already ..
I had to get it from http://rainemu.swishparty.co.uk/cgi-bin ... ;a=summary
because git gave me trouble.
Now I have a Gentoo box running mhvtl with an iSCSI target, and /opt/mhvtl is a ZFS file system with the property dedup=on:
scst-mhvtl ~ # zfs list mhvtl/library
NAME USED AVAIL REFER MOUNTPOINT
mhvtl/library 1.99G 1.95T 1.99G /opt/mhvtl
scst-mhvtl ~ # zfs get -r dedup mhvtl/library
NAME PROPERTY VALUE SOURCE
mhvtl/library dedup on local
scst-mhvtl ~ #
~nia
by sloony67 » Sat Feb 20, 2010 6:44 am
WOW .. 
Cool ....
I thought I was really kidding, but it is true .. I did not know about "lessfs" and "zfs" filesystems that can do deduplication ... I learn something new every day ...
Good Stuff ..
by nia » Sun Feb 21, 2010 7:19 pm
I am still unable to verify whether dedup is actually working !!!
I have not seen any change in the DEDUP ratio yet, see below:
scst-mhvtl ~ # zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
mhvtl 1.98T 14.7G 1.97T 0% 1.00x ONLINE -
scst-mhvtl ~ #
~nia
by nia » Mon Feb 22, 2010 1:00 am
zfs dedup does not work on mhvtl tape data format !!!
scst-mhvtl ~ # file /opt/mhvtl/SDLT01S3/data
/opt/mhvtl/SDLT01S3/data: VAX COFF executable - version 7926
scst-mhvtl ~ #
I will have to read up more on this ..
~nia
by nia » Mon Feb 22, 2010 4:56 am
I found out that with dedup enabled, ZFS will identify and remove duplicates regardless of the data format.
So in the case of mhvtl, the tape data files have to contain the same data, e.g.:
2049646 -rw-rw---- 1 vtl vtl 2097157724 Feb 21 20:58 /opt/mhvtl/SDLT01S3/data
2049643 -rw-rw---- 1 vtl vtl 2097157724 Feb 21 21:02 /opt/mhvtl/SDLT02S3/data
zpool list now shows:
scst-mhvtl ~ # zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
mhvtl 1.98T 16.8G 1.97T 0% 1.12x ONLINE -
But as soon as I dump a few more files onto one of the tapes, dedup is gone.
Conclusion:
ZFS dedup will not be a practical solution for mhvtl, as duplicate data is highly unlikely.
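A minimal Python sketch of fixed-size block dedup, for readers who want to see the mechanism. This is not ZFS code: the 128 KiB block size, the SHA-256 checksum, and the names `build_table`, `dedup_ratio` and `pseudo_random_bytes` are assumptions made for illustration. It only shows that two byte-identical files collapse to one set of stored blocks whatever their format, while unrelated files do not.

```python
import hashlib
import random
from collections import Counter

BLOCK_SIZE = 128 * 1024  # assumed block size for this sketch


def build_table(files, block_size=BLOCK_SIZE):
    """Count how often each block checksum is referenced across all files."""
    table = Counter()
    for data in files:
        for off in range(0, len(data), block_size):
            table[hashlib.sha256(data[off:off + block_size]).digest()] += 1
    return table


def dedup_ratio(table):
    """Referenced blocks divided by unique blocks that must be stored."""
    return sum(table.values()) / len(table) if table else 1.0


def pseudo_random_bytes(seed, size):
    """Deterministic stand-in for opaque tape data (hypothetical helper)."""
    return random.Random(seed).getrandbits(size * 8).to_bytes(size, "little")


tape_a = pseudo_random_bytes(1, 4 * BLOCK_SIZE)
tape_b = bytes(tape_a)                           # byte-identical copy
tape_c = pseudo_random_bytes(2, 4 * BLOCK_SIZE)  # unrelated data

same_ratio = dedup_ratio(build_table([tape_a, tape_b]))
diff_ratio = dedup_ratio(build_table([tape_a, tape_c]))
print(same_ratio, diff_ratio)  # 2.0 1.0
```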
~nia
by sloony67 » Mon Feb 22, 2010 4:04 pm
Bummer 
by nia » Tue Feb 23, 2010 12:35 am
Update and good news:
I am able to use the ZFS deduplication feature with mhvtl after all.
The key was to turn off compression in mhvtl. That is just what I did, and now I have some deduped data in ZFS for /opt/mhvtl.
The file type is now reported as "data" instead of "VAX COFF":
scst-mhvtl ~ # file /opt/mhvtl/SDLT63S3/data
/opt/mhvtl/SDLT63S3/data: data
I ended up using ZFS built-in compression instead of mhvtl's native compression.
Now ZFS shows the dedup stats listed below:
scst-mhvtl ~ # zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
mhvtl 49.8G 3.51G 46.2G 7% 1.37x ONLINE -
But ... it only appears to be working for backup application tape cloning, inline copy and duplication jobs. This is what I have noticed so far. Since I have not tested enough, I will have to confirm ..
More to come later ..
~nia
by nia » Wed Feb 24, 2010 12:36 am
UPDATE:
Ok, I got it all wrong again .. This has been somewhat confusing..
First off mhvtl tape compression has nothing to do with not being able to dedup in zfs ..
Here is some more testing that I did which will make the picture more clear:
I am using mtx to control the library robot and tar to write data:
mtx -f /dev/sg10 load 1 1
mtx -f /dev/sg10 load 2 2
Test #1
>>> tar -cvf /dev/st1 /root/*
scst-mhvtl ~ # zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
mhvtl 398G 117M 398G 0% 1.00x ONLINE -
Test #2
>>> tar -cvf /dev/st2 /root/*
scst-mhvtl ~ # zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
mhvtl 398G 175M 398G 0% 1.99x ONLINE -
Test #3
>>> tar -rf /dev/st1 /root/*
scst-mhvtl ~ # zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
mhvtl 398G 253M 398G 0% 1.49x ONLINE -
Test #4
>>> tar -rf /dev/st2 /root/*
scst-mhvtl ~ # zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
mhvtl 398G 232M 398G 0% 1.99x ONLINE -
Test #5
>>> tar -rf /dev/st1 /usr/x86_64-pc-linux-gnu/*
scst-mhvtl ~ # zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
mhvtl 398G 235M 398G 0% 1.98x ONLINE -
Test #6
>>> tar -rf /dev/st2 /usr/x86_64-pc-linux-gnu/*
scst-mhvtl ~ # zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
mhvtl 398G 235M 398G 0% 1.99x ONLINE -
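The DEDUP column in Tests #1 to #6 can be roughly reproduced with a toy model. The assumptions here are not from the thread: the ratio is taken as referenced data divided by stored data, an archive is assumed to dedup only against identical content that starts at the same offset on another tape, one tar of /root/* is the unit of size, and the /usr/x86_64-pc-linux-gnu archive is given a hypothetical relative size of 0.05. The function name `pool_ratio` is made up for this sketch.

```python
def pool_ratio(tapes):
    """Referenced size divided by stored size under the toy model."""
    referenced = 0.0
    stored = {}  # (start offset, label) -> size, stored only once
    for segments in tapes.values():
        offset = 0.0
        for label, size in segments:
            referenced += size
            stored[(offset, label)] = size
            offset += size
    return referenced / sum(stored.values())


ROOT = ("root", 1.0)  # one tar of /root/*, used as the unit of size
USR = ("usr", 0.05)   # assumed small relative size, not measured

tapes = {"st1": [], "st2": []}
# Write order of Tests #1 to #6: drive, then the archive appended to it.
steps = [("st1", ROOT), ("st2", ROOT), ("st1", ROOT),
         ("st2", ROOT), ("st1", USR), ("st2", USR)]

ratios = []
for drive, segment in steps:
    tapes[drive].append(segment)
    ratios.append(round(pool_ratio(tapes), 2))

print(ratios)  # [1.0, 2.0, 1.5, 2.0, 1.98, 2.0]
```

The model lands close to the reported 1.00x, 1.99x, 1.49x, 1.99x, 1.98x and 1.99x. The small gap may come from pool metadata, which this sketch ignores.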
So far so good as expected and hoped, now this:
Test #7
>>> tar -cvf /dev/st1 /var/log/*
scst-mhvtl ~ # zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
mhvtl 398G 434M 398G 0% 1.00x ONLINE -
As you can see, we lost all deduped data, which is also expected.
Test #8
>>> tar -rf /dev/st2 /var/log/*
scst-mhvtl ~ # zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
mhvtl 398G 666M 397G 0% 1.00x ONLINE -
Test #9
>>> tar -rf /dev/st1 /root/*
scst-mhvtl ~ # zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
mhvtl 398G 838M 397G 0% 1.00x ONLINE -
The last two tests appended the same data as before to both tapes, but in a different order. The result is no dedup.
Conclusion:
As you can see, dedup is only achieved when exactly the same data is written across multiple volumes to exactly the same blocks, i.e. in the same order on each tape ..
Makes sense, right? This is sequential tape, not random-access disk.
I found that NetBackup in-line copy and Vault duplication work pretty well, but NetWorker cloning not so much.
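The conclusion above can be illustrated with a small Python sketch of fixed-size block hashing. It is a simplification, not ZFS or mhvtl code: the 4 KiB block size, the SHA-256 checksum and the helper names are assumptions chosen to keep the demo fast. The same payload is placed on a second "tape" once after a prefix that is a whole number of blocks, and once after a prefix that is 512 bytes longer.

```python
import hashlib
import random

BLOCK_SIZE = 4096  # small assumed block size to keep the demo fast


def block_hashes(data, block_size=BLOCK_SIZE):
    """Checksums of consecutive fixed-size blocks, starting at offset 0."""
    return [hashlib.sha256(data[off:off + block_size]).digest()
            for off in range(0, len(data), block_size)]


def shared_blocks(tape_a, tape_b):
    """Number of distinct blocks that both tapes have in common."""
    return len(set(block_hashes(tape_a)) & set(block_hashes(tape_b)))


def pseudo_random_bytes(seed, size):
    """Deterministic stand-in for opaque backup data (hypothetical helper)."""
    return random.Random(seed).getrandbits(size * 8).to_bytes(size, "little")


payload = pseudo_random_bytes(1, 32 * BLOCK_SIZE)               # the same backup data
aligned_prefix = pseudo_random_bytes(2, 8 * BLOCK_SIZE)         # whole number of blocks
shifted_prefix = pseudo_random_bytes(3, 8 * BLOCK_SIZE + 512)   # 512 bytes too long

aligned = shared_blocks(payload, aligned_prefix + payload)
shifted = shared_blocks(payload, shifted_prefix + payload)
print(aligned, shifted)  # 32 0
```

With the aligned prefix every payload block is found again; with the 512-byte shift every block boundary cuts the payload at a different place, so no block matches.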
~nia
by rami766 » Sat Feb 27, 2010 1:52 am
This may be very useful in some cases. I still think mhvtl should develop native dedupe in the future, which could be a lot better than what zfs-fuse can do right now.
Rami
- rami766
- Member
-
- Posts: 42
- Joined: Sat Aug 14, 2010 12:04 am
by markh794 » Sat Feb 27, 2010 6:28 am
By the time the data stream is cut & diced and within a SCSI command block, I feel the chances of finding another block the same will be slim.
It needs to be chopped & diced at the source (i.e. Reading the file(s) at the file system) rather than after <insert backup software here> has chopped & diced, added its bit to it and packaged it up for writing to the tape device.
Even if the backup software is tar or cpio.
If the 'de-duplication engine' could be signaled the start of each file and start the de-duplication at the start of the file, we might have some hope.
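One known way to make chunk boundaries follow the data rather than the position in the stream is content-defined chunking. It is not mentioned in this thread and nothing here says that mhvtl, ZFS or any product uses it; the sketch below is only a toy with a made-up gear table, a 10-bit cut mask and hypothetical helper names. It compares how many chunks a payload shares with a copy of itself that sits behind a 1000-byte prefix.

```python
import hashlib
import random

_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]  # random value per byte
MASK = (1 << 10) - 1   # cut when the low 10 hash bits are zero (~1 KiB average)
WORD = (1 << 64) - 1   # keep the rolling hash within 64 bits


def cdc_chunks(data):
    """Cut wherever a hash of the most recent bytes matches the mask."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & WORD
        if h & MASK == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def fixed_chunks(data, size=1024):
    """Cut at fixed offsets, for comparison."""
    return [data[off:off + size] for off in range(0, len(data), size)]


def shared(chunker, a, b):
    """Number of distinct chunks the two inputs have in common."""
    digests_a = {hashlib.sha256(c).digest() for c in chunker(a)}
    digests_b = {hashlib.sha256(c).digest() for c in chunker(b)}
    return len(digests_a & digests_b)


def pseudo_random_bytes(seed, size):
    """Deterministic stand-in for opaque backup data (hypothetical helper)."""
    return random.Random(seed).getrandbits(size * 8).to_bytes(size, "little")


payload = pseudo_random_bytes(1, 64 * 1024)
shifted = pseudo_random_bytes(2, 1000) + payload  # prefix is not a multiple of 1024

fixed_shared = shared(fixed_chunks, payload, shifted)
cdc_shared = shared(cdc_chunks, payload, shifted)
print(fixed_shared, cdc_shared)
```

Fixed offsets find nothing in common once the data is shifted, while the content-defined cuts fall at the same places inside the payload, so most of its chunks are found again.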
My 2c worth.
Cheers
Mark
- markh794
- MHVTL - Developer
-
- Posts: 101
- Joined: Sat Feb 20, 2010 6:30 pm
- Location: Sydney, Australia
by rami766 » Sat Feb 27, 2010 3:56 pm
It needs to be chopped & diced at the source
I don't know what kind of technology EMC® Data Domain® deduplication storage systems use, but they claim that it all happens on the target system; the backup application does not know about any data being deduped.
Rami
by herve » Sat Mar 06, 2010 8:34 am
But as soon as I dump a few more files onto one of the tapes, dedup is gone.
Conclusion: ZFS dedup will not be a practical solution for mhvtl, as duplicate data is highly unlikely.
I don't understand. ZFS is supposed to do block-level dedup, not file-level, so adding data at the end of a tape shouldn't have any impact.
I use dedup with OpenSolaris, not FUSE, and I am plenty happy with it.
Perhaps porting mhvtl to OpenSolaris would be a good idea (robust file system with compression and dedup, robust and easy SAN access with COMSTAR).
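The block-level argument can be checked in a toy model. This is a sketch under assumptions, not ZFS code: a 4 KiB block size, SHA-256 checksums and hypothetical helper names. Two identical "tapes" are compared before and after unique data is appended to one of them.

```python
import hashlib
import random

BLOCK_SIZE = 4096  # small assumed block size to keep the demo fast


def block_hashes(data, block_size=BLOCK_SIZE):
    """Checksums of consecutive fixed-size blocks, starting at offset 0."""
    return [hashlib.sha256(data[off:off + block_size]).digest()
            for off in range(0, len(data), block_size)]


def shared_blocks(tape_a, tape_b):
    """Number of distinct blocks that both tapes have in common."""
    return len(set(block_hashes(tape_a)) & set(block_hashes(tape_b)))


def pseudo_random_bytes(seed, size):
    """Deterministic stand-in for opaque tape data (hypothetical helper)."""
    return random.Random(seed).getrandbits(size * 8).to_bytes(size, "little")


# Ten full blocks plus a 100-byte partial block at the end.
tape1 = pseudo_random_bytes(1, 10 * BLOCK_SIZE + 100)
tape2 = bytes(tape1)  # identical second tape

shared_before = shared_blocks(tape1, tape2)

# Append three blocks of new, unique data to the first tape only.
tape1_appended = tape1 + pseudo_random_bytes(2, 3 * BLOCK_SIZE)
shared_after = shared_blocks(tape1_appended, tape2)

print(shared_before, shared_after)  # 11 10
```

In this model only the final partial block stops matching after the append; the ten full blocks written earlier are still shared, although the overall ratio is diluted by the new unique data.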
- herve
- Registered
-
- Posts: 9
- Joined: Sat Mar 06, 2010 6:46 am
by nia » Sat Mar 06, 2010 2:54 pm
I don't understand. ZFS is supposed to do block-level dedup, not file-level, so adding data at the end of a tape shouldn't have any impact.
I don't understand it either; I am not sure why it did have an impact when I was trying to get mhvtl tapes to dedupe.
I use dedup with OpenSolaris, not FUSE, and I am plenty happy with it.
I am not sure about the Linux version I am using, but I would like to see zfs-fuse dedupe released as stable on Linux so I can test again.
Perhaps porting mhvtl to OpenSolaris would be a good idea
It has been discussed several times in the OpenSolaris forums, but no project has started yet.
As of right now, mhvtl is the only open source VTL on the market.
We don't need to wait for OpenSolaris and COMSTAR, Linux + mhvtl will just do.
~nia
by herve » Sun Mar 07, 2010 8:47 pm
There are two projects for a VTL on Solaris:
One from Nexenta: http://www.nexentastor.org/projects/vtape/repository
The second on GitHub: http://github.com/imp/stmfssd
But both are far from being usable.
You're right, MHVTL is the only solution "close to" a market solution.
I am a bit sceptical about MHVTL + SCST (too many errors), and STGT does not seem to be a short-term solution.
I'll do another test, replacing IBM LTO with SDLT.
We don't need to wait for OpenSolaris and COMSTAR, Linux + mhvtl will just do. 
did you try ZFS + comstar ?
by nia » Sun Mar 07, 2010 10:08 pm
I am a bit sceptical about MHVTL + SCST (too many errors)
I am surprised. I am actually very happy with it so far; I hardly have any issues. I also have multiple systems connecting at the same time with no issues. Mine is the Gentoo setup.
did you try ZFS + comstar ?
Yes, for disk. Tape + changer is not supported yet.
~nia