MHVTL Throughput


chitvan
Hello Sir,

I have been using mhVTL with CA Arcserve 16.5 for around 2 months now. Everything has been working perfectly (many thanks) except the throughput.

I am only getting about 20 Mbps, although the link between my source and the mhVTL is a 1 Gb link, which is very low when compared to a physical VTL (very high speeds). I do understand that mhVTL is not for production and is not throughput oriented.
Also I have referred to posts like http://mhvtl-a-linux-virtual-tape-library.966029.n3.nabble.com/throughput-for-small-block-sizes-and-usleep-td4024996.html
and accordingly reduced my backoff value from the default of 400 to 40, but without much speed improvement.
Please suggest any workaround of any form to increase the throughput. Also, while going through the code I saw this (in vtllibrary.c):

default:
                                MHVTL_LOG("ioctl(0x%x) returned %d",
                                                VTL_POLL_AND_GET_HEADER, ret);
                                sleep(1);
                                break;
Could the above sleep(1) be a performance degrader?

Please suggest how to improve the throughput, ASAP.

With Best regards,
Chitvan Chhabra

Re: MHVTL Throughput

chitvan
Please suggest where I should focus: on the mhVTL code, or on iSCSI performance?

Re: MHVTL Throughput

Mark Harvey
Administrator
In reply to this post by chitvan
chitvan wrote
Hello Sir,

I have been using mhVTL with CA Arcserve 16.5 for around 2 months now. Everything has been working perfectly (many thanks) except the throughput.
Thanks for the kudos

I am only getting about 20 Mbps, although the link between my source and the mhVTL is a 1 Gb link, which is very low when compared to a physical VTL (very high speeds). I do understand that mhVTL is not for production and is not throughput oriented.

Also I have referred to posts like http://mhvtl-a-linux-virtual-tape-library.966029.n3.nabble.com/throughput-for-small-block-sizes-and-usleep-td4024996.html
and accordingly reduced my backoff value from the default of 400 to 40, but without much speed improvement.
The backoff value is the sleep time (uSeconds) between each poll looking for more work.
Without the syslog containing the loop count, it's a little hard to offer any suggestions.
Please provide the backoff value being used and the /var/log/messages (the default syslog on most Linux systems), with mhVTL logging at verbose 1 (higher verbosity just hurts performance).

Please suggest any workaround of any form to increase the throughput. Also, while going through the code I saw this (in vtllibrary.c):

default:
                                MHVTL_LOG("ioctl(0x%x) returned %d",
                                                VTL_POLL_AND_GET_HEADER, ret);
                                sleep(1);
                                break;
Could the above sleep(1) be a performance degrader?
Not sure what you mean here. Without the syslog, I do not know whether you are receiving the 'ioctl(0xnnn) returned x' messages in your logs (I hope you are not, as it would mean an unknown ioctl() is being returned to the user-space daemon).
The sleep(1) is just so it doesn't chew up all the CPU in an endless loop if this occurs. I've never seen this message occur - but the code is there to be on the safe side.

In summary:
Please re-run your tests with VERBOSE = 1 set in /etc/mhvtl/mhvtl.conf (see the snippet after this list).
Please provide the /etc/mhvtl/device.conf
Please provide the syslog (typically /var/log/messages) and the start/end times of your tests (so I know which entries in the syslog to review).
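
The verbosity change itself is just a one-line edit, something along these lines (the exact contents of mhvtl.conf vary a little between versions), and restart the mhVTL daemons afterwards so the new value takes effect:

    # /etc/mhvtl/mhvtl.conf
    VERBOSE = 1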

An FYI of what to look for:
Jun 14 11:12:43 mhnbu75 vtltape[6419]: CDB (2006) (delay 110005): 1d 00 00 00 00 00
Jun 14 11:12:43 mhnbu75 vtltape[6419]: spc_send_diagnostics(): SEND DIAGNOSTICS (2006) **
Jun 14 11:12:43 mhnbu75 vtltape[6419]: CDB (2007) (delay 805): 00 00 00 00 00 00
Jun 14 11:12:43 mhnbu75 vtltape[6419]: return_sense(): [Key/ASC/ASCQ] [02 3a 00]
Jun 14 11:12:43 mhnbu75 vtltape[6419]: ssc_tur(): Test Unit Ready (2007) ** : No, No tape loaded
Jun 14 11:12:43 mhnbu75 vtltape[6405]: CDB (2008) (delay 109605): 1d 00 00 00 00 00
Jun 14 11:12:43 mhnbu75 vtltape[6405]: spc_send_diagnostics(): SEND DIAGNOSTICS (2008) **
The 'delay' value is the total loop count between receiving this SCSI op code and the previous one.
So, in this example the Send Diagnostics (SCSI command #2006) was received 110005 uSec after the previous op code.
Command serial number 2007 (Test Unit Ready) occurred 805 uSec later, then there was a huge delay of 109605 before the following Send Diagnostics was received by the VTL.
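
To pull the 'delay' values out of the syslog quickly, something like this (a rough sketch, assuming the log format shown above) will list the largest ones:

    grep 'delay' /var/log/messages | sed -n 's/.*(delay \([0-9]*\)).*/\1/p' | sort -n | tail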

If you are seeing large delay values, the performance issue is *not* at the VTL; it is spinning its heels waiting for commands to arrive.
Regards from Australia
Mark Harvey

Re: MHVTL Throughput

chitvan
Many thanks for your reply, sir.

Please find below the information requested:

device.conf: https://drive.google.com/file/d/0B-DKBAcZ7TNRbmZYbWJ2Y0xIbDQ/view?usp=sharing
Messages log: https://drive.google.com/file/d/0B-DKBAcZ7TNRRENYNXoyYVdwOG8/view?usp=sharing

Start time: Thu 23 Jun 2016 23:52:56
End time: Thu 23 Jun 2016 23:59:21

As per my understanding, the delay is around 20000 on average. Please suggest what can be done to increase the throughput.

Best Regards,
Chitvan Chhabra

Re: MHVTL Throughput

Mark Harvey
Administrator
Jun 22 16:22:23 localhost vtllibrary[16547]: CDB (2080449) (delay 46805): 12 00 00 00 38 00
Jun 22 16:22:23 localhost vtllibrary[16547]: spc_inquiry(): INQUIRY ** (2080449)
Jun 22 16:22:23 localhost vtllibrary[16547]: CDB (2080453) (delay 405): 1a 08 1d 00 88 00
Jun 22 16:22:23 localhost vtllibrary[16547]: spc_mode_sense(): MODE SENSE 6 (2080453) **
Jun 22 16:22:23 localhost vtllibrary[16547]: CDB (2080455) (delay 805): b8 12 10 00 00 1f 00 00 0d f0 00 00
Jun 22 16:22:23 localhost vtllibrary[16547]: smc_read_element_status(): READ ELEMENT STATUS (2080455) **
Jun 22 16:22:23 localhost vtllibrary[16547]: CDB (2080456) (delay 405): b8 13 00 10 00 05 00 00 0d f0 00 00
Jun 22 16:22:23 localhost vtllibrary[16547]: smc_read_element_status(): READ ELEMENT STATUS (2080456) **
Jun 22 16:22:23 localhost vtllibrary[16547]: CDB (2080457) (delay 405): b8 14 01 00 00 05 00 00 0d f0 00 00
Jun 22 16:22:23 localhost vtllibrary[16547]: smc_read_element_status(): READ ELEMENT STATUS (2080457) **
Jun 22 16:22:23 localhost vtllibrary[16547]: CDB (2080458) (delay 805): b8 11 00 01 00 01 00 00 0d f0 00 00
Jun 22 16:22:23 localhost vtllibrary[16547]: smc_read_element_status(): READ ELEMENT STATUS (2080458) **
So, it seems the initiator can successfully send commands in a reasonable amount of time.
Here, we see 5 commands sent to the vtllibrary daemon with 405 / 805 uS delay between each command.
And this is while data is being written to the tape devices.

Yet, when we look at the delay between each write command, we see horrible performance (high delays):
Jun 22 16:22:29 localhost vtltape[16510]: processCommand(): 15300th contiguous WRITE_6 request (2081089) (delay 149450)
Jun 22 16:22:29 localhost vtltape[16510]: processCommand(): 15350th contiguous WRITE_6 request (2081139) (delay 139850)
Jun 22 16:22:30 localhost vtltape[16510]: processCommand(): 15400th contiguous WRITE_6 request (2081189) (delay 137050)
Jun 22 16:22:30 localhost vtltape[16510]: processCommand(): 15450th contiguous WRITE_6 request (2081239) (delay 137050)
Jun 22 16:22:31 localhost vtltape[16510]: processCommand(): 15500th contiguous WRITE_6 request (2081289) (delay 140250)
The delay is, let's call it, 140000 on average per 50 write commands: that is an average count of 2800 between each command. We've seen the initiator is capable of issuing commands with a count of 405 between op codes, so there is roughly a 7x performance improvement to be gained.

As the initiator can send commands to the tape library in quick succession, it does not make a lot of sense to look at the SCSI transport - otherwise all op-codes would have large delays.
I feel the best effort would be to look at the initiator and the backup application tuning to see what you can achieve there.

There is nothing wrong here - it is just basic performance trouble-shooting.

Try using something like 'dd' from the initiator to tape and see if you still get the same lack of performance. This might help narrow it down to the SCSI transport vs the actual backup application.

Try using 'dd' on the Linux host directly to the tape device and review the 'delay' values.

Note: You need to measure apples with apples, so test the *same* block size your backup application is using with the 'dd' command.
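
For example, something along these lines (a sketch only - the /dev/nst0 device node is an assumption, check which st device maps to the mhVTL drive, and a virtual tape must already be loaded); dd reports the throughput when it finishes:

    mt -f /dev/nst0 rewind
    dd if=/dev/zero of=/dev/nst0 bs=65536 count=16384    # ~1 GiB at the 64 KiB block size
    mt -f /dev/nst0 weof                                 # write a filemark (forces cached data out)
    mt -f /dev/nst0 rewind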

Regards from Australia
Mark Harvey

Re: MHVTL Throughput

Mark Harvey
Administrator
In reply to this post by chitvan
Just reviewed the device.conf and noticed this defined for the location of the 'virtual media' home:
 Home directory: /mnt/iscsi/vtl

When you described the mhVTL and using iSCSI, I assume you are exporting the mhVTL devices via iSCSI target to the initiator.

However, it is common to use the /mnt as a 'generic' mount point.

You wouldn't be storing the virtual media data files on a file system via an iSCSI device?

This could explain the significant differences in 'delay' loops between the vtllibrary (where the data is being generated from memory structures) vs data being written to a filesystem/disk via an iSCSI transport..

If you are using an iSCSI-mounted disk to store the vtl media, please check the disk performance using one of the many tools available and confirm the throughput is reasonable (e.g. a USB-2 portable HDD is capable of ~30 MBytes/sec, typically 20 MBytes/sec; USB-3 is much higher, typically upwards of 100-200 MBytes/sec).
It would be interesting to see what the I/O performance and I/O wait times are during a simple 'dd' of data to a file sitting on the /mnt/iscsi/vtl/ file system.
You always want to write at least 2x amount of ram in the host - otherwise native Linux caching will skew the results.
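
For example (a rough sketch - the file name is made up, and the count assumes roughly 4 GiB of RAM, so adjust it to at least 2x your own):

    dd if=/dev/zero of=/mnt/iscsi/vtl/dd_test.img bs=65536 count=131072 conv=fdatasync   # ~8 GiB, flushed to disk before dd exits
    iostat -x 2      # run in a second terminal to watch utilisation and I/O wait
    rm /mnt/iscsi/vtl/dd_test.img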
Regards from Australia
Mark Harvey

Re: MHVTL Throughput

chitvan
Sir,
Many thanks for your analysis.

Summary of Analysis of Log files:

Performance of the disk on which the tape media resides: dd if=/dev/sda1 of=/dev/st0 bs=65536 (with mhVTL running) gave a result of about 230 Mb/s (NetApp filer). So there are no issues from the disk performance point of view.
(Apologies for the misleading path /mnt/iscsi/vtl.) Although the name suggests iSCSI, FC is used instead of iSCSI. It was named iscsi because no FC infrastructure was available earlier.


The block size is taken as 65536 because that is the block size in which CA Arcserve is writing. This can be confirmed from the logs below: around 5.5 GB of data was written and a total of around 93000 write requests were made, clearly pointing to the data being written in a 65536-byte block size (93000 x 65536 bytes is roughly 5.7 GiB).

With Backoff Value as 40:


For ease of readability I have extracted the subset of the log from the above post, according to the times mentioned in the previous post, when the backoff value of 40 was in effect. The link is: https://drive.google.com/file/d/0B-DKBAcZ7TNRMXR2VjRCR3Z2azQ/view?usp=sharing
Summary of analysis of the above log: the total time spent in delays (as per my understanding, wasteful sleeps) per 50 writes is around 20000 usecs on average, so around 37 secs in total (for 93000 writes, roughly 1860 sets of 50 writes). For op codes I can see values like 45 and 85, and some big values like 225, 315, 28605, 29485, etc.

Summary with backoff value 10:
Log with backoff value 10: https://drive.google.com/file/d/0B-DKBAcZ7TNRVzFxZDFqRjhuUDQ/view?usp=sharing

Summary of analysis of the above log: the total time spent in delays (as per my understanding, wasteful sleeps) per 50 writes is around 10000 usecs on average, so around 18 secs in total. For op codes I can see values like 85 and 75, and big values like 6305, etc.

The total time taken with both backoff values is around 4 or 5 minutes or so, so reducing the backoff value did not reduce my overall time in a big way.

Please suggest what I should look for now; my disk at least is not the bottleneck (taking 230 Mb/s directly, it should be able to write 5.5 GB of data in around 25 secs). Where should I put my efforts now, and how?

Eagerly awaiting your analysis.

Thanks and Regards,
Chitvan Chhabra

 

Re: MHVTL Throughput

chitvan
Sir,

Any chance to review the logs?

Thanks and Regards,
Chitvan Chhabra

Re: MHVTL Throughput

chitvan
Sir,

I have also come across code like this, in write_filemarks() of vtlcart.c:
        fsync(datafile);
        fsync(indxfile);
        fsync(metafile);

If this is called for each block of data (i.e. every 65536 bytes), it would definitely slow down the disk operation, as buffers have to be flushed to disk.

If that is the case, can this code be commented out (although at the cost of the force-flush guarantee)?

Thanks and Regards,
Chitvan Chhabra

Re: MHVTL Throughput

chitvan
Or, better: is there any way to change the block size of the tape? I am aware of what changes need to be done on the CA Arcserve side to change the block size, but I am not sure about the Linux side. I tried running mt -f /dev/st0 defblksize 131072, but every time I reload the tape by running the inventory (via CA Arcserve) it reverts back to 65536 (checked in the GUI under the Operator tab and then Drive Status). Any time a tape is loaded in the drive, the drive status in the Operator view of the GUI always shows a block size of 65536. If we were able to successfully change the block size, we would be in a position to increase the performance several times over without any change to the code.
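
For reference, these are the commands I have been experimenting with on the Linux side (defblksize needs the mt-st version of mt; 262144 = 256 KB). I suspect the backup application sets the block size again (e.g. via a MODE SELECT) when it loads a tape, which would explain the revert, but I am not sure:

    mt -f /dev/st0 status              # shows the block size the st driver currently has
    mt -f /dev/st0 setblk 262144       # fixed 256 KB block size (0 would mean variable)
    mt -f /dev/st0 defblksize 262144   # mt-st: set the st driver's default block size for this device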

Re: MHVTL Throughput

Mark Harvey
Administrator
In reply to this post by chitvan
chitvan wrote
I have also come across code like this, in write_filemarks() of vtlcart.c:
        fsync(datafile);
        fsync(indxfile);
        fsync(metafile);

If this is called for each block of data (i.e. every 65536 bytes), it would definitely slow down the disk operation, as buffers have to be flushed to disk.
The SCSI SSC standards state that upon receiving a WRITE FILEMARK op code, all cached data should be written to persistent storage. While you could remove the fsync() calls in the write_filemarks() function, you would break that attempt to conform to the standard.


Regards from Australia
Mark Harvey

Re: MHVTL Throughput

Mark Harvey
Administrator
In reply to this post by chitvan
chitvan wrote
Performance of the disk on which the tape media resides: dd if=/dev/sda1 of=/dev/st0 bs=65536 (with mhVTL running) gave a result of about 230 Mb/s (NetApp filer). So there are no issues from the disk performance point of view.
(Apologies for the misleading path /mnt/iscsi/vtl.) Although the name suggests iSCSI, FC is used instead of iSCSI. It was named iscsi because no FC infrastructure was available earlier.
Hmmm... 'dd' to /dev/st0 using a 64k block size results in about 230 Mb/sec. I assume you took the values from 'dd' - and that this is in MBytes/sec and not Mbits/sec.

With Backoff Value as 40:
The use of the delay as a 'uSecond' value is incorrect - sorry.
The actual code is as follows:
            case VTL_IDLE:
                usleep(sleep_time);

                /* While nothing to do, increase
                 * time we sleep before polling again.
                 */
                if (sleep_time < 1000000)
                    sleep_time += backoff;

It is the 'sleep_time' that is recorded in logs.
i.e. the backoff value is added to sleep_time on every loop in which there is nothing to do.

So if you have a backoff value of 40, the following will occur:

The 2nd time through the poll loop, 40 will be added to the initial sleep_time (MIN_SLEEP_TIME) of 5,
i.e. 45, and a usleep(45) will occur.
Then it will be 85, 125, 165, etc.
Any 'sleep' value greater than 1,000,000 (1 second) will not increase further.
To clock up 'sleep' values in the 100-thousand range means the loop (waiting for another op code to arrive from the initiator) occurred many times.

Note: The initial idea of the 'backoff' time was this: say the initiator sends an op code every 100 uS on average. If the poll happened at, say, 95 uS and the backoff then added another 90 uS, the overall delay before the op code was processed would be 185 uS - adding 85 uSec of delay per op code, or close to 80% additional delay. In this example, changing the backoff value to, say, 100 would result in almost no additional poll delay, i.e. the first poll at usleep(5), the next after 105 uSec.

The 'downside' to making the increment small is the additional CPU / ioctl() call overhead on every loop.

HOWEVER, your case is quite different. The values are quite large, indicating the delay is in getting commands to the mhVTL user-space daemon - be it the backup application, the SCSI transport (iSCSI in this instance) or some other factor.
The fact that you are seeing 230 MB/sec using the 'dd' command also backs up this summation.

Regards from Australia
Mark Harvey

Re: MHVTL Throughput

chitvan
Sir,

Even if the backup application is slow in sending op codes, increasing the block size would certainly help in the above scenario. Please suggest how we can achieve this at the Linux level.

Thanks and Regards,
Chitvan Chhabra

Re: MHVTL Throughput

chitvan
Dear Sir,
Increasing the block size at both ends, that is at the CA Arcserve level and at the Linux level, would lessen the number of op codes sent to write the data, and hence the delay. An increase in block size would also further optimize the disk writes. Please suggest how I can change the block size in a persistent way.

Eagerly awaiting your reply.

Thanks and Regards,
Chitvan Chhabra

Re: MHVTL Throughput

chitvan
Finally able to get the block sizes correct (256 K, up from 64 K), and I can see a 2x performance improvement (for backup) in the VM test environment on my laptop (not sure about restore performance). I will test the same in the actual environment tomorrow and post the results. Hopefully it will be the same there as well.

Re: MHVTL Throughput

chitvan
Backup performance did improve 2x in the actual environment as well, although it is still at 50% of the desired level (when compared to an actual physical VTL). I am working on optimizing this and will post if there is any progress.

Thanks and Regards,
Chitvan Chhabra  

Re: MHVTL Throughput

chitvan
Sir,

I am kind of badly stuck here. Please suggest what can be done to optimize iSCSI or TCP/IP (or something else) to gain the remaining 50% of the performance, and how to find the bottleneck.
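
So far my plan is to first rule out the raw network path between the backup server and the mhVTL host with something like iperf3 (a rough sketch; the host name is a placeholder), and then revisit the transport and application settings:

    iperf3 -s                      # on the mhVTL host
    iperf3 -c mhvtl-host -t 30     # on the backup server / initiator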

Thanks and Regards,
Chitvan Chhabra