[Disksim-users] Serious problems using dixtrac extracted parameters

Sat Jul 6 03:43:27 EDT 2013

Hi Anjo,

Thanks for getting back to me. I used the maxtor300g/maxtor146g models
that were released and I did look at the pdfs from the validation
stage, which do display a good correlation.

However! When you feed a completely sequential load into disksim, the
performance is on par with a completely random load, which absolutely
does not make sense.
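
To put numbers on this, here is a minimal C sketch that computes the sequential-to-random run-time ratio from the timings quoted in the original message below. The struct and the helper name seq_to_random_ratio are made up for illustration and are not part of disksim. A ratio above 1.0 means the sequential trace took longer than the random one.

```c
#include <assert.h>

/* Timings in milliseconds, copied from the original message below. */
struct trace_times {
    const char *model;
    double random_ms;
    double sequential_ms;
};

static const struct trace_times reported[] = {
    { "HP C2490A (2GB)", 2232995.378781, 624599.551791 },
    { "Maxtor 300GB",     237932.073905, 619900.348468 },
    { "Maxtor 146GB",     345077.024293, 329766.817393 },
    { "Seagate 146GB",    209403.240946, 423689.049873 },
};

/* Illustrative helper, not part of disksim: a ratio above 1.0 means the
 * sequential trace took longer than the random trace. */
static double seq_to_random_ratio(const struct trace_times *t)
{
    assert(t->random_ms > 0.0);
    return t->sequential_ms / t->random_ms;
}
```

Applied to each entry, this gives roughly 0.28 for the HP C2490A, 2.61 for the 300GB Maxtor, 0.96 for the 146GB Maxtor, and 2.02 for the Seagate.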

I tracked the issue down to the disk_buffer_sector_done function in
disksim_diskctlr.c, which contains some useful printf() lines. What I
describe here was observed with a trace containing two sequential
reads on maxtor300g*. It takes only a single sector for bcount (a
field which is used as a state identifier in this function!) to go to
1. As a result, the model misses the chance to satisfy this
condition:

  if(curr->bcount == 1) {
    if(currblkno == firstblkno) {

As a result, it misses the opportunity to finish the second request
immediately and instead does a complete seek around the track before
starting the second request, which is pretty terrible.

As the code in this function is really complicated, a simple fix does
not work (I tried and started tripping asserts). The only workaround
that works for me (as I am making use of syssim) is to intentionally
trim the requests (provided they are large enough) by one or two
sectors, which lets the disk model start servicing the next request
immediately. This workaround gave me a 4-5x speedup in disk model
performance.
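
The trimming workaround can be sketched roughly as follows. This is a minimal illustration in C, not the actual code: the helper name trimmed_bcount, the min_bcount threshold, and the trim amount are all hypothetical, and the call that submits the request to disksim from syssim is deliberately left out.

```c
#include <assert.h>

/* Hypothetical helper, not part of disksim or syssim.
 * Returns the number of sectors to submit for a request of `bcount`
 * sectors: requests of at least `min_bcount` sectors are shortened by
 * `trim` sectors, smaller requests are returned unchanged. */
static int trimmed_bcount(int bcount, int min_bcount, int trim)
{
    assert(bcount > 0);
    assert(trim >= 0);

    /* Leave small requests alone, and never trim a request to nothing. */
    if (bcount < min_bcount || bcount <= trim) {
        return bcount;
    }
    return bcount - trim;
}
```

For example, assuming 512-byte sectors, a 16KB read is 32 sectors, so trimmed_bcount(32, 8, 1) returns 31, while trimmed_bcount(4, 8, 1) leaves a 4-sector request untouched. The threshold of 8 sectors is an arbitrary value chosen for illustration.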

I did a comparison with one of the included disks (hp_c2490a). This
model appears to be much less detailed. In any case, bcount apparently
stays at 1 for both disksim requests, so I did not observe this bug
with it!

* I only watched the output from maxtor300g, but I suspect the same
holds for the rest of the Dixtrac-generated models, because they show
the same disk performance problem.

Regards,
-Tarihi

On Mon, Jun 17, 2013 at 2:43 PM, Anjo Vahldiek <vahldiek at mpi-sws.org> wrote:
> Hi,
>
> In general it is difficult to reason about differences without knowing how
> you issued the requests to disks (e.g. O_direct, by SCSI cmd, flushing
> caches, ...) and which exact models you used.
>
> My suggestion is that you look at the validation step of dixtrac. At the end of
> dixtrac it runs a validation workload and produces 4 pdfs showing the
> difference between simulated & real device (called mixed(.hist).pdf and
> random(.hist).pdf, which are part of the tar). Especially the mixed workload
> in case of 146g shows a significant difference. The 300g model shows less
> difference (not sure which of the two you used). You should be able to use
> the code part to produce the same pdfs using your disks instead of ours.
>
> Hopefully this helps.
>
> Thanks,
> Anjo
>
>
> On Mon 17 Jun 2013 09:39:58 AM CEST, Mujtaba Tarihi wrote:
>>
>> Dear all,
>>
>> I have been attempting to make use of some parameters extracted via
>> dixtrac, such as the ones graciously provided by Anjo Vahldiek.
>> However, I have come across very surprising results, performance-wise.
>> I generated a random trace of aligned 16KB read requests and a
>> sequential trace of 16KB back-to-back read requests.
>> This is how long it took to run each trace (in milliseconds).
>> For the record, these are the results obtained with the 2GB HP C2490A, which
>> is supplied along with Disksim:
>> Random: 2232995.378781
>> Sequential: 624599.551791
>>
>> To make the comparison (somewhat) fair, I had to use the same random
>> trace for the larger drives:
>>
>> And these are the results obtained with the 300GB Maxtor drive:
>> Random: 237932.073905
>> Sequential: 619900.348468
>>
>> And these are the results obtained with the 146GB Maxtor drive:
>> Random: 345077.024293
>> Sequential: 329766.817393
>>
>> And these results are from the 146GB Seagate drive:
>> Random: 209403.240946
>> Sequential: 423689.049873
>>
>> Basically, with the exception of a *slight* difference in the case of the
>> 146GB Maxtor file, the sequential trace takes longer to run, roughly
>> 2.6x as long in the worst case (the 300GB Maxtor)!
>>
>> I just copied the files out of the tar.gz archives and used them;
>> layout.mappings is not necessary.
>>
>> My guess is that the model generated by the tool has issues with
>> modelling the firmware behavior, something like the scheduling algorithm,
>> or maybe it is botching up the mappings?
>>
>> Any help would be more than welcome :)
>>
>> Regards,
>> -Tarihi
>> _______________________________________________
>> Disksim-users mailing list
>> Disksim-users at ece.cmu.edu
>> https://sos.ece.cmu.edu/mailman/listinfo/disksim-users