A few weeks back, an undisclosed location had a technology waste “clean-up” which was definitely a golden opportunity for any tech-salvagist and word-inventor like myself. Interestingly, this salvage opportunity netted a few desktop computers, a sizable number of sticks of RAM, as well as many hard drives. Oh what fun!
Rarely seen these days, part of the haul consisted of SCSI Ultra320 hard drives with 80-pin Single Connector Attachment (SCA) connectors: one 73GB Seagate Cheetah 10K6 and five 73GB Seagate Cheetah 10K7 drives. It seems likely that all of the drives were used in a RAID array of some sort in a server, as they were all housed in the same style of caddy when received. The Cheetah name was famed for high performance, and the 10K7 series represents the last of the SCSI generation of drives. What memories!
As usual, it seems, discovering the hardware led me on a journey of discovery – there’s more than meets the eye when it comes to a few old drives.
Reading Out the Drives
Luckily for me, and unluckily for the former owner of the drives, I had an LSI Logic Ultra160 SCSI card along with an appropriate SCA adapter, LVD Wide SCSI cable and terminators. Out came my old recovery box with Linux to see what I could get out of them.
My hypothesis that the units were used in hardware RAID seems to be well founded, as none of the drives had a recognizable file system. Attempting to read the drives also ran into a lot of trouble, with all drives but one hitting numerous read errors. At the end of it all, two of the drives (the 10K6 and one 10K7) had damaged themselves so badly that they were unable to read their firmware, and thus kept returning “not ready” with a capacity of 0 LBA while connecting in Asynchronous Narrow SCSI mode. It seems the drives had seen much better days.
Instead of making a regular audio recording, I decided to use my magnetic pick-up system and record the sound of powering the drive on and off, and having it read and hit a read error. One thing you’ll notice is that the drive is very musical, with swept tones which are audible from the drive. This seems to serve as an audible warning of trouble, as well as potentially being used to “shake” debris from the head or micro-step the heads across the target sector.
Because of the lost data, reconstructing the RAID array was not a possibility, so I instead tried to understand the drive’s internal state.
SMART for SCSI?
Not having really “lived” through the SCSI era practically, as it was typically an “expensive” bus, I was on the back foot when it came to learning about an obsolete form of connectivity. Luckily, I came across the pages for The Linux SCSI Generic (sg) Driver, which taught me a lot about SCSI and what can be done, down to reading mode pages, sending raw commands and other nifty utilities for working with SCSI devices.
As it turns out, SCSI drives actually can support SMART style drive status information, which is known as Informational Exceptions (or IE for short). Unlike SMART for IDE/SATA drives which have semi-defined attributes, thresholds and values, SCSI drives tend to be quite variable in the information provided and its interpretation.
You can retrieve the SMART information using smartctl under Linux, issuing sudo smartctl -x /dev/sdx where /dev/sdx is the drive being checked. For example, the output from one of the drives is as follows:
It has some counters which show you the power on hours, time to the next SMART test, total written/read and correction invocations, as well as the non-medium error (e.g. cable error) count. The number of reallocated sectors is also visible in the “elements in grown defect list” line.
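As a rough sketch of how that reallocation count could be pulled out of saved smartctl text for comparison across drives, the Python below searches for the “elements in grown defect list” line. The function name grown_defect_count and the sample line are made up for illustration, and the exact wording of the line may differ between smartctl versions.

```python
import re

# Matches the grown defect list line, ignoring case and extra spacing.
GROWN_DEFECTS = re.compile(r"elements in grown defect list:\s*(\d+)", re.IGNORECASE)


def grown_defect_count(smartctl_output):
    """Return the grown defect count found in the text, or None if absent."""
    match = GROWN_DEFECTS.search(smartctl_output)
    return int(match.group(1)) if match else None


# Illustrative line only; not copied from any of the salvaged drives.
sample_output = "Elements in grown defect list: 12\n"
print(grown_defect_count(sample_output))
```

A drive reporting zero here has not reallocated any sectors since leaving the factory; anything above zero is worth tracking over time.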
SCSI drives themselves have a lot of integrated intelligence, and as a result, can be reconfigured using both standardized and vendor specific commands. I found that most of the information for a SCSI target can be retrieved using sginfo. For example, sudo sginfo -A -d -fPhysical /dev/sdx lists all of the supported mode pages and their values, along with the defect lists of the drive. Pretty cool information which is normally “hidden” in regular ATA drives, in the sense that the drives internally handle defects and shield them from the end user. Some sample output includes the physical device parameter mode pages (although the values are not necessarily sensible for zoned-bit-recording drives and may represent averages sometimes), and part of a defect listing.
I also found it was possible to instruct the drives to “low-level” format themselves again, and recertify their surfaces using sudo sg_format --format /dev/sdx. This allowed me to try and “recertify” the working drives for re-use and ensure all the data was destroyed.
The results of checking the drives show that only one drive passed all of the tests as good, with three drives having varying levels of reallocations (indicating unhealthy status), and two which were outright destroyed. All of the drives had been through 47,000 hours or thereabouts (5.365 years of continuous use) and were more than due for retirement.
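The years figure is a straight conversion from power-on hours, assuming a 365-day year of continuous running. A minimal Python sketch of the arithmetic (the helper name hours_to_years is just for illustration):

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap days


def hours_to_years(hours):
    """Convert power-on hours into years of continuous operation."""
    return hours / HOURS_PER_YEAR


years = hours_to_years(47_000)
print(f"{years:.3f} years")
```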
The units were almost all supplied by Cellnet, with the exception of the 10K6 unit, which carried no supplier label and was a refurbished drive. It is likely to have been a field replacement for a failed 10K7 unit: Seagate may not have had any spare 10K7 units to send, so they settled for a 10K6 (a slower, previous generation unit).
I thought it would be interesting to try and get a grasp of the spatial distribution of the physical defects on the drives to see whether there are special patterns. I imported the physical defect data into MATLAB to do some plotting (since it was faster than using Excel).
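The plotting itself was done in MATLAB. As a rough Python equivalent of the preparation step, the sketch below groups defect entries by head so that each head can be plotted as its own cylinder-versus-sector scatter. It assumes the defect list has already been parsed into (cylinder, head, sector) tuples; the function name group_by_head and the sample entries are made up for illustration and are not taken from any of the drives.

```python
from collections import defaultdict


def group_by_head(defects):
    """Group (cylinder, head, sector) tuples into {head: [(cylinder, sector), ...]}."""
    grouped = defaultdict(list)
    for cylinder, head, sector in defects:
        grouped[head].append((cylinder, sector))
    # Sort heads and points so the output is stable and easy to scan.
    return {head: sorted(points) for head, points in sorted(grouped.items())}


# Made-up entries for illustration only.
sample = [(120, 0, 5), (120, 0, 6), (48000, 1, 310), (121, 0, 5)]
by_head = group_by_head(sample)
for head, points in by_head.items():
    first, last = points[0][0], points[-1][0]
    print(f"head {head}: {len(points)} defects, cylinders {first}..{last}")
```

Each per-head list can then be handed to whatever plotting tool is to hand, with cylinder on one axis and sector on the other.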
This plot is for serial number 3KT3XN2E, and seems to show some very unusual results. The grown errors for head 0 show a cluster near sector 0, in just one zone near the outer edge of the drive, which implies maybe some damage to the servo wedge in that zone. The grown defects on head 1 are concentrated in a ring, closer towards the end of the drive, spanning several tracks. The factory defects also show defects clustering around groups of tracks generally, with a few scattered errors.
Swapping the axes on one of the plots and zooming in (not necessarily the drive above) allows me to explain some interesting observations which may imply some axial error locality as well.
In order to improve the performance of drives under sequential workloads, when they finish one track and seek to the next, the sector 0 position of the next track is displaced (i.e. skewed) so that the head has time to settle on the next track before the sector 0 of the next track arrives.
If there is a defect at one position in the platter, then the sectors it affects in adjacent tracks are different due to the concept of cylinder skew. This is why you see the defective sectors seem to result in slanted lines of defective points.
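The slanted lines can be reproduced with a toy model of skew. The Python sketch below assumes a fixed number of sectors per track and a fixed skew per track, both hypothetical values chosen for illustration (real drives vary sectors per track by zone), and works out which sector number sits over the same angular position on successive tracks.

```python
def sector_at_slot(angular_slot, track, sectors_per_track, skew):
    """Sector number found at a fixed angular slot on the given track.

    Sector 0 of each track is assumed to start `skew` slots later than
    sector 0 of the previous track.
    """
    return (angular_slot - track * skew) % sectors_per_track


SECTORS_PER_TRACK = 100  # hypothetical
SKEW = 7                 # hypothetical, in sectors per track

# A radial scratch at angular slot 40 crossing five adjacent tracks.
affected = [sector_at_slot(40, t, SECTORS_PER_TRACK, SKEW) for t in range(5)]
print(affected)
```

In this model the affected sector number drops by the skew amount on each successive track, so a defect at one angular position plots as a diagonal rather than a vertical line.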
The number of defective points suggests that the head may be “oscillating” above the track, as if it is “unstable” and may be bouncing up and down like a washboard road (which may result in further wear to the head and disk and eventually failure).
The defects for 3KT40S5C look rather different. No defects were grown on head 0, with some mostly track-localized defects on head 1 towards the outside zones. The factory defects however, seem to show scattered long length defects on head 0. Each platter and head combination seems to be different.
The grown defects for 3KT40T4K show the opposite trend to the drive above, with the grown defects mostly towards the inner radius, rather than the outer radius. Considering the drives are likely to have been in a RAID array and seen similar workload and “dwell” time over the tracks, the different resulting errors and wear is rather fascinating.
The plot for the healthiest drive, serial number 3KT418WA, showed no grown defects, but a more “sparse” factory error pattern, although somewhat more towards the outer zone. Maybe it is possible to predict which drives would have longer longevity based on the factory defect list alone, although this ignores the complexity of the drives when it comes to servo bursts etc.
Teardown: ST373307LC (Cheetah 10K6)
Seeing two of the drives are damaged beyond even mere recognition, I thought it would be good to tear them down. One thing that you will note upon picking up such high RPM drives is the sheer weight of the drive – this one tipped the scales at 706 grams with a thick heavy lid and tub, likely to dampen any vibrations.
The drive itself is a factory refurbished drive, as can be seen by the “Certified Repaired HDD” printing on the label and teal green border on the label. The drive features a spindle screw on the top lid as well, thus having the spindle “supported” at both ends.
The underside of the drive shows an LSI Logic SCSI controller which is a bit toasty and brown. A scattering of custom Seagate chips made by Agere can be seen, along with Samsung cache DRAM, ST Microelectronics flash memory for firmware and a Marvell controller which might be in control of the head actuator.
The sides of the drive showed machined/milled sides, with one hole covered by silver label used by the servo trackwriter at the factory. The double-plate thickness of the lid is clearly visible from the side profile.
The rear edge has the 80-pin SCA connector, but also shows some ribbing on the tub, likely to help ensure rigidity without using too much metal and acting as a heatsink. Some configuration jumpers are accessible from the front, and the serial numbers are also visible from the front.
No components are mounted on the rear of the PCB, which is quite similar to the Barracuda 7200rpm SCSI drives of that era.
Underneath the PCB sits a foam separator. The head stack connection is made by pins which physically protrude through the PCB to a connector. Underneath the foam, there was a hand-scribed HDA label, possibly from refurbishing, and further cut-outs for servo trackwriters can be seen under labels.
The underside of the top lid seems to have some springy metal to keep it in contact with the magnets possibly for grounding reasons.
The disk itself was hardly healthy. The platters were scored about the width of the head in a concentric ring, to the point that it had lost its polish. Maybe it spent a lot of time above that track and slowly eroded the surface of the disk, leading to an oscillating condition that continued to further erode the surface away to a rough sanded finish. The platters are smaller and lighter to reduce the spinning mass.
There is a recirculating air filter, as well as a breather hole filter near the magnets, which are fairly powerful large units to ensure quick seek times. The coil also features many turns to ensure it can produce the torque necessary to start and stop moving the heads quickly and (somewhat) loudly. Unusually, the wire is silver coloured rather than the more normal enameled copper wire brown-red.
The spindle clamp is a very different design, and I couldn’t actually undo it with the tools I had available.
There is a lot of evidence of concentric wear rings across the surface, especially towards the outside of the disk. More about this later on.
The head assembly seems to be able to cater for up to three platters by two heads, but the printed flexible cable only has connections for two platters. There is a silicon die head preamp mounted directly to the flexible cable.
The failure of the drive becomes immediately obvious upon close inspection of the head. It can be seen that the texture on the head slider has been worn away, leaving a trail of contamination. This spoils the aerodynamics of the head, which is unable to maintain a proper flying height.
Without the right fly height, it is more prone to head crashes and is not able to reliably read or write from the surface. Worse still, the liberated contamination will settle on, or fly into the head causing more surface damage over time. The head itself has four connections – two for read, two for write with no active fly height control (head heaters).
This is the cause of the concentric rings on the drive – obviously running ddrescue on this drive wasn’t necessarily wise, as it would have concentrated some time reading the “abraded” band on the platter, causing further damage to the head, which then caused further surface damage. Luckily the data on this drive was nothing more than a curiosity I could do without.
Teardown: ST373207LC (Cheetah 10K7)
This was the last generation of Cheetah SCSI drives from Seagate, which seemed a little unusual given the lower model number. The drive does have a lot more “modern” touches in common with the IDE drives of that era.
The newer drive is lighter, weighing in at 646 grams, as it features just one platter rather than two. Further to that, the lid is not as sturdy and thick, and the spindle is no longer secured to the lid.
The motor driver is a STMicroelectronics Smooth branded unit, and the SCSI interface is still an LSI Logic unit. There is still a firmware flash from STMicroelectronics and what appears to be a Marvell controller in charge of the actuator. No Agere chips are visible. The motor connection is a more modern “side contact” style.
The drive retains milled sides, and a servo trackwriter access hole sealed by a thick aluminium tape.
A different, solid SCA connector is featured, along with the Cellnet label from the system provider. The unit doesn’t have any ribbing for heat sinking or material reduction. The serial number barcode is on the front end.
No components were mounted on the underside of the PCB, but it is noted that the head stack assembly uses a bed-of-nails style connection along with the motor, rather than using pins as the 10K6 did.
As before, there is a foam separator, and label-covered access for a servo trackwriter to drive the head-arm. The actual cut is three slots for accessing three holes on the head-arm.
The cover features the same “contact” arrangement, although the breather hole filter and recirculating filter (of which there seem to be two) are adhered to the cover instead of being in the drive tub.
Like more modern drives, the spindle clamp on this drive is a torx screw type which can be undone. This drive shows a clear sign of distress – a scratch on the outer diameter tracks.
On the whole, it doesn’t look anywhere near as bad as the one above. There is provision for three platters, as above.
This one has a copper-coloured coil, as expected.
With the head arms removed, it seems that four platters could be accommodated, rather than three! The printed wire assembly only has mounting for two platters (i.e. four heads).
The texturing on the slider is present, and it seems much less worn. The head itself has five bonded terminals, although one appears to be not-connected.
While there aren’t as many signs of distress, this drive also seems to have a failure which prevents it from loading its configuration and firmware from disk (the flash firmware is merely a bootloader) and thus it never becomes ready for use. I suppose it could have been possible to opt for fewer, larger disks, but that might not have met the organization’s need for speed or reliability.
The salvaged drives were quite unhealthy, and were on their last legs. They were a musical lot! It seems that the SCSI drives have some intelligence in them, which is nice, but the intelligence is also a little cruder than that of their ATA counterparts which is both good and bad. On the upside, we can get the defect lists “raw” from the drive, but on the downside, the health statistics that we get are not as detailed.
It was possible to do some crude analysis on the defect patterns on the media, which seemed to show each drive to be “an individual”, although these were well past their lifetime and were on the way out.
The severely damaged drives were taken apart, and their internals analyzed where physical drive trauma leading to failure was noted. It seems that drives can physically wear out their heads and platters due to contact which is only worsened as the damage causes problems with maintaining fly height. These older drives did not have any evidence of active fly height control, which may be of limited benefit.
This represented the end of Parallel SCSI drives from Seagate, which is to say, the end of an era.