Desktop drives are significantly less expensive than NAS or Enterprise drives. Can they be used when building a budget RAID array?
They can, but it’s important to understand how desktop drives increase required maintenance, downtime, expense, and risk of data loss. The biggest difference between desktop drives and NAS or Enterprise drives is in how they communicate with the RAID controller during write operations.
A write operation in a RAID array begins with the controller sending the drive the data to write. The drive then sends a reply stating that the write request was received. Most controllers then wait 7 seconds for a reply from the drive confirming that the write operation has completed successfully. If the drive replies with a success message within that 7-second period, the controller sends the next piece of data. If the drive sends a ‘fail’ message, the controller sends the write request again. However, if the drive sends no reply within the 7 seconds, the controller assumes the drive is unplugged or has mechanically failed and drops the drive from the array.
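The decision described above can be sketched in a few lines of Python. This is only an illustration of the behavior as described in this article, not any vendor's firmware: the function name `controller_action`, the reply strings, and the returned action names are all hypothetical, and the 7-second window is the figure quoted above (actual timeouts vary by controller).

```python
from typing import Optional

# Wait window quoted in the article; real controllers may use other values.
CONTROLLER_TIMEOUT_S = 7.0


def controller_action(reply: Optional[str], elapsed_s: float,
                      timeout_s: float = CONTROLLER_TIMEOUT_S) -> str:
    """Return the (hypothetical) action a controller takes for one write.

    reply     -- "success", "fail", or None if the drive stayed silent
    elapsed_s -- seconds between the write request and the reply
    """
    # No reply, or a reply that arrives after the window, looks like a dead drive.
    if reply is None or elapsed_s > timeout_s:
        return "drop_drive"
    if reply == "success":
        return "send_next"
    if reply == "fail":
        return "retry_write"
    raise ValueError(f"unexpected reply: {reply!r}")


print(controller_action("success", 0.5))   # drive answered in time
print(controller_action("fail", 2.0))      # drive reported a failed write
print(controller_action("success", 12.0))  # answer arrived after the window
```

Note the last case: a write that eventually succeeds is still treated as a failed drive if the answer arrives after the window has closed.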
This is the biggest problem with using desktop drives. When a piece of data is sent to a desktop drive for a write, that drive will not send a reply back to the controller until the data is written successfully. As a drive ages, sectors gradually go bad and can require many write attempts to complete, taking longer than the 7-second wait. This causes the controller to drop the desktop drive offline, which places the array into recovery/rebuild mode.
Most of the time, the desktop drive will reply to the controller 10-15 seconds later, after the write operation is complete. By this time, however, the controller has already dropped the drive from the array and now thinks it’s a new drive. Thus begins the long process of re-syncing or rebuilding the entire array, just as if the drive had mechanically failed and a new one had been installed in its place. The data on that drive must now be rebuilt.
The danger of data loss increases sharply in this case if the entire array is built from desktop drives, since an array rebuild/re-sync places a much higher load on the rest of the drives. The chance of another desktop drive dropping from the array is significantly higher during a high-load period, which could cause a total loss of the array and all data.
When our data center was built 3 years ago, my team tested this by building two RAID 60 storage servers. The first server contained twenty-four 3TB Seagate Barracuda desktop drives, and the second twenty-four 4TB Western Digital desktop drives. Each RAID 60 array consisted of two 8-drive RAID 6 arrays striped together. We tested several different hardware RAID cards (Adaptec, Intel, LSI) and several software RAID solutions (Server 2008 R2, Windows 7, FreeNAS RAID-Z).
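For readers less familiar with this layout, here is a rough sketch of how much failure it tolerates. A RAID 6 group survives the loss of any two of its members, and a RAID 60 array stays up only while every one of its RAID 6 groups does. The function name `raid60_survives` is made up for this illustration.

```python
# Each RAID 6 group can lose up to two member drives and keep running.
RAID6_FAILURES_TOLERATED = 2


def raid60_survives(failed_per_group):
    """True if every RAID 6 group in the stripe is still within tolerance.

    failed_per_group -- number of failed drives in each RAID 6 group,
                        e.g. [2, 2] for a two-group RAID 60.
    """
    return all(f <= RAID6_FAILURES_TOLERATED for f in failed_per_group)


print(raid60_survives([2, 2]))  # four failures, two per group
print(raid60_survives([3, 1]))  # four failures, three in one group
```

So four failed drives are survivable only when they are split two and two across the groups; three failures landing in the same group would take the whole array down.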
While Windows RAID failed significantly more often, the results were fairly consistent. 9 times out of 10, when a desktop drive showed as “Failed” and was dropped from the array, it popped back online less than 90 seconds later as a new drive, and the array rebuild process began automatically, as if it were a brand-new replacement drive. While we never experienced a total array failure, on two occasions 4 desktop drives dropped from the same RAID 60 array within a two-hour period (our RAID 60 arrays can survive up to 4 drive failures, two in each RAID 6 group, without crashing). On many other occasions 2-3 drives “failed” (and then came back online). Rebuilds on these servers typically took 8-12 hours per drive, so multiple failures can place a heavy load on all drives in the array for days at a time. It’s also important to note that these were test servers, not production servers, so even during testing they were under rather light loads, with fewer than 5 testers using a server at the same time.
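To put the rebuild figures in perspective, here is a back-of-the-envelope calculation. It assumes the dropped drives are rebuilt one after another at the 8-12 hours per drive quoted above; that sequencing is an assumption for illustration (a controller may overlap rebuilds), and `rebuild_window_hours` is a made-up helper name.

```python
def rebuild_window_hours(drives, per_drive_hours=(8, 12)):
    """Best-case and worst-case total rebuild time, in hours.

    Assumes drives are rebuilt sequentially, each taking between
    per_drive_hours[0] and per_drive_hours[1] hours.
    """
    low, high = per_drive_hours
    return drives * low, drives * high


for n in (1, 2, 4):
    low, high = rebuild_window_hours(n)
    print(f"{n} drive(s): {low}-{high} hours "
          f"({low / 24:.1f}-{high / 24:.1f} days)")
```

Under that assumption, four dropped drives keep the array under rebuild load for 32-48 hours, and every additional drop during that window extends it further.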
This is where NAS drives and Enterprise drives provide a huge additional layer of protection: upon receiving the write request, they send a message back to the controller every 3-5 seconds until the write operation is completed, regardless of how long the operation takes. This prevents the controller from assuming the drive has failed and dropping it from the array. Enterprise drives offer a couple more layers of protection that are often not necessary, except in servers with a large number of drives and an exceptionally heavy data I/O load (like terabyte-size databases that run thousands of transactions per second).
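The difference between the two drive types comes down to the longest stretch of silence the controller sees. The sketch below models only that, using this article's description: a desktop drive says nothing until the write finishes, while a NAS or Enterprise drive reports in at a fixed interval. The function names and the 7-second default are illustrative assumptions, not a real drive or controller API.

```python
from typing import Optional


def longest_silence_s(write_duration_s: float,
                      status_interval_s: Optional[float]) -> float:
    """Longest gap (seconds) in which the controller hears nothing.

    status_interval_s is None for a drive that only replies on completion.
    """
    if status_interval_s is None:
        return write_duration_s
    # With periodic status messages the gap never exceeds the interval.
    return min(write_duration_s, status_interval_s)


def dropped(write_duration_s: float, status_interval_s: Optional[float],
            timeout_s: float = 7.0) -> bool:
    """True if the silence outlasts the controller's wait window."""
    return longest_silence_s(write_duration_s, status_interval_s) > timeout_s


# A 12-second write on an aging sector:
print(dropped(12.0, None))  # desktop drive: silent for the full 12 seconds
print(dropped(12.0, 5.0))   # NAS drive: reports in every 5 seconds
```

The same slow write that gets a desktop drive kicked out of the array passes unnoticed on a drive that keeps reporting in.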
As a final note, our testing found this issue happening significantly more often when Windows software mirroring or software RAID 5 is used, though again only with desktop drives. NAS and Enterprise drives rarely seem to have this issue with hardware or software RAID. If you need to mirror two drives for data protection (or set up a small RAID 5 array) and are unable to purchase NAS drives, purchasing a $20 RAID card will give you much higher reliability with desktop drives than using Windows software RAID.