RAID 5 Data Recovery

RAID 5 data recovery is among the most requested recovery services, because level 5 RAID arrays remain one of the most popular configurations among owners of servers and data storage systems. The advantages of such arrays include increased performance, fault tolerance, and a relatively low cost of disk space.

RAID 5 is often overrated to the point where it is treated as data storage, a working server, and a backup server all at once. This view is a misconception and may lead to problems later.

RAID 5 is an array of 3 or more disks combined into a single system that writes data to all disks in blocks. Along with the data, checksums calculated by a specific algorithm are written in a cyclic manner, distributed evenly across all disks, which gives RAID 5 its fault tolerance.
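
As a small illustration of the cyclic placement, the sketch below shows which disk holds the checksum block in each stripe. It assumes one common rotation, where the checksum starts on the last disk and moves one disk to the left with every stripe; real controllers use several different rotations. The names `parity_disk` and `layout_rows` are hypothetical and chosen only for this example.

```python
def parity_disk(stripe: int, n_disks: int) -> int:
    """Index of the disk holding the checksum block of a given stripe.

    Assumption: the checksum starts on the last disk and moves one disk
    to the left with every stripe (one common rotation among several).
    """
    return (n_disks - 1 - stripe) % n_disks


def layout_rows(n_disks: int, n_stripes: int) -> list[str]:
    """Render each stripe as a row: 'P' for the checksum, 'D' for data."""
    rows = []
    for stripe in range(n_stripes):
        p = parity_disk(stripe, n_disks)
        rows.append(" ".join("P" if d == p else "D" for d in range(n_disks)))
    return rows


if __name__ == "__main__":
    for row in layout_rows(n_disks=4, n_stripes=4):
        print(row)
```

Each row has exactly one checksum block, and over n stripes every disk holds the checksum once, which is what spreads the redundancy evenly.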

Anyone familiar with the basics of mathematical logic will remember the simple operation called “addition modulo 2” (“exclusive or”, XOR). This operation is the basis of the checksum calculation, and it allows the contents of any one disk missing from the array to be recalculated on the fly, using only the data and checksums stored on the remaining disks.
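
The property that makes this work is that XOR is its own inverse: if the checksum is the XOR of all data blocks, then any one block equals the XOR of all the others. A minimal sketch, using tiny in-memory blocks and a hypothetical helper `xor_blocks`:

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


if __name__ == "__main__":
    d0, d1, d2 = b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"
    parity = xor_blocks([d0, d1, d2])        # checksum block of the stripe
    rebuilt = xor_blocks([d0, d2, parity])   # pretend the disk with d1 is lost
    print(rebuilt == d1)
```

The same call that produces the checksum also rebuilds a missing block; only the set of inputs changes.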

The total disk space dedicated to redundancy equals the capacity of one RAID 5 disk. Accordingly, the disk space available for storage equals (n-1)*V, where n is the number of disks in the array and V is the capacity of the array’s smallest disk (for example, in GB).
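
As a quick worked example of the (n-1)*V formula, the sketch below computes the usable space for a list of disk sizes. The function name `usable_capacity` is hypothetical; the result is in the same unit as the input.

```python
def usable_capacity(disk_sizes: list[float]) -> float:
    """Usable RAID 5 space: (n - 1) * V, with V the smallest disk."""
    n = len(disk_sizes)
    if n < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return (n - 1) * min(disk_sizes)


if __name__ == "__main__":
    # Four 500 GB disks: one disk's worth of space goes to checksums.
    print(usable_capacity([500, 500, 500, 500]))
    # Mixed sizes: every disk contributes only as much as the smallest one.
    print(usable_capacity([1000, 2000, 1000]))
```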

Such a system is implemented either with a RAID controller (this solution is called hardware RAID 5) or by means of the operating system’s tools, i.e. software RAID.

Level 5 RAID arrays can differ in several respects:

  • Disk count and capacity
  • Disk order in array
  • Array block size
  • Data block writing algorithm (structure)
  • Checksum “pattern”
  • Presence/absence of shifts and discontinuities in data writes
  • Presence/absence of RAID’s internal information areas

The procedure of RAID 5 data recovery:

  • The physical condition of disks is tested, and the set of acceptable operations is determined.
  • Good disks are attached to the system directly, bypassing any RAID controllers, in a way that excludes any possibility of modifying their contents.

Each disk’s content is analyzed in a hex editor. Key structures are located (MBR, boot sectors, file system headers, partition boundaries), which make it possible to identify and confirm the composition and level of the RAID array in question.
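
Part of this inspection can be scripted. The sketch below checks a 512-byte sector for the classic MBR boot signature (bytes 0x55 0xAA at offset 510) and decodes the four partition entries that start at offset 446. The function name `parse_mbr` and the returned dictionary keys are hypothetical; the sector is assumed to have been read from a disk image, never from the original disk in write mode. A partition whose size exceeds the capacity of a single member disk can be one hint that the disk belongs to a larger array.

```python
import struct

SECTOR_SIZE = 512
PARTITION_TABLE_OFFSET = 446   # four 16-byte entries follow
BOOT_SIGNATURE = b"\x55\xaa"   # stored at offsets 510-511


def parse_mbr(sector: bytes) -> list[dict]:
    """Return the non-empty partition entries of an MBR sector.

    An empty list means the sector carries no boot signature or no
    partition entries.
    """
    if len(sector) < SECTOR_SIZE or sector[510:512] != BOOT_SIGNATURE:
        return []
    entries = []
    for index in range(4):
        start = PARTITION_TABLE_OFFSET + 16 * index
        raw = sector[start:start + 16]
        partition_type = raw[4]
        # Starting LBA and sector count are little-endian 32-bit values.
        start_lba, sector_count = struct.unpack_from("<II", raw, 8)
        if partition_type != 0:
            entries.append({
                "index": index,
                "type": partition_type,
                "start_lba": start_lba,
                "sectors": sector_count,
            })
    return entries
```

A typical use would be to call `parse_mbr` on the first sector of every disk image and compare the results across the disks.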

Once the general picture of the puzzle is clear, we proceed to assembling it. A mosaic is best reconstructed from structured pieces of a known size, such as a file allocation table or data of particular types. The more experienced the specialist, the better the chances of locating disk areas useful for analysis. At this stage it is possible to determine the data block size, the data structure, and the checksum “pattern”. Shifts and discontinuities in the data stored on the HDDs can also be identified.
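
One simple check that can support this analysis, when images of all member disks are available, is that the XOR of the blocks at the same offset on every disk is zero wherever the checksums are consistent. The sketch below is only an illustration with in-memory images; the names `parity_consistent` and `consistent_offsets` are hypothetical, and it assumes the array data starts at the same offset on every disk. Regions filled with zeros pass trivially, so the result is meaningful only on areas that contain data.

```python
def parity_consistent(images: list[bytes], offset: int, length: int = 512) -> bool:
    """True if the XOR of all images over [offset, offset + length) is zero."""
    chunks = [img[offset:offset + length] for img in images]
    if any(len(c) != length for c in chunks):
        raise ValueError("range lies outside at least one image")
    acc = bytearray(length)
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            acc[i] ^= byte
    return not any(acc)


def consistent_offsets(images: list[bytes], length: int = 512) -> list[int]:
    """Offsets (in steps of `length`) where the checksums are consistent."""
    limit = min(len(img) for img in images)
    return [
        offset
        for offset in range(0, limit - length + 1, length)
        if parity_consistent(images, offset, length)
    ]
```

Runs of inconsistent offsets that repeat at regular intervals may point to the kind of shifts and discontinuities mentioned above.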

Next, we find out whether the contents of the operable disks are sufficient. If not, the required missing parts are recovered. At this stage it is important to create a full image of each disk, including every sector.

Once all the data is in hand, we can proceed to software assembly. That is, software tools emulate the operation of a RAID controller with the correct algorithm on the correct set of disks. We arrange the disks in the correct order (leaving out unnecessary ones) and then adjust all parameters of the algorithm. Which particular software is used for this purpose is not essential.
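
To make the idea of emulating the controller concrete, here is a minimal sketch that reads data blocks from disk images in array order and skips the checksum blocks. It assumes a "left-symmetric" layout (the checksum moves one disk to the left per stripe, and data continues on the disk right after the checksum), no offset at the start of the disks, and images small enough to hold in memory. The function name `assemble_raid5` is hypothetical; a real case may use a different layout, and real images would be read from files in chunks.

```python
def assemble_raid5(images: list[bytes], block_size: int) -> bytes:
    """Concatenate the data blocks of a RAID 5 array in logical order.

    `images` must already be in the correct disk order. Assumes a
    left-symmetric layout with no start offset (see the note above).
    """
    n = len(images)
    if n < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    stripes = min(len(img) for img in images) // block_size
    out = bytearray()
    for stripe in range(stripes):
        parity = (n - 1 - stripe) % n          # disk holding the checksum
        start = stripe * block_size
        for k in range(n - 1):                 # n - 1 data blocks per stripe
            disk = (parity + 1 + k) % n        # data continues after the checksum
            out += images[disk][start:start + block_size]
    return bytes(out)
```

If the assembled output shows valid structures only at the start and garbage afterwards, the disk order, block size, or layout assumption is probably wrong and should be adjusted.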

Even if all the disks in the array are returned (or recovered) to normal operation, particular blocks of data may still be inaccessible. As a rule, such discontinuities are cyclical in nature and are caused by one or more incorrect procedures such as initialization, RAID rebuilds and so on. This is likely due to unqualified attempts at hardware RAID data recovery or “inconsistent” operation of the equipment in abnormal situations. Still, in many cases inaccessible blocks can be recovered by recalculating them from a certain set of the other disks.