Ubuntu: How can I physically identify a single drive in a RAID array?


I have an external drive bay with 4 eSATA disks in it. My system has a 4-port eSATA card, as well as a pair of internal hardware RAID1 drives. The external drives are in software RAID1 pairs as /dev/md0 and /dev/md1. Both have been configured as LVM physical volumes to create my storagevg LVM volume group. Recently, a single drive went offline (I suspect cables), but there does not seem to be a good way to physically identify which drive I need to check, especially since initialization order isn't the same between boots. How can I find the disk needing attention?


Disk Utility (sitting in System -> Administration) will give you the serial numbers for all your disks.

Here's what I see (look at the top-right for the serial). You'll notice that this drive is within a mdadm RAID array. Disk Utility can penetrate the array for raw disk access.

[Screenshot: Disk Utility, with the drive serial number shown at the top-right]

I have 6 of the same model of disk in my PC so I drew a little diagram showing their position in the case and the serial number so I can locate them quickly on serial in an emergency.

The opposite is also true in that if a disk dies, I just need to find which disks are showing up and I can eliminate them until I know which serial is missing.

Edit: I'm trying to improve my bash-fu so I wrote this command-line version to give you a list of the disk serial numbers currently in your machine. fdisk may chuck out some errors, but that doesn't taint the list:

    for disk in `sudo fdisk -l | grep -Eo '(/dev/[sh]d[a-z]):' | sed -E 's/://'`; do
        sudo hdparm -i $disk | grep -Eo 'SerialNo=.*' | sed -E 's/SerialNo=//'
    done

(And you can crumble that into one line if you need to - I've broken it up for readability)

Edit 2: ls /dev/disk/by-id/ is somewhat easier ;)
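The names under /dev/disk/by-id embed the model and serial, so they can be mapped straight to kernel device names. Here's a small sketch (the function name is mine; the directory layout is the standard udev one, and the `-partN` skip pattern may need adjusting for your symlinks):

```shell
# list_by_id DIR: print "symlink-name -> target device" for each whole-disk
# entry in DIR, skipping the per-partition (-partN) symlinks.
# On a live system, DIR is /dev/disk/by-id.
list_by_id() {
    for link in "$1"/*; do
        case "$link" in *-part[0-9]*) continue ;; esac
        printf '%s -> %s\n' "${link##*/}" "$(readlink -f "$link")"
    done
}

# Usage on a real machine:
# list_by_id /dev/disk/by-id
```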


If you have trouble matching the drive serial number or port indication with your disks' spatial locations, you can run cat /dev/sdz >/dev/null (where sdz is the failed drive) and locate the drive by its LED (or by ear if you aren't in a noisy server room). If the drive won't even power up, that should be enough to tell which one it is. Be sure to put a visible label on the disks for next time.


The info that udisks gives (either on the command line or in the GNOME Disk Utility) includes the disk serial number. On the disks I have, the serial number is printed on the top and on the front (the side opposite the one with the connectors), both as digits and as a barcode. Unfortunately, most PC cases make it impossible to read those serials without pulling the disk out...

You can also find the serial numbers in /dev/disk/by-id/.

As your disk is off-line, I assume it isn't "seen" by the kernel currently? In that case, you might have to go by elimination: you want the disk with a serial number that is not listed...
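That elimination is a one-liner with comm(1). The serials and file names below are made up for illustration; on the real machine, detected.txt would come from something like `lsblk -dno serial | sort`:

```shell
# recorded.txt: the serials you noted down when the disks went in.
printf 'WD-AAA111\nWD-BBB222\nWD-CCC333\n' | sort > recorded.txt

# detected.txt: the serials the kernel reports right now. Faked here with
# only two of the three, as if one disk had dropped off the bus.
printf 'WD-AAA111\nWD-CCC333\n' | sort > detected.txt

# Lines only in recorded.txt = the serial that is no longer answering.
comm -23 recorded.txt detected.txt    # prints WD-BBB222
```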


With software raid this is a common issue. Hardware raids tend to have a feature that allows you to blink the LED associated with a drive, assuming that your hardware supports that.

But with software RAID, each drive carries some unique metadata, which you can read with the command mdadm -E /dev/sda1, run against each drive in the array (adjust the device names to match your environment). So if a drive is giving you problems and is currently offline, I would run this on each drive that is still online, recording the minor number for each. Then boot a Live CD that supports MD (SystemRescueCD is a good one) with only one drive connected at a time, and run the same command to find the culprit. This probably isn't as straightforward as you'd like, but it should work.



    $ lsscsi -l
    [0:0:0:0]    disk    ATA      TOSHIBA THNS128G AGLA  /dev/sda
      state=running queue_depth=1 scsi_level=6 type=0 device_blocked=0 timeout=30
    [1:0:0:0]    cd/dvd  HL-DT-ST DVDRAM GT30N     LT09  /dev/sr0
      state=running queue_depth=1 scsi_level=6 type=5 device_blocked=0 timeout=30

If the disk is not in state running, that's a pretty good sign. /proc/mdstat will also tell you which member failed. Assuming you don't have a nice drive cage, you'll have to drill down by serial number; sg_inq should help with that.

If you do have a good drive cage, you should be able to enable the disk beacon to help identify the faulty member.



To get the serial numbers of all hard disks, run:

    $ lsblk -i -o kname,mountpoint,fstype,size,maj:min,name,state,rm,rota,ro,type,label,model,serial
    KNAME MOUNTPOINT   FSTYPE   SIZE MAJ:MIN NAME   STATE   RM ROTA RO TYPE LABEL         MODEL            SERIAL
    sda                         3.7T   8:0   sda    running  0    1  0 disk               WDC WD4000F9YZ-0 WD-WCCXXX4
    sda1                        3.7T   8:1   `-sda1          0    1  0 part
    sdb   /mnt/backup3 ext4     3.7T   8:16  sdb    running  0    1  0 disk backup_netops WDC WD4000F9YZ-0 WD-WCCXXX1
    sdc                         3.7T   8:32  sdc    running  0    1  0 disk               WDC WD4000F9YZ-0 WD-WCCXXX3
    sdc1  /mnt/backup2 ext4     3.7T   8:33  `-sdc1          0    1  0 part
    sdd                         3.7T   8:48  sdd    running  0    1  0 disk               WDC WD4000F9YZ-0 WD-WCCXXX2
    sdd1  /mnt/backup1 ext4     3.7T   8:49  `-sdd1          0    1  0 part


It's simple. For example, this is the output on my PC:

    andrea@centurion:~$ cat /proc/mdstat
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
    md0 : active raid1 sdh1[1] sdg1[0]
          312568576 blocks [2/2] [UU]

    unused devices: <none>

As you can see, I have /dev/sdh1 and /dev/sdg1 joined in /dev/md0.
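A degraded array is also easy to spot mechanically: a missing member leaves an underscore in the status bitmap (e.g. [U_] instead of [UU]), and failed devices are flagged (F). A rough awk sketch (the function name is mine, and it's fed sample mdstat text inline, since the live file varies by system):

```shell
# check_mdstat: read /proc/mdstat-style text on stdin and report any array
# whose status bitmap contains "_" (a missing or failed member).
check_mdstat() {
    awk '/^md/             { array = $1 }
         /\[[U_]*_[U_]*\]/ { print array " is degraded: " $0 }'
}

# Sample text for a mirror whose first half has failed:
printf 'md0 : active raid1 sdh1[1](F) sdg1[0]\n      312568576 blocks [2/1] [U_]\n' \
    | check_mdstat
```

On the real machine you would run `check_mdstat < /proc/mdstat`.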


Since your enclosure doesn't have SES smarts, and the disk activity LED isn't directly drivable (you'd need firmware support for that), the only other thing you can do is quiesce the I/O as best you can and then use something like dd or sg_read on the members themselves to stride a pattern of reads across the disk that creates a uniquely identifiable blink on the activity LED: a poor man's beacon, if you will. It's really your only alternative, unless bringing the array down is an option.
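A sketch of that poor man's beacon (the function name, device name, burst size, and timings are all illustrative; run it as root against the suspect member and watch for the LED that pulses in step):

```shell
# beacon DEV: read DEV in short bursts with pauses between them, so its
# activity LED blinks in a pattern you can pick out by eye.
beacon() {
    dev="$1"
    for burst in 0 1 2 3 4; do
        # Read a fresh 64 MiB region each burst so the data isn't served
        # back out of the page cache (which would keep the LED dark).
        dd if="$dev" of=/dev/null bs=1M count=64 skip=$((burst * 64)) 2>/dev/null
        echo "burst $((burst + 1))/5 on $dev"
        sleep 1   # the pause is what makes the pattern recognizable
    done
}

# Usage (as root, on the member you want to identify):
# beacon /dev/sdX
```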

This kind of serviceability is what differentiates external storage arrays. Since you didn't plan ahead by scribbling down the serial numbers and their positions, you can't do the simple set difference to identify the faulty drive. It's the price you pay for the solution you deployed, whether you realized it or not, but hey, live and learn.
