Ubuntu: Drive failure in RAID1, can't replace the disk



Question:

I have a software RAID1, and mdadm emailed me that a drive had failed. Following steps I found online, I marked the drive as failed and removed it from the array.

I replaced the drive with one of the exact same make and model, but with the new drive installed, the RAID won't start; cat /proc/mdstat just reports it as inactive.

I also cannot add the new drive to the array; mdadm keeps saying there is no superblock on the new drive, even though I copied the partition table from the working drive to the new one.

I removed the new drive and put the failed one back in. The RAID does come up and, oddly, it is trying to rebuild right now, but my mdadm.conf just looks goofy.

mdadm.conf

  # mdadm.conf
  #
  # Please refer to mdadm.conf(5) for information about this file.
  #

  # by default (built-in), scan all partitions (/proc/partitions) and all
  # containers for MD superblocks. alternatively, specify devices to scan, using
  # wildcards if desired.
  #DEVICE partitions containers

  # auto-create devices with Debian standard permissions
  CREATE owner=root group=disk mode=0660 auto=yes

  # automatically tag new arrays as belonging to the local system
  HOMEHOST

  # instruct the monitoring daemon where to send mail alerts
  MAILADDR myemail@gmail.com

  # definitions of existing MD arrays

  # This file was auto-generated on Sun, 30 Dec 2012 02:27:19 -0700
  # by mkconf $Id$
  DEVICE /dev/sdb1 /dev/sdb1
  ARRAY /dev/md0 level=raid1 devices=/dev/sdb1,/dev/sdb1

fdisk -l

  Disk /dev/sda: 640.1 GB, 640135028736 bytes
  255 heads, 63 sectors/track, 77825 cylinders, total 1250263728 sectors
  Units = sectors of 1 * 512 = 512 bytes
  Sector size (logical/physical): 512 bytes / 512 bytes
  I/O size (minimum/optimal): 512 bytes / 512 bytes
  Disk identifier: 0x00058100

     Device Boot      Start         End      Blocks   Id  System
  /dev/sda1   *        2048  1241874431   620936192   83  Linux
  /dev/sda2      1241876478  1250263039     4193281    5  Extended
  /dev/sda5      1241876480  1250263039     4193280   82  Linux swap / Solaris

  WARNING: GPT (GUID Partition Table) detected on '/dev/sdb'! The util fdisk doesn't support GPT. Use GNU Parted.

  Disk /dev/sdb: 2000.4 GB, 2000398934016 bytes
  255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
  Units = sectors of 1 * 512 = 512 bytes
  Sector size (logical/physical): 512 bytes / 4096 bytes
  I/O size (minimum/optimal): 4096 bytes / 4096 bytes
  Disk identifier: 0x00000000

     Device Boot      Start         End      Blocks   Id  System
  /dev/sdb1               1  3907029167  1953514583+  ee  GPT
  Partition 1 does not start on physical sector boundary.

  WARNING: GPT (GUID Partition Table) detected on '/dev/sdc'! The util fdisk doesn't support GPT. Use GNU Parted.

  Disk /dev/sdc: 2000.4 GB, 2000398934016 bytes
  255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
  Units = sectors of 1 * 512 = 512 bytes
  Sector size (logical/physical): 512 bytes / 4096 bytes
  I/O size (minimum/optimal): 4096 bytes / 4096 bytes
  Disk identifier: 0x00000000

     Device Boot      Start         End      Blocks   Id  System
  /dev/sdc1               1  3907029167  1953514583+  ee  GPT
  Partition 1 does not start on physical sector boundary.

  Disk /dev/md0: 2000.3 GB, 2000263380992 bytes
  2 heads, 4 sectors/track, 488345552 cylinders, total 3906764416 sectors
  Units = sectors of 1 * 512 = 512 bytes
  Sector size (logical/physical): 512 bytes / 4096 bytes
  I/O size (minimum/optimal): 4096 bytes / 4096 bytes
  Disk identifier: 0x00000000

  Disk /dev/md0 doesn't contain a valid partition table

I don't even know what else to include that could help, but I'd really appreciate any help or advice to get this RAID back in working order.

UPDATE: Adding more information after another attempt. After putting the faulty drive back, the array started to rebuild and then failed, as expected.

I started by generating a new mdadm.conf with sudo su -c "/usr/share/mdadm/mkconf > /etc/mdadm/mdadm.conf", then:

  1. Failed and then removed /dev/sdc1
  2. Installed the new HD
  3. Copied over the partition table using sgdisk -R /dev/sdc /dev/sdb
  4. Updated the UUIDs with sgdisk -G /dev/sdc
  5. Ran sfdisk -r /dev/sdc
  6. Ran mdadm --manage /dev/md0 --add /dev/sdc1

Trying to add sdc1 gives the error: mdadm: cannot get array info for /dev/md0
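For what it's worth, "mdadm: cannot get array info for /dev/md0" usually means the md0 node exists but the array is not assembled and running, so there is nothing to add the disk to. A sketch of how one might check and restart the array before retrying the --add; the device names match the question and may differ on your machine:

```shell
# Show whether md0 is listed as active or inactive
cat /proc/mdstat

# Stop the half-assembled array, then reassemble it from the surviving member,
# forcing it to run even though it is degraded
mdadm --stop /dev/md0
mdadm --assemble /dev/md0 /dev/sdb1 --run

# Only once md0 is active (even if degraded) will --add succeed
mdadm --detail /dev/md0
mdadm --manage /dev/md0 --add /dev/sdc1
```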


Solution 1:

Well, first of all, a little more information on your setup would be nice so I could fill in your actual partition names, etc.

As you said, you marked the drive as failed and removed it (I assume with mdadm --manage /dev/md0 --remove /dev/sdb1, or whatever your RAID/physical partitions are, for every partition).
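The removal sequence being assumed here would look roughly like the following; the device names are examples, so substitute your own, and repeat for every member partition of the array:

```shell
# Mark the dying member as failed, then pull it out of the array
mdadm --manage /dev/md0 --fail /dev/sdb1
mdadm --manage /dev/md0 --remove /dev/sdb1

# Confirm the array is now degraded with one slot marked "removed"
mdadm --detail /dev/md0
```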

Did you do this on a live system? That is, is this a machine you can power down? Are the drives hot-pluggable?

You also said you copied the partition table (e.g. sfdisk -d /dev/sda | sfdisk /dev/sdb). How exactly did you do this? Which partition table type (MBR or GPT) is your device using?

If it's GPT, you have to use sgdisk -R /dev/sdb /dev/sda to copy the partition table from sda to sdb (note the argument order: the destination disk comes first, the source second).

After that, you will have to give the new disk fresh GUIDs: sgdisk -G /dev/sdb.

Then use sfdisk -R /dev/sdb (or partprobe /dev/sdb) to have the kernel reload the partition table.

Then add the new partition back with mdadm /dev/md0 -a /dev/sdb1. As with the removal, you have to do this for EVERY partition. Afterwards, run grub-mkdevicemap -n to generate a new device map for GRUB 2, and then grub-install /dev/sdb so the new disk is bootable.
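Once the --add succeeds, the resync starts automatically. A couple of standard ways to watch it finish (nothing here is specific to this particular setup):

```shell
# Live view of rebuild progress and the estimated finish time
watch -n 5 cat /proc/mdstat

# Or a one-shot summary of the array state and sync status
mdadm --detail /dev/md0 | grep -E 'State|Rebuild'
```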

Regarding the edit: that config really is goofy. The DEVICE and ARRAY lines both list the same partition (/dev/sdb1) twice, so as far as mdadm.conf is concerned, your RAID consists of two copies of one partition.

Maybe you'll want to regenerate the configuration: sudo su -c "/usr/share/mdadm/mkconf > /etc/mdadm/mdadm.conf".
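A sketch of regenerating the config from the running array and making sure the boot-time copy matches; update-initramfs is the standard Debian/Ubuntu step, so adjust if your setup differs:

```shell
# Regenerate mdadm.conf from what the kernel currently sees
/usr/share/mdadm/mkconf > /etc/mdadm/mdadm.conf

# Sanity-check the generated ARRAY line against the live array
mdadm --detail --scan
grep '^ARRAY' /etc/mdadm/mdadm.conf

# Embed the fixed config in the initramfs so the array assembles at boot
update-initramfs -u
```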

