--> mdadm, GPT and kernel; unexpected behavour | Symonds.io

mdadm, GPT and kernel; unexpected behavour

This weekend I encountered some rather unexpected interaction between mdadm, GPT and the linux kernel.

I had added a ‘new’ disk to my personal RAID5, growing it from 3 to 4 disks.

mdadm --add /dev/md0 /dev/sdd
mdadm --grow --raid-devices=4 --backup-file=/root/grow_md0.bak /dev/md0

After the reshape had completed ~26h later, I rebooted the system to check that everything was working.

The array came up with a disk missing

11720662464 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/3] [UUU_]

The missing drive was showing correctly as /dev/sdd and mdadm --examine /dev/sdd was correct

I re-added the missing drive, waited for it to resilver/rebuild and then rebooted again.

Again the disk was missing from the array

An issue showed when I ran fdisk -l /dev/sdd

The primary GPT table is corrupt, but the backup appears OK, so that will be used.
Disk /dev/sdd: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 1B1FA805-AF84-4575-8D86-4252CF6C6BF1

The disk had previously been used and had an existing; but now empty, GPT table. Although the raid configuration had destroyed the head partition table, the tail partition table had been left intact. This meant the kenel mapped it as a normal drive and stopped mdadm from building the array with the disk.

I removed the drive from the system, and with a USB->SATA blanked out the partition tables

Expert command (? for help): z
About to wipe out GPT on /dev/sdd. Proceed? (Y/N): Y
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
Blank out MBR? (Y/N): Y

And then added the disk back to the array with

mdadm --add /dev/md0 /dev/sdd

After the rebuild ~12h I rebooted the system again, this time the disk was correctly included in the array.

Thanks to #Linux-Raid @ freenode for help debugging this issue