OMV fix accidentally split mirror RAID

I had set up a mirrored RAID in OMV. Then the PC hardware failed (not the disks, I should emphasize) and I needed to move the system disk and the two data disks to other hardware.

Because I decided to test each disk separately, this broke the mirror; in OMV I could now see two separate degraded arrays instead of one mirror.

Work offline

I was really keen to keep the two disks in the same state and not have them accidentally updated by other servers on my network. So for the entire duration of this exercise I kept the OMV server and my desktop completely off the LAN, with a static IP set on my PC. I would strongly suggest you do something similar.
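As a rough sketch, on a Linux desktop that means something like the following (the interface name and addresses are placeholders; adjust them for your own machine):

```shell
# Drop any DHCP-assigned config and give the desktop a static address
# on a direct cable to the OMV box (eth0 and the addresses are examples):
ip addr flush dev eth0
ip addr add 192.168.50.2/24 dev eth0
ip link set eth0 up
# No default route is set, so nothing beyond the direct link is reachable.
```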


At this point I had a spare blank disk around, so I took one of the drives out of the array, put the new one in, did a “Recover” in OMV, and rebuilt the array. When the rebuild finished I took the spare disk out again. The point was that if everything went to custard I could use the spare disk, which was now a copy of the mirror, to rebuild the array from scratch as it were.
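For reference, the “Recover” button in OMV does roughly what the following mdadm command does under the hood (the array and disk names here are placeholders, not from my system):

```shell
# Add the blank disk into the degraded mirror; md then resyncs onto it
mdadm --manage /dev/md0 --add /dev/sdX
# Watch the rebuild progress until it completes
cat /proc/mdstat
```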

If I log in to OMV using ssh and scan the arrays by running:

mdadm --detail --scan

I get:

ARRAY /dev/md/Mirror metadata=1.2 name=nas1:Mirror UUID=391fb756:204b7361:f368c71b:fb4eea09
ARRAY /dev/md/Mirror metadata=1.2 name=nas1:Mirror UUID=391fb756:204b7361:f368c71b:fb4eea09

As you can see, the two listed arrays are identical.

From the earlier screenshots you can see OMV thinks there are now two arrays, md126 and md127. I can query each of these:

mdadm --detail /dev/md126

Gives

/dev/md126:
         Version : 1.2
   Creation Time : Thu Nov 10 19:41:47 2016
      Raid Level : raid1
      Array Size : 2930135488 (2794.40 GiB 3000.46 GB)
   Used Dev Size : 2930135488 (2794.40 GiB 3000.46 GB)
    Raid Devices : 2
   Total Devices : 1
     Persistence : Superblock is persistent
   Intent Bitmap : Internal
     Update Time : Wed Dec 23 09:37:53 2020
           State : active, degraded
  Active Devices : 1
 Working Devices : 1
  Failed Devices : 0
   Spare Devices : 0
            Name : nas1:Mirror  (local to host nas1)
            UUID : 391fb756:204b7361:f368c71b:fb4eea09
          Events : 940496
 
Number   Major   Minor   RaidDevice State
    2       8       32        0      active sync   /dev/sdc
    1       0        0        1      removed

Then if I run:

mdadm --detail /dev/md127

I get:

/dev/md127:
         Version : 1.2
   Creation Time : Thu Nov 10 19:41:47 2016
      Raid Level : raid1
      Array Size : 2930135488 (2794.40 GiB 3000.46 GB)
   Used Dev Size : 2930135488 (2794.40 GiB 3000.46 GB)
    Raid Devices : 2
   Total Devices : 1
     Persistence : Superblock is persistent
   Intent Bitmap : Internal
     Update Time : Wed Dec 23 09:59:36 2020
           State : active, degraded
  Active Devices : 1
 Working Devices : 1
  Failed Devices : 0
   Spare Devices : 0
            Name : nas1:Mirror  (local to host nas1)
            UUID : 391fb756:204b7361:f368c71b:fb4eea09
          Events : 942450

Number   Major   Minor   RaidDevice State
    0       0        0        0      removed
    3       8       16        1      active sync   /dev/sdb

So the two halves are nearly identical; the problem was how to join them back together.
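One useful tell-tale when deciding which half to keep is the Events counter in each --detail output: the half with the higher count has seen the most recent writes. A small sketch using the two values shown above:

```shell
# Events counters copied from the two "mdadm --detail" outputs above
events_md126=940496
events_md127=942450

# The array with the higher Events count holds the most recent writes,
# so that is the copy to keep running while the other disk is re-added.
if [ "$events_md127" -gt "$events_md126" ]; then
  newer=md127
else
  newer=md126
fi
echo "keep $newer running"
```

In my case md127 was ahead, which matches the array I eventually kept running.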

If you want to see the details for one disk in the array you can run:

mdadm --examine /dev/sdc

This gives

/dev/sdc:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x1
      Array UUID : 391fb756:204b7361:f368c71b:fb4eea09
            Name : nas1:Mirror  (local to host nas1)
   Creation Time : Thu Nov 10 19:41:47 2016
      Raid Level : raid1
    Raid Devices : 2
  Avail Dev Size : 7813775024 (3725.90 GiB 4000.65 GB)
      Array Size : 2930135488 (2794.40 GiB 3000.46 GB)
   Used Dev Size : 5860270976 (2794.40 GiB 3000.46 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
           State : clean
     Device UUID : fbd7e6a5:7e687f1e:4bd1c7cb:42945d52
 Internal Bitmap : 8 sectors from superblock
     Update Time : Wed Dec 23 09:37:53 2020
        Checksum : f4cb61d7 - correct
          Events : 940496
     Device Role : Active device 0
     Array State : A. ('A' == active, '.' == missing)

And for the other disk run:

mdadm --examine /dev/sdb

Giving:

/dev/sdb:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x1
      Array UUID : 391fb756:204b7361:f368c71b:fb4eea09
            Name : nas1:Mirror  (local to host nas1)
   Creation Time : Thu Nov 10 19:41:47 2016
      Raid Level : raid1
    Raid Devices : 2
  Avail Dev Size : 5860271024 (2794.40 GiB 3000.46 GB)
      Array Size : 2930135488 (2794.40 GiB 3000.46 GB)
   Used Dev Size : 5860270976 (2794.40 GiB 3000.46 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
           State : clean
     Device UUID : 665bdd07:200d588f:1a7bf968:f1b050ee
 Internal Bitmap : 8 sectors from superblock
     Update Time : Wed Dec 23 10:06:38 2020
        Checksum : 8c53478d - correct
          Events : 943424
     Device Role : Active device 1
     Array State : .A ('A' == active, '.' == missing)

The resolution

As with a lot of these things, the actual fix is a simple trick; it's just a ton of Googling to find it.

You need to re-join the two halves with a re-add, for example:

mdadm --manage /dev/md126 --re-add /dev/sdc

Watch the device path carefully: I first mistyped it as /devmd126, which gives the error:

mdadm: error opening /devmd126: No such file or directory

Before a re-add can work, though, one of the two duplicate arrays has to be stopped, so run:

mdadm --stop /dev/md127

Now, because I had set this up as a share, one of the current mirrors (md126 or md127) will be serving the SMB share, so you may get back:

mdadm: Cannot get exclusive access to /dev/md127:Perhaps a running process, mounted filesystem or active volume group?

If that happens, try stopping the other mirror instead:

mdadm --stop /dev/md126

Which gives:

mdadm: stopped /dev/md126
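As an aside, if neither half-array will stop, something like fuser can show what is holding the device; this is a general sketch rather than something I needed in the end:

```shell
# List processes using the filesystem on the md device
fuser -vm /dev/md127
# If it is only the mount itself, unmount before retrying the stop
umount /dev/md127
```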

So now the mirror md127 is still running, and you will recall earlier you ran “mdadm --detail /dev/md127”, which included the line:

       3       8       16        1      active sync   /dev/sdb

So you need to re-add the OTHER disk, which is sdc. Run the following:

mdadm --manage /dev/md127 --re-add /dev/sdc

Which returns:

mdadm: re-added /dev/sdc
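Because the array has an internal write-intent bitmap (see the Intent Bitmap line in the --detail output), the re-add only copies the blocks changed since the split, so the resync is usually quick. You can watch it finish with:

```shell
# Progress of the resync appears here until the mirror is clean again
cat /proc/mdstat
# Or query the array state directly
mdadm --detail /dev/md127
```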

Now in OMV you can see the RAID is fixed.

At this point it's done. See, it was simple, once you got to the bottom of it.

Testing I did

I was quite concerned to check that the merged array really was working. Once it was up I went to one of the shares, deleted some files, and added one large video file.

Then I took one drive out, restarted OMV, checked the files were as expected, and checked I could play the video.

Then I swapped: I took out the other drive and connected the one I had just removed. Again I checked the share and checked I could play the video.

So for myself I believe it is working OK.

What are events?

When you run “--detail” or “--examine” you will see a line that looks like:

Events : 944167

I assumed that this number should constantly increase whenever I changed the file system. I now don't think that is the case. Although I am not completely sure, I think it relates to “events” on the array, as in things that have happened to the array; I don't believe the number staying the same, or going up, means by itself that anything is broken.

It is also very important to note that the listed “events” value is not real time, or does not appear to be. Give it a few minutes and repeat the command and you will often see the number change.

If there have been no file changes, I have found that the numbers reported for each disk will stabilize at the same value.
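A quick way to check the two halves have settled is to compare the Events value each disk reports. Here is a sketch using the counters shown earlier in this post; on a live system you would feed the real “mdadm --examine” output into the same awk:

```shell
# Simulated one-line extracts from "mdadm --examine" for each disk,
# using the values shown earlier in this post
sdb_examine="          Events : 943424"
sdc_examine="          Events : 940496"

# Pull out the third field (the counter) from each Events line
events_sdb=$(printf '%s\n' "$sdb_examine" | awk '/Events/ {print $3}')
events_sdc=$(printf '%s\n' "$sdc_examine" | awk '/Events/ {print $3}')

if [ "$events_sdb" = "$events_sdc" ]; then
  echo "disks agree"
else
  echo "disks differ: sdb=$events_sdb sdc=$events_sdc"
fi
```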

Spares Missing Event emails

After fixing the array you may find you get emails from OMV with a subject similar to:

SparesMissing event on /dev/md/Mirror:nas1 [nas1.cantabrian]

If you do, follow the instructions in the post: Spares missing event emails after RAID changes