PART 1 is here.
As I was saying, the problem began when a routine file copy locked up the main Frankenserver – which should not happen. Saturday morning, I checked and found a BIOS update for the motherboard in the Frankenserver was available. Normally, a BIOS update is very routine. This was not.
After applying the BIOS update, the main storage array (16 terabytes) was corrupted and inaccessible. Not a great feeling. But I was not panicked yet: even when files are deleted from a hard drive, the data stays on the disk until it is overwritten with new data. The drives themselves all appeared fine, so I knew the data was still there.
But I needed a new motherboard, as this one appeared to have had an electrical failure. On a holiday weekend, my only option was to drive to a nearby city (an hour and 45 minutes away) to get the part before the store closed. So I drove, got the board, drove back, ate, then installed the new board….
And the main storage array was still inaccessible. Now I started to get worried. Yes, the critical data was all backed up “in the cloud” but I had test-restored that data before and it took over 3 weeks to download that much data. So I googled and googled, and thankfully found others who had hit the same problem and come up with a solution.
However, the solution is a bit heart-stopping. You delete your storage array, rebuild it with the same settings on the same drives, and use a utility to restore it. In theory, you are not deleting the data, just the metadata “map” to the data, and the utility will recreate that map. But the reports of success were strong, so away I went and -
It worked, in just a few minutes and one machine restart. Everything back online and tested by early Sunday a.m. The utility is TestDisk - I had heard of it before but never used it. Needless to say, highly recommended.
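For anyone curious why wiping the “map” does not wipe the data, here is a toy sketch in Python. It is not how TestDisk is implemented and it does not model a real RAID or partition format: the 512-byte sector size, the `TOYFS001` signature, and every function name are made up for illustration. It only shows the general idea of finding lost data by scanning for a known signature and then rewriting the map.

```python
import struct

SECTOR = 512
MAGIC = b"TOYFS001"  # hypothetical signature, not a real filesystem format


def make_disk(total_sectors, part_start, payload):
    """Build a toy disk: sector 0 holds the 'map', the partition holds the data."""
    disk = bytearray(total_sectors * SECTOR)
    # The map is a single little-endian integer: the partition's start sector.
    struct.pack_into("<I", disk, 0, part_start)
    offset = part_start * SECTOR
    disk[offset:offset + len(MAGIC)] = MAGIC
    data_at = offset + len(MAGIC)
    disk[data_at:data_at + len(payload)] = payload
    return disk


def wipe_map(disk):
    """Simulate deleting the array: only the map in sector 0 is zeroed."""
    disk[0:SECTOR] = bytes(SECTOR)


def scan_for_partition(disk):
    """Check the start of each sector for the signature; return its sector or None."""
    for sector in range(1, len(disk) // SECTOR):
        offset = sector * SECTOR
        if disk[offset:offset + len(MAGIC)] == MAGIC:
            return sector
    return None


def read_payload(disk, length):
    """Follow the map in sector 0 and read the data stored after the signature."""
    start = struct.unpack_from("<I", disk, 0)[0]
    offset = start * SECTOR + len(MAGIC)
    return bytes(disk[offset:offset + length])


payload = b"family photos"
disk = make_disk(total_sectors=64, part_start=8, payload=payload)

wipe_map(disk)                                  # the map is gone...
lost = struct.unpack_from("<I", disk, 0)[0]     # ...so it now points nowhere useful

found = scan_for_partition(disk)                # the data itself was never touched
struct.pack_into("<I", disk, 0, found)          # write the map back
recovered = read_payload(disk, len(payload))
```

The real thing is far more involved, but the principle is the same: as long as nothing writes over the data area, a lost map can be rebuilt from what is still sitting on the disk.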
How does that storage system work? Is it using Windows or Linux? Also, are you using the fake-RAID provided by the motherboard? Fake-RAID is usually a disaster since it’s a proprietary solution where special Windows drivers collaborate with some half-baked hardware to implement it; during boot it’s the BIOS that implements all the functionality. In Windows 7 Pro you can have RAID 0 and 1. In Linux you can have whatever you want, and I suggest using LVM, too. Lesson from all this: do not use fake-RAID, if that’s indeed what you were using. If you don’t want to invest in a proper RAID card, use software RAID and it’ll usually perform just as well. About the only reason for a RAID card is if you want a battery-backed cache to ensure data consistency in transactional environments (say, databases). For photography work I don’t see much benefit from a battery-backed cache.