I recently started building out my third file server and picked up some Adaptec 6805T RAID cards on eBay to interface with my multiple drive trays. Having built similar servers, and working with numerous Adaptec product lines in my day-to-day datacenter work, I consider myself something of a subject matter expert. Still, this issue had me scratching my head for a while, so it seemed interesting enough to write a post about. Hopefully it will help someone in my shoes in the future.
An Unusual Error
One of the cards out of the batch was producing an unusual error.
One or more drives are either missing or not responding.
Please check if the drives are connected and powered on.
<<<< FATAL CONFIGURATION ERROR DETECTED >>>>
<<<< CANNOT CONTINUE BOOT PROCESS >>>>
<< Correct the problem and Reboot the system >>
On the surface, this error makes a lot of sense: it's simply warning that the hard drives from the previous configuration are gone. Since I bought this card on eBay, there's no telling what the previous owner's configuration was, so this is exactly what I'd expect.
What doesn't make sense is that this error halts the boot process and offers no way to accept the new configuration state. Normally, when drives are removed, you are prompted to accept or reject the new configuration.
What makes even LESS sense is the complete absence of any documentation or other posts online containing the text of this error. My Google searches returned nothing of use, I couldn't find any mention of it in Adaptec's resources, and no one on /r/homelab was able to offer a helpful response to my post.
Troubleshooting Efforts
Troubleshooting this problem was difficult because my computer would not boot with the PCIe card installed; this error message completely halted the boot process. Since PCIe isn't hot-pluggable, I couldn't simply turn the computer on and then slide the card in.
I remembered that a few of my server boards let me disable PCIe option ROMs on a per-slot basis. My hope was that I could disable the option ROM for the card, boot into Linux, and then troubleshoot further with arcconf.
So I popped the card in and tried to enter the BIOS setup, but of course, thanks to this error, I couldn't even get that far. The option ROM initializes before the BIOS setup screen opens, so the error locked up my system before I could change anything.
I decided to install a working 6805T card, disable its option ROM in the BIOS, then swap in the non-working card and hope that the option ROM would stay disabled for that slot.
This worked, and I was able to get booted into Linux!
Trial & Error
Poking around in arcconf, I had a few ideas.
I tried resetting the controller to factory defaults:
arcconf setconfig 1 default
Curiously, this alone did NOT fix the problem.
Reading the docs, I came across this command, and I was VERY hopeful:
arcconf setbiosparams 1 BIOSHALTONMISSINGDRIVECOUNT <count>
Unfortunately, my controller apparently doesn't "support" changing this setting:
Setting the BIOS parameters in not supported on this controller.
This was definitely a letdown, since this seemed like exactly the setting I needed to change.
What Ended Up Working
Since the controller's firmware was outdated, I decided to update it, reset as much as possible, and then test again.
So I downloaded the latest BIOS ROM image from Adaptec and proceeded:
arcconf romupdate 1 as680T01.ufi
arcconf setconfig 1 default
arcconf resetstatisticscounters 1
This worked! After moving the card to a PCIe slot with the option ROM enabled, I was able to enter the Adaptec BIOS at the CTRL+A prompt.
Curiously, even after resetting everything, I STILL had to "accept" the new configuration the first time. But after that, everything worked normally! 🙂
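If you have several cards to recover, the three arcconf commands from the fix can be wrapped in a small script. This is a minimal sketch, not a tested procedure: the function name recover_controller and the ARCCONF override variable are hypothetical, and the controller ID (1) and image name (as680T01.ufi) are just the values from this post. It stops at the first command that fails, so a failed flash is not followed by a configuration reset. arcconf may ask for confirmation on some of these commands. The script also can't handle the physical part of the workaround, booting with the option ROM disabled and then moving the card to an enabled slot.

```shell
#!/usr/bin/env bash
# Sketch: re-run the firmware update and reset sequence from this post.
# ARCCONF is a hypothetical override so you can point at a specific
# arcconf binary; it defaults to whatever "arcconf" is on the PATH.
ARCCONF="${ARCCONF:-arcconf}"

# recover_controller CONTROLLER_ID IMAGE_FILE (hypothetical helper name)
recover_controller() {
  local ctrl="$1" image="$2"

  # Refuse to continue if the firmware image is missing.
  if [ ! -f "$image" ]; then
    echo "firmware image not found: $image" >&2
    return 1
  fi

  # 1. Flash the downloaded firmware/BIOS image.
  "$ARCCONF" romupdate "$ctrl" "$image" || return 1
  # 2. Reset the controller configuration to factory defaults.
  "$ARCCONF" setconfig "$ctrl" default || return 1
  # 3. Clear the controller's statistics counters.
  "$ARCCONF" resetstatisticscounters "$ctrl" || return 1
}
```

For example, after sourcing the file, run: recover_controller 1 as680T01.ufi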