Ubuntu: Machine Check Error occurs when installing Ubuntu (include image of log)



Question:

Here is the image for the logs: Image here. The installation process hung at this point. The Kernel panic message was not displayed.

The MCEs (at bottom of image) occurred soon after I selected "install Ubuntu" from the menu. I don't have any idea what CPU 0 ..., Bank 7 ..., TSC 0 ADDR <number> MISC <number>, or PROCESSOR 0:306f2 TIME <number> SOCKET 0 APIC 0 microcode 2d mean. Can someone explain them? And, based on your experience or expertise, what might have triggered these messages? RAM, CPU, PSU or something else?

Also, the log mentions Run the above through mcelog --ascii. Where can I run any command like this in this situation?

Here are some specs for my setup:

  • USB stick for Ubuntu 16.04, created with UNetBootin;
  • Processor: Xeon E5-1650 v3;
  • Motherboard: ASRock X99 WS-E;
  • Power supply: EVGA SUPERNOVA 1600 G2 120-G2-1600-X1;
  • RAM: 16GB 288-Pin SDRAM DDR4 2400 ECC Registered;
  • GPU: EVGA GTX 680;

If any more information is helpful, please let me know. I really appreciate your help!

Edit: Just to be clear, my computer does not have any OS installed yet. I am building it from scratch. I encountered this problem when I was trying to install Ubuntu. Later, I made a Windows USB stick, but it didn't work either. After the Windows logo was displayed for 5 seconds, the screen went black and nothing happened.


Solution:1

The first step to decoding Machine Check Exception errors is to install mcelog and run that:

    sudo apt-get install mcelog
    sudo mcelog --ascii

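With --ascii, mcelog decodes the machine check text it reads from standard input, so you can run it on any working Linux system (a live session or another machine) and type or paste in the MCE lines from the log. Maybe that will provide something more human readable.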
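One field from the log in the question can also be decoded by hand. In the PROCESSOR 0:306f2 part, my understanding is that the kernel prints the CPU vendor code (0 for Intel) followed by the CPUID signature in hex. A minimal shell sketch, assuming the standard CPUID family/model/stepping bit layout; the function name decode_cpuid_sig is made up for illustration:

```shell
#!/bin/sh
# Decode the hex CPUID signature printed after "PROCESSOR 0:" in the MCE log.
# decode_cpuid_sig is a hypothetical helper name, not part of mcelog.
decode_cpuid_sig() {
    sig=$(( 0x$1 ))
    stepping=$(( sig & 0xf ))
    model=$(( (sig >> 4) & 0xf ))
    family=$(( (sig >> 8) & 0xf ))
    ext_model=$(( (sig >> 16) & 0xf ))
    ext_family=$(( (sig >> 20) & 0xff ))
    # The extended model bits only apply to families 6 and 15.
    if [ "$family" -eq 6 ] || [ "$family" -eq 15 ]; then
        model=$(( (ext_model << 4) | model ))
    fi
    # The extended family bits only apply to family 15.
    if [ "$family" -eq 15 ]; then
        family=$(( family + ext_family ))
    fi
    echo "family=$family model=$model stepping=$stepping"
}

decode_cpuid_sig 306f2
```

This prints family=6 model=63 stepping=2, which is the Haswell-E/EP generation and so matches the Xeon E5-1650 v3. The trailing "microcode 2d" is the microcode revision the CPU was running, also in hex.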


Solution:2

MCEs are usually caused by hardware issues. However, on Haswell, Broadwell and Skylake processors, they can also be caused by outdated firmware that lacks the microcode updates needed to work around processor errata (defects). The Xeon E5 v3 does have several MCE-generating errata, so it requires reasonably up-to-date firmware to get microcode capable of supporting Linux.

The procedures for dealing with possible hardware defects are well known, and you will find plenty of information and guides online if you search for them. I will answer from the microcode/firmware angle, which is a lot less well known.

Assuming you are not doing anything as idiotic as insisting on trying to overclock/undervolt/underclock a system that is reporting MCE errors (i.e. ensure every overclock feature of the mainboard is inactive):

  1. Install the latest firmware (BIOS/UEFI) update from the system vendor, or chances are you will not even manage to install a Linux distro because it will crash before the end of the install (or corrupt the installed image).

     If you installed that Xeon on a desktop board, you may have to pester the motherboard vendor for a new BIOS version with the latest microcode and memory controller firmware from Intel, or hack that BIOS yourself to update its built-in microcode with the latest available from Intel. Search BIOS modding forums for help, but do try talking to the board vendor first (ASRock for the X99 WS-E listed in the question; the EVGA parts are the PSU and GPU): an official BIOS is much better.

  2. Install the intel-microcode package ("CPU microcode driver") when prompted for it. As long as the firmware has new enough microcode to finish installing Ubuntu and boot the system without crashing, the intel-microcode package can be used to fix most remaining microcode issues.
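To see whether a BIOS update or the intel-microcode package actually changed anything, you can compare the running microcode revision with the "microcode 2d" value from the MCE log. A rough sketch, assuming /proc/cpuinfo carries a "microcode" line as it normally does on x86 Linux; both helper names are hypothetical:

```shell
#!/bin/sh
# Print the microcode revision found in a cpuinfo-style file.
# $1: optional path, defaults to /proc/cpuinfo (hypothetical helper).
microcode_rev() {
    awk -F': *' '/^microcode/ { print $2; exit }' "${1:-/proc/cpuinfo}"
}

# Succeed when the revision in the file is newer than the one from the log.
# $1: revision from the MCE log, hex without the 0x prefix (e.g. 2d)
# $2: optional cpuinfo-style file, passed through to microcode_rev
microcode_is_newer() {
    current=$(microcode_rev "$2")
    [ -n "$current" ] && [ $(( $current )) -gt $(( 0x$1 )) ]
}
```

On the installed system, running microcode_is_newer 2d && echo "microcode updated" reports whether the revision has moved past 0x2d. The check only shows that the revision changed; it cannot tell you which revision fixes a given erratum.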
