HSM Alarm Codes

The Luna PCIe HSM 7 alarm messages indicate error conditions on the HSM card that might require user intervention. The alarms apply to a Luna HSM that is compliant with security level FIPS 140-2 Level 3. The alarm messages provide enough detail to alert HSM users to important events. Each alarm message has a unique character string for the message ID, which allows higher-level tools on the host system to parse for the alarm message IDs and generate notifications.

On Linux host systems, messages are saved to the system log file, which host application software such as SNMP can parse. On Windows host systems, messages are saved to the Windows Event Viewer.

Messages can be retrieved with the "dmesg" utility, which reads the driver log. The driver log collects messages from the bootloader (BL), the firmware (FW), and the Host Driver itself.
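The message IDs can be pulled out of driver-log lines with a simple pattern match. The following is a minimal sketch, not a supported tool. It assumes that alarm lines look like the sample lines shown under Stored Data Integrity (for example, "k7pf0: [HSM] ALM2024: Stored data integrity verify error"); the names ALARM_RE and parse_alarm are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed line shape, based on the sample log lines in this document:
#   k7pf0: [HSM] ALM2024: Stored data integrity verify error
ALARM_RE = re.compile(r"ALM(\d{4}):\s*(.*\S)")


def parse_alarm(line: str) -> Optional[Tuple[str, str]]:
    """Return (alarm_id, message) if the line contains an alarm, else None."""
    match = ALARM_RE.search(line)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

On a Linux host, each line of the dmesg output could be passed to parse_alarm, and any non-None result forwarded to a notification tool.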

This section contains the following information:

>Alarm Generation and Handling

>List of HSM Alarm Codes

>HSM Alarm Code Samples

>Stored Data Integrity

Alarm Generation and Handling

Alarm messages are generated when the HSM BL, FW, or Host Driver SW detects an unexpected condition. Other alarm messages are generated after unexpected interrupts or tamper events. In each case, detailed error information and an alarm message are output to notify the user of the event.

Each tamper event results in at least one alarm message from BL, FW, or the Host Driver. Depending on the type of tamper, all three may report an alarm message for the same tamper event. The message timestamps help you identify which alarm messages belong to the same tamper event. Tamper alarm messages from BL, FW, and Host Driver have the same text description for the same tamper event. A specific type of tamper event is not reported again until FW clears the tamper information in the tamper circuit. If the tamper event is reported again after that, then either a new tamper condition has been detected or the same tamper condition is still active and cannot be cleared.
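Because alarms for the same tamper event share a text description and have nearby timestamps, a host-side tool can group them. The sketch below illustrates the idea only. The record shape (timestamp in seconds, source, description), the 60-second window, and the function name group_tamper_alarms are all assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Each record is (timestamp_seconds, source, description). The record shape
# and the default 60-second window are assumptions made for this example.
Record = Tuple[float, str, str]


def group_tamper_alarms(records: List[Record], window: float = 60.0) -> List[List[Record]]:
    """Group alarms that share a description and lie close together in time."""
    by_text: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        by_text[rec[2]].append(rec)

    groups: List[List[Record]] = []
    for recs in by_text.values():
        recs.sort(key=lambda r: r[0])
        current = [recs[0]]
        for rec in recs[1:]:
            # Start a new group when the gap to the previous alarm is too large.
            if rec[0] - current[-1][0] > window:
                groups.append(current)
                current = [rec]
            else:
                current.append(rec)
        groups.append(current)
    return groups
```

Alarms with the same description that are far apart in time land in separate groups, which matches the case where a tamper is reported again after FW has cleared it.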

Alarm Handling for Special Situations

Alarm messages are still generated during rare occurrences where BL, FW, or Host Driver might be in an abnormal state.

As long as the Host Driver is running, the BL and FW can output their alarm messages to the DLOG (driver log), which can be parsed to notify the user. If either BL or FW stops execution because it detected an error, it outputs an alarm message to the Host Driver, which stores it in DLOG. All BL and FW checking for alarm conditions then stops, but all HW tamper event monitoring (soft and hard tampers) remains enabled, including Host Driver monitoring. The card reset caused by these tampers restarts BL and possibly FW, and the alarm messages are output. The following situations are also handled:

>BL starts before Host Driver is loaded (System power-up): Without Host Driver available, BL outputs all alarms only to an internal HSM log. When the Host Driver loads it resets the HSM card, causing BL to start again. BL can then send any new alarms to the host driver and either stop or proceed to FW, as the situation allows.

For an L3 card, if FW is started, it outputs alarm messages for any existing tamper conditions. Any tamper event alarm messages, including those not sent out while the Host Driver was not loaded, can be fetched from the FRAM Log.

NOTE   If needed, use lunash:> hsm supportInfo to output the FRAM Log in order to determine the tamper information, or to pass on to Thales Technical Support.

>FW halted due to internal error: FW can start only if the Host Driver is running, so the FW halted alarm message is stored in DLOG. No further BL or FW alarm messages are generated in this state until the next card reset.

>FW in locked state (tamper clear required): An alarm message is generated to signal locked state is active. FW is still doing periodic checks and FW alarm messages are still possible. Only a small subset of FW commands is available.  

>FW in Secure Transport Mode (STM): An alarm message is generated to signal STM is active. FW is still doing periodic checks and FW alarm messages are still possible. Only a small subset of FW commands is available.

>Host Driver loses communications with the HSM card: If the Host Driver has any errors communicating with the K7 (BL or FW), it generates alarm messages. The Host Driver also periodically checks that the Luna PCIe HSM 7 card is still present on the PCIe bus (for example, a chassis open event causes a cold reset of the HSM). If there is no response for a pre-determined period of time, an alarm message is generated.

FRAM Log

The Boot Loader and firmware also store all alarm event information in the FRAM Log in the non-volatile FRAM device on the K7. There is no specific FRAM Log partition for DLOG or alarm messages. Use LUNADIAG to retrieve the FRAM Log contents and return them to Thales Customer Support for further analysis. If the Host Driver is unavailable to receive this information, it is still present in the FRAM Log and can be retrieved long after the alarm event has finished.

List of HSM Alarm Codes

ALM ID | Alarm Message | Description | Info
Host Driver | | | Tamper Flag
0001 | Soft tamper - over voltage | HSM voltage is above the operating range. HSM will stay in reset until voltage goes back in range. | HCCSR: VST
0002 | Soft tamper - temperature (nnC) | HSM temperature (nn degrees Celsius) is outside the range (-2C to 80C). HSM will stay in reset until temperature goes back in range. | HRCSR: TST
0003 | Soft tamper - indeterminate cause | A soft tamper occurred, but the cause cannot be determined. |
0004 | Hard tamper - high temperature | HSM temperature is higher than 88C. | HT_T
0005 | Hard tamper - low temperature | HSM temperature is lower than -40C. | LT_T
0006 | Hard tamper - over voltage | HSM voltage is higher than the maximum allowed. | OV_T, TC3_T
0009 | Hard tamper - oscillator failure | HSM tamper clock oscillator has failed. | OSC_T
0010 | Decommission signal triggered | Decommission button (connector P9) has been pressed. | TC2_T
0011 | Hard tamper - indeterminate cause | A hard tamper occurred, but the cause cannot be determined. |
0012 | Hardware Error | Error detected in device hardware. |
0013 | High Temperature - nnC | HSM has reached nn degrees Celsius and needs to be cooled to avoid a tamper event. |
0014 | Low Battery | HSM battery voltage is below 2.75V. The battery needs to be replaced soon. |
0015 | PCIe Link Failure | HSM no longer appears on PCIe bus. Chassis may have been opened. |
0016 | Device Error | Internal error detected during communications with HSM. |
0017 | Request Timed Out | Request to HSM took too long. |
 
Boot Loader | | | Tamper Flag
1000 | Unknown alarm ID xx in boot loader | Illegal alarm ID used in Boot Loader. |
1001 | HSM restart required | Soft or hard tamper occurred. HSM needs to be restarted (reset) before firmware is allowed to run. |
1003 | HSM halted - internal boot loader error | Boot Loader detected an error during diagnostics and did not jump to FW. |
1004 | Warning - boot loader diagnostic error | Boot Loader detected an error during diagnostics that does not stop execution but needs to be investigated (for example, fan, VPD, or RTC problems). |
1005 | HSM FW signature check failed | The FW image on the HSM failed authentication and will not be executed. |
1006 | Soft tamper temperature/voltage | HSM voltage or temperature is outside the acceptable range. HSM will stay in reset until back in range. | PORSM status reg.
1007 | Hard tamper - high temperature | HSM temperature is higher than 88C. | HT_T
1008 | Hard tamper - low temperature | HSM temperature is lower than -40C. | LT_T
1009 | Hard tamper - over voltage | HSM voltage is higher than the maximum allowed. | OV_T, TC3_T
1012 | Hard tamper - oscillator failure | HSM tamper clock oscillator has failed. | OSC_T
1013 | Hard tamper - tamper configuration invalid | HSM tamper configuration lost (set to defaults) due to power loss. | FS_T
1014 | Chassis opened | Chassis open switch (connector P7) has been triggered. | TC1_T
1015 | HSM removed from chassis | HSM was removed from host chassis, then re-inserted. | CS
1016 | Decommission signal triggered | Decommission button (connector P9) has been pressed. | TC2_T
 
Firmware | | |
2000 | Unknown alarm ID xx in firmware | Illegal alarm ID used in firmware. |
2001 | High temperature warning activated | HSM temperature is above 75C (FW checks every 2 minutes). This warning will not re-appear unless temperature drops below 75C and goes back up again. |
2002 | High temperature warning deactivated | HSM temperature has dropped below 75C. |
2003 | Battery low voltage warning | Battery voltage is below 2.75V (FW checks every hour). This warning will not re-appear unless voltage goes above 2.75V then back down. Battery should be replaced soon. |
2004 | Battery depleted | Battery voltage is below 2.5V (FW checks every hour). HSM FW will be halted. Battery must be replaced. |
2005 | HSM deactivated | Auto-activation data has been cleared. |
2006 | HSM decommissioned by FW | All user crypto material has been invalidated due to KEK CRC failure, decommission signal, or tamper (if decommission on tamper enabled). |
2007 | HSM zeroized | All user crypto material has been erased. HSM product credentials still exist. This can occur for a variety of reasons including manual zeroization. |
2008 | Internal data corruption | Settings that control tamper monitoring are incorrect, or Critical Security Parameter data (MTK) is invalid (if the tamper monitoring settings are incorrect, they are corrected). Otherwise, there was an unexpected tamper security write protection change. |
2009 | HSM halted - internal firmware error | FW detected an error which caused it to halt itself. Can also be errors generated by the kernel such as: bad exception, out of memory, unrecoverable errors. |
2010 | HSM locked - tamper clear required | Limited set of FW commands available due to an HSM tamper condition. Tamper needs to be cleared before proceeding. Controlled tamper recovery must be enabled for this message to appear. |
2011 | HSM unlocked - tamper clear done | Tamper was cleared when in controlled tamper recovery mode. |
2012 | HSM in secure transport mode | Checked on every FW start-up to remind the user to do a recovery operation. Limited set of FW commands available. |
2013 | HSM recovered from secure transport mode | HSM in secure transport mode was recovered back to normal mode. |
2014 | Auto-activation data invalid - HSM deactivated | FW checked auto-activation data validity and failed. Re-activation required. |
2015 | Hard tamper - high temperature (L3 only) | HSM temperature was higher than 88C. | HT_T
2016 | Hard tamper - low temperature (L3 only) | HSM temperature was lower than -40C. | LT_T
2017 | Hard tamper - over voltage (L3 only) | HSM voltage was higher than the maximum allowed. | OV_T, TC3_T
2018 | Hard tamper - oscillator failure (L3 only) | HSM tamper clock oscillator has failed. | OSC_T
2019 | Hard tamper - tamper configuration invalid (L3 only) | HSM tamper configuration lost (set to defaults) due to power loss. | FS_T
2020 | Chassis opened | Chassis open switch (connector P7) has been triggered. | TC1_T
2021 | HSM was removed from chassis | HSM was removed from host chassis just before this FW execution. HSM will be deactivated. | CS
2022 | Decommission signal triggered | Decommission button (connector P9) has been pressed. | TC2_T
2023 | HSM fan x failure | Fault detected in HSM on-board fan (fan 1 or fan 2). |
2024 | Stored data integrity verify error | Integrity of an object or CSP did not verify correctly. See Stored Data Integrity. |
2025 | Firmware update in progress | A firmware update procedure is in progress. Recorded in the logs, but not shown onscreen. [Added with firmware 7.7.0] |
2026 | Firmware update canceled | A firmware update procedure was halted due to insufficient memory to continue. The HSM rolls back to the previous firmware. [Added with firmware 7.7.0] |
2027 | HSM storage exceeded | Attempt to use storage beyond the size of a partition (which was doubled with firmware 7.7.0). The update proceeds to completion, but some restrictions apply to the affected partition; see Compare Behavior of Pre-Firmware 7.7, and V0, and V1 Partitions. This alarm is recorded only in the logs, not onscreen, but the message "HSM storage is currently over capacity" is shown onscreen. [Added with firmware 7.7.0] |
2028 | HSM capacity exceeded | Attempt to exceed the total memory size of the HSM cancels the operation. Refer to your backups. [Added with firmware 7.7.0] |
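
In the list above, the first digit of the alarm ID indicates which component reported the alarm. A host-side tool could use that to route notifications. The sketch below is illustrative only: the ranges are extrapolated from the IDs listed here (0001 to 0017, 1000 to 1016, 2000 to 2028), and the function name alarm_source is hypothetical.

```python
def alarm_source(alarm_id: str) -> str:
    """Map a four-digit alarm ID to the component that reports it.

    The ranges are an assumption extrapolated from the listed IDs.
    """
    value = int(alarm_id)
    if 1 <= value <= 999:
        return "Host Driver"
    if 1000 <= value <= 1999:
        return "Boot Loader"
    if 2000 <= value <= 2999:
        return "Firmware"
    return "Unknown"
```

For example, alarm_source("1005") returns "Boot Loader" for the FW signature check failure.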

HSM Alarm Code Samples

This section shows the details of some of the alarm event scenarios.

ALM = alarm message.

Temperature - High Warning

If the HSM temperature reaches 75 degrees Celsius and then drops back below 75C, the following actions occur:

>Temperature >= 75C

After 5 minutes at this temperature or higher, the Host Driver receives a 'High Temperature Warning' interrupt and issues an ALM

Firmware checks temperature at start-up and once per hour

Firmware issues ALM for high temperature warning activated

>Temperature < 75C

Firmware issues ALM for high temperature warning deactivated
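
The activate/deactivate behavior can be modeled as a small state machine that reports only changes of state, so the warning is not repeated while the temperature stays high. This is a sketch of the behavior described here, not the firmware's implementation. The class name HighTempWarning is hypothetical, and treating exactly 75C as "high" follows the ">= 75C" condition in this scenario.

```python
from typing import Optional

WARNING_THRESHOLD_C = 75  # threshold from the alarm list (ALM 2001 and 2002)


class HighTempWarning:
    """Tracks whether the high temperature warning is currently active."""

    def __init__(self) -> None:
        self.active = False

    def update(self, temperature_c: float) -> Optional[str]:
        """Return the alarm to issue for this reading, or None if no change."""
        if temperature_c >= WARNING_THRESHOLD_C and not self.active:
            self.active = True
            return "High temperature warning activated"
        if temperature_c < WARNING_THRESHOLD_C and self.active:
            self.active = False
            return "High temperature warning deactivated"
        return None
```

Feeding the readings 70, 76, 78, 74, 76 produces one activation, one deactivation, and then a second activation; the reading of 78 produces nothing because the warning is already active.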

Temperature – High Soft Tamper

When the temperature starts below 75C, reaches the high soft tamper limit of 80C, and then drops back below 75C, the following actions occur:

>Temperature >= 75C

After 5 minutes at this temperature or higher, the Host Driver receives a High Temperature Warning interrupt and issues an ALM

Firmware issues ALM for activation of high temperature warning

>Temperature >= 80C

Soft Tamper reset – card put into reset. Stays in reset until temperature lowers.  

Host Driver receives soft tamper interrupt and issues ALM (only one when soft tamper condition starts).

>Temperature < 80C

Bootloader issues soft tamper ALM, then an ALM that HSM restart is required and waits for host reset.  

User receives ALM and runs the “hsm restart” command in LunaCM/Lunash.

Bootloader starts – jumps to firmware.  

Firmware starts – no actions taken for the soft tamper. If temperature >= 75C, firmware re-issues ALM for activation of high temperature warning.

>Temperature < 75C

Firmware issues ALM for deactivation of high temperature warning.

Temperature – High Hard Tamper

When the temperature starts below 75C, reaches the high hard tamper limit of 88C, and then drops back below 75C, the following actions occur:

>Same as the soft tamper sequence described above, up to the point where the card is held in soft tamper reset

>Temperature > 88C

Hard Tamper reset – card is held in hard tamper reset for 5 seconds, then returns to soft tamper reset. K7 HW erases/resets all internal temporary memory. Tamper chip latches time and type of tamper. Host driver receives hard tamper interrupt and issues ALM.

HSM also erases auto-activation and STM data in tamper chip  

If decommission on tamper is enabled then key encryption data is erased in tamper chip as well

>Temperature < 80C

Bootloader starts – issues hard tamper ALM and logs it in FRAM Log  

Bootloader issues ALM that HSM restart is required and waits for host reset.  

User receives ALM and goes to LunaCM/Lunash to perform an hsm restart command.  

Bootloader starts – jumps to firmware.  

Firmware starts – saves hard tamper latches. If controlled tamper recovery is enabled, firmware locks HSM commands to a minimal subset only, and issues ALM for HSM locked. User must go to LunaCM/Lunash and perform a “tamper clear” command to get a full HSM command set. When tamper clear is issued, firmware outputs an ALM for HSM unlocked.  

Firmware – issues deactivation and decommission (if enabled for tamper) ALMs  

If temperature >= 75C, firmware re-issues ALM for activation of high temperature warning.

>Temperature < 75C

Firmware issues ALM for deactivation of high temperature warning

>Temperature < 80C  

Bootloader starts – issues hard tamper ALM  

Bootloader erases all of flash except for Boot Loader area and issues ALM for 'HSM permanently tampered'  

Bootloader issues ALM that 'HSM restart is required' and waits for host reset.  

User receives ALM and goes to LunaCM/Lunash to do an “hsm restart” command.  

Bootloader starts – only bootloader commands are available. Bootloader again issues ALM for 'HSM permanently tampered'. User can dump the FRAM Log using LUNADIAG.
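
The temperature thresholds used in the three scenarios above, together with the low-temperature limits from the alarm list, can be summarized in one function. This is a sketch for illustration. The function name temperature_zone is hypothetical, and the handling of readings exactly on a boundary (for example, treating 80C as a soft tamper and 88C as not yet a hard tamper) is an assumption based on the ">=" and ">" conditions shown in the scenarios.

```python
def temperature_zone(temperature_c: float) -> str:
    """Classify a temperature reading using the limits given in this document."""
    # Hard tamper: higher than 88C or lower than -40C.
    if temperature_c > 88 or temperature_c < -40:
        return "hard tamper"
    # Soft tamper: outside the range -2C to 80C.
    if temperature_c >= 80 or temperature_c < -2:
        return "soft tamper"
    # High temperature warning: 75C or higher.
    if temperature_c >= 75:
        return "high temperature warning"
    return "normal"
```

The checks run from most severe to least severe, so a reading that satisfies several conditions is reported at its highest severity.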

Hard Tampers During Storage

When the HSM is powered off, its tamper detection is powered by the on-card battery, so some hard tampers can occur when main power is not applied. Some conditions that cause a tamper (for example, high or low temperature) might no longer be present when the HSM is powered back on, while others (for example, enclosure penetration or oscillator failure) might never clear. If tampers occur while in storage, then after the HSM is powered up, the bootloader runs and logs the tamper events to the FRAM Log and the serial port. Because the host K7 driver has not started yet, none of the messages from the bootloader are sent to the host, but other alarm messages are output later to notify the user.

Bootloader waits for the host driver to be loaded  

When the host driver starts up it immediately resets the HSM causing the bootloader to run again  

Bootloader does not re-log the same tamper events  

Bootloader jumps to firmware, which outputs the ALM for the tamper event. If controlled tamper recovery is enabled, firmware also outputs an ALM for 'HSM locked - tamper clear required'. The user can then use LunaCM or Lunash to clear the tamper.

NOTE   If needed, use lunash:> hsm supportInfo to output the FRAM Log in order to determine the tamper information, or to pass on to Thales Technical Support.

Decommission with power on

If the HSM is powered on and a decommission is triggered either by the decommission switch or by a tamper (if decommission on tamper is enabled) then the HSM goes into reset for 5 seconds. The following alarm messages are output to FRAM Log, serial port, and host driver:

>The host driver immediately receives an interrupt and outputs an ALM for 'decommission triggered'

>After 5 seconds elapse, the bootloader starts running and also outputs an ALM for 'decommission triggered'

>Bootloader outputs an ALM for 'HSM restart required' and then waits  

>User gets alarm notification and performs an HSM restart  

>Bootloader restarts and jumps to firmware, which finishes the decommission operations and outputs an ALM for 'HSM decommissioned by firmware' and an ALM for 'HSM locked' (if enabled)
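
A monitoring script can check that the alarms for this scenario appeared in the expected order. The sketch below is an illustration under stated assumptions. The alarm IDs are inferred by matching the descriptions in the steps above to the alarm list (0010 and 1016 for the decommission signal, 1001 for restart required, 2006 for decommissioned by firmware), and the names EXPECTED_DECOMMISSION_IDS and contains_in_order are hypothetical.

```python
from typing import Iterable, Sequence

# Alarm IDs inferred from the alarm list for "decommission with power on".
# ALM 2010 (HSM locked) is left out because it appears only if enabled.
EXPECTED_DECOMMISSION_IDS = ["0010", "1016", "1001", "2006"]


def contains_in_order(observed: Iterable[str], expected: Sequence[str]) -> bool:
    """True if `expected` occurs within `observed` as an ordered subsequence."""
    remaining = iter(observed)
    # Each search resumes where the previous one stopped, which enforces order.
    return all(any(item == wanted for item in remaining) for wanted in expected)
```

Other alarms may be interleaved with the expected ones; only the relative order of the expected IDs is checked.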

Decommission with power off

If the HSM is powered off and a decommission is triggered either by the decommission switch or by a tamper (if decommission on tamper is enabled) then the decommission is latched in the tamper chip. When the HSM is powered on the following alarm messages are output:

>Bootloader starts running and outputs an ALM for 'Decommission triggered' only to FRAM Log and serial port since the host driver is not loaded yet  

>Bootloader waits for the driver to be loaded which then forces a host reset  

>Bootloader restarts and jumps to firmware, which finishes the decommission operations and outputs an ALM for 'HSM decommissioned by firmware' and an ALM for 'HSM locked' (if enabled)

NOTE   If needed, use lunash:> hsm supportInfo to output the FRAM Log in order to determine the tamper information, or to pass on to Thales Technical Support.

Chassis open with power on

If the HSM is powered on and the chassis open switch is triggered, then a cold reset is performed on the HSM, which effectively removes the HSM from the PCIe bus. After about 10 seconds, the HSM is released from reset and the following alarm messages are output:

>Host Driver notices the device is no longer present on the PCIe bus and outputs an ALM for 'HSM missing from PCIe bus'  

>Bootloader starts running and outputs an ALM for 'HSM chassis opened' only to FRAM Log and serial port  

>Bootloader waits for the driver to be loaded  

>User gets notification of missing HSM and power-cycles the host system

>Bootloader starts running and does not re-log the same tamper events

>Bootloader waits for the host driver to be loaded

>When the host driver starts up it immediately resets the HSM causing Bootloader to run again

>Bootloader jumps to firmware which finishes the chassis opened operations and firmware outputs an ALM for 'HSM chassis opened' and an ALM for 'HSM locked' (if enabled).

NOTE   If the chassis is still open then the HSM performs a cold reset after the tampers are cleared by firmware.

If needed, use lunash:> hsm supportInfo to output the FRAM Log in order to determine the tamper information, or to pass on to Thales Technical Support.

Chassis open with power off

If the HSM is powered off and the chassis open switch is triggered, then the chassis open is latched in the tamper chip. When the HSM is powered on, the following alarm messages are output:

>Bootloader starts running and outputs an ALM for 'HSM chassis opened' only to FRAM Log and serial port

>Bootloader waits for the driver to be loaded which then forces a host reset

>Bootloader starts running and does not re-log the same tamper events

>Bootloader jumps to firmware which finishes the chassis opened operations and firmware outputs an ALM for 'HSM chassis opened' and an ALM for 'HSM locked' (if enabled)

NOTE   If the chassis is still open then the HSM performs a cold reset after the tampers are cleared by firmware.

Card removal

When an HSM is powered off and removed from the chassis a card removal latch is saved in the tamper chip. When the HSM is powered on the following alarm messages are output:

>Bootloader starts running and outputs an ALM for 'card removal' only to FRAM Log and serial port

>Bootloader waits for the driver to be loaded which then forces a host reset

>Bootloader starts running and does not re-log the same tamper events

>Bootloader restarts and jumps to firmware which outputs an ALM for 'HSM was removed from the chassis' and an ALM for 'HSM locked' (if enabled)

NOTE   If needed, use lunash:> hsm supportInfo to output the FRAM Log in order to determine the tamper information, or to pass on to Thales Technical Support.

Stored Data Integrity

The HSM performs data integrity checks at startup and during runtime.

Startup

If a check fails during startup, meaning that an object stored in flash memory was corrupted, then ALM 2024 is generated, along with additional log messages, and the HSM firmware halts:

k7pf0: [HSM] ALM2024: Stored data integrity verify error 
... additional messages that might include "LOG (SEVERE)" and "LOG (CRITICAL)", "Fatal error", and possibly also
k7pf0: [HSM] ALM2009: HSM halted - internal firmware error 

What to do

1. Restart the HSM.

2. If the ALM persists, cycle the power to the HSM.

3. If the ALM persists, zeroize the HSM.

4. If the ALM persists, contact Support.
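
The steps above form an escalation ladder: each step is tried only if the alarm persists after the previous one. A runbook tool could encode that as follows. This is a minimal sketch; the names STARTUP_SDI_STEPS and next_startup_sdi_step are hypothetical, and the function only returns the text of the next step, it does not perform any HSM operation.

```python
# Escalation steps for a startup ALM 2024, in the order given in this section.
# Note that zeroizing erases all user crypto material (see ALM 2007).
STARTUP_SDI_STEPS = [
    "Restart the HSM",
    "Cycle the power to the HSM",
    "Zeroize the HSM",
    "Contact Support",
]


def next_startup_sdi_step(steps_already_tried: int) -> str:
    """Return the next step to try, given how many steps have already failed."""
    index = min(max(steps_already_tried, 0), len(STARTUP_SDI_STEPS) - 1)
    return STARTUP_SDI_STEPS[index]
```

Once every earlier step has been tried, the function keeps returning the last step, contacting Support.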

Runtime

If a check fails during runtime, meaning that an object stored in volatile memory was corrupted, then ALM 2024 is generated, along with log messages, and the HSM is unable to perform any actions that involve the corrupted object:

k7pf0: [HSM] ALM2024: Stored data integrity verify error 
... additional messages that might include "LOG (SEVERE)"

What to do

1. Try restarting the HSM.

2. If an SDI alarm occurs during startup, see the section about "Startup", above.

3. If no SDI alarm occurs during startup, but an SDI alarm occurs later, contact Support.

Appliance reports out-of-service (OOS) code 30

Anything that halts the firmware (such as ALM 2004, ALM 2009, or ALM 2026) results in an out-of-service code 30. Other critical events that halt the firmware include:

>failed self-test

>failure in the random number generator

>failure in integrity of the bootloader

>failure in integrity of the firmware

>failure in integrity of the HSM memory
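
When investigating an OOS code 30, it can help to scan the alarm IDs collected from the log for the ones this section names as halting the firmware. The sketch below is illustrative only. The set contains just the three IDs named above, so it does not cover the other critical events in the list, and the names FIRMWARE_HALT_ALARMS and halting_alarms are hypothetical.

```python
from typing import Iterable, List

# Alarm IDs that this section names as halting the firmware.
FIRMWARE_HALT_ALARMS = {"2004", "2009", "2026"}


def halting_alarms(alarm_ids: Iterable[str]) -> List[str]:
    """Return the sorted, de-duplicated halting alarms found in alarm_ids."""
    return sorted(set(alarm_ids) & FIRMWARE_HALT_ALARMS)
```

An empty result does not rule out a firmware halt; it only means none of the three named alarms was seen.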