Alarms
In a production environment, the STH component is typically monitored with the following alarms. Each entry also includes guidelines on how to react when the alarm is raised:
- Alarm STH-0:
    - Description: Error when connecting to MongoDB, error when starting the Hapi server, or any uncaught exception.
    - Severity: CRITICAL.
    - Detection strategy: `| lvl=ERROR | corr=NA | trans=NA | op=OPER_STH_SHUTDOWN` in the log messages.
    - Stop condition: `| lvl=INFO | corr=NA | trans=NA | op=OPER_STH_SERVER_LOG |` and msg containing 'Everything OK' in the log messages.
    - Procedure:
        - Check the logs to infer the concrete error.
        - If the error occurred when connecting to MongoDB:
            - Check that the MongoDB instance or replica set is running. If not, start it up.
            - Check that the machine where the STH is running has connectivity to the MongoDB instance or replica set. If it is not accessible, make it accessible.
        - If it is any other error:
            - Restart the STH server.
            - Contact the development team to inform them about this error.
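The detection strategy and stop condition for STH-0 amount to matching a few `key=value` fields in each log line. The following is a minimal sketch of that check, assuming log lines are made of `key=value` pairs separated by `|`, as the patterns above suggest. The function names and the sample lines are hypothetical, and a `msg` value that itself contains `|` would not be parsed correctly by this simple split.

```python
def parse_log_line(line):
    """Split a 'key=value | key=value' log line into a dict of fields."""
    fields = {}
    for part in line.split("|"):
        key, sep, value = part.partition("=")
        if sep:  # ignore fragments that carry no '='
            fields[key.strip()] = value.strip()
    return fields


def classify_sth0(line):
    """Return 'raise', 'clear', or None for one log line (STH-0 only)."""
    f = parse_log_line(line)
    if f.get("corr") != "NA" or f.get("trans") != "NA":
        return None
    if f.get("lvl") == "ERROR" and f.get("op") == "OPER_STH_SHUTDOWN":
        return "raise"
    if (f.get("lvl") == "INFO"
            and f.get("op") == "OPER_STH_SERVER_LOG"
            and "Everything OK" in f.get("msg", "")):
        return "clear"
    return None


# Hypothetical sample lines, shaped after the patterns listed above.
shutdown = "lvl=ERROR | corr=NA | trans=NA | op=OPER_STH_SHUTDOWN | msg=Error when connecting to MongoDB"
healthy = "lvl=INFO | corr=NA | trans=NA | op=OPER_STH_SERVER_LOG | msg=Everything OK"

print(classify_sth0(shutdown))
print(classify_sth0(healthy))
```

Swapping `OPER_STH_SHUTDOWN` for `OPER_STH_SERVER_LOG` in the `ERROR` branch would give the equivalent check for an alarm that is detected on that operation instead.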
- Alarm STH-1:
    - Description: Internal Hapi server error.
    - Severity: CRITICAL.
    - Detection strategy: `| lvl=ERROR | corr=NA | trans=NA | op=OPER_STH_SERVER_LOG` in the log messages.
    - Stop condition: `| lvl=INFO | corr=NA | trans=NA | op=OPER_STH_SERVER_LOG |` and msg containing 'Everything OK' in the log messages.
    - Procedure:
        - Restart the STH server.
        - Contact the development team to inform them about this error.
- Alarm STH-2:
    - Description: Error when getting raw or aggregated data from a MongoDB collection.
    - Severity: CRITICAL.
    - Detection strategy: `lvl=ERROR` and `msg=Error when getting data from collection` in the log messages.
    - Stop condition: `| lvl=INFO | corr=NA | trans=NA | op=OPER_STH_SERVER_LOG |` and msg containing 'Everything OK' in the log messages.
    - Procedure:
        - Check that the MongoDB instance or replica set is running. If not, start it up.
        - Check that the machine where the STH is running has connectivity to the MongoDB instance or replica set. If it is not accessible, make it accessible.
        - Contact the development team to inform them about this error.
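The connectivity check in the procedure above can be approximated from the machine running the STH by trying to open a TCP connection to the MongoDB host. This is only a sketch: the function name `can_reach` and the host name in the usage comment are hypothetical, 27017 is MongoDB's default port and may differ in your deployment, and a successful TCP connection shows network reachability only, not that the database is healthy.

```python
import socket


def can_reach(host, port=27017, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution errors.
        return False


# Example usage with a hypothetical host name:
# can_reach("mongodb.example.internal")
```

For a replica set, the same check would be run once per member.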
- Alarm STH-3:
    - Description: Error when getting the MongoDB collection from which the raw or aggregated data should be retrieved.
    - Severity: CRITICAL.
    - Detection strategy: `lvl=ERROR` and `msg=Error when getting the collection` in the log messages.
    - Stop condition: `| lvl=INFO | corr=NA | trans=NA | op=OPER_STH_SERVER_LOG |` and msg containing 'Everything OK' in the log messages.
    - Procedure:
        - Check that the MongoDB instance or replica set is running. If not, start it up.
        - Check that the machine where the STH is running has connectivity to the MongoDB instance or replica set. If it is not accessible, make it accessible.
        - The problem could be related to the limit MongoDB imposes on the maximum namespace size (for further information, see http://docs.mongodb.org/manual/reference/limits/ for the concrete MongoDB instance version).
        - Contact the development team to inform them about this error.
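To judge whether the namespace limit mentioned above is a plausible cause, it can help to measure the namespace in bytes. A MongoDB namespace is the database name, a dot, and the collection name. The sketch below assumes you already know the database and collection names the STH tried to use. The function names and example names are hypothetical, and the limit is passed in as a parameter because it depends on the MongoDB version (older releases document 120 bytes; take the value from the limits page for your version).

```python
def namespace_length(database, collection):
    """Length in bytes of the '<database>.<collection>' namespace."""
    return len(f"{database}.{collection}".encode("utf-8"))


def exceeds_namespace_limit(database, collection, limit_bytes):
    """True if the namespace is longer than the given limit in bytes."""
    return namespace_length(database, collection) > limit_bytes


# Hypothetical names; 120 is the limit documented for older MongoDB releases.
print(exceeds_namespace_limit("sth_service", "x" * 150, limit_bytes=120))
```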
- Alarm STH-4:
    - Description: Error when storing raw data in the corresponding MongoDB collection.
    - Severity: CRITICAL.
    - Detection strategy: `lvl=ERROR` and `msg=Error when storing the raw data associated to a notification event` in the log messages.
    - Stop condition: `| lvl=INFO | corr=NA | trans=NA | op=OPER_STH_SERVER_LOG |` and msg containing 'Everything OK' in the log messages.
    - Procedure:
        - Check that the MongoDB instance or replica set is running. If not, start it up.
        - Check that the machine where the STH is running has connectivity to the MongoDB instance or replica set. If it is not accessible, make it accessible.
        - Contact the development team to inform them about this error.
- Alarm STH-5:
    - Description: Error when storing aggregated data in the corresponding MongoDB collection.
    - Severity: CRITICAL.
    - Detection strategy: `lvl=ERROR` and `msg=Error when storing the aggregated data associated to a notification event` in the log messages.
    - Stop condition: `| lvl=INFO | corr=NA | trans=NA | op=OPER_STH_SERVER_LOG |` and msg containing 'Everything OK' in the log messages.
    - Procedure:
        - Check that the MongoDB instance or replica set is running. If not, start it up.
        - Check that the machine where the STH is running has connectivity to the MongoDB instance or replica set. If it is not accessible, make it accessible.
        - Contact the development team to inform them about this error.
- Alarm STH-6:
    - Description: Error when creating the index that enforces the TTL in a newly created collection.
    - Severity: CRITICAL.
    - Detection strategy: `lvl=ERROR` and `msg=Error when creating the index for TTL for collection` in the log messages.
    - Stop condition: `| lvl=INFO | corr=NA | trans=NA | op=OPER_STH_SERVER_LOG |` and msg containing 'Everything OK' in the log messages.
    - Procedure:
        - Check that the MongoDB instance or replica set is running. If not, start it up.
        - Check that the machine where the STH is running has connectivity to the MongoDB instance or replica set. If it is not accessible, make it accessible.
        - Contact the development team to inform them about this error.
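Alarms STH-2 to STH-6 are all detected by an `ERROR` level plus a specific message text, and they share one stop condition, so they lend themselves to a table-driven check. The sketch below tracks which of them are active after reading a sequence of log lines. It uses plain substring matching on each line; the names `MESSAGE_ALARMS` and `active_alarms` and the sample lines are hypothetical, and a real monitoring system would normally express these rules in its own configuration.

```python
# Message text that raises each alarm, copied from the detection strategies.
MESSAGE_ALARMS = {
    "STH-2": "Error when getting data from collection",
    "STH-3": "Error when getting the collection",
    "STH-4": "Error when storing the raw data associated to a notification event",
    "STH-5": "Error when storing the aggregated data associated to a notification event",
    "STH-6": "Error when creating the index for TTL for collection",
}


def active_alarms(lines):
    """Return the set of alarms still raised after processing all lines."""
    active = set()
    for line in lines:
        if "lvl=ERROR" in line:
            for alarm, text in MESSAGE_ALARMS.items():
                if "msg=" + text in line:
                    active.add(alarm)
        elif ("lvl=INFO" in line
              and "op=OPER_STH_SERVER_LOG" in line
              and "Everything OK" in line):
            # The shared stop condition clears every alarm tracked here.
            active.clear()
    return active


# Hypothetical sample lines, shaped after the patterns listed above.
sample = [
    "lvl=ERROR | corr=NA | trans=NA | op=OPER_STH_POST | msg=Error when storing the raw data associated to a notification event",
    "lvl=ERROR | corr=NA | trans=NA | op=OPER_STH_POST | msg=Error when creating the index for TTL for collection",
]
print(sorted(active_alarms(sample)))
```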