Metadata-Version: 2.1
Name: monasca-notification
Version: 1.14.2.dev1
Summary: Reads alarms from Kafka and then notifies the customer using their configured notification method.
Home-page: https://github.com/openstack/monasca-notification
Author: OpenStack
Author-email: openstack-dev@lists.openstack.org
License: Apache
Description: Team and repository tags
        ========================
        
        .. image:: https://governance.openstack.org/tc/badges/monasca-notification.svg
            :target: https://governance.openstack.org/tc/reference/tags/index.html
        
        .. Change things from this point on
        
        Notification Engine
        ===================
        
        This engine reads alarms from Kafka and then notifies the customer using
        the configured notification method. Multiple notification and retry
        engines can run in parallel, up to one per available Kafka partition.
        Zookeeper is used to negotiate access to the Kafka partitions whenever a
        new process joins or leaves the working set.
        
        Architecture
        ============
        
        The notification engine generates notifications using the following
        steps:
        
        1. Read alarms from Kafka, with no auto commit. -
           monasca\_common.kafka.KafkaConsumer class
        2. Determine the notification type for an alarm by reading from MySQL. -
           AlarmProcessor class
        3. Send notification. - NotificationProcessor class
        4. Add successful notifications to a sent notification topic. - NotificationEngine class
        5. Add failed notifications to a retry topic. - NotificationEngine class
        6. Commit offset to Kafka. - KafkaConsumer class
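        
        As a rough illustration of the steps above, the following sketch
        models the loop with in-memory stand-ins. ``FakeConsumer``,
        ``run_engine``, ``lookup_notifications``, ``send`` and the dict-based
        topics are hypothetical names, not the monasca-notification API. The
        only point is the ordering: the offset is committed after the outcome
        of each alarm has been published.
        
        ::
        
            class FakeConsumer:
                """Hypothetical stand-in for a consumer with manual commits."""
        
                def __init__(self, messages):
                    self.messages = messages
                    self.committed = 0  # offset of the next message to read
        
                def __iter__(self):
                    for offset in range(self.committed, len(self.messages)):
                        yield offset, self.messages[offset]
        
                def commit(self, offset):
                    self.committed = offset + 1
        
        
            def run_engine(consumer, lookup_notifications, send, topics):
                sent = topics["notification_topic"]
                retry = topics["notification_retry_topic"]
                for offset, alarm in consumer:                    # step 1
                    for notification in lookup_notifications(alarm):  # step 2
                        if send(notification):                    # step 3
                            sent.append(notification)             # step 4
                        else:
                            retry.append(notification)            # step 5
                    consumer.commit(offset)                       # step 6
        
        
            topics = {"notification_topic": [], "notification_retry_topic": []}
            consumer = FakeConsumer([{"id": "a1"}, {"id": "a2"}])
            run_engine(
                consumer,
                lookup_notifications=lambda alarm: [
                    {"alarm": alarm["id"], "type": "webhook"}
                ],
                send=lambda n: n["alarm"] != "a2",  # pretend the second send fails
                topics=topics,
            )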
        
        The notification engine uses three Kafka topics:
        
        1. alarm\_topic: Alarms inbound to the notification engine.
        2. notification\_topic: Successfully sent notifications.
        3. notification\_retry\_topic: Failed notifications.
        
        A retry engine runs in parallel with the notification engine and gives
        any failed notification a configurable number of extra chances at
        success.
        
        The retry engine generates notifications using the following steps:
        
        1. Read notification json data from Kafka, with no auto commit. - KafkaConsumer class
        2. Rebuild the notification that failed. - RetryEngine class
        3. Send notification. - NotificationProcessor class
        4. Add successful notifications to a sent notification topic. - RetryEngine class
        5. Add failed notifications that have not hit the retry limit back to the retry topic. -
           RetryEngine class
        6. Discard failed notifications that have hit the retry limit. - RetryEngine class
        7. Commit offset to Kafka. - KafkaConsumer class
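        
        The retry decision in steps 3 to 6 can be sketched as follows. This is
        a minimal illustration, not the RetryEngine code: ``retry_once``, the
        ``retry_count`` field and the ``max_retries`` parameter are assumed
        names, and the sketch assumes the limit counts attempts made by the
        retry engine.
        
        ::
        
            def retry_once(notification, send, topics, max_retries):
                """Process one message read from the retry topic (sketch)."""
                if send(notification):
                    topics["notification_topic"].append(notification)
                    return "sent"
                # Record the failed attempt on a copy of the notification.
                retried = dict(notification)
                retried["retry_count"] = notification.get("retry_count", 0) + 1
                if retried["retry_count"] < max_retries:
                    topics["notification_retry_topic"].append(retried)
                    return "requeued"
                return "discarded"  # retry limit reached
        
        
            topics = {"notification_topic": [], "notification_retry_topic": []}
            always_fail = lambda n: False
            first = retry_once({"id": "n1"}, always_fail, topics, max_retries=2)
            requeued = topics["notification_retry_topic"].pop()
            second = retry_once(requeued, always_fail, topics, max_retries=2)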
        
        The retry engine uses two Kafka topics:
        
        1. notification\_retry\_topic: Notifications that need to be retried.
        2. notification\_topic: Successfully sent notifications.
        
        Fault Tolerance
        ---------------
        
        When reading from the alarm topic, offsets are not committed
        automatically; the commit is done only after processing. This allows
        processing to continue even though some notifications can be slow. In
        the event of a catastrophic failure, some notifications could have been
        sent for alarms that have not yet been acknowledged. This is an
        acceptable failure mode: it is better to send a notification twice than
        not at all.
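        
        To make the failure mode concrete, here is a small simulation under
        assumed names (``process``, ``state``, ``crash_after`` are
        illustrative only). A crash after sending but before committing leaves
        the offset unchanged, so the same alarm is sent again after the
        restart.
        
        ::
        
            def process(log, state, send, crash_after=None):
                """Send each message from the committed offset onward."""
                for offset in range(state["committed"], len(log)):
                    send(log[offset])
                    if offset == crash_after:
                        raise RuntimeError("simulated crash before commit")
                    state["committed"] = offset + 1  # commit after processing
        
        
            sent = []
            state = {"committed": 0}
            log = ["alarm-0", "alarm-1", "alarm-2"]
            try:
                process(log, state, sent.append, crash_after=1)
            except RuntimeError:
                pass  # a process supervisor would restart the daemon here
            process(log, state, sent.append)
            # "alarm-1" now appears twice in ``sent``; nothing was lost.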
        
        The general process when a major error is encountered is to exit the
        daemon, which should allow the other processes to renegotiate access to
        the Kafka partitions. It is also assumed that the notification engine
        will be run by a process supervisor that restarts it in case of a
        failure. In this way, any errors that are not easy to recover from are
        handled automatically by the service restarting and the active daemon
        switching to another instance.
        
        Though this should cover all errors, there is a risk that an alarm or
        a set of alarms is processed and its notifications are sent out
        multiple times. To minimize this risk, a number of techniques are used:
        
        -  Timeouts are implemented for all notification types.
        -  An alarm TTL is utilized. Any alarm older than the TTL is not
           processed.
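        
        The TTL check amounts to comparing the age of an alarm with a
        threshold. The sketch below assumes alarm timestamps in seconds since
        the epoch; ``is_expired`` and its parameters are hypothetical names,
        not taken from the code base.
        
        ::
        
            import time
        
        
            def is_expired(alarm_timestamp, ttl_seconds, now=None):
                """Return True if the alarm is older than the TTL."""
                now = time.time() if now is None else now
                return now - alarm_timestamp > ttl_seconds
        
        
            old = is_expired(alarm_timestamp=100, ttl_seconds=60, now=200)
            fresh = is_expired(alarm_timestamp=150, ttl_seconds=60, now=200)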
        
        Operation
        =========
        
        ``oslo.config`` is used for handling configuration options. A sample
        configuration file ``etc/monasca/notification.conf.sample`` can be
        generated by running:
        
        ::
        
            tox -e genconfig
        
        Monitoring
        ----------
        
        StatsD is incorporated into the daemon and sends all stats to the
        StatsD server launched by monasca-agent. The default host and port
        point to **localhost:8125**.
        
        -  Counters
        
           -  ConsumedFromKafka
           -  AlarmsFailedParse
           -  AlarmsNoNotification
           -  NotificationsCreated
           -  NotificationsSentSMTP
           -  NotificationsSentWebhook
           -  NotificationsSentPagerduty
           -  NotificationsSentFailed
           -  NotificationsInvalidType
           -  AlarmsFinished
           -  PublishedToKafka
        
        -  Timers
        
           -  ConfigDBTime
           -  SendNotificationTime
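        
        For orientation, plain StatsD encodes a counter as ``name:value|c``
        and a timer as ``name:value|ms``, sent as a UDP datagram. The helpers
        below are a sketch of that wire format only; the function names are
        made up here, and the client used by the daemon may add a prefix or
        dimensions to the metric names.
        
        ::
        
            import socket
        
        
            def statsd_counter(name, value=1):
                """Encode a counter increment in the plain StatsD format."""
                return "{}:{}|c".format(name, value).encode("ascii")
        
        
            def statsd_timer(name, milliseconds):
                """Encode a timing sample in the plain StatsD format."""
                return "{}:{}|ms".format(name, milliseconds).encode("ascii")
        
        
            def send_stat(payload, host="localhost", port=8125):
                """Send one encoded metric to the StatsD server over UDP."""
                sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
                try:
                    sock.sendto(payload, (host, port))
                finally:
                    sock.close()
        
        
            counter = statsd_counter("ConsumedFromKafka")
            timer = statsd_timer("SendNotificationTime", 12)
            # send_stat(counter) would emit the datagram to localhost:8125.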
        
        Future Considerations
        =====================
        
        - More extensive load testing is needed:
        
           - How fast is the MySQL DB, and how much load do we put on it?
             Initially I think it makes most sense to read notification details
             for each alarm, but eventually I may want to cache that info.
           - How expensive are commits to Kafka for every message we read? Should
             we commit every N messages?
           - How efficient is the default Kafka consumer batch size?
           - Currently we can get ~200 notifications per second per
             NotificationEngine instance using webhooks to a local HTTP server. Is
             that fast enough?
           - Are we putting too much load on Kafka at ~200 commits per second?
        
        
Keywords: openstack monitoring email
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Topic :: System :: Monitoring
Provides-Extra: test
Provides-Extra: jira_plugin
