Business Service Health Monitoring and Event Correlation to Minimize MTTR

Trianz worked with a global insurance provider to enhance its infrastructure analytics and AI-powered remediation capabilities in the cloud. The insurance provider delivers services and products to individuals, families, and corporate customers worldwide.

Challenges for the Insurance Provider with Cloud Visibility

The insurance provider was struggling with an excessive mean time to repair (MTTR) for incidents on its network. It was difficult to gain visibility into the health of business services and the services they depended on, and manual thresholds and rule-based alerts were not fit for purpose. The result was excessive event noise and inefficient IT operations management (ITOM).

The client operated a diverse multi-cloud architecture but lacked cross-cloud insight, which limited its visibility into the underlying infrastructure.

Trianz was enlisted to build an IT Operations (ITOps) platform with intelligent monitoring algorithms that would reduce the client's service incident volume using data-driven analytics and AI remediation playbooks. The platform would bridge the gaps between the client's clouds, centralize monitoring for the entire IT network, and reduce the mean time to detect (MTTD) incidents through better data correlation.

Technology Components

Trianz worked with the insurance provider to implement the following technology components:

  • BMC Cloud was used to orchestrate the insurance provider's multi-cloud IT environment, including provisioning IT resources across clouds, ingesting and centralizing monitoring data, cost management, dependency mapping, inventory management, and independent software vendor (ISV) relationship and licensing management.

  • Splunk IT Service Intelligence (ITSI) was added to provide end-to-end service visibility and to streamline incident resolution through real-time event correlation, automated incident prioritization, predictive analytics, and other IT service management (ITSM) orchestration capabilities.

How Trianz Helped the Insurance Provider Enable Multi-Cloud Analytics and Insight with BMC and Splunk

Trianz started by conducting workshops across all business departments, helping key stakeholders understand the current IT challenges and the plan of action for addressing them with BMC and Splunk.

From here, our infrastructure experts prototyped a solution using the requested technology components and then sought feedback from the insurance provider.

After approval, the system was developed further. Sample operations data was collected from numerous monitoring solutions and used to develop correlation policies for event de-duplication. These policies became the basis of a custom incident and event correlation algorithm that Trianz developed in-house for the insurance provider's specific business context.
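Trianz's correlation algorithm is custom and is not described here. As a rough illustration of what an event de-duplication policy can look like, the Python sketch below collapses events that share a host and check name into a single incident as long as each repeat arrives within a time window of the previous one. The `Event` and `Incident` types, their fields, and the 300-second window are hypothetical choices for this example, not details of the insurance provider's system.

```python
from dataclasses import dataclass, field
from typing import Iterable


@dataclass(frozen=True)
class Event:
    # Hypothetical normalized event; real monitoring tools emit many more fields.
    timestamp: float  # seconds since epoch
    source: str       # monitoring tool that raised the event
    host: str
    check: str        # e.g. "cpu_high", "disk_full"


@dataclass
class Incident:
    host: str
    check: str
    first_seen: float
    last_seen: float
    events: list = field(default_factory=list)


def deduplicate(events: Iterable[Event], window: float = 300.0) -> list[Incident]:
    """Collapse events sharing (host, check) into one incident while each new
    event arrives within `window` seconds of the previous matching event."""
    open_incidents = {}  # (host, check) -> currently active Incident
    closed = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        # `source` is deliberately not part of the key, so the same symptom
        # reported by two different tools is grouped together.
        key = (ev.host, ev.check)
        inc = open_incidents.get(key)
        if inc is not None and ev.timestamp - inc.last_seen <= window:
            # Duplicate of an active incident: attach it instead of alerting again.
            inc.events.append(ev)
            inc.last_seen = ev.timestamp
        else:
            if inc is not None:
                closed.append(inc)  # gap exceeded the window; close the old incident
            open_incidents[key] = Incident(ev.host, ev.check, ev.timestamp, ev.timestamp, [ev])
    closed.extend(open_incidents.values())
    return sorted(closed, key=lambda i: i.first_seen)


# Example: two tools report the same CPU problem a minute apart, then it recurs later.
events = [
    Event(0, "tool_a", "db01", "cpu_high"),
    Event(60, "tool_b", "db01", "cpu_high"),
    Event(120, "tool_a", "web01", "disk_full"),
    Event(1000, "tool_a", "db01", "cpu_high"),
]
incidents = deduplicate(events)
```

Four raw events become three incidents here: the two overlapping CPU alerts merge, while the recurrence at 1000 seconds falls outside the window and opens a new incident.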

Next, Splunk ITSI was integrated with the BMC multi-cloud platform to collect and analyze event and incident data, removing IT silos in the process. Splunk dashboards provided customizable monitoring and alerting that employees could view on their own devices or display on shared TV screens for greater IT awareness.

Finally, the completed system was stress-tested, and the resulting incident prediction model was demonstrated. The model uses historical events and trends to predict IT incidents before they occur.

Transformational Effects

The insurance provider saw tangible benefits after adopting the new BMC and Splunk platform.

To start, root cause analysis (RCA) became much simpler and more efficient, enabling the insurance provider to remediate IT problems quickly and effectively using AI, analytics, and system log data.

Anomaly detection acted as a safety net for unpredictable IT events, while the proactive incident remediation algorithm automated a large share of incident response workflows. This freed up time for the IT department without sacrificing IT service quality.
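The case study does not say which anomaly detection technique was used. One common alternative to the static, manual thresholds described earlier is a rolling z-score, sketched below in Python: each metric value is compared with the mean and standard deviation of the preceding window of values. The function name `detect_anomalies`, the 20-sample window, and the threshold of 3 standard deviations are assumptions chosen for illustration.

```python
import statistics
from collections import deque


def detect_anomalies(values, window=20, threshold=3.0):
    """Return indices whose value lies more than `threshold` standard deviations
    from the mean of the preceding `window` values (a rolling z-score)."""
    history = deque(maxlen=window)  # sliding baseline of recent values
    anomalies = []
    for i, value in enumerate(values):
        # Only score a point once a full baseline window is available.
        if len(history) == window:
            baseline = list(history)
            mean = statistics.fmean(baseline)
            stdev = statistics.pstdev(baseline)
            # A flat baseline (stdev == 0) gives no basis for a z-score; skip it.
            if stdev > 0 and abs(value - mean) / stdev > threshold:
                anomalies.append(i)
        history.append(value)
    return anomalies


# Example: a steady response-time series (ms) with a single spike at index 20.
latency = [100, 102, 98, 101, 99] * 4 + [250, 100, 101]
anomalies = detect_anomalies(latency)
print(anomalies)  # [20]
```

Because the baseline is learned from recent data, the alert boundary moves with normal variation instead of being fixed by hand. Production systems often add seasonality handling on top of a simple baseline like this one.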

A centralized, consolidated view of critical KPIs, server metrics, and business service health metrics led to greater awareness, faster incident response, and a better understanding of how service health relates to the underlying systems.
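To illustrate how business service health can be tied to the health of underlying systems, the Python sketch below rolls weighted KPI scores up into a single 0 to 100 service health score and treats each dependency's health as one more weighted input. The `Service` class, the KPI names, the weights, and the averaging rule are all hypothetical; Splunk ITSI computes its own health scores, and this is not its algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    name: str
    # Hypothetical inputs: KPI name -> (score from 0 to 100, importance weight).
    kpis: dict = field(default_factory=dict)
    depends_on: list = field(default_factory=list)
    dependency_weight: float = 1.0

    def health(self) -> float:
        """Weighted average of KPI scores, counting each dependency's own health
        as one more weighted input. Assumes the dependency graph has no cycles."""
        inputs = list(self.kpis.values())
        inputs += [(dep.health(), self.dependency_weight) for dep in self.depends_on]
        total_weight = sum(weight for _, weight in inputs)
        if total_weight == 0:
            return 100.0  # nothing measured: report the service as healthy
        return sum(score * weight for score, weight in inputs) / total_weight


# Example: a customer-facing portal whose own KPIs look fine but whose database does not.
db = Service("database", kpis={"query_latency": (40.0, 3), "cpu_utilization": (80.0, 1)})
portal = Service("claims_portal", kpis={"error_rate": (100.0, 2)}, depends_on=[db])
print(db.health())                # 50.0
print(round(portal.health(), 1))  # 83.3
```

The portal's score drops below 100 even though its own KPI is healthy, which surfaces the link between a business service and the system it relies on.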

Historical trends, patterns, and problem areas were tracked and archived, providing the data that the custom machine learning algorithm uses for incident remediation and event correlation.

Experience the Trianz Difference

Trianz enables digital transformations through effective strategies and excellence in execution. Collaborating with business and technology leaders, we help formulate and execute operational strategies that achieve intended business outcomes by bringing together the best of consulting, technology expertise, and execution models.

Powered by knowledge, research, and experience, we enable clients to transform their business ecosystems and achieve superior performance by leveraging infrastructure, cloud, analytics, digital, and security paradigms. Reach out to learn more.
