Incident Management

Incident Management and Response

Incident management is the process of managing and responding to unplanned events or disruptions that affect an organization's services or infrastructure. Its goal is to minimize the impact of these events and to restore normal service operation as quickly as possible. Incident management typically involves identifying and classifying incidents, investigating and diagnosing the root cause, implementing a resolution, and communicating with relevant stakeholders throughout the process. It is a critical component of an organization's overall service management strategy, because effective incident management helps ensure the availability and reliability of the organization's services and infrastructure. Our incident management is based on general best practices such as ITIL 4, combined with well-proven methodologies from both AWS and KeyCore.

In KeyCore Managed Services, incident management and response involves the following steps:

Incident identification: Incidents are identified through various means, such as monitoring systems, user reports, or problem management processes.
Incident classification: Once an incident has been identified, it is classified according to its severity and impact on the organization's services. This helps prioritize the incident and determine the appropriate response.
Incident investigation: KeyCore Managed Services and AWS Managed Services (AMS) use their expertise to investigate the incident and identify the root cause.
Incident resolution: Based on the findings of the investigation, KeyCore Managed Services and AMS work to resolve the incident as quickly as possible, using their expertise and resources to restore normal service operation.
Incident communication: Throughout the incident management and response process, KeyCore Managed Services and AMS communicate with relevant stakeholders, such as customers and users, to keep them informed of the status of the incident and any actions being taken to resolve it.
Incident review: Once the incident has been resolved, KeyCore Managed Services and AMS conduct a review to identify any lessons learned and areas for improvement in the incident management and response process.

Overall, the goal of incident management and response in this setting is to minimize the impact of incidents on the organization's services and ensure timely and effective resolution.

Delivery Process

KeyCore Managed Services - Incident Management Process

This process starts with the initial detection of an incident and the raising of a corresponding ticket.

Each incident is recorded so that it can be tracked, monitored, and updated throughout its life cycle.

Act No: 1

Act Name: Raise Ticket in ITSM

Owner: Incident Requester / Service Desk Agent

Description: Once an incident is detected, its details are logged in the ITSM tool to raise an incident ticket. The Service Desk checks the Known Error Database (KEDB) to determine whether it is a known error or issue.

Decision Box

Act Name: Is it an Incident?

Owner: Service Desk

Description: The Service Desk determines whether the ticket is an Incident or a Service Request (the Service Request process contains those definitions). Service Requests are small, repeatable requests for work, such as password resets, access requests, or requests for information.

If the ticket is a Service Request, follow the Service Request process.

Output: Service Request or Incident

Act No: 2

Act Name: Categorize and Prioritize Incident

Owner: Service Desk

Description: Categorize and prioritize the Incident in the ITSM tool.

Categorization means assigning the Category, Type and Item (CTI) so that the ticket can be assigned correctly. Some incidents relate to third parties and are not assigned to the L2 teams; for these categories, the Service Desk raises the ticket and assigns it directly to the third-party vendor.

Incidents are prioritized as P1, P2, P3 or P4 based on the impact and urgency of the issue, following the company's prioritization scheme. The priority then determines how critically the Incident is treated.

Output: Categorized and Prioritized Incident
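The text says priority comes from impact and urgency but does not publish the actual mapping. As a hypothetical illustration only, the sketch below uses a common ITIL-style impact × urgency matrix on an assumed three-level scale; the `Level` scale, the score buckets and the function names are assumptions, not KeyCore's scheme.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical three-level scale, used for both impact and urgency."""
    HIGH = 1
    MEDIUM = 2
    LOW = 3


# Hypothetical bucketing of impact + urgency (2 = most severe, 6 = least severe).
_SCORE_TO_PRIORITY = {2: "P1", 3: "P2", 4: "P3", 5: "P4", 6: "P4"}


def priority(impact: Level, urgency: Level) -> str:
    """Return P1-P4 for an incident from its impact and urgency."""
    return _SCORE_TO_PRIORITY[impact + urgency]


def is_major_incident(impact: Level, urgency: Level) -> bool:
    """P1 incidents trigger the critical/major incident handling process."""
    return priority(impact, urgency) == "P1"
```

With this matrix, a high-impact, high-urgency incident becomes P1, while high impact with low urgency lands in P3. The real thresholds would come from the company's own prioritization scheme.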

Decision Box

Act Name: Is this a P1 Incident?

Owner: Service Desk / Incident Manager

Description: If the incident is critical (P1), it triggers the critical/major incident handling process.

Output: Incident handled as a critical/major incident, or continued as a normal incident

Act No: 3

Act Name: Assign to L2 Resolver Group

Owner: Service Desk

Description: Assign the Incident to the appropriate resolver group. Assignment is based on the categorization of the Incident.

Output: Resolver group identified
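Because assignment follows the CTI categorization, and third-party categories go straight to the vendor instead of an L2 team, routing can be expressed as a simple lookup. This is a minimal sketch; every category, resolver-group and vendor name below (and the fallback to the Service Desk) is a hypothetical placeholder, since the actual CTI catalogue is defined per customer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CTI:
    """Category, Type and Item assigned to a ticket during categorization."""
    category: str
    type_: str
    item: str


# Hypothetical routing table: (category, type) -> L2 resolver group.
L2_ROUTING = {
    ("Infrastructure", "Compute"): "L2 Infrastructure",
    ("Infrastructure", "Network"): "L2 Network",
    ("Application", "Web"): "L2 Application",
}

# Hypothetical third-party categories, assigned directly to the vendor.
THIRD_PARTY_VENDORS = {
    "Payment gateway": "Payment provider support",
}

FALLBACK_GROUP = "Service Desk"  # hypothetical: unknown CTIs go back for re-triage


def assign(cti: CTI) -> str:
    """Return the resolver group or vendor that should receive the ticket."""
    if cti.category in THIRD_PARTY_VENDORS:
        return THIRD_PARTY_VENDORS[cti.category]
    return L2_ROUTING.get((cti.category, cti.type_), FALLBACK_GROUP)
```

Checking third-party categories first mirrors the process: those tickets never reach an L2 queue.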

Act No: 4

Act Name: Review and Update Incident

Owner: L2 Team

Description: Upon receiving an Incident, the L2 Team reviews and updates the ticket.

Ensure the following has been captured correctly:

· Priority

· Assignment

· Categorization

If additional information is needed to understand the issue, contact the customer who raised the ticket with the Service Desk directly.

Output: Reviewed and updated incident

Act No: 5

Act Name: Investigate and Diagnose Incident

Owner: L2 Team

Description: Carry out investigation and diagnosis activities to identify a workaround or resolution for the issue. Record all investigation and diagnosis activities in the ITSM incident.

Output: Resolution identified

Act No: 6

Act Name: Resolution Provided

Owner: L2 Team

Description: Apply the resolution to the Incident and update the ticket with the resolution activities.

Output: Resolved Incident

Decision Box

Act Name: L3 Support required?

Owner: L2 Team

Description:

If the L2 Team is able to resolve the Incident, the resolution is recorded in the ticket.

If the L2 Team is unable to resolve the Incident, it functionally escalates the Incident to the respective L3 vendor.

Output: L3 Support required or not

Act No: 7

Act Name: Engage Respective L3 Vendor

Owner: L2 Team

Description: If the L2 resolver group cannot find a resolution and determines that L3 support is required, the Incident is assigned to the respective L3 vendor. All communication with the L3 vendor is recorded in the ticket.

Output: L3 vendor engaged

Act No: 8

Act Name: Update Ticket Status to ‘Pending’

Owner: L2 Team

Description: Update the ITSM ticket to show that the issue has been raised with the vendor, and set the ticket status to “Pending” to stop the SLA clock.

Output: Ticket status set to “Pending”
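Since “Pending” stops the SLA clock, the time that counts against the SLA is the elapsed time minus the time spent in “Pending”. The sketch below computes this from a hypothetical, chronologically sorted list of (timestamp, new status) pairs; the event format and any status names other than “Pending” are assumptions, not the ITSM tool's actual data model.

```python
from datetime import datetime, timedelta


def sla_elapsed(events: list[tuple[datetime, str]], now: datetime) -> timedelta:
    """Sum the time the ticket spent in any status other than 'Pending'.

    `events` is a chronologically sorted list of (timestamp, new_status)
    pairs, starting with the moment the ticket was raised.
    """
    elapsed = timedelta()
    # Each status lasts until the next status change, or until `now` for the last one.
    ends = [timestamp for timestamp, _ in events[1:]] + [now]
    for (start, status), end in zip(events, ends):
        if status != "Pending":  # the SLA clock is stopped while Pending
            elapsed += end - start
    return elapsed
```

For example, a ticket that is open for one hour, pending with the L3 vendor for two hours, then back in progress for one hour has used two hours of its SLA.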

Act No: 9

Act Name: Investigate and Diagnose Incident

Owner: L3 Vendor

Description: Carry out investigation and diagnosis activities to identify a workaround or resolution for the issue. Record all investigation and diagnosis activities in the Incident.

The L2 Team continues to update the ticket based on the updates provided by the L3 vendor.

Output: Resolution identified

Act No: 10

Act Name: Resolution Provided

Owner: L3 Vendor

Description: Apply the resolution identified during investigation and diagnosis, and inform the L2 Team about it. If the L3 vendor does not have access to the affected system, the L2 Team applies the resolution provided by the L3 vendor.

Output: Resolution implemented

Act No: 11

Act Name: Verify Resolution

Owner: Service Desk

Description: Verify the resolution by contacting the customer who raised the Incident, checking the alarm, or running other tests. L2 support may be involved at this stage if required. If the user is not satisfied with the resolution, the ticket is transferred back to the respective L2 team.

Output: Resolution verified

Act No: 12

Act Name: Close Ticket

Owner: Service Desk

Description: The Incident Requester closes the Incident once the resolution has been verified as correct and the customer is satisfied. The process also checks that the Incident record is fully updated and assigns a closure category.

Output: Ticket closed
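Read end to end, Acts 1 to 12 describe a ticket lifecycle with two loops: escalation from L2 to an L3 vendor, and a return to L2 when the customer is not satisfied with the resolution. The sketch below encodes that flow as an allowed-transition table. The state names are paraphrased from the acts and are illustrative, not the ITSM tool's real status values (apart from “Pending”). “Service Request” and “Major Incident” are treated as exits to other processes.

```python
# Allowed status transitions, paraphrased from Acts 1-12 (illustrative names).
TRANSITIONS: dict[str, set[str]] = {
    "Raised": {"Service Request", "Categorized"},
    "Categorized": {"Major Incident", "Assigned to L2"},
    "Assigned to L2": {"L2 Investigating"},
    "L2 Investigating": {"Resolved", "Pending"},  # Pending = escalated to L3 vendor
    "Pending": {"Resolved"},  # resolution applied by L3, or by L2 on L3's behalf
    "Resolved": {"Verified", "Assigned to L2"},  # back to L2 if the customer is not satisfied
    "Verified": {"Closed"},
}


class Ticket:
    """Minimal ticket that only allows transitions listed in TRANSITIONS."""

    def __init__(self) -> None:
        self.status = "Raised"
        self.history = ["Raised"]

    def move_to(self, new_status: str) -> None:
        allowed = TRANSITIONS.get(self.status, set())
        if new_status not in allowed:
            raise ValueError(f"Cannot move from {self.status!r} to {new_status!r}")
        self.status = new_status
        self.history.append(new_status)
```

Keeping the flow in a table makes rule checks cheap; for instance, a ticket cannot be closed before its resolution has been verified.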

Process Actors

Incident Reporter

The incident reporter may be an end user who has experienced an issue with a service or product, or it may be an automated alarm.

The role of the incident reporter is to provide as much information as possible about the issue, including details on the symptoms, any error messages or other relevant information, and the steps taken to reproduce the issue. This information is crucial for the incident management team to properly identify and resolve the issue.

In KeyCore Managed Services, the incident reporter can be one of the following:

Customer team member - when a problem is identified outside the IT part of the workload
KeyCore Specialist - our team will log problems when we identify architectural issues or potential problems in your solutions.
AMS Specialist - AWS / AMS will log problems when they detect them through repeated alerts or through advanced pattern recognition.

Incident Manager

The incident manager is responsible for coordinating and managing the resolution of incidents that occur in the cloud environment. This includes identifying the root cause of the incident, coordinating with relevant teams to resolve the issue, and communicating with stakeholders about the status of the incident and any necessary actions.

The incident manager works closely with KeyCore and AMS to ensure that the cloud infrastructure and applications are operating efficiently and effectively. They may also work with the customer's development team to ensure that any issues with the application are addressed and resolved.

Overall, the incident manager plays a key role in ensuring the reliability and availability of the cloud environment, as well as ensuring that any issues are quickly and effectively addressed to minimize disruption to the customer's business.

KeyCore Application Specialist

An AWS Certified application specialist from KeyCore.

Application Team

One or more representatives from the team responsible for the application. Depending on application and setup this role may vary - it is defined during on-boarding.

AMS Cloud Experts

For infrastructure-related problems, the dedicated team from AWS Managed Services participates in the problem management process.

AWS Support

Both the KeyCore and AMS teams may need to escalate incidents to the relevant service teams at AWS; this is done through AWS Support.

Service Level Agreement

Incidents are covered by SLAs that depend on which Service Tier has been selected.

Incident Response Time - SLA Tier Premium

Service commitment (24/7/365)	Key Performance Indicator	SLA Tier Premium
Incident management: response time	Incident P1 (Critical) response	<= 15 min
Incident management: response time	Incident P2 (High) response	<= 4 hours
Incident management: response time	Incident P3 (Moderate) response	<= 12 hours
Incident management: restoration/resolution time	Incident P1 (Critical) resolution	<= 4 hours (restoration)
Incident management: restoration/resolution time	Incident P2 (High) resolution	<= 8 hours (restoration)
Incident management: restoration/resolution time	Incident P3 (Moderate) resolution	<= 24 hours (restoration)

Incident Response Time - SLA Tier Business

Service commitment (24/7/365)	Key Performance Indicator	SLA Tier Business
Incident management: response time	Incident P1 (Critical) response	<= 1 hour
Incident management: response time	Incident P2 (High) response	<= 4 hours
Incident management: response time	Incident P3 (Moderate) response	<= 12 hours
Incident management: restoration/resolution time	Incident P1 (Critical) resolution	<= 4 hours (restoration)
Incident management: restoration/resolution time	Incident P2 (High) resolution	<= 8 hours (restoration)
Incident management: restoration/resolution time	Incident P3 (Moderate) resolution	<= 24 hours (restoration)
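The targets in the two tables can be checked mechanically. The sketch below copies the response and restoration targets for the tiers named in the table column headers (Premium and Business) and tests a measured duration against the "<=" target. P4 is not listed in the tables, so it is omitted. The dictionary layout and the `within_sla` name are illustrative. The measured time should exclude periods when the SLA clock is stopped (ticket in “Pending”).

```python
from datetime import timedelta

# Targets copied from the SLA tables above (P4 is not listed there).
SLA_TARGETS = {
    "Premium": {
        "response": {"P1": timedelta(minutes=15), "P2": timedelta(hours=4), "P3": timedelta(hours=12)},
        "restoration": {"P1": timedelta(hours=4), "P2": timedelta(hours=8), "P3": timedelta(hours=24)},
    },
    "Business": {
        "response": {"P1": timedelta(hours=1), "P2": timedelta(hours=4), "P3": timedelta(hours=12)},
        "restoration": {"P1": timedelta(hours=4), "P2": timedelta(hours=8), "P3": timedelta(hours=24)},
    },
}


def within_sla(tier: str, kind: str, priority: str, measured: timedelta) -> bool:
    """True if `measured` meets the '<=' target for the tier, KPI kind and priority."""
    return measured <= SLA_TARGETS[tier][kind][priority]
```

For example, a 45-minute response to a P1 incident meets the Business target (<= 1 hour) but not the Premium target (<= 15 min).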

Output and delivered value

With Incident Management as a core component of your cloud operations, you can minimize the impact of incidents on your organization and your customers.

Improved service availability and reliability

By quickly and effectively responding to and resolving incidents, you can minimize downtime and ensure that your services are available and reliable for your customers.

Reduced impact of incidents

Effective incident management can help reduce the impact of incidents on your organization and your customers, such as by minimizing the duration of disruptions or by identifying and addressing root causes to prevent future incidents.

Improved customer satisfaction

Fast and effective incident response improves the overall customer experience and your customers' satisfaction with your services.

Increased efficiency

By streamlining incident management processes and identifying and addressing root causes, you can increase efficiency and reduce the overall cost of managing and responding to incidents.

Pricing Information

Basic

DKK 1350 per hour

During on-boarding we will define how to handle identified incidents. We can participate in your existing Incident Management process, run the process for you, or forward everything related to the application to your own team.

Standard

DKK 1350 per hour

Business

Included

We will run the Incident Management process on every identified incident and make sure to include relevant parties from AWS, AMS and your application team to resolve the issue.

Premium

Included

We will run the Incident Management process on every identified incident and make sure to include relevant parties from AWS, AMS and your application team to resolve the issue.

Frequently Asked Questions

When possibilities are almost endless, it is crucial to have a partner with in-depth expert knowledge, not just of the opportunities and benefits, but also of the challenges.

What is incident management in the context of KeyCore Managed Services and AWS Managed Services?

Incident management is the process of identifying, responding to, and resolving incidents that occur in the IT environment. In this context, KeyCore Managed Services provides cloud managed services using AWS Managed Services under the Partner-led AMS model. This means that KeyCore and AWS are responsible for managing and monitoring the IT environment, including responding to and resolving any incidents that occur.

How does KeyCore Managed Services handle incident management?

KeyCore Managed Services has a dedicated team of experienced technicians who are responsible for incident management; part of the team is provided by AWS under the AMS model. They are available 24/7 to respond to and resolve incidents in the IT environment, using a variety of tools and techniques, including monitoring tools, problem management processes, and incident management processes.

How do I report an incident to KeyCore Managed Services?

To report an incident to KeyCore Managed Services, you can contact the service desk using any of the methods available under your specific tier. Our team will then triage the incident and take appropriate action to resolve it.

How long does it take for KeyCore Managed Services to resolve an incident?

The time it takes to resolve an incident will depend on the severity of the incident and the resources required to resolve it. KeyCore Managed Services has established service level agreements (SLAs) with our clients, which outline the expected response and resolution times for different types of incidents. Our team will work as quickly as possible to resolve the incident and minimize any impact on your business.

Can I request updates on the status of an incident?

Yes, you can request updates on the status of an incident by logging a ticket through the support portal or contacting our support hotline. Our team will provide regular updates on the status of the incident and the actions being taken to resolve it.
