DevOps Business Improvement Guide | Ahmed Fawzy | Skillshare


DevOps Business Improvement Guide

Ahmed Fawzy, IT Transformation Advisor


Lessons in This Class

16 Lessons (54m)
    • 1. Learning outcomes (Solve using DevOps)

      0:55
    • 2. The Development Lifecycle

      5:07
    • 3. What is DevOps?

      2:57
    • 4. DevOps Methodologies & Continuous delivery

      6:44
    • 5. Infrastructure As Code

      1:54
    • 6. Dev-Sec-Ops and DevOps Best Practices

      5:00
    • 7. Microservices

      4:22
    • 8. Containers

      4:23
    • 9. Serverless

      2:44
    • 10. Site Reliability Engineering & Distributed Designs

      3:59
    • 11. Incident Command

      3:27
    • 12. Learning outcomes (Solve using DevOps and Cloud)

      0:35
    • 13. Implementation Challenges

      2:13
    • 14. The Response Time & Job Rotation

      4:23
    • 15. Infrastructure Changes

      4:18
    • 16. Last thoughts.

      0:33

23 Students, -- Projects

About This Class

In this course, you will learn about:

  • The core concepts of DevOps
  • The Development Lifecycle
  • DevOps Methodologies
  • DevSecOps
  • An overview of:
    • Microservices
    • Containers
    • Serverless Computing
    • Site Reliability Engineering

Meet Your Teacher


Ahmed Fawzy

IT Transformation Advisor


Ahmed Fawzy is an advisor, author, and online trainer. He has 18 years of experience in the field of IT transformation, utilizing a unique approach to achieve better alignment with the business through solutions and processes, and to transform IT organizations successfully from "Traditional to Digital."

Ahmed holds the ITIL Expert certification and ITIL 4 MP. He is also a certified Project Management Professional (PMP), TOGAF 9 Certified, and holds a Master of Business Administration (MBA). He has implemented improvement programs for a wide variety of organizations. His approach is unique because it doesn't require new additional software or hardware: "It's simply a few adjustments that yield a high return." Ahmed's goal is to help leaders transform their IT internal o…



Transcripts

1. Learning outcomes (Solve using DevOps): Now we are reaching the final step in the technology part: DevOps. This section is for you to plan your next move in DevOps correctly. The section is not deeply technical, and I believe DevOps is for everyone in the organization, so it is structured to address both roles, developers and operations; you have to have a common ground. The starting point will be the development lifecycle, before moving on to the main DevOps methodologies. After that, I move to how security is addressed in DevOps, which is known as DevSecOps. Next, I will explain how DevOps monitoring is a bit different from normal monitoring scenarios. And lastly, I will give you some DevOps best practices.

2. The Development Lifecycle: We need to clarify what the typical development lifecycle is and what difference DevOps proposes to it. This part is a quick recap for non-developers such as system admins and system engineers. The lifecycle starts from the source code and ends with the final result of delivering production value. In the cycle you have the following main items. First, the source code repository: a version control system for tracking changes, such as Git. Next, the build system: it watches the source code repository and informs you if a new build is required; Jenkins is one such tool. Then build tools, for build orchestration and compilation, such as Docker for builds and Kubernetes for orchestration; the same applies to the programming languages' own tooling. Tests should be fast (less than five minutes), reliable, able to isolate failures and, most importantly, able to provide feedback. At this stage the initial types of testing are unit testing, where the developer tests individual components such as classes or subsystems to confirm the code functions, and integration testing, where the developer tests groups of subsystems, collections of subsystems, and eventually the entire system. At this point you have a build with the test results associated with it; this is called an artifact. If a test failed, it never becomes an artifact. Artifacts follow the same version control process as the source code; you can use a tool such as Nexus or Artifactory. Next in the cycle we have building the server and deploying the build into an environment, production or pre-production. Here you perform integration testing (groups of subsystems, collections of subsystems, and eventually the entire system, by the developer), system testing (the entire system, by the developer), UI end-to-end testing (by the developer, covering how the user will use the application), and security testing. Security tests are performed by the developers to look for flaws in the code and prevent data leaking into production. There are two kinds of security testing: static and dynamic. In dynamic testing, the tool executes a battery of attack tests against the running system; the downside is that it requires the full system to be running, not only partial code, and it takes a long time. Static testing is easier and simpler: tools scan the code in a non-running state and highlight possible areas of vulnerability. After that we have performance testing, a huge range of tests: soak tests, spike tests, step tests. Acceptance testing evaluates the system from the client's side, and end-user testing involves typical transactions.
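As a concrete illustration of the fast, isolated tests described above, here is a minimal Python sketch using the standard unittest module. The add_item and checkout functions are hypothetical stand-ins for real application code, not part of the course material; the point is only that a unit test exercises one component in isolation while an integration-style test exercises several components together.

    import unittest

    # Hypothetical application code, used only for illustration.
    def add_item(cart, name, price):
        """Add an item to a shopping cart (a plain dict)."""
        cart[name] = cart.get(name, 0) + price
        return cart

    def checkout(cart):
        """Return the total price of all items in the cart."""
        return sum(cart.values())

    class UnitTests(unittest.TestCase):
        # Unit test: exercises a single component in isolation.
        def test_add_item(self):
            cart = add_item({}, "book", 10)
            self.assertEqual(cart["book"], 10)

    class IntegrationTests(unittest.TestCase):
        # Integration-style test: exercises two components working together.
        def test_add_then_checkout(self):
            cart = add_item(add_item({}, "book", 10), "pen", 2)
            self.assertEqual(checkout(cart), 12)

    if __name__ == "__main__":
        unittest.main()   # each test reports pass or fail, isolating the failure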
Next in the cycle, we have production deployment. In this stage, use the same deployment tool and the same build you deployed to pre-production. I always recommend separating pre-production from production when testing a new build; it is always best either to clone the production machines and use them in your testing, or to restore from a recent backup to build your pre-production. Now we move to branching code. Code branches are necessary to deliver value, though no golden bullet exists for branching; almost every team has its own branching process. Branching is how you create features and integrate them into the main application. You start by having branches such as feature-development branches, a development branch, a fix branch, and the master branch or trunk. You start developing the product based on the customer requirements you received; initially this becomes your version 1, so customer requirements translate to code, and the code translates to version 1. Then you discover a bug or some shortfall in the code, create a fix for it, and release it to become version 1.1 on the master branch. Meanwhile, you develop additional features in a separate feature branch. You integrate both the new feature and the fix released in version 1.1 to create a new version in the master branch, called version 2. You continue doing so: get a new requirement, develop it, merge it with the old code, and release a new version. Successful branches should not live long and should be integrated into the main branch. In this lecture, you learned at a high level how the development lifecycle works and how business applications are created, deployed and maintained. Thank you for watching and see you in the next lecture.

3. What is DevOps?: Now that we understand the development lifecycle, let's identify what DevOps is. DevOps is the practice of operations and development engineers participating together in the entire service lifecycle, improving the deployment and change cycle, which leads to higher performance, faster recovery and fewer failures. DevOps is not a new position between developer and operations engineer; there is nothing called a DevOps engineer. If you see someone say "DevOps engineer required," then you know that they need a developer who will handle system administration and support as well, not actual DevOps, because the DevOps movement is firmly against creating a new classification called "DevOps engineer." DevOps is a way of thinking and an approach to building and delivering support to systems. And let me be clear: if an organization is using DevOps tools, that doesn't make it a DevOps organization; they are only using the tools, and I see this a lot. Using Docker or Jenkins or Artifactory does not make an organization a DevOps practice, though DevOps is very young compared to other IT practices, such as ITSM. The following are the major principles and methodologies that almost everyone in the DevOps community agrees on. Let's start with the DevOps principles. First we have CAMS. Culture: changing culture and breaking the barriers between teams so that everyone in the organization can help anyone else. Automation: this is where the tools come in, and it is one of the most visible parts of DevOps. Measurement: DevOps is all about measurement and improvement, so you measure to seek out improvement opportunities, and repeat.
And the last one, sharing: sharing knowledge and experience with everyone on the team and in the community. All of these values and principles make up the acronym CAMS. If you remember the previous section, this is exactly the same process we talked about, but implemented on people and on actions. Next we have the three ways. The first way, systems thinking, focuses on the overall result, the outcome coming out of the entire system, not an individual process. The second way is to amplify feedback loops, shortening the time between starting the process and discovering any potential issue. The third way is continuous learning and experimentation. Thank you for watching and see you in the next lecture.

4. DevOps Methodologies & Continuous delivery: Now we move to the DevOps methodologies. People over process over tools: this means defining who will do the job first, then defining the process around them, and last what tool they will use. Next, lean management: working in small batches, feedback loops and visualization. Then change control: operational success depends on controlling changes. And the last two are continuous delivery and infrastructure as code. Let's start with continuous delivery: by creating small chunks, you run the entire system with every modification in a testing environment and, once accepted, ship it out to production. Always use the minimum change possible. Continuous delivery does not mean continuous disruption; users need to know what's changing, when, and why, and every build needs to be tested to make sure it will not fail or perform below expectations. Some definitions you need to know about the continuous delivery methodology: continuous delivery is the practice of deploying every application build to a production-like environment and performing automated acceptance testing there; continuous integration is the practice of automatically building the entire application; continuous deployment is the practice of deploying every application build to production after it passes the automated tests. Notice that the difference between continuous delivery and continuous deployment is the production-like environment: in continuous delivery the build goes to a production-like environment, but in continuous deployment it is deployed directly to production. The challenge is finding a way to automate deployment and testing; the key is finding trustworthy automated tests. One of the reasons continuous delivery is more widely accepted than continuous deployment is that continuous delivery has a manual check at the end before deploying to production, unlike continuous deployment, where everything is automated. Next we have cycle time, from development until deployment; it should be as short as possible, and some say it should be shorter than the time it takes to make tea or coffee. Lead time is the time from the request to deployment.
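To make that distinction concrete, here is a minimal sketch of where the manual gate sits; the stage names, the deploy and test functions and the flags are all invented for illustration and are not the author's tooling.

    def run_automated_tests(build):
        """Pretend test suite: True only when every automated check passes."""
        return all(check(build) for check in (unit_ok, integration_ok, acceptance_ok))

    def unit_ok(build):        return True   # placeholders for real test suites
    def integration_ok(build): return True
    def acceptance_ok(build):  return True

    def deploy(build, environment):
        print(f"deploying {build} to {environment}")

    def pipeline(build, continuous_deployment=False, manually_approved=False):
        # Every build is deployed to a production-like environment and tested.
        deploy(build, "pre-production")
        if not run_automated_tests(build):
            raise RuntimeError("build rejected: automated tests failed")
        if continuous_deployment:
            # Continuous deployment: every passing build goes straight to production.
            deploy(build, "production")
        elif manually_approved:
            # Continuous delivery: production deployment waits for a human decision.
            deploy(build, "production")

    pipeline("app-1.4.2", continuous_deployment=False, manually_approved=True)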
Now that we understand the term continuous delivery, it's time for the best practices. One: always try to deploy smaller changes, not many changes at once. Two: try to know which features are resource intensive, so when you have a problem you don't have to choke the entire system; instead, you simply switch them off. Three: separate deployments from releases; in other words, you can deploy to production so it is up and running, but release it in another batch. Four: always use the same artifact and the same deployment method when deploying to both pre-production and production. Five: even if your software passes all the tests and is deployed to production, test it once more to make sure it's functional. Six: if you have a blue-green deployment, it is the best scenario to test the new build and then swap it into production. Seven: canary deployment, bringing up just one server to see how the change performs in real life. Next, what are the tests in continuous delivery? As mentioned, the problem with continuous delivery is finding reliable tests. These are the most common and most important tests you should have in your continuous delivery pipeline. Unit testing: the lowest level of code testing, covering basic connectivity and the basic core function. Good hygiene: the best practices for the code and the system. Integration testing: how systems or components function together. User acceptance testing: based on the end user's perspective. Infrastructure testing: turn things on and off and see whether the system is still functioning or not. Performance testing: load tests, spike tests, stress tests, soak tests, and the list goes on. And security testing: the penetration test, or pen test for short. These are the major tests that need to be conducted in continuous delivery. Some of the ways to improve testing results: test-driven systems, where you write the test before the code or build the system, so the test comes before the code; behavior-driven development, which describes the business functionality and the desired outcome; and test-driven development (TDD), where you start with a test that you know will fail, then fix the area that made the test fail, a sort of reverse engineering of the problems. Now, what tools do you need to do all of the above in continuous delivery? First, version control, to commit changes and review all changes ever made; a change is considered an independent layer on the code. Next, you will need a continuous integration system, to create a workflow that can be repeated with each build of your system, like Jenkins, one of the most famous tools. Tests using scripts: use the specific tests, each in its respective area. An artifact repository: a centralized location or software that holds the created items with all their revisions. Deployment automation, like Chef, for example. And the last one is not a tool at all but a way of building the application: it's called a feature flag, a technique to turn some functionality of your application off via configuration, without developing new code. We cannot pass this point without mentioning what Jenkins is. In a nutshell, Jenkins is the leading open-source continuous integration server, built with Java, and it provides over 300 plugins to support building and testing virtually any project. Thank you for watching and see you in the next lecture.
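Before moving on, a minimal sketch of the feature-flag technique mentioned above; the flag store here is a plain dictionary, where in practice it would come from configuration or a flag service, and the checkout functions are made up.

    # Hypothetical flag store; in practice this would come from configuration,
    # a database or a feature-flag service rather than a hard-coded dict.
    FEATURE_FLAGS = {
        "new_checkout": False,   # deployed to production but switched off
        "pdf_export": True,
    }

    def is_enabled(flag_name):
        return FEATURE_FLAGS.get(flag_name, False)

    def new_checkout_flow(order):
        return f"new flow handled {order}"

    def legacy_checkout_flow(order):
        return f"legacy flow handled {order}"

    def checkout(order):
        if is_enabled("new_checkout"):
            return new_checkout_flow(order)   # new code path; no redeploy needed to switch
        return legacy_checkout_flow(order)    # stable path used while the flag is off

    print(checkout("order-42"))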
5. Infrastructure As Code: The next topic is infrastructure as code. This methodology is all about adapting the development approach to your infrastructure, in an aim to control and limit changes, the same as with code. If you ask a developer whether he can work without any kind of versioning or source control, he will tell you it cannot be done. In infrastructure we cannot adopt everything, but we can get close. Let's start with some concepts. Provisioning: the process of making the server ready for operations, like setting up the hardware, operating system and network configuration. Deployment: the process of deploying the application onto the server. Immutable deployment: deployments not intended to change after deployment; instead, the entire system is redeployed if needed. Like Docker, for example: you redeploy the container, you don't rebuild everything from scratch. Having a master image, like a golden image, results in configuration drift over time; the solution to that is to have a minimum system image delivered by configuration management software. If your image is a complete artifact that acts as one unit, then there is no need to configuration-manage it: redeploy a new one, and whenever changes are required, you redeploy the entire set of VMs or the container with all the applications. Self-service: the ability for a user to kick off the process without needing other people. Always remember: systems treated like code are checked into source control, reviewed, built and tested. Use source control for systems, for example across development, system integration testing, production and deployment. The most famous tool for this is Terraform by HashiCorp.

6. Dev-Sec-Ops and DevOps Best Practices: Traditionally, the security cycle goes like this. One: finish the system or code. Two: run security tools to audit the code or system. Three: discover vulnerabilities. Four: generate a report as a PDF. Five: send it to the system or code owner. Six: the system owner remediates the vulnerabilities. Seven: repeat. In DevOps, it's all a matter of how fast we can run the tests, even on a partial system, and of who runs the tests. So the idea of DevSecOps is to educate developers on which security tests they need to run. The problem is usually separation of duties: in this model the security team needs to be a separate team, so how do you make sure that all the tests are done correctly and will not impact the organization? The key to this is automation and real-time communication, so that everyone in the IT department knows what's going on. The middle ground is to empower the admins and the dev team to run the scans themselves, and perhaps the final scan before go-live needs to be done by the security team; the security team simply audits the final result. This will reduce the cycle significantly. To enable developers and admins, the security team needs to: one, develop the security process; two, provide automated tools, selecting which tests need to be performed and finding the tools that fit these tests; and three, give guidance on remediation. If containers are used to hold the tools, it's a good idea to put all the security tools inside a container and hand it over to the developers; this allows the developers to have the security tools with every build they make. One of the most common tools for vulnerability scanning and management is OpenVAS. Next, we need to discuss DevOps monitoring. The purpose of monitoring is to ensure that the service is delivering the expected result for the business, so the monitoring objectives are problem detection, reporting, troubleshooting and forecasting. From there we move to the concept of observability: monitoring is the action, observability is the quality of the monitoring. Important monitoring tool characteristics: it should be self-service, so that monitoring can be delegated to each team for its own services; it must have the ability to monitor the entire service, not only a specific function of the system; and it must also have an automated response function. There are two concepts in monitoring you need to understand: utilization is how much of a unit is being used, up to 100%, and saturation is what queues up beyond that 100%; saturation shows the bottleneck.
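A small sketch of those two concepts with invented numbers; the worker pool that can serve 100 requests per second is assumed purely for illustration.

    def utilization_and_saturation(demand, capacity):
        """Utilization is capped at 100%; saturation is the extra work queued
        beyond capacity, which is where the bottleneck shows up."""
        utilization = min(demand, capacity) / capacity * 100
        saturation = max(0, demand - capacity)
        return utilization, saturation

    # Example: a worker pool that can handle 100 requests per second.
    for rps in (60, 100, 140):
        util, sat = utilization_and_saturation(rps, 100)
        print(f"{rps} req/s -> utilization {util:.0f}%, saturation {sat} queued")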
Some of the DevOps best practices: an incident command center, which will be discussed later in detail. Transparent uptime: always communicate the state of the service during an outage; don't leave users or customers in the dark. Developers on call: the developers are part of the operation, they don't simply throw something over the wall to the operations team. Blameless postmortems: a meeting held after the outage is solved, at most one week after; have someone from outside the outage-and-recovery team run the meeting, create a timeline of the incident and the recovery, and cover how we solved it, how the incident impacted our company and our customers or users, and how to prevent it from happening again, all without blaming anyone; the entire reason for this is to create lessons learned. Embedded teams: the operations team has one developer, and the development team has one person from operations, to make sure each of them understands the other. The cloud: this is one of the most important best practices that can be utilized in DevOps. Andon cords: the ability for anyone in the process to stop the process if something wrong is detected; this prevents errors that happen upstream from becoming a downstream problem. Dependency injection: you don't have to build the code for a dependency; you can get it ready-made from the functions that already provide it. Blue-green deployment: have two identical systems, one live and the other not; you update the inactive system and shift the production traffic to it, using load balancers for example, and if there is a problem, you shift back to the live environment. And lastly, chaos monkey: randomly pulling items out of the infrastructure to ensure reliability and availability. Thank you for watching and see you in the next lecture.
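Before moving on, a toy illustration of the blue-green practice just described; the environment names, versions and smoke test are hypothetical, and the "load balancer" is reduced to a single variable.

    # Two identical environments; only one receives production traffic at a time.
    environments = {"blue": "v1.3", "green": "v1.2"}
    live = "blue"                      # the load balancer currently points here

    def idle_environment():
        return "green" if live == "blue" else "blue"

    def deploy_new_version(version):
        """Deploy to the environment that is NOT taking traffic."""
        environments[idle_environment()] = version

    def smoke_test(env):
        return True                    # placeholder for real post-deployment checks

    def switch_traffic():
        """Swap the load balancer to the freshly deployed environment.
        The previous live environment is kept as an instant rollback target."""
        global live
        live = idle_environment()

    deploy_new_version("v1.4")
    if smoke_test(idle_environment()):
        switch_traffic()
    print(f"live traffic -> {live} running {environments[live]}")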
7. Microservices: Now we move to the supporting technology trends that support DevOps. The first one we will talk about is microservices, but before we discuss it, let's cover the history of application development. First, we had the monolithic application: a single application package that does several functions and is deployed all at once. This is considered a single-tier application, self-contained and independent from other applications. It provides a single function, not one single task, and it means larger-scale releases to patch or add features to the software. Next we got service-oriented architectures (SOA): each application component is independent and uses a communication protocol to talk to the others; each part provides a specific service to the other parts with a specific outcome, and each part is self-contained. The downside is that it is either OK or error, with nothing between the two states. A typical piece of software in SOA has a presentation layer, a business process or logic layer, and data access. The problem in SOA is the connection between applications, the application communication rather than the network communication; if it's too complicated, it can be easier to build the thing as a monolithic application. And finally, microservices: create the code or the system at a modular level, all communication is over APIs, and any unit can be called from any other unit in the system. Microservices is largely an open-source concept, which means it's cheap or even free, and some cloud providers also support it. The issue with microservices is the number of artifacts you have to maintain: the more artifacts you have, the more complex the service will be. Microservices is not the cloud; it could be on the cloud, but it's not only cloud. Microservices is not about the size of the service, it is about the operation of the service: each part is small, can be tested with a few scenarios, and can be deployed faster. This allows you to build the service not for the maximum possible load but for the average load, which gives you elastic availability. This model is best for cloud deployment, which is why it's often confused with cloud-only deployment. Microservices are far from perfect; it is still a very new concept and still faces some challenges. One: network latency. Latency increases due to the increased number of calls from one system to another; this is called distribution cost, and the distribution cost in microservices grows the more complex a setup you have. If you have two milliseconds of latency per single call, imagine what happens if you have 15 calls; the latency grows and might become a problem. Also, avoid circular calls, where service one calls service two and then two calls one and so on; this significantly increases the latency of the calls. Two: circuit breakers and timeouts. In all systems, use a circuit breaker: after a number of failed connection attempts to a system, that system should be assumed offline. This allows other components to work around this part instead of timing out. If you build the system without this function, a single component failure might cause a cascading failure that brings down the entire service. Three: service grouping and service breakdown. You have to build service data-flow diagrams to understand how the systems talk to each other. If you discover two systems that only talk to each other, I would suggest merging them into a single data domain; use your best judgment when deciding on service boundaries. Finally, the API layer: it is just an aggregation proxy for the services and should not have any logic in it. Its main purpose is to shield the services from being called directly by endpoints; instead, the endpoint calls the API layer and the API layer distributes the call to the correct system. Think of it like a router for network traffic. Thank you.
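A minimal sketch of the circuit-breaker idea described above; the failure threshold, the cool-down period and the failing call_service function are invented for illustration.

    import time

    class CircuitBreaker:
        """After `threshold` consecutive failures the dependency is assumed
        offline, and further calls fail fast instead of timing out."""
        def __init__(self, threshold=3, reset_after=30):
            self.threshold = threshold
            self.reset_after = reset_after
            self.failures = 0
            self.opened_at = None

        def call(self, func, *args):
            if self.opened_at is not None:
                if time.time() - self.opened_at < self.reset_after:
                    raise RuntimeError("circuit open: skipping call, use a fallback")
                self.opened_at = None            # cool-down over, try again
                self.failures = 0
            try:
                result = func(*args)
                self.failures = 0
                return result
            except Exception:
                self.failures += 1
                if self.failures >= self.threshold:
                    self.opened_at = time.time() # stop hammering a dead dependency
                raise

    def call_service(order_id):                  # hypothetical downstream call
        raise TimeoutError("service not responding")

    breaker = CircuitBreaker()
    for attempt in range(5):
        try:
            breaker.call(call_service, "order-42")
        except Exception as exc:
            print(f"attempt {attempt + 1}: {exc}")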
8. Containers: Containers are a way of deploying applications. Containers are not virtualization, nor do they replace virtualization; they are a model, or approach, for deploying and transporting applications. So why containers? Agility, scalability, high availability, cost optimization and, most importantly, portability. The main advantages of containers: Docker makes Linux containers easy and reliable; better resource utilization, since you have the same operating system running multiple containers as multiple processes; fewer application conflicts, since you can now put two applications on the same operating system without fearing conflicts between them; better application compatibility, since you can run the application on multiple operating systems without getting a version mismatch; the same artifact is used, so you can copy the container between environments, production and pre-production; and easier release management. The common use cases are continuous integration and continuous deployment automation, auto-scaling, microservice architectures, containers as a service, and hybrid cloud architectures. Now we move to a bit that confuses people: containers versus virtualization. A container is not a VM; a container is a collection of data and application packed together to provide mobility and agility. Containers are server-based application deployment, and almost the entire technology is based on scripting files; think of it like application virtualization, or ThinApp, but for servers. The VM is better in one aspect: it has increased security, since each application has its own operating system, and the virtualization technology is standardized. The con for the VM is the much greater overhead, since you are running a complete operating system dedicated to one application only. If you are worried about security in containers, because the operating system is shared by every running container, you have to revise the design if you have to meet a standard like PCI. Some container platforms, like Docker, can do isolation, but it is best to separate such workloads onto different systems to avoid issues with the regulations. Containers do not simplify the development lifecycle; instead, they increase the complexity of development, because you are adding a layer, creating the container, to the mix, but they make operation, maintenance and support much easier. So when designing containers, you don't store data inside the container; containers are short-lived. You use containers for microservices or for the end-to-end and middleware parts of your application, but you always keep your data in the database, whether one monolithic database or multiple smaller databases. You can launch as many instances as you want from a single container image; this allows you to process several requests in parallel and to scale up and down as you wish. You will require some orchestration tool to automate the deployment and the container creation; this simplifies the final service delivery, rolling back and rolling forward. You can move the same container between test, dev, staging, pre-production and production environments, and this leads us to Kubernetes. What is Kubernetes? Kubernetes is an open-source orchestration system for Docker containers. It handles scheduling onto nodes in a compute cluster and actively manages workloads to ensure that their state matches the user's declared intentions. It is responsible for creating the clusters and scaling up and down the instances of the containers running on the system. Think of a booking app when it launched: it has a huge surge before stabilizing. What if your business is seasonal? In this case, scaling up and down will help you meet the required demand without investing much in hardware and without investing much up front.
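To illustrate the scale-up and scale-down idea in the simplest possible terms, here is a rough sketch; the capacities, limits and traffic numbers are invented, and in practice an orchestrator such as Kubernetes makes this decision for you.

    def desired_replicas(requests_per_second, capacity_per_replica=50,
                         min_replicas=2, max_replicas=20):
        """Very rough horizontal-scaling rule: run enough instances to cover the
        current load, never fewer than min_replicas or more than max_replicas."""
        needed = -(-requests_per_second // capacity_per_replica)   # ceiling division
        return max(min_replicas, min(max_replicas, needed))

    # Seasonal traffic pattern: quiet, peak, then quiet again.
    for rps in (40, 900, 120):
        print(f"{rps} req/s -> run {desired_replicas(rps)} container instances")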
9. Serverless: In a typical architecture, you have the client, the hardware, the operating system, the application server and the application. Then virtualization was invented to abstract the hardware; after that, containers were invented and abstracted the operating system. Now, for the application, serverless abstracts the language runtime and the server. In serverless, each component performs a function and does not permanently exist on a system: you run the function and it is deleted afterwards. You can chain several functions together; each function triggers an event which creates another function, until you get the result you require. No central server governs the process; it is all distributed functions. Serverless computing is an event-driven, compute-on-demand experience that runs code triggered by events. An example of that is read-only containers: once an event is triggered, more of them are created, respond to the event and are terminated. Serverless computing is cloud-based only and is offered by the three major vendors, Microsoft, AWS and Google. You can use it in your testing: Vagrant is an open-source software product for building and maintaining portable virtual development environments on your desktop. It allows you to build systems on your laptop or desktop to serve as a test environment, which makes container delivery a much easier task and an agile practice. What are the benefits of serverless? Scaling is built in: the more conditions or events or requests you receive, the more functions are created to service the requests. Pay for the compute: you only pay for what you need, in 100-millisecond increments of compute time. No server as a concept: everything is fully automated as functions on the cloud. The use cases are new applications and small applications that serve a specific function, gain large value from scaling, and need to scale fast but not for very long. The challenges of serverless: the operational model can be difficult to grasp; vendor lock-in, since you have to build on the vendor's platform; and it does not replace server-oriented computing, it has its own limitations and is not a fit for everything. As mentioned, running your code on a shared platform will generate compliance issues with some standards, PCI being one example.

10. Site Reliability Engineering & Distributed Designs: The primary focus of Site Reliability Engineering is production support. The operations release manager is there to control changes to production. In production management, you need the smallest payload, as lean as possible, with ways to roll back. Try to make your changes rely on automation, not on manual work. Don't log into a server and modify something; instead, create a package with the change inside (it could be a script, for one example) and run that package on the system. This is much easier to trace: the changes generate a log, and they can be rolled back. You might think that clicking a button is easier than creating a script to do the same function, but the idea is not how easy it is; the idea is that the change is traceable and repeatable, since you have to apply it the same way in pre-production and in production. Always have a release website or location to highlight the changes; a simple IIS or Apache server with a simple HTML page can cover this point. The idea is announcing your change to everyone else. The main pillars of Site Reliability Engineering are, first, self-service automation: as the title implies, it's a do-it-yourself approach; whatever might be required to get the job done, you can do it yourself without relying on a process or on someone else to deliver that piece of information to you. This reduces work in progress, significantly increases accountability for tasks and streamlines delivery. Service level objectives are the general goals and objectives of the service. The error budget is a budget dedicated to issues and errors; it can be the time a service is allowed to be down in a year, and if the budget runs out, changes to that service might be blocked. The budget can be time or actual money. This leads us to another topic that is very close to Site Reliability Engineering: distributed designs. You have to understand that everything breaks, no matter what, so you have to design your system to be resilient, able to recover from errors. First, consider how three components that are each 99% available combine into a lower availability rate: if you multiply 99% by 99% by 99%, you end up with about 97%. By this formula, the more components you have, the less reliable the system will be. Now, some applications use a load balancer or store three copies of themselves, so any given failure will not impact the system. In that case the formula is a little different: it is 1 minus (1 minus 0.99) cubed, which comes out to six nines of availability.
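The two availability figures above come from simple multiplication; the short sketch below just reproduces the arithmetic so the numbers can be checked.

    # A chain of components: every one must be up, so availabilities multiply.
    serial = 0.99 * 0.99 * 0.99
    print(f"three 99% components in series : {serial:.4f}  (about 97%)")

    # Three redundant copies behind a load balancer: the service is only down
    # when all three copies are down at the same time.
    parallel = 1 - (1 - 0.99) ** 3
    print(f"three 99% copies in parallel   : {parallel:.6f}  (six nines)")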
The very important thing in system failures: you should plan for failure and plan how to recover from it; you have to see the system for what it is. You have to plan a detection system. It could be as easy as a script: if a server fails, a script generates a new server to pick up the load, and if the load decreases, the server is deprovisioned automatically. Always challenge reliability ideas by testing them; don't just depend on a cluster or a function. If one or more nodes fail, you have to test to make sure of service continuity. This is where chaos monkey comes in, as one example. So with containers and chaos monkey, you will actually have a more distributed design. Thank you for watching and see you in the next lecture.

11. Incident Command: Have you ever seen how firefighters deal with fire? If you did not notice, they treat large and small fires with the same level of importance, because they know that all it takes to turn a small fire into a large fire is time. The same applies to IT incident management: you have to learn to adapt firefighting techniques to dealing with incidents. Incident command is based on two main concepts. The first is the incident commander: the first engineer that receives the incident, an IT admin or a developer. The first five minutes are critical and will dictate how the incident will flow. The second is stabilization, which is based on whom you have to support you and the equipment you already have; waiting too long to ask for additional resources can result in a longer stabilization time or even turn the incident into a major incident. Let's take an example. You detect an incident. Whoever detects the incident is automatically the incident commander. Within five minutes, everyone that can help solve the incident should be notified and made aware of the situation, and stabilization starts: control whatever damage has been done and proceed with fixing the issue. Once fixed, the recovery process starts, performing testing and documenting what happened. But what if we forget someone? The damaging momentum continues, at a slower pace, until we reach stabilization, and the recovery time is much longer. What if we just waited and did not act? The incident automatically becomes a major incident, and if we then kept waiting without acting, it becomes a full disaster. This leads us to another great concept, the Kaizen approach, or change for the better: the five whys. Ask why the system is down, and continue asking why until you reach the root cause; five times is usually enough to reach the root cause of the problem. Human error is not a root cause; human error is a failure in the process or a failure in the process safeguards. During an incident or service outage, don't try to find the root cause; your primary target is to restore service to normal operation. In a very complex system it's very difficult to point to a single root cause, so we find the top contributing causes. An incident in a complex system will have multiple failures of different degrees; it is not a single fishbone, it is more of a snowflake.
It looks like one, and every incident is unique. So it is not a root cause; it is the most likely cause. Trying to find the root cause is simply trying to blame a single thing or a single someone, which is incorrect in a complex system. It would help if you understood that in a complex system you should have safeguards against failures. In the after-incident meeting, discuss which safeguards succeeded, what failed and what's missing, and avoid blaming someone or something for the failure. Thank you for watching and see you in the next section.

12. Learning outcomes (Solve using DevOps and Cloud): You have reached the final station. This shows your commitment to learning the subject and providing real value for your organization. In this section, you will learn some of the common implementation challenges that I have seen over the past few years and also discover the gaps that you will have in your organization; typically, if you try to implement DevOps right away, you will fall into these gaps. Consider this a bridge that you should build before you actually have a fully agile and DevOps organization.

13. Implementation Challenges: At this point, you understand the concepts of DevOps, cloud and agile, so you want to proceed with implementation. Please be careful: though the concept is great, the implementation is different. So what are the implementation challenges? First, you will be in one of two situations. Your company is small and agile: this means you have a small team, maybe eight or ten people, and you can talk to most of them directly. In such cases, you can adopt cloud and DevOps easily, or at least a good part of it. The second case is that you are working in a medium or larger business with 200-plus employees. Moving such a business is a bit problematic, and this is where I see lots of failed adoptions of cloud and DevOps. Never try a big-bang implementation; try to introduce small pieces into the mix first, and this will improve over time. Though the implementation will take longer, you will achieve better results. Agile, DevOps and cloud, all the nice buzzwords we are hearing, are there to improve the business, and business is all about the operation and how to enhance the operation. As mentioned, your first step in adopting both cloud and DevOps is to discover the gap. Any organization is a set of processes performed by people and tools. So first, perform a high-level analysis of your processes using the guidelines in the Process Improvement section, or better, if you have already modelled your processes, go around, talk to colleagues and gather the information to draw the process guides. You should be looking for the following: first, the response time; second, job rotation; third, request frequency; and finally, infrastructure changes. These are only some of the first initial steps of a DevOps and cloud implementation. This will not get you to a full DevOps deployment, but it will guide you to understand the core of the DevOps setup and the feasibility of such an implementation.

14. The Response Time & Job Rotation: Let's start first with the response time: the response of operations to requests, how long it takes from you requesting a test environment to actually having it. In most cases I have seen, it's around 72 hours, plus or minus 24 hours. If you have such a duration, this means you have to focus a bit on automation engines and self-service. The idea is to start collecting the response time.
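Collecting the response time can start as simply as recording two timestamps per request and summarizing them; a minimal sketch follows, in which the request list is invented sample data.

    from datetime import datetime
    from statistics import mean

    # When the test environment was requested and when it was delivered
    # (invented sample data).
    requests = [
        ("2024-03-01 09:00", "2024-03-04 10:30"),
        ("2024-03-02 14:00", "2024-03-05 09:00"),
        ("2024-03-03 08:15", "2024-03-05 16:45"),
    ]

    def hours_between(requested, delivered):
        fmt = "%Y-%m-%d %H:%M"
        delta = datetime.strptime(delivered, fmt) - datetime.strptime(requested, fmt)
        return delta.total_seconds() / 3600

    durations = [hours_between(r, d) for r, d in requests]
    print(f"average response time: {mean(durations):.1f} hours")
    print(f"worst response time  : {max(durations):.1f} hours")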
If you did not do it before, start now and draw this as a chart. Once you collect such information, plot the points of the process, draw the guidelines, define your parameters and start tackling them one by one until you reach a better grouping of results. All of this is addressed in the Process Improvement section. Next, highlight this gap together with the other gaps to reach the organization's future state; how to do that is addressed in the Organizational Change section. Next, we have request frequency, the frequency of business requests. In many, many cases the business has ideas, but the business side doesn't know whether they can be implemented or not. In this case, you need to understand how often the business is requesting new things from your side. In many cases, I discovered that there is no mechanism for collecting such information: you only get requests from top management, and you don't capture the operational enhancements. From the Operations section, you now know how objectives are set in your organization. Plan to have a small portal to submit enhancement requests. This portal should be user friendly, because in many cases I have seen users frustrated with very simple items that could be solved by both operations and developers. A lack of requests indicates either a very mature process, which is highly unexpected if you don't work in a Fortune 500 company, or a lack of a mechanism to capture ideas, which in turn will surely impact business agility. In this case, you need to have the plot always going up at a good angle; this means you generate business ideas and requests in a positive way. To deal with such requests, utilize agile project management to prioritize them and continuous delivery to deliver them. The next item you need to address is job rotation. Do you have job rotation? Most people cannot perform two functions at once and cannot go from deep thinking in one area to deep thinking in another. If you think you can have one team perform both application development and operations at the same time, you are sadly mistaken; you will lose lots of time. A typical person needs some time to get into a state of mind that allows them to focus, and if you introduce operational interruptions, this will reduce the speed and the progress of the team. As mentioned, organizations have two areas of focus: operations and how to enhance operations. So how do you create a DevOps team? From the Process Improvement section, imagine IT as an activity list, with all the activities as part of a chain where the input is a request and the output is the service delivered; that is what the basic diagram looks like. By thinking in this way, you can determine the bottleneck in your environment: do you have an issue in your ops, in your dev, or both? In all cases, utilizing the pull system, you can easily improve your overall output. This means that almost every activity will have a primary owner, either dev or ops, but each of them knows how to assist whenever there is a bottleneck. By using this, at the end you will have one team able to perform all actions in both dev and ops, though with different levels of efficiency. You still have the same activities; you keep optimizing the bottleneck until you have one team that can actually perform all functions. Thank you for watching and see you in the next lecture.

15. Infrastructure Changes: How do you apply changes to your infrastructure? In DevOps, everything is applied as code; this is called infrastructure as code.
This means you will need to adjust your operations processes a bit. As mentioned, you have two sets of systems: dynamic and fixed. Dynamic systems will most probably be developer-based, and the development team will have the source code. But in fixed systems, all your configuration needs to be done using scripts. This allows tracking of the changes, in addition to historical references. The issue with doing it from the interface is that it might create or change additional things that you don't know about. What if your system doesn't support scripting? A lot of those exist in the operations world; in this case, have the exact steps documented. Let me give you an example. Step one: log in to the server. Step two: open the console. Step three: open the options menu. Step four: change option X. Step five: close the console and log out of the server. You stick exactly to this step list, but I would highly recommend finding a way to make it script-based or changing the system completely. So what is the goal of shifting everything to scripts? The primary goal is to have source code; in this case it would be the script with the modification in it. The change and the script are all in one package; we call it an artifact. By doing so, you will be able to follow the development lifecycle. The cloud is a huge part of this, whether public or private: we adapt the development lifecycle to the operations team. This is the lifecycle from the operations point of view. The lifecycle starts from the bare metal of the server up until the final result of delivering production value. You first have the source code repository; in this case, it will be the script and template repository. Next, you have the build system and build tool; in this part, you can use cloud engines to automate the system deployment, with the scripts running on top of it to produce the same results. This can be Docker or an OpenStack engine or anything similar; in the end, all you need is automation to configure the system. Next we have tests. For each script, you can easily create a text file with the status, whether it's a success or a failure, and you can build a simple workflow that tests each step of the script: did it succeed or not? This is considered unit testing. Once all the scripts are done, you can run a script that uses the system, for example sending an email in the case of an email system, or creating a user on a system, or any similar function; this is considered integration testing. Once your system workflow passes all these checks and tests, you can consider it an artifact and save it in the artifact repository. Lots of software can do that, but let's assume the very basic: you save the entire flow as a template with a specific version number and a valid description. Now you can deploy this entire server in under five minutes if you need to build a test environment, for example, or to build an additional system to handle additional workloads. This is considered one system. If you need to build an entire service, you build multiple systems; in this case, you follow the same steps for each system, and when deploying them, you do it in a test environment to test the integration between the systems before you mark it all as a single artifact.
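A minimal sketch of packaging a change as a script that records its own success or failure, so the change stays traceable and repeatable; the configuration file, the option being changed and the status file name are all made up for illustration.

    import json
    from datetime import datetime, timezone

    CONFIG_FILE = "app.conf"            # hypothetical configuration file
    STATUS_FILE = "change_status.json"  # read later by the deployment workflow

    def apply_change():
        """Set option X in the configuration file; written to be idempotent,
        so re-running it on an already-updated server changes nothing."""
        try:
            with open(CONFIG_FILE) as handle:
                lines = handle.read().splitlines()
        except FileNotFoundError:
            lines = []
        lines = [line for line in lines if not line.startswith("option_x=")]
        lines.append("option_x=enabled")
        with open(CONFIG_FILE, "w") as handle:
            handle.write("\n".join(lines) + "\n")

    def record(status, detail=""):
        """Write a small status file stating whether the change succeeded."""
        with open(STATUS_FILE, "w") as handle:
            json.dump({"change": "set option_x", "status": status,
                       "detail": detail,
                       "at": datetime.now(timezone.utc).isoformat()}, handle, indent=2)

    try:
        apply_change()
        record("success")
    except Exception as exc:            # any failure is captured, never hidden
        record("failure", str(exc))
        raise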
So how do you branch scripts? It is the same as branching code: you perform everything using a script, and for any additional feature added or modified, you use a script. But in the end, all the small modification scripts should be integrated into a master script to simplify the implementation of the system in the long run.

16. Last thoughts: Congratulations, you finished the course. Please leave a comment and feedback; I would love to hear from you. One final thought: if you still don't know where to begin, start by understanding the organization's core processes, commence with the response time, and introduce the concept of collecting business requests. Doing any one of these will be a huge improvement for the organization and will drive the organization to be more agile in business. Thank you for watching, and I hope to see you again.