Preparing for the System Design Interview | Programming Made Easy | Skillshare

Preparing for the System Design Interview

Programming Made Easy, Software Developer

Lessons in This Class

  • 1. Welcome to this class! (1:35)
  • 2. What is System Design? (5:33)
  • 3. A template for any question (7:12)
  • 4. The URL shortening service (24:22)
  • 5. The photo sharing application (19:02)

83 Students

About This Class

Nowadays, almost all companies ask candidates to design various systems in their system design interviews. The system design round is mainly for experienced people, but top companies like the ones from FANG are keen on asking design questions even to freshers. There is a dedicated round of one to two hours for system design. The system design round has multiple purposes: the interviewers want to know your breadth of knowledge, and they want to understand how you approach an open-ended problem and how you handle stressful situations.

System design is also known as high-level design. High-level design is nothing but deciding what components we will need in our system, how all the components will communicate with each other and with external systems, and what the capacity of our system will be. These are important considerations when designing any system to make it reliable, available, consistent, and efficient.

This course is designed in an incremental fashion for ease of understanding. Initially, all the concepts and components of system design are discussed. A foolproof step-by-step procedure is explained to tackle any system design problem. All the case studies are presented in a comprehensive manner and are designed by following these steps.

Meet Your Teacher

Programming Made Easy, Software Developer

Level: All Levels

Transcripts

1. Welcome to this class!: Hello guys and welcome to this course on preparing for the system design interview. My name is Alex and I'm a software engineer. I have both taken and given system design interviews during my time at software companies. When I heard of the possibility of creating a course to explain more about this complex interview type, I was quite eager to develop one. This class is structured in four lessons that contain practical steps for you to take in order to understand exactly what interviewers look for when scheduling these types of interviews. I will first present to you what system design is. Then we will look at a template that works when it comes to solving any system design interview question. After this, we will also solve two of the most common interview questions from this field: designing a URL shortening service and a photo-sharing system. If you are interested in bettering your software engineering interview skills, consider this course for you. There are no other requirements than an Internet connection. As for the project of this class, it will be extremely practical and will involve you following the steps presented in the course template, so you can start on your journey of solving system design interview questions. With that said, I think we are ready. See you in the first lesson.

2. What is System Design?: Hi guys and welcome back to the system design interview tutorial. In this lecture we are going to discuss what exactly system design is and what a system design interview is supposed to be. We are also going to take a look at some general good practices when it comes to your own system design interview. You may find yourself in a position where you are trying to land a job at a software engineering company.
And you heard that you are going to have a system design interview along with the more technically oriented interviews where you are actually supposed to code. In general, in system design interviews you are not really expected to code, maybe just a little bit to prove your point in some parts. Now, getting started with the definition of what exactly system design is: it is the process of defining the different modules, interfaces, components, and also the data for a system to satisfy specified requirements, which might be stated in the question that you get in your own interview. System development, by contrast, is the process that you are going to be doing at your job if you land the developer position, where you actually create or alter systems, along with the processes, models, methodologies, and practices that are used to develop them. But for now, system design is just the process of defining the components. So it is basically talking about what would be needed when thinking about developing a project from scratch. Getting to the interview part, the system design interview is conducted with the sole purpose of allowing the candidates who take it, maybe programmers, but also engineers or designers, the opportunity to prove their expertise in the field they are applying to, through the tangible application of knowledge in solving a real problem that the company might be facing. These are, again, very big and general problems. You are going to be asked to design something like a WhatsApp-style chat. We are going to have concrete examples further along in this course. But again, they are big problems, and you need to talk about them and think about how you would implement them optimally from both the time and space complexity points of view. Now, the system design interview is also conducted later in the interview process.
First of all, you might get a technical interview where you are asked to code an actual problem. Then, later in the interview process, just so they can see that you have prior knowledge of some topics and are also able to view things from a bigger perspective, they are going to give you the system design interview. What this is really intended for is to see how well you work on a team, and also your approach to problem solving, using questions that are, as I mentioned earlier, open-ended. You just need to arrive at the best possible solution. This interview is supposed to analyze your process in solving problems and your thought process in creating and designing systems to help the eventual clients of the firm you want to get into. It is also an opportunity for you to show the hiring personnel, either the manager or the other developers that you are taking this interview with, that you are a valuable team member by displaying your skills in this manner. When it comes to interview questions and answers in system design, they are typically very ambiguous. This is just to allow you the opportunity to demonstrate your qualifications. But you should not rush into giving an answer that might be wrong, or not even wrong, but not sufficiently thought through. You also need to ask questions before you respond to their question in order to narrow the scope, give yourself a direction, and clarify the expectations. This was about it for a general presentation of the system design interview. In the next lecture, we are going to discuss how we can approach every system design interview question. There is a four-step process that has been documented to work, so we are going to take a look at that as well. But for now, thank you very much for sticking with me up to the end of this lecture, and I look forward to seeing you guys in the next one.

3. A template for any question: Hello guys and welcome back to the system design interview tutorial.
In this lecture, we are going to take a look at a few steps that we might use as a template when we are given a system design question at an interview. The template that I am about to talk about is a one-size-fits-all type of template when it comes to system design interview questions. There are just a few steps that you need to make sure you get through when answering a system design interview question. Getting started with this template, the first thing that you need to be sure of when being asked the question is that you fully understand it. So you need to understand the problem they need you to solve and establish the scope of the design you are going to talk about. This is the part where you need to ask questions, right after they have given you the full details of the question. You also need to establish any constraints that might apply. You should not rush into trying to solve the problem before having all the facts and having anything that is unclear to you cleared up by the interviewer. The next step is to propose a high-level design. As I said, you should not jump into implementing without confirming that the approach you are thinking about actually satisfies all the constraints they want. So you might propose the approach that you think is best for solving the system design problem. See how they react to your approach before jumping into actually implementing it or going further into the details of each part of your proposal. Now, the third part is actually the most time-consuming of them all, which is the design deep dive. This happens once you have validated with your interviewer that you are on the right track. At this point, you need to get into all the details of the solution that you came up with. This is the part where you need to have a deep understanding and vocabulary of system design.
In this design deep dive there are, again, several steps. The first thing you need to ask about is the requirements, and the requirements might be functional or non-functional. The functional requirements are just the functionalities that the system you are designing provides to its users. For example, on your social media there is a functionality that lets you like a post from any other user. That is a functional requirement. You need to establish and run through all the functional requirements with your interviewers by listing them and validating their importance. Now, the requirements can also be non-functional, and here there might be things like latency, which is the response time of a user interaction; reliability, meaning that there won't be any data loss; consistency; and high availability. There are two more things that you need to keep in mind when designing your components. First of all, you need to make them fault tolerant, meaning that your system is up and running at all times, even though there might be some errors or a database crash. They also need to be scalable, which is another important factor. Scalable just means that you need to handle the growing amount of traffic on your system. Going further to the second part of this design deep dive, other than requirements, you also need to do a storage estimation. Based on the kind of data, you need to give a rough estimate of how much data must be stored. You also need to think about the number of requests to the service you provide and the read/write ratio. After you have talked about the storage estimation, you need to think about the database design. Here you consider the actions that your users can perform to interact with your system. You can also think here about which type of database you will use and why: you can go for SQL, NoSQL, and so on.
You just need to discuss each option and think about which one best fits your solution. Then a fourth step would be to start a basic high-level design, where you can write about the clients, the servers, and the database. Starting from that basic design, you can get into more detail: isolate the services, partition the data, and talk about caching and the replication of the services and databases, even load balancing and message queues, if that fits your specific problem. There are of course optional things that you can think about depending on the exact problem you are given, such as the encryption of messages or passwords; telemetry, which makes sure that you keep track of the things that get used the most in your application; an API gateway; service discovery; and an analytics service that analyzes the requests and the user data. After this design deep dive, the last step you need to take is to wrap up. With the design you have built up, if it seems sensible, you can identify some bottlenecks and improvement areas and talk a bit about those. This is the general structure that you need to keep in mind before starting your system design interview, even before being given the problem. In some future lectures, we are going to take a look at specific system design interview questions like designing the TinyURL system, search engines, shared drives, and so on, so you can see how I apply all the steps from this exact template in my approach when solving these problems. If that sounds interesting to you, I look forward to seeing you in the next lectures. Thank you very much for staying with me up to the end of this one.

4. The URL shortening service: Hello guys and welcome back to the system design interview tutorial. In this lecture, we are going to discuss a famous problem that is often given in these types of interviews.
More specifically, it is the design of a URL shortening service, like the ones you may have seen, for example TinyURL, bit.ly, and so on. In this lecture, I am going to run through all the aspects that you need to consider when answering this topic in a system design interview. After watching this lecture, you are going to know how to ace this question. Not only that, you will also know how you can approach similar problems, because in solving this problem we are also going to follow the template that we should use on any system design interview question given to us. Starting off with this interview problem, we need to design, as I said, a URL shortening service like TinyURL. This service will provide short aliases redirecting to long URLs. Now, the difficulty of this system design question is pretty basic, and it is often the most commonly given one in system design interviews, so it is very important for you to know how to solve it. The first thing we need to discuss is why we need to shorten URLs. There are multiple reasons why we would need that. First of all, short links save a lot of space when displayed, printed, messaged, and so on. Also, if you want to type them on your own keyboard, you are less prone to make mistakes when typing them out. URL shortening is also used to optimize links across devices and to measure the performance of ads. The second thing we need to address is the requirements and goals of the system that we need to design. I remind you here that when you are given a question in a system design interview, as in other types of interviews, you need to clarify all the requirements at the beginning, before trying to answer the problem. You need to ask questions to find the exact scope of the system that your interviewer has in mind. Let us move on to the functionalities that this TinyURL-like service should provide to its end users.
We have them split into the two main categories of requirements, which are functional and non-functional. Thinking about the functional requirements, given a URL, our service should generate a shorter and unique alias of it, which is called the short link. The link that our service provides should also be short enough to be easily copied and pasted. We are going to discuss later what short enough means and agree on that term. The second functional requirement can be that users should be able to pick a custom short link for their URLs. So if they do not want one randomly generated by our service, they should be able to pick one of their own. The links that our service provides should also have a timespan after which they expire and are deleted, because otherwise our database would overflow at some point if we kept these links in it forever. Lastly, when users access a short link, they should of course be redirected to the original link that the short link represents. There are also some non-functional requirements for the URL shortening service. One is that the redirection should happen in real time with minimal latency. Imagine you click a bit.ly link and the actual link that you want to access takes forever to load. That would not be good at all, and also annoying. Another non-functional requirement is that the shortened link should not be guessable or predictable by a malicious user. The last and most important non-functional requirement is that our service should be highly available, because if it were down, all the URL redirections that it serves would of course be down with it. An additional requirement here can be that our service should be accessible through APIs.
That way third parties can make actual requests to our endpoints and get a shortened URL back as a response if the request is all right. We will also be able to do some analytics with telemetry. These should cover how many times a redirection happened, which links are accessed most often, and things like that. After we have discussed all these requirements and goals of the system, we can move on to the third step, which is the estimation of the capacity that the service should have and the constraints to meet. The service that we will implement will of course be read heavy, because there will be a lot of redirection requests compared to new URL shortenings. We can assume approximately a 100 to 1 ratio between reads and writes. If you think about it, you only generate the link once, but it will be read many more times than that. Now for the traffic estimates of the service. We can assume that about 500 million new URL shortenings are created per month, so 500 million URLs being made through our service. With our read/write ratio, we can expect about 50 billion redirections during that month. We can also think about queries per second for our system in order to keep our database from crashing. Apart from that, we also need to look at storage estimates. Here we can again do an estimation. Assume that we store every URL shortening request for five years. Since we have 500 million new URLs every month, the objects we expect to store are going to number about 30 billion. Assuming they take about 400 to 500 bytes each, we are going to need approximately 15 terabytes to store all our data. We also need to think about bandwidth estimates, not just storage estimates. For writes, we can expect about 200 to 300 new URLs every second. That is of course an upper bound, and we are going to look at the top of this range.
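
As a rough check on the traffic and storage figures above, the arithmetic can be written out in a few lines of Python. This is only a sketch under the lecture's assumptions. The constant names are made up for illustration, a 30-day month is assumed, and the read/write ratio is taken as 100 to 1, which is the value that yields the 50 billion redirections per month quoted above.

```python
# Back-of-the-envelope estimates for the URL shortener.
# All inputs are assumptions taken from the lecture; the names are illustrative.
NEW_URLS_PER_MONTH = 500_000_000   # new short links created per month
READ_WRITE_RATIO = 100             # redirections per created link (assumed 100:1)
RETENTION_YEARS = 5                # how long each link is kept
BYTES_PER_OBJECT = 500             # upper bound on the size of one stored record

SECONDS_PER_MONTH = 30 * 24 * 3600  # assumes a 30-day month

redirects_per_month = NEW_URLS_PER_MONTH * READ_WRITE_RATIO
writes_per_second = NEW_URLS_PER_MONTH / SECONDS_PER_MONTH
reads_per_second = writes_per_second * READ_WRITE_RATIO

stored_objects = NEW_URLS_PER_MONTH * 12 * RETENTION_YEARS
storage_bytes = stored_objects * BYTES_PER_OBJECT

print(f"redirections per month: {redirects_per_month:,}")
print(f"writes per second:      {writes_per_second:.0f}")
print(f"reads per second:       {reads_per_second:.0f}")
print(f"objects over 5 years:   {stored_objects:,}")
print(f"storage needed:         {storage_bytes / 1e12:.0f} TB")
```

Run as is, this gives roughly 193 writes and 19,290 reads per second, which is where the rounded figures of 200 new URLs and 20 thousand redirections per second come from.
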
So the total incoming data for our service would be about 500 bytes per request times 200 to 300 new URLs per second, so about 100 to 150 kilobytes per second. For read requests, since every second we expect about 20 to 30 thousand URL redirections, the total outgoing data is going to be roughly 10 to 15 megabytes per second. We can also look at a memory estimate. What if we want to cache some of our most used short URLs, the ones that are very frequently accessed? We can look at the 80/20 rule, which says that 20% of our URLs are going to generate 80% of the traffic. So 20% of our generated short URLs are going to be the hot ones. At a quick look, we can see that at 20 thousand requests per second, we are going to get about 1.7 billion requests per day. To cache 20% of these at 500 bytes each, we are going to need approximately 170 gigabytes of memory. As you have seen in my approach, the capacity estimation and constraints part needs to be very straightforward: think about the limits and estimate how much capacity and bandwidth is going to be taken by the implemented service. As I said, we can also look at some APIs and discuss this part. We can make available some endpoints that our service provides for third parties that want to access our service this way: they make requests to our endpoints and get back the shortened URL as a response. For example, we can have an endpoint called create URL. It can take the original URL as a string parameter, which is of course the URL that you want to have shortened, and also some kind of key, the API developer key associated with an account, so we know who is making these requests and have some sort of security in this part. As I said before, on a successful insertion this endpoint would of course return a string with the shortened URL.
If the request fails, it can return an error code instead. We can also have a delete URL endpoint. While that one is not as important, you can mention it as well. We also need to think about how to prevent abuse here. If we expose an API, a malicious user could put us out of business and destroy our service if they consume all the URL keys in the current design and fill up our database. We can impose a limit per API developer key: each key can be limited to a certain number of URL creations, say ten or twenty a day or something like that. Now the next step is to think about the database design and schema. Defining the schema in the early stages of the interview really helps you understand the data flow among the components and guides you towards the later partitioning of the data. Some observations here: we need to store billions of records for the service, since we are going to have a lot of shortened URLs. But these objects are going to be very small, as I have said, about 500 bytes each. There are no relationships between the records other than storing which user created each URL, so we are pretty much free to do whatever we want here. Another relevant observation regarding the database design is that our service is going to be very read heavy. We are going to need two tables: one for storing the information about the URL mappings, and one for the data of the users who created the short links. Since we anticipate storing a lot of rows and we do not need to keep any relationships between objects, a NoSQL kind of choice would be very good here, since it is also very easy to scale. So we can look at something like DynamoDB. As you saw, in an interview we need to create the database schema, think about what kind of database should be used, and suggest one from our own knowledge. The sixth step here is going to be the basic system design and algorithm.
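
Before going into the algorithm, the create and delete endpoints and the per-key limit described above might look roughly like the following in-memory sketch. Everything named here is hypothetical: `create_url`, `delete_url`, the `UrlMapping` fields, the quota of 20, and the default lifetime are placeholders, the dictionaries stand in for the two database tables, and the short key is just a random string. The quota is a simple total count per developer key, without the time window a real limiter would reset.

```python
import secrets
import time
from dataclasses import dataclass

MAX_CREATIONS_PER_KEY = 20                   # hypothetical quota per API developer key
DEFAULT_TTL_SECONDS = 5 * 365 * 24 * 3600    # hypothetical default link lifetime


@dataclass
class UrlMapping:
    """One row of the URL mapping table (field names are illustrative)."""
    short_key: str
    original_url: str
    api_dev_key: str     # identifies who created the link
    expires_at: float    # Unix timestamp after which the link is expired


url_table: dict[str, UrlMapping] = {}   # stand-in for the URL mapping table
creations: dict[str, int] = {}          # links created so far, per developer key


class QuotaExceeded(Exception):
    """Raised when a developer key has used up its creation quota."""


def create_url(api_dev_key: str, original_url: str) -> str:
    """Store a new mapping and return its short key, enforcing the quota."""
    used = creations.get(api_dev_key, 0)
    if used >= MAX_CREATIONS_PER_KEY:
        raise QuotaExceeded(api_dev_key)
    short_key = secrets.token_urlsafe(6)[:6]    # random six-character placeholder
    while short_key in url_table:               # retry on the rare collision
        short_key = secrets.token_urlsafe(6)[:6]
    url_table[short_key] = UrlMapping(
        short_key, original_url, api_dev_key, time.time() + DEFAULT_TTL_SECONDS
    )
    creations[api_dev_key] = used + 1
    return short_key


def delete_url(api_dev_key: str, short_key: str) -> bool:
    """Delete a mapping if it belongs to the caller; return whether it did."""
    row = url_table.get(short_key)
    if row is None or row.api_dev_key != api_dev_key:
        return False
    del url_table[short_key]
    return True
```

Raising an exception plays the role of the error code here; an HTTP layer on top would translate it into a response status.
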
Here we are going to go into the implementation of how our service works behind the scenes. The problem we are solving here is generating the short, unique key that is going to be mapped to a given longer URL. For this, we have two approaches. In the first, we generate the keys offline with a key generation service. It generates random six-letter strings beforehand and stores them in a database. Whenever we want to shorten a URL, we take one of the keys already generated by the service and use it. This approach is very simple and fast, because we are not encoding the URL, so we do not need to worry about duplications or collisions. The key generation service makes sure that all the keys are unique. We also need to keep some concurrency problems in mind here, because as soon as a key is used, it should be marked in our database to make sure that it is not used again. If there are multiple servers reading keys concurrently, we might get a scenario where two or more servers try to read the same key from the database, although this would be rather unlikely. The servers can use the key generation service to read and mark keys in the database. The service can use two tables to store the keys: one for the keys that are not used yet, and one for the keys that are already used. As soon as the key generation service gives a key to one of the servers, it moves that key to the used table. We also need to ask here: since our key generation service is a single point of failure for our entire service, wouldn't it be bad if it died? That is a very good point, and it is why we need a standby replica of the key generation service, a standby server that can take over generating and providing keys in case something happens to the primary one.
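
The key generation approach above can be sketched in memory as follows. This is a minimal illustration, not a production design: the class name `KeyGenerationService`, the method `take_key`, and the alphabet of letters and digits are assumptions, two Python sets stand in for the "unused" and "used" tables, and a lock stands in for whatever the real database would do to stop two servers from receiving the same key.

```python
import secrets
import string
import threading

ALPHABET = string.ascii_letters + string.digits   # assumed key alphabet
KEY_LENGTH = 6


class KeyGenerationService:
    """In-memory stand-in for the offline key generator and its two tables."""

    def __init__(self, pool_size: int) -> None:
        self._lock = threading.Lock()
        self._unused: set[str] = set()   # keys that are not used yet
        self._used: set[str] = set()     # keys that were already handed out
        while len(self._unused) < pool_size:
            # A set drops duplicates, so every pre-generated key is unique.
            key = "".join(secrets.choice(ALPHABET) for _ in range(KEY_LENGTH))
            self._unused.add(key)

    def take_key(self) -> str:
        """Hand out one key and mark it as used in a single locked step."""
        with self._lock:
            key = self._unused.pop()     # raises KeyError when the pool is empty
            self._used.add(key)
            return key


kgs = KeyGenerationService(pool_size=1000)
print(kgs.take_key())   # a random six-character key
```

Because removing the key from one set and adding it to the other happens under the same lock, two threads calling `take_key` concurrently can never receive the same key. A standby replica would need its own copy of both tables, which this sketch does not model.
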
Now, another approach, other than generating the keys with a service, is to encode the actual URL. Encoding an actual URL can be done by computing a unique hash of it, whether we are talking about SHA-256 or MD5. We hash the given URL with one of these algorithms, and the hash can then be encoded for display. A reasonable question for this approach is: what should the size of the short key be? It can be six or even up to 20 characters, but that would be quite a long key. We can use Base64 encoding. If we use six-character keys, there are 64 to the power of six, about 68.7 billion, possible strings. If we generate eight-character keys, we have about 281 trillion possible strings. But for our system the latter number would be quite high, and we can assume that six characters suffice for our number of URLs. Now, if we use the MD5 algorithm as our hash function, it will produce, as you may already know, a 128-bit hash value. After Base64 encoding, we get a string of more than 21 characters. But we only have space for six or eight characters per short key. How will we choose our key then? We can take the first six or eight characters, but this can result in key duplication. To solve that, we can choose some other characters out of the encoding string or swap some of them to avoid this problem. After giving the solution in an interview, we can also think about some issues with it, as it may turn out not to be bulletproof. If these issues come from you and not the interviewer, you are probably going to get extra points, as you turn out to be a more aware prospective employee. The issues with encoding the actual URL using a hash can be the following. If multiple users enter the same URL, they get the same shortened URL. The concurrency problems apply here as they did with the last approach. But here is another one.
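
Before turning to that issue, the hashing approach described so far can be sketched as below. It is only an illustration under stated assumptions: the function names `short_key_for` and `shorten` and the in-memory `store` dictionary are hypothetical, the URL-safe Base64 alphabet is used so the key can appear in a link, and a clash between two different URLs is resolved by re-hashing with an appended counter, which is one possible choice rather than the only one.

```python
import base64
import hashlib

KEY_LENGTH = 6
store: dict[str, str] = {}   # short key -> original URL (stand-in for the database)


def short_key_for(url: str, attempt: int = 0) -> str:
    """MD5 the URL (plus an attempt counter), Base64-encode it, keep six characters."""
    data = url if attempt == 0 else f"{url}#{attempt}"
    digest = hashlib.md5(data.encode("utf-8")).digest()         # 128-bit hash
    encoded = base64.urlsafe_b64encode(digest).decode("ascii")  # 24 chars incl. padding
    return encoded[:KEY_LENGTH]


def shorten(url: str) -> str:
    """Return a short key for the URL, re-hashing when the key is already taken."""
    attempt = 0
    while True:
        key = short_key_for(url, attempt)
        existing = store.get(key)
        if existing is None:
            store[key] = url
            return key
        if existing == url:      # the same URL was submitted before: reuse its key
            return key
        attempt += 1             # a different URL owns this key: try again


print(shorten("https://example.com/some/very/long/path?with=query"))
```

Note that in this sketch two users who submit the same URL receive the same key, which is exactly the first issue listed above; mixing something user-specific into the hashed input would change that.
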
What if parts of the URL are URL-encoded? The workarounds here are, first, appending an increasing sequence number to each input URL to make it unique and then generating its hash. Another solution could be to append the user ID, which should be unique, to the input URL. If the user has not signed in, this could be a problem, because we would need the ID in order to ensure the uniqueness of the key. If, even after all the steps that we took to be sure nothing wrong could happen, we still have a conflict, we can keep generating a key until we get a unique one. That wraps up the basic system design and algorithm part. The next part is the data partitioning and replication of our service. To scale out our database, we need to partition it so it can store more information. We need to store billions of entries in our database. Here we can take a look at the two types of partitioning, which are range-based and hash-based partitioning. Starting with range-based partitioning, we can store URLs in separate partitions based on the first characters of the hash key, or even just the first one. So we can save all the URLs starting with, for example, the letter A in one partition, the ones that start with the letter B in another one, and so on. In contrast, with hash-based partitioning we take the hash of the object we are storing and then calculate from it which partition to use. In our case, we can take the hash of the key or of the short link to determine the partition in which we are going to store that specific URL. The next thing that we need to think about when approaching this system design problem is the cache part. We can cache URLs that are frequently accessed, as we already suggested earlier. We can use any off-the-shelf solution that exists out there, Memcached for example. A cache like this can store the full URLs with their respective hashes.
With this approach, the application servers, before hitting the back-end storage, can quickly check whether the cache already has the desired URL, which makes for a more optimal service. We also need to think here about how much cache memory we should have. We can start with about 20% of the daily traffic and, based on the clients' usage patterns, adjust how many cache servers we need. As estimated above, we need about 170 gigabytes of memory to cache 20% of the traffic. Since a modern-day server can have 256 gigabytes of memory, we can easily fit all the cache into one machine. That was about it for the caching part. We can also think about the load balancer in our system, which is pretty straightforward. We can add a load balancing layer at three places: between the clients and the application servers, between the application servers and the database servers, and between the application servers and the cache servers. Moving on to the tenth thing that we need to think about when this system design interview question is given to us: the purging or cleanup of the database. Entries should of course not stick around forever in our database, as that would fill it up and eventually cause a crash. If a specified expiration time is reached, what should happen to the link? If we chose to continuously search for expired links to remove them, it would put a lot of pressure on our database. So instead we can slowly remove the expired links by doing a lazy cleanup. The way our service ensures that only expired links are deleted is this: whenever a user tries to access an expired link, we delete the link and return an error code to the user. This way we avoid deleting all the expired URLs at once and risking a database crash. We can also have a default expiration time for each link. Lastly, we also need to keep telemetry in mind, which should answer questions such as how many times a short URL has been used.
Other telemetry questions are the user's location, the date and time of access, and so on.

Then there is the security and permissions part, which is another important one. We can store the permission level, public or private, with each URL in the database. For the private ones, we can also create a separate table to store the user IDs that have permission to see that specific URL. Of course, if a user does not have that permission, our endpoint for accessing the short URL can return an HTTP 401 code, meaning the user is not authorized.

This was about all the angles that you need to keep in mind when approaching a solution to the URL shortening service question in a system design interview. I hope you guys really got something out of this lecture, and I thank you very much for sticking with me up to the end of it. I also look forward to seeing you guys in the next one.

5. The photo sharing application: Hello guys, and welcome back to the system design interview tutorial, where we learn how to pass an interview that is made up of system design questions. In this lecture, we are going to discuss another very common problem that is given in these types of interviews and take a look at exactly how we can solve it in order to pass, as I have said, the interview. The problem we are going to look at in this lecture is designing a photo-sharing service. Here the users can upload photos to share them with other users. Some similar services are Flickr or some social media platforms, even though they have more functionality integrated within them. First of all, when approaching this problem, I am going to use the general template, fitted for any system design question that can be given in an interview, that I talked about in a prior lecture. The first step, after you thoroughly understand the question and ask your interviewer any clarifying questions that you might have, is to define the requirements and the goals of the system you are going to talk about.
When designing our photo sharing application, we would like to have some functional requirements. A user can follow other users that have an account on this platform. The user can also perform searches based on the photos that are shared. The users can also upload photos themselves from their account. And the system that we implement should also generate and display some sort of news feed where a user, when opening up the application, will see the top photos from the people he follows. These are, as I said, the functional requirements, but we also need to think about the non-functional ones. Non-functional requirement number one is that the system should be very reliable, meaning that once a user uploads a photo to our service, the photo will not be lost. Next, our service needs to be highly available, so outages are not acceptable. Also, we should keep our latency down for the news feed generation and the like, meaning the amount of time that the user actually waits from starting the application up to the point where he sees photos from the users that he follows.

So now the first step is done, and we have defined the requirements and the goals of the system we are about to design. We can make a few other design considerations here. We can begin from the fact that the service we are going to implement will pretty obviously be very read-heavy, because not that many users will actually upload photos, but a much higher number of users will view the photos that other users posted. You can think of this the way you would think about posting a photograph on your social media app: you post very few times, but you see a lot of other pictures in comparison to what you actually upload. That is why this system will be read-heavy. Because our service will be read-heavy, we need to focus on retrieving the photos very fast.
Now, the users can upload as many photos as they like. That is a thing to keep in mind, because we need very efficient management of the storage for these photos. And also, as I said, if a user uploads a photo, the system that we design must guarantee that it will not be lost.

Moving on to some estimations and constraints for the capacity of the data we are going to store. We can assume that we are going to have 100 million users, with about 10 thousand daily active users and 2 million new photos every day. That would be about 23 new photos every second. The calculations that we are going to make here assume a future date when our service is highly adopted and successful, which is why you see me bring up these very high numbers. Now, the total space required for one day of photos is based on those 2 million new photos every day. If we take an average photo file size of 100 kilobytes, that is going to be about 200 gigabytes per day in total. That only accounts for one day. But what if we plan the space for, let's say, five years? Well, we can multiply 200 gigabytes by 365 days and by 5 years, and we are going to get about 365 terabytes.

Now that we have this capacity estimation out of the way, we can think about the high-level system design. At a high level we only need to support two scenarios: one to upload the photos, which is the write workflow in our service, and the other one to view or search these photos. The service that we are going to implement, of course, needs to store the photos in some database, and we are going to need some object storage servers here. Furthermore, the database schema for storing the information about these photos is also going to be discussed in the interview. We need to keep in mind that we have about three entities to model.
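Going back to the capacity estimate above, the arithmetic can be checked with a few lines of code. This is only a sketch: the function name is made up for illustration, and it assumes decimal units (1 GB = 1,000,000 KB, 1 TB = 1,000 GB), which is what makes the round numbers from the lecture come out.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365


def estimate_capacity(new_photos_per_day, avg_photo_size_kb, years):
    """Return (photos per second, storage per day in GB, total storage in TB)."""
    photos_per_second = new_photos_per_day / SECONDS_PER_DAY
    # Decimal units are assumed here: 1 GB = 1,000,000 KB and 1 TB = 1,000 GB.
    storage_per_day_gb = new_photos_per_day * avg_photo_size_kb / 1_000_000
    total_storage_tb = storage_per_day_gb * DAYS_PER_YEAR * years / 1_000
    return photos_per_second, storage_per_day_gb, total_storage_tb


per_second, per_day_gb, total_tb = estimate_capacity(
    new_photos_per_day=2_000_000, avg_photo_size_kb=100, years=5
)
print(round(per_second), per_day_gb, total_tb)  # 23 200.0 365.0
```

Keeping the inputs as parameters makes it easy to redo the estimate if the interviewer gives different numbers for photos per day or average file size.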
The three entities are the user, the photos, and the users that they follow. The schema can contain three tables. One is the photo table, which has a photo ID as the primary key, and then a few other fields, like the user ID of the user that posted it, which is going to be a foreign key in this table. Another very important field here is the creation date, which is going to be of type datetime. It is important because, when a user opens up his news feed, we want to get him the latest and most popular photos, and for the latest part we are going to need that creation date field. Then we also have the user table, which will of course have a primary key of user ID, and then some other data that we want to store for the user. If we want to do email marketing, we can keep his email. We also need the name, the date of birth, the creation date of the account, and maybe the last login. We would also need a user follow table, with two fields that will both be part of the primary key, namely the user IDs of the users that follow each other. This takes into account only the scenario where the follow goes both ways, like on Facebook when you add a friend: if he accepts, you both have each other as a friend. It is not like Instagram, where the follow can be unidirectional, so you can follow someone that does not follow you. We do not have that scenario in this discussion, just so we can keep it a little simpler.

Now, the straightforward approach for storing the schema that we just talked about would be MySQL, since we obviously require joins, but we can also store it in a distributed key-value store to enjoy the benefits offered by NoSQL. After we talk about the database schema in our interview, we also need to make another estimation, this time of the data size.
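To make the three tables a bit more concrete, here is a small sketch of the schema using Python's built-in sqlite3 module in place of MySQL, just so it runs anywhere. The table and column names, the text type for dates, and the latest_photos helper are assumptions for illustration. The photo_path column anticipates the photo path field that comes up in the data size estimate.

```python
import sqlite3

# Assumed table and column names; SQLite stands in for MySQL in this sketch.
SCHEMA = """
CREATE TABLE user (
    user_id       INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    email         TEXT,
    date_of_birth TEXT,
    creation_date TEXT,
    last_login    TEXT
);
CREATE TABLE photo (
    photo_id      INTEGER PRIMARY KEY,
    user_id       INTEGER NOT NULL REFERENCES user(user_id),
    photo_path    TEXT NOT NULL,
    creation_date TEXT NOT NULL
);
CREATE TABLE user_follow (
    user_id_1 INTEGER NOT NULL REFERENCES user(user_id),
    user_id_2 INTEGER NOT NULL REFERENCES user(user_id),
    PRIMARY KEY (user_id_1, user_id_2)
);
"""


def create_schema(conn):
    """Create the three tables on the given connection."""
    conn.executescript(SCHEMA)


def latest_photos(conn, user_id, limit=10):
    """Return the IDs of a user's photos, newest first, using creation_date."""
    rows = conn.execute(
        "SELECT photo_id FROM photo WHERE user_id = ? "
        "ORDER BY creation_date DESC LIMIT ?",
        (user_id, limit),
    ).fetchall()
    return [row[0] for row in rows]
```

The composite primary key on user_follow matches the description above, where both user IDs together identify one follow relationship. Dates are stored as ISO-formatted text here so that sorting by creation_date gives the newest photos first.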
The data size estimation is based on the fields that are going to be stored in our database, so we need to think about the integers, strings, and so on that are going to be stored. We can assume that each integer and DateTime takes four bytes. So each row in the user table will be about 68 bytes, if it has, as I counted earlier, about six fields. If we have, let's say, 250 million users, we are going to need about 16 gigabytes of storage just for the users and all their information. But what about the photos? Each row in the photo table will be about 284 bytes, since we also have the photo path, which is going to be a varchar of 265 characters and takes 265 bytes. That makes the photo entry quite a bit longer than the user one, but that is pretty normal. If we take into account about 1 million new photos getting uploaded every day, we are going to need about 250 megabytes of storage for one day, and over ten years that is going to be almost one terabyte. The last table is the user follow table. Each row consists of eight bytes, since we only have the primary key composed of two integers, which are four bytes each. We can think about 250 million users, each following a number of other users, and we are going to need about one additional terabyte of storage for the user follow table. The total space required for all these tables for ten years, as I said, would be about two terabytes, which is actually pretty good for an application and a service of this type.

Next, we can also take a quick look at the design of the components we are going to have in this service. The photo uploads, or writes, can be slow, while reads are going to be faster, especially if they are being served from cache. Uploading users can consume all the connections that are available, because uploading is a slow process, and that means reads may not get served.
This happens when the system gets busy with all the write requests. We should keep in mind that web servers have a connection limit before designing the system. If we assume that a web server can have a maximum of 500 connections at any time, then it cannot have more than 500 concurrent uploads or reads. To handle this bottleneck, we can split reads and writes into separate services. This will also allow us in the future to scale these operations independently and much more easily, since they are two separate entities.

For the redundancy part now: losing files, as I have mentioned earlier, is not an option. We can store multiple copies of each file as a solution to that, because if one storage server dies, we can retrieve the photo from the other copy. The same principle also applies to other components. If we want the availability of the system to be high, so that our system is not down at any point in time, we also need to have replicas of the services running, so that if a few services die down, the system remains available and running, as this redundancy removes the single point of failure in the system. That is a very important point. When taking any system design interview, you need to think about whether your implementation actually creates a single point of failure, as that is a downside that needs to be noticed, and you also need to find a workaround for it. If only one instance of a service is required to run at any point, we can run a redundant secondary copy of the service that is not serving any traffic, but that can take control after a failover when the primary has a problem. As I have said, if there are two instances of the same service running in production and one fails, the system can fail over to the healthy copy. This can happen automatically or require manual intervention, even though, if you do not want your system to go down at any point in time, you had better implement automatic recovery.
Otherwise, you would need to observe that your system is down and then manually intervene with a developer, who can of course reroute the traffic to the available instance himself.

We can also talk about data sharding, because with such high amounts of data we need some way of sharding it. We can shard on the user ID, so we can keep all the photos of a user in the same shard. If one DB shard is one terabyte, we would need four shards to store about four terabytes of data. Now, for performance and scalability, we can keep about ten of them, so that as our system grows larger and larger we are going to be able to handle all that traffic. We find the shard number by taking the user ID modulo ten, and we store the data there. To uniquely identify any photo in our system, we can append this shard number to the photo ID, which would be a pretty simple solution in that regard. How can we generate the photo IDs? Well, each database shard can have its own auto-increment sequence for the photo IDs, and since we append the ID of the shard to the photo ID, it will be unique throughout our system.

Even though we came up with this partitioning system, it has some downsides. For example, if we cannot store all the pictures of a user on one shard, we have to distribute them across multiple shards, and that will cause higher latencies. Also, the hot users need to be handled quite differently, because they are followed by a lot of other users, a lot of people will see the photos they upload, and that can cause problems in that regard.

We can of course partition based on the photo ID and not the user ID, which is another approach. If we generate unique photo IDs first, we can find the shard number, again, through the photo ID modulo ten, which basically means its last digit. And of course, the problems that we just talked about, the downsides of the first approach, would be solved at this point.
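The two sharding approaches above boil down to a few lines of arithmetic. The sketch below assumes ten shards, as in the lecture. The function names are made up for illustration, and treating "append the shard number to the photo ID" as adding one extra decimal digit is my own reading of that step.

```python
NUM_SHARDS = 10  # ten shards, as in the lecture


def shard_for_user(user_id: int) -> int:
    """First approach: shard on the user ID (user ID modulo ten)."""
    return user_id % NUM_SHARDS


def make_global_photo_id(local_photo_id: int, shard: int) -> int:
    """Append the shard number to a per-shard auto-increment photo ID.

    With ten shards the shard number is a single decimal digit, so
    "appending" it is the same as local_photo_id * 10 + shard.
    """
    if not 0 <= shard < NUM_SHARDS:
        raise ValueError("shard out of range")
    return local_photo_id * NUM_SHARDS + shard


def shard_for_photo(photo_id: int) -> int:
    """Second approach: shard on the photo ID itself (its last digit)."""
    return photo_id % NUM_SHARDS
```

For example, a user with ID 1234 lands on shard 4, and that shard's local photo number 57 becomes the system-wide ID 574. In the second approach the photo ID has to exist before the shard can be chosen, which is why the ID generation has to move out of the individual shards.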
Again, to generate the photo IDs here, we cannot have an auto-incrementing sequence in each shard, because we need to know the photo ID first in order to find the shard. A solution here could be to dedicate a separate database instance to generating these auto-incrementing IDs.

We also need to think a lot about the future and plan for the growth of the system. We can have a large number of logical partitions to accommodate the data growth, such that in the beginning multiple logical partitions reside on a single physical database server. Since each database server can have multiple database instances running on it, we can have a separate database for each logical partition on any server. Whenever we feel that a particular database server has lots of data, we can migrate some logical partitions from it to another server. We can maintain a config file for this mapping, and that would be the only thing we would have to update to announce an eventual change.

Ranking and news feed generation is another problem we are faced with. The way we are going to deal with it, or at least one way that we can deal with it, is to pre-generate the news feed. We can have some servers whose only task is to continuously generate users' news feeds and store them in another table called user news feed. When a user needs the latest photos for his news feed, we simply query this table and return the results.

Lastly, we also need to think about the cache and load balancing of the system, as it will be a very heavily used one. We need to introduce a cache for the metadata servers to cache the hot database rows, meaning the hot photos and users. We can use Memcached to cache this data, and the application servers, before hitting the database, can quickly check if the cache has the desired rows. Here, a reasonable approach for the cache eviction policy would be LRU, which is least recently used.
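To show what least recently used eviction means in practice, here is a minimal in-process sketch built on Python's OrderedDict. It is only a stand-in to illustrate the policy, not the Memcached API, and the class and method names are assumptions.

```python
from collections import OrderedDict


class LRUCache:
    """A tiny least-recently-used cache for database rows (illustrative only)."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._rows = OrderedDict()  # oldest entry first, newest entry last

    def get(self, key):
        """Return the cached row, or None on a miss."""
        if key not in self._rows:
            return None
        # A hit makes this entry the most recently used one.
        self._rows.move_to_end(key)
        return self._rows[key]

    def put(self, key, row):
        """Store a row and evict the least recently used one if over capacity."""
        self._rows[key] = row
        self._rows.move_to_end(key)
        if len(self._rows) > self.capacity:
            self._rows.popitem(last=False)  # drop the oldest entry
```

An application server would call get first and only query the database on a miss, then call put with the row it fetched, which is the check-the-cache-first flow described above.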
We can build an even more intelligent cache using the 80-20 rule, which states that 20% of the daily read volume of photos will generate 80% of the traffic, so we can just cache that first 20%. So this was about all the steps and all the data I would consider if I were at a system design interview and received a photo-sharing service design question. I thank you guys very much for sticking with me up to the end of this lecture, and I look forward to seeing you in the next one.