ACID Properties in Database Systems by Example | Hussein Nasser | Skillshare

Playback Speed

  • 0.5x
  • 1x (Normal)
  • 1.25x
  • 1.5x
  • 2x

ACID Properties in Database Systems by Example

teacher avatar Hussein Nasser, Author, Software Engineer

Watch this class and thousands more

Get unlimited access to every class
Taught by industry leaders & working professionals
Topics include illustration, design, photography, and more

Watch this class and thousands more

Get unlimited access to every class
Taught by industry leaders & working professionals
Topics include illustration, design, photography, and more

Lessons in This Class

8 Lessons (1h 18m)
    • 1. Introduction to ACID

    • 2. What is a Transaction?

    • 3. Atomicity

    • 4. Isolation

    • 5. Consistency

    • 6. Durability

    • 7. Eventual Consistency

    • 8. Class Project - ACID by Example with PostgreSQL

  • --
  • Beginner level
  • Intermediate level
  • Advanced level
  • All levels
  • Beg/Int level
  • Int/Adv level

Community Generated

The level is determined by a majority opinion of students who have reviewed this class. The teacher's recommendation is shown until at least 5 student responses are collected.





About This Class

ACID are four properties of relational database, Atomicity, consistency, isolation, and durability, and I think anyone working with a relational database like Postgres, mysql, sqlserver oracle, should understand these properties. In this video, we will go through the four properties and explain why each is critical to make a relational database we will also talk about why some people are moving to NOSQL database. 

Meet Your Teacher

Teacher Profile Image

Hussein Nasser

Author, Software Engineer


My name is Hussein and I’m a software engineer. Ever since my uncle gave me my first programming book in 1998 (Learn programming with Visual Basic 2) I discovered that software is my passion. I started my blog, and YouTube channel as an outlet to talk about software.

Using software to solve interesting problems is one of the fascinating things I really enjoy. Feel free to contact me on my social media channels to tell your software story, ask questions or share interesting problems. I would love to hear it!

I also specialize in the field of geographic information systems (or GIS). I helped many organizations in different countries implement the GIS technology and wrote custom apps to fit their use cases and streamline their workflows since 2005. I wrote fiv... See full profile

Class Ratings

Expectations Met?
  • Exceeded!
  • Yes
  • Somewhat
  • Not really
Reviews Archive

In October 2018, we updated our review system to improve the way we collect feedback. Below are the reviews written before that update.

Why Join Skillshare?

Take award-winning Skillshare Original Classes

Each class has short lessons, hands-on projects

Your membership supports Skillshare teachers

Learn From Anywhere

Take classes on the go with the Skillshare app. Stream or download to watch on the plane, the subway, or wherever you learn best.


1. Introduction to ACID: Acid R4. Properties of relational databases. They are atomicity, consistency, isolation, and durability. And I think any one working with relational databases, like ball screw is my SQL SQL Server. Oracle should really understand these properties. In this video, we will go through the four properties over relational databases, explain why each is critical to make a relational database. We'll also talk about why some people are moving away from relational databases to a no SQL database, kinda ditching one aspect, one vertical of these properties, which is essentially the consistency, right? Alright, agenda. Here's where we're going to discuss in this video, guys. I want to discuss the four properties, but I want to first talk about what is a transaction. I think this is important and I don't want to take this for granted. That people, you know, what is the So I'm going to go quickly through authorize transaction. We're going to talk about atomicity. I flipped the two properties, isolation and consistency. So acid, but I am, I changed it because there's a reason token Barb isolation will, domestically will lead to consistency in a very nice way. We've gotta talk about consistency that talk about a little bit of our NoSQL databases, why people are moving to Nozick or why they create an RC cold to begin with. And then fourth, talk about durability, just kinda very straight forward thing. 2. What is a Transaction?: Transaction. What is a transaction? We hear this a lot in databases. At transaction. It's nothing but a bunch of queries, okay? It's a collection of queries is always a one or more queries. And that property of the queries are, are, they are u1 unit of war. An example would be if you want to deposit an account, right, and some cash into an account. That is actually a select statement followed by an update statement, followed by another update seven. Like we know we're going to show it in a second RAM. But each one of those, well, the three of them can form a transaction and you define all the transaction begin and where transaction end. And we also, there is another semantic attach death, resurrection like a commit, hey, I'm happy with my changes. Go persist them, commit. That's what a commitment. Rollback. You know what? I think something went wrong. Rollback, abort, abort, abort. So that's another piece of concept that attach why transaction. We know that. Let's just jump into an example. In a transaction, I have an account here with a beautiful table, just do field because that's all we're you need obviously for an a bank right? Feels like count and you don't need a name, you don't need anything else. No date and just do field. So there's that God ID, there's a balance. This guy has a $1000 guys ADA to 5,500 and the transaction is send a $100 from account one to account to. So what do we need to do? First? Does account one even had a $100? I don't know. So I need to check. Right. So that's gonna began that transaction. Get I select. Let's check the balance. Do you have a $100 app? Yes, your balance is greater than a 100, Yasser? It is greater than a 100. Alright, let's debit that 100 minus 100. Okay? So now it's 900, okay? But it's not committed. He has just the right is just touched. It's a dirty change again. And then I'm going to add another 102. The second account going to commit my changes turned green. That means it's like it's persisted. All good, we're happy and that's it. Okay. So that's essentially what a transaction is. It's a unit of work, but it consist of multiple queries, one or more. That's a transaction. 3. Atomicity : Let's talk about the first property now that we understand transaction. Atomicity. Atomicity is Kim from the word atom, right? In science or as an atom is the smallest piece of a unit that cannot be divided. You cannot divide an atom unless you're doing a nuclear fission and all that stuff. But let's assume atomic cannot be divided, cows be split. And that's what a transaction is. A transaction should be atomic. When we say a transaction has atomic, that means all of those queries should seed. And if one of them fail, you better failed the whole thing. You better roll back the whole thing. And now we know what a rollback is. Does that make sense, guys? Very simple concept. Okay. Why does it matter though? Why does it matter? Why should I care? Okay, why are you forcing that? Well, you got to know us. Are they the same example without atomicity? Let's say I don't have atomicity. I don't enforce this in my database or whether the vendor of this database, Oracle Postgres didn't implement that atomicity which is bad or all relational should implement it them as, bless it. I'm going to deposit the same transaction. I'm gonna do it, but we're going to add some will be spices and double of that. Again, there's that isn't going to debit a 100 Yukon. Ok, I need to check this this guy had a 100. He does, sir. Okay. Cool. Go ahead and debit that. Go right with a 100. And why that happened. That's an homage in just a slide guys bariatric. You database crashed. Something habit. Un back, restarted the database. Guess what? That's the last thing you did. You debit in one account by you, but you didn't get to the other part of the transaction to actually commit and change that to 600. Now you have a $100 and the thin air, it just went away, right? That's just bad ish, right? That's very bad ish. Okay. After we restarted the machines accountant get credit obviously. So that's what I thought was transaction is a transaction that will roll back all queries, all changes if one of the queries fail. So that's what it means essentially, right? So if this is what we just did is not atomic transaction, what happened is as bad, right? So the correct way is to just roll back and roll back list of 1000. Does that make sense, guys? That's 100 Thomas, it is. 4. Isolation: Very simple stuff. Jump to the second one, isolation. I skipped consistency because we're gonna go back to consistency. And later after isolation, this is one of the longest, and in my opinion, one of the most critical properties a software engineer really should understand, really should understand isolation. What is isolation? And isolation is? To be isolated as I want to stay alone and I don't want to interact with the world, right? Okay. Let's take this in the transaction. A transaction is to have a transaction that isolated, that means an in-flight rises while I'm executing the transaction. Can this transaction see other changes made by other transactions? There are in flight. Ok. So that's what it means. I am now executing my stuff. Can I see other stuff people are making OK. And is that ok? Right. Am I isolate from the other ward? Ok. And it's really up to the implementer, the software engineer here, to adjust this knob of isolation that we're going to talk about. Okay, do you want noise, religion? I don't care. You can change anything you want and I'm going to read whatever you change and that's maybe OK. But some cases it is no. No, sir. Okay. We're going to talk about that isolation. We are talking about as a lack of isolation. You get these read phenomena like has a fancy name for some reason. There's this, you're gonna get some read. Ugliness, read weirdness, right? Some weird stuff while you read in transactions. And as a lack of isolation, we're gonna talk about that relies way redone phenomenon. This is dirty read and all that stuff. And we're going to talk about isolation levels that a database can implement to get read of the read phenomenon. Okay? So without isolation, you get a bunch of free phenomena as we implement isolation levels to fix those problems. Okay, let's talk about, let's talk about the read phenomenon. We have three, I added the fourth one. I don't think it's a read phenomena. It's one of the bad things that can happen as a side effect, but I added it anyway. Okay, so the first one, dirty read their two reads essentially as I am and and we saw that, right? I don't think we saw that. Yeah. But essentially, dirty read what it is as I am in flight transaction and some other transaction made a change, that transaction did not commit that change, and I read that change. Ok. So any minute that transaction can't roll back that change and I just read a bad value, a dirty value that hasn't been committed. That's what it means. Essentially, non-repeatable reads, we're going to talk about that. So I read a value that is being committed, right? Not necessarily committed or not, right. But then I turned around and my Sim transaction and I read the same value, whether the same query or other query that give me the same path to that value, like a range query. And I did not get the same result. So my results, my reads were not repeatable in my same transaction. Some, some users are fine with that. Some users are not fine with that. We're going to talk about that. Okay. Phantom reads, I inserted a new another transaction inserting a new record, and I pick that up where I shouldn't have picked that up. It's usually happens with range queries. Worst updates, like when when you're started to change something, right? You change in transaction, you change a value and some other transaction overwrought your value before you even committed. So you get a commit, a wrong value, so you just lost your update essentially. And these are the four phenomena. Let's talk about each one of them. Read dirty reads. Let's talk about dirty reads. So we're talking about a reader or dirty reads. I am going to read a value that hasn't been committed by another transaction. Let's say I have a sales table. There's a two products. Product one. Product two. There's a quantity of sales is Barack Obama has sold ten units at a price of $5 product to 2004. And I am asked to write a query to produce a report of myself. So what I'm gonna do is select the entire thing and some of those guys. So product one has sold what, $50 for act two has give us like $80. And then at the end I'm going to just view the sum of all the sales, okay? Very simple stuff. So to do that, again in a transaction, okay? Even if we don't really need to commit or change, you can also begin a transaction, Ryan, and that's, and that can have its value. I'm going to begin a transaction selecting the product ID and then multiplying the quantity down surprise, obviously. And then from sales, that will give me y we just saw right? Product 150 to that sounds good. But here's the trick. While this is executing another transaction just began, right? And talk. Another quantity is just added some, some someone made us another sale, right? And they updated the sales report where the PID is equal one. So now we just increase the quantity by five. That's pretty good. But I don't know if it's good because we just read this. Now. I just have an old value and might be that, okay, well, let's see what will happen, right? So i didn't commit this transaction to yet, right? While I'm doing that, I did the second query which is select some quantity times price, which is literally this times this plus some 20 plus 48 plus guess where this is giving me a wrong result. This is now reading that dared t value that never been committed yet. This guy's running it. And that's a dirty read, right? We're reading the bad value. Why? Because it's leading to bad results in 50 plus eight is 130. But I am showing the report 155. So you can see dirty reads are bad. I'm not saying they are always bad. It's up to you if you can tolerate that or not. If you are building and stick around, for example, you don't really care for the number of likes doesn't match the number of users who actually like the picture, right? So you might, there's a price for everything, guys, we're going to talk about that. Or you cannot just say, yeah, I'm going to implement the best isolation ever again. So there's the first read phenomenon. And we commit that transaction obviously, but it's, it's useless. We got batteries though. So that's their theory. Let's talk about non-repeatable reads. And non-repeatable reads. Is he said, read that. I'm going to read something and my transaction, and I'm gonna read it again, I am getting two different values. You might say, why are you reading the same values twice? Or why are you doing that? You just read it while you're reading twice. There are believe me, when you're writing code, when you're writing transaction, you will need to do that sometimes. And we're going to show you an example. I had to think about an example of this, but there's an example, same thing right? Beginning of the transaction, I'm gonna select the account report, right? That sounds good. 5080. And then this is the total of this is 130. But guess what? A transaction began at the same time and it updated the account. Sounds like is the same transaction, but here's a little bit different I did here. I actually change that. Okay, now it's 15. And I also committed that the difference between the first one and so what I'm doing next is actually not a dirty read, right? I'm reading a committed value. It's a legit value. But the question is, should I committed value or not? Ok. So to me this has been committed. It's a legit value. It's not like the previous one where someone actually rolled back, right? Or, or could, could be rolled back. It's not a dirty value. It's an actual committed value. So I'm going to turn around, do the sum, but now I'm going to get a bad result. Despite me committing, I get a bad result. So having it committed or not doesn't really matter. Now, I got this phenomenon of a non-repeatable reads in this example because their first read gave me that some secondary give me a different sum. Okay? So that's the non-repeatable reads in itself. Okay? So non-repeatable reads, or we read this first together the results, then we read it again, we get a different result. So that's another repeatable read. Again, discuss ways to fix this. How can you we can talk about things, how to fix them, okay? Non-depletable rate, right? And obviously gonna commit it then phantom, phantom rate. So what phantom reads is actually interesting because here's, here's an example of a phantom read, write essay. I begin a transaction, do the same thing, selecting count and then sum. The whole thing, will get one 30-50 sales and 18 red. And the same time here's the other transaction. It didn't actually update. Just inserted record. It's just insert a new cell for a totally different product. And now you committed that. So it's changed, it's in the database. And the same time the sick, the first transaction is still executing. I am selecting all the count prices wrong, sales. That is bad. Because now you're just accounted for product three on the first sheet of your paper report, you did not account for whom? Right there now, the sum of those overdoes, they won 30, but you shine on 40. That's a phantom read. And there isn't this phantom R3 because this is one of the coldest range queries where you're selecting, But you're getting extra rows. So if we're gonna talk about ways to fix that, a little bit harder to fix. And to avoid the previous ones, you can implement locks and version controls to get away with it. But this, this is a new value. How do you lock it? It's a new value. We cannot say, hey, don't read that, right, as just really interesting. And that one of the ways to fix that is actually serialization isolation level. The thing about isolation levels. This is very interesting guys. I really like this topic. So isolation levels are levels of isolation dogs and what, what are you talking about? Isolation levels are level that implemented by the database to fix those phenomena that we talked about. And there's a nice stable by Wikipedia here then I'm gonna reference here is very nice. It just explains the whole thing here I'll get. So read committed is the first type of isolation, which is, I don't know, I call it isolation level, is that literally offers no isolation whatsoever. That transaction, when your transaction starts in a read and committed, whey is not Ghana is going, it is not going to outperform any isolation. And he changes, the database is happening. You're gonna see it again today. Get dirty reads here obviously because you're just seeing everything. You're gonna get. Non-repeatable reads. Obviously, you're gonna get phantom reads, right? And you get it lost updates as well, get the read committed as each. There is little bit of isolation here and most databases implement that. And most of the time this is, this is the one of the best transaction isolation level because you're happy with that. You're happy to read committed stuff. But here's the thing. Each query in that transaction only sees committed staff at that time of their query. So each query you execute Ghana C, The New committed values at that time. And that could be fine, right? Sometimes it is not. Okay? So that's what it means to read committed. You are seeing and viewing the committed staff at the time of the query, okay? So anything that was committed, you're going to see it. Okay? Obviously this does no work on our example that we show because we shot even that is committed. The report came out wrong. Right? But there are certain cases where recombinant is enough for you. Yeah. Repeatable. Read each querying the transaction only sees committed are birthdates at the beginning of the transaction. So it's the repeatable read isolation level gives you the ability to see committed values, but not at every single query at the beginning of the transaction. So once you begin a transaction, you will only see changes that have been committed before you started that transaction. Okay? So that gives you a nice isolation tie to view, and that's really powerful stuff. So now I'm not gonna go through the implementation aspects of this database is some database and implements versioning to implement this like a, when you start a transaction like Cassandra, you start a transaction, I am inversion 0, okay? And then any thing I read is belonged to her version 0 of the database again. So if other transaction are committing stuff, changing stuff, they are ablating to version one and version 23. I am only reading version 0, and that gives me a nice consistent view which we get that right. So repeatable reads gap. So this way I guarantee that I'm gonna read the same repeatable read, right? Gotta get a saying get the same read because I'm reading the same version, right? Some databases implemented as a form of a lot, which is a little bit more expensive. Okay? So you have you establish an exclusive lock and say, you know, I am only reading this, please. Guys, don't change this, forget, don't change it. So if once we established an exclusive lock, we cannot, nobody actually can, or nobody can change that self-doubt. Sorry, it is not an exclusive lock. So so the LOC approach, another approach to another approach to implement repeatable raises. You're gonna start, you place a shared lock so you guarantee that nobody can actually change your data. Because everybody that needs to edit, the need to establish an extra clues exclusive lock to edit these rows, right? And you'll get about what to read. And then once you do that, you released the shared lock again. So he implements some sort of a shared lock, but you just blocked people from editing. So versioning is kinda a little bit better. But there's pros and cons for each one. Obviously serializable, the easiest way to implement and the kind of a slowest. And the performance goes down with every level of isolation. So it's like I think of as a chart, right? This is no isolation. This is the highest isolation level, but this is the fastest thing, this is the slowest thing. Got. The serializable is essentially each transaction has to be serialized, right? So if you, as if they are city was almost like serials, right? So transaction just followed by another transaction, they cannot be executed in parallel. Essentially, the try as much as possible. The database does a little bit tricky things to, to execute. Some of them are parallel. But most of them I, you can think of as a serialized, right? So if your serialized, nothing running in parallel, you're happy, right? This is the best thing ever. And if you're not nothing, nothing and running parallel, then you're not gonna get in contrast issue. You don't have isolation problems anyway. So this is the best way. Most of the time this is implemented at the application level. The application places a lock and says, guys, stop touching my stuff, I'm updating it or I'm reading it. Alright, and then start reading everything and then it goes away. That's the table I was talking about with. From the Wikipedia. Yeah, read uncommitted level, you were all of that stuff. Phenomenon can occur obviously goes we don't care. Read, uncommitted, read anything, right? Any change that happens, dirty reads, lost updates, you get non-repeatable, you get phantom reads. Rayleigh committed. What are we going to do is essentially read only the committee staff. So you're not gonna get dirty reads. You know that because you're reading only committed stuff, but you're going to get her under non-repeatable reads because each query is getting the new committed stuff, right? So if you queried the point, the first query of the transaction, and then you turn around and query again. Between those two query, someone might change the value and committed it. Uranium the committed value, but a change. So it's non-repeatable and that's my not be desirable. Obviously my get Phantoms as well. Repeatable read. You're not gonna get dirty reads because we're eating only committed staff by an edition of that, we're only reading committed stuff at the beginning of transactions so you don't get the repeat, you're not gonna get a non-repeatable reads. All of yours are repeatable because you're reading the same version of the database at the start of the transaction, or that the database will force you to places a lock and then will only read the stuff at the beginning. Okay? So that's essentially a non-repeatable reads. And, but Phantom still can occur as a result because you cannot control renewables. You can only controls what? You can only control rows that you can lock, right? And this is debatable with versioning. You actually can get away with it, you write this. So this table is not entirely accurate. With if you implemented versioning with repeatable reads, you can actually, you might get away with repeatable reads, right? If you implement a burgeoning, I think rocks DB, the Facebook data storage engine, and thermostat versioning, serializable, best thing ever. Gonna save us there. But if you serialize your transaction, you're not gonna get any of that stuff, right? You are guaranteed that nobody touches anything unless they have a lock, right? So you start with transaction, you place the shared lock, right? And if someone want to diminish exclusive lock, they will get, they will get an error because there are shared logged there or read blogs. So and we talk about shared lock and read locks. I'm going to reference the video there. 5. Consistency: Alright, consistency guys. Alright, so I really needed to talk about atomicity and isolation, forced to talk about consistency, right? Although consistencies, The second property or an acid, right? And there isn't as atomicity and isolation leads to consistency in my opinion. And the way I see it, there are two types of consistency that way I see it here. And the first one is consistency in your data. And the second one is consistency in reading the data. And let's talk about each one of them and, and, and let's talk about what, which one of those, the NoSQL guys came and tried to improve upon and try to read candy relaxed here. Consistency in data. Because this is data is essentially is defined by the user as something that the user defines. And their table schema says, hey, this view and this view should be consistent. Like say the some of the money in the stable should equal the sum of all the balances or the number of likes. And this, the picture should number, it should equal the number of the users were actually like that picture. So this is a consistent view that the user-defined. Usually it's enforced by referential integrity like foreign keys and primary keys. So you ensure consistency in your data. And it's also ensured by atomicity and isolation. We saw that we were number when we got the blue screen of death and the metal right, we just lost my data, right, just went away. And I got an inconsistent balance sheet. As a result, we got nine hundred and five hundred, where a 100 just went away and the thin air. Okay, so a total lack of atomicity leads to inconsistency in your data. Okay? Isolation. Do I need to say anything? We saw that stuff. We're either isolation rank, giving all this false reports as a result of lack of isolation, being the moment people are changing my product sales. While I'm reading this data, I'm going to get a bad result. I'm gonna get an inconsistent result. Although my product says the sum is 130, I can see it. I can sum it, but the actual total is saying 155. That's bad. Okay. So a lack of isolation leads to inconsistency. It's up to you as a user if you can handle if you're okay with that inconsistency or not. Sometimes it leads to corruption inconsistency. If you're not happy with that, like bank transaction, you cannot probably be happy with that. Product sales. Probably not, that's bad, right? Textures and likes. We're going to show that example. So like this is an example, right? Where consistency in my data is really not a big deal. Like we have two pictures here. There's a blob of that picture. Say this is an Instagram implementation. We have a blob of the picture and we have the number of likes. Wholesale might say, oh, say Why are you adding a field called likes? What, what is that? Ok, why don't you just query this table and just get the number of lights. Performance is the best. Short answer for this, right? So you would add a likes field here which will contain the total number of lives in this picture that this picture got. And that is the another table called picture like, which includes like, Hey, John liked the picture number one, Edman lac region number one. John likes region number two. So if you sum this right, picture number, I've got two likes from John and Edom and that's correct. And then John picture to God one like this one from John. That's a consistent view. But if you saw like for example, this is four and this view doesn't represented, that's an inconsistent view. That's up to you if you are happy with that inconsistency or not. And that's a very critical question because based on that, you can adjust the performance, you can adjust the consistency, you can adjust scalability. You can just isolation, right? Based on what can you give up as an engineer, as a software engineer, you really guys need to think about every single aspect of that. What can you give up? That's why you have to understand the requirements. And then ask yourself, what can you give up? You're telling me that Instagram, Kylie Jenner gets like 5 million likes on a picture, five million and thirty-two likes. If I clicked on that and view the 5million, Are you telling me that you're going to match ie betting everything you want, there are no matching. This, this is impossible to get Incas consistent view on this. And they don't care because nobody is going to have Federal through 5 million users anyway. So they can give you an approximate number and they can be off. Those two views can be off by even a 100 thousand. Youtube is the same subscribers, right? Do you, why do you think YouTube is showing you like my subscribers 6K or six. Okay. Because they cannot guarantee you that I have exactly 6,032 subscribers, ok, or 23, or even purifier. And he has like one no, almost no, 95 millions of drivers. That number is not exactly accurate to the actual number of subscribers that he actually got. It's an approximation because they cannot, they decided it's not important to maintain the consistency or this thing because, why? First, because nobody's going through 99 million subscribers. So they are referring performance over this consistency, sorry. So by going to a tangent, but as critical to this, let's talk about the other costs concept of this consistent in data. Sometimes it matters, sometimes it doesn't. Consistently consistency and reads. Same thing. Sometimes it doesn't, sometimes the matter. So we'll think about that. No consistency and reads is very interesting. And what it means is like, if I update something for transaction update something to the database, like say update a value x and that x gets persistent and then another transaction's done reading that a better get the value x. Hussein, What are you saying? That's obviously TAF rekindle, right? Obviously. Das to get the same value. But not necessarily. We're gonna talk about the concept of eventual consciousness xy here. So that's an consistently read, if I committed something, can new transaction see it immediately. Okay. And here's the thing. This type of consistency is an available on both relational and NOSQL database is a big statement here. Okay, both databases suffers from this consistency. Including Oracle, including Postgres, including my SQL. Those areas are not consistent in reads. And let me explain here. When you have a one server and you're committed something to it and you read from that one server, life has gone, right. Life is perfect. Because you have one frequency server. But the moment you start adding other serverless, Greg, this is paused or MySQL ramp. What you're gonna do essentially is you're going to have one server unused, creating a replica of that server. And that is even when we are applique R2 where applicants is gonna start pumping data to the replica. Right? Now, obviously you need to add multiple replicas for horizontal scalability, right? You cannot serve 7 million people of one database. Database. Youtube started doing that with one database and they scaled it up to multiple. And they implemented now fit tasks, which is this new fancy stuff on top of my sequel that gives you like sharding on the fly. Sharding and our staff we're gonna talk about in another video. But essentially remote, you break things up into these replicas, into this essentially the other databases, right? The secondary databases and the follower, leader or follower nodes right now. But when you start doing that, you are inconsistent, sir. You will become inconsistent because you write to the primary node and someone else read from the secondary node, the secondary node takes time to get the value propagated. There is networking going on, there is delays, latency until the secondary naught gets the new value. So you're gonna get an old value, my friend, you're gonna get an old value, right? And that is inconsistent. So this problem, and a lot of people get this wrong inconsistency. So the relational database is inconsistent in reads, right? But the moment you start breaking them into horizontal scalability and multiple servers, you're gonna get inconsistency, right? So they are consistent when there is one big, nice server, right? The moment you break now they are inconsistent reads, you're going to read it, but someone who get to read an old value, and it's up to you now, as an engineer, are you happy with this old value? If you slightly older, subscribe or count. Okay. That's completely fine. If your video, the latest view and and that's what YouTube does right there, money servers and all these rights going to multiple databases. And then they eventually sink back into one big server, one big database. And then if you read, if I'm reading that view versus someone from Germany versus some are from Japan reading that same B, you're going to get different results because we're seeing different replicas were reading from different follower nodes. And that's okay, right? Sometimes that's OK, sometimes it's not. So now there are a bunch of people who say, You know what, we don't want to enforce that thing is all the stuff you guys are doing silly. This acid thing is silly. And the fact that scaling relational databases has been always hard, right? Because was designed in the seventies to be a one big beefy machine, right? One big beefy server. The database is there and that's it. Alright, and this era, those guys know what, I'm going to give up consistency because you guys are already don't have consistency, right? I'm gonna relax a little bit of this four properties. And I'm going to give you a better scalability and performance. So I'm going to scale horizontally, going to add a bunch of other servers on the size, and then just start sharding doing all these things and distributed manner. Ok, so that's essentially what they did. And those Thus the no SQL databases came into picture, okay, so they give up consistency or, or on favor of scalability, which the relational databases are really, really hard to do, right? Raise your databases whether you have to implement the follower node and leader node in this start, replicating, pumping changes. And the SQL one gives you a scalability. You can add just nodes and there will scale nicely. But if you are looking for isolation and all that fancy stuff, you're not gonna get it. Obviously. Gonna get all these properties of this u, you will get this new concept called eventual consistency, which is to me as just a marketing term, okay, eventually everything is eventually consistent Tao. So it's like if you, if you read the value, the old value, if you wait a little bit, you're going to get the latest value essentially, right? So to me, a visual consistency both relational databases and non-relational databases suffer from this, gave enjoy consistency. 6. Durability: Durability committed transactions must be persisted in a durable, non-volatile storage. So as y means, if I were right and committed transaction, it better stay there when it's when I when I lost power, I recommended a transaction I lost power and that it was shut down. If I restarted the value better be that. That's what they're built. Durable myth, right? So read as for example, is not a durable database. It's an in-memory database, Memcached D as an in memory database, it's not durable and it's up to you. If you're caching stuff, you're doing doing stuff for caching, you might really don't care about being durable, right? And so a lot of databases talk these for property and just starting hacking one of them out and then creating a brand new database or of air. So red is toke the durability out. And the greater this in-memory database thing they have, I think persistence. I didn't work very closely with the artist, but the in-memory, they have persistence model. I think Cassandra and MongoDB decided to say, you know, I don't need consistency. I gotta give you consistency when you write your query, right? But we don't use consistent and you don't have a schema to begin with. There is a lot of things that God relaxed, not only just the consistency part of it, the snow schema nor referential integrity. You don't get indexes, secondary indexes, all that stuff, you don't get any of that. So really, that's probably another topic now, sequel versus ego by guys. Long video. You made it. Thank you so much for watching until the end. Alright guys, I'm gonna see you into the next one. You guys are awesome. 7. Eventual Consistency: Consistency is the property of having Expected results in the view of the data or during reads while working with a database system. It is one of the acid properties in relational database systems. Eventual consistency is another term that was born recently, specifically as a no SQL databases started to emerge. In this video, we will discuss and talk about the different kind of consistencies. And we will explain what eventual consistency mean, how both actually relational databases and NoSQL databases suffer from this eventual consistency. Something that is not very clear of. There are a lot of people think that eventual consistency or something for just no SQL, right? But that's not entirely correct. If you wanna find out how that is the case, stay tuned. There's agenda, hers or we're gonna talk about today. In order to talk about eventual consistency, I really need to talk about consistency. And I have two types of consistency here to discuss. And it's very critical to disambiguate this, these two types of consistencies. Okay? And now we're gonna talk about eventual consistency but less. Let's learn about consistency first. So you can have consistency in your data or consistency during the reads. And these are two separate concert that we really need to understand. I'm going to talk about each of them now, consistency in the data specifically, let's take an example here. And it emerges when you have multiple views and representation of your data, especially with multiple tables, foreign keys joins, that kind of consistency emerges when you have these multiple collections of data, okay, Deborah represents kind of the same unit when you have essentially a normalized view of your data. That's saying an example. Let's say this is Instagram. I have a table called Pictures. I have an idea, the blob and the total number of likes that this picture received. And there is another picture, another table that actually list all the users who liked what pictures, mixins. So theoretically these two views present the same thing which is a picture and users ramped. But in reality, those two better be consistent, right? That means if I have picture one, got two likes them and if I go to the other table, there must be two records. Okay, so that's a consistent view of the data itself. And there are a lot of things that ensure this kind of consistency, especially when you're updating multiple things and database, relational databases are great, or this, especially with atomicity and isolation. And it makes sure that you're touching as much as stable as possible. And it's going to ensure that consistency for you, okay? And the data, let's talk more about that load bit consistency in the data as defined by the user. I split those two views into to multiple tables, the pictures and the user like balance sheets and the sales, all the stuff you split it up and new normalize your database the way you want. But it's essentially defined by you as a user and the user years, essentially whoever designed the database, right. You get referential integrity is you'd get foreign keys that represent those tables. You delete one, you delete the picture, right? And the bank example, if I delete this picture, all those records will go away, right? Especially if you have cascades that are new foreign keys and all these things that ensure your consistency in your data and that view, the simple view, narrow view. Atomicity and isolation are two properties of the database. Relational database that ensures if you think about it, is consistent. And when we talked about this, but essentially I chose atomicity is like all or one. If you have a transaction with multiple queries touching multiple tables, all of those should fail, right? If one of the veil, okay, and if all of them succeeds, that gets committed. No SQL databases, most of them do not have that atomicity on different views. They have atomicity on a different set of collection. If you're uploading a collection, that's all you get, fewer touching other stuff you cannot really ensure atomicity and isolation is the abilities like fire, minimum flight transaction can changes to other stuff being seen by me like other transactions are executing transactions or changing. Can I see these changes at the same time? And that obviously leads to inconsistent reads, inconsistent results. We talked about that in the other video as well, right? Different INR isolation, repeatable read and an uncommitted and all that stuff serialization. Let's talk about the second type, the 3D consistency. If I have a database here and I say, hey, database update this field started to feel to value x, OK. And then if I do that, that value will be updated. And consistency and read is nothing but if a transaction now comes and try to read the value, the new value, That's what it is. Consistency in reads me. Now we would say Hussein, What are you talking about? Of course, that makes ends and the fire updated that value. Of course, a new transaction will pick it Arab 20 token about this style of consistency. Relational databases and NoSQL databases suffer from this, right? We cannot guarantee this. If we have one database, life is dandy, right? If you have one server, everything is nice and easy. Relational databases the best if you have one single server, but that's reality. You never have one server. If you have like a million user, you need to scale lots of reads, lots of updates are going to add caches. You then add nodes, follower nodes you're going to become when you have a one leader node and one multiple follower nodes. And, and those follower nodes will take on your reads and we'll have one reader or writer node, right? Let's take an example and talk about this eventual consistency concept, right? And the reads essentially. Ok, so now let's say I have this master node horror, the leader node, and then you have to Follower nodes here. Okay? That is essentially just getting whatever the master node or the leader not get's, it will use. They follow the values and will get propagated. Those values will get propagated to those follower nodes. And we point the readers to this database so we can offload this huge number of these million users to these databases like non-permanent behind load balancer or something, and then just offload these huge number of reads. Ok, sounds good. Let's say I want to update the values E. Now, this is a consistent view. All of these databases are right, have a value of z. I am going to update it x, right? Okay, so sweet. My leader node now have a value of x. But guess what? At the same time some read, someone's started to read that value, but they got what? They got z. So now this system is inconsistent to me, right? It's not consistent. Despite it being relational database. This is a Postgres database, right? Three postgres instances, it's not consistent. It is eventually consistent. And we're going to talk about that now. So we read that we got an old value. We just broke our rule, right? We said, when I write something and I another transaction, read it a better get the latest value, it did not. And that is why we are inconsistent here. In this case. This poor slob got our old value. That maybe that's okay. But I'm just saying now, with time x or, or the leader node will start pushing the new value to the follower node. This guy will push the new value and now an consist in new reads will start reading the new value. So that's what eventually consistent chin system system is. And this is, this is a problem that is in both nosy and relational. So eventual consistency is not something for NoSQL databases guys. Okay, so we need to really clarify that. And first, what do you mean when you say consistency, right? What do you mean by a consistent system? Are you talking about the reads are already taken above the actual view of the data is that consistency, that data itself or it may different tables because that's great. Yeah, that update my touch seven tables in an atomic manner, in an isolated manner. And that view is consistent in this database. But the whole system to the user. It is not consistent because you introduce those new hubs. The same thing with few introduced it a cache, like a writers or Memcached D, right? The moment you introduce a cache, you're inconsistent. Whenever you have a cache, you're inconsistent because you will remain inconsistent unless you update the cash, okay? Once your data is at two places, you're inconsistent, you will become eventually consistent. Which is, this, is this just a marketing term to me? Alright? Yeah, of course this is eventual consistency, or in this case, yeah. And eventually these values, you will get it, but it's up to you now as a user, as a software engineer, can you tolerate that? Is that okay? Is that okay if the user read a value and they got seven thousand, seven thousand likes, and instead of 17,011 likes. Right on Kylie Jenner's latest picture. Okay. Maybe not. I don't care right now. I don't care. If that egg picture God, 3,002,000 thousand views that even a number right on a picture versus the second person. Actually God like, I don't know another, an extra 50 thousand likes, extra. It doesn't matter, right? Sometimes it doesn't matter. Sometimes it does matter if I updated my balance, right? If I deposited $1000, I better see that $1000. If I drew some. If I say, if I withdrew some money, right? And let's say at the same time I did double withdrawal, double-spend, and I end up with $2 thousand. Why I have only $1000 in my account. That's an ecosystem. It's good for me because I was able to trick the bank and drove $2 thousand while I have only 11000, right? Because of this inconsistency at, that's something bad, right? So that's eventual consistency in Ach. Alright, let's talk more about the consistency and reads. If a transaction committed that change will endure transaction immediately see that change. That's what that's what a consistency and reads me to different than inconsistency in data. I don't know if there are times I made those two terms up. Ok. This is the royal thank, I don't know if they have an actual scientific term. Right. But what, what do you think about it? That's the only thing I can think about it. This is a data consistency and there is read consistency. And you have to clarify that to differences, okay? It's a problem in both SQL, NoSQL, and relational, right? As long as you want to scale horizontally introduced casually write with, with no SQL databases and there's and if you have sharpening, that's a different story when Hogan talk about sharding and this video, right? But that's in a nutshell. The moment is to introduce caching, or you try to scale horizontally by adding some follower nodes, you are inconsistent. That's eventual consistency and an actual dies. So summary, what did we talk about? We talked about consistency. We talked about the different two types of consistency, right? We're talking about consistency in the data. We talked about consistency in reads, right? We talked about thou, those two differ, right? Consistency in the data is guaranteed by these property that is called the atomicity and isolation and even durability if you think about it, right? So it's like if you, if I commit and I again had better be there, right? If it's not there, then it's not durable than it's not consistent. Eventual consistency is the ability to further reads only, right? This is a point I forgot to mention. If you screw up your atomic transaction and then you have seven tables, they have to stay consistent together, right? And you updated three and you got blue screen of death. There is no eventual consistency coming out of that dude, you just screwed, you corrupted your database. There is no eventual consistency in that. If you do not have atomicity, you do not have eventual consistency. You don't have consistency at all. Consistency in read is different than consistency in the data isn't radio. Sure, I'm going to commit that eventually that the follower node, if it's offline gonna come online again to pick up the new data. That's okay. We know that it's going to be because we know that this side, we guaranteed that consistency at that leader node. But the moment you screw up, you don't have isolation. You don't have proper atomicity, you don't have durability. No eventual consistency can actually help you hear my friend arrives, you just your data is done. You have corrupted inconsistent data. If you have inconsistent data, there is no point of doing an inconsistent read because your view will always be inconsistent no matter how long you waited. And I'm gonna see you on the next one. You guys stay awesome. 8. Class Project - ACID by Example with PostgreSQL: Hey guys, what's going on? So we have talked about acid, the four properties in a theoretical manner, but it's time to put them in practice and explain those four properties in an actual database setting. So I'm going to pick both Chris, but you can do the same exercise with any database you want. For simplicity, I would love to use Docker because this way I can try out many, many databases with just one command and I have spin up another database just like that. So let's go ahead and spin up a database on here. You know what, I've gotta keep this up because I'm going to need the two terminals, but I'm going to spin up a new one right here. And let's do a spin. Postgres container. Gonna do docker run. And let's call it darker road. Let's give it a name, dash, dash names. Gotta call it acid. Scott PG acid. And we'll get to do, we don't really need to expose the port or anything because we're going to really go into the container itself, bash into the container. The next command is, let's go ahead and detach that. And then let's pull up postgres 13. That's the latest thing. Before we run that actually we forgot something very important, which is dash E, postgres, password equal pulse, Chris, pulse, Chris user a password as going to go ahead and run that just like that. And it's gonna take a little while on your end to load the postgres image. But once you have it, you're ready to rock and roll. Let's go ahead and, and so so let's make sure that the container is running just like that. Yes, here's my PG acid. It's running the old. Make sure that's running because he might have spilled something in the command or did something bizarrely and something is wrong, right? So we'll go ahead and do docker exec, dash IT and get into the PTA acid and come into we want to execute P SQL because that's the postgres SQL command. And I want to login as dash Q that Postgres user. That just like that, we are in a Postgres command shell. Now we can execute beautiful stuff. We can create a database, we can play with that, create tables. So since I'm in the Postgres database, I have a database called posco, so I'm going to play up with that database right there, right? So let's go ahead and do create a table called products to test atomicity. And I'll go ahead and create table product as do a PID serial. So that's an auto-increment and MySQL I think it's called. And sure, let's make this the primary key. And I think we can just continue doing that stuff, right? And then you align and name as a text. Uhm, what else? Prices double, double or float and postcards I keep for, for getting in different databases. When you work alone many databases you get confused. And the finally, in venturi, how much inventory do we have this product? And then let's just use integer here. And that should be it. Yeah, that's a lot. I thought it's not called double. It's gotta be flawed. I think. I think it's double in MySQL or Oracle acute forgetting. Or now we have a products table. Let's go ahead and create a sale stable so we can produce sales, period table sales of let's go ahead and do a sale ID. That's a serial, also primary key. The primary key, that's the product ID. I can create it as a foreign key but doesn't have to. So that's an integer. Alright, that's a product ID. And to Drew, boom price flow to Illinois and lesson. So that's entering the sales. Could be, a product, could be sold in a different price. But the price of the product is in the product table. This is what was sold that and then the quantity as also an integer. Alright, now we have this two beautiful tables. This test out atomicity. So filled first thing, listen this populate some, some rows on those tables, right? So it was gonna do insert into product. And let's give it a name. I don't know. What are we inserting name at a price and inventory. For simplicity, I'm just gonna do it 11 row here. Let's call this table at tables very conveniently followed, though, you're selling phones. And the price of the phone is 999999. As an expensive phone. And venturi, let's say you have a 100 units and then that's it. That's the only product we have. Okay, so now let's just stout our atomos city. So I mean, did select star for all products. You can see that we have this row, right? But we're gonna do is we're going to make a sale. We are going to sell, let's say ten phones. Like we sold ten folds. What does that mean? Right? The fear walking with Domain Driven Design. That is one past scribe. But it's in the database, it's broken into multiple queries in a single transaction. So SAL is off ten units. That means you have to deduct ten units from the inventory, and then you have to insert a row in the sales table. Save that you haven't sort of that. You cannot do that and to transaction, you have to do it. Atomic transaction. That's where we explained that the video right in the lecture. So let's go ahead and begin a transaction. And then we're going to be the first time is actually what we're gonna do is update products. Alright, what was the phone idea against? Let's just do from products you might want it to query. So process ID number on process product like number one. So we're gonna do is update products. Shut name. I'll break products, sit in Venturi, in venturi equal inventory minus ten, right? Oh, that's an in this transaction. So now if I query that product again, you can see it's 90, right? Very, very critical layer. So now if I go ahead and insert, let's assume I crashed right here. If I crashed right here, what will happen? You just lost ten units from your phones that you didn't really sold. Okay? And that's the idea of atomicity goes, you have to follow it up, right? So like, let's say I crashed right here lies ILO. Just this killed a whole container, right? Just like that, right? If you do that without atomicity, it will be a disaster if you come back to the database. And then you said select star from products and u so that it's a 90. But no, it says it's a 100. Y equals of atomicity. We started that transaction, we did an update, but we clashed. We exhibited pulses in this case simulating a crash, right? That is the idea of atomos of all the transaction have to be done in as, as an atom, right? You cannot split those. Let's do the transaction again and the right way by doing a begin transaction, doing an update to the inventory. And then now I'm going to insert a sale and put into sales of process ID Prize. And the conto. Let's just do cou values. But one, I sold it. I thought, oh, I sold it at 99999 Sam prize. And I sold ten units, right? And you can break it up into five units at this price. On five units at this price will note right? We did that. Now, if I do a query that sales tables, you can see that we have ten beautiful units right here and my products 90. So now if you sum those puppies up, you get a 100. And that's called consistencies. Do we just explained atomicity and consistency and the same concept, right? And this is only viewable in my transaction. So now if I went to a completely different terminal and tried to query, again, I'm not I'm not committing yet, right. And I do like, I don't know, let's do the same thing. Hopefully we got, so it's good there, BG acid. Then if I do select star from products, you can say that's still a 100. And if you query the sales table is nothing. And that's part of isolation. Because now we're, we're in an atomic manner. So these two things, what inserted in that transaction, but we didn't enrolled Mike old commit. The mood I commit. Now we have a nice view and the data and As now flushed and available visible to all transaction. Now if I query today again, you can see that we have a sales. Now. It's all zeros and a one. And if you query the products table, you can see that it's 90. So 100 was to talk a lot here, talking about atomicity. We talked about consistency because if you don't have to miss it, you don't have consistency. And you can't argue with that obviously eyes because you can say, hey, say not, I can, I can be smart and make a single query, right? And I think for the longest time MongoDB, for example, didn't have at Thomas city across documented has only atomicity across a single document. People were fine with that because most of the time they, they will do their data model so that it is a single document and instead of no, say normalized like we do here. All right, let's jump into isolation or a guy. So I went ahead and insert a few rows in the table that you don't have to see the process, I'm going to cut it out. It's a straightforward thing. We sold a couple of units right here, couple of units right here. So now let's test out isolation. And to this toggle isolation guys, I want to do the following, right? So stop isolation guys. I want to do the following. Just with normal transactions. I'm going to start a transaction and my first session right here. And I'm gonna do on all more queries saying select Product and the count of the product. All right. I just want to see how much, how much sales Did I made on for each product from sales and grew by PID? Boom. If I do that, all of a sudden now and this query, I have three sales for or D2 and three cells for protocol, which is great, a good, alright. Now I want to actually print the report that that shows that out. What are this list? Just the whole thing. So in order to do that and you do it, select product, price, maybe you do a join. But we don't, we're not gonna do that quantity from sales, right? But before you can execute that, we've already gonna execute it. Another session starter right here. Okay. And inserted a new sale. Okay? You just inserted a new cell. And you're just doing your own thing right here in generating this large report. Try. But do you reach this stage and, and meanwhile things. So think of these two US AS concurrent application transaction going at the same time. Someone is making a sale, someone who's generating a report. So in the same time, I'm going to generate a new cell. Hey, we just sold a new phone, right? Pid and let's say price and quantity, right? Values. We sold a new phone and price, 999999. Yeah, let's just say we sold ten more phones. And we also go ahead and update the products where cert Quantity Q1 two do not sit inventory, inventory minus ten, where PID equal one because we just, we just made a sale, right? So we need to decrement the way we did is we actually committed. We're done. We've made us all. And we wouldn't back here. We're continuing to genetic data. But again, this is happening on pad on the right, I'm posing and just to show you what this is, now, we've made a sale. And then you printed the record, right? So you printed that you made three root three sales, three cells here, and then you made, and then you actually printed that stuff and guess what guys? This don't match. You said that you've and this happens a lot, right? An actual accompanies what were the actual table doesn't match the counts. And then this happens. If you do this kind of things. We're gonna talk about. This is called isolation. Because of the isolation would not fully isolated here. So when it will happen here is we printed this puppy, right? And we said, okay, we made three phones actually for phones and 23 orbits. So the airbags are correct but the forms are not correct. You just produced an inconsistent view. So inconsistency or consistency in general can happen because of atomicity which we explained. And can also happen because of isolation, which is what I'm going to have a cabal to explain here. So how do I make sure that nobody touches my view when I am beginning my transaction? If I'm starting this transaction, I want you to give me a snapshot, right? All the queries. And I'm gonna do as if at the moment of the where the transaction started. And that's called repeatable read. So let's go ahead and roll back this length. And then we'll do begin a transaction. But we're gonna do isolation level, repeatable read. I think that's how you say it. The default was read committed. That means anything that has been committed on other transaction. I I can see it. And this isolation level marks most of the time, but sometime in this case, you don't want it, you don't want repeatable read or serializable. So let's go ahead and installed in a repeatable read. And now, if I'm gonna make my beautiful query that count, Hey, there are 34, which is just good. So when we made for sales, but like that, bought the skewed the second query to query the sale. But in the meantime, someone started another transaction. We don't care which isolation level at that point when they they inserted another sale. Same Sal Exactly. And they committed. So we just made a sale. Well, but if you query this, notice that it didn't show up on your case here. Why? Because your your isolation level prevents you from seeing stuff that other people's changing. That's a beautiful repeatable. It's expensive. You gotta understand how it worked, right? Because it all looks at them and VCC level and bolts because does it differently, MySQL doesn't differently. Sql Server does it differently. Oracle does it exactly very similar to MySQL? Mysql doesn't very similar to Oracle or you want to be very precise. But that said, now, if you have a consistent view, you made five sales technically on the iPhone. But if you query again, that group, why you only see four go into this others other query, right? If other session, if you do select PID, count, PID for what does that. So we have five units here. But here, no matter how many times you execute that, you're gonna get better repeatable read, you'll read as a repeatable is not going to change. And that's a very attractive feature for a lot of applications right? Now once you commit or roll back, if you execute that query again, USE five because you're out of your transaction. All right guys, that was isolation. And finally, for the fun of it, let's do this as exit right here. And then Docker. I'm going to do this. I'm going to begin a transaction and then I'm going to insert a new product. And sort of let me put to explain durability. And suddenly a product. And I'm going to prepare here to kill the container or the whole, the whole container BG stop, right? But in the exact same second, I am going to insert a new sales and see if this actually going to happen. If I say, if it if it says commit, what will happen here? Name, price, inventory. Yeah, that's just insert a new product. That's called TV. $30 thousand. There is only ten units with this. And I gotta do this and I'm going to commit. But at the same time we're going to kill the container. So immediately we were killed. So now I want to know, did my TV get recorded? That Postgres did told us that it's committed, right? If I do this and a little database doesn't support this by the way, like riotous or does this in memory database, but it does support the persistence, right? But essentially I do what ability isn't means if if you tell me that you committed something, that means if I login again, it better be there. So let's see if it isn't there. Yep. It's right there. Because it told me committed. If he if it didn't return that comment, that means it hasn't been written, right? And durability is a very, you might say, you might think of it as like, okay, of course Hussein, it's given that it has to be fired, right? Something has to be there by noon. Some databases and write to memory first for a performance reason and of the container or the host dies, they forget to flush the disk. You just lost your stuff. But guys, what do you think about this, guys? Thank you so much for watching. How fun, enjoy the rest of the course that I'm going to see you in the next one. You buy.