Database Partitioning | Hussein Nasser | Skillshare

Playback Speed


  • 0.5x
  • 1x (Normal)
  • 1.25x
  • 1.5x
  • 2x

Watch this class and thousands more

Get unlimited access to every class
Taught by industry leaders & working professionals
Topics include illustration, design, photography, and more

Watch this class and thousands more

Get unlimited access to every class
Taught by industry leaders & working professionals
Topics include illustration, design, photography, and more

Lessons in This Class

13 Lessons (40m)
    • 1. Introduction to Database Partitioning

      1:26
    • 2. What is Partitioning

      3:28
    • 3. Vertical vs Horizontal Partitioning

      2:07
    • 4. Partitioning Types

      1:32
    • 5. The Difference Between Partitioning and Sharding

      2:43
    • 6. Class Project - Preparing: Postgres, Database, Table, Indexes

      5:22
    • 7. Class Project - Execute Multiple Queries on the Table

      1:18
    • 8. Class Project - Create and Attach Partitioned Tables

      3:46
    • 9. Class Project - Populate the Partitions and Create Indexes

      3:59
    • 10. Class Project - Querying and Checking the Size of Partitions

      5:32
    • 11. The Advantages of Partitioning

      4:41
    • 12. The Disadvantages of Partitioning

      2:48
    • 13. Class Summary

      1:10
  • --
  • Beginner level
  • Intermediate level
  • Advanced level
  • All levels
  • Beg/Int level
  • Int/Adv level

Community Generated

The level is determined by a majority opinion of students who have reviewed this class. The teacher's recommendation is shown until at least 5 student responses are collected.

36

Students

--

Projects

About This Class

Database partitioning is a technique that breaks down large tables into smaller tables called partitions. This allows clients to hit fewer partitions and therefore, minimizes the search space and optimizes the queries as a result.

In this class, we will explain Database Partitioning in detail, the different types of partitioning, its pros and cons and finally we will do a demo using PostgreSQL which will be your project for this class. 

Meet Your Teacher

Teacher Profile Image

Hussein Nasser

Author, Software Engineer

Teacher

My name is Hussein and I’m a software engineer. Ever since my uncle gave me my first programming book in 1998 (Learn programming with Visual Basic 2) I discovered that software is my passion. I started my blog, and YouTube channel as an outlet to talk about software.

Using software to solve interesting problems is one of the fascinating things I really enjoy. Feel free to contact me on my social media channels to tell your software story, ask questions or share interesting problems. I would love to hear it!

I also specialize in the field of geographic information systems (or GIS). I helped many organizations in different countries implement the GIS technology and wrote custom apps to fit their use cases and streamline their workflows since 2005. I wrote fiv... See full profile

Class Ratings

Expectations Met?
  • Exceeded!
    0%
  • Yes
    0%
  • Somewhat
    0%
  • Not really
    0%
Reviews Archive

In October 2018, we updated our review system to improve the way we collect feedback. Below are the reviews written before that update.

Why Join Skillshare?

Take award-winning Skillshare Original Classes

Each class has short lessons, hands-on projects

Your membership supports Skillshare teachers

Learn From Anywhere

Take classes on the go with the Skillshare app. Stream or download to watch on the plane, the subway, or wherever you learn best.

Transcripts

1. Introduction to Database Partitioning: Partitioning or database partitioning is a technique where you split up a huge table into multiple tables and let the database decide which table or which partition to head based on the whereClause and is a very, very genius design. And I'm going to discuss it in this video about we jump into a garden. So here's we're going to discuss in this particular video, guys. So what is partitioning? Why does it doesn't exist? Because that's the question we always ask. Why does any technology exist? Alright, so we're going to ask what is, what is partnership that we're gonna talk about? Horizontal partitioning and vertical partitioning guys. So that's very, very critical concept, right? And what was the difference between them? And from now on, whenever I say partition, I'm always referring to horizontal partitioning and that people rarely use vertical poaching these days. There's just, it's like normalization, almost vertical Berkshire, bought stream targets. What kinda types of partitioning we have? What can, what can we do to do partitioning, different kind of partitioning. And we're going to describe partitioning versus sharding, which is another video I discussed that specific to shorting, right? A little bit more complex than partitioning. I'm going to explain the difference between what we're gonna do a demo because as a beautiful thing, gonna use Postgres to do a demo, to show you a table with 10 million rows and how to partition that stuff. Let me discuss Finally the pros and cons and then summarize the whole thing. 2. What is Partitioning: All right, so what is partitioning guys? So let's say we have a beautiful customer stable. Yeah, this is like, I don't know, 1 million role of day 1 million is not that bad guys by the way. But if I want to pick the name of a customer with this particularly 701, right? So we're going to select name from customer tables where ID equals 700,001. So the, what the database will do is it, well, if you have an index on the, on that table, we're gonna use the index and then we're going to land on that particular row on that disk, right? Or if we don't have it, I'm gonna do a sequential scan, which is the worst thing we can do, but we have to scan all these rows until we find that always we will look in front regardless whether using an index or sequential scan. Scanning a 100 million row is a lot, right? And multiply that by ten AM, or 1000 billion rows, that's going to slow things down. Index large tables are always slow to query regardless, right? So what do we do? Partitioning is the idea here is break the table down to smaller pieces. So we know we only work with so many rows in instead of larger set of rows. So that's one idea. Let's break up this 1 million rows into five partitions, five tables partitioned. Don't often want a table and we're going to name them customers to a 100 K customers, 400 K customer, 600 K customers, 800 can customers 11 million. That means basically, these are the customers between 1200, these are the customer is willing to 101400 customers between 401600 and so on. You get the idea, right? So that's just how we partitioned it. And then the database magically, you're going to attach these partition to the, to the, to the main table, which is Customers table. So Customers table is just almost like a parent or inherited doesn't have any data that actual data exist on these stable. So now what I do is I'm going to do the same exact way. Select name from Customers table where ID equals 701. And the first question is, I'm going to send that query. What do we do? Where does this idea exists? And that's a very quick check by the database was like 701. Which table should I query? This, this, this, this, or does. It's very easy because what bothers you is customer has 700100 on one end. We have the definitions in these partition. There is metadata. That's really where 07 has between 801 million, which is which is six hundred and eight hundred. So that's exactly the table we need to query. So partitioning customers 800 K, we hit that table immediately. So now i'm working only with 200 thousand rows, which is beautiful. I all say I tweet the other day. The quickest way to query a table with a billion rows is to avoid querying a table with a billion rows, then that's what the deer is. Do they just throw it to as much as possible to avoid querying this much data and work with smaller set of data. That's the trick here. Alright, so now once I have this, if I have an index even better, right? I'm going to use the index and then land on customer seven hundred, ten hundred and one, which is k m in this case. 3. Vertical vs Horizontal Partitioning: Beautiful. So that's partitioning. So as discussed, vertical versus horizontal partitioning. Horizontal partitioning is splitting the partitions by rows, as we discussed in this case, right? Customers from this rose to this, this ID to this ID is, and this watershed and other customers. And from this ID to this ID is this word show. We were splitting by rows like almost like horizontally is, imagine this is like a knife, right? Slicing in the middle is biasing that table in the middle and then breaking their partitions into multiple ones. That's, that's the, the partition and there's types of partition like y ranges like we did, or buy less zip code or 90 to five is on this stable partition. Zip code 91111 is on this table, right? So buy less or by range, vertical partition on the other hand, is you splitting literally vertically from top to bottom. And what do you do is use it essentially you split columns within entire junk of roles into an complete different partition. This is useful for like, let's say you have a large column that is a blob and you rarely query that Bob document, right? You don't want that blob token to take from your precious SSD space or the fast axis, right, that you have. So we're going to vertically partition it, put it in another table space, put in its own table with it's all documents, and then attach it to your master table. But if you query for that, the database knows, oh, this particle accommodated in another table. So I have to jump into that particular table to pull it. However, my beautiful fast axis columns are fast axis in a proper table space, in a proper SSD drive, and they are coming back, they are not as large, so I I just saved a lot of space in my desk. I just saved also a lot of access speed too. So vertical partitioning. 4. Partitioning Types: Part of the Sharing tab. So that's where we did partitioning range. As example, I gave IDs. You can partition by log day. Let's say you have an IoT device spitting out measurements or something like that. And there's a log date with every single measurement and you go and partition like O 2012 data 20132020 data on each year, bipartisan or each quarter or each month, put it in a given partition. So that's range right, from this date to this date, throw it in this partition. And then the beauty of this like, oh, this, this table, this partition is old and nobody uses anymore to who, who's gonna query 1995 data anymore. So let's move it into a slower archive. There's like, I don't know, running on tape somewhere. So if it's slow, I don't care. So that's Range. Wireless discrete values states, for example, California or rows, or any customer in California goes on this table. This is not range, this is a value. So hey, that partition is for California, people that partnering for Alabama or zip codes or countries in the same vein, right? So that's violence. That's another way of partitioning by hash. Another popular thing, especially Cassandra uses that, right, where we are, uses a hash function to spread to the value. So hash function that gives you a hash, if you've got this hash, go to this table, if you guys had to go to this table. So that's very, very popular thing. I know in the proxies we use IP hash to kinda decide which back into head based on the IP. So that's another idea. 5. The Difference Between Partitioning and Sharding: Horizontal partitioning or just partitioning versus sharding. So we made a video about sharding. Partitioning versus shorter is very interesting. And let's discuss the difference between the two. Horizontal borders as you were just partitioning, splits the Bigtable into multiple tables in the same database. And the database takes care of the management of these partition. You as a client, as a query builder, you have no idea what partition you are hitting the database is designing. So that's why I said the client is agnostic. It doesn't care which partition is going to hit the database. I'm sending my own beautiful query and the day was decides which partition to head based on the logic on the whereClause, right? So that's the trek here. Sharding, on the other hand, splits the Bigtable and also multiple tables. However, they are completely different servers. They are a completely different database servers. And you do this for, for distributed, essentially processing, right? So, okay, so let's throw this into different tables right altogether, and different servers altogether. So that like basically for latency issues, like I say, California customers, we're going to put a database Shard just for California customers. And you're going ahead. If you're from California, we're gonna hit you based on a proxy SQL. We're going to detect where IP address from and we're going to serve you to the database, or if it's a read, for example, to the database instance that has the California table, right? Versus if you're going from Asia, we're gonna hit you then Asian server, right? So that's the IP address is different, completely different IP address. So that's a beautiful, beautiful design. Obviously it's a little bit complicated, but, but yeah, so if you think about it, the client is aware of the shot, right. Which is a big no-no and charting, that's that's one of the big thing that I don't like about Charlene. And I know the tass and other technologies are trying to solve this, but still it's a, it's a, it's a big it's a big investment for anyone as back engineers. So the table names, this is like a little bit of just detail. The horizontal partitioning table name changes. That backend. You have like multiple table names and they are changed, right? However, in the sharp incase, this type of table manners stays the same almost right? Because the query is exactly identical, doesn't change, right? But the server to which you are executing the query against actually changes. 6. Class Project - Preparing: Postgres, Database, Table, Indexes: Nemo, let's jump into this guy. So what we're gonna do here guys, we're going to use Docker. I'm going to spin up a beautiful postgres instance, and then we're going to insert that until million rows in that table. And then we'll get our own create table partitions. And then we'll get at joined the politicians to the master table and then do some fancy queries, explained all that jazz, right? How about we jump into a guy's? All right guys, let's just jump into the demo DIE. So requirement here, just have Docker. And that's it because I'm going to spin up a postgres instance because I play with this is way, way easier than installing pulse because on my own machine, but if you have it, nevermind. So once you have that, I'm going to actually run at Docker command here, right? And I don't need to expose the port for positive because I'm going to bash into the container inside Postgres and I'm gonna run the SQL commands inside postgres container. So I'm gonna do a docker one. And then I'll have to do is that basically do dash, dash name, give it a name, let's call it a PG, mean, whatever, right? That's the name of the container. And then once I have that, I want to detach this container, all right, because I want to use the same terminal. And then finally, do an environment variable, postgres password, that's an acquirement recent one. I just give it a postgres password, any, any password. And then finally just do Postgres. This is the image we're gonna pull from. And then just like that, we spun up postgres instance. So if I do docker ps, we can have that. We have a Postgres and still running, and this is my name. So now we're gonna do is go Docker, execute interactive terminal. I want to jump into PG main inside it, right? I want to run the command bash, right? And then once we bash into the container, I wanna do PC coal, dash Q, which is the username and Postgres. And just like that, since I am the owner of this, is going to lead me in immediately. So now I'm in, I'm in the beautiful Postgres command, right? I can do stuff, right? Alright. So I'm just gonna do, I'm gonna create a table guys. I'm gonna call this create table grades. This is just a bunch of student grades. And I'm going to I'm not going to maintain names of the students or anything. This is just a garbage table where we have literally an ID I sequence ID that just 1-2-3, 4-5-6, and a grade. That's it. Nothing else. So I'm going to great. I'm going to call it grades original, right? And then I am going to get given an ID cereal, that means every row I insert will automatically get a sequence and increment. And I'm gonna make it no, no, this is a requirement for partitioning. Everything should be not null, right? Everything that your partition Against should be not known. And then I'm gonna do a g, which is the grade itself g. And that's also an integer, not know. And just like that, ready? So now I'm going to insert around 10 million rows in this table and thus extremely easy to do. So we have the loser insert into grades. I'm going to insert that before I add an index to the G. Just because it's faster this way, right? Because I don't have to maintain the index while I'm in certainly this stuff. So I'm going to insert this select star, OnStar, select. Here's what I'm gonna do. I'm gonna floor a random function, which are the random function gives me fractions like 0.09.02.09. That's the maximum number. So I'm going to multiply the random by 100. That gives me values from 0 to 99. And that's exactly the grades right? Out of a 100. Essentially. Note quite 100. No, I'm not gonna get a 100, but who cares, right? So and then I'm gonna floor the whole thing because I'm gonna get the fraction. The Florida will give me the function. So now selecting this from generate series. This is a function that you basically will every time you call you, you specify how many time you want to call it. And I want to call it 10 million times. And I just added this comma so I can't see for myself so that I actually like that. And then let's just insert and excess weight or bad, bad, bad, bad. Okay. So what I'm gonna do, I forgot to specify the column, so insert into, so I have to do, gee, I guess I didn't specify the field itself. That's my bad. So what will happen here is the ID you don't have to specify because it's, it will be auto-generated by the G. You have to specify it and the value will go to the G, These values. And just like that, we have a million rows. Obviously we can, we can generate more, but just, this is just for example, say, nice. So now let's create an index on that. Create index, let's call it grades original index on grades. Grades original 4j on G. Those are all create an index so that quite a beautiful index. And we talked about indexing guys. And I said we have an index. So let's describe the table we have here. Boom, just like that, we have an id, we have a G as. 7. Class Project - Execute Multiple Queries on the Table: So now if I do select count star from grades original, where where, where g is equal 30, right? How many rows we have around 10 thousand rows that are called 30. And then let's do this fancy thing and as well as go scald explain and allies, which will explain how they came up with this result. And it will tell you how long it took. So it took two seconds to execute this query. So it was 1000 milliseconds. It took like a 0.97 milliseconds to planet. And we use bitmap index scan. That means we picked, we hand-pick the values from the index and we went to the table, right? Sometimes this works, sometimes this actually becomes a slot, right? And this is the number of rows that came back. So now if I do where g between 30 and, let's say 35, that's kinda different query, right? It's a range query. So there's gonna be a slow bus, slower. And as we see as like it took three seconds to execute, instill the database, decide to do with parallel index scan this time and then jumped back to the index and God calculated their values. So that's what we see here, guys. 8. Class Project - Create and Attach Partitioned Tables : So although we do partitioning, to do partitioning guys, we're going to create the main table, right? Which is called table is called a grades parts. Right? And here's the trick guys. It's the same thing. Serial, not know. And then the grade integer not null. So, but you have to say partition by range, and which is the range in this case, the G will be the range here, right? We didn't define the range, but you have to create this terrible with partition in mind. And just like that we created it. Now you're responsible to create all the ranges, all the partition yourself, that they're not gonna do this for you. So I'm gonna create the partition one by one now. And I'm going to divide the partition as g, which is the, just like the name of the table and then 035. So grades from 0 to 35 goes into this table. How do I do them? Create table g. And I'm gonna make this exactly like the table above that, right? So like the grades, parts table, and I'm going to include indexes. I don't have to do that because there are no indexes in this table. Remember, what I'm gonna do is I'm gonna add indexes after the fact. And so, okay, so now we have g 035 as just a normal table to, for now. I just want to copy the same exact structure. So if I describe this table, we have ID and g, and there is no indexes or anything. So let's go ahead and create the rest of the tables. Did it deter from 35 to 60, let's say. And then we're going to create from 60 to 80, and then finally, from 80 to a 100. These are the 44 partitions, in this case, only four partitions. And now what I'm gonna do, I am going to attach the partitions on my main, main one-by-one to my major table. And to do that I do alter table. Which table? Of course, the great spots that the partition table, right? Attach partition G 0035, which is stable for values, which values are this allowed for this table? Four values from 0 to 35. That's what I'm supposed to do, right? And I have to do the same thing for the rest of the partitions, which is from 35 to 30 to 60, right? And we're going to change this guy to 35 to 60. Boom. And then we're gonna do the same thing from 60 to 8060 to 80, boom. And then finally a 160, eighty, two hundred, eighty two hundred. Nice. Now these are partition or touch, but remember guys, this table is empty. It doesn't have anything, it's empty. So now if I, if I describe any of the tables, obviously GAT 100, they still, they say a partition of grades parts four values from 1880 will describe details of this, but there are no indexes in this table nor on the the grade. Let's clear this up grade parts. There are no index here. 9. Class Project - Populate the Partitions and Create Indexes: So now here's what I'm gonna do. I'm going to insert into grades parts, select star from grades orange. So what will this do? It will essentially copy the entire table into this stable, right? But it will insert one row by row. And here's the beautiful part, guys. The beautiful part of on this is every time you insert a row in this table, the database will decide which partition that rogue goes to based on what the value of g. If g is twenty o twenty is between 035, so it goes to a G 0035 and so on. So it's going to distribute our beautiful table just like that. So let's see what, oh, what did I do wrong? Just different syntax. So let's do it again. Boom. I keep doing this clear where I say Yes it, and let's just wait for it. So it's going to insert 10 million rows into this great stable. Nice done. That's pretty fast, right? So now guys, if I do select count star from grades parts, that's the same numbers like ten million and one, right? That's extra, extra one. But if I do select max id, node ID G, the grade from grades, parts of this is going to be whatever 99 is the max RID. However, if I go to the tables now themselves, they are populated by looking at this, the maximum value 0035 is actually 34. Alright, so let's do select count. So you can see from g 0035, this is the first partition. There only this much rows around 10. How much is this? Three million, three million, three million rows, right? And if I go to 3560, we have around 2 million and so on. It's just a random distribution. So if I do select MAX G from G, 3560, what do you think the number will be? Ei by 59? Exactly, because that's the largest gray we can get in this table and so on. They are just distributed. So now guys, still, my tables don't have an index, right? And always bar by purpose. So let's go ahead and create an index. And should I create an index on all of the partitions? Well, prior to pause because I believe 11, that was the only way to do it. In Port Postgres 12, I believe they added the ability or 11 on and Bev. You only have to create it on the partition table. Great parts create index. Let's call it the red parts index, right? Index on grades, parts on G. The act of creating it on the master or the leader partition and tabled. We'll create the same index on all the partitions, which is very, very powerful. Let's, let's check this out. So it's going to create that obviously there is no index on the master david That it, because it's empty, there is nothing. It's just a virtual table almost right? So if I do grades ports, there's an index. They say this is your index. But if I go to g 0035, hey, where does isn't hexagon come from? I didn't create that database that for me, which is beautiful. But I go to 35 G 3560. We have indexes on all of them. Beautiful. 10. Class Project - Querying and Checking the Size of Partitions: So now if I do explain, analyze, select count star from grades parts where g equal 30. So now which table is going to? Where are we going ahead, guys? We should hit the first partition, right? Let's, let's check this out. So we says, hey, index can only ON which for the ship G 00 3G index on this partition. So only hit one partition, right? And took us 1 second to query still. Yeah, this is no much compared to what? To the original table, right? The original table. You're gonna see guys, it's the same thing. Husseini, You say that partition are great and we're going to be faster. Not really. You think about it. My Mac here is that 16 RAM, 16 GB of RAM. And the docker container doesn't have a limit. I believe I didn't have a limit. So it's going to use all the memory needs and that is the same identical queries. There is no bounding limit, there is no memory, there is no IO bond. If I limited the container to like F 500 megs, right? And the index was so large such that it doesn't fit in memory, then you're going to start seeing the difference can be hitting the huge index versus hitting the smaller index card. But now you don't see it guys. Supplies. How about we show you the sizes of these indexes now compared to what we have here. So however, I remember the query select relation. Alright, so there's a function called PG relation size. That's a function that gives you the size of the relation and that's an index or a table or anything. And go, oh, ID, that's the object ID. And then I'm going to also select the relation name and from p g class as the table order by exactly the same thing, relation size, OID, descending because I want the biggest ones first. Alright? So you can see this the, our, our original table, Grey's original, and it's around what? That's around, that's bytes, right? So you divide it by ten thousand, a hundred thousand. So you get this and megabyte. So our 362 megabyte, which is not big, right? But at multiple either for instead of ten million, a hundred million, twenty five hundred million, then you can start Z. This is increased. And these are the other partition, that's the, the omega is the rest of my partitions, partition one partition to all the partitions. And these are the sizes, not something that large. This is also 126 megabytes. I've apparently this is 0235 is very popular grade. That's bad. Everybody's failing. So the original index is around 69 megabyte. That's the big large index, 69 megabytes, guys, six. Now I was nothing for my, for my Mac. That's why we didn't see a difference. But increase the number, the size of the Rose will increase the number of the index and DAO will slow things down even more. So. But look at this guys. The individual indexes are way smaller than the larger indexes. So the individual index, which is the 035, right? That's the first partition is only 24 megabytes. So like what, three times smaller. And obviously querying this index is way faster than querying this index, right? The size the and the index matter, where's it DES and the scatter indexes and all that stuff, guys. So a really, really, really matter guys, this, this little bit of information. And then the final thing I'm going to show here is there's a feature and Postgres. I want you to make sure that it's on. And it's called. Let me get it right here. It's called enable partition pruning, right? Let's try, let's see what's the value of that enable part to shun pruning. The value of point whenever watershed and pruning is on and you want this ONE guys by default, if it's off. Well, how about we are showing you what happens if this is off. Alright, so let's set this to off. And this is off. And I do explain, analyze. We don't really need to analyze, just explain select count, star from grades parts, the partition table where g equal 30. So wish to partition this is supposed to hit partition one. But nope, look at this stuff. It head partition one and hip partition to partition three and head B24. And I have no idea why do we have this option to begin with? Maybe soccer safeguard somewhere and Postgres where this is dangerous. So if this is off, you just wasted, this is the whole partition is useless because you had, you hit all the four indexes, all the fourth partitions. Compare this to set it to on, execute the same query again. And look at this. We only had the one partition. So make sure that this, this thing, that partition pruning is always enabled. Otherwise you will not take advantage of this thing, guys. 11. The Advantages of Partitioning: So what is the pros and cons of partitioning? Is this thing perfect? Obviously, nothing is perfect. There's always, everything is a double edge sword when it comes to back and engineering. And we need to understand this thing as, as back end engineers, right? So the pros of partnering is beautiful because it improves the query performance. Yeah, we didn't really notice much of my demo because I'm using a laptop and I don't want to Docker container but improve, increase the size of the tables to the billions. And make so that your memory, your memory bound or IO bound. Because I wasn't either memory bone or IAB on my table and my case right. Because my ADA I had lots of memory to work with. But if you are limited with the memory, you gotta start seeing it because you're going to pay to this and you're going to hit IOs. And Gustav feeling that query versus working with a huge table, right? So performance when working with a single partition or should a little bit fewer partitions is low bid way better actually than walking with a huge single table. So working with a smaller partition is always better. So as long as I know how to hit that partition, that's the key here. So sometimes the database decide on a sequential scan versus a scattered index scan. So guys, that data is always makes decisions. And this is called a planner, right? The player that ever says, Okay, you are anemic wary and there is an index on the table, but I don't have to use the index if the, if you're going to pull every single rows in the table, it's a waste to hit the index actually, because going to the index, you're going to start jumping zone index and going back to the table and going back to the index and then go back to the table, that's actually should be slower sometimes to head the index, right? And it's called scatter index scan. That's slower. So those are the database decide to hit a sequential scan on the table directly says, you know what, it's not worth it to you as the index for that particular query because I'm going to select lots of road and that's a complicated logic. Physiologic by the database includes a lot of AI if you think about it as well, to, to actually decide what to do. So some, based on that, a partition BY giving the partition is actually smaller right? Now it really can make this decision easier because hey, sequential scans actually better in this case then as scatter index scan on a huge table. So deciding which one is, you just made the life database easier using partitions. Beautiful, beautiful, easy bulk loading. So if you have, you can essentially just create a table and attach it to the partition immediately, right to the partition table. Without, without worrying about anything. So you can bulk data load into a given table and then attach that table to an existing partition table, given that it actually the constraint and the Czechs or are all valid and all that stuff, right? You cannot just attach invalid partition, right? So this is, this is a very popular thing. It's really a MySQL. You can actually have a CSV file and create the storage engine because that's only available in MySQL, MariaDB, I believe, have the CSV B, the storage engine for a given table. And just like that, you have a table magically, if you have like a 300 million CSVs, you point your table to that CSV and you have a table that you can query immediately, right? And then you can use that to attach it. It's little bit complicated, but that's one benefit of using MySQL over passcodes, for example, obviously, that doesn't mean my scores better than pulsars. That's just one use case. Archive all data that are barely accessed into cheap storage, right? 2001, data. That partition is no longer used, barely query and you can check this fingering quick. That's actually pretty neat, right? This partition table is barely query, so let's throw it in a table space so that nobody is using on a dy dr. That is pointing to an old hard desk that is slow and cheap, right? And let's use this precious. Our precious is D in beautiful, beautiful a water fast, frequently accessed data. 12. The Disadvantages of Partitioning: Alright, cones is this thing perfect? Knows are nothing as Bob act. So, so was the battling of our partition updates them move rose from one partition to another. Are SHE slow? So now updating roles are no longer straight forward thing guys, think about it in a huge table. If you update a row, you just, you go to that Ron just updated. End of story. But what happened if you update a row in a, in a partition that the act of updating that row moves the row from this partition to another partition. Let's say Customer ID is a bad idea, but let's say grace as example, we gave the grades. I moved a grade from partition was a partition one, and I update the grid to 60. Now you just move the entire row from one partition to another and that, that is a slow, the act of update is now you're deleting and inserting into another table. And that could be very, very slow. So you do this so many times if you're moving on in parts of the data, you can actually harm you or your SSD. You can have caused, especially you know, how jumping from one partition to another so that the pages, if you're doing this so many times, Man, that's could hurt, right? So avoid updates. Their move rose from one partition to another partition. That actually changes the definition of the word partitioners. Inefficient queries could accidentally scan all partition. So yeah, if your queries actually include a where clause that then knows, that lets the database know which partition to head. Everything is rosy. However, if your, if your query actually is, is inefficient, you will end up scanning all the partitions and despite actually result in slower performance than just having a single table, right? So let's say you say select star from this table where i is greater than one. You just basically selected everything. But instead of selecting one table, now you're selecting seven or five tables based on how many partitions. So the database is jumping between one table another. So that could be slow. So think about all that stuff guys. Schema changes are challenging. Dbms usually manage that usually. Alright, so like a few outdated and index on the master table or the, or the partition table, all the partitioned child tables, we'll get that index. We saw that. Alright, and then the demo. However, sometimes this doesn't always work that day, but it actually has to support this feature. So that's the interesting part, guys. 13. Class Summary: Summary, let's jump into finally, that was, that was fun, that was found us. So we discussed what is partitioning. It's a very, very interesting feature writers. It's not always perfect. You have to know what, what, what to use it, I guess, I guess horizontal partitioning versus vertical partitioning, literally splitting in, in horizontally versus blending vertically and move the whole thing, right? Partitioning type we talked about ranges are IDs, hashes, partitioning versus sharding. Yeah, this is, this is, you want to use this before that. This is an overkill to think about it, right? But yeah, shorting sometimes is, is beautiful. If you can find a software that hides the complexity of shortening, definitely use sharding. But not all. Not all databases support that. Not all. There are no software available that does sharding by default I again, approximately equal the test is another one I'm not sure about either. And we decide the pores and causing us. Guys, what do you think about this technology? Let me know and I'm going to see you in the next one. You guys stay awesome. Goodbye.