Cloud Computing for Beginners - Database Technologies | Idan Gabrieli | Skillshare

Cloud Computing for Beginners - Database Technologies

Idan Gabrieli, Pre-sales Manager | Cloud and AI Expert

Cloud Computing for Beginners - Database Technologies

Idan Gabrieli, Pre-sales Manager | Cloud and AI Expert

Play Speed
  • 0.5x
  • 1x (Normal)
  • 1.25x
  • 1.5x
  • 2x
28 Lessons (2h 11m)
    • 1. Promo - Databases Technologies

    • 2. Section 02 - Basic Database Terminology

    • 3. DBMS, Database Instance and Schema

    • 4. Operational Data and OLTP

    • 5. Analytics Data and OLAP

    • 6. Data Warehouse and Data Lake

    • 7. ETL and ELT

    • 8. Batch and Stream Processing

    • 9. Scaling Up and Out

    • 10. Section 03 - Database Technologies - What, Where, and Why?

    • 11. Traditional Relational Database (SQL)

    • 12. Non-relational Database (NoSQL)

    • 13. Distributed SQL Database (NewSQL)

    • 14. In-memory Database (IMDB)

    • 15. Time Series Database (TSDB)

    • 16. Section 04 - Types of NoSQL Databases

    • 17. #1 - Key-value Database

    • 18. #2 - Document Database

    • 19. #3 - Wide Column Database

    • 20. #4 - Graph Database

    • 21. Databases Ranking Review

    • 22. Section 05 - Database as a Service (DBaaS)

    • 23. The Challenges of Traditional Databases

    • 24. The Concept of DBaaS

    • 25. The Advantages of DBaaS

    • 26. Azure - DBaaS Portfolio Review

    • 27. AWS - DBaaS Portfolio Review

    • 28. GCP - DBaaS Portfolio Review

  • --
  • Beginner level
  • Intermediate level
  • Advanced level
  • All levels
  • Beg/Int level
  • Int/Adv level

Community Generated

The level is determined by a majority opinion of students who have reviewed this class. The teacher's recommendation is shown until at least 5 student responses are collected.





About This Class


Data is like Energy!

As you probably know, data is everywhere. Data is the energy that drives many business companies and organizations worldwide. If data is like energy, we need a storing container and an engine to store and access this energy, the data. A storing container that will be flexible, fast, reliable, easy to manage, and cost-effective.

Over the last couple of years, the application requirements to store and handle data changed dramatically. There are more data types to manage, higher volumes to operate, and a need to keep the data for a longer retention time.

Database Technologies and DBaaS

As you may guess, the container and the engine for running this energy, the data for the upper layer applications are all about databases. Selecting the right database technology for a specific use case can dramatically affect any aspect of an application. Modern applications are using a variety of database technologies optimized for different use cases.

This training is about databases technologies and, even more specifically, how such technologies are offered as cloud database services (DBaaS) by leading public cloud providers – Azure, AWS, and GCP. We will learn the technical concept, market use cases, advantages, and disadvantages of each database technology while following its evolution path.

Are you ready to explore the world of Database Technologies?

Meet Your Teacher

Teacher Profile Image

Idan Gabrieli

Pre-sales Manager | Cloud and AI Expert


Class Ratings

Expectations Met?
  • Exceeded!
  • Yes
  • Somewhat
  • Not really
Reviews Archive

In October 2018, we updated our review system to improve the way we collect feedback. Below are the reviews written before that update.

Your creative journey starts here.

  • Unlimited access to every class
  • Supportive online creative community
  • Learn offline with Skillshare’s app

Why Join Skillshare?

Take award-winning Skillshare Original Classes

Each class has short lessons, hands-on projects

Your membership supports Skillshare teachers

Learn From Anywhere

Take classes on the go with the Skillshare app. Stream or download to watch on the plane, the subway, or wherever you learn best.



1. Promo - Databases Technologies: Hi and welcome. My name is Dan. I'm prison manager, walking in the software industry while promoting cutting edge Big Data Analytics technologies for large enterprise companies. I'm also teaching online courses about my favorite topics, our cloud computing and artificial intelligence. Let's talk about this training. Data is everywhere. It is the energy that drives many business companies and organizations worldwide. When looking at a typical modern application, one of the components is the database. It is the container to store and manage these energy that data. Today there are multiple database technologies, hundreds of database vendors, and multiple cloud-based database services. Each one will be more tuned to specific use cases. This training is designed for software developers, ID experts, data science engineers, or anyone who wants to understand the key industry terms around databases, the Market Evolution, pet, and then the concept benefits and use cases of each database technology. We will also talk about how those technologies I'll offered as cloud database services by leading public cloud providers, meaning a so AWS in Google Cloud, part of the project. In this running, please download the mindmapping software tools and templates and use it to organize the key terms and concepts we will cover in this training. And later on you will be able to use it as a reference. Good luck, I wish you exciting learning. Let's get started. 2. Section 02 - Basic Database Terminology: Hi and welcome back. In this section, I would like to review the typical most common terminology terms we'll use during the training. Maybe you already familiar with some of those terms, which is excellent. Or perhaps you don't have too much background. And that's also OK. In any of those cases, let's make sure we're on the same page with the same baseline as a starting point. Keep you up to date and in some cases refresh some knowledge together. In this section, I will talk about the meaning of a database, the database management system database instance database schema compare between operational data and analytics data. The meaning of transactions which is critical FOR operational data, understand the concept of a data warehouse in a data lake. The differences between ETL and ELT while building database pipeline and the concept of database scalability that considered to be one of the biggest challenges in database. Ok, let's start. 3. DBMS, Database Instance and Schema: So what is a database? A database is an organized collection of information that is stored and accessed by application. That's pretty straightforward. The main idea will be to separate an application codes from the data that is used by data applications. Okay, developer will get some simplified abstraction layer from the database system to access and manipulate the stored data. Think about a stock trading website that end users access to review a stock's prices and perform actions. Website must store the user's profiles inside a database to validate each user doing the login process. Secondly, the website must store the real-time in historical stock prices in a database. So any user would be able to select a specific stock and present this information on charts. Again, this information must be stored somewhere and it's all about using the right database technology. Let's see a more detailed diagram of the dataflow. We have end uses on the right side. Those end users will access some software application with the business logic to do something. In our example, this software application is the stock trading website. This software application would like to store and retrieve data from a database located here on the left side. The database system has a dedicated software layer to access and manage the database. This software is called a database management system, or in short, DBMS. Dbms is used as the interface between the content inside the database and the uses of application accessing the content. It is used to retrieve, update, and manage how the information is organized and optimized. It is provides end to end control of the database, enabling a variety of administrative operations, say like changing the database structure, adding or removing access to the database and more. Each database technology will have a different database management system. Today there are hundreds of database management systems from a variety of windows and open source say projects. We need to distinguish between the DBMS or committing the database management systems and a database instance. A database instance is a logical entity of a database, like a container created by us that is ready to be filled with data. It is the container holding the actual data. Using a DBMS, we can create and manage multiple database instances from this same time. Now, how should the data be stored in the database instance? Or how application and reading something from the database should access the data inside. The data should be organized in some structure. Well, that's the job of a database schema. Database schema of a database is structured set of fools and mapping constraints on the data to be imposed by the database management system. It's like a blueprint or a scale tone that describe the structure of the database instance. In almost all database management system, we will need to create and define a database schema while creating a database instance. We will talk later on about relational database that are using tables and columns and rows to store data. There is a relevant table for each type of data irrelevant column. And we must follow the structure to store and access the data inside the database. The structure of those tables, I'll pop out of the database schema. The Database Management System job is to ensure that every piece of data to be stored and later ONE accessed follows this predefined structure, meaning the database schema. If I want to store a customer phone number, then using the database schema, I will be able to know where it should be stored. If the phone number was defined as a number as part of the day, the schema definition then the database management system will prevent me from storing a text instead of a number. Some technologies require enforcing the schema, doing the writing process. While other system, other technologies are more flexible, they provide more flexible schema. As we'll see later on, it's a careful balance of advantages and disadvantages. We'll talk more about that later on. 4. Operational Data and OLTP: In most cases, this selected database type is driven by the type of data to be stored in the use case of that data. In that context. There are two main types of use cases in a typical organization. Operational data and analytics data. Operational data is the data that is produced by the organizations day-by-day operation. It's pretty straightforward. In most cases, will look almost the same FO many organization. It can be data about the list of incoming orders, the up-to-date inventory status, list of customers available products, open trouble ticket, information about employee working in that company and more. Operational data is used to understand what's going on with the organization in real time. It should be up to date real-time information, operational activities are recorded in a database using something that are called transactions. Okay? A database transaction is a unit of work performed within the database. Think about the situation that the customer or the sum product. A new order should be created in the database and the inventory status of that product should be updated. Those transactions I'll typically stored in the database as a simple two-dimensional structure, row by column format, like a worksheet in Excel. Transactions are typically handled by system called OLTP, online transactional processing. As an example, an ERP system, okay, being used in many organization that handles inventory data is an OLTP system. This ERP system uses database optimized for oil tp operation. Therefore, the database being used by this OLTP application is considered to be also an OLTP database, which means a database that can support transactions. In a typical organization will find many OLTP applications, and therefore many OLTP database is being used by those application. Now, what is so special about a database transaction? Well, a database handling a transaction must follow something that is called the ICID modal. Scid stands for atomic, consistent, isolation and doable. And a CID enabled database guarantees the full completion of a writing process to the database, which is extremely important for transactions. And transaction defines a set of actions that must be completed. There is no middle ground. The data in the database must move from state x to state. Why. Let's use the same example of a customer that all the SUMPRODUCT this action is creating transaction thus must update the inventory status stored in the database if the system is getting a new bill. And for some reason the inventory status is not updated by the previous transaction, then we have an inconsistency problem. Okay, the next customer will be able to order the same product even though the inventory is not up to date. Let's use another example, maybe more tangible example. If I go to an ATM and redraw some money, the ATM system must guarantee the end-to-end process. If I got the money, then my bank account must be updated. If I didn't get the money, then it will not be updated. The consistency of transactions in an OLTP database is critical. When a database complies with such an a CAD model, then such problems are prevented. Another thing that characterize an OLTP system is the number of transactions. Those transactions can create a high volume on the database when a the, the updated or deleted. Just think about how many people are using the credit cards in one hour. It can be millions of transactions and everything should be recorded without mistakes in a real-time. Secondly, the expectation form applications using this operational data is low latency, meaning fast access to the stored data. If I'm looking for available tickets on a flight while using some website, I will not wait ten minutes to understand which seats are available. On the web application presenting this kind of information should get the data from the database system in seconds. If I'm standing in front of an ATM in checking my account balance should be performed in seconds. Performance is a key consideration in OLTP system. And in most cases, the databases are the main bottleneck for those systems. Third, such operational data can be changed frequently with updated transactions. It is not just about adding more data to the database. It is also about updating the data already stored inside. Think about the inventory information of a retail store. It is updated in a high frequently intervals when getting new orders. This is the first type of data, operational data. This data is usually processed by OLTP systems installed in OLTP oriented databases. Another very important type of data is called Analytics data, and let's talk about it in the next lecture. 5. Analytics Data and OLAP: The second type of category is about analytics data is part of the digitization wave. More and more companies rely on analytics data to make ONE going business decisions by querying historical data and looking for trends over time. It is considered to be the input for business intelligence. Bi, for example, grouping all customers, doing the last deal based on market segments or presenting the top 20 products sold in the previous month by customer between the age of 40 to 50. In many cases, analytics data is based on the same operational data, but in a much larger scale, meaning much more data. This kind of high level aggregated analysis helps to tune the basis focused on emerging new opportunities, identify trends in the market, detect anomalies and much more. On one side, we have operational data handled by OLTP databases. A group of OLTP applications is constantly updating the databases with online transactions. On the other end, we have analytics application that would like to access this data. But the differently, while running more complex a raiding queries on the database, a faux aggregated analysis. One option will be to store in handle operational data and analytics data on the same OLTP database. Unfortunately, an OLTP database is not designed for fetching the massive volumes of data for such analytics queries. At some point alone, granting database query for some analytics use case can, may impact the performance of the database while handling real-time operational data. Not the best approach. Maybe it's feasible to simple use cases. The second option is to use a 10 OLTP databases FOR operational data and another, a dedicated OLTP database FOR analytics data. In such architecture, the workload on one database system will not impact the workload on the second database system. This walk around is working reasonably well for simple analytics requirements. However, during the years, the requirements for handling more complex and faster analytics queries pushed the market to develop new technologies that can better satisfy analytics application. And now we're moving to the third option. Those new systems are cold. Online analytical processing or in short, all up. All up emerged a bit after OLTP. As enterprise companies realized that they need a fast and flexible access to the data stored in the OLTP systems. So the idea will be to take the data from each OLTP system, imagine ten different OLTP system in some organization and copy that data into a different database. That the small, all up oriented the data will be stored in a different structure optimized FOR analytics queries. This new structure is based on multidimensional database structures, known as a cubes in the OLAP terminology. Unlike OLTP systems that are that stores the codes in a simple two-dimensional structure which is more relevant for Way transactions. Now, well, this kind of analytics data is stored. Well, such data is stored in a dedicated centralized database called a data warehouse. And let's talk about the concept of a data warehouse in the next lecture. 6. Data Warehouse and Data Lake: In the previous lecture, we talked about analytics data that should be handled differently compared to operational data. And I mentioned the concept of a data warehouse. In all up database is commonly known as a data warehouse. But what is the meaning? Well, what we say data warehouse. Data warehouse is a system used to consolidate in one single place data from multiple data sources like big data repository. It will have the ability to ingest large amount of data. The data stored in the database warehouse is uploaded for multiple operational systems. This data warehouse will be optimized a for all up use cases committing, granting complex head OK. Read queries. For analytics use cases. One or more ROLAP tools will run analytics queries on the centralized data warehouse. Datasets for a data warehouse can get quite big. The requirement would be to store historical data for a longer retention time. Like for example, all customers or there for the last five years, meaning billions of transactions in larger companies. To understand the scale of the amount of data in a data warehouse. We'll talk about a data set seen arrange form if few gigabytes to terabytes in a typical operational database. But in a data warehouse that can be hundreds of terabytes and even petabytes. One important thing to remember about a data warehouse is that the WHO data being collected from multiple data sources will be transformed to a new structure optimized to analytics use cases that are going to use this data. However, there are some analytics cases that we don't want to change the collected the load data. We want to keep the original structure because we may lose some data, some information about the data doing the data transformation process. Now we are coming to another essential term called a data lake. Data lake is a centralized repository for stalling all structure and unstructured data. A, the idea will be to keep the collected data has his without changing the default data, we can collect installed many datatypes in one places like images, look files, session data, social media information, IoT information, video files, and mole. It's a can be a very flexible a Datastore. One typical use cases of data lakes a is for data science teams when they are trying to utilize the data and create new types of AI and machine learning use cases that can be used to uncover useful business insights. Many typical organization will have a data warehouse and also a data lake because they are used for different analytics use cases. 7. ETL and ELT: We talked about data warehouse and a data lake as terms being used to describe data stores that are used to handle analytics theta. Now, I would like to talk about the methods being used to move data between systems. For example, the the task of moving data form an operational database into a data warehouse, so a data lake. The two well-known method are called ETL and ELT. Okay, starting with ETL. Etl stands for extract, transform, and load. It is the procedure or pauses of extracting data from one system called a data source. Transform the data into a new structure, and then load the data into a destination system like a database. It is important to remember that some level of transformation is applied to the data before it is loaded into the target system. For example, an ETL process can establish an FTP connection to a remote data source and copy one Omo, a row CSV files, okay, over this protocol. So this is the first step extracting the data from the source system, the OData will be copied to some temporary staging area as part of the overall process, the next step would be to perform some data transformation like maybe data cleaning, changing the data structure so it will be more optimized to the new use case. For example, here it will be changing the structure form CSV format to JSON format. Finally, the last process will be to load an insert the data into the final a target database using the new require this fracture. A data warehouse is a classic example of using the ETL method. The data sources are a group of transactional system. Then multiple ETL processes will be used to extract the row data, those transactions transformed them into the required a new database schema. And finally, it will be loaded into the data warehouse. The next option is called ELT. Elt stands for Extract load N Transform. It is an alternative for the ETL. We are using the same elements but in a different order, meaning extract the OData four-day datasource and loaded into the database. Without transformation. The data is not transformed on the entry point to the target system. It will be stored in its original arrow format. The systems querying the data from the target database will handle the data transformation. Each system may perform a different data transformation while loading the debt load the information from the database. The classic use case of ELT is when building a data lake, just extract the data from the data sources and load it into the data lake without performing the data transformation. Now what are the benefits of using ELT instead of ETL? Okay, first of all, it is easier to create and implement such ELT process between two system because we don't care about data transformation. Secondly, by not changing the row data to a specific new structure that is optimized to existing use cases. We keep the flexibility to handle a future use case that may not be feasible if we change the row data. Okay, think about an application that requires to identify the list of objects in an image, okay? They're all data is incoming images. As part of the ETL process, we can load the image from the data source, runs some process to identify the objects in that image. This is like the transformation phase of our ETL process. And then only store as simple text, the list of objects identified in that image. That's it. Meaning we are not storing the row image files. As you may imagine, it is very efficient methods for reducing the amount of data we're only storing what is relevant for that use case. However, maybe in the future, any application will want to identify the list of a car overdosing those images. But we are not storing the images. We decided to transform the whole data, install it in a different format. In that case, we can consider using Data Lake and then load those row images using ELT process instead of ETL. It will help us to be flexible in the future. And other benefits of fealty is about there faster processing time because data transformation is usually a time-consuming processes that disadvantages when using an ELT method. And instead of phi ETL is first of all about the storage space. We need more storage space to handle the data. Secondly, the data is not organized in optimized for specific use cases. Okay, so it's a, it's not a 0 or one. We need to carefully decide which option is irrelevant in a specific application. 8. Batch and Stream Processing: Another important element to take into account is the frequency of moving data between systems. In that context, there are two methods of moving data between system. Batch processing in stream processing. Let's start with batch processing. Batch processing is used when we want to move large volumes of data as a chunk to a target system. Typically doing of peaks at times in his scheduled intervals, like every 24 hours at two o'clock am bid night. A batch process is useful when the target system don't require real-time data and are designed to wait an hour, a day, or a week, or even more than that. For the next loading interval. Think about a cell's reporting system used to generate the cells report for multiple stores at the end of the day. In that case, the sales transactions can be copied from all stores into a centralized database as a large batch process, ok, once a day over the, over the night. And then the application will be able to generate reports on this data. This is option number one, batch processing, very popular option. The next option is called stream processing. This option is becoming more and more popular as applications are moving it to real-time use cases. And there are more streaming data sources. For example, sensible generating an event every second. And those events must be presented in real-time using some dashboard in some application. Stream processing is designed to handle a constant stream of data. Today, stream processing is becoming more and more important because companies would like to move into real-time analytics use cases. Okay, waiting to the next batch process is not good enough for real-time use cases. In some cases, developing well, we combine the two options in parallel, ok, in some system, meaning the batch and stream processing into one architecture, it is called the Lambda architecture. It is designed to handle massive quantities of data by taking advantages of both batch and stream processing method. By using this approach, data collected for a data sources would be duplicated into two separate data processing system. And then the application will be able to consume real-time data using this three processing, but at the same time, access more consolidated data using the batch process, say option. 9. Scaling Up and Out: The next important term I would like to quickly review is scaling. Well, looking on a typical software application, one of the main challenges to take into account is system scalability. Schelling is the process of managing their underlying IT resources, being used by the application to meet a set of performance requirements. Ideally, sources like computing power, memory, storage, and networking. When there are not enough available IT resources to handle demand, okay, traffic demand, there is a chance to impact the application's requirement. Or when there are too many resources that are not used. In some point of time, we will waste money on underutilized resources. It is a careful balance. The goal is to meet our defined performance requirements while optimizing the utilization of the IT resources. And in most cases, this is an ongoing process. Application are going to write data to the database and then read the data from the database. Going amount of cases, we are talking about a massive amount of data. Any delay in processing requests than providing a response will dramatically impact the application using a specific database. So databases are considered to be one of the main bottlenecks of applications because data is typically stored in disk that are much slower than a computer memory, it is extremely important to scale database components to prevent such performance issues that are impacting a application. So scaling IT resources is essential, that's clear. But what kind of options we have? Well, there are two main options when performing Scaling. Vertical scaling, an horizontal scaling. Let's start with vertical scanning. Vertical scaling is also called scaling up or down. Scaling up is the act of adding more IT resources to a single specific computing node, ok, the most relevant resource type is of course, a virtual machine, or maybe a physical server. When allocating a virtual machine, we are explicitly selecting computing profile, meaning the number of CPUs and memory capacity of that virtual machine. Scaling Up is about adding more resources such as CPU omega will lead to a single instance. And also the other way around scaling down is reducing the amount of CPU and memory. In many cases, such a vertical scaling is a useful option to solve all kinds of performance issues. Just increase the memory capacity or add more CPU power in a specific virtual machine. And you can. Quickly solve some performance bottleneck. So if we have a database running on a, on a virtual machine, we can scale that up and down. On the other end. Vertical scaling has some disadvantages. We can't always increase the resources of a virtual machine at some point of time will encounter some limitation in the underlying IT environment, okay? Or maybe moving to a bigger machine will be so costly that it doesn't make sense. Another disadvantage is that when we are, when we choose to scale up or down a Virtual Machine by selecting a different instance size, we need to reboot that virtual machines, okay, meaning able day the database that is running on that virtual machines. And this is not always an acceptable situation, especially if we need to change capacity in high frequently intervals, okay, let's say that we need to change the scale up and down every few hours. Okay, it doesn't make sense. Ok, can you think about an application that will be that except that the database will not be available because the the server running below must perform a reboot. It's a problematic situation. If we're doing the scaling once a week, maybe that's okay. But if we plan to scale the system every day or even every few hours, it, it's not acceptable. The next scaling option is horizontal scaling, also called scaling out, ok, or in. Scaling out is about adding additional nodes to support the load on some cluster. Instead of adding more capacity by making a specific node more powerful. Okay, we had the Powell by increasing the number of nodes, okay, it is called distributed computing. Instead of allocating a much more powerful virtual machine is we are doing in a vertical scaling. We can add smaller virtual machines running as a cluster in parallel. The application can better distribute the traffic load on multiple instances that are performing the same function. Okay, it can be more linear, goat, which would be translated into more cost-effective option. Now, why this is necessary in our training, okay, talking about scalability. Well, one of the biggest challenges in databases about scaling, okay, application or using much more data today, the load on many application is dynamic and traditional databases are not doing a great job when we talk about the scalability. We'll see later on by the development of new database technologies was driven mainly by the requirement of providing a also horizontal scaling and not just vertical scaling. 10. Section 03 - Database Technologies - What, Where, and Why?: Hi and welcome back. I hope that the previous section provided you some background about the basic terminology on databases. Anyway, it was a soft introduction. Now, I would like to move on and start to talk about the main types of database technologies. What options are available, where they are typically used in why a specific database technology is suitable for a specific use cases. Okay, starting with a well-known relational database or so-called SQL evolution to non-relational database called the no SQL. And then the latest revolution to distributed SQL database will also talk about an inmate movie database and time series database. 11. Traditional Relational Database (SQL): We will start with the well-known relation database, or so-called SQL database. The additional SQL database have been around for decades and serves as the core foundation of nearly every application we use today. They are suitable for many types of applications. A relational database stores the timetables, a group of tables. Each tables is looking at like an Excel sheet, meaning a Taboola view with rows and columns. A specific table contains some fixed number of columns, each of a specific type, a relational database. We live a pre-defined schema that describe the structure of each table inside the database. We need to define and create that schema. When creating a new relational database. The actual application data will be then be stored as rows inside each table. When data is coming in, the data must be stored in the relevant table. Again, based on the pre-defined schema. One or more columns in each table are designed as the primary key, meaning ischial in a table has its unique primary key combined with those columns, it can be one columns or multiple column. The key is used to access the data in a specific table. Secondly, rows in a table can be linked to groves in other tables by installing the rows unique key to which it should be linked. Okay, the unique key is known as foreign key. For example, let's think about the needed database, relational database for an online shop. One table will be used to store a list of customers. Each row is a profile of a single customer. The columns of that table will be used to store a variety of information about each customer, okay, like a customer ID, ID number, name, phone address, etc. Those columns are called attributes. The customer ID number can be used as the primary key, okay, because it's going to be unique number. A second table will be used to store information about the list of products available in that online shop. And the third table will hold the list of on-going orders okay, created by those customer. When some customer makes a new order, it will be stored in a new role in the oldest table. And it makes sense to keep a single foreign key of the customer that made that order. Instead of replicating the customer information inside each order. In the same concept, we can have additional keys, additional foreign keys for the product items inside an order. Those keys between tables aggregating the relational structure. For example, a single order must refer to a single customer. A single customer can refer to many orders, et cetera. The most popular language to access and store information in a relational database is SQL language. Called Structured Query Language. Sql is used to query, manipulate, and provide access, say control. It is straightforward to use by a developer, very popular. It when they are accessing data from the application code. An application can issue an SQL query against one or more tables to wait or maybe updated the data. Relational database are very popular options specifically when handling operational data, meaning transactions. The predefined a database schema model is helping to avoid arrows and keep things a very organized. The keys inside the tables are providing a high level of flexibility for developers to access a data efficiently. Some of the most popular SQL Database solutions are, of course, Oracle Database, MySQL database, MariaDB database, Microsoft SQL servers, a postgres SQL, and many moles who is using relational databases. It is important to distinguish between the two main market sectors that are using database technologies. One market sector is enterprise companies. A typical enterprise company will have many types of application installed on-premise in Dell private datacenters. Those enterprise-level application are using various database technologies. But the dominating database technology is a relational database. The second large market sector of database technologies is related to internet players running websites, those Internet web application, which I'll serving millions of customers worldwide. They will have much more challenging scalability issues. They need to scale databases to support a large number of concurrent users. Think about a website that suddenly handles to a 100% increase in traffic in just few minutes. Also, those websites are expected to be online and available all the time. And we'll see in a few minutes that a traditional relational database, our mapped so great feat for those Internet tape players. One of the biggest challenges with traditional relational database is scalability. The relational model and the requirement to handle in guaranteed transactions. Doing the writing process is creating substantial override on the database system. As a reminder, scalability is the ability to change database capacity to better handle the application's performance requirements. We can scale a traditional relational database only own one vertical dimension, meaning scale up or scale down on a single server. It is useful in some cases, but there is a limit on how much we can scale up a single server. For those web, internet plays. The traditional relational database was found to be a constant bottleneck. They tried the options of scaling the database vertically by moving the database to a bigger machine. However, this just delay the little bit, the next performance bottleneck. In addition, moving the database from one server to another is a complex process that requires significant downtime, which is unacceptable for such web applications. For web applications, availability and performance. Critical requirements. They need to be online all the time and have to support a large number of concurrent operations. It was clear that some, looking on historical perspective, that a relational database is an overkill solution, heavy-duty database solution that is not optimize to some of the use cases of web application into those challenges turn out to be strong forces pushing the web in the street to embrace a new type of database. And as you may guess, it is about non-relational database, which is the topic of our next lecture. 12. Non-relational Database (NoSQL): As I presented in the previous lecture, the challenges associated with relational databases pushed the industry to develop a new type of databases called non-relational databases. Big internet players like Amazon, Facebook, Google, and others, faced the challenges of handling a huge amount of data. And the industry traditional approach to scaling up relational databases system was inefficient and expensive option for them. Therefore, they developed a group of database technologies. It is designed to scale out and achieve the high availability needed to support a web-based application. It was a significant change in the database design concept. Instead of focusing on strong Database Consistency and easy access to the data using relational model. Let's focus on two main things, scale out and availability. Those a non-relational databases where published as open source projects and the rest is history. Today there are a variety of commercial and open source, say, non-relational databases. Some of the popular databases, Google, Bigtable MongoDB, Cassandra, Amazon, DynamoDB, a Zuo, Cosmos DB, Apache HBase, and many others. What is the meaning of a non-relational database? A non-relational database is a database that doesn't follow the predefined Taboola schema with multiple tables that we saw when using a relational database. It does not impose a strong schema model using a complex relational model. Instead, it is using most simple data models when storing data in the database. Those simple data models will be more optimized for specific use cases. It is also known as no SQL database a, which stands for no support for SQL. It's a strange definition because it's saying that the database can't do something specific, meaning it can't end the SQL later own the meaning, evolved it to not only SQL when some of those databases started to support SQL, but in a limited way. Anyway, this generic labeling, NoSQL, is used to emphasize that such database are not using a relational model and predefined complex schema to store the data. No SQL database can be classified into the following four category. Key-value document, white column in graph. Each one of them is suitable and more optimize to handle specific use cases. We will have a dedicated Lecture on each database type in the next section. At this point, let's talk about the high level concept. Flow SQL database. And the advantages and disadvantages in a SQL database, the incoming or collected data stored in the database must be structured using a pre-defined schema. The schema structure is applied to the incoming data, meaning the information is stored within specific tables, columns and rows. If I collect customer profile data from a new customer, it must be saved in the relevant table. If that customer makes an old model, then the new order will be stored. Or another fable. Everything is fully organized during the writing process to the database on the other end. And no SQL database allows storing data in a simple data model. Without schema. The schema is used by the application only when it will access the data. This process is referred to as schema on read, schema on read instead of schema on write. The second is about speed. No SQL database can write and read data much faster than a relational database because they are using very simple data models. The third is about scalability. As I mentioned, one of the most significant limitation of SQL databases is about scaling. It isn't straightforward to perform scale out when using a relational model. In a relational model, data is typically concentrated on a single node, on a single physical or virtual server. And databases can only scale up by increasing existing hardware, getting much bigger boxes. No SQL databases were designed around this concept of scale out, scaling data in a distributed environment, distributed database in multiple servers working together as a cluster, adding and removing nodes from the cluster based on traffic load and storage requirements. As you may understand it, distributed database is much more powerful than a database sitting in a single machine. And no SQL database has the visual benefits of providing better availability when using a distributed cluster of nodes. There is no single point of failure. When a server goes down, it can be quickly replaced by other servers with no application disruptions. Also, we can use a data replication across the cluster, meaning stalling multiple copies of the same data in multiple nodes. To ensure high availability is always you win something and lose something. Let's talk about the disadvantages. First of all, no SQL database, fundamentally not transactional. They don't comply with them a CID model to achieve better scalability and whether performance they give up the strong consistency offered by a relational database. No SQL databases. I'll based on a weekend consistency model that is called the eventual consistency model. Let's open that side topic for a minute. No SQL database is designed to one on a cluster of nodes, meaning a distributed database, where we store a piece of data in a NoSQL database. The database management system must replicate this piece of data to several nodes to achieve high availability. If one node is going down, then the data is still available form other nodes. The eventual consistency model is about this data replication process, which is complicated in a distributed database. It means that when we save something to the database class till it may take some time until all nodes in that database cluster will be in sync with the latest replica update. The same data replica will eventually propagate to all nodes, but the database will acknowledge the writing action before this process is finalized. A caused the cluster. Finally, after some convergence time, all access to that new data will return the last updated value, meaning the same value. But doing that convergence time, applications leading data from the database cluster may receive different results while running the same query because the data replica is not synchronized yet a course, the whole cluster, that's the meaning of eventual consistency. It may take some time until things will be fully aligned. As you understand this type of a weak consistency when saving in reading data is not walking for transactional data. Therefore, no SQL databases are not such a great solution when handling transaction. If we check the main use case of no SQL databases for the enterprise market, it is mainly a solution for storing Analytics data. All operation data is still running in a sequel databases. As you may guess, it's not the last evolution step in databases. What if we could have the best of both worlds? Okay, let's talk about distributed SQL databases. In the next lecture. 13. Distributed SQL Database (NewSQL): In the last two lectures, we talked about the limitation of traditional SQL database and also the limitation a no SQL databases. In a SQL database it's about limited scalability and lower is zillions to fail is we can only scale up as everything is running on the same server, single node. On the other end, traditional SQL databases can handle transactions, which is a common requirement. In enterprise level application working with operational data, NoSQL databases are designed to scale out using a distributed architecture. They are a great choice for applications where availability, meaning getting a response from the database is more important than getting a consistent, correct response all the times. Therefore, no SQL databases will have limited support for handling transactions. Secondly, no SQL database. I'll not optimized for sequel queries because it is using a very simple data models. It's a trade off between options. You win something, you lose something, steal. The lack of supporting transactions is a huge limitation in no SQL databases. Developers are investing more time in lines of code in their application while trying to handle the lack of fit transactions. And as a simple example, I will use a nice quotation from a Google Paypal saying something like that. We believe it is better to have application programmers deal with performance problems due to other use of transaction as bottlenecks arise, rather than always coding around the lack of transactions. Anyway, the market demand for Sequel databases that can better scale while still supporting transaction pushed the industry to serve this technology gap. And the latest interesting step is part of the evolution of databases is called distributed SQL or new SQL. Nosql databases emerged to make SQL scalable, adding the ability to scale more than online transaction processing workloads in a way that is not possible with traditional SQL systems. I like the Wikipedia definition of new SQL. Nosql is a class of relational database management systems that seek to provide the scalability of NoSQL systems for online transaction processing workloads, maintaining they SCID guarantees of a traditional database system. Their motivation would be to combine the benefits for merit relational database management systems which are using a relational data model. Strong consistency and fully features SQL with the benefits form no SQL, meaning horizontal scalability and high availability while using distributed computing. Using this new technology, we can have a single logical relational database instance that is distributed on a global scale over multiple geographic regions. Data will be replicated into multiple nodes inside the cluster so that the single node failure will not impact the database availability. If more capacities they did, we can scale out the class there and add more notes. Today, the multiple commercial and open source distributed SQL Database solutions, such as Maria DB, Amazon now oh, Google's span L, single-story be VLDB and many others. Each one of them may use a different combination of database technologies to achieve the goal of a SQL distributed database. Still, many of the core functionalities will be the same. Meaning database that is trusted for supporting SQL storing relational data. It can be a geographically distributed running a database on multiple nodes, scale on-demand. And of course we are talking about scale up in, also scale out acid compliant in shoe transactional consistency. And last one is all kind of deployment options. Say we can run it on premise self hosted cloud or as a managed service, something that we'll talk later on. Anyway, the concept of distributed SQL databases is still evolving in the industry. And every database window will have some unique value proposition with different features and options. In general, those distributed SQL databases are an overkill solution for many types of most simple applications. For the enterprise sector. It is definitely something to consider when building a web scale application with real-time analytics application and high frequency OLTP workloads. 14. In-memory Database (IMDB): We talked about SQL database, no SQL database and also distributed SQL database. It's all about supporting or not supporting relational models. There is another dimension to describe a database. And it is related to how the database stored the data. And the most common option to store the data in a database is using the simple disk. It can be based on a technology is like SSD or HD. In that case, the data is persistent, meaning it will not be erased in case the database server will perform a reboot. However, as I mentioned before, when looking at the, any database that is storing data on a disk technology, the disk access time is usually the main bottleneck point. It is slow compared to operation that can be performed in a computer memory. Therefore, some modern database management systems can run the database and store the data completely in memory. This approach is called an in-memory database. And in-memory database may be a relational database or non-relational database. An in-memory database or in short, the IMDB, is a type of database that relies primarily on memory for data storage. In contrast to a database that stores data on a disk, it is designed to take advantage of large memory capacities that are available today on more than an up computing systems. What is the benefit cert, okay. It is first of all, eliminating the time needed to query data from a disk. Each time we query a database or update a data in the database, we only access the main memory, which provides much faster and more predictable performance. Risk. The downside is that memory in a computer is much more expensive than a disk storage. For the amount of data will be much smaller in an in-memory database compared to a disk based database, we need to consider that an in-memory database system may not always be able to fit the entire dataset in the available where movie in-memory database is a great solutions for applications that require microseconds response times, and can have large spikes in traffic coming at any given time. One of the top use cases of an in-memory database is the caching and also when the linger web session data. And so one, some well-known in memory databases are there, read this Memcached, a apache ignite, volt DB and others. 15. Time Series Database (TSDB): Our last type of database is called a time series database, or in short, TSDB. It is considered to be one of the fastest growing database categories for the past couple of years. Today, the world is made of data. Almost any physical objects produces a relentless stream of data all the time. Everything already uses or will use sensors to measure something. Just think about a wave of Internet of Things with many use cases of objects that is continuously generated data. It can be sensors in, on machines, wearable devices for personal health management, environmental sensors, performance metrics collected form IT devices, servers, logs and more. There are endless use cases of devices generating data. In all those cases, the idea will be to monitor, analyze, and explore interesting real-time, any historical patterns. These type of collected data has a typical dimension that is very important. And as you may guess, it's about time. For example, every measurements a collected from a temperature sensor must be stored in the database with a timestamp, the date and time when this temperature value was measures. Otherwise the measured value will be meaningless. Time is a critical pivot point when analyzing the data. Now, usually 1 of measurements is not useful it meaningless. We would like to analyze a series of values where the x axis is time. Then we'll be able to see the state change of information of the measured objects and look for interesting patterns when modelling installing time series data, there are three main pieces of information. The timestamp, the subject, for example, a specific sensible and actual measurements. The data can have regular intervals like every five minutes, and also fluctuating or random events. For example, collecting temperature recordings for MS from sensors would be typically recorded at a repeated time interval, like every five minutes or every hour. On the Otherland, a picture taken by a camera located in a forest. Web code. Beds will be triggered by a detected movement. It's a random event, but the time is recoded. Doing the event, in many cases, May 1 draw things that describe this kind of data is about data volumes or better to say, Hive data volumes. In that context, we need a database technology that can handle I frequency data, right? Of millions of small datapoints. When data is coming in. From a developer perspective, we need the functions and API from the database to filter aggregated data as easy as possible. One option is to use a generic, traditional relational database. And for many use cases, it will be fine. It's not something new to store a timestamp as an additional column in a table. It's a common practice. However, you still have many limitation to consider like handling high data volumes and providing fast searching queries on the stored data. The specific challenges that are raising today with the explosion amount of data may require specialized database systems. And now we're getting to the topic of this lecture, which is about time series database, say in short, TSDB. A time series database is a specialized database technology evolved to better accommodate the specific requirements of handling time-series data. As the name suggests, a time series database is optimized for handling time series data where each entry in the database is associated with the timestamp. So what are the advantages here with this kind of database? When dealing with time series data? A TSDB will be more optimized from a performance perspective. It will automatically store data in my movie for fast access and movie historical data to a disk storage. It can handle I volume data ingestion, meaning writing to the database. It is optimized to scale very well and provides millisecond level query times when working with a large number of data points. A time series database will be easier to use by developers. It is typically includes built in specialized time-related functions while using SQL. Ok, built in functions for data aggregation and running analytics queries to identify trends and patterns. Bottom-line time series database are optimized as a better solution for handling time series data. Some well-known time-series database, ours in flux db, okay, very popular options, Amazon time stream that is provided this database as a service will talk about it later. Promised you say k db plus an open TSDB and others. 16. Section 04 - Types of NoSQL Databases: Hi and welcome back. In the previous section we talked about the leading database technologies like a traditional SQL noSQL, distributed SQL in-memory database, and time series database. As I mentioned, the term NoSQL database is too generic because there are different database models being used to store data in the database under the NOSQL category. And it is essential to understand each one of them. They'll fall. In this section, we'll talk about the four types of no SQL databases. Meaning key-value database, you mains database, wide column database and graph database. Each one of them is tuned to a specific set of use cases. 17. #1 - Key-value Database: The first category of no SQL databases is called a key value database. Key value databases are the simplest type of no SQL database. Key value databases store data in a collection of key value pairs. As a simple data model, we have a key and a value associated with that key. Together, it is one pail of key value. Unlike a relational database is, there are no tables, there are no attributes associated with the tables, such as columns, and there are no constraints on columns. A value associated with a specific key can be a number, text, string, JSON, format, or even binary data like an image or a video file. Unlike a table in a relational database, the values don't have to be uniform between the rows between different pairs. Each value in a key value pair can have completely different types of fields. It can be the name of a person. In the next key value pair, it will be the phone number of an image. Now there are three primary operation performed on a key value database. Say the first one is called PUT, used to add a new key value pair to the, to the database or update a value if this key is already present, we just need to make sure the key is unique. The next one is get used to retrieve data stored in a particular pill. We need to use a specific key. This makes querying a key value database fast because it is using a simple model. There is no query language needed like SQL being used in a relation that the base give me the key eta will provide you with the value, can be very fast access time. At the same time, it is also a disadvantage of a key value database because there are no way to perform complex queries, as we can do with SQL, the only way to look up a value is by using a key. And the last one is delete. We can easily remove a key value pair from a database using a key. What are the main use cases for this kind of technology? Key-value databases are used whenever quick access time is needed for large quantities of simple data. These highly optimized for applications performing simple lookups using the key or by a range of keys. On the other end, key value databases are not optimized. For situation where querying of filtering by values is essential. For example, with a relational database, we can find records by using the option in a select statement and filter the non-key columns. It is not supported in a key value database. Key value databases are frequently implemented as a caching solution in front of other databases. To be more accurate, The d would be a combination of a key-value model and then in-memory data store. Another nice example is a shopping cart on an e-commerce website, each user login and starts to add products to a shopping cart. The data is relatively simple, is just information about the items a customer is interested in purchasing right now, doing some peak traffic time, the website may receive millions of customers playing with the shopping carts. Therefore, a key value database can be an excellent match. The key will be the customer identification and the list of items added to the shopping cart will be stored as values associated with that key. In case of high traffic day, these key-value database can be scaled out to handle millions of users. There are a more use cases like managing productive recommendation's a comments on a website, caching data before storing that in other database. And more. Some well-known key-value databases are rare. This Amazon, DynamoDB, x2, Cosmos DB, memcached, and others. As you will see later on, some databases or using a more than one data model type. For example, Amazon DynamoDB is a key value database as well as a document database. So it's a combination of multiple technologies. 18. #2 - Document Database: Our next type of no SQL database is called a document database. Document databases are one step up in complexity. A form, a key value database. The concept will be to group key value pairs into objects called documents. We should not be confused with the word document. It has nothing to do with, for example, Microsoft Word documents or something like that. Each document is basically a file that is encoded using some file format, the most commonly used hours of JSON, XML or YAML. Each document is given a unique ID like a key, and we can use it to pull out a complete document from the database. As an example, a JSON document is just a group of key value pairs as simple text. Each a value could be a number, a string, or maybe a compound element, such as a list of values. Unlike a relational database where every record contains the same fields, living unused fields, empty. There are no empty fields in a document. A document, a database as no fixed schema. Each document is a collection of unique fields. We can add or remove data from specific documents without affecting the other documents. Therefore, it is a very flexible option to store a specific type of data. A classical example of a use case for a document database is for managing a product catalog. Some of the attributes between different products will be the same, like their lines in blue that we see product ID, product name, product description, et cetera. But we will also have many unique attributes per product type. For example, a car toy fully kid, and a drilling tool will have different unique attributes under the CARTO, we will have, for example, the age range or distance and other the drilling tool, we have other attributes like the voltage number or the speed in rpm. And we will still want to manage them in the same product catalogue. Trying to manage this product catalog in a relational database, it means we will need to use many attributes as placeholders. Many columns will be empty in many rows. On the other end, document databases are a more optimized option for storing such catalog information. Each product's attributes can be described in a single document using the relevant structure. One structure FOR a toy and a different structure for a two, like a drilling tool. And we can easily change the attributes of any product without affecting other products either documents. This free-form approach provides a great deal of flexibility when there is a change in requirements all the time. Some well-known a document databases are MongoDB, Amazon DynamoDB as x2, Cosmo DB, coach base, CouchDB, a Google Firebase and files though and many modes. As you can see, some names of databases are repeated like Amazon, DynamoDB, which I mentioned in the previous lecture. It is because some of those databases supporting multimodal, they can be used as a key value database, document database and other data models supported by the no SQL database. 19. #3 - Wide Column Database: Our next type of NoSQL database is a wide column database called a column oriented database. Columnar database or ColumnFamily database. As you may guess, it's something related to columns. When saying white column a database, it means it is designed to support the options to handle many columns, many attributes. Traditional relational databases, I'll low oriented, meaning each row will ever go Key and each field within the rho will be stored together in a table. Row is a record stored in the table and we can easily add any additional rows in a single table. A single column represents an attribute or a field inside the row, in a row oriented database, like firstName, lastName. And when looking on a specific column, we will see the actual values like care, Aidan Bohmer, maria, et cetera. On the storage level, the data appears one dimensional. It's looking like one big line with one row after the other, so each row has the same size. So the database system can easily jump between rows. Assuming I would like the database to provide me the first name and last name of each person in that table. The database, we scan the table, foam top to bottom and from left to right, reading the full, whole structure, regardless of which columns we required to perform the query in each line, then it will filter out the columns that are not relevant. Okay? Imagine we have millions of hours, then this process will be an intensive computing task. As a reminder, I just wanted to query specific columns like their first name and last name. But the data is stored in a row oriented structure. Let say one row has ten K byte size, we have 1 million rows. Then the application will read from the database ten gigabytes of data just for these simple query. Now let's assume the two columns I wanted, meaning the first name and last name. Our only two k bytes of the row size. If I can read just those two columns, then the amount of data will be reduced to only two gigabytes. This is a big percentage reduction. So let's think about an alternative to store the data in the database. And as you may guess, it's about columns. A wide column database flips the storage system. Now, a row is a list of values related to the same column. Take a look on this table below. A row represents a column or an attribute related to the first staple. The firstname here is a column. And over the second table, the firstName is now a row. The storage level, the data appears again as one-dimensional. It's looking like one big line, but this time the data structure is different. The original columns before the flip, I'll now organized as rows, okay? The content of each column is now stored as a sequence of values in a low. It's a column oriented database when running the same query to get two columns of, for example, first and last names. The database, I need to read only two loaves because the columns I'll store this rose to make the database query even more effective. The columns sometimes organized into groups, a cold families, okay, it's stored in the disk. So on the disk or the columns within a column family will be stored together in the same file. This approach can reduce the amount of data to read and load from the disk. And therefore, dramatically improved performance and query time, which is a critical things for big data analysis in practice, such databases are well suited for OLAP oriented workload, which typically involves high complex queries. Another advantage of such a column oriented database is the possibility of a data compressing because the data in one column is always, will be of the same type. For example, a string or an integer. In that case, it is easier to compress the data. And other essential thing is scalability. Like other NoSQL databases, a column-oriented database is designed to scale out using distributed clusters of low-cost hardware to increase the throughput, making them an excellent match 4-way data warehousing in big data processing. However, writing data into a column-oriented database is a slow bosses, okay, transactions must be separated into a columns and compressed is they are stored. It makes the row in the database a much better solution over the column-oriented database for online transaction processing, okay, for OLTP. Some examples of a column-oriented database is our Apache Cassandra, HBase, ASU, Cosmos DB, Google, Bigtable, et cetera. 20. #4 - Graph Database: The last category of no SQL databases is a graph database. As the name suggests, this approach represents data as a graph. And it is specifically useful when the connection between data items are as important as the data item themself, a graph database, a managed to types of information nodes and edges. Both components are represented visually as points and lines. And node represents an entity like a uses a product, a category, or any piece of data. For example, in a social network, these could be people. And edge represents how to nodes on the graph associated specified the relationships between them. Each relationship is represented by a line, and the line has its own properties. And edge connection will have a start node and also an end node. And edge can also have a direction indicating the nature of the relationship. For example, like Sadat is the manager of a look, a look at MLK, friends in a social network or graph the relationships between people. A fundamental part of the information. A relationship can be many things like a Friends of manager of like or dislike of something, a customer of a brother off, et cetera. Okay, many types of connections between entities. The purpose of a graph database is to allow an application to efficiently perform queries that are running inside the network of nodes and edges and quickly analyze the relationships between entities. The result is a high performance speed even for very complicated queries. For example, running a query like find all friends of MLK and present their habits, okay, the database can use mark as a starting point search for Mark's friends using their friend of edge type, which are there in our example, alloc and sour, and then check the habits of each one of them using the leakage. Let's talk about the typical use cases. The first one is recommendation engines. Some websites are using graph databases for recommendation engines. The graph database stores the complex relationships between information categories such as customer details, customer interest, purchases to refer ence, et cetera. The same fall will be then be used to make product recommendation to other users following a similar purchasing flow. For example, if you buy a few accessories for a new bike, the system can lend a pattern that the person who owns a new bike is looking for such products and recommend them to other people. I'm providing that example because I started to search for accessories in Amazon for my new bike. And it was a little bit surprising that Amazon recommendation service successfully anticipated which accessories I'm planning to buy even before searching them inside the website. Another typical use case is for detection or cyber-security detection. Such systems a US relations between small pieces of information to detect some useful patterns, some anomaly. For example, multiple people are sharing the same IP address but residing in different physical addresses. Ok. This is a little bit strange. Or a person logs into his own a bank account from a foreign country, which can be a legitimate activity unless ten unrelated people are trying to login into the same account over, for example, the last day. Rough databases are designed for data that is highly interconnected with many relations. We mentioned the social relation, but it can be other domains like a roadmaps, network, topologies, et cetera. Some example of a graph databases are them are x2, Cosmos DB, orient DB, Amazon, Neptune, graph DB, et cetera. Keep in mind that some of those databases are multi-purpose databases. Therefore, they will appear in more than one time when looking on no SQL databases, we will see later on, for example, that the ASU Cosmos DB is actually completely multipurpose databases, which is supporting all data model under the no SQL database category. 21. Databases Ranking Review: A useful website that I wanted to show is called DBA engines. And specifically here a webpage called db engine ranking. This page nicely represents an updated ranking analysis of database management systems. The first ten lines are the top ten databases. The first three columns here will show us the current and historical ranking. For example, with a little bit surprise. Or a Cal relational database is still one of the most popular database technology. Even though a, to the increasing number of open source, say, SQL databases, I assume it is because many small to large enterprise companies and many software vendors selling software tools. I'll using Oracle database for many years. It's not simple to replace it. Still, we can see a constant drop in the scoring right after it. We have one of the most popular SQL databases for websites called MySQL. Mysql is free and open source software solution own today by Oracle. It has a free version and also a commercial version. By the way, I can click on it and I will get a useful summary about this database with a list of properties and features. For example, I can see the list of supported operating system going back to the main list. Next we have Microsoft SQL and postgres SQL after MySQL. And ranking number five. A is a NoSQL database called MongoDB, which is a document oriented database. While scrolling below, we will be able to see more types of databases. For example, here we have graph database and this one called HBase is a white column, a NoSQL database. Influx the bee is a popular time series database and so on. If you see a specific database down below in this list, it does not necessarily point out that this database is less or better than others. Some databases are more optimize two specific use cases, okay, with the small niche market. And therefore they will, they are getting a lower ranking. So it makes sense to get the ranking Pair Database category. You can achieve that by selecting the category here on the left side, I will select the time-series databases. And now the ranking shows information just about this category. And I can see which time series databases, the most popular, okay, influence w0 is getting the highest points by fall compared even to the second third places. We can also click here on ranking by database model to get a nice pie chart of the database modal distribution, it makes sense to use the pie chart below with a percentage, a number. As we can see, the most significant portion is still a relational database. The next slice is document databases scrolling below another nice chart is the trend of the last 24 months. We can see that the market segment for time-series databases is going with the stone trends. And the last one is Open Source versus commercial databases. When looking below at the epipolar trend, it is interesting to see how they are correlated. Of course, the market is moving forward to Morgan more open source databases compared to commercial license based databases. As calling below, we can see the top five commercial databases in the top five open source databases. That's it about this nice website to understand what's going on in the industry. You know, on the macro level. We also finalize the types of no SQL databases. Say, see you again in the next section. 22. Section 05 - Database as a Service (DBaaS): Hi and welcome back. Thanks for watching. So file. Until this point we cover the main database technologies like SQL, NoSQL, and distributed SQL, et cetera. Each category is more suitable for specific use cases. And it has a variety of open source and commercial license based software solution that we can select. Now the next question is about how to use a specific database type. The traditional, most popular option is do it yourself. On-premise installation, meaning I'm buying a commercial license or download an open source package, install it on a dedicated physical or virtual machine. Then I need to set up, configure, manage, and monitor database instance using the database management systems. The next option will be to allocate virtual machines in a public cloud environment like AWS, a zoo, all Google Cloud and perform the same steps. It is part of the infrastructure as a service model offered by all cloud providers. This time, it is a database instance running in a public cloud environment. Instead of running in a private on premise data center. Two options require that will be responsible for managing the database instance, will talk about it in the next lecture. But database management is a huge overhead in a complex task for many organizations. Therefore, a spout of the platform as a service model. Cloud providers say, basically, let us handle the overhead of database management for you. Just, you just let us know what you need and who will do the rest. We means the cloud providers. We will allocate the needed resources. We will perform a software updates, take care of security, data replication, monitor the load on the database and perform system scaling automatically. You will pay for the time use the database instance. It is the core concept of databases service ON showed a DBS. In this section, we'll discuss the challenges of a traditional database management and the concept of database as a service and the main benefits. Then we will review the database as a service portfolio of the leading cloud they provide us. 23. The Challenges of Traditional Databases: As a starting point, let's talk about the business challenges associated with traditional database management. Today, more than ever, organizations are handling a massive amount of data being generated from a variety of data sources to be used in different use cases. It is the era of data explosions. Data is everywhere. It is creating new business opportunities for organizations that can handle this data in a smart way. The amount of data to be collected, stored, and analyzed is creating all kinds of operational challenges. These data must be stored in databases and databases costs money. The money invested in software applications say IT resources in manpower. It's a question of storing data in a cost-effective manner, using databases and still provides the business benefits. Let's talk about the traditional approach for database management in most organizations. As part of the database lifecycle, It is starting by someone that these asking to create a new database instance. It can be a developer that is distinct, some application of DevOps engineer that would like to implement a new application in production system. Those stakeholders require databases for a variety of use cases. It is typically a long internal process in most enterprise organizations. If some internal team wants to have a new database, they need to open a change request to get the approval and budgets. Once approved, they will open a trouble ticket requests to the IT group to get the needed IT resources for that new database. The database must run on some physical virtualized hardware. And once the IT resources are approved an allocated, they can open another trouble ticket to the DBA team to configure the needed database instance on the allocated IT resources at the end of the cycle, which can take days or even weeks, the end-users will hopefully get the new database instance. It's a labor intensive, time-consuming process, meaning multiples manual steps with multiple people in the process. Just think about the developer waiting a few days for a database to test the new feature. This process can slow down the agility of different teams inside organizations. Once the database instance is created, it is also requires manual management and monitoring. Throughout its life cycle, it is the job of a database administrator to tune and optimize the database according to the application workload. Also, the database admin ensured the handle database security, Database replication database software updates, and think about a large organization. We will find hundreds of database instances used for many types of obligation and multiple database administration. People should manage all those databases, which is another override to take into account. Another issue is utilization. In many cases, the allocated IT resources for a specific database instance are static and designed to handle a peak load. So whenever aged utilization of IT resources for many database instances will not be optimized. Think about the situation of a developer that asked the IT team for specific database size. And that's it. It will be static for a long time, even though some resources for that database are underutilized, okay? And users are complaining about performance issue, meaning over utilized situation, and not own underutilized resources. Either utilize resources in one database is not the problem, but it is becoming a problem when handling hundreds of databases. Now if we zoom out for a second, then we can see that over the last ten or 15 years, many enterprise applications that used to run in on-premise data centers moved to public clouds while using a variety of cloud based services. Let's say we are talking about a large company, like a bank, okay? This bank can lift and shift existing applications to the cloud using the well-known infrastructure as a service model, instead of allocating IT resources in their banks private datacenter, the bank will allocate them inside some public cloud provider. Now the application and the database component running in a public cloud environment with the benefits of public Clouds. However, in the context of databases management, the bank or customer is still handling that day by the database management task. Keep them up and running, handle network issues, monitor performance and security, and scale the system is needed. It's still the same problem. Unfortunately, this, say lift and shift migration strategy to the cloud is not enough to solve the database management challenges I just did described. Therefore, public cloud providers are providing a different approach to consider. The next cloud-based service alternative to address those challenges is to use fully managed cloud database services. And from this point, I would like to zoom on this topic. 24. The Concept of DBaaS: Let's talk about the concept of databases is service. Database as a Service is a cloud computing service model that offers users, okay, like you and me, a flexible, scalable and on-demand platform to create a news database systems in the cloud. It is sitting between the platform as a service model and software as a service model, we can basically outsource some of our responsibilities of handling and managing databases to a third party, like a public cloud provider. The concept is to use a cloud service for venting databases, em, on a macro level. This is one of the fastest growing Cloud services in cloud computing because it has a real proven business value for many organizations. All the big public cloud players provide a variety of cloud services related to database as a service model. Later on doing the section, we will see the type. So in options available today by deleting cloud providers, they idea is straightforward but still very powerful. Let's provide the end users or customers the ability to quickly and easily create databases using an on-demand self-service platform. It is simplifying the database creation process by providing some abstraction layer and also automated process. A software developer will, will be able to login into some web UI and in a few clicks, select the Neither database properties and get the new database instance as a rented salaries, all the backend activities to create the new database will be fully automated from the end-user perspective. It is provided as a service. In addition to a self-service user interface, a public cloud provider will also support the options to allocate databases using APIs, okay, using rest course. Once a database has been deployed, the cloud providers takes care of almost everything. It will be responsible for the all back-end operation to maintain the database healthy, including backups, security patches, software upgrades, system monitoring, making sure that the database system remains available insecure 24 hours, seven days a week. By the way, the concept of a database as a service is not just related to a public cloud providers. There are a large companies that implement this concept in their private cloud by the IT department's, Okay, let's call it a private databases service. It helps them to centralized the database allocation and management task. It caused the enterprise level and be much faster and flexible. When end users would like to get new databases, the IT team can configure the databases service to support specific database software releases and restrict the configuration that specific users can provision. I'm going to focus on public database as a service provided there by cloud providers. Let's talk about the main advantages while using the database as a service model. In the next lecture. 25. The Advantages of DBaaS: Let's talk about the main advantages while using databases, surveys. The first one is the rapid on-demand provisioning of new databases. I mentioned that the manual process of deploying a new database is a slow, time-consuming task that involves several people with different skill sets. Someone needs to allocate the needed virtual resources like a virtual machine. And then someone else will install and configure the database software. Databases service improve this task by automating the end-to-end process and the outcome that they require time to set up a database by end users can be reduced to a form weeks, two minutes. The second one is about reduced operational over it. As soon as a new database instance is created, there are many administration task to take care of, like a configuration, performance tuning, monitoring, upgrading, creating backups, et cetera. All those tasks are time consuming and creating more overhead on the IT team and end users while using databases. Elvis, a big portion of that overhead is moved to someone else. So end users can focus on other important activities. Developers are going to be one of the main end users for databases service. They are using a variety of database technologies while developing and testing new features and capabilities. That is a constant pressure on developers to release software code much faster and reduce the development cycles. Database as a Service can directly impact those developers productivity because now they can create databases by themselves On-demand without waiting for anyone else, spin up a new database instance, check something for a couple of days and then deleted. This can be done with a simple API calls or using some few clicks on a web-based user interface. In a typical enterprise company, we will find many database instances based on different database technologies. Database as a Service provides a more consolidated and unified framework to manage and operate all those databases through automation. As the urbanization, the IT team can configure the relevant database options that end user can use later on while they provision database instances by themselves. It helps the IT team ensure that these databases follow best practices and operate safely and securely. The next one is about cost saving. And here we need to be careful because it's a case by case. Database as a service can be rented as a service from a public cloud provider or the organization can build such capabilities in-house. With a private cloud. Still, most of the market use cases are about using databases service in a public cloud. So let's talk about it in the, in that context. Assuming we decided to use databases service from a public cloud provider, we don't need to purchase software and hardware for our databases, which can be substantial cost. Secondly, we can manage those databases with less manpower because many of the database maintenance task are outsourced to the cloud provider. Cloud computing, everything is measured down to hours, even, in even seconds, because we pay to cloud providers based on consumptions. It is also true when we use databases service, the cloud system will provide us granola metering of our database service consumptions. Then we can use this information to check and optimize the cost of running a specific application in a cloud environment. Scalability is probably one of the biggest advantages of databases service, making sure that the database resources I'll optimized to the performance load is a complex task. Experts are spending a lot of efforts to tune and optimize the database architecture to keep up with the traffic load. Today, there is a strong focus and the mend for horizontal scaling for large workload, okay, meaning scale out, which is a very complex and time-consuming task if you are operating your database by yourself in that context, a Database as a Service allows for on-demand scalability. We can scale up or down, scale out, or even with simple interface or completely automate this process. Assuming we are talking about the public database as a service, we don't need to endl any issue related to the underlying IT resources. And the public cloud provider has a massive amount of available capacity to scale the database instances is needed with the ability to distribute the database in different geographic locations. By the way, It can also be translated to a saving costs because we can reduce the database size at any given time during non peak usage times to save a cost. And the last main advantage is high availability, which is an important factor for production systems. A typical Cloud provider will have many datacenters located in different location worldwide. And by using a database or service, we can use this infrastructure to increase our database availability. The database can be distributed in different geographic locations. The data will be replicated to different data centers and the database computing resources will be deployed in different places to increase redundancy. In the event of a failure, the cloud system automatically reroute traffic to a replica, okay, in showing system availability, old times, by the way, the distributed database architecture in different geographic location will helps to reduce network latency for end users because the database a node are sitting closer to the end users. As you can see, there are many interesting advantages to consider while using databases, surveys from a public cloud provider. From this point, let's review the portfolio of the leading public cloud provider in the context of a database and service. A meaning we'll talk about Microsoft or x2, Amazon, AWS, and Google Cloud. 26. Azure - DBaaS Portfolio Review: Let's start with the Microsoft ASU website and review the options related to database as a service. Just keep in mind that they are continuously changing the look and feel of this website. So when you will open it, it may look slightly different, but I assume that 95% will be the same. And the products, we have a dedicated category called a databases. It is our starting point and I will get a web page dedicated to all types of database services available in a zoo. The first day bill is a use case oriented list with the show description. If you want to do something, then what is the recommended cloud service? If I will scroll a little bit below to this table, I will get a different way to filter the services. The top line is the list of database products. And the rows represent features related to each database products, but the auditor based service. Let's start with the first row, okay, which is a quick filter for relational database. We have the Azuela SQL database, postgres SQL, MySQL, or Maria DB. Each one of them is based on a different database management system. But all of them are examples of databases, surveys. Let's open the AZO postgres SQL database. It's a fully managed relational database service based on the postgres SQL engine. We say high availability and 99.99% SLA. On this page will be able to explore more information about this specific cloud service, like features, pricing, the commutation case studies in mole. Let's go back and move to the next line. Non-relational database NoSQL, ASU provide one type of cloud database service code, Cosmos DB. It is a fully managed NoSQL database service, meaning it is another type of databases, Elvis. As a reminder, no SQL is a generic term for several categories. Looking at this line over here, data models, we can see they support a data models by Cosmos DB, it's a multimodal, no SQL database. And now let's open this link. It's a fully managed NoSQL database with 99.999% availability. The database is used for many types of use cases. And I would like to review one of them. For example, this one, real-time IoT device telemetry. Ok, looking at this diagram, we can see many building blocks that are part of the Azuela ecosystem. In the middle, the ASU Cosmos DB is used here to store a vehicle telemetry data, for example, data from sensors in cars. It is coming from here. It is the vehicle telemetry data sources that are collected by a zoo IoT Hub. Okay, another component in a zoo, any event coming in is triggering a by AZO function used to store the data in the Cosmos DB. All the rest of the lines here represent integration which systems that consumed the collected data like a real-time in operational reporting or machine-learning pipelines, et cetera. By the way, this kind of data ingestion, it's an example of stream processing, okay, because it is triggered by events. Okay, let's go back to the main table. The third line is an in-memory database called a zoo cashflow. Red is. Okay, let's click on that. It is also fully managed in memory and key-value datastore and in-memory database for several typical use cases. The first one is to speed up applications with a distributed cache. It's very popular for this use case. And by looking on that diagram, we can see the zoo cash for red is building block will sit between the databases and the applications. And other use case is to store session data for web application, okay, session data has a simple structure, so it is suitable for key value model. And secondly, it requires fast axis and therefore the in-memory database would be a great match. It's a combination of an in-memory database in a key value database. And the last interesting use case is to use it as a message broker. It will be used to route messages between different building blocks inside a complex application. Okay, that's about database services in Microsoft or x2. Let's move to AWS. 27. AWS - DBaaS Portfolio Review: Our next review is about Amazon AWS, which has the broadest range of database services. Again, I will go to the product's main menu and select databases to get a dedicated webpage about all options available in AWS. This table is a nice summary of available database services based on database technology. The first column is the database type. We have a relational key-value in-memory document wide column, etcetera. Okay, then the second column is a show description of day use cases. For example, four, a document database they use cases are content management catalogs, user profiles. At the last column is the service name in AWS. Sometimes it is straightforward like Amazon Document DB. And sometimes the names is not helping to understand the nature of the database. Like in the relational category, we have Amazon or oa, okay, which is a sexy marketing name, but we cannot understand what is the meaning behind that. Before drilling into those database services, I would like to show you the Lebanon section over here. Aws is doing a great job with education. You can find here some great tutorials and training courses. Under the Getting Started tutorials, you will find step-by-step guides to get you started if you'd like to use specific service a, just select the relevant database service you would like to use. Am if you want to go deeper, okay, consider reviewing their courses under the training courses by use cases. It is a couple of hours of training per each topic, or select a training course by a service name, okay, like Amazon Document DB. Ok, let's go back to the list of services and start with the first category, meaning relational database. The first one is Amazon RDS, meaning Amazon relational database service. It is an umbrella name for managed a database services, meaning AWS is handling the routine database task such as provisioning, pitching, backup, recovery, failure detection, and repel. It can be used with a variety of SQL engines, okay, like Amazon's, ROA, postgres, SQL, MySql, MariaDB, Oracle database, Microsoft SQL Server. Let's go back to the list of services. The next one is Amazon oa. Okay? Amazon OA is the new SQL engine developed by Amazon, which is compatible with applications that are already using popular SQL engine like MySQL and Postgres. But it's submitted distributed SQL database service. This is a huge advantage because developer will not need to change the application code while replacing, for example, a MySQL database, wheat, Amazon, Noah. The main thing that is new with this database service is a distributed SQL engine designed for the cloud, okay, according to Amazon, it is much faster compared to running on MySQL or postgres SQL engines. Amazon, all I is offered as a fully managed service using the Amazon RDS service that we just reviewed. It is an attractive options for enterprise customers would like to replace existing databases like Oracle database, use a cost and get more performance with minimal migration of efforts. One thing to consider is that Amazon nova can scale out full leading queries, but it cannot scale out for writing queries. It must use a single write a master instance. I mean, without getting into too much details. But the bottom line is that it is not an entirely perfect distributed database. But on the other hand, it has compatibility with either SQL engines. I'm going back to the main list. Amazon Redshift is a data warehouse solution, okay, it's a no SQL technology. One of the most popular cloud-based data warehouse service. It is used to store and manage structured and semi-structured data for a variety of use cases like Business Intelligence, say predictive analysis, et cetera. We have a, a key value database service called Amazon DynamoDB. We have in-memory database service called Amazon ElastiCache. We two options, a document database called Amazon Document DB, et cetera. One unique database type that we didn't. So in a zoo is a time series database service called Amazon times stream. It is used for IoT application when collecting sensor data events, collecting metrics to monitor servers, et cetera. 28. GCP - DBaaS Portfolio Review: Moving next to Google Cloud, I will select products. And then they the bases. Here we have a nice table very similar to AWS with the database type, Google Cloud service name, and use case examples. The first relevant option is CloudSQL and umbrella name phone managed relational database service using the popular engines like MySQL, PostgreSQL, and SQL Server, similar to Amazon RDS, but there are specific features that are supported or not supported by Reach option, something that we need to check a case-by-case. The next one is Cloud Spanner, which is a Google solution for distributed SQL. It is a fully managed relational database with automatic software updates, an instance optimization. It is almost unlimited scale with, while supporting horizontal scalability with a distributed architecture, it has strong consistency and up to 99.999% availability. Cloud Spanner is sitting in the same category as Amazon OR meaning distributed SQL database service, but they are different in multiple aspects. For example, Cloud Spanner has its own SQL syntax, meaning any application that would like to leverage spinel will have to be Everly rewritten. On the other end, Amazon of OA is supporting out-of-the-box popular SQL edges. So if you, if the application is using such SQL H n, it will be easier to integrate that into Amazon OA. Cloud Spanner is a true distributed SQL meaning for reading as well as for writing. Unlike Amazon novella, which can only support distributed SQL for reading data. Okay, so again, it's a balance between options. Cloud Bigtable is a fully managed key-value NoSQL database service. It's a very popular service. It is used for large analytical in operational workload. It has up to 99.99% high availability. It is used for storing very large amount of data in a key value store and supports a high rate and write throughput at low latency for fast access for large amount of data, it is similar to Amazon DynamoDB. The next one is file store. It's a fully managed document, no SQL database service similar to Amazon Document DB. Okay. Perfect for web, mobile web and IoT application for specific use case memory stall is an in-memory database service similar to Amazon Elastic cache. It is being used for caching data for gaming applications, so shall chats, etc. Okay, that's about the main database as a service options in Google Cloud.