Data Science 2021 - The ONLY 'Data Science using Python' Starter Course You Need in 2021. | Python Profits | Skillshare

Data Science 2021 - The ONLY 'Data Science using Python' Starter Course You Need in 2021.

Python Profits, Master Python & Accelerate Your Profits

Lessons in This Class

60 Lessons (6h 46m)
    • 1. Course intro

    • 2. Software requirements

    • 3. How to make the most

    • 4. Big Picture - Why Python

    • 5. Python installation with Anaconda

    • 6. Basic scripts using IDLE

    • 7. Jupyter Notebooks

    • 8. Python modules

    • 9. Overview of Google colab and github

    • 10. Python Setup Quiz

    • 11. Control structures

    • 12. Functions

    • 13. Lists and tuples

    • 14. Variables and literals

    • 15. Data types - Integers, Floats etc

    • 16. Operators and booleans

    • 17. Classes

    • 18. Errors and exception handling

    • 19. Basic algorithms

    • 20. Generators

    • 21. Advanced lists

    • 22. Memory management

    • 23. Advanced classes inheritance and polymorphism

    • 24. Images, PDFs, Spreadsheets

    • 25. Exercise 1 + Solution

    • 26. Exercise 2 + Solution

    • 27. Python basics Quiz

    • 28. Numpy Motivation

    • 29. Arrays

    • 30. Matrices

    • 31. Random number generation

    • 32. Statistical analysis and computation

    • 33. Linear algebra

    • 34. Interpolation

    • 35. Linear regression from scratch using numpy

    • 36. Neural network from scratch in numpy

    • 37. Vectorization

    • 38. Boolean indexing

    • 39. Exercise 1

    • 40. Exercise 2

    • 41. Quiz

    • 42. Pandas Motivation

    • 43. Pandas intro Series and DataFrame

    • 44. Statistics on DataFrames

    • 45. Slicing DataFrames

    • 46. GroupBy and pivot

    • 47. Functions on DataFrames

    • 48. Advanced visualizations

    • 49. Exercise 1 + solution

    • 50. Missing data

    • 51. Merging or joining and concatenating

    • 52. Data export - HTML and SQL

    • 53. Exercise 2 + solution

    • 54. Pandas Quiz

    • 55. Big Secret#1 - Python secrets

    • 56. Big Secret#2 - Numpy secrets

    • 57. Big Secret#3 - Pandas secrets

    • 58. Capstone project with Solution

    • 59. Tips and tricks

    • 60. FAQs


About This Class

Why Data Science?

According to statistics, the demand for data scientists is growing at an astronomical rate.

The Bureau of Labor Statistics estimated that there’ll be around 11.5 million new job opportunities for data scientists by 2026.

But, the number of experts in this industry is insufficient. Thus, a lot of companies are offering high salary compensations and packages to hire the best prospects out there.

This means that by simply learning this skill, you’ve already opened your door for tremendous opportunities.

Your True Journey Of Becoming a Data Scientist Starts NOW! - The ONLY Starter course you will ever need to take your data science knowledge to the next level!

You also get these exciting FREE bonuses!

Bonus #1: 3 Big Insider Secrets
These are industry secrets that most experts don’t share without getting paid thousands of dollars. They include how experts successfully debug and fix projects that look like dead ends, and how they successfully launch a machine-learning program.

Bonus #2: 5 Advanced Lessons
We will teach you the advanced lessons that are not included in most machine learning courses out there. It contains shortcuts and programming “hacks” that will make your life as a machine learning developer easier.

Bonus #3: Solved Capstone Project
You will be given access to apply your new-found knowledge through the capstone project. This ensures that both your mind and body will remember all the things that you’ve learned. After all, experience is the best teacher.

Bonus #4: 20+ Jupyter Code Notebooks 
You’ll be able to download files that contain live code, narrative text, numerical simulations, visualizations, and equations that most experts use to create their own projects. This can help you come up with better code that you can use to innovate within this industry.

Meet Your Teacher

Python Profits

Master Python & Accelerate Your Profits


We are Python Profits, and our goal is to help people like you become better prepared for future opportunities in Data Science using Python.

The amount of data collected by businesses exploded in the past 20 years. But, the human skills to study and decode them have not caught up with that speed.

It is our goal to make sure that we are not left behind in terms of analyzing these pieces of information for our future.

This is why throughout the years, we’ve studied methods and hired experts in Data Science to create training courses that will help those who seek the power to become better in this field.


1. Course intro: Welcome to the Python basics for machine learning, data science and analysis course. We will start this introductory module with the plan and a roadmap for this course. First of all, let me tell you a bit about myself. Most importantly, I really hope that I am able to teach you and inspire confidence in you, as that is what a teacher and an instructor should do best. I have over seven years of teaching experience at the high school and college levels, and I have also done various online courses similar to this one. I have a PhD in machine learning, for which I used Python mostly to develop machine learning algorithms that can be applied to various other fields in order to solve important problems in those fields. I also love to explore all things tech related, and I'm very passionate about computers and programming languages in particular. I hope that you share some of these passions, since you are here. The target audience for this course is anyone who is interested in studying Python basics, whether or not they've had experience in other programming languages before. It is also for people who are interested in studying Python libraries that will be useful later on in case you want to pick up machine learning and data science. The libraries that we will cover are very, very important for these fields. These fields basically cannot exist without them, and you cannot work in these fields without knowing the libraries that we will be covering quite extensively in this course. If you like coding, even if you haven't done it before, please feel free to stick around. I'm sure there is a lot that you will learn and that you will find very interesting. I will do my best to make it that way. The emphasis in this course is on practicality. This means that we will only cover the theory that is absolutely necessary for understanding something. Otherwise, we will focus on getting things done and understanding how to use the right tools for that.
In order to do this, we will generally favor libraries that focus on ease of use, that accomplish things fast, and that are used for developing real-world applications. It's very important to us that the content we provide has real-world implications and that you are able to use it for developing real-world applications of your own or for working in a job that requires you to do that. Here is the methodology that we will be using. First of all, we will focus on how, when and why. You might have noticed that we've already covered how, when and why in relation to this course overview: we've talked about what we'll be teaching, why, a bit about how, and now we're talking more about the how. This will be done throughout the course for every topic and concept, so that you have a clear picture of how to do something, when to do it, and, perhaps most importantly, why you might want to do it in the first place. We will also be using a lot of quizzes. It's important to test your understanding of something, so we will have regular quizzes for you to make sure that you're following along. They will all be explained if you ever get stuck. There will also be plenty of exercises. It's also important to put what you learn to good use outside the classroom. The exercises we have prepared for you will help you do just that, and they will be as similar as possible to what you might encounter in the industry. Do not be discouraged though, as you won't have to spend hours on each one. Tutorial hell is when you just keep following tutorials about a technology or concept but still don't have a clear idea about what to actually do with what you're learning. We definitely want to avoid that by always pointing out things that you can build on your own. And here is the roadmap that we will be taking in this course. First of all, we will start with the basics of Python, starting from the installation, the various development environments available to you, and some basic coding in Python.
Then we will move on to NumPy and how it can be used to accomplish the necessary math for machine learning and other similar fields. After that we will move to Pandas, which is used for working with data. Think of Pandas as the Excel of the programming world. We will also have plenty of bonuses and interesting exercises for you. So without any further ado, let's get started.

2. Software requirements: In this video, we are going to talk about the software requirements for this course. As for the supported operating systems, we will be using Windows, but you don't have to use Windows. You can follow along on any operating system, such as Linux or macOS. All of the examples will be in Windows. However, there's not much difference other than the specific steps necessary to install the software. After that, the code remains the same across all the operating systems, and once you install the software, it works the same across all of them as well. Another piece of good news is that the only necessary software is Python, so we will only have to install a single thing. After that, most things will look the same on any operating system. However, don't go install Python just yet, because we will go into detail about how exactly to do that in future videos. For now, just know that you can use any operating system to follow along.

3. How to make the most: In this video, we are going to talk about how to make the most of this course. First of all, treat this as an online course, which it is. First try to watch it and understand things at the video level. Treat it like you would treat a TV show, where you would watch each episode and just enjoy it without thinking too much about what's going on. You should do that here as well, in my opinion. Watch each video as if it were a TV show episode. Then, after each video, re-watch that video and only then try to follow along.
So first of all, you watch it relaxed; maybe you understand some things and maybe some things aren't that clear. After that, you should re-watch the video, pause it, and try to follow along with the code that we're writing. Try to write some code in advance and see if you get it right. Then write it the way we've written it, and make sure that you understand what everything does in detail. You might need to do this multiple times. Take your time, do it for as long as you need to, and make sure that you don't move on from a video without fully understanding it. Next, I would strongly suggest that you delay watching the exercise solutions for as long as possible. I know it can get frustrating. Sometimes I've gone for weeks or months without knowing how to solve a problem. But the only way to really understand things is by figuring them out on your own, even if it takes a long time. So do your best to figure out the exercises on your own. They are not meant to be very difficult, but they are meant to ensure that you have a proper grasp of the things we are discussing. And last but not least, you should tinker with the code files that we provide for you. Change stuff around and see what happens. Before making whatever change you're thinking of, try to guess what will happen before running the code. So basically, try to set some exercises for yourself. Another interesting exercise that you can set for yourself is to break stuff and then try to fix it. Maybe delete a random line of code, or rename some stuff, or make some change that you are pretty sure will break things, but you're not sure exactly how. Then go debug the code, see exactly how you've broken it, and put it back together. Another thing you can do is try to make our code better: use fewer lines of code, or maybe make it cleaner. That's another sign of a proper and deep understanding of a piece of code.
Sometimes something jumps out at you in a way that you don't exactly like, not because it's necessarily wrong or inefficient, but maybe because you just don't like how something was named or how some problem was approached. If you think you can do better, or at least do it in a way that you are more comfortable with, then you should definitely do that and make sure that what you've done still works. This is a great exercise that will improve your understanding of things.

4. Big Picture - Why Python: In this video, we are going to talk about the big picture. Why do we choose Python? We choose Python basically because it has a huge ecosystem, which means that there are many libraries available that will make life easier, because a lot of the hard stuff has already been implemented in these libraries. We just have to use them to build nice things. Think of libraries as Lego blocks: you put the Lego blocks together and you get something nice. Putting the blocks together has its own difficulties, of course, but it's still much easier than if you had to build each piece yourself, let's say if you had to 3D print it yourself. There is also very strong community support. If something goes wrong with a library or we're not sure how to get it working, we can ask, or we can look at what other people have asked. It's very likely that whatever question we have already has an answer. And if it doesn't, if we're very unlucky, asking it ourselves will likely get us an answer. Another important thing is that it doesn't look like Python is going anywhere anytime soon. It's under constant development, and new versions are released every year or so. That's a good thing, and it indicates that Python is a healthy programming language that is here to stay. Also, there is a great machine learning and data science focus in Python.
A lot of scientists and universities use it for their research, and a lot of companies use it for implementing their machine learning pipelines as well. Because of this, there is also a lot of support from big players. A lot of library developers are from Google, Facebook and other such companies, like maybe Microsoft, and also from important universities like MIT, Caltech, Stanford and so on. And last but not least, Python is very easy to use. It hides the things that we don't care about, like some memory management and a few other things that other programming languages need us to specify in more detail. In Python, we don't need to do that. Because of this, it's easy to pick up even if you don't have programming experience, which is why you are very welcome to follow along if you have never programmed before.

5. Python installation with Anaconda: In this video, we are going to talk about installing Python. We're not going to be installing Python directly. We are going to use Anaconda, which is a manager for Python. It will allow us to have multiple Python versions installed at once, and it comes with some important libraries pre-installed. So basically it will make things easier. In order to install Python with Anaconda, go to the Anaconda website, go to Products, Individual Edition, Download, and pick the latest installer. You probably want the 64-bit one, the one corresponding to your operating system. Save it wherever you want. I've already done this, as it takes a while to download. Once it downloads, run it, keep everything as it is, accept all the default settings, and install everything it tells you to. It also prompts you to install PyCharm, so make sure you do that as well. Once all of this is done, you will have something that looks like this. Under Recently added, you will have Anaconda Prompt and Jupyter Notebook, and I think you will have, yes, Spyder. You might not have Kite, but that's fine.
So if you have these on your start menu or the equivalent for your operating system, you are good to go. Just to make sure that everything is working right, run this one, Anaconda Prompt. You should see something like this. Type idle here, so that is i-d-l-e, then Enter, and something like this should pop up. This is a console of sorts for Python where we can type commands and they will be executed. For example, x equals two, then x plus two, and we get the result. We set x equal to two, so the result of the expression x plus two is four. If this is all working, Python was installed successfully. We'll go into how to use it and what other programs were installed in future videos.

6. Basic scripts using IDLE: In this video, we are going to continue where we left off in the last video. If you recall, we typed x equals two and x plus two in IDLE, and we got this result. Well, there are a bunch of Python instructions that you can type in here and get their results. For example, let's say 45 plus 12, and you get the result. If you want to repeat the expression, you just go up with the arrow keys, hit Enter, and hit Enter again. You can also edit the expression. So for example, we can have a minus here. Let's do a multiplication. And let's do a division. For division, you don't use the colon like you would on paper, or how you've been taught to do it in math class. You use a slash. And you see you get the result; it's 3.75 in this case. Another important operator is the double star, which means "to the power of". So this will compute 45 to the power of 12, which is this huge number. But of course, maybe you don't believe me that this is the result. So you can test it out with something simpler that you know the answer to, like three to the power of two. Or maybe 100 to the power of 0.5, which is the square root of 100. And of course we get ten. There are many things you can write here.
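As a rough sketch, the calculator-style expressions typed into IDLE above look like this when written out. The variable names are only for illustration, and the operands of the subtraction and multiplication are assumed to be the same 45 and 12, since the video does not say them aloud.

```python
# Arithmetic operators tried in the IDLE shell
total = 45 + 12        # addition
difference = 45 - 12   # subtraction (operands assumed)
product = 45 * 12      # multiplication (operands assumed)
quotient = 45 / 12     # division uses a slash, not a colon
huge = 45 ** 12        # ** means "to the power of"
nine = 3 ** 2          # a simpler power that is easy to check by hand
root = 100 ** 0.5      # raising to the power 0.5 gives the square root

print(total, difference, product, quotient)
print(huge)
print(nine, root)
```

In the IDLE shell itself, the print calls are not needed: typing an expression and pressing Enter shows its value directly.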
We're not going to go into details just yet. We will cover most of Python's instructions in later videos. I just want to show you now how you can use IDLE as basically a calculator and as somewhere you can do quick tests of instructions. Another thing you can do is combine these operators, so you can do something like two plus three in parentheses, raised to the power of three. And this is 125. You can use lists as well, by typing numbers between square brackets. This would be a list. I don't want to go into too many details about what we can do with lists just yet. Now, this console-type program is very good if all you want to do is test out some quick expression. But what if you have multiple expressions? Well, in that case, you would go to File, New File, and it opens this editor here. Notice that this one looks kind of like Notepad, whereas the other one looks kind of like a terminal. So let's save this. You can save it by going to File, Save, or you can just hit the keyboard shortcut, Control plus S. I'm going to use the shortcut now. You can save it wherever you want. I'm just going to save it here and call it temp. And here we can write instructions just like we did previously. So let's say we write x equals 2020, and we want to print the square root of x. Well, if we run the code as it is here, you see nothing happens. In order to print results, you need to use a special instruction, and that is the print instruction. So write print, open parenthesis, then the string 'The square root of x =', a comma, and x. If we run the code now, we get the result here: the square root of x is equal to 2020. But of course this is wrong, because the square root of 2020 cannot be 2020. If you recall, for the square root we previously raised the number to the power 0.5. So let's do that here as well. Okay, run the code again. You run it with F5.
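The temp script described above might look roughly like the following. This is a sketch rather than the instructor's exact file: the list values and the names combined, numbers and root are assumptions made for illustration.

```python
# Combining operators: the parentheses are evaluated first
combined = (2 + 3) ** 3

# A list is written as values between square brackets (example values assumed)
numbers = [1, 2, 3]

x = 2020

# In a script, nothing is displayed unless we call print.
# First attempt: this prints x itself, not its square root.
print('The square root of x =', x)

# Fixed version: raising x to the power 0.5 gives the square root.
root = x ** 0.5
print('The square root of x =', root)
```

print accepts several values separated by commas and displays them on one line with a space between them, which is why the label and the number appear together.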
I think I forgot to mention that. Or you can go here to Run and click Run Module. The square root of x is this number here, which you can double-check with your own calculator. So let's talk a bit about print, just very generally. Print takes a string, and a string is just a sequence of characters. Strings in Python are denoted by these apostrophes. Think of this pair of apostrophes like a pair of parentheses: when you open one, you also have to close it. If you want, you can also use these quotation marks. They do the same thing, but make sure not to mix them, because that can be confusing. The only good reason to mix them is if you want to print one of the two kinds. If I want to print apostrophes inside the string, as I do here around x, then I am forced to use quotation marks to delimit the string, and if I run the code, you see the apostrophes get displayed. And if I want to print quotation marks, you can see that delimiting the string with quotation marks doesn't work anymore, so in that case I am forced to use apostrophes. All right, so that's it for IDLE. I just wanted to give you a very, very quick tour. Like I said, we will go into more details about Python in later videos, so don't worry if things aren't fully clear yet.

7. Jupyter Notebooks: In this video, we are going to talk about another development environment that Anaconda has installed. If you were paying close attention in the previous videos, you have probably noticed this Jupyter Notebook (Anaconda3) item in your start menu. Let's click on that and see what it does. It opens this terminal window and shows some text. Then it should open a browser and bring up something that looks like this. Of course, your folders will most likely be a bit different, depending on what you have on your PC. So go to New here in the upper right corner and click Python 3. It should open something that looks a bit like this.
What we can do here is write Python code again. So let's say x equals two plus two, then print x, and we can run this cell, which is called a code cell, by clicking Run here. You see we get the result here. This is called a notebook, and like in a notebook, you write stuff and you get the results, so it resembles a physical notebook quite well. We can also go to Insert, Insert Cell Above, then go to Cell, Cell Type, and click Markdown. Now we can write text here and format it a bit. So if we type a hash sign followed by Title and run this cell (we can also press Control-Enter to run it from the keyboard), you'll see it writes a title in big letters. The more hash signs you put, the smaller the title gets. Let's say we put two; you see it's a bit smaller. We put three, it's even smaller, and so on. This is kind of in the middle between IDLE and something like PyCharm. Remember that we said that IDLE is only good for one or maybe two files at most and for running quick instructions to test them out, and PyCharm is better for bigger projects with a lot of files, even hundreds of files. Now, this is in the middle. It's good when you have quite a bit of code, but you don't quite have tens or hundreds of files, and in cases where splitting code into multiple cells, as opposed to many files, is good enough. This is used very much in the machine learning community and the AI community in general, because experiments in those fields do not require writing a lot of code compared to something like a web application or a desktop application. The code you write for a machine learning experiment, or to test out some algorithms and theories, is not that much. It's quite little, in fact, maybe a few hundred lines of code. So for that, Jupyter is more than enough. And of course, the code that we wrote previously can also be used here. So we can do something here like y is equal to x to the power of three.
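Written out as plain Python, the code cells described here could be sketched as follows. The comments marking cell boundaries are only an illustration of how the code is split up in the notebook; they are not Jupyter syntax.

```python
# Cell 1: compute a value and display it
x = 2 + 2
print(x)

# Cell 2: an assignment on its own produces no output
y = x ** 3

# Cell 3: printing y shows the result of the previous cell's computation
print(y)
```

Because later cells can use variables defined in earlier ones, the cells need to be run in order the first time through.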
If we run this, nothing happens, because we haven't done anything with y, but we can print it in the next cell that appeared and run that. We get 64, because four to the power of three is 64. So we can run everything sequentially, and we can organize things quite well. Because it presents like a notebook, it can make things very easy to understand. So this is also a very good development environment to keep in mind. Another advantage is that it runs in your browser. So if you have it installed on a server, you can connect to the server remotely and run code on the server just as you would on your own PC, because the interface runs in your browser. In order to close it, you can simply close this terminal window. You can see here we get an error; it even says connection failed. So now all you have to do is close these two tabs and you are done. Everything is cleaned up. So that's it for Jupyter Notebooks.

8. Python modules: In Python, all of the libraries that we are going to work with are made up of multiple things called modules. We've already used modules, actually: the two files that we created in PyCharm in one of the previous videos are also called modules. So basically, when you hear the word module, it means a Python file. These are basically the Lego blocks that make up a library. We're going to make extensive use of various libraries in future videos. So I just want you to know that if you hear this terminology, a module, it just means a Python file that contains various functions that we can use: functions that other people have written and that we can simply make use of. We can basically use their code to accomplish our objectives, so that we write fewer lines of code ourselves.

9. Overview of Google colab and github: So far we've seen IDLE, PyCharm and Jupyter Notebooks. But the disadvantage of all three is that you have to install something in order to get them, right?
In our case, you have to install Anaconda, which got us all three. But what if, I don't know, you have a new laptop and you have to get something done really fast, you have to test something out? Or you're at someone else's PC and they don't have anything installed and you can't install anything. Or maybe you're on a restricted school or work laptop on which you can't install much. In that case, Google comes to the rescue. If you go to your Google Drive and right-click here, you have a bunch of things that you can create that may already be familiar to you, like Google Docs and Google Sheets. These are basically equivalent to Microsoft Office's Word documents, Excel documents, PowerPoint presentations and so on. But if you go here to More, they also have an equivalent for Jupyter notebooks. It's called Google Colaboratory, or Google Colab for short; even shorter, you might hear it referred to as Colab. If you click this, it takes a while until things load, but it shouldn't take more than a few seconds. Okay, it's done loading. If you take a look at this, it should resemble the Jupyter notebooks that we talked about. We have the cells, and we can write code in them, let's say x equals two, and we can run them. I run this with Control-Enter; you can also click the play button. The first time you run it, it might take a bit longer because it needs to do some initialization. Okay, I think it's done. Let's try again: Control-Enter. Okay, it did run the cell, but there is nothing to output, so nothing happens. If you want something to happen, let's do print x times two and run it again. And there we go, we get the result. We can add code cells with plus Code here and text cells with plus Text here. Here we even get this nice editor, which we didn't in Jupyter Notebooks. All right, so let's say I've ended the text cell with Shift-Enter, and you can see it prints hello.
So you can basically use this as an online version, a strictly online version, of Jupyter Notebooks. You can use it wherever you have access to a web browser and to the internet. Another thing I want to mention in this video is GitHub, and Git in particular. You will definitely hear about these a lot. GitHub is the place where the source code for various Python libraries, and various software in general, is stored by the development teams. It uses Git, which is version control software. I'm not going to go into details on version control software, but just know that this is where you can find various discussions about bugs, for example, by going to Issues for the project you're interested in. This is also where you can find the latest versions. If you want absolute cutting-edge builds, they might not be released on a project's website right away, but you might be able to get them by getting the code from GitHub and compiling it. This is also where you can find various data repositories. If we want to make use of some data, we might only find that data on GitHub in some cases. So it's good to be aware of it, and of Git in particular. If you want to learn more about Git and GitHub and how they work, I suggest you look up these keywords, maybe Google for a good tutorial, and read a bit about them. But don't go into too much depth, because it won't be very useful to you just yet. We will make only superficial use of this, if at all. The example I have here is the GitHub page for scikit-learn, which is a very popular machine learning library for Python. We can see that it's quite active: some changes were made only a few hours ago as I'm recording this video. It also links to the website and to the documentation. So it's definitely very helpful to visit a project's GitHub page.
Even if you are not directly interested in the source code, you might find some very helpful and interesting links on it.

10. Python Setup Quiz: In this video, you are going to take the first Python quiz in this course. Don't be scared, because it only has two questions. It's just a warm-up quiz. We haven't covered that much material yet, so there's not much to make a quiz from. Here is how quizzes are going to work in this course. You're going to get a question, this question, for example, and then some options. Then I will pause, and you should also pause the video when I say pause the video here, and try to answer the question. Then resume the video and see if your answer is the correct one, because I will then reveal the correct answer. So let's do that now. Which of these has an equivalent app from Google: IDLE, Jupyter Notebooks, or PyCharm? Also, the correct answers can be one of these, two of these or all of these. Okay, so pause the video here. I will also make a two- or three-second pause, and then I will go to the next slide, revealing the correct answer. So the correct answer is Jupyter Notebooks. We've talked about this: Jupyter notebooks have an equivalent in Google Drive called Google Colab, which works the same and looks very similar. The next question is: what is a Python module? A Python file that we can use in our apps, a development environment, or a machine learning library? The correct answer is a Python file that we can use in our apps. Python files in a project are also called modules. You will hear this name a lot, and they mean the same thing in the context of Python development. So that's it for the quiz. Congratulations if you got it right, and I hope you're ready for more challenging material up ahead.

11. Control structures: In this video, we are going to start off the next section, or module, of our course, called Python basics.
And while we've already discussed some of the basics of Python, like the basic data types, how to declare integers and strings, and how to do basic mathematical operations such as additions and multiplications, we're going to continue with some other language features that are going to be indispensable in future videos. So if you don't remember how to run Jupyter Notebooks, which is what we'll be using for the following videos, you have to go to your Start menu and open this Jupyter Notebook (Anaconda3) entry. If you don't have it here anymore, simply type it like this, and it should show up amongst the first results. Click it, and it will open a browser and take you to something that looks like this. But you most likely won't have this folder, so you will have to create it. I do recommend that you create it, so that you have the same folder structure that I do and it's easier for you to follow along. In order to create a folder, go to New and click Folder. This creates it with the name Untitled Folder, so select it, click Rename, and give it whatever name you want. I suggest you give it the same name that I did, and that is Python basics. I already have the folder, so I'm going to delete this new folder now, and instead I'm going to create a notebook. So go to New and click Python 3 here. Okay, let's give this notebook a title as well. I'm going to call it 1. Control structures, because in this video we are going to talk about control structures, and I'll tell you what those are in a few seconds. But until then, I want to talk about why we are going to use Jupyter Notebooks for this section, and actually for most of this course. It's because the code we'll be writing is not going to be very complex, and the organization structure of Jupyter Notebooks will allow me to explain it better and you to understand it better. So that's why we're going to use Jupyter Notebooks.
We're also going to write titles here that will tell you what each collection of cells in the code does, so that you are able to find things faster and to compartmentalize the concepts I'm talking about. But don't worry if you don't fully understand the things I'll be talking about in this section, because they will become a lot more obvious once we use them in more realistic applications later on in the course. Right now, in this section, I just want you to get a scratching-the-surface level of understanding. So if I'm able to do that, I'm happy, and you should be too. So first of all, let's set this first cell's type to Markdown. And we're going to title it the same: 1. Control structures. If we run this with Shift+Enter, we get this nice title. You can do this too. I recommend you do it, so that the things you are learning are better structured, but you don't have to. From now on, I'm not going to do it on video either. I'm most likely going to start the next videos with the titles already written, and on video I'm just going to write the code and explain it. But I want you to familiarize yourself with Jupyter Notebooks better in this video, and that's why I'm doing everything on video this time. Okay. So before I give you a definition of control structures, I'm going to start with an example. Let's say we have x = 5, and we want to do different things depending on the value of x. So let's say we don't know what x is; pretend we didn't just write x = 5 here. If we wanted to check whether, let's say, x is lower than five, we would write this: if x < 5, followed by a colon. This colon here is very important. Also notice that if we hit Enter here, there is a space here. And in fact it's not just a space, it's a tab. So if we hit our Backspace key and then hit Tab, see, it puts us in the same position. So in this case, let's print "x lower than 5". Now let's say we want to do something else.
If x is equal to five, in that case we write elif, which is read as "else if", even though it's just one word here. In this case, the condition is x == 5. For equality we always use two equals signs, because one equals sign means assignment, or "takes the value of". So this is read as "x is assigned five", and this is read as "else if x is equal to five". And then we have else, printing "x larger than 5". If we run this cell, of course, it prints this, because this condition is the true one. If we set x equal to three and run this, this gets printed. And of course, if we set it to something like eight, or anything larger than five, this part is printed. These are also called conditional structures, or conditionals. Okay. Another type of control structure is what is called an iterative structure, and this involves using a for loop. So let's say for i in range(5), print(i). If we run this, it prints every number from 0 to 4. So range(n) will give you a collection made up of the numbers 0, 1, 2, and so on, up to n minus 1. Now, I know that some of you who already know Python might be thinking, "but range doesn't actually give you a collection". And yes, that's true, but let's just consider it a collection right now, because that's easier to explain for beginners, and semantically there's not that much of a difference anyway. By the way, this hash sign means a comment, so this will not be executed as code. We also have a while loop in Python. So let's say we have x = 5 and i = 0. We can write while i < x, print(i), and i += 1. If we run this, we get the same output as we did with the for loop. What this does is repeat these instructions while this condition is true. So we also have a condition here, just like we did here. i += 1 means that i will take the value of i plus one, so it is a shortcut for i = i + 1.
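The three control structures described in this lesson can be sketched roughly as follows. The exact strings printed in the video are not recoverable from the transcript, so the messages here are assumptions.

```python
x = 5  # try 3 or 8 to exercise the other two branches

# Conditional structure: exactly one of the three branches runs.
if x < 5:
    print("x lower than 5")
elif x == 5:  # == compares; a single = would assign
    print("x equal 5")
else:
    print("x larger than 5")

# Iterative structure: range(5) yields 0, 1, 2, 3, 4.
for i in range(5):
    print(i)

# The same numbers with a while loop.
i = 0
while i < x:
    print(i)
    i += 1  # shortcut for i = i + 1
```

Note that the indented lines under each colon form the body of that structure; the indentation is what tells Python where the body ends.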
There are a few other control structures, but we don't want to complicate things too much just yet. These are the most important ones, and the ones that you will find used most often in this course and in Python in general. So I hope you have some level of understanding of them. We will be using them a lot and explaining things multiple times as we go along, so if you don't fully understand them yet, don't worry, that's perfectly fine. Feel free to continue watching the next videos.
12. Functions: In this video, we are going to talk about Python functions. So what's a function in Python? We've actually already seen at least one function: I've shown you print. So print("Hi"), and if we run this, you'll see it does something, it has an effect. So print is called a function, and "Hi" is called an argument. You might also hear it referred to as a parameter, but strictly speaking, the correct term here is argument. But okay, we don't have to go into these academic discussions. If you want to call it a parameter, that's perfectly fine. We want to focus on the practical aspects anyway. Okay, so how can we write our own functions, and why might we want to do that? First of all, let's talk about the how. How do we write our own function? Well, we do that using the def keyword: def, and then we give it a name. Usually you should give it a suggestive name, so that someone reading the function knows what it does simply by reading the name. When I read print, I think that it prints something to the screen, and that's what print does, actually. So its name is very good. It's very suggestive, very indicative of what it does. So, my function. And here we can write the parameters that this function takes. Let's say it takes an x and a y, and let's say it returns two times x minus y. Okay, let's run this cell, and you can see that nothing happens. Why does nothing happen? Why doesn't this get printed?
Well, this doesn't get printed because this is just a function definition. You have to call the function in order for it to actually be executed. And the way we call the function is by typing its name and giving it the necessary parameters. In this case, we need to pass in two values, and these should be numbers, such as integers, because we have mathematical operations. So let us say 3 and 4. If we run this, we get 2. Why do we get 2? Because x and y here will be replaced by 3 and 4, so the output of this function will be two times three minus four, which is six minus four, which is two. We can also store the result in a variable. Let's say 3 and 5 this time. And if we print x, we get 1. Okay. So it's very important not to confuse the return statement with a print function call. print shows the number on the screen. return causes that value to be output for someone else to save somewhere, like we did here: we saved this return value to x, and only then did we print it. So that's the basics of functions. I hope things are a bit clearer now regarding what a function is and how to use one. Why must we use them? Well, because it makes organizing the code easier and better, and because we can reuse code. See, I called the function once here and got an output, then called it here again. If I need to use two times x minus y many times in my program, it's a lot better to have it put in a function rather than having two times three minus four here, two times three minus five here, and so on. If I do it that way, things can get very confusing once I have a few hundred lines of code. So that's it for the why.
13. Lists and tuples: In this video, we are going to talk about lists and tuples. A list is a collection of multiple elements, and it is written between square brackets, like these. So let's say we have here 4, 5, 6. If we print this list, we get this. And we can access each element individually, like so: lst[0]. And let's copy-paste this for 1 and 2 here.
And if we run this cell, we get 4, 5, 6. These are called indexes into the list. The indexing of list elements starts from 0: 4 is the 0th element, 5 is the first element, 6 is the second element. So counting elements starts from 0. This is true for most programming languages, not just Python, and it's very important that you keep it in mind. We can also change individual elements in lists, like this: lst[1] = 100, let's say. And now if we print lst[1], we get 100. And if we print the entire list, we get 4, 100, 6, because we changed the element at index 1. You might hear people refer to this as the second element. When you hear that, consider that they might be referring to the element at index 1, because the element at index 1 is the second element from the left. Although, like I just did, other people refer to it as the first element, and they consider it to be index 1 in that case. So you have to rely on the context when you hear this terminology, first, second, et cetera, because it's not always clear if they mean counting from 0 or counting from 1. To avoid confusion, I suggest referring to them as the element at index 1, 2, 3, or 0, and so on. If you say "the element at index", it becomes much clearer that you are counting from 0, because everyone knows indexing starts at 0. We can print the number of elements in the list using the len function, len(lst). We have three elements here. We can also iterate over the elements in a list with a for loop, like this: for elem in lst, print(elem). In this case, elem will take in turn the value of each element in the list. Alright. Another important thing to know here is that lists do not have to contain elements of the same type. So we can also do lst = [1, "hello", [5, 6]], so an integer, some string, and another list. If we print this lst, we don't get any error, and it prints the list we've just defined. So we can do this in Python.
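The list operations covered so far can be collected into one short sketch. The variable name mixed is an assumption introduced here so that the two lists can live side by side; the video reuses lst.

```python
lst = [4, 5, 6]

# Indexing starts at 0.
print(lst[0], lst[1], lst[2])

# Replace the element at index 1.
lst[1] = 100
print(lst)

# Number of elements in the list.
print(len(lst))

# elem takes, in turn, the value of each element.
for elem in lst:
    print(elem)

# Elements do not need to share a type; a list can even hold another list.
mixed = [1, "hello", [5, 6]]
print(mixed)
```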
And if we print, let's say, lst[2][1], what do you think this will print? Pause the video here, and I'll also pause for a couple of seconds to let you think about it. So this prints 6. Why? Because lst[2] will get this element, and this element is a list. So index 1 of this element will get the 6. Now, tuples are the same, except we write the elements in round brackets. And the only difference is that we cannot change the elements once we've created the tuple. So we cannot do tpl[1] = 200, for example. We get this TypeError, which says that a tuple object does not support item assignment. So we cannot change the items, or the elements, once we've declared the tuple. Other than that, for everything we've done so far, they behave the same as lists. So that's it for lists and tuples so far. There are a few other things we can talk about, but I'll leave those for later.
14. Variables and literals: In this video, we are going to talk about variables and literals. If you already have some experience in programming, you might find it a bit odd that we are only talking about this in the fourth video. Well, that is because I feel that after being able to run a few Python instructions and seeing the results, you're a lot more likely to get a better understanding when diving into this terminology. So a variable is pretty much everything that we've given a name to so far. So when we said lst = [4, 5, 6] in the previous video about lists, for example, when we said x = 2 in the first few videos, when we declared some string. All of these, lst, x and my string, are variables. And all of these, [4, 5, 6], 2 and the string "hey", are literals. Okay, so why are they called variables? Well, they are called variables because we can change their value. So we can do lst = [1, 2]. And let's add some print statements to convince ourselves that we've actually changed the value: print(lst) here, and print(lst) here. And usually you would do this in multiple cells.
But I just want to show you that there is no trick going on with the cells. Okay. So if we run this, the first print statement prints out 4, 5, 6, and then the second one prints 1, 2. Why? Because we've changed the value assigned to the lst variable. Okay. And why are these called literals? It's because they are constant values that will not change, so they kind of literally stay the same. You can think about them as constants as well. You cannot change this: once you've written "hey" here, it stays that way. Of course you can come and change it to something else, but then you've deleted those characters, so it's not the same as reassigning things like we've done with the list here. print, for example, is not a variable here; that's a function name, something you call. So variables are just things that you can assign values to. If we have a function that takes a parameter, this parameter is also a variable, because we are assigning a value to it when calling the function. And also, inside the function we can change that value: param = 6, and see, it printed six plus one. There is a bit more to say about parameters and how they work in functions, and what happens to them when you change their value in a function, but that's for another video. In this video, I just want to make sure that you understand what we refer to when we say we're going to use this and that variable, or this and that literal.
15. Data types - Integers, Floats etc: In this video, we are going to talk about data types, or rather continue talking about them, because we've already talked about most of them. So these are the basic data types that are built into Python. You have these available by default, without having to import anything and without having to do any other work. They are there for you all the time. First of all, we have integers; we've seen them. They are stuff like 7, 13, 20, so basically whole numbers.
You declare them like this, with no quotation marks, no apostrophes, nothing. And what I haven't told you about them is that you can also obtain them from strings. So if we have the string '2020', note the apostrophes, we can get an int out of it, as long as it's possible to do so, that is, as long as that string only contains what an integer would contain, so as long as you don't have any letters here. We do it by using this int function: just write int, open parentheses, and put the string in. And if you run this code, you can see you get the output like this. Floats work mostly the same way. You can convert them from strings using the float function. A float is a number that's not a whole number, so something point something: 4.6, 3.14, and so on. So let's make this not an integer, so you can convince yourself that this works. Let's add a .5 here and run this again, and see, we now get a number with .5 in it. We've also seen strings. A string is anything between apostrophes or quotation marks. And we can also convert integers and floats, or other things as well, to strings by calling str like this, just like we did with int and float. And if we print them, you see, we get the result like this. But how do we know that this is a string? We know that "hi" is a string, that's obvious, it cannot be an integer. But how do we know that this is a string and not a float? Because it looks just like the float here. So why is it a string? Well, in order to get the type, we can use this type function on the variable, and that will tell us the underlying type of that variable. So each variable in Python has a type. If you assign some number with a dot in it, the type will be float for that particular variable. If you assign an integer, the type of that variable will be int, and so on. For strings, if you assign something in quotation marks to a variable, that variable will be a string.
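A minimal sketch of these conversions, with type used to tell apart values that print the same way. The variable names and the sample values "2020" and "2020.5" are assumptions picked for illustration.

```python
year = int("2020")       # string -> int
half = float("2020.5")   # string -> float
text = str(half)         # float -> string

print(year, half, text)

# half and text look identical when printed, but their types differ.
print(type(year))
print(type(half))
print(type(text))
```

Calling int on a string that contains letters, such as int("20a"), would raise a ValueError instead of returning a number.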
And we can get that type by calling the type function on the variable. We've also talked about lists. Converting lists to and from strings is not something you want to do in general. It can be done, but it can be problematic, so I don't want to go into it right now. And we have this append method that I didn't show you in the video about lists. What this does is add its argument at the end of the list. So if we run this cell and then this cell, we get this output. Make sure you run each cell in order, so that you get the results I'm showing you here. Don't be intimidated because I have all the code pre-written in some videos. I'll have to do that to make the presentation a bit faster. But don't be scared if you think I'm moving too fast: just pause the video, or rewind if you have to. There is no reason you shouldn't watch the video at your own speed. Even if I'm going fast, you can easily stop me by pausing the video, and I encourage you to do that if you feel like you're getting lost. Okay. And now we get to the new things that we haven't talked about. First of all, we have sets. A set is just like a set in math, if you're familiar with that. So it can only contain distinct values. It's like a list, but values cannot repeat, and it's written between these curly brackets. So if we try to create a set out of 1, 2, 3, 2 and we print that set, we will get 1, 2, 3. Let me just run this again, because I had a 5 there by mistake. Okay, see, we get 1, 2, 3. We do not get 2 two times, because, being a set, it only keeps the distinct values. We can add to it by using the add function and passing it a parameter, just like append. But it's called add here because sets also do not keep an order. So if we do here, for example, I don't know, let's add -5 and run both cells. Okay, it gets displayed like this, but the order doesn't really make sense in a set. This is just an artifact of displaying. The numbers are just there.
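As a rough sketch of append on a list next to add on a set; the name my_set is an assumed spelling of the variable spoken in the video.

```python
lst = [4, 5, 6]
lst.append(7)          # goes to the end of the list
print(lst)

my_set = {1, 2, 3, 2}  # the repeated 2 is kept only once
print(my_set)

my_set.add(-5)         # no position involved: sets are unordered
print(my_set)
print(len(my_set))
```

The order in which a set is displayed should not be relied on, which is why the example checks only membership and size, not position.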
We cannot do, for example, print my set[1]. If we run this, we get this error, because we cannot use indexing on sets. They're just a collection; they are not a list that has indexes. So I'm going to leave this in, but let me add a comment here, so that if you download this file, you know that it's bad. Okay. And finally, we have dictionaries. Now, you can think of dictionaries as a generalization of lists. Lists only have integer indexing, from 0 to the number of elements minus 1. Well, in dictionaries, the indexes can be almost anything. Usually they are integers or strings. So we can have here a dictionary defined with curly brackets as well, with the key on the left, then a colon, and then the value for the corresponding key. And this is repeated as many times as you want. So dictionaries are also thought of as a collection of key-value pairs. This is a key-value pair, with the key being "Mike" and the value being 12. So we can imagine that this dictionary stores the ages of some people: Mike is the name and 12 is Mike's age. We can interpret it that way. And we print my dictionary["Mike"]. Note the square brackets, just like for lists, and the index is now the string "Mike", which is also a key here. This will print the value associated with that key. And see, the value is 12: we've written 12 as the value for Mike, so it prints 12. We can set new keys like this: my dictionary["Ian"] = 33. And if we print that, we get 33 again. And the keys do not have to be of the same type. So we can also do my dictionary[100] = 300, or you can also put a string here; it doesn't have to be an integer. The values also don't have to be of the same type. So maybe let's put a float here, 6.5. And if we print this, see, we get the value that we've set. Dictionaries are very powerful, and we'll likely make use of them quite a lot in future videos. So make sure you understand them as well as possible.
16. Operators and booleans: In this video, we are going to talk about operators and Booleans. This is also something that we've seen already, but I didn't mention it explicitly. So what is an operator? Well, if we write something like x = 4 + 2, then we have two operators in this instruction, or expression: we have plus and we have equals. What plus does is add up the thing to its left with the thing to its right, in this case the two integers 4 and 2. And what equals does is set the thing to its right into the thing to its left, or in other words, it assigns the right-hand side to the left-hand side. This is also terminology that you will hear a lot: right-hand side and left-hand side. So this is what an operator does: it acts on one or more operands. The operands here are 4 and 2 for plus, and, for equals, x and this expression, or rather the result of this expression. Other operators that we have are minus, minus-equals, plus-equals, times-equals, and so on. And power: so this would be 9. Parentheses are also considered operators. So if we do something like x = (2 + 3) * 9, then these parentheses are also considered operators. And this isn't really something new; like I said, you've seen all this before in previous videos. We are just introducing some new terminology here. Now, what is a Boolean? A Boolean is a true or false value. So we've already done stuff like if x < 5, print this, and, let's say, else, print "larger than or equal to 5". Let's run both of these cells. It prints "larger than or equal to 5" here, because x is this, so probably 45. Alright. Now, this expression here has a value: it's either true or false. x is either lower than five or not lower than five. So we say that the value of this expression is a Boolean. And we can actually see this more clearly by doing this: print the expression itself.
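A small sketch of the operators named in this lesson, ending with a comparison whose value is a Boolean. The names y and is_small are assumptions added for illustration.

```python
x = (2 + 3) * 9   # parentheses are evaluated first: 5 * 9
print(x)

print(3 ** 2)     # the power operator

y = 10
y += 1            # same as y = y + 1
y -= 5            # same as y = y - 5
y *= 2            # same as y = y * 2
print(y)

# A comparison is an expression too, and its value is a Boolean.
is_small = x < 5
print(is_small)
print(type(is_small))
```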
So this is an operator that gives you a truth value, something that is either true or false, or in other words, a Boolean value. It's just like the plus operator, but it doesn't give you a number, it gives you a Boolean. This is also a data type. So if we run this, see, it prints False. And if we run this, it should print True. Okay. So that's it for operators and Booleans. I hope this little bit of theory and terminology talk has been useful.
17. Classes: In this video, we are going to talk about Python classes. Classes allow us to make our own data types. We've already talked about integers, floats, lists, dictionaries, sets, and so on. We've touched on the most important data types already available in Python. But sometimes these are not enough and we need to make our own. And here is how we can do that. We define a class with the class keyword. Then we need to give it a name, just like we have integers, floats, lists. So if we wanted to make our own list class, we would maybe call it List. But let's do something simpler. Let's say we want a class named Dog. Then a colon: just like when we define a function, we need a colon here. And there is one function in classes that you should almost always have, and that is the init function, which is written this way, with two underscores before and after: __init__. This is also called the constructor of the class. I am just giving you the terminology now, and I'll explain exactly what it does in a minute. It must take a parameter called self, plus any other parameters that we want, but self should always be there. Let's say we want a parameter called name here, and we want to store the name in a field that will also be called name. In that case, we write self.name = name. So to recap a bit: this is the class name. This is the class constructor, which you should almost always write. This is a parameter that should come first in every function or method.
Functions that are part of a class are referred to as methods, and self should always be the first parameter in these methods. This is a parameter that we chose to put in. self.name will create a field in the class Dog, and we assign the name parameter to it. Now let's make a function. Let's call it bark. This will just return a string which will contain the dog's name. So this part here, using .format, will cause this curly bracket pair to be replaced with the value of the parameter given to format. If you're using a recent version of Python, this was introduced in 3.6, we can also do the following. We can add an f here, before the string, and then write what we want to be put in the string directly between the curly brackets, like this. This is basically a formatted string, or string interpolation; you will hear it referred to in this manner. Okay, let's run this cell. Well, nothing happens, because we haven't done anything with the class. Just like when you define a function: if you don't call that function, nothing will happen. So let's make something happen. Let's create a dog. Let's call that variable maybe maxx, with two x's. The way we instantiate a class, so the way we make an object out of that class, is by calling Dog, just like you would a function, and passing in the parameters defined in the constructor after the self parameter. So all we need to pass in here is the name, and Max is the name. Think about this as the equivalent of doing my int = 5. Five is an instance of the integer class. Or, in the case of lists, my list: this particular list containing the integers 1, 2, 3 is an instance of the list class. And in our case, this dog with the name Max is an instance of our Dog class. Let's do one more. Let's say fido is a dog called Fido. And if we run this, nothing happens again, just as nothing would happen if we had written this. Okay?
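A minimal sketch of the constructor and of the two string-formatting styles mentioned above. The variable name maxx and the text "barks" are assumptions based on what is spoken in the video, and the bark method is left out here on purpose.

```python
class Dog:
    def __init__(self, name):
        # self is the instance being created; keep the name on it as a field
        self.name = name


# Calling the class runs __init__; self is supplied automatically.
maxx = Dog("Max")
fido = Dog("Fido")

# Two equivalent ways of placing a value inside a string.
with_format = "{} barks".format(maxx.name)
with_fstring = f"{maxx.name} barks"   # f-strings need Python 3.6 or newer

print(with_format)
print(with_fstring)
```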
In order to make something happen, we must use these variables, maxx and fido, somehow. And the way we do that is by calling methods on them. For example, print(maxx.bark()) and print(fido.bark()). And if we run this, we get this error. So why do we get this error? If you are not sure, rewind the video a bit and try to see what I mentioned, but forgot to do, in the bark method. Okay. If you're back, I hope you figured out what that was. It was that every function or method written in the class should have, as its first parameter, this self parameter. And I'll explain why in a minute. Now we need to run this cell again, and this one again too. And now if we run this cell, see, it works. So this function gets called and the string gets returned here, with the name added here. So Max barks, and Fido barks. Okay. Now what is the self parameter? Because, see, we added it to the bark function to get rid of that error, but we didn't pass it in when calling that function or method. And the same is true for the constructor: when we instantiate the class, the constructor gets called, but we didn't pass in this self parameter. Well, that self parameter is a special parameter that refers to the instance that we are working with, and by convention it always has the name self. So in the case of the constructor, this self here, when doing this, will refer to the instance of Dog just created here. And when calling bark, this self parameter is automatically passed in as the object, or the variable, that we call the function on. So when we call maxx.bark(), self will be this maxx here. And when we call fido.bark(), fido is automatically passed in as the self parameter here. Okay. And we can also do print(maxx.name), for example. This prints the value of this field in the class, which was stored in the constructor. So that's the basics of classes. I hope it makes sense. And if it doesn't, it will make a lot more sense as you start using them more often.
18. Errors and exception handling: In this video, we are going to talk about errors and exception handling. Errors, or exceptions, are cases that make the program crash unless we handle them. And we've already seen one example, just in the previous video, when we discussed classes. If you recall, we forgot to add the self parameter to a class method and we got an error. So let's see how we can handle something like that. But I'm not going to write the class again. So let's just write something like given int equals input, with the prompt "Enter an integer". And let's print the square of this given integer. We're going to use formatted strings for this, or string interpolation; we've covered them in a previous video. So let's say "the square of the given int is", and then given int squared. I'll explain what this input part does in a few seconds. First, let's run this cell, and see, it prompts us to enter an integer. So basically our code is now stopped somewhere around here. See, it's not going on to the print. And if we look here to the left, we have this asterisk, which means that this cell is still running right now. So what input does is prompt the user to enter something from the keyboard. Alright, so here we can type, let's say, 6, and hit Enter on the keyboard. And we get this error here. It says TypeError: unsupported operand types for the power operator, or pow, because they're the same thing: str and int. And how we should read this is that this power operator, this two-asterisk operator, does not work with a string on the left-hand side, because it says str here, and an int on the right-hand side. And of course it doesn't, because it doesn't make any sense to raise a string to a certain power. It only makes sense to raise integers or floats to a certain power. And the problem here is that this input function returns what it read from the user, so 6 in our case, as a string. So if we enter 6, it will return the string "6", not the integer 6 or the float 6.
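The root of that TypeError can be shown without a keyboard prompt. In this sketch a hard-coded string stands in for whatever input would return, and the names typed and given_int are assumed spellings.

```python
typed = "6"          # stand-in for the value returned by input(): always a str
print(type(typed))   # it is a str, so typed ** 2 would raise TypeError

# Converting to a number first makes the power operator applicable.
given_int = int(typed)
print(f"The square of the given int is {given_int ** 2}")
```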
In order to fix that, we must cast it to a float or an int, like this. We have already seen this type of casting in a previous video, but we have to do it here as well. And now if we run this and enter 6, see, it computes the square of 6 correctly, without giving an error. But now what if we run this again and enter something that doesn't make any sense, like this? Any other string, anything that's not a number. Again, we get an error. Now, we don't want this, because this makes the program crash, and basically we don't want it to crash. We'd like to say something like, well, you entered something that's not valid, please try again. So in that case, what we would do is use a try-except block. We put our code in a try block, and write except here, and after except we have to write the type of the exception. In this case it's ValueError. You can always get the type from here. So, except ValueError. Let's store this value in a variable, so that we can also print the underlying problem. And we do this like so: except ValueError as v, colon, and print "Invalid input, please try again". And if we run this code now and enter something bad, see, it says "Invalid input, please try again", but it doesn't permit us to try again. So let's fix that as well. In order to fix that, we can put this whole thing in a while loop, in a while True loop, actually. And this will cause the loop to execute until we forcibly stop it, and we'll see how we can do that. Okay, so let's see how this is going to work. First of all, if everything is correct, we're going to stop the while loop by using break. When the execution encounters a break statement, it will terminate the innermost loop that contains that break statement. So if everything is all right, we terminate the while loop. And if we get an error, the error will be at this line.
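The retry loop being built here can be sketched as a small function. The function name ask_for_int and its read parameter are hypothetical additions, not from the video; read exists only so the sketch can be run with scripted answers instead of waiting on the keyboard.

```python
def ask_for_int(read=input):
    """Keep asking until the entered text converts to an int."""
    while True:
        try:
            given_int = int(read("Enter an integer: "))
            break  # reached only if int() did not raise
        except ValueError as v:
            # v carries the underlying problem reported by int()
            print("Invalid input, please try again:", v)
    return given_int


# Scripted answers: one bad entry, then a valid one.
scripted = iter(["six", "6"])
value = ask_for_int(lambda prompt: next(scripted))
print(f"The square of the given int is {value ** 2}")
```

Calling ask_for_int() with no argument falls back to the real input function and prompts at the keyboard.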
So once we encounter an error in a try block, nothing after the line that contains the error gets executed, and the execution jumps to the except block. In the except block we don't have any break statement. So if an error appears, this message gets printed, and then the loop body executes again, prompting us to enter an integer again. Let's see if that works. Enter an integer: let's enter something bad. Invalid input, please try again, and it prompts us to try again. It keeps prompting us until we enter something that's correct, and then it stops.

Now, what if you get an error of a type that you don't expect? Let's say you are not sure that ValueError is the only thing you can get here. In that case, you would add the most general type of exception, which is except Exception as e, and print a more generic error message, maybe. If we run this, of course it still prints the first message, because all this code throws is a ValueError. But we can make it throw the TypeError that was printed initially by removing the int cast. And now if we enter a string, or anything really, see, it goes to the generic "please try again". It's very important that you catch the most general type of exception at the end, because the except clauses are checked in order and a general one placed first would catch everything. So that's it for exception handling so far.

19. Basic algorithms: In this video, we are going to talk about basic algorithms. Most programs we'll be writing will be made out of algorithms. Algorithms are a way of expressing the solutions to problems in programming languages, so it's very important to know some of the algorithms that are used in almost any program. We'll look at a few here. For example: given a list, what is the minimum or maximum element? Or: given an integer, find the integer obtained by reversing its digits, and we'll assume it doesn't end with a 0. This will give you an idea about how to put most of what we've seen so far together into an actual program that does something.
So let's write the first one, and let's write it as a function: def get_min (it's very similar for max, so I'll leave that as an exercise) of lst, which is our list. The way we do this is we assume that the minimum is the first element in the list. We'll also assume the list has at least one element. After that, we can just iterate over the list: for elem in lst, and we compare it with the minimum. If elem is lower than the minimum, then we set the minimum to be elem, because we found a smaller value than what we assumed was the minimum, than what we knew the minimum to be up to this step. And at the end, we return this minimum.

Let us see some examples: print get_min of this list. What do you think the answer should be for this? I think it should be minus five, because this is the smallest element. Let's see if that's true. Let's run this cell. And indeed we get minus five. This is the algorithm for determining the minimum element in a list. For the maximum, you would just use larger-than here. Fortunately, Python makes things easier and provides a built-in function that does this for us, and that function is min. So if we run this, see, it gets the minimum for us.

Now, let's do the second problem. For this one, there is no built-in Python function, although we will be able to come up with a very short solution for it, a one-line solution, or a one-liner, as it's called in programming. But we'll have to study a bit more until we get to that level; you should be able to do that by the time we finish this section. So let's make a function for this as well, and let's call it reverse, of number. Let's declare a variable, reversed_number, and let's initialize it with 0. This is where we will build our reversed integer. The way we do this is we say: while number isn't 0, we get its last digit by finding the remainder of the division of number by ten, or number modulo ten.
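The minimum search described above can be sketched like this. The example list is hypothetical (the transcript only tells us that its smallest element is minus five), and the variable names are a best guess at what is on screen.

```python
def get_min(lst):
    """Return the smallest element of a non-empty list."""
    minimum = lst[0]            # assume the first element is the minimum
    for elem in lst:
        if elem < minimum:      # found something smaller than our current guess
            minimum = elem
    return minimum


values = [3, -5, 10, 7]         # hypothetical example list
print(get_min(values))          # -5
print(min(values))              # -5, the built-in gives the same answer
```

For the maximum, the same sketch works with the comparison flipped to elem > maximum.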
So recall that a modulo b is equal to the remainder of dividing a by b. For example, ten divided by three is equal to three, remainder one. This is just grade-school division. So this means that ten modulo three is equal to one. And if you take any (non-negative) number modulo ten, this will always get you the last digit. Again, this is just grade-school math. In order to see that, I suggest you do a few examples on paper to convince yourself that this is true.

Then we have to set reversed_number equal to reversed_number times ten. What this does is append a 0 to the number, because if you multiply any number by ten, you get that number with a 0 at the end. For example, 13 times ten is 130. 0 is an exception here: 0 will remain 0, it will not become two zeros. To this, we add the last digit that we computed previously. And now we need to get rid of number's last digit. We do that with double slash ten: the double slash performs integer division. So if times ten adds a 0, integer division by ten removes the last digit. Basically, you can see that this keeps chopping off digits from the end of number and adding them to reversed_number, and this ends up reversing the number. In the end we return reversed_number.

Let's print an example or two: 245, and let's do 1631, for example. If we run this, see, we get 542 and 1361. So it successfully reverses these numbers.

So this is how most algorithms will look. They will have some logic, some while loops and for loops, some ifs, some mathematical operations; it depends on what we're trying to accomplish. And this is how we reason about them generally: line by line, making sure to be very precise about what we tell the computer to do. Because the computer will do exactly what we tell it to, not what we wish it did. It cannot read our minds.
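Putting the steps above together, a sketch of the digit-reversal function might look like this. The names reverse, reversed_number and last_digit are reconstructed from the narration, and the sketch assumes a non-negative integer that does not end in 0.

```python
def reverse(number):
    """Return the integer obtained by reversing the digits of `number`."""
    reversed_number = 0
    while number != 0:
        last_digit = number % 10                             # remainder of division by ten
        reversed_number = reversed_number * 10 + last_digit  # append the digit
        number = number // 10                                # chop off the last digit
    return reversed_number


print(reverse(245))   # 542
print(reverse(1631))  # 1361
```

Tracing 245 by hand: the loop builds 5, then 54, then 542, while number shrinks from 245 to 24 to 2 to 0.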
So be very careful about how you express yourself in code. But also don't worry if this seems complicated, or if you think you are not able to come up with your own algorithm for something just yet. That comes with time and experience, and it will get easier as you work more and more in programming, and in Python in particular.

20. Generators: In this video, we are going to talk about generators. We've already seen something that behaves like a generator before: the range function. (Strictly speaking, range returns a lazy sequence object rather than a generator, but the idea of producing values on demand is the same.) So if we have something like for i in range(10), let's say print(i), and we run this, we get the numbers from 0 to 9. This is how range works. Now, the cool thing about range is that if we have some really big number here, like a million, it will not allocate memory for these 1 million numbers that we are printing. It will only allocate memory for one of them at a time. So it's going to be a very efficient program if we run this: it doesn't create a list of these 1 million numbers, it's smart enough not to do that.

So let's see how we can write our own functions that are smart enough to work with very large amounts of numbers without storing them all in memory. Let's say we want a function that gives us even numbers. We will write def get_evens. We're going to do i = 0, and while True. If you recall from our errors and exceptions video, we used a while True there, and it basically worked as an infinite loop that would only stop when we stopped it with a break statement. But we're not going to have any break statement here. We're just going to have yield i, and we're going to increase i by two, because we want to generate the even numbers: 0, 2, 4, 6 and so on, so we need to keep adding two. I'll explain what yield does in a few moments. Right now let's do evens = get_evens(). And let's do for i in range(50), because we want to print the first 50 even numbers, and we're going to print next(evens).
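The generator described above can be sketched as follows. The list first_50 is an addition for illustration: it collects the values instead of printing all fifty, so the result is easier to inspect.

```python
def get_evens():
    """Yield 0, 2, 4, ... forever, one value per request."""
    i = 0
    while True:       # no break: the caller decides when to stop asking
        yield i       # hand back i and pause here until the next request
        i += 2


evens = get_evens()
first_50 = []
for _ in range(50):
    first_50.append(next(evens))   # each next() resumes the paused function

print(first_50[:5], "...", first_50[-1])  # [0, 2, 4, 6, 8] ... 98
```

Because the generator is only paused, one more next(evens) after the loop continues from where it left off and returns 100.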
And if we run this, you can see that we get the first 50 even numbers, from 0 up to and including 98. You can check that they are all there. So how does this work? We printed 50 even numbers without storing them, using this weird yield syntax. What yield does is kind of the same as what happens in traffic when you encounter a yield sign: you stop your car, or rather you pause it, because you don't turn off the engine, and you wait until it's clear to go. It's the same here. When yield is encountered, the value given to it is returned from the function. But unlike with a normal return statement, the function does not fully stop. Its engine still runs, kind of; it's just braked, or paused. It stays stationary until the next call to this next function, at which point the function is resumed and runs until the next yield statement is encountered, and so on.

This allows functions to be very efficient. It allows for what's called lazy computation: not everything is computed at once, it's computed when you request it. More concretely, it's computed when you call next on the generator, on evens here. So that's generators, and they can be quite useful in a variety of situations. For example, imagine that this yielded data taken from the internet, and the connection can be slow. You might not need all of that data at once, and you don't want to wait until this connects to maybe 100 websites and gets all the data. Maybe you're happy with just the data from two websites for most of your program. In that case, it would make sense to do it like this.

That's it for generators so far. They can also be made to work like range: you could have a parameter here, a condition regarding that parameter in the loop here, and the same kind of yield statement and update step here. And then you could do for i in get_evens(50), for example.
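One way to read the range-like variant sketched in words above is shown below. The transcript does not say whether the parameter is a count or an upper limit; this sketch assumes it is the number of even values to produce, and the names count and produced are hypothetical.

```python
def get_evens(count):
    """Yield the first `count` even numbers, starting from 0."""
    i = 0
    produced = 0
    while produced < count:   # the condition on the parameter replaces while True
        yield i
        i += 2
        produced += 1


for value in get_evens(5):
    print(value)              # prints 0, 2, 4, 6, 8 on separate lines
```

When the while condition becomes false the function simply ends, which is what tells the for loop to stop.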
It would work the same way and it would be closer to how range works. But using this next call gives you a bit more control; it allows for more fine-tuning.

21. Advanced lists: In this video, we are going to talk about some advanced list concepts, and in particular about slicing. So let's say that we have a list, lst. I'm going to add a few elements so that it's easier to exemplify all I want to show you. Okay, that should be enough. Now, let's say that I want to get a sublist, so another list made up of some elements that show up on consecutive positions in lst. That's a mouthful, I know. As an example, let's say I want to get something like 6, 3, 5 or maybe 1, 4, 6, 3, 5 and so on. In order to do that, I would do, let's say, slice1 = lst, and I write here colon four: lst[:4]. What this does is get me a list made up of the elements at index 0, index 1, index 2 and index 3. So it's kind of like range, if you think about it: it starts from 0 and goes up until the parameter minus one. This is the same as if I write 0 here, lst[0:4], but we can omit the start if it's 0. Let's just write that as a comment. This is called a slice. And if we print it, see, we get what I showed you.

Now let's do something that doesn't start at 0. In order to do that, we would do lst of, let's say, two, and go up until seven: lst[2:7]. Let's print this too. So it starts at 6, which is index two if you count from 0, and goes up until 21, which is index six, so the seventh element. Okay, good. So what happens if we do slice3 = lst[2:], and we don't put anything after the colon? Well, this is the same as lst[2:len(lst)]. len(lst) gets you the number of elements in lst. So if we omit the right-hand side of the colon, the slice goes up until the end of the list. So this should get us everything from 6 onward. Let's see if it does: print slice3. It does. Now, what if we add two colons?
So let's say we do something like slice4 = lst[1:8:2]: start from one, go up until eight, and go in steps of two. Let's print slice4 and see what happens. We start at index one here, which is 4, and we go to 3, we go to 74, and we go to 30. So what this does is start at index one, then go to index one plus two, then that plus two, then plus two again, and so on, and keep going while that index is lower than this, than eight. So basically this goes in increments of two, and that has the effect of skipping one element each time. So it goes this, this, this, this. And if we didn't have eight here but some other value, it would stop somewhere else.

Now what happens if we omit the eight? So slice5 = lst[1::2]. It basically has the same effect as omitting the end before, but in increments of two. Print slice5, and you can see it goes until the end of the list. What happens if we omit the first value as well, the one in this case? Let's copy-paste this and rename it to slice6. Delete the one. And you can see it starts at index 0, then goes in increments of two for the index.

What happens if the step is negative? Let's say we start at six, end at one, and have minus one here: lst[6:1:-1]. This starts from index six and keeps going by decreasing the index by one, so by adding minus one to six, which gives five, then to the result, which gives four, and so on, while the index is still greater than one. So basically this gets us a sublist in reverse. Good. So that we keep seeing the list, what we can do here is go to Cell, and we have here All Output, Clear; this clears the output and allows us to see the initial list. I'm going to run this again, and you can see that it's this sublist in reverse. So we can also use slicing to get sublists in reverse. And we can get the entire list in reverse by doing this, by omitting both the start and the end.
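The slices discussed so far, collected in one sketch. The list below is hypothetical: its first eight elements are pieced together from the values mentioned in the narration and the last two are invented, so treat it as an illustration rather than the list on screen. The names slice7 and reversed_lst are also assumptions.

```python
lst = [1, 4, 6, 3, 5, 74, 21, 30, 9, 12]  # hypothetical example list

slice1 = lst[:4]          # same as lst[0:4]
slice2 = lst[2:7]         # indexes 2 through 6
slice3 = lst[2:]          # same as lst[2:len(lst)]
slice4 = lst[1:8:2]       # indexes 1, 3, 5, 7
slice5 = lst[1::2]        # every second element from index 1 to the end
slice6 = lst[::2]         # every second element from index 0
slice7 = lst[6:1:-1]      # indexes 6 down to 2, in reverse
reversed_lst = lst[::-1]  # the whole list in reverse

print(slice1)        # [1, 4, 6, 3]
print(slice2)        # [6, 3, 5, 74, 21]
print(slice3)        # [6, 3, 5, 74, 21, 30, 9, 12]
print(slice4)        # [4, 3, 74, 30]
print(slice5)        # [4, 3, 74, 30, 12]
print(slice6)        # [1, 6, 5, 21, 9]
print(slice7)        # [21, 74, 5, 3, 6]
print(reversed_lst)  # [12, 9, 30, 21, 74, 5, 3, 6, 4, 1]
```

In every case the end index is excluded, which is why lst[6:1:-1] stops at index 2 and not at index 1.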
You can see this is the original list in reverse order. So this is a very easy way to reverse a list. Everything I've just shown you also works with strings, because strings are basically sequences of characters, much like lists. If you have a string, you can use slicing to manipulate it in various ways.

One more thing I want to show you about lists is this. Let's say you want a list made up of 100 elements equal to 0. You might need it for some kind of algorithm. You can easily create that list in Python by using multiplication with lists: my_list = [0] * 100. I print my_list, and here it is: these are 100 zeros.

So these are some advanced concepts regarding lists. Although I call them advanced, they're not very complicated or complex; I think they are quite easy to understand if you spend some time on them. And they will definitely come in handy, because NumPy, the library that we will be covering in a future section, makes heavy use of indexing for almost everything we're going to do with it. So we will definitely need to know how to index into lists and how to use slicing to be able to take full advantage of NumPy.

22. Memory management: In this video, we are going to talk about memory management. In order to get a proper understanding of how Python works, you need to understand how it handles memory. In Python, every variable that we write, such as, for example, lst = [1, 2, 3], is a reference to the memory location in which the memory needed for that object is actually allocated. What this means is that this lst is not the actual list itself. In normal speech we are always going to say that lst is a list, but strictly speaking, technically, it's a reference to a list. Think about it like a label put on an item in a supermarket, a label that says: T-shirt, $10. That's what lst is: the label, not the actual T-shirt.
And this has a few implications that we're going to talk about in this video. Let's say we have this lst = [1, 2, 3]. Actually, let's go ahead and make it lst1. And we are going to have lst2 = lst1. Then we're going to say lst1[0] = 100, and we're going to print lst2. Pause the video here and try to guess what this will print.

Okay, so if you guessed [1, 2, 3], because we changed lst1 but printed lst2 and we didn't touch lst2, you are wrong, unfortunately. It prints [100, 2, 3], see? And this is precisely because of what I told you. This line here, lst2 = lst1, only copies the label. It's like using the copy machine to make a copy of the label that says T-shirt, $10. It still refers to the same T-shirt, to the same item. So no matter which reference you change it through, this one or this one, the same list changes.

This is how it looks in memory. lst1 is a reference to this list, so it basically stores a memory address, let's say 50, at which the list actually resides. And lst2 is another variable, but it holds the same memory address; it also stores 50. When we do lst1[0] = 100, what we tell the program to do is: go to the memory address stored in lst1 and change the first element, the element at index 0, to 100. And because both variables contain the same memory address, no matter which one we use, the same change is made. It's very important to keep this in mind when working in Python, because at first sight it can be confusing. You might expect lst2 to remain [1, 2, 3], but it doesn't.

Now what if we want it to remain [1, 2, 3]? Well, let's copy this code, and we're going to make a shallow copy here. The way we do that is we slice the list like this: lst2 = lst1[:]. Slices make a copy, a shallow copy. And now if we run this code, see, we get [1, 2, 3]. So that works. But now what if we have the following case? So this is case one, and now case two. Let's say we have a list of lists.
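Case one, the flat list, can be sketched like this before moving on to nested lists. The variable names original, alias and copied are chosen for illustration so that both situations can sit side by side; the video reuses lst1 and lst2 for each.

```python
# Copying the label: both names refer to the same list object.
original = [1, 2, 3]
alias = original
original[0] = 100
print(alias)               # [100, 2, 3]
print(alias is original)   # True: one list, two references

# Copying the list itself with a full slice (a shallow copy).
original = [1, 2, 3]
copied = original[:]
original[0] = 100
print(copied)              # [1, 2, 3]
print(copied is original)  # False: two separate list objects
```

The is operator compares identity rather than contents, which makes it a handy way to check whether two names are labels on the same object.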
And here we're going to have lst1[0], which is this first inner list, of 0, which is this element, set equal to 100: lst1[0][0] = 100. Note that we still have a shallow copy here. So again, pause the video here and try to figure out what this will print.

Okay, so if you said 1, 2, 3 here, with each one in its own list, again, unfortunately, you'd be wrong. It prints [[100], [2], [3]]. And the earlier one with the shallow copy, you see, remains at [1, 2, 3]. Now why does this happen? Well, it looks like this. In the first situation, we've made a copy and everything is fine. lst2 points to a whole new list, so lst2 basically stores another memory address. It doesn't store the 50; it's something like, I don't know, 500 or 100 or whatever. It doesn't matter exactly what, it just matters that it stores a different memory address compared to lst1.

In the second situation, however, we have this representation in memory. lst1 stores the address of a list, and that list in turn stores addresses for each of its elements, each of which is a reference to an inner list. Now lst2 stores another address for the outer list, but its elements reference the same inner lists. That is why it's called a shallow copy: it doesn't go as deep as possible with the copying. So if we have lists of lists, a shallow copy won't help if we modify the innermost elements, the elements of the inner lists. Again, it's very important that you understand this, so feel free to pause the video, rewind, or even make some changes in the code to convince yourself of what I'm saying or to gain a better understanding.

In order to fix this too, if we want to make copies of the inner lists as well, we need to use a deep copy. I'm going to copy-paste case two here, and I'm going to say from copy import deepcopy. Instead of the shallow copy here, I'm going to say deepcopy(lst1). And if I run this now, it prints [[1], [2], [3]]. And let's go back and fix this comment: it's 100 here.
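Case two and its fix, side by side, as a minimal sketch. The names shallow and deep are assumptions for illustration; the nested list follows the narration, where each number sits in its own inner list.

```python
from copy import deepcopy

lst1 = [[1], [2], [3]]
shallow = lst1[:]        # new outer list, but the same inner lists
deep = deepcopy(lst1)    # new outer list and new inner lists

lst1[0][0] = 100         # modify an element of an inner list

print(shallow)           # [[100], [2], [3]]: the inner list is shared
print(deep)              # [[1], [2], [3]]: fully independent

print(shallow[0] is lst1[0])  # True
print(deep[0] is lst1[0])     # False
```

Replacing a whole inner list, for example lst1[0] = [100], would not show up in the shallow copy, because that changes the outer list rather than a shared inner one.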
And this corresponds to the following representation in memory. You can see that each individual reference in the big list has now also been copied, so the memory addresses are no longer the same across the two lists. Of course, while a deep copy will solve any problems you might have with two references that point to the same thing, it's more expensive. So if you can get away with it, prefer a shallow copy, because it will be computationally faster than a deep copy.

This also has implications when passing parameters to functions. For example, say we have a function f of x which just sets x to six. If we come here and say a = 10, call f(a) and print a, this prints ten. Why is that? Because when passing a parameter to a function, Python makes a copy of the reference. So x is not the same variable as a; it's another one. And when we set it to six, only the copy is pointed at six; a is untouched.

Now what if x is a list? If x is a list and we say x[0] = 100, then if we run this, the list changes. So why does it change here, but not in the other example? Well, a copy is made here as well, but only the reference is copied. Recall that a reference stores a memory address. a stores, let's say, 50, and x is a copy of a, so another variable, but it still stores 50. And this statement says: set the element at index 0 at memory address 50 to 100. So the list remains changed outside the function call as well.

But if we do something like this, x equals another list, and we run this, no changes persist outside the function. Why? Because this creates another list, and that list cannot be created at the same address as the one stored in a; it will be created, let's say, at memory address 100. And since x is only a copy of the reference, pointing it at the new list has no impact on the a outside the function. It's very important to keep this in mind.
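The three parameter-passing situations above, as a sketch. The video uses a single function f that is edited between runs; the three function names and the replacement list [7, 8, 9] here are hypothetical, introduced so that all cases can run together.

```python
def rebind_number(x):
    x = 6                # points the local name x at 6; the caller's variable is untouched


def mutate_list(x):
    x[0] = 100           # follows the reference and changes the shared list


def rebind_list(x):
    x = [7, 8, 9]        # creates a new list and points only the local name at it


a = 10
rebind_number(a)
print(a)                 # 10

b = [1, 2, 3]
mutate_list(b)
print(b)                 # [100, 2, 3]

c = [1, 2, 3]
rebind_list(c)
print(c)                 # [1, 2, 3]
```

The rule of thumb: assigning to the parameter name never affects the caller, while modifying the object the parameter refers to does.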
So please do not move on until you are confident that you fully understand the contents of this video.

23. Advanced classes inheritance and polymorphism: In this video, we are going to cover some advanced class concepts, namely inheritance and polymorphism. This is also something you'll have to use a lot in your programming career, should you choose to follow such a path. So let's define a class, and let's call it Animal. We're going to have a constructor that takes as parameters self and the name of the animal, and stores the name in a field on the object. Then we are going to have a method called make_sound that returns a formatted string, with the following implementation: the name followed by "generic sound", because we don't know what kind of animal it is. If we run this, we can create an object of this class, and let's call it something really generic: "Some animal". Let's call make_sound on animal, and let's also print it.

Again, we made the same mistake as in the first video about classes: we forgot to add the self parameter here. This is a very easy mistake to make in Python. I didn't make it on purpose this time. But when you get this error, you should recognize that this is the problem: you probably forgot to put self somewhere. All right, and now we get "Some animal generic sound".

Now this is a very general class; it's just an animal. Well, we can derive classes from it. That will allow us to keep most of the functionality of the Animal class, which is the same across all animals, but make some particular changes for our specific animals, such as the sound they make. So let's do that now. We can say class Dog, and then in parentheses we put Animal, and this means that Dog inherits from Animal. This is an "is a" relationship, because a dog is an animal. We're going to have the __init__ method here, which also takes as parameters self and the name.
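A sketch of the class hierarchy this lesson builds, including the override that the next part of the lesson walks through. The exact strings are reconstructed from the outputs read out in the video, and the Puppy class is a hypothetical addition that shows what happens when a subclass does not override make_sound.

```python
class Animal:
    def __init__(self, name):
        self.name = name                  # stored on the object itself

    def make_sound(self):                 # forgetting self here causes a TypeError when called
        return f"{self.name} generic sound"


class Dog(Animal):                        # Dog "is an" Animal
    def __init__(self, name):
        super().__init__(name)            # reuse the base class constructor

    def make_sound(self):                 # override the base class behavior
        return f"{self.name} barks"


class Puppy(Animal):                      # hypothetical: no override, inherits everything
    pass


print(Animal("Some animal").make_sound())  # Some animal generic sound
animal = Dog("Max")
print(animal.make_sound())                 # Max barks
print(Puppy("Rex").make_sound())           # Rex generic sound
```

Which make_sound runs is decided by the class of the object, not by the name of the variable holding it.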
And we're going to call here super().__init__(name), and I'll explain what that does in a minute. Now we're going to redefine the make_sound method. We can copy-paste it, put it here, and say "barks" instead. Let's run this. Now, what we can do is animal = Dog("Max"), let's say, and animal.make_sound(). And let's print this. Here you can see it says "Max barks". So even though we call this variable animal, we instantiated it with a Dog, and Python knows that this is the function that should be called.

What if we didn't override this function? If we comment this out and call make_sound on a Dog as well, and run both cells, see, it says "Max generic sound". So Dog has all of the methods defined in the Animal class available to it. If we override them, we can change their functionality. But if we don't redefine them, they remain as they are defined in what's called the base class. This is very useful, because it allows us to treat the derived classes as though they were the generic class, and the interpreter, or the compiler in other languages, will figure out which methods to call on its own.

24. Images, PDFs, Spreadsheets: In this video, we are going to see how we can work with images, PDF files, and spreadsheets. Spreadsheets are basically Excel files. These tasks all involve working with some third-party libraries. So what I've done is Google for code on how to do the things I'm going to show you, and basically build my presentation on that code. I don't know everything here by heart, because I don't use it that often. And it's very important to be able to Google for stuff and adapt the results you find to your own needs. Nobody expects you to memorize how to do everything. That's impossible. And even if it were possible, you shouldn't do it, because it just takes up brain space needlessly. It's all there; you can easily find it.
So this is what we have here for working with images, or displaying them. First, we need to install a library that can do this for us, and I've chosen the OpenCV library. If you uncomment these two lines (you can put them in a separate cell, or keep this cell with just these lines in it), it will install the OpenCV library in a few seconds or a minute or two. This is the command to install a module in Python. You can also run it without the exclamation mark in your terminal, and as long as conda is on your operating system's PATH, it will install correctly. This sys.prefix trick is there to make sure that the installed module is available in this notebook. Otherwise you can run into some problems: you might not be able to use the module directly, and so on. So I suggest you do it this way. Once it's installed, we can comment this out, because we don't want it to run again; it would just take time for no purpose.

We're also going to use the matplotlib library, which comes pre-installed with Anaconda, so we don't need to install it. And cv2 is the OpenCV library that we just installed. We use this Jupyter magic command, %matplotlib inline, with a percent sign in front, to display the image right here in the notebook. And this is the code that you can find online for doing it. It's quite self-explanatory: we read an image, we show it, and we also call plt.show. What we have to be careful about here is that cv2.imread reads the image in a certain format regarding the red, green and blue colors: it stores them as blue, green, red. For matplotlib, we need to pass them in as RGB, and this cv2.cvtColor call makes sure that we convert the image to the format that matplotlib expects. And this is the image I've drawn in Paint: "Python", excuse the bad handwriting. You can see it gets displayed here in the Jupyter notebook.

For merging PDFs, I found code here.
I've searched for how to handle PDF files in Python and how to merge PDFs in Python, and I found this nice function that takes as a parameter a list of PDF files and outputs them in one big merged file that's also a PDF. The library is installed using this command. We have to specify here the channel that we want to install from; it's conda-forge in this case. Again, you don't have to remember this. If you search, for example, for "PyPDF Anaconda" on Google, it will give you the command you have to use. And if you're going to run it from Jupyter, all you have to do is add this sys.prefix thing here. Or you can take it as is and run it in your terminal.

So what this does is: for each file we give it, it reads that PDF and adds every page in it to the output. It's, again, quite self-explanatory. Then it opens a file for writing, where we specify that it's a binary file, and we just write what we built up in the PDF writer previously. Then we call the function. I've already run it and checked, and it creates the merged PDF successfully. So this is a very easy way to make a utility script that merges PDFs for you, so that you no longer have to use an online service or download any existing software that does it. You do it in Python in very, very few lines of code.

We're going to have a whole section dedicated to the pandas library, but I'm just going to show you here how we can use it in two lines of code to print the contents of an Excel file, so an .xlsx file. It's just import pandas as pd; pd is the standard abbreviation for this library, and you're going to see it a lot. Then we just call the read_excel function from the pandas module, and we print the resulting DataFrame, df. A DataFrame is basically the Python representation of a spreadsheet from an Excel file.
We'll go into more detail in a future section, like I said. For now, if we run this for the spreadsheet I've created here, it prints this. So I made an Excel file with three rows and two columns: a column called Item that contains these items, for example from a supermarket, and a column with their prices. There's actually also this "#" column; it's the first one, and I just used it for numbering the rows.

So that's it for processing images, PDFs, and spreadsheets. You can read more about them by searching for the respective libraries, or for how to accomplish a specific task that you're interested in. I recommend that you try out a few of these Google searches: maybe look up how you can rotate pages in a PDF, how to extract information from a PDF, how to write to an Excel file, or what else you can do with OpenCV regarding images. Just practice googling for information for a bit, because that's also a very important skill, and one that's not mentioned as often as it should be, in my opinion, in various tutorials.

25. Exercise 1 + Solution: Here is your first exercise for the end of this section. You have to implement two functions here. I have written in comments what they have to do, so I'm not going to read that. Pause the video here and try to do it. You can Google for the algorithm or for terminology that you might not be familiar with. For example, if you don't know what a prime number is, feel free to Google that. But don't just copy-paste code: read the code, alt-tab back to your notebook and try to write it by yourself. And keep doing that until you can write it from start to finish by yourself. So you can start now: pause the video here, and I'll keep talking about the solution.

All right, welcome back. Hopefully, you were able to do this by yourself. In case you need any tips, or you want to see what I had in mind for the solution, keep watching.
Okay, so for this is_prime function, what we can do is check if the number is divisible by the numbers starting from 2, 3, 4 and so on, up until the square root of the number, or rather the integer part of the square root of the number. In order to do that, we can say d = 2, and while d times d, so d squared, is lower than or equal to the number. This is the same as writing d lower than or equal to the square root of number. But I prefer to do it this way because I don't have to deal with square roots and floating-point numbers, which can introduce various bugs in your programs. So always try to keep it as simple as possible.

We're going to check if number modulo d equals 0. Don't forget the colon. If it does, then this is not a prime number, so we return False here. Regardless of whether this is true or not (if it's true, it exits the function anyway), we have to increment d, so d += 1. And in the end, if False wasn't returned, it means we tried all these numbers and the number is not divisible by any of them, so we return True. We need one more condition here, because by definition the smallest prime number is two. So if number is lower than two, it is definitely not prime, and we can simply return False.

Now let's print some examples to make sure we check the function. Let's do it like this: print 1, is_prime(1). And let's say three, maybe 15, maybe 101. And let's do a big one: 666013 is a prime number, quite large, and it's a good one for testing such functions. If we run this cell, we get: 1 False, this is correct. 3 True, correct. 15 False, correct. 101 True, correct. 666013 True; this is, again, correct. So this is one way of doing it. If you did it in another way, don't worry: as long as you get the same results for these examples or other similar examples, it's perfectly fine. Don't think about any issues regarding efficiency, memory usage or such; they are not too relevant right now.
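The primality check described above can be sketched as follows. The function name is_prime is reconstructed from the narration; the loop at the end is an assumption about how the examples are printed.

```python
def is_prime(number):
    """Trial division by d = 2, 3, 4, ... while d * d <= number."""
    if number < 2:              # the smallest prime is 2
        return False
    d = 2
    while d * d <= number:      # same as d <= sqrt(number), without floats
        if number % d == 0:     # found a divisor, so not prime
            return False
        d += 1
    return True                 # no divisor found up to the square root


for n in (1, 3, 15, 101, 666013):
    print(n, is_prime(n))
# 1 False
# 3 True
# 15 False
# 101 True
# 666013 True
```

Stopping at the square root is enough because any divisor larger than the square root pairs with one that is smaller.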
For this function, the idea is to use slicing. If we recall, in the slicing video we showed that this slice expression basically gives you the list in reverse order. And a list is a palindrome if it reads the same forwards as it does backwards. So here we can simply return lst[::-1] == lst. This will return True if this expression is true, so if this equality is true, and False otherwise. We don't even need an if/else. So again, let's print some examples. Let's say lst is [1, 2, 3], and print is_palindrome of lst. And let's do one more: [1, 2, 3, 3, 2, 1]. This should give False and True. And it does: False for the first one, True for the second one. So this is the solution to the first exercise. Again, congratulations if you made it this far. And also congratulations if you came up with a different solution for the first problem; just make sure you understand this one as well.
26. Exercise 2 + Solution: Here is your second exercise for this section. I've already written the problem statements and some examples in comments, so I'm not going to read them on video. Pause the video here unless you want to see my solution. And of course, feel free to Google for ideas or algorithms, or for how to do something in particular in Python that will help you solve this exercise. Good luck, and resume the video once you are done, so that you can take a look at my proposed solution. Okay, so congratulations if you managed to get something working. I'm now going to implement these myself and explain what I'm doing. Here we get a list and we must return a dictionary of the counts of the distinct values. So what we're going to do first is declare the dictionary: result will be equal to an empty dictionary. Then, for every value v in the list, if v is already in the dictionary, so if it's already a key in the dictionary, we will simply increment the value for that key. So we're going to write result of v plus equals one.
And if it's not already in the result, we will add it and set it to one. So else, result of v is equal to one. And at the end we simply return the result dictionary. Let's see if this works on the example: get_counts, and copy-paste the example list. Let's run this cell. We get: 1 has value 3, 2 has value 2, 5 has value 1, 6 has value 1. The order doesn't matter. It's correct: we get the same thing as the example says we should. Now, for the next problem, this has a very simple one-line solution. So let's declare our list, just as in the example. Let's run this to make sure that it runs in the format the example has given us. And it does. Sometimes the example will not be so easy. For example, it might just say in words: shop item named a with price 150. And then you will be responsible for getting it in this form yourself. Now, to sort this list, we can do this: sorted_list equals sorted. sorted is a function in Python that can sort lists for you. So we're going to give it the list, so lst. And we're going to give it a key, because it says we need to sort the objects by count, so we need to sort them by the count field. By default, sorted uses the comparison operators: lower than, higher than and equals. But we don't want that. We need to give it a key by which to sort, and that key is the count field. What we do here is say key equals. This is called a keyword argument: you pass in an argument to a function by specifying the name of the parameter that you pass in the value for. So key equals means that what we write after the equals will be passed in as the key parameter of sorted. And we're going to use a lambda here: lambda shop_item (I'll explain what a lambda is in a minute), a colon, and here we'll just write shop_item.count. And if we print the sorted list, well, we get this weird output, and we need to fix that as well.
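The two list solutions described so far, the slicing-based palindrome check and the dictionary-based counter, can be sketched roughly as follows. The names is_palindrome, get_counts and lst follow how they are spoken in the video; the sample lists are hypothetical stand-ins, not the ones from the exercise notebook.

```python
def is_palindrome(lst):
    """A list is a palindrome if it equals its own reverse."""
    # lst[::-1] builds a reversed copy of the list.
    return lst[::-1] == lst


def get_counts(lst):
    """Return a dictionary mapping each distinct value to its count."""
    result = {}
    for v in lst:
        if v in result:
            result[v] += 1   # seen before: increment its count
        else:
            result[v] = 1    # first occurrence: start at one
    return result


print(is_palindrome([1, 2, 3]))
print(is_palindrome([1, 2, 3, 3, 2, 1]))
print(get_counts([1, 2, 1, 5, 2, 1, 6]))  # hypothetical example list
```

The explicit if/else keeps every step visible, which suits a first exercise; shorter variants exist in the standard library, but they are not what the video walks through.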
What we can do is come here and write a __str__ function that takes self as a parameter, because it's a method of the class. This will return the string representation of a shop item, so how we want a shop item to be displayed as a string. And here, in order to keep the format of the example, we can simply return a formatted string: shop item, and here self.name, and here self.count. Again, don't worry if you haven't done it this way. If maybe you found something else online, some other sorting algorithm that you implemented from scratch, that's perfectly fine. All right, this looks good like this. And if we run it now, nothing has changed. But if we print sorted_list of 0, we can see that it prints it in the format we've written in the __str__ method. The reason it doesn't work for the entire list is that for lists, it's not __str__ that's called, it's this __repr__, or the representation. So what you can do is also write this one, and here we can simply return str of self, which will call this one, so we don't have to duplicate any code. Or you can write a for loop and print them one by one. It's up to you. But now if we run this, you can see it prints them like we wanted it to. And if we compare with the example, it's correct, so it has sorted them by count. Now, I owe you an explanation for this lambda. What this lambda does is create a function that takes this parameter and returns this value. So it's basically a shorthand notation for introducing very small functions that you will likely only use once. They are also called anonymous functions because, as you can see, it doesn't have a name. But this would be the same as me writing a function here: def my_func, that takes as parameter a shop_item and returns shop_item.count. I'm going to comment this out, come here and say key equals my_func. And if I run this, I get the same output. So it's the same.
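Putting the pieces of this solution together, a minimal sketch might look like the following. The class name ShopItem, the field names name and count, the output format and the sample items are all assumptions made for illustration; the exercise notebook defines its own class and example data.

```python
class ShopItem:
    """Hypothetical stand-in for the class given in the exercise."""

    def __init__(self, name, count):
        self.name = name
        self.count = count

    def __str__(self):
        # Used by print(item) and str(item).
        return f"ShopItem({self.name}, {self.count})"

    def __repr__(self):
        # Used when a list of items is printed; reuse __str__ to avoid duplication.
        return str(self)


lst = [ShopItem("a", 150), ShopItem("b", 20), ShopItem("c", 75)]

# key= is a keyword argument: sorted calls it on each element and compares the results.
sorted_list = sorted(lst, key=lambda shop_item: shop_item.count)


def my_func(shop_item):
    """Named equivalent of the lambda above."""
    return shop_item.count


same_order = sorted(lst, key=my_func)

print(sorted_list[0])
print(sorted_list)
```

Both calls to sorted produce the same ordering; the lambda simply avoids defining a named function that is only used once.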
But I don't have to clutter my workspace here by writing a function that I will only be using once. Okay, so this is it for the official solution. Congratulations if you made it this far and if you are following along nicely, and good luck with the quiz in the next video as well.
27. Python basics Quiz: In this video, you are going to take your quiz for the end of this section. We are not going to be using a PowerPoint presentation with multiple choice questions. Instead, we are going to be using fill-in-the-blanks type questions here. So what you should do now is pause the video here and try to replace all of these underscores so that each code snippet does what it says in the comments that it should do. I'll pause the video here and let you work on it. Then, after you're done, come back so that you can hear my explanation for each of the problems. Alright, so hopefully you've got them all right. They are not that complicated. Here is what I think you should have written. There might be other solutions as well, but I'm quite confident that these all have unique solutions. If you think you came up with something that works as well and is different from what I am going to show you, that's also great. Congratulations. Okay, so here we can do two times x plus one, because two times x will be an even number and adding one will make it an odd number. And by doing it in order from 0 to 9, we will get the first ten odd numbers. Here, all we have to do is add a try statement. It works for integers, it works for non-integers. So that's it. It's a basic example of a try/except block. Here, we need to use slicing. The trick is to notice that 5 is at index one, so we start from 1. 2 is the last element, so we can leave the right-hand side empty for the first colon. And we need to go in increments of two, because 6 is right in the middle, right? It goes index one, index three and index five. So 1::2 should do the trick.
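To make the first and third quiz answers concrete, here is a small sketch. The list below is a hypothetical one chosen only so that 5, 6 and 2 sit at indices 1, 3 and 5 as described; the quiz's actual list is not shown in the transcript.

```python
# First ten odd numbers: 2 * x is even, so 2 * x + 1 is odd.
odds = [2 * x + 1 for x in range(10)]

# Hypothetical list with 5 at index 1, 6 at index 3 and 2 at index 5 (the last element).
lst = [9, 5, 1, 6, 4, 2]

# start=1, stop left empty (go to the end), step=2.
every_other = lst[1::2]

print(odds)
print(every_other)
```

Leaving the stop position empty means "up to and including the last element", which is why the final 2 is picked up.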
If we run this, we get 5, 6, 2. Here, it's a basic application of deep copy. It's very similar to the example we have seen in the memory management video about deep copies and shallow copies. So let's import deepcopy. And here we can simply do deepcopy of lst. If we run this, it prints what we needed it to. And just as an example, if we had done a shallow copy here, it would have printed 100, 2, 3. Ok, so this is bad. We need deepcopy here. Congratulations if you made it this far, and I hope you are going to enjoy the future sections even more.
28. Numpy Motivation: In this section, we are going to introduce NumPy for numerical computations. Here is why we are going to use NumPy instead of something else. First of all, it's because NumPy is very, very fast. It's actually a Python interface to some libraries implemented in C, and in general, C and C++ are much faster languages than Python: the same code in C is going to run faster than similar code in Python. NumPy fixes this issue. At the same time, it's also very Python-like, which will let us use it just like we use other Python objects and functionalities. For example, the arrays that we're going to talk about in NumPy will be very similar to lists in Python. This will make it very easy to learn and to become productive in it. Another important aspect is that it's also very close to mathematical notation. If we read some paper that presents some formulas or some algorithms, and that paper is quite heavy on mathematical notation, we will be able to translate that paper to code using NumPy quite easily. Of course, there is still an element of difficulty, because we are talking about research papers and cutting-edge research here. But it does make things as easy as possible. And last but not least, it's under constant development. There are many people actively working on the NumPy library, improving it, optimizing it and adding new features all the time.
NumPy has a lot of applications to data science. For example, it makes it very easy to compute various statistics that are useful in this field. It makes it very easy to preprocess numerical data in various ways. This is also quite important: a lot of the time, the data that we have to work with needs to be preprocessed before it can be fed to various other algorithms that process it further. NumPy will help with this a lot. And there is a lot of documentation for a lot of things you might think of doing. Data science is a very big field, and the methods in it, the algorithms used often, are very well documented. Also, if you have to do something new, chances are that there will be aspects of your original idea that have been implemented by other people. That will help you not have to do everything from scratch. There are also quite a few applications to machine learning. For example, NumPy is used heavily by many machine learning libraries, such as scikit-learn, which is a very well-known library. It's very popular and it implements a lot of machine learning algorithms. It does so in an efficient manner and also in a user-friendly manner. It's very easy to pick up scikit-learn if you know a bit of Python and become productive in it. Novel algorithms can also be prototyped relatively easily. Like I've said before, if we have a research paper that introduces a new algorithm, chances are that we will be able to use NumPy to implement an initial version of that algorithm to see how it behaves on certain data, or on our particular problem. NumPy will allow this quite easily. Also, GPU libraries like TensorFlow work in a similar fashion to NumPy. So if you want to pick up TensorFlow, for example, and you already know a bit of NumPy, it will be noticeably easier for you to understand how TensorFlow works. And this is true for other libraries as well, for example PyTorch and some others.
So in conclusion, NumPy is indispensable if you are serious about getting into data science and machine learning. I recommend that you learn it well and that you make the most out of this section. Make sure that you understand all the code examples. Make sure that you are able to apply various changes to them. Even if there aren't any exercises for a particular topic or aspect, try to set some exercises for yourself so that you get as comfortable as possible with the code presented. Good luck, and have fun.
29. Arrays: In this video we are going to talk about NumPy arrays. NumPy arrays are very similar to what a list is, but they are closer to the mathematical concept of a vector or a tensor. Don't worry if you don't know what that is; I'll explain things as we go along. I have some code pre-written here, and we'll go over each cell and I'll explain what's going on. First of all, this is how we import NumPy: import numpy as np. This is a convention. We usually alias it to np, alright? np stands for NumPy, and we'll be using this a lot, so it makes sense to alias it to something shorter. You should already have this installed with your Anaconda installation, so it should work out of the box like this. If we run this cell, it runs correctly. And this is how we declare an array. So my_array equals np dot, so the module np, which is the alias, dot array, which will tell it to create an array. Here we must pass in a Python list, and I've done that. I've passed in the list [1, 2, 3, 4], and this will be transformed into a NumPy array. In order to see that, we can insert a new cell here and print it. It will show like this: array of [1, 2, 3, 4]. And we can access the elements of this array just like we would access the elements of a Python list, like so. See, we get 2 here, so this is the element at index one. We can also print the shape of the array. This is something new.
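The first few cells described above can be sketched as follows. The variable name my_array follows the video; this assumes NumPy is installed, as it is by default with Anaconda.

```python
import numpy as np  # np is the conventional alias

# Create a NumPy array from a plain Python list.
my_array = np.array([1, 2, 3, 4])

print(my_array)

# Indexing works like it does for Python lists: index 1 is the second element.
print(my_array[1])

# shape is a tuple with one entry per dimension; a one-dimensional
# array of four elements has a single entry.
print(my_array.shape)
```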
We will see that we can have multidimensional arrays, and in that case, we will have more numbers here after the comma. We can also print the length of the array, just like for Python lists. And we also have slicing. For example, my_array from one to three, my_array[1:3], is array of [2, 3]. So a slice also returns a NumPy array. And the cool thing here is that we can do something like my_slice equals this, then my_slice of 0 equals, let's say, 100. And now let's run this again. If we print my_slice, as expected, we get 100, 3. But if we print my_array, we also get 1, 100, 3, 4. So the slice does not make a copy like it does for Python lists, right? You can think of slices as a window into the original array, and they are actually called that. So just like in that memory management example in the previous section, this is basically a reference to the slice. It's not the actual slice; it's not a copy of that part of the array. It's very important to keep this in mind because it can make a lot of things much, much faster. It can also cause confusion at first, so it's very important to understand this. We can also iterate over arrays with a for loop, just like we can iterate over Python lists and any other collection. And we can also apply mathematical operations on them. For example, my_array times two will return another array in which all the elements have been multiplied by two. This will add five. This will raise everything to the power of two. And chances are that these are going to be much faster in practice than if you had a Python list and you went over it with a for loop and did these operations on each element. Like I said, in the background everything is implemented in C. NumPy just calls some C libraries that do the job, and Python is just an interface. So it's very likely that all of these operations that I'm showing you here are going to be much, much faster on NumPy arrays than on Python lists.
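A short sketch of the slicing behaviour and the element-wise operations described above. NumPy's own documentation refers to such a window as a view. The .copy() call at the end is an addition for contrast and is not shown in the video.

```python
import numpy as np

my_array = np.array([1, 2, 3, 4])

# A slice is a window (a view) into the original array, not a copy.
my_slice = my_array[1:3]
my_slice[0] = 100          # this also changes my_array[1]

print(my_slice)
print(my_array)

# Arithmetic is applied to every element and returns a new array.
doubled = my_array * 2
shifted = my_array + 5
squared = my_array ** 2

# If an independent copy is needed, ask for one explicitly.
independent = my_array[1:3].copy()
independent[0] = -1        # my_array is left untouched
```

Because views share memory with the original array, slicing stays cheap even for very large arrays, at the cost of having to be careful with in-place assignments.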
Of course, for the four elements that we have here, it's impossible to notice any difference. But if we have thousands or tens of thousands or hundreds of thousands of elements, it would likely become quite noticeable. If we do my_array times my_array, it's the same as squaring it, so it's applied element-wise. And if we do np.dot and pass it two arrays, in this case the same array twice, we get this value here. This is basically the sum of this. If we run this, see, we get the same value. This is called the dot product. Ok. This will be important later on when we discuss multidimensional arrays, but for now, just know that it exists. And actually we can also do it like my_array.dot and pass in another array here, like so. I'm just going to pass in the same array now, and we get the same value. So this is it for the introduction to NumPy arrays. As you can see, they're very similar to Python lists, and we'll move on to a bit more advanced stuff in the next videos.
30. Matrices: In this video, we are going to talk about matrices, or 2D arrays, or two-dimensional arrays, or even more generally speaking, multi-dimensional arrays. What those are is an extension of lists. Basically, if you're familiar with other languages, they are vectors of vectors or arrays of arrays. For example, in pure Python, we could do something like mat equals a list of lists: [1, 2, 3], and we can write these on new lines, [4, 5, 6], [7, 8, 9]. This is a matrix. It's a two-dimensional matrix with three lines and three columns. And if we add something, let's say a 0 to each line, this would be three lines and four columns. We've written it similarly to how you might have written this in algebra class in school on paper. If we print this, we get it like this. The disadvantage of the pure Python one, other than efficiency, is that we can't really manipulate it very well. We can do stuff like mat[0][2].
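For the dot product from the arrays video above, a small sketch showing that it equals the sum of the element-wise products. A fresh my_array is used here rather than the one modified through the slice.

```python
import numpy as np

my_array = np.array([1, 2, 3, 4])

# Element-wise product: each element multiplied by itself.
elementwise = my_array * my_array

# The dot product adds those products up: 1*1 + 2*2 + 3*3 + 4*4.
dot_function = np.dot(my_array, my_array)
dot_method = my_array.dot(my_array)

print(elementwise)
print(elementwise.sum(), dot_function, dot_method)
```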
And we'll get 3. Why this works is that mat of 0 will get us this list, and that list of 2, so counting 0, 1, 2, will get us 3. But we can't use slicing in both dimensions. For example, if we wanted to get a sub-matrix, say this sub-matrix here, so 2, 3, 5, 6, 8, 9, we can't really do that easily. And there are other things we can't do very easily. So, NumPy to the rescue. NumPy makes matrices first-class citizens, which means that they are actual objects. They are not this hack of making them a list of lists. They are, as I said, proper objects with their own methods, with their own slicing across each dimension, and with other functions and methods that can be applied on them efficiently and in a user-friendly way. So let's see that. Import numpy as np. Let's say np_mat equals np.array, just like before, and we will pass in mat to it. Okay? Let's show it. It also looks nicer here. It's formatted in a nicer fashion and it's easier to read. It looks very similar to how you would write this with pen and paper in a notebook. Alright, so let's show the indexing I was telling you about. First of all, in order to access an element, we can do 0, comma, 2 here. And oh, my bad, it's not np here, it's np_mat. And we get 3, just like before. So we no longer have to use this clunky syntax like we had to in pure Python. Because this is a first-class citizen, NumPy knows that it has two dimensions and allows us to use this comma notation. Again, it's just like you would write on a piece of paper: np_mat, and as a subscript you would write 0, comma, 2. Okay? We also have the shape attribute, or property, or field, however you want to call it. Let me make sure I don't make the same mistake again: np_mat. Three by four: three lines by four columns. This is a Python tuple, so you can access it just like you would any Python tuple. Now let me show you the slicing I was telling you about. So let's see how we can get the sub-matrix here.
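The comparison between the pure Python list of lists and the NumPy matrix can be sketched like this. The names mat and np_mat follow the video; the matrix contents are reconstructed from the description, so treat them as an assumption.

```python
import numpy as np

# Pure Python: a matrix as a list of lists, three lines by four columns.
mat = [[1, 2, 3, 0],
       [4, 5, 6, 0],
       [7, 8, 9, 0]]

np_mat = np.array(mat)

# Pure Python needs two separate indexing steps.
print(mat[0][2])

# NumPy accepts both indices at once, separated by a comma.
print(np_mat[0, 2])

# shape is an ordinary tuple: (number of lines, number of columns).
lines, columns = np_mat.shape
print(np_mat.shape, lines, columns)
```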
Ok, so in order to do that, we will write np_mat of colon, to get all lines, and for all lines only get columns starting from column one and ending just before column three. So this will get the columns at index 1 and 2. And we get it here. It works just like in Python, across each dimension. So if I put :2 here, this will only get me the first two lines. So probably just 2, 3 and 5, 6. This allows us to use slicing across all dimensions. If this were a three-dimensional array, we would be able to use it across all three dimensions by putting in two commas here. And again, this does not make a copy. It's a window into the array. If we modify this, for example if we come here and say this is 0 and we show np_mat here, it changed these elements to 0. So this is very powerful, but you also have to be careful not to modify something you don't want to modify. That's it for matrices so far. We will introduce other concepts as we require them.
31. Random number generation: In this video, we are going to talk about random number generation. Generating random numbers is an important ability when doing data science, machine learning and statistics. A lot of the data structures and algorithms that are used often in these fields need to be initialized with proper random numbers, and NumPy lets us do this quite easily. In order to do that, we are first going to import NumPy. And here is how we can generate a floating point number between 0 and 1. This is actually a closed interval at 0, so this can also return 0, and an open interval at 1, so this cannot return 1. Ok, and if we keep running this cell, you'll see we get a different number every time. This is often used when dealing with probabilities. For example, if you want something to execute with a 50% probability, you will check if the random number is smaller than 0.5 and only execute the necessary code in that case.
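For the matrix slicing from the matrices video above, a brief sketch. The matrix values are reconstructed from the description and the names sub_matrix and window are hypothetical; the .copy() call is only there so the original sub-matrix values survive the later assignment.

```python
import numpy as np

np_mat = np.array([[1, 2, 3, 0],
                   [4, 5, 6, 0],
                   [7, 8, 9, 0]])

# All lines, columns 1 and 2 (the stop index 3 is excluded).
sub_matrix = np_mat[:, 1:3].copy()

# Only the first two lines, same columns. This is a window, not a copy.
window = np_mat[:2, 1:3]

# Assigning through the window modifies np_mat itself.
window[:] = 0

print(sub_matrix)
print(np_mat)
```

Writing np_mat[:2, 1:3] = 0 directly has the same effect; the intermediate name just makes the window explicit.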
We can also generate integers by using the np.random module, which is where you will find all the random number functionalities of NumPy. So if we run this code here, it will generate 3, 4, 1, 0. See, 0 again, 3 again, and so on, so they can repeat. And if we bring up the documentation for randint, which we can do by pressing Shift+Tab, it says here that it returns random integers from low, inclusive, to high, exclusive. So this will generate random integers up until five, but not including five. Okay? If we only pass one parameter, this is going to generate random numbers starting from 0. As we've seen, we can also get 0. And if we pass in two parameters like I did here, it's going to generate them between five, inclusive, and up until nine, inclusive. So ten cannot be generated here. Okay? We can also pass in sizes to these functions. For example, if we do np.random.rand of 3, 5, we are going to get an array. This is a NumPy array (we've talked about those in the previous videos) with three rows and five columns, containing random floating point numbers between 0 and 1. Okay? And if we bring up the docstring here, you can see this accepts a shape as parameter: d0, d1 and so on is the given shape. And if we pass in, let's say, 4 here, of course this is much harder to understand. Usually, when working with arrays of more than two dimensions, we don't really need to be able to visualize them. So this is basically a three by five by four array. You can imagine it as a list of matrices. Ok? If you look here, this is a matrix, this is a matrix, and this is another matrix. So yes, you can imagine it like this, but don't worry if you're not able to fully visualize it. We can also do the same for random integers, but here we need to pass in this size parameter like so, because the first two parameters correspond to the interval in which to generate the integers.
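The calls discussed so far in this video can be sketched as follows. The variable names are hypothetical, and the seed line is an addition made only so that repeated runs give the same numbers; the video does not set a seed.

```python
import numpy as np

np.random.seed(0)  # optional: makes the "random" numbers repeatable

# One float in the interval [0, 1): 0 is possible, 1 is not.
u = np.random.rand()

# Run something with roughly 50% probability.
if u < 0.5:
    outcome = "executed"
else:
    outcome = "skipped"

# Integers: one argument means 0 up to (but not including) 5.
small = np.random.randint(5)

# Two arguments: from 5 inclusive up to 10 exclusive, so 5..9.
between = np.random.randint(5, 10)

# Shapes: rand takes the dimensions directly...
floats_2d = np.random.rand(3, 5)
floats_3d = np.random.rand(3, 5, 4)

# ...while randint needs the size keyword, since its first two
# arguments are already used for the interval.
ints_2d = np.random.randint(5, 10, size=(3, 5))

print(u, outcome, small, between)
print(floats_2d.shape, floats_3d.shape, ints_2d.shape)
```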
Another often used random number function is the choice function, which accepts a Python list and a size. It will return this many, so in this case three, random elements from the given list. They can also repeat with these parameters. Let me see if I can get it to repeat. See, it got 2 twice. And if we bring up the documentation, we have this replace parameter here, which we can change in order to make it not return duplicates. We are going to make use of these functions later on when we implement actual algorithms.
32. Statistical analysis and computation: In this video, we are going to talk about some statistical analysis and computation using NumPy. I'm only going to be presenting the very basics. You can read more about statistical functions on the NumPy documentation site, which I've linked here. But don't go into it too much right now, because we will make use of various things later on, and they will become clearer then anyway. To get started, first we're going to import NumPy, of course. Then I created a random matrix here and assigned it to the x variable. So let's just run this again. Most statistical functions are available directly in NumPy. For example, finding the minimum of x. This might seem like something very basic, so why even call it a statistic? But it actually is a statistical function. If we run this, we get the minimum out of the whole array. So the minimum value is 0.01 here, and if we look carefully, we can find it here. Now, there are other cool things you can do. For example, what if we wanted the minimum in each column? Well, in that case we would pass in an axis, which would be equal to 0, so axis equals 0. What this gives us is an array of length 1, 2, 3, 4, 5. And if you notice, we have five columns. So each of these values represents the minimum in the corresponding column. If we look at it, we get 0.55 here. And here it is. You can notice it's the minimum in this column here.
And it's the same for 0.13, 0.011, 0.537 and 0.40 here. Okay? If we want the minimum across each row, we are going to use axis equals 1. You can check it for yourself: these are the minimums across the rows. This is going to be used very often, passing in the axis parameter as 0, 1, or even 2 or 3 for certain more complex applications. So make sure that you understand it really well. As a short exercise for yourself, maybe declare an array of three dimensions and see how it works for that one. We have other functions as well. For example, max to get the maximum. For something a bit more complex, like the standard deviation, we have std, and so on. We can also do a sum across an axis, and this will get us the sums of each row, basically. Something else that you might find useful is the arg set of functions, for example argmax. This will give you the index of the maximum value. For example, the index of the maximum value on the first row is 0. So this one, and you can check that it is indeed the maximum value on the first row. It's 4 for the second row, so the last element in the second row is indeed the maximum value out of that row. This can also come in very handy later on. So that's it for statistical functions. You can read more about them in the documentation, like I said. And I actually recommend this exercise: try to read the documentation and run a bit of the code presented there, to make sure that you understand it and to familiarize yourself with reading and making proper use of the given documentation. It's very well written, and NumPy is a very well-documented library. It's very, very useful to be able to make good use of it and to take code from there and apply it to your own needs and requirements.
33. Linear algebra: In this video, we are going to introduce some linear algebra features of NumPy. First of all, we're going to talk about the linspace function, or the linear space function.
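For the statistical functions from the previous video, a small sketch with a fixed matrix instead of a random one, so the results can be checked by eye. The values are made up for illustration and are not the ones shown on screen.

```python
import numpy as np

# Hypothetical 3 x 5 matrix standing in for the random one from the video.
x = np.array([[0.9, 0.2, 0.5, 0.7, 0.4],
              [0.1, 0.8, 0.3, 0.6, 1.0],
              [0.5, 0.4, 0.9, 0.2, 0.3]])

overall_min = x.min()          # smallest value in the whole array
column_mins = x.min(axis=0)    # one value per column (5 values)
row_mins = x.min(axis=1)       # one value per row (3 values)

overall_max = x.max()
spread = x.std()               # standard deviation of all values
row_sums = x.sum(axis=1)       # sum of each row

# Index of the maximum value within each row.
row_argmax = x.argmax(axis=1)

print(overall_min, column_mins, row_mins)
print(overall_max, spread, row_sums, row_argmax)
```

A way to remember the axis parameter: axis=0 collapses the lines and leaves one result per column, while axis=1 collapses the columns and leaves one result per line.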
If we run this code, you are probably already able to figure out what linspace does. It generates this many values, spread evenly between these two values. So if we want, let's say, 100 values, you see they are much closer to each other than before, because the interval remains the same. This will be useful in various algorithms and also for visualization purposes, when we want to display something like, let's say, a plot of a function in a Cartesian system. You know, the one with X and Y axes and something drawn between them. It will be very useful for making that plot and other similar plots, and also as an intermediary step in various algorithms that require values to be computed for a function over a certain interval. So keep this in mind and make sure you understand how it works. Now, moving on to something a bit more interesting. Let's see how we can use NumPy to solve this system of equations. So let's write them like that, and like this. This is easier to see. This is probably something you've done in high school or college, where you were given such a system of linear equations and you had to provide a solution for x0 and x1. So let's see how we can use NumPy to do that. Well, first of all, let's declare our coefficients matrix. This will be a NumPy array containing a two-dimensional array, so a matrix basically, in which the first line will be 3, 2, so this and this, and the second line will be 5 and minus 1, so this and this. Alright, now let's declare our results array, which will be 5, 2. So basically the results of the two equations. This is also going to be a NumPy array, containing 5, 2. All right? The solution will be obtained by calling np.linalg, or linear algebra, dot solve, to which we simply pass our coeffs and results arrays. Let's also show the solution. We get this: 0.69, 1.46. This means that x0 equals 0.69 and x1 equals 1.46.
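The system described above, 3*x0 + 2*x1 = 5 and 5*x0 - x1 = 2, can be sketched like this. The names coeffs, results and solution are assumed from how they are spoken in the video. A quick way to confirm the answer is to multiply the coefficients by the solution and compare with the results.

```python
import numpy as np

# Each line holds the coefficients of one equation.
coeffs = np.array([[3, 2],
                   [5, -1]])

# Right-hand sides of the two equations.
results = np.array([5, 2])

# Solve coeffs @ [x0, x1] = results.
solution = np.linalg.solve(coeffs, results)
print(solution)

# Check: the matrix product of coeffs and solution should give results back.
check = coeffs.dot(solution)
print(check, np.allclose(check, results))
```

np.allclose is used for the comparison because floating point results are rarely exactly equal to the expected integers.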
And if we plug this into the two equations, we should get 5 and 2 respectively. That's what I've done here. So let's double-check. You'll see we do indeed get 5 and 2. But now, what happens if we have many more equations and coefficients? Let's say we have 100 equations with 30 coefficients. Not 30 coefficients, sorry, but 30 variables. That means that the coefficients will be an array of 100 by 30, and it will be very tedious to double-check that in this manner. So what we can do is realize that this solution basically means that if we do coeffs times solution, as a matrix multiplication, we will get the results. So we can simply compute this as a matrix multiplication using NumPy and see if we indeed get the same thing as results. In order to do that, we can print coeffs.dot of solution and see what this gives us. And you can see it does give us 5 and 2. dot computes the matrix multiplication between the left-hand side and the parameter given to it. And this is how you can double-check the returned solution to any system of linear equations. Hopefully this shows you how powerful NumPy's linear algebra module really is and the kind of things you can do with it.
34. Interpolation: In this video, we are going to talk about interpolation. You can read more than what we are going to show here at this link. In fact, the code in this video is adapted from this link. This is the official NumPy documentation, and it presents and explains the NumPy interpolation feature. So first of all, we're going to import NumPy just as before. We are also going to import this matplotlib.pyplot module as plt, which we are going to use to display these graphs here, so these function plots. I'll explain how it works. Ok, so what is interpolation? Interpolation lets us estimate the values of a function by giving it other values. So let's say that we know the values of a function at ten points, and we need to find the values at 100 points.
Interpolation lets us do this, and here is how we're going to simulate this scenario. First of all, we create a linear space using linspace, of ten values between 0 and 2 times pi in this case. Then we compute the cosine, using np.cos, of those values generated with the linspace function in NumPy. So this computes the cosine of the ten values between 0 and 2 times pi generated above. All right? And then we have this interpolation part. But let's not worry about it yet. I'm just going to comment it out, and I'm going to just plot what we've generated here. So if I run this cell, I get this. This is the plot of the cosine function applied on these ten values. If you count these circles here, there are ten circles, okay? And we can see that if we change it to sine, for example, the plot also changes. Now, let's say we want to find what the values of the sine function are at these points here between the circles, so here and here and here and so on. Basically, we want to find the cosine of other values that are different from those generated here, and that's where interpolation comes in. So in xvals, I'm generating another set of 50 values this time, in the same interval, so between 0 and 2 times pi. And I am using np.interp, which is the interpolation function, to estimate the function at the generated points xvals, where x and y are what we know about the function. So x is where the function has been computed, and y is the result of those computations. So it's these ones here. And if we plot that as well, we get this nice graph. You can see that it really does resemble the sine function, or the cosine function if you change sine to cosine here. This is used in order to be able to generalize functions when you have little data about them. And see, if we pass in, for example, 200 here, this becomes an even better approximation of the cosine function. Or if we had 20 here. Again, now it's much, much smoother.
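The interpolation steps above can be sketched without the plotting part as follows. The names x, y and xvals follow the video; yinterp and max_error are hypothetical names. The error measurement at the end is an addition, only there to show numerically how close the estimate is.

```python
import numpy as np

# Ten known points of the cosine function between 0 and 2*pi.
x = np.linspace(0, 2 * np.pi, 10)
y = np.cos(x)

# Fifty points in the same interval where we want estimates.
xvals = np.linspace(0, 2 * np.pi, 50)

# np.interp(points_to_estimate, known_x, known_y) does linear interpolation,
# i.e. it connects neighbouring known points with straight lines.
yinterp = np.interp(xvals, x, y)

# Compare against the true cosine to see how good the estimate is.
max_error = np.max(np.abs(yinterp - np.cos(xvals)))
print(max_error)
```

Increasing the number of known points in x (for example from 10 to 20) shrinks max_error, which matches the smoother curve seen in the plot.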
It's an even better approximation. But even with just ten values, this is still a very decent approximation if we have nothing else available. So this is where the linspace function is useful, and this is how the interpolation function in NumPy works. I have another example down here where I've written my own function. It's a very simple function with some square roots, some powers of x and just some random computations. And again, I generate ten values in an interval starting at 0 in this case, apply my function to them, then interpolate for 50 more values, or let's make it 200 again. If I run this, you can see it nicely approximates my function at these 200 values. This is going to be very important later on when we'll be doing regression. Interpolation is kind of similar to regression; they're very close to each other, conceptually speaking. And we're going to be seeing that next. So make sure you understand this video really well. 35. Linear regression from scratch using numpy: In this video, we are going to talk about linear regression from scratch using NumPy. You might have heard this term tossed around in various articles about artificial intelligence or machine learning: linear regression, or regression in general. Hopefully the examples we'll go through will explain what this is in an intuitive way. Alright, so first of all, we're just importing numpy and matplotlib. And we generate some random data of size 100 by two. So we will have 100 rows of data, and those 100 rows will have two columns. The data will be between 5 and 20, so each of these 100 by two numbers will be in the range between 5 and 20, and they will be floating point numbers. Next, we generate y as the sum of the two columns generated in x. So this means take all the rows and column index 0, and this means take all the rows and column index one. So this will sum the columns, resulting in an array of size 100 by one.
And we reshape that to really be 100 by one. I did say it's 100 by one, but actually, let's run the code without the reshape. If we print the shape of this, you see it's 100 by nothing. That means it's a vector, or a linear array. We want a 1 to show up here, so we have to reshape it like this in order to get that. The minus 1 is kind of like a wildcard: NumPy puts whatever it needs to put there such that the total number of elements still matches, if that's possible; if it's not possible, it will throw an error. So by putting a 1 here, it will result in 100 by one. Okay. And then we add some random data, again of shape 100 by one, this time between 1 and 5, and this is kind of like noise, to make the data noisier. And now just imagine that column 0 of x means something like how many rooms a house has, and column 1 means something like how much was invested in decorations. Okay? And now you can imagine y. So let's say this is i here. Then x of i, 0 will mean how many rooms house i has, x of i, 1 will be how much was invested in decorations for house i, and y of i will be the price of house i, the market price. And this can be in thousands of dollars; it doesn't really matter what it is. It's not a very realistic example. It's just something to make you understand how regression works and what it can be used for. Okay? And now we want to find a formula that approximates y, or predicts y, based on x. That's why we did this. So obviously that formula will have to deal with how many rooms the house has, how much was invested in it, and maybe other things as well in a real-world scenario, but for now just those. And in order to come up with something that kind of makes sense, that's why I summed these two and added this to make it a bit noisy. Noise in data basically means the stuff that deviates from the obvious.
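The data generation and the reshape described above might look roughly like this. It is a sketch under assumptions: the names X, y and column_sum, the fixed seed, and the use of a NumPy random generator with uniform are choices made here for illustration, since the transcript does not show which random function the video uses.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed: an assumption, for repeatability

# 100 houses, 2 features each, floating point values between 5 and 20.
X = rng.uniform(5, 20, size=(100, 2))

# Summing two columns gives a flat vector of shape (100,), not (100, 1).
column_sum = X[:, 0] + X[:, 1]
print(column_sum.shape)  # (100,)

# -1 lets NumPy infer that dimension; 1 forces a single column.
y = column_sum.reshape(-1, 1)
print(y.shape)  # (100, 1)

# Noise between 1 and 5, with the same shape as y.
y = y + rng.uniform(1, 5, size=(100, 1))
print(X.shape, y.shape)
```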
So it's obvious that the price is somehow proportional to x of 0 and x of 1. Okay? But it's not a clear proportionality, right? It might differ for reasons such as the owner wanting to sell for a bit more and being willing to wait, or wanting to sell for a bit less to get it over with as quickly as possible. So that's what this noise deals with here. Okay? And it's also there to make it a bit harder to come up with an exact formula. The formula will never be exact. It will not be able to give the price exactly. That's why it's called an approximation. OK, and this print statement here shows that we get the proper shapes: 100 by two in x, so we're dealing with 100 houses here, and we have 100 prices in y. I've also plotted this sum. So basically this is what we are trying to predict, okay? And the way this plot works is: on the x axis, here, we have the sum, and on the y axis, here, we have y, so the price. So basically we have the sums of the two columns, which are also called features in machine learning and data science. If the sum is, say, 20, we would move up here. And this point here, you see it's about maybe 21. Okay? So if the sum is 20 for those two features, the house price will be around 21. OK, and so on for the others. And now we want to be able to draw a line here that minimizes the sum of the distances of each of these points to that line. And that's basically linear regression. The way we do that is we write y equal to w times x, OK. And w will be a coefficients matrix that we have to find. x we already have, y we already have. Now how do we find w? Well, we do it by manipulating this equation. OK? So the first thing we must do is isolate w in terms of x and y, because we know those. In order to do that, it first helps to write the shapes of all of these. So y is n by one, and n is 100 in our case. We don't know w, so let's leave it for the end. And x is n by two.
Okay, so we need this multiplication to give us n by one. And if we look at it, this is never going to happen, right? Because x is n by two, w times x would have two columns, and w would need n columns for the multiplication to even work. So we can't write the equation like this. What we can do when this doesn't work is to switch them around, okay? So y equals x times w. Now this is n by two, and this can be two by one. So w can be two by one. OK, so now it's doable. Don't try to remember the order here. Many times I don't remember it either; just remember how to check it. All right, and now I said we need to isolate w. In order to isolate w, we need to multiply this equation by the inverse of x, so x to the minus one. And recall that x is a matrix, so w will be a matrix as well. What kind of matrix is x? Well, if we scroll up, it's a matrix of size 100 by two. And that's not good, because only square matrices can be inverted. Only square matrices have an inverse, okay. So first of all, we need to multiply this by x transpose, because x transpose times x gives us a square matrix. Alright? x transpose will have shape two by 100, so we can multiply x transpose by x, resulting in a two-by-two matrix, which we can invert here. So first we multiply by x transpose, and we get x transpose times y equals x transpose times x times w. We can write this like so. And now this has an inverse. So we multiply by the inverse, and we get the inverse of x transpose times x, times x transpose times y, is equal to w. Because if we multiply this with this, we get the identity matrix, which is the neutral element in matrix multiplication, so we are left with w. And this is how we get the regression formula for w. That's what I've written here. We can get the transpose of a matrix with x.T in NumPy. We multiply that with x, and we use np.linalg.inv to find its inverse. If we run this, we get these shapes, which I've printed so that you can check them. And this is w.
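The formula derived above, w equal to the inverse of x transpose times x, times x transpose times y, can be sketched as follows. The data is regenerated here with assumed names and an assumed fixed seed so the snippet runs on its own; XtX and y_hat are names introduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed, for repeatability
X = rng.uniform(5, 20, size=(100, 2))
y = X.sum(axis=1).reshape(-1, 1) + rng.uniform(1, 5, size=(100, 1))

# X.T is (2, 100) and X is (100, 2), so X.T.dot(X) is a square (2, 2) matrix.
XtX = X.T.dot(X)

# w = inverse(X^T X) . X^T . y, which has shape (2, 1).
w = np.linalg.inv(XtX).dot(X.T).dot(y)
print(XtX.shape, w.shape)

# Multiplying X by w gives the estimated prices, shape (100, 1);
# these are the values the regression line is drawn from.
y_hat = X.dot(w)
print(y_hat.shape)
```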
I've plotted that regression line here. It's the red one. And you can see that it goes through the middle of the data quite nicely, which means that it's a good regression line. And this is how I've done it: by multiplying x with w, just as in the formula here. So now we no longer plot y directly. We estimate it using this red regression line. Now, when is this useful? Well, this is useful if we have a bunch of data about the sums that houses have sold for and their features, in this case the number of rooms and how much was invested in decorations, and we have a new house that we must deal with. We're interested in how much we could potentially sell that new house for. Let's say that new house has 7.6 rooms, whatever that means. Again, it doesn't have to make perfect sense here, because it would make perfect sense in a real-world problem. And 9 was invested in decorations; maybe this is 9, or 9 thousand, whatever your specific problem deals with. So if we sum these two, we get 16.6, right? And 16.6 is somewhere around here. If we move up until we intersect the regression line, well, here, and then we move to the left, around here, this is probably around 19. So this should get us a result of around 19. Of course, moving with the mouse is not an exact science, so it could be a bit less or a bit more. So let's see. We do that by multiplying new x with w. Because these are matrices, we use dot, and we see we get 18.5. Like I said, this is very close to the 19 that we came up with by looking at the plot. So this is how you use regression. Now, this method that I've shown you here has its drawbacks. For example, linalg.inv might not be doable, because not all matrices are invertible. Another shortcoming is that if you have a lot of data, like hundreds of thousands of instances, not just 100, and a lot of features, not just two but maybe 100 features, it's very slow and very memory hungry to do it this way.
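Predicting the price of the new house can be sketched like this. The names new_x and predicted_price and the fixed seed are assumptions, and because the data is random the exact number will differ from the 18.5 seen in the video.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed, for repeatability
X = rng.uniform(5, 20, size=(100, 2))
y = X.sum(axis=1).reshape(-1, 1) + rng.uniform(1, 5, size=(100, 1))

# Fit w with the formula from the video: inverse(X^T X) . X^T . y
w = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)

# One new house: 7.6 rooms and 9 invested in decorations, as a (1, 2) matrix.
new_x = np.array([[7.6, 9.0]])

# (1, 2) dot (2, 1) gives a (1, 1) matrix holding the predicted price.
predicted_price = new_x.dot(w)
print(predicted_price)
```

As a side note, np.linalg.lstsq(X, y, rcond=None) solves the same least-squares problem without forming the inverse explicitly, which can help when the inverse is the weak point.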
And you might not even get the best results because of this inverse problem. In that case, there is an iterative process to do this called gradient descent, which we will talk about later. But I encourage you to read up about it on your own and maybe even try to implement it here. This isn't all there is to linear regression. We will get back to it anyway, so don't worry if you don't fully understand things yet. 36. Neural network from scratch in numpy: In this video, we are going to talk about implementing a neural network from scratch in NumPy. First of all, I want to say that this is an advanced topic, and we don't have time to explain the basics of neural networks in this video. So I do suggest that you go here and read the theory. It's very well explained there, and it even lets you skip a lot of the math and go to the intuitive understanding part of it. So I do suggest you read up on it before watching this video. Or, if you don't want to do it beforehand, make sure that you do it after you watch and after you see what the code looks like. It's important to at least get the gist of it and understand it at an intuitive level, so that you have some idea about what's going on and where the formulas that we will be talking about come from. Okay, so we will code an ANN here, that is, an artificial neural network that will learn the XOR function. And this is the truth table for the XOR function. The XOR function works at the bit level: if we XOR an object with another, it will XOR the corresponding pairs of bits. So 0 XOR 0 will be 0, 0 XOR 1 will be 1, 1 XOR 0 will be 1, and 1 XOR 1 will be 0. This is also called the exclusive OR function. So just like before, we import numpy and matplotlib, since we will be plotting a few things. And we define here the activation function, which is the sigmoid in our case, which by definition is equal to this.
And we will also need its derivative, which is written nicely as sigmoid of x times one minus the sigmoid of x. Our training data will basically be the truth table above. So these are the pairs that get XORed, and these are the results. We reshape this with minus 1, 1, just like in the regression video, to give it two dimensions. So it will be a four by one array, not a vector of size four. We declare here the number of neurons in the hidden layer, the hidden layer size variable. We will only have one hidden layer, and I've set it to 20 neurons, but this is something you can play around with. And I've set the learning rate to 0.9. Next we instantiate the weights of our neural network. We will basically have two layers, and I'll explain what those are in a minute, because I did say that we only have one hidden layer. We initialize these with random uniform numbers. The input size of the hidden layer must be equal to what comes into it, and what comes into it is always the input, so this. We can generalize this by taking x train dot shape of one. So it starts with the number of features, which in our case is two, the two bits. And its output is the hidden layer size, so we put that in here, because this is what it outputs. So w0 basically corresponds to the hidden layer. And w1 will correspond to the output layer, which takes as input what the hidden layer outputs. So we have hidden layer size here, and it outputs one element, which will be the answer to our problem: the result, as the neural network sees it, of the XOR operation between the two bits that we give it. So the goal here is for the neural network to learn the XOR function without memorizing the training data we give it, and also without knowing that there is simply an XOR operator that we can use in Python. Okay? And the learning rate is, again, something you can play around with.
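The setup described above (activation, training data and weight shapes) can be sketched as follows. The identifier names, the fixed seed and the uniform range of -1 to 1 for the initial weights are assumptions; the transcript only says that the weights are random uniform numbers.

```python
import numpy as np

def sigmoid(x):
    # Sigmoid activation: 1 / (1 + e^(-x)).
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    # Derivative of the sigmoid: sigmoid(x) * (1 - sigmoid(x)).
    s = sigmoid(x)
    return s * (1.0 - s)

# XOR truth table: input pairs and expected outputs.
X_train = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_train = np.array([0, 1, 1, 0]).reshape(-1, 1)  # (4, 1), not (4,)

hidden_layer_size = 20
learning_rate = 0.9

rng = np.random.default_rng(0)  # assumed seed, for repeatability
# w0: input features (2) -> hidden layer (20); range -1..1 is an assumption.
w0 = rng.uniform(-1, 1, size=(X_train.shape[1], hidden_layer_size))
# w1: hidden layer (20) -> single output.
w1 = rng.uniform(-1, 1, size=(hidden_layer_size, 1))
print(w0.shape, w1.shape)
```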
Train epochs is how many times we are going to present the training data to the network in hopes of it learning. We have this at 2 thousand here. Again, this is something that you can play around with for best results. Errors is a list where we will put the errors given by the neural network during training, and I'll explain what those are. This is for plotting purposes. You don't really need it for the actual functionality of the neural network. The training algorithm of neural networks involves two steps: the feedforward step and the backpropagation step, which uses gradient descent. In the feedforward step, we simply do some matrix multiplications between the layers and apply an activation to them. In our case, the activation is the sigmoid function. So layer 0 is the input, x train. Layer one is the hidden layer, which is obtained by applying sigmoid to the input data multiplied with w0. Again, if you look at this, it's very, very similar to what we had in the linear regression video. You can think of neural networks as a generalization of regression. They are non-linear, and they kind of work by putting the regression algorithm into a neuron and multiplying those neurons into the layers I'm talking about. But more on that in the link I've given you. We do the same for the output layer, which is layer two. In our case, it's a matrix multiplication between layer one and w1, to which we apply an activation function, also called a non-linearity, which is again the sigmoid function in our case. After that, we move on to the backpropagation stage. First, we must compute the error of the neural network. That is a measure of how wrong the neural network was relative to our correct labels, or correct outputs, and those are stored in y train. So the layer two error, or the output error, is simply the difference between y train and what the output layer has outputted, which is layer two in our case.
And the delta is the layer two error times the derivative of the sigmoid function applied to the output layer, or layer two in our case. For more on where this comes from, you will have to read the theory in the link I've given you, or in some similar material if you prefer to read it somewhere else. I'm not going to go into details about this. It involves some math and some calculus regarding the chain rule and how to compute derivatives. We do the same for the layer one error, which is the layer two delta multiplied by w1, of which we take the transpose, because otherwise the multiplication wouldn't work. Again, you can read more on the mathematical reasons behind this in a theory source. And the delta is computed very similarly. Now we must update w1 and w0, which are our weights, with the information we've computed above. For w1, that is layer one transposed multiplied with the layer two delta, times the learning rate. The learning rate basically controls the influence of this part. And we do the same for w0. Next we compute a metric for plotting, and I've used the mean absolute error. That is computed by taking the mean of np.abs of the layer two error, the one computed here. Okay? You can imagine it as how far off each sample is compared to the known correct answers, which are given in y train, okay? We can take the absolute error here, for plotting. But it's not a good idea to take the absolute error up here, because it's harder to compute the derivative of this whole thing that way. This is called a cost function, and it must also be differentiated. OK. We also need its derivative in computing all of this. Okay? The predict function I'll explain in a bit. We'll skip it for now. And here we plot the epochs on the x axis and the mean absolute error on the y axis. We predict on x train, print the predicted values.
And finally, we print the final accuracy, which is given as the sum of all positions where y train is equal to predicted, divided by how many samples we have, or y train shape of 0, and multiplied by 100 so that we get it as a percentage. So before I explain any further, let's run this and see what happens. Okay, so we get this nice graph, and we see that as the epochs advance, so as this for loop moves forward, the error, or the mean absolute error in our case, decreases. It stabilizes at around 0.1 or just below 0.1. OK? And if we look at what is being predicted, we see that it's the same as our y train. So our neural network has successfully learned to apply the XOR function correctly if given two bits, each a 1 or a 0. And if we keep running this, we see that it does change a bit, because we initialized the neural network with random numbers, but not significantly. You should always, or almost always, get 100% accuracy. Now, I owe you an explanation for this predict function. A prediction is what happens when the neural network sees some new data and must provide an answer to it based on what it has learned. We do that by basically duplicating this feedforward stage. But a sigmoid will return a floating point number, so whatever this output layer returns, we round it, right? If it's above 0.5, it will become a 1, and anything below will become a 0. Okay, so that's how we get the bits out: by rounding what the sigmoid of the output layer gives us. So if it gives us 0.9, we interpret that as a 1. Now, this might not be the best way of doing it. It's definitely not production-quality code. You might also notice, if you've read a bit of the theory, that we are missing the bias terms and a few other things as well. Maybe this should also not be a regression problem, because we've treated it like one by using this here. So this is not a cost function that is normally used in such a classification problem, where we should classify the two bits into 0 or 1.
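Putting the feedforward, backpropagation and predict steps together, a network along these lines might be sketched as below. This is not the code from the video: the names, the fixed seed and the weight range of -1 to 1 are assumptions, and the sigmoid derivative is applied here to the values before activation (z1, z2), which is one common way to write it. Like the version in the video, it has no bias terms.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)

X_train = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_train = np.array([0, 1, 1, 0]).reshape(-1, 1)

rng = np.random.default_rng(0)  # assumed seed
hidden_layer_size, learning_rate, train_epochs = 20, 0.9, 2000
w0 = rng.uniform(-1, 1, size=(X_train.shape[1], hidden_layer_size))
w1 = rng.uniform(-1, 1, size=(hidden_layer_size, 1))

errors = []  # mean absolute error per epoch, kept only for plotting
for epoch in range(train_epochs):
    # Feedforward: matrix multiplication followed by the sigmoid activation.
    z1 = X_train.dot(w0)
    layer1 = sigmoid(z1)
    z2 = layer1.dot(w1)
    layer2 = sigmoid(z2)

    # Backpropagation: output error and delta, then hidden error and delta.
    layer2_error = y_train - layer2
    layer2_delta = layer2_error * sigmoid_derivative(z2)
    layer1_error = layer2_delta.dot(w1.T)
    layer1_delta = layer1_error * sigmoid_derivative(z1)

    # Weight updates scaled by the learning rate.
    w1 += layer1.T.dot(layer2_delta) * learning_rate
    w0 += X_train.T.dot(layer1_delta) * learning_rate

    errors.append(np.mean(np.abs(layer2_error)))

def predict(X):
    # Repeat the feedforward pass and round the sigmoid output to 0 or 1.
    hidden = sigmoid(X.dot(w0))
    output = sigmoid(hidden.dot(w1))
    return np.round(output).astype(int)

predicted = predict(X_train)
accuracy = np.sum(predicted == y_train) / y_train.shape[0] * 100
print(predicted.ravel(), accuracy)
```

The accuracy depends on the random initial weights, so it is worth rerunning with different seeds, as suggested in the video.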
This is usually used for regression problems, but it still works in this case, and it works quite well, as you've seen. It's definitely fast, because all we've done is matrix multiplications. And it's good enough for an introductory implementation. It can definitely be improved upon, and in other courses that focus on machine learning, you will get to do that. For now, I hope you understand this implementation. I suggest that you play around a bit with the train epochs, the hidden layer size and the learning rate to see how they affect the results, and maybe even to see if you can get the same kind of results in fewer epochs. It will definitely help your understanding if you do that. Another thing that will definitely help your understanding is putting in some print statements here to check the shapes of all of these intermediate results and of all of these variables. That will help you visualize what comes into a layer and what goes out. I hope this is somewhat clear right now. And I hope that you are comfortable with the code above. Even if you don't fully understand the mathematics behind it, I really hope that you can understand the code at least at a syntactic, if not at a fully semantic, level, because we will work with such code in the upcoming videos as well. 37. Vectorization: In this video, we are going to discuss vectorization. This is a term that you will hear often when working in machine learning and with NumPy in general. So let's consider this simple problem. We have A, which is a list of matrices, or a three-dimensional array if you want to think about it like that, because in NumPy that's actually what it's going to be. And we want to find the result of multiplying each element in A, so each matrix in A, with another matrix X. OK? The shapes of these will become obvious as I explain the code. So first of all, here we are going to generate some data of random integers.
So A is going to be a 3D array of shape N by P by Q, and X is what we are going to multiply each of these P by Q matrices in A by. As for its shape, since this is P by Q, what we multiply it with has to have Q rows, and here it is of shape Q by P. And I've set some values for N, P and Q here. You can mess around with these to figure out exactly how the performance is affected. All right? Now the problem is very simple, right? We just have to iterate up until N, which is A dot shape of 0. We could also replace this with N. And this here is an algorithm for matrix multiplication. The result is going to be in C: C of i will contain the result of multiplying A of i, which is a matrix, by X. Now, I'm not going to explain this algorithm, because I don't encourage you to use it at all. You can Google search for the matrix multiplication algorithm if you are interested in the algorithmic part of it, but this is not the focus here. I actually want to discourage you from using this. This is just to show you that it's very slow. And in order to show you that, you might have noticed this percent sign, percent sign, timeit command here, which must be the first line in the cell. This is called a magic command in Jupyter, and it times the execution of the code in the cell when you run the cell. The result will look something like this. What it will actually do is run the cell multiple times, average those results, and give you the mean and the standard deviation of those runs. Okay, and we can see here that the mean is 7.5 seconds plus or minus 70 milliseconds per loop. Okay? So it did seven runs of one loop each. Now, that is quite slow. As we will see, the optimal way of doing it is much, much faster. And if you increase N, P and Q here, it will get even worse. The way to do it properly is this vectorized solution, or this one-liner solution.
Vectorization means that we want to avoid writing for loops ourselves, and we want to rely on the built-in functions and algorithms in NumPy. Those are heavily optimized; they rely on implementations written in C, which are very, very fast, much faster than anything we can come up with in Python 99% of the time. They are also usually referred to as one-liners, because many times it's just one line of code that gets rid of one, two, three, or maybe even more for loops. And in this case it's simply A dot X. OK? We don't even need the dot shape here; if you comment out the timeit line, it will show the shape. Okay, so why does this work? We know that dot does indeed multiply two matrices. But in this case, A is a 3D array and X is a 2D array. So how come this is the same? Well, this is the same because of something called broadcasting. We have an N here, but these two last dimensions are compatible, in this case, with the dot function or method. What it will do is broadcast this operation to all of the elements of A, and it will have the same effect as what we have written here, and what we have written in the initial problem statement. That is, it will multiply A of 0 by X, A of 1 by X, and so on, and return the result. So if we run this, you will get something like this. It says here 4.98 milliseconds. Again, it says seconds up here, so this is way under a second. And it's also seven runs, but 100 loops each. This was 7.5 seconds with one loop in each run, and this had many more loops, 100 times more loops in each run, and it's still way under a second. So that's the kind of impact vectorization can have. That's the kind of performance improvement you can get by looking up, or by figuring out, how you can use NumPy built-in functions to replace code that you've written yourself using for loops or other repetitive structures. So always keep an eye out for these kinds of optimizations.
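The comparison between the loop version and the one-liner can be sketched as below. The sizes for N, P and Q are small hypothetical values chosen so the loops finish quickly, and the names C_loops and C_vec are introduced here for illustration; the video uses larger sizes and times each cell separately.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed
N, P, Q = 4, 3, 5  # hypothetical small sizes
A = rng.integers(0, 10, size=(N, P, Q))  # N matrices, each P by Q
X = rng.integers(0, 10, size=(Q, P))     # one Q by P matrix

# Loop version: C_loops[i] is A[i] multiplied by X, built entry by entry.
C_loops = np.zeros((N, P, P), dtype=A.dtype)
for i in range(N):
    for r in range(P):
        for c in range(P):
            for k in range(Q):
                C_loops[i, r, c] += A[i, r, k] * X[k, c]

# Vectorized version: dot sums over the last axis of A and the
# second-to-last axis of X, giving one P by P product per matrix in A.
C_vec = A.dot(X)
print(C_vec.shape)  # (4, 3, 3)
```

In a notebook, putting %%timeit on the first line of a cell holding either version is one way to compare the two timings.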
Always try to do some research and see what kind of useful NumPy features could work for whatever it is you're doing, and try to make use of them as heavily as possible, because you can improve the runtime of your programs by a lot if you do so. In the previous videos, where we implemented neural networks and linear regression, our code was already vectorized as much as possible, so we don't need to revisit those. But if you do find those implemented with more for loops than we have used, then be aware that those might not be fully vectorized implementations, and they may be very slow, or at least noticeably slower than what we've done. 38. Boolean indexing: In this video, we are going to talk about Boolean indexing. We've already talked a bit about NumPy indexing and the extra features it has compared to classical Python indexing. I'm going to show you a still more advanced form of NumPy indexing here, and that's called Boolean indexing. So first of all, let's declare an array like this. Notice here that I've created this array by passing in a range to it, and that works as well: NumPy can create arrays from Python's range function. Okay, so we have this array here. And here's something that we can do: if we run this instruction, we get this. So basically we take x wherever this expression is true and print the resulting array. And of course, if we just print this expression, we get something like this, which tells us at which positions x modulo two gives a remainder of 0, so the positions with even elements. You can look here to see that there is a match. Boolean indexing basically means passing in a Boolean expression like this as an index to an array, and the effect is that it will return only the elements for which that expression is true. And we can do some fun stuff with this. For example, we can increment those values here by 100.
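The Boolean indexing steps above can be sketched as follows. The array contents (the numbers 0 to 9) and the name mask are assumptions; the transcript does not show which range the video uses.

```python
import numpy as np

# An array built from Python's range; 0..9 is an assumed example.
x = np.array(range(10))

# The Boolean expression alone gives one True/False per position.
mask = x % 2 == 0
print(mask)

# Using it as an index keeps only the elements where it is True.
print(x[mask])  # [0 2 4 6 8]

# The same index can be used to modify those elements in place.
x[mask] += 100
print(x)  # even entries are now 100, 102, 104, 106, 108
```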
And we can also combine conditions, but you must be careful here to combine them using this np.logical_and function, like this. This will get us the elements larger than 100 that are still even, so basically the ones that we've added 100 to here. If you don't use this function, if you do something like this, you will get an error. Or if you do something that is actually a bit more intuitive, like this, you will get this error. So use this np.logical_and function if you want to combine conditions. Next, I want to show you how powerful indexing is in general, not necessarily Boolean indexing, but NumPy array indexing in general. I have here some code that implements the sieve of Eratosthenes, an algorithm that gets you all the primes lower than N. I'm not going to go into the details of the algorithm; you can find a lot of resources on it online. But basically, the naive implementation that you see in textbooks is something like the one here. And this is already quite fast, because I've still used NumPy for it. If we run this for N equal to 10 million, so let's say we want all the primes lower than 10 million, you can see it's quite fast, at three seconds on my machine. But can we do better? The answer is yes, we can do better. Let me just get rid of this line here first. If I run this same exact implementation, but using NumPy indexing, it's much faster, at well under a second: about 0.6 seconds for the same 10 million elements. And the only thing I'm doing here is replacing these for loops, this one for example, with this. What this will do, if you'll recall, is set x of 4, x of 6, and so on, all the way up until the last element lower than N, to 0. And it will do the same here with the multiples of p, starting at p squared, which basically gets rid of this for loop here. So you can see that it's much faster in general to use indexing where you can.
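Both ideas from this part, combining conditions and replacing the inner sieve loops with slice assignment, can be sketched as below. This is not the code from the video: the function name primes_below, the Boolean is_prime array and the use of np.flatnonzero are choices made here for illustration.

```python
import numpy as np

# Combining conditions: elements larger than 100 that are also even.
x = np.array(range(10))
x[x % 2 == 0] += 100
big_and_even = x[np.logical_and(x > 100, x % 2 == 0)]
print(big_and_even)  # [102 104 106 108]
# Python's plain `and` between two arrays raises a ValueError.
# (x > 100) & (x % 2 == 0), with both pairs of parentheses, also works.

def primes_below(n):
    """Sieve of Eratosthenes using slice assignment instead of inner loops."""
    is_prime = np.ones(n, dtype=bool)
    is_prime[:2] = False    # 0 and 1 are not prime
    is_prime[4::2] = False  # cross out 4, 6, 8, ... in one step
    for p in range(3, int(n ** 0.5) + 1, 2):
        if is_prime[p]:
            # Cross out the multiples of p, starting at p squared.
            is_prime[p * p::p] = False
    return np.flatnonzero(is_prime)  # indices that are still marked prime

print(primes_below(30))
```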
We've also used this np.zeros function here. I'm going to leave it as an exercise for you to take this separately and see exactly what it does and why we might have used it here. I'm sure you'll be able to figure it out. Just print its result, maybe on a smaller array, so that you can tell what's going on. And I hope this convinces you of how powerful indexing is and the kind of things you can do with it, and with Boolean indexing in particular. 39. Exercise 1: In this video, we are going to discuss your first exercise, titled matrix diagonals, because we are going to be dealing with diagonals in a two-dimensional array. So consider this matrix here, which shows up like this. You have to write code that returns an array of the sums of all diagonals that are above the main diagonal. So these elements make up the main diagonal, and you have to make an array of the sums of the diagonals above this one. In this case, for the matrix defined here, it would be this sum, then this sum, then this one, and so on. In order to do this, please use slicing and indexing to avoid as many loops as possible. So for this matrix here, this is the array your code should generate. Pause the video here, make sure you understand what is required, and give it your best shot. In a few seconds, I will continue with the solution. All right, so hopefully you have come up with something similar to the following. We create a list here, but we can easily make it into a NumPy array, by summing over x, which is our generated array, indexed with an arange between 0 and ten minus i and an arange between i and ten. And np.arange is just like Python's range function, except it returns a NumPy array. So how do we come up with this? Well, we can do it by figuring out what the code is, using similar indexing, for each diagonal, starting with the main one. So in this case, the first one would be x of np.arange(0, 10), and the second index would be np.arange(0, 10) as well.
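The indexing described so far can be sketched as follows. The 10 by 10 matrix built with arange and reshape is a hypothetical stand-in, since the transcript does not show the matrix from the video, and the names n, main_diagonal and diagonal_sums are introduced here.

```python
import numpy as np

# Hypothetical 10 x 10 matrix standing in for the one in the exercise.
x = np.arange(100).reshape(10, 10)
n = x.shape[0]

# Pairing row indices 0..9 with column indices 0..9 picks the main diagonal.
main_diagonal = x[np.arange(0, n), np.arange(0, n)]
print(main_diagonal)

# For the i-th diagonal above the main one, the rows are 0..n-i-1 and the
# columns are i..n-1; i = 0 is the main diagonal itself.
diagonal_sums = np.array([
    x[np.arange(0, n - i), np.arange(i, n)].sum()
    for i in range(n)
])
print(diagonal_sums)
```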
And if we run this, we get the main diagonal. Okay? Next, it would be 0, 9 here for the rows, and the columns would go all the way up to the last column. So because of that, we will have to write 1, 10 here. And we get from here to here. So to repeat, 0, 9 means the first nine rows, and 1, 10 means the last nine columns, or all columns except the first one, the one with index 0, and so on. If you write one or two more of these, you will figure out this formula. And we have only used one for loop to accomplish this. Even if you haven't used list comprehensions, as long as you've only used one for loop, this is fine. It's also fine if you didn't take the main diagonal itself into consideration, because it does say here all diagonals above the main one, but in the example the main one itself is also included. So these small details aren't very important. If you got something similar to this, congratulations. If not, make sure that you do understand this solution before moving on. 40. Exercise 2: In this video, we are going to talk about your second exercise for this module, and that is how to perform one-hot encoding. Before I explain exactly what you need to do, please do not Google this exact terminology, because you will easily find how to do it, and that's not really the point here. Try to come up with your own solution. Okay, so one-hot encoding basically means getting from this to this. So let's say that we have this array of integers between 0 and 9 inclusive, 100 of them in our case. Because we have an 8 here, we have an array of size ten here. Why ten? Because our range of values, 0 to 9 inclusive, has ten possible values. So for the first number, the element at index 8 is a 1. For the second one, the element at index 7 will be a 1, then the one at index 6 for the third one, the one at index 0 for the fourth one, and so on. And that's what you need to do. Given this array, you must write something that outputs this array.
And again, try to do this as efficiently as possible: use indexes, slicing, and so on. Try not to use for loops, or use as few of them as possible, in order to accomplish this. I've already set up some code here for you. This np.set_printoptions call is there in order to display the full array, without those ellipses that hide the numbers in the middle. This should help you experiment more easily. Again, I'm going to pause the video here for a few seconds. You pause it yourself, try to come up with something, then resume when you're ready and you want to see the official solution. All right, so hopefully you came up with something that at least works. And here's a very cool and efficient solution. We will start off by generating an identity matrix using np.eye. And this will have as its size x.max() plus one. In our case, this would be ten, but we are interested in a more general solution here, so I'm just going to use this. So if we print this, we get a matrix of size x.max() plus one by x.max() plus one, which is made up of all zeros except for the main diagonal, which contains ones. And now, in order to get what we are interested in, all we have to do is write identity indexed with x. And if we run this, we get the one-hot encoding that we are interested in. Another solution that works, and if you came up with that, that's also fine, is generating an array of all zeros using np.zeros, indexing that with x, and setting the corresponding values to one. If you came up with something like that, or if this sounds familiar, that's also great. Now, why does this work? Well, this works due to the way that indexing works for two-dimensional arrays and how broadcasting works. Feel free to search for more information online on the exact details, and watch the video about indexing to gain a better understanding. I hope this was a useful exercise for you.
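Both solutions described above can be sketched like this. It is a minimal sketch under assumptions: the input `x` is generated here with a seeded random generator rather than taken from the video, and the names `identity`, `one_hot` and `alt` are illustrative. The second variant is one way to read the "array of zeros" idea, not necessarily the exact code the instructor had in mind.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(0, 10, size=100)  # 100 integers between 0 and 9 inclusive

# Solution 1: index the rows of an identity matrix with x.
# Row k of the identity matrix is exactly the one-hot vector for value k.
identity = np.eye(x.max() + 1)
one_hot = identity[x]

# Solution 2: start from zeros and set one entry per row to 1.
alt = np.zeros((x.size, x.max() + 1))
alt[np.arange(x.size), x] = 1

print(one_hot[:3])
```

Using `x.max() + 1` keeps the sketch general, at the cost of producing fewer columns if the largest possible value happens not to appear in `x`.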
And I hope you are able to understand, and to look up more information yourself about, this solution and how to arrive at it. 41. Quiz: In this video, we are going to do your quiz for this section. We're going to mix things up a bit by changing the quiz format. It's no longer going to be a multiple-choice type of quiz. It's going to be a fill-in-the-blanks type of quiz. So let's do it. For the first exercise, you have to fill in the blanks so that this returns a NumPy array containing the elements 1, 2, 3, 4, 5. So not this, but this, with a five at the end. I'm going to pause for a few seconds here to let you answer the question. Pause the video yourself and try to make this work. Alright? So we notice that we must generate an array of consecutive integers, so we're going to use the NumPy arange function. And here, because we want to go from one to five, including five, we are going to write five plus one. And if we run this, we get the correct answer. Next, we must fill in the blanks so that this returns a NumPy array equal to 1, 1, 0. So this one here. Again, pause the video and try to figure it out. All right, here is the answer. So because we index this with lists of just three elements, we can write three here. And because this is the identity matrix, if we have 0 here, then in order to have a one in the first position, we must also have a 0 here. Because we have a one here in the second position, and we must get a one here in the output, we also need a 1 here. And finally, because we have a 0 here, we can write in here anything other than two, because if we wrote two, we would also get a one at the final position of the output. So let's go with 0, let's say. And if we run this, we do get the array we expected. And for the final question, fill in the blanks so that this modifies the NumPy array x such that the assertion on the final line passes. Alright? So let's see. First, we use randint to generate random integers.
And recall that randint takes a keyword parameter called size, so we have to pass in size here. Next, we have to index x with something and set that to one. Now, what is that something? Well, if we look at the assertion, it takes the sum of the slice of x between four and something. So it's likely going to be four and a colon here, and we'll see what comes after it. And this must be equal to seven minus something. Now what is that something? Well, since we have seven here, it stands to reason that seven should work here too. And then minus what? Well, if we set this slice to one, then its sum will be equal to how many elements are in it. And how many elements are in it? Well, seven minus four. And if we run this, we can see that it passes, and it keeps passing, so it's not affected by the random numbers generated on the first line. So this was the quiz for this section. Hopefully you enjoyed it, and I'll see you in the next module with more interesting Python and data science stuff. Congratulations if you made it this far, and I wish you the best of luck in the next module as well. And also, don't forget to enjoy things and look up as much stuff as possible by yourself. That's the best way to learn and to truly understand the things we are talking about here. 42. Pandas Motivation: In this module, we are going to discuss pandas for data wrangling. Let's see why pandas, which is another library just like NumPy, is important. You can think of pandas as the Excel of the Python ecosystem. If you have some experience with Excel, you know that it's a very powerful tool. You can do a lot with it. You can compute various statistics, you can show graphs based on your data, and so on. It's very powerful and very useful. Pandas allows us to do much of the same things in Python with datasets in a comma-separated values format, that is, the CSV format. You will find many datasets in this format, and pandas is the best library for dealing with such data.
But it also allows for easy manipulation of such datasets. We can compute new rows, compute new columns, remove columns based on certain criteria, and so on. So it helps us deal with these datasets in any way that we need to. And perhaps most importantly, it works well with NumPy and other machine learning and deep learning libraries. This is a very important aspect. Like I said, it can do much of the same things that Excel can, but it's not very productive to do our data processing part in Excel itself, then save another file and process that new file with Python. That is very awkward to do and very time-consuming. So it's no wonder that people use pandas, because it allows us to easily integrate with other libraries such as NumPy, for example, or scikit-learn. Pandas' applications to data science are the same ones that NumPy has, and these ideas are exactly as I've written them regarding NumPy in the previous module, if you remember. So it makes it very easy to compute various statistics, like we've said. It makes it very easy to preprocess numerical data in various ways, and with pandas it's not just numerical data. We can also preprocess text data, as we will see. And there is a lot of documentation, again, for a lot of things you might think of doing. Pandas is very popular, just like NumPy is. So what's the difference? Why do we have to know pandas as well as NumPy? Well, these two complement each other. Some things are easier to do with pandas. For example, if we have one of those CSV files that I mentioned, those will be easier to work with using pandas. However, if you have to do some algebraic task like matrix multiplication, for example, that's where NumPy shines. So these two complement each other. One is often not enough, and many times you will find both of them used in various tutorials and examples. So in conclusion, you will see pandas used for a lot of operations on datasets.
So you have to know it as a data scientist or a machine learning practitioner. It's something very useful to have under your belt because, like I said, it's very popular and widely used. And it makes many tasks much easier than if you were to do them only with pure Python, or with just Python and NumPy. So I hope you'll find this module useful. I will try to keep it as well-explained, easy to understand, and entertaining as possible. Have fun while watching the next videos. 43. Pandas intro Series and DataFrame: In this video, we are going to introduce pandas and its two fundamental data structures, the Series and the DataFrame. So first of all, we have to install pandas, just like we would install NumPy. We can do that directly in our Jupyter notebook by running this command: pip install, with the upgrade flag to upgrade it in case it's already installed, and pandas. You can see here in my output that I already had it installed, but I didn't have the latest version when I ran this, so it updated it. I recommend that you run this command even if you have it installed, to make sure that you have the latest version. What we are going to do in this module should not be affected much by any future version, so you should be good to go anyway. All right, after we run this, we import pandas as pd. This pd is an abbreviation, just like np is used for NumPy. It's a convention that you will see used a lot. Now, let's discuss the Series data structure. The Series data structure is basically the equivalent of a column in Excel. If we run this code here, we see we get this nice output, just like we would in NumPy. So it's formatted for us a bit. What this does is create a Series out of this Python list. You can also pass in a NumPy array here, and it would work just the same. Alright, another cool thing that we can do, since I mentioned that a Series is basically a column in Excel, is label each element of the Series.
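As a minimal sketch of the Series creation just described: the values `[1, 2, 3, 4]` and the variable names `pd_series` and `from_array` are assumptions chosen for illustration, not necessarily what appears on screen.

```python
import numpy as np
import pandas as pd

# A Series built from a plain Python list: roughly one spreadsheet column.
pd_series = pd.Series([1, 2, 3, 4])

# Passing a NumPy array works the same way.
from_array = pd.Series(np.array([1, 2, 3, 4]))

print(pd_series)      # pandas prints the index on the left, values on the right
print(pd_series[3])   # default integer labels: label 3 is the last element
```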
So if we run this cell, you'll see here we have a, b, c, d on the left-hand side. And that means that we can index the Series with that key. So we can do pd_series[3], which will get us four here. And with the labeled Series, we can do pd_labeled_series['c'], which should get us three, because three corresponds to c here. And we do get that. So that's a cool way of working with Series. But by themselves they are not very useful. What's really useful, and what you will find most often used with pandas, is a DataFrame. So this is how we can declare a DataFrame. You can think of a DataFrame as an Excel table or even a sheet, although it's closer to just the table. Alright, so what do we have here? We have pd.DataFrame, and this is saved in the df variable. This is another naming convention that you will see often. DataFrames will usually be called df, or df1, df2, df3, or df underscore something, et cetera. You will often see this df naming convention, and when you see df, you should think of a DataFrame. All right, so how do we declare a DataFrame? Well, we pass in a dictionary where the keys are the columns and the values are the values in the rows. So here we have, in column a, three rows with values 1, 2, 3, and in column b, three rows with values hi, hello and hey. And if we run this, you see it's nicely formatted again as an actual table. We can change the column names to something more suggestive if we want to, for example numbers and greetings. And you see they show up here nicely, just as they would in an Excel table or in a Google Sheets table. So that's very cool, and it already lets us visualize data in a very meaningful way and in a very easy-to-interpret way. It's easy to make sense of tables. We can access specific columns using this notation, so df dot the column name. And because I changed the column names, if I run this, I will get this attribute error.
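The labeled Series and the small DataFrame described above might look like this. It is a sketch under assumptions: the variable names are illustrative, and renaming through `df.columns = [...]` is one common way to change column names, which may differ from what the video does.

```python
import pandas as pd

# Labeled Series: each element gets a key we can index with.
pd_labeled_series = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])
value_c = pd_labeled_series["c"]

# DataFrame from a dictionary: keys are columns, values are the rows.
df = pd.DataFrame({"a": [1, 2, 3], "b": ["hi", "hello", "hey"]})

# Rename the columns to something more suggestive.
df.columns = ["numbers", "greetings"]

# After the rename, the old attribute name no longer exists.
try:
    df.a
    old_name_works = True
except AttributeError:
    old_name_works = False

print(df)
print(old_name_works)
```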
And in order to fix it, I just have to pass in the proper column name, which is now, for example, numbers. And you see I get a data series. If we go here, you can see it looks exactly like a Series, and that's because each column is a Series. Okay? We can also use what's called dictionary notation, like this, and we get the same thing out. If you want a specific value, you would do something like this. And of course, we can also do this, a combination of notations if you will. Okay? And in order to get a list of all the columns, we simply do df.columns, which will get us this. And this can be iterated. For example, if we want something like a matrix traversal, to go over all the values in all the columns in some particular order and process them somehow, we would most likely make use of df.columns. Okay, so that's the basic workings of pandas and its two main data structures, the Series and the DataFrame. I hope it makes sense, and we will get into more details in future videos. 44. Statistics on DataFrames: In this video, we are going to talk about some functions on DataFrames that compute useful statistics and perform other useful operations that you will see quite often in pandas code. So first of all, we import pandas just like before. And here I created a DataFrame with some data that is somewhat realistic. I've thought about some sort of table that keeps track of our investments in stocks. So let's run this. You can see here we have the date, the stock that we made an action on, the number of units of that stock, the unit price, and the action, which can be a buy or a sell. You can think of this as something that someone is using for tracking their investments. Of course, it's not very realistic. In a realistic tracker of this kind, you would have some other columns as well, and you would definitely have a lot more data. But anyway, this is good enough to get started. So what you will see very often is this describe method.
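A table like the investment tracker described above could be sketched as follows. Every value here is made up for illustration, and the column names (`date`, `stock`, `units`, `unit_price`, `action`) are assumptions based on the description, not the exact ones from the video.

```python
import pandas as pd

# Hypothetical investment log: seven rows, all values invented.
df = pd.DataFrame({
    "date": ["2021-01-04", "2021-01-05", "2021-01-11", "2021-01-19",
             "2021-02-01", "2021-02-08", "2021-02-15"],
    "stock": ["MSFT", "AAPL", "GOOG", "AAPL", "MSFT", "GOOG", "MSFT"],
    "units": [10, 5, 2, 12, 20, 1, 15],
    "unit_price": [217.7, 131.0, 1766.7, 127.8, 239.6, 2092.9, 243.7],
    "action": ["buy", "buy", "buy", "sell", "buy", "buy", "sell"],
})

# describe() summarizes the numerical columns only.
summary = df.describe()
print(summary)
```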
And if we run it, we get some values here, such as the count, which is basically the number of rows. You can see here that this only computes values for the numerical columns, so in our case for the units and unit price columns. It gives us the mean, that is, the mathematical average of these, the standard deviation, the minimum, and the cutoffs for the 25% smallest values, the 50% smallest values and the 75% smallest values. And then the maximum. So these last statistics kind of go hand in hand. This is very useful to know for data science problems or machine learning problems, because in those kinds of problem classes you are very much interested in what the standard deviation of a dataset is, and in what the difference between the minimum and the maximum is, because this might tell you how you should normalize the data. If these differences are too small or too large, that can affect your normalization strategies, your preprocessing strategies. So it's useful to be able to compute these kinds of things. Another useful method is the info method, which gives you results like this. This is useful because it tells you how many values are non-null. If you have null data in a real-world setting, you might want to fill in those values with something. Most machine learning algorithms do not work on their own with missing data or with null data. So this is also something that can come in handy. For big datasets, resulting in big DataFrames, you might also want to inspect the first few or last few rows of data, and you do that using the head and tail methods. So head here, you can see it gives us the first five data rows, and we have seven in total. And tail will give us the last five data rows. This again can be useful to get an idea of what the dataset looks like without spending a lot of time looking at each row. These methods also work on a particular column.
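The inspection methods mentioned above can be tried on a small stand-in table. This is a sketch with invented values; one `unit_price` entry is deliberately left missing so that the non-null counts are not all the same.

```python
import pandas as pd

# Hypothetical data; the None becomes a missing (NaN) value.
df = pd.DataFrame({
    "stock": ["MSFT", "AAPL", "GOOG", "AAPL", "MSFT", "GOOG", "MSFT"],
    "units": [10, 5, 2, 12, 20, 1, 15],
    "unit_price": [217.7, 131.0, None, 127.8, 239.6, 2092.9, 243.7],
})

df.info()                  # prints column types and non-null counts
non_null = df.count()      # the same non-null counts, as a Series

first_rows = df.head()     # first five rows by default
last_rows = df.tail()      # last five rows by default
print(first_rows)
print(last_rows)
```

Both `head` and `tail` accept a number of rows, for example `df.head(3)`, if five is too many or too few.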
For example, if we do df[["units"]].describe() here, we get this DataFrame with the same statistics computed by describe applied on df, but here only for the units column. Now you might ask yourself, why did I use two pairs of square brackets here? Well, you don't have to. We can also do this, but in that case we will get the results like this, as a Series as opposed to a DataFrame. This is a Series and this is a DataFrame. So it just looks nicer this way, in my opinion, but you don't have to do it this way. We can also apply other statistical methods. For example, the median method, which will get us the median for each numerical column. And again, we can apply this just on a subset of columns, like so. Next, we can sort our DataFrame using the sort_values method, to which we pass the column we want to sort by through the by named parameter, or keyword parameter, and ascending equals True if we want an ascending sort, or False if we want a descending sort. If we run this, we get a DataFrame. Note that this is a new DataFrame. It's not df that was changed; df remains the same. And I can prove that. If I add another cell here and show df, you can see that it remained unchanged. So sort_values returned a new, sorted DataFrame. We can also filter our DataFrame, in this case by two conditions. Let's say we want just the Microsoft stock and just the buy actions. So in that case it should give us this first row and this fifth row, and not this last row, because the action here is sell. Let's see if it does that. And it does do that indeed. Again, this returns a new DataFrame. Note that we use this & operator here. It does not work if you use and; you will get an error like this. So use this in case you want to chain multiple conditions. If you just want one condition, for example, let's say we only want the entries for the Microsoft stock, we would do this. And of course you can remove the brackets here.
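The operations walked through above can be sketched on a small stand-in table. The data and column names are invented for illustration; the row positions of the Microsoft entries (first, fifth and last) follow the description.

```python
import pandas as pd

# Hypothetical data, all values invented.
df = pd.DataFrame({
    "stock": ["MSFT", "AAPL", "GOOG", "AAPL", "MSFT", "GOOG", "MSFT"],
    "units": [10, 5, 2, 12, 20, 1, 15],
    "action": ["buy", "buy", "buy", "sell", "buy", "buy", "sell"],
})

# Two pairs of brackets give a DataFrame, one pair gives a Series.
as_frame = df[["units"]].describe()
as_series = df["units"].describe()

# sort_values returns a new DataFrame and leaves df untouched.
sorted_df = df.sort_values(by="units", ascending=False)

# Chain conditions with & and wrap each one in parentheses.
msft_buys = df[(df["stock"] == "MSFT") & (df["action"] == "buy")]

# A single condition needs no extra parentheses.
msft_all = df[df["stock"] == "MSFT"]

print(sorted_df)
print(msft_buys)
```

The parentheses around each condition matter because `&` binds more tightly than `==` in Python.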
And we get all three rows corresponding to the Microsoft stock. We can also do other operations, for example the units higher than ten. In that case, we get those entries for which the units column has a value strictly larger than ten. And you can check that this is indeed correct. So I hope this gives you an idea of the kind of operations we can do on DataFrames, and that you understand how they work. They are quite simple and also quite similar to what we've seen in NumPy, so this shouldn't be entirely new. I hope this video helps you do some of these actions yourself, and also maybe gives you an idea of what you should search for if you have some other particular operation in mind that you are interested in doing. 45. Slicing DataFrames: In this video, we are going to discuss the slicing of DataFrames. Unlike before, we are going to use an actual dataset now, one that is used in some machine learning and data science experiments. We're not actually going to be doing machine learning or data science, but it's good to introduce real-world datasets, or something resembling them as much as possible. In order to make this transition as smooth as possible, I've picked a rather easy dataset, and that is the Titanic dataset, which you can find on Google by searching for Titanic dataset. I've also given a link here to the pandas GitHub, where this dataset also exists. But just in case this link does not work when you watch this video, do know that you can find it on Google by searching for it. So let's run this and see what we get. Okay, so it was very easy to read this CSV file into a pandas DataFrame, as you can see here. And if we display that DataFrame, we get something like this. So it's basically a list of passengers on the Titanic with various attributes regarding them, such as whether or not they survived, their name, their sex, their age, their ticket ID, fare, the cabin, and some other columns that aren't immediately obvious.
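Reading a CSV into a DataFrame can be sketched without a network connection by feeding `pd.read_csv` an in-memory text buffer. The three rows below are invented stand-ins, and the column names are an assumption based on the attributes listed above; with the real file, a path or URL would be passed to `pd.read_csv` instead.

```python
import io
import pandas as pd

# Tiny stand-in for the Titanic CSV (invented rows, assumed column names).
csv_text = """PassengerId,Survived,Pclass,Name,Sex,Age,Fare,Cabin
1,0,3,Passenger A,male,22,7.25,
2,1,1,Passenger B,female,38,71.28,C85
3,1,3,Passenger C,female,26,7.93,
"""

# read_csv accepts a path, a URL, or any file-like object.
df_titanic = pd.read_csv(io.StringIO(csv_text))
print(df_titanic)
```

Empty fields, like the missing cabins here, are read in as NaN.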
But we don't really need to understand what each column means. We're only going to be using this to experiment with the various slicing features of pandas. Okay, so first of all, we can slice with separate indexing for rows and cols, that is, columns. When you see cols, that means columns. And if we run this, we get this. So it basically gives us a slice of the DataFrame, or a sub-table if you wish, containing the columns we specify here and the rows we specify here. You can see that we can use the same kind of slicing as in Python here. And of course, if we eliminate one of these, let's say we only leave this, then that will get us all the rows specified here with all the columns. So the rows from one to five, without five itself, and all of the columns. And similarly, as we've seen before, if we only specify the columns, we will get those columns and every row. So that's one way of doing indexing. But a prettier way of doing it is using loc and iloc. You can think of these as meaning location, and the i, as we will see, stands for integer index. So let's see what this does. Alright, so this is a bit different, but it's quite similar actually. We get the rows from one to five, including five this time, and the columns we specify here by their labels. Okay? And iloc requires that we specify the index of the columns. So if we run this, you can see we get the rows from one to five, without five this time. This can be a little confusing, because loc is inclusive, it includes five, iloc does not include five, and this variant here also does not include five. So that's just something you have to keep in mind. Okay? Next, we've specified here the indexes of the columns, so column one and column three. And if we go here to the full table, this is column 0, this is column one, column two, and column three. So Survived and Name correspond to column indexes 1 and 3.
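The three slicing styles compared above can be placed side by side. This is a sketch on an invented stand-in table with assumed Titanic-style column names; the variable names are illustrative.

```python
import pandas as pd

# Stand-in table with the first four Titanic-style columns (invented values).
df_titanic = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6, 7],
    "Survived": [0, 1, 1, 1, 0, 0, 0],
    "Pclass": [3, 1, 3, 1, 3, 3, 1],
    "Name": ["A", "B", "C", "D", "E", "F", "G"],
})

# Separate indexing: columns first, then a Python-style row slice (5 excluded).
plain = df_titanic[["Survived", "Name"]][1:5]

# loc: label based, and the end label 5 IS included.
by_label = df_titanic.loc[1:5, ["Survived", "Name"]]

# iloc: position based, end position 5 excluded, columns given by index.
by_position = df_titanic.iloc[1:5, [1, 3]]

print(len(plain), len(by_label), len(by_position))
```

With the default integer index, row labels and row positions coincide, which is exactly why the inclusive versus exclusive end is easy to overlook.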
And that's also why iloc and loc behave a bit differently regarding this notation here. Okay? And of course, these also support filtering. So we can do something like this in loc in order to get only those rows for which the Survived column is one. We would use this, for example, if we wanted to get all the surviving passengers, or all the names of the surviving passengers. And if we run this, you see we get 342 rows. Of course, the Survived column here is a bit redundant, because we know it will always be one due to this filtering condition here. So you might want to replace this with something else. There is another indexing and slicing method that pandas supports, but it's actually deprecated in the latest versions, so I'm not going to get into details on it. However, you can read about it here. It's called ix, and it basically combines loc and iloc. That's it for the indexing and slicing features of pandas. I hope they make sense to you. Please make sure that you understand them in detail before proceeding, as you will need them to properly understand the pandas code you will see in various data science and machine learning projects, since they are used heavily in those. 46. GroupBy and pivot: In this video, we are going to be introducing two very important pandas functions, the groupby and pivot methods. We are going to be showing examples of these two functions, or methods, on the Titanic dataset as well. So let's start by loading it and displaying it, of course. Alright, so this is the same dataset as in the previous video. And groupby, as its name says, groups by a certain column specified by its label. It allows us to perform various aggregations on that column, or on those columns, since, as you might guess from this example here, we can also group by multiple columns. And by aggregations on those columns, I mean something like computing the mean or the sum of a certain other column.
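The filtered loc selection described earlier in this passage can be sketched like this, on an invented stand-in table with assumed column names rather than the real 891-row dataset.

```python
import pandas as pd

# Stand-in table (invented values).
df_titanic = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0, 0, 0],
    "Name": ["A", "B", "C", "D", "E", "F", "G"],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0, 54.0, 2.0],
})

# Rows: a Boolean condition. Columns: a list of labels.
survivors = df_titanic.loc[df_titanic["Survived"] == 1, ["Survived", "Name"]]

# Survived is always 1 after filtering, so another column is more informative.
survivor_ages = df_titanic.loc[df_titanic["Survived"] == 1, ["Name", "Age"]]

print(survivors)
print(survivor_ages)
```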
So this might be sounding a bit confusing to you, which is all right, so let's just run this and talk through the example. Ok, so what does this mean? Well, we have grouped by the Survived and Sex columns. You can see them here. And for all of the rows that fall into these categories, so surviving females and males and non-surviving females and males, we have computed the mean, that is, the mathematical average, of the age of those passengers. So, for example, the mean age of non-surviving females is here, and it's around 25 years. The non-surviving males are a little older, at 31.6 years on average. The surviving females are 28.8 years old on average, and the surviving males are a little younger this time, at 27.2. Now, I don't know if this means anything from a sociological or historical perspective, but this is how it is for this dataset. And of course, you will also see stuff like this, grouping by just one column. This is often more common, and it looks like this. So the males have an average age that is higher than the average age of females for this passengers dataset. Even if this is just one column, you will often see it like this, in square brackets, because it makes it easier to add in more columns later if necessary. And this part here is just to get the column that we want to compute our aggregation function on, in this case the mean function. All right, so when would you use this? Well, you would use this for computing averages or sums or median values, something like that, on a certain category of entries in your dataset. You can also think of it like this: maybe you have some sales dataset, and you have the month, and you want to find out the average sales of each item in all the months. Then you would group by the item and the month, for example. So that's the groupby function. And then we have an even more interesting function, that is, the pivot function, or the pivot_table function.
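The grouping described above can be sketched on a small stand-in table. The ages are invented so that the group means are easy to check by hand; they are not the real Titanic averages quoted in the video.

```python
import pandas as pd

# Stand-in data (invented values, assumed column names).
df_titanic = pd.DataFrame({
    "Survived": [0, 0, 0, 0, 1, 1, 1, 1],
    "Sex": ["female", "female", "male", "male",
            "female", "female", "male", "male"],
    "Age": [20.0, 30.0, 28.0, 36.0, 24.0, 34.0, 22.0, 32.0],
})

# Group by two columns, then average a third one within each group.
mean_age = df_titanic.groupby(["Survived", "Sex"])["Age"].mean()

# Grouping by one column, still written as a list so more can be added later.
mean_age_by_sex = df_titanic.groupby(["Sex"])["Age"].mean()

print(mean_age)
print(mean_age_by_sex)
```

The result of the two-column grouping is a Series with a two-level index, so a single group is read with a tuple such as `mean_age[(0, "female")]`.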
And again, I call them functions sometimes. Strictly speaking, from a Python perspective, they are methods, because we call them on an object; we don't call them directly. But these two terms are interchangeable in common speech, and you will often hear people referring to them as functions when they're actually methods. It's just a very small distinction that doesn't really show up in common speech, so to say. So it's an accepted small mistake, if you will, when we refer to something that is actually a method as a function. And I do that too sometimes. Just be aware that I mean the same thing. All right, so let's run this example and see what we get, because we have a lot of parameters here, and I'll explain them all on the resulting example after we run this cell. Alright, so this is quite a lot. So let's see what is meant here. First of all, we have this index parameter, and this index parameter shows up here. Okay, so Survived, which can be 0 or one. You can see this already looks a bit like groupby, because it grouped values by what I passed in here for the index. Good. Next we have this columns parameter. You can see I passed in Sex, and Sex shows up here. So you can think of it as a groupby in the other direction. By adding in a column here, you can see we get kind of what we got for groupby when we grouped by Sex and Survived, right? But it's displayed differently. Now we have Survived here, with its values in different rows, and female and male here, with their values in different columns. Okay, so that's what columns here does. It introduces columns here. Values is the values that we are interested in. It's kind of like the equivalent of this part here. And you can see they show up here. And this is what gets aggregated. They get aggregated with the functions that we pass into this aggfunc parameter. I passed in three here: np.sum, np.median and np.mean. And these are all from NumPy.
If you don't pass in anything, the default is the average, or the mean. And you can see that we have different column sets for each of these aggregation functions, so for the sum, the median, and the mean. So you can see that these columns here, not the values, are all repeated for each aggregation function. So let's just see if we get the same results. Let's go back here and pass in Sex and Survived again, and run it again. So for female non-survivors we have a mean of 25.04. Let's see if we can find 25.04 here. We must look at the mean aggregation, female, and non-survivors. And here it is. So that part is the same as for groupby. But again, this is kind of like doing many more groupbys, so pivot_table is much more powerful in that regard. And again, as you might be guessing from the square brackets here, we can add in multiple column labels here. So let's see what happens if we add, for example, Pclass to the index. In that case, you can see it's smart enough to figure out that we first want to group by Survived and then by Pclass. Okay? Now what happens if we remove Pclass from the index and add it as a column? Well, it shows up here. You can see Pclass. So it first groups by the Sex here, and then by the Pclass. Now, it's debatable which one is more useful. This results in a very wide table, so personally I don't like it as much. But it is one thing that you can do. And again, there's plenty more that you can try, plenty of other combinations. For example, you can pass in the Sex here and leave columns empty. Then you get something like this, which resembles the groupby example above. They're just switched around. So there is definitely a lot that you can do with it, so it's very powerful. And I do recommend that you rewatch this portion of the video where I discuss pivot_table, pause after each example, and make sure that you fully understand it.
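A pivot table with the parameters discussed above might be sketched as follows. The data is invented, and the aggregation functions are passed as the strings "sum", "median" and "mean" rather than the NumPy functions used in the video; recent pandas versions accept both, and the string form avoids deprecation warnings.

```python
import pandas as pd

# Stand-in data (invented values, assumed column names).
df_titanic = pd.DataFrame({
    "Survived": [0, 0, 0, 0, 1, 1, 1, 1],
    "Sex": ["female", "female", "male", "male",
            "female", "female", "male", "male"],
    "Age": [20.0, 30.0, 28.0, 36.0, 24.0, 34.0, 22.0, 32.0],
})

pivot = df_titanic.pivot_table(
    index=["Survived"],                  # groups shown as rows
    columns=["Sex"],                     # groups shown as columns
    values=["Age"],                      # what gets aggregated
    aggfunc=["sum", "median", "mean"],   # one column set per function
    margins=True,                        # adds the "All" totals
)
print(pivot)

# Columns form three levels: (function, value column, Sex category).
print(pivot.loc[0, ("mean", "Age", "female")])
```

Moving a label between `index` and `columns` changes only the layout, not the numbers, which is a quick way to compare the tall and wide presentations.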
And one thing I forgot to mention is this margins parameter, but you've probably already figured out what it does. It basically gives you a total, right? So the sum of all ages, regardless of this grouping, the median of all ages, and the mean of all ages. And that's what margins refers to. So this is pivot_table. It allows you to create very powerful presentations of your data. It allows you to specify multiple aggregation functions, multiple values that you want to aggregate, multiple columns and multiple indexes. Don't worry if you're not entirely sure how to use it for a particular task. In that case, if you suspect that it might be useful but you're not sure how to use it, just start out with a few columns specified as the index and see where that gets you. Go step by step, and you will eventually reach the optimal presentation for your purposes. 47. Functions on DataFrames: In this video, we are going to talk about functions applied on DataFrames. We are still going to be using the Titanic dataset, because it's great for this introductory type of stuff that we're learning about here. It also resembles a real-world dataset without being too complex, so it's a good balance between academic data and real-world data. Okay? So first of all, we have this apply method that we can call on a DataFrame. This apply method, most importantly, accepts a function, which we can pass in as a lambda function, because there's no point in defining it separately, at least for our purposes. But if you want, you can do something like def myfunc. It should take in one parameter, and here you can do whatever you want. And then you could simply pass myfunc to the apply method. But I'm using a lambda function here. So what am I doing with it? Well, this function, given as a parameter to the apply method, will be called on every column of the df_titanic DataFrame, so it ends up transforming every cell. Now I want to compute, or rather to get, a new DataFrame out.
Because this function will return a new DataFrame in which all of the numeric fields appear as their square root. So I want to have the square root of the passenger ID, of the Survived column, of the Pclass column. I want to leave the string columns, so the Name, Sex, et cetera columns, as they are. For the age I also want the square root, and so on. And that's what I'm doing here. So lambda x, and I'm using NumPy for the sqrt function. It returns the square root of x if the dtype, so the data type of that column x, is float64 or int64, so if the values are floats or integers, and otherwise it returns that column as it is. So let's see what this does more clearly. If we run it, you can see we got what we wished for. Even if this particular processing maybe doesn't make that much sense, right? Why would we want the square root of every numerical cell? It does clarify how the apply function works. You can see all the numeric cells now appear as the square root of their original values, and the rest remain unchanged. So this is how you use the apply function. It has a few more parameters you can pass in for more control, and I do encourage you to look it up in the pandas documentation. You should be able to understand what that documentation is saying by now. I want to show you a few more things before we wrap up this video. First of all, we have this values property here, which gives us a NumPy array of the DataFrame. This can sometimes be useful, especially if you only have numerical data in your DataFrame. Maybe you want to do some matrix multiplications with it, or another kind of processing that NumPy is best suited for. In that case you would use this .values field. Okay? And of course, if you also have string columns and you only want some of the columns as NumPy arrays, you can do this, which will give us a NumPy array containing just the age and the fare, which are numerical values.
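The apply call and the values property described above can be sketched like this. The data is invented, and instead of comparing the dtype against float64 and int64 as the video does, this sketch uses pandas' `is_numeric_dtype` helper for the same purpose, which is a swapped-in check rather than the original code.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Stand-in data (invented values, assumed column names).
df_titanic = pd.DataFrame({
    "Survived": [0, 1, 1],
    "Name": ["Passenger A", "Passenger B", "Passenger C"],
    "Age": [16.0, 25.0, 36.0],
    "Fare": [9.0, 64.0, 4.0],
})

# apply calls the lambda once per column; x is a whole column (a Series).
roots = df_titanic.apply(
    lambda x: np.sqrt(x) if is_numeric_dtype(x) else x
)

# .values turns the selected numeric columns into a NumPy array.
numeric = df_titanic[["Age", "Fare"]].values

print(roots)
print(numeric.shape)
```

Because the function receives entire columns, `np.sqrt` runs vectorized over each numeric column instead of cell by cell.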
And another important thing that pandas offers is this isnull method, which you can see here. We have some NaN values, that's not a number. We can see them here as well. Alright, and they appear in some other places too, for example here for the age. And we would like to know how many of those we have in our dataset. In order to do that we can use isnull and sum after it, which will give us a nice table like this, telling us how many null values we have in each column. So 177 in the age column, 687 in the cabin column, and two in the embarked column. So that's it for functions on DataFrames so far. There are a few other things we could talk about, but we'll introduce those later when we get to them.

48. Advanced visualizations: In this video, we are going to talk about a more advanced topic, and that is visualizations. We will see that pandas supports various ways to visualize our data in plots, that is, figures that visually describe your data. These are also available in Excel, so if you've worked in it for a while you've definitely used them. And if not in Excel, then you've definitely seen them elsewhere. So let's get started and let me show you what I mean in more detail. So first of all, we're going to load the Titanic dataset again. This is a big topic, so you can read more about it here in the pandas documentation. So first of all, let's see how we can plot a single column, for example the fare column. The way we do that is we simply select it in our DataFrame and call the plot method. So if we run this, you see we get a nice plot of this column. But you see it's quite confusing. It doesn't tell us much at all. So let's do something else. Okay? Something more useful. For example, let's plot the average fares by age. In order to do that, we can first group by the age, select the fare, compute its average, and then plot that.
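As a rough sketch of the two steps just described, counting missing values per column and computing the average fare per age, here is a tiny made-up DataFrame (the values are not from the Titanic dataset). The plotting call is left as a comment because it needs Matplotlib installed.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a few missing values.
df = pd.DataFrame({
    "Age": [22.0, np.nan, 22.0, 35.0],
    "Fare": [7.25, 8.05, 10.75, 53.10],
    "Cabin": [np.nan, np.nan, "C85", "C123"],
})

# isnull() marks each missing cell as True; sum() counts them per column.
missing_per_column = df.isnull().sum()

# Group by age, select the fare, and average it within each age.
mean_fare_by_age = df.groupby("Age")["Fare"].mean()

print(missing_per_column)
print(mean_fare_by_age)
# mean_fare_by_age.plot() would draw this as a line plot.
```

Rows whose age is missing are left out of the groups, since groupby drops NaN keys by default.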
So if we run this, now it makes a bit more sense, right? Because on the horizontal axis, or the x-axis, we have the age, and on the vertical axis, or the y-axis, we have the average fare for that age. So this already might tell us a few things. Now, this isn't a data analysis course, so I'm not really going to analyze what's going on here. But for example, on a cursory look, we might think that the people over 60 paid more, right? Maybe because they had more money. Alright. Next, we have this cumulative sum method which, as its name says, gives us a cumulative sum. It doesn't actually plot it, but by calling plot here, we can also plot that. So this is actually quite a realistic scenario, because this can give us the sales revenue as time goes on. So let me tell you what I mean by that. Let's assume that this data has been compiled chronologically. You can see here that it's ordered by passenger ID. And we're going to assume, I don't know if it's true or not, that this is the first person that bought a ticket, this is the second, this is the third, and so on. Okay? So by doing this cumulative sum on the fare, we get an idea of how the company's revenue from the Titanic looked. So if we run this, we can see it goes like this. Alright, so the revenue after about 200 tickets sold was around, here, maybe 6 thousand or so. And in the end, when all of these 800 passengers had bought their tickets, it approached 30 thousand for these 800-something passengers. So this is quite realistic. This might be something that you would see in an actual report, an economic report of sorts. And you can do these kinds of things with pandas. Next, we're going to look at something a bit more complex. And that is, we are going to plot the mean fare and the survivability per age group. So let's break down what this seemingly complicated code here does.
So first of all, you can write this in one line, but I've split it into multiple lines so that it's easier to understand. You should do that when lines get really long. And in order to split these method calls, you have to use this backslash here, otherwise you will get an error. So first of all, we have this apply function that we discussed in the previous video, with which we divide the age column. We check here if the name of the column is Age, and we only process that one. We do integer division by ten, which basically results in splitting the ages into groups of ten, right? So if the age is lower than ten, this will be 0. Between 10 and 19, it will be one, and so on. And then we can group by this. And after the group by, we can call this aggregate function, called agg from aggregate, to which we pass a dictionary that tells pandas how to aggregate the various columns. So we want Survived to be aggregated by the mean across that age group times 100. So it will give us a percentage between 0 and 100, telling us basically the percentage in that age group that survived. And we want the fare to simply be the mean, which we can pass in like this. We could also do lambda x: x.mean(), but we can simply pass in a NumPy function. Okay? And after that, after we're done aggregating what groupby gave us, we simply plot the resulting data. So let's see what we get. We get something looking like this. On the x-axis we have the age, but now this is not the actual age, it's the age group. So just keep that in mind, because we did this here. Okay? And on the y-axis we have the percentage of people that survived and the average fare that they paid. All right? And again, the point here is not to do data analysis. On a cursory look, and I don't know if this is completely correct from a data analysis standpoint, you might be tempted to say, and maybe you'd be right, maybe not. Again, this isn't the point here.
That as the age increases, the survivability decreases a bit. And here for the oldest age group, it seems to be 100%, but I'm not sure how many people fall in this age group. It might just be one or two or three, and in that case it's not very relevant. And also, as the age increases, it seems that people have a bit more money and can afford to pay higher fares, if we look at the fare plot. Okay? Now another example. Let's plot the average fares by age group, so the same thing but as a bar plot. It looks like this. You have the age here and the average fare here, corresponding to each bar's height. This is just another way of looking at basically the same thing as the fare line here. If you overlap these two, you can see it increases like this, then it decreases, then it increases again, then decreases a bit. These peaks here correspond to this, and so on. Okay? We can also do a pie chart by specifying that the kind of plot is pie. And let's break this down again because we have some new things here. So we have a group by Pclass, which represents the passenger class if you look up the dataset. So one is first class, two is second class, and three is third class. Count gives us a count, just like its name implies. So basically, how many values are in each column. And we are only interested in the passenger ID column. You could also select some others, but this is the one with no missing values, so I chose to use this one. And then we simply plot this as a pie chart. And we get this, which tells us that over half of these passengers were in third class, about a quarter were in first class, and a little under a quarter in second class. Now if you're not sure how this works, what you can do is shorten this line. For example, delete that and run it like this. Make sure you understand where all this comes from, and then keep adding the others, one by one.
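The data behind these plots can be sketched without drawing anything. The six-row DataFrame below is invented for illustration, and the age groups are built with a plain integer division on the Age column instead of the apply call used in the video; the aggregation dictionary follows the description above, with the string "mean" standing in for the NumPy function.

```python
import pandas as pd

# Hypothetical passengers, ordered by PassengerId.
df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6],
    "Survived": [1, 0, 1, 1, 0, 0],
    "Pclass": [3, 1, 3, 2, 3, 1],
    "Age": [4.0, 15.0, 19.0, 31.0, 38.0, 62.0],
    "Fare": [10.0, 80.0, 20.0, 30.0, 10.0, 50.0],
})

# Running total of the fares, in row order.
running_revenue = df["Fare"].cumsum()

# Ages 0-9 fall in group 0, 10-19 in group 1, and so on.
age_group = (df["Age"] // 10).rename("AgeGroup")

# Survival percentage and mean fare per age group.
by_age_group = df.groupby(age_group).agg(
    {"Survived": lambda s: s.mean() * 100, "Fare": "mean"}
)

# Number of passengers per class (the data behind the pie chart).
passengers_per_class = df.groupby("Pclass")["PassengerId"].count()

print(by_age_group)
print(passengers_per_class)
# by_age_group.plot(), by_age_group["Fare"].plot(kind="bar") and
# passengers_per_class.plot(kind="pie") would draw the figures (needs Matplotlib).
```

Printing each intermediate result, as suggested in the video, is an easy way to see what every step of the chain contributes.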
For example, we can add in the passenger ID here. Make sure you understand what this represents. And then the plot call, so all of it. This is how you plot your data using just pandas with no other library. And as you can see, it's already quite powerful. Of course, you can get more control if you use something like Matplotlib, which will let you control how your graphs show up and how they look in more detail. But even out of the box, pandas offers a lot of options for plotting data.

49. Exercise 1 + solution: Here is your first exercise in this module. You have to find out who provided the company with the most revenue, males or females? Try to answer this with a plot, so with a figure. Think about the question very well, about its exact wording, before you start, and try to figure out exactly what you have to do here. Pause the video here, and after that, resume it to see my solution. Alright, so here is the solution I'm suggesting. Hopefully you figured out what you have to do and you managed to achieve it. You didn't have to write much code. Basically, we have to group by the sex and aggregate the fare as a sum. So we have to sum the fares for each sex, which is male or female here. If you averaged the fares, that's not exactly correct, because we said here, who provided the company with the most revenue, and the revenue should be a sum, not an average. We are not interested in the average here. So that was kind of a trap, so to say, that you had to avoid falling into. After that, we simply select the fare, either like this or with square brackets, and plot a pie chart. And if we run this, you can see it shows up like this. So the answer would be the males, but only slightly. It's actually quite equal between the two. So that's it. Hopefully you managed to arrive at this or at a similar solution.

50. Missing data: In this video, we are going to talk about missing data and how to handle it.
So almost all datasets, real-world datasets that is, will have missing values. That is usually a column for which we don't know the values for some of the rows. For example, here, for this passenger, we don't know the age. It appears like this, as NaN. And of course it can also appear as other values like -1 or null or None. It depends on the dataset how missing values are represented, but it will usually show up like this. So how do we handle this? Because this can mess up our statistics. If we, for example, want to compute some statistics here that involve adding all of these numbers, what do we add for NaNs? Well, as you've seen, in pandas we can still compute the means and medians and so on, and the sum, and it will ignore these values. For a sum, that is the same as considering them 0. But if we write our own code, maybe they will cause it to crash. So we do need to handle them somehow for best results, most of the time. Also for machine learning algorithms, there you definitely need to handle them. Many of them will crash if you give them data that looks like this, with NaNs in it. So it's very important to know how to deal with this. Okay, so first of all, we have this isnull method and this sum method. I believe we've seen it before. It basically tells us, for each column, how many missing values are in that column. And here we only have missing values in the age column, the cabin column, and the embarked column. The easiest thing to do would be to fill these in with some constant value, for example 0. And we can do that using the fillna method. If we run this, you'll see all of the NaNs are replaced with zeros. And I think I forgot to tell you this: NaN stands for not a number. It's basically a value that's missing. All right, and you can see here fillna returns a new DataFrame with the missing values filled in, so it does not change the original DataFrame. Another thing we can do is simply to use dropna, which will remove.
All of the rows containing missing values. Now, this is seldom used because usually it removes a lot of the data. You can see here we are now left with just 182 rows, because most rows contain some missing data. If we look here, most rows are missing the cabin value, so those will all be removed, and losing 687 rows at least is a big deal. We are basically left with something like 15% of our dataset, and we don't want that. So this is usually avoided, but if you only have a few rows with missing data, this is the easiest thing to do: just get rid of them. Another thing that's practiced is getting rid of the columns with missing data. So we will do df_titanic.dropna and we can pass in here axis equals one. This will get rid of the offending columns. Again, this isn't great because it gets rid of a lot of columns. For example, why get rid of this embarked column if it only has two missing values? Again, that loses a lot of data. So usually in practice, when we get rid of data, a combination of these is used. For example, it might make sense to get rid of this cabin column because most of the values in it are missing. It would make sense maybe to get rid of the instances, or the rows, for which embarked is missing, because these are very few. And for the age, it would make sense to do something else, and we'll see what next. Okay? So we can also fill in values for specific columns by doing this. And notice that I also have this parameter here, inplace equals True, which, as you can probably guess by now, modifies df_titanic in place. Okay, so now if we were to print df_titanic, let's do that, you can see Unknown shows up. So it modified it in place. It did not return a new DataFrame with the cabin filled in with the string here, Unknown in this case. And if we do this now, you can see there are no more missing values for the cabin column. Now, what can we do about the age column, which has a lot of missing values?
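The filling and dropping options covered so far can be sketched on a small invented DataFrame. Two choices here differ from the video and are assumptions of this sketch: the per-column fill values are passed to fillna as a dictionary, and the result is kept as a new DataFrame rather than modifying the original with inplace=True.

```python
import numpy as np
import pandas as pd

# Hypothetical data: Fare is complete, the other columns have gaps.
df = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0, 54.0],
    "Cabin": [np.nan, "C85", np.nan, "E46"],
    "Embarked": ["S", "C", np.nan, "S"],
    "Fare": [7.25, 71.28, 8.05, 51.86],
})

# Fill specific columns with constants; a new DataFrame is returned.
filled = df.fillna({"Age": 0, "Cabin": "Unknown"})

# Drop every row that has at least one missing value.
complete_rows = df.dropna()

# Drop every column that has at least one missing value.
complete_cols = df.dropna(axis=1)

print(filled)
print(len(complete_rows), list(complete_cols.columns))
```

Checking df.isnull().sum() before and after each call shows what each option costs in lost rows or columns.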
And it stands to reason that this is quite important, maybe. What we can do here is use the interpolate method, which will replace the missing age values with an interpolation of the existing values. So think about it as drawing a trend line. For example, maybe the age goes something like this, considering the existing values. And using that graph, we would figure out what the missing values are likely to be. I won't get into the details because this is a bit more math heavy. But just know that it's generally a good thing to try for data analysis and machine learning algorithms when you have a few missing values, but not enough to warrant the removal of the entire column. So in this case that kind of fits. It's too much to remove 177 rows of data. So there are quite a few missing values, but there aren't enough to warrant getting rid of this entire age column. So in that case, since this is a numerical column, we can use the interpolate method to fill the missing values in. So these are the main ways of dealing with missing data, and we're likely going to see some others in the future. So I do recommend that you make sure you fully understand how these work now. To get a better understanding of the interpolate method, I recommend that you visualize the age values. Maybe write a for loop and print them all. See how they look before calling this method and after calling this method.

51. Merging or joining and concatenating: In this video, we are going to talk about merging or joining DataFrames and concatenating them. This is a very large topic. There is plenty to say about it. So after watching this video, I do suggest that you read the documentation on PyData and that you run the code examples there as well, and make sure you also understand those. We are only going to be discussing the basics in this video. So first of all, we are going to talk about concatenation, and we are going to do that on the Titanic dataset as well.
So here I split the Titanic dataset into three other datasets, labeled df_titanic 1, 2 and 3. The first one contains the first 100 rows. The next one contains the rows from 100 to 500. And the last one contains the rows from 500 up until the end. In order to do concatenation, we can use the append method, which will give us this. If we append Titanic 2 to Titanic 1, we will get a dataset of 500 rows and the same columns. Notice that append does not change Titanic 1 and instead returns a new DataFrame. We can also use the concat function, to which we pass a list of DataFrames. So if we run this, you can see we get back the original full dataset with 891 rows. So this is all pretty easy. Now we're going to discuss merging or joining. In pandas, it's usually called merging, and in SQL or in databases, it's called joining. So these two are basically the same thing and the terms can be used interchangeably. You will sometimes hear joining used in the context of pandas as well. So first of all, in order to discuss some examples, I have some code here that generates some test data. It generates a clients DataFrame, a products DataFrame, and a sales DataFrame. I'm not going to go into the details of how this is done. You already know more than enough to be able to understand this code. So I do suggest you pause the video here for a while and make sure that you understand what's going on. Basically, we create some DataFrames containing realistic data, similar to what you might see in an actual real-world setting. We also have some missing data that we are going to handle. And the most important table, or DataFrame, will be this sales DataFrame, which contains n equals 1000 random sales. All right, so let's run this. And let's see how these DataFrames look. So the clients DataFrame looks like this. We have an ID from one to ten, a name, which is a string, and a client class, which can be gold, silver, or bronze.
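A small sketch of the concatenation step from the start of this lesson, using a made-up ten-row DataFrame split into three slices. It relies on pd.concat only, because DataFrame.append was later deprecated and then removed in pandas 2.0, so the append call shown in the video fails on current versions.

```python
import pandas as pd

# Hypothetical ten-row dataset, split into three consecutive parts.
df = pd.DataFrame({
    "PassengerId": range(1, 11),
    "Fare": [float(i) for i in range(10)],
})
part1 = df[:3]
part2 = df[3:7]
part3 = df[7:]

# Equivalent of appending part2 to part1: a new DataFrame is returned.
first_two = pd.concat([part1, part2])

# Concatenating all the parts gives back the full dataset.
full = pd.concat([part1, part2, part3])

print(len(first_two), len(full))
```

Neither part1 nor part2 is modified by the call.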
For the products table, or DataFrame. I use table because I want to make a connection to SQL. If you are familiar with it, then you know what I mean. And if you're not familiar with SQL, just know that when I say table, I mean the same thing as a DataFrame. Okay. So this is the products DataFrame. We have an ID, a name, and a price. And then we have the sales DataFrame, which is our biggest, with 1000 rows and five columns. It has an ID, an ID client, so the ID of the client that made the purchase, and the ID of the product that was bought, which, you can see here, can be missing. So why could it be missing? Maybe it refers to some product that is no longer in our products DataFrame. In that case, we might have it as missing here. Okay. We have a discount percentage and the number of units that were sold. So let's see what we can do with it. First of all, let's see how our missing data looks. So we have just 52 missing data cells, for the ID product column. The rest are all there. Now we are going to do an inner join, which will list the sales with the product and its price. But it will only list sales for known products, so it will ignore those with missing values. And let's see what happens if we run this. Okay, so we can see straight away that we get a new DataFrame, and it looks like this. It has 948 rows. And if we go back here, we see that if we add these 52, we would get 1000. So basically it ignored these 52 missing values. Now let's discuss the actual code. Let's bring up the documentation as well and expand it. So the first two parameters are left and right, and the rest have default values. In particular, this how parameter is what specifies that we are doing an inner join, and I left that as the default. Now, this is the left table or DataFrame, and this is the right table or DataFrame. And if you look at the results, the first columns come from this left table, right, these.
And the last ones, the ones towards the right, come from the right table. So it will usually look like this. left_on specifies the column from the left table on which we are going to perform the join, and right_on does the same thing for the right table. So this will basically combine values from these two tables, or DataFrames, where ID product in df_sales is equal to ID in df_products, right? So we basically combine based on the value of the ID of the product. Okay? And suffixes is used in order to distinguish columns with the same name, ID in this case. So this ID comes from the sales DataFrame, and this ID comes from the products DataFrame. And it looks like this. You can see it basically lets us bring in data from other DataFrames. And this is quite realistic, because plenty of times we want to see the actual product that was sold and its price. Here we could also compute some other columns separately. For example, the paid price, which would be equal to the price times the units, reduced by the discount percentage. So this is already quite a realistic scenario. Next we have left joins and right joins, and they are quite similar. They are specified with the how equals left and how equals right parameter values. And the difference is that they let us also list the missing values, or the data instances that don't match. For example, let's say we want to list the sales even for missing products. In that case, everything here remains the same. We just specify how equals left. And in this case, everything from the left table will always be taken and put into the result set, regardless of whether or not there is a match in the right table. You can see here that the product has NaN values, and yet it still shows up in the results. And we have 1000 rows here. And of course, because there's nothing to match it to, we also have NaN values here on the right. And it's very similar for right joins.
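The three kinds of join can be compared side by side on a tiny example. The frames below are invented stand-ins for the generated test data, and the column names (id, id_product, discount_percent, units) and suffixes are assumptions of this sketch, not the exact names used in the video.

```python
import numpy as np
import pandas as pd

# Hypothetical products and sales; the second sale has no known product.
df_products = pd.DataFrame({
    "id": [1, 2],
    "name": ["keyboard", "mouse"],
    "price": [50.0, 20.0],
})
df_sales = pd.DataFrame({
    "id": [101, 102, 103],
    "id_product": [1, np.nan, 2],
    "discount_percent": [10, 0, 50],
    "units": [2, 1, 4],
})

# Inner join (the default): only sales whose product is known.
inner = pd.merge(df_sales, df_products, left_on="id_product", right_on="id",
                 suffixes=("_sale", "_prod"))

# Left join: every sale is kept, product columns are NaN when unmatched.
left = pd.merge(df_sales, df_products, how="left",
                left_on="id_product", right_on="id",
                suffixes=("_sale", "_prod"))

# Right join with the tables swapped: again every sale is kept.
right = pd.merge(df_products, df_sales, how="right",
                 left_on="id", right_on="id_product",
                 suffixes=("_prod", "_sale"))

# Paid price: price times units, reduced by the discount percentage.
inner["paid_price"] = (
    inner["price"] * inner["units"] * (1 - inner["discount_percent"] / 100)
)

print(inner[["id_sale", "name", "paid_price"]])
print(len(inner), len(left), len(right))
```

Both id columns survive the merge, renamed with the suffixes, which is why the sale id and the product id can still be told apart in the result.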
If we want to list, this time, products with sales, even if the sale has missing data, we will do a right join where the left table is the products. This will take everything from the right table, even if it's null or even if it doesn't have a match in the left table. So these two have been switched around. And I used this head of 200 to show you that missing values do show up in the result set. And if you want to know for sure that the entire dataset is there, we can simply remove it. And you can see we have 1000 rows here. And now the sales columns are on the right, because we switched around the order here. So that's it for merging or joining tables, or DataFrames, and concatenating them. I hope you have a good idea of how they work. You will see them used a lot in practice, especially when working with datasets split across multiple DataFrames or multiple CSV files, where you have links between them based on an ID, which is quite common for some larger datasets. And you will also see them if you work a lot with relational databases. So I do hope you have a good idea of how they work from this video. And I do encourage you to check this link here for even more information on them.

52. Data export - HTML and SQL: In this video, we are going to see how we can export our data to various formats, and in particular HTML and SQL. We've already seen how to read comma-separated values, and most data already comes in this format, so it usually doesn't make sense to export to this format. But after watching this video, you should be able to find information about exporting to your own CSV file as well. So we are going to work with the Titanic dataset again. Let's run this. And we have two options that we are going to discuss now. First of all, we are going to create an HTML file that will contain the Titanic data. And we do this like so. First of all, we open a file for writing.
And this is the file that will contain the HTML. After that, we simply do our DataFrame dot to_html and pass in the file handle, which in our case is f. And if we run this, we will get a table looking like this. I've already opened the file, and it will be created in the same folder as your Jupyter notebook. Right? And you see we have all of the columns and all of the values. There is no default formatting, but it's very easy to format it yourself if you already have, for example, some CSS classes defined for tables. And if you want to see how the HTML looks, you can simply comment out the file part and run it like this, and it will print the HTML inline. But of course this isn't very helpful. It might only be helpful if you have very little data, so that you can make sense of this. For anything more advanced, I suggest you are better off opening the HTML file in a text editor. So let's go back. Okay. And next, I'm just going to mention that you can also export to an SQL database table. But we're not going to do an example, because that would require setting up some SQL stuff, and that is outside the scope of this course. So if you are interested in that, if you already have some kind of SQL experience and know how to set it up, you can read about it in the documentation here. It's not that complicated. It just requires setting up some things, because you need to pass in a connection to this to_sql method. So that's it for exporting data in some well-known and useful formats.

53. Exercise 2 + solution: In this video, we are going to discuss your second exercise. We are going to be using the same clients, products and sales data that we introduced in one of the previous videos. I've already run these cells and the visualizations. So here is your task. You have to write code that will list each product together with at least the following information: the product name, the number of sold units, and the total revenue from its sales.
There are multiple ways to accomplish this. There are no restrictions on the kind of code you can write, so do it in whatever way you are most comfortable with. Pause the video here and resume when you are ready to take a look at my solution. All right, so hopefully you managed to do this. And here is what I had in mind. So first of all, I used the merge function that we discussed in order to perform an inner join between the sales and the products. And I grouped this by the product name. You can also group by the product's ID. I group by the name assuming that the name is also unique, which is an all right assumption, but it's also all right if you didn't assume this and did it by grouping on the ID. And after I got this group, I used the apply function, to which I passed in a lambda, to be able to perform operations between the values of multiple columns. It's a bit more difficult if you use the aggregate function, and it actually might be impossible without preprocessing the DataFrame separately. So this is one way of keeping the code as short as possible. So we call the apply method here, passing in a lambda which will return a Series. Basically, it will get us these as columns, because we apply this lambda on each group. So the total number of sold units is simply x units dot sum. So we sum the number of units. And the total revenue, this is a bit more complex because we need an expression that considers multiple columns. But by doing it like this, with the apply method and a lambda function, we can simply use the column labels to do this. So it's just x of price times this expression here, because we also must take into account the discount percentage. So we divide it by 100 to get it as a fraction. We subtract it from one. And that is what we multiply with the price to get the discounted price. And the discounted price we multiply with the number of units. And all of this we have to sum to get the total revenue for this product.
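The solution described above can be sketched as follows. The data, the column names and the output labels (units_sold, revenue) are invented for this sketch; selecting the needed columns before calling apply is an addition that keeps the lambda from touching the grouping column.

```python
import numpy as np
import pandas as pd

# Hypothetical products and sales; the last sale has no known product.
df_products = pd.DataFrame({
    "id": [1, 2],
    "name": ["keyboard", "mouse"],
    "price": [50.0, 20.0],
})
df_sales = pd.DataFrame({
    "id": [101, 102, 103, 104],
    "id_product": [1, 1, 2, np.nan],
    "discount_percent": [10, 0, 50, 0],
    "units": [2, 1, 4, 3],
})

# Inner join: the sale without a product is dropped.
merged = pd.merge(df_sales, df_products, left_on="id_product", right_on="id",
                  suffixes=("_sale", "_prod"))

# One row per product; the lambda receives each group as a DataFrame.
summary = merged.groupby("name")[["units", "price", "discount_percent"]].apply(
    lambda x: pd.Series({
        "units_sold": x["units"].sum(),
        "revenue": (
            x["price"] * (1 - x["discount_percent"] / 100) * x["units"]
        ).sum(),
    })
)

print(summary)
```

Printing merged and then evaluating the pieces of the revenue expression on a single group is a quick way to see how the column arithmetic lines up row by row.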
If we run this, we get a nice table that looks like this. If you get the results in a different order, that's fine, as long as you get these values. And of course, these values might also differ between me and you, because this is a randomly generated dataset. So it's possible, in fact it's almost certain, that you will get different results. And I've also got some iterative code here, with a for loop that iterates over the df_products rows and performs the same computations, but in a manner that inspires more confidence in its correctness, so to say, right? The apply version might be a little difficult to wrap your head around at first. So if you came up with something like this, with a for loop, that's also perfectly fine. I think you should be able to understand this code by yourself, so I'm not going to explain it in detail. And if we run it, we can see that we do get the same results, just in a slightly different presentation and in a different order. But the values themselves are the same. Okay? Now, one more thing before we wrap up. If you have taken a few minutes to stare at this code and you're not sure how it works, what I suggest you do is try to split it into multiple cells and see what happens. For example, what happens when you do just the multiplication here, x of price times x units? How exactly is this multiplication taking place? What happens if you then sum that? Stuff like that. Break up the code into multiple pieces, especially the pieces that you don't understand that well, until you are able to fully understand what's going on and until you are fully comfortable with your level of understanding. It's very, very important that you understand the code that we present here as well as possible. Make sure that no piece of it is left without a complete understanding on your part. That's very important in order to make progress.
Anyway, I hope you managed to come up with something for this exercise. It's a good exercise that combines multiple elements that we have covered in this module. So congratulations if you made it this far. And good luck with the upcoming quiz and the final videos in the course. I hope you enjoy those as well.

54. Pandas Quiz: In this video, we are going to do a pandas quiz. Just like the previous quiz, it's going to be in Jupyter. And I've left all of the questions on screen here. So pause the video here and try to answer them. And after that, we'll go through them together. Okay? So hopefully you could answer these, as they are not very complex. All of them have one-line answers. You didn't have to be very precise or to talk a lot about the question. Just a one-line answer is enough. So how do we usually import pandas? Well, this is usually done as import pandas as pd, like this. How do we declare a pandas DataFrame? This is done using pd.DataFrame, to which we pass a dictionary in which the keys are the columns and the values are lists representing the values for each column. What method is used to perform SQL-like joins? This is the merge method. So if we have a DataFrame df from question two, let's say, we would use df.merge to perform SQL-like joins like left joins, inner joins and right joins. What other method we discussed also shows up in SQL? This would be the groupby method, which allows us to group by a particular column, or rather a particular column value. We also have GROUP BY in SQL, which is used very similarly to how it's used in pandas. Which method applies a given function to all columns or rows of a DataFrame? This would be the apply method, which does just that. And it also has a parameter that controls whether the given function is applied to the columns or the rows. So this was it for the quiz. Congratulations if you made it this far and managed to finish and fully understand this module. Okay.

55.
Big Secret#1 - Python secrets: In this video, we are going to talk about Python secrets. By Python secrets, we mean lesser known or lesser used features of Python. Stuff that you will not see very often, but that can still be useful in a lot of scenarios. So let's get into it. First of all, I want to talk to you about positional-only and keyword-only parameters. So consider this classical way of defining a function here. It simply takes three parameters and returns their sum. So if we call classical of one, two, six, we will get an answer of, let's just run it, nine. But we can also specify the parameters like this. These are called keyword arguments. And by doing it like this, we no longer have to obey their order. For example, we could do c equals two and b equals six, which won't matter in this case, it won't change the result, but it's something that we can do. Now, we also have positional-only and keyword-only arguments. That is, we can mark certain parameters as being passable only by their position or only by their keyword, by their name. So for positional-only parameters, we simply add a slash at the end. This will cause everything before it, so a, b, and c in this case, to be positional only. So this will be a valid function call. But this will throw an error because we specify c by its name. So if we run this, you can see we get nine here, but we get an error, a TypeError, here, because we have passed in a keyword argument when it expected a positional-only argument. It works similarly for keyword-only parameters. That is, we add a star here, an asterisk, and everything after it will be marked as a keyword-only argument. So this function call will work, but this one will now throw an error. Let me show you. There it is. We get an error on the second function call, but the first one works just fine. Next we have f-strings, which is a recent feature in the Python programming language.
What f-strings do is let us evaluate expressions inside strings by putting them within curly brackets, like here. To do that, we must prefix our string with an f. If we run this, you can see this whole thing gets replaced with the value stored in the pi variable. And this can also be an expression. So we can write two times pi here, for example, and this will show that the value of two times pi is approximately this value. F-strings are very useful, especially when you have to print a lot of things, so that you don't end up concatenating strings, using .format, or messing with print's parameters, for example. So they can definitely save you a lot of time. Next, I want to talk a bit about decorators. If you recall the Fibonacci function, it's a very slow function. So let me comment this out. If I run this cell and this cell, you can see it's not instant. It takes a few seconds to compute fib of 35 if Fibonacci is implemented recursively. Now, if I do it like this, with @memoize, which is a decorator, you can see that the result is printed instantly. I'm not going to go into details on what memoization is, but basically it stores computed values so they do not get recomputed. In this memoize function here, you can see it declares a dictionary. Then we have an inner function called memoized_f, which checks if x is in the dictionary. If it is, it simply returns its stored value. Otherwise, it computes the value for x by calling f of x, where f is the function taken as a parameter by the outer memoize function, stores it in the dictionary, and returns d of x. And this memoize function simply returns memoized_f, so it returns a function. What this decorator basically does is replace fib with this memoized_f, which makes it more efficient. This is a very useful feature, especially if you are dealing with slow recursive functions that are memoizable, or cacheable if you want. And last but not least, Python 3.9 and future versions will definitely have cool and interesting features. 
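The memoize decorator and the f-string usage described above can be sketched like this. It is a minimal reconstruction from the spoken description, not the exact code from the video; the dictionary name cache is an assumption.

```python
def memoize(f):
    cache = {}  # maps an argument x to the already computed f(x)

    def memoized_f(x):
        # Compute f(x) only the first time x is seen.
        if x not in cache:
            cache[x] = f(x)
        return cache[x]

    # The decorator returns a function that replaces the decorated one.
    return memoized_f


@memoize
def fib(n):
    # Plain recursive Fibonacci; recursive calls go through memoized_f.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


result = fib(35)
# An f-string evaluates the expressions inside the curly brackets.
print(f"fib(35) is {result}, and twice that is {2 * result}")
```

Without the @memoize line, the same call recomputes the same values over and over, which is why it takes noticeably longer. The standard library's functools.lru_cache decorator offers the same kind of caching without writing it by hand.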
Whenever new versions come out, I do suggest that you look up their new features and try to spend a few minutes playing around with them to make sure you understand them. They will not be difficult to understand if you already have some Python experience with the previous versions. 56. Big Secret#2 - Numpy secrets: In this video, we are going to go over some NumPy secrets. Again, these are some lesser known or lesser used features that you will not see as often in all contexts. The first one I want to show you is the minus one in reshape trick. This is actually quite a popular trick in the machine learning and data science communities. In other places, you might not see it as often, but here you will see it quite a lot. You might also see it in code that deals with TensorFlow and other machine learning and deep learning libraries. So let's say we have an array here, which I named mat, and we want to get some sort of matrix out of it. Well, in this case, we can do it like this. We can put minus one here. And just to show you an example with an actual matrix, I'm going to change the number of elements and use minus one and ten. If you look here, you will see that it results in a ten by ten matrix. So this minus one here means an unknown size, and that size will be computed automatically by NumPy based on the sizes of the other dimensions. So if I have a two here, you'll see it figures out that this should be a ten, because ten by ten by two leads to 200. And if I have it like this, it will figure out that it should be a 20, because 20 times ten is 200. We can also have it the other way around. So I can have a 20 here and a minus one here, and it will result in the same array. This is very useful because it lets you omit one of the dimensions when reshaping your arrays, and it saves you from having to compute the exact numbers that should go in. Because many times you will have numbers that aren't as nice as the ones here, for example. 
And you will want to reshape them into something like this. But in that case, if the numbers aren't very nice, like the 200 here, it will take a bit of time to figure out what they should be. Minus one saves you the trouble of having to compute it exactly. Next is argpartition, which works similarly to the partition function in quicksort, if you are familiar with it. If not, don't worry, I'll explain it. So let's generate a random array of integers between one and five. We get this, and let's see what argpartition does. Well, it gives us an array, and if you look at it, it contains indices between zero and the length of this array here. So we have 0, 1, 2, 3, 4 and so on. To better explain how it works, let's also see the result of a indexed with what argpartition returns. We get this. Now, I suggest you also check out np.partition, but here is what argpartition does if we use it like this. Let's look at the element at index three, so this one. This one is in its final position if we consider the array to be sorted. So if we sort a, we get this, and you can see this two is correctly placed. So this parameter here will make the resulting partition, if you use it like this, have at the corresponding index the final value that would show up in the same place if we were to sort the array. So it basically partitions the array. Not only that, but everything to the left of it is smaller than or equal to it, and everything to the right of it is larger than or equal to it. They won't necessarily be sorted themselves, but they will follow this invariant: this is larger, this is smaller. And if you look up partition, you will see that it basically does this directly. So it would return this array here directly, without having to go through the indices. argpartition is a bit more general in my opinion, because you can easily use it like you would use partition, and if for some reason you want the indices, you can also get those. 
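The two NumPy tricks described so far, the minus one in reshape and argpartition, can be sketched as follows. The array contents, the seed, and the names k, idx and partitioned are assumptions chosen for illustration; the values from the video are not recoverable from the transcript.

```python
import numpy as np

# reshape with -1: NumPy infers the missing dimension from the total size.
mat = np.arange(200)
print(mat.reshape(-1, 10).shape)      # 200 / 10 rows are inferred
print(mat.reshape(20, -1).shape)      # 200 / 20 columns are inferred
print(mat.reshape(-1, 10, 2).shape)   # 200 / (10 * 2) is inferred

# argpartition: indices that place the k-th smallest value at position k.
rng = np.random.default_rng(0)
a = rng.integers(1, 6, size=10)       # random integers between 1 and 5
k = 3
idx = np.argpartition(a, k)           # indices into a, not values
partitioned = a[idx]                  # same result as np.partition(a, k)

print(a)
print(idx)
print(partitioned)
print(np.sort(a))

# The value at index k is where it would be in the sorted array;
# values before it are <= it and values after it are >= it.
print(partitioned[k] == np.sort(a)[k])
```

Only the element at index k is guaranteed to be in its sorted position; the two sides around it are not sorted, and the exact order of idx may differ between NumPy versions.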
Next, I want to show you the clip function, which can also be useful in a variety of cases where you are only interested in some range of values in an array. So if we run it on the same a as above, you can see that the values that are lower than three, for example this two here, become three. So values lower than three become three, and values higher than four become four. So this five here became a four, and these ones here became three. This can be useful, for example, if you want to get rid of some outliers in your data, or if you simply want to bring lower and higher values to a minimum and a maximum. For example, maybe you're dealing with some grades. If the score on some test is lower than some threshold, you want it to show up as that minimum threshold, and if it's higher than another threshold, you want it to show up as that other threshold. It can be useful in these cases. Both clip and argpartition are definitely very useful shortcuts, so make sure you're aware of them and are able to use them if you ever see a need to. 57. Big Secret#3 - Pandas secrets: In this video, we are going to talk about pandas secrets. We're going to present two useful features of pandas. First of all, I want to show you regexes and how they can be used with Pandas. Regex means regular expression, and regular expressions are a powerful way to process strings. We're not going to go into many details. There's a lot that you can read about them. They are very, very powerful and there is quite a steep learning curve, but they are definitely worth learning. So I do recommend that you research them and learn more about them. We're only going to do a very basic example in this video. So let's run this cell. I created here a DataFrame containing servers. Each server has a name, an IP, and a server type, which can be Linux or Windows in this case. So we have two Linux servers and one Windows server. 
Their names are home and work for the Linux servers and granted for the Windows server. And we also have their IP addresses. Now, as a quick side note, note that these IP addresses are not valid. If you watch a lot of TV shows, you will have probably noticed that in many TV shows and movies, when they show someone working with networking-related stuff and an IP shows up on the screen, it looks something like this, so it's not valid. That's done on purpose. It's not because the directors and scriptwriters don't know how networking works. It's usually done so that whoever is watching the movie cannot just write down that IP address and try to attack it, to flood it or to hack it. So by presenting invalid IP addresses, like I've done, you are not able to do any harm to anyone. Okay, now going back to Pandas, which is what we're actually interested in. Using regexes, we can do the following. Let's say that we want to add four more columns here, and we want each column to contain a group of the IP. So we want the first new column to contain the first group here, the second to contain the second group, and so on. Well, we do it like this. We say df_servers, which we index with a list of the new columns. I've labeled them ip1, ip2, ip3, and ip4. And we assign to this df_servers.ip.str.extract. To extract, we pass a regex, and in the regex we have to specify as many groups as the number of columns we've specified here. A group in a regex is specified using parentheses, like this. Now, backslash d means that it will match one digit between zero and nine. Plus means that we want at least one match of this, but it can be more. So this group here matches this, this, and this, up until it reaches this full stop, or dot. We also specify the dot here with backslash dot. And why the backslash? Because the dot is a special character in regular expressions: by itself it matches any character, so like plus, it does something other than match itself. 
So we have to escape it using a backslash. Now we simply repeat these groups, because we have four groups in an IP address. So we must repeat it three more times. If we run this, you'll see the IPs are correctly split into the four columns we are interested in. This is a nice one-liner with regular expressions. It's a very basic example, but I believe it perfectly illustrates the power of regular expressions. One downside to regular expressions is that this is quite hard to read. Maybe it's still not clear to you exactly how it works, and that's perfectly fine. You will probably have to follow a regular expressions tutorial to fully understand it, because I was quite fast in explaining it, since it's not really the point of this course and of this video in particular. But hopefully it's reason enough for you to go out there and learn a lot more about regular expressions. Next, I want to show you how Pandas can treat certain kinds of columns. The type column, for example, should be a categorical column. But right now, if we do this, you'll see its dtype shows up as object. In some cases it might also show up as a string. Well, that's not very efficient, because for the type here we only have a few possible values. Maybe we have Linux, Windows, maybe some different flavors of Linux, but there aren't that many server types out there to warrant this column having a data type of object, which is very generic. So what we can do here is say that the type column will be a category type column, using .astype, to which we pass in the string category. If we do this, you can see that it now shows the dtype as category, and it tells us that we have two categories, Linux and Windows. And Pandas is smart enough to figure out when these are equal and when they are not. So this gives True and this should give False. And it does. So hopefully, this helps you see the kind of power Pandas has in dealing with DataFrames. 
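The regex extraction and the categorical conversion described above can be sketched as follows. The server names and addresses below are placeholders made up for illustration (taken from ranges reserved for documentation), not the values shown in the video, and the new column names are assumptions.

```python
import pandas as pd

# Hypothetical servers table; values are illustrative only.
df_servers = pd.DataFrame({
    "name": ["home", "work", "office"],
    "ip": ["192.0.2.10", "192.0.2.20", "198.51.100.7"],
    "type": ["Linux", "Linux", "Windows"],
})

# Four groups in parentheses, one per new column.
# \d+ matches one or more digits; \. matches a literal dot.
df_servers[["ip1", "ip2", "ip3", "ip4"]] = df_servers["ip"].str.extract(
    r"(\d+)\.(\d+)\.(\d+)\.(\d+)"
)

# Store the few distinct server types as a categorical column.
print(df_servers["type"].dtype)   # dtype before the conversion
df_servers["type"] = df_servers["type"].astype("category")
print(df_servers["type"].dtype)   # dtype after the conversion

print(df_servers)
print(df_servers["type"].cat.categories)
```

Note that the extracted groups are strings, so they would need a further conversion if numeric columns are wanted.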
Pandas can let you treat columns as categorical, which will save space in very, very large DataFrames. So that's quite important in many cases. And it also lets you use regular expressions to process your text data very easily. 58. Capstone project with Solution: In this video, I am going to present your capstone project and one of its solutions. You have to consider the Titanic dataset and plot the age of the passengers on the x axis and the fare they paid on the y axis. Also, you have to draw a regression line. I won't be saying any more. Feel free to Google whatever it is that maybe you don't fully understand, and also feel free to look up any solutions. I will make a pause here. I suggest you pause the video and give it a go, and I will resume with some hints. Okay, so here's one hint. There is a very, very short solution to this. It's what I would call a one-liner, although some people might disagree and consider it two or three lines. Regardless, it definitely can be done in under five lines. So try to think about how to do that. Now we'll pause again before your next hint. Okay, so here's another hint. There are multiple ways to approach this. If you can't figure out the one-liner I mentioned, try to work it out by figuring out how you could do something simpler. For example, start with just the plotting, separately. Then figure out how you can perform a regression, even if you don't plot the actual regression. Then try to combine these two. And here's one final hint before I talk about the solution. It's okay if you use other libraries that we didn't cover. Alright, now for my solution and some other possibilities for solving this as well. I won't be presenting multiple solutions, but I will talk about how they could work. So first of all, I reduced my Titanic DataFrame to this age and fare DataFrame, which only contains the age and fare columns, because those are the ones we have to work with. You don't have to do this. 
I just like having this small DataFrame with just the data I need. And here's how you can plot it without the regression line. We've already covered this. You simply use DataFrame.plot. Now, the last hint might have given it away completely, because we have this seaborn library that we can pip install. And it's as simple as calling seaborn.lmplot, which stands for linear model plot. We pass in x equals age, y equals fare, and our DataFrame. We tell it that we want it to fit a regression line. And we have this parameter here that lets us control what the regression line looks like. I've controlled it by coloring it in red. So this is the one-line solution that I mentioned. As you can see, it's very, very brief. It's very easy to understand. And it perfectly illustrates the point I was trying to make in one of the previous videos about making as much use as possible of external libraries. This is perfectly suited for the requirement we had here. Now, this is a capstone project. So in order to get some more exercise, I do recommend that you also do it the pure numpy and pandas way. That is, use what we've learned about numpy to compute the linear regression yourself. Then use something like Matplotlib to plot both the data, the age and the fare, and the resulting regression line from numpy. Maybe even implement the regression in one of the ways that we didn't talk about: look up another way to implement the regression. It will be significantly more code than the one-liner we have here, but it wouldn't be that much. It should still fit in under 50 lines of code. So I do recommend that you do that as an exercise. And the good thing is that you have this to compare against. So if you do it with pure numpy and pandas, and some matplotlib maybe, and you get something that looks similar to this, then you know your implementation is correct. Otherwise, you know that you have to try harder. 
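The pure numpy route suggested above might look roughly like the following sketch. Since the Titanic file is not available here, it uses synthetic age and fare values generated with a known slope; with the real data you would take the two columns from the DataFrame instead, and rows with missing values would presumably have to be dropped first. All variable names are assumptions.

```python
import numpy as np

# Synthetic stand-in for the age and fare columns (not Titanic data).
rng = np.random.default_rng(42)
age = rng.uniform(1, 80, size=200)
fare = 10.0 + 0.5 * age + rng.normal(0, 5, size=200)

# Least-squares fit of fare = intercept + slope * age.
# The column of ones lets the solver estimate the intercept.
X = np.column_stack([np.ones_like(age), age])
coef, *_ = np.linalg.lstsq(X, fare, rcond=None)
intercept, slope = coef

# Cross-check with NumPy's polynomial fit of degree 1,
# which returns the highest-degree coefficient first.
slope_check, intercept_check = np.polyfit(age, fare, 1)

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"polyfit slope={slope_check:.3f}, intercept={intercept_check:.3f}")

# Two points are enough to draw the fitted line on top of a scatter plot.
line_x = np.array([age.min(), age.max()])
line_y = intercept + slope * line_x
```

To reproduce the plot, a scatter of age against fare plus a line through line_x and line_y in Matplotlib should give a picture comparable to the seaborn one-liner.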
So congratulations if you managed to complete this capstone project in any way that results in a graph similar to this. That means that you now have a working knowledge of numpy, pandas and a few other libraries as well, which puts you well on your way to learning more and becoming an expert in data analysis and machine learning. So congratulations for making it to this checkpoint in your journey, and I wish you the best of luck and that you have fun in our next videos and courses. 59. Tips and tricks: In this video, we are going to discuss some tips and tricks regarding what we've learned so far. We're going to cover some useful libraries for this. Suppose you find something to be too difficult. For example, the lesson about pivot tables might have been a bit too difficult. Maybe you're not entirely sure how to use them and how to make a pivot table look like you'd want it to look. In that case, in any such case, you should always try to see if there are any libraries out there that can simplify the task that you're struggling with. So if you're struggling with pivot tables, you should Google and try to find libraries that make working with pivot tables easier. If you're struggling with some concept in numpy, again, maybe there's a library that makes it easier, and you can start by using that library. So that's what I've done here for a few things. First of all, we have the pivot table library called pivottablejs. You can install this directly from Jupyter using this command that I've commented out here. I already have it installed, so I won't be running it. After that, we can import pandas just like before, and pivottablejs. We only need pivot_ui from it. We load our Titanic dataset, the one we've worked with in the Pandas module. And we simply call pivot_ui, giving it our df_titanic DataFrame containing the Titanic data, and then the output file. Our pivot table will also be saved in this HTML file. 
So you will be able to share the pivot table with other people just by sending them this HTML file. All right, and if you run this cell, it will look something like this. I can also pop it out, or go to the folder in which the Titanic CSV file, so titanic.csv, is saved and open the HTML file from there. In some cases, this link here might not work. Once you do that and you see this table here, this is an interactive table, meaning that we can drag and drop things like this. And you see, we have a count for each passenger class, so how many passengers were in each class. If you recall, we can get the same result using pivot from Pandas, but you have to know how to use it and it's a bit more work. Here, it's just drag and drop. We can also have, for example, a sum. Although a sum by itself, without selecting something here, doesn't make much sense, right? So maybe we can check survived here, and then it sums the survivors, the number of survivors for each class. So it needs to know what it must sum. For example, the fare: how much each class has paid to the company, right? First-class passengers paid the most, which is to be expected, right? And we can also group these by multiple things. For example, we can add survived here. Again, you can see that apparently the first-class survivors paid the most, and so on. We also have, if I can get it here, the sex, and we can also group by that. And you can get these nice pivot tables that wouldn't be so easy to obtain using the default pandas functionality for this. So it can be very useful and it can save a lot of time. Next we have qgrid, which is also a Pandas extension, something that works with Pandas, which we can install using this. I also suggest, although it might not be necessary, that you install this and run these commands. Then we can import qgrid. And we define it like this: grid equals qgrid.show_grid, to which we pass in a DataFrame. 
And I've also passed in show_toolbar equals True. Then we display it like this. It should show up something like this, which is also an interactive table, because if we double-click on a cell, we can edit its contents. So for example, here we can put in zero, two, three, whatever we want. Now, if this doesn't display for you: I've actually had this problem, where I installed everything and the table here wouldn't display. If you get that same problem, try restarting your Jupyter. Don't just restart the kernel. You have to restart the whole thing. So depending on how you've run it, if you ran it from a terminal, close the terminal and run it again. You have to restart the whole Jupyter server for it to work. That should fix it if it doesn't work without doing that. And the cool thing here is that we can add rows dynamically. We can remove rows. So we can do all this table editing. We can also pop this out like this. So we can do all of these cool things, including filtering. For example, if we only want to see the female passengers, we can do it visually, without having to mess with Pandas too much. So it's a cool way of editing data inline in Jupyter. Hopefully that shows you how much time you can save and how much easier you can make your life by simply Googling the things that you notice are taking too much of your time or that you are simply struggling with. 60. FAQs: In this video, I'm going to go over some frequently asked questions that I hear a lot from students. So first of all, should you always update your installed libraries, like numpy and pandas? In my opinion, you should only do this when there is a need. You shouldn't spend too much time checking whether you have the latest version of Python or of some specific library. If there is ever a need, you will find out, because you won't be able to do something that you need to do. And then you can worry about installing the necessary update. 
Why don't I recommend that it be done periodically, or immediately when there is a new version out? Because this can also break things. Code that you've built for some version x might not work for the next version, x plus one. So unless you really need to update, unless there's a good reason to do it, such as some new feature that you really need, some efficiency issue or some bug fix, I don't recommend doing it. Wait until the new version has matured a bit. Wait until people have given their feedback and you know it's safe to update. Next, can I use code that I find online? Yes, you can definitely use code that you find online. We all do that. I've also taken code from various places for my own projects and even for some of my courses. But I've done something that's very important for you to do as well. I understood it and I was able to make changes to it. I was able to adapt it to my needs, for example, even just by renaming some variables to make it more presentable. This is fine as long as you fully understand what's going on, you are able to refactor it to better suit your own code organization style, and you are generally able to work with it in a safe manner. That is, you know how making changes to it will affect its runtime or its correctness. So this is definitely fine to do. But you must spend time understanding that code. Maybe you weren't able to come up with it yourself, or maybe you didn't have the time to do so. But it is important that you understand it as well as possible before making use of it. And of course, this is not considering any copyright issues. Those take precedence. If the code is copyrighted and you are not allowed to use it, then of course you shouldn't use it. Next, if available, should I always use a library rather than writing my own implementation? Well, here you have to weigh the pros and cons. Is the library still maintained, or is it a dead library that no one is working on anymore? 
If no one is working on it, using it is not a very good idea, because a new version of Python or of some other dependency might break it. It might not work anymore. And then you'll end up writing your own implementation anyway, and it might be harder to do so later on. So keep that in mind. Generally, if it's a maintained library, if it's well known, if people are working on it, then it's quite safe. But you should also think about whether or not that library is robust enough. Can it really do all that you need it to do? Try to predict what you will need from it in the future. Is it able to handle those things as well, even if you don't need them right now? So it's something that has its pros and cons, and you just have to weigh them and make a decision. Another consideration would be how long it would take you to write your own implementation. For example, let's take a pivot table. How long would it take you to make pandas show the pivot table you're interested in, versus using the pivottablejs library that we've talked about? That's something that you have to weigh. If it doesn't take that long to write it yourself, it might be worth it to just write it yourself. If it would take too long, some of the cons of using an extra library might be worth it. Okay, next, should I learn numpy, pandas or something else? Will it still be relevant in a few years? Well, in tech, almost nothing stays relevant forever. There are countless examples of technologies and libraries that were popular at some point, but that have significantly fallen out of favor today. This is especially true of fast-moving fields such as machine learning, for example. However, that does not make something less useful during its time. Today, numpy, pandas, Python itself, and plenty of other Python libraries are relevant and they are useful. They are definitely very useful to know. They definitely represent an important addition to one's skill set, I believe. And you should learn them. 
Even if they won't be relevant in a few years, whatever ends up replacing them will have similarities to these. So by knowing these, you will be able to pick up the new thing much, much faster than some other person who is starting from scratch. Previous experience always helps. And for the last question I could think of, what are the next steps after numpy and pandas? I would suggest that you pick up some machine learning. Machine learning makes heavy use of these two libraries, numpy and pandas, and of Python in general. Our platform also has courses for that, and I think this is a natural next step. But of course, that's not the only step. Maybe you'd like to go into web development. In that case, you would have to look up what web development options Python offers. Numpy and pandas aren't specific to machine learning. They can be used for a number of things, maybe even things that I can't think of right now. If you can think of something, you should pursue it. So hopefully this clarifies some popular questions that I'm aware of. If there are any others, I do suggest that you spend some time researching them and try to find an answer. By now, you should be quite familiar with how to do that.