Algorithmic Trading & Quantitative Analysis Using Python | Mayank Rasu | Skillshare

Algorithmic Trading & Quantitative Analysis Using Python

Mayank Rasu, Experienced Quant Researcher

69 Lessons (12h 22m)
    • 1. Course Introduction

    • 2. What Is Covered in this Course?

    • 3. Course Prerequisites

    • 4. Is it For Me?

    • 5. How To Get Help

    • 6. Pandas Datareader - Introduction

    • 7. Pandas Datareader - Deep Dive

    • 8. Yahoofinancials Python Module - Intro

    • 9. Yahoofinancials Python Module - Deep Dive

    • 10. Intraday Data - Alphavantage Python Wrapper

    • 11. Web Scraping Intro

    • 12. Using Web Scraping to Extract Fundamental Data - I

    • 13. Using Web Scraping to Extract Stock Fundamental Data - II

    • 14. Updated Web-Scraping Code - Yahoo-Finance Webpage Changes

    • 15. Data Handling

    • 16. Basic Statistics - Familiarize Yourself With Your Data

    • 17. Rolling Operations - Data In Motion

    • 18. Visualization Basics - I

    • 19. Visualization Basics - II

    • 20. Technical Indicators - Intro

    • 21. MACD Overview

    • 22. MACD Implementation in Python

    • 23. ATR and Bollinger Bands Overview

    • 24. ATR and Bollinger Bands Implementation in Python

    • 25. RSI Overview and Excel Implementation

    • 26. RSI Implementation in Python

    • 27. ADX Overview

    • 28. ADX Implementation in Excel

    • 29. ADX Implementation in Python

    • 30. OBV Overview and Excel Implementation

    • 31. OBV Implementation in Python

    • 32. Slope in a Chart

    • 33. Slope Implementation in Python

    • 34. Renko Overview

    • 35. Renko Implementation in Python

    • 36. TA-Lib Introduction

    • 37. TA-Lib Installation & Application

    • 38. Introduction to Performance Measurement

    • 39. CAGR Overview

    • 40. CAGR Implementation in Python

    • 41. How to Measure Volatility

    • 42. Volatility Measures' Python Implementation

    • 43. Sharpe Ratio and Sortino Ratio

    • 44. Sharpe and Sortino in Python

    • 45. Maximum Drawdown and Calmar Ratio

    • 46. Maximum Drawdown and Calmar Ratio in Python

    • 47. Why Should I Backtest My Strategies?

    • 48. Strategy I - Portfolio Rebalancing

    • 49. Strategy I in Python

    • 50. Strategy II - Resistance Breakout

    • 51. Strategy II in Python

    • 52. Strategy III - Renko and OBV

    • 53. Strategy III - Renko and OBV

    • 54. Strategy IV - Renko and MACD

    • 55. Strategy IV in Python

    • 56. Value Investing Overview

    • 57. Introduction to Magic Formula

    • 58. Magic Formula Implementation in Python

    • 59. Introduction to Piotroski F-Score

    • 60. Piotroski F-Score Implementation in Python

    • 61. Automated/Algorithmic Trading Overview

    • 62. Using Time Module in Python

    • 63. FXCM Overview

    • 64. Introduction to FXCM Terminal

    • 65. FXCM API

    • 66. Building an Automated Trading System - part I

    • 67. Building an Automated Trading System - part II

    • 68. Building an Automated Trading System - part III

    • 69. 7 9 Automated Trading Script 4

About This Class

Build a fully automated trading bot on a shoestring budget. Learn quantitative analysis of financial data using Python. Automate steps like extracting data, performing technical and fundamental analysis, generating signals, backtesting, and API integration. You will learn how to code and backtest trading strategies using Python. The course also introduces the Python libraries required to perform quantitative analysis. The USP of this course is that it delves into API trading and familiarizes students with how to fully automate their trading strategies.

You can expect to gain the following skills from this course

  • Extracting daily and intraday data for free using APIs and web-scraping

  • Working with JSON data

  • Incorporating technical indicators using Python

  • Performing thorough quantitative analysis of fundamental data

  • Value investing using quantitative methods

  • Visualization of time series data

  • Measuring the performance of your trading strategies

  • Incorporating and backtesting your strategies using Python

  • API integration of your trading script


1. Course Introduction: Hello, friends. Trading, for most of us, still means sitting in front of a laptop, computer, or trading terminal, watching the changing asset prices (and probably the charts as well), waiting for a signal to trigger, and then making a trading decision: buy, sell, or whatever. However, you will accept that in this present era of automation, algorithms are doing more and more of our jobs. A lot of manual things that were never thought automatable are being automated. What if I tell you that this manual trading can be automated by code? That's exactly what we are going to look at in this course. I don't want you to sit in front of your trading terminal for the whole of the trading hours, not going to the bathroom, not going for lunch, cutting down on your social life, and pretty much being engrossed in the charts and the data. Ideally, what you would want is to have a code and let it do all the work. So this code is running, and now we'll see how it does everything that a human trader does. The code connects to the broker's trading terminal, extracts data, analyzes the data, and then takes a trading decision. As you can see here, a new long position has been initiated. This trading position was created entirely by an algorithm: it connected to a trading terminal, did pretty much everything that a human trader does, identified the signal, and created this new trading position. This is exactly what we will try to understand and build in this course. Now, something about me. I'm an experienced quant with more than a decade of work experience with leading global investment banks. 
I've had the good fortune of working in both Asia and North America, and I follow both the U.S. market and the Asian markets very closely. My work experience has spanned from building quant models to trading to risk management. About my educational background: I got my undergraduate degree in engineering and then went on to do an MBA and an MFE. I have a couple of scholastic achievements under my belt. I qualified for the prestigious and very competitive Mathematics Olympiad, and I have also performed pretty well in various coding competitions around the world. I am a strong believer in data-driven strategies, and you will see throughout this course that all my strategies are thoroughly grounded in data. One of my objectives for this course is to impart the same level of diligence that I bring to data: I want my students to have the same diligence when they work with data. I am a machine learning enthusiast and a student of applied machine learning. I look for opportunities to apply machine learning and deep learning algorithms in trading, and I'm presently working on a machine learning/deep learning based trading bot. This is my first course on Skillshare, so your feedback is highly appreciated. It doesn't matter whether the feedback is good or bad. The only way for me to figure out what's going well for my students and what's not is through feedback. So please feel free to provide constructive feedback, and I assure you that I'll work on your feedback to make this course much better. I hope you like this course and that it meets your expectations. I'll see you in the lecture videos. 2. What Is Covered in this Course?: Hello, friends. In this course we'll start from the basics of algorithmic trading and quantitative finance, which is extracting data. I have devoted quite a few videos to this topic. 
Those videos help you learn how to extract different kinds of data, including intraday data, daily data, monthly data, etc. I spent a considerable amount of time on this section because you need a number of ways to get data programmatically into your system. We'll be looking at equity data, commodity data, and forex data, and we'll discuss two main ways of getting data for your program: over an API and using web scraping. We will look at various Python libraries that help you extract this data for free, including pandas-datareader, Yahoo Finance, Alpha Vantage, and FXCM. I'll go into more complex examples of extracting data, such as data for a number of stocks, data over a choppy Internet connection, etc. I have spent a considerable amount of time on this section of the course because I think it is very important. We'll look at both OHLCV data, which is what you get for stocks (open, high, low, close, and volume), and fundamental data, which includes data from the balance sheet, income statement, and cash flow statement. We will then move on to coding technical indicators and performing fundamental analysis. In this section I'll help you understand both technical analysis and fundamental analysis. For technical indicators, we will look into fairly popular TIs, which include MACD, RSI, ADX, OBV, etc. You'll understand what these are once we get to that section. We'll also look at crossovers, slope, signals, and momentum: how to translate the working of a trader who just looks at the chart and trades, and how you can mimic that behavior using your code. We will also discuss how to write programs to perform detailed fundamental analysis. 
That fundamental analysis of stocks is required for value investing. We will then move on to the business end of the course: building trading strategies and performing backtesting. If you don't know what backtesting is, don't worry about it. It is a very important part of checking your code, and I would consider it one of the most important parts of deploying your algorithmic trading strategy in the real world. In this section we will look at how to code trading strategies, how to perform backtesting using historical data to test the performance of a strategy, and how to calculate KPIs, which is extremely important because you need a way to measure how your strategy is performing. We will learn how to calculate the Sharpe ratio, Sortino ratio, win ratio, CAGR, volatility, etc. We will also look at implementing basic ML trading strategies. This is still a work in progress, but I will definitely include a couple of strategies that have some ML aspect to them. Then, in the final part of the course, we will use RESTful APIs to build a fully automated trading setup. In this section you will learn how you can write a code and then just forget about it. The code will do the trading: it will extract data, connect to the broker's terminal, parse data, analyze it, generate trading signals, book trades, cancel trades, move your stop losses, etc. So in this last section we'll build a fully automated trading bot that performs all the steps a human trader can perform for trading purposes. We will be using FXCM, which is a very popular broker for FX trading, and we will use their demo account for testing this live trading and creating this bot. 3. 
Course Prerequisites: As you can figure out, this course is of intermediate-level difficulty, so it would be great if you have some Python skills. Basic Python skills are definitely required; without them you will be a bit lost in this course. So I would urge you to build some level of Python proficiency. By Python proficiency I mean you should understand how to write a function, how to loop through a list, what dictionaries are, etc. Good-to-have skills would be some familiarity with NumPy, pandas, Matplotlib, statsmodels, scikit-learn, etc. In the code sections of my course, where I explain in the video how the code functions, I'll try to explain everything in the code. But if you have some familiarity, you will pick up these lessons more quickly than if you are a complete novice. So I would urge you to brush up your basic Python if you want to get the most out of this course. Some experience in trading is also required, because I'll be using stock, commodity, and forex data, etc. Some familiarity with trading parlance will be much appreciated, because I'll be using words like high, low, candlestick, and stop loss. I'll explain what these are, but sometimes I may miss something really basic, for example going long or going short. So knowing the basics of trading parlance will also be tremendously helpful. Basic statistics is needed too: high-school-level statistics and mathematics are required for this course. So again, I will underline that this course is of intermediate-level difficulty, and I would urge you to brush up on the areas that you think will be required. 4. Is it For Me?: This course is for a wide range of people. 
If you are a trader who wants to automate trading strategies, you will benefit the most from this course. If you already have trading strategies live in the market, you are already trading using charts or performing fundamental analysis, and you're looking for a way to automate your strategies so that you have more time to think about more strategies and more intellectual things, then you are the demographic that will benefit most from this course. In addition, if you are an engineer who wants to build an automated trading station, this will also be very helpful for you. Data scientists who love to work with live streaming financial data and want to make sense of it will also find this course helpful. And anyone who wants to gain an understanding of trading programmatically, or who is simply curious, should find this course useful. I would like to underscore that this course is not for high-frequency traders. A significant part of algorithmic trading deals with high-frequency trading, which dwells in very low-latency trading algorithms where the algorithm makes trading decisions in milliseconds. This course doesn't cater to that part of the algorithmic trading world. All the strategies I discuss in this course will be of medium to high latency. I just want to make this clear. That's it. 5. How To Get Help: This is very important because this course is of intermediate-level difficulty. You will have questions while I am explaining, both in the overview sections and in the coding sections. So do post them in the Q&A section. I regularly review the Q&A section and provide answers. But just try to make my life a bit easier. 
So write a descriptive post in the Q&A section. For example, if you just say something like "I'm getting an error", I will not be able to help. If you mention what you're trying to do, what steps you took, and what error you're getting, in this kind of structured format, it will really help me troubleshoot your problem and provide a solution. I would also urge you to spend some time on the Internet trying to find an answer; this is one of the most important ways people learn. If you're new to the world of coding, if you're just starting out and are not an expert, you will run into issues. You will find errors. You will think that you have hit a wall. However, in the age of the Internet, help is available everywhere on forums like Stack Overflow and GitHub. Somewhere, someone has already had the problem you're facing, discussed it, and found a solution. I'll spend some time explaining how to get help on Stack Overflow, etc., and how to access the documentation of various Python libraries. I'll be really happy if, by the end of the course, you have picked up the skill of seeking help from the relevant forums. Being able to frame the right questions and search for the relevant information is a very important skill in the world of programming; its importance cannot be emphasized enough. So try to get help from those sources as well. You can also drop me a note directly at my personal email ID. I may take some time to get back to you, but I review every single mail I get in my personal mailbox, so you can use this avenue as well. 
Use it if you have exhausted the previous two avenues and have not been able to get your answer. I'm really excited that you have enrolled in this course, and I hope that we will be able to make something awesome. So let's jump straight into the course. 6. Pandas Datareader - Introduction: Hello, friends. Without further ado, let's jump straight into pandas-datareader, which, in my opinion, is one of the most versatile Python libraries for accessing data from many different sources, for example Yahoo Finance, Google Finance, World Bank, Quandl, and whatnot. Let me first show you the documentation of this library. There you go. This documentation, like a lot of documentation for other Python libraries, is fairly descriptive. It tells you how to install the library, its usage, and its requirements. You should be able to meet the requirements; if you have Anaconda, most of you will have no problem with that. It also gives you the list of sources from which it can get financial data: you can see Google Finance, FRED, Quandl, World Bank, and whatnot. As you can see, there is no Yahoo Finance here. That's because, as mentioned at the top, Yahoo Finance has made some changes to its API that cause problems with pandas-datareader. However, I can assure you that you can still access Yahoo Finance data using pandas-datareader, and that's exactly what I'm going to show you in this video. In the meantime, let me show you, for example, the Google Finance page, where the documentation explains how to extract data from Google Finance. For any source you want to extract data from, you just click into the documentation for that particular source and it will give you a sample script showing how to extract the data. First, let me tell you how to install it. 
As you can see, it's fairly straightforward using pip: pip install pandas-datareader. You can install it from your Spyder IDE directly. All you have to do is put an exclamation mark in front of this command and run it. I am not going to run it because I already have pandas-datareader installed, but if you want to give it a try, go for it. Just put an exclamation mark and then pip install followed by whichever library you want to install, run it, and you should have it in your Python. So let me show you a sample data extraction. We first import the libraries. As you can see, as soon as you put a dot you'll see all the modules that are in pandas-datareader. For now we only import data, because that's what is used to get data from Yahoo, and I'll show you how to go about it. We also need to import a library called datetime. We need it because the pandas-datareader data module requires the beginning date and the end date of the data to be provided in the datetime format. Now you just define the variables. Say you want to get data for Amazon; you have to make sure that the ticker is the same as the one on Yahoo Finance, because we're going to use Yahoo. As for the start date, I'll come to that. Say you want data for the past one year. Then the end date would be datetime.date.today(). This gives you today's date in the relevant datetime format. Let me show you how it looks. I just ran the libraries, and if you run this, you see here that it's today's date, 24th February. This is the format in which the pandas-datareader data module requires you to supply dates. 
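The date handling described here can be sketched with the standard library alone. This is a minimal illustration, not the instructor's script: the helper name date_window and its today parameter are hypothetical, added only so the arithmetic can be checked with a fixed date.

```python
import datetime as dt


def date_window(days=365, today=None):
    """Return (start, end) as datetime.date objects that are `days` apart.

    `today` is a hypothetical override for testing; by default the window
    ends at the current date, as in the lesson.
    """
    end = today if today is not None else dt.date.today()
    start = end - dt.timedelta(days=days)
    return start, end


# One-year window ending today; both values are datetime.date objects.
start_date, end_date = date_window(365)
print(start_date, end_date)
```

Both values print in ISO form (YYYY-MM-DD), and they can be passed as the start and end arguments of a data request.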
So you will have to use a datetime format. Then for the start date, let's do the same thing and subtract 365 days. For that I'll use datetime.timedelta. It takes today's date, subtracts 365 days, and supplies the result in the datetime format. So you have pretty much all the variables you need to extract data for Amazon for the past one year. Now let's see how to do it. pdr.get_data_yahoo is what we want. You can also see the different functions that you can use to get data from different sources. I have tried the Google, Quandl, and Yahoo ones, and all of them have been working so far, so you may want to give them a try. All I need to do is put in these variables, and you should get your data. This is it. If I run this, I should get the required data. It's running; it might take some time. It has run. If I go here and open data, you can see that we have got daily data: open, high, low, close, adjusted close, and volume. This is the same information you get from the Yahoo Finance website in the historical data section for Amazon or any other equity you select. We can do one more thing here: we can also pass an argument called interval if you want monthly or weekly data. If I run this, you will get monthly data. It's running, and it's taking slightly longer than it usually does. OK, it has run now. If you look here, you now get monthly data. If you're getting NaN values, that means that for that particular day Yahoo Finance had no value to provide to the user. 
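The single-ticker request walked through above can be sketched as follows. Treat it as a hedged illustration: fetch_history and the INTERVALS table are hypothetical names, the interval codes shown are a subset of what pandas-datareader's Yahoo reader accepts, and because Yahoo has changed its endpoints over time the network call may fail with current library versions.

```python
import datetime as dt

# Interval codes passed through to the Yahoo reader (assumed subset).
INTERVALS = {"d": "daily", "w": "weekly", "m": "monthly"}


def fetch_history(ticker, start, end, interval="d"):
    """Download OHLCV rows for one ticker from Yahoo via pandas-datareader.

    `ticker` must match the symbol used on Yahoo Finance; `start` and `end`
    are datetime.date objects.
    """
    if interval not in INTERVALS:
        raise ValueError(f"interval must be one of {sorted(INTERVALS)}")
    if start > end:
        raise ValueError("start date must not be after end date")
    # Imported here so this module still loads when the library is missing.
    import pandas_datareader.data as pdr
    return pdr.get_data_yahoo(ticker, start, end, interval=interval)


# Usage (needs network access and pandas-datareader installed):
#   end_date = dt.date.today()
#   start_date = end_date - dt.timedelta(days=365)
#   monthly = fetch_history("AMZN", start_date, end_date, interval="m")
#   print(monthly.tail())
```

Validating the arguments before the import keeps the cheap checks separate from the slow, failure-prone network request.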
So you may see some NaN values, and I'll explain how to deal with NaN values later in the course. But yes, this is what I wanted to show you. The lowest granularity for extracting data is daily data. You cannot get intraday data using pandas-datareader, because most of the sources it gets data from, for example Yahoo, provide only daily data, so that's the lowest granularity you can go to. There are also ways of getting intraday data, for example 5-minute, 15-minute, or 30-minute candles, for free, and I'll explain those methods later in the course. For now, I just wanted to give you a hang of how to use pandas-datareader. In the next video we'll look at a slightly more complicated problem, wherein we extract the close data for a number of tickers. I'll meet you in the next video. Thanks. 7. Pandas Datareader - Deep Dive: Hello, friends. Unlike the last video, where we extracted data for just one stock, in this video we'll see how to extract data for a number of stocks programmatically. This is a problem you will need to solve when working on pretty much any algorithm, because you need to analyze data from a number of stocks before making a trading decision. That's typical for a trading algorithm. So it's very important that you get a fairly good understanding of how to extract data for a number of stocks in one go. You will see that this code is a bit busy compared to the previous video, where we had just one line of code, which was this line, and extracted the data for that particular stock. This script is a bit busy because we have to get around the possibility of the API connection failing for one of the tickers. 
For example, when your script makes an API call for the first ticker in the list, it may be a successful connection and you may get the data. The connection may again be successful for the second stock, the third stock, and so on. But when you come to, say, the fifth or sixth stock, the API connection may fail. If that happens, the connection shuts down and your entire algorithm breaks down, and you end up with data for only five or six stocks, which is not an ideal outcome. To get around this, we use exception handling. I use try/except. You can use other exception handling methods in Python if you like, but I am more comfortable with try/except, so I use that. With exception handling, if there is a connection problem or any issue with one particular stock's API call, your algorithm ignores that stock and keeps moving forward. That gets around the API connection failures you may encounter for some of the stocks. But it does not solve the entire problem. Say you get the connection problem for eight or nine stocks out of the 50 in your list. That means you still don't have data for those stocks after the first pass-through, and that's not ideal because you want data for all the stocks. So what do we do? We make a second pass-through and try to get information for the stocks we could not get data for in the first pass-through. That, in a nutshell, is what I'm trying to do here. It's a long code to get around a small technical problem, but it's a problem you face quite often with pandas-datareader. 
That's why I wanted to take you through how I get around it. If you look here, I have stored the stocks I want to extract data for in this list called tickers. Then I declare an empty DataFrame where I will store the information for all these stocks. If you're not familiar with pandas DataFrames, I would strongly urge you to get a fairly thorough understanding of them, because we'll be using them extensively to store information. A DataFrame is very flexible, and it provides a lot of functions that you can apply to any data stored in it, which makes your life really easy. So I would strongly urge you to get a very good understanding of the pandas DataFrame. Next, attempt: I'm initializing a variable, and attempt is the number of pass-throughs I will make to ensure that I get information for all the stocks I intend to. And drop is an empty list that I have created to store any stock that connected successfully and for which we got the data: I take that ticker out of the initial list and put it in this drop list. I do this because in subsequent pass-throughs we want to extract data only for the stocks that failed in the previous pass-throughs. That is what I'm doing here, in this line of code within the while loop. This is why I like Python so much: you can do fairly complicated tasks in just one or two lines of code. What I'm saying here is: only if the ticker is not in drop, put it in tickers. 
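The one-line filter described here is a list comprehension. A minimal sketch with placeholder symbols (the ticker names are made up for illustration, not taken from the course list):

```python
tickers = ["AAA", "BBB", "CCC", "DDD"]  # placeholder symbols still to fetch
drop = ["AAA", "CCC"]                   # symbols already fetched successfully

# Keep only the tickers that have not been fetched yet.
tickers = [t for t in tickers if t not in drop]
print(tickers)  # ['BBB', 'DDD']
```

For a few dozen tickers a list is fine for drop; with thousands of symbols a set would make the membership test faster.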
Now, if you go through this code, you will see that in case of a successful API call, I get the OHLCV data, then take out just the close price from that OHLCV data and add it to the empty stock_cp DataFrame that we initialized before the loop. The last line within the try block appends the ticker that had a successful connection, and for which we were able to get the data, to the drop list. So the drop list gets populated as the loop proceeds through the initial list, and by the end it holds all the successful calls. In the next pass-through, when execution reaches this line of code, the tickers list will contain only the tickers that are not in the drop list we just populated. For example, if there were 40 successful calls out of the 50 tickers we initially intended to get data for, in the second pass-through the tickers list will have only the 10 stocks for which the API call failed. I would urge you to go through this code again, and if you have any questions, post them in the Q&A section; I'll be more than happy to answer them. In a nutshell, what we're doing is this: we take all the equities we want data for in a list. Then we run a loop in which we try to extract data for every single stock, using exception handling to get around the API connection failing for some of the stocks in the initial list. We keep going until we reach the end of the loop, and then go back and rerun the loop for all the tickers that had an unsuccessful run. 
This continues until we get information for all the stocks or we exceed the number of attempts that we initially declared. That's all very well, but let me run it; I think then you'll be able to appreciate what I'm trying to do here. I'll run it and take you through the log of what's happening, and that's when I'll be able to give a better explanation. I have put a lot of messages in the code, so when the code runs and there is a problem, we'll see that there is a failure, which will make things easier to follow. I have run the code. It will take some time to run, so I'll fast-forward this video. OK, friends, the script has run, and it took around 23 minutes. Let me take you through the log. In the first pass-through, all these stocks did not run: 1, 2, 3, 4, 5, 6, 7, 8, 9. So nine stocks failed in the first pass-through. In the second pass-through, the loop was run for only these nine stocks, and again there was a problem with IOC, one of the nine. So in the second pass-through we got information for the remaining eight stocks out of the nine, and in the third pass-through we got information for IOC as well. If you look at the stock_cp DataFrame, you can see that we have the close price for all 50 tickers that we intended to get. This takes some time; in my case each ticker took around 20 seconds to extract. So when you run the script, you can go for a small walk and come back, and it should have retrieved all the data. That's it for this video. 
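The multi-pass retry logic from this lesson can be sketched independently of any data source. This is an illustration under stated assumptions, not the instructor's script: collect_close_prices and its fetch_close parameter are hypothetical names, a plain dict stands in for the stock_cp DataFrame so the sketch runs without pandas, and the filter is applied at the end of each pass rather than at the top of the while loop, which is equivalent.

```python
def collect_close_prices(tickers, fetch_close, max_attempts=5):
    """Fetch close prices for every ticker, retrying failures in later passes.

    fetch_close(ticker) is any callable that returns the close-price data
    for one ticker and raises an exception when the request fails.
    Returns (close_prices, remaining): a dict keyed by ticker, and the list
    of tickers that still had no data after max_attempts passes.
    """
    close_prices = {}           # stands in for the stock_cp DataFrame
    drop = []                   # tickers fetched successfully so far
    remaining = list(tickers)
    attempt = 0
    while remaining and attempt < max_attempts:
        for ticker in remaining:
            try:
                close_prices[ticker] = fetch_close(ticker)
                drop.append(ticker)
            except Exception as exc:  # broad on purpose: skip and retry later
                print(f"pass {attempt + 1}: {ticker} failed ({exc}), retrying later")
        # Next pass only visits tickers that have not succeeded yet.
        remaining = [t for t in remaining if t not in drop]
        attempt += 1
    return close_prices, remaining
```

In the lesson's setting, fetch_close would wrap the pandas-datareader request for one ticker and return its close column. Capping the number of passes keeps the loop from running forever when a ticker can never be fetched, for example because the symbol is wrong.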
So I'll see you in the next video.

8. Yahoofinancials Python Module - Intro: Hello, friends. So in this video, we will look at another very versatile Python library to extract data, which is called yahoofinancials. From the name it is obvious that it extracts data only from Yahoo Finance, but I'm still calling it fairly versatile, and that's because the way it extracts data is through web scraping and not over an API. For those of you who don't know the difference between web scraping and an API, wait for a lecture later in this course where I'll take you through the web scraping process and how to go about it. As of now, just know that it's a fairly versatile way of extracting data from web pages.

So first, let me go to the GitHub page of this library to get more information. You can just Google this library name and the GitHub page should come up fairly... okay, no. So yes, put "GitHub" in the search as well. I want you to go to the GitHub page because the description is very good. The author has done a fairly good job of providing information about this library, so I strongly recommend you go through this document. Just like other Python libraries, the installation is fairly straightforward: pip install yahoofinancials. Although it says that's for Linux and Mac, I have been able to install it using the same command on my Windows machine, so I don't see why it would be a problem. In case you're getting some problem with the pip install command, you can use this other command. You know what, let me just try to reinstall it and see what happens. So it is installing. It may take some time, so I am forwarding the video. Okay, so it ran. It says "requirement already satisfied", so it should run. Don't worry about all these long messages.
This is because I need to get a more recent version of pip, which I'll do later. I'm being a bit lazy. So you can use the same command to install it on your Windows machine.

Let's go back to the page and look at what other information there is. These are the featured methods which you can use. You can get financial statement information. Yahoo Finance publishes fairly detailed financial statements, income statement, cash flow statement, so a lot of the fundamental analysis can also be done using data from this library, because Yahoo publishes it and this library lets you extract that data as well. However, what we want for the purposes of this lecture is historical stock price data, and this is the method that will help us get that. There is a slew of other methods as well, so I strongly recommend you test them if you can and see if they are going to be helpful for your strategies. In a later section we will go through a strategy that uses data from balance sheets and income statements, fundamental data analysis, to devise the strategy. We will cover that later in this course anyway.

Okay, so this is the sample code. Let us run it. Actually, we don't want all of it. We only want historical stock prices, so let me just copy the whole thing and delete the parts that I don't need. Here you go. I don't need the balance sheet, I don't need the income statement, I don't need anything else. Okay, fine. From the previous video you must have a fairly good idea about how to use these libraries to get data. All of the libraries are fairly similar in their command structure. But here there is a slight difference.
And that's what's happening here. Those of you who are familiar with object oriented programming concepts will know that Python is based on OOP. Python is an object oriented programming language, and everything is an object. What's happening here is that we are initiating a new object called yahoo_financials, which will have all the attributes that are part of the class called YahooFinancials, with a capital Y and a capital F. It's a bit of an advanced concept, so if you don't know about it, don't worry. Just know that by using this command we are only initiating an object which has the attributes that are associated with this particular class of objects.

So we are initiating an object here, and here is where we are extracting the data. We are extracting the data from 15 September 2008 to 15 September 2018, and we're getting weekly data. Actually, I don't need ten years of data; that would take a lot of time. So let me just do one year of data and make it daily. Now let's run this and see what happens. Okay, so this ran. Let's run this one. Interesting. It says that the YahooFinancials object has no attribute get_historical_price_data. So let me see what the relevant method is. We will do get_historical_stock_data. It looks like the document needs to be updated, and I'll probably write some feedback on this. So this should work, and it is running. Looks like this is the correct method. So again, a lot of times these documents are not updated as the functions change.
And as the underlying methods get updated, we have to do some digging, and it's a good thing to write back to the author of the documents so that they get a chance to update those references and keep the document updated.

Okay, so now we ran this command. Let's see what historical_stock_prices looks like. Oops, this is not good, right? You will see that we're not getting the data in the form we are used to from the previous videos. That is because this library extracts data in JSON format. Those of you who are familiar with the JSON format know the versatility of this particular data object. I'll just show you a sample JSON file. Think of it as nested dictionaries. In Python, you know that a dictionary is where you have key-value pairs put within curly brackets. If you are not familiar with the dictionary data structure, I strongly urge you to go through it and get a fairly good grasp of it, because dictionaries form the basis of a lot of more advanced data objects.

So in this case, the data gets extracted in JSON format, and it's actually pretty interesting. This historical_stock_prices is the root. The key is Apple's ticker, and the value is again a dictionary. So let us click on the value. The value is again a dictionary, where the first key-value pair is eventsData with its corresponding values, then firstTradeDate, id, isPending. We are interested in prices, so we click on prices. Prices is a list of dictionaries. So you may be thinking: why do I need to use it?
It's fairly complicated, but in the next video I'll take you through how we can very easily extract data from JSON. I think it's good, because this will give you a hang of JSON format files, which will be highly beneficial for you if you decide to work in the industry. We keep using JSON file formats no matter which industry you work in, be it banking, biotech, whatever. It's a favored file format for storing serialized data in various industries, so you should get a hang of JSON as well. In this one, if you click on this dictionary, you now see some familiar things: adjclose, close, high, low, open values. So this is a nested dictionary data object, and the data that we require is on the fifth or sixth level of this object. In the next video, I will take you through how to use this data to get it into a more readable format, the OHLCV format that we have seen in the previous videos, and how we can expand this command to get data for a list of tickers from Yahoo. So over to the next video. Thanks a lot.

9. Yahoofinancials Python Module - Deep Dive: Hello, friends. So in this video, we will continue with the yahoofinancials library and its application for extracting data in a more complicated case. But before that, I just wanted to give you more understanding of how to access JSON data or, for that matter, a nested dictionary. So let us just make a dictionary called country_data. In this dictionary we will store some information about various countries. Let's say the first is the USA. Within USA there is a lot of information, for example its capital city, who the president is, what the currency is, etcetera. So I'll make another dictionary.
So within this dictionary, I'll just do capital city. The capital would be Washington, D.C., so I'll just write WDC. Then what else? Currency. Currency would be USD. Now I want a third piece of information which has more than one value. What can it be? Major cities. So let's have major cities, and this will have a list: NY, LA, SF, etcetera. So this is the data only for the USA. Right now we will have just two key-value pairs in this dictionary, so let's add France. For France, I don't know a lot about France, sorry. Let me just copy this. So the capital is Paris. The currency used to be the franc; now it's the euro. Major cities: I know there is Paris. What else? My general knowledge is really bad, but wasn't Dunkirk in France? So let's have Dunkirk. Sorry to any French people there, I need to work on my general knowledge for sure.

Okay, so this is the dictionary that we created. Let me just run it. So you have country_data here. Say you want to access the currency for France. How will you do it? It's very simple for people who know Python dictionaries. All you need to do is mention the dictionary name, and then within the bracket you mention the key that you are interested in. In this case it's France. I'll just show you what happens if you run this command. If you run this command, you get a dictionary that has all the information about France. Say you are interested in the currency.
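The dictionary built above, and the bracket lookups used to read it, can be sketched as follows. The key names (`capital`, `currency`, `major_cities`) are assumptions, since the exact spelling typed in the video is not recoverable from the transcript.

```python
# Nested dictionary: country name -> dictionary of facts about that country
country_data = {
    "USA": {
        "capital": "WDC",
        "currency": "USD",
        "major_cities": ["NY", "LA", "SF"],
    },
    "France": {
        "capital": "Paris",
        "currency": "Euro",
        "major_cities": ["Paris", "Dunkirk"],
    },
}

# One bracket returns the whole inner dictionary for France
france = country_data["France"]
print(france)

# A second bracket goes one level deeper, into the inner dictionary
currency = country_data["France"]["currency"]
print(currency)

# A list stored as a value is indexed by position
first_us_city = country_data["USA"]["major_cities"][0]
print(first_us_city)
```

Each extra pair of brackets descends one level, which is the same pattern needed to reach the prices list inside the JSON returned by yahoofinancials.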
So you just put another bracket, and then you put whichever key you are interested in. If we run this, we will get Euro. Fine. So this is just a sort of refresher of how we access Python dictionaries. I know for a lot of you it will be redundant information, but in case someone is not familiar with it, I just wanted to give it a shot because we will use this in more advanced applications of yahoofinancials.

Now look at this example of using the yahoofinancials library for accessing data for multiple tickers. Just like in our previous videos where we used pandas datareader, we have a list where we have stored all the tickers that we want information for. So you just use the Yahoo tickers for all the companies that you're interested in. All of this is very similar to our pandas datareader video. We have the end date and the beginning date in datetime format, because I find that more convenient than hard coding the dates. We are defining a variable, cp_tickers, that will have all the tickers' information. That is because we want to loop through our tickers and, if you recall from the pandas datareader lecture, we will be dropping the tickers for which the run is successful. That's why we have another variable that copies all the tickers, and then we run the loop. This is all very similar to the pandas datareader extraction video. Where things change is here. Once the loop starts, so for i in range(len(cp_tickers)), cp_tickers[i] will be the ticker that the loop is presently on. For example, in the third pass through this loop, cp_tickers[i] would be cp_tickers[2].
That would mean Cisco, right? So for every single run of the loop, we will initiate an object of class YahooFinancials and then work with it. Then I am creating another variable called json_obj, and this json_obj will have all the information that comes with the yahoofinancials get_historical_stock_data call. So pretty much this whole thing that we saw in the previous video. Obviously we don't want all of this information. So what do we want? If you look into this, we want the prices information. All this other information is not really required for me right now. From this JSON format, or from this nested dictionary format, I am only interested in prices. So I need another variable, called ohlv, where I extract only the prices information.

But even if you do that, say you get this information, now you have a list, and each element of the list is a dictionary. These are the key-value pairs of each of the dictionaries. So now we have to work with that. How do we go about doing that? Remember, in the pandas datareader video I said that I use a data frame to store data because it has a lot of really amazing properties. One of the properties is that if you get serialized data in a format such that you have a list, and each element of the list is a dictionary, you can directly read it into a data frame. When you import such a list into a data frame, the keys of the dictionaries will be the columns of the data frame, and the position of each element in the list will be the index of that data frame.
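A minimal sketch of that property, using a hand-written stand-in for the JSON of one ticker. The key spellings (`prices`, `formatted_date`, `adjclose` and so on) are assumed from the names mentioned in the lecture, and every number is made up for illustration.

```python
import pandas as pd

# Hand-written stand-in for the nested JSON returned for one ticker
json_obj = {
    "AAPL": {
        "eventsData": {},
        "prices": [
            {"formatted_date": "2018-08-08", "open": 100.0, "close": 101.0,
             "adjclose": 100.5, "volume": 1000},
            {"formatted_date": "2018-08-09", "open": 101.0, "close": 102.0,
             "adjclose": 101.5, "volume": 1200},
            {"formatted_date": "2018-08-10", "open": 102.0, "close": 103.0,
             "adjclose": 102.5, "volume": 900},
        ],
    }
}

# Two bracket lookups reach the list of dictionaries
ohlv = json_obj["AAPL"]["prices"]

# A list of dictionaries converts directly: keys become columns,
# and the position of each element in the list becomes the index
df = pd.DataFrame(ohlv)
print(df.columns.tolist())
print(df.index.tolist())
print(df)
```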
I'll show you what kind of data frame object gets created when we import the ohlv variable into a data frame. But as of now, just take it from me that it will create a data frame. So this command here will create a data frame, which will have the relevant columns that we require from the prices dictionary. However, we don't want all the information from that prices dictionary. As of now, we only want the date and the close price. For this exercise we are only looking for the adjusted close. If you want to extract more data, feel free to have more columns here. So with this command we create a variable called temp, which will be a data frame having only formatted_date and adjclose as its columns. Then this command will set the index of that data frame to formatted_date.

Actually, let me show you these values. I go to prices. You see, formatted_date is the date that we want. There is also a date in a different format, the Unix format, which is not much use for our purposes. So we need formatted_date, this one, and we need the adjusted close price, which is here. These are the two values that we are extracting, and once we extract them, we set the index to formatted_date, because that's consistent with how we have been extracting data in our previous videos as well.

Now, this is where things start to get a bit hairy. I'll tell you what I'm doing here. This is something that I found once I started working with Yahoo Finance data: once you extract the prices data, there are some elements of this list which are duplicated.
You see, the size of each dictionary is eight. Keep your eye open for one that is less than eight. Yes, here it's five. So what is this? This is the dividend information. For some reason, and I don't think this is very useful when you're getting stock price information, we also get the dividend payout information on the day that the dividend gets paid. So on 10 August 2018, a 73 cent dividend was paid by Apple, and the corresponding OHLCV data for 10 August 2018 is this. So this is a problem. We don't want to get this information.

That's why I have created one more command. I'll run the command and show you how this thing works, but what I'm doing here is using one of the functions of data frames, which again are extremely useful, called duplicated. Once you run this command, it gives you a boolean output that tells you whether each row is duplicated or not. So I'm running this on temp, the data frame that we created in the previous step, and I am only looking at the index of temp, which, if you recall, is the formatted date. Then I am running this duplicated function on it. This will give me a list which will have True, False, True, False kind of information. Most of the rows in the output will be False, because most of them are not duplicated. Only where there is a dividend or some other extra information will you have True for that particular date. One of the arguments of the duplicated function call is keep. If we give keep as first, it considers the first instance of the duplicated values as the unique value.
Right? Don't worry about the squiggly sign here; I'll tell you what I'm doing with that. If I run this command, it will return a boolean list of True and False, and the first instance of any duplicated value will be False as well, because that is considered unique. Once we get that list, we use this sign here, the squiggle, what you call the tilde operator. This simply takes the inverse of whatever you give after this operator. So what we're saying is: take the inverse of that, so all the False values become True and all the True values become False. If you are aware of how pandas data frames work, you will understand what I'm doing here. I am simply keeping all the rows that are not duplicated, and I'll show you step by step how it works.

Once we get the information, all the adjusted close prices along with the dates, all the unique values, we just append it to the close_prices data frame like we did in the previous video, and we drop that ticker from the list that we had initially created, so that in the second pass through we don't repeat the extraction for that particular ticker.

So let us run this. Before running the entire code, let me just hard code this and show you. Let us make this Apple, and now let's run each one of the steps and see what's going on. So this initiates the object. Now, if we run this, it might take some time because it's almost five years of data. Yes, so it ran.
You know how this will look, so I'm not opening it. It will be similar to historical_stock_prices. Now the fun things start. Before I run this, I'll have to change this as well. Okay, so let me run this now. So ohlv, like we expected, is the list of dictionaries that has the prices information. Now is the fun part. Rather than running this entire command, I'll just show you the beauty of the data frame. When I run this pd.DataFrame call, you can see that a data frame has been created here, a pandas data frame, and it is able to pick up the relevant values for all the columns. So again, if you have a list, and each value of the list is a dictionary, and the keys of those dictionaries are the same for all elements of the list, you can very easily convert it into a data frame.

But we want only formatted_date and adjclose. So let me run this, and you can see a very nice looking data frame that we have created. Now let me set the index. The index has been set. It's in place, so it changes within the data frame itself. So we have this now. You see, for the dates where we had a dividend, we have got NaN values, and you will see that the date is duplicated. Now if I run this command... I am creating a different variable, temp2, to make it more explicit, but you can run this command on the same temp variable; there is no need to waste memory space. If I open temp2, you see that the number of rows has decreased. So all the duplicate values have now been taken care of. Once that's taken care of, everything else is as per our previous video.
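The steps just walked through (pick two columns, set the index, drop the duplicated dates) can be sketched on a small made-up list. The dividend entry's keys (`amount`, `type`) and all values are assumptions for illustration. The sketch also assumes the price row for a date comes before the dividend row for that date; if the order were reversed, `keep="first"` would retain the NaN row instead.

```python
import pandas as pd

# Made-up prices list: the second entry for 2018-08-10 mimics a dividend record
ohlv = [
    {"formatted_date": "2018-08-09", "adjclose": 100.0, "close": 101.0},
    {"formatted_date": "2018-08-10", "adjclose": 102.0, "close": 103.0},
    {"formatted_date": "2018-08-10", "amount": 0.73, "type": "DIVIDEND"},
    {"formatted_date": "2018-08-13", "adjclose": 104.0, "close": 105.0},
]

# Keep only the date and the adjusted close, then index by date
temp = pd.DataFrame(ohlv)[["formatted_date", "adjclose"]].copy()
temp = temp.set_index("formatted_date")

# True marks a date that has already appeared earlier in the index
is_dup = temp.index.duplicated(keep="first")
print(is_dup)

# The tilde inverts the boolean mask, so only first occurrences are kept
temp2 = temp[~is_dup]
print(temp2)
```

A separate `temp2` is used here only to make the before and after visible; assigning the result back to `temp` works equally well.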
So let me change everything back and run it for all the tickers. Actually, it will take some time, so I am limiting it to 365, only one year of data. Let us run it. It should not take more than a couple of minutes, but I am forwarding the video. All right, so the script has run perfectly. Let us open close_prices and see how it looks. Perfect. We've got the close prices for the tickers that we wanted. Again, I would urge you to go through the script. I'll be saving this script as well, like all the scripts that I discuss, so just go through it. If you have any problem or any questions, or if you have any suggestions to improve it, feel free to post them in the Q&A section. I'll be more than happy to take that discussion forward. So thanks a lot. In the next video, we will look into how to extract intraday data.

10. Intraday Data - Alphavantage Python Wrapper: Hello, friends. So far, we have been dealing with libraries where we can get daily data. But anyone who is interested in intraday trading will need to get intraday data. Unfortunately, it has not been very easy to get free intraday data from various sources. Most of the services are paid. However, we will discuss a free source from which you can get intraday data, and that's Alpha Vantage. So let's go to the website. Like it says, it provides free APIs in JSON and CSV formats, and it also has FX and cryptocurrency data as well, so if you're interested, you can try it. But mostly we'll use it for intraday equity data. The first thing you need to do is get an API key. It's pretty simple. You just go to "Get your free API key" and provide this information.
Put yourself as a student, put any organization and then your email, and you get your free API key. An API key is nothing but an alphanumeric string. Once you get it, I would urge you to store it anywhere you want. In my case, I have stored it in a folder called Alpha Vantage. Once your key is stored, you have a way to connect and get the data.

So let us look at the API documentation. The API documentation page is fairly detailed, and you can see what data you can get from Alpha Vantage. You can also get daily data, but I use it mostly for intraday data. There is also FX and cryptocurrencies, and you can get some technical indicators too. We'll discuss technical indicators later in this course. It's a very, very good source. It's a godsend for people who are into intraday trading and who want to backtest their strategies on free data. Just one thing here: all the commands that you see here are not in Python. The API provided by Alpha Vantage is not in Python. But don't worry about it. Thankfully, someone has made a Python wrapper for Alpha Vantage. So let us look for the GitHub page: alpha_vantage Python. This person has made a very good Python wrapper for Alpha Vantage. Just go to the documentation on how to install it. It's fairly straightforward. Once you install it, which should not be a problem at all, let's see how to go about it. So, from alpha_vantage.timeseries import TimeSeries. TimeSeries is the module that you're interested in, the one that gets the data. So let us take it for a spin. I already have this here, so I have imported TimeSeries. Now let's see how to extract intraday data, and what the documentation says.
By now I think you must be familiar with what's happening here. We are initiating a TimeSeries object, an object which will have the same properties as the class TimeSeries. To get the JSON object with the intraday data, we run this command. You see what's happening here? There are two variables, data and meta_data, and then you are running this command. It means that once you run this command, get_intraday with the ticker, you will get a list or a tuple which will have two elements. The first will be data, which is what we are interested in, and there's also something called meta data. We'll see how to work with this. But in this command we don't see the time argument, whether we want one minute data or five minute data. So let us look for the command that has all that. Okay, this looks like something that we may be interested in, so let me copy this and see what happens here.

So ts equals TimeSeries with key, your API key. You can just paste your key string here and this will work. However, I am not a huge fan of pasting keys here, so let's see how to go about this. The first thing you need to do is give the file location of your key. We need to give the entire path, and one more thing: you also need to give the name of the file. I'll call this variable key_path. I think the file is key.txt. There's one slight irritant here in Python: every time you give a path and you're using a Windows machine, you need to give two backslashes. It's a bit of an annoyance, but if we're working with a Windows machine, we need to do this. Once we have this, all we need to do in the key argument is open key_path in read mode and read it.
Folks who are familiar with how to open files in Python would know this, but if not, it's fairly simple. You can just go and look up the open command. What the open command does with the 'r' argument is open the file just to read. Had it been 'w' here, it would mean we open the file to write something. Anyway, I'm digressing. What we're doing here is simply reading the key into this argument. The output format that we want is pandas. Once we run this command, it will initiate the Alpha Vantage TimeSeries object, which will have all the properties of the given class.

You know what, let me just run this. Let's not worry about data and meta_data right now. Let me run these three. It ran fine. So this is for Microsoft, and it's one minute data. We have got the data here: open, high, low, close, volume. But we also have the meta data information. Meta data is more information about the data, what the interval is, etcetera. We don't really need that. So what I'll do in this case is take just the data. Since this is a list and we are only interested in the first element of the list, I'll just go with the first element and then run it. If we run it and consider just the first element, then we get what we're looking for: open, high, low, close, volume. You know what, I don't like this "1.", "2." and so on in the column headings. So I'll do one more thing: data.columns, and I'll change the names. This is a pandas data frame property; you can change the names of the columns. So I just write open, high, low, close, volume. This will bring the data to the format that we are familiar with. Okay.
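A rough sketch of the two local steps above: reading the key from a file instead of pasting it, and renaming the columns. The throwaway key file, the fake key string and the sample numbers are all assumptions so that the example runs without a real key or network access. The wrapper call itself is left as a comment and should be checked against the alpha_vantage documentation.

```python
import os
import tempfile

import pandas as pd

# Throwaway key file so the sketch runs anywhere; in practice this file already
# exists, and on Windows a literal path needs doubled backslashes, e.g. "C:\\folder\\key.txt"
key_path = os.path.join(tempfile.mkdtemp(), "key.txt")
with open(key_path, "w") as f:      # "w" opens the file for writing
    f.write("DEMO_KEY_123\n")

with open(key_path, "r") as f:      # "r" opens the file for reading only
    api_key = f.read().strip()

# With the alpha_vantage package installed, the call would look roughly like:
#   from alpha_vantage.timeseries import TimeSeries
#   ts = TimeSeries(key=api_key, output_format="pandas")
#   data, meta_data = ts.get_intraday(symbol="MSFT", interval="1min", outputsize="full")

# Made-up stand-in for the returned data frame, with numbered column headings
data = pd.DataFrame(
    {
        "1. open": [100.0, 100.5],
        "2. high": [100.8, 101.0],
        "3. low": [99.9, 100.2],
        "4. close": [100.5, 100.9],
        "5. volume": [1500, 1800],
    },
    index=pd.to_datetime(["2018-09-14 15:59:00", "2018-09-14 16:00:00"]),
)

# Assigning a new list to .columns renames every column in order
data.columns = ["open", "high", "low", "close", "volume"]
print(data)
```

Stripping the trailing newline from the key matters, because a key read straight from a text file often carries one.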
So again, I would strongly recommend you go to the Alpha Vantage documentation, generate the key, and run it not just for one ticker but for a number of tickers, using the concepts that we have picked up from our previous videos. Let me know if you come across any problem so that we can discuss it. In the next lecture, we'll discuss web scraping. Thanks a lot.

11. Web Scraping Intro: Hello, friends. So in this video we'll talk about web scraping. At the outset, let me tell you that web scraping is a very broad subject, and there are companies founded solely on web scraping. I do not pretend that I know everything about it, but I know enough to extract data using this technique for my trading strategies. The intention of this video is to give you a good enough understanding of this methodology and of the relevant Python library that I use. I'm pretty sure that after this video you will be able to extract financial data using this technique, and even non-financial data from simple pages.

So first of all, what is web scraping? In very general terms... let's go to a website that I like, imdb.com. Say you want to extract any information that is available on this web page programmatically. That means you don't want to type the URL address of this website, click on things, see this information and then put it somewhere by hand. Web scraping will allow you to programmatically get any data that is available on the web page into your database. So, in a nutshell, in layman's terms, this is how I define web scraping, and that's what we usually do.
I would also strongly encourage you to go through this very nice article written on Medium by Justin Yek. It will give you a basic understanding of the HTML code that is embedded in the webpage. What you see on the page is very nicely formatted content, but in the background there is HTML code, and this article does a pretty good job of explaining what the format is, what the tags are, etc. I'm also giving this link in the course. Do give it a read. It's a pretty quick read, but it will give you a fairly good head start on the topic of web scraping. Just one thing: while we are discussing the background of the page, why don't I show you how it looks? I'm pretty sure most of you already know about it. So this is a very nice looking webpage; everything is in paragraphs, etc. If I click on Inspect, a very scary looking window opens on the right, and this is the HTML code that this webpage is based on. Chrome has a pretty nice feature of highlighting the relevant part of the webpage if you hover over the corresponding HTML code. Again, don't get overwhelmed. You will not be required to go through everything in the script. I'll tell you exactly what you need, but I'll tell you a couple of things that will help you. For example, if you look into the whole script, first of all you see a lot of tags here. A tag in HTML is a way of saying that it's a new, let's call it, object. So a new section or a new object starts after a tag; it's like a headline. Again, think of it in very broad terms. Anything that is under the body tag is what is visible to you on the webpage. You'll also see a lot of divs; div is nothing but division.
It's like a new section in that HTML page. You'll also see p here; p means paragraph. So this p is a paragraph, and if you click on it, you'll see the exact same text coded inside this particular HTML block. Once you learn about HTML, you'll see it's pretty manual, the way the pages are coded. But it's easier to parse because everything is so methodically coded. So it's good if you are scraping, but it's horrible if you are actually writing it. Anyway, I'm digressing. Again, p is paragraph. These are some of the things that the article will help you understand, so that you start understanding what this HTML script is about. If a webpage only has paragraphs, that is, only letters, etc., then it's fairly easy to parse. I don't really see much challenge in parsing a webpage that only has some paragraphs of English language. The problem comes when you have something like this. This is Microsoft's Yahoo Finance page. I go to Summary, and there is a very nice table here. You see, this is not a paragraph. Usually you will require web scraping to get financial information. Until now, we have looked at getting the stock OHLCV data. In this case we will be looking at how to get financial data, for example the balance sheet, income statement, etc. Let's go with the balance sheet. You see, if you click on Balance Sheet, the URL here changes. Income statement is a different URL, and balance sheet is a different URL. To you it might look like a very subtle change.
But it's an altogether different URL. So be very cognizant of these things when you are doing web scraping. So it's balance sheet, and then it's annual. That's fine. And this is a table, so let us look at how this is coded. Again I do a right click and go to Inspect, and the scary looking window opens. You see, we don't see a p here; we see something called table. So this is a table class. If you hover over this table class, you see the entire table is highlighted. Awesome, so our job is much easier. Once we start to parse the data, we will be solely focused on this part of the script. I don't even need to look into the other parts of this scary looking HTML script. All I need is to parse the table with this class, whatever the name is, and this will pretty much cover everything that I want from this page. At this point, I'll stop this video. Take some time to let all this information sink in, and read the link that I have provided and talked about. Then open a couple of websites that you frequently browse, inspect the HTML code of those websites, and see if you can understand what's going on there. I think that will be a fairly good basis for our second video, in which I'll talk about how we write Python scripts to actually get this data locally into our databases. So I'll stop this video here. Thanks a lot for paying attention. Please go through the video again, and I recommend that you go through the other materials that I have talked about so that you are in much better shape when we start the next video, which will again get very technical. Thanks a lot.
12. Using Web Scraping to Extract Fundamental Data - I: Hello, friends. In this video we'll see how to use Python code to perform web scraping.
The two libraries that I'll use for this task are requests and Beautiful Soup. requests is a fairly popular Python library which helps you establish a connection with a web server. Think of it simply: when you type a URL or web address in your browser and hit Enter, a connection gets established. requests does that programmatically, and we'll see how. Beautiful Soup is again a very popular, highly used web scraping library. Its utility lies in the ease with which it lets you parse HTML data, and we'll see an example in this video. Both of these are very popular libraries, and there's a lot of documentation available on both of them. You don't need to get into a lot of depth, but if you want to, go ahead and do it. Installing both of these is pretty straightforward, so I'm not going into that. Let's start writing the code. First of all, choose whichever website you want to get the data from. In this case, let's go with the same Microsoft balance sheet data. So this is the website from which we want the data; this is the URL. Second, let's use a variable called page: requests, and I think it's dot get. Yeah, get. Let's run it up to this point and see what we get. Okay, everything ran fine. Let's see what this variable page is. It says Response 200. Don't worry too much if you don't understand this; it is just the server's response. Response 200 is like saying that the connection with the web server has been established, and now you can begin to either upload or download data on the server as you see fit. So now the connection is established. Let's get all the content from the page.
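A minimal sketch of the request step is shown below. The URL is only an assumption about what a Yahoo Finance balance sheet address looks like, and fetch_page is an illustrative helper name; a site may also refuse requests that do not look like they come from a browser, which this sketch does not handle.

```python
import requests

# Assumed address for illustration; use whichever page you want to scrape.
URL = "https://finance.yahoo.com/quote/MSFT/balance-sheet?p=MSFT"


def fetch_page(url, timeout=10):
    """Request a page and return the Response object."""
    page = requests.get(url, timeout=timeout)
    # A status code of 200 means the server answered the request normally;
    # raise_for_status() raises an exception for 4xx/5xx answers.
    page.raise_for_status()
    return page

# Usage (needs a network connection):
# page = fetch_page(URL)
# print(page.status_code)   # the number shown in "<Response [200]>"
# html = page.content       # raw HTML of the page, as bytes
```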
I want everything from that page to come to me, and all I need to do is use the attribute called content. If I run this and we see what page content contains, you see the entire HTML of that webpage is here. Again, it's pretty overwhelming, and it is not of much use to us as it is, because we are only interested in the table. If you're really good at parsing, you can use this information itself to get everything that you need from that webpage, including the table information. But I don't have a really expert-level understanding of regular expressions or other methods of parsing this data, and I would not recommend going for that, because we have an easier way of doing it. Now that we have the content with us, we will use, let's say, soup. So we will use Beautiful Soup. Right now we need to initiate an object of the BeautifulSoup class. Obviously what we want is the content, and we are looking for an HTML parser. This argument is required because we are parsing HTML. I think it's html.parser. Yeah. So now we have initiated an object that belongs to the BeautifulSoup class and which is an HTML parser. Now that this is done, let's create a variable called table and use soup.find_all. This is the function that is a godsend for HTML parsing, and you'll see why. What it does is select whichever table or paragraph you want from the entire page. In this case we need the name of the class. How do I copy this? Let me see. Yeah, awesome: if you do Edit attribute, you can actually copy this thing here. So let's take the whole thing.
So this is the table that we want. In find_all we need to mention what kind of tag we're looking for; in this case it's table. So the first argument of find_all is whether it's a table or a paragraph; here it's a table. Then we give a comma, and then in curly brackets we mention the word class, that is, the title "class", and then the actual name of the class, which you can get from here. I didn't know this information, so I had to go through a bunch of Stack Overflow pages to get the exact format, etc. But once you do it, it's not that difficult; it's just a one-time thing. So let's run this. Now that it has run, we have the table. This is a part of the entire HTML content, and it has only the table that we're interested in. Let me just show you the type here, the type of table. Again, when we are writing the script in the real world we will not need to go through all these things like type, etc. I'm doing all this to give you a better understanding of what's going on and how we can use this knowledge to make our code better. If you look at the type of table, it's bs4, which is Beautiful Soup, so it's a bs4.element.ResultSet. So it's a set; think of it like a list, but it's a Beautiful Soup result set. You can see there's a lot of useless information there. We don't want everything; we only want these numbers. So where are these numbers? If you look at all these rows, tr is table row, td is the table data element, etc. So tr is a table row. If you click on this, okay, so this is the row.
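The find_all call described here can be tried on a small piece of HTML without touching any website. This is only a sketch: the markup and the class name "demo-table" are invented placeholders, not the real page, and the cell values are dummy text.

```python
from bs4 import BeautifulSoup

# Tiny stand-in for page.content; class name and values are made up.
html = """
<html><body>
<p>Some paragraph text.</p>
<table class="demo-table">
  <tr><td>Period Ending</td><td>Y1</td></tr>
  <tr><td>Total Assets</td><td>500</td></tr>
</table>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# First argument: the tag name. Second: a dict with the word "class"
# and the actual class name copied from the browser's Inspect window.
table = soup.find_all("table", {"class": "demo-table"})

print(type(table).__name__)  # ResultSet, which behaves like a list
print(len(table))            # 1
```

On a real page, the only parts that change are the source of the HTML and the class name.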
Period Ending: I'm not really interested in that. I'm looking for a row that has the information. Yeah, this one looks like it may have the information I'm looking for. If I click on this, there's a row, and then the td elements are the columns: this is the first column, the second column. If I click on this, I will get this number. So again, the HTML script is very methodical. It's just a matter of knowing the relevant tags and using them when you're doing the parsing. So now what will we do? We know that from this information here we need to get the tags with tr. Those are the table rows, and that's what we are after. So we'll write a loop, for t in table, and this is because table is a set. First of all, let me show you the type of what is within the result set. If you look within the result set, there is a bs4.element.Tag. So there are various tags within the result set, and we need to access all these tags and look for the ones that have the tr tag. To do that is fairly straightforward: we will again use find_all. In this case, let's go with rows, so t.find_all, and what is it, tr. Is the spelling capital or small? It's small tr. Let's see if this works. Even I am not 100% sure, so I may have to change this code here and there a bit, because this is not something that I do on a daily basis. But I want to show how I would go about doing it, and if I hit a roadblock, how I would resolve that issue. So let us see what happens here. It looks like this is working. Now that we have the rows, the rows again carry a lot of extra information, such as the formatting.
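As a rough illustration of the loop over the result set, the sketch below collects every tr tag from a made-up table. The HTML, class name and values are placeholders, and all_rows is an illustrative name.

```python
from bs4 import BeautifulSoup

# Invented markup standing in for the real page.
html = """
<table class="demo-table">
  <tr><td>Period Ending</td><td>Y1</td></tr>
  <tr><td>Cash</td><td>100</td></tr>
  <tr><td>Total Assets</td><td>500</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find_all("table", {"class": "demo-table"})

all_rows = []
for t in table:                # each t is a bs4 Tag inside the result set
    rows = t.find_all("tr")    # lowercase "tr" = table row
    all_rows.extend(rows)

print(type(table[0]).__name__)  # Tag
print(len(all_rows))            # 3
```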
In HTML, if you are coding this number, you also have to mention the format, the spacing, etc. For everything there is a corresponding HTML code, and all of that gets picked up when you are doing the parsing. We don't really want all that; we only want the text. So for all these rows that we have, we need to run another loop, and in that loop we will extract the text. That's dot get, I think it's get_text. Let us run this. Wow, nice. If you see this, I am now printing the rows, and I'm taking just the text from the rows. This looks more like what we are looking for. So this is the text version of the table that we're interested in. Let me just check: Period Ending is the first row. Awesome, that's what we were looking for. And the last line is Net Tangible Assets, which is there at the end also. So now we have got it in text format. I'll stop the video now. This was just to give you a hang of the various functions of Beautiful Soup and requests: how to make a connection and how to get the data. As you may have figured out from this video, even I am a bit rusty, because I don't use this often. You just need to write the code once and then get data every day or something like that, so you don't get to code this on a daily basis. But it's very beneficial to have a thorough understanding of what's happening in the background. If you understand the concept, it's not really difficult to look up the relevant commands and run the code. So we have got this in text format. In the next video, we'll take this one step forward: from this text data, we will create a DataFrame which will store all this information, and that will be the end of scraping. In the strategy section of this course, I'll again revisit scraping with the same code that we will develop in the next video.
Using the information extracted, which will be the balance sheet, income statement, financials, etc., we will set up a strategy and see how the strategy works. So I'll end the video here. Again, go through the video again and try to familiarize yourself with the various functions of Beautiful Soup and requests, and post any questions that you may have in the Q&A section. I'll be more than happy to answer them. See you in the next video. Thanks a lot.
13. Using Web Scraping to Extract Stock Fundamental Data - II: Hello, friends. Now that we have a basic familiarity with web scraping and the libraries that I often use to perform it, I think we can go to a more detailed example of how we go about extracting information using web scraping for more than one stock, and from not just one page but multiple pages. What we'll do in this case is extract information for Apple and Microsoft. The information that we will extract is the financial information, which is the income statement, balance sheet and cash flow information, for the annual frequency and not the quarterly frequency. What we'll also extract is the statistics page. The statistics page has information like the existing market cap, the beta, and the various ratios that traders frequently use for a stock. I usually don't need this information, but for some of the strategies that we will discuss later in this course, I do need the market cap information as well. So this is the objective at hand; let's see how we'll go about doing it. Using this script, we will be able to extract all this information for a number of stocks and store it in a pandas DataFrame. So let's get down to business. You already know about these two libraries that need to be imported.
I'm also importing pandas because we're storing information in a pandas DataFrame, which you'll see later in the code. The tickers that I am going with are only Apple and Microsoft. You can have as many tickers as you want, but be cognizant that on my PC, with my Internet connectivity, which is reasonably fast, it takes around 10 to 15 seconds for one ticker. So make that calculation of how long the script will take to run if you are extracting for a number of stocks. You can easily go with 30, 50 or even 100 stocks, and the script should run within 15 minutes or so. So in this case we have the tickers of Apple and Microsoft. Just make sure that these tickers are the same as on Yahoo Finance, because we are scraping the Yahoo Finance website; that's an obvious thing. I usually have a preference for Python dictionaries when I am trying to store temporary information that I later import into a pandas DataFrame. I do that because I find the Python dictionary very flexible, and the key-value combination helps me tag the information that I want to store. That's another feature that I really like. Also, as I think we discussed in one of the earlier videos in this course, if you have a nested dictionary, it takes just one line of code to import it into a pandas DataFrame. So I have a preference for dictionaries, but you can use lists, etc., if you prefer other data structures. You'll see how this is helpful in this particular code. What I'm doing here is running a loop, and each pass through the loop will perform the web scraping for one ticker. So we'll loop through every single ticker in the tickers list.
That's how my code will work. You will see that there are different code blocks within the script, and as you can easily figure out, each code block pertains to scraping one URL. For example, the first code block is scraping the balance sheet part of the Yahoo Finance website, this URL here. The second code block is the financials, that is, the income statement; if you go to Income Statement, it's this URL, and likewise for the others. That's what these code blocks are doing. Now let's look at these code blocks individually. What's going on here? In the previous example, I had hard-coded the entire URL. In this example there will obviously be a slight change, because we need some level of dynamism: the ticker will keep changing, and you can't really hard-code the URL for all of them. Because the URL format is very methodical, some things are constant and only the ticker changes at a couple of places in the URL, and that's all we need to provision for. That's what I have done here in the URL. Nothing fancy here; we have already discussed what's happening with page content, the requests call on the URL, and Beautiful Soup, so I'm not going into that again. Now the next step is a bit interesting. We're storing in the rows variable everything that has a tr tag, which means table row; we discussed that in the previous video. Now, how we go about extracting the text from every single row is with another loop. I run another loop, and it goes through every single row of the object that contains the HTML script.
It will then extract, using this function, only the text that is hard-coded. We don't want any of the HTML script that pertains to the format of the text or the spacing between different texts. We only want the hard-coded text, and this get_text function does that. I am using a parameter called separator; that's one of the parameters that you can use with get_text. If you provide a separator, it will output the data with those separators in between. By default, if you don't provide one, get_text has no separator, so you get one continuous string, and it's very difficult to figure out where one thing ends and the next thing starts. That's why it's always good to have a separator. Now, what I'm doing in the latter part of this command is that once we get that text, I am splitting it. I am pretty sure all of you know about the split function. What split does is take a string and, based on whichever delimiter you give it, split the data into a Python list. So what this command will do is take this entire row, which you get in a delimited format, and split it so that you have one, two, three, four, five elements in a list. Then I am taking only the first two elements of that list. So what I'm taking is only Total Revenue and the first value. Why am I doing that? That's because I'm only interested in the latest balance sheet information.
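To see what the separator and the split do, here is a small sketch on one made-up table row. The "|" delimiter and the cell values are assumptions chosen for illustration; any delimiter that cannot appear in the data would do.

```python
from bs4 import BeautifulSoup

# One invented row: a label followed by four yearly values.
html = ("<table><tr><td><span>Total Revenue</span></td>"
        "<td>400</td><td>300</td><td>200</td><td>100</td></tr></table>")
row = BeautifulSoup(html, "html.parser").find("tr")

# Without a separator everything runs together in one string.
print(row.get_text())               # Total Revenue400300200100

# With a separator the cell boundaries survive.
joined = row.get_text(separator="|")
print(joined)                       # Total Revenue|400|300|200|100

# split() turns the delimited string into a list; the slice keeps
# only the label and the first (latest) value.
parts = joined.split("|")
latest = parts[0:2]
print(latest)                       # ['Total Revenue', '400']
```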
So that's what I'm doing here. Then there is the length greater than one check. I am using an if condition here because in some of these rows you will see that there's only a heading and nothing after it. I don't want to extract those; I only want rows that have a numeric value, not the headings. Using this if statement I have taken care of that limitation. Once we get the rows that we want, I simply build my temporary dictionary, whose key is the first element of the list, which means the text in this column, and whose value is the text in this next column. That's all I'm doing: building my dictionary with the title and the latest value from the balance sheet. Let me run it quickly and show you step by step what's happening. Once you have an understanding of the first code block, we don't need to discuss the others, because they're similar; every single code block is essentially doing the same job. I'll not run the loop. I'll just hard-code ticker as Microsoft, so that when I run this code without the loop, this variable picks up the right ticker. Now let me run until this part. Yep, it's all good. So you have rows; that's fine. Now I actually want to show you how get_text with the separator works and how the split changes things. So let me run this command without the len check and see what we get. Let's do: for row in rows, print row.get_text with the separator. Actually, I didn't show you the output without the separator, so let me run that one as well. So let me run only this command.
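The length check and the dictionary build can be sketched with plain strings standing in for the delimited row text. The row contents below are invented, and temp_dir is simply a name for the temporary dictionary.

```python
# Delimited text as it might come out of get_text(separator="|");
# the second entry is a heading row with nothing after it.
raw_rows = [
    "Period Ending|Y1|Y2",
    "Current Assets",
    "Cash|100|90",
    "Total Assets|500|450",
]

temp_dir = {}
for text in raw_rows:
    parts = text.split("|")[0:2]   # label and latest value only
    if len(parts) > 1:             # skip rows that are just a heading
        temp_dir[parts[0]] = parts[1]

print(temp_dir)
# {'Period Ending': 'Y1', 'Cash': '100', 'Total Assets': '500'}
```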
You can see this is all separated by whatever delimiter I have provided, and now it is very nicely visible. Without this, it would just be one string, and it's not visually good. Now let me use the split method and show you how that looks. It should show me only the first and second columns. Yes. So, as you can see, split turns a string into a list, and since I am selecting only the first two elements, I'm only getting those. And I'm pretty sure you will see something like this: here, for example, we only have one element in this list. That's because this is a heading, and using this if statement I am excluding it from my analysis. I just wanted to give you a more step-by-step analysis of how the script is running, and that's why I showed you all this. Most of you already know what's going on here, but anyway, this is for the benefit of people who are not as conversant with Python as others are. The next blocks are similar, so I am not going through them. Just make sure, when you are building code blocks for different URLs, that whatever goes inside find_all is consistent with that webpage, using the Inspect tool in Chrome or whichever browser you're using. I checked the balance sheet, the financials, that is, the income statement, and the cash flow statement, and this is the same. The heading, "class", and the class name are the same, and all three of them are tables, so there's nothing different. It was only for key statistics that this was different. Let me actually show this to you. If we go to Key Statistics, and say I'm interested in everything, I right click and go to Inspect. You see here the class is table-qsp-stats Mt(10px). Right.
It's already highlighting the area that you will get information for. So again, nothing fancy going on here. The last line for key statistics is also slightly different. You can see that earlier I was populating my temporary dictionary by putting the key as the first element and the value as the second element. What I'm doing here is that the key is the first element and the value is minus one, which means the last element. Why am I doing that? That's because if you use get_text here, these superscripts get extracted too, and they come in a separate column. In this example you'll have the first column as Market Cap (intraday), the second column will be 5, and the third column will be this value. So in whichever table there is a superscript or a subscript, you will have an extra column with that information. That's why I went with minus one, meaning the last column of the table when extracted. These things require some experience, and you will probably have to do some trial and error when you are starting out with web scraping, because this is not always easy. There will be a lot of instances when the way the HTML coding is done is a bit complicated, so getting data in the format that you want may require some level of coding and some level of thinking. This is just an example of how things can get complicated when you're trying to web scrape, but most of the time you'll be able to get around it. It's not that difficult.
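The effect of a superscript on the extracted columns can be reproduced on a small made-up table. This is a sketch under assumed markup: the labels and values are placeholders, not figures from the real statistics page.

```python
from bs4 import BeautifulSoup

# Invented rows: the first label carries a footnote superscript.
html = ("<table>"
        "<tr><td><span>Market Cap (intraday)</span><sup>5</sup></td><td>1.0T</td></tr>"
        "<tr><td><span>Beta</span></td><td>1.1</td></tr>"
        "</table>")
soup = BeautifulSoup(html, "html.parser")

temp_dir = {}
for row in soup.find_all("tr"):
    parts = row.get_text(separator="|").split("|")
    # The superscript becomes an extra element in the middle, so the
    # value is read from the end of the list rather than position 1.
    if len(parts) > 1:
        temp_dir[parts[0]] = parts[-1]

print(temp_dir)
# {'Market Cap (intraday)': '1.0T', 'Beta': '1.1'}
```

Taking position 1 instead of -1 would have stored the footnote marker "5" as the market cap.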
So till here, for each pass through the loop, my temporary dictionary is populated with key-value pairs from all the pages that I want, that is, the financial statements, the cash flow statement and the statistics. Once that is done, for the earlier dictionary that I had created, I just have one more command, wherein the key will be the ticker and the value will be this entire temporary dictionary that I populated with all this information. So this financial dictionary will be a nested dictionary: the key is a ticker, and within the values there are dictionaries as well. Let me now run all of this and show you how it looks. There are two tickers, so it should take around 20 or 30 seconds. Let me fast-forward this. So it's already done. Let me show you how this thing looks. As you can see, this dictionary has been populated, and it's a nested dictionary. The key is Apple, and within Apple there are other dictionaries, and likewise there is a second element, which is Microsoft. But a Python dictionary is not very good to look at visually, so this takes just one more command. As we have already discussed in a previous video, it's a nested dictionary, and it's in a format that a pandas DataFrame understands. All I need to do is run this command, and then combined_financials looks really nice, with the information for each ticker as a separate column. All these rows are the information that I have got from the financial statement, income statement, cash flow statement and the key statistics, and you can check that everything is there. So now that we have it, again, this is not a finished product.
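The nested dictionary and the one-line conversion might look like the sketch below. The inner keys and values are invented placeholders, and the variable names are only illustrative guesses at how such a script could be laid out.

```python
import pandas as pd

# Outer key: ticker. Inner dictionary: label -> scraped text value.
financial_dir = {
    "AAPL": {"Total Assets": "300", "Net Income": "50"},
    "MSFT": {"Total Assets": "250", "Net Income": "40", "Market Cap": "1.0T"},
}

# One line turns the nested dictionary into a table with one column
# per ticker and one row per label; missing labels become NaN.
combined_financials = pd.DataFrame(financial_dir)

print(combined_financials)
```

Labels that exist for only one ticker, such as "Market Cap" above, show up as NaN in the other column.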
We'll need to do some more work on this. For example, all these values are text, because we have not yet converted them into numbers, so we'll need to convert them into numbers. We'll have to take care of signs like percentages and even commas. You can also see, for example for average volumes, that they sometimes use shortcuts like M for million and B for billion. Things like that we'll have to inspect manually and then write code to take care of. I'll take care of this in the video where we discuss a strategy that uses this same script and builds on it. So I will deal with these data processing problems in the strategy section of the course; right now I'm leaving it. Just to conclude the last part of the video, and I know this has become longer than I wanted, there are these last two lines of code. Sometimes, say you have 50 or 60 tickers, there can be a few tickers for which you get NaN values, and with NaN values subsequent operations can be a bit problematic. So I typically drop those NaN values, so that if all values in a column are NaN, that column gets deleted. Then I repopulate the tickers list by running this command; the DataFrame's columns will give you the columns in list format. Those are the two subsequent steps that I usually do. It's not really required, but sometimes, when the code doesn't work for all the tickers, you have to work around this, because it might give you an error if you have a lot of NaN values in the columns. So with this, I will conclude this really long video. I'm sorry about that.
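The two clean-up steps can be sketched as follows, assuming the columns are tickers and that a ticker which failed to scrape shows up as a column with no values. The ticker "XXXX" and all values are made up for illustration.

```python
import pandas as pd

# "XXXX" stands for a ticker whose pages returned nothing.
combined_financials = pd.DataFrame({
    "AAPL": {"Total Assets": "300", "Net Income": "50"},
    "MSFT": {"Total Assets": "250", "Net Income": "40"},
    "XXXX": {"Total Assets": None, "Net Income": None},
})

# Drop a column only when every value in it is missing.
combined_financials.dropna(axis=1, how="all", inplace=True)

# Rebuild the ticker list from whatever columns survived.
tickers = combined_financials.columns.tolist()
print(tickers)  # ['AAPL', 'MSFT']
```

Using how="all" keeps tickers that are missing only a few labels; how="any" would drop them too.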
I wanted to keep it short, but web scraping is a subject that I wanted to give some time to, because it needs to be explained in a more methodical way than in the other videos. So I would strongly urge you to run the script, and probably go to some other web pages that you frequently visit and try to extract information from there. If you have any questions, please post them in the Q&A section. We're more than happy to help, and I might also learn a couple of things from you guys as well, so let's make the discussion more interactive. Take it for a spin, and please share any findings or any problems that you may encounter with the script. Thanks a lot for paying attention. I'll see you in the next video.

14. Updated Web-Scraping Code - Yahoo-Finance Webpage Changes: Hello friends. This is October 2019, and I'm creating an additional lecture video for web scraping because one of you reached out to me saying that the format of the Yahoo Finance page has completely changed. That's actually correct. When I came back to this page, I saw the format was different. I looked into the HTML code, and that is also completely overhauled. So that has necessitated this lecture video, just to run you through the changes that have been made by Yahoo Finance and how they change the code. I will also run you through the updated code that we will need. It's not much different. It's just that the changes have made it slightly more difficult to do web scraping, but it's still very, very doable.

Okay, so let me first go into the HTML page. We'll do a right click and click on Inspect, and it will take us to the HTML source page. So what am I looking for here? I'm looking for, I think, this. Chrome has a really nice feature when you hover over the relevant tag of the HTML code. 
It will tell you what area of the page that tag corresponds to, right? So it looks like, yeah, this one or this one; any of these will work. So the class is the one shown here, and this is the class that we will now need to parse. And within this class, how do I get the rows? If you go to this rw_expanded class, you can see that it is actually giving you the rows that you were interested in. But you will see that some of these rows are wrapped. Every time you see a drop-down here, the rw_expanded corresponds to a bigger wrapper row. But if you click on this, you will find individual rw_expanded rows within it. So you see the rw_expanded class which corresponds to the rows within this inline element.

So again, what they have done is that they have used something called the span tag. Span is different from the tables that we have seen in the previous lecture videos. Span is slightly more tricky. It is more flexible; if you are a user, it's much better because you get these fancy drop-down buttons and all. But parsing becomes a bit challenging. That's fine.

Okay, now let's go to the code and see what changes we need to make. Here is the code. We'll be using Apple and Microsoft again, and everything else is the same. All that I have changed is this: we have created a BeautifulSoup object in a variable, and we call this variable soup. When you do soup.find_all, we used to have, I think, if I recall correctly, the tag as table, and that's what we were doing there. Here it's a different sort of tag, if you look at the tag here. Right. 
So this is the div, right, and within the div, this is the class. That's the only change that I have made here: the tag is changed to div, with the appropriate class. Let me run this now. Let's do it only for Apple. Actually, Microsoft, because I have it open. Okay, now let's run this particular piece of code. If I run till this point, then within this variable we have this entire class. Everything that is within this particular class in the HTML page we have been able to capture in this variable. So now it's all about parsing this particular table.

So again we will run a loop, just like in the previous video, and we'll use the find_all method again. This time we'll be using the rw_expanded class, the tag corresponding to rw_expanded, which, as we saw, will give you every single row. So let me run this. Now that we have the rows, all we need to do is run a loop, just like in the previous video, split at the appropriate point, and then populate our dictionary. temp_dir is a temporary dictionary where I'm storing information for every single ticker, and that's what we're doing here.

Let me run this entire section and show you what we get in temp_dir. If I show you temp_dir, it is a dictionary, and you see that we have been able to get something similar to what we had in the previous lecture videos, in the older format of Yahoo Finance. But you will see that for some of the rows we don't have a numeric value. We have another string, right? And can you identify what's going on here? 
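The div-based parsing loop described above can be sketched on a small, self-contained HTML snippet. Everything in the snippet is an assumption made for illustration: the markup is invented, "fin-table" is a placeholder class name, and "rw_expanded" is written the way it is spoken in the video, so the real Yahoo Finance class strings will differ. Only the BeautifulSoup calls themselves (find_all with a tag and a class filter, get_text with a separator) are standard.

```python
from bs4 import BeautifulSoup

# Invented markup that mimics a div/span layout. Class names are placeholders.
html = """
<div class="fin-table">
  <div class="rw_expanded"><span>Total Revenue</span><span>100</span></div>
  <div class="rw_expanded"><span>Cost of Revenue</span><span>60</span></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

temp_dir = {}
# Outer find_all: grab the container div by tag and class.
for table in soup.find_all("div", {"class": "fin-table"}):
    # Inner find_all: every row div inside that container.
    for row in table.find_all("div", {"class": "rw_expanded"}):
        # Join the text of the row's spans with "|" and split it back apart:
        # first piece is the line item, second piece is its value.
        cells = row.get_text(separator="|").split("|")
        temp_dir[cells[0]] = cells[1]

print(temp_dir)
```

On a real page, a row that wraps a drop-down section would yield a second piece that is another label rather than a number, which is the behaviour the video goes on to describe.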
So these are all the inline wrappers that we talked about. If we go here, every time you have a drop-down, there is an rw_expanded that corresponds to that entire section. There was one rw_expanded that corresponded to this entire drop-down for operating expenses. So this row will have Operating Expenses, and then the next element in that particular list will be Research Development, the next element will be Selling General and Administrative, and so on. You will not get a row with just the heading here and then the corresponding numbers; it's not like that. So every time there is a drop-down here, you will have some problem. It's OK. I'll tell you how I got around this problem.

Anyway, we have been able to get this just by changing to the appropriate tag and the appropriate class, and we pretty much used the same code that we had earlier. So this is what we have. Let me run this for everything, that means for the balance sheet, income statement, cash flow statement and key statistics. In key statistics also there have been a few changes. They are not that significant, but they have changed a string that they used to represent certain things, and the code has also changed. You will see that I have updated the class here, because they have changed the class for the key statistics table as well.

So let me run this and populate every single element for both Apple and Microsoft. While this code is running, let me tell you what I'll do after I have been able to populate the dictionary for both the tickers. Okay, so this has run. Let me show you once again how our dictionary looks. We have every single row captured. Some of them we can ignore for now. 
So we need a way to delete these rows, because this is not something that we are interested in. But we have been able to capture the numbers here, so that's great. Then all I need to do is convert it to a DataFrame. Nothing new; we have done this before, same as in the previous lecture videos. So we have been able to get all the information for Apple and Microsoft. We have also got some extra information, and we need to delete it. So all I need to do now is manage this data.

What I'm using here is something called regex, a regular expression. This thing that you see here is a regular expression, and if you don't know regular expressions, don't worry. This is pretty much me saying that if you encounter any character between small a and small z, we will delete that row, and that's what I've done here. But you may ask, why not capital letters? Why only small letters? That's because, if you recall, Yahoo presents information in millions, billions and trillions. For example, for market cap they may give me a value of 76.84B, which means 76.84 billion. So I cannot delete values where we have a capital letter. But I can assure you that if you come across a small letter, we can delete it, and it will not impact our data capturing ability. So that's what I have done here.

If I run this part, you will see that my combined_financials is now much better. I've been able to delete all those extraneous rows, and now I only have the values which correspond to a number for a financial statement item or a balance sheet item. So this, in a nutshell, is what I have changed in the code. As you can appreciate, the code has not changed much, right? 
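The lowercase-letter filter described above can be sketched like this. The data is a made-up placeholder, and the exact line used in the course script is not visible in the transcript, so treat this as one possible way to express the idea with the pandas string method str.contains.

```python
import pandas as pd

# Placeholder data: the "Operating Expenses" row picked up a label instead of
# a number, which is what happens with the wrapped drop-down rows.
combined_financials = pd.DataFrame({
    "AAPL": {
        "Market Cap": "76.84B",
        "Total Revenue": "100",
        "Operating Expenses": "Research Development",
    },
    "MSFT": {
        "Market Cap": "1.05T",
        "Total Revenue": "200",
        "Operating Expenses": "Research Development",
    },
})

for ticker in combined_financials.columns:
    # "[a-z]" matches any lowercase letter. Capital suffixes such as M, B
    # and T do not match, so values like "76.84B" are kept.
    has_lowercase = combined_financials[ticker].str.contains("[a-z]", na=False)
    combined_financials = combined_financials[~has_lowercase]

print(combined_financials)
```

The na=False argument makes missing values count as "no match", so a NaN cell does not by itself cause a row to be dropped.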
All we needed to do was change the tag, change the class name, and be a bit smarter about how we delete some extraneous rows. That's the only difference in the code. Just go through the code, compare it to the previous code, and shoot any questions that you may have. It's great that Yahoo Finance changed it, because you will now be able to compare what changed, and that should actually improve your understanding of web scraping as a process.

In the value investing section, where we actually used web scraping scripts for the Piotroski score and the Magic Formula, I have also made some changes. I'm not going to create a lecture video for them, but I am going to upload a document mentioning that the Yahoo Finance HTML code has changed, and I'm uploading revised code. Please analyze both the previous version and the new version and spot the differences. Let me know if you have any trouble understanding what I have done in the new code, if you face any problem, or if you have any questions. Thanks a lot. I'll see you in the next video. Thank you.

15. Data Handling: Hello, friends. Now that we are all pros at extracting data, we will look at how we handle data once it is extracted, and what some of the initial operations are that people perform on the data. So without any further ado, let's dive straight into those functions. In this video, I'll be working on some tech stocks data. I have Microsoft, Amazon, Apple, Cisco, IBM and Facebook, and I am extracting 10 years of data. You see, I have already extracted the data to save time. The first step is to make sure that the data we've extracted is fine and that there's no issue with it. So let me just first inspect. 
I've saved the closing prices in the close_price DataFrame. If we open it, you can see that we have got 10 years of data. But there are some NaN values, right? I intentionally included Facebook because I knew that Facebook would have NaN values, because it was listed back in 2012. I remember I was still in school, and I had to borrow money from a friend to participate in the IPO. I'm digressing, but yes, I intentionally wanted to have a stock which would have NaN values.

You can have NaN values for a number of reasons. Maybe there was no data in whichever data source you're pulling from, in this case Yahoo. There was obviously no stock data for Facebook before it was listed. Then there may be instances of NaN values when trading was suspended, or maybe there was just a technical issue. If you have NaN values in the stock data that you've extracted, it can create problems for your algorithm later down the line. So it's very important that we cleanse the data and make sure that we don't have any problems with it.

There are two things that are mostly used to handle NaN values: fillna and dropna. Again, all these are functions of DataFrame. That's why I say DataFrame gives you tremendous flexibility and tremendous ease in doing a lot of operations for which you might otherwise have to write a lot of code. That's why I prefer it, and a lot of people do.

Let me show you what fillna does. You can look at the arguments that it takes. The most important one is value: what do you want to fill the NaN values with? It can be a scalar. It can actually be a dictionary as well. 
That's pretty interesting, because the dictionary lets you choose what value you want to fill for each column. You'll have a dictionary of key-value pairs where the key is the name of the column and the value is whichever filler you want to fill the NaNs with for that particular column. I've never seen anyone use a dictionary with fillna, but it's a fairly good provision to have if you want to do it. Mostly people put zero here, so any time there's a NaN value you put in a zero. Sometimes people do that. I've never done it.

Then there is method, the other most important parameter that goes into the fillna function. Method is the way in which you want to perform the filling. Backfill, as you may have guessed, takes the next valid value and fills backwards over all the NaNs before it. So in this case all the NaNs will be 38.23. ffill is forward fill. Backfill and forward fill are the two most used filling criteria. Again, you may have a different requirement and you may use a different one.

What your choice of filling is, that is completely your call. You, as the architect of your algorithm, need to decide how you want to handle these NaNs. Do you want to drop them, which I'll show in a couple of minutes, or do you want to fill them? This is a decision that you will be confronted with when you extract data, and something that you will need to take a decision on.

In this case, I will not delete the NaN values. Actually, let me first show you how we drop. You do close_price.dropna. It's the same thing, so again there is an axis argument.
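The fillna options described above can be sketched on a tiny price table. The prices and dates are made up, apart from 38.23, which is the first valid Facebook value mentioned in the video; the DataFrame name close_price follows the video. One assumption to flag: the video passes method="bfill" to fillna, and recent pandas releases deprecate that argument, so this sketch uses the equivalent bfill() and ffill() methods instead.

```python
import numpy as np
import pandas as pd

# Made-up prices; FB has no data on the first two days.
close_price = pd.DataFrame(
    {"AAPL": [10.0, 11.0, 12.0, 13.0], "FB": [np.nan, np.nan, 38.23, 38.0]},
    index=pd.date_range("2012-05-16", periods=4, freq="D"),
)

# A scalar value fills every NaN in the frame with the same number.
filled_scalar = close_price.fillna(0)

# A dictionary picks the filler per column (key = column name).
filled_dict = close_price.fillna({"FB": 0})

# Backfill: each NaN takes the next valid value further down the column.
back_filled = close_price.bfill()

# Forward fill: each NaN takes the last valid value above it. The leading
# NaNs have nothing above them, so they stay NaN.
forward_filled = close_price.ffill()

print(back_filled)
```

None of these calls changes close_price itself; each returns a new DataFrame.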
Just remember this very clearly. When axis is equal to zero in a pandas DataFrame, it means the operation will be performed along the column. When axis is one, it means the operation will be performed along the row. So if axis is zero and I do a dropna, the operation is performed along the column; if axis is one, the operation is performed along the row.

Is there an inplace argument here as well? Let me see. Yes, and inplace is also very important. Any time there's a function with an inplace argument, if you don't set it, it is False by default. That means the operation is not performed on the original DataFrame. If you set inplace equal to True, the values in the original DataFrame on which the operation is performed get updated. If inplace is equal to False, it just gives you a copy. That's great here, because I just want to display the result and I don't want to actually change the Facebook data.

So in this case, let me do dropna with axis equal to one. Now, if I run it, you will see that it actually deleted the entire column. Because there was a NaN value in it, it just deleted the entire column, and that's not something that we wanted. If I put axis equal to zero and then run it, like you can see, all the rows where there was any NaN were deleted.

Again, a lot of people may want that: if there are any NaN values, delete the entire row, because most of the analysis is comparative. If there is one NaN value, you don't really have a meaningful analysis. So some people may want that and may go ahead with it. But again, I seldom go with dropna. It is not my choice of operation.
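The effect of axis and inplace on dropna, as described above, can be checked on a small example. The prices are made up and the names are illustrative. In terms of the result, dropna with axis=0 removes rows that contain a NaN, and dropna with axis=1 removes columns that contain a NaN.

```python
import numpy as np
import pandas as pd

# Made-up prices; FB has no data on the first two days.
close_price = pd.DataFrame(
    {"AAPL": [10.0, 11.0, 12.0, 13.0], "FB": [np.nan, np.nan, 38.23, 38.0]},
    index=pd.date_range("2012-05-16", periods=4, freq="D"),
)

# axis=0: every row containing a NaN is removed (two rows remain).
rows_dropped = close_price.dropna(axis=0)

# axis=1: every column containing a NaN is removed (FB disappears entirely).
cols_dropped = close_price.dropna(axis=1)

# inplace defaults to False, so close_price is untouched by the calls above.
print(close_price.shape)

# With inplace=True the object itself is modified and nothing is returned.
working_copy = close_price.copy()
working_copy.dropna(axis=0, inplace=True)
print(working_copy.shape)
```

Working on a copy, as in the last step, is a simple way to try inplace=True without losing the original data.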
Data is valuable. I think data is valuable, and just because there were a few NaN values, you don't delete all the corresponding data. So we try to work with it. If I ever have to devise a strategy using this data, what I'll do for the Facebook data is just backfill all the NaN values. What that will do is assign the same value, 38.23, to all the values at the start of this DataFrame. So what will happen is that the daily return will be zero, and I'm not really bothered by that.

Let me show how to do it. With fillna, I just set method equal to bfill. It might yell at me because of the axis issue. Let me see what happens if I don't give an axis. If I look at FB, it has the same value throughout, so I think by default axis is equal to zero. Let me show what happens if I set axis equal to one. If I now run it, what happened? Yes, something really bad happened. It performed the operation along the row. It took Amazon's value, and every single time there was a NaN value, which in this case was every value in that stretch, it took the value from Amazon and put it in FB. So again, axis is very important. When you are performing these operations, make sure that you have the right axis in the parameter.

And now I'll do it with inplace equal to True, because I want this to be saved in the original DataFrame. Now, if I look into close_price, all the NaN values will be updated with the value that I wanted,