Introduction to Predictive Analytics on SAP HANA | Daniel Easton | Skillshare

Introduction to Predictive Analytics on SAP HANA

Daniel Easton

Lessons in This Class

10 Lessons (1h 16m)
    • 1. Welcome to Predictive Analytics and Data Mining on SAP HANA

    • 2. Introduction to HANA Predictive Analysis Library

    • 3. ABC Analysis

    • 4. Exponential Smoothing Part 1

    • 5. Exponential Smoothing Part 2

    • 6. Scenario: HR Analytics

    • 7. Data Preparation

    • 8. Decision Trees (1) A bit of theory and math

    • 9. Decision Trees (2) Running Decision Trees in HANA

    • 10. Decision Trees (3) Effect of Categorical vs Continuous Data

About This Class

***  This course requires access to an SAP system. Should you need access, you can take my course on how to get access to an SAP system in 30 minutes for free. Just search for "Your own SAP Cloud System in 30 minutes" ***

This Entry Level to Intermediate SAP HANA Predictive Analytics course will help you master many important techniques to start creating sophisticated, predictive analytics applications that utilize the power of SAP HANA and Business Intelligence.

The course is designed so that you can master all the techniques gradually, starting from basic and relatively simple techniques before moving on to the more demanding techniques that Business Intelligence Professionals use to create predictive analytics applications for their customers.

The course will take you step by step through the process of creating the required HANA objects, such as tables, views and predictive analytics SQL scripts. In particular, from this course you will learn:

  • Fundamentals of the Predictive Analytics Library,

  • The structures involved, such as HANA Tables, Views, PAL SQL procedures and more,

  • A comparison of the raw PAL SQL code with the HANA Analytical Processes available in SAP BW by creating the comparable HANA AP in BW,

  • Integrating Predictive Analytics into SAP BW and SAP Lumira


  • This course assumes no knowledge of the HANA Predictive Analytics Library.

  • BW and HANA experience would be helpful.

What this course is not:

This course does not cover every single Predictive Analytics algorithm. It covers enough of the algorithms for you to get comfortable with using them and apply the techniques to any other functions. Covering all algorithms will result in a high level of repetition without any real value.

What sets this course apart from anything available on other platforms is the fact that it covers the integration and application of the Predictive Analytics Library with the various other SAP BW and visualization platforms.

This course will keep expanding, so check back regularly for updates and more content, for example, integration of PAL into SAP BPC Embedded, more case studies for Regression Algorithms, Text Analytics and more!


1. Welcome to Predictive Analytics and Data Mining on SAP HANA: Hello and a warm welcome to this course on Predictive Analytics and Data Mining on SAP HANA. I have more than 20 years of SAP experience covering functional ERP modules as well as SAP BW, ABAP, HANA and SAP BPC, and I hope to use some of this experience to make the topic of predictive analytics and data mining both interesting and rewarding for you. Should you have any questions or comments about the course or its contents, please feel free to contact me.

This entry-level to intermediate SAP HANA Predictive Analytics course will help you master many important techniques to start creating sophisticated predictive analytics applications that utilize the power of SAP HANA and business intelligence. The course is designed so that you can master all the techniques gradually, starting from the basic and relatively simple techniques before moving on to the more demanding techniques that business intelligence professionals use to create predictive analytics applications for their customers.

The course will take you step by step through the process of creating the required HANA objects, such as tables, views and predictive analytics SQL scripts. In particular, from this course you will learn the fundamentals of the Predictive Analytics Library; the structures involved, such as HANA tables, views, SQL procedures and more; a comparison of the raw SQL code with the HANA analysis process available in SAP BW, by creating a comparable HANA analysis process in BW; and then integrating predictive analytics into SAP BW and SAP Lumira.

The curriculum for this course will cover an introduction to the PAL library. We will then cover ABC analysis, both in PAL and by creating a BW analysis process to contrast the functioning of both, then exponential smoothing, and then a case study on HR analytics using decision trees. This course does not cover every single Predictive Analytics algorithm.
It covers enough of the algorithms for you to get comfortable with using them and applying the techniques to other functions. Covering all the algorithms would result in a high level of repetition. What sets this course apart from anything available on other platforms is the fact that it covers the integration and application of the Predictive Analytics Library with various other SAP BW and visualization platforms. Thank you for purchasing this course. This course will keep expanding, so check back regularly for updates and more content, for example, integration of PAL into SAP BPC Embedded, more case studies for regression analysis, text analytics and more.

2. Introduction to HANA Predictive Analysis Library: Welcome to this introduction to the HANA Predictive Analytics Library. First of all, what is PAL? The HANA Predictive Analytics Library is a set of predictive algorithms in the HANA Application Function Library (AFL), and it was developed specifically so that HANA can execute complex predictive algorithms by maximizing database processing rather than bringing all the data back to the application server. PAL is available with any HANA implementation after installation of the AFL, and PAL makes predictive functions available that can be called from SQLScript on HANA. This release of PAL includes classic and universal predictive analysis algorithms in nine data mining categories. These algorithms are the most commonly used based on market surveys, and they are generally available in other database products.

To get started with PAL, you need to install SAP HANA SPS 12, install the AFL, which includes PAL, and then enable the script server. So let's go to the HANA box and check whether these prerequisites are met. In Eclipse, I'm logged on as the SYSTEM user, and in the downloadable materials for this lecture you'll find the file containing the script.
We're going to run it, so I right-click, open my SQL console and paste in the code. At the start we only want to check whether the library functions are installed, and we would also like to start the script server. So I'm just going to highlight those lines of code and execute them. The check of the library brings back the function names, so these are all the algorithms available in PAL. Everything looks OK. We have also started the script server. I could also have gone to the Administration console, gone to Configuration, then daemon, then the script server, and set the instances to one. That would have started the script server as well. So now that everything is fine, let's continue with the lecture.

Next, we're going to create the schema and the project we're going to work in. Going back to Eclipse and the same SQL code we used previously, we're going to execute the CREATE SCHEMA and SET SCHEMA lines, as well as grant my user access to that schema. When this is done and I open my catalog and refresh, I should now see the PAL schema available to my user.

Now that I've got my database catalog, we're ready to create the repository workspace and the project. Under the Repositories tab, right-click on my user and create a repository workspace; the repository workspace I'm going to create is PAL. It's as easy as that: my repository workspace has been created. Now go to the Project Explorer tab. The project we're going to create is also called PAL, and it's of type Other. In the wizard folder structure, we go to SAP HANA, Application Development, and the project we're going to create is an XS project. The project name is PAL, connected to the PAL repository workspace, and the default schema is also going to be PAL. The XS objects we're going to leave out for the time being, and click Finish. My project is now set up.
If I go back to my repositories, expand and refresh, you can see that our project is there as well. There's a bit of a conflict, so I'm just going to say Resolve Conflict with local, and that resolves the conflict. Now if I create a directory here, say New, Folder, and create a folder, for example ABC for ABC analysis, click Finish and activate it, then when I go back to my repository workspace I can see the ABC folder.

Next, we're going to cover the generation of the PAL function, which is done from within SQLScript code. It requires the role AFLPM_CREATOR_ERASER_EXECUTE. To create the PAL procedure we call the wrapper procedure SYS.AFLLANG_WRAPPER_PROCEDURE_CREATE with the parameters shown here: area name, function name, schema name, procedure name and signature table. The area name is always AFLPAL. The function name is the built-in function name from PAL, for example ABC, as you can see from the example below. The schema name is the schema we're going to work in, which is PAL in this case. The procedure name is the name for the PAL procedure, and this can be anything we want. The signature table is a user-defined table variable, and this table contains the records that describe the position, schema name, table type name and parameter type.

Moving on to calling the PAL function. This is also done from SQLScript, and it requires one of two roles, which we're going to allocate in the demo. We call schema name dot procedure name, where the procedure name is the procedure we created on the previous slide. The data input table is the user-defined name of the procedure's input table, the parameter table is the user-defined name of the procedure's parameter table, and the output table is the user-defined name for the output tables. We're going to cover this as the lectures progress. The call is made WITH OVERVIEW because we're going to write this information to the database.
So let's go on and allocate the required roles to a user. In HANA, again logged on as the SYSTEM user, execute the two lines that grant those roles to my user. Remember to substitute your user for mine. Highlight these two rows and execute them, and then when we look at the users under Security, Users, and open my user, I can see that the two roles have been allocated to my user.

So, up to this point we have created a schema, our repository workspace and a package. Let's now discuss some of the HANA basics, and we'll start off with the repository. HANA is, of course, a SQL-compliant database, and we can create artifacts using SQL: we can go to the SQL command prompt and type CREATE TABLE, for instance. But when we create objects directly via SQL, we don't have the benefit of creating them in the SAP HANA repository. Creating them via SQL means that the SQL itself needs to be saved and re-executed on each target system where you want the content to be created. That's why SAP HANA has the concept of the repository.

The repository allows us not only to store the source code and other development artifacts, but also the definitions of the catalog artifacts. We store things like schema definitions and table definitions in the repository, and we activate them. The activation process generates the necessary SQL statements to either create or update existing objects in the catalog. This allows for separation between what is possible with SQL and what we can define in HANA specifically. It also begins to provide a little bit of a data dictionary that offers additional services over and above what we can define in SQL.
The repository gives us our object management, our versioning and our transport mechanism, so we can have multiple versions of an object, whereas if we created it directly in SQL we wouldn't have any versioning, except maybe on the SQL CREATE statement. It also provides the transport capabilities. It enables us to package all the parts of an application, from the schema and the tables to the logic, the services and the user interface, into a single file that we call a delivery unit. This file can then be given to customers or partners, and it is very easy to install in the target system. With the SAP HANA repository there's also patching built in, which allows us to deliver only those objects that have changed during a certain period of time. And finally, the HANA repository supports team development using standard Eclipse tools and the check-in and check-out we saw earlier. This allows better control over our artifacts and better team coordination than we would get by just writing SQL directly.

Next, we have our schema, which is a mandatory database object; all database objects have to belong to a schema. The schema contains all catalog artifacts, and it helps us control access to these artifacts.

Now, for the last bit of SAP HANA basics, we're going to create a CSV file that will contain the source data for our first analysis, create a table that we're going to load the source data into, and create a table import configuration file that contains the mapping from the source file to the database table. So go to SAP HANA, to the Project Explorer tab. You might recall that we created a folder, ABC. Under ABC we're going to add data. First of all we create a folder; we're going to call the folder data, and click Finish. Under this we're going to create a file that will contain our CO-PA data. We'll call it ABC COPA, because the data comes from CO-PA.
The extension will be .csv. When I specify .csv it immediately opens in Excel, but I don't want to go into Excel in this case. We're going to open it with the text editor, and then, from the file supplied in the downloadable material, we simply copy the information into the CSV file, delete the last row and activate. This information is now available in our repository as well. If I go to Repositories and expand the PAL workspace, ABC, data, there is our data. So now that the data is available, we need to load it somewhere.

For that purpose, we're going to create a table. Under our ABC directory, we create a new folder, table, and under this folder we create a new file. I'm going to call it ABC COPA again, but we're going to give it the extension .hdbtable. This tells HANA that we're creating a table, and it will generate the SQL code accordingly. Again, in the downloadable material I've already got the definition of the .hdbtable file, so I copy it in. In this file we first specify the schema name, which is PAL. We're going to create a column store table and give it a description, then we define all the fields in the table, and lastly we give it a key, which is customer, material and fiscal period. We then activate this table, and upon successful completion of the generation we can go back to our database schema, which is PAL, and if I look under my tables, this table is now available. I can double-click on the definition and see the fields.

Now, to load data from the CSV file into the database table, we need to create a table import file, and we're going to put that under our data directory. On the data folder, choose New, File, call it ABC COPA again, and this time the extension is .hdbti, for table import, and click Finish.
From the downloadable material, I copy the contents in again. This file again specifies the schema, PAL, but it also specifies a target table and a source file, as well as the field delimiter and whether the file contains a header. When I activate this file, upon successful activation I go to my table, open the data preview, and the data has been loaded. Now that we've finished the introduction and we have data available in HANA, we can move on to the algorithms.

3. ABC Analysis: Welcome back. In this lecture we're going to cover our first analytics function, ABC analysis. I'm first going to do a quick introduction to ABC analysis. Then we're going to run ABC on HANA using the Predictive Analytics Library. And finally, we will run ABC analysis using the HANA analysis process on SAP BW.

What is ABC analysis? ABC analysis is a categorization method that consists of dividing items into three categories, A, B and C, with A being the most valuable and C the least valuable. This method aims to draw the manager's attention to the few critical A items and not to the trivial many C items. ABC analysis has several advantages and disadvantages. The advantages are a reduction in investment and the ability to apply strict controls, since equal investment of money, time and labor is not needed for all materials. The disadvantages are that ABC has no scientific basis for the classification of materials, and that it will not be effective if materials are not classified into groups properly.

So let's look at ABC analysis in HANA. We're going to perform the ABC analysis on the table we've already loaded, the ABC COPA table. Looking at the data in the table, it contains sales by customer by material for several fiscal years. We're going to perform ABC analysis on material movements, so on material and quantity, for the fiscal years under investigation, which is all the data in the table.
To build up the SQL statement for the ABC analysis, I'm going to open another SQL console. In the downloadable material for this lecture you'll find the full text file that is required to call ABC, but we'll build it up progressively. To start off with, we have the SET SCHEMA statement and then the drop and generate procedure statements. So let's paste that in. You can see that the create statement requires us to hard-code AFLPAL and ABC, and to specify a schema name, which is PAL, the name of the procedure we're going to generate, which is PAL_ABC_PROC, and then the signature table. Now, we don't have the signature table in here at the moment, so if I execute this code I'm going to get an error.

So let's build the signature table. The signature table is also in the file, so I'm going to copy the required statements and paste them in. That creates the signature table and inserts the values into it. But we can see from the parameters that it requires several table types: a type for our input data (the ABC data), one for our control data, and one for the results. These table types are not available yet, so I won't be able to execute this SQL.

Now let's build the table types. From the create table type commands in the file, let's copy that in, and these create our table types. Our data type consists of material and quantity, and this will correspond to the aggregated values in our data file. The control table controls the intervals for each of the classifications. For example, we can say the materials making up 70% of the value are A-category materials, the next 20% are B, and the next 10% are C-category materials. The results table will contain the ABC classification as well as the item. But before I execute this code, I'm going to delete all the types and all the tables that might have been created during a previous iteration.
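The control-table logic described above can be sketched outside HANA. The following Python is a minimal illustration only, not PAL's implementation: the function name `abc_classify`, the made-up material totals, and the way items that fall exactly on a boundary are handled are all assumptions for the sake of the example.

```python
def abc_classify(totals, pct_a=0.7, pct_b=0.2, pct_c=0.1):
    """Label each item 'A', 'B' or 'C' by its cumulative share of the total value.

    totals: dict of item -> aggregated value (e.g. quantity per material).
    The percentages play the role of the control table entries.
    """
    if abs(pct_a + pct_b + pct_c - 1.0) > 1e-9:
        raise ValueError("percentages must add up to 100%")

    grand_total = sum(totals.values())
    # Rank items from most to least valuable.
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

    labels = {}
    running = 0.0
    for item, value in ranked:
        running += value
        share = running / grand_total
        # Assumption: an item whose cumulative share lands exactly on a
        # boundary still belongs to the higher category.
        if share <= pct_a + 1e-9:
            labels[item] = "A"
        elif share <= pct_a + pct_b + 1e-9:
            labels[item] = "B"
        else:
            labels[item] = "C"
    return labels


# Hypothetical aggregated quantities per material (not the course data).
sample = {"M1": 50, "M2": 20, "M3": 15, "M4": 10, "M5": 5}
print(abc_classify(sample))                 # control values 70 / 20 / 10
print(abc_classify(sample, 0.5, 0.3, 0.2))  # control values 50 / 30 / 20
```

Running the function twice with different control values mirrors what the lecture does later when it changes the control table and compares the result tables.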
This file is also available in the downloadable material. Going back to my SQL console and executing the code, let's look at the effect. First of all, under Tables we now have a signature table, and if I open the data preview we can see the entries that we entered in the SQL console. Also, under Procedures, if I refresh, you can see the PAL ABC procedure that we've created, as well as the resulting table types.

Next, we need to specify a data table and continue with the code. Going back to our text file, let's paste in the rest of the code. Before we execute this, I'm going to extract the data table statement, put it in another console, set my schema to PAL again and execute it. Now we can see what our data table looks like as well. Going to Tables and refreshing, I can see my data table, and looking at the contents, we can see that we now have our aggregated value per material. This is the data that will be subjected to the ABC analysis.

The rest of the SQL code fills the control table, that is, the percentages for each of the categories. For category A it's the top 50%, then 30% and 20%. The next statement calls the ABC procedure with the relevant parameters. So let's execute this full statement and view the results. Refreshing our tables again, I can go to my results table, open the data preview, perform a distinct-value analysis, and see what share of my materials, out of a count of 25, are A, B and C. Now I can go back to my console and change the values to 70, 20, 10, execute, look at my results table again, do exactly the same thing, and we can see the difference the control table entries made to the results.

With the SAP HANA analysis process, BW users are able to use the functions of the SAP HANA database and can combine these with the functions in SAP BW.
It is available when the Predictive Analytics Library is installed, and it is recommended to use the HANA analysis process in preference to the Analysis Process Designer (APD). The APD, however, contains more functions and might therefore be a fallback if the functions are not available in the HANA analysis process.

Before we start looking at the analysis process, let's look at the data. I've already created the cube with the loads, and the data is also supplied in the downloadable material. The cube, CO-PA sales data, consists of the material and sold-to party and the key figures amount and quantity. I've also loaded master data for material and sold-to party, for which I have created InfoObjects. If you look at the material InfoObject, it consists of texts and master data, and the only attribute is the ABC attribute, which is a single character. We're going to store the ABC indicator in this InfoObject.

Now let's create the analysis process. We go to the setup of the analysis process, and I'm going to use the PAL InfoArea and say Create Analysis Process. The analysis process I'm going to create is simply going to be called ABC. In the overview we specify our InfoProvider; in this case it's going to be our CO-PA cube, which I just displayed. The data analysis is going to be a predictive analytics function, the ABC analysis, and our data target is going to be a DTP, because we want to update our material InfoObject. Under the data source we simply do the mapping from our source cube to the process: our item is the same as in our PAL function, which is the material, and the value is the quantity. Under the data analysis we enter our percentages, in this case 70, 20, 10. If they do not add up to 100, the system gives you an error message. Our data target will simply be the DTP, and we activate.
Once this is complete, I can go back to my InfoObject, go to my attributes and create a transformation. My source will be an SAP HANA analysis process and its name will be ABC, and I simply do the mapping as well, so the ABC class that I just calculated goes to my attribute, and I activate. Then I create the DTP, and it takes me back to the settings of my analysis process. In this case everything is OK, so I simply activate and execute, and the process is complete. If I go into my InfoObject now, the ABC indicator has been populated.

4. Exponential Smoothing Part 1: Welcome back. In this lecture we're going to cover a second analytics function, exponential smoothing. I'm first going to do a quick introduction to smoothing. Then we're going to run smoothing on HANA using PAL. And finally, we will run a smoothing analysis using the HANA analysis process on SAP BW.

So what is exponential smoothing? It is a very popular scheme for producing a smoothed time series. Whereas in single moving averages the past observations are weighted equally, exponential smoothing assigns exponentially decreasing weights as the observations get older. In other words, recent observations are given relatively more weight in forecasting than older observations. In the case of moving averages, the weights assigned to the observations are all the same and are equal to one over the number of observations. In exponential smoothing, however, there are one or more smoothing parameters to be determined, and these choices determine the weights assigned to the observations.

We're going to cover single, double and triple exponential smoothing. Single exponential smoothing is suitable for time series without trend and seasonality; the smoothed value is the weighted sum of the previous smoothed value and the previous observed value. Double exponential smoothing is used when there is a trend but no seasonality, and triple exponential smoothing models time series data containing seasonal elements. So let's move on to the analysis.

Before we continue to HANA, let's talk about the data set we'll use for exponential smoothing. Weather is always on everyone's mind, and with the drought in California and in Africa and major storms elsewhere, I thought, why not use HANA's predictive tools on some climate data? If nothing else, it will give us a decent set of real numbers to play with. Searching for the ENSO index, the El Niño Southern Oscillation index, we get a site providing us with the raw data for Niño 3.4. Here on the site, going to the external data, I simply copied and saved the data as a CSV, and we will use this data and upload it into SAP HANA.

In the PAL project in HANA I've created a folder for smoothing, and folders for the data files and table files we're going to need. First we have the ENSO raw CSV file, which I'm going to open with the text editor. This is simply the data copied from the website, saved to the CSV file and then copied into the HANA file. Next, we need the table we're going to load this data into. The definition of this table is in the raw .hdbtable file, and you'll notice that we've got a column year, which is the key, and then a column for each of the key figures. We're also going to create a second .hdbtable, which is the transposed view of the raw data. It contains the year, the month as an integer, which we're going to use for sorting, and the value. To be able to load the ENSO CSV data into our table, we need an ENSO raw .hdbti file, the table import file, which simply says which file goes into which table. To load the data, we simply activate it.
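The transposed layout described above, one row per year and month instead of one row per year with a column per month, can be illustrated with a short Python sketch. This is not the course's SQL; the function name `wide_to_long` and the sample numbers are invented for illustration and are not real index values.

```python
def wide_to_long(wide_rows):
    """Turn rows of (year, value_month_1, value_month_2, ...) into
    rows of (year, month, value), sorted chronologically."""
    long_rows = []
    for row in wide_rows:
        year, monthly_values = row[0], row[1:]
        # The month is stored as an integer so it can be used for sorting.
        for month, value in enumerate(monthly_values, start=1):
            long_rows.append((year, month, value))
    long_rows.sort(key=lambda r: (r[0], r[1]))
    return long_rows


# Made-up sample with only three "months" per year to keep it short.
raw = [
    (1951, 0.1, 0.2, 0.3),
    (1950, -1.0, -1.1, -1.2),
]
for record in wide_to_long(raw):
    print(record)
```

Keeping the month as an integer, as the lecture does, means the long table can be ordered by year and month to recover the time sequence that the smoothing algorithms need.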
Now, I've already done that, and to see the results in our table we go to the ENSO raw data table and open the data preview, and there are my loaded values. Now let's transpose this data. To do that, we simply run some SQL code. I'm going to open the SQL console and paste in the SQL, which is also provided as part of the downloadable material. Before I run the select, let's just truncate the table. Run it, and if I open the data preview, the table is empty. Now I'm going to paste in all the SQL code again and run it, and refreshing my table I can see that the values are now all in one column, and the month is simply there for sorting purposes.

The HANA Predictive Analysis Library Developer Guide contains all the information we need to be aware of when running these algorithms. It explains the algorithms and a little bit of the math, and it explains how to generate the procedure and how to call the procedure, as well as the parameters required in the generation and calling procedures, and so on. In the generation we require input, parameter and output table types, and we're going to cover the creation of the table types. We're also going to create the input data table using SQL, because we want a persistent table that is used by all three algorithms: single, double and triple exponential smoothing will use the same data table.

To create the data table, we're going to run some SQL, and that SQL is also included in this lecture. So we run the PAL smooth data table code. I'm simply going to cut and paste it into my SQL editor and execute it. Just to explain the code: first I drop the existing table, if there is one, and then I create the table with the time ID, which is an integer.
The ID is generated, in effect as the key, starting with one and incrementing by one, and my value is a double. If we go back to the Developer Guide, we see that the table requirement is that the first column is an integer, described as ID, although it could really be called anything we want, and the second column is either integer or double. We chose double because we've got decimals. Then we simply select from our ENSO table where the value is not null. We don't want any null values, because the whole procedure will fall over if there's a null value. Looking at the PAL smooth data table, open the data preview, and there are the values inserted into the table.

We're now ready to move on to the SQL code for the algorithms themselves. It's very similar to the ABC analysis we did in the previous lecture. We first create the table types, then create and fill the signature table with those table types, then create the procedure and fill the parameter table. In single exponential smoothing there are two parameters we're looking at: alpha and the forecast number, which is set to six. That's the number of periods we will create a forecast for. Then we create the statistics and results tables and call the procedure itself. I've executed this already, and we can see the entries in the statistics table.

Next is double exponential smoothing, which is pretty much the same as single exponential smoothing, although there's an additional parameter called beta. I've already executed that as well. Moving on to triple exponential smoothing, there's a host of parameters, and the two we particularly want to look at are the cycle and the seasonal mode. The Predictive Analytics Library provides a function called the seasonality test that provides those values to us. If you look at its results, it specifies that the cycle is 12 and the mode is additive.
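To make the roles of alpha, beta and the forecast number more concrete, here is a minimal Python sketch of single and double exponential smoothing. It is a textbook-style illustration under stated assumptions, not PAL's implementation: the function names, the start values (first observation for the level, first difference for the trend) and the sample series are all chosen for the example, and PAL may initialise or index its series differently. Triple exponential smoothing, which adds the seasonal component, is left out to keep the sketch short.

```python
def single_exp_smoothing(values, alpha, forecast_num):
    """Single exponential smoothing.

    Each smoothed value is a weighted sum of the previous observation and
    the previous smoothed value. Returns (smoothed, forecast).
    """
    smoothed = [values[0]]  # assumed start value: the first observation
    for x_prev in values:
        smoothed.append(alpha * x_prev + (1 - alpha) * smoothed[-1])
    # The last entry is the one-step-ahead forecast. Without a trend term,
    # every further forecast period simply repeats it.
    return smoothed[:-1], [smoothed[-1]] * forecast_num


def double_exp_smoothing(values, alpha, beta, forecast_num):
    """Double exponential smoothing (level plus trend). Returns the forecast."""
    level = values[0]               # assumed start level
    trend = values[1] - values[0]   # assumed start trend
    for x in values[1:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # Forecast h periods ahead by extending the last level along the trend.
    return [level + h * trend for h in range(1, forecast_num + 1)]


series = [10, 12, 14]  # made-up observations
_, flat = single_exp_smoothing(series, alpha=0.5, forecast_num=6)
print(flat)  # six identical values: a flat forecast line

print(double_exp_smoothing([1, 2, 3, 4, 5], alpha=0.5, beta=0.5, forecast_num=3))
```

The single smoothing forecast is the same for every future period because the model carries no trend or seasonal term, whereas the double smoothing forecast keeps moving along the estimated trend.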
We know from the documentation that the seasonal setting for additive mode is one, and the cycle is 12, so I can go ahead and execute the triple exponential smoothing as well. We can also look at the results of these smoothing algorithms, because each set of SQL code has its own results table. So we can look at the results of triple smoothing, for example, by opening the data preview, and we can see the results, including six periods of forecast. We can also compare the results of these different smoothing techniques by looking at the comparison table that I've created; the SQL code is again included in the downloadable material. We can see the results of triple, double and single exponential smoothing in this chart, with the original actual values as well. You can see that single exponential smoothing produces a straight-line forecast, because it basically reverts back to the last value, while double and triple exponential smoothing both show a trend, and triple exponential smoothing shows seasonality as well.

5. Exponential Smoothing Part 2: Welcome to the second part of the time series smoothing lectures. We will create a triple exponential smoothing analysis using the HANA Analysis Process. Before we create a HANA Analysis Process, let's discuss the data flow for data provisioning into the analysis process. Currently, we have a table in the PAL schema, which we want to use in BW for the HANA Analysis Process. In our scenario, we don't want to reload the data into BW; we want to use the same live data when we set up the HANA Analysis Process. To do so, we will create a few objects in HANA and BW. First, we need to set up the Smart Data Access layer. So what are we doing in the Smart Data Access layer? 
We are really delegating SQL access to the remote source, then fetching the result and working with it temporarily, on the fly, in the database layer via a virtual table in HANA. This happens in the background, so there's really nothing additional you have to take care of; SAP HANA manages this for you. From a BW perspective, we just have to create, once off, the connection to this remote source as a source system. I've already created it, so we will just display the settings. Let's set up the Smart Data Access layer in SAP HANA. I've already set it up, so we're just going to display the settings, but I'm going to take you through the steps. The BW system is SAP MEL, on my HANA system with an SID of HDC. Now we're going to go to Provisioning, Smart Data Access and Remote Sources, and you can see that I've already set up my SAP system as a remote source. Double-clicking on it, we get our settings, and it's important to note that when we create the Smart Data Access connection, we need to set the adapter type to HANA ODBC. We enter our server and port details, as well as the user details. So let's do the steps: under Remote Sources, we say New Remote Source, and on setting the adapter type to HANA ODBC, we'll see the correct driver, and we can enter the server, the port and the user. This user is important, because SAP HANA passes the SQL statement to this user, and we need to make sure that this user has the relevant authorizations in the schema for tables, queries and so on. I'm not going to save this. Once this is set up, we can go to the SAP BW system and create our SAP HANA system as a source system. I've already done that as well. So this is my SAP HDC system as a source system, and on right-clicking it and showing the connection parameters, you can see that it uses Smart Data Access and that the remote source is the SAP HDC system. 
We also set the DB owner, or schema, and that is SAP MEL, that is, SAP concatenated with the SID of my BW system, which is MEL in this case. I can check the settings, and everything is OK. The next step is that in HANA, we need a view in the schema of the BW system in order to create an Open ODS view on top of it. To create the view, we simply run some SQL code, and as usual, the SQL code is included in the downloadable material. Going back to SAP HANA, we have our smooth data table in our PAL schema. We can't use this directly to create an Open ODS view, because our Smart Data Access layer is connected to the SAP MEL schema. So we need to expose the data in SAP MEL, and to do that, we're going to create a view in the SAP MEL schema with the SAP MEL user. We're going to do that using SQL code. In my SQL console, I'm going to paste in the SQL code and execute it. Once this is executed, if I refresh my views, we can now see that we've got our PAL smooth data view here, and if I open the data preview, you can see that the data is available. The third step is to create the Open ODS view. Just to reiterate: in SAP BW, SAP HANA Smart Data Access is used as the source for the Open ODS view. So what is an Open ODS view? An Open ODS view is a virtual object that allows us to define analytical semantics without using InfoObjects. It allows us to use analytical functionality on top of external data structures. To create the view, I can go to BW or Eclipse, and in this case, I'm going to create it in BW. We go to the PAL InfoArea, right-click, and create an Open ODS view. I enter the technical name of the Open ODS view I want to create, and I'm going to call it Smoothing Data. The semantics is a fact table, and the type is a virtual table via HANA Smart Data Access. The remote source 
is my SAP HDC HANA system, and the source system is again my SAP HANA system. Then I specify the DB object name, and the DB object name is my view. So I'm going to go to HANA, select my view, simply copy the name into the object name, and create. Then I specify the view fields. I'm going to specify time as a key characteristic and the value as a key figure, and we can execute. Once it's executed, we can look at the associations. There are none in this case, because we're looking at a flat view, and the preview of the query shows us what the values are. Again, my key field is my time, and my key figure is my value. The fourth step is to create the CompositeProvider. The CompositeProvider is basically the successor of things like the InfoSet and the MultiProvider, and it allows us to add the BW Analytic Manager and its functionality on top of a native SAP HANA environment. Also, the HANA Analysis Process cannot use an Open ODS view as an InfoProvider, so the CompositeProvider is the perfect vehicle for that. I can only create the CompositeProvider in Eclipse. So in a suitable project for my BW system, I go to my Predictive Analysis Library InfoArea, right-click and say New CompositeProvider. The CompositeProvider I'm going to create is going to be called PAL CP1, with the description Smoothing Data. Next, I'm going to create a union provider, since there's only one InfoProvider involved in this CompositeProvider, and that is my PAL Open ODS view, and finish. The only step I need to do is the field mapping, and since there's only one InfoProvider, it's quite easy: I'm just going to drag time ID and value to the target, and then I can activate. Once I've activated the CompositeProvider, I can go to my BW system, refresh the view, and I can see my Smoothing Data CompositeProvider. Right-click, display data, and there is my data. So I'm ready now to create my analysis process. 
The analysis process for triple exponential smoothing is very similar to that of the ABC analysis we created. Under the SAP HANA Analysis Process tab, in our Predictive Analysis Library, or PAL, InfoArea, right-click and say Create SAP HANA Analysis Process. The process I'm going to create is TES. The InfoProvider I'm going to use is my CompositeProvider. In the function or script drop-down, it's triple exponential smoothing, and I'm going to put it into an analytic index. In the data source, we map our time ID to the ID and the value to the value. In the data analysis, we can change the parameters if we choose, and then under data target, I've got the time and the value. Let's now activate it and execute. Because this is an analytic index, I can use the BEx Query Designer to display the data. Once in BEx, I can say create a new query, select my triple exponential smoothing analytic index, and create the query, with the value in columns and the dates in rows. I save the query with a technical name and a description, execute it, and there are the results of my query.

6. Scenario: HR Analytics: Hello. In the next lectures, we're going to look at classification, and at decision trees and random forests in particular. Before we jump into the analysis, let me orient you to the process we're going to follow in the next few lectures. The analysis process will consist of the following steps. Data preparation: this step consists of loading the data, visualizing the data so we can get a feel for it, and then preparing the training and testing data. Training of the classification tree: here we will use the training data to build the classification trees. We're going to look at two trees, the CART and the C4.5 tree. We will also spend some time on the algorithms and on how to build these trees. Prediction with trees. 
In this step, we're going to use our test data to make predictions, to try to answer the question: which of our best employees are at risk of leaving? The problem we want to resolve is: why are the best and most experienced employees leaving prematurely, and can we predict which employees are at risk of leaving next? The data set is included in the downloadable materials for this lecture. The fields in the data set include the employee satisfaction level; the last evaluation; the number of projects; the average monthly hours; the time spent at the company; whether they've had a work accident or not, with zero denoting no accident and one denoting that an accident has occurred; whether they've had a promotion in the last five years, again zero or one; the department they work in; the salary; and then the predicted variable, whether the employee has left or not, with zero indicating that the employee has not left and one indicating that the employee has left. Now let's look at the data loaded in HANA. In HANA, under my PAL project and PAL repository workspace, I've created a new directory called classification, with the data and a table. Let's look at the .hdbtable definition of the employee table. There are a couple of things I want to highlight. The first is that employee is a character field. Next, our predicted variable, left, is also a character field, of length one. Satisfaction and last evaluation are both decimals. Projects, hours, tenure and accident are integers, as is promoted, and the rest are character fields. So let's look at what the data looks like in the table. I right-click on my PAL classification table, open the data preview, and this is the data. We're now ready to analyze the data. If you look at two records, why did the one employee go and the other stay? 
After all, their satisfaction levels are the same, and the performance evaluation during the last review is pretty much the same as well. Analyzing another set of records, again, why did the one employee leave and the other stay, given that the hours are almost identical? And so on and so forth. Now, plotting these records, we can start with a visual representation. The first question we need to ask is: by selecting any two attributes at random, can we create some sort of classification that could explain why some employees stay and some leave? Is there a line on this graph along which we can split the attribute values, and that gives us a reasonably accurate indication of who will stay and who will go? If I make the call and say that everyone with a satisfaction index of more than 0.5 will stay, I increase the usefulness of the analysis, in that there is then a distribution of probabilities as follows: 100% of those with a satisfaction index of 0.5 or more will stay, and this can be visualized by the histogram on the top right, showing the probability of C, the class, given a value, while two thirds of those with a satisfaction index of less than 0.5 will leave. While doing this, I've increased the information gain and reduced the entropy, but more on this in a later lecture. I could do this for each two-dimensional combination of attributes. I can even perform the following, by adding two splitting points in the other dimension. But which combination of attribute values gives me the most information gain, and will tell me the most about the probability that an employee will leave or stay? The question is how to choose the attributes for analysis, and we will answer this in the next lecture. But before we start building the trees, let's perform a preliminary analysis of the data using a larger data set. For this initial visualization, let's look at Lumira. 
As you know, I've already loaded the data in HANA. You could, of course, run the visualization using the attached CSV file, or by connecting to HANA. First, let's look at how to connect Lumira Desktop to HANA, and then run some basic visualizations. In order to access HANA data in Lumira, I need to create a view, and an analytic view will be sufficient. So let's look at how we create the view in HANA. I'm going to create my analytic view under my views folder. I already have a view here that I ran my Lumira analysis on, and I don't want to interfere with it, so for the purposes of the demo, I'm going to create a new one. So I go to New, Other, Analytic View, Next. I'm going to call it employees demo, click on the label to give it a label, and then Finish. Then, under the Data Foundation node, I'm going to add an object, and that's going to be my employees database table. OK. Then I simply double-click on each of the fields to add it to my columns, and I can save. Once it's saved, I'm going to go back into it and maintain the view properties, which is the default schema, and then I can activate it. My view is now active, and I can use it in Lumira. Now let's look at our data through Lumira. When you open Lumira, you get the recently opened documents, and you can see that I already have an analysis for HANA and one for CSV. But let's create a new one, and my data source is going to be SAP HANA. I could just as well choose the CSV file. Next, I enter the connection settings for my HANA system, and you can see the demo view that I've just created and the one I created earlier. Let's use the one created earlier. Next, I select the dimensions and the measures, and click Create. Once this has been created, I can start with the analysis. The first analysis I want to run is a scatter plot that shows the relationship between the hours worked and satisfaction. 
So we simply drag the hours worked and the satisfaction level onto my X and Y axes, and then I want to see it by employee, and by whether they've left or stayed. So I drag in employee, and Lumira is complaining about the maximum number of data points, so I'm going to filter it on salary, on the high and medium salary employees. The data points now show each of the employees in a different color, but I'm going to drag in employee left and put it above employee, and that will give me a good indication of who has stayed and who has left. You can see that all the employees with a very low satisfaction level and high working hours have left. There's a cluster of employees over here, and some over there as well, that have left, and the rest are scattered all over the place. So we need to do some more analysis to determine how we're going to perform that split, that virtual line that we're going to draw to ask the question: if the satisfaction is less than a certain amount, the employee leaves; if the hours are above a certain amount, does the employee stay or leave? So let's do another analysis. This time, I'm going to look at the number of years of service, that is, tenure, and that's going to be a scatter plot again, versus satisfaction. I'm going to do the same with employee and left, and filter. Here is a very clear indication: everybody with a tenure of seven years and above has stayed, and there's a cluster of employees where it is a mix. So employees with seven or more years of service have virtually no risk of leaving; the risk of employees leaving lies at seven years or less. So when we build our decision tree, we would expect one of the questions, one of the nodes, to be the tenure, or the years of service. The last visualization I want to do is to determine how many employees have stayed or left. That's a simple bar chart: I'm going to drag in left, and then I'm going to add a counter on employee. 
And that tells us that 11,428 employees have stayed and 3,571 employees have left. You can continue doing analysis according to the principles I've shown you in Lumira, but I'm going to carry on with building the decision tree.

7. Data Preparation: Hello. In this lecture, we're going to discuss a very important data preparation technique called binning. This function is primarily designed to reduce the complexity of the model, especially when it comes to classification functions such as decision trees. Binning has already been performed in the data set provided, on the salary: salary is rendered as either low, medium or high. For the purposes of our model, we want to reduce the complexity further by binning the last evaluation attribute into five categories. Employees with the highest performance will be binned as five, and employees with the lowest as one. In effect, what we're doing with this binning is turning the last evaluation attribute from a continuous value into a discrete value, and we can exercise more control over the classification of the attribute when building the tree. Before we look at our binning algorithm in more detail, let's look at the results. I've opened the employee data table in data preview mode and selected last evaluation for distinct value analysis, and we can see that there are 65 distinct values for last evaluation, each with a fairly equal number of employees. So the spread is very even. When we build a decision tree, we might end up with a very complex decision tree because of the number of distinct values in last evaluation, and we want to reduce the complexity of our decisions. We can do that with the binning algorithm, by binning the 65 distinct values into five values. In this case, we're going to build our bins with an equal number of employees, and that's why there's an equal spread across the bins. And we know that bin number five contains the top-performing employees. 
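The idea of bins with an equal number of employees can be sketched in a few lines of Python. This is only an illustration of equal-frequency binning, not the PAL binning procedure: the function names and the ten sample evaluation scores are hypothetical, and tied values are simply split by rank here, which may differ from how PAL handles ties.

```python
# Illustrative sketch of equal-frequency binning; not the PAL binning procedure.
# The sample evaluation scores are hypothetical.
from collections import defaultdict


def equal_frequency_bins(values, bin_count):
    """Give each value a bin number from 1 to bin_count with (nearly) equal bin sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, index in enumerate(order):
        # Ranks are spread evenly over the bins; the highest ranks land in the top bin.
        bins[index] = rank * bin_count // len(values) + 1
    return bins


def bin_means(values, bins):
    """Mean of the original values per bin, as used when smoothing by bin means."""
    grouped = defaultdict(list)
    for value, bin_number in zip(values, bins):
        grouped[bin_number].append(value)
    return {b: sum(members) / len(members) for b, members in sorted(grouped.items())}


last_evaluation = [0.55, 0.91, 0.48, 0.73, 0.62, 0.99, 0.37, 0.85, 0.66, 0.79]
bins = equal_frequency_bins(last_evaluation, bin_count=5)
means = bin_means(last_evaluation, bins)
print(bins)
print(means)
```

Each of the five bins receives two of the ten sample scores, and the two highest scores end up in bin five, which mirrors the statement that bin five holds the top performers.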
So in effect, we've got the top 20% of employees in bin number five. On building the decision tree, we now have a lot fewer categories to evaluate, and the decision tree as such will be a lot less complex. Let's look at the SQL code. It starts off with creating a data table type. When we look at the HANA documentation, we can see that the input table must have an ID and a data field. So we're going to select the ID of the employee as a character field, and the last evaluation as a double. The next steps are fairly similar. We create our table types and then add these table types to the signature table. We then call the wrapper procedure to generate the procedure. Looking at the parameter table, we're going to use binning method two, which gives an equal number of records per bin. Our smoothing method is zero, which is the mean, and we're going to create five bins. You can change these parameters according to your needs. We then select the employee ID and the last evaluation from the employee table into our binning data table, call the procedure, and get the results tables. When we run this, HANA gives us the results: we can see the ID of the employee, the bin it has been allocated to, and the mean for its bin.

8. Decision Trees (1) A bit of theory and math: Welcome to the next lecture. Now that we've done some data preparation, we're going to look at the various decision trees available to us. In particular, we will look at the CART and C4.5 decision trees. Let's look at our visualizations again. We have two classification classes, leavers and stayers, denoted by the blue and green dots. Now let's discuss how we can build the decision tree from this data. The first point of departure is that the whole data set is the root of the tree. From the results of our visualization, we know that the numbers of stayers and leavers are roughly in the ratio of 11 to 4, giving us a corresponding histogram. Let us assume that the tree has already been built. 
So as we go down the tree, we get different histograms, which represent the relative numbers of leavers and stayers. Assume we have a data point we want to test, an employee, if you will, denoted by the red dot. As he is propagated down the tree by asking the various questions, and we get to the end node, we can say with a certain probability whether he is a stayer or a leaver. In this case, from the histogram, we can see that the probability of him staying is much higher than that of him leaving. The sum of the probabilities in the histogram must equal 100%. Each node is representative of a question on an attribute. For example, the first node's question might be: is the employee working more than 120 hours a month? The attribute looked at here is the average monthly hours. Now let's look at how the C4.5 decision tree selects the attribute for each node. Let's say we have an equal number of stayers and leavers, in this case eight records, four of each. The tree on the left uses the average monthly hours as its first split attribute, while the tree on the right uses the time spent in the company. An ideal attribute splits the data set into subsets that are all positive or all negative. In our case, the decision tree on the right splits the data into equal numbers of stayers and leavers, giving us no better understanding of the data. The tree on the left gives us a better understanding of the data, as there are some nodes with only positive or only negative members. In order to make this decision about which attribute to use, we can use the concept of entropy, which is used by the C4.5 decision tree. Entropy is a measure of uncertainty, or likewise a measure of information. Suppose the total number of points is the total of the positive, P, and negative, N, points, or stay and leave in our case. The entropy of a binary variable is simply minus the probability of a positive occurrence times the log, to base two, of the probability of a positive occurrence, minus the probability of a negative occurrence times the log of the probability of a negative occurrence. For multi-class classification, with more than two classes, there will be more terms, but let's stick to binary classes for the time being. Another term is information gain, or reduction in entropy. What is the entropy of a node, and once the node is used, what is the remaining entropy of the children? What we want to see is: by making a decision on a node, how much did we reduce the entropy by? Let's apply this to a tree in a numerical example. First, we know the graph of the entropy function for two classes looks like this. The entropy is highest at 0.5, and the unit we measure it in is bits. So immediately we can see that the right-hand tree has a high entropy, since it gives us no better odds than 50-50. The left-hand tree has a lower entropy, and thus a higher information gain, which is simply one, the entropy of the parent, minus the entropy of the children. Given the two values of information gain for the left-hand and right-hand trees, we see that the left-hand tree has the higher information gain, and its attribute is thus the preferred attribute to perform the first split on. The calculation is done recursively, for all possible splits. The full expansion of the equation is in the downloadable materials to this lecture. Now let's quickly look at the CART tree. CART uses the Gini index to construct the tree. The Gini index can be defined as the expected error rate in the system. First, let's look at the data in the format of a table. The table contains eight records, with the time spent at the company and the hours worked. This is the same table used to construct the C4.5 tree, by the way. First, we calculate the probability of the outcome of each of the discrete decision attributes. For the data set as a whole, we have four leavers and four stayers, resulting in a half and a half. 
We then multiply the two probabilities to get a Gini index of 0.25. Now we calculate the Gini for each of the attributes. I have assumed a split at 120, as the selection of the split point for continuous data is more complex and not in the scope of this lecture. To calculate the Gini for hours, we look at the probability that the hours worked fall within each of the classification classes for hours. As the split point is 120 hours, two of the eight records' hours fall within the range of 120 or less, and six of the eight records' hours fall within the range of more than 120 hours. We then calculate the probability of stayers and leavers within each of these classes. For 120 or less, we have two stayers out of two records and no leavers. For more than 120 hours, we have two of the six remaining people staying and four of the six leaving. Multiplying the results out and then adding the two results gives us the Gini index for hours. We do the same for the time spent attribute. In this case, we assume that the split is at three years, so six out of eight instances fall within the class of three or less years, and two fall within the class of more than three years. We again take the probability of stayers and leavers within each of the classes and calculate the Gini. After calculating the Gini value for each attribute, we need to calculate the gain. The gain is what we use to determine the splitting attribute. To calculate the gain for an attribute, we subtract the Gini value of the attribute from the Gini value of the system as a whole. We take the attribute with the highest gain as the first splitting point. Starting with the highest-gain attribute, hours, we can split the tree into two nodes. The first node contains the two records that both stay. The node has a support of two and a confidence of 100%. From here onwards, the nodes are again calculated using the remaining population, iteratively. Let's see how SAP calculated the tree. 
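The split calculations walked through above can be reproduced with a short Python sketch. It uses the counts given for the hours split (two stayers and no leavers at 120 hours or less, two stayers and four leavers above 120). The child counts for the time-spent split are an assumption derived from the description (six records at three years or less, two above, each with equal numbers of stayers and leavers). The sketch also uses the standard Gini impurity, one minus the sum of squared class probabilities, which for two classes is exactly twice the product of the two probabilities used in the lecture, so the values are doubled but the ranking of the attributes is the same.

```python
# Illustrative sketch of entropy gain (C4.5-style) and Gini gain (CART-style).
# Counts are given as [stayers, leavers]; the time-spent counts are assumed.
from math import log2


def entropy(counts):
    """Entropy in bits of a node with the given class counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def gini(counts):
    """Standard Gini impurity: 1 minus the sum of squared class probabilities."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def gain(parent, children, impurity):
    """Impurity of the parent minus the size-weighted impurity of its children."""
    total = sum(parent)
    weighted = sum(sum(child) / total * impurity(child) for child in children)
    return impurity(parent) - weighted


parent = [4, 4]                       # four stayers, four leavers
hours_split = [[2, 0], [2, 4]]        # <= 120 hours, > 120 hours (from the lecture)
time_split = [[3, 3], [1, 1]]         # <= 3 years, > 3 years (assumed counts)

for name, split in (("hours", hours_split), ("time spent", time_split)):
    print(name,
          "entropy gain:", round(gain(parent, split, entropy), 4),
          "gini gain:", round(gain(parent, split, gini), 4))
```

Under these assumptions, hours has a positive gain on both measures while the time-spent split gains nothing, so both criteria would pick hours for the first split.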
We can see that SAP HANA's CART algorithm has also selected hours as the first node, with a split at 120. To test your understanding of the algorithm, calculate the next nodes and see if you agree with the HANA split. The full calculation is in the downloadable materials to this lecture. The code for the tree calculated here is also included in the downloadable material. Now let's move on and perform the analytics using our HR analytics data set.

9. Decision Trees (2) Running Decision Trees in HANA: Hello. Welcome to the next lecture in SAP HANA Predictive Analytics. In this lecture, we will continue with decision trees, and we will look at running the decision trees in SAP HANA. Before we run the algorithms in HANA, let's review where we are in the process. We started off with loading the data into the employee table. We then did some preliminary analysis of the data to get a feel for the data set, and then we prepared the data by binning the last evaluation ratings into five categories. We are now ready to run the algorithm. As usual, the algorithm is broken up into several sections, which are: declaration of data types, declaration of table types, inserting values into the signature table, running the wrapper procedure, selecting data into the data table, inserting parameters into the parameter table, and then running the decision tree procedure and selecting the results. I want to highlight some of these sections in more detail and discuss how they will influence the results. First, looking at the data table declaration: since we have performed binning on the last evaluation attribute, we will set the type of the last evaluation to integer. That turns the attribute into a categorical attribute, as opposed to a continuous one. The rule is that attributes defined as integer or string are treated as categorical, while attributes defined as double are treated as continuous, and we will discuss the effect of categorical versus continuous data. 
That comes in a later lecture. There is a wide variety of parameters that influence a tree. Let's look at some of them. Max depth is a stop condition that determines the depth of the tree. Then there is the split threshold: if the information gain is less than this value, the algorithm will stop splitting. Min records of leaf and min records of parent are two more stop conditions that determine the size of the tree. Now let's look at our data selection. Since we've binned the data as categorical, we set the last evaluation to integer, and we need to join our employee table with the binning data. Now let's run our decision tree, our CART tree, in HANA. I've already discussed the sections in the lecture, so we start off with the types, the signature table and the parameter table. Let's just quickly look at the parameters: I've set max depth to five, and I've also set IS_OUTPUT_RULES to one, because I want to see what the tree looks like. After we've executed this SQL, we can look at the tree, and this is what the tree model looks like. It says that for the satisfaction level, the split is at 0.465, and the number of projects is in the next level, as well as tenure, hours and so on. So it tells us, based on these parameters, whether the employee will stay or leave. If you look at the statistics table, it's got two classes. The first class holds the values that the employee record has at the moment: these ones have left, these ones have stayed. So this is checked against the existing records. Class two is the classification that the decision tree has arrived at. So it shows that for employees who have left, the tree also predicts that they will leave, and there are 3,295 of them. The second row is the existing employees that have left, where the tree shows they should have stayed, and there are 266 of them. Then come those who have stayed, where the tree predicts that they will leave: 157. We can say those are the ones at risk. And then those who have stayed, and who the tree predicts will still stay: 11,271. 
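The four counts read from the statistics table form a confusion matrix, and a few summary figures follow from them directly. The Python sketch below only re-uses the counts quoted above; the variable names are made up for illustration, and the metrics are standard definitions rather than anything the PAL procedure outputs itself.

```python
# Illustrative sketch: summary metrics from the four statistics-table counts.
# Keys are (actual class, predicted class); names are hypothetical.
confusion = {
    ("left", "left"): 3295,       # left, and the tree predicts they leave
    ("left", "stayed"): 266,      # left, but the tree predicts they stay
    ("stayed", "left"): 157,      # stayed, but the tree predicts they leave (at risk)
    ("stayed", "stayed"): 11271,  # stayed, and the tree predicts they stay
}

total = sum(confusion.values())
correct = confusion[("left", "left")] + confusion[("stayed", "stayed")]
accuracy = correct / total

# Share of actual leavers that the tree identifies as leavers.
actual_leavers = confusion[("left", "left")] + confusion[("left", "stayed")]
recall_leavers = confusion[("left", "left")] / actual_leavers

# Share of predicted leavers that really left.
predicted_leavers = confusion[("left", "left")] + confusion[("stayed", "left")]
precision_leavers = confusion[("left", "left")] / predicted_leavers

at_risk = confusion[("stayed", "left")]

print(f"records: {total}")
print(f"accuracy: {accuracy:.3f}")
print(f"recall (leavers): {recall_leavers:.3f}")
print(f"precision (leavers): {precision_leavers:.3f}")
print(f"current employees predicted to leave: {at_risk}")
```

The group of interest for the next step is the third cell: employees who are still with the company but whom the tree classifies as leavers.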
So the last step is to test my data against the decision tree. From the previous step, we found out that 157 of our employees who have stayed are at risk of leaving, but not all of them are high performers. I want to see who the high performers are that are predicted to leave. To do that, SAP provides a scoring procedure, the PAL DT scoring procedure. The setup is very similar to the previous step, but I'm selecting from my employee table the employees where the last evaluation is high, that is, my high performers who are currently staying. After running the procedure, SAP places the results into the scoring table, and I'm interested in those employees with a score of one, so those that are predicted to leave. Then I simply select, with a join from the employee table and the scoring table, the employees that are staying and that are predicted to leave. After I run this, I get a list of my high-performing employees who are staying at the moment and who are predicted to leave. Now let's look at the results. The satisfaction rating is the first split, and it resulted in the following branching of our tree. We can see that there are three branches to satisfaction: above 0.465, equal to 0.465, and less than 0.465. Our Lumira visualization gives us a better indication of the splits as well. The split is set at 0.465, and for good measure, let's put in some split lines, roughly according to the tree results. Now let's color some of the areas according to stay and go, green and blue. To keep our sanity and to make things easier to understand, we'll stick with two dimensions, hours and satisfaction, for now.

10. Decision Trees (3) Effect of Categorical vs Continuous Data: Hello. Welcome to the next lecture in SAP HANA Predictive Analytics. In this lecture, we will discuss continuous and categorical attributes and the effect they have on the decision tree. 
Looking back at our data declaration, we defined last evaluation as an integer, which treats the attribute as a categorical attribute, but that was after binning. Let's look at a scenario where we change the hours attribute, which so far has been continuous. From the previous lecture, we saw that the split is pretty much determined by the satisfaction level and hours, and we could draw the split lines fairly easily on the Lumira visualization. Now let's change the hours attribute to an integer, and thus categorical. Running the SQL again, we get the following. Classifying hours as categorical creates a classic case of overfitting, where the model potentially describes each case of the data, resulting in a model that is not very useful for predictions. It is also virtually impossible to draw these classifications on the Lumira visualization, as we have done in the case where hours are continuous