Apache NiFi Complete Guide - Part 1 - Apache NiFi Basics & Installation | Manoj GT | Skillshare

Manoj GT, Big Data Evangelist & JavaScript Lover

18 Lessons (1h 23m)
    • 1. Course Introduction

      0:47
    • 2. What is a Data Flow, Data Pipeline & ETL?

      2:33
    • 3. Why should we use a Framework for Data Flow?

      3:19
    • 4. What is Apache NiFi?

      1:31
    • 5. Installing NiFi in a Mac

      3:38
    • 6. Installing NiFi in a Windows Machine

      3:11
    • 7. NiFi User Interface

      3:37
    • 8. Core NiFi Terminologies

      4:58
    • 9. More on FlowFiles

      2:31
    • 10. Types of Processors Available in NiFi

      4:31
    • 11. Processor Configurations, Connections & Relationships in NiFi

      8:05
    • 12. Connection Queue & Back Pressure in NiFi

      7:06
    • 13. Working with Attributes & Content in NiFi

      10:41
    • 14. Working with Expression Language in NiFi

      5:44
    • 15. More on Expression Language Functions in NiFi

      4:39
    • 16. Working with Process Group, Input Port & Output Port in NiFi

      8:08
    • 17. Working with Templates

      4:05
    • 18. Apache NiFi Working with Funnel in NiFi

      3:56
About This Class

Apache NiFi Complete Guide - Part 1 - Apache NiFi Introduction & Installation

What is Apache NiFi?

Apache NiFi is a robust open-source framework for data ingestion and distribution, and more. It can propagate any data content from any source to any destination.

NiFi is based on a different programming paradigm called Flow-Based Programming (FBP). I’m not going to explain the definition of Flow-Based Programming. Instead, I will explain how NiFi works, and then you can connect it with the definition of Flow-Based Programming.

It is one of the fastest-growing Apache projects and is expected to grow exponentially in the coming years.

How Does NiFi Work?

NiFi consists of atomic elements that can be combined into groups to build simple or complex dataflows.

NiFi has Processors & Process Groups.

What is a Processor in NiFi?

A Processor is an atomic element in NiFi that performs one specific task.

The latest version of NiFi at the time of recording (1.8.0) has 280+ processors, each with its own responsibility.

For example, the GetFile processor reads files from a specific location, whereas the PutFile processor writes files to a particular location. There are many other processors like these, each with its own purpose.
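
To make the GetFile and PutFile pairing concrete, here is a rough Python sketch of what the two processors do together. This is not NiFi code: the function names (get_files, put_file, run_once) are made up for illustration, and a real flow would be configured in the NiFi UI rather than written by hand. The sketch assumes files are moved rather than copied, which matches the transcript's observation that the file disappears from the source folder.

```python
import shutil
from pathlib import Path
from typing import List


def get_files(source_dir: Path) -> List[Path]:
    """Roughly what GetFile does: pick up every regular file in a folder."""
    return sorted(p for p in source_dir.iterdir() if p.is_file())


def put_file(file_path: Path, dest_dir: Path) -> Path:
    """Roughly what PutFile does: deliver a file into a destination folder."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / file_path.name
    shutil.move(str(file_path), str(target))  # the source copy is removed
    return target


def run_once(source_dir: Path, dest_dir: Path) -> List[Path]:
    """One pass of the two-step flow: pick up files, then deliver them."""
    return [put_file(p, dest_dir) for p in get_files(source_dir)]
```

Calling run_once(Path("input"), Path("output")) would move whatever is currently in input; NiFi repeats that kind of pass on a schedule instead of running it once.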

We have processors to Get Data from various data sources and processors to Write Data to various data sources.

The data source can be almost anything.

It can be a SQL database server like Postgres, Oracle, or MySQL; a NoSQL database like MongoDB or Couchbase; a search engine like Solr or Elasticsearch; or a cache server like Redis or HBase. NiFi can even connect to a Kafka message queue.

NiFi also has a rich set of processors to connect with Amazon AWS entities like S3 buckets and DynamoDB.

NiFi has a processor for almost everything you typically need when working with data. We will go deep into the various types of processors available in NiFi in later videos. Even if you don’t find the right processor to fit your requirement, NiFi gives you a simple way to write your own custom processors.

Now let’s move on to the next term, FlowFile.

What is a FlowFile in NiFi?

The actual data in NiFi propagates in the form of a FlowFile. A FlowFile can contain any data, such as CSV, JSON, XML, or plain text, and it can even hold SQL queries or binary data.

The FlowFile abstraction is the reason NiFi can propagate any data from any source to any destination. A processor can process a FlowFile to generate a new FlowFile.
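
As a mental model only, a FlowFile can be pictured as a small record with two parts, which the lesson 9 transcript calls content and attributes. The Python sketch below is a hypothetical stand-in, not NiFi's actual Java classes: the FlowFile dataclass and the update_attribute and to_upper helpers are invented names. It illustrates the distinction made in the course between changing only the metadata and changing the payload.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class FlowFile:
    """Hypothetical stand-in for a FlowFile: a payload plus key-value metadata."""
    content: bytes
    attributes: dict = field(default_factory=dict)


def update_attribute(flow_file: FlowFile, key: str, value: str) -> FlowFile:
    """Touch only the metadata; the same content object is reused, not copied."""
    return replace(flow_file, attributes={**flow_file.attributes, key: value})


def to_upper(flow_file: FlowFile) -> FlowFile:
    """Change the payload; the result carries new content."""
    return replace(flow_file, content=flow_file.content.upper())
```

Because the content is opaque bytes, the same two helpers would work whether the payload is CSV, JSON, or an image, which is the point of the abstraction.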

The next important term is Connections.

What is a Connection in NiFi?

In NiFi, processors can be connected to create a data flow. The link between two processors is called a Connection. Each connection also acts as a queue for FlowFiles.
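
The idea of a connection doubling as a queue can be sketched in a few lines of Python. Everything here is illustrative: the Connection class and the upstream and downstream functions are invented names, and plain first-in-first-out ordering is an assumption made to keep the sketch simple.

```python
from collections import deque


class Connection:
    """Hypothetical model of a connection: a queue sitting between two processors."""

    def __init__(self) -> None:
        self._queue = deque()

    def enqueue(self, flow_file) -> None:
        self._queue.append(flow_file)

    def dequeue(self):
        # Return the oldest waiting item, or None when the queue is empty.
        return self._queue.popleft() if self._queue else None

    def __len__(self) -> int:
        return len(self._queue)


def upstream(connection: Connection, items) -> None:
    """The producing processor drops its output into the connection."""
    for item in items:
        connection.enqueue(item)


def downstream(connection: Connection) -> list:
    """The consuming processor drains the queue at its own pace."""
    results = []
    while len(connection):
        results.append(connection.dequeue().upper())
    return results
```

The two processors never call each other directly; items simply wait in the connection until the downstream side is ready, which is why a stopped downstream processor shows a growing queue in the UI.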

The next terms are the Process Group and the Input and Output Ports.

What are Process Group, Input Port & Output Port in NiFi?

In NiFi, one or more processors are connected and combined into a Process Group. When you have a complex dataflow, it’s better to combine processors into logical process groups. This helps in better maintenance of the flows.

Process Groups can have input and output ports which are used to move data between them.

The final term you should know for now is the Controller Service.

What is a Controller Service in NiFi?

Controller Services are shared services that can be used by Processors. For example, a processor which gets and puts data to a SQL database can have a Controller Service with the required DB connection details.

Controller Service is not limited to DB connections.
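
One way to picture a Controller Service is as a single shared object that holds the connection details, so that individual processors do not each carry their own copy. The Python sketch below uses the standard-library sqlite3 module purely as a stand-in database; DbConnectionService, put_rows, get_rows, and the events table are hypothetical names, not NiFi APIs.

```python
import sqlite3


class DbConnectionService:
    """Hypothetical shared service that owns the DB connection details."""

    def __init__(self, database: str) -> None:
        self._connection = sqlite3.connect(database)

    def connection(self) -> sqlite3.Connection:
        return self._connection


def put_rows(service: DbConnectionService, rows) -> None:
    """A 'put' processor: writes rows using the shared service."""
    conn = service.connection()
    conn.execute("CREATE TABLE IF NOT EXISTS events (name TEXT)")
    conn.executemany("INSERT INTO events (name) VALUES (?)", [(r,) for r in rows])
    conn.commit()


def get_rows(service: DbConnectionService) -> list:
    """A 'get' processor: reads rows back through the same service."""
    cursor = service.connection().execute("SELECT name FROM events ORDER BY rowid")
    return [row[0] for row in cursor.fetchall()]
```

Both functions receive the same service instance, so changing the database location means changing it in one place only.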

To learn more about Apache NiFi, kindly visit my YouTube Channel. I have created a Playlist, especially for Beginners.

Transcripts

1. Course Introduction: Hi, guys. Welcome to my course. It's great to have you on board. A few months back, I came across this awesome tool called Apache NiFi, which just blew my mind. The reason is the various features NiFi has and how easy it is to use. In this course, you will learn NiFi from scratch. No prior knowledge is required. We will start from the basics and dive deep into the most advanced concepts available in NiFi. My name is Manoj, and I'll be your instructor for this course. I promise you'll never regret taking this course, and it will be a complete guide to learning NiFi. So buckle up and get ready for the NiFi journey.

2. What is a Data Flow, Data Pipeline & ETL?: Hi, guys. Welcome back. I have created this course for people from different backgrounds who have different skill sets. So before we start working with NiFi, I would like to explain a few basic terms you need to know if you are entirely new to this ecosystem. Let's start with the first term, data flow. What is a data flow? The term data flow is usually used to refer to the moving of data from source to destination. Here, the data can be of any format. It can be CSV, JSON, XML, or logs in plain text format. It can even be HTTP data or binary data like images and videos. The data can also be sensor data from your IoT devices or any other telemetry data. Now, let's move on to the next definition, data pipeline. What is a data pipeline? The term data pipeline is used to refer to the movement of data from source to destination where transformation is done along the way. The primary difference between a data flow and a data pipeline is the intermediate transformation that can happen along the way during the movement of data from source to destination. The next term is ETL. ETL is also used to refer to the movement and transformation of data from source to destination.
You may be thinking both of these definitions look similar, so let's see the difference between a data pipeline and ETL. ETL is usually used to refer to data movement and transformation happening in batch fashion; that is, we read the data from the source every four hours or six hours and process it, whereas data pipeline is more of a generic term, which can be used to refer to data movement happening in both stream and batch fashion. Though the definitions of all three terms look similar and we may sometimes use them interchangeably, it's essential to understand the difference between them. Well, that's it. We are done with the basic definitions, and we're good to start with NiFi. Thank you. See you in the next video.

3. Why should we use a Framework for Data Flow?: Hi, guys. Welcome back. In this video, we will learn why we should use a framework for data flow. As you know, data flow is nothing but moving data from a source to a destination. At a high level, this looks like a simple problem to solve. It's easy to transfer data from point A to point B. You can do this with any programming or scripting language of your choice. For example, you can write a Java program or a Python script to move data from point A to point B. But before you start trying to do something like that, you need to understand the challenges you might face going down that route. For that, you should learn about the four V's. So what are these four V's? When we deal with data, especially big data, there are four V's you should know about. The first V is volume. This refers to the vast amount of data generated every second in the current era. The second V is velocity. This refers to the speed at which data moves and the rate at which new data gets generated. The third V is variety. This refers to the different types of data that get generated and that we can use. The fourth and final V is veracity. This refers to the messiness or integrity of the data.
Now that you know about the four V's, let us discuss the considerations you should take into account while implementing such a solution. The solution you implement should support multiple data formats like CSV, JSON, plain text, images, videos, and so on. The solution should also support various types of sources and destinations; for example, the data can come from FTP, HTTP, or SQL and NoSQL databases. The data can also come from search engines and cache servers. The solution should be scalable and reliable for large-volume and high-velocity data. In the current digital era, the size of the data produced is enormous. It's not only that the amount of data is becoming huge; the speed at which information gets generated has also changed. So any program or script you write should have high throughput and low latency. The solution should also consider data cleansing and data validation problems. The accuracy and formatting of the data, and the noise you get from any incoming data, will also decide the design of your solution. Now you know the challenges you are dealing with. Will you still want to write a program or script to solve this problem? What if I say we have a framework which has solved this problem already, and more? Will you be interested in learning it? Welcome to Apache NiFi, a robust open-source data ingestion and distribution framework.

4. What is Apache NiFi?: Hi, guys. Welcome back. In this video, we will learn what Apache NiFi is. Let me first quote the official definition given on their website. The official definition goes like this: Apache NiFi supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic. This definition will be very complicated to understand at first. But once you get hands-on experience in NiFi, it will make perfect sense. Until then, to put it in simpler terms, we will go with a different definition.
This definition is made up by me to explain what NiFi is in simpler words. The second definition goes like this: NiFi is a tool which is built to automate the flow of data between systems. It can propagate any data from any source to any destination. For now, let's go with this one. I know the last sentence is such a bold one to quote, but I'm 100% sure that once you finish this course, you will agree with me on the same. That's it for this video, guys. I hope that with the help of the second definition, I was able to explain what NiFi is in layman's terms. Thank you. See you in the next video.

5. Installing NiFi in a Mac: Hi, guys. Welcome back. In this video, we will go through the installation of NiFi on a Mac. If you are using a Windows machine, feel free to skip this video. Installing NiFi on any operating system is pretty similar and straightforward. The reason behind that is NiFi doesn't come with a separate installable based on your OS type. You can download the NiFi binary from their website and start using it with your OS-specific startup file. The only prerequisite is that you should have Java installed on your machine and set your JAVA_HOME correctly. In this video, I'll be installing NiFi on a Mac, but you can follow along even if you are using any Linux or Unix-based operating system. Now let's get started and install NiFi on your Mac. First, let's grab the binary from NiFi's website. To do so, let's Google for NiFi and click on the first link. It will take us to NiFi's official website. Here, you can click on the Downloads link and select Download NiFi. At the time of recording this video, the latest NiFi version available is 1.8.0. We can go ahead and download this version. You can see we have sources and binaries; the one we are interested in is the binary version. Feel free to download the file type of your choice.
Since we are on a Mac, I prefer to download the .tar.gz version. I have already downloaded NiFi and placed it in my user's Tools folder. So let's go ahead and untar it. Now it's untarred. Let's take a look at the folder structure. The names of the folders are self-explanatory, and I promise we will go through all these folders by the end of this course. For now, the folder we are interested in is the bin folder. So let's go inside and take a look. As I told you, we have the OS-specific startup files and other utility files. Let's open a terminal here and start NiFi using nifi.sh start. That's it. It will take a couple of minutes for NiFi to boot up. You can check the logs under the logs folder for that as well. NiFi starts with the default port 8080. Let's go to our browser and type localhost:8080. Great. We can see an error page. Don't worry about the error message. To access NiFi, we should actually use localhost:8080/nifi. That's it. We can simply click on the link here in the error message as well. It will redirect us to the NiFi landing page. That's it, guys. We have successfully installed NiFi on our Mac. Thank you. See you in the next video.

6. Installing NiFi in a Windows Machine: Hi, guys. Welcome back. In this video, we will go through the installation of NiFi on a Windows machine. If you are using Mac or Linux, feel free to skip this video. Installing NiFi on any operating system is pretty similar and straightforward. The reason behind that is NiFi doesn't come with a separate installable based on your OS type. You can simply download the NiFi binary from their official website and start using it with your OS-specific startup file. The only prerequisite is that you should have Java installed on your machine and set your JAVA_HOME correctly. Now let's get started and install NiFi on your Windows machine. First, let's grab the binary from NiFi's website.
Let's Google for NiFi and click on the first link. It will take us to the Apache NiFi website. Here, you can click on the Downloads link and select Download NiFi. At the time of this recording, the latest NiFi version available is 1.8.0. We can go ahead and download it. You can see we have sources and binaries. Let's grab the binary version. Feel free to download any file type of your choice. Since we are on Windows, I prefer to download the zip version. I have already downloaded the file and placed it in my D drive under the Tools folder. So let's go ahead and unzip it. Now it's unzipped. Let's take a look at the folder structure. The names of the folders are self-explanatory, and I promise we will go through all these folders by the end of this course. For now, the folder we are interested in is the bin folder. Let's go inside it and take a look. As I told you, we have OS-specific startup files and other utility files. Let's open a command prompt here and start NiFi using the run-nifi.bat file. That's it. It will take a couple of minutes for NiFi to boot up. You can check the logs under the logs folder for that as well. NiFi starts with the default port 8080. Let's go back to our browser and type localhost:8080. Great. We can see a page from NiFi with an error. Don't worry about the error message. To access NiFi, we should actually use localhost:8080/nifi. That's it. We can simply click on the link here in the error message. It will redirect us to the NiFi landing page. That's it, guys. We have successfully installed NiFi on our Windows machine. Thank you. See you in the next video.

7. NiFi User Interface: Hi, guys. Welcome back. I know what you're thinking: will we ever do something practical in this course, or is the class going to be entirely theoretical, right? Don't worry.
I loaded this course with tons of practical examples, but to make your learning journey smooth and easy, I've structured the course with well-balanced theory lectures as well. Okay, let's get started. In this video, we will get a feel for how the user interface of NiFi looks and works. Some of the terms used in this video will be alien to you at first, but bear with me for a few more lectures and it will start to make more sense. One of the best features of NiFi is its easy-to-use user interface. As you can see, it's loaded with tons of tools and options. You can drag and drop the components you want, connect them, and configure them to create a simple or complex data flow. For example, let's add two processors, GetFile and PutFile, and configure them to copy any files from the source to the destination. To add a processor or any other NiFi component, we simply select it from the top menu and drag and drop it onto the canvas below. In this video, I'll not be explaining the configurations. This video is instead meant to show how easy it is to create a data flow using the NiFi UI. Here, I'll configure the source path to point to a folder named input and the destination path to point to a folder named output. Both these folders are inside my user's Tools folder. That's all, and as you can see, within a matter of seconds, I was able to finish this. It's that simple. Now let's start and test our data flow. First, let me start both the processors and see what happens. You can see we are getting a play button, which denotes that both these processors are running. Let me copy a file to the source folder and see what happens. The file can be of any format. Did you see that? The file just disappeared. The reason is that the GetFile processor is continuously listening to the configured source path, and as soon as it detects a file's presence, it moves it to the destination path with the help of the PutFile processor.
Let's go and check the output folder to validate this. Voila! The file I copied into the input folder is now available in the output folder. That's it for this video, guys. I hope you got a glimpse of the options available in the NiFi UI and how easy it is to use. We will start to dive deep into each of these options one by one in the upcoming videos. Thank you. See you in the next video.

8. Core NiFi Terminologies: Hi, guys. Welcome back. In this video, we will go through some of the core NiFi terminologies you need to know to understand how NiFi works. NiFi is based on a different programming paradigm called flow-based programming. I am not going to explain the definition of flow-based programming. Instead, I will explain how NiFi works. Then you can connect it with the definition of flow-based programming. Okay, let's get started. NiFi consists of atomic elements which can be combined into groups to build simple or complex data flows. NiFi has processors and process groups. What is a processor? A processor is an atomic element in NiFi which can do some specific task. The latest version of NiFi has 280+ processors, and each has its own responsibility. For example, the GetFile processor can read a file from a specific location, whereas the PutFile processor can write a file to a particular location. Like this, we have many other processors, each with its unique aspect. We have processors to get data from various data sources and processors to write data to various data sources. The data source can be almost anything. It can be any SQL database server like Postgres, Oracle, or MySQL, or it can be a NoSQL database like MongoDB or Couchbase. It can also be a search engine like Solr or Elasticsearch, or it can be a cache server like Redis or HBase. It can even connect to a Kafka messaging queue.
NiFi also has a rich set of processors to connect with Amazon AWS entities like S3 buckets and DynamoDB. NiFi has a processor for almost everything you need when you typically work with data. We will go deep into the various types of processors available in NiFi in later videos. Even if you don't find the right processor that fits your requirement, NiFi gives you a simple way to write your custom processors. Now let's move on to the next term, FlowFile. What is a FlowFile? The actual data in NiFi propagates in the form of a FlowFile. The FlowFile can contain any data, say CSV, JSON, XML, or plain text, or it can even be SQL queries or binary data. The FlowFile abstraction is the reason NiFi can propagate any data from any source to any destination. A processor can process a FlowFile to generate a new FlowFile. The next important term is connections. In NiFi, all processors can be connected to create a data flow. This link between processors is called a connection. Each connection between processors can act as a queue for the FlowFiles as well. The next one is the process group and the input and output ports. In NiFi, one or more processors are connected and combined into a process group. When you have a complex data flow, it's better to combine processors into logical process groups. This helps in better maintenance of the flow. A process group can have input and output ports, which are used to move data between them. The last and final term you should know for now is controller services. Controller services are shared services that can be used by the processors. For example, a processor which gets and puts data to a SQL database can have a controller service with the required DB connection details. Controller services are not limited to DB connections. We will see more uses of controller services in later videos. That's it for this video, guys.
I hope you got a fair understanding of some of the critical components used in NiFi. We have more terminologies and component types in NiFi, which we will learn throughout this course. For now, these key components are enough for you to get started. Thank you. See you in the next video.

9. More on FlowFiles: Hi, guys. Welcome back. In this video, we will learn about FlowFiles in detail. FlowFiles are the most crucial part of NiFi, so let's spend some more time on them. A FlowFile is data. It's composed of two components: content and attributes. Content is the actual data itself. Attributes are key-value pairs which contain information about the content. These are the metadata of the FlowFile. This metadata can be the creation date, the filename, or where the data is from and what information it represents. A processor can either manipulate the attributes of a FlowFile, say update, add, or remove attributes, or it can change the content of the FlowFile, or it can do both. As you know, NiFi is based on flow-based programming. Here, the entire concept is to use the components provided by NiFi to manipulate the attributes and content of the FlowFiles to get the required output or data flow of your choice. One final important detail you should know about a FlowFile is that it is persisted on disk and passed by reference. Let me explain this in more detail. Whenever a new FlowFile is generated by a processor, it gets immediately persisted to disk, and NiFi will just pass the reference of the FlowFile to the next processor. A new FlowFile will be created only when you update the content inside the existing FlowFile or when you ingest new data from a source into a processor. New FlowFiles will not be created when you manipulate just the attributes of the FlowFile. This is one of the critical details you should know to develop an efficient data flow using NiFi. That's it for this video.
Guys, I hope you got a fair understanding of FlowFiles in NiFi. Don't worry even if you are not able to understand this completely; you will get a clear understanding once we start the hands-on with NiFi. Thank you. See you in the next video.

10. Types of Processors Available in NiFi: Hi, guys. Welcome back. In this video, we will go through the types of processors available in NiFi. To create an effective data flow in NiFi, the user must understand the various types of processors available for them to use. NiFi contains different types of processors out of the box. These processors provide capabilities to ingest data from numerous data systems; route, transform, process, split, and aggregate data; and distribute data to many systems. The latest version of NiFi has 280+ processors, and this number increases in each release, so I will not attempt to name and explain each of the processors that are available. Instead, I will highlight some of the most commonly used processors by name, categorizing them by their functionality. Let's get started. The first type of processors is data ingestion processors. These are the processors which will help us to ingest data from various data sources. As you can see, NiFi supports a rich set of processors to ingest data from almost all the popular data sources currently available in the market. The next type of processors is data transformation processors. These are the processors which will help us to transform data into various formats according to our requirements. The next type of processors is data egress, or sending data, processors. These are processors which will help us to send the processed data to various types of destination systems. The next type of processors is routing and mediation processors. These are the processors which will help us to conditionally change the way a FlowFile is to be processed. The next type of processors is database access processors.
These are some of the commonly used processors to access a database. The next type of processors is attribute extraction processors. These are the processors which will help us to extract and manipulate the attributes of a FlowFile from its content, from other existing attributes, or both. Usually, these processors will provide the right set of attributes, which can be used with routing and mediation processors. The next type of processors is system interaction processors. This set of processors in NiFi lets us run an OS-specific command specified by the user and writes the output of that command to a FlowFile. The next type of processors is splitting and aggregation processors. These processors help us to split or aggregate data according to our requirements. The next type of processors is HTTP and UDP processors. NiFi can even ingest data or send data using the HTTP and UDP protocols. The next and final type of processors is Amazon Web Services processors. NiFi comes with a rich set of processors to communicate with your AWS entities as well. That's it for this video, guys. I hope you got a glimpse of the various types of processors available in NiFi. The names of these processors are self-explanatory. You will get a better understanding of all of them once we do the hands-on going forward. That's exactly what we will start to do. Thank you. See you in the next video.

11. Processor Configurations, Connections & Relationships in NiFi: Hi, guys. Welcome back. The central philosophy of NiFi is configuration over coding, so it's essential for us to understand some of the standard configurations of NiFi processors. All processors in NiFi have a set of standard configuration properties and a set of unique configuration properties. In this video, we will mainly go through the common properties available for us to use. Let's get started. We can use the previously created flow to copy files from source to destination in this video.
The first and fundamental property of a processor is the Name property. This helps us to define a more meaningful name for the processor. Once the flow becomes dense, this property will help us to understand the significance of each processor and understand the flow better. The next set of configurations is for scheduling a processor. You can schedule a processor to run at a regular interval according to your requirement using these configurations. As you can see here, the scheduling strategy is timer driven and the run schedule is zero seconds. This means this processor will run continuously without any gap. Let's go ahead and change the run schedule value to five seconds. Now the GetFile processor will look for a file inside the configured source folder only every five seconds. Here we are using seconds, but you can also use minutes and so on. We also have a CRON-driven scheduling strategy, where we can use any valid cron expression. We can also control the number of tasks that should be concurrently scheduled using the Concurrent Tasks property. To demonstrate this, we will quickly create a simple example using the GenerateFlowFile processor. The GenerateFlowFile processor is used to create FlowFiles with random data or custom data. This will come in handy for any quick testing or simulation of an actual data flow. We will be using this a lot in this course to demonstrate the working of other processors or components in NiFi. To add a GenerateFlowFile processor, you can simply click on the processor icon in the top navigation bar and drag it to the canvas. This opens the list of processors available for us to add. At the time of recording this video, we have 283 processors, and you can search for the required processor using the search box in the top right corner. You can also filter them using the list of tags available on the left side.
Now, let's go ahead and add the required processors. For this demo, I will add the GenerateFlowFile and LogAttribute processors. The LogAttribute processor will merely log the attributes of the FlowFile. I will also use LogAttribute as a valid termination of any flow in this course. Now that we have added the required processors, let's establish a connection between them by hovering the mouse over the processor from which we want to create the connection. Once you see the blue circle with an arrow, click, drag, and drop it on the processor to which you want to connect. This will prompt us for the relationship for which we want to create the connection. Relationships are another vital concept in NiFi, which we have not talked about before. Each processor will have zero or more relationships defined for it. Once a processor has finished its processing, it routes the FlowFile to one or more of its relationships, and it's the responsibility of the flow creator to handle these relationships by creating a connection for each of them to another processor. Sometimes, if you don't want to do anything for a particular relationship, you can terminate it. NiFi will complain if you have any unhandled relationship for a processor, and it will not allow us to start the processor till we handle it. Here, GenerateFlowFile has only one relationship, so let me go ahead and confirm the connection for the same. Now GenerateFlowFile is ready for use. This can be identified with the help of the little red stop button in the top left corner. Then what about the warning symbol on the LogAttribute processor? Let's hover the mouse over it and take a look at the error message. Here, NiFi clearly says the relationship success is not connected to any component. In our case, this is expected, but to start the processor, we have to get rid of this warning. I will do so by auto-terminating this relationship. Now both the processors are ready to be started.
But before starting the GenerateFlowFile processor, I will change the Run Schedule value to 10 seconds and then start it. As you can see, the processor has started. This can be identified with the help of the green play button. Also, a flow file is generated and is waiting in the connection queue. Now, let's go ahead and update the Concurrent Tasks property, which I mentioned earlier, and see what happens. To update any configuration, we need to stop the processor, so let's stop the processor first. Now that the processor has stopped, let's go ahead and change the value of the Concurrent Tasks property to five, start the processor, and see what happens. Did you see that the queue size increased to six? This means another five new flow files got generated and added to the queue. This is the primary purpose of the Concurrent Tasks property. It spawns the configured number of tasks and processes the flow files in parallel. The final configuration I want to show is the properties specific to each processor. This configuration is under the Properties tab. These are the properties specific to GenerateFlowFile. To learn more about each property, you can use the tooltip next to it. That's it for this video, guys. We will not go into each of these properties for now. We will take that up in another video. Thank you. See you in the next video.

12. Connection Queue Back Pressure in NiFi: Hi guys. Welcome back. In this video, we will learn more about the connection queue. As you know, in NiFi, the connection between processors also acts as a queue for the flow files. This brings in a new key feature of NiFi. But before going into this feature, let's first see how to access the flow files inside the queue. As you can see in this previously created flow, we have six flow files already available in the queue. Let's see how to clear this.
To clear the flow files in a queue, we can right-click the queue and select Empty queue. That's it, and it will remove all the flow files inside the queue. Next, let's see how to view the content and attributes of the flow files present inside the queue. Before that, let's make a few changes to the GenerateFlowFile processor. To update the configuration of a processor, you can right-click the processor and select the Configure option, or simply double-click on the processor you want to configure. Here, the GenerateFlowFile processor is configured to generate random data of zero bytes every 10 seconds. Let's go ahead and change it to one byte and change the Unique FlowFiles property to true. Also, I will change the Run Schedule property back to zero seconds. That's it. Now let me go ahead and start the processor. This time we are getting tons of flow files generated. This is due to the Run Schedule property. Since it's configured for zero seconds, the processor will produce as many flow files as possible. Now, let's go ahead and take a look at the content and attributes of the flow files which got generated. To do so, we can right-click the connection queue and select List queue. Now we can see the first 100 of 10,000 flow files available for us to view. To view the content and attributes of a flow file, we can select the information icon at the very beginning of each flow file. Here, we can download or view the flow file content using the respective buttons available. Let me click on View to see the flow file content. This flow file has a random character of one byte, as per our configuration. We can check another flow file's content as well. Here we have another random character. To view the attributes of a flow file, we can select the Attributes tab. The attributes of a flow file vary based on the origin of the flow file and on the processors it has passed through.
Later in this course, we will go through various other attributes of the flow file and how to add custom attributes to a flow file to help us achieve our required data flow. Did you observe something strange? Our flow files in the connection queue are showing in red and stopped at 10,000. Why is that? One reason for this is that the LogAttribute processor is stopped. But why did it stop at 10,000? Why not a thousand or 100,000? This brings us to another important feature of NiFi: back pressure. In NiFi, each processor will take a different amount of time to process a flow file, based on the complexity involved. For example, a RouteOnAttribute processor will be speedy compared to a ConvertRecord processor or a HashContent processor. This is because routing a flow file based on an attribute is relatively simple compared to converting an entire flow file's content. This is one scenario; we will have many such scenarios when we start to build complex data flows. To handle this, NiFi has a back pressure configuration. Each connection can have its own back pressure defined. Let's see the back pressure configuration of the queue which stopped at 10,000. Here you can see two properties, Back Pressure Object Threshold and Back Pressure Size Threshold, and the object threshold is set to 10,000. This is the reason our flow files inside the queue stopped at 10,000. NiFi is intelligent enough to pause the previous processor if the processors after it are running slowly, and the back pressure configuration of each connection queue is what makes this happen. Here, the object threshold is reached first since each flow file is tiny, but NiFi stops whenever either of the configured thresholds is reached. To demonstrate that, let's go ahead and update the File Size property of GenerateFlowFile to 250 MB and start the processor again. Did you see? Now the connection queue is showing red and stopped at just four flow files. This is because the size threshold is now reached before the object threshold.
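To make the "whichever threshold is reached first" behaviour concrete, here is a small simulation in Python. It is a sketch under assumptions: `ConnectionSketch` is a hypothetical class, the 10,000 object threshold is the value seen in the demo, and the size threshold of 1,000,000,000 bytes is only an illustrative number chosen for this example. Real NiFi pauses the upstream processor rather than rejecting individual flow files, so treat `offer` as a simplification.

```python
from dataclasses import dataclass


@dataclass
class ConnectionSketch:
    """Toy model of a connection queue with two back pressure thresholds."""
    object_threshold: int = 10_000        # value seen in the demo
    size_threshold: int = 1_000_000_000   # bytes; illustrative assumption
    count: int = 0
    total_bytes: int = 0

    def is_full(self) -> bool:
        """Back pressure engages when EITHER threshold is reached."""
        return (self.count >= self.object_threshold
                or self.total_bytes >= self.size_threshold)

    def offer(self, flowfile_bytes: int) -> bool:
        """Enqueue one flow file unless back pressure is engaged."""
        if self.is_full():
            return False
        self.count += 1
        self.total_bytes += flowfile_bytes
        return True


def fill(conn: ConnectionSketch, flowfile_bytes: int) -> int:
    """Keep producing flow files of one size until the queue pushes back."""
    while conn.offer(flowfile_bytes):
        pass
    return conn.count


# Tiny flow files: the object threshold is hit first.
print(fill(ConnectionSketch(), 1))            # 10000
# Large flow files: the size threshold is hit long before 10,000 objects.
print(fill(ConnectionSketch(), 250_000_000))  # 4
```

With one-byte flow files the queue stops on the object count, while with large flow files the byte total trips the size threshold after only a handful of items.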
Here we have only two processors. But when we have a complex data flow with tens or hundreds of processors, back pressure will be beneficial to avoid data overload and memory overload. This is because back pressure will not only pause one processor. It will slowly pause the processors behind it as well, once their thresholds are reached. This will happen eventually, since the processor before it is halted and each connection queue will start to reach its limit due to the same. That's it for this video, guys. Thank you. See you in the next video.

13. Working with Attributes Content in NiFi: Hi, guys. Welcome back. I guess we have covered enough theory. Now let's get some practical experience. In this video, we will play around with NiFi by manipulating the attributes and content of a flow file, using a few out-of-the-box processors available in NiFi. This is going to be very basic, to get you to feel comfortable working with NiFi. Throughout this course, we will do many examples to make you an expert in NiFi. So let's get started. First, let me go ahead and add a GenerateFlowFile processor and configure it to generate random data of size one byte every five seconds. Next, let me go ahead and add another processor called the ReplaceText processor. But before clicking on the Add button, I would like to show you a small yet essential thing you should know. As you can see, when I select a processor, a small description of the processor is displayed below. This will come in handy whenever you want to learn the usage of a specific processor. As per the information provided here, the ReplaceText processor updates the content of a flow file by evaluating a regular expression against it and replacing the section of the content that matches the regular expression with some alternate value. Now let's go ahead and configure the processor and see how it works in practice. Let me establish the connection between GenerateFlowFile and ReplaceText first.
Now that the connection is established, we will go ahead and view the configuration of the ReplaceText processor. To understand the usage of any property or the value of a property, you can use the information icon next to it. You can use this information to understand any processor and the various ways you can use it before designing your flow. In this video, I will walk you through all the properties of the ReplaceText processor. But in the later videos I will only explain the properties which I will be modifying. I would strongly recommend that you go through all the properties of any processor before using it and test it out using small examples. This way, you will get a solid understanding of all the processors available in NiFi. The first property I would like to highlight here is the Replacement Strategy property. The ReplaceText processor can not only be used to match a regular expression and replace the matched value with some alternate value, it can also match any string literal and replace it with another value. It can also completely replace the content of the flow file with the replacement value, without matching it against a regular expression or a string literal. Or it can be used to prepend or append your text to the flow file content. You can also see that selecting a value for one property can make another property obsolete, or that the behavior of a property changes based on the value of another property. For example, if you select Always Replace for the Replacement Strategy property, it makes the Search Value property irrelevant. Also, the Append and Prepend strategies change the way the value is added, either to the entire flow file or to each line in the flow file, based on the Evaluation Mode property. The next property is the Maximum Buffer Size property. This defines the maximum size of flow file content that will be buffered before processing it.
If this limit is exceeded, the flow file will go to the failure relationship. The next property is the Character Set property, which tells the processor the character set the flow file is encoded with. The next property is the Search Value property. This property takes a regular expression or a string literal, which will be compared against the flow file content and replaced with the value available in the Replacement Value property. Now that we have understood the various properties available for the ReplaceText processor, let's go ahead and configure it according to our requirement. Here, I want ReplaceText to replace any content available in the flow file with a comma-separated value, say a, b, c, d. To do so, I will update the Replacement Strategy property to Always Replace and update the Replacement Value property to a, b, c, d. That's it. To logically end this flow, as usual, I will add a LogAttribute processor and connect the ReplaceText processor with it. Before going on to the next step, let's go ahead and test this flow by starting the GenerateFlowFile processor first. As you can see, once the GenerateFlowFile processor is started, we get a flow file of one byte with a random character in it. This can be seen in the connection queue itself. You can validate the same by exploring the queue and viewing its content. Now, let's go ahead and start the ReplaceText processor and see what happens. Did you see that? We got the flow file in the next queue, and now the content size is 10 bytes. This is because the one-byte random character is replaced with a, b, c, d with the help of the ReplaceText processor. Let's confirm the same. As you can see here, the ReplaceText processor simply replaced the content of the flow file with the replacement value provided by us. Now, let's go ahead and add the next processor. The next processor I would like to add is the ExtractText processor.
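The replacement strategies walked through for ReplaceText can be mimicked with ordinary string operations. The sketch below is a rough analogy in Python, not the processor's implementation: the function name `replace_text`, the strategy strings, and the sample values are assumptions made for illustration, and it works on the whole content only (it ignores the Evaluation Mode, buffer size, and character set settings).

```python
import re


def replace_text(content: str, strategy: str, replacement: str, search: str = "") -> str:
    """Apply one of the replacement strategies to the whole content."""
    if strategy == "always_replace":
        # The search value is irrelevant: the content is swapped out entirely.
        return replacement
    if strategy == "regex_replace":
        # Replace every section that matches the regular expression.
        return re.sub(search, replacement, content)
    if strategy == "literal_replace":
        # Replace every occurrence of the plain string.
        return content.replace(search, replacement)
    if strategy == "prepend":
        return replacement + content
    if strategy == "append":
        return content + replacement
    raise ValueError(f"unknown strategy: {strategy}")


# The demo scenario: a one-character flow file replaced by a CSV value.
print(replace_text("x", "always_replace", "a,b,c,d"))           # a,b,c,d
print(replace_text("id=42", "regex_replace", "N", r"\d+"))      # id=N
print(replace_text("a-b-c", "literal_replace", ",", "-"))       # a,b,c
print(replace_text("body", "prepend", "header\n") == "header\nbody")  # True
```

Notice that only the first branch ignores `search`, which is the same dependency between properties pointed out in the lesson.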
Using the ExtractText processor, you can extract values from the content of the flow file into one or more flow file attributes. To extract flow file content into attributes, we can add a new property to the processor and assign a regular expression to it. I will not be explaining how to write a regular expression in this course. It's a broader topic on its own. Instead, I will write a regular expression and tell you what it does. Now, let me go ahead and add a new property using the add icon and assign its value to the following regular expression. Here, the regular expression used will extract, split, and assign the values of the flow file content to a flow file attribute called csv. We can connect the ReplaceText processor with the ExtractText processor by dragging the existing connection between the ReplaceText processor and the LogAttribute processor to the newly added ExtractText processor. Also, let's create a new connection between the ExtractText processor and the LogAttribute processor for the matched relationship. For now, we need not worry about the unmatched relationship, so we can auto-terminate the same. That's it. The flow is completed. We can test the ExtractText processor by starting it and analyzing the output flow file. As you can see here, the size of the content has not changed. The reason for this is that the ExtractText processor only extracts content into attributes without altering the actual content. First, let's go ahead and take a look at the list of attributes. As you can see, we have new attributes here with the prefix csv holding the extracted values. We can also confirm that the content of the flow file is unaltered using the View Content option. That's it for this video, guys. Though this example is minimal, it's essential to understand how to use NiFi's out-of-the-box processors to manipulate flow file content and attributes.
This is important because, at a high level, if you take any data flow, there are only three vital steps. The first step is to get data from one or more data sources. The second step is manipulating or transforming the data according to our requirement. The third and final step is to put the data into one or more data sinks. Here, achieving step one and step three is pretty straightforward. All you have to do is identify the right set of data ingestion and data egress processors according to your requirement. But when it comes to step two, it becomes tricky, and usually you need to use more than one processor to manipulate the content or attributes or both to achieve your data transformation requirement. The primary focus of this example is to act as a stepping stone for mastering data transformation using NiFi, and by the end of the course you will master this by doing numerous examples. Thank you. See you in the next video.

14. Working with Expression Language in NiFi: Hi guys. Welcome back. In this video, we will continue to manipulate the attributes and content of a flow file, but this time we will do it using the expression language support available in NiFi. Let's get started. We can continue from the previously created flow, where we generated a random flow file and replaced its content with the comma-separated value a, b, c, d. If you remember, we also extracted this content value to flow file attributes with the prefix csv. As a next step, I would like to replace the content of the flow file again, but this time, instead of replacing it with a CSV value, I would like to create a JSON string with the keys field1, field2, field3, and field4 and assign them the values we extracted to the attributes prefixed with csv. Now, let's see how we can achieve the same. To replace the content, we can use the ReplaceText processor itself with the same set of configurations.
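The extraction step recalled above (content stays untouched while capture groups land in attributes prefixed with csv) can be sketched with Python's `re` module. The regular expression used in the video is not shown in the transcript, so the pattern below is a hypothetical one written for this example, and the `extract_text` helper and its numbering of attributes by capture group are a simplified assumption about how the processor names its output.

```python
from __future__ import annotations

import re


def extract_text(content: str, prop_name: str, pattern: str) -> dict[str, str]:
    """Return attributes named <prop_name>.<group index> for each capture group.

    The content itself is never modified, only read.
    """
    match = re.search(pattern, content)
    if match is None:
        # Nothing matched: in the flow this would follow the unmatched route.
        return {}
    return {f"{prop_name}.{i}": value
            for i, value in enumerate(match.groups(), start=1)}


content = "a,b,c,d"
# Hypothetical pattern: four comma-separated fields, one capture group each.
attributes = extract_text(content, "csv", r"([^,]+),([^,]+),([^,]+),([^,]+)")

print(attributes)  # {'csv.1': 'a', 'csv.2': 'b', 'csv.3': 'c', 'csv.4': 'd'}
print(content)     # a,b,c,d
```

The second print shows why the content size does not change after this step: the values are copied into attributes, not cut out of the content.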
We can do this with our usual approach of dragging and dropping the processor icon and selecting the ReplaceText processor from the list of processors, but we have another simple way to do this. We can select the existing ReplaceText processor on the canvas and press Command+C, or Ctrl+C if you're using a Windows machine, and press Command+V or Ctrl+V to place a new copy of the same processor with the previously configured values. We can validate the configuration by double-clicking the newly added ReplaceText processor and checking its properties. Did you see that? As expected, the properties are also copied from the previously configured processor. Now let's go ahead and update the Replacement Value property according to our requirement. The JSON string we are expecting will look something like this, but instead of hard-coding the values a, b, c, d, we would like to read them from the previously extracted attributes. But how can we access the attributes of a flow file and assign them to the value of a processor property? This is where expression language comes to the rescue. Using NiFi's expression language support, we can access any flow file attribute by simply using the dollar symbol followed by the attribute name inside curly brackets. In our case, we have four attributes: csv.1, csv.2, csv.3, and csv.4. So let's go ahead and update the Replacement Value property with the respective expressions. That's it. We can go ahead and test our flow again, but before that, let's follow our usual drill by realigning the connections. Now that it's done, we can go ahead and start the newly added ReplaceText processor and examine the newly created flow file. Here, the flow file content is modified as expected. Using the expression language in NiFi, we can not only access the values of flow file attributes, but we can also compare them to other values or manipulate their values.
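To see what the dollar-and-curly-bracket lookup amounts to, here is a tiny substitution sketch in Python. It only handles plain attribute references and deliberately ignores functions; `evaluate`, the JSON key spelling, and the choice to treat a missing attribute as an empty string are assumptions of this example rather than a description of NiFi's evaluator.

```python
from __future__ import annotations

import json
import re

# Matches ${name} and captures the attribute name between the brackets.
_ATTR_REF = re.compile(r"\$\{([^}]+)\}")


def evaluate(template: str, attributes: dict[str, str]) -> str:
    """Replace every ${name} with the attribute value (empty if missing)."""
    return _ATTR_REF.sub(lambda m: attributes.get(m.group(1), ""), template)


# Attributes as they would exist after the extraction step.
attributes = {"csv.1": "a", "csv.2": "b", "csv.3": "c", "csv.4": "d"}

# Replacement value written with attribute references instead of fixed values.
template = ('{"field1": "${csv.1}", "field2": "${csv.2}", '
            '"field3": "${csv.3}", "field4": "${csv.4}"}')

result = evaluate(template, attributes)
print(json.loads(result))
# {'field1': 'a', 'field2': 'b', 'field3': 'c', 'field4': 'd'}
```

Parsing the result with `json.loads` is just a quick check that the substituted text is valid JSON.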
The expression language has a rich set of built-in functions available for this. I will not attempt to explain all the built-in functions of the expression language in this video. Instead, I will slowly use one or more functions throughout this course. This will give you a better practical understanding of the expression language functions, but I would strongly recommend that you go through the documentation available on NiFi's website to get a strong understanding of all the various types of expression language functions available in NiFi. That's it for this video, guys. If you really look at what we have done so far, it is nothing but converting a CSV record to a JSON record using the out-of-the-box processors available in NiFi. I know what you're thinking. Don't worry. This is the last time we will do CSV to JSON conversion, or any other record format conversion, like this. NiFi provides a set of processors and controller services to convert records from one format to another. But the key takeaway from this video is that we should consider NiFi as a toolkit with various processors and components for us to use. As a flow file manager, it's up to us to use the right set of processors and components to achieve our requirement efficiently. Sometimes we can have more than one approach to achieve the same functionality in NiFi, and one of the approaches can be a better solution compared to the other. The efficiency of a solution can be determined based on multiple factors, like the number of I/O operations, memory utilization, and many other factors. I will have a dedicated video later in this course which will help you choose the right set of processors to design your flow. Thank you. See you in the next video.

15. More on Expression Language Functions in NiFi: Hi guys. Welcome back. In this video, we will continue to explore the expression language support available in NiFi.
To be more precise, in this video we will mainly focus on some of the built-in expression language functions. So let's get started. We can continue from the previously created flow. Here we have converted a CSV record to a JSON record using the ReplaceText and ExtractText processors. But before making any changes, I would like to show you one of the flow file attributes which will be available for all flow files. To do so, let's view the attributes of the existing flow file in the queue. The attribute we are interested in here is the filename attribute. In NiFi, each flow file is assigned the name of the file from which it originated. Since we have used the GenerateFlowFile processor here to generate the data, the filename attribute is set to the UUID of the flow file. You may wonder why I am talking about one specific attribute. Is it so special? To demonstrate the use of this attribute, I will go ahead and add the PutFile processor and adjust the connection to point the ReplaceText processor to the newly added PutFile processor. I will also configure the PutFile processor to point to a folder named output inside my Users Tools folder. That's it. Let's go ahead and start the PutFile processor and see what happens. As expected, the flow file has been removed from the queue. Now let me go to the output folder. Did you see that? We got a new file with a name matching the filename attribute. This is the use of the filename attribute. It is a very common scenario that when we create a flow, we want to control the file name of the output file. Say the input is of type CSV with the name input.csv. After processing, we would like to change the file name to include the date on which it got processed and save it to an output directory with a different extension. To do so, we can use the functions available as part of the expression language. Let's see how to do this.
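Before looking at how NiFi does it, the renaming goal just described (original name, plus the processing date, plus a new extension) can be written out in a few lines of Python. This is only an illustration of the target result: the `output_name` helper, the hyphen separator, the compact date layout, and the decision to drop the old extension are all assumptions of this sketch, since the transcript does not show the exact expression used in the video.

```python
from datetime import date
from pathlib import PurePath


def output_name(filename: str, processed_on: date, extension: str = ".json") -> str:
    """Build '<original stem>-<date><new extension>' for an output file."""
    stem = PurePath(filename).stem  # 'input.csv' -> 'input'
    return f"{stem}-{processed_on:%Y%m%d}{extension}"


# A fixed date keeps the example reproducible; a flow would use the current date.
print(output_name("input.csv", date(2020, 1, 15)))  # input-20200115.json
```

Passing the date in as a parameter, rather than reading the clock inside the function, makes the behaviour easy to check by hand.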
To create or update an attribute, we can use the UpdateAttribute processor. Now that the processor is added to the canvas, let's go ahead and configure it by double-clicking the processor. Here, we can use the plus icon to add the attributes we would like to add or update. In our case, we need to update the filename attribute, so let's add the same name here and click on OK. The value of the property can be assigned to something like this. This tells the UpdateAttribute processor to concatenate the existing file name with the current date in a year, month, and day format, and it also appends the extension .json at the end. Now, let's include the UpdateAttribute processor between the ReplaceText processor and the PutFile processor. That's it. Let's go ahead and start the flow. So far, we have been starting individual processors. To start the entire flow, we can right-click on the canvas and click on Start. Now that all the processors of the flow have started, we can go and check the output folder for any new files. Did you see that? This time, the file which got generated has a custom name, which we updated using the UpdateAttribute processor. That's it for this video, guys. I hope you got a better understanding of the practical usage of expression language and its built-in functions. Don't worry even if you are not able to understand this completely. We have tons of practical examples to come in this course in which we will use expression language extensively. Thank you. See you in the next video.

16. Working with Process Group, Input Port Output Port in NiFi: Hi, guys. Welcome back. In this video, we will see the practical usage of a process group and how to communicate between process groups using input and output ports. Let's get started. In NiFi, one or more processors are connected and combined into a process group. We can add a process group by dragging and dropping the process group icon to the canvas.
Once we add a process group, NiFi will prompt for a name. Here, I will name the process group CSV to JSON Converter. That's it. To add the required processors, along with their connections, to a process group, we can press the Shift key and click and drag around the processors. This will select all the processors and connections or any other components available as part of the selection area. If the flow is significantly big and extends outside the current canvas view, you can zoom out a little bit using the mouse wheel or the zoom out option available as part of the Navigate menu and try again. Once all the required components are selected, we can click any of the components in the selection and drag and drop them inside the process group which we created a moment ago. Please note that a blue border appears when we hover over the process group. This acts as an indicator to drop correctly. Now that we have added the previously created flow inside the process group, we can go ahead and look inside the process group by double-clicking on it. Did you see that? All the processors and connections are now successfully added as part of the process group. To leave the process group we are currently in, we can right-click the canvas and select Leave group. We can also use the breadcrumb below to navigate out of the process group. When we have a complex data flow, it's better to combine processors into logical process groups. This helps in better maintenance of the flows. Process groups also help with flow reusability and provide an easy way to duplicate and modify a similar type of flow. To duplicate a process group, you can click the process group you want to duplicate, copy it using Command+C or Ctrl+C, and paste a duplicate of it using Command+V or Ctrl+V. This way, not only the process group but all the components inside the process group will get duplicated along with their configurations.
Next, let's see how we can communicate between two process groups using input and output ports. Sometimes we may create a process group which does a specific set of work and produces an output. This output will be needed as an input for another process group. In this case, we have to transfer data from one process group to another. To do so, we can use input and output ports. But before adding any input and output ports, I would like to split the current flow in two. To be more precise, I want to move the PutFile processor along with the UpdateAttribute processor to a new process group. You may ask why. The most obvious reason is that this way I can use the existing flow rather than creating a new flow. But this can very well be a real-world scenario where currently you are saving the converted JSON to a file system and later you may want to save it to a database, so it's better to decouple the CSV to JSON conversion from writing the JSON to a file system. Let me first go ahead and add a new process group. I will name this process group Write JSON to File System. Next, we can remove the connection between the ReplaceText processor and the UpdateAttribute processor. Then we can select the two processors we would like to move and drag and drop them inside the newly created process group. Since the new process group is still inside the old process group, it is not yet loosely coupled. To make it entirely loosely coupled, we need to keep both process groups parallel to each other. To do so, we can right-click the new process group and select Move to parent group. That's it. But we have one big problem now. The flow will not work because there is no connection between the ReplaceText processor of the CSV to JSON process group and the UpdateAttribute processor of the Write JSON to File System process group. This is where the input and output ports will be used.
Before fixing this issue, I would also like to highlight that although the name has the term port in it, it doesn't mean we need to open a network port for each input and output port we add. In NiFi, all the input and output ports are accessible through the default port on which NiFi is running. In our case, it's 8080. We can refer to individual input and output ports using their names. Here, the CSV to JSON process group has to send data, or flow files, to the Write JSON to File System process group. So first we need to add an output port in the CSV to JSON process group and connect the ReplaceText processor to it. Next, we can add an input port inside the Write JSON to File System process group and connect it with the UpdateAttribute processor. As you can see, adding an input or output port is not very different from adding any other component in NiFi. That's the reason I have not explicitly explained how to do the same. Going forward, I will keep it simple and will skip the basic explanation of adding components and creating or realigning the connections between those components. This way we have more time to concentrate on the actual implementation. Let's continue and finish our flow by establishing a connection between the two process groups. Please note that each process group can have more than one input and output port, according to the use case, and we can only establish a connection between two process groups if they have the correct set of input and output ports. For example, you can only connect a process group which has an output port with a process group which has an input port. If both process groups have only input ports or only output ports, we cannot establish the connection. Here we have only one input and output port, so we can click on the Add button to confirm the connection.
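The connection rule stated above (the source group needs an output port and the destination group needs an input port) can be expressed as a small check. This is a hedged sketch: `GroupSketch`, `connectable_pairs`, and the port names are hypothetical and exist only for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GroupSketch:
    """Toy process group that only tracks the names of its ports."""
    name: str
    input_ports: list[str] = field(default_factory=list)
    output_ports: list[str] = field(default_factory=list)


def connectable_pairs(source: GroupSketch, destination: GroupSketch) -> list[tuple[str, str]]:
    """All (output port, input port) pairs that could carry data source -> destination."""
    return [(out_port, in_port)
            for out_port in source.output_ports
            for in_port in destination.input_ports]


# Port names below are made up for illustration.
converter = GroupSketch("CSV to JSON", output_ports=["json out"])
writer = GroupSketch("Write JSON to File System", input_ports=["json in"])

print(connectable_pairs(converter, writer))  # [('json out', 'json in')]
print(connectable_pairs(writer, converter))  # []
```

An empty list in the reverse direction corresponds to the case where no connection can be established, and more than one pair corresponds to the case where a choice between ports has to be made.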
If we have more than one input and output port, we can use the appropriate drop-downs to select the correct ports for which we would like to establish the connection. Now that we have fixed the flow, we can go ahead and start both process groups by right-clicking each process group and clicking on Start. This way, all the processors inside the process group will get started. The same is applicable if you want to stop them all. That's it for this video, guys. I hope you got a fair understanding of the practical usage of process groups and input and output ports. Thank you. See you in the next video.

17. Working with Templates: Hi, guys. Welcome back. I have a problem statement for you. When you're working in NiFi, assume you are getting issues in the flow you are creating, or the flow is not working as you expected, and you want your friend or colleague who is working remotely to help you. Or assume you are working on your desktop, you are going on a business trip for a couple of days, and you want to continue your flow design from your laptop. Or assume you have completed your data flow and you want to move your flow from the development environment to the testing environment. What would you do? For all this, you will need some way to import and export your data flow. This is where templates come to the rescue. Let's get started and see how to create a new template, or import or export an existing template, in NiFi. To create a template, first we need to select the components we want to keep as part of the template and click on the create template icon under the Operate menu on the left. This will ask for a name and description for the template. After entering a meaningful name and description, click on the Create button to create the template. That's it. It's that simple. We can also select a process group and create a template to bundle all the processors, along with their configuration, as part of the template which gets created. When we create a template, it is kept in your local NiFi instance.
To download a template we have created, you can go to the Templates link under the top-right corner. As you can see, here we have a list of already created templates. To download a template, we can use the download button available here. We can also delete the template using the delete button. Now let's see how to import and use an already created template. Before that, I would like to show how to find some of the interesting flows created by others and available for us to use and learn from. To do so, let's search for NiFi templates and click on the first link. Here we have so many templates for our perusal, with detailed descriptions. For our testing, let's go ahead and download any one of the templates. I will go ahead and download the retry count loop template. Now let's go to the NiFi UI and upload the template using the upload template icon under the Operate menu on the left. This will prompt us to browse and select the template we would like to upload. In our case, I will select the already downloaded retry count loop template and click on Upload. Now that the template is uploaded, we can go ahead and use it by dragging and dropping the template icon and selecting the required template. In our case, it's the retry count loop. We can also explore all the available templates using the drop-down available here. Now let's go ahead and click on the Add button. Did you see that? We got a new process group bundled with all the processors along with their configurations. That's it for this video, guys. I hope you got a fair understanding of the practical usage of templates in NiFi. Still, the template has a drawback when it comes to version control, and NiFi Registry is the right tool to solve this problem. We will see more about NiFi Registry in later videos. Thank you. See you in the next video.

18. Apache NiFi Working with Funnel in NiFi: Hi, guys. Welcome back.
In this video we will learn about another NiFi component called the funnel and how it is used to combine data from several connections into a single connection. But hold on a minute. Previously, we have seen input and output ports and how they help to transfer data between multiple process groups. Can't we use the same to combine the data from several connections into a single connection? Theoretically, yes. But the way input and output ports are designed, you have to wrap the processors which generate data inside one process group and the processors which need to handle the combined data inside another process group to make this work. I know it will be a little confusing for you if we talk about this problem in theory, so let me go ahead and demonstrate it with a simple example. In this example, I have created a process group with the name "combined data", added two GenerateFlowFile processors, and connected them to an output port. I have also added an input port and connected it with the LogAttribute processor. Now let's try to connect the output port to the input port. Did you see that? It's not allowing me to connect the output port with the input port. The reason for this is that an output port can connect with an input port only if both of them are inside separate process groups parallel to each other. Let me show what I mean by that. For this, I will create two more process groups and name them "generate data" and "combined data". Now I will move the GenerateFlowFile processors along with the output port inside the "generate data" process group. I will also move the input port along with the LogAttribute processor inside the "combined data" process group. Now that the output port and the input port are inside separate process groups which are also parallel to each other, we can establish the connection between them. But for a simple combining of connection data, we have ended up creating multiple process groups and input and output ports for each
of these process groups. This number of components will start increasing based on the number of FlowFile connections you want to combine. Now that you understood the complexity of combining data using input and output ports, let's see how the funnel gives an elegant way to handle the same. For this, we will go back to the stage where we tried to connect the output and input ports without the process groups, and let's go ahead and delete the input and output ports. Next, let me go here and add a funnel to the canvas, connect the GenerateFlowFile processors to the funnel, and connect the funnel to the LogAttribute processor. That's it. It's that simple. We can add more GenerateFlowFile processors to the canvas and connect them to the same funnel, and the funnel can handle it without a flinch. That's it for this video, guys. I want to restate the statement which I told a few lectures earlier: we should consider NiFi as a toolkit with various processors and components for us to use, and as a FlowFile manager. It's up to us to use the right set of processors and components to achieve our requirements efficiently. The scenario I have demonstrated in this video is another example for the above statement. Thank you. See you in the next video.
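
The funnel idea from this lesson can be sketched in a few lines of Python. This is only a toy model for building intuition, not NiFi code and not the NiFi API: the `Connection` and `Funnel` classes, the `run_once` method, and the `build_demo` helper are all hypothetical names made up for this illustration. It assumes a connection is just a queue of FlowFiles and that a funnel simply moves everything from its incoming connections into one outgoing connection.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Connection:
    """Toy stand-in for a NiFi connection: a named queue of FlowFiles."""
    name: str
    queue: deque = field(default_factory=deque)


class Funnel:
    """Toy funnel (hypothetical class): many incoming connections, one outgoing."""

    def __init__(self, outgoing: Connection):
        self.incoming = []
        self.outgoing = outgoing

    def add_incoming(self, connection: Connection) -> None:
        # Adding one more source only needs one more connection to the funnel.
        self.incoming.append(connection)

    def run_once(self) -> int:
        """Move every queued FlowFile to the outgoing connection; return how many moved."""
        moved = 0
        for connection in self.incoming:
            while connection.queue:
                self.outgoing.queue.append(connection.queue.popleft())
                moved += 1
        return moved


def build_demo():
    """Three pretend GenerateFlowFile sources feeding one pretend LogAttribute queue."""
    combined = Connection("funnel -> LogAttribute")
    funnel = Funnel(combined)
    for i in (1, 2, 3):
        source = Connection(f"GenerateFlowFile-{i} -> funnel")
        source.queue.extend(f"flowfile-{i}-{n}" for n in range(2))
        funnel.add_incoming(source)
    return funnel, combined


if __name__ == "__main__":
    funnel, combined = build_demo()
    print("moved:", funnel.run_once())
    print("combined queue:", list(combined.queue))
```

Notice that connecting a new source is a single `add_incoming` call, which mirrors the point made in the video: with a funnel, no extra process groups or ports are needed as the number of connections grows.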