Apache NiFi Complete Guide - Part 2 - Apache NiFi Advanced Concepts | Manoj GT | Skillshare

Apache NiFi Complete Guide - Part 2 - Apache NiFi Advanced Concepts

Manoj GT, Big Data Evangelist & JavaScript Lover

19 Lessons (1h 24m)
  • 1. Working with Controller Services in NiFi (9:13)
  • 2. Working with Variable Registry in NiFi (6:04)
  • 3. FlowFile Prioritization in NiFi (5:10)
  • 4. FlowFile Expiration in NiFi (2:15)
  • 5. Monitoring NiFi (4:28)
  • 6. Monitoring NiFi using Reporting Task (2:43)
  • 7. Remote Monitoring NiFi using Reporting Task (7:18)
  • 8. Data Provenance in NiFi (3:20)
  • 9. Overview on NiFi Registry (2:22)
  • 10. Installation of NiFi Registry (2:18)
  • 11. Configuring NiFi and NiFi Registry to Enable Version Control (5:09)
  • 12. Configuring NiFi Registry with Multiple NiFi Instances (3:31)
  • 13. Configuring NiFi Registry to Enable Git Persistence (3:25)
  • 14. Overview on NiFi Clustering (3:18)
  • 15. Limitation in NiFi Clustering (1:43)
  • 16. NiFi Cluster Configuration using Embedded Zookeeper (8:09)
  • 17. NiFi Cluster Configuration using External Zookeeper (3:32)
  • 18. Overview on NiFi Custom Processor (2:55)
  • 19. Our First Custom Processor (6:46)

About This Class


What is Apache NiFi?

Apache NiFi is a robust, open-source data ingestion and distribution framework, and more: it can propagate any data content from any source to any destination.

NiFi is based on a different programming paradigm called Flow-Based Programming (FBP). I'm not going to explain the definition of Flow-Based Programming; instead, I will describe how NiFi works, and then you can connect it with the definition of Flow-Based Programming.

It is one of the fastest-growing Apache projects and is expected to grow exponentially in the coming few years.

How Does NiFi Work?

NiFi consists of atomic elements which can be combined into groups to build simple or complex dataflows.

NiFi has Processors & Process Groups.

What is a Processor in NiFi?

A Processor is an atomic element in NiFi that performs a specific task.

The latest version of NiFi has more than 280 processors, and each has its own responsibility.

For example, the GetFile processor can read a file from a specific location, whereas the PutFile processor can write a file to a particular location. Likewise, we have many other processors, each with its own unique purpose.

We have processors to Get Data from various data sources and processors to Write Data to various data sources.

The data source can be almost anything.

It can be a SQL database server like Postgres, Oracle, or MySQL; a NoSQL database like MongoDB or Couchbase; a search engine like Solr or Elasticsearch; or a cache server like Redis or HBase. It can even connect to Kafka messaging queues.

NiFi also has a rich set of processors to connect with Amazon AWS services like S3 buckets and DynamoDB.

NiFi has a processor for almost everything you need when you typically work with data. We will go deep into the various types of processors available in NiFi in later videos. Even if you don't find the right processor to fit your requirement, NiFi gives you a simple way to write your own custom processors.

Now let’s move on to the next term, FlowFile.

What is a FlowFile in NiFi?

The actual data in NiFi propagates in the form of a FlowFile. A FlowFile can contain any data: CSV, JSON, XML, plain text, and even SQL queries or binary data.

The FlowFile abstraction is the reason NiFi can propagate any data from any source to any destination. A processor can process a FlowFile to generate a new FlowFile.

The next important term is Connections.

What is a Connection in NiFi?

In NiFi, processors are connected to create a data flow. The link between two processors is called a Connection, and each connection also acts as a queue for FlowFiles.

The next one is the Process Group and Input or Output port.

What are Process Group, Input Port & Output Port in NiFi?

In NiFi, one or more processors are connected and combined into a Process Group. When you have a complex dataflow, it’s better to combine processors into logical process groups. This helps in better maintenance of the flows.

Process Groups can have input and output ports which are used to move data between them.

The final term you should know for now is the Controller Service.

What is a Controller Service in NiFi?

Controller Services are shared services that can be used by processors. For example, a processor that reads from or writes to a SQL database can use a Controller Service that holds the required DB connection details.

Controller Services are not limited to DB connections.

To learn more about Apache NiFi, kindly visit my YouTube channel. I have created a playlist especially for beginners.

Transcripts

1. Working with Controller Services in NiFi: Hi guys, welcome back. In this video we will see the practical usage of another important abstraction of NiFi called the Controller Service. A Controller Service is a shared service which can be used across processors and other controller services. Let's see what that means with the help of a simple example. Before starting with the actual implementation, you should have Postgres or any other JDBC-compliant database running on your machine, or on any other server that can be accessed from your machine. Now that you know the prerequisite, let's start implementing our flow. The first processor I would like to add is our usual GenerateFlowFile processor, configured to generate user objects every 10 seconds. The user object I want to generate will look something like this. To generate it randomly, I will update the Custom Text property of the GenerateFlowFile processor with the following expression. This creates the user object with a random number in the first name, last name and email, and the current date and time in the created_on field. Next, let's add a LogAttribute processor and follow our usual drill. Now that the GenerateFlowFile processor is in place, we can start the processor and inspect the flow file content. As expected, we got the user object with some randomness added.
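The exact Custom Text used in the video is not reproduced in this transcript, so the snippet below is only a sketch of what such a value could look like, built with NiFi Expression Language functions (random(), mod(), now(), format()); the field names and the example.com domain are illustrative assumptions:

    {
      "first_name": "user${random():mod(1000)}",
      "last_name": "test${random():mod(1000)}",
      "email": "user${random():mod(1000)}@example.com",
      "created_on": "${now():format('yyyy-MM-dd HH:mm:ss')}"
    }

Each ${random():mod(1000)} is evaluated independently, so the numbers differ across fields, which is fine for demo data.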
Next, I would like to insert this user data into my database. To do so, you should first have a table in the database with the relevant user fields. I assume you know how to create a table, so I am skipping that step; please feel free to reach out to me if you need any help in creating it. I have already created a table named tbl_users in my database. To insert data into a database, NiFi provides an out-of-the-box processor called PutSQL, so let's go ahead and add it. But before clicking on the Add button, let's read the usage description available below. As per the description, the PutSQL processor expects the flow file content to be a SQL statement. In our case we have the user data in JSON format in the flow file. Take a moment and think about how to fix this. One way is to use the ExtractText and ReplaceText processors to create the SQL statement from the JSON object, but before doing something like that, let's search the processor list for "convert json". Did you see that? NiFi provides a processor named ConvertJSONToSQL to do this for us. As per the documentation, this processor will create an INSERT, UPDATE or DELETE SQL statement. Let's add it and configure it to convert our user JSON object into a SQL INSERT statement. To do so, we primarily need to configure three properties. The first is the Table Name property, so let's update it with the appropriate table name, in our case tbl_users. Next, we have to mention which kind of statement we want the processor to generate; in our case we want an INSERT statement, so let's select INSERT for the Statement Type property. The final property we need to provide is the JDBC Connection Pool. But hold on a minute: usually when we provide a JDBC connection pool configuration, we have to give the connection URL, user name, password, driver class and so on, yet here we have only one property for all of that. This is where the Controller Service abstraction comes into the picture. We can inspect the help text of this property to confirm it: this property requires a controller service to work. So let's create a controller service and configure it as per our requirement by clicking on the Create New Service option. This prompts us to provide the type of controller service and its name. I will not change the type, since we need the DBCP connection pool rather than the Hive connection pool; I will merely add "Postgres" at the end of the name and click on Create. That's it, this creates the controller service, but we still have to configure its details. To do so, we can click on the arrow here and accept to save the processor configuration. This takes us to the controller services list screen, where we can use the Configure button to configure our controller service. Let me configure the database connection URL and the other required properties. Here I have given the configuration of my local DB server; kindly make sure you update it according to your database server details. You will also need the right driver version matching your database server and its version. That's it; we can click OK and connect the ConvertJSONToSQL processor to the PutSQL processor. If you observe, this processor has three relationships compared to our usual two: the sql relationship gives the generated SQL statement, and the original relationship gives the original flow file content, in our case the user object in JSON format. This comes in handy if you need to insert the same data into a different table, like an orders table. In this case, I will add LogAttribute processors and connect the ConvertJSONToSQL processor to both of them for the sql and original relationships, and I will auto-terminate the failure relationship. Now that the ConvertJSONToSQL processor is in place, let's start it and check the output flow file in the sql connection queue. Did you see that? We got the required parameterized SQL statement, and the substitution value for each field is available as part of the flow file attributes. Next, let's configure the PutSQL processor. The only property we need to set in this case is the JDBC Connection Pool property, so let's inspect what it needs. It looks like this property also needs a controller service of the same type as before, so let me select the same controller service for the PutSQL processor as well. That's it, it's that simple. This is the cool thing about a Controller Service: you create it once and use it across other related processors which have similar functionality. Now that the configuration of the PutSQL processor is also completed, we can add it as part of our flow by establishing the appropriate connections. If you noticed, I connected the retry relationship back to the PutSQL processor itself. For now, don't worry too much about it; just remember that whenever you have a processor with a similar relationship description, you can follow the same approach. Now that the flow is complete, let's run the PutSQL processor and take a look at the database table. Did you see that? The user object created by the GenerateFlowFile processor is now inserted successfully into the users table. That's it for this video, guys. Hope you got a fair understanding of Controller Services in NiFi. A Controller Service is not limited to database configuration; you can build various shared abstractions with it, and we will see many such examples later in this course. Thank you, see you in the next video.
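The controller service created here is a DBCPConnectionPool; the exact connection details are specific to the author's local database, so the values below are placeholder assumptions for a local PostgreSQL server, while the property names are the standard ones exposed by the service:

    Database Connection URL       jdbc:postgresql://localhost:5432/mydb
    Database Driver Class Name    org.postgresql.Driver
    Database Driver Location(s)   /path/to/postgresql-driver.jar
    Database User                 nifi_user
    Password                      ********

The driver JAR and its version must match your database server, and the service has to be enabled before ConvertJSONToSQL or PutSQL can use it.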
2. Working with Variable Registry in NiFi: Hi guys, welcome back. In this video we will see a small but essential feature available in NiFi called the Variable Registry. Before we see how to define a variable in NiFi, let's understand why we need the variable registry in the first place. If you take our previous example, we configured the controller service to hold our JDBC connection details. These details tend to change from environment to environment, so it's always a good practice to keep values which change between environments in a property file or a variable, and refer to the variable instead of the actual value itself. This also helps us with continuous integration and deployment of our data flows. In NiFi there are two ways to manage custom variables: one is by using the NiFi UI via the Variables window, and the other is by referring to custom property files in nifi.properties. Now let's see how to configure each in practice. The first approach I would like to show is using a custom property file and referring to it in the nifi.properties file. Here I have created a file under the conf folder of NiFi and named it db.properties, and I have added all the required database connection related properties in it with meaningful names. Next, I will add the property file reference to the nifi.properties file. To do so, let's search for the word "registry" and update the value of the variable registry property to refer to the relative path of the db.properties file. We can also logically create multiple property files according to their business value and refer to them here using a comma separator. That's it, it's that simple. The only catch is that you need to restart the NiFi instance after making any change for it to take effect. So let me stop the NiFi instance using bin/nifi.sh stop and start it back using bin/nifi.sh start; we can also use the bin/nifi.sh restart command to restart NiFi. Now that the NiFi server has started, let's see how to refer to the values of these properties in the UI. To do so, let's disable and open the configuration of the controller service we would like to update, and use expression language to refer to the appropriate property name. Now that we have made the required changes, we can enable the controller service again, start the flow, and check if it works. But before that, I would like to highlight one key point: you can't use the variable registry for all property values. You can only use it for properties which support expression language. For example, here we cannot substitute the value of the Max Wait Time or Max Total Connections properties with a variable, since they do not support expression language. Now let's start the flow and see if everything works as before. It looks like there is no error in the flow, and we can also validate the data in the database to double-check that our flow works as expected.
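The property keys inside db.properties are not spelled out in the video, so the ones below are assumptions; nifi.variable.registry.properties, however, is the actual nifi.properties key used to register custom property files:

    # conf/db.properties (key names are illustrative)
    db.connection.url=jdbc:postgresql://localhost:5432/mydb
    db.driver.class=org.postgresql.Driver
    db.user=nifi_user
    db.password=secret

    # conf/nifi.properties
    nifi.variable.registry.properties=./conf/db.properties

A property that supports expression language can then reference these values, for example ${db.connection.url} in the controller service's Database Connection URL.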
Now that we have validated it, let's see the alternate way to define a variable. The next way is by right-clicking the canvas and selecting Variables. Here we can use the plus icon to create a new variable and add its value. The key takeaways of UI-based variables are that they are reflected immediately, and they are scoped to the process group they are defined in. To demonstrate what I mean, I will create a simple example using a GenerateFlowFile processor and a PutFile processor. In this example, I've configured the GenerateFlowFile processor to generate a flow file of size one byte every five seconds and put the output into a folder inside my Users data folder. I have also bundled all the processors inside a process group called Generate Data. Now I will create a new variable outside the process group, name it output_folder and set its value to /users/data/output1. I will also create another variable of the same name inside the process group and assign it the value /users/data/output2. Did you see that? As soon as I added another variable with the same name, it overrode the variable which was created outside this process group. This is the scoping of the variable registry: variables inside a process group always take precedence over variables outside the process group. The UI-based variable registry comes in handy if you want to define a variable in your flow without restarting the NiFi server. It also helps us to override an incorrect variable defined using the property-file-based approach. Also, I would like to highlight that whenever you create a template of a flow which has variables configured in it, the template that gets created will also contain the variables along with their values. That's it for this video, guys. Hope you got a feel for the practical usage of the Variable Registry in NiFi. Thank you, see you in the next video.

3. FlowFile Prioritization in NiFi: Hi guys, welcome back. In this video we will see how to prioritize flow files in NiFi. Prioritizers come in handy when you have data coming from multiple sources and you would like to process some of it immediately after it arrives, ahead of the rest. Let's see what that means with the help of a simple example. In this example, I have created a process group with the name Generate Data, added two GenerateFlowFile processors and connected them to two different UpdateAttribute processors. I have configured one GenerateFlowFile processor to generate data with the text "some data" every one second, and the other to generate data with the text "critical data" every five seconds. I have also configured the UpdateAttribute processors to add a flow file attribute named priority and set its value to 9 and 1 respectively. To be more precise, we are marking the flow files with the value "some data" as priority 9 and the flow files with the value "critical data" as priority 1. I have also connected the outputs of the UpdateAttribute processors to a funnel. You may wonder what I am trying to represent here: basically, I am simulating two types of data being generated from two different sources, setting the priority of the data which holds less value as low, and the priority of the valuable data as high.
Also, the high priority data is generated comparatively slower than the low priority data. Before going on to the next step, let's start the flow and analyze the output flow files available in the queue. As you can see, both the high priority and the low priority data is placed in the queue based on its creation time. But this is not what we wanted: we want the flow files to be queued and processed based on the priority attribute available in the flow file. To do so, let me update the configuration of the connection queue. The settings we are interested in are Available Prioritizers and Selected Prioritizers. As you can see, no prioritization is currently set, and you can drag and drop to select multiple prioritizers. Selecting more than one comes in handy when two flow files have the same value for the first prioritizer, so that the next prioritizer can be used to resolve the conflict. Now let's try to understand what each of these prioritizers means. If we use the FirstInFirstOutPrioritizer, the flow file which reached the connection first is processed first. Instead, if we use the NewestFlowFileFirstPrioritizer, the flow file which is newest to the data flow is processed first. You may think the FirstInFirstOutPrioritizer and the NewestFlowFileFirstPrioritizer look similar, but if you read their definitions one more time you will understand the primary difference: in NiFi, the age of a flow file is based on the time it was created in the data flow, which can differ from the time it entered the queue. The FirstInFirstOutPrioritizer orders flow files by the time they entered the queue, whereas the NewestFlowFileFirstPrioritizer orders them by the age of the flow file. The next prioritizer is the OldestFlowFileFirstPrioritizer; this is the exact opposite of the NewestFlowFileFirstPrioritizer: the flow file which is oldest in the data flow is processed first, and this is the default scheme used when no prioritizers are selected. The final prioritizer is the PriorityAttributePrioritizer. It expects the flow file to have an attribute called priority, and the value of this attribute may be alphanumeric, where "a" is of higher priority than "z" and "1" is of higher priority than "9". In our case, we have already added a priority attribute to the flow files, so let's drag and drop the PriorityAttributePrioritizer into the selected prioritizers. That's it, it's that simple. Now that the prioritization is set, let's analyze the output flow files available in the queue one more time. Did you see that? Now the high priority flow files came to the top of the queue, and they will be processed sooner than the low priority flow files. That's it for this video, guys. Hope you got a fair understanding of flow file prioritization in NiFi. Thank you, see you in the next video.

4. FlowFile Expiration in NiFi: Hi guys, welcome back. In this video we will see what flow file expiration is and how to set the expiry of a flow file in NiFi. Flow file expiration is a concept by which data that cannot be processed within a particular timeframe is automatically removed from the flow. This comes in handy when the volume of data is expected to exceed the amount that can be processed by NiFi. Now that we understand what flow file expiration is,
let's see how to set the expiry of a flow file in NiFi. We can use the same example we used in the earlier case: open the connection and look for the FlowFile Expiration property. The default value of 0 sec indicates that the data will never expire. Let me update the value of the FlowFile Expiration property to five seconds and click on Apply. Now let's wait for five seconds and see what happens. Did you see that? The flow files were removed from the connection queue, and a small clock icon now appears on the connection label. This helps the data flow manager understand whether any flow file expiration is configured just by looking at the flow on the canvas. Please note that the expiration period is based on the time the data entered the NiFi instance. In other words, if the flow file expiration on a given connection is set to 30 minutes and a file that has been in the NiFi instance for 30 minutes reaches that connection, it will expire and be removed from the flow. We can use expiration in combination with prioritizers to ensure that the highest priority data is processed first, and anything that cannot be processed within the specified period is dropped. That's it for this video, guys. Hope you got a fair understanding of flow file expiration and its practical usage. Thank you, see you in the next video.

5. Monitoring NiFi: Hi guys, welcome back. In this video we will see how to monitor your NiFi instance using the various options available in NiFi. NiFi is a powerful tool and it can do multiple tasks in parallel, so it's essential to have a proper way to monitor the NiFi instance. This is where Apache NiFi's rich set of monitoring capabilities comes in handy; let's go through these capabilities one by one. The first and foremost statistics are available in the top status bar of the NiFi UI. Here we can see a few vital statistics about the current health of NiFi. For example, the active threads stat shows the number of active threads currently running across the NiFi instance, which depicts how hard NiFi is working for us. The total queued data stat specifies how many flow files are currently queued across the entire flow, together with the total size of those flow files. We also have counts of the running processors, stopped processors, and processors which are invalid due to some configuration issue. If the NiFi instance is in cluster mode or is connected to a NiFi Registry for version control, we see more statistics, like how many nodes are in the cluster and how many components are up to date or out of sync with the component version available in the registry. We will see more about clustering and NiFi Registry in later videos, where you will get a better understanding of what I am referring to here. NiFi also provides stats at the individual component level: each processor and process group on the canvas shows how much data has been processed by the component in the past five minutes, as well as the number of flow files consumed and produced by the processor in the last five minutes. By default, NiFi takes a snapshot of these five-minute statistics every minute for 24 hours and keeps them in a separate repository called the status history repository.
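The snapshot frequency and retention window of the status history repository are controlled from nifi.properties; a minimal sketch with the stock defaults (one-minute snapshots kept for 24 hours equals 1440 data points):

    # conf/nifi.properties
    nifi.components.status.repository.buffer.size=1440
    nifi.components.status.snapshot.frequency=1 min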
You can view these detailed statistics by right-clicking the component and choosing the View Status History option. Please note that the default status history configuration can be changed by updating the properties shown above in the nifi.properties file. In addition to the statistics provided by each component, NiFi will also notify you when any issue of type warning or error occurs in the flow. This is called a bulletin. It looks like a sticky note and shows all the issues produced by a processor in the past five minutes. To demonstrate this, I created a simple flow with a GetFile processor configured to point to an invalid folder path. Now let's start the flow. Did you see that? We got an error, which is surfaced by the bulletin indicator. Let's mouse over the bulletin to see the details of the error or warning message. Bulletin messages come in handy and save the user a lot of the time it usually takes to filter through log files to find an issue. In cluster mode, the bulletin will also indicate which node in the cluster emitted the event. You can change the log level of the bulletins by configuring the Bulletin Level in the Settings tab of the processor configuration. We can also use the Bulletin Board option available in the top-right global menu to see all the bulletins that have occurred so far in the NiFi instance, and filter them by component name, message and so on. That's it for this video, guys. Hope you got a fair understanding of how to monitor your NiFi instance using the various options available in NiFi. Thank you, see you in the next video.

6. Monitoring NiFi using Reporting Task: Hi guys, welcome back. In this video we will see how to monitor the memory utilization and disk utilization of a NiFi instance using reporting tasks. But what is a reporting task? A reporting task in NiFi runs in the background and provides various statistics about the NiFi instance. Now let's see how to add a new reporting task. To add one, select the Controller Settings option from the global menu in the top right corner, go to the Reporting Tasks tab and click on the plus icon. This shows the list of reporting tasks available for us to use. The reporting tasks we are interested in are MonitorMemory and MonitorDiskUsage; their names are self-explanatory. MonitorMemory helps us monitor the Java heap, and MonitorDiskUsage helps us monitor the storage space available for a specified directory. We can configure them to warn us if the memory utilization or disk utilization goes beyond a particular threshold. Now let's add both of them and click on the edit icon to view the configuration. For the MonitorMemory reporting task, I will select the memory pool as G1 Old Gen and set the usage threshold to 1%. I have configured it at one percent so that I can quickly show you the warning without reaching high memory utilization; in a real-world scenario we would usually configure a higher threshold value. For the MonitorDiskUsage reporting task, I will again configure the threshold value to one percent and point the directory location to my Users folder. That's it, it's that simple. Let's start both tasks and see what happens. Did you see that?
As soon as the reporting tasks were started, we got bulletins about the memory utilization and disk utilization reaching their thresholds. Please note that reporting tasks are similar to other components in NiFi: you can configure how often they run via the Run Schedule property, and you can also provide an absolute value, like 1 GB, for the threshold property rather than a percentage. That's it for this video, guys. Thank you, see you in the next video.

7. Remote Monitoring NiFi using Reporting Task: Hi guys, welcome back. So far we have seen various ways to monitor a NiFi instance, but there is one big problem in all these approaches: it's practically impossible for someone to sit in front of a computer and watch for errors using the UI. So in this video we will see how to monitor your NiFi instance remotely using reporting tasks. NiFi provides various reporting tasks to monitor your NiFi instance remotely; the ones we will explore in this video are the Site-to-Site Bulletin Reporting Task and the Site-to-Site Metrics Reporting Task. Before we get started, let's understand what site-to-site is. Site-to-site is a protocol used to send data from one NiFi instance to another NiFi instance in a smooth, efficient and secure way. You can also use the site-to-site protocol to transmit data from any application that produces data to a NiFi instance. Now let's get started and see how to remotely monitor your NiFi instance using the site-to-site based reporting tasks. The important prerequisite for this example is that we will need two NiFi instances. Since I am working on a single machine, to depict multiple instances of NiFi I have created a copy of my NiFi folder and updated the nifi.web.http.port property in the nifi.properties file to 8081. This lets me start two NiFi instances on different ports: one using the default port 8080 and another using the new port 8081. This property also comes in handy if you want to run your NiFi instance on a port other than the default one, which is a standard security recommendation in many organisations. Next, we have to configure the NiFi instances to enable the site-to-site protocol. This is disabled by default and can be enabled by updating a few properties in the nifi.properties file; the properties we need to update are the remote input host and the remote input socket port. Since I am running both NiFi instances on the same machine, I will set the remote input host property to my local host. In a real-world scenario the NiFi instances would be on separate machines, in which case you update this value with the appropriate network IP of the corresponding machine. Next, let's use ports which are not already in use on this machine for the remote input socket port property; in my case I will use 8082 and 8083. Please note that this port is used internally by NiFi, and you can still use the port on which the NiFi instance is running to send data using the site-to-site protocol. Let me explain what I mean with a small illustration: we have two NiFi instances running on ports 8080 and 8081, and the site-to-site socket ports for them are 8082 and 8083. To communicate from the NiFi instance running on port 8080 to the NiFi instance running on port 8081, we can use the port 8081 rather than the socket port 8083. That's it.
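A sketch of the nifi.properties values described above for the two local instances; the ports are the ones used in the video, and the host is left as localhost because both instances run on the same machine:

    # First instance: conf/nifi.properties
    nifi.web.http.port=8080
    nifi.remote.input.host=localhost
    nifi.remote.input.socket.port=8082

    # Second instance (copied folder): conf/nifi.properties
    nifi.web.http.port=8081
    nifi.remote.input.host=localhost
    nifi.remote.input.socket.port=8083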
Now that we have understood what site-to-site is and how it works, we can go to the first NiFi instance and add the required reporting tasks. The first reporting task I will add is the Site-to-Site Bulletin Reporting Task. Next, let's configure it. The properties we are interested in are the Destination URL and the remote Input Port Name. I will not modify the Instance URL property, since my NiFi instance is running on port 8080. Let me update the value of the Destination URL property to the second NiFi instance's URL. Next, let's set the Input Port Name property value to "bulletin". You may wonder why I keep calling it a remote input port name when the property is simply named Input Port Name. The reason is that, for the sender to send data using the site-to-site protocol, the receiving NiFi instance needs an input port at the root level. Configuring an input port at the root level makes it a remote input port, and you can use this port to receive data from outside via the site-to-site protocol. Let's see what this means in practice. Here, in the second instance, I have added an input port at the root level and connected it to a process group called Handle Messages. If you notice, there is a small icon in the top left corner of the input port; this denotes that the input port is a remote input port. This icon only appears when you drag and drop an input port at the root level of the NiFi canvas. Now let's start the Site-to-Site Bulletin Reporting Task and see what happens. Did you see that? As soon as the reporting task was started, we got a flow file inside the second NiFi instance via the remote input port. We can explore the content of the flow file: as you can see, the bulletins from one NiFi instance were delivered to the other NiFi instance using the Site-to-Site Bulletin Reporting Task. Next, let's add another reporting task called the Site-to-Site Metrics Reporting Task. Its configuration is the same as the previous reporting task, so let me configure it; here I have given the input port name as "metrics". I have also added a new remote input port in the second NiFi instance and connected it to the same Handle Messages process group. Now let's start the Site-to-Site Metrics Reporting Task and see what happens. We got a new flow file at the metrics input port with a different set of values; to be more precise, we got various vital metrics of one NiFi instance delivered to the other NiFi instance using the Site-to-Site Metrics Reporting Task. That's it for this video, guys. Hope you got a fair understanding of how to monitor your NiFi instances remotely. This flow is not fully complete: so far we have only moved the statistics data from one NiFi instance to another instance running remotely. To make it truly remote monitoring, we would need more transformation to split the messages and notify or email the user to take action. But I guess you got the gist of it; we will try to improve this flow in a later video. Thank you, see you in the next video.
8. Data Provenance in NiFi: Hi guys, welcome back. In this video we will learn about one of the key features available in NiFi called Data Provenance. So far we have been starting one processor at a time and exploring the input of each processor and the output it produces. This approach is useful during development to debug our flow, but once you move your work to production you need a better way to debug the flow if something goes wrong. This is where the data provenance feature of NiFi comes in handy. But what is data provenance? NiFi keeps a comprehensive track of the data by recording all the events applied to a flow file, starting from its ingestion point until it's removed from the data flow. As the data is processed through the system, NiFi captures all the details, like when the data got transformed, split, routed, aggregated, or distributed to other endpoints. All this information is stored and indexed in a separate repository called the provenance repository. When your flow is running in production and suddenly it's not working as expected for one particular data set, you can use the data provenance feature to go backward or forward in the flow to see where the data came from and where it went wrong. Let's see how it works in practice. For demonstration, we will use the previously created Convert CSV to JSON flow. Let's start the flow. Since the flow has completed one full cycle successfully, let's see how to view the data provenance. You can view it by right-clicking one of the processors in the flow, or by using the Data Provenance option available in the global menu in the top right corner. As you can see, we have a detailed report on when the data got created, when its attributes or content got modified, and when the data got dropped. You can also use the Show Lineage option to see a graphical representation of the flow file's path through the data flow, and use the slider below to replay the data flow backward or forward. You can also use the View Details option, by right-clicking any node in the graph, to get a detailed summary. If the content was modified, you can view the content before and after the modification; if the attributes of the flow file were modified, you can quickly identify the changed attributes by selecting the "show modified attributes only" checkbox. Please note that initially this table is populated with the most recent 1,000 provenance events, but you can use the rich search option available in data provenance to find the exact event you want, and it helps in replaying any flow that occurred in the past 24 hours. You can change these configurations using the extensive set of settings available in the nifi.properties file. That's it for this video, guys. Hope you got a fair understanding of the practical usage of the data provenance feature in NiFi. Thank you, see you in the next video.

9. Overview on NiFi Registry: Hi guys, welcome back. In one of my previous videos about templates I briefly mentioned NiFi Registry and version control using NiFi Registry. In this video we will learn about NiFi Registry in detail. NiFi Registry is a complementary project that provides a central location for storing and managing shared resources across one or more NiFi instances. It is a separate sub-project of Apache NiFi; this means we must download it separately, and it follows its own release cycle and versioning. But why do we need NiFi Registry in the first place?
Assume you are working in a team and more than one person is working on a data flow; then version control of the flow becomes complex. For a very long time, people used NiFi templates to enable version control, and it's such a pain. The reason is that NiFi templates were never created or optimized for version control in the first place. Every time, you must manually download a template and commit your changes to another version control tool like TFS or Git. During this process, merging is going to be a nightmare, since you always get one XML file for your entire data flow, and understanding who changed what in a complex data flow template XML becomes very hard. There is also more complexity when we need to take the latest version of the template from the repository and merge it with our uncommitted local version. This is where NiFi Registry comes to the rescue. NiFi Registry provides a flow registry for storing and managing versioned data flows. It also integrates seamlessly with multiple NiFi instances by allowing them to store, retrieve and upgrade versioned flows from a registry. That's it for this video, guys. Hope you got a fair understanding of the purpose of NiFi Registry. We will dive deep into the installation and configuration of NiFi Registry with one or more NiFi instances in the following videos. Thank you, see you in the next video.

10. Installation of NiFi Registry: Hi guys, welcome back. In this video we will go through the installation of NiFi Registry. Installing NiFi Registry is very similar to installing NiFi. Here I will be installing NiFi Registry on a Mac, but you can follow along even if you're using a Windows machine or any Linux or Unix based operating system. Now let's get started and install NiFi Registry. For that, let's grab the NiFi Registry binary from the NiFi website: google for "NiFi Registry" and click on the first link; it will take us to the NiFi Registry sub-page of the NiFi website. You can find the download link under the Links section. At the time of recording this video, the latest NiFi Registry version available is 0.3.0, so we can go ahead and download it. As you can see, we have sources and binaries; let's grab the binary version, and feel free to download the file type of your choice. Since I'm using a Mac, I prefer to download the tar.gz version. I already have the file downloaded and placed in my Users/Tools folder, so let's go ahead and untar it. Now that it's untarred, let's take a look at the folder structure. The folder we are interested in is the bin folder, so let's open a terminal here and start NiFi Registry using ./nifi-registry.sh start. That's it; it will take a couple of minutes for NiFi Registry to boot up. If you're using a Windows machine, you can start the registry using the run-nifi-registry.bat file. NiFi Registry starts on the default port 18080. Let's go back to our browser and type localhost:18080/nifi-registry. Great, we can see the NiFi Registry page. That's it for this video, guys; we have successfully installed NiFi Registry on our machine. Thank you, see you in the next video.
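A condensed sketch of the installation steps above for a Unix-like machine; the version and folder match the ones mentioned in the video, so adjust them for the release you actually download:

    cd ~/Tools
    tar -xzf nifi-registry-0.3.0-bin.tar.gz
    cd nifi-registry-0.3.0
    ./bin/nifi-registry.sh start      # Windows: bin\run-nifi-registry.bat
    # after a minute or two, open http://localhost:18080/nifi-registry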
11. Configuring NiFi and NiFi Registry to Enable Version Control: Hi guys, welcome back. In this video we will go through the configuration of NiFi and NiFi Registry to enable version control of our data flows. To enable version control, we need to connect the newly installed NiFi Registry with our NiFi instance. To do so, select the Controller Settings option from the global menu in the top right corner, go to the Registry Clients tab and click on the plus icon to add the NiFi Registry details. This prompts us to enter the URL of the NiFi Registry along with a name and description. Let me set the name as Local Registry and copy-paste the same into the description field as well. Next, let's update the registry URL with the appropriate value. Please note that I have only entered the URL of the registry up to the port number and ignored the /nifi-registry path. This is very important; kindly make sure you do the same. That's it, it's that simple. We have now successfully connected our NiFi instance with the NiFi Registry, and it's ready for version control of our data flows. We can validate this by right-clicking any one of the process groups on the canvas. Did you see that? We got a new option called Version. This shows that the connection we established between NiFi and the NiFi Registry is working correctly. Please note that we can also connect multiple NiFi Registries to the same NiFi instance, and vice versa. Now let's add one of our process groups to version control by right-clicking the process group and selecting Start version control under the Version option. Please remember, only process groups can be placed under version control; individual components cannot be added separately. Now, in the Save Flow dialog, we can select the registry from the list of registries; in our case we have only one registry, Local Registry, which is already selected. Next, we must select the bucket where we want to save the flow. But hold on a minute, what is a bucket? A bucket is nothing but a logical segregation of data flows, and one bucket can have more than one flow associated with it. To put it simply, you can think of a bucket like a folder in a file system where related files are kept; in our case the files are the related flows. Since we have not created any bucket yet, let's go ahead and create one in the NiFi Registry. To create a bucket, use the settings icon in the top right corner and click on the New Bucket option. This prompts us for the name of the bucket; I will name it First Bucket and click on Create. That's it, we have successfully created the bucket. Now let's go back to the NiFi instance and try to add the same process group to the registry again. This time we can see the bucket we created a few seconds ago, and we could use the dropdown to change the bucket if we had more than one in the NiFi Registry. Let's name the flow First Flow, copy-paste the same for the description and the version comments, and click on Save. That's it, we have successfully added our process group to the NiFi Registry for version control. We can tell whether a process group is up to date or out of sync with the registry with the help of the indicator in the top left corner of the process group; here the green tick mark shows the process group is up to date. Let's go ahead and add some random components inside the process group and see what happens.
Did you see that? The green tick icon has now changed to a grey star icon. This denotes that the process group is currently out of sync with the version of the flow available in the NiFi Registry. Now, if we right-click the process group again and select Version, we get more options: we can commit the changes to the registry, revert the changes, or view the local modifications. Let me select the Show local changes option; this shows a detailed list of the changes made to the process group. Now that we have seen the local changes, let's commit them by providing some comments. That's it, it's that simple. The process group indicator went back to green, since we have committed the latest version to the NiFi Registry. That's it for this video, guys. Thank you, see you in the next video.

12. Configuring NiFi Registry with Multiple NiFi Instances: Hi guys, welcome back. In this video we will see how two or more NiFi instances can connect to a single NiFi Registry and enable a team to do version control of their data flows. Since I am working on a single machine, to depict multiple instances of NiFi I'll follow the same approach of running two copies of NiFi on two different ports. I have also started both instances using the usual startup command. Now let's access both NiFi instances from the UI. As expected, I now have two NiFi instances running on my machine on two different ports. Next, let's connect the newly created NiFi instance to the same NiFi Registry; just to remind you, you can do this via the Registry Clients option under the Controller Settings. Now that it's connected, let me add a new process group. But before providing any details, did you observe something new? We got a new cloud icon called Import. This option appears because our NiFi instance is connected to the NiFi Registry, and you can use it to import any flow from the registry into your local instance. Let's click on it and see what happens. This shows the list of NiFi Registries our NiFi instance is connected to, and the buckets and flows inside each registry. In our case, there is only one registry with one bucket and one flow under it. Let's go ahead and import it. Please note that if there are multiple versions of the same flow, you can choose the version you want to import. Now that the flow is imported, let's make some new changes to the flow from this instance and commit them. Next, let's go to the other NiFi instance and wait for some time. Did you see that? We got a warning that a newer version of the same flow is available in the registry, and it will not allow us to commit our local changes without taking the latest version. You can take the latest version by right-clicking the process group and selecting Change version under the Version option; here you can select the latest version and click on Change. That's it, it's that simple. Please note that although NiFi Registry is powerful, it's still in its early stages and is missing a lot of important features that you would expect from a typical version control tool. One such missing feature: you can't take the latest version of the flow without reverting your local changes. To put it in simpler words, you can't merge your local changes with the newer version available in the registry.
To overcome this, you can only revert your local changes, take the latest version, redo your changes and commit them again. That's it for this video, guys. Hope you got a fair understanding of using NiFi Registry for version control of your flows. Thank you, see you in the next video.

13. Configuring NiFi Registry to Enable Git Persistence: Hi guys, welcome back. So far, we have seen how to use NiFi Registry to enable version control of our flows. Please note that this persists the flow versions inside the file system where NiFi Registry is running. But you may already use other version control tools like Git or TFS to maintain your source code, and it would be great to use the same for your flow versioning as well. This is where NiFi Registry's Git persistence comes in handy. For now, NiFi Registry only supports persistence with a Git repository; we can expect support for more version control tools in later releases. Now let's see how to enable Git persistence in NiFi Registry, so that the versioned flows available in NiFi Registry are automatically saved to a Git repository. Before starting with the configuration, there are a few prerequisites. First, we need a repository created in Git, so let's create a new repository and name it nifi-flows. Next, we need a personal access token to access the Git repository remotely. To create a personal access token, go to Settings, then Developer settings, and click on Personal access tokens. Here we can use the Generate new token button to create a new token; after providing the required token description and scopes, I will generate the token. Next, I will clone the newly created Git repository under my Users/Projects folder. Now that the repository is cloned, let's update the provider configuration of NiFi Registry inside the providers.xml file. Here we will comment out the existing file-system-based flow persistence provider and uncomment the Git-based flow persistence provider. Next, let's update its properties with the appropriate values: I will point the Flow Storage Directory property to the local Git repository location, put "origin" for the Remote To Push property, and give my Git user name and the newly created access token as the values of the Remote Access User and Remote Access Password properties. Now let's create a bucket in the NiFi Registry and put one of the process groups available in the NiFi instance under version control in that bucket. That's it, our flow is persisted to the Git repository; it's that simple. We can also validate that the flow is available in Git by refreshing the UI. As you can see, a folder was created for the bucket, and the flow is available as a snapshot file inside it. Each version of the flow will be maintained as a new Git commit. That's it for this video, guys. Hope you understood how to configure your NiFi Registry to enable Git persistence. Thank you, see you in the next video.
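A sketch of the providers.xml change described above; the storage directory, user name and token are placeholders to replace with your own values, and the class name is the Git provider shipped with NiFi Registry:

    <!-- conf/providers.xml: comment out the FileSystemFlowPersistenceProvider and use the Git provider -->
    <flowPersistenceProvider>
        <class>org.apache.nifi.registry.provider.flow.git.GitFlowPersistenceProvider</class>
        <property name="Flow Storage Directory">/Users/you/Projects/nifi-flows</property>
        <property name="Remote To Push">origin</property>
        <property name="Remote Access User">your-git-username</property>
        <property name="Remote Access Password">your-personal-access-token</property>
    </flowPersistenceProvider>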
14. Overview on NiFi Clustering: Hi guys, welcome back. In this video we will have a quick overview of NiFi clustering. As a data flow manager, you may sometimes find it difficult to use one NiFi instance to process a huge amount of data. Instead, you can use multiple NiFi servers to process large data sets by segregating the data set into multiple smaller data sets and sending these smaller data sets to different servers to process them separately. But this creates a data flow management problem, because each time you want to change or update a data flow, you must make those changes on each server and then monitor each server separately. Also, segregating a big data set into multiple smaller data sets and processing them separately on different servers has its own complexity. To overcome this, we can use the clustering feature of NiFi. In NiFi, we can cluster multiple NiFi servers; each server in the cluster performs the same set of tasks on the data, but each operates on a different set of data. By clustering the NiFi servers, it's possible to have increased processing capability along with a single interface through which you can make data flow changes and monitor the data flow. In a NiFi cluster, if you make a change on one node, it automatically gets replicated to all the nodes of the cluster; also, through a single interface, the data flow manager can monitor the health and status of all the nodes. NiFi follows a zero-master clustering pattern: NiFi uses Apache ZooKeeper for cluster management, and failover is handled by ZooKeeper. If you're hearing about ZooKeeper for the first time, just remember it's an open-source server which enables highly reliable distributed coordination; many open-source systems, like Solr, use ZooKeeper for their cluster management. In a NiFi cluster, ZooKeeper elects one of the NiFi nodes as the Cluster Coordinator, and all other nodes in the cluster send status information to this node; this status information is also called a heartbeat. It's the responsibility of the Cluster Coordinator to disconnect nodes that do not emit any heartbeat status for some amount of time. Also, when a new node wants to join the cluster, it must first connect to the currently elected Cluster Coordinator in order to obtain the latest flow. If the Cluster Coordinator decides to allow the node to join the cluster, the current flow is provided to that node and the node joins the cluster. If the version of the flow configuration on the new node differs from the version on the Cluster Coordinator, the node will not be able to join the cluster. That's it for this video, guys. Hope you got a fair understanding of clustering in NiFi and how the election process works in a NiFi cluster. Thank you, see you in the next video.

15. Limitation in NiFi Clustering: Hi guys, welcome back. In this video I would like to highlight one important limitation in NiFi clustering: in a NiFi cluster, data is distributed for processing, but it's not replicated. I know this may be a little overwhelming for a few of you to understand, so let me put it differently. In a NiFi cluster, when we split a large data set into multiple smaller data sets, each node is given the smaller data set it needs to process, and the node keeps this data in its own disk storage. A copy of this data is not maintained or replicated on any other node. So if one of the nodes in the NiFi cluster goes down, the data inside that node must be handled gracefully. In a NiFi cluster, if one node goes down, we have a provision to offload that node, and the data on that node is distributed to other active nodes slowly, provided the node is still connected to the network.
But if the node completely goes out of the network for some reason, say someone pulled the network cable off that machine, the data inside that node will not be processed or distributed to other active nodes until it comes back to the network. This is the side effect of not replicating the data across the cluster, and we must live with this limitation till NiFi supports data replication across the cluster. That's it for this video, guys. Hope you got a fair understanding of clustering in NiFi and its current limitation. Thank you. See you in the next video.

16. NiFi Cluster Configuration using Embedded Zookeeper: Hi guys, welcome back. In this video, we will see how to configure a NiFi cluster using the embedded ZooKeeper. Before starting with the cluster configuration, I would like to highlight a few key points. Please be noted, each NiFi instance is bundled with a ZooKeeper instance inside it. In this video, we'll be using three NiFi instances and the three embedded ZooKeeper instances available within them. The only prerequisite here is that you should have more than one machine available to follow along, and connectivity between these machines must be enabled. That is, these machines should be able to connect with each other via internet or intranet. Here, to depict three NiFi instances, I'll not be using my local machine like I used to do previously. Instead, I'll be using three DigitalOcean droplets. If you are coming across DigitalOcean droplets for the first time, just remember it's something similar to AWS EC2. To put it simpler, I have rented commodity machines according to my required specifications, and I will be billed based on the number of hours I have used these machines. I have already created three Linux boxes in DigitalOcean and downloaded and untarred NiFi on all these machines. Now that the initial setup is done and you have understood the prerequisites, let's get started with the NiFi cluster configuration. The first configuration I will do is updating the /etc/hosts file with the corresponding server IPs along with their names. Here, I have named the NiFi nodes nifi-node-1, nifi-node-2, and nifi-node-3. You may wonder why I must name my servers rather than simply using the IP address of the corresponding servers. The reason is that it's always a best practice to refer to your server using a host name like this rather than using the actual IP address. This also helps to reuse the same configuration across various machines just by updating the host file entries, rather than changing the actual property file configuration each time the machine IP changes. Next, let's open the zookeeper.properties file inside the NiFi conf folder and add the following entries. If you're familiar with ZooKeeper, these properties will also be familiar to you. For newbies, these properties say that we will have three ZooKeeper instances, and each will have an id: 1, 2, and 3. Please remember, ZooKeeper starts by default on port 2181, and the port range mentioned here is used internally by ZooKeeper for electing leaders within the ZooKeeper cluster. To complete the ZooKeeper configuration, we need to create a file named myid inside the ZooKeeper data directory. According to the ZooKeeper configuration available here, the data directory points to a folder named zookeeper inside the state folder. So let's go ahead and create the required folders. Next, let's create a file named myid and add the ZooKeeper server id inside the same. The server id for this instance will be 1.
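To make the above concrete, here is a rough sketch of what these entries can look like on node 1. The IP addresses are placeholder values for illustration; use your own droplet IPs and host names. Note that newer ZooKeeper versions may also expect the client port appended to each server line.

    # /etc/hosts (on every machine)
    10.0.0.11  nifi-node-1
    10.0.0.12  nifi-node-2
    10.0.0.13  nifi-node-3

    # conf/zookeeper.properties (entries added for the embedded ZooKeeper ensemble)
    server.1=nifi-node-1:2888:3888
    server.2=nifi-node-2:2888:3888
    server.3=nifi-node-3:2888:3888

    # create the data directory and the myid file (id 1 on node 1, 2 on node 2, and so on)
    mkdir -p ./state/zookeeper
    echo 1 > ./state/zookeeper/myid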
Next, we can move on to the configuration of the nifi.properties file, which needs to be updated. Here, the first property we need to update is the nifi.state.management.embedded.zookeeper.start property. This property must be set to true to tell NiFi to use the embedded ZooKeeper. The next property we need to update is the nifi.zookeeper.connect.string property. This property must be set with the list of ZooKeeper server details, separated by commas. Next, let's go ahead and update the nifi.cluster.is.node property to true and the nifi.cluster.node.address property to the corresponding node name. For this node instance, the node name is nifi-node-1. You also need to set some unused port number as the value for the nifi.cluster.node.protocol.port property. I will go ahead and configure it to 8081. We also need to update the nifi.web.http.host property with the corresponding node name. That's it, we are almost done. Please remember, the NiFi nodes within the cluster communicate data with each other with the help of the site-to-site protocol, so we need to update the properties related to enabling the site-to-site protocol to complete the cluster configuration. If you remember, the properties related to the site-to-site protocol are nifi.remote.input.host and nifi.remote.input.socket.port, so let's go ahead and update the same with the appropriate values. Here, I'll configure the host with the corresponding node name and the port with another unused port, 8082. Finally, to complete the cluster configuration, we need to mimic the same configurations on the other two NiFi instances, and the corresponding ZooKeeper server id needs to be updated inside the myid file of the embedded ZooKeeper. That's it. It's that simple. Please do remember to update the /etc/hosts files of the other two Linux boxes as well. You may also have to configure the host entries on the machine you're working from, to access the remote NiFi instances seamlessly. Now that we have successfully completed the cluster configuration, we can go ahead and start all the NiFi servers. But just a minute, I forgot to highlight two more properties inside the nifi.properties file using which you can control the election process of the NiFi cluster. The properties we are interested in here are the nifi.cluster.flow.election.max.wait.time property and the nifi.cluster.flow.election.max.candidates property. Using these properties, you can control the maximum time taken for the NiFi cluster to finish the election and the number of nodes required to complete the election process. I will go ahead and set the max candidates property to 2. The max wait time decides the maximum time taken for the election, which is five minutes by default, and with the candidates property set, the election will be conducted as soon as two active NiFi instances join the cluster. Please note that setting this property is not mandatory; skipping this step will simply increase the time taken for the NiFi cluster to get created, up to five minutes.
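Putting the cluster-related entries together, nifi.properties on nifi-node-1 would look roughly like the sketch below. The port numbers and election values match what is described above; on the other two nodes only the host name values change.

    # nifi.properties (node 1 of 3)
    nifi.state.management.embedded.zookeeper.start=true
    nifi.zookeeper.connect.string=nifi-node-1:2181,nifi-node-2:2181,nifi-node-3:2181
    nifi.web.http.host=nifi-node-1
    nifi.cluster.is.node=true
    nifi.cluster.node.address=nifi-node-1
    nifi.cluster.node.protocol.port=8081
    nifi.remote.input.host=nifi-node-1
    nifi.remote.input.socket.port=8082
    # optional: speed up cluster formation once two nodes have joined
    nifi.cluster.flow.election.max.wait.time=5 mins
    nifi.cluster.flow.election.max.candidates=2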
Now that we have done all the required configurations, along with the configuration which controls the election process, let's go ahead and start all three NiFi instances and see what happens. Looks like all three NiFi instances got started and they formed a cluster. We can validate this by accessing one of the NiFi instances available as part of the cluster, using its appropriate host name with port 8080. Did you see that? The NiFi instance booted up, and we can confirm it is in a cluster with the help of the new metrics available in the top-left corner. These metrics depict that our NiFi cluster has three NiFi nodes and all are active. That's it for this video, guys. Hope you have understood the steps required for NiFi cluster configuration using the embedded ZooKeeper. Thank you. See you in the next video.

17. NiFi Cluster Configuration using External Zookeeper: Hi guys, welcome back. In the previous video, we configured a NiFi cluster using the embedded ZooKeeper. This is perfectly fine to do, and it can be used in production. The only drawback in this approach is that both NiFi and ZooKeeper run on the same server, and if the server goes down, we lose the NiFi and ZooKeeper instances at the same time. So it's always better to go with a NiFi cluster setup using an external ZooKeeper, so that we can run the NiFi and ZooKeeper instances on different machines. Now let's go ahead and start the configuration for the same. We can continue to use the same setup we have used before. The only change we have to make here in the nifi.properties file is to change the embedded ZooKeeper start property to false on all three NiFi instances. This tells NiFi not to use the embedded ZooKeeper and its corresponding properties. Next, let's go ahead and download the latest ZooKeeper version from the ZooKeeper official website. Here, I have already downloaded and untarred ZooKeeper on my virtual machines. So let's go ahead and configure the ZooKeeper instance by renaming the existing zoo_sample.cfg file to zoo.cfg. Please be noted, this file is available as part of the conf folder of the ZooKeeper server. If you explore the config file, it has the same set of properties that we saw inside the zookeeper.properties file of NiFi in the previous video. Here, the main difference is that the data directory of ZooKeeper points to a different location, so let's go ahead and create a folder named zookeeper inside the tmp folder and add the myid file inside it, with the appropriate ZooKeeper server id. Also, don't forget to update the zoo.cfg file with the ZooKeeper ensemble-related properties. That's it. It's that simple.
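As a rough sketch, the external ZooKeeper setup described above can look like the following on the first machine. The host names simply reuse the /etc/hosts entries from the embedded setup and are assumptions for illustration.

    # nifi.properties on every NiFi node: stop using the embedded ZooKeeper
    nifi.state.management.embedded.zookeeper.start=false

    # conf/zoo.cfg of the external ZooKeeper (ensemble entries added to the sample config)
    dataDir=/tmp/zookeeper
    clientPort=2181
    server.1=nifi-node-1:2888:3888
    server.2=nifi-node-2:2888:3888
    server.3=nifi-node-3:2888:3888

    # create the data directory, set the server id, and start ZooKeeper
    mkdir -p /tmp/zookeeper
    echo 1 > /tmp/zookeeper/myid
    ./bin/zkServer.sh start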
Now that we have done all the required configuration, we can go ahead and start all the ZooKeeper instances using the ./zkServer.sh start command. As you can see, all the ZooKeeper servers have started successfully, so we can go ahead and start the NiFi instances next. Now that it's completed, we can go ahead and access a NiFi instance using the host name of one of the servers in the cluster. Did you see that? The NiFi instance booted up successfully, and it looks the same as before. Here, the only difference is that cluster management is done by an external ZooKeeper cluster rather than the ZooKeeper embedded within NiFi. This helps to run the NiFi cluster and the ZooKeeper cluster on different sets of machines if required. That's it for this video, guys. Hope you understood the steps required for NiFi cluster configuration using an external ZooKeeper cluster. Thank you. See you in the next video.

18. Overview on NiFi Custom Processor: Hi guys, welcome back. In this video, we will learn about custom processors in NiFi. As you know, Apache NiFi comes with a set of processors for your data ingestion, data sink, or data transformation requirements. The latest version of NiFi has around 280+ processors bundled, and each has its own responsibility. NiFi has a processor for almost anything you need when you typically work with data. But still, there will be situations where you will not be able to use any of the in-built processors which come bundled with NiFi to cater to your requirements. This is where the custom processors of NiFi come in handy. NiFi provides Maven archetypes to create custom processors or custom controller services which are compatible and easy to include as part of our data flow. But whenever someone tells you that you can extend their tool, you get tons of questions regarding how easy it's going to be and how you can migrate this custom code when a new version of the tool arrives. But trust me, NiFi provides the easiest way to create custom processors. Think for a second: what is a processor, actually? A processor in NiFi takes an input FlowFile, does some processing on top of it, and produces an output FlowFile. It can have some properties using which you can configure the way the processor processes the input data. That's it. The FlowFile abstraction of NiFi makes it so simple and easy to create custom processors. Still, you may wonder how and where to start. This is where the Maven archetype comes to the rescue. The NiFi custom processor archetype makes our life easy by auto-generating the code required for us to get started. The auto-generated code has everything we need, and all we have to do is write our custom Java code inside it to process the input FlowFile and produce the required output FlowFile. I know this will be a little overwhelming for some, but bear with me for some time. Once we jump into the practical example, it will all start to make perfect sense. That's it for this video, guys. Hope you got a fair understanding of custom processors in NiFi. Please be noted, I created the examples of custom processors using Java with Maven, and I expect you to have some working experience with Maven projects to follow along. Kindly reach out to me if you need any help in setting up the development environment. Thank you. See you in the next video.

19. Our First Custom Processor: Hi guys, welcome back. In this video, we will see how to create a custom processor in NiFi using the Maven archetype. I assume you already have a working Maven setup on your machine and an IDE of your choice to work with Maven projects. So let's get started. I'll be using a Mac, and the IDE of my choice is Eclipse, but you can follow along even if you're using a different operating system and IDE. You just need to have your Java and Maven set up to continue. First, let me open my terminal inside my Users/Projects folder and execute the command mvn archetype:generate. This lists all the various types of Maven projects you can create using Maven archetypes. From this vast list, identifying the right NiFi project is challenging, so I will go ahead and type nifi. This filters the NiFi projects from the whole list. As you can see here, we have two types of NiFi projects: a custom NiFi processor and a custom NiFi controller service. Let's select the custom NiFi processor project by providing the appropriate number for the same. This will prompt me to select the NiFi version I'm using. I will go ahead and select 1.8.0. Next, let's go ahead and provide the required Maven properties. If you are familiar with working with Maven, you'll be familiar with these properties.
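As a rough sketch, the interactive session can look like this. The group id, artifact id, and base name below are placeholder values, not the ones used in the video; use whatever suits your project.

    mvn archetype:generate
    # at the filter prompt, type: nifi
    # pick: org.apache.nifi:nifi-processor-bundle-archetype
    # choose the archetype version matching your NiFi version, e.g. 1.8.0
    # then supply the Maven properties, for example:
    #   groupId: com.example.nifi
    #   artifactId: nifi-passthrough-bundle
    #   version: 1.0-SNAPSHOT
    #   artifactBaseName: passthrough
    #   package: com.example.nifi.processors.passthrough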
If you are new, kindly mimic what I'm doing. That's it. Using the Maven archetype, we have successfully created our first custom NiFi processor. It's that simple. Next, let's go ahead and import this Maven project into our IDE. Now that the project is imported, we can go ahead and take a look at the files which got generated for us. Here, the two essential files we're interested in are the Java file and the property file inside the processors folder. The property file holds the fully qualified class name of the Java file. This will come in handy if you want to rename the Java file. Please be noted, this file acts as the starting point for the processor. In a complex custom processor, you can add more packages and files to modularize your code. Now let's go ahead and rename the Java file as PassThrough. Since the file name is now changed, I will update the same inside the property file. Next, let's explore the code inside the Java file. As you can see here, we have some basic code stubbed out for us to use. This is the main advantage of using the Maven archetype: we get all the required code to get started, and we can follow this coding pattern to evolve the project. Now let's go ahead and understand the usage of each line of code here. The property named MY_PROPERTY is the input property which this processor requires, and the relationship named MY_RELATIONSHIP is the relationship type this processor supports. Inside the init method, these properties and relationships are set to the appropriate class variables. The next method we're interested in here is the onTrigger method. The onTrigger method is called whenever a new FlowFile reaches the processor. Currently, this method is not doing anything, so I will just add one line of code to complete it. The code I have added simply sends the input FlowFile to MY_RELATIONSHIP without doing any processing. I know what you're thinking: we're not doing anything with the FlowFile in this processor, then what's the point, right? The primary objective of this example is to understand the overall flow of custom processors. We will create another example in the next video, where we will dive deep into more details on how to create more input properties for our custom processor, how to validate an input property, and how to add or remove a relationship as per our use case. More importantly, we will see how to manipulate the content or attributes of the FlowFile.
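For reference, a minimal sketch of that change, assuming the generated class was renamed to PassThrough and the archetype's MY_RELATIONSHIP constant is left as-is:

    // inside the PassThrough class generated by the archetype
    @Override
    public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
        FlowFile flowFile = session.get();   // take the next FlowFile from the incoming queue
        if (flowFile == null) {
            return;                          // nothing queued for this trigger
        }
        // the single line added in the video: route the FlowFile, untouched, to MY_RELATIONSHIP
        session.transfer(flowFile, MY_RELATIONSHIP);
    }

The session.get() null check is already part of the generated stub; only the session.transfer(...) line is new. After building the bundle with mvn clean install, the NAR typically appears under the nar module's target folder and can be copied into NiFi's lib directory as described below.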
For now, let's go ahead and build this project. Now that the build is completed, we can deploy the NAR file which got generated. But hold on a minute, what is a NAR? A NAR is a NiFi archive file, and it's very similar to a WAR or JAR file. We can deploy a NAR file in NiFi similar to the deployment of WAR files in Tomcat or any other application server. To deploy the NAR, we can go ahead and copy the NAR file which got generated into the NiFi lib folder and restart the NiFi server. Now that the server has restarted, let's go ahead and search for the newly created custom processor in the processor list. Did you see that? The processor we created got added along with the other built-in NiFi processors. Now let's go ahead and add it to the canvas. Also, let's add our usual GenerateFlowFile processor and LogAttribute processor. Here, I configure the GenerateFlowFile processor to generate some random data every five seconds, and the same will reach the LogAttribute processor via our custom PassThrough processor. We can also see the properties and relationships of this processor, which got auto-generated during our Maven project creation. That's it for this video, guys. Hope you got a fair understanding of how to create a custom processor in NiFi and how to build it and include it in our data flow. Thank you. See you in the next video.