Anonymization with ARX | Hackademy _ | Skillshare

Lessons in This Class

11 Lessons (54m)
    • 1. De-identification software

    • 2. Random Data Generator

    • 3. Create a new project and load generated dataset

    • 4. Configuration and testing of anonymization/privacy models

    • 5. Further settings and tweaks

    • 6. Lessons resume

    • 7. Anonymization through synthetic generation of data

    • 8. Insufficient identification

    • 9. External Data Source linking

    • 10. Reversal of Pseudonyms

    • 11. Conclusion

About This Class

Anonymise datasets with ARX software.

Meet Your Teacher


Hackademy _

In the true sense of the word, hacking is about exploring: understanding how things work and how we can change them to make them operate the way we want.


1. De-identification software: We have seen quite some theory and techniques for anonymizing data. Now we are going to apply them using the ARX software. It is an open source tool for transforming structured personal data using selected methods from the broad areas of data anonymization and statistical disclosure control. The software has been used in a variety of contexts, including commercial big data analytics platforms, research projects, clinical trial data sharing and training. It is able to handle large data sets on commodity hardware, and it features an intuitive cross-platform graphical user interface. You can find it at the link in the description, where you can download it. It is available for Windows, Mac and Linux, although performance will depend a bit on the hardware you are using. If you are using a low-spec computer, you might as well load a smaller data set, because the larger the data set, the more hardware the software will require. As we have seen, it is an anonymization tool, so you need to download and install your copy of the software, because we are going to use it in the next practical assignment.

2. Random Data Generator: In order to anonymize data, we need a data set to perform the anonymization on, and we are going to use the databases from Mockaroo. As you probably know by now, a database is a collection of data that can have multiple inputs and a myriad of fields, and the richer the information, the more valuable the data set will be. Since we are looking for a database that has multiple fields and different types of data, the solution was found on Mockaroo's website.
This website allows users to generate synthetic data for free. Even though it describes itself with "Mockaroo lets you generate up to 1,000 rows of realistic test data", we can quickly see that some of the information does not fit right. For example, when we generate the default database and check a few things, all the emails are a concatenation of the first and second name, and most of the email domains are really suspicious. They come from URLs like tinyurl.com, Dropbox and so on, and there are no domains such as Gmail or yahoo.com, which are among the most used email services. Also, the names don't fit the countries: all the subjects from China have Western names, and names such as Eddie or Billy are listed with female gender. More indicators could be listed as highly suspicious in the data set that Mockaroo gives us. Still, this list of reasons already shows that there is something wrong with this synthetic data set, or at least that it does not look real and is clearly synthetic. The bottom line is that the anonymization process adapts to the data content we are using, and we cannot simply copy the anonymity models and run them on another data set. We must adapt and configure them in order to obtain consistent results.

These are the data set values that we are going to generate on Mockaroo. We are going to have an ID, and this is an identifying attribute. Then we have the first name and last name. These are quasi-identifying, because if you have the first name and then you also get the last name, you will easily identify the subject. The email is also mostly unique to one subject. Then we have the gender and the country, which are also quasi-identifying, meaning that the country alone does not identify a subject.
However, if you combine it with the gender, you get a higher chance of identifying who the subject is. There are also the IP address, the MAC address and the Bitcoin address. Those are identifying attributes. The classification comes from the basis that identifying attributes are associated with a high risk of re-identification: the ID field, for instance, narrows a record down to a specific user. The quasi-identifying attributes, as we have seen, are also associated with a high risk of re-identification. In our case scenario, the gender narrows the possibility of identification by one half. The sensitive attributes include properties with which individuals are not willing to be linked. The insensitive attributes are not associated with any privacy risks, and they will be kept unmodified. One might say that these classifications aren't precise. For instance, a user can change his or her computer's MAC address, and it is not technically complicated. However, our analysis is based on the premise that the data we are generating is accurate and corresponds directly to what the users have and use.

Before we end this slide, I'll show you the website. The link is in the description, and this is where we'll set up our data. As you can see, you just have to change or add the attributes over here in the fields and set the number of rows. You can also download the data set from the lesson files. Then you download the data, and Mockaroo will generate the synthetic data for you. As you have seen, this is the synthetic data. If you look closely at it, you'll see that there are some things that just do not fit right, like really strange names. However, we want this for some experiments, so for that purpose it is good enough. In the next lesson, we are going to create a new project in ARX, import our data set and set the identifying classifications.
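To make the point about combining quasi-identifiers concrete, here is a minimal Python sketch. It is not part of ARX or Mockaroo: the sample rows, the column names and the helper functions are all invented for illustration. It simply measures what fraction of records becomes unique as more columns are combined.

```python
from collections import Counter

def group_sizes(rows, columns):
    """Count how many records share each combination of values in `columns`."""
    return Counter(tuple(row[c] for c in columns) for row in rows)

def unique_fraction(rows, columns):
    """Fraction of records whose combination of `columns` appears only once."""
    sizes = group_sizes(rows, columns)
    unique = sum(1 for row in rows if sizes[tuple(row[c] for c in columns)] == 1)
    return unique / len(rows)

# Hypothetical sample, loosely shaped like the generated data set.
SAMPLE = [
    {"first_name": "Ana", "gender": "Female", "country": "Portugal"},
    {"first_name": "Rui", "gender": "Male", "country": "Portugal"},
    {"first_name": "Li", "gender": "Female", "country": "China"},
    {"first_name": "Wei", "gender": "Male", "country": "China"},
    {"first_name": "Ana", "gender": "Female", "country": "China"},
]

if __name__ == "__main__":
    for columns in (["country"], ["country", "gender"],
                    ["country", "gender", "first_name"]):
        print(columns, unique_fraction(SAMPLE, columns))
```

In this toy sample no record is unique by country alone, but every record is unique once country, gender and first name are combined, which is the reason each of those columns is treated as quasi-identifying.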
As we have said, you can also download this data set from the project files.

3. Create a new project and load generated dataset: Open the ARX software. This is the first look at the interface and, as you can see, it is pretty simple. Even though it performs some pretty complicated operations, the interface is not really hard to understand. We will have our data shown here, and we'll set the types for the columns over here. So let's start. The first thing is to go to File, New project, and call it "first project". You can call it as you wish. Now we are going to import the data set that we generated online on Mockaroo. To do this, we go to File, Import data. As you can see, there is more than one type of file that we can import: CSV, Excel and also databases through JDBC. Go for CSV, since that is what Mockaroo gave us, a CSV file. Click Next and browse for the file. Mine is this one; this is the new file, and later we'll have another one with the anonymized data. Open it. ARX already gives us some information, and we'll just leave the default values as they are. Click Finish. Depending on the size of the data set, it will take more or less time to import all the information. As you can see, since ours is not that big (it is the maximum allowed by Mockaroo), it already shows the information.

The next step is to give a type to the columns that we are using. As you might remember, we need to choose between the types identifying, quasi-identifying, sensitive or insensitive. Let's start with the ID. Each user has a specific ID. This type of data is identifying, because if an attacker knows that ID, he will surely know who the user he is looking for is.
So it's identifying. Now for the first name. With only the first name we might not be able to identify the user, so we can say it is quasi-identifying. The same goes for the last name. Now for the email. Emails, even under the GDPR, are considered personal data, so we can say it is identifying. As for the gender, it is quasi-identifying. The gender alone does not give you the user, but it cuts the options by half, so we set it to quasi-identifying. Now the IP address. This is a bit subjective, because the IP is not really attributed to you and it can be reassigned. We could consider it quasi-identifying, but we can say it is more like a sensitive attribute, so we set it as sensitive. For the MAC address we can also say sensitive, because we can change the MAC address. Even though it is something that we don't do often, it is something that we are able to do, so we'll set it as a sensitive attribute. As for the Bitcoin address, we are going to select identifying. We are not always changing a Bitcoin address, at least not if we have some bitcoin on it. And the country will stay as quasi-identifying.

Just to do a little recap: the identifying attributes are associated with a high risk of re-identification, as you can see, and here we have the ID field, which points to a specific user. The quasi-identifying attributes are also associated with a high risk of re-identification. In our case scenario, the gender narrows the possibilities by half. The sensitive attributes are properties with which individuals are not willing to be linked, so these are the attributes we want to keep secret. As for the insensitive attributes, we have none of those beyond the default ones, and they are not associated with any privacy risks.
They will be kept unmodified. As you have seen, these classifications can be discussed, but from what I've seen they are the most suitable ones to apply to these columns. Feel free to change them, because they will also have some effect on the results that you get from the anonymization process.

4. Configuration and testing of anonymization/privacy models: Now that we have set types for our columns, we are going to apply some privacy models. To anonymize, you click on the green check button. But as you can see, it says it cannot anonymize because there is no privacy model for the sensitive attributes, meaning that each sensitive attribute must have a privacy model. Here those are the MAC address and the IP address. So, under Privacy models, click the plus sign, and for the IP address choose l-diversity. Then choose the same model for the MAC address. Now we click on the green check button, and you'll see that Explore results already shows some information over here. This is good, but not perfect, because it is showing red, and we are looking for the green boxes, not the red ones. If you check over here, you can see there is no output data. However, if you go back, right-click and choose Apply transformation, you can see that the data has been anonymized. Nevertheless, it is way too anonymized. We just got asterisks. Even though we can say this data is secure, with no re-identification risk, its usefulness is almost none, and there is not much we could do with it. So what we need to do is apply generalization hierarchies to the other columns in order to output data that is anonymized, but not so heavily anonymized.
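The idea of a hierarchy can be pictured outside of ARX with a small Python sketch. This is only an illustration under assumptions: the function names, the interval widths and the masking direction are invented here and are not ARX's API or its defaults. Each level of a hierarchy is a coarser version of the same value, ending in full suppression.

```python
def interval_level(value, width):
    """Generalize a number to the half-open interval of the given width."""
    low = (value // width) * width
    return f"[{low}, {low + width})"

def mask_level(text, visible):
    """Keep the first `visible` characters and mask the rest with '*'."""
    visible = max(0, min(visible, len(text)))
    return text[:visible] + "*" * (len(text) - visible)

def id_hierarchy(value, widths=(10, 100)):
    """Levels for a numeric ID: exact value, growing intervals, then '*'."""
    return [str(value)] + [interval_level(value, w) for w in widths] + ["*"]

def name_hierarchy(text):
    """Levels for a text field: mask one more character at each level."""
    return [mask_level(text, n) for n in range(len(text), -1, -1)]

if __name__ == "__main__":
    print(id_hierarchy(137))      # exact value, two interval levels, '*'
    print(name_hierarchy("Ana"))  # from the full name down to all asterisks
```

A tool can then pick, per column, the lowest level that still satisfies the chosen privacy model, which is why having more levels gives it more room to keep the output useful.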
Okay, I'll show it. Simply click on the ID, then on Edit, Create hierarchy. Since the ID is a number, it is an interval, so we go for the first option, intervals, and click Next. What you set here is a bit arbitrary. Right-click and go for Add new level. You can add it after or before; do it as you wish, because the goal here is just to make some hierarchy. You can also create another one with Add new level. The more levels the better, because afterwards you can change them. It is also a good idea to change the range of the values so the data gets more anonymized. On the next page you can see these levels, and then Finish. So that is it for the ID. Now, for the first name, click on the first name, then Edit, Create hierarchy. Over here we are using masking, and we are leaving the default settings so as not to complicate this. We'll do the same for all the remaining fields. We are just going to skip the IP address and the MAC address, because they already have a privacy model. Now for the email: Create hierarchy and just click Next, Next, Next. Then the gender, the Bitcoin address and, last but not least, the country.

Now, if we go back and click the check button again, you can see that Explore results shows more options. We can now select among more options to anonymize the data. As you can see, we still have something that looks a little bit like gibberish. If we click over here, the best option is mostly the top one, although, as we have seen, it also depends on the output that you are looking for. Right-click on it, choose Apply transformation and wait a bit. As you can see over here on Analyze risk, the risk is still really low.
That means the data is heavily anonymized. Depending on the goal, we want the risk to be more or less low. Over here, as you can see, this is the original data set and this is the anonymized data set. In order to make it more useful, you can go over here, adjust the levels a little, and click on the green check again. You can also play with the suppression limit. This is also an important value, so increase it as well. By clicking on the check, you can see in Explore results that we now have way more options than we previously had, because we changed these values and also the levels. Now, if you right-click over here and choose Apply transformation, you can see that the values start to show differences. We cannot perceive what the gender is, and we have the first letter of the name. The values have been rearranged. As for Analyze risk, you can see it has increased a little bit. Nevertheless, the data set is more useful, because as it was, it was just a string of asterisks, which is no good. So the risk has increased slightly, but the data will be more useful for what we are looking for. My advice to you is to explore this a little bit. In the end, the settings that you use here will depend on the output that you are trying to obtain.

5. Further settings and tweaks: Besides the models already set, we can apply some other models to our anonymization process. You can do it simply by going to Privacy models and clicking on the plus sign. As you can see, you also have the k-anonymity, k-map and δ-presence models. The most used are k-anonymity and k-map. Here I set the value to 10 and select it. You can also play around a bit with the suppression limit.
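How k-anonymity and the suppression limit interact can be sketched in a few lines of Python. This is a simplified illustration, not how ARX implements it: the function name, the sample data and the limit values used in the usage lines are assumptions. Records whose quasi-identifier combination occurs fewer than k times are dropped, and the result is rejected if that would suppress more than the allowed fraction of records.

```python
from collections import Counter

def k_anonymize_by_suppression(rows, quasi_identifiers, k, suppression_limit):
    """Drop records in groups smaller than k.

    Returns the kept records, or None when more than `suppression_limit`
    (a fraction between 0 and 1) of the records would have to be dropped.
    """
    def key(row):
        return tuple(row[c] for c in quasi_identifiers)

    sizes = Counter(key(row) for row in rows)
    kept = [row for row in rows if sizes[key(row)] >= k]
    suppressed = len(rows) - len(kept)
    if suppressed > suppression_limit * len(rows):
        return None  # limit too strict for this k: generalize more instead
    return kept

# Hypothetical generalized records: 4 + 5 + 1 records in three groups.
ROWS = (
    [{"gender": "Female", "country": "Portugal"}] * 4
    + [{"gender": "Male", "country": "Portugal"}] * 5
    + [{"gender": "Female", "country": "China"}]
)

if __name__ == "__main__":
    qi = ["gender", "country"]
    print(k_anonymize_by_suppression(ROWS, qi, k=2, suppression_limit=0.07))
    print(len(k_anonymize_by_suppression(ROWS, qi, k=2, suppression_limit=0.2)))
```

With the stricter limit the single outlier cannot be suppressed, so the call returns None. With the looser limit it is dropped and the remaining nine records are 2-anonymous on the two columns.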
The higher the suppression limit, the less utility is left in the data set. So we want a good ratio over here, and then it is just a matter of trying it out and seeing what fits. The main goal is to arrive at an acceptable level of re-identification risk. As we have previously seen on the earlier slides, there are different types of re-identification risk for the prosecutor, journalist and marketer attacker models. Over here we can also choose the k-map model. Note that this is not the Karnaugh map from digital logic, despite the similar name: k-map is a variant of k-anonymity in which each record must match at least k individuals in a larger population table rather than in the data set itself. It proved to be a good solution for this data set, so we select k-map as well. Also, some settings were tweaked in order to produce a more comprehensive final result. On the Configuration, Transformation, General settings tab, the suppression limit was set to 7%. 10% is also a good value. You are more than welcome to try other settings and share your results.

6. Lessons resume: The selected privacy models and configurations output an interesting result. Even though the names are replaced by wildcards, this data could be used to compare values of the gender, IP addresses and MAC addresses with no practical risk of de-anonymization. Besides all the applied techniques, we must take social engineering into consideration. If an attacker knows the IP address or the MAC address, the subject will be easily identifiable. Information is secure only as long as we make it so. On Analyze risk over here, we have the following output, and there are two main columns that we need to differentiate: distinction and separation. Distinction is the degree to which the variable makes the records distinct. Then there is separation.
Separation is the degree to which combinations of the variable's values separate the records. The lower the results, the more anonymized the data set is. Nevertheless, the last name field does not show a significant variation. This is due to the fact that it is a unique value per record that directly identifies the user, which arguably makes it an identifying field. That covers the name fields. There are some further anonymizations that we can do on such a data set, given its characteristics. Other privacy models were applied, as we have previously seen over here. You can also change these values in order to try to reach the best possible output. The conclusion is that there is no secret recipe for anonymizing data. It will always depend on the utility and the goal of the anonymized output. If the data is shared between two secret agencies, they could choose to only slightly anonymize the data: it is okay to share information with your allies, but keep your citizens' data private and anonymous. On the other side, if we are giving away data to the world, as in the Netflix situation, it is best to make serious efforts to secure the data and the user information. ARX's ease of use and simple interface allow novice users to anonymize data. Nevertheless, this can give the user a false sense of security and lead them to publish sensitive data. As we know, no system is 100% secure, and the risk is always present.

7. Anonymization through synthetic generation of data: Now imagine that you have some data that you want to anonymize. One way that we can do it is to inject synthetic data in order to anonymize a real data set with sensitive attributes. I'll just do a quick check on the index. There is this quick introduction. We'll also see some data generation services, such as Mockaroo, which we have already seen.
Then we'll analyze the synthetic data injection, and we'll check the anonymity benchmark and a conclusion. The goal here is to generate a set of values that match the original data set and follow the same pattern and structure as the original data set. Then we are going to replace some random values in the original data set with values from the data set that was generated synthetically. At the end, we run the ARX benchmark to see if there was any change in the results regarding the attacks that someone can perform.

As we have seen, Mockaroo allows us to generate 1,000 rows of realistic test data. We can export the data to CSV, JSON, SQL and Excel formats, and we can generate some data without having to register on the website. You don't need any programming skills in order to acquire this data, so it is not technical, it is straightforward, and it is also free. However, as we have seen, there are some issues, because it gives us data that is not likely to correspond to the truth. For instance, one of the rows says that the first name is Rice while the country is China, so the data that we are getting is a bit suspicious. On the other hand, it is free and has a generous index of field types. We can also use regular expressions to create our own field types, and there is also a free API with which the user can mock a back end. If you are writing an application and you need some data, you can use their API. If you check the website, you can see it has a really easy and simple graphical interface for getting the data.

On the other hand, we have Databake. It is also free, and it allows you to generate a few more rows than Mockaroo. However, from my experience, it doesn't always work.
If you have more than 1,000 rows, the loading will take a lot of time, and at the end you will not have any output or a data set to download. Maybe it was just my bad luck, who knows. Its strong point is the locale definitions, because they allow you to set names from a specific region and avoid having Western names with Asian nationalities, as we have seen with Rice from China. Nevertheless, we can only export the data in CSV format, so it is a bit limited over here.

At the beginning, we needed a data set with some identifiable values. We are using the Stack Overflow Developer Survey from 2017. Every year Stack Overflow conducts a massive survey of people on the site, covering all sorts of information, so it is quite a big data set that we are checking. You can also download this data set. The original data set is so big that I even deleted some rows and columns, because it was taking too much time for my computer to process all the data. I chose this data set because of its structure and content. It also has some sensitive fields, and IT is a subject in which I am particularly interested.

The first goal is to use Mockaroo to make a data set like the original one, because we want to mimic the data set into which we are going to inject synthetic data. We start by creating all these fields that you can see on the screen. Most of them use the default properties. However, for some specific columns, like the formal education one with values such as college or primary school, the fields had to be configured in order to select between those values. Now for Databake and tailoring it to a specific data set, consider the locale definitions.
The locale definitions are a blessing in Databake, since it is easier to set them. There are also more options regarding the distribution settings that we can choose, so it is not just random like Mockaroo. There is also the chance to create relationships between the columns. So at first Databake might look a little better than Mockaroo. However, it does not allow us to import external files with data. Imagine that you want to import a set of options, like the ones mentioned for the college field. You cannot import that data into Databake, so you have to insert the values one by one. Also, we cannot register on the site, so you need to save the link, and if you lose the link, your efforts will be lost. I also never managed to get more than 1,000 rows, because the export took too long and in the end gave us nothing. The 1,000-row export also takes a bit longer than on Mockaroo.

Now for ARX. These are the labels that I set for each of the columns: identifying, sensitive and quasi-identifying attributes. Identifying attributes, as we have seen, are associated with a high risk of re-identification, so we are talking here about ID cards, specific numbers or emails. The quasi-identifying attributes are also associated with a high risk of re-identification. The first name is quasi-identifying and the last name is quasi-identifying too, meaning that if we have both the first name and the last name, it will be really easy to identify the person. Sensitive attributes are properties with which individuals are not willing to be linked. We did use ARX to anonymize data before, but this time we are just going to perform the utility and risk benchmark and compare the results. This is for the original data set. As you can see, these are the values that I am getting. Next come the Mockaroo results.
What I did was copy the values that I had in the synthetic data set and paste them into the original data set. The goal is to compare the result with the benchmark results that I had previously. As expected, the utility and risk benchmark did not differ much. However, I did notice a major problem over here. When we have free-text fields, where the user can type whatever he or she wants, it is almost impossible to synthesize them. Take the salary, for example. As you can see, some people just wrote a big number, and others even included decimal places. We cannot imagine or plan what people are going to write. Also, the automatic job field from Databake gave us jobs such as nurse, journalist and so on. These are not connected to IT, while our data set is about IT. So if you have a geologist working in IT, it looks fake. And when we talk about Databake, as we have seen, we cannot import CSV files, and the lack of configuration does not compensate for the number of rows that we can generate when it works. Nevertheless, the utility result is the same as in the previous benchmark, meaning that the injection of synthetic data did not produce any change in the benchmark.

Just to sum up: the injection of synthetic data will help us significantly to reduce the risk of re-identification, because one part of the values will be synthetic, so they are not real and cannot represent a real danger. Nevertheless, we need to tailor the synthetic data to the original data set, and sometimes this can be really hard or even impossible to do.
It will also vary according to the original data set. With the injection of synthetic data, we also notice that some values, since they are synthetic and not real, can look highly suspicious. Also remember that if we are injecting synthetic data, the output of a statistical analysis will not be as accurate as if we only had the real data. Since we are injecting fake data, we cannot expect a real output from the analysis that we want to perform. The conclusion is that a possible solution would be a tool that analyzes a set of data and automatically creates patterns by learning how to generate similar data. With this, we might have a tool that learns from the real data set and generates synthetic data from there. From what I have searched, there is no such tool, at least for the time being, but maybe in time this tool will appear.

9. External Data Source linking: As we have seen, what sometimes causes data to be de-anonymized and exposed is an attacker connecting it to another data set and other information. We have the practical case of the Massachusetts governor's medical records from 1997, where the Massachusetts Group Insurance Commission made publicly available a set of de-identified medical information with the objective of promoting improvements in health care and the control of the respective costs. When we talk about the sensitive information in the database, we are talking about the birth date, gender, hospitalization dates, diagnoses, results of examinations and also the costs incurred. The commission, with the support of Governor William Weld, assured that patients' privacy was safeguarded, since personal identifiers had been eliminated.
However, a researcher, Latanya Sweeney, argued that much of this medical information could be easily re-identified based only on three quasi-identifiers: date of birth, sex and ZIP code. She crossed the medical information with the Cambridge, Massachusetts voter list, which she purchased for $20, and with this she managed to re-identify the governor's medical information. Only six people shared the governor's date of birth, of whom only three were male, and the last attribute allowed her to complete the re-identification, since of the three potential candidates only the governor lived in a certain postal code. After 2003, the release of this medical information would not be allowed, as it would violate HIPAA.

There is also the case of the Greek database regarding the prescription of medicines, where they replaced the names of the prescribing doctors with random IDs and applied various anonymization techniques, namely k-anonymity and differential privacy. These techniques were applied to several variables, including generalizations about the type of drugs prescribed and the medical specialty of the prescriber. Since the data also indicated the hospital, in some cases it was possible to identify the prescriber through his Facebook page, where he reported with some precision the days when he was on vacation. As you can see, by crossing that information with the vacation schedule of each of the hospitals, the re-identification would be even more successful.

10. Reversal of Pseudonyms: For the last type of attack, we have the reversal of pseudonyms. We have this case scenario regarding the New York City taxis, where some public data that was supposedly anonymized was made available. It concerned the taxi journeys made in the city.
The data featured hexadecimal characters in some attributes, resulting from the application of an MD5 hash function. However, this MD5 function, applied to the taxi code and the taxi driver's license number, was easily reversible using a brute-force attack. In this case, about 22 million calculations would be sufficient to obtain all possible combinations, consequently allowing the entire database to be recovered. This means that just because a value is hashed or encrypted, it does not follow that it is irreversible. Even though a hash function is by definition unidirectional, meaning that it can only be computed one way, that does not imply that it is irreversible in practice. In fact, no hash function resists reversal when the set of possible inputs is small enough to enumerate. For example, take a tax identification number. It consists of nine digits, of which the last digit is the control digit, so there is a maximum of 10^8 possible combinations. If these attributes were encoded using the SHA-1 hash algorithm, and assuming that GPU usage allows a rate of 400 million hashes per second, all combinations of the number would be calculated in less than one second. Nevertheless, one way to make this type of pseudonymization safer is to adopt a hash algorithm with a dynamic salt option and to combine a private key with the representation of the contents of the record itself. It will also depend on the algorithm that you are trying to use, because protection is only as strong as the algorithm behind it. Just because you have an encrypted value does not mean that it is secure, because the algorithm also needs to be strong in order to protect that data.

11. Conclusion: To sum up the things that we have seen, I would like you to retain mostly these four points.
Even when the risk of re-identification is considered residual, there will always be a slight risk. We must always suppose that the attacker has more information on the subjects than what we are trying to anonymize. Also, if it is not required to maintain the veracity of the data, pseudonymization may be an appropriate option, allowing preservation of the original structure of the dataset. This applies to datasets that we might have in our company: it is best to anonymize them. Also, the result of the anonymization process will not always be considered acceptable under the GDPR per se. Procedures, including security controls, should be implemented to ensure confidentiality in the processing of personal data. These apply to companies that operate in the European Union and will change according to the situation. Finally, effective anonymization means finding the correct balance between the risk of re-identification, veracity, usefulness and data quality. And this sums up the best of the anonymization and de-identification techniques that we got to see. Okay, so I would like, of course, to thank you for your time and for acquiring this course. There is a little challenge over here: I advise you to copy this text and try to decrypt it on this site. And I hope you enjoyed all these lessons.
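
A footnote to lesson 10, for anyone who wants to try the brute-force argument at home. The sketch below is a minimal Python illustration, not the tool used in the course. It hashes a short numeric identifier with unsalted SHA-1, recovers it by trying every candidate, and then repeats the search against a keyed hash when the key is not known. The function names, the 5-digit demo space (chosen so it runs instantly; the tax number in the lesson has 10^8 candidates) and the key values are assumptions made for illustration. HMAC-SHA256 is used here as one standard keyed-hash construction, since the lesson does not name a specific one.

```python
import hashlib
import hmac
from typing import Callable, Optional


def sha1_pseudonym(identifier: str) -> str:
    """Unsalted SHA-1 pseudonym: the weak scheme discussed in lesson 10."""
    return hashlib.sha1(identifier.encode("ascii")).hexdigest()


def keyed_pseudonym(identifier: str, secret_key: bytes) -> str:
    """HMAC-SHA256 pseudonym: cannot be recomputed without the secret key."""
    return hmac.new(secret_key, identifier.encode("ascii"), hashlib.sha256).hexdigest()


def brute_force(target: str, digits: int,
                pseudonymize: Callable[[str], str]) -> Optional[str]:
    """Try every zero-padded number of the given length until one matches."""
    for n in range(10 ** digits):
        candidate = str(n).zfill(digits)
        if pseudonymize(candidate) == target:
            return candidate
    return None


# Hypothetical 5-digit identifier, so the search space is only 10**5 values.
secret_id = "04217"

# Case 1: unsalted hash. The attacker only needs to know the algorithm.
leaked = sha1_pseudonym(secret_id)
recovered = brute_force(leaked, 5, sha1_pseudonym)
print("unsalted SHA-1 reversed to:", recovered)

# Case 2: keyed hash. The attacker guesses the algorithm but not the key.
real_key = b"hypothetical-secret-key"      # assumption: kept private by the data owner
protected = keyed_pseudonym(secret_id, real_key)
guess = brute_force(protected, 5, lambda s: keyed_pseudonym(s, b"wrong-guess"))
print("keyed hash reversed without the key:", guess)
```

The keyed version is only as safe as the key: anyone who obtains `real_key` can run the same enumeration and recover every identifier, so the key has to be stored separately from the published data.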