Web Forms 2021 - Part 3: What is URL Encoding and why does it matter for Web Forms? | Clyde Matthew | Skillshare

Clyde Matthew, !false | funny, because its true

Lessons in This Class

9 Lessons (59m)
    • 1. Class Introduction

    • 2. URL encoding

    • 3. What is a URL?

    • 4. What is hex?

    • 5. How does URL Encoding work?

    • 6. International characters

    • 7. How do spaces work?

    • 8. JavaScript encoding function

    • 9. URL Encoding - Class Outro

About This Class


What do we cover in this particular class?

We will cover a ton of information in this entire series, but for Part 3 I'm going to teach you about URL Encoding.

Web browsers request pages from web servers by using a URL (the URL is the address of a web page, like: https://www.google.com).

But in order for data to be transported over the internet, certain rules need to be followed. Specifically, URL Encoding needs to take place.

This class is all about URL Encoding. URL Encoding is a mechanism for translating unprintable or special characters into a format that is universally accepted by web servers and browsers. The encoding can be applied to Uniform Resource Names (URNs), Uniform Resource Identifiers (URIs) and Uniform Resource Locators (URLs): selected characters in the URL are replaced by one or more character triplets, each consisting of the percent character and two hexadecimal digits.

In layman's terms, URL encoding converts characters into a format that can be transmitted over the Internet. Another nuance is that, as per RFC 3986, every character in a URL must come from the defined set of reserved and unreserved ASCII characters. But since URLs often contain characters outside the ASCII set, the URL has to be converted into a valid ASCII format.

In order for URL encoding to take place, a two-step process is usually followed. First, the character string is converted into a byte sequence using UTF-8 encoding. Second, each byte that does not represent an allowed ASCII character is converted to “%HH”, where HH is the hexadecimal value of the replaced byte. The result contains only ASCII characters, so it can be transmitted safely over the internet.
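The two-step process can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions: the function name `percentEncode` is hypothetical, it encodes everything outside the RFC 3986 unreserved set, and it relies on the `TextEncoder` API found in modern browsers and Node.js. It is not code from the class.

```javascript
// Unreserved characters per RFC 3986: letters, digits, "-", ".", "_" and "~".
const UNRESERVED = /^[A-Za-z0-9\-._~]$/;

// Hypothetical helper, for illustration only.
function percentEncode(text) {
  // Step 1: convert the string into its UTF-8 byte sequence.
  const bytes = new TextEncoder().encode(text);
  let out = "";
  for (const byte of bytes) {
    const ch = String.fromCharCode(byte);
    if (UNRESERVED.test(ch)) {
      // Unreserved ASCII characters pass through unchanged.
      out += ch;
    } else {
      // Step 2: write the byte as "%" followed by two uppercase hex digits.
      out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(percentEncode("a b")); // a%20b
console.log(percentEncode("é"));   // %C3%A9 (two UTF-8 bytes, two triplets)
```

In practice the built-in `encodeURIComponent` does this job. It differs slightly from the sketch because it also leaves a few extra characters, such as `!` and `*`, unencoded.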

Why is this all important?

Well, URL encoding is important for users and servers to be able to correctly interpret and retrieve URLs. Incorrect URLs can result in a high number of error codes. Each error code, in turn, can be interpreted by search engines as poor maintenance of the website.

It’s also important because when you’re dealing with forms, and specifically the GET request, you will see characters appended to the URL. By taking this class you’ll understand what all the ‘seemingly’ random characters mean.

So what are you waiting for? 


--- in case you're wondering, this entire series is about web forms. 


A web form is also known as an HTML form. It’s a place where users can enter data that’s then sent to a server for processing. Web forms allow users to place orders on your site, to provide their name and email address to sign up for a newsletter, or to sign up as a member to your site and so forth.

What’s really great about web forms is that there is no “one size fits all”. You can use your artistic flair and personal business acumen to create web forms with a particular length, format, content type and appearance.

By doing this course, you’ll be able to improve your web form usability, which will ultimately enhance user experience and get website visitors excited about completing your form and converting.

Why is this course so important?

  • Forms which are on point present an opportunity for a company to grow and capture loyalty.

  • A form can often be both a marketing tool and a necessity. A form that puts the user at ease, that evokes feelings of trust, will get filled out far more often than a form which looks (or is) complicated and confusing.

  • After completing this entire Skillshare series, you will be knowledgeable, confident and the “go-to” person for forms.

Let me share my form building skills with you

Understanding how forms work will equip you to become an awesome front-end programmer. Learn how to implement your creative, different and dynamic ideas into your website. Master forms and you’re half way to becoming a full stack web developer. Take control through understanding. Delivering a perfect, interactive and functional form is challenging. In this series, I take a deep-dive into explaining web forms and how they work. Why do we need to wrap our form within <form> tags? How can you include a progress bar in a form? How can you customize a toggle or checkbox? Where does the data go once a user clicks on the submit button? How can you perform validation on your form? How can a user upload a file? What happens once the data arrives at the server? What are the different types of events we can listen to on forms? By understanding these questions, and many more in the course, you will be able to build advanced forms and better yet, understand the full stack process! In other words, you will be able to create dynamic forms that improve user engagement and experience.


This series is huge and comprehensive, from basics to advanced

This entire series (which I've split into multiple classes) will give you solid fundamentals and practicals regarding forms. It can be taken alone (you don’t need to do any other course) to achieve your goals. You will emerge from this course with an advanced understanding and practical experience with building forms. It will take you to the point where you will understand what method (GET or POST) to use when sending form data, how to define where the data goes, how to perform advanced client-side validation (checking errors on the form before it is sent to the server), how to write custom pattern validation rules (using regular expressions), how to run servers and how to view all HTTP request information. This is awesome knowledge. This series will captivate you and catapult you to the next level and set you well on your way to becoming a true Grandmaster in front-end form web development.

By the end of this series, you'll be able to “speak” and “walk” FORMS by gaining an understanding of how you can build it, manipulate it and style it in meaningful and practical ways. 

Why should you learn so much about forms?

A web form is one of the best ways to get input from prospective clients and indirectly establish a relationship with them. The time you spend in bringing the user to your website should be matched with the time spent in perfecting the user experience with your web forms. It is surprising to see so many sites these days contain complex and frustrating web forms that cause a negative experience.

Ultimately, a web form allows your visitors to get in contact with you and to send information, such as an order, a catalogue request, or even a query, which is passed on to your database.

Can you begin to see how important forms are and how their use can be escalated?

*** The most important course on FORMS on Skillshare***

Successful programmers know more than rote learning a few lines of code. They also know the fundamentals of how HTML code works behind the scenes. This is particularly true when it comes to building forms. If you’re wanting to become a full stack developer, you need to know, really know, how forms work. You need to understand how the browser URL encodes form data, how the browser sends data to a URL that you specify, and how to perform validation to ensure the user does not submit invalid data.

A unique approach

You will learn "why" things work and not just "how". Understanding advanced topics about forms (URL encoding, accept-charset, multipart/form-data, regex, etc) is important as it will give you infinite possibilities and upskill you. Armed with this knowledge, you’ll be able to create forms that are tailored to your needs, and allow the form data to get submitted to a server via AJAX. You will be able to create forms that are customizable by the user (for example, if the user wants to change the color of the form theme). You can create a control on a form that displays the progress completion of the form and displays messages to the user along the way.

Can you begin to see how pivotal forms are and how important having knowledge about forms is?

How is this Skillshare course different?

There are lots of courses on Skillshare that focus on web development. Many never get into the detail of how HTML forms work behind the scenes, a skill that every full-stack developer needs to master in order to reach their full potential.

In this series, I focus on the more advanced topics of true web development when it comes to forms. This includes understanding what all of the attributes on the <form> element mean, understanding the different <input> types in detail, what URL encoding is, and how data is sent over the wire to a server.

Practice makes perfect

Theory is theory … but there’s nothing like getting behind your computer and typing in code. That’s why we will be coding, laughing and pulling out our hair together as we code real life websites and exercises during this entire series.

Is this course for you?

Absolutely. If you fit in any of these categories then this course is perfect for you:

Student #1: You want to advance in the world of programming.

Student #2: You want to know how successful developers build dynamic forms that engage with the user and achieve high conversions that put them ahead of the competition.

Student #3: You want to gain a solid understanding of how forms really work.

Student #4: You want to start using backend technologies like Node or PHP with forms.


Right this second, your competitors are learning how to become better web developers.

Web development is a blazing hot topic at the moment, and in the foreseeable future. But you have a distinct advantage. This course offers memorable learning topics, actionable tactics and real-world examples.

Let's get started.

See you in the lectures.

Meet Your Teacher

Clyde Matthew

!false | funny, because its true


Ideas are a dime a dozen. The hard part is execution. Unfortunately, most people never carry tasks to completion.


I've worn many hats in my career …  As a result, I have an ability to view all sides of a coin, something that is becoming crucial in our tech-savvy world.  


My experience and a few words:


·        I’ve had to learn things the hard way (aka: hard slog)

·        I want to teach people what I’ve learnt, with the hope of making a meaningful impact (cliché, but true)

·        No one is a master of everything. But at the same time...


1. Class Introduction: Welcome back to yet another class in this forms series. We are learning all about forms, and it's been a big series so far. In previous classes we discussed the form element, the form wrapper. But in this class, I want to discuss a very advanced concept when it comes to forms, and that is URL encoding.

So what exactly is this class going to cover? Well, I'm going to give you a complete overview of why URLs look the way they do when you submit data via a GET request. Remember, on the form wrapper we have a method attribute, and when we set it to GET, all the data in the form is appended to the URL when the user clicks the submit button. And the URL, in order to transfer data over the wire, has to abide by certain rules: RFC 3986, to be specific. Why is this important? Well, it's important because the W3C, the organization that publishes the standards browsers follow, has accepted that specification. It has accepted certain rules that browsers need to follow, and I'm going to be explaining them in this class.

So it's going to be really, really exciting. I'm going to be talking about why the URL can only contain certain characters. I'm going to be explaining the difference between reserved and unreserved characters. I'm going to be looking at why you sometimes see a percent sign followed by two hex digits in the URL itself. I'm going to be mentioning Punycode and a whole bunch more. So it really is going to be an interesting class. And it's for everyone, that's the good news. If you're an experienced coder, great: you're going to learn something new, I assure you. And if you've never even heard of URL encoding, well, this class is perfect for you too, because I really do start at the basics and explain everything as we go along. Right, see you now.

2. URL encoding: Before we move on, I just want to clarify something, and that is that URL encoding can get quite complex. On the face of it, it seems like it should be really simple.
The specs that govern URLs state that non-ASCII characters have to be URL encoded. Simple enough. But watch what happens. You can see behind me I've got a very simple form, nothing special. All I want to do is grab some Japanese text, so I'm just on a random Japanese text website. Let's take the English word "Hi", copy the Japanese text, go back to our form and paste it in. What would you expect to happen if I click Submit? Well, as I mentioned in the previous lecture, we would expect URL encoding to take place. So in theory, we should not see these characters in the URL bar. We should see a percent sign followed by two hex digits, right? Wrong. Look at that: we see those Japanese symbols in the address bar itself. Weird, right?

Well, it is weird, but don't panic. That's exactly what I'm here for. In upcoming lectures we're going to be talking about why we see these so-called international characters showing up in the address bar. It really is fascinating. At first it's quite daunting, because there should just be one body that governs URLs; that would just make sense. Of course, that's not the case in reality. There are different organizations that always think they can do better than the others, and this results in a lot of confusion. Even browsers themselves have to decide what to implement in the address bar: what should they allow users to see? It really is confusing. But don't worry, because in the next lectures I'm going to be shedding a lot of light on how this all works. By the end of it, you'll be a better programmer, and you'll know exactly why those characters show up the way they do. Anyway, adios, see you in the next lecture.

3. What is a URL?: Okay, as I promised, let's get into URL encoding. Before I get into the nitty-gritty, let's just take a step back and understand why it's necessary in the first place. We know that computers can only deal in numbers. In fact, they can only deal in electrical impulses.
Something is either there or it's not. In other words, it's binary. At the end of the day, it's a one or a zero. But we know that in our language there are a lot more than just two values, so we use combinations of numbers to represent letters and characters.

If that's the case, let's look at an example. Let's say that my computer uses the following character map: for A, B, C and D it assigns the numbers 1, 2, 3 and 4. Pretty simple. But now let's say your computer uses a different character map, so that for A it assigns the number 5, for B the number 6, for C the number 7, and D is represented by the number 8. Would this pose a problem? Well, let's say I wanted to send you the message "Hi". The numbers according to my character set would be 8 and 9, and those numbers would go across the wire to your machine. But for your computer, 8 and 9 represent the letters D and E. They don't represent H and I like on mine. So your machine is going to decode the message as "DE", which is nothing like "Hi". Can you see the problem we have? Sure you can. It just means that two computers won't be able to speak to each other. So to communicate effectively, we need a standard way of encoding characters to numbers. And of course, we've looked at a few. We've looked at ASCII, ISO, UTF-8, et cetera. These are just different encoding types that have been developed over the years.

"Okay, Clyde, but this course isn't about encoding, it's about forms." Okay, well, let's talk about encoding in the context of forms. We know that when data in an HTML form is submitted, the names and values are sent to the server using either the GET or the POST method. By default, it's the GET method. Anyway, let's look at a very simple form, just asking for a user's name and password. And one of my courses wouldn't be complete if I didn't incorporate Wally into everything. So here we go. Let's say the name of the user is Wally the [inaudible] and his password is secret1.
And as I mentioned, the default method on a form is GET. So let's just assume for now, because we're talking about URL encoding, that we're using the GET method on our form. What's going to happen when the user clicks Submit? That's right, we're going to get this very long URL string. Some of those characters, the actual "Wally", for example, make sense. That's also in the URL. But what's useful and interesting to us as developers is all these seemingly random characters: the question mark, the equals sign, the plus, the ampersand. What is all this, and why are they there?

Well, my dear students, in order to understand this, you need to know a little bit about URLs themselves. So let's take a very simple URL. Nothing too crazy about that, is there? But let's break it up. We can break it up into four parts. Can you see which four parts?

Firstly, we've got what's known as the scheme. It always comes before the colon and the two forward slashes, and it tells the web client, the web browser, how to access the resource. In this case, it's telling the web client to use the Hypertext Transfer Protocol. In other words, it's telling the browser to use HTTP to make the request for the resource. But we don't only have HTTP. We have other schemes like FTP, mailto and git, although the one we mostly use is HTTP. So that's the first part of this URL.

The second part is this www-dot-com part, which is referred to as the host. You can think of this as telling the browser where the resource is hosted or located.

The next part of the URL is what's known as the path. The path is optional, and it basically just digs deeper: it tells the browser which local resource is being requested.

And then finally, we've got this funky thing on the end that begins with a question mark. What is that? That's referred to as the query string.
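To make the four parts concrete, here is a small sketch that builds the kind of URL a GET form produces and then pulls it apart again. The helper name `buildGetUrl`, the address `http://www.example.com/login` and the field values are all assumptions made up for illustration; only the standard `URL` and `URLSearchParams` APIs are real.

```javascript
// Hypothetical helper: mimics what a browser does when a form with
// method="get" is submitted. The action URL and field values are assumptions.
function buildGetUrl(action, fields) {
  const url = new URL(action);
  // URLSearchParams serializes name=value pairs, joins them with "&",
  // and writes each space as "+".
  url.search = new URLSearchParams(fields).toString();
  return url;
}

const url = buildGetUrl("http://www.example.com/login", {
  name: "Wally Walrus",
  password: "secret1",
});

console.log(url.href);     // http://www.example.com/login?name=Wally+Walrus&password=secret1
console.log(url.protocol); // http:  (the scheme, with its trailing colon)
console.log(url.host);     // www.example.com
console.log(url.pathname); // /login
console.log(url.search);   // ?name=Wally+Walrus&password=secret1
```

The question mark starts the query string, each equals sign separates a field name from its value, the ampersand separates the fields, and the plus stands in for the space in the name.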
We're going to be talking more about this later, and you'll see it in some of our code. A query string is just made up of query parameters, and it's used to send data to the server. Of course, when you're not sending data, you don't need it, so this part of the URL is also optional.

Okay, so that's a very high-level overview of a URL. As I mentioned in the previous lecture, just as there are specifications for HTML and JavaScript, there are also specs for working with URIs, or URLs. One of the major specs that we're going to be looking at is the one published by the Internet Engineering Task Force in a document called RFC 3986, and also, in fact, RFC 3987. I don't want to jump ahead of myself. We're going to be talking in more detail about these shortly. But for now, just know that URLs are designed to only accept certain characters, not all characters.

Well, if they can only accept certain characters, what are they? Historically, only US-ASCII characters were allowed to be in the URL. But this poses a massive problem, because often a URL contains characters outside of the ASCII character set. And sometimes special characters are going to be used, characters like spaces, tabs, et cetera. When these characters are used, the browser needs a way to convert them to a valid ASCII format in the URL itself, because the spec says only certain characters are allowed. And my dear students, this is exactly what URL encoding, aka percent encoding, is all about: the process of converting the characters of a URL so that they can be safely transmitted over the Internet.

So don't get lost in all the detail. This lecture was very high-level, just showing you why we need to have encoding in the first place. We need a standard way for machines to talk to each other over the web, and we've got the specs that define how a URL should be constructed. Lump all of those together.
And we've got URL encoding to ensure that all the rules are being followed. So that's what it's all about. But enough of introductions. Let's get into the meat.

4. What is hex?: Hey, I'm super, super amped, because now we're going to be getting into some of the meaty stuff when it comes to URLs. The first thing I want to say is that URLs can understand hexadecimal values. "What does that mean, Clyde?" Well, it just means that any character typed into a URL can be replaced with the hexadecimal encoding for that character. For example, if we type googl%65.com, that can be used to get to google.com, because 65 is the hex encoding for the letter e. In other words, when you type that into the browser, the browser is going to swap the %65 for the character e, and it's going to send that on. I know, that's so weird, right?

Don't believe me? Let's hop onto the browser and let me prove to you that a URL can understand both hex and ASCII characters. They kind of go hand in hand. You've got hex on the one hand, and you've got ASCII on the other. And although they're two separate things, the URL understands both hex and ASCII. Let me make this a bit bigger, and let's just type Google in the address bar. But now, instead of writing e, I want to write %65, as I just said in the lecture. If we do that and I hit Return, the browser automatically converts the hex value into a character, an ASCII character to be specific. Got it? Let's jump back to the lecture.

There, you saw it in action. I'm not making these things up. But most people would just move on from here, and I don't want to. "What is hex?" you may be thinking. That is a very good question. It's very, very important to understand the fundamentals before you start getting more advanced in your programming career. And don't get intimidated, because hexadecimal is just a way of writing values.
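The %65 swap can also be reproduced outside the address bar. This is a minimal sketch using JavaScript's built-in `decodeURIComponent`; the variable names are just for illustration.

```javascript
// "%65" is the percent sign followed by the hex value 65.
const decoded = decodeURIComponent("googl%65");
console.log(decoded); // google

// Hex 65 is the character code of the lowercase letter "e".
const letter = String.fromCharCode(0x65);
console.log(letter); // e

// Going the other way: the code of "e" written in hex.
const hexCode = "e".charCodeAt(0).toString(16);
console.log(hexCode); // 65
```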
In fact, it's no different from the numeral system we use every day, and that is decimal, or base 10. It's great for us, and it was designed that way because we've got 10 fingers, so it makes sense that it's base 10. With decimal, we can use 10 unique digits to represent large values or small values. And I bet you can already tell me what those 10 unique digits are: 0 to 9. It's base 10: ten unique digits, and they can be combined to represent huge numbers and teeny-weeny little values. Okay, got it.

Similar to decimal, hex just combines digits to create large and small numbers. Well then, how is it different to decimal? It doesn't use only 10 digits. Hex is more powerful: it uses a set of 16 unique digits. Which digits does it use? Well, it uses 0 to 9, just like our fingers, just like decimal, and that gives ten unique digits. But over and above decimal, hex also uses the letters A through F, which gives another six unique digits. Combine those and we've got hex, which is base 16.

"Clyde, it's base 16, but what does that really mean? I still don't get it." Okay, let me try and show you in a slightly different way. Let me throw all of this off the screen and start from a blank slate. Let's say we've got a decimal system and a hex system, and we want to represent 16 unique values. How can we do that with decimal? Very simply, by writing the numbers 0 all the way up to 15. That represents 16 unique values, right? But the last six values, 10 to 15, use repeated digits: each is just a 1 followed by a 0, 1, 2, 3, 4 or 5. We've used all those digits before in the first 10 values. And that's why this is known as base 10: we never use more than 10 unique digits. What about hex? As we just said, it's base 16, and it uses the characters A through F as well.
But you'll notice now that these are not repeated digits. They are six brand-new digits, and this is why hex is known as base 16.

Okay, cool, that makes sense, but how do we count in hex? Well, counting in hex is very similar to how we count in decimal. Once a digit becomes greater than F, the last digit in hex, you start at 0 again and increment the digit to the left by one. Does this sound confusing? Don't worry, it really isn't that bad. Let me show you.

So here we've got two tables, decimal and hexadecimal. Let's count up to seven. How do we do that in hex? It's got exactly the same digits, right? So nothing changes. Now count from 8 to 15. How is that going to work in hex? I'm sure you can already figure it out. We've got 8 and 9, which are exactly the same as decimal, and then, as we just said, we get A through F. All is good and well. But how do we write 16, for example? It's not that difficult. 16 in decimal is just the digits 1 and 6. With hex, we start all over again, but we increment the left digit by one. So we literally have a 1, and then we start from scratch with the first digit in hex, zero. So 16 in decimal is 10 in hex. And if we want to represent the number 17, we keep the left digit the same and just keep incrementing the right one, so 17 is 11 in hex. Pretty straightforward, and we just keep going on and on.

"I'm starting to understand it, but how do I know whether a certain number I see on the screen is hex or decimal? How do I know what it is when I'm looking at it?" That's a very, very good question. Let me pose it in another way. Take the characters on the screen: am I trying to represent a decimal number, like 74, or a hexadecimal one? Well, to avoid confusing situations like this, you'll often see a hexadecimal number prefix.
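The counting rule is easy to check with JavaScript's built-in base conversion. This is a small sketch; the helper name `toHex` is hypothetical and simply wraps `Number.prototype.toString(16)`.

```javascript
// Hypothetical helper: write a non-negative integer in base 16.
const toHex = (n) => n.toString(16).toUpperCase();

// Count from 0 to 17 in decimal and hex side by side.
for (let n = 0; n <= 17; n++) {
  console.log(n, toHex(n));
}

console.log(toHex(15)); // F
console.log(toHex(16)); // 10  (roll over: left digit becomes 1, right restarts at 0)
console.log(toHex(17)); // 11

// parseInt with radix 16 reads a hex string back into a number.
console.log(parseInt("11", 16)); // 17
```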
And that prefix is kind of like a clue to us, or to software, that we are talking about hex numbers. What's really confusing is that we don't have just one prefix: different software uses different prefixes. I know it's irritating, but let me give you a few of the most popular ones I can think of. Firstly, and you'll definitely know this one, the hash symbol. What follows it is just hex, and these are color references in HTML. If you've coded any web page, you'll have used these a ton. Okay, what about another one? 0x is a common prefix in Unix and C-based programming languages. And we get a whole bunch more. Literally, we get tons: we get &#x, which represents a Unicode character, we get \x, we get h, et cetera. I don't want to get into them all now. The most important one to concentrate on is the one relating to forms, because this course is all about forms, and sometimes when we deal with forms we have to deal with URLs. So what is that prefix? That prefix is the percent sign. The percent sign is used in URLs to express different characters in hex. I hope this is starting to make sense. Don't worry if you're not quite sure how it looks in the URL; we're going to be looking at examples shortly.

But just taking a step back, there are many different numeral systems out there. We know that binary is the language of computers. Binary uses two digits, zeros and ones, to represent values. We just looked at other numeral systems: we looked at decimal and we looked at hex, which is often used in programming. I don't know why, it's just the way it is. And the community behind URLs, the web, decided that URLs should be based on hexadecimal. If there's nothing else you've learned in this lecture other than this, that's okay, because it is such a good thing to know that URLs are based on hexadecimal.
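Several of these prefixes can be seen side by side in JavaScript. The sketch below is only an illustration: the value hex 4A (decimal 74, the letter J) and the color `#ff8800` are arbitrary choices, not values taken from the lecture.

```javascript
// 0x prefix: a hex literal in JavaScript source code.
const fromLiteral = 0x4a;
console.log(fromLiteral); // 74

// # prefix: a CSS/HTML color. The first two hex digits are the red channel.
const red = parseInt("#ff8800".slice(1, 3), 16);
console.log(red); // 255

// \x prefix: a hex escape inside a string literal.
const fromEscape = "\x4A";
console.log(fromEscape); // J

// % prefix: the one that matters for URLs.
const fromUrl = decodeURIComponent("%4A");
console.log(fromUrl); // J
```

All four notations describe hex digits. Only the percent form is meaningful inside a URL.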
That's why we could replace the %65 with the letter e, for example, and the browser did that automatically. It can speak one and the same language. Anyway, now that I've introduced you to hex, I think we can move on and start getting into more complex topics.

5. How does URL Encoding work?: All right, I now want us to start talking more about the actual URL encoding itself. Firstly, what is and what is not a legal URL is formally defined in RFC 3986. But as we're going to see a bit later, there are others. Why is this so important? Well, it's important because the W3C has accepted this as being the main spec that should govern URLs.

What are the main conclusions we can draw from the spec? It's not that difficult. There are just two broad categories of characters: reserved characters and unreserved characters. Reserved characters have a special meaning. They are reserved, right? Like a forward slash, or a question mark that represents the start of a query string. We've already seen some examples of these. So those are the reserved characters, and they have to be treated in a slightly different way, as you would expect. On the flip side, we've got the unreserved characters. These have no special meaning, and because they have no special meaning, they are allowed to be in the URI itself.

Let me just get rid of all this noise. As I mentioned, we've got reserved characters and unreserved characters, and reserved characters are very special. The specification has chosen these characters to mean something very specific when it comes to URLs. Because of this, if you want to use a reserved character in your URL as ordinary data rather than for its special meaning, it has to be encoded. For example, we know that a question mark is the start of a query string. So if you want to use a question mark somewhere in your URL, how does the browser know whether that's a query string or just your character? That's why it has to be encoded.
Makes sense? And reserved characters, like I mentioned, are traditionally a limited subset of the ASCII character set, but I've read the spec and, in my opinion, the spec does not explicitly state that this list is entirely exhaustive. I'm saying that a bit tongue in cheek, because behind the scenes there may still be URL encoding. But visually, talking about what the user sees, that is not the case. Don't get lost in all the detail, though. Encoding is particularly important for characters that are not permitted to be in a URL. That's all that URL encoding is trying to do: take non-conforming characters and transform them in a way that's safe to transmit over the web.

So what are these special characters? Spaces, percent signs, tabs, colons, the equals sign, and a whole bunch of others. These need to be treated very specifically in the URL. If we use them in a URL, they have to be encoded to distinguish them from the reserved set itself. Is that kind of making sense? I hope so. But that's not to say you don't have other questions. You might be asking, why does the URL not permit certain characters in the first place? Why can't we just have whatever we want? Well, your browser is just trying to make sure that all the characters you want to send with a GET request can arrive at the other end, at the destination. The browser has to URL encode some characters, like unprintable ones, spaces for example. And as I just mentioned, it has to encode characters with special meaning, because if it doesn't, how is it going to know whether that question mark starts a query string or whether it's just a question mark from your side? So it logically makes sense that the URL cannot permit all characters. In other words, it makes logical sense that URL encoding takes place in some situations.

Let me clarify here, my dear students: encryption is not the same as encoding. URL encoding is all about sending your data over a network.
It's about transport. It doesn't make your data safe in any way. Just wanted to clarify that. Cool. Have you got it? Good. But now, how does URL encoding actually take place? What does it do? URL encoding replaces non-conforming characters with the percentage symbol, followed by two hexadecimal digits. But what is this non-conforming character? What do I mean by non-conforming? Well, it's those reserved characters we were just talking about, isn't it? If you want to use a question mark in your URL, it has to be URL encoded because it'll be non-conforming. And what is that percentage symbol? What's that all about? Well, from a previous lecture, you'll know that it's just the prefix that says a hex value follows. And after that percentage symbol, we have two hexadecimal digits. Hexadecimal? Well, the good news is we spoke about that in the previous lecture. So it should all start kind of coming together right now. You should start understanding the structure, what these things mean.

And before we move on, I just want to discuss one of the most frequent URL encoded characters you'll come across. And that is the space character. The space character's quite special because it's unprintable, and therefore it makes sense that it has to be URL encoded. Well, the ASCII value of a space in decimal form is 32. But we don't care about decimal form, do we? Because URLs only understand hex, and in hex, the value 20 is assigned to the space character. Another one we'll look at, actually, is the plus sign as well. That's a common one you're going to see. And that is represented by 2B in ASCII. So often in URLs you're going to be seeing %20s and %2Bs. Whew! We are cruising through this. And I know it seems quite daunting, but don't feel overwhelmed. In fact, the sets of reserved and unreserved characters have changed with each revision of the specs that govern URLs. So it can be very confusing.
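The "percentage symbol followed by two hex digits" rule can be worked through by hand. The sketch below is only an illustration: `percentEncodeAscii` is a made-up helper that handles a single ASCII character, not something shown in the lecture.

```javascript
// Hypothetical helper: percent-encode one ASCII character by hand.
function percentEncodeAscii(ch) {
  const code = ch.charCodeAt(0);                // e.g. " " is 32 in decimal
  const hex = code.toString(16).toUpperCase();  // 32 in decimal is "20" in hex
  return "%" + hex.padStart(2, "0");            // always two hex digits
}

console.log(percentEncodeAscii(" ")); // "%20"
console.log(percentEncodeAscii("+")); // "%2B"
console.log(percentEncodeAscii("?")); // "%3F"

// The built-in function arrives at the same values:
console.log(encodeURIComponent(" +?")); // "%20%2B%3F"
```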
But don't worry. Once you grasp the fundamentals of URL encoding, then it really doesn't matter what's reserved or unreserved. Who cares? Because we know exactly what's happening and we can always adapt as developers. All right, so I hope it's starting to gel, I hope it's starting to make sense. But in the next lecture I really wanna jump into these international characters. Remember that example where we used the Japanese characters and we could see them in the URL? Well, how's that possible, when RFC 3986 defines URLs as only containing a limited subset of ASCII characters? It's sort of weird. Well, the good news is we're going to jump into it right now.

6. International characters: Welcome back to yet another awesome lecture on URL encoding. And I don't want to harp on too much about URL encoding, for the reason that this course is about forms. But it's good to know, because sometimes in forms, especially with the GET request, characters are appended to the URL and it's going to be doing funky things sometimes. That's why we've kind of taken this tangent to learn about URL encoding, and this lecture in particular is quite advanced. There's a ton of information I want to get through, so please forgive me if I'm going too fast. If anything's unclear, please ask in the Q&A. But let me ask you this. What is a web address? What is it used for? That's right. A web address is used to point to a resource on the web, such as a webpage. You can think of a web address as directions. It tells your browser where to go to fetch a resource. And currently, web addresses are expressed, they're defined, they're written using Uniform Resource Identifiers, or URIs. And we've already seen that these URIs are governed by certain rules. And these rules are defined in a document called RFC 3986. And the long and the short of it is that a URI is defined as a sequence of characters chosen from the ASCII character set. And the key phrase here is the ASCII character set.
This essentially restricts web addresses to a small number of characters: basically just upper and lowercase letters of the English alphabet, European numerals, and a small number of symbols. Well, as I'm sure you can agree with me, my dear students, times have changed, and users' expectations and the use of the Internet have moved on since then. And there is now a growing need to enable use of characters from any language in web addresses. Why? Well, a web address in your own language and alphabet is just easier to create, memorize, transcribe, interpret, guess, and relate to. So it doesn't really make sense to restrict web addresses to the US ASCII character set, does it? Well, no, it doesn't.

I wish things were simple. But unfortunately, when it comes to coding and development, things are complex. And there is not one unified spec for URLs or URIs. Many different places, different organizations have attempted to write rules on how they should be governed. I don't want to get into all these different specs and what these organizations say. The point I'm trying to make is that over the years there have been lots of changes. What kind of changes, Clyde? That's a good question. Originally, and I'm talking way back now, in the 90s, everything was defined as a URL. Everything that you wrote in an address bar was a Uniform Resource Locator. But then the term URL was later changed to become a URI, and in the 2000s came RFC 3986. Remember, 3986 is the one that defines a URI. Then RFC 3987 defined an IRI and said that IRIs can be used instead of URIs. But do you notice we're still left with separate definitions? There's a URI and there's an IRI. Don't worry, I'm going to be talking more about that shortly. But for now, note that this RFC 3987 is important. It's important because the W3C has accepted this spec, which means browsers are expected to conform to it. And remember I said that other organizations have attempted to define URLs? One of them is the WHATWG.
They've produced the URL spec, basically mixing ideas from URIs, URLs and IRIs with a strong focus on browsers. And it kinda makes sense, right? Because how confusing is it that there are so many different specs around these different definitions? That doesn't make sense. It actually just confuses. So what else can I say about the WHATWG? What did they try to achieve? Well, one of the goals was to align RFC 3986, remember, that defines the URI, and RFC 3987, which defines IRIs. And what's cool about the WHATWG is that it's very liberal in what a URL can accept. In fact, they say that a URL should be able to handle non-ASCII characters, which makes sense. And I guess unsurprisingly, they say that URLs should be encoded as UTF-8, which you and I know can contain more than enough characters. So it really would be ideal if this spec became mandatory. But as I mentioned, it's RFC 3987 which rules the roost at the moment.

Okay, cool. That's fine. But if you're anything like me, you love seeing examples. So let's look at a URL. Don't worry about what the international characters mean. Some Japanese characters are put there. I just want us to talk about how the URL will be encoded. I want to talk about what this means. Well, firstly, this is not a URL. Strictly speaking, this is known as an Internationalized Resource Identifier, an IRI. And why is this important? Well, this is important because, as I mentioned, a URI supports only ASCII character encoding. Remember, that's defined in RFC 3986. An IRI, on the other hand, fully supports international characters. And the good news for us is that UTF-8 is the most popular encoding used for IRIs. We're going to be talking a lot more about this example URL I've put up shortly. But for now, and I've mentioned this before, what's important for us is that the W3C, basically the World Wide Web Consortium, has accepted RFC 3987, which defines an IRI. And why is this important?
Well, it's important because various document formats, specifications, and browsers support IRIs. Okay, Clyde, so various documents, specs and browsers already support IRIs. The problem is that not many protocols allow IRIs to pass through unchanged. And the protocol that we're familiar with when it comes to building sites and apps is HTTP or HTTPS. So if an IRI can't pass through a protocol as is, what do we do? Well, that's a great question, and how an IRI works depends on where the non-ASCII character is located. Is it located in the domain name or the path? And this, my dear students, has created a lot of confusion. Even for advanced developers, trust me. So what I'm about to share with you is super, super interesting, and it's gonna put you ahead of the pack.

Let me not get ahead of myself. Let's look at the URL again, the same one we had before, and let's break this up. We already know that this HTTP is what's known as the scheme. The scheme contains information about the protocol to be used. And this is what's important: non-ASCII characters are not allowed in the scheme. So that's step one, and it's pretty obvious. We don't want funky characters there; we've just got to keep it plain and simple. The next part is known as, that's right, it's known as the domain name, and the remainder of the URL is known as the path. And the path indicates the actual location of the resource you are trying to point to, from the server root. Okay. Have you got it? Memorize this picture. I want us to now talk about what happens with those international characters in the domain name versus what happens to them in the path. Let's first, and very briefly, discuss the domain name. Remember, that's the middle portion, the domain name. What happens to domain names?
Well, what's interesting about domain names is that they are managed by domain name registration companies that are spread around the world. And the Internet Engineering Task Force, back in the early 2000s, produced a spec that governs how multilingual domain names should be handled. If you're very interested, you can read all of those specs. But the long and the short of it is that the domain name registry defines the list of characters that people can request to be used in the country or top-level domain. And what's really cool is that these organizations have agreed on certain formats. If a person requests a domain name using non-ASCII characters, like those Japanese symbols you just saw, then these characters will get converted over to Punycode. Wait a second. What is Punycode? Don't stress, my dear students, I don't wanna get too much into it. It just allows for the encoding of non-ASCII characters in the host name, in the domain name, which should in theory only support ASCII characters. That's all that Punycode allows for. And there are certain rules around Punycode, certain rules that define how the conversion should take place and the format of it. And all these domain registration companies around the world have agreed to this format.

And in theory, what's cool is that Punycode could be used to allow for host names that use emojis. How cool would that be? But emojis are not a widely supported standard as yet. So there's only a limited subset of top-level domains that support emojis currently. But you never know, things do change. Back to the lecture. So we've discussed domain names. You can view them as having their own set of rules, and any non-ASCII character in a domain name has to be converted over to Punycode. Have you got it? You really are doing a lot, so please stick with me. We're almost, almost done.
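If you want to see the host name conversion for yourself, one way is the standard `URL` class that ships with browsers and Node.js. This is only a small sketch and not something shown in the lecture; the host name below is the example used in the Node.js documentation, not the lecture's own URL.

```javascript
// The URL class converts a non-ASCII host name to its ASCII (Punycode) form.
// The host name is borrowed from the Node.js docs, purely for illustration.
const idnUrl = new URL("https://測試");

console.log(idnUrl.hostname); // "xn--g6w251d"

// Punycode-encoded labels always start with the "xn--" prefix
// and contain only ASCII characters.
console.log(idnUrl.hostname.startsWith("xn--")); // true
```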
So I've spoken about domain names, but now I quickly want to talk about how the path is dealt with, because the path is what we care about when building forms, right? Remember, with the GET request the data of the form is appended to the URL after the domain name (strictly speaking, in the query string that follows the path). So this is really what concerns us, not really the domain name. And to remind you, we've got our URL, and I only want to talk about the path now. Remember what we just said when dealing with domain names: there are domain registration companies spread all over the world, and they've all agreed to accept domain names in a particular form with a particular encoding. And that encoding was ASCII-based Punycode. Path names are more complicated. Why? Well, just because path names can identify resources located on many different kinds of platforms, whose file systems do and will continue to use many different encodings. And this makes the path much more difficult to handle than a domain name.

But we don't have to stress, because the good news is that the IETF standard RFC 3987 deals with non-ASCII characters in the path. And at the crux of it, it's actually pretty simple. The spec says that browsers need to represent those non-ASCII characters using percent escaping. Okay, URL encoding. So what does this mean for our URL? Let's just forget about the domain section. Let's just look at our path, right? The dir1, then the Japanese symbols. That's the path. Let's assume the page the characters are on is encoded in UTF-8, because that's pretty much every single site we visit today. How does it work? Well, the IRI spec says that the IRI should be converted to UTF-8 first. Then the user agent, aka the browser, needs to convert every non-ASCII byte to percent escapes. Remember, this is just URL encoding. So each one starts with a percentage symbol followed by two hexadecimal digits. So what does this mean for our URL? Well, firstly, dir1 stays as dir1. Those are all ASCII characters. Right? A d, an i, an r and a 1.
They all form part of the ASCII character set, so nothing needs to happen. The only thing that needs to be percent-encoded is those symbols. And there we have it. It'll look something like this. Isn't that pretty cool? It just means the URL path section is now a URI; it's kind of been converted from an IRI to a URI. And why is this important? Well, remember a few slides back. If I go back, yeah, let me try and show you here on the slide. Remember I said many different documents, specs and browsers support IRIs, but not so many protocols allow IRIs to pass through unchanged, right? And the protocol we're dealing with is HTTP. So HTTP needs a valid URI in order to transport that over the wire. Here we are, back here again. That's why it's important. Getting from an IRI to a valid URI is just going to allow protocols such as HTTP to send that request. And just note that dir1 did not change. Those were unreserved characters in the ASCII set, which we spoke about earlier. So at this point, the user agent, the browser, can now send the request for the page.

So this is all good and well, and we're almost done. But you might be thinking, okay, cool, anything non-ASCII in the path name has to be percent-encoded. I've got it. But when we looked at an example of a form submitted with Japanese symbols, why did we see those actual Japanese symbols in the address bar? In other words, why don't we see all the percentage hex values in the URL address bar? Well, that's a very, very good question. Just remember, the address bar in the browser is not the actual URL that's sent over the wire. The address bar is a UI component that allows users to enter all kinds of fun strings that will get converted to a URL at some point. Basically, it just makes the web experience nicer for a user. So you can kind of think of the URL address bar as just being a visual help to us users. That's not necessarily the actual true form of the URL. And modern clients? I just mean web browsers.
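Here is a minimal sketch of that path conversion, and of the "pretty" form versus the form sent over the wire. It is an illustration rather than the lecture's own code: the host `example.com` and the Japanese text are assumptions picked for the example.

```javascript
// Hypothetical path segment; the Japanese text is only an illustration.
const segment = "引き割り";

// Each character becomes three UTF-8 bytes, and each byte becomes %XX.
const encodedSegment = encodeURIComponent(segment);
console.log(encodedSegment); // "%E5%BC%95%E3%81%8D%E5%89%B2%E3%82%8A"

// Plain ASCII letters and digits are left untouched.
console.log(encodeURIComponent("dir1")); // "dir1"

// The URL class does the same conversion for the path of a full address.
const iri = new URL("http://example.com/dir1/" + segment + ".html");
console.log(iri.pathname === "/dir1/" + encodedSegment + ".html"); // true

// Decoding gets back the readable form, much like what an address bar displays.
console.log(decodeURIComponent(encodedSegment)); // "引き割り"
```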
They're able to transform back and forth between percent encoding and Unicode. So the URL is transferred as ASCII, but it looks pretty for us as the user because the browser understands both. The browser understands IRIs and it understands URIs. So it makes sense that it should just display the correct symbols, and it can do its thing in the background.

I know, I know, this lecture is getting very long and I'm just about to finish off. What is the bottom line of this whole lecture? Well, it is that IRIs are basically URIs that allow non-ASCII characters to be used. And most browsers today will allow you to see international characters in the address bar, but in the background they are using techniques to convert these characters to ASCII, so they can be transported over the HTTP protocol. What kind of techniques? Well, that depends, right? It depends on whether those international characters are in the host name or the path. If it's in the host name, the browsers use Punycode. If it's in the path, the browsers use URL encoding, which is defined in RFC 3986 and RFC 3987. There we have it. I told you this was going to blow your mind. Seriously, this is very, very advanced stuff. It took me a long time to actually wrap my head around how URL encoding works. So I hope you appreciate it. And yes, you might not need to know as much as we've discussed here, but you know what? It's just a feather in your cap. It's going to make you a better programmer. And when you start dealing with GET requests and you start seeing some percent URL encoding, or you see a numeric character reference, remember: when you see those things, now you'll appreciate what's happening. So let me end the lecture here. There are still a few more things I want to talk about when it comes to URL encoding, things that are perhaps a bit more practical. Stay motivated, grab a coffee, and I'll see you in the next lecture.

7. 
How do spaces work?: I want to try to get through this lecture as quickly as possible because there's a lot I want to cover. I want us to go back. Remember we had that form with Wally Warthog and the password of secret1, and it produced this URL. This is where it all began, right, when we started thinking about URL encoding and why we see these strange characters. Well, the question mark, the equals sign and the ampersand are reserved characters, and that's why we can see them in the URL. We've been through this in previous lectures. But notice: what about that plus sign? Why is that there? Well, remember that URLs can't have spaces. They are unsafe. And a space in hex is defined as 20. So why then does the URL have a plus and not a %20? Well, that is a very, very good question.

Firstly, you need to understand that the encoding used by your browser by default is based on a very early version of the URI percent-encoding rules. And over the years it's been modified slightly. One of these modifications was using a plus symbol to represent spaces instead of %20. But this has caused a lot of debate and a lot of confusion. Let me explain. It turns out that how the character is encoded depends on where it is in the URL. I know it sounds weird, right? Let me explain. The old spec, I'm talking HTML 2, said that space characters could be encoded as a plus in the key-value pairs in a query string. Remember the query string part of a URL? We covered it a few lectures back. That's the part after the question mark. This means that only after the question mark, aka inside the query string, can spaces be replaced by a plus. But this is according to HTML 2. And we know that HTML 2 is not the latest spec. So then along comes HTML5. And that spec updated how a URL should be encoded. And how did HTML5 deal with this issue? The use of %20 to encode a space in a URL is explicitly defined in HTML5.
But nothing's really mentioned of the plus, which causes a lot of confusion. However, the latest specs do continue to define the plus as legal in the application/x-www-form-urlencoded content type. But these latest specs don't explicitly state that query strings should have the plus. So they kind of mention the plus in other areas of the browser, but not when it comes to URL encoding. It's very, very confusing. But HTML5 does explicitly mention the %20 in URL encoding. So the burning question is, should you use a plus to replace spaces, or %20?

Well, let's assume you use both. It just means that Wally Warthog may be encoded differently in the path and query parts of a URL. What do I mean? Let's have this URL here, and let's include Wally Warthog in two different places: in the path of the URL and also in the query string. So what I'm saying is that the space must always be %20 in the actual path. But when it comes to the query string, we've got a plus. And remember, the query string is just defined as everything after this question mark. So if you had a URL like this, is that a problem? Well, no, not really. Whether the space in the query string is encoded as a plus or as %20, that's going to work just the same. There's a lot of legacy code which expects pluses in the query string part, and there's still a lot of code that generates a plus in the query string part. So the odds are that you're going to be breaking nothing by using one over the other. Got it? Right, bottom line: URLs are technically covered entirely by the latest specs, which explicitly state that spaces ought to be converted to %20. So my recommendation is that we just always use %20 where we can. It's just better to conform to the latest specs. Okay.
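The plus versus %20 difference is easy to reproduce with two built-in JavaScript APIs. This is a small sketch, not code from the lecture; the field name `message` and the value `hello world` are placeholders.

```javascript
// encodeURIComponent always writes a space as %20.
console.log(encodeURIComponent("hello world")); // "hello%20world"

// URLSearchParams follows the form-urlencoded rules and writes a space as +.
const formParams = new URLSearchParams({ message: "hello world" });
console.log(formParams.toString()); // "message=hello+world"

// When reading a query string back, URLSearchParams accepts both spellings...
console.log(new URLSearchParams("message=hello+world").get("message"));   // "hello world"
console.log(new URLSearchParams("message=hello%20world").get("message")); // "hello world"

// ...but decodeURIComponent does not treat + as a space.
console.log(decodeURIComponent("hello+world"));   // "hello+world"
console.log(decodeURIComponent("hello%20world")); // "hello world"
```

So a query string written either way will usually be read correctly by form-aware parsers, while code that decodes with `decodeURIComponent` alone would need the plus handled separately.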
Clyde, that's all good and well, but what happens if the encoding software in the background doesn't convert a space to %20? As an example, we saw the spaces in Gerald the giraffe were converted to pluses. Well, the good news is that JavaScript and other languages all have functions that can be used to encode strings. For example, in JavaScript, we can use the encodeURIComponent method. And this encodes all spaces as %20. Let me head over to the console and show you an example.

8. JavaScript encoding function: I've just got Visual Studio Code open here in front of me and I want to show you a quick example. Just remember that all modern browsers automatically encode spaces and special characters before transmitting the data over HTTP. However, for various reasons, sometimes this may not work. And even worse, I guess, some of the very old browsers and web servers don't understand some special characters unless they are encoded in a very specific way. And some very old browsers don't even encode spaces in the URL at all. For these reasons, you may explicitly want to encode or decode URL data using your preferred language. In JavaScript, it's the encodeURIComponent method, or decodeURIComponent if you're wanting to decode certain characters. But enough said. I want to quickly write some code with you and show you what it looks like. Let's get into it.

The first thing we wanna do, as always, is create an HTML document with a body in here. I don't even want to wrap this in a form. I don't want to waste time. I just want to go straight to a text input box with a name attribute of, let's call it, message. The ID can be message, and let's give it a value: Hello World. But notice here, I've got a space. I've done that very deliberately, very, very deliberately. All right, and of course, we need a way to submit this to a server. And then here I'll write Encode URL; I just want the button to have some meaning.
So let's save this. Let's open up our server, and there we go, very simple. Now what I wanna do is add an inline event listener. I know it's not the best way to do it, but really this is just a quick, quick example. I want to add a click event on this button, and then I want to trigger our own code. So let's just define a function called urlEncode. But let me make this text editor bigger so you can see what we're doing. So all I've done is I've added an inline event listener called onclick to this button. And when it's clicked, this function urlEncode is going to execute. Then what I wanna do is create an empty paragraph for now, because I want to display our result in this paragraph. And let's just give it an ID of encodingResult. It's all intuitive.

Now remember I said each programming language has its own methods and functions to give you more control over the encoding. I want to use JavaScript. In HTML, JavaScript has to be wrapped within a script tag. And of course, we need to define our function called urlEncode. If you're unsure about JavaScript, please check out my JavaScript series. It's really fun. Then, the first thing I'm gonna do is grab our message data, right? And I'm going to be using the document object and getElementById. We gave it an ID of message. And of course, I don't want the actual element, I need its value. So let's put its value into a variable called message. Okay, what's the next step? Well, the next step is I want us to use JavaScript's encodeURIComponent method. And that method takes an argument. The argument is the actual text, the actual value you want to encode. So here, let's define a new variable called encodedMessage. And that'll just be assigned the value of what? Well, like I mentioned, let's use the encodeURIComponent function in JavaScript. And let's pass in the message. That's all I wanna do.
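The function built up to this point might look roughly like the sketch below. Treat it as a reconstruction under assumptions, not the lecture's exact code: it assumes an input with the ID `message` whose value is "Hello World" and a button with `onclick="urlEncode()"`, and the helper `encodeMessage` is added here only so the encoding step can be run outside a browser.

```javascript
// Hypothetical helper: the encoding step on its own, with no DOM needed.
function encodeMessage(message) {
  return encodeURIComponent(message);
}

// Called from the button's inline onclick handler in the page.
// Assumes the page contains <input id="message" value="Hello World">.
function urlEncode() {
  // Grab the text the user typed (the value, not the element itself).
  const message = document.getElementById("message").value;
  // Encode it so spaces and special characters become percent escapes.
  const encodedMessage = encodeMessage(message);
  return encodedMessage;
}

console.log(encodeMessage("Hello World")); // "Hello%20World"
```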
And the next thing, in order to see this result on our page, I want to grab the paragraph. So getElementById, and we gave it an ID of encodingResult. I just want to change its innerHTML to the encoded message. That should work. So let's go back to the page and click on this button. And there we have it: that space has been encoded now to %20. It's a special character. And that's exactly what that method does in JavaScript. Very cool, right? Awesome. Let's jump back into the lecture.

Man, I love this stuff, it's so cool. All right, I've almost finished this lecture, I promise. I just want to quickly give a summary. So remember we had our URL; this is where it all began. I don't want you to get lost in all the detail. With a GET request, the data is inserted directly into the URL itself. And URLs, as we have seen, have to abide by certain rules, for instance RFC 3987. And these rules say that a URL can only contain certain characters. There are reserved characters, like the question mark, and they are not URL encoded if we don't use them ourselves as data. That's why when we submit a form, we see all these random characters appearing in our URL. It's because they're reserved, and it's just a way for the browser to separate out the different parts of your URL. But if we use our own special characters, like a space or a reserved character, then URL encoding has to take place. And traditional URLs could only accept ASCII characters, but browsers have beautified the address bar over the years, and now we're able to see non-ASCII characters in the URL. I know we've been through a lot. This entire section has been a bonus about why the URL looks like it does. I'm just trying to shed more light on this topic because it is very confusing. Not many developers know about it and it is advanced. But I hope you've enjoyed it, because that's the main thing. That's why we're here. We're here to have fun, and we're here to really, really get better at programming.
And you can only get better by truly understanding all the pieces of the puzzle. We're dealing with forms. Whenever you use a GET request, all the form data is going to be appended to the URL, and you're going to be seeing weird characters. And that's why I've spent a whole section describing this URL encoding process. I hope it really has been useful to you, and I can't wait to jump into the next section. See you soon.

9. URL Encoding - Class Outro: Take a step back. Remember, a web browser requests a page from a server using a URL. So you can think of a URL as an address to a webpage. Something like http://google.com is just an address for a page. And as we've seen, these URLs, these addresses, are governed by rules. And these rules state that all the characters in a URL have to be ASCII compatible. But as you and I know, the ASCII character set is very limited, and we often use characters that are outside of the ASCII character set. So what happens then, Clyde? Well, that's where URL encoding takes place. So what happens is that URL encoding takes unsafe or non-ASCII characters and converts them into percentage hex values. And we even got more in depth, because sometimes you can have non-ASCII characters in the actual host name, in which case Punycode is used. And you can kind of think of Punycode as just a way for servers and domain authorities to understand what all the characters mean. I know that perhaps is a bit outside the scope of the course, but it's interesting nevertheless. So I hope you had a lot of fun in this class. We have got a ton more to learn when it comes to forms. But I love it, it's so, so exciting. Thank you for sticking with me in this class, and I really do hope I'll see you in the very next class. Thank you.