Ever wondered why C strings are just character arrays that magically know where they end? It's all because of the null terminator - that sneaky \0 at the end.
In this video we break it down super simply:
- What a C string actually is (spoiler: just bytes)
- Why old-school assembly needed separate length labels
- How adding a single zero byte fixed everything
- Quick look at ASCII codes for "Hello"
- Why \0 is the perfect terminator (and how to write it in code)
- How loops know when to stop reading the string
- Real example in assembly with and without null termination
Perfect if you're learning C, diving into low-level programming, or just curious about what's really happening when you write "hello" in C.
Introduction to Null Terminators 00:00:00
What is a C String? 00:00:11
Problems with Explicit Length in Assembly 00:01:06
Idea of a Special Terminator Symbol 00:01:24
Introduction to ASCII 00:02:20
Why ASCII is Outdated - Move to UTF 00:03:59
Example - "Hello" as ASCII Bytes 00:06:32
Mapping "Hello" to ASCII Codes 00:07:00
How Strings are Stored in Memory 00:08:46
The Null Terminator (\0) Explained 00:09:39
Null is ASCII Code 0 (NUL) 00:10:02
Escape Sequence \0 in Code 00:10:49
How Loops Use Null Terminator to Stop 00:11:55
Null-Terminated Strings in Assembly 00:12:41
Summary of C Strings 00:13:23
Outro and Call to Action 00:13:32
Thanks for watching!
Find us on other social media here:
- https://www.NeuralLantern.com/social
Please help support us!
- Subscribing + Sharing on Social Media
- Leaving a comment or suggestion
- Subscribing to our Blog
- Watching the main "pinned" video of this channel for offers and extras
Category: Tech
Transcript
00:00Hey there! Let's quickly talk about null terminators in C strings.
00:11So first off, what's a C string? It's basically just a character array. It's just an array of numbers, and the numbers will be interpreted as characters.
00:20And what's a null terminator? It's pretty easy. It's just a
00:23way to terminate the string so that a program can automatically determine where the string ends.
00:28In assembly, a lot of times we'll have, you know, a string, like we'll say, oh gosh, I don't want to write this out right now.
00:35We'll say, you know, my string, and we'll define it as an array of bytes, and then we'll say, you know, hello or something, right?
00:45And then later, in order to tell the assembly program how long the string is, we'll use an extra symbol.
00:52This is not really an assembly video, but we'll say my string length is going to be, you know, equal to,
00:59this is Yasm assembly, by the way, in case you're wondering, some kind of special symbol, my string length.
01:05And then now we have defined the actual string and then the length of the string. But this is kind of annoying, right?
01:11We have to define an extra thing to say how long the string is. And then we have to, you know, pass two different things
01:18that are, you know, kind of the same name later on when we want to print a string. But that's not really the best way to do it.
01:24What if instead we just put a special symbol? I don't know, what should I put right now?
01:30Put a special zero. I'll just put like a question mark, and we'll say that the question mark is a special symbol.
01:38And then when we want to have part of our program print the string, we can just only give it the string,
01:43and it can scan the string. And as soon as it sees that special symbol, let's say the question mark for now,
01:49it'll know when to stop printing it or processing it or whatever. We could allow the program to scan in advance
01:55to compute the length and maybe store it somewhere so we can keep using it.
02:00We could allow the program to just continuously print out characters, maybe they're buffered or something,
02:05or processing characters, and then just stop when we get to the question mark.
02:08But the point is, it's a little bit less work for us to have a special symbol at the very end.
02:14Okay, so spoiler, the symbol is going to be a zero. I'll say that in a moment.
02:19But for now, let's look at a special table.
02:24That is called ASCII. Well, the table is not called ASCII, but this is sort of like a scheme
02:30that was invented a long time ago. It stands for
02:33American Standard Code for Information Interchange.
02:37It's basically, for American purposes, a long time ago, we decided, you know what?
02:42I'd like to represent characters using one small number per character, one that fits in a byte.
02:46ASCII itself only uses seven bits, so values between zero and 127, even though an eight-bit byte can go up to 255.
02:51And we'll just start assigning these numbers to different characters.
02:55So you can see here on this ASCII table, it's kind of like an old thing.
02:58Let me turn on my little annotator.
03:00You can see on this table, well, the letter H is mapped to the number 72.
03:05So this was arbitrary at the time. Now it's a standard because everybody accepts ASCII in certain parts of your programs.
03:14It's kind of a little out of date now because, you know, in the modern era, it's a lot smarter for your programs to support a world of different characters, you know, for all different languages, than only to support characters that are common in English.
03:27And even here, there's probably some stuff missing that English users would probably want to add, like certain accents and symbols and things like that.
03:36So, you know, the letter O lowercase O, it's 111.
03:40The letter O uppercase is 79.
03:43So we've got, you know, a basic idea of mapping.
03:47So for the purposes of this example, I'm just going to assume we're using ASCII to make things easier.
03:53Okay, so that's an ASCII table.
03:56Oh, I guess I should say that the modern scheme that we usually use is called UTF.
04:03Let me just write that down on a notepad right now.
04:06So we've got UTF right now.
04:08It's just a different standard, a different mapping, but the same idea.
04:12You know, with ASCII, each character fits in one byte.
04:14With UTF, we have a wide variety of choices.
04:17We can do UTF-8, which uses between one and four bytes per character.
04:20It closely maps to ASCII.
04:23All of the ASCII characters keep the same one-byte codes in UTF-8.
04:27And then we have UTF-16, which uses two or four bytes per character, in 16-bit units.
04:31And then we have UTF-32, which is always four bytes or 32 bits.
04:37As far as I understand, I haven't checked this in a while.
04:40UTF 32 should be able to handle all the characters that all the languages of the world commonly use.
04:46Just to double check.
04:48If you take two to the something power where the power, let's say y, two to the y,
04:53where y is the number of bits in question,
04:56then you can get an idea for the number of combinations that you can represent.
05:00So two to the 32nd power, that's about 4 billion.
05:03I should have known that off the top of my head.
05:05That's about 4 billion possible combinations.
05:07So I don't know.
05:09I think in my opinion, probably 32 is enough for all symbols in the world,
05:13for all languages in the world.
05:15And it is: Unicode tops out at a little over a million code points.
05:16So there's no UTF-64, and we don't need one.
05:20UTF-32 already covers every symbol that Unicode defines.
05:23But that's really not the point of this video.
05:25I just wanted to mention it because if you're writing an application on your own,
05:29you want to start your own business or you're, you know, you're working for someone,
05:32you almost certainly do not want to use ASCII.
05:37You probably want your database and your program to be able to use UTF, whether that's UTF-8, 16 or 32,
05:43because you want to be able to support international users.
05:45And that's not going to work with plain ASCII.
05:48I had a website a long time ago that I wrote.
05:50The database was in, it was like Latin-1 or some scheme in the database where it was basically ASCII,
05:57just, you know, eight bits per character.
05:59And I started attracting international users and none of them could type characters in their native language.
06:05And they were kind of disappointed.
06:07And I lost a lot of business until I realized, oh, I should upgrade my database and my program to support UTF.
06:13But anyway, the point of this video is not international character support.
06:17The point of this video is what is a null terminator and how do you use it in a C string?
06:22So I'm going to get back on track here.
06:24All right.
06:25So imagine that you want to, let's say, represent the string.
06:31Hello.
06:32Okay.
06:33So we got hello.
06:34This would basically be an array of characters, or an array of one-byte ASCII characters.
06:40So I'll put like a number here and a number here and there'll be five numbers.
06:46What numbers are they?
06:47Well, if we're using ASCII, then we should map these using the ASCII table.
06:51So again, in the ASCII table here, what is capital H?
06:55Where's capital H?
06:57I can't even see it.
06:58Capital H.
06:59That's the number 72.
07:01And then the lowercase e, that's going to be the number 101.
07:05So I'll just maybe do like an arrow here.
07:07The number 101.
07:08And then the capital H is going to be the number 72.
07:11And then we got two L's lowercase.
07:13So that's going to be this right here.
07:15And that's going to be the number 108.
07:17And then lowercase o, that's going to be the number 111.
07:22So basically, you know, you just go through the characters you want to represent and map them using the ASCII table or hopefully an actual lookup table inside your program.
07:31And then you can do the same thing backwards.
07:33You know, what is, what does 72 mean?
07:35It means H.
07:36So I'm going to erase this for now.
07:38And I'm going to go back to my little notepad here and I'm just going to fill out the numbers.
07:43So the first is going to be 72.
07:46Oh, actually, before I fill these out, hang on.
07:48Let me just show you that this is basically what we're trying to translate here.
07:53So we got like an uppercase H in many programming languages, definitely C++, which is the one I'm thinking of.
08:00We would use double quotes for an actual long string, like a proper string, a literal string or like an STD string or something.
08:08And we would use an array of single quoted characters to say this is one character that maps to one or more bytes.
08:16In this case, one byte.
08:18So I'm going to say H E L L O and put them all in single quotes.
08:22And then we'll just sort of like translate the numbers here.
08:24So we said the H was 72 because it was capital.
08:28It was like a lower number.
08:30And then the E is going to be 101.
08:32And then the L is 108.
08:34We'll do two of those 108.
08:36And then the O was a 111, 111.
08:41So you can imagine somewhere in an array of data in the computer, there's the number 72, 101, 108, you know, whatever.
08:49And even further thinking about it a little more deeply, the number 72 is not actually in the machine.
08:56It's really a sequence of ones and zeros, right?
08:58So just imagine that inside the machine, there's just a ton of ones and zeros for this particular sequence.
09:03You know, five times eight, which would be the number of, you know, eight is like the number of bits per byte since we're doing one byte characters.
09:13What is that?
09:1540?
09:16Let me just double check here.
09:18I want to make sure I'm saying things that are somewhat correct.
09:21We'll do eight times five.
09:23Whoops.
09:24I think I screwed something up.
09:25Eight times five, 40.
09:27Yeah.
09:28Okay.
09:2940 ones and zeros just inside the machine.
09:32Anyway, so we talked about before not wanting to specify the length of a string.
09:37So we'll just place a special marker instead of a question mark.
09:41We'll use the number zero.
09:43So I'm going to put a zero at the end of this array.
09:46Let me show you why.
09:48I forgot if I actually pointed this out or not.
09:50I'll just do this to be sure.
09:52Somewhere on this table, you can see that the number zero, in decimal or in hex or in octal,
10:01if you know how to read those other systems,
10:02is just considered null.
10:06So it's sort of a control code or it's not necessarily a symbol.
10:10It's more of a control code.
10:12We'll say that when we're sweeping through our string, you know, if we see the number 72,
10:17that means H capital H, but if we see the number zero, that means null.
10:22And in the scheme that we're going to use, we will say that null terminates the string.
10:28That's why we call it a null terminator.
10:30So null terminators are basically just zeros that terminate the string.
10:34I could put null here inside of the character representation just to kind of show you that it's equivalent.
10:41But when we're in C++ and some other languages and we want to do single quoted characters,
10:46we can usually use an escape sequence, which is just basically backslash zero.
10:51This is not a video about escape sequences, but imagine that every time there is a backslash
10:57and then, you know, one character after that, then that character will be interpreted specially instead of literally.
11:02For example, if we just put zero by itself, then that actually would be the ASCII code for zero or just the character code for zero.
11:09In this particular case, zero is, uh, where the heck is the zero?
11:13Can't even find it.
11:14Oh, there it is right there.
11:15That's like, so the, the character zero is represented by the number 48, but that's not what we actually want.
11:22We want the actual number zero because we're trying to get a null terminator.
11:26So anyway, we couldn't just put a regular zero in there because it would be interpreted as 48.
11:32Instead, we put a backslash, which tells the compiler that the thing that follows should be interpreted specially.
11:37So that will be interpreted as actually a zero or null terminator.
11:43So now that we know how to put a null terminator in an array of characters and we know what it's going to look like if we're just talking about numbers.
11:51Uh, we now know that a, you know, if you have a program that has, let's say a for loop and it's kind of like going through, you know, do all the characters.
12:01If it's going through every single character one by one and doing something to them, printing them, modifying, adding them to an algorithm, whatever.
12:08This for loop, or actually this should probably be a while loop since for loops are kind of limited.
12:14And the while loop will know when to break because as soon as it sees a character that is actually just the number zero, it'll say, oh, I found the null terminator.
12:22I will now ignore this and break the loop. I won't do anything with this character because the null is a terminator.
12:28It's not part of the string.
12:31Um, what else?
12:33You know, actually, I think that's pretty much it.
12:36If you're in assembly, uh, I'm just going to go back to Yasm assembly for a second.
12:40Cause I liked, I like to talk about that lately.
12:42Um, imagine we have like a string called my string and we say it's a sequence of bytes.
12:47And I say, hello, you know, this is how you would do this in x86-64 Yasm assembly on Ubuntu or just Linux.
12:56Then, uh, you know, this string right here is just a string with no null terminator.
13:02If you want to add a null terminator to the end of it so that you can write a better function that would scan your string and print it without knowing in advance how long it was,
13:11Then you just do a comma zero at the end.
13:13And now you can easily write a function to scan or read or process your strings because this is really what's what's happening under the hood.
13:20It's just like a sequence of characters and then it ends with the zero.
13:25And there you go.
13:26That's the basics of C strings with null terminators.
13:29Uh, thank you for watching this video.
13:31I hope you had fun and learned a little bit of stuff.
13:34I'll see you in the next video.
13:36Hey everybody.
13:41Thanks for watching this video again from the bottom of my heart.
13:44I really appreciate it.
13:45I do hope you did learn something and have some fun.
13:48Uh, if you could please do me a small little favor.
13:51Could you please subscribe and follow this channel or these videos or whatever it is you do on the current social media website that you're looking at right now?
14:00Um, it would really mean the world to me and it'll help make more videos and grow this community.
14:05So we'll be able to do more videos, longer videos, better videos, or just I'll be able to keep making videos in general.
14:11So please do do me a kindness and, uh, and subscribe.
14:15You know, sometimes I'm sleeping in the middle of the night and I just wake up because I know somebody subscribed or followed.
14:20It just wakes me up and I get filled with joy.
14:22That's exactly what happens every single time.
14:24So you could do it as a nice favor to me or you could, you could troll me if you want to just wake me up in the middle of the night, just subscribe.
14:30And then I'll, I'll just wake up.
14:32Uh, I promise that's what will happen.
14:34Also, uh, if you look at the middle of the screen right now, you should see a QR code, which you can scan in order to go to the website,
14:41which I think is also named somewhere at the bottom of this video.
14:44And it'll take you to my main website where you can just kind of like, see all the videos I published and the services and tutorials and things that I offer and all that good stuff.
14:52And, uh, if you have a suggestion for, uh, uh, clarifications or errata or just future videos that you want to see, please leave a comment.
15:03Or if you just want to say, Hey, what's up, what's going on?
15:06You know, just send me a comment, whatever.
15:08I also wake up for those in the middle of the night.
15:10I get, I wake up in a cold sweat and I'm like, it would really, it really mean the world to me.
15:15I would really appreciate it.
15:16So again, thank you so much for watching this video and, um, enjoy the cool music as, as I fade into the darkness, which is coming for us all.
15:33Bye.