Internet Technologies – Computer Science for Business Professionals – by CS50 at Harvard


[MUSIC PLAYING] DAVID MALAN: So odds are you’re
on the internet these days, but what does that actually mean? And indeed, this internet that we use
very often these days for messaging, for email, for browsing the
web and other services still, there’s a whole infrastructure that
underlies it that is increasingly powering new ideas, new start
ups, new companies, new businesses as well as new forms of
communication among humans. And yet, like most every
topic we’ve explored, you’ll realize that
while it’s very complex, perhaps, up here, or
certainly seems complex up here, if we begin with some of
the fundamentals and then layer and layer and layer on top of
those, do we pretty quickly get back to today’s technology but with
a much better understanding of what’s going on from the ground up. So here is a bit of alphabet soup. Odds are you might have seen
one or more of these acronyms to date, IP, DHCP, DNS, TCP,
UDP, ICMP, and so many more. These are all examples of
something called protocols, where protocols are
kind of like languages that computers speak with one another. They’re not programming
languages so they’re not used by humans to make computers do
things or follow instructions per se. A protocol is really
a set of conventions that two computers or
two computer programs might use when intercommunicating. And so what’s an example of
a protocol in the real world? Well, we humans have some
silly protocols, one of which here is, culturally, when you
meet someone to extend your hand and then he or she
presumably extends their hand and you do this for
who knows what reason. And now you’ve sort of completed
that social transaction. But it’s a protocol in the sense
that when I extend my hand, most any polite other person
knows that they’re probably supposed to extend their hand
as well, embrace for a moment, and then complete. And the protocol says, too, you
probably don’t do this for terribly long. And so there are these rules
of thumb or actual rules that you follow when
implementing protocols. And so computers, great as
they are following rules, very often use protocols
when they intercommunicate, in order to get data from
one place to another. So let’s tell exactly that story. If you’re on the internet,
right now, on the internet, what does that actually
mean and how can it help us solve problems,
ultimately, having access to this inter-networked infrastructure? Well, let’s consider what happens
when I first visit my favorite web page, for instance. If I go ahead and visit something like
Facebook.com, I go ahead and log in and I’m immediately
presented with my news feed. Or maybe your favorite website
is Gmail or your favorite website is Bing or maybe your favorite
website is any number of other places you might go on the web, all of which
take in as input a request from you and produce, ultimately, output,
the screen that you ultimately see. But how does that data get
from one location to another? Let’s begin to draw a picture, perhaps. And this picture might be
representative of your own home network or maybe your campus network
or maybe your office network. But generally speaking,
you are on the internet maybe with your phone or your
laptop or your desktop device, and we’ll just depict that as
this sort of abstract laptop here. So that laptop somehow wants to
communicate with a web server elsewhere, Facebook,
Google, Bing, whatever. And we’re just going to represent
that as way over here in the picture in a really big corporate
office building, perhaps. And inside of that
building are the servers that compose that particular web site. But how do I get data from that server,
which, if it’s Google or somewhere else might be all the way in California
or halfway across the world and back to my laptop? Well, somehow I have to be
able to send messages to it and receive messages from it. And of course in between me
and this resulting website is what we’ll generally
call the internet. It’s kind of conveniently
drawn as a cloud here, which is another
semi-technical term that’s come into vogue in recent years. And the cloud really just refers
to internet services these days. It’s not a technical term unto itself. It’s just a sexier term than saying,
my business is on the internet. Oversimplification, and we’ll
come back to that before long. But you can assume here
that the internet is somehow this delivery mechanism. It somehow gets data from
point A to point B and back. But how does that work? If my data’s coming in as
input and it’s reaching, eventually, its destination
and then a response is coming back in this direction, what’s
actually going on underneath the hood there, especially since, in the
story at hand, all I’ve typed is something like Facebook.com,
Gmail.com, or the like? Well, it turns out that your computer
these days, when you first turn it on and you connect to the Wi-Fi in a room
or you connect with an ethernet cable to the wired network, your
computer receives some information automatically. Your computer speaks a protocol
called DHCP, typically, Dynamic Host Configuration Protocol. But in most of these cases, the
acronym isn’t really what’s important, certainly, it’s what the
protocol itself does. And in this case, this Dynamic
Host Configuration Protocol dynamically configures hosts
via a protocol, if you will. So what does this mean? Essentially DHCP says this,
when you turn on your computer or you take out your
phone for the first time and you’re connected on Wi-Fi or to a
wired network, it says, hello, world. I am alive. I would like to be given an
address so that I can communicate with other computers on the internet. It’s not quite that verbose,
perhaps, but it is a question. Hey, computers around me,
please give me an address. And what it gives you is what’s called
an IP address, Internet Protocol. So just as in the real world where
physical buildings have historically been uniquely addressed
with postal addresses like Harvard’s computer science
building is at 33 Oxford Street, Cambridge, Massachusetts, USA. 02138 is the more
precise zip code as well. That uniquely identifies
that building in the world. So, too, does my computer need
an address, and it’s not going to be some free form
address like that in words. It’s actually going to
be a numeric address. Specifically, I’m going to get an IP
address of the form number dot number dot number dot number, so four
numbers separated by dots. Each of those four numbers
happens to be a byte long or eight bits, so each of these
numbers, therefore, is between 0 and 255, and so this means, long story short,
that the total address is 32 bits (8 plus 8 plus 8 plus 8), and that means
there are roughly four billion possible addresses in the world. And that’s great because people have got
a lot of computers and a lot of laptops and a lot of desktops
and servers these days. But it turns out we’re
actually running out because we have so many such devices. So there’s a newer version
of IP that’s increasingly being used called IP version 6. We’re talking here about IP version
4, since it’s so omnipresent. And IP version 6, just so you know,
uses 128 bits for its addresses, way more than 32, so we’ll
be good to go for some time. But DHCP gives me this address, an
IP address of the form something dot something dot
something dot something. And the purpose of this address is
to help my data get from point A to point B. And indeed,
anytime my computer sends a request on the
internet like, Facebook, please show me my news feed, or
Gmail, please show me my inbox, my computer has to use that IP address. So much like if sending a
letter in the real world, you might have an
otherwise blank envelope and you might want to send a message
to someone else in the world, you might write their physical address. But in the computer world, we
might write something like 1.2.3.4 in the to field, assuming that
this is the IP address to which we want to send this data. Meanwhile, my from
address might be 5.6.7.8, so I’ll write it in the
top left hand corner by convention, whereby that
indicates to the whole internet this is where this request came from. Now, I know my origin address, the
source address here at top left because DHCP told me. How do I know 1.2.3.4? How do I know the IP address of
Facebook.com or Gmail.com, right? We don’t live in the
world of 800 numbers anymore, where you dial 1-800 something,
something, something, something, something and you have to advertise
your phone number, per se. We don’t necessarily live only
in the world of 1-800-COLLECT any more where we had these mnemonics
where you had letters mapping to numbers just to help remember it. We went full in on
this idea of mnemonics such that now we have Facebook.com
and Gmail.com and no numbers whatsoever for us humans to remember. So thankfully, it turns out there’s
another system in this world, another acronym, if you will, a new one
now, called DNS, Domain Name System. So there are also in the world,
not just DHCP servers that hand people IP addresses
on their local network, there are also DNS servers
whose purpose in life is to convert domain names
to IP addresses and vice versa and a few other features as well. Now, what does that mean? That means that when
my Mac or my PC sees little old me, the human,
typing Facebook.com or Gmail.com, my laptop contacts
a nearby DNS server and says, hey, my human has asked
me for Facebook.com. What is its IP address? And the DNS server’s purpose in life
is to answer that question and say, oh, Facebook.com, it’s 1.2.3.4. Use that address instead. Now, thankfully, my
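As a rough sketch of that question and answer, a DNS server can be pictured as a lookup table from names to numbers. Everything here is hypothetical: the RECORDS table, the resolve function, and both addresses (1.2.3.4 is the lecture's placeholder, 9.10.11.12 is invented). A real resolver asks other servers over the network rather than consulting a fixed dictionary.

```python
# Hypothetical records standing in for what a DNS server knows.
RECORDS = {
    "facebook.com": "1.2.3.4",    # placeholder address from the lecture
    "gmail.com": "9.10.11.12",    # invented for illustration
}


def resolve(domain_name):
    """Return the IP address for a domain name, or None if it is unknown."""
    # Domain names are case-insensitive, and a trailing dot is optional.
    key = domain_name.lower().rstrip(".")
    return RECORDS.get(key)


print(resolve("Facebook.com"))
print(resolve("no-such-site.example"))
```

The answer that comes back is what gets written in the to field of the virtual envelope.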
computer can now write that number on its virtual
envelope, so to speak, and then pass that envelope out to the internet. And because of these numeric addresses,
it will be properly, hopefully, routed across the internet
to its destination. Because it turns out inside
of the internet here, interconnecting everything
in between point A and B are things called
routers or gateways. And I could draw this picture
in any number of ways. But the point is that it’s
just so darn interconnected. And indeed, there might be
even more pathways still or maybe even fewer pathways. Indeed, on the internet,
there’s often multiple ways for data to get from one point to
another, some shorter, some longer. But there’s this
resilience, this redundancy, and this was a feature back in
the day, especially in so far as the internet had
militaristic origins. It was meant to be redundant so as
to withstand failures of one or more of these nodes, these
dots in the picture. Now, each of these dots
is just a server, really, a special server called a router
or gateway, whose purpose in life is to do exactly that, to route data. Upon receiving a virtual
envelope like that one, it looks at the to address and realizes,
oh, this is destined for 1.2.3.4. I know that that address
is over this way. Meanwhile, if it gets another
envelope from someone else, it might say, oh, this
is some other address. It’s going to go this way. And so routers have
multiple cables or they have multiple virtual
network connections elsewhere or wireless connections, any
number of possible connections might they have to other routers. And so it can route it to
its next hop, so to speak. And generally on the
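One way to picture a router's decision is a table that maps destination networks to neighboring routers. The sketch below is an assumption-laden illustration, not how any particular router is built: the table entries and router names are hypothetical, and picking the most specific matching network (longest prefix) is a common convention that the lecture itself does not mention.

```python
import ipaddress

# Hypothetical forwarding table: destination network -> neighboring router.
FORWARDING_TABLE = {
    ipaddress.ip_network("1.2.3.0/24"): "router-west",
    ipaddress.ip_network("1.0.0.0/8"): "router-north",
    ipaddress.ip_network("0.0.0.0/0"): "router-default",  # matches everything
}


def next_hop(destination):
    """Pick the neighbor whose network most specifically contains the address."""
    address = ipaddress.ip_address(destination)
    matches = [net for net in FORWARDING_TABLE if address in net]
    # The longest prefix is the most specific match.
    best = max(matches, key=lambda net: net.prefixlen)
    return FORWARDING_TABLE[best]


print(next_hop("1.2.3.4"))
print(next_hop("8.8.8.8"))
```

Each router makes only this local choice; the full path emerges from many such choices in a row.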
internet, within 30 hops, within 30 transmissions
from router to router to router, will your data get from
one point to another. And it might not follow
the same path each time but it will traverse
this so-called internet. And so that’s kind of
what the internet is. It’s this collection of routers
and it’s this collection of networks, a network of
networks that is incredibly interconnected in different ways. So DHCP gives me an IP address. So I have a unique IP address. DHCP, it turns out, also
tells me what the IP address is of my local DNS server so I know whom
to ask to convert domain names to IP addresses. But once I have that, I
can now use a protocol called TCP to send my data reliably,
typically, from one point to another. So whereas IP is responsible
for a few things, one of its most important
functions is this notion of addressing and standardizing
how things are addressed. But TCP, one of its
most salient features is to guarantee, with high
probability, delivery. And what I mean by that
is that bad stuff can happen in the middle of the internet. These routers can get really busy. They can get really
congested and overloaded. And so routers might– well, virtually drop packets. They might receive so many
packets at once they just can’t, like a human, deal with
it all at one time because they have a finite amount of memory or RAM
or disk space and so they drop them, so to speak. They just delete them and they
expect the senders to resend them. TCP is a protocol, another
agreement between computers, that if the receiving computer realizes,
hmm, I got some of your packets but not all of them, TCP mandates,
much like our human handshake, that something next should happen. TCP says, my laptop should
retransmit that virtual envelope. But TCP allows us to do
something more than guarantee with high probability delivery of data. It also allows us to multiplex
among services, or put more simply, it allows a server to receive
different types of data for different types of
services, for instance, web services on the server, email
services, chat services and the like. And so it turns out that on this virtual
envelope that gets sent from a computer to a server, it’s
actually not sufficient for there to be the return address
and the IP address of the destination. I also need to specify
what type of information is inside this envelope, or
equivalently, what kind of service I’m trying to contact. And I could do this by specifying in
words what’s inside this envelope. Maybe it’s something
like HTTP, the prefix that you’re familiar with from the web. Maybe it’s an email. Maybe it’s a chat message or the like. But if it is, in fact,
something like HTTP, turns out the convention is not
to use words but to use numbers. And so in fact, I need to put
one other piece of information on this envelope, which is a
so-called port number, a TCP port number, which is numerically printed
after a colon on a virtual envelope like this. And in this case I wrote 80
because 80 happens to be, by human convention, the number that we
humans agreed some years ago identifies web services on servers. But this means that if the
server I’m sending this to, 1.2.3.4, actually has other
services on it like a chat server and an email server and the like,
this won’t get confused with an email that I or someone else am sending
to the server or a chat message. The server will know
upon receipt of this, oh, this is a request for a web page. Let me send this virtual
envelope to the web server. But TCP isn’t the only such protocol. There is something called UDP, which
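The role of the port number can be sketched as a simple dispatch on the number after the colon. This is a toy model with hypothetical names (SERVICES, deliver); port 80 for web traffic comes from the lecture, while 443 and 25 are the standard assignments for HTTPS and SMTP, added here for illustration.

```python
# Port numbers and the services that conventionally listen on them.
SERVICES = {
    80: "web server",
    443: "secure web server",
    25: "email server",
}


def deliver(envelope):
    """Hand an envelope addressed as 'ip:port' to the matching service."""
    # rpartition splits at the last colon: "1.2.3.4:80" -> ("1.2.3.4", ":", "80")
    ip_address, _, port = envelope["to"].rpartition(":")
    return SERVICES.get(int(port), "no service listening")


web_request = {"to": "1.2.3.4:80", "from": "5.6.7.8"}
print(deliver(web_request))
```

Because the dispatch depends only on the number, one machine at one IP address can host many services without their traffic getting mixed up.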
is common in some circles as well. UDP works a little
differently, in so far as its feature is to
not guarantee delivery. If some data gets lost, packets
get dropped, so to speak, for whatever reasons, malfunction,
technical difficulties, routers are overloaded,
UDP says, our protocol shall be not to retransmit that data. And that’s a strange thing,
because it sounds worse. And yet, this protocol’s been around
for quite some time, still used, quite appropriate in some contexts. But what context would you actually
want to just forge ahead, irrespective of getting complete information? Well, a go-to example here is something like
videoconferencing or audio conferencing or live TV on the internet, watching
a game like a football game, for instance. If you want to watch
it in real time, you might prefer that the
stream, the bits that are coming from the NFL or wherever
to your computer don’t actually buffer, don’t actually stall. You would rather miss a second
so that at least you stay current in real time with that game,
or video conferencing even more so. It’d kind of be annoying if you have
a bad connection or some packets get dropped and you just have to
wait and wait for the person’s voice or image to be retransmitted. You’d rather just say, what did you say? Could you repeat yourself? Say again? You can just use human protocols
to deal with that, too. So sometimes you want live streaming
applications for whatever purpose and you want the data
just to keep coming. As much of it as can
make it through is great. But you don’t necessarily
want it to be resent. So data is going from
one point to another, but how long does all this take? My god, this is kind of a long
story just to get data there. Well, let’s do an experiment. Let’s go ahead and pull up a program
that uses a different protocol altogether, ICMP. And there’s other protocols, still. This one’s a little
more technical but it’s wonderfully revealing in a few ways. I’m on my Mac here in the
so-called terminal window, and you can pull up something similar
on Windows and other operating systems as well. And what I’m going to
do is literally trace the route between my laptop
here and some foreign server, for instance, one on the west coast
of the US, Berkeley’s web server. So let me do that, traceroute,
www.berkeley.edu, enter. And curiously, we start to see a whole
bunch of lines of output, most of them numerical. And indeed, notice that each
of these is an IP address. But what is it an IP address of? Well, we have 18 of these between
me and Berkeley, apparently. Turns out those represent routers
between me and Berkeley, California. Each of them has an IP
address and each of them has a measurement of how long it took my
data to get from my Mac to that router. It’s highly variable. Notice, it’s kind of all over the place. In fact, this is just weird. This took 3,000 milliseconds
or three seconds, so I’m guessing that
that router in row eight was congested for some reason, some
kind of network issue there temporarily, but then my data actually went through. And it’s not cumulative. These are individual tests from
my Mac to each of these routers iteratively, one at a time. And you can kind of
get an aggregate sense of how long it takes, therefore,
for data to get from the east coast to the west coast. If we look at some of the later
numbers, they’re kind of variable but they seem to be
around 75 milliseconds. So this is kind of extraordinary. If you want to fly from Boston,
Massachusetts to San Francisco, it’s going to take you
five, six, seven hours. You want to send an
email or send a packet, it’s going to take you 75 milliseconds. That’s astonishing, how
quickly the data can transmit. Now, notice this is not
all that enlightening knowing these IP addresses. But eventually, some of
them have domain names, just because the humans
controlling those routers decided, we’re going to give these routers
actual names, domain names, as opposed to just having IP addresses. And you can often, but not always, infer
from the domain names where they are. So I’m going to guess
that at least row 11 here, I don’t know what
XE7000.rtsw is, but losa.net, Los Angeles in California. I’m guessing my data kind of came
into Southern California first. But then notice what happens next. A couple of nameless servers,
LAX, so maybe that’s the airport. Indeed, routers, for
historical reasons, tend to be named after
nearby airport codes. I’m not sure what this
next one is here but I do recognize Oakland and UCB, UC Berkeley. So I’m guessing one of the next routers
is actually in Oakland or near Oakland. And so that’s a pretty long cable
or interconnection essentially between LA and Berkeley. But the result, ultimately,
is that my data makes its way to Berkeley, this time via this path. If I ran it again now
or in a day or a week, the path might be a
little different based on congestion and interconnectivity,
but the data actually gets there. And cutely enough, it looks like
Berkeley’s web server is called Cal web farm prod– for production– ist.berkeley.edu. 75 milliseconds only. But what about this, what
if we don’t stop at the edge as we do at the edge of this
continent but keep going? What’s going to happen? Well, let me try to trace the
route to, say, www.cnn.co.jp, the domain name for
what I presume is going to be the Japanese version
of CNN’s web site in Japan. Here, too, we have a bunch of nameless
servers just with IP addresses. Gets through them pretty quickly. We seem to have some lulls sometimes. This program won’t– sometimes the
routers won’t respond to these queries so they remain, essentially, anonymous. But now this is quite interesting. Oh, my god. We went from routers 12, 13, 14,
15 taking about 63 milliseconds, give or take, to 193 milliseconds,
which isn’t a blip because it stays around that
value, 180 milliseconds, 160 milliseconds, 177 milliseconds. That’s a big jump of 100-some
milliseconds just between routers 15 and 16. Why might that be? What could be between routers 15 and 16? Well, if you know your geography, it
might very well be the Pacific Ocean. There’s quite a bit of
distance, there’s quite a bit of cabling that actually connects the
west coast of the country to Japan and other areas in Asia and beyond,
and that’s what’s pretty amazing. Not only is there interconnectivity
on the internet these days via cabling and via Wi-Fi signals and
via satellite signals, via microwave signals and the like, you
have so many different ways for data to be transmitted. And it’s absolutely astonishing
and exciting, dare I say, just how interconnected
the world now is. In fact, thanks to this animation
online, let’s take a look and appreciate just how extensive
this network actually is. [MUSIC PLAYING] All right, so let’s actually solve
a problem now with this internet. All right, the internet, as you
probably heard, is filled with cats. And yet, these cat
images can be pretty big. And indeed, bigger,
still, than images are things like video files
from Netflix and the like. And so there’s huge amounts
of traffic transmitting over those kinds of interconnections. So how do we ensure, at
least with high probability, that data can actually get through? How can we ensure that there’s some
form of fairness, if not net neutrality, so that my data can
get to its destination just as readily as your
data can get there? Well, sometimes it’s
opportune to actually take big packets of information
and chop them up. So indeed, what a computer will
often do, thanks to TCP/IP, the combination of these
protocols, is we’ll take large files and large
images, in this case, tear them up into, say, roughly– oops– equal sized parts like this here
and then tear it down even further, perhaps, to get it into a
smaller byte-sized piece and then send not only one packet
of information over the internet. But instead, put one piece of
information in this packet here. Put one other piece of
information in this packet here, whose addressing, both to
and from, is identical. And then do the same thing
for the two other pieces so that ultimately we have
four packets, each of which contains one portion, one
quarter, in this case, of the resulting message, all of which
are destined for the same destination. But the problem to be solved, now, is
what do you do with this information? If I have four seemingly
identical envelopes but inside of which are
disparate pieces of information that somehow need to be reassembled– let’s put on our proverbial
engineering hats– how do you solve this problem? Is this sufficient information
on the envelopes so that if I send this out on the
internet toward Berkeley or Stanford or Facebook or wherever, how does that
recipient know what to do with it? What would you, the
human, do if you have not virtual but physical envelopes? Well, here, too, and
here’s an opportunity really to bring to bear human
intuition to a problem that seems fairly technical and well beyond
one’s own technical understanding. And yet, it really is just a technical
manifestation of a real world problem. I need to keep these in order somehow. So you know what? I’m going to say something like one
of four on the first one, like this. The next one, I’m going to say two
of four on the next one, like this. And then I’m going to say three
of four and then on the next one here, I’m going to put four of four. And what’s the takeaway, now? Now, whoever is the recipient
of these several envelopes as I send them out on the
internet– and indeed, they don’t have to follow the same path. One can go this way. One can be routed that way. Another can go to this router. Another can go to that router. Because they’re all addressed
and because all of these routers are somehow interconnected,
all four of those packets will hopefully get to their destination. But if they don’t,
the recipient can look at that additional detail I wrote on the
envelope and see, oh, I got part one. I got part two. I got part three. But where is part four of four? It didn’t arrive because of congestion. Literally got dropped on
the floor or not picked up. So the computer, who’s
supposed to be receiving that data, thanks to TCP recall,
can say, hey, please send me again packet four of four. And so as technical as
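The numbered-envelope idea can be sketched in a few lines. This is a simplification under stated assumptions: the function names are hypothetical, and real TCP numbers the bytes in a stream rather than writing "one of four" on each packet, but the recipient's reasoning is the same.

```python
def split_into_packets(data, count):
    """Cut data into `count` roughly equal parts, each labeled seq-of-total."""
    size = -(-len(data) // count)  # ceiling division
    return [
        {"seq": i + 1, "total": count, "payload": data[i * size:(i + 1) * size]}
        for i in range(count)
    ]


def missing_packets(received):
    """List the sequence numbers the recipient has not seen yet."""
    total = received[0]["total"]
    seen = {packet["seq"] for packet in received}
    return [n for n in range(1, total + 1) if n not in seen]


def reassemble(received):
    """Put the payloads back in order, whatever order they arrived in."""
    ordered = sorted(received, key=lambda packet: packet["seq"])
    return b"".join(packet["payload"] for packet in ordered)


message = b"a picture of a cat"
packets = split_into_packets(message, 4)

# Packets may arrive out of order, and here packet four was dropped.
arrived = [packets[2], packets[0], packets[1]]
lost = missing_packets(arrived)

# The recipient asks again for whatever is missing; the sender resends it.
arrived += [packets[n - 1] for n in lost]
print(lost)
print(reassemble(arrived) == message)
```

The labels do two jobs at once: they reveal which piece is missing and they restore the original order.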
the internet might seem, it really, again, is just some
fairly intuitive solutions to problems like this, albeit translated
to more technical contexts, more technical protocols, and
more technical languages. But let’s look at some
more user-facing protocols. The ones we’ve discussed thus far
are fairly low level, if you will. And indeed, there’s this
whole internet hierarchy of protocols layered on
protocols layered on protocols so that what we humans really tend to
care about, if we’re not the engineers but we’re really the software developers
and we’re the users of applications, we care about application
layer protocols that sit right between the human and all
of those lower level protocols. For instance, these, at least one of
which has got to jump out at you, HTTP. Odds are you’ve seen this. Odds are you’ve typed
this, though decreasingly do you have to still type it because
browsers will just add it for you, HTTP. The secure or encrypted version,
HTTPS, IMAP for inbound email, SMTP for outbound email, SFTP for Secure
File Transfer, SSH for Secure Shell, an encrypted textual channel
between two computers, and many more. But HTTP, let’s focus on that one
because that is Hypertext Transfer Protocol. Or HTTPS, the same
but the S stands for– not savings– secure, so it’s
actually encrypted in this case. So what does this actually mean? Well, at the end of the
day, HTTP is a protocol that governs what kinds of messages
go inside of those envelopes that I’ve been preparing for the
internet, what kinds of messages go inside of those envelopes. And it turns out the simplest
message that a computer sends through this whole internet,
ultimately, inside of a virtual envelope is quite often, thanks to HTTP,
inside of this virtual envelope, if I’m trying to request
a cat from the internet, might literally be a
message like this, GET, for instance, /cat.jpg, for JPEG. And maybe some additional
text after that, maybe some additional text below
that, but at the end of the day inside the virtual envelope, if I am on the
internet and I’m going on Google Images and I want to find a picture of
a cat, inside of my envelope, if I am a web browser speaking HTTP is
going to literally be a textual message that says GET /cat.jpg, if I know that’s
where the image is on some server. The response is going to be
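The text inside that envelope can be written out directly. The sketch below only builds the message and sends nothing; the helper name build_get_request and the host www.example.com are hypothetical, and the Host and Connection lines are examples of the "additional text" that typically follows the first line in HTTP/1.1.

```python
def build_get_request(host, path):
    """Return the bytes a browser might put inside the envelope."""
    lines = [
        f"GET {path} HTTP/1.1",  # the request line: method, path, version
        f"Host: {host}",         # which site on this server is wanted
        "Connection: close",     # ask the server to hang up afterwards
        "",                      # a blank line marks the end of the headers
        "",
    ]
    # HTTP separates lines with carriage return plus line feed.
    return "\r\n".join(lines).encode("ascii")


request = build_get_request("www.example.com", "/cat.jpg")
print(request.decode("ascii"))
```

The whole request is plain text, which is why it can be typed by hand or inspected in a terminal window.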
what was just inside of those four envelopes back
from the server to me, chopped up maybe into multiple
pieces but in a way where I can then realize, oh, wait a minute,
you sent me only three of four. Please send me the fourth one. So it works in both ways, whether
it’s me sending a cat to someone or receiving a cat from someone. This protocol, HTTP, governs
how the messages are formatted and what language, so to speak, is
spoken between web browser and server. So indeed, HTTP is
entirely about having a web browser communicate with a server. And we can see this in action. I’m going to go ahead and pull up
a so-called terminal window again, this textual command
prompt on my computer. And I’m going to
pretend to be a browser. So I’m not going to just trace
the route between point A and point B. I’m actually
going to request a web page as though I am Chrome
or Edge or Firefox or Safari or whatever your favorite browser is. But of course, as before,
all I know is that I want to visit my favorite web
site, Facebook.com, for instance. But I don’t know its
IP address necessarily, so let’s go through that step. How do I look up its IP address? Well, my Mac already has an
IP address because of DHCP. I’m already powered up. I’m already connected to
the Wi-Fi here on campus, and so I already have my
own IP address, and I also have the IP address of a DNS server. So my Mac just knows that. But I can use that capability
now to look up the IP address for the name, Facebook, and I’m
going to do that as follows, nslookup, for name server lookup. And I’m going to go ahead and
type in www.facebook.com, enter. And interestingly, we get back
this somewhat cryptic response but let’s make some sense of it. So it looks like the server that this
response came back from is 10.0.0.2, which happens to be a private IP address
here on campus, like one you might have in your own company or
university or even home network. Then a non-authoritative answer
is this, www.facebook.com, whose canonical name is, curiously,
star-mini.c10r.facebook.com. Well, it turns out that
companies like Facebook absolutely have many, many,
many different web servers, and they might not necessarily
have just one IP address. But we might just be
seeing one IP address depending on where I am in the
world and depending on how Facebook has configured its infrastructure. The takeaway, then, is that apparently
so far as my Mac is concerned, www.facebook.com is an alias
for or a synonym for this longer less well marketed
domain name here. But what we really care about, if
I’m about to pretend to be a browser, is this IP address. Facebook’s IP address is
apparently 31.13.65.36. And I can see this, in fact. Let me go into Google Chrome,
or any browser for that matter, and go to http://31.13.65.36, enter. And voila, I made it to Facebook. Now of course no one
in their right mind is going to advertise the IP
address as 31.13.65.36. No one would remember that. We’re not in the age of phone numbers
on the side of billboards anymore. Now we have the Domain Name System, or
DNS, which does this conversion for us. But now that I know that IP
address, I can use this information and pretend to be a browser and
not just see the response in Chrome as we just did, but I can
see it in my textual window so I can look inside the envelope. Indeed, this terminal window
is going to let me pretend to– well, actually send a message as though
I’m a browser pretending to be one. But it’s going to let me see inside
of the response that comes back. Here’s what I’m going to do. I’m going to go ahead
and type in curl -I, and I’m going to go ahead and type
http:// and then that IP address and I’m going to hit enter. And notice, uh-oh, Facebook
has moved permanently. But this is a good thing. To where has Facebook moved? Well, apparently we’ve
gotten back a response, via version 1.1 of HTTP, that
Facebook, per this status code, so to speak, has moved permanently, which sounds
scary, but where has it moved to? Oh, they don’t want people visiting
their IP address, even though it works. They want to redirect people, so
to speak, to their domain name. So we seem to be kind of in a cyclical
situation here where, wait a minute, I thought I had to convert my
domain name to an IP address. And indeed, I do, but it
turns out cURL is pretending to be a text-based
browser here, effectively, and it is already going to do this
DNS lookup for me so this is OK. I’m going to go ahead now and do curl
-I http://www.facebook.com, enter. Oh, my god. Facebook moved again. But where did they move this time? Well, it seems that
Facebook would prefer that we visit
https://www.facebook.com, which is the secure, the encrypted version. OK, I can oblige. Let’s go ahead and do that. curl -I of the HTTPS version, which
I’ve just pasted in, enter, and voila. Now, this looks overwhelming, but what’s
really important is this message here. It turns out everything is OK. And indeed, what’s come
back from the server is a virtual envelope, inside of
which is this message here saying, hey, no big deal. Everything is OK. And you never see this number
when you visit web pages, unless you’re a software developer
and you know what tools to use. Instead, some of us out there,
some of us normal humans occasionally see a different
number, maybe the one number you associate with the web. Let me simulate it as follows. Let me go ahead and request
this completely bogus page. Hopefully that’s not actually
someone’s username, and hit enter. Scroll back up a bit. What do you notice this time? If you’ve ever wondered
what 404 means, it is the numeric code inside of a virtual
envelope coming back from a server when you have requested
some nonsensical URL because of a typographical
error or just nonsense that I typed, so the
server tells you, uh-uh, not found, 404. So this is just a special numeric code. And this is common in
programming to have numbers correspond to different
types of things that can go wrong or, better yet, that can go
well, as in the case of 200 OK. Now, all of this stuff
is called HTTP headers. So I was oversimplifying
earlier when I said HTTP is just this handshake of
sorts between a browser and a server where you say, get me a cat picture, and then you get
back the response as per those four envelopes. There are more headers. There are more key-value pairs, words
with colons, words with colons, words with colons, and then
values to the right of those. And that is just additional metadata,
more information from the server that tells you a little something about it. But if I instead run that
same command one final time, this time doing curl and then specifying
not -I but just the URL itself, and hit enter, this
craziness comes back. And this looks like a whole lot of
code in a programming language called JavaScript, or one big JSON object. And my god, look how much data
came back from the server. But notice, I’m starting
to see some structure. Open bracket div and
the word label here. And if I go up here, input here. And indeed, what you are seeing
is a language called HTML. Inside of the virtual envelope, if
you’re requesting not a cat image but a web page that has your news feed
or your inbox from Gmail or your search results from Google is
a language called HTML. And HTML’s not a programming language. And indeed, it’s not usually as
cryptic-looking as this. Facebook is being
very efficient when it comes to showing me
this information and just getting rid of as much
formatting as they can to save space, to save on internet
bandwidth or transmission thereof. But it’s a language that comes
back in this virtual envelope that a browser knows how to display. It’s a markup language
in the sense that it’s going to tell the browser what to show
on the screen, where to show the cat, where to put words, whether to make
those words big or bold or italics or centered or any
number of other things. And indeed, what you are seeing is this. This is www.facebook.com graphically,
as we see it in the browser. Underneath the hood is that black and
white seemingly nonsensical Greek, if you will, that at first
glance, there’s no way most of us would understand it. But that’s because we’re
looking at it here. We need to dive in a
little deeper, take a look at what HTML is, how
it’s actually structured, make the simplest of web pages, a
hello world of web pages, if you will. And then can we realize
and build back up to this point exactly what composes
pages like Facebook and Gmail and Google and Bing and others. Because at that point, we’ll
have understood not only how the internet works,
but how you can use it as a delivery vehicle for your ideas,
for your programs, for your products, for your companies and more and actually
deliver information and deliver cats and much more to your
users on this internet.
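
To make the "virtual envelope" from the demo a bit more concrete, here is a minimal Python sketch of how a program might read the top of an HTTP response: the status line (such as 200 OK or 404 Not Found) followed by the key-value headers. The three sample responses are hypothetical and heavily abbreviated, written only for illustration and not copied from Facebook's actual output, and the function names `parse_response_head` and `describe` are made up for this example.

```python
from typing import Dict, Tuple

# Hypothetical, heavily abbreviated responses for illustration only;
# a real server sends many more headers than these.
SAMPLE_REDIRECT = (
    "HTTP/1.1 301 Moved Permanently\r\n"
    "Location: https://www.facebook.com/\r\n"
    "\r\n"
)
SAMPLE_OK = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html; charset=utf-8\r\n"
    "\r\n"
)
SAMPLE_MISSING = (
    "HTTP/1.1 404 Not Found\r\n"
    "Content-Type: text/html; charset=utf-8\r\n"
    "\r\n"
)


def parse_response_head(raw: str) -> Tuple[int, str, Dict[str, str]]:
    """Return (status code, reason phrase, headers) from a response head."""
    lines = raw.strip().splitlines()
    # Status line looks like: HTTP/1.1 301 Moved Permanently
    parts = lines[0].split(None, 2)
    code = int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the headers
        # Each header is "Name: value"; split at the first colon only,
        # because values such as URLs contain colons themselves.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return code, reason, headers


def describe(raw: str) -> str:
    """Summarize a response the way a person reading the headers might."""
    code, reason, headers = parse_response_head(raw)
    if 300 <= code < 400 and "location" in headers:
        return f"{code} {reason}, go to {headers['location']}"
    return f"{code} {reason}"


if __name__ == "__main__":
    for sample in (SAMPLE_REDIRECT, SAMPLE_OK, SAMPLE_MISSING):
        print(describe(sample))
```

A browser does roughly this bookkeeping on every request: when it sees a redirect status with a Location header, it requests the new URL, and it only shows you the page once a 200 comes back.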

3 Comments

  • Siva kalyan

    Hello David , one personal question , but might be useful for every one , you are delivering loads of content at an amazing quality,I got fascinated to your videos but the problem i have been facing is , what can i do if i want to retain the 100% of content you have delivered, already am reciting myself thinking about what you said when i get time(even making notes watching videos at a pace i can ) ,but i dont want to miss out even a point because everything matters, so please suggest me how to retain all the content ,because am feeling like as i am going to your next video , i am afraid that i may forget some of them discussed above.
    Thanks for your patience in reading all this,as always love you sir.
