我的CPP之路

路漫漫其修远兮
随笔 - 42, 文章 - 0, 评论 - 16, 引用 - 0
数据加载中……

Les Notes d'Etude de Programmation TCP/IP

Ce sont les notes des livres à propos de Programmation TCP/IP.

=========
Everything in Unix is a file!


When Unix programs do any sort of I/O, they do it by reading or writing to a file descriptor.

So when you want to communicate with another program over the Internet you're gonna do it through a file descriptor, you'd better believe it.

Where to get this file descriptor for network communication?

Make a call to the socket() system routine.It returns the socket descriptor, and you communicate through it using the specialized send() and recv() (man send, man recv) socket calls.

We can just use the normal read() and write() calls to communicate through the socket, but send() and recv() offer much greater control over your data transmission.

There are all kinds of sockets, DARPA Internet addresses (Internet Sockets), path names on a local node (Unix Sockets), CCITT X.25 addresses (X.25 Sockets that you can safely ignore), and so on.

“Raw Sockets” are also very powerful and we should look them up.

Two types of Internet sockets, “Stream Sockets” and “Datagram Sockets”, which may hereafter be referred to as “SOCK_STREAM” and “SOCK_DGRAM”, respectively.

Datagram sockets are sometimes called “connectionless sockets”. (Though they can be connect()'d if you really want. See connect(), below.)

Stream sockets are reliable two-way connected communication streams. If you output two items into the socket in the order “1, 2”, they will arrive in the order “1, 2” at the opposite end. They will also be error-free.

Indeed, if you telnet to a web site on port 80, and type “GET / HTTP/1.0” and hit RETURN twice, it'll dump the HTML back at you!

Stream sockets use a protocol called “The Transmission Control Protocol”, otherwise known as “TCP” (see RFC 7936 for extremely detailed info on TCP).

TCP makes sure your data arrives sequentially and error-free.

“IP” stands for “Internet Protocol” (see RFC 7917).

IP deals primarily with Internet routing and is not generally responsible for data integrity.

Datagram sockets are called connectionless. They are unreliable. If you send a datagram, it may arrive out of order. If it arrives, the data within the packet will be error-free.

Datagram sockets also use IP for routing. They use the “User Datagram Protocol”, or “UDP”. You don't have to maintain an open connection as you do with stream sockets. You just build a packet, slap an IP header on it with destination information, and send it out. No connection needed.

Why would you use an unreliable underlying protocol? Two reasons: speed and speed. If you're sending chat messages, TCP is great; if you're sending 40 positional updates per second of the players in the world, maybe it doesn't matter so much if one or two get dropped, and UDP is a good choice.

A layered model more consistent with Unix might be:
• Application Layer (telnet, ftp, etc.)
• Host-to-Host Transport Layer (TCP, UDP)
• Internet Layer (IP and routing)
• Network Access Layer (Ethernet, wi-fi, or whatever)

All you have to do for stream sockets is send() the data out. All you have to do for datagram sockets is encapsulate the packet in the method of your choosing and sendto() it out. The kernel builds the Transport Layer and Internet Layer on for you and the hardware does the Network Access Layer.

The router strips the packet to the IP header, consults its routing table.

=========
The address ::1 is the loopback address. It always means “this machine I'm running on now”. In IPv4, the loopback address is 127.0.0.1.

To represent the IPv4 address 192.0.2.33 as an IPv6 address, you use the following notation: “::ffff:192.0.2.33”.

The network portion of the IP address is described by something called the netmask, which you bitwise-AND with the IP address to get the network number out of it. The netmask usually looks something like 255.255.255.0. (E.g. with that netmask, if your IP is 192.0.2.12, then your network is 192.0.2.12 AND 255.255.255.0 which gives 192.0.2.0.)

So you might have a netmask of, say 255.255.255.252, which is 30 bits of network, and 2 bits of host allowing for four hosts on the network. (Note that the netmask is ALWAYS a bunch of 1-bits followed by a bunch of 0-bits.)

But it's a bit unwieldy to use a big string of numbers like 255.192.0.0 as a netmask. You just put a slash after the IP address, and then follow that by the number of network bits in decimal. Like this: 192.0.2.12/30. Or, for IPv6, something like this: 2001:db8::/32 or 2001:db8:5413:4028::9db9/64.

The port number, it's a 16-bit number that's like the local address for the connection. Different services on the Internet have different well-known port numbers. You can see them all in the Big IANA Port List 12 or, if you're on a Unix box, in your /etc/services file. HTTP (the web) is port 80, telnet is port 23, SMTP is port 25, the game DOOM13  used port 666, etc. and so on. Ports under 1024 are often considered special, and usually require special OS privileges to use.

If you want to represent the two-byte hex number, say b34f, you'll store it in two sequential bytes b3 followed by 4f.This number, stored with the big end first, is called Big-Endian.

The storage method, by which b34f would be stored in memory as the sequential bytes 4f followed by b3, is called Little-Endian.

The more-sane Big-Endian is also called Network Byte Order because that's the order us network types like.

Your computer stores numbers in Host Byte Order. If it's an Intel 80x86, Host Byte Order is Little-Endian. If it's a Motorola 68k, Host Byte Order is Big-Endian. If it's a PowerPC, Host Byte Order is...well, it depends!

A lot of times when you're building packets or filling out data structures you'll need to make sure your two- and four-byte numbers are in Network Byte Order. But how can you do this if you don't know the native Host Byte Order? Good news! You just get to assume the Host Byte Order isn't right, and you always run the value through a function to set it to Network Byte Order. The function will do the magic conversion if it has to, and this way your code is portable to machines of differing endianness.

htons() host to network short
htonl() host to network long
ntohs() network to host short
ntohl() network to host long

Basically, you'll want to convert the numbers to Network Byte Order before they go out on the wire, and convert them to Host Byte Order as they come in off the wire.

=========
A socket descriptor is just a regular int.

struct addrinfo is one of the first things you'll call when making a connection.

You can force it to use IPv4 or IPv6 in the ai_family field, or leave it as AF_UNSPEC to use whatever. This is cool because your code can be IP version-agnostic.

Oftentimes, a call to getaddrinfo() to fill out your struct addrinfo for you is all you'll need.

To deal with struct sockaddr, programmers created a parallel structure: struct sockaddr_in (“in” for “Internet”) to be used with IPv4.

sin_zero (which is included to pad the structure to the length of a struct sockaddr) should be set to all zeros with the function memset().

The sin_port must be in Network Byte Order (by using htons()!)

inet_pton(), converts an IP address in numbers-and-dots notation into either a struct in_addr or a struct in6_addr. The old way of doing things used a function called inet_addr() or another function called inet_aton(); these are now obsolete and don't work with IPv6.

inet_ntop() (“ntop” means “network to presentation”—you can call it “network to printable” if that's easier to remember). The old way of doing things: the historical function to do this conversion was called inet_ntoa(). It's also obsolete and won't work with IPv6.

Often times, the firewall translates “internal” IP addresses to “external” (that everyone else in the world knows) IP addresses using a process called Network Address Translation, or NAT.

The details of which private network numbers are available for you to use are outlined in RFC 191815, but some common ones you'll see are 10.x.x.x and 192.168.x.x, where x is 0-255, generally. Less common is 172.y.x.x, where y goes between 16 and 31.

=========
The most correct thing to do is to use AF_INET in your struct sockaddr_in and PF_INET in your call to socket().

Another thing to watch out for when calling bind(): don't go underboard with your port numbers. All ports below 1024 are RESERVED (unless you're the superuser)! You can have any port number above that, right up to 65535 (provided they aren't already being used by another program.)

We need to call bind() before we call listen() so that the server is running on a specific port. (You have to be able to tell your buddies which port to connect to!)


posted on 2010-12-27 11:21 yanvenhom 阅读(324) 评论(0)  编辑 收藏 引用


只有注册用户登录后才能发表评论。
网站导航: 博客园   IT新闻   BlogJava   博问   Chat2DB   管理