How to write a TCP server with the select syscall
Yesterday I described the minimal commands for a TCP server. But that server can only serve one client at a time! It does some work with one TCP connection, then closes it and deals with the next TCP connection, etc. This is not how most servers work; the clients expect to be able to talk to the server regardless of what other clients are around.
The reason the server can only handle one connection at a time is that the process blocks waiting for a single kind of event. If the process calls accept, the process blocks waiting for a new TCP connection, and nothing else will wake it up. If the process calls recv(fd), the process blocks waiting for some bytes sent on that existing TCP connection, and nothing else will wake it up.
To solve this, the process needs to say, “OS, please put me to sleep and wake me up when something interesting happens”. In our case, “something interesting” would be a new TCP connection or some bytes sent on any existing TCP connection.
The old-school UNIX way to say this is the select syscall. Roughly, we call ready_fds = select(fds), which means “OS, please put me to sleep, and when something happens on a file descriptor in fds, wake me up and tell me which file descriptors are ready”. Here, “ready” means “you can call a blocking syscall on it, but it won’t block”. If the file descriptor is linked to a TCP listening port, you can call accept on it, and it won’t block. Thus, “ready” for a TCP listening port means “there is a client waiting to open a TCP connection”. If the file descriptor is linked to a TCP connection, you can call recv on it, and it won’t block. Thus, “ready” for a TCP connection means “there are some bytes in the buffer waiting to be read” (or the other end has closed the connection, in which case recv returns 0 immediately).
After the select call, the process can decide which file descriptors to call accept or recv on. Because neither accept nor recv will block, the server process can deal with clients in a timely way.
The fds argument is an fd_set*. An fd_set is an array of bits, each of which corresponds to a file descriptor. There are FD_SETSIZE bits, and on my machine, FD_SETSIZE is 1024. Thus, we are limited to file descriptors numbered below 1024, which means ~1000 connected clients.
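Strictly, the limit is on descriptor values rather than on how many you have open: an fd_set can only represent descriptors numbered 0 to FD_SETSIZE - 1, and calling FD_SET with anything outside that range is undefined behaviour. Here is a minimal sketch of a guard for that. The helper names fits_in_fd_set, checked_fd_set and fd_set_limit_demo are my own invention for illustration, not part of any API.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>

/* An fd_set has one bit per descriptor, for descriptors 0 .. FD_SETSIZE-1. */
int fits_in_fd_set(int fd) {
    return fd >= 0 && fd < FD_SETSIZE;
}

/* Add fd to *set only if the set can represent it.
 * Returns 0 on success, -1 if fd is out of range. */
int checked_fd_set(int fd, fd_set *set) {
    assert(set != NULL);
    if (!fits_in_fd_set(fd)) return -1;
    FD_SET(fd, set);
    return 0;
}

/* Demo: returns 1 if in-range descriptors are added
 * and an out-of-range one is refused. */
int fd_set_limit_demo(void) {
    fd_set set;
    FD_ZERO(&set);
    return checked_fd_set(0, &set) == 0
        && checked_fd_set(FD_SETSIZE - 1, &set) == 0
        && checked_fd_set(FD_SETSIZE, &set) == -1
        && FD_ISSET(FD_SETSIZE - 1, &set);
}
```

A server would call something like checked_fd_set on every descriptor returned by accept, and close the connection if it doesn't fit.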
First wrinkle: the syscall doesn’t return a new file descriptor set; it overwrites the one passed in. So we must track which file descriptors we have elsewhere, then copy them to an fd_set before calling select, and after select returns, we must iterate over the same fd_set to find out which file descriptors are “ready”.
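A rough sketch of that copy-then-call pattern, using two pipes as stand-ins for connections. The names wait_readable and copy_before_select_demo are hypothetical. The real select takes more arguments than the simplified form above: the first is the highest descriptor plus one, and this sketch passes NULL for the ones it doesn't use.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Block until at least one descriptor in *tracked is readable.
 * *tracked is left untouched; the answer is written to *ready.
 * max_fd is the highest descriptor in *tracked.
 * Returns the number of ready descriptors, or -1 on error. */
int wait_readable(const fd_set *tracked, int max_fd, fd_set *ready) {
    assert(max_fd >= 0 && max_fd < FD_SETSIZE);
    *ready = *tracked;   /* copy, because select() overwrites its argument */
    return select(max_fd + 1, ready, NULL, NULL, NULL);
}

/* Demo: track two pipes, put data in only one of them.
 * Returns 1 if select reports just that one and our own set survives. */
int copy_before_select_demo(void) {
    int a[2], b[2];
    if (pipe(a) != 0) return -1;
    if (pipe(b) != 0) { close(a[0]); close(a[1]); return -1; }

    fd_set tracked, ready;
    FD_ZERO(&tracked);
    FD_SET(a[0], &tracked);
    FD_SET(b[0], &tracked);
    int max_fd = a[0] > b[0] ? a[0] : b[0];

    int ok = write(a[1], "x", 1) == 1        /* only pipe a has data */
          && wait_readable(&tracked, max_fd, &ready) == 1
          && FD_ISSET(a[0], &ready)          /* a is reported ready */
          && !FD_ISSET(b[0], &ready)         /* b was cleared by select */
          && FD_ISSET(b[0], &tracked);       /* our own set still has b */

    close(a[0]); close(a[1]); close(b[0]); close(b[1]);
    return ok;
}
```

If we had passed tracked itself to select, pipe b would have silently dropped out of the set, and we'd never hear about it again.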
To work with fd_sets, we use the functions FD_ZERO (the empty set), FD_SET (add to set), FD_CLR (remove from set), and FD_ISSET (test set membership). A convenience FD_COPY copies one set to another, although it is a BSD extension and not available everywhere. (They’re actually macros; a future post could look at how they are implemented.)
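Here's a small sketch that exercises the four standard macros on a scratch set. The descriptor numbers 3, 7 and 9 are arbitrary, and count_members and fd_set_macro_demo are names I made up. Since FD_COPY isn't portable, the sketch copies a set by plain struct assignment instead, which works because fd_set is an ordinary struct.

```c
#include <assert.h>
#include <sys/select.h>

/* Count how many descriptors in [0, max_fd] are members of *set. */
int count_members(fd_set *set, int max_fd) {
    assert(max_fd >= 0 && max_fd < FD_SETSIZE);
    int n = 0;
    for (int fd = 0; fd <= max_fd; fd++) {
        if (FD_ISSET(fd, set)) n++;
    }
    return n;
}

/* Demo: returns 1 if the macros behave as described. */
int fd_set_macro_demo(void) {
    fd_set set;
    FD_ZERO(&set);       /* start from the empty set */
    FD_SET(3, &set);     /* add descriptor 3 */
    FD_SET(7, &set);     /* add descriptor 7 */
    FD_CLR(3, &set);     /* remove descriptor 3 again */

    fd_set copy = set;   /* struct assignment copies the whole set */
    FD_SET(9, &copy);    /* changing the copy leaves the original alone */

    return count_members(&set, 10) == 1
        && FD_ISSET(7, &set)
        && count_members(&copy, 10) == 2;
}
```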
Second wrinkle: you actually pass in three fd_set* arguments, not one. The first is for “read” operations, the second for “write” operations, the third for “exceptional” conditions. Thus, select(readfds, writefds, errorfds) means “OS, please put me to sleep, and wake me up when a file descriptor in readfds is ready for reading, or when a file descriptor in writefds is ready for writing, or when a file descriptor in errorfds has some exceptional condition.”
This is a mouthful, and it’s not totally clear what it means. What is “ready for writing”? As always, it depends on the resource that the file descriptor is linked to. If it’s a TCP connection, it means there is space in the TCP outbound buffer. If it’s a TCP listening socket, I don’t know.
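To see read-readiness and write-readiness come apart, here's a sketch that puts one descriptor in both sets. It uses a UNIX-domain socketpair as a stand-in for a TCP connection, on the assumption that a fresh stream socket behaves the same way for this purpose: nothing has been sent yet, so there is nothing to read, but the outbound buffer is empty, so there is room to write. The struct readiness and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Result of asking about one descriptor in both the read and write sets. */
struct readiness {
    int readable;
    int writable;
};

/* Ask select() whether fd is readable and/or writable.
 * Blocks until at least one of the two is true.
 * Returns 0 on success, -1 on error. */
int query_readiness(int fd, struct readiness *out) {
    assert(fd >= 0 && fd < FD_SETSIZE);
    fd_set readfds, writefds;
    FD_ZERO(&readfds);
    FD_ZERO(&writefds);
    FD_SET(fd, &readfds);
    FD_SET(fd, &writefds);

    /* Third set and timeout are NULL: not used in this sketch. */
    if (select(fd + 1, &readfds, &writefds, NULL, NULL) < 0) return -1;

    out->readable = FD_ISSET(fd, &readfds) != 0;
    out->writable = FD_ISSET(fd, &writefds) != 0;
    return 0;
}

/* Demo: returns 1 if a fresh socket is writable but not readable. */
int write_readiness_demo(void) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return -1;

    struct readiness r = {0, 0};
    int rc = query_readiness(sv[0], &r);

    close(sv[0]);
    close(sv[1]);
    if (rc != 0) return -1;
    return r.writable && !r.readable;
}
```

This also hints at why servers usually leave idle connections out of writefds: a connection with buffer space is "ready for writing" almost all the time, so select would return immediately on every call.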
If we’re not interested in some of these sets, e.g. we’re not interested in write-readiness, we can pass NULL as the argument, and the process will not be notified of write-readiness.
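Putting the pieces so far together, here's a sketch of one iteration of an echo server that only cares about read-readiness, so it passes NULL for the other two sets. This is an illustration under my own assumptions rather than a finished server: struct server, server_step and echo_demo are hypothetical names, the first argument to select is the highest descriptor plus one, and the demo talks to itself over the loopback address.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical server state: the listening socket plus every open client. */
struct server {
    int listen_fd;
    fd_set fds;    /* listen_fd and all client descriptors */
    int max_fd;    /* highest descriptor ever added to fds */
};

/* Sleep until something is readable, then service each ready descriptor once.
 * Returns 0 on success, -1 if select() fails. */
int server_step(struct server *s) {
    fd_set ready = s->fds;    /* select() overwrites its argument: pass a copy */
    int max_fd = s->max_fd;   /* snapshot: accept() below may raise s->max_fd */
    /* NULL, NULL, NULL: no write set, no exceptional set, no timeout. */
    if (select(max_fd + 1, &ready, NULL, NULL, NULL) < 0) return -1;

    for (int fd = 0; fd <= max_fd; fd++) {
        if (!FD_ISSET(fd, &ready)) continue;
        if (fd == s->listen_fd) {
            int client = accept(fd, NULL, NULL);   /* won't block: it's ready */
            if (client < 0) continue;
            if (client >= FD_SETSIZE) { close(client); continue; }
            FD_SET(client, &s->fds);
            if (client > s->max_fd) s->max_fd = client;
        } else {
            char buf[512];
            ssize_t n = recv(fd, buf, sizeof buf, 0);  /* won't block either */
            if (n <= 0) {                  /* 0 means the client hung up */
                close(fd);
                FD_CLR(fd, &s->fds);
            } else {
                /* Echo back. A real server would handle short writes here. */
                send(fd, buf, (size_t)n, 0);
            }
        }
    }
    return 0;
}

/* Demo over loopback: returns 1 if a client gets its bytes echoed back. */
int echo_demo(void) {
    int ok = 0;
    char reply[2] = {0, 0};
    struct server s;
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    int cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0 || cfd < 0 || lfd >= FD_SETSIZE) goto done;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                     /* let the OS pick a free port */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) != 0) goto done;
    if (listen(lfd, 8) != 0) goto done;
    if (getsockname(lfd, (struct sockaddr *)&addr, &len) != 0) goto done;
    if (connect(cfd, (struct sockaddr *)&addr, sizeof addr) != 0) goto done;

    s.listen_fd = lfd;
    s.max_fd = lfd;
    FD_ZERO(&s.fds);
    FD_SET(lfd, &s.fds);

    ok = server_step(&s) == 0                    /* accepts the client */
      && send(cfd, "hi", 2, 0) == 2
      && server_step(&s) == 0                    /* echoes "hi" */
      && recv(cfd, reply, 2, MSG_WAITALL) == 2
      && memcmp(reply, "hi", 2) == 0;

    for (int fd = 0; fd <= s.max_fd; fd++)       /* close accepted clients */
        if (fd != lfd && FD_ISSET(fd, &s.fds)) close(fd);
done:
    if (cfd >= 0) close(cfd);
    if (lfd >= 0) close(lfd);
    return ok;
}
```

A real server would call server_step in a loop forever. Note that the one call that can still block here is send, which is exactly the case the write set exists for.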
The third wrinkle is that the process can pass another argument, a timeout. If nothing happens in that time, the call will unblock the process, telling it that nothing happened.
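The timeout is passed as a struct timeval (seconds plus microseconds), and select returns 0 when it expires with nothing ready. A minimal sketch, where wait_readable_timeout and timeout_demo are names of my own and the 50 ms figure is arbitrary:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Wait up to timeout_ms milliseconds for fd to become readable.
 * Returns 1 if readable, 0 if the timeout expired, -1 on error. */
int wait_readable_timeout(int fd, long timeout_ms) {
    assert(fd >= 0 && fd < FD_SETSIZE);
    assert(timeout_ms >= 0);
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    /* Build the timeval fresh for each call: some systems modify it. */
    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    return select(fd + 1, &readfds, NULL, NULL, &tv);
}

/* Demo: an empty pipe times out; a pipe with a byte in it does not.
 * Returns 1 if both behave as expected. */
int timeout_demo(void) {
    int p[2];
    if (pipe(p) != 0) return -1;

    int before = wait_readable_timeout(p[0], 50);  /* nothing written yet */
    int wrote = write(p[1], "x", 1) == 1;
    int after = wait_readable_timeout(p[0], 50);   /* one byte waiting */

    close(p[0]);
    close(p[1]);
    return wrote && before == 0 && after == 1;
}
```

Passing NULL instead of a timeval means "wait forever", and a timeval of all zeros turns select into a non-blocking poll.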
This page copyright James Fisher 2016.