Need advice on client/server networking (long-ish)

I was reading the mailing list archives, and one of the
messages mentioned the inherent evilness of using a
separate thread to read data from each connection.

I have two questions I can’t quite figure out on my own.

First, what if there is another thread that is CPU-hungry?
Will I get a better overall performance if I use a separate
thread to read from each socket, as opposed to using
select( ) / poll( ) ? (Reasoning: if there is no data on any
of the sockets, threads will block so they won’t waste
CPU time, while a polling mechanism will actually use CPU).

Second, what about sending responses back to the client?
Is it a good idea to have a separate thread to write data
to each socket? My gut feeling tells me it’s not, especially since
there is usually a limit on the number of threads… however,
here’s one scenario: suppose I have one thread that has to send
a message to all of its clients, and it does it like this:

for ( i = 0; i < CHAT_MAXCLIENTS; ++i ) {
    if ( clients[i].active ) {
        SDLNet_TCP_Send(clients[i].sock, &data, sizeof(data));
    }
}

Now, if one client decides to die, the loop would freeze until it
either manages to send data to that client or we get a timeout,
which sounds like a bad thing.

Any insights on why (and when!) a threaded or non-threaded
approach is better are greatly appreciated.

Thanks in advance,
M.C.

merlecorey at crosswinds.net wrote:

I was reading the mailing list archives, and one of the
messages mentioned the inherent evilness of using a
separate thread to read data from each connection.

I have two questions I can’t quite figure out on my own.

First, what if there is another thread that is CPU-hungry?
Will I get a better overall performance if I use a separate
thread to read from each socket, as opposed to using
select( ) / poll( ) ? (Reasoning: if there is no data on any
of the sockets, threads will block so they won’t waste
CPU time, while a polling mechanism will actually use CPU).

I recently tried using a threaded model for my scrolling game, and the
result was that the updates were extremely choppy. I think the
explanation goes something like this:

The CPU-hungry thread is a common case for games, as usually the
rendering (blitting/drawing/whatever) thread is constantly running,
and doesn’t really ever block (unless you have a hardware double buffer,
and it can block on the vsync). Since it doesn’t block, it uses its
entire timeslice (10ms in my case) then the other threads have a
chance to run, then the render thread runs again for 10ms. So my
packet updates were being forced to wait until the 10ms render
slice was completed.

If I were polling on each frame draw, then I’d have updates more
frequently, and thus smoother motion.
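For what it’s worth, the per-frame poll could look roughly like this with
SDL_net (an untested sketch; handle_packet() is just a placeholder for
whatever processes the data):

/* Untested sketch: poll the socket set once per frame with a zero timeout. */
#include "SDL_net.h"

extern void handle_packet(int client, const char *buf, int len); /* placeholder */

void poll_network(SDLNet_SocketSet set, TCPsocket clients[], int nclients)
{
    int i;

    /* timeout of 0: check once and return immediately, never block */
    if (SDLNet_CheckSockets(set, 0) <= 0)
        return;

    for (i = 0; i < nclients; ++i) {
        if (clients[i] && SDLNet_SocketReady(clients[i])) {
            char buf[512];
            int got = SDLNet_TCP_Recv(clients[i], buf, sizeof(buf));
            if (got <= 0) {
                /* connection closed or error: drop this client */
                SDLNet_TCP_DelSocket(set, clients[i]);
                SDLNet_TCP_Close(clients[i]);
                clients[i] = NULL;
            } else {
                handle_packet(i, buf, got);
            }
        }
    }
}

The main loop would just call poll_network() once per frame, right next to
the draw call, so updates arrive at frame granularity instead of waiting for
another thread to be scheduled.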

It seems to me that a threading solution will only work well if you
have one of two situations: 1) all threads eventually block, and thus
never use their entire timeslice, or 2) you have multiple CPUs, so the
CPU-hungry rendering thread can continue running while the service
threads take turns on the other processor.

Second, what about sending responses back to the client?
Is it a good idea to have a separate thread to write data
to each socket? My gut feeling tells me it’s not, especially since
there is usually a limit on the number of threads… however,
here’s one scenario: suppose I have one thread that has to send
a message to all of its clients, and it does it like this:

for ( i = 0; i < CHAT_MAXCLIENTS; ++i ) {
    if ( clients[i].active ) {
        SDLNet_TCP_Send(clients[i].sock, &data, sizeof(data));
    }
}

Now, if one client decides to die, the loop would freeze until it
either manages to send data to that client or we get a timeout,
which sounds like a bad thing.

I like to use select() for things like this. Build the list of sockets for
writing based on which ones have data to be sent, then select on the full
list of sockets for reading and the partial list for writing. Then for each
socket that has data available to read, deal with it. For each that is
available to write, use a non-blocking write to write the data, and remove
data from the buffer based on the return value from write. The key is that
the timeout value for select should be either small or zero, since you still
only want to poll.
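Very roughly, the write side might look something like this (a sketch with
plain BSD sockets rather than SDL_net, assuming the descriptors were already
put in non-blocking mode; struct client and its fields are made up here):

/* Sketch: one pass of select()-based servicing with non-blocking writes. */
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>

struct client {
    int  fd;          /* socket fd, already set O_NONBLOCK, or -1 if unused */
    char out[4096];   /* pending outgoing data */
    int  outlen;
};

void service_sockets(struct client *c, int nclients)
{
    fd_set rset, wset;
    struct timeval tv = { 0, 0 };   /* zero timeout: just poll */
    int i, maxfd = -1;

    FD_ZERO(&rset);
    FD_ZERO(&wset);
    for (i = 0; i < nclients; ++i) {
        if (c[i].fd < 0)
            continue;
        FD_SET(c[i].fd, &rset);             /* always interested in reads */
        if (c[i].outlen > 0)
            FD_SET(c[i].fd, &wset);         /* writes only if data is queued */
        if (c[i].fd > maxfd)
            maxfd = c[i].fd;
    }
    if (maxfd < 0 || select(maxfd + 1, &rset, &wset, NULL, &tv) <= 0)
        return;

    for (i = 0; i < nclients; ++i) {
        if (c[i].fd < 0)
            continue;
        if (FD_ISSET(c[i].fd, &rset)) {
            /* read and handle incoming data here */
        }
        if (FD_ISSET(c[i].fd, &wset)) {
            int n = send(c[i].fd, c[i].out, c[i].outlen, 0);
            if (n > 0) {
                /* drop only what was actually written */
                memmove(c[i].out, c[i].out + n, c[i].outlen - n);
                c[i].outlen -= n;
            } else if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK) {
                /* the client died; close it and free the slot */
                close(c[i].fd);
                c[i].fd = -1;
            }
        }
    }
}

That way a dead or stalled client only costs you whatever is sitting in its
buffer, never a blocked broadcast loop.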

Any insights on why (and when!) a threaded or non-threaded
approach is better are greatly appreciated.

Thanks in advance,
M.C.

The CPU-hungry thread is a common case for games, as usually the
rendering (blitting/drawing/whatever) thread is constantly running,
and doesn’t really ever block (unless you have hardware double buffer,
and it can block on the vsync).

You can’t even rely on things blocking while waiting for vsync. Many
implementations just busy-wait (I think XFree86 DGA does this).

Since it doesn’t block, it uses its
entire timeslice (10ms in my case) then the other threads have a
chance to run, then the render thread runs again for 10ms. So my
packet updates were being forced to wait until the 10ms render
slice was completed.

The default linux timeslice is 210 ms if I recall correctly, so 10 ms might
be an underestimate. (remember timeslice != jiffy)

It seems to me that a threading solution will only work well if you
have one of two situations: 1) all threads eventually block, and thus
never use their entire timeslice, or 2) you have multiple CPUs, so the
CPU-hungry rendering thread can continue running while the service
threads take turns on the other processor.

It seems to me that threads can work if you have a good priority model
(or use explicit blocking primitives, same effect) for stuff like
AI routines that run for longer than one frame, and maybe for audio
in order to minimize sound latency and be safe against underruns.
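For example, something along these lines with SDL’s own primitives keeps a
worker thread from ever spinning (just a sketch; do_one_ai_step() and
render_frame() are made-up placeholders):

/* Sketch: the worker blocks on a semaphore and only runs when woken. */
#include "SDL.h"
#include "SDL_thread.h"
#include "SDL_mutex.h"

extern void do_one_ai_step(void);   /* placeholder: one bounded chunk of AI work */

static SDL_sem *work_sem;
static volatile int running = 1;

static int ai_thread(void *unused)
{
    (void)unused;
    while (running) {
        SDL_SemWait(work_sem);      /* sleeps here, uses no CPU until woken */
        if (running)
            do_one_ai_step();
    }
    return 0;
}

/* Setup:
 *     work_sem = SDL_CreateSemaphore(0);
 *     SDL_CreateThread(ai_thread, NULL);
 * Then once per frame in the main/render loop:
 *     SDL_SemPost(work_sem);       wake the AI thread for one step
 *     render_frame();              placeholder for the drawing code
 */

The render thread still eats its whole timeslice, but the worker never
competes for one it doesn’t need.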

Mattias Engdegård wrote:

The default linux timeslice is 210 ms if I recall correctly, so 10 ms might
be an underestimate. (remember timeslice != jiffy)

In my quick and not-so-scientific experiments, I’ve found that
it is typically about 10 ms, with occasional leaps to 20 ms, most
likely because my xterm stole a timeslice :)

It does always seem to be a multiple of 10 ms, and was never less.
This was done with two busy-waiting threads running, while the
main process used SDL_WaitThread on the first thread.
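A rough sketch of that kind of test (not the exact code used; the 5-second
spin and 5 ms gap threshold are arbitrary):

/* Sketch: two busy-waiting threads report how long they were off the CPU. */
#include <stdio.h>
#include "SDL.h"
#include "SDL_thread.h"

static int spinner(void *name)
{
    Uint32 start = SDL_GetTicks();
    Uint32 last = start;

    while (SDL_GetTicks() - start < 5000) {   /* spin for about 5 seconds */
        Uint32 now = SDL_GetTicks();
        if (now - last > 5)                   /* a jump means we were preempted */
            printf("%s: off the CPU for %u ms\n",
                   (char *)name, (unsigned)(now - last));
        last = now;
    }
    return 0;
}

int main(int argc, char *argv[])
{
    SDL_Thread *a, *b;

    SDL_Init(SDL_INIT_TIMER);
    a = SDL_CreateThread(spinner, "A");
    b = SDL_CreateThread(spinner, "B");
    SDL_WaitThread(a, NULL);
    SDL_WaitThread(b, NULL);
    SDL_Quit();
    return 0;
}

With two spinners the gaps each one sees are roughly the other’s timeslice,
which is where the 10 ms / 20 ms figures come from.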

Mattias Engdegård wrote:

The default linux timeslice is 210 ms if I recall correctly, so 10 ms might
be an underestimate. (remember timeslice != jiffy)

In my quick and not-so-scientific experiments, I’ve found that
it is typically about 10 ms, with occasional leaps to 20 ms, most
likely because my xterm stole a timeslice :)

You still confuse timeslice and jiffies (a.k.a. ticks). The Linux
system timer runs at 100 Hz on most architectures (1024 Hz on Alpha).
The timeslice is how long a process will run before being pre-empted
by the scheduler and another process given time instead. The length of
the timeslice is dynamic (see kernel/sched.c), and ~20 ticks by default,
i.e. roughly 200 ms at 100 Hz.