I'm sure I'm missing something obvious, so it's duck time.

If I'm waiting for input from a bunch of streams (in C), I'll have a select (or poll) call that blocks until one of the streams has data ready. Easy so far.
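A minimal sketch of that waiting step, assuming POSIX poll(). The names wait_readable and MAX_STREAMS are made up for illustration, and a real loop would also want to tell timeouts apart from errors.

```c
#include <poll.h>
#include <stddef.h>

#define MAX_STREAMS 64   /* assumed upper bound, purely illustrative */

/* Block for up to timeout_ms until one of the descriptors is readable.
 * Returns the index of the first ready descriptor, or -1 on timeout,
 * error, or too many descriptors. */
int wait_readable(const int *fds, size_t count, int timeout_ms)
{
    struct pollfd pfds[MAX_STREAMS];

    if (count > MAX_STREAMS)
        return -1;

    for (size_t i = 0; i < count; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }

    if (poll(pfds, (nfds_t)count, timeout_ms) <= 0)
        return -1;

    for (size_t i = 0; i < count; i++) {
        /* POLLHUP is included so a closed stream gets read to EOF */
        if (pfds[i].revents & (POLLIN | POLLHUP))
            return (int)i;
    }
    return -1;
}
```

The caller would then read() from the returned stream and hand the bytes to that stream's parser.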

When data comes in on a stream, I want to feed it into a parser that does a bit of work and returns when it gets to the end of the data (or, more accurately, gets to a point near the end of the data where it can't parse the next token because it's not complete).

How do I store enough state so that the parser can continue where it left off last time?

I can't/don't want to 'just' cache the stream until it's complete because I don't know how big the stream is (and I'd rather parse as I go).

On the other hand, memory isn't that expensive, and I can afford to buffer tens of megs if needed. If the parser knows that it's never going to be called halfway through a token, then it can keep state much more cleanly.
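One way to get that guarantee is a per-stream buffer that keeps only the unparsed tail between calls. This is a rough sketch under some assumptions: a "token" here is just a line ending in '\n', the buffer is a fixed size rather than growable, and stream_buf and feed are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

#define BUF_CAP 4096     /* assumed cap; a real buffer would grow */

struct stream_buf {
    char   data[BUF_CAP];
    size_t len;          /* bytes currently held, all still unparsed */
    size_t tokens;       /* complete tokens consumed so far */
};

/* Append new bytes, consume every complete token, and keep the
 * incomplete tail for the next call.
 * Returns the number of tokens consumed by this call, or -1 if the
 * new bytes do not fit. */
int feed(struct stream_buf *sb, const char *bytes, size_t n)
{
    if (n > BUF_CAP - sb->len)
        return -1;
    memcpy(sb->data + sb->len, bytes, n);
    sb->len += n;

    int consumed = 0;
    size_t start = 0;
    for (;;) {
        char *nl = memchr(sb->data + start, '\n', sb->len - start);
        if (nl == NULL)
            break;       /* the next token is not complete yet */
        /* a real parser would handle data[start .. nl] here */
        start = (size_t)(nl - sb->data) + 1;
        consumed++;
    }

    /* slide the partial token to the front of the buffer */
    memmove(sb->data, sb->data + start, sb->len - start);
    sb->len -= start;
    sb->tokens += (size_t)consumed;
    return consumed;
}
```

The only state carried between calls is the leftover bytes, so the token-level code never sees half a token.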

I was playing with "state machine as a bunch of functions" a while back, and I think this is probably a reasonable use case.
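For reference, a small sketch of what "state machine as a bunch of functions" could look like here, one byte at a time. The states (status line, header lines, body) and every name below are assumptions for illustration; this is not the earlier experiment, just the general shape of the idiom.

```c
#include <stddef.h>

struct parser;

/* A state is a function that eats one byte and returns the next state.
 * The struct wrapper is needed because a C function type cannot
 * refer to itself directly. */
struct state {
    struct state (*fn)(struct parser *p, char c);
};

struct parser {
    struct state current;    /* where to resume on the next byte */
    size_t headers;          /* header lines seen so far */
    size_t body_bytes;       /* body bytes seen so far */
};

static struct state at_line_start(struct parser *p, char c);
static struct state in_header(struct parser *p, char c);
static struct state in_body(struct parser *p, char c);

static struct state in_status_line(struct parser *p, char c)
{
    (void)p;
    return (struct state){ c == '\n' ? at_line_start : in_status_line };
}

static struct state at_line_start(struct parser *p, char c)
{
    if (c == '\r')
        return (struct state){ at_line_start };
    if (c == '\n')
        return (struct state){ in_body };   /* blank line: headers done */
    p->headers++;                           /* first byte of a header */
    return (struct state){ in_header };
}

static struct state in_header(struct parser *p, char c)
{
    (void)p;
    return (struct state){ c == '\n' ? at_line_start : in_header };
}

static struct state in_body(struct parser *p, char c)
{
    (void)c;
    p->body_bytes++;
    return (struct state){ in_body };
}

void parser_init(struct parser *p)
{
    p->current.fn = in_status_line;
    p->headers = 0;
    p->body_bytes = 0;
}

/* Can be called with any slice of the stream; the saved function
 * pointer is all the state needed to pick up where it left off. */
void parser_feed(struct parser *p, const char *bytes, size_t n)
{
    for (size_t i = 0; i < n; i++)
        p->current = p->current.fn(p, bytes[i]);
}
```

The appeal is that "where was I?" is a single function pointer, so input can be split anywhere, even in the middle of a \r\n pair.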

(I'm going to save the entry here and maybe carry on later)


Remember the context here. This is for the browser, not some universal solution. There's going to be the main page, which is probably HTML, and that's going to trigger CSS, JavaScript, and images.

HTTP responses can be comfortably modelled as a status line, headers, and a body. The status line ends with \r\n. The headers end with \r\n\r\n, and don't need to be parsed until they've all been received. They do need to be parsed before the body, to find out either the length of the body or that the body is chunked.
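A hedged sketch of those two steps: spot the end of the headers in a buffer, then scan the complete header block for how the body is framed. The names header_end, body_framing and body_kind are invented for this example, strncasecmp comes from POSIX <strings.h>, and the header matching is deliberately naive (no folded lines, no validation of the length value).

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

enum body_kind { BODY_UNKNOWN, BODY_LENGTH, BODY_CHUNKED };

/* Offset of the first body byte, or 0 if the \r\n\r\n that ends the
 * headers has not arrived yet. */
size_t header_end(const char *buf, size_t len)
{
    for (size_t i = 0; i + 4 <= len; i++) {
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
            return i + 4;
    }
    return 0;
}

/* Scan a complete header block buf[0 .. end) line by line.
 * Sets *length when a Content-Length header is found. Chunked wins
 * if both headers are present. */
enum body_kind body_framing(const char *buf, size_t end, size_t *length)
{
    enum body_kind kind = BODY_UNKNOWN;
    const char *line = buf;
    const char *stop = buf + end;

    while (line < stop) {
        const char *eol = memchr(line, '\n', (size_t)(stop - line));
        if (eol == NULL)
            break;
        size_t n = (size_t)(eol - line);

        if (n >= 15 && strncasecmp(line, "Content-Length:", 15) == 0) {
            /* strtoul skips leading spaces and stops at the \r */
            *length = (size_t)strtoul(line + 15, NULL, 10);
            if (kind != BODY_CHUNKED)
                kind = BODY_LENGTH;
        } else if (n >= 18 &&
                   strncasecmp(line, "Transfer-Encoding:", 18) == 0) {
            for (size_t j = 18; j + 7 <= n; j++) {
                if (strncasecmp(line + j, "chunked", 7) == 0)
                    kind = BODY_CHUNKED;
            }
        }
        line = eol + 1;
    }
    return kind;
}
```

While header_end() returns 0 there is nothing to parse yet, so the caller just keeps buffering and goes back to waiting.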

The spec says don't expect headers to take up too much space, and the status line should be tiny (relatively).

I've been thinking too much like a server - that I'm going to get random hits that I've got to handle right now. But I'm a client. I'm asking servers to send me data, I look at that data at my leisure, and then maybe I ask for more.


Also, remember to push as much up to Lox as possible. Handing Lox a Response object with a status code, hash table of headers, and a body, and then letting Lox do the parsing is a reasonable position to take.
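On the C side, that Response object might be little more than the struct below. This is only a sketch: the field names, MAX_HEADERS, and response_header are assumptions, and a plain array with a linear, case-insensitive lookup stands in for the hash table, which would presumably be built when the object is handed over to Lox.

```c
#include <stddef.h>
#include <strings.h>

#define MAX_HEADERS 32   /* assumed limit, purely illustrative */

struct header {
    const char *name;
    const char *value;
};

struct response {
    int           status;                 /* e.g. 200 */
    struct header headers[MAX_HEADERS];
    size_t        header_count;
    const char   *body;                   /* raw bytes, left unparsed */
    size_t        body_len;
};

/* Header names are case-insensitive, so compare with strcasecmp.
 * Returns NULL when the header is absent. */
const char *response_header(const struct response *r, const char *name)
{
    for (size_t i = 0; i < r->header_count; i++) {
        if (strcasecmp(r->headers[i].name, name) == 0)
            return r->headers[i].value;
    }
    return NULL;
}
```

The body stays an opaque run of bytes here; deciding whether it is HTML, CSS, or an image is left to whatever sits above.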

