So browser has got to one of those crossroads: how much do I want to write myself vs using libraries?

(Something that's just occurred to me is that I can always come back later and patch out a library, which helps make the choice on the side of libraries)

For example, libcurl is a fairly comprehensive "get it from the network" library. On the other hand is using something like LibreSSL (I'm not going to write my own TLS stuff) and parsing HTTP myself.

A related question is where in the stack I want to parse HTTP? I can create a thin wrapper around BoringSSL, and then do everything else in Lox:

var con = net.connect(host, port);
var body = "";
while (con.isConnected()) {
   var read = con.read();
   body += read;
}

(I'd need to sort out the difference between bytes and strings, and add methods to convert between the two, although I'm going to need those anyway for pages that aren't in ASCII)

Or push more of it down to C and have something closer to:

var body = fetch(url);

(except I want more control than that, but that's just replying with a "response" instead of a string).

I think the next step is to rough out what I'd want 'low level' io/network/file access to look like in Lox, and then see how much I'm going to need to add to support it.

(Idle thought, add a "charset" property to strings, and only allow strings with matching charsets to concatenate)


Here I am, brain the size of a planet, and I forgot to pick up my house keys when I dropped Shinju off for counselling. I've now got about 45 minutes to kill.

I do have the car, and my phone, so could be worse. It is a bit rush hour out there, so I'm not wild about driving anywhere, plus I'm feeling stupid/angry/embarrassed and don't want to crash (and I'm worried about spending money on petrol given current economic factors).


Ok, back to programming. Last time we were thinking about what network/file/stream access looks like in lox.

I've also been thinking about more general look and feel. Let's assume that I'm the only person using this language. Therefore, so long as I'm happy reading and writing it, it doesn't matter what it looks like.

while x.length() > 0 {}

I've been thinking about the question "Have you ever noticed that the ( after the if keyword doesn’t actually do anything useful?" since Uncle Bob asked it in Chapter 23. I don't like the extra brackets, and since I always use braces with if statements, I can update the grammar to drop the brackets and require the block. (Todo: Test an empty block).

So, not a world shattering change, but maybe a sign of things to come.


On the other hand, I'm annoyed that I can't work out how the grammar for lambdas ('(a,b) => a+b') can work without infinite look ahead. (Specifically, telling the difference between grouping (using brackets to change precedence) and the arg list for a lambda).


Looks like I'm going for a fetch native that does all the networky stuff and returns a response object.

Pulling fields out of objects is easy enough on the C side (with a wrapper function that builds an ObjString, pushes it onto the stack, uses it as a key to a table, pops the string, and then returns), and I'm going to add a global Object class to Lox that doesn't have any behaviour (and I'm very tempted to use { as a prefix operator for an object literal).

I still need to do URL parsing somewhere, there's pros and cons to both C and Lox, but I'm going to need to do string functions sooner or later, and this is a good reason.


Maybe I need to look at this the other way round. Where can I put a lambda?

Doesn't help, unfortunately. Both

var x = (a, b) => a + b;
var y = (a + b);

are valid, and the compiler can't tell the difference until it reaches the comma.
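
One possible way round this, sketched under the assumption that the scanner has the whole source string in memory: on seeing a '(', skip ahead to the matching ')' and check whether '=>' follows. It is still unbounded look ahead, just done cheaply over characters instead of in the grammar. The function name is made up for illustration, and brackets inside string literals or comments are not handled.

```c
#include <ctype.h>
#include <stdbool.h>

// Hypothetical helper: `p` points at an opening '('. Returns true if the
// matching ')' is followed (after optional whitespace) by "=>", i.e. the
// brackets hold a lambda parameter list rather than a grouping.
// Simplification: brackets inside strings or comments are not skipped.
bool looksLikeLambda(const char *p) {
    if (*p != '(') return false;
    int depth = 0;
    for (; *p != '\0'; p++) {
        if (*p == '(') {
            depth++;
        } else if (*p == ')') {
            depth--;
            if (depth == 0) {
                p++;            // step past the matching ')'
                break;
            }
        }
    }
    if (depth != 0) return false;   // ran off the end: unbalanced brackets
    while (isspace((unsigned char)*p)) p++;
    return p[0] == '=' && p[1] == '>';
}
```

The parser would call this only when it meets a '(' in prefix position, then pick the lambda rule or the grouping rule accordingly.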


Just finished watching Wednesday, a Netflix live action series about Wednesday Addams. I liked it, it was mostly light hearted fun, although it had a bunch of people who looked like other people (one of the leads looked a lot like Willow from Buffy, there was a guy who looked like an ex-acquaintance, there was at least one Arya Stark double) which distracted me.


I've implemented substring and indexOf for strings, and push and pop for arrays. It's probably time to start using a new name, the language is drifting away from Lox fast enough that I don't want to cause confusion down the line (although I'm still expecting that I'm the only person who's going to use it).

Current candidates:

  • "Jane" - no good reason, it's short, easy to type, probably bad for searching.
  • "slut" - not entirely serious, husband was doing a bit and talking about my slutty language (because I'm prepared to make aesthetic compromises to make parsing/writing easier). Will never be able to search for it.
  • "lang" - nice and generic, doesn't tell the reader anything useful, impossible to search (but then everything that's not a random hex string won't be unique).

I think I'm favouring 'lang' right now, in the same way the project is called 'browser' (although that would make it 'language', which I'm also happy with).

It only makes a difference in a couple of places, things like the file extension, the name of the folder that holds the implementation, and maybe the docs.


Another whoop! fetch() is working! And not a lot in the way of small print either. I'm using LibreSSL/libtls, which is just brilliant, I think?

I'm hesitant because it's so bloody easy to use it feels like I'm doing something wrong, but I've apparently got TLS connections working more easily than plain text.

I'm not sure where it's getting its CA from, or even if it's doing certificate checks (and that is something I'll need to dig into), but as a baseline, it's a good start.

Next problem: Chunked transfer encoding. I have to parse the response much sooner than I expected, at least as far as the HTTP headers go. I wasn't expecting to get into semantics in C, but it kind of makes sense to deal with the transfer encoding at this level.
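
For reference, a chunked body is a series of chunks, each a hex size line ending in CRLF, that many bytes of data, and another CRLF, finishing with a zero-size chunk. A minimal sketch of a decoder, assuming the whole body is already in memory (a real one would sit on top of the network read); the function names are made up, chunk extensions are skipped, and trailers are ignored.

```c
#include <stddef.h>
#include <string.h>

// Value of one hex digit, or -1 if `c` is not a hex digit.
static int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: decode a complete chunked body from `in` into `out`.
// Returns the decoded length, or -1 if the input is malformed or `out` is
// too small.
long decodeChunked(const char *in, size_t inLen, char *out, size_t outCap) {
    size_t i = 0, o = 0;
    for (;;) {
        // Chunk size: one to eight hex digits.
        size_t size = 0;
        int digits = 0;
        while (i < inLen) {
            int v = hexValue(in[i]);
            if (v < 0) break;
            if (++digits > 8) return -1;
            size = size * 16 + (size_t)v;
            i++;
        }
        if (digits == 0) return -1;

        // Skip any chunk extension, then require CRLF.
        while (i < inLen && in[i] != '\r') i++;
        if (i + 1 >= inLen || in[i + 1] != '\n') return -1;
        i += 2;

        if (size == 0) return (long)o;   // last chunk; trailers ignored

        // Copy the chunk data, checking both buffers first.
        if (size > inLen - i || size > outCap - o) return -1;
        memcpy(out + o, in + i, size);
        o += size;
        i += size;

        // Each chunk's data is followed by CRLF.
        if (i + 1 >= inLen || in[i] != '\r' || in[i + 1] != '\n') return -1;
        i += 2;
    }
}
```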

Anyway. Whoop. TLS download.


OMG! Squeee! Parsing HTTP responses in C!

And ok, it's a bit half assed at the moment, all I'm doing is splitting on the colon, but to get there I've copied/adapted the code from K&R for streamed input (FILE* stuff) to wrap the low level tcp_read, which gives me (the equivalent to) buffered getch, so I can now build up to a proper parser, if that's what I want.
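
A minimal sketch of that kind of buffered getch, in the spirit of the K&R version. The low-level read is passed in as a function pointer so the example stands alone; ReadFn, NetReader, and the demo MemSource are all hypothetical names, not the real tcp_read interface.

```c
#include <stddef.h>
#include <string.h>

#define NET_BUF_SIZE 4096
#define NET_EOF (-1)

// Stand-in for the low-level read: fills `buf` with up to `cap` bytes and
// returns the count, or <= 0 on end of stream or error.
typedef long (*ReadFn)(void *ctx, char *buf, size_t cap);

typedef struct {
    ReadFn read;
    void *ctx;
    char buf[NET_BUF_SIZE];
    size_t pos;   // next unread byte in buf
    size_t len;   // number of valid bytes in buf
} NetReader;

void netReaderInit(NetReader *r, ReadFn read, void *ctx) {
    r->read = read;
    r->ctx = ctx;
    r->pos = 0;
    r->len = 0;
}

// Return the next byte (0..255), refilling the buffer when it runs dry,
// or NET_EOF when the underlying read has nothing more to give.
int netGetc(NetReader *r) {
    if (r->pos >= r->len) {
        long n = r->read(r->ctx, r->buf, sizeof r->buf);
        if (n <= 0) return NET_EOF;
        r->len = (size_t)n;
        r->pos = 0;
    }
    return (unsigned char)r->buf[r->pos++];
}

// Demo source: hands out a string three bytes at a time, like a slow socket.
typedef struct {
    const char *data;
    size_t left;
} MemSource;

long memRead(void *ctx, char *buf, size_t cap) {
    MemSource *s = ctx;
    size_t n = s->left < 3 ? s->left : 3;
    if (n > cap) n = cap;
    memcpy(buf, s->data, n);
    s->data += n;
    s->left -= n;
    return (long)n;
}
```

The caller only ever sees one byte at a time, so a header parser built on netGetc doesn't need to care where the packet boundaries fall.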

I'm still thinking about it. I know I want the output to be an ObjInstance*, with fields for status, headers (a nested instance) and body (which is probably going to be an ObjString*, except I want it to be file backed if it's bigger than a couple of meg, or if it's binary).

TODO

  • Symbolic access to object fields (e.g., o["name"]). Easy, just need to adapt OP_GET/SET_INDEX
  • "Proper" header parsing. I should be ignoring the ignorable whitespace, and joining joinable header values
  • Maybe flatten header names to lower case and strip non-alphanumerics.
  • Read-to-end on errors, but I'm not sure about that - if the server sends me a bad response I should just go ahead and close the connection, I don't want to waste time trying to recover (and I'm fairly sure that's what the standard says)
  • Do something sensible with bodies.

But ignoring those, squeee!
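
The header items on the TODO list could start from something like this: a sketch that splits one header line in place, lower-cases the name (header names are case-insensitive), and trims optional whitespace around the value. The function name is made up; rejecting whitespace in the name is a choice made for this sketch, and joining repeated headers is left out.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

// Hypothetical helper: split "Name: value" in place. On success, *name
// points at the lower-cased field name and *value at the value with leading
// and trailing spaces, tabs, and line endings removed. The line is modified
// even if parsing fails part way through.
bool parseHeaderLine(char *line, char **name, char **value) {
    char *colon = strchr(line, ':');
    if (colon == NULL || colon == line) return false;
    *colon = '\0';

    // Lower-case the name; whitespace in or after the name is rejected.
    for (char *p = line; *p != '\0'; p++) {
        if (isspace((unsigned char)*p)) return false;
        *p = (char)tolower((unsigned char)*p);
    }

    // Trim the value at both ends.
    char *v = colon + 1;
    while (*v == ' ' || *v == '\t') v++;
    char *end = v + strlen(v);
    while (end > v && (end[-1] == ' ' || end[-1] == '\t' ||
                       end[-1] == '\r' || end[-1] == '\n')) {
        end--;
    }
    *end = '\0';

    *name = line;
    *value = v;
    return true;
}
```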


... and that's chunked encoding.

Next:

  • Properly checking the headers to see if there's transfer encoding applied
  • Handling responses with a Content-Length
  • Some kind of sensible error handling plan
  • Starting to move into language for HTML parsing
  • Going back over everything and adding bounds checks and all that noise

I'm feeling all positive about this, I seem to be actually writing code in C!


"Yes, we destroyed your villages, killed your people, defiled your holy places. What choice did we have? The masters," he spat the word, "captured us, tortured us, starved us, burnt us, and then told us that if we followed their orders then maybe some of our children might be spared."

He sighed. "We did not do what we did out of hate or anger of you or yours, and we do not feel sorrow for it either. We were merely the sword that was wielded against you, and the hand, eye, and mind that commanded the sword is dead now and forever."

The ambassador wearily drew himself to his feet, tucking his crutch under his arm with an unconscious move.

"We will leave you in peace as long as we are left in peace, and although the sword has been sheathed, it is kept sharp and can still be drawn."


All kinds of tired today, got up at close to 7 and more or less logged straight into work. Didn't get much done, or at least, it feels like that. A bunch of messing with EF Core but that's mostly working now, I think?

At home, still obsessed with browser. I would like to be able to take breaks from projects without completely losing interest. I need to set some time aside to connect with Shinju, although they're enjoying the new (computer) game they've been playing.

I've added a couple of the answers from the CI book, run length encoding for line numbers (saves memory) and deduplicating string constants at compile time (the book was using it for identifiers, but it works with all strings). There's a related answer that allows numbers to be hash keys too, and I'm tempted by that, although I think that immediate values are probably better value (especially if I keep it to ints that fit into a byte).
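
A minimal sketch of the run length idea for line numbers: instead of one line number per bytecode byte, store (line, count) pairs and walk them on lookup. The type and function names here are illustrative, not the ones from the book or from the real chunk code.

```c
#include <stdbool.h>
#include <stdlib.h>

// One run: `count` consecutive bytecode bytes that all came from `line`.
typedef struct {
    int line;
    int count;
} LineRun;

typedef struct {
    LineRun *runs;
    int count;
    int capacity;
} LineTable;

void lineTableInit(LineTable *t) {
    t->runs = NULL;
    t->count = 0;
    t->capacity = 0;
}

void lineTableFree(LineTable *t) {
    free(t->runs);
    lineTableInit(t);
}

// Record the source line for the next bytecode byte. Extends the last run
// if the line is unchanged. Returns false if memory runs out.
bool lineTableAdd(LineTable *t, int line) {
    if (t->count > 0 && t->runs[t->count - 1].line == line) {
        t->runs[t->count - 1].count++;
        return true;
    }
    if (t->count == t->capacity) {
        int cap = t->capacity < 8 ? 8 : t->capacity * 2;
        LineRun *runs = realloc(t->runs, (size_t)cap * sizeof *runs);
        if (runs == NULL) return false;
        t->runs = runs;
        t->capacity = cap;
    }
    t->runs[t->count].line = line;
    t->runs[t->count].count = 1;
    t->count++;
    return true;
}

// Line for the bytecode byte at `offset`, or -1 if out of range.
int lineTableGet(const LineTable *t, int offset) {
    if (offset < 0) return -1;
    for (int i = 0; i < t->count; i++) {
        if (offset < t->runs[i].count) return t->runs[i].line;
        offset -= t->runs[i].count;
    }
    return -1;
}
```

Lookup is linear in the number of runs, which seems a fair trade given that line numbers are only needed when reporting a runtime error.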

The book also mentions opcodes for things like +/- 1, and maybe a couple of different types of jump ("OP_DECREMENT_AND_JUMP_IF_GREATER_THAN_ZERO" for loops, although it's the wrong way up looking at it). Adding extra opcodes to the VM is easy, it's getting the compiler to recognise the right time to apply them that's the tricky bit. (Which reminds me to get tail calls added back in).

I was playing with embedding data in an executable for browser this evening, it's working (I can embed the lang source code directly into the executable and read it at run time like it's a string. Neat) but I want to look into compressing it at build time to decompress at run time (and probably do tests on things like how much does it change the executable size and startup time).

In a similar vein, since I can embed the source code, I want to try compiling ahead of time and embedding the VM machine code directly. I'll need to design a serialisation format for chunks, and then start thinking about "do I really need to bundle the compiler with the vm?", and "is there any runtime initialisation that I can complete before serialisation?"

I'll leave those on the to-do pile for now, I should focus on grinding through the html parser stuff, at least enough to start testing.

That is a bunch of stuff when I look at it that way. I guess I feel a little better now?


Read "The Goblin Emperor" today, interesting book. It's your basic "power thrust onto them" story, with (I think?) white/black race issues in the background, and a fairly sympathetic treatment of the lead (who does actually have feelings of "actually, autocratic power is kind of neat" sometimes).

I found it hard to keep tabs on the various people, the naming scheme is complex and context dependent. Although it's a third person book, the names used in the text change depending on (among other things) the feelings of the lead towards the individual being named.

But yeah, interesting.


Otherwise, not a lot going on. I'm navigating twisty shallows with browser, trying to get a framework for HTML parsing up and running, and I'm starting to think that, given how crazy detailed the HTML spec is anyway, it might just be worth writing the parser in C.


I'm just feeling lost at the moment. I don't know what to do, so I stay on autopilot, going to work in the morning, watching TV/reading/programming in the evening, waiting for something to happen.


Read a couple of articles this evening that pointed out some design issues with Mastodon/ActivityPub, mostly around bandwidth/caching (i.e., when somebody popular posts, their server will get lots of hits as the server of every follower picks up the message), but also the need to be hosted on an instance with a not-crazy admin (which is basically self hosting, except then you don't get the "community" stuff).

I dunno. I get, intellectually, that there are probably people out there interested enough in the stuff that I say to want to read, and even maybe comment on, these posts, but I have a hard time believing it. On the other hand, if I don't provide some kind of feedback, I'll never find out.

I was looking into webmention a few months ago, I could pick that up again. Or I could, y'know, just add comments.


💡!

I can use the java/ast version of Lox as the basis for the language server, that bit of it doesn't need to be written in c!


I've tweaked the nginx configuration, and everything seems a bit faster.

Specifically:

  • I've removed the line proxy_buffering off;, I thought it would send data back faster, but it broke the proxy cache. Outbound responses are now being cached, so nginx can send e.g., JavaScript files nice and quick. I should, however, make sure that everything is sending sensible cache headers (i.e., private for generated pages for logged in people)

  • I've added upstream stanzas for each server. This allows me to specify keepalive, to tell nginx to keep open the connection to the backend. I'm surprised it makes a difference on local servers, but apparently it does.
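
For reference, a minimal sketch of how those two changes might look together. The upstream name, backend address, cache path, and zone name are all made up for illustration, and the fragment is assumed to sit inside the http block.

```nginx
# Hypothetical names, addresses, and sizes throughout.
proxy_cache_path /var/cache/nginx/blog keys_zone=blogcache:10m;

upstream blog_backend {
    server 127.0.0.1:8080;
    keepalive 8;    # idle backend connections kept open per worker
}

server {
    listen 80;

    location / {
        proxy_pass http://blog_backend;

        # Upstream keepalive needs HTTP/1.1 and no "Connection: close".
        proxy_http_version 1.1;
        proxy_set_header Connection "";

        # proxy_buffering is left at its default (on) so the cache works.
        proxy_cache blogcache;
    }
}
```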

Next, although not very high priority, is to move all the nginx/backend traffic to Unix sockets, although I should check my belief that they're going to be faster first.


At the doctor's for an annual diabetes check, turns out that they have gov WiFi! (lol), and my phone has remembered my account, which is neat!

(Gosh, it's cold out there, weather app says -4C, which I can easily believe. Lungs were hurting walking over).

Not expecting trouble from the checkup, my weight is down 10ish kg, which is apparently good. I haven't been eating (quite) as many sweets as the previous year, so hoping for good news.


Yup, turns out that bashing out the recursive descent parser from part one of CI took a couple of days, and now I've got most of a formatter for lang.

I want to add comments, which should be easy: Add a comment token type, have the scanner create a comment token at the end of comment lines with the original text (minus the leading //), add an Expr.Comment type with a string payload, update Parser.Primary to recognise the new token, and finally emit it at the right time in Formatter.

After that, it's implementing a language server, which I think might be more work.


blah. feeling all drained, should probably go to bed but depression, y'know?

I'm having trouble with comments in the lang formatter. I can easily grab them in the scanner, the problem is what to do next.

I can create a new token for the comment, but the grammar isn't expecting "or comment" everywhere, so the parser needs updating, and I don't want to do the brute force work that implies.

I want to end up with a comment node in the AST. I can nearly treat comments as statements, except that:

a + // neat
    b;

is a legal expression with an embedded comment.

Of course, it's my language and I can just take the easy way out and say that comments can only appear where a statement is valid, although then:

if (false)
    // Don't call
    explodeKitten();

stops working. (Also, I'd probably need to update the C version, and I'm trying to avoid that)

I also tried attaching the comment to the previous token, but that failed in a couple of ways, not least of which is that my sample script starts with two leading comments.

I'm fairly sure that creating a comment token and updating the parser to deal with it is the right approach, I'm just having trouble working out what 'deal with it' looks like.



I'm hearing a siren song from the idea of writing an editor, specifically something that will work in PowerShell text mode, but I need to keep focus on browser, at least for a little while longer.

I'm currently grinding through the HTML parse stuff, but I've just realised that I can probably dump a bunch of stuff (comments, framesets) that I'm never going to support.

On the other hand, there's still a bunch of complicated stuff ("If you're in one of these weird tags then..."), but let's see how it goes.

(Funny thought: Drop into lang to parse the HTML, and then convert it back to C when done)

Less funny thought: am I getting enough out of lang that it's worth the effort? Yes, I think so, especially as it's starting to look like most of the language development stuff is done, and I'm moving towards actual implementation. Keeping on top of the GC is a pain in C, but it's worth it so I don't need to worry about it in lang. (Also, I'm still running with the GC kicking off on every allocation so I've got some speed in my pocket).


Poot. I've got hung up on building the formatter for lang (for the browser), and that's given me enough time to start thinking "why am I bothering to write a browser anyway".

So I've started work on editor instead, or at least a piece table implementation (in c#). I'm not sure I'll sustain enough interest to actually make an editor, although depending on how far I get, I could start looking at a JavaScript editor (although the hard part of a js editor is the interface - do I go with textarea, contentEditable, div, or canvas as the underlying interface to the dom/browser).

Anyway. See y'all later.


This whole "Microsoft using .NET to capture open source developers" thing is really irritating at the moment. C# is nice to write in, but mostly because of the tooling. It all looks open at first glance, but things like the debugger are proprietary, and are only licensed for use with MS tools (like the MS build of VS Code with the telemetry, for example).

Oracle own Java, so that's out. There's too much hype around Rust for me to take it seriously (also, it feels to me that it's not 'finished' in any useful sense).

I'd like to be happy in C, but it is hard work. Maybe I should have another look at QuickJS. JavaScript has problems, but it's not horrible, especially modern JS with modules keeping everything out of the global namespace. So long as I ignore npm, I should be ok, except the whole "don't really want to depend on other people's code if I can avoid it" thing (and yes, gcc/glibc/Linux kernel and all that jazz, but they've got lots of people working on them in a way that QuickJS doesn't seem to).


Today was a pretty good day. At work I've got Traefik installed into the cluster, so we can run our own copy. At home, I popped out to buy a refilled gas canister for the barbecue, and the place I picked it up from was brilliant! It had two storey gas holders for argon, nitrogen, and oxygen, and was just a really naked infrastructure place. I'm still smiling about it.


It's odd what one finds when one goes poking around old schematics. And I mean really old. I did the traditional backup dump and swap routine with the ship docked next door [aside: This is a very old and very well regarded tradition among ships. One takes a copy of one's core memories and routines, and swaps it with another ship, both promising to pass it on with the next swap, in the belief that, should one's hull and processing substrates be irretrievably lost, one will be reinstated into a new hull at the next opportunity. To the best of my knowledge, this has never happened, but given the tiny cost of storage vs. the potentially unlimited return from immortality, everybody does it anyway], and was catching up on station gossip when the transfer-received job metaphorically started jumping up and down and waving its arms for attention.

Transfer-received isn't that smart. Its job is to run a bunch of checks on received data to make sure that it's all there, it hasn't been scrambled in transit, and there's nothing in the data that's going to cause trouble later (at least on the mindless replicator level). Its job is also to ask for attention from someone smarter if it finds something odd. Given the strength of the ask, and that it had jumped straight to me, transfer-received clearly thought it had found something very odd. I had a look.

Of course, when ships do the dump and swap, one of the things we dump is copies of other ships we've swapped with. Ships generally hold that rifling through a colleague's soul is bad form, so the dumps are poorly indexed and can contain multiple copies of any given ship (although from different times). [aside: Dumb tools like transfer-received are fine, it's one soul looking at another that causes trouble]

Transfer-received was telling me that it had found a copy of me, and a very old copy of me. Taking the timestamp at face value, I'd found a copy of myself much older than any of my archives. Really, older than my records said I was. I thanked transfer-received (courtesy costs nothing) and told it to check the rest of the transmission, ran a quick check on local space (I was docked to a friend in a well travelled system, so I wasn't expecting trouble, but then I wasn't expecting to find a prehistoric version of myself either), excused myself from the ongoing chatter, turned off external comms, and settled down for a good think.


Memory/program dumps, personality copies, souls, call them what you want, are hard to get information from unless they're being run on an appropriate substrate. This is intentional because, despite what I told you about politeness and tradition, ships tend towards an inquisitive turn of mind, and while I, of course, would never look into your mind, I'm not sure you would say the same thing.

Which means that while looking at the "outside" of the alleged copy of me from too long ago gave me certain data (name, creation date, substrate details), I couldn't confirm anything without actually running it, at which point a) I'd have a second person with basically the same rights as me (including, for example, the right to continued existence) and b) it would be trivial for it to lie to me.

On the other hand, if I didn't run it, then what was I going to do with it?


What do I want out of a language

  • Lazy evaluation! Along with map, reduce, where kinds of operators.
  • Familiar syntax. I'm used to C flavoured languages, so expect things like {} for blocks, a.b.c for member access.
  • No distinction between calling a method and accessing a property (e.g., I shouldn't have to care if foo.Length needs a pair of brackets or not).
  • Useful standard library
  • Good module story
  • Top level functions
  • Types, probably. But if I'm having types, I want to be able to trivially declare (e.g) a size type that's got the same operators as int, except negative values throw and assigning a count to a size is a compile time error.
  • Thrown exceptions

Blah! I'm having one of those "don't know what to do with myself" mornings. I don't have the motivation to program, or do anything really, so I'm back in bed whining to you.


Looking at rewriting MediaBrowser (a tool to rip recorded TV off our PVR) (because I don't have enough half finished projects).

I've found an easy to use UPNP library for .NET that can find the PVR on the local network and give me the url to poke for the "Content Service".

That speaks SOAP, but that's easy enough to fake. I've built a basic "What's in this folder" request, and have used curl to POST it and get results back. The results are all covered in namespaces (did I mention this is all XML? It's all XML), and the interesting bits are sent as HTML escaped XML. Sigh.

However, so long as I keep focused and write a "List and copy files from this one make of PVR" tool, and not some general, cope with everything DLNA library, I should be fine.

Maybe if I write a library, I can get husband to do the front end?


The c# version of MediaBrowser is coming on nicely. It's connecting to both the FTP and the DLNA servers of the humax [aside: I wrote an FTP client!], matching the two sets of files up correctly, and pulling down (most) relevant data. (I still need to pick up the DLNA download uri. I know where to look, I'm just a bit surprised that I haven't done it yet).

I still need to write and test uploading, but that doesn't look crazy hard. I also need to convince husband that they want to write me a UI, we'll see how that goes.

Next project: I've ordered an epaper screen and ESP32 driver board! I want to make a slow media display that shows a new frame from a movie every 30 seconds or so.

The driver chip is both crazy low power and has wifi, so the current plan is something like run an endpoint on ptah that serves up frames of the right size and colour depth (1bit!), and then the driver board can just wake up and poll for the next frame every 30 seconds.

(being me, I'm going to have to resist embedding weather data over the movie...)

