(What is this? Check the lab notebook's table of contents)
Entry 003: What's immediately next?
Created 2024-02-16
Published 2024-02-17
Brainstorming
I've given myself a few days of shower thoughts, marinating my brain:
- I kept the latest prototype open and just refreshed the page every once in a while to replay what I have so far.
- I listened to various episodes of the Future of Coding podcast from the "Hosted by Jimmy Miller and Ivan Reese" era (totally scattershot, not at all chronological).
- I thought about skimming The Whole Code Catalog, but never got around to it. I will. I'm just in a "knowing how to read sheet music will only hold me back" phase of the project.
Desired interactions
"Model" is a better word than "program" for what I want to use a PPL to create. I can imagine embedding a PPL into another application, sure, and I keep thinking of ways I might use a PPL in the interface itself, but primarily, if I'm using a PPL, it's because I have guess at the underlying process that generates some real-world data, and I want to test how well it actually matches that data. A few examples:
- I have a mental model of how my heart rate varies during a run. I think of it as a lagging function of how hard I'm working, which is a function of how steep of a hill I'm climbing (thank you, San Francisco) and how long I've been running. If that's true, I can estimate parameters like lag time, recovery time, etc. by importing some workout data.
- I have a mental model of what causes me to consider some foods healthy or not. I think it's a function of serving size, total calories, what I've eaten recently, and the time of day. I'm aware of statistical analysis tools that could give me some answers with just a table of all that data, but I'd have less control over the causal model being assumed during the inference. If there's room for analogues of the "laziness" term in ProbMod's tug-of-war example, for example, I want to control where they go.
- I have a mental model for how to play Wordle. I guess words using mostly non-overlapping sets of letters for a few turns, then use only the hinted letters until I figure out the correct locations. I'm curious to compare that with my sister's strategy of jumping straight to my second stage. She wins more often, but is it due to the strategy or just because she's played way more Wordle?
And so on.
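To make the first of these concrete: the heart-rate idea can be sketched as a tiny generative simulation. This is only a sketch under made-up assumptions — a first-order exponential lag toward a target heart rate, with placeholder parameters (`grade_gain`, `fatigue_gain`, `lag_s`) that are exactly the things I'd want to estimate from real workout data.

```python
import math

def simulate_heart_rate(grades, dt=1.0, hr_rest=60.0, hr_ceiling=185.0,
                        grade_gain=400.0, fatigue_gain=0.02, lag_s=45.0):
    """Simulate heart rate as a lagged response to instantaneous effort.

    grades: hill grade (rise/run) at each timestep.
    All parameter values here are made-up placeholders; the point of the
    model is to fit them against imported workout data.
    """
    hr = hr_rest
    trace = []
    for t, grade in enumerate(grades):
        # Effort rises with hill steepness and with time spent running.
        effort = max(0.0, grade) * grade_gain + fatigue_gain * t * dt
        target = min(hr_ceiling, hr_rest + 60.0 + effort)
        # First-order (exponential) lag toward the target heart rate.
        alpha = 1.0 - math.exp(-dt / lag_s)
        hr += alpha * (target - hr)
        trace.append(hr)
    return trace
```

Running it over a flat stretch followed by a climb produces the delayed ramp-up I have in mind; fitting `lag_s` on the way up versus the way down would separate lag time from recovery time.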
Key aspects of these ideas:
- The models are tiny. In textual code, they can usually be expressed in a few lines because they come from causal diagrams with only a few edges. I don't need "large code base" features.
- Speaking of edges, to understand the models, I need to be able to rapidly move edges and values around to run quick experiments in all parts of the code. In the tug-of-war example, I've played with the shapes of the priors, the laziness probability, the participants in the queried match, the match history, etc. It's a shame that in the WebPPL editor, the history of those experimental changes is confined to the undo stack, and that the results aren't more readily comparable without saving them outside the system.
- The models require access to data. The examples above described only tabular data, but I also think about data like images and paths (a mouse cursor trace, GPS tracks) that aren't done justice by a tabular representation. Can I take inspiration from the recently-released Observable Framework, which supports "data snapshots generated by loaders in any programming language"?
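For reference, the tug-of-war model I keep poking at can be sketched outside a PPL as naive rejection sampling. The priors follow the ProbMods example (Gaussian strength, laziness with probability 1/3 halving pulling power); the `matches` encoding, the helper names, and the use of brute-force rejection are all illustrative stand-ins, not how a real PPL would run inference.

```python
import random
from statistics import mean

def tug_of_war_posterior(matches, query_person, n_samples=20000, p_lazy=1/3):
    """Rejection-sampling sketch of the ProbMods tug-of-war model.

    matches: list of (team_a, team_b, team_a_won) observations.
    Returns the posterior mean strength of query_person given the matches.
    """
    accepted = []
    for _ in range(n_samples):
        strength = {}  # memoization: one strength draw per person per world

        def get_strength(person):
            if person not in strength:
                strength[person] = random.gauss(0.0, 1.0)
            return strength[person]

        def pulling(person):
            # Laziness is re-flipped every match and halves pulling power.
            s = get_strength(person)
            return s / 2.0 if random.random() < p_lazy else s

        def team_a_wins(team_a, team_b):
            return sum(pulling(p) for p in team_a) > sum(pulling(p) for p in team_b)

        # Condition: keep only worlds consistent with every observed match.
        if all(team_a_wins(a, b) == won for a, b, won in matches):
            accepted.append(get_strength(query_person))
    return mean(accepted)
```

Every experiment I listed — reshaping the prior, changing `p_lazy`, editing the match history — is a one-line change here, which is exactly the kind of edit I want the interface to track and compare rather than bury in an undo stack.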
Priorities
So model editing and data appear to be near the top of my list. But I think it's significant that none of the models I mentioned above can be expressed in any form using the language constructs I have. So I think I actually want to first make a beeline for the nearest interesting model to implement and then start on editing features.
I keep coming back to tug-of-war, but it requires functions, function application, memoization, maps, sums, conditionals, comparison operators, factor… 🥵
But surely there's something interesting between coin-flipping and that. The more I think about estimating the likelihood of a coin landing heads, the more unpleasant a feeling I get in the pit of my stomach, so I'd really like to avoid that "problem". Just going through some of the examples in ProbMods: medical diagnosis is fraught, physics simulation takes engineering, fair exam vs. does homework… There's an interesting one.
Inferring whether a class's grades are explained by an exam being unfair or by the students not studying isn't bad, and I can play with the idea of trait attribution to find nearby problems if I get tired of that one. The simplest versions require only conditionals, though the models can be expanded to take advantage of functions, and at an extreme can infer more and more of the assumed likelihoods from data.
I like this problem. I'll start there. Conditionals, then memoized functions.
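As a sanity check on where I'm headed, the conditionals-only version of the exam model can be sketched as rejection sampling. The structure is in the spirit of the ProbMods example — exam fairness and per-student homework as coin flips, pass rate chosen by plain conditionals — but the specific probabilities below are placeholders I made up, not the ProbMods values.

```python
import random

def infer_exam_fair(failing_students, n_samples=50000,
                    p_fair=0.8, p_homework=0.8):
    """Posterior probability that an exam was fair, given that every
    student in failing_students failed it. Rejection-sampling sketch;
    all probabilities are illustrative placeholders."""

    def p_pass(fair, homework):
        # Conditionals only: pass rate depends on fairness and homework.
        if fair and homework:
            return 0.9
        if fair and not homework:
            return 0.4
        if not fair and homework:
            return 0.6
        return 0.2

    fair_count = total = 0
    for _ in range(n_samples):
        exam_fair = random.random() < p_fair
        # Memoized per world: one homework flip per student.
        homework = {s: random.random() < p_homework for s in failing_students}
        all_failed = all(random.random() >= p_pass(exam_fair, homework[s])
                         for s in failing_students)
        if all_failed:  # condition on the observation
            total += 1
            fair_count += exam_fair
    return fair_count / total
```

The behavior I want to reproduce in my own language: as more students fail the same exam, the posterior shifts blame from the students toward the exam. The homework dictionary is the "memoized functions" step; the `p_pass` ladder is the "conditionals" step I'm starting with.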