Having Fun: Python and Elasticsearch, Part 3

Welcome back to having fun with Elasticsearch and Python. In the first part of this series, we learned the basics of setting up and running Elasticsearch, and wrote just enough code to do basic indexing and searching of Gmail metadata. In the second part, we extended the indexing and querying to cover the full text of the emails as well.

That theoretically got us most of what we wanted, but there’s still work to be done. Even for a toy, this isn’t doing quite what I want yet, so let’s see what we can do with another thirty minutes of work.

Improving the Query Tool

There are two big sticking points. First, right now, we’re just passing the raw search queries to Elasticsearch and relying on Lucene’s search syntax to take care of things. That’s kind of okay, but it means we can’t easily do something I really care about, which is saying that labels must match, while everything else can be best-effort. Second, we’re not printing out all the data I want; while we did a basic extension of the query tool last time, that data is still kind of disgusting and annoying to read through.

Fixing the first part isn’t too bad. Remember earlier how I said that Elasticsearch provides a structured query language you can use? Let’s use it to solve our problem.

The structured query language is really just a JSON document describing what you’re looking for. The JSON document has a single top-level key, query, which then has sub-elements describing exactly what we’re trying to query and how. For example, if you wanted to look at all the documents in a given index, that’s just

{
  "query": {
    "match_all": {}
  }
}

This, of course, is a bit silly; you usually want to look for something. For example, to explicitly match all subjects that contain go, we could do something like

{
  "query": {
    "match": {
      "subject": "go"
    }
  }
}

match is a simple query operator that does analyzed, rather than strictly exact, matching; depending on how the field is analyzed, “go” can also match “going”, “goes”, and the like. Using the query DSL from Python is really simple. Give it a shot in your Python prompt by passing it as the body parameter to es.search(). E.g.,

es.search('mail', 'message', body={
    'query': {
        'match': {
            'subject': 'go',
        }
    }
})

Of course, what we really want here is to leverage the full power of the DSL search syntax: the ability to combine queries in specific ways. For example, I mentioned earlier that I wanted labels to be required, while everything else is requested but optional.

Thankfully, Elasticsearch provides the bool query operator to allow just this:

{
  "query": {
    "bool": {
      "must": [{"match": {"labels": "camlistore"}}],
      "should": [{"match": {"subject": "go"}}]
    }
  }
}

bool takes a dictionary containing at least one of must, should, and must_not, each of which takes a list of matches or other further search operators. In this case, we only care about must versus should: while our labels must match, the text in general should match if it can, but it’s okay if we don’t have an exact correspondence.
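
To see all three clauses side by side, here’s a quick sketch: labels and subject are the fields we’ve been working with, and the “newsletters” label is purely illustrative.

{
  "query": {
    "bool": {
      "must": [{"match": {"labels": "camlistore"}}],
      "should": [{"match": {"subject": "go"}}],
      "must_not": [{"match": {"labels": "newsletters"}}]
    }
  }
}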

So let’s put everything together here.

First, we unfortunately have to replicate the Gmail-style query axes that Lucene gave us for free. Doing this properly would require writing a legitimate (if tiny) parser. While that could be done pretty easily, it’s a bit out of scope for this series, so we’ll cheat: since we know our keys can’t contain colons, we’ll say that everything must be in the rigid format header:value. If you want to specify that multiple things can match, you can simply repeat a given header. If a given token has no : in it, we’ll assume it’s part of the full-body search. That leaves us with code that looks something like this:

import io
from collections import defaultdict
ft = io.StringIO()
kw = defaultdict(str)
for token in query.split():
    idx = token.find(':')
    if 0 <= idx < len(token):
        key, value = token.split(':', 1)
        kw[key] += ' ' + value
    else:
        ft.write(' ' + token)

That will let us search to, from, labels, and so on as required matches, while still keeping track of anything that isn’t a key:value pair so we can use it for a fuzzy body search.

We also introduced a new class from Python’s standard library, defaultdict. defaultdict is one of those tools that, once you learn about it, you can’t put down. defaultdict takes a factory function that returns the default value to use when you access a key that doesn’t exist. Since str() returns an empty string (''), we can avoid a bunch of checks for existing keys, and instead simply concatenate directly onto the default value.
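
If you haven’t run into defaultdict before, here’s a quick illustration (the key is just made up for the example):

from collections import defaultdict

kw = defaultdict(str)            # missing keys default to str(), i.e. ''
kw['labels'] += ' camlistore'    # no KeyError: '' + ' camlistore'
kw['labels'] += ' go'
print kw['labels']               # prints ' camlistore go'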

Next, we just need to take the kw (keyword) dict we built above and put that into the must field, and then take the ft (full-text) string we built up and use that for the body match. This is really straightforward to do in Python:

q = {
    'query': {
        'bool': {
            'must': [{'match': {k: v}} for k, v in kw.viewitems()]
        }
    }
}

ft = ft.getvalue()
if ft:
    q['query']['bool']['should'] = [{'match': {'contents': ft}}]

That’s it; we’ve got our must and our should, all combined into an Elasticsearch DSL search query.
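
To make that concrete, suppose the query string were labels:camlistore deployment notes (an invented example). The loop above would collect {'labels': ' camlistore'} in kw and ' deployment notes' in ft, so q would come out looking like:

{
    'query': {
        'bool': {
            'must': [{'match': {'labels': ' camlistore'}}],
            'should': [{'match': {'contents': ' deployment notes'}}]
        }
    }
}

(The stray leading spaces are harmless; Elasticsearch’s analyzer throws them away when it tokenizes the text.)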

There’s only one piece missing at this point: enhancing the query output itself. While we took a passing stab at this last time, we really want to do a bit better. In particular, since we’re indexing the entire message, it’d sure be nice if we showed at least part of the message body.

Doing this properly gets complicated, but we can make use of two easy tricks in Python to get something that works well enough for most of our use cases. First, we can use the re module to replace all instances of annoying whitespace characters (tabs, newlines, and carriage returns) with single spaces; the regex for that is simply [\r\n\t], so that’s easy enough. Second, Python lets us trivially truncate a string with slice syntax, so s[:80] returns up to the first 80 characters of s.
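
As a quick illustration of those two tricks together (the sample string is made up):

import re

s = 'Kickoff is at 10:30pm.\r\n\tPass-out is at <early Saturday morning>.'
print re.sub(r'[\r\n\t]', ' ', s)[:80]
# each \r, \n, and \t becomes a space, and the result is clipped to at
# most 80 characters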

Finally, we want to pull in one more special function from the Python standard library, textwrap.dedent. textwrap is a module that contains all sorts of useful utility functions for wrapping text. A shocker, I know. dedent is a handy little function that strips whatever leading whitespace is common to every line of a string. This is incredibly useful when writing strings inline in a Python file, because you can keep the string properly indented with the rest of the code, but have it print to the screen flush at the left margin. We can use this to make writing our template string a lot cleaner than last time.
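
Here’s roughly what that buys us, using a couple of headers from the email we looked at last time:

import textwrap

print textwrap.dedent('''\
    Subject: Celebrating 21 years
    From: No Longer a Student
    ''')
# both lines print flush against the left margin, even though the string
# is indented to line up with the surrounding code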

Putting this all together, our display code would look like this:

es = elasticsearch.Elasticsearch()
matches = es.search('mail', 'message', body=q)
hits = matches['hits']['hits']
if not hits:
    click.echo('No matches found')
else:
    if raw_result:
        click.echo(json.dumps(matches, indent=4))
    for hit in hits:
        click.echo(textwrap.dedent('''\
            Subject: {}
            From: {}
            To: {}
            Content: {}...
            Path: {}
            '''.format(
            hit['_source']['subject'],
            hit['_source']['from'],
            hit['_source']['to'],
            re.sub(r'[\r\n\t]', ' ', hit['_source']['contents'])[:80],
            hit['_source']['path']
        )))

The only tricky part is that we’ve combined the regex substitution and the truncation together. Otherwise, this is a very straightforward modification of what we already had.

That’s it. Here’s the full version, including all the imports and the initial #! line, in case you don’t want to perform all of the edits by hand:

#!/usr/bin/env python

import io
import json
import re
import textwrap
from collections import defaultdict

import click
import elasticsearch


@click.command()
@click.argument('query', required=True)
@click.option('--raw-result/--no-raw-result', default=False)
def search(query, raw_result):
    ft = io.StringIO()
    kw = defaultdict(str)
    for token in query.split():
        idx = token.find(':')
        if 0 <= idx < len(token):
            key, value = token.split(':', 1)
            kw[key] += ' ' + value
        else:
            ft.write(' ' + token)

    q = {
        'query': {
            'bool': {
                'must': [{'match': {k: v}} for k, v in kw.viewitems()]
            }
        }
    }

    ft = ft.getvalue()
    if ft:
        q['query']['bool']['should'] = [{'match': {'contents': ft}}]

    es = elasticsearch.Elasticsearch()
    matches = es.search('mail', 'message', body=q)
    hits = matches['hits']['hits']
    if not hits:
        click.echo('No matches found')
    else:
        if raw_result:
            click.echo(json.dumps(matches, indent=4))
        for hit in hits:
            click.echo(textwrap.dedent('''\
                Subject: {}
                From: {}
                To: {}
                Content: {}...
                Path: {}
                '''.format(
                hit['_source']['subject'],
                hit['_source']['from'],
                hit['_source']['to'],
                re.sub(r'[\r\n\t]', ' ', hit['_source']['contents'])[:80],
                hit['_source']['path']
            )))

if __name__ == '__main__':
    search()

That’s it. We now have a full command-line search utility that can look through all of our Gmail messages in mere moments, thanks to the power of Python and Elasticsearch. Easy as pie.

Of course, command-line tools are cool, but it’d be really nice if we had a more friendly, graphical interface to use our tool. Thankfully, as we’ll see next time, it’s incredibly easy to extend our tool and start making it into something a bit more friendly for the casual end-user. For now, we’ve demonstrated how we can really trivially get a lot done in very little time with Python and Elasticsearch.

C++ Programming and Brain RAM

I have a tricky relationship with C++. There is a narrow subset of the language that, when properly used, I find to be a strict improvement over C. Specifically, careful use of namespaces, RAII, some pieces of the STL (such as std::string and std::unique_ptr), and a very small bit of light templating can actually simplify a lot of common C patterns, while making it a lot harder to shoot yourself in the foot via macros and memory leaks.

That said, C++ faces a choking combination of wanting to simultaneously maintain backwards compatibility and also extend the language to be more powerful and flexible. I’ve recently been reading the final draft of Effective Modern C++ by Scott Meyers. It is an excellently written book, and it does a superb job covering what new features have been introduced in the last couple of C++ versions, and how to make your code base properly take advantage of them.

And a lot of the new stuff in C++ is awesome. I had a chance to start taking advantage of the new features when I was working at Fog Creek on [MESSAGE REDACTED], and I was actually really pleasantly surprised by how much of an improvement the additions made in my day-to-day coding. In fact, I was so pleasantly surprised that I actually gave a whole presentation on how C++ didn’t have to be awful.

But reading through Scott’s book the last few days has also reminded me why I was somewhat relieved to effectively abandon C++ when I joined Knewton.

Take move semantics, one of the hallmark features of C++11/14. Previously, in C++, you always had to either pass around pointers to objects, or copy the objects themselves.2 This is a problem, because raw pointers aren’t easily amenable to RAII-style automatic cleanup, yet copies are very expensive. In order to get both your memory safety and your speed, you end up having a lot of code where you semantically want something like

std::vector<something> generate_somethings();

...

std::vector<something> foo = generate_somethings();

but, for performance reasons, you have to actually write something closer to

void generate_somethings(std::vector<something> &empty_vector);

...

std::vector<something> foo;
generate_somethings(foo);

However rapidly C++ developers acclimate to this pattern, I think we can safely agree that it’s much less clear than the far-less-efficient first variant. You can’t even tell, simply by looking at the call site, that foo is mutated. You can infer it, certainly, but you have to actually find the prototype to be sure.

In theory, move semantics (also known as rvalue references) allow C++ to explicitly acknowledge when a value is “dead” in a specific context, which allows for much greater efficiency and clarity. The reason it’s called “move semantics” comes from the idea that you can move the contents of the old object to the new one, rather than copying them, since you know that the old object can no longer be referenced. For example, if you’re moving a std::string from one variable to another, you could simply assign the underlying char * and length, rather than making a full-blown copy of the underlying buffer, even if neither string is a const. The original can’t be accessed anymore, so it’s fine if you mutate memory that the original owned to your heart’s content.

In practice, though, things aren’t that simple. Scott Meyers helpfully notes that

std::move doesn’t move anything, for example […]. Move operations aren’t always cheaper than copying; when they are, they’re not always as cheap as you’d expect; and they’re not always called in a context where moving is valid. The construct type&& doesn’t always represent an rvalue reference.1

Got that?

In fact, Scott’s point is obvious if you understand how C++ gets realized under the hood. For example, when returning from a function, anything on the stack that’s returned via rvalue reference is going to have to be copied, so you’re only going to win if the object has enough data on the heap that moving actually saves copies. But understanding that requires that you already bring a lot of C++ knowledge to the table.

This is a fractal issue with modern C++. Congratulations, you get type inference via auto! auto type inference works via template type inference, so make sure you understand that first.3 This comes up in especially fun situations: Foo &&bar is always an rvalue reference, but auto &&bar is a universal reference, which may turn out not to be one.

Or to quit picking on rvalue references, how about special member generation—those freebies like default constructors and copy constructors that the compiler will write on your behalf if you don’t write them? There are two new ones in C++11, the move constructor and the move assignment operator, and the compiler will write them for you!…unless you write one of the two, in which case, unlike the other special members, you have to write both. But at least that’ll be a compile-time issue, whereas, if you have an explicit copy constructor, you won’t get either of the move-related special members autogenerated; you’ll have to write both yourself if you want them. This isn’t purely academic: if you add a copy constructor and forget this fact, you may get a chance to enjoy an “unexplained” slowdown in your code when your silently generated move constructor vanishes.

To be clear again, these rules are emphatically not arbitrary. They make complete sense if you take a step back and think about why the standard would have mandated things work this way. But it’s not immediately transparent; you have to think.

And this is why I find it so amazingly hard to write code productively in C++. My brain has a limited amount of working memory. When I’m writing in a language with a simple runtime and syntax, such as C, Go, Python, Smalltalk, or (to an arguably slightly lesser extent) OCaml, then I need to dedicate relatively little space in my brain to the nuances of the language. I can spend nearly all of my working space on solving the actual problem at hand.

When I write in C++, by contrast, I find that I’m constantly having to dedicate a large amount of thought to what the underlying C++ is actually going to do. Is this a template, a macro, or an inline function? Was that the right choice? How many copies of this templated class am I actually generating in the compiled code? If I switch this container to have const members, is that going to speed things up, or slow them down? Is this class used in a DLL for some silly reason? If so, how can I make this change without altering the vtable? Is this function supposed to be called from C? Do I even need to care in this instance?

It’s not that I can’t do this. I did it for years, and, as I noted, I was voluntarily, intentionally working in C++ for the last couple of months I was at Fog Creek. Sometimes, at least for now, C++ is unquestionably the right tool, and that project was one of those times. But as happy as I am that C++ is getting a lot of love, and that working with it is increasingly less painful, I can’t help but feel that the amount of baggage it’s dragging around at this point means that I have to spend far too much of my brain on the language, not the problem at hand. My brain RAM ends up being all about C++; most of the problem gets swapped to disk.

C++ still has a place in my toolbox, but I’m very, very glad that improvements elsewhere in the ecosystem are ever shrinking the tasks that require it. I’m optimistic that languages like Rust may shrink its uses even further, and that I may live to see when the answer to “when is C++ the best tool for the job?” can finally genuinely be “never.” In the meantime, if you have to write C++, go buy Effective Modern C++.


  1. Effective Modern C++, p. 355. 

  2. I’m oversimplifying slightly, mostly by omitting things like std::auto_ptr and boost::scoped_ptr, but they don’t really change my point. 

  3. Did you know templates had type inference? No? Me neither. I somehow was able to work in C++ for several years without learning this fact, and am now scared to look back at my old code and figure out how dumb some of it is. 

Having Fun: Python and Elasticsearch, Part 2

In my earlier post on Elasticsearch and Python, we did a huge pile of work: we learned a bit about how to use Elasticsearch, we learned how to use Gmvault to back up all of our Gmail messages with full metadata, we learned how to index the metadata, and we learned how to query the data naïvely. While that’s all well and good, what we really want to do is to index the whole text of each email. That’s what we’re going to do today.

It turns out that nearly all of the steps involved in doing this don’t involve Elasticsearch; they involve parsing emails. So let’s take a quick time-out to talk a little bit about emails.

A Little Bit About Emails

It’s easy to think of emails as simple text documents. And they kind of are, to a point. But there’s a lot of nuance to the exact format, and while Python has libraries that will help us deal with them, we’re going to need to be aware of what’s going on to get useful data out.

To start, let’s take a look again at the raw email source we looked at yesterday a bit more completely:

$ gzcat /Users/bp/src/vaults/personal/db/2005-09/11814848380322622.eml.gz
X-Gmail-Received: 887d27e7d009160966b15a5d86b579679
Delivered-To: benjamin.pollack@gmail.com
Received: by 10.36.96.7 with SMTP id t7cs86717nzb;
        Wed, 14 Sep 2005 19:35:45 -0700 (PDT)
Received: by 10.70.13.4 with SMTP id 4mr150611wxm;
        Wed, 14 Sep 2005 19:35:45 -0700 (PDT)
Return-Path: <probablyadefunctaddressbutjustincase@duke.edu>
[...more of the same...]
Message-ID: <4328DDFA.4050903@duke.edu>
Date: Wed, 14 Sep 2005 22:35:38 -0400
From: No Longer a Student <probablyadefunctaddressbutjustincase@duke.edu>
Reply-To: probablyadefunctaddressbutjustincase@duke.edu
User-Agent: Mozilla Thunderbird 1.0.6 (Macintosh/20050716)
MIME-Version: 1.0
To: benjamin.pollack@gmail.com
Subject: Celebrating 21 years
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

    It's my birthday, Blue Devils!  At least it will be in a few days, so I
am opening my apartment, porch, and environs this Friday, Sept. 16th
to all of you for some celebration.  Come dressed up, come drunk, or
whatever...just come.  There will be plenty to drink, and for those of
you that are a wine connessiouers, Cheerwine is the closest you'll get.
Kickoff is at 10:30pm.  Pass-out is at <early Saturday morning>.  If you
have some drink preferences, let me know and we'll see what we can
snag.  In addition to that, let me know if you think you can make it,
even if only for a while, so we can judge the amount of booze that we'll
be stocking.

All you really need to know:
Friday, Sept. 16th
10:30pm-late
alcohol

This is about the simplest form of an email you can have. At the top, we have a bunch of metadata about the email itself. Notably, while these look kind of like key/value pairs, we can see that at least some headers (Received, for example) are allowed to appear more than once. That said, we’d like to try to merge this with the existing metadata we’ve got if we can.

There’s also the, you know, actual content of the email. In this particular case, that’s clearly just a blob of plain text, but let’s be honest: we know from experience that some emails have a lot of other things—attachments, HTML, and so on.1 Emails that have formatting or attachments are called multipart messages: each chunk corresponds to a different piece of the email, like an attachment, or a formatted version, or an encryption signature. For a toy tool, we don’t really need to do something special with all of the attachments and whatnot; we just want to grab as much as we can from the email itself. Since, in real life, even multipart emails have a plain text part, it’ll be good enough if we can just grab that.

Let’s make that the goal: we do care about the header values, and we’ll extract any plain text in the email, but the rest can wait for another day.

Parsing Emails in Python

So we know what we want to do. How do we do it in Python?

Well, we’ll need two things: we’ll need to decompress the .eml.gz files, and we’ll need to parse the emails. Thankfully, both pieces are pretty easy.

Python has a gzip module that trivially handles reading compressed data. Basically, wherever you’d otherwise write open(path_name, mode), you instead write gzip.open(path_name, mode). That’s really all there is to that part.
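
In other words, something like this (the path is purely illustrative):

import gzip

# reading a gzip-compressed file looks just like reading a normal one;
# only the open() call changes
with gzip.open('/path/to/an/email.eml.gz', 'r') as fp:
    first_line = fp.readline()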

For parsing the emails, Python provides a built-in library, email, which does this tolerably well. For one thing, it allows us to easily grab out all of those header values without figuring out how to parse them. (We’ll see shortly that it also provides us a good way to get the raw text part of an email, but let’s hold that thought for a moment.)

There’s unfortunately one extra little thing: emails are not in a reliable encoding. Sure, they might claim they’re in something consistent, like UTF-7, but you know they aren’t. This is a bit of a problem, because Elasticsearch is going to want to be handed nothing but pure, clean Unicode text.

For the purposes of a toy, it’ll be enough if we just make a good-faith effort to grab useful information out, even if it’s lossy. Since most emails are sent in Latin-1 or ASCII encoding, we can be really, really lazy about this by introducing a utility function that tries to decode strings as Latin-1, and just replaces anything it doesn’t recognize with the Unicode unknown character symbol, �.

def unicodish(s):
    return s.decode('latin-1', errors='replace')

With that in mind, we can start playing with these modules immediately. In your Python REPL, try something like this:

import email
import gzip

with gzip.open('/path/to/an/email.eml.gz', 'r') as fp:
    message = email.message_from_file(fp)
print '%r' % (message.items(),)

This looks awesome. The call to email.message_from_file() gives us back a Message object, and all we have to do to get all the header values is to call message.items().

All that’s left for this part is to merge the email headers with the Gmail metadata, so let’s do that first. While headers can repeat, we don’t actually care: the fields we actually query, like From and To, don’t, and if we accidentally end up with only one Received field when we should have fifteen, we don’t care. This is, after all, something we’re hacking together for fun, and I’ve never in my life cared to query the Received field. This gives us an idea for a way to quickly handle things: we can just combine the existing headers with our current metadata.

So, ultimately, we’re really just changing our original metadata loading code from

with open(path.join(base, name), 'r') as fp:
    meta = json.load(fp)

to

with gzip.open(path.join(base, name.rsplit('.', 1)[0] + '.eml.gz'), 'r') as fp:
    message = email.message_from_file(fp)
meta = {unicodish(k).lower(): unicodish(v) for k, v in message.items()}
with open(path.join(base, name), 'r') as fp:
    meta.update(json.load(fp))

Not bad for a huge feature upgrade.

Note that this prioritizes Gmail metadata over email headers, which we want: if some email has an extra, non-standard Label header, we don’t want it to trample our Gmail labels. We’re also normalizing the header keys, making them all lowercase, so we don’t have to deal with email clients that secretly write from and to instead of From and To.
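
If the ordering seems backwards, here’s a tiny sketch (with made-up values) of why building the dict from the headers first and then calling update() with the Gmail metadata gives the Gmail side the last word:

meta = {'labels': 'X-Label-From-Some-Odd-Client'}   # from the email headers
meta.update({'labels': ['Registrations']})          # from the .meta file
print meta['labels']                                # prints ['Registrations']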

That’s it for headers. Give it a shot: try running your modified loader script, and then querying with the --raw-result flag we added to the query tool last time. We’re not yet printing the new data in a useful, user-friendly way, but it’s already searchable and useful.

In fact, you know what? Sure, this is a toy, but it honestly isn’t hard to make this print out at least a little more useful data. Just having From and To would be helpful, so let’s quickly tweak the tool to do that by altering the final click.echo() call:

#!/usr/bin/env python

import json

import click
import elasticsearch


@click.command()
@click.argument('query', required=True)
@click.option('--raw-result/--no-raw-result', default=False)
def search(query, raw_result):
    es = elasticsearch.Elasticsearch()
    matches = es.search('mail', q=query)
    hits = matches['hits']['hits']
    if not hits:
        click.echo('No matches found')
    else:
        if raw_result:
            click.echo(json.dumps(matches, indent=4))
        for hit in hits:
            # This next line and the two after it are the only changes
            click.echo('To:{}\nFrom:{}\nSubject:{}\nPath: {}\n\n'.format(
                hit['_source']['to'],
                hit['_source']['from'],
                hit['_source']['subject'],
                hit['_source']['path']
            ))

if __name__ == '__main__':
    search()

Bingo, done. Not bad for a three-line edit.

For the body itself, we need to do something a little bit more complicated. As we discussed earlier, emails can be simple or multipart, and Python’s email module unfortunately exposes that difference to the user. For simple emails, we’ll just grab the body, which will likely be plain text. For multipart, we’ll grab any parts that are plain text, smash them all together, and use that for the body of the email.

So let’s give it a shot. I’m going to pull out the io module so we can access StringIO for efficient string building, but you could also just do straight-up string concatenation here and get something that would perform just fine. Our body reader then is going to look something like this:

content = io.StringIO()
if message.is_multipart():
    for part in message.get_payload():
        if part.get_content_type() == 'text/plain':
            content.write(unicodish(part.get_payload()))
else:
    content.write(unicodish(message.get_payload()))

This code simply looks for anything labeled plain text and builds a giant blob of it, handling the plain case and the multipart case differently.2

Well, if you think about it, we’ve done all the actual parsing we need to do. That just leaves Elasticsearch integration. We want to combine this with the metadata parsing we already had, so our final code for indexing will look like:

def parse_and_store(es, root, email_path):
    gm_id = path.split(email_path)[-1]
    with gzip.open(email_path + '.eml.gz', 'r') as fp:
        message = email.message_from_file(fp)
    meta = {unicodish(k).lower(): unicodish(v) for k, v in message.items()}
    with open(email_path + '.meta', 'r') as fp:
        meta.update(json.load(fp))

    content = io.StringIO()
    if message.is_multipart():
        for part in message.get_payload():
            if part.get_content_type() == 'text/plain':
                content.write(unicodish(part.get_payload()))
    else:
        content.write(unicodish(message.get_payload()))

    meta['account'] = path.split(root)[-1]
    meta['path'] = email_path

    body = meta.copy()
    body['contents'] = content.getvalue()
    es.index(index='mail', doc_type='message', id=gm_id, body=body)

That’s it. On my system, this can index every last one of the tens of thousands of emails I’ve got in only a minute or so, and the old query tool we wrote can easily search through all of them in tens of milliseconds.

Making a Real Script

Last time, we used click to make our little one-off query tool have a nice UI. Let’s do that for the data loader, too. All we really need to do is make that ad-hoc parse_and_store function be a real main function. The result will look like this:

#!/usr/bin/env python

import email
import json
import gzip
import io
import os
from os import path

import click
import elasticsearch


def unicodish(s):
    return s.decode('latin-1', errors='replace')


def parse_and_store(es, root, email_path):
    gm_id = path.split(email_path)[-1]

    with gzip.open(email_path + '.eml.gz', 'r') as fp:
        message = email.message_from_file(fp)
    meta = {unicodish(k).lower(): unicodish(v) for k, v in message.items()}
    with open(email_path + '.meta', 'r') as fp:
        meta.update(json.load(fp))

    content = io.StringIO()
    if message.is_multipart():
        for part in message.get_payload():
            if part.get_content_type() == 'text/plain':
                content.write(unicodish(part.get_payload()))
    else:
        content.write(unicodish(message.get_payload()))

    meta['account'] = path.split(root)[-1]
    meta['path'] = email_path

    body = meta.copy()
    body['contents'] = content.getvalue()

    es.index(index='mail', doc_type='meta', id=gm_id, body=meta)
    es.index(index='mail', doc_type='message', id=gm_id, body=body)


@click.command()
@click.argument('root', required=True, type=click.Path(exists=True))
def index(root):
    """imports all gmvault emails at ROOT into INDEX"""
    es = elasticsearch.Elasticsearch()
    root = path.abspath(root)
    for base, subdirs, files in os.walk(root):
        for name in files:
            if name.endswith('.meta'):
                parse_and_store(es, root, path.join(base, name.split('.')[0]))

if __name__ == '__main__':
    index()

Until Next Time

For now, you can see that what we’ve got works by using the old query tool with the --raw-result flag, and you can use it to do queries across all of your stored email. But the query tool is lacking in multiple ways: it doesn’t output everything we care about (specifically, a useful chunk of the message bodies), and it doesn’t treat some fields (like labels) as the exact matches we want. We’ll fix these next time, but for now, we can rest knowing that we’re successfully storing everything we care about. Everything else is going to be UI.


  1. After all, if you can’t attach a Word document containing your cover letter to a blank email saying “Job Application”, what’s the point of email? 

  2. I actually think the Python library messes this up: simple emails and multipart emails really ought to look the same to the developer, but unfortunately, that’s the way the cookie crumbled. 

Having Fun: Python and Elasticsearch, Part 1

I find it all too easy to forget how fun programming used to be when I was first starting out. It’s not that a lot of my day-to-day isn’t fun and rewarding; if it weren’t, I’d do something else. But it’s a different kind of rewarding: the rewarding feeling you get when you patch a leaky roof or silence a squeaky axle. It’s all too easy to get into a groove where you’re dealing with yet another encoding bug that you can fix with that same library you used the last ten times. Yet another pile of multithreading issues that you can fix by rewriting the code into shared-nothing CSP-style. Yet another performance issue you can fix by ripping out code that was too clever by half with Guava.

As an experienced developer, it’s great to have such a rich toolbox available to deal with issues, and I certainly feel like I’ve had a great day when I’ve fixed a pile of issues and shipped them to production. But it just doesn’t feel the same as when I was just starting out. I don’t get the same kind of daily brain-hurt as I did when everything was new,1 and, sometimes, when I just want to do something “for fun”, all those best practices designed to keep you from shooting yourself (or anyone else) in the foot just get in the way.

Over the past several months, Julia Evans has been publishing a series of blog posts about just having fun with programming. Sometimes these are “easy” topics, but sometimes they’re quite deep (e.g., You Can Be a Kernel Hacker). Using her work as inspiration, I’m going to do a series of blog posts over the next couple of months that just have fun with programming. They won’t demonstrate best practices, except incidentally. They won’t always use the best tools for the job. They won’t always be pretty. But they’ll be fun, and show how much you can get done with quick hacks when you really want to.

So, what’ll we do as our first project? Well, for a while, I’ve wanted super-fast offline search through my Gmail messages for when I’m traveling. The quickest solution I know for getting incredibly fast full-text search is to whip out Elasticsearch, a really excellent full-text search engine that I used to great effect on Kiln at Fog Creek.2

We’ll also need a programming language. For this part of the series, I’ll choose Python, because it strikes a decent balance between being flexible and being sane.3

I figure we can probably put together most of this in a couple of hours spread over the course of a week. So for our first day, let’s gently put best practices on the curb, and see if we can’t at least get storage and maybe some querying done.

Enough Elasticsearch to Make Bad Decisions

I don’t want to spend this post on Elasticsearch; that’s really well handled elsewhere. What you should do is read the first chapter or two of Elasticsearch: the Definitive Guide. And if you actually do that, skip ahead to the Python bit. But if you’re not going to do that, here’s all you need to know about Elasticsearch to follow along.

Elasticsearch is a full-text search database, powered by Lucene. You feed it JSON documents, and then you can ask Elasticsearch to find those documents based on the full-text data within them. A given Elasticsearch instance can have lots of indexes, which is what every other database on earth calls a database, and each index can have different document types, which every other database on earth calls a table. And that’s about it.

“Indexing” (storing) a document is really simple. In fact, it’s so simple, let’s just do it.

First, if you haven’t already, install the Python library for Elasticsearch using pip via a simple pip install elasticsearch, and then launch a Python REPL. I like bpython for this purpose, since it’s very lightweight and provides great tab completion and as-you-type help, but you could also use IPython or something else. Next, if you haven’t already, grab a copy of Elasticsearch and fire it up. This involves the very complicated steps of

  1. Downloading Elasticsearch;
  2. Extracting it; and
  3. Launching it by firing up the bin/elasticsearch script in a terminal.

That’s it. You can make sure it’s running by hitting http://localhost:9200/ in a web browser. If things are looking good, you should get back something like

{
  "status" : 200,
  "name" : "Gigantus",
  "version" : {
    "number" : "1.3.4",
    "build_hash" : "a70f3ccb52200f8f2c87e9c370c6597448eb3e45",
    "build_timestamp" : "2014-09-30T09:07:17Z",
    "build_snapshot" : false,
    "lucene_version" : "4.9"
  },
  "tagline" : "You Know, for Search"
}

Then, assuming you are just running a vanilla Elasticsearch instance, give this a try in your Python shell:

import elasticsearch
es = elasticsearch.Elasticsearch()  # use default of localhost, port 9200
es.index(index='posts', doc_type='blog', id=1, body={
    'author': 'Santa Clause',
    'blog': 'Slave Based Shippers of the North',
    'title': 'Using Celery for distributing gift dispatch',
    'topics': ['slave labor', 'elves', 'python',
               'celery', 'antigravity reindeer'],
    'awesomeness': 0.2
})

That’s it. You didn’t have to create a posts index; Elasticsearch made it when you tried storing the first document there. You likewise didn’t have to specify what the document schema was; Elasticsearch just inferred it, based on the first document you provided.4
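
If you’re curious what Elasticsearch came up with, you can peek at the mapping it generated; this is purely for poking around, and nothing later depends on it:

import json
print json.dumps(es.indices.get_mapping(index='posts'), indent=2)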

Want to store more documents? Just repeat the process:

es.index(index='posts', doc_type='blog', id=2, body={
    'author': 'Benjamin Pollack',
    'blog': 'bitquabit',
    'title': 'Having Fun: Python and Elasticsearch',
    'topics': ['elasticsearch', 'python', 'parseltongue'],
    'awesomeness': 0.7
})
es.index(index='posts', doc_type='blog', id=3, body={
    'author': 'Benjamin Pollack',
    'blog': 'bitquabit',
    'title': 'How to Write Clickbait Titles About Git Being Awful Compared to Mercurial',
    'topics': ['mercurial', 'git', 'flamewars', 'hidden messages'],
    'awesomeness': 0.95
})

Getting documents is just as easy. E.g., how can we see the post we just indexed?

es.get(index='posts', doc_type='blog', id=2)

Of course, this is boring; we’re using Elasticsearch as a really bizarre key/value store, but the whole point of Elasticsearch is to allow you to, well, search. So let’s do that.

Elasticsearch provides two different ways to search documents. There’s a structured query language, which allows you to very carefully and unambiguously specify complex queries; and there’s a simple, Lucene-based syntax that is great for hacking things together. For the moment, let’s just play with the Lucene-based one. What’s that look like? Well, if you wanted to find all posts where I was the author, you could simply do

es.search(index='posts', q='author:"Benjamin Pollack"')

So all we have to do is write field:value and we get search. You could also just do something like

es.search(index='posts', q='Santa')

to search across all fields, or mix and match:

es.search(index='posts', q='author:"Benjamin Pollack" python')

It’s just that simple.5

And, hey, this seems really close to Gmail’s search syntax. Maybe that’ll come in handy later.

Getting the Gmail Data

So with Elasticsearch under our belt, let’s look at actually coding up how to get the Gmail data! We’ll need to write an IMAP client, and then de-duplicate messages due to labels, and figure out what the labels are, and…

…or, better yet, let’s not, because someone else already figured that part out for us. Gmvault already allows mirroring your Gmail account, including all the special stuff, like which labels are on which emails and so on. So let’s just install and use that. You can install it with a simple pip install gmvault==1.8.1-beta --allow-external IMAPClient6, and then you can sync your email with a simple

gmvault sync you@gmail.com -d path/to/where/you/want/the/email/archived

Not only will this sync things for you; it’ll do it with proper OAuth semantics and everything. So that takes care of getting the emails for offline access. Next up, let’s figure out how to start getting data into Elasticsearch.

Loading the Metadata

Once Gmvault finishes downloading your emails, if you go poke around, you’ll see there’s a really simple structure going on in the downloaded data. Assuming you synced to ~/gmails, then you’ll see something like:

~/gmails/db/2005-09/118148483803226229.meta
~/gmails/db/2005-09/118148483803226229.eml.gz
~/gmails/db/2007-03/123168411054578126.meta
~/gmails/db/2007-03/123168411054578126.eml.gz
...

This looks really promising. I wonder what format those .metas are?

$ cat 2007-03/123168411054578.meta | python -mjson.tool
{
    "flags": [
        "\\Seen"
    ],
    "gm_id": 123168411054578,
    "internal_date": 1174611101,
    "labels": [
         "Registrations"
    ],
    "msg_id": "19b71702474dd770796e8aa45d@www.rememberthemilk.com",
    "subject": "Welcome to Remember The Milk!",
    "thread_ids": 123168411054578,
    "x_gmail_received": null
}

Perfect! Elasticsearch takes JSON, and these are already JSON, so all we have to do is to submit these to Elasticsearch and we’re good. Further, these have a built-in ID, gm_id, that matches the file name of the actual email on disk, so we’ve got a really simple mapping to make this all work.

And what are the .eml.gz files?

$ gzcat 2005-09/11814848380322.eml.gz
X-Gmail-Received: 887d27e7d009160966b15a5d86b579679
Delivered-To: benjamin.pollack@gmail.com
Received: by 10.36.96.7 with SMTP id t7cs86717nzb;
        Wed, 14 Sep 2005 19:35:45 -0700 (PDT)
Received: by 10.70.13.4 with SMTP id 4mr150611wxm;
        Wed, 14 Sep 2005 19:35:45 -0700 (PDT)
Return-Path: <probablyadefunctaddressbutjustincase@duke.edu>

Okay, so: good news, that’s the email that goes with the metadata; bad news, parsing emails in Python sucks. For today, let’s start by indexing just the metadata, and then spin back around to handle the full email text later.

To make this work, we really only need three tools:

  • os.walk, which lets us walk a directory hierarchy;
  • json, the Python module for working with JSON; and
  • elasticsearch, the Python interface for Elasticsearch we already discussed earlier.

We’ll walk all the files in the root of the Gmvault database using os.walk, find all files that end in .meta, load the JSON in those files, tweak the JSON just a bit (more on that in a second), and then shove the JSON into Elasticsearch.

root = '/home/you/gmails'
for base, subdirs, files in os.walk(root):
    for name in files:
        if name.endswith('.meta'):
            with open(path.join(base, name), 'r') as fp:
                meta = json.load(fp)
            meta['account'] = path.split(root)[-1]
            meta['path'] = path.join(base, name)
            es.index(index='mail', doc_type='message', id=meta['gm_id'], body=meta)

And that’s seriously it. Elasticsearch will automatically create the index and the document type based on the first document we provide it, and will store everything else. On my machine, this chews through tens of thousands of meta files in just a couple of seconds.

I did throw in two extra little bits to help us later: I explicitly track the path to the metadata (that’s the meta['path'] = path.join(base, name) bit), and I explicitly set the account name based on the path so we can load up multiple email accounts if we want (that’s the meta['account'] = path.split(root)[-1] part). Otherwise, I’m just doing a vanilla es.index() call on the raw JSON we loaded.

So far, so good. But did it work?

Searching the Metadata

We can start by being really lazy. As noted earlier, Elasticsearch provides two search mechanisms: a structured query language and a string-based API that just passes your query onto Lucene. For production apps, you pretty much always want to use the first one; it’s more explicit, more performant, and less ambiguous.

But this isn’t a production app, so let’s be lazy. As you know if you read the Elasticsearch section first, doing a Lucene-based query is this monstrosity:

es.search('mail', q=query)

Yep. That’s all it takes to do a full-text search using the built-in Lucene-backed query syntax. Try it right now. You’ll see you get back a JSON blob (already decoded into native Python objects) with all of the results that match your query. So all we’d really have to do to exceed our goal for today, to have both storing data and querying, would be to whip up a little command-line wrapper around this simple command call.
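
If you want a feel for the shape of what comes back before we wrap it in a tool, try something along these lines in the REPL (the query string is just an example):

matches = es.search('mail', q='subject:welcome')
for hit in matches['hits']['hits']:
    print hit['_score'], hit['_source']['subject']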

When I think of writing command-line tools in Python, I think click, a super-simple library for whipping up great command-line parsing. Let’s bring in click via pip install click, and use it to write our tool.

All we have to do for this tool is allow passing a Lucene-like string to Elasticsearch. I’ll also add an extra command-line parameter for printing the raw results from Elasticsearch, since that’ll be useful for debugging.

Here’s all it takes for the first draft of our tool:

#!/usr/bin/env python

import json

import click
import elasticsearch


@click.command()
@click.argument('query', required=True)
@click.option('--raw-result/--no-raw-result', default=False)
def search(query, raw_result):
    es = elasticsearch.Elasticsearch()
    matches = es.search('mail', q=query)
    hits = matches['hits']['hits']
    if not hits:
        click.echo('No matches found')
    else:
        if raw_result:
            click.echo(json.dumps(matches, indent=4))
        for hit in hits:
            click.echo('Subject:{}\nPath: {}\n\n'.format(
                hit['_source']['subject'],
                hit['_source']['path']
            ))

if __name__ == '__main__':
    search()

Let’s walk through this slowly. The initial @click.command() tells Click that the following function is going to be exposed as a command-line entry point. @click.argument and @click.option allow us to specify mandatory arguments and optional parameters, respectively, just by naming them. Then we can declare our function, make sure that our function arguments match the command-line arguments, and Click takes care of the rest.

The actual body of the function has no real surprises, but just to go over things: the es.search() call we’ve already discussed. We don’t care about any of the result metadata that Elasticsearch provides, so we simply explicitly grab matches['hits']['hits'] early on;7 and, by default, Elasticsearch stores the entire document when you index it, so instead of loading the original document explicitly, we can be really lazy and just look at the _source dictionary key in the hit.8

Aaaand we’ve already got full-text and axis-based searching across message metadata, including a nice command-line client. In less than an hour.

In the next post, we’ll explore parsing emails in Python and doing full-text search through the whole body. For today, we’ve already got lightning-fast search across message metadata.


  1. While it’s true that picking up my fifteenth package management tool certainly makes my brain hurt, that’s not quite what I mean here. 

  2. Sure, something like Xapian might make more sense, but it’s harder and I don’t know it very well, and this is for fun, so who cares. 

  3. Besides which, this particular example is heavily adapted from a talk I gave at NYC Python a few weeks back, so Python already had one foot in the door. You should come to NYC Python if you’re in the New York area; it’s a great group. 

  4. You can, and in production usually need to, specify an explicit schema. Elasticsearch calls these mappings. However, the defaults are usually totally fine for messing around, so I’m not going to explore mappings here at all. 

  5. For some queries, you may notice that Elasticsearch returns more documents than you think ought to match. If you look, you’ll see that, while it’s returning them, it thinks they’re really lousy results: their _score property will be very low, like 0.004. We’ll explore ways to mitigate this later, but note that this is similar to search engines like DuckDuckGo, Google, and Bing trying very hard to find you websites when your terms just don’t honestly match very much. 

  6. IMAPClient unfortunately is not hosted on PyPI, so you’ll need to explicitly enable fetching it from non-approved sources to proceed. 

  7. The upper parts of the dictionary contain really useful metadata about the results, but we don’t care about them for our work here, so we’ll just throw them out. 

  8. See the previous footnote about mappings? This is part of why production apps almost always need them: mappings allow you to disable storing the full document, since you probably have the original stored elsewhere anyway. But, again, it’s actually really handy for debugging and toy projects. 

Factor 0.97 Released

I’m really happy to see that Factor 0.97 is now available. Factor is a modern, concatenative programming language, similar to FORTH or Joy, but actively maintained. It’s got great performance, solid documentation, and rich libraries and tooling, including a robust web framework that powers the Factor website itself. Along with Pharo Smalltalk, Factor is one of two languages and environments I go to when I just want to have fun for a bit, which is how I ended up making my own little contribution to the release in the form of a rewritten and more robust Redis package. If you want a really mind-bending change from your day-to-day coding, I’d strongly suggest you check it out.

Code reviews and bad habits

Sometimes, I feel that my career as a coder is demarcated by the tech stacks I used to write software. Partly that’s about the programming language—Smalltalk in college, C# and Python at Fog Creek—but it’s also about all the other tools you use to get the job done. I spent eight years working for Fog Creek, and in that capacity, I had a pretty consistent stack: FogBugz for bugs, customer support, and documentation; Trello for general feature development; Kiln for code review; Mercurial for source control; Vim and Visual Studio for actual coding; and our in-house tool, Mortar, for continuous integration.2 While those tools of course changed a bit over my time at Fog Creek, they only ever changed very gradually, component-by-component, so my effective workflow remained largely unchanged.

About a month ago, when I joined Knewton, my entire stack changed all at once.3 Out with Visual Studio; in with IntelliJ. Out with Mortar; in with Jenkins. Out with Mercurial; in with Git. Out with FogBugz; in with JIRA.

While you might think that’d be a headache-inducing amount of churn, it really hasn’t been, because most of these minimally impacted my workflow. Git and Mercurial end up being surprisingly similar, JIRA is just a kind of half-finished FogBugz, and IntelliJ is at worst Visual Studio’s equal. Maybe I had to relearn some keystrokes and button placements, but the actual pattern of how I wrote code didn’t really change.

With one exception: I hate using Gerrit for code review. I don’t hate it because it’s poorly written; I hate it because its workflow encourages bad habits.

Knewton is really, really big on code review. That’s awesome, because so am I, to the point that I created a whole tool around it. So it’s certainly not the idea of code review that I object to.

Further, Gerrit’s design is actually almost exactly identical to the original Kiln prototype. There are fundamentally two ways to do code review: pre-merge, meaning that you review code before it’s in the main repository, and post-commit, which means you review it afterwards. Modern Kiln permits both, but back in 2008, when Tyler and I won the Django Dash with what became Kiln, we were all about the pre-merge workflow. Pushing to the main repository was forbidden; instead, you’d create a review workspace, push your changes there, discuss them, and, on approval, we’d automatically merge them. That’s still my preferred workflow, which is why Kiln still supports it (via the “Read and Branch” permission), and since that happens to be the only workflow supported by Gerrit, I ought to love it.

Kiln's Early UI

And I almost do, except for one, fatal flaw: the granularity of a review is wrong. In all versions of Kiln, reviews are on a series of related changes. In Gerrit, reviews are on a single patch. Thus, in Kiln, many commits may be involved in a single review, and the review approves or rejects the collection of them, whereas Gerrit’s reviews are single, isolated commits.

Each of these two patterns has lots of tools in its camp. GitHub and Bitbucket both join Kiln in the review-a-collection-of-patches camp, while Review Board, Barkeep, and Phabricator all land in the review-a-single-patch camp. So it’s not as if I’m saying one particular tool got it right, and all of the others got it wrong. But I sure am going to say that one collection of tools got it right, and the others got it wrong, because single-patch systems encourage bad habits.

There are two fundamental problems with single-patch review systems:

  1. They encourage lumping at-best-weakly-related changes together. Frequently, when I start implementing a new feature, there are at least three steps: first, refactor the existing code to make it clean to add the new feature; next, add the new feature; and finally, add unit tests. The bigger the feature, the more likely each of these steps is to itself consist of several logical steps. If you can store several distinct commits in a single review, then you can simply keep these commits grouped together. But if I’m in a single-patch system, I’m going to be strongly encouraged to do everything in one massive commit. That’s especially frustrating because refactoring existing code and adding new code get lumped together, demanding much more mental energy on my part to figure out what’s actually going on in a given review.

    You might argue that you can keep the patches split, creating one review per commit, but that’s actually worse. At best, you’re now separating the tests from the feature, and the refactoring from the motivation for the refactoring. But the real issue is that many single-patch systems make it very easy to approve any one of the commits in isolation, which is exactly the opposite of what you want. Thus, one-review-per-commit switches the balance from “annoying” to “dangerous.” Not really an improvement.

  2. They encourage you to hide your history. The whole point of a source control system is to tell you the history of how code got to be the way it was. I want to be able to see what it looked like yesterday, and last February at 2 PM, and anything in between. Sometimes that’s because I know that the code worked then and doesn’t now, and I want to know why, but lots of the time, it’s because I want to know why a given change was done. What was the context? What was the motivation? When you just keep updating one single patch the whole time it’s under review, I’m losing tons of history: all I’ll get is a single finished product as a single patch, without any of the understanding of how it got that way.1

And that’s why I’m finding myself extremely frustrated with Gerrit. It’s not that Gerrit’s a bad piece of software; it’s that it’s encouraging me to develop bad habits in how I use source control. And that’s why, for all the parts of my stack that I’ve switched out, the only one I’m truly frustrated to give up is Kiln.


  1. I’m aware that many people do this in Git anyway via aggressive rebasing, but at least GitHub, Bitbucket, and Kiln all allow the agglutinative workflow. It’s not even possible in a single-patch system. 

  2. Given the amount of love Mortar’s gotten since I wrote it as part of a hackathon, I think it should be renamed “Moribund”, but that’s neither here nor there. 

  3. Well, almost. We still use Trello heavily, and if I have anything to say about it, it’ll stay that way. 

Walled Gardens, Walled Ghettos

I’ve seen a lot of posts recently about how Windows 8, and Windows Phone 8, are failures. These posts inevitably talk about how the new user interface is a complete mess, or how, no matter how great Windows Phone 8 may be, the app situation is so bad that Microsoft should simply give up on the platform.

I actually disagree with these arguments as such. While OS X and iOS are my daily operating systems, Metro is, in my opinion, a great touch interface, and will make a wonderful tablet experience…once they remove the desktop. Almost all of the improvements delivered in Windows 8 on the desktop side of things were welcome, and I think that people will actually grow to love them…once Microsoft quits forcing its users to check into Metro every so often. On the Windows Phone front, Microsoft has made it clear that they can buy themselves out of the app problem—which, if not a solution I particularly care for, nevertheless seems to be working well for them right now. In other words, as much as these are genuine problems, I think they have solutions, and those appear to be the solutions Microsoft is actively pursuing right now.

Yet there is a subtler and more nefarious problem that Microsoft seems uninterested in fixing, and that problem keeps me from using Windows.

I was at Microsoft Build this past year. My focus was on learning about Azure and doing some labs with the .NET engineers, but, given that Microsoft gave every single attendee two free Windows 8 tablets, and that I happened to win a Windows Phone, it was kind of hard to ignore the consumer side of things. And, honestly, when I got those free devices, I was in need of both a new phone, and a new laptop. I used Windows at work; why not give it a shot at home?

So I tried. I really did. And even now, I sometimes break out my Surface, or transfer my SIM to my Windows Phone, just to see if things have changed. But the honest answer is that I can’t really use these devices as my main desktop and phone.

And it’s not because of apps. And it’s not because of user interfaces. And it’s not because of battery life, or whatever else you’ve read.

It’s because of mail, calendars, and contacts.

Last March, Microsoft rolled out an anticipatory update that removed support in their Windows 8 apps for talking to Google via Exchange ActiveSync, or EAS. This meant you could no longer access your Google calendars natively in Metro. Nearly a year later, Windows 8 still has no ability to natively work with Google calendars. Or, really, any calendaring system that doesn’t run EAS, because they don’t support the CalDAV standard that virtually every calendar server except theirs uses. Their recommended solution? Oh, you know, just start using Outlook.com instead.

Meanwhile, Windows Phone, unlike its desktop cousin, works just fine with Google Calendar. At first blush, it works great with Gmail, too, so you’d think you’re golden. But try deleting a message from a Gmail account on your phone, and you’ll discover that your definition of deleting a message is rather different from Microsoft’s: despite the phone having a special workflow for registering Gmail accounts, and showing their unique status clearly in the accounts section, deleting a Gmail message results in your phone creating a new label called “Deleted Items”, tagging your message with that label, and then archiving it. And the situation with other email providers isn’t much better: Windows Phone allows no folder customization, so you get to enjoy all kinds of lovely dummy folders on your other providers. Don’t want your sent folder called “Sent Items”? Tough luck, because your phone sure does.

What about contacts? Windows Phone actually gets this right: you can have lots of sources for contacts, and it’s easy to link duplicates from across multiple services into a kind of übercontact. That makes Windows Phone better than at least iOS, and better than the last version of Android I heavily used, too. Yet when I jump back to Windows proper, it’s Microsoft services or nothing: in practice, if I want ubiquitous access to my contacts in a Windows world, it’s Outlook.com or bust.

These all have workarounds—using a web browser to access your calendars directly on Google, for example, or giving up on the bundled Metro apps and using your old desktop apps instead—but these workarounds remove any real impetus to use Windows in the first place. If I’m just accessing everything through the web, I’d be better off with a Chromebook than a Surface, or a Geeksphone than a Lumia. For all the accusations that Apple has a walled garden on their app store, their phone does a great job integrating tons of third-party services, from Outlook.com to Google to LinkedIn to Facebook, and you better believe that their mail applications know how to handle Gmail’s unique mail structure. It’s deplorable that Microsoft can’t get its act together a year after Windows 8’s release and make it actually work with the rest of the world.

I don’t mind living in a walled garden of apps, but I’m not willing to live in a walled ghetto of services. If Microsoft wants me to use Windows, they’re going to have to admit that the world has changed and tear down that wall.

But that’s impossible!

For the past week, I have felt a wave of relief that we shipped Kiln Harmony, the first DVCS-agnostic source control system. Kiln Harmony’s translation engine ruled my life for the better part of a year, and, as the technical blog series is revealing, probably took some of my sanity with it. But we’ve received nearly universally positive feedback, and built a product that I myself love to use, so I can’t help but feel the project was an incredible success.

A success that started with me half-screaming “that’s impossible” across a table.

That’s impossible!

I like to think of myself as fairly open-minded, but I found those words flying out of my mouth almost before I’d had a chance to think about them.

Joel and I were sitting in his office in January 2012, discussing adding Git support to Kiln. As much as I loved Mercurial, and as happy as we both were that Kiln had been built on top of it, we both also agreed that the time had come to add Git support as well. The support I wanted to build was simple and straightforward: just like every other service that offers both Git and Mercurial, we’d make you pick one of the two at repository creation. To make it easy to change your mind later, I also proposed a one-click method to switch a repository to the alternative DVCS, but any given repository would ultimately be either Git or Mercurial—not both.

Joel’s idea was fundamentally different: he wanted anyone to be able to use either system on any repository at any time. And that’s why I found myself screaming “that’s impossible.”

Developers expect that pushing data to Kiln and cloning it back out again gives them the same repository. To do that, we’d have to round-trip a steady state between Mercurial and Git. No conversion tool at the time could do that, and for a very good reason: while Mercurial and Git are nearly isomorphic, they have some fundamental differences under the hood that seemed genuinely intractable. There are concepts that exist only in one of the two systems, and therefore cannot possibly be translated in a meaningful fashion into the other. Unless you’re the magic data fairy, you’re going to lose that information while round-tripping. In addition, since some of these data are tightly tied to their tool’s preferred workflow, you’re also going to severely hamper day-to-day DVCS operations in the process.

In other words, if we tried to build what Joel wanted, we’d end up with the world’s first lossy version control system, supporting only the narrow band of functionality common between Git and Mercurial. Talk about products that are dead on arrival. I wanted no part.

I ended up going back to my office after that meeting incredibly annoyed. Joel just didn’t understand the internals of the two tools well enough. If he did, he’d agree with me. I’d therefore draft up a paper explaining exactly why you could not possibly build a tool like what he was proposing.

That’s improbable!

Except I ended up doing quite the opposite.

Over and over, I’d pick a topic that I knew was fundamentally intractable—say, handling Git- and Mercurial-style tags at the same time—and bang out a few paragraphs going into the technical minutiae of why it was impossible. Then I’d read back over those paragraphs and realize there were gaps in my logic. So I’d fill gap after gap, until, finally, I’d written a detailed how-to manual for handling exactly the thing that I originally said you couldn’t.

The impossible things fell down one by one. I designed a scheme for Mercurial named branches that involved non-fast-forward merges on the Git side for lossless preservation.1 I proposed extending Mercurial’s pushkey system to handle annotated tags. I developed a workflow that could handle the competing branching strategies with minimal user confusion. I ended up going home at nearly 8 PM, only to keep writing.

The next morning, I came into the office not with a document explaining why we couldn’t make Kiln repositories simultaneously Git and Mercurial, but rather, a document explaining exactly how to do so.

That’s very unlikely!

At the time, I was still serving as Kiln’s shit-umbrella more than a developer, so I asked Kevin, one of the Kiln team’s best developers, to write me a prototype based on the white paper, thus freeing me up for the vastly sexier task of merging 15,000 SQL Server databases into one so our backup strategy would finally be viable. (Hey, I was a good shit umbrella.) He put together a surprisingly robust prototype using ideas from the white paper, extending them nicely in the process, thus indicating pretty strongly that my ideas weren’t all that bad.

Based on Kevin’s prototype, I proposed a new strategy: we would, indeed, make a version of Kiln that had Git and Mercurial “either/or” support, and we’d still aim to have that project shippable by summer on roughly the original schedule. That would be our safety option. Meanwhile, a parallel effort, dubbed “Doublespeak,” would launch to make repositories DVCS-agnostic. If Doublespeak ended in failure, we’d still have a new Kiln version to show in summer with Git support. If, on the other hand, it were promising, we’d delay the launch long enough to ship the more ambitious vision of Kiln with Doublespeak.

By a stroke of luck, things worked out such that I could swap my team-lead duties for developer duties just as Doublespeak development kicked off in earnest, and that’s how I ended up as the lead engineer on the translation engine.

The first thing I did was throw out the prototype. Dogma or no, I realized in my first day of hacking that the original design, by definition, would not perform as well as we needed it to. Its architecture would also have required me to spend a considerable amount of time modifying Git’s internals, which was not appealing to me.2

I opted for a new design that would directly leverage Mercurial’s code base and a Python Git implementation called Dulwich. I reimplemented the prototype in a day or two. Then I began the long slog through the gnarly parts of the white paper: managing ref/bookmark translation and movement. Handling tags. Handling octopus merges. Handling Mercurial’s concepts of filenode parents and linkrevs. As the Kiln “either/or” project wrapped up, more developers started joining me on the translation layer to fill in the increasingly esoteric gaps.
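
To give a flavor of what leaning on Dulwich looks like in practice, here’s a minimal sketch of the ref/bookmark half of that job: list a Git repository’s branch heads with Dulwich’s Repo API and hand them to the Mercurial side. The only real API used here is dulwich.repo.Repo and its get_refs() method (which returns bytes in recent Dulwich releases); write_hg_bookmark is a hypothetical stand-in for the Mercurial half of the translation, and the actual engine obviously did far more than copy names around.

from dulwich.repo import Repo

def git_branch_heads(path):
    # Yield (branch name, commit sha) pairs for a Git repository on disk.
    repo = Repo(path)
    for ref, sha in repo.get_refs().items():
        if ref.startswith(b"refs/heads/"):
            yield ref[len(b"refs/heads/"):].decode("utf-8"), sha.decode("ascii")

def mirror_branches(git_path, write_hg_bookmark):
    # write_hg_bookmark(name, node) is hypothetical: in a real bridge it would
    # point a Mercurial bookmark at the translated changeset, not the raw sha.
    for name, sha in git_branch_heads(git_path):
        write_hg_bookmark(name, sha)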

But it’ll still never actually fly!

It wasn’t long before we had a nearly complete solution that was ready for dogfooding. Unfortunately, almost as soon as we began using the new magic, we hit two major problems that threatened to scuttle the project.

The first was that the entire engine was just too slow, limited almost entirely by how much disk I/O Doublespeak had to do to get its job done. This was already brutal on Fog Creek’s private Kiln instance; our ops team was having nightmares about what the disk damage would look like on Kiln On Demand. Thus began months of work to try to get our disk access as minimal as possible. The general mantra was to read any given revision at most once when converting—and, whenever possible, not at all. We introduced caches, memoization, and more. At the beginning, I was landing order-of-magnitude performance improvements daily. By the end, we’d optimized the code base so much that dinky little 2% performance improvements were frequently the result of days of work. But we had the system performing at a level we could actually use.
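
As a purely illustrative sketch of that mantra (this is not Kiln’s actual code; read_revision and node are names I made up), the basic shape is a cache sitting in front of the expensive per-revision disk read, so asking for the same revision twice only pays for the first read:

class RevisionCache(object):
    # Illustrative only: ensure any given revision is read from disk at most once.

    def __init__(self, read_revision):
        self._read = read_revision   # the expensive, disk-backed loader
        self._cache = {}

    def get(self, node):
        if node not in self._cache:
            self._cache[node] = self._read(node)
        return self._cache[node]

The interesting work, of course, wasn’t the cache itself but figuring out which reads could be skipped entirely: the “whenever possible, not at all” half of the mantra.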

The second problem we hit was that, while we had lots of experience with how Mercurial repositories were supposed to look, and how Git data was supposed to look, we grossly underestimated how much real-world variation there’d be in Mercurial and Git repositories. The Kiln team spent weeks adding more and more logic to Doublespeak to preserve “corrupt” data before it could handle real-world repositories like Mercurial’s own. But we ultimately got to a place where nearly every repository we threw at the thing losslessly round-tripped.3

But we tackled both of these challenges. And soon, dogfooding became alpha, alpha became beta, the Doublespeak code-name became the much less Orwellian “Kiln Harmony,” and, last Tuesday, Kiln Harmony shipped.

That was supposed to be impossible!

I don’t think Joel was prescient and knew Kiln Harmony was doable, and I certainly don’t think he knew how to do everything the white paper explained before I wrote it. But I definitely believe he knew that pushing me as hard as he did would force me to find a way if there were one to be found.

In case it’s not clear, I’m very glad he did.


  1. This strategy ended up being too clever by half, so we dropped it for a less-perfect but less-surprising solution, but I was very excited to realize it was possible at all. 

  2. I honestly couldn’t care less whether you like Mercurial or Git at the end of the day, but I think it’s objectively obvious that Mercurial is far easier to hack and extend than Git. Its code base is DRY, orthogonal, written in Python, and (in the sense of executable count) monolithic. Therefore, solutions where the hacking could be more in Mercurial, and less in Git, were highly attractive. 

  3. Annoyingly, we somehow spaced on ever checking the Linux kernel, so of course that was one of the first repositories someone fed Kiln Harmony on launch day, and it crashed the translation engine. Thankfully, while there are lots of user features headlining Kiln Harmony, one of the features the developers are most excited about is that we are finally in a place where Kiln can be continuously deployed. Open source may make all bugs shallow, but continuous deployment makes all bugs short-lived.

Stepping back and being quiet

I always travel ready to get stuck and be forced to work remotely. My tool of choice for that varies, but has recently been a third-generation iPad armed with my Nokia 800’s old folding keyboard, PocketCloud, and Prompt. With these four simple tools, plus Azure and AWS in a pinch, I can pretty easily get a good day’s work done anywhere. So when I got stuck in Los Angeles this past Saturday, I wasn’t worried: I knew I’d still be able to help Fog Creek get stuff done.

You know what an iPad, PocketCloud, and Prompt do not help with?

Shuttling diesel fuel, five gallons at a time, up seventeen flights of stairs.

I’ve been at Fog Creek seven years. I’ve worked through emergencies before: I’ve been there moving systems around, rebuilding databases, getting emergency code fixes out to work around infrastructure problems. I’ve even written code on an airplane to fix a bug in the account registration system on Thanksgiving, pushing it out right after we landed. When stuff breaks, I’m there.

But this time, there was nothing I could do to help. With the exception of providing some very minor assistance with the shutdown and power-up when we thought the datacenter’s death was imminent, all I could do was sit and watch.

I tried to think of something to do, anything, to make up for my absence. I spooled up machines on AWS and Azure so I could…write code if I had to, I guess? And nabbed copies of the deployment system for…some reason. I wanted to do something to help out, and was, briefly, being a noisy jerk in the chat room trying desperately to find something to do.

One thing I’ve come to realize is that, sometimes, the best thing to do is to shut up and stay out.

I love that at Fog Creek, everyone I work with is bright, focused, and eager to build amazing stuff. But that means that my individual absence, while not ideal, is not going to make or break anything, and right now, the help Fog Creek needs is 100% people on the ground. The best way I can help from LA is to quietly monitor the chat room, pipe up if I know a specific answer that no one else currently available knows, and otherwise, keep to myself.

The annoying thing to me is that I’ve been in the reverse situation before, many times: trying to get a system back online while answering every five minutes whether it’s back yet, or whether someone can help, is ludicrously frustrating. But it was incredibly hard to recognize that I was doing the same thing when my role was reversed. I was so used to being able to help that it took a while for me to genuinely understand that I couldn’t.

The next time this happens, I’m going to follow a strict check-list before interjecting:

  1. Make sure I try to understand the full problem first. That includes not asking, “what’s the situation with ABC, and have you tried XYZ?” until I’ve read the full chat logs (if applicable) or talked to someone on a private channel or face-to-face who is clearly currently idle, and therefore interruptible.
  2. Evaluate whether I have any relevant skills or expertise to help. Is the team trying to figure out how to quickly ship 10 TB of data to S3? Unless I actually have real experience trying to accomplish that, I’ll be quiet. Anyone can google the random comment I saw on Serverfault if that actually becomes relevant.
  3. Even when I do have relevant experience, if someone is already providing accurate, relevant information, I should be quiet. The random guy on the other team describing the migration process for PostgreSQL may not have as much experience with it as I do, but if he’s right, no one needs to hear me validate it. If they’re actually unsure if the information they’re getting is accurate, they will ask for confirmation from someone else, and I’ll provide it.
  4. Evaluate whether what I’m about to say is actually productive. Even when I have experience at hand, if the comment I want to make isn’t going to move things forward, it’s useless. If something will take ten days to complete, I know a cool hack that’ll cut it to eight, and the actual problem is any solution has to be done in five hours, then I can keep it to myself. And finally,
  5. Do not 4chan the conversation. People stress out and need to blow off steam. When I need to do that, I’ll do it off-channel, to keep the main forum clean. When other people do it, I’ll ignore it, and not add fuel to the fire.

Instead of trying to help with the data center, I’m responding to public forums, working on marketing campaigns for a new product we’re releasing when this is all over, and doing some performance work on that same new product—basically, the stuff I’d be doing if there weren’t an emergency. And I’m also making sure I’m well-rested so that, whenever I actually manage to get back to Manhattan, I, too, can help out with the bucket brigade.

I’m annoyed it took a hurricane to teach me to stand back and not throw myself into an adrenaline-fueled craze of helping when something goes wrong, but I’m happy the lesson took.

To all my fellow Creekers, and the awesome folks from Stack Exchange and Squarespace who are helping out: you are all mad awesome people, and you deserve serious accolades. I’m sorry I can’t help, but I can’t think of any group of people I’d trust more than all of you to get this done. Good luck!

Using Trello to organize your summer job search

I remember when I was back in college looking for summer internships. It stank. No one gave me any meaningful guidance (very much including the college employment office). I had no idea how to organize the process. I had no central location to store contact information, or to easily make sure I’d done the next step. I just kind of had a stack of brochures and business cards on my desk that I tried to follow up on, and an Entourage calendar of any upcoming interviews I had. Basically, I was a mess.

Thankfully, Fog Creek’s resident organizational ninja and jack-of-all-trades, Liz, took time out of her day to help all of you college students manage your summer internship applications. So when you start applying to companies this fall, you should have a really easy time keeping everything organized in a single, easy-to-use spot.

Good luck!