For the past week, I have felt a wave of relief that we shipped Kiln Harmony, the first DVCS-agnostic source control system. Kiln Harmony’s translation engine ruled my life for the better part of a year, and, as the technical blog series is revealing, probably took some of my sanity with it. But we’ve received nearly universally positive feedback, and built a product that I myself love to use, so I can’t help but feel the project was an incredible success.
A success that started with me half-screaming “that’s impossible” across a table.
I like to think of myself as fairly open-minded, but I found those words flying out of my mouth almost before I’d had a chance to think about them.
Joel and I were sitting in his office in January 2012, discussing adding Git support to Kiln. As much as I loved Mercurial, and as happy as we both were that Kiln had been built on top of it, we both also agreed that the time had come to add Git support as well. The support I wanted to build was simple and straightforward: just like every other service that offers both Git and Mercurial, we’d make you pick one of the two at repository creation. To make it easy to change your mind later, I also proposed a one-click method to switch a repository to the alternative DVCS, but any given repository would ultimately be either Git or Mercurial—not both.
Joel’s idea was fundamentally different: he wanted anyone to be able to use either system on any repository at any time. And that’s why I found myself screaming “that’s impossible.”
Developers expect that pushing data to Kiln and cloning it back out again gives them the same repository. To do that, we’d have to round-trip a steady state between Mercurial and Git. No conversion tool at the time could do that, and for a very good reason: while Mercurial and Git are nearly isomorphic, they have some fundamental differences under the hood that seemed intractable. There are concepts that exist in only one of the two systems, and therefore cannot possibly be translated in a meaningful fashion into the other. Unless you’re the magic data fairy, you’re going to lose that information while round-tripping. In addition, since some of these data are tightly tied to their tool’s preferred workflow, you’re also going to severely hamper day-to-day DVCS operations in the process.
In other words, if we tried to build what Joel wanted, we’d end up with the world’s first lossy version control system, supporting only the narrow band of functionality common between Git and Mercurial. Talk about products that are dead on arrival. I wanted no part.
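To make the mismatch concrete, consider something as basic as file identity. Both systems use SHA-1, but they hash different things, so even identical content gets different identifiers in each system, and any translation layer has to carry a mapping between the two. A rough sketch of the two hashing schemes (my own illustration, not Kiln’s code):

```python
import hashlib

NULLID = b"\x00" * 20  # Mercurial's null node id (meaning "no parent")

def git_blob_id(data: bytes) -> str:
    # Git prefixes content with a "blob <length>\0" header before hashing.
    return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()

def hg_filenode(data: bytes, p1: bytes = NULLID, p2: bytes = NULLID) -> str:
    # Mercurial hashes the file's (sorted) parent node ids plus its content,
    # so the same bytes can hash differently depending on the file's history.
    lo, hi = sorted((p1, p2))
    return hashlib.sha1(lo + hi + data).hexdigest()

content = b"hello, world\n"
print(git_blob_id(content))  # depends only on the content
print(hg_filenode(content))  # also depends on the file's parents
```

Because Mercurial’s hash bakes in history while Git’s doesn’t, there’s no function that converts one identifier into the other; you have to remember the correspondence yourself.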
I ended up going back to my office after that meeting incredibly annoyed. Joel just didn’t understand the internals of the two tools well enough. If he did, he’d agree with me. I’d therefore draft up a paper explaining exactly why you could not possibly build a tool like what he was proposing.
Except I ended up doing quite the opposite.
Over and over, I’d pick a topic that I knew was fundamentally intractable—say, handling Git- and Mercurial-style tags at the same time—and bang out a few paragraphs going into the technical minutiae of why it was impossible. Then I’d read back over my paragraph and realize there were gaps in my logic. So I’d fill gap after gap until, finally, I’d written a detailed how-to manual for handling exactly the thing I’d originally said couldn’t be done.
The impossible things fell down one by one. I designed a scheme for Mercurial named branches that involved non-fast-forward merges on the Git side for lossless preservation.1 I proposed extending Mercurial’s pushkey system in a way to handle annotated tags. I developed a workflow that could handle the competing branching strategies with minimal user confusion. I ended up going home at nearly 8 PM, only to keep writing.
The next morning, I came into the office not with a document explaining why we couldn’t make Kiln repositories simultaneously Git and Mercurial, but rather, a document explaining exactly how to do so.
That’s very unlikely!
At the time, I was still serving as Kiln’s shit umbrella more than a developer, so I asked Kevin, one of the Kiln team’s best developers, to write me a prototype based on the white paper, thus freeing me up for the vastly more sexy task of merging 15,000 SQL Server databases into one so our backup strategy would finally be viable. (Hey, I was a good shit umbrella.) He put together a surprisingly robust prototype using ideas from the white paper, extending them nicely in the process, thus indicating pretty strongly that my ideas weren’t all that bad.
Based on Kevin’s prototype, I proposed a new strategy: we would, indeed, make a version of Kiln that had Git and Mercurial “either/or” support, and we’d still aim to have that project shippable by summer on roughly the original schedule. That would be our safety option. Meanwhile, a parallel effort, dubbed “Doublespeak,” would launch to make repositories DVCS-agnostic. If Doublespeak ended in failure, we’d still have a new Kiln version to show in summer with Git support. If, on the other hand, it were promising, we’d delay the launch long enough to ship the more ambitious vision of Kiln with Doublespeak.
By a stroke of luck, things worked out such that I could swap my team-lead duties for developer duties just as Doublespeak development kicked off in earnest, and that’s how I ended up as the lead engineer on the translation engine.
The first thing I did was throw out the prototype. Dogma or no, I realized in my first day of hacking that the original design, by definition, would not perform as well as we needed it to. Its architecture would also have required me to spend a considerable amount of time modifying Git’s internals, which was not appealing to me.2
I opted for a new design that would directly leverage Mercurial’s code base and a Python Git implementation called Dulwich. I reimplemented the prototype in a day or two. Then I began the long slog through the gnarly parts of the white paper: managing ref/bookmark translation and movement. Handling tags. Handling octopus merges. Handling Mercurial’s concepts of filenode parents and linkrevs. As the Kiln “either/or” project wrapped up, more developers started joining me on the translation layer to fill in the increasingly esoteric gaps.
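Conceptually, the backbone of a translation layer like this is a bidirectional map between Mercurial changesets and their Git counterparts, consulted on every push and pull so neither side ever re-translates work it has already seen. A minimal sketch of the idea (hypothetical names; not Kiln’s actual code):

```python
class ChangesetMap:
    """Bidirectional hg <-> git commit map (illustrative sketch only)."""

    def __init__(self):
        self._hg_to_git = {}
        self._git_to_hg = {}

    def record(self, hg_node: str, git_sha: str) -> None:
        # Record every translated commit in both directions, so a later
        # push from either side can find its existing counterpart.
        self._hg_to_git[hg_node] = git_sha
        self._git_to_hg[git_sha] = hg_node

    def git_for(self, hg_node: str):
        return self._hg_to_git.get(hg_node)

    def hg_for(self, git_sha: str):
        return self._git_to_hg.get(git_sha)

m = ChangesetMap()
m.record("f9f0c2b5", "a94a8fe7")
```

The real engine obviously has to persist this mapping and keep it consistent under concurrent pushes, but the invariant is the same: translation happens once per commit, ever.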
But it’ll still never actually fly!
It wasn’t long before we had a nearly complete solution that was ready for dogfooding. Unfortunately, almost as soon as we began using the new magic, we hit two major problems that threatened to scuttle the project.
The first was that the entire engine was just too slow, limited almost entirely by how much disk I/O Doublespeak had to do to get its job done. This was already brutal on Fog Creek’s private Kiln instance; our ops team was having nightmares about what the disk damage would look like on Kiln On Demand. Thus began months of work to try to get our disk access as minimal as possible. The general mantra was to read any given revision at most once when converting—and, whenever possible, not at all. We introduced caches, memoization, and more. At the beginning, I was landing order-of-magnitude performance improvements daily. By the end, we’d optimized the code base so much that dinky little 2% performance improvements were frequently the result of days of work. But we had the system performing at a level we could actually use.
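At its simplest, the “read any given revision at most once” mantra is just memoization. A toy illustration of the effect (the function and counter here are hypothetical, purely to show the principle):

```python
from functools import lru_cache

DISK_READS = 0  # counts how often we actually touch storage

@lru_cache(maxsize=None)
def read_revision(rev_id: str) -> bytes:
    # Stand-in for an expensive revlog read from disk.
    global DISK_READS
    DISK_READS += 1
    return b"revision data for " + rev_id.encode()

# A naive conversion pass may ask for the same revision thousands of
# times; the cache collapses all of those into a single disk read.
for _ in range(1000):
    read_revision("abc123")
```

The hard part, of course, isn’t the cache; it’s restructuring the conversion so that most lookups never need to happen in the first place.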
The second problem we hit was that, while we had lots of experience with how Mercurial repositories were supposed to look, and how Git data was supposed to look, we grossly underestimated how much real-world variation there’d be in Mercurial and Git repositories. The Kiln team spent weeks adding more and more “corrupt” data preservation logic to Doublespeak before it could handle real-world repositories like Mercurial’s own. But we ultimately got to a place where nearly every repository we threw at the thing losslessly round-tripped.3
But we tackled both of these challenges. And soon, dogfooding became alpha, alpha became beta, the Doublespeak code-name became the much less Orwellian “Kiln Harmony,” and, last Tuesday, Kiln Harmony shipped.
That was supposed to be impossible!
I don’t think Joel was prescient and knew Kiln Harmony was doable, and I certainly don’t think he knew how to do everything the white paper explained before I wrote it. But I definitely believe he knew that pushing me as hard as he did would force me to find a way if there were one to be found.
In case it’s not clear, I’m very glad he did.
This strategy ended up being too clever by half, so we dropped it for a less-perfect but less-surprising solution, but I was very excited to realize it was possible at all. ↩
I honestly couldn’t care less whether you like Mercurial or Git at the end of the day, but I think it’s objectively obvious that Mercurial is far easier to hack and extend than Git. Its code base is DRY, orthogonal, written in Python, and (in the sense of executable count) monolithic. Therefore, solutions where the hacking could be more in Mercurial, and less in Git, were highly attractive. ↩
Annoyingly, we somehow spaced ever checking the Linux kernel, so of course that was one of the first repositories someone fed Kiln Harmony on launch day, and it crashed the translation engine. Thankfully, while there are lots of user features headlining Kiln Harmony, one of the features the developers are most excited about is that we are finally in a place where Kiln can be continuously deployed. Open-source may make all bugs shallow, but continuous deployment makes all bugs short-lived. ↩
I always travel ready to get stuck and be forced to work remotely. My tool of choice for that varies, but has recently been a third-generation iPad armed with my Nokia 800’s old folding keyboard, PocketCloud, and Prompt. With these four simple tools, plus Azure and AWS in a pinch, I can pretty easily get a good day’s work done anywhere. So when I got stuck in Los Angeles this past Saturday, I wasn’t worried: I knew I’d still be able to help Fog Creek get stuff done.
You know what an iPad, PocketCloud, and Prompt do not help with?
I’ve been at Fog Creek seven years. I’ve worked through emergencies before: I’ve been there moving systems around, rebuilding databases, getting emergency code fixes out to work around infrastructure problems. I’ve even written code on an airplane to fix a bug in the account registration system on Thanksgiving, pushing it out right after we landed. When stuff breaks, I’m there.
But this time, there was nothing I could do to help. With the exception of providing some very minor assistance with the shutdown and power-up when we thought the datacenter’s death was imminent, all I could do was sit and watch.
I tried to think of something to do, anything, to make up for my absence. I spooled up machines on AWS and Azure so I could…write code if I had to, I guess? And nabbed copies of the deployment system for…some reason. I wanted to do something to help out, and was, briefly, being a noisy jerk in the chat room trying desperately to find something to do.
One thing I’ve come to realize is that, sometimes, the best thing to do is to shut up and stay out.
I love that at Fog Creek, everyone I work with is bright, focused, and eager to build amazing stuff. But that means that my individual absence, while not ideal, is not going to make or break anything, and right now, the help Fog Creek needs is 100% people on the ground. The best way I can help from LA is to quietly monitor the chat room, pipe up if I know a specific answer that no one else currently available knows, and otherwise, keep to myself.
The annoying thing to me is that I’ve been in the reverse situation before, many times: trying to get a system back online and answering every five minutes if it’s back yet, or whether someone can help, is ludicrously frustrating. But it was incredibly hard to recognize that I was doing the same thing when my role was reversed. I was so used to being able to help that it took a while for me to genuinely understand that I couldn’t.
The next time this happens, I’m going to follow a strict checklist before interjecting:
- Make sure I try to understand the full problem first. That includes not asking, “what’s the situation with ABC, and have you tried XYZ?” until I’ve read the full chat logs (if applicable) or talked to someone on a private channel or face-to-face who is clearly currently idle, and therefore interruptible.
- Evaluate whether I have any relevant skills or expertise to help. Is the team trying to figure out how to quickly ship 10 TB of data to S3? Unless I actually have real experience trying to accomplish that, I’ll be quiet. Anyone can google the random comment I saw on Serverfault if that actually becomes relevant.
- Even when I do have relevant experience, if someone is already providing accurate, relevant information, I should be quiet. The random guy on the other team describing the migration process for PostgreSQL may not have as much experience with it as I do, but if he’s right, no one needs to hear me validate it. If they’re actually unsure if the information they’re getting is accurate, they will ask for confirmation from someone else, and I’ll provide it.
- Evaluate whether what I’m about to say is actually productive. Even when I have experience at hand, if the comment I want to make isn’t going to move things forward, it’s useless. If something will take ten days to complete, I know a cool hack that’ll cut it to eight, and the actual problem is any solution has to be done in five hours, then I can keep it to myself. And finally,
- Do not 4chan the conversation. People stress out and need to blow off steam. When I need to do that, I’ll do it off-channel, to keep the main forum clean. When other people do it, I’ll ignore it, and not add fuel to the fire.
Instead of trying to help with the data center, I’m responding to public forums, working on marketing campaigns for a new product we’re releasing when this is all over, and doing some performance work on that same new product—basically, the stuff I’d be doing if there weren’t an emergency. And I’m also making sure I’m well-rested so that, whenever I actually manage to get back to Manhattan, I, too, can help out with the bucket brigade.
I’m annoyed it took a hurricane to teach me to stand back and not throw myself in an adrenaline-fueled craze of help when something goes wrong, but I’m happy it took.
To all my fellow Creekers, and the awesome folks from Stack Exchange and Squarespace who are helping out: you are all mad awesome people, and you deserve serious accolades. I’m sorry I can’t help, but I can’t think of any group of people I’d trust more than all of you to get this done. Good luck!
I remember when I was back in college looking for summer internships. It stank. No one gave me any meaningful guidance (very much including the college employment office). I had no idea how to organize the process. I had no central location to store contact information, or to easily make sure I’d done the next step. I just kind of had a stack of brochures and business cards on my desk that I tried to follow up on, and an Entourage calendar of any upcoming interviews I had. Basically, I was a mess.
Thankfully, Fog Creek’s resident organizational ninja and jack-of-all-trades, Liz, took time out of her day to help all of you college students manage your summer internship applications. So when you start applying to companies this fall, you should have a really easy time keeping everything organized in a single, easy-to-use spot.
Business of Software has long stood as a unique conference for me: while nearly every tech conference I attend focuses on the technological side of delivering a solution, Business of Software focuses on actually delivering the goods. How do you reach people? How do you know you’ve reached people? How, if you’ve reached people, do you turn that into profit so that you can keep making people’s lives better?
These are insanely important questions, and ones that are far too easily glossed over in the debate of what database software to use, or what language has the easiest hires, and that’s a huge part of why so few start-ups actually manage to get anywhere.
Business of Software, as a conference, is a world apart. I meet amazing developers at the conference, true. But the main people I meet are people who get stuff done: that forgotten part of the “Smart And Gets Things Done” duet, without which you end up with academic projects that, while amazing, and possibly theoretically world-changing, will never reach more than the three developers who programmed them.
As a rule, people who attend Business of Software both have brilliant ideas, and know how to package that idea into something people will actually pay for. Some idealistic part of me may wish that this last part weren’t necessary, but at the moment, it definitely is: you can invent a literal best-thing-in-the-world, but if you don’t know how to reach the people who want it, you will fade into obscurity.
Maybe it’s because of how different Business of Software feels from other conferences I’ve attended that I was caught so completely off-guard by the last speech of the day, delivered by Noah Kagan. But I’d like to believe that I’d be insanely offended hearing his speech at any conference.
Noah had a point. I think. I think it was that you need to do what you enjoy doing in order to succeed. The reason I’m unclear if that was his point is that I got so angry at a remark he made that I forgot most of the talk.
Noah’s speech was highly entertaining to the audience in large part due to being obscene. Partially in the classic sense—there was certainly no shortage of fucks, shits, and the like—but also in content. Calling out the thinnest person in the room? Uncomfortable. Calling out the heaviest person in the room? Inappropriate. Making the audience recite “Will you sleep with me” one word at a time? No swear words, but insanely bad taste.
But the coup, for me, was something he mentioned in passing, as an afterthought: he identified the Three “P”s of entrepreneurship:1
“Profits, people and…you can figure it out.”
[Muttering amongst the audience.]
“Women. People, profits, and women. Or men. Whatever. People, profits, and women.”
Are you saying women aren’t people? Or are you saying that the only reason you care about women is the sex? Because your phrasing has left you only those two options.
I’ve read about the sexism that’s happened at Ruby conferences, and thought, Yeah, but my communities would never do that. And I’ve read about the sexism that’s happened at overseas Microsoft venues, and thought, Yeah, but my communities would never do that.
But my community did just do that.
And no one seemed to care. At tonight’s mixer, I talked to at least a half dozen people about Noah’s talk, and the general sentiment was, “Eh, it was offensive, but it was funny. Who cares?”
I expect better than that from this conference. Partly because of the constituency, but also partly because, in the past, Fog Creek was directly associated with it. It isn’t now, and it hasn’t been for some time, and I want to make that very clear. But I still feel a professional attachment to Business of Software that was violated today.
It’s just not that hard, fellas. Obscene jokes are fine—in the right context. Anyone who knows me knows that I’ve got a strong Michael Scott joke pattern in me—in an appropriate context.
A professional conference is emphatically not an appropriate context.
I am so tired of hearing that there are no women in tech companies because there are no women in CS programs. You know why there are no women in CS programs? Because they see shit like this (yes, this conference is live-streamed) that contains men creating an obscenely hostile environment for women. So you know what? Of course they opt to pursue some other discipline. I would, if I were them. Would you really want to be in a career where a keynote speaker at a major conference referred to your gender by its genitals, while other disciplines treated you like a person? Because I know I’ve got more respect for myself than that, as do 100% of my female friends.
If I were a woman, there might be a bitquabit, but I can promise you it wouldn’t be in tech if this were the norm for conference talks.
We need to fix this, immediately.
First, despite any association I might have with Business of Software, I’m going to treat them with the same attitude I’d treat any other conference that did this: admit you screwed up, apologize, and tell me how you’ll prevent this in the future. I think it’s extremely important that computing feel safe to everyone, regardless of their gender, and I think that Monday’s concluding talk completely failed to do so.
Second, I really want speakers, at all tech conferences, to start thinking of computing as a professional discipline, instead of a boys’ club. If you make comments like the ones I’m complaining about, you know what you’re doing? Well, remember when you were picked on in middle school and/or high school? Congratulations: you’re doing that now. To literally half the people on the planet. Ever felt like certain disciplines looked down on you because of your background? You’re doing the same. Ever felt you were pigeonholed because of how you look? Now it’s you who’s pigeonholing.
Third, I think all tech conferences should start behaving in a gender-friendly manner. The Geek Feminism Wiki has a great write-up of how to make conferences friendly to women that can be used as a ground level, but I won’t exactly complain if conferences the caliber of Business of Software manage to do even better.
We’re better than this. Act like it. Treat your fellow developers with the same respect you’d treat someone you met on the street: basic, common courtesy. It’s not exactly asking a lot, so quit making it seem so hard to attain.
I hate reading and writing posts like this. Please never give me, or anyone else, an excuse to do so again.
This is a very close paraphrase I verified with other attendees, but, because I don’t have a recording available, it is probably not a verbatim transcript. When the recording of this session goes live, I’ll update it with the exact transcript. ↩
I’ve been coding full-time for only a few weeks, and already I’m being driven somewhat insane by people engaging in what I’d call cargo-cult debugging. Cargo cults were religions that developed when primitive societies, who’d had little exposure to any technology, were suddenly confronted with top-of-the-line modernism in the form of World War II military machines. When the armies disappeared at the conclusion of festivities, they took all of their modern marvels with them. The locals, believing that they’d observed gods bringing them promised goods, attempted to make the gods provide more cargo by building crude imitations of what they had seen—bamboo airplanes, fake landing strips, wooden radar dishes, and the like—without really having any proper idea what any of the things they were building actually were.
Cargo-cult debugging is when, having seen effective debugging, you imitate the motions, but without having any actual clue what you’re doing or why.
Let’s take a look at some cargo-cult debugging in action.
Two problems, likely unrelated, but including both for completeness since they started at about the same time: car tilts to the left and slightly backwards, and there is a grinding noise coming from the rear-left wheel-well.
Engine belt is likely applying too much torque, resulting in car listing to one side.
- Tried turning on left blinker, since car tends to lean right during left-hand turns.
- Tried having extremely fat man sit in passenger’s seat and drive car via strings attached to steering wheel to balance the frame.
- Attempted to drive car backwards. (Note: visibility extremely hampered in this mode; file bug upstream with manufacturer.)
- Tried covering wheel-well with thick cloth to muffle sound, thereby evening out the frame.
- Tried driving with windows down to change airflow. (Note: test cut short due to sparks flying from rear-left wheel-well and a small resulting fire, which also prevented further testing.)
The structure of a good debugging session is there—the facts, the possible solutions, and a hypothesis—but even being very generous to the tester, their ideas and hypothesis make no sense. You’re going to solve the problem with exactly as much speed and precision as if you played Tetris for an hour and then changed five lines of code at random.
I’ve seen every developer engage in this behavior at some point, and that includes me. And every time, it’s been for one of two very closely related reasons:
- They do not understand the problem.
- They do understand the problem, but the most likely solution involves rewriting some very hard-to-understand code, or some code that they’re personally very attached to.
I’ve got a bug right now that’s right in between these two, and guess what? I’m guilty.
Our internal build of Kiln doesn’t run on 32-bit Windows 2003 systems, even though the 64-bit build works fine, and the 32-bit build works fine on all other Windows platforms we support; just not Windows 2003. The error message Windows generates is beyond unhelpful, and googling it indicates that the problem likely has to do with a bad manifest—something I barely know anything about. They go in resource forks, I modified one in Copilot once to get theming in Windows XP, and I seem to recall XML being involved. That’s it.
Here is the right way to address this bug:
- Learn more about manifests, so I know what a good one looks like.
- Take a look at the one we’re generating for Kiln; see if anything obvious screams out.
- If so, dive into the build system [blech] and have it fix up the manifest, or generate a better one, or whatever’s involved here. This part’s a second black box to me, since the Kiln Storage Service is just a py2exe executable, meaning that we might be hitting a bug in py2exe, not our build system.
- If not, burn a Microsoft support ticket so I can learn how to get some more debugging info out of the error message.
Here’s the first thing I actually did:
- Look at the executable using a dependency checker to see what DLLs it was using, then make sure they were present on Windows 2003.
This is not the behavior of a rational man. But it’s the behavior of a man who’s flirting with the edge of his knowledge and has no desire to screw with the build system: fixing DLL dependencies involves tweaking a line or two in `setup.py`, which is way easier than learning a bunch of new stuff about manifests, diving into how py2exe makes its particular brand of sausage, and then patching it upstream or mucking about on the build server to add post- or pre-build steps as appropriate.
So here’s my call to action: do not engage in cargo-cult debugging. Whenever you’re about to try a debugging session, force yourself to answer the question, Why do I believe this is the most likely solution to the problem? If the answer is, “I have no idea,” or even worse, “It’s not, but it’s easier to check than this other more likely solution,” then keep looking. Is your app slow even though the CPU is basically idle? Increasing the size of the thread pool probably won’t help. Web service occasionally timing out? Your Redis server is unlikely to hold the answer. Occasionally getting your app’s network messages out-of-order? Attacking layer 4 of your networking stack with Wireshark is probably premature.
We developers pride ourselves on being “lazy” in the sense of “developing solutions that minimize the work we have to do.” If you engage in cargo-cult debugging, you’re being intellectually lazy, which is entirely different. Intellectual laziness turns you from a great developer into a mediocre one.
Be a great developer. Don’t engage in cargo-cult debugging.
Let’s set the scene. It’s the summer of 2010. Kiln had been launched into the wild for all of six months, after a grueling year-long, no-revenue sprint to turn my dinky prototype that ran only on my personal laptop into a shipping application that worked both in Fog Creek’s hosted environment and in a gazillion ever-so-slightly-different on-site installations. We’d had all of a few months actually charging people, and were only just barely making a month-to-month profit, let alone having a positive ROI. We were thinking about What Would Come Next, and how to deliver That, and What Would It Look Like, because everyone knows that standing still is death. And me? I was enjoying coding every day to turn the vision in my head into something that our customers could actually use.
But then it happened. I came into work one day, and Ben Kamens, the head of the Kiln and FogBugz teams, called me into his office to tell me that he’d decided to leave Fog Creek and join the Khan Academy.
I felt sick. Tyler, who’d helped found Kiln, felt sick. We left the office in a daze and had a lovely lunch that consisted mostly of beer, then came back and did a half-assed job pretending we were doing work for the rest of the day before we went home.
Part of our despair was personal, but a lot of it was professional. Here’s the thing: you may not know what your team lead does, but if they’re awesome the way Ben was, a huge number of problems get solved without you ever hearing about them, which frees you to work on actually shipping amazing products. So his leaving meant that:
- No one would be shielding us from all of the problems peripheral to actually shipping new versions of Kiln; and
- We were going to be making fewer, crappier releases as a result.
Of course, it wouldn’t actually come to that; one of us was going to learn how to become a team lead so that (provided we were good at it) we’d be able to continue to shield the rest of the team from all that stuff. At the end of the day, I stepped up and took the job.
Six months later I’d be at the end of my rope. I hated my job. I felt like I wasn’t accomplishing anything, I felt like I wasn’t competent at what I was doing, and my attitude deteriorated until my fiancée came home one night to find me physically ill from stress.
The thing is that I didn’t stink at my job. In fact, both my team and the rest of the company have told me that I’ve been a great team lead. The problem I was suffering from is that the only job I’d known at Fog Creek up to that point had nothing to do with the job I suddenly found myself thrown into.
We have a general cultural issue when it comes to coders. It has to do with career paths. Here’s how your typical coder career path works:
- Hey, it looks like you can code! Why not try writing this feature over here? There’s the spec for it; just follow it and you’re golden.
- Whoa, neat! You do a great job writing exactly what I tell you! Here’s a problem customers want solved; can you do that?
- Awesome! That’s totally amazing. I heard you have an idea for something that’d be the bee’s knees, or at least one of its major joints; can you build it? Here’s a team who’ll work with you.
- Insane! Congratulations! You are an amazing coder. You have founded a product that shipped and is in the black! You know what we should totally do with you?
- Based on your long history of dealing with insanely complex technical issues, we’ve decided to make you a manager in charge of five to twenty people, because this is totally both the same skill-set and the same general area of interest as what you had before! Please as to enjoy with maximum intensity!
You spend years of your life focused on building things. Sometimes you have the idea, sometimes other people do. You like it better when you have the idea, because you “know” it’s the right thing, but the point at the end of the day is you build stuff. You build hard stuff sometimes that requires you to talk to and convince five, ten, fifteen, twenty developers that yours is the right way to go about things, which is definitely leadership and kind of looks managery, but you’re still down in the trenches writing code the whole time. Chances are pretty good `hg churn` and `git log` are gonna have your name there at least as much as anyone else’s.
Team leads are different. Your job, should you accept it, is to become what I’ve lovingly dubbed Shit Umbrella. Your goal is to find all of the peripheral stuff involved in getting the product out the door—important stuff, such as making sure the delivery schedule for the new servers makes sense for when you want to ship the product that needs them, or taking customer calls at 11 PM on a Sunday because their account quit working and they want to know why they should keep paying you, or figuring out when doing features the sales and support teams want makes financial sense—and then coming back and presenting a focused direction to all the developers so that they can get the features written without worrying about how they actually ship. You switch from doing the building yourself to enabling others to build stuff on your behalf.
I keenly remember sitting in our General Manager’s office, explaining how I just couldn’t do all of my job responsibilities, and him responding that I probably just wasn’t thinking about my job properly. We sat down and enumerated all of my responsibilities. We came up with five tasks, and then we ordered them by priority.
“Writing code” was priority number five of five. If I had anything I needed to do in one of the other four categories, that had to come first.
Once I accepted what I needed to do, I was both better at it and vastly more relaxed. As a developer, a good day is one where I land commits all day long, or hammer out a blog post that gets people talking, or solve a complex support case. As a team lead, a productive day might consist entirely of phone calls, one-on-ones, and emails that let the rest of your team get their stuff done. If you’re serving as a multiplier for your team’s productivity, then you’re being a great team lead. You need to redefine your success as helping others achieve success.
I learned a lot and became a pretty good Kiln Team Lead over the last couple of years, but several weeks ago, I came to the realization that it wasn’t what I wanted to be doing right now. I missed building things with my own hands, and I wasn’t really learning anything new about leadership, management, or the process of writing software.
At a lot of companies, this is where you’d see me writing something like, “so I’ve decided to leave `$COMPANY`, take a sabbatical, and then join `$OTHER_COMPANY` to find new challenges.” At Fog Creek, that’s not how it works. Joel and Michael have a strong attitude that good developers should be rewarded as developers. When I went to them and told them that I wanted to get new experiences and get back into the writing part of writing software, they were really happy to make that happen.
So today? Coding is priority number one. Or occasionally two or three, but never five of five. I’m the senior-most developer on the Kiln team, and I’m back to writing code in that capacity. At the same time, to keep learning new things, I’m working both with a high school in Queens on their CS curriculum, and with an awesome TechStars company as a mentor providing leadership advice and technical guidance. I’ll hopefully have a few other announcements over the next several months as well.
If you like the sound of an environment that understands developers that way, you should come join me. Me? I’m right where I need to be right now.
I think the point of math class is probably to teach people math, but what many of the best developers I know actually learned in math class was how to program.
Nearly every high school math class I took was really, really boring. Not through the fault of the teachers; they were actually awesome. But I consistently knew just enough to be bored, yet not enough to actually skip the class. At first, I tried to act like I was paying attention, which meant that my face had to be vaguely directed at the teacher, even if I was actually studying the posters on the wall. It was in that mode of thought that I finished off Algebra II able to regurgitate π out to maybe 50 decimal places.1
But that only lasted so long before it occurred to me that, “hey, I know how to program!” combined with “hey, I have a programmable calculator!”2 meant the inevitable “hey, I can program this thing to do my math homework!” So I began writing programs to do factoring, to solve equations, to help rotate ellipses and parabolas by arbitrary angles, and pretty soon reached a point where doing a lot of homework mostly involved firing up the right program, plugging in some numbers, and writing down the result.3
While I’d done a fair amount of programming by this point, my calculator programs were the most exciting to me because I was really using them. Yeah, okay, the what-kind-of-rock-is-this database I made for science fair taught me more about data structures, but these were the first programs where I cared about the user interface, where I spent time refactoring so that extending the thing would be easier, or where I’d spend 20 minutes optimizing just to shave off a second or two.
That’s my story. But I know a lot of developers who have almost the exact same story. Some couldn’t use their programs on assignments; some wrote video games instead of math helpers. But in all of these cases, people started getting passionate about what they were writing. And when I’ve talked to them about why this was the first time they really cared, I always get back the same answer: because for the first time, they had end users they cared about. Themselves.
My high school also had a CS course. I didn’t take it. No one I knew who really liked programming took it, in fact. And those who did take the class universally went on to do something, anything else, when they got to college. What was painful for me was that even as a clueless high school student, I knew why: because the whole damn course was totally irrelevant to them! No bearing on what they were doing at all. Here the programmers-to-be were writing toy programs that they were using daily, even competing to see who could make the fastest, most useful ones. The AP class? Write another three implementations that sort random numbers. Yip-de-frickin’-do.
It’s not that you shouldn’t teach that—and, in fact, the number of applicants here at Fog Creek who don’t grok basic algorithms drives me crazy. But it’s completely the wrong way to introduce people to programming.
Programmers like to program because they can do cool things, or because they can solve problems, or both. It’s both creative and it’s practical. If the goal of a high school course is to get people interested in programming, then the course must build around these two pillars.
You wanna appeal to the first group? Show them Mindstorms so they can show their friends a cool robot. Show them Twilio so they can see how to make a little voice-controlled system. Show them LÖVE so they can put together a simple video game.
You wanna appeal to the second group? I think you could do worse than to introduce them to something like VBScript or AppleScript so that they get a flavor of manipulating the applications they use every day. If the total lack of ideological purity of those two bugs you, then introduce them to some Greasemonkey, or some light shell scripting, or maybe even (gasp!) TI-Basic.4
When you attack programming this way, students get hooked. They become their own critics and their own users, so they genuinely care about improving their work. They’ll naturally hit the limit of what they can do, and the good ones—the ones who are going to go on to become programmers—will naturally want to start learning more of the theory, the underlying technologies and techniques, so they can continue to improve their craft. When those students get to an AP prep class later in the curriculum, they’ll be ready for it and enjoy it, because they’ll understand why it matters. It will have become relevant to them.
I’m lucky right now to be working with Bayside High School in Queens, which is developing a program for CS students that looks a lot like the above, with introductory classes focusing on tangible results students can play with immediately (web applications, little GUIs, and dumpster-diving through massive datasets) rather than wading knee-deep into theory right from the get-go. But there are many, many high schools out there who have nothing remotely like this, who teach programming the same way they teach math. If the high school in your area’s like that, volunteer to teach something more inspiring. The students will love it, and we’ll get more impassioned developers as a result.
1. I can still recite about 25. Why I’ve forgotten about half the digits of π, but can still happily rattle off the advertising jingle to Kanine Krunchies, which isn’t even a real product, is anyone’s guess. ↩
2. Anyone else think Texas Instruments has the best racket ever? ↩
3. My teachers took what in retrospect was an unbelievably forward-looking attitude to this: as long as I was writing all the programs myself, and could explain how they worked, it was fair game. I have no idea what would have happened otherwise. ↩
4. I know some of you want me to mention UserRPL here, but I think it’s time to admit that the HP-48 is the Amiga of graphing calculators. ↩
I should be in the middle of an interview right now. About fifteen minutes into it, in fact. About the part of my interview where we stop talking about awesome stuff the candidate has worked on in the past and start diving into writing some actual code. A stack with O(1) data access that also always knows its maximum, for example. Or perhaps a rudimentary mark-and-sweep garbage collector. It’s usually my favorite part of the interview: I get to see how the candidate thinks, how they process information, how they problem solve, and how they code. The best candidates even teach me something in the process.
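For the curious, that max-tracking stack is a classic warm-up: keep a second stack of running maxima alongside the data, and every operation stays O(1). A minimal Python sketch (not the exact phrasing we use in interviews):

```python
class MaxStack:
    """Stack with O(1) push, pop, and max, via a parallel stack of running maxima."""

    def __init__(self):
        self._items = []
        self._maxes = []  # _maxes[i] is the max of _items[:i+1]

    def push(self, value):
        self._items.append(value)
        # The new running max is the larger of the new value and the old max.
        self._maxes.append(value if not self._maxes else max(value, self._maxes[-1]))

    def pop(self):
        self._maxes.pop()
        return self._items.pop()

    def max(self):
        return self._maxes[-1]

s = MaxStack()
for v in [3, 1, 4, 1, 5]:
    s.push(v)
print(s.max())  # 5
s.pop()
print(s.max())  # 4
```

The interesting part of the interview isn’t the answer; it’s watching how a candidate gets there.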
But I’m not in the middle of an interview right now. I’m in Emacs. And instead of being excited about watching someone solve an interesting problem, I’m upset and full of righteous indignation on behalf of the now-former job candidate.
Non-competes are annoying, but not the end of the world. I can understand why, say, Apple, might be legitimately angry to have a senior iPhone manufacturing executive jump ship to HTC: a large part of the candidate’s value to HTC would be his knowledge of the internals of Apple’s manufacturing process. But very few jobs work like that. And even there, most companies don’t forbid you from working at all for a competitor; they forbid you from working in the same area of expertise. So, for example, maybe the iPhone executive couldn’t work on HTC’s manufacturing operations, but he could still head a software development team.
But Bob’s company decided that, nope, Bob couldn’t come work for us, because the existence of Joel on Software’s careers board made us a direct competitor with them.
Let me tell you something about the Joel on Software careers board: Fog Creek doesn’t even make it. We outsource the whole thing to a little-known company called StackOverflow.
It’s true. The lie, exposed. If you’re in doubt, take a look and notice the admittedly subtle similarities:
Shocking, I’m sure.
Yet based on this, Bob’s company told him that Fog Creek and his firm were direct competitors, and therefore he couldn’t even come to work with us on, say, Kiln.
Here’s the thing: Bob’s a junior in college, and the company doing this to him is one he merely interned at. There is no way that Bob has inside knowledge of how a job board runs that could help us. And even if he somehow did have that knowledge, we don’t even run the job board! There’s no way he could actually give us anything that would help us! But Bob’s company made him turn us down, before we could even interview him, because they were absolutely, utterly terrified that their intern, who is still in college, was so amazing that his coming to work for us could crash their entire company.
I actually feel really bad for Bob’s company. They’re so unsure of their own ideas, so negative on their own potential, that they believe a former intern being physically near a company with an outsourced jobs board would be enough for that company to absolutely crush them. I don’t want to imagine what it feels like to get up every day and face that world. But that’s no reason to inflict such a morose world view on your interns.
So I’m upset on Bob’s behalf. Bob got shafted by a company he interned at. A company that has so little confidence that they’ve decided the best route to their success is to limit Bob’s choices. Limits that mean we miss out on an awesome candidate, and Bob misses out on an awesome job.
I’ve got two real points to make, at the end of the day:
If you’re an intern, don’t sign a non-compete contract. You have absolutely no idea where your life is going to take you, and you don’t want your direction being shaped by one crappy employer. And, trust me on this: no reputable software company I know of (Google, Microsoft, or Fog Creek) makes their interns sign non-compete contracts. You can find a job that won’t make you do that.
If you’re a company, don’t be a vampire. If you’re so scared of everyone else that you believe that you have to give your interns a non-compete contract in order to stay competitive, then guess what? You’re not competitive. Get a better idea.
And based on the quality of their homepage, probably with good reason, but that’s neither here nor there. ↩
I sincerely doubt that the statement “I like Mercurial” will catch anyone who reads this blog by surprise. I brought it to Fog Creek. I evangelized for it on the Fog Creek World Tour. I helped build a whole product around it. I’ve gone to a Mercurial coding sprint, I’ve sent a whizkid to a Mercurial coding sprint, and I’ve even written a few patches (mostly trivial) for Mercurial. So let’s agree that I like Mercurial an awful lot.
Reading crap like this pisses me off.
The question seems innocuous enough at first glance: Why is Mercurial considered easier-to-use than Git? A legitimate, simple question. And in a sane world, perhaps, a fair one.
But that’s not how things work amongst us developers, because we have these utterly inane religious flamewars. Emacs is for octopuses, Vim is for beep mode. Ruby is a language for potheads, Python is for BDSM fetishists. DOS is for PCP-using masochists…okay, that one may be legit, but it’s also kind of moot at this juncture. Point is, there are some topics where developers just cannot have a rational discussion anymore.
Mercurial versus Git is one of them. To save you time, here is every single Mercurial v. Git discussion I’ve read since 2005:
Harry: Mercurial is awesome, because it is easy.
Sally: It’s not easier than Git! You’re just too dumb to see the light!
Harry: Okay but like Git is written in Perl! And it doesn’t run on Windows! And I had my changesets fucking garbage collected once! And the man pages are like 900 pages!
Sally: It’s not, it’s written in C, and it’s ninety bajillion times faster than Mercurial! And I had to screw with its database once! Also Git does too run on Windows, and I lost nine weeks worth of stuff in Mercurial once due to enabling the
But don’t take my word for it; you can go watch the internecine fanboyism play out at Reddit, right now! In response to that crap question! Again! God! I live in Groundhog Day!
You know what?
Both sides are full of shit.
Yeah, I believe that Mercurial is easier to use than Git. But the best example anyone can provide on StackExchange is converting the author names in a pile of commit messages? Really? Because, okay, yes, I’ve done that. Maybe twice. And even if it completely stank, I’d just memorize how to do it and get on with my life.
What about the daily workflow? Is that easier? Why? Is it fewer commands? Is there more protection from you shooting yourself in the foot? Does the protection come with less power? Why do Git users swear by branching, if “it’s in Mercurial since <some low version>”? Why do Mercurial users swear by their branching system? Does Git have an emulation of it? Why or why not? What are the trade-offs here? Do they actually end up mattering?
These questions matter. And they’ve been answered, very eloquently, many times. But for whatever reason, that doesn’t resonate with people. They want to find the weak argument that they can refute, and then that gets traction specifically because it’s weak, and everyone can have the fight all over again.
This stops now.
I have a solution.
Instead of talking about why you are better than the other guy, let’s focus purely on why your system of choice rocks. That’s it. No comparisons allowed.
Here, I’ll get us started. I whipped up I Love Mercurial, a site where you can talk until you pass out about why Mercurial is awesome. Just tweet with the #ilovemercurial hashtag, and we’ll pick it up and post it.
Note: why you love Mercurial. Not why Mercurial is better than X. Not why X causes brain tumors in lab rats and sterilizes your children. Just things that Mercurial does that you love. Talk about ’em on Twitter with the #ilovemercurial hashtag. We’ll post ’em. Then, the next time someone asks you why you like Mercurial, just point them to that site. Or the hashtag. I honestly don’t care. Point is, show them why Mercurial rocks, and not why the other guy sucks.
And then maybe, just maybe, just maybe, at least on this one little topic, the flamewar can die in a pile of love on why my tool is awesome instead of the 92835 deficiencies your tool has that mine doesn’t, and I don’t have to wake up to my alarm clock playing I Got You Babe ever, ever again.
Please remember to update `manners` to `tip` before tweeting.
Good news! That’s no longer the problem.
The problem now is that we’re too successful.
What?! I hear some of you ask. Don’t you want to be too successful?!
I think I speak for my entire team when I say: hell yes! We want to keep on being too successful. If anything, we want to be even more too successful than we currently are.
But success does present problems. It’s awesome to have thousands of customers and terabytes of data, but then you start dealing with the boring details of questions like where do you put those terabytes of data, and how do you get that data to the customers in a timely manner. And that can be a really tough nut to crack.
I’m going to be talking about several different aspects of how we’re handling scaling Kiln over the next few weeks, but today, I want to focus on one single narrow thing: caching.
The WISC Stack
The main part of Kiln that you all know and love—the website—is a fairly typical WISC application. We have a couple of web servers running IIS and the Kiln website, which talk to several SQL Server boxes. The nice thing about well-engineered WISC stacks is that, like LAMP, you can scale the databases and the web servers independently. This is the bread-and-butter of designing a scalable application. So far, so good.
The thing is, just adding more SQL boxes isn’t always the answer. If you have complex queries that take a long time to run, then adding another box won’t help anything. It just gives you another box to run your complex query on at the same slow speed. Even if you’re only doing simple queries, adding more database boxes isn’t necessarily the answer. Good SQL boxes are expensive—doubly so if you’re using a big commercial DB package such as SQL Server or Oracle. While you might be able to afford to buy more, you don’t want to if you can avoid it.
Instead, you should focus on just not hitting the database in the first place.
The S in WISC
It turns out that there were already some mechanisms in place to help with this. We prefetched certain data that we knew we needed on nearly every request (like the full list of repositories), and then used that cache for any other lookups during the request. And LINQ to SQL does a bit of its own per-request object caching in certain situations (such as querying an entity by primary key), so we already had some actual data caching going on.
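In outline, that per-request prefetch pattern works like this. This is a hedged Python sketch with made-up names (`RequestCache`, `fetch_all_repos`); the real code is C# sitting on top of LINQ to SQL:

```python
class RequestCache:
    """Caches lookups for the lifetime of a single request, so repeated
    lookups of the same entity hit the database at most once."""

    def __init__(self, fetch_all_repos):
        self._fetch_all_repos = fetch_all_repos
        self._repos = None  # lazily populated on first use

    def repositories(self):
        if self._repos is None:
            # One query up front; every later lookup is a dict hit.
            self._repos = {r["id"]: r for r in self._fetch_all_repos()}
        return self._repos

    def repo_by_id(self, repo_id):
        return self.repositories().get(repo_id)

calls = []
def fake_fetch():
    calls.append(1)  # count trips to the "database"
    return [{"id": 1, "name": "kiln"}, {"id": 2, "name": "fogbugz"}]

cache = RequestCache(fake_fetch)
cache.repo_by_id(1)
cache.repo_by_id(2)
print(len(calls))  # 1
```

The cache dies with the request, so staleness is never worse than the request itself.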
While that kind of stuff can help, what we really wanted to do was to try to avoid talking to SQL at all for common operations. Those complex queries that Kiln does—things like showing the amalgamated DAG for all related repositories—take a long time to run, but the resulting data doesn’t actually change that often. This is a clear and wonderful win, if we can pull it off.
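The shape we were after is plain cache-aside: derive a key from the query’s inputs, check the cache, and only fall through to SQL on a miss. A Python sketch under loose assumptions; the key scheme, `amalgamated_dag`, and the dict standing in for a Memcache client are all illustrative:

```python
import hashlib
import json

cache = {}  # stand-in for a Memcache client (get/set/delete by key)

def cache_key(prefix, *parts):
    """Build a stable key from the query's inputs."""
    raw = json.dumps(parts, sort_keys=True)
    return prefix + ":" + hashlib.sha1(raw.encode()).hexdigest()

def amalgamated_dag(repo_ids, compute):
    """Cache-aside: return the cached DAG if present, else compute and store it."""
    key = cache_key("dag", sorted(repo_ids))
    result = cache.get(key)
    if result is None:
        result = compute(repo_ids)  # the expensive SQL query
        cache[key] = result
    return result

def invalidate_dag(repo_ids):
    """Called whenever a push lands in any of these repositories."""
    cache.pop(cache_key("dag", sorted(repo_ids)), None)

expensive_calls = []
def compute_dag(repo_ids):
    expensive_calls.append(repo_ids)
    return {"nodes": sorted(repo_ids)}  # pretend this was a slow SQL query

amalgamated_dag([2, 1], compute_dag)
amalgamated_dag([1, 2], compute_dag)  # same key: served from cache
print(len(expensive_calls))  # 1
```

Sorting the inputs before hashing means equivalent queries share one cache entry, and invalidation only has to happen when the underlying repositories actually change.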
Making it Happen
There were two problems we had to solve: where do you cache the data? and how do you get it there?
The second part was much more difficult. Kiln uses LINQ to SQL for its database access layer. That meant we had a problem: LINQ to SQL is a very complex beast, where objects have a database context that in turn is aware of all the objects that it’s managing. If you just grab a random LINQ object and throw it into Memcache, then it is not going to deserialize cleanly. Throw in that we have piles of custom logic in our LINQ-to-SQL-backed models, and you’ve got a recipe for pain.
We ended up solving this in two different ways:
- We modified our models to allow for detaching and reattaching to the database context. Before serialization, the object is detached, so it has no controlling database context. On deserialization, we attach it to the current context. This isn’t as fast as grabbing an attached object out of a cache (such as the old per-request prefetch cache mentioned earlier), but ends up incurring minimal overhead for the common case.
- We also had to modify our models to know that they might not have come from the DB. We rely heavily on signal handlers to make changes in a given model class propagate to all the parts of Kiln that need to be notified. These were firing erroneously as the deserialization code set piles of properties. The fix we came up with was to suppress signals for deserializing objects—which, since most of our model modifications are done by T4 templates anyway, was very easy to do in a DRY manner.
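Put together, the two model changes look roughly like this. This is a Python sketch of the idea only; the production code is C# generated from T4 templates, and the names here (`FakeContext`, `fire_changed`) are invented for illustration:

```python
class FakeContext:
    """Stand-in for a LINQ to SQL DataContext that receives change signals."""

    def __init__(self):
        self.signals = []

    def fire_changed(self, obj, prop):
        self.signals.append(prop)

class Entity:
    """Model object that can detach from its context for serialization and
    suppress change signals while deserialization replays its properties."""

    def __init__(self, context):
        self._context = context
        self._deserializing = False

    def detach(self):
        self._context = None      # no back-reference: safe to serialize

    def attach(self, context):
        self._context = context   # rebind to the current request's context

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        # Ordinary property writes fire a change signal, but not while
        # we're rebuilding the object from the cache.
        if not name.startswith("_") and not self._deserializing and self._context:
            self._context.fire_changed(self, name)

ctx = FakeContext()
e = Entity(ctx)
e.name = "kiln"            # ordinary write: signal fires
e._deserializing = True
e.name = "kiln (cached)"   # deserialization replay: suppressed
e._deserializing = False
print(ctx.signals)         # ['name']
```

Without the suppression flag, every object pulled out of the cache would spuriously notify the rest of the app as its properties were set one by one.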
With these two changes, we were able to reliably store LINQ database entities in Memcache, get them back out, and work with them.
It was easy enough to verify that the number of queries was down, but would the caching code make a real difference?
I think this graph of what load looks like on one of our DB boxes, before and after the caching deployment, says more than I could in several paragraphs of text:
We’ve cut the amount of data we’re getting from SQL by 75%.
The benefits we’re seeing are already impressive. We have faster load times and less DB traffic. But we can still do a lot more: now that we have the outlines of a caching framework, we can continue to profile and to move more of our expensive queries into the cache. Based on what we’ve seen so far, this should yield immediate and real benefits to our Kiln On Demand customers.