Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A D&D.Sci Dodecalogue, published by abstractapplic on April 12, 2024 on LessWrong.
Below is some advice on making D&D.Sci scenarios. I'm mostly yelling it in my own ear, and you shouldn't take any of it as gospel; but if you want some guidance on how to run your first game, you may find it helpful.
1. The scoring function should be fair, transparent, and monotonic
D&D.Sci players should frequently be confused, but about how to best reach their goals, not the goals themselves. By the end of the challenge, it should be obvious who won[1].
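A fair, transparent, monotonic scoring rule might look like the following sketch. The names (`POTION_VALUES`, `score_loadout`) are invented for illustration, not from any real scenario; the point is that once the hidden values are revealed, any player can recompute their score by hand, and strictly better choices always score strictly higher.

```python
# Hypothetical scoring rule (all names illustrative).
# The value table is hidden during play and revealed at the scenario's end.
POTION_VALUES = {"healing": 10, "clarity": 7, "luck": 3}

def score_loadout(chosen):
    """Total value of the chosen potions.

    Monotonic: adding a positive-value potion never lowers the score,
    so a strictly better loadout always scores strictly higher.
    """
    return sum(POTION_VALUES[p] for p in chosen)

# After the reveal, players can verify their own scores:
score_loadout(["healing", "clarity"])  # 17
```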
2. The scoring function should be platform-agnostic, and futureproof
Where possible, someone looking through old D&D.Sci games should be able to play them, and easily confirm their performance after-the-fact. As far as I know, the best way to facilitate this for most challenges is with an HTML/JS web interactive, hosted on GitHub.
3. The challenge should resist pure ML
It should not be possible to reach an optimal answer just by training a predictive model and looking at the output: if players wanted a "who can apply XGBoost/TensorFlow/whatever the best?" competition, they would be on Kaggle. The counterspell for this is making sure there's a nontrivial amount of task left in the task after players have good guesses for all the relevant response variables, and/or creating datasets specifically intended to flummox conventional use of conventional ML[2].
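One way to leave "task in the task" is a constraint that sits on top of the predictions, so even a perfect model doesn't hand players the answer. A toy sketch (payoffs, costs, and the budget are all invented): the obvious greedy pick built from perfect per-item predictions is beaten by a combination the greedy approach never considers.

```python
# Illustrative only: even with perfect per-item predictions, a budget
# constraint leaves a genuine optimization problem after the ML step.
import itertools

predicted_payoff = {"A": 9, "B": 8, "C": 7, "D": 2}  # output of a "perfect" model
cost = {"A": 6, "B": 5, "C": 4, "D": 1}
BUDGET = 10

# Exhaustive search over affordable combinations.
best = max(
    (combo
     for r in range(len(cost) + 1)
     for combo in itertools.combinations(predicted_payoff, r)
     if sum(cost[i] for i in combo) <= BUDGET),
    key=lambda combo: sum(predicted_payoff[i] for i in combo),
)
# Greedy "highest payoff first" takes A (cost 6) then C (cost 10 total)
# for payoff 16; the true optimum is B+C+D (cost 10) for payoff 17.
```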
4. The challenge should resist simple subsetting
It should not be possible to reach an optimal answer by filtering for rows exactly like the situation the protagonist is (or could be) in: this is just too easy. The counterspell for this is making sure at least a few of the columns are continuous, and take a wide enough variety of values that a player who attempts a like-for-like analysis has to - at the very least - think carefully about what to treat as "basically the same".
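The continuous-column counterspell can be seen in a few lines. In this toy example (column names and distributions invented), exact-match filtering on a continuous column returns nothing, so a player is forced into the "basically the same" judgement call the text describes:

```python
# Toy demonstration: exact-match subsetting fails on continuous columns.
import random

random.seed(0)
rows = [{"strength": random.uniform(0, 100), "reward": random.uniform(0, 10)}
        for _ in range(100_000)]

target = 42.0
# Almost surely empty: no row's continuous value lands exactly on 42.0.
exact = [r for r in rows if r["strength"] == target]
# Non-empty, but the window width is a judgement call the player must make.
nearby = [r for r in rows if abs(r["strength"] - target) < 1]
```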
5. The challenge should resist good luck
It should not be plausible[3] to reach an optimal answer through sheer good luck: hours spent poring over spreadsheets should not give the same results as a good diceroll. The counterspell for this is giving players enough choices that the odds of them getting all of them right by chance approach zero. ("Pick the best option from this six-entry list" is a bad goal; "Pick the best three options from this twenty-entry list" is much better.)
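The arithmetic behind that parenthetical is quick to check: a blind guess succeeds one time in six on the six-entry list, but only one time in C(20, 3) = 1140 on the three-of-twenty version.

```python
# Back-of-envelope check of the guessing odds in the example above.
import math

p_six_entry = 1 / 6                       # "pick the best option from six"
p_three_of_twenty = 1 / math.comb(20, 3)  # "pick the best three from twenty"

# math.comb(20, 3) == 1140, so the second goal is roughly 190x
# harder to luck into than the first.
```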
6. Data should be abundant
It is very, very hard to make a good "work around the fact that you're short on data" challenge. Not having enough information to be sure whether your hypotheses are right is a situation which players are likely to find awkward, irritating, and uncomfortably familiar: if you're uncertain about whether you should give players more rows, you almost certainly should. A five- or six-digit number of rows is reasonable for a dataset with 5-20 columns.
(It is possible, but difficult, to be overly generous. A dataset with >1M rows cannot easily be fully loaded into current-gen Excel; a dataset too large to be hosted on GitHub will be awkward to analyze with a home computer. But any dataset which doesn't approach either of those limitations will probably not be too big.)
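For scale, a six-digit row count with a handful of columns stays comfortably inside both limits. A rough stdlib-only sketch (column names and the generating process are invented) of producing such a dataset and checking its size:

```python
# Rough sketch: generate a six-digit-row dataset and check its footprint.
import csv
import io
import random

random.seed(1)
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["dose", "duration", "outcome"])  # illustrative columns
for _ in range(100_000):  # six-digit row count, per the guideline above
    dose = random.uniform(0, 10)
    duration = random.randint(1, 30)
    outcome = dose * duration + random.gauss(0, 5)  # toy relationship + noise
    writer.writerow([round(dose, 2), duration, round(outcome, 2)])

size_mb = len(buf.getvalue()) / 1e6  # a couple of MB: trivial to host and load
```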
7. Data should be preternaturally (but not perfectly) clean
Data in the real world is messy and unreliable. Most real-life data work is accounting for impurities, setting up pipelines, making judgement calls, refitting existing models on slightly new datasets, and noticing when your supplier decides to randomly redefine a column. D&D.Sci shouldn't be more of this: instead, it should focus on the inferential and strategic problems people can face even when datasets are uncannily well-behaved.
(It is good when players get a chance to practice splitting columns, joining dataframes, and handling unknowns: however, these subtasks should not make up the meat of a challenge.)