Friday, March 25, 2016

Metabolism Probability Insomnia


One might think that I might try to get more sleep during Spring Break. I tried. It just didn’t work. (I have occasional insomnia.)

It was my own fault. I had been thinking about one of the rule modifications for the game I’m playtesting. It is called Bios Genesis and designed by Phil Eklund of Sierra Madre Games. (Planned release is Fall 2016.) For a very brief overview of the game, you can read this earlier post that describes how my research, a boardgame, watching TV, and thinking about time travel merged into a wild idea. But now to the matter at hand: I was thinking about the probability of rolling triples given N dice.

But first, a little context is needed. The object of the game is to create life, sustain it, and hopefully evolve into a thriving organism. The currency (how you pay for upgrades) in the game comes through catalysts. In the early stages of the game, catalysts are obtained via dice rolls that allow you to cycle nutrient molecules. Basically you are trying to set up a robust autocatalytic cycle, which fits well with some origin-of-life scenarios. At some point, the autocatalytic cycle can be evolved into simple life – microorganisms. Once they arrive on the scene, each microorganism has to perform a Darwin Roll. The number of dice rolled depends on how many “chromosomes” the organism contains. The aptly named chromosomes are colored cubes in this game. The four colors represent abilities in four areas: metabolism (red), specificity (yellow), entropy (green) and heredity (blue).

If your organism has red cubes signifying its metabolism, you gain catalysts when you roll ones. But you might not have red cubes, or you might be unlucky and not roll ones. (Such is life.) In some playtests, there was a significant shortage of catalysts in the early stages. Thus a rule was suggested that for every triple rolled, your organism also receives a catalyst. This led to a few games that were flush with catalysts, while others might still be relatively lean. It also depended on whether players were more cooperative or conversely more competitive. Clearly the more dice rolled, the higher your chance of getting triples. With this new rule, an organism has better metabolism if it has more chromosomes, not just the red ones. But how does it scale?

[Warning: Some Math Ahead]

This is an easy problem, I thought to myself. Note to self: Don’t work on such exciting things shortly before going to bed. (Hence, the title of my blog post.) Here’s my reasoning. If three dice are rolled, the number of possibilities is 6^3 or 216. There are just six ways of getting a triple (all ones, all twos, all threes, all fours, all fives, all sixes). So the probability is 6/216 or 2.8%. Another way of calculating this is 6 x (1/6)^3 = 6/216. The probability of rolling one particular triple is (1/6) x (1/6) x (1/6) but there are 6 ways you could do this. Or you could say that it doesn’t matter what you roll for the first die so this is 6/6, but after it is rolled the other two dice must match if this is to be a triple, i.e., (1/6)^2 = 1/36 or 2.8%.

What if you roll four dice? Now the total is 6^4 or 1296. As long as you have three dice equal, the fourth one shouldn’t matter. So now you have 4 x 6 x 6 x (1/6)^4 where the factor of four is because any one of the four dice could be the one that doesn’t matter, one factor of 6 is because this die could be any number from one to six, and the other factor of 6 is for the six possible triples. This yields 144/1296 or 11.1%. (This turns out to be wrong, but I hadn’t realized it yet.) Note that you could also have written this as 4 x 6^2/6^4 or 4 x (1/6)^2.

How about five dice? Well, now you have two dice that don’t matter so that gives you 6^3/6^5 but there should be 10 ways that you could do this, i.e., the dice that don’t matter are (1,2), (1,3), (1,4), (1,5), (2,3), (2,4), (2,5), (3,4), (3,5), (4,5) where the numbers refer to the 1st, 2nd, 3rd, 4th or 5th dice as the ones that don’t matter. So the probability would be 10 x 6^3/6^5 = 2160/7776, i.e., 27.8%. (This will also turn out to be wrong.) This could also have been written as 10 x (1/6)^2 = 10/36. At this point, I think I see a pattern. For N dice, you can calculate the probability by multiplying two factors. The first is N!/(3!(N-3)!) and the second is 6^(N-2)/6^N.

I am led astray because back in week 3 of the semester, I was teaching students the Boltzmann distribution. A good way to count the number of ways a particular arrangement of particles can take, as they spread themselves in different quantum states, is to count marbles in boxes. If you are tossing N marbles into boxes and the relative box sizes are given by g_1, g_2, g_3, … and the number of marbles in the respective boxes are N_1, N_2, N_3, … then you can calculate the number of arrangements W using the formula below. (In a quantum model, g is the degeneracy of the particular state.)
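Assuming the formula meant here is the standard textbook count for distinguishable marbles (an assumption, since the text only names the symbols W, N, g and the box occupancies), it can be written as:

```latex
W = N! \prod_{i} \frac{g_i^{N_i}}{N_i!},
\qquad \sum_{i} N_i = N
```

With every g_i set to 1 and only two occupied boxes holding 3 and N-3 marbles, this reduces to N!/(3!(N-3)!), the binomial coefficient.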

Assuming fair dice, this sort of resembles a six-box problem with boxes of equal sizes. Hence, all the values of g are 1. The number of marbles is the number of dice rolled. That factor of 4 that I used for four dice and the factor of 10 that I used for 5 dice, well those are just 4!/3!1! and 5!/3!2! which fits with N!/3!(N-3)! and sort of resembles the ratio shown in the formula above. This ratio is part of Pascal’s Triangle. I have highlighted the relevant section in the figure below.

It hasn’t taken me much time to get to this point and I am feeling very pleased with myself. (This is what happens when you get sloppy.) I now merrily plug numbers with increasing N. When I get to N = 7, the probability is 0.972. This can’t be right. I go ahead and plug in N = 8 and sure enough, I get a number larger than 1 and I know my simple formula is dead wrong. Here’s the table below.
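The runaway behaviour is easy to reproduce. The sketch below simply re-evaluates the two factors from the (incorrect) formula in the text for N = 3 to 8; the function name is illustrative and the output is recomputed here, not copied from the original table.

```python
from math import comb


def naive_triple_probability(n):
    """Evaluate the flawed formula from the text: C(n, 3) * 6^(n-2) / 6^n."""
    factor_1 = comb(n, 3)               # ways to pick which three dice form the triple
    factor_2 = 6 ** (n - 2) / 6 ** n    # always 1/36, independent of n
    return factor_1, factor_2, factor_1 * factor_2


if __name__ == "__main__":
    print(" N  Factor 1  Factor 2  Product")
    for n in range(3, 9):
        f1, f2, p = naive_triple_probability(n)
        print(f"{n:2d}  {f1:8d}  {f2:8.4f}  {p:7.3f}")
```

Factor 1 passes 36 at N = 8 (it is 56 there), which is where the product first exceeds 1.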

You would think I’d have figured this out earlier simply by seeing that Factor 2 is unchanged at (1/36) and since Factor 1 is simply an increasing multiplicative integer, once it passes 36 (and it does so very quickly), this becomes nonsense. It’s time for bed. I’m tired. And I’m somewhat dejected by my own idiocy. Why did I think it was going to be that simple?

I go to bed. Maybe 3 hours later I’m awake. But now my mind can’t stop working on the problem. I’m trying to get back to sleep but my mind has shifted gears to try a simpler problem: rolling doubles rather than triples. If you roll just two dice, then the probability is 6^1/6^2 = 1/6. That’s easy. What if you roll three dice? Is it 3 x 6^2/6^3 or 3 x 1/6 using similar reasoning as I did earlier? That would yield 108/216 or 50%. What if you roll four dice? That would be (4!/(2!2!)) x (1/6) which is exactly 1. That’s clearly wrong. You should only hit a probability of one when rolling seven dice because you could roll (1,2,3,4,5,6) with six dice. I’m still doing this in my head lying in the dark. Okay, I go back to the three dice problem. I imagine all the possibilities in a matrix. The 216 possibilities can be written as a 3 x 36 matrix that I can systematically populate. And if I think of the 3 as xyz coordinates in Cartesian space, I can imagine a cube of length six. When x and y are equal (a double is rolled), then z can take any value. This leads to a plane parallel to z and along the diagonal y=x. There should be two similar planes for x=z and y=z. These planes intersect each other and that’s why my earlier formula didn’t work. I must have double-counted, triple-counted, etc., the intersections! Unfortunately I’m stuck at this point in the dark since I’m not very good at trying to visualize these three intersecting planes. Worse, even if I do figure it out, I probably can’t do the four-dimensional problem representing rolling four dice. I try to think about something else and eventually get back to sleep.

In the morning before going into work I sketch out the three planes and I can guess the intersection. But this is going to be more problematic for higher dimensions. I start writing out sequences (over breakfast) to get a sense of where I might be double-counting and I’m able to quickly figure out that each intersecting plane has double-counted 6 possibilities so instead of 3 x 36 = 108 in the numerator, it should be 36 + 30 + 30 = 96. The probability is 96/216 or 44%. I make a quick stab at the four-dice problem and it’s clear that much larger chunks of the matrix are being double-counted, but I don’t have the time or patience to figure it out. I’ve clearly learned that it’s not going to be so easy.
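The 96/216 figure can be checked in two ways: by listing all rolls, or by the complement rule (one minus the chance that every die shows a different face). The sketch below does both. The function names are made up for illustration, and the complement shortcut is a standard technique swapped in here, not something used in the reasoning above.

```python
from itertools import product
from math import perm


def count_rolls_with_pair(n):
    """Count ordered rolls of n dice in which at least two dice match."""
    return sum(1 for roll in product(range(1, 7), repeat=n)
               if len(set(roll)) < n)


def pair_probability(n):
    """Complement rule: 1 - P(all n dice show different faces)."""
    return 1 - perm(6, n) / 6 ** n   # perm(6, n) is 0 once n exceeds 6


if __name__ == "__main__":
    print(count_rolls_with_pair(3), "of", 6 ** 3)   # 96 of 216
    for n in range(2, 8):
        print(n, round(pair_probability(n), 4))
```

For three dice the complement is 6 x 5 x 4 = 120 rolls with no match, leaving 216 - 120 = 96. The probability only reaches exactly 1 at seven dice, as argued above.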

I get busy and leave the problem for several days. I contemplate writing a simple script that generates N random integers from 1 to 6 (for N dice rolls) and then checks to see if a triple is rolled. I could then run maybe a million trials and get some statistics. The problem is that 6^N grows very quickly. When N=10, 6^N is about 60 million, so I’d have to sample much more. Not only that, I’d need to go look for a better random number generator than the simple rand( ) function or its equivalent.
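For what it is worth, the sampling idea might look like the sketch below. It is only an illustration of the approach contemplated above; the function names, trial counts and seed are arbitrary choices. It uses Python's standard `random` module, whose default generator is the Mersenne Twister.

```python
import random
from collections import Counter


def has_triple(roll):
    """True if any face appears at least three times in the roll."""
    return max(Counter(roll).values()) >= 3


def estimate_triple_probability(n_dice, trials=100_000, seed=None):
    """Monte Carlo estimate of P(at least one triple) when rolling n_dice dice."""
    rng = random.Random(seed)   # a seed makes the run reproducible
    hits = sum(
        has_triple([rng.randint(1, 6) for _ in range(n_dice)])
        for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    for n in (3, 5, 7):
        print(n, estimate_triple_probability(n, seed=2016))
```

One point in favour of sampling: the standard error of such an estimate goes as sqrt(p(1-p)/trials), which depends on the number of trials rather than on 6^N, so a million trials would already pin the answer down to about three decimal places.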

Today I decide that I’m going to do this systematically. I’m a lazy and lousy coder. Hence I write a short script that generates the 3 x 6^N matrix. I then write a second script that goes in and checks in each case how many triples are in each combination. This might be useful later because in the actual game, rolls of fives and sixes could cause an error catastrophe. If errors exceed the number of blue (heredity) chromosomes, your organism suffers atrophies. Yellow (specificity) chromosomes allow you to reroll some of the dice. And some mutations (when DNA has evolved) confer additional stability whereby only sixes cause errors. Given that 6^N starts to blow up exponentially and my script is inefficient with lots of I/O writing out files with 6^N lines, you can imagine that this bogs down after a bit on my laptop. I could submit the job to my computational cluster at work but this doesn’t seem right. Anyway in my playtests so far, it’s not often that you have to roll more than ten dice, so I let my laptop work while I do something else.
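A minimal sketch of the same brute-force idea is shown below. It is not the pair of scripts described above: it walks through the 6^N rolls in memory with `itertools.product` rather than writing them to files, and it only counts rolls containing at least one triple. The function name is hypothetical.

```python
from collections import Counter
from itertools import product


def count_triple_rolls(n_dice):
    """Enumerate all 6^n_dice ordered rolls; count those with at least one triple."""
    total = 6 ** n_dice
    with_triple = sum(
        1
        for roll in product(range(1, 7), repeat=n_dice)
        if max(Counter(roll).values()) >= 3   # some face appears 3+ times
    )
    return with_triple, total


if __name__ == "__main__":
    for n in range(3, 7):
        hits, total = count_triple_rolls(n)
        print(f"N={n}: {hits}/{total} = {hits / total:.3f}")
```

For four dice this gives 126/1296 (about 9.7%) rather than the 144/1296 from the earlier formula, and for five dice 1656/7776 (about 21.3%) rather than 2160/7776. The running time still grows as 6^N, so this stays practical only for small N.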

Here are the results for N=3 to 11 for triples.
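The same probabilities can be recomputed without enumerating anything, by counting the rolls in which no face appears more than twice and taking the complement. The sketch below is an alternative calculation with made-up function names; its output is recomputed here and is not the original results table.

```python
from fractions import Fraction
from math import comb


def rolls_without_triple(n_dice, faces=6, max_repeat=2):
    """Count ordered rolls of n_dice dice in which no face shows more than max_repeat times."""
    # ways[j] = number of ways to fill j of the n_dice positions using the faces handled so far
    ways = [0] * (n_dice + 1)
    ways[0] = 1
    for _ in range(faces):
        new_ways = [0] * (n_dice + 1)
        for used, count in enumerate(ways):
            if count == 0:
                continue
            for c in range(max_repeat + 1):          # this face appears c times
                if used + c <= n_dice:
                    # choose which of the still-empty positions show this face
                    new_ways[used + c] += count * comb(n_dice - used, c)
        ways = new_ways
    return ways[n_dice]


def triple_probability(n_dice):
    """Exact P(at least one triple) as a fraction."""
    return 1 - Fraction(rolls_without_triple(n_dice), 6 ** n_dice)


if __name__ == "__main__":
    for n in range(3, 12):
        print(f"N={n:2d}  P(triple) = {float(triple_probability(n)):.4f}")
```

This gives about 36.7% for six dice and about 54.1% for seven, so seven is the first N that crosses one half. With 13 or more dice a triple is guaranteed.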

Looks like 7 dice get you to at least half a chance of getting a triple. I haven’t done a further analysis taking into account rerolls. A fair assumption is that if able, the player will try to reroll fives and sixes to avoid errors. Assuming success this reduces the player’s final number of triples by a third. I’ll give feedback to the designer who can decide if the rule stays or goes.

And that’s how my episode of Metabolism Probability Insomnia transpired.
