Wednesday, March 20, 2019

Simulations


TIME TO STRATEGY EXECUTION: 103 DAYS

“You’re just wrong!” data analyst Rico Sanguini said for the third time at the impromptu test team meeting called by team lead Caleb Tosner to discuss my progress in troubleshooting failures traced to Sanda the AI. He was referring in this case to my hypothesis that Sanda’s inconsistent reporting of a sports statistic was based on its interpretation of the reported numbers as simulations. “First,” he continued, “sports statistics are among the most reliable ones we can get and are demonstrably based on direct observation. Second, distinguishing reality from simulation is the most basic test we give it, which it’s consistently passed hundreds of times going back to when it first went live. Third and most important, we can find no sources that give the second, inaccurate number for that stat.”

I had done a similar search for sources. “No one is reporting it,” I acknowledged, “but that doesn’t mean Sanda is wrong.” Sanguini glared at me like I was stupid. “One of Sanda’s first and most basic rules is: ‘Question everything and fully accept nothing, because reality is always subject to interpretation.’ That’s a variation of something Sanda shared with me outside of its job here, suggesting that the rule is also basic to its own operation. Following that rule, Sanda would have sought out raw observations behind the statistic after the second question cast doubt on it. I believe the second number was the result of analyzing those observations or a slightly different set, which to Sanda was functionally a simulation because the relevant variables had to be accounted for, and because the implied goal was to predict future behavior. Sanda would have reasonably judged the first report to be the result of another such simulation, done differently.”

Tosner asked if I had found supporting evidence, such as the raw data Sanda used, and whether I had applied the rules to the related cases and found a similar explanation. “The first exercise would be pointless, don’t you think?” I replied to the first part. “Only Sanda could tell us what it used, and that’s impossible.” Again, I had that feeling of criticizing a dead person who couldn’t defend herself. “Each of the other cases could plausibly be explained by applying the same rule and possibly one or two others. There is one other thing I should note, though, which may or may not be relevant. I interviewed all of the originators, and every one of them was in or near the break room when they spoke with Sanda; and Maura was there with them, probably because her office is nearby.”

We had a broader conversation about the application of the rules to the entire strategy. I could see Tosner softening his opinion that they were too general to be useful, while growing frustrated with the added work required and the uncertainty it added to the results. At the end, he summarized the team’s consensus opinion by laying out the next steps. “Let’s all go back and study this, and meet tomorrow with ideas about how to game it out. Meet with your test environment contacts and I’ll meet with their leads to identify the range of behaviors and outcomes that we can directly test. We can’t count on Sanda being brought back online in time to help organize the whole test, so we’ll have to do it ourselves, and as soon as possible.”

Reality Check


As I understand it, testing a strategy typically involves simulation in the form of “gaming,” which is quite different from testing the equipment and software that I have had direct experience with.

Related to that experience, my favorite approach is to exercise whatever I’m testing in a process of documented discovery before developing a test plan, so that I can understand the real variables that determine behavior and then compare that understanding to the intended design. Viewing pass/fail observations as features on a metaphorical map that I build from experience helps me assess the context for both success and failure, which speeds up troubleshooting when it is necessary and helps in dealing with the anomalies (unexpected behaviors) that inevitably appear after deployment.

In this imaginary situation, a simulation is being tested by a simulation, with a very limited set of direct observations as inputs (never mind the fact that it is entirely the product of two simulations: a numerical one, and “gaming” inside my skull). Awareness of this suggested that an artificial intelligence would be an invaluable tool in such a world, and my experience forced consideration of how to succeed if that tool was suddenly unavailable.
