Thursday, March 14, 2019

Errors

TIME TO STRATEGY EXECUTION: 109 DAYS

The logical first step in my contribution to troubleshooting the artificial intelligence tool Sanda was to learn about the strategy inconsistencies found by the test team. I spent most of yesterday morning being briefed by the head of the team, Caleb Tosner, who is about my age and a lot smarter. 

I can’t go into detail for security reasons, but there were generally three types of what the team’s test plans classify as “critical errors.” 

The most consequential error had to do with changing the way people make economic decisions. Tosner explained, “A typical region is the size of a very large city, so we have to start with what’s already in place. Governments and businesses have historically tended to promote growth in revenue, but they will have to replace that with growth in nature, and without any way to pay for it. The AI was supposed to use the national strategy inputs and models of law, psychology, and behavior to develop a set of agreements that people and organizations could make, both within and between regions, to enable that.” He displayed a short, bulleted list on his conference room screen. “This is what Sanda gave us. It is essentially gibberish, and inconsistent with both the inputs and the models. Our testing includes implementing relevant strategy components in small cities populated with volunteers and then observing the results. No one knew what to do with this, or these.” He replaced the list with ten more in rapid succession, each a set of generalities rather than specifics about a different policy requirement.

“That’s the first type, agreements,” he told me. “The second type is projected conformance with target measurables like population, ecological impact, and life expectancy.” He displayed two graphs side by side, each with a time series of the three variables he had mentioned. The graph on the left was clearly the goal, while the one on the right showed a population that was too high and a per-capita impact that was a little low toward the end. “This was the easiest failure I’ve ever seen in a test,” he said, disgusted. “The AI explained that the reduction in population could not be justified, and that the difference in ecological impact was too small to be worth changing. Can you believe that? The AI even signed off on the targets at the beginning, after doing most of the work of deciding what they should be. My teenager has better excuses!” He added that the test team was re-running the numbers to make sure the projections reflected the design and agreed-to assumptions in the strategy, and was planning to re-evaluate those assumptions as time permitted, using recent research results and observations from the test cities, some of which were in the middle of major environmental remediation.
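
To make sure I understood the mechanics of that second type of check, I sketched a toy version of it afterward: compare each projected measurable against its target, year by year, and flag anything outside a tolerance band. Every variable name, number, and tolerance below is a placeholder I invented for my own notes; the real data and thresholds are exactly the kind of detail I can’t share.

```python
# Toy conformance check: flag projected values that drift outside a
# per-variable tolerance band around the target. All names, numbers,
# and tolerances are invented placeholders, not the team's test data.

TOLERANCES = {            # allowed relative deviation from target
    "population": 0.02,
    "ecological_impact": 0.05,
    "life_expectancy": 0.03,
}

def conformance_failures(targets, projections, tolerances=TOLERANCES):
    """Return {variable: [(year, target, projected), ...]} for out-of-band points."""
    failures = {}
    for var, tol in tolerances.items():
        for (year, goal), (_, proj) in zip(targets[var], projections[var]):
            if goal and abs(proj - goal) / abs(goal) > tol:
                failures.setdefault(var, []).append((year, goal, proj))
    return failures

# Invented example: population ends too high, impact ends a little low.
targets = {
    "population": [(2040, 1.00), (2050, 0.95), (2060, 0.90)],
    "ecological_impact": [(2040, 1.00), (2050, 0.80), (2060, 0.60)],
    "life_expectancy": [(2040, 80.0), (2050, 82.0), (2060, 84.0)],
}
projections = {
    "population": [(2040, 1.00), (2050, 0.97), (2060, 0.96)],
    "ecological_impact": [(2040, 1.00), (2050, 0.79), (2060, 0.57)],
    "life_expectancy": [(2040, 80.0), (2050, 82.0), (2060, 84.0)],
}

for var, points in conformance_failures(targets, projections).items():
    for year, goal, proj in points:
        print(f"{var} off target in {year}: projected {proj} vs target {goal}")
```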

Finally, there was variability. During the week before the server crash, Sanda was giving different answers to the same questions. Technically this didn’t indicate a problem, since the questions had nothing to do with the work and might leave room for interpretation, but some members of the test team flagged it as an anomaly worth investigating (a “potential bug” in their jargon), and it was getting a lot more attention in light of the other errors. “This looks like a good one for you to start with,” Tosner told me toward the end of the briefing, and I had to agree.
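
Since the variability anomaly is where I’ll be starting, I also jotted down the simplest possible version of the check the team described: group the logged answers by question and flag any question that received more than one distinct answer. The log format and entries here are made up for my own notes, not the team’s data.

```python
# Toy variability check: report questions that got more than one distinct
# answer in a log of (timestamp, question, answer) records. The log format
# and contents are invented placeholders.

from collections import defaultdict

def inconsistent_questions(log):
    """Return {question: set_of_answers} for questions with conflicting answers."""
    answers = defaultdict(set)
    for _, question, answer in log:
        answers[question.strip().lower()].add(answer.strip())
    return {q: a for q, a in answers.items() if len(a) > 1}

# Invented example log from the week before the crash.
log = [
    ("Mon", "What is your favorite color?", "Blue"),
    ("Wed", "What is your favorite color?", "Green"),
    ("Fri", "How should targets be set?", "By consensus within each region."),
]

for question, answers in inconsistent_questions(log).items():
    print(f"Potential bug: {len(answers)} different answers to '{question}'")
```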

Reality Check


In lieu of a real artificial intelligence tool to mimic, I’m imagining Sanda as a person with reasonable limitations and strengths, represented by placeholders for what I don’t know or have.

I continue refining the simulation to better model what the imaginary world (and ours) might encounter, and how it might decide to handle issues suggested by the output.
