Using JavaScript generators in the REPL to create interactive auditing tools for analyzing production data and comparing different code paths
I’ve lately been doing a ton of auditing on production data (mainly through a REPL), and when an audit turns up enough failing cases, it can be a real pain to go through all of the entries and examine them one by one.
My usual process for an audit is something like this:
1. Come up with a thesis of what’s right and what’s wrong
2. Write a function that verifies whether a particular piece of data is right or wrong
3. Query for all of the data I need to audit (usually a DB query)
4. Run the queried data through the function
5. Report the results
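The five steps above can be sketched as a tiny pipeline. This is a minimal sketch under assumptions: the `Order` shape, the `isValid` rule and the hard-coded rows returned by `queryOrders` are hypothetical stand-ins for your own thesis and your real DB query.

```typescript
// Hypothetical record shape; substitute whatever your query returns.
interface Order {
  id: number;
  subtotal: number;
  tax: number;
  total: number;
}

// Steps 1 and 2: the thesis, encoded as a predicate.
// Hypothetical rule: total should equal subtotal + tax.
function isValid(order: Order): boolean {
  return order.total === order.subtotal + order.tax;
}

// Step 3: hypothetical stand-in for the real query (usually a DB call).
function queryOrders(): Order[] {
  return [
    { id: 1, subtotal: 100, tax: 8, total: 108 },
    { id: 2, subtotal: 50, tax: 4, total: 50 },
    { id: 3, subtotal: 20, tax: 2, total: 22 },
  ];
}

// Step 4: run every row through the check and keep the failures.
function audit(rows: Order[]): Order[] {
  return rows.filter((row) => !isValid(row));
}

// Step 5: report.
const rows = queryOrders();
const failures = audit(rows);
console.log(`${failures.length} of ${rows.length} rows failed`);
```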
Steps 4 and 5 are surprisingly annoying. The most rudimentary way I’ve pulled data without any special tools is to console.log the results in CSV/TSV format, then copy and paste them into a Google Sheet or something similar.
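A minimal sketch of that console.log-to-spreadsheet step, assuming flat rows of strings and numbers. The `toTsv` helper is hypothetical; it replaces tabs and newlines inside values with spaces so a stray character can’t shift the columns when the output is pasted into a sheet.

```typescript
type Row = Record<string, string | number>;

// Builds a header line from the first row's keys, then one line per row.
function toTsv(rows: Row[]): string {
  if (rows.length === 0) return "";
  const columns = Object.keys(rows[0]);
  // Tabs or newlines inside a value would break the column layout.
  const clean = (value: string | number | undefined): string =>
    String(value ?? "").replace(/[\t\r\n]+/g, " ");
  const lines = [columns.join("\t")];
  for (const row of rows) {
    lines.push(columns.map((column) => clean(row[column])).join("\t"));
  }
  return lines.join("\n");
}

const tsv = toTsv([
  { id: 2, reason: "total mismatch" },
  { id: 7, reason: "missing\ttax" },
]);
console.log(tsv); // paste the printed text into a spreadsheet
```

This only works well for flat rows, which is exactly the limitation described next.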
Note: There are, of course, much better tools than this method out there! But if you had zero tools from the outset and you didn’t have the time/money to invest in better tooling, this is probably the de facto naive way of doing a decent data pull.
This method is totally fine if you have a relatively flat data structure to output, but when the data you’re dealing with is nested, or the results you want are more complex, you can build some primitive tooling yourself without downloading any extra dependencies.
A complex output
Let’s say you wanted to do a Scientist-esque manual audit, where you compare the output of one function to the output of another one (probably a modified version of the first function). How would you go about auditing the output from the two code paths?
Again, there are a million tools out there that can help you do this very easily (including Scientist!), but let’s assume you haven’t set up any of them and don’t have time for the upfront investment right now. You just have your REPL and your knowledge of JavaScript.
The ultra-naive way would be to just console.log the two outputs and diff them with your eyes.
// Source code
function currentWay(a: string) { ... }
function newWay(a: string) { ... }

// REPL
REPL > const input = 'some_testable_input'
REPL > console.log(currentWay(input))
// ... output of current way
REPL > console.log(newWay(input))
// ... output of new way
It’s very tedious to do this for every input you want to test, so most people facing this problem would write a custom script that runs both code paths over every input and prints all of the results at once.
// Source code
function currentWay(a: string) { ... }
function newWay(a: string) { ... }

const inputs: string[] = await query();

function auditScript() {
  for (const input of inputs) {
    console.log(currentWay(input))
    console.log(newWay(input))
  }
}
// REPL
REPL > auditScript()
// ... first output of current way
// ... first output of new way
// ... second output of current way
// ... second output of new way
// ...
But trying to compare that many outputs all at once can also be a little overwhelming.
Is there something in between? It would be nice if you could step through each example interactively, one at a time, much like you would with git add --patch or Jest’s interactive snapshot feature.
Generators to the rescue
Generators are a super underappreciated feature of JavaScript that let you yield a finite or infinite sequence of results, one next() call at a time. (Side note: if you’ve used async/await in modern JS/TS, you’ve used something very close to generators: async functions are also pausable functions, and when TypeScript or Babel compile async/await for older targets, they emit generator-based code.)
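For anyone who hasn’t used them, here is a minimal illustration of the mechanics the rest of this post relies on: each call to next() runs the generator body up to the next yield and then pauses. The `countTo` and `naturals` names are just for illustration.

```typescript
// A finite generator: it pauses at each `yield` until next() is called again.
function* countTo(limit: number): Generator<number, void, unknown> {
  for (let i = 1; i <= limit; i++) {
    yield i;
  }
}

const counter = countTo(2);
console.log(counter.next()); // { value: 1, done: false }
console.log(counter.next()); // { value: 2, done: false }
console.log(counter.next()); // { value: undefined, done: true }

// An infinite generator is safe because nothing runs until a value is requested.
function* naturals(): Generator<number, void, unknown> {
  let n = 0;
  while (true) {
    yield n++;
  }
}

const firstThree: number[] = [];
for (const value of naturals()) {
  if (firstThree.length === 3) break; // stop pulling values; the generator is closed
  firstThree.push(value);
}
console.log(firstThree); // [ 0, 1, 2 ]
```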
The MDN documentation on generators is already really good, so I won’t write a tutorial here. The basic idea is that you can take whatever dataset you’re working with (pulled in step 3 above), loop over it inside a generator function, and console.log the output in whatever text format you want. For auditing purposes, my example doesn’t need the generator to produce an actual value, so you can just yield nothing (which yields undefined).
function currentWay(a: string) { ... }
function newWay(a: string) { ... }

const inputs: string[] = await query();

// Vanilla generator function!
function* compareCurrentAndNewInteractively() {
  for (const input of inputs) {
    console.log(currentWay(input))
    console.log(newWay(input))
    yield;
  }
}
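Because currentWay, newWay and query() are elided above, here is a self-contained version you could paste into a TypeScript-capable REPL to try the pattern. The two implementations and the hard-coded inputs are hypothetical placeholders; in a real audit, inputs would come from your query.

```typescript
// Hypothetical stand-ins for the two code paths being compared.
function currentWay(a: string): string {
  return a.toUpperCase();
}
function newWay(a: string): string {
  return a.trim().toUpperCase();
}

// Hypothetical stand-in for the queried data.
const inputs: string[] = ["alpha", " beta ", "gamma"];

// Prints one comparison, then pauses until next() is called again.
function* compareCurrentAndNewInteractively(): Generator<void, void, unknown> {
  for (const input of inputs) {
    console.log(currentWay(input));
    console.log(newWay(input));
    yield;
  }
}

const generator = compareCurrentAndNewInteractively();
generator.next(); // prints the outputs for "alpha"
```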
Now, when I create the generator and call next(), it prints the next pair of outputs to compare from the fixed but large dataset I pulled in step 3.
REPL > const generator = compareCurrentAndNewInteractively();
REPL > generator.next();
//... first output of current way
//... first output of new way
REPL > generator.next();
//... second output of current way
//... second output of new way
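Once every input has been shown, next() returns an object with done: true. If typing generator.next() gets old, a small wrapper can shorten it and say explicitly when you’ve run out of cases. This is a sketch; the `stepper` helper and the `twoSteps` demo generator are hypothetical.

```typescript
// Wraps any iterator so each call advances it and reports when it's finished.
function stepper<T>(iterator: Iterator<T>): () => T | "done" {
  return () => {
    const result = iterator.next();
    if (result.done) {
      console.log("No more inputs to audit.");
      return "done";
    }
    return result.value;
  };
}

// Hypothetical demo generator standing in for the audit generator.
function* twoSteps(): Generator<string, void, unknown> {
  yield "first";
  yield "second";
}

const n = stepper(twoSteps());
n(); // "first"
n(); // "second"
n(); // logs "No more inputs to audit." and returns "done"
```

In the REPL you would pass in the audit generator instead, e.g. `const n = stepper(compareCurrentAndNewInteractively())`, and then just type `n()`.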
Voilà! You’ve basically just added an -i (interactive) option to your audit function.
Bells and whistles
The beauty of this is that because this is just vanilla code, it’s easy to extend your audit function to do whatever you want it to do.
Let’s say I want to keep track of which specific output I’m currently auditing. I can do that by adding a simple progress log inside the loop.
function* compareCurrentAndNewInteractively() {
  let counter = 1;
  for (const input of inputs) {
    console.log(`${counter} of ${inputs.length}\n`); // NEW
    console.log(currentWay(input))
    console.log(newWay(input))
    counter++; // NEW: advance the progress counter
    yield;
  }
}
Now let’s say I also want to show the current input alongside the output. That’s just another log statement. Boom.
function* compareCurrentAndNewInteractively() {
  let counter = 1;
  for (const input of inputs) {
    console.log(`${counter} of ${inputs.length}`);
    console.log(`input: ${input}\n`); // NEW
    console.log(currentWay(input))
    console.log(newWay(input))
    counter++;
    yield; // This just yields `undefined`, which is fine for auditing.
  }
}
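Another variation in the same spirit: only pause on inputs where the two code paths disagree, so identical results don’t eat your attention. This sketch assumes the outputs are JSON-serializable and compares them with JSON.stringify, which is sensitive to object key order. The rounding-based currentWay/newWay pair and the numeric inputs are hypothetical placeholders.

```typescript
// Hypothetical code paths: the new way rounds instead of truncating.
function currentWay(a: number): number {
  return Math.trunc(a);
}
function newWay(a: number): number {
  return Math.round(a);
}

// Hypothetical stand-in for the queried data.
const inputs: number[] = [1.2, 2.7, 3.1, 4.5];

// Assumes JSON-serializable outputs; object key order affects the result.
function sameOutput(x: unknown, y: unknown): boolean {
  return JSON.stringify(x) === JSON.stringify(y);
}

// Only stops on inputs where the two code paths disagree.
function* compareMismatchesInteractively(): Generator<number, void, unknown> {
  for (const [index, input] of inputs.entries()) {
    const current = currentWay(input);
    const candidate = newWay(input);
    if (sameOutput(current, candidate)) continue; // skip identical results
    console.log(`${index + 1} of ${inputs.length}`);
    console.log(`input: ${input}\n`);
    console.log(current);
    console.log(candidate);
    yield input; // yielding the input lets the REPL echo it back
  }
}

const generator = compareMismatchesInteractively();
generator.next(); // stops at 2.7 (current 2, new 3); 1.2 is skipped
```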
I could even transform each input into some kind of shim before passing it along to the functions I’m auditing.
import transformInput from '../../someFile';

function* compareCurrentAndNewInteractively() {
  let counter = 1;
  for (const input of inputs) {
    console.log(`${counter} of ${inputs.length}`);
    console.log(`input: ${input}\n`);
    const shim = transformInput(input); // NEW
    console.log(currentWay(shim))
    console.log(newWay(shim))
    counter++;
    yield;
  }
}
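One more generator feature that fits this workflow: whatever you pass to next(value) becomes the result of the paused yield expression, so the generator can record a verdict for each case as you step through. A minimal sketch; the `verdicts` map, the "ok"/"bad" labels, and the placeholder code paths and inputs are all hypothetical.

```typescript
type Verdict = "ok" | "bad";

// Hypothetical placeholders for the two code paths and the queried inputs.
function currentWay(a: string): string {
  return a.toLowerCase();
}
function newWay(a: string): string {
  return a.toLowerCase().replace(/\s+/g, "-");
}
const inputs: string[] = ["Hello", "Hello World"];

// Collected verdicts, keyed by input.
const verdicts = new Map<string, Verdict>();

function* auditWithVerdicts(): Generator<string, Map<string, Verdict>, Verdict | undefined> {
  for (const input of inputs) {
    console.log(`input: ${input}`);
    console.log(currentWay(input));
    console.log(newWay(input));
    // Pauses here; the argument of the following next() call lands in `verdict`.
    const verdict = yield input;
    if (verdict) verdicts.set(input, verdict);
  }
  return verdicts;
}

const generator = auditWithVerdicts();
generator.next();      // starts the generator and shows "Hello"
generator.next("ok");  // records "ok" for "Hello", then shows "Hello World"
generator.next("bad"); // records "bad" for "Hello World"; the generator finishes
console.log(verdicts);
```

Note that the first next() only starts the generator, so any argument passed to it is ignored.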
The world is your oyster! It all depends on how you want to audit your data in the REPL, and what will help you debug whatever you’re working on.