I recently created a command line Reverse Polish notation calculator as a programming exercise. It's interactive: the user runs the executable and is presented with a REPL in which they can evaluate Reverse Polish notation expressions. When they're done, they type q or press CTRL-D to exit. It might look something like this:

$ rpn_party
> 3 2 +
5.0
> 6 -
-1.0
> q
$

Naturally, I wanted to write tests that verified that it behaved as expected. But I'm a big believer in integration tests, so I wanted my tests to actually interact with the calculator as if they were a user. That is, I wanted them to run the executable, then send commands to the REPL over stdin, and, finally, verify the results by reading them from stdout. Essentially, use the app exactly as a human would.

The Solution

The solution can be found in a module from Ruby's standard library called PTY. PTY allows you to spawn an external process and then interact with that process by using puts to write to its stdin and gets to read from its stdout. You can read the documentation for PTY, but it probably won't make much sense unless you have a pretty good understanding of how pseudoterminals work (PTY is a Unixism for pseudoterminal). It's okay if you don't, though, because using PTY to interact with command line apps is pretty simple:

require 'pty'

PTY.spawn('path/to/executable') do |stdout, stdin, pid|
  stdin.puts 'some command'
  stdout.gets                    # discard the echoed command

  response = stdout.gets.chomp   # strip the trailing line ending
  assert_equal 'expected response', response
end

Here's what's going on in this chunk of code:

  1. We're calling the spawn module method on the PTY module.
  2. We're passing the path to our executable as the first argument.
  3. Our second argument is a block which is where we specify how we want to interact with the process.
  4. The block takes three arguments: two IO objects representing the stdout and stdin of the spawned process and then the pid of the spawned process, which will be useful later on.
  5. In the first line of the block, we are interacting with the spawned process by writing text to its stdin.
  6. Then we are consuming one line of text from the process's stdout. We have to do this because our input from the previous line is echoed to the process's stdout, so we need to consume that before we can get to the actual response. I'll go into more detail on this down below.
  7. Then, in the next two lines, we are getting the process's response and asserting that it is the value we are expecting.

That's basically it, other than a few tips and gotchas, which I will dive into below.
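
To see the echo-then-response pattern end to end, here is a minimal self-contained sketch. It assumes a Unix-like system, and it swaps in a hypothetical stand-in child (a Ruby one-liner that upcases each line it reads) for a real executable. UPCASE_CHILD and run_echo_demo are names made up for this illustration. Note that PTY.spawn returns nil when given a block, so the values are captured in variables declared outside the block.

```ruby
require 'pty'
require 'rbconfig'

# Hypothetical stand-in for a real CLI app: upcases every line it reads.
UPCASE_CHILD = [
  RbConfig.ruby, '-e',
  '$stdout.sync = true; while (line = $stdin.gets); puts line.upcase; end'
].freeze

def run_echo_demo
  echoed = response = nil
  PTY.spawn(*UPCASE_CHILD) do |stdout, stdin, _pid|
    stdin.puts 'hello'
    echoed = stdout.gets.chomp     # the terminal echoes our input first
    response = stdout.gets.chomp   # then the child's actual output arrives
  end
  [echoed, response]
end

echoed, response = run_echo_demo
```

The chomp calls matter here: lines read from a pseudoterminal end in a carriage return and newline, so comparing the raw line against a bare string would fail.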

Echoed Input on stdout

The first gotcha to be aware of is that, for many command line apps, everything that is typed on stdin is echoed back to stdout. If you think about it, this actually makes a lot of sense. If it weren't, the user wouldn't be able to see what they were typing. There are some cases where input doesn't get echoed to stdout, though, like when a user is typing a password.

This isn't a big deal, more just something to keep in mind. I dealt with this in my tests by creating a method that would send a command to my process and then immediately consume a line of output from stdout. This made my tests easier to comprehend. Here's what that looked like:

def send_command(pty, command)
  pty[1].puts command
  pty[0].gets
end

This is a little tricky because for this method to do what it needs to do, it needs access to both stdin (to send the command) and stdout (to consume the line containing the command). So at the beginning of each of my blocks, I put all three of the block arguments into an array which would be less cumbersome to pass around:

PTY.spawn('path/to/executable') do |stdout, stdin, pid|
  pty = [stdout, stdin, pid]
end

And that pty variable is what gets passed as the first argument to the send_command method.
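
Putting the helper and the array together might look like the sketch below. It assumes a Unix-like system and uses a hypothetical stand-in child (a Ruby one-liner that reverses each line) instead of a real app. REVERSE_CHILD and run_reverse_demo are illustrative names, not part of the original test suite.

```ruby
require 'pty'
require 'rbconfig'

# Hypothetical stand-in for a real CLI app: prints each input line reversed.
REVERSE_CHILD = [
  RbConfig.ruby, '-e',
  '$stdout.sync = true; while (line = $stdin.gets); puts line.chomp.reverse; end'
].freeze

# Send a command, then consume the echoed copy of it from stdout.
def send_command(pty, command)
  pty[1].puts command
  pty[0].gets
end

def run_reverse_demo
  result = nil
  PTY.spawn(*REVERSE_CHILD) do |stdout, stdin, pid|
    pty = [stdout, stdin, pid]
    send_command(pty, 'abc')
    result = stdout.gets.chomp   # the next line is the real response
  end
  result
end

reversed = run_reverse_demo
```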

Process Termination

If you've come this far in pursuit of integration testing your command line app, you want to go all the way. And that means you want to verify that your program exits correctly. You could have your app print an exit message and then verify in your test that this message gets printed to stdout when you input the command to exit. The problem is that you don't actually know that your CLI app exited. It could have printed the exit message and kept running.

Better to actually verify that your process is no longer running. This is where the pid argument to the block from up above comes into the picture:

PTY.spawn('bin/rpn_party') do |stdout, stdin, pid|
  stdin.puts 'exit'
  assert PTY.check(pid)
end

Here we're sending the exit command to our process, then using the check method on PTY to assert that the process is no longer running. The semantics of check are the opposite of what I, personally, would expect, but I'm not a systems programmer, so I will assume that I'm wrong on this one. Anyway, check returns nil if the process is still running and a Process::Status object (which is truthy) if it has exited, so you want to assert that PTY.check(pid) returns a truthy value to verify that your process has exited.
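
Because the child needs a moment to shut down, calling check right after sending the exit command can run into a timing issue of its own. One way to handle that is to poll check for a short while, as in this sketch. It assumes a Unix-like system. wait_for_exit, its timeout and interval parameters, and the EXITING_CHILD stand-in process are all hypothetical names introduced for illustration.

```ruby
require 'pty'
require 'rbconfig'

# Hypothetical helper: poll PTY.check until the child has exited or the
# timeout elapses. Returns a Process::Status on exit, nil on timeout.
def wait_for_exit(pid, timeout: 2, interval: 0.05)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    status = PTY.check(pid)
    return status if status
    return nil if Process.clock_gettime(Process::CLOCK_MONOTONIC) > deadline

    sleep interval
  end
end

# Hypothetical stand-in for a real CLI app: quits when it reads "exit".
EXITING_CHILD = [
  RbConfig.ruby, '-e',
  'while (line = $stdin.gets); exit if line.chomp == "exit"; end'
].freeze

exit_status = nil
PTY.spawn(*EXITING_CHILD) do |_stdout, stdin, pid|
  stdin.puts 'exit'
  exit_status = wait_for_exit(pid, timeout: 5)
end
```

In a test, the assertion would then be made on the return value of wait_for_exit rather than on a single call to check.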

Results Race Condition

Another gotcha you need to be aware of is that, when you spawn another process with PTY like this, you now have two separate processes, which means that you can run into timing issues when making assertions about the output from your spawned process. So this assertion can fail sometimes:

PTY.spawn('path/to/executable') do |stdout, stdin, pid|
  # send a command and clear it from stdout
  response = stdout.gets
  assert_equal 'expected response', response
end

What's going on is that your test process has sent a command to your spawned process, then immediately tries to read the response from stdout. But if your spawned process has a small delay in writing to its stdout (for whatever reason; the delay only needs to be minuscule), then the response your test gets will be blank and the assertion will fail. You can get around this by having your test process sleep:

PTY.spawn('path/to/executable') do |stdout, stdin, pid|
  # send a command and clear it from stdout
  sleep 0.1
  response = stdout.gets
  assert_equal 'expected response', response
end

Having your test process sleep before getting every response is not really optimal, though. With a large enough test suite, tenths of a second start to add up. What you really want is a method that tries getting the response for a certain amount of time before giving up. Something like this:

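def get_response(stdout)
  start = Time.now
  try_for = 2
  response = nil

  loop do
    # gets can return nil, so use &. to avoid calling chomp on nil
    response = stdout.gets&.chomp
    # an empty string is truthy in Ruby, so check for it explicitly
    break if (response && !response.empty?) || Time.now > start + try_for

    sleep 0.1
  end

  response
end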

What's going on there is that, in each iteration of the loop, we try to get a response. If we get a response, or if we have exceeded the total amount of time we are going to try for, then we break out of the loop. Otherwise, we sleep for 0.1 seconds, after which point the loop runs again.
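
An alternative to polling with sleep is to let IO.select wait until output is available or a timeout elapses. This is a different technique from the loop above, sketched here under the assumption of a Unix-like system. get_response_with_select, its timeout parameter, and the SLOW_CHILD stand-in process are hypothetical names. One caveat: select only reports that some data is readable, so gets can still block if the child writes a partial line.

```ruby
require 'pty'
require 'rbconfig'

# Hypothetical helper: wait up to `timeout` seconds for output, then read
# one line. Returns nil if nothing became readable in time.
def get_response_with_select(stdout, timeout: 2)
  ready = IO.select([stdout], nil, nil, timeout)
  return nil unless ready

  stdout.gets&.chomp
end

# Hypothetical stand-in for a slow CLI app: waits, prints one line, then
# blocks on stdin so that it stays alive while we read.
SLOW_CHILD = [
  RbConfig.ruby, '-e',
  '$stdout.sync = true; sleep 0.3; puts "ready"; $stdin.gets'
].freeze

early = late = nil
PTY.spawn(*SLOW_CHILD) do |stdout, _stdin, _pid|
  early = get_response_with_select(stdout, timeout: 0.05) # too soon to see output
  late = get_response_with_select(stdout, timeout: 5)     # long enough for the child
end
```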

Prompt Race Condition

Another race condition issue can pop up if your CLI app shows the user a prompt, like this (the > is the prompt):

$ rpn_party
> command
result
>

What can happen is that, because the test is running at computer speed, rather than human speed, the test can send the next command in the sequence before the CLI app's stdout has printed the > after the last command. Say you have a test like this:

PTY.spawn('path/to/executable') do |stdout, stdin, pid|
  stdin.puts 'command1'
  response1 = stdout.gets

  stdin.puts 'command2'
  response2 = stdout.gets
  assert_equal 'response2', response2    # fails sometimes
end

This assertion will sometimes fail in a not-so-obvious way (because reasoning about async is hard). When it does fail, you'll get a message like:

Expected "response2"
Actual   "> response2"

It fails like this because what you think is happening is this sequence of events:

  1. Command 1 is sent
  2. Response 1 is received
  3. Prompt 1 is printed to screen
  4. Command 2 is sent
  5. Response 2 is received
  6. Prompt 2 is printed to screen

But what actually happens sometimes is this:

  1. Command 1 is sent
  2. Response 1 is received
  3. Command 2 is sent before the prompt has been printed
  4. Prompt + Response 2 are then received together

That's how a response like > response2 happens instead of response2.

To resolve this, we need to wait until the prompt is printed before we send a request. I think it's best to wrap this up in a method, like so:

def wait_for_prompt(stdout)
  start = Time.now
  try_for = 1

  loop do
    prompt = stdout.getc
    break if prompt == '>' || Time.now > start + try_for

    sleep 0.1
  end
end
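
Here is a sketch of how wait_for_prompt might slot into a test sequence. It assumes a Unix-like system and a hypothetical stand-in child (a Ruby one-liner that prints a > prompt, reads a line, and prints it upcased). PROMPT_CHILD and run_prompt_demo are illustrative names only.

```ruby
require 'pty'
require 'rbconfig'

# Hypothetical stand-in for a real REPL: prompts, reads a line, upcases it.
PROMPT_CHILD = [
  RbConfig.ruby, '-e',
  '$stdout.sync = true; loop { print "> "; line = $stdin.gets or break; puts line.upcase }'
].freeze

# Same helper as above: read characters until the prompt shows up.
def wait_for_prompt(stdout)
  start = Time.now
  try_for = 1

  loop do
    prompt = stdout.getc
    break if prompt == '>' || Time.now > start + try_for

    sleep 0.1
  end
end

def run_prompt_demo(commands)
  responses = []
  PTY.spawn(*PROMPT_CHILD) do |stdout, stdin, _pid|
    commands.each do |command|
      wait_for_prompt(stdout)          # only send once the prompt is out
      stdin.puts command
      stdout.gets                      # discard the echoed command
      responses << stdout.gets.chomp   # the actual response
    end
  end
  responses
end

responses = run_prompt_demo(%w[one two])
```

Waiting for the prompt before every command keeps each response on a line of its own, so the prompt never ends up glued to the front of the next response.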

And there you go. Now you can integration test your CLI apps.