Unit Testing Games: Tip #3

Don’t let full coverage fool you

Full cover, but still not bulletproof!

When playing XCOM 2, the survival and effectiveness of your squad are heavily influenced by the amount of cover available. Half cover provides some protection from enemy fire, and full cover offers the most. Though critical, full cover doesn’t guarantee survival.

When testing your code, it’s common practice to examine code coverage. This is a metric that identifies what percentage of your code is executed by your tests. Higher coverage equates to more lines of code being put through the paces.

But just as full-cover doesn’t prevent your soldier from being pummeled in XCOM 2, having 100% code coverage doesn’t guarantee your code is bulletproof. Tests can execute lines of code, but the Engineer has to make sure the tests verify behavior.

Take this code sample of a square grid-based map, stored in a linear array.

class GameMap(object):
 
    NUM_ROWS = 10
    NUM_COLS = 10
 
    def __init__(self):
        # Initialize map to all 'grass'
        self._map = ['grass'] * GameMap.NUM_ROWS * GameMap.NUM_COLS
 
    def get_terrain(self, x, y):
        # grass, road, mountain (impassable), brickwall (impassable)
        # Row-major index: each row holds NUM_COLS cells
        return self._map[y * GameMap.NUM_COLS + x]
 
    def is_passable(self, x, y):
        grid = self.get_terrain(x, y)
        if 'mountain' not in grid:
            return True
        return False

The get_terrain() function returns the type of terrain at a given map point. To test this function, we only need x and y coordinates that are inside the map area.

import unittest

from main import GameMap


class TestGameMap(unittest.TestCase):

    def test_get_grid_returns_value(self):
        m = GameMap()
        val = m.get_terrain(1, 1)
        self.assertEqual(val, 'grass')

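Since the get_terrain() function has only a single line, it is guaranteed to be executed by our test function (100% test coverage). Great! Except this doesn’t test the full range of behavior. What happens if x or y is negative? What about x or y being outside of the map bounds? The behavior we want in both cases is “raise an exception when this condition is true”, but the function as written only raises when the computed index falls outside the array. Python’s negative indexing lets get_terrain(-1, 0) quietly return the last cell of the map, and get_terrain(10, 0) returns the first cell of the next row. These are edge cases we should cover in our tests.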

One of the benefits of testing your code is to provide real-usage documentation. A new user might assume the only behavior of get_terrain() is to return a terrain value. They might proceed to write their code, assuming invalid map coordinates are handled gracefully. With explicit tests that highlight these edge cases, the new engineer can write safer code.
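A minimal sketch of those edge-case tests follows. It assumes the intended contract is “raise IndexError for any coordinate outside the grid”, so this copy of GameMap gains an explicit bounds check that the original sample does not have. The test names are my own.

```python
import unittest


class GameMap(object):
    # Same layout as the sample above: a 10x10 grid stored in a flat list.
    NUM_ROWS = 10
    NUM_COLS = 10

    def __init__(self):
        self._map = ['grass'] * GameMap.NUM_ROWS * GameMap.NUM_COLS

    def get_terrain(self, x, y):
        # Assumption: out-of-range coordinates should raise IndexError.
        # Without this check, (-1, 0) would silently read the last cell
        # and (10, 0) would read the first cell of the next row.
        if not (0 <= x < GameMap.NUM_COLS and 0 <= y < GameMap.NUM_ROWS):
            raise IndexError('map coordinate out of bounds: (%d, %d)' % (x, y))
        return self._map[y * GameMap.NUM_COLS + x]


class TestGetTerrainEdgeCases(unittest.TestCase):

    def setUp(self):
        self.m = GameMap()

    def test_negative_coordinates_raise(self):
        with self.assertRaises(IndexError):
            self.m.get_terrain(-1, 0)
        with self.assertRaises(IndexError):
            self.m.get_terrain(0, -1)

    def test_coordinates_past_the_edge_raise(self):
        with self.assertRaises(IndexError):
            self.m.get_terrain(GameMap.NUM_COLS, 0)
        with self.assertRaises(IndexError):
            self.m.get_terrain(0, GameMap.NUM_ROWS)

    def test_corner_cells_are_valid(self):
        self.assertEqual(self.m.get_terrain(0, 0), 'grass')
        self.assertEqual(
            self.m.get_terrain(GameMap.NUM_COLS - 1, GameMap.NUM_ROWS - 1),
            'grass')
```

None of these tests raises the coverage percentage, yet each one documents a behavior the original single test left unstated.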

Another example is when a function doesn’t properly handle return values. The get_terrain() function has 4 possible return values, 2 of which are ‘impassable’. The is_passable() function relies on these return values to do its job. Our tests below also offer 100% code coverage by touching every line of the function.

    def test_is_passable(self):
        m = GameMap()
        m._map[0] = 'road'
        val = m.is_passable(0, 0)
        self.assertTrue(val)

    def test_is_passable_fails_for_mountains(self):
        m = GameMap()
        m._map[0] = 'mountain'
        val = m.is_passable(0, 0)
        self.assertFalse(val)

Coverage is at 100% as well, but these tests don’t offer insight into a bug in the code. As it’s written, is_passable() will incorrectly identify ‘brickwall’ as passable. The missing test that checks for ‘brickwall’ will catch this error.

    def test_is_passable_fails_for_brickwalls(self):
        m = GameMap()
        m._map[0] = 'brickwall'
        val = m.is_passable(0, 0)
        self.assertFalse(val)

This test will fail since ‘brickwall’ results in a True (passable) return value. Only ‘mountain’ terrain returns False (impassable).
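One possible fix, sketched under assumptions: the set of impassable terrain comes from the comment in get_terrain(), and is_passable() is written here as a standalone function that takes the terrain name directly, so the sketch needs no map setup. IMPASSABLE_TERRAIN and the test name are hypothetical.

```python
import unittest

# Assumption: these are the only impassable terrain types, per the
# comment in get_terrain() above.
IMPASSABLE_TERRAIN = frozenset(['mountain', 'brickwall'])


def is_passable(terrain):
    # Stand-in for GameMap.is_passable(); checks set membership rather
    # than testing for 'mountain' alone.
    return terrain not in IMPASSABLE_TERRAIN


class TestIsPassable(unittest.TestCase):

    def test_every_terrain_type(self):
        # One expectation per value get_terrain() can return.
        expected = {
            'grass': True,
            'road': True,
            'mountain': False,
            'brickwall': False,
        }
        for terrain, passable in expected.items():
            self.assertEqual(is_passable(terrain), passable, terrain)
```

Listing every terrain type in the test means a newly added impassable type is less likely to slip through, provided the table is updated alongside it.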

Test coverage is a fantastic tool in your software engineering arsenal. Increasing the coverage percentage means more of your lines of code are executed. However, this should not be taken as the sole measure of quality.

Starting TDD with ASP.Net

An article posted on Visual Studio Magazine discusses getting started test-driving ASP.Net projects. The author, Peter Vogel, mentions some common replies he gets when introducing TDD to uninitiated programmers. These questions are common across all software engineering domains. I’ve received similar questions, and asked these same questions when I started.

An important point Vogel makes is that TDD doesn’t replace your current (usually manual) methods of testing; TDD complements your current process, making it more reliable and faster.

I think that, for most developers, the biggest issue is that they have a working UI that they’ve been using for testing. That form of testing – as inefficient as it is compared to using the TDD environment – has become a habit and, as a result, easier to keep doing than to change.

TDD isn’t a substitute for human inspection. It is a tool that lightens the testing burden of the developer. Tests also improve product reliability when incorporated into Regression Testing. This also saves time for Quality Assurance Testers.

In fact, I’d suggest that whatever testing you’re doing now is always going to be easier with TDD than by navigating through your code using a browser: right click, select Run Tests, and you have your result.

You can read the full article here

Unit Testing Games: Tip #2

Keep tests fast

Tip #1 is about keeping unit tests small. This is greatly complemented by keeping them fast.

Unit tests are a powerful tool for your development team, and time is possibly the team’s most valuable asset. Having tests that execute rapidly both promotes their use and removes barriers to running them in the heat of Alpha. If an engineer can execute 300 tests in less than 2 seconds, the tests will likely get run. If 100 tests take 20 minutes, they will almost never be executed.

Remember not to confuse size with speed. Small tests will very likely be fast, but that’s not a guarantee. Let’s look at our “small” test from Tip #1.

def test_weapon_fire(self):
    # setup
    # =============================
 
    # This version will NOT implicitly load assets on construction
    weapon = Weapon("rifle")
    start_position = vector(2.0, 0.0, 2.0) 
 
    # execute
    # =============================
 
    # Reduce ammo
    # Note: we no longer require a target
    weapon.fire(start_position, None)
    actual_ammo_post_fire = weapon.get_current_ammo()
 
    # assert
    # =============================
 
    # Verify we lost ammo on fire
    self.assertEqual(weapon.get_max_ammo() - 1, actual_ammo_post_fire)

Depending on the details of the Weapon class constructor and the fire() method, this test could take a second or two to execute. It’s plausible the constructor loads art assets from disk. This could be for the weapon itself, as well as for its ammo. It’s also plausible the fire() method performs line-of-sight checks, which can be computationally expensive.

Having this test execute in 1-2 seconds may not be an issue in isolation. It becomes a problem when we add more tests. Adding tests for reloading (another 1-2 seconds) and switching (2-4 seconds to load 2 weapons) raises the tally to 4-8 seconds. When we have 100 tests (100-200 seconds) or, even better, 600 tests (600-1200 seconds), you see how this rapidly increases.
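The arithmetic is simple enough to check in a few lines. This sketch assumes tests run one after another and each takes between 1 and 2 seconds; suite_seconds() is a name I made up for the illustration.

```python
def suite_seconds(num_tests, seconds_per_test):
    """Total run time when tests execute one after another."""
    return num_tests * seconds_per_test


for num_tests in (100, 600):
    fastest = suite_seconds(num_tests, 1)  # every test takes 1 second
    slowest = suite_seconds(num_tests, 2)  # every test takes 2 seconds
    print('%d tests: %d-%d seconds (up to %.1f minutes)'
          % (num_tests, fastest, slowest, slowest / 60.0))

# 100 tests: 100-200 seconds (up to 3.3 minutes)
# 600 tests: 600-1200 seconds (up to 20.0 minutes)
```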

The Weapon class could be refactored to remove asset loading, maybe by moving this to a load() function. This could reduce construction to simple variable initialization, which takes milliseconds. If you can reduce the time per test to 10-20 milliseconds, you can now get through 100 tests in 1-2 seconds instead of 100-200 seconds. 600 tests now execute in 6-12 seconds instead of 10-20 minutes.
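Here is a rough sketch of that refactor. Everything in it is hypothetical: the real Weapon class is not shown in full, so the default of 30 rounds, the is_loaded() helper, and the placeholder asset dictionary are assumptions made to keep the example runnable.

```python
import unittest


class Weapon(object):
    """Hypothetical Weapon whose constructor only initializes variables."""

    def __init__(self, name, max_ammo=30):
        self._name = name
        self._max_ammo = max_ammo
        self._current_ammo = max_ammo
        self._assets = None  # nothing is read from disk here

    def load(self):
        # Placeholder for the expensive work (models, sounds, textures).
        # The game calls this explicitly; unit tests normally do not.
        self._assets = {'model': self._name + '.mesh'}

    def is_loaded(self):
        return self._assets is not None

    def get_max_ammo(self):
        return self._max_ammo

    def get_current_ammo(self):
        return self._current_ammo

    def fire(self):
        if self._current_ammo > 0:
            self._current_ammo -= 1


class TestWeaponWithoutAssets(unittest.TestCase):

    def test_fire_reduces_ammo_without_loading_assets(self):
        weapon = Weapon('rifle')
        weapon.fire()
        self.assertFalse(weapon.is_loaded())
        self.assertEqual(weapon.get_max_ammo() - 1, weapon.get_current_ammo())
```

The trade-off is that game code must now remember to call load() before the weapon is shown or heard, so that call deserves a test of its own in a slower, separate suite.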

Something is better than nothing

It may not be reasonable to refactor like this for speed increases. That’s OK. The primary goal is to add reliability to your systems, which increases overall productivity. I will always prefer a single slow test over no tests at all!

Unit Testing Games: Tip #1

I previously discussed some benefits of Test-Driven Development for Games. Next, I’ll offer some tips for effectively adopting TDD and unit testing.

Keep tests small

Game teams strive to keep the size of their game assets to a minimum. This conserves disk space, and allows packing more content into a single release. Your unit tests should also be small, though for different reasons.

Small tests aid in comprehension and provide real-world documentation that is always up-to-date. Naming functions is hard, and sometimes the name doesn’t properly convey behavior. Unit tests identify exactly HOW the function should be called and if/what it should return.

A passive benefit of unit testing is forcing you to examine your interfaces. Spaghetti code usually inflates the size of your tests. Functions with multiple dependencies are hard to understand and refactor. If the dependencies require sizable setup, you lose even more flexibility.

Larger tests also translate to more written code, which can be a barrier to a team accepting the process.

The Weapon class in this code sample is tightly coupled to other game objects. The fire() function in particular has implicit dependencies on the Player class (someone must be holding the weapon), the World class for line-of-sight checks, and the Sound Manager class to play sounds.

def fire(self, target):
    # Must fire at a valid target player
    if target is None:
        raise MissingTargetException()

    # Weapon must be held by someone to be fired
    if self._player is None:
        raise NoWeaponOwnerException()

    # If no ammo, play "empty click" sound and return
    if self._current_ammo == 0:
        sound_manager.play(self._empty_sound)
        return

    # Query current map to verify line-of-sight to target
    # Player carrying this weapon is cached on construction
    if world.get_current_map().check_los(self._player, target):
        # Decrease ammo
        self._current_ammo = self._current_ammo - 1

        # Apply damage to target
        target.take_damage(self._damage)

        # Play weapon "fire" sound
        sound_manager.play(self._fire_sound)

    else:
        # Play "error" sound indicating no line of sight
        sound_manager.play(self._not_visible_sound)

    return

Let’s add test coverage to the fire() method. We can start by adding a test which verifies the ammo count is correctly reduced by one.

The ‘setup’ portion of the test must instantiate the dependencies.

def test_weapon_fire(self):
    # setup
    # =============================

    # Instantiate player object because Weapon is required to belong to someone,
    # and uses Player for line-of-sight checks.
    player = Player("user_player")

    # fire() requires a valid target
    target_player = Player("some_enemy")

    # Weapon plays sounds, and requires a World instance for line-of-sight
    sound_mgr = sound_manager()
    world_mgr = world()

    sound_mgr.init()
    world_mgr.init("some_map")

    # Weapon will load required sounds and art assets on its own
    weapon = Weapon("rifle")

    # Weapon has circular reference to owning Player
    player.add_weapon(weapon)


    # execute
    # =============================

    # Reduce ammo
    weapon.fire(target_player)
    actual_ammo_post_fire = weapon.get_current_ammo()


    # assert
    # =============================
    # Verify expected results
    
    # Verify we lost ammo on fire
    self.assertEqual(weapon.get_max_ammo() - 1, actual_ammo_post_fire)

Comments aside, you can see how convoluted things get. The number of implicit dependencies adds overhead to object creation and test comprehension. Remember, a benefit of unit testing is providing real-world usage and documentation. A new engineer will need a baseline understanding of multiple systems to work on Weapons.

If the Weapon class were decoupled from players and world functionality, things would get simpler. This alternate fire() function no longer requires the World class for line-of-sight checks. You could argue this should be handled by an A.I. system.

Attachment to a Player instance is also removed, and fire() can now operate without a valid target. There is still an implicit link to the Sound Manager, though it’s not required. The dependency overhead is reduced to zero so you can now test the function in isolation. The goal is to test that ammo is decreased, not that the sound system works or that the target can take damage. This simplifies our system and test.

def fire(self, start, target):
    # If no ammo, play "empty click" sound and return
    if self._current_ammo == 0:
        if sound_manager:
            sound_manager.play(self._empty_sound)
        return

    # Decrease ammo
    self._current_ammo = self._current_ammo - 1

    # Play weapon "fire" sound
    if sound_manager:
        sound_manager.play(self._fire_sound)

    # Apply damage to target
    if target:
        target.take_damage(self._damage)

    return

The test below operates on the second Weapon class. Having fewer responsibilities allows for more maintainable code and simpler testing. This is much easier for a new engineer to grok.

def test_weapon_fire(self):
    # setup
    # =============================

    # This version will NOT implicitly load assets on construction
    weapon = Weapon("rifle")
    start_position = vector(2.0, 0.0, 2.0) 

    # execute
    # =============================

    # Reduce ammo
    # Note: we no longer require a target
    weapon.fire(start_position, None)
    actual_ammo_post_fire = weapon.get_current_ammo()

    # assert
    # =============================

    # Verify we lost ammo on fire
    self.assertEqual(weapon.get_max_ammo() - 1, actual_ammo_post_fire)

Poor interfaces and tightly coupled code usually lead to long and convoluted tests. Writing tests first forces you to consider how your interfaces will be used and helps create more maintainable code.
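One way to go a step further is to hand the Sound Manager to the weapon explicitly instead of reaching for a global. The sketch below is an assumption-heavy illustration, not the Weapon class from the samples above: FakeSoundManager, the constructor parameters, the sound names, and the default of 30 rounds are all hypothetical, and fire() is simplified to take no arguments.

```python
import unittest


class FakeSoundManager(object):
    """Test double that records which sounds were requested."""

    def __init__(self):
        self.played = []

    def play(self, sound):
        self.played.append(sound)


class Weapon(object):
    """Hypothetical Weapon that receives its sound manager explicitly."""

    def __init__(self, name, max_ammo=30, sound_manager=None):
        self._max_ammo = max_ammo
        self._current_ammo = max_ammo
        self._sound_manager = sound_manager
        self._fire_sound = name + '_fire'
        self._empty_sound = name + '_empty'

    def get_max_ammo(self):
        return self._max_ammo

    def get_current_ammo(self):
        return self._current_ammo

    def fire(self):
        # If no ammo, play the "empty click" sound and return
        if self._current_ammo == 0:
            self._play(self._empty_sound)
            return
        self._current_ammo -= 1
        self._play(self._fire_sound)

    def _play(self, sound):
        # Sound is optional, so tests can omit it entirely
        if self._sound_manager is not None:
            self._sound_manager.play(sound)


class TestWeaponSounds(unittest.TestCase):

    def test_fire_works_without_a_sound_manager(self):
        weapon = Weapon('rifle')
        weapon.fire()
        self.assertEqual(weapon.get_max_ammo() - 1, weapon.get_current_ammo())

    def test_empty_weapon_plays_empty_click(self):
        sounds = FakeSoundManager()
        weapon = Weapon('rifle', max_ammo=0, sound_manager=sounds)
        weapon.fire()
        self.assertEqual(sounds.played, ['rifle_empty'])
```

The ammo test stays as small as before, and the sound behavior can be verified separately without initializing a real audio system.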

Intro to Test-Driven Development for Games

Video game companies rely heavily on QA testers during development, especially during crunch time. This is generally a time consuming process as it requires human interaction with a running instance of the game. To help reduce the bug counts and lower manual testing costs, companies already utilize forms of automated testing to catch problems. These tests are typically reactive, providing feedback after things fail in the main code repository.

Developers can use proactive measures such as Test-Driven Development, or TDD, to supplement their testing. TDD centers around frequent iteration of the code base, and supporting these changes with automated unit tests (tests that verify a small piece of functionality). The tests run in isolation (no game instance required), and without human interaction. Software Engineers write unit tests that verify functionality before writing code for a feature, adding confidence that features perform as expected. This greatly complements the QA team by reducing their workload.

Additionally, TDD improves code health by encouraging programming to interfaces instead of implementations, which helps make systems more modular. Evidence also suggests it can actually increase development speed.

The TDD process consists of 5 steps, with the last step repeating as needed:

  1. Write a test that verifies some functionality
  2. Run the test and watch it fail (since the code isn’t there yet)
  3. Write the least amount of code to make the test pass
  4. Watch test pass
  5. (repeat) Refactor code as needed, making sure the test continues to pass

For example, let’s say you need a function that adds 2 numbers. Your first step is to create a valid, but failing test for the function. It is valid because it will pass if the function behaves properly. It is failing because the assert at the end will not execute/compile due to the missing function. This first test should highlight the “success path” for the function and not be concerned about failure cases. The success test may be the most important for your function, as it demonstrates intended operation. Prove your function works as expected before trying to examine when it doesn’t. You will add expected failure cases later.

One pattern you can follow with your tests is setup->execute->assert. You set up any state or dependencies you need for the code you want tested, execute the code in question, then assert your expectations were met.

def test_add(self):
  # setup
  value1 = 2
  value2 = 3
  expected = 5

  # execute
  actual = add(value1, value2)

  # assert expected value is correct
  self.assertEqual(expected, actual)

Next, let’s write the “add” function:

def add(a, b):
  return 5

As step 3 states, write the least amount of code necessary to pass the test. The test expects a return value of “5”, so the least amount of code to satisfy the test is to return “5”.

Step 4, the test will now pass.

At step 5, you refactor the function, making sure the test continues to pass:

def add(a, b):
  return a + b

After this change, you can update the unit test to add another assertion. This also highlights inline testing instead of the setup-execute-assert pattern.

def test_add(self):
  # no setup

  # execute/assert
  self.assertEqual(5, add(2, 3))
  self.assertEqual(2, add(1, 1))
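With the success path proven, an expected failure case can be added the same way. The sketch below is one possibility, assuming add() should let Python’s own TypeError surface when it is handed a number and a string. The test name and that contract are my assumptions, not part of the example above.

```python
import unittest


def add(a, b):
    return a + b


class TestAddFailureCases(unittest.TestCase):

    def test_add_rejects_number_plus_string(self):
        # Python raises TypeError for int + str; this test documents
        # that add() does not hide or convert that error.
        with self.assertRaises(TypeError):
            add(2, '3')
```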

This is a trivial example, but it showcases standard TDD and how both the code and tests will evolve. I usually write tests first, but I rarely take the minimalist approach on the first pass. When starting out, it is good practice to start with the “proper” TDD to allow developers to get acquainted with the process.

Later, we will dive deeper into the characteristics of testing, examine how to apply them to game code, and discuss ways to implement testing into your current workflow.