Testing the library

Since many people rely on this library working properly, and we don't want to accidentally introduce changes which cause it to break, we use unit-tests to ensure that the individual functions in our code-base work properly. This guide will help you get started with writing new unit-tests or editing existing ones, which is often needed when changing things around.

NOTE: This is a practical guide to quickly get you started with writing unit-tests, not a full introduction to what unit-tests are, and it will only cover the very basics needed to understand our unit-tests. If you're looking for a full introduction, you can take a look at the Additional resources section at the bottom.

Tools

We are using the following modules and packages for our unit tests:

  • pytest (the test framework and test runner)
  • pytest-cov (collects code coverage information)
  • taskipy (provides the shortcut task commands shown below)
  • unittest.mock (mock objects, from Python's standard library)

We decided on using pytest instead of the unittest module from the standard library since it's much more beginner friendly and it's generally easier to use.
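For example, a pytest test is just a plain function that uses a regular assert statement, with no test-class boilerplate required:

def test_addition():
    # pytest rewrites the assert statement, so a failure shows both sides of the comparison
    assert 1 + 1 == 2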

Running tests

When running the tests, you should always be in an activated virtual environment (or prefix the commands with poetry run to run them from within the project's environment).

To make things simpler, we made a few shortcuts/aliases using taskipy:

  • poetry run task test-nocov will run all unit-tests using pytest.
  • poetry run task test will run pytest with pytest-cov, collecting code coverage information.
  • poetry run task test /path/to/test.py will run a specific test file.
  • poetry run task retest will rerun only the previously failed tests.

When actively developing, you'll most likely only be working on some portion of the code-base, and as a result, you won't need to run the entire test suite. Instead, you can run the tests for a specific file with:

poetry run task test-nocov /path/to/test.py

When you are done and are preparing to commit and push your code, it's a good idea to run the entire test suite as a sanity check that you haven't accidentally introduced some unexpected bugs:

poetry run task test

Writing tests

Since consistency is an important consideration for collaborative projects, we have written some guidelines on writing tests for the project. In addition to these guidelines, it's a good idea to look at the existing code base for examples (e.g., test_connection.py).

File and directory structure

To organize our test suite, we have chosen to mirror the directory structure of mcproto in the tests subdirectory. This makes it easy to find the relevant tests by providing a natural grouping of files. More general testing files, such as helpers.py, are located directly in the tests subdirectory.

All files containing tests should have a filename starting with test_ to make sure pytest will discover them. This prefix is typically followed by the name of the file the tests are written for. If needed, a test file can contain multiple test classes, both to provide structure and to be able to provide different fixtures/set-up methods for different groups of tests.
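For illustration only (these file names are hypothetical and don't necessarily match the actual code-base), the mirrored layout could look something like this:

mcproto/
    connection.py
    ...
tests/
    helpers.py
    mcproto/
        test_connection.py
        ...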

Writing independent tests

When writing unit tests, it's really important to make sure that each test that you write runs independently from all of the other tests. This means both that the code you write for one test shouldn't influence the result of another test, and that if one test fails, the other tests should still run.

The basis for this is that when you write a test method, it should really only test a single aspect of the thing you're testing. This often means that you do not write one large test that tests "everything" that can be tested for a function, but rather that you write multiple smaller tests that each test a specific branch/path/condition of the function under scrutiny.

To make sure you're not repeating the same set-up steps in all these smaller tests, pytest provides fixtures that can be executed before and after each test is run. In addition to test fixtures, it also provides support for parametrization, which is a way of re-running the same tests with different values. If there's a failure, pytest will then show us the values that were being used when this failure occurred, making it a much better solution than just manually using them in the test function.
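As a quick, generic illustration (the names here are made up and not taken from our test suite), a fixture and a parametrized test could look like this:

import pytest


@pytest.fixture
def numbers():
    # This set-up code runs for every test that requests the `numbers` fixture,
    # giving each test its own fresh list.
    return [1, 2, 3]


def test_pop_removes_last_item(numbers):
    assert numbers.pop() == 3
    assert numbers == [1, 2]


@pytest.mark.parametrize(
    ("value", "expected"),
    [
        (0, False),
        (1, True),
        (-5, True),
    ],
)
def test_bool_conversion(value, expected):
    # This single test function runs once for every (value, expected) pair above,
    # and pytest reports the parameters used whenever one of the runs fails.
    assert bool(value) == expected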

Mocking

As we are trying to test our "units" of code independently, we want to make sure that we don't rely on objects and data generated by "external" code. If we did, we might end up observing a failure caused by something external, rather than a failure in the code we're actually testing.

However, the objects that we're trying to test often depend on these external pieces of code. Fortunately, there is a solution for that: we use fake objects that act like the real objects. We call these fake objects "mocks".

To create these mock objects, we use the unittest.mock module (part of python's standard library). In addition, we have also defined some helper mixin classes, to make our mocks behave how we want (see examples below).

As an example of mocking, let's create a fake socket, which the connection class can use to make the send calls when sending over some data. That way, we don't have to actually establish a connection to some external server, and can instead test that the connection class works properly and calls our mocked methods with the correct data.

import socket
from unittest.mock import Mock

from mcproto.connection import TCPSyncConnection


def test_connection_sends_correct_data():
    mock_socket = Mock(spec_set=socket.socket)
    conn = TCPSyncConnection(mock_socket)

    data = bytearray("hello", "utf-8")
    conn.write(data)
    mock_socket.send.assert_called_once_with(data)

In the example above, we've just made sure that when we try to write some data into the connection class, it properly calls the send method of the socket with our data, sending it out.

The spec_set attribute limits which attributes will be accessible through our mock socket. For example, mock_socket.close will work, because the socket.socket class has it defined. However, mock_socket.abc will not be accessible and will produce an error, because the socket class doesn't define it.

By default, a mock will allow access to any attribute, which is usually not what we want, as a test should fail if an attribute that shouldn't exist is accessed. That's why we often end up setting spec_set on our mocks.
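For a quick illustration of the difference:

import socket
from unittest.mock import Mock

restricted_mock = Mock(spec_set=socket.socket)
restricted_mock.close()  # Works, socket.socket defines `close`
restricted_mock.abc      # AttributeError, not a part of the socket.socket spec

unrestricted_mock = Mock()
unrestricted_mock.abc    # Silently "works", returning a new child mock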

Alright, now let's consider a slightly more interesting example. What if we wanted to ensure that our connection can properly read data that was sent to us through a socket?

def test_connection_reads_correct_data():
    mock_socket = Mock(spec_set=socket.socket)
    mock_socket.recv.return_value = bytearray("data", "utf-8")
    conn = TCPSyncConnection(mock_socket)

    received = conn.read(4)  # 4 bytes, for 4 characters in the word "data"
    assert received == bytearray("data", "utf-8")
    mock_socket.recv.assert_called_once_with(4)

Cool! But in real tests, we'll need something a bit more complicated: right now, our recv method will just naively return 4 bytes of data, no matter what length was passed to it. We can afford to do this here, as we know we'll be reading 4 bytes and we'll only make one recv call to do so. But what if our connection actually read the data procedurally, only reading a few bytes at a time, and then joined them together?

Well, this is a bit more complex, but it's still doable. Let's see it:

import socket
from unittest.mock import Mock

import pytest

from mcproto.connection import TCPSyncConnection
from tests.helpers import CustomMockMixin  # Explained later, in its own section


class ReadFunctionMock(Mock):
    def __init__(self, *a, combined_data: bytearray, **kw):
        super().__init__(*a, **kw)
        self.combined_data = combined_data

    def __call__(self, length: int) -> bytearray:
        """Override mock's __call__ to make it return part of our combined_data bytearray.

        This allows us to define the combined data we want the mocked read function to be
        returning, and have each call only take requested part (length) of that data.
        """
        self.return_value = self.combined_data[:length]
        del self.combined_data[:length]
        return super().__call__(length)

class MockSocket(CustomMockMixin, Mock):
    spec_set = socket.socket

    def __init__(self, *args, read_data: bytearray, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        self._recv = ReadFunctionMock(combined_data=read_data)

    def recv(self, length: int) -> bytearray:
        return self._recv(length)


def test_connection_partial_read():
    mock_socket = MockSocket(read_data=bytearray("data", "utf-8"))
    conn = TCPSyncConnection(mock_socket)

    data1 = conn.read(2)
    assert data1 == bytearray("da", "utf-8")
    data2 = conn.read(2)
    assert data2 == bytearray("ta", "utf-8")

def test_connection_empty_read_fails():
    mock_socket = MockSocket(read_data=bytearray())
    conn = TCPSyncConnection(mock_socket)

    with pytest.raises(IOError, match="Server did not respond with any information."):
        conn.read(1)

Well, that was a lot! But it finally gave us an idea of what mocks can look like in tests, and how they help us represent the objects that they're "acting" as.

Mocking coroutines

By default, the unittest.mock.Mock and unittest.mock.MagicMock classes cannot mock coroutines, since the __call__ method they provide is synchronous. AsyncMock, introduced in Python 3.8, is an asynchronous version of MagicMock that can be used anywhere a coroutine is expected.
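As a small, generic illustration (not code from our code-base), mocking a coroutine with AsyncMock could look like this:

import asyncio
from unittest.mock import AsyncMock


async def greet(fetcher) -> str:
    # `fetcher.get_name` is expected to be a coroutine function
    name = await fetcher.get_name()
    return f"Hello, {name}!"


def test_greet():
    mock_fetcher = AsyncMock()
    mock_fetcher.get_name.return_value = "ItsDrike"

    result = asyncio.run(greet(mock_fetcher))

    assert result == "Hello, ItsDrike!"
    mock_fetcher.get_name.assert_awaited_once()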

CustomMockMixin class

While Mock classes are pretty well written, there are some features which we often want to change. For this reason, we have a special mixin class: tests.helpers.CustomMockMixin, which performs these custom overrides for us.

Namely, we stop the propagation of the spec_set restriction into child mocks. Let's see an example to better understand what this means:

from unittest.mock import Mock

from tests.helpers import CustomMockMixin


# A hand-written custom mock, applying the spec_set restriction in __init__:
class NormalDictMock(Mock):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, spec_set=dict, **kwargs)


# The same mock, written with CustomMockMixin:
class CustomDictMock(CustomMockMixin, Mock):
    spec_set = dict


normal_mock = NormalDictMock()
custom_mock = CustomDictMock()

# Let's run the `pop` method, which is accessible from both mocks, as it's a
# part of the `dict`'s specification.
x = normal_mock.pop("abc")
y = custom_mock.pop("abc")

x.foobar()  # AttributeError: Mock object has no attribute 'foobar'
y.foobar()  # Works!

x.pop("x")  # Works
y.pop("x")  # Works

As you can see from the example above, mocks return new child mocks whenever an attribute is accessed or a call is made. With a hand-written mock class like NormalDictMock, which applies a spec_set in its __init__, these child mocks are instances of that same class, and so they end up limited to the same spec_set. However, in most cases, the attributes/functions of the mocked class wouldn't actually hold/return instances of that same class. They can really hold anything, so this kind of limitation doesn't make much sense, which is why CustomMockMixin instead returns regular, unrestricted mocks as the child mocks.

Additionally, CustomMockMixin also provides support for using spec_set as a class attribute, which regular mocks don't have. This has proven to be quite useful when making custom mock classes, as the alternative would be to override __init__ and pass the spec_set attribute manually each time (as NormalDictMock does in the example above).

Over time, more helpful features might be added to this class, and so it's advised to always inherit from it whenever making a mock object, unless you have a good reason not to.

Patching

Even though mocking is a great way to let us use fake objects acting as real ones, without patching we can only pass mocks in as arguments. That greatly limits what we can test, as some functions call/reference the external resources we'd like to mock directly inside their bodies, without any way to override them through arguments.

Cases like these are where patching comes into the picture. Basically, patching is about (usually temporarily) replacing some object from built-in or external code with a mock, or some other object that we can control from the tests.

A good example is the open function for reading/writing files. We likely don't want any actual files to be written during the tests, but we might need to test a function that writes these files, and perhaps check that the written content matches some pattern, ensuring that it works properly.

While there is some built-in support for patching in the unittest.mock library, we generally use pytest's monkeypatch, as it's available as a fixture and integrates well with the rest of our test code-base, which is written with pytest in mind.
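As a minimal, generic sketch (not code from our test suite) of how the monkeypatch fixture can be used:

import random


def roll_dice() -> int:
    """Function under test: depends on an external source of randomness."""
    return random.randint(1, 6)


def test_roll_dice(monkeypatch):
    # Temporarily replace random.randint with a predictable stub;
    # monkeypatch automatically undoes this once the test finishes.
    monkeypatch.setattr(random, "randint", lambda a, b: 4)
    assert roll_dice() == 4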

Some considerations

Finally, there are some considerations to make when writing tests, both for writing tests in general and for writing tests for our project in particular.

Test coverage is a starting point

Having test coverage is a good starting point for unit testing: If a part of your code was not covered by a test, we know that we have not tested it properly. The reverse is unfortunately not true: Even if the code we are testing has 100% branch coverage, it does not mean it's fully tested or guaranteed to work.

One problem is that 100% branch coverage may be misleading if we haven't tested our code against all the realistic input it may get in production. For instance, take a look at the following format_join_time function and the test we've written for it:

# Source file:
from typing import Optional
from datetime import datetime

def format_join_time(name: str, time: Optional[datetime] = None) -> str:
    str_time = time.strfptime("%d-%m-%Y") if time else "unknown"
    return f"User {name!r} has joined, time: {str_time}"

# Test file:
from source_file import format_join_time

def test_format_join_time():
    res = format_join_time("ItsDrike", None)
    assert res == "User 'ItsDrike' has joined, time: unknown"

If you were to run this test, the function would pass it, and the branch coverage would show 100% coverage for this function. Can you spot the bug the test suite did not catch?

The problem here is that we have only tested our function with a time that was None. That means that time.strfptime("%d-%m-%Y") was never executed during our test, leading to us missing the spelling mistake in strfptime (it should be strftime).

Adding another test would not increase the test coverage we have, but it does ensure that we'll notice that this function can fail with realistic data:

from datetime import datetime


def test_format_join_time_with_non_none_time():
    res = format_join_time("ItsDrike", datetime(2022, 12, 31))
    assert res == "User 'ItsDrike' has joined, time: 2022-12-31"

Leading to the test catching our bug:

collected 2 items
run-last-failure: rerun previous 1 failure first

tests/test_foo.py::test_format_join_time_with_non_none_time FAILED                                [ 50%]
tests/test_foo.py::test_format_join_time PASSED                                                   [100%]

=============================================== FAILURES ===============================================
_______________________________ test_format_join_time_with_non_none_time _______________________________

    def test_format_join_time_with_non_none_time():
>       res = format_join_time("ItsDrike", datetime(2022, 12, 31))

tests/test_foo.py:11:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

name = 'ItsDrike', time = datetime.datetime(2022, 12, 31, 0, 0)

    def format_join_time(name: str, time: Optional[datetime] = None) -> str:
>       str_time = time.strfptime("%d-%m-%Y") if time else "unknown"
E       AttributeError: 'datetime.datetime' object has no attribute 'strfptime'. Did you mean: 'strftime'?

mcproto/foo.py:5: AttributeError
======================================= short test summary info ========================================
FAILED tests/test_foo.py::test_format_join_time_with_non_none_time - AttributeError: 'datetime.datetime'
 object has no attribute 'strfptime'. Did you mean: 'strftime'?
=============================== 1 failed, 1 passed, 2 warnings in 0.02s ================================

What's more, even if the spelling mistake had not been there, the first test did not check whether the format_join_time function formats the join time according to the output we actually want to see.

All in all, it's not only important to consider whether all statements or branches were touched at least once by a test, but also whether they are thoroughly tested in all situations that may happen in production.

Unit Testing vs Integration Testing

Another restriction of unit testing is that it tests, well, in units. Even if we can guarantee that the units work as they should independently, we have no guarantee that they will actually work well together. What's more, while the mocking described above gives us a lot of flexibility in factoring out external code, we work under the implicit assumption that we fully understand those external parts and are utilizing them correctly. What if our mocked socket object works with a send method, but it got changed to a send_message method in a recent update? It could mean our tests are passing, but the code they're testing still doesn't work in production.

The answer to this is that we also need to make sure that the individual parts come together into a working application. Since we currently have no automated integration or functional tests, it's still very important to manually test out the code you've written, in addition to running the unit tests.

Additional resources

Footnotes

This document was heavily inspired by python-discord's tests README