| 1 | * Functional programming simplifies test-driven development |
|---|
| 2 | |
|---|
| 3 | Extreme Programming includes a fundamental programming technique |
|---|
| 4 | called Test Driven Development. The idea is to let the program and the |
|---|
| 5 | design emerge from a series of cycles: |
|---|
| 6 | |
|---|
| 7 | - write an assertion about what the code should do; |
|---|
| 8 | - make the code pass the assertion; |
|---|
| 9 | - refactor the code (simplify it, make it expressive and remove |
|---|
| 10 | - duplication, all without breaking the tests). |
|---|
| 11 | |
|---|
| 12 | The first step, writing an assertion about the code, comes in various |
|---|
| 13 | levels of complexity. |
|---|
| 14 | |
|---|
| 15 | ** Canonical programmer tests |
|---|
| 16 | |
|---|
| 17 | The most straightforward assertions look like: |
|---|
| 18 | |
|---|
| 19 | assertEquals ("hello", substring (5, "hello world")); |
|---|
| 20 | |
|---|
| 21 | The simplicity comes from the fact that the function under test is |
|---|
| 22 | free-standing (it does not require an object to be instanciated), its |
|---|
| 23 | arguments are of primitive types (which may be provided as literals), |
|---|
| 24 | and it has no side-effects (it returns a value which depends only on |
|---|
| 25 | its arguments, and does nothing else). |
|---|
| 26 | |
|---|
| 27 | When assertions can be that straightforward, it only takes a few more |
|---|
| 28 | lines of test code to completely define the behaviour of the function: |
|---|
| 29 | |
|---|
| 30 | assertEquals ("orld", substring (-4, "hello world")); |
|---|
| 31 | assertEquals ("", substring (0, "hello world")); |
|---|
| 32 | assertEquals ("", substring (5, "")); |
|---|
| 33 | assertEquals ("", substring (5, NULL)); |
|---|
| 34 | |
|---|
| 35 | The conciseness and the ability to see side by side all the different |
|---|
| 36 | cases contribute to turning these tests into documentation of the |
|---|
| 37 | tested function. Such a test can also easily be re-expressed in a |
|---|
| 38 | data-driven fashion. |
|---|
| 39 | |
|---|
| 40 | ** The problem with objects |
|---|
| 41 | |
|---|
| 42 | Several characteristics of object-oriented languages contribute to |
|---|
| 43 | making most test assertions more complex than this. |
|---|
| 44 | |
|---|
| 45 | *** Object initialisation |
|---|
| 46 | |
|---|
| 47 | If the function being tested is not free-standing, its target object |
|---|
| 48 | needs to be instanciated and initialised. The same is true of any |
|---|
| 49 | arguments that are not primitive types. |
|---|
| 50 | |
|---|
| 51 | In the best of cases, a default initialisation is appropriate, or the |
|---|
| 52 | object provides convenient initialisation functions from one or more |
|---|
| 53 | appropriate primitive types. Often, however, putting the object under |
|---|
| 54 | test or an argument into the desired state may require initialising a |
|---|
| 55 | separate object, or calling several methods on the object. |
|---|
| 56 | |
|---|
| 57 | *** Complex or multiple return types |
|---|
| 58 | |
|---|
| 59 | When a function produces several results, procedural |
|---|
| 60 | programming requires the use of "out" arguments |
|---|
| 61 | (arguments that will be modified by the |
|---|
| 62 | function). These results cannot be tested in a single |
|---|
| 63 | expression: the assertions must be made after the call. |
|---|
| 64 | |
|---|
| 65 | Object-oriented programming offers the additional possibility of |
|---|
| 66 | returning an object, i.e. a complex type which may hold more than one |
|---|
| 67 | piece of information. In some cases, when the returned object offers a |
|---|
| 68 | method which tells us what we want to know, testing it is |
|---|
| 69 | straightforward. Often, however, it is necessary to make several |
|---|
| 70 | assertions on it to be certain of its state. |
|---|
| 71 | |
|---|
| 72 | *** State must be queried |
|---|
| 73 | |
|---|
| 74 | A fundamental principle of object-oriented design is |
|---|
| 75 | encapsulation and data hiding. In the context of |
|---|
| 76 | testing, the consequence of this is that the effects of |
|---|
| 77 | a given function call are rarely all visible in the |
|---|
| 78 | return value. Separate methods (queries, or "getters" as |
|---|
| 79 | they are called in Java) must be called to inspect the |
|---|
| 80 | state of an object after the function under test has |
|---|
| 81 | been called. Thus test expressions can no longer be |
|---|
| 82 | purely declarative (in/out). |
|---|
| 83 | |
|---|
| 84 | *** Effects propagate |
|---|
| 85 | |
|---|
| 86 | Object-oriented concepts are frequently used to model |
|---|
| 87 | something, typically the application domain. This |
|---|
| 88 | implies creating relationships (owns, uses...) between |
|---|
| 89 | objects. As a result, the effects of calling a given |
|---|
| 90 | function, may not even be localised in the target |
|---|
| 91 | object! In that case, testing a single function will |
|---|
| 92 | require more work setting up the secondary objects, and |
|---|
| 93 | more work retrieving and querying them. |
|---|
| 94 | |
|---|
| 95 | A popular alternative involves providing mock objects |
|---|
| 96 | to the object under test, and testing that it interacts |
|---|
| 97 | as expected with these. This approach doesn't however |
|---|
| 98 | reduce the total amount of work involved in setting |
|---|
| 99 | things up before calling the function and/or |
|---|
| 100 | investigating things after the call. |
|---|
| 101 | |
|---|
| 102 | *** Inheritance |
|---|
| 103 | |
|---|
| 104 | Finally, inheritance is another cornerstone of |
|---|
| 105 | object-oriented design which adds complexity to |
|---|
| 106 | tests. Inheritance defers all or part of the work done |
|---|
| 107 | by a function into subclasses. |
|---|
| 108 | |
|---|
| 109 | When all the work is deferred, there is nothing to test |
|---|
| 110 | in the base class. This is simple, but not particularly |
|---|
| 111 | useful (i.e. there is a lack of tests). |
|---|
| 112 | |
|---|
| 113 | When part of the work is deferred, it is possible to |
|---|
| 114 | test those parts that have not been deferred, but this |
|---|
| 115 | requires creating a subclass just for the purposes of |
|---|
| 116 | the test, which doesn't contribute to simple tests. |
|---|
| 117 | |
|---|
| 118 | ** Introducing functional programming |
|---|
| 119 | |
|---|
| 120 | Functional languages offer a different programming paradigm from |
|---|
| 121 | object orientation, which happens to be ideally suited to the writing |
|---|
| 122 | of canonically simple programmer tests. |
|---|
| 123 | |
|---|
| 124 | Indeed, the "raison d'être" of functional programming is known as |
|---|
| 125 | "referential transparency", i.e. that functions have no side effects: |
|---|
| 126 | they are given immutable arguments and return immutable results. The |
|---|
| 127 | motivation for this was to make it simpler to /reason/ about programs, |
|---|
| 128 | and to make it possible to prove their correctness. It turns out that |
|---|
| 129 | this property, along with a few other nice features that come with |
|---|
| 130 | functional languages, makes it very simple to write simple, expressive |
|---|
| 131 | programmer tests. |
|---|
| 132 | |
|---|
| 133 | *** Dynamic data structures |
|---|
| 134 | |
|---|
| 135 | Functional programming promotes the exact opposite of object oriented |
|---|
| 136 | encapsulation: data are kept separate from the functions that handle |
|---|
| 137 | them. Data structures are visible, get passed into functions, and new |
|---|
| 138 | ones come out. |
|---|
| 139 | |
|---|
| 140 | In order to promote program correctness, data is also immutable: a |
|---|
| 141 | function cannot modify the data it receives, it can only produce new |
|---|
| 142 | data. Many functional languages go even further by making it |
|---|
| 143 | impossible to modify the value assigned to variables within a given |
|---|
| 144 | function (this is often referred to as single-assignment). |
|---|
| 145 | |
|---|
| 146 | To make all this practicable, it needs to be easy to create complex |
|---|
| 147 | data structures on the fly as easily as it is to write a litteral |
|---|
| 148 | number or other primitive type. Functional languages usually have |
|---|
| 149 | built-in lists and tuples, from which arbitrarily complex data |
|---|
| 150 | structures can be built. |
|---|
| 151 | |
|---|
| 152 | This happens to be exactly what is needed, when expressing unit tests, |
|---|
| 153 | to solve two of the problems identified with objects: initialization |
|---|
| 154 | and complex/multiple return types. |
|---|
| 155 | |
|---|
| 156 | In the case of initialization, arbitrarily complex data structures |
|---|
| 157 | can be expressed as literals: |
|---|
| 158 | |
|---|
| 159 | ... |
|---|
| 160 | |
|---|
| 161 | As for complex or multiple return types, in functional languages |
|---|
| 162 | these are always handled by returning a tuple or list of |
|---|
| 163 | results, because arguments are immutable. And the return value(s) can |
|---|
| 164 | simply be compared to a literal with the expeced tuple of list: |
|---|
| 165 | |
|---|
| 166 | ... |
|---|
| 167 | |
|---|
| 168 | Of course this would become unwieldy with very complex data |
|---|
| 169 | structures if it were not for pattern matching. |
|---|
| 170 | |
|---|
| 171 | *** Pattern matching |
|---|
| 172 | |
|---|
| 173 | Pattern matching in single-assignment functional languages (WHICH???) is a |
|---|
| 174 | powerful mecanism that combines assignment, assertion, packing and |
|---|
| 175 | unpacking of data structures. Examples will provide the clearest |
|---|
| 176 | explanation. |
|---|
| 177 | |
|---|
| 178 | First, an ordinary-looking expression: |
|---|
| 179 | |
|---|
| 180 | Author = "Knuth" |
|---|
| 181 | |
|---|
| 182 | An attempt is made to match the right-hand side with the left-hand |
|---|
| 183 | side. Assuming the variable Author has not been bound before, the match |
|---|
| 184 | succeeds, which means that the variable Author is bound to |
|---|
| 185 | "Knuth". This is like an assignment. |
|---|
| 186 | |
|---|
| 187 | Now, written the other way round: |
|---|
| 188 | |
|---|
| 189 | "Knuth" = Author |
|---|
| 190 | |
|---|
| 191 | This time, an attempt is made to match the value of Author to a |
|---|
| 192 | constant expression: the match succeeds (because Author happens to be |
|---|
| 193 | bound to the same value). This is like an assertion. |
|---|
| 194 | |
|---|
| 195 | The following assertion-like match fails (throws a runtime exception): |
|---|
| 196 | |
|---|
| 197 | "Dijkstra" = Author |
|---|
| 198 | |
|---|
| 199 | Indeed, Author is bound to the value "Knuth", which doesn't match the |
|---|
| 200 | constant expression on the left-hand side. |
|---|
| 201 | |
|---|
| 202 | Now if we write: |
|---|
| 203 | |
|---|
| 204 | Author = "Dijkstra" |
|---|
| 205 | |
|---|
| 206 | This match also fails, because Author has already been bound so its |
|---|
| 207 | value can no longer be changed (single assignment). |
|---|
| 208 | |
|---|
| 209 | Now let's introduce data structures: |
|---|
| 210 | |
|---|
| 211 | Book = {123, Author, "The Art of Programming"} |
|---|
| 212 | |
|---|
| 213 | The variable Book is unbound, so the match succeeds. Here, in |
|---|
| 214 | addition to an assignment, we have packed a tuple into a single |
|---|
| 215 | variable. |
|---|
| 216 | |
|---|
| 217 | Now, we can do a simple assertion: |
|---|
| 218 | |
|---|
| 219 | {123, "Knuth", "The Art of Programming"} = Book |
|---|
| 220 | |
|---|
| 221 | Or, we can do some unpacking: |
|---|
| 222 | |
|---|
| 223 | {Id, Author2, Title} = Book. |
|---|
| 224 | |
|---|
| 225 | This expression is actually doing three things: |
|---|
| 226 | |
|---|
| 227 | 1) Asserting that Book is bound to a 3-tuple |
|---|
| 228 | 2) Unpacking the 3-tuple |
|---|
| 229 | 3) Binding its elements to new variables. |
|---|
| 230 | |
|---|
| 231 | We could have done more asserting and less binding: |
|---|
| 232 | |
|---|
| 233 | {123, "Knuth", Title2} = Book |
|---|
| 234 | |
|---|
| 235 | This only binds the title, but asserts not only that book holds a |
|---|
| 236 | 3-tuple, but also the values of its first two elements. |
|---|
| 237 | |
|---|
| 238 | Finally, we could have done less binding /and/ less asserting, using |
|---|
| 239 | the special "ignore" variable: |
|---|
| 240 | |
|---|
| 241 | {_, _, Title3} = Book. |
|---|
| 242 | |
|---|
| 243 | "_" is a special variable that matches anything and never gets bound. |
|---|
| 244 | |
|---|
| 245 | With pattern matching, the components of complex values returned by |
|---|
| 246 | functions can be tested or bound for further use, all in a single |
|---|
| 247 | expression. |
|---|
| 248 | |
|---|
| 249 | Here is a complete example of a unit test: |
|---|
| 250 | |
|---|
| 251 | unique_test() -> |
|---|
| 252 | [] = unique([]), |
|---|
| 253 | [1, 3, 8] = unique([1, 3, 8]), |
|---|
| 254 | [1, 3, 6, 12] = unique([1, 1, 3, 1, 6, 12, 6, 1]). |
|---|
| 255 | |
|---|
| 256 | *** Declarative programming |
|---|
| 257 | |
|---|
| 258 | Pattern matching has another use in functional languages, which is to define functions declaratively. A function can have several clauses, according to the form (pattern) of its arguments. Typically, there will be a clause for the general case and specific clauses for borderline cases. There is a profound ressemblance, and synergy, between this and TDD, especially in its purest form when a function is designed gradually by making it work for increasingly complex examples, one at a time. |
|---|
| 259 | |
|---|
| 260 | For example, let's revisit the classic Fibonacci example, which Kent Beck uses in "Test-Driven Development". |
|---|
| 261 | |
|---|
| 262 | We'll start by creating a unit test and making an assertion about a single data point: |
|---|
| 263 | |
|---|
| 264 | >>>> Cf. adlib:contains/2 (DW au clavier). |
|---|
| 265 | |
|---|
| 266 | And that (aside from tail-call optimization) is the final answer in functional, declarative style. Note how naturally it came out of the tests. The two special cases never even get modified! |
|---|
| 267 | |
|---|
| 268 | *** Lambda, the ultimate mock |
|---|
| 269 | |
|---|
| 270 | Referential transparency, dynamic data structures and pattern matching all contribute to functions being easier to test, but what about the last step in the TDD cycle: simplifying the code through expressiveness and abstraction? |
|---|
| 271 | |
|---|
| 272 | Object-oriented languages use inheritance, interfaces and polymorphism for abstraction. I have often argued that TDD is good for object-oriented design because it pushes for the creation of interfaces or base classes, in order to attain testability. However, testing complex object designs is itself not easy, and as pointed out above, requires special techniques such as mocks. One of the problems with mocking is that it can't be done if the code under test has not been designed with this in mind (i.e. has been programmed to an interface rather than to a concrete class). |
|---|
| 273 | |
|---|
| 274 | In functional programming, the abstraction mecanism is the higher-order function: a function that only performs a restricted, abstract task, and delegates the rest (the specifics) to another function. The canonical example is a function that maps each element in a list to another. The abstract, higher-order function only deals with the traversal and the creation of the resulting list. The specific transformation is performed by another function. |
|---|
| 275 | |
|---|
| 276 | >>>> Convert previous unit test into data-driven (NC au clavier). |
|---|
| 277 | |
|---|
| 278 | This form of abstraction offers several advantages, in terms of testability, over object-oriented design: |
|---|
| 279 | |
|---|
| 280 | - the abstraction mecanism of functional programming, higher-order functions, intrinsically and necessarily offers an 'interface' or 'protocol' (the function signature); |
|---|
| 281 | |
|---|
| 282 | - the specific function must be passed in to the generic function, making it easy to pass in a dummy for the purposes of testing; |
|---|
| 283 | |
|---|
| 284 | - lamda expressions (closures) make it possible to create such dummy functions on the fly, within the test itself. |
|---|
| 285 | |
|---|
| 286 | - pattern matching (and dynamic typing) can drastically simplify the code of the lambda to avoid distracting from the generic function under test. |
|---|
| 287 | |
|---|
| 288 | >>>> TDD de adlib:first (NC au clavier). |
|---|