ds_share_handbook/testing.qmd at master · jpcompartir/ds_share_handbook · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
---
execute:
  eval: false
---

::: {.callout-warning}
This document is not rendered by the handbook so some code samples may be out of date/not working. (sorry!)
:::

# Testing in R

When developing packages in R, we usually lean on {testthat}. Creating unit tests with {testthat} (leaving aside integration tests for now) is pretty simple, first we write a function which performs some actions, then we write some tests which check that our function still performs those actions - or if following Test Driven Development (TDD) practices, we write the tests first and then write the functionality, in either case, {testthat} makes it pretty seamless.

As a quick refresher, let's look at how to write some basic unit tests in R; for brevity we won't follow a strict TDD workflow - we'll test a couple of functions that we've inherited - `is_even()` and `is_odd()`.

::: {.callout-tip}
The `is_odd()` function calls the `is_even()` function, so it makes sense to test the `is_even()` function first. Once we've tested the implementation of `is_even()`, we only need to test the additional logic introduced by `is_odd()`.
:::

## Testable Functions

```{r}
is_even <- function(number) {

  stopifnot(is.numeric(number), number != 0)

  return(number %% 2 == 0)
}


is_odd <- function(number) {
  stopifnot(is.numeric(number), number != 0)

  return(!is_even(number))
}
```

Our two functions should take a number as an input, and check that the number isn't zero. `is_even` should return TRUE if the number is divisible by 2 and FALSE otherwise. `is_odd` calls `is_even` and then flips the truth value, so that if a number isn't even, and it's not 0, it's odd.

It clearly makes sense to test `is_even` first, because `is_odd` depends on it. So what should we test? Echoing Einstein's famous words on simplicity, tests should test everything the function does, nothing more and nothing less. Obviously we should check that our input validation is working, if we input 0 do we get an error, the same if we input a non-numeric. Then we should check a few return values, some that the function should return FALSE to and some that the function should return TRUE to. And then there's a slightly less obvious test - our function has one argument and that argument is mandatory, i.e. it has no default value; if we've made this decision we should have made it for a reason, so we should test the function does error if no value is set.

Reminder that unit tests should:

-   **be simple**:  *tests are not the place to show off what you can do, you should be able to understand at a glance what's being tested and how the test works i.e. favour writing each test out rather than wrapping a bunch in a `vapply()` or a `map()`.*
-   **be lightweight**: *you're usually going to want to write hundreds of them for a package and check them often. Each test should run in fractions of a second, a heavy test suite won't be used and so becomes self-defeating*
-   **be self-contained**: *don't pass data around between tests, each test_that block should be able to start and terminate in isolation*
-   **be informative**: *they should provide helpful error messages when they fail, so that you know precisely which part of your code's logic is broken and where*
-   **be comprehensive**: *if your code should do something, write a test to show it does*
-   **look both ways**: *if your code shouldn't do something, write a test to show it doesn't*

Writing tests may at times feel cumbersome, but only at the beginning. Once you've got a good test suite up development becomes more enjoyable - less anxiety associated with each change you make or feature you implement - and faster (trust!). You should often feel like you're insulting your own intelligence and that of your colleagues' by writing such a simple test "Well duh, of course it does that..."

## Our First Tests

There are a number of other, more specific tests for more advanced users, but let's stick to expect_error, expect_true, and expect_false for now.

```{r}
library(testthat)
test_that("is_even has an argument called number and it requires an input", {

  expect_true(names(formals(is_even)) == "number")
  expect_error(is_even(),
               regexp = 'argument "number" is missing')
})
```

We're going to make a change to is_even, to show that these tests can fail if the underlying logic of is_even changes resulting in changes in the function's behaviour (this isn't necessary except for explanatory purposes).

```{r}
is_even_inputs <- function() {
  test_that("is_even has an argument called number and it requires an input", {

    expect_true(
      names(formals(is_even)) == "number")

    expect_error(
      is_even(),
      regexp = 'argument "number" is missing'
    )
  })
}


is_even_inputs()
```

Ok, so the test passes. But what if I want to change the input is_even takes to 'x' which is a more common input?

```{r, error = TRUE}
is_even <- is_even <- function(x) {

  stopifnot(is.numeric(x), x != 0)

  return(x %% 2 == 0)
}

is_even_inputs()
```

We see that we get a test failure: -- Failure: is_even has an argument called number and it requires an input ----- names(formals(is_even)) == "number" is not TRUE

This is exactly what we wanted. We wrote a function, wrote some tests, changed the function's behaviour and then running our tests told us that we'd altered the function's behaviour. At this point we should either fix our function - if indeed we broke it - or update our tests. We'll fix the function as the tests are still doing what we want them to. Then we'll check our old tests still pass.

```{r}
is_even <- function(number) {

  stopifnot(is.numeric(number), number != 0)

  return(number %% 2 == 0)
}

is_even_inputs()
```

Ok, so let's carry on with testing the function. We'll establish that our function doesn't take 0 as an input, and that if we feed it a string, or a string that could be coerced into a numeric that the function errors. This last one might seem like a funny test, but we haven't explicitly asked our function to coerce its inputs, so we should check that it does not.

```{r}
test_that("is_even errors if given a non-numeric input, or 0 as an input", {
  expect_error(is_even(0),
               regexp = stringr::fixed('number != 0'))

  expect_error(
    is_even("string"),
    regexp = "is\\.numeric"
  )

  expect_error(
    is_even("10"),
    regexp = "is\\.numeric"
  )
})
```

And then finally we can test that the return values are what we expect:

```{r}
test_that("is_even returns a logical, and that logical is TRUE if given an even input, and FALSE if given an odd.", {
  expect_true(
    inherits(is_even(10), "logical")
  )

  expect_true(
    inherits(is_even(9), "logical")
  )

  expect_true(
    is_even(10) == TRUE
  )
  expect_false(
    is_even(9) == TRUE
    )

  #and another value, just to be sure...
  expect_true(
    is_even(10002) == TRUE
  )
})
```

## Refactoring is_odd

We've now tested that our is_even function does what it should, and doesn't do what it shouldn't. We could add more tests, like what happens if we input a data frame as number, or a factor? Or if 8938957 and 23665 are odds, but we feel quite confident that our current cases take care of those.

We haven't tested is_odd yet, but let's take another look at our function definitions and see if we can't simplify the logic somehwat.

```{r}
is_even <- function(number) {

  stopifnot(is.numeric(number), number != 0)

  return(number %% 2 == 0)
}

is_odd <- function(number) {
  stopifnot(is.numeric(number), number != 0)

  return(!is_even(number))
}
```

We've written a pretty lightweight and comprehensive test suite for is_even, so do we just go ahead and write the same tests for is_odd? We don't really need to, because is_odd calls is_even anyway. So let's simplify is_odd:

```{r}
is_odd <- function(number) {
  return(!is_even(number))
}
```

Informally test a few values:

```{r, error= TRUE}
is_odd("string")
is_odd(0)
```

So we can see that is_odd is producing the errors we would expect it to, because the logic is cemented in is_even. Our tests for is_odd don't *really* need to duplicate this logic, so we could test one each odd-signalling end digit, and each even-signalling end digit.

```{r}
test_that("is_odd returns TRUE for odd numbers and FALSE for even numbers", {

  expect_true(is_odd(11))
  expect_true(is_odd(333))
  expect_true(is_odd(555))
  expect_true(is_odd(37))
  expect_true(is_odd(49))


  expect_false(is_odd(10))
  expect_false(is_odd(4))
  expect_false(is_odd(638))
  expect_false(is_odd(132))
  expect_false(is_odd(666))
})
```

It's overkill to do this, but there's an important point to be made. You might look at this and think 'shouldn't I just apply a list of numbers, rather than write each test out, to avoid duplication?'

```{r}
test_that("is_odd returns TRUE for odd numbers and FALSE for even numbers", {
  odds <- list(11, 333, 555, 37, 49)
  lapply(odds, function(odd) {
    expect_true(is_odd(odd))
  })

  evens <- list(10, 4, 638, 132, 666)
  lapply(evens, function(even){
    expect_false(is_odd(even))
  })
})
```

## Keep it Simple, Stupid

Whilst this is generally good practice, it's not ideal in the case of testing because when a test fails, our error messages are less informative. For brevity we'll add an odd value to our evens list, and apply that list over our tests:

```{r, error = TRUE}

test_that("is_odd returns FALSE when given even inputs",{
  evens <- list(10, 4, 638, 132, 666, 17)
  lapply(evens, function(even){
    expect_false(is_odd(even))
  })
})

```

We see that the error message we get back doesn't tell us which of our inputs failed, just that we expected a FALSE and we got a TRUE, somewhere. In this case it's pretty obvious, but there are times when testing things like shiny UI components where it's tempting to put all the UI tags into a list of tags and l/vapply them into an expect function to keep the testing code concise and avoid duplication. However, we want our tests to be informative more than we want them to adhere to Do Not Repeat Yourself principles.

# Testing Shiny apps

Testing feels pretty straightforward for R packages with {testthat} but it was not built with Shiny in mind. Shiny introduces reactive programming to R users, and it's not self-evident how to test reactive components and applications via {testthat}'s traditional testing approach. In fact, when I sat down to start testing Shiny apps, I realised that not only could I not see how to do it, I didn't know how to articulate why I couldn't `just do it`. I stared at the screen for a while with that unpleasant sense of 'I don't know what I'm doing', looked at a few help pages, and eventually went back to building out more features (don't do this!).

Let's steal a basic shiny app from the `sidebarLayout` documentation. From the code it's pretty clear that we'll have a one page app, with a sidebar layout. In the sidebar we'll have a slider input which allows us to select a number of observations and then in the main panel we'll output a histogram. The server then reacts to changes in the slider's input, and generates a new histogram each time.

```{r, eval = FALSE}
library(shiny)

# Define UI
ui <- fluidPage(

  # Application title
  titlePanel("Hello Shiny!"),
  sidebarLayout(
    # Sidebar with a slider input
    sidebarPanel(
      sliderInput("obs",
                  "Number of observations:",
                  min = 0,
                  max = 1000,
                  value = 500)
    ),
    # Show a plot of the generated distribution
    mainPanel(
      plotOutput("distPlot")
    )
  )
)

# Server logic
server <- function(input, output, session) {
  output$distPlot <- renderPlot({
    hist(rnorm(input$obs))
  })
}
if (interactive()) {
# Complete app with UI and server components
shinyApp(ui, server)
}
```

## What's the problem?

Ideally we want to test the three main components - the UI, the server, and the call to combine the two. The first obstacle is the UI, it's not a function like we're used to testing. And we don't need to write a test to check that it's still not a function going forwards, we let the Shiny developers write their own tests.

```{r, eval=FALSE}
inherits(ui, "function")
```

I think it's quite common to build Shiny apps without really knowing what a shiny.tag.list is, and that's what a UI is.

```{r,eval=FALSE}
S3Class(ui)
```

And then there's the slightly puzzling unnamed list which has 4 elements

```{r, eval=FALSE}
length(ui); names(ui)
```

What are the elements, and what kinds of test can we run on them?

```{r, eval=FALSE}
ui[[1]]
ui[[2]]
ui[2]
ui[3]
class(ui[4][[1]][[1]])
```

We'll come back to this shortly.

The next problem is the server object - which is a function - but a slightly esoteric one.

```{r, eval=FALSE}
S3Class(server)
```

It takes three mandatory arguments - input, output, and session. The input argument is quite transparent - we use it to access inputs all the time when building Shiny apps, and similar for outputs with the output\$ object. As we can index them with \$ we think they're probably named lists of some description. But session is a little more opaque.

```{r}
formals(server)
```

So we know that if we want to test the server function we'll need to add input, output, and session but we don't really know what we should add there. Like the UI, we'll come back to this shortly.

So the shinyApp function, which has a more familiar look about it. It takes our ui and server as inputs, and builds the Shiny app for us. Source code:

```{r, eval = FALSE}
function (ui, server, onStart = NULL, options = list(), uiPattern = "/",
    enableBookmarking = NULL)
{
    if (!is.function(server)) {
        stop("`server` must be a function", call. = FALSE)
    }
    uiPattern <- sprintf("^%s$", uiPattern)
    httpHandler <- uiHttpHandler(ui, uiPattern)
    serverFuncSource <- function() {
        server
    }
    if (!is.null(enableBookmarking)) {
        bookmarkStore <- match.arg(enableBookmarking, c("url",
            "server", "disable"))
        enableBookmarking(bookmarkStore)
    }
    appOptions <- captureAppOptions()
    structure(list(httpHandler = httpHandler, serverFuncSource = serverFuncSource,
        onStart = onStart, options = options, appOptions = appOptions),
        class = "shiny.appobj")
}
```

There's some input validation,

inputs is a list of reactiveValues, output is a list of some values too. Session is... a bit different. And how do we access it programmatically?

Later - refactor to have min \> 0, as why would you want 0 breaks and to allow the erro?

# Testing a Golem Module

There is another layer of complexity if we build our apps with frameworks like {golem}. For the rest of this post, we'll assume some familiarity with {golem} and its modules.

In my case, the modules that I want to test take reactiveValues from other modules, or reactive objects, such as reactive data frames from other modules. This presents a barrier to testing, as in a general R or testthat session, we're not in a reactive context.

Now - I'm pretty sure I first read this in `Mastering Shiny by Hadley Wickham` - but it's important to remember that in R, virtually everything is a function, and reactives are no different. This means that we can mimic the behaviour of a reactive, by passing in a function to a module.

To make it more concrete, I have a module's server function which takes an id and a data frame. The function then calls the moduleServer function, which takes the id from my_module_server, and a server function as an input.

```{r, eval = FALSE}
my_module_server <- function(id, highlighted_dataframe) {
  moduleServer(id,
               function(input, output, session) # This is what we'd usually have as our server, e.g. server <- function(input, output, session)
  ) {

  }
}
```

At the moment the module doesn't actually do anything, but we have a skeleton in place and we can see that when we call `my_module_server` we have to provide an input for id and highlighted_dataframe.

Let's add some real logic, so that our module creates a reactive object, which updates whenever there's a change in our highlighted_dataframe input, or the updatePlotButton is pressed, and needs to have an x column in the highlighted_dataframe, plus an input set for topN and width + height. This module is a bit more complex, but still less complex than many modules will be.

```{r, eval = FALSE}
my_module_server <- function(id, highlighted_dataframe) {
  moduleServer(id, function(input, output, session)) {

    reactive_plot <- shiny::eventReactive( c(highlighted_dataframe(), input$updatePlotButton), {
      module_plot <- highlighted_dataframe() %>%
        make_module_plot(
          x_var = x,
          top_n = input$topN
        )
      return(module_plot)
    }

    output$modulePlot <- shiny::renderPlot({
      reactive_plot()
    }, res = 100, width = function() input$width, height = function() input$height
    )
    }
  }
}
```

In testing this module we want to know that given the right inputs, a plot is rendered. So how do we go about testing it?

My first pass with testServer was to do something like:

```{r}
#| eval: false
testServer(
  app = my_module_server,
  args = list(),
  expr = {
    ns <- session$ns

    #Check input isn't set
    expect_true(is.null(session$topN))

    #Set input
    session$setInputs(topN = 5)
    expect_true(input$topN == 5)

    #... some other code
  }
)
```

This test passed, as did the other tests that I wrote for the inputs, but then I realised that I could set anything as an input here, and the test would pass

```{r}
#| eval: false
testServer(
  app = my_module_server,
  args = list(),
  expr = {
    ns <- session$ns

    #Check input isn't set
    expect_true(is.null(session$topN))

    #Set input
    session$setInputs(topN = 5)
    expect_true(input$topN == 5)

    session$setInputs(shalabadaba = "shalabadooo")
    expect_equal(input$shalabadaba, "shalabadooo")
  }
)
```

So I realised that I wasn't testing what I thought I was, or what I needed to test. So I wanted to get a bit more information about what's actually happening in the testServer, like is there actually a reactive_plot being generated?

```{r, eval = FALSE}
testServer(
  app = my_module_server,
  args = list(),
  expr = {
    ns <- session$ns

    print(reactive_plot())
  }
)
```

So now I get the error that 'highlighted_dataframe was missing', which is a mandatory argument for the module server, now we're getting somewhere. Whereas before the tests were passing because they weren't really testing anything, the test is now failing in meaningful ways.

In more familiar R terms, the server was waiting until it had to do anything with reactive_plot before raising an error. So how do we solve it and check that a plot really is being generated?

```{r, eval = FALSE}

```

# Notes from other resources

## Mastering Shiny - Testing (Chapter 21 presently)

You can use browser() inside testServer to see what's going on with specific values/what your changes do and what will / won't work...

stopifnot(is.reactive(var)) - nice little trick for input validation in modules, e.g. for highlighted_dataframe()

testServer - Unlike the real world, time does not advance automatically. So if you want to test code that relies on reactiveTimer() or invalidateLater(), you'll need to manually advance time by calling session\$elapse(millis = 300).

testServer() ignores UI. That means inputs don't get default values, and no JavaScript works. Most importantly this means that you can't test the update\* functions, because they work by sending JavaScript to the browser to simulates user interactions. You'll require the next technique to test such code.

Wrap testServer in test_that

## Shiny App Packages - Testing (Section 3)

Testing the UI

```{r}
#| eval: false
mod_bigram_network_ui(id = "test")
```

## Gotchas + Reminders

Browse\[1\]\> class(bigram_reactive) \[1\] "reactive.event" "reactiveExpr" "reactive" "function"\
Browse\[1\]\> x \<- bigram_reactive() Browse\[1\]\> class(x) \[1\] "ggraph" "gg" "ggplot"

bigram_reactive is the reactive expression, untriggered. bigram_reactive() is the actual ggraph/gg/ggplot object now triggered. Always good to remind onself of this and what that means when interacting with the objects at various points of the SWD process.

Can use `ui <- mod_bigram_network_ui(id = "test")` and type ui to see all of the shiny tags, and then type ui\[\[1\]\] to render the UI in a viewer object, maybe easier than end app, run_app() -\> click to app.

## Testing interaction with nested modules

the mod_group_vol_time_server gets its filtered data out of the mod_daterange_input_server. This presents a challenge because each module is namespaced, so we can't just `setInputs(dateRange = list(as.Date("2023-01-03"), as.Date("2023-01-09"))` because we'd be setting the value of dateRange inside the wrong namespace - the namespace of mod_group_vol_time_server.

This means when we try to access the `group_date_range_vot$over_time_data()` to generate our groupued volume over time char, we get an error that the dateRange isn't set. So with the help of `ui <- mod_group_vol_time_ui(id = "test")` we can look for the correct input to set `dateRange` to, which is `dateRangeGroupVol-dateRange`. We've unearthed a general truth about nested modules. Our parent module is dateRangeGroupVol, our child is dateRange, so we join the two with `dateRangeGroupVol-dateRange`, if dateRange had a child module called dateRangeChild, we'd join the three with `dateRangeGroupVol-dateRange-dateRangeChild`!

```{r, eval = FALSE}

session$setInputs(
  # ns("dateRange") = list(as.Date("2023-01-03"), as.Date("2023-01-09")),
  dateBreak = "day",
  height = 600,
  width = 400,
  nrow = 2,
  #So to pass stuff into the modules that need them we can pre-prepend the namespace with syms
  `dateRangeGroupVol-dateRange` = list(as.Date("2023-01-03"), as.Date("2023-01-09"))
)
```

## Emulating a reactive with the return values specified:

because in our module we try to: label_ids \<- as.numeric(selected_range()\$key)

We can't just send in selected_range as c(1, 2, 3). We need to send it in as an object which imitates a reactive, with a return value of list(key = c(...)) so that when we call it in the module later on, we get the current value, and we can access the key. Tricky.

selected_range = function(){ return(list(key = c(1, 2, 3))) }

If wanting to change the length of the generate_dummy_data for whatever reason: function(){return(generate_dummy_data(length = 20))}