Conditional Control Flow in R

What is Control Flow?

One of the fundamental techniques in imperative-style programming is the use of control flow constructs to dynamically control the execution of a program. These tools are critical to the creation of anything but the most basic functions and allow programmers to specify abstract procedures with code.

Currently, R supports the following control flow constructs:

Name     Syntax
if       if(cond) cons.expr else if(alt.cond) alt.expr ... else final.expr
for      for(var in seq) expr
while    while(cond) expr
repeat   repeat expr

While one might already be familiar with these constructs from experience with another programming language, their incarnations in R possess a few peculiar properties. Foremost, in R, these constructs are actually expressions (as opposed to statements, as in many other languages). Consequently, they have a return value, which can be bound to a name (variable), and the expression itself can be quoted and evaluated later. Such behavior can be surprising at first, but is consistent with R’s principle of treating almost every syntactic element as an expression.
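As a minimal sketch of the "quoted and evaluated later" point, the snippet below captures an if expression with quote() and evaluates it with eval(). The names deferred and loop_value are illustrative only.

```r
x <- 7

# Capture the if expression without evaluating it
deferred <- quote(if (x %% 2 == 0) "even" else "odd")
is.call(deferred)
## [1] TRUE

# Evaluate it later; the result depends on the value of x at that time
eval(deferred)
## [1] "odd"
x <- 10
eval(deferred)
## [1] "even"

# Looping constructs are expressions too; for evaluates to NULL
loop_value <- for (i in 1:3) i
is.null(loop_value)
## [1] TRUE
```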

Conditional Control Constructs

In this article, we’ll examine what I’ll define as the conditional control constructs – those control constructs that alter program execution based on the truth value of some conditional expression. In R, there are two conditional control constructs: if and while.

If

The if control construct allows for the dynamic execution of one of a finite number of predetermined execution paths. Exactly which path the program follows depends on the value of one or more expressions called conditions (or test conditions).

One of the most common usages of the if construct is the binary form, where the test condition decides between two possible execution paths.

For example:

# %% is the modulo operator in R -- a %% b returns the remainder when a is divided by b
x <- 4
if(x %% 2 == 0){  # Here x %% 2 == 0 is the test condition
    "even"
} else {
    "odd"
}
## [1] "even"

For simple tests like the above example, the entire expression can be written on one line.

if(x %% 2 == 0) "even" else "odd"
## [1] "even"

Notice that in both of these examples, output is being printed to the console, even though no explicit call to print() is made. This is because, as mentioned, the if construct is actually an expression in R, and the default behavior of the interpreter at the top level is to print the value of any expression that is not assigned to a variable. So let’s try assigning it next.

result <- if(x %% 2 == 0) "even" else "odd"

Note that no output is printed this time.

typeof(result)
## [1] "character"
nchar(result)
## [1] 4
print(result)
## [1] "even"
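A related detail worth a quick sketch: when an if has no else branch and the test condition is FALSE, the expression evaluates to NULL (invisibly). The variable name no_else is just for illustration.

```r
x <- 3

# No else branch, and the condition is FALSE, so the value is NULL
no_else <- if (x %% 2 == 0) "even"
is.null(no_else)
## [1] TRUE
```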

Express Yourself

Now, it may be tempting to take this knowledge of the if construct as an expression and start using this behavior in your own code. However, except for simple expressions, I’d advise against relying on the assignment behavior of if expressions. For one, the behavior is rather uncommon among programming languages, so using this technique may make your code difficult for others to read and understand. Also, as if expressions grow more complicated, using this assignment behavior increases the distance between the variable name being bound and the evaluation of the value assigned to it. This can also make the code harder to reason about.
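To make the trade-off concrete, here is a small sketch contrasting the two styles. The variables score, status, and grade are hypothetical and not part of the earlier examples.

```r
score <- 72

# Simple case: the expression form stays readable on one line
status <- if (score >= 50) "pass" else "fail"
status
## [1] "pass"

# Longer case: assigning inside each branch keeps the name next to its value
if (score >= 90) {
    grade <- "A"
} else if (score >= 70) {
    grade <- "B"
} else {
    grade <- "C"
}
grade
## [1] "B"
```

Both forms produce the same kind of result; the second simply keeps each assignment close to the value it receives.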

While

The while control construct (or “while loop”) takes a test condition and an expression, and repeatedly evaluates the expression for as long as the test condition remains true. Take the following example:

x <- 3
while (x > 0) {
    print(x)
    x <- x - 1
}
## [1] 3
## [1] 2
## [1] 1

As previously mentioned, a while loop is actually an expression in R, so it evaluates to a value and can be assigned to a variable. Like the other looping constructs, while returns NULL invisibly. Take the following example.

x <- 3
y <- while (x > 0) {
    print(x)
    x <- x - 1
}
## [1] 3
## [1] 2
## [1] 1
print(y)
## NULL

Like the other looping constructs, while can also use the break and next statements (next is R’s counterpart to the continue statement found in other languages). I’ll cover these features in a later article, when I cover the other looping constructs.

Test Conditions

Recall that if and while expressions each take a cond argument that is used to determine program execution. From the R:Control documentation, cond should be “A length-one logical vector that is not NA. Conditions of length greater than one are currently accepted with a warning, but only the first element is used…” (That wording describes versions of R before 4.2.0; from R 4.2.0 onward, a condition of length greater than one is an error rather than a warning.) This can lead to unexpected behavior, particularly in functions, where it’s not known in advance what the value of the cond expression will be.

To illustrate, let’s look at our previous example, now wrapped in a function.

determine_parity <- function(x){
    if(x %% 2 == 0){  # Here x %% 2 == 0 is the test condition
        "even"
    } else {
        "odd"
    }
}

If the test condition is NA or has zero length, then R will raise an error.

determine_parity(NA)
## Error in if (x%%2 == 0) {: missing value where TRUE/FALSE needed
determine_parity(logical(0))
## Error in if (x%%2 == 0) {: argument is of length zero



Now these calls work as expected.

determine_parity(2)
## [1] "even"
determine_parity(5)
## [1] "odd"



But what about this one?

determine_parity(c(2,5))
## Warning in if (x%%2 == 0) {: the condition has length > 1 and only the first
## element will be used
## [1] "even"

The reason for the warning is that the %% operator is vectorized. When it’s passed a vector with multiple elements, it computes the remainder for each element in the vector. So, when we call determine_parity(c(2,5)), the test condition actually evaluates to c(TRUE, FALSE).
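The vectorized behavior can be checked directly at the console. This short sketch evaluates the same test condition outside of the function; cond is just an illustrative name.

```r
x <- c(2, 5)

# %% works element by element
x %% 2
## [1] 0 1

# So the comparison also yields one value per element
cond <- x %% 2 == 0
cond
## [1]  TRUE FALSE
length(cond)
## [1] 2
```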

However, whenever the test condition of if or while evaluates to a multiple-element vector, the value of the test condition is determined by taking only the first element of the vector, which in this case is TRUE. We then follow the first execution path, which returns "even".

This can be a subtle source of bugs whenever your test condition unexpectedly evaluates to a vector with multiple elements.

There are three main ways of addressing this problem:

  • Use the && and || operators
  • Use identical()
  • Use isTRUE() and isFALSE()

Option 1: The && and || operators

In contrast to the shorter & and | operators, which perform element-wise logical comparisons, && and || “examine only the first element of each vector”. While these operators ensure single-element logical vectors as output, there are still two issues. One is that this approach effectively suppresses the warning. If none of the arguments to the operator are supposed to be multi-element vectors, receiving a warning about the test condition length could alert us to a bug in the code. The second is that these operators can cause your code to fail mysteriously when given NA as input, because they can return NA, which if and while reject. (As of R 4.3.0, passing either operator an argument of length greater than one is itself an error.)
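The NA issue can be sketched as follows. The names flag and outcome are illustrative, and tryCatch() is used here only so the error can be observed without stopping the script.

```r
# Element-wise operator: one result per element
c(TRUE, FALSE) & c(TRUE, TRUE)
## [1]  TRUE FALSE

# Scalar operators can still return NA
TRUE && NA
## [1] NA
FALSE && NA   # short-circuits, so the NA is never consulted
## [1] FALSE
TRUE || NA
## [1] TRUE

# An NA condition reaches if and triggers an error
flag <- NA
outcome <- tryCatch(
    if (flag && TRUE) "yes" else "no",
    error = function(e) "error"
)
outcome
## [1] "error"
```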

Option 2: identical()

At this point, you might be thinking that it would be best to compare to the single-element truth values directly, i.e. identical(cond, TRUE) or identical(cond, FALSE). But this idiom has its own shortcomings, as we’ll soon see.

foo <- c(TRUE)
identical(foo, TRUE)
## [1] TRUE
identical(foo, FALSE)
## [1] FALSE

Above we’ve assigned a scalar vector containing the value TRUE to foo. As expected, the call to identical() correctly equates foo and TRUE.

bar <- c(TRUE, FALSE)
identical(bar, TRUE)
## [1] FALSE
identical(bar, FALSE)
## [1] FALSE

Now we’ve assigned a logical vector with multiple elements to bar. Since identical compares the elements of each of its arguments, both calls to identical return FALSE.

baz <- c(a = TRUE)
identical(baz, TRUE)
## [1] FALSE
identical(baz, FALSE)
## [1] FALSE

The reason for this surprising result is that identical() not only compares the elements of each argument, it also compares the attributes of each argument. In the example above, the value of baz has a “names” attribute that the constant TRUE does not, so the two arguments are not considered to be identical.
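To see the attribute that causes the mismatch, one can inspect baz directly. As a small sketch, stripping the names with unname() makes the comparison succeed:

```r
baz <- c(a = TRUE)

# baz carries a names attribute; a bare TRUE has no attributes
attributes(baz)
## $names
## [1] "a"

# With the names removed, the two values are identical
identical(unname(baz), TRUE)
## [1] TRUE
```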

Option 3: isTRUE() and isFALSE()

The workaround to all these issues is to wrap your test conditions in calls to isTRUE() or isFALSE(). From the R Documentation, “isTRUE(x) is the same as { is.logical(x) && length(x) == 1 && !is.na(x) && x }; isFALSE() is defined analogously.”

isTRUE(baz) # TRUE
isFALSE(baz) # FALSE

isTRUE(NA) # FALSE
isFALSE(NA) # FALSE
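Putting this together, here is one possible sketch of the earlier parity function with its test conditions wrapped in isTRUE(). The name determine_parity_safe and the choice to return NA_character_ for inputs that are neither clearly even nor clearly odd are assumptions made for illustration, not part of the original function.

```r
determine_parity_safe <- function(x) {
    if (isTRUE(x %% 2 == 0)) {
        "even"
    } else if (isTRUE(x %% 2 == 1)) {
        "odd"
    } else {
        # NA, zero-length, multi-element, or non-integer input lands here
        NA_character_
    }
}

determine_parity_safe(2)
## [1] "even"
determine_parity_safe(5)
## [1] "odd"
determine_parity_safe(NA)
## [1] NA
determine_parity_safe(c(2, 5))
## [1] NA
```

Whether returning NA or raising an informative error is the better design depends on the caller; the point is only that the condition passed to if is always a single TRUE or FALSE.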
Akindele Davies
Associate Data Scientist
