Conditional Control Flow in R
What is Control Flow?
One of the fundamental techniques in imperative-style programming is the use of control flow constructs to dynamically control the execution of a program. These tools are critical to the creation of anything but the most basic functions and allow programmers to specify abstract procedures with code.
Currently, R supports the following control flow constructs:
Name | Syntax |
---|---|
if | if(cond) cons.expr else if(alt.cond) alt.expr ... else final.expr |
for | for(var in seq) expr |
while | while(cond) expr |
repeat | repeat expr |
While one might already be familiar with these constructs from experience with another programming language, their incarnations in R possess a few peculiar properties. Foremost, in R, these constructs are actually expressions (as opposed to statements, as in many other languages). Consequently, they have a return value which can be bound to a name (variable), or quoted and evaluated later. Such behavior can be surprising at first, but is consistent with R’s principle of treating almost every syntactic element as an expression.
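To make the "bound to a name, or quoted and evaluated later" point concrete, here is a minimal sketch. The names x, size, expr and later are purely illustrative; quote() and eval() are base R functions.

```r
# Bind the value of an `if` expression to a name.
x <- 7
size <- if (x > 5) "big" else "small"
size   # "big"

# Quote the same expression: nothing is evaluated yet.
expr <- quote(if (x > 5) "big" else "small")

# Change x, then evaluate the stored expression later.
x <- 2
later <- eval(expr)
later  # "small"
```

Because expr stores the unevaluated code rather than its value, the result of eval(expr) depends on what x is at the time of evaluation.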
Conditional Control Constructs
In this article, we’ll examine what I’ll define as the conditional control constructs – those control constructs that alter program execution based on the truth value of some conditional expression. In R, there are two conditional control constructs: if and while.
If
The if control construct allows for the dynamic execution of one of a finite number of predetermined execution paths. Exactly which path the program follows depends on the value of one or more expressions called conditions (or test conditions).
One of the most common usages of the if construct is the binary form, where the test condition decides between two possible execution paths.
For example:
# %% is the modulo operator in R -- a %% b returns the remainder when a is divided by b
x <- 4
if(x %% 2 == 0){ # Here x %% 2 == 0 is the test condition
  "even"
} else {
  "odd"
}
## [1] "even"
For simple tests like the above example, the entire expression can be written on one line.
if(x %% 2 == 0) "even" else "odd"
## [1] "even"
Notice that in both of these examples, output is printed to the console even though no explicit call to print() is made.
This is because, as mentioned, the if construct is actually an expression in R, and the default behavior of the interpreter is to print the value of any top-level expression that is not assigned to a variable. So let’s try assigning it next.
result <- if(x %% 2 == 0) "even" else "odd"
Note, no output is printed.
typeof(result)
## [1] "character"
nchar(result)
## [1] 4
print(result)
## [1] "even"
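The binary form extends to more than two execution paths by chaining else if, as in the syntax table at the top. The following is a small sketch; classify_sign is a hypothetical helper invented for this illustration, and it assumes a single non-missing number as input.

```r
# Hypothetical helper: picks one of three paths based on the sign of x.
# Assumes x is a single, non-NA number.
classify_sign <- function(x) {
  if (x > 0) {
    "positive"
  } else if (x < 0) {
    "negative"
  } else {
    "zero"
  }
}

classify_sign(3)   # "positive"
classify_sign(-1)  # "negative"
classify_sign(0)   # "zero"
```

Only the first branch whose condition is TRUE is evaluated; the remaining conditions are never checked.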
Express Yourself
Now, it may be tempting to take this knowledge of the if construct as an expression and start using this behavior in your own code. However, except for simple expressions, I’d advise against utilizing the assignment behavior of if expressions. For one, the behavior is rather uncommon among programming languages, so using this technique may make it difficult for others to read and understand your code. Also, as if expressions grow more complicated, using this assignment behavior increases the distance between the variable name being bound and the expression that produces its value. This can also make it harder to reason about and understand code.
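As a rough illustration of that trade-off, the sketch below computes the same value in both styles. The variable names parity_expr and parity_stmt are made up for this example.

```r
x <- 4

# Expression style: the value of the whole `if` is bound to one name.
# Compact, and readable while the branches stay this short.
parity_expr <- if (x %% 2 == 0) "even" else "odd"

# Statement style: each branch performs its own assignment, so the
# variable name sits right next to the value it receives.
if (x %% 2 == 0) {
  parity_stmt <- "even"
} else {
  parity_stmt <- "odd"
}

identical(parity_expr, parity_stmt)  # TRUE
```

Both forms produce the same result; the second simply keeps the binding close to the value once the branches grow beyond a line or two.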
While
The while control construct (or “while loop”) takes a test condition and an expression, and repeatedly evaluates the expression for as long as the test condition remains true. Take the following example:
x <- 3
while (x > 0) {
  print(x)
  x <- x - 1
}
## [1] 3
## [1] 2
## [1] 1
As previously mentioned, a while loop is actually an expression in R, so it evaluates to a value and can be assigned to a variable. Like the other looping constructs, while returns NULL invisibly. Take the following example.
x <- 3
y <- while (x > 0) {
  print(x)
  x <- x - 1
}
## [1] 3
## [1] 2
## [1] 1
print(y)
## NULL
Like the other looping constructs, while can also use the break and next statements (R uses next where many other languages use continue). I’ll cover these features in a later article, when I cover the other looping constructs.
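In the meantime, here is a brief sketch of both statements inside a while loop. Note that R's keyword for skipping to the next iteration is next. The variables i and kept are illustrative only.

```r
i <- 0
kept <- c()

while (TRUE) {
  i <- i + 1
  if (i > 6) break        # leave the loop entirely
  if (i %% 2 == 1) next   # skip the rest of the body for odd values
  kept <- c(kept, i)      # only reached for even values up to 6
}

kept  # 2 4 6
```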
Test Conditions
Recall that if and while expressions each contain a cond parameter that is used to determine program execution. From the R:Control documentation, cond should be “A length-one logical vector that is not NA. Conditions of length greater than one are currently accepted with a warning, but only the first element is used…” (That wording describes R before version 4.2.0; from 4.2.0 onward, a condition of length greater than one is an error.) Either way, this can lead to unexpected behavior, particularly in functions, where it’s not known in advance what the value of the cond expression will be.
To illustrate, let’s look at our previous example, now wrapped in a function.
determine_parity <- function(x){
  if(x %% 2 == 0){ # Here x %% 2 == 0 is the test condition
    "even"
  } else {
    "odd"
  }
}
If the test condition is NA or has zero length, then R will raise an error.
determine_parity(NA)
## Error in if (x%%2 == 0) {: missing value where TRUE/FALSE needed
determine_parity(logical(0))
## Error in if (x%%2 == 0) {: argument is of length zero
Now these calls work as expected.
determine_parity(2)
## [1] "even"
determine_parity(5)
## [1] "odd"
But what about this one?
determine_parity(c(2,5))
## Warning in if (x%%2 == 0) {: the condition has length > 1 and only the first
## element will be used
## [1] "even"
The reason for the warning is that the %% operator is vectorized. When it’s passed a vector with multiple elements, it computes the remainder for each element in the vector. So, when we call determine_parity(c(2,5)), the test condition actually evaluates to c(TRUE, FALSE).
However, whenever the test condition of if or while evaluates to a multiple-element vector, the value of the test condition is determined by taking only the first element of the vector, which in this case is TRUE. We then follow the first execution path, which returns "even".
This can be a subtle source of bugs. You may find yourself in a situation where your test condition evaluates to a vector with multiple elements.
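To see what the test condition looks like for the vector input discussed above, we can evaluate it on its own, outside of any if. This sketch uses only base R; x and cond are just illustrative names.

```r
x <- c(2, 5)

cond <- x %% 2 == 0   # element-wise: one logical value per element of x
cond                  # a logical vector: TRUE FALSE
length(cond)          # 2, so it is not a length-one condition
cond[1]               # TRUE, the first element
```

Inspecting the condition this way is a quick check for whether an expression is safe to hand to if or while.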
There are three main ways of addressing this problem:
- Use the && and || operators
- Use identical()
- Use isTRUE() and isFALSE()
Option 1: The && and || operators
In contrast to the shorter & and | operators, which perform element-wise logical comparisons, && and || “examine only the first element of each vector”. (That quotation describes older versions of R; from R 4.3.0 onward, && and || raise an error when an operand has length greater than one.) While these operators ensure single-element logical vectors as output, there are still two issues. One is that this approach effectively suppresses the warning about the length of the test condition. If none of the operands are supposed to be multi-element vectors, that warning could have alerted us to a bug in the code. The second is that these operators can cause your code to fail mysteriously when given NA as input.
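The NA problem can be sketched as follows. The range check is a made-up example, and tryCatch() is used only so that the error message can be captured as a string instead of stopping the script.

```r
x <- NA

# Both comparisons yield NA, and NA && NA is NA, not TRUE or FALSE.
cond <- x > 0 && x < 10
is.na(cond)  # TRUE

# Using that value as a test condition raises an error, which we
# capture here as a character string.
res <- tryCatch(
  if (cond) "in range" else "out of range",
  error = function(e) conditionMessage(e)
)
res  # the error message rather than either branch value
```

So && guarantees a length-one result, but not a non-missing one, which is what if and while ultimately require.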
Option 2: identical()
At this point, you might be thinking that it would be best to compare to the single-element truth values directly, i.e. identical(cond, TRUE) or identical(cond, FALSE).
But this idiom has its own shortcomings, as we’ll soon see.
foo <- c(TRUE)
identical(foo, TRUE)
## [1] TRUE
identical(foo, FALSE)
## [1] FALSE
Above we’ve assigned a scalar vector containing the value TRUE to foo. As expected, the call to identical() correctly equates foo and TRUE.
bar <- c(TRUE, FALSE)
identical(bar, TRUE)
## [1] FALSE
identical(bar, FALSE)
## [1] FALSE
Now we’ve assigned a logical vector with multiple elements to bar. Since identical() compares all the elements of its arguments, both calls to identical() return FALSE.
baz <- c(a = TRUE)
identical(baz, TRUE)
## [1] FALSE
identical(baz, FALSE)
## [1] FALSE
The reason for the surprising result above is that identical() not only compares the elements of each argument, it also compares the attributes of each argument. In the example above, the value of baz has a “names” attribute that the constant TRUE does not, so the two arguments are not considered to be identical.
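We can check this explanation directly by inspecting the attribute and then stripping it. Using unname() here is only a way to demonstrate the cause, not a recommended fix; plain is a name made up for the example.

```r
baz <- c(a = TRUE)

names(baz)             # "a": the attribute that TRUE lacks
identical(baz, TRUE)   # FALSE, because the attributes differ

plain <- unname(baz)   # same value with the names removed
identical(plain, TRUE) # TRUE
```

Once the names are gone the comparison succeeds, which suggests the mismatch really does come from the attribute rather than from the logical value itself.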
Option 3: isTRUE() and isFALSE()
The workaround to all these issues is to wrap your test conditions in calls to isTRUE() or isFALSE(). From the R Documentation, “isTRUE(x) is the same as { is.logical(x) && length(x) == 1 && !is.na(x) && x }; isFALSE() is defined analogously.”
isTRUE(baz) # TRUE
isFALSE(baz) # FALSE
isTRUE(NA) # FALSE
isFALSE(NA) # FALSE
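Putting this together, one possible way to harden the earlier parity function is sketched below. determine_parity_safe is a hypothetical name, and returning NA_character_ for inputs that are neither clearly even nor clearly odd is an assumption made for this example; raising an error with stop() would be an equally reasonable choice. isFALSE() requires R 3.5.0 or later.

```r
# Hypothetical hardened variant of determine_parity().
determine_parity_safe <- function(x) {
  is_even <- x %% 2 == 0
  if (isTRUE(is_even)) {
    "even"
  } else if (isFALSE(is_even)) {
    "odd"
  } else {
    # Reached when the condition is NA, has zero length, or has more
    # than one element. Returning NA is an assumption of this sketch.
    NA_character_
  }
}

determine_parity_safe(2)           # "even"
determine_parity_safe(5)           # "odd"
determine_parity_safe(NA)          # NA
determine_parity_safe(logical(0))  # NA
determine_parity_safe(c(2, 5))     # NA
```

Every input now leads to a well-defined branch, so the function neither errors on NA or zero-length input nor silently uses only the first element of a longer vector.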