library(lobstr)
library(tidyverse)
Can you diagnose what is going on below?
x <- 1:10
y <- x
tracemem(x)
#> [1] "<0x7fc64d46b368>"
c(obj_addr(x), obj_addr(y))
#> [1] "0x7fc64d46b368" "0x7fc64d46b368"
y[1] <- 3
#> tracemem[0x7fc64d46b368 -> 0x7fc64d4cb8e8]: eval eval withVisible withCallingHandlers handle timing_fn evaluate_call <Anonymous> evaluate in_dir block_exec call_block process_group.block process_group withCallingHandlers process_file <Anonymous> <Anonymous>
#> tracemem[0x7fc64d4cb8e8 -> 0x7fc648a83458]: eval eval withVisible withCallingHandlers handle timing_fn evaluate_call <Anonymous> evaluate in_dir block_exec call_block process_group.block process_group withCallingHandlers process_file <Anonymous> <Anonymous>
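The question is: why are two copies made? The vector x is of type integer, but the value 3 assigned to the first component of y is a double. Because x and y point to the same object, the subassignment first triggers the usual copy-on-modify duplicate. The vector must then be coerced from integer to double to hold the new value, which creates the second copy. Assigning an integer value (3L) instead requires no coercion, so only one copy is made: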
x <- 1:10
y <- x
tracemem(x)
#> [1] "<0x7fc64d88e0a8>"
c(obj_addr(x), obj_addr(y))
#> [1] "0x7fc64d88e0a8" "0x7fc64d88e0a8"
y[1] <- 3L # type integer
#> tracemem[0x7fc64d88e0a8 -> 0x7fc64d931078]: eval eval withVisible withCallingHandlers handle timing_fn evaluate_call <Anonymous> evaluate in_dir block_exec call_block process_group.block process_group withCallingHandlers process_file <Anonymous> <Anonymous>
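The type change behind the extra copy can be checked directly with typeof(). The following is a minimal sketch in base R; the variable z is introduced here only for illustration and does not appear in the examples above.

```r
x <- 1:10
y <- x
typeof(y)   # "integer"

y[1] <- 3   # 3 is a double literal, so y is coerced to double
typeof(y)   # "double"

z <- x
z[1] <- 3L  # 3L is an integer literal, so no coercion is needed
typeof(z)   # "integer"

typeof(x)   # "integer"; the original vector is unchanged
```

Only the assignment of a double changes the type of the vector, which is consistent with the single tracemem() message in the 3L case.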
Starting from vectors of length zero, we can see that
lobstr::obj_size(integer(0))
#> 48 B
lobstr::obj_size(numeric(0))
#> 48 B
are both 48 bytes. Run the code below and see if you can deduce how R allocates memory for integer and double vectors.
diff(sapply(0:100, function(x) obj_size(integer(x))))
c(obj_size(integer(20)), obj_size(integer(22)))
diff(sapply(0:100, function(x) obj_size(numeric(x))))
c(obj_size(numeric(10)), obj_size(numeric(14)))
R allocates memory to vectors in chunks. An integer vector of length one is allocated 56 bytes, 8 more than an empty (zero-length) integer vector. Since an integer component requires only 4 bytes, an integer vector of length two fits in the same 56 bytes, and R does not need any more memory. Hence obj_size(integer(1)) and obj_size(integer(2)) are the same. The diff() calls show where the allocated size jumps from one chunk to the next.
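To see the chunks side by side, the sizes of short integer and double vectors can be collected in one table. This is a small sketch assuming lobstr is installed; the names lens and sizes are illustrative only, and the exact byte counts may depend on the platform and R version.

```r
library(lobstr)

lens <- 0:4

# obj_size() returns a lobstr_bytes object; convert to plain numbers
# so the values can be stored in a data frame and compared.
sizes <- data.frame(
  length        = lens,
  integer_bytes = sapply(lens, function(n) as.numeric(obj_size(integer(n)))),
  double_bytes  = sapply(lens, function(n) as.numeric(obj_size(numeric(n))))
)

sizes
```

Rows where a column does not change from one length to the next are lengths that fit in the same chunk, as with integer vectors of length one and two.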