This tutorial explains the difference between tibble() and data.frame(), along with several examples.
1. When you print a tibble with many rows, it shows only the first 10 rows and as many columns as fit on the screen, whereas a data frame prints the complete data.
Note - tibble() is part of the tibble package. dplyr re-exports it, so loading the dplyr package makes tibble() available as well.
# Load library
library(dplyr)

df <- data.frame(x = 1:50, y = seq(1,100,2))
tb <- tibble(x = 1:50, y = seq(1,100,2))

# Print
print(df)
print(tb)
As the output below shows, the tibble displays only the first 10 rows.
tibble
# A tibble: 50 × 2
       x     y
   <int> <dbl>
 1     1     1
 2     2     3
 3     3     5
 4     4     7
 5     5     9
 6     6    11
 7     7    13
 8     8    15
 9     9    17
10    10    19
# ℹ 40 more rows
# ℹ Use `print(n = ...)` to see more rows
data.frame
    x  y
1   1  1
2   2  3
3   3  5
4   4  7
5   5  9
6   6 11
7   7 13
8   8 15
9   9 17
10 10 19
11 11 21
12 12 23
13 13 25
14 14 27
15 15 29
16 16 31
17 17 33
18 18 35
19 19 37
20 20 39
21 21 41
22 22 43
23 23 45
24 24 47
25 25 49
26 26 51
27 27 53
28 28 55
29 29 57
30 30 59
31 31 61
32 32 63
33 33 65
34 34 67
35 35 69
36 36 71
37 37 73
38 38 75
39 39 77
40 40 79
41 41 81
42 42 83
43 43 85
44 44 87
45 45 89
46 46 91
47 47 93
48 48 95
49 49 97
50 50 99
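If you need more than the default ten rows, the n argument of print() mentioned in the tibble footer controls how many are shown. The following is a minimal sketch, assuming the tibble package is installed; it also shows head() as one way to shorten a data frame's printed output.

```r
library(tibble)

tb <- tibble(x = 1:50, y = seq(1, 100, 2))
df <- data.frame(x = 1:50, y = seq(1, 100, 2))

# Show the first 20 rows of the tibble instead of the default 10
print(tb, n = 20)

# Show every row of the tibble
print(tb, n = Inf)

# Limit a data frame to its first 10 rows before printing
df_top <- head(df, 10)
print(df_top)
```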
2. data.frame() returns the values of a column when you use $ with a partial column name to access it. A tibble never does partial matching: it returns NULL and raises the warning "Unknown or uninitialised column".
df <- data.frame(ids = 1:50, score = seq(1,100,2))
tb <- tibble(ids = 1:50, score = seq(1,100,2))
In the example below, we are using "id" instead of "ids" to access the column.
tibble
tb$id
# NULL
# Warning message:
# Unknown or uninitialised column: `id`.
data.frame
df$id
#  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
# [25] 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
# [49] 49 50
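Partial matching in a data frame can hide typos, so it may help to make it visible. The sketch below uses base R only: [[ matches names exactly by default, and the warnPartialMatchDollar option asks R to warn when $ falls back to a partial match. The variable names (exact_result, warned, partial_result) are illustrative.

```r
df <- data.frame(ids = 1:50, score = seq(1, 100, 2))

# `[[` uses exact matching by default, so the partial name finds nothing
exact_result <- df[["id"]]
is.null(exact_result)

# Ask R to warn whenever `$` relies on partial matching
old_opts <- options(warnPartialMatchDollar = TRUE)

# Record whether a warning was raised, then muffle it so the value is kept
warned <- FALSE
partial_result <- withCallingHandlers(
  df$id,
  warning = function(w) {
    warned <<- TRUE
    invokeRestart("muffleWarning")
  }
)

# Restore the previous option value
options(old_opts)
```

Here partial_result still holds the ids column, while warned records that R flagged the partial match.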
3. A tibble remains a tibble when you extract a single column from it with square brackets, whereas a data frame becomes a vector when you select a single column the same way.
df <- data.frame(ids = 1:50, score = seq(1,100,2))
tb <- tibble(ids = 1:50, score = seq(1,100,2))

tb2 <- tb[,"score"]
df2 <- df[,"score"]
In the code below, we check whether each new object is still a tibble or a data frame using the is_tibble() function (from the tibble package) and the is.data.frame() function.
# is_tibble() lives in the tibble package, so it is called with its namespace here
tibble::is_tibble(tb2)
# TRUE
is.data.frame(df2)
# FALSE
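Both behaviours can be changed when needed. As a minimal sketch, assuming the tibble package is installed: drop = FALSE keeps a one-column selection from a data frame as a data frame, and [[ extracts a tibble column as a plain vector. The names df_col and tb_vec are illustrative.

```r
library(tibble)

df <- data.frame(ids = 1:50, score = seq(1, 100, 2))
tb <- tibble(ids = 1:50, score = seq(1, 100, 2))

# drop = FALSE keeps the one-column result as a data frame
df_col <- df[, "score", drop = FALSE]
is.data.frame(df_col)

# `[[` pulls the column out of the tibble as a plain numeric vector
tb_vec <- tb[["score"]]
is_tibble(tb_vec)
is.numeric(tb_vec)
```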
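4. When you add a new column to a tibble, its length must match the number of rows in the other columns (only length-1 values are recycled). A data frame, by contrast, recycles a shorter vector without complaint as long as the number of rows is an exact multiple of the vector's length.
# Throws an error: the tibble has 50 rows but the new column has only 2 values
tb$newcol <- c(5,6)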
As shown in the code below, the values are repeated in the data frame when the new column is shorter than the existing ones.
df$newcol <- c(5,6)
df$newcol
#  [1] 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5
# [38] 6 5 6 5 6 5 6 5 6 5 6 5 6
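The recycling rules are slightly more nuanced than the length-2 case suggests. The sketch below, assuming the tibble package is installed, shows that a single value is recycled by both structures, and that a data frame also raises an error when the new column's length does not divide the row count evenly. The column names flag and bad are made up for illustration.

```r
library(tibble)

df <- data.frame(ids = 1:50, score = seq(1, 100, 2))
tb <- tibble(ids = 1:50, score = seq(1, 100, 2))

# A length-1 value is recycled by both the tibble and the data frame
tb$flag <- 0
df$flag <- 0

# A length-2 value is rejected by the tibble; tryCatch() captures the message
tb_error <- tryCatch(
  { tb$newcol <- c(5, 6); NULL },
  error = function(e) conditionMessage(e)
)

# A data frame refuses a length that does not divide its 50 rows evenly
df_error <- tryCatch(
  { df$bad <- c(1, 2, 3); NULL },
  error = function(e) conditionMessage(e)
)
```

After running this, tb_error and df_error each hold an error message rather than NULL, and neither object gains the rejected column.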