This tutorial explains the differences between tibble() and data.frame() in R, with an example for each.
1. Printing a tibble shows only the first 10 rows and the columns that fit on screen, whereas printing a data frame shows all rows (up to the max.print limit).
Note - tibble() is part of the tibble package. dplyr re-exports tibble(), so it is also available once you load the dplyr package.
# Load library
library(dplyr)

df <- data.frame(x = 1:50, y = seq(1, 100, 2))
tb <- tibble(x = 1:50, y = seq(1, 100, 2))

# Print
print(df)
print(tb)
As shown in the output below, the tibble displays only the first 10 rows, while the data frame displays all 50.
tibble
# A tibble: 50 × 2
       x     y
   <int> <dbl>
 1     1     1
 2     2     3
 3     3     5
 4     4     7
 5     5     9
 6     6    11
 7     7    13
 8     8    15
 9     9    17
10    10    19
# ℹ 40 more rows
# ℹ Use `print(n = ...)` to see more rows
data.frame
    x  y
1   1  1
2   2  3
3   3  5
4   4  7
5   5  9
6   6 11
7   7 13
8   8 15
9   9 17
10 10 19
11 11 21
12 12 23
13 13 25
14 14 27
15 15 29
16 16 31
17 17 33
18 18 35
19 19 37
20 20 39
21 21 41
22 22 43
23 23 45
24 24 47
25 25 49
26 26 51
27 27 53
28 28 55
29 29 57
30 30 59
31 31 61
32 32 63
33 33 65
34 34 67
35 35 69
36 36 71
37 37 73
38 38 75
39 39 77
40 40 79
41 41 81
42 42 83
43 43 85
44 44 87
45 45 89
46 46 91
47 47 93
48 48 95
49 49 97
50 50 99
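If you want to control how many rows are printed, a minimal sketch along these lines may help. It assumes the same 50-row data as above and loads the tibble package directly; the names df_as_tb and tb_as_df are introduced here only for illustration.

```r
library(tibble)

df <- data.frame(x = 1:50, y = seq(1, 100, 2))
tb <- tibble(x = 1:50, y = seq(1, 100, 2))

# Show the first 20 rows of the tibble instead of the default 10
print(tb, n = 20)

# Show every row of the tibble
print(tb, n = Inf)

# Show only the first 10 rows of the data frame
head(df, 10)

# Convert in either direction when the other printing style is preferred
df_as_tb <- as_tibble(df)       # hypothetical name, for illustration
tb_as_df <- as.data.frame(tb)   # hypothetical name, for illustration
```

Converting does not change the data, only the class of the object and therefore how it prints.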
2. A data frame returns the values of a column when you access it with $ and a partial column name (partial matching). A tibble never does partial matching: it returns NULL and warns "Unknown or uninitialised column".
df <- data.frame(ids = 1:50, score = seq(1, 100, 2))
tb <- tibble(ids = 1:50, score = seq(1, 100, 2))
In the example below, we are using "id" instead of "ids" to access the column.
tibble
tb$id
# NULL
# Warning message:
# Unknown or uninitialised column: `id`.
data.frame
df$id
#  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
# [25] 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
# [49] 49 50
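If you want a data frame to behave more strictly, the sketch below shows two base R options, assuming the same ids/score data as above. The variable names partial_df, full_df and full_tb are made up for this example.

```r
library(tibble)

df <- data.frame(ids = 1:50, score = seq(1, 100, 2))
tb <- tibble(ids = 1:50, score = seq(1, 100, 2))

# `[[` matches names exactly by default, so the partial name finds nothing
partial_df <- df[["id"]]   # NULL, no partial matching

# With the full name, both objects return the same vector
full_df <- df[["ids"]]
full_tb <- tb[["ids"]]

# Ask base R to warn whenever `$` falls back to partial matching;
# after this, df$id still returns the column but emits a warning
options(warnPartialMatchDollar = TRUE)
```

Using [[ ]] with the full column name is a simple way to get identical behaviour from both object types.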
3. A tibble remains a tibble when you extract a single column from it with [ , ], whereas a data frame is simplified to a vector, because [ on a data frame defaults to drop = TRUE when a single column is selected.
df <- data.frame(ids = 1:50, score = seq(1, 100, 2))
tb <- tibble(ids = 1:50, score = seq(1, 100, 2))

tb2 <- tb[, "score"]
df2 <- df[, "score"]
In the code below, we check whether each result is still a tibble or a data frame using the is_tibble() and is.data.frame() functions. The tibble:: prefix makes the call work even when only dplyr is attached.
tibble::is_tibble(tb2)
# TRUE
is.data.frame(df2)
# FALSE
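When you need the opposite behaviour in either case, something like the following sketch should work. It assumes the same ids/score data as above; df_keep and tb_vec are illustrative names only.

```r
library(tibble)

df <- data.frame(ids = 1:50, score = seq(1, 100, 2))
tb <- tibble(ids = 1:50, score = seq(1, 100, 2))

# Keep the data frame structure when selecting a single column
df_keep <- df[, "score", drop = FALSE]

# Pull a plain vector out of a tibble
tb_vec <- tb[["score"]]

is.data.frame(df_keep)
is.numeric(tb_vec)
```

Here drop = FALSE tells the data frame not to simplify the result, while [[ ]] extracts the column itself from the tibble.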
4. When adding a new column to a tibble, its length must equal the number of rows (or be 1, in which case the value is recycled); otherwise the assignment fails with an error. A data frame, by contrast, silently recycles a shorter vector as long as the number of rows is a multiple of its length.
tb$newcol <- c(5, 6)
# Fails with an error: the existing data has 50 rows, the assigned data has 2
As shown in the code below, the values are repeated (recycled) in the data frame when the new column is shorter than the existing ones.
df$newcol <- c(5, 6)
df$newcol
#  [1] 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5
# [38] 6 5 6 5 6 5 6 5 6 5 6 5 6
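The recycling rules can be sketched as follows, assuming the same ids/score data as above. The column names flag and bad and the variable result are hypothetical and used only for this illustration.

```r
library(tibble)

df <- data.frame(ids = 1:50, score = seq(1, 100, 2))
tb <- tibble(ids = 1:50, score = seq(1, 100, 2))

# A tibble recycles length-1 values
tb$flag <- 0

# Recycle a longer vector explicitly so the intent is visible
tb$newcol <- rep_len(c(5, 6), nrow(tb))

# A data frame recycles only when the lengths divide evenly;
# 3 does not divide 50, so this assignment fails as well
result <- tryCatch(
  {
    df$bad <- c(1, 2, 3)
    "added"
  },
  error = function(e) "error"
)
result
```

Spelling out the recycling with rep_len() works for both object types and avoids relying on implicit behaviour.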

