tibble vs. data.frame (with Examples)

Deepanshu Bhalla

This tutorial explains the difference between tibble() and data.frame(), along with several examples.

1. Printing a tibble shows only the first 10 rows, along with the columns that fit on the screen, whereas printing a data frame shows all the rows (up to the limit set by the max.print option).

Note - tibble() comes from the tibble package. dplyr re-exports it, so loading the dplyr package makes tibble() available as well.

# Load library
library(dplyr)

df <- data.frame(x = 1:50, y = seq(1,100,2))
tb <- tibble(x = 1:50, y = seq(1,100,2))

# Print
print(df)
print(tb)
Output

As shown in the output below, the tibble displayed only the first 10 rows.

tibble

# A tibble: 50 × 2
       x     y
   <int> <dbl>
 1     1     1
 2     2     3
 3     3     5
 4     4     7
 5     5     9
 6     6    11
 7     7    13
 8     8    15
 9     9    17
10    10    19
# ℹ 40 more rows
# ℹ Use `print(n = ...)` to see more rows

data.frame

    x  y
1   1  1
2   2  3
3   3  5
4   4  7
5   5  9
6   6 11
7   7 13
8   8 15
9   9 17
10 10 19
11 11 21
12 12 23
13 13 25
14 14 27
15 15 29
16 16 31
17 17 33
18 18 35
19 19 37
20 20 39
21 21 41
22 22 43
23 23 45
24 24 47
25 25 49
26 26 51
27 27 53
28 28 55
29 29 57
30 30 59
31 31 61
32 32 63
33 33 65
34 34 67
35 35 69
36 36 71
37 37 73
38 38 75
39 39 77
40 40 79
41 41 81
42 42 83
43 43 85
44 44 87
45 45 89
46 46 91
47 47 93
48 48 95
49 49 97
50 50 99
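
The number of printed rows can be controlled on both sides. The snippet below is a minimal sketch that rebuilds the same df and tb objects; it relies on the n argument of the tibble print method, base R's head(), and as_tibble() from the tibble package. The name df_as_tb is only illustrative.

```r
library(tibble)

df <- data.frame(x = 1:50, y = seq(1, 100, 2))
tb <- tibble(x = 1:50, y = seq(1, 100, 2))

# Print the first 20 rows of the tibble instead of the default 10
print(tb, n = 20)

# Print every row of the tibble
print(tb, n = Inf)

# Print only the first 10 rows of the data frame
head(df, 10)

# Convert the data frame to a tibble to get the compact printing
df_as_tb <- as_tibble(df)
```

Converting with as_tibble() is often the simplest option when a large data frame floods the console.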

2. A data frame returns the values of a column when you access it with $ and a partial column name. A tibble never matches partially: it returns NULL with the warning "Unknown or uninitialised column".

df <- data.frame(ids = 1:50, score = seq(1,100,2))
tb <- tibble(ids = 1:50, score = seq(1,100,2))

In the example below, we are using "id" instead of "ids" to access the column.

tibble

tb$id
# NULL
# Warning message:
# Unknown or uninitialised column: `id`.

data.frame

df$id
# [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
# [25] 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
# [49] 49 50
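
Partial matching in a data frame applies to $ only. The sketch below, using the same ids and score columns, shows that [[ ]] matches names exactly unless exact = FALSE is passed, and that the base R option warnPartialMatchDollar can be switched on to get a warning from $. The names exact_result and partial_result are only illustrative.

```r
library(tibble)

df <- data.frame(ids = 1:50, score = seq(1, 100, 2))
tb <- tibble(ids = 1:50, score = seq(1, 100, 2))

# `[[` matches names exactly by default, so the partial name finds nothing
exact_result <- df[["id"]]
is.null(exact_result)

# Partial matching with `[[` has to be requested explicitly
partial_result <- df[["id", exact = FALSE]]
identical(partial_result, df$ids)

# Optional: make `$` warn whenever it falls back to partial matching
options(warnPartialMatchDollar = TRUE)
```

Using [[ ]] with the full column name behaves the same way for data frames and tibbles, which helps when code has to work with both.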

3. When you select a single column with square brackets, a tibble remains a tibble, whereas a data frame is simplified to a vector.

df <- data.frame(ids = 1:50, score = seq(1,100,2))
tb <- tibble(ids = 1:50, score = seq(1,100,2))

tb2 <- tb[,"score"]
df2 <- df[,"score"]

In the code below, we check whether each result is still a tibble or a data frame, using is_tibble() from the tibble package and the base R function is.data.frame().

tibble::is_tibble(tb2)
# TRUE
is.data.frame(df2)
# FALSE
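
Either behaviour can be requested explicitly. As a minimal sketch with the same columns: drop = FALSE keeps a data frame as a data frame, and [[ ]] extracts a plain vector from a tibble. The names df_col and tb_vec are only illustrative.

```r
library(tibble)

df <- data.frame(ids = 1:50, score = seq(1, 100, 2))
tb <- tibble(ids = 1:50, score = seq(1, 100, 2))

# Keep the data frame structure when selecting one column
df_col <- df[, "score", drop = FALSE]
is.data.frame(df_col)

# Extract a plain vector from a tibble
tb_vec <- tb[["score"]]
is.numeric(tb_vec)
```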

4. When you add a new column to a tibble, its length must either match the number of rows or be 1 (a single value is recycled to every row). A data frame, in contrast, accepts a shorter column as long as the number of rows is a multiple of its length.

tb$newcol <- c(5,6)

This assignment fails with an error because the tibble has 50 rows but the new column has only 2 values.

As shown in the code below, the values in the data frame are repeated (recycled) when the new column is shorter than the number of rows.

df$newcol <- c(5,6)
df$newcol
# [1]  5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5
# [38] 6 5 6 5 6 5 6 5 6 5 6 5 6
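
The recycling rules can be checked with a short sketch. It assumes the same 50-row objects; the column names flag and badcol and the variable df_result are only illustrative. rep() with length.out makes the recycling explicit for a tibble, and tryCatch() captures the error that a data frame raises when the new length does not divide the number of rows.

```r
library(tibble)

df <- data.frame(ids = 1:50, score = seq(1, 100, 2))
tb <- tibble(ids = 1:50, score = seq(1, 100, 2))

# A length-1 value is recycled to every row of a tibble
tb$flag <- 0

# Longer vectors must be recycled explicitly to the number of rows
tb$newcol <- rep(c(5, 6), length.out = nrow(tb))

# A data frame refuses a length that does not divide the number of rows
df_result <- tryCatch(
  {
    df$badcol <- c(5, 6, 7)
    "assigned"
  },
  error = function(e) "error"
)
df_result
```

Spelling out the recycling with rep() documents the intent and gives the same result for data frames and tibbles.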