R tidyverse: tibble.
Wrangle
library(tidyverse)
Tibbles with tibble
Tibbles are data frames with a few improvements that make data wrangling easier; for example, they accept numbers or special characters as column names. You can coerce an existing data frame to a tibble with as_tibble().
head(as_tibble(iris))
Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
---|---|---|---|---|
5.1 | 3.5 | 1.4 | 0.2 | setosa |
4.9 | 3 | 1.4 | 0.2 | setosa |
4.7 | 3.2 | 1.3 | 0.2 | setosa |
4.6 | 3.1 | 1.5 | 0.2 | setosa |
5 | 3.6 | 1.4 | 0.2 | setosa |
5.4 | 3.9 | 1.7 | 0.4 | setosa |
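A tibble is still a data frame underneath, so code written for data frames generally keeps working on it. A minimal sketch that checks this, assuming only the tibble package (attached by library(tidyverse)) and the built-in iris data; the object names iris_tbl and iris_df are illustrative:

```r
library(tibble)

# Coerce the built-in iris data frame to a tibble
iris_tbl <- as_tibble(iris)

# The class vector still contains "data.frame" after "tbl_df" and "tbl"
class(iris_tbl)
is.data.frame(iris_tbl)   # TRUE

# as.data.frame() turns the tibble back into a plain data frame
iris_df <- as.data.frame(iris_tbl)
class(iris_df)            # "data.frame"
```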
Or build a tibble from vectors with tibble(). Here the single value y = 1 is recycled to fill all five rows.
my_tibble <- tibble(x = 1:5,
                    y = 1,
                    z = c("luz", "Al-bin", "Fernando", "Santiago", "Katie"))
my_tibble
x | y | z |
---|---|---|
1 | 1 | luz |
2 | 1 | Al-bin |
3 | 1 | Fernando |
4 | 1 | Santiago |
5 | 1 | Katie |
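In the tibble() call above, the single value y = 1 is recycled to five rows. tibble() recycles only length-one values, which is stricter than data.frame(). A minimal sketch of that difference, using the hypothetical object names ok, res and df:

```r
library(tibble)

# A length-one value (y = 1) is recycled to match the five rows of x
ok <- tibble(x = 1:5, y = 1)
nrow(ok)   # 5

# Any other length mismatch is an error in tibble()
res <- tryCatch(tibble(x = 1:4, y = 1:2), error = function(e) "error")
res        # "error"

# data.frame() instead recycles a vector whose length divides the row count
df <- data.frame(x = 1:4, y = 1:2)
df$y       # 1 2 1 2
```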
Subsetting tibbles
You can subset tibbles the same way you would subset a classic data frame. The first way is to extract a column by name with the $ operator:
my_subset <- my_tibble$z
my_subset
V1 |
---|
luz |
Al-bin |
Fernando |
Santiago |
Katie |
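Or by position with double brackets [[ ]], which also returns a vector (single brackets, as in my_tibble[3], return a one-column tibble instead):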
my_subset <- my_tibble[[3]]
my_subset
V1 |
---|
luz |
Al-bin |
Fernando |
Santiago |
Katie |
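A minimal sketch contrasting the three ways of taking a column, redefining the example tibble as my_tibble so the block runs on its own (by_pos, by_name and one_col are illustrative names):

```r
library(tibble)

my_tibble <- tibble(x = 1:5,
                    y = 1,
                    z = c("luz", "Al-bin", "Fernando", "Santiago", "Katie"))

# [[ ]] (by position or by name) and $ all return a plain vector
by_pos  <- my_tibble[[3]]
by_name <- my_tibble[["z"]]
class(by_pos)    # "character"

# Single brackets keep the tibble structure: a 5 x 1 tibble, not a vector
one_col <- my_tibble[3]
class(one_col)
```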
Renaming columns
One drawback of classic data frames is that, by default, data.frame() does not keep column names that start with a number or contain symbols: it repairs them with make.names(), so "1" becomes "X1". Tibbles keep such names as they are. In the next line of code we rename the variables of my_tibble with the rename() function from the dplyr package.
rename(my_tibble, "1" = x, "2" = y, "(name)" = z)
## # A tibble: 5 × 3
## `1` `2` `(name)`
## <int> <dbl> <chr>
## 1 1 1 luz
## 2 2 1 Al-bin
## 3 3 1 Fernando
## 4 4 1 Santiago
## 5 5 1 Katie
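A minimal sketch of how the two structures treat non-syntactic names, with the illustrative object names df, df_raw and tb. Note that data.frame() can also keep such names when you pass check.names = FALSE:

```r
library(tibble)

# By default data.frame() repairs the names with make.names(): "1" becomes "X1"
df <- data.frame("1" = 1:2, "(name)" = c("a", "b"))
names(df)

# check.names = FALSE keeps the original names in a data frame too
df_raw <- data.frame("1" = 1:2, "(name)" = c("a", "b"), check.names = FALSE)

# tibble() keeps non-syntactic names unchanged
tb <- tibble("1" = 1:2, "(name)" = c("a", "b"))
names(tb)        # "1" "(name)"

# Such names must be wrapped in backticks (or given as strings) to be referenced
tb$`1`
tb[["(name)"]]
```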
Tibble drawbacks
Some functions do not work well with tibbles, because subsetting a tibble with [ ] always returns another tibble, even when you select a single column, while a classic data.frame returns a vector in that case. If a function needs a vector, either coerce the tibble to a data.frame with as.data.frame() before subsetting, or extract the column directly with [[ ]] or $.
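A minimal sketch of this drawback on a small hypothetical data frame (df and tb are illustrative names), assuming the tibble and dplyr packages; pull() is dplyr's helper for extracting one column as a vector:

```r
library(tibble)
library(dplyr)

df <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)
tb <- as_tibble(df)

# On a data.frame, selecting one column with [ , ] drops to a vector
df[, "y"]                  # "a" "b" "c"

# On a tibble, the same call returns a 3 x 1 tibble
tb[, "y"]

# Ways to get a plain vector from the tibble
as.data.frame(tb)[, "y"]   # coerce first, then subset
tb[["y"]]                  # double brackets
pull(tb, y)                # dplyr::pull()
```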
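library(nycflights13)  # flights data set (a tibble)
library(lubridate)     # make_datetime()
flights %>%
  select(year, month, day, hour, minute) %>%
  # only year, month and day are passed, so every time is 00:00:00
  mutate(departure = make_datetime(year, month, day))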
## # A tibble: 336,776 × 6
## year month day hour minute departure
## <int> <int> <int> <dbl> <dbl> <dttm>
## 1 2013 1 1 5 15 2013-01-01 00:00:00
## 2 2013 1 1 5 29 2013-01-01 00:00:00
## 3 2013 1 1 5 40 2013-01-01 00:00:00
## 4 2013 1 1 5 45 2013-01-01 00:00:00
## 5 2013 1 1 6 0 2013-01-01 00:00:00
## 6 2013 1 1 5 58 2013-01-01 00:00:00
## 7 2013 1 1 6 0 2013-01-01 00:00:00
## 8 2013 1 1 6 0 2013-01-01 00:00:00
## 9 2013 1 1 6 0 2013-01-01 00:00:00
## 10 2013 1 1 6 0 2013-01-01 00:00:00
## # … with 336,766 more rows
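The pipeline above passes only year, month and day to make_datetime(), which is why every departure shows 00:00:00. make_datetime() also accepts hour and min arguments. A minimal sketch on a hand-made two-row tibble that copies the first two rows of the output (the name deps is hypothetical), assuming tibble, dplyr and lubridate are installed:

```r
library(tibble)
library(dplyr)
library(lubridate)

# Two departures copied from the first rows of the flights output
deps <- tibble(year = 2013L, month = 1L, day = 1L,
               hour = c(5, 5), minute = c(15, 29))

# Pass hour and minute too, so the time of day is kept (tz defaults to "UTC")
deps <- deps %>%
  mutate(departure = make_datetime(year, month, day, hour, minute))

format(deps$departure, "%Y-%m-%d %H:%M")   # "2013-01-01 05:15" "2013-01-01 05:29"
```

On the full data set the same change would read mutate(departure = make_datetime(year, month, day, hour, minute)).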