R tidyverse: stringr Part one (Basics)
Whatever you are doing, sooner or later you will need to format or change string values, and that is where the stringr package can help. It is robust and consistent, it handles NAs and zero-length vectors, and it integrates seamlessly with the rest of the tidyverse.
library("stringr") #load package required
Strings
Character vectors are usually created with c(), which is a base R function.
x <- c("one", "two", "three")
x
## [1] "one" "two" "three"
Basic operations
You can get the length of each element with the str_length() function:
str_length(x)
## [1] 3 3 5
As the last example shows, the x vector has three elements of different lengths. If you want, you can concatenate them into a single string with str_c() and its collapse argument.
str_c(x, collapse = "-")
## [1] "one-two-three"
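As a small sketch of how collapse differs from the sep argument of str_c(): sep joins the arguments element by element and still returns a vector, while collapse merges the elements of the result into one string. The "item" prefix and the labelled variable are made up for this illustration.

```r
library(stringr)

x <- c("one", "two", "three")

# sep joins the arguments element by element; the result is still a vector
labelled <- str_c("item", x, sep = "_")
labelled
## [1] "item_one" "item_two" "item_three"

# collapse merges the elements of that vector into a single string
str_c(labelled, collapse = ", ")
## [1] "item_one, item_two, item_three"
```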
str_length() and str_count() give the same output when str_count() is called without a pattern, but str_count() can also count the number of matches of a pattern in each string.
str_count(x, pattern = "ee")
## [1] 0 0 1
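The comparison between the two functions can be checked directly. This is a minimal sketch that relies on the default pattern of str_count() being an empty string, which makes it count characters.

```r
library(stringr)

x <- c("one", "two", "three")

# With no pattern, str_count() counts characters, just like str_length()
all(str_count(x) == str_length(x))
## [1] TRUE

# With a pattern, it counts the matches in each element
str_count(x, pattern = "e")
## [1] 1 0 2
```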
If you want a Boolean output, for example to filter or perform other operations on your strings, you can use the str_detect() function.
str_detect(x, pattern = "w")
## [1] FALSE TRUE FALSE
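One way to use that logical vector for filtering is to pass it to the subsetting operator. The sketch below also shows str_subset(), which should give the same result in a single call; the variable names kept and same are only for illustration.

```r
library(stringr)

x <- c("one", "two", "three")

# Keep only the elements where the pattern was detected
kept <- x[str_detect(x, pattern = "w")]
kept
## [1] "two"

# str_subset() combines detection and subsetting
same <- str_subset(x, pattern = "w")
same
## [1] "two"
```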
In the same way, you can check whether each element starts or ends with a particular pattern using the str_starts() and str_ends() functions.
str_ends(x, pattern = "e")
## [1] TRUE FALSE TRUE
str_starts(x, pattern = "t")
## [1] FALSE TRUE TRUE
You can also extract the matching text from each element with str_extract(). It returns the first match in each element, or NA where there is no match.
str_extract(x, pattern = "e")
## [1] "e" NA "e"
Subsetting strings
To extract part of each element in a character vector, use the str_sub() function. Keep in mind that indexing starts at 1 rather than 0, unlike in many other languages. You can also use negative values to count backwards from the end of the string.
x <- c("cat", "scorpion", "goose")
str_sub(x, 1, 4)
## [1] "cat" "scor" "goos"
str_sub(x, -3, -1)
## [1] "cat" "ion" "ose"
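Two further variations may help make the indexing rules concrete. This sketch assumes the default end position of str_sub(), which is the last character, and uses the same start and end position to pick out a single character.

```r
library(stringr)

x <- c("cat", "scorpion", "goose")

# Only a negative start: from the second-to-last character to the end
last_two <- str_sub(x, -2)
last_two
## [1] "at" "on" "se"

# Same start and end position: a single character (the second one)
second <- str_sub(x, 2, 2)
second
## [1] "a" "c" "o"
```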
Set lowercase - uppercase
As the title says, you can change the case of a character vector using the str_to_lower(), str_to_upper() and str_to_title() functions.
x <- "agtcccgcTGTcT"
str_to_lower(x)
## [1] "agtcccgctgtct"
str_to_upper(x)
## [1] "AGTCCCGCTGTCT"
str_to_title(x)
## [1] "Agtcccgctgtct"
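# The second argument of str_to_lower() is a locale, not a position.
# To change the case of specific positions only, combine it with str_sub():
str_sub(x, 8, 10) <- str_to_lower(str_sub(x, 8, 10))
x
## [1] "agtcccgctgTcT"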
Sorting a character vector
To sort a character vector, str_sort() is probably your best option. It sorts alphabetically, or in reverse order with the decreasing argument, and the locale argument lets you sort by the rules of different languages (the default is English).
x <- c("cat", "toddler", "goose")
str_sort(x)
## [1] "cat" "goose" "toddler"
str_sort(x, locale = "haw") # sorting with the Hawaiian locale gives the same result here
## [1] "cat" "goose" "toddler"
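The decreasing argument mentioned above can be sketched as follows. The second call uses the numeric argument of str_sort(), which is not covered in this post; it sorts digits by their numeric value, and the file names are made up for the illustration.

```r
library(stringr)

x <- c("cat", "toddler", "goose")

# Reverse alphabetical order
reversed <- str_sort(x, decreasing = TRUE)
reversed
## [1] "toddler" "goose" "cat"

# Hypothetical file names: numeric = TRUE sorts 2 before 10
files <- str_sort(c("file10", "file2", "file1"), numeric = TRUE)
files
## [1] "file1" "file2" "file10"
```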
Concatenate different vectors
Base R has some very useful built-in functions for this, such as paste() and paste0().
x <- "light"
y <- "darkness"
z <- 585
paste(x,y)
## [1] "light darkness"
paste(x,y,z)
## [1] "light darkness 585"
paste0(x,y)
## [1] "lightdarkness"
stringr offers str_c() for the same job, but it treats NA differently:
str_c(x,y,z, sep = " ")
## [1] "light darkness 585"
str_c(x,y,z,NA, sep = " ")
## [1] NA
# If any argument is NA, str_c() returns NA; paste() converts it to the string "NA" instead.
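If the paste() behaviour is what you want from str_c(), one possible approach is to convert the missing value explicitly with str_replace_na() first. This is a minimal sketch, not the only way to handle it; the variable names joined and pasted are only for illustration.

```r
library(stringr)

x <- "light"
y <- "darkness"

# str_replace_na() turns NA into the string "NA", so str_c() no longer returns NA
joined <- str_c(x, y, str_replace_na(NA), sep = " ")
joined
## [1] "light darkness NA"

# paste() does this conversion implicitly
pasted <- paste(x, y, NA)
pasted
## [1] "light darkness NA"
```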
Length of a string
The next example shows how to get the length of a string and use that information for your own purposes.
str_length(y)
## [1] 8
str_sub(y, 1 , str_length(y)/2)
## [1] "dark"
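Building on the same idea, the sketch below splits a string into two halves. It uses integer division (%/%) so the end position is a whole number even when the length is odd; the variable names half, first_half and second_half are only for illustration.

```r
library(stringr)

y <- "darkness"

# Integer division keeps the midpoint a whole number for odd lengths too
half <- str_length(y) %/% 2

first_half <- str_sub(y, 1, half)
first_half
## [1] "dark"

# Omitting the end position extracts up to the last character
second_half <- str_sub(y, half + 1)
second_half
## [1] "ness"
```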
These exercises covered the basics. Next, go to R tidyverse: stringr Part two and learn how to use regular expressions to improve your skills working with strings.