R tidyverse: stringr Part one (Basics)

Whatever you are working on, sooner or later you will need to format or modify string values, and that is where the stringr package helps. It is robust and consistent, it handles NAs and zero-length vectors, and it integrates cleanly with the rest of the tidyverse.

library("stringr") # load the required package

Strings

When you work with character vectors, the usual way to create them is with c(), which is a base R function.

x <- c("one", "two", "three")
x
## [1] "one"   "two"   "three"

Basic operations

You can get the length of each element with the str_length() function.

str_length(x)
## [1] 3 3 5

As the last example shows, the vector x has three elements of different lengths. If you want, you can concatenate them into a single string using str_c() with its collapse argument.

str_c(x, collapse = "-")
## [1] "one-two-three"
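To make the role of collapse a bit clearer, here is a small sketch that contrasts it with the sep argument of str_c(). The "!" suffix is only an illustrative value, not something from the examples above.

```r
library(stringr)

x <- c("one", "two", "three")

# sep joins the arguments element by element; the result is still a vector.
str_c(x, "!", sep = "")
## [1] "one!"   "two!"   "three!"

# collapse then joins the elements of that vector into one single string.
str_c(x, "!", sep = "", collapse = " ")
## [1] "one! two! three!"
```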

On the other hand, str_count() called with only the string gives the same output as str_length(), but str_count() can also count the number of matches of a pattern in each string.

str_count(x, pattern = "ee")
## [1] 0 0 1
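As a quick check of the comparison above, this sketch assumes that the default pattern of str_count() is the empty string, which counts characters.

```r
library(stringr)

x <- c("one", "two", "three")

# With no pattern supplied, str_count() counts characters,
# so it agrees with str_length() for these strings.
all(str_count(x) == str_length(x))
## [1] TRUE

# With a pattern, it counts the matches in each element instead.
str_count(x, pattern = "e")
## [1] 1 0 2
```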

If you want a Boolean output, for example to filter or to perform other operations on your strings, you can use the str_detect() function.

str_detect(x, pattern = "w") 
## [1] FALSE  TRUE FALSE
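One way to use that logical vector for filtering is sketched below. str_subset() is the stringr shortcut for the same idea; the negate argument assumes stringr 1.4.0 or later.

```r
library(stringr)

x <- c("one", "two", "three")

# Use the logical vector to keep only the matching elements.
x[str_detect(x, pattern = "w")]
## [1] "two"

# str_subset() does the detection and the subsetting in one step.
str_subset(x, pattern = "w")
## [1] "two"

# negate = TRUE keeps the elements that do not match.
str_subset(x, pattern = "w", negate = TRUE)
## [1] "one"   "three"
```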

In the same way, you can check whether each element of your vector starts or ends with a particular character using the str_starts() and str_ends() functions.

str_ends(x, pattern = "e") 
## [1]  TRUE FALSE  TRUE
str_starts(x, pattern = "t")
## [1] FALSE  TRUE  TRUE

Or you can extract the matching text from each element with str_extract(); this function returns NA where it does not find a match.

str_extract(x, pattern = "e")
## [1] "e" NA  "e"
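Note that str_extract() returns only the first match in each element. If you need every match, str_extract_all() is the companion function; the sketch below shows the difference on the same vector.

```r
library(stringr)

x <- c("one", "two", "three")

# Only the first "e" of "three" is returned here.
str_extract(x, pattern = "e")
## [1] "e" NA  "e"

# str_extract_all() returns a list with all matches per element.
str_extract_all(x, pattern = "e")
## [[1]]
## [1] "e"
##
## [[2]]
## character(0)
##
## [[3]]
## [1] "e" "e"
```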

Subsetting strings

When you want to extract part of each element in a character vector you can use the str_sub() function. Keep in mind that indexing starts at 1 rather than 0 as in many other languages. You can also use negative values to count positions backwards from the end of the string.

x <- c("cat", "scorpion", "goose")
str_sub(x,1,4)
## [1] "cat"  "scor" "goos"
str_sub(x,-3 ,-1)
## [1] "cat" "ion" "ose"
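A few more combinations of positive and negative positions, as a small sketch. It relies on the end argument of str_sub() defaulting to the last character.

```r
library(stringr)

x <- c("cat", "scorpion", "goose")

# Only a start position: take everything from there to the end.
str_sub(x, 2)
## [1] "at"      "corpion" "oose"

# Only the last character of each element.
str_sub(x, -1)
## [1] "t" "n" "e"

# Drop the first and the last character.
str_sub(x, 2, -2)
## [1] "a"      "corpio" "oos"
```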

Set lowercase - uppercase

As the title says, you can change a character vector from uppercase to lowercase and vice versa using the str_to_lower(), str_to_upper() and str_to_title() functions.

x <- "agtcccgcTGTcT"

str_to_lower(x)
## [1] "agtcccgctgtct"
str_to_upper(x) 
## [1] "AGTCCCGCTGTCT"
str_to_title(x)
## [1] "Agtcccgctgtct"
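# str_to_lower() has no position argument (its second argument is the locale),
# so combine it with str_sub() to convert the case of a precise position.
str_sub(x, 8, 10) <- str_to_lower(str_sub(x, 8, 10))
x
## [1] "agtcccgctgTcT"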

Sorting a character vector

If you want to sort a character vector, str_sort() is probably your best option. You can sort alphabetically, reverse the order with the decreasing argument, and even sort according to the alphabet of a different locale (the default is English).

x <- c("cat", "toddler", "goose")
str_sort(x)
## [1] "cat"     "goose"   "toddler"
str_sort(x, locale = "haw") # sort with another locale (Hawaiian); here the result happens to be the same
## [1] "cat"     "goose"   "toddler"
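The decreasing argument mentioned above can be sketched like this. str_order() is shown as well because it returns the positions rather than the sorted values.

```r
library(stringr)

x <- c("cat", "toddler", "goose")

# Reverse alphabetical order.
str_sort(x, decreasing = TRUE)
## [1] "toddler" "goose"   "cat"

# str_order() returns the indices that would sort the vector.
str_order(x)
## [1] 1 3 2
```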

Concatenate different vectors

Base R has some very useful built-in functions for this, such as paste() and paste0().

x <- "light"
y <- "darkness"
z <- 585
paste(x,y)
## [1] "light darkness"
paste(x,y,z)
## [1] "light darkness 585"
paste0(x,y)
## [1] "lightdarkness"

But if you want missing values to propagate instead of being pasted as text, you can use str_c().

str_c(x,y,z, sep = " ") 
## [1] "light darkness 585"
str_c(x,y,z,NA, sep = " ") 
## [1] NA
# If any input is NA, this function returns NA for the whole result; paste() does not do this.
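To see the difference side by side, here is a minimal sketch. The variable missing is a hypothetical name used only for this illustration, and str_replace_na() is one possible way to get the paste()-like behaviour back when you want it.

```r
library(stringr)

x <- "light"
missing <- NA_character_  # hypothetical missing value for the illustration

# paste() turns the missing value into the text "NA".
paste(x, missing)
## [1] "light NA"

# str_c() propagates the missing value instead.
str_c(x, missing, sep = " ")
## [1] NA

# str_replace_na() converts NA to the string "NA" before concatenating.
str_c(x, str_replace_na(missing), sep = " ")
## [1] "light NA"
```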

Length of a string

In the next example you get the length of a string and then use that information for your own purposes, here to take the first half of the string.

str_length(y)
## [1] 8
str_sub(y, 1 , str_length(y)/2)
## [1] "dark"
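The same idea can be extended to get both halves. This sketch uses integer division (%/%) so that it also behaves predictably for strings of odd length; the variables half and w are hypothetical names added for the illustration.

```r
library(stringr)

y <- "darkness"
half <- str_length(y) %/% 2  # integer division: 8 %/% 2 is 4

# First and second half of the string.
str_sub(y, 1, half)
## [1] "dark"
str_sub(y, half + 1)
## [1] "ness"

# With an odd length, the extra character ends up in the second half.
w <- "light"
str_sub(w, 1, str_length(w) %/% 2)
## [1] "li"
str_sub(w, str_length(w) %/% 2 + 1)
## [1] "ght"
```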

The last exercises were basic but useful. Now you can move on to R tidyverse: stringr Part two and learn how to use regular expressions to improve your skills working with strings.

Diego Sierra Ramírez
Msc. in Biological Science / Data analyst