By some estimates, around 85% of the world's data is unstructured, and a large share of it is text. As a data analyst or data scientist, you will routinely handle text data, and therefore strings.

# Character Strings

A character string is a sequence of individual characters, such as letters, digits, spaces, and punctuation.

a = "My cat is 8"


Now, we’ll dive into some key functions that you can use on strings.

You can retrieve the length of a string with the built-in function len:

len(a)

11


You can count the number of occurrences of a specific character or substring with .count (the match is case-sensitive):

a.count("M")

1
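
As a small sketch of how .count behaves, the example below reuses the same string `a`; the variable names (`upper_m`, `lower_m`, `spaces`, `cat_count`) are illustrative only. It shows that the match is case-sensitive and that the argument can be longer than one character.

```python
a = "My cat is 8"

# count is case-sensitive: "M" and "m" are different characters
upper_m = a.count("M")      # 1
lower_m = a.count("m")      # 0

# The argument can be any substring, not only a single letter
spaces = a.count(" ")       # 3
cat_count = a.count("cat")  # 1

print(upper_m, lower_m, spaces, cat_count)
```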


You can change the case of a string with the .upper, .lower, and .title methods:

a.upper()

MY CAT IS 8
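
For comparison, here is a short sketch of the other two case methods applied to the same string. The names `lowered` and `titled` are illustrative. Each method returns a new string and leaves `a` unchanged.

```python
a = "My cat is 8"

lowered = a.lower()  # every letter in lowercase
titled = a.title()   # first letter of each word in uppercase

print(lowered)  # my cat is 8
print(titled)   # My Cat Is 8
print(a)        # My cat is 8 (the original is not modified)
```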


You can replace a substring (one or several characters) with another substring using .replace:

a.replace("My", "my")

my cat is 8
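
Two details of .replace are worth illustrating: it returns a new string rather than modifying the original, and it replaces every occurrence unless an optional third argument limits the number of replacements. The variable names below are illustrative.

```python
a = "My cat is 8"

# replace returns a new string; strings are immutable, so a is unchanged
b = a.replace("My", "my")

# By default, every occurrence is replaced
all_spaces = a.replace(" ", "_")       # 'My_cat_is_8'

# An optional third argument caps the number of replacements
first_space = a.replace(" ", "_", 1)   # 'My_cat is 8'

print(a, b, all_spaces, first_space, sep="\n")
```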


What if you now want to split the string into a list of words? You can use the .split method:

a.split()

["My", "cat", "is", "8"]


You can pass the separator on which to split inside the parentheses; by default, the string is split on whitespace.
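
The following sketch contrasts an explicit separator with the default behaviour. The strings `csv_line` and `spaced` are made-up examples. With no argument, split treats any run of whitespace (spaces, tabs, newlines) as a single separator, whereas an explicit " " splits on each individual space.

```python
csv_line = "cat,dog,bird"
animals = csv_line.split(",")   # ['cat', 'dog', 'bird']

spaced = "My   cat\tis 8"

# No argument: runs of whitespace are collapsed
words = spaced.split()          # ['My', 'cat', 'is', '8']

# Explicit " ": every single space is a separator, so empty strings appear
pieces = spaced.split(" ")      # ['My', '', '', 'cat\tis', '8']

print(animals, words, pieces, sep="\n")
```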

You can also index a string to extract a sub-string:

a[1:6]

y cat

Indexing starts at 0, and the slice stops just before the end index, so the character at index 6 is excluded. Notice how the space counts as a character too.
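
A few more indexing forms, sketched on the same string with illustrative variable names: a single index, open-ended slices, and a negative index counting from the end.

```python
a = "My cat is 8"

first = a[0]    # 'M': indexing starts at 0
sub = a[1:6]    # 'y cat': indices 1 to 5, index 6 is excluded
head = a[:2]    # 'My': an omitted start means "from the beginning"
tail = a[3:]    # 'cat is 8': an omitted end means "to the end"
last = a[-1]    # '8': negative indices count from the end

print(first, sub, head, tail, last, sep="\n")
```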