Around 85% of the world's data is unstructured, and a good part of it is text. As a data analyst or data scientist, you will often have to handle text data, and therefore strings.
Character Strings
Character strings are sequences of individual characters (letters, digits, spaces, punctuation) that together form a piece of text. For example:
```python
a = "My cat is 8"
```
Now, we’ll dive into some key functions that you can use on strings.
You can retrieve the length of a string with the built-in `len` function:
```python
len(a)
# 11
```
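To see that `len` counts every character and not only letters, here is a small sketch. The variable names `letters`, `digits` and `spaces` are my own, chosen only for illustration.

```python
a = "My cat is 8"

# Count each kind of character separately
letters = sum(ch.isalpha() for ch in a)  # M, y, c, a, t, i, s
digits = sum(ch.isdigit() for ch in a)   # 8
spaces = sum(ch.isspace() for ch in a)   # the three spaces

# Together they account for the full length of the string
total = letters + digits + spaces
print(total == len(a))
```

An empty string, `""`, has a length of 0.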
You can count the number of occurrences of a specific letter or substring with `.count`:
```python
a.count("M")
# 1
```
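Keep in mind that `.count` is case-sensitive. The sketch below, with variable names of my own choosing, shows one way to count a letter regardless of case, assuming that lowercasing the string first is acceptable for your data.

```python
a = "My cat is 8"

upper_m = a.count("M")             # matches the capital "M" only
lower_m = a.count("m")             # there is no lowercase "m" in the string
any_case_m = a.lower().count("m")  # lowercase first, then count
word_is = a.count("is")            # .count also accepts longer substrings

print(upper_m, lower_m, any_case_m, word_is)
```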
You can convert a string to lowercase, uppercase or title case with `.lower`, `.upper` and `.title`:
```python
a.upper()
# MY CAT IS 8
```
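For comparison, here are the three transformations side by side. This is only a minimal sketch with variable names of my own. Note that each method returns a new string and leaves `a` untouched.

```python
a = "My cat is 8"

upper_a = a.upper()  # every letter in uppercase
lower_a = a.lower()  # every letter in lowercase
title_a = a.title()  # first letter of each word in uppercase

print(upper_a)
print(lower_a)
print(title_a)
print(a)  # the original string is unchanged
```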
You can replace a letter or a group of letters with another using `.replace`:
```python
a.replace("My", "my")
# my cat is 8
```
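By default, `.replace` substitutes every occurrence it finds. It also accepts an optional third argument that limits the number of replacements. The string `b` below is a made-up example, not part of the data used so far.

```python
b = "cat, cat, cat"  # hypothetical string with a repeated word

all_replaced = b.replace("cat", "dog")       # replaces every occurrence
first_replaced = b.replace("cat", "dog", 1)  # replaces the first occurrence only

print(all_replaced)
print(first_replaced)
```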
What if you now want to split the string into a list of words? You can use the `.split` method:
```python
a.split()
# ["My", "cat", "is", "8"]
```
You can specify in the parentheses the separator on which to split. By default, the string is split on whitespace.
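The difference between the default behaviour and an explicit separator is easiest to see on an example. The strings `csv_line` and `spaced` are invented for this sketch.

```python
csv_line = "cat,dog,,bird"  # hypothetical comma-separated values
spaced = "My   cat"         # hypothetical string with three spaces in a row

by_comma = csv_line.split(",")  # an empty field yields an empty string
by_default = spaced.split()     # runs of whitespace count as one separator
by_space = spaced.split(" ")    # every single space is a separator

print(by_comma)
print(by_default)
print(by_space)
```

If you need the words back in one string, `" ".join(by_default)` glues them together with a single space.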
You can also index a string to extract a sub-string:
```python
a[1:6]
# y cat
```
Indexing starts at 0, and the slice stops just before index 6, so it contains the characters at indices 1 to 5. Notice how the space counts as a character too.
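A few more slicing patterns on the same string, as a rough sketch. The variable names are my own.

```python
a = "My cat is 8"

first_char = a[0]   # a single character, at index 0
first_word = a[:2]  # omitting the start means "from the beginning"
middle = a[3:6]     # indices 3, 4 and 5
rest = a[7:]        # omitting the end means "up to the end"
last_char = a[-1]   # negative indices count from the end

print(first_char, first_word, middle, rest, last_char)
```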
If you found the article useful or see ways in which it could be improved, please leave a comment :)