To convert bytes into a string in Python, use the bytes.decode() method.
For instance:

    name_byte = b'Alice'
    name_str = name_byte.decode()
    print(name_str)

Output:
Alice
This is the quick answer.
However, depending on the context and your needs, there are other ways to convert bytes to strings.
In this guide, you will learn five ways to convert bytes to a string, each suited to a different situation.
Here’s a short overview of the conversion methods:
| Method | Example |
|---|---|
| 1. The decode() method of a bytes object | byte_string.decode('UTF-8') |
| 2. The built-in str() function | str(byte_string, 'UTF-8') |
| 3. The codecs decode() function | codecs.decode(byte_string) |
| 4. The pandas str.decode() method | df['column'].str.decode("utf-8") |
| 5. The join() method with the map() function | "".join(map(chr, byte_str)) |
Let’s jump to it!
Bytes vs Strings in Python
You may be looking to convert bytes to strings without being sure what bytes actually are. Before jumping into the conversions, let’s take a quick look at what bytes are in the first place.
Why Bytes?
A computer doesn’t understand the notion of “text” or “number” as is. This is because computers operate on bits, that is, 0s and 1s.
Computers store data in groups of bits known as bytes. On virtually all modern systems a byte is 8 bits, although some older architectures used other sizes.
Byte Strings in Python
In Python, a byte string is a sequence of raw bytes. It is the form in which data is stored and transmitted, but it is not meant to be read by humans.
A string is a sequence of characters. It is something we humans can read, but it cannot be stored in a computer without first being encoded.
This is why any string needs to be encoded into a byte string before it can be written to disk or sent over a network.
In Python, a bytes object is an immutable sequence of integers between 0 and 255. It often holds the encoded form of a string, but it can hold any binary data. A bytes literal is prefixed with the letter b.
For example, take a look at these two variables:

    name1 = 'Alice'
    name2 = b'Alice'

Here:
- name1 is a str object.
- name2 is a bytes object.
You can verify this by printing out the data types of these variables:

    name1 = 'Alice'
    name2 = b'Alice'
    print(type(name1))
    print(type(name2))

Output:

    <class 'str'>
    <class 'bytes'>

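The two types are linked by encoding and decoding. As a minimal sketch (the variable names are illustrative only), str.encode() produces a bytes object and bytes.decode() turns it back into a str:

```python
# Round trip between str and bytes (variable names are illustrative).
text = 'Alice'

# str.encode() turns text into bytes; UTF-8 is the default encoding.
encoded = text.encode('utf-8')

# bytes.decode() turns the bytes back into text.
decoded = encoded.decode('utf-8')

print(encoded)          # b'Alice'
print(decoded)          # Alice
print(decoded == text)  # True
```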
As I mentioned earlier, a byte string is not meant to be read by humans. The code above doesn’t make that obvious, because you can read b'Alice' without any trouble.
Byte String vs String in Python
To see the main difference between the byte string and a string, let’s print the words character by character.
First, let’s do the name1 variable:

    name1 = 'Alice'
    name2 = b'Alice'
    for c in name1:
        print(c)

Output:

    A
    l
    i
    c
    e

Now, let’s print each byte in the name2 bytes object:

    name1 = 'Alice'
    name2 = b'Alice'
    for c in name2:
        print(c)

Output:

    65
    108
    105
    99
    101

On their own, these numbers don’t tell you much. They are the byte values of the characters in the string, which is the form a computer works with.
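For ASCII-only text like this, the byte values happen to match the Unicode code points of the characters, because UTF-8 encodes each ASCII character as a single byte. A quick sketch using the built-in ord() and chr() functions (the variable names are illustrative):

```python
name_str = 'Alice'
name_bytes = b'Alice'

# Iterating over a bytes object yields integers from 0 to 255.
byte_values = list(name_bytes)

# ord() gives the Unicode code point of a character.
code_points = [ord(c) for c in name_str]

print(byte_values)                 # [65, 108, 105, 99, 101]
print(byte_values == code_points)  # True for ASCII-only text
print(chr(65))                     # A
```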
To make one more thing clear, let’s see what happens if we print the bytes object name2 as-is:

    name1 = 'Alice'
    name2 = b'Alice'
    print(name2)

Output:

    b'Alice'

Perhaps surprisingly, it clearly says “Alice”. This isn’t too hard to read, is it?
The byte string prints as readable text because what you see is a string representation of the bytes object, in which bytes that correspond to printable ASCII characters are shown as those characters.
Python does this for the developer’s convenience.
Without this string representation, printed bytes would be much harder to inspect.
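This convenience has limits. Bytes outside the printable ASCII range are shown as \x escapes instead of characters. The following sketch, with illustrative variable names, compares the two cases:

```python
readable = b'Alice'

# "a", a space, and a pizza emoji, encoded as UTF-8 bytes.
pizza = 'a \U0001F355'.encode('utf-8')

print(readable)    # b'Alice'
print(pizza)       # b'a \xf0\x9f\x8d\x95'
print(len(pizza))  # 6 (two ASCII bytes plus four bytes for the emoji)
```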
Anyway, now you understand what a bytes object is in Python and how it differs from a str object.
Next, let’s see how to convert bytes to a string.
1. The decode() Method
Given a bytes object, you can call its decode() method to convert the bytes to a string.
You can also pass the encoding to this method as an argument. If you leave it out, UTF-8 is used.
For example, let’s use the UTF-8 encoding for converting bytes to a string:

    byte_string = b"Do you want a slice of \xf0\x9f\x8d\x95?"
    string = byte_string.decode('UTF-8')
    print(string)

Output:

    Do you want a slice of 🍕?

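As a small sketch of the call variants: UTF-8 is the default encoding for decode(), and codec names are looked up case-insensitively, so the three calls below give the same result. The sample bytes are an assumption chosen for illustration.

```python
byte_string = b"caf\xc3\xa9"  # "café" encoded as UTF-8

# These three calls are equivalent: UTF-8 is the default encoding,
# and codec names are looked up case-insensitively.
a = byte_string.decode()
b = byte_string.decode('utf-8')
c = byte_string.decode(encoding='UTF-8')

print(a)            # café
print(a == b == c)  # True
```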
2. The str() Function
Another way to convert bytes to a string is the built-in str() function.
When you pass an encoding as the second argument, it does exactly the same thing as the decode() method in the previous example.
For instance:

    byte_string = b"Do you want a slice of \xf0\x9f\x8d\x95?"
    string = str(byte_string, 'UTF-8')
    print(string)

Output:

    Do you want a slice of 🍕?

Perhaps the only downside to this approach is code readability.
If you compare these two lines:

    name_str = str(byte_string, 'UTF-8')
    name_str = byte_string.decode('UTF-8')

You can see the latter is more explicit about decoding the bytes to a string.
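One pitfall with str() is worth a quick sketch: if you forget the encoding argument, str() does not decode anything and returns the printable representation of the bytes object instead.

```python
byte_string = b'Alice'

# With an encoding, str() decodes the bytes.
decoded = str(byte_string, 'utf-8')

# Without an encoding, str() returns the printable representation,
# including the b prefix and the quotes.
not_decoded = str(byte_string)

print(decoded)      # Alice
print(not_decoded)  # b'Alice'
```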
3. Codecs decode() Function
Python’s standard library also includes the codecs module for encoding and decoding text.
This module has its own decode() function, which you can use to convert bytes to strings. Its counterpart, codecs.encode(), converts strings to bytes.
For instance:

    import codecs

    byte_string = b"Do you want a slice of \xf0\x9f\x8d\x95?"
    string = codecs.decode(byte_string)
    print(string)

Output:

    Do you want a slice of 🍕?

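codecs.decode() uses UTF-8 when no encoding is given, but you can also name the encoding explicitly. The sketch below does so and uses codecs.encode() for the reverse direction. The sample bytes are an assumption chosen for illustration.

```python
import codecs

byte_string = b"caf\xc3\xa9"  # "café" encoded as UTF-8

# codecs.decode() defaults to UTF-8; the encoding can also be named explicitly.
text = codecs.decode(byte_string, 'utf-8')

# codecs.encode() goes the other way, from str back to bytes.
round_trip = codecs.encode(text, 'utf-8')

print(text)                       # café
print(round_trip == byte_string)  # True
```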
4. Pandas str.decode() Method
If you are working with pandas and have a DataFrame column that holds bytes, you can convert the values to strings by calling the str.decode() method on that column.
For instance:

    import pandas as pd

    data_bytes = {'column' : [b'Alice', b'Bob', b'Charlie']}
    df = pd.DataFrame(data=data_bytes)
    data_strings = df['column'].str.decode("utf-8")
    print(data_strings)

Output:

    0      Alice
    1        Bob
    2    Charlie
    Name: column, dtype: object

5. map() Function: Convert a Byte List to String
In Python, a string is a sequence of characters.
Each character corresponds to a Unicode code point, which is an integer.
Thus, you can convert an integer to the character it represents.
To do this, call the built-in chr() function on the integer.
Given a list of integers, you can use the map() function to convert each integer to a character.
Here is how it looks in code:

    byte_data = [65, 108, 105, 99, 101]
    strings = "".join(map(chr, byte_data))
    print(strings)

Output:

    Alice

This piece of code:
- Converts each integer to its corresponding character.
- Produces an iterator of characters (map() does not build a list).
- Joins the characters into a single string.
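One caveat: chr() treats every integer as a code point of its own, so this approach matches UTF-8 decoding only when all values are plain ASCII (below 128). The sketch below, with sample data chosen for illustration, compares it with building a bytes object and decoding that:

```python
byte_data = [99, 97, 102, 195, 169]  # UTF-8 bytes of "café" (é takes two bytes)

# chr() treats each integer as its own code point, so the two-byte é is garbled.
via_chr = "".join(map(chr, byte_data))

# Building a bytes object first lets decode() interpret multi-byte sequences.
via_decode = bytes(byte_data).decode('utf-8')

print(via_chr)     # cafÃ©
print(via_decode)  # café
```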
Be Careful with the Encoding
There are dozens of byte-to-string encodings out there.
In this guide, we only used the UTF-8 encoding, which is the most popular encoding type.
UTF-8 is also the default encoding for bytes.decode() and str.encode() in Python 3. However, UTF-8 is not always the correct choice.
For instance:

    s = b"test \xe7\xf8\xe9"
    s.decode('UTF-8')

Output:

    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe7 in position 5: invalid continuation byte

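This error means the byte sequence is not valid UTF-8: the byte 0xe7 announces a multi-byte character, but the byte that follows is not a valid continuation of it.
In other words, the data was most likely produced with a different encoding, and you should decode it with that one.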
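If you want to handle the failure in code rather than read the traceback, the exception object carries the same details. A minimal sketch, using illustrative variable names:

```python
s = b"test \xe7\xf8\xe9"

try:
    s.decode('utf-8')
except UnicodeDecodeError as err:
    # The exception records where decoding failed and why.
    failed_at = err.start
    reason = err.reason
    bad_byte = s[err.start:err.start + 1]

print(failed_at)  # 5
print(reason)     # invalid continuation byte
print(bad_byte)   # b'\xe7'
```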
You can use a module like chardet to detect the character encoding. (Note that this module is not maintained, but most of the info you learn about it is still applicable.)
However, no approach is 100% foolproof. This module gives you its best guess about the encoding together with a confidence score.
Anyway, let’s say the above byte string can be decoded using the latin1 encoding as well as the iso8859_5 encoding.
Now let’s make the conversion:

    s = b"test \xe7\xf8\xe9"
    print(s.decode('latin1'))
    print(s.decode('iso8859_5'))

Output:

    test çøé
    test чјщ

This time there is no error. Instead, decoding works with both encodings, but each produces a different result.
So be careful with the encodings!
If you see an error when doing a conversion, the first thing to do is figure out which encoding was used to produce the bytes. Then decode with that same encoding to get the right result.
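When the encoding is unknown, one possible approach is to try a short list of candidates in order. The helper below, decode_with_fallback, is a hypothetical sketch rather than a standard function, and its candidate list is an assumption. Keep in mind that a decode that succeeds is not proof that the encoding was the right one.

```python
def decode_with_fallback(data, encodings=('utf-8', 'cp1252')):
    """Try each candidate encoding in order (hypothetical helper)."""
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Last resort: decode as UTF-8 and mark undecodable bytes with U+FFFD.
    return data.decode('utf-8', errors='replace'), 'utf-8 (lossy)'


text, used = decode_with_fallback(b"test \xe7\xf8\xe9")
print(used)  # cp1252
print(text)  # test çøé
```

The order of the candidates matters. An encoding such as latin1 accepts every possible byte, so it never raises an error and should only appear last, if at all.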
Conclusion
Today you learned how to convert bytes to strings in Python.
To recap, there are several ways to convert bytes to strings in Python.
- To convert a byte string to a string, use the bytes.decode() method.
- If you have a list of byte values, call the chr() function on each value using the map() function (or a for loop).
- If you have a pandas dataframe with bytes, call the .str.decode() method on the column with bytes.
By default, Python uses UTF-8 when encoding and decoding.
However, UTF-8 is not always the right choice. Trying to decode bytes that are not valid UTF-8 with the UTF-8 codec produces an error. In this situation, determine the right character encoding before decoding. You can use a module like chardet to help with this.