How to Convert Bytes to String in Python
What Are Bytes and Strings in Python?
Bytes:
- A bytes object is a sequence of integers (0–255) representing raw binary data.
- Example:
byte_data = b'Hello'
print(byte_data) # Output: b'Hello'
print(type(byte_data)) # Output: <class 'bytes'>
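To make the "sequence of integers" description concrete, here is a minimal sketch (the variable names are illustrative, not from the original example) that indexes, iterates over, and slices a bytes object.

```python
byte_data = b'Hello'

# Indexing a bytes object yields an integer, not a one-character string.
first_byte = byte_data[0]
print(first_byte)          # 72, the ASCII code of 'H'

# Iterating yields the integer value of every byte.
byte_values = list(byte_data)
print(byte_values)         # [72, 101, 108, 108, 111]

# Slicing, by contrast, yields another bytes object.
first_two = byte_data[:2]
print(first_two)           # b'He'

# bytes can also be built from an iterable of integers in range(256).
rebuilt = bytes([72, 105])
print(rebuilt)             # b'Hi'
```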
String:
- A str object is a sequence of Unicode characters, used for representing textual data.
- Example:
string_data = 'Hello'
print(string_data) # Output: Hello
print(type(string_data)) # Output: <class 'str'>
Why Convert Bytes to Strings?
You may come across bytes when:
- Reading files in binary mode.
- Receiving data from a network (e.g., HTTP, sockets).
- Interacting with APIs that return byte data.
To process or display such data as text, you first need to convert the bytes into a string.
How to Convert Bytes to String?
The primary tool is the decode() method of the bytes object, which interprets the bytes using a character encoding and returns a str.
Example 1: Basic Conversion
byte_data = b'Hello, world!'
string_data = byte_data.decode('utf-8') # Decode using UTF-8 encoding
print(string_data) # Output: Hello, world!
Output:
Hello, world!
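A related point that often trips people up is the built-in str(). The sketch below, using the same sample data, shows what str() does with and without an encoding argument, and that decode() falls back to UTF-8 when no encoding is given.

```python
byte_data = b'Hello, world!'

# Calling str() without an encoding returns the repr of the bytes object,
# including the b'' wrapper, which is rarely what is wanted.
as_repr = str(byte_data)
print(as_repr)             # b'Hello, world!'

# Passing an encoding makes str() decode, like byte_data.decode('utf-8').
as_text = str(byte_data, 'utf-8')
print(as_text)             # Hello, world!

# decode() uses UTF-8 when no encoding is given.
default_text = byte_data.decode()
print(as_text == default_text)   # True
```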
Common Encodings
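- UTF-8 (the default for decode()): can represent every Unicode character, using one to four bytes each.
- ASCII: covers only the 128 code points 0–127 (English letters, digits, punctuation, and control characters).
- Latin-1 (ISO-8859-1): a single-byte encoding that extends ASCII with characters for Western European languages.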
Example 2: Using Different Encodings
byte_data = b'Hello, world!'
# Decode with UTF-8
utf8_string = byte_data.decode('utf-8')
print(utf8_string) # Output: Hello, world!
# Decode with ASCII
ascii_string = byte_data.decode('ascii')
print(ascii_string) # Output: Hello, world!
Output:
Hello, world!
Hello, world!
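Example 2 only contains ASCII characters, so every encoding gives the same result. The following sketch, which uses an illustrative string containing one non-ASCII character, shows where the encodings start to differ.

```python
text = 'café'

# The same text produces different bytes under different encodings.
utf8_bytes = text.encode('utf-8')
latin1_bytes = text.encode('latin-1')
print(utf8_bytes)      # b'caf\xc3\xa9'  (é takes two bytes)
print(latin1_bytes)    # b'caf\xe9'      (é takes one byte)

# Decoding with the matching encoding round-trips cleanly.
round_trip = utf8_bytes.decode('utf-8')
print(round_trip)      # café

# Decoding UTF-8 bytes as Latin-1 does not fail, but yields mojibake.
mojibake = utf8_bytes.decode('latin-1')
print(mojibake)        # cafÃ©

# Decoding them as ASCII fails, because 0xC3 is outside 0-127.
try:
    utf8_bytes.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True
print(ascii_failed)    # True
```

The Latin-1 case is worth noting: every byte value is valid in Latin-1, so a wrong guess never raises an error and the mistake only shows up as garbled text.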
Handling Errors During Conversion
If the byte data contains sequences that are invalid for the specified encoding, decode() raises a UnicodeDecodeError unless you pass a different errors argument.
Example 3: Handling Errors
invalid_byte_data = b'\x80\x81\x82' # Invalid UTF-8 bytes
# 1. Strict mode (default)
try:
    strict_string = invalid_byte_data.decode('utf-8')
except UnicodeDecodeError as e:
    print(f"Error: {e}") # Output: Error message
# 2. Ignore errors
ignored_string = invalid_byte_data.decode('utf-8', errors='ignore')
print(ignored_string) # Output: (empty string)
# 3. Replace invalid bytes
replaced_string = invalid_byte_data.decode('utf-8', errors='replace')
print(replaced_string) # Output: ���
Output:
Error: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
(a blank line, because the result is an empty string)
���
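Besides 'ignore' and 'replace', Python ships other error handlers that lose less information. The sketch below, with illustrative sample bytes, shows 'backslashreplace' and 'surrogateescape'.

```python
mixed_bytes = b'caf\xe9 ok'   # Latin-1 bytes; 0xE9 is not valid UTF-8 here

# backslashreplace keeps the offending byte visible as an escape sequence.
escaped = mixed_bytes.decode('utf-8', errors='backslashreplace')
print(escaped)                 # caf\xe9 ok

# surrogateescape carries the byte through as a lone surrogate, so the
# original bytes can be recovered by encoding with the same handler.
smuggled = mixed_bytes.decode('utf-8', errors='surrogateescape')
restored = smuggled.encode('utf-8', errors='surrogateescape')
print(restored == mixed_bytes)  # True
```

A string produced with 'surrogateescape' may fail when printed or encoded normally, so it is best suited to data that will be written back out as bytes.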
Decoding a Subset of Bytes
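You can decode only a portion of the byte data by slicing it before decoding. Make sure the slice ends on a character boundary: cutting through a multi-byte sequence raises a UnicodeDecodeError.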
Example 4: Decoding Part of Bytes
byte_data = b'Hello, \xf0\x9f\x8c\x8d' # Contains 'Hello, 🌍'
# Decode only the first 7 bytes
partial_string = byte_data[:7].decode('utf-8')
print(partial_string) # Output: Hello,
Output:
Hello,
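The slice in Example 4 happens to end right before the emoji. As a hedged sketch of what happens when a cut lands inside a multi-byte character, the code below slices two bytes later and then uses the standard library's incremental decoder, one possible way to handle data that arrives in arbitrary chunks.

```python
import codecs

byte_data = b'Hello, \xf0\x9f\x8c\x8d'   # 'Hello, ' followed by a 4-byte emoji

# Slicing after 9 bytes cuts the emoji in half, so strict decoding fails.
try:
    byte_data[:9].decode('utf-8')
    cut_ok = True
except UnicodeDecodeError:
    cut_ok = False
print(cut_ok)                  # False

# An incremental decoder buffers incomplete sequences between chunks.
decoder = codecs.getincrementaldecoder('utf-8')()
first_part = decoder.decode(byte_data[:9])               # emoji bytes held back
second_part = decoder.decode(byte_data[9:], final=True)  # emoji completed
print(repr(first_part))        # 'Hello, '
print(first_part + second_part == byte_data.decode('utf-8'))   # True
```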
Real-Life Applications
1. File Handling
When reading binary files, bytes must be converted to strings for textual processing.
# Create a text file with 'Hello, world!'
with open('example.txt', 'wb') as file:
    file.write(b'Hello, world!')
# Read and decode
with open('example.txt', 'rb') as file:
    byte_content = file.read()
text_content = byte_content.decode('utf-8')
print(text_content) # Output: Hello, world!
Output:
Hello, world!
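When the whole file is text, an alternative is to let open() do the decoding. The sketch below assumes a temporary file path and sample content chosen for illustration; it opens the file in text mode with an explicit encoding so that read() returns a str directly.

```python
import os
import tempfile

# Write sample bytes to a temporary file (the path is illustrative).
path = os.path.join(tempfile.mkdtemp(), 'example.txt')
with open(path, 'wb') as file:
    file.write('Héllo, wörld!'.encode('utf-8'))

# Opening in text mode with an explicit encoding decodes while reading.
with open(path, 'r', encoding='utf-8') as file:
    text_content = file.read()

print(type(text_content).__name__)        # str
print(text_content == 'Héllo, wörld!')    # True
```

Passing encoding explicitly avoids depending on the platform's default encoding, which can differ between systems.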
2. Network Data
Bytes are often received from a server or API and must be decoded.
# Simulate a response from a server
server_response = b'HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nHello'
decoded_response = server_response.decode('utf-8')
print(decoded_response)
Output:
HTTP/1.1 200 OK
Content-Type: text/plain

Hello
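Real responses often declare the body's encoding in the Content-Type header. The following is a rough sketch, not a full HTTP parser: the response bytes and the charset parameter are assumptions added for illustration, and production code would normally rely on an HTTP library instead.

```python
# Hypothetical raw response; the charset parameter is added for illustration.
server_response = (
    b'HTTP/1.1 200 OK\r\n'
    b'Content-Type: text/plain; charset=iso-8859-1\r\n'
    b'\r\n'
    b'caf\xe9'
)

# Split the header block from the body at the first blank line.
header_bytes, _, body_bytes = server_response.partition(b'\r\n\r\n')

# The header bytes in this sample are plain ASCII.
header_text = header_bytes.decode('ascii')

# Look for a charset parameter; fall back to UTF-8 if none is declared.
charset = 'utf-8'
for line in header_text.split('\r\n')[1:]:
    name, _, value = line.partition(':')
    if name.strip().lower() == 'content-type' and 'charset=' in value.lower():
        charset = value.lower().split('charset=')[1].split(';')[0].strip().strip('"')

body_text = body_bytes.decode(charset)
print(charset)      # iso-8859-1
print(body_text)    # café
```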
Key Takeaways
1. Always Specify the Encoding:
- Use utf-8 unless otherwise specified by the source.
- byte_data.decode('utf-8')
2. Handle Errors Robustly:
- Use errors='ignore' or errors='replace' to avoid crashing on garbage data, keeping in mind that both discard the original invalid bytes.
3. Work With Subsets:
- If you suspect that only parts of the data are valid, slice before you decode.
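These takeaways can be combined into a small helper. The function below is a hypothetical sketch (the name decode_with_fallback and the default encoding list are assumptions, not part of any library): it tries each candidate encoding strictly and only falls back to replacement characters as a last resort.

```python
def decode_with_fallback(data, encodings=('utf-8', 'cp1252')):
    """Hypothetical helper: try each encoding strictly, then fall back to
    UTF-8 with replacement characters if none of them fits."""
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    return data.decode('utf-8', errors='replace')


print(decode_with_fallback(b'Hello'))          # Hello
print(decode_with_fallback(b'caf\xc3\xa9'))    # café (valid UTF-8)
print(decode_with_fallback(b'caf\xe9'))        # café (falls through to cp1252)
print(decode_with_fallback(b'\x81'))           # one replacement character
```

The order of the candidates matters: permissive single-byte encodings should come last, because they accept almost any input and would otherwise mask a better match.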
Summary
- Bytes are raw binary data; strings are textual data.
- Use the decode() method to convert bytes into strings.
- Specify the correct encoding (utf-8, ascii, etc.).
- Handle decoding errors using errors='ignore' or errors='replace'.