How to Convert Bytes to String in Python

What Are Bytes and Strings in Python?

Bytes:

  • A bytes object is a sequence of integers (0–255) representing raw binary data.
  • Example:
byte_data = b'Hello'
print(byte_data)  # Output: b'Hello'
print(type(byte_data))  # Output: <class 'bytes'>
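Because a bytes object is a sequence of integers, indexing it returns an int and iterating over it yields ints, not one-character strings. A short check using the same b'Hello' value:

```python
byte_data = b'Hello'

# Indexing a bytes object returns an int in the range 0-255, not a character
first = byte_data[0]
print(first)   # 72, the code for 'H'

# Converting to a list exposes the underlying integers
values = list(byte_data)
print(values)  # [72, 101, 108, 108, 111]

# Integers outside 0-255 are rejected when building bytes from a list
try:
    bytes([256])
    out_of_range_rejected = False
except ValueError as exc:
    out_of_range_rejected = True
    print(f"ValueError: {exc}")
```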

String:

  • A str object is a sequence of Unicode characters, used for representing textual data.
  • Example:
string_data = 'Hello'
print(string_data)  # Output: Hello
print(type(string_data))  # Output: <class 'str'>
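The difference becomes visible once text leaves the ASCII range: a str counts characters, while its encoded form counts bytes. A sketch using the globe emoji (U+1F30D), which this article also uses later:

```python
text = 'Hello, 🌍'

# A str counts Unicode characters (code points)
print(len(text))     # 8

# Encoding to UTF-8 produces bytes; the emoji alone takes 4 bytes
encoded = text.encode('utf-8')
print(encoded)       # b'Hello, \xf0\x9f\x8c\x8d'
print(len(encoded))  # 11

# decode() reverses encode() when the same encoding is used
round_trip = encoded.decode('utf-8')
print(round_trip == text)  # True
```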

Why Convert Bytes to Strings?

You may come across bytes when:

  1. Reading files in binary mode.
  2. Receiving data from a network (e.g., HTTP, sockets).
  3. Interacting with APIs that return byte data.

To treat such data as text (search it, split it, combine it with other strings, or display it), you first need to decode the bytes into a string.
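Python 3 never mixes the two types implicitly, which is why the conversion has to happen before bytes can be combined with ordinary text. A small illustration (the header value is just sample data):

```python
header = b'Content-Length'
label = 'Header: '

# Concatenating str and bytes raises TypeError; Python does not guess an encoding
try:
    label + header
    mixing_failed = False
except TypeError as exc:
    mixing_failed = True
    print(f"TypeError: {exc}")

# Decoding first produces a str that combines normally with other text
combined = label + header.decode('utf-8')
print(combined)  # Header: Content-Length
```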

How to Convert Bytes to String?

The primary tool is the decode() method of bytes objects, which interprets the bytes using a character encoding and returns a str.

Example 1: Basic Conversion

byte_data = b'Hello, world!'
string_data = byte_data.decode('utf-8')  # Decode using UTF-8 encoding
print(string_data)  # Output: Hello, world!

Output:

Hello, world!
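Two related details are easy to trip over: decode() with no argument uses UTF-8, and the str() constructor decodes only when you pass an encoding. Without one, str() returns the printable representation of the bytes object instead of decoding it:

```python
byte_data = b'Hello, world!'

# decode() defaults to UTF-8 in Python 3
default_decoded = byte_data.decode()

# str() with an encoding argument decodes, just like decode()
via_str = str(byte_data, 'utf-8')

# str() WITHOUT an encoding does not decode; it returns the repr text
not_decoded = str(byte_data)

print(default_decoded)  # Hello, world!
print(via_str)          # Hello, world!
print(not_decoded)      # b'Hello, world!'
```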

Common Encodings

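  1. UTF-8 (the default for decode()): Supports every Unicode character, using 1 to 4 bytes per character.
  2. ASCII: Supports only byte values 0–127 (English letters, digits, punctuation, and control characters).
  3. Latin-1 (ISO-8859-1): An 8-bit extension of ASCII for Western European languages; every byte value 0–255 maps to a character.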
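These ranges have a practical consequence: ASCII rejects any byte above 127, while Latin-1 assigns a character to each of the 256 possible byte values, so decoding with it never raises. A quick check:

```python
all_bytes = bytes(range(256))  # every possible byte value, 0-255

# Latin-1 maps each byte to the code point with the same number
latin1_text = all_bytes.decode('latin-1')
print(len(latin1_text))  # 256

# ASCII covers only 0-127, so byte 0x80 (128) is the first one it rejects
ascii_error_position = None
try:
    all_bytes.decode('ascii')
except UnicodeDecodeError as exc:
    ascii_error_position = exc.start
    print(exc.start, hex(all_bytes[exc.start]))  # 128 0x80
```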

Example 2: Using Different Encodings

byte_data = b'Hello, world!'

# Decode with UTF-8
utf8_string = byte_data.decode('utf-8')
print(utf8_string)  # Output: Hello, world!

# Decode with ASCII
ascii_string = byte_data.decode('ascii')
print(ascii_string)  # Output: Hello, world!

Output:

Hello, world!
Hello, world!
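Example 2 prints the same result twice because b'Hello, world!' is pure ASCII, and ASCII bytes mean the same thing in UTF-8, ASCII, and Latin-1. The choice of encoding starts to matter once the data contains other characters. A sketch with the UTF-8 encoding of 'café':

```python
byte_data = 'café'.encode('utf-8')  # b'caf\xc3\xa9'

# The matching encoding recovers the original text
correct = byte_data.decode('utf-8')
print(correct)  # café

# A wrong encoding can "succeed" but produce garbled text (mojibake)
garbled = byte_data.decode('latin-1')
print(garbled)  # cafÃ©

# ASCII cannot represent bytes above 127 at all
try:
    byte_data.decode('ascii')
except UnicodeDecodeError as exc:
    print(f"Error: {exc}")
```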

Handling Errors During Conversion

If the byte data contains a sequence that is invalid in the specified encoding, decode() raises UnicodeDecodeError. You can catch the exception, or pass the errors argument to choose a different behavior.

Example 3: Handling Errors

invalid_byte_data = b'\x80\x81\x82'  # Invalid UTF-8 bytes

# 1. Strict mode (default): raise UnicodeDecodeError
try:
    strict_string = invalid_byte_data.decode('utf-8')
except UnicodeDecodeError as e:
    print(f"Error: {e}")  # Output: the error message shown below

# 2. Ignore errors: invalid bytes are silently dropped
ignored_string = invalid_byte_data.decode('utf-8', errors='ignore')
print(repr(ignored_string))  # Output: '' (an empty string)

# 3. Replace each invalid byte with U+FFFD, the replacement character
replaced_string = invalid_byte_data.decode('utf-8', errors='replace')
print(replaced_string)  # Output: ���

Output:

Error: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
''
���
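Besides 'ignore' and 'replace', Python ships other built-in error handlers. 'backslashreplace' is handy for logging because it keeps the offending byte values visible instead of discarding them. A sketch comparing the three on sample bytes that mix valid text with one invalid byte:

```python
mixed = b'ok \x80 ok'  # valid ASCII around one byte that is invalid in UTF-8

results = {}
for handler in ('ignore', 'replace', 'backslashreplace'):
    results[handler] = mixed.decode('utf-8', errors=handler)
    # repr() makes dropped or substituted characters easy to see
    print(f"{handler:>16}: {results[handler]!r}")
```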

Decoding a Subset of Bytes

You can decode only a portion of the byte data by slicing it before decoding, as long as the slice does not cut through the middle of a multibyte character.

Example 4: Decoding Part of Bytes

byte_data = b'Hello, \xf0\x9f\x8c\x8d'  # Contains 'Hello, 🌍'

# Decode only the first 7 bytes
partial_string = byte_data[:7].decode('utf-8')
print(partial_string)  # Output: Hello,

Output:

Hello,
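Slicing is safe only when the cut falls on a character boundary. The emoji occupies 4 bytes in UTF-8, so a slice that ends inside it raises UnicodeDecodeError. When bytes arrive in arbitrary chunks (from a socket or a file read in blocks), an incremental decoder from the standard codecs module buffers an incomplete sequence until the rest arrives. A sketch, where the two-chunk split is a made-up example:

```python
import codecs

byte_data = b'Hello, \xf0\x9f\x8c\x8d'  # 'Hello, 🌍'

# Cutting inside the 4-byte emoji leaves an incomplete sequence
try:
    byte_data[:9].decode('utf-8')
except UnicodeDecodeError as exc:
    print(f"Error: {exc}")

# Hypothetical chunks that split the emoji across two reads
chunks = [byte_data[:9], byte_data[9:]]

decoder = codecs.getincrementaldecoder('utf-8')()
pieces = []
for chunk in chunks:
    # Complete characters are returned; a trailing partial sequence is held back
    pieces.append(decoder.decode(chunk))
# final=True flushes the buffer and raises if bytes are still incomplete
pieces.append(decoder.decode(b'', final=True))

text = ''.join(pieces)
print(text)  # Hello, 🌍
```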

Real-Life Applications

1. File Handling

A file opened in binary mode ('rb') returns bytes, which must be decoded before you can process them as text.

# Create a text file with 'Hello, world!'
with open('example.txt', 'wb') as file:
    file.write(b'Hello, world!')

# Read and decode
with open('example.txt', 'rb') as file:
    byte_content = file.read()
    text_content = byte_content.decode('utf-8')
    print(text_content)  # Output: Hello, world!

Output:

Hello, world!
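If a file is known to contain text, it is often simpler to let open() do the decoding by opening it in text mode with an explicit encoding; pathlib's read_text() does the same for a whole file. Passing encoding= matters because, without it, text mode falls back to a platform-dependent default. This sketch writes the same sample file into a temporary directory (an assumption made so it does not touch the working directory):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'example.txt'
    path.write_bytes(b'Hello, world!')

    # Text mode ('r') decodes for you using the encoding you pass
    with open(path, 'r', encoding='utf-8') as file:
        text_content = file.read()

    # pathlib shortcut: read_text() opens, decodes, and closes in one call
    same_content = path.read_text(encoding='utf-8')

print(text_content)  # Hello, world!
print(same_content)  # Hello, world!
```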

2. Network Data

Sockets and many network APIs return bytes, which must be decoded before they can be handled as text.

# Simulate a response from a server
server_response = b'HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nHello'
decoded_response = server_response.decode('utf-8')
print(decoded_response)

Output:

HTTP/1.1 200 OK
Content-Type: text/plain

Hello
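Decoding the whole response as UTF-8 works for this simulated example, but in practice the body's encoding is announced by the charset parameter of the Content-Type header, and the body may not be text at all. The sketch below splits headers from body first and decodes the body with the declared charset. The helper name split_response, the Latin-1 header decoding, and the UTF-8 fallback are illustrative assumptions; real code would normally let an HTTP library such as http.client do this parsing.

```python
def split_response(raw):
    """Hypothetical helper: split a raw HTTP response into status, headers, body text."""
    head, _, body = raw.partition(b'\r\n\r\n')

    # Header bytes are decoded as Latin-1 here, which never fails on any byte
    status_line, *fields = head.decode('latin-1').split('\r\n')
    headers = {}
    for field in fields:
        name, _, value = field.partition(':')
        headers[name.strip().lower()] = value.strip()

    # Look for "charset=..." in Content-Type; fall back to UTF-8 (an assumption)
    charset = 'utf-8'
    for param in headers.get('content-type', '').split(';')[1:]:
        key, _, val = param.strip().partition('=')
        if key.lower() == 'charset' and val:
            charset = val.strip('"')

    return status_line, headers, body.decode(charset)


raw = b'HTTP/1.1 200 OK\r\nContent-Type: text/plain; charset=utf-8\r\n\r\nHello, \xf0\x9f\x8c\x8d'
status, headers, body_text = split_response(raw)
print(status)     # HTTP/1.1 200 OK
print(body_text)  # Hello, 🌍
```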

Key Takeaways

1. Always Know the Encoding:

  • Use utf-8 unless the source specifies a different encoding.
  • byte_data.decode('utf-8')

2. Handle Errors Robustly:

  • Use errors='ignore' or errors='replace' to avoid crashing on invalid data, keeping in mind that 'ignore' silently discards the bad bytes.

3. Work With Subsets:

  • If you only need part of the data, slice before you decode, and make sure the slice does not cut a multibyte character in half.

Summary

  • Bytes are raw binary data; strings are textual data.
  • Use the decode() method to convert bytes into strings.
  • Specify the correct encoding (utf-8, ascii, etc.).
  • Handle decoding errors using errors='ignore' or errors='replace'.