Converting CSV to JSON in Python

Converting CSV (Comma-Separated Values) to JSON (JavaScript Object Notation) is a common task when working with data.

Step 1: Understanding CSV and JSON

Before starting the conversion, it’s essential to understand the two formats:

CSV (Comma-Separated Values)

  • What is CSV?
  • CSV is a plain-text file format in which data is arranged in rows.
  • Each row represents one record.
  • Columns in a row are separated by a delimiter, most commonly a comma (,), but sometimes a semicolon (;) or a tab (\t).
  • Example CSV File:
name,age,city
Alice,30,New York
Bob,25,Los Angeles
Charlie,35,Chicago
  • The first row is the header row, which contains the column names (name, age, city).
  • Subsequent rows contain the actual data.
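
To make the header/data split concrete, here is a minimal sketch that parses the sample above with csv.reader. The text is kept in a string (sample_csv, a name chosen only for this illustration) so the snippet runs without a file on disk.

```python
import csv
import io

# The sample CSV from above, held in a string so no file is needed.
sample_csv = "name,age,city\nAlice,30,New York\nBob,25,Los Angeles\nCharlie,35,Chicago\n"

reader = csv.reader(io.StringIO(sample_csv))
header = next(reader)   # first row: the column names
records = list(reader)  # remaining rows: the actual data

print(header)      # ['name', 'age', 'city']
print(records[0])  # ['Alice', '30', 'New York']
```

Note that every cell, including the age, comes back as a string.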

JSON (JavaScript Object Notation)

  • What is JSON?
  • JSON is a lightweight, flexible text format for representing structured data.
  • It stores data in key-value pairs, making it human-readable and easy to work with in programming languages.
  • Example JSON Data:
[
  {"name": "Alice", "age": 30, "city": "New York"},
  {"name": "Bob", "age": 25, "city": "Los Angeles"},
  {"name": "Charlie", "age": 35, "city": "Chicago"}
]
  • Here, the data is represented as an array of objects, where each object corresponds to a row in the CSV.
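
As a quick illustration of how this structure maps onto Python, the sketch below parses the example with json.loads. The variable names (sample_json, people) are chosen for this example only.

```python
import json

# The example JSON from above as a string.
sample_json = """
[
  {"name": "Alice", "age": 30, "city": "New York"},
  {"name": "Bob", "age": 25, "city": "Los Angeles"},
  {"name": "Charlie", "age": 35, "city": "Chicago"}
]
"""

people = json.loads(sample_json)  # JSON array -> list, JSON object -> dict

print(type(people).__name__)            # list
print(people[0]["name"])                # Alice
print(type(people[0]["age"]).__name__)  # int
```

Unlike CSV, JSON distinguishes numbers from strings, which is why the age arrives as an int here.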

Step 2: Libraries Used

Python has built-in libraries for these formats:

1. csv:

  • Reads and writes files in CSV format.
  • Handles delimiters, quoting, and header rows for you.

2. json:

  • Converts Python objects, such as lists and dictionaries, to JSON strings and back.

Both modules are part of Python's standard library, so there is nothing to install, and they spare you from writing any parsing or serialization code yourself.

Step 3: Process Overview

The process for converting CSV to JSON is:

1. Reading the CSV file:

  • Read data row by row.
  • Convert each row to a Python dictionary, where the keys are the column names, and the values are the cell data.

2. Storing Data:

  • Collect all rows (now dictionaries) into a Python list.

3. Converting to JSON:

  • Use the json module to convert the list of dictionaries into a JSON string.

4. Saving JSON to a File:

  • Write the JSON string to a .json file so it can be reused.
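
As a compact preview, steps 1 to 3 can be sketched entirely in memory. The function name csv_text_to_json is hypothetical, and saving to a file (step 4) is deliberately left out here.

```python
import csv
import io
import json


def csv_text_to_json(csv_text: str) -> str:
    """Convert CSV text (with a header row) to a JSON array string."""
    reader = csv.DictReader(io.StringIO(csv_text))  # 1. read rows as dictionaries
    rows = list(reader)                              # 2. collect them in a list
    return json.dumps(rows, indent=2)                # 3. serialize the list to JSON


json_text = csv_text_to_json("name,age\nAlice,30\nBob,25\n")
print(json_text)
```

No type conversion happens in this sketch, so the ages appear in the JSON as strings.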

Step 4: Step-by-Step Code Walkthrough

Let’s convert a sample CSV file (data.csv) to JSON (data.json).

1. Set up the CSV File

Create a CSV file named data.csv with the following content:

name,age,city
Alice,30,New York
Bob,25,Los Angeles
Charlie,35,Chicago
  • Header Row: Defines the column names: name, age, and city.
  • Data Rows: Contain the actual data for each record.

2. Python Script

Below is a detailed Python script to convert the CSV to JSON.

# Import necessary libraries
import csv
import json

# File paths
csv_file_path = "data.csv"  # Path to the CSV file
json_file_path = "data.json"  # Path to save the JSON file

# Initialize an empty list to store the data
data_list = []

# Step 1: Open and Read the CSV File
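try:
    # newline="" lets the csv module handle line endings itself, as its documentation recommends
    with open(csv_file_path, mode="r", encoding="utf-8", newline="") as csv_file:
        csv_reader = csv.DictReader(csv_file)  # Automatically maps rows to dictionaries
        
        # Step 2: Process each row
        for row in csv_reader:
            # Optional: Convert data types (e.g., age from string to integer)
            row["age"] = int(row["age"])
            
            # Append the dictionary to the list
            data_list.append(row)
except FileNotFoundError:
    print(f"Error: The file {csv_file_path} was not found.")
    raise SystemExit(1)
except (KeyError, TypeError, ValueError) as e:
    # Raised when the age column is missing or holds a non-integer value
    print(f"Error: Invalid data in {csv_file_path}: {e}")
    raise SystemExit(1)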

# Step 3: Convert the List of Dictionaries to JSON
try:
    json_data = json.dumps(data_list, indent=4)  # Format JSON with indentation for readability
except TypeError as e:
    print(f"Error converting to JSON: {e}")
    raise SystemExit(1)  # Exits with a non-zero status without relying on the exit() helper

# Step 4: Write the JSON Data to a File
try:
    with open(json_file_path, mode="w", encoding="utf-8") as json_file:
        json_file.write(json_data)
    print(f"CSV data has been successfully converted to JSON and saved to {json_file_path}.")
except OSError as e:
    # Covers problems such as a missing directory or insufficient permissions
    print(f"Error writing JSON to file: {e}")
    raise SystemExit(1)

Step 5: Code Interpretation

1. File Paths

  • csv_file_path: the path to the source CSV file.
  • json_file_path: the path where the JSON output is saved.

2. Reading the CSV

  • csv.DictReader(csv_file):
    • Wraps the already-open file object and reads it row by row, turning each row into a dictionary.
    • The column names from the header row serve as the keys of each dictionary.
  • Example of a dictionary for the first row:
{"name": "Alice", "age": "30", "city": "New York"}
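
A small sketch of this behavior, using an in-memory string instead of data.csv so it runs on its own (csv_text and first_row are illustrative names):

```python
import csv
import io

csv_text = "name,age,city\nAlice,30,New York\n"

reader = csv.DictReader(io.StringIO(csv_text))
print(reader.fieldnames)  # ['name', 'age', 'city'] (taken from the header row)

first_row = next(reader)
print(first_row["age"], type(first_row["age"]).__name__)  # 30 str
```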

3. Type Conversion

  • DictReader returns every value as a string. If needed, convert specific columns to their appropriate types.
  • Example:
row["age"] = int(row["age"])
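
When several columns need converting, one possible approach is a mapping from column name to converter function. CONVERTERS and convert_types below are hypothetical names, not part of the csv module; this is a sketch, not the only way to do it.

```python
# Hypothetical mapping from column name to the function that converts it.
CONVERTERS = {"age": int}


def convert_types(row: dict, converters: dict = CONVERTERS) -> dict:
    """Return a copy of row with the listed columns converted; others stay strings."""
    return {
        key: converters[key](value) if key in converters else value
        for key, value in row.items()
    }


converted = convert_types({"name": "Alice", "age": "30", "city": "New York"})
print(converted)  # {'name': 'Alice', 'age': 30, 'city': 'New York'}
```

Adding another typed column then only requires a new entry in the mapping, for example a float converter for a price column.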

4. Storing Data

  • Append each dictionary (representing a row) to the data_list.
  • Example data_list after reading all rows:
[
    {"name": "Alice", "age": 30, "city": "New York"},
    {"name": "Bob", "age": 25, "city": "Los Angeles"},
    {"name": "Charlie", "age": 35, "city": "Chicago"}
]

5. Convert to JSON

  • json.dumps(data_list, indent=4):
    • Converts the Python list of dictionaries to a JSON-formatted string.
    • indent=4 adds indentation for better readability.
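
The effect of the formatting options can be seen in a short sketch. The record is made up for illustration, and ensure_ascii is an extra json.dumps parameter not used in the script above.

```python
import json

# Made-up record with a non-ASCII character to show the escaping behavior.
data_list = [{"name": "Zoë", "age": 30}]

compact = json.dumps(data_list)                       # one line, non-ASCII escaped
pretty = json.dumps(data_list, indent=4)              # multi-line, indented
readable = json.dumps(data_list, ensure_ascii=False)  # keeps non-ASCII characters as-is

print(compact)   # [{"name": "Zo\u00eb", "age": 30}]
print(readable)  # [{"name": "Zoë", "age": 30}]
print(pretty)
```

All three strings decode to the same data; the options only change how the text looks.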

6. Write to a File

  • Open the JSON file in write mode and save the JSON string.
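
As an alternative sketch, json.dump (without the "s") serializes straight into an open file, which skips the intermediate string. The temporary directory is only there so the example does not touch real files.

```python
import json
import os
import tempfile

data_list = [{"name": "Alice", "age": 30, "city": "New York"}]

# A temporary directory keeps this sketch from touching real files.
with tempfile.TemporaryDirectory() as tmp_dir:
    json_file_path = os.path.join(tmp_dir, "data.json")

    with open(json_file_path, mode="w", encoding="utf-8") as json_file:
        json.dump(data_list, json_file, indent=4)  # serialize straight into the file

    # Read the file back to confirm it holds the same data.
    with open(json_file_path, mode="r", encoding="utf-8") as json_file:
        loaded = json.load(json_file)

print(loaded == data_list)  # True
```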

Step 6: Output

The final JSON file (data.json) will look like this:

[
    {
        "name": "Alice",
        "age": 30,
        "city": "New York"
    },
    {
        "name": "Bob",
        "age": 25,
        "city": "Los Angeles"
    },
    {
        "name": "Charlie",
        "age": 35,
        "city": "Chicago"
    }
]

Step 7: Additional Tips

1. Custom Delimiters:

  • If your CSV uses a delimiter other than a comma (e.g., ;), specify it:
csv_reader = csv.DictReader(csv_file, delimiter=";")
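
The following sketch shows why the delimiter matters, using a made-up semicolon-separated string in place of a file:

```python
import csv
import io

semicolon_csv = "name;age;city\nAlice;30;New York\n"

rows = list(csv.DictReader(io.StringIO(semicolon_csv), delimiter=";"))
print(rows[0])  # {'name': 'Alice', 'age': '30', 'city': 'New York'}

# Without delimiter=";", the whole line is treated as a single column.
wrong = list(csv.DictReader(io.StringIO(semicolon_csv)))
print(list(wrong[0].keys()))  # ['name;age;city']
```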

2. Error Handling:

  • Include try-except blocks to handle errors such as:
    • Missing files.
    • Invalid data formats.
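
For invalid data, one option is to skip bad rows and record them instead of stopping at the first failure. The sketch below assumes that policy; the input is made up, and good_rows and bad_rows are illustrative names.

```python
import csv
import io

# Made-up input: the second data row has a non-numeric age.
csv_text = "name,age,city\nAlice,30,New York\nBob,unknown,Los Angeles\n"

good_rows, bad_rows = [], []
reader = csv.DictReader(io.StringIO(csv_text))
for row in reader:
    try:
        row["age"] = int(row["age"])
    except (ValueError, TypeError) as error:
        # reader.line_num is the number of source lines read so far
        bad_rows.append((reader.line_num, str(error)))
        continue
    good_rows.append(row)

print(len(good_rows), len(bad_rows))  # 1 1
```

Whether to skip, default, or abort on a bad row depends on the data; skipping is only one reasonable choice.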

3. Handling Large Files:

  • Process large CSV files row by row instead of loading everything into memory. Note that this writes one JSON object per line (the JSON Lines layout) rather than a single JSON array, so the output must also be read back line by line:
with open(csv_file_path, mode="r", encoding="utf-8", newline="") as csv_file, open(json_file_path, mode="w", encoding="utf-8") as json_file:
    csv_reader = csv.DictReader(csv_file)
    for row in csv_reader:
        row["age"] = int(row["age"])  # Example type conversion
        json.dump(row, json_file)
        json_file.write("\n")
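
Output written this way holds one JSON object per line, so json.load on the whole file would fail. A minimal sketch for reading it back, with a hypothetical helper read_json_lines and an in-memory string standing in for the file:

```python
import io
import json

# Text in the shape produced by the loop above: one JSON object per line.
json_lines = '{"name": "Alice", "age": 30}\n{"name": "Bob", "age": 25}\n'


def read_json_lines(lines):
    """Yield one parsed object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


records = list(read_json_lines(io.StringIO(json_lines)))
print(records[1]["name"])  # Bob
```

Because the helper is a generator, it can also be consumed one record at a time, which keeps memory use low for large files.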