Converting CSV to JSON in Python
Converting CSV (Comma-Separated Values) to JSON (JavaScript Object Notation) is a common task when working with data.
Step 1: Understanding CSV and JSON
Before starting the conversion, it’s essential to understand the two formats:
CSV (Comma-Separated Values)
- What is CSV?
- CSV is a plain-text file format in which data is arranged in rows.
- Each row represents one record.
- Fields in a row are separated by a delimiter. The most common delimiter is a comma (,), but a semicolon (;) or a tab (\t) is also used.
- Example CSV File:
name,age,city
Alice,30,New York
Bob,25,Los Angeles
Charlie,35,Chicago
- The first row is the header row, which contains the column names (name,age,city).
- Subsequent rows contain the actual data.
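As a quick illustration, the sample above can be parsed with the csv module's csv.reader. This sketch reads the text from an in-memory string through io.StringIO instead of a file, purely so it runs on its own; the variable names are illustrative.

```python
import csv
import io

# The sample CSV from above, held in a string instead of a file
sample = "name,age,city\nAlice,30,New York\nBob,25,Los Angeles\nCharlie,35,Chicago\n"

reader = csv.reader(io.StringIO(sample))
header = next(reader)  # first row: the column names
rows = list(reader)    # remaining rows: the data records

print(header)   # ['name', 'age', 'city']
print(rows[0])  # ['Alice', '30', 'New York']
```

Each row comes back as a list of strings, in the same order as the header.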
JSON (JavaScript Object Notation)
- What is JSON?
- JSON is a lightweight, text-based format for representing structured data.
- It represents data as objects (collections of key-value pairs) and arrays, which makes it human-readable and easy to work with in most programming languages.
- Example JSON Data:
[
{"name": "Alice", "age": 30, "city": "New York"},
{"name": "Bob", "age": 25, "city": "Los Angeles"},
{"name": "Charlie", "age": 35, "city": "Chicago"}
]
- Here, the data is represented as an array of objects, where each object corresponds to a row in the CSV.
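To make the correspondence concrete, json.loads parses the JSON text above into plain Python objects: the array becomes a list and each object becomes a dict. The string literal is simply the example data copied inline.

```python
import json

json_text = """
[
    {"name": "Alice", "age": 30, "city": "New York"},
    {"name": "Bob", "age": 25, "city": "Los Angeles"},
    {"name": "Charlie", "age": 35, "city": "Chicago"}
]
"""

records = json.loads(json_text)  # JSON array -> list, JSON objects -> dicts

print(type(records).__name__)  # list
print(records[1]["city"])      # Los Angeles
```

JSON numbers such as 30 are parsed as Python int values, not strings.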
Step 2: Libraries Used
Python's standard library includes a module for each format:
1. csv:
- Reads and writes files in CSV format.
- Takes care of delimiters, headers, and quoting automatically.
2. json:
- Converts Python objects, such as lists and dictionaries, to JSON strings and back.
Both modules ship with Python, so nothing needs to be installed, and they handle the parsing and formatting details for you.
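The following sketch shows what "quoting" means in practice: a field that contains a comma is wrapped in double quotes by csv.writer and unwrapped again by csv.reader, while json.dumps and json.loads convert between Python objects and JSON text in both directions. The data is made up for illustration.

```python
import csv
import io
import json

# Write a row whose last field contains a comma; csv.writer quotes that field
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Alice", "30", "New York, NY"])
csv_line = buffer.getvalue()  # 'Alice,30,"New York, NY"' followed by "\r\n"

# Reading it back yields the original three fields, not four
parsed = next(csv.reader(io.StringIO(csv_line)))

# Round-trip a Python dict through a JSON string
record = {"name": "Alice", "age": 30}
encoded = json.dumps(record)  # '{"name": "Alice", "age": 30}'
decoded = json.loads(encoded)
```

Splitting the line on commas by hand would have produced four fields here, which is why a dedicated parser is preferable.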
Step 3: Process Overview
The process for converting CSV to JSON is:
1. Reading the CSV file:
- Read data row by row.
- Convert each row to a Python dictionary, where the keys are the column names, and the values are the cell data.
2. Storing Data:
- Collect all rows (now dictionaries) into a Python list.
3. Converting to JSON:
- Use the json module to convert the list of dictionaries into a JSON string.
4. Saving JSON to a File:
- Write the JSON string to a .json file so it can be reused.
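The first three steps fit in a short function. This is a minimal sketch, assuming the CSV arrives as a string and the JSON is returned as a string, leaving step 4 (writing the file) to the caller; the function name csv_text_to_json is hypothetical.

```python
import csv
import io
import json


def csv_text_to_json(csv_text: str) -> str:
    """Hypothetical helper: convert CSV text with a header row to a JSON array string."""
    # Steps 1 and 2: read each row as a dict and collect the dicts in a list
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [dict(row) for row in reader]
    # Step 3: serialize the list of dicts to JSON text
    return json.dumps(rows, indent=4)


result = csv_text_to_json("name,age,city\nAlice,30,New York\n")
print(result)
```

Because no type conversion is applied, every value in the output is a string (for example "30").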
Step 4: Step-by-Step Code Walkthrough
Let’s convert a sample CSV file (data.csv) to JSON (data.json).
1. Set up the CSV File
Create a CSV file named data.csv with the following content:
name,age,city
Alice,30,New York
Bob,25,Los Angeles
Charlie,35,Chicago
- Header Row: Defines the column names: name, age, and city.
- Data Rows: Contain the actual data for each record.
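If you would rather generate data.csv from Python than create it by hand, csv.writer can produce the same file. This sketch writes data.csv to the current working directory, which matches the csv_file_path used in the script.

```python
import csv

rows = [
    ["name", "age", "city"],  # header row
    ["Alice", "30", "New York"],
    ["Bob", "25", "Los Angeles"],
    ["Charlie", "35", "Chicago"],
]

# newline="" is recommended by the csv documentation so the writer controls line endings
with open("data.csv", mode="w", encoding="utf-8", newline="") as f:
    csv.writer(f).writerows(rows)
```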
2. Python Script
Below is a detailed Python script to convert the CSV to JSON.
# Import necessary libraries
import csv
import json
import sys
# File paths
csv_file_path = "data.csv"    # Path to the CSV file
json_file_path = "data.json"  # Path to save the JSON file
# Initialize an empty list to store the data
data_list = []
# Step 1: Open and read the CSV file
try:
    # newline="" lets the csv module handle line endings, as the csv docs recommend
    with open(csv_file_path, mode="r", encoding="utf-8", newline="") as csv_file:
        csv_reader = csv.DictReader(csv_file)  # Maps each row to a dict keyed by the header
        # Step 2: Process each row
        for row in csv_reader:
            # Optional: convert data types (e.g., age from string to integer)
            row["age"] = int(row["age"])
            # Append the dictionary to the list
            data_list.append(row)
except FileNotFoundError:
    print(f"Error: The file {csv_file_path} was not found.")
    sys.exit(1)
except ValueError as e:
    # Raised by int() when an age cell is empty or not a number
    print(f"Error: Invalid value in {csv_file_path}: {e}")
    sys.exit(1)
# Step 3: Convert the list of dictionaries to JSON
try:
    json_data = json.dumps(data_list, indent=4)  # Format JSON with indentation for readability
except TypeError as e:
    print(f"Error converting to JSON: {e}")
    sys.exit(1)
# Step 4: Write the JSON data to a file
try:
    with open(json_file_path, mode="w", encoding="utf-8") as json_file:
        json_file.write(json_data)
    print(f"CSV data has been successfully converted to JSON and saved to {json_file_path}.")
except OSError as e:
    print(f"Error writing JSON to file: {e}")
    sys.exit(1)
Step 5: Code Explanation
1. File Paths
- csv_file_path: path to the source CSV file.
- json_file_path: path where the JSON output is saved.
2. Reading the CSV
- csv.DictReader(csv_file): wraps the already-open file and reads it row by row, turning each row into a dictionary.
- The column names from the header row serve as the keys of each dictionary.
- Example of a dictionary for the first row:
{"name": "Alice", "age": "30", "city": "New York"}
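The dictionary shown above can be reproduced directly. This sketch feeds sample data to csv.DictReader through io.StringIO (only so the snippet runs without data.csv) and assumes Python 3.8 or later, where DictReader yields plain dicts. Note that age is still the string "30".

```python
import csv
import io

sample = "name,age,city\nAlice,30,New York\nBob,25,Los Angeles\n"

reader = csv.DictReader(io.StringIO(sample))
print(reader.fieldnames)  # ['name', 'age', 'city']

first_row = next(reader)
print(first_row)  # {'name': 'Alice', 'age': '30', 'city': 'New York'}
```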
3. Type Conversion
- By default, all values are strings. If needed, convert specific columns to their appropriate types.
- Example:
row["age"] = int(row["age"])
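int() raises ValueError when a cell is empty or not numeric. A slightly more defensive sketch converts several columns at once and maps blank cells to None; the helper name convert_types and the converters mapping are illustrative assumptions, not part of the csv module.

```python
def convert_types(row, converters):
    """Hypothetical helper: apply a converter per column, mapping blank cells to None."""
    converted = dict(row)
    for column, func in converters.items():
        value = converted.get(column)
        if value is None:
            continue  # column absent, or DictReader filled a short row with None
        converted[column] = func(value) if value.strip() else None
    return converted


row = {"name": "Alice", "age": "30", "score": "", "city": "New York"}
result = convert_types(row, {"age": int, "score": float})
print(result)  # {'name': 'Alice', 'age': 30, 'score': None, 'city': 'New York'}
```

Non-numeric text such as "abc" still raises ValueError, which keeps genuinely bad data visible instead of silently hiding it.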
4. Storing Data
- Append each dictionary (representing a row) to data_list.
- Example data_list after reading all rows:
[
{"name": "Alice", "age": 30, "city": "New York"},
{"name": "Bob", "age": 25, "city": "Los Angeles"},
{"name": "Charlie", "age": 35, "city": "Chicago"}
]
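The read-and-append loop can also be written as a list comprehension. This is an equivalent sketch, assuming the same age conversion; it reads from io.StringIO in place of the opened file so it runs standalone.

```python
import csv
import io

sample = "name,age,city\nAlice,30,New York\nBob,25,Los Angeles\nCharlie,35,Chicago\n"

# Build each dict with "age" replaced by its integer value
data_list = [
    {**row, "age": int(row["age"])}
    for row in csv.DictReader(io.StringIO(sample))
]
print(data_list[0])  # {'name': 'Alice', 'age': 30, 'city': 'New York'}
```

Overriding an existing key keeps its original position, so the column order matches the header.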
5. Convert to JSON
- json.dumps(data_list, indent=4): converts the Python list of dictionaries to a JSON-formatted string.
- indent=4 adds indentation for better readability.
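To see what indent changes, compare the compact and indented output for a single record:

```python
import json

record = [{"name": "Alice", "age": 30}]

compact = json.dumps(record)
pretty = json.dumps(record, indent=4)

print(compact)
# [{"name": "Alice", "age": 30}]
print(pretty)
# [
#     {
#         "name": "Alice",
#         "age": 30
#     }
# ]
```

The compact form is smaller; the indented form is easier to read and diff.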
6. Write to a File
- Open the JSON file in write mode and save the JSON string.
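Instead of building the string with json.dumps and then calling write(), json.dump can serialize straight into the open file. This sketch writes to a temporary directory (an assumption, so it does not overwrite data.json) and reads the file back with json.load to confirm the contents.

```python
import json
import os
import tempfile

data_list = [{"name": "Alice", "age": 30, "city": "New York"}]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.json")
    # json.dump writes directly to the file object
    with open(path, mode="w", encoding="utf-8") as json_file:
        json.dump(data_list, json_file, indent=4)
    # Read the file back to verify the round trip
    with open(path, mode="r", encoding="utf-8") as json_file:
        loaded = json.load(json_file)

print(loaded == data_list)  # True
```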
Step 6: Output
The final JSON file (data.json) will look like this:
[
    {
        "name": "Alice",
        "age": 30,
        "city": "New York"
    },
    {
        "name": "Bob",
        "age": 25,
        "city": "Los Angeles"
    },
    {
        "name": "Charlie",
        "age": 35,
        "city": "Chicago"
    }
]
Step 7: Additional Tips
1. Custom Delimiters:
- If your CSV uses a delimiter other than a comma (e.g., ;), specify it:
csv_reader = csv.DictReader(csv_file, delimiter=";")
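When the delimiter is not known in advance, csv.Sniffer can guess it from a sample. This sketch uses a made-up semicolon-separated string; sniffing is heuristic, so passing delimiter explicitly is safer whenever you know the format.

```python
import csv
import io

sample = "name;age;city\nAlice;30;New York\nBob;25;Los Angeles\n"

# Restrict the guess to a few common delimiters
dialect = csv.Sniffer().sniff(sample, delimiters=";,\t")
print(dialect.delimiter)  # ;

rows = list(csv.DictReader(io.StringIO(sample), dialect=dialect))
print(rows[0]["city"])  # New York
```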
2. Error Handling:
- Include try-except blocks to handle errors such as:
  - Missing files.
  - Invalid data formats.
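One way to handle invalid data is to skip and report bad rows instead of aborting the whole conversion. A minimal sketch, assuming a bad row is one whose age cannot be parsed as an integer; the function name load_rows and the (line number, message) error format are hypothetical.

```python
import csv
import io


def load_rows(csv_file):
    """Hypothetical helper: return (good_rows, errors), skipping rows with a bad 'age'."""
    good_rows, errors = [], []
    reader = csv.DictReader(csv_file)
    for row in reader:
        try:
            row["age"] = int(row["age"])
        except (ValueError, TypeError) as exc:
            # reader.line_num is the number of source lines read so far
            errors.append((reader.line_num, str(exc)))
            continue
        good_rows.append(row)
    return good_rows, errors


sample = "name,age,city\nAlice,30,New York\nBob,unknown,Los Angeles\nCharlie,35,Chicago\n"
rows, problems = load_rows(io.StringIO(sample))
print(len(rows), problems)
```

TypeError is caught as well because DictReader fills missing fields of a short row with None, and int(None) raises TypeError.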
3. Handling Large Files:
- Process large CSV files row by row instead of loading everything into memory. Note that this writes one JSON object per line (the JSON Lines format) rather than a single JSON array:
import csv
import json
with open(csv_file_path, mode="r", encoding="utf-8", newline="") as csv_file, \
        open(json_file_path, mode="w", encoding="utf-8") as json_file:
    csv_reader = csv.DictReader(csv_file)
    for row in csv_reader:
        row["age"] = int(row["age"])  # Example type conversion
        json.dump(row, json_file)
        json_file.write("\n")  # One JSON object per line
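Because the loop above writes one JSON object per line, the result has to be read back line by line rather than with a single json.load. A sketch of the reading side, using an in-memory string in place of the output file:

```python
import io
import json

# Two lines in the shape produced by the streaming loop
jsonl = (
    '{"name": "Alice", "age": 30, "city": "New York"}\n'
    '{"name": "Bob", "age": 25, "city": "Los Angeles"}\n'
)

records = []
for line in io.StringIO(jsonl):
    line = line.strip()
    if line:  # skip blank lines
        records.append(json.loads(line))

print(len(records))  # 2
```

With a real file, iterate over the open file object in the same way so that only one line is held in memory at a time.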