Working with configuration files and data exports? You’ll often need to convert YAML files into CSV format for spreadsheet analysis, data processing, or database imports.
In this article, I’ll teach you how to convert YAML to CSV using Python.
The complete source code for the YAML to CSV script is given below. It uses the pyyaml module to parse the YAML and Python's built-in csv module to write the output.
The converter runs on Windows, macOS, and Linux. If you want to offer it as an online tool, you'll need to host the code on a server yourself.
Why Convert YAML to CSV?
YAML files store hierarchical data in a human-readable format, but they’re not ideal for data analysis or sharing with non-technical team members. CSV files open in Excel, Google Sheets, and most database tools. You might need this conversion when:
- Exporting configuration data for business analysis
- Migrating data between systems that only accept CSV
- Creating reports from YAML-based application configs
- Batch processing multiple YAML files into a unified dataset
Prerequisites and Setup
Before starting, you need Python 3.6 or higher installed on your system. Check your version:
python --version
Install the pyyaml library using pip:
pip install pyyaml
That’s it. Python’s built-in csv module handles the CSV operations, so no additional dependencies are required.
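If you want to confirm the install worked, a quick check like the following prints the installed PyYAML version:
python -c "import yaml; print(yaml.__version__)"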
Understanding Your YAML Structure
YAML files can have different structures. The conversion approach depends on whether your YAML contains a list of records or nested objects. Here’s a simple YAML file with employee data:
employees:
  - name: Sarah Johnson
    age: 28
    department: Engineering
    salary: 85000
  - name: Mike Chen
    age: 34
    department: Marketing
    salary: 72000
  - name: Emily Rodriguez
    age: 31
    department: Sales
    salary: 68000

This structure works perfectly for CSV conversion because each employee record has consistent fields.
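Loaded with yaml.safe_load, this file becomes an ordinary Python dictionary whose employees key holds a list of dictionaries, which is exactly the shape the scripts below expect. A quick way to confirm that, assuming the file is saved as employees.yaml:

import yaml

with open('employees.yaml', 'r', encoding='utf-8') as file:
    data = yaml.safe_load(file)

print(type(data))            # <class 'dict'>
print(data['employees'][0])  # {'name': 'Sarah Johnson', 'age': 28, 'department': 'Engineering', 'salary': 85000}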
Basic YAML to CSV Conversion
Let’s start with a straightforward conversion script that handles the most common use case: a YAML file containing a list of dictionaries.
import yaml
import csv

def yaml_to_csv(yaml_file, csv_file):
    # Read YAML file
    with open(yaml_file, 'r') as file:
        data = yaml.safe_load(file)

    # Extract the list of records
    # Adjust 'employees' to match your YAML key
    records = data['employees']

    # Get column headers from the first record
    headers = records[0].keys()

    # Write to CSV
    with open(csv_file, 'w', newline='', encoding='utf-8') as file:
        writer = csv.DictWriter(file, fieldnames=headers)
        writer.writeheader()
        writer.writerows(records)

    print(f"Successfully converted {yaml_file} to {csv_file}")

# Usage
yaml_to_csv('employees.yaml', 'employees.csv')

This code reads the YAML file, extracts the list of employee records, and writes them to a CSV file with proper headers. The DictWriter class handles the conversion automatically.
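With the sample employees.yaml above, the generated employees.csv should look roughly like this (column order follows the keys of the first record):

name,age,department,salary
Sarah Johnson,28,Engineering,85000
Mike Chen,34,Marketing,72000
Emily Rodriguez,31,Sales,68000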
Handling Nested YAML Structures
Real-world YAML files often contain nested data. Here’s how to flatten nested structures for CSV export:
import yaml
import csv

def flatten_dict(nested_dict, parent_key='', sep='_'):
    """Flatten a nested dictionary"""
    items = []
    for key, value in nested_dict.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            items.extend(flatten_dict(value, new_key, sep=sep).items())
        elif isinstance(value, list):
            # Convert lists to comma-separated strings
            items.append((new_key, ', '.join(map(str, value))))
        else:
            items.append((new_key, value))
    return dict(items)

def yaml_to_csv_nested(yaml_file, csv_file):
    with open(yaml_file, 'r') as file:
        data = yaml.safe_load(file)

    # Handle different YAML structures
    if isinstance(data, list):
        records = data
    elif isinstance(data, dict):
        # Extract the list from the dictionary
        records = next(iter(data.values()))
    else:
        raise ValueError("Unsupported YAML structure")

    # Flatten each record
    flattened_records = [flatten_dict(record) for record in records]

    # Collect all possible headers
    all_headers = set()
    for record in flattened_records:
        all_headers.update(record.keys())
    headers = sorted(all_headers)

    # Write to CSV
    with open(csv_file, 'w', newline='', encoding='utf-8') as file:
        writer = csv.DictWriter(file, fieldnames=headers)
        writer.writeheader()
        writer.writerows(flattened_records)

    print(f"Conversion complete: {csv_file}")

# Usage
yaml_to_csv_nested('nested_data.yaml', 'flattened_output.csv')

This script flattens nested dictionaries by combining parent and child keys with underscores. For example, address.city becomes address_city in the CSV.
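To see the flattening in action, here's a quick check on a hypothetical record with a nested address block and a list of skills (the field names are only for illustration):

record = {
    'name': 'Sarah Johnson',
    'address': {'city': 'Austin', 'zip': '78701'},
    'skills': ['Python', 'SQL']
}

print(flatten_dict(record))
# {'name': 'Sarah Johnson', 'address_city': 'Austin', 'address_zip': '78701', 'skills': 'Python, SQL'}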
Converting Multiple YAML Files
When you have multiple YAML files in a directory, batch processing saves time:
import yaml
import csv
from pathlib import Path

def batch_yaml_to_csv(input_folder, output_folder):
    # Create output folder if it doesn't exist
    Path(output_folder).mkdir(parents=True, exist_ok=True)

    # Process all YAML files
    yaml_files = Path(input_folder).glob('*.yaml')

    for yaml_file in yaml_files:
        try:
            with open(yaml_file, 'r') as file:
                data = yaml.safe_load(file)

            # Determine data structure
            if isinstance(data, list):
                records = data
            elif isinstance(data, dict):
                records = next(iter(data.values()))
            else:
                print(f"Skipping {yaml_file.name}: unsupported structure")
                continue

            # Generate CSV filename
            csv_filename = yaml_file.stem + '.csv'
            csv_path = Path(output_folder) / csv_filename

            # Get headers
            if records:
                headers = records[0].keys()

                # Write CSV
                with open(csv_path, 'w', newline='', encoding='utf-8') as file:
                    writer = csv.DictWriter(file, fieldnames=headers)
                    writer.writeheader()
                    writer.writerows(records)

                print(f"Converted: {yaml_file.name} → {csv_filename}")
            else:
                print(f"Skipping {yaml_file.name}: no records found")

        except Exception as e:
            print(f"Error processing {yaml_file.name}: {str(e)}")

# Usage
batch_yaml_to_csv('yaml_files/', 'csv_output/')

This script processes every YAML file in a directory and creates corresponding CSV files in the output folder.
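One caveat: glob('*.yaml') only matches the .yaml extension. If your folder mixes .yaml and .yml files, a simple tweak (a sketch, not part of the original script) is to combine both patterns before the loop:

yaml_files = list(Path(input_folder).glob('*.yaml')) + list(Path(input_folder).glob('*.yml'))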
Handling Missing Fields
YAML records don’t always have consistent fields. Some entries might be missing certain keys. Here’s a robust approach:
import yaml
import csv
from collections import OrderedDict

def yaml_to_csv_with_missing_fields(yaml_file, csv_file):
    with open(yaml_file, 'r') as file:
        data = yaml.safe_load(file)

    # Extract records
    records = data if isinstance(data, list) else next(iter(data.values()))

    # Collect all unique fields across all records
    all_fields = set()
    for record in records:
        all_fields.update(record.keys())

    # Sort fields for consistent column order
    headers = sorted(all_fields)

    # Fill missing fields with empty strings
    normalized_records = []
    for record in records:
        normalized_record = OrderedDict()
        for header in headers:
            normalized_record[header] = record.get(header, '')
        normalized_records.append(normalized_record)

    # Write to CSV
    with open(csv_file, 'w', newline='', encoding='utf-8') as file:
        writer = csv.DictWriter(file, fieldnames=headers)
        writer.writeheader()
        writer.writerows(normalized_records)

    print(f"Converted {len(normalized_records)} records to {csv_file}")

# Usage
yaml_to_csv_with_missing_fields('incomplete_data.yaml', 'complete_output.csv')

This code scans all records first to identify every possible field, then fills missing values with empty strings to maintain CSV structure integrity.
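As a shorter alternative, csv.DictWriter can do the filling for you: its restval parameter supplies the value written for any column missing from a row, so the normalization loop becomes optional. A minimal sketch using the same headers and records:

with open(csv_file, 'w', newline='', encoding='utf-8') as file:
    writer = csv.DictWriter(file, fieldnames=headers, restval='')
    writer.writeheader()
    writer.writerows(records)  # missing keys are written as empty strings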
Adding Custom Formatting and Validation
Sometimes you need to transform data during conversion. Here’s how to add custom formatting:
import yaml
import csv
from datetime import datetime

def yaml_to_csv_with_formatting(yaml_file, csv_file, formatters=None):
    with open(yaml_file, 'r') as file:
        data = yaml.safe_load(file)

    records = data if isinstance(data, list) else next(iter(data.values()))

    if not records:
        print("No records found")
        return

    headers = records[0].keys()

    # Apply custom formatters
    if formatters:
        formatted_records = []
        for record in records:
            formatted_record = {}
            for key, value in record.items():
                if key in formatters:
                    formatted_record[key] = formatters[key](value)
                else:
                    formatted_record[key] = value
            formatted_records.append(formatted_record)
        records = formatted_records

    # Write to CSV
    with open(csv_file, 'w', newline='', encoding='utf-8') as file:
        writer = csv.DictWriter(file, fieldnames=headers)
        writer.writeheader()
        writer.writerows(records)

    print(f"Formatted and converted to {csv_file}")

# Custom formatters
formatters = {
    'salary': lambda x: f"${x:,.2f}",
    # assumes hire_date is stored as a quoted string like '2021-03-15'
    'hire_date': lambda x: datetime.strptime(x, '%Y-%m-%d').strftime('%m/%d/%Y'),
    'active': lambda x: 'Yes' if x else 'No'
}

# Usage
yaml_to_csv_with_formatting('employees.yaml', 'formatted_employees.csv', formatters)

This approach lets you apply custom transformations to specific fields, such as currency formatting or date conversions.
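Note that these particular formatters only fire if the records contain salary, hire_date, and active fields; the earlier employees.yaml sample has no hire_date or active, so picture a record shaped roughly like this hypothetical example:

record = {
    'name': 'Sarah Johnson',
    'salary': 85000,
    'hire_date': '2021-03-15',   # quoted in the YAML so it loads as a string
    'active': True
}

# After applying the formatters above, the row written to CSV would be:
# {'name': 'Sarah Johnson', 'salary': '$85,000.00', 'hire_date': '03/15/2021', 'active': 'Yes'}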
Error Handling and Logging
Production scripts need proper error handling. Here’s a complete implementation:
import yaml
import csv
import logging
from pathlib import Path

# Setup logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

def safe_yaml_to_csv(yaml_file, csv_file):
    try:
        # Validate input file exists
        yaml_path = Path(yaml_file)
        if not yaml_path.exists():
            logger.error(f"YAML file not found: {yaml_file}")
            return False

        # Read YAML
        with open(yaml_file, 'r', encoding='utf-8') as file:
            data = yaml.safe_load(file)

        if data is None:
            logger.error(f"Empty YAML file: {yaml_file}")
            return False

        # Extract records
        if isinstance(data, list):
            records = data
        elif isinstance(data, dict):
            if not data:
                logger.error(f"Empty dictionary in YAML: {yaml_file}")
                return False
            records = next(iter(data.values()))
        else:
            logger.error(f"Unsupported YAML structure: {yaml_file}")
            return False

        if not records:
            logger.warning(f"No records found in: {yaml_file}")
            return False

        # Validate records are dictionaries
        if not all(isinstance(record, dict) for record in records):
            logger.error(f"Invalid record format in: {yaml_file}")
            return False

        # Get all headers
        all_headers = set()
        for record in records:
            all_headers.update(record.keys())
        headers = sorted(all_headers)

        # Write CSV
        with open(csv_file, 'w', newline='', encoding='utf-8') as file:
            writer = csv.DictWriter(file, fieldnames=headers, extrasaction='ignore')
            writer.writeheader()
            for idx, record in enumerate(records, 1):
                try:
                    writer.writerow(record)
                except Exception as e:
                    logger.error(f"Error writing record {idx}: {str(e)}")

        logger.info(f"Successfully converted {yaml_file} to {csv_file}")
        logger.info(f"Total records: {len(records)}, Columns: {len(headers)}")
        return True

    except yaml.YAMLError as e:
        logger.error(f"YAML parsing error in {yaml_file}: {str(e)}")
        return False
    except IOError as e:
        logger.error(f"File I/O error: {str(e)}")
        return False
    except Exception as e:
        logger.error(f"Unexpected error: {str(e)}")
        return False

# Usage
if __name__ == "__main__":
    success = safe_yaml_to_csv('data.yaml', 'output.csv')
    if success:
        print("Conversion completed successfully")
    else:
        print("Conversion failed - check logs for details")

This version includes comprehensive error handling, logging, and validation to handle real-world scenarios gracefully.
Performance Optimization for Large Files
When dealing with large YAML files, memory usage and runtime feedback become a concern. The approach below writes the CSV in chunks and reports progress as it goes:
import yaml
import csv

def yaml_to_csv_optimized(yaml_file, csv_file, chunk_size=1000):
    """
    Process large YAML files by writing the CSV output in chunks
    """
    with open(yaml_file, 'r') as file:
        data = yaml.safe_load(file)

    # Extract records
    records = data if isinstance(data, list) else next(iter(data.values()))

    if not records:
        print("No records to process")
        return

    # Get headers from first record
    headers = list(records[0].keys())

    # Write CSV in chunks
    with open(csv_file, 'w', newline='', encoding='utf-8') as file:
        writer = csv.DictWriter(file, fieldnames=headers)
        writer.writeheader()

        # Process records in chunks
        for i in range(0, len(records), chunk_size):
            chunk = records[i:i + chunk_size]
            writer.writerows(chunk)

            if (i + chunk_size) % 10000 == 0:
                print(f"Processed {min(i + chunk_size, len(records))} records...")

    print(f"Completed: {len(records)} records written to {csv_file}")

# Usage for large files
yaml_to_csv_optimized('large_dataset.yaml', 'large_output.csv')

Writing in chunks keeps the output steady and gives you progress feedback on files with thousands of records. Keep in mind that yaml.safe_load still parses the entire YAML file into memory first; for files that genuinely don't fit in RAM, see the FAQ at the end of this article.
Command-Line Tool
Turn your converter into a command-line utility for easy reuse:
import yaml
import csv
import argparse
import sys
from pathlib import Path

def convert_yaml_to_csv(yaml_file, csv_file, flatten=False):
    try:
        with open(yaml_file, 'r', encoding='utf-8') as file:
            data = yaml.safe_load(file)

        # Handle different structures
        if isinstance(data, list):
            records = data
        elif isinstance(data, dict):
            records = next(iter(data.values()))
        else:
            print("Error: Unsupported YAML structure", file=sys.stderr)
            return False

        if not records:
            print("Error: No records found", file=sys.stderr)
            return False

        # Flatten if requested
        if flatten:
            records = [flatten_dict(record) for record in records]

        # Get headers
        all_headers = set()
        for record in records:
            all_headers.update(record.keys())
        headers = sorted(all_headers)

        # Write CSV
        with open(csv_file, 'w', newline='', encoding='utf-8') as file:
            writer = csv.DictWriter(file, fieldnames=headers)
            writer.writeheader()
            writer.writerows(records)

        print(f"Success: Converted {len(records)} records")
        return True

    except Exception as e:
        print(f"Error: {str(e)}", file=sys.stderr)
        return False

def flatten_dict(d, parent_key='', sep='_'):
    items = []
    for k, v in d.items():
        new_key = f"{parent_key}{sep}{k}" if parent_key else k
        if isinstance(v, dict):
            items.extend(flatten_dict(v, new_key, sep=sep).items())
        elif isinstance(v, list):
            items.append((new_key, ', '.join(map(str, v))))
        else:
            items.append((new_key, v))
    return dict(items)

def main():
    parser = argparse.ArgumentParser(
        description='Convert YAML files to CSV format'
    )
    parser.add_argument('yaml_file', help='Input YAML file path')
    parser.add_argument('csv_file', help='Output CSV file path')
    parser.add_argument(
        '-f', '--flatten',
        action='store_true',
        help='Flatten nested structures'
    )
    args = parser.parse_args()

    # Validate input file
    if not Path(args.yaml_file).exists():
        print(f"Error: File not found: {args.yaml_file}", file=sys.stderr)
        sys.exit(1)

    # Convert
    success = convert_yaml_to_csv(args.yaml_file, args.csv_file, args.flatten)
    sys.exit(0 if success else 1)

if __name__ == '__main__':
    main()

Save this as yaml2csv.py and use it from the terminal:
python yaml2csv.py input.yaml output.csv
python yaml2csv.py input.yaml output.csv --flatten
Common Issues and Solutions
Issue: Unicode Characters Not Displaying Correctly
Always specify UTF-8 encoding when opening files:
with open(yaml_file, 'r', encoding='utf-8') as file:
    data = yaml.safe_load(file)

Issue: YAML Contains Null Values
Handle null values explicitly:
for record in records:
    for key, value in record.items():
        if value is None:
            record[key] = ''  # or 'N/A' or any default value

Issue: Column Order Changes Between Runs
Use sorted() or maintain a specific order:
# For consistent ordering
headers = sorted(records[0].keys())

# Or define explicit order
preferred_order = ['id', 'name', 'email', 'department']
headers = [h for h in preferred_order if h in records[0].keys()]
Issue: Special Characters in CSV
The csv module handles most special characters automatically, but for complete safety:
writer = csv.DictWriter(
    file,
    fieldnames=headers,
    quoting=csv.QUOTE_NONNUMERIC  # Quotes all non-numeric fields
)

Testing Your Conversion
Always test your conversion with sample data first:
import yaml
import csv
import tempfile
import os

def test_conversion():
    # Create test YAML data
    test_data = {
        'users': [
            {'id': 1, 'name': 'Alice', 'email': 'alice@example.com'},
            {'id': 2, 'name': 'Bob', 'email': 'bob@example.com'}
        ]
    }

    # Create temporary files
    with tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False) as yaml_file:
        yaml.dump(test_data, yaml_file)
        yaml_path = yaml_file.name

    csv_path = yaml_path.replace('.yaml', '.csv')

    try:
        # Run conversion (the basic yaml_to_csv from earlier,
        # with its record key adjusted from 'employees' to 'users')
        yaml_to_csv(yaml_path, csv_path)

        # Verify CSV content
        with open(csv_path, 'r') as file:
            reader = csv.DictReader(file)
            rows = list(reader)

        assert len(rows) == 2, "Expected 2 rows"
        assert rows[0]['name'] == 'Alice', "First row name mismatch"
        assert rows[1]['id'] == '2', "Second row ID mismatch"
        print("Test passed!")
    finally:
        # Clean up
        os.unlink(yaml_path)
        if os.path.exists(csv_path):
            os.unlink(csv_path)

# Run test
test_conversion()

Frequently Asked Questions
How do I handle YAML files with multiple documents?
YAML files can contain multiple documents separated by ---. Use yaml.safe_load_all() instead:
with open('multi_doc.yaml', 'r') as file:
    # safe_load_all is lazy, so iterate while the file is still open
    documents = yaml.safe_load_all(file)

    all_records = []
    for doc in documents:
        if isinstance(doc, list):
            all_records.extend(doc)
        elif isinstance(doc, dict):
            all_records.extend(next(iter(doc.values())))

Can I preserve data types in the CSV output?
CSV files store everything as text. If you need to preserve types, consider using pandas instead:
import pandas as pd
import yaml

with open('data.yaml', 'r') as file:
    data = yaml.safe_load(file)

df = pd.DataFrame(data['records'])
df.to_csv('output.csv', index=False)

What if my YAML has arrays within records?
Convert arrays to comma-separated strings:
def prepare_record(record):
    prepared = {}
    for key, value in record.items():
        if isinstance(value, list):
            prepared[key] = ', '.join(str(v) for v in value)
        else:
            prepared[key] = value
    return prepared

How do I handle very large YAML files that don’t fit in memory?
For files larger than available RAM, you’ll need streaming YAML parsers. However, PyYAML loads the entire file into memory. Consider processing the file in parts or using alternative tools like yq for extremely large files.
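One partial exception: if the oversized file is split into multiple YAML documents separated by ---, yaml.safe_load_all parses them lazily, so you can convert one document at a time without holding the whole file in memory. A rough sketch, assuming each document is a list of flat records and using hypothetical names like stream_multi_doc_to_csv and huge_multi_doc.yaml:

import yaml
import csv

def stream_multi_doc_to_csv(yaml_file, csv_file, fieldnames):
    # safe_load_all yields one parsed document at a time, so only the
    # current document has to fit in memory
    with open(yaml_file, 'r', encoding='utf-8') as yf, \
         open(csv_file, 'w', newline='', encoding='utf-8') as cf:
        writer = csv.DictWriter(cf, fieldnames=fieldnames, restval='', extrasaction='ignore')
        writer.writeheader()
        for doc in yaml.safe_load_all(yf):
            if isinstance(doc, list):
                writer.writerows(doc)

# The header is written first, so fieldnames must be known up front
stream_multi_doc_to_csv('huge_multi_doc.yaml', 'streamed.csv', ['id', 'name', 'email'])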
Is PyYAML safe for untrusted input?
Always use yaml.safe_load() instead of yaml.load(). The safe_load function only constructs simple Python objects and is secure against arbitrary code execution.
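If you prefer the explicit form, passing SafeLoader to yaml.load gives the same behavior; a quick illustration:

import yaml

text = "name: Sarah Johnson"
print(yaml.safe_load(text))                     # {'name': 'Sarah Johnson'}
print(yaml.load(text, Loader=yaml.SafeLoader))  # explicit equivalent, same result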
Can I convert CSV back to YAML?
Yes, the reverse conversion is straightforward:
import csv
import yaml

def csv_to_yaml(csv_file, yaml_file):
    with open(csv_file, 'r') as file:
        reader = csv.DictReader(file)
        records = list(reader)

    with open(yaml_file, 'w') as file:
        yaml.dump({'records': records}, file, default_flow_style=False)

How do I handle datetime objects in YAML?
PyYAML automatically parses unquoted ISO 8601 timestamps into datetime objects. Convert them back to strings for CSV:
from datetime import datetime

def serialize_datetime(value):
    if isinstance(value, datetime):
        return value.strftime('%Y-%m-%d %H:%M:%S')
    return value

What’s the difference between yaml.safe_load and yaml.load?
yaml.safe_load() only constructs basic Python objects (strings, lists, dicts, numbers) and is safe for untrusted input. yaml.load() with an unsafe loader can build arbitrary Python objects, which can lead to code execution, so it should never be used with untrusted data.
Conclusion
Converting YAML to CSV with Python and pyyaml is straightforward once you understand the basic patterns. Start with the simple conversion script for flat data structures, then adapt it for nested data, missing fields, or batch processing as needed.
The key points to remember: always use yaml.safe_load() for security, handle missing fields gracefully, specify UTF-8 encoding, and add proper error handling for production use. Test your conversion with sample data before processing critical files.
Whether you’re building a one-time migration script or a production data pipeline, these code examples give you a solid foundation. Save the command-line tool for quick conversions, and customize the formatting functions when you need specific data transformations.
The techniques in this guide work with Python 3.6 and above, and they scale from small configuration files to datasets with thousands of records. You now have the tools to handle any YAML-to-CSV conversion task that comes your way.