`read_nsrdb_psm4`: parse header with the `csv` module to keep quoted commas by gaoflow · Pull Request #2771 · pvlib/pvlib-python

gaoflow · 2026-06-02T03:04:08Z

Closes Issue with importing NSRDB spectral-on-demand files with pvlib.iotools.read_nsrdb_psm4 #2736
I am familiar with the contributing guidelines
Tests added
Updates entries in docs/sphinx/source/reference for API changes. (n/a — no API change)
Adds description and name entries in the appropriate "what's new" file in docs/sphinx/source/whatsnew for all changes.
New code is fully documented. (behavior unchanged for existing files; no public-API docstring change)
Pull request is nearly complete and ready for detailed review.

What this fixes

read_nsrdb_psm4 parsed its three header lines with a naive str.split(','):

metadata_fields = fbuf.readline().split(',')
metadata_values = fbuf.readline().split(',')
columns        = fbuf.readline().split(',')

The NSRDB spectral-on-demand CSVs reported in #2736 have quoted column
names that contain commas, e.g.

..., "GaAs (Bauhuis et al., 2009)","InGaP (Gray, 2008)", ...

These are valid CSV (the commas are inside quotes), and pandas.read_csv
parses the data rows correctly. However, str.split(',') splits each quoted
name into multiple fragments, inflating the column count. The mismatch between
the mis-split names/usecols and the correctly parsed data then raises a
ParserError on read.
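To make the column-count mismatch concrete, here is a minimal standalone sketch using only the standard library. The header and data lines are taken from the reproduction in this PR; the variable names are illustrative and do not come from the pvlib source.

```python
import csv

# Column-name line and one data row, modeled on the reproduction in this PR.
header_line = (
    'Year,Month,Day,Hour,Minute,GHI,"GaAs (Bauhuis et al., 2009)",'
    '"InGaP (Gray, 2008)"\n'
)
data_line = "2023,1,1,0,0,0,0.1,0.2\n"

# Naive split: every comma is a separator, including those inside quotes.
naive_columns = header_line.split(',')

# csv.reader honors quoting; it accepts any iterable of lines.
csv_columns = next(csv.reader([header_line]))

print(len(naive_columns))          # 10 names from the naive split
print(len(csv_columns))            # 8 names from the csv module
print(len(data_line.split(',')))   # 8 values in the data row
print(csv_columns[-2:])
```

The naive split reports two more names than the data row has values, which is the kind of mismatch that makes the subsequent read fail.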

The change

Parse the three header lines with the csv module (which honors quoting)
instead of str.split(','). For ordinary (unquoted) files this yields the same
fields as before, so files that already read correctly are unaffected.
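The approach can be sketched roughly as follows. This is an illustration under stated assumptions, not the code merged in this PR: parse_header_lines is a hypothetical helper name, and the three-line sample file is a shortened stand-in for a real NSRDB header.

```python
import csv
from io import StringIO


def parse_header_lines(fbuf, n_lines=3):
    """Hypothetical helper: read n_lines from fbuf, parsing each as one CSV row."""
    rows = []
    for _ in range(n_lines):
        line = fbuf.readline()
        # csv.reader takes an iterable of lines; a one-element list parses
        # only the line just read and leaves the rest of fbuf untouched.
        rows.append(next(csv.reader([line])))
    return rows


# Shortened, unquoted stand-in for an ordinary file (assumed for illustration).
unquoted = StringIO(
    "Source,Location ID,Latitude\n"
    "NSRDB,1,40.0\n"
    "Year,Month,GHI\n"
    "2023,1,0\n"
)
metadata_fields, metadata_values, columns = parse_header_lines(unquoted)

# For unquoted lines the result matches a plain split once the trailing
# newline is stripped, so ordinary files see the same fields.
assert columns == "Year,Month,GHI\n".rstrip("\n").split(",")

# The buffer is still positioned at the first data row.
remaining = unquoted.read()
```

Reading with readline and wrapping each line in a one-element list keeps the file position predictable, so whatever reads the data rows afterwards starts at the right place.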

This addresses the parsing crash that @kandersolar confirmed should be
supported. The further map_variables=True unit handling for spectral files
(W/m²/µm → W/m²/nm) mentioned in the issue is a separate enhancement and is
left out of scope here.

Reproduction (before this PR)

from io import StringIO
from pvlib.iotools import psm4

content = (
    "Source,Location ID,City,State,Country,Latitude,Longitude,Time Zone,"
    "Elevation,Local Time Zone,Version\n"
    "NSRDB,1,-,-,-,40.0,-105.0,-7,1600,-7,4.0.1\n"
    'Year,Month,Day,Hour,Minute,GHI,"GaAs (Bauhuis et al., 2009)",'
    '"InGaP (Gray, 2008)"\n'
    "2023,1,1,0,0,0,0.1,0.2\n"
    "2023,1,1,1,0,5,0.3,0.4\n"
)
psm4.read_nsrdb_psm4(StringIO(content), map_variables=False)
# ParserError: Too many columns specified: expected 10 and found 8

After the fix, the quoted column names survive intact
('GaAs (Bauhuis et al., 2009)', 'InGaP (Gray, 2008)').

A regression test (test_read_nsrdb_psm4_quoted_columns_with_commas) is added
that fails on main and passes with this change; the existing
read_nsrdb_psm4 tests continue to pass.

read_nsrdb_psm4 split the three header lines with a naive str.split(','), which broke spectral-on-demand files whose column names are quoted fields containing commas (e.g. '"GaAs (Bauhuis et al., 2009)"'). Such names were split into spurious columns, raising on read. Parse the header lines with the csv module so quoted fields are kept intact. Fixes pvlib#2736

kandersolar

Nice PR, thanks @gaoflow

gaoflow force-pushed the fix-2736-nsrdb-psm4-quoted-columns branch from e79f87e to 88f18aa Compare June 8, 2026 13:43

kandersolar added bug io labels Jun 9, 2026

kandersolar added this to the v0.15.2 milestone Jun 9, 2026

kandersolar added the remote-data triggers --remote-data pytests label Jun 9, 2026

kandersolar approved these changes Jun 9, 2026


kandersolar changed the title ~~Parse read_nsrdb_psm4 header with the csv module to keep quoted commas (fixes #2736)~~ read_nsrdb_psm4: parse header with the csv module to keep quoted commas Jun 9, 2026

kandersolar merged commit 044ad59 into pvlib:main Jun 9, 2026
37 of 39 checks passed
