Skip to content

Adding support for .jpk-qi-data and .bin files and moving to dataclasses for the returned loaded data#190

Open
ahobbs7 wants to merge 58 commits into
mainfrom
ahobbs7/jpk-qi-data
Open

Adding support for .jpk-qi-data and .bin files and moving to dataclasses for the returned loaded data#190
ahobbs7 wants to merge 58 commits into
mainfrom
ahobbs7/jpk-qi-data

Conversation

@ahobbs7

@ahobbs7 ahobbs7 commented May 3, 2026

Copy link
Copy Markdown
Collaborator

Adds support for .jpk-qi-data files, allowing the loading of the image for the selected channel. It also allows for the raw force-distance curve data as well as metadata for each curve to be loaded efficiently, using a lazy data structure approach that loads each section of the dataset into memory only when it is required.
The updates also allow for the conversion of .jpk-qi-data files into a HDF5 file format (.h5-jpk) for much faster mass analysis. Hence, the h5-jpk loading code has also been updated to support the loading of these converted files.

The update also allows .bin with different binary formats to be loaded.

closes #174 closes #173

…files and removing the ability to run curve analysis directly in the reader
…g to memory then saving in one go when duplicating data to h5
…sampled curves then are streamed directly into h5
@codecov-commenter

codecov-commenter commented May 4, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 27.75281% with 643 lines in your changes missing coverage. Please review.
✅ Project coverage is 53.12%. Comparing base (d65d297) to head (7b2623d).
⚠️ Report is 11 commits behind head on main.

Files with missing lines Patch % Lines
AFMReader/jpk_qi.py 10.07% 473 Missing ⚠️
AFMReader/h5_jpk.py 22.88% 91 Missing ⚠️
AFMReader/data_classes.py 44.06% 33 Missing ⚠️
AFMReader/raw_bin.py 23.52% 26 Missing ⚠️
AFMReader/general_loader.py 78.43% 11 Missing ⚠️
AFMReader/asd.py 78.57% 3 Missing ⚠️
AFMReader/jpk.py 89.65% 3 Missing ⚠️
AFMReader/spm.py 81.25% 3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main     #190       +/-   ##
===========================================
- Coverage   79.84%   53.12%   -26.73%     
===========================================
  Files          12       15        +3     
  Lines         928     1775      +847     
===========================================
+ Hits          741      943      +202     
- Misses        187      832      +645     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ahobbs7 ahobbs7 marked this pull request as ready for review May 5, 2026 07:54
@ahobbs7 ahobbs7 requested a review from SylviaWhittle May 5, 2026 07:54

@SylviaWhittle SylviaWhittle left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small partial review.

I'd like to call with you to talk about the interesting lazy loading of data if you have time?

Comment thread AFMReader/asd.py
return frames, pixel_to_nanometre_scaling_factor, header_dict


def get_asd_channels(file_path: Path):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have time to check this manually - does it work? There isn't a test but frankly we don't have the dev time to move slowly. If you say it works, this is fine with me.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The channel fetching seems to work though I'm happy to quickly make some tests for them as that should be pretty quick.

Comment thread AFMReader/general_loader.py Outdated
elif len(h5_returned) == 4:
image, pixel_to_nanometre_scaling_factor, _, curve_data = h5_returned # type: ignore[misc]
self.loaded_curves = True
print(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
print(
logger.info(

or just remove it.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll just remove them, they are some accidentally left in prints for debugging.

Comment thread AFMReader/general_loader.py Outdated
f"Loaded image with shape {image.shape} and pixel to nanometre "
f"scaling factor {pixel_to_nanometre_scaling_factor}"
)
print(f"Image has max value {image.max()} and min value {image.min()}")

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto :)

Comment thread AFMReader/general_loader.py Outdated
image, pixel_to_nanometre_scaling_factor = spm.load_spm(self.filepath, self.channel)
elif self.suffix == ".h5-jpk":
image, pixel_to_nanometre_scaling_factor, _ = h5_jpk.load_h5jpk(self.filepath, self.channel)
h5_returned = h5_jpk.load_h5jpk(self.filepath, self.channel, load_curves=not self.loaded_curves)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this entire block be absorbed into the h5_jpk.load_h5jpk function call? It's a little messy here.

I'll have a go too.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, is definitely a bit messy. My thinking was preventing the need to make changes to topostats by not changing what each load function returns if it isn't reading the curve data I'm adding support for. Though I think h5jpk is not used in TopoStats anyway? And it might be necessary to make changes to topostats reading anyway due to the changes I have been working on for returning the z unit for the read files?

raise e

# scope for a "check what channels are available" function similar to above.
def get_available_channels(self): # noqa: C901

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guessing this is for the napari feature to list the channels?

Well done if this is all working, that's a lot of work.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes!

Comment thread AFMReader/lazy_data_classes.py Outdated
A proxy object for the specified row.
"""

class RowProxy:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the reason RowProxy is defined within the scope of LazyQiData encapsulation? It'll get redefined each time __getitem__ runs, though this is likely a negligible and unimportant cost.

Comment thread AFMReader/lazy_data_classes.py Outdated
self.parent = parent
self.y = y

def __getitem__(self, x: int):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to need a bit of an explanation on how this is intended to work - could we set up a call to talk about it? There's a lot going on here. Plus, frankly I'm just curious.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, definitely!

Comment thread AFMReader/logging.py
# Set the format to have blue time, green file, module, function and line, and white message
logger.add(
sys.stderr,
lambda msg: sys.stderr.write(msg), # pylint: disable=unnecessary-lambda

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What necessitated this change?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the general_loader and some of the spm tests which required were failing due to not being able to correctly reading the logged output (instead always reading an empty string), they seem to be failing when I run even the main branch locally for that reason. The lamda appears to be necessary due to a quirk of loguru where lamda makes sys.stderr be dynamically retrieved at the time of logging rather than once at import time.

Comment thread AFMReader/jpk_qi.py Outdated
return final_multiplier, final_offset, unit


class jpk_qi_loader:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capitalise this please :)

Comment thread AFMReader/jpk_qi.py Outdated
# Open the ZIP archive once and keep it open for the duration of the loading process
self.qi_archive = zipfile.ZipFile(self.filepath, "r") # pylint: disable=consider-using-with
logger.info(f"Opened JPK QI archive at {self.filepath}")
self.namelist = self.qi_archive.namelist()

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe call this list_of_all_paths or something - since namelist could be anything? Just so future people (including ourselves) can tell at a glance what this is.

Maybe also add a comment above as to what it's used for? :)

Comment thread AFMReader/jpk_qi.py Outdated
self.qi_archive = zipfile.ZipFile(self.filepath, "r") # pylint: disable=consider-using-with
logger.info(f"Opened JPK QI archive at {self.filepath}")
self.namelist = self.qi_archive.namelist()
# Set path to the .jpk-qi-image file within the archive for later use

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Set path to the .jpk-qi-image file within the archive for later use
# For holding the reference to where the actual .jqk-qi image is (not the metadata).

Comment thread AFMReader/jpk_qi.py Outdated
if self.save_as_h5:
self.save_lite_data()

return (self.image, self.px2nm, (self.curve_data, self.channels_units, self.full_metadata))

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly bundle these objects into one metadata dataclass, JPKQiCurveData? Gentle suggestion.

@ahobbs7 ahobbs7 requested a review from SylviaWhittle June 1, 2026 14:55
…adata to align with potential differences in channels between volumes
@ahobbs7 ahobbs7 changed the title Adding support for .jpk-qi-data and .bin files Adding support for .jpk-qi-data and .bin files and moving to dataclasses for the returned loaded data Jun 5, 2026
Comment thread AFMReader/data_classes.py
"""
self.shape_x = shape_x
self.shape_y = shape_y
self.dims = (shape_y, shape_x)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be shape instead of dims?

Comment thread AFMReader/data_classes.py
int
The total number of pixels in the image.
"""
return self.shape_x * self.shape_y

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the shape returned should be (x, y) rather than the absolute number of entries, since they are indexed as (x, y) rather than just the "flat" index?

Comment thread AFMReader/jpk_qi.py Outdated
curve_num = y * self.shape_x + x
curve_data: dict[str, Any] = {}

for chan_name, scale in self.channel_scaling.items():

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe for these short names, use the full name, ie "channel_name"? Just in case it could be misinterpreted.

Comment thread AFMReader/jpk_qi.py
cumulative_multiplier = 1.0
cumulative_offset = 0.0
unit = props.get(f"{prefix}conversion-set.conversion.{current_slot}.scaling.unit.unit")

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment about what slot is would be good here. :)

Comment thread AFMReader/jpk_qi.py Outdated
# For holding the reference to where the actual .jqk-qi image is (not the metadata).
self.path_to_image = None

# Chunk size for H5 datasets

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A brief explanation on why chunks matter, for future devs here?

Comment thread AFMReader/jpk_qi.py
i += 1

logger.info(f"Loading JPK QI data from {self.filepath} with channel {self.channel}")
self.extract_global_metadata()

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this return the value and set it here instead? Just a little clearer - feel free to override this suggestion.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noticed that the extract global metadata also defines some other instance attributes as well as the main global metadata attribute so I think going to leave as is to keep clean

Comment thread AFMReader/jpk_qi.py Outdated
# If there are no failed loads, log that all data was loaded successfully
logger.info("Successfully loaded all curve data without any missing files.")

def extract_data_to_h5(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename perhaps?

Comment thread AFMReader/jpk_qi.py
"""
Predict the total number of points for each channel and segment.

This is done by sampling a subset of curves and extrapolating based on the maximum number

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps add that if this prediction is wrong, how the curve loading for the excess curve data will be slow due to resizing?

Comment thread AFMReader/jpk_qi.py Outdated
flip_image=bool(flip_image),
)

def save_lite_data(self):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mention without curves?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to extract_and_save_per_curve_data

Comment thread AFMReader/jpk_qi.py Outdated
h5_datasets_buffer[seg_name][chan["name"]] = {"Data": [], "Indices": []}
return global_meta_group, h5_datasets, h5_meta_datasets, h5_datasets_buffer, h5_meta_datasets_buffer

def get_saving_context(self):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale - been changed on another branch - update?

Comment thread AFMReader/jpk_qi.py
# Lookup map for binary scaling
self.channel_scaling = {chan["name"]: chan for chan in self.segment_channels}

def close(self):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[feature] : Read .jpk-qi-data files [Bug]: .jpk-qi-data files not loading

3 participants