Adding support for .jpk-qi-data and .bin files and moving to dataclasses for the returned loaded data by ahobbs7 · Pull Request #190 · AFM-SPM/AFMReader

ahobbs7 · 2026-05-03T16:58:46Z

Adds support for .jpk-qi-data files, allowing the image for the selected channel to be loaded. The raw force-distance curve data and the metadata for each curve can also be loaded efficiently through a lazy data structure that reads each section of the dataset into memory only when it is required.
The update also allows .jpk-qi-data files to be converted into an HDF5 file format (.h5-jpk) for much faster mass analysis. The h5-jpk loading code has been updated accordingly to support loading these converted files.

The update also allows .bin files with different binary formats to be loaded.

closes #174 closes #173

codecov-commenter · 2026-05-04T12:55:24Z

Codecov Report

❌ Patch coverage is 27.75281% with 643 lines in your changes missing coverage. Please review.
✅ Project coverage is 53.12%. Comparing base (d65d297) to head (7b2623d).
⚠️ Report is 11 commits behind head on main.

Files with missing lines	Patch %	Lines
AFMReader/jpk_qi.py	10.07%	473 Missing ⚠️
AFMReader/h5_jpk.py	22.88%	91 Missing ⚠️
AFMReader/data_classes.py	44.06%	33 Missing ⚠️
AFMReader/raw_bin.py	23.52%	26 Missing ⚠️
AFMReader/general_loader.py	78.43%	11 Missing ⚠️
AFMReader/asd.py	78.57%	3 Missing ⚠️
AFMReader/jpk.py	89.65%	3 Missing ⚠️
AFMReader/spm.py	81.25%	3 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##             main     #190       +/-   ##
===========================================
- Coverage   79.84%   53.12%   -26.73%     
===========================================
  Files          12       15        +3     
  Lines         928     1775      +847     
===========================================
+ Hits          741      943      +202     
- Misses        187      832      +645

SylviaWhittle

Small partial review.

I'd like to have a call with you to talk about the interesting lazy loading of data, if you have time?

SylviaWhittle · 2026-05-22T15:02:36Z

        return frames, pixel_to_nanometre_scaling_factor, header_dict


+def get_asd_channels(file_path: Path):


I don't have time to check this manually - does it work? There isn't a test but frankly we don't have the dev time to move slowly. If you say it works, this is fine with me.

The channel fetching seems to work, though I'm happy to add some tests for it, as that should be pretty quick.

SylviaWhittle · 2026-05-22T15:13:33Z

+                elif len(h5_returned) == 4:
+                    image, pixel_to_nanometre_scaling_factor, _, curve_data = h5_returned  # type: ignore[misc]
+                    self.loaded_curves = True
+                    print(


Suggested change

print(

logger.info(

or just remove it.

Yes, I'll just remove them; they are debugging prints that were left in by accident.

SylviaWhittle · 2026-05-22T15:13:43Z

+                        f"Loaded image with shape {image.shape} and pixel to nanometre "
+                        f"scaling factor {pixel_to_nanometre_scaling_factor}"
+                    )
+                    print(f"Image has max value {image.max()} and min value {image.min()}")


SylviaWhittle · 2026-05-22T15:14:32Z

                image, pixel_to_nanometre_scaling_factor = spm.load_spm(self.filepath, self.channel)
            elif self.suffix == ".h5-jpk":
-                image, pixel_to_nanometre_scaling_factor, _ = h5_jpk.load_h5jpk(self.filepath, self.channel)
+                h5_returned = h5_jpk.load_h5jpk(self.filepath, self.channel, load_curves=not self.loaded_curves)


Could this entire block be absorbed into the h5_jpk.load_h5jpk function call? It's a little messy here.

I'll have a go too.

Yes, it is definitely a bit messy. My thinking was to avoid the need for changes to TopoStats by not changing what each load function returns when it isn't reading the curve data I'm adding support for. Though I think h5-jpk is not used in TopoStats anyway? And it might be necessary to change the TopoStats reading anyway, given the changes I have been working on to return the z unit for the read files?

SylviaWhittle · 2026-05-22T15:15:20Z

+            raise e

-    # scope for a "check what channels are available" function similar to above.
+    def get_available_channels(self):  # noqa: C901


Guessing this is for the napari feature to list the channels?

Well done if this is all working, that's a lot of work.

SylviaWhittle · 2026-05-22T15:25:09Z

+            A proxy object for the specified row.
+        """
+
+        class RowProxy:


Is the reason RowProxy is defined within the scope of LazyQiData encapsulation? It'll get redefined each time __getitem__ runs, though this is likely a negligible and unimportant cost.
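
A minimal sketch of the pattern under discussion, assuming a lazily loaded grid indexed as `data[y][x]`. `LazyGrid`, `load_curve` and `fake_loader` are hypothetical names used for illustration, not the PR's actual classes. The row proxy is defined once at module level, so indexing never rebuilds the class, and each curve is loaded only on first access.

```python
from typing import Any, Callable


class RowProxy:
    """Row view defined once at module level, so indexing never rebuilds the class."""

    def __init__(self, parent: "LazyGrid", y: int) -> None:
        self.parent = parent
        self.y = y

    def __getitem__(self, x: int) -> Any:
        # Delegate to the parent, which loads (and caches) the curve on demand
        return self.parent.load_curve(self.y, x)


class LazyGrid:
    """Hypothetical stand-in for a lazily loaded 2D grid of curves."""

    def __init__(self, shape_y: int, shape_x: int, loader: Callable[[int], Any]) -> None:
        self.shape_y = shape_y
        self.shape_x = shape_x
        self._loader = loader
        self._cache: dict[int, Any] = {}

    def __getitem__(self, y: int) -> RowProxy:
        if not 0 <= y < self.shape_y:
            raise IndexError(f"row {y} out of range")
        return RowProxy(self, y)

    def load_curve(self, y: int, x: int) -> Any:
        if not 0 <= x < self.shape_x:
            raise IndexError(f"column {x} out of range")
        # Same row-major flat index as in the diff: y * shape_x + x
        curve_num = y * self.shape_x + x
        if curve_num not in self._cache:
            self._cache[curve_num] = self._loader(curve_num)
        return self._cache[curve_num]


# Record which curves are actually loaded
calls: list[int] = []


def fake_loader(curve_num: int) -> list[int]:
    calls.append(curve_num)
    return [curve_num] * 3


grid = LazyGrid(shape_y=2, shape_x=3, loader=fake_loader)
first = grid[1][2]   # triggers one load
second = grid[1][2]  # served from the cache
```

Creating a `RowProxy` instance per access is still cheap; what the module-level definition avoids is re-executing the `class` statement on every `__getitem__` call.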

SylviaWhittle · 2026-05-22T15:41:59Z

+                self.parent = parent
+                self.y = y
+
+            def __getitem__(self, x: int):


I'm going to need a bit of an explanation on how this is intended to work - could we set up a call to talk about it? There's a lot going on here. Plus, frankly I'm just curious.

Yes, definitely!

SylviaWhittle · 2026-05-22T15:43:05Z

 # Set the format to have blue time, green file, module, function and line, and white message
 logger.add(
-    sys.stderr,
+    lambda msg: sys.stderr.write(msg),  # pylint: disable=unnecessary-lambda


What necessitated this change?

It seems that the general_loader tests and some of the spm tests that check the logged output were failing because they could not read it correctly (they always read an empty string instead). They fail for that reason even when I run the main branch locally. The lambda appears to be necessary because of a quirk of loguru: with the lambda, sys.stderr is looked up dynamically at the time of logging rather than once at import time.
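
The binding behaviour described in this reply can be reproduced with the standard library alone. The sketch below does not use loguru and makes no claim about its internals; it only shows the general Python difference between keeping a reference to the stream that `sys.stderr` pointed at when the sink was created and looking `sys.stderr` up again on every call, which matters when a test harness swaps the stream.

```python
import contextlib
import io
import sys

original = io.StringIO()     # stands in for stderr at "import time"
replacement = io.StringIO()  # stands in for the stream a test harness swaps in

with contextlib.redirect_stderr(original):
    # Sink A: bound to whatever sys.stderr is right now
    bound_write = sys.stderr.write
    # Sink B: looks sys.stderr up again each time it is called
    dynamic_write = lambda msg: sys.stderr.write(msg)

    with contextlib.redirect_stderr(replacement):
        bound_write("bound\n")      # still lands in `original`
        dynamic_write("dynamic\n")  # lands in `replacement`

original_text = original.getvalue()
replacement_text = replacement.getvalue()
```

Only the dynamic sink writes to the swapped-in stream, which is consistent with the tests reading an empty string when the sink was bound once.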

SylviaWhittle · 2026-05-27T12:02:59Z

+    return final_multiplier, final_offset, unit
+
+
+class jpk_qi_loader:


Capitalise this please :)

SylviaWhittle · 2026-05-27T12:18:54Z

+        # Open the ZIP archive once and keep it open for the duration of the loading process
+        self.qi_archive = zipfile.ZipFile(self.filepath, "r")  # pylint: disable=consider-using-with
+        logger.info(f"Opened JPK QI archive at {self.filepath}")
+        self.namelist = self.qi_archive.namelist()


Maybe call this list_of_all_paths or something - since namelist could be anything? Just so future people (including ourselves) can tell at a glance what this is.

Maybe also add a comment above as to what it's used for? :)
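
As an illustration of keeping one archive handle open while making its purpose obvious, here is a small sketch using only `zipfile` from the standard library. `ArchiveReader`, `all_member_paths` and the member names are hypothetical and are not taken from the PR; the context-manager methods are one possible way to guarantee the handle is closed.

```python
import io
import zipfile


class ArchiveReader:
    """Hypothetical reader that keeps one ZipFile handle open across many reads."""

    def __init__(self, source) -> None:
        # Opened once and reused; close() or the `with` block releases it
        self.archive = zipfile.ZipFile(source, "r")
        # Every member path in the archive, kept as a set for fast membership tests
        self.all_member_paths = set(self.archive.namelist())

    def read_member(self, path: str) -> bytes:
        if path not in self.all_member_paths:
            raise KeyError(f"{path} is not in the archive")
        return self.archive.read(path)

    def close(self) -> None:
        self.archive.close()

    def __enter__(self) -> "ArchiveReader":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.close()


# Build a tiny in-memory archive so the example is self-contained
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("header.properties", "a=1\n")
    archive.writestr("index/0/header.properties", "b=2\n")

with ArchiveReader(buffer) as reader:
    header = reader.read_member("header.properties")
    has_first_curve = "index/0/header.properties" in reader.all_member_paths
```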

SylviaWhittle · 2026-05-27T12:20:20Z

+        self.qi_archive = zipfile.ZipFile(self.filepath, "r")  # pylint: disable=consider-using-with
+        logger.info(f"Opened JPK QI archive at {self.filepath}")
+        self.namelist = self.qi_archive.namelist()
+        # Set path to the .jpk-qi-image file within the archive for later use


Suggested change

# Set path to the .jpk-qi-image file within the archive for later use

# For holding the reference to where the actual .jqk-qi image is (not the metadata).

SylviaWhittle · 2026-05-27T12:39:31Z

+        if self.save_as_h5:
+            self.save_lite_data()
+
+        return (self.image, self.px2nm, (self.curve_data, self.channels_units, self.full_metadata))


Possibly bundle these objects into one metadata dataclass, JPKQiCurveData? Gentle suggestion.
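
One way the suggested bundling could look, as a sketch only. `QiCurveBundle` and `LoadedQi` are hypothetical names; the field names mirror the tuple returned in the diff above, and the types are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class QiCurveBundle:
    """Hypothetical container for the three objects in the inner tuple."""

    curve_data: Any
    channels_units: dict[str, str] = field(default_factory=dict)
    full_metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class LoadedQi:
    """Hypothetical top-level return value replacing the nested tuple."""

    image: Any
    pixel_to_nanometre_scaling: float
    curves: Optional[QiCurveBundle] = None


loaded = LoadedQi(
    image=[[0.0, 1.0], [2.0, 3.0]],
    pixel_to_nanometre_scaling=2.0,
    curves=QiCurveBundle(curve_data=None, channels_units={"height": "nm"}),
)

# Callers read named fields instead of unpacking by position or tuple length
height_unit = loaded.curves.channels_units["height"]
```

Named fields also remove the need for callers to branch on the length of the returned tuple.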

ahobbs7 · 2026-06-23T15:30:07Z

+        """
+        self.shape_x = shape_x
+        self.shape_y = shape_y
+        self.dims = (shape_y, shape_x)


Should this be shape instead of dims?

SylviaWhittle · 2026-06-23T13:41:20Z

+        int
+            The total number of pixels in the image.
+        """
+        return self.shape_x * self.shape_y


Perhaps the shape returned should be (x, y) rather than the absolute number of entries, since they are indexed as (x, y) rather than just the "flat" index?
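
For comparison, NumPy arrays expose `shape` as a tuple, `size` as the total element count, and `len()` as the length of the first axis. A sketch of a grid following that convention is shown below. `CurveGrid` is a hypothetical name, and the (rows, columns) ordering is an assumption taken from `dims = (shape_y, shape_x)` in the earlier diff rather than the (x, y) order mentioned in the comment.

```python
class CurveGrid:
    """Hypothetical 2D container exposing NumPy-style shape, size and len()."""

    def __init__(self, shape_y: int, shape_x: int) -> None:
        self.shape_y = shape_y
        self.shape_x = shape_x

    @property
    def shape(self) -> tuple[int, int]:
        # (rows, columns), matching how the data is indexed as [y][x]
        return (self.shape_y, self.shape_x)

    @property
    def size(self) -> int:
        # Total number of pixels, i.e. the "flat" count
        return self.shape_y * self.shape_x

    def __len__(self) -> int:
        # Length of the first axis, as for a NumPy array
        return self.shape_y


grid = CurveGrid(shape_y=2, shape_x=3)
summary = (grid.shape, grid.size, len(grid))
```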

SylviaWhittle · 2026-06-23T14:00:44Z

+        curve_num = y * self.shape_x + x
+        curve_data: dict[str, Any] = {}
+
+        for chan_name, scale in self.channel_scaling.items():


Maybe for these short names, use the full name, ie "channel_name"? Just in case it could be misinterpreted.

SylviaWhittle · 2026-06-23T14:02:30Z

+    cumulative_multiplier = 1.0
+    cumulative_offset = 0.0
+    unit = props.get(f"{prefix}conversion-set.conversion.{current_slot}.scaling.unit.unit")
+


A comment about what slot is would be good here. :)
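
To make the idea of a cumulative multiplier and offset concrete, the sketch below composes a chain of linear conversions of the form y = m*x + o. It assumes each slot names one conversion step that refers back to a parent slot; the dictionary layout and the slot names are hypothetical and are not the file format's real property keys.

```python
def compose_linear_chain(
    conversions: dict[str, dict], target_slot: str, base_slot: str
) -> tuple[float, float]:
    """Walk from the target slot back to the base slot, composing y = m*x + o steps."""
    multiplier = 1.0
    offset = 0.0
    slot = target_slot
    while slot != base_slot:
        step = conversions[slot]
        # The composite so far maps a value at `slot` to the target:
        #   target = multiplier * value_at_slot + offset
        # and value_at_slot = step_m * value_at_parent + step_o, so substitute:
        offset = multiplier * step["offset"] + offset
        multiplier = multiplier * step["multiplier"]
        slot = step["parent"]
    return multiplier, offset


# Hypothetical chain: raw -> volts -> nominal
conversions = {
    "volts": {"parent": "raw", "multiplier": 0.5, "offset": 1.0},
    "nominal": {"parent": "volts", "multiplier": 2.0, "offset": -3.0},
}

total_multiplier, total_offset = compose_linear_chain(conversions, "nominal", "raw")

# Applying the composite once matches applying each step in turn
raw_value = 10.0
stepwise = 2.0 * (0.5 * raw_value + 1.0) - 3.0
composite = total_multiplier * raw_value + total_offset
```

Note that the offset update must use the multiplier accumulated so far, so the order of the two assignments inside the loop matters.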

SylviaWhittle · 2026-06-23T14:05:09Z

+        # For holding the reference to where the actual .jqk-qi image is (not the metadata).
+        self.path_to_image = None
+
+        # Chunk size for H5 datasets


A brief explanation on why chunks matter, for future devs here?

SylviaWhittle · 2026-06-23T14:08:17Z

+                i += 1
+
+        logger.info(f"Loading JPK QI data from {self.filepath} with channel {self.channel}")
+        self.extract_global_metadata()


Make this return the value and set it here instead? Just a little clearer - feel free to override this suggestion.

Just noticed that extract_global_metadata also defines some other instance attributes as well as the main global metadata attribute, so I think I'm going to leave it as is to keep things clean.

SylviaWhittle · 2026-06-23T14:11:44Z

+            # If there are no failed loads, log that all data was loaded successfully
+            logger.info("Successfully loaded all curve data without any missing files.")
+
+    def extract_data_to_h5(


Rename perhaps?

SylviaWhittle · 2026-06-23T14:13:53Z

+        """
+        Predict the total number of points for each channel and segment.
+
+        This is done by sampling a subset of curves and extrapolating based on the maximum number


Perhaps add a note that if this prediction is wrong, loading the excess curve data will be slow due to resizing?
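
The trade-off raised here can be sketched in plain Python. The example assumes the prediction is the longest sampled curve multiplied by the number of curves, which is one reading of the truncated docstring above. It uses a list in place of an HDF5 dataset and evenly spaced sampling so the result is deterministic. `predict_total_points` and `GrowableStore` are hypothetical names.

```python
def predict_total_points(curve_lengths: list[int], sample_size: int) -> int:
    """Estimate the total as (longest sampled curve) * (number of curves)."""
    step = max(1, len(curve_lengths) // sample_size)
    sampled = curve_lengths[::step]  # evenly spaced sample
    return max(sampled) * len(curve_lengths)


class GrowableStore:
    """Stand-in for a pre-sized, resizable dataset that counts its resizes."""

    def __init__(self, capacity: int) -> None:
        self.data = [0.0] * capacity
        self.used = 0
        self.resizes = 0

    def append_block(self, block: list[float]) -> None:
        needed = self.used + len(block)
        if needed > len(self.data):
            # Grow to cover existing data plus the incoming block
            self.data.extend([0.0] * (needed - len(self.data)))
            self.resizes += 1
        self.data[self.used:needed] = block
        self.used = needed


# Five short curves and one long curve that the sample happens to miss
curve_lengths = [4, 4, 4, 4, 4, 9]
predicted = predict_total_points(curve_lengths, sample_size=3)

store = GrowableStore(capacity=predicted)
for length in curve_lengths:
    store.append_block([1.0] * length)
```

Here the sample misses the one long curve, so the store is undersized and has to grow once; with a real on-disk dataset each such resize would add cost, which is the case the comment asks to document.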

SylviaWhittle · 2026-06-23T14:14:25Z

+            flip_image=bool(flip_image),
+        )
+
+    def save_lite_data(self):


Mention without curves?

Renamed to extract_and_save_per_curve_data

SylviaWhittle · 2026-06-23T14:15:26Z

+                h5_datasets_buffer[seg_name][chan["name"]] = {"Data": [], "Indices": []}
+        return global_meta_group, h5_datasets, h5_meta_datasets, h5_datasets_buffer, h5_meta_datasets_buffer
+
+    def get_saving_context(self):


Stale - been changed on another branch - update?

SylviaWhittle · 2026-06-23T14:15:51Z

+        # Lookup map for binary scaling
+        self.channel_scaling = {chan["name"]: chan for chan in self.segment_channels}
+
+    def close(self):


Check this?

ahobbs7 added 30 commits February 12, 2026 17:57

Added jpk-qi-data loading functionality. Two possible channels: by tr…

d25abaf

…igger point or by contact point

Adding get available channels function to each file format as well as…

9d28dec

… a function to general loader

Making load jpk qi data function use zipfile instead of afmformats

4bd6bb3

Fixing scaling

28f7633

Updating general_loader and dependencies

a67b11f

Adjusting h5_jpk so it works for different shaped images

37e1ae2

Making the jpk-qi-data processing save the h5 jpk file in the correct…

16a827c

… format

Add the ability to save the metadata to h5 from jpk qi. Additionally …

c780de5

…make the save an adjustable option

Adding ability to load force curves from h5 file

a764ca6

Improving speed of loading qi curve data by loading all the curves at…

405a95d

… once, then extracting separately

Making loading jpk-qi-data return all the curves

3198567

Adding .bin files support

c25fb75

Adding returning of metadata

085b252

Refactoring jpk-qi-data reader to use a loader class for greater modu…

06ec0de

…larity

Starting to implement lazy loading

78c54a7

Made force curves lazy loaded for h5-jpk and jpk-qi-data

3c327e9

Implementing caching of heavy data objects/ references to large open …

e776c66

…files and removing the ability to run curve analysis directly in the reader

Adjusting curve data access method to work more like a 2D array for m…

3e0b8b1

…ore robust coordinate access

Separating saving functionality for jpk-qi-data and adding function t…

b8ec43a

…o eagerly load lazy loaded data

Fixing duplicated converting to nm bug

10c1050

Minor changes to fix double scaling on current channel as well

b510141

Fixing minor error

afc65cc

Added timing for testing and started converting to more memory and ti…

b1ec527

…me efficient method

Making jpk-qi-data loading stream data into h5 file rather than savin…

1fd1eed

…g to memory then saving in one go when duplicating data to h5

Changing metdata data saving so 'changing keys' are assumed based on …

4e90188

…sampled curves then are streamed directly into h5

Pre-sizing the curve data to make loading faster (using a best guess)

66f388a

Improving performance by removing javaproperties reliance in loop

f286e57

Making the saving to h5 save in sections using a buffer

5391179

Removing redundant functions and fixing minor index bugs

f0235ad

removing possibility of size 1 image stack

62481da

ahobbs7 and others added 3 commits May 3, 2026 17:13

Skipping tests requiring large test files which cannot be added to repo

b88ee26

[pre-commit.ci] Fixing issues with pre-commit

be2840e

Fixing pre-commit problems

a1764fe

ahobbs7 marked this pull request as ready for review May 5, 2026 07:54

ahobbs7 requested a review from SylviaWhittle May 5, 2026 07:54

SylviaWhittle requested changes May 22, 2026

ahobbs7 added 3 commits May 22, 2026 18:12

chore: removing print statements

603d7bf

tests: adding tests for get channel functions

43640ac

fix: stopped unnecessary redefining of nested row proxy classes

0146542

SylviaWhittle requested changes May 27, 2026

ahobbs7 added 4 commits May 28, 2026 10:01

docs: comments and improving variable naming

0f6dabf

refactor: simplifying dataclass structure by making them work closer …

612d4e9

…to a numpy ndarray and grouping into unified CurveDataset structure

tests: updating curve loading tests (currently being skipped) for ref…

b104b3b

…actored dataclasses

chore: renaming get_pixel_metadata to get_point_metadata

2394556

ahobbs7 requested a review from SylviaWhittle June 1, 2026 14:55

feat: moving channel_units to be stored per volume rather than in met…

699c55b

…adata to align with potential differences in channels between volumes

SylviaWhittle mentioned this pull request Jun 3, 2026

[feature] : Return an object instead of an image and pixel to nm scaling? (consistency) #193

Open

refactor: using a dataclass to represent all return data consistently

773daca

ahobbs7 changed the title ~~Adding support for .jpk-qi-data and .bin files~~ Adding support for .jpk-qi-data and .bin files and moving to dataclasses for the returned loaded data Jun 5, 2026

ahobbs7 added 3 commits June 5, 2026 16:42

refactor: moving jpk-qi-data logic in general_loader into jpk_qi

0b29bbb

chore: moving channels logic out of general_loader and into topostats

23523fb

fix: resizing of datasets while saving not including current buffered…

724460f

… data

ahobbs7 commented Jun 23, 2026

SylviaWhittle requested changes Jul 1, 2026

ahobbs7 added 4 commits July 1, 2026 13:21

docs: renaming for clarity and adding comments

0a45730

fix: removing unnecessary emptying of variables in JPKQILoader

817a999

docs: improving readability of get_channel_scaling through clearer va…

fcce8d9

…riable names and better comments

docs: renaming px2nm to pixel_to_nanometre_scaling

7b2623d
