Skip to content

feat(deepmd): add hdf5 mixed format#981

Open
njzjz wants to merge 1 commit into
deepmodeling:masterfrom
njzjz:feat/deepmd-hdf5-mixed
Open

feat(deepmd): add hdf5 mixed format#981
njzjz wants to merge 1 commit into
deepmodeling:masterfrom
njzjz:feat/deepmd-hdf5-mixed

Conversation

@njzjz

@njzjz njzjz commented Jun 20, 2026

Copy link
Copy Markdown
Member

Summary

  • add deepmd/hdf5/mixed format registration and HDF5 group handling
  • reuse the existing mixed-type split/dump logic through backend loader/dumper hooks
  • preserve full HDF5 type_map.raw and persist real_atom_types for mixed reconstruction
  • add HDF5 mixed round-trip, #group, HDF5 object, unlabeled, type-map, padding, regular-HDF5 rejection, and error-path tests

Tests

  • ruff format dpdata/ tests/test_deepmd_hdf5.py
  • ruff check dpdata/ tests/test_deepmd_hdf5.py
  • cd tests && python -m unittest test_deepmd_hdf5.py test_deepmd_mixed.py test_deepmd_comp.py (325 tests)
  • cd tests && coverage run --source=../dpdata -m unittest test_deepmd_hdf5.py test_deepmd_mixed.py test_deepmd_comp.py && coverage report -m ../dpdata/plugins/deepmd.py ../dpdata/formats/deepmd/mixed.py ../dpdata/formats/deepmd/hdf5.py

Coverage note: the new DeePMDHDF5MixedFormat class has no missing executable lines in the targeted coverage run. Remaining misses in plugins/deepmd.py are pre-existing sections outside this PR, such as raw/npy branches and DPDriver.

@dosubot dosubot Bot added size:XL This PR changes 500-999 lines, ignoring generated files. deepmd DeePMD-kit format dpdata enhancement New feature or request labels Jun 20, 2026
@codspeed-hq

codspeed-hq Bot commented Jun 20, 2026

Copy link
Copy Markdown

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

✅ 2 untouched benchmarks


Comparing njzjz:feat/deepmd-hdf5-mixed (94837fe) with master (7d75096)

Open in CodSpeed

@codecov

codecov Bot commented Jun 20, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 96.77419% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.77%. Comparing base (7d75096) to head (94837fe).

Files with missing lines Patch % Lines
dpdata/plugins/deepmd.py 96.35% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #981      +/-   ##
==========================================
+ Coverage   86.75%   86.77%   +0.01%     
==========================================
  Files          89       89              
  Lines        8093     8315     +222     
==========================================
+ Hits         7021     7215     +194     
- Misses       1072     1100      +28     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@coderabbitai

coderabbitai Bot commented Jun 20, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

This PR adds a new deepmd/hdf5/mixed format for reading and writing DeePMD mixed-type systems to HDF5 files. It fixes atom-name/numbs reconstruction in the HDF5 backend, introduces a pluggable load_func/dump_func API in the mixed layer, registers a new DeePMDHDF5MixedFormat plugin class with full single-system and MultiSystems support, refactors the mixed-format export loop in MultiSystems.to_fmt_obj, and adds comprehensive tests covering group layout, real_atom_types, and padding.

Changes

DeePMD mixed-type HDF5 format

Layer / File(s) Summary
HDF5 atom-name/numbs reconstruction and dtype skip-list fixes
dpdata/formats/deepmd/hdf5.py
Restructures atom_names/atom_numbs initialization to derive from atom_types after asserting my_type_map coverage; removes real_atom_types from both to_system_data and dump dtype skip-lists so it is processed as a regular frame property.
Pluggable load/dump API and _to_system_data helper
dpdata/formats/deepmd/mixed.py
Adds _to_system_data helper that remaps atom names, groups frames by real_atom_types layout, strips virtual atoms, and builds per-layout system dicts. Updates to_system_data and dump signatures with optional load_func/dump_func parameters, both defaulting to the existing comp_* implementations.
DeePMDHDF5MixedFormat class registration and backend helpers
dpdata/plugins/deepmd.py
Registers DeePMDHDF5MixedFormat under deepmd/hdf5/mixed. Adds _load_hdf5_mixed_data and _dump_hdf5_mixed_data static delegates, _iter_mixed_groups for recursive HDF5 traversal, and _get_group/_create_group helpers for path-based navigation.
Public read/write and MultiSystems methods
dpdata/plugins/deepmd.py
Implements from_system_mix, from_labeled_system_mix, and _from_system_mix for reading; to_system for writing with comp_prec support. Implements from_multi_systems and to_multi_systems generators for MultiSystems support with file#group string and HDF5 object handling.
MultiSystems export refactor, docstrings, and tests
dpdata/system.py, tests/test_deepmd_hdf5.py
Refactors the DeepMD mixed export loop in MultiSystems.to_fmt_obj to zip filenames from fmtobj.to_multi_systems with mixed_systems.values(). Updates System and LabeledSystem docstrings to list deepmd/npy/mixed and deepmd/hdf5/mixed formats. Adds TestHDF5MixedMulti, TestHDF5MixedPadding, and TestHDF5MixedIOVariants covering group layout, padding, and round-trip variants.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • deepmodeling/dpdata#970: Both PRs modify the DeepMD plugin wiring in dpdata/plugins/deepmd.py to use the dpdata.formats.deepmd.* backends; this PR builds the new mixed HDF5 format on top of that refactored dispatch.

Suggested labels

enhancement, deepmd, lgtm

Suggested reviewers

  • iProzd
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 58.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat(deepmd): add hdf5 mixed format' directly and concisely describes the main change: adding HDF5 mixed format support to the DeepMD implementation.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
dpdata/system.py (1)

1425-1430: Add explicit zip strictness to avoid silent truncation in mixed export loop.

The zip loop at lines 1425-1430 can silently drop trailing items if the iterables have different lengths. With Python 3.10+ as the minimum version, add strict=True to make the pairing explicit:

Suggested change
             for fn, ss in zip(
                 fmtobj.to_multi_systems(
                     list(mixed_systems.keys()), directory, **kwargs
                 ),
                 mixed_systems.values(),
+                strict=True,
             ):
                 ss.to_fmt_obj(fmtobj, fn, *args, **kwargs)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dpdata/system.py` around lines 1425 - 1430, In the zip loop at lines
1425-1430 that combines fmtobj.to_multi_systems() with mixed_systems.values(),
add the strict=True parameter to the zip function call to prevent silent
truncation when the two iterables have different lengths, ensuring both the list
of system keys and the mixed_systems values are properly paired without loss of
data.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dpdata/plugins/deepmd.py`:
- Around line 525-530: The condition in the `_iter_mixed_groups` method that
yields a group based solely on the presence of `type_map.raw` is too permissive
and incorrectly identifies regular deepmd/hdf5 groups as mixed groups. To fix
this, add an additional check alongside the `type_map.raw` existence check to
properly distinguish mixed groups from regular groups. The additional check
should verify the presence of attributes or datasets that are specific to mixed
format groups (such as `real_atom_types` which is expected by
`mixed.to_system_data`) to ensure only actual mixed groups are yielded and
routed to the mixed loader.

---

Nitpick comments:
In `@dpdata/system.py`:
- Around line 1425-1430: In the zip loop at lines 1425-1430 that combines
fmtobj.to_multi_systems() with mixed_systems.values(), add the strict=True
parameter to the zip function call to prevent silent truncation when the two
iterables have different lengths, ensuring both the list of system keys and the
mixed_systems values are properly paired without loss of data.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 421fcd2f-211c-40f2-be26-491b20803e3a

📥 Commits

Reviewing files that changed from the base of the PR and between 7d75096 and 9ad348f.

📒 Files selected for processing (5)
  • dpdata/formats/deepmd/hdf5.py
  • dpdata/formats/deepmd/mixed.py
  • dpdata/plugins/deepmd.py
  • dpdata/system.py
  • tests/test_deepmd_hdf5.py

Comment thread dpdata/plugins/deepmd.py Outdated
@njzjz njzjz force-pushed the feat/deepmd-hdf5-mixed branch from 9ad348f to ad097b0 Compare June 20, 2026 14:59

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dpdata/system.py`:
- Around line 1425-1431: The zip() call that iterates over the results of
fmtobj.to_multi_systems() and mixed_systems.values() lacks strict parameter
checking, which means if these two iterables have different lengths, the shorter
one will be silently truncated without raising an error. This could cause some
systems in mixed_systems to not be written to disk. Add strict=True as a
parameter to the zip() function call to ensure both iterables have exactly the
same length and raise a ValueError if they don't match, preventing silent data
loss.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d0060ce5-418b-4324-a0a4-39d8ea14fe81

📥 Commits

Reviewing files that changed from the base of the PR and between 9ad348f and ad097b0.

📒 Files selected for processing (5)
  • dpdata/formats/deepmd/hdf5.py
  • dpdata/formats/deepmd/mixed.py
  • dpdata/plugins/deepmd.py
  • dpdata/system.py
  • tests/test_deepmd_hdf5.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • dpdata/formats/deepmd/hdf5.py
  • dpdata/formats/deepmd/mixed.py
  • dpdata/plugins/deepmd.py

Comment thread dpdata/system.py
@njzjz njzjz force-pushed the feat/deepmd-hdf5-mixed branch from ad097b0 to 94837fe Compare June 20, 2026 15:07
@njzjz njzjz closed this Jun 20, 2026
@njzjz njzjz reopened this Jun 20, 2026
@njzjz njzjz requested review from iProzd and wanghan-iapcm June 20, 2026 15:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

deepmd DeePMD-kit format dpdata enhancement New feature or request size:XL This PR changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant