Refactor DepMapData.DataNamespace into reusable CancerDataNamespace, add custom dataset injection, and exclude non-core datasets from built-in defaults #4

Merged: abearab merged 9 commits into master from copilot/update-datanamespace-structure on May 25, 2026
Copilot AI commented May 24, 2026

DepMapData.DataNamespace was structurally minimal and only supported read access to pre-defined datasets. This update modernizes namespace handling, introduces a reusable parent namespace class for future database integrations, adds explicit APIs for users to attach new in-memory datasets directly to a DepMapData instance, and narrows built-in DepMap defaults to core datasets only.

  • Reusable namespace base for broader data sources

    • Extracted shared namespace behavior into candi/data/cancer_database.py as CancerDataNamespace.
    • Updated DepMapData.DataNamespace to inherit from CancerDataNamespace, preserving DepMap-specific typed dataset annotations while making the core behavior reusable for other database handlers.
  • Namespace structure and ergonomics

    • Refactored namespace implementation with stronger class shape (__slots__, typed access behavior, clearer docstring).
    • Added __dir__ support so interactive discovery includes both file-backed and runtime-added datasets.
  • Custom dataset injection

    • Added DataNamespace.add(name, dataset, overwrite=False) with input validation:
      • non-empty string names only
      • valid Python identifier requirement (for attribute-style access)
      • overwrite protection unless explicitly enabled
    • Added attribute-assignment path for custom datasets via __setattr__ (except internal _parent).
    • Added DepMapData.add_dataset(name, dataset, overwrite=False) as a top-level convenience API.
  • Availability semantics

    • Updated list_available() to return both predefined dataset names and user-added runtime datasets.
  • Built-in dataset scope correction

    • Removed OmicsProteinAbundance and CRISPRScreenMap from built-in/default DepMap dataset definitions.
    • Removed CRISPRScreenMap.csv from DepMapAPI.download_essential().
    • These datasets are now expected to be added manually via the new custom dataset APIs when needed.
# my_df, my_anndata, and panel_df stand for in-memory objects the user has already loaded
depmap = DepMapData(data_dir="...")

# explicit API
depmap.data.add("MyScores", my_df)

# convenience API on parent
depmap.add_dataset("MyAnnData", my_anndata)

# attribute-style injection
depmap.data.CustomPanel = panel_df

# attribute-style retrieval
depmap.data.MyScores
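
The validation rules listed for add() and the attribute-assignment path can be pictured with a small, self-contained sketch. This is not the code merged in this PR: NamespaceSketch and OwnerSketch are hypothetical stand-ins for CancerDataNamespace and DepMapData, the ValueError exception type is an assumption, and routing attribute assignment through add() with overwrite=False is also an assumption.

```python
class NamespaceSketch:
    """Hypothetical namespace: attribute-style access to datasets held by a parent."""

    __slots__ = ("_parent",)

    def __init__(self, parent):
        # Bypass the custom __setattr__ for the one internal slot.
        object.__setattr__(self, "_parent", parent)

    def add(self, name, dataset, overwrite=False):
        # Rule 1: non-empty string names only.
        if not isinstance(name, str) or not name:
            raise ValueError("dataset name must be a non-empty string")
        # Rule 2: the name must work with attribute-style access.
        if not name.isidentifier():
            raise ValueError(f"{name!r} is not a valid Python identifier")
        # Rule 3: refuse to replace an existing dataset unless asked to.
        if name in self._parent.custom and not overwrite:
            raise ValueError(f"{name!r} already exists; pass overwrite=True")
        self._parent.custom[name] = dataset

    def __setattr__(self, name, value):
        if name == "_parent":
            object.__setattr__(self, name, value)
        else:
            # Assumption: assignment behaves like add() without overwrite.
            self.add(name, value)

    def __getattr__(self, name):
        # Called only when normal lookup fails; guard against recursion
        # if the internal slot has not been set yet.
        if name == "_parent":
            raise AttributeError(name)
        try:
            return self._parent.custom[name]
        except KeyError:
            raise AttributeError(f"no dataset named {name!r}") from None

    def __dir__(self):
        # Expose runtime-added names to tab completion and dir().
        return sorted(set(super().__dir__()) | set(self._parent.custom))


class OwnerSketch:
    """Hypothetical stand-in for DepMapData; holds only runtime-added datasets."""

    def __init__(self):
        self.custom = {}
        self.data = NamespaceSketch(self)

    def add_dataset(self, name, dataset, overwrite=False):
        # Convenience wrapper that forwards to the namespace.
        self.data.add(name, dataset, overwrite=overwrite)


owner = OwnerSketch()
owner.data.add("MyScores", [0.1, 0.2])
owner.add_dataset("MyAnnData", {"X": []})
owner.data.CustomPanel = ("a", "b")
print(owner.data.MyScores)
```

Keeping the datasets on the parent rather than on the namespace object is what lets the namespace use __slots__ while still accepting arbitrary dataset names.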

Copilot AI and others added 3 commits May 24, 2026 19:47
Agent-Logs-Url: https://github.com/FunGI-cap/CanDI/sessions/1468b644-509d-41a1-8355-ca1f3b3216e2

Co-authored-by: abearab <53412130+abearab@users.noreply.github.com>

abearab commented May 24, 2026

Member

@copilot There are two datasets I need you to exclude from the main code. Since they are not part of the main DepMap downloader API, I think it's best if users add them manually using the new functionality introduced in this PR:

  1. DepMap proteomics data OmicsProteinAbundance: pd.DataFrame
  2. PRISM drug sensitivity data CRISPRScreenMap: ad.AnnData

Copilot AI commented May 24, 2026

Author

@copilot There are two datasets I need you to exclude from the main code. Since these are not part of main DepMap downloader API so I think it's best if users can add them manually using new functiona...

Implemented in commit 3edc028. I removed OmicsProteinAbundance and CRISPRScreenMap from the built-in defaults (namespace hints, default dataset paths, and essential downloader list), so they can now be supplied manually through the new add-dataset functionality.

Copilot AI changed the title from "Refactor DepMapData.DataNamespace and add first-class custom dataset injection" to "Refactor DepMapData.DataNamespace, add custom dataset injection, and exclude non-core datasets from built-in defaults" on May 24, 2026
Copilot AI requested a review from abearab May 24, 2026 19:59

abearab commented May 25, 2026

Member

@copilot Turn DataNamespace into a parent class that the DepMap class builds on, so it can be used for other types of databases. Maybe name it "cancer database" or something general so I can expand to other data sources. It would be better to put it in a separate Python file in the data submodule of candi.

Copilot AI commented May 25, 2026

Author

@copilot make the DataNamespace a parental class for depmap class so it can be used for other type of databases. Maybe name it cancer database or something general so I can expand to other data sour...

Done in commit 118157b. I extracted a reusable parent namespace into candi/data/cancer_database.py as CancerDataNamespace and updated DepMapData.DataNamespace to inherit from it, so the same namespace base can be reused for other database handlers.
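
One way to picture the split described here: a base class owns dataset lookup, and each database-specific subclass only declares the dataset names it expects as type annotations. The sketch below is an illustration under stated assumptions, not the merged implementation. BaseNamespaceSketch, DepMapNamespaceSketch, OtherDbNamespaceSketch, the declared() helper, and every dataset name are hypothetical placeholders.

```python
from typing import get_type_hints


class BaseNamespaceSketch:
    """Hypothetical reusable base: resolves attribute access against a store."""

    __slots__ = ("_store",)

    def __init__(self, store):
        self._store = store  # mapping of dataset name -> loaded object

    def __getattr__(self, name):
        # Called only when normal lookup fails; guard the internal slot
        # so a missing _store cannot cause infinite recursion.
        if name == "_store":
            raise AttributeError(name)
        try:
            return self._store[name]
        except KeyError:
            raise AttributeError(f"no dataset named {name!r}") from None

    @classmethod
    def declared(cls):
        # Names announced through class-level annotations (hypothetical helper).
        return sorted(get_type_hints(cls))


class DepMapNamespaceSketch(BaseNamespaceSketch):
    """Subclass adds only typed hints; placeholder names, not real DepMap files."""

    __slots__ = ()
    ExampleTable: dict
    ExampleMatrix: list


class OtherDbNamespaceSketch(BaseNamespaceSketch):
    """A second, hypothetical data source reusing the same base behavior."""

    __slots__ = ()
    OtherTable: dict


depmap_ns = DepMapNamespaceSketch({"ExampleTable": {"gene": 1.0}, "ExampleMatrix": [[0]]})
other_ns = OtherDbNamespaceSketch({"OtherTable": {}})
print(DepMapNamespaceSketch.declared())
print(depmap_ns.ExampleTable)
```

Because bare annotations do not create class attributes, lookups such as depmap_ns.ExampleTable still fall through to the base class's __getattr__, while editors and type checkers can read the declared types.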

Copilot AI changed the title from "Refactor DepMapData.DataNamespace, add custom dataset injection, and exclude non-core datasets from built-in defaults" to "Refactor DepMapData.DataNamespace into reusable CancerDataNamespace, add custom dataset injection, and exclude non-core datasets from built-in defaults" on May 25, 2026

@abearab abearab left a comment

@abearab abearab marked this pull request as ready for review May 25, 2026 02:22
@abearab abearab requested a review from Copilot May 25, 2026 02:23
Copilot AI requested a review from abearab May 25, 2026 02:23

Copilot AI left a comment

Pull request overview

This PR refactors DepMap dataset namespace handling to make .data attribute access reusable across cancer data backends, adds APIs to inject in-memory datasets at runtime, and narrows the built-in DepMap default dataset set to core datasets only.

Changes:

  • Introduces a reusable CancerDataNamespace base class (attribute access, add(), __dir__) and makes DepMapData.DataNamespace inherit from it.
  • Adds DepMapData.add_dataset() and expands list_available() to include runtime-injected datasets.
  • Removes CRISPRScreenMap/OmicsProteinAbundance from built-in DepMap defaults and from “essential” downloads/paths/loading.
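
The list_available() change in the second bullet can be sketched as a union of predefined names and runtime-injected names. This is a minimal illustration under assumptions, not the merged code: DataSketch, its attribute names, the sorted return order, and the placeholder dataset names and paths are all hypothetical.

```python
class DataSketch:
    """Hypothetical stand-in for DepMapData, reduced to name bookkeeping."""

    def __init__(self, predefined_paths):
        # Predefined datasets: name -> file path (placeholders here).
        self._paths = dict(predefined_paths)
        # Runtime-injected datasets: name -> in-memory object.
        self._custom = {}

    def add_dataset(self, name, dataset):
        self._custom[name] = dataset

    def list_available(self):
        # Union of both sources; a name present in both is listed once.
        return sorted(set(self._paths) | set(self._custom))


data = DataSketch({"CoreTableA": "core_a.csv", "CoreTableB": "core_b.csv"})
before = data.list_available()
data.add_dataset("MyScores", [0.1, 0.2])
after = data.list_available()
print(before)
print(after)
```

Under this sketch, a dataset removed from the built-in defaults would reappear in the listing only after a user injects it.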

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| candi/data/depmap.py | Refactors DataNamespace to inherit from shared base, adds dataset injection/listing, and removes non-core built-in datasets from defaults/downloads. |
| candi/data/_database.py | Adds CancerDataNamespace base implementation for attribute-style dataset access + runtime injection. |
| candi/data/__init__.py | Minor formatting/cleanup of exports line. |

Comment thread candi/data/_database.py
Comment thread candi/data/_database.py Outdated
Comment thread candi/data/_database.py
Comment thread candi/data/depmap.py
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

@abearab abearab left a comment

@abearab abearab merged commit 62ceb9c into master May 25, 2026
1 check passed
@abearab abearab deleted the copilot/update-datanamespace-structure branch May 25, 2026 05:58