Refactor DepMapData.DataNamespace into reusable CancerDataNamespace, add custom dataset injection, and exclude non-core datasets from built-in defaults #4
Agent-Logs-Url: https://github.com/FunGI-cap/CanDI/sessions/1468b644-509d-41a1-8355-ca1f3b3216e2 Co-authored-by: abearab <53412130+abearab@users.noreply.github.com>
@copilot There are two datasets I need you to exclude from the main code. Since they are not part of the main DepMap downloader API, I think it's best if users add them manually using the new functionality introduced in this PR.
Agent-Logs-Url: https://github.com/FunGI-cap/CanDI/sessions/f4ed5b47-7f64-4db2-8600-a06b1359e493 Co-authored-by: abearab <53412130+abearab@users.noreply.github.com>
Implemented in commit |
@copilot make the |
Agent-Logs-Url: https://github.com/FunGI-cap/CanDI/sessions/efaf926c-0dc8-46b3-9b53-743dfd556ecd Co-authored-by: abearab <53412130+abearab@users.noreply.github.com>
Done in commit |
Pull request overview
This PR refactors DepMap dataset namespace handling to make .data attribute access reusable across cancer data backends, adds APIs to inject in-memory datasets at runtime, and narrows the built-in DepMap default dataset set to core datasets only.
Changes:
- Introduces a reusable `CancerDataNamespace` base class (attribute access, `add()`, `__dir__`) and makes `DepMapData.DataNamespace` inherit from it.
- Adds `DepMapData.add_dataset()` and expands `list_available()` to include runtime-injected datasets.
- Removes `CRISPRScreenMap`/`OmicsProteinAbundance` from built-in DepMap defaults and from "essential" downloads/paths/loading.
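
The namespace behavior listed above (attribute access, `add()`, `__dir__`) can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the class names `NamespaceSketch` and `DepMapNamespaceSketch`, the `_datasets` storage dict, the `core_a` annotation, and the specific validation rules are all assumptions. Only the `add(name, dataset, overwrite=False)` signature, `__slots__`, `__dir__`, and the internal `_parent` attribute are taken from the PR text.

```python
from typing import Any, List


class NamespaceSketch:
    """Illustrative stand-in for a reusable dataset namespace (not CanDI's code)."""

    __slots__ = ("_parent", "_datasets")

    def __init__(self, parent: Any = None) -> None:
        # object.__setattr__ bypasses the guard in __setattr__ below.
        object.__setattr__(self, "_parent", parent)
        object.__setattr__(self, "_datasets", {})

    def add(self, name: str, dataset: Any, overwrite: bool = False) -> None:
        # Assumed validation: public, identifier-like names only.
        if not isinstance(name, str) or not name.isidentifier() or name.startswith("_"):
            raise ValueError(f"invalid dataset name: {name!r}")
        if name in self._datasets and not overwrite:
            raise KeyError(f"dataset {name!r} already exists; pass overwrite=True")
        self._datasets[name] = dataset

    def __getattr__(self, name: str) -> Any:
        # Only called when normal attribute lookup fails.
        datasets = object.__getattribute__(self, "_datasets")
        try:
            return datasets[name]
        except KeyError:
            raise AttributeError(f"no dataset named {name!r}") from None

    def __setattr__(self, name: str, value: Any) -> None:
        # Direct assignment is rejected so that add() is the single entry point.
        raise AttributeError("use add() to register datasets")

    def __dir__(self) -> List[str]:
        # Expose runtime-added datasets to tab completion and dir().
        return sorted(set(super().__dir__()) | set(self._datasets))


class DepMapNamespaceSketch(NamespaceSketch):
    """Hypothetical subclass: annotations only, behavior inherited from the base."""

    __slots__ = ()
    core_a: Any  # placeholder annotation, not a real DepMap dataset name


ns = DepMapNamespaceSketch()
ns.add("custom_screen", [1, 2, 3])
print(ns.custom_screen)           # [1, 2, 3]
print("custom_screen" in dir(ns))  # True
```

Routing all writes through `add()` keeps validation and the overwrite check in one place, which is one plausible reason to block plain attribute assignment.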
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `candi/data/depmap.py` | Refactors DataNamespace to inherit from shared base, adds dataset injection/listing, and removes non-core built-in datasets from defaults/downloads. |
| `candi/data/_database.py` | Adds CancerDataNamespace base implementation for attribute-style dataset access + runtime injection. |
| `candi/data/__init__.py` | Minor formatting/cleanup of exports line. |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
`DepMapData.DataNamespace` was structurally minimal and only supported read access to pre-defined datasets. This update modernizes namespace handling, introduces a reusable parent namespace class for future database integrations, adds explicit APIs for users to attach new in-memory datasets directly to a `DepMapData` instance, and narrows built-in DepMap defaults to core datasets only.

Reusable namespace base for broader data sources
- Shared namespace behavior now lives in `candi/data/cancer_database.py` as `CancerDataNamespace`.
- `DepMapData.DataNamespace` now inherits from `CancerDataNamespace`, preserving DepMap-specific typed dataset annotations while making the core behavior reusable for other database handlers.

Namespace structure and ergonomics
- `__slots__`, typed access behavior, and a clearer docstring.
- `__dir__` support so interactive discovery includes both file-backed and runtime-added datasets.

Custom dataset injection
- `DataNamespace.add(name, dataset, overwrite=False)` with input validation: … `__setattr__` (except internal `_parent`).
- `DepMapData.add_dataset(name, dataset, overwrite=False)` as a top-level convenience API.

Availability semantics
- `list_available()` returns both predefined dataset names and user-added runtime datasets.

Built-in dataset scope correction
- `OmicsProteinAbundance` and `CRISPRScreenMap` removed from built-in/default DepMap dataset definitions.
- `CRISPRScreenMap.csv` removed from `DepMapAPI.download_essential()`.
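
To illustrate how a user might re-attach one of the excluded datasets by hand, here is a hedged sketch of the top-level flow. `DataSourceSketch` is a hypothetical stand-in for a `DepMapData`-like object, the built-in names `core_a`/`core_b` are placeholders, and the dataset contents are made up. Only the `add_dataset(name, dataset, overwrite=False)` and `list_available()` signatures come from the description above; the internals are assumptions.

```python
from typing import Any, Dict, List


class DataSourceSketch:
    """Hypothetical stand-in for a DepMapData-like object (not CanDI's code)."""

    # Placeholder built-in names; the real defaults are defined by the library.
    _BUILTIN = ("core_a", "core_b")

    def __init__(self) -> None:
        # Datasets injected at runtime, keyed by name.
        self._runtime: Dict[str, Any] = {}

    def add_dataset(self, name: str, dataset: Any, overwrite: bool = False) -> None:
        # Refuse to shadow an existing name unless the caller opts in.
        exists = name in self._BUILTIN or name in self._runtime
        if exists and not overwrite:
            raise KeyError(f"dataset {name!r} already exists; pass overwrite=True")
        self._runtime[name] = dataset

    def list_available(self) -> List[str]:
        # Built-in names first, then runtime-added names not already listed.
        names = list(self._BUILTIN)
        names += [n for n in self._runtime if n not in self._BUILTIN]
        return names


source = DataSourceSketch()
# A dataset outside the built-in defaults is attached manually (made-up content).
source.add_dataset("CRISPRScreenMap", [{"placeholder": 1}])
print(source.list_available())  # ['core_a', 'core_b', 'CRISPRScreenMap']
```

In this sketch the dataset can be any in-memory object; what the real API accepts is determined by the validation in the PR itself.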