
Introduce unified ETL pipeline and API-based data retrieval #18

Open
ShrooqAyman wants to merge 1 commit into
PRAISELab-PicusLab:main from
ShrooqAyman:main

ETL Pipeline Integration and API-Based Data Retrieval

Introduces a unified ETL pipeline that standardizes bibliographic data from multiple sources into a single validated schema. This decouples data ingestion from source-specific formats and yields the same DataFrame structure regardless of the source database.
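The normalization and validation step can be sketched as follows. This is a minimal sketch written against plain record dicts, so it runs without pandas; each normalized record would then become one DataFrame row. The field groups (`MULTI_VALUE_FIELDS`, `TEXT_FIELDS`), the `;` separator, the fallback of `0` for a missing year or citation count, and all helper names are assumptions for illustration, not the PR's actual code. Only the contract itself (numeric PY, integer TC, `list[str]` multi-value fields, no None/NaN in scalar fields) comes from the description.

```python
import math
import re
from typing import Any

# Hypothetical field groups; the PR does not publish its exact schema.
MULTI_VALUE_FIELDS = ("AU", "DE", "ID", "C1")
TEXT_FIELDS = ("TI", "SO", "DI")


def _is_missing(value: Any) -> bool:
    """True for None, float NaN, and blank strings."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and not value.strip()


def _to_year(value: Any) -> int:
    """Pull a 4-digit year out of values like 2021, '2021' or '2021 Mar 15'."""
    if _is_missing(value):
        return 0  # assumed sentinel for "unknown year"
    match = re.search(r"\b(\d{4})\b", str(value))
    return int(match.group(1)) if match else 0


def _to_int(value: Any) -> int:
    """Coerce '15', 15.0 or ' 15 ' to int; unparseable values become 0."""
    if _is_missing(value):
        return 0
    try:
        return int(float(str(value).strip()))
    except (ValueError, OverflowError):
        return 0


def _to_list(value: Any, sep: str = ";") -> list[str]:
    """Split 'A; B' into ['A', 'B']; lists pass through, blank items are dropped."""
    if _is_missing(value):
        return []
    items = value if isinstance(value, (list, tuple)) else str(value).split(sep)
    return [str(item).strip() for item in items if not _is_missing(item)]


def normalize_record(record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `record` whose contract fields have the promised types."""
    out = dict(record)
    for field in MULTI_VALUE_FIELDS:
        out[field] = _to_list(record.get(field))
    for field in TEXT_FIELDS:
        value = record.get(field)
        out[field] = "" if _is_missing(value) else str(value).strip()
    out["PY"] = _to_year(record.get("PY"))
    out["TC"] = _to_int(record.get("TC"))
    return out


def validate_record(record: dict[str, Any]) -> None:
    """Raise ValueError if a normalized record breaks the contract."""
    if not isinstance(record["PY"], int) or not isinstance(record["TC"], int):
        raise ValueError("PY and TC must be integers")
    for field in MULTI_VALUE_FIELDS:
        value = record[field]
        if not (isinstance(value, list) and all(isinstance(v, str) for v in value)):
            raise ValueError(f"{field} must be list[str]")
    for field in TEXT_FIELDS:
        if not isinstance(record[field], str):
            raise ValueError(f"{field} must be str")
```

Applied row-wise (for example `[normalize_record(r) for r in raw_records]`) before the DataFrame is built, this keeps year arithmetic such as `PY - 1` from ever meeting a string, which is the class of bug behind the TypeError fix listed under Key Changes.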


Key Changes

  • Added convert2df() as the single entry point for all data ingestion; it handles extraction, mapping, normalization, and validation
  • Implemented source standardization for Scopus (CSV) and PubMed (TXT/MEDLINE); partial support for Dimensions
  • Enforced a strict type contract on the output DataFrame: PY as numeric, TC as integer, multi-value fields as list[str], and no None or NaN in scalar fields
  • Added automatic SR field generation (First Author + Publication Year + Source)
  • Integrated OpenAlex and PubMed API retrievers with cursor-based pagination and retry/backoff logic
  • Added a dedicated API search UI so users can query OpenAlex or PubMed directly without manual file exports
  • All ETL and API functionality is fully integrated and rendered in the UI; API results and file uploads share the same dashboard view and analysis sidebar
  • Fixed DataFrame compatibility issues in functions that expected Shiny reactive wrappers
  • Fixed TypeError in year arithmetic caused by string-typed PY values
  • Validated ETL output against: Annual Production, Average Citations, Main Information, Relevant Authors, Relevant Sources
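The SR field from the list above (First Author + Publication Year + Source) can be sketched as a small tag builder. The `", "` separator, upper-casing, the `NA` placeholder for missing parts, and the `-2`, `-3` suffixes that keep repeated tags unique are assumptions for illustration. The PR only states which three components make up SR.

```python
from collections import Counter
from typing import Any


def make_sr(authors: list[str], year: int, source: str) -> str:
    """Build a short reference tag such as 'SMITH J, 2021, SCIENTOMETRICS'."""
    # Collapse internal whitespace and upper-case so spelling variants match.
    first = " ".join(authors[0].split()).upper() if authors else ""
    src = " ".join(source.split()).upper()
    parts = [first or "NA", str(year) if year else "NA", src or "NA"]
    return ", ".join(parts)


def assign_sr(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Set record['SR'] in place, suffixing repeats (assumed scheme) so every tag is unique."""
    seen: Counter[str] = Counter()
    for record in records:
        base = make_sr(record.get("AU", []), record.get("PY", 0), record.get("SO", ""))
        seen[base] += 1
        record["SR"] = base if seen[base] == 1 else f"{base}-{seen[base]}"
    return records
```

Keeping SR unique matters because downstream views such as citation or co-citation tables typically use it as a row label.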
[Screenshot: WhatsApp Image 2026-06-15 at 16 25 56]
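For the OpenAlex retriever, cursor paging against the public `/works` endpoint sends `cursor=*` on the first request and then follows `meta.next_cursor` until it comes back null. The sketch below pairs that loop with exponential-backoff retries using only the standard library. The function names, the set of retryable status codes, the delay schedule, and the `max_records` cap are assumptions, not the PR's implementation. The fetch function is injectable so the paging logic can be tested offline.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request
from typing import Any, Callable, Iterator

OPENALEX_WORKS = "https://api.openalex.org/works"
RETRYABLE_STATUS = {429, 500, 502, 503, 504}  # assumed retry policy


def with_retry(fn: Callable[..., Any], *, retries: int = 4, base_delay: float = 1.0,
               sleep: Callable[[float], None] = time.sleep) -> Callable[..., Any]:
    """Wrap `fn` so transient errors are retried after delays of 1s, 2s, 4s, ..."""
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        for attempt in range(retries + 1):
            try:
                return fn(*args, **kwargs)
            except urllib.error.HTTPError as err:
                # Client errors such as 400/404 will not fix themselves: fail fast.
                if err.code not in RETRYABLE_STATUS or attempt == retries:
                    raise
            except (urllib.error.URLError, TimeoutError):
                if attempt == retries:
                    raise
            sleep(base_delay * 2 ** attempt)
        raise AssertionError("unreachable")
    return wrapper


def fetch_openalex_page(query: str, cursor: str, per_page: int = 200) -> dict[str, Any]:
    """Fetch one page of /works search results (200 is the API's per-page maximum)."""
    params = urllib.parse.urlencode({"search": query, "per-page": per_page, "cursor": cursor})
    with urllib.request.urlopen(f"{OPENALEX_WORKS}?{params}", timeout=30) as resp:
        return json.load(resp)


def iter_openalex_works(query: str, *, max_records: int = 1000,
                        fetch: Callable[[str, str], dict[str, Any]] = with_retry(fetch_openalex_page),
                        ) -> Iterator[dict[str, Any]]:
    """Yield works page by page, following meta.next_cursor until it is null."""
    cursor = "*"  # '*' starts cursor paging
    yielded = 0
    while cursor and yielded < max_records:
        page = fetch(query, cursor)
        results = page.get("results", [])
        if not results:
            break
        for work in results:
            if yielded >= max_records:
                return
            yield work
            yielded += 1
        cursor = page.get("meta", {}).get("next_cursor")
```

Because `max_records` is a hard cap, a broad query such as `list(iter_openalex_works("bibliometrics"))` returns at most 1,000 works instead of walking the entire result set.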
