Add bibliographic ETL pipeline for WoS-style standardization #19

Open
saniatulnaimahuq wants to merge 1 commit into PRAISELab-PicusLab:main from saniatulnaimahuq:bibliometrix-etl-pipeline

@saniatulnaimahuq

This pull request adds a bibliographic ETL pipeline that converts heterogeneous bibliographic records into a Web of Science (WoS)-like schema that Bibliometrix-Python can consume.
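As a rough illustration of that schema conversion, here is a minimal sketch of the mapping-dictionary approach, assuming Scopus-style CSV headers such as `Source title` and `Cited by`: each source column is renamed to its WoS tag and unmapped columns are dropped. The `SCOPUS_TO_WOS` dictionary and the `to_wos_record` helper are hypothetical and are not taken from `www/services/bibliometrix_etl`.

```python
# Hypothetical mapping from Scopus-like CSV headers to WoS-style field tags.
# The real dictionaries in the PR may use different column names.
SCOPUS_TO_WOS = {
    "Title": "TI",
    "Authors": "AU",
    "Year": "PY",
    "Source title": "SO",
    "DOI": "DI",
    "Abstract": "AB",
    "Cited by": "TC",
}


def to_wos_record(row: dict, mapping: dict) -> dict:
    """Rename one source record's keys to WoS tags; unmapped columns are dropped."""
    return {tag: row[col] for col, tag in mapping.items() if col in row}


# Illustrative input only (the DOI is a placeholder, not a real record).
row = {
    "Title": "A study of citation networks",
    "Authors": "Smith J.; Doe A.",
    "Year": "2021",
    "Source title": "Scientometrics",
    "DOI": "10.1234/example.001",
    "Cited by": "12",
    "Link": "https://example.org/paper",
}
print(to_wos_record(row, SCOPUS_TO_WOS))
```

Keeping each source's mapping in a plain dictionary is one way to stay source-agnostic: supporting a new input format means adding a dictionary rather than new transformation code.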

Main additions:

  • Added a source-agnostic ETL pipeline under www/services/bibliometrix_etl
  • Added mapping dictionaries for local bibliographic files and OpenAlex records
  • Added extraction support for local files and OpenAlex API records
  • Added transformation logic for WoS-like field tags such as TI, AU, PY, SO, DI, AB, TC, and SR
  • Added cleaning support for DOI values, missing values, publication years, citation counts, and multi-value fields
  • Added a validation module that checks required columns, null values, and multi-value field handling
  • Added an analysis validation script that runs bibliometric-style analyses on the standardized outputs
  • Added the generated CSV outputs and validation reports as evidence of execution
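The cleaning steps listed above can be sketched as a handful of small functions. This is a hedged illustration, not the PR's code: the set of missing-value tokens, the `;`/`|` input separators, the upper-cased `;`-joined output (following the usual bibliometrix convention for multi-value fields), and the `SR` format `FIRST AUTHOR, YEAR, SOURCE` are all assumptions, and every function name here is hypothetical.

```python
import re
from typing import Optional

# Assumed placeholder tokens that should be treated as missing values.
MISSING_TOKENS = {"", "na", "n/a", "nan", "none", "null", "[no abstract available]"}
# Assumed DOI resolver prefixes to strip.
DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def is_missing(value) -> bool:
    """True for None or any placeholder token in MISSING_TOKENS."""
    return value is None or str(value).strip().lower() in MISSING_TOKENS


def clean_doi(value) -> Optional[str]:
    """Drop resolver prefixes and lowercase the DOI (DOIs are case-insensitive)."""
    if is_missing(value):
        return None
    return DOI_PREFIX.sub("", str(value).strip()).lower()


def clean_year(value) -> Optional[int]:
    """Take the first standalone four-digit number as the publication year."""
    if is_missing(value):
        return None
    match = re.search(r"\b(\d{4})\b", str(value))
    return int(match.group(1)) if match else None


def clean_count(value) -> int:
    """Citation counts default to 0 when missing or unparsable."""
    if is_missing(value):
        return 0
    try:
        return int(float(str(value).strip()))
    except (ValueError, OverflowError):
        return 0


def clean_multi(value, sep: str = ";") -> Optional[str]:
    """Split on ';' or '|', trim, upper-case, de-duplicate, and re-join with sep."""
    if is_missing(value):
        return None
    items = [item.strip().upper() for item in re.split(r"[;|]", str(value))]
    unique = list(dict.fromkeys(item for item in items if item))
    return sep.join(unique) or None


def short_reference(au: Optional[str], py: Optional[int], so: Optional[str]) -> str:
    """Assumed SR format: 'FIRST AUTHOR, YEAR, SOURCE'."""
    first = au.split(";")[0] if au else "NA"
    year = py if py is not None else "NA"
    return f"{first}, {year}, {so.upper() if so else 'NA'}"


au = clean_multi("Smith J.; Doe A.;  smith j.")
print(au)  # SMITH J.;DOE A.
print(short_reference(au, clean_year("2021"), "Scientometrics"))  # SMITH J., 2021, SCIENTOMETRICS
```

Treating every field as "clean or return a sentinel" (None or 0) keeps downstream validation simple, since a null check then catches every unrecoverable value.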

Validation completed successfully for:

  • Base-level local Scopus-like CSV pipeline
  • Advanced-level OpenAlex API pipeline
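For the OpenAlex path, a minimal sketch of turning one already-fetched work (parsed JSON) into a WoS-like record might look like this. The input keys (`title`, `display_name`, `publication_year`, `doi`, `cited_by_count`, `authorships`, `primary_location.source`, `abstract_inverted_index`) come from OpenAlex's Work object; the tag assignments and the `openalex_to_wos` / `rebuild_abstract` helpers are hypothetical, and no API call is made here.

```python
from typing import Any, Optional

DOI_RESOLVER = "https://doi.org/"  # OpenAlex returns DOIs as full resolver URLs


def rebuild_abstract(inverted: Optional[dict]) -> Optional[str]:
    """OpenAlex stores abstracts as {word: [positions]}; restore the word order."""
    if not inverted:
        return None
    by_position = {pos: word for word, positions in inverted.items() for pos in positions}
    return " ".join(by_position[pos] for pos in sorted(by_position))


def openalex_to_wos(work: dict[str, Any]) -> dict[str, Any]:
    """Map one OpenAlex work to a WoS-like record (assumed mapping)."""
    authors = []
    for authorship in work.get("authorships") or []:
        name = (authorship.get("author") or {}).get("display_name")
        if name:
            authors.append(name)

    # primary_location and its source can both be null in OpenAlex responses.
    source = (work.get("primary_location") or {}).get("source") or {}

    doi = work.get("doi")
    if doi and doi.lower().startswith(DOI_RESOLVER):
        doi = doi[len(DOI_RESOLVER):]

    return {
        "TI": work.get("title") or work.get("display_name"),
        "AU": ";".join(authors) or None,
        "PY": work.get("publication_year"),
        "SO": source.get("display_name"),
        "DI": doi,
        "AB": rebuild_abstract(work.get("abstract_inverted_index")),
        "TC": work.get("cited_by_count") or 0,
    }


# Illustrative input shaped like an OpenAlex work (values are placeholders).
work = {
    "title": "A study of citation networks",
    "publication_year": 2021,
    "doi": "https://doi.org/10.1234/example.001",
    "cited_by_count": 12,
    "authorships": [
        {"author": {"display_name": "Jane Smith"}},
        {"author": {"display_name": "Alan Doe"}},
    ],
    "primary_location": {"source": {"display_name": "Scientometrics"}},
    "abstract_inverted_index": {"Citation": [0], "networks": [1], "matter": [2]},
}
record = openalex_to_wos(work)
print(record)
```

The output would still go through the same cleaning step as local files (for example upper-casing `AU`), so both pipelines end in one schema.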

Terminal output confirmed:

PASSED: Standardized data is valid.
ANALYSIS VALIDATION COMPLETED
PROJECT EXECUTION COMPLETED
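A validation check that ends with the `PASSED` line above could be sketched as follows. Only the success message is copied from the terminal output; the required tag set (taken from the tags named in this PR), the not-null fields, the `;` multi-value rule, and the `validate` / `report` names are assumptions.

```python
from typing import Any

REQUIRED = ("TI", "AU", "PY", "SO", "DI", "AB", "TC", "SR")  # assumed required columns
NOT_NULL = ("TI", "AU", "PY")  # assumed fields that must be populated
MULTI_VALUE = ("AU",)  # assumed ';'-separated fields


def validate(records: list[dict[str, Any]]) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for i, rec in enumerate(records):
        missing = [col for col in REQUIRED if col not in rec]
        if missing:
            problems.append(f"row {i}: missing columns {missing}")
        for col in NOT_NULL:
            if rec.get(col) in (None, ""):
                problems.append(f"row {i}: null value in {col}")
        for col in MULTI_VALUE:
            value = rec.get(col)
            # An empty item (e.g. 'A;;B' or a trailing ';') signals bad splitting.
            if isinstance(value, str) and any(not p.strip() for p in value.split(";")):
                problems.append(f"row {i}: empty item in multi-value field {col}")
    return problems


def report(records: list[dict[str, Any]]) -> bool:
    """Print a pass/fail summary and return True when no problems were found."""
    problems = validate(records)
    if problems:
        print("FAILED:")
        for problem in problems:
            print("  " + problem)
    else:
        print("PASSED: Standardized data is valid.")
    return not problems


good = {"TI": "A study", "AU": "SMITH J.;DOE A.", "PY": 2021, "SO": "SCIENTOMETRICS",
        "DI": None, "AB": None, "TC": 0, "SR": "SMITH J., 2021, SCIENTOMETRICS"}
report([good])
```

Collecting every problem instead of stopping at the first one makes the validation report more useful as execution evidence.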
