37 changes: 31 additions & 6 deletions README.md
@@ -17,17 +17,18 @@ Scalable Ancestry Predictions from Genomic Data
3. [Installation](#install)
4. [Dependencies](#dependencies)
5. [Usage](#usage)
-6. [Human ancestry predictions](#data)
-7. [Demo](#demo)
-8. [Documentation](#docs)
-9. [Citing](#citing)
-10. [License](#license)
+6. [Interactive LAI HTML visualization](#lai)
+7. [Human ancestry predictions](#data)
+8. [Demo](#demo)
+9. [Documentation](#docs)
+10. [Citing](#citing)
+11. [License](#license)

## Credit <a name=credit></a>
Written by René L Warren and Lauren Coombe

## Description <a name=description></a>
-ntRoot is a framework for ancestry inference from genomic data, offering both Local Ancestry Inference (LAI) and Global Ancestry Inference (GAI). Leveraging integrated variant call sets from the 1000 Genomes Project (1kGP), ntRoot provides accurate predictions(1) of human super-population ancestry with speed and efficiency from Whole Genome Sequencing (WGS) datasets and complete or draft-stage Whole Genome Assemblies (WGA). Through streamlined processing and flexible genomic input, ntRoot holds promises for human ancestry inference of small-to-large patient/individual cohorts, enabling association studies with demographics and facilitating deeper insights into population genetics and disease risk factors.
+ntRoot is a framework for ancestry inference from genomic data, offering both Local Ancestry Inference (LAI) and Global Ancestry Inference (GAI). Leveraging integrated variant call sets from the 1000 Genomes Project (1kGP), ntRoot provides accurate predictions(1) of human super-population ancestry with speed and efficiency from Whole Genome Sequencing (WGS) datasets and complete or draft-stage Whole Genome Assemblies (WGA). Through streamlined processing, flexible genomic input, and integrated local ancestry visualization, ntRoot holds promises for human ancestry inference of small-to-large patient/individual cohorts, enabling association studies with demographics and facilitating deeper insights into population genetics and disease risk factors.

(1) Tested on base-accurate quality data, including Illumina short read and newer nanopore (ONT, KitV14) & PacBio CCS HiFi long read datasets, complete reference genomes and polished, Oxford Nanopore Technology long read GoldRush, Flye and Shasta draft genome assemblies

@@ -93,6 +94,27 @@ Note: please specify --reads OR --genome (not both)
If you have any questions about ntRoot, please open an issue at https://github.com/birollab/ntRoot
```

+## Interactive Local Ancestry (LAI) visualization <a name=lai></a>
+
+When `--lai` is specified, ntRoot automatically generates an interactive HTML visualization of chromosomal local ancestry assignments as part of the Snakemake workflow.
+
+The visualization enables interactive exploration of chromosome-specific and genome-wide ancestry patterns through linked chromosome, ancestry, and summary views.
+
+The resulting visualization provides:
+
+- Chromosome-length-scaled local ancestry tracks
+- Global ancestry composition summaries
+- Interactive exploration of chromosome-specific ancestry patterns
+- Interactive exploration of genome-wide ancestry patterns
+- Hover and click highlighting across linked views
+- Standalone HTML output viewable in any modern web browser
+
+Users can interactively explore ancestry patterns within individual chromosomes or across the entire genome using the generated standalone HTML report.
+
+An example visualization is available here:
+[Interactive LAI demo (HTML)](ntroot-lai-interactive_tile5000000.html)
+
+
## Human ancestry predictions <a name=data></a>

Using the 1kGP integrated variant call set.
@@ -163,6 +185,9 @@ cd demo
```
Ensure that the ntRoot installation is available on your PATH.

+For an example interactive Local Ancestry Inference (LAI) report generated by ntRoot, see:
+[Interactive LAI demo (HTML)](ntroot-lai-interactive_tile5000000.html)
+

## Documentation <a name=docs></a>

4,014 changes: 4,014 additions & 0 deletions ntroot-lai-interactive_tile5000000.html

Large diffs are not rendered by default.
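
The README text added in this diff lists "Global ancestry composition summaries" among the features of the LAI report. As a rough illustration of what such a summary could involve, the sketch below computes length-weighted ancestry fractions from per-tile assignments. The tuple layout `(chrom, start, end, ancestry)` and the function name `global_composition` are assumptions made for illustration; the diff shows neither the columns of the tile-resolution TSV nor how `plot_ntroot_lai.py` derives its summaries.

```python
from collections import defaultdict


def global_composition(tiles):
    """Return {ancestry: fraction of total tile length}.

    `tiles` is an iterable of (chrom, start, end, ancestry) tuples.
    This layout is hypothetical, not ntRoot's documented TSV format.
    """
    totals = defaultdict(int)
    for _chrom, start, end, ancestry in tiles:
        totals[ancestry] += end - start  # weight each tile by its length
    genome_len = sum(totals.values())
    if genome_len == 0:
        return {}
    return {label: length / genome_len for label, length in totals.items()}


# Toy input: 5 Mbp tiles labelled with 1kGP super-population codes
example_tiles = [
    ("chr1", 0, 5_000_000, "EUR"),
    ("chr1", 5_000_000, 10_000_000, "EUR"),
    ("chr1", 10_000_000, 15_000_000, "AFR"),
    ("chr2", 0, 5_000_000, "EAS"),
]
composition = global_composition(example_tiles)
```

Weighting by tile length rather than tile count keeps the fractions meaningful when the last tile of a chromosome is shorter than the configured tile size.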

18 changes: 15 additions & 3 deletions ntroot_run_pipeline.smk
@@ -202,13 +202,25 @@ rule ancestry_prediction_lai:
vcf = "{vcf}",
ref_fai = f"{draft_base}.fai"
output:
-lai_output = "{vcf}_ancestry-predictions-tile-resolution_tile{tile_size}.tsv"
+lai_output = "{vcf}_ancestry-predictions-tile-resolution_tile{tile_size}.tsv",
+html_output = "{vcf}_ntroot-lai-interactive_tile{tile_size}.html"
params:
benchmark = f"{time_command} ancestry_prediction_tile{tile_size}.time" if input_vcf_basename else f"{time_command} ancestry_prediction_k{k}_tile{tile_size}.time",
tile_size = tile_size,
verbosity = v
shell:
-"{params.benchmark} ntRootAncestryPredictor.pl -f {input.vcf} -t {params.tile_size} -v {params.verbosity} -r 1 -i {input.ref_fai}"
+"""
+{params.benchmark} ntRootAncestryPredictor.pl \
+-f {input.vcf} \
+-t {params.tile_size} \
+-v {params.verbosity} \
+-r 1 \
+-i {input.ref_fai}
+
+plot_ntroot_lai.py \
+{output.lai_output} \
+{output.html_output}
+"""

rule sort_vcf_input:
input: vcf = f"{input_vcf}"
@@ -250,4 +262,4 @@ rule cross_reference_vcf:
prefix=f"{input_vcf_basename}.cross-ref",
strip = "--strip" if strip_info else ""
shell:
-"{params.benchmark} ntroot_cross_reference_vcf.py -b {input.bedtools} --vcf {input.vcf} --vcf_l {input.ref_vars} -p {params.prefix} {params.strip}"
+"{params.benchmark} ntroot_cross_reference_vcf.py -b {input.bedtools} --vcf {input.vcf} --vcf_l {input.ref_vars} -p {params.prefix} {params.strip}"
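
The updated `ancestry_prediction_lai` rule declares two outputs built from the same `{vcf}` and `{tile_size}` wildcards and runs two commands in sequence. The sketch below mirrors that naming and command order in plain Python, outside Snakemake. The helper names `lai_outputs` and `lai_commands` are hypothetical, the `{params.benchmark}` timing prefix is omitted, and the example input values are made up.

```python
def lai_outputs(vcf, tile_size):
    """Mirror the two output patterns declared in the rule."""
    tsv = f"{vcf}_ancestry-predictions-tile-resolution_tile{tile_size}.tsv"
    html = f"{vcf}_ntroot-lai-interactive_tile{tile_size}.html"
    return tsv, html


def lai_commands(vcf, ref_fai, tile_size, verbosity):
    """Build argument lists for the two steps of the rule's shell block."""
    tsv, html = lai_outputs(vcf, tile_size)
    predict = [
        "ntRootAncestryPredictor.pl",
        "-f", vcf,
        "-t", str(tile_size),
        "-v", str(verbosity),
        "-r", "1",
        "-i", ref_fai,
    ]
    # The plotting script takes the TSV and the HTML path as positional arguments
    plot = ["plot_ntroot_lai.py", tsv, html]
    return [predict, plot]


# Illustrative values only; file names here are not taken from the diff
commands = lai_commands("sample.vcf", "ref.fa.fai", 5000000, 0)
```

Each list could be passed to `subprocess.run(cmd, check=True)` so that a failure in the prediction step stops the run before plotting is attempted.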