Data Requirements
SeqUIaSCOPE accepts four types of genomic data: somatic variant calling results, germline variant calling results, fusion gene detection results, and gene expression profiles. Not all four are required — you can upload any combination depending on your analysis needs.
Before uploading, make sure your files follow the naming and format rules described on this page. The application automatically scans your directory for files and uses these rules to identify which file belongs to which patient and dataset type. Errors at upload time are almost always caused by naming issues.
Input Files Overview
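| Dataset | File | Purpose |
|---|---|---|
| Somatic variants | .tsv | annotated somatic variants |
| Somatic variants | .bam + .bai (tumour) | for IGV |
| Somatic variants | .bam + .bai (normal) | for IGV |
| Somatic variants | mutation_loads.tsv | tumour mutational burden |
| Germline variants | .tsv | annotated germline variants |
| Germline variants | .bam + .bai (normal) | for IGV |
| Fusion genes | .tsv or .xlsx | combined Arriba + STARFusion results |
| Fusion genes | .bam + .bai (tumour RNA) | for IGV snapshots |
| Fusion genes | .bam + .bai (chimeric) | for IGV snapshots |
| Fusion genes | .pdf + .tsv (Arriba output) | for expanded row preview |
| Expression profiles | .tsv or .xlsx | gene expression data |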
Tip: When you have a choice, always prefer .tsv over .xlsx. TSV files load faster and are less likely to cause formatting issues.
Fusion gene detection requires results from both Arriba and STARFusion as upstream tools.
The required input file is a pre-processed table combining the output of both callers.
The Arriba .pdf and .tsv output files are separate and optional — they unlock the expanded row preview in the Fusion Genes module.
File Naming Rules
SeqUIaSCOPE identifies files by scanning the full file path (folder names + file name together). It looks for specific keywords anywhere in the path to decide what type each file is. This means a keyword can appear in the filename itself or in a parent folder name — both work.
Keyword Reference
| File type | Path must contain | Path must NOT contain |
|---|---|---|
| Somatic variant file | somatic | none |
| Germline variant file | germline | none |
| Fusion gene file | fusion or fuze | arriba, STAR |
| Tumor RNA BAM | user-defined pattern (set during upload) | Chimeric, transcriptome |
| Chimeric BAM | user-defined pattern (set during upload) | transcriptome |
| Arriba output files | arriba | discarded, STAR |
| Expression file | expression or RNAseq | report, genes_of_interest |
| TMB file | filename must be exactly mutation_loads | none |
The keyword can live anywhere in the path — these two examples both work:
project_root/patient_001/somatic_variants.tsv # keyword in filename
project_root/somatic_data/patient_001/variants.tsv # keyword in folder name
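The keyword rules above can be sketched as a small classifier that you could run over your own paths before uploading. This is an illustrative sketch, not the application's actual code: the function name `classify_path`, the type labels, case-sensitive substring matching, and the first-match-wins order are all assumptions.

```python
from pathlib import PurePosixPath
from typing import Optional

# (label, keywords the path must contain, keywords it must NOT contain)
# Keywords are transcribed from the keyword table; labels are hypothetical.
KEYWORD_RULES = [
    ("somatic", ("somatic",), ()),
    ("germline", ("germline",), ()),
    ("fusion", ("fusion", "fuze"), ("arriba", "STAR")),
    ("arriba", ("arriba",), ("discarded", "STAR")),
    ("expression", ("expression", "RNAseq"), ("report", "genes_of_interest")),
]


def classify_path(path: str) -> Optional[str]:
    """Guess the file type of a path, or return None if no rule matches."""
    p = PurePosixPath(path)
    # BAM files and their indexes are matched by user-defined patterns instead.
    if p.suffix in (".bam", ".bai"):
        return None
    # The TMB file is identified by its exact file name, so check it first.
    if p.stem == "mutation_loads":
        return "tmb"
    for label, required, forbidden in KEYWORD_RULES:
        has_required = any(keyword in path for keyword in required)
        has_forbidden = any(keyword in path for keyword in forbidden)
        if has_required and not has_forbidden:
            return label
    return None
```

Because the whole path is searched, a file such as `fusion_data/patient_001/arriba_results.tsv` is excluded from the fusion rule by the `arriba` keyword and picked up by the Arriba rule instead.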
Patient ID in path: Every per-patient file must have the patient ID somewhere in its full path
(either in the filename or a parent folder). Files without the patient ID will not be matched.
The TMB file (mutation_loads) and the Genes of Interest file are the only exceptions —
they are shared across all patients and do not need a patient ID in the path.
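To preview which files would be attributed to which patient, a rough check along these lines may help. It assumes plain substring matching of the patient ID against the full path; whether the application matches whole path components instead is not stated here, and `group_by_patient` is a hypothetical helper name.

```python
from collections import defaultdict


def group_by_patient(paths, patient_ids):
    """Map each patient ID to the paths that contain it as a substring.

    Returns (matched, unmatched). Shared files such as mutation_loads
    are expected to end up in the unmatched list.
    """
    matched = defaultdict(list)
    unmatched = []
    for path in paths:
        owners = [pid for pid in patient_ids if pid in path]
        for pid in owners:
            matched[pid].append(path)
        if not owners:
            unmatched.append(path)
    return dict(matched), unmatched
```

Under this assumption an ID such as `patient_001` would also match a folder named `patient_0010`, so IDs that are prefixes of one another are worth avoiding.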
BAM Files
BAM files don’t use the keyword system. Instead, you define patterns during upload
(e.g. tumor, FFPE, chimeric) and the application finds BAM files whose names match those patterns.
See Upload Data — Step 1 for details.
For every .bam file, a corresponding index file must exist in the same directory,
named either file.bam.bai or file.bai.
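A quick way to confirm that every BAM has an index before uploading is sketched below. The helper names are hypothetical; the two accepted index names follow the rule above.

```python
from pathlib import Path
from typing import Optional


def find_bam_index(bam: Path) -> Optional[Path]:
    """Return the index next to a BAM file (file.bam.bai or file.bai), if any."""
    candidates = [
        bam.with_name(bam.name + ".bai"),  # sample.bam -> sample.bam.bai
        bam.with_suffix(".bai"),           # sample.bam -> sample.bai
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


def bams_missing_index(root) -> list:
    """List all .bam files under root that have no index in the same directory."""
    return sorted(
        bam for bam in Path(root).rglob("*.bam") if find_bam_index(bam) is None
    )
```

Calling `bams_missing_index("project_root")` returns an empty list when every BAM under the root is indexed.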
Arriba Output Files
If you provide Arriba output files, both the .pdf and .tsv must be present as a matched pair
(same base name, different extension). A .pdf without a matching .tsv will be ignored, and vice versa.
Expression Files with Multiple Tissues
When comparing against multiple reference tissues, provide one file per tissue.
The tissue name must appear in the filename (e.g. blood_expression.tsv, liver.tsv).
Use underscores instead of spaces — blood_vessel.tsv, not blood vessel.tsv.
Tissue names you enter in the upload form must match the names in the filenames.
If no tissue names are provided, the application looks for a single expression file per patient
containing expression or RNAseq in its path.
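As a rough illustration of the tissue rule, the sketch below looks for each tissue name in the file name only (not in parent folders). The function name and the use of substring matching are assumptions made for the example.

```python
from pathlib import PurePosixPath


def match_tissue_files(paths, tissues):
    """Map each tissue name to the paths whose file name contains it."""
    result = {}
    for tissue in tissues:
        result[tissue] = [
            path for path in paths if tissue in PurePosixPath(path).name
        ]
    return result
```

A tissue that maps to an empty list most likely has a spelling or underscore mismatch between the upload form and the file name. With substring matching, a short name such as `blood` would also match `blood_vessel.tsv`, so distinct, non-overlapping tissue names are the safer choice.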
Directory Layouts
SeqUIaSCOPE is flexible — your data can be organised in many different ways as long as the naming rules above are respected. Below are three common layouts that all work correctly. The key principle is that the patient ID and dataset keyword must both appear somewhere in each file’s path.
Choose your root directory carefully. In the upload form you select a single root directory that contains all your patient data. Pick the most specific folder that still contains everything — selecting a very broad directory (e.g. your home folder) may cause the scanner to pick up unrelated files and confuse file matching.
Option A — one folder per data type
Best when your pipeline already separates outputs by analysis type.
project_root/                          ← select this as root
├── somatic_data/
│   ├── mutation_loads.tsv
│   ├── patient_001/
│   │   ├── somatic_variants.tsv
│   │   ├── tumor.bam + tumor.bam.bai
│   │   └── normal.bam + normal.bam.bai
│   └── patient_002/ ...
├── germline_data/
│   ├── patient_001/
│   │   ├── germline_variants.tsv
│   │   └── normal.bam + normal.bam.bai
│   └── patient_002/ ...
├── fusion_data/
│   ├── patient_001/
│   │   ├── fusions.tsv
│   │   ├── fusion.bam + fusion.bam.bai
│   │   ├── chimeric.bam + chimeric.bam.bai
│   │   ├── arriba_report.pdf
│   │   └── arriba_results.tsv
│   └── patient_002/ ...
└── expression_data/
    ├── patient_001/
    │   └── expression.tsv
    └── patient_002/ ...
Option B — BAM files stored separately from analysis results
Common when primary (alignment) and secondary (variant/fusion calling) outputs live in separate trees.
project_root/                          ← select this as root
├── alignments/
│   ├── DNA/
│   │   ├── patient_001/
│   │   │   ├── tumor.bam + tumor.bam.bai
│   │   │   └── normal.bam + normal.bam.bai
│   │   └── patient_002/ ...
│   └── RNA/
│       ├── patient_001/
│       │   ├── fusion.bam + fusion.bam.bai
│       │   └── chimeric.bam + chimeric.bam.bai
│       └── patient_002/ ...
└── results/
    ├── somatic_data/
    │   ├── mutation_loads.tsv
    │   ├── patient_001/ somatic_variants.tsv
    │   └── patient_002/ ...
    ├── germline_data/
    │   ├── patient_001/ germline_variants.tsv
    │   └── patient_002/ ...
    ├── fusion_data/
    │   ├── patient_001/
    │   │   ├── fusions.tsv
    │   │   ├── arriba_report.pdf
    │   │   └── arriba_results.tsv
    │   └── patient_002/ ...
    └── expression_data/
        ├── patient_001/ expression.tsv
        └── patient_002/ ...
Option C — all files flat per patient
Works for smaller projects or when a single pipeline writes everything into one folder per patient.
project_root/                          ← select this as root
├── mutation_loads.tsv                 ← in root (shared across patients)
├── patient_001/
│   ├── somatic_variants.tsv
│   ├── germline_variants.tsv
│   ├── fusions.tsv
│   ├── tumor.bam + tumor.bam.bai
│   ├── normal.bam + normal.bam.bai
│   ├── fusion.bam + fusion.bam.bai
│   ├── chimeric.bam + chimeric.bam.bai
│   ├── arriba_report.pdf
│   ├── arriba_results.tsv
│   └── expression.tsv
└── patient_002/ ...
Required Columns
Each data file must contain a set of required columns. Column names are case-sensitive. Any additional columns in your file are allowed — they will appear in the table but without custom formatting or labels.
Somatic Variant File
| Column | Type | Description |
|---|---|---|
| var_name | string | Variant identifier |
| gene_symbol | string | Gene symbol |
| tumor_variant_freq | numeric | Variant allele frequency in tumour |
| tumor_depth | integer | Sequencing depth at variant position in tumour |
| gene_region | string | Genomic region (e.g. exon, intron, splice) |
| gnomAD_NFE | numeric | gnomAD Non-Finnish European allele frequency |
| consequence | string | Variant consequence (e.g. missense_variant, stop_gained) |
| HGVSc | string | HGVS coding sequence notation |
| HGVSp | string | HGVS protein sequence notation |
| variant_type | string | Variant type (e.g. SNV, insertion, deletion) |
| all_full_annot_name | string | Full annotation name |
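Since column names are case-sensitive, it can be worth checking a file's header before uploading. The following is a minimal sketch for a somatic .tsv file; `missing_columns` is a hypothetical helper, and the column list is copied from the table above. The same function works for the other file types if you pass their required columns.

```python
import csv

SOMATIC_REQUIRED = [
    "var_name", "gene_symbol", "tumor_variant_freq", "tumor_depth",
    "gene_region", "gnomAD_NFE", "consequence", "HGVSc", "HGVSp",
    "variant_type", "all_full_annot_name",
]


def missing_columns(handle, required):
    """Return the required column names absent from a TSV header.

    `handle` is any open text file; the comparison is case-sensitive.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, [])  # an empty file yields an empty header
    return [column for column in required if column not in header]
```

For example, `missing_columns(open("somatic_variants.tsv", newline=""), SOMATIC_REQUIRED)` returns an empty list when the header is complete.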
Optional columns recognised with custom labels:
| Column | Description |
|---|---|
| in_library | Number of times var_name was observed in the project cohort (added automatically if absent) |
| clinvar_sig | ClinVar clinical significance |
| clinvar_DBN | ClinVar disease name |
| CGC_Somatic | Cancer Gene Census somatic annotation |
| fOne | fOne database annotation |
| COSMIC | COSMIC database annotation |
| HGMD | HGMD database annotation |
| snpDB | dbSNP annotation |
TMB File (mutation_loads)
| Column | Type | Description |
|---|---|---|
| patient | string | Patient ID, which must match the IDs entered in the upload form |
| TMB | numeric | Tumour mutational burden value |
| patient | TMB |
|---|---|
| P001 | 0.17 |
| P002 | 0.74 |
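To check that the TMB file covers the patients you plan to enter in the upload form, something like the sketch below could be used. It assumes a tab-separated file with the two columns above; `tmb_by_patient` is a hypothetical helper name.

```python
import csv


def tmb_by_patient(handle):
    """Read a mutation_loads TSV and return {patient ID: TMB value}."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["patient"]: float(row["TMB"]) for row in reader}
```

Comparing the keys of the returned dictionary with your list of patient IDs shows which patients have no TMB value.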
Germline Variant File
| Column | Type | Description |
|---|---|---|
| var_name | string | Variant identifier |
| gene_symbol | string | Gene symbol |
| variant_freq | numeric | Variant allele frequency |
| coverage_depth | integer | Sequencing depth at variant position |
| gene_region | string | Genomic region (e.g. exon, intron, splice) |
| gnomAD_NFE | numeric | gnomAD Non-Finnish European allele frequency |
| clinvar_sig | string | ClinVar clinical significance |
| consequence | string | Variant consequence |
| HGVSc | string | HGVS coding sequence notation |
| HGVSp | string | HGVS protein sequence notation |
| variant_type | string | Variant type (e.g. SNV, insertion, deletion) |
| all_full_annot_name | string | Full annotation name |
Optional columns:
| Column | Description |
|---|---|
| in_library | Number of times var_name was observed in the project cohort (added automatically if absent) |
| clinvar_DBN | ClinVar disease name |
| CGC_Germline | Cancer Gene Census germline annotation |
| trusight_genes | TruSight gene panel annotation |
| fOne | fOne database annotation |
| snpDB | dbSNP annotation |
Fusion Gene File
This is a pre-processed file combining results from both Arriba and STARFusion.
Column names chrom1/chrom2 are accepted as alternatives to chr1/chr2 and will be renamed automatically.
| Column | Type | Description |
|---|---|---|
| gene1 | string | First fusion partner gene name |
| gene2 | string | Second fusion partner gene name |
| chr1 (or chrom1) | string | Chromosome of first partner (with chr prefix, e.g. chr2) |
| chr2 (or chrom2) | string | Chromosome of second partner |
| pos1 | numeric | Genomic position of first partner breakpoint |
| pos2 | numeric | Genomic position of second partner breakpoint |
| strand1 | string | Strand orientation of first partner (+ or -) |
| strand2 | string | Strand orientation of second partner (+ or -) |
| arriba.called | boolean | Whether Arriba called this fusion |
| starfus.called | boolean | Whether STARFusion called this fusion |
| overall_support | numeric | Overall read support for the fusion |
| arriba.confidence | string | Arriba confidence level (high, medium, or low) |
| arriba.site1 | string | Breakpoint site annotation for first partner |
| arriba.site2 | string | Breakpoint site annotation for second partner |
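The automatic renaming of chrom1/chrom2 amounts to a simple header mapping. The sketch below only illustrates the idea and is not the application's own code; `normalise_fusion_header` is a hypothetical name.

```python
# Accepted alternative names and the column names they are renamed to.
FUSION_ALIASES = {"chrom1": "chr1", "chrom2": "chr2"}


def normalise_fusion_header(header):
    """Return the header with chrom1/chrom2 renamed to chr1/chr2."""
    return [FUSION_ALIASES.get(column, column) for column in header]
```

All other column names pass through unchanged, so the function is safe to apply to a header that already uses chr1/chr2.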
Optional columns:
| Column | Description |
|---|---|
| DB_count | Number of databases listing this fusion |
| DB_list | Database names |
| arriba.split_reads | Split reads supporting the fusion (Arriba) |
| arriba.discordant_mates | Discordant mate pairs (Arriba) |
| arriba.break_coverage | Coverage at first breakpoint (Arriba) |
| arriba.break2_coverage | Coverage at second breakpoint (Arriba) |
| arriba.break_seq | Sequence at breakpoint (Arriba) |
| starfus.split_reads | Split reads supporting the fusion (STARFusion) |
| starfus.discordant_mates | Discordant mate pairs (STARFusion) |
| starfus.counter_fusion1 | Counter fusion reads for first gene (STARFusion) |
| starfus.counter_fusion2 | Counter fusion reads for second gene (STARFusion) |
| starfus.splice_type | Splice junction type (STARFusion) |
| starfus.break_seq | Breakpoint sequence (STARFusion) |
Expression Profile File
| Column | Type | Description |
|---|---|---|
| sample | string | Sample ID |
| feature_name | string | Gene name |
| geneid | string | Gene ID (Ensembl, RefSeq, or other) |
| all_kegg_gene_names | string | KEGG gene names |
| log2FC | numeric | Log2 fold change (tumour vs. reference) |
| p_value | numeric | P-value |
| p_adj | numeric | Adjusted p-value |
Optional columns:
| Column | Description |
|---|---|
| pathway | Pathway name (retrieved automatically from KEGG if absent) |
| refseq_id | RefSeq gene ID |
| type | Gene type |
| gene_definition | Gene description |
| num_of_paths | Number of pathways the gene participates in |
Genes of Interest (GOI)
SeqUIaSCOPE ships with a built-in genes of interest list used for expression highlighting and
network visualisation. To use a custom list instead, provide a .tsv or .xlsx file and
update the path in reference_paths.json (see Configuration).
| Column | Required | Description |
|---|---|---|
| gene | ✅ | Gene name |
| pathway | optional | Pathway name (retrieved from KEGG automatically if absent) |
If the pathway column is provided, your own pathway classifications are used instead of KEGG.
| gene | pathway |
|---|---|
| BRCA1 | DNA damage/repair |
| TP53 | RTK Signalling |