Genomic resources and resource repositories

The GPF system uses genomic resources such as reference genomes, gene models, genomic scores, etc. These resources are provided by resource repositories which can be accessed remotely or locally. The system can use multiple repositories at a time.

Genomic resources and resource repositories are fundamentally collections of directories and files with special YAML configurations.

This documentation explains which genomic resources are available and how they can be configured, describes how resource repositories are configured and discovered by the system, and includes a short tutorial on creating a local repository with a custom resource.

We also prepared an extensive demo to help individuals get started with their own GRR (https://github.com/iossifovlab/mini_grr).

Genomic resources

A genomic resource is a directory containing a special genomic_resource.yaml configuration and an arbitrary number of files. GPF also creates additional files (.MANIFEST, the .grr subdirectory) that it uses internally to track changes to the resource.

genomic_resource.yaml

type: <genomic resource type>
# ...
meta:
    description: <resource description>
    summary: <resource summary>
    labels:
        <custom label>: <custom label value>
        # ...

This is the configuration file for a genomic resource. Directories containing this file will be treated as genomic resources by the system. It must be named genomic_resource.yaml, as this is how the system will search for it.
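As a rough illustration of the rule above, the following sketch walks a directory tree and reports every directory that contains a genomic_resource.yaml file. It is not GPF's implementation: the function name find_resource_ids is hypothetical, and using the path relative to the repository root as the resource ID is an assumption based on the tutorial later in this document.

```python
from pathlib import Path

CONFIG_NAME = "genomic_resource.yaml"


def find_resource_ids(repo_dir):
    """Return sorted relative paths of directories holding a resource config.

    Hypothetical helper; treating the relative path as the resource ID
    is an assumption, not confirmed GPF behavior.
    """
    root = Path(repo_dir)
    ids = []
    for config in root.rglob(CONFIG_NAME):
        # The directory that holds the config file is the resource itself.
        ids.append(config.parent.relative_to(root).as_posix())
    return sorted(ids)
```

Calling find_resource_ids on a directory that contains hg38/scores/score9/genomic_resource.yaml would return a list with the single entry hg38/scores/score9.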

Below are some of the common fields that can be found in every config. Depending on the resource type, other fields may be present.

Field	Description
type	String. Sets the type of the resource.
meta	Subsection. Contains fields with information about the resource.

Below are the fields in the meta section:

Field	Description
description	String. Description of the resource.
summary	String. Short summary of the resource.
labels	Dictionary. Can contain arbitrary key/values.

Types of genomic resources and their configurations

Genomic scores

Field	Description
type	One of `position_score`, `np_score`, `allele_score`.
table	Subsection. Describes the file containing the scores, what columns/fields are present in it, etc.
scores	List of dictionaries that describes each score column available in the resource.
default_annotation	Subsection. The default annotation configuration to use with this resource.

Gene models

Field	Description
type	`gene_models`
filename	String. Path to the models file. Relative to the resource directory.
format	String. Sets the expected format of the gene models. One of `default`, `refflat`, `refseq`, `ccds`, `knowngene`, `gtf`, `ucscgenepred`.

Reference genome

Field	Description
type	`genome`
filename	String. Path to the genome file. Relative to the resource directory.
PARS	Subsection. Configures the pseudoautosomal regions of the genome.
chrom_prefix	String. Configures the prefix contig names are expected to have in the genome.

The format for the PARS subsection is as follows:

PARS:
  "X":
      - "chrX:10000-2781479"
      - "chrX:155701382-156030895"
  "Y":
      - "chrY:10000-2781479"
      - "chrY:56887902-57217415"
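To make the region syntax concrete, here is a minimal sketch that parses the chrom:start-end strings used in the PARS subsection and checks whether a position falls inside one of them. The helper names parse_region and in_par are hypothetical, and treating both bounds as inclusive is an assumption.

```python
import re

# Matches strings such as "chrX:10000-2781479".
REGION_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_region(region):
    """Split a 'chrom:start-end' string into (chrom, start, end)."""
    match = REGION_RE.match(region)
    if match is None:
        raise ValueError(f"unexpected region format: {region!r}")
    return match["chrom"], int(match["start"]), int(match["end"])


def in_par(pars, chrom, pos):
    """Return True if pos lies in any configured region (inclusive bounds assumed)."""
    for regions in pars.values():
        for region in regions:
            r_chrom, start, end = parse_region(region)
            if r_chrom == chrom and start <= pos <= end:
                return True
    return False


# The PARS subsection from the example above, as a Python dictionary.
PARS = {
    "X": ["chrX:10000-2781479", "chrX:155701382-156030895"],
    "Y": ["chrY:10000-2781479", "chrY:56887902-57217415"],
}
```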

Liftover chain

Field	Description
type	`liftover_chain`
filename	String. Path to the chain file. Relative to the resource directory.

Annotation pipeline

Field	Description
type	`annotation_pipeline`
filename	String. Path to the annotation configuration file. Relative to the resource directory.

Histograms and statistics

Each resource type defines a set of statistics that can be calculated for the resource. These statistics are calculated by the grr_manage command line tool and stored in the statistics subdirectory of the resource directory.

For genomic and gene score resources, the grr_manage command line tool calculates and draws histograms for each of the scores defined in the resource.

Here we describe the common behavior for calculating and drawing histograms for genomic and gene score resources. Other statistics are specific to the resource type and should be described in the resource type documentation.

Histograms

Histograms are calculated for each of the scores defined in a gene score or genomic score resource. GPF supports three types of histograms:

NumberHistogram - supported for scores of type int and float. By default the histogram is calculated with 100 bins and is linear on both axes.
CategoricalHistogram - supported for scores of type str and int. This is a histogram that shows the distribution of the unique values in the score. It is supported only for scores with fewer than 100 unique values.
NullHistogram - this histogram type defines a missing histogram. It is used when calculating a histogram is not possible or does not make sense.

Number Histograms Configuration

A histogram configuration can be defined for each score in the genomic_resource.yaml file of a genomic or gene score resource. The number histogram configuration supports the following fields:

type - the type of the histogram. This should be set to number.
number_of_bins - the number of bins in the histogram. By default this is set to 100.
view_range - the range of values that are shown in the histogram. This range could differ from the actual range of the score values. This is useful for adjusting the histogram view.
y_log_scale - if set to True the y axis of the histogram will be logarithmic.
x_log_scale - if set to True the x axis of the histogram will be logarithmic.
x_min_log - when x_log_scale is set to True this value defines the minimum value of the x axis.
plot_function - user defined plot function. When the default plot function is not suitable for the score, a user defined function can be used.
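The effect of number_of_bins and view_range on a linear histogram can be sketched as follows. This is only an illustration of equal-width binning, not GPF's histogram code: the function names are hypothetical, and skipping values that fall outside view_range is an assumption.

```python
def linear_bin_edges(view_min, view_max, number_of_bins=100):
    """Return number_of_bins + 1 equally spaced edges over the view range."""
    step = (view_max - view_min) / number_of_bins
    return [view_min + i * step for i in range(number_of_bins + 1)]


def count_into_bins(values, edges):
    """Count values per bin; values outside the range are skipped (assumption)."""
    counts = [0] * (len(edges) - 1)
    view_min, view_max = edges[0], edges[-1]
    for value in values:
        if value < view_min or value > view_max:
            continue
        position = (value - view_min) / (view_max - view_min)
        # The maximum value belongs to the last bin, not to a bin past the end.
        index = min(int(position * len(counts)), len(counts) - 1)
        counts[index] += 1
    return counts
```

With view_range set to min -20.0 and max 10.0 and the default 100 bins, each bin in this sketch would be 0.3 units wide.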

Example 1: Number histogram configuration

Here is a full example of a number histogram configuration coming from the hg38/score/phyloP100way genomic score resource:

type: position_score

table:
  filename: hg38.phyloP100way.bw
  header_mode: none   # this makes no sense and should be removed

# score values
scores:
- id: phyloP100way
  type: float
  desc: "The score is a number that reflects the conservation at a position."
  large_values_desc: "more conserved"
  small_values_desc: "less conserved"
  index: 3    # this makes no sense and should be removed
  histogram:
    type: number
    number_of_bins: 100
    view_range:
        min: -20.0
        max: 10.0
    y_log_scale: True

Example 2: Number histogram configuration

Here is a full example of a number histogram configuration coming from the hg38/variant_frequencies/gnomAD_v3 genomic score resource:

type: allele_score

table:
  filename: gnomad.genomes.r3.0.extract.tsv.gz
  format: tabix

  chrom:
    name: CHROM
  pos_begin:
    name: POS
  pos_end:
    name: POS
  reference:
    name: REF
  alternative:
    name: ALT

scores:
  ...

  - id: AF
    name: AF
    type: float
    desc: "Alternative allele frequency in the all gnomAD v3.0 genome samples."
    histogram:
      type: number
      number_of_bins: 126
      view_range:
        min: 0.0
        max: 1.0
      y_log_scale: True
      x_log_scale: True
      x_min_log: 0.00001

  ...

Categorical Histograms Configuration

Categorical histograms are suitable for scores that have a limited number (fewer than 100) of unique values. By default, the values are displayed in the order of their frequency and only the top 20 values are shown in the histogram. Other values are grouped into the Other category.

The categorical histogram configuration supports the following fields:

type - the type of the histogram. This should be set to categorical.
y_log_scale - if set to True the y axis of the histogram will be logarithmic.
displayed_values_count - the number of unique values that will be displayed in the histogram. Default value for this field is 20. The rest of the values are grouped into the Other category.
displayed_values_percent - the percentage of total mass of unique values that will be displayed. Other values are grouped into the Other category. Only one of displayed_values_count and displayed_values_percent can be set.
value_order - the order in which the unique values are displayed in the histogram.
plot_function - user defined plot function. When the default plot function is not suitable for the score, a user defined function can be used.
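The grouping controlled by displayed_values_count and displayed_values_percent can be sketched like this. It is an approximation under stated assumptions rather than GPF's implementation: the function name group_displayed_values is hypothetical, values are ordered by descending count, and displayed_values_percent is interpreted as "keep adding values until they cover at least that share of all counts".

```python
def group_displayed_values(counts, displayed_values_count=None,
                           displayed_values_percent=None):
    """Keep the most frequent values and fold the rest into 'Other'."""
    if displayed_values_count is not None and displayed_values_percent is not None:
        raise ValueError("set only one of the two display options")
    if displayed_values_count is None and displayed_values_percent is None:
        displayed_values_count = 20  # documented default

    # Most frequent first; ties are broken alphabetically (assumption).
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    total = sum(counts.values())
    shown = {}
    covered = 0
    for value, count in ordered:
        if displayed_values_count is not None:
            if len(shown) >= displayed_values_count:
                break
        elif covered * 100 >= displayed_values_percent * total:
            break
        shown[value] = count
        covered += count

    remaining = total - covered
    if remaining:
        shown["Other"] = remaining
    return shown
```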

Example 1: Categorical histogram configuration

Here is a full example of a number and categorical histogram configuration coming from the hg38/scores/AlphaMissense genomic score resource:

type: np_score

table:
  filename: AlphaMissense_hg38_modified.tsv.gz
  format: tabix

  chrom:
    name: chrom
  pos_begin:
    name: pos
  pos_end:
    name: pos
  reference:
    name: ref
  alternative:
    name: alt

scores:
  - id: am_pathogenicity
    name: am_pathogenicity
    type: float
    desc: |
      AlphaMissense Pathogenicity score is a deleteriousness score for missense variants
    large_values_desc: "more pathogenic"
    small_values_desc: "less pathogenic"
    histogram:
      type: number
      number_of_bins: 100
      view_range:
        min: 0.0
        max: 1.0
      y_log_scale: True

  - id: am_class
    name: am_class
    type: str
    desc: |
      AlphaMissense Class is a deleteriousness category for missense variants
    histogram:
      type: categorical
      y_log_scale: True

Example 2: Categorical histogram configuration

Here is an example of a categorical histogram configuration that shows how to use the plot_function, displayed_values_count, and displayed_values_percent fields. Note that plot_function uses the following format: <python module>:<python function>. The path to the Python module should be relative to the resource directory.

type: allele_score
table:
  filename: clinvar_20221105_chr.vcf.gz
  index_filename: clinvar_20221105_chr.vcf.gz.tbi
scores:
  - id: CLNSIG
    name: CLNSIG
    type: str
    desc: |
      Clinical significance for this single variant; multiple values
      are separated by a vertical bar
    histogram:
      type: categorical
      y_log_scale: True
      plot_function: "clinvar_plots.py:plot_clnsig"
  - id: CLNREVSTAT
    name: CLNREVSTAT
    type: str
    desc: |
      ClinVar review status for the Variation ID
    histogram:
      type: categorical
      y_log_scale: True
      displayed_values_count: 35
  - id: CLNVC
    name: CLNVC
    type: str
    desc: |
      Variant type
    histogram:
      type: categorical
      y_log_scale: True
      displayed_values_percent: 85.0

Here is the content of the clinvar_plots.py file:

from typing import IO
from dae.genomic_resources.histogram import CategoricalHistogram
import matplotlib
import matplotlib.pyplot as plt
matplotlib.use("agg")


def plot_clnsig(
    outfile: IO,
    histogram: CategoricalHistogram,
    xlabel: str,
    _small_values_description: str | None = None,
    _large_values_description: str | None = None,
) -> None:
    """Plot histogram and save it into outfile."""
    # Sort the values by descending count, then drop combined values
    # (those that contain the "|" separator).
    values = list(sorted(histogram.raw_values.items(), key=lambda x: -x[1]))
    values = [v for v in values if "|" not in v[0]]
    labels = [v[0] for v in values]
    counts = [v[1] for v in values]

    plt.figure(figsize=(40, 80), tight_layout=True)
    _, ax = plt.subplots()
    ax.bar(
        x=labels,
        height=counts,
        tick_label=[str(v) for v in labels],
        log=histogram.config.y_log_scale,
        align="center",
    )
    plt.xlabel(f"\n{xlabel}")
    plt.ylabel("count")
    plt.tick_params(axis="x", labelrotation=90, direction="out")
    plt.tight_layout()
    plt.savefig(outfile)
    plt.clf()
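To show how a plot_function value such as "clinvar_plots.py:plot_clnsig" could be resolved, here is a minimal sketch that splits the <python module>:<python function> string and loads the function from a file in the resource directory. The helper load_plot_function is hypothetical and only uses the standard importlib machinery; how GPF itself loads the function is not described here.

```python
import importlib.util
from pathlib import Path


def load_plot_function(resource_dir, plot_function):
    """Load '<python module>:<python function>' relative to resource_dir."""
    module_file, function_name = plot_function.split(":", 1)
    module_path = Path(resource_dir) / module_file

    # Build a module object from the file and execute it.
    spec = importlib.util.spec_from_file_location(module_path.stem, module_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    return getattr(module, function_name)
```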

Null Histograms Configuration

Null histograms are used when calculating a histogram is not possible or does not make sense. The null histogram configuration supports the following fields:

type - the type of the histogram. This should be set to null.
reason - the reason why the histogram is disabled. This field is required.

Example: Null histogram configuration

type: allele_score

table:
  filename: clinvar_20221105_chr.vcf.gz
  index_filename: clinvar_20221105_chr.vcf.gz.tbi

scores:
- id: RS
  name: RS
  type: str
  desc: dbSNP ID (i.e. rs number)
  histogram:
    type: "null"
    reason: "Histogram is not available for this score."

Resource repositories

Resource repositories are collections of genomic resources hosted either locally or remotely.

Repository discovery

The GPF system will by default look for a .grr_definition.yaml file in the home directory of your user.

Alternatively, the system will use a repository configuration file pointed to by the GRR_DEFINITION_FILE environment variable if it has been set.

Finally, most CLI tools that use GRRs have a --grr <filename> argument that overrides the defaults.

To configure the GRRs to be used by default for your user, you can create the file ~/.grr_definition.yaml. An example of what the contents of this file can be is:

id: "development"
type: group
children:
- id: "grr_local"
  type: "directory"
  directory: "~/my_grr"

- id: "default"
  type: "url"
  url: "https://grr.iossifovlab.com"
  cache_dir: "~/default_grr_cache"
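The lookup order described in this section can be summarized with a small sketch. The function resolve_grr_definition is hypothetical. It assumes that a --grr argument wins over the GRR_DEFINITION_FILE environment variable, which in turn wins over ~/.grr_definition.yaml; returning None when nothing is found is also an assumption, since the fallback behavior is not described here.

```python
import os
from pathlib import Path


def resolve_grr_definition(cli_grr=None, environ=None, home=None):
    """Pick the GRR definition file following the assumed precedence."""
    environ = os.environ if environ is None else environ
    home = Path.home() if home is None else Path(home)

    # 1. An explicit --grr <filename> argument overrides the defaults.
    if cli_grr:
        return Path(cli_grr)

    # 2. The GRR_DEFINITION_FILE environment variable, if set.
    env_value = environ.get("GRR_DEFINITION_FILE")
    if env_value:
        return Path(env_value)

    # 3. The per-user default in the home directory.
    default = home / ".grr_definition.yaml"
    return default if default.exists() else None
```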

Repository configuration

Field	Description
id	String. The id of the repository.
type	String. One of `directory`, `http`, `url`, `embedded` or `group`. These values are explained below.
children	List of repository configurations for `group` type repositories’ children.
url	String. URL of the remote repository for `http` and `url` type repositories.
directory	String. Path to the directory of resources for `directory` type repositories.
content	Dictionary describing files and directories for `embedded` type repositories. Directories’ values are further nested dictionaries, while files’ values are the file contents.
cache_dir	String. Path to a directory in which the resources from this repository will be cached.

directory: A local filesystem repository.
http: A remote HTTP repository.
url: A remote S3 repository.
embedded: An in-memory repository.
group: A group of a number of repositories.

Caching of repositories

When a repository is configured with a cache_dir option, it will cache resources locally before using them. It is significantly faster to use cached resources, but it takes some time to cache them the first time they are used and they occupy substantial disk space.

Management of resources and repositories with CLI tools

The GPF system provides two CLI tools for management of genomic resources and repositories. Their usage is outlined below:

grr_manage

$ grr_manage --help
usage: grr_manage [-h] [--version] [--verbose]
                  {list,repo-init,repo-manifest,resource-manifest,repo-stats,resource-stats,repo-info,resource-info,repo-repair,resource-repair}
                  ...

Genomic Resource Repository Management Tool

positional arguments:
  {list,repo-init,repo-manifest,resource-manifest,repo-stats,resource-stats,repo-info,resource-info,repo-repair,resource-repair}
                        Command to execute
    list                List a GR Repo
    repo-init           Initialize a directory to turn it into a GRR
    repo-manifest       Create/update manifests for whole GRR
    resource-manifest   Create/update manifests for a resource
    repo-stats          Build the statistics for a resource
    resource-stats      Build the statistics for a resource
    repo-info           Build the index.html for the whole GRR
    resource-info       Build the index.html for the specific resource
    repo-repair         Update/rebuild manifest and histograms whole GRR
    resource-repair     Update/rebuild manifest and histograms for a resource

options:
  -h, --help            show this help message and exit
  --version             Prints GPF version and exists.
  --verbose, -v, -V

grr_browse

$ grr_browse --help
usage: grr_browse [-h] [--version] [--verbose] [-g GRR] [--bytes]

Genomic Resource Repository Browse Tool

options:
  -h, --help         show this help message and exit
  --version          Prints GPF version and exists.
  --verbose, -v, -V
  --bytes            Print the resource size in bytes

Repository/Resource:
  -g GRR, --grr GRR  path to GRR definition file.

Tutorial: Create a local repository with a custom resource

A genomic resource is a set of files stored in a directory. To make a directory a genomic resource, it must contain a genomic_resource.yaml file.

A genomic resource repository is a directory that contains genomic resources. To turn a directory into a repository, it must contain a .CONTENTS file.

Create an empty GRR

To create an empty GRR, first create an empty directory. For example, let us create an empty directory named grr_test, enter that directory and run the grr_manage repo-init command:

mkdir grr_test
cd grr_test
grr_manage repo-init

After that the directory should contain an empty .CONTENTS file:

ls -a

.  ..  .CONTENTS

If we try to list all resources in this repository we should get an empty list:

grr_manage list

Create an empty genomic resource

Let us create our first genomic resource. Create a directory hg38/scores/score9 inside the grr_test repository and create an empty genomic_resource.yaml file inside that directory:

mkdir -p hg38/scores/score9
cd hg38/scores/score9
touch genomic_resource.yaml

This will create an empty genomic resource in our repository with ID hg38/scores/score9.

If we list the resources in our repository we would get:

grr_manage list

working with repository: .../grr_test
Basic                0        1            0 hg38/scores/score9

When we create or change a resource we need to repair the repository:

grr_manage repo-repair

This command will create a .MANIFEST file for our new resource hg38/scores/score9 and will update the repository .CONTENTS to include the resource.

Add genomic score resources

Add all score resource files (score file and Tabix index) inside the created directory hg38/scores/score9. Let’s say these files are:

score9.tsv.gz
score9.tsv.gz.tbi

Configure the resource hg38/scores/score9. To do this, create a genomic_resource.yaml file that contains the position score configuration:

type: position_score
table:
  filename: score9.tsv.gz
  format: tabix

  # defined by score_type
  chrom:
    name: chrom
  pos_begin:
    name: start
  pos_end:
    name: end

# score values
scores:
- id: score9
  type: float
  desc: "score9"
  index: 3
  histogram:
    type: number
    number_of_bins: 100
    y_log_scale: True
default_annotation:
  attributes:
  - source: score9
    destination: score9
meta:
  description: |
    ## score9
    TODO

When ready, you should run grr_manage resource-repair from inside the resource directory:

cd hg38/scores/score9
grr_manage resource-repair

This command is going to calculate histograms for the score (if they are configured) and create or update the resource manifest.

Once the resource is ready we need to regenerate the repository contents:

grr_manage repo-repair

Genomic position table configuration

Table configuration fields

filename

Path to the file containing the data, relative to the genomic resource’s directory.

format

Format of the file configured in filename. Currently supported formats are tabix, vcf_info, tsv, csv and bw. Auto-detection of the format works for the following filename extensions:

Extension	Format
.bgz	tabix
.vcf.gz	vcf_info
.txt, .txt.gz, .tsv, .tsv.gz	tsv
.csv, .csv.gz	csv
.bw	bw
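The extension table above can be expressed as a small lookup. This is a sketch of the documented mapping only: the function detect_format is hypothetical, and checking the longer extensions first is a design choice made here so that .vcf.gz is not mistaken for another .gz variant.

```python
# (extension, format) pairs taken from the table above.
EXTENSION_FORMATS = [
    (".vcf.gz", "vcf_info"),
    (".bgz", "tabix"),
    (".txt.gz", "tsv"),
    (".tsv.gz", "tsv"),
    (".txt", "tsv"),
    (".tsv", "tsv"),
    (".csv.gz", "csv"),
    (".csv", "csv"),
    (".bw", "bw"),
]


def detect_format(filename):
    """Return the format implied by the filename extension, or None."""
    for extension, table_format in EXTENSION_FORMATS:
        if filename.endswith(extension):
            return table_format
    return None
```

When detect_format returns None, the format would have to be set explicitly with the format field.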

header_mode

The default value is file.

Value	Effect
file	Will attempt to extract a header from the provided file.
list	Will take the list of strings provided with the configuration field `header` as header.
none	No header. Columns will only be able to be configured via index.

header

Used for providing a header when header_mode is set to list. Example:

header_mode: list
header: ["chrom", "start", "end", "score_value"]

chrom_mapping

Allows transformation of the values in the chromosome column. Three options are available:

add_prefix

Takes a string value and adds it as a prefix.

del_prefix

Takes a string value to remove from the start of each chromosome.

filename

Takes a filepath, relative to the genomic resource’s directory. The file’s contents must contain two columns delimited by whitespace. The first line must be the header, containing chrom and file_chrom as values. The file_chrom column contains values that will be found in the file, while the chrom column contains what they will be mapped to. An example is given below:

chrom           file_chrom
Chromosome_1    1
Chromosome_22   22
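The three chrom_mapping options can be sketched as follows. The helper names are hypothetical, the options are treated as alternatives, and returning None for a chromosome that is missing from the mapping file is an assumption.

```python
def load_chrom_mapping(text):
    """Parse the two-column mapping file into {file_chrom: chrom}."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    header, data = rows[0], rows[1:]
    chrom_col = header.index("chrom")
    file_chrom_col = header.index("file_chrom")
    return {row[file_chrom_col]: row[chrom_col] for row in data}


def map_chrom(chrom, add_prefix=None, del_prefix=None, mapping=None):
    """Transform a chromosome name read from the score file."""
    if mapping is not None:
        # Unmapped chromosomes yield None here (assumption).
        return mapping.get(chrom)
    if del_prefix and chrom.startswith(del_prefix):
        chrom = chrom[len(del_prefix):]
    if add_prefix:
        chrom = add_prefix + chrom
    return chrom
```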

{column}

Generic configuration for a column in the genomic position table.

column_name: Takes a string value. The name of the column as it appears in the file’s header. Cannot be used if no header has been provided for the table.
column_index: Takes an integer value. The index of the column in the file.
name: Deprecated version of column_name.
index: Deprecated version of column_index.
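How a column configuration could be turned into a position in each row is sketched below. The function resolve_column_index is hypothetical. It assumes indexes are zero-based (consistent with the bigWig example, where the score sits at index 3 after chrom, start and end) and that an explicit index takes precedence over a name.

```python
def resolve_column_index(column_config, header=None):
    """Return the zero-based column position for a column configuration."""
    # Accept both the current and the deprecated field names.
    name = column_config.get("column_name", column_config.get("name"))
    index = column_config.get("column_index", column_config.get("index"))

    if index is not None:
        return int(index)
    if name is not None:
        if header is None:
            raise ValueError("a column name cannot be used without a header")
        return header.index(name)
    raise ValueError("a column needs either a name or an index")
```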

chrom

Column configuration for the chromosome column. See explanation for {column} above.

pos_begin

Column configuration for the start position column. See explanation for {column} above.

pos_end

Column configuration for the end position column. See explanation for {column} above.

reference

Column configuration for the reference column. See explanation for {column} above.

alternative

Column configuration for the alternative column. See explanation for {column} above.

Score configuration fields

id: Takes a string value. The identifier the system will use to refer to this score column in annotation configurations.
type: Type of the column’s values. Takes one of the following values - str, float, int.
column_name: Takes a string value. The name of the column as it appears in the file’s header. Cannot be used if no header has been provided for the table.
column_index: Takes an integer value. The index of the column in the file.
name: Deprecated version of column_name.
index: Deprecated version of column_index.
desc: A string describing the score column.
na_values: Takes a string or a list of strings. Score values that should be treated as missing (NA).
histogram: Histogram configuration. See Histograms and statistics for more info.

Auto generated score definition

VCF files provide enough information to allow automatic generation of score definitions. These definitions can be overridden manually if necessary, either partially or fully.

Example VCF file:

##fileformat=VCFv4.1
##INFO=<ID=A,Number=1,Type=Integer,Description="Score A">
#CHROM POS ID REF ALT QUAL FILTER  INFO
chr1   5   .  A   T   .    .       A=1

Score A will get auto generated score definition as if created by configuration like this:

scores:
- id: A
  type: int
  column_name: A
  desc: Score A

Some fields cannot be automatically generated. Use overriding to add more fields or to change auto-generated ones. To override a score definition, specify the score id first, then add new fields (like histogram) or override auto-generated ones (like type):

scores:
- id: A
  type: float
  histogram:
    type: categorical
    value_order: ["alpha", "beta"]

The resulting score definition with updated type and added histogram will be equivalent to the following configuration:

scores:
- id: A
  type: float
  column_name: A
  desc: Score A
  histogram:
    type: categorical
    value_order: ["alpha", "beta"]
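The override behavior shown in the three configurations above amounts to merging dictionaries by score id. The sketch below reproduces that result; apply_overrides is a hypothetical helper, and the shallow merge (a manually specified field replaces the whole auto-generated field) is an assumption.

```python
def apply_overrides(auto_scores, overrides):
    """Merge manual score definitions over auto-generated ones, by id."""
    overrides_by_id = {override["id"]: override for override in overrides}
    merged = []
    for score in auto_scores:
        # Fields from the override win; untouched fields are kept.
        merged.append({**score, **overrides_by_id.get(score["id"], {})})
    return merged
```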

How VCF types correspond to our types

VCF	GPF
Integer	int
Float	float
String	str
Flag	bool
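Combining this type mapping with the INFO header from the example VCF file, the auto-generated score definition can be approximated as follows. The function score_from_info_line and the regular expression are illustrative assumptions; they only handle INFO lines whose fields appear in the order ID, Number, Type, Description.

```python
import re

VCF_TO_GPF_TYPE = {
    "Integer": "int",
    "Float": "float",
    "String": "str",
    "Flag": "bool",
}

INFO_RE = re.compile(
    r'^##INFO=<ID=(?P<id>[^,]+),Number=(?P<number>[^,]+),'
    r'Type=(?P<type>[^,]+),Description="(?P<desc>[^"]*)"'
)


def score_from_info_line(line):
    """Build a score definition from a VCF ##INFO header line, or None."""
    match = INFO_RE.match(line)
    if match is None:
        return None
    return {
        "id": match["id"],
        "type": VCF_TO_GPF_TYPE[match["type"]],
        "column_name": match["id"],
        "desc": match["desc"],
    }
```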

Zero-based / BED format scores

table:
  filename: data.txt.gz
  format: tabix
  zero_based: True
scores:
- id: score_1
  name: score 1
  type: float

The zero_based argument controls how the score file will be read.

Setting it to true will read the score as a BED-style format - with 0-based, half-open intervals.

By default it is set to false, which will read the score in GPF’s internal format - with 1-based, closed intervals.
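The difference between the two coordinate conventions comes down to a shift of the start position. A minimal sketch, with a hypothetical helper name:

```python
def to_one_based_closed(start, end, zero_based=False):
    """Convert an interval to 1-based, closed coordinates.

    A 0-based, half-open interval [start, end) covers the same bases
    as the 1-based, closed interval [start + 1, end].
    """
    if zero_based:
        return start + 1, end
    return start, end
```

For example, the BED-style interval 0-10 and the 1-based interval 1-10 both describe the first ten bases of a chromosome.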

Example configurations

Example table configuration for a genomic score resource. This configuration is embedded in the score’s genomic_resource.yaml config.

# Example genomic_resource.yaml for an NP score resource.

table:
  filename: whole_genome_SNVs.tsv.gz
  format: tabix

  # how to modify the values found when reading the chromosome column
  chrom_mapping:
    add_prefix: chr

  # configuration for essential columns
  chrom:
    name: Chrom
  pos_begin:
    name: Pos
  reference:
    name: Ref
  alternative:
    name: Alt

# score values
scores:
  - id: cadd_raw
    type: float
    name: RawScore
    desc: |
      CADD raw score for functional prediction of a SNP. The larger the score
      the more likely the SNP has damaging effect
    large_values_desc: "more damaging"
    small_values_desc: "less damaging"
    histogram:
      type: number
      number_of_bins: 100
      view_range:
        min: -8.0
        max: 36.0
      y_log_scale: True

# Example genomic_resource.yaml for a position score resource with multiple scores
# with different histogram configurations.

table:
  filename: scorefile.tsv.gz
  format: tabix

  # configuration for essential columns
  chrom:
    name: chromosome
  pos_begin:
    name: start
  pos_end:
    name: stop

# score values
scores:
  # float score
  - id: score_A
    type: float
    name: NumericScore
    histogram:
      type: number
      number_of_bins: 120
      view_range:
        min: -10.0
        max: 225.0
      x_log_scale: True
      x_min_log: 0.05
  # integer score
  - id: score_B
    type: int
    name: IntegerScore
    histogram:
      type: number
      number_of_bins: 10
  # string score with categorical histogram
  - id: score_C
    type: str
    name: CategoricalScore
    histogram:
      type: categorical
      value_order: ["alpha", "beta", "gamma", "delta"]
  # string score with no histogram
  - id: score_D
    type: str
    name: WeirdScore
    histogram:
      type: "null"
      reason: "Don't care about this score"

# Example bigWig score configuration.

type: position_score

table:
  filename: hg38.phyloP7way.bw
  # header mode must be set to none for bigWig scores
  header_mode: none

# currently, it's necessary to explicitly configure the score with its index set to 3
scores:
  - id: phyloP7way
    type: float
    column_index: 3

default_annotation:
  - source: phyloP7way
    name: phylop7way

How to generate tabix files

Note - in order to use tabix, the score file must already be compressed using bgzip.

$ tabix --help

Version: 1.22.1
Usage:   tabix [OPTIONS] [FILE] [REGION [...]]

Indexing Options:
   -0, --zero-based           coordinates are zero-based
   -b, --begin INT            column number for region start [4]
   -c, --comment CHAR         skip comment lines starting with CHAR [null]
   -C, --csi                  generate CSI index for VCF (default is TBI)
   -e, --end INT              column number for region end (if no end, set INT to -b) [5]
   -f, --force                overwrite existing index without asking
   -m, --min-shift INT        set minimal interval size for CSI indices to 2^INT [14]
   -p, --preset STR           gff, bed, sam, vcf, gaf
   -s, --sequence INT         column number for sequence names (suppressed by -p) [1]
   -S, --skip-lines INT       skip first INT lines [0]

Querying and other options:
   -h, --print-header         print also the header lines
   -H, --only-header          print only the header lines
   -l, --list-chroms          list chromosome names
   -r, --reheader FILE        replace the header with the content of FILE
   -R, --regions FILE         restrict to regions listed in the file
   -T, --targets FILE         similar to -R but streams rather than index-jumps
   -D                         do not download the index file
       --cache INT            set cache size to INT megabytes (0 disables) [10]
       --separate-regions     separate the output by corresponding regions
       --verbosity INT        set verbosity [3]
   -@, --threads INT          number of additional threads to use [0]

$ bgzip --help

Version: 1.22.1
Usage:   bgzip [OPTIONS] [FILE] ...
Options:
   -b, --offset INT           decompress at virtual file pointer (0-based uncompressed offset)
   -c, --stdout               write on standard output, keep original files unchanged
   -d, --decompress           decompress
   -f, --force                overwrite files without asking
   -g, --rebgzip              use an index file to bgzip a file
   -h, --help                 give this help
   -i, --index                compress and create BGZF index
   -I, --index-name FILE      name of BGZF index file [file.gz.gzi]
   -k, --keep                 don't delete input files during operation
   -l, --compress-level INT   Compression level to use when compressing; 0 to 9, or -1 for default [-1]
   -o, --output FILE          write to file, keep original files unchanged
   -r, --reindex              (re)index compressed file
   -s, --size INT             decompress INT bytes (uncompressed size)
   -t, --test                 test integrity of compressed file
       --binary               Don't align blocks with text lines
   -@, --threads INT          number of compression threads to use [1]

Example usage of `tabix`

For a VCF-format score:

$ tabix -p vcf score.vcf.gz

For a 1-based TSV score with a single position column:

$ tabix -s 1 -b 2 score.tsv.gz

For a 1-based TSV score with start and stop position columns:

$ tabix -s 1 -b 2 -e 3 score.tsv.gz

For a 0-based TSV score with start and stop position columns:

$ tabix -0 -s 1 -b 2 -e 3 score.tsv.gz