Release Notes

  • 2026.6.0
    • grr_cache_repo now shows live progress while caching instead of a wall of per-file log lines. On a terminal it draws a byte-based tqdm bar (human-readable units, throughput and ETA) with a files=done/total and, when applicable, failed=N tally; off a terminal (e.g. a captured CI log) it instead emits throttled milestone INFO lines at a 0% baseline, every 10% crossing, and 100%. Per-file request/finished chatter is demoted to DEBUG. Caching first classifies the full work-list — logging a caching N file(s), B bytes to download; M already cached header — and then downloads only the files that are missing or stale, so a fully-cached re-run prints no bar. A failed download still advances the bar to 100% (credited with the file’s size and counted in failed=N) and is reported in the end-of-run summary. The --no-progress flag turns the indicator off entirely. tqdm is now a core runtime dependency.

    • Raised the GRR caching protocol’s application-level read-buffer (CHUNK_SIZE) from 32 KiB to 1 MiB. This cuts the per-chunk Python read/write/md5 and progress-callback overhead roughly 32× on multi-gigabyte resources, with negligible extra memory and no change to network behavior (fsspec still owns block-level fetching); small files are unaffected.

    • Revised the getting-started CLI and web tutorials, refreshed their figures, and updated the example variant files.

    • grr_cache_repo now takes the annotation pipeline as a positional argument (grr_cache_repo <pipeline>) instead of the --pipeline/-p flag, adopting the same CLIAnnotationContextProvider mechanism as annotate_columns and annotate_vcf. Backward-incompatible: scripts passing --pipeline/-p must drop the flag. When a GPF instance is also supplied (-i), an explicit positional pipeline still wins and the instance pipeline is used only as a fallback; with no pipeline from any source the command logs no pipeline supplied; nothing to cache and exits cleanly.

    • annotate_tabular and annotate_vcf now treat .bgz as a first-class compression extension on par with .gz, for both input and output. A .bgz input is read — and, when tabix-indexed, split by genomic region — exactly like a .gz one, and an explicit .bgz output suffix is now preserved (previously the output was always rewritten to .gz). When the output name omits a compression suffix, a compressed input’s suffix is mirrored onto it. This also fixes build_output_path mangling output names whose stem ended in g/z characters (e.g. log.gz), a side effect of the previous rstrip-based suffix handling. As a consequence, the default --work-dir name derived from a .bgz output changed (e.g. out.vcf.bgz now yields out_work rather than out.vcf_work). A run that was interrupted on an older version with a .bgz output and is resumed after upgrading will not find its old .task-status directory and will restart from scratch — point --work-dir at the previous <output-stem>.vcf_work path to continue from the checkpoint. (.gz outputs are unaffected.)

    • Fixed silent duplication of records when an annotate_tabular input was a compressed file carrying a .csi index rather than a .tbi one. The splittability check accepted either index, but the reader looked only for .tbi and otherwise opened the file whole, so every genomic-region part re-read and re-emitted the entire file.

  • 2026.5.10
    • grr_cache_repo no longer aborts a long HTTP download at htslib’s 300 s read cap (which killed caching of large resources such as the genome-wide gnomAD file): the HTTP filesystem now applies no overall timeout, only per-read and per-connect limits. A single failed file is retried with exponential backoff and, if it still fails, reported in a summary rather than discarding every other download’s progress. Caching is resumable, so a re-run only refetches what failed.

    • The anonymous-quota refresh commands (refreshdaily and refreshmonthly) now reset each quota and write their refresh-log entry inside a single transaction, so an interruption mid-run rolls back cleanly instead of leaving quotas half-refilled. They also now reset SessionQuota, which was previously left untouched and so became a permanent floor on the effective anonymous quota.

    • Added a version API endpoint.

  • 2026.5.9
    • annotate_vcf and annotate_tabular now run a pre-flight locality check: when the pipeline uses non-local genomic resources (http/https/s3, queried over the network per variant) and the input is large, they warn (1001–5000 rows) or abort before doing any work (more than 5000 rows). Local resources — file/memory schemes or anything behind a caching protocol — never trip the guard, and --allow-remote-resources disables it entirely.

    • The GRR index table now supports cascading column resize.

    • Fixed a GRR summary tooltip being truncated when a biosample description contained a double quote.

  • 2026.5.8
    • Added a prepare_tabular CLI that sorts a (optionally gzip-compressed) tabular file by genomic coordinates and writes a bgzip-compressed, tabix-indexed output, so that annotate_tabular can parallelize annotation across genomic regions. It reuses the same --col-* options to derive the sort and tabix keys, and orders chromosomes by a reference genome when one is supplied (lexicographically otherwise).

    • annotate_tabular (and the deprecated annotate_columns alias) now defaults --input-separator to a comma when the input filename has a .csv extension (optionally .gz or .bgz compressed); all other inputs still default to a tab. An explicit --input-separator/--in-sep always takes precedence.

    • annotate_tabular and annotate_vcf now run sequentially (forcing -j 1) when the input cannot be split into genomic regions — when it has no tabix index, or --region-size is zero or negative — avoiding needless parallel-executor startup overhead for what is a single-task run.

    • annotate_tabular and annotate_vcf now run inside their work_dir, so the .tbi index files htslib downloads for tabix/VCF score resources served over an http(s) GRR land in work_dir instead of littering the directory the tool was launched from. Path-bearing CLI arguments are absolutized first, so the change of working directory is transparent.

    • Fixed flicker and column-resize lag in the GRR browser index table.

    • Expanded the getting-started CLI tutorial with parallelization and resource-caching sections.

  • 2026.5.7
    • Moved the grr_cache_repo CLI from gpf into gain.

    • Fixed the --version label on the grr_manage, grr_browse, annotate_columns, and annotate_vcf CLIs.

    • The GRR index table now supports sorting by the ID column and resizing columns by dragging.

    • Silenced the spurious htslib [W::hts_idx_load3] The index file is older than the data file warning emitted when reading parallel-downloaded GRR resources (caching protocol or DVC). htslib verbosity is now level 1 (errors only) for any process that imports gain.genomic_resources.fsspec_protocol.

    • Revised the getting-started CLI tutorial and refreshed the overview diagram.

  • 2026.5.6
    • Renamed the annotate_columns CLI to annotate_tabular. The old name is kept as a deprecated alias (stderr banner on the CLI, DeprecationWarning on import gain.annotation.annotate_columns) and will be removed in a future release.

    • The web UI now runtime-injects the Google Analytics snippet from the GA_MEASUREMENT_ID container environment variable, so the same image runs with or without GA depending on the host’s deploy-time config.

    • Improved the getting-started CLI documentation with installation prerequisites.

    • The notifications WebSocket now retries on transport-level errors (e.g. a 502 during handshake) after a 2 s delay, preventing the subscription from dying permanently.

    • Fixed empty-array table header rendering and scrollable grid alignment in the single-annotation report.

  • 2026.5.5
    • Moved the to_gpf_gene_models_format CLI from gpf into gain.

  • 2026.5.4
    • Made .CONTENTS.json.gz and .CONTENTS.sqlite3.gz byte-reproducible across platforms.

  • 2026.5.3
    • Refactored the allele score annotator: its default mode now operates only on VCF alleles, and the legacy allele_aggregator attribute was deprecated in favor of aggregator.

    • Fixed VCF processing where incorrect end positions caused spanning records to be skipped, and corrected how allele scores access positions.

    • Standardized canonical annotator names throughout the documentation and fixed attribute-selection bugs in the new-annotator UI.

    • Fixed a race condition when filtering annotators.

    • URL-encode lists, tuples, and dicts when stringifying annotation attributes.

    • Improved GRR browser page styles and table layout, and refactored the templates for visual cohesion.

    • Fixed broken annotation infrastructure links.

    • Updated the FTS search database when creating the contents file and fixed a statistics-manifest bug.

    • The single-annotation report now handles array result values.

  • 2026.5.2
    • Added admin panel views for managing anonymous users and their quotas; monthly quotas are now always displayed.

    • Anonymous-user quotas are now tracked by session ID.

    • Restyled the GRR repository about and index pages.

    • Introduced a new template infrastructure for resource implementations.

    • Fixed BigWig score-definition validation.

  • 2026.5.1
    • Imported the GAIn user documentation into the repository and added Build/Deploy docs CI stages.

    • Updated the quotas page UI and removed quotas from the about page.