Release Notes
- 2026.6.0
grr_cache_repo now shows live progress while caching instead of a wall of per-file log lines. On a terminal it draws a byte-based tqdm bar (human-readable units, throughput and ETA) with a files=done/total tally and, when applicable, a failed=N tally. Off a terminal (e.g. a captured CI log) it instead emits throttled milestone INFO lines at a 0% baseline, at every 10% crossing, and at 100%. Per-file request/finished chatter is demoted to DEBUG. Caching first classifies the full work-list, logging a "caching N file(s), B bytes to download; M already cached" header, and then downloads only the files that are missing or stale, so a fully-cached re-run prints no bar. A failed download still advances the bar to 100% (credited with the file’s size and counted in failed=N) and is reported in the end-of-run summary. The --no-progress flag turns the indicator off entirely. tqdm is now a core runtime dependency.
Raised the GRR caching protocol’s application-level read-buffer (CHUNK_SIZE) from 32 KiB to 1 MiB. This cuts the per-chunk Python read/write/md5 and progress-callback overhead roughly 32× on multi-gigabyte resources, with negligible extra memory and no change to network behavior (fsspec still owns block-level fetching); small files are unaffected.
Revised the getting-started CLI and web tutorials, refreshed their figures, and updated the example variant files.
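How the pieces of the new caching loop fit together can be sketched as below. This is a minimal illustration under stated assumptions, not gain's implementation: WorkItem, classify, copy_with_md5, MilestoneLogger and cache_all are hypothetical names, "stale" is modelled as an md5 mismatch, and fetch is a hypothetical callable that reports bytes through on_chunk and raises OSError on failure. It shows the classify-then-download split, the 1 MiB read loop that hashes each chunk once, the off-terminal 0% / 10%-step / 100% milestones, and a failed file being credited with its remaining size so progress still reaches 100%.

```python
import hashlib
import logging
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

logger = logging.getLogger("grr_cache_sketch")

CHUNK_SIZE = 1024 * 1024  # 1 MiB read buffer (the old value was 32 * 1024)


@dataclass
class WorkItem:
    """Hypothetical entry in the caching work-list."""
    name: str
    size: int
    local_md5: Optional[str]  # None when the file is not cached yet
    remote_md5: str


def classify(items: List[WorkItem]) -> Tuple[List[WorkItem], int]:
    """Keep only missing or stale files and log the work-list header."""
    todo = [it for it in items if it.local_md5 != it.remote_md5]
    total = sum(it.size for it in todo)
    logger.info("caching %d file(s), %d bytes to download; %d already cached",
                len(todo), total, len(items) - len(todo))
    return todo, total


def copy_with_md5(src, dst, on_chunk: Callable[[int], None]) -> str:
    """Copy a binary stream in CHUNK_SIZE pieces, hashing each piece once."""
    md5 = hashlib.md5()
    while chunk := src.read(CHUNK_SIZE):
        dst.write(chunk)
        md5.update(chunk)
        on_chunk(len(chunk))  # one progress callback per chunk
    return md5.hexdigest()


class MilestoneLogger:
    """Non-terminal progress: log a 0% baseline, each 10% crossing, and 100%."""

    def __init__(self, total: int):
        self.total = total
        self.done = 0
        self.emitted = [0]
        logger.info("progress 0%%")

    def advance(self, nbytes: int) -> None:
        self.done += nbytes
        pct = 100 if self.total <= 0 else min(self.done * 100 // self.total, 100)
        milestone = pct // 10 * 10
        # If one chunk crosses several deciles, only the latest is logged.
        if milestone > self.emitted[-1]:
            self.emitted.append(milestone)
            logger.info("progress %d%%", milestone)


def cache_all(items: List[WorkItem], fetch) -> Tuple[List[str], MilestoneLogger]:
    """Download missing/stale files; a failure still credits its full size."""
    todo, total = classify(items)
    progress = MilestoneLogger(total)
    failed = []
    for item in todo:
        copied = 0

        def on_chunk(n: int) -> None:
            nonlocal copied
            copied += n
            progress.advance(n)

        try:
            fetch(item, on_chunk)
        except OSError:
            failed.append(item.name)
            # Credit the unread remainder so the bar still reaches 100%.
            progress.advance(max(item.size - copied, 0))
    return failed, progress
```

With 1 MiB chunks, a 3 GiB file needs about 3,072 reads and progress callbacks instead of about 98,304 at 32 KiB, which is where the roughly 32× reduction in per-chunk overhead comes from.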
grr_cache_repo now takes the annotation pipeline as a positional argument (grr_cache_repo <pipeline>) instead of the --pipeline/-p flag, adopting the same CLIAnnotationContextProvider mechanism as annotate_columns and annotate_vcf. Backward-incompatible: scripts passing --pipeline/-p must drop the flag. When a GPF instance is also supplied (-i), an explicit positional pipeline still wins and the instance pipeline is used only as a fallback; with no pipeline from any source, the command logs "no pipeline supplied; nothing to cache" and exits cleanly.
annotate_tabular and annotate_vcf now treat .bgz as a first-class compression extension on par with .gz, for both input and output. A .bgz input is read (and, when tabix-indexed, split by genomic region) exactly like a .gz one, and an explicit .bgz output suffix is now preserved (previously the output was always rewritten to .gz). When the output name omits a compression suffix, a compressed input’s suffix is mirrored onto it. This also fixes build_output_path mangling output names whose stem ended in g/z characters (e.g. log.gz), a side effect of the previous rstrip-based suffix handling. As a consequence, the default --work-dir name derived from a .bgz output changed (e.g. out.vcf.bgz now yields out_work rather than out.vcf_work). A run that was interrupted on an older version with a .bgz output and is resumed after upgrading will not find its old .task-status directory and will restart from scratch; point --work-dir at the previous <output-stem>.vcf_work path to continue from the checkpoint. (.gz outputs are unaffected.)
Fixed silent duplication of records when an annotate_tabular input was a compressed file carrying a .csi index rather than a .tbi one. The splittability check accepted either index, but the reader looked only for .tbi and otherwise opened the file whole, so every genomic-region part re-read and re-emitted the entire file.
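The suffix and index fixes above come down to two small rules, sketched here under assumptions: split_compression, output_name and find_index are hypothetical helpers, not the signatures of build_output_path or of gain's reader, and the order in which .tbi and .csi are probed is assumed. The rstrip pitfall itself is standard Python behavior: str.rstrip(".gz") removes any trailing run of the characters ".", "g" and "z", so "log.gz" becomes "lo".

```python
import os
from typing import Optional, Tuple

COMPRESSION_SUFFIXES = (".gz", ".bgz")


def split_compression(name: str) -> Tuple[str, str]:
    """Return (stem, suffix) with .gz and .bgz treated alike; suffix '' if none.

    Unlike name.rstrip(".gz"), which strips any trailing '.', 'g' or 'z'
    characters (so "log.gz" becomes "lo"), this removes exactly one suffix.
    """
    for suffix in COMPRESSION_SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)], suffix
    return name, ""


def output_name(requested: str, input_name: str) -> str:
    """Keep an explicit .gz/.bgz output suffix, else mirror the input's."""
    if split_compression(requested)[1]:
        return requested
    return requested + split_compression(input_name)[1]


def find_index(path: str) -> Optional[str]:
    """Return a .tbi or .csi index sitting next to path, or None."""
    for ext in (".tbi", ".csi"):  # checking .tbi first is an assumption
        if os.path.exists(path + ext):
            return path + ext
    return None
```

The key point for the duplication fix is that the splittability check and the reader must share one index lookup: if either helper accepts a .csi index, both have to.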
- 2026.5.10
grr_cache_repo no longer aborts a long HTTP download at htslib’s 300 s read cap (which killed caching of large resources such as the genome-wide gnomAD file): the HTTP filesystem now applies no overall timeout, only per-read and per-connect limits. A single failed file is retried with exponential backoff and, if it still fails, reported in a summary rather than discarding every other download’s progress. Caching is resumable, so a re-run only refetches what failed.
The anonymous-quota refresh commands (refreshdaily and refreshmonthly) now reset each quota and write their refresh-log entry inside a single transaction, so an interruption mid-run rolls back cleanly instead of leaving quotas half-refilled. They also now reset SessionQuota, which was previously left untouched and so became a permanent floor on the effective anonymous quota.
Added a version API endpoint.
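The per-file retry policy for grr_cache_repo can be sketched as follows. This is an illustration, not gain's code: fetch_all is a hypothetical helper, fetch is any callable that raises OSError on failure, and the retry count and base delay are assumed values. Each file gets its own retry budget with delays of base_delay × 2^attempt, and a file that exhausts its budget is recorded for the summary instead of aborting the remaining downloads.

```python
import time
from typing import Callable, Iterable, List


def fetch_all(
    names: Iterable[str],
    fetch: Callable[[str], None],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch every name, retrying each failure with exponential backoff.

    Returns the names that still failed after all attempts; files that
    succeeded stay fetched, so one bad file does not waste the others.
    """
    failed = []
    for name in names:
        for attempt in range(retries + 1):
            try:
                fetch(name)
                break
            except OSError:
                if attempt == retries:
                    failed.append(name)  # give up; report in the summary
                else:
                    sleep(base_delay * 2 ** attempt)  # 1 s, 2 s, 4 s, ...
    return failed
```

The sleep parameter is injectable so the policy can be exercised without waiting. For the timeout side, fsspec's HTTP filesystem forwards client_kwargs to aiohttp.ClientSession, where aiohttp.ClientTimeout(total=None, sock_connect=..., sock_read=...) expresses "no overall timeout, only per-connect and per-read limits"; whether gain configures it exactly this way is an assumption.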
- 2026.5.9
annotate_vcf and annotate_tabular now run a pre-flight locality check: when the pipeline uses non-local genomic resources (http/https/s3, queried over the network per variant) and the input is large, they warn (1001–5000 rows) or abort before doing any work (more than 5000 rows). Local resources (file/memory schemes or anything behind a caching protocol) never trip the guard, and --allow-remote-resources disables it entirely.
The GRR index table now supports cascading column resize.
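The decision made by the pre-flight locality check in annotate_vcf and annotate_tabular can be sketched as a pure function. This is a hypothetical helper, not gain's API: resources are modelled as URL strings, and "behind a caching protocol" is modelled as membership in a cached set. The thresholds follow the note: up to 1000 rows is fine, 1001–5000 warns, and more than 5000 aborts.

```python
from typing import AbstractSet, Iterable
from urllib.parse import urlparse

REMOTE_SCHEMES = {"http", "https", "s3"}  # queried over the network per variant
WARN_ABOVE = 1000   # 1001-5000 rows: warn
ABORT_ABOVE = 5000  # more than 5000 rows: abort before any work


def locality_verdict(
    resource_urls: Iterable[str],
    row_count: int,
    cached: AbstractSet[str] = frozenset(),
    allow_remote: bool = False,
) -> str:
    """Return "ok", "warn" or "abort" for a planned annotation run."""
    if allow_remote:  # --allow-remote-resources disables the guard
        return "ok"
    uses_remote = any(
        urlparse(url).scheme in REMOTE_SCHEMES and url not in cached
        for url in resource_urls
    )
    if not uses_remote or row_count <= WARN_ABOVE:
        return "ok"
    return "warn" if row_count <= ABORT_ABOVE else "abort"
```

Bare paths parse with an empty scheme and so count as local, like file:// and memory:// URLs.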
Fixed a GRR summary tooltip being truncated when a biosample description contained a double quote.
- 2026.5.8
Added a prepare_tabular CLI that sorts a (optionally gzip-compressed) tabular file by genomic coordinates and writes a bgzip-compressed, tabix-indexed output, so that annotate_tabular can parallelize annotation across genomic regions. It reuses the same --col-* options to derive the sort and tabix keys, and orders chromosomes by a reference genome when one is supplied (lexicographically otherwise).
annotate_tabular (and the deprecated annotate_columns alias) now defaults --input-separator to a comma when the input filename has a .csv extension (optionally .gz or .bgz compressed); all other inputs still default to a tab. An explicit --input-separator/--in-sep always takes precedence.
annotate_tabular and annotate_vcf now run sequentially (forcing -j 1) when the input cannot be split into genomic regions, i.e. when it has no tabix index or --region-size is zero or negative, avoiding needless parallel-executor startup overhead for what is a single-task run.
annotate_tabular and annotate_vcf now run inside their work_dir, so the .tbi index files htslib downloads for tabix/VCF score resources served over an http(s) GRR land in work_dir instead of littering the directory the tool was launched from. Path-bearing CLI arguments are absolutized first, so the change of working directory is transparent.
Fixed flicker and column-resize lag in the GRR browser index table.
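Two of the rules above, the .csv separator default and the chromosome ordering used by prepare_tabular, can be sketched as follows. Both functions are hypothetical illustrations, not gain's code; in particular, where chromosomes absent from the supplied reference genome end up is not stated in the notes, and placing them last in lexicographic order is an assumption.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Record = Tuple[str, int]  # (chromosome, position)


def default_separator(filename: str, explicit: Optional[str] = None) -> str:
    """Comma for .csv, .csv.gz or .csv.bgz inputs, tab otherwise."""
    if explicit is not None:  # --input-separator / --in-sep always wins
        return explicit
    if filename.endswith((".csv", ".csv.gz", ".csv.bgz")):
        return ","
    return "\t"


def chrom_sort_key(
    reference_order: Optional[Sequence[str]] = None,
) -> Callable[[Record], tuple]:
    """Sort by reference-genome chromosome order if given, else lexicographically."""
    rank: Dict[str, int] = {c: i for i, c in enumerate(reference_order or ())}

    def key(record: Record) -> tuple:
        chrom, pos = record
        if chrom in rank:
            return (0, rank[chrom], "", pos)
        # Chromosomes missing from the reference go last, lexicographically
        # (an assumption; without a reference every record lands here).
        return (1, 0, chrom, pos)

    return key
```

Without a reference, "chr10" sorts before "chr2" because the comparison is purely lexicographic; supplying the reference order is what yields the natural chr1, chr2, ..., chr10 sequence.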
Expanded the getting-started CLI tutorial with parallelization and resource-caching sections.
- 2026.5.7
Moved the grr_cache_repo CLI from gpf into gain.
Fixed the --version label on the grr_manage, grr_browse, annotate_columns, and annotate_vcf CLIs.
The GRR index table now supports sorting by the ID column and resizing columns by dragging.
Silenced the spurious htslib "[W::hts_idx_load3] The index file is older than the data file" warning emitted when reading parallel-downloaded GRR resources (caching protocol or DVC). htslib verbosity is now level 1 (errors only) for any process that imports gain.genomic_resources.fsspec_protocol.
Revised the getting-started CLI tutorial and refreshed the overview diagram.
- 2026.5.6
Renamed the annotate_columns CLI to annotate_tabular. The old name is kept as a deprecated alias (a stderr banner on the CLI, a DeprecationWarning on import gain.annotation.annotate_columns) and will be removed in a future release.
The web UI now injects the Google Analytics snippet at runtime from the GA_MEASUREMENT_ID container environment variable, so the same image runs with or without GA depending on the host’s deploy-time config.
Improved the getting-started CLI documentation with installation prerequisites.
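The two halves of the annotate_columns deprecation (a DeprecationWarning for importers and a stderr banner for CLI users) follow a common Python pattern, sketched here under assumptions: warn_deprecated_import, deprecated_cli and the message wording are hypothetical, and new_main stands in for the annotate_tabular entry point.

```python
import sys
import warnings
from typing import Callable, List, Optional, TextIO

# Hypothetical wording; the real banner text is not given in these notes.
MESSAGE = ("annotate_columns is deprecated and will be removed in a future "
           "release; use annotate_tabular instead")


def warn_deprecated_import() -> None:
    """Call at module level in the old module so that importing it warns."""
    # stacklevel=2 attributes the warning to the caller of this helper.
    warnings.warn(MESSAGE, DeprecationWarning, stacklevel=2)


def deprecated_cli(
    new_main: Callable[[Optional[List[str]]], int],
    argv: Optional[List[str]] = None,
    stream: Optional[TextIO] = None,
) -> int:
    """Old CLI entry point: print a banner to stderr, then delegate."""
    print(MESSAGE, file=stream if stream is not None else sys.stderr)
    return new_main(argv)
```

Printing the banner instead of warning on the CLI path matters because Python's default filters hide DeprecationWarning outside __main__, so an installed console script would otherwise stay silent.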
The notifications WebSocket now retries on transport-level errors (e.g. a 502 during handshake) after a 2 s delay, preventing the subscription from dying permanently.
Fixed empty-array table header rendering and scrollable grid alignment in the single-annotation report.
- 2026.5.5
Moved the to_gpf_gene_models_format CLI from gpf into gain.
- 2026.5.4
Made .CONTENTS.json.gz and .CONTENTS.sqlite3.gz byte-reproducible across platforms.
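One common way to make a .json.gz file byte-reproducible is sketched below. It is an assumption about the approach, since these notes do not say how gain does it, and the .sqlite3.gz side is not covered. The idea is to serialize with sorted keys and fixed separators, then write the gzip header with a zero timestamp and no embedded file name. reproducible_json_gz is a hypothetical helper.

```python
import gzip
import io
import json


def reproducible_json_gz(obj) -> bytes:
    """Serialize obj to gzip bytes independent of time, file name and dict order."""
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    buf = io.BytesIO()
    # mtime=0 zeroes the header timestamp; filename="" omits the FNAME field.
    # CPython's GzipFile also writes a fixed OS byte (255, "unknown").
    with gzip.GzipFile(filename="", mode="wb", fileobj=buf, mtime=0) as gz:
        gz.write(payload)
    return buf.getvalue()
```

The deflate stream itself still depends on the zlib build and compression level, so identical bytes across machines also assume the same zlib implementation.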
- 2026.5.3
Refactored the allele score annotator: its default mode now operates only on VCF alleles, and the legacy allele_aggregator attribute was deprecated in favor of aggregator.
Fixed VCF processing where incorrect end positions caused spanning records to be skipped, and corrected how allele scores access positions.
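Why end positions matter for spanning records can be shown with the standard VCF convention. For a non-symbolic record, the last reference base covered is POS + len(REF) - 1. A region query must therefore test the record's whole span, not just POS. The helpers below are hypothetical illustrations, not gain's code, and they ignore symbolic alleles whose end comes from INFO/END.

```python
def vcf_end(pos: int, ref: str) -> int:
    """1-based inclusive end of a VCF record: the last reference base REF covers."""
    return pos + len(ref) - 1


def overlaps(pos: int, ref: str, region_start: int, region_end: int) -> bool:
    """True if the record's span intersects the 1-based inclusive region."""
    return pos <= region_end and vcf_end(pos, ref) >= region_start
```

For example, a deletion at POS 100 with REF ACGT ends at 103 and still overlaps a region starting at 102, even though its POS lies outside the region; a POS-only check would skip it.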
Standardized canonical annotator names throughout the documentation and fixed attribute-selection bugs in the new-annotator UI.
Fixed a race condition when filtering annotators.
Lists, tuples, and dicts are now URL-encoded when annotation attributes are stringified.
Improved GRR browser page styles and table layout, and refactored the templates for visual cohesion.
Fixed broken annotation infrastructure links.
Updated the FTS search database when creating the contents file and fixed a statistics-manifest bug.
The single-annotation report now handles array result values.
- 2026.5.2
Added admin panel views for managing anonymous users and their quotas; monthly quotas are now always displayed.
Anonymous-user quotas are now tracked by session ID.
Restyled the GRR repository about and index pages.
Introduced a new template infrastructure for resource implementations.
Fixed BigWig score-definition validation.
- 2026.5.1
Imported the GAIn user documentation into the repository and added Build/Deploy docs CI stages.
Updated the quotas page UI and removed quotas from the about page.