Release Notes ============= * 2026.6.0 * ``grr_cache_repo`` now shows live progress while caching instead of a wall of per-file log lines. On a terminal it draws a byte-based ``tqdm`` bar (human-readable units, throughput and ETA) with a ``files=done/total`` and, when applicable, ``failed=N`` tally; off a terminal (e.g. a captured CI log) it instead emits throttled milestone ``INFO`` lines at a 0% baseline, every 10% crossing, and 100%. Per-file request/finished chatter is demoted to ``DEBUG``. Caching first classifies the full work-list — logging a ``caching N file(s), B bytes to download; M already cached`` header — and then downloads only the files that are missing or stale, so a fully-cached re-run prints no bar. A failed download still advances the bar to 100% (credited with the file's size and counted in ``failed=N``) and is reported in the end-of-run summary. The ``--no-progress`` flag turns the indicator off entirely. ``tqdm`` is now a core runtime dependency. * Raised the GRR caching protocol's application-level read-buffer (``CHUNK_SIZE``) from 32 KiB to 1 MiB. This cuts the per-chunk Python read/write/md5 and progress-callback overhead roughly 32× on multi-gigabyte resources, with negligible extra memory and no change to network behavior (fsspec still owns block-level fetching); small files are unaffected. * Revised the getting-started CLI and web tutorials, refreshed their figures, and updated the example variant files. * ``grr_cache_repo`` now takes the annotation pipeline as a **positional argument** (``grr_cache_repo ``) instead of the ``--pipeline``/``-p`` flag, adopting the same ``CLIAnnotationContextProvider`` mechanism as ``annotate_columns`` and ``annotate_vcf``. **Backward-incompatible:** scripts passing ``--pipeline``/``-p`` must drop the flag. When a GPF instance is also supplied (``-i``), an explicit positional pipeline still wins and the instance pipeline is used only as a fallback; with no pipeline from any source the command logs ``no pipeline supplied; nothing to cache`` and exits cleanly. * ``annotate_tabular`` and ``annotate_vcf`` now treat ``.bgz`` as a first-class compression extension on par with ``.gz``, for both input and output. A ``.bgz`` input is read — and, when tabix-indexed, split by genomic region — exactly like a ``.gz`` one, and an explicit ``.bgz`` output suffix is now preserved (previously the output was always rewritten to ``.gz``). When the output name omits a compression suffix, a compressed input's suffix is mirrored onto it. This also fixes ``build_output_path`` mangling output names whose stem ended in ``g``/``z`` characters (e.g. ``log.gz``), a side effect of the previous ``rstrip``-based suffix handling. As a consequence, the default ``--work-dir`` name derived from a ``.bgz`` output changed (e.g. ``out.vcf.bgz`` now yields ``out_work`` rather than ``out.vcf_work``). A run that was interrupted on an older version with a ``.bgz`` output and is resumed after upgrading will not find its old ``.task-status`` directory and will restart from scratch — point ``--work-dir`` at the previous ``.vcf_work`` path to continue from the checkpoint. (``.gz`` outputs are unaffected.) * Fixed silent duplication of records when an ``annotate_tabular`` input was a compressed file carrying a ``.csi`` index rather than a ``.tbi`` one. The splittability check accepted either index, but the reader looked only for ``.tbi`` and otherwise opened the file whole, so every genomic-region part re-read and re-emitted the entire file. * 2026.5.10 * ``grr_cache_repo`` no longer aborts a long HTTP download at htslib's 300 s read cap (which killed caching of large resources such as the genome-wide gnomAD file): the HTTP filesystem now applies no overall timeout, only per-read and per-connect limits. A single failed file is retried with exponential backoff and, if it still fails, reported in a summary rather than discarding every other download's progress. Caching is resumable, so a re-run only refetches what failed. * The anonymous-quota refresh commands (``refreshdaily`` and ``refreshmonthly``) now reset each quota and write their refresh-log entry inside a single transaction, so an interruption mid-run rolls back cleanly instead of leaving quotas half-refilled. They also now reset ``SessionQuota``, which was previously left untouched and so became a permanent floor on the effective anonymous quota. * Added a version API endpoint. * 2026.5.9 * ``annotate_vcf`` and ``annotate_tabular`` now run a pre-flight locality check: when the pipeline uses non-local genomic resources (http/https/s3, queried over the network per variant) and the input is large, they warn (1001–5000 rows) or abort before doing any work (more than 5000 rows). Local resources — file/memory schemes or anything behind a caching protocol — never trip the guard, and ``--allow-remote-resources`` disables it entirely. * The GRR index table now supports cascading column resize. * Fixed a GRR summary tooltip being truncated when a biosample description contained a double quote. * 2026.5.8 * Added a ``prepare_tabular`` CLI that sorts a (optionally gzip-compressed) tabular file by genomic coordinates and writes a bgzip-compressed, tabix-indexed output, so that ``annotate_tabular`` can parallelize annotation across genomic regions. It reuses the same ``--col-*`` options to derive the sort and tabix keys, and orders chromosomes by a reference genome when one is supplied (lexicographically otherwise). * ``annotate_tabular`` (and the deprecated ``annotate_columns`` alias) now defaults ``--input-separator`` to a comma when the input filename has a ``.csv`` extension (optionally ``.gz`` or ``.bgz`` compressed); all other inputs still default to a tab. An explicit ``--input-separator``/``--in-sep`` always takes precedence. * ``annotate_tabular`` and ``annotate_vcf`` now run sequentially (forcing ``-j 1``) when the input cannot be split into genomic regions — when it has no tabix index, or ``--region-size`` is zero or negative — avoiding needless parallel-executor startup overhead for what is a single-task run. * ``annotate_tabular`` and ``annotate_vcf`` now run inside their ``work_dir``, so the ``.tbi`` index files htslib downloads for tabix/VCF score resources served over an http(s) GRR land in ``work_dir`` instead of littering the directory the tool was launched from. Path-bearing CLI arguments are absolutized first, so the change of working directory is transparent. * Fixed flicker and column-resize lag in the GRR browser index table. * Expanded the getting-started CLI tutorial with parallelization and resource-caching sections. * 2026.5.7 * Moved the ``grr_cache_repo`` CLI from ``gpf`` into ``gain``. * Fixed the ``--version`` label on the ``grr_manage``, ``grr_browse``, ``annotate_columns``, and ``annotate_vcf`` CLIs. * The GRR index table now supports sorting by the ID column and resizing columns by dragging. * Silenced the spurious htslib ``[W::hts_idx_load3] The index file is older than the data file`` warning emitted when reading parallel-downloaded GRR resources (caching protocol or DVC). htslib verbosity is now level 1 (errors only) for any process that imports ``gain.genomic_resources.fsspec_protocol``. * Revised the getting-started CLI tutorial and refreshed the overview diagram. * 2026.5.6 * Renamed the ``annotate_columns`` CLI to ``annotate_tabular``. The old name is kept as a deprecated alias (stderr banner on the CLI, ``DeprecationWarning`` on ``import gain.annotation.annotate_columns``) and will be removed in a future release. * The web UI now runtime-injects the Google Analytics snippet from the ``GA_MEASUREMENT_ID`` container environment variable, so the same image runs with or without GA depending on the host's deploy-time config. * Improved the getting-started CLI documentation with installation prerequisites. * The notifications WebSocket now retries on transport-level errors (e.g. a 502 during handshake) after a 2 s delay, preventing the subscription from dying permanently. * Fixed empty-array table header rendering and scrollable grid alignment in the single-annotation report. * 2026.5.5 * Moved the ``to_gpf_gene_models_format`` CLI from ``gpf`` into ``gain``. * 2026.5.4 * Made ``.CONTENTS.json.gz`` and ``.CONTENTS.sqlite3.gz`` byte-reproducible across platforms. * 2026.5.3 * Refactored the allele score annotator: its default mode now operates only on VCF alleles, and the legacy ``allele_aggregator`` attribute was deprecated in favor of ``aggregator``. * Fixed VCF processing where incorrect end positions caused spanning records to be skipped, and corrected how allele scores access positions. * Standardized canonical annotator names throughout the documentation and fixed attribute-selection bugs in the new-annotator UI. * Fixed a race condition when filtering annotators. * URL-encode lists, tuples, and dicts when stringifying annotation attributes. * Improved GRR browser page styles and table layout, and refactored the templates for visual cohesion. * Fixed broken annotation infrastructure links. * Updated the FTS search database when creating the contents file and fixed a statistics-manifest bug. * The single-annotation report now handles array result values. * 2026.5.2 * Added admin panel views for managing anonymous users and their quotas; monthly quotas are now always displayed. * Anonymous-user quotas are now tracked by session ID. * Restyled the GRR repository about and index pages. * Introduced a new template infrastructure for resource implementations. * Fixed BigWig score-definition validation. * 2026.5.1 * Imported the GAIn user documentation into the repository and added Build/Deploy docs CI stages. * Updated the quotas page UI and removed quotas from the about page.