Most retry libraries wrap one call. Fine for a single flaky operation — but when you run a pool of tasks, retry should be the pool’s job, not yours.
retry-thread-pool puts retries at the
thread-pool level: wrap any ExecutorService, submit a named task, get a CompletableFuture —
retries happen on their own. Java 17+, on Maven Central, zero runtime dependencies.
Quickstart
RetryPolicy policy = RetryPolicy.builder()
    .maxRetries(3)
    .backoff(Backoff.exponentialWithJitter(Duration.ofMillis(100), Duration.ofSeconds(5)))
    .retryOn(IOException.class)
    .build();
try (RetryExecutor executor = RetryExecutor.builder().retryPolicy(policy).build()) {
    CompletableFuture<User> user = executor.submit("fetch-user", () -> client.fetchUser(id));
    // compose it, join it, or collect a whole batch; it's a normal CompletableFuture
}
What you get
- Backoff: none, fixed, exponential, exponentialWithJitter. Jitter kills synchronized retry storms.
- Predicates: retryOn(...) / abortOn(...); abortOn wins. Error and InterruptedException never retry.
- Per-attempt timeout: a hung attempt is interrupted and retried, not left to wedge a worker.
- Listeners: onRetry / onSuccess / onExhausted / onAbort, for metrics/logs without touching task code.
- Stats: an immutable snapshot of submitted / succeeded / exhausted / retried / timed-out counts.
- Bring your own pool: any ExecutorService, including virtual threads on 21+.
- Loud exhaustion: out of retries → RetryExhaustedException (cause = last failure); a non-retryable error surfaces as itself.
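The predicate rules above can be sketched with the JDK alone. This is a hypothetical illustration, not the library's code: the class name `RetryDecisionSketch` is made up, and treating an empty retryOn set as "retry any other exception" is an assumption.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Set;

/** Hypothetical sketch of a retry/abort decision; not the library's implementation. */
final class RetryDecisionSketch {
    private final Set<Class<? extends Throwable>> retryOn;
    private final Set<Class<? extends Throwable>> abortOn;

    RetryDecisionSketch(Set<Class<? extends Throwable>> retryOn,
                        Set<Class<? extends Throwable>> abortOn) {
        this.retryOn = Set.copyOf(retryOn);
        this.abortOn = Set.copyOf(abortOn);
    }

    boolean shouldRetry(Throwable failure) {
        // Errors and interrupts are never retried, whatever the sets say.
        if (failure instanceof Error || failure instanceof InterruptedException) {
            return false;
        }
        // abortOn is checked first, so it wins over a broader retryOn entry.
        if (matches(abortOn, failure)) {
            return false;
        }
        // Assumption: an empty retryOn set means "retry any other exception".
        return retryOn.isEmpty() || matches(retryOn, failure);
    }

    private static boolean matches(Set<Class<? extends Throwable>> types, Throwable failure) {
        return types.stream().anyMatch(type -> type.isInstance(failure));
    }

    public static void main(String[] args) {
        RetryDecisionSketch decision = new RetryDecisionSketch(
                Set.of(IOException.class), Set.of(FileNotFoundException.class));
        System.out.println(decision.shouldRetry(new IOException("timeout")));
        System.out.println(decision.shouldRetry(new FileNotFoundException("gone")));
    }
}
```

Here FileNotFoundException is a subclass of IOException, so it matches both sets; checking abortOn first is what makes it abort.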
Why it matters
- Fire and forget: submit → future. No catch, no sleep, no attempt counters, no rescheduling in your code.
- Async stays async: backoff is a scheduler timer, not a Thread.sleep. Workers keep working; throughput holds when a dependency flaps.
- Independent healing: each task has its own budget; one flaky task doesn’t stall the ninety-nine beside it.
- Resilience is a pool property, not retry logic threaded through every call site.
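To make "a scheduler timer, not a Thread.sleep" concrete, here is a minimal sketch using only JDK classes. It is not the library's implementation; `TimerRetrySketch` and its `submit` signature are hypothetical, and it uses a fixed delay to stay short.

```java
import java.io.IOException;
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: retries are rescheduled by a timer, so no worker sleeps. */
final class TimerRetrySketch {
    private final ExecutorService workers;
    private final ScheduledExecutorService timer;

    TimerRetrySketch(ExecutorService workers, ScheduledExecutorService timer) {
        this.workers = workers;
        this.timer = timer;
    }

    <T> CompletableFuture<T> submit(Callable<T> task, int maxRetries, Duration delay) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(task, maxRetries, delay, result);
        return result;
    }

    private <T> void attempt(Callable<T> task, int retriesLeft, Duration delay,
                             CompletableFuture<T> result) {
        workers.execute(() -> {
            try {
                result.complete(task.call());
            } catch (Throwable failure) {
                if (retriesLeft == 0 || failure instanceof Error) {
                    result.completeExceptionally(failure);
                    return;
                }
                // The worker returns to the pool right away; the timer only
                // hands the next attempt back to the workers after the delay.
                timer.schedule(() -> attempt(task, retriesLeft - 1, delay, result),
                        delay.toMillis(), TimeUnit.MILLISECONDS);
            }
        });
    }

    public static void main(String[] args) {
        ExecutorService workers = Executors.newFixedThreadPool(2);
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        try {
            AtomicInteger calls = new AtomicInteger();
            TimerRetrySketch pool = new TimerRetrySketch(workers, timer);
            String value = pool.submit(() -> {
                if (calls.incrementAndGet() < 3) throw new IOException("flaky");
                return "ok";
            }, 3, Duration.ofMillis(10)).join();
            System.out.println(value + " after " + calls.get() + " attempts");
        } finally {
            timer.shutdownNow();
            workers.shutdownNow();
        }
    }
}
```

Because the wait lives in the timer's queue rather than on a worker thread, a pool of two workers can have many tasks in backoff at once without losing capacity.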
Observability
See what the pool is doing — without instrumenting your task code:
- Listeners: onRetry / onSuccess / onExhausted / onAbort fire on every transition; bridge them to Micrometer, StatsD, or logs.
- stats(): an immutable snapshot of submitted / succeeded / exhausted / aborted / retried / timed-out / rejected, plus active + queued counts. Scrape it for a dashboard or a health check.
- Logs: via System.Logger, routed to your existing backend. Nothing to wire.
- Latency: TaskEvent.attemptDuration (per attempt) and stats().totalExecutionMillis (aggregate) give you timing, not just counts.
RetryExecutor executor = RetryExecutor.builder()
    .retryPolicy(policy)
    .listener(new RetryListener() {
        @Override public void onRetry(TaskEvent e) { meter.counter("pool.retry", "task", e.taskName()).increment(); }
        @Override public void onExhausted(TaskEvent e) { meter.counter("pool.exhausted", "task", e.taskName()).increment(); }
    })
    .build();
RetryExecutorStats s = executor.stats(); // point-in-time snapshot
log.info("succeeded={} exhausted={} retries={} timedOut={}",
    s.succeeded(), s.exhausted(), s.retriesScheduled(), s.timedOut());
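One way counters like these can be kept without locks is LongAdder plus an immutable record for the snapshot. This is a sketch under that assumption; `StatsSketch` and its method names are hypothetical and say nothing about how the library stores its stats.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch of lock-free counters with an immutable snapshot. */
final class StatsSketch {
    /** Immutable point-in-time view; safe to hand to a dashboard or health check. */
    record Snapshot(long submitted, long succeeded, long exhausted, long retried) {}

    private final LongAdder submitted = new LongAdder();
    private final LongAdder succeeded = new LongAdder();
    private final LongAdder exhausted = new LongAdder();
    private final LongAdder retried = new LongAdder();

    // Each increment is contention-friendly and never blocks a worker.
    void recordSubmitted() { submitted.increment(); }
    void recordSucceeded() { succeeded.increment(); }
    void recordExhausted() { exhausted.increment(); }
    void recordRetry()     { retried.increment(); }

    Snapshot snapshot() {
        // The four sums are read one after another, so under concurrent
        // updates the snapshot is approximate rather than atomic.
        return new Snapshot(submitted.sum(), succeeded.sum(), exhausted.sum(), retried.sum());
    }

    public static void main(String[] args) {
        StatsSketch stats = new StatsSketch();
        stats.recordSubmitted();
        stats.recordRetry();
        stats.recordSucceeded();
        System.out.println(stats.snapshot());
    }
}
```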
Lifecycle & control
- AutoCloseable: use try-with-resources; close() stops new submits and drains in-flight plus already-scheduled retries before returning.
- Owns only what it makes: it shuts down its internal pool; a pool you pass in stays yours to close.
- Cancellation: future.cancel(true) interrupts the running attempt and cancels the pending retry. Cancelled ≠ exhausted, so no spurious onExhausted.
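A plain CompletableFuture does not propagate cancel() to anything by itself, so a pool has to wire that up. The sketch below shows one possible wiring for the pending-retry half only, using JDK classes; `CancelSketch.retryLater` is a hypothetical helper, and for brevity the attempt runs on the timer thread rather than being handed to a worker pool.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: cancelling the caller's future drops the pending retry. */
final class CancelSketch {
    static <T> CompletableFuture<T> retryLater(ScheduledExecutorService timer,
                                               Callable<T> attempt, Duration delay) {
        CompletableFuture<T> result = new CompletableFuture<>();
        // Simplification: a real pool would hand the attempt to a worker here.
        ScheduledFuture<?> pending = timer.schedule(() -> {
            try {
                result.complete(attempt.call());
            } catch (Exception e) {
                result.completeExceptionally(e);
            }
        }, delay.toMillis(), TimeUnit.MILLISECONDS);
        // cancel() completes the future, which fires this callback; only a
        // cancellation (not a normal result or failure) drops the pending retry.
        result.whenComplete((value, error) -> {
            if (result.isCancelled()) {
                pending.cancel(true);
            }
        });
        return result;
    }
}
```

A caller would use it as `CancelSketch.retryLater(timer, task, Duration.ofSeconds(1)).cancel(true)`; with `setRemoveOnCancelPolicy(true)` on a ScheduledThreadPoolExecutor, the cancelled retry also leaves the timer queue immediately.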
Robustness
- Fail-fast config: the builder validates at build(): maxRetries >= 0, positive durations, and a class listed in both retryOn and abortOn is rejected.
- Overflow-safe backoff: exponential delays cap cleanly instead of overflowing; jitter is full jitter over [0, delay].
- Correct under load: the scheduler thread never runs your code (attempts and listeners run on the work pool), and stats are lock-free.
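As an illustration of the backoff point, here is one way to compute a capped exponential delay without overflow and then apply full jitter over [0, delay]. It is a sketch, not the library's code: `BackoffSketch`, the doubling factor, and the 1-based retry numbering are assumptions.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical sketch of overflow-safe exponential backoff with full jitter. */
final class BackoffSketch {
    /** Returns min(maxMillis, baseMillis * 2^(retry - 1)) for a 1-based retry number. */
    static long cappedDelayMillis(long baseMillis, long maxMillis, int retry) {
        if (baseMillis <= 0 || maxMillis < baseMillis || retry < 1) {
            throw new IllegalArgumentException("need 0 < base <= max and retry >= 1");
        }
        int shift = retry - 1;
        // Compare against the cap shifted down instead of shifting the base up:
        // the left shift is only performed when its result is known to fit.
        if (shift >= 63 || baseMillis > (maxMillis >> shift)) {
            return maxMillis;
        }
        return baseMillis << shift;
    }

    /** Full jitter: a uniform value in [0, delayMillis], both ends included. */
    static long fullJitterMillis(long delayMillis) {
        // nextLong(bound) excludes the bound, so add 1 (guarding Long.MAX_VALUE).
        long bound = Math.min(delayMillis, Long.MAX_VALUE - 1) + 1;
        return ThreadLocalRandom.current().nextLong(bound);
    }

    public static void main(String[] args) {
        for (int retry = 1; retry <= 8; retry++) {
            long delay = cappedDelayMillis(100, 5_000, retry);
            System.out.println("retry " + retry + ": cap=" + delay
                    + " ms, jittered=" + fullJitterMillis(delay) + " ms");
        }
    }
}
```

With a 100 ms base and a 5 s cap, the un-jittered delays run 100, 200, 400, 800, 1600, 3200 and then stay at 5000, however large the retry number gets.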
Zero dependencies
Logging goes through the JDK’s System.Logger facade (Java 9+). It routes to SLF4J or Log4j when their System.Logger bridge is on the classpath and uses the JDK’s default backend otherwise. You add one artifact and nothing else comes with it.
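For readers who have not used System.Logger, a minimal sketch of what logging through the facade looks like. The logger name "retry.sketch" and the helper `logRetry` are hypothetical, and the DEBUG level is an assumption, not the library's actual logging.

```java
import java.lang.System.Logger;
import java.lang.System.Logger.Level;

/** Hypothetical sketch of logging through the JDK facade with no extra dependency. */
final class LoggingSketch {
    // The JDK resolves the backend: a LoggerFinder supplied by a bridge if one
    // is present, otherwise java.util.logging when that module is available.
    private static final Logger LOG = System.getLogger("retry.sketch");

    static void logRetry(String taskName, int attempt, long delayMillis) {
        // The Supplier overload builds the message only if DEBUG is enabled.
        LOG.log(Level.DEBUG, () -> "task=" + taskName + " attempt=" + attempt
                + " retrying in " + delayMillis + " ms");
    }

    public static void main(String[] args) {
        logRetry("fetch-user", 2, 200);
    }
}
```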
Agent-first
Built so an AI agent can use it from the examples alone — the same way my profile is built for humans and agents at once:
- llms.txt: discovery index pointing agents at the docs.
- docs/AI_USAGE.md: full public surface + a recipe per feature.
- AGENTS.md: build/test/conventions for agents editing the library.
- Docs = compiling tests: every recipe is a real test in ExamplesTest. Change the API and the examples stop compiling, so the build fails. The docs can’t drift from the code.
// from ExamplesTest: compiles and passes on every build
@Test
void exhaustionSurfacesLastFailure() {
    RetryPolicy policy = RetryPolicy.builder()
        .maxRetries(2).backoff(Backoff.fixed(Duration.ofMillis(5))).build();
    try (RetryExecutor executor = RetryExecutor.builder().retryPolicy(policy).build()) {
        CompletableFuture<String> result =
            executor.submit("doomed", () -> { throw new IOException("permanent"); });
        ExecutionException thrown = assertThrows(ExecutionException.class, result::get);
        RetryExhaustedException cause =
            assertInstanceOf(RetryExhaustedException.class, thrown.getCause());
        assertEquals(3, cause.attempts()); // 1 initial + 2 retries
        assertInstanceOf(IOException.class, cause.getCause());
    }
}
Try it
<dependency>
    <groupId>io.github.adrijshikhar</groupId>
    <artifactId>retry-thread-pool</artifactId>
    <version>0.2.0</version>
</dependency>
- Repo: github.com/adrijshikhar/retry-thread-pool
- API docs: javadoc.io
- For agents: llms.txt · docs/AI_USAGE.md
Retries belong wherever your work runs. If your work runs on a pool, they belong on the pool.