
Conversation

@sohankunkerkar (Member) commented Jan 20, 2026

When a non-TAS pod terminates or is deleted, capacity is freed on the node. This fix requeues inadmissible workloads to reconsider the freed capacity.

What type of PR is this?

/kind bug

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #8653

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Requeue inadmissible workloads after non-TAS pod finishes

Copilot AI review requested due to automatic review settings January 20, 2026 21:43
@k8s-ci-robot added the release-note and kind/bug labels on Jan 20, 2026
netlify bot commented Jan 20, 2026

Deploy Preview for kubernetes-sigs-kueue canceled.

Latest commit: d7f11ef
Latest deploy log: https://app.netlify.com/projects/kubernetes-sigs-kueue/deploys/69718fae1dec9400080481d2

@k8s-ci-robot added the size/M and cncf-cla: yes labels on Jan 20, 2026
Copilot AI left a comment

Pull request overview

This PR fixes a bug where inadmissible workloads were not automatically requeued when non-TAS pods terminated or were deleted, potentially causing workload starvation. The fix adds queue manager access to the NonTasUsageReconciler and calls QueueInadmissibleWorkloads after the cache is updated, following the same pattern used in the TAS ResourceFlavor controller.
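A minimal sketch of this pattern, assuming a hypothetical queueManager interface as a stand-in for Kueue's queue manager (only the QueueInadmissibleWorkloads name is taken from the description above; the types and reconcile body are illustrative, not the PR's actual diff):

```go
package tas

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
)

// queueManager models only the one queue-manager capability this fix
// relies on; Kueue's real interface is larger and may differ in signature.
type queueManager interface {
	// QueueInadmissibleWorkloads moves inadmissible workloads back into
	// the active queues so the scheduler reconsiders them.
	QueueInadmissibleWorkloads(ctx context.Context)
}

// NonTasUsageReconciler tracks non-TAS pod usage per node (cache elided).
type NonTasUsageReconciler struct {
	queues queueManager
}

func (r *NonTasUsageReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// ... update the cached non-TAS usage for the pod's node here ...

	// The pod terminated or was deleted, so node capacity was freed;
	// requeue inadmissible workloads so they are re-evaluated against it.
	r.queues.QueueInadmissibleWorkloads(ctx)
	return ctrl.Result{}, nil
}
```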

Changes:

  • Added queue manager to NonTasUsageReconciler to enable requeuing of inadmissible workloads
  • Implemented automatic requeue when non-TAS pods terminate or are deleted, freeing capacity
  • Removed workaround code from integration tests that manually triggered requeuing

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

  • pkg/controller/tas/non_tas_usage_controller.go — Added queue manager parameter and requeueInadmissibleWorkloads method; triggers requeue when pods are deleted or terminated
  • pkg/controller/tas/controllers.go — Passes queue manager to NonTasUsageReconciler constructor
  • test/integration/singlecluster/tas/tas_test.go — Removed manual requeue workarounds now that automatic requeuing is fixed
  • test/integration/singlecluster/tas/suite_test.go — Cleaned up global qManager variable that was only needed for the workaround


@mimowo (Contributor) commented Jan 21, 2026

/assign @gabesaba
ptal

@gabesaba (Contributor) commented

This solution works, but has the same issues @mimowo raised in this thread: #8484 (comment)

We need some combination of the suggestions @mimowo made here: #8484 (comment)

  1. Filter for non-TAS pods (TAS pods are already handled by TAS Workloads).
  2. Batch the requests in time, like 1min or so.
  3. Probably only requeue the ClusterQueues affected by the change (using flavors matching the affected nodes).

Item 1 is already accomplished via event filters. I think item 3 is quite tricky.

My recommendation would be to go with option 2. You may be able to use client-go's workqueue library (https://pkg.go.dev/k8s.io/client-go/util/workqueue#pkg-overview): add a dummy item (since, without implementing 3, we are requeueing everything anyway), and then requeue everything in a batched manner when the dummy item is picked up for work; see the sketch below.
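A minimal sketch of that approach, assuming client-go's typed delaying workqueue (batchedRequeuer, the sentinel item, and the requeueAll callback are illustrative names, not Kueue code):

```go
package tas

import (
	"time"

	"k8s.io/client-go/util/workqueue"
)

// A single dummy item; the queue's dedup coalesces repeated adds of it.
const sentinel = "requeue-inadmissible"

type batchedRequeuer struct {
	queue       workqueue.TypedDelayingInterface[string]
	batchPeriod time.Duration
	requeueAll  func() // e.g. wraps the queue manager's requeue-everything call
}

func newBatchedRequeuer(batchPeriod time.Duration, requeueAll func()) *batchedRequeuer {
	return &batchedRequeuer{
		queue:       workqueue.NewTypedDelayingQueue[string](),
		batchPeriod: batchPeriod,
		requeueAll:  requeueAll,
	}
}

// notify is called from the pod event handler. Re-adding the sentinel while
// it is already waiting is coalesced by the delaying queue, so roughly one
// requeue fires per batch period no matter how many pods finish.
func (b *batchedRequeuer) notify() {
	b.queue.AddAfter(sentinel, b.batchPeriod)
}

// run consumes the sentinel and requeues all inadmissible workloads.
func (b *batchedRequeuer) run() {
	for {
		item, shutdown := b.queue.Get()
		if shutdown {
			return
		}
		b.requeueAll()
		b.queue.Done(item)
	}
}
```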

@mimowo (Contributor) commented Jan 21, 2026

Yes, using a dedicated workqueue with a delay of 1min SGTM. We do something similar in core Kubernetes, in the Job controller for clearing orphaned Pods; PTAL: https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/job/job_controller.go#L189

Of course, you could parametrize the workqueue (or the entire controller) by the batch time, and for testing use a smaller value, say 5s, while in production we would use something like 1min.
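Wiring the batch time through the constructor from the sketch above might then look like this (period values as suggested; names remain hypothetical):

```go
// Production wiring: coalesce pod-termination events over one minute.
requeuer := newBatchedRequeuer(time.Minute, func() {
	// e.g. call the queue manager's QueueInadmissibleWorkloads here
})
go requeuer.run()

// Integration tests would pass a shorter period instead, e.g.:
// newBatchedRequeuer(5*time.Second, requeueAll)
```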

@k8s-ci-robot (Contributor) commented

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sohankunkerkar
Once this PR has been reviewed and has the lgtm label, please ask for approval from gabesaba. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the size/L label and removed the size/M label on Jan 22, 2026
When a non-TAS pod terminates or is deleted, capacity is freed on
the node. This fix requeues inadmissible workloads to reconsider
the freed capacity.

Signed-off-by: Sohan Kunkerkar <sohank2602@gmail.com>
@sohankunkerkar (Member, Author) commented

@gabesaba @mimowo could you PTAL?


Labels

cncf-cla: yes, kind/bug, release-note, size/L


Development

Successfully merging this pull request may close these issues.

Requeue relevant inadmissible workloads after a non-TAS workload finishes

4 participants