Estimating Consumption on VLQ

How Time Is Accounted

Consumption is accounted in QPU seconds. Two components are accounted per shot:

Component	Cost per Shot
Circuit execution overhead	up to 50 µs (variable)
Circuit postprocessing overhead	approximately 350 µs (fixed)
Billed per shot	up to approximately 400 µs

Key point: Only these two components count toward your consumption. Latency incurred while waiting for the job to start and return results is not billed.

Every job submission carries an overhead of 3 to 15 seconds of latency — covering queuing, job scheduling, and result retrieval. This is wall-clock time you will observe, but it does not appear in your consumption bill. Do not factor it into your usage estimate.

Estimating Your Consumption

Per-Shot Cost

Use 400 µs (0.4 ms) as your worst-case per-shot figure. For typical circuits that are shorter or simpler, your circuit component will be less than 50 µs, bringing the per-shot cost closer to 350 µs.

Minimum Per-Job Cost

Due to the internals of the VLQ built-in scheduler, there is also a minimum cost per job. Each job is billed at least 0.9 s, even if its shots add up to less.

Formula

Total billed time [s] = Number of jobs × max ( Number of shots × Cost per shot [s], 0.9 )

Worst case:   Total billed time [s] = jobs × max ( shots × 400e-6, 0.9 )
Typical case: Total billed time [s] = jobs × max ( shots × (350e-6 + actual circuit time), 0.9 )
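
The formula above can be sketched as a small Python helper. This is an illustrative estimate only, not an official VLQ tool: the function and constant names are hypothetical, and it assumes every job uses the same number of shots and the same circuit time.

```python
# Figures taken from the accounting table; all names here are hypothetical.
POSTPROCESSING_SECONDS = 350e-6   # fixed postprocessing overhead per shot
MAX_CIRCUIT_SECONDS = 50e-6       # worst-case circuit execution overhead per shot
MIN_JOB_SECONDS = 0.9             # minimum billed time per job


def estimate_billed_seconds(jobs, shots, circuit_seconds=MAX_CIRCUIT_SECONDS):
    """Estimate billed QPU seconds for `jobs` identical jobs of `shots` shots each.

    With the default `circuit_seconds` this is the worst-case estimate
    (400 µs per shot); pass the actual circuit time for the typical case.
    """
    if jobs < 0 or shots < 0 or circuit_seconds < 0:
        raise ValueError("jobs, shots and circuit_seconds must be non-negative")
    per_shot = POSTPROCESSING_SECONDS + circuit_seconds
    # Each job is billed for its shots or for the per-job minimum, whichever is larger.
    return jobs * max(shots * per_shot, MIN_JOB_SECONDS)
```

For instance, `estimate_billed_seconds(3, 10_000)` gives the worst-case figure for three jobs of 10,000 shots, and `estimate_billed_seconds(1, 10_000, circuit_seconds=20e-6)` the estimate for a single job with a 20 µs circuit.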

Examples

Jobs	Shots	Circuit time	Billed time
4	1,000	50 µs (max)	4 × max ( 1000 × (350+50) × 1e-6, 0.9 ) = 3.6 s
3	10,000	50 µs (max)	3 × max ( 10000 × (350+50) × 1e-6, 0.9 ) = 12.0 s
1	10,000	20 µs	1 × max ( 10000 × (350+20) × 1e-6, 0.9 ) = 3.7 s
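
The first row above is billed at the 0.9 s minimum rather than by its shots, because 1,000 shots at 400 µs come to only 0.4 s. The sketch below computes the shot count at which a job's per-shot cost reaches that minimum. The helper name is hypothetical, and integer microseconds are used to avoid floating-point rounding.

```python
MIN_JOB_US = 900_000  # 0.9 s minimum per-job cost, in microseconds


def break_even_shots(per_shot_us=400):
    """Smallest shot count whose per-shot cost reaches the 0.9 s per-job minimum."""
    if per_shot_us <= 0:
        raise ValueError("per_shot_us must be positive")
    # Ceiling division on integers: -(-a // b) == ceil(a / b)
    return -(-MIN_JOB_US // per_shot_us)


print(break_even_shots(400))  # worst case: 2250
print(break_even_shots(370))  # 20 µs circuit: 2433
```

At the worst-case rate, any job with fewer than 2,250 shots is billed the flat 0.9 s, so batching small runs into fewer, larger jobs reduces billed time.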

Practical Tips

Use worst-case (400 µs/shot) when budgeting — it gives you a safe upper bound.
Expect 3–15 s of extra wall time per job submission regardless of shot count. Plan your experiment timing accordingly, but do not count this toward usage.
Asynchronous execution: To minimize wall-clock time, submit multiple jobs asynchronously and let them queue in parallel. Once all jobs are submitted, use the job.wait_for_final_state() method to wait for each one to complete, then retrieve all results together. This way the latency overhead of each job overlaps rather than stacking sequentially.
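
The asynchronous pattern from the last tip might look roughly like the sketch below. Only `job.wait_for_final_state()` comes from the text above. The `backend.run(circuit, shots=...)` call and the `job.result()` call are assumptions based on the common Qiskit-style job interface, and the function name is hypothetical. Check the VLQ client documentation for the actual submission call.

```python
def run_jobs_concurrently(backend, circuits, shots):
    """Submit all circuits first, then wait for and collect every result.

    Assumes `backend.run(circuit, shots=...)` returns immediately with a job
    object exposing `wait_for_final_state()` and `result()` (Qiskit-style).
    """
    # Submit everything up front so the per-job latency overlaps.
    jobs = [backend.run(circuit, shots=shots) for circuit in circuits]

    # Block until each job has reached a final state.
    for job in jobs:
        job.wait_for_final_state()

    # Retrieve all results together, in submission order.
    return [job.result() for job in jobs]
```

Submitting in one loop and waiting in a second loop is what lets the latencies overlap. Calling `wait_for_final_state()` right after each submission would serialize the jobs again.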