Image generation
Text-to-image and image-to-image generation using Stable Diffusion.
Overview
Image generation uses qvac-ext-stable-diffusion.cpp as the inference engine. Load a supported model using modelType: "diffusion". Then, provide a text prompt describing the image to generate.
For image-to-image, also pass init_image (a Uint8Array of PNG bytes) — the model transforms the input guided by the prompt instead of starting from noise.
diffusion() returns one or more PNG images as Uint8Array buffers. Use progressStream to track generation progress step-by-step.
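As a quick sketch of that call shape (assuming a model has already been loaded and its modelId is in scope; the prompt is only illustrative):
import { diffusion } from "@qvac/sdk";
// progressStream yields one event per denoising step.
const { progressStream, outputs } = diffusion({ modelId, prompt: "a lighthouse at dawn" });
for await (const { step, totalSteps } of progressStream) {
  console.log(`step ${step}/${totalSteps}`);
}
// outputs resolves to an array of PNG-encoded Uint8Array buffers.
const images = await outputs;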
Functions
Use the following sequence of function calls:
- loadModel(): load a diffusion model and get back a modelId.
- diffusion(): generate one or more images from a prompt.
- unloadModel(): release the model when you are done.
For how to use each function, see SDK — API reference.
Models
Supported model families and their file layouts:
- SD1.x, SD2.x: single all-in-one *.gguf file. No companion files needed.
- SDXL, SD3: may require separate CLIP/T5 text encoder files (clipLModelSrc, clipGModelSrc, t5XxlModelSrc) in modelConfig, depending on the model variant.
- FLUX.2-klein: split layout. Diffusion model *.gguf + LLM text encoder *.gguf (via llmModelSrc) + VAE *.safetensors (via vaeModelSrc).
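For the SDXL/SD3 layout, a loadModel() call might look like the following sketch; the file paths are placeholders, and which encoder options a given variant actually needs depends on that variant:
import { loadModel } from "@qvac/sdk";
// Placeholder paths: substitute the text encoder files your SDXL/SD3 variant ships with.
const modelId = await loadModel({
  modelSrc: "./sd3_medium.gguf",
  modelType: "diffusion",
  modelConfig: {
    clipLModelSrc: "./clip_l.gguf",
    clipGModelSrc: "./clip_g.gguf",
    t5XxlModelSrc: "./t5xxl.gguf",
  },
});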
For models available as constants, see SDK — Models.
Examples
Stable Diffusion
The following script shows a minimal text-to-image generation example using a single all-in-one SD 2.1 model:
import { loadModel, unloadModel, diffusion, SD_V2_1_1B_Q8_0 } from "@qvac/sdk";
import fs from "fs";
// Minimal diffusion example — single GGUF model, no companion files needed.
// Works with SD 1.x / 2.x all-in-one models.
const modelSrc = process.argv[2] || SD_V2_1_1B_Q8_0;
const prompt = process.argv[3] || "a photo of a cat sitting on a windowsill";
const modelId = await loadModel({
modelSrc,
modelType: "diffusion",
modelConfig: { prediction: "v" },
});
const { outputs } = diffusion({ modelId, prompt });
const buffers = await outputs;
fs.writeFileSync("output.png", buffers[0]);
console.log("Saved: output.png");
await unloadModel({ modelId, clearStorage: false });
process.exit(0);
FLUX.2-klein
The following script shows text-to-image generation using FLUX.2-klein with its split-layout model (separate diffusion model, LLM text encoder, and VAE):
import { loadModel, unloadModel, diffusion, FLUX_2_KLEIN_4B_Q4_0, FLUX_2_KLEIN_4B_VAE, QWEN3_4B_Q4_K_M } from "@qvac/sdk";
import fs from "fs";
import path from "path";
// FLUX.2 [klein] uses a split-layout: separate diffusion model + LLM text encoder + VAE
const diffusionModelSrc = process.argv[2] || FLUX_2_KLEIN_4B_Q4_0;
const llmModelSrc = process.argv[3] || QWEN3_4B_Q4_K_M;
const vaeModelSrc = process.argv[4] || FLUX_2_KLEIN_4B_VAE;
const prompt = process.argv[5] || "a futuristic city at sunset, photorealistic";
const outputDir = process.argv[6] || ".";
console.log("Loading FLUX.2 [klein] split-layout model...");
const modelId = await loadModel({
modelSrc: diffusionModelSrc,
modelType: "diffusion",
modelConfig: {
device: "gpu",
threads: 4,
llmModelSrc,
vaeModelSrc,
},
onProgress: (p) => console.log(`Loading: ${p.percentage.toFixed(1)}%`),
});
console.log(`Model loaded: ${modelId}`);
console.log(`\nGenerating: "${prompt}"`);
const { progressStream, outputs, stats } = diffusion({
modelId,
prompt,
width: 512,
height: 512,
steps: 20,
guidance: 3.5,
cfg_scale: 1,
seed: -1,
});
for await (const { step, totalSteps } of progressStream) {
process.stdout.write(`\rStep ${step}/${totalSteps}`);
}
console.log();
const buffers = await outputs;
for (let i = 0; i < buffers.length; i++) {
const outputPath = path.join(outputDir, `flux2_${i}.png`);
fs.writeFileSync(outputPath, buffers[i]);
console.log(`Saved: ${outputPath}`);
}
console.log("\nStats:", await stats);
await unloadModel({ modelId, clearStorage: false });
console.log("Done.");
process.exit(0);
Image-to-image
Pass init_image to transform an existing image guided by a text prompt. Behavior depends on the model family:
- SD / SDXL / SD3: SDEdit-style. Use strength to control how much the source is preserved (0 = keep source, 1 = ignore source).
- FLUX.2: in-context conditioning. Requires prediction: "flux2_flow" in modelConfig at loadModel() time; strength is ignored on this path (see the sketch below).
The following script loads an SD 2.1 model and transforms an input image using strength: 0.5:
import { loadModel, unloadModel, diffusion, SD_V2_1_1B_Q8_0 } from "@qvac/sdk";
import fs from "fs";
import path from "path";
// img2img example — transforms an input image guided by a text prompt.
const inputPath = process.argv[2];
const prompt = process.argv[3] || "oil painting style, vibrant colors";
const outputDir = process.argv[4] || ".";
const modelSrc = process.argv[5] || SD_V2_1_1B_Q8_0;
if (!inputPath) {
console.error("❌ Error: input image path is required");
console.error("Usage: bun run bare:example dist/examples/diffusion-img2img.js <inputImage> [prompt] [outputDir] [modelSrc]");
process.exit(1);
}
try {
console.log("Loading diffusion model...");
const modelId = await loadModel({
modelSrc,
modelType: "diffusion",
});
console.log(`Model loaded: ${modelId}`);
const init_image = new Uint8Array(fs.readFileSync(inputPath));
console.log(`\nTransforming "${inputPath}" with prompt: "${prompt}"`);
const { progressStream, outputs, stats } = diffusion({
modelId,
prompt,
init_image,
strength: 0.5,
steps: 30,
seed: -1,
});
for await (const { step, totalSteps } of progressStream) {
process.stdout.write(`\rStep ${step}/${totalSteps}`);
}
console.log();
const buffers = await outputs;
for (let i = 0; i < buffers.length; i++) {
const outputPath = path.join(outputDir, `img2img_${i}.png`);
fs.writeFileSync(outputPath, buffers[i]);
console.log(`Saved: ${outputPath}`);
}
console.log("\nStats:", await stats);
await unloadModel({ modelId, clearStorage: false });
console.log("Done.");
process.exit(0);
}
catch (error) {
console.error("❌ Error:", error);
process.exit(1);
}
Tip: all examples throughout this documentation are self-contained and runnable. For instructions on how to run them, see SDK quickstart.