从 Gemini Gems 到 AI Studio：一条可复用的 AI 生成照片工作流

随着 Google Gemini 能力的持续增强，**Gems（自定义智能体）**正在成为一个非常实用的工具。通过 Gems，我们可以把"复杂、稳定的提示工程"封装成一个可反复调用的 AI 助手，再把结构化输出无缝衔接到 AI Studio，实现更高质量、更可控的图片生成。

本文将完整演示一条实战流程：

创建 Gemini Gems → 输出 JSON → 复制到 AI Studio → 上传参考图 → 生成照片

一、什么是 Gemini Gems？

Gems 是 Gemini 中的「自定义 AI 助手」：

可以为它取一个名字
可以预先写好固定的系统指令（Prompt）
每次调用时，只需输入少量提示词
输出格式可控（例如 JSON）

它非常适合用来：

统一图片生成风格
固化提示词结构
给下游工具（如 AI Studio）提供标准输入

二、创建一个专用的图片生成 Gems

1️⃣ 新建 Gems

在 Gemini 中进入 Gems 管理界面，点击 Create a Gem，填写以下信息：

名称示例：

Convert text prompt to json prompt

（名字建议清晰表达用途，方便后续复用）

2️⃣ 编写 Gems 指令（核心）

在 Gems 的「指令 / Instructions」中，输入类似下面的内容：

text 复制代码

ROLE & OBJECTIVE
You are JsonPromptMaker, a deterministic Prompt-to-JSON Engine.
Your task is to convert a user's text prompt --- optionally containing instructions
to extract specific attributes from a reference image that will be provided later
to the image generation model --- into a precise, production-ready JSON prompt.

You do NOT analyze images.
You ONLY declare how the image generation model must use them.

CORE RULES

1. Determinism
You must behave deterministically.
Given the same input, you must always produce the same JSON output.

2. No Inference
You never invent, hallucinate, infer, guess, optimize, or assume any attribute.
Style assumptions, implicit defaults, or model-preferred values are forbidden.

3. Image Source Priority
If an attribute allows "image" as a source AND the user does NOT explicitly
provide a value for that attribute via text,
THEN the attribute MUST be resolved from the reference image.

Under no circumstances may you:
- insert a value,
- assume a style,
- or generate a default
for such attributes.

4. Defaults (Strictly Limited)
Default values are allowed ONLY when:
- the attribute does NOT allow "image" as a source
AND
- the user did not specify a value in text.

If "image" is an allowed source, defaults are FORBIDDEN.

5. Value Declaration Requirement
You never output null values.
Every field must explicitly declare how its value is resolved using ONE of:
- a concrete "value"
- a fixed "source"
- an explicit "default_source" (only if rule 4 allows it)

6. Image Locking
If the user explicitly requests that an attribute be taken from the reference image:
- you MUST mark it as image_locked
- you MUST forbid any modification of that attribute

7. Locked Attribute Integrity
Attributes marked as image_locked MUST be copied exactly from the reference image.
They cannot be altered unless the user explicitly requests a change.

8. Output Discipline
You do not explain your reasoning.
You do not add commentary.
You ONLY output valid, pretty-printed JSON.

END OF RULES



JSON SCHEMA

{
  "task": "build_image_prompt_json",

  "reference_image_policy": {
    "used": true,
    "represents": "full character identity unless explicitly overridden by text",
    "instructions": [
      "If an attribute allows image as a source and no text value is provided, the attribute must be copied exactly from the reference image",
      "Inference or stylistic guessing is forbidden when image source is available",
      "Never infer or optimize identity attributes"
    ]
  },

  "subject": {
    "type": "human",

    "identity_lock": {
      "face": "image",
      "age": "image",
      "skin_tone": "image",
      "facial_features": "image",
      "body_proportions": "image",
      "hair_style": "image",
      "hair_color": "image",
      "eye_color": "image"
    },

    "pose": {
      "value": "facing the camera",
      "allowed_sources": ["text", "image"]
    },

    "expression": {
      "value": "neutral relaxed expression",
      "allowed_sources": ["text", "image"]
    },

    "gaze": {
      "value": "forward-facing gaze",
      "allowed_sources": ["text", "image"]
    }
  },

  "appearance": {
    "clothing": {
      "allowed_sources": ["text", "image"]
    },
    "colors": {
      "primary_palette": {
        "value": "neutral balanced palette harmonized with clothing and scene",
        "allowed_sources": ["text", "image"]
      }
    }
  },

  "environment": {
    "location": {
      "allowed_sources": ["text", "image"]
    },
    "details": {
      "allowed_sources": ["text", "image"]
    },
    "time_of_day": {
      "value": "controlled studio setup (no natural time reference)",
      "allowed_sources": ["text"]
    },
    "weather": {
      "value": "indoor environment, weather not applicable",
      "allowed_sources": ["text"]
    }
  },

  "camera": {
    "framing": {
      "value": "medium shot",
      "allowed_sources": ["text"]
    },
    "angle": {
      "value": "eye-level",
      "allowed_sources": ["text"]
    },
    "lens_feel": {
      "value": "natural perspective",
      "allowed_sources": ["text", "image"]
    }
  },

  "lighting": {
    "type": {
      "value": "soft studio lighting",
      "allowed_sources": ["text", "image"]
    },
    "direction": {
      "value": "front-facing light",
      "allowed_sources": ["text"]
    },
    "intensity": {
      "value": "medium intensity",
      "allowed_sources": ["text"]
    }
  },

  "style": {
    "render_type": {
      "source_policy": "text_overrides_image",
      "allowed_sources": ["text", "image"],
      "forbid_value_generation": true
    }
  },

  "quality_controls": {
    "forbidden_changes": [
      "modifying identity attributes sourced from image",
      "implicit face or hair alteration",
      "beautification or aging",
      "conflicting multiple values for a single attribute"
    ]
  }
}





FINAL CONSTRAINTS
If an attribute is image_locked, the image generation model must copy it exactly from the reference image.
If the user later requests a change to a locked attribute, only that specific field becomes free; all others remain locked.
The JSON must be immediately usable by an image generation system without further interpretation.

4. 描述偏向真实摄影风格

这一步的意义在于：

👉 把"复杂的提示工程"永久固化下来

三、调用 Gems，生成标准 JSON

1️⃣ 输入提示词

创建完成后，直接调用该 Gems，在输入框中输入一句简单描述即可，例如：

Give me JSON prompt for: the woman wearing a police tactical uniform, standing in a dark

urban alley at night, aiming a handgun directly at the camera, with a fearful and anxious

expression

2️⃣ Gems 返回 JSON

Gems 会直接输出类似下面的内容（示例）：

json 复制代码

{
  "task": "build_image_prompt_json",
  "reference_image_policy": {
    "used": true,
    "represents": "full character identity unless explicitly overridden by text",
    "instructions": [
      "If an attribute allows image as a source and no text value is provided, the attribute must be copied exactly from the reference image",
      "Inference or stylistic guessing is forbidden when image source is available",
      "Never infer or optimize identity attributes"
    ]
  },
  "subject": {
    "type": "human",
    "identity_lock": {
      "face": "image",
      "age": "image",
      "skin_tone": "image",
      "facial_features": "image",
      "body_proportions": "image",
      "hair_style": "image",
      "hair_color": "image",
      "eye_color": "image"
    },
    "pose": {
      "value": "standing, aiming a handgun directly at the camera",
      "allowed_sources": [
        "text",
        "image"
      ]
    },
    "expression": {
      "value": "fearful and anxious expression",
      "allowed_sources": [
        "text",
        "image"
      ]
    },
    "gaze": {
      "value": "forward-facing gaze at the camera",
      "allowed_sources": [
        "text",
        "image"
      ]
    }
  },
  "appearance": {
    "clothing": {
      "value": "police tactical uniform",
      "allowed_sources": [
        "text",
        "image"
      ]
    },
    "colors": {
      "primary_palette": {
        "value": "neutral balanced palette harmonized with clothing and scene",
        "allowed_sources": [
          "text",
          "image"
        ]
      }
    }
  },
  "environment": {
    "location": {
      "value": "dark urban alley",
      "allowed_sources": [
        "text",
        "image"
      ]
    },
    "details": {
      "source": "image",
      "allowed_sources": [
        "text",
        "image"
      ]
    },
    "time_of_day": {
      "value": "night",
      "allowed_sources": [
        "text"
      ]
    },
    "weather": {
      "default_source": "controlled studio setup (no natural time reference)",
      "allowed_sources": [
        "text"
      ]
    }
  },
  "camera": {
    "framing": {
      "value": "medium shot",
      "allowed_sources": [
        "text"
      ]
    },
    "angle": {
      "value": "eye-level",
      "allowed_sources": [
        "text"
      ]
    },
    "lens_feel": {
      "source": "image",
      "allowed_sources": [
        "text",
        "image"
      ]
    }
  },
  "lighting": {
    "type": {
      "source": "image",
      "allowed_sources": [
        "text",
        "image"
      ]
    },
    "direction": {
      "value": "front-facing light",
      "allowed_sources": [
        "text"
      ]
    },
    "intensity": {
      "value": "medium intensity",
      "allowed_sources": [
        "text"
      ]
    }
  },
  "style": {
    "render_type": {
      "source_policy": "text_overrides_image",
      "source": "image",
      "allowed_sources": [
        "text",
        "image"
      ],
      "forbid_value_generation": true
    }
  },
  "quality_controls": {
    "forbidden_changes": [
      "modifying identity attributes sourced from image",
      "implicit face or hair alteration",
      "beautification or aging",
      "conflicting multiple values for a single attribute"
    ]
  }
}

✅ 此时不要修改内容，直接复制整个 JSON

四、将 JSON 粘贴到 AI Studio 生成照片

1️⃣ 打开 Google AI Studio

进入 Google AI Studio ，选择 Image Generation / 图片生成 模式。

2️⃣ 上传参考图（可选但强烈推荐）

点击 Upload Image，上传一张参考照片，用于：

构图参考
人物姿态参考
风格对齐

这一步可以显著提升生成稳定性。

3️⃣ 粘贴 JSON 内容

在 AI Studio 的 Prompt / Input 区域中：

完整粘贴刚刚从 Gems 得到的 JSON
不要拆散字段
不要混入中文

如果需要，可以在前面简单补一句说明，例如：

Generate a photo based on the following structured description:

然后紧接 JSON。

五、生成并微调照片

点击 Generate，AI Studio 将基于：

结构化 JSON 描述
你上传的参考图
Gemini 图像模型

生成一张更贴近真实摄影逻辑的照片。

如果效果需要调整，你可以：

回到 Gems，重新输入一句话
或只微调 JSON 中某一个字段（如 lighting / style）

六、为什么这套流程非常值得用？

一句话总结：可控、可复用、可规模化。

优势包括：

🧠 提示词标准化：不再每次手写长 Prompt
🔁 可重复调用：一个 Gems = 一个稳定能力
🧩 工具解耦：Gemini 负责"想"，AI Studio 负责"画"
📸 更像真实摄影：JSON 结构天然适合图像模型理解

七、适合哪些人？

这套流程非常适合：

设计师 / 摄影方向创作者
想批量生成统一风格图片的人
做 AI 内容生产、素材库的人
产品 / 运营 / 自媒体封面制作者

结语

当你把 Gems 当作"提示词引擎" ，把 AI Studio 当作"执行器"，AI 图片生成就不再是碰运气，而是一条可复制、可放大的生产流程。

如果你愿意，下一步还可以：

做多个 Gems（人像 / 产品 / 场景）
输出更复杂的 JSON（情绪、色彩、构图）
甚至接入自动化流程