Structured Output

Purpose: convert the unstructured data returned by a large language model into the structured data the application needs.

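A structured output converter does two core things:

  1. Defines the format: when a request is sent to the model, the format instructions are appended to the prompt, telling the model to respond in that format.
  2. Converts the string to the target type: once the model responds, the returned string is converted into the specified type, for example a Java class.

Converting to a Java Type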
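The two steps can be illustrated without any framework. The following is a minimal sketch, not Spring AI code: `SimpleOutputConverter`, `CommaListConverter`, and `ConverterSketchDemo` are hypothetical names invented for this illustration, and the model reply is a hard-coded stand-in.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical, simplified stand-in for a structured output converter.
interface SimpleOutputConverter<T> {
    String getFormat();      // step 1: instructions appended to the prompt
    T convert(String text);  // step 2: raw model text -> target type
}

// Converts a comma-separated reply into a List<String>.
class CommaListConverter implements SimpleOutputConverter<List<String>> {

    @Override
    public String getFormat() {
        return "Respond with a comma-separated list of values only, without any explanation.";
    }

    @Override
    public List<String> convert(String text) {
        return Arrays.stream(text.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }
}

class ConverterSketchDemo {

    // Step 1: build the final prompt by appending the format instructions.
    static String buildPrompt(String userText, SimpleOutputConverter<?> converter) {
        return userText + System.lineSeparator() + converter.getFormat();
    }

    public static void main(String[] args) {
        CommaListConverter converter = new CommaListConverter();
        System.out.println(buildPrompt("List three colors.", converter));

        // Stand-in for the model's reply; no model is called in this sketch.
        String fakeReply = "red, green , blue";
        // Step 2: convert the raw string into the target type.
        System.out.println(converter.convert(fakeReply));
    }
}
```

A real converter would typically emit a JSON Schema in `getFormat()` and parse JSON in `convert()`; the comma-separated format is used here only to keep the sketch dependency-free.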

java
/**
 * Returns a Java class
 */
@RequestMapping("/5")
public ActorFilms execute5() {
    return chatClient.prompt("Generate the filmography for a random actor.").call().entity(ActorFilms.class);
}

The Java class is as follows:

java
import lombok.Data;
import lombok.experimental.Accessors;

import java.util.List;

@Data
@Accessors(chain = true)
public class ActorFilms {
    private String actor;
    private List<String> films;
}

Under the hood, this is implemented as follows:

java
@RequestMapping("/55")
public ActorFilms execute55() {
    /*
     * Create the converter (its format instructions are generated from the clazz passed to BeanOutputConverter)
     */
    BeanOutputConverter<ActorFilms> converter = new BeanOutputConverter<>(ActorFilms.class);
    /*
     * Fill the converter's format into the prompt template
     */
    PromptTemplate promptTemplate = new PromptTemplate("""
            Generate the filmography for a random actor.
            {format}""");
    Prompt prompt = promptTemplate.create(Map.of("format", converter.getFormat()));
    /*
     * Call the model
     */
    String content = chatClient.prompt(prompt).advisors(new SimpleLoggerAdvisor()).call().content();
    /*
     * Convert to the target type
     */
    return converter.convert(content);
}

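The converter generates different format instructions depending on the Java type to be returned. For example, for ActorFilms the format instructions are as follows (note the schema section):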

text
Your response should be in JSON format.
Do not include any explanations, only provide a RFC8259 compliant JSON response following this format without deviation.
Do not include markdown code blocks in your response.
Remove the ```json markdown from the output.
Here is the JSON Schema instance your output must adhere to:
```{
  "$schema" : "https://json-schema.org/draft/2020-12/schema",
  "type" : "object",
  "properties" : {
    "actor" : {
      "type" : "string"
    },
    "films" : {
      "type" : "array",
      "items" : {
        "type" : "string"
      }
    }
  },
  "additionalProperties" : false
}```

Converting to a List Type

java
@RequestMapping("/6")
public List<ActorFilms> execute6() {
    return chatClient.prompt("Generate the filmography of 5 movies for 周星驰 and 刘德华.").call().entity(new ParameterizedTypeReference<>() {
    });
}

Under the hood:

java
@RequestMapping("/56")
public List<ActorFilms> execute56() {
    /*
     * Create the converter (its format instructions are generated from the type reference passed to BeanOutputConverter)
     */
    BeanOutputConverter<List<ActorFilms>> converter = new BeanOutputConverter<>(new ParameterizedTypeReference<>() {
    });
    /*
     * Fill the converter's format into the prompt template
     */
    PromptTemplate promptTemplate = new PromptTemplate("Tell me the names of 5 movies starring {actor}. {format}");
    Prompt prompt = promptTemplate.create(Map.of("actor", "刘德华", "format", converter.getFormat()));
    /*
     * Call the model
     */
    String content = chatClient.prompt(prompt).advisors(new SimpleLoggerAdvisor()).call().content();
    /*
     * Convert to the target type
     */
    return converter.convert(content);
}

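In practice, these converters do not succeed 100% of the time; the outcome depends heavily on the model chosen. For example, reasoning models (such as qwen3:32b and deepseek-r1) still return a thinking block even with the BeanOutputConverter format instructions in place. The response looks roughly like this: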

text
<think>
xxx
</think>

JSON result string

In this case we need a custom converter that strips the thinking block.
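Before wiring this into a converter, the stripping step can be tried on its own. The following is a minimal sketch using only the JDK; `ThinkTagStripper` is a hypothetical helper name, and the sample response is made up for illustration.

```java
import java.util.regex.Pattern;

class ThinkTagStripper {

    // DOTALL lets '.' match line breaks; the reluctant '.*?' stops at the first
    // closing tag, so text between two separate think blocks is preserved.
    private static final Pattern THINK_BLOCK =
            Pattern.compile("<think>.*?</think>", Pattern.DOTALL);

    static String strip(String response) {
        return THINK_BLOCK.matcher(response).replaceAll("").trim();
    }

    public static void main(String[] args) {
        // Made-up sample of a reasoning model's reply.
        String raw = "<think>\nThe user wants JSON...\n</think>\n\n"
                + "{\"actor\":\"Some Actor\",\"films\":[\"Film A\"]}";
        System.out.println(strip(raw));
    }
}
```

Note that this pattern only removes blocks that have both an opening and a closing tag; a response with an unclosed `<think>` is left unchanged.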

If the returned content is not a valid JSON string, JSON-Repair can be used to repair it.

Custom Type Converter (Thinking-Mode Models)
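As a lighter-weight first step, it can help to cut away any chatter the model adds around the JSON. The sketch below is not the JSON-Repair library and does not repair malformed JSON; it is a naive, hypothetical helper (`JsonSpanExtractor`) that only extracts the outermost object or array span, on the assumption that the surrounding text contains no stray braces or brackets.

```java
class JsonSpanExtractor {

    // Returns the substring from the first '{' or '[' to the last matching
    // '}' or ']'. Falls back to the trimmed input when no such span exists.
    static String extract(String text) {
        int objStart = text.indexOf('{');
        int arrStart = text.indexOf('[');
        int start;
        char close;
        if (objStart >= 0 && (arrStart < 0 || objStart < arrStart)) {
            start = objStart;   // an object opens first
            close = '}';
        } else if (arrStart >= 0) {
            start = arrStart;   // an array opens first
            close = ']';
        } else {
            return text.trim(); // nothing that looks like JSON
        }
        int end = text.lastIndexOf(close);
        return end > start ? text.substring(start, end + 1) : text.trim();
    }

    public static void main(String[] args) {
        String reply = "Here is the result: {\"a\":[1,2]} Hope it helps!";
        System.out.println(extract(reply));
    }
}
```

The extracted span still has to be parsed; if parsing fails, a dedicated repair library is the more robust option.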

java
import java.lang.reflect.Type;
import java.util.Objects;
import java.util.regex.Pattern;

import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.core.util.DefaultIndenter;
import com.fasterxml.jackson.core.util.DefaultPrettyPrinter;
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.ObjectWriter;
import com.fasterxml.jackson.databind.json.JsonMapper;
import com.github.victools.jsonschema.generator.Option;
import com.github.victools.jsonschema.generator.SchemaGenerator;
import com.github.victools.jsonschema.generator.SchemaGeneratorConfig;
import com.github.victools.jsonschema.generator.SchemaGeneratorConfigBuilder;
import com.github.victools.jsonschema.module.jackson.JacksonModule;
import com.github.victools.jsonschema.module.jackson.JacksonOption;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import org.springframework.ai.converter.StructuredOutputConverter;
import org.springframework.ai.model.KotlinModule;
import org.springframework.ai.util.JacksonUtils;
import org.springframework.core.KotlinDetector;
import org.springframework.core.ParameterizedTypeReference;
import org.springframework.lang.NonNull;

import static org.springframework.ai.util.LoggingMarkers.SENSITIVE_DATA_MARKER;

/**
 * Output converter for thinking-mode models.
 * Purpose: strips the <think> block from the model's response.
 */
public class ThinkModelBeanOutputConverter<T> implements StructuredOutputConverter<T> {

	private final Logger logger = LoggerFactory.getLogger(ThinkModelBeanOutputConverter.class);
	private final Pattern pattern = Pattern.compile("<think>.*?</think>", Pattern.DOTALL);

	private final Type type;
	private final ObjectMapper objectMapper;
	private String jsonSchema;

	public ThinkModelBeanOutputConverter(Class<T> clazz) {
		this(ParameterizedTypeReference.forType(clazz));
	}

	public ThinkModelBeanOutputConverter(ParameterizedTypeReference<T> typeRef) {
		this(typeRef.getType(), null);
	}

	public ThinkModelBeanOutputConverter(Class<T> clazz, ObjectMapper objectMapper) {
		this(ParameterizedTypeReference.forType(clazz), objectMapper);
	}

	public ThinkModelBeanOutputConverter(ParameterizedTypeReference<T> typeRef, ObjectMapper objectMapper) {
		this(typeRef.getType(), objectMapper);
	}

	private ThinkModelBeanOutputConverter(Type type, ObjectMapper objectMapper) {
		Objects.requireNonNull(type, "Type cannot be null;");
		this.type = type;
		this.objectMapper = objectMapper != null ? objectMapper : getObjectMapper();
		generateSchema();
	}

	private void generateSchema() {
		JacksonModule jacksonModule = new JacksonModule(JacksonOption.RESPECT_JSONPROPERTY_REQUIRED,
				JacksonOption.RESPECT_JSONPROPERTY_ORDER);
		SchemaGeneratorConfigBuilder configBuilder = new SchemaGeneratorConfigBuilder(
				com.github.victools.jsonschema.generator.SchemaVersion.DRAFT_2020_12,
				com.github.victools.jsonschema.generator.OptionPreset.PLAIN_JSON)
			.with(jacksonModule)
			.with(Option.FORBIDDEN_ADDITIONAL_PROPERTIES_BY_DEFAULT);

		if (KotlinDetector.isKotlinReflectPresent()) {
			configBuilder.with(new KotlinModule());
		}

		SchemaGeneratorConfig config = configBuilder.build();
		SchemaGenerator generator = new SchemaGenerator(config);
		JsonNode jsonNode = generator.generateSchema(this.type);
		ObjectWriter objectWriter = this.objectMapper.writer(new DefaultPrettyPrinter()
			.withObjectIndenter(new DefaultIndenter().withLinefeed(System.lineSeparator())));
		try {
			this.jsonSchema = objectWriter.writeValueAsString(jsonNode);
		}
		catch (JsonProcessingException e) {
			logger.error("Could not pretty print json schema for jsonNode: {}", jsonNode);
			throw new RuntimeException("Could not pretty print json schema for " + this.type, e);
		}
	}

	@SuppressWarnings("unchecked")
	@Override
	public T convert(@NonNull String text) {
		try {
			// remove <think></think> block
			text = cleanThinkTags(text).trim();

			// Check for and remove triple backticks and "json" identifier
			if (text.startsWith("```") && text.endsWith("```")) {
				// Remove the first line if it contains "```json"
				String[] lines = text.split("\n", 2);
				if (lines[0].trim().equalsIgnoreCase("```json")) {
					text = lines.length > 1 ? lines[1] : "";
				}
				else {
					text = text.substring(3); // Remove leading ```
				}

				// Remove trailing ```
				text = text.substring(0, text.length() - 3);

				// Trim again to remove any potential whitespace
				text = text.trim();
			}
			return (T) this.objectMapper.readValue(text, this.objectMapper.constructType(this.type));
		}
		catch (JsonProcessingException e) {
			logger.error(SENSITIVE_DATA_MARKER,
					"Could not parse the given text to the desired target type: \"{}\" into {}", text, this.type);
			throw new RuntimeException(e);
		}
	}

	private String cleanThinkTags(String response) {
		return pattern.matcher(response).replaceAll("");
	}

	protected ObjectMapper getObjectMapper() {
		return JsonMapper.builder()
			.addModules(JacksonUtils.instantiateAvailableModules())
			.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
			.build();
	}

	@Override
	public String getFormat() {
		String template = """
				Your response should be in JSON format.
				Do not include any explanations, only provide a RFC8259 compliant JSON response following this format without deviation.
				Do not include markdown code blocks in your response.
				Remove the ```json markdown from the output.
				Here is the JSON Schema instance your output must adhere to:
				```%s```
				""";
		return String.format(template, this.jsonSchema);
	}
}

Note: a regular expression matches <think>xxx</think> and removes it from the response.
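Besides the think block, `convert()` above also unwraps a surrounding Markdown code fence before parsing. The following is a simplified standalone sketch of that step so it can be tested without a model; `CodeFenceStripper` is a hypothetical name, and unlike the class above it adds a length guard so that an input consisting of a lone fence does not cause an out-of-range substring.

```java
class CodeFenceStripper {

    static String strip(String text) {
        text = text.trim();
        // An opening and a closing fence need at least six backticks in total.
        if (text.length() < 6 || !text.startsWith("```") || !text.endsWith("```")) {
            return text;
        }
        // Drop the three backticks at each end.
        String inner = text.substring(3, text.length() - 3);
        // Drop an optional "json" language tag right after the opening fence.
        if (inner.regionMatches(true, 0, "json", 0, 4)) {
            inner = inner.substring(4);
        }
        return inner.trim();
    }

    public static void main(String[] args) {
        String fenced = "```json\n{\"actor\":\"Some Actor\"}\n```";
        System.out.println(strip(fenced));
    }
}
```

Keeping the cleanup in small pure functions like this makes it straightforward to cover odd model outputs with unit tests before they reach the JSON parser.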
