mirror of
https://github.com/ggml-org/llama.cpp.git
synced 2026-03-17 16:44:07 +00:00
* common : implement parser combinators to simplify chat parsing * add virtual destructor to parser_base * fix memory leak from circular references of rules * implement gbnf grammar building * remove unused private variable * create a base visitor and implement id assignment as a visitor * fix const ref for grammar builder * clean up types, friend classes, and class declarations * remove builder usage from until_parser * Use a counter class to help assign rule ids * cache everything * add short description for each parser * create a type for the root parser * implement repetition parser * Make optional, one_or_more, and zero_or_more subclasses of repetition * improve context constructor * improve until parsing and add benchmarks * remove cached() pattern, cache in parser_base with specialized parsing functions for each parser * improve json parsing performance to better match legacy parsing * fix const auto * it for windows * move id assignment to classes instead of using a visitor * create named rules in the command r7b example * use '.' for any in GBNF * fix parens around choices in gbnf grammar * add convenience operators to turn strings to literals * add free-form operators for const char * to simplify defining literals * simplify test case parser * implement semantic actions * remove groups in favor of actions and a scratchpad * add built in actions for common operations * add actions to command r7b example * use std::default_searcher for platforms that don't have bm * improve parser_type handling and add cast helper * add partial result type to better control when to run actions * fix bug in until() * run actions on partial results by default * use common_chat_msg for result * add qwen3 example wip * trash partial idea and simplify * move action arguments to a struct * implement aho-corasick matcher for until_parser and to build exclusion grammars * use std::string for input, since std::string_view is incompatible with std::regex * Refactor tests * improve qwen3 example * implement sax-style parsing and refactor * fix json string in test * rename classes to use common_chat_ prefix * remove is_ suffix from functions * rename from id_counter to just counter * Final refactored tests * Fix executable name and editorconfig-checker * Third time's the charm... * add trigger parser to begin lazy grammar rule generation * working lazy grammar * refactor json rules now that we check for reachability * reduce pointer usage * print out grammars in example * rename to chat-peg-parser* and common_chat_peg_parser* * Revert unrelated changes * New macros for CMakeLists to enable multi-file compilations * starting unicode support * add unicode support to char_parser * use unparsed args as additional sources * Refactor tests to new harness * Fix CMakeLists * fix rate calculation * add unicode tests * fix trailing whitespace and line endings skip-checks: true * Helpers + rewrite qwen3 with helpers * Fix whitespace * extract unicode functions to separate file * refactor parse unicode function * fix compiler error * improve construction of sequence/choice parsers * be less clever * add make_parser helper function * expand usage of make_parser, alias common_chat_msg_peg_parser_builder to builder in source * lower bench iterations * add unicode support to until_parser * add unicode support to json_string_parser * clean up unicode tests * reduce unicode details to match src/unicode.cpp * simplify even further * remove unused functions * fix type * reformat char class parsing * clean up json string parser * clean up + fix diagnostics * reorder includes * compact builder functions * replace action_parser with capture_parser, rename env to semantics * rename env to semantics * clean up common_chat_parse_context * move type() to below constant * use default constructor for common_chat_peg_parser * make all operators functions for consistency * fix compilation errors in test-optional.cpp * simplify result values * rename json_string_unquoted to json_string_content * Move helper to separate class, add separate explicit and helper classes * Whitespace * Change + to append() * Reformat * Add extra helpers, tests and Minimax example * Add some extra optional debugging prints + real example of how to use them * fix bug in repetitions when min_count = 0 reports failures * dump rule in debug * fix token accumulation and assert parsing never fails * indent debug by depth * use LOG_* in tests so logs sync up with test logs * - Add selective testing - Refactor all messaging to use LOG_ERR - Fix lack of argument / tool name capturing - Temporary fix for double event capture * refactor rule() and introduce ref() * clean up visitor * clean up indirection in root parser w.r.t rules * store shared ptr directly in parser classes * replace aho-corasick automation with a simple trie * Reset prev for qwen3 helper example variant * refactor to use value semantics with std::variant/std::visit * simplify trie_matcher result * fix linting issues * add annotations to rules * revert test workaround * implement serializing the parser * remove redundant parsers * remove tests * gbnf generation fixes * remove LOG_* use in tests * update gbnf tests to test entire grammar * clean up gbnf generation and fix a few bugs * fix typo in test output * remove implicit conversion rules * improve test output * rename trie_matcher to trie * simplify trie to just know if a node is the end of a word * remove common_chat_ prefix and ensure a common_peg_ prefix to all types * rename chat-peg-parser -> peg-parser * promote chat-peg-parser-helper to chat-peg-parser * checkpoint * use a static_assert to ensure we handle every branch * inline trivial peg parser builders * use json strings for now * implement basic and native chat peg parser builders/extractors * resolve refs to their rules * remove packrat caching (for now) * update tests * compare parsers with incremental input * benchmark both complete and incremental parsing * add raw string generation from json schema * add support for string schemas in gbnf generation * fix qwen example to include \n * tidy up example * rename extractor to mapper * rename ast_arena to ast * place basic tests into one * use gbnf_format_literal from json-schema-to-grammar * integrate parser with common/chat and server * clean up schema and serialization * add json-schema raw string tests * clean up json creation and remove capture parser * trim spaces from reasoning and content * clean up redundant rules and comments * rename input_is_complete to is_partial to match rest of project * simplify json rules * remove extraneous file * remove comment * implement += and |= operators * add comments to qwen3 implementation * reorder arguments to common_chat_peg_parse * remove commented outdated tests * add explicit copy constructor * fix operators and constness * wip: update test-chat for qwen3-coder * bring json parser closer to json-schema-to-grammar rules * trim trailing space for most things * fix qwen3 coder rules w.r.t. trailing spaces * group rules * do not trim trailing space from string args * tweak spacing of qwen3 grammar * update qwen3-coder tests * qwen3-coder small fixes * place parser in common_chat_syntax to simplify invocation * use std::set to collect rules to keep order predictable for tests * initialize parser to make certain platforms happy * revert back to std::unordered_set, sort rule names at the end instead * uncomment rest of chat tests * define explicit default constructor * improve arena init and server integration * fix chat test * add json_member() * add a comprehensive native example * clean up example qwen test and add response_format example to native test * make build_peg_parser accept std::function instead of template * change peg parser parameters into const ref * push tool call on tool open for constructed parser * add parsing documentation * clean up some comments * add json schema support to qwen3-coder * add id initializer in tests * remove grammar debug line from qwen3-coder * refactor qwen3-coder to use sequence over operators * only call common_chat_peg_parse if appropriate format * simplify qwen3-coder space handling * revert qwen3-coder implementation * revert json-schema-to-grammar changes * remove unnecessary forward declaration * small adjustment to until_parser * rename C/C++ files to use dashes * codeowners : add aldehir to peg-parser and related files --------- Co-authored-by: Piotr Wilkin <piotr.wilkin@syndatis.com>
234 lines
9.3 KiB
C++
234 lines
9.3 KiB
C++
// Chat support (incl. tool call grammar constraining & output parsing) w/ generic & custom template handlers.
|
|
|
|
#pragma once
|
|
|
|
#include "common.h"
|
|
#include "peg-parser.h"
|
|
#include <functional>
|
|
#include <chrono>
|
|
#include <string>
|
|
#include <vector>
|
|
#include <map>
|
|
|
|
struct common_chat_templates;
|
|
|
|
struct common_chat_tool_call {
|
|
std::string name;
|
|
std::string arguments;
|
|
std::string id;
|
|
|
|
bool operator==(const common_chat_tool_call & other) const {
|
|
return name == other.name && arguments == other.arguments && id == other.id;
|
|
}
|
|
};
|
|
|
|
struct common_chat_msg_content_part {
|
|
std::string type;
|
|
std::string text;
|
|
|
|
bool operator==(const common_chat_msg_content_part & other) const {
|
|
return type == other.type && text == other.text;
|
|
}
|
|
};
|
|
|
|
struct common_chat_msg {
|
|
std::string role;
|
|
std::string content;
|
|
std::vector<common_chat_msg_content_part> content_parts;
|
|
std::vector<common_chat_tool_call> tool_calls;
|
|
std::string reasoning_content;
|
|
std::string tool_name;
|
|
std::string tool_call_id;
|
|
|
|
template <class T> T to_json_oaicompat() const;
|
|
|
|
bool empty() const {
|
|
return content.empty() && content_parts.empty() && tool_calls.empty() && reasoning_content.empty() && tool_name.empty() && tool_call_id.empty();
|
|
}
|
|
void set_tool_call_ids(std::vector<std::string> & ids_cache, const std::function<std::string()> & gen_tool_call_id) {
|
|
for (auto i = 0u; i < tool_calls.size(); i++) {
|
|
if (ids_cache.size() <= i) {
|
|
auto id = tool_calls[i].id;
|
|
if (id.empty()) {
|
|
id = gen_tool_call_id();
|
|
}
|
|
ids_cache.push_back(id);
|
|
}
|
|
tool_calls[i].id = ids_cache[i];
|
|
}
|
|
}
|
|
bool operator==(const common_chat_msg & other) const {
|
|
return role == other.role
|
|
&& content == other.content
|
|
&& content_parts == other.content_parts
|
|
&& tool_calls == other.tool_calls
|
|
&& reasoning_content == other.reasoning_content
|
|
&& tool_name == other.tool_name
|
|
&& tool_call_id == other.tool_call_id;
|
|
}
|
|
bool operator!=(const common_chat_msg & other) const {
|
|
return !(*this == other);
|
|
}
|
|
};
|
|
|
|
struct common_chat_msg_diff {
|
|
std::string reasoning_content_delta;
|
|
std::string content_delta;
|
|
size_t tool_call_index = std::string::npos;
|
|
common_chat_tool_call tool_call_delta;
|
|
|
|
static std::vector<common_chat_msg_diff> compute_diffs(const common_chat_msg & previous_msg, const common_chat_msg & new_msg);
|
|
|
|
bool operator==(const common_chat_msg_diff & other) const {
|
|
return content_delta == other.content_delta
|
|
&& tool_call_index == other.tool_call_index
|
|
&& tool_call_delta == other.tool_call_delta;
|
|
}
|
|
};
|
|
|
|
struct common_chat_tool {
|
|
std::string name;
|
|
std::string description;
|
|
std::string parameters;
|
|
};
|
|
|
|
enum common_chat_tool_choice {
|
|
COMMON_CHAT_TOOL_CHOICE_AUTO,
|
|
COMMON_CHAT_TOOL_CHOICE_REQUIRED,
|
|
COMMON_CHAT_TOOL_CHOICE_NONE,
|
|
};
|
|
|
|
enum common_chat_format {
|
|
COMMON_CHAT_FORMAT_CONTENT_ONLY,
|
|
COMMON_CHAT_FORMAT_GENERIC,
|
|
COMMON_CHAT_FORMAT_MISTRAL_NEMO,
|
|
COMMON_CHAT_FORMAT_MAGISTRAL,
|
|
COMMON_CHAT_FORMAT_LLAMA_3_X,
|
|
COMMON_CHAT_FORMAT_LLAMA_3_X_WITH_BUILTIN_TOOLS,
|
|
COMMON_CHAT_FORMAT_DEEPSEEK_R1,
|
|
COMMON_CHAT_FORMAT_FIREFUNCTION_V2,
|
|
COMMON_CHAT_FORMAT_FUNCTIONARY_V3_2,
|
|
COMMON_CHAT_FORMAT_FUNCTIONARY_V3_1_LLAMA_3_1,
|
|
COMMON_CHAT_FORMAT_DEEPSEEK_V3_1,
|
|
COMMON_CHAT_FORMAT_HERMES_2_PRO,
|
|
COMMON_CHAT_FORMAT_COMMAND_R7B,
|
|
COMMON_CHAT_FORMAT_GRANITE,
|
|
COMMON_CHAT_FORMAT_GPT_OSS,
|
|
COMMON_CHAT_FORMAT_SEED_OSS,
|
|
COMMON_CHAT_FORMAT_NEMOTRON_V2,
|
|
COMMON_CHAT_FORMAT_APERTUS,
|
|
COMMON_CHAT_FORMAT_LFM2_WITH_JSON_TOOLS,
|
|
COMMON_CHAT_FORMAT_GLM_4_5,
|
|
COMMON_CHAT_FORMAT_MINIMAX_M2,
|
|
COMMON_CHAT_FORMAT_KIMI_K2,
|
|
COMMON_CHAT_FORMAT_QWEN3_CODER_XML,
|
|
COMMON_CHAT_FORMAT_APRIEL_1_5,
|
|
COMMON_CHAT_FORMAT_XIAOMI_MIMO,
|
|
|
|
// These are intended to be parsed by the PEG parser
|
|
COMMON_CHAT_FORMAT_PEG_SIMPLE,
|
|
COMMON_CHAT_FORMAT_PEG_NATIVE,
|
|
COMMON_CHAT_FORMAT_PEG_CONSTRUCTED,
|
|
|
|
COMMON_CHAT_FORMAT_COUNT, // Not a format, just the # formats
|
|
};
|
|
|
|
struct common_chat_templates_inputs {
|
|
std::vector<common_chat_msg> messages;
|
|
std::string grammar;
|
|
std::string json_schema;
|
|
bool add_generation_prompt = true;
|
|
bool use_jinja = true;
|
|
// Parameters below only supported when use_jinja is true
|
|
std::vector<common_chat_tool> tools;
|
|
common_chat_tool_choice tool_choice = COMMON_CHAT_TOOL_CHOICE_AUTO;
|
|
bool parallel_tool_calls = false;
|
|
common_reasoning_format reasoning_format = COMMON_REASONING_FORMAT_NONE;
|
|
bool enable_thinking = true;
|
|
std::chrono::system_clock::time_point now = std::chrono::system_clock::now();
|
|
std::map<std::string, std::string> chat_template_kwargs;
|
|
bool add_bos = false;
|
|
bool add_eos = false;
|
|
};
|
|
|
|
struct common_chat_params {
|
|
common_chat_format format = COMMON_CHAT_FORMAT_CONTENT_ONLY;
|
|
std::string prompt;
|
|
std::string grammar;
|
|
bool grammar_lazy = false;
|
|
bool thinking_forced_open = false;
|
|
std::vector<common_grammar_trigger> grammar_triggers;
|
|
std::vector<std::string> preserved_tokens;
|
|
std::vector<std::string> additional_stops;
|
|
std::string parser;
|
|
};
|
|
|
|
struct common_chat_syntax {
|
|
common_chat_format format = COMMON_CHAT_FORMAT_CONTENT_ONLY;
|
|
common_reasoning_format reasoning_format = COMMON_REASONING_FORMAT_NONE;
|
|
// Whether reasoning_content should be inlined in the content (e.g. for reasoning_format=deepseek in stream mode)
|
|
bool reasoning_in_content = false;
|
|
bool thinking_forced_open = false;
|
|
bool parse_tool_calls = true;
|
|
common_peg_arena parser = {};
|
|
};
|
|
|
|
// Check if the template supplied via "--chat-template" is supported or not. Returns true if it's valid
|
|
bool common_chat_verify_template(const std::string & tmpl, bool use_jinja);
|
|
|
|
void common_chat_templates_free(struct common_chat_templates * tmpls);
|
|
|
|
struct common_chat_templates_deleter { void operator()(common_chat_templates * tmpls) { common_chat_templates_free(tmpls); } };
|
|
|
|
typedef std::unique_ptr<struct common_chat_templates, common_chat_templates_deleter> common_chat_templates_ptr;
|
|
|
|
common_chat_templates_ptr common_chat_templates_init(
|
|
const struct llama_model * model,
|
|
const std::string & chat_template_override,
|
|
const std::string & bos_token_override = "",
|
|
const std::string & eos_token_override = "");
|
|
|
|
bool common_chat_templates_was_explicit(const struct common_chat_templates * tmpls);
|
|
const char * common_chat_templates_source(const struct common_chat_templates * tmpls, const char * variant = nullptr);
|
|
|
|
|
|
struct common_chat_params common_chat_templates_apply(
|
|
const struct common_chat_templates * tmpls,
|
|
const struct common_chat_templates_inputs & inputs);
|
|
|
|
// Format single message, while taking into account the position of that message in chat history
|
|
std::string common_chat_format_single(
|
|
const struct common_chat_templates * tmpls,
|
|
const std::vector<common_chat_msg> & past_msg,
|
|
const common_chat_msg & new_msg,
|
|
bool add_ass,
|
|
bool use_jinja);
|
|
|
|
// Returns an example of formatted chat
|
|
std::string common_chat_format_example(
|
|
const struct common_chat_templates * tmpls,
|
|
bool use_jinja,
|
|
const std::map<std::string, std::string> & chat_template_kwargs);
|
|
|
|
const char* common_chat_format_name(common_chat_format format);
|
|
const char* common_reasoning_format_name(common_reasoning_format format);
|
|
common_reasoning_format common_reasoning_format_from_name(const std::string & format);
|
|
common_chat_msg common_chat_parse(const std::string & input, bool is_partial, const common_chat_syntax & syntax);
|
|
common_chat_msg common_chat_peg_parse(const common_peg_arena & parser, const std::string & input, bool is_partial, const common_chat_syntax & syntax);
|
|
|
|
common_chat_tool_choice common_chat_tool_choice_parse_oaicompat(const std::string & tool_choice);
|
|
|
|
bool common_chat_templates_support_enable_thinking(const common_chat_templates * chat_templates);
|
|
|
|
// Parses a JSON array of messages in OpenAI's chat completion API format.
|
|
// T can be std::string containing JSON or nlohmann::ordered_json
|
|
template <class T> std::vector<common_chat_msg> common_chat_msgs_parse_oaicompat(const T & messages);
|
|
template <class T> T common_chat_msgs_to_json_oaicompat(const std::vector<common_chat_msg> & msgs, bool concat_typed_text = false);
|
|
|
|
// Parses a JSON array of tools in OpenAI's chat completion tool call API format.
|
|
// T can be std::string containing JSON or nlohmann::ordered_json
|
|
template <class T> std::vector<common_chat_tool> common_chat_tools_parse_oaicompat(const T & tools);
|
|
template <class T> T common_chat_tools_to_json_oaicompat(const std::vector<common_chat_tool> & tools);
|
|
|
|
template <class T> T common_chat_msg_diff_to_json_oaicompat(const common_chat_msg_diff & diff);
|