Commit Graph

10 Commits

Author SHA1 Message Date
Georgi Gerganov
91eaa414bf unicode : support \p{N}, \p{L} and \p{P} natively 2024-04-27 17:48:38 +03:00
Georgi Gerganov
ce5485aee0 unicode : always use std::wregex 2024-04-27 17:11:34 +03:00
Georgi Gerganov
2affd0b221 unicode : set bomb 2024-04-27 11:56:02 +03:00
Georgi Gerganov
ad929833cb llama : adapt punctuation regex + add llama 3 regex 2024-04-27 11:06:08 +03:00
Georgi Gerganov
06d3e693db unicode : fix? unicode_wstring_to_utf8 2024-04-26 12:55:11 +03:00
Kazim Abrar Mahi
753580360b Fixed issues 2024-04-26 11:43:29 +03:00
Kazim Abrar Mahi
feeaf4f39c Added needed functionality, testing remains 2024-04-26 11:43:29 +03:00
Kazim Abrar Mahi
7e308ed212 Adding unicode regex function 2024-04-26 11:43:29 +03:00
Kazim Abrar Mahi
a5710a4101 Adding unicode regex mappings 2024-04-26 11:43:29 +03:00
Jared Van Bortel
32c8486e1f wpm : portable unicode tolower (#6305) 2024-03-26 17:46:21 -04:00
Also use C locale for ispunct/isspace, and split unicode-data.cpp from unicode.cpp.