{"id":1277,"date":"2024-05-01T09:34:06","date_gmt":"2024-05-01T09:34:06","guid":{"rendered":"https:\/\/www.nicekj.com\/?p=1277"},"modified":"2024-05-01T09:34:34","modified_gmt":"2024-05-01T09:34:34","slug":"chatglm26bmoxingzai9ntritonzhongbushubingjichengzhilangchainshijianjingdongyunjishutuandui","status":"publish","type":"post","link":"https:\/\/www.nicekj.com\/chatglm26bmoxingzai9ntritonzhongbushubingjichengzhilangchainshijianjingdongyunjishutuandui.html","title":{"rendered":"Deploying the chatglm2-6b Model on 9n-triton and Integrating It with langchain | JD Cloud Tech Team"},"content":{"rendered":"<h1 data-id=\"heading-0\">I. Introduction<\/h1>\n<p>ChatGLM2-6B, the second-generation version of ChatGLM-6B, has recently been officially released. It introduces the following new features:<\/p>\n<p>\u2460. An upgraded base model with stronger performance: it ranks 6th on the Chinese C-Eval leaderboard with a score of 51.7;<\/p>\n<p>\u2461. Support for context lengths from 8K to 32K;<\/p>\n<p>\u2462. Inference performance improved by 42%;<\/p>\n<p>\u2463. Fully open for academic research, with commercial-use licenses available on application.<\/p>\n<p>Most current deployment setups use fastapi + uvicorn + transformers. That approach is fine for quickly running a demo, but for production a dedicated deep-learning inference serving framework such as Triton is recommended. This article describes the pitfalls I ran into while deploying ChatGLM2-6B with our group's 9n-triton tool, in the hope that it helps anyone who needs to do a similar deployment.<\/p>\n<h1 data-id=\"heading-1\">II. Hardware Requirements<\/h1>\n<p>The hardware requirements for deployment are roughly as follows:<\/p>\n<table><thead><tr><th>Quantization level<\/th><th>Min. GPU memory to encode 2048 tokens<\/th><th>Min. GPU memory to generate 8192 tokens<\/th><\/tr><\/thead><tbody><tr><td>FP16 \/ BF16<\/td><td>13.1 GB<\/td><td>12.8 GB<\/td><\/tr><tr><td>INT8<\/td><td>8.2 GB<\/td><td>8.1 GB<\/td><\/tr><tr><td>INT4<\/td><td>5.5 GB<\/td><td>5.1 GB<\/td><\/tr><\/tbody><\/table>\n<p>I deployed 2 pods, each with 4 CPU cores, 30 GB of memory and one P40 GPU (24 GB of GPU memory).<\/p>\n<h1 data-id=\"heading-2\">III. Deployment in Practice<\/h1>\n<p>Triton's default format for PyTorch models is TorchScript. Because converting the ChatGLM2-6B model to TorchScript fails with errors, this article deploys it with the Python backend instead.<\/p>\n<h2 
data-id=\"heading-3\">1. Model Directory Structure<\/h2>\n<p>\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/www.nicekj.com\/wp-content\/uploads\/replace\/a1ad831c4ca02624c497852892dac1e8.png\" alt=\"Deploying the chatglm2-6b Model on 9n-triton and Integrating It with langchain | JD Cloud Tech Team\" \/><\/figure>\n<\/p>\n<p>9N-Triton uses an ensemble model. As shown in the figure above, the model repository (model_repository) can contain one or more sub-models (such as chatglm2-6b). Each part is described below:<\/p>\n<h2 data-id=\"heading-4\">2. Python Execution Environment<\/h2>\n<p>This part contains the Python dependencies the model needs at inference time. You can package a conda virtual environment with conda-pack into an archive such as python-3-8.tar.gz. If you are not familiar with packaging conda environments, see <a href=\"https:\/\/link.juejin.cn?target=https%3A%2F%2Fconda.github.io%2Fconda-pack%2F\" target=\"_blank\" title=\"https:\/\/conda.github.io\/conda-pack\/\" ref=\"nofollow noopener noreferrer\" rel=\"noopener\">conda.github.io\/conda-pack\/<\/a>. Then set the execution environment path in config.pbtxt:<\/p>\n<pre><code class=\"hljs language-css code-block-extension-codeShowNum\" lang=\"css\"><span class=\"code-block-extension-codeLine\" data-line-num=\"1\">parameters: {<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"2\">  key: <span 
class=\"hljs-string\">\"EXECUTION_ENV_PATH\"<\/span>,<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"3\">  value: {string_value: <span class=\"hljs-string\">\"$$TRITON_MODEL_DIRECTORY\/..\/python-3-8.tar.gz\"<\/span>}<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"4\">}<\/span>\n<\/code><\/pre>\n<p>In this example, $$TRITON_MODEL_DIRECTORY resolves to \"$pwd\/model_repository\/chatglm2-6b\", so the archive sits directly under model_repository.<\/p>\n<p>Note: with this layout the Python execution environment is shared by all sub-models. To give different sub-models different execution environments, put each tar.gz file in the corresponding sub-model's directory, as shown below:<\/p>\n<p>\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/www.nicekj.com\/wp-content\/uploads\/replace\/0a3d16ecafef1debaa73270e75b1bb26.png\" alt=\"Deploying the chatglm2-6b Model on 9n-triton and Integrating It with langchain | JD Cloud Tech Team\" \/><\/figure>\n<\/p>\n<p>In that case, set the execution environment path in config.pbtxt as follows:<\/p>\n<pre><code class=\"hljs language-css code-block-extension-codeShowNum\" lang=\"css\"><span class=\"code-block-extension-codeLine\" data-line-num=\"1\">parameters: {<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"2\">  key: <span class=\"hljs-string\">\"EXECUTION_ENV_PATH\"<\/span>,<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"3\">  value: {string_value: <span 
class=\"hljs-string\">\"$$TRITON_MODEL_DIRECTORY\/python-3-8.tar.gz\"<\/span>}<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"4\">}<\/span>\n<\/code><\/pre>\n<h2 data-id=\"heading-5\">3. Model Configuration File<\/h2>\n<p>Every model in the model repository must include a model configuration file, config.pbtxt, which specifies the platform and\/or backend property, the max_batch_size property, the model's input and output tensors, and so on. The configuration file for ChatGLM2-6B can look like the following (config.pbtxt uses the protobuf text format, in which comments start with #):<\/p>\n<pre><code class=\"hljs language-yaml code-block-extension-codeShowNum\" lang=\"yaml\"><span class=\"code-block-extension-codeLine\" data-line-num=\"1\"><span class=\"hljs-attr\">name:<\/span> <span class=\"hljs-string\">\"chatglm2-6b\"<\/span> <span class=\"hljs-string\">#<\/span> <span class=\"hljs-string\">Required: model name; must match the name of this sub-model's folder<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"2\"><span class=\"hljs-attr\">backend:<\/span> <span class=\"hljs-string\">\"python\"<\/span> <span class=\"hljs-string\">#<\/span> <span class=\"hljs-string\">Required: the backend used by the model<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"3\"><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"4\"><span class=\"hljs-attr\">max_batch_size:<\/span> <span class=\"hljs-number\">0<\/span> <span class=\"hljs-string\">#<\/span> <span 
class=\"hljs-string\">Maximum batch size per request. The full tensor shape is formed from max_batch_size and dims. If<\/span> <span class=\"hljs-string\">max_batch_size<\/span> <span class=\"hljs-string\">is greater than<\/span> <span class=\"hljs-number\">0<\/span> <span class=\"hljs-string\">then the full shape is<\/span> [ <span class=\"hljs-number\">-1<\/span> ] <span class=\"hljs-string\">+<\/span> <span class=\"hljs-string\">dims.<\/span> <span class=\"hljs-string\">If<\/span> <span class=\"hljs-string\">max_batch_size<\/span> <span class=\"hljs-string\">equals<\/span> <span class=\"hljs-number\">0<\/span> <span class=\"hljs-string\">then the full shape is<\/span> <span class=\"hljs-string\">dims.<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"5\"><span class=\"hljs-string\">input<\/span> [ <span class=\"hljs-string\">#<\/span> <span class=\"hljs-string\">Required: input definitions<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"6\">  {<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"7\">    <span class=\"hljs-attr\">name:<\/span> <span class=\"hljs-string\">\"prompt\"<\/span> <span class=\"hljs-string\"># Required: name<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"8\">    <span class=\"hljs-attr\">data_type:<\/span> <span class=\"hljs-string\">TYPE_STRING<\/span> <span class=\"hljs-string\"># Required: data type<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"9\">    <span class=\"hljs-attr\">dims:<\/span> [ <span class=\"hljs-number\">-1<\/span> ] <span class=\"hljs-string\"># Required: dimensions; -1<\/span> <span class=\"hljs-string\">marks a variable-length dimension<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"10\">  },<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"11\">  {<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"12\">    <span class=\"hljs-attr\">name:<\/span> <span class=\"hljs-string\">\"history\"<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"13\">    <span class=\"hljs-attr\">data_type:<\/span> <span class=\"hljs-string\">TYPE_STRING<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"14\">    <span class=\"hljs-attr\">dims:<\/span> [ <span class=\"hljs-number\">-1<\/span> ]<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"15\">  },<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"16\">  {<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"17\">    <span class=\"hljs-attr\">name:<\/span> <span class=\"hljs-string\">\"temperature\"<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"18\">    <span class=\"hljs-attr\">data_type:<\/span> <span class=\"hljs-string\">TYPE_STRING<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"19\">    <span class=\"hljs-attr\">dims:<\/span> [ <span class=\"hljs-number\">-1<\/span> ]<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"20\">  },<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"21\">  {<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"22\">    <span class=\"hljs-attr\">name:<\/span> <span class=\"hljs-string\">\"max_token\"<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"23\">    <span class=\"hljs-attr\">data_type:<\/span> <span class=\"hljs-string\">TYPE_STRING<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"24\">    <span 
class=\"hljs-attr\">dims:<\/span> [ <span class=\"hljs-number\">-1<\/span> ]<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"25\">  },<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"26\">  {<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"27\">    <span class=\"hljs-attr\">name:<\/span> <span class=\"hljs-string\">\"history_len\"<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"28\">    <span class=\"hljs-attr\">data_type:<\/span> <span class=\"hljs-string\">TYPE_STRING<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"29\">    <span class=\"hljs-attr\">dims:<\/span> [ <span class=\"hljs-number\">-1<\/span> ]<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"30\">  }<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"31\">]<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"32\"><span class=\"hljs-string\">output<\/span> [ <span class=\"hljs-string\"># Required: output definitions<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"33\">  {<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"34\">    <span class=\"hljs-attr\">name:<\/span> <span class=\"hljs-string\">\"response\"<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"35\">    <span class=\"hljs-attr\">data_type:<\/span> <span class=\"hljs-string\">TYPE_STRING<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"36\">    <span class=\"hljs-attr\">dims:<\/span> [ <span class=\"hljs-number\">-1<\/span> ]<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"37\">  },<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"38\">  {<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"39\">    <span class=\"hljs-attr\">name:<\/span> <span 
class=\"hljs-string\">\"history\"<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"40\">    <span class=\"hljs-attr\">data_type:<\/span> <span class=\"hljs-string\">TYPE_STRING<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"41\">    <span class=\"hljs-attr\">dims:<\/span> [ <span class=\"hljs-number\">-1<\/span> ]<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"42\">  }<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"43\">]<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"44\"><span class=\"hljs-attr\">parameters:<\/span> { <span class=\"hljs-string\"># Specify the Python execution environment<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"45\">  <span class=\"hljs-attr\">key:<\/span> <span class=\"hljs-string\">\"EXECUTION_ENV_PATH\"<\/span>,<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"46\">  <span class=\"hljs-attr\">value:<\/span> {<span class=\"hljs-attr\">string_value:<\/span> <span class=\"hljs-string\">\"$$TRITON_MODEL_DIRECTORY\/..\/python-3-8.tar.gz\"<\/span>}<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"47\">}<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"48\"><span class=\"hljs-string\">instance_group<\/span> [ <span class=\"hljs-string\"># Model instance group<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"49\">  { <\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"50\">      <span class=\"hljs-attr\">count:<\/span> <span class=\"hljs-number\">1<\/span>  <span class=\"hljs-string\"># Number of instances<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"51\">      <span class=\"hljs-attr\">kind:<\/span> <span class=\"hljs-string\">KIND_GPU<\/span>  <span 
class=\"hljs-string\"># Instance kind<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"52\">      <span class=\"hljs-attr\">gpus:<\/span> [ <span class=\"hljs-number\">0<\/span> ]  <span class=\"hljs-string\"># GPU indices this instance may use<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"53\">  }<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"54\">]<\/span>\n<\/code><\/pre>\n<p>The fields marked as required make up the minimal model configuration. For more about model configuration files, see: <a href=\"https:\/\/link.juejin.cn?target=https%3A%2F%2Fgithub.com%2Ftriton-inference-server%2Fserver%2Fblob%2Fr22.04%2Fdocs%2Fmodel_configuration.md\" target=\"_blank\" title=\"https:\/\/github.com\/triton-inference-server\/server\/blob\/r22.04\/docs\/model_configuration.md\" ref=\"nofollow noopener noreferrer\" rel=\"noopener\">github.com\/triton-inference-server\/server\/blob\/r22.04\/docs\/model_configuration.md<\/a><\/p>\n<h2 data-id=\"heading-6\">4. Custom Python Backend<\/h2>\n<p>The main work is implementing the three interfaces in model.py:<\/p>\n<p>\u2460. initialize: called when the Python model is initialized; it typically reads the output configuration and creates the model;<\/p>\n<p>\u2461. execute: the function the Python model runs when it receives requests;<\/p>\n<p>\u2462. 
finalize: called when the model is unloaded.<\/p>\n<p>If there are n model instances, initialize and finalize are each called n times.<\/p>\n<p>A model.py for ChatGLM2-6B can look like this:<\/p>\n<pre><code class=\"hljs language-python code-block-extension-codeShowNum\" lang=\"python\"><span class=\"code-block-extension-codeLine\" data-line-num=\"1\"><span class=\"hljs-keyword\">import<\/span> os<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"2\"><span class=\"hljs-comment\"># Keep the CUDA caching allocator from splitting free blocks larger than 32 MB (reduces fragmentation)<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"3\">os.environ[<span class=\"hljs-string\">'PYTORCH_CUDA_ALLOC_CONF'<\/span>] = <span class=\"hljs-string\">'max_split_size_mb:32'<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"4\"><span class=\"hljs-comment\"># Set the work directory used as the Transformers \/ HF modules cache<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"5\"><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"6\">os.environ[<span class=\"hljs-string\">'TRANSFORMERS_CACHE'<\/span>] = os.path.dirname(os.path.abspath(__file__))+<span class=\"hljs-string\">\"\/work\/\"<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"7\">os.environ[<span class=\"hljs-string\">'HF_MODULES_CACHE'<\/span>] = os.path.dirname(os.path.abspath(__file__))+<span class=\"hljs-string\">\"\/work\/\"<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"8\"><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"9\"><span 
class=\"hljs-keyword\">import<\/span> json<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"10\"><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"11\"><span class=\"hljs-comment\"># triton_python_backend_utils is available in every Triton Python model. You<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"12\"><span class=\"hljs-comment\"># need to use this module to create inference requests and responses. It also<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"13\"><span class=\"hljs-comment\"># contains some utility functions for extracting information from model_config<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"14\"><span class=\"hljs-comment\"># and converting Triton input\/output types to numpy types.<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"15\"><span class=\"hljs-keyword\">import<\/span> triton_python_backend_utils <span class=\"hljs-keyword\">as<\/span> pb_utils<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"16\"><span class=\"hljs-keyword\">import<\/span> sys<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"17\"><span class=\"hljs-keyword\">import<\/span> gc<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"18\"><span class=\"hljs-keyword\">import<\/span> time<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"19\"><span class=\"hljs-keyword\">import<\/span> logging<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"20\"><span class=\"hljs-keyword\">import<\/span> torch<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"21\"><span class=\"hljs-keyword\">from<\/span> transformers <span class=\"hljs-keyword\">import<\/span> AutoTokenizer, AutoModel<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"22\"><span 
class=\"hljs-keyword\">import<\/span> numpy <span class=\"hljs-keyword\">as<\/span> np<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"23\"><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"24\">gc.collect()<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"25\">torch.cuda.empty_cache()<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"26\"><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"27\">logging.basicConfig(<span class=\"hljs-built_in\">format<\/span>=<span class=\"hljs-string\">'%(asctime)s - %(filename)s[line:%(lineno)d] - %(levelname)s: %(message)s'<\/span>,<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"28\">                    level=logging.INFO)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"29\"><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"30\"><span class=\"hljs-keyword\">class<\/span> <span class=\"hljs-title class_\">TritonPythonModel<\/span>:<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"31\">    <span class=\"hljs-string\">\"\"\"Your Python model must use the same class name. 
Every Python model<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"32\">    that is created must have \"TritonPythonModel\" as the class name.<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"33\">    \"\"\"<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"34\"><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"35\">    <span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title function_\">initialize<\/span>(<span class=\"hljs-params\">self, args<\/span>):<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"36\">        <span class=\"hljs-string\">\"\"\"`initialize` is called only once per model instance, when the model is being loaded.<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"37\">        Implementing the `initialize` function is optional. This function allows<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"38\">        the model to initialize any state associated with this model.<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"39\"><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"40\">        Parameters<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"41\">        ----------<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"42\">        args : dict<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"43\">          Both keys and values are strings. 
The dictionary keys and values are:<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"44\">          * model_config: A JSON string containing the model configuration<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"45\">          * model_instance_kind: A string containing model instance kind<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"46\">          * model_instance_device_id: A string containing model instance device ID<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"47\">          * model_repository: Model repository path<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"48\">          * model_version: Model version<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"49\">          * model_name: Model name<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"50\">        \"\"\"<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"51\">        <span class=\"hljs-comment\"># You must parse model_config. 
JSON string is not parsed here<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"52\">        self.model_config = json.loads(args[<span class=\"hljs-string\">'model_config'<\/span>])<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"53\">        <\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"54\">        output_response_config = pb_utils.get_output_config_by_name(self.model_config, <span class=\"hljs-string\">\"response\"<\/span>)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"55\">        output_history_config = pb_utils.get_output_config_by_name(self.model_config, <span class=\"hljs-string\">\"history\"<\/span>)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"56\"><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"57\">        <span class=\"hljs-comment\"># Convert Triton types to numpy types<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"58\">        self.output_response_dtype = pb_utils.triton_string_to_numpy(output_response_config[<span class=\"hljs-string\">'data_type'<\/span>])<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"59\">        self.output_history_dtype = pb_utils.triton_string_to_numpy(output_history_config[<span class=\"hljs-string\">'data_type'<\/span>])<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"60\">        <\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"61\">        ChatGLM_path = os.path.dirname(os.path.abspath(__file__))+<span class=\"hljs-string\">\"\/ChatGLM2_6B\"<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"62\">        self.tokenizer = AutoTokenizer.from_pretrained(ChatGLM_path, trust_remote_code=<span class=\"hljs-literal\">True<\/span>)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"63\">        model = 
AutoModel.from_pretrained(ChatGLM_path,<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"64\">                                          torch_dtype=torch.float16,  <span class=\"hljs-comment\"># load in FP16 to match .half() below<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"65\">                                          trust_remote_code=<span class=\"hljs-literal\">True<\/span>).half().cuda()<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"66\">        self.model = model.<span class=\"hljs-built_in\">eval<\/span>()<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"67\">        logging.info(<span class=\"hljs-string\">\"model init success\"<\/span>)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"68\">        <\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"69\">    <span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title function_\">execute<\/span>(<span class=\"hljs-params\">self, requests<\/span>):<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"70\">        <span class=\"hljs-string\">\"\"\"`execute` MUST be implemented in every Python model. `execute`<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"71\">        function receives a list of pb_utils.InferenceRequest as the only<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"72\">        argument. This function is called when an inference request is made<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"73\">        for this model. Depending on the batching configuration (e.g. Dynamic<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"74\">        Batching) used, `requests` may contain multiple requests. 
Every<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"75\">        Python model must create one pb_utils.InferenceResponse for every<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"76\">        pb_utils.InferenceRequest in `requests`. If there is an error, you can<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"77\">        set the error argument when creating a pb_utils.InferenceResponse.<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"78\"><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"79\">        Parameters<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"80\">        ----------<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"81\">        requests : list<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"82\">          A list of pb_utils.InferenceRequest<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"83\"><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"84\">        Returns<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"85\">        -------<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"86\">        list<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"87\">          A list of pb_utils.InferenceResponse. 
The length of this list must<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"88\">          be the same as `requests`<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"89\">          <\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"90\">        \"\"\"<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"91\">        output_response_dtype = self.output_response_dtype<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"92\">        output_history_dtype = self.output_history_dtype<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"93\"><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"94\">        <span class=\"hljs-comment\"># output_dtype = self.output_dtype<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"95\">        responses = []<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"96\">        <span class=\"hljs-comment\"># Every Python backend must iterate over every one of the requests<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"97\">        <span class=\"hljs-comment\"># and create a pb_utils.InferenceResponse for each of them.<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"98\">        <span class=\"hljs-keyword\">for<\/span> request <span class=\"hljs-keyword\">in<\/span> requests:<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"99\">            prompt = pb_utils.get_input_tensor_by_name(request, <span class=\"hljs-string\">\"prompt\"<\/span>).as_numpy()[<span class=\"hljs-number\">0<\/span>]<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"100\">            prompt = prompt.decode(<span class=\"hljs-string\">'utf-8'<\/span>)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"101\">            history_origin = 
pb_utils.get_input_tensor_by_name(request, <span class=\"hljs-string\">\"history\"<\/span>).as_numpy()<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"102\">            <span class=\"hljs-keyword\">if<\/span> <span class=\"hljs-built_in\">len<\/span>(history_origin) &gt; <span class=\"hljs-number\">0<\/span>:<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"103\">                history = np.array([item.decode(<span class=\"hljs-string\">'utf-8'<\/span>) <span class=\"hljs-keyword\">for<\/span> item <span class=\"hljs-keyword\">in<\/span> history_origin]).reshape((-<span class=\"hljs-number\">1<\/span>,<span class=\"hljs-number\">2<\/span>)).tolist()<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"104\">            <span class=\"hljs-keyword\">else<\/span>:<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"105\">                history = []<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"106\">            temperature = pb_utils.get_input_tensor_by_name(request, <span class=\"hljs-string\">\"temperature\"<\/span>).as_numpy()[<span class=\"hljs-number\">0<\/span>]<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"107\">            temperature = <span class=\"hljs-built_in\">float<\/span>(temperature.decode(<span class=\"hljs-string\">'utf-8'<\/span>))<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"108\">            max_token = pb_utils.get_input_tensor_by_name(request, <span class=\"hljs-string\">\"max_token\"<\/span>).as_numpy()[<span class=\"hljs-number\">0<\/span>]<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"109\">            max_token = <span class=\"hljs-built_in\">int<\/span>(max_token.decode(<span class=\"hljs-string\">'utf-8'<\/span>))<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"110\">            history_len = pb_utils.get_input_tensor_by_name(request, 
<span class=\"hljs-string\">\"history_len\"<\/span>).as_numpy()[<span class=\"hljs-number\">0<\/span>]<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"111\">            history_len = <span class=\"hljs-built_in\">int<\/span>(history_len.decode(<span class=\"hljs-string\">'utf-8'<\/span>))<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"112\">            <\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"113\">            <span class=\"hljs-comment\"># Log the incoming request parameters<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"114\">            in_log_info = {<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"115\">                <span class=\"hljs-string\">\"in_prompt\"<\/span>:prompt,<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"116\">                <span class=\"hljs-string\">\"in_history\"<\/span>:history,<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"117\">                <span class=\"hljs-string\">\"in_temperature\"<\/span>:temperature,<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"118\">                <span class=\"hljs-string\">\"in_max_token\"<\/span>:max_token,<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"119\">                <span class=\"hljs-string\">\"in_history_len\"<\/span>:history_len<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"120\">                       }<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"121\">            logging.info(in_log_info)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"122\">            response,history = self.model.chat(self.tokenizer,<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"123\">                                               prompt,<\/span>\n<span 
class=\"code-block-extension-codeLine\" data-line-num=\"124\">                                               history=history[-history_len:] <span class=\"hljs-keyword\">if<\/span> history_len &gt; <span class=\"hljs-number\">0<\/span> <span class=\"hljs-keyword\">else<\/span> [],<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"125\">                                               max_length=max_token,<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"126\">                                               temperature=temperature)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"127\">            <span class=\"hljs-comment\"># Log the model output and the updated history<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"128\">            out_log_info = {<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"129\">                <span class=\"hljs-string\">\"out_response\"<\/span>:response,<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"130\">                <span class=\"hljs-string\">\"out_history\"<\/span>:history<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"131\">                       }<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"132\">            logging.info(out_log_info)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"133\">            response = np.array(response)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"134\">            history = np.array(history)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"135\">            <\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"136\">            response_output_tensor = pb_utils.Tensor(<span class=\"hljs-string\">\"response\"<\/span>,response.astype(self.output_response_dtype))<\/span>\n<span 
class=\"code-block-extension-codeLine\" data-line-num=\"137\">            history_output_tensor = pb_utils.Tensor(<span class=\"hljs-string\">\"history\"<\/span>,history.astype(self.output_history_dtype))<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"138\"><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"139\">            final_inference_response = pb_utils.InferenceResponse(output_tensors=[response_output_tensor,history_output_tensor])<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"140\">            responses.append(final_inference_response)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"141\">            <span class=\"hljs-comment\"># Create InferenceResponse. You can set an error here in case<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"142\">            <span class=\"hljs-comment\"># there was a problem with handling this inference request.<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"143\">            <span class=\"hljs-comment\"># Below is an example of how you can set errors in inference<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"144\">            <span class=\"hljs-comment\"># response:<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"145\">            <span class=\"hljs-comment\">#<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"146\">            <span class=\"hljs-comment\"># pb_utils.InferenceResponse(<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"147\">            <span class=\"hljs-comment\">#    output_tensors=..., error=pb_utils.TritonError(\"An error occurred\"))<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"148\"><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"149\">        <span class=\"hljs-comment\"># You should 
return a list of pb_utils.InferenceResponse. Length<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"150\">        <span class=\"hljs-comment\"># of this list must match the length of `requests` list.<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"151\">        <span class=\"hljs-keyword\">return<\/span> responses<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"152\"><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"153\">    <span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title function_\">finalize<\/span>(<span class=\"hljs-params\">self<\/span>):<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"154\">        <span class=\"hljs-string\">\"\"\"`finalize` is called only once when the model is being unloaded.<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"155\">        Implementing `finalize` function is OPTIONAL. This function allows<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"156\">        the model to perform any necessary clean ups before exit.<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"157\">        \"\"\"<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"158\">        <span class=\"hljs-built_in\">print<\/span>(<span class=\"hljs-string\">'Cleaning up...'<\/span>)<\/span>\n<\/code><\/pre>\n<h2 data-id=\"heading-7\">5. 
Deployment testing<\/h2>\n<p>\u2460 Create a notebook test instance from the 9n-triton-devel-gpu-v0.3 image.<\/p>\n<p>\u2461 Put the model under \/9n-triton-devel\/model_repository; for the model directory layout, see section 3.1.<\/p>\n<p>\u2462 Go to \/9n-triton-devel\/server\/, then download the latest server binary package and extract it: wget <a href=\"https:\/\/link.juejin.cn?target=http%3A%2F%2Fstorage.jd.local%2Fcom.bamboo.server.product%2F7196560%2F9n_predictor_server.tgz\" target=\"_blank\" title=\"http:\/\/storage.jd.local\/com.bamboo.server.product\/7196560\/9n_predictor_server.tgz\" ref=\"nofollow noopener noreferrer\" rel=\"noopener\">storage.jd.local\/com.bamboo.\u2026<\/a><\/p>\n<p>\u2463 Replace the contents of \/9n-triton-devel\/server\/start.sh with the following:<\/p>\n<pre><code class=\"hljs language-bash code-block-extension-codeShowNum\" lang=\"bash\"><span class=\"code-block-extension-codeLine\" data-line-num=\"1\"><span class=\"hljs-built_in\">mkdir<\/span> -p logs<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"2\"><span class=\"hljs-built_in\">rm<\/span> -rf \/9n-triton-devel\/server\/logs\/*<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"3\"><span class=\"hljs-built_in\">rm<\/span> -rf \/tmp\/python_env_*<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"4\"><span class=\"hljs-built_in\">export<\/span> LD_LIBRARY_PATH=\/9n-triton-devel\/server\/lib\/:<span class=\"hljs-variable\">$LD_LIBRARY_PATH<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"5\"><span class=\"hljs-built_in\">nohup<\/span> .\/bin\/9n_predictor_server --flagfile=.\/conf\/server.gflags 
&gt;\/dev\/null 2&gt;&amp;1 &amp;<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"6\"><span class=\"hljs-built_in\">sleep<\/span> 2<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"7\">pid=`ps x |grep <span class=\"hljs-string\">\"9n_predictor_server\"<\/span> | grep -v <span class=\"hljs-string\">\"grep\"<\/span> | grep -v <span class=\"hljs-string\">\"ldd\"<\/span> | grep -v <span class=\"hljs-string\">\"stat\"<\/span> | awk <span class=\"hljs-string\">'{print $1}'<\/span>`<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"8\"><span class=\"hljs-built_in\">echo<\/span> <span class=\"hljs-variable\">$pid<\/span><\/span>\n<\/code><\/pre>\n<p>\u2464 Run the \/9n-triton-devel\/server\/start.sh script.<\/p>\n<p>\u2465 Check that the service started successfully (loading the ChatGLM2-6B model takes roughly 13 minutes):<\/p>\n<p>Method 1: check whether port 8010 is listening: netstat -natp | grep 8010<\/p>\n<p>Method 2: check the log: cat \/9n-triton-devel\/server\/logs\/predictor_core.INFO<\/p>\n<p>\u2466 Write a Python gRPC client script to test the service and place it in \/9n-triton-devel\/client\/. It connects to 127.0.0.1 on port 8010. For reference:<\/p>\n<pre><code class=\"hljs language-python code-block-extension-codeShowNum\" lang=\"python\"><span class=\"code-block-extension-codeLine\" data-line-num=\"1\">#!\/usr\/bin\/python3<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"2\"># -*- coding: utf-<span class=\"hljs-number\">8<\/span> -*-<\/span>\n<span 
class=\"code-block-extension-codeLine\" data-line-num=\"3\">import sys<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"4\">sys.path.<span class=\"hljs-built_in\">append<\/span>(<span class=\"hljs-string\">'.\/base'<\/span>)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"5\">from multi_backend_client import MultiBackendClient<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"6\">import triton_python_backend_utils as python_backend_utils<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"7\">import multi_backend_message_pb2<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"8\"><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"9\">import time<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"10\">import argparse<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"11\">import io<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"12\">import os<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"13\">import numpy as np<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"14\">import json<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"15\">import struct<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"16\"><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"17\">def <span class=\"hljs-built_in\">print_result<\/span>(response, batch_size ):<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"18\">    <span class=\"hljs-built_in\">print<\/span>(<span class=\"hljs-string\">\"outputs len:\"<\/span> + <span class=\"hljs-built_in\">str<\/span>(<span class=\"hljs-built_in\">len<\/span>(response.outputs)))<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"19\"><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"20\">    if 
(response.error_code == <span class=\"hljs-number\">0<\/span>):<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"21\">        <span class=\"hljs-built_in\">print<\/span>(<span class=\"hljs-string\">\"response : \"<\/span>, response)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"22\"><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"23\">        <span class=\"hljs-built_in\">print<\/span>(f<span class=\"hljs-string\">'res shape: {response.outputs[0].shape}'<\/span>)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"24\">        res = python_backend_utils.<span class=\"hljs-built_in\">deserialize_bytes_tensor<\/span>(response.raw_output_contents[<span class=\"hljs-number\">0<\/span>])<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"25\">        for i in res:<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"26\">            <span class=\"hljs-built_in\">print<\/span>(i.<span class=\"hljs-built_in\">decode<\/span>())<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"27\"><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"28\">        <span class=\"hljs-built_in\">print<\/span>(f<span class=\"hljs-string\">'history shape: {response.outputs[1].shape}'<\/span>)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"29\">        history = python_backend_utils.<span class=\"hljs-built_in\">deserialize_bytes_tensor<\/span>(response.raw_output_contents[<span class=\"hljs-number\">1<\/span>])<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"30\">        for i in history:<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"31\">            <span class=\"hljs-built_in\">print<\/span>(i.<span class=\"hljs-built_in\">decode<\/span>())<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"32\"><\/span>\n<span 
class=\"code-block-extension-codeLine\" data-line-num=\"33\"><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"34\">def <span class=\"hljs-built_in\">send_one_request<\/span>(sender, request_pb, batch_size):<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"35\">    succ, response = sender.<span class=\"hljs-built_in\">send_req<\/span>(request_pb)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"36\">    if succ:<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"37\">        <span class=\"hljs-built_in\">print_result<\/span>(response, batch_size)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"38\">    else:<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"39\">      <span class=\"hljs-built_in\">print<\/span>(<span class=\"hljs-string\">'send_one_request fail '<\/span>, response)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"40\"><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"41\">def <span class=\"hljs-built_in\">send_request<\/span>(ip, port, temperature, max_token, history_len, batch_size=<span class=\"hljs-number\">1<\/span>, send_cnt=<span class=\"hljs-number\">1<\/span>):<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"42\">    request_sender = <span class=\"hljs-built_in\">MultiBackendClient<\/span>(ip, port)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"43\"><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"44\">    request = multi_backend_message_pb2.<span class=\"hljs-built_in\">ModelInferRequest<\/span>()<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"45\">    request.model_name = <span class=\"hljs-string\">\"chatglm2-6b\"<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"46\"><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"47\">  
  # Input tensor declarations<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"48\">    input0 = multi_backend_message_pb2.<span class=\"hljs-built_in\">ModelInferRequest<\/span>().<span class=\"hljs-built_in\">InferInputTensor<\/span>()<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"49\">    input0.name = <span class=\"hljs-string\">\"prompt\"<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"50\">    input0.datatype = <span class=\"hljs-string\">\"BYTES\"<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"51\">    input0.shape.<span class=\"hljs-built_in\">extend<\/span>([<span class=\"hljs-number\">1<\/span>])<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"52\"><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"53\">    input1 = multi_backend_message_pb2.<span class=\"hljs-built_in\">ModelInferRequest<\/span>().<span class=\"hljs-built_in\">InferInputTensor<\/span>()<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"54\">    input1.name = <span class=\"hljs-string\">\"history\"<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"55\">    input1.datatype = <span class=\"hljs-string\">\"BYTES\"<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"56\">    input1.shape.<span class=\"hljs-built_in\">extend<\/span>([-<span class=\"hljs-number\">1<\/span>])<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"57\">    <\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"58\">    input2 = multi_backend_message_pb2.<span class=\"hljs-built_in\">ModelInferRequest<\/span>().<span class=\"hljs-built_in\">InferInputTensor<\/span>()<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"59\">    input2.name = <span class=\"hljs-string\">\"temperature\"<\/span><\/span>\n<span 
class=\"code-block-extension-codeLine\" data-line-num=\"60\">    input2.datatype = <span class=\"hljs-string\">\"BYTES\"<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"61\">    input2.shape.<span class=\"hljs-built_in\">extend<\/span>([<span class=\"hljs-number\">1<\/span>])<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"62\">    <\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"63\">    input3 = multi_backend_message_pb2.<span class=\"hljs-built_in\">ModelInferRequest<\/span>().<span class=\"hljs-built_in\">InferInputTensor<\/span>()<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"64\">    input3.name = <span class=\"hljs-string\">\"max_token\"<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"65\">    input3.datatype = <span class=\"hljs-string\">\"BYTES\"<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"66\">    input3.shape.<span class=\"hljs-built_in\">extend<\/span>([<span class=\"hljs-number\">1<\/span>])<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"67\">    <\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"68\">    input4 = multi_backend_message_pb2.<span class=\"hljs-built_in\">ModelInferRequest<\/span>().<span class=\"hljs-built_in\">InferInputTensor<\/span>()<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"69\">    input4.name = <span class=\"hljs-string\">\"history_len\"<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"70\">    input4.datatype = <span class=\"hljs-string\">\"BYTES\"<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"71\">    input4.shape.<span class=\"hljs-built_in\">extend<\/span>([<span class=\"hljs-number\">1<\/span>])<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"72\"><\/span>\n<span class=\"code-block-extension-codeLine\" 
data-line-num=\"73\">    query = <span class=\"hljs-string\">'Please give a concrete example'<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"74\">    input0.contents.bytes_contents.<span class=\"hljs-built_in\">append<\/span>(<span class=\"hljs-built_in\">bytes<\/span>(query, encoding=<span class=\"hljs-string\">\"utf8\"<\/span>))<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"75\">    request.inputs.<span class=\"hljs-built_in\">extend<\/span>([input0])<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"76\"><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"77\">    history_origin = np.<span class=\"hljs-built_in\">array<\/span>([[<span class=\"hljs-string\">'Do you know the chickens and rabbits in a cage problem?'<\/span>, <span class=\"hljs-string\">'The chickens and rabbits in a cage problem is a classic math problem involving basic algebraic equations and problem-solving methods. It goes like this: a cage holds some chickens and some rabbits; given their total number and their total number of legs, how many chickens and how many rabbits are there?\\n\\nSolution: let the number of chickens be x and the number of rabbits be y, so the total number of legs is 2x+4y. From the problem statement we get the system of equations:\\n\\nx + y = total\\n2x + 4y = total legs\\n\\nSolving the system gives x and y, and hence the numbers of chickens and rabbits.'<\/span>]]).<span class=\"hljs-built_in\">reshape<\/span>((-<span class=\"hljs-number\">1<\/span>,))<\/span>\n<span class=\"code-block-extension-codeLine\" 
data-line-num=\"78\">    history = [<span class=\"hljs-built_in\">bytes<\/span>(item, encoding=<span class=\"hljs-string\">\"utf8\"<\/span>) for item in history_origin]<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"79\">    input1.contents.bytes_contents.<span class=\"hljs-built_in\">extend<\/span>(history)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"80\">    request.inputs.<span class=\"hljs-built_in\">extend<\/span>([input1])<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"81\">    <\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"82\">    input2.contents.bytes_contents.<span class=\"hljs-built_in\">append<\/span>(<span class=\"hljs-built_in\">bytes<\/span>(temperature, encoding=<span class=\"hljs-string\">\"utf8\"<\/span>))<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"83\">    request.inputs.<span class=\"hljs-built_in\">extend<\/span>([input2])<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"84\">    <\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"85\">    input3.contents.bytes_contents.<span class=\"hljs-built_in\">append<\/span>(<span class=\"hljs-built_in\">bytes<\/span>(max_token, encoding=<span class=\"hljs-string\">\"utf8\"<\/span>))<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"86\">    request.inputs.<span class=\"hljs-built_in\">extend<\/span>([input3])<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"87\">    <\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"88\">    input4.contents.bytes_contents.<span class=\"hljs-built_in\">append<\/span>(<span class=\"hljs-built_in\">bytes<\/span>(history_len, encoding=<span class=\"hljs-string\">\"utf8\"<\/span>))<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"89\">    request.inputs.<span class=\"hljs-built_in\">extend<\/span>([input4])<\/span>\n<span 
class=\"code-block-extension-codeLine\" data-line-num=\"90\"><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"91\">    # Requested output tensors<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"92\">    output_tensor0 = multi_backend_message_pb2.<span class=\"hljs-built_in\">ModelInferRequest<\/span>().<span class=\"hljs-built_in\">InferRequestedOutputTensor<\/span>()<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"93\">    output_tensor0.name = <span class=\"hljs-string\">\"response\"<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"94\">    request.outputs.<span class=\"hljs-built_in\">extend<\/span>([output_tensor0])<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"95\"><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"96\">    output_tensor1 = multi_backend_message_pb2.<span class=\"hljs-built_in\">ModelInferRequest<\/span>().<span class=\"hljs-built_in\">InferRequestedOutputTensor<\/span>()<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"97\">    output_tensor1.name = <span class=\"hljs-string\">\"history\"<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"98\">    request.outputs.<span class=\"hljs-built_in\">extend<\/span>([output_tensor1])<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"99\">    <\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"100\">    min_ms = <span class=\"hljs-number\">0<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"101\">    max_ms = <span class=\"hljs-number\">0<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"102\">    avg_ms = <span class=\"hljs-number\">0<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"103\">    for i in <span class=\"hljs-built_in\">range<\/span>(send_cnt):<\/span>\n<span 
class=\"code-block-extension-codeLine\" data-line-num=\"104\">        start = time.<span class=\"hljs-built_in\">time_ns<\/span>()<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"105\">        <span class=\"hljs-built_in\">send_one_request<\/span>(request_sender, request, batch_size)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"106\">        cost = (time.<span class=\"hljs-built_in\">time_ns<\/span>()-start)\/<span class=\"hljs-number\">1000000<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"107\">        print (<span class=\"hljs-string\">\"idx:%d cost  ms:%d\"<\/span> % (i, cost))<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"108\">        if cost &gt; max_ms:<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"109\">            max_ms = cost<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"110\">        if cost &lt; min_ms or min_ms==<span class=\"hljs-number\">0<\/span>:<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"111\">            min_ms = cost<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"112\">        avg_ms += cost<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"113\">    avg_ms \/= send_cnt<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"114\">    <span class=\"hljs-built_in\">print<\/span>(<span class=\"hljs-string\">\"cnt=%d max=%dms min=%dms avg=%dms\"<\/span> % (send_cnt, max_ms, min_ms, avg_ms))<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"115\"><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"116\">if __name__ == <span class=\"hljs-string\">'__main__'<\/span>:<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"117\">        parser = argparse.<span class=\"hljs-built_in\">ArgumentParser<\/span>()<\/span>\n<span 
class=\"code-block-extension-codeLine\" data-line-num=\"118\">        parser.<span class=\"hljs-built_in\">add_argument<\/span>( <span class=\"hljs-string\">'-ip'<\/span>, <span class=\"hljs-string\">'--ip_address'<\/span>, help = <span class=\"hljs-string\">'ip address'<\/span>, default=<span class=\"hljs-string\">'127.0.0.1'<\/span>, required=False)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"119\">        parser.<span class=\"hljs-built_in\">add_argument<\/span>( <span class=\"hljs-string\">'-p'<\/span>,  <span class=\"hljs-string\">'--port'<\/span>, help = <span class=\"hljs-string\">'port'<\/span>, default=<span class=\"hljs-string\">'8010'<\/span>, required=False)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"120\">        parser.<span class=\"hljs-built_in\">add_argument<\/span>( <span class=\"hljs-string\">'-t'<\/span>,  <span class=\"hljs-string\">'--temperature'<\/span>, help = <span class=\"hljs-string\">'temperature'<\/span>, default=<span class=\"hljs-string\">'0.01'<\/span>, required=False)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"121\">        parser.<span class=\"hljs-built_in\">add_argument<\/span>( <span class=\"hljs-string\">'-m'<\/span>,  <span class=\"hljs-string\">'--max_token'<\/span>, help = <span class=\"hljs-string\">'max_token'<\/span>, default=<span class=\"hljs-string\">'16000'<\/span>, required=False)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"122\">        parser.<span class=\"hljs-built_in\">add_argument<\/span>( <span class=\"hljs-string\">'-hl'<\/span>,  <span class=\"hljs-string\">'--history_len'<\/span>, help = <span class=\"hljs-string\">'history_len'<\/span>, default=<span class=\"hljs-string\">'10'<\/span>, required=False)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"123\">        parser.<span class=\"hljs-built_in\">add_argument<\/span>( <span class=\"hljs-string\">'-b'<\/span>,  <span 
class=\"hljs-string\">'--batch_size'<\/span>, help = <span class=\"hljs-string\">'batch size'<\/span>, default=<span class=\"hljs-number\">1<\/span>, required=False, type = int)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"124\">        parser.<span class=\"hljs-built_in\">add_argument<\/span>( <span class=\"hljs-string\">'-c'<\/span>,  <span class=\"hljs-string\">'--send_count'<\/span>, help = <span class=\"hljs-string\">'send count'<\/span>, default=<span class=\"hljs-number\">1<\/span>, required=False, type = int)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"125\">        args = parser.<span class=\"hljs-built_in\">parse_args<\/span>()<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"126\">        <span class=\"hljs-built_in\">send_request<\/span>(args.ip_address, args.port, args.temperature, args.max_token, args.history_len, args.batch_size, args.send_count)<\/span>\n<\/code><\/pre>\n<p>For the general predictor request format, refer to the KServe V2 inference protocol definition: <a href=\"https:\/\/link.juejin.cn?target=https%3A%2F%2Fgithub.com%2Fkserve%2Fkserve%2Fblob%2Fmaster%2Fdocs%2Fpredict-api%2Fv2%2Fgrpc_predict_v2.proto\" target=\"_blank\" title=\"https:\/\/github.com\/kserve\/kserve\/blob\/master\/docs\/predict-api\/v2\/grpc_predict_v2.proto\" ref=\"nofollow noopener noreferrer\" rel=\"noopener\">github.com\/kserve\/kser\u2026<\/a><\/p>\n<h2 data-id=\"heading-8\">6. 
Model Deployment<\/h2>\n<p>The 9n (Jiushu) algorithm platform offers two ways to deploy a model service: through the web UI or through the SDK. Deployment through the UI only exposes a JSF protocol interface; if a JSF service interface is all you need, you can deploy directly by following <a href=\"https:\/\/link.juejin.cn?target=http%3A%2F%2Feasyalgo.jd.com%2Fhelp%2F%25E4%25BD%25BF%25E7%2594%25A8%25E6%258C%2587%25E5%258D%2597%2F%25E6%25A8%25A1%25E5%259E%258B%25E8%25AE%25A1%25E7%25AE%2597%2F%25E6%25A8%25A1%25E5%259E%258B%25E9%2583%25A8%25E7%25BD%25B2.html\" target=\"_blank\" title=\"http:\/\/easyalgo.jd.com\/help\/%E4%BD%BF%E7%94%A8%E6%8C%87%E5%8D%97\/%E6%A8%A1%E5%9E%8B%E8%AE%A1%E7%AE%97\/%E6%A8%A1%E5%9E%8B%E9%83%A8%E7%BD%B2.html\" ref=\"nofollow noopener noreferrer\" rel=\"noopener\">easyalgo.jd.com\/help\/%E4%BD\u2026<\/a>.<\/p>\n<p>Because I later needed to integrate ChatGLM2-6B into langchain, exposing an HTTP interface was more convenient; after consulting colleagues on the algorithm platform team, I confirmed that SDK deployment supports this. The UI and SDK deployment paths are currently not aligned: UI deployment can use the model structure from Section 3.1 as-is, whereas SDK deployment requires the model structure to be adjusted as follows:<\/p>\n<p>\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/www.nicekj.com\/wp-content\/uploads\/replace\/a31c72f7023df9da9929ccb5f497e347.png\" alt=\"Model directory structure adjusted for SDK deployment\" \/><\/figure>\n<\/p>\n<p>You also need to set the execution environment path in config.pbtxt so that the Python backend runs in the packed environment python-3-8.tar.gz stored in the model's version directory 1:<\/p>\n<pre><code class=\"hljs language-css code-block-extension-codeShowNum\" lang=\"css\"><span class=\"code-block-extension-codeLine\" data-line-num=\"1\">parameters: {<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"2\">  key: <span class=\"hljs-string\">\"EXECUTION_ENV_PATH\"<\/span>,<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"3\">  value: {string_value: <span class=\"hljs-string\">\"$$TRITON_MODEL_DIRECTORY\/1\/python-3-8.tar.gz\"<\/span>}<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"4\">}<\/span>\n<\/code><\/pre>\n<p>The model deployment code is as follows:<\/p>\n<pre><code class=\"hljs language-ini code-block-extension-codeShowNum\" lang=\"ini\"><span class=\"code-block-extension-codeLine\" data-line-num=\"1\">from das.triton.model import TritonModel<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"2\"><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"3\"><span class=\"hljs-attr\">model<\/span> = TritonModel(<span class=\"hljs-string\">\"chatglm2-6b\"<\/span>)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"4\"><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"5\"><span class=\"hljs-attr\">predictor<\/span> = model.deploy(<\/span>\n<span class=\"code-block-extension-codeLine\" 
data-line-num=\"6\">    <span class=\"hljs-attr\">path<\/span>=<span class=\"hljs-string\">\"$pwd\/model_repository\/chatglm2-6b\"<\/span>,  <span class=\"hljs-comment\"># directory containing the model files; replace $pwd with the actual absolute path<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"7\">    <span class=\"hljs-attr\">protocol<\/span>=<span class=\"hljs-string\">'http'<\/span>,<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"8\">    <span class=\"hljs-attr\">endpoint<\/span>=<span class=\"hljs-string\">\"9n-das-serving-lf2.jd.local\"<\/span>,<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"9\">    <span class=\"hljs-attr\">cpu<\/span>=<span class=\"hljs-number\">4<\/span>,<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"10\">    <span class=\"hljs-attr\">memory<\/span>=<span class=\"hljs-number\">30<\/span>,<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"11\">    <span class=\"hljs-attr\">use_gpu<\/span>=<span class=\"hljs-literal\">True<\/span>,  <span class=\"hljs-comment\"># set according to whether GPU-accelerated inference is needed<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"12\">    <span class=\"hljs-attr\">override<\/span>=<span class=\"hljs-literal\">True<\/span>,<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"13\">    <span class=\"hljs-attr\">instances<\/span>=<span class=\"hljs-number\">2<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"14\">    )<\/span>\n<\/code><\/pre>\n<h1 
data-id=\"heading-9\">4. Integrating with langchain<\/h1>\n<p>langchain makes it quick to build applications on top of LLMs. To call our model service from langchain, wrap ChatGLM2-6B with langchain's LLMs module; the main work is implementing the _call method. A chat method that also returns the updated conversation history is included. Reference code:<\/p>\n<pre><code class=\"hljs language-python code-block-extension-codeShowNum\" lang=\"python\"><span class=\"code-block-extension-codeLine\" data-line-num=\"1\"><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"2\"><span class=\"hljs-keyword\">import<\/span> json<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"3\"><span class=\"hljs-keyword\">import<\/span> time<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"4\"><span class=\"hljs-keyword\">import<\/span> base64<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"5\"><span class=\"hljs-keyword\">import<\/span> struct<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"6\"><span class=\"hljs-keyword\">import<\/span> requests<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"7\"><span class=\"hljs-keyword\">import<\/span> numpy <span class=\"hljs-keyword\">as<\/span> np<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"8\"><span class=\"hljs-keyword\">from<\/span> pathlib <span class=\"hljs-keyword\">import<\/span> Path<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"9\"><span class=\"hljs-keyword\">from<\/span> abc <span class=\"hljs-keyword\">import<\/span> ABC, abstractmethod<\/span>\n<span class=\"code-block-extension-codeLine\" 
data-line-num=\"10\"><span class=\"hljs-keyword\">from<\/span> langchain.llms.base <span class=\"hljs-keyword\">import<\/span> LLM<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"11\"><span class=\"hljs-keyword\">from<\/span> langchain.llms <span class=\"hljs-keyword\">import<\/span> OpenAI<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"12\"><span class=\"hljs-keyword\">from<\/span> langchain.llms.utils <span class=\"hljs-keyword\">import<\/span> enforce_stop_tokens<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"13\"><span class=\"hljs-keyword\">from<\/span> typing <span class=\"hljs-keyword\">import<\/span> <span class=\"hljs-type\">Dict<\/span>, <span class=\"hljs-type\">List<\/span>, <span class=\"hljs-type\">Optional<\/span>, <span class=\"hljs-type\">Tuple<\/span>, <span class=\"hljs-type\">Union<\/span>, Mapping, <span class=\"hljs-type\">Any<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"14\"><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"15\"><span class=\"hljs-keyword\">import<\/span> warnings<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"16\">warnings.filterwarnings(<span class=\"hljs-string\">\"ignore\"<\/span>)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"17\"><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"18\"><span class=\"hljs-keyword\">class<\/span> <span class=\"hljs-title class_\">ChatGLM<\/span>(<span class=\"hljs-title class_ inherited__\">LLM<\/span>):<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"19\">    max_token: <span class=\"hljs-built_in\">int<\/span> = <span class=\"hljs-number\">32000<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"20\">    temperature: <span class=\"hljs-built_in\">float<\/span> = <span class=\"hljs-number\">0.01<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"21\">    history_len: <span class=\"hljs-built_in\">int<\/span> = <span class=\"hljs-number\">10<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"22\">    url: <span class=\"hljs-built_in\">str<\/span> = <span class=\"hljs-string\">\"\"<\/span>  <span class=\"hljs-comment\"># full Predict URL of the deployed model service<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"23\">    <span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title function_\">__init__<\/span>(<span class=\"hljs-params\">self<\/span>):<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"24\">        <span class=\"hljs-built_in\">super<\/span>(ChatGLM, self).__init__()<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"25\">        <\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"26\"><span class=\"hljs-meta\">    @property<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"27\">    <span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title function_\">_llm_type<\/span>(<span class=\"hljs-params\">self<\/span>) -&gt; <span class=\"hljs-built_in\">str<\/span>:<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"28\">        <span class=\"hljs-keyword\">return<\/span> <span class=\"hljs-string\">\"ChatGLM2-6B\"<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"29\">    <\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"30\"><span class=\"hljs-meta\">    @property<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"31\">    <span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title function_\">_history_len<\/span>(<span class=\"hljs-params\">self<\/span>) -&gt; <span class=\"hljs-built_in\">int<\/span>:<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"32\">        <span class=\"hljs-keyword\">return<\/span> self.history_len<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"33\">    <\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"34\"><span class=\"hljs-meta\">    @property<\/span><\/span>\n<span 
class=\"code-block-extension-codeLine\" data-line-num=\"35\">    <span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title function_\">_max_token<\/span>(<span class=\"hljs-params\">self<\/span>) -&gt; <span class=\"hljs-built_in\">int<\/span>:<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"36\">        <span class=\"hljs-keyword\">return<\/span> self.max_token<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"37\">    <\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"38\"><span class=\"hljs-meta\">    @property<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"39\">    <span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title function_\">_temperature<\/span>(<span class=\"hljs-params\">self<\/span>) -&gt; <span class=\"hljs-built_in\">float<\/span>:<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"40\">        <span class=\"hljs-keyword\">return<\/span> self.temperature<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"41\">    <\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"42\">    <span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title function_\">_deserialize_bytes_tensor<\/span>(<span class=\"hljs-params\">self, encoded_tensor<\/span>):<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"43\">        <span class=\"hljs-string\">\"\"\"<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"44\">        Deserializes an encoded bytes tensor into a<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"45\">        NumPy array of Python objects (dtype=object).<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"46\">        Parameters<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"47\">        ----------<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"48\">     
   encoded_tensor : bytes<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"49\">            The encoded bytes tensor where each element<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"50\">            has its length in first 4 bytes followed by<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"51\">            the content<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"52\">        Returns<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"53\">        -------<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"54\">        string_tensor : np.array<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"55\">            The 1-D numpy array of type object containing the<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"56\">            deserialized bytes in 'C' order.<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"57\">        \"\"\"<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"58\">        strs = <span class=\"hljs-built_in\">list<\/span>()<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"59\">        offset = <span class=\"hljs-number\">0<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"60\">        val_buf = encoded_tensor<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"61\">        <span class=\"hljs-keyword\">while<\/span> offset &lt; <span class=\"hljs-built_in\">len<\/span>(val_buf):<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"62\">            l = struct.unpack_from(<span class=\"hljs-string\">\"&lt;I\"<\/span>, val_buf, offset)[<span class=\"hljs-number\">0<\/span>]<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"63\">            offset += <span class=\"hljs-number\">4<\/span><\/span>\n<span 
class=\"code-block-extension-codeLine\" data-line-num=\"64\">            sb = struct.unpack_from(<span class=\"hljs-string\">\"&lt;{}s\"<\/span>.<span class=\"hljs-built_in\">format<\/span>(l), val_buf, offset)[<span class=\"hljs-number\">0<\/span>]<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"65\">            offset += l<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"66\">            strs.append(sb)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"67\">        <span class=\"hljs-keyword\">return<\/span> (np.array(strs, dtype=np.object_))<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"68\">    <\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"69\"><span class=\"hljs-meta\">    @classmethod<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"70\">    <span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title function_\">_infer<\/span>(<span class=\"hljs-params\">cls, url, query, history, temperature, max_token, history_len<\/span>):<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"71\">        query = base64.b64encode(query.encode(<span class=\"hljs-string\">'utf-8'<\/span>)).decode(<span class=\"hljs-string\">'utf-8'<\/span>)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"72\">        history_origin = np.asarray(history).reshape((-<span class=\"hljs-number\">1<\/span>,))<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"73\">        history = [base64.b64encode(item.encode(<span class=\"hljs-string\">'utf-8'<\/span>)).decode(<span class=\"hljs-string\">'utf-8'<\/span>) <span class=\"hljs-keyword\">for<\/span> item <span class=\"hljs-keyword\">in<\/span> history_origin]<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"74\">        temperature = base64.b64encode(temperature.encode(<span 
class=\"hljs-string\">'utf-8'<\/span>)).decode(<span class=\"hljs-string\">'utf-8'<\/span>)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"75\">        max_token = base64.b64encode(max_token.encode(<span class=\"hljs-string\">'utf-8'<\/span>)).decode(<span class=\"hljs-string\">'utf-8'<\/span>)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"76\">        history_len = base64.b64encode(history_len.encode(<span class=\"hljs-string\">'utf-8'<\/span>)).decode(<span class=\"hljs-string\">'utf-8'<\/span>)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"77\">        data = {<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"78\">            <span class=\"hljs-string\">\"model_name\"<\/span>: <span class=\"hljs-string\">\"chatglm2-6b\"<\/span>,<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"79\">            <span class=\"hljs-string\">\"inputs\"<\/span>: [<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"80\">                {<span class=\"hljs-string\">\"name\"<\/span>: <span class=\"hljs-string\">\"prompt\"<\/span>, <span class=\"hljs-string\">\"datatype\"<\/span>: <span class=\"hljs-string\">\"BYTES\"<\/span>, <span class=\"hljs-string\">\"shape\"<\/span>: [<span class=\"hljs-number\">1<\/span>], <span class=\"hljs-string\">\"contents\"<\/span>: {<span class=\"hljs-string\">\"bytes_contents\"<\/span>: [query]}},<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"81\">                {<span class=\"hljs-string\">\"name\"<\/span>: <span class=\"hljs-string\">\"history\"<\/span>, <span class=\"hljs-string\">\"datatype\"<\/span>: <span class=\"hljs-string\">\"BYTES\"<\/span>, <span class=\"hljs-string\">\"shape\"<\/span>: [-<span class=\"hljs-number\">1<\/span>], <span class=\"hljs-string\">\"contents\"<\/span>: {<span class=\"hljs-string\">\"bytes_contents\"<\/span>: history}},<\/span>\n<span 
class=\"code-block-extension-codeLine\" data-line-num=\"82\">                {<span class=\"hljs-string\">\"name\"<\/span>: <span class=\"hljs-string\">\"temperature\"<\/span>, <span class=\"hljs-string\">\"datatype\"<\/span>: <span class=\"hljs-string\">\"BYTES\"<\/span>, <span class=\"hljs-string\">\"shape\"<\/span>: [<span class=\"hljs-number\">1<\/span>], <span class=\"hljs-string\">\"contents\"<\/span>: {<span class=\"hljs-string\">\"bytes_contents\"<\/span>: [temperature]}},<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"83\">                {<span class=\"hljs-string\">\"name\"<\/span>: <span class=\"hljs-string\">\"max_token\"<\/span>, <span class=\"hljs-string\">\"datatype\"<\/span>: <span class=\"hljs-string\">\"BYTES\"<\/span>, <span class=\"hljs-string\">\"shape\"<\/span>: [<span class=\"hljs-number\">1<\/span>], <span class=\"hljs-string\">\"contents\"<\/span>: {<span class=\"hljs-string\">\"bytes_contents\"<\/span>: [max_token]}},<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"84\">                {<span class=\"hljs-string\">\"name\"<\/span>: <span class=\"hljs-string\">\"history_len\"<\/span>, <span class=\"hljs-string\">\"datatype\"<\/span>: <span class=\"hljs-string\">\"BYTES\"<\/span>, <span class=\"hljs-string\">\"shape\"<\/span>: [<span class=\"hljs-number\">1<\/span>], <span class=\"hljs-string\">\"contents\"<\/span>: {<span class=\"hljs-string\">\"bytes_contents\"<\/span>: [history_len]}}<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"85\">                ],<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"86\">            <span class=\"hljs-string\">\"outputs\"<\/span>: [{<span class=\"hljs-string\">\"name\"<\/span>: <span class=\"hljs-string\">\"response\"<\/span>},<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"87\">                        {<span class=\"hljs-string\">\"name\"<\/span>: <span 
class=\"hljs-string\">\"history\"<\/span>}]<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"88\">            }<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"89\">        response = requests.post(url,<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"90\">                                 data=json.dumps(data, ensure_ascii=<span class=\"hljs-literal\">True<\/span>),<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"91\">                                 headers={<span class=\"hljs-string\">\"Content-Type\"<\/span>: <span class=\"hljs-string\">\"application\/json\"<\/span>},<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"92\">                                 timeout=<span class=\"hljs-number\">120<\/span>)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"93\">        <span class=\"hljs-keyword\">return<\/span> response<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"94\">    <\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"95\">    <span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title function_\">_call<\/span>(<span class=\"hljs-params\">self,<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"96\">              query: <span class=\"hljs-built_in\">str<\/span>,<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"97\">              history: <span class=\"hljs-type\">List<\/span>[<span class=\"hljs-type\">List<\/span>[<span class=\"hljs-built_in\">str<\/span>]] = [],<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"98\">              stop: <span class=\"hljs-type\">Optional<\/span>[<span class=\"hljs-type\">List<\/span>[<span class=\"hljs-built_in\">str<\/span>]] = <span class=\"hljs-literal\">None<\/span>) -&gt; <span class=\"hljs-built_in\">str<\/span>:<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"99\">        
temperature = <span class=\"hljs-built_in\">str<\/span>(self.temperature)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"100\">        max_token = <span class=\"hljs-built_in\">str<\/span>(self.max_token)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"101\">        history_len = <span class=\"hljs-built_in\">str<\/span>(self.history_len)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"102\">        url = self.url<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"103\">        response = self._infer(url, query, history, temperature, max_token, history_len)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"104\">        <span class=\"hljs-keyword\">if<\/span> response.status_code != <span class=\"hljs-number\">200<\/span>:<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"105\">            <span class=\"hljs-keyword\">return<\/span> <span class=\"hljs-string\">\"Query result error\"<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"106\">        result = json.loads(response.text)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"107\">        <span class=\"hljs-comment\"># decode the \"response\" output: base64 -&gt; length-prefixed BYTES tensor -&gt; str<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"108\">        res = base64.b64decode(result[<span class=\"hljs-string\">'raw_output_contents'<\/span>][<span class=\"hljs-number\">0<\/span>].encode(<span class=\"hljs-string\">'utf-8'<\/span>))<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"109\">        res_response = self._deserialize_bytes_tensor(res)[<span class=\"hljs-number\">0<\/span>].decode()<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"110\">        <span class=\"hljs-keyword\">if<\/span> stop <span class=\"hljs-keyword\">is<\/span> <span class=\"hljs-keyword\">not<\/span> <span class=\"hljs-literal\">None<\/span>:<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"111\">            res_response = enforce_stop_tokens(res_response, stop)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"112\">        <span class=\"hljs-keyword\">return<\/span> res_response<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"113\">    <\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"114\">    <span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title function_\">chat<\/span>(<span class=\"hljs-params\">self,<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"115\">              query: <span class=\"hljs-built_in\">str<\/span>,<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"116\">              history: <span class=\"hljs-type\">List<\/span>[<span class=\"hljs-type\">List<\/span>[<span class=\"hljs-built_in\">str<\/span>]] = [],<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"117\">              stop: <span class=\"hljs-type\">Optional<\/span>[<span class=\"hljs-type\">List<\/span>[<span class=\"hljs-built_in\">str<\/span>]] = <span class=\"hljs-literal\">None<\/span>) -&gt; <span class=\"hljs-type\">Tuple<\/span>[<span class=\"hljs-built_in\">str<\/span>, <span class=\"hljs-type\">List<\/span>]:<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"118\">        temperature = <span class=\"hljs-built_in\">str<\/span>(self.temperature)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"119\">        max_token = <span class=\"hljs-built_in\">str<\/span>(self.max_token)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"120\">        history_len = <span class=\"hljs-built_in\">str<\/span>(self.history_len)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"121\">        url = self.url<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"122\">        response = self._infer(url, query, history, temperature, max_token, history_len)<\/span>\n<span 
class=\"code-block-extension-codeLine\" data-line-num=\"123\">        <span class=\"hljs-keyword\">if<\/span> response.status_code != <span class=\"hljs-number\">200<\/span>:<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"124\">            <span class=\"hljs-keyword\">return<\/span> <span class=\"hljs-string\">\"Query result error\"<\/span>, history<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"125\">        result = json.loads(response.text)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"126\">        <span class=\"hljs-comment\"># decode the \"response\" output<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"127\">        res = base64.b64decode(result[<span class=\"hljs-string\">'raw_output_contents'<\/span>][<span class=\"hljs-number\">0<\/span>].encode(<span class=\"hljs-string\">'utf-8'<\/span>))<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"128\">        res_response = self._deserialize_bytes_tensor(res)[<span class=\"hljs-number\">0<\/span>].decode()<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"129\">        <span class=\"hljs-keyword\">if<\/span> stop <span class=\"hljs-keyword\">is<\/span> <span class=\"hljs-keyword\">not<\/span> <span class=\"hljs-literal\">None<\/span>:<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"130\">            res_response = enforce_stop_tokens(res_response, stop)<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"131\">        <span class=\"hljs-comment\"># decode the \"history\" output and restore its shape<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"132\">        history_shape = result[<span class=\"hljs-string\">'outputs'<\/span>][<span class=\"hljs-number\">1<\/span>][<span class=\"hljs-string\">\"shape\"<\/span>]<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"133\">        history_enc = 
base64.b64decode(result[<span class=\"hljs-string\">'raw_output_contents'<\/span>][<span class=\"hljs-number\">1<\/span>].encode(<span class=\"hljs-string\">'utf-8'<\/span>))<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"134\">        res_history = np.array([i.decode() <span class=\"hljs-keyword\">for<\/span> i <span class=\"hljs-keyword\">in<\/span> self._deserialize_bytes_tensor(history_enc)]).reshape(history_shape).tolist()<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"135\">        <span class=\"hljs-keyword\">return<\/span> res_response, res_history<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"136\">    <\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"137\"><span class=\"hljs-meta\">    @property<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"138\">    <span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title function_\">_identifying_params<\/span>(<span class=\"hljs-params\">self<\/span>) -&gt; Mapping[<span class=\"hljs-built_in\">str<\/span>, <span class=\"hljs-type\">Any<\/span>]:<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"139\">        <span class=\"hljs-string\">\"\"\"Get the identifying parameters.<\/span><\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"140\">        \"\"\"<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"141\">        _param_dict = {<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"142\">            <span class=\"hljs-string\">\"url\"<\/span>: self.url<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"143\">        }<\/span>\n<span class=\"code-block-extension-codeLine\" data-line-num=\"144\">        <span class=\"hljs-keyword\">return<\/span> 
_param_dict<\/span>\n<\/code><\/pre>\n<p>Note: the URL used to call the model service is the invocation URL shown on the model deployment page with &#8220;<a href=\"https:\/\/link.juejin.cn?target=http%3A%2F%2F9n-das-serving-lf2.jd.local%3A2000%2Fv1%2Ftriton%2Fmodels%2Fzhaofenglong6-chatglm2-6b%2Fversion%2FMutilBackendService%2FPredict\" target=\"_blank\" title=\"http:\/\/9n-das-serving-lf2.jd.local:2000\/v1\/triton\/models\/zhaofenglong6-chatglm2-6b\/version\/MutilBackendService\/Predict\" ref=\"nofollow noopener noreferrer\" rel=\"noopener\">MutilBackendService\/Predict<\/a>&#8221; appended. For this deployment, the full URL is <code>http:\/\/9n-das-serving-lf2.jd.local:2000\/v1\/triton\/models\/zhaofenglong6-chatglm2-6b\/version\/MutilBackendService\/Predict<\/code>.<\/p>\n<h1 data-id=\"heading-10\">5. Summary<\/h1>\n<p>This article walked through deploying ChatGLM2-6B with JD Group's in-house 9n-triton tool. I hope it is useful to anyone who needs to carry out a similar deployment.<\/p>\n<blockquote>\n<p>Author: Zhao Fenglong (\u8d75\u98ce\u9f99), JD Insurance<\/p>\n<p>Source: JD Cloud Developer Community. Please credit the source when reposting.<\/p>\n<\/blockquote>","protected":false},
"excerpt":{"rendered":"<p>This article describes some of the pitfalls I ran into while deploying ChatGLM2-6B with JD Group's 9n-triton tool, in the hope that it helps anyone who needs to do a similar deployment.<\/p>\n","protected":false},"author":1,"featured_media":4419,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard",
"meta":{"_acf_changed":false,"rank_math_title":"ChatGLM-6B Deployment and Application: Triton & Langchain in Practice - \u5948\u65af\u79d1\u6280\u793e\u533a","rank_math_description":"This article explains in detail how to deploy the ChatGLM-6B model with Triton and integrate it into langchain for application development. It covers environment setup, model conversion, writing the Python Backend, deployment testing, and integration examples, providing a practical guide for developers.","rank_math_focus_keyword":"ChatGLM, Triton deployment, LLM, langchain, model inference","views":"11","footnotes":""},"categories":[3],"tags":[126,127,128,129,136],"collection":[],"class_list":["post-1277","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-fenlei2","tag-gpt","tag-ai","tag-128","tag-129","tag-136"],"acf":[],
"_links":{"self":[{"href":"https:\/\/www.nicekj.com\/nicekj2024\/wp\/v2\/posts\/1277","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.nicekj.com\/nicekj2024\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.nicekj.com\/nicekj2024\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.nicekj.com\/nicekj2024\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.nicekj.com\/nicekj2024\/wp\/v2\/comments?post=1277"}],"version-history":[{"count":0,"href":"https:\/\/www.nicekj.com\/nicekj2024\/wp\/v2\/posts\/1277\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.nicekj.com\/nicekj2024\/wp\/v2\/media\/4419"}],"wp:attachment":[{"href":"https:\/\/www.nicekj.com\/nicekj2024\/wp\/v2\/media?parent=1277"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.nicekj.com\/nicekj2024\/wp\/v2\/categories?post=1277"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.nicekj.com\/nicekj2024\/wp\/v2\/tags?post=1277"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.nicekj.com\/nicekj2024\/wp\/v2\/collection?post=1277"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}