feat: Text check before synchronization (#689)
* fix: icon

* fix: web selector

* fix: web selector

* perf: link sync

* dev doc

* chmod doc

* perf: git intro

* 466 intro

* intro img

* add json editor (#5)

* team limit

* websync limit

* json editor

* text editor

* perf: search test

* change cq value type

* doc

* intro img

---------

Co-authored-by: heheer <[email protected]>
c121914yu and newfish-cmyk authored Jan 4, 2024
1 parent c2abbb5 commit 8288290
Showing 64 changed files with 1,789 additions and 1,489 deletions.
Binary file modified .github/imgs/intro1.png
Binary file modified .github/imgs/intro2.png
Binary file modified .github/imgs/intro3.png
Binary file modified .github/imgs/intro4.png
6 changes: 6 additions & 0 deletions docSite/content/docs/development/intro.md
@@ -71,6 +71,8 @@ git clone [email protected]:<github_username>/FastGPT.git
### 5. Run

```bash
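# Grant execute permission to the scripts
chmod -R +x ./scripts/
# Run in the repository root; installs all dependencies of the root package, projects and packages
pnpm i
# Switch to the app directory
```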
@@ -105,6 +107,10 @@ docker build -t dockername/fastgpt:tag --build-arg name=app --build-arg proxy=ta
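1. If you are connecting to a remote database, first check that the corresponding port is open.
2. If the database runs locally, try changing `host` to `localhost` or `127.0.0.1`.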

### sh ./scripts/postinstall.sh: permission denied

After `pnpm i`, FastGPT runs the `postinstall` script, which automatically generates the `ChakraUI` types. If the script lacks execute permission, first run `chmod -R +x ./scripts/`, then run `pnpm i` again.

### Join the community

Stuck, or have questions? Join the WeChat group to stay in touch with the developers and other users.
16 changes: 16 additions & 0 deletions docSite/content/docs/development/qa.md
@@ -31,6 +31,10 @@ No channel is configured for this model in OneAPI.

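The web page uses `stream=true` mode, so the API should also be tested with `stream=true`. Some model APIs (mostly Chinese domestic ones) have rather poor compatibility in non-stream mode.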

### Incorrect API key provided: sk-xxxx.You can find your api Key at xxx

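The OneAPI API key is misconfigured. Update the `OPENAI_API_KEY` environment variable and restart the container (first `stop` it, then `rm` it, and finally run `up -d` again). You can `exec` into the container and run `env` to check whether the variable has taken effect.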

## Docker deployment FAQ

### How do I update?
@@ -87,3 +91,15 @@ The PG database failed to connect or initialize; check the logs. FastGPT will
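Mongo connection failed. Check:
1. Whether the mongo service has started. Some CPUs do not support AVX and cannot run mongo5; switch to mongo 4.x instead (pick the latest 4.x tag on Docker Hub, change the image version, and run again).
2. The environment variables (username and password; pay attention to host and port).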

## Local development issues

### TypeError: Cannot read properties of null (reading 'useMemo')

Try Node 18; the latest Node version may have problems. Local development workflow:

1. In the repository root: `pnpm i`
2. Copy `config.json` -> `config.local.json`
3. Copy `.env.template` -> `.env.local`
4. `cd projects/app`
5. `pnpm dev`
13 changes: 8 additions & 5 deletions docSite/content/docs/development/upgrading/466.md
@@ -13,8 +13,11 @@ weight: 830

## V4.6.6 release notes

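1. New - Search modes: vector semantic retrieval, full-text retrieval and re-ranking are now separate, and their results are merged and ordered with RRF.
2. Improved - Question classification prompt, now with id guidance. Tested with domestic commercial API models (Baidu, Alibaba, Zhipu, iFlytek); all of them can classify in Prompt mode.
3. UI improvements; the new UI design will gradually replace the old one.
4. Code improvements: icons extracted and loaded automatically.
5. See the [FastGPT 2024 RoadMap](https://github.com/labring/FastGPT?tab=readme-ov-file#-%E5%9C%A8%E7%BA%BF%E4%BD%BF%E7%94%A8)
1. See the [FastGPT 2024 RoadMap](https://github.com/labring/FastGPT?tab=readme-ov-file#-%E5%9C%A8%E7%BA%BF%E4%BD%BF%E7%94%A8)
2. New - HTTP module request headers support a JSON editor.
3. New - [ReRank model deployment](/docs/development/custom-models/reranker/)
4. New - Search modes: vector semantic retrieval, full-text retrieval and re-ranking are now separate, and their results are merged and ordered with RRF.
5. Improved - Question classification prompt, now with id guidance. Tested with domestic commercial API models (Baidu, Alibaba, Zhipu, iFlytek); all of them can classify in Prompt mode.
6. UI improvements; the new UI design will gradually replace the old one.
7. Code improvements: icons extracted and loaded automatically.
8. Fixed - Datasets created from links did not save the selector, so the selector was not applied during sync.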
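The release notes say the separate vector, full-text and re-rank result lists are merged with RRF (Reciprocal Rank Fusion), which scores each item as the sum of `1 / (k + rank)` over every list it appears in. A minimal sketch of that formula follows; `rrfMerge` is a hypothetical name, and `k = 60` is the constant commonly used in the RRF literature, assumed here rather than taken from FastGPT's code.

```ts
// Hypothetical sketch of Reciprocal Rank Fusion (RRF); not FastGPT's implementation.
// Each input list holds item ids ordered best-first; ranks are 1-based.
function rrfMerge(lists: string[][], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();

  for (const list of lists) {
    list.forEach((id, index) => {
      const rank = index + 1;
      // Every list an item appears in contributes 1 / (k + rank).
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }

  // Highest fused score first.
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Usage: fuse an (assumed) vector-search ranking with a full-text ranking.
const fused = rrfMerge([
  ['a', 'b', 'c'], // vector search order
  ['c', 'a', 'd'] // full-text search order
]);
console.log(fused.map((r) => r.id)); // order: a, c, b, d
```

Because only ranks enter the formula, the raw scores of the different retrievers never need to be normalized against each other.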
1 change: 1 addition & 0 deletions packages/global/common/file/api.d.ts
@@ -12,4 +12,5 @@ export type UrlFetchParams = {
export type UrlFetchResponse = {
url: string;
content: string;
selector?: string;
}[];
2 changes: 1 addition & 1 deletion packages/global/common/string/tiktoken/index.ts
@@ -35,7 +35,7 @@ export function countPromptTokens(
const text = `${role}\n${prompt}`;
try {
const encodeText = enc.encode(text);
return encodeText.length + 3; // add an estimated token count for the role
return encodeText.length + role.length; // add an estimated token count for the role
} catch (error) {
return text.length;
}
3 changes: 2 additions & 1 deletion packages/global/common/system/types/index.d.ts
@@ -44,7 +44,8 @@ export type FastGPTFeConfigsType = {
google?: string;
};
limit?: {
exportLimitMinutes?: number;
exportDatasetLimitMinutes?: number;
websiteSyncLimitMinuted?: number;
};
scripts?: { [key: string]: string }[];
favicon?: string;
29 changes: 25 additions & 4 deletions packages/global/core/dataset/constant.ts
@@ -73,6 +73,19 @@ export const DatasetCollectionTrainingTypeMap = {
}
};

export enum DatasetCollectionSyncResultEnum {
sameRaw = 'sameRaw',
success = 'success'
}
export const DatasetCollectionSyncResultMap = {
[DatasetCollectionSyncResultEnum.sameRaw]: {
label: 'core.dataset.collection.sync.result.sameRaw'
},
[DatasetCollectionSyncResultEnum.success]: {
label: 'core.dataset.collection.sync.result.success'
}
};

/* ------------ data -------------- */
export enum DatasetDataIndexTypeEnum {
chunk = 'chunk',
@@ -150,16 +163,24 @@ export enum SearchScoreTypeEnum {
}
export const SearchScoreTypeMap = {
[SearchScoreTypeEnum.embedding]: {
label: 'core.dataset.search.score.embedding'
label: 'core.dataset.search.score.embedding',
desc: 'core.dataset.search.score.embedding desc',
showScore: true
},
[SearchScoreTypeEnum.fullText]: {
label: 'core.dataset.search.score.fullText'
label: 'core.dataset.search.score.fullText',
desc: 'core.dataset.search.score.fullText desc',
showScore: false
},
[SearchScoreTypeEnum.reRank]: {
label: 'core.dataset.search.score.reRank'
label: 'core.dataset.search.score.reRank',
desc: 'core.dataset.search.score.reRank desc',
showScore: true
},
[SearchScoreTypeEnum.rrf]: {
label: 'core.dataset.search.score.rrf'
label: 'core.dataset.search.score.rrf',
desc: 'core.dataset.search.score.rrf desc',
showScore: false
}
};
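The new `showScore` flag marks which score types are meant to be displayed as numbers (`embedding` and `reRank`) and which are not (`fullText` and `rrf`). A minimal sketch of how a consumer could apply it; the enum's string values are not visible in this diff and are assumed, and `ScoreItem` and `visibleScores` are hypothetical:

```ts
// Local copy of the enum; the string values are assumed (the diff does not show the enum body).
enum SearchScoreTypeEnum {
  embedding = 'embedding',
  fullText = 'fullText',
  reRank = 'reRank',
  rrf = 'rrf'
}

// Reduced copy of SearchScoreTypeMap from the diff (desc omitted).
const SearchScoreTypeMap: Record<SearchScoreTypeEnum, { label: string; showScore: boolean }> = {
  [SearchScoreTypeEnum.embedding]: { label: 'core.dataset.search.score.embedding', showScore: true },
  [SearchScoreTypeEnum.fullText]: { label: 'core.dataset.search.score.fullText', showScore: false },
  [SearchScoreTypeEnum.reRank]: { label: 'core.dataset.search.score.reRank', showScore: true },
  [SearchScoreTypeEnum.rrf]: { label: 'core.dataset.search.score.rrf', showScore: false }
};

// Hypothetical score entry attached to a search result.
type ScoreItem = { type: SearchScoreTypeEnum; value: number };

// Keep only the scores whose type is flagged for numeric display.
function visibleScores(scores: ScoreItem[]): ScoreItem[] {
  return scores.filter((s) => SearchScoreTypeMap[s.type].showScore);
}

const shown = visibleScores([
  { type: SearchScoreTypeEnum.embedding, value: 0.82 },
  { type: SearchScoreTypeEnum.rrf, value: 0.031 },
  { type: SearchScoreTypeEnum.reRank, value: 0.67 }
]);
console.log(shown.map((s) => s.type));
```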

5 changes: 4 additions & 1 deletion packages/global/core/dataset/type.d.ts
@@ -49,7 +49,10 @@ export type DatasetCollectionSchemaType = {
qaPrompt?: string;
rawTextLength?: number;
hashRawText?: string;
metadata?: Record<string, any>;
metadata?: {
webPageSelector?: string;
[key: string]: any;
};
};

export type DatasetDataIndexItemType = {
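The commit title, "Text check before synchronization", together with the stored `hashRawText` and the new `DatasetCollectionSyncResultEnum` (`sameRaw` / `success`), points at the idea: hash the freshly fetched text, compare it with the stored hash, and skip re-processing when nothing changed. The sync handler itself is not part of this diff, so this is a minimal sketch under that assumption; SHA-256 as the hash and the names `sha256Hex` and `checkSyncNeeded` are hypothetical.

```ts
import { createHash } from 'node:crypto';

// Mirrors the enum added in packages/global/core/dataset/constant.ts.
enum DatasetCollectionSyncResultEnum {
  sameRaw = 'sameRaw',
  success = 'success'
}

// Assumption: the algorithm FastGPT uses for hashRawText is not shown in this diff.
function sha256Hex(text: string): string {
  return createHash('sha256').update(text).digest('hex');
}

type CollectionLike = { hashRawText?: string };

// Compare the new raw text with the stored hash before doing any expensive work.
function checkSyncNeeded(
  collection: CollectionLike,
  newRawText: string
): { result: DatasetCollectionSyncResultEnum; hash: string } {
  const hash = sha256Hex(newRawText);
  if (collection.hashRawText === hash) {
    // Unchanged content: nothing to re-split or re-embed.
    return { result: DatasetCollectionSyncResultEnum.sameRaw, hash };
  }
  // Changed (or never hashed): the caller would rebuild the collection and store `hash`.
  return { result: DatasetCollectionSyncResultEnum.success, hash };
}

const stored = { hashRawText: sha256Hex('hello world') };
console.log(checkSyncNeeded(stored, 'hello world').result); // sameRaw
console.log(checkSyncNeeded(stored, 'hello world!').result); // success
```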
3 changes: 3 additions & 0 deletions packages/global/core/module/node/constant.ts
@@ -7,7 +7,10 @@ export enum FlowNodeInputTypeEnum {
slider = 'slider',
target = 'target', // data input
switch = 'switch',

// editor
textarea = 'textarea',
JSONEditor = 'JSONEditor',

addInputParam = 'addInputParam', // params input

2 changes: 1 addition & 1 deletion packages/global/core/module/template/system/http.ts
@@ -55,7 +55,7 @@ export const HttpModule: FlowModuleTemplateType = {
},
{
key: ModuleInputKeyEnum.httpHeader,
type: FlowNodeInputTypeEnum.textarea,
type: FlowNodeInputTypeEnum.JSONEditor,
valueType: ModuleIOValueTypeEnum.string,
label: 'core.module.input.label.Http Request Header',
description: 'core.module.input.description.Http Request Header',
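With `httpHeader` switched from a `textarea` to the `JSONEditor` input type while its `valueType` stays `string`, the header value still arrives as a string that has to be parsed. A defensive sketch, assuming the editor content is a JSON object mapping header names to values; `parseHttpHeader` is a hypothetical helper, not code from this commit:

```ts
// Hypothetical helper: turn the JSONEditor string into a headers object.
// Empty input means "no custom headers"; invalid or non-object JSON is rejected.
function parseHttpHeader(raw: string | undefined): Record<string, string> {
  if (!raw || raw.trim() === '') return {};

  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    throw new Error('Http request header must be valid JSON');
  }

  if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
    throw new Error('Http request header must be a JSON object');
  }

  // Header values are sent as strings, so non-string values are coerced with String().
  const headers: Record<string, string> = {};
  for (const [key, value] of Object.entries(parsed as Record<string, unknown>)) {
    headers[key] = String(value);
  }
  return headers;
}

console.log(parseHttpHeader('{"Authorization": "Bearer xxx", "X-Retry": 3}'));
```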
4 changes: 4 additions & 0 deletions packages/global/support/user/team/type.d.ts
@@ -10,6 +10,10 @@ export type TeamSchema = {
balance: number;
maxSize: number;
lastDatasetBillTime: Date;
limit: {
lastExportDatasetTime: Date;
lastWebsiteSyncTime: Date;
};
};

export type TeamMemberSchema = {
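The team-level `limit.lastWebsiteSyncTime` field and the `websiteSyncLimitMinuted` front-end config (spelled that way in the diff) suggest a cooldown between website syncs. The check that enforces it is not in this diff; a minimal sketch, assuming the limit is expressed in minutes and that a missing limit or missing previous sync means no restriction. `canSyncWebsite` is hypothetical:

```ts
// Field names follow this diff (including the `Minuted` spelling); optionality is an assumption.
type TeamLimit = { lastExportDatasetTime?: Date; lastWebsiteSyncTime?: Date };

// Hypothetical cooldown check between two website syncs.
function canSyncWebsite(
  limit: TeamLimit | undefined,
  websiteSyncLimitMinuted: number | undefined,
  now: Date = new Date()
): { allowed: boolean; waitMinutes: number } {
  // No configured limit, or no previous sync: allow immediately.
  if (!websiteSyncLimitMinuted || !limit?.lastWebsiteSyncTime) {
    return { allowed: true, waitMinutes: 0 };
  }
  const elapsedMs = now.getTime() - limit.lastWebsiteSyncTime.getTime();
  const limitMs = websiteSyncLimitMinuted * 60 * 1000;
  if (elapsedMs >= limitMs) return { allowed: true, waitMinutes: 0 };
  // Round up so a blocked caller is never told to wait 0 minutes.
  return { allowed: false, waitMinutes: Math.ceil((limitMs - elapsedMs) / 60000) };
}

const last = new Date('2024-01-04T10:00:00Z');
console.log(canSyncWebsite({ lastWebsiteSyncTime: last }, 30, new Date('2024-01-04T10:10:00Z')));
// { allowed: false, waitMinutes: 20 }
```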
4 changes: 0 additions & 4 deletions packages/global/support/user/type.d.ts
@@ -17,10 +17,6 @@ export type UserModelSchema = {
key: string;
baseUrl: string;
};
limit: {
exportKbTime?: Date;
datasetMaxCount?: number;
};
};

export type UserType = {
28 changes: 16 additions & 12 deletions packages/service/common/string/cheerio.ts
@@ -15,7 +15,8 @@ export const cheerioToHtml = ({
// get origin url
const originUrl = new URL(fetchUrl).origin;

const selectDom = $(selector || 'body');
const usedSelector = selector || 'body';
const selectDom = $(usedSelector);

// remove <i> and <script> elements
selectDom.find('i,script').remove();
@@ -49,7 +50,10 @@ export const cheerioToHtml = ({
.get()
.join('\n');

return html;
return {
html,
usedSelector
};
};
export const urlsFetch = async ({
urlList,
@@ -66,25 +70,25 @@ export const urlsFetch = async ({
});

const $ = cheerio.load(fetchRes.data);

const md = await htmlToMarkdown(
cheerioToHtml({
fetchUrl: url,
$,
selector
})
);
const { html, usedSelector } = cheerioToHtml({
fetchUrl: url,
$,
selector
});
const md = await htmlToMarkdown(html);

return {
url,
content: md
content: md,
selector: usedSelector
};
} catch (error) {
console.log(error, 'fetch error');

return {
url,
content: ''
content: '',
selector: ''
};
}
})
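`cheerioToHtml` now reports the selector it actually used (falling back to `body`), and `urlsFetch` returns it next to the content. Combined with the new `metadata.webPageSelector` field, this is what the release-note fix relies on: store the selector when a link collection is created and reuse it on sync. A minimal sketch of that round trip; `toCollectionMetadata` and `selectorForSync` are hypothetical names, and treating an empty selector (the fetch-error case) as unset is an assumption:

```ts
// Shape from packages/global/common/file/api.d.ts in this diff.
type UrlFetchResponse = {
  url: string;
  content: string;
  selector?: string;
}[];

type CollectionMetadata = { webPageSelector?: string; [key: string]: any };

// On create: remember which selector produced the content.
function toCollectionMetadata(item: UrlFetchResponse[number]): CollectionMetadata {
  // urlsFetch returns selector: '' when the fetch failed, so that value is not persisted.
  return item.selector ? { webPageSelector: item.selector } : {};
}

// On sync: reuse the stored selector; undefined lets cheerioToHtml fall back to 'body'.
function selectorForSync(metadata?: CollectionMetadata): string | undefined {
  return metadata?.webPageSelector || undefined;
}

const [fetched]: UrlFetchResponse = [
  { url: 'https://example.com/docs', content: '# Docs', selector: '.markdown-body' }
];
const savedMetadata = toCollectionMetadata(fetched);
console.log(selectorForSync(savedMetadata)); // .markdown-body
```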
3 changes: 3 additions & 0 deletions packages/service/common/string/markdown.ts
@@ -21,6 +21,9 @@ export const htmlToMarkdown = (html?: string | null) =>
worker.terminate();
reject(err);
});
worker.on('exit', (code) => {
console.log('html 2 md finish', code);
});

worker.postMessage(html);
});
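The new `exit` listener only logs the exit code. In general, if a worker can exit without posting a message or emitting `error`, a promise wrapped around it stays pending unless `exit` also settles it. A hedged sketch of a one-shot wrapper that handles all three events; `runWorkerOnce` and the inline worker are illustrative, not FastGPT code:

```ts
import { Worker } from 'node:worker_threads';

// Hypothetical one-shot wrapper, not FastGPT's htmlToMarkdown.
// Settles on the first 'message', 'error' or 'exit', whichever comes first.
function runWorkerOnce<TIn, TOut>(source: string, input: TIn): Promise<TOut> {
  return new Promise<TOut>((resolve, reject) => {
    // eval: true treats `source` as script code instead of a file path.
    const worker = new Worker(source, { eval: true });
    let settled = false;

    worker.on('message', (result: TOut) => {
      settled = true;
      resolve(result);
      void worker.terminate(); // terminating fires 'exit', which is ignored below
    });

    worker.on('error', (err: Error) => {
      settled = true;
      reject(err);
    });

    worker.on('exit', (code: number) => {
      // Only an exit before any reply counts as a failure.
      if (!settled) {
        settled = true;
        reject(new Error(`worker exited with code ${code} before replying`));
      }
    });

    worker.postMessage(input);
  });
}

// Usage: an inline worker that upper-cases whatever it receives.
const upperWorker = `
  const { parentPort } = require('worker_threads');
  parentPort.on('message', (m) => parentPort.postMessage(String(m).toUpperCase()));
`;
runWorkerOnce<string, string>(upperWorker, 'hello').then((out) => console.log(out)); // HELLO
```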
10 changes: 6 additions & 4 deletions packages/service/core/dataset/collection/controller.ts
@@ -19,14 +19,16 @@ export async function createOneCollection({
qaPrompt,
hashRawText,
rawTextLength,
metadata = {}
}: CreateDatasetCollectionParams & { teamId: string; tmbId: string }) {
metadata = {},
...props
}: CreateDatasetCollectionParams & { teamId: string; tmbId: string; [key: string]: any }) {
const { _id } = await MongoDatasetCollection.create({
name,
...props,
teamId,
tmbId,
datasetId,
parentId: parentId || null,
datasetId,
name,
type,
trainingType,
chunkSize,
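`createOneCollection` now accepts arbitrary extra fields (`[key: string]: any`): the rest pattern `...props` collects every key not named in the parameter list and forwards it to `MongoDatasetCollection.create`, placed before the named fields so those would win on a clash. A small sketch of that pattern; `buildCollectionDoc` stands in for the `create` argument and `rawLink` is a hypothetical extra field:

```ts
// Hypothetical stand-in for createOneCollection's parameter handling.
type CreateParams = {
  name: string;
  datasetId: string;
  parentId?: string;
  metadata?: Record<string, any>;
  [key: string]: any; // extra fields are allowed and forwarded
};

function buildCollectionDoc({ name, datasetId, parentId, metadata = {}, ...props }: CreateParams) {
  // `props` holds only the keys not destructured above.
  return {
    ...props,
    datasetId,
    parentId: parentId || null,
    name,
    metadata
  };
}

const doc = buildCollectionDoc({ name: 'docs', datasetId: 'ds1', rawLink: 'https://example.com' });
console.log(doc);
```

The design keeps the controller open to new collection fields without changing its signature each time, at the cost of the type checker no longer catching misspelled keys.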
1 change: 1 addition & 0 deletions packages/service/core/dataset/collection/schema.ts
@@ -75,6 +75,7 @@ const DatasetCollectionSchema = new Schema({
qaPrompt: {
type: String
},

rawTextLength: {
type: Number
},