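Content Extraction Service

Service short code: HNLP

Default port: 8230

Default route: /hnlp/**

Component code: hzero-nlp

Introduction

1.1 Overview

Extracts content from natural-language text.

1.2 Component coordinates

OP edition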

<dependency>
    <groupId>org.hzero</groupId>
    <artifactId>hzero-nlp</artifactId>
    <version>${hzero.service.version}</version>
</dependency>

SaaS edition

<dependency>
    <groupId>org.hzero</groupId>
    <artifactId>hzero-nlp-saas</artifactId>
    <version>${hzero.service.version}</version>
</dependency>

1.3 Main features

Basic data management
Template management
Word mapping
Content extraction testing

1.4 Service configuration parameters

# Text recognition endpoint
nlp.python.url: ${NLP_PYTHON_URL:http://python.hzero.org:5000/text_extract}
# Cache eviction endpoint
nlp.python.evict: ${NLP_PYTHON_EVICT:http://python.hzero.org:5000/cache_evict}
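
Both values use the ${NAME:default} placeholder form, so an environment variable such as NLP_PYTHON_URL overrides the default URL when it is set. The following is a minimal sketch of that resolution rule, assuming the usual placeholder semantics; the class and method names are hypothetical and are not part of hzero-nlp.

```java
import java.util.Map;

// Illustrative only: mimics how a "${NAME:default}" placeholder is resolved.
// PlaceholderSketch and resolve() are hypothetical names, not hzero-nlp API.
class PlaceholderSketch {

    /** Resolves "${NAME:default}" against env; other strings are returned unchanged. */
    static String resolve(String value, Map<String, String> env) {
        if (!value.startsWith("${") || !value.endsWith("}")) {
            return value; // not a placeholder
        }
        String inner = value.substring(2, value.length() - 1);
        int colon = inner.indexOf(':'); // the first colon separates name and default
        if (colon < 0) {
            return env.get(inner); // no default given; may be null
        }
        String name = inner.substring(0, colon);
        String fallback = inner.substring(colon + 1);
        return env.getOrDefault(name, fallback);
    }

    public static void main(String[] args) {
        String raw = "${NLP_PYTHON_URL:http://python.hzero.org:5000/text_extract}";
        // Prints the default URL unless NLP_PYTHON_URL is set in the environment.
        System.out.println(resolve(raw, System.getenv()));
    }
}
```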

Deployment guide

2.1 CoreNLP

Hardware requirements:

4 GB RAM or more
2 CPU cores or more

Operating system:

A Linux distribution such as CentOS or Ubuntu
Docker

Deployment command:

docker run --name=corenlp -p 9000:9000 -d registry.saas.hand-china.com/hzero/corenlp:1.0

Listening port inside the container: tcp/9000

2.2 Python NLP

Hardware requirements:

4 GB RAM or more
4 CPU cores or more

Operating system:

A Linux distribution such as CentOS or Ubuntu
Docker

Deployment command:

docker run --name=nlp-worker \
  -p 5000:5000 \
  -d -e CORE_NLP_HOST=http://192.168.11.167 \
  registry.choerodon.com.cn/hzero-hzero/hzero-nlp-worker:0.10.1.RELEASE

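Use -p to set the externally exposed port and -e to set environment variables; the available variables are listed in the environment variable table below.
Listening port inside the container: tcp/5000 (can also be set with the WSGI_BIND variable)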

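2.3 Environment variables

Variable	Description	Example
WSGI_WORKERS	Number of worker threads for the WSGI web server; should not exceed twice the number of CPU cores on the server	4
WSGI_BIND	Address and port the web service listens on	0.0.0.0:5000
CACHE_NUM	Number of in-memory cache entries used during text recognition; adjust according to the server's memory size	60000
CACHE_TTL	Expiry time of in-memory cache entries, in seconds	86400
CORE_NLP_HOST	Host address of the CoreNLP service	http://localhost
CORE_NLP_PORT	Port of the CoreNLP service	9000
MONGO_URL	MongoDB connection URL	mongodb://user:password@172.20.0.201:27017
MONGO_DB	Name of the MongoDB database	hzero_nlp
REDIS_HOST	Host name or IP of the Redis service	redis.hzero.org
REDIS_PORT	Redis port	6379
REDIS_DB	Redis database index	1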
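
To show how these variables combine with the Python NLP deployment command, here is a minimal sketch that assembles the docker run arguments from a map of environment variables. The class and method names are hypothetical; the container name, the container port 5000, and the image name are taken from section 2.2.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative helper (hypothetical names): builds the "docker run" argument list
// for the nlp-worker container, adding one "-e NAME=value" pair per variable.
class DockerRunSketch {

    static List<String> buildCommand(String image, int hostPort, Map<String, String> env) {
        List<String> cmd = new ArrayList<>();
        cmd.add("docker");
        cmd.add("run");
        cmd.add("--name=nlp-worker");
        cmd.add("-p");
        cmd.add(hostPort + ":5000"); // the container listens on tcp/5000 by default
        cmd.add("-d");
        for (Map.Entry<String, String> e : env.entrySet()) {
            cmd.add("-e");
            cmd.add(e.getKey() + "=" + e.getValue());
        }
        cmd.add(image);
        return cmd;
    }

    public static void main(String[] args) {
        Map<String, String> env = new LinkedHashMap<>(); // keeps insertion order
        env.put("CORE_NLP_HOST", "http://192.168.11.167");
        env.put("CORE_NLP_PORT", "9000");
        env.put("WSGI_WORKERS", "4");
        String image = "registry.choerodon.com.cn/hzero-hzero/hzero-nlp-worker:0.10.1.RELEASE";
        System.out.println(String.join(" ", buildCommand(image, 5000, env)));
    }
}
```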

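Development guide

3.1 API usage

This section describes how to use the three interfaces required for NLP recognition.

3.2 OAuth token endpoint

The following snippet sends the request with OkHttp.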

import okhttp3.MediaType;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.RequestBody;
import okhttp3.Response;

OkHttpClient client = new OkHttpClient();

MediaType mediaType = MediaType.parse("application/x-www-form-urlencoded");
// Parameters are passed as a URL-encoded form
RequestBody body = RequestBody.create(mediaType, "grant_type=password&username=77572307&password=YWRtaW4xMjM%3D&scope=default");
Request request = new Request.Builder()
  // Request URL; replace host and port to match your environment
  .url("http://host:port/oauth/oauth/token")
  .post(body)
  .addHeader("Content-Type", "application/x-www-form-urlencoded")
  // clientId and secret of the OAuth client, sent as Basic authentication
  .addHeader("Authorization", "Basic Y2xpZW50OnNlY3JldA==")
  .build();

Response response = client.newCall(request).execute();

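The OAuth client's clientId and secret are sent using Basic authentication. The credential has the form client:secret, separated by an ASCII colon; Base64-encoding that string yields Y2xpZW50OnNlY3JldA==.
Form parameters:

grant_type=password: fixed value
username=xxxxx: the user name, set to match your environment
password=xxxxx: the Base64-encoded password (URL-encoded in the form body, so a trailing = becomes %3D)
scope=default: fixed value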
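
The Authorization header and the form body can be derived with the JDK alone. The following is a minimal sketch; the class and method names are hypothetical, and the plain-text password admin123 is simply what the sample value YWRtaW4xMjM= decodes to.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative helper (hypothetical names): builds the Basic header and the
// form body used by the token request.
class TokenRequestSketch {

    static String base64(String s) {
        return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    /** "Basic " + Base64("clientId:secret") */
    static String basicAuthHeader(String clientId, String secret) {
        return "Basic " + base64(clientId + ":" + secret);
    }

    /** Form body with the password Base64-encoded and every value URL-encoded. */
    static String formBody(String username, String plainPassword) {
        return "grant_type=password"
            + "&username=" + URLEncoder.encode(username, StandardCharsets.UTF_8)
            + "&password=" + URLEncoder.encode(base64(plainPassword), StandardCharsets.UTF_8)
            + "&scope=default";
    }

    public static void main(String[] args) {
        System.out.println(basicAuthHeader("client", "secret"));
        System.out.println(formBody("77572307", "admin123"));
    }
}
```

With the sample values, these helpers reproduce the header and form body used in the OkHttp snippet.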

The response looks like this:

{
    "access_token": "63c13913-8e24-4fbf-8925-92f0f009e661",
    "token_type": "bearer",
    "refresh_token": "55d951d2-d62c-4fdb-ad2c-688416da71ed",
    "expires_in": 71192,
    "scope": "default"
}

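The access_token in the response is the authorization token used in subsequent requests.

3.3 Recognition request endpoint

The following snippet sends the request with OkHttp.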
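
As a rough illustration of reading the token out of the response, the sketch below extracts access_token with a regular expression so that it needs no extra dependency. This is an assumption made for brevity: real code would normally parse the body with a JSON library, and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only (hypothetical names): pulls access_token out of the token
// response and turns it into the Authorization header value.
class TokenResponseSketch {

    private static final Pattern ACCESS_TOKEN =
        Pattern.compile("\"access_token\"\\s*:\\s*\"([^\"]+)\"");

    static String extractAccessToken(String json) {
        Matcher m = ACCESS_TOKEN.matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("no access_token field in response");
        }
        return m.group(1);
    }

    /** Value for the Authorization header of later requests. */
    static String bearerHeader(String json) {
        return "Bearer " + extractAccessToken(json);
    }

    public static void main(String[] args) {
        String json = "{\"access_token\": \"63c13913-8e24-4fbf-8925-92f0f009e661\", \"token_type\": \"bearer\"}";
        System.out.println(bearerHeader(json));
    }
}
```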

OkHttpClient client = new OkHttpClient();

MediaType mediaType = MediaType.parse("application/json");
RequestBody body = RequestBody.create(mediaType, "{\n    \"text\": \"我明天要去北京\",\n    \"templateCode\": \"test-2019-06-24\",\n    \"context\": [\n      {\"contextKey\":\"1\",\"contextType\":\"select_tenant\"}\n    ]\n  }");
Request request = new Request.Builder()
  .url("http://host:port/hnlp/v1/0/text-extract/do")
  .post(body)
  .addHeader("Content-Type", "application/json")
  // Set the token
  .addHeader("Authorization", "Bearer 67c1b371-d6c4-4812-a853-59603f40b6e4")
  .build();

Response response = client.newCall(request).execute();

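Request path: http://host:port/hnlp/v1/0/text-extract/do
Here 0 is the user's tenant ID; host and port must be set to match your environment. The request headers must include the Bearer token.
Request body format (the # comments are annotations and are not part of the JSON):

{
    "text": "我明天要去北京", # text to recognize ("I am going to Beijing tomorrow")
    "templateCode": "test-2019-06-24", # code of the template used for recognition
    "context": [
      {"contextKey":"1","contextType":"select_tenant"} # context conditions used to look up basic data during recognition;
                                                       # a basic data record is used only when all of its contexts are satisfied
    ]
}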
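
The request path and body can be assembled from the tenant ID, text, and template code. The sketch below is a minimal illustration with hypothetical class and method names; the escape helper handles only quotes and backslashes, so a JSON library is the safer choice for arbitrary input.

```java
// Illustrative only (hypothetical names): builds the recognition URL and a
// request body with a single context entry.
class TextExtractRequestSketch {

    static String url(String host, int port, long tenantId) {
        return "http://" + host + ":" + port + "/hnlp/v1/" + tenantId + "/text-extract/do";
    }

    /** Minimal JSON string escaping: backslashes and double quotes only. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    static String body(String text, String templateCode, String contextKey, String contextType) {
        return "{\"text\":\"" + escape(text) + "\","
            + "\"templateCode\":\"" + escape(templateCode) + "\","
            + "\"context\":[{\"contextKey\":\"" + escape(contextKey) + "\","
            + "\"contextType\":\"" + escape(contextType) + "\"}]}";
    }

    public static void main(String[] args) {
        // Tenant 0 and the template code come from the example above; the text is a placeholder.
        System.out.println(url("localhost", 8080, 0));
        System.out.println(body("sample text", "test-2019-06-24", "1", "select_tenant"));
    }
}
```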

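Backend services can use the NLP client to perform content extraction.

3.4 Basic data synchronization endpoint

Request path: http://host:port/hnlp/v1/0/basic-datas/send
Request method: POST
Here 0 is the tenant ID to use; replace host and port to match your environment.
Request headers: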

header	value
Content-Type	application/json
Authorization	Bearer xxxxxx(token)

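Parameters are passed as JSON (the # comments are annotations and are not part of the JSON).
Body:

{
  "action": "UPSERT", # either UPSERT or DELETE, meaning update and delete respectively
  "context": [
    {
      "contextKey": "key",
      "contextType": "type" # context constraint
    }
  ],
  "dataKey": "string", # code of the basic data record
  "dataType": "string", # data type
  "id": "string", # data ID
  "value": "string" # value used for recognition
}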

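Basic data uses id-dataType-tenantId as its unique index; when a record that already exists is submitted, it overwrites the existing record.

3.5 Basic data scheduling endpoint

Request path: http://host:port/v1/0/basic-datas/scheduling
Request method: POST
Here 0 is the tenant ID to use; replace host and port to match your environment.
This feature must be used together with the interface platform: define the target interface on the interface platform.
The target interface's response must have the following structure: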
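
The overwrite behaviour can be pictured as a map keyed by id-dataType-tenantId. The sketch below is only an in-memory illustration of that rule, not the service's storage code; the class and method names, and the sample dataType "city", are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative in-memory model (hypothetical names): one record per
// id-dataType-tenantId key, so UPSERT on an existing key replaces the value.
class BasicDataStoreSketch {

    private final Map<String, String> valuesByKey = new HashMap<>();

    static String uniqueKey(String id, String dataType, long tenantId) {
        return id + "-" + dataType + "-" + tenantId;
    }

    void apply(String action, String id, String dataType, long tenantId, String value) {
        String key = uniqueKey(id, dataType, tenantId);
        switch (action) {
            case "UPSERT":
                valuesByKey.put(key, value); // inserts, or overwrites an existing record
                break;
            case "DELETE":
                valuesByKey.remove(key);
                break;
            default:
                throw new IllegalArgumentException("unknown action: " + action);
        }
    }

    String get(String id, String dataType, long tenantId) {
        return valuesByKey.get(uniqueKey(id, dataType, tenantId));
    }

    int size() {
        return valuesByKey.size();
    }

    public static void main(String[] args) {
        BasicDataStoreSketch store = new BasicDataStoreSketch();
        store.apply("UPSERT", "1", "city", 0L, "Beijing");
        store.apply("UPSERT", "1", "city", 0L, "Peking"); // same key, value replaced
        System.out.println(store.size() + " record(s), value = " + store.get("1", "city", 0L));
    }
}
```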

[
  {
    "context": [
      {
        "contextKey": "string",
        "contextType": "string"
      }
    ],
    "dataKey": "string",
    "dataType": "string",
    "id": "string",
    "value": "string"
  }
]
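
Before registering a target interface, it can help to check that its response matches this shape. The following is a minimal sketch over already-parsed JSON (lists and maps); it assumes every field shown above is required and is a string, which the source does not state explicitly, and the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;

// Illustrative validator (hypothetical names) for a parsed response:
// a list of objects, each with string fields and a "context" list.
class SchedulingResponseSketch {

    private static final List<String> ITEM_FIELDS = List.of("dataKey", "dataType", "id", "value");
    private static final List<String> CONTEXT_FIELDS = List.of("contextKey", "contextType");

    static boolean hasStringFields(Map<?, ?> obj, List<String> names) {
        for (String name : names) {
            if (!(obj.get(name) instanceof String)) {
                return false;
            }
        }
        return true;
    }

    /** Assumption: all fields in the documented structure are mandatory strings. */
    static boolean isValid(List<?> items) {
        for (Object item : items) {
            if (!(item instanceof Map)) {
                return false;
            }
            Map<?, ?> record = (Map<?, ?>) item;
            if (!hasStringFields(record, ITEM_FIELDS)) {
                return false;
            }
            Object context = record.get("context");
            if (!(context instanceof List)) {
                return false;
            }
            for (Object c : (List<?>) context) {
                if (!(c instanceof Map) || !hasStringFields((Map<?, ?>) c, CONTEXT_FIELDS)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```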