release 1.8.5 #270

Merged
merged 1 commit into from
Sep 26, 2022
10 changes: 3 additions & 7 deletions README.cn.md
@@ -9,16 +9,12 @@

---

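**Vega ver1.8.4 released**
**Vega ver1.8.5 released**

- Bug fixes

- Fixed a failure when the ASHA algorithm updates data.
- Fixed the loss not being updated under HCCL+Apex.
- Added dictionary-type metrics.
- Updated the security configuration document.
- Removed support for Horovod and TensorFlow in security mode.
- Added the requirement for Python 3.9 or later in security mode.
- Fixed a failure of SPNAS cluster training.
- Fixed issues such as model copy failure in security mode.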

---

11 changes: 4 additions & 7 deletions README.md
@@ -8,16 +8,13 @@

---

**Vega ver1.8.4 released**
**Vega ver1.8.5 released**

- Bug fixes:

- Fixed a bug where ASHA failed to update data.
- Fixed a bug where the loss was not updated on HCCL+Apex.
- Added dictionary metrics.
- Updated the security configuration document.
- Horovod and TensorFlow are no longer allowed in security mode.
- Python 3.9 or later is now required in security mode.
- Fixed a bug where SPNAS cluster training failed.
- Fixed bugs such as model copy failure in security mode.


---

2 changes: 1 addition & 1 deletion RELEASE.md
@@ -1,4 +1,4 @@
**Vega ver1.8.4 released:**
**Vega ver1.8.5 released:**

**Introduction**

7 changes: 0 additions & 7 deletions docs/cn/user/ascend_910.md
@@ -137,10 +137,3 @@ pip3 install --user --no-deps noah-vega
```bash
pip3 show noah-vega
```

Note also that the dask and distributed packages must be installed at the following versions:

```bash
pip3 install --user distributed==2021.7.0
pip3 install --user dask==2021.7.0
```
29 changes: 28 additions & 1 deletion docs/cn/user/security_configure.md
@@ -17,6 +17,7 @@ Vega security configuration includes the following steps:
1. **Python 3.9 or later**
2. **dask and distributed version 2022.2.0**


## 1. Install OpenSSL

First install OpenSSL 1.1.1, either by compiling it from source or by installing a prebuilt release package.
@@ -87,7 +88,7 @@ openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out s
rm server.csr
```

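Run the following script to generate the encrypted private key for the certificate used by the evaluation service client. The command prompts for an encryption password. The strength requirements are the same as for the server-side private key, and the password must differ from the server segment private key password. Record the change password; it is needed later:
Run the following script to generate the encrypted private key for the certificate used by the evaluation service client. The command prompts for an encryption password. The strength requirements are the same as for the server-side private key, and the password must differ from the server-side private key password. Record this password; it is needed later: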

```shell
openssl genrsa -aes-256-ofb -out client.key 4096
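@@ -172,6 +173,17 @@ chmod 600 ~/.vega/*
1. The keys, certificates, and encryption materials above can also be stored in another directory. Set their access permission to `600` and update the file locations in the subsequent configuration files accordingly, using absolute paths.
2. On the training cluster, keep `ca.crt`, `client.key`, `client.crt`, `ksmaster_client.dat`, `ksstandby_client.dat`, `server_dask.key`, `server_dask.crt`, `client_dask.key`, and `client_dask.crt`, and delete the other files.
3. On the evaluation service, keep `ca.crt`, `server.key`, `server.crt`, `ksmaster_server.dat`, and `ksstandby_server.dat`, and delete the other files.
4. The default cipher suites are as follows: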

```txt
ECDHE-ECDSA-AES128-CCM:ECDHE-ECDSA-AES256-CCM:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:DHE-DSS-AES128-GCM-SHA256:DHE-DSS-AES256-GCM-SHA384:DHE-RSA-AES128-CCM:DHE-RSA-AES256-CCM
```

To narrow the selection, add the following setting to `client.ini` and `vega.ini`:

```ini
ciphers=ECDHE-ECDSA-AES128-CCM:ECDHE-ECDSA-AES256-CCM
```

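Create `server.ini` and `client.ini` in the `~/.vega` directory.

@@ -290,3 +302,18 @@ find ~/.local/ -name *.pem
### 9.5 Horovod and TensorFlow

In security mode, Vega supports neither Horovod data parallelism nor the TensorFlow framework. Before running, Vega checks for a Horovod data-parallel program or the TensorFlow framework and exits automatically if it finds either.

### 9.6 Restrict Distributed to the TLS 1.3 protocol

To restrict communication between the components of the open-source software Distributed to the TLS 1.3 protocol, configure `~/.config/dask/distributed.yaml`.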

distributed.yaml:

```yaml
distributed:
  comm:
    tls:
      min-version: 1.3
```

See the Dask [configuration guide](https://docs.dask.org/en/stable/configuration.html).
7 changes: 0 additions & 7 deletions docs/en/user/ascend_910.md
@@ -144,10 +144,3 @@ Run the following command to view the Vega dependency package:
```bash
pip3 show noah-vega
```

Note that the following versions must be installed for the dask and distributed packages:

```bash
pip3 install --user distributed==2021.7.0
pip3 install --user dask==2021.7.0
```
27 changes: 27 additions & 0 deletions docs/en/user/security_configure.md
@@ -17,6 +17,7 @@ requirements:
1. **Python 3.9 or later.**
2. **Dask and Distributed version is 2022.2.0.**


## 1. Install OpenSSL

Install OpenSSL 1.1.1 first, either by compiling it from source or by installing a prebuilt release package.
@@ -173,6 +174,17 @@ Description:
1. The preceding keys, certificates, and encryption materials can also be stored in other directories. Set their access permission to `600` and update the file locations in the subsequent configuration files, using absolute paths.
2. On the training cluster, keep `ca.crt`, `client.key`, `client.crt`, `ksmaster_client.dat`, `ksstandby_client.dat`, `server_dask.key`, `server_dask.crt`, `client_dask.key`, and `client_dask.crt`, and delete the other files.
3. On the evaluation service, keep `ca.crt`, `server.key`, `server.crt`, `ksmaster_server.dat`, and `ksstandby_server.dat`, and delete the other files.
4. The default cipher suites are as follows:

```txt
ECDHE-ECDSA-AES128-CCM:ECDHE-ECDSA-AES256-CCM:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:DHE-DSS-AES128-GCM-SHA256:DHE-DSS-AES256-GCM-SHA384:DHE-RSA-AES128-CCM:DHE-RSA-AES256-CCM
```

To narrow the selection, add the following setting to the `client.ini` and `vega.ini` files:

```ini
ciphers=ECDHE-ECDSA-AES128-CCM:ECDHE-ECDSA-AES256-CCM
```

Create `server.ini` and `client.ini` in the `~/.vega` directory.

@@ -292,3 +304,18 @@ Find the private key file of the open-source software on which Vega depends amon
### 9.5 Horovod and TensorFlow

In security mode, Vega does not support Horovod or the TensorFlow framework. Vega exits automatically if it is run with Horovod or the TensorFlow framework.

### 9.6 Restrict Distributed to TLS 1.3

To restrict communication between Distributed components to the TLS 1.3 protocol, configure `~/.config/dask/distributed.yaml`.

distributed.yaml:

```yaml
distributed:
  comm:
    tls:
      min-version: 1.3
```

For details, see the Dask [Configuration Guide](https://docs.dask.org/en/stable/configuration.html).
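
For comparison, the same lower bound can be expressed on a plain Python `ssl.SSLContext`. This is only a sketch of what a TLS 1.3 floor means at the socket level; it does not show how Distributed applies `min-version` internally, and `make_tls13_context` is a hypothetical helper name that does not appear in Vega.

```python
import ssl


def make_tls13_context() -> ssl.SSLContext:
    """Build a client-side context that refuses anything older than TLS 1.3."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Hypothetical helper: raise the protocol floor to TLS 1.3.
    context.minimum_version = ssl.TLSVersion.TLSv1_3
    return context


context = make_tls13_context()
print(context.minimum_version)
```

A peer that only offers TLS 1.2 or older would fail the handshake against such a context, which is the behaviour the YAML setting is meant to enforce for Distributed.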
2 changes: 1 addition & 1 deletion evaluate_service/RELEASE.md
@@ -1,4 +1,4 @@
**Evaluate Service ver1.8.4 released:**
**Evaluate Service ver1.8.5 released:**

**Introduction**

2 changes: 1 addition & 1 deletion evaluate_service/evaluate_service/__init__.py
@@ -16,4 +16,4 @@

"""Evaluate service."""

__version__ = "1.8.4"
__version__ = "1.8.5"
6 changes: 4 additions & 2 deletions evaluate_service/evaluate_service/main.py
@@ -176,8 +176,10 @@ def upload_files(self):
        logging.warning("The timestamp is {}.".format(self.now_time))
        self.upload_file_path = os.path.join(self.current_path, "out", self.now_time)
        self.share_dir = os.path.join(self.current_path, "out", self.job_id)
        os.makedirs(self.upload_file_path)
        os.makedirs(self.share_dir)
        if not os.path.exists(self.upload_file_path):
            os.makedirs(self.upload_file_path)
        if not os.path.exists(self.share_dir):
            os.makedirs(self.share_dir)
        patterns = [".pkl", ".pth", ".pt", ".pb", ".ckpt", ".air", '.om',
                    ".onnx", ".caffemodel", ".pbtxt", ".prototxt"]
        model_file = request.files.get("model_file")
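
The check-then-create pattern in this hunk could also be written with `os.makedirs(..., exist_ok=True)`, which avoids the window between the existence check and the creation. A minimal sketch, assuming a hypothetical helper `ensure_dir` and a temporary directory standing in for the service's `out` folder; neither is part of this PR:

```python
import os
import tempfile


def ensure_dir(path: str) -> str:
    """Create path (and any parents) if missing; do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)
    return path


# Illustrative paths only; the real service builds them from current_path and job_id.
base = tempfile.mkdtemp()
share_dir = ensure_dir(os.path.join(base, "out", "job-0"))
ensure_dir(share_dir)  # second call is a no-op instead of raising FileExistsError
```

Either form fixes the failure on a repeated job ID; the `exist_ok` form just does it in one call.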
20 changes: 18 additions & 2 deletions evaluate_service/evaluate_service/run_flask.py
@@ -18,6 +18,7 @@

import configparser
import logging
import ssl
import os
from multiprocessing import Process
import gevent
@@ -76,13 +77,28 @@ def run_flask(app, host, port, security_mode):
        encrypted_password = config.get('security').get('encrypted_password')
        key_component_1 = config.get('security').get('key_component_1')
        key_component_2 = config.get('security').get('key_component_2')
        ciphers = config.get('security').get('ciphers')
        cipher_suites = "ECDHE-ECDSA-AES128-CCM:ECDHE-ECDSA-AES256-CCM:ECDHE-ECDSA-AES128-GCM-SHA256" \
            ":ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384" \
            ":DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:DHE-DSS-AES128-GCM-SHA256" \
            ":DHE-DSS-AES256-GCM-SHA384:DHE-RSA-AES128-CCM:DHE-RSA-AES256-CCM"

        if ciphers:
            ciphersList = [cipher for cipher in ciphers.split(':') if cipher in cipher_suites.split(':')]
            if ciphersList == []:
                raise ssl.SSLError("The ciphers are invalid, please check.")
            else:
                ciphers = ':'.join(ciphersList)
        else:
            ciphers = cipher_suites

        if not check_risky_files((ca_cert, server_cert, server_secret_key, key_component_1, key_component_2)):
            return
        try:
            if encrypted_password == "":
                ssl_context = create_context(ca_cert, server_cert, server_secret_key)
                ssl_context = create_context(ca_cert, server_cert, server_secret_key, ciphers)
            else:
                ssl_context = create_context(ca_cert, server_cert, server_secret_key,
                ssl_context = create_context(ca_cert, server_cert, server_secret_key, ciphers,
                                             encrypted_password, key_component_1, key_component_2)
        except Exception:
            logging.error("Fail to create context.")
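
The cipher selection added in this hunk amounts to intersecting the configured list with a fixed allow-list. The sketch below restates that logic as a standalone function so it can be exercised in isolation. `select_ciphers` and `DEFAULT_CIPHER_SUITES` are hypothetical names, and the sketch raises `ValueError` where the hunk raises `ssl.SSLError`.

```python
DEFAULT_CIPHER_SUITES = (
    "ECDHE-ECDSA-AES128-CCM:ECDHE-ECDSA-AES256-CCM:ECDHE-ECDSA-AES128-GCM-SHA256"
    ":ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384"
    ":DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:DHE-DSS-AES128-GCM-SHA256"
    ":DHE-DSS-AES256-GCM-SHA384:DHE-RSA-AES128-CCM:DHE-RSA-AES256-CCM"
)


def select_ciphers(configured, allowed=DEFAULT_CIPHER_SUITES):
    """Return the configured suites that are on the allow-list, joined by ':'.

    Falls back to the full allow-list when nothing is configured and raises
    ValueError when every configured suite is rejected.
    """
    if not configured:
        return allowed
    allowed_set = set(allowed.split(":"))
    kept = [name for name in configured.split(":") if name in allowed_set]
    if not kept:
        raise ValueError("The ciphers are invalid, please check.")
    return ":".join(kept)


# "RC4-SHA" is not on the allow-list, so only the first suite survives.
narrowed = select_ciphers("ECDHE-ECDSA-AES128-CCM:RC4-SHA")
```

Note that unknown suites are dropped silently as long as at least one configured suite is valid; only a fully invalid list is reported as an error.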
3 changes: 2 additions & 1 deletion evaluate_service/setup.py
@@ -60,7 +60,7 @@ def run(self):

setuptools.setup(
    name="evaluate-service",
    version="1.8.4",
    version="1.8.5",
    packages=["evaluate_service"],
    include_package_data=True,
    python_requires=">=3.6",
@@ -80,6 +80,7 @@ def run(self):
        "Flask-RESTful",
        "Flask-Limiter",
        "gevent",
        "PyYAML",
    ],
    cmdclass={
        "build_py": custom_build_py,
1 change: 1 addition & 0 deletions examples/data_augmentation/cyclesr/cyclesr.yml
@@ -1,5 +1,6 @@
general:
    backend: pytorch
    requires: ["tensorboardX"]


pipeline: [fully_train]
1 change: 1 addition & 0 deletions examples/nas/modnas/darts.yml
@@ -1,5 +1,6 @@
general:
    backend: pytorch
    requires: ["tensorboardX"]


pipeline: [nas, fully_train]
1 change: 1 addition & 0 deletions examples/nas/modnas/mbv2.yml
@@ -1,5 +1,6 @@
general:
    backend: pytorch
    requires: ["tensorboardX"]


pipeline: [fully_train]
1 change: 1 addition & 0 deletions examples/nas/modnas/ps.yml
@@ -1,5 +1,6 @@
general:
    backend: pytorch
    requires: ["tensorboardX"]


pipeline: [nas, fully_train]
1 change: 1 addition & 0 deletions examples/nas/modnas/pxl.yml
@@ -1,5 +1,6 @@
general:
    backend: pytorch
    requires: ["tensorboardX"]


pipeline: [nas, fully_train]
3 changes: 1 addition & 2 deletions setup.py
@@ -29,7 +29,7 @@

setuptools.setup(
    name="noah-vega",
    version="1.8.4",
    version="1.8.5",
    packages=["vega"],
    include_package_data=True,
    python_requires=">=3.6",
@@ -59,7 +59,6 @@
        "dill",
        "scikit-learn",
        "opencv-python",
        "tensorboardX",
    ],
    entry_points="""
        [console_scripts]
2 changes: 1 addition & 1 deletion vega/__init__.py
@@ -34,7 +34,7 @@
    "get_quota",
]

__version__ = "1.8.4"
__version__ = "1.8.5"


import sys
6 changes: 4 additions & 2 deletions vega/algorithms/nas/sp_nas/spnas_trainer_callback.py
@@ -41,6 +41,7 @@
def valid():
    """Construct the trainer of SpNas."""
    config_val = DatasetConfig().to_dict()
    dataset_type = config_val.type
    config_val = config_val['_class_data'].val
    prefix = "FasterRcnn_eval.mindrecord"
    mindrecord_dir = config_val.mindrecord_dir
@@ -49,7 +50,7 @@ def valid():
    if not os.path.exists(mindrecord_file):
        if not os.path.isdir(mindrecord_dir):
            os.makedirs(mindrecord_dir)
        if config_val.dataset == "coco":
        if dataset_type == "CocoDataset":
            if os.path.isdir(config_val.coco_root):
                data_to_mindrecord_byte_image(config_val, "coco", False, prefix, file_num=1)
            else:
@@ -67,6 +68,7 @@ def valid():
def train():
    """Train fasterrcnn dataset."""
    config_train = DatasetConfig().to_dict()
    dataset_type = config_train.type
    config_train = config_train['_class_data'].train
    prefix = "FasterRcnn.mindrecord"
    mindrecord_dir = config_train.mindrecord_dir
@@ -78,7 +80,7 @@ def train():
    if rank == 0 and not os.path.exists(mindrecord_file):
        if not os.path.isdir(mindrecord_dir):
            os.makedirs(mindrecord_dir)
        if config.dataset == "coco":
        if dataset_type == "CocoDataset":
            if os.path.isdir(config_train.coco_root):
                if not os.path.exists(config_train.coco_root):
                    logging.info("Please make sure config:coco_root is valid.")
4 changes: 2 additions & 2 deletions vega/algorithms/nas/sp_nas/src/dataset.py
@@ -361,10 +361,10 @@ def create_coco_label(is_training, config):
    from pycocotools.coco import COCO

    coco_root = config.coco_root
    data_type = config.val_data_type
    if is_training:
        data_type = config.train_data_type

    else:
        data_type = config.val_data_type
    # Classes need to train or test.
    train_cls = config.coco_classes
    train_cls_dict = {}
2 changes: 1 addition & 1 deletion vega/algorithms/nas/sp_nas/src/model_utils/config.py
@@ -66,7 +66,7 @@ def parse_cli_to_yaml(parser, cfg, helper=None, choices=None, cfg_path="default_
            else:
                parser.add_argument("--" + item, type=type(cfg[item]), default=cfg[item], choices=choice,
                                    help=help_description)
    args = parser.parse_args()
    args, _ = parser.parse_known_args()
    return args

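
The switch to `parse_known_args` matters when the process is started with options that this parser never declared. A short sketch of the difference, using a made-up `--batch_size` option and a made-up `--unrelated` flag standing in for an argument owned by the surrounding program:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--batch_size", type=int, default=2)

# "--unrelated" stands in for an option owned by the surrounding program.
argv = ["--batch_size", "4", "--unrelated", "value"]

# Known options are parsed; everything else is returned untouched.
args, unknown = parser.parse_known_args(argv)
# parser.parse_args(argv) would exit here with "unrecognized arguments".
```

The hunk discards the second return value, so undeclared options are simply ignored by this parser.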
12 changes: 7 additions & 5 deletions vega/common/backend_register.py
@@ -17,7 +17,8 @@
"""Backend Register."""

import os
import sys
import logging
import traceback

__all__ = [
"set_backend",
@@ -135,7 +136,8 @@ def get_devices():

def import_extension_module():
    """Import extension module."""
    try:
        import ascend_automl
    except ImportError:
        pass
    if is_npu_device():
        try:
            import ascend_automl
        except ImportError:
            logging.debug(traceback.format_exc())
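
The pattern in this last hunk, importing an optional extension only when a condition holds and logging the traceback at debug level instead of swallowing it, can be sketched generically. `import_optional` is a hypothetical helper, and the `enabled` flag stands in for `is_npu_device()`:

```python
import importlib
import logging
import traceback


def import_optional(module_name: str, enabled: bool):
    """Import module_name only when enabled; log at debug level if it is missing."""
    if not enabled:
        return None
    try:
        return importlib.import_module(module_name)
    except ImportError:
        logging.debug(traceback.format_exc())
        return None


skipped = import_optional("json", enabled=False)
loaded = import_optional("json", enabled=True)
missing = import_optional("module_that_does_not_exist_xyz", enabled=True)
```

Logging at debug level keeps normal runs quiet while still leaving a trace when the extension is expected but absent.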