NLP: A Very Detailed Tutorial on BERT English Reading-Comprehension QA with SQuAD 2.0
Environment
Linux
Python 3.6
TensorFlow 1.12.0
File preparation
Download the BERT source code:
Download the pre-trained BERT model:
12-layer, 768-hidden, 12-heads, 110M parameters
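Data preparation
Create a $SQUAD_DIR folder and put the downloaded files into it (train-v2.0.json, dev-v2.0.json and evaluate-v2.0.py, which the commands below expect to find there).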
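Before training, it can be useful to confirm that the downloaded file really has the SQuAD 2.0 layout. The following is a minimal sketch, assuming the standard SQuAD JSON structure (data, paragraphs, qas, is_impossible); the helper names load_squad and summarize_squad and the tiny sample are hypothetical and only serve as an illustration.

```python
import json
from collections import Counter


def load_squad(path):
    """Read a SQuAD-style JSON file from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize_squad(dataset):
    """Count paragraphs and answerable / unanswerable questions."""
    counts = Counter()
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            counts["paragraphs"] += 1
            for qa in paragraph["qas"]:
                # SQuAD 2.0 marks unanswerable questions with is_impossible=True;
                # SQuAD 1.1 files have no such key, so default to False.
                if qa.get("is_impossible", False):
                    counts["unanswerable"] += 1
                else:
                    counts["answerable"] += 1
    return dict(counts)


# Tiny in-memory stand-in for train-v2.0.json (made-up content).
sample = {
    "version": "v2.0",
    "data": [{
        "title": "Example",
        "paragraphs": [{
            "context": "BERT was released by Google in 2018.",
            "qas": [
                {"id": "q1", "question": "Who released BERT?",
                 "is_impossible": False,
                 "answers": [{"text": "Google", "answer_start": 21}]},
                {"id": "q2", "question": "Who released GPT?",
                 "is_impossible": True, "answers": []},
            ],
        }],
    }],
}

summary = summarize_squad(sample)
print(summary)
```

For the real data, something like summarize_squad(load_squad("train-v2.0.json")) with the path adjusted to your $SQUAD_DIR should give the corresponding counts.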
Code changes
In run_squad.py in the bert folder, comment out the following lines:
if (len(qa["answers"]) != 1) and (not is_impossible):
  raise ValueError(
      "For training, each question should have exactly 1 answer.")
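Writing the run script
On a GPU server, you can run BERT_BASE as follows.
Create a new script file named "run.sh" with the following content:
export SQUAD_DIR=/path/to/your/SQUAD_DIR        # the $SQUAD_DIR folder you created
export BERT_BASE_DIR=/path/to/pretrained/model  # the folder containing the pre-trained model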
python run_squad.py \
--vocab_file=$BERT_BASE_DIR/vocab.txt \
--bert_config_file=$BERT_BASE_DIR/bert_config.json \
--init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
--do_train=True \
--train_file=$SQUAD_DIR/train-v2.0.json \
--do_predict=True \
--predict_file=$SQUAD_DIR/dev-v2.0.json \
--train_batch_size=12 \
--learning_rate=3e-5 \
--num_train_epochs=2.0 \
--max_seq_length=512 \
--doc_stride=128 \
--output_dir=/tmp/squad_base \
--version_2_with_negative=True \
--null_score_diff_threshold=$THRESH
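Set $THRESH to a value between -1 and -5 (for example, add export THRESH=-1 before the python command, since the script does not define it).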
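How the threshold acts can be sketched as follows. As far as I can tell from run_squad.py, null_odds.json stores, for each question, the null score minus the best non-null score, and the empty answer is predicted when that difference is greater than null_score_diff_threshold. The function name apply_null_threshold, the best_non_null dictionary and all scores below are made up for illustration; this is a sketch under those assumptions, not the code from run_squad.py.

```python
def apply_null_threshold(null_odds, best_non_null, thresh):
    """Return {question_id: answer_text}, using "" for 'no answer'.

    null_odds:     {qid: null score minus best non-null score}
    best_non_null: {qid: best non-empty answer text} (hypothetical input)
    """
    predictions = {}
    for qid, score_diff in null_odds.items():
        if score_diff > thresh:
            predictions[qid] = ""  # predict "no answer"
        else:
            predictions[qid] = best_non_null[qid]
    return predictions


# Made-up scores for three questions.
null_odds = {"q1": -4.0, "q2": -2.5, "q3": 1.3}
best_non_null = {"q1": "Google", "q2": "in 2018", "q3": "Paris"}

empty_counts = {}
for thresh in (-1.0, -3.0, -5.0):
    preds = apply_null_threshold(null_odds, best_non_null, thresh)
    empty_counts[thresh] = sum(1 for text in preds.values() if text == "")
    print(thresh, empty_counts[thresh], preds)
```

A lower (more negative) threshold makes the model answer "no answer" more often, which is why the value has to be tuned on the dev set.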
Google's default for max_seq_length in run_squad.py is 384, not 328. Because my training texts are fairly long, I raised it to 512 for this run.
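The interaction between max_seq_length and doc_stride can be illustrated with a small sketch. My reading of run_squad.py is that a passage that does not fit into one sequence is split into overlapping windows: each window holds at most max_seq_length minus the question length minus 3 special tokens, and the window start advances by doc_stride. The function name split_into_spans and the token counts are hypothetical.

```python
def split_into_spans(num_doc_tokens, num_query_tokens, max_seq_length, doc_stride):
    """Return (start, length) windows covering a passage of num_doc_tokens tokens."""
    # Three positions are reserved for [CLS] and the two [SEP] tokens.
    max_tokens_for_doc = max_seq_length - num_query_tokens - 3
    if max_tokens_for_doc <= 0 or doc_stride <= 0:
        raise ValueError("max_seq_length too small or doc_stride not positive")

    spans = []
    start = 0
    while start < num_doc_tokens:
        length = min(num_doc_tokens - start, max_tokens_for_doc)
        spans.append((start, length))
        if start + length == num_doc_tokens:
            break  # the last window reaches the end of the passage
        start += min(length, doc_stride)
    return spans


# A made-up 1000-token passage with a 12-token question.
spans_512 = split_into_spans(1000, 12, max_seq_length=512, doc_stride=128)
spans_384 = split_into_spans(1000, 12, max_seq_length=384, doc_stride=128)
print(len(spans_512), spans_512)
print(len(spans_384), spans_384)
```

Under these assumptions a longer max_seq_length gives each window more context and produces fewer windows per passage, at the cost of more GPU memory per example.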
Running the script
chmod +x run.sh
./run.sh
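chmod +x grants the file execute permission.
The run can take a while, depending on your hardware. When it finishes, you will see the following output.
The generated results are stored under /tmp/squad_base/: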
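Tuning / prediction
Steps:
1. Create a /squad/ folder inside the bert folder and copy predictions.json and null_odds.json from /tmp/squad_base/ into it.
2. Use the following command to evaluate the predictions on the dev set and to tune $THRESH: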
python $SQUAD_DIR/evaluate-v2.0.py $SQUAD_DIR/dev-v2.0.json ./squad/predictions.json --na-prob-file ./squad/null_odds.json
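The two steps above can also be scripted. The sketch below copies the two output files and builds the same evaluation command as an argument list; the function names stage_predictions and build_eval_command are hypothetical, and the directory names in the demo are placeholders.

```python
import os
import shutil
import sys


def stage_predictions(output_dir, squad_eval_dir):
    """Copy the two files the evaluation script needs into squad_eval_dir."""
    os.makedirs(squad_eval_dir, exist_ok=True)
    copied = []
    for name in ("predictions.json", "null_odds.json"):
        src = os.path.join(output_dir, name)
        if not os.path.isfile(src):
            raise FileNotFoundError("missing output file: " + src)
        copied.append(shutil.copy(src, os.path.join(squad_eval_dir, name)))
    return copied


def build_eval_command(squad_dir, squad_eval_dir):
    """Build the argument list for the evaluation command shown above."""
    return [
        sys.executable,
        os.path.join(squad_dir, "evaluate-v2.0.py"),
        os.path.join(squad_dir, "dev-v2.0.json"),
        os.path.join(squad_eval_dir, "predictions.json"),
        "--na-prob-file",
        os.path.join(squad_eval_dir, "null_odds.json"),
    ]


# Only prints the command; nothing is copied or executed here.
print(" ".join(build_eval_command("squad_dir", "./squad")))
```

To actually run it, call stage_predictions("/tmp/squad_base", "./squad") first and then pass the list to subprocess.run(..., check=True).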
Result (THRESH=-1):
Pitfalls
1. Problem: running run.sh fails with ValueError: For training, each question should have exactly 1 answer.
(tensorflow)[isi@localhost bert_squad]$ ./run.sh
Traceback (most recent call last):
  File "run_squad.py", line 1282, in <module>
    tf.app.run()
  File "/u01/isi/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "run_squad.py", line 1158, in main
    input_file=FLAGS.train_file, is_training=True)
  File "run_squad.py", line 267, in read_squad_examples
    "For training, each question should have exactly 1 answer.")
ValueError: For training, each question should have exactly 1 answer.
Solution:
Open run_squad.py, go to lines 265-267, and comment out the following code.
#          if (len(qa["answers"]) != 1) and (not is_impossible):
#            raise ValueError(
#                "For training, each question should have exactly 1 answer.")
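The check fires for any training question that is not marked impossible and does not have exactly one answer. Before disabling it, it may help to see which questions trip it. The sketch below mirrors the condition quoted above; the function name find_offending_questions and the sample questions are hypothetical, and the handling of version_2_with_negative reflects my reading of run_squad.py rather than a verified copy of it.

```python
def find_offending_questions(dataset, version_2_with_negative=True):
    """List (question_id, number_of_answers) pairs that would raise the ValueError."""
    offending = []
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                # Assumption: without the flag, is_impossible is never read
                # from the data, so every question counts as answerable.
                is_impossible = False
                if version_2_with_negative:
                    is_impossible = qa.get("is_impossible", False)
                if (len(qa["answers"]) != 1) and (not is_impossible):
                    offending.append((qa["id"], len(qa["answers"])))
    return offending


# Made-up questions: one normal, one unanswerable, one with two answers.
sample = {"data": [{"paragraphs": [{"qas": [
    {"id": "q1", "is_impossible": False, "answers": [{"text": "a"}]},
    {"id": "q2", "is_impossible": True, "answers": []},
    {"id": "q3", "is_impossible": False,
     "answers": [{"text": "a"}, {"text": "b"}]},
]}]}]}

with_flag = find_offending_questions(sample, version_2_with_negative=True)
without_flag = find_offending_questions(sample, version_2_with_negative=False)
print(with_flag)
print(without_flag)
```

In this toy data, the unanswerable question only trips the check when the SQuAD 2.0 flag is off, while the question with two answers trips it either way.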
2. ResourceExhaustedError: out of GPU memory
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[12,12,512,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node bert/encoder/layer_9/attention/self/Softmax (defined at /u01/isi/jingyiwang/bert_squad/modeling.py:720)= Softmax[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](bert/encoder/layer_9/attention/self/add)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[{{node truediv/_4029}}= _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_3857_truediv", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
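Solution:
During the run I hit a variety of errors caused by running out of GPU memory; in the end it succeeded once I used 4 GPUs. To select the GPUs, just add the following line at the top of run.sh:
export CUDA_VISIBLE_DEVICES=1,2,3,4  # GPU IDs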
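The tensor named in the error has shape [12,12,512,512], which matches batch size 12, 12 attention heads and a 512 x 512 attention matrix. A rough back-of-the-envelope sketch of its size follows, assuming 4-byte floats and one such tensor per encoder layer; the function name attention_scores_bytes is hypothetical, and the figures cover only this one kind of tensor, not the total memory use of training.

```python
def attention_scores_bytes(batch_size, num_heads, seq_length, bytes_per_value=4):
    """Size in bytes of one [batch, heads, seq, seq] attention-score tensor."""
    return batch_size * num_heads * seq_length * seq_length * bytes_per_value


MIB = 1024 * 1024
NUM_LAYERS = 12  # BERT-Base has 12 encoder layers

sizes = {}
for seq_length in (512, 384, 256):
    per_layer = attention_scores_bytes(12, 12, seq_length)
    sizes[seq_length] = per_layer
    print(seq_length, per_layer // MIB, "MiB per layer,",
          NUM_LAYERS * per_layer // MIB, "MiB over all layers")
```

Because the size grows with the square of the sequence length and linearly with the batch size, lowering max_seq_length or train_batch_size is another way to reduce memory pressure.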
3. When I ran the command python $SQUAD_DIR/evaluate-v2.0.py $SQUAD_DIR/dev-v2.0.json ./squad/predictions.json --na-prob-file ./squad/null_odds.json to tune the threshold that separates null from non-empty answers, I got the following error:
Traceback (most recent call last):
  File "squad_dir/evaluate-v2.0.py", line 276, in <module>
    main()
  File "squad_dir/evaluate-v2.0.py", line 236, in main
    preds = json.load(f)
  File "/anaconda3/envs/tensorflow/lib/python3.6/json/__init__.py", line 299, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/anaconda3/envs/tensorflow/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/anaconda3/envs/tensorflow/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/anaconda3/envs/tensorflow/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
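Cause:
I had forgotten to delete an earlier, broken version of the predictions.json file.
Solution:
Put the predictions.json generated by the successful run into the squad folder.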
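A stale or empty file can be caught before the evaluation script is run. The sketch below checks that a file exists, is not empty and parses as JSON; the function name check_json_file is hypothetical, and the demo files are created in a temporary directory only for illustration.

```python
import json
import os
import tempfile


def check_json_file(path):
    """Return (ok, message) describing whether path holds parseable JSON."""
    if not os.path.isfile(path):
        return False, "file not found: " + path
    if os.path.getsize(path) == 0:
        return False, "file is empty: " + path
    try:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
    except json.JSONDecodeError as err:
        return False, "invalid JSON in {}: {}".format(path, err)
    return True, "ok: top-level {}".format(type(data).__name__)


# Demo with throwaway files: one valid, one empty, one missing.
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "predictions.json")
    empty = os.path.join(tmp, "empty.json")
    with open(good, "w", encoding="utf-8") as f:
        json.dump({"q1": "Google", "q2": ""}, f)
    open(empty, "w").close()
    results = [
        check_json_file(good),
        check_json_file(empty),
        check_json_file(os.path.join(tmp, "missing.json")),
    ]

for ok, message in results:
    print(ok, message)
```

Running check_json_file on ./squad/predictions.json and ./squad/null_odds.json before the evaluation command would flag a leftover file from a failed run.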