记录格式基本按昨天定的格式来写,先前我一般是先自己边学习,边整理代码,同时也不断问搜索大神,最后整理好代码后再写博客。现在觉得着这种方式不太好,主要是最后写的时候,并不能很好记录过程中学到的知识。所以以后要在学习的过程中就开始写,并把学到的知识点一一记录,这样应该更翔实,以后复查会更方便。

问题描述

过拟合就是就是拟合的过了头,太假不中用,表现上是对于训练数据各种评估指标都很好,但是对于测试数据则不然。而欠拟合可以认为是过拟合的一个相对概念,拟合吗,不是少点就是多点,实际也不会有绝对的拟合。
在这里教程中主要就处理了过拟合的问题,实际中主要也是过拟合的问题。测试数据采用的是前面文本分类的数据,在这里数据被处理成了另一种形式。影评是段话,假设词汇总量是1w,那么就可以用一个长度为1w的数组,如果影评里出现了某个单词就在它的索引位置置1,这就是本例采用的数据格式。
从这里我们可以看出,数据转换必然带有信息丢失,不过呢,细想下,固定的单词可组成的句子也是有限,必然还包含评价影片好坏的信息,也算有一定的合理性。
处理数据样例
上图是一个影评数据处理后的直方图,由于字典排序是根据单词使用频率来处理的,所以从上图也可看出越往前越稠密,符合数据的特征。

拟合源码

(venv) xianfa@xianfa:~/machinelearning/study$ cat fittingproblem.py 
import tensorflow as tf
from tensorflow import keras

import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

def multi_hot_sequences(sequences, dimension):
    # Create an all-zero matrix of shape (len(sequences), dimension)
    results = np.zeros((len(sequences), dimension))
    for i, word_indices in enumerate(sequences):
        results[i, word_indices] = 1.0  # set specific indices of results[i] to 1s
    return results

def plot_history(histories, key='binary_crossentropy'):
    plt.clf()
    plt.figure(figsize=(16,10))
    
    for name, history in histories:
        val = plt.plot(history.epoch, history.history['val_'+key], '--', label=name.title()+' Val')
        plt.plot(history.epoch, history.history[key], color=val[0].get_color(), label=name.title()+' Train')

        plt.xlabel('Epochs')
        plt.ylabel(key.replace('_',' ').title())
        plt.legend()

        plt.xlim([0,max(history.epoch)])

if __name__ == '__main__':
    #prehandle train and test data
    NUM_WORDS = 10000
    (train_data, train_labels), (test_data, test_labels) = keras.datasets.imdb.load_data(num_words=NUM_WORDS)

    train_data = multi_hot_sequences(train_data, dimension=NUM_WORDS)
    test_data = multi_hot_sequences(test_data, dimension=NUM_WORDS)

    #draw figure with one data
    plt.plot(train_data[0])
    plt.savefig('./traindata.jpg')

    #train baseline model
    print('\ntrain baseline model start:\n')
    baseline_model = keras.Sequential([
        # `input_shape` is only required here so that `.summary` works.
        keras.layers.Dense(16, activation=tf.nn.relu, input_shape=(NUM_WORDS,)),
        keras.layers.Dense(16, activation=tf.nn.relu),
        keras.layers.Dense(1, activation=tf.nn.sigmoid)
        ])

    baseline_model.compile(
        optimizer='adam',
        loss='binary_crossentropy',
        metrics=['accuracy', 'binary_crossentropy'])

    baseline_model.summary()

    baseline_history = baseline_model.fit(
        train_data,
        train_labels,
        epochs=20,
        batch_size=512,
        validation_data=(test_data, test_labels),
        verbose=2)
    print('\ntrain baseline model finish\n')

    #train smaller model
    print('\ntrain smaller model start:\n')
    smaller_model = keras.Sequential([
        keras.layers.Dense(4, activation=tf.nn.relu, input_shape=(NUM_WORDS,)),
        keras.layers.Dense(4, activation=tf.nn.relu),
        keras.layers.Dense(1, activation=tf.nn.sigmoid)
        ])

    smaller_model.compile(
        optimizer='adam',
        loss='binary_crossentropy',
        metrics=['accuracy', 'binary_crossentropy'])

    smaller_model.summary()

    smaller_history = smaller_model.fit(
        train_data,
        train_labels,
        epochs=20,
        batch_size=512,
        validation_data=(test_data, test_labels),
        verbose=2)
    print('\ntrain smaller model finish\n')

    #train bigger model
    print('\ntrain bigger model start:\n')
    bigger_model = keras.models.Sequential([
        keras.layers.Dense(512, activation=tf.nn.relu, input_shape=(NUM_WORDS,)),
        keras.layers.Dense(512, activation=tf.nn.relu),
        keras.layers.Dense(1, activation=tf.nn.sigmoid)
        ])

    bigger_model.compile(
        optimizer='adam',
        loss='binary_crossentropy',
        metrics=['accuracy','binary_crossentropy'])

    bigger_model.summary()

    bigger_history = bigger_model.fit(
        train_data,
        train_labels,
        epochs=20,
        batch_size=512,
        validation_data=(test_data, test_labels),
        verbose=2)
    print('\ntrain bigger model finish\n')

    #draw the training info and save figure
    plot_history([('baseline', baseline_history),('smaller', smaller_history),('bigger', bigger_history)])
    plt.savefig('./modelsizehist.jpg')

    #train regular model
    print('\ntrain regular model start:\n')
    l2_model = keras.models.Sequential([
        keras.layers.Dense(16, kernel_regularizer=keras.regularizers.l2(0.001), activation=tf.nn.relu, input_shape=(NUM_WORDS,)),
        keras.layers.Dense(16, kernel_regularizer=keras.regularizers.l2(0.001),activation=tf.nn.relu),
        keras.layers.Dense(1, activation=tf.nn.sigmoid)
        ])

    l2_model.compile(
        optimizer='adam',
        loss='binary_crossentropy',
        metrics=['accuracy', 'binary_crossentropy'])

    l2_model_history = l2_model.fit(
        train_data,
        train_labels,
        epochs=20,
        batch_size=512,
        validation_data=(test_data, test_labels),
        verbose=2)
    print('\ntrain regular model finish\n')

    #draw baseline and regular history training figure and save it
    plot_history([('baseline', baseline_history), ('l2', l2_model_history)])
    plt.savefig('./modelregular.jpg')

    #train dropout model
    print('\ntrain dropout model start:\n')
    dpt_model = keras.models.Sequential([
        keras.layers.Dense(16, activation=tf.nn.relu, input_shape=(NUM_WORDS,)),
        keras.layers.Dropout(0.5),
        keras.layers.Dense(16, activation=tf.nn.relu),
        keras.layers.Dropout(0.5),
        keras.layers.Dense(1, activation=tf.nn.sigmoid)
        ])

    dpt_model.compile(
        optimizer='adam',
        loss='binary_crossentropy',
        metrics=['accuracy','binary_crossentropy'])

    dpt_model_history = dpt_model.fit(
        train_data,
        train_labels,
        epochs=20,
        batch_size=512,
        validation_data=(test_data, test_labels),
        verbose=2)
    print('\ntrain dropout model finish\n')

    #draw baseline and dropout history training figure and save it
    plot_history([('baseline', baseline_history), ('dropout', dpt_model_history)])
    plt.savefig('./modeldropout.jpg')

源码分析

执行结果

(venv) xianfa@xianfa:~/machinelearning/study$ python fittingproblem.py 

train baseline model start:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 16)                160016    
_________________________________________________________________
dense_1 (Dense)              (None, 16)                272       
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 17        
=================================================================
Total params: 160,305
Trainable params: 160,305
Non-trainable params: 0
_________________________________________________________________
Train on 25000 samples, validate on 25000 samples
Epoch 1/20
2019-01-23 12:38:38.513041: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
 - 5s - loss: 0.5094 - acc: 0.7991 - binary_crossentropy: 0.5094 - val_loss: 0.3536 - val_acc: 0.8704 - val_binary_crossentropy: 0.3536
Epoch 2/20
 - 4s - loss: 0.2619 - acc: 0.9067 - binary_crossentropy: 0.2619 - val_loss: 0.2859 - val_acc: 0.8868 - val_binary_crossentropy: 0.2859
Epoch 3/20
 - 4s - loss: 0.1908 - acc: 0.9340 - binary_crossentropy: 0.1908 - val_loss: 0.2865 - val_acc: 0.8861 - val_binary_crossentropy: 0.2865
Epoch 4/20
 - 4s - loss: 0.1532 - acc: 0.9480 - binary_crossentropy: 0.1532 - val_loss: 0.3060 - val_acc: 0.8813 - val_binary_crossentropy: 0.3060
Epoch 5/20
 - 4s - loss: 0.1275 - acc: 0.9575 - binary_crossentropy: 0.1275 - val_loss: 0.3304 - val_acc: 0.8761 - val_binary_crossentropy: 0.3304
Epoch 6/20
 - 4s - loss: 0.1063 - acc: 0.9666 - binary_crossentropy: 0.1063 - val_loss: 0.3558 - val_acc: 0.8718 - val_binary_crossentropy: 0.3558
Epoch 7/20
 - 4s - loss: 0.0878 - acc: 0.9732 - binary_crossentropy: 0.0878 - val_loss: 0.4015 - val_acc: 0.8643 - val_binary_crossentropy: 0.4015
Epoch 8/20
 - 4s - loss: 0.0713 - acc: 0.9800 - binary_crossentropy: 0.0713 - val_loss: 0.4284 - val_acc: 0.8660 - val_binary_crossentropy: 0.4284
Epoch 9/20
 - 4s - loss: 0.0555 - acc: 0.9864 - binary_crossentropy: 0.0555 - val_loss: 0.4697 - val_acc: 0.8635 - val_binary_crossentropy: 0.4697
Epoch 10/20
 - 4s - loss: 0.0418 - acc: 0.9913 - binary_crossentropy: 0.0418 - val_loss: 0.5180 - val_acc: 0.8590 - val_binary_crossentropy: 0.5180
Epoch 11/20
 - 4s - loss: 0.0308 - acc: 0.9948 - binary_crossentropy: 0.0308 - val_loss: 0.5621 - val_acc: 0.8597 - val_binary_crossentropy: 0.5621
Epoch 12/20
 - 4s - loss: 0.0222 - acc: 0.9971 - binary_crossentropy: 0.0222 - val_loss: 0.6059 - val_acc: 0.8580 - val_binary_crossentropy: 0.6059
Epoch 13/20
 - 4s - loss: 0.0157 - acc: 0.9982 - binary_crossentropy: 0.0157 - val_loss: 0.6448 - val_acc: 0.8575 - val_binary_crossentropy: 0.6448
Epoch 14/20
 - 4s - loss: 0.0110 - acc: 0.9993 - binary_crossentropy: 0.0110 - val_loss: 0.6933 - val_acc: 0.8545 - val_binary_crossentropy: 0.6933
Epoch 15/20
 - 4s - loss: 0.0075 - acc: 0.9998 - binary_crossentropy: 0.0075 - val_loss: 0.7348 - val_acc: 0.8546 - val_binary_crossentropy: 0.7348
Epoch 16/20
 - 4s - loss: 0.0054 - acc: 0.9998 - binary_crossentropy: 0.0054 - val_loss: 0.7726 - val_acc: 0.8538 - val_binary_crossentropy: 0.7726
Epoch 17/20
 - 4s - loss: 0.0039 - acc: 1.0000 - binary_crossentropy: 0.0039 - val_loss: 0.8065 - val_acc: 0.8545 - val_binary_crossentropy: 0.8065
Epoch 18/20
 - 4s - loss: 0.0030 - acc: 1.0000 - binary_crossentropy: 0.0030 - val_loss: 0.8398 - val_acc: 0.8535 - val_binary_crossentropy: 0.8398
Epoch 19/20
 - 4s - loss: 0.0023 - acc: 1.0000 - binary_crossentropy: 0.0023 - val_loss: 0.8661 - val_acc: 0.8543 - val_binary_crossentropy: 0.8661
Epoch 20/20
 - 4s - loss: 0.0019 - acc: 1.0000 - binary_crossentropy: 0.0019 - val_loss: 0.8947 - val_acc: 0.8532 - val_binary_crossentropy: 0.8947

train baseline model finish


train smaller model start:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_3 (Dense)              (None, 4)                 40004     
_________________________________________________________________
dense_4 (Dense)              (None, 4)                 20        
_________________________________________________________________
dense_5 (Dense)              (None, 1)                 5         
=================================================================
Total params: 40,029
Trainable params: 40,029
Non-trainable params: 0
_________________________________________________________________
Train on 25000 samples, validate on 25000 samples
Epoch 1/20
 - 5s - loss: 0.5612 - acc: 0.7657 - binary_crossentropy: 0.5612 - val_loss: 0.4375 - val_acc: 0.8655 - val_binary_crossentropy: 0.4375
Epoch 2/20
 - 4s - loss: 0.3460 - acc: 0.8968 - binary_crossentropy: 0.3460 - val_loss: 0.3374 - val_acc: 0.8807 - val_binary_crossentropy: 0.3374
Epoch 3/20
 - 4s - loss: 0.2613 - acc: 0.9177 - binary_crossentropy: 0.2613 - val_loss: 0.2999 - val_acc: 0.8878 - val_binary_crossentropy: 0.2999
Epoch 4/20
 - 4s - loss: 0.2160 - acc: 0.9310 - binary_crossentropy: 0.2160 - val_loss: 0.2879 - val_acc: 0.8872 - val_binary_crossentropy: 0.2879
Epoch 5/20
 - 4s - loss: 0.1865 - acc: 0.9410 - binary_crossentropy: 0.1865 - val_loss: 0.2867 - val_acc: 0.8850 - val_binary_crossentropy: 0.2867
Epoch 6/20
 - 4s - loss: 0.1644 - acc: 0.9471 - binary_crossentropy: 0.1644 - val_loss: 0.2875 - val_acc: 0.8849 - val_binary_crossentropy: 0.2875
Epoch 7/20
 - 4s - loss: 0.1468 - acc: 0.9549 - binary_crossentropy: 0.1468 - val_loss: 0.2941 - val_acc: 0.8820 - val_binary_crossentropy: 0.2941
Epoch 8/20
 - 4s - loss: 0.1327 - acc: 0.9596 - binary_crossentropy: 0.1327 - val_loss: 0.3039 - val_acc: 0.8809 - val_binary_crossentropy: 0.3039
Epoch 9/20
 - 4s - loss: 0.1211 - acc: 0.9632 - binary_crossentropy: 0.1211 - val_loss: 0.3156 - val_acc: 0.8778 - val_binary_crossentropy: 0.3156
Epoch 10/20
 - 4s - loss: 0.1100 - acc: 0.9682 - binary_crossentropy: 0.1100 - val_loss: 0.3271 - val_acc: 0.8760 - val_binary_crossentropy: 0.3271
Epoch 11/20
 - 4s - loss: 0.1007 - acc: 0.9721 - binary_crossentropy: 0.1007 - val_loss: 0.3401 - val_acc: 0.8742 - val_binary_crossentropy: 0.3401
Epoch 12/20
 - 4s - loss: 0.0916 - acc: 0.9748 - binary_crossentropy: 0.0916 - val_loss: 0.3556 - val_acc: 0.8714 - val_binary_crossentropy: 0.3556
Epoch 13/20
 - 4s - loss: 0.0838 - acc: 0.9783 - binary_crossentropy: 0.0838 - val_loss: 0.3689 - val_acc: 0.8695 - val_binary_crossentropy: 0.3689
Epoch 14/20
 - 4s - loss: 0.0769 - acc: 0.9809 - binary_crossentropy: 0.0769 - val_loss: 0.3862 - val_acc: 0.8680 - val_binary_crossentropy: 0.3862
Epoch 15/20
 - 4s - loss: 0.0702 - acc: 0.9840 - binary_crossentropy: 0.0702 - val_loss: 0.4001 - val_acc: 0.8672 - val_binary_crossentropy: 0.4001
Epoch 16/20
 - 4s - loss: 0.0644 - acc: 0.9861 - binary_crossentropy: 0.0644 - val_loss: 0.4174 - val_acc: 0.8658 - val_binary_crossentropy: 0.4174
Epoch 17/20
 - 4s - loss: 0.0590 - acc: 0.9879 - binary_crossentropy: 0.0590 - val_loss: 0.4354 - val_acc: 0.8628 - val_binary_crossentropy: 0.4354
Epoch 18/20
 - 4s - loss: 0.0537 - acc: 0.9899 - binary_crossentropy: 0.0537 - val_loss: 0.4597 - val_acc: 0.8601 - val_binary_crossentropy: 0.4597
Epoch 19/20
 - 4s - loss: 0.0488 - acc: 0.9916 - binary_crossentropy: 0.0488 - val_loss: 0.4717 - val_acc: 0.8596 - val_binary_crossentropy: 0.4717
Epoch 20/20
 - 4s - loss: 0.0445 - acc: 0.9926 - binary_crossentropy: 0.0445 - val_loss: 0.4908 - val_acc: 0.8596 - val_binary_crossentropy: 0.4908

train smaller model finish


train bigger model start:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_6 (Dense)              (None, 512)               5120512   
_________________________________________________________________
dense_7 (Dense)              (None, 512)               262656    
_________________________________________________________________
dense_8 (Dense)              (None, 1)                 513       
=================================================================
Total params: 5,383,681
Trainable params: 5,383,681
Non-trainable params: 0
_________________________________________________________________
Train on 25000 samples, validate on 25000 samples
Epoch 1/20
 - 26s - loss: 0.3472 - acc: 0.8489 - binary_crossentropy: 0.3472 - val_loss: 0.2959 - val_acc: 0.8805 - val_binary_crossentropy: 0.2959
Epoch 2/20
 - 24s - loss: 0.1468 - acc: 0.9462 - binary_crossentropy: 0.1468 - val_loss: 0.3497 - val_acc: 0.8663 - val_binary_crossentropy: 0.3497
Epoch 3/20
 - 24s - loss: 0.0532 - acc: 0.9838 - binary_crossentropy: 0.0532 - val_loss: 0.4134 - val_acc: 0.8665 - val_binary_crossentropy: 0.4134
Epoch 4/20
 - 25s - loss: 0.0096 - acc: 0.9984 - binary_crossentropy: 0.0096 - val_loss: 0.5677 - val_acc: 0.8693 - val_binary_crossentropy: 0.5677
Epoch 5/20
 - 26s - loss: 0.0012 - acc: 1.0000 - binary_crossentropy: 0.0012 - val_loss: 0.6544 - val_acc: 0.8694 - val_binary_crossentropy: 0.6544
Epoch 6/20
 - 25s - loss: 2.9839e-04 - acc: 1.0000 - binary_crossentropy: 2.9839e-04 - val_loss: 0.6945 - val_acc: 0.8689 - val_binary_crossentropy: 0.6945
Epoch 7/20
 - 25s - loss: 1.7431e-04 - acc: 1.0000 - binary_crossentropy: 1.7431e-04 - val_loss: 0.7204 - val_acc: 0.8698 - val_binary_crossentropy: 0.7204
Epoch 8/20
 - 27s - loss: 1.2371e-04 - acc: 1.0000 - binary_crossentropy: 1.2371e-04 - val_loss: 0.7399 - val_acc: 0.8702 - val_binary_crossentropy: 0.7399
Epoch 9/20
 - 30s - loss: 9.4242e-05 - acc: 1.0000 - binary_crossentropy: 9.4242e-05 - val_loss: 0.7567 - val_acc: 0.8704 - val_binary_crossentropy: 0.7567
Epoch 10/20
 - 26s - loss: 7.4600e-05 - acc: 1.0000 - binary_crossentropy: 7.4600e-05 - val_loss: 0.7693 - val_acc: 0.8704 - val_binary_crossentropy: 0.7693
Epoch 11/20
 - 25s - loss: 6.0715e-05 - acc: 1.0000 - binary_crossentropy: 6.0715e-05 - val_loss: 0.7816 - val_acc: 0.8704 - val_binary_crossentropy: 0.7816
Epoch 12/20
 - 25s - loss: 5.0323e-05 - acc: 1.0000 - binary_crossentropy: 5.0323e-05 - val_loss: 0.7926 - val_acc: 0.8705 - val_binary_crossentropy: 0.7926
Epoch 13/20
 - 26s - loss: 4.2414e-05 - acc: 1.0000 - binary_crossentropy: 4.2414e-05 - val_loss: 0.8019 - val_acc: 0.8708 - val_binary_crossentropy: 0.8019
Epoch 14/20
 - 25s - loss: 3.6263e-05 - acc: 1.0000 - binary_crossentropy: 3.6263e-05 - val_loss: 0.8112 - val_acc: 0.8706 - val_binary_crossentropy: 0.8112
Epoch 15/20
 - 25s - loss: 3.1279e-05 - acc: 1.0000 - binary_crossentropy: 3.1279e-05 - val_loss: 0.8194 - val_acc: 0.8706 - val_binary_crossentropy: 0.8194
Epoch 16/20
 - 24s - loss: 2.7264e-05 - acc: 1.0000 - binary_crossentropy: 2.7264e-05 - val_loss: 0.8266 - val_acc: 0.8708 - val_binary_crossentropy: 0.8266
Epoch 17/20
 - 25s - loss: 2.3933e-05 - acc: 1.0000 - binary_crossentropy: 2.3933e-05 - val_loss: 0.8348 - val_acc: 0.8706 - val_binary_crossentropy: 0.8348
Epoch 18/20
 - 25s - loss: 2.1173e-05 - acc: 1.0000 - binary_crossentropy: 2.1173e-05 - val_loss: 0.8409 - val_acc: 0.8708 - val_binary_crossentropy: 0.8409
Epoch 19/20
 - 24s - loss: 1.8799e-05 - acc: 1.0000 - binary_crossentropy: 1.8799e-05 - val_loss: 0.8472 - val_acc: 0.8709 - val_binary_crossentropy: 0.8472
Epoch 20/20
 - 24s - loss: 1.6804e-05 - acc: 1.0000 - binary_crossentropy: 1.6804e-05 - val_loss: 0.8540 - val_acc: 0.8708 - val_binary_crossentropy: 0.8540

train bigger model finish


train regular model start:

Train on 25000 samples, validate on 25000 samples
Epoch 1/20
 - 5s - loss: 0.5381 - acc: 0.7912 - binary_crossentropy: 0.4994 - val_loss: 0.3870 - val_acc: 0.8764 - val_binary_crossentropy: 0.3476
Epoch 2/20
 - 4s - loss: 0.3047 - acc: 0.9085 - binary_crossentropy: 0.2612 - val_loss: 0.3326 - val_acc: 0.8878 - val_binary_crossentropy: 0.2858
Epoch 3/20
 - 4s - loss: 0.2496 - acc: 0.9294 - binary_crossentropy: 0.2001 - val_loss: 0.3357 - val_acc: 0.8852 - val_binary_crossentropy: 0.2840
Epoch 4/20
 - 4s - loss: 0.2232 - acc: 0.9420 - binary_crossentropy: 0.1697 - val_loss: 0.3511 - val_acc: 0.8828 - val_binary_crossentropy: 0.2962
Epoch 5/20
 - 4s - loss: 0.2096 - acc: 0.9488 - binary_crossentropy: 0.1531 - val_loss: 0.3653 - val_acc: 0.8789 - val_binary_crossentropy: 0.3076
Epoch 6/20
 - 4s - loss: 0.1959 - acc: 0.9538 - binary_crossentropy: 0.1375 - val_loss: 0.3821 - val_acc: 0.8751 - val_binary_crossentropy: 0.3230
Epoch 7/20
 - 4s - loss: 0.1883 - acc: 0.9579 - binary_crossentropy: 0.1280 - val_loss: 0.3970 - val_acc: 0.8719 - val_binary_crossentropy: 0.3360
Epoch 8/20
 - 4s - loss: 0.1801 - acc: 0.9616 - binary_crossentropy: 0.1182 - val_loss: 0.4116 - val_acc: 0.8715 - val_binary_crossentropy: 0.3493
Epoch 9/20
 - 4s - loss: 0.1737 - acc: 0.9632 - binary_crossentropy: 0.1107 - val_loss: 0.4332 - val_acc: 0.8690 - val_binary_crossentropy: 0.3693
Epoch 10/20
 - 4s - loss: 0.1711 - acc: 0.9658 - binary_crossentropy: 0.1065 - val_loss: 0.4486 - val_acc: 0.8654 - val_binary_crossentropy: 0.3833
Epoch 11/20
 - 4s - loss: 0.1646 - acc: 0.9675 - binary_crossentropy: 0.0986 - val_loss: 0.4616 - val_acc: 0.8648 - val_binary_crossentropy: 0.3949
Epoch 12/20
 - 4s - loss: 0.1599 - acc: 0.9694 - binary_crossentropy: 0.0928 - val_loss: 0.4817 - val_acc: 0.8603 - val_binary_crossentropy: 0.4139
Epoch 13/20
 - 4s - loss: 0.1568 - acc: 0.9714 - binary_crossentropy: 0.0881 - val_loss: 0.4930 - val_acc: 0.8613 - val_binary_crossentropy: 0.4236
Epoch 14/20
 - 4s - loss: 0.1471 - acc: 0.9757 - binary_crossentropy: 0.0777 - val_loss: 0.5021 - val_acc: 0.8596 - val_binary_crossentropy: 0.4325
Epoch 15/20
 - 4s - loss: 0.1392 - acc: 0.9806 - binary_crossentropy: 0.0697 - val_loss: 0.5143 - val_acc: 0.8608 - val_binary_crossentropy: 0.4447
Epoch 16/20
 - 4s - loss: 0.1347 - acc: 0.9818 - binary_crossentropy: 0.0649 - val_loss: 0.5279 - val_acc: 0.8594 - val_binary_crossentropy: 0.4579
Epoch 17/20
 - 4s - loss: 0.1308 - acc: 0.9842 - binary_crossentropy: 0.0608 - val_loss: 0.5355 - val_acc: 0.8582 - val_binary_crossentropy: 0.4650
Epoch 18/20
 - 4s - loss: 0.1281 - acc: 0.9842 - binary_crossentropy: 0.0575 - val_loss: 0.5494 - val_acc: 0.8582 - val_binary_crossentropy: 0.4783
Epoch 19/20
 - 4s - loss: 0.1242 - acc: 0.9862 - binary_crossentropy: 0.0531 - val_loss: 0.5661 - val_acc: 0.8590 - val_binary_crossentropy: 0.4951
Epoch 20/20
 - 4s - loss: 0.1218 - acc: 0.9872 - binary_crossentropy: 0.0503 - val_loss: 0.5762 - val_acc: 0.8561 - val_binary_crossentropy: 0.5042

train regular model finish


train dropout model start:

Train on 25000 samples, validate on 25000 samples
Epoch 1/20
 - 5s - loss: 0.6305 - acc: 0.6341 - binary_crossentropy: 0.6305 - val_loss: 0.4923 - val_acc: 0.8501 - val_binary_crossentropy: 0.4923
Epoch 2/20
 - 4s - loss: 0.4668 - acc: 0.7954 - binary_crossentropy: 0.4668 - val_loss: 0.3483 - val_acc: 0.8759 - val_binary_crossentropy: 0.3483
Epoch 3/20
 - 4s - loss: 0.3640 - acc: 0.8632 - binary_crossentropy: 0.3640 - val_loss: 0.2912 - val_acc: 0.8879 - val_binary_crossentropy: 0.2912
Epoch 4/20
 - 4s - loss: 0.2980 - acc: 0.8948 - binary_crossentropy: 0.2980 - val_loss: 0.2751 - val_acc: 0.8881 - val_binary_crossentropy: 0.2751
Epoch 5/20
 - 4s - loss: 0.2459 - acc: 0.9132 - binary_crossentropy: 0.2459 - val_loss: 0.2791 - val_acc: 0.8884 - val_binary_crossentropy: 0.2791
Epoch 6/20
 - 4s - loss: 0.2136 - acc: 0.9248 - binary_crossentropy: 0.2136 - val_loss: 0.2851 - val_acc: 0.8854 - val_binary_crossentropy: 0.2851
Epoch 7/20
 - 4s - loss: 0.1856 - acc: 0.9348 - binary_crossentropy: 0.1856 - val_loss: 0.3015 - val_acc: 0.8849 - val_binary_crossentropy: 0.3015
Epoch 8/20
 - 4s - loss: 0.1685 - acc: 0.9417 - binary_crossentropy: 0.1685 - val_loss: 0.3253 - val_acc: 0.8846 - val_binary_crossentropy: 0.3253
Epoch 9/20
 - 4s - loss: 0.1509 - acc: 0.9468 - binary_crossentropy: 0.1509 - val_loss: 0.3386 - val_acc: 0.8828 - val_binary_crossentropy: 0.3386
Epoch 10/20
 - 4s - loss: 0.1337 - acc: 0.9530 - binary_crossentropy: 0.1337 - val_loss: 0.3482 - val_acc: 0.8803 - val_binary_crossentropy: 0.3482
Epoch 11/20
 - 4s - loss: 0.1198 - acc: 0.9571 - binary_crossentropy: 0.1198 - val_loss: 0.3727 - val_acc: 0.8790 - val_binary_crossentropy: 0.3727
Epoch 12/20
 - 4s - loss: 0.1113 - acc: 0.9590 - binary_crossentropy: 0.1113 - val_loss: 0.3993 - val_acc: 0.8790 - val_binary_crossentropy: 0.3993
Epoch 13/20
 - 4s - loss: 0.1040 - acc: 0.9613 - binary_crossentropy: 0.1040 - val_loss: 0.4301 - val_acc: 0.8771 - val_binary_crossentropy: 0.4301
Epoch 14/20
 - 4s - loss: 0.0957 - acc: 0.9688 - binary_crossentropy: 0.0957 - val_loss: 0.4428 - val_acc: 0.8776 - val_binary_crossentropy: 0.4428
Epoch 15/20
 - 4s - loss: 0.0887 - acc: 0.9705 - binary_crossentropy: 0.0887 - val_loss: 0.4550 - val_acc: 0.8766 - val_binary_crossentropy: 0.4550
Epoch 16/20
 - 4s - loss: 0.0819 - acc: 0.9743 - binary_crossentropy: 0.0819 - val_loss: 0.4761 - val_acc: 0.8745 - val_binary_crossentropy: 0.4761
Epoch 17/20
 - 4s - loss: 0.0782 - acc: 0.9750 - binary_crossentropy: 0.0782 - val_loss: 0.5006 - val_acc: 0.8751 - val_binary_crossentropy: 0.5006
Epoch 18/20
 - 4s - loss: 0.0743 - acc: 0.9756 - binary_crossentropy: 0.0743 - val_loss: 0.5437 - val_acc: 0.8752 - val_binary_crossentropy: 0.5437
Epoch 19/20
 - 4s - loss: 0.0724 - acc: 0.9762 - binary_crossentropy: 0.0724 - val_loss: 0.5682 - val_acc: 0.8743 - val_binary_crossentropy: 0.5682
Epoch 20/20
 - 4s - loss: 0.0695 - acc: 0.9774 - binary_crossentropy: 0.0695 - val_loss: 0.5705 - val_acc: 0.8733 - val_binary_crossentropy: 0.5705

train dropout model finish

在本人机器上经过14分钟的等待,终于执行完毕。以上为控制台输出信息,还有几张附图。简略说下处理的过程,首先是数据预处理,然后按照基准模型,小模型,大模型,基于基准模型的正则模型,基于基准模型的dropout模型,分别进行了训练,并将相应的训练过程绘图保存,通过图例说明了防止过拟合使用的方法。

知识问题

for i, word_indices in enumerate(sequences):results[i, word_indices] = 1.0 对于这两句代码,是python遍历数组的一种方式,sequences在本例中是所有的训练数据,每一条训练数据都是一个一维数组。另外自己总停留在C++的层次上来思考这个代码,总认为遍历二维数组需要两次循环,所以就理解不了,后来把索引当成二维来理解的时候终于想通,python里面的enumerate并不会按行处理行向量,它处理的直接就是内部的元素,所以这里处理的直接就是某个单词。
for name, history in histories:遍历元组的一种方法,histories是个一个数组,其内部成员是一个元组,元组的成员是一个字符串和一个训练历史记录。name.title(),name是一个字符串,这里将字符串里的每个单词的首字母变为大写。 binary_crossentropy这个问了下搜索大神,应该是对数损失函数,与它对应得是激励函数,至于那些暂时不深究,不能忘了我的学习计划,先学会用,再学原理。当然了这里它的意思不可能是一个函数,而是一个值,这里简单理解为某种形式的loss取对数的结果吧,反正本质上就是loss。如果注意看下运行输出的结果,可以发现除了正则化模型外,其他模型loss=binary_crossentropy。每次作图都用acc,loss这些通用得不行么,非得每次都换?感谢大神用心,只有这样不断变换本人才能学到新得东西。 从程序执行的结果图示可以看出防止过拟合的三种方法:

采用较小模型
不同大小模型的训练验证损失曲线
上图主要看虚线,值越小越好,明显看出smaller模型表现得更好,而对于基准和大模型其实从前面训练acc为1就可以看出过拟合了。

采用权重正则化模型
基准和权重正则化训练对比
上图同样看虚线,值越小越好,明显看出L2模型表现得更好,本质上是降低网络复杂性,使用较小得权重,使权重分布更加规则。

采用添加dropout层模型
基准和添加dropout层模型训练对比
上图同样看虚线,值越小越好,可以看出dropout模型表现得更好,本质上是使网络正则化,训练时随机丢弃某些神经元的结果,也就相当于这些神经元不起作用,也可以认为减小了模型规模,本人先这么粗犷的理解。

学习总结

记录博客方式更改,不再采用先学后记的方式,而是采用边学边记录的方式,好记性不如烂笔头,更何况还没好记性。手册里面还有一条增加训练样本可以降低过拟合,这个本质上也是模型规模,样本越多,模型规模不变,就相当于样本不变,模型规模变小。主要学到了防止过拟合的处理方法,相反方向立马就可推出防止欠拟合的方法。