ing AUC in the GAOGB algorithm. The next generation is then produced from crossover and mutation operations. After running G0 iterations, the optimal parameter combination is obtained.
The data flow of the GAOGB computation process is shown in Fig. 1. In the preparation phase, the data are preprocessed and partitioned into three parts: an optimization set, a training set, and a testing set. The feasible regions of α, γ and τ are determined. In the optimization phase, the optimization set is used to find the globally optimal parameters with the GA. The optimization set is further divided to perform a 3-fold cross-validation and generate testing results. The testing AUC on the optimization set is the objective to be maximized. The three parameters of interest (α, γ and τ) are the real-valued decision variables. The classical GA follows a selection-crossover-mutation sequence, and a fixed number of iterations is set as the stopping criterion. In the model tuning phase, the training and testing processes are performed in turn, and performance metrics are generated at the end.
Fig. 1. Computation stages and data flow of the GAOGB model.
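The selection-crossover-mutation loop described above can be illustrated with a minimal real-valued GA. This is only a sketch under stated assumptions, not the authors' implementation: the function names, the population size, the crossover and mutation rates, the tournament selection, the arithmetic crossover, the elitism step, and the [0, 1] bounds are all hypothetical choices, and `toy_objective` is a stand-in for the cross-validated AUC that the OGB model would return on the optimization set.

```python
import random


def run_ga(objective, bounds, pop_size=20, generations=40,
           crossover_rate=0.8, mutation_rate=0.1, seed=0):
    """Maximize `objective` over real-valued parameters inside `bounds`.

    All defaults except generations=40 are illustrative assumptions.
    """
    rng = random.Random(seed)

    def random_individual():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def tournament(scored):
        # Selection: the better of two randomly drawn individuals.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    population = [random_individual() for _ in range(pop_size)]
    best_score, best = float("-inf"), None
    for _ in range(generations):
        scored = [(objective(ind), ind) for ind in population]
        for score, ind in scored:
            if score > best_score:
                best_score, best = score, list(ind)
        next_population = [list(best)]  # elitism: carry the best forward
        while len(next_population) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            if rng.random() < crossover_rate:
                # Arithmetic crossover: a convex mix stays inside the bounds.
                w = rng.random()
                child = [w * x + (1 - w) * y for x, y in zip(p1, p2)]
            else:
                child = list(p1)
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    child[i] = rng.uniform(lo, hi)  # uniform reset mutation
            next_population.append(child)
        population = next_population
    return best, best_score


def toy_objective(params):
    """Placeholder for the 3-fold testing AUC; peaks at (0.3, 0.7, 0.5)."""
    alpha, gamma, tau = params
    return -((alpha - 0.3) ** 2 + (gamma - 0.7) ** 2 + (tau - 0.5) ** 2)


BOUNDS = [(0.0, 1.0)] * 3  # hypothetical feasible regions for alpha, gamma, tau
best_params, best_value = run_ga(toy_objective, BOUNDS)
```

In an actual run, `toy_objective` would be replaced by a function that trains the OGB model with the candidate (α, γ, τ) and returns the cross-validated AUC, so each fitness evaluation is itself a small training job.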
In terms of theoretical computational complexity, GAOGB, Online Gradient Boosting with the Adaptive Linear Regressor (OLRGB), and Online Adaptive Boosting with the Adaptive Linear Regressor (OLRAB) in Parag et al. (2008) require O(TN^2) running time, whereas OSELM and OLR both have O(T) time complexity. In this theoretical comparison, OSELM and OLR therefore have lower time complexity than the GAOGB, OLRGB and OLRAB models, owing to their comparative simplicity.
4. Experimental results
4.1. Experimental settings
The computation process of GAOGB follows the flowchart in Fig. 1. Each dataset is randomly split into an optimization set, a training set, and a testing set with certain ratios. On the optimization set, the OGB model in Algorithm 2 serves as the “objective function”, and the AUC is the criterion for judging the optimality of the parameters of interest. The GA is run for 40 iterations. After optimization, training and testing are performed with the training and testing sets partitioned in the data preparation phase. The optimal parameters are assigned to the OGB model in Algorithm 2, and a 3-fold cross-validation evaluator is applied to each dataset. Based on 5 replications of cross-validation, the performance measures are calculated for the GAOGB model and reported as averages and standard deviations.
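The partitioning and the replicated 3-fold cross-validation described above can be sketched as follows. This is a hedged illustration only: the split ratios (the text says only "certain ratios"), the function names, and the `evaluate` callback are assumptions, and reporting one mean and one SD over the 3 folds of each replication is one plausible reading of the protocol.

```python
import random
import statistics


def three_way_split(n_samples, ratios=(0.2, 0.6, 0.2), seed=0):
    """Shuffle sample indices and cut them into optimization, training
    and testing sets. The ratios are hypothetical placeholders."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_opt = round(ratios[0] * n_samples)
    n_train = round(ratios[1] * n_samples)
    return idx[:n_opt], idx[n_opt:n_opt + n_train], idx[n_opt + n_train:]


def k_folds(indices, k=3, seed=0):
    """Shuffle the indices and deal them into k disjoint folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def replicated_cv(indices, evaluate, k=3, replications=5):
    """Return one (mean, SD) pair over the k folds for each replication.

    `evaluate(fit_indices, held_out_indices)` is a user-supplied callback
    that trains a model and returns a score such as the AUC.
    """
    results = []
    for rep in range(replications):
        folds = k_folds(indices, k, seed=rep)  # new shuffle per replication
        scores = []
        for i, held_out in enumerate(folds):
            fit_on = [j for f_i, fold in enumerate(folds) if f_i != i
                      for j in fold]
            scores.append(evaluate(fit_on, held_out))
        results.append((statistics.mean(scores), statistics.stdev(scores)))
    return results


opt_idx, train_idx, test_idx = three_way_split(100)


def held_out_fraction(fit_on, held_out):
    """Dummy score used only to exercise the loop."""
    return len(held_out) / (len(fit_on) + len(held_out))


cv_results = replicated_cv(train_idx, held_out_fraction)
```

Using a different shuffle seed per replication keeps the five replications distinct while leaving the whole procedure reproducible.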
The performance is evaluated in terms of computational effectiveness and efficiency. Efficiency here is defined as the retraining time upon arrival of a new batch of data. Model effectiveness is assessed by testing accuracy, AUC, specificity, sensitivity, and the standard deviation (SD) over all 3 folds at each replication. The experiment is conducted on a Dell desktop with a 3.6 GHz Intel Core i7 processor.
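For reference, the effectiveness measures named above can be computed for a binary classifier as in the sketch below. The function names, the 0.5 decision threshold, and the toy labels and scores are illustrative assumptions; the AUC is computed with the standard pairwise (Mann-Whitney) formulation, counting ties as one half.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity (true positive rate) and specificity
    (true negative rate) for 0/1 labels and 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auc(y_true, scores):
    """Probability that a random positive is scored above a random
    negative, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented purely for illustration.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

metrics = confusion_metrics(y_true, y_pred)
auc_value = auc(y_true, scores)
```

The pairwise AUC is quadratic in the number of samples, which is acceptable for a small illustration; a rank-based computation would be preferable on large testing sets.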
In addition to the GAOGB model, 4 other online models are adopted in the experiment for comparison: OSELM, OLRAB, OLRGB, and OLR. The offline AdaBoost, SVM and MLP models are also evaluated to demonstrate the superiority of GAOGB in retraining efficiency, with no or only a trivial trade-off in accuracy. Each model follows the same experimental settings as the GAOGB model. The parameters for each model are set uniformly with careful experimental designs (Table 1). For the GAOGB model, α, γ and τ are decided separately for each dataset, depending on the optimization results.
Table 1. Parameter setting. N: number of base learners; N1: number of neurons.
GAOGB, OLRGB, OLRAB    N = number of features
OSELM                  N1 = number of features