Log of Streaming.
- at least 6 models explored
final model: a mix of 6 models on the same features
- 1x lightgbm
procedure (loop 6 times, once per model):
- fit one model
- get its predicted values
results: 6 sets of predicted values, one from each of the 6 models
average the 6 predictions (all weights = 1)
#1 + #2 + #3 + #4 + #5 + #6: 0.2965
#1 + #2: 0.29502 on private
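The equal-weight averaging step above can be sketched as follows (a minimal sketch; the prediction arrays are hypothetical stand-ins for the 6 models' outputs):

```python
import numpy as np

# Hypothetical predictions from 6 models on the same 3 test rows
preds = [
    np.array([0.30, 0.28, 0.31]),
    np.array([0.29, 0.27, 0.33]),
    np.array([0.31, 0.29, 0.30]),
    np.array([0.28, 0.30, 0.32]),
    np.array([0.30, 0.26, 0.31]),
    np.array([0.32, 0.28, 0.29]),
]

# Equal-weight average (all weights = 1): mean across the 6 models
ensemble = np.mean(preds, axis=0)
print(ensemble)
```

Because all weights are 1, this is just the element-wise mean of the six prediction vectors; weighted blends would replace `np.mean` with `np.average(preds, axis=0, weights=...)`.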
Bootstrap Method: estimate a statistic of a statistic (e.g., the mean of sub-sample means) by resampling.
Example: a sample of 100 values (x)
goal: get an estimate of the mean of the sample.
old way: calculate the mean directly from the sample:
mean(x) = 1/100 * sum(x)
- define the sample size of each sub-sample, e.g. 1000
define the number of sub-samples, e.g. 3
i.e., 3 sub-samples with 1000 samples each
- Create random sub-samples of dataset with replacement
- Calculate the mean of each sub-sample.
- Calculate the average of all collected means
- use that as the estimated mean for the data.
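The steps above can be sketched in Python (a minimal sketch; the sample `x` is synthetic, and the sub-sample count/size follow the example numbers in the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=100)  # a sample of 100 values

n_subsamples = 3        # number of sub-samples, e.g. 3
subsample_size = 1000   # sample size of each sub-sample, e.g. 1000

# Create random sub-samples with replacement and collect their means
means = []
for _ in range(n_subsamples):
    sub = rng.choice(x, size=subsample_size, replace=True)
    means.append(sub.mean())

# Average of the collected means = bootstrap estimate of the mean
bootstrap_mean = np.mean(means)
print(bootstrap_mean, x.mean())
```

Sampling with replacement is what makes sub-samples of size 1000 possible from only 100 values; the bootstrap estimate will be close to the direct `mean(x)`, and the spread of `means` gives a sense of the estimate's variability.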
Bootstrap Aggregation (Bagging): combine the predictions from multiple machine learning algorithms together to make more accurate predictions than any individual model.
Bagging reduces the variance of algorithms that have high variance. Decision trees, such as classification and regression trees (CART), are a high-variance algorithm. Bagging lowers variance at the cost of a small increase in bias.
Example of CART: a sample dataset of 1000 instances (x) and we are using the CART algorithm. Bagging of the CART algorithm would work as follows.
- Create many (e.g. 100) random sub-samples of our dataset with replacement.
- Train a CART model on each sample.
- Given a new dataset, average the predictions from all models.
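The bagging-of-CART recipe above can be sketched with scikit-learn's `DecisionTreeRegressor` (a minimal sketch; the dataset is synthetic, and the 1000-instance / 100-sub-sample sizes follow the example in the notes):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(1000, 1))          # 1000 instances
y = np.sin(X[:, 0]) + rng.normal(0, 0.2, 1000)  # noisy target

n_models = 100  # many random sub-samples
trees = []
for _ in range(n_models):
    # bootstrap sub-sample: draw indices with replacement
    idx = rng.integers(0, len(X), size=len(X))
    tree = DecisionTreeRegressor()  # fully grown CART: high variance
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# Given new data, average the predictions from all models
X_new = np.array([[0.5], [1.5]])
bagged_pred = np.mean([t.predict(X_new) for t in trees], axis=0)
print(bagged_pred)
```

Each unpruned tree overfits its own bootstrap sample, but averaging the 100 trees cancels out much of that variance; scikit-learn's `BaggingRegressor` wraps this same loop.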
I will post the notes on my blog.
- easy search via a search bar
- comment section for each note
- embed YouTube video
- archive page