# Log of Streaming

## 2018-01-27 Saturday

**Link:** Kaggle: Porto Seguro’s Safe Driver Prediction, 1st place solution

### Video

### Notes

- at least 6 models explored
- final model: a blend of 6 models trained on the same features
  - 1x LightGBM
  - 5x neural networks

Procedure (loop 6 times, once per model):

- fit one model
- get its predicted values

Result: 6 sets of predicted values, one from each of the 6 models.

Average the 6 predictions with equal weights (all weights = 1):

`(#1 + #2 + #3 + #4 + #5 + #6) / 6`

- `#1 + #2`: 0.2965
- 0.29502 on the private leaderboard
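The equal-weight blend can be sketched in a few lines; the toy prediction vectors below are placeholders for illustration, not the competition's actual model outputs:

```python
import numpy as np

def blend(predictions):
    """Average per-model prediction vectors with equal weights (all = 1)."""
    return np.stack(predictions).mean(axis=0)

# toy stand-in: 6 "models", each emitting constant predictions for 4 rows
preds = [np.full(4, 0.1 * (i + 1)) for i in range(6)]
final = blend(preds)   # (#1 + #2 + #3 + #4 + #5 + #6) / 6
```

Plain averaging works well when the base models are similarly strong but make different errors, as with one LightGBM and five differently-seeded networks.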

Nonlinear stacking

Bootstrap Method: a statistic of a statistic — estimate a statistic by recomputing it over random resamples.

Example: a sample of 100 values (x)

goal: get an estimate of the mean of the sample.

old way: calculate the mean directly from the sample:

`mean(x) = 1/100 * sum(x)`

bootstrap:

- define the sample size of each sub-sample, e.g. 1000
- define the number of sub-samples, e.g. 3

i.e., 3 sub-samples with 1000 samples each

- Create random sub-samples of the dataset with replacement.
- Calculate the mean of each sub-sample.
- Calculate the average of all the collected means.
- Use that as the estimated mean for the data.
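The steps above can be sketched in plain Python; the sample data and the sub-sample sizes are just the illustrative numbers from the notes:

```python
import random
import statistics

def bootstrap_mean(x, n_subsamples=3, subsample_size=1000, seed=0):
    """Bootstrap estimate of the mean: resample with replacement,
    take each sub-sample's mean, then average the collected means."""
    rng = random.Random(seed)
    sub_means = []
    for _ in range(n_subsamples):
        # draw a sub-sample of the chosen size, with replacement
        sub = [rng.choice(x) for _ in range(subsample_size)]
        sub_means.append(statistics.mean(sub))
    return statistics.mean(sub_means)

data = list(range(100))            # a sample of 100 values
direct = statistics.mean(data)     # the "old way": 49.5
boot = bootstrap_mean(data)        # bootstrap estimate, close to 49.5
```

Note that sampling with replacement is what lets a sub-sample (1000) be larger than the original sample (100).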

Bootstrap Aggregation (Bagging): combine the predictions from multiple machine learning models to make more accurate predictions than any individual model.

Bagging reduces the variance of algorithms that have high variance. Decision trees, such as **classification and regression trees (CART)**, are a high-variance algorithm; bagging lowers the variance at the cost of a small increase in bias.

Example with CART: a sample dataset of 1000 instances (x), using the CART algorithm. Bagging of the CART algorithm would work as follows.

- Create many (e.g. 100) random sub-samples of our dataset with replacement.
- Train a CART model on each sample.
- Given a new dataset, average the predictions from all the models.
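A minimal sketch of the bagging loop above, using a one-split regression stump as a stand-in for a full CART tree (the stump, the toy data, and the model count are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

class Stump:
    """One-split regression tree: a tiny stand-in for a full CART model."""

    def fit(self, x, y):
        # choose the threshold on x that minimises total squared error
        best_err, best_split = np.inf, None
        for t in np.unique(x)[:-1]:
            left, right = y[x <= t], y[x > t]
            err = left.var() * len(left) + right.var() * len(right)
            if err < best_err:
                best_err, best_split = err, (t, left.mean(), right.mean())
        self.t, self.lo, self.hi = best_split
        return self

    def predict(self, x):
        return np.where(x <= self.t, self.lo, self.hi)

def bagged_predict(x_train, y_train, x_new, n_models=100):
    """Bagging: fit each model on a bootstrap sample, average predictions."""
    preds = []
    for _ in range(n_models):
        # bootstrap sub-sample: draw len(x_train) indices with replacement
        idx = rng.integers(0, len(x_train), size=len(x_train))
        preds.append(Stump().fit(x_train[idx], y_train[idx]).predict(x_new))
    return np.mean(preds, axis=0)
```

With a step-function dataset, each individual stump picks a slightly different split depending on its bootstrap sample; averaging the 100 stumps smooths that variance out, which is the whole point of bagging.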

### How to share the notes

I will post the notes on my blog.

- easy search <= search bar
- comment section for each note
- embed YouTube video
- archive page