SWU-DS-Kaggle/B-DontOverFit

B-DontOverFit

수민 (Sumin)

What is SHAP?

  • A method for explaining the predictions of black-box models. Widely used models such as DNNs, logistic regression, and gradient boosting are treated here as black boxes, and methods like SHAP are used to interpret their results and explain why a given prediction came out.

Origin?

  • It is built on Shapley values from game theory. A Shapley value is a number that indicates how much each contributor contributed. It is computed by comparing the outcome when a contributor cooperates with the outcome when it does not, averaged over the possible coalitions.
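To make the "cooperates versus does not cooperate" idea concrete, here is a minimal sketch that computes exact Shapley values for a tiny cooperative game using only the standard library. The two players `a` and `b` and the `payoffs` table are hypothetical; the `shap` library itself uses approximations and model-specific algorithms, which this does not reproduce.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values for a cooperative game.

    players: list of hashable player names.
    value:   function mapping a frozenset of players to a payoff.
    """
    n = len(players)
    result = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            # Weight given to coalitions of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                # Marginal contribution: payoff with p minus payoff without p.
                total += weight * (value(s | {p}) - value(s))
        result[p] = total
    return result


# Hypothetical toy game: the payoff of every coalition of two "features".
payoffs = {
    frozenset(): 0.0,
    frozenset({"a"}): 10.0,
    frozenset({"b"}): 20.0,
    frozenset({"a", "b"}): 50.0,
}

phi = shapley_values(["a", "b"], lambda s: payoffs[s])
print(phi)  # {'a': 20.0, 'b': 30.0}
```

The two values add up to the payoff of the full coalition (50), which is the property that lets SHAP split one prediction across the features.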

Problems?

  • Its attributions are vulnerable to high correlation between variables

Reference) https://shap-lrjball.readthedocs.io/en/latest/index.html

혜빈 (Hyebin)

eli5: shows feature importance and, for decision trees and tree-based ensembles, how much each feature contributed to the final prediction => this helps drop features that have no influence on the result.

Black box: an algorithm where you put something in and a result comes out, but you cannot tell what happens inside <-> eli5 shows which features mattered, so you can see why a given result came out.

민지 (Minji)

"Cross-validation"

cross validation : an evaluation method that builds the training set and test set several times, runs fit-and-predict on each pair, and averages the accuracy. It addresses the low reliability of hold-out evaluation when there are few samples -> cross-validation gives a more trustworthy check

<Types>

  1. k-fold cross validation
  2. shuffle split cross validation
  3. leave-one-out cross validation
  4. leave-p-out cross validation


k fold

With k=5, the train set is randomly split into 5 pieces: 4 are used for training and 1 for cross-validation

k fold cross validation

With k=5, 5 sets are made: 4 sets become the training data and the 5th set is held out for validation

Of the k folds, k-1 -> train set and 1 -> test set. Cross-validation is repeated k times, so each fold serves as the test set once. With k=5 there are 5 accuracy scores -> take their mean
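The loop described above can be sketched with the standard library alone. This is an illustration, not scikit-learn's `KFold`: the helper names `k_fold_indices` and `majority_class_accuracy`, the toy `labels`, and the majority-class stand-in "model" are all assumptions made for the example.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # random assignment to folds
    folds = [indices[i::k] for i in range(k)]  # k nearly equal pieces
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, val_idx


def majority_class_accuracy(train_labels, val_labels):
    """Stand-in "model": always predict the most common training label."""
    prediction = max(set(train_labels), key=train_labels.count)
    return sum(y == prediction for y in val_labels) / len(val_labels)


# Hypothetical labels: 20 samples, 12 of class 1 and 8 of class 0.
labels = [1] * 12 + [0] * 8

scores = []
for train_idx, val_idx in k_fold_indices(len(labels), k=5):
    train_labels = [labels[i] for i in train_idx]
    val_labels = [labels[i] for i in val_idx]
    scores.append(majority_class_accuracy(train_labels, val_labels))

# k=5 gives 5 accuracy scores; the reported result is their mean.
mean_accuracy = sum(scores) / len(scores)
print(scores, mean_accuracy)
```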

The parts that KFold splits off can be skewed, so, as in the figure, samples are gathered from across the data

In general, KFold is used for regression and StratifiedKFold for classification


Stratified K Fold

Stratified k-fold cross-validation : a variant of k-fold that returns stratified folds -> if the data is skewed, plain k-fold cross-validation may not work properly -> so the train set and validation set are gathered from across the data

Each set contains roughly the same proportion of samples from each class as the full set
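A small sketch of how stratification keeps the class ratio in every fold: each class is shuffled on its own and then dealt out across the folds. This is a hand-rolled illustration rather than scikit-learn's `StratifiedKFold`; the function name and the 80/20 toy `labels` are assumptions.

```python
import random
from collections import Counter, defaultdict


def stratified_k_fold_indices(labels, k, seed=0):
    """Yield (train_idx, val_idx) pairs whose class ratios track the full set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for label_indices in by_class.values():
        rng.shuffle(label_indices)
        # Deal each class out across the folds like cards.
        for position, idx in enumerate(label_indices):
            folds[position % k].append(idx)

    for i in range(k):
        val_idx = sorted(folds[i])
        train_idx = sorted(j for m, fold in enumerate(folds) if m != i for j in fold)
        yield train_idx, val_idx


# Hypothetical imbalanced labels: 80% class 0, 20% class 1.
labels = [0] * 40 + [1] * 10

fold_counts = []
for train_idx, val_idx in stratified_k_fold_indices(labels, k=5):
    fold_counts.append(Counter(labels[i] for i in val_idx))

# Every validation fold holds 8 samples of class 0 and 2 of class 1.
print(fold_counts)
```

With class sizes that do not divide evenly by k, this simple dealing scheme gives the leftover samples to the first folds, so the ratios are only approximately equal.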


Repeated Stratified K Fold

: Similar to StratifiedKFold, but the whole procedure is repeated n times with a different random split each time. Roughly "cross-validation inside cross-validation", although the repetitions are independent runs rather than nested ones
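As a rough illustration, scikit-learn's `RepeatedStratifiedKFold` can be passed to `cross_val_score`; the number of scores is `n_splits * n_repeats`. The synthetic `X` and `y` and the choice of `LogisticRegression` are assumptions for the example, not the data or settings used in this repository.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Synthetic stand-in data: the label depends mostly on the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=100) > 0).astype(int)

# 5 stratified folds, repeated 3 times with a different shuffle each time.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)
scores = cross_val_score(LogisticRegression(), X, y, cv=cv, scoring="accuracy")

# 5 folds x 3 repeats = 15 scores, summarised by their mean.
print(len(scores), scores.mean())
```

Averaging over repeats reduces the dependence of the estimate on one particular random split, which matters when the training set is small.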

성윤 (Seongyun)

keywords : default parameters -> select best features -> run grid search -> train best model -> feature selection for each model

models used : Logistic Regression, AdaBoost Classifier, SGD Classifier, SVC
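The keyword chain above might look roughly like the following for one of the listed models. This is a sketch under assumptions: the synthetic data from `make_classification`, the `SelectKBest` selector, the parameter grid, and the `roc_auc` scoring are all illustrative choices, not the settings used in this repository.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline

# Synthetic stand-in data: few rows, many mostly uninformative features.
X, y = make_classification(
    n_samples=250, n_features=50, n_informative=5, random_state=0
)

# Feature selection and model in one pipeline, so selection is refit per fold.
pipeline = Pipeline([
    ("select", SelectKBest(score_func=f_classif)),
    ("model", LogisticRegression(max_iter=1000)),
])

# Hypothetical grid: number of kept features and regularisation strength.
param_grid = {
    "select__k": [5, 10, 20],
    "model__C": [0.01, 0.1, 1.0],
}

search = GridSearchCV(
    pipeline,
    param_grid,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="roc_auc",
)
search.fit(X, y)

# The best pipeline is refit on all of the data with the best parameters.
best_model = search.best_estimator_
selected = best_model.named_steps["select"].get_support(indices=True)
print(search.best_params_, round(search.best_score_, 3), len(selected))
```

Keeping the selector inside the pipeline means each cross-validation fold chooses its features from its own training part only, which avoids leaking validation data into the feature selection step. The same pattern applies to `AdaBoostClassifier`, `SGDClassifier`, or `SVC` by swapping the `model` step and its grid.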
