
Kaggle Bosch competition retrospective

17,723 views
I took part in the Kaggle Bosch competition and finished 15th out of 1,373.
This is a summary of what I did there.

Besides putting a lot of effort into feature engineering, I tried out various ideas using xgboost's features (although these did not end up contributing to accuracy in this competition).

Published in: Data & Analytics


  1. Bosch Production Line Performance (2017/1/20, hskksk)
  2. Agenda [bullet text not extracted; includes "Result"]
  3. bosch production line performance [slide text not extracted]
  4. [slide text not extracted]
  5. In this competition, Bosch is challenging Kagglers to predict internal failures using thousands of measurements and tests made for each component along the assembly line. This would enable Bosch to bring quality products at lower costs to the end user.
  6. Schedule: start 2016/8/17, end 2016/11/12; 2016/9 [bullet label lost]
  7. Submissions are evaluated on the Matthews correlation coefficient (MCC) between the predicted and the observed response. The MCC is given by MCC = (TP * TN - FP * FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)), where TP is the number of true positives, TN the number of true negatives, FP the number of false positives, and FN the number of false negatives.
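The MCC defined on slide 7 can be computed directly from the four confusion-matrix counts; a minimal sketch (the function name `mcc` is mine, not from the slides):

```python
from math import sqrt

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    num = tp * tn - fp * fn
    den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Conventionally defined as 0 when any marginal count is zero.
    return num / den if den else 0.0
```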
  8. Feature naming: Lx_Sy_Dz / Lx_Sy_F{z-1} (line x, station y; the date column Dz carries the timestamp for numeric feature F{z-1})
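Under the Lx_Sy_Dz / Lx_Sy_F{z-1} naming convention, date columns can be paired with their numeric features mechanically; a small sketch (the helper name and column names are illustrative):

```python
import re

def pair_date_to_feature(columns):
    """Map each date column Lx_Sy_Dz to its numeric feature Lx_Sy_F{z-1}."""
    feats = set(columns)
    pairs = {}
    for col in columns:
        m = re.fullmatch(r"(L\d+_S\d+)_D(\d+)", col)
        if m:
            feat = f"{m.group(1)}_F{int(m.group(2)) - 1}"
            if feat in feats:
                pairs[col] = feat
    return pairs
```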
  9. [slide text not extracted]
  10. Label 0: 1,176,868 (99.4%); label 1: 6,879 (0.6%). Extremely imbalanced data.
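With a class ratio this extreme, one common xgboost-style mitigation (not necessarily what the team did) is weighting positives by the negative/positive ratio, the usual heuristic for the `scale_pos_weight` parameter; a sketch using the counts from slide 10:

```python
# Class counts from slide 10.
n_neg, n_pos = 1_176_868, 6_879

# Common heuristic for xgboost's scale_pos_weight: negatives per positive.
scale_pos_weight = n_neg / n_pos
```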
  11. Result
  12. Team: g_votte, tkm, hskksk
  13. LB (hskksk only)
  14. LB (team)
  15. Public Leaderboard
  16. Private Leaderboard
  17. Top Ten!
  18. [slide text not extracted]
  19. [bullets mostly lost; mention the LB and CV]
  20. [slide text not extracted]
  21. Environment: GCP with R/Python; Rmarkdown; xgboost; github [one bullet about GCP partly lost]
  22. CV (cross-validation)
  23. LB: about 30 submits [remaining bullets not extracted]
  24. 1. Cross-Validation folds; 2. [lost]; 3. MCC [details not extracted]
  25. Topic 1: Cross-Validation folds; lesson from Predicting Red Hat Business Value
  26. Red Hat: [bullets mostly lost; about CV and fold design]
  27. [bullets mostly lost; about mapping IDs]
  28. Topic 2: [bullets not extracted]
  29. qqplot: [bullets mostly lost]; Stations 32 and 33 OK
  30. [slide text not extracted]
  31. Topic 3: MCC; [bullets mostly lost; mention Gaussian Process and the LB]
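Because MCC is computed on hard 0/1 labels, turning predicted probabilities into a submission requires choosing a decision threshold; a brute-force sketch of picking the threshold that maximizes MCC on validation data (my own illustration, not the authors' code):

```python
from math import sqrt

def mcc_from_counts(tp, tn, fp, fn):
    den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / den if den else 0.0

def best_threshold(y_true, y_prob):
    """Scan candidate thresholds; return (threshold, mcc) maximizing MCC."""
    best = (0.5, -1.0)
    for t in sorted(set(y_prob)):
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= t)
        fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= t)
        fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < t)
        tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < t)
        score = mcc_from_counts(tp, tn, fp, fn)
        if score > best[1]:
            best = (t, score)
    return best
```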
  32. Feature engineering
  33. [feature counts: 25 ..., 3154 ...; labels not extracted]
  34. Feature groups: 1. ID features (the forum "magic feature"); 2. [lost]; 3. [lost]
  35. [bullets mostly lost; about IDs]
  36. [bullets mostly lost; about IDs]
  37. Station 38: [bullets partly lost; involve IDs and whether Station 38 values are NA]
  38. ID [slide text not extracted]
  39. bitmap (17,017) [remaining bullets not extracted]
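One plausible reading of the "bitmap" feature on slide 39 (my interpretation, not confirmed by the surviving text) is encoding which stations a part visited as a bitmask over the non-missing columns; a sketch with hypothetical inputs:

```python
def station_bitmap(row, station_of_column):
    """Encode stations with at least one non-missing value as an int bitmask.

    row: dict column -> value (None means missing)
    station_of_column: dict column -> station index
    """
    mask = 0
    for col, val in row.items():
        if val is not None:
            mask |= 1 << station_of_column[col]
    return mask
```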
  40. [slide text not extracted]
  41. [slide text not extracted]
  42. [slide text not extracted]
  43. [slide text not extracted]
  44. [slide text not extracted]
  45. Modeling: Stacking; xgboost; objective [bullet text mostly lost]
  46. Stacking: 2-level stacking; stacking of 8 xgboost models; narrow-deep stacking; deep learning; [a bullet about layers partly lost]
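Two-level stacking as on slide 46 (8 xgboost models feeding a second level) generally uses out-of-fold predictions, so the level-2 model never sees level-1 predictions made on their own training data. A toy sketch of the mechanics, with a trivial base learner standing in for xgboost (all names here are illustrative):

```python
def mean_of_label(train_y):
    """Toy 'model': predicts the training-set positive rate (xgboost stand-in)."""
    rate = sum(train_y) / len(train_y)
    return lambda x: rate

def oof_predictions(X, y, fit, n_folds=3):
    """Level-1 out-of-fold predictions to be used as level-2 input features."""
    preds = [None] * len(X)
    for k in range(n_folds):
        train_idx = [i for i in range(len(X)) if i % n_folds != k]
        valid_idx = [i for i in range(len(X)) if i % n_folds == k]
        model = fit([y[i] for i in train_idx])  # fit only on out-of-fold rows
        for i in valid_idx:
            preds[i] = model(X[i])
    return preds
```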
  47. xgboost tricks: base_margin; dart (Dropouts meet Multiple Additive Regression Trees)
  48. base_margin: start xgboost from a previous model's predictions.
      learn <- xgb.DMatrix(...)
      base_margin <- logit(p(y|x))  # log-odds of the prior model's predictions
      setinfo(learn, 'base_margin', base_margin)
      m <- xgb.train(data = learn, ...)
  49. base_margin: [bullets lost]; dart
  50. DART: Dropouts meet Multiple Additive Regression Trees [1]; applies dropout to the tree ensemble [details lost; a 0.5 appears, label lost]. [1] Rashmi, K. V., & Gilad-Bachrach, R. (2015). DART: Dropouts meet Multiple Additive Regression Trees. AISTATS, PMLR 38.
  51. xgboost hybrids: GBDT-feature + Factorization Machines; GBDT-feature derived from the GBDT trees; One-hot Encoding → Factorization Machines; libffm (OpenMP); only libFM [some bullet text lost]
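The standard reading of the GBDT-feature idea on slide 51 is that each sample's per-tree leaf index is treated as a categorical feature, one-hot encoded, and fed to a Factorization Machine. A sketch of just the encoding step (leaf indices here would come from something like a GBDT's prediction-leaf output; the function is my illustration):

```python
def leaf_one_hot(leaf_idx, n_leaves):
    """One-hot encode per-tree leaf indices.

    leaf_idx: list of leaf indices, one entry per tree.
    n_leaves: maximum number of leaves per tree.
    Returns a flat 0/1 list of length n_trees * n_leaves (FM input).
    """
    vec = [0] * (len(leaf_idx) * n_leaves)
    for tree, leaf in enumerate(leaf_idx):
        vec[tree * n_leaves + leaf] = 1
    return vec
```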
  52. Custom objective (vs. the default binary:logistic) [details not extracted]
  53. smoothed-MCC: a smoothed version of MCC as a custom xgboost objective; the gradient and hessian (diagonal only) must be supplied
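A smoothed MCC can be built by replacing the hard confusion-matrix counts with sums of predicted probabilities, which makes the metric differentiable. The sketch below is my own construction, not the authors' derivation: it computes the soft counts, and uses a numerical gradient as a stand-in for the analytic gradient/hessian the slide says must be passed to xgboost:

```python
from math import sqrt

def soft_mcc(y_true, y_prob):
    """MCC with counts replaced by sums of probabilities (differentiable)."""
    tp = sum(p for y, p in zip(y_true, y_prob) if y == 1)
    fn = sum(1 - p for y, p in zip(y_true, y_prob) if y == 1)
    fp = sum(p for y, p in zip(y_true, y_prob) if y == 0)
    tn = sum(1 - p for y, p in zip(y_true, y_prob) if y == 0)
    den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / den if den else 0.0

def grad_soft_mcc(y_true, y_prob, eps=1e-6):
    """Numerical gradient wrt each probability (stand-in for the analytic one)."""
    grads = []
    for i in range(len(y_prob)):
        bumped = list(y_prob)
        bumped[i] += eps
        grads.append((soft_mcc(y_true, bumped) - soft_mcc(y_true, y_prob)) / eps)
    return grads
```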
  54. [slide text not extracted]
  55. [bullets not extracted]
  56. [slide text not extracted]
  57. Division of work: hskksk on Line 2, tkm on Line 0
  58. Teamwork: [bullets mostly lost; mention 3-fold ..., MCC / LB feedback, tkm and g_votte]
  59. Public vs. Private: [bullets mostly lost; about tkm's submission and Public score vs. Private]
  60. [bullets mostly lost; mention submits and MCC]
  61. Kaggle lessons: [bullets mostly lost; about CV vs. the LB and fold design]
  62. Kaggle lessons: [partly lost] Accuracy and the confusion matrix; MCC; "think more, try less" [2]. [2] Owen Zhang (Kaggle).
  63. Enjoy Kaggle!
