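National Natural Science Foundation of China (31572375); Fundamental Research Funds for the Central Universities (2662016PY006); Fundamental Research Funds for the Central Universities (2262018JC033); Huazhong Agricultural University Dabeinong Young Scholars Promotion Program (2017DBN019)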
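To help pig producers better predict sow litter size traits, cull sows with poor fecundity as early as possible, and raise the reproductive potential of the sow herd, production data recording the total number of piglets born, number born alive, number of healthy piglets, number of piglets at 5 days of age, and number of piglets weighing more than 1 kg were processed and summarized with descriptive statistics. The Boruta package in R was used to select important features affecting litter size traits, such as breed, parity, and mating season. Regression analysis of the litter size traits was then carried out with a traditional regression method (LR) and different machine learning methods, namely decision tree (DT), K-nearest neighbor (KNN), and support vector machine (SVM), and the machine learning methods were finally compared with the traditional regression method. The results showed that the R² of the different regression methods for total number born, number born alive, number of healthy piglets, number of piglets at 5 days of age, and number of piglets weighing more than 1 kg all reached 0.71 or higher (0.71~0.88), which reflects the correctness of the feature selection. In predicting all five traits, the SVM model was significantly better than the other machine learning models (P<0.05) and also better than the traditional regression method, and among these models the SVM model predicting the number of piglets weighing more than 1 kg was the best. Machine learning methods may therefore become a new approach for pig producers to select highly fecund sows early in future pig production.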
Litter size is an important indicator of sow fertility and plays an important role in determining the total income of pig farms in China. Accurate prediction of litter size traits early in an animal's life allows pig producers to adjust their management practices, cull poor sows early, and improve the reproductive ability of core sows. However, many factors not only influence a sow's litter size traits but also influence each other. Traditional prediction methods may not be powerful enough to capture such complex interactions while avoiding overfitting. In this case, learning algorithms that learn from current data to predict an animal's future performance offer promise. In this study, the sows' production data, including total number of piglets born (TNB), number born alive (NBA), number of healthy piglets (NHP), number of piglets at 5 days of age (N5D), and number of piglets weighing above 1 kg (NPWA1), were first processed and described statistically. Then, the R package Boruta was used to screen out important features affecting the litter size traits of sows, such as breed, parity, mating season, delivery season, gestation period, birth interval, and birth litter weight. Finally, regression analysis was performed with a traditional linear regression method (LR) and three machine learning methods: decision tree (DT), K-nearest neighbor (KNN), and support vector machine (SVM). The model evaluation indexes, R² and mean squared error (MSE), were obtained by ten-fold cross-validation. The modeling methods were assessed with these indexes, and the best model was screened with scatter plots using part of the original data. The results showed that the R² of all regression methods for TNB, NBA, NHP, N5D, and NPWA1 was 0.71 or higher (0.71-0.88), which supports the feature selection.
The SVM model was not only significantly better than the other machine learning methods (P<0.05) but also better than the traditional regression method in predicting TNB, NBA, NHP, N5D, and NPWA1. Of all the models, the SVM model for NPWA1 was the best. Therefore, machine learning methods may become a new approach for pig producers to select high-fecundity sows early.
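The ten-fold cross-validation used to obtain R² and MSE can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' R workflow. The data are synthetic, the single predictor `x` (standing in for something like birth litter weight) is an assumption, and all function names (`k_fold_indices`, `cross_validate`, `fit_linear`, `fit_knn`) are hypothetical. Only a univariate linear model and a KNN regressor are compared here; DT and SVM are left out for brevity.

```python
import random
from statistics import mean


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_linear(x, y):
    """Univariate ordinary least squares; returns a predict function."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda q: intercept + slope * q


def fit_knn(x, y, k=5):
    """KNN regression: average the targets of the k nearest training points."""
    pairs = list(zip(x, y))

    def predict(q):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - q))[:k]
        return mean(t for _, t in nearest)

    return predict


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(fit, x, y, k=10, seed=0):
    """Return (mean R², mean MSE) over k held-out folds."""
    r2_scores, mse_scores = [], []
    for test_idx in k_fold_indices(len(x), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(x)) if i not in held_out]
        predict = fit([x[i] for i in train_idx], [y[i] for i in train_idx])
        y_true = [y[i] for i in test_idx]
        y_pred = [predict(x[i]) for i in test_idx]
        r2_scores.append(r2(y_true, y_pred))
        mse_scores.append(mse(y_true, y_pred))
    return mean(r2_scores), mean(mse_scores)


# Synthetic stand-in data (assumption): one predictor, one litter size trait.
rng = random.Random(42)
x = [rng.uniform(8.0, 22.0) for _ in range(200)]
y = [0.6 * a + 2.0 + rng.gauss(0.0, 1.0) for a in x]

if __name__ == "__main__":
    for name, fit in [("LR", fit_linear), ("KNN", fit_knn)]:
        mean_r2, mean_mse = cross_validate(fit, x, y, k=10)
        print(f"{name}: R2={mean_r2:.3f}  MSE={mean_mse:.3f}")
```

Each model is refitted on nine folds and scored on the tenth, so the reported R² and MSE reflect predictions on records the model did not see during fitting. Using the same fold split (the same `seed`) for every model keeps the comparison between methods paired.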