Deep learning models celebrate great success in statistical modeling because they often provide superior predictive power over classical regression models. This success is based on the fact that deep learning models perform representation learning of features, which means that they bring the features into the right structure to extract maximal information for the prediction task at hand. This feature engineering is done internally by the deep learning model in a non-transparent way. For this reason, deep learning solutions are often criticized as being neither explainable nor interpretable, in particular because this process of representation learning is performed in high-dimensional spaces, analyzing bits and pieces of the feature information.

Recent research has focused on interpreting machine learning predictions in retrospect, see, e.g., Friedman's partial dependence plot (PDP) (Friedman 2001), the accumulated local effects (ALE) method of Apley & Zhu (2020), the locally interpretable model-agnostic explanation (LIME) introduced by Ribeiro et al. (2016), the SHapley Additive exPlanations (SHAP) of Lundberg & Lee (2017), or the marginal attribution by conditioning on quantiles (MACQ) method proposed by Merz et al. (2022). PDP, ALE, LIME and SHAP can be used for any machine learning method, such as random forests, boosting or neural networks, whereas MACQ requires differentiability of the regression function, which is the case for neural networks with differentiable activation functions; for a review of further gradient-based methods, we refer to Merz et al. (2022).

We follow a different approach here: we propose a new network architecture whose internal structure directly allows for interpreting and explaining. Moreover, this internal structure also allows for variable selection of tabular feature data and for the extraction of interactions between feature components. The starting point of our proposal is the framework of generalized linear models (GLMs) introduced by Nelder & Wedderburn (1972) and McCullagh & Nelder (1983). GLMs are characterized by the choice of a link function that maps the regression function to a linear predictor, thus leading to a linear functional form that directly describes the influence of each predictor variable on the response variable. Of course, this (generalized) linear form is both transparent and interpretable. To some extent, our architecture preserves this linear structure of GLMs, but we make the coefficients of the linear predictors feature dependent. Such an approach follows a similar strategy as the ResNet proposal of He et al. (2016), which considers a linear term and then builds the network around this linear term. The LassoNet of Lemhadri et al. (2021) follows a similar philosophy by performing Lasso regularization on network features. Both proposals have in common that they use a so-called skip connection in the network architecture that gives a linear model part around which the non-linear network model is built. Our proposal uses such a skip connection, too: it provides the linear model part, and we weight these linear terms with potentially non-linear weights.
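To make the contrast concrete, the following is a minimal sketch of the two functional forms just discussed; the notation (link g, regression function μ, features x with q components) is assumed here for illustration and is not quoted from a displayed formula in this section. A GLM keeps the coefficients fixed, whereas the feature-dependent variant lets the network supply them:

```latex
% Classical GLM: fixed coefficients \beta_j
g\bigl(\mu(\boldsymbol{x})\bigr) \;=\; \beta_0 + \sum_{j=1}^{q} \beta_j\, x_j

% Feature-dependent variant around a skip connection: the network
% outputs coefficient functions \beta_j(\boldsymbol{x})
g\bigl(\mu(\boldsymbol{x})\bigr) \;=\; \beta_0 + \sum_{j=1}^{q} \beta_j(\boldsymbol{x})\, x_j
```

In the second form, setting all β_j(x) constant in x recovers the GLM exactly, which is what makes the skip connection the transparent, interpretable backbone of the architecture.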
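As an illustration of this skip-connection design, here is a minimal sketch assuming TensorFlow/Keras; the layer sizes, tanh activations, and the exponential (log-link) output are illustrative assumptions, not the paper's exact specification:

```python
import tensorflow as tf
from tensorflow import keras

def build_skip_connection_net(n_features: int, hidden=(64, 32)):
    """Sketch: an MLP produces feature-dependent coefficients beta(x);
    a skip connection forms the linear predictor <beta(x), x>, so a
    plain GLM is recovered whenever beta(x) is constant in x."""
    inputs = keras.Input(shape=(n_features,), name="features")

    # non-linear network part producing the coefficients beta(x)
    z = inputs
    for units in hidden:
        z = keras.layers.Dense(units, activation="tanh")(z)
    beta = keras.layers.Dense(n_features, name="beta")(z)

    # skip connection: inner product of beta(x) with the raw inputs
    # gives the (local) linear model part
    linear_part = keras.layers.Dot(axes=1)([beta, inputs])

    # inverse log-link with an intercept (illustrative choice of link)
    out = keras.layers.Dense(1, activation="exponential",
                             name="response")(linear_part)
    return keras.Model(inputs=inputs, outputs=out)

model = build_skip_connection_net(n_features=8)
model.compile(optimizer="adam", loss="poisson")  # e.g. for count responses
```

After fitting, the learned coefficients beta(x) can be read out per feature: roughly constant coefficients indicate GLM-like behavior, coefficients close to zero suggest a feature that could be dropped (variable selection), and coefficients that vary with other features point to interactions between feature components.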