Machine Learning

Nobody has solved this puzzle yet.

[Crossword grid not reproduced; clue numbers run from 1 to 33, matching the order of the clues below.]

Using supervised learning algorithms to predict qualitative outcomes
Error introduced by using a model that is too simple
As it increases, bias will decrease and variance will increase
k-fold CV where k = n (acronym)
Methods that repeatedly draw samples from a data set, used for model assessment and selection
In the validation set approach, we divide the original data set into this many parts
We learned about forward stepwise, backward stepwise, and ___ subset selection
We cannot use R² for model selection with big data, because as we add more predictors, R² never ____
Dimension reduction technique that forms linear combinations of variables (acronym)
Regularization method that shrinks coefficients towards, but not exactly to, zero
Tuning parameter used in the regularization methods we learned about
Ridge and Lasso regression improve over OLS by adding a ___ term to the RSS
Methods where we use non-linear functions f, instead of coefficients beta, for every predictor (acronym, plural)
Method by which we choose hyperparameters such as number of knots or degree of polynomials in [13] (acronym)
Due to their additivity, [13] might miss ___s in the data
The first 'N' in the KNN classifier stands for ___
In KNN classification, as K increases, flexibility ___
___ boundaries describe areas in which an observation would be classified to a certain class
LDA and QDA are based on the ___ theorem
The probability of an event happening alone (e.g. P(X = x)) is the ___ probability
As opposed to LDA, QDA assumes that each ___ has its own variance-covariance matrix
As opposed to LDA/QDA, the Naive Bayes Classifier assumes that predictors are not ___
As opposed to LDA/QDA, the Naive Bayes Classifier does NOT assume that observations are ___ distributed within each class
The term P(X1 = x1 | X2 = x2) denotes a ___ probability
The distance between the separating hyperplane and a support vector
In the support vector classifier, the ___ C bounds the total amount of margin violation, and thus the number of allowed misclassifications
SVMs improve their performance by applying a ___-function to the data
The final regions of regression trees are called ___ nodes
Method using a penalty term to build a smaller tree to avoid overfitting
A criterion for evaluating splits in classification trees is the ___-index
Tree-based method using bootstrap aggregation
Random forests improve over [31] by ___ the trees
Tree-based ensemble method that does NOT use bootstrapping