R语言中实现非参数单指标模型估计的函数mgcv::gam

cpp 复制代码
gam(formula,family=gaussian(),data=list(),weights=NULL,subset=NULL,
    na.action,offset=NULL,method="GCV.Cp",
    optimizer=c("outer","newton"),control=list(),scale=0,
    select=FALSE,knots=NULL,sp=NULL,min.sp=NULL,H=NULL,gamma=1,
    fit=TRUE,paraPen=NULL,G=NULL,in.out,drop.unused.levels=TRUE,
    drop.intercept=NULL,nei=NULL,discrete=FALSE,...)

formula

A GAM formula, or a list of formulae (see formula.gam and also gam.models). These are exactly like the formula for a GLM except that smooth terms, s, te, ti and t2, can be added to the right hand side to specify that the linear predictor depends on smooth functions of predictors (or linear functionals of these).

family

This is a family object specifying the distribution and link to use in fitting etc (see glm and family). See family.mgcv for a full list of what is available, which goes well beyond exponential family. Note that quasi families actually result in the use of extended quasi-likelihood if method is set to a RE/ML method (McCullagh and Nelder, 1989, 9.6).

data

A data frame or list containing the model response variable and covariates required by the formula. By default the variables are taken from environment(formula): typically the environment from which gam is called.

weights

prior weights on the contribution of the data to the log likelihood. Note that a weight of 2, for example, is equivalent to having made exactly the same observation twice. If you want to re-weight the contributions of each datum without changing the overall magnitude of the log likelihood, then you should normalize the weights (e.g. weights <- weights/mean(weights)).

subset

an optional vector specifying a subset of observations to be used in the fitting process.

na.action

a function which indicates what should happen when the data contain 'NA's. The default is set by the 'na.action' setting of 'options', and is 'na.fail' if that is unset. The "factory-fresh" default is 'na.omit'.

offset

Can be used to supply a model offset for use in fitting. Note that this offset will always be completely ignored when predicting, unlike an offset included in formula (this used to conform to the behaviour of lm and glm).

control

A list of fit control parameters to replace defaults returned by gam.control. Values not set assume default values.

method

The smoothing parameter estimation method. "GCV.Cp" to use GCV for unknown scale parameter and Mallows' Cp/UBRE/AIC for known scale. "GACV.Cp" is equivalent, but using GACV in place of GCV. "NCV" for neighbourhood cross-validation using the neighbourhood structure speficied be nei. "REML" for REML estimation, including of unknown scale, "P-REML" for REML estimation, but using a Pearson estimate of the scale. "ML" and "P-ML" are similar, but using maximum likelihood in place of REML. Beyond the exponential family "REML" is the default, and the only other options are "ML" or "NCV".

optimizer

An array specifying the numerical optimization method to use to optimize the smoothing parameter estimation criterion (given by method). "perf" (deprecated) for performance iteration. "outer" for the more stable direct approach. "outer" can use several alternative optimizers, specified in the second element of optimizer: "newton" (default), "bfgs", "optim", "nlm" and "nlm.fd" (the latter is based entirely on finite differenced derivatives and is very slow). "efs" for the extended Fellner Schall method of Wood and Fasiolo (2017).

scale

If this is positive then it is taken as the known scale parameter. Negative signals that the scale parameter is unknown. 0 signals that the scale parameter is 1 for Poisson and binomial and unknown otherwise. Note that (RE)ML methods can only work with scale parameter 1 for the Poisson and binomial cases.

select

If this is TRUE then gam can add an extra penalty to each term so that it can be penalized to zero. This means that the smoothing parameter estimation that is part of fitting can completely remove terms from the model. If the corresponding smoothing parameter is estimated as zero then the extra penalty has no effect. Use gamma to increase level of penalization.

knots

this is an optional list containing user specified knot values to be used for basis construction. For most bases the user simply supplies the knots to be used, which must match up with the k value supplied (note that the number of knots is not always just k). See tprs for what happens in the "tp"/"ts" case. Different terms can use different numbers of knots, unless they share a covariate.

sp

A vector of smoothing parameters can be provided here. Smoothing parameters must be supplied in the order that the smooth terms appear in the model formula. Negative elements indicate that the parameter should be estimated, and hence a mixture of fixed and estimated parameters is possible. If smooths share smoothing parameters then length(sp) must correspond to the number of underlying smoothing parameters.

min.sp

Lower bounds can be supplied for the smoothing parameters. Note that if this option is used then the smoothing parameters full.sp, in the returned object, will need to be added to what is supplied here to get the smoothing parameters actually multiplying the penalties. length(min.sp) should always be the same as the total number of penalties (so it may be longer than sp, if smooths share smoothing parameters).

H

A user supplied fixed quadratic penalty on the parameters of the GAM can be supplied, with this as its coefficient matrix. A common use of this term is to add a ridge penalty to the parameters of the GAM in circumstances in which the model is close to un-identifiable on the scale of the linear predictor, but perfectly well defined on the response scale.

gamma

Increase this beyond 1 to produce smoother models. gamma multiplies the effective degrees of freedom in the GCV or UBRE/AIC. coden/gamma can be viewed as an effective sample size in the GCV score, and this also enables it to be used with REML/ML. Ignored with P-RE/ML or the efs optimizer.

fit

If this argument is TRUE then gam sets up the model and fits it, but if it is FALSE then the model is set up and an object G containing what would be required to fit is returned is returned. See argument G.

paraPen

optional list specifying any penalties to be applied to parametric model terms. gam.models explains more.

G

Usually NULL, but may contain the object returned by a previous call to gam with fit=FALSE, in which case all other arguments are ignored except for sp, gamma, in.out, scale, control, method optimizer and fit.

in.out

optional list for initializing outer iteration. If supplied then this must contain two elements: sp should be an array of initialization values for all smoothing parameters (there must be a value for all smoothing parameters, whether fixed or to be estimated, but those for fixed s.p.s are not used); scale is the typical scale of the GCV/UBRE function, for passing to the outer optimizer, or the the initial value of the scale parameter, if this is to be estimated by RE/ML.

drop.unused.levels

by default unused levels are dropped from factors before fitting. For some smooths involving factor variables you might want to turn this off. Only do so if you know what you are doing.

drop.intercept

Set to TRUE to force the model to really not have the a constant in the parametric model part, even with factor variables present. Can be vector when formula is a list.

nei

A list specifying the neighbourhood structure for NCV. k is the vector of indices to be dropped for each neighbourhood and m gives the end of each neighbourhood. So nei k [ ( n e i k[(nei k[(neim[j-1]+1):nei m [ j ] ] g i v e s t h e p o i n t s d r o p p e d f o r t h e n e i g h b o u r h o o d j . i i s t h e v e c t o r o f i n d i c e s o f p o i n t s t o p r e d i c t , w i t h c o r r e s p o n d i n g e n d p o i n t s m i . S o n e i m[j]] gives the points dropped for the neighbourhood j. i is the vector of indices of points to predict, with corresponding endpoints mi. So nei m[j]]givesthepointsdroppedfortheneighbourhoodj.iisthevectorofindicesofpointstopredict,withcorrespondingendpointsmi.Soneii[(nei m i [ j − 1 ] + 1 ) : n e i mi[j-1]+1):nei mi[j−1]+1):neimi[j]] indexes the points to predict for neighbourhood j. If nei==NULL (or k or m are missing) then leave-one-out cross validation is obtained.

discrete

experimental option for setting up models for use with discrete methods employed in bam. Do not modify.

...

further arguments for passing on e.g. to gam.fit (such as mustart).

相关推荐
LabEx2 天前
科研数据可视化核心技术:基于 AI 与 R 语言的热图、火山图及网络图绘制实践指南
人工智能·信息可视化·r语言·r语言绘图·乐备实·labex·科研数据绘图
Jet45052 天前
第100+43步 ChatGPT学习:R语言实现特征选择曲线图
学习·chatgpt·r语言
Chef_Chen2 天前
从0开始学习R语言--Day40--Kruskal-Wallis检验
开发语言·学习·r语言
quant_19863 天前
R语言如何接入实时行情接口
开发语言·经验分享·笔记·python·websocket·金融·r语言
小白学大数据4 天前
R语言爬虫实战:如何爬取分页链接并批量保存
开发语言·爬虫·信息可视化·r语言
开开心心_Every5 天前
便捷的Office批量转PDF工具
开发语言·人工智能·r语言·pdf·c#·音视频·symfony
Chef_Chen7 天前
从0开始学习R语言--Day39--Spearman 秩相关
开发语言·学习·r语言
q567315238 天前
R语言初学者爬虫简单模板
开发语言·爬虫·r语言·iphone
shootero@126.com9 天前
R语言开发记录,二(创建R包)
r语言
shootero@126.com9 天前
R语言开发记录,一
开发语言·r语言