This is a guide to extending R, describing the process of creating R add-on packages, writing R documentation, R's system and foreign language interfaces, and the R API.
The current version of this document is 2.3.1 (2006-06-01).
ISBN 3-900051-11-9
The contributions of Saikat DebRoy (who wrote the first draft of a guide
to using .Call and .External) and of Adrian Trapletti (who
provided information on the C++ interface) are gratefully acknowledged.
Packages provide a mechanism for loading optional code and attached documentation as needed. The R distribution provides several packages.
In the following, we assume that you know the `library()' command,
including its `lib.loc' argument, and we also assume basic
knowledge of the INSTALL utility. Otherwise, please look at R's
help pages
?library
?INSTALL
before reading on.
Once a source package is created, it must be installed by
the command R CMD INSTALL.
See Add-on-packages, for further details.
Other types of extensions are supported as of R 2.1.0: see Package types.
A package consists of a subdirectory containing a file DESCRIPTION and the subdirectories R, data, demo, exec, inst, man, po, src, and tests (some of which can be missing). The package subdirectory may also contain files INDEX, install.R (deprecated), R_PROFILE.R (deprecated), NAMESPACE, configure, cleanup, and COPYING. Other files such as README, NEWS or ChangeLog will be ignored by R, but may be useful to end-users.
The DESCRIPTION, INDEX, install.R and R_PROFILE.R files are described in the sections below. The NAMESPACE file is described in Package name spaces.
The optional files configure and cleanup are (Bourne shell) script files which are executed before and (provided that option --clean was given) after installation on Unix, see Configure and cleanup.
The optional file COPYING contains a copy of the license of the package, e.g. a copy of the GNU General Public License. Whereas you should feel free to include a license file in your source distribution, please do not arrange to install yet another copy of the GNU COPYING or COPYING.LIB files; instead, refer in your own COPYING file to the copies in the R distribution (e.g., in directory share/licenses).
The package subdirectory should be given the same name as the package. Because some file systems (e.g., those on Windows) are not case-sensitive, to maintain portability it is strongly recommended that case distinctions not be used to distinguish different packages. For example, if you have a package named foo, do not also create a package named Foo.
To ensure that file names are valid across file systems and supported
operating system platforms, the ASCII control characters as
well as the characters `"', `*', `:', `/', `<',
`>', `?', `\', and `|' are not allowed in file
names. In addition, files with names `con', `prn',
`aux', `clock$', `nul', `com1' to `com4', and
`lpt1' to `lpt3' after conversion to lower case and stripping
possible “extensions”, are disallowed. Also, file names in the same
directory must not differ only by case (see the previous paragraph).
In addition, the names of `.Rd' files will be used in URLs and so
must be ASCII and not contain %.
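The file name rules above can be checked mechanically. The following is a minimal sketch in Bourne shell; the helper name valid_pkg_filename is hypothetical and not part of R, and the extra `%' restriction for `.Rd' files is not covered.

```shell
# Sketch: succeed (status 0) if a file name (not a path) satisfies the
# portability rules above, fail (status 1) otherwise.
# valid_pkg_filename is a hypothetical helper, not an R utility.
valid_pkg_filename() {
    name=$1
    # An empty name is never valid.
    [ -n "$name" ] || return 1
    # Reject ASCII control characters and the forbidden punctuation
    # " * : / < > ? \ |
    case $name in
        *[[:cntrl:]]*) return 1 ;;
        *[\"*:/\<\>?\\\|]*) return 1 ;;
    esac
    # Convert to lower case and strip everything from the first dot on,
    # then compare against the reserved device names.
    stem=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')
    stem=${stem%%.*}
    case $stem in
        con|prn|aux|'clock$'|nul|com[1-4]|lpt[1-3]) return 1 ;;
    esac
    return 0
}
```

For instance, valid_pkg_filename "Con.txt" fails because the name reduces to the reserved name `con', whereas valid_pkg_filename "lpt4.txt" succeeds because only `lpt1' to `lpt3' are reserved.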
The R function package.skeleton can help to create the
structure for a new package: see its help page for details.
The DESCRIPTION file contains basic information about the package in the following format:
Package: pkgname
Version: 0.5-1
Date: 2004-01-01
Title: My First Collection of Functions
Author: Joe Developer <Joe.Developer@some.domain.net>, with
  contributions from A. User <A.User@whereever.net>.
Maintainer: Joe Developer <Joe.Developer@some.domain.net>
Depends: R (>= 1.8.0), nlme
Suggests: MASS
Description: A short (one paragraph) description of what
  the package does and why it may be useful.
License: GPL version 2 or newer
URL: http://www.r-project.org, http://www.another.url
Continuation lines (for example, for descriptions longer than one line) start with a space or tab. The `Package', `Version', `License', `Description', `Title', `Author', and `Maintainer' fields are mandatory; the remaining fields (`Date', `Depends', `URL', ...) are optional.
The DESCRIPTION file should be written entirely in ASCII for maximal portability.
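As a quick sanity check before building, one could list the mandatory fields that a DESCRIPTION file lacks. This is only a sketch: missing_description_fields is a hypothetical helper, and it relies on the rule that a field name starts in the first column while continuation lines start with a space or tab.

```shell
# Sketch: print the name of each mandatory field that is missing from
# the DESCRIPTION file given as the first argument.
# missing_description_fields is a hypothetical helper, not an R tool.
missing_description_fields() {
    desc=$1
    for field in Package Version License Description Title Author Maintainer; do
        # Field names start in column one; continuation lines start
        # with white space and therefore never match.
        grep -q "^${field}:" "$desc" || printf '%s\n' "$field"
    done
}
```

Running missing_description_fields DESCRIPTION in the package directory prints nothing when all seven mandatory fields are present.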
The `Package' and `Version' fields give the name and the
version of the package, respectively. The name should consist of
letters, numbers, and the dot character and start with a letter. The
version is a sequence of at least two (and usually three)
non-negative integers separated by single `.' or `-'
characters. The canonical form is as shown in the example, and a
version such as `0.01' or `0.01.0' will be handled as if it
were `0.1-0'. (Translation packages are allowed names of the form
Translation-ll.)
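The naming and version rules just described translate into two short patterns. The sketch below uses hypothetical helper names (valid_pkg_name, valid_pkg_version), restricts letters to ASCII as an assumption, and does not handle the Translation-ll exception.

```shell
# Sketch: a package name consists of letters, numbers and dots and
# starts with a letter (ASCII letters assumed here).
valid_pkg_name() {
    printf '%s\n' "$1" | grep -Eq '^[A-Za-z][A-Za-z0-9.]*$'
}

# Sketch: a version is at least two non-negative integers separated
# by single '.' or '-' characters.
valid_pkg_version() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]+([.-][0-9]+)+$'
}
```

With these, `0.5-1' and `0.01.0' are accepted as versions, while `1' (only one integer) and `1..2' (a doubled separator) are rejected.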
The `License' field should contain an explicit statement or a well-known abbreviation (such as `GPL', `LGPL', `BSD', or `Artistic'), perhaps followed by a reference to the actual license file. It is very important that you include this information! Otherwise, it may not even be legally correct for others to distribute copies of the package.
The `Description' field should give a comprehensive description of what the package does. One can use several (complete) sentences, but only one paragraph.
The `Title' field should give a short description of the package. Some package listings may truncate the title to 65 characters in order to keep the overall size of the listing limited. It should be capitalized, not use any markup, not have any continuation lines, and not end in a period. Older versions of R used a separate file TITLE for giving this information; this is now defunct, and the `Title' field in DESCRIPTION is required.
The `Author' field describes who wrote the package. It is a plain text field intended for human readers, but not for automatic processing (such as extracting the email addresses of all listed contributors).
The `Maintainer' field should give a single name with a valid email address in angle brackets (for sending bug reports etc.). It should not end in a period or comma.
The optional `Date' field gives the release date of the current version of the package. It is strongly recommended to use the yyyy-mm-dd format conforming to the ISO standard.
The optional `Depends' field gives a comma-separated list of
package names which this package depends on. The package name may be
optionally followed by a comparison operator (currently only `>='
and `<=' are supported), whitespace and a valid version number in
parentheses. (List package names even if they are part of a bundle.)
You can also use the special package name `R' if your package
depends on a certain version of R. E.g., if the package works only with
R version 1.8.0 or newer, include `R (>= 1.8.0)' in the
`Depends' field. Both library and the R package checking
facilities use this field, hence it is an error to use improper syntax
or misuse the `Depends' field for comments on other software that
might be needed. Other dependencies (external to the R system)
should be listed in the `SystemRequirements' field or a separate
README file. The R INSTALL facilities check if the
version of R used is recent enough for the package being installed,
and the list of packages which is specified will be attached (after
checking version dependencies) before the current package, both when
library is called and when saving an image of the package's code
or preparing for lazy-loading.
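To make the `Depends' syntax concrete, the following sketch splits a field value into one line per dependency, giving the name followed by the operator and version if present. It is not how R itself parses the field; parse_depends is a hypothetical helper and assumes a sed that accepts -E for extended regular expressions.

```shell
# Sketch: split the value of a Depends field into one dependency per
# line, printed as "name" or "name operator version".
# parse_depends is a hypothetical helper, not part of R.
parse_depends() {
    printf '%s\n' "$1" | tr ',' '\n' |
        sed -E -e 's/^[[:space:]]*//' \
               -e 's/[[:space:]]*$//' \
               -e 's/^([^ (]*)[[:space:]]*\((>=|<=)[[:space:]]*([^)]*)\)$/\1 \2 \3/'
}
```

For the example file above, parse_depends 'R (>= 1.8.0), nlme' prints the two lines `R >= 1.8.0' and `nlme'.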
The optional `Imports' field lists packages whose namespaces are imported from but which do not need to be attached.
The optional `Suggests' field uses the same syntax as `Depends' and lists packages that are not necessarily needed. This includes packages used only in examples or vignettes (see Writing package vignettes), and packages loaded in the body of functions. E.g., suppose an example from package foo uses a dataset from package bar. Then it is not necessary to have bar for routine use of foo, unless one wants to execute the examples: it is nice to have bar, but not necessary. The general rules are
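Packages whose name space only is needed to load the package using
library(pkgname) must be listed in the `Imports'
field.
Packages that need to be attached to successfully load the package using
library(pkgname) must be listed in the `Depends'
field.
All packages that are needed to successfully run R CMD check on
the package must be listed in one of `Depends' or `Suggests'
or `Imports'.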
The optional `URL' field may give a list of URLs separated by commas or whitespace, for example the homepage of the author or a page where additional material describing the software can be found. These URLs are converted to active hyperlinks on CRAN.
Base and recommended packages (i.e., packages contained in the R source distribution or available from CRAN and recommended to be included in every binary distribution of R) have a `Priority' field with value `base' or `recommended', respectively. These priorities must not be used by “other” packages.
An optional `Collate' field (or OS-specific variants `Collate.OStype', such as `Collate.windows') can be used for controlling the collation order for the R code files in a package when these are concatenated into a single file upon installation from source. The default is to try collating according to the `C' locale. If present, the collate specification must list all R code files in the package (taking possible OS-specific subdirectories into account, see Package subdirectories) as a whitespace separated list of file paths relative to the R subdirectory. Paths containing white space or quotes need to be quoted. Applicable OS-specific collate specifications take precedence.
The optional `LazyLoad' and `LazyData' fields control whether
the R objects and the datasets (respectively) use lazy-loading: set
the field's value to `yes' or `true' for lazy-loading and
`no' or `false' for no lazy-loading. (Capitalized values
are also accepted.) Note that these values override the command-line
options of R CMD INSTALL, `--[no-]lazy' and
`--[no-]lazy-data'.
The optional `SaveImage' field controls whether the R objects are stored in a saved image: see The install.R and R_PROFILE.R files. (It takes the same values as the field `LazyLoad'.)
If the package you are writing uses the methods package, specify (preferably) `LazyLoad: yes' or `SaveImage: yes'.
The optional `ZipData' field controls whether the automatic Windows build will zip up the data directory or not: set this to `no' if your package will not work with a zipped data directory.
If the DESCRIPTION file is not entirely in ASCII it should
contain an `Encoding' field specifying an encoding. This is
currently used as the encoding of the DESCRIPTION file itself,
and may in the future be taken as the encoding for other documentation
in the package. Only encoding names latin1, latin2 and
UTF-8 are known to be portable.
The optional `Type' field specifies the type of the package: see Package types.
Note: There should be no `Built' or `Packaged' fields, as these are added by the package management tools.
The optional file INDEX contains a line for each sufficiently
interesting object in the package, giving its name and a description
(functions such as print methods not usually called explicitly might not
be included). Normally this file is missing, and the corresponding
information is automatically generated from the documentation sources
(using Rdindex() from package tools) when installing from
source and when using the package builder (see Checking and building packages).
Rather than editing this file, it is preferable to put customized information about the package into an overview man page (see Documenting packages) and/or a vignette (see Writing package vignettes).
NB: This mechanism is now deprecated and will be removed at R 2.4.0.
The optional file install.R serves two purposes. First, its
presence tells the INSTALL utility to create a binary image of the
package workspace. A binary image is created by executing the code in
the R subdirectory and saving the resulting objects. When the
package is attached, the code is not executed again but loaded
from the saved image. It is preferable to use the `SaveImage'
field in the DESCRIPTION file, which takes precedence over the
presence of an install.R file.
The second purpose for install.R is to hold code that needs to be executed each time the package is attached, before the image is loaded. Very few packages have a need for such code.
The optional file R_PROFILE.R is executed before the code in the R subdirectory when saving an image or preparing for lazy-loading, and should be used to set up an environment needed only to evaluate the code (which is run with the --vanilla command-line flag). Packages should not need such code, as the packages in the `Depends' field should suffice.
Both install.R and R_PROFILE.R should be viewed as experimental; the mechanism to execute code before attaching or installing the package may change in the near future. With the other facilities available as from R 2.0.0 they should be removed if possible.
The R subdirectory contains R code files, only. The code
files to be installed must start with a (lower or upper case) letter or
digit and have one of the
extensions .R, .S, .q, .r, or .s. We
recommend using .R, as this extension seems not to be used by any
other software. It should be possible to read in the files using
source(), so R objects must be created by assignments. Note that
there need be no connection between the name of the file and the R
objects created by it. The R code files should only create R objects
and not call functions with side effects such as require and
options.
Two exceptions are allowed: if the R subdirectory contains a file
sysdata.rda (a saved image of R objects) this will be
lazy-loaded into the namespace/package environment – this is intended
for system datasets that are not intended to be user-accessible via
data. Also, files ending in `.in' will be allowed in the
R directory to allow a configure script to generate
suitable files.
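The naming rule for code files in the R subdirectory can be expressed as a single shell pattern. This is a sketch only: installable_r_file is a hypothetical helper, it assumes a locale in which the ranges cover ASCII letters and digits, and it deliberately ignores the two exceptions (sysdata.rda and `.in' files) described above.

```shell
# Sketch: succeed if a file name starts with a letter or digit and has
# one of the extensions .R, .S, .q, .r or .s.
# installable_r_file is a hypothetical helper, not an R utility.
installable_r_file() {
    case $1 in
        [A-Za-z0-9]*.[RSqrs]) return 0 ;;
        *) return 1 ;;
    esac
}
```

Thus `zzz.R' and `1st.q' qualify, while `_hidden.R' and `notes.txt' do not.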
Only ASCII characters (and the control characters tab, formfeed, LF and
CR) should be used in code files. Other characters are accepted in
comments, but then the comments may not be readable in e.g. a UTF-8
locale. Non-ASCII characters in object names will
normally fail when the
package is installed. Any byte will be allowed in a quoted
character string (but \uxxxx escapes should not be used), but
non-ASCII character strings may not be usable in some locales and may
display incorrectly in others.
Various R functions in a package can be used to initialize and clean
up. For packages without a name space, these are .First.lib and
.Last.lib. (For packages with a name space, see Load hooks.)
It is conventional to define these functions in a file called
zzz.R. If .First.lib is defined in a package, it is
called with arguments libname and pkgname after the
package is loaded and attached. (If a package is installed with version
information, the package name includes the version information, e.g.
`ash_1.0.9'.) A common use is to call library.dynam()
inside .First.lib() to load compiled code: another use is to call
those functions with side effects. If .Last.lib exists in a
package it is called (with argument the full path to the installed
package) just before the package is detached. It is uncommon to detach
packages and rare to have a .Last.lib function: one use is to
call library.dynam.unload to unload compiled code.
The man subdirectory should contain (only) documentation files for the objects in the package in R documentation (Rd) format. The documentation files to be installed must start with a (lower or upper case ASCII) letter or digit and have the extension .Rd (the default) or .rd. Further, the names must be valid in `file://' URLs, which means they must be entirely ASCII and not contain `%'. See Writing R documentation files, for more information. Note that all user-level objects in a package should be documented; if a package pkg contains user-level objects which are for “internal” use only, it should provide a file pkg-internal.Rd which documents all such objects, and clearly states that these are not meant to be called by the user. See e.g. the sources for package grid in the R distribution for an example.
The R and man subdirectories may contain OS-specific subdirectories named unix or windows.
The C, C++, or FORTRAN source
files for the compiled code are in src, plus optionally file
Makevars or Makefile. When a package is installed using
R CMD INSTALL, Make is used to control compilation and linking
into a shared object for loading into R. There are default variables
and rules for this (determined when R is configured and recorded in
R_HOME/etc/Makeconf). These rules can be tweaked by
setting macros in a file src/Makevars (see Using Makevars).
Note that this mechanism should be general enough to eliminate the need
for a package-specific Makefile. If such a file is to be
distributed, considerable care is needed to make it general enough to
work on all R platforms. In addition, it should have a target
`clean' which removes all files generated by Make. If necessary,
platform-specific files can be used, for example Makevars.win or
Makefile.win on Windows take precedence over Makevars or
Makefile.
The data subdirectory is for additional data files the package
makes available for loading using data(). Currently, data files
can have one of three types as indicated by their extension: plain R
code (.R or .r), tables (.tab, .txt, or
.csv), or save() images (.RData or .rda).
(All ports of R use the same binary (XDR) format and can read
compressed images. Use images saved with save(, compress =
TRUE), the default as from R 2.3.0, to save space.) Note that R
code should be “self-sufficient” and not make use of extra
functionality provided by the package, so that the data file can also be
used without having to load the package. It is no longer necessary to
provide a 00Index file in the data directory—the
corresponding information is generated automatically from the
documentation sources when installing from source, or when using the
package builder (see Checking and building packages). If your data
files are enormous you can speed up installation by providing a file
datalist in the data subdirectory. This should have one
line per topic that data() will find, in the format `foo' if
data(foo) provides `foo', or `foo: bar bah' if
data(foo) provides `bar' and `bah'.
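For the simplest case a datalist file can be generated from the file names alone. The sketch below assumes that every data file provides exactly one object named after the file, so that each line has the plain `foo' form; files providing several objects would need `foo: bar bah' lines written by hand. make_simple_datalist is a hypothetical helper.

```shell
# Sketch: write data/datalist with one topic per data file, assuming
# each file provides a single object with the same name as the file.
# make_simple_datalist is a hypothetical helper, not an R tool.
make_simple_datalist() {
    datadir=$1
    for f in "$datadir"/*; do
        base=${f##*/}
        case $base in
            # Only the extensions recognized by data() are considered.
            *.R|*.r|*.tab|*.txt|*.csv|*.RData|*.rda)
                printf '%s\n' "${base%.*}" ;;
        esac
    done | LC_ALL=C sort -u > "$datadir/datalist"
}
```

It would be called as make_simple_datalist data from the top-level package directory.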
The demo subdirectory is for R scripts (for running via
demo()) that demonstrate some of the functionality of the package.
Demos may be interactive and are not checked automatically, so if testing is
desired use code in the tests directory. The script files must
start with a (lower or upper case) letter and have one of the extensions .R or .r.
If present, the demo subdirectory should also have a
00Index file with one line for each demo, giving its name and a
description separated by white space. (Note that it is not possible to
generate this index file automatically.)
The contents of the inst subdirectory will be copied recursively
to the installation directory. Subdirectories of inst should not
interfere with those used by R (currently, R, data,
demo, exec, libs, man, help,
html, latex, R-ex, chtml, and Meta).
The copying of the inst happens after src is built so its
Makefile can create files to be installed. Note that with the
exceptions of INDEX and COPYING, information files at the
top level of the package will not be installed and so not be
known to users of Windows and MacOS X compiled packages (and not seen by
those who use R CMD INSTALL or install.packages on
the tarball). So any information files you wish an end user to see
should be included in inst. One thing you might like to add to
inst is a CITATION file for use by the citation
function.
Subdirectory tests is for additional package-specific test code,
similar to the specific tests that come with the R distribution.
Test code can either be provided directly in a .R file, or via a
.Rin file containing code which in turn creates the corresponding
.R file (e.g., by collecting all function objects in the package
and then calling them with the strangest arguments). The results of
running a .R file are written to a .Rout file. If there
is a corresponding .Rout.save file, these two are compared, with
differences being reported but not causing an error. The whole
tests directory is copied to the check area, and the tests are run with the
copy as the working directory and with R_LIBS set to ensure that
the copy of the package installed during testing will be found by
library(pkg_name).
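The comparison step can be imitated by hand when developing tests. The following sketch uses a plain diff, which is an assumption: the comparison performed by the R tools may be more lenient about some differences. compare_rout is a hypothetical helper; like the mechanism described above, it reports differences without failing.

```shell
# Sketch: compare each .Rout file in a directory with its .Rout.save
# counterpart, reporting differences but always succeeding.
# compare_rout is a hypothetical helper using a plain diff.
compare_rout() {
    testdir=$1
    for saved in "$testdir"/*.Rout.save; do
        [ -f "$saved" ] || continue
        out=${saved%.save}
        # Skip saved results whose test has not been run.
        [ -f "$out" ] || continue
        if ! diff "$out" "$saved" > /dev/null; then
            printf 'differences found: %s\n' "${out##*/}"
        fi
    done
    return 0
}
```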
Subdirectory exec could contain additional executables the package needs, typically scripts for interpreters such as the shell, Perl, or Tcl. This mechanism is currently used by very few packages, and is still experimental.
Subdirectory po is used for files related to localization: see Localization.
Sometimes it is convenient to distribute several packages as a bundle. (An example is VR which contains four packages.) The installation procedures on both Unix and Windows can handle package bundles.
The DESCRIPTION file of a bundle has a `Bundle' field and no `Package' field, as in
Bundle: VR
Priority: recommended
Contains: MASS class nnet spatial
Version: 7.2-12
Date: 2005-01-31
Depends: R (>= 2.0.0), graphics, stats
Suggests: lattice, nlme, survival
Author: S original by Venables & Ripley.
  R port by Brian Ripley <ripley@stats.ox.ac.uk>, following earlier
  work by Kurt Hornik and Albrecht Gebhardt.
Maintainer: Brian Ripley <ripley@stats.ox.ac.uk>
BundleDescription: Functions and datasets to support Venables and Ripley,
  `Modern Applied Statistics with S' (4th edition).
License: GPL (version 2 or later)  See file LICENCE.
URL: http://www.stats.ox.ac.uk/pub/MASS4/
The `Contains' field lists the packages (space separated), which should be contained in separate subdirectories with the names given. During building and installation, packages will be installed in the order specified. Be sure to order this list so that dependencies are met appropriately.
The packages contained in a bundle are standard packages in all respects except that the DESCRIPTION file is replaced by a DESCRIPTION.in file which just contains fields additional to the DESCRIPTION file of the bundle, for example
Package: spatial
Description: Functions for kriging and point pattern analysis.
Title: Functions for Kriging and Point Pattern Analysis
Any files in the package bundle except the DESCRIPTION file and the named packages will be ignored.
The `Depends' field in the bundle's DESCRIPTION file should list the dependencies of all the constituent packages (and similarly for `Imports' and `Suggests'), and then DESCRIPTION.in files should not contain these fields.
Note that most of this section is Unix-specific: see the comments later on about the Windows port of R.
If your package needs some system-dependent configuration before
installation you can include a (Bourne shell) script configure in
your package which (if present) is executed by R CMD INSTALL
before any other action is performed. This can be a script created by
the Autoconf mechanism, but may also be a script written by yourself.
Use this to detect if any nonstandard libraries are present such that
corresponding code in the package can be disabled at install time rather
than giving error messages when the package is compiled or used. To
summarize, the full power of Autoconf is available for your extension
package (including variable substitution, searching for libraries,
etc.).
The (Bourne shell) script cleanup is executed as last thing by
R CMD INSTALL if present and option --clean was given,
and by R CMD build when preparing the package for building from
its source. It can be used to clean up the package source tree. In
particular, it should remove all files created by configure.
As an example, suppose we want to use functionality provided by a (C or
FORTRAN) library foo. Using Autoconf, we can create a configure
script which checks for the library, sets variable HAVE_FOO to
TRUE if it was found and to FALSE otherwise, and then
substitutes this value into output files (by replacing instances of
`@HAVE_FOO@' in input files with the value of HAVE_FOO).
For example, if a function named bar is to be made available by
linking against library foo (i.e., using -lfoo), one
could use
AC_CHECK_LIB(foo, bar, [HAVE_FOO=TRUE], [HAVE_FOO=FALSE])
AC_SUBST(HAVE_FOO)
......
AC_CONFIG_FILES([foo.R])
AC_OUTPUT
in configure.ac (assuming Autoconf 2.50 or better).
The definition of the respective R function in foo.R.in could be
foo <- function(x) {
    if(!@HAVE_FOO@)
      stop("Sorry, library 'foo' is not available")
...
From this file configure creates the actual R source file foo.R looking like
foo <- function(x) {
    if(!FALSE)
      stop("Sorry, library 'foo' is not available")
...
if library foo was not found (with the desired functionality).
In this case, the above R code effectively disables the function.
One could also use different file fragments for available and missing functionality, respectively.
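Since configure may also be a script written by yourself, the substitution step does not require Autoconf. The following sketch performs it with sed; substitute_have_foo is a hypothetical helper, and how the value TRUE or FALSE is determined (the actual probe for library foo) is left out and assumed to happen earlier in the script.

```shell
# Sketch of the substitution step in a hand-written configure script.
# substitute_have_foo is a hypothetical helper; the probe that decides
# between TRUE and FALSE is assumed to have run already.
substitute_have_foo() {
    have_foo=$1          # TRUE or FALSE
    infile=$2            # template, e.g. foo.R.in
    outfile=${infile%.in}
    # Replace every instance of @HAVE_FOO@ with the probed value.
    sed "s/@HAVE_FOO@/${have_foo}/g" "$infile" > "$outfile"
}
```

Calling substitute_have_foo FALSE foo.R.in turns the line containing `if(!@HAVE_FOO@)' into `if(!FALSE)' in the generated foo.R.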
You will very likely need to ensure that the same C compiler and compiler flags are used in the configure tests as when compiling R or your package. Under Unix, you can achieve this by including the following fragment early in configure.ac
: ${R_HOME=`R RHOME`}
if test -z "${R_HOME}"; then
  echo "could not determine R_HOME"
  exit 1
fi
CC=`"${R_HOME}/bin/R" CMD config CC`
CFLAGS=`"${R_HOME}/bin/R" CMD config CFLAGS`
(using `${R_HOME}/bin/R' rather than just `R' is necessary
in order to use the `right' version of R when running the script as
part of R CMD INSTALL.)
Note that earlier versions of this document recommended obtaining the
configure information by direct extraction (using grep and sed) from
R_HOME/etc/Makeconf, which only works for variables
recorded there as literals. R 1.5.0 has added R CMD config for
getting the value of the basic configuration variables, or the header
and library flags necessary for linking against R, see R CMD
config --help for details.
To check for an external BLAS library using the ACX_BLAS macro
from the official Autoconf Macro Archive, one can simply do
F77=`"${R_HOME}/bin/R" CMD config F77`
AC_PROG_F77
FLIBS=`"${R_HOME}/bin/R" CMD config FLIBS`
ACX_BLAS([], AC_MSG_ERROR([could not find your BLAS library], 1))
Note that FLIBS as determined by R must be used to ensure that
FORTRAN 77 code works on all R platforms. Calls to the Autoconf macro
AC_F77_LIBRARY_LDFLAGS, which would overwrite FLIBS, must
not be used (and hence e.g. removed from ACX_BLAS). (Recent
versions of Autoconf in fact allow an already set FLIBS to
override the test for the FORTRAN linker flags. Also, recent versions
of R can detect external BLAS and LAPACK libraries.)
You should bear in mind that the configure script may well not work on Windows systems (this seems normally to be the case for those generated by Autoconf, although simple shell scripts do work). If your package is to be made publicly available, please give enough information for a user on a non-Unix platform to configure it manually, or provide a configure.win script to be used on that platform.
In some rare circumstances, the configuration and cleanup scripts need to know the location into which the package is being installed. An example of this is a package that uses C code and creates two shared object/DLLs. Usually, the object that is dynamically loaded by R is linked against the second, dependent, object. On some systems, we can add the location of this dependent object to the object that is dynamically loaded by R. This means that each user does not have to set the value of the LD_LIBRARY_PATH (or equivalent) environment variable, but that the secondary object is automatically resolved. Another example is when a package installs support files that are required at run time, and their location is substituted into an R data structure at installation time. (This happens with the Java Archive files in the SJava package.) The names of the top-level library directory (i.e., specifiable via the `-l' argument) and the directory of the package itself are made available to the installation scripts via the two shell/environment variables R_LIBRARY_DIR and R_PACKAGE_DIR. Additionally, the name of the package (e.g., `survival' or `MASS') being installed is available from the shell variable R_PACKAGE_NAME.
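As an illustration of using these variables, a configure script might write the installation location into a generated file. This is a sketch under stated assumptions: substitute_install_paths and the `@...@' placeholder names are hypothetical, the variables are expected to have been set by R CMD INSTALL, and paths containing `|' or `&' would need extra escaping.

```shell
# Sketch: substitute the installation directories and package name
# into a template file. The placeholder names are hypothetical; the
# three variables are those provided to the installation scripts.
substitute_install_paths() {
    infile=$1            # template, e.g. R/paths.R.in
    sed -e "s|@R_PACKAGE_DIR@|${R_PACKAGE_DIR}|g" \
        -e "s|@R_LIBRARY_DIR@|${R_LIBRARY_DIR}|g" \
        -e "s|@R_PACKAGE_NAME@|${R_PACKAGE_NAME}|g" \
        "$infile" > "${infile%.in}"
}
```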
Sometimes writing your own configure script can be avoided by supplying a file Makevars: also one of the commonest uses of a configure script is to make Makevars from Makevars.in.
The most common use of a Makevars file is to set additional
preprocessor (for example include paths) flags via PKG_CPPFLAGS,
and additional compiler flags by setting PKG_CFLAGS,
PKG_CXXFLAGS and PKG_FFLAGS, for C, C++, or FORTRAN
respectively (see Creating shared objects).
Also, Makevars can be used to set flags for the linker, for example `-L' and `-l' options.
When writing a Makevars file for a package you intend to distribute, take care to ensure that it is not specific to your compiler: flags such as -O2 -Wall -pedantic are all specific to GCC.
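A simple check before distribution is to search Makevars for the flags mentioned above. The sketch below looks only for those three flags; gcc_specific_flags is a hypothetical helper, and a clean result does not prove that the file is portable.

```shell
# Sketch: print (with line numbers) the lines of a Makevars file that
# use -O2, -Wall or -pedantic. Succeeds only if such a line is found.
# gcc_specific_flags is a hypothetical helper.
gcc_specific_flags() {
    grep -n -E '(^|[[:space:]=])-(O2|Wall|pedantic)([[:space:]]|$)' "$1"
}
```

For example, gcc_specific_flags src/Makevars prints nothing and returns a non-zero status when none of these flags is present.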
There are some macros which are built whilst configuring the building of R itself, are stored on Unix-alikes in R_HOME/etc/Makeconf and can be used in Makevars. These include
FLIBS: A macro containing the set of libraries needed to link FORTRAN code. This may need to be included in PKG_LIBS.
BLAS_LIBS: A macro containing the BLAS libraries used when building R. This may need to be included in PKG_LIBS. Beware that if it is empty then
the R executable will contain all the double-precision and
double-complex BLAS routines, but no single-precision or complex
routines. If BLAS_LIBS is included, then FLIBS also needs
to be, as most BLAS libraries are written in FORTRAN.
LAPACK_LIBS: A macro containing the LAPACK libraries (and paths where appropriate) used when building R. This may need to be included in PKG_LIBS. This may point to a dynamic library libRlapack
which contains all the double-precision LAPACK routines as well as those
double-complex LAPACK and BLAS routines needed to build R, or it
may point to an external LAPACK library, or may be empty if an external
BLAS library also contains LAPACK.
[There is no guarantee that the LAPACK library will provide more than all the double-precision and double-complex routines, and some do not provide all the auxiliary routines.]
The macros BLAS_LIBS and FLIBS should always be included
after LAPACK_LIBS.
SAFE_FFLAGS: A macro containing flags which are needed to circumvent over-optimization of FORTRAN code. It is a replacement for FFLAGS rather than an addition to PKG_FFLAGS, as in the example below.
Note that Makevars should not normally contain targets, as it is included before the default Makefile and make is called without an explicit target. To circumvent that, use a suitable phony target before any actual targets: for example fastICA has
SLAMC_FFLAGS=$(R_XTRA_FFLAGS) $(FPICFLAGS) $(SHLIB_FFLAGS) $(SAFE_FFLAGS)
all: $(SHLIB)
slamc.o: slamc.f
	$(F77) $(SLAMC_FFLAGS) -c -o slamc.o slamc.f
to ensure that the LAPACK routines find some constants without infinite looping.
It may be helpful to give an extended example of using a configure script to create a src/Makevars file: this is based on that in the RODBC package.
The configure.ac file follows: configure is created from this by running autoconf in the top-level package directory (containing configure.ac).
AC_INIT([RODBC], 1.1.5) dnl package name, version
dnl Select an optional include path, from a configure option
dnl or from an environment variable.
AC_ARG_WITH([odbc-include],
            AC_HELP_STRING([--with-odbc-include=INCLUDE_PATH],
                           [the location of ODBC header files]),
            [odbc_include_path=$withval])
if test [ -n "$odbc_include_path" ] ; then
   AC_SUBST([CPPFLAGS],["-I${odbc_include_path} ${CPPFLAGS}"])
else
  if test [ -n "${ODBC_INCLUDE}" ] ; then
     AC_SUBST([CPPFLAGS],["-I${ODBC_INCLUDE} ${CPPFLAGS}"])
  fi
fi
dnl ditto for a library path
AC_ARG_WITH([odbc-lib],
            AC_HELP_STRING([--with-odbc-lib=LIB_PATH],
                           [the location of ODBC libraries]),
            [odbc_lib_path=$withval])
if test [ -n "$odbc_lib_path" ] ; then
   AC_SUBST([LIBS],[" -L${odbc_lib_path} ${LIBS}"])
else
  if test [ -n "${ODBC_LIBS}" ] ; then
     AC_SUBST([LIBS],["-L${ODBC_LIBS} ${LIBS}"])
  fi
fi
dnl Another user-specifiable option
AC_ARG_WITH([odbc-manager],
            AC_HELP_STRING([--with-odbc-manager=MGR],
                           [specify the ODBC manager, e.g. odbc or iodbc]),
            [odbc_mgr=$withval])
dnl Now find the compiler and compiler flags to use
: ${R_HOME=`R RHOME`}
if test -z "${R_HOME}"; then
  echo "could not determine R_HOME"
  exit 1
fi
CC=`"${R_HOME}/bin/R" CMD config CC`
CFLAGS=`"${R_HOME}/bin/R" CMD config CFLAGS`
dnl Check the headers can be found
AC_CHECK_HEADERS(sql.h sqlext.h)
if test "${ac_cv_header_sql_h}" = no ||
   test "${ac_cv_header_sqlext_h}" = no; then
   AC_MSG_ERROR("ODBC headers sql.h and sqlext.h not found")
fi
dnl search for a library containing an ODBC function
if test [ -n "${odbc_mgr}" ] ; then
  AC_SEARCH_LIBS(SQLTables, ${odbc_mgr}, ,
      AC_MSG_ERROR("ODBC driver manager ${odbc_mgr} not found"))
else
  AC_SEARCH_LIBS(SQLTables, odbc iodbc, ,
      AC_MSG_ERROR("no ODBC driver manager found"))
fi
dnl substitute CPPFLAGS and LIBS
AC_SUBST(CPPFLAGS)
AC_SUBST(LIBS)
dnl and do substitution in the src/Makevars.in
AC_OUTPUT(src/Makevars)
where src/Makevars.in would be simply
PKG_CPPFLAGS = @CPPFLAGS@
PKG_LIBS = @LIBS@
A user can then be advised to specify the location of the ODBC driver manager files by options like (lines broken for easier reading)
R CMD INSTALL
--configure-args='--with-odbc-include=/opt/local/include
--with-odbc-lib=/opt/local/lib --with-odbc-manager=iodbc'
RODBC
or by setting the environment variables ODBC_INCLUDE and
ODBC_LIBS.
R currently does not distinguish between FORTRAN 77 and FORTRAN 90/95
code, and assumes all FORTRAN comes in source files with extension
.f. Commercial Unix systems typically use an F95 compiler, but
only since the release of gcc 4.0.0 in April 2005 have Linux and
other non-commercial OSes had much support for F95. The compiler used
for R on Windows is an F77 compiler.
This means that portable packages need to be written in correct FORTRAN 77, which will also be valid FORTRAN 95. See http://developer.r-project.org/Portability.html for reference resources. In particular, free source form F95 code is not portable.
On some systems an alternative F95 compiler is available: from the
gcc family this might be gfortran or g95.
Configuring R will try to find a compiler which (from its name)
appears to be a FORTRAN 90/95 compiler, and set it in macro `FC'.
Note that it does not check that such a compiler is fully (or even
partially) compliant with FORTRAN 90/95. Packages making use of
FORTRAN 90/95 features should use file extension `.f90' or
`.f95' for the source files: the variable PKG_FCFLAGS
specifies any special flags to be used. There is no guarantee that
compiled FORTRAN 90/95 code can be mixed with any other type of code,
nor that a build of R will have support for such packages.
There is a MinGW build of gfortran available from
http://gcc.gnu.org/wiki/GFortranBinaries and a MinGW
build of g95 from http://www.g95.org.
Set F95 in MkRules to point to the installed compiler.
Then R CMD SHLIB and R CMD INSTALL work for
packages containing FORTRAN 90/95 source code.
Before using these tools, please check that your package can be
installed and loaded. R CMD check will inter alia do
this, but you will get more informative error messages doing the checks
directly.
Using R CMD check, the R package checker, one can test whether
source R packages work correctly. It can be run on one or
more directories, or gzipped package tar
archives with extension
`.tar.gz' or `.tgz'. This runs a series of checks.
library.
To allow a configure script to generate suitable files, files ending in `.in' will be allowed in the R directory.
library.dynam (with
no extension). In addition, it is checked whether methods have all
arguments of the corresponding generic, and whether the final argument
of replacement functions is called `value'. All foreign function
calls (.C, .Fortran, .Call and .External
calls) are tested to see if they have a PACKAGE argument, and if
not, whether the appropriate DLL might be deduced from the namespace of
the package. Any other calls are reported. (The check is generous, and
users may want to supplement this by examining the output of
tools::checkFF("mypkg", verbose=TRUE), especially if the
intention is to always use a PACKAGE argument.)
\name, \alias, \title,
\description and \keyword) fields. The Rd name and title
are checked for being non-empty, and the keywords found are compared to
the standard ones.
\usage
sections of Rd files are documented in the corresponding
\arguments section.
\examples to create executable example code.)
Of course, released packages should be able to run at least their own
examples. Each example is run in a `clean' environment (so earlier
examples cannot be assumed to have been run), and with the variables
T and F redefined to generate an error unless they are set
in the example: See Logical vectors.
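As an illustration of why T and F are treated this way, the following sketch shows how an ordinary assignment to T silently changes the behaviour of code that relies on it, whereas TRUE is a reserved word and cannot be masked. The function names f_bad and f_good are made up for this example.

```r
# Hypothetical helpers: one relies on the variable T, one on the constant TRUE.
f_bad  <- function(x) mean(x, na.rm = T)
f_good <- function(x) mean(x, na.rm = TRUE)

T <- 0                  # an ordinary assignment masks the T defined in base

f_bad(c(1, NA))         # na.rm is now 0 (false), so the NA is not dropped
f_good(c(1, NA))        # unaffected by the assignment

rm(T)                   # remove the mask again
```

Writing TRUE and FALSE in full in examples avoids both this problem and the error that R CMD check provokes deliberately.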
Use R CMD check --help to obtain more information about the usage of the R package checker. A subset of the checking steps can be selected by adding flags.
Using R CMD build, the R package builder, one can build R
packages from their sources (for example, for subsequent release).
Prior to actually building the package in the common gzipped tar file format, a few diagnostic checks and cleanups are performed. In particular, it is tested whether object indices exist and can be assumed to be up-to-date, and C, C++ and FORTRAN source files are tested and converted to LF line-endings if necessary.
Run-time checks of whether the package works correctly should be performed
using R CMD check prior to invoking the build procedure.
To exclude files from being put into the package, one can specify a list
of exclude patterns in file .Rbuildignore in the top-level source
directory. These patterns should be Perl regexps, one per line, to be
matched against the file names relative to the top-level source
directory. In addition, directories called CVS or .svn or
.arch-ids and files GNUMakefile or with base names
starting with `.#', or starting and ending with `#', or ending
in `~', `.bak' or `.swp', are excluded by default. In
addition, those files in the R, demo and man
directories which are flagged by R CMD check as having invalid
names will be excluded.
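To get a feeling for how exclude patterns behave before committing them to .Rbuildignore, they can be tried out in R. The sketch below is only an approximation of what R CMD build does: the patterns and file names are invented for the example, and the matching is imitated with grepl using perl = TRUE on a current version of R.

```r
# Hypothetical .Rbuildignore contents: one Perl regexp per line.
ignore_patterns <- c("^scratch/", "\\.log$", "^notes\\.txt$")

# Hypothetical file names, relative to the top-level source directory.
files <- c("DESCRIPTION", "R/fit.R", "scratch/try.R",
           "src/build.log", "notes.txt", "inst/doc/notes.txt")

# A file is treated as excluded if any pattern matches its relative path.
excluded <- sapply(files, function(f)
    any(sapply(ignore_patterns, grepl, x = f, perl = TRUE)))

files[excluded]
```

Note that the anchored pattern ^notes\.txt$ matches only the top-level file, not inst/doc/notes.txt, because the match is made against the whole relative path.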
Use R CMD build --help to obtain more information about the usage of the R package builder.
Unless R CMD build is invoked with the --no-vignettes option, it will attempt to rebuild the vignettes (see Writing package vignettes) in the package. To do so it installs the current package/bundle into a temporary library tree, but any dependent packages need to be installed in an available library tree (see the Note: below).
One of the checks that R CMD build runs is for empty source
directories. These are in most cases unintentional, in which case they
should be removed and the build re-run.
It can be useful to run R CMD check --check-subdirs=yes on the
built tarball as a final check on the contents.
R CMD build can also build pre-compiled versions of packages for
binary distributions, but R CMD INSTALL --build is preferred (and
is considerably more flexible). In particular, Windows users are
recommended to use R CMD INSTALL --build and install into the
main library tree (the default) so that HTML links are resolved.
Note: R CMD check and R CMD build run R with --vanilla, so none of the user's startup files are read. If you need R_LIBS set (to find packages in a non-standard library) you will need to set it in the environment.
Note to Windows users: R CMD check and R CMD build work well under Windows NT4/2000/XP/2003 but may not work correctly on Windows 95/98/ME because of problems with some versions of Perl on those limited OSes. Experiences vary. To use them you will need to have installed the files for building source packages (which is the default).
In addition to the available command line options, R CMD check
also allows customization by setting (Perl) configuration variables in a
configuration file, the location of which can be specified via the
--rcfile option and defaults to $HOME/.R/check.conf
provided that the environment variable HOME is set.
The following configuration variables are currently available.
$R_check_use_install_log
$R_check_all_non_ISO_C
$R_check_weave_vignettes
$R_check_subdirs_nocase
$R_check_subdirs_strict
Values `1' or a string with lower-cased version `"yes"' or `"true"' can be used for setting the variables to true; similarly, `0' or strings with lower-cased version `"no"' or `"false"' give false.
For example, a configuration file containing
$R_check_use_install_log = "TRUE";
$R_check_weave_vignettes = 0;
results in using install logs and turning off weaving.
Future versions of R will enhance this customization mechanism, and
provide a similar scheme for R CMD build.
There are other internal settings that can be changed via environment
variables _R_CHECK_*_: see the Perl source code. One that may be
interesting is _R_CHECK_USE_CODETOOLS_ to make use of the
codetools package available from
http://www.stat.uiowa.edu/~luke/R/codetools.
In addition to the help files in Rd format, R packages allow the inclusion of documents in arbitrary other formats. The standard location for these is subdirectory inst/doc of a source package; the contents will be copied to subdirectory doc when the package is installed. Pointers from package help indices to the installed documents are automatically created. Documents in inst/doc can be in arbitrary format; however, we strongly recommend providing them in PDF format so that users on all platforms can easily read them. To ensure that they can be accessed from a browser, the file names should start with an ASCII letter and consist entirely of ASCII letters, digits, minus signs or underscores.
A special case are documents in Sweave format, which we call
package vignettes. Sweave allows the integration of LaTeX
documents and R code and is contained in package utils which is
part of the base R distribution, see the Sweave help page for
details on the document format. Package vignettes found in directory
inst/doc are tested by R CMD check by executing all R
code chunks they contain to ensure consistency between code and
documentation. Code chunks with option eval=FALSE are not
tested. The R working directory for all vignette tests in R CMD
check is the installed version of the doc
subdirectory. Make sure all files needed by the vignette (data sets,
...) are accessible by either placing them in the inst/doc
hierarchy of the source package, or using calls to system.file().
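A minimal sketch of the second approach follows. It uses the DESCRIPTION file of the standard package stats purely as a stand-in for a file shipped with your own package; in a vignette one would name one's own package and file instead.

```r
# Locate a file inside an installed package, independent of the working directory.
path <- system.file("DESCRIPTION", package = "stats")
file.exists(path)

# A file that is not installed yields "" rather than an error.
missing_path <- system.file("no-such-file.csv", package = "stats")
identical(missing_path, "")
```

Because system.file returns an empty string for a missing file, it is worth checking the result before passing it on to a function that reads the file.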
R CMD build will automatically create PDF versions of the
vignettes for distribution with the package sources. By including the
PDF version in the package sources it is not necessary that the
vignettes can be compiled at install time, i.e., the package author can
use private LaTeX extensions which are only available on his machine.
Only the R code inside the vignettes is part of the checking
procedure; typesetting manuals is not part of the package QC.
By default R CMD build will run Sweave on all files in
Sweave format. If no Makefile is found in directory
inst/doc, then texi2dvi --pdf is run on all vignettes.
Whenever a Makefile is found, then R CMD build will try to
run make after the Sweave step, such that PDF manuals
can be created from arbitrary source formats (plain LaTeX files,
...). The Makefile should take care of both creation of PDF
files and cleaning up afterwards, i.e., delete all files that shall not
appear in the final package archive. Note that the make step is
executed independently from the presence of any files in Sweave format.
It is no longer necessary to provide a 00Index.dcf file in the
inst/doc directory—the corresponding information is generated
automatically from the \VignetteIndexEntry statements in all
Sweave files when installing from source, or when using the package
builder (see Checking and building packages). The
\VignetteIndexEntry statement is best placed in a LaTeX comment,
such that no definition of the command is necessary.
At install time an HTML index for all vignettes is automatically
created from the \VignetteIndexEntry statements unless a file
index.html exists in directory inst/doc. This index is
linked into the HTML help system for each package.
CRAN is a network of WWW sites holding the R distributions and contributed code, especially R packages. Users of R are encouraged to join in the collaborative project and to submit their own packages to CRAN.
Before submitting a package mypkg, do run the following steps to test it is complete and will install properly. (Unix procedures only, run from the directory containing mypkg as a subdirectory.)
Run R CMD check to check that the package will install and will
run its examples, and that the documentation is complete and can be
processed. If the package contains code that needs to be compiled, try
to enable a reasonable amount of diagnostic messaging (“warnings”)
when compiling, such as e.g. -Wall -pedantic for tools from
GCC, the GNU Compiler Collection. (If R was not configured
accordingly, one can achieve this e.g. via PKG_CFLAGS and
related variables.)
Run R CMD build to make the release .tar.gz file.
Please ensure that you can run through the complete procedure with only
warnings that you understand and have reasons not to eliminate. In
principle, packages must pass R CMD check without warnings to be
admitted to the main CRAN package area.
When all the testing is done, upload the .tar.gz file, using
anonymous as log-in name and your e-mail address as password, to
ftp://cran.R-project.org/incoming/
(note: use ftp and not sftp to connect to this server) and
send a message to cran@r-project.org
about it. The CRAN maintainers will run these tests before
putting a submission in the main archive.
Note that the fully qualified name of the .tar.gz file must be of the form
package_version[_engine[_type]],
where the `[ ]' indicates that the enclosed component is optional, package and version are the corresponding entries in file DESCRIPTION, engine gives the S engine the package is targeted for and defaults to `R', and type indicates whether the file contains source or binaries for a certain platform, and defaults to `source'. I.e.,
OOP_0.1-3.tar.gz
OOP_0.1-3_R.tar.gz
OOP_0.1-3_R_source.tar.gz
are all equivalent and indicate an R source package, whereas
OOP_0.1-3_Splus6_sparc-sun-solaris.tar.gz
is a binary package for installation under Splus6 on the given platform.
This naming scheme has been adopted to ensure usability of code across S
engines. R code and utilities operating on package .tar.gz files
can only be assumed to work provided that this naming scheme is
respected. Of course, R CMD build automatically creates valid
file names.
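The naming scheme can be made concrete with a small parser. The function parse_pkg_file below is hypothetical (it is not part of R) and is only a sketch: it assumes that the components themselves contain no underscores and fills in the defaults described above.

```r
# Hypothetical helper: split a package file name into its components.
parse_pkg_file <- function(file) {
    stem  <- sub("\\.tar\\.gz$", "", basename(file))
    parts <- strsplit(stem, "_", fixed = TRUE)[[1]]
    list(package = parts[1],
         version = parts[2],
         engine  = if (length(parts) >= 3) parts[3] else "R",
         type    = if (length(parts) >= 4) parts[4] else "source")
}

str(parse_pkg_file("OOP_0.1-3.tar.gz"))
str(parse_pkg_file("OOP_0.1-3_Splus6_sparc-sun-solaris.tar.gz"))
```

Under these assumptions the three equivalent source file names listed above all parse to the same components.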
R has a name space management system for packages. This system allows the package writer to specify which variables in the package should be exported to make them available to package users, and which variables should be imported from other packages.
The current mechanism
for specifying a name space for a package is to place a NAMESPACE
file in the top level package directory. This file contains name
space directives describing the imports and exports of the name space.
Additional directives register any shared objects to be loaded and any
S3-style methods that are provided. Note that although the file looks
like R code (and often has R-style comments) it is not processed
as R code. Only very simple conditional processing of if
statements is implemented as of R 1.9.0.
Like other packages, packages with name spaces are loaded and attached
to the search path by calling library. Only the exported
variables are placed in the attached frame. Loading a package that
imports variables from other packages will cause these other packages to
be loaded as well (unless they have already been loaded), but they will
not be placed on the search path by these implicit loads.
Name spaces are sealed once they are loaded. Sealing means that imports and exports cannot be changed and that internal variable bindings cannot be changed. Sealing allows a simpler implementation strategy for the name space mechanism. Sealing also allows code analysis and compilation tools to accurately identify the definition corresponding to a global variable reference in a function body.
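The effect of sealing can be observed from R. The sketch below inspects the name space of the standard package stats using functions from base; it is meant as an illustration on a current version of R rather than as a description of how sealing is implemented.

```r
ns <- asNamespace("stats")

environmentIsLocked(ns)          # bindings cannot be added or removed
bindingIsLocked("median", ns)    # an existing binding cannot be reassigned
```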
Exports are specified using the export directive in the
NAMESPACE file. A directive of the form
export(f, g)
specifies that the variables f and g are to be exported.
(Note that variable names may be quoted, and non-standard names such as
[<-.fractions must be.)
For packages with many variables to export it may be more convenient to
specify the names to export with a regular expression using
exportPattern. The directive
exportPattern("^[^\\.]")
exports all variables that do not start with a period.
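Which names such a pattern selects can be checked in R before it is put into the NAMESPACE file. In the sketch below the vector of candidate names is invented for the example; the regular expression is the one from the directive above.

```r
# Hypothetical set of variables defined in a package.
candidates <- c("f", "g", ".internal", "print.foo", ".onLoad")

# Names that the pattern would export: those not starting with a period.
grep("^[^\\.]", candidates, value = TRUE)
```

Hook functions such as .onLoad start with a period and so are not exported by this pattern.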
All packages implicitly import the base name space. Variables from
other packages need to be imported explicitly using the directives
import and importFrom. The import directive
imports all exported variables from the specified package(s). Thus the
directive
import(foo, bar)
specifies that all exported variables in the packages foo and
bar are to be imported. If only some of the variables from a
package are needed, then they can be imported using importFrom.
The directive
importFrom(foo, f, g)
specifies that the exported variables f and g of the
package foo are to be imported.
If a package only needs one function from another package it can use a
fully qualified variable reference in the code instead of a formal
import. A fully qualified reference to the function f in package
foo is of the form foo::f. This is less efficient than a
formal import and also loses the advantage of recording all dependencies
in the NAMESPACE file, so this approach is usually not
recommended. Evaluating foo::f will cause package foo to
be loaded, but not attached, if it was not loaded already.
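The difference between loading and attaching can be seen with any installed package. The sketch below uses file_ext from the package tools as an arbitrary example of a fully qualified reference, and assumes a current version of R in which that function is available.

```r
search_before <- search()

# Fully qualified reference: loads the name space of tools if necessary.
ext <- tools::file_ext("mypkg_0.1-3.tar.gz")
ext

"tools" %in% loadedNamespaces()      # the name space is now loaded
identical(search(), search_before)   # but the search path is unchanged
```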
The standard method for S3-style UseMethod dispatching might fail
to locate methods defined in a package that is imported but not attached
to the search path. To ensure that these methods are available the
packages defining the methods should ensure that the generics are
imported and register the methods using S3method directives. If
a package defines a function print.foo intended to be used as a
print method for class foo, then the directive
S3method(print, foo)
ensures that the method is registered and available for UseMethod
dispatch. The function print.foo does not need to be exported.
Since the generic print is defined in base it does not need
to be imported explicitly. This mechanism is intended for use with
generics that are defined in a name space. Any methods for a generic
defined in a package that does not use a name space should be exported,
and the package defining and exporting the methods should be attached to
the search path if the methods are to be found.
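Outside a package, the effect of an S3method directive can be imitated with registerS3method from base. In the sketch below the method is defined only in a local environment, much like an unexported function in a name space, yet dispatch finds it once it has been registered. This is an illustration on a current version of R and a stand-in for the processing of a NAMESPACE file, not a replacement for the directive.

```r
local({
    # Defined only inside this local environment, like an unexported function.
    print.foo <- function(x, ...) cat("<a foo>\n")
    # Register it as the print method for class "foo".
    registerS3method("print", "foo", print.foo)
})

obj <- structure(list(), class = "foo")
print(obj)   # dispatches to the registered method
```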
There are a number of hooks that apply to packages with name spaces.
See help(".onLoad") for more details.
Packages with name spaces do not use the .First.lib function.
Since loading and attaching are distinct operations when a name space is
used, separate hooks are provided for each. These hook functions are
called .onLoad and .onAttach. They take the same
arguments as .First.lib; they should be defined in the name space
but not exported.
However, packages with name spaces do use the .Last.lib
function. There is also a hook .onUnload which is called when
the name space is unloaded (via a call to unloadNamespace) with
argument the full path to the directory in which the package was
installed. .onUnload should be defined in the name space and not
exported, but .Last.lib does need to be exported.
Packages are not likely to need .onAttach (except perhaps for a
start-up banner); code to set options and load shared objects should be
placed in a .onLoad function, or shared objects can be loaded with the
useDynLib directive described next.
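A minimal sketch of the two hooks might look as follows. The option name mypkg.verbose and the banner text are invented for the example; the arguments are the library directory and package name, as for .First.lib.

```r
# Called when the name space is loaded: set a default option if unset.
.onLoad <- function(libname, pkgname) {
    if (is.null(getOption("mypkg.verbose")))
        options(mypkg.verbose = FALSE)
}

# Called when the package is attached: at most a start-up banner.
.onAttach <- function(libname, pkgname) {
    message("This is ", pkgname)
}
```

Both functions would be defined in the package's R code and left out of the export directives.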
There can be one or more useDynLib directives which allow shared
objects that need to be loaded to be specified in the NAMESPACE
file. The directive
useDynLib(foo)
registers the shared object foo for loading with
library.dynam. Loading of registered object(s) occurs after the
package code has been loaded and before running the load hook function.
Packages that would only need a load hook function to load a shared
object can use the useDynLib directive instead.
User-level hooks are also available: see the help on function
setHook.
The useDynLib directive also accepts the names of the native
routines that are to be used in R via the .C, .Call,
.Fortran and .External interface functions. These are given as
additional arguments to the directive, for example,
useDynLib(foo, myRoutine, myOtherRoutine)
By specifying these names in the useDynLib directive, the
native symbols are resolved when the package is loaded and R variables
identifying these symbols are added to the package's name space with
these names. These can be used in the .C, .Call,
.Fortran and .External calls in place of the
name of the routine and the PACKAGE argument.
For instance, we can call the routine myRoutine from R
with the code
.Call(myRoutine, x, y)
rather than
.Call("myRoutine", x, y, PACKAGE = "foo")
There are at least two benefits to this approach. Firstly, the symbol lookup is done just once for each symbol rather than each time the routine is invoked. Secondly, this removes any ambiguity in resolving symbols that might be present in several compiled libraries. In particular, it allows for correctly resolving routines when different versions of the same package are loaded concurrently in the same R session.
In some circumstances, there will already be an R variable in the
package with the same name as a native symbol. For example, we may have
an R function in the package named myRoutine. In this case,
it is necessary to map the native symbol to a different R variable
name. This can be done in the useDynLib directive by using named
arguments. For instance, to map the native symbol name myRoutine
to the R variable myRoutine_sym, we would use
useDynLib(foo, myRoutine_sym = myRoutine, myOtherRoutine)
We could then call that routine from R using the command
.Call(myRoutine_sym, x, y)
Symbols without explicit names are assigned to the R variable with that name.
In some cases, it may be preferable not to create R variables in the
package's name space that identify the native routines. It may be too
costly to compute these for many routines when the package is loaded
if many of these routines are not likely to be used. In this case,
one can still perform the symbol resolution correctly using the DLL,
but do this each time the routine is called. Given a reference to the
DLL as an R variable, say dll, we can call the routine
myRoutine using the expression
.Call(dll$myRoutine, x, y)
The $ operator resolves the routine with the given name in the
DLL using a call to getNativeSymbol. This is the same
computation as above where we resolve the symbol when the package is
loaded. The only difference is that this is done each time in the case
of dll$myRoutine.
In order to use this dynamic approach (e.g., dll$myRoutine), one
needs the reference to the DLL as an R variable in the package. The
DLL can be assigned to a variable by using the variable =
dllName format used above for mapping symbols to R variables. For
example, if we wanted to assign the DLL reference for the DLL
foo in the example above to the variable myDLL, we would
use the following directive in the NAMESPACE file:
myDLL = useDynLib(foo, myRoutine_sym = myRoutine, myOtherRoutine)
Then, the R variable myDLL is in the package's name space and
available for calls such as myDLL$dynRoutine to access routines
that are not explicitly resolved at load time.
If the package has registration information (see Registering native routines), then we can use that directly rather than specifying the
list of symbols again in the useDynLib directive in the
NAMESPACE file. Each routine in the registration information is
specified by giving a name by which the routine is to be specified along
with the address of the routine and any information about the number and
type of the parameters. Using the .registration argument of
useDynLib, we can instruct the name space mechanism to create
R variables for these symbols. For example, suppose we have the
following registration information for a DLL named myDLL:
R_CMethodDef cMethods[] = {
   {"foo", &foo, 4, {REALSXP, INTSXP, STRSXP, LGLSXP}},
   {"bar_sym", &bar, 0},
   {NULL, NULL, 0}
};

R_CallMethodDef callMethods[] = {
   {"R_call_sym", &R_call, 4},
   {"R_version_sym", &R_version, 0},
   {NULL, NULL, 0}
};
Then, the directive in the NAMESPACE file
useDynLib(myDLL, .registration = TRUE)
causes the DLL to be loaded and also for the R variables foo,
bar_sym, R_call_sym and R_version_sym to be
defined in the package's name space.
Note that the names for the R variables are taken from the entry in the
registration information and do not need to be the same as the name of
the native routine. This allows the creator of the registration
information to map the native symbols to non-conflicting variable names
in R, e.g. R_version to R_version_sym for use in an R
function such as
R_version <- function()
{
.Call(R_version_sym)
}
More information about this symbol lookup, along with some approaches for customizing it, is available from http://www.omegahat.org/examples/RDotCall.
As an example consider two packages named foo and bar. The R code for package foo in file foo.R is
x <- 1
f <- function(y) c(x,y)
foo <- function(x) .Call("foo", x, PACKAGE="foo")
print.foo <- function(x, ...) cat("<a foo>\n")
Some C code defines a C function compiled into DLL foo (with an
appropriate extension). The NAMESPACE file for this package is
useDynLib(foo)
export(f, foo)
S3method(print, foo)
The second package bar has code file bar.R
c <- function(...) sum(...)
g <- function(y) f(c(y, 7))
h <- function(y) y+9
and NAMESPACE file
import(foo)
export(g, h)
Calling library(bar) loads bar and attaches its exports to
the search path. Package foo is also loaded but not attached to
the search path. A call to g produces
> g(6)
[1] 1 13
This is consistent with the definitions of c in the two settings:
in bar the function c is defined to be equivalent to
sum, but in foo the variable c refers to the
standard function c in base.
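The scoping behind this result can be reproduced without building the two packages by imitating the name spaces with plain environments. The sketch below is a simplification under stated assumptions: the environment names foo_ns, bar_imports and bar_ns are invented, the enclosures are reduced to the essentials (own definitions, then imports, then base), and the .Call-based function foo is left out.

```r
# Stand-in for the name space of foo: its definitions, enclosed by base.
foo_ns <- new.env(parent = baseenv())
evalq({
    x <- 1
    f <- function(y) c(x, y)
}, foo_ns)

# Stand-in for the imports of bar: the variable f taken from foo.
bar_imports <- new.env(parent = baseenv())
assign("f", get("f", envir = foo_ns), envir = bar_imports)

# Stand-in for the name space of bar, enclosed by its imports.
bar_ns <- new.env(parent = bar_imports)
evalq({
    c <- function(...) sum(...)
    g <- function(y) f(c(y, 7))
    h <- function(y) y + 9
}, bar_ns)

bar_ns$g(6)
```

Within g the call c(y, 7) uses the definition of c in bar_ns and gives 13, while within f the call c(x, y) finds no c in foo_ns and falls through to base, giving the vector 1 13.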
To summarize, converting an existing package to use a name space involves several simple steps:
export directives.
S3method declarations.
require calls by
import directives.
.First.lib functions with .onLoad functions or
useDynLib directives.
Some code analysis tools to aid in this process are currently under development.
Some additional steps are needed for packages which make use of formal
(S4-style) classes and methods (unless these are purely used
internally). There needs to be an .onLoad action to
ensure that the methods package is loaded and attached:
.onLoad <- function(lib, pkg) require(methods)
and any classes and methods which are to be exported need to be declared as such in the NAMESPACE file. For example, the now-defunct mle package had
importFrom(graphics, plot)
importFrom(stats, profile, confint)
exportClasses("mle", "profile.mle", "summary.mle")
exportMethods("confint", "plot", "profile", "summary", "show")
All formal classes need to be listed in an exportClasses
directive. All generics for which formal methods are defined need to be
declared in an exportMethods directive, and where the generics
are formed by taking over existing functions, those functions need to be
imported (explicitly unless they are defined in the base
namespace).
In addition, a package using classes and methods defined in another package needs to import them, with directives
importClassesFrom(package, ...)
importMethodsFrom(package, ...)
listing the classes and functions with methods respectively. Suppose we
had two small packages A and B with B using A.
Then they could have NAMESPACE files
export(f1, ng1)
exportMethods("[")
exportClasses(c1)
and
importFrom(A, ng1)
importClassesFrom(A, c1)
importMethodsFrom(A, f1)
export(f4, f5)
exportMethods(f6, "[")
exportClasses(c1, c2)
respectively.
R CMD check provides a basic set of checks, but often further
problems emerge when people try to install and use packages submitted to
CRAN – many of these involve compiled code. Here are some
further checks that you can do to make your package more portable.
gcc can be used
with options -Wall -pedantic to alert you to potential
problems. Do not be tempted to assume that these are pure pedantry: for
example R is regularly used on platforms where the C compiler does
not accept C++ comments.
long in C will be 32-bit
on most R platforms (including those used by the CRAN
maintainers), but 64-bit on many modern Unix and Linux platforms. It is
rather unlikely that the use of long in C code has been thought
through: if you need a longer type than int you should use a
configure test for a C99 type such as int_fast64_t (and failing
that, long long) and typedef your own type to be long or
long long, or use another suitable type (such as size_t).
Note that integer in FORTRAN corresponds to int
in C on all R platforms.
extern in all but one of the files.
nm -pg mypkg.so # or other extension such as `.sl' or `.dylib'
and checking if any of the symbols marked U is unexpected is a
good way to avoid this.
nm -pg), and to use unusual names, as
well as ensuring you have used the PACKAGE argument that R
CMD check checks for.
Now that diagnostic messages can be made available for translation, it is important to write them in a consistent style. Using the tools described in the next section to extract all the messages can give a useful overview of your consistency (or lack of it).
Some guidelines follow.
In R error messages do not construct a message with paste (such
messages will not be translated) but via multiple arguments to
stop or warning, or via gettextf.
sQuote or dQuote except where the argument is a
variable.
Conventionally single quotation marks are used for quotations such as
'ord' must be a positive integer, at most the number of knots
and double quotation marks when referring to an R character string such as
'format' must be "normal" or "short" - using "normal"
Since ASCII does not contain directional quotation marks, it
is best to use `'' and let the translator (including automatic
translation) use directional quotations where available. The range of
quotation styles is immense: unfortunately we cannot reproduce them in a
portable texinfo document. But as a taster, some languages use
`up' and `down' (comma) quotes rather than left or right quotes, and
some use guillemets (and some use what Adobe calls `guillemotleft' to
start and others use it to end).
library
if((length(nopkgs) > 0) && !missing(lib.loc)) {
    if(length(nopkgs) > 1)
        warning("libraries ",
                paste(sQuote(nopkgs), collapse = ", "),
                " contain no packages")
    else
        warning("library ", paste(sQuote(nopkgs)),
                " contains no package")
}
and was replaced by
if((length(nopkgs) > 0) && !missing(lib.loc)) {
    pkglist <- paste(sQuote(nopkgs), collapse = ", ")
    msg <- sprintf(ngettext(length(nopkgs),
                            "library %s contains no packages",
                            "libraries %s contain no packages"),
                   pkglist)
    warning(msg, domain=NA)
}
Note that it is much better to have complete clauses as here, since in another language one might need to say `There is no package in library %s' or `There are no packages in libraries %s'.
R 2.1.0 introduced mechanisms to translate the R- and C-level error and warning messages, and these have been implemented in package splines as a demonstration. These are only available if R is compiled with NLS support (which is requested by configure option --enable-nls, the default).
The procedures make use of msgfmt and xgettext which are
part of GNU gettext and this will need to be installed:
Windows users can find pre-compiled binaries at the GNU
archive mirrors and packaged with the poEdit package
(http://poedit.sourceforge.net/download.php#win32).
The process of enabling translations is
#include <R.h> /* to include Rconfig.h */
#ifdef ENABLE_NLS
#include <libintl.h>
#define _(String) dgettext ("pkg", String)
/* replace pkg as appropriate */
#else
#define _(String) (String)
#endif
_(...),
for example
error(_("'ord' must be a positive integer"));
xgettext --keyword=_ -o pkg.pot *.c
The file src/pkg.pot is the template file, and
conventionally this is shipped as po/pkg.pot. A translator
to another language makes a copy of this file and edits it (see the
gettext manual) to produce say ll.po, where ll
is the code for the language in which the translation is to be used.
(This file would be shipped in the po directory.) Next run
msgfmt on ll.po to produce ll.mo, and
copy that to inst/po/ll/LC_MESSAGES/pkg.mo. Now when
the package is loaded after installation it will look for translations
of its messages in the po/lang/LC_MESSAGES/pkg.mo file
for any language lang that matches the user's preferences (via the
setting of the LANGUAGE environment variable or from the locale
settings).
Mechanisms to support the automatic translation of R stop,
warning and message messages are in place, provided the
package has a namespace. They make use of message catalogs in the same
way as C-level messages, but using domain R-pkg rather than
pkg. Translation of character strings inside stop,
warning and message calls is automatically enabled, as
well as other messages enclosed in calls to gettext or
gettextf. (To suppress this, use argument domain=NA.)
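A minimal sketch of how these pieces might fit together at R level follows. The function check_ord is hypothetical and merely mirrors the C-level message used earlier; it assumes the code lives in a package with a namespace, so that the domain is found automatically.

```r
# Hypothetical argument check, reusing the message from the C-level example.
check_ord <- function(ord) {
    if (!is.numeric(ord) || length(ord) != 1L || is.na(ord) || ord < 1) {
        # gettextf() looks up the format string and then fills it in;
        # domain = NA stops stop() from trying to translate the result again.
        msg <- gettextf("'%s' must be a positive integer", "ord")
        stop(msg, domain = NA)
    }
    invisible(ord)
}
```

Building the message with gettextf and then passing domain=NA to stop avoids a second lookup of an already translated string.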
Tools to prepare the R-pkg.pot file are provided in package
tools: xgettext2pot will prepare a file from all strings
occurring inside gettext/gettextf, stop,
warning and message calls. Some of these are likely to be
spurious and so the file is likely to need manual editing.
The R function xgettext, also in package tools, extracts the actual calls and so is more useful when
tidying up error messages.
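As a rough illustration, the template could be generated along the following lines. The throwaway package directory, its single R file and the name "pkg" are assumptions made for this sketch; in practice the first argument would be the path to the real package sources.

```r
# Build a throwaway package skeleton (hypothetical name "pkg") with one R file.
pkgdir <- file.path(tempdir(), "pkg")
dir.create(file.path(pkgdir, "R"), recursive = TRUE, showWarnings = FALSE)
writeLines('f <- function(x) if (x < 0) stop("x must be non-negative")',
           file.path(pkgdir, "R", "f.R"))

# Collect the strings found in stop/warning/message/gettext calls
# into a template file for the R-level domain.
potfile <- file.path(tempdir(), "R-pkg.pot")
tools::xgettext2pot(pkgdir, potfile)

# Inspect the result; spurious entries would be removed by hand.
pot <- readLines(potfile)
```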
Translation of messages which might be singular or plural can be very
intricate: languages can have up to four different forms. The R
function ngettext provides an interface to the C function of the
same name, and will choose an appropriate singular or plural form for
the selected language depending on the value of its first argument
n.
Packages without namespaces will need to use domain="R-pkg"
explicitly in calls to stop, warning, message,
gettext/gettextf and ngettext.
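For a package without a namespace, a call with an explicit domain might look like the sketch below. The helper skipped_message and its messages are hypothetical, and "R-pkg" stands in for the package's own R-level domain.

```r
# Hypothetical helper for a package without a namespace: the domain is
# named explicitly because it cannot be deduced from the calling code.
skipped_message <- function(n) {
    fmt <- ngettext(n,
                    "%d file was skipped",
                    "%d files were skipped",
                    domain = "R-pkg")
    sprintf(fmt, as.integer(n))
}

skipped_message(1)
skipped_message(3)
```

When no catalog for the domain is installed, ngettext falls back to the English singular form for n equal to 1 and to the plural form otherwise.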
The DESCRIPTION file has an optional field Type which if
missing is assumed to be Package, the sort of extension discussed
so far in this chapter. Currently two other types are recognized, both
of which need write permission in the R installation tree.
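The defaulting rule for the Type field can be sketched in R as follows. The helper package_type is hypothetical and is not how R itself reads the field; it only illustrates the rule using read.dcf.

```r
# Hypothetical helper: report the extension type declared in a DESCRIPTION
# file, falling back to "Package" when the Type field is absent.
package_type <- function(desc_file) {
    fields <- read.dcf(desc_file, fields = "Type")
    type <- unname(fields[1L, "Type"])
    if (is.na(type)) "Package" else type
}

# Try it on a throwaway DESCRIPTION file written to a temporary location.
desc <- tempfile("DESCRIPTION")
writeLines(c("Package: Translation-it", "Type: Translation"), desc)
package_type(desc)
```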
Type Frontend is a rather general mechanism, designed for adding new front-ends
such as the gnomeGUI package. If a configure file is found
in the top-level directory of the package it is executed, and then if a
Makefile is found (often generated by configure),
make is called. If R CMD INSTALL --clean is used
make clean is called. No other action is taken.
R CMD build can package up this type of extension, but R
CMD check will check the type and skip it.
Conventionally, a translation package for language ll is called
Translation-ll and has Type: Translation. It needs
to contain the directories share/locale/ll and
library/pkgname/po/ll, or at least those for
which translations are available. The files .mo are installed in
the parallel places in the R installation tree.
For example, a package Translation-it might be prepared from an installed (and tested) version of R by
mkdir Translation-it
cd Translation-it
(cd $R_HOME; tar cf - share/locale/it library/*/po/it) | tar xf -
# the next step is not needed on Windows
msgfmt -c -o share/locale/it/LC_MESSAGES/RGui.mo $R_SRC_HOME/po/RGui-it.gmo
# create a DESCRIPTION file
cd ..
R CMD build Translation-it
It is probably appropriate to give the package a version number based on the version of R which has been translated. So the DESCRIPTION file might look like
Package: Translation-it
Type: Translation
Version: 2.2.1-1
Title: Italian Translations for R 2.2.1
Description: Italian Translations for R 2.2.1
Author: The translators
Maintainer: Some Body <somebody@some.where.net>
License: GPL Version 2 or later.