| penguins {datasets} | R Documentation |
Measurements of Penguins near Palmer Station, Antarctica
Description
Data on adult penguins covering three species found on three islands in the Palmer Archipelago, Antarctica, including their size (flipper length, body mass, bill dimensions), and sex.
The columns of penguins are a subset of the more extensive
penguins_raw data frame, which includes nesting observations
and blood isotope data. There are differences in the column names
and data types. See the ‘Format’ section for details.
Usage
penguins
penguins_raw
Format
penguins is a data frame with 344 rows and 8 variables:
- species: factor, with levels Adelie, Chinstrap, and Gentoo
- island: factor, with levels Biscoe, Dream, and Torgersen
- bill_len: numeric, bill length (millimeters)
- bill_dep: numeric, bill depth (millimeters)
- flipper_len: integer, flipper length (millimeters)
- body_mass: integer, body mass (grams)
- sex: factor, with levels female and male
- year: integer, study year: 2007, 2008, or 2009
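As a quick check of the layout described above, the following sketch inspects the dimensions, column classes, and factor levels of penguins. It assumes R 4.5.0 or later, where penguins ships with the datasets package, and uses only base R functions.

```r
# Assumes R >= 4.5.0, where 'penguins' is available from the datasets package.
dim(penguins)                  # number of rows and columns
sapply(penguins, class)        # class of each column
levels(penguins$species)       # levels of the species factor
levels(penguins$island)        # levels of the island factor
colSums(is.na(penguins))       # count of missing values per column
```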
penguins_raw is a data frame with 344 rows and 17 variables.
8 columns correspond to columns in penguins,
though with different variable names and/or classes:
- Species: character
- Island: character
- Culmen Length (mm): numeric, bill length
- Culmen Depth (mm): numeric, bill depth
- Flipper Length (mm): numeric, flipper length
- Body Mass (g): numeric, body mass
- Sex: character
- Date Egg: Date, when study nest observed with 1 egg. The year component is the year column in penguins
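The correspondence between the two data frames can be explored with a short sketch like the one below. It assumes R 4.5.0 or later and that the rows of penguins and penguins_raw are in the same order; raw_year is a name introduced here purely for illustration.

```r
# Assumes R >= 4.5.0 and that both data frames list the penguins in the same row order.
# 'raw_year' is a hypothetical helper variable, not part of either data set.
raw_year <- as.integer(format(penguins_raw[["Date Egg"]], "%Y"))
all(raw_year == penguins$year)   # does the year of 'Date Egg' match 'year'?

# Numeric measurements: compare values, then the storage classes
all.equal(penguins_raw[["Culmen Length (mm)"]], penguins$bill_len)
class(penguins_raw[["Flipper Length (mm)"]])
class(penguins$flipper_len)

# Character column in the raw data versus the factor in 'penguins'
table(raw = penguins_raw$Species, short = penguins$species)
```

Column names that contain spaces or parentheses are not syntactic R names, so they are accessed here with [[ ]] and a quoted string.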
There are 9 further columns in penguins_raw:
- studyName: character, expedition during which the data was collected
- Sample Number: numeric, continuous numbering sequence for each sample
- Region: character, the region of Palmer LTER sampling grid
- Stage: character, denoting reproductive stage at sampling
- Individual ID: character, unique ID for each individual in dataset
- Clutch Completion: character, if the study nest was observed with a full clutch, i.e., 2 eggs
- Delta 15 N (o/oo): numeric, the ratio of stable isotopes 15N:14N
- Delta 13 C (o/oo): numeric, the ratio of stable isotopes 13C:12C
- Comments: character, additional relevant information
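Because several of these columns have non-syntactic names, it can be convenient to copy them into a small data frame with shorter names before summarizing. The following is a minimal sketch, assuming R 4.5.0 or later; iso and the names species, d15N, and d13C are illustrative choices, not part of the data sets.

```r
# Assumes R >= 4.5.0. 'iso' and its short column names are hypothetical, for illustration.
iso <- penguins_raw[, c("Species", "Delta 15 N (o/oo)", "Delta 13 C (o/oo)")]
names(iso) <- c("species", "d15N", "d13C")

# Mean isotope values per species; the formula method drops rows with missing values
aggregate(cbind(d15N, d13C) ~ species, data = iso, FUN = mean)

# Backticks allow direct access to a non-syntactic column name
table(penguins_raw$`Clutch Completion`, useNA = "ifany")
```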
Details
Gorman, Williams, and Fraser (2014) used the data to study sex dimorphism separately for the three species.
Horst, Presmanes Hill, and Gorman (2022) popularized the data as
an illustration for different statistical methods, as an alternative
to the iris data.
Kaye, Turner, Gorman, Horst, and Hill (2025) provide the scripts used to create these data sets from the original source data, and a notebook reproducing results from Gorman et al. (2014).
Note
These data sets are also available in the palmerpenguins package. See the package website for further details and resources.
The penguins data has some shorter variable names than the palmerpenguins version,
for compact code and data display.
Source
- Adélie penguins: Palmer Station Antarctica LTER, Gorman K (2020). “Structural size measurements and isotopic signatures of foraging among adult male and female Adélie penguins (Pygoscelis adeliae) nesting along the Palmer Archipelago near Palmer Station, 2007-2009.” doi:10.6073/pasta/98b16d7d563f265cb52372c8ca99e60f.
- Gentoo penguins: Palmer Station Antarctica LTER, Gorman K (2020). “Structural Size Measurements and Isotopic Signatures of Foraging among Adult Male and Female Gentoo Penguin (Pygoscelis papua) Nesting along the Palmer Archipelago near Palmer Station, 2007-2009.” doi:10.6073/pasta/7fca67fb28d56ee2ffa3d9370ebda689.
- Chinstrap penguins: Palmer Station Antarctica LTER, Gorman K (2020). “Structural Size Measurements and Isotopic Signatures of Foraging among Adult Male and Female Chinstrap Penguin (Pygoscelis antarctica) Nesting along the Palmer Archipelago near Palmer Station, 2007-2009.” doi:10.6073/pasta/c14dfcfada8ea13a17536e73eb6fbe9e.
The title naming convention for the source for the Gentoo and Chinstrap data is the same as for Adélie penguins.
References
Gorman KB, Williams TD, Fraser WR (2014). “Ecological Sexual Dimorphism and Environmental Variability within a Community of Antarctic Penguins (Genus Pygoscelis).” PLoS ONE, 9(3), e90081. doi:10.1371/journal.pone.0090081.
Henderson HV, Velleman PF (1981). “Building Multiple Regression Models Interactively.” Biometrics, 37(2), 391–411. doi:10.2307/2530428.
Horst AM, Presmanes Hill A, Gorman KB (2022). “Palmer Archipelago Penguins Data in the palmerpenguins R Package - An Alternative to Anderson’s Irises.” The R Journal, 14(1), 244–254. doi:10.32614/RJ-2022-020.
Kaye E, Turner H, Gorman KB, Horst A, Hill A (2025). “Preparing the Palmer Penguins Data for the datasets Package in R.” doi:10.5281/zenodo.14902740.
Examples
## view summaries
summary(penguins)
summary(penguins_raw) # not useful for character vectors
## convert character vectors to factors first
dFactor <- function(dat) {
  dat[] <- lapply(dat, \(.) if (is.character(.)) as.factor(.) else .)
  dat
}
summary(dFactor(penguins_raw))
## visualise distribution across factors
plot(island ~ species, data = penguins)
plot(sex ~ interaction(island, species, sep = "\n"), data = penguins)
## bill depth vs. length by species (color) and sex (symbol):
## positive correlations for all species, males tend to have bigger bills
sym <- c(1, 16)
pal <- c("darkorange","purple","cyan4")
plot(bill_dep ~ bill_len, data = penguins, pch = sym[sex], col = pal[species])
## simplified sex dimorphism analysis for Adelie species:
## proportion of males increases with several size measurements
adelie <- subset(penguins, species == "Adelie")
plot(sex ~ bill_len, data = adelie)
plot(sex ~ bill_dep, data = adelie)
plot(sex ~ body_mass, data = adelie)
m <- glm(sex ~ bill_len + bill_dep + body_mass, data = adelie, family = binomial)
summary(m)
## Produce the long variable names as from {palmerpenguins} pkg:
long_nms <- sub("len", "length_mm",
                sub("dep", "depth_mm",
                    sub("mass", "mass_g", colnames(penguins))))
## compare long and short names:
noquote(rbind(long_nms, nms = colnames(penguins)))
## Not run: # << keeping shorter 'penguins' names in this example:
colnames(penguins) <- long_nms
## End(Not run)