match.data function - RDocumentation (2024)

Description

match.data() and get_matches() create a data frame with additional variables for the distance measure, matching weights, and subclasses after matching. This dataset can be used to estimate treatment effects after matching or subclassification. get_matches() is most useful after matching with replacement; otherwise, match.data() is more flexible. See Details below for the difference between them.

Usage

match.data(object, group = "all", distance = "distance", weights = "weights",
           subclass = "subclass", data = NULL, include.s.weights = TRUE,
           drop.unmatched = TRUE)

get_matches(object, distance = "distance", weights = "weights",
            subclass = "subclass", id = "id", data = NULL,
            include.s.weights = TRUE)

Value

A data frame containing the data supplied in the data argument or in the original call to matchit(), with the computed output variables appended as additional columns, named according to the arguments above. For match.data(), the group and drop.unmatched arguments control whether only subsets of the data are returned. See Details for how match.data() and get_matches() differ. Note that get_matches() sorts the data by subclass and treatment status, unlike match.data(), which uses the order of the data.

The returned data frame will contain the variables in the original data set or dataset supplied to data and the following columns:

distance

The propensity score, if estimated or supplied to the distance argument in matchit() as a vector.

weights

The computed matching weights. These must be used in effect estimation to correctly incorporate the matching.

subclass

Matching strata membership. Units with the same value are in the same stratum.

id

The ID of each unit, corresponding to the row names in the original data or dataset supplied to data. Only included in get_matches() output. This column can be used to identify which rows belong to the same unit, since the same unit may appear multiple times if reused in matching with replacement.
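As a quick check of the columns listed above, the following minimal sketch compares the column names of each output with those of the input data. It assumes the MatchIt package is installed and uses its bundled lalonde data; the covariate set and the object names m.out, md, and gm are chosen only for illustration.

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# 1:1 nearest-neighbor matching without replacement (matchit() defaults);
# the covariates here are an illustrative choice
m.out <- matchit(treat ~ age + educ + married + re74 + re75, data = lalonde)

md <- match.data(m.out)   # one row per matched unit
gm <- get_matches(m.out)  # one row per unit per pair

# Columns appended by each function, named by the default arguments
setdiff(names(md), names(lalonde))
setdiff(names(gm), names(lalonde))
```

The id column should show up only in the get_matches() result.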

Arguments

object

a matchit object; the output of a call to matchit().

group

which group should comprise the matched dataset: "all" for all units, "treated" for just treated units, or "control" for just control units. Default is "all".

distance

a string containing the name that should be given to the variable containing the distance measure in the data frame output. Default is "distance", but "prop.score" or similar might be a good alternative if propensity scores were used in matching. Ignored if a distance measure was not supplied or estimated in the call to matchit().
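A short sketch of the group and distance arguments described above, again assuming MatchIt and its lalonde data are available. The model formula and the name md.treated are illustrative only.

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# Default matching uses a propensity score estimated by logistic regression
m.out <- matchit(treat ~ age + educ + married + re74 + re75, data = lalonde)

# Keep only the matched treated units and rename the distance column
md.treated <- match.data(m.out, group = "treated", distance = "prop.score")

table(md.treated$treat)              # only treated units remain
"prop.score" %in% names(md.treated)  # distance column carries the new name
```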

Details

match.data() creates a dataset with one row per unit. It will be identical to the dataset supplied except that several new columns will be added containing information related to the matching. When drop.unmatched = TRUE, the default, units with weights of zero, which are those units that were discarded by common support or the caliper or were simply not matched, will be dropped from the dataset, leaving only the subset of matched units.

The idea is for the output of match.data() to be used as the dataset input in calls to glm() or similar to estimate treatment effects in the matched sample. It is important to include the weights in the estimation of the effect and its standard error. The subclass column, when created, contains pair or subclass membership and should be used to estimate the effect and its standard error. Subclasses will only be included if there is a subclass component in the matchit object, which does not occur with matching with replacement, in which case get_matches() should be used. See vignette("estimating-effects") for information on how to use match.data() output to estimate effects.
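The behavior of drop.unmatched and the use of the weights in an outcome model can be sketched as follows. This is a minimal illustration under assumed choices (the lalonde data, an arbitrary covariate set, re78 as the outcome, and a plain lm() fit); it produces only a point estimate, and the vignette referenced above remains the source for appropriate standard errors.

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

m.out <- matchit(treat ~ age + educ + married + re74 + re75, data = lalonde)

# Default: unmatched units (weight 0) are dropped
md <- match.data(m.out)

# Keep every unit; unmatched units are retained with a weight of 0
md.all <- match.data(m.out, drop.unmatched = FALSE)

nrow(md.all) == nrow(lalonde)  # one row per original unit
all(md$weights > 0)            # only matched units remain in md

# Weighted outcome model on the matched sample (illustrative point estimate)
fit <- lm(re78 ~ treat, data = md, weights = weights)
coef(fit)["treat"]
```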

get_matches() is similar to match.data(); the primary difference occurs when matching is performed with replacement, i.e., when units do not belong to a single matched pair. In this case, the output of get_matches() will be a dataset that contains one row per unit for each pair they are a part of. For example, if matching was performed with replacement and a control unit was matched to two treated units, that control unit will have two rows in the output dataset, one for each pair it is a part of. Weights are computed for each row, and, for control units, are equal to the inverse of the number of control units in each control unit's subclass; treated units get a weight of 1. Unmatched units are dropped. An additional column with unit IDs will be created (named using the id argument) to identify when the same unit is present in multiple rows.

This dataset structure allows for the inclusion of both subclass membership and repeated use of units, unlike the output of match.data(), which lacks subclass membership when matching is done with replacement. A match.matrix component of the matchit object must be present to use get_matches(); in some forms of matching, it is absent, in which case match.data() should be used instead. See vignette("estimating-effects") for information on how to use get_matches() output to estimate effects after matching with replacement.
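The weighting rule and the repeated rows described above can be checked directly. The sketch below assumes MatchIt and its lalonde data, 2:1 matching with replacement, and no sampling weights; the helper names n.ctrl and expected are introduced here only for illustration.

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# 2:1 matching with replacement (illustrative settings)
m.out <- matchit(treat ~ age + educ + married + re74 + re75,
                 data = lalonde, replace = TRUE, ratio = 2)

gm <- get_matches(m.out)
md <- match.data(m.out)

# match.data() output has no subclass column after matching with replacement
"subclass" %in% names(md)

# Number of rows per unit; values above 1 indicate reused units
rows.per.unit <- table(gm$id)
head(sort(rows.per.unit, decreasing = TRUE))

# Rebuild the weights from the stated rule:
# treated rows get 1, control rows get 1 / (controls in their subclass)
n.ctrl <- ave(as.numeric(gm$treat == 0), gm$subclass, FUN = sum)
expected <- ifelse(gm$treat == 1, 1, 1 / n.ctrl)
all.equal(gm$weights, expected)
```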

Examples

library("MatchIt")
data("lalonde")

# 4:1 matching w/replacement
m.out1 <- matchit(treat ~ age + educ + married +
                    race + nodegree + re74 + re75,
                  data = lalonde, replace = TRUE,
                  caliper = .05, ratio = 4)

m.data1 <- match.data(m.out1, data = lalonde,
                      distance = "prop.score")
dim(m.data1) # one row per matched unit
head(m.data1, 10)

g.matches1 <- get_matches(m.out1, data = lalonde,
                          distance = "prop.score")
dim(g.matches1) # multiple rows per matched unit
head(g.matches1, 10)
