-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Effect of latent variable scaling on case influence #89
Comments
Thanks for the example. The differences make senses because the scaling choices should legitimately change the values of some parameters, including the population values if the model is true. Therefore, the standardized changes should also be different. This phenomenon affects only some models and so I think a By the way, even though the gCD values are similar, they are numerically different. This is interesting, though only theoretically because the differences are small. This suggests that gCD is not invariant to parameterization. This makes sense because it is a function of parameters. Maybe somebody has discussed this before? |
Thanks for sharing your thoughts, Shu Fai. Just to clarify, the difference is not just due to differences in population parameter values, but the choice of scaling methods. For example, if there are no latent variables, simply rescaling the observed variables lead to different parameter values (by some constants), but the parameter change is not affected: library(semfindr)
library(lavaan)
#> This is lavaan 0.6-15
#> lavaan is FREE software! Please report any bugs.
mod <-
"
m1 ~ iv1 + iv2
dv ~ m1
"
# Original data
fit <- sem(mod, pa_dat)
fit_rerun <- lavaan_rerun(fit)
#> The expected CPU time is 7.3 second(s).
#> Could be faster if run in parallel.
# New data with rescaled variables
pa_dat_s <- pa_dat
pa_dat_s$m1 <- pa_dat_s$m1 / 5
pa_dat_s$dv <- pa_dat_s$dv * 1.5
fit_s <- sem(mod, pa_dat_s)
fit_s_rerun <- lavaan_rerun(fit_s)
#> The expected CPU time is 2.7 second(s).
#> Could be faster if run in parallel.
# Coefficients are different
rbind(original = coef(fit), rescaled = coef(fit_s))
#> m1~iv1 m1~iv2 dv~m1 m1~~m1 dv~~dv
#> original 0.21487811 0.5216367 0.5172771 0.90294272 1.321043
#> rescaled 0.04297562 0.1043273 3.8795782 0.03611771 2.972346
identical(coef(fit), coef(fit_s))
#> [1] FALSE
# Standardized case influence is the same
fit_est_change <- est_change(fit_rerun)
print(fit_est_change, first = 3)
#>
#> -- Standardized Case Influence on Parameter Estimates --
#>
#> m1~iv1 m1~iv2 dv~m1 m1~~m1 dv~~dv gcd
#> 16 0.052 -0.038 -0.237 -0.004 0.624 0.450
#> 43 -0.403 -0.263 -0.135 0.223 0.120 0.302
#> 65 0.152 0.191 0.363 0.076 0.161 0.221
#>
#> Note:
#> - Changes are standardized raw changes if a case is included.
#> - Only the first 3 case(s) is/are displayed. Set 'first' to NULL to display all cases.
#> - Cases sorted by generalized Cook's distance.
fit_s_est_change <- est_change(fit_s_rerun)
print(fit_s_est_change, first = 3)
#>
#> -- Standardized Case Influence on Parameter Estimates --
#>
#> m1~iv1 m1~iv2 dv~m1 m1~~m1 dv~~dv gcd
#> 16 0.052 -0.038 -0.237 -0.004 0.624 0.450
#> 43 -0.403 -0.263 -0.135 0.223 0.120 0.302
#> 65 0.152 0.191 0.363 0.076 0.161 0.221
#>
#> Note:
#> - Changes are standardized raw changes if a case is included.
#> - Only the first 3 case(s) is/are displayed. Set 'first' to NULL to display all cases.
#> - Cases sorted by generalized Cook's distance.
all.equal(fit_est_change, fit_s_est_change,
check.attributes = FALSE)
#> [1] TRUE Created on 2023-06-25 with reprex v2.0.2 Using the same items as markers, but fixing the loadings to different values, also has no impact, even though it changes the population parameter values: library(semfindr)
library(lavaan)
#> This is lavaan 0.6-15
#> lavaan is FREE software! Please report any bugs.
## The famous Holzinger and Swineford (1939) example
## First loadings
HS.model <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9 '
fit_marker1 <- cfa(HS.model, data = HolzingerSwineford1939)
rerun_marker1 <- lavaan_rerun(fit_marker1)
#> The expected CPU time is 21.67 second(s).
#> Could be faster if run in parallel.
inf_marker1 <- influence_stat(rerun_marker1)
print(inf_marker1, what = "parameters", first = 5)
#>
#> -- Standardized Case Influence on Parameter Estimates --
#>
#> visual=~x2 visual=~x3 textual=~x5 textual=~x6 speed=~x8 speed=~x9 x1~~x1
#> 180 -0.010 0.032 0.023 0.021 0.238 0.152 0.040
#> 163 0.372 0.386 -0.207 0.156 -0.076 0.403 0.315
#> 262 -0.038 -0.055 -0.119 0.011 0.121 0.302 -0.076
#> 268 0.347 0.424 -0.023 -0.002 -0.003 0.026 0.713
#> 47 0.158 0.143 0.057 -0.160 0.072 0.166 0.093
#> x2~~x2 x3~~x3 x4~~x4 x5~~x5 x6~~x6 x7~~x7 x8~~x8 x9~~x9 visual~~visual
#> 180 -0.038 -0.075 -0.039 -0.065 0.078 0.086 0.518 -0.159 0.035
#> 163 -0.098 -0.181 -0.093 0.141 0.151 0.055 0.229 0.021 -0.254
#> 262 -0.020 0.054 0.019 0.397 -0.022 0.135 0.582 -0.282 0.028
#> 268 0.058 -0.116 0.012 -0.036 -0.038 -0.019 0.004 -0.037 -0.467
#> 47 0.006 -0.143 -0.040 0.057 0.060 0.123 0.056 -0.110 0.011
#> textual~~textual speed~~speed visual~~textual visual~~speed textual~~speed
#> 180 -0.008 -0.061 -0.138 0.215 -0.245
#> 163 0.205 -0.019 0.046 0.130 0.371
#> 262 -0.015 -0.169 0.049 -0.045 -0.026
#> 268 0.031 -0.027 -0.268 -0.096 0.005
#> 47 0.149 -0.132 -0.333 0.003 -0.099
#> gcd
#> 180 1.141
#> 163 1.047
#> 262 0.811
#> 268 0.644
#> 47 0.600
#>
#> Note:
#> - Changes are standardized raw changes if a case is included.
#> - Only the first 5 case(s) is/are displayed. Set 'first' to NULL to display all cases.
#> - Cases sorted by generalized Cook's distance.
## Second loadings
HS.model2 <- ' visual =~ 2 * x1 + x2 + x3
textual =~ 2 * x4 + x5 + x6
speed =~ 2 * x7 + x8 + x9 '
fit_marker2 <- cfa(HS.model2, data = HolzingerSwineford1939)
# Coefficients are different
rbind(head(coef(fit_marker1)),
head(coef(fit_marker2)))
#> visual=~x2 visual=~x3 textual=~x5 textual=~x6 speed=~x8 speed=~x9
#> [1,] 0.5535003 0.7293702 1.113077 0.9261462 1.179951 1.081530
#> [2,] 1.1069956 1.4587426 2.226153 1.8522922 2.359901 2.163062
rerun_marker2 <- lavaan_rerun(fit_marker2)
#> The expected CPU time is 9.93 second(s).
#> Could be faster if run in parallel.
inf_marker2 <- influence_stat(rerun_marker2)
print(inf_marker2, what = "parameters", first = 5)
#>
#> -- Standardized Case Influence on Parameter Estimates --
#>
#> visual=~x2 visual=~x3 textual=~x5 textual=~x6 speed=~x8 speed=~x9 x1~~x1
#> 180 -0.010 0.032 0.023 0.021 0.238 0.152 0.040
#> 163 0.372 0.386 -0.207 0.156 -0.076 0.403 0.315
#> 262 -0.038 -0.055 -0.120 0.011 0.121 0.302 -0.076
#> 268 0.347 0.424 -0.023 -0.002 -0.003 0.026 0.713
#> 47 0.158 0.143 0.057 -0.160 0.072 0.166 0.093
#> x2~~x2 x3~~x3 x4~~x4 x5~~x5 x6~~x6 x7~~x7 x8~~x8 x9~~x9 visual~~visual
#> 180 -0.038 -0.075 -0.039 -0.065 0.078 0.086 0.518 -0.159 0.035
#> 163 -0.098 -0.181 -0.093 0.141 0.151 0.055 0.229 0.021 -0.254
#> 262 -0.020 0.054 0.019 0.397 -0.022 0.135 0.582 -0.282 0.028
#> 268 0.058 -0.116 0.012 -0.036 -0.038 -0.019 0.004 -0.037 -0.467
#> 47 0.006 -0.143 -0.040 0.057 0.060 0.123 0.056 -0.110 0.011
#> textual~~textual speed~~speed visual~~textual visual~~speed textual~~speed
#> 180 -0.008 -0.061 -0.138 0.215 -0.245
#> 163 0.205 -0.019 0.046 0.130 0.371
#> 262 -0.015 -0.169 0.049 -0.045 -0.026
#> 268 0.031 -0.027 -0.268 -0.096 0.005
#> 47 0.149 -0.132 -0.333 0.003 -0.099
#> gcd
#> 180 1.141
#> 163 1.047
#> 262 0.811
#> 268 0.644
#> 47 0.600
#>
#> Note:
#> - Changes are standardized raw changes if a case is included.
#> - Only the first 5 case(s) is/are displayed. Set 'first' to NULL to display all cases.
#> - Cases sorted by generalized Cook's distance. Created on 2023-06-25 with reprex v2.0.2 I think these are not invariant when the ratio of the parameter value and the standard error is changed, which happens when using different methods of scaling the latent variables. I agree that a |
(@marklhc . I happened to be working on this issue too. Then I found that you posted a follow-up right before I were going to post the following. :) I post it anyway, just for our reference. I will study your post soon. Thanks a lot.) I found that your observation also applied to the choice of the reference indicators (the indicators with loadings fixed to 1): library(semfindr)
library(lavaan)
#> This is lavaan 0.6-15
#> lavaan is FREE software! Please report any bugs.
HS.model <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9 '
fit_marker <- cfa(HS.model, data = HolzingerSwineford1939)
rerun_marker <- lavaan_rerun(fit_marker)
#> The expected CPU time is 45.15 second(s).
#> Could be faster if run in parallel.
inf_marker <- influence_stat(rerun_marker)
print(inf_marker, what = "parameters", first = 5)
#>
#> -- Standardized Case Influence on Parameter Estimates --
#>
#> visual=~x2 visual=~x3 textual=~x5 textual=~x6 speed=~x8 speed=~x9 x1~~x1
#> 180 -0.010 0.032 0.023 0.021 0.238 0.152 0.040
#> 163 0.372 0.386 -0.207 0.156 -0.076 0.403 0.315
#> 262 -0.038 -0.055 -0.119 0.011 0.121 0.302 -0.076
#> 268 0.347 0.424 -0.023 -0.002 -0.003 0.026 0.713
#> 47 0.158 0.143 0.057 -0.160 0.072 0.166 0.093
#> x2~~x2 x3~~x3 x4~~x4 x5~~x5 x6~~x6 x7~~x7 x8~~x8 x9~~x9 visual~~visual
#> 180 -0.038 -0.075 -0.039 -0.065 0.078 0.086 0.518 -0.159 0.035
#> 163 -0.098 -0.181 -0.093 0.141 0.151 0.055 0.229 0.021 -0.254
#> 262 -0.020 0.054 0.019 0.397 -0.022 0.135 0.582 -0.282 0.028
#> 268 0.058 -0.116 0.012 -0.036 -0.038 -0.019 0.004 -0.037 -0.467
#> 47 0.006 -0.143 -0.040 0.057 0.060 0.123 0.056 -0.110 0.011
#> textual~~textual speed~~speed visual~~textual visual~~speed textual~~speed
#> 180 -0.008 -0.061 -0.138 0.215 -0.245
#> 163 0.205 -0.019 0.046 0.130 0.371
#> 262 -0.015 -0.169 0.049 -0.045 -0.026
#> 268 0.031 -0.027 -0.268 -0.096 0.005
#> 47 0.149 -0.132 -0.333 0.003 -0.099
#> gcd
#> 180 1.141
#> 163 1.047
#> 262 0.811
#> 268 0.644
#> 47 0.600
#>
#> Note:
#> - Changes are standardized raw changes if a case is included.
#> - Only the first 5 case(s) is/are displayed. Set 'first' to NULL to display all cases.
#> - Cases sorted by generalized Cook's distance.
HS.model.2 <- ' visual =~ NA*x1 + 1*x2 + x3
textual =~ NA*x4 + 1*x5 + x6
speed =~ NA*x7 + 1*x8 + x9 '
fit_marker_2 <- cfa(HS.model.2, data = HolzingerSwineford1939)
rerun_marker_2 <- lavaan_rerun(fit_marker_2)
#> The expected CPU time is 12.04 second(s).
#> Could be faster if run in parallel.
inf_marker_2 <- influence_stat(rerun_marker_2)
print(inf_marker_2, what = "parameters", first = 5)
#>
#> -- Standardized Case Influence on Parameter Estimates --
#>
#> visual=~x1 visual=~x3 textual=~x4 textual=~x6 speed=~x7 speed=~x9 x1~~x1
#> 180 0.010 0.036 -0.023 -0.002 -0.230 -0.087 0.040
#> 163 -0.348 -0.049 0.209 0.366 0.077 0.502 0.315
#> 262 0.038 -0.008 0.120 0.129 -0.119 0.185 -0.076
#> 268 -0.327 0.004 0.023 0.020 0.003 0.030 0.713
#> 47 -0.154 -0.039 -0.056 -0.216 -0.071 0.097 0.093
#> x2~~x2 x3~~x3 x4~~x4 x5~~x5 x6~~x6 x7~~x7 x8~~x8 x9~~x9 visual~~visual
#> 180 -0.038 -0.075 -0.039 -0.065 0.078 0.086 0.518 -0.159 0.009
#> 163 -0.098 -0.181 -0.093 0.141 0.151 0.055 0.229 0.021 0.285
#> 262 -0.020 0.055 0.019 0.397 -0.022 0.135 0.582 -0.282 -0.027
#> 268 0.058 -0.116 0.012 -0.036 -0.038 -0.019 0.004 -0.038 0.122
#> 47 0.006 -0.143 -0.040 0.057 0.060 0.123 0.056 -0.110 0.191
#> textual~~textual speed~~speed visual~~textual visual~~speed textual~~speed
#> 180 0.016 0.295 -0.108 0.317 -0.130
#> 163 -0.012 -0.143 0.275 0.343 0.298
#> 262 -0.137 -0.026 -0.021 0.000 0.008
#> 268 0.008 -0.038 0.049 0.160 -0.001
#> 47 0.210 -0.054 -0.115 0.159 -0.054
#> gcd
#> 180 1.152
#> 163 1.036
#> 262 0.806
#> 268 0.643
#> 47 0.632
#>
#> Note:
#> - Changes are standardized raw changes if a case is included.
#> - Only the first 5 case(s) is/are displayed. Set 'first' to NULL to display all cases.
#> - Cases sorted by generalized Cook's distance.
HS.model.3 <- ' visual =~ NA*x1 + NA*x2 + 1*x3
textual =~ NA*x4 + NA*x5 + 1*x6
speed =~ NA*x7 + NA*x8 + 1*x9 '
fit_marker_3 <- cfa(HS.model.3, data = HolzingerSwineford1939)
rerun_marker_3 <- lavaan_rerun(fit_marker_3)
#> The expected CPU time is 12.04 second(s).
#> Could be faster if run in parallel.
inf_marker_3 <- influence_stat(rerun_marker_3)
# Compare standardized changes and gCD valeus:
print(inf_marker, what = "parameters", first = 5)
#>
#> -- Standardized Case Influence on Parameter Estimates --
#>
#> visual=~x2 visual=~x3 textual=~x5 textual=~x6 speed=~x8 speed=~x9 x1~~x1
#> 180 -0.010 0.032 0.023 0.021 0.238 0.152 0.040
#> 163 0.372 0.386 -0.207 0.156 -0.076 0.403 0.315
#> 262 -0.038 -0.055 -0.119 0.011 0.121 0.302 -0.076
#> 268 0.347 0.424 -0.023 -0.002 -0.003 0.026 0.713
#> 47 0.158 0.143 0.057 -0.160 0.072 0.166 0.093
#> x2~~x2 x3~~x3 x4~~x4 x5~~x5 x6~~x6 x7~~x7 x8~~x8 x9~~x9 visual~~visual
#> 180 -0.038 -0.075 -0.039 -0.065 0.078 0.086 0.518 -0.159 0.035
#> 163 -0.098 -0.181 -0.093 0.141 0.151 0.055 0.229 0.021 -0.254
#> 262 -0.020 0.054 0.019 0.397 -0.022 0.135 0.582 -0.282 0.028
#> 268 0.058 -0.116 0.012 -0.036 -0.038 -0.019 0.004 -0.037 -0.467
#> 47 0.006 -0.143 -0.040 0.057 0.060 0.123 0.056 -0.110 0.011
#> textual~~textual speed~~speed visual~~textual visual~~speed textual~~speed
#> 180 -0.008 -0.061 -0.138 0.215 -0.245
#> 163 0.205 -0.019 0.046 0.130 0.371
#> 262 -0.015 -0.169 0.049 -0.045 -0.026
#> 268 0.031 -0.027 -0.268 -0.096 0.005
#> 47 0.149 -0.132 -0.333 0.003 -0.099
#> gcd
#> 180 1.141
#> 163 1.047
#> 262 0.811
#> 268 0.644
#> 47 0.600
#>
#> Note:
#> - Changes are standardized raw changes if a case is included.
#> - Only the first 5 case(s) is/are displayed. Set 'first' to NULL to display all cases.
#> - Cases sorted by generalized Cook's distance.
print(inf_marker_2, what = "parameters", first = 5)
#>
#> -- Standardized Case Influence on Parameter Estimates --
#>
#> visual=~x1 visual=~x3 textual=~x4 textual=~x6 speed=~x7 speed=~x9 x1~~x1
#> 180 0.010 0.036 -0.023 -0.002 -0.230 -0.087 0.040
#> 163 -0.348 -0.049 0.209 0.366 0.077 0.502 0.315
#> 262 0.038 -0.008 0.120 0.129 -0.119 0.185 -0.076
#> 268 -0.327 0.004 0.023 0.020 0.003 0.030 0.713
#> 47 -0.154 -0.039 -0.056 -0.216 -0.071 0.097 0.093
#> x2~~x2 x3~~x3 x4~~x4 x5~~x5 x6~~x6 x7~~x7 x8~~x8 x9~~x9 visual~~visual
#> 180 -0.038 -0.075 -0.039 -0.065 0.078 0.086 0.518 -0.159 0.009
#> 163 -0.098 -0.181 -0.093 0.141 0.151 0.055 0.229 0.021 0.285
#> 262 -0.020 0.055 0.019 0.397 -0.022 0.135 0.582 -0.282 -0.027
#> 268 0.058 -0.116 0.012 -0.036 -0.038 -0.019 0.004 -0.038 0.122
#> 47 0.006 -0.143 -0.040 0.057 0.060 0.123 0.056 -0.110 0.191
#> textual~~textual speed~~speed visual~~textual visual~~speed textual~~speed
#> 180 0.016 0.295 -0.108 0.317 -0.130
#> 163 -0.012 -0.143 0.275 0.343 0.298
#> 262 -0.137 -0.026 -0.021 0.000 0.008
#> 268 0.008 -0.038 0.049 0.160 -0.001
#> 47 0.210 -0.054 -0.115 0.159 -0.054
#> gcd
#> 180 1.152
#> 163 1.036
#> 262 0.806
#> 268 0.643
#> 47 0.632
#>
#> Note:
#> - Changes are standardized raw changes if a case is included.
#> - Only the first 5 case(s) is/are displayed. Set 'first' to NULL to display all cases.
#> - Cases sorted by generalized Cook's distance.
print(inf_marker_3, what = "parameters", first = 5)
#>
#> -- Standardized Case Influence on Parameter Estimates --
#>
#> visual=~x1 visual=~x2 textual=~x4 textual=~x5 speed=~x7 speed=~x8 x1~~x1
#> 180 -0.031 -0.035 -0.021 0.002 -0.149 0.088 0.040
#> 163 -0.364 0.050 -0.154 -0.358 -0.382 -0.470 0.315
#> 262 0.056 0.008 -0.011 -0.128 -0.290 -0.181 -0.076
#> 268 -0.399 -0.004 0.002 -0.020 -0.026 -0.030 0.713
#> 47 -0.140 0.039 0.161 0.218 -0.162 -0.096 0.093
#> x2~~x2 x3~~x3 x4~~x4 x5~~x5 x6~~x6 x7~~x7 x8~~x8 x9~~x9 visual~~visual
#> 180 -0.038 -0.075 -0.039 -0.065 0.078 0.086 0.518 -0.159 0.070
#> 163 -0.098 -0.181 -0.093 0.141 0.151 0.055 0.229 0.021 0.307
#> 262 -0.020 0.054 0.019 0.397 -0.022 0.135 0.582 -0.282 -0.050
#> 268 0.058 -0.116 0.012 -0.036 -0.038 -0.019 0.004 -0.037 0.172
#> 47 0.006 -0.143 -0.040 0.057 0.060 0.123 0.056 -0.110 0.198
#> textual~~textual speed~~speed visual~~textual visual~~speed textual~~speed
#> 180 0.014 0.146 -0.091 0.316 -0.170
#> 163 0.365 0.559 0.371 0.630 0.620
#> 262 -0.003 0.227 0.005 0.100 0.123
#> 268 0.029 0.005 0.064 0.193 0.017
#> 47 -0.019 0.082 -0.229 0.197 -0.053
#> gcd
#> 180 1.146
#> 163 1.041
#> 262 0.809
#> 268 0.643
#> 47 0.615
#>
#> Note:
#> - Changes are standardized raw changes if a case is included.
#> - Only the first 5 case(s) is/are displayed. Set 'first' to NULL to display all cases.
#> - Cases sorted by generalized Cook's distance.
# Check fit measures to verify that model fits are identical
all.equal(inf_marker[, c("chisq", "cfi")], inf_marker_2[, c("chisq", "cfi")])
#> [1] TRUE
all.equal(inf_marker[, c("chisq", "cfi")], inf_marker_3[, c("chisq", "cfi")])
#> [1] TRUE
# Compare gCD values
plot(inf_marker[, "gcd"], inf_marker_2[, "gcd"]); abline(a = 0, b =1) plot(inf_marker[, "gcd"], inf_marker_3[, "gcd"]); abline(a = 0, b =1) Created on 2023-06-26 with reprex v2.0.2 |
@marklhc , this article may be relevant: Gonzalez, R., & Griffin, D. (2001). Testing parameters in structural equation modeling: Every “one” matters. Psychological Methods, 6(3), 258–269. https://doi.org/10.1037//1082-989X.6.3.258 |
(Edit: 2023-06-26 21:42: Correct a careless-but-harmless mistake in randomly selecting cases. Incorrectly set An illustration based on Gonzalez and Griffin (2001). The target parameter is the factor covariance. Although different reference indicator will lead to different population value, the p-value for the null hypothesis that the factor covariance should be the same. This is the case if this hypothesis is tested by the likelihood ratio test, done at the end of the following illustration, as argued by Gonzalez and Griffin. However, this is not the case when the Wald test is used, as shown below: library(lavaan)
#> This is lavaan 0.6-15
#> lavaan is FREE software! Please report any bugs.
# Use a subset with smaller sample size such that
# the p-value differences are larger, for the ease
# of comparison.
set.seed(87015)
dat <- HolzingerSwineford1939[sample(301, 60), ]
mod1 <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6'
fit1 <- cfa(mod1, data = dat)
mod2 <- ' visual =~ NA*x1 + 1*x2 + x3
textual =~ x4 + x5 + x6'
fit2 <- cfa(mod2, data = dat)
mod3 <- ' visual =~ NA*x1 + NA*x2 + 1*x3
textual =~ x4 + x5 + x6'
fit3 <- cfa(mod3, data = dat)
# Verify that the model chi-squares are identical
all.equal(fitMeasures(fit1, "chisq"),
fitMeasures(fit2, "chisq"),
fitMeasures(fit3, "chisq"))
#> [1] TRUE
# However, the z values and p-values for the factor covariance
# are not the same:
parameterEstimates(fit1)[15, ]
#> lhs op rhs est se z pvalue ci.lower ci.upper
#> 15 visual ~~ textual 0.375 0.153 2.46 0.014 0.076 0.675
parameterEstimates(fit2)[15, ]
#> lhs op rhs est se z pvalue ci.lower ci.upper
#> 15 visual ~~ textual 0.198 0.106 1.866 0.062 -0.01 0.406
parameterEstimates(fit3)[15, ]
#> lhs op rhs est se z pvalue ci.lower ci.upper
#> 15 visual ~~ textual 0.227 0.12 1.89 0.059 -0.008 0.461
# Test the factor covariance by likelihood ratio test,
# as implied by Gonzalez and Griffin (2001)
mod1b <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
visual ~~ 0*textual'
fit1b <- cfa(mod1b, dat)
mod2b <- ' visual =~ NA*x1 + 1*x2 + x3
textual =~ x4 + x5 + x6
visual ~~ 0*textual'
fit2b <- cfa(mod2b, dat)
mod3b <- ' visual =~ NA*x1 + NA*x2 + 1*x3
textual =~ x4 + x5 + x6
visual ~~ 0*textual'
fit3b <- cfa(mod3b, dat)
# Verify that the model chi-squares are identical
all.equal(fitMeasures(fit1b, "chisq"),
fitMeasures(fit2b, "chisq"),
fitMeasures(fit3b, "chisq"))
#> [1] TRUE
# Naturally, the chi-square difference test
# of the factor covariance yield identical p-values
lavTestLRT(fit1, fit1b)
#>
#> Chi-Squared Difference Test
#>
#> Df AIC BIC Chisq Chisq diff RMSEA Df diff Pr(>Chisq)
#> fit1 8 1034.3 1061.6 19.281
#> fit1b 9 1038.4 1063.6 25.356 6.0745 0.29082 1 0.01371 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
lavTestLRT(fit2, fit2b)
#>
#> Chi-Squared Difference Test
#>
#> Df AIC BIC Chisq Chisq diff RMSEA Df diff Pr(>Chisq)
#> fit2 8 1034.3 1061.6 19.281
#> fit2b 9 1038.4 1063.6 25.356 6.0745 0.29082 1 0.01371 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
lavTestLRT(fit3, fit3b)
#>
#> Chi-Squared Difference Test
#>
#> Df AIC BIC Chisq Chisq diff RMSEA Df diff Pr(>Chisq)
#> fit3 8 1034.3 1061.6 19.281
#> fit3b 9 1038.4 1063.6 25.356 6.0745 0.29082 1 0.01371 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Created on 2023-06-26 with reprex v2.0.2 Given that the standardized changes are computed using the same sampling variances used in Wald test, they are also expected to be affected by the way to set the scale, including the reference indicators selected. P.S.: Not related to |
The p values for the standardized coefficients are invariant. library(lavaan)
#> This is lavaan 0.6-17
#> lavaan is FREE software! Please report any bugs.
set.seed(87015)
dat <- HolzingerSwineford1939[sample(301, 60), ]
mod1 <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6'
fit1 <- cfa(mod1, data = dat)
mod2 <- ' visual =~ NA*x1 + 1*x2 + x3
textual =~ x4 + x5 + x6'
fit2 <- cfa(mod2, data = dat)
mod3 <- ' visual =~ NA*x1 + NA*x2 + 1*x3
textual =~ x4 + x5 + x6'
fit3 <- cfa(mod3, data = dat)
# Verify that the model chi-squares are identical
all.equal(fitMeasures(fit1, "chisq"),
fitMeasures(fit2, "chisq"),
fitMeasures(fit3, "chisq"))
#> [1] TRUE
# The Wald test for the standardized coefficients are the same
standardizedSolution(fit1)[15, ]
#> lhs op rhs est.std se z pvalue ci.lower ci.upper
#> 15 visual ~~ textual 0.439 0.146 3.017 0.003 0.154 0.724
standardizedSolution(fit2)[15, ]
#> lhs op rhs est.std se z pvalue ci.lower ci.upper
#> 15 visual ~~ textual 0.439 0.146 3.017 0.003 0.154 0.724
standardizedSolution(fit3)[15, ]
#> lhs op rhs est.std se z pvalue ci.lower ci.upper
#> 15 visual ~~ textual 0.439 0.146 3.017 0.003 0.154 0.724 Created on 2024-02-21 with reprex v2.1.0 |
This is an observation, but I think users may also want to know about this. When latent variables are present, the computation of the standard errors and thus the test statistics are dependent on the way that the latent variables are scaled. This is probably the reason we also see different results when running
influence_stat()
on models with the latent variables scaled with (a) fixing the first loadings to one and (b) fixing the variances of the latent variables to one. Here is an example:Created on 2023-06-24 with reprex v2.0.2
While the two equivalent models have different sets of parameters, for parameters that are common and involve the latent variables, some of the values are quite different. The good news, at least for this example, is that the gCD values are similar.
What do you think about this? Should we print a message or write a vignette to inform users that standardized case influence could depend on scaling choices?
The text was updated successfully, but these errors were encountered: