We start by downloading the Chilean Census 2017 from the ECLAC website (link valid as of 2024-10-01):
url <- "https://redatam.org/cdr/descargas/censos/poblacion/CP2017CHL.zip"
zip <- "CP2017CHL.zip"

# download the file only if it is not already in the working directory
if (!file.exists(zip)) {
  download.file(url, zip, method = "wget")
}
Now we can extract the files:
# install.packages("archive")
dout <- basename(zip)
dout <- sub("\\.zip$", "", dout)
archive::archive_extract(zip, dir = dout)
You can use unzip() from base R, but this file in particular gave me an error. The archive package is a wrapper around ‘libarchive’ that provides multi-format archive and compression support.
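Because the dictionary file may sit in a nested folder, it can help to list what was extracted before reading anything. The following is a minimal sketch in base R; find_dictionaries() is a hypothetical helper name introduced here for illustration and is not part of the archive or redatam packages.

```r
# Hypothetical helper: list REDATAM dictionary files (.dic or .dicx)
# found anywhere below `dir`, ignoring upper/lower case in the extension.
find_dictionaries <- function(dir) {
  list.files(
    dir,
    pattern = "\\.dicx?$",
    recursive = TRUE,
    full.names = TRUE,
    ignore.case = TRUE
  )
}

# Usage, assuming the archive was extracted to the "CP2017CHL" directory
find_dictionaries("CP2017CHL")
```

If the directory does not exist or contains no dictionary, the result is an empty character vector rather than an error.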
The REDATAM files are now stored in the CP2017CHL directory. We can read the REDATAM dictionary file (DIC, or DICX in this particular case):
library(redatam)

fout <- "chile2017.rds"

# read the REDATAM files once and cache the result as RDS
if (!file.exists(fout)) {
  chile2017 <- read_redatam("CP2017CHL/BaseOrg16/CPV2017-16.dicx")
  saveRDS(chile2017, fout)
} else {
  chile2017 <- readRDS(fout)
}
One of the many possibilities with this census is to obtain the number of overcrowded dwellings. For this, the Ministry of Social Development and Family (Ministerio de Desarrollo Social y Familia) divides the number of people residing in a dwelling by the number of bedrooms in the dwelling, with the special case of adding one to the bedroom count for studio apartments and similar units. The result is discretized into four categories: no overcrowding (below 2.5), mean (2.5 to below 3.5), high (3.5 to below 5) and critical (5 or more).
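To make the arithmetic concrete, here is a small sketch in base R on invented numbers. The helper names overcrowding_index() and overcrowding_category() are hypothetical, the toy values are not census records, and the cut points are the ones used in the classification code of this post.

```r
# Hypothetical helper: overcrowding index for one or more dwellings.
# persons  = people residing in the dwelling
# bedrooms = number of bedrooms (0 for studio apartments and similar units)
overcrowding_index <- function(persons, bedrooms) {
  # dwellings without a separate bedroom are treated as having one
  persons / ifelse(bedrooms == 0, 1, bedrooms)
}

# Hypothetical helper: discretize the index with left-closed intervals
# [0, 2.5), [2.5, 3.5), [3.5, 5), [5, Inf)
overcrowding_category <- function(index) {
  cut(
    index,
    breaks = c(0, 2.5, 3.5, 5, Inf),
    labels = c("No Overcrowding", "Mean", "High", "Critical"),
    right = FALSE
  )
}

# Toy data, invented for illustration
persons  <- c(2, 3, 4, 6)
bedrooms <- c(2, 0, 1, 1)

idx <- overcrowding_index(persons, bedrooms)
data.frame(persons, bedrooms, index = idx, category = overcrowding_category(idx))
```

The second toy dwelling has no bedroom, so its three residents are divided by one instead of zero.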
According to the census documentation in the previous ZIP file, this consists in dividing the variable cant_per by p04, both from the vivienda (housing) table, and then discretizing the result. The documentation also states that we must join the vivienda table with zonaloc (zones), area, distrito (district) and comuna (municipality) to match each house with its corresponding municipality. This can be done with dplyr:
library(dplyr)

# join from the municipality level down to the dwelling level
overcrowding <- chile2017$comuna %>%
  select(ncomuna, comuna_ref_id) %>%
  inner_join(
    chile2017$distrito %>%
      select(distrito_ref_id, comuna_ref_id)
  ) %>%
  inner_join(
    chile2017$area %>%
      select(area_ref_id, distrito_ref_id)
  ) %>%
  inner_join(
    chile2017$zonaloc %>%
      select(zonaloc_ref_id, area_ref_id)
  ) %>%
  inner_join(
    chile2017$vivienda %>%
      select(zonaloc_ref_id, vivienda_ref_id, cant_per, p04) %>%
      # codes 98 and 99 in p04 are treated as missing and dropped
      mutate(
        p04 = case_when(
          p04 == 98 ~ NA_integer_,
          p04 == 99 ~ NA_integer_,
          TRUE ~ p04
        )
      ) %>%
      filter(!is.na(p04))
  ) %>%
  # people per bedroom, adding one bedroom when p04 is zero
  mutate(
    overcrowding = case_when(
      p04 >= 1 ~ cant_per / p04,
      p04 == 0 ~ cant_per / (p04 + 1)
    )
  ) %>%
  mutate(
    overcrowding_discrete = case_when(
      overcrowding < 2.5 ~ "No Overcrowding",
      overcrowding >= 2.5 & overcrowding < 3.5 ~ "Mean",
      overcrowding >= 3.5 & overcrowding < 5 ~ "High",
      overcrowding >= 5 ~ "Critical"
    )
  ) %>%
  # number of dwellings per municipality and category
  group_by(comuna = ncomuna, overcrowding_discrete) %>%
  count()
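The chain of joins relies on inner_join() matching on whatever columns two tables share. As a minimal sketch of that behavior, the example below uses two invented tables that only mimic the comuna and distrito levels; the values are made up, and the explicit by argument is an addition for clarity rather than something the code above uses.

```r
library(dplyr)

# Toy tables, invented for illustration
comuna <- data.frame(
  comuna_ref_id = c(1L, 2L),
  ncomuna = c("A", "B")
)
distrito <- data.frame(
  distrito_ref_id = c(10L, 11L, 12L),
  comuna_ref_id = c(1L, 1L, 2L)
)

# Without `by`, inner_join() joins on all shared columns and prints a
# message naming them; passing `by` makes the key explicit.
joined <- comuna %>%
  inner_join(distrito, by = "comuna_ref_id")

joined

# number of districts per municipality in the toy data
joined %>%
  count(ncomuna)
```

Naming the key in each join is optional here, but it guards against an accidental match on a column that two tables happen to share.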
Now we can filter for any municipality of interest, for example:
overcrowding %>%
  filter(comuna == "VITACURA")

overcrowding %>%
  filter(comuna == "LA PINTANA")
# A tibble: 4 × 3
# Groups:   comuna, overcrowding_discrete [4]
  comuna   overcrowding_discrete     n
  <fct>    <chr>                 <int>
1 VITACURA Critical                  9
2 VITACURA High                     18
3 VITACURA Mean                    174
4 VITACURA No Overcrowding       26752

# A tibble: 4 × 3
# Groups:   comuna, overcrowding_discrete [4]
  comuna     overcrowding_discrete     n
  <fct>      <chr>                 <int>
1 LA PINTANA Critical                497
2 LA PINTANA High                   1112
3 LA PINTANA Mean                   4522
4 LA PINTANA No Overcrowding       39163
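
Raw counts are hard to compare across municipalities of different sizes, so one possible next step is to express each category as a share of the dwellings in its municipality. The sketch below uses base R, and the counts are typed in by hand from the two tables above rather than taken from the overcrowding object.

```r
# Counts copied from the two tables above
counts <- data.frame(
  comuna = rep(c("VITACURA", "LA PINTANA"), each = 4),
  overcrowding_discrete = rep(
    c("Critical", "High", "Mean", "No Overcrowding"),
    times = 2
  ),
  n = c(9, 18, 174, 26752, 497, 1112, 4522, 39163)
)

# total classified dwellings per municipality, repeated on each row
counts$total <- ave(counts$n, counts$comuna, FUN = sum)

# percentage of dwellings in each category within its municipality
counts$share_pct <- round(100 * counts$n / counts$total, 2)

counts
```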