This vignette provides a minimal introduction to the
realestatebr package, showing how to use its core
functions. Since realestatebr returns tibbles
by default, we recommend using it together with the
dplyr package, though conversion to data.table
is trivial.
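As a minimal sketch of that conversion, assuming the data.table and tibble packages are installed: `as.data.table()` accepts any data frame, tibbles included. The toy tibble below only stands in for a dataset returned by the package; its column names and values are made up for illustration.

```r
library(data.table)

# Toy stand-in for a tibble returned by the package (made-up columns and values)
tbl <- tibble::tibble(
  date = as.Date(c("2023-01-31", "2023-02-28")),
  value = c(1.5, 2.5)
)

# as.data.table() returns a data.table copy and leaves `tbl` unchanged
dt <- as.data.table(tbl)
class(dt)
```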
The code below defines a common theme for all plots in this vignette. It is needed only to reproduce the plots exactly as shown; otherwise it is entirely optional and can be omitted.
library(ggplot2)

color_palette <- c(
  "#1E3A5F",
  "#DD6B20",
  "#2C7A7B",
  "#D69E2E",
  "#805AD5",
  "#C53030"
)

theme_series <- function() {
  theme_minimal(
    # swap for other font if needed
    base_family = "Avenir",
    base_size = 10
  ) +
    theme(
      plot.title = element_text(size = 16),
      panel.grid.minor = element_blank(),
      panel.grid.major.x = element_blank(),
      axis.line.x = element_line(color = "gray10", linewidth = 0.5),
      axis.ticks.x = element_line(color = "gray10", linewidth = 0.5),
      axis.title.x = element_blank(),
      legend.position = "bottom",
      palette.color.discrete = color_palette
    )
}

The goal of realestatebr is to provide a unified
interface to Brazilian real estate data from multiple public sources.
All datasets are returned as tidy tibble objects. The
package is centered around a key function:
get_dataset(name, table) which retrieves any dataset by
name. Without a table argument it returns the default
table; use table to select a specific sub-table.
get_dataset() is the main function to retrieve datasets.

library(realestatebr)

# Default table
abecip <- get_dataset("abecip")
# Specific table
sbpe <- get_dataset("abecip", table = "sbpe")

To explore which datasets are available, use
list_datasets() and get_dataset_info().

list_datasets() returns a catalogue of all available datasets and their tables.
get_dataset_info() shows available tables and metadata for a given dataset.

source Argument

The source argument of get_dataset()
controls where data comes from. The default ("auto") reads
the in-session memo if present, falls back to the package’s GitHub
release, and finally falls back to a fresh download from the original
source. Typically the default is fine. Use "github" to
force the pre-processed asset, or "fresh" to always pull
from the original source (slower but guaranteed up-to-date).
get_dataset("abecip", source = "github") # pre-processed asset from the GitHub release
get_dataset("abecip", source = "fresh") # direct from the original source

Repeated calls within one R session are served from an in-memory
memo, so fetching the same dataset twice does not re-download it. Use
clear_session_cache() to drop the memo without restarting
R.
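The lookup order described above can be sketched in base R. This is an illustration of the idea, not the package's implementation: `memo`, `fetch_github()`, `fetch_fresh()`, `resolve_dataset()`, and `clear_memo()` are hypothetical names, the fetchers are stubs that never touch the network, and storing forced `"github"` or `"fresh"` results in the memo is an assumption.

```r
# In-session memo: an environment keyed by dataset name
memo <- new.env(parent = emptyenv())

# Hypothetical fetchers standing in for the two remote sources.
# The GitHub stub fails on purpose so the fallback path is exercised.
fetch_github <- function(name) stop("release asset unavailable")
fetch_fresh <- function(name) data.frame(dataset = name, origin = "fresh")

resolve_dataset <- function(name, source = c("auto", "github", "fresh")) {
  source <- match.arg(source)

  # Step 1: "auto" serves from the memo when the dataset was already fetched
  if (source == "auto" && exists(name, envir = memo, inherits = FALSE)) {
    return(get(name, envir = memo))
  }

  data <- switch(
    source,
    github = fetch_github(name),
    fresh = fetch_fresh(name),
    # Steps 2 and 3: "auto" tries GitHub first, then the original source
    auto = tryCatch(fetch_github(name), error = function(e) fetch_fresh(name))
  )

  # Assumption: every successful fetch is stored in the memo
  assign(name, data, envir = memo)
  data
}

# Hypothetical counterpart of clear_session_cache(): empty the memo
clear_memo <- function() rm(list = ls(memo), envir = memo)

first <- resolve_dataset("abecip")   # GitHub stub fails, falls back to fresh
second <- resolve_dataset("abecip")  # served from the memo
identical(first, second)
```

Keeping the memo in an environment rather than a global list means it can be emptied in place, which is what makes a cache-clearing helper possible without restarting the session.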
SBPE (Sistema Brasileiro de Poupança e Empréstimo) is the primary
funding mechanism for residential mortgages in Brazil. The
sbpe table from abecip tracks deposits into and
withdrawals from savings accounts, which help finance real estate
construction and acquisition.
The plot below shows the annual net savings flow in recent years.
library(dplyr)

# Annual net savings flow
sbpe_annual <- sbpe |>
  filter(date >= as.Date("2019-01-01")) |>
  mutate(year = lubridate::year(date)) |>
  summarise(net_flow = sum(sbpe_netflow, na.rm = TRUE) / 1e3, .by = year) |>
  mutate(
    label_num = format(round(net_flow, 1)),
    ypos = if_else(net_flow > 0, net_flow + 10, net_flow - 10)
  )

ggplot(sbpe_annual, aes(year, net_flow)) +
  geom_col(fill = color_palette[1], alpha = 0.9, width = 0.8) +
  geom_text(aes(y = ypos, label = label_num), size = 3) +
  geom_hline(yintercept = 0) +
  scale_x_continuous(breaks = 2019:2026) +
  labs(
    title = "Annual Net Savings Flow (SBPE)",
    x = NULL,
    y = "R$ billions"
  ) +
  theme_series()

The companion table "units" contains monthly counts of
financed units.
The plot shows the number of units financed per month together with a LOESS trend line.

# SBPE units financed per month
units <- get_dataset("abecip", table = "units")

units_recent <- units |>
  filter(date >= as.Date("2019-01-01"))

ggplot(units_recent, aes(date, units_total)) +
  geom_point(alpha = 0.5, size = 0.8, color = color_palette[1]) +
  geom_smooth(
    color = color_palette[1],
    linewidth = 0.7,
    se = FALSE,
    method = stats::loess,
    method.args = list(span = 0.4)
  ) +
  scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
  labs(
    title = "Monthly Financed Units",
    y = "Units"
  ) +
  theme_series()

The bcb_realestate dataset imports all real estate
statistics from the Brazilian Central Bank. This is a relatively large
dataset, and exploring it can be cumbersome. Each observation is uniquely
identified by date and series_info. Helper
columns v1, v2, …, v5,
abbrev_state, category, and type
are provided to simplify the use of the dataset.
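To make the relationship between series_info and the helper fields concrete, here is a rough base R sketch that splits two identifiers on underscores. The seven-token layout (category, type, v1 to v5) is inferred from the filter example in this vignette; the second identifier is assumed from the "home-equity" credit line rather than copied from the data, and the package's actual parsing, including abbrev_state, may well differ.

```r
# First identifier appears in this vignette; the second is an assumed example
series_info <- c(
  "credito_estoque_carteira_credito_pf_sfh_br",
  "credito_estoque_carteira_credito_pf_home-equity_br"
)

# Assumed layout: seven underscore-separated tokens per identifier
fields <- c("category", "type", "v1", "v2", "v3", "v4", "v5")

parts <- strsplit(series_info, "_", fixed = TRUE)
stopifnot(all(lengths(parts) == length(fields)))

# One row per identifier, one column per token
helpers <- as.data.frame(do.call(rbind, parts))
names(helpers) <- fields
helpers$series_info <- series_info
helpers
```

Filtering on `helpers$v4 == "sfh"` then picks out the SFH line without writing a regular expression against the full identifier.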
The code below shows how to access a specific series and also how to fetch a group of related series.
bcb <- get_dataset("bcb_realestate")

# Get a specific series
sfh_pf <- bcb |>
  filter(series_info == "credito_estoque_carteira_credito_pf_sfh_br")

# Get all the related series for 'estoque_carteira_credito_pf'
credit_stock <- bcb |>
  filter(
    category == "credito",
    type == "estoque",
    v1 == "carteira",
    v2 == "credito",
    v3 == "pf",
    # v4 is not filtered, so all credit lines are kept
    v5 == "br"
  )

# The helper columns essentially separate the 'series_info' column, allowing
# for easier filtering. The filter above is equivalent to filtering by regex
credit_stock <- bcb |>
  filter(grepl(
    "(?<=credito_estoque_carteira_credito_pf_).+_br$",
    series_info,
    perl = TRUE
  ))

The single series shows only the values for SFH, one specific credit line.
ggplot(sfh_pf, aes(date, value / 1e9)) +
  geom_line(linewidth = 0.7, color = color_palette[1]) +
  labs(title = "SFH", y = "R$ (billions)") +
  theme_series()

The grouped series show the entire household credit stock by credit line.
credit_labels <- c(
  "Home Equity" = "home-equity",
  "Comercial" = "comercial",
  "Livre" = "livre",
  "FGTS" = "fgts",
  "SFH" = "sfh"
)

credit_stock <- credit_stock |>
  mutate(
    credit_line_label = factor(
      v4,
      levels = credit_labels,
      labels = names(credit_labels)
    )
  )

ggplot(credit_stock, aes(date, value / 1e9)) +
  geom_area(aes(fill = credit_line_label), alpha = 0.9) +
  scale_fill_manual(values = rev(color_palette[1:5])) +
  scale_x_date(expand = expansion(mult = c(0.01))) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.05))) +
  labs(
    title = "Real Estate Credit Stock",
    subtitle = "Household real estate credit stock (total debt) by credit line",
    y = "R$ (billions)",
    fill = NULL
  ) +
  theme_series()

As a final warning, note that the bcb_realestate dataset
dates each observation on the last day of the month, in
YYYY-MM-DD format (e.g. 2023-01-31). This can cause
issues when merging with other datasets, since dating observations on
the first day of the month is the more common convention (e.g. 2023-01-01).
To avoid this, use lubridate::floor_date(date, 'month').
Future versions of realestatebr might provide this as a
default behavior.
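The date mismatch and the fix described above can be illustrated with two toy tables. The values and the `value` and `other` column names are made up; only the month-end versus month-start convention mirrors the situation in the text. The sketch assumes dplyr and lubridate are installed.

```r
library(dplyr)

# Toy month-end series, mimicking the bcb_realestate date convention
month_end <- tibble(
  date = as.Date(c("2023-01-31", "2023-02-28", "2023-03-31")),
  value = c(10, 11, 12)
)

# Toy month-start series, as found in many other datasets
month_start <- tibble(
  date = as.Date(c("2023-01-01", "2023-02-01", "2023-03-01")),
  other = c(1, 2, 3)
)

# Joining on the raw dates finds no matching rows
unaligned <- inner_join(month_end, month_start, by = "date")
nrow(unaligned)

# Flooring to the first day of the month aligns the two series
aligned <- month_end |>
  mutate(date = lubridate::floor_date(date, "month")) |>
  inner_join(month_start, by = "date")
aligned
```

Flooring the month-end table, rather than moving the other table to month-end dates, avoids having to deal with month lengths and leap years.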
The available datasets are listed below.
| Dataset | Source | Tables | Status |
|---|---|---|---|
| abecip | ABECIP | sbpe, units, cgi | Active |
| abrainc | ABRAINC / FIPE | indicator, radar, leading | Active |
| bcb_realestate | Banco Central do Brasil | accounting, application, indices, sources, units | Active |
| bcb_series | Banco Central do Brasil | core, primary, secondary, tertiary, full | Active |
| fgv_ibre | FGV IBRE | — | Active |
| rppi | FIPE/ZAP, IVGR, IGMI, IQA, IVAR, SECOVI-SP | sale, rent, fipezap, ivgr, igmi, iqa, iqaiw, ivar, secovi_sp | Active |
| rppi_bis | Bank for International Settlements | selected, detailed_monthly, detailed_quarterly, detailed_annual, detailed_halfyearly | Active |
| secovi | SECOVI-SP | condo, rent, launch, sale | Active |