This vignette covers all topics concerned with downloading resources from a server in some depth. If you are interested in a quick overview, please have a look at the fhircrackr:intro vignette.
Before running any of the following code, you need to load the
fhircrackr
package:
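A minimal way to do this, assuming the package has already been installed (for example from CRAN):

```r
# install.packages("fhircrackr")  # one-time installation, assuming the CRAN release
library(fhircrackr)
```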
To download data from a FHIR server, you need to specify which resources you want to get with a FHIR search request. You can define your search request as a simple string that you provide to fhir_search(). In that case, however, resource types are not checked for spelling mistakes and no URL encoding is done for you. If you are comfortable with this, you can skip the following paragraphs; the first part of this vignette introduces the basics of FHIR search and some functions to build valid FHIR search requests with fhircrackr.
A FHIR search request will mostly have the form
[base]/[type]?parameter(s)
, where [base]
is
the base URL to the FHIR server you are trying to access,
[type]
refers to the type of resource you are looking for
and parameter(s)
characterize specific properties those
resources should have. The function fhir_url()
offers a
solution to bring those three components together correctly, taking care
of proper formatting.
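To make the three components concrete, the following base R sketch glues them together by hand. It only illustrates the shape of the resulting string and is not how fhir_url() is implemented; the variable names are chosen for this example.

```r
# Components of a FHIR search request (example values)
base <- "http://hapi.fhir.org/baseR4"
type <- "Patient"
parameters <- c(gender = "female", birthdate = "lt2000-01-01")

# Percent-encode each value, then join the key=value pairs with "&"
encoded <- vapply(parameters, utils::URLencode, character(1), reserved = TRUE)
query <- paste0(names(encoded), "=", encoded, collapse = "&")

# [base]/[type]?parameter(s)
request_string <- paste0(base, "/", type, "?", query)
request_string
```

In practice fhir_url() is the more convenient choice, because it also checks the resource type.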
In the simplest case, fhir_url()
takes only the base url
and the resource type you are looking for like this:
fhir_url(url = "http://hapi.fhir.org/baseR4", resource = "Patient")
# An object of class "fhir_url"
# [1] "http://hapi.fhir.org/baseR4/Patient"
Internally, the function fhir_resource_type() is called to check the type you provided against the list of all currently available resource types, which can be found at https://hl7.org/FHIR/resourcelist.html. Capitalization errors are corrected automatically, and the function throws a warning if the resource type doesn't match the list under hl7.org:
fhir_resource_type(string = "Patient") #correct
# A fhir_resource_type object: Patient
fhir_resource_type(string = "medicationstatement") #fixed
# Changing resource type "medicationstatement" into "MedicationStatement".
# A fhir_resource_type object: MedicationStatement
fhir_resource_type(string = "medicationstatement", fix_capitalization = FALSE) #not fixed
# A fhir_resource_type object: medicationstatement
fhir_resource_type(string = "Hospital") #an unknown resource type, a warning is issued
# A fhir_resource_type object: Hospital
# Warning:
# In fhir_resource_type("Hospital") :
# You gave "Hospital" as the resource type.
# This doesn't match any of the resource types defined under
# https://hl7.org/FHIR/resourcelist.html.
# If you are sure the resource type is correct anyway, you can ignore this warning.
Besides telling the server which resource type to give back, the resource type also determines the kinds of search parameters that are allowed. Search parameters are used to further qualify the resources you want to download, e.g. by restricting the search result to Patient resources of female patients only.
You can add several parameters to the search request. If you don't give any parameters, the search will return all resources of the specified type from the server (unless the download is explicitly limited by the argument max_bundles of fhir_search()). Search parameters generally come in the form key = value. There are also a number of resource independent parameters that can be found under https://www.hl7.org/fhir/search.html#Summary. These parameters usually have a _ at the beginning. "_sort" = "status", for example, sorts the results by their status, and "_include" = "Observation:patient" includes the linked Patient resources in a search for Observation resources.
Apart from the resource independent parameters, there are also resource dependent parameters referring to elements specific to that resource type. These parameters come without a _ and you can find a list of them at the end of every resource page, e.g. at https://www.hl7.org/fhir/patient.html#search for the Patient resource. An example of such a parameter would be "birthdate" = "lt2000-01-01" for patients born before the year 2000 or "gender" = "female" to get female patients only.
You can add search parameters to your request via a named list or a named character vector:
request <- fhir_url(
url = "http://hapi.fhir.org/baseR4",
resource = "Patient",
parameters = list(
"birthdate" = "lt2000-01-01",
"code" = "http://loinc.org|1751-1"))
request
# An object of class "fhir_url"
# [1] "http://hapi.fhir.org/baseR4/Patient?birthdate=lt2000-01-01&code=http://
# loinc.org%7C1751-1"
As you can see, fhir_url()
performs automatic url
encoding and the |
is transformed to %7C
.
Whenever you call fhir_url() or fhir_search(), the corresponding FHIR search request is saved implicitly and can be accessed with fhir_current_request().
If you call fhir_search() without providing an explicit request, the function will automatically call fhir_current_request().
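As a small illustration (a sketch, not package output), the request built by the last call to fhir_url() can be retrieved again like this:

```r
library(fhircrackr)

# Building a request stores it as the current request
fhir_url(url = "http://hapi.fhir.org/baseR4", resource = "Patient")

# Retrieve the most recently built request
current <- fhir_current_request()
current
```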
To download resources from a server, you use the function
fhir_search()
and provide a FHIR search request.
We will start with a very simple example and use
fhir_search()
to download Patient resources from a public
HAPI server:
request <- fhir_url(url = "https://hapi.fhir.org/baseR4", resource = "Patient")
patient_bundles <- fhir_search(request = request, max_bundles = 2, verbose = 0)
In general, a FHIR search request returns a bundle of the
resources you requested. If there are a lot of resources matching your
request, the search result isn’t returned in one big bundle but
distributed over several of them, also called pages, the size
of which is determined by the FHIR server. If the argument
max_bundles
is not set, its default Inf
will
be applied. fhir_search()
will then return all available
bundles/pages, meaning all resources matching your request. If you set
it to 2
as in the example above, the download will stop
after the second bundle. Note that in this case, the result may not
contain all the resources from the server matching your request,
but it can be useful to first look at the first couple of search results
before you download all of them.
If you want to connect to a FHIR server that uses basic
authentication, you can supply the arguments username
and
password
. If the server uses some bearer token
authentication, you can provide the token in the argument
token
. See below for more information on
authentication.
Because servers can sometimes be hard to reach, fhir_search() will start five attempts to connect to the server before it gives up. With the argument delay_between_attempts you can control the number of attempts as well as the time interval between them.
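For instance, a shorter retry schedule could look like the sketch below. It assumes that delay_between_attempts takes a numeric vector of waiting times in seconds and that the length of this vector determines the number of attempts; please check ?fhir_search for the version you have installed.

```r
library(fhircrackr)

request <- fhir_url(url = "https://hapi.fhir.org/baseR4", resource = "Patient")

# Three attempts instead of five, waiting 1, 5 and 10 seconds
# (assumed semantics of delay_between_attempts, see the note above)
first_bundle <- fhir_search(
  request = request,
  max_bundles = 1,
  verbose = 0,
  delay_between_attempts = c(1, 5, 10)
)
```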
As you can see in the next block of code, fhir_search()
returns an object of class fhir_bundle_list
where each
element represents one bundle of resources, so a list of two in our
case:
patient_bundles
# An object of class "fhir_bundle_list"
# [[1]]
# A fhir_bundle_xml object
# No. of entries : 20
# Self Link: http://hapi.fhir.org/baseR4/Patient
# Next Link: http://hapi.fhir.org/baseR4?_getpages=ce958386-53d0-4042-888c-cad53bf5d5a1 ...
#
# {xml_node}
# <Bundle>
# [1] <id value="ce958386-53d0-4042-888c-cad53bf5d5a1"/>
# [2] <meta>\n <lastUpdated value="2021-05-10T12:12:43.317+00:00"/>\n</meta>
# [3] <type value="searchset"/>
# [4] <link>\n <relation value="self"/>\n <url value="http://hapi.fhir.org/b ...
# [5] <link>\n <relation value="next"/>\n <url value="http://hapi.fhir.org/b ...
# [6] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837602"/ ...
# [7] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/example-r ...
# [8] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837624"/ ...
# [9] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837626"/ ...
# [10] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837631"/ ...
# [11] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837716"/ ...
# [12] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837720"/ ...
# [13] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837714"/ ...
# [14] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837721"/ ...
# [15] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837722"/ ...
# [16] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837723"/ ...
# [17] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837724"/ ...
# [18] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/cfsb16116 ...
# [19] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837736"/ ...
# [20] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837737"/ ...
# ...
#
# [[2]]
# A fhir_bundle_xml object
# No. of entries : 20
# Self Link: http://hapi.fhir.org/baseR4?_getpages=ce958386-53d0-4042-888c-cad53bf5d5a1 ...
# Next Link: http://hapi.fhir.org/baseR4?_getpages=ce958386-53d0-4042-888c-cad53bf5d5a1 ...
#
# {xml_node}
# <Bundle>
# [1] <id value="ce958386-53d0-4042-888c-cad53bf5d5a1"/>
# [2] <meta>\n <lastUpdated value="2021-05-10T12:12:43.317+00:00"/>\n</meta>
# [3] <type value="searchset"/>
# [4] <link>\n <relation value="self"/>\n <url value="http://hapi.fhir.org/b ...
# [5] <link>\n <relation value="next"/>\n <url value="http://hapi.fhir.org/b ...
# [6] <link>\n <relation value="previous"/>\n <url value="http://hapi.fhir.o ...
# [7] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837760"/ ...
# [8] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837766"/ ...
# [9] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837768"/ ...
# [10] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837781"/ ...
# [11] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837783"/ ...
# [12] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837784"/ ...
# [13] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837787"/ ...
# [14] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837788"/ ...
# [15] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837789"/ ...
# [16] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837790"/ ...
# [17] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837791"/ ...
# [18] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837792"/ ...
# [19] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837793"/ ...
# [20] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837794"/ ...
# ...
If for some reason you cannot connect to a FHIR server at the moment
but want to explore the bundles anyway, the package provides an example
list of bundles containing Patient resources. See
?patient_bundles
for how to use it.
In many cases, you will want to download different types of FHIR
resources belonging together. For example you might want to download all
MedicationStatement resources with the snomed code
429374003
and also download the Patient resources these
MedicationStatements refer to. The FHIR search request to do this can be
built like this:
request <- fhir_url(
url = "https://hapi.fhir.org/baseR4/",
resource = "MedicationStatement",
parameters = list(
"code" = "http://snomed.info/ct|429374003",
"_include" = "MedicationStatement:subject"))
Then you provide the request to fhir_search()
:
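A call along the following lines would do this. The request is repeated so that the snippet runs on its own, and the value of max_bundles is only an assumption to keep the example download small.

```r
library(fhircrackr)

# Same request as above
request <- fhir_url(
  url = "https://hapi.fhir.org/baseR4/",
  resource = "MedicationStatement",
  parameters = list(
    "code" = "http://snomed.info/ct|429374003",
    "_include" = "MedicationStatement:subject"))

# Download at most three bundles
medication_bundles <- fhir_search(request = request, max_bundles = 3)
```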
These bundles now contain two types of resources, MedicationStatement resources as well as Patient resources. If you want to have a look at the bundles, it is not very useful to print them to the console. Instead just save them as xml-files to a directory of your choice and look at the resources there:
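A minimal sketch of this step is shown below. So that it also runs without a server connection, it uses the example bundles shipped with the package instead of a fresh download; the directory name is arbitrary.

```r
library(fhircrackr)

# Example bundles are stored in serialized form and must be unserialized first
example_bundles <- fhir_unserialize(bundles = fhircrackr::medication_bundles)

# Write one xml file per bundle into a (temporary) directory
xml_dir <- file.path(tempdir(), "medication_xml")
fhir_save(bundles = example_bundles, directory = xml_dir)

# The files can now be opened in any editor or browser
list.files(xml_dir)
```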
If you want to have a look at a bundle like this but don’t have
access to a FHIR server at the moment, check out
?medication_bundles
.
If your FHIR server is protected with some kind of bearer token
authentication, fhir_search()
lets you provide the token as
a string or as an object of class Token
from the
httr
package. You can use fhir_authenticate()
to create a token generated by an OAuth2/OpenID Connect process. See
?fhir_authenticate
for more information on that topic.
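The sketch below shows how credentials might be passed to fhir_search(). The environment variable names and the server URL are placeholders invented for this example, and the calls are wrapped in functions so that nothing is sent until you call them with a real server.

```r
# Read secrets from environment variables rather than hard-coding them
# (FHIR_USER, FHIR_PASSWORD and FHIR_TOKEN are hypothetical names)
search_with_basic_auth <- function(request) {
  fhircrackr::fhir_search(
    request  = request,
    username = Sys.getenv("FHIR_USER"),
    password = Sys.getenv("FHIR_PASSWORD")
  )
}

search_with_token <- function(request) {
  fhircrackr::fhir_search(request = request, token = Sys.getenv("FHIR_TOKEN"))
}

# Usage against a protected server (placeholder URL):
# bundles <- search_with_token("https://my-protected-server.example/fhir/Patient")
```

Keeping the secrets outside of the script avoids accidentally sharing them together with the analysis code.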
The default behaviour of fhir_search() is to send the FHIR search request as a GET request to the server. In some special cases, however, it can be useful to use the POST based search described in the FHIR search specification instead. This is mostly the case when the URL of your FHIR search request gets long enough to exceed the allowed URL length. A common scenario for this would be a request querying an explicit list of identifiers. Let's say, for example, you are looking for the following list of patient identifiers:
ids <- c("72622884-0a09-4ea9-9a91-685bce3b0fe3",
"2ca48b68-a641-4be7-a39d-9ffe2691a29a",
"8bcdd92d-5f96-4e07-9f6a-e22a3591ee30",
"2067558f-c9ed-489a-9c2f-7387bb3426a2",
"5077b4b0-07c9-4d03-b9ec-1f9f218f8239")
You can use them comma separated in the value of the
identifier
search parameter like this:
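Collapsing the vector into one comma separated string could be done as follows. This is a sketch: the vector is repeated so the snippet runs on its own, and the name id_strings matches the one used further below.

```r
ids <- c("72622884-0a09-4ea9-9a91-685bce3b0fe3",
         "2ca48b68-a641-4be7-a39d-9ffe2691a29a",
         "8bcdd92d-5f96-4e07-9f6a-e22a3591ee30",
         "2067558f-c9ed-489a-9c2f-7387bb3426a2",
         "5077b4b0-07c9-4d03-b9ec-1f9f218f8239")

# Join all identifiers with commas, as expected by the identifier parameter
id_strings <- paste(ids, collapse = ",")

# Length of the resulting parameter value in characters
nchar(id_strings)
```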
But this string would make the FHIR search request URL very long, especially if it is combined with additional other search parameters.
In a search via POST, the search parameters (everything that would usually follow the resource type after the ?) can be transferred to a body of type application/x-www-form-urlencoded and sent via POST. A body of this kind can be created the same way the parameters are usually given to the parameters argument of fhir_url(), i.e. as a named list or named character vector:
#note the list()-expression
body <- fhir_body(content = list(
"identifier" = id_strings,
"_revinclude" = "Observation:patient"))
The body will then automatically be assigned the content type
application/x-www-form-urlencoded
. If you provide a body
like this in fhir_search()
, the url in request should
only contain the base URL and the resource type. The
function will automatically amend it with the suffix
_search
and perform a POST:
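Put together, a POST based search might look like the following sketch. The identifiers are shortened placeholders, and max_bundles is only set to keep the example small.

```r
library(fhircrackr)

# Search parameters go into the body ...
body <- fhir_body(content = list(
  "identifier"  = "id-1,id-2,id-3",
  "_revinclude" = "Observation:patient"))

# ... while the request only holds base URL and resource type
request <- fhir_url(url = "https://hapi.fhir.org/baseR4", resource = "Patient")

# Providing a body makes fhir_search() perform a POST
bundles <- fhir_search(request = request, body = body, max_bundles = 1)
```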
fhir_search() internally sends a GET or POST request to the server. If anything goes wrong, e.g. because your request wasn't valid or the server caused an error, the result of your request will be an HTTP error. fhir_search() will print the error code along with some suggestions for the most common errors to the console.
To get more detailed information on the error response, you can
either call fhir_recent_http_error()
to print more
information into the console or you can pass a string with a file name
to the argument log_errors
. This will write a log with
error information to the specified file:
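A sketch of such a call is shown below. The file name is arbitrary, and the log file is presumably only written when an error actually occurs, so a successful download like this one may leave no file behind.

```r
library(fhircrackr)

request <- fhir_url(url = "https://hapi.fhir.org/baseR4", resource = "Patient")

# Arbitrary location for the error log
log_file <- file.path(tempdir(), "fhir_errors.log")

bundles <- fhir_search(request = request, max_bundles = 1, log_errors = log_file)

# After a failed request, more details can be printed with
# fhir_recent_http_error()
```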
There are two ways of saving the FHIR bundles you downloaded: Either you save them as R objects, or you write them to an xml file. This is possible while downloading the bundles or after all bundles have been downloaded. The following section covers saving after downloading. See the Dealing with large data sets section for how to save bundles during downloading.
If you want to save the list of downloaded bundles as an .rda or .RData file, you can't just use R's save() or save.image() on it, because this will break the external pointers in the xml objects representing your bundles. Instead, you have to serialize the bundles before saving and unserialize them after loading. For single xml objects the package xml2 provides serialization functions. For convenience, however, fhircrackr provides the functions fhir_serialize() and fhir_unserialize() that can be used directly on the bundles returned by fhir_search():
#serialize bundles
serialized_bundles <- fhir_serialize(bundles = patient_bundles)
#have a look at them
head(serialized_bundles[[1]])
# [1] 58 0a 00 00 00 03
#create temporary directory for saving
temp_dir <- tempdir()
#save
save(serialized_bundles, file = paste0(temp_dir, "/bundles.rda"))
If you load this bundle again, you have to unserialize it before you can work with it:
#unserialize
bundles <- fhir_unserialize(bundles = serialized_bundles)
#have a look
bundles
# An object of class "fhir_bundle_list"
# [[1]]
# A fhir_bundle_xml object
# No. of entries : 20
# Self Link: http://hapi.fhir.org/baseR4/Patient
# Next Link: http://hapi.fhir.org/baseR4?_getpages=ce958386-53d0-4042-888c-cad53bf5d5a1 ...
#
# {xml_node}
# <Bundle>
# [1] <id value="ce958386-53d0-4042-888c-cad53bf5d5a1"/>
# [2] <meta>\n <lastUpdated value="2021-05-10T12:12:43.317+00:00"/>\n</meta>
# [3] <type value="searchset"/>
# [4] <link>\n <relation value="self"/>\n <url value="http://hapi.fhir.org/b ...
# [5] <link>\n <relation value="next"/>\n <url value="http://hapi.fhir.org/b ...
# [6] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837602"/ ...
# [7] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/example-r ...
# [8] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837624"/ ...
# [9] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837626"/ ...
# [10] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837631"/ ...
# [11] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837716"/ ...
# [12] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837720"/ ...
# [13] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837714"/ ...
# [14] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837721"/ ...
# [15] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837722"/ ...
# [16] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837723"/ ...
# [17] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837724"/ ...
# [18] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/cfsb16116 ...
# [19] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837736"/ ...
# [20] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837737"/ ...
# ...
#
# [[2]]
# A fhir_bundle_xml object
# No. of entries : 20
# Self Link: http://hapi.fhir.org/baseR4?_getpages=ce958386-53d0-4042-888c-cad53bf5d5a1 ...
# Next Link: http://hapi.fhir.org/baseR4?_getpages=ce958386-53d0-4042-888c-cad53bf5d5a1 ...
#
# {xml_node}
# <Bundle>
# [1] <id value="ce958386-53d0-4042-888c-cad53bf5d5a1"/>
# [2] <meta>\n <lastUpdated value="2021-05-10T12:12:43.317+00:00"/>\n</meta>
# [3] <type value="searchset"/>
# [4] <link>\n <relation value="self"/>\n <url value="http://hapi.fhir.org/b ...
# [5] <link>\n <relation value="next"/>\n <url value="http://hapi.fhir.org/b ...
# [6] <link>\n <relation value="previous"/>\n <url value="http://hapi.fhir.o ...
# [7] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837760"/ ...
# [8] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837766"/ ...
# [9] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837768"/ ...
# [10] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837781"/ ...
# [11] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837783"/ ...
# [12] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837784"/ ...
# [13] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837787"/ ...
# [14] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837788"/ ...
# [15] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837789"/ ...
# [16] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837790"/ ...
# [17] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837791"/ ...
# [18] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837792"/ ...
# [19] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837793"/ ...
# [20] <entry>\n <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837794"/ ...
# ...
After unserialization, the pointers are restored and you can continue
to work with the bundles. Note that the example bundles
medication_bundles
and patient_bundles
that
are provided with the fhircrackr
package are also provided
in their serialized form and have to be unserialized as described on
their help page.
If you want to store the bundles in xml files instead of R objects,
you can use the functions fhir_save()
and
fhir_load()
. fhir_save()
takes a list of
bundles in form of xml objects (as returned by
fhir_search()
) and writes them into the directory specified
in the argument directory
. Each bundle is saved as a
separate xml-file named by its index, e.g. 3.xml for the third
downloaded bundle. If the folder defined in directory
doesn’t exist, it is created in the current working directory.
To read bundles saved with fhir_save()
back into R, you
can use fhir_load()
:
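A round trip could look like this sketch, which uses the package's example bundles so that it runs offline; the directory name is arbitrary.

```r
library(fhircrackr)

# Offline stand-in for downloaded bundles: example data shipped with the package
example_patients <- fhir_unserialize(bundles = fhircrackr::patient_bundles)

# Save each bundle as an xml file ...
bundle_dir <- file.path(tempdir(), "patient_xml")
fhir_save(bundles = example_patients, directory = bundle_dir)

# ... and read all of them back into R
reloaded <- fhir_load(directory = bundle_dir)

# Both lists should hold the same number of bundles
length(reloaded) == length(example_patients)
```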
fhir_load()
takes the name of the directory (or path to
it) as its only argument. All xml-files in this directory will be read
into R and returned as a list of bundles in xml format just as returned
by fhir_search()
.
If for some reason your bundles are stored as a character vector
containing xml strings (e.g. because you extracted them from somewhere
other than a FHIR API), you can use the function as_fhir()
to coerce this character vector to a fhir_bundle_list
:
#character vector containing fhir bundles
bundle_strings <- c(
"<Bundle>
<type value='searchset'/>
<entry>
<resource>
<Patient>
<id value='id1'/>
<name>
<given value='Marie'/>
</name>
</Patient>
</resource>
</entry>
</Bundle>",
"<Bundle>
<type value='searchset'/>
<entry>
<resource>
<Patient>
<id value='id2'/>
<name>
<given value='Max'/>
</name>
</Patient>
</resource>
</entry>
</Bundle>"
)
#convert to FHIR bundle list
bundles <- as_fhir(bundle_strings)
If you need to download a particularly large data set from a FHIR server, this can lead to challenges in two areas: computation time and memory usage. Downloading the FHIR bundles will be time consuming because, in most FHIR server implementations, neither the paging mechanism leading from one bundle to the next nor the execution of complex search queries is optimized for speed. Keeping a lot of bundles in the working memory is memory consuming because the xml structures contain a lot of overhead that is only removed once the relevant bits of information are transferred into a table.
There are several options to alleviate these problems, a couple of which will be shown in the following.
_elements
If you know you are just going to need a few elements from each
resource, you can restrict the downloaded resources to those elements,
which will result in much smaller resources and thus much smaller
bundles. The following example downloads the first bundle of Patient
resources that are trimmed down to the name
,
gender
and birthDate
elements, which are
specified in the _elements
parameter.
_elements
takes a comma separated list of base level
elements for the resource and will make sure that the downloaded
resources only contain those elements plus the mandatory elements
id
and meta
. The _count
parameter
in the examples restricts the number of resources in the bundle to 2,
which is just done to make the result more printable for this
vignette.
request <- fhir_url(url = "http://hapi.fhir.org/baseR4",
resource = "Patient",
parameters = c("_elements" = "name,gender,birthDate",
"_count"= "2"))
bundles <- fhir_search(request, max_bundles = 1)
cat(toString(bundles[[1]]))
# <Bundle>
# <id value="8e0db3ce-817b-48cd-ba3e-1a0d20f64366"/>
# <meta>
# <lastUpdated value="2022-03-31T08:48:55.934+00:00"/>
# </meta>
# <type value="searchset"/>
# <link>
# <relation value="self"/>
# <url value="http://hapi.fhir.org/baseR4/Patient?_count=2&_elements=name%2Cgender%2CbirthDate"/>
# </link>
# <link>
# <relation value="next"/>
# <url value="http://hapi.fhir.org/baseR4?_getpages=8e0db3ce-817b-48cd-ba3e-1a0d20f64366&_getpagesoffset=2&_count=2&_pretty=true&_bundletype=searchset&_elements=birthDate,gender,name"/>
# </link>
# <entry>
# <fullUrl value="http://hapi.fhir.org/baseR4/Patient/2564886"/>
# <resource>
# <Patient>
# <id value="2564886"/>
# <meta>
# <versionId value="1"/>
# <lastUpdated value="2021-09-28T01:04:35.774+00:00"/>
# <source value="#rxRuwftRVG3erMwy"/>
# <tag>
# <system value="http://terminology.hl7.org/CodeSystem/v3-ObservationValue"/>
# <code value="SUBSETTED"/>
# <display value="Resource encoded in summary mode"/>
# </tag>
# </meta>
# <name>
# <text value="반영훈 사원"/>
# <family value="반"/>
# <given value="영훈"/>
# <prefix value="사원"/>
# </name>
# <gender value="male"/>
# <birthDate value="1992-01-12"/>
# </Patient>
# </resource>
# <search>
# <mode value="match"/>
# </search>
# </entry>
# <entry>
# <fullUrl value="http://hapi.fhir.org/baseR4/Patient/2564911"/>
# <resource>
# <Patient>
# <id value="2564911"/>
# <meta>
# <versionId value="1"/>
# <lastUpdated value="2021-09-28T01:12:59.207+00:00"/>
# <source value="#rmWF4JDz6p1WVwzl"/>
# <security>
# <system value="http://terminology.hl7.org/CodeSystem/v2-0203"/>
# <code value="RM"/>
# </security>
# <tag>
# <system value="http://terminology.hl7.org/CodeSystem/v2-0203sbtest05"/>
# <code value="SBTest05m"/>
# </tag>
# <tag>
# <system value="http://terminology.hl7.org/CodeSystem/v3-ObservationValue"/>
# <code value="SUBSETTED"/>
# <display value="Resource encoded in summary mode"/>
# </tag>
# </meta>
# <name>
# <use value="usual"/>
# <text value="human name"/>
# <family value="Jonathan"/>
# <given value="token_sort_test_data05"/>
# </name>
# <gender value="male"/>
# <birthDate value="2021-09-01"/>
# </Patient>
# </resource>
# <search>
# <mode value="match"/>
# </search>
# </entry>
# </Bundle>
As you can see, the resulting Bundle is much smaller than it would be if the full resources were downloaded.
You can spare working memory by saving the bundles to your hard drive during the download instead of keeping them all in the working memory of your R session at once. If you pass the name of a directory to the argument save_to_disc in your call to fhir_search(), the bundles will not be combined in a bundle list that is returned when the downloading is done, but will instead be saved one by one as xml-files to the directory specified in save_to_disc. If the directory you specified doesn't exist yet, fhir_search() will create it for you. This way, the R session will only have to keep one bundle at a time in the working memory. You can later load them using fhir_load() and crack them one after another:
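A sketch of this workflow, with an arbitrary directory name and a download limited to 10 bundles:

```r
library(fhircrackr)

request <- fhir_url(url = "http://hapi.fhir.org/baseR4", resource = "Patient")
download_dir <- file.path(tempdir(), "patient_download")

# Bundles are written to download_dir one by one instead of being returned
fhir_search(request = request, max_bundles = 10, save_to_disc = download_dir)

# Later: read them back in, e.g. to crack them
bundles <- fhir_load(directory = download_dir)
```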
Alternatively, you can also use fhir_next_bundle_url(). This function returns the url to the next bundle from your most recent call to fhir_search(). Because this link is very long, it helps to split it along the & to get a better overview:
strsplit(fhir_next_bundle_url(), "&")
# [[1]]
# [1] "http://hapi.fhir.org/baseR4?_getpages=0be4d713-a4db-4c27-b384-b772deabcbc4"
# [2] "_getpagesoffset=200"
# [3] "_count=20"
# [4] "_pretty=true"
# [5] "_bundletype=searchset"
You can see two interesting numbers: _count=20 tells you that the queried HAPI server has a default bundle size of 20. _getpagesoffset=200 tells you that the bundle referred to in this link starts after resource no. 200, which makes sense since the fhir_search() request above downloaded 10 bundles with 20 resources each, i.e. 200 resources. If you use this link in a new call to fhir_search(), the download will start from this bundle (i.e. the 11th bundle with resources 201-220) and will go on to the following bundles from there.
When there is no next bundle (because all available resources have
been downloaded), fhir_next_bundle_url()
returns
NULL
.
If a download with fhir_search()
is interrupted due to a
server error somewhere in between, you can use
fhir_next_bundle_url()
to see where the download was
interrupted.
You can also use this function to avoid memory issues. The following
block of code utilizes fhir_next_bundle_url()
to download
all available Observation resources in small batches of 10 bundles that
are immediately cracked and saved before the next batch of bundles is
downloaded. Note that this example can be very time consuming if there
are a lot of resources on the server. To limit the number of iterations
uncomment the if
statement at the end of the
while
loop:
#Starting fhir search request
url <- fhir_url(
  url = "http://hapi.fhir.org/baseR4",
  resource = "Observation",
  parameters = list("_count" = "500"))
count <- 0
table_description <- fhir_table_description(resource = "Observation")
while(!is.null(url)){
  #load 10 bundles
  bundles <- fhir_search(request = url, max_bundles = 10)
  #crack bundles
  tables <- fhir_crack(bundles = bundles, design = table_description)
  #save cracked bundle to RData-file (can be exchanged by other data type)
  save(tables, file = paste0(tempdir(), "/table_", count, ".RData"))
  #retrieve starting point for next 10 bundles
  url <- fhir_next_bundle_url()
  count <- count + 1
  # if(count >= 20) {break}
}
In most cases the bottleneck in your analysis will be the download time from the server, because most FHIR servers are optimized for handling a lot of simultaneous small requests instead of a single big one. You can gain time by splitting up your request into chunks and sending them to the server in parallel using a parallelized version of lapply(), but there are a couple of issues to keep in mind.
The easiest to use version of parallelization is the function parallel::mclapply(), which uses forking to process list elements from a lapply() call in parallel. As Windows doesn't support forking, this solution can only be used on macOS or Linux operating systems. If you want to achieve similar results on a Windows machine, you can either run fhircrackr in an R installation/RStudio Server that you set up in WSL2 or you can try out the Windows mclapply hack written by Nathan vanHoudnos.
The xml objects that represent the FHIR bundles contain external
pointers that will break when they are exported to/from a cluster. This
means that objects of type fhir_bundle
or
fhir_bundle_list
always have to be serialized using
fhir_serialize()
when they are downloaded in parallel.
Splitting up a FHIR request isn’t always trivial. We’ll show you two scenarios where you can split up a request into smaller chunks.
The first scenario is downloading the resources for a known list of resource ids. This is possible with fhir_search() but there is also a convenience function for exactly that use case called fhir_get_resources_by_ids(). The following minimal example of course only works if the ids defined here are actually found on the server:
# define list of Patient resource ids
ids <- c("4b7736c3-c005-4383-bf7c-99710811efd9", "bef39d3a-62bb-48c0-83ff-3bb70b51d831",
"f371ed2f-5cb0-4093-a491-9df6e6bfcdf2", "277c4631-955e-4b52-bd40-78ddcde333b1",
"72173a13-d32f-4489-a7b4-dfc301df087f", "4a97acec-028e-4b45-a72f-2b7e08cf80ba")
#split into smaller chunks of 2
id_list <- split(ids, ceiling(seq_along(ids)/2))
#Define function that downloads one chunk of patients and serializes the result
extract_and_serialize <- function(x){
b <- fhir_get_resources_by_ids(base_url = "http://hapi.fhir.org/baseR4",
resource = "Patient",
ids = x)
fhir_serialize(b)
}
#Download using 2 cores on linux:
bundles_serialized <- parallel::mclapply(
X = id_list,
FUN = extract_and_serialize,
mc.cores = 2
)
#Unserialize the resulting list and create one fhir_bundle_list object from it
bundles_unserialized <- lapply(bundles_serialized, fhir_unserialize)
result <- fhir_bundle_list(unlist(bundles_unserialized, recursive = FALSE))
The second scenario is a request such as "http://hapi.fhir.org/baseR4/Encounter?_include=Encounter:patient", which downloads all Encounters as well as the Patient resources the Encounters are referencing. This type of request will often take a lot of time and can (depending on your system) be sped up if you only load the encounters in a first step, extract the ids of the referenced Patient resources and download those in parallel in a second step:
#Download all Encounters
encounter_bundles <- fhir_search(request = "http://hapi.fhir.org/baseR4/Encounter")
#Flatten
encounter_table <- fhir_crack(
bundles = encounter_bundles,
design = fhir_table_description(resource = "Encounter")
)
#Extract Patient ids, dropping duplicates so that each Patient is requested only once
pat_ids <- unique(sub("Patient/", "", encounter_table$subject.reference))
#Split into chunks of 20
pat_id_list <- split(pat_ids, ceiling(seq_along(pat_ids)/20))
#Define function that downloads one chunk and serializes the result
extract_and_serialize <- function(x){
b <- fhir_get_resources_by_ids(base_url = "http://hapi.fhir.org/baseR4",
resource = "Patient",
ids = x)
fhir_serialize(b)
}
#Download using 4 cores on linux:
bundles_serialized <- parallel::mclapply(
X = pat_id_list,
FUN = extract_and_serialize,
mc.cores = 4
)
#Unserialize the resulting list and create one fhir_bundle_list object from it
bundles_unserialized <- lapply(bundles_serialized, fhir_unserialize)
result <- fhir_bundle_list(unlist(bundles_unserialized, recursive = FALSE))
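Both examples above rely on the same chunking idiom, split(x, ceiling(seq_along(x)/n)). The following is a minimal sketch of how it behaves, using only base R and a made-up id vector (demo_ids and chunk_size are hypothetical names, not part of fhircrackr):

```r
# Hypothetical stand-in ids; no server access is needed for this sketch
demo_ids <- sprintf("id-%02d", 1:7)
chunk_size <- 3

# seq_along() numbers the elements 1..7, dividing by chunk_size and
# rounding up assigns each element the index of its chunk (1, 1, 1, 2, ...)
chunks <- split(demo_ids, ceiling(seq_along(demo_ids) / chunk_size))

# The last chunk simply holds the remainder
lengths(chunks)
```

Each list element can then be handed to one worker. Smaller chunks mean more but shorter requests, so a sensible chunk size depends on the server and the number of cores available.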
Sometimes it can be useful to download a random sample of resources
from a server. The fhircrackr offers a function fhir_sample_resources()
which takes a base url, a resource
type and (optionally) some FHIR Search parameters and returns a random
sample with a given size of those resources. For example you could
download 10 random Patient resources of all female patients born before
1960 like this:
bundle <- fhir_sample_resources(
base_url = "http://hapi.fhir.org/baseR4",
resource = "Patient",
parameters = c(gender = "female", birthdate = "lt1960-01-01"),
sample_size = 10
)
This request may take some time because in the first step, the resource (aka logical) IDs of all resources matching the request (i.e. all Patient resources of females born before 1960) are downloaded. This is necessary because the sampling is done on this vector of resource IDs.
The following code shows that the result does consist of 10 Patient resources, all of them female patients born before 1960. If you want to know more about how to extract information from resources like this, please see the vignette on flattening resources.
pat <- fhir_table_description(resource = "Patient",
cols = c("id", "gender", "birthDate"))
fhir_crack(bundles = bundle, design = pat)
# Cracking 1 Patients' Bundle on a WINDOWS-Engine using 1/1 CPU ...
# finished.
# id gender birthDate
# 1 15c8f95d-3eb3-4849-8ac7-c5675404baf4 female 1953-06-09
# 2 1160954 female 1934-04-11
# 3 2060404 female 1935-12-09
# 4 2060585 female 1935-12-09
# 5 2062708 female 1935-12-09
# 6 2365237 female 1924-10-10
# 7 2885700 female 1940-08-08
# 8 2930446 female 1833-05-29
# 9 2932864 female 1830-04-27
# 10 2932964 female 1866-08-22
Internally, fhir_sample_resources() performs the following steps:
1. Extract the logical IDs of all resources matching the resource
type and search parameters given in resource and parameters with the
function fhir_get_resource_ids(). This function uses the _elements
parameter of FHIR Search to avoid downloading all resources in full.
You can also use it as a standalone function, see ?fhir_get_resource_ids.
2. Draw a random sample (without replacement) from the vector of IDs created in step 1.
3. Download the resources belonging to the sampled IDs using
fhir_get_resources_by_ids().
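The sampling step in the middle can be illustrated without a server. This is a minimal sketch in base R, not the package's internal code; all_ids is a hypothetical stand-in for the ID vector that the first step would return:

```r
# Hypothetical stand-in for the logical IDs collected in the first step
all_ids <- as.character(1001:1050)
sample_size <- 10

# Sampling without replacement, so no ID can be drawn twice
set.seed(42)  # only to make the sketch reproducible
sampled_ids <- sample(all_ids, size = sample_size, replace = FALSE)

# sampled_ids is what would then be passed on as the ids to download
length(sampled_ids)
anyDuplicated(sampled_ids)
```

Because logical IDs are unique per resource, sampling IDs without replacement yields exactly sample_size distinct resources, provided the server holds at least that many matches.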
If you want to sample resources based on an element other than the
logical ID, e.g. based on an identifier value or based on a reference,
you can use the function fhir_sample_resources_by_ids(),
provided you have a vector of identifiers/references you want to sample
from. Note that in this case the number of resources actually returned
won’t necessarily match the number in sample_size, because,
as opposed to the logical ID, an identifier or reference doesn’t have to
be unique for each resource.
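Why the returned number can differ from sample_size is easiest to see with a toy example. The sketch below uses a made-up table (resources, with hypothetical columns logical_id and identifier) in which two identifier values are shared by several resources:

```r
# Made-up data: five resources, but only three distinct identifier values
resources <- data.frame(
  logical_id = c("a1", "a2", "a3", "a4", "a5"),
  identifier = c("X", "X", "Y", "Z", "Z"),
  stringsAsFactors = FALSE
)

# Sample 2 identifier values without replacement
set.seed(1)  # only to make the sketch reproducible
sampled <- sample(unique(resources$identifier), size = 2)

# Every resource carrying one of the sampled identifiers is matched
matched <- resources[resources$identifier %in% sampled, ]
nrow(matched)
```

In this toy table any pair of identifier values matches at least three resources, so the result is always larger than the requested sample of 2.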
The capability statement documents a set of capabilities (behaviors) of a FHIR
Server for a particular version of FHIR. You can download this statement
using the function fhir_capability_statement().
fhir_capability_statement() takes the base URL of a FHIR
server and returns a list of three data frames containing all
information from the capability statement of this server. The first one
is called Meta and contains some general server
information. The second is called Rest and contains
information on the operations the server implements. The third is called
Resources and gives information on the resource types and
associated parameters the server supports. This information can be
useful to determine, for example, which FHIR search parameters are
implemented in your FHIR server.
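A call could look like the following sketch. It assumes that the base URL is passed through an argument named url (please check ?fhir_capability_statement for the exact signature), and get_capabilities() is a hypothetical wrapper introduced here only so that nothing is downloaded until it is called:

```r
# Hypothetical wrapper: defining it does not contact any server
# ASSUMPTION: the base URL argument of fhir_capability_statement() is `url`
get_capabilities <- function(base = "http://hapi.fhir.org/baseR4") {
  fhircrackr::fhir_capability_statement(url = base)
}

# Usage (requires the fhircrackr package and network access):
# cap <- get_capabilities()
# names(cap)           # the three data frames described above
# head(cap$Resources)  # resource types and supported parameters
```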
FHIR resources can contain a considerable amount of HTML code
(e.g. in a narrative object), which is often created by the server, for
example to provide a human-readable summary of the resource. This data
is usually not the target of structured statistical analysis, so in the
default setting fhir_search() will remove the html parts
immediately after download to reduce memory usage (on a hapi server
typically by around 30%, see fhir_rm_div()). This memory
saving is paid for with a runtime increase of 10%-20%. The html removal can
be disabled by setting rm_tag = NULL to increase speed at the cost of
increased memory usage.
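To get a feeling for what is removed, consider the following toy example. It is a minimal sketch using a base R regular expression on a made-up Patient resource (resource_xml is hypothetical), and it is not how fhircrackr implements the removal:

```r
# Made-up resource with a generated narrative in a div element
resource_xml <- paste0(
  '<Patient><id value="1"/>',
  '<text><status value="generated"/>',
  '<div xmlns="http://www.w3.org/1999/xhtml"><p>Jane Doe, born 1953</p></div>',
  '</text><gender value="female"/></Patient>'
)

# Drop the div element including its content
# (a simple non-greedy pattern that does not handle nested divs)
stripped <- gsub("<div[^>]*>.*?</div>", "", resource_xml, perl = TRUE)

# The structured elements survive, the narrative is gone
nchar(resource_xml)
nchar(stripped)
```

The structured elements such as id and gender are untouched, which is why the removal usually does not affect a later analysis of the flattened data.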
To learn about how fhircrackr allows you to convert the
downloaded FHIR resources into data.frames/data.tables, see the vignette
on flattening FHIR resources.