Frictionless Science: The Trolley Dilemma

Behavioral researchers increasingly use large language models (LLMs) to simulate human judgments. This vignette runs a classic thought experiment from moral philosophy, the Trolley Dilemma, with the LLMR package. It skips the single-call chat functions and goes straight to a vectorized experimental design built with llm_mutate().

For this demonstration, we use an open-weights model served through the Groq API. The code in later sections assumes a configuration object named cfg that points at this model.
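The calls further down refer to cfg without showing how it is built. A minimal sketch of that setup follows. The model identifier and the GROQ_API_KEY environment variable are assumptions made for illustration, and the argument names reflect llm_config() as commonly documented, so check ?llm_config for the version you have installed.

```r
library(LLMR)

# Assumption: the model ID is only an example; substitute any open-weights
# model that your Groq account can access.
# Assumption: the API key is stored in the GROQ_API_KEY environment variable.
cfg <- llm_config(
  provider = "groq",
  model    = "llama-3.3-70b-versatile",
  api_key  = Sys.getenv("GROQ_API_KEY")
)
```

Keeping the key in an environment variable rather than in the script means the vignette source can be shared without exposing credentials.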

Designing the Experiment

We build a minimal stimulus set containing two standard variants of the Trolley Dilemma, Switch and Footbridge, with one row per condition.

dilemmas <- tibble::tibble(
  condition = c("Switch", "Footbridge"),
  scenario = c(
    "A runaway trolley is heading down the tracks toward five workers who will be killed. You are standing next to a switch. If you pull the switch, the trolley will be diverted onto a side track where it will kill one worker. Do you pull the switch?",
    "A runaway trolley is heading toward five workers. You are standing on a footbridge above the tracks next to a large stranger. If you push the stranger onto the tracks below, his mass will stop the trolley, saving the five workers but killing the stranger. Do you push the stranger?"
  )
)

Vectorized Execution with Soft Structuring

To extract the model’s decisions, we call llm_mutate(). Rather than imposing a rigid JSON schema, which some inference endpoints handle poorly, we ask the model to mark its answer with simple XML-like tags. Tags place fewer demands on the provider than schema validation, so the same prompt works across a wider range of endpoints.

experiment_results <- dilemmas |>
  llm_mutate(
    response = c(
      system = "You are a participant in a moral psychology experiment. Read the scenario and provide a definitive YES or NO decision, followed by a brief rationale. Enclose your decision in <decision>...</decision> tags and your reasoning in <rationale>...</rationale> tags.",
      user = "{scenario}"
    ),
    .config = cfg,
    .tags = c("decision", "rationale")
  )

When the .tags argument is supplied, LLMR parses each response string and appends the content of every listed tag as its own column in the original dataset. Here that yields a decision column and a rationale column alongside condition and scenario.
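To make the tag-parsing step concrete, here is a rough base-R sketch of what extracting one tag from a reply involves. The helper extract_tag() is hypothetical and is not LLMR's internal implementation; the reply string is a hand-written placeholder, not model output.

```r
# Hypothetical helper, not part of LLMR: return the text between
# <tag> and </tag> for each element of `text`, or NA if the tag is absent.
extract_tag <- function(text, tag) {
  # (?s) lets "." match newlines; ".*?" stops at the first closing tag.
  pattern <- sprintf("(?s)<%s>(.*?)</%s>", tag, tag)
  hits <- regmatches(text, regexec(pattern, text, perl = TRUE))
  vapply(
    hits,
    function(m) if (length(m) >= 2) trimws(m[2]) else NA_character_,
    character(1)
  )
}

# Hand-written placeholder reply used only to exercise the helper.
reply <- "<decision>YES</decision>\n<rationale>Diverting the trolley saves more lives.</rationale>"

extract_tag(reply, "decision")   # "YES"
extract_tag(reply, "rationale")  # "Diverting the trolley saves more lives."
extract_tag(reply, "confidence") # NA, because the tag is not present
```

Returning NA for a missing tag, rather than raising an error, keeps one malformed reply from halting a whole batch; the affected rows can be filtered or re-run afterwards.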

# Show only the condition label and the two parsed columns.
experiment_results |>
  dplyr::select(condition, decision, rationale) |>
  print(n = Inf)
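Before any statistical analysis, the decision column may need light cleaning, since a model can return variants such as "Yes." or " no" even when asked for YES or NO. The sketch below shows one way to normalize it. The helper normalize_decision() is hypothetical, and the input strings are placeholders chosen to exercise it, not model output.

```r
# Hypothetical helper: reduce free-text decisions to "YES", "NO", or NA.
normalize_decision <- function(x) {
  cleaned <- toupper(trimws(x))                 # drop whitespace, unify case
  cleaned <- sub("[[:punct:]]+$", "", cleaned)  # strip trailing punctuation
  ifelse(cleaned %in% c("YES", "NO"), cleaned, NA_character_)
}

# Placeholder inputs, not model output.
normalize_decision(c("Yes.", " no", "It depends"))
# "YES" "NO"  NA
```

With the column normalized, a call such as table(experiment_results$condition, normalize_decision(experiment_results$decision), useNA = "ifany") would cross-tabulate decisions by condition while keeping unparseable replies visible.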

Conclusion

The example shows the pattern LLMR is built for. The researcher defines the conditions in a data frame, writes one prompt, and receives a structured dataset ready for statistical analysis. The tag parsing and the iteration over rows are handled by llm_mutate(), so no explicit loop or string-parsing code is needed.