In this tutorial, we chart a practical path from raw capability to trustworthy reasoning with foundation models. We begin by motivating why trustworthy reasoning is essential: when models bluff multiplications or invent drug interactions, their value collapses and risks multiply. We adopt four pillars of trustworthiness, namely capability, safety, robustness, and explainability, as the organizing framework for the entire session.
In Part I, we trace the evolution from early language models to today’s foundation models that produce extended chains of thought and act in the world. Through concrete case studies, we dissect jailbreaks, hallucinations, and brittle logic, and we connect these failure modes to regulatory pressure such as the EU AI Act. The takeaway is clear: we must design for trustworthy reasoning from the outset, especially in high-stakes domains such as clinical or financial decision-making.
In Part II, we move from leaderboards to a science of measurement. We show how to build reliable, valid evaluations using psychometric tools, including item response theory, amortized evaluation, and predictability analysis. We implement three open-source pipelines hands-on: TruthfulQA for hallucination detection, HellaSwag for robustness testing, and MATH with formal-verification hooks in Lean4. Along the way, we demonstrate red-teaming stress tests and reasoning-trace metrics that surface subtle errors leaderboards miss, and we practice calibration, dataset curation, and transparent reporting for honest progress tracking.
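As one concrete flavor of the psychometric tooling above, the sketch below fits a one-parameter (Rasch) item-response-theory model to a binary correctness matrix by gradient ascent, jointly estimating a latent ability per model and a latent difficulty per item. The toy responses matrix, learning rate, and fitting loop are illustrative assumptions, not the tutorial's actual pipeline.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_rasch(responses, lr=0.1, steps=500):
    """Fit a 1-parameter (Rasch) IRT model by gradient ascent.

    responses[i][j] = 1 if model i answered item j correctly, else 0.
    Returns (abilities per model, difficulties per item).
    """
    n_models, n_items = len(responses), len(responses[0])
    theta = [0.0] * n_models   # latent ability of each model
    b = [0.0] * n_items        # latent difficulty of each item
    for _ in range(steps):
        grad_t = [0.0] * n_models
        grad_b = [0.0] * n_items
        for i in range(n_models):
            for j in range(n_items):
                # residual: observed correctness minus predicted probability
                r = responses[i][j] - sigmoid(theta[i] - b[j])
                grad_t[i] += r
                grad_b[j] -= r
        theta = [t + lr * g / n_items for t, g in zip(theta, grad_t)]
        b = [d + lr * g / n_models for d, g in zip(b, grad_b)]
        # fix the scale: shift so difficulties average to zero
        # (theta - b is unchanged because both shift by the same amount)
        mean_b = sum(b) / n_items
        b = [d - mean_b for d in b]
        theta = [t - mean_b for t in theta]
    return theta, b
```

Unlike raw leaderboard accuracy, the fitted difficulties make scores comparable across differently curated item pools, which is the property the reliability/validity discussion above turns on.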
In Part III, we deliver a compact methodology for trustworthy machine reasoning. We cover training-free prompting methods (chain-of-thought, retrieval-augmented generation, constrained decoding), post-training algorithms (supervised fine-tuning, RLHF, verifiable rewards, self-reward), and test-time techniques (self-consistency, reflection, tree search, tool-augmented verification). We introduce guardrails—safe sampling and semantic filters—that reduce risk without crippling capability. For each technique, we map effects to the four pillars, highlight trade-offs and failure signatures, and summarize when to combine methods for maximum leverage.
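The test-time techniques above can be made concrete with a minimal self-consistency sketch: sample several reasoning paths at nonzero temperature, extract each final answer, and majority-vote. The `generate` callable and the deterministic toy sampler are hypothetical stand-ins for a real sampled LLM call, not an API from the tutorial.

```python
from collections import Counter

def self_consistency(generate, question, n_samples=9):
    """Self-consistency decoding (majority vote over sampled paths).

    `generate(question)` is assumed to return a
    (reasoning_trace, final_answer) pair from one stochastic sample.
    Returns the plurality answer and its agreement rate, which can
    double as a rough confidence signal for the calibration work above.
    """
    answers = [generate(question)[1] for _ in range(n_samples)]
    vote, count = Counter(answers).most_common(1)[0]
    return vote, count / n_samples

# Deterministic toy sampler: 7 of 9 draws yield the correct answer.
_toy_samples = ["42", "42", "7", "42", "42", "3", "42", "42", "42"]
def toy_generate(question, _it=iter(_toy_samples)):
    return ("...reasoning...", next(_it))
```

A low agreement rate is itself a useful failure signature: it flags questions where the model's reasoning is unstable and a verifier or tool call is worth the extra cost.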
In Part IV, we turn to deployment. We walk through real-world agents and workflows, e.g., Lean4-based code verification assistants and bioinformatics pipelines proposing candidate compounds. We share step-by-step recipes, failure checklists, and diagnostics so participants can preserve trust while shipping. We also outline governance artifacts—risk registers, evaluation cards, and incident playbooks—that align technical practice with policy expectations.
We emphasize open, reproducible assets and decision rubrics that translate research into dependable products. Our goal is simple: help you move from compelling demos to trustworthy systems that earn and deserve user trust.
All materials will be available at: https://trustworthy-machine-reasoning.github.io/
