Would you like to see your presentation here, made available to a global audience of researchers?
Add your own presentation or have us affordably record your next conference.
We introduce PandemIQ Llama, a domain-adapted large language model (LLM) designed specifically for pandemic intelligence applications. Building on the pre-trained Llama-3.1-8B model, we conducted continuous training using our curated Pandemic Corpus. This dataset was assembled from authoritative public health sources, scientific literature, and specialized knowledge repositories, comprising 508,924 documents totaling 5.8 billion tokens, which is the largest pandemic domain specific data cohort for LLM training. Benefited from our curated large data cohorts and through continuous training leveraging extensive computational resources, the developed PandemIQ Llama model can extract critical domain knowledge on pandemic, which is typically underrepresented in general-purpose language models, To evaluate its performance, we conducted comprehensive comparison of PandemIQ Llama with both prompt-engineered and task-specific fine-tuned baseline models using two tasks: the Biomedical Alert News Question Answering task (1,508 reports with 30 expert-generated questions each) and the Disease Event Type Classification benchmark (4,500 news snippets across eight disease categories). PandemIQ Llama demonstrated substantial improvements over strong baseline models, achieving performance gains ranging from 3.8% to 10.97%. These results suggest that PandemIQ Llama could significantly enhance public health surveillance and analysis capabilities. In addition, our result also suggests that the LLMs can perform better with continuous training than fine-tuning on domain specific tasks. Social Impact: This model will be integrated with Epidemic Intelligence from Open Sources (EIOS) run by World Health Organization (WHO). This integration will empower a large community of decision makers and stakeholders in all WHO member countries with the first LLM-based AI tool for pandemic surveillance.