
Presentations

DevEval: A Manually-Annotated Code Generation Benchmark Aligned with Real-World Code Repositories
Jia Li and 19 other authors

Generalization or Memorization: Data Contamination and Trustworthy Evaluation for Large Language Models
Yihong Dong and 6 other authors

PACE: Improving Prompt with Actor-Critic Editing for Large Language Model
Yihong Dong and 4 other authors