arxiv:2501.04575
huxueyu
AI & ML interests
Large Language Models
Recent Activity
upvoted a paper about 22 hours ago
AgentIF-OneDay: A Task-level Instruction-Following Benchmark for General AI Agents in Daily Scenarios
submitted a paper about 22 hours ago
AgentIF-OneDay: A Task-level Instruction-Following Benchmark for General AI Agents in Daily Scenarios
upvoted a paper about 22 hours ago
SWE-Universe: Scale Real-World Verifiable Environments to Millions