WildClawBench

About WildClawBench

Hard, practical, end-to-end evaluation for AI agents, in the wild. WildClawBench is an agent benchmark that tests what actually matters: can an AI agent do real work, end-to-end, without hand-holding? We drop agents into a live OpenClaw environment (the same open-source personal AI assistant that real users rely on daily) and throw 60 original tasks at them: clipping goal highlights from a football match, negotiating meeting times over multi-round emails, hunting down contradictions in search results, writing inference scripts for undocumented codebases, catching...

agentic-ai agentic-evaluation agents benchmarks openclaw

Quick Facts

Stars	452
Forks	45
Language	Python
Category	Codex Skill
License	MIT
Quality Score	71.2/100
Open Issues	5
Last Updated	2026-06-25
Created	2026-03-23
Platforms	python
Est. Tokens	~18k

Compatible Skills

These tools work well together with WildClawBench for enhanced workflows:

team-tasks — semantic(0.16)+complementary+same_lang+similar_pop+shared_platform (56%)
AEnvironment — semantic(0.30)+complementary+same_lang+similar_pop+shared_platform (56%)
get-physics-done — semantic(0.16)+complementary+same_lang+similar_pop+shared_platform (55%)
agent-builder — semantic(0.15)+complementary+same_lang+similar_pop+shared_platform (55%)

More Codex Skill Tools

Explore other popular codex skill tools:

openclaw ⭐ 382.1k
hermes-agent ⭐ 211.2k
ui-ux-pro-max-skill ⭐ 101.6k
graphify ⭐ 79.9k
graphify ⭐ 77.1k
open-design ⭐ 76.1k
agent-skills ⭐ 71.4k
career-ops ⭐ 59.1k
taste-skill ⭐ 56.5k
system_prompts_leaks ⭐ 53.5k

Popular Python Agent Tools

hermes-agent ⭐ 211.2k · Codex Skill
AutoGPT ⭐ 185.4k · Agent Tool
skills ⭐ 157.3k · Claude Skill
langflow ⭐ 151.3k · Agent Tool
open-webui ⭐ 143.9k · MCP Server

Frequently Asked Questions

What is WildClawBench?

WildClawBench is an in-the-wild benchmark for AI agents in the OpenClaw environment. It is categorized as a Codex Skill and has 452 GitHub stars.

What programming language is WildClawBench written in?

WildClawBench is primarily written in Python. It covers topics such as agentic-ai, agentic-evaluation, agents.

How do I install or use WildClawBench?

You can find installation instructions and usage details in the WildClawBench GitHub repository at github.com/InternLM/WildClawBench. The project has 452 stars and 45 forks, indicating an active community.

What license does WildClawBench use?

WildClawBench is released under the MIT license, making it free to use and modify according to the license terms.
