
Testing Skills

Complete guide to testing Skill Engine skills.

Overview

Testing skills ensures they work correctly across:

  • Runtimes: WASM, Docker, Native
  • Environments: Development, staging, production
  • Inputs: Valid, invalid, edge cases
  • Integrations: External APIs and services

Testing Strategies

1. Manual Testing

Quick validation during development

2. Unit Testing

Test individual tool functions

3. Integration Testing

Test full skill execution

4. End-to-End Testing

Test with real AI agents (Claude Code)

Manual Testing

Basic Execution

bash
# Test skill installation
skill install ./my-skill

# List available tools
skill list my-skill

# Test tool execution
skill run my-skill my-tool --param value

# Check output
echo $?  # Exit code (0 = success)

Dry Run Mode

bash
# Preview without executing
skill run --dry-run my-skill my-tool --param value

# Shows what would be executed

Debug Mode

bash
# Enable debug logging
export SKILL_LOG_LEVEL=debug
skill run my-skill my-tool --param value

# Rust-level debugging
export RUST_LOG=skill_runtime=trace
skill run my-skill my-tool

Test Different Instances

bash
# Test development instance
skill run my-skill:dev my-tool

# Test staging instance
skill run my-skill:staging my-tool

# Test production instance
skill run my-skill:prod my-tool
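
Instance outputs can also be diffed against each other to catch configuration drift between environments. A small sketch (the tool and parameter names are illustrative):

bash
# Compare dev and staging output for the same tool call
diff <(skill run my-skill:dev my-tool --param value) \
     <(skill run my-skill:staging my-tool --param value)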

Unit Testing

WASM Skills (JavaScript/TypeScript)

typescript
// src/skill.test.ts
import { describe, it, expect } from 'vitest';
import { execute } from './skill';

describe('MySkill', () => {
  it('should process valid input', async () => {
    const result = await execute({
      tool: 'process',
      parameters: {
        input: 'test data'
      }
    });

    expect(result.status).toBe('success');
    expect(result.output).toContain('processed');
  });

  it('should reject invalid input', async () => {
    await expect(
      execute({
        tool: 'process',
        parameters: {
          input: ''  // Empty input
        }
      })
    ).rejects.toThrow('Input required');
  });

  it('should handle edge cases', async () => {
    const result = await execute({
      tool: 'process',
      parameters: {
        input: 'a'.repeat(10000)  // Large input
      }
    });

    expect(result.status).toBe('success');
  });
});

Run tests:

bash
npm test

Native Skills (Node.js)

javascript
// skill.test.js
const { execute } = require('./skill');

describe('Kubernetes Skill', () => {
  beforeEach(() => {
    // Mock kubectl command
    process.env.KUBECTL = 'echo';
  });

  test('get pods should return valid JSON', async () => {
    const result = await execute({
      tool: 'get',
      parameters: {
        resource: 'pods',
        namespace: 'default'
      }
    });

    expect(result).toBeInstanceOf(Object);
    expect(result.items).toBeInstanceOf(Array);
  });

  test('should validate resource names', async () => {
    await expect(
      execute({
        tool: 'get',
        parameters: {
          resource: 'invalid-resource',
          namespace: 'default'
        }
      })
    ).rejects.toThrow();
  });
});

Run tests:

bash
npm test
# or, with Node's built-in runner (the test file must then import
# describe/test from 'node:test' instead of relying on Jest globals)
node --test skill.test.js

Rust Skills

rust
// src/lib.rs
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_execute_valid_input() {
        let params = ToolParameters {
            param1: "value1".to_string(),
            param2: Some("value2".to_string()),
        };

        let result = execute_tool("my-tool", params);
        assert!(result.is_ok());
        assert_eq!(result.unwrap().status, "success");
    }

    #[test]
    fn test_execute_invalid_input() {
        let params = ToolParameters {
            param1: "".to_string(),  // Invalid
            param2: None,
        };

        let result = execute_tool("my-tool", params);
        assert!(result.is_err());
    }

    #[tokio::test]
    async fn test_async_execution() {
        let result = execute_async_tool("api-call").await;
        assert!(result.is_ok());
    }
}

Run tests:

bash
cargo test
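
A couple of standard cargo flags help when iterating on a single test or debugging output:

bash
# Run a single test by name
cargo test test_execute_valid_input

# Show stdout from passing tests
cargo test -- --nocapture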

Integration Testing

Test Script

bash
#!/bin/bash
# test-skill.sh
set -e

echo "Testing skill installation..."
skill install ./my-skill

echo "Testing tool execution..."
result=$(skill run my-skill my-tool --input "test")

if [[ "$result" == *"success"* ]]; then
  echo "✓ Test passed"
  exit 0
else
  echo "✗ Test failed"
  exit 1
fi

Test Manifest

toml
# test-manifest.toml
version = "1"

[skills.test-skill]
source = "./my-skill"
runtime = "wasm"

[skills.test-skill.instances.test]
config.base_url = "http://localhost:8000"
env.LOG_LEVEL = "debug"
capabilities.network_access = true

Run integration test:

bash
skill --manifest test-manifest.toml run test-skill:test my-tool

Automated Integration Tests

bash
#!/bin/bash
# integration-tests.sh
set -e

# Setup
export MANIFEST=test-manifest.toml
skill --manifest $MANIFEST validate

# Test each tool
echo "Testing tool1..."
skill --manifest $MANIFEST run test-skill tool1 --param1 value1

echo "Testing tool2..."
skill --manifest $MANIFEST run test-skill tool2 --param2 value2

# Test error handling
echo "Testing error cases..."
if skill --manifest $MANIFEST run test-skill invalid-tool 2>/dev/null; then
  echo "✗ Should have failed"
  exit 1
fi

echo "✓ All integration tests passed"

End-to-End Testing

Claude Code Testing

Test with actual AI agent:

bash
# 1. Generate Claude Agent Skills
skill claude-bridge generate --skill my-skill

# 2. Start Claude Code
claude

# 3. Test with prompts
> Use my-skill to process data
> List available tools in my-skill
> Execute my-skill tool with parameters

Validation checklist (a scripted version appears after the list):

  • [ ] Skill appears in Claude's context
  • [ ] Tools are discovered correctly
  • [ ] Parameters are validated
  • [ ] Execution succeeds
  • [ ] Output is formatted properly
  • [ ] Errors are handled gracefully
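
A minimal sketch scripting the checkable items above (the grep patterns, tool name, and parameters are illustrative):

bash
#!/bin/bash
# e2e-smoke.sh
set -e

# Tools are discovered correctly
skill list my-skill | grep -q my-tool

# Execution succeeds and output is well-formed
skill run my-skill my-tool --input "test" | grep -q "success"

# Errors are handled gracefully (bad input should fail cleanly)
if skill run my-skill my-tool --input "" 2>/dev/null; then
  echo "✗ expected failure on empty input"
  exit 1
fi

echo "✓ E2E smoke checks passed"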

MCP Testing

bash
# `skill serve` speaks MCP over stdio, so each request can be
# piped directly into its own server process

# Test MCP protocol: list available tools
echo '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}' | \
  skill serve

# Test tool execution via MCP
echo '{
  "jsonrpc":"2.0",
  "id":2,
  "method":"tools/call",
  "params":{
    "name":"skill-engine/execute",
    "arguments":{
      "skill_name":"my-skill",
      "tool_name":"my-tool",
      "parameters":{"input":"test"}
    }
  }
}' | skill serve

Docker Skill Testing

Test Docker Configuration

bash
# Test docker configuration
cat > test-docker-skill.toml << 'EOF'
version = "1"

[skills.test-docker]
source = "docker:alpine:latest"
runtime = "docker"

[skills.test-docker.docker]
image = "alpine:latest"
entrypoint = "echo"
EOF

# Test execution
skill --manifest test-docker-skill.toml run test-docker "Hello"

Volume Mounting Test

bash
# Create test file
echo "test data" > /tmp/test-input.txt

# Test volume mounting (assumes the skill's docker config mounts
# /tmp as /input and /output inside the container)
skill run docker-skill process \
  --input /input/test-input.txt \
  --output /output/result.txt

# Verify output on the host side of the mount
cat /tmp/result.txt

Mocking and Stubbing

Mock External APIs

javascript
// test-helpers.js
function mockGitHubAPI() {
  return {
    getIssues: jest.fn().mockResolvedValue([
      { id: 1, title: 'Test issue' }
    ]),
    createIssue: jest.fn().mockResolvedValue({
      id: 2,
      title: 'New issue'
    })
  };
}
module.exports = { mockGitHubAPI };

// skill.test.js
// Replace the skill's GitHub client module with the mock so that
// execute() actually talks to it (the module path is illustrative --
// point it at whatever your skill imports)
jest.mock('./github-client', () => require('./test-helpers').mockGitHubAPI());

const api = require('./github-client');  // the mocked instance
const { execute } = require('./skill');

test('should create issue', async () => {
  await execute({
    tool: 'create-issue',
    parameters: { title: 'Bug' }
  });

  expect(api.createIssue).toHaveBeenCalled();
});

Mock Native Commands

javascript
// For native skills wrapping CLI tools
jest.mock('child_process', () => ({
  execFile: jest.fn((cmd, args, callback) => {
    // Mock kubectl output
    if (cmd === 'kubectl' && args[0] === 'get') {
      callback(null, JSON.stringify({
        items: [{ name: 'pod-1' }]
      }), '');
    } else {
      // Fail fast instead of hanging on unmocked commands
      callback(new Error(`unmocked command: ${cmd}`), '', '');
    }
  })
}));

Test Data

Fixtures

tests/
├── fixtures/
│   ├── valid-input.json
│   ├── invalid-input.json
│   ├── large-input.json
│   └── edge-cases.json
└── skill.test.ts

Load in tests:

typescript
import validInput from './fixtures/valid-input.json';

test('should handle valid input', async () => {
  const result = await execute({
    tool: 'process',
    parameters: validInput
  });

  expect(result.status).toBe('success');
});

Test Manifests

tests/
├── manifests/
│   ├── minimal.toml
│   ├── full-features.toml
│   └── multi-instance.toml
└── integration.test.sh
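
Each manifest can then be validated in a loop before the integration script runs; a minimal sketch, assuming the layout above:

bash
#!/bin/bash
# validate-manifests.sh
set -e

for manifest in tests/manifests/*.toml; do
  echo "Validating $manifest..."
  skill --manifest "$manifest" validate
done

echo "✓ All manifests are valid"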

Continuous Integration

GitHub Actions

yaml
# .github/workflows/test.yml
name: Test Skills
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Install Skill Engine
        run: |
          curl -fsSL https://dqkbk9o7ynwhxfjx.public.blob.vercel-storage.com/install.sh | sh
          echo "$HOME/.skill/bin" >> $GITHUB_PATH

      - name: Run unit tests
        run: npm test

      - name: Run integration tests
        run: ./tests/integration-tests.sh

      - name: Validate manifest
        run: skill validate

      - name: Test skill execution
        run: |
          skill install ./my-skill
          skill run my-skill my-tool --input test

GitLab CI

yaml
# .gitlab-ci.yml
test:
  image: ubuntu:22.04
  before_script:
    - curl -fsSL https://install-url.com/install.sh | sh
    - export PATH="$HOME/.skill/bin:$PATH"
  script:
    - npm test
    - ./tests/integration-tests.sh
    - skill validate

Performance Testing

Measure Execution Time

bash
#!/bin/bash
# benchmark.sh

echo "Running performance tests..."

for i in {1..100}; do
  start=$(date +%s%N)
  skill run my-skill my-tool --input "test $i"
  end=$(date +%s%N)

  duration=$((($end - $start) / 1000000))  # Convert to ms
  echo "$i,$duration" >> results.csv
done

echo "Average: $(awk -F, '{sum+=$2; count++} END {print sum/count}' results.csv) ms"

Load Testing

bash
#!/bin/bash
# load-test.sh

# Concurrent executions
for i in {1..10}; do
  skill run my-skill my-tool --input "test $i" &
done

wait
echo "Completed 10 concurrent executions"

Benchmark with hyperfine

bash
# Install hyperfine
cargo install hyperfine

# Benchmark skill execution
hyperfine 'skill run my-skill my-tool --input test' \
  --warmup 3 \
  --min-runs 10

# Compare runtimes
hyperfine \
  'skill run my-skill:wasm my-tool' \
  'skill run my-skill:docker my-tool' \
  'skill run my-skill:native my-tool'

Coverage

JavaScript/TypeScript

bash
# Using vitest
npm test -- --coverage

# Using c8
c8 npm test

Rust

bash
# Using tarpaulin
cargo install cargo-tarpaulin
cargo tarpaulin --out Html --output-dir coverage

Best Practices

1. Test All Tool Parameters

typescript
describe('Tool parameters', () => {
  it('should handle required parameters', async () => {
    const result = await execute({
      tool: 'my-tool',
      parameters: { required_param: 'value' }
    });
    expect(result.status).toBe('success');
  });

  it('should reject missing required parameters', async () => {
    await expect(
      execute({
        tool: 'my-tool',
        parameters: {}
      })
    ).rejects.toThrow('required_param is required');
  });

  it('should use default for optional parameters', async () => {
    const result = await execute({
      tool: 'my-tool',
      parameters: { required_param: 'value' }
    });
    // Optional param should fall back to its default
    expect(result.status).toBe('success');
  });
});

2. Test Error Handling

typescript
describe('Error handling', () => {
  it('should handle network errors', async () => {
    // Mock network failure
    mockAPI.mockRejectedValue(new Error('Network error'));

    await expect(
      execute({ tool: 'api-call', parameters: {} })
    ).rejects.toThrow('Network error');
  });

  it('should handle timeouts', async () => {
    // Mock slow response
    mockAPI.mockImplementation(() =>
      new Promise(resolve => setTimeout(resolve, 60000))
    );

    await expect(
      execute({ tool: 'slow-operation', parameters: {} })
    ).rejects.toThrow('timeout');
  });
});

3. Test Edge Cases

typescript
describe('Edge cases', () => {
  it('should handle empty input', async () => {
    const result = await execute({
      tool: 'process',
      parameters: { input: '' }
    });
    expect(result.output).toBe('');
  });

  it('should handle very large input', async () => {
    const result = await execute({
      tool: 'process',
      parameters: { input: 'a'.repeat(1000000) }
    });
    expect(result.status).toBe('success');
  });

  it('should handle special characters', async () => {
    const result = await execute({
      tool: 'process',
      parameters: { input: '!@#$%^&*()' }
    });
    expect(result.status).toBe('success');
  });
});

4. Use Test Fixtures

typescript
import * as fs from 'fs';
import * as path from 'path';

function loadFixture(name: string): any {
  const fixturePath = path.join(__dirname, 'fixtures', `${name}.json`);
  return JSON.parse(fs.readFileSync(fixturePath, 'utf-8'));
}

test('should handle complex input', async () => {
  const input = loadFixture('complex-input');
  const result = await execute({
    tool: 'process',
    parameters: input
  });
  expect(result.status).toBe('success');
});

5. Clean Up After Tests

typescript
afterEach(async () => {
  // Clean up test data
  await cleanup();
});

afterAll(async () => {
  // Close connections
  await closeConnections();
});

Troubleshooting Tests

Tests Fail Intermittently

Cause: Race conditions, timing issues

Solution:

typescript
// Add proper async handling
await Promise.all([
  operation1(),
  operation2()
]);

// Add explicit timeouts to polling assertions
// (waitFor is @testing-library's helper)
await waitFor(() => expect(element).toBeInTheDocument(), {
  timeout: 5000
});

Tests Pass Locally, Fail in CI

Cause: Environment differences

Solution:

yaml
# Ensure consistent environment
env:
  NODE_ENV: test
  SKILL_LOG_LEVEL: error

Slow Test Suite

Cause: Not using parallelization

Solution:

bash
# Run tests in parallel (vitest and jest already parallelize across
# files by default; mocha needs the explicit flag shown here)
npm test -- --parallel

# Rust tests run in parallel by default
cargo test
