@NathanSavageKaimai commented Jan 22, 2026

feat: Add OpenTelemetry instrumentation support

Summary

This PR adds OpenTelemetry (OTel) instrumentation support to Crawlee so that users can trace and monitor their crawlers with any OpenTelemetry-compatible tooling. It introduces a new @crawlee/otel package that provides automatic instrumentation for core crawler operations, a utility for manually wrapping functions in spans, and forwarding of Crawlee logs to OpenTelemetry.

What's New

New Package: @crawlee/otel

A dedicated OpenTelemetry instrumentation package for Crawlee crawlers:

  • Automatic Instrumentation: Instruments core crawler methods, including:

    • BasicCrawler.run(), _runTaskFunction(), _requestFunctionErrorHandler(), _handleFailedRequestHandler(), _executeHooks()
    • BrowserCrawler._handleNavigation(), _runRequestHandler()
    • HttpCrawler._handleNavigation(), _runRequestHandler()
  • Manual Instrumentation: wrapWithSpan() utility function for wrapping custom handlers, hooks, and error handlers with OpenTelemetry spans

  • Log Forwarding: Automatic forwarding of Crawlee logs to OpenTelemetry logs with proper severity level mapping

  • Custom Instrumentation: Support for instrumenting custom class methods with configurable span names and attributes
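The severity mapping used for log forwarding is not spelled out in this description. As a rough sketch of what such a mapping could look like: the `CrawleeLogLevel` enum below is a local stand-in assumed to mirror the @apify/log level numbering, the severity numbers are the base values from the OpenTelemetry logs data model, and `toOtelSeverity` is a hypothetical name. The grouping of SOFT_FAIL with WARNING and of PERF with TRACE is an assumption, not necessarily what the package does.

```typescript
// Local stand-in assumed to mirror the @apify/log level numbering.
enum CrawleeLogLevel {
    OFF = 0,
    ERROR = 1,
    SOFT_FAIL = 2,
    WARNING = 3,
    INFO = 4,
    DEBUG = 5,
    PERF = 6,
}

// Base severity numbers from the OpenTelemetry logs data model.
const OtelSeverity = {
    TRACE: 1,
    DEBUG: 5,
    INFO: 9,
    WARN: 13,
    ERROR: 17,
} as const;

type OtelSeverityNumber = (typeof OtelSeverity)[keyof typeof OtelSeverity];

// Hypothetical mapping; returns undefined for OFF so the caller can skip the record.
function toOtelSeverity(level: CrawleeLogLevel): OtelSeverityNumber | undefined {
    switch (level) {
        case CrawleeLogLevel.ERROR:
            return OtelSeverity.ERROR;
        case CrawleeLogLevel.SOFT_FAIL:
        case CrawleeLogLevel.WARNING:
            return OtelSeverity.WARN;
        case CrawleeLogLevel.INFO:
            return OtelSeverity.INFO;
        case CrawleeLogLevel.DEBUG:
            return OtelSeverity.DEBUG;
        case CrawleeLogLevel.PERF:
            return OtelSeverity.TRACE;
        default:
            return undefined;
    }
}
```

Keeping the mapping in one pure function makes it easy to cover in the log level mapping unit tests.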

Key Features

  1. Zero-Configuration Automatic Instrumentation: Works out of the box with sensible defaults
  2. Flexible Configuration: Enable/disable specific instrumentation types, add custom instrumentation
  3. Type-Safe API: Full TypeScript support with proper type inference
  4. Performance Optimized: Minimal overhead with efficient span creation
  5. Error Handling: Proper error recording and status codes in spans
  6. Context Propagation: Automatic context propagation across async boundaries
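To illustrate what context propagation across async boundaries means in practice, here is a small sketch built on Node's `AsyncLocalStorage`. It is not code from this PR: `ActiveSpan`, `withSpan`, and `fetchPage` are hypothetical names, and the store holds a plain object rather than a real OpenTelemetry context.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

// Hypothetical stand-in for the active span context.
interface ActiveSpan {
    name: string;
}

const activeSpan = new AsyncLocalStorage<ActiveSpan>();

// Runs fn with `name` as the active span for fn and everything it awaits.
function withSpan<T>(name: string, fn: () => T): T {
    return activeSpan.run({ name }, fn);
}

async function fetchPage(): Promise<string | undefined> {
    // A timer hop simulates real I/O; the store survives the await.
    await new Promise((resolve) => setTimeout(resolve, 1));
    return activeSpan.getStore()?.name;
}
```

Two concurrent calls such as `withSpan('a', fetchPage)` and `withSpan('b', fetchPage)` each see their own store after the await, which is the property that keeps child spans attached to the right parent when many requests are processed in parallel.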

Implementation Details

Core Components

  • CrawleeInstrumentation: Main instrumentation class extending OpenTelemetry's InstrumentationBase
  • wrapWithSpan(): Utility function for wrapping functions with spans, supporting dynamic span names and attributes
  • Module Definition Builder: Groups and validates instrumentation methods by module
  • Log Patch: Integrates with @apify/log to forward logs to OpenTelemetry
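The general pattern behind a span-wrapping utility can be sketched as follows. This is not the package's wrapWithSpan: `wrapInSpan`, `SpanLike`, and `TracerLike` are hypothetical, and the two interfaces are minimal local stand-ins for the real types in @opentelemetry/api (which, for example, uses a `SpanStatusCode` enum rather than string codes). The sketch shows a dynamic span name, error recording, and ending the span for both sync and async functions.

```typescript
// Minimal stand-ins for the OpenTelemetry Span and Tracer types (assumption:
// only the members used below; the real types live in @opentelemetry/api).
interface SpanLike {
    recordException(error: Error): void;
    setStatus(status: { code: 'OK' | 'ERROR'; message?: string }): void;
    end(): void;
}

interface TracerLike {
    startSpan(name: string): SpanLike;
}

type SpanName<A extends unknown[]> = string | ((...args: A) => string);

// Hypothetical helper: wraps fn so each call runs inside a span that is
// always ended, whether fn returns, throws, resolves, or rejects.
function wrapInSpan<A extends unknown[], R>(
    tracer: TracerLike,
    spanName: SpanName<A>,
    fn: (...args: A) => R,
): (...args: A) => R {
    return (...args: A): R => {
        const name = typeof spanName === 'function' ? spanName(...args) : spanName;
        const span = tracer.startSpan(name);
        // Records the failure on the span, ends it, and rethrows.
        const fail = (err: unknown): never => {
            const error = err instanceof Error ? err : new Error(String(err));
            span.recordException(error);
            span.setStatus({ code: 'ERROR', message: error.message });
            span.end();
            throw error;
        };
        try {
            const result = fn(...args);
            if (result instanceof Promise) {
                // Async function: end the span once the promise settles.
                return result.then((value) => {
                    span.end();
                    return value;
                }, fail) as unknown as R;
            }
            span.end();
            return result;
        } catch (err) {
            return fail(err);
        }
    };
}
```

With the real API, `tracer.startActiveSpan()` would additionally make the span the active one for the duration of the callback, so that `trace.getSpan(context.active())` inside the handler returns it.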

Instrumented Methods

The following methods are automatically instrumented when requestHandlingInstrumentation is enabled:

Crawler         Method                        Span Name
BasicCrawler    run                           crawlee.crawler.run
BasicCrawler    _runTaskFunction              crawlee.crawler.runTaskFunction
BasicCrawler    _requestFunctionErrorHandler  crawlee.crawler.requestFunctionErrorHandler
BasicCrawler    _handleFailedRequestHandler   crawlee.crawler.handleFailedRequestHandler
BasicCrawler    _executeHooks                 crawlee.crawler.executeHooks
BrowserCrawler  _handleNavigation             crawlee.browser.handleNavigation
BrowserCrawler  _runRequestHandler            crawlee.browser.runRequestHandler
HttpCrawler     _handleNavigation             crawlee.http.handleNavigation
HttpCrawler     _runRequestHandler            crawlee.http.runRequestHandler

Request handler spans automatically include the following attributes:

  • crawlee.request.id
  • crawlee.request.url
  • crawlee.request.method
  • crawlee.request.retry_count
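As a rough illustration of how these attributes could be derived from a request, consider the sketch below. `RequestLike` is an assumed subset of a Crawlee request, and `requestAttributes` is a hypothetical helper, not a function exported by @crawlee/otel. Omitting `crawlee.request.id` when no id is present is also an assumption.

```typescript
// Assumed subset of a Crawlee request; only the fields read below.
interface RequestLike {
    id?: string;
    url: string;
    method: string;
    retryCount: number;
}

type SpanAttributes = Record<string, string | number>;

// Hypothetical helper: builds the four attributes listed above,
// omitting crawlee.request.id when the request has no id yet.
function requestAttributes(request: RequestLike): SpanAttributes {
    const attributes: SpanAttributes = {
        'crawlee.request.url': request.url,
        'crawlee.request.method': request.method,
        'crawlee.request.retry_count': request.retryCount,
    };
    if (request.id !== undefined) {
        attributes['crawlee.request.id'] = request.id;
    }
    return attributes;
}
```

The resulting object has the shape expected by the `attributes` field of span options, as used in the manual wrapping example.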

Usage Examples

Basic Setup

import { NodeSDK } from '@opentelemetry/sdk-node';
import { CrawleeInstrumentation } from '@crawlee/otel';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-grpc';

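// Create the Crawlee instrumentation with its default settings.
const crawleeInstrumentation = new CrawleeInstrumentation();

const sdk = new NodeSDK({
    // Export spans over OTLP/gRPC using the exporter's default endpoint.
    traceExporter: new OTLPTraceExporter(),
    instrumentations: [crawleeInstrumentation],
    // ... other config
});

// Start the SDK before running any crawler so that its spans are recorded.
sdk.start();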

Manual Span Wrapping

import { CheerioCrawler } from 'crawlee';
import { wrapWithSpan } from '@crawlee/otel';
import { context, trace } from '@opentelemetry/api';

const crawler = new CheerioCrawler({
    requestHandler: wrapWithSpan(
        async ({ request, $ }) => {
            const span = trace.getSpan(context.active());
            const title = $('title').text();
            
            if (span) {
                span.setAttribute('page.title', title);
            }
        },
        {
            spanName: ({ request }) => `scrape ${request.url}`,
            spanOptions: ({ request }) => ({
                attributes: {
                    'crawlee.request.url': request.url,
                },
            }),
        },
    ),
});

Custom Instrumentation

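import { CrawleeInstrumentation } from '@crawlee/otel';

const crawleeInstrumentation = new CrawleeInstrumentation({
    // Turn off the built-in request handling spans and instrument only the methods listed below.
    requestHandlingInstrumentation: false,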
    customInstrumentation: [
        {
            moduleName: '@crawlee/basic',
            className: 'BasicCrawler',
            methodName: 'run',
            spanName: 'my-custom-span',
        },
    ],
});
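The Module Definition Builder is described as grouping and validating instrumentation methods by module. A minimal sketch of that idea, assuming each entry has the shape shown in the example above: `MethodInstrumentation` and `groupByModule` are hypothetical names, and rejecting duplicate entries is only one plausible validation rule, not necessarily the one the package applies.

```typescript
// Assumed shape of one customInstrumentation entry, taken from the example above.
interface MethodInstrumentation {
    moduleName: string;
    className: string;
    methodName: string;
    spanName: string;
}

// Hypothetical builder: groups entries by module and rejects duplicates.
function groupByModule(entries: MethodInstrumentation[]): Map<string, MethodInstrumentation[]> {
    const byModule = new Map<string, MethodInstrumentation[]>();
    const seen = new Set<string>();
    for (const entry of entries) {
        const key = `${entry.moduleName}:${entry.className}.${entry.methodName}`;
        if (seen.has(key)) {
            throw new Error(`Duplicate instrumentation for ${key}`);
        }
        seen.add(key);
        const group = byModule.get(entry.moduleName) ?? [];
        group.push(entry);
        byModule.set(entry.moduleName, group);
    }
    return byModule;
}
```

Grouping by module name means each module needs to be patched only once, however many of its methods are instrumented.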

Documentation

  • Guide: Complete guide at docs/guides/trace-and-monitor-crawlers.mdx covering:
    • Setup instructions
    • Basic and advanced usage examples
    • Integration with Jaeger and other backends
    • Configuration options
    • Manual instrumentation patterns

Testing

Comprehensive test coverage including:

  • Unit Tests:

    • Instrumentation configuration and initialization
    • wrapWithSpan functionality (sync/async, errors, context)
    • Module definition building
    • Log level mapping
  • Integration Tests:

    • Full crawler runs with instrumentation
    • Span creation and hierarchy
    • Attribute propagation
    • Error handling

Dependencies

Peer Dependencies

  • @opentelemetry/api: ^1.3.0
  • @opentelemetry/api-logs: ^0.210.0

Direct Dependencies

  • @opentelemetry/instrumentation: ^0.210.0

Related Issues

Closes #2955


Note: This implementation follows OpenTelemetry instrumentation best practices. Users can export traces to any OpenTelemetry-compatible backend (Jaeger, Zipkin, SigNoz, etc.).

@NathanSavageKaimai changed the title from "Otel package" to "feat: instrumentation based opentelemetry collection" on Jan 22, 2026
@janbuchar requested a review from Pijukatel on January 26, 2026, 09:28