Serverless Webhooks with Cerebrium

Processing large PDF files often leads to HTTP timeouts. You send a document, wait, and the connection dies before processing completes. Cerebrium’s serverless platform solves this with custom FastAPI webhooks and built-in security.

Wayne Lau · 4 min read

The code base can be found on GitHub.

What is Cerebrium?

Cerebrium is a serverless GPU provider that enables on-demand serverless applications. This is perfect for our PDF processing use case since we only pay for GPU time when actually converting documents, avoiding the cost of running dedicated servers.

GPU pricing is straightforward: L4 costs $0.000222/second while L40s costs $0.000542/second. For a 600-page PDF, L4 takes about 20 minutes ($0.27) while L40s completes in 10 minutes ($0.33). The $30 free credit covers 100+ large document conversions.
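As a quick sanity check on those numbers (rates and durations taken from above; this is back-of-envelope arithmetic, not an official calculator):

```python
# Per-second GPU rates from Cerebrium's pricing
L4_RATE = 0.000222    # $/second
L40S_RATE = 0.000542  # $/second

# Approximate conversion times for a 600-page PDF
l4_cost = L4_RATE * 20 * 60      # ~20 minutes on L4
l40s_cost = L40S_RATE * 10 * 60  # ~10 minutes on L40s

print(f"L4: ${l4_cost:.2f}, L40s: ${l40s_cost:.2f}")  # L4: $0.27, L40s: $0.33
print(f"Conversions per $30 credit on L4: {int(30 / l4_cost)}")  # 112
```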

The Problem

Large PDFs can take minutes to convert to markdown. A 600-page PDF takes 10-20 minutes depending on GPU tier, and further optimization is possible with deployment tuning.

Webhooks

Cerebrium allows you to write custom FastAPI endpoints with built-in webhook security. Instead of keeping HTTP connections open:

  1. Send PDF → Get immediate “processing started” response
  2. Process in background → Cerebrium serverless function converts PDF
  3. Receive callback → Webhook delivers results to your endpoint

Cerebrium provides $30 of free credits for testing, making it easy to get started with serverless PDF processing.

Implementation

Submit PDF with webhook:

import urllib.parse
import requests

# URL-encode the callback so it survives as a query parameter
webhook_url_encoded = urllib.parse.quote(webhook_url, safe="")
api_url = f"{pdf_endpoint}?async=true&webhookEndpoint={webhook_url_encoded}"
headers = {
    "accept": "application/json",
    "Authorization": f"Bearer {api_key}",
    "X-Webhook-Secret": webhook_secret,
}

# files holds the PDF upload, e.g. {"file": open("document.pdf", "rb")}
response = requests.post(api_url, headers=headers, files=files)
# Returns immediately: {"run_id": "random-uuid-here"}

Webhook server receives results:

from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/webhook/pdf-results")
async def receive_pdf_results(request: Request):
    result = await request.json()
    # You can implement your own saving functions
    save_markdown(result["data"]["text"])
    return {"status": "received"}

Cerebrium recommends verifying webhook signatures to ensure security.

Webhook Signature Verification

import hmac
import hashlib

def verify_webhook_signature(request_id, timestamp, body, signature, secret):
    # Remove the 'v1,' prefix from the signature
    signature = signature.split(',')[1] if ',' in signature else signature

    # Construct the signed content
    signed_content = f"{request_id}.{timestamp}.{body}"

    # Calculate expected signature
    expected_signature = hmac.new(
        secret.encode(),
        signed_content.encode(),
        hashlib.sha256
    ).hexdigest()

    # Compare signatures
    return hmac.compare_digest(expected_signature, signature)
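A quick way to sanity-check this function is to generate a signature locally and confirm it round-trips. The secret and payload values below are made up for illustration:

```python
import hmac
import hashlib

def verify_webhook_signature(request_id, timestamp, body, signature, secret):
    # Same function as above, repeated so this snippet runs standalone
    signature = signature.split(',')[1] if ',' in signature else signature
    signed_content = f"{request_id}.{timestamp}.{body}"
    expected = hmac.new(
        secret.encode(), signed_content.encode(), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, signature)

# Simulate the signature the sender would attach: "v1," + HMAC-SHA256 hex digest
secret = "test-webhook-secret"
request_id, timestamp, body = "run-abc123", "1700000000", '{"status": "succeeded"}'
signed = f"{request_id}.{timestamp}.{body}"
signature = "v1," + hmac.new(secret.encode(), signed.encode(), hashlib.sha256).hexdigest()

print(verify_webhook_signature(request_id, timestamp, body, signature, secret))          # True
print(verify_webhook_signature(request_id, timestamp, body, signature, "wrong-secret"))  # False
```

A tampered body, timestamp, or secret changes the digest, so verification fails for anything you didn't sign.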

Benefits

  • No timeouts - HTTP connection closes immediately
  • Scalable - Handle multiple PDFs concurrently
  • Reliable - Results delivered even if processing takes hours
  • Connection resilience - Short-lived HTTP requests handle unstable connections better than long-running ones
  • Batch processing - Perfect for evaluating multiple documents where immediate responses aren’t needed

Caveats

Cerebrium webhooks do not have a built-in retry mechanism: if a delivery fails, the result is lost. Other providers may offer retries.
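Since a missed or mishandled delivery means a lost result, one mitigation (a sketch, not Cerebrium-specific; the `run_id` payload key is an assumption) is to persist every payload to disk before doing any further processing, so a crash in your handler doesn't discard it:

```python
import json
import uuid
from pathlib import Path

INBOX = Path("webhook_inbox")
INBOX.mkdir(exist_ok=True)

def persist_payload(payload: dict) -> Path:
    # Write the raw payload to disk first; process it afterwards
    path = INBOX / f"{payload.get('run_id', uuid.uuid4().hex)}.json"
    path.write_text(json.dumps(payload))
    return path

saved = persist_payload({"run_id": "run-abc123", "data": {"text": "# Converted markdown"}})
print(saved)  # webhook_inbox/run-abc123.json
```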

Accessibility

One challenge with webhooks is making your endpoint accessible from serverless functions. During development, you can use tunneling services:

# Using ngrok
ngrok http 5000

# Or cloudflared tunnel
cloudflared tunnel --url http://localhost:5000

These create public URLs that forward to your local webhook server, allowing serverless functions to reach your endpoint during development.

When to Use Webhooks

  • Use webhooks for long-running tasks where immediate response isn’t feasible.
  • Ideal for batch processing, large datasets, or operations taking minutes to hours.

Production Patterns

When deploying webhook-based processing to production, consider these patterns:

  • Track job IDs - Store the run_id returned from initial requests with document metadata for tracking and retrieval
  • Webhook reliability - Cerebrium has no retry mechanism. Lost webhooks mean lost results. Ensure your webhook endpoint is highly available
  • Queue systems - For high-volume processing, implement a queue-based system to manage async jobs and handle backpressure
  • Database tracking - Maintain job state in your database (pending, processing, completed, failed) to provide status updates to users
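A minimal in-memory version of the job-tracking idea might look like the sketch below. In production you would back this with a real database; the states are the ones listed above, and the class name is just an illustration:

```python
from datetime import datetime, timezone

class JobTracker:
    """Tracks async PDF jobs by run_id through their lifecycle."""
    STATES = {"pending", "processing", "completed", "failed"}

    def __init__(self):
        self.jobs = {}

    def create(self, run_id: str, filename: str):
        # Called when the submit request returns a run_id
        self.jobs[run_id] = {
            "filename": filename,
            "state": "pending",
            "updated_at": datetime.now(timezone.utc),
        }

    def update(self, run_id: str, state: str):
        # Called by the webhook handler when results (or failures) arrive
        assert state in self.STATES, f"unknown state: {state}"
        self.jobs[run_id]["state"] = state
        self.jobs[run_id]["updated_at"] = datetime.now(timezone.utc)

    def status(self, run_id: str) -> str:
        return self.jobs[run_id]["state"]

tracker = JobTracker()
tracker.create("run-abc123", "report.pdf")
tracker.update("run-abc123", "completed")
print(tracker.status("run-abc123"))  # completed
```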

Extensions

This webhook pattern serves as a foundation for more complex systems:

  • Caching strategies - Cache extracted content by document hash to prevent repeated processing. Use Redis, disk cache, or object storage with TTLs based on document update frequency
  • Self-hosted queues - Implement Celery, BullMQ, or similar queue systems for managing async job lifecycles with retry logic
  • Custom monitoring - Build dashboards tracking job completion rates, processing times, and failure patterns
  • Webhook middleware - Add retry mechanisms, circuit breakers, and fallback handlers to handle delivery failures
  • Hybrid approaches - Combine serverless processing with self-hosted orchestration for cost optimization
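The caching idea, for example, can be as simple as keying results by a SHA-256 of the PDF bytes. This is a sketch using an in-memory dict and a stubbed converter; swap in Redis or disk storage and the real Cerebrium round-trip for actual use:

```python
import hashlib

cache = {}  # doc_hash -> extracted markdown

def convert_pdf(pdf_bytes: bytes) -> str:
    # Stub standing in for the real async Cerebrium conversion
    return "# Extracted markdown"

def convert_with_cache(pdf_bytes: bytes) -> str:
    doc_hash = hashlib.sha256(pdf_bytes).hexdigest()
    if doc_hash not in cache:
        cache[doc_hash] = convert_pdf(pdf_bytes)  # only pay for GPU time on a miss
    return cache[doc_hash]

result = convert_with_cache(b"%PDF-1.7 ...")
print(len(cache))  # 1
convert_with_cache(b"%PDF-1.7 ...")  # identical bytes: cache hit, no reprocessing
print(len(cache))  # 1
```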

Conclusion

Serverless webhooks eliminate timeout issues in long-running jobs. By leveraging Cerebrium's FastAPI endpoints and secure webhook system, you can process large PDFs efficiently without worrying about connection drops.

The pattern works great for PDF processing, image conversion, or any operation that takes more than a few minutes to complete.