
Building AI Microservices: Moleculer, Ollama, and Render Integration

Published: Dec 6, 2025, 10:21 PM · 4 min read
Tags: moleculer, ollama, render, typescript, microservices
Learn to integrate Moleculer microservices with Ollama API for local LLM inference, deployed seamlessly on Render. Practical TypeScript examples included.

Unleashing Local AI: Moleculer, Ollama, and Render Integration

The landscape of web development is rapidly evolving, with AI integration becoming a cornerstone of modern applications. While cloud-based LLMs offer convenience, concerns about cost, privacy, and latency often drive teams toward local or self-hosted solutions. This post demonstrates how to build a scalable, AI-powered microservices application with Moleculer, integrate it with the Ollama API for local LLM inference, and deploy everything on Render.

The Integrated Stack

  • Moleculer.js: A progressive microservices framework for Node.js. Moleculer simplifies building distributed systems with features like service discovery, load balancing, and fault tolerance, making it ideal for scalable AI backends. Moleculer Documentation
  • Ollama API: Provides a local API for running open-source large language models (LLMs) like Llama 2, Mistral, and many others. It abstracts away the complexities of model management and inference, offering a simple HTTP interface. Ollama Documentation
  • Render: A unified cloud platform to build and run all your apps and websites. Render streamlines deployment with features like automatic builds, global CDN, and managed databases, perfect for hosting microservices. Render Documentation

Architectural Blueprint

Our architecture consists of at least two Moleculer services: an api-gateway that exposes HTTP endpoints and an ai-service responsible for talking to the Ollama API. Both services run in their own containers on Render and discover each other through a Moleculer transporter (a configuration sketch follows the diagram). Ollama itself can run on a dedicated server or a sufficiently provisioned Render instance, reachable from the ai-service.

graph TD
    A[Client Application] --> B(API Gateway Service - Moleculer)
    B --> C(AI Service - Moleculer)
    C --> D(Ollama Instance API)
    D --> E(LLM Model Weights)

    subgraph Render Deployment
        B
        C
    end
    subgraph Local/Dedicated Server
        D
        E
    end
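
Because the gateway and the AI service run as separate Node processes (and, on Render, separate containers), Moleculer needs a transporter so the gateway can discover and call ai.* actions on the other node. A minimal sketch, shared by both services, assuming a reachable NATS instance (the TRANSPORTER value is a placeholder):

// moleculer.config.ts -- minimal sketch; assumes a reachable NATS instance
import { BrokerOptions } from "moleculer";

const brokerConfig: BrokerOptions = {
  // Both services must point at the same transporter so the gateway
  // can route requests to ai.* actions running in another container.
  transporter: process.env.TRANSPORTER || "nats://localhost:4222",
  logLevel: "info",
};

export default brokerConfig;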

Building the Moleculer AI Service

First, let's create a Moleculer service that can communicate with an Ollama instance. We'll use axios for HTTP requests.

// services/ai.service.ts
import { Service, ServiceBroker, Context } from "moleculer";
import axios from "axios";

interface GenerateParams {
  model: string;
  prompt: string;
  stream?: boolean;
}

export default class AIService extends Service {
  public constructor(broker: ServiceBroker) {
    super(broker);
    this.parseServiceSchema({
      name: "ai",
      settings: {
        ollamaApiUrl: process.env.OLLAMA_API_URL || "http://localhost:11434/api",
      },
      actions: {
        generate: {
          params: {
            model: { type: "string" },
            prompt: { type: "string" },
            stream: { type: "boolean", optional: true, default: false },
          },
          handler: this.generateText,
        },
      },
    });
  }

  private async generateText(ctx: Context<GenerateParams>): Promise<string> {
    const { model, prompt, stream } = ctx.params;
    this.logger.info(`Generating text with model: ${model}, prompt: "${prompt.substring(0, 50)}..."`);

    try {
      const response = await axios.post(`${this.settings.ollamaApiUrl}/generate`, {
        model,
        prompt,
        stream,
      });

      // Ollama's /generate endpoint returns a JSON object for non-streaming
      // For streaming, it's newline-delimited JSON, requiring a different handler.
      // This example assumes non-streaming; a streaming sketch follows the class.
      return response.data.response;
    } catch (error) {
      // `error` is `unknown` under strict TypeScript, so narrow before reading .message
      const message = error instanceof Error ? error.message : String(error);
      this.logger.error("Error calling Ollama API:", message);
      throw new Error("Failed to generate text from Ollama.");
    }
  }
}
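
The handler above takes the non-streaming path. If you do want streaming, Ollama emits newline-delimited JSON, so the handler has to consume the response as a stream. A hypothetical method you could add to AIService (a sketch reusing the file's imports; the line-buffering is deliberately simple):

// services/ai.service.ts (excerpt) -- hypothetical streaming variant, not part of the main path
private async generateTextStreaming(ctx: Context<GenerateParams>): Promise<string> {
  const response = await axios.post(
    `${this.settings.ollamaApiUrl}/generate`,
    { model: ctx.params.model, prompt: ctx.params.prompt, stream: true },
    { responseType: "stream" } // axios yields a Node Readable stream
  );

  let buffer = "";
  let full = "";
  for await (const chunk of response.data) {
    buffer += chunk.toString();
    // Ollama sends one JSON object per line; parse each complete line.
    let newline: number;
    while ((newline = buffer.indexOf("\n")) >= 0) {
      const line = buffer.slice(0, newline).trim();
      buffer = buffer.slice(newline + 1);
      if (line) full += JSON.parse(line).response ?? "";
    }
  }
  return full;
}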

This ai.service.ts defines a generate action that takes a model and prompt, then forwards it to the configured Ollama API endpoint. The OLLAMA_API_URL is an environment variable, crucial for deployment.
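
Before wiring up HTTP, you can smoke-test the action directly through a local broker. A minimal sketch (the model name is a placeholder; it assumes Ollama is already running on localhost:11434):

// smoke-test.ts -- hypothetical local test; assumes Ollama is running on localhost:11434
import { ServiceBroker } from "moleculer";
import AIService from "./services/ai.service";

async function main(): Promise<void> {
  const broker = new ServiceBroker({ logLevel: "warn" });
  broker.createService(AIService);
  await broker.start();

  // Calls the ai.generate action defined above.
  const text = await broker.call("ai.generate", {
    model: "llama2", // placeholder -- use a model you've pulled with `ollama pull`
    prompt: "Summarize microservices in one sentence.",
  });
  console.log(text);

  await broker.stop();
}

main().catch(console.error);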

Integrating with an API Gateway

To expose our AI capabilities to external clients, an API Gateway service is essential. This Moleculer service will handle incoming HTTP requests and forward them to our ai.service.

// services/api-gateway.service.ts
import { Service, ServiceBroker } from "moleculer";
import ApiGateway from "moleculer-web";

export default class ApiGatewayService extends Service {
  public constructor(broker: ServiceBroker) {
    super(broker);
    this.parseServiceSchema({
      name: "api-gateway",
      mixins: [ApiGateway],
      settings: {
        port: Number(process.env.PORT) || 3000,
        routes: [
          {
            path: "/api",
            whitelist: ["ai.*"], // Allow all actions from the 'ai' service
            autoAliases: true, // Automatically create aliases for actions
            bodyParsers: {
              json: true,
              urlencoded: { extended: true },
            },
            onBeforeCall: async (ctx, route, req, res) => {
              // Add any authentication/authorization logic here
            },
          },
        ],
      },
    });
  }
}

The api-gateway service uses moleculer-web to create HTTP endpoints. The whitelist ensures only actions of the ai service are exposed under /api; with the default route mapping, the generate action becomes reachable at POST /api/ai/generate, and autoAliases additionally generates REST aliases for any actions that declare rest metadata.
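
With both services running and sharing a transporter, any HTTP client can call the gateway. A hypothetical client sketch, assuming the gateway listens on localhost:3000 (Node 18+ for the global fetch):

// client.ts -- hypothetical client sketch; assumes the gateway runs on localhost:3000
async function callGenerate(): Promise<void> {
  const res = await fetch("http://localhost:3000/api/ai/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama2", // placeholder -- any model your Ollama instance has pulled
      prompt: "Write a haiku about microservices.",
    }),
  });
  // The action returns a plain string, so read the body as text.
  console.log(await res.text());
}

callGenerate().catch(console.error);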

Deploying on Render

Render makes deploying microservices straightforward. Each Moleculer service (e.g., api-gateway, ai-service) can be deployed as a separate Render Web Service, typically containerized using Docker.

  1. Dockerize Services: Create a Dockerfile for each Moleculer service.
    # Dockerfile for a Moleculer service (e.g., ai-service)
    FROM node:20-alpine
    WORKDIR /app
    COPY package*.json ./
    # Install all deps (devDependencies are needed to compile TypeScript)
    RUN npm install
    COPY . .
    # Assumes a "build" script that compiles TypeScript to dist/
    RUN npm run build
    # Or ["node", "dist/index.js"] if you have a main entry point
    CMD ["npm", "start"]
    
  2. Render Configuration:
    • Create new "Web Services" on Render.
    • Connect to your Git repository.
    • Configure the build command (e.g., npm install && npm run build if using TypeScript compilation) and start command (e.g., npm start).
    • Crucially, set environment variables:
      • For ai-service: OLLAMA_API_URL pointing to your Ollama instance (e.g., http://ollama-instance-ip:11434/api).
      • For api-gateway: PORT (Render sets this automatically).
      • For both: NODE_ENV=production and a shared TRANSPORTER URL so the two Render services can reach each other. Set SERVICES=ai on the ai-service and SERVICES=api-gateway on the gateway; the Moleculer runner uses this variable to decide which services to load (a Blueprint sketch follows this list).
    • Ollama Instance: If running Ollama on Render, it would be another web service with appropriate resources. Alternatively, connect to an external Ollama deployment.
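
If you prefer declaring this as infrastructure-as-code, Render Blueprints can describe both services in a single render.yaml. A rough sketch (service names and Dockerfile paths are illustrative; double-check field names against the current Blueprint spec):

# render.yaml -- rough sketch; names and paths are illustrative
services:
  - type: web
    name: api-gateway
    runtime: docker
    dockerfilePath: ./services/api-gateway/Dockerfile
    envVars:
      - key: SERVICES
        value: api-gateway
      - key: NODE_ENV
        value: production
  - type: web
    name: ai-service
    runtime: docker
    dockerfilePath: ./services/ai/Dockerfile
    envVars:
      - key: SERVICES
        value: ai
      - key: OLLAMA_API_URL
        sync: false # set the actual Ollama URL in the Render dashboard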

Best Practices and Actionable Insights

  • Scalability: Moleculer's architecture allows you to scale individual services independently. If your AI service becomes a bottleneck, simply spin up more instances of the ai-service on Render.
  • Cost-Effectiveness: Running Ollama locally or on a dedicated machine can be significantly cheaper than relying solely on commercial cloud LLM APIs, especially for high-volume inference. Render's cost-efficient hosting further enhances this.
  • Observability: Integrate Moleculer's built-in metrics and tracing with tools like Prometheus and Jaeger to monitor your services on Render effectively (a configuration sketch follows this list).
  • Security: Implement robust authentication and authorization in your api-gateway to protect your AI endpoints.
  • Model Management: Ollama simplifies switching between different LLMs. Your ai-service can be easily updated to use new models without redeploying the entire application.
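
For the observability item above, Moleculer's broker options accept metrics and tracing blocks out of the box. A minimal sketch, assuming Moleculer 0.14+ with default Prometheus/Jaeger ports (adjust to your setup):

// moleculer.config.ts (excerpt) -- minimal observability sketch; assumes Moleculer 0.14+
import { BrokerOptions } from "moleculer";

const brokerConfig: BrokerOptions = {
  metrics: {
    enabled: true,
    // Exposes a /metrics endpoint (default port 3030) for Prometheus to scrape.
    reporter: [{ type: "Prometheus", options: { port: 3030 } }],
  },
  tracing: {
    enabled: true,
    // Ships spans to a Jaeger agent; host/port are Jaeger's defaults.
    exporter: [{ type: "Jaeger", options: { host: "127.0.0.1", port: 6832 } }],
  },
};

export default brokerConfig;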

Conclusion

By combining Moleculer's robust microservices capabilities with Ollama's accessible local LLM API and Render's seamless deployment, developers can build powerful, scalable, and cost-effective AI applications. This integrated approach empowers you to leverage cutting-edge AI models while maintaining control over your infrastructure and data.