
Microservice Communication: Problems, Solutions, and Practical Recommendations

A comprehensive guide to the nine most common problems in microservice communication — from cascading failures and data consistency to observability and API evolution — with practical patterns and code examples for each.

Everyone was talking about microservices. Flexibility. Scalability. Independent teams. It sounded like a dream. Many companies rushed to break apart their monoliths. Development did speed up. Individual components became easier to update and deploy.

And then the services needed to communicate. And the dream turned into a complex, multidimensional puzzle.

A simple function call inside a monolith became a network request. Reliability dropped. Latency increased. Debugging turned into a quest through distributed logs.

Problem #1 — The Domino Effect: Cascading Failures and System Brittleness

One service is unavailable. The service calling it waits for a response. It runs out of threads or connections. It also becomes unavailable. The domino effect brings down the entire system.

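Solution A: Circuit Breaker Pattern. A switch in front of calls to another service, with three states: Closed (calls pass through and failures are counted), Open (after too many failures, calls fail immediately without reaching the struggling service), and Half-Open (after a cool-down, trial calls test whether the service has recovered).
Solution B: Bulkhead Pattern. Each downstream dependency gets its own bounded pool of threads or connections, so a slow dependency can exhaust only its own pool instead of taking the whole service down with it.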

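```cpp
#include <memory>
#include <semaphore>
#include <thread>
#include <utility>

// Bulkhead: caps how many calls to one dependency may run at the same time.
// Calls beyond the limit are rejected immediately instead of queuing, so a
// slow dependency cannot exhaust all of the caller's threads.
class Bulkhead {
public:
    explicit Bulkhead(int maxConcurrent)
        : slots_(std::make_shared<std::counting_semaphore<>>(maxConcurrent)) {}

    // Returns false (fast rejection) when every slot is busy.
    template <typename Fn>
    bool execute(Fn function) {
        if (!slots_->try_acquire()) {
            return false;
        }
        // The worker owns a reference to the semaphore, so it stays valid
        // even if the Bulkhead is destroyed before the call finishes.
        std::thread([slots = slots_, fn = std::move(function)]() mutable {
            try {
                fn();
            } catch (...) {
                // Swallow errors so the slot is always released;
                // production code would log or report them.
            }
            slots->release();
        }).detach();
        return true;
    }

private:
    std::shared_ptr<std::counting_semaphore<>> slots_;
};
```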
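Solution A can be sketched in the same C++ style. This is a minimal, single-threaded illustration, not a production library: the `CircuitBreaker` class, its consecutive-failure threshold, and the fixed cool-down are assumptions chosen for clarity. Failures are signaled by exceptions thrown from the wrapped call.

```cpp
#include <chrono>
#include <functional>

// Minimal circuit breaker (hypothetical class, single-threaded sketch).
// Closed: calls pass through; consecutive failures are counted.
// Open: calls fail fast without touching the remote service.
// Half-Open: after the cool-down, one trial call decides the next state.
class CircuitBreaker {
public:
    enum class State { Closed, Open, HalfOpen };
    using Clock = std::chrono::steady_clock;

    CircuitBreaker(int failureThreshold, Clock::duration coolDown)
        : failureThreshold_(failureThreshold), coolDown_(coolDown) {}

    // Runs `call` if the breaker allows it; returns false if the call was
    // rejected (breaker open) or failed (threw an exception).
    bool execute(const std::function<void()>& call) {
        if (state_ == State::Open) {
            if (Clock::now() - openedAt_ < coolDown_) {
                return false;          // fail fast, protect the caller
            }
            state_ = State::HalfOpen;  // cool-down over: allow one trial
        }
        try {
            call();
        } catch (...) {
            onFailure();
            return false;
        }
        onSuccess();
        return true;
    }

    State state() const { return state_; }

private:
    void onSuccess() {
        consecutiveFailures_ = 0;
        state_ = State::Closed;
    }

    void onFailure() {
        ++consecutiveFailures_;
        // A failed trial in Half-Open re-opens the breaker immediately.
        if (state_ == State::HalfOpen || consecutiveFailures_ >= failureThreshold_) {
            state_ = State::Open;
            openedAt_ = Clock::now();
        }
    }

    int failureThreshold_;
    Clock::duration coolDown_;
    State state_ = State::Closed;
    int consecutiveFailures_ = 0;
    Clock::time_point openedAt_{};
};
```

A thread-safe version would guard the state with a mutex or atomics, and real libraries often trip on a failure rate over a sliding window rather than on a consecutive-failure count.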

Solution C: Asynchronous Communication — publishing messages to a queue (RabbitMQ, Kafka) instead of direct calls.

Problem #2 — The Saga of Data Consistency

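Business operations that span multiple services (order, payment, inventory) change data in several databases. A single local ACID transaction can no longer make all of those changes succeed or fail together, and classic distributed transactions (two-phase commit) are rarely practical across independently deployed services.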

Solution A: Saga Pattern (Orchestration) — a separate coordinator manages the process and compensation on errors.
Solution B: Saga Pattern (Choreography) — services communicate through events without a central coordinator.
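Solution C: Transactional Outbox Pattern. Events are written to an outbox table in the same local transaction as the business data; a separate relay then reads the table and publishes the events to the message broker. A committed change therefore never loses its event, and a rolled-back change never emits one. Delivery is at-least-once, so consumers must tolerate duplicates.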
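A minimal sketch of orchestration (Solution A), assuming each step is a callable that reports success and comes with its own compensating action; `SagaStep` and `SagaOrchestrator` are hypothetical names. In a real system each action would be a remote call, and the orchestrator would persist its progress so it can resume compensation after a crash; that persistence is omitted here.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// One saga step: a local transaction in some service plus the compensating
// action that semantically undoes it. (Hypothetical types for illustration.)
struct SagaStep {
    std::string name;
    std::function<bool()> action;      // returns false on business failure
    std::function<void()> compensate;  // undo for an already-completed action
};

// Orchestrator: runs steps in order; on the first failure it runs the
// compensations of all completed steps in reverse order.
class SagaOrchestrator {
public:
    void addStep(SagaStep step) { steps_.push_back(std::move(step)); }

    bool run() {
        std::vector<const SagaStep*> completed;
        for (const auto& step : steps_) {
            if (!step.action()) {
                for (auto it = completed.rbegin(); it != completed.rend(); ++it) {
                    (*it)->compensate();
                }
                return false;
            }
            completed.push_back(&step);
        }
        return true;
    }

private:
    std::vector<SagaStep> steps_;
};
```

For example, if order creation and payment succeed but the stock reservation fails, the orchestrator refunds the payment and then cancels the order.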

Problem #3 — Performance Degradation

Every network call adds milliseconds. A chain of ten sequential calls can add hundreds of milliseconds.

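Solution A: gRPC with Protobuf. Protobuf's binary encoding produces smaller messages that are cheaper to serialize and parse than JSON, and gRPC reuses multiplexed HTTP/2 connections.
Solution B: CQRS. A service keeps its own denormalized read model, updated from other services' events, so queries are answered locally instead of through a chain of synchronous calls.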
Solution C: API Gateway/BFF — data aggregation at the facade instead of multiple client requests.
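Solution C can be illustrated with a hypothetical BFF endpoint that assembles one screen from three backend calls. The fetch functions are placeholders for real network clients, and calling them in parallel with `std::async` is a design choice of this sketch: it makes the facade's latency roughly that of the slowest call instead of the sum of all three.

```cpp
#include <future>
#include <string>

// Hypothetical response pieces a BFF might merge for one screen.
struct Dashboard {
    std::string profile;
    std::string orders;
    std::string recommendations;
};

// BFF-style aggregation: the client makes one request; the facade calls the
// backend services concurrently and merges the results in memory.
template <typename F1, typename F2, typename F3>
Dashboard buildDashboard(F1 fetchProfile, F2 fetchOrders, F3 fetchRecommendations) {
    auto profile = std::async(std::launch::async, fetchProfile);
    auto orders = std::async(std::launch::async, fetchOrders);
    auto recs = std::async(std::launch::async, fetchRecommendations);
    // get() waits for each result; total wait is bounded by the slowest call.
    return Dashboard{profile.get(), orders.get(), recs.get()};
}
```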

Problem #4 — Local Development Environment

Spinning up 15 services, Kafka, PostgreSQL, and Redis locally is unrealistic.

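Solution A: Consumer-Driven Contracts. Each consumer records the requests and responses it relies on as a contract (for example with Pact), and the provider's build verifies that it still honors them, so the integration is tested without running both services together.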
Solution B: Mocks/Stubs — service virtualization with tools like WireMock.
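Solution C: Telepresence/Gefyra. The service you are working on runs on your machine but is connected to a shared remote Kubernetes cluster, so it can call, and receive traffic from, the services deployed there.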
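The idea behind Solution B can be shown in code with a hand-written stub, assuming the calling code depends on an interface rather than on a concrete HTTP client. `InventoryClient`, `StubInventoryClient`, and `canPlaceOrder` are hypothetical. Tools like WireMock apply the same idea at the HTTP level, so the stub can be swapped in without touching the calling code at all.

```cpp
#include <map>
#include <string>
#include <utility>

// The calling code depends on an interface, not on a concrete HTTP client.
// (Hypothetical interface for illustration.)
class InventoryClient {
public:
    virtual ~InventoryClient() = default;
    virtual int availableStock(const std::string& sku) = 0;
};

// Stub for local development and tests: returns canned answers instead of
// calling the real inventory service over the network.
class StubInventoryClient : public InventoryClient {
public:
    explicit StubInventoryClient(std::map<std::string, int> stock)
        : stock_(std::move(stock)) {}

    int availableStock(const std::string& sku) override {
        auto it = stock_.find(sku);
        return it == stock_.end() ? 0 : it->second;  // unknown SKU: no stock
    }

private:
    std::map<std::string, int> stock_;
};

// Business logic under test; it cannot tell the stub from the real client.
bool canPlaceOrder(InventoryClient& inventory, const std::string& sku, int qty) {
    return inventory.availableStock(sku) >= qty;
}
```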

Problem #5 — The Distributed Data Dilemma

Directly querying another service's database creates the tightest possible coupling.

Solution A: API Composition — an aggregator makes multiple requests and merges data in memory.
Solution B: Event-Carried State Transfer — events contain all key data for consumers.
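Solution C: Shared Database. Listed for completeness, but it is an anti-pattern: when several services read and write the same tables, every schema change becomes a hidden dependency that can break services its owner does not even know about.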
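Solution B in a minimal sketch: a hypothetical `CustomerUpdated` event carries the fields the order service needs, and the order service applies it to its own local replica. The per-entity `version` field used to discard stale or duplicate events is an assumption of this sketch, added because messages can be redelivered or arrive out of order.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Event published by a (hypothetical) customer service. It carries the full
// state consumers need, not just an ID they would have to look up.
struct CustomerUpdated {
    std::string customerId;
    std::string name;
    std::string shippingAddress;
    std::uint64_t version;  // increases with every change to this customer
};

// The order service keeps its own read-only replica built from events, so it
// never queries the customer service's database directly.
class CustomerReplica {
public:
    void apply(const CustomerUpdated& e) {
        auto it = customers_.find(e.customerId);
        // Ignore stale or duplicate events.
        if (it != customers_.end() && it->second.version >= e.version) {
            return;
        }
        customers_[e.customerId] = e;
    }

    const CustomerUpdated* find(const std::string& id) const {
        auto it = customers_.find(id);
        return it == customers_.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<std::string, CustomerUpdated> customers_;
};
```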

Problem #6 — System Blindness: Observability

The problem is somewhere in a chain of seven services. Logs are scattered, timings are unknown.

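Solution A: Distributed Tracing. Each request receives a trace (correlation) ID that is propagated through every downstream call; services report timed spans tagged with that ID to a backend such as Jaeger or Zipkin, which reconstructs the full call chain and shows where the time went.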
Solution B: Centralized Logging — structured logs (JSON) in ELK, Loki, Splunk.
Solution C: Business Metrics — tracking metrics like "orders per minute."
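Tracing and centralized logging meet in the correlation ID. The sketch below, with hypothetical helper names, reuses the incoming ID or creates one at the edge, forwards it on outgoing calls, and writes it into every structured log line. The `X-Correlation-ID` header name is a common convention rather than a standard (W3C Trace Context uses `traceparent`), and headers are modeled as a plain map instead of a real HTTP library.

```cpp
#include <map>
#include <ostream>
#include <random>
#include <sstream>
#include <string>

using Headers = std::map<std::string, std::string>;

// Conventional (not standardized) header name; an assumption of this sketch.
const std::string kCorrelationHeader = "X-Correlation-ID";

// Random hex ID. The static generator is not thread-safe; fine for a sketch.
std::string newCorrelationId() {
    static std::mt19937_64 rng{std::random_device{}()};
    std::ostringstream out;
    out << std::hex << rng();
    return out.str();
}

// Reuse the incoming ID, or start a new trace at the edge of the system.
std::string correlationIdFrom(const Headers& incoming) {
    auto it = incoming.find(kCorrelationHeader);
    return it != incoming.end() ? it->second : newCorrelationId();
}

// Every outgoing call carries the same ID so the hops can be stitched together.
Headers outgoingHeaders(const std::string& correlationId) {
    return Headers{{kCorrelationHeader, correlationId}};
}

// Structured (JSON) log line; values are assumed not to need escaping.
void logJson(std::ostream& os, const std::string& service,
             const std::string& correlationId, const std::string& message) {
    os << R"({"service":")" << service << R"(","correlation_id":")"
       << correlationId << R"(","message":")" << message << "\"}\n";
}
```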

Problem #7 — Operational Chaos

100 services: how does service A find the IP address of service B?

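Solution A: Service Discovery. Instances register themselves in a registry (Consul, etcd, Eureka) and keep their entry alive with health checks or heartbeats; callers ask the registry for a healthy instance instead of hard-coding addresses.
Solution B: Centralized Configuration Server. Configuration for all services lives in one place (Spring Cloud Config, Consul KV) rather than being baked into each deployment.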
Solution C: Service Mesh — Istio, Linkerd manage discovery, load balancing, and encryption.
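To make service discovery concrete, here is an in-memory registry sketch. It is not the API of Consul, etcd, or Eureka; the TTL-based expiry and round-robin selection are assumptions that mirror behavior those systems commonly provide. Time is passed in explicitly (defaulting to the current time) so expiry is easy to test.

```cpp
#include <chrono>
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical in-memory registry: instances register with a TTL and renew it
// via heartbeats; lookups skip instances whose heartbeat has expired.
class ServiceRegistry {
public:
    using Clock = std::chrono::steady_clock;

    explicit ServiceRegistry(Clock::duration ttl) : ttl_(ttl) {}

    // Register or renew an instance address such as "10.0.0.5:8080".
    void heartbeat(const std::string& service, const std::string& address,
                   Clock::time_point now = Clock::now()) {
        auto& instances = services_[service];
        for (auto& inst : instances) {
            if (inst.address == address) { inst.lastSeen = now; return; }
        }
        instances.push_back({address, now});
    }

    // Round-robin over live instances; empty if none are healthy.
    std::optional<std::string> resolve(const std::string& service,
                                       Clock::time_point now = Clock::now()) {
        auto it = services_.find(service);
        if (it == services_.end()) return std::nullopt;
        std::vector<std::string> live;
        for (const auto& inst : it->second) {
            if (now - inst.lastSeen <= ttl_) live.push_back(inst.address);
        }
        if (live.empty()) return std::nullopt;
        return live[next_[service]++ % live.size()];
    }

private:
    struct Instance {
        std::string address;
        Clock::time_point lastSeen;
    };
    Clock::duration ttl_;
    std::unordered_map<std::string, std::vector<Instance>> services_;
    std::unordered_map<std::string, std::size_t> next_;
};
```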

Problem #8 — API Evolution

A service was updated and its API changed. Every service that depended on it broke.

Solution A: Versioning — new versions via /api/v2 or headers.
Solution B: Tolerant Reader — the client ignores unknown fields in the response.
Solution C: Anti-Corruption Layer — an adapter that translates between old and new formats.
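A minimal Tolerant Reader, assuming the response has already been parsed into a flat field map; a real client would use a JSON library. The `Order` type, its fields, and the `"USD"` default are hypothetical.

```cpp
#include <map>
#include <optional>
#include <string>

// Stand-in for a parsed JSON object (assumption: keeps the sketch
// self-contained without a JSON library).
using Fields = std::map<std::string, std::string>;

struct Order {
    std::string id;
    std::string status;
    std::string currency;  // optional in the payload; defaulted below
};

// Tolerant Reader: read only the fields this client needs, ignore unknown
// fields the provider adds later, and default optional ones. Fails only when
// a field the client truly depends on is missing.
std::optional<Order> readOrder(const Fields& payload) {
    auto get = [&](const std::string& key) -> std::optional<std::string> {
        auto it = payload.find(key);
        if (it == payload.end()) return std::nullopt;
        return it->second;
    };
    auto id = get("id");
    auto status = get("status");
    if (!id || !status) return std::nullopt;
    return Order{*id, *status, get("currency").value_or("USD")};  // hypothetical default
}
```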

Problem #9 — Security: Trust and Authorization

Anyone on the network can attempt to call your service.

Solution A: Service Mesh for mTLS — mutual TLS with certificates for each service.
Solution B: API Gateway — authentication check via JWT before reaching the internal network.
Solution C: OAuth2 Client Credentials — tokens with scopes between services.
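On the receiving side of the client-credentials flow, the service checks that the caller's token carries the scope the endpoint requires. This sketch assumes the token's signature, issuer, audience, and expiry were already validated by a JWT library, and shows only the scope check; the scope names are hypothetical.

```cpp
#include <set>
#include <sstream>
#include <string>

// OAuth2 scopes are a space-delimited list (RFC 6749, section 3.3).
std::set<std::string> parseScopes(const std::string& scopeClaim) {
    std::set<std::string> scopes;
    std::istringstream in(scopeClaim);
    for (std::string s; in >> s;) scopes.insert(s);
    return scopes;
}

// Authorization check in the receiving service. Assumes the access token was
// already verified elsewhere; `scopeClaim` is its "scope" value, for example
// "orders:read payments:write".
bool isAuthorized(const std::string& scopeClaim, const std::string& requiredScope) {
    return parseScopes(scopeClaim).count(requiredScope) > 0;
}
```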

Practical Recommendations

Correlation ID is not optional; it's the law. Without an end-to-end trace ID, you're blind.
Idempotency. Clients, proxies, and message brokers retry requests, so the same request can arrive more than once. Make every handler safe to repeat.
Data ownership is sacred. Each piece of data has one owner service.
Monitor p99 latency, not the average. Averages hide problems.
Chaos Engineering. Break the system yourself in a test environment.
Start simple. REST, Circuit Breaker, Service Discovery. Add complexity as needed.
Think about developers. A simple way to run locally = happy developers.
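The idempotency recommendation can be sketched as a handler keyed by a client-supplied idempotency key (a hypothetical convention here): the first request does the work, and retries replay the stored result. The in-memory map and the single mutex are simplifications; a real service would keep keys in durable shared storage with an expiry.

```cpp
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical idempotent payment handler. The same key is sent on every
// retry of one logical request; only the first call performs the charge.
class IdempotentPayments {
public:
    std::string charge(const std::string& idempotencyKey, int amountCents) {
        // One lock for the whole call keeps the sketch simple (and serial).
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = results_.find(idempotencyKey);
        if (it != results_.end()) {
            return it->second;  // retry: replay the original result
        }
        ++chargesPerformed_;
        std::string result = "charged " + std::to_string(amountCents);
        results_.emplace(idempotencyKey, result);
        return result;
    }

    int chargesPerformed() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return chargesPerformed_;
    }

private:
    mutable std::mutex mutex_;
    std::unordered_map<std::string, std::string> results_;
    int chargesPerformed_ = 0;
};
```

A retried `charge("key-1", 500)` returns the original result without charging twice.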
