Kafka Schema Registry: schema governance in Kafka


In a production Kafka environment, data quality and backward compatibility are not optional; they are critical requirements. Simply serializing data as JSON or raw bytes is a fast track to chaos, especially in microservices architectures where producers and consumers evolve independently. The Confluent Schema Registry provides the solution: a centralized service that stores and manages schemas for Kafka topics, enforcing configurable compatibility rules and enabling safe schema evolution.

Setting Up Schema Registry with Docker Compose

For local development and testing, integrating the Schema Registry into your existing Docker-based Kafka setup is the cleanest approach. Assuming you already have Kafka and ZooKeeper running, you just need to add the registry service to your docker-compose.yml file.

Here is the essential addition to your existing services. This configuration binds the Schema Registry to port 8081 and connects it directly to your Kafka broker:

  schema-registry:
    image: confluentinc/cp-schema-registry:7.5.0
    container_name: schema-registry
    ports:
      - "8081:8081"
    depends_on:
      - kafka
    environment:
      SCHEMA_REGISTRY_HOST_NAME: schema-registry
      SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: 'kafka-broker:29092' # Internal Kafka connection
      SCHEMA_REGISTRY_LISTENERS: http://0.0.0.0:8081

Note that the Kafka broker must advertise a listener on the internal Docker network so the Schema Registry container can reach it, i.e., KAFKA_ADVERTISED_LISTENERS must include an entry such as PLAINTEXT://kafka-broker:29092.
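
For reference, the listener-related settings on the broker service might look like the following. This is a partial sketch only; the service name kafka, the hostname kafka-broker, and the port numbers are assumptions matching the excerpt above and may differ in your setup:

  kafka:
    image: confluentinc/cp-kafka:7.5.0
    hostname: kafka-broker
    environment:
      # Internal listener for other containers (e.g., Schema Registry) plus a host-facing listener
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka-broker:29092,PLAINTEXT_HOST://localhost:9092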

To launch the integrated environment, execute: docker-compose up -d. Once running, the registry is accessible to your clients at http://localhost:8081.
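
A quick way to confirm the registry is healthy is to list its registered subjects; on a fresh installation the list is simply empty:

# List all subjects currently known to the Schema Registry
curl http://localhost:8081/subjects
# Expected output on a fresh install: []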

Avro, Protobuf, and JSON Schema

The Schema Registry supports multiple serialization formats, but Apache Avro is the de facto standard in the Kafka ecosystem. Avro excels because its schemas are compact, self-describing, and come with well-defined schema resolution rules, which the Registry uses to enforce compatibility and allow clients to evolve their code without breaking existing pipelines. Protobuf (Protocol Buffers) and JSON Schema are also supported, offering flexibility depending on your application needs.

Using Avro, your Kafka clients (Producers and Consumers) no longer ship the full schema with every message; they send a compact binary payload prefixed with a small schema ID. The Producer registers the schema with the Registry (or looks up its existing ID) and embeds that ID in each message, and the Consumer uses the same ID to fetch the schema and deserialize the message correctly. This separation dramatically reduces message size and coupling.
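
To make this concrete, here is how you could register a first Avro schema by hand over the Registry's REST API. The subject name user-profile-value and the record fields are purely illustrative:

# Register version 1 of an Avro schema under the subject 'user-profile-value'
curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
      --data '{"schema": "{\"type\":\"record\",\"name\":\"UserProfile\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"},{\"name\":\"email\",\"type\":[\"null\",\"string\"],\"default\":null}]}"}' \
      http://localhost:8081/subjects/user-profile-value/versions
# The response returns the globally unique schema ID, e.g. {"id":1}

In normal operation the Confluent serializer performs this registration automatically on first use (controlled by the auto.register.schemas client setting); the manual call simply shows what happens under the hood.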

Managing Schema Evolution and Compatibility

The core value of the Schema Registry is its ability to govern Schema Evolution. As your application changes, you will inevitably need to modify schemas (e.g., adding a new field, changing a field type). The Registry checks every new schema version against existing ones based on a configurable Compatibility Level.

The most common compatibility modes are:

  • BACKWARD: Consumers using the new schema can read data written with the previous schema. This is the safest and most common setting (e.g., adding an optional field with a default value).
  • FORWARD: Consumers using the previous schema can read data written with the new schema (e.g., removing an optional field).
  • FULL: The new schema is both backward and forward compatible.
  • NONE: Disables all checks (use with extreme caution, mainly for development).
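
The compatibility level can be set globally (as a default for all subjects) or per subject. You can inspect the current global default with a GET against the config resource:

# Show the Registry-wide default compatibility level
curl http://localhost:8081/config
# Typical output: {"compatibilityLevel":"BACKWARD"}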

Practical Example (Setting Compatibility):

You can set the compatibility level for a specific subject (here, the value schema of the user-profile topic) using a simple REST call:

# Set backward compatibility for the 'user-profile-value' subject
curl -X PUT -H "Content-Type: application/vnd.schemaregistry.v1+json" \
      --data '{"compatibility": "BACKWARD"}' \
      http://localhost:8081/config/user-profile-value

When a Producer attempts to register a new schema version, the Registry verifies it against the configured compatibility rule. If the check fails (e.g., adding a required field without a default under BACKWARD compatibility), the registration is rejected, preventing the deployment of breaking changes.
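
Before rolling out a change, you can also ask the Registry whether a candidate schema would be accepted, without actually registering it (again using the hypothetical user-profile-value subject from above):

# Test a new schema version against the latest registered one under the current compatibility rule
curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
      --data '{"schema": "{\"type\":\"record\",\"name\":\"UserProfile\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"}]}"}' \
      http://localhost:8081/compatibility/subjects/user-profile-value/versions/latest
# The response indicates the verdict, e.g. {"is_compatible":true}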

Client Integration with Schema Registry

Integrating clients typically involves using specific serializer/deserializer classes provided by Confluent (e.g., io.confluent.kafka.serializers.KafkaAvroSerializer).

Producer Configuration:

key.serializer=org.apache.kafka.common.serialization.StringSerializer
value.serializer=io.confluent.kafka.serializers.KafkaAvroSerializer
schema.registry.url=http://localhost:8081
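
The Consumer side mirrors this with the Avro deserializer; the specific.avro.reader flag makes it return your generated Avro classes instead of generic records:

Consumer Configuration:

key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
value.deserializer=io.confluent.kafka.serializers.KafkaAvroDeserializer
schema.registry.url=http://localhost:8081
specific.avro.reader=true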

Common Schema Registry Errors and Troubleshooting

1. SchemaRegistryException: Connection refused

  • Problem: The client cannot reach the Schema Registry endpoint.
  • Troubleshooting: Verify that the schema.registry.url in your client configuration is correct (http://localhost:8081 for the Docker setup) and that the Docker container is running and exposed correctly.
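
For the Docker setup above, a few quick checks usually pinpoint the cause (the service name schema-registry matches the docker-compose excerpt earlier):

# Is the container running and is port 8081 published?
docker-compose ps schema-registry
# Did it fail at startup, e.g. because it could not reach the Kafka broker?
docker-compose logs schema-registry
# Can the registry be reached from the host at all?
curl http://localhost:8081/subjects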

2. SerializationException: Could not find class ...

  • Problem: Occurs when the Consumer tries to deserialize a message but the local client application doesn’t have the corresponding Avro Java class available in its classpath.
  • Troubleshooting: Ensure your build process correctly generates the Avro classes from the .avsc files used by the Producer.
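
Most projects wire this generation into the build via the Avro Maven or Gradle plugin. If you need to generate the classes by hand, the Avro tools jar (available from Maven Central as org.apache.avro:avro-tools) can compile a schema file into Java sources; the file and output paths below are just placeholders:

# Generate Java classes from an Avro schema definition
java -jar avro-tools-1.11.3.jar compile schema src/main/avro/user-profile.avsc src/main/java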

3. IncompatibleSchemaException

  • Problem: A Producer attempted to register a new schema version that violated the topic’s configured compatibility setting (e.g., adding a required field when the mode is BACKWARD).
  • Troubleshooting: Check the difference between the old and new schema. Adjust the new schema to comply (e.g., make the new field optional by providing a default value), or if necessary, manually change the compatibility level to NONE temporarily via the REST API before registration, and then reset it.
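
If you genuinely need to force the change through, the temporary toggle looks like this; remember to restore the original level immediately afterwards (subject name as in the earlier examples):

# Temporarily disable compatibility checks for the subject
curl -X PUT -H "Content-Type: application/vnd.schemaregistry.v1+json" \
      --data '{"compatibility": "NONE"}' \
      http://localhost:8081/config/user-profile-value

# ... register the new schema version, then restore the previous level
curl -X PUT -H "Content-Type: application/vnd.schemaregistry.v1+json" \
      --data '{"compatibility": "BACKWARD"}' \
      http://localhost:8081/config/user-profile-value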

4. KafkaStoreException (Internal Schema Registry Error)

  • Problem: The registry uses an internal, compacted topic (_schemas) in Kafka to store the schema data itself. If this topic is unhealthy, the Registry will fail.
  • Troubleshooting: Use the Kafka CLI tools (like kafka-topics.sh or kafka-console-consumer.sh) to check the replication factor, partition health, and message content of the _schemas topic. Ensure Kafka is fully functional.
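
With the Docker setup, the _schemas topic can be inspected directly from the broker container; the service name kafka and the internal listener address are the assumptions used in the docker-compose excerpt above:

# Describe the internal _schemas topic: partition count, replication factor, in-sync replicas
docker-compose exec kafka kafka-topics --describe --topic _schemas --bootstrap-server kafka-broker:29092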

By centralizing schema management, the Confluent Schema Registry transforms Kafka from a basic messaging bus into a data contract enforcer, guaranteeing consistency and enabling safe, independent evolution of your microservices.

That’s all.
Try it at home!
