Why you should use a schema registry

We strongly recommend against using JSON-encoded messages for anything beyond minor testing. JSON-encoded messages are easy to use and human-readable, but they pose several challenges, especially in big data pipelines like those typically built on Kafka. Here are some issues you might encounter:

  1. Size: JSON messages are verbose and can be quite large. Every JSON message includes both the field names and the values, leading to a lot of redundancy and bigger payloads. This can slow down network transmission and increase storage needs.

  2. Efficiency: JSON isn't the most efficient format for serialization and deserialization. It requires more CPU resources to process, which can be a bottleneck in high-throughput data processing systems.

  3. Schema Evolution: JSON doesn't have native support for schema evolution. If the structure of your data changes over time, old consumers may break when they encounter a new field or a missing expected field.

  4. Lack of Metadata: JSON doesn't include metadata, such as data types, in the encoded data. This can lead to misinterpretation of the data if the producer and consumer have different assumptions about the data types.
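To make the size and metadata points concrete, here is a minimal sketch that measures how many bytes of a small JSON message are spent on field names, and shows that the payload alone does not say which numeric type was intended. The `reading` record and its field names are hypothetical and used only for illustration.

```python
import json

# Hypothetical sensor reading, used only for illustration.
reading = {"sensor_id": "s-17", "temperature": 21.5, "timestamp": 1700000000}

payload = json.dumps(reading).encode("utf-8")

# Bytes spent on the quoted field names, which are repeated in every message.
name_bytes = sum(len(json.dumps(key).encode("utf-8")) for key in reading)

print(f"payload: {len(payload)} bytes, field names: {name_bytes} bytes")

# The payload carries no type information: if a producer writes a whole-number
# temperature without a decimal point, the consumer decodes an int, not a float.
decoded = json.loads('{"temperature": 21}')
print(type(decoded["temperature"]).__name__)
```

In this small record, roughly half of the payload consists of field names, and that overhead is paid again for every message on the topic.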

To address these problems, many Kafka-based systems use Avro encoding, along with a Schema Registry. Here's why:

  1. Compact Data: Avro-encoded data doesn't include the field names, just the values. This results in much smaller messages compared to JSON, improving network efficiency and reducing storage requirements.

  2. Efficiency: Avro serialization and deserialization is typically faster and more CPU-efficient than JSON.

  3. Schema Evolution Support: Avro has built-in support for schema evolution. You can add fields, remove fields, or change fields, and Avro will handle it gracefully as long as you follow certain rules.

  4. Schema Enforcement: Using Avro with a Schema Registry ensures that the producer and consumer agree on the schema. This eliminates many potential misunderstandings about the structure and types of the data.

  5. Integrated Metadata: An Avro schema declares the data type of every field, which eliminates some of the ambiguity present with JSON data.
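The compactness claim can be illustrated with a hand-written sketch of Avro's binary encoding for a flat record. This is not how you would serialize in practice (an Avro library does that for you). It only shows why the messages are small: longs are written as zigzag varints, strings as a length followed by UTF-8 bytes, doubles as 8 little-endian bytes, and field names never appear in the message. The record, the `SCHEMA_FIELDS` list, and the function names are assumptions made for this example.

```python
import json
import struct


def encode_long(value: int) -> bytes:
    """Avro int/long: zigzag encoding followed by a base-128 varint."""
    zigzag = (value << 1) ^ (value >> 63)
    out = bytearray()
    while zigzag > 0x7F:
        out.append((zigzag & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        zigzag >>= 7
    out.append(zigzag)
    return bytes(out)


def encode_string(value: str) -> bytes:
    """Avro string: byte length as a long, then the UTF-8 bytes."""
    data = value.encode("utf-8")
    return encode_long(len(data)) + data


def encode_double(value: float) -> bytes:
    """Avro double: 8 bytes, IEEE 754, little-endian."""
    return struct.pack("<d", value)


# Hypothetical schema: field order and types live here, not in the message.
SCHEMA_FIELDS = [
    ("sensor_id", encode_string),
    ("temperature", encode_double),
    ("timestamp", encode_long),
]


def encode_record(record: dict) -> bytes:
    """Avro record: field values concatenated in schema order, without names."""
    return b"".join(encode(record[name]) for name, encode in SCHEMA_FIELDS)


reading = {"sensor_id": "s-17", "temperature": 21.5, "timestamp": 1700000000}

avro_bytes = encode_record(reading)
json_bytes = json.dumps(reading).encode("utf-8")

print(f"Avro body: {len(avro_bytes)} bytes, JSON: {len(json_bytes)} bytes")
```

Serializers that work with a schema registry typically prepend a small schema identifier to each message so that consumers can look up the writer's schema. The exact header layout depends on the serializer and is not shown here.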

By using Avro with a Schema Registry, you can overcome many of the challenges associated with JSON and build more efficient, reliable, and scalable Kafka-based systems.

Our managed Event Streaming service comes with the open source schema registry Apicurio. Apicurio is a drop-in replacement for the Confluent Schema Registry if used correctly, so remember to use the correct endpoint if your application requires the Confluent Schema Registry. Please refer to the REST documentation of your Apicurio registry service. You can find the different compatibility endpoints at <schema.registry.url>/apis.

If you use Firefox to browse the REST API, first convert your client certificate to PKCS#12 format using openssl with the command

openssl pkcs12 -export -out cert.p12 -in client.cert -inkey client.key

Afterwards, import this certificate under Firefox > Settings > Certificates to access the REST API of your schema registry service.
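If you prefer to query the endpoint overview from a script instead of a browser, the following sketch fetches <schema.registry.url>/apis with the Python standard library. It assumes that the service authenticates clients with the same client.cert and client.key files used above, that both are PEM-encoded, and that the server certificate is trusted by your system's CA store. The function names are hypothetical, and the response is returned as plain text because its format is not specified here.

```python
import ssl
import urllib.request


def apis_url(registry_url: str) -> str:
    """Return the endpoint overview URL for a registry base URL."""
    return registry_url.rstrip("/") + "/apis"


def fetch_api_overview(registry_url: str, cert_file: str, key_file: str) -> str:
    """Fetch <registry_url>/apis, authenticating with a client certificate."""
    context = ssl.create_default_context()
    # Assumption: cert_file and key_file are PEM files such as client.cert
    # and client.key; no PKCS#12 conversion is needed for this code path.
    context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    with urllib.request.urlopen(apis_url(registry_url), context=context) as response:
        return response.read().decode("utf-8", errors="replace")
```

A call would look like fetch_api_overview("<schema.registry.url>", "client.cert", "client.key"), with the placeholder replaced by the URL of your registry service.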