Why Kafka Consumers Commit Offsets
Kafka consumers commit offsets to track their progress in a topic. This "bookmark" tells Kafka which messages have been successfully processed, ensuring that if a consumer restarts or a rebalance occurs, it knows exactly where to pick back up.
Internally, these offsets are stored in a special, compacted internal topic named __consumer_offsets.
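To make "compacted" concrete: log compaction keeps only the most recent record for each key, so the offsets topic effectively holds one current position per consumer group, topic, and partition. The sketch below is a plain-Java model of that idea, not Kafka's storage code. The class name CompactedOffsetsModel and the string key format are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CompactedOffsetsModel {
    // Latest committed offset per key. A newer commit for the same key replaces
    // the older one, which is the effect compaction has on the log over time.
    private final Map<String, Long> latest = new LinkedHashMap<>();

    // Hypothetical key format: group, topic, and partition joined with '/'.
    public static String key(String group, String topic, int partition) {
        return group + "/" + topic + "/" + partition;
    }

    public void commit(String group, String topic, int partition, long offset) {
        latest.put(key(group, topic, partition), offset);
    }

    // Returns the committed offset, or -1 if the group never committed for this partition.
    public long committed(String group, String topic, int partition) {
        return latest.getOrDefault(key(group, topic, partition), -1L);
    }

    public int size() {
        return latest.size();
    }

    public static void main(String[] args) {
        CompactedOffsetsModel store = new CompactedOffsetsModel();
        store.commit("billing", "orders", 0, 100);
        store.commit("billing", "orders", 0, 250); // replaces the earlier entry
        store.commit("audit", "orders", 0, 40);    // different group, separate entry
        System.out.println(store.committed("billing", "orders", 0)); // 250
        System.out.println(store.committed("audit", "orders", 0));   // 40
    }
}
```

Because the group name is part of the key, two groups reading the same partition never overwrite each other's position.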
1. Automatic Commit (Default)
By default, the consumer is configured to commit offsets automatically.
How it works: When enable.auto.commit is set to true, the consumer periodically commits its position up to the last record returned by poll(). Each poll() call returns a batch of records rather than a single message (up to 500 by default, controlled by max.poll.records), and the consumer tracks progress by the highest offset in that batch.
Frequency: Controlled by auto.commit.interval.ms (default 5000 ms, i.e. 5 seconds).
Risk: Auto-commit is the easiest option, but the commit schedule is independent of your processing. If an offset is committed before its records are fully processed (for example, when records are handed off to another thread) and the consumer then crashes, those records are skipped on restart, which amounts to data loss. If the consumer crashes after processing but before the next auto-commit, the records are delivered again on restart, which causes duplicate processing.
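The settings described above can be written out as consumer properties. This is a minimal sketch using only java.util.Properties. The broker address localhost:9092, the group name demo-group, and the choice of string deserializers are assumptions for illustration, not values from this article.

```java
import java.util.Properties;

public class AutoCommitConfig {
    // Builds consumer properties with auto-commit switched on explicitly.
    public static Properties autoCommitProps() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.setProperty("group.id", "demo-group");              // hypothetical group name
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("enable.auto.commit", "true");      // the default, stated for clarity
        props.setProperty("auto.commit.interval.ms", "5000"); // commit roughly every 5 seconds
        return props;
    }

    public static void main(String[] args) {
        Properties props = autoCommitProps();
        System.out.println(props.getProperty("enable.auto.commit"));      // true
        System.out.println(props.getProperty("auto.commit.interval.ms")); // 5000
    }
}
```

With the Kafka client library on the classpath, a Properties object like this is what you would pass to the KafkaConsumer constructor.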
2. Manual Commit
To gain more control over when a message is considered "done," you can disable auto-commit (enable.auto.commit=false) and commit manually through the Kafka Consumer API, using commitSync() or commitAsync() after your processing has finished.
For a fuller walkthrough, see the article on manual commit: https://syedblog61220.blogspot.com/2026/05/manual-offset-commit-by-consumer.html
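The commit-after-processing pattern can be illustrated without a broker. The following is a plain-Java model, not the Kafka client: the class ManualCommitModel, its in-memory list standing in for a single partition, and the commitAfter flag that simulates a crash are all hypothetical. With the real Java client, the commit step would be a call such as commitSync() on the consumer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ManualCommitModel {
    private final List<String> log;   // stands in for one partition
    private long committed = 0;       // next offset to read after a restart
    public final List<String> processed = new ArrayList<>();

    public ManualCommitModel(List<String> log) {
        this.log = log;
    }

    // Reads up to maxRecords starting at the committed position and processes them.
    // Commits only if commitAfter is true; false models a crash before the commit.
    public void pollAndProcess(int maxRecords, boolean commitAfter) {
        int start = (int) committed;
        int end = Math.min(log.size(), start + maxRecords);
        for (int i = start; i < end; i++) {
            processed.add(log.get(i)); // "processing" is just recording the value
        }
        if (commitAfter) {
            committed = end; // offset of the next record to read
        }
    }

    public long committed() {
        return committed;
    }

    public static void main(String[] args) {
        ManualCommitModel consumer = new ManualCommitModel(Arrays.asList("a", "b", "c", "d"));
        consumer.pollAndProcess(2, true);   // a, b processed and committed
        consumer.pollAndProcess(2, false);  // c, d processed, crash before commit
        consumer.pollAndProcess(2, true);   // after restart, c and d are read again
        System.out.println(consumer.processed);   // [a, b, c, d, c, d]
        System.out.println(consumer.committed()); // 4
    }
}
```

Committing only after processing means a crash can cause records to be seen twice but never skipped, so the processing step should tolerate duplicates.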
3. Key Concepts to Remember
The "Next" Offset: When you commit, you are technically committing the offset of the next message you want to read, not the one you just finished. For example, if you just processed offset 50, you commit offset 51.
Consumer Groups: Offsets are tracked per consumer group. This allows different applications to read from the same topic at their own pace.
Rebalancing: When a consumer joins or leaves a group, Kafka triggers a rebalance and partitions are reassigned among the remaining members. If the previous owner of a partition had not committed its latest offsets, the consumer that takes over resumes from the last committed offset and may re-process messages.
Committing Specific Offsets: For fine-grained control within a batch, you can commit explicit offsets for individual partitions by passing a map of TopicPartition to OffsetAndMetadata to the commit call.
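The "next offset" rule and per-partition commits combine naturally: for each partition, the value to commit is the last processed offset plus one. Below is a minimal sketch of that arithmetic in plain Java. The class OffsetsToCommit and the "topic-partition" string keys are hypothetical stand-ins. With the real client the map would be keyed by TopicPartition with OffsetAndMetadata values and passed to commitSync.

```java
import java.util.Map;
import java.util.TreeMap;

public class OffsetsToCommit {
    // Input: last fully processed offset per partition.
    // Output: the offset to commit for each partition, i.e. last processed + 1.
    public static Map<String, Long> nextOffsets(Map<String, Long> lastProcessed) {
        Map<String, Long> toCommit = new TreeMap<>();
        for (Map.Entry<String, Long> entry : lastProcessed.entrySet()) {
            toCommit.put(entry.getKey(), entry.getValue() + 1);
        }
        return toCommit;
    }

    public static void main(String[] args) {
        Map<String, Long> lastProcessed = new TreeMap<>();
        lastProcessed.put("orders-0", 50L); // finished offset 50 on partition 0
        lastProcessed.put("orders-1", 7L);  // finished offset 7 on partition 1
        System.out.println(nextOffsets(lastProcessed)); // {orders-0=51, orders-1=8}
    }
}
```

Committing the processed offset itself, rather than the next one, would cause the last record to be read again after every restart.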
For detailed implementation examples in different languages, you can check the