Kafka Series 3 -- Kafka Manager(Monitoring)
Reference:[https://github.com/yahoo/kafka-manager]
What is Kafka Manager?
Kafka Manager is A tool for managing Apache Kafka.
It supports the following :
Manage multiple clusters
Easy inspection of cluster state (topics, consumers, offsets, brokers, replica distribution, partition distribution)
Run preferred replica election
Generate partition assignments with option to select brokers to use
Run reassignment of partition (based on generated assignments)
Create a topic with optional topic configs (0.8.1.1 has different configs than 0.8.2+)
Delete topic (only supported on 0.8.2+ and remember set delete.topic.enable=true in broker config)
Topic list now indicates topics marked for deletion (only supported on 0.8.2+)
Batch generate partition assignments for multiple topics with option to select brokers to use
Batch run reassignment of partition for multiple topics
Add partitions to existing topic
Update config for existing topic
Optionally enable JMX polling for broker level and topic level metrics.
Optionally filter out consumers that do not have ids/ owners/ & offsets/ directories in zookeeper.
How to install
I am using the machine which installed Red Hat, require java 1.8 for kafka manager
sudo yum update
wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/8u111-b14/jdk-8u111-linux-x64.rpm"
sudo rpm -ivh jdk-8u161-linux-x64.rpm
java -version
(download kafka-manager.zip)
unzip kafka-manager-master.zip
cd kafka-manager-master/conf
vi application.conf (check https://github.com/yahoo/kafka-manager for Configuration)
./sbt clean dist
cd kafka-manager-master/target/universal/kafka-manager-1.3.3.17/bin
./kafka-manager &
Usage
1). Export JMX port from Kafka to get Metrics information.
Description Mbean name Normal value
Message in rate kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec
Byte in rate kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec
Request rate kafka.network:type=RequestMetrics,name=RequestsPerSec,request={Produce|FetchConsumer|FetchFollower}
Byte out rate kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec
Log flush rate and time kafka.log:type=LogFlushStats,name=LogFlushRateAndTimeMs
# of under replicated partitions (|ISR| < |all replicas|) kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions 0
Is controller active on broker kafka.controller:type=KafkaController,name=ActiveControllerCount only one broker in the cluster should have 1
Leader election rate kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs non-zero when there are broker failures
Unclean leader election rate kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec 0
Partition counts kafka.server:type=ReplicaManager,name=PartitionCount mostly even across brokers
Leader replica counts kafka.server:type=ReplicaManager,name=LeaderCount mostly even across brokers
ISR shrink rate kafka.server:type=ReplicaManager,name=IsrShrinksPerSec If a broker goes down, ISR for some of the partitions will shrink. When that broker is up again, ISR will be expanded once the replicas are fully caught up. Other than that, the expected value for both ISR shrink rate and expansion rate is 0.
ISR expansion rate kafka.server:type=ReplicaManager,name=IsrExpandsPerSec See above
Max lag in messages btw follower and leader replicas kafka.server:type=ReplicaFetcherManager,name=MaxLag,clientId=Replica < replica.lag.max.messages
Lag in messages per follower replica kafka.server:type=FetcherLagMetrics,name=ConsumerLag,clientId=([-.\w]+),topic=([-.\w]+),partition=([0-9]+) < replica.lag.max.messages
Requests waiting in the producer purgatory kafka.server:type=ProducerRequestPurgatory,name=PurgatorySize non-zero if ack=-1 is used
Requests waiting in the fetch purgatory kafka.server:type=FetchRequestPurgatory,name=PurgatorySize size depends on fetch.wait.max.ms in the consumer
Request total time kafka.network:type=RequestMetrics,name=TotalTimeMs,request={Produce|FetchConsumer|FetchFollower} broken into queue, local, remote and response send time
Time the request waiting in the request queue kafka.network:type=RequestMetrics,name=QueueTimeMs,request={Produce|FetchConsumer|FetchFollower}
Time the request being processed at the leader kafka.network:type=RequestMetrics,name=LocalTimeMs,request={Produce|FetchConsumer|FetchFollower}
Time the request waits for the follower kafka.network:type=RequestMetrics,name=RemoteTimeMs,request={Produce|FetchConsumer|FetchFollower} non-zero for produce requests when ack=-1
Time to send the response kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request={Produce|FetchConsumer|FetchFollower}
Number of messages the consumer lags behind the producer by kafka.consumer:type=ConsumerFetcherManager,name=MaxLag,clientId=([-.\w]+)
The average fraction of time the network processors are idle kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent between 0 and 1, ideally > 0.3
The average fraction of time the request handler threads are idle kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent between 0 and 1, ideally > 0.3