How to run Redis Sentinel

This post is a walk-through of using Redis Sentinel, showing some of its internals. First, start a Redis master:

$ redis-server --port 6379

Now start your first Redis Sentinel. We’re going to start three of them in total. Each Redis Sentinel requires a separate config file. Create the config file for the first one, called sentinel1.conf:

$ cat << EOF > sentinel1.conf
port 5000
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
sentinel parallel-syncs mymaster 1
EOF
$

The main thing to note is that the config file points to the address of the Redis master, 127.0.0.1:6379. Now start the first Sentinel from the first config file:

$ redis-server sentinel1.conf --sentinel
...
65905:X 08 Jan 19:45:59.536 # Sentinel ID is cc83a347c59af48d0604d328604cd5cc21a82800
65905:X 08 Jan 19:45:59.536 # +monitor master mymaster 127.0.0.1 6379 quorum 2

Notice that you started Redis Sentinel using the normal redis-server command. Redis Sentinel is bundled with Redis, and you don’t need to install anything else.

After starting your first Sentinel process, check sentinel1.conf again. Perhaps unexpectedly, it has changed:

$ cat sentinel1.conf
port 5000
sentinel myid cc83a347c59af48d0604d328604cd5cc21a82800
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
# Generated by CONFIG REWRITE
dir "/Users/jim/dev/tmp/sentinel"
sentinel config-epoch mymaster 0
sentinel leader-epoch mymaster 0
sentinel current-epoch 0
$

The “config file” is better thought of as a database for that Sentinel; the file you supply is just the initial state of that database. Notice that Sentinel generates a myid for itself on startup, then stores it in the database. Most importantly, the configured address of the Redis master can change: Redis Sentinel permanently records failovers in the config file itself.
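
Because the file doubles as Sentinel's database, you can read the current master address straight out of it. Here is a minimal sketch, assuming the `sentinel monitor <name> <host> <port> <quorum>` line shape shown above; the names `MonitoredMaster` and `find_monitored_master` are made up for this example and are not part of Redis:

```python
from typing import NamedTuple, Optional


class MonitoredMaster(NamedTuple):
    name: str
    host: str
    port: int
    quorum: int


def find_monitored_master(conf_text: str) -> Optional[MonitoredMaster]:
    """Return the first `sentinel monitor` entry in a Sentinel config, if any."""
    for line in conf_text.splitlines():
        parts = line.split()
        # Assumed shape: sentinel monitor <name> <host> <port> <quorum>
        if len(parts) == 6 and parts[:2] == ["sentinel", "monitor"]:
            return MonitoredMaster(parts[2], parts[3], int(parts[4]), int(parts[5]))
    return None


initial = "port 5000\nsentinel monitor mymaster 127.0.0.1 6379 2\n"
print(find_monitored_master(initial))
```

Running this against the file before and after a failover would show the host and port changing, since Sentinel rewrites that line.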

Now that you’ve started Redis Sentinel, you can connect to it as a client. You can do this with the normal redis-cli command, because Redis Sentinel uses the same line protocol as Redis! Just point it at the configured port, 5000:

$ redis-cli -p 5000
127.0.0.1:5000>

Once connected to the Sentinel, you can ask it for the current master:

127.0.0.1:5000> sentinel get-master-addr-by-name mymaster
1) "127.0.0.1"
2) "6379"
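
Since Sentinel speaks the same protocol as Redis (known as RESP), any client can issue this query. To make that concrete, here is a rough sketch of how the command and its reply look on the wire. The helper names are invented for this example, and the decoder only handles the simple "array of bulk strings" reply shape seen above:

```python
from typing import List


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def decode_bulk_array(reply: bytes) -> List[str]:
    """Decode a RESP array whose elements are all bulk strings (a simplification)."""
    if reply[:1] != b"*":
        raise ValueError("expected a RESP array")
    pos = reply.index(b"\r\n")
    count = int(reply[1:pos])
    pos += 2
    items = []
    for _ in range(count):
        if reply[pos:pos + 1] != b"$":
            raise ValueError("expected a bulk string")
        end = reply.index(b"\r\n", pos)
        length = int(reply[pos + 1:end])
        start = end + 2
        items.append(reply[start:start + length].decode("utf-8"))
        pos = start + length + 2  # skip the value and its trailing CRLF
    return items


request = encode_command("SENTINEL", "get-master-addr-by-name", "mymaster")
print(request)
print(decode_bulk_array(b"*2\r\n$9\r\n127.0.0.1\r\n$4\r\n6379\r\n"))
```

In a real client you would write `request` to a socket connected to port 5000 and decode whatever comes back; a proper Redis client library does this for you.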

The Sentinel process connects to the Redis process to detect whether it’s still available. You can see this from the Redis process, by listing its clients:

$ redis-cli -p 6379 client list
id=19 addr=127.0.0.1:54765 fd=11 name=sentinel-cc83a347-cmd age=86 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=ping
id=20 addr=127.0.0.1:54766 fd=12 name=sentinel-cc83a347-pubsub age=86 idle=1 flags=N db=0 sub=1 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=subscribe
id=21 addr=127.0.0.1:54779 fd=6 name= age=0 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=client

The first two connections are from the Sentinel process. They have names sentinel-cc83a347-cmd and sentinel-cc83a347-pubsub. That string cc83a347 is the start of the myid value that the Sentinel process generated for itself earlier.
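
If you want to pick the Sentinel connections out of a client list programmatically, a sketch like the following might do. It assumes each line is a series of space-separated `key=value` pairs, as in the output above, and that Sentinel connection names follow the `sentinel-<id prefix>-<kind>` pattern seen here; the function names are hypothetical:

```python
from typing import Dict, Optional


def parse_client_line(line: str) -> Dict[str, str]:
    """Split one line of `client list` output into a dict of its fields."""
    fields = {}
    for token in line.split():
        key, _, value = token.partition("=")
        fields[key] = value  # e.g. "name=" yields an empty string
    return fields


def sentinel_id_prefix(client_name: str) -> Optional[str]:
    """Return the Sentinel ID prefix embedded in a connection name, if any."""
    parts = client_name.split("-")
    # Assumed pattern: sentinel-<first 8 hex chars of myid>-<cmd|pubsub>
    if len(parts) == 3 and parts[0] == "sentinel":
        return parts[1]
    return None


line = "id=19 addr=127.0.0.1:54765 fd=11 name=sentinel-cc83a347-cmd age=86 idle=0 cmd=ping"
client = parse_client_line(line)
print(client["name"], sentinel_id_prefix(client["name"]))
```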

The Sentinel process holds two connections: one for its Pub/Sub subscription, and another to run arbitrary commands. This is because once a Redis client opens a Pub/Sub subscription, it is no longer allowed to issue most other commands.

The pubsub client is subscribed to a single global channel:

$ redis-cli -p 6379 pubsub channels
1) "__sentinel__:hello"

You can subscribe to that channel. You’ll see a new message every two seconds:

$ redis-cli -p 6379 subscribe __sentinel__:hello
Reading messages... (press Ctrl-C to quit)
1) "subscribe"
2) "__sentinel__:hello"
3) (integer) 1
1) "message"
2) "__sentinel__:hello"
3) "127.0.0.1,5000,cc83a347c59af48d0604d328604cd5cc21a82800,0,mymaster,127.0.0.1,6379,0"
1) "message"
2) "__sentinel__:hello"
3) "127.0.0.1,5000,cc83a347c59af48d0604d328604cd5cc21a82800,0,mymaster,127.0.0.1,6379,0"
1) "message"
2) "__sentinel__:hello"
3) "127.0.0.1,5000,cc83a347c59af48d0604d328604cd5cc21a82800,0,mymaster,127.0.0.1,6379,0"
^C

Sentinel uses Pub/Sub to advertise itself and to broadcast which Redis is master. New Sentinels can use this to discover each other!
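
Each hello message is a comma-separated record. Here is a sketch that splits it into named fields. The field names are my own labels, following my reading of the Sentinel documentation (Sentinel address and ID, current epoch, then the master's name, address and config epoch), so treat them as an assumption rather than an official schema:

```python
from typing import NamedTuple


class Hello(NamedTuple):
    sentinel_ip: str
    sentinel_port: int
    sentinel_id: str
    current_epoch: int
    master_name: str
    master_ip: str
    master_port: int
    master_config_epoch: int


def parse_hello(payload: str) -> Hello:
    """Parse one __sentinel__:hello payload into its eight fields."""
    parts = payload.split(",")
    if len(parts) != 8:
        raise ValueError(f"expected 8 fields, got {len(parts)}")
    ip, port, sid, epoch, name, m_ip, m_port, m_epoch = parts
    return Hello(ip, int(port), sid, int(epoch), name, m_ip, int(m_port), int(m_epoch))


msg = "127.0.0.1,5000,cc83a347c59af48d0604d328604cd5cc21a82800,0,mymaster,127.0.0.1,6379,0"
hello = parse_hello(msg)
print(hello.sentinel_port, hello.master_name, hello.master_port)
```

The first three fields are what lets a new Sentinel discover its peers; the rest is how Sentinels tell each other which Redis they believe is master.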

You can also use monitor to see what Sentinel is doing:

$ redis-cli -p 6379 monitor
OK
1546981642.808843 [0 127.0.0.1:54765] "PING"
1546981642.866671 [0 127.0.0.1:54765] "PUBLISH" "__sentinel__:hello" "127.0.0.1,5000,cc83a347c59af48d0604d328604cd5cc21a82800,0,mymaster,127.0.0.1,6379,0"
1546981643.883505 [0 127.0.0.1:54765] "PING"
1546981644.907646 [0 127.0.0.1:54765] "PING"
1546981644.983626 [0 127.0.0.1:54765] "PUBLISH" "__sentinel__:hello" "127.0.0.1,5000,cc83a347c59af48d0604d328604cd5cc21a82800,0,mymaster,127.0.0.1,6379,0"
...

Every second, Sentinel sends a ping to Redis to check whether it’s still alive. We can use debug sleep to cause Redis to stop responding for ten seconds:

$ redis-cli -p 6379 debug sleep 10

In the Sentinel output, you’ll see that it detected that Redis went away, then came back:

65905:X 08 Jan 21:13:49.363 # +sdown master mymaster 127.0.0.1 6379
65905:X 08 Jan 21:13:53.790 # -sdown master mymaster 127.0.0.1 6379

The +sdown event happened 5 seconds after Redis went away; this is due to the sentinel down-after-milliseconds mymaster 5000 config that we set earlier.
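
You can check the timing from the log itself. This sketch parses the timestamps of the two log lines above and measures the gap between them. The log lines carry no year, so the `year` parameter (defaulting to 2019) is an assumption, and the function names are made up for this example:

```python
from datetime import datetime


def parse_log_time(line: str, year: int = 2019) -> datetime:
    """Extract the timestamp from a Sentinel log line like the ones above."""
    # Assumed layout: "<pid>:<role> <day> <month> <HH:MM:SS.mmm> <marker> <event> ..."
    parts = line.split()
    stamp = " ".join(parts[1:4])  # e.g. "08 Jan 21:13:49.363"
    return datetime.strptime(f"{year} {stamp}", "%Y %d %b %H:%M:%S.%f")


def seconds_between(first: str, second: str) -> float:
    """Seconds elapsed from the first log line to the second."""
    return (parse_log_time(second) - parse_log_time(first)).total_seconds()


down = "65905:X 08 Jan 21:13:49.363 # +sdown master mymaster 127.0.0.1 6379"
up = "65905:X 08 Jan 21:13:53.790 # -sdown master mymaster 127.0.0.1 6379"
print(seconds_between(down, up))
```

The gap comes out at a little under 4.5 seconds. Added to the 5 seconds that Sentinel waited before declaring sdown, that roughly accounts for the ten-second sleep.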

Sentinel detected that Redis went away, but it didn’t actually do anything in response. For a start, we’ve configured Sentinel with a quorum of 2, which says that at least 2 Sentinels need to consider the Redis to be down before a failover can happen. We can never reach 2, because we only have one Sentinel, so let’s start another:

$ cat << EOF > sentinel2.conf
port 5001
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
sentinel parallel-syncs mymaster 1
EOF
$ redis-server sentinel2.conf --sentinel
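
The two config files differ only in their port line, so if you find yourself starting several Sentinels, you could generate the files instead of typing them. Here is a small sketch; `render_sentinel_conf` is a hypothetical helper, and its defaults simply mirror the values used in this post:

```python
def render_sentinel_conf(
    port: int,
    master_name: str = "mymaster",
    master_host: str = "127.0.0.1",
    master_port: int = 6379,
    quorum: int = 2,
) -> str:
    """Render an initial Sentinel config like sentinel1.conf above."""
    lines = [
        f"port {port}",
        f"sentinel monitor {master_name} {master_host} {master_port} {quorum}",
        f"sentinel down-after-milliseconds {master_name} 5000",
        f"sentinel failover-timeout {master_name} 60000",
        f"sentinel parallel-syncs {master_name} 1",
    ]
    return "\n".join(lines) + "\n"


print(render_sentinel_conf(5001), end="")
```

Remember that each Sentinel rewrites its own file once it is running, so only generate a file for a Sentinel that hasn't been started yet.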

When you launch the second Sentinel, both Sentinels log that they discovered each other. Here is the first Sentinel’s output:

$ redis-server sentinel1.conf --sentinel
...
39660:X 08 Jan 23:13:58.949 * +sentinel sentinel 003e3c58ef3d98ac0fb5e98d4fafc55e8c0036fa 127.0.0.1 5001 @ mymaster 127.0.0.1 6379

They discovered each other via Redis Pub/Sub. We can see that both are now advertising themselves:

$ redis-cli -p 6379 subscribe __sentinel__:hello
Reading messages... (press Ctrl-C to quit)
1) "subscribe"
2) "__sentinel__:hello"
3) (integer) 1
1) "message"
2) "__sentinel__:hello"
3) "127.0.0.1,5000,cc83a347c59af48d0604d328604cd5cc21a82800,0,mymaster,127.0.0.1,6379,0"
1) "message"
2) "__sentinel__:hello"
3) "127.0.0.1,5001,003e3c58ef3d98ac0fb5e98d4fafc55e8c0036fa,0,mymaster,127.0.0.1,6379,0"
1) "message"
2) "__sentinel__:hello"
^C

Now we have enough Sentinels to reach “quorum”, so try making Redis go away again:

$ redis-cli -p 6379 debug sleep 10

This time, we get much more output in the Sentinel log:

39660:X 08 Jan 23:19:31.137 # +sdown master mymaster 127.0.0.1 6379
39660:X 08 Jan 23:19:31.204 # +odown master mymaster 127.0.0.1 6379 #quorum 2/2
39660:X 08 Jan 23:19:31.204 # +new-epoch 1
39660:X 08 Jan 23:19:31.205 # +try-failover master mymaster 127.0.0.1 6379
39660:X 08 Jan 23:19:31.207 # +vote-for-leader cc83a347c59af48d0604d328604cd5cc21a82800 1
39660:X 08 Jan 23:19:31.209 # 003e3c58ef3d98ac0fb5e98d4fafc55e8c0036fa voted for cc83a347c59af48d0604d328604cd5cc21a82800 1
39660:X 08 Jan 23:19:31.274 # +elected-leader master mymaster 127.0.0.1 6379
39660:X 08 Jan 23:19:31.274 # +failover-state-select-slave master mymaster 127.0.0.1 6379
39660:X 08 Jan 23:19:31.338 # -failover-abort-no-good-slave master mymaster 127.0.0.1 6379
39660:X 08 Jan 23:19:31.404 # Next failover delay: I will not start a failover before Tue Jan  8 23:21:31 2019
39660:X 08 Jan 23:19:35.146 # -sdown master mymaster 127.0.0.1 6379
39660:X 08 Jan 23:19:35.146 # -odown master mymaster 127.0.0.1 6379

The state we reached first was sdown, or “subjectively down”: this Sentinel, on its own, considers the Redis unreachable. Next, we reach odown, or “objectively down”. A Redis is odown if at least quorum (2) Sentinels consider the Redis sdown. Once we reach odown, Sentinel tries to fail over, but this gets aborted:

39660:X 08 Jan 23:19:31.338 # -failover-abort-no-good-slave master mymaster 127.0.0.1 6379
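
The step from sdown to odown boils down to counting opinions against the quorum. As a toy model only (real Sentinels gather these opinions by asking each other over the network), it might look like this, with `objectively_down` being a name invented for the sketch:

```python
from typing import Dict


def objectively_down(sdown_opinions: Dict[str, bool], quorum: int) -> bool:
    """True if at least `quorum` Sentinels consider the master sdown."""
    votes = sum(1 for is_down in sdown_opinions.values() if is_down)
    return votes >= quorum


# One Sentinel alone can never satisfy a quorum of 2 ...
print(objectively_down({"cc83a347": True}, quorum=2))
# ... but two Sentinels that agree can.
print(objectively_down({"cc83a347": True, "003e3c58": True}, quorum=2))
```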

To fail over, Sentinel needs a slave to fail over to, but we never started one! Let’s start a new Redis:

$ redis-server --port 6380
...

Then from a new shell, set it as the slave:

$ redis-cli -p 6380 slaveof 127.0.0.1 6379

In the Redis server logs, note that the new server starts replicating from the master. You can also see the new slave by asking the Redis master with the info command:

$ redis-cli -p 6379 info
...
# Replication
role:master
connected_slaves:1
slave0:ip=127.0.0.1,port=6380,state=online,offset=38446,lag=1
master_repl_offset:38710
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:38709
...

In the same way, Sentinel uses the info command to find slaves which it can use for failover. If you run monitor on the master, you can see each Sentinel running info every 10 seconds:

$ redis-cli -p 6379 monitor | grep INFO
1546990849.944140 [0 127.0.0.1:52298] "INFO"
1546990850.145960 [0 127.0.0.1:52297] "INFO"
1546990860.044550 [0 127.0.0.1:52298] "INFO"
1546990860.192189 [0 127.0.0.1:52297] "INFO"
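
To get a feel for what Sentinel reads out of that INFO reply, here is a sketch that extracts the slave entries from the replication section. It assumes the `slaveN:ip=...,port=...,state=...` line shape shown in the output earlier; `parse_replicas` is a hypothetical name, and real Sentinel looks at more fields than this:

```python
from typing import List, Tuple


def parse_replicas(info_text: str) -> List[Tuple[str, int, str]]:
    """Return (ip, port, state) for each slaveN line in INFO output."""
    replicas = []
    for line in info_text.splitlines():
        key, sep, value = line.partition(":")
        # Slave entries look like "slave0:ip=...,port=...,state=..."
        if sep and key.startswith("slave") and key[5:].isdigit():
            fields = dict(item.split("=", 1) for item in value.split(","))
            replicas.append((fields["ip"], int(fields["port"]), fields["state"]))
    return replicas


info = (
    "# Replication\n"
    "role:master\n"
    "connected_slaves:1\n"
    "slave0:ip=127.0.0.1,port=6380,state=online,offset=38446,lag=1\n"
)
print(parse_replicas(info))
```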

Now that Sentinel knows of a slave, it should be able to fail over! So let’s try it again:

$ redis-cli -p 6379 debug sleep 10

This time, it works! In the Sentinel log:

39660:X 08 Jan 23:45:46.637 # +sdown master mymaster 127.0.0.1 6379
39660:X 08 Jan 23:45:47.033 # +new-epoch 2
39660:X 08 Jan 23:45:47.034 # +vote-for-leader 003e3c58ef3d98ac0fb5e98d4fafc55e8c0036fa 2
39660:X 08 Jan 23:45:47.723 # +odown master mymaster 127.0.0.1 6379 #quorum 2/2
39660:X 08 Jan 23:45:47.723 # Next failover delay: I will not start a failover before Tue Jan  8 23:47:47 2019
39660:X 08 Jan 23:45:48.116 # +config-update-from sentinel 003e3c58ef3d98ac0fb5e98d4fafc55e8c0036fa 127.0.0.1 5001 @ mymaster 127.0.0.1 6379
39660:X 08 Jan 23:45:48.117 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6380
39660:X 08 Jan 23:45:48.117 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6380

In Redis, we can now see that the master/slave relationship is reversed:

$ redis-cli -p 6379 config get slaveof
1) "slaveof"
2) "127.0.0.1 6380"
~
$ redis-cli -p 6380 config get slaveof
1) "slaveof"
2) ""
~
$

And each Sentinel has recorded this failover by re-writing its config file:

$ cat sentinel1.conf
port 5000
sentinel myid cc83a347c59af48d0604d328604cd5cc21a82800
sentinel monitor mymaster 127.0.0.1 6380 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
# Generated by CONFIG REWRITE
dir "/Users/jim/dev/tmp/sentinel"
sentinel config-epoch mymaster 2
sentinel leader-epoch mymaster 2
sentinel known-slave mymaster 127.0.0.1 6379
sentinel known-sentinel mymaster 127.0.0.1 5001 003e3c58ef3d98ac0fb5e98d4fafc55e8c0036fa
sentinel current-epoch 2

As a normal Redis client, how should you find out whether a failover happened? One way is to subscribe to the __sentinel__:hello channel on the Redis master, but that’s not a good idea. That channel gets quite busy once you have several Sentinels and, more importantly, if a failover happens, the master probably isn’t working anyway! Instead, you can subscribe to channels on the Sentinel process itself. There’s one for each event type, and the important one is +switch-master:

$ redis-cli -p 5000 subscribe +switch-master
Reading messages... (press Ctrl-C to quit)
1) "subscribe"
2) "+switch-master"
3) (integer) 1

Now trigger a failover again, this time in the other direction:

$ redis-cli -p 6380 debug sleep 10

When the failover happens, you’ll see this come through on the +switch-master channel:

1) "message"
2) "+switch-master"
3) "mymaster 127.0.0.1 6380 127.0.0.1 6379"

As a Redis client receiving this message, you should stop using the old master, and start using the new one.
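
The message payload is space-separated: the master name, then the old address, then the new address. A client could unpack it along these lines (a sketch; `SwitchMaster` and `parse_switch_master` are names I've made up for this example):

```python
from typing import NamedTuple


class SwitchMaster(NamedTuple):
    master_name: str
    old_host: str
    old_port: int
    new_host: str
    new_port: int


def parse_switch_master(payload: str) -> SwitchMaster:
    """Parse a +switch-master payload: <name> <old ip> <old port> <new ip> <new port>."""
    parts = payload.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}")
    name, old_host, old_port, new_host, new_port = parts
    return SwitchMaster(name, old_host, int(old_port), new_host, int(new_port))


event = parse_switch_master("mymaster 127.0.0.1 6380 127.0.0.1 6379")
print(f"reconnect to {event.new_host}:{event.new_port}")
```

On receiving the event, a client would drop its connection to the old address and open one to the new address.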

Finally, I should note: we only created two Sentinel processes, and this worked, but I said we would create three! Sentinel uses consensus to agree on failovers, and consensus algorithms hate even numbers: with two Sentinels, a majority is still two, so if either Sentinel is lost, the other cannot fail over on its own. You should create another, to have a minimum of three nodes.
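
The arithmetic behind that advice is short. A strict majority of n Sentinels is n // 2 + 1, and the number of Sentinels you can lose while still being able to fail over is whatever is left. A quick sketch, with function names of my own choosing:

```python
def majority(sentinels: int) -> int:
    """Smallest number of Sentinels that forms a strict majority."""
    return sentinels // 2 + 1


def tolerated_failures(sentinels: int) -> int:
    """How many Sentinels can be lost while a majority is still reachable."""
    return sentinels - majority(sentinels)


for n in (2, 3, 4, 5):
    print(n, majority(n), tolerated_failures(n))
```

Two Sentinels tolerate no Sentinel failures at all, while three tolerate one. Going from three to four adds nothing, which is why odd numbers are the usual choice.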

This page copyright James Fisher 2019. Content is not associated with my employer.