If This Then Sonos - Hooking the Music Player to the Internet of Things

I've been on a bit of a run lately with the Sonos hacks. They've run the gamut from (IMO) really useful, to pretty silly, and places in between. That first PowerMate-based hack has driven all this, prompting me to try and find more and more ways to make getting to my music through Sonos more convenient and simple.

Today's hack provides perhaps the most flexibility possible in getting your Sonos Zones to do what you want them to do. I've gotten my players hooked up to the If This Then That (IFTTT) internet “glue” service. In IFTTT's own words, they are “a service that lets you create powerful connections with one simple statement”. As a very simple example, you could configure IFTTT to trigger when you post a new Instagram photo, and post that photo to Facebook.

Obviously, IFTTT works with services and systems directly connected to the net, and your Sonos system isn't such a thing. What I've done is write an interface that talks to IFTTT and feeds the appropriate commands to a Sonos Zone.


As usual, this'll be a node app, running (in my case) on a Raspberry Pi. The app will receive POSTs from IFTTT, interpret them, and manipulate Sonos Zones via the sonos-discovery package. There are a couple tricks we need to perform before this will all work, though. First, we need to realize that IFTTT doesn't provide any channel (as they call their internet service connectors) that will push arbitrary data to an arbitrary endpoint. Or, at least, they don't provide any channels that purposefully do such a thing. Luckily for us, there's a work-around.


IFTTT features a WordPress channel that uses WP's XML-RPC interface to post directly to a user's blog. We can hijack this channel, and set up a node server that will emulate a WP XML-RPC endpoint, allowing us to grab data from repurposed WordPress-specific fields.
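
To give a feel for what emulating an XML-RPC endpoint involves, here is a minimal sketch of the reply a client expects after posting. It assumes the post arrives as a metaWeblog.newPost call, which answers with the new post's ID as a string. The helper names are hypothetical and are not taken from sonosifttt.js; only the XML-RPC envelope itself is standard.

```javascript
// Hypothetical helper (not from sonosifttt.js): escape text for use inside XML.
function escapeXml(text) {
    return String(text)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;');
}

// Wrap a single string value in a standard XML-RPC methodResponse envelope.
function xmlRpcStringResponse(value) {
    return '<?xml version="1.0"?>' +
        '<methodResponse><params><param><value><string>' +
        escapeXml(value) +
        '</string></value></param></params></methodResponse>';
}

// Assumption: the caller only needs *some* post ID back, so a fixed one is used.
console.log(xmlRpcStringResponse('1'));
```

The real server also has to answer whatever calls IFTTT makes while activating the channel; those are not shown here.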

The second challenge is that the IFTTT WordPress channel MUST point to an endpoint running on port 80. So you need to open port 80 on your router, and direct it to whatever port your internal nodejs server is running on. (Another option is to set up something like a Heroku redirector as the XML-RPC endpoint, and feed data directly to the desired port on your system. That technique is beyond the scope of this article.)

These two challenges mean two things:

  1. You cannot have an existing WordPress integration on IFTTT, nor can you add one in the future as long as you want to keep the Sonos-to-IFTTT connection (IFTTT only supports one WordPress configuration).
  2. You cannot already be running a public-facing web server on port 80 of your home network connection.

If those conditions sound OK to you, let's dive in!


Clone the github repo, cd into the created directory, and type npm install to install the required packages. Edit “sonosifttt.js” and change the port on the very last line to a number that works for you. Type node sonosifttt.js to start the server.

Next, get your router all set to forward data from external port 80 to the port you chose above. This procedure varies by router manufacturer, but you can generally find instructions by Googling your router make and model, and the phrase “port forwarding”.

Now get an IFTTT account all set up. Once this is done, you need to set up the WordPress channel. (Remember, we're exploiting the relatively configurable nature of the WordPress channel on IFTTT). To set up the channel, go to the Channels page, scroll to the bottom, and select the WordPress icon.

Click on “Activate”, and enter the URL/IP address for your home network. Username and password are required, but are not checked anywhere, so you can enter anything in these fields. If all goes well, once you click the “Activate” button on this page, IFTTT will reach out to your node server, running the sonosifttt.js code, and recognize a WP XML-RPC endpoint.

I noted above the username and password are not checked currently, and in the github code, they're not. I'd highly recommend you put in some kind of username/password checking to reject any request from folks looking to mess with your Sonos system.
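
One way such a check could look is sketched below. This is not code from the repo: the function names and the placeholder credentials are made up for illustration, and how you pull the username and password out of the incoming request depends on how the XML-RPC body is parsed, so that part is left out.

```javascript
var crypto = require('crypto');

// Placeholder credentials for illustration only; choose your own.
var EXPECTED_USER = 'sonos';
var EXPECTED_PASS = 'change-me';

// Compare two strings without revealing, through timing, where they differ.
// Hashing first gives both buffers the same length, which timingSafeEqual requires.
function safeEqual(a, b) {
    var hashA = crypto.createHash('sha256').update(String(a)).digest();
    var hashB = crypto.createHash('sha256').update(String(b)).digest();
    return crypto.timingSafeEqual(hashA, hashB);
}

// Hypothetical gate: returns true only when both values match.
function isAuthorized(username, password) {
    var userOk = safeEqual(username, EXPECTED_USER);
    var passOk = safeEqual(password, EXPECTED_PASS);
    return userOk && passOk;
}

console.log(isAuthorized('sonos', 'change-me'));
console.log(isAuthorized('sonos', 'guess'));
```

A request that fails the check can simply be dropped before any Sonos command is issued.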


At this point, you should be all set to make your Sonos system the “that” in “If This Then That”. Let's start with a simple setup to test things out. Go back to the Channels page, find the “Date & Time” channel, and activate it, selecting your local time zone. Click on the “My Recipes” link at the top of the page, then click on the big “Create a Recipe” button.

Click on the “this” link of the phrase, then select the Date & Time channel you just configured. Select “Every day at”, then configure the time to be the next occurring time from right now. Click “Create Trigger”.

Click the “that” part of the phrase, then scroll down and select the “WordPress” channel. Click on “Create a Post”, and you'll get to the most important part of creating a Sonos trigger. Here is where we'll hijack the WordPress channel for our own purposes.

In the “Title” field, change the text to read “say”. Make the “Body” field read “This is a test of if this then Sonos”. Leave “Categories” blank, and make “Tags” say the name of the Sonos zone you want to control for this recipe. In my case, I set it to “Family Room”. (This field is case insensitive.)

Now, sit and wait until the time you configured, and you should hear your Sonos Zone speak to you. You may have to wait a bit longer; I noticed the Date & Time trigger could sometimes run a few minutes slow. You can check whether it fired in the “Logs” section of your newly created recipe.

WordPress Channel to Sonos Configuration

Now that we've confirmed that things are working, it's time to learn how to build your own recipes with your Sonos system as the target. The following fields are interpreted by the sonosifttt code:

  • Title: one of “play”, “pause”, “favorite”, or “say”.
      - play: play whatever is in the queue of the selected Zone
      - pause: pause the selected Zone
      - favorite: play the Sonos Favorite specified in the “Body” field
      - say: speak the text specified in the “Body” field
  • Body: provides information for the “favorite” and “say” commands.
      - for the “favorite” command: the EXACT name of the favorite you want to play
      - for the “say” command: the words you want spoken by your Sonos
  • Tags: comma-delimited list of the Zones that should receive the command. Note that if your selected Zone is the “child” in a Sonos Group, the commands will actually be applied to the “parent” of the group
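
As a rough illustration of how those three fields fit together, here is a sketch that turns a received post into a command object. The function toSonosCommand and the shape of its result are hypothetical; only the field names (title, description, mt_keywords) come from the snippets in this post.

```javascript
// Hypothetical helper: map the repurposed WordPress fields to a command object.
function toSonosCommand(post) {
    var verbs = ['play', 'pause', 'favorite', 'say'];
    var verb = String(post.title || '').trim().toLowerCase();
    if (verbs.indexOf(verb) === -1) return null; // unknown verb: ignore the post

    // Accept either an array of tags or one comma-delimited string.
    var tags = post.mt_keywords || [];
    if (typeof tags === 'string') tags = tags.split(',');

    return {
        verb: verb,
        // Only "favorite" and "say" read the Body field.
        argument: (verb === 'favorite' || verb === 'say')
            ? String(post.description || '')
            : null,
        zones: tags.map(function (t) { return String(t).trim(); }).filter(Boolean)
    };
}

console.log(toSonosCommand({
    title: 'Say',
    description: 'This is a test of if this then Sonos',
    mt_keywords: ['Family Room']
}));
```

Returning null for an unrecognized Title keeps stray posts from doing anything to your Zones.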

Recipe Ideas

There are a few obvious ideas: use the iPhone/Android IFTTT app location awareness to stop and start your Sonos when you leave/arrive home. Or you could have your Sonos speak the weather every morning (Weather channel), or the scores of your favorite sports team (ESPN channel). You could set up the Twitter channel to play a Sonos Favorite when a certain hashtag is sent to your account.

The possibilities are endless. I'd really love to hear what you come up with.

The Details

The code is all pretty straightforward with a couple exceptions. First, the part where we parse the “Tags” field to get our target players:

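for (var i = 0; i < req.body.mt_keywords.length; i++) {
    var player = discovery.getPlayer(req.body.mt_keywords[i]); // look up the Zone by name
    if (!player) continue; // unknown Zone name: skip it
    var playerInfo = player.convertToSimple();
    if (playerInfo.uuid != playerInfo.coordinator) { // Zone is a child in a group
        player = discovery.getPlayerByUUID(playerInfo.coordinator);
        if (!player) continue;
    }
    if (playerUUIDs.indexOf(player.uuid) == -1) { // not collected yet
        // ... remember the player so it only receives the command once
    }
}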

We first grab the player by name (line 2), and then check whether that player is a child in a group, that is, whether its uuid differs from its coordinator's (line 5). If it is NOT the coordinator (“parent”), we get the coordinator instead, via its uuid. Once we've got the player we're interested in, we confirm that we haven't gotten it already (line 9), which might occur if we specified a child Zone, got the parent instead, and had also specified that parent.
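
The same resolve-then-dedupe idea can be tried without any Sonos hardware. The sketch below uses a stand-in for sonos-discovery that only mimics the three calls seen above (getPlayer, getPlayerByUUID, convertToSimple); makeFakeDiscovery and resolvePlayers are hypothetical names, and the two pushes at the end are my assumption about how the collected players are recorded.

```javascript
// Stand-in for sonos-discovery, shaped only after the calls used in this post.
function makeFakeDiscovery(zones) {
    function wrap(z) {
        return z && { uuid: z.uuid, convertToSimple: function () { return z; } };
    }
    return {
        getPlayer: function (name) {
            return wrap(zones.filter(function (z) {
                return z.name.toLowerCase() === name.toLowerCase();
            })[0]);
        },
        getPlayerByUUID: function (uuid) {
            return wrap(zones.filter(function (z) { return z.uuid === uuid; })[0]);
        }
    };
}

// Resolve Zone names to a list of unique group coordinators.
function resolvePlayers(discovery, names) {
    var players = [];
    var playerUUIDs = [];
    names.forEach(function (name) {
        var player = discovery.getPlayer(name);
        if (!player) return;
        var info = player.convertToSimple();
        if (info.uuid !== info.coordinator) {
            // Child Zone: act on the group's coordinator instead.
            player = discovery.getPlayerByUUID(info.coordinator);
            if (!player) return;
        }
        if (playerUUIDs.indexOf(player.uuid) === -1) {
            playerUUIDs.push(player.uuid);
            players.push(player);
        }
    });
    return players;
}

var discovery = makeFakeDiscovery([
    { name: 'Family Room', uuid: 'A', coordinator: 'A' },
    { name: 'Kitchen', uuid: 'B', coordinator: 'A' } // grouped under Family Room
]);
console.log(resolvePlayers(discovery, ['Kitchen', 'Family Room']).length); // 1
```

Both names resolve to the same coordinator, so the command would be sent once rather than twice.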

The other challenging part was the “say” functionality. I had used this pattern before, in sonospowermate, so I was familiar with the methods involved in sending the text for text-to-speech conversion. The challenge was in getting the text spoken without destroying the Zone's existing queue.

if (req.body.title.toLowerCase() === 'say') {
    var textURL;
    tts.getSpeech(req.body.description, function(error, link) {
        if (error) return;
        textURL = link;
        players.forEach(function(player) {
            // forEach gives each callback below its own player, favURI, and favTrack
            if (player.state.currentState === 'PLAYING') return;
            var favURI = player.avTransportUri;
            var favTrack = player.state.trackNo;
            player.setAVTransportURI(textURL, '', function(success) {
                player.play(function() {
                    queueSave[player.uuid] = {
                        favTrack: favTrack,
                        favURI: favURI,
                        started: false
                    };
                });
            });
        });
    });
}

We did this by saving the Zone's current transport URI and track number (lines 9 and 10), and putting them in an object with the player uuid as key once the Zone has started “saying” the text (line 13). This object is used in the player emitter that informs us of transport-state changes (Zone starts and stops, basically).

discovery.on('transport-state', function(data) {
    if (!queueSave[data.uuid]) return; // this player isn't speaking
    if (!queueSave[data.uuid]['started']) {
        queueSave[data.uuid]['started'] = true;
        return;
    }
    var player = discovery.getPlayerByUUID(data.uuid);
    if (player.state.currentState !== 'PAUSED_PLAYBACK' && player.state.currentState !== 'STOPPED') return;
    player.setAVTransportURI(queueSave[data.uuid]['favURI'], '', function(success) {
        player.seek(queueSave[data.uuid]['favTrack'], function() {
            queueSave[data.uuid] = null;
        });
    });
});

In line 2, we check to see if the uuid of the player emitting the transport-state change exists as a key in our object. If so, we know this player is in the process of saying something. Here's the hard part: normally, we'd look for a “STOPPED” state, indicating that the player was done speaking. The problem is, we get a “STOPPED” state right BEFORE the player starts, which was causing us to miss our queue reset. So we put a flag (started) in the object, and check it when we get a state change.

If it's false, we know the player has only just started, so we set it to true, and wait for our next REAL “STOPPED” event (line 8). Once we get that, we put the queue back (line 9), and set the track number to the right spot (line 10). After all that is done, we set the uuid key to null so this emit handler doesn't try to do all this again when the player ISN'T currently speaking.
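
The flag logic is easier to see when pulled out of the emitter. Here is a small sketch that replays a sequence of transport states against a saved entry; nextAction is a hypothetical function, and the URI and track number are placeholder values rather than anything read from a real Zone.

```javascript
// Hypothetical pure version of the handler's decision: given the saved entry
// for a player and its new transport state, say what should happen next.
function nextAction(entry, state) {
    if (!entry) return 'ignore';      // this player is not speaking
    if (!entry.started) {
        entry.started = true;         // first event after "say": swallow it
        return 'wait';
    }
    if (state !== 'PAUSED_PLAYBACK' && state !== 'STOPPED') return 'wait';
    return 'restore';                 // speech finished: put the queue back
}

// Placeholder values for illustration only.
var entry = { favURI: 'x-rincon-queue:EXAMPLE#0', favTrack: 3, started: false };

var actions = ['STOPPED', 'PLAYING', 'STOPPED'].map(function (state) {
    return nextAction(entry, state);
});
console.log(actions); // [ 'wait', 'wait', 'restore' ]
```

The early “STOPPED” is absorbed by the flag, and only the second one triggers the restore.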

Wrap Up

Again, all the code is available on github. Please clone it, try it out, fork it, and improve it. I'd love to get some more ideas for some verbs we could implement, and I'd LOVE to hear about what killer IFTTT recipes you come up with.