Mixed relationship taxonomy queries in WordPress

Earlier this year we launched a membership component for my very first project with 10up: LearningWorks for Kids. LearningWorks’s content focuses on making the most of digital content to support learning, academics, and development of critical thinking skills. Memberships come with support for multiple private child users, each of whom has a profile that includes their age, thinking skills, academic skills, and special learning needs. Each member can also indicate which digital platforms and devices are available to their child users.

The central benefit of membership, beyond access to exclusive content, is the personalized recommendations across four different content types for each child user. These recommendations take into account the child’s age and available platforms, as well as a combination of skills and needs as indicated in their profile. These are related by five custom taxonomies – platforms, age, thinking skills, academic skills, and special needs. In this instance, we need to get content matching this set of criteria: age AND one of the platforms AND one of thinking skills OR academic skills OR special needs. This is a mixed relationship taxonomy query, and I’ll show you how we pulled it off.

LearningsWorks recommendations

An important preface before we get technical: While the technique outlined below does work in WordPress 3.5+, it relies on a workaround. I have opened ticket #25284 on Trac and have been working on a patch, especially unit tests, to make the following behavior both supported and tested to make this safer for the future.

Advanced taxonomy queries were introduced in WordPress 3.1. With them we can easily retrieve content based on a number of characteristics, like retrieving games that can help with focus and have a recommended age of 5, or games that can help with focus or help with reading skills or have a recommended age of 5. Otto has a great write-up for getting started with taxonomy queries, where he also notes:

Note that the relation is global to the query, so it appears outside the arrays in the tax_query, but still in the tax_query array itself.

Since each array within the tax_query has a taxonomy parameter, the most obvious behavior is that you can either match against all of the taxonomy terms specified (the AND relation) or any one of the taxonomy terms specified (the OR relation). The example from LearningWorks fits neither of those behaviors, representing a mix of “and” and “or” conditions – but a good understanding of the terms schema and how WP_Tax_Query is transformed empowers us to  accomplish what we need.

When you pass an array of term IDs or slugs to tax_query (the field parameter determines which term identifier is used: slug or ID), it uses these terms in combination with the taxonomy parameter (defining which taxonomy you’re querying against) to get the unique term_taxonomy_id, which generates an extra query. This is the transformation, a result of the current database schema for terms and taxonomies. In this current schema, each term has a unique ID, and can be shared across multiple taxonomies. For example, a “personal” tag and “personal” category exist as one entry in the database’s terms table. Let’s skip past “why” this came to be (an entirely different post): the term+taxonomy combination (e.g. the “personal” tag) does have a truly unique identifier in term_taxonomy_id. We can avoid the extra query – the transformation – by passing term taxonomy IDs (instead of term IDs) and setting the field parameter to term_taxonomy_id.

$args['tax_query'] = array (
	'relation' => 'AND',
	array (
		'taxonomy' => 'age',
		'terms' => array( 13 ),
		'field' => 'term_taxonomy_id',
		'operator' => 'IN'
		'include_children' => false,
	),
	array (
		'taxonomy' => 'platform',
		'terms' => array( 2, 10, 39, 48 ),
		'field' => 'term_taxonomy_id',
		'operator' => 'IN',
		'include_children' => false,
	),
	array (
		'taxonomy' => 'thinking-skills',
		'terms' => array( 3, 5, 6 ),
		'field' => 'term_taxonomy_id',
		'operator' => 'IN',
		'include_children' => false,
	),
	array (
		'taxonomy' => 'academic-skills',
		'terms' => array( 44, 45 ),
		'field' => 'term_taxonomy_id',
		'operator' => 'IN',
		'include_children' => false,
	),
	array (
		'taxonomy' => 'special-needs',
		'terms' => array( 71 ),
		'field' => 'term_taxonomy_id',
		'operator' => 'IN',
		'include_children' => false,
	),
);

In this example, we’re looking for games (a custom post type) that match at least one term in all of these taxonomies – which is not what we are looking for and, knowing our content, is very rarely going to return any matches. If we were to switch to an OR relation, then we might get games that are age appropriate, but aren’t relevant to the child’s skills or playable on any platforms they have available.

While we haven’t yet addressed the issue of mixing the relationships, it’s important to note that there’s a big side benefit: better query performance, by skipping the transformation query (mapping term IDs to term taxonomy IDs). The magic comes when we take a closer look at how the SQL query is actually constructed.

The term_relationships table doesn’t care which taxonomy the term is in – all it looks for is the object’s ID (in this case, the ID of the game) and the unique term_taxonomy_id. So long as we pass a valid taxonomy to each tax_query piece (so that WP_Tax_Query::clean_query() doesn’t throw an error), using the term_taxonomy_id field enables you to specify terms across multiple taxonomies in one piece of your tax_query (it’s only in WP_Tax_Query::transform_query() that the taxonomy itself is consulted, not in the SQL generated in WP_Tax_Query::get_sql() and passed back to WP_Query). Combine this with the IN operator for the array, which is equivalent to the OR operator for that piece of the query, and you can have a mixed relationship taxonomy query.

Whoa.

Going back to our example, we can reduce our tax_query to three pieces: a game with at least one platform match, an age match, and at least one match between thinking skills, academic skills, and special needs. Our tax_query thus looks like this:

$args['tax_query'] = array (
	'relation' => 'AND',
	array (
		'taxonomy' => 'age',
		'terms' => array( 13 ),
		'field' => 'term_taxonomy_id',
		'operator' => 'IN'
		'include_children' => false,
	),
	array (
		'taxonomy' => 'platform',
		'terms' => array( 2, 10, 39, 48 ),
		'field' => 'term_taxonomy_id',
		'operator' => 'IN',
		'include_children' => false,
	),
	array (
		'taxonomy' => 'thinking-skills', // but really any taxonomy
		'terms' => array( 3, 5, 6, 44, 45, 71 ),
		'field' => 'term_taxonomy_id',
		'operator' => 'IN',
		'include_children' => false,
	),
);

In addition to the IN operator, you can also use AND or NOT IN, thus providing even more matching possibilities. Perhaps I’d like to return games that match the child’s age and either all thinking skills or all academic skills.

$args['tax_query'] = array (
	'relation' => 'OR',
	array (
		'taxonomy' => 'age', // but really any taxonomy
		'terms' => array( 13, 3, 5, 6 ),
		'field' => 'term_taxonomy_id',
		'operator' => 'AND'
		'include_children' => false,
	),
	array (
		'taxonomy' => 'platform',  // but really any taxonomy
		'terms' => array( 13, 44, 45 ),
		'field' => 'term_taxonomy_id',
		'operator' => 'AND',
		'include_children' => false,
	),
);

Note that the include_children parameter defaults to true and requires an additional query to get the child term IDs, so for hierarchical taxonomies you will need to either not include child terms in matches or include them in the terms array and set include_children to false – as in the above examples – to bypass the transform entirely. Support for term_taxonomy_id was officially added in 3.5, although it worked in some cases prior to that.

Using this technique, we crafted finely tuned recommendations for LearningWorks members leveraging WordPress APIs (as opposed to direct queries), and realized enough performances gains to add order by RAND, delivering new recommendations on every visit.

 

If you’ve looked for inventive ways to generate more powerful taxonomy querying and matching, this should help pave the way. Meanwhile, keep an eye out for a big rethink of the terms / taxonomy schema, directly addressing the underlying problems, coming soon to a WordPress near you.

6 Comments

  1. Mark Root-Wiley

    Great write up and really useful for future projects. I can think of a couple on my horizon that may need this. It’s great to see you pushing that work back into core too. Thanks!

    One thing did confuse me though. Seeing that this site is all about education for kids, I’m surprised that any of your queries had results with `’include_children’ => false`…

    · Reply
    • Helen Hou-Sandi
      Helen Hou-Sandi

      Ba-dum, TSH! :)

      · Reply
  2. Timothy Brand

    Mind blown. Awesome post, thank you. Bookmarked as a future reference :)

    · Reply
  3. Topher

    This is mighty neato, I’ve wanted to do this many times in the past. Thanks a bunch!

    · Reply
  4. Andrej

    Have you seen a decrease in performance with these queries?

    We used this approach to build a ACL-type permissions thingy that intercepts all calls for content and it seems that the queries WPDB generates with multiple taxonomies in the tax_query array are less than optimal (with expensive subqueries).

    · Reply
  5. P. Enrique

    This doesn’t seem to work with WP 3.8.1. You only get results that match the first criteria. I have opened a ticket about it: https://core.trac.wordpress.org/ticket/27193 My bug report doesn’t try the technique with term_taxonomy_ids, but I can confirm that the same thing happens.

    · Reply

Leave a Comment