PostgreSQL pivot tables, selecting has-many relationships

16 December 2017

PostgreSQL has a useful function called crosstab that makes selecting ‘has many’ data associations pretty easy. In our example we have contacts who has many match_values (0+), and we want to present the data in a single table.

A frequent approach I see to this problem is to make multiple queries to a database and piece the data together in a client application. Well if you have seen that being done before, this is a single query that returns the exact data for that type of problem. Instead of multiple queries, we are going to make a pivot table, join that with another table, and return a set of results that otherwise might be hard to get.

Problem we are trying to solve:

Desired structure:
name | position | industry | nps
------+----------+----------+-----
Jon  | Director | Media    |
Beth |          | Banking  | 8
Amy  | VP       |          | 10
Raw data:
SELECT * FROM contacts;

-- id | name
-- ----+------
--  1 | Jon
--  2 | Beth
--  3 | Amy

SELECT * FROM match_values;
-- id | contact_id | match_category_id |   name
-- ----+------------+-------------------+----------
--  1 |          1 |                 1 | Director
--  2 |          1 |                 2 | Media
--  3 |          2 |                 2 | Banking
--  4 |          2 |                 3 | 8
--  5 |          3 |                 1 | VP
--  6 |          3 |                 3 | 10

select * from match_categories;
-- id |   name
-- ----+----------
--  1 | Position
--  2 | Industry
--  3 | NPS

The first thing we need to do is CREATE extension tablefunc;, this will let us use the crosstab function.

Now lets take a quick high level look at how the crosstab function works for our case. We will be passing in 2 string arguments, and provide a schema for the results to be returned in.

SELECT * FROM crosstab(
  TEXT source_sql,
  TEXT category_sql
) AS ct (
  result_schema
)

The first argument ‘source_sql’ is a string that returns 3 columns in this order

  • row_name column, how the table extends vertically
  • category column, how the table extends horizontally
  • value column, the values populating the columns

Our source_sql will be:

SELECT contact_id, match_category_id, name FROM match_values ORDER BY 1, 2

contact_id is the row name, match_category_id is the column names, and name is the values.

The second argument category_sql is the ordering of the category column, this prevents all the null values being squashed to the left. It should return a single column of equal length to the categories from the first query of the same type and with no duplicates.

Our category_sql will be:

SELECT DISTINCT id FROM match_categories ORDER BY 1

The results is a list of the column names and data types, for us to select from.

Ours columns look like this:

contact_id INT,
position TEXT,
industry TEXT,
NPS TEXT

Now putting it all together into a full query, we get this:

SELECT * FROM crosstab (
  $$ SELECT contact_id, match_category_id, name FROM match_values ORDER BY 1, 2 $$,
  $$ SELECT DISTINCT match_category_id FROM match_values ORDER BY 1 $$
) AS ct (
  contact_id INT,
  position TEXT,
  industry TEXT,
  NPS TEXT
);

-- | contact_id | position | industry | nps  |
-- |---         |---       |---       |--    |
-- | 1          | Director | Media    |      |
-- | 2          |          | Banking  | 8    |
-- | 3          | VP       |          | 10   |

The result is pretty close to what we want, but instead of contact_id we really want a column that has the contact name. What we can do is combine our retults with a Common Table Expression using WITH, and join that with contacts to get exactly what we were looking for

WITH pivot_table AS (
  SELECT * from crosstab (
    $$ SELECT contact_id, match_category_id, name FROM match_values ORDER BY 1 $$,
    $$ SELECT DISTINCT match_category_id from match_values ORDER BY 1 $$
  ) AS ct (
    contact_id INT,
    position TEXT,
    industry TEXT,
    nps TEXT
  )
)

SELECT
  contacts.name,
  pivot_table.position,
  pivot_table.industry,
  pivot_table.nps
FROM contacts
LEFT JOIN pivot_table ON contacts.id = pivot_table.contact_id;

-- | name | position | industry | nps  |
-- |---   |---       |---       |--    |
-- | Jon  | Director | Media    |      |
-- | Beth |          | Banking  | 8    |
-- | Amy  | VP       |          | 10   |

Now we can return exactly what we were looking for in a single query! Our database can do more of the heavy lifting and we can avoid an alternative of multiple queries and piecing the data together somewhere else.

If you want to try this out yourself, here is the SQL to generate the data in your own database:

CREATE extension tablefunc;

CREATE TABLE contacts(
  id INT PRIMARY KEY,
  name TEXT NOT NULL
);

CREATE TABLE match_values(
  id INT PRIMARY KEY,
  contact_id INT NOT NULL,
  match_category_id INT NOT NULL,
  name TEXT NOT NULL
);

CREATE TABLE match_categories(
  id INT PRIMARY KEY,
  name TEXT NOT NULL
);

INSERT INTO contacts
  (id, name)
VALUES
  ( 1, 'Jon'),
  ( 2, 'Beth'),
  ( 3, 'Amy');

INSERT INTO match_categories
  (id, name)
VALUES
  ( 1, 'Position'),
  ( 2, 'Industry'),
  ( 3, 'NPS');

INSERT INTO match_values
  (id, contact_id, match_category_id, name)
VALUES
  ( 1, 1, 1, 'Director'),
  ( 2, 1, 2, 'Media'),
  ( 3, 2, 2, 'Banking'),
  ( 4, 2, 3, '8'),
  ( 5, 3, 1, 'VP'),
  ( 6, 3, 3, '10');