Counting Number of Conversation “Exchanges” Between Two Parties
======================================================
In this blog post, we will explore how to count the number of exchanges between two parties in a conversation. An exchange occurs when a user sends one or more messages and then receives a reply from the other party; several consecutive messages from the same author count as a single turn.
Problem Statement
Given the following schema:
| table | columns |
|---|---|
| conversations | id |
| messages | id, content, author_id, conversation_id, created_at |
| users | id |
We need to count the number of exchanges in each conversation, where an exchange is a message (or a run of messages) from one user followed by a reply from the other.
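To make the schema concrete, here is a minimal sketch that creates the three tables in an in-memory SQLite database through Python's built-in sqlite3 module. The column types, the NOT NULL constraints, and the foreign keys are assumptions: the post only lists table and column names and does not say which database engine is in use.

```python
import sqlite3

# Assumed types and constraints; the post only gives table and column names.
SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE conversations (id INTEGER PRIMARY KEY);
CREATE TABLE messages (
    id              INTEGER PRIMARY KEY,
    content         TEXT NOT NULL,
    author_id       INTEGER NOT NULL REFERENCES users (id),
    conversation_id INTEGER NOT NULL REFERENCES conversations (id),
    created_at      TEXT NOT NULL  -- ISO 8601 text sorts chronologically
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# List the tables that now exist, and the columns of the messages table.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
)]
message_columns = [row[1] for row in conn.execute("PRAGMA table_info(messages)")]

print(tables)
print(message_columns)
```

Only the messages table is needed for the counting query; the other two tables are included to mirror the schema as listed.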
Solution Overview
To solve this problem, we can combine a window function with filtering and aggregation. The LAG window function gives each message access to the author of the previous message in the same conversation, and comparing the two authors tells us whether the message is a reply.
Step 1: Prepare Data
Let’s assume we have a sample dataset in our messages table:
| id | content | author_id | conversation_id | created_at |
|---|---|---|---|---|
| 1 | hi | 1 | 1 | 2019-01-01T00:00:00.000Z |
| 2 | hi | 2 | 1 | 2019-01-01T00:00:01.000Z |
| 3 | hi | 1 | 2 | 2019-01-02T00:00:00.000Z |
| 4 | hi | 2 | 2 | 2019-01-02T00:00:01.000Z |
| 5 | how are you? | 2 | 2 | 2019-01-02T00:00:02.000Z |
| 6 | good | 1 | 2 | 2019-01-02T00:00:03.000Z |
| 7 | hi | 1 | 3 | 2019-01-03T00:00:00.000Z |
| 8 | hi | 2 | 3 | 2019-01-03T00:00:01.000Z |
| 9 | how are you? | 1 | 3 | 2019-01-03T00:00:02.000Z |
| 10 | good | 2 | 3 | 2019-01-03T00:00:03.000Z |
| 11 | hi | 1 | 4 | 2019-01-02T00:00:00.000Z |
| 12 | what is your name? | 1 | 4 | 2019-01-02T00:00:01.000Z |
| 13 | bob, yours? | 2 | 4 | 2019-01-02T00:00:02.000Z |
| 14 | john | 1 | 4 | 2019-01-02T00:00:03.000Z |
| 15 | isn’t this weather crazy? | 1 | 4 | 2019-01-02T00:00:04.000Z |
| 16 | we may have to seek shelter | 1 | 4 | 2019-01-02T00:00:05.000Z |
| 17 | yeah | 2 | 4 | 2019-01-02T00:00:06.000Z |
| 18 | scary | 2 | 4 | 2019-01-02T00:00:07.000Z |
Step 2: Use Window Function to Track Previous Author
To track the author of the previous message in each conversation, we can use the LAG window function. LAG returns the value of a column from the preceding row within the current partition, in the order given by ORDER BY, and returns NULL when there is no preceding row.
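-- Pair every message with the author of the message before it
-- in the same conversation (NULL for the first message).
SELECT
    id,
    conversation_id,
    author_id,
    LAG(author_id) OVER (
        PARTITION BY conversation_id
        ORDER BY created_at
    ) AS prev_author
FROM messages
ORDER BY conversation_id, created_at;
This query does not count anything yet. It returns one row per message, with prev_author holding the author of the preceding message in the same conversation, or NULL for the first message of a conversation. Grouping by prev_author at this stage would count messages per previous author, not exchanges.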
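To see what LAG produces, the following sketch runs it over conversation 2 from the sample data, again using Python's sqlite3 module as a convenient runner. SQLite is an assumption here (the post does not name an engine), and window functions require SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER, author_id INTEGER, "
    "conversation_id INTEGER, created_at TEXT)"
)
# Conversation 2 from the sample data (content column omitted for brevity).
conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?)", [
    (3, 1, 2, "2019-01-02T00:00:00.000Z"),
    (4, 2, 2, "2019-01-02T00:00:01.000Z"),
    (5, 2, 2, "2019-01-02T00:00:02.000Z"),
    (6, 1, 2, "2019-01-02T00:00:03.000Z"),
])

rows = conn.execute("""
    SELECT id, author_id,
           LAG(author_id) OVER (
               PARTITION BY conversation_id ORDER BY created_at
           ) AS prev_author
    FROM messages
    ORDER BY created_at
""").fetchall()

for row in rows:
    print(row)
# (3, 1, None)
# (4, 2, 1)
# (5, 2, 2)
# (6, 1, 2)
```

Message 3 opens the conversation, so it has no previous author. Message 5 follows another message by the same author, so its prev_author equals its own author_id.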
Step 3: Filter Out Non-Exchanges and Count Exchanges
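A reply is a message whose author differs from the author of the previous message in the same conversation. We therefore filter out rows where prev_author is NULL, which marks the first message in a conversation, and count only those rows where prev_author differs from the current author's ID. Consecutive messages from the same author are skipped, so a run of messages counts once.
SELECT
    conversation_id,
    COUNT(*) AS exchanges
FROM (
    SELECT
        conversation_id,
        author_id,
        LAG(author_id) OVER (
            PARTITION BY conversation_id
            ORDER BY created_at
        ) AS prev_author
    FROM messages
) AS m  -- many databases require an alias on a derived table
WHERE
    prev_author IS NOT NULL       -- skip the first message of each conversation
    AND prev_author <> author_id  -- the author changed, so this message is a reply
GROUP BY conversation_id
ORDER BY conversation_id;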
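One reading of the definition is that each change of author marks a reply, and therefore one exchange. The sketch below applies that reading to conversation 4 from the sample data and lists the messages that count as replies. It assumes SQLite 3.25 or later via Python's sqlite3 module, which the post does not specify.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER, author_id INTEGER, "
    "conversation_id INTEGER, created_at TEXT)"
)
# Conversation 4 from the sample data (content omitted; the query ignores it).
conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?)", [
    (11, 1, 4, "2019-01-02T00:00:00.000Z"),
    (12, 1, 4, "2019-01-02T00:00:01.000Z"),
    (13, 2, 4, "2019-01-02T00:00:02.000Z"),
    (14, 1, 4, "2019-01-02T00:00:03.000Z"),
    (15, 1, 4, "2019-01-02T00:00:04.000Z"),
    (16, 1, 4, "2019-01-02T00:00:05.000Z"),
    (17, 2, 4, "2019-01-02T00:00:06.000Z"),
    (18, 2, 4, "2019-01-02T00:00:07.000Z"),
])

# Keep only messages whose author differs from the previous message's author.
reply_ids = [row[0] for row in conn.execute("""
    SELECT id
    FROM (
        SELECT id, author_id, created_at,
               LAG(author_id) OVER (
                   PARTITION BY conversation_id ORDER BY created_at
               ) AS prev_author
        FROM messages
    ) AS m
    WHERE prev_author <> author_id
    ORDER BY created_at
""")]
print(reply_ids)  # [13, 14, 17]
```

The first message (id 11) drops out even without an explicit IS NOT NULL check, because comparing NULL with `<>` yields NULL and the WHERE clause keeps only rows where the condition is true.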
Step 4: Run the Query on Sample Data
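Counting each change of author as one exchange, the Step 3 query returns the following for the sample data from Step 1:
| conversation_id | exchanges |
|---|---|
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 3 |
For example, in conversation 4 the author changes at messages 13, 14, and 17, which gives 3 exchanges.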
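As a check, the following sketch loads the full sample data into an in-memory SQLite database and counts the changes of author per conversation. The use of SQLite (3.25 or later) through Python's sqlite3 module is an assumption made so the example is runnable; the query itself is standard SQL.

```python
import sqlite3

# (id, author_id, conversation_id, created_at) from the sample table.
# The content column is omitted because the query never reads it.
ROWS = [
    (1, 1, 1, "2019-01-01T00:00:00.000Z"),
    (2, 2, 1, "2019-01-01T00:00:01.000Z"),
    (3, 1, 2, "2019-01-02T00:00:00.000Z"),
    (4, 2, 2, "2019-01-02T00:00:01.000Z"),
    (5, 2, 2, "2019-01-02T00:00:02.000Z"),
    (6, 1, 2, "2019-01-02T00:00:03.000Z"),
    (7, 1, 3, "2019-01-03T00:00:00.000Z"),
    (8, 2, 3, "2019-01-03T00:00:01.000Z"),
    (9, 1, 3, "2019-01-03T00:00:02.000Z"),
    (10, 2, 3, "2019-01-03T00:00:03.000Z"),
    (11, 1, 4, "2019-01-02T00:00:00.000Z"),
    (12, 1, 4, "2019-01-02T00:00:01.000Z"),
    (13, 2, 4, "2019-01-02T00:00:02.000Z"),
    (14, 1, 4, "2019-01-02T00:00:03.000Z"),
    (15, 1, 4, "2019-01-02T00:00:04.000Z"),
    (16, 1, 4, "2019-01-02T00:00:05.000Z"),
    (17, 2, 4, "2019-01-02T00:00:06.000Z"),
    (18, 2, 4, "2019-01-02T00:00:07.000Z"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER, author_id INTEGER, "
    "conversation_id INTEGER, created_at TEXT)"
)
conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?)", ROWS)

# Count, per conversation, the messages whose author differs from the
# author of the previous message in that conversation.
exchanges = conn.execute("""
    SELECT conversation_id, COUNT(*) AS exchanges
    FROM (
        SELECT conversation_id, author_id,
               LAG(author_id) OVER (
                   PARTITION BY conversation_id ORDER BY created_at
               ) AS prev_author
        FROM messages
    ) AS m
    WHERE prev_author <> author_id
    GROUP BY conversation_id
    ORDER BY conversation_id
""").fetchall()

print(exchanges)  # [(1, 1), (2, 2), (3, 3), (4, 3)]
```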
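To get the number of exchanges for your own conversations, run the same query against your actual messages table. Conversations in which no row passes the filter are absent from the output rather than listed with a count of 0.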
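Note: The window orders messages by created_at, so the physical order of rows in the table does not matter. The result is only deterministic if created_at is unique within each conversation. If two messages can share a timestamp, add a tie-breaker to the ORDER BY inside the OVER clause, for example id, assuming ids increase in insertion order.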
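The effect of a tie-breaker can be sketched as follows. The three rows are hypothetical and are not part of the sample data; messages 2 and 3 deliberately share a timestamp. Using id as the tie-breaker assumes that ids increase in insertion order, and SQLite 3.25 or later is assumed as the engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER, author_id INTEGER, "
    "conversation_id INTEGER, created_at TEXT)"
)
# Hypothetical rows: messages 2 and 3 have identical timestamps.
conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?)", [
    (1, 1, 1, "2019-01-01T00:00:00.000Z"),
    (2, 2, 1, "2019-01-01T00:00:01.000Z"),
    (3, 1, 1, "2019-01-01T00:00:01.000Z"),
])

rows = conn.execute("""
    SELECT id, author_id,
           LAG(author_id) OVER (
               PARTITION BY conversation_id
               ORDER BY created_at, id  -- id breaks timestamp ties
           ) AS prev_author
    FROM messages
    ORDER BY created_at, id
""").fetchall()

print(rows)  # [(1, 1, None), (2, 2, 1), (3, 1, 2)]
```

With ORDER BY created_at alone, the engine would be free to place message 3 before message 2, which would change which rows look like replies.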
Last modified on 2024-05-15