Using GROUP_CONCAT in a single to many JOIN query with MySQL

I’ve had this conundrum before, but I didn’t know how to solve it until now. Say you’re running a MySQL query that pulls a bunch of rows, but each of those rows has related data in another table, which can itself have multiple rows. Normally you’d loop through the first query’s results with PHP and run a second query for each row.

Well, I figured out how to do it using GROUP_CONCAT. But first, here is the “old way” of doing what I’m talking about:

$query = "SELECT * FROM `posts`";
$result = mysql_query($query);
while ($POST = mysql_fetch_array($result)) {
    $query2 = "SELECT `tag` FROM `tags` WHERE `PID`='{$POST["PID"]}'";
    $result2 = mysql_query($query2);
    while ($TAG = mysql_fetch_array($result2)) {
        // ... code goes here ...
    }
}

So what ends up happening is that for each row in posts, you run that second query. If there are 100 posts, that’s 101 queries run each time. Using GROUP_CONCAT you can get the same info with only one query:

$query = "SELECT *,
    (SELECT GROUP_CONCAT(tag) FROM tags WHERE tags.PID = posts.PID) AS lists
    FROM posts";
$result = mysql_query($query);
while ($POST = mysql_fetch_array($result)) {
    // ... code goes here ...
}

Essentially the correlated subquery runs for each post behind the scenes, grabbing all of that post’s tags, and GROUP_CONCAT combines them into one string separated by commas. So you get all the tags without having to run any additional queries.
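To see the pattern outside of PHP, here’s a minimal sketch using Python’s built-in sqlite3 module; SQLite’s group_concat behaves like MySQL’s GROUP_CONCAT here, and the table contents are invented for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE posts (PID INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE tags  (PID INTEGER, tag TEXT);
    INSERT INTO posts VALUES (1, 'First post'), (2, 'Second post');
    INSERT INTO tags  VALUES (1, 'php'), (1, 'mysql'), (2, 'sql');
""")

# One query: the correlated subquery gathers each post's tags
# into a single comma-separated string.
rows = con.execute("""
    SELECT posts.*,
           (SELECT GROUP_CONCAT(tag) FROM tags WHERE tags.PID = posts.PID) AS lists
    FROM posts
""").fetchall()

for pid, title, lists in rows:
    print(pid, title, lists)
```

Each row comes back with a `lists` column like `php,mysql`, so the tags are right there with the post instead of needing a second query per row.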

Why care? Well, for one, speed and efficiency, which is my favorite thing. By sending only one query to the database, you let the database do the heavy lifting while it already has everything loaded. Each query carries overhead and bandwidth costs, so any time you can combine queries is a win!

Doing things without cron jobs

As you start to build applications that are bigger than your average site, you can no longer rely on cron jobs to do tasks for you. That’s because the data grows and you hit a point where your cron job takes too long to run.

Take Sitizens for instance. We’ve got over 42 million rows of data in the system, and not a single cron job for that site. Yet everything works great.

Today I’ve been working on bringing this same concept over to Trck.me. We’ve been growing tremendously but the system still depends on cron jobs. I’ve been coding for a few hours now on removing that dependency.

It’s actually a really fun problem to have, but it’s better practice to think big and code it so you don’t have to use cron jobs in the first place.

Technically speaking, doing it without cron jobs uses more CPU time on the web servers, but the bottleneck these days is the database servers. We could add another web server anytime without having to do much. But a database server, no matter how you do it, will require more effort to scale.

For that reason I’ve been trying to change my style of coding to rely less on the DB server and more on the web server. I’ve been so used to coding on single machines, where one way or the other didn’t matter: CPU on the DB and CPU on the web were still the same CPU.

So anyways, that’s my techy thoughts for the day. I was originally gonna make a blog post about Bill’s blog post or Patrick’s blog post but by the time I started typing I forgot what I was typing and wrote this!

ON DUPLICATE KEY UPDATE

When you are working with high-traffic sites it’s important to optimize every query you can. It’s also important to realize that with so much traffic, multiple people could be doing the same thing at once, so data can get messy.

Say you create a new table to track how many users log in to your site every day. You create a table with the fields Date and Logins. I used to do it this way:

$query = "SELECT * FROM `logins` WHERE `Date`='09-17-2012'";
$result = mysql_query($query);
if (mysql_num_rows($result) == 0) {
    $query = "INSERT INTO `logins` SET `Date`='09-17-2012', `Logins`=1";
    $result = mysql_query($query);
} else {
    $query = "UPDATE `logins` SET `Logins`=`Logins`+1 WHERE `Date`='09-17-2012'";
    $result = mysql_query($query);
}

The problem is twofold: there are two queries for one task, and if a thousand people hit that page at the same time, there can be multiple inserts into the logins table.

Yes, you can add a unique key so MySQL doesn’t allow the duplicate inserts, but the blocked inserts will throw an error and those logins won’t be counted.

What I found was a way to do the same thing in one query, like so:

$query = "INSERT INTO `logins` SET `Date`='09-17-2012', `Logins`=1 ON DUPLICATE KEY UPDATE `Logins`=`Logins`+1";
$result = mysql_query($query);

That’s it. ON DUPLICATE KEY UPDATE tells MySQL to run the update instead whenever the insert would create a duplicate key. Here the table structure would have the Date field set as a unique key.
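The same race-free counter can be sketched with Python’s built-in sqlite3 module. SQLite spells the feature differently (ON CONFLICT ... DO UPDATE, available in SQLite 3.24+) but the idea is identical: a unique key on Date plus one atomic insert-or-increment query:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Date is the unique key, so repeated inserts for the same
# day collapse into increments instead of duplicate rows.
con.execute("CREATE TABLE logins (Date TEXT PRIMARY KEY, Logins INTEGER NOT NULL)")

for _ in range(3):  # three "logins" on the same day
    con.execute(
        "INSERT INTO logins (Date, Logins) VALUES ('2012-09-17', 1) "
        "ON CONFLICT(Date) DO UPDATE SET Logins = Logins + 1"
    )

(count,) = con.execute(
    "SELECT Logins FROM logins WHERE Date = '2012-09-17'"
).fetchone()
print(count)  # 3: one insert, then two increments
```

No SELECT-then-branch in application code, so there is no window for two requests to both decide the row doesn’t exist yet.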

PHP Multi-Page MySQL Results

Every search engine limits the number of results per page; some searches can return millions of pages! Now what about your site? Do you have any pages that return more than 30 items? You can easily break that down into 10 items per page.

In this example, I will show you how to return a results page with PHP, using a MySQL database. This script will return the entire table, with 10 items per page, and it will generate the appropriate previous and next links.

The MySQL Table
This code assumes you have a table named directory with “ID”, “URL”, and “Title” fields.

The Code

<?php
// Sanitize the page number from the query string.
$page = isset($_GET["page"]) ? (int) $_GET["page"] : 0;

if ($page < 1) {
    $page = 0;
} else {
    $links[] = "<a href='directory.php?page=".($page-1)."'>Previous</a>";
}

$global_dbh = mysql_connect("localhost", "username", "password");
mysql_select_db("mydatabasename", $global_dbh);

// Fetch 11 rows: 10 to display, plus 1 to detect a next page.
$query = "SELECT * FROM `directory` ORDER BY `ID` DESC LIMIT ".($page*10).", 11";
$result = mysql_query($query, $global_dbh);
$cnt = mysql_num_rows($result);

if ($cnt > 10) {
    $cnt = 10;
    $links[] = "<a href='directory.php?page=".($page+1)."'>Next</a>";
}

echo @implode(" | ", $links);

for ($i = 0; $i < $cnt; $i++) {
    $row = mysql_fetch_array($result);
    echo "<a href='{$row["URL"]}'>{$row["Title"]}</a><p>";
}

echo @implode(" | ", $links);
?>

The Explanation
First we make sure $page comes in as a non-negative integer, to prevent negative input. If we’re past the first page, we also build a Previous link.

Next we connect to the database and select the rows, starting at the current page * 10 so previous pages are skipped. Notice we select 11 entries: if 11 come back, we know there is another page.

If more than 10 rows were returned, we cap the counter at 10 (so we only display 10 results) and build the Next link.

We output the links both above and below the results. It looks nice having “Previous | Next”, but ugly as just “Previous |” or “| Next”. Using the implode function (with an @ in case $links is empty) we get the “|” only when both links exist.

Finally, the loop simply outputs the table rows.
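The fetch-one-extra trick can be sketched with Python’s built-in sqlite3 module (table and data invented for illustration): ask for eleven rows, display ten, and use the eleventh only as a signal that a Next link is needed:

```python
import sqlite3

PER_PAGE = 10

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE directory (ID INTEGER PRIMARY KEY, URL TEXT, Title TEXT)")
con.executemany(
    "INSERT INTO directory (URL, Title) VALUES (?, ?)",
    [(f"http://example.com/{i}", f"Site {i}") for i in range(25)],
)

def fetch_page(page):
    # Ask for one row more than we display; its presence tells us
    # whether a Next link is needed.
    rows = con.execute(
        "SELECT * FROM directory ORDER BY ID DESC LIMIT ? OFFSET ?",
        (PER_PAGE + 1, page * PER_PAGE),
    ).fetchall()
    has_next = len(rows) > PER_PAGE
    return rows[:PER_PAGE], has_next

rows, has_next = fetch_page(1)  # second page of 25 rows
print(len(rows), has_next)     # 10 True
```

With 25 rows total, page 1 skips the first ten, gets eleven back, and trims to ten with has_next set; page 2 gets only five back, so no Next link.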