Database Server Maintenance

While investigating sluggish behavior on our database server this morning, I noticed that a couple of our settings were not properly optimized. As a result, we'll be updating those settings tomorrow to ensure our website is as fast as it can be.

The process itself is quite simple and should only involve a small amount of downtime. We'll be doing it sometime between 2 AM and 9 AM EST, and the actual window of downtime shouldn't be longer than 15 minutes.

I'll be posting updates to this post, including a more specific time frame if I get one. Our server techs will be handling the process to ensure it's done as quickly as possible.

Update: The process went smoothly with only a few minutes of downtime. Hurray!

Machine Learning is Amazing

I've been starting to use machine learning here at TimTech. If you don't know what it is, here's a very basic rundown: you track what people do, and you can use that data to predict what they'll do next.

Here are some examples. The biggest is Amazon: when you're browsing items, they show recommendations similar to what you're looking at. It's worked on me plenty of times; I've gone in thinking I wanted one thing and spotted something in that list that was closer to what I actually wanted.

Another is Google. Ever have a time where you go to search for something, and the query suggestion is spot on? It's happened to me in some really bizarre situations. It's not a coincidence, it's machine learning.

And another you might not know about: some of those really addictive games on your phone use it to predict when you're going to turn the game off. They can tell that after certain events you're likely to stop playing for a bit, so when they see that situation coming up, all of a sudden you open a magical box of something cool that keeps you playing for another hour.

Right now we're using machine learning to detect fraud. We were starting to see up to $1,000/day in fraud, so it had to be stopped. Since using machine learning, we haven't had an issue in weeks. And now I'm starting to use it to predict recommendations and other fun stuff.

Now here's what's really neat for all you privacy wizards: machine learning doesn't actually know who you are or anything about you. When I first thought about it, I assumed it was learning about people individually, but really it's all anonymous data; the computer doesn't use your name or any personal info.

Anyways, that's the fun I've been up to. What are you up to that's fun?

The backups I’ve been waiting for

I’ve always been a guy who likes to back things up. The problem is, it’s always been difficult or costly or slow. I found a solution that fixes all three which makes me super happy. It’s called CrashPlan.

The concept is simple: you want to have backups that are both nearby and far away. Nearby because restoring from a local copy is super fast, and far away because a fire destroying both copies does nobody any good.

Well, CrashPlan does this by letting you back up your computers to each other. So I've got my laptop and desktop copying backups to each other. If I'm out on the road and someone steals my desktop, I've still got everything on my laptop. And if my laptop gets lost, I've still got the desktop.

On top of that, they have a cloud backup, which has unlimited space and keeps a copy far far away from your house so fires aren’t a concern.

But what makes this really, really cool is the friend code option. I can give you my friend code, and then you can send your backups to my computer. It's fully encrypted so I can't mess with your stuff, but it's stored on my system, which is free and has an added benefit: it's close by.

So say Larry and I swap friend codes. If his hard drive fails, he can buy a new one, come over to my house with an external drive, and I can copy his backup onto it very quickly. He brings it home and he's restored, with no waiting for downloads. How cool is that?

This is the kind of system I've been wanting to set up with a few family members but never did because it was too complicated. CrashPlan makes it super simple.

Expanding Stats on Trck.me

You may have noticed a change in Trck.me, and it’s the beginning of a new era for simplified tracking. I’ve expanded the stats we collect to give you daily views. This makes for a few cool new things:

You can now select any date, including yesterday. A yesterday button was one of our top requests, because after all, if you weren't watching at midnight you never saw how many hits you got that day. And better yet, you can pick any day and view its stats.

"Last week" is now based on the last 7 days rather than the days since Monday, which is what most people asked for. If you still want to see Monday through today, you can select that range on the calendar too.
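For the curious, the difference is just which start date you compute. Here's a rough sketch in PHP of the two approaches (simplified for illustration, not the actual Trck.me code):

[code]// Rolling "last 7 days": today plus the 6 days before it (always a 7-day window).
$rollingStart = date('Y-m-d', strtotime('-6 days'));

// "Since Monday": the start of the current calendar week (1 to 7 days long).
$weekStart = date('Y-m-d', strtotime('monday this week'));

$today = date('Y-m-d');[/code]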

Going forward I’m looking into making it so you can compare stats. For example you could compare this week to last week, or this month to last month. That would be a cool way to know if you’re doing better or worse for any given squeeze page, advertising source, etc.

What I need from you guys is some good feedback. There are a few bugs here and there that I’m squashing, but ultimately you should find it quick and easy to use!

It took a long time to get this to work quickly. We have over 30GB of tracking data, and figuring out how to pull and sort the results instantly turned into quite the learning process. I'm confident now you'll get the tracking you need, and we'll be able to deliver the data quickly!

How to prevent a PHP script from running while it's already running

When you're running cron jobs, sometimes you'll need to run things that use a lot of CPU or other resources. If you need the job to run right away, but you don't want two copies running at the same time, it's pretty simple to set up.

The way I’ve been doing it is by creating a file at the top of the script, and deleting it at the end. When the script starts you just check if the file is there, and if it is, you quit.

[code]// Path to the lock file; if it exists, another copy of the script is already running.
define('PIDFILE', '/home/username/public_html/file.pid');

// Bail out right away if a previous run is still going.
if (file_exists(PIDFILE)) { exit(); }

// Record this run's process ID in the lock file.
file_put_contents(PIDFILE, posix_getpid());

// Remove the lock file when the script finishes, however it exits.
function removePidFile() {
    unlink(PIDFILE);
}
register_shutdown_function('removePidFile');[/code]

What I've done is use register_shutdown_function() so the file gets deleted on exit, however the script ends. The benefit is you don't have to go through all your code making sure every spot where it could possibly stop running deletes the file.
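One thing to keep in mind: if the script dies in a way that skips the shutdown function (a kill -9, a server crash), the PID file sticks around and blocks every future run. Since we already wrote the process ID into the file, a rough way to handle that is to check whether that process is actually still alive. This part is an extra precaution I'd sketch like so, not something from the snippet above:

[code]// Replacement for the simple file_exists() check above: treat the lock as
// stale if the process that wrote it is no longer running.
if (file_exists(PIDFILE)) {
    $oldPid = (int) file_get_contents(PIDFILE);
    // Signal 0 doesn't send anything; it only tests whether the process exists.
    if ($oldPid > 0 && posix_kill($oldPid, 0)) {
        exit(); // a previous copy really is still running
    }
    unlink(PIDFILE); // leftover file from a crashed run, safe to clear
}[/code]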

Then I simply set the cron job to run every minute. It checks whether there's something to do; if there isn't, it exits, and if there is, it keeps going and no new copy starts until it's finished. Real simple!
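For reference, the crontab entry is just a standard run-every-minute job. Something like this (the script name and PHP binary path here are made up, so adjust them to your setup):

[code]# Run every minute; the PID file check inside the script stops overlapping copies.
* * * * * /usr/bin/php /home/username/public_html/cronjob.php > /dev/null 2>&1[/code]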

Using GROUP_CONCAT in a one-to-many JOIN query with MySQL

I've run into this conundrum before, but I didn't know how to solve it until now. Say you're doing a MySQL query and pulling a bunch of rows, but each of those rows has related data in another table, which can itself have multiple rows. Normally you'd have to loop through the first query's results in PHP and run a second query for each row.

Well, I figured out how to do it using GROUP_CONCAT. But first, here is the "old way" of doing what I'm talking about:

[code]// Old way: one query for the posts, then one extra query per post to grab its tags.
$query = "SELECT * FROM `posts`";
$result = mysql_query($query);
while ($POST = mysql_fetch_array($result)) {
    $query2 = "SELECT `tag` FROM `tags` WHERE `PID` = '{$POST['PID']}'";
    $result2 = mysql_query($query2);
    while ($TAG = mysql_fetch_array($result2)) {
        // ... code goes here ...
    }
}[/code]

So what ends up happening is that for each row in posts, you run that second query. If there are 100 posts, that's 101 queries every time the script runs. Using GROUP_CONCAT you can get the same info with only one query:

[code]// New way: a GROUP_CONCAT subquery returns each post's tags as one comma-separated string.
$query = "SELECT *,
    (SELECT GROUP_CONCAT(tag) FROM tags WHERE tags.PID = posts.PID) AS lists
    FROM posts";
$result = mysql_query($query);
while ($POST = mysql_fetch_array($result)) {
    // ... code goes here ...
}[/code]

Essentially GROUP_CONCAT is running that extra query in the background, grabbing all the tags, and combining them into one string separated by commas. So you’d get all the tags without having to run additional queries.
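If you need the tags back as an array on the PHP side, you just split that string apart again. A quick sketch (this would go inside the while loop above):

[code]// "lists" comes back as one string like "php,mysql,tips" (comma is GROUP_CONCAT's
// default separator), or NULL if the post has no tags at all.
$tags = ($POST['lists'] === null) ? array() : explode(',', $POST['lists']);[/code]

One thing to watch: MySQL truncates GROUP_CONCAT results at the group_concat_max_len setting (1024 characters by default), so if a row could have a ton of tags you'll want to bump that up.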

Why care? For one, speed and efficiency, which are my favorite things. By sending only one query to the database, you let the database do the heavy lifting while it already has everything in hand. Every query has its own overhead and bandwidth cost, so any time you can combine queries, it's a win!