Sunday, March 30, 2008

PgAgent - Installation and usage

Prompted by the usual Devshed post, this time about pgAgent, here I am with a small "tutorial" (or maybe just a small collection of screenshots).
pgAgent is a great little tool, much like SQL Server's SQLAgent or the built-in scheduling abilities of Oracle and MySQL (which is a much nicer solution IMHO). What's weak is the installation and the docs: I mean, why does the pgAdmin III installer ask about pgAgent installation but not really install it?
All this just to have a few files placed on my HD?
So, first of all, pgAdmin guys, please add to your installer the ability to actually install pgAgent, or even better, have the PostgreSQL guys add it to the main server distribution, where it belongs.

Now, for the actual installation, let's have a look at the docs. The main steps (I'm working on Windows) are:
  1. Database setup
    1. Work within the postgres standard database
    2. Check if pl/pgsql is installed
    3. If not, install it
    4. Connect to the postgres database and run a SQL script provided with the installation that creates all required objects; that script is named pgagent.sql and it's kept in the [$Program files]\pgAdmin III\[$version]\Scripts folder
    5. After this, if you connect with pgAdminIII you'll see a new branch in the objects tree, named "Job"
  2. Service installation
    1. The service has to be installed from the command line, like this:

C:\Documents and Settings\Paolo>"C:\Programmi\pgAdmin III\1.8\pgAgent" INSTALL pgAgent -u postgres -p postgres hostaddr=127.0.0.1 dbname=pgadmin user=postgres

C:\Documents and Settings\Paolo>
    2. Start the just-installed service
BE AWARE OF THE SECURITY RISKS of this kind of installation (the password ends up in clear text on the command line)
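Steps 1.2 and 1.3 above (checking for pl/pgsql and installing it if missing) can be done with a couple of statements; here's a sketch, assuming a PostgreSQL 8.x server and a superuser connection to the postgres database:

```sql
-- Check whether pl/pgsql is already installed:
-- a row comes back if it is, nothing otherwise
SELECT lanname FROM pg_language WHERE lanname = 'plpgsql';

-- If the query returned no rows, install it (requires superuser rights)
CREATE LANGUAGE plpgsql;
```

After that you can run the pgagent.sql script from the Scripts folder as described above.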

Monday, March 24, 2008

Shoot in the foot

I've just finished reading two recent blog posts about new query optimizations in the upcoming MySQL 6.0. It's all fine and dandy, but ...
Looking at Correlated semi-join subqueries and PostgreSQL by S. Petrunia we can read something like
Quote:
The first thing we did was to take a look at PostgreSQL as it is easily available and seems to have at least decent subquery handling (or even better than decent, I have not heard much complaints).
Does this mean their benchmark is PostgreSQL? I mean, Oracle and SQL Server are easily available too ... not to mention the comparison, a few lines below, between MySQL's and PostgreSQL's explain plans, which any reader can see.

The other amusing read is New optimizer features in MySQL 6.0, again by Sergey, where you can find an interesting speed comparison which (might) boil down to

Server version          Wallclock time   # of reads
MySQL 5.1               12 min           9,001,055
MySQL 5.2               1.8 sec          153,008
MySQL 5.2 no_semijoin   25 sec           7,651,215
PostgreSQL 8.2.5        0.1 sec          2,413

Very interesting!

But anyone interested in query optimization should take the time to check the whole paper, it's extremely informative (and much of its content is good for any database too)!!!

Ok, now LET THE FLAME WAR START ;-)

BTW, MySQL 5.1 is not in production currently, while PostgreSQL 8.2.5 has just been superseded by the new PostgreSQL 8.3.1

Sunday, March 16, 2008

Fasten your seatbelts ...

... or how to safely run data manipulation statements in your database.
Reading posts on Devshed's forums, I sometimes notice people doing maintenance work on their data without any safety net apart from occasional, ages-old backups ;-).
Anyway, I think there's no need for a restore if you've just issued the wrong update query; transactions are here for this, it's just a matter of educating people ...
Say you have a database structure like this

CREATE TABLE `test`.`users` (
  `id` INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
  `username` VARCHAR(45) NOT NULL,
  PRIMARY KEY (`id`)
)
ENGINE = InnoDB;

and

CREATE TABLE `test`.`agent` (
  `id` INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
  `actiontime` DATETIME NOT NULL,
  PRIMARY KEY (`id`),
  CONSTRAINT `FK_agent_1` FOREIGN KEY `FK_agent_1` (`id`)
    REFERENCES `users` (`id`)
    ON DELETE CASCADE
    ON UPDATE CASCADE
)
ENGINE = InnoDB;

Not very nice, it's just an example, but notice that I'm also using real foreign keys (real because these are InnoDB tables).
That's because, of the many storage engines, only InnoDB supports transactions (BDB used to support them but has been discontinued, and Falcon is not ready for prime time).
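Since CREATE TABLE can silently fall back to another engine when InnoDB is unavailable (depending on the server's sql_mode), it's worth double-checking what you actually got; a small sketch using information_schema, available since MySQL 5.0:

```sql
-- Verify that the tables really are InnoDB
SELECT table_name, engine
FROM information_schema.tables
WHERE table_schema = 'test'
  AND table_name IN ('users', 'agent');
```

If the engine column doesn't say InnoDB, the transaction examples below won't protect you.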

Now let's see what these tables' content looks like

mysql> select * from users;
+----+----------+
| id | username |
+----+----------+
| 1 | a |
| 2 | b |
| 3 | f |
+----+----------+
3 rows in set (0.00 sec)

mysql> select * from agent;
+----+---------------------+
| id | actiontime |
+----+---------------------+
| 1 | 2008-03-16 11:21:38 |
| 2 | 2008-03-16 11:21:41 |
| 3 | 2008-03-16 11:21:44 |
+----+---------------------+
3 rows in set (0.00 sec)

Nothing much really. Now, say we want to change the actiontime of agent 'f', but we are unsure of the syntax and we don't want to damage our data; here's how we should do it:

mysql> set autocommit='OFF';
Query OK, 0 rows affected (0.00 sec)

Actually, we have just fastened our seatbelts!!! Autocommit is off and the database will make changes permanent only on request. Now, on with the query

mysql> update agent a inner join users u on a.id = u.id set a.actiontime=now() where
u.username = 'f';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0

Looks ok, but let's check what really happened

mysql> select * from agent;
+----+---------------------+
| id | actiontime |
+----+---------------------+
| 1 | 2008-03-16 11:21:38 |
| 2 | 2008-03-16 11:21:41 |
| 3 | 2008-03-16 11:27:07 |
+----+---------------------+
3 rows in set (0.00 sec)

Ok, we updated the right row!
At the same time other users are querying the database but, as we haven't committed our work, they see the "old" data (note: this depends on the transaction isolation level)

mysql> select * from test.agent;
+----+---------------------+
| id | actiontime |
+----+---------------------+
| 1 | 2008-03-16 11:21:38 |
| 2 | 2008-03-16 11:21:41 |
| 3 | 2008-03-16 11:21:44 |
+----+---------------------+
3 rows in set (0.00 sec)

You see? But, as we are satisfied with the changes, it's time to commit our work, to actually make those changes permanent.

mysql> commit;
Query OK, 0 rows affected (0.03 sec)

At this point, users other than the one who actually changed the data will start seeing the updated version, see it for yourself

mysql> select * from test.agent;
+----+---------------------+
| id | actiontime |
+----+---------------------+
| 1 | 2008-03-16 11:21:38 |
| 2 | 2008-03-16 11:21:41 |
| 3 | 2008-03-16 11:21:44 |
+----+---------------------+
3 rows in set (0.00 sec)
(the commit by the other user happened between these two selects)

mysql> select * from test.agent;
+----+---------------------+
| id | actiontime |
+----+---------------------+
| 1 | 2008-03-16 11:21:38 |
| 2 | 2008-03-16 11:21:41 |
| 3 | 2008-03-16 11:27:07 |
+----+---------------------+
3 rows in set (0.00 sec)
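By the way, what other sessions see while a transaction is open is governed by the isolation level; you can inspect or change it like this (a sketch for MySQL 5.x, where InnoDB defaults to REPEATABLE-READ):

```sql
-- Show the current session's isolation level
SELECT @@tx_isolation;

-- Change it for this session only
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
```

Under READ COMMITTED other sessions would start seeing the new row versions as soon as they are committed, exactly as in the two selects above.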

This was an easy case, but transactions are a safety net because you can roll back changes; see it in action

mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from users;
+----+----------+
| id | username |
+----+----------+
| 1 | a |
| 2 | b |
| 3 | f |
+----+----------+
3 rows in set (0.00 sec)

mysql> delete from users;
Query OK, 3 rows affected (0.00 sec)

mysql> select * from users;
Empty set (0.00 sec)

:eek: I just deleted all my precious users!!!

mysql> rollback;
Query OK, 0 rows affected (0.02 sec)

mysql> select * from users;
+----+----------+
| id | username |
+----+----------+
| 1 | a |
| 2 | b |
| 3 | f |
+----+----------+
3 rows in set (0.00 sec)

But I had my safety net :-D
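InnoDB also supports savepoints, so the seatbelt can stay fastened while you undo only part of a transaction; a small sketch on the same tables (the savepoint name is just my choice):

```sql
START TRANSACTION;

UPDATE agent SET actiontime = NOW() WHERE id = 3;
SAVEPOINT after_update;

DELETE FROM users;                   -- oops again ...
ROLLBACK TO SAVEPOINT after_update;  -- undoes the delete, keeps the update

COMMIT;                              -- makes only the update permanent
```

This way a single mistake doesn't force you to throw away all the good work done earlier in the transaction.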

Here you can find the thread that prompted me to write this post.

I seem to hit a bug (severe or not) every time I post, and this is no exception, see bug 35318. But again and again MySQL's support team showed its dedication, solving and closing it in less than 8 hours!!!

Sunday, March 09, 2008

Simulating procedural logic

Sometimes I see people having great difficulties in describing how to fetch data for a report.
They are unable to reason in sets and tend to describe things in procedural terms.
Here I'm posting a small example of how you can write a query that reproduces that procedural reasoning and lets the optimizer do the work of translating it into efficient SQL.
Say someone has a table structure like this: a main table named guys holding their id and name, and two tables, bads_attributes and goods_attributes; if you are a bad guy your attributes will be in the bads_attributes table, and vice versa.
Looks ugly? It is, but you'll find it around, sooner or later :-(
And it's not even the worst-case scenario; confronted with a similar requirement, I've heard that stored procedures were proposed as the ideal solution.
The table structure:

DROP TABLE IF EXISTS `test`.`guys`;
CREATE TABLE `test`.`guys` (
`guy_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`guy_name` varchar(45) NOT NULL,
PRIMARY KEY (`guy_id`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8;

DROP TABLE IF EXISTS `test`.`bads_attributes`;
CREATE TABLE `test`.`bads_attributes` (
`attribute_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`guy_id` int(10) unsigned NOT NULL,
`attribute_name` varchar(45) NOT NULL,
PRIMARY KEY (`attribute_id`,`guy_id`),
KEY `FK_bads_attributes_1` (`guy_id`),
CONSTRAINT `FK_bads_attributes_1` FOREIGN KEY (`guy_id`) REFERENCES `guys` (`guy_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8;

DROP TABLE IF EXISTS `test`.`goods_attributes`;
CREATE TABLE `test`.`goods_attributes` (
`attribute_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`guy_id` int(10) unsigned NOT NULL,
`attribute_name` varchar(45) NOT NULL,
PRIMARY KEY (`attribute_id`,`guy_id`),
KEY `FK_goods_attributes_1` (`guy_id`),
CONSTRAINT `FK_goods_attributes_1` FOREIGN KEY (`guy_id`) REFERENCES `guys` (`guy_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8;
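The examples below run against a handful of rows; sample data consistent with the output shown further down could be loaded like this (names and attributes are read off the result sets, and I'm assuming guy_ids get assigned in insertion order):

```sql
INSERT INTO `test`.`guys` (`guy_name`)
VALUES ('Paolo'), ('Carlo'), ('Ciccio'), ('Bender'), ('New kid');

INSERT INTO `test`.`goods_attributes` (`guy_id`, `attribute_name`)
VALUES (1, 'Fichissimo'), (2, 'Fico');

INSERT INTO `test`.`bads_attributes` (`guy_id`, `attribute_name`)
VALUES (3, 'Ugly'), (3, 'Uglier'), (4, 'Ugliest');
```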



Now let's retrieve those attributes with a query that mimics the logic described above

SELECT
  g.guy_id,
  g.guy_name,
  CASE
    WHEN EXISTS (SELECT 0 FROM bads_attributes b WHERE b.guy_id = g.guy_id)
      THEN (SELECT group_concat(b.attribute_name separator ', ') FROM bads_attributes b WHERE b.guy_id = g.guy_id GROUP BY b.guy_id)
    WHEN EXISTS (SELECT 0 FROM goods_attributes a WHERE a.guy_id = g.guy_id)
      THEN (SELECT group_concat(a.attribute_name separator ', ') FROM goods_attributes a WHERE a.guy_id = g.guy_id GROUP BY a.guy_id)
    ELSE 'no attributes for this guy'
  END right_attributes
FROM
  guys g
This follows the reasoning described above: if you are found in bads_attributes then your data is retrieved from there, and the same goes for goods_attributes.
Output is

+--------+----------+----------------------------+
| guy_id | guy_name | right_attributes |
+--------+----------+----------------------------+
| 1 | Paolo | Fichissimo |
| 2 | Carlo | Fico |
| 3 | Ciccio | Ugly, Uglier |
| 4 | Bender | Ugliest |
| 5 | New kid | no attributes for this guy |
+--------+----------+----------------------------+
5 rows in set (0.02 sec)

Its explain plan is

+----+--------------------+-------+------+-----------------------+-----------------------+---------+---------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+------+-----------------------+-----------------------+---------+---------------+------+-------------+
| 1 | PRIMARY | g | ALL | NULL | NULL | NULL | NULL | 5 | |
| 5 | DEPENDENT SUBQUERY | a | ref | FK_goods_attributes_1 | FK_goods_attributes_1 | 4 | test.g.guy_id | 1 | Using where |
| 4 | DEPENDENT SUBQUERY | a | ref | FK_goods_attributes_1 | FK_goods_attributes_1 | 4 | test.g.guy_id | 1 | Using index |
| 3 | DEPENDENT SUBQUERY | b | ref | FK_bads_attributes_1 | FK_bads_attributes_1 | 4 | test.g.guy_id | 1 | Using where |
| 2 | DEPENDENT SUBQUERY | b | ref | FK_bads_attributes_1 | FK_bads_attributes_1 | 4 | test.g.guy_id | 1 | Using index |
+----+--------------------+-------+------+-----------------------+-----------------------+---------+---------------+------+-------------+
5 rows in set (0.00 sec)

Well, it could be worse; note that I'm using InnoDB tables and I've declared foreign keys, whose indexes are picked up by the optimizer.

Now let's see a more SQLish version of this query

SELECT
  g.guy_id,
  g.guy_name,
  COALESCE(
    group_concat(b.attribute_name separator ', '),
    group_concat(a.attribute_name separator ', ')
  ) right_attributes
FROM
  guys g LEFT OUTER JOIN bads_attributes b
    ON g.guy_id = b.guy_id
  LEFT OUTER JOIN goods_attributes a
    ON g.guy_id = a.guy_id
GROUP BY
  g.guy_id,
  g.guy_name
Output is the same

+--------+----------+---------------------------+
| guy_id | guy_name | right_attributes |
+--------+----------+---------------------------+
| 1 | Paolo | Fichissimo |
| 2 | Carlo | Fico |
| 3 | Ciccio | Ugly, Uglier |
| 4 | Bender | Ugliest |
| 5 | New kid | NULL |
+--------+----------+---------------------------+
5 rows in set (0.00 sec)

And the explain plan

+----+-------------+-------+------+-----------------------+-----------------------+---------+---------------+------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+-----------------------+-----------------------+---------+---------------+------+----------------+
| 1 | SIMPLE | g | ALL | NULL | NULL | NULL | NULL | 5 | Using filesort |
| 1 | SIMPLE | b | ref | FK_bads_attributes_1 | FK_bads_attributes_1 | 4 | test.g.guy_id | 1 | |
| 1 | SIMPLE | a | ref | FK_goods_attributes_1 | FK_goods_attributes_1 | 4 | test.g.guy_id | 1 | |
+----+-------------+-------+------+-----------------------+-----------------------+---------+---------------+------+----------------+
3 rows in set (0.00 sec)

Well, it certainly looks better, but there's a nasty filesort (this is a very small data set) which should be checked against a much larger dataset.

Doing the same on Firebird 2.1 beta 2 (which supports LIST(), a function similar to MySQL's group_concat()) leads to:

SELECT
  g.guy_id,
  g.guy_name,
  CASE
    WHEN EXISTS (SELECT 0 FROM bads_attributes b WHERE b.guy_id = g.guy_id)
      THEN (SELECT CAST(list(b.attribute_name) AS varchar(5000)) FROM bads_attributes b WHERE b.guy_id = g.guy_id
            GROUP BY b.guy_id)
    WHEN EXISTS (SELECT 0 FROM goods_attributes a WHERE a.guy_id = g.guy_id)
      THEN (SELECT CAST(list(a.attribute_name) AS varchar(5000)) FROM goods_attributes a WHERE a.guy_id = g.guy_id
            GROUP BY a.guy_id)
    ELSE 'no attributes for this guy'
  END right_attributes
FROM
  guys g

and

SELECT
  g.guy_id,
  g.guy_name,
  CAST(
    COALESCE(
      list(b.attribute_name),
      list(a.attribute_name)
    ) AS varchar(5000)) right_attributes
FROM
  guys g
  LEFT OUTER JOIN bads_attributes b
    ON g.guy_id = b.guy_id
  LEFT OUTER JOIN goods_attributes a
    ON g.guy_id = a.guy_id
GROUP BY
  g.guy_id,
  g.guy_name


Note that both queries need an explicit cast, as LIST()'s results in Firebird are blobs.
The respective explain plans show that the set-oriented one is better.

Prepare time: 00:00:00.
Field #01: GUYS.GUY_ID Alias:GUY_ID Type:INTEGER
Field #02: GUYS.GUY_NAME Alias:GUY_NAME Type:STRING(45)
Field #03: . Alias:RIGHT_ATTRIBUTES Type:STRING(5000)
PLAN (B INDEX (FK_BADS_ATTRIBUTES_1))
PLAN (B ORDER FK_BADS_ATTRIBUTES_1 INDEX (FK_BADS_ATTRIBUTES_1))
PLAN (A INDEX (FK_GOODS_ATTRIBUTES_1))
PLAN (A ORDER FK_GOODS_ATTRIBUTES_1 INDEX (FK_GOODS_ATTRIBUTES_1))
PLAN (G NATURAL)


Executing...
Done.
116 fetches, 0 marks, 0 reads, 0 writes.
0 inserts, 0 updates, 0 deletes, 21 index, 5 seq.
Delta memory: 54852 bytes.
Execute time: 00:00:00.

and

Prepare time: 00:00:00.
Field #01: GUYS.GUY_ID Alias:GUY_ID Type:INTEGER
Field #02: GUYS.GUY_NAME Alias:GUY_NAME Type:STRING(45)
Field #03: . Alias:RIGHT_ATTRIBUTES Type:STRING(5000)
PLAN JOIN (SORT (JOIN (G NATURAL, B INDEX (FK_BADS_ATTRIBUTES_1))), A INDEX (FK_GOODS_ATTRIBUTES_1))


Executing...
Done.
66 fetches, 0 marks, 0 reads, 0 writes.
0 inserts, 0 updates, 0 deletes, 9 index, 5 seq.
Delta memory: 38168 bytes.
Execute time: 00:00:00.

Saturday, March 01, 2008

How to sync two tables in MySQL

A question pops out quite often on Devshed forums, "How do I keep table x of my local database in sync with a remote copy?"
Usually replication is the suggested answer, but it might be a little overkill; think of those who just want to push the new product catalogue from a local server to their hosted site: no permanent connection and so on ...
In this case the right tool might be a mix of the new MySQL features: federated tables, extended insert syntax, stored procedures, events, triggers ... quite a feast.
Say you have a local catalogue table which holds the products on sale:

  CREATE TABLE `test`.`catalogue` (
    `product_code` CHAR(10) NOT NULL,
    `product_name` VARCHAR(45) NOT NULL,
    `product_desc` VARCHAR(500) NOT NULL,
    `product_weight` DECIMAL NOT NULL,
    `product_colour` VARCHAR(45) NOT NULL,
    PRIMARY KEY (`product_code`)
  )


  • Load it with a sample row

    INSERT INTO
      catalogue
    VALUES
      ('AA12F', 'Sm. Widget', 'This is a small widget', 5, 'Red');

    In this sample I'll suppose that a similar table exists in the remote database, no need to check.

    First of all we need to be able to issue queries against that table right from this server, we'll use federated tables:

  DROP TABLE IF EXISTS `test`.`remote_catalogue`;
  CREATE TABLE `test`.`remote_catalogue` (
    `product_code` CHAR(10) NOT NULL,
    `product_name` varchar(45) NOT NULL,
    `product_desc` varchar(500) NOT NULL,
    `product_weight` decimal(10,0) NOT NULL,
    `product_colour` varchar(45) NOT NULL,
    PRIMARY KEY (`product_code`)
  )
  ENGINE=FEDERATED
  DEFAULT CHARSET=latin1
  CONNECTION='mysql://root:nt300jk@127.0.0.1:3306/remote_test/catalogue';
  -- don't use a privileged user in real life!!!

  • Now we have our federated table pointing to the remote server; let's set up the most basic sync, one query will be enough

    REPLACE INTO remote_catalogue SELECT * FROM catalogue 


    A quick test:

    mysql> use remote_test
    Database changed
    mysql> select * from catalogue;
    +--------------+--------------+------------------------+----------------+----------------+
    | product_code | product_name | product_desc           | product_weight | product_colour |
    +--------------+--------------+------------------------+----------------+----------------+
    | AA12F        | Sm. Widget   | This is a small widget |              5 | Red            |
    +--------------+--------------+------------------------+----------------+----------------+
    1 row in set (0.01 sec)

    mysql>

    After a while, data in our local database changes: a new product is added and the existing one is updated

    INSERT INTO
      catalogue
    VALUES
      ('AB12G', 'Lg. Widget', 'A large widget', 34, 'blue');
    UPDATE
      catalogue
    SET
      product_weight = 4
    WHERE
      product_code = 'AA12F';

    we are ready to push the changes to the remote site with the query used before; reissuing the REPLACE leads to this on the remote server

    mysql> select * from catalogue;
    +--------------+--------------+------------------------+----------------+----------------+
    | product_code | product_name | product_desc           | product_weight | product_colour |
    +--------------+--------------+------------------------+----------------+----------------+
    | AA12F        | Sm. Widget   | This is a small widget |              4 | Red            |
    | AB12G        | Lg. Widget   | A large widget         |             34 | blue           |
    +--------------+--------------+------------------------+----------------+----------------+
    2 rows in set (0.00 sec)

    So now the two tables hold the same values!
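    Well, almost: REPLACE only propagates inserts and updates, so a product deleted from the local catalogue would linger on the remote side. A sketch of how that could be handled through the same federated table (be warned, FEDERATED can push such subqueries over the wire rather inefficiently):

```sql
-- Remove remote rows whose product_code no longer exists locally
DELETE FROM remote_catalogue
WHERE product_code NOT IN (SELECT product_code FROM catalogue);
```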
    Enough for now! I'll show you the rest later

    BTW, while writing this I filed two bug reports (34973 and 34971); I hope those will turn out to be errors on my side ...

    MySQL's support guys took care of the bug reports, the outcome is:

    Bug #34973 is not a bug (it's a duplicate of closed bug 25511, which says that it's fine for INSERT ... SELECT ... ON DUPLICATE KEY UPDATE to fail with federated tables); however, bug 25511 suggests that REPLACE doesn't work either but, as you can see, it works fine.

    Bug #34971 is verified; from Miguel's comments it looks like it will be fixed in 5.1.24 (not released yet)