PDA

View Full Version : MySQL query - need someone with a brain!


rich w
April 13th, 2006, 16:41
You solve one problem and you come across another!

This one is for all the database experts out there :hi:

I have a database field (`desc_soundex`) which contains lots of 4 digits strings.
For e.g. a record may contain this:
I000 A500 U252 R300 H260 T000 M200 B552
Another containing this:
M252 U252 R300 H260 T000 M200

Basically, I want to query the database to return the most relevant record.
For eg, I may query for the following:
"R300 H260"
and it would return the 1st example record I gave.
I could also query:
"H260 R300 I210"
and it would also return the 1st example record.

So step one is the ability to query the different strings regardless of the order they are given in, and not have to match the entire string given.

The second step is to return the most relevant result...
So if I queried "R300 H260 M200 M252"
the order returned would be example 2, example 1.

Anyone got any ideas on the query for this? I have no ideas really..
I've made a start, and this is all I have:
$query = "SELECT `routedesc` FROM `hiking_hikedetails` WHERE `desc_soundex` LIKE '%$squery%'";

jos
April 14th, 2006, 11:16
Hi I'm not a database expert but I can try help you figure this out... I hope I'm not writing complete horsecrap here...

I have a database field (`desc_soundex`) which contains lots of 4 digits strings.
For e.g. a record may contain this:
I000 A500 U252 R300 H260 T000 M200 B552
Another containing this:
M252 U252 R300 H260 T000 M200

Basically, I want to query the database to return the most relevant record.
For eg, I may query for the following:
"R300 H260"
and it would return the 1st example record I gave.
I could also query:
"H260 R300 I210"
and it would also return the 1st example record.

So step one is the ability to query the different strings regardless of the order they are given in, and not have to match the entire string given.

I'm not sure if what I'm saying is possible but I think it should look something like this:


$str1 = 'H260';
$str2 = 'R300';
$str3 = 'I210';

$query = "SELECT `routedesc` FROM `hiking_hikedetails` WHERE `desc_soundex` LIKE '%$str1%', '%$str2%', '%$str3%'";


Create a string for every 4 digits you want to be in there. However this should not return any of your results because 'I210' is not in there... (unless I'm blind) so you would have to leave this string out of the query.

The second step is to return the most relevant result...
So if I queried "R300 H260 M200 M252"
the order returned would be example 2, example 1.


If you use the technique I describe it would only return example 2, not 1 because 'm252' is not in there. I'm not sure if this is what you wanted. If not sure if this works and if it's what you wanted... If not I'll grab my books again and start digging.

rich w
April 14th, 2006, 16:20
$str1 = 'H260';
$str2 = 'R300';
$str3 = 'I210';

$query = "SELECT `routedesc` FROM `hiking_hikedetails` WHERE `desc_soundex` LIKE '%$str1%', '%$str2%', '%$str3%'";


Create a string for every 4 digits you want to be in there. However this should not return any of your results because 'I210' is not in there... (unless I'm blind) so you would have to leave this string out of the query.


Hmm this could be tricky.
The problem is 'I210' HAS to be sent in the query... if its not in the database it just has to be ignored...and the records are returned in the order depending on how many matches they have made.

jos
April 14th, 2006, 16:50
I think this is not possible with a simple sql query. You're gonna have to write a nice PHP function for this to filter your query/result.

rich w
April 14th, 2006, 17:23
I think this is not possible with a simple sql query. You're gonna have to write a nice PHP function for this to filter your query/result.

Exactly what I was hoping not to hear!

Anyone got any ideas where to start?

sea otter
April 15th, 2006, 14:58
well, to start, what's the specific definition of "most relevant". Once defined, we can come up with an algorithm and determine how much of the work can be done with sql vs php.

I know you gave a couple of examples in your first post, but whats your formal definition for "most relevant?"

For example, a couple of questions:

(1) does string order matter?

(2) say you have
M252 U252 R300 H260 T000 M200
M252 U252 R300

in the database, and check for the string M252 U252

Which database record is more relevant?

Those are just a couple of quick thoughts, but you get the idea.

rich w
April 15th, 2006, 17:35
well, to start, what's the specific definition of "most relevant". Once defined, we can come up with an algorithm and determine how much of the work can be done with sql vs php.

I know you gave a couple of examples in your first post, but whats your formal definition for "most relevant?"

For example, a couple of questions:

(1) does string order matter?

(2) say you have
M252 U252 R300 H260 T000 M200
M252 U252 R300

in the database, and check for the string M252 U252

Which database record is more relevant?

Those are just a couple of quick thoughts, but you get the idea.

Good point. Sorry I should have been more specific. I'll explain the concept.

Basically this is a search script. As users enter words, they are converted to soundex codes (the 4 digit strings), and compared against ones in the database.

Most relevant = the most matched strings

For the example you gave, both results are as relevant as each other, so it doesn't matter what order they appear.

sea otter
April 16th, 2006, 15:50
Got it. now I can see if I can come up with anything. You're not using a database with stored procedures by any chance, are you?

rich w
April 18th, 2006, 08:57
Got it. now I can see if I can come up with anything. You're not using a database with stored procedures by any chance, are you?

Nope.

If you need a bit of data to play with:

-- phpMyAdmin SQL Dump
-- version 2.7.0-pl2
-- http://www.phpmyadmin.net
--
-- Host: localhost
-- Generation Time: Apr 18, 2006 at 07:57 AM
-- Server version: 5.0.18
-- PHP Version: 5.1.2
--
-- Database: `route_main`
--

-- --------------------------------------------------------

--
-- Table structure for table `hiking_hikedetails`
--

CREATE TABLE `hiking_hikedetails` (
`hikeid` int(3) NOT NULL auto_increment,
`userid` mediumint(8) NOT NULL default '0',
`date` int(10) NOT NULL default '0',
`startdate` int(10) NOT NULL default '0',
`averagewalkingspeed` char(3) NOT NULL default '',
`ascenttime` char(2) NOT NULL default '',
`magnorth` char(2) NOT NULL default '',
`routedesc` varchar(255) NOT NULL default '',
`desc_soundex` longtext NOT NULL,
`routecomm` varchar(255) NOT NULL default '',
`maps` varchar(255) NOT NULL default '',
`localbaselocation` varchar(100) NOT NULL default '',
`localbasephone` varchar(20) NOT NULL default '',
`nameleader` varchar(100) NOT NULL default '',
`namehomecontact` varchar(100) NOT NULL default '',
`phonehomecontact` varchar(20) NOT NULL default '',
`type` int(2) NOT NULL default '0',
PRIMARY KEY (`hikeid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=113 ;

--
-- Dumping data for table `hiking_hikedetails`
--

INSERT INTO `hiking_hikedetails` VALUES (26, 27, 1142802339, 1151103600, '3', '2', '2', 'LykeWakeWalke', 'L224 ', '42 mile walk to be completed within 24 hours.', 'Ordnance Survey Explorer OL26 and OL27', 'The Mini Bus', '07900685947', 'Phil', 'Phil', '07900685947', 4);
INSERT INTO `hiking_hikedetails` VALUES (62, 60, 1144436604, 1144105200, '2', '1', '2', 'Silver D of E Practice expedition.', 'S416 D000 O100 E000 P623 E213 ', '4 Explorers.', 'OS Explorer 135: Ashdown Forest', 'N/A', '0000000000', 'Chris Sowden', 'Iain Bell', '00000000000', 3);
INSERT INTO `hiking_hikedetails` VALUES (30, 1, 1142954846, 1143414000, '4.5', '1', '2', 'fgjdf h hoi hoi', 'F231 H000 H000 H000 ', 'fdsfdsf fds ds', 'dsf sdfsdf sdfsdf', 'ds fsdfdsf sdf sdf', '3423432', 'bla bl abla', 'dsfiosdjfoi sdf', '015301350', 3);
INSERT INTO `hiking_hikedetails` VALUES (80, 1, 1144852965, 1148511600, '3.4', '2.', '2', 'Gold Duke of Edinburgh. 4 People. Ashby College.', 'G430 D200 O100 E351 0000 P140 A210 C420 ', 'Heavy rain. High terrain. Autumn Hike.', 'Ordanance Survey - SK Series. 1:25000.', 'Whitehouse farm,\r\nLeicestershire.\r\nSK414191', '01530 123123', 'J. Smith', 'Barbara Jones', '01230 123124', 4);
INSERT INTO `hiking_hikedetails` VALUES (28, 53, 1142893952, 1144450800, '3.5', '1', '2', 'Annual Dovedale Backpacking Expedition', 'A540 D134 B212 E213 ', 'N/A', 'OS O/L White Peak (2)', 'Camping at two sites - Fri & Sat', '0', 'Bryan Nevin', '--', '0', 11);


The charset isn't right but hey!

Good luck with your attempt!!

sea otter
April 21st, 2006, 15:47
argh! just typed something up, pressed the wrong button, lost it. !!#*($%@(&*@ :flame: :uzi2:

ok, gotta go write it again. I think I have something fairly brute force and inelegant, but it *should* work.

I'll be back!

rich w
April 23rd, 2006, 09:05
Sorry!

Hope you have more luck this time!

sea otter
April 23rd, 2006, 18:05
ok, how about this. Put it all in a page and run it, then change the strings and let me know if this is the right idea:


<?
// this is in the database
$str_target = "I000 A500 U252 R300 H260 T000 M200 B552";

// these are user-entered or whatever
$str_1 = "T000 A500";
$str_2 = "R300 I000 M200";

// higher numbers are most relevant
function relevancy($target_str, $source_str)
{
$target = explode(' ',$target_str);
$source = explode(' ',$source_str);
$diff = array_diff($target,$source);

$relevancy = count($diff);

// return negative version so that higher number are most relevant
return -$relevancy;
}

if (relevancy($str_target,$str_1) == relevancy($str_target,$str_2))
echo 'both strings are equally relevant';
else if (relevancy($str_target,$str_1) > relevancy($str_target,$str_2))
echo 'str_1 is more relevant';
else
echo 'str_2 is more relevant';

?>

rich w
April 23rd, 2006, 19:56
Works perfectly and the algorithm is as it should be..

There's always a but...

I think you've got it the wrong way round...the user will be entering one string, and the database will contain multiple strings, and it should return them in the most relevant order.

Excellent job so far though!

sea otter
April 23rd, 2006, 21:46
Ah well, I wasn't sure which way was the "right" way.

It should probably go in a class so that the single string against which the others are compared (the *user* string, if I now understand correctly) is only exploded once, and not on each compare call.

rich w
April 23rd, 2006, 22:39
Yup thats a good idea.

Blimey I never though this search script would be so much trouble!