anonymous user

Forums   Register   Login   Forgot your login/password?   Search

Using Sphinx with Regex to search a number plate table

Common forum | 1 | 2 | 3 | 4 | 5 | ... | 518 | 519 | 520 | 521 | next »» | Create new thread

KernelKev

Name: Kevin Orriss
Posts: 3

2012-01-25 14:16:50 | reply!


Hi, I would be ever grateful if someone could advise me here please. I am new to sphinx
and have installed it onto my server with the test database/index example working.

I am now in the position where I need to change the setting in the sphinx.conf file and
to create a new index etc.

My table layout is as follows:
Name: listings_dvla
===================
id -> Unsigned INT, primary key
characters -> varchar(15) FULLTEXT indexed
active -> bool
price -> decimal(10,2)

This table contains 30,000,000 rows and before I was using mysql to query the database
for similar number plates using the following query (assuming i searched "kev"):

SELECT characters, price FROM listings_dvla WHERE active=TRUE AND characters REGEXP
'^[a-z0-9]*[(k)]+[a-z0-9]?[(e)(3)]+[a-z0-9]?[(v)]+[a-z0-9]*' ORDER BY characters ASC
LIMIT 0, 12

What the regex is doing is saying "find me a K folowed by an E or 3 followed by a V"
This query can take over a minute depending on the search, I need it to be under a few
seconds.

So my questions are:
1- Is Sphinx built to handle this requirement?
2- Can Sphinx do regular expressions that would be built up at runtime depending on the
user's input
3- How do I even begin to go about doing this?

I have been looking around on google and on these forums but I am really struggling.

Thank you for reading this, I really appreciate the help.

barryhunter

Name: Barry Hunter
Posts: 7334

to: KernelKev, 2012-01-25 15:16:48 | reply!


> 1- Is Sphinx built to handle this requirement?

Its not 'built to' do it (its built for full-text searching of whole words), but I think
it should handle it well with a bit of work to set it up.

> 2- Can Sphinx do regular expressions that would be built up at runtime depending on the
> user's input

No. It doesn't do regex. But the same search could be expressed in a form that sphinx
could run.

> 3- How do I even begin to go about doing this?

In the general sense, could use the techniques discussed here
http://swtch.com/~rsc/regexp/regexp4.html

This shows how generic regex searching can be implemented with a trigram index. Sphinx
itself would work as the trigram index. You store the trigrams as keywords which then
sphinx indexes. Sphinx can run the boolean queries taht that system outputs.

(normal sphinx, works pretty much like the 'Indexed Word Search' section documents. So
the trick would be using sphinx as the backend for the indexed Reg-Ex Search)

There is
http://sphinxsearch.com/docs/current.html#conf-ngram-len
but no idea how useful that would be in this case.



But I suspect you could get away with much simpler setup, with sphinx. If dont need a
totally arbitary regex matching, then could get a system tailered to your use case -
running much easier.

For example, I think you could just index the number place as a series of letters. So in
mysql (or your sql_query) space seperate the characters, so, S123ABC becomes "S 1 2 3 A B
C". Sphinx will then index them as seperate words (make sure min_word_len=1 !)

Then your example query "kev" could be converted to a sphinx extended query of the form

k << (e | 3) << v

(and this should perform, exactly the same as your example regular expression)

   http://sphinxsearch.com/docs/current.html#extended-syntax



(as an aside, mysql's CONV function should produce a nice integer you can store in a
sphinx attribute, so you can sort by characters like your original)

KernelKev

Name: Kevin Orriss
Posts: 3

to: barryhunter, 2012-01-26 13:46:21 | reply!


Thank you for your reply, really appreciate it.

I have now changed all the entries so that they are space seperated "A 1 2 U A A" for
example and when i use this code:

$result = $s->query("a << 1 << 2 << u << a << a");

Then sphinx returns over 20 results such as:
A 1 2 U A A
A 1 2 U E A (note character 4 (0-based))

How can I tell sphinx to only return exact matches? But still leaving it open to return...

A 1 2 K E V

... when i use:
$result = $s->query("k << e << v");

Thank you for your help.

barryhunter

Name: Barry Hunter
Posts: 7334

to: KernelKev, 2012-01-26 15:42:27 | reply!


> Thank you for your reply, really appreciate it.
>
> I have now changed all the entries so that they are space seperated "A 1 2 U A A" for
> example and when i use this code:
>
> $result = $s->query("a << 1 << 2 << u << a << a");

instead do

$s->query("a << 1 << 2 << u << \"a a\" ");

ie put consecutive of same letter in quotes.

or maybe

$result = $s->query("(a << 1 << 2 << u << a) (a << a)");

to allow a charactor in between.

KernelKev

Name: Kevin Orriss
Posts: 3

to: barryhunter, 2012-01-30 14:50:13 | reply!


I have tried your suggestions and I am not getting the expected results:

$s = new SphinxClient;
$s->setServer("localhost", 9312);
$s->setMatchMode(SPH_MATCH_EXTENDED2);
$s->setMaxQueryTime(1000);
$result = $s->query('D << E << 5 << 1 << "A A A"');

Returns 0 results yet I know that "D E 5 1 A A A" is within the database.

Can you see anything wrong with that? I have tried escaping the " characters but that
makes no difference and also tried using brackets.

Thank you for your help.

barryhunter

Name: Barry Hunter
Posts: 7334

to: KernelKev, 2012-01-30 14:53:21 | reply!


> $s->setMaxQueryTime(1000);

Does it make a difference if remove that? Maybe the query is too slow?

(just as a test)


> $result = $s->query('D << E << 5 << 1 << "A A A"');

What about

$s->query('"1 A A A"');

Just to see if get results.

jaysoni2012

Name: Jay Soni
Posts: 1

to: KernelKev, 2012-06-16 15:49:34 | reply!


Hi Kevin Orriss,

I am in the same situation. I have table Name: listings_dvla with over 40,000,000 entries
and I am using mysql. could you please help me to install Sphnix to my local windows
system with xampp. As I tried a lot but its not working.

Please help its URGENT!!!

Thanks in advance.


> Hi, I would be ever grateful if someone could advise me here please. I am new to sphinx
> and have installed it onto my server with the test database/index example working.
>
> I am now in the position where I need to change the setting in the sphinx.conf file and
> to create a new index etc.
>
> My table layout is as follows:
> Name: listings_dvla
> ===================
> id -> Unsigned INT, primary key
> characters -> varchar(15) FULLTEXT indexed
> active -> bool
> price -> decimal(10,2)
>
> This table contains 30,000,000 rows and before I was using mysql to query the database
> for similar number plates using the following query (assuming i searched "kev"):
>
> SELECT characters, price FROM listings_dvla WHERE active=TRUE AND characters REGEXP
> '^[a-z0-9]*[(k)]+[a-z0-9]?[(e)(3)]+[a-z0-9]?[(v)]+[a-z0-9]*' ORDER BY characters ASC
> LIMIT 0, 12
>
> What the regex is doing is saying "find me a K folowed by an E or 3 followed by a V"
> This query can take over a minute depending on the search, I need it to be under a few
> seconds.
>
> So my questions are:
> 1- Is Sphinx built to handle this requirement?
> 2- Can Sphinx do regular expressions that would be built up at runtime depending on the
> user's input
> 3- How do I even begin to go about doing this?
>
> I have been looking around on google and on these forums but I am really struggling.
>
> Thank you for reading this, I really appreciate the help.

Common forum | 1 | 2 | 3 | 4 | 5 | ... | 518 | 519 | 520 | 521 | next »» | Create new thread