|
|||||||||||||
|
|||||||||||||
Using MySQL Full-text Searching By Jim Ferrara
Intended Audience Overview Learning Objectives Definitions Background Information Prerequisites Synopsis Initial Ideas The Solution: Setup The Solution: Actually Doing Something More about Basic Searching Example: Basic Searching Application Advanced Boolean Searching Boolean: The Basic Technical Aspect A Basic Boolean Searching Application Resources About The Author: Intended AudienceThis tutorial is intended for developers using MySQL (http://www.MySQL.com/) and PHP (http://www.php.net) who want to create a searchable database of some sort of textual data. It will focus on the Full-text capabilities presented by MySQL, moving into the Boolean opportunities that are presented in the latest alpha version, 4.1, of MySQL.OverviewUsing directories to group articles by category is a great way to help people to navigate through many articles. At some point, however, someone will want to find all the articles that pertain to a certain topic that may not have a directory of it’s own, or may span many directories . This is what the search engine is for.Learning ObjectivesIn this tutorial, you will learn:
Definitions
Background InformationBefore the advent of the search engine, users had to search manually through dozens – or hundreds – of articles and tidbits to find the ones that were right for them. Nowadays, in our more user-centered world, we expect the results to come to the user, not the other way around. The search engine gets the computer to do the work for the user.Prerequisites
SynopsisLet’s start with a quick review of our situation:We have a database that contains articles. We might create a table of database contents using a statement like this: CREATE TABLE articles (body TEXT, title VARCHAR(250), id INT NOT NULL auto_increment, PRIMARY KEY(id); Let’s say we have about 100 of these articles, covering various topics: MySQL, PHP, and various other topics of that sort. How do the users find the tutorials they want? Remember, we need to bring the results to the user. This is going to be a search engine operation. Initial IdeasWhen I started to work with my first database which was only a tenth of the size, my MySQL query went something like this:SELECT * FROM articles WHERE body LIKE '%$keyword%'; This was slow and inefficient.Every time someone searched for an article, they got far too many results, and as the database grew the system became downright shameful. So what is the solution? It’s right here: Full-text Searching. The Solution: SetupFull-text Search is a feature introduced to MySQL in version 3.23.23. This is how I used it to fix my problem:I started out with an update to my table: ALTER TABLE articles ADD FULLTEXT(body, title); This set ups our Full-text index. The (body, title) part tells us that we can search the body and title for keywords later on. We’ll find out how to use this later, once we’ve overcome a potential problem. In my original database BLOB was my datatype for the body of the article. What’s the problem, you ask? BLOBs are meant primarily for binary data. What use is searching binary data? MySQL has been programmed not to index BLOB datatypes for Full-text searching. If you try to index BLOB datatypes, you get an Error 140. The fix for this is simple: ALTER TABLE articles MODIFY body TEXT; That switches datatype from BLOB to TEXT, thus making a useful column for searching. The Solution: Actually Doing SomethingHow do we get results? Let’s jump right in and try it out:<?php What will this give us? Well, let’s go over Full-Text first. According to the MySQL manual, Full-text is a “natural language searchâ€; it indexes words that appear to represent the row, using the columns you specified. As an example, if all your rows contain “MySQL†then “MySQL†won’t match much. It’s not terribly unique, and it would return too many results. However, if “MySQL†were present in only 5% of the rows, it would return those rows because it doesn’t appear too often to be known as a keyword that’s very common. (If you have “MySQL†in none of your rows, it’ll return nothing; duh.) MySQL also does something pretty useful. It creates a score. This score is usually something like .9823475 or .124874, but always larger than zero. It can range up above 1, and I have seen it at 4 sometimes. (Don’t try to multiply it by 100 and portray it as a % value; people will wonder why their keyword matches an article 431%!) MySQL will also order a row by its score, descending. Another useful tidbit: If you use MATCH() AGAINST() Change the document style for this to “Inline Code†twice in a query, as we will, there is no additional speed penalty. You might expect that because you are executing the same search twice the query would take twice as long, but in fact MySQL remembers the results from the first search as it runs the second. So, let's talk about the actual query: We are taking every column from articles, and searching "title" and "body" for $keyword This is also Inline Code. Pretty simple. And if we want to display the score too: <?php More about Basic SearchingWhat more is there to say? Well, there's another feature I can introduce to you.When most people meet up with a search box they don't type in only one word. Not knowing the backend, they just type in as many words as they feel like! MySQL realizes this and deals with it. If I were you, the only thing I would do is remove the commas that might be there, using str_replace . MySQL will take all the words, split them up, and then match using a natural language search. As a secondary note, you should never send input directly from the user to the MySQL prompt because any number of characters could terminate your MySQL query and begin another dastardly statement. (This is presuming you replace PHP with a $keyword in the above script.)Example: Basic Searching ApplicationNow that we know all about basic searching, you ask: What more is there to learn? Well, not much except that I’ve created a quick sample application for you.Let's launch straight into the code. This bare bones application will search for a phrase or a keyword that the user inputs: <?php What does this script do? First, it checks $c to see if user input has been sent. If it has not, the form is displayed. If it has, the script moves onwards.The same query that we've been using is used here: we match against what the user inputs. We then draw a table and display it in [semi-]pretty form. The ORDER BY score DESC Code Inline makes sure that the best scores (the most accurate matches) are shown first. Important note: Never use this simple script in any production form because I have done absolutely no error checking. The $query variable provides an easy opening for an intruder to input something nasty into your query that might destroy your data.Advanced Boolean SearchingIf you need more options in your MySQL searching, or you haven't finished your coffee yet, keep reading. The advanced search engine tutorial begins here.Before we get started into the magic of bool, I recommend you do a quick SELECT version(); Code Inlineon your MySQL server. I spent several hours battling my computer until I read this line in the MySQL manual: As of Version 4.0.1, MySQL can also perform Boolean full-text searches using the IN BOOLEAN MODE modifier. Whoops; 4.0.1 is the newest, alpha release of MySQL Still check for this... it should by MySQL. If you're looking to use this on a production server, I'd strongly recommend against that decision. I found I was using a 3.23.23, and I had to set up an experimental MySQL server to use the bool functions that it offers now. Overall, I was very pleased with the performance of the new bool searching; the scoring system is changed, but one can still manage. Within 15 minutes of upgrading, I had a simple bool search page up and running on my articles database. Boolean: The Basic Technical AspectThe only thing you change to use Boolean mode is theAGAINST() part of your query. You add IN BOOLEAN MODE to the very end of it, and place the arguments right before it. E.g. to search for all the articles that contain the word PHP, you could write: SELECT * FROM articles WHERE MATCH(title, body) AGAINST('PHP' IN BOOLEAN MODE)DR[10] That will find all the articles that contain the word "PHP" somewhere in them. It's a fairly simple search. If you were to get more complex, and wanted everything that has to do with PHP, but not with MySQL, then you could execute this statement: SELECT * FROM articles WHERE MATCH(title, body) AGAINST('+PHP -MySQL' IN BOOLEAN MODE); There are more modifiers that one can use to search with, and I will quote from the MySQL manual since I see no point in typing out a synopsis of the manual:
A Basic Boolean Searching ApplicationAgain, we’ll start with the code straight off:<?php After we get the input from the form, $c Code Inlineis set to 1 and we start the real work. First we check our input. If it's empty, we leave it empty, if it's not, we append the proper + or - to it. The parentheses are to allow for the user typing more than 1 word in a given field. $query = " That's the final query that we use. $all , $none Code Inline, and $any have already been prepared for the query, and they are inserted. Score is returned as a column to order them by (if we wanted to do that), and from there on, we just have to output the results. if(MySQL_num_rows($artm1) > 0) { That's the output code. If there's less than 1 row to output, we send a "no records found" message out. That's about it for Full-Text searching. ResourcesMySQL Man Page for Full-Text: http://www.MySQL.com/doc/F/u/Fulltext_Search.htmlI found the following note on the Man pages for Full-text. It might be useful for anyone trying to work out why their database returns no matches.
About The AuthorJim Ferrara has never held an IT job in his life, but will be heading off to Old Dominion University in Virginia, USA in the fall. He currently works at Outback Steakhouse as a cook and loves it. Questions or comments can be directed to jferr007@odu.edu.« Back |