Talks around Computer Science and Stuff
In this first article, we will talk about how to integrate a powerful and flexible search engine into your web application. Several open source search engines are available. This article is about Solr. I have been using it on different projects and it offers a solid set of features.
Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, a web administration interface and many more features. It runs in a Java servlet container such as Tomcat.
Source : http://lucene.apache.org/solr/
Solr is a standalone enterprise search server with a web-services like API. You put documents in it (called “indexing”) via XML over HTTP. You query it via HTTP GET and receive XML results.
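To make the "receive XML results" part concrete, here is a minimal sketch of how such a response could be read in PHP with SimpleXML. The sample response is hand-written and its field names (`id`, `title`) are only assumptions for illustration, and `parseSolrXml` is a hypothetical helper, not part of any library.

```php
<?php
// Hand-written sample of a Solr XML response; the field names are illustrative.
$xml = <<<XML
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">2</int>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="id">42</str>
      <str name="title">Hello Solr</str>
    </doc>
  </result>
</response>
XML;

/**
 * Turn a Solr XML response into a plain PHP array.
 * Only single-valued fields are handled in this sketch.
 */
function parseSolrXml($xml)
{
    $response = simplexml_load_string($xml);
    $result = $response->result;
    $docs = array();
    foreach ($result->doc as $doc) {
        $fields = array();
        foreach ($doc->children() as $field) {
            // Each child looks like <str name="title">Hello Solr</str>.
            $fields[(string) $field['name']] = (string) $field;
        }
        $docs[] = $fields;
    }
    return array(
        'numFound' => (int) $result['numFound'],
        'docs'     => $docs,
    );
}

$parsed = parseSolrXml($xml);
echo $parsed['numFound'] . " document(s) found\n";
```

In a real application the XML string would come from an HTTP GET on the Solr select URL instead of being inlined.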
Advanced Full-Text Search Capabilities
Optimized for High Volume Web Traffic
Standards Based Open Interfaces – XML, JSON and HTTP
Comprehensive HTML Administration Interfaces
Server statistics exposed over JMX for monitoring
Scalability – Efficient Replication to other Solr Search Servers
Flexible and Adaptable with XML configuration
Extensible Plugin Architecture
Source: http://lucene.apache.org/solr/features.html
One of the “sexy features” offered by Solr is faceted search. If you are not familiar with that notion, you will often see facets on e-commerce sites.
Faceted navigations are used to enable users to rapidly filter results from a product search based on different ways of classifying the product by their attributes or features.
For example: by brand, by sub-product category, by price bands
Source : http://www.davechaffey.com/E-marketing-Glossary/Faceted-navigation.htm
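On the Solr side, facets are requested with the `facet=true` and `facet.field` query parameters, and `facet.field` is repeated once per facet. PHP's `http_build_query` would write an array as `facet.field[0]=...`, which is not what Solr expects, so here is a small sketch of a query string builder that repeats the key instead. The helper name `buildSolrQueryString` and the `brand` and `category` fields are assumptions for illustration.

```php
<?php
/**
 * Build a query string where array values repeat the key
 * (facet.field=a&facet.field=b) instead of PHP's key[0]=a&key[1]=b style.
 */
function buildSolrQueryString(array $params)
{
    $pairs = array();
    foreach ($params as $name => $value) {
        // Casting to array lets scalars and lists share the same loop.
        foreach ((array) $value as $single) {
            $pairs[] = urlencode($name) . '=' . urlencode($single);
        }
    }
    return implode('&', $pairs);
}

$query = buildSolrQueryString(array(
    'q'           => 'laptop',
    'facet'       => 'true',
    'facet.field' => array('brand', 'category'),
));
echo $query . "\n";
```

The resulting string can be appended to the select URL of your Solr instance.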
Faceted navigation helps a lot when users are searching. Here are some examples of facets displayed on various websites.
Solr has a full REST interface, which makes it very easy to talk to.
It can output responses in different formats (XML, JSON, etc.), and it can even output them as PHP code or as serialized PHP (PHPS).
We will discuss later how you can enable and use this feature.
Some open source client libraries are available in PHP. There is also a PHP extension for Solr.
In this article, we will use a custom-made client named SimpleSolr. The SimpleSolr class is available for download, but as said earlier, many existing frameworks and libraries can be used for the same purpose. I personally decided to build my own little class in order to learn more about the Solr API.
We will assume in this tutorial that you have a working Apache / PHP 5.2+ installation and that you are running on a Unix platform. The first thing you need to do is install Solr.
We recommend using the LucidWorks package for Solr. Based on the latest stable release of Apache Solr, it includes major new features and enhancements. For further details, you can check their website (www.lucidimagination.com). Here is the URL to download it: http://www.lucidimagination.com/Downloads/LucidWorks-for-Solr
Installing Solr is very easy: LucidWorks offers an installer that runs on both Windows and Linux, since it is a .jar.
You will need to install a JRE. For all the details on how to install LucidWorks, please refer to this documentation: http://www.lucidimagination.com/search/document/CDRG_ch02_2.1
Whenever you are ready, go to the folder where Solr is installed. To see all the options, you can type this:
sh lucidworks.sh --help
Using CATALINA_BASE: /var/www/solr/lucidworks/tomcat
Using CATALINA_HOME: /var/www/solr/lucidworks/tomcat
Using CATALINA_TMPDIR: /var/www/solr/lucidworks/tomcat/temp
Using JRE_HOME: ./../jre
Usage: catalina.sh ( commands ... )
commands:
debug Start Catalina in a debugger
debug -security Debug Catalina with a security manager
jpda start Start Catalina under JPDA debugger
run Start Catalina in the current window
run -security Start in the current window with security manager
start Start Catalina in a separate window
start -security Start in a separate window with security manager
stop Stop Catalina
stop -force Stop Catalina (followed by kill -KILL)
version What version of tomcat are you running?
In order to start Solr, you need to type the following :
sh lucidworks.sh start
Using CATALINA_BASE: /var/www/solr/lucidworks/tomcat
Using CATALINA_HOME: /var/www/solr/lucidworks/tomcat
Using CATALINA_TMPDIR: /var/www/solr/lucidworks/tomcat/temp
Using JRE_HOME: ./../jre
Now you can check that Solr is running by going to http://localhost:8983/solr/admin. You should see the Solr admin page.
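The same check can be done from PHP, which is handy in a deployment script. This is only a sketch: `isUrlReachable` is a hypothetical helper, it assumes `allow_url_fopen` is enabled, and it only tells you that the URL answered with a successful HTTP response, not that the index is healthy.

```php
<?php
/**
 * Return true when an HTTP GET on $url succeeds.
 * file_get_contents() returns false on connection errors and,
 * by default, on 4xx/5xx responses.
 */
function isUrlReachable($url, $timeoutSeconds = 2)
{
    $context = stream_context_create(array(
        'http' => array('timeout' => $timeoutSeconds),
    ));
    // @ silences the warning PHP emits when the request fails.
    $body = @file_get_contents($url, false, $context);
    return $body !== false;
}

$adminUrl = 'http://localhost:8983/solr/admin';
echo isUrlReachable($adminUrl) ? "Solr is up\n" : "Solr is not reachable\n";
```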
Now that you have an install ready to be used, let's build a simple search.
You can configure Solr to use multiple cores. That way, the same Solr instance can serve several applications. For details, see the following link: http://wiki.apache.org/solr/CoreAdmin
You need to edit solr.xml and add the new core:
<solr persistent="false">
<cores adminPath="/admin/cores">
<core name="tutorial" instanceDir="tutorial" />
<!-- You can add new cores here -->
</cores>
</solr>
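Each core declared in solr.xml gets its own URL under the Solr root, for example http://localhost:8983/solr/tutorial for the core above. As a small illustration, the following sketch reads the core names from the XML and derives those URLs. The XML is inlined so the example runs on its own, and `coreUrls` is a hypothetical helper.

```php
<?php
// Same structure as the solr.xml above, inlined so the sketch runs on its own.
$solrXml = <<<XML
<solr persistent="false">
  <cores adminPath="/admin/cores">
    <core name="tutorial" instanceDir="tutorial" />
  </cores>
</solr>
XML;

/**
 * Map each core name to its base URL under the given Solr root.
 */
function coreUrls($solrXml, $solrRoot)
{
    $urls = array();
    $config = simplexml_load_string($solrXml);
    foreach ($config->cores->core as $core) {
        $name = (string) $core['name'];
        $urls[$name] = rtrim($solrRoot, '/') . '/' . rawurlencode($name);
    }
    return $urls;
}

print_r(coreUrls($solrXml, 'http://localhost:8983/solr'));
```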
Then edit the solrconfig.xml file and make sure the right path is set for the dataDir property.
<!-- Used to specify an alternate directory to hold all index data
other than the default ./data under the Solr home.
If replication is in use, this should match the replication configuration. -->
<dataDir>${solr.solr.home}/tutorial/data</dataDir>
Also make sure that the PHP and PHPS response writers are enabled, otherwise it won't work!
<queryResponseWriter name="php" class="org.apache.solr.request.PHPResponseWriter"/>
<queryResponseWriter name="phps" class="org.apache.solr.request.PHPSerializedResponseWriter"/>
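Once the writers are enabled, adding `wt=phps` to a query makes Solr return a serialized PHP array that `unserialize()` can read directly. Here is a minimal sketch of the decoding step. The response body is simulated with `serialize()` and hypothetical values so the example runs without a Solr server, and `decodePhpsResponse` is my own helper name.

```php
<?php
/**
 * Decode the body returned by Solr when the request carries wt=phps.
 * The second argument to unserialize() requires PHP 7.0 or later;
 * it prevents the payload from instantiating objects.
 */
function decodePhpsResponse($body)
{
    $data = unserialize($body, array('allowed_classes' => false));
    if (!is_array($data)) {
        throw new RuntimeException('Unexpected Solr response');
    }
    return $data;
}

// Stand-in for the HTTP body of a select query sent with wt=phps.
$body = serialize(array(
    'responseHeader' => array('status' => 0, 'QTime' => 3),
    'response' => array(
        'numFound' => 1,
        'start'    => 0,
        'docs'     => array(array('id' => '1')),
    ),
));

$result = decodePhpsResponse($body);
echo $result['response']['numFound'] . " hit(s)\n";
```

I prefer `phps` over `php` here because the `php` writer returns PHP code that has to be passed to `eval()`, which is harder to do safely.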
Now, edit the conf/schema.xml file and add the following fields within the <fields> node :
<!-- Unique Solr document ID (see <uniqueKey>) -->
<field name="solrDocumentId" type="string" indexed="true" stored="true" required="true" />
<!-- Fields for searching -->
<field name='id' type='integer' indexed='true' stored='false' />
<field name='text' type='text' indexed='true' stored='false' multiValued='true' />
<field name='title' type='text' indexed='true' stored='false' />
<field name='author' type='text' indexed='true' stored='false' />
<!-- Facet fields for searching -->
<field name='category' type='text_facet' indexed='true' stored='false' multiValued='true' />
<field name='concept' type='text_facet' indexed='true' stored='false' multiValued='true' />
<field name='location' type='text_facet' indexed='true' stored='false' multiValued='true' />
<field name='person' type='text_facet' indexed='true' stored='false' multiValued='true' />
<field name='company' type='text_facet' indexed='true' stored='false' multiValued='true' />
In this article, we will index and search articles. An article has a title, text, author and some special metadata such as category, concept, location, person and company.
Make sure to restart Solr every time you change your solrconfig or schema files!
Now it is time to start with the PHP code.
IBM article on Solr
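To give an idea of what indexing such an article looks like, here is a sketch that turns a PHP array into the `<add><doc>` XML message Solr expects. Multi-valued fields such as category are sent as repeated `<field>` elements. The function name `articleToAddXml`, the sample article and the `article-` prefix used for solrDocumentId are all assumptions for illustration, not the SimpleSolr API.

```php
<?php
/**
 * Build a Solr <add> message for one article.
 * Array values become repeated <field> elements, which is how
 * multiValued fields are sent.
 */
function articleToAddXml(array $article)
{
    $dom = new DOMDocument('1.0', 'UTF-8');
    $add = $dom->appendChild($dom->createElement('add'));
    $doc = $add->appendChild($dom->createElement('doc'));
    foreach ($article as $name => $values) {
        foreach ((array) $values as $value) {
            $field = $dom->createElement('field');
            $field->setAttribute('name', $name);
            // Text nodes are escaped on output, so & and < are safe.
            $field->appendChild($dom->createTextNode((string) $value));
            $doc->appendChild($field);
        }
    }
    // Passing the node skips the XML declaration in the output.
    return $dom->saveXML($add);
}

// Hypothetical article matching the schema fields above.
$article = array(
    'solrDocumentId' => 'article-1',
    'id'             => 1,
    'title'          => 'Solr & PHP',
    'author'         => 'Jane Doe',
    'text'           => 'Indexing articles with Solr.',
    'category'       => array('search', 'php'),
);
$addXml = articleToAddXml($article);
echo $addXml . "\n";
```

The message would then be sent with an HTTP POST to the update handler of the core, followed by a `<commit/>` message so that the document becomes searchable.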
http://www.ibm.com/developerworks/opensource/library/os-php-apachesolr/
Refer to http://wiki.apache.org/solr/SolPHP?highlight=((CategoryQueryResponseWriter)) for more information.
My name is Bashar Al-Fallouji, I work as an Enterprise Solutions Architect at Amazon Web Services.
I am particularly interested in Cloud Computing, Web applications, Open Source Development, Software Engineering, Information Architecture, Unit Testing, XP/Agile development.
On this blog, you will find mostly technical articles and thoughts around PHP, OOP, OOD, Unit Testing, etc. I am also sharing a few open source tools and scripts.