Creating a search for your application with Solr and PHP

5 Sep 2016

In this first article, we will talk about how to integrate a strong and flexible search engine into your web application. There are various open source search engines available. This article is about Solr. I have used it on several projects, and it offers a solid set of features.

What is Solr?

Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, a web administration interface and many more features. It runs in a Java servlet container such as Tomcat.
Source: http://lucene.apache.org/solr/

What features does it offer?

Solr is a standalone enterprise search server with a web-service-like API. You put documents in it (called “indexing”) via XML over HTTP. You query it via HTTP GET and receive XML results.

  • Advanced Full-Text Search Capabilities
  • Optimized for High Volume Web Traffic
  • Standards Based Open Interfaces - XML, JSON and HTTP
  • Comprehensive HTML Administration Interfaces
  • Server statistics exposed over JMX for monitoring
  • Scalability - Efficient Replication to other Solr Search Servers
  • Flexible and Adaptable with XML configuration
  • Extensible Plugin Architecture
Source: http://lucene.apache.org/solr/features.html
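To make the two HTTP interactions described above a bit more concrete, here is a minimal sketch that builds the XML body of an indexing request and the URL of a GET query. The helper names (buildAddXml, buildSelectUrl), the sample values and the base URL are assumptions made for this illustration; the <add><doc><field> message format and the q and wt parameters are standard Solr. Nothing is sent over the network, and ENT_XML1 assumes PHP 5.4 or later.

```php
<?php
// Hypothetical helper: builds the XML body of a Solr "add" request.
// $doc maps each field name to a single value or to a list of values.
function buildAddXml(array $doc)
{
    $xml = '<add><doc>';
    foreach ($doc as $name => $values) {
        // A scalar becomes a one-element list, so multi-valued fields
        // and single-valued fields share the same code path.
        foreach ((array) $values as $value) {
            $xml .= sprintf(
                '<field name="%s">%s</field>',
                htmlspecialchars($name, ENT_QUOTES | ENT_XML1, 'UTF-8'),
                htmlspecialchars((string) $value, ENT_QUOTES | ENT_XML1, 'UTF-8')
            );
        }
    }
    return $xml . '</doc></add>';
}

// Hypothetical helper: builds the URL of a GET query against /select.
function buildSelectUrl($baseUrl, array $params)
{
    // The separator is passed explicitly so that php.ini settings
    // cannot turn "&" into "&amp;".
    return rtrim($baseUrl, '/') . '/select?' . http_build_query($params, '', '&');
}

// Sample values, made up for this example.
$body = buildAddXml(array('solrDocumentId' => 'article-1', 'title' => 'Solr & PHP'));
$url  = buildSelectUrl('http://localhost:8983/solr', array('q' => 'title:solr', 'wt' => 'xml'));
```

In a real application, the body would be sent with an HTTP POST to the /update handler (content type text/xml), followed by a <commit/> message so that the new documents become searchable.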

One of the “sexy features” offered by Solr is faceted search. If you are not familiar with the notion, you have probably still seen facets in action: they are very common on e-commerce sites.

Faceted navigations are used to enable users to rapidly filter results from a product search based on different ways of classifying the product by their attributes or features.
For example: by brand, by sub-product category, by price bands
Source: http://www.davechaffey.com/E-marketing-Glossary/Faceted-navigation.htm

Facets help a lot when users are searching.
[Figure: examples of facets displayed on various websites]
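As a rough sketch of what a faceted request looks like from PHP, the code below builds a query string that asks Solr for facet counts and converts the counts returned by the JSON response writer into a PHP map. The helper names and the brand and category fields are hypothetical; the facet and facet.field parameters, and the flat [value, count, value, count, ...] list that the JSON writer returns by default, are standard Solr.

```php
<?php
// Hypothetical helper: builds a query string asking Solr for facet counts
// on the given fields.
function buildFacetQuery($query, array $facetFields)
{
    $params = array(
        'q'           => $query,
        'wt'          => 'json',
        'facet'       => 'true',
        'facet.field' => $facetFields,
    );
    $qs = http_build_query($params, '', '&');
    // http_build_query() emits facet.field%5B0%5D=..., but Solr expects the
    // parameter to be repeated without the [n] suffix.
    return preg_replace('/%5B\d+%5D=/', '=', $qs);
}

// Hypothetical helper: turns the flat list [value1, count1, value2, count2]
// found under facet_counts.facet_fields.<field> into value => count.
function facetListToMap(array $flat)
{
    $map = array();
    for ($i = 0; $i + 1 < count($flat); $i += 2) {
        $map[$flat[$i]] = $flat[$i + 1];
    }
    return $map;
}

// Sample values, made up for this example.
$qs     = buildFacetQuery('solr', array('brand', 'category'));
$counts = facetListToMap(array('acme', 12, 'globex', 7));
```

The resulting map can be rendered directly as a list of links, each one adding a filter on the selected value.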

How to use Solr with a PHP application?

Solr has a full REST-style HTTP interface, which makes it very easy to talk to.
It can output responses in different formats (XML, JSON, etc.). It can even output responses as PHP or PHPS.

  • PHPS = Serialized PHP
  • PHP = PHP code

We will discuss later how you can enable and use this feature.
Some open source libraries are available in PHP. There is also a PHP extension for Solr.

  • PHPSolrClient
  • SimpleSolr
  • Solr PHP extension

Architecture
In this article, we will use a custom-made client (named SimpleSolr). The SimpleSolr class is available for download but, as said earlier, there are many existing frameworks and libraries that can be used for the same purpose. I personally decided to build my own little class in order to learn more about the Solr API.
In this tutorial we will assume that you have a working Apache / PHP 5.2+ installation and that you are running on a Unix platform. The first thing you need to do is install Solr.

Installing Lucidworks Solr

We recommend using the LucidWorks package for Solr. Based on the latest stable release of Apache Solr, it includes major new features and enhancements. For further details, you can check their website (www.lucidimagination.com). Here is the URL to download it: http://www.lucidimagination.com/Downloads/LucidWorks-for-Solr
Installing Solr is very easy: LucidWorks offers an installer that runs on both Windows and Linux, since it is a .jar.
You will need to install a JRE. For all the details on how to install LucidWorks, please refer to this documentation: http://www.lucidimagination.com/search/document/CDRG_ch02_2.1
When you are ready, go to the folder where Solr is installed. To see all the options, type:
sh lucidworks.sh --help

Using CATALINA_BASE:   /var/www/solr/lucidworks/tomcat
Using CATALINA_HOME:   /var/www/solr/lucidworks/tomcat
Using CATALINA_TMPDIR: /var/www/solr/lucidworks/tomcat/temp
Using JRE_HOME:       ./../jre
Usage: catalina.sh ( commands ... )
commands:
debug             Start Catalina in a debugger
debug -security   Debug Catalina with a security manager
jpda start        Start Catalina under JPDA debugger
run               Start Catalina in the current window
run -security     Start in the current window with security manager
start             Start Catalina in a separate window
start -security   Start in a separate window with security manager
stop              Stop Catalina
stop -force       Stop Catalina (followed by kill -KILL)
version           What version of tomcat are you running?

To start Solr, type the following:
sh lucidworks.sh start

Using CATALINA_BASE:   /var/www/solr/lucidworks/tomcat
Using CATALINA_HOME:   /var/www/solr/lucidworks/tomcat
Using CATALINA_TMPDIR: /var/www/solr/lucidworks/tomcat/temp
Using JRE_HOME:       ./../jre

You can now check that Solr is running by going to http://localhost:8983/solr/admin. You should see the Solr admin page.
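The same check can be done from PHP, which is handy in a deployment or monitoring script. The sketch below is only an illustration: the helper names (lastStatusCode, isSolrUp) and the two-second timeout are assumptions, and it relies on PHP's HTTP stream wrapper filling in $http_response_header. Because the server may answer with a redirect first, the code looks at the last status line rather than the first.

```php
<?php
// Hypothetical helper: returns the status code of the last status line
// ("HTTP/1.1 200 OK") found in a list of raw header lines, or null if
// there is none. After a redirect, the list holds one status line per hop.
function lastStatusCode(array $headerLines)
{
    $status = null;
    foreach ($headerLines as $line) {
        if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Hypothetical helper: returns true if $url finally answers with HTTP 200.
function isSolrUp($url, $timeoutSeconds = 2)
{
    $context = stream_context_create(array(
        'http' => array('timeout' => $timeoutSeconds, 'ignore_errors' => true),
    ));
    $body = @file_get_contents($url, false, $context);
    if ($body === false || empty($http_response_header)) {
        return false; // connection refused, timeout, DNS failure, ...
    }
    return lastStatusCode($http_response_header) === 200;
}
```

With Solr started as above, isSolrUp('http://localhost:8983/solr/admin') should return true, and false once the server has been stopped.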

[Figure: Solr Admin]
Now that you have an install ready to be used, let's build a simple search.

Configuring schema.xml and solrconfig.xml

You can configure Solr to use several cores. That way, the same Solr instance can serve several applications. For more details, see the following link: http://wiki.apache.org/solr/CoreAdmin

You need to edit the solr.xml file and add the new core:

<solr persistent="false">
  <cores adminPath="/admin/cores">
    <core name="tutorial" instanceDir="tutorial" />
    <!-- You can add new cores here -->
  </cores>
</solr>

Then edit the solrconfig.xml file and make sure the right path is set for the dataDir property.

<!-- Used to specify an alternate directory to hold all index data
     other than the default ./data under the Solr home.
     If replication is in use, this should match the replication configuration. -->
<dataDir>${solr.solr.home}/tutorial/data</dataDir>

Also make sure that the PHP and PHPS response writers are enabled! Otherwise, requests for those formats won't work.

<queryResponseWriter name="php" class="org.apache.solr.request.PHPResponseWriter"/>
<queryResponseWriter name="phps" class="org.apache.solr.request.PHPSerializedResponseWriter"/>
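Once the writers are enabled, a query against the tutorial core could look roughly like the sketch below. The searchCore function is a hypothetical helper written for this illustration, not part of SimpleSolr; it assumes that each core is served under its own path (/solr/tutorial/select here) and it takes the HTTP transport as a parameter so that it can be exercised without a running server. The allowed_classes option assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: runs a query against one core and returns the decoded
// response. $fetch is any callable that takes a URL and returns the body;
// it defaults to file_get_contents and can be replaced by a stub in tests.
function searchCore($baseUrl, $core, $query, $fetch = 'file_get_contents')
{
    $url = sprintf(
        '%s/%s/select?%s',
        rtrim($baseUrl, '/'),
        rawurlencode($core),
        http_build_query(array('q' => $query, 'wt' => 'phps'), '', '&')
    );

    $body = call_user_func($fetch, $url);

    // A phps response is a serialized PHP array; anything else (an XML body,
    // an HTML error page, a failed request) is treated as an error.
    $data = is_string($body)
        ? @unserialize($body, array('allowed_classes' => false))
        : false;
    if (!is_array($data)) {
        throw new RuntimeException("Unexpected response from $url");
    }
    return $data;
}
```

A call such as searchCore('http://localhost:8983/solr', 'tutorial', 'title:solr') would then return the response as a plain PHP array. If the body is not serialized PHP, which is what you would likely see when the phps writer is not enabled, the function throws instead of returning garbage.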

Now edit the conf/schema.xml file and add the following fields within the <fields> node:

<!-- Unique Solr document ID (see <uniqueKey>) -->
<field name="solrDocumentId" type="string" indexed="true" stored="true" required="true" />
<!-- Fields for searching -->
<field name='id'       type='integer' indexed='true' stored='false' />
<field name='text'     type='text'    indexed='true' stored='false' multiValued='true' />
<field name='title'    type='text'    indexed='true' stored='false' />
<field name='author'   type='text'    indexed='true' stored='false' />
<!-- Facet fields for searching -->
<field name='category' type='text_facet' indexed='true' stored='false' multiValued='true' />
<field name='concept'  type='text_facet' indexed='true' stored='false' multiValued='true' />
<field name='location' type='text_facet' indexed='true' stored='false' multiValued='true' />
<field name='person'   type='text_facet' indexed='true' stored='false' multiValued='true' />
<field name='company'  type='text_facet' indexed='true' stored='false' multiValued='true' />

In this article, we will index and search articles. An article has a title, a text, an author and some special metadata: category, concept, location, person and company.
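As an illustration of how an article could be mapped onto these fields before indexing, here is a minimal sketch. The articleToSolrDocument function, the shape of the input array and the "article-<id>" format used for solrDocumentId are all assumptions made for this example; only the field names come from the schema above.

```php
<?php
// Hypothetical helper: maps an article (an associative array) to the fields
// declared in schema.xml.
function articleToSolrDocument(array $article)
{
    if (!isset($article['id'])) {
        throw new InvalidArgumentException('An article needs an id');
    }

    $doc = array(
        // Required unique key; the "article-<id>" format is an assumption.
        'solrDocumentId' => 'article-' . $article['id'],
        'id'             => (int) $article['id'],
    );

    // Single-valued text fields.
    foreach (array('title', 'author') as $field) {
        if (isset($article[$field])) {
            $doc[$field] = (string) $article[$field];
        }
    }

    // Fields declared multiValued="true": always store a list of values,
    // even when the article carries a single string.
    $multiValued = array('text', 'category', 'concept', 'location', 'person', 'company');
    foreach ($multiValued as $field) {
        if (isset($article[$field])) {
            $doc[$field] = array_values((array) $article[$field]);
        }
    }

    return $doc;
}

// Sample article, made up for this example.
$doc = articleToSolrDocument(array(
    'id'       => 42,
    'title'    => 'Searching with Solr',
    'category' => 'php',
    'company'  => array('Apache', 'LucidWorks'),
));
```

Since solrDocumentId is the only field declared with stored="true", search results will presumably contain nothing but that identifier, so the application would use it to load the full article from its own storage.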

Make sure to restart Solr every time you change your solrconfig.xml or schema.xml files!

Now it is time to start on the PHP code.

A simple search controller

References

IBM article on Solr
http://www.ibm.com/developerworks/opensource/library/os-php-apachesolr/

Refer to http://wiki.apache.org/solr/SolPHP?highlight=((CategoryQueryResponseWriter)) for more information.

Who am I?

My name is Bashar Al-Fallouji, I work as a Service Architect at Origin Energy (Sydney, Australia).

I am particularly interested in Web applications, Open Source Development, Software Engineering, Information Architecture, Unit Testing, XP/Agile development, etc.

On this blog, you will find mostly technical articles and thoughts around PHP, OOP, OOD, Unit Testing, etc. I am also sharing a few open source tools and scripts.
