The objective of this article is to provide insight into upgrading / migrating from Solr 3.x to Solr 4. Every organization has its own set of requirements, so its Solr configuration files reflect those business requirements. As a result, every Solr configuration is unique, both in its schema (fields and data structure, analyzers, tokenizers, stopwords, boosting and blocking, synonyms, etc.) and in its solrconfig (indexConfig, request handlers, etc.).
Improvements and New Features
Solr 4 brings numerous improvements and new features. This section gives a bird's-eye view of Solr 3.x versus Solr 4. Solr 4 also contains numerous bug fixes, optimizations, and improvements since the Solr 4 Beta release.
New Hardware Requirements
Solr 4 does bring a new hardware requirement if you want to take advantage of SolrCloud. SolrCloud uses a ZooKeeper cluster as:
- The central configuration store for the cluster
- A co-ordinator for operations requiring distributed synchronization
- The system-of-record for cluster topology
For production, it is recommended to use an external ZooKeeper ensemble rather than having Solr run embedded ZooKeeper server(s). See the article 'Zookeeper Cluster (Multi-Server) Setup'. This is why SolrCloud brings a new hardware requirement.
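As a rough illustration of how an external ensemble is addressed, the sketch below joins a list of ZooKeeper servers into the comma-separated connection string that a zkHost setting expects. The host names, the helper name build_zk_host, and the optional chroot suffix are assumptions made for this example; 2181 is ZooKeeper's default client port.

```python
def build_zk_host(servers, chroot=""):
    """Join ZooKeeper servers into a connection string,
    e.g. 'zk1:2181,zk2:2181,zk3:2181/solr'."""
    if not servers:
        raise ValueError("at least one ZooKeeper server is required")
    parts = []
    for server in servers:
        # Append ZooKeeper's default client port when none is given.
        parts.append(server if ":" in server else f"{server}:2181")
    # An optional chroot path is appended once, after the last server.
    if chroot and not chroot.startswith("/"):
        chroot = "/" + chroot
    return ",".join(parts) + chroot


if __name__ == "__main__":
    # Hypothetical three-node ensemble.
    print(build_zk_host(["zk1.example.com", "zk2.example.com", "zk3.example.com"]))
```

An odd number of servers (three or five) is the usual choice for an ensemble, because ZooKeeper needs a majority of nodes to stay available.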
Upgrade / Migrate Solr 3.x to Solr 4
This article lists the changes required in your existing Solr configuration, which helps reduce the risk of errors during the upgrade. We will go through the changes one configuration file at a time, beginning with solrconfig.xml:
- solrconfig.xml
- schema.xml
- solr.xml
solrconfig.xml Configuration
We will go step by step through the configuration changes that need attention. Open the solrconfig.xml file in your favorite text editor and make the following changes.
Step 1
Update the luceneMatchVersion value to:
LUCENE_40
Step 2
Solr 4 introduces the soft commit option. A soft commit behaves like an auto commit in that it makes changes visible to searches, but it does not ensure that data is synced to disk. This makes it faster and more near-real-time friendly.
<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>
Step 3
Solr 4.0 introduces the transaction log, which is used for the real-time get operation. The update log accepts a dir parameter naming the directory in which to store the transaction log; it defaults to the Solr data directory. The updateLog configuration below should reside inside your <updateHandler ...> .... </updateHandler>
<updateLog>
  <str name="dir">${solr.data.dir:}</str>
</updateLog>
Step 4
Once the updateLog configuration is defined, a real-time get handler is required as well.
<requestHandler name="/get" class="solr.RealTimeGetHandler">
  <lst name="defaults">
    <str name="omitHeader">true</str>
    <str name="wt">json</str>
    <str name="indent">true</str>
  </lst>
</requestHandler>
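To show how the /get handler is addressed once it is in place, here is a minimal sketch that builds a real-time get URL for one or more document ids. The base URL and document ids are hypothetical; the id and ids request parameters are the ones RealTimeGetHandler reads, and the /get path matches the handler name defined above.

```python
from urllib.parse import urlencode


def realtime_get_url(base_url, doc_ids):
    """Build a real-time get URL for one or several document ids."""
    ids = list(doc_ids)
    if not ids:
        raise ValueError("at least one document id is required")
    if len(ids) == 1:
        # A single document is requested with the 'id' parameter.
        query = urlencode({"id": ids[0]})
    else:
        # Several documents are requested as a comma-separated 'ids' list.
        query = urlencode({"ids": ",".join(ids)})
    return f"{base_url.rstrip('/')}/get?{query}"


if __name__ == "__main__":
    # Hypothetical Solr base URL and document ids.
    print(realtime_get_url("http://localhost:8983/solr", ["doc1", "doc2"]))
```

Fetching such a URL should return the latest version of a document even before a commit has made it searchable, which is the point of pairing the handler with the transaction log.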
Step 5: Solr Replication
Classic Solr: the role of the replication handler remains as-is. In SolrCloud mode, however, the replication handler is mandatory; SolrCloud uses it to bulk-transfer segments when nodes are added or need to recover. Apply the change that matches your deployment:
Classic Solr migration/upgrade: preserve the existing replication handler definition as-is.
SolrCloud upgrade: remove the existing replication handler definition and add the one below for SolrCloud mode.
<requestHandler name="/replication" class="solr.ReplicationHandler" startup="lazy" />
Step 6: Solr index Configuration
The <indexDefaults> and <mainIndex> configuration sections were deprecated in Solr 3.6 and discontinued in Solr 4.0; a new <indexConfig> section replaces them. Remove the old sections and add a new <indexConfig> section.
Sample <indexConfig> section:
<indexConfig>
  <ramBufferSizeMB>32</ramBufferSizeMB>
  <maxBufferedDocs>1000</maxBufferedDocs>
  <mergeFactor>10</mergeFactor>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
  </mergePolicy>
  <!-- Defaults: 'native' is default for Solr3.6 and later, otherwise 'simple' is the default -->
  <lockType>native</lockType>
  <reopenReaders>true</reopenReaders>
  ...
  ...
</indexConfig>
Since Solr 3.6, the <useCompoundFile> value is false by default.
The <maxFieldLength> setting is discontinued; to achieve similar behavior, include LimitTokenCountFilterFactory in your fieldType definition.
For example:
<filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="10000"/>
Step 7
Make a list of the external library dependencies referenced in solrconfig.xml; use it to obtain those libraries from the Solr 4.0 artifact (apache-solr-4.x.x.zip) and place them appropriately during the upgrade.
We are done with the solrconfig.xml configuration changes; let's move on to schema.xml.
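The solrconfig.xml steps above can be turned into a quick self-check. The following is a minimal sketch, not an official Solr tool: the function name check_solrconfig and the wording of the messages are made up for this example. It only looks at the items covered in this article (luceneMatchVersion, the removed index sections, updateLog, and the real-time get handler).

```python
import xml.etree.ElementTree as ET


def check_solrconfig(xml_text, expected_version="LUCENE_40"):
    """Return a list of upgrade problems found in a solrconfig.xml string."""
    root = ET.fromstring(xml_text)
    problems = []

    # Step 1: luceneMatchVersion should match the target release.
    version = root.findtext("luceneMatchVersion", default="").strip()
    if version != expected_version:
        found = version or "missing"
        problems.append(f"luceneMatchVersion is {found}, expected {expected_version}")

    # Step 6: the old index sections are discontinued in Solr 4.0.
    for old_tag in ("indexDefaults", "mainIndex"):
        if root.find(old_tag) is not None:
            problems.append(f"<{old_tag}> must be replaced by <indexConfig>")

    # Step 3: the transaction log lives inside <updateHandler>.
    if root.find("updateHandler/updateLog") is None:
        problems.append("<updateLog> is missing from <updateHandler>")

    # Step 4: updateLog needs a real-time get handler.
    handler_classes = [h.get("class") for h in root.findall("requestHandler")]
    if "solr.RealTimeGetHandler" not in handler_classes:
        problems.append("no solr.RealTimeGetHandler request handler is defined")

    return problems


if __name__ == "__main__":
    # Hypothetical path to the file being upgraded.
    with open("solrconfig.xml", encoding="utf-8") as handle:
        for problem in check_solrconfig(handle.read()):
            print(problem)
```

An empty result only means these particular items look right; it does not validate the rest of the configuration.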
schema.xml Configuration
schema.xml needs fewer changes for the upgrade/migration. They are:
Step 1
Update the schema version to match Solr 4, i.e. 1.5:
<schema name="customer-data" version="1.5">
Some background on schema versions (taken from the version notes in the schema.xml shipped with the Solr artifact):
1.0: multiValued attribute did not exist, all fields are multiValued by nature
1.1: multiValued attribute introduced, false by default
1.2: omitTermFreqAndPositions attribute introduced, true by default except for text fields
1.3: removed optional field compress feature
1.4: autoGeneratePhraseQueries attribute introduced to drive QueryParser behavior when a single string produces multiple tokens. Defaults to off for version >= 1.4
1.5: omitNorms defaults to true for primitive field types (int, float, boolean, string...)
Step 2
Note that the updateLog definition added to solrconfig.xml for near-real-time functionality requires the field below in schema.xml. Add the field definition inside the <fields> ..... </fields> tag:
<field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
Step 3
org.apache.lucene.search.DefaultSimilarity has been refactored to org.apache.lucene.search.similarities.DefaultSimilarity; if your schema.xml has this reference, update the reference
<similarity class="org.apache.lucene.search.similarities.DefaultSimilarity"/>
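The three schema.xml steps can be checked the same way. This is a minimal sketch under the same assumptions as before: check_schema is a hypothetical helper name, and it covers only the schema version, the _version_ field, and the old similarity class name.

```python
import xml.etree.ElementTree as ET

# Class name used before the refactoring described in Step 3.
OLD_SIMILARITY = "org.apache.lucene.search.DefaultSimilarity"


def check_schema(xml_text):
    """Return a list of upgrade problems found in a schema.xml string."""
    root = ET.fromstring(xml_text)
    problems = []

    # Step 1: the version attribute sits on the root <schema> element.
    if root.get("version") != "1.5":
        problems.append(f"schema version is {root.get('version')}, expected 1.5")

    # Step 2: updateLog relies on a _version_ field.
    field_names = {field.get("name") for field in root.iter("field")}
    if "_version_" not in field_names:
        problems.append("field _version_ is missing")

    # Step 3: the similarity class moved to the 'similarities' package.
    for similarity in root.iter("similarity"):
        if similarity.get("class") == OLD_SIMILARITY:
            problems.append("similarity still references " + OLD_SIMILARITY)

    return problems
```

Calling check_schema on the contents of an upgraded schema.xml should return an empty list for these three items.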
solr.xml Configuration
solr.xml provides the list of Solr cores to Solr. Solr 4 extends it with options for SolrCloud. The available options are listed below (the tags and attributes cover everything available up to Solr 4).
XML Tag ‘<solr …>’ attributes are
- persistent: persisting configuration changes to disk – default is false
- sharedLib: Shared library directory path for Solr cores – default is null
- zkHost: ZooKeeper host name with port number; if absent, Solr tries to read it from a system property. More importantly, this attribute determines whether Solr starts in Cloud mode or Classic mode – default is null
- coreLoadThreads: Number of Solr core loading threads – default is 3; a minimum of 2 is required
XML Tag ‘<cores …>’ attributes are
- adminPath: Path of the request handler used for Solr core management. It is a mandatory attribute – default is null
- defaultCoreName (optional): Solr core name used when no core name is specified in the request URL
- zkClientTimeout: ZooKeeper client connection negotiation timeout in milliseconds – default is 15000
- host (optional): Host name of Solr – default is localhost
- hostPort: Solr host port number, e.g. 8080, 9090, etc. – default is 8983
- hostContext: Solr context name – default is solr, e.g. http://localhost:8080/<hostContext>
- leaderVoteWait: Shard leader vote waiting in milliseconds – default is 180000
- shareSchema: sharing Solr schema configuration – default is false
- adminHandler: Solr Administration handler – default is null
- managementPath: Solr Management path – default is null
XML Tag ‘<core …>’ attributes are
- name: Name of the Solr core – default is collection1
- shard: Name of the Solr shard, e.g. shard1, shard2, etc.
- collection: Name of the Solr Collection – default is collection1
- instanceDir: Solr core directory
- dataDir: Data directory of the Solr core
- schema: Name of the Solr core schema file – default is schema.xml
- config: Name of the Solr core config file – default is solrconfig.xml
- properties: The solr core properties file name
- loadOnStartup: Boolean value for Solr core loading – default is null
- swappable: Boolean value for whether the Solr core is swappable – default is null
Sample solr.xml definition for SolrCloud; define yours similarly:
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores" zkClientTimeout="${zkClientTimeout:}" hostPort="${port:}" hostContext="${hostContext:}">
    <core schema="schema.xml" shard="shard2" instanceDir="europe-collection/" name="europe-collection" config="solrconfig.xml" collection="europe-collection"/>
    <core schema="schema.xml" shard="shard1" instanceDir="shard1-replica-1/" name="shard1-replica-1" config="solrconfig.xml" collection="europe-collection"/>
    <core schema="schema.xml" shard="shard3" instanceDir="shard3-replica-2/" name="shard3-replica-2" config="solrconfig.xml" collection="europe-collection"/>
  </cores>
</solr>
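When a solr.xml lists many cores, it can help to see which cores back which shard of which collection. The sketch below groups the <core> entries of a solr.xml string by their collection and shard attributes. The helper name shards_by_collection and the "(unspecified)" placeholder for missing attributes are assumptions for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def shards_by_collection(solr_xml_text):
    """Group core names by collection and shard, as declared in solr.xml."""
    root = ET.fromstring(solr_xml_text)
    layout = defaultdict(lambda: defaultdict(list))
    for core in root.iter("core"):
        # Missing attributes are reported with a placeholder, not guessed.
        collection = core.get("collection", "(unspecified)")
        shard = core.get("shard", "(unspecified)")
        layout[collection][shard].append(core.get("name"))
    # Convert the nested defaultdicts into plain dictionaries.
    return {name: dict(shards) for name, shards in layout.items()}


if __name__ == "__main__":
    # Hypothetical path to the solr.xml being reviewed.
    with open("solr.xml", encoding="utf-8") as handle:
        for collection, shards in shards_by_collection(handle.read()).items():
            for shard, cores in sorted(shards.items()):
                print(collection, shard, ", ".join(cores))
```

For a definition like the sample above, this would report one collection, europe-collection, with one core listed under each of shard1, shard2, and shard3.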
We are done with solr.xml, let’s move on!
I have upgraded Solr Configuration files. What next?
Classic Solr:
- Place your newly upgraded Solr configuration files (solrconfig.xml, schema.xml) in their respective locations
- Deploy your Solr 4 WAR file
- Start the Solr Instance(s)
SolrCloud:
Check out the article 'SolrCloud Cluster (Single Collection) Deployment'.
Pay Attention to Deprecated Elements in Solr 4
The next important thing to take care of is the set of items deprecated in Solr 4.0. For reference, the Lucene core and Solr core 4.0 Javadocs are at:
http://lucene.apache.org/core/4_0_0/core/deprecated-list.html and http://lucene.apache.org/solr/4_0_0/solr-core/index.html
Another, simpler way to find deprecated classes in your Solr configuration:
Look for the string 'WARNING: Using deprecated class:' in your Solr log file, then address those deprecated classes
For instance, if you find the lines below in the log, replace those classes with the UpdateRequestHandler class. Handle other warnings similarly.
...
WARNING: Using deprecated class: XmlUpdateRequestHandler -- replace with UpdateRequestHandler
...
WARNING: Using deprecated class: BinaryUpdateRequestHandler -- replace with UpdateRequestHandler
...
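Searching the log by hand works, but the same lookup is easy to script. The sketch below assumes the warning lines have exactly the shape shown in the log excerpt above; the helper name find_deprecated_classes and the log file path are hypothetical.

```python
import re

# Matches lines such as:
# WARNING: Using deprecated class: XmlUpdateRequestHandler -- replace with UpdateRequestHandler
DEPRECATED = re.compile(
    r"WARNING: Using deprecated class: (\S+) -- replace with (\S+)"
)


def find_deprecated_classes(log_lines):
    """Map each deprecated class found in the log to its suggested replacement."""
    replacements = {}
    for line in log_lines:
        match = DEPRECATED.search(line)
        if match:
            replacements[match.group(1)] = match.group(2)
    return replacements


if __name__ == "__main__":
    # Hypothetical log location; adjust to your servlet container's log file.
    with open("solr.log", encoding="utf-8", errors="replace") as handle:
        for old, new in sorted(find_deprecated_classes(handle).items()):
            print(f"{old} -> {new}")
```

Each class is reported once no matter how often the warning repeats, which gives a short to-do list for the configuration files.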
What will happen to existing Solr Index?
The last important concern for everyone is: 'What will happen to the existing Solr index?'
Classic Solr
When migrating from 3.x classic Solr to 4.x classic Solr, issue an optimize command after the migration; Solr will take care of the rest (index optimization, index format, etc.). To take advantage of Solr 4 features such as real-time get, a full re-index is required. Otherwise, to keep using the older index version, set luceneMatchVersion to your existing version, for example LUCENE_33.
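One common way to issue the optimize command is an HTTP request to the update handler with optimize=true. The sketch below assumes the update handler is mounted at /update and uses a hypothetical base URL. The long timeout is also an assumption, since optimizing a large index can take a while.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def optimize_url(base_url):
    """Build the URL that asks the update handler to optimize the index."""
    query = urlencode({"optimize": "true"})
    return f"{base_url.rstrip('/')}/update?{query}"


def optimize(base_url, timeout=3600):
    """Send the optimize request and return the HTTP status code."""
    with urlopen(optimize_url(base_url), timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical Solr base URL; add the core name if you run multiple cores.
    print(optimize("http://localhost:8983/solr"))
```

Run it once after Solr 4 has started successfully on the existing index, and take a backup of the index directory first.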
SolrCloud
When upgrading / migrating from Solr 3.x to Solr 4 (SolrCloud), a full re-index is required and recommended in order to take advantage of Solr 4/SolrCloud features such as shard ranges, near-real-time search, document versions and counts, SolrCloud clustering capabilities, etc.
Your Migration/Upgrade Journey Starts Here
I hope this article gives you an idea of, and insight into, the elements to take care of when migrating / upgrading from Solr 3.x to Solr 4. All the best!