Friday, November 13, 2015

Sitecore indexes

There are already a lot of blog posts describing the use of Sitecore indexes, especially since Sitecore 7 and the introduction of the ContentSearchManager and the ease to use them.
And still.. I see lots of people writing queries going through lots of items. So: yet another post to promote the use of indexes.

Sitecore indexes, indexes, indexes!

Sitecore has some built-in indexes (since Sitecore 8 even more). The best know are probably the sitecore_master and sitecore_web indexes. Personally I never use those. I always create a custom index. Why? 
  • I don't want to mess with the indexes that Sitecore uses
  • I want my indexes small and lean
    • faster (re)build
    • easier to check

When to use?

It's hard to say exactly when to use an index, but I'll try to give some common real-life examples of request where I almost always think "index":
  • fetch all news items from year x
  • fetch all products from category x
  • fetch all events happening in the future
  • fetch the latest news items
  • get the last 3 blog posts written by x
  • ...
Too often developers write queries which are fast with the test data.. but after a while the real data is has outgrown the solution and it gets slow..  So it's better to think ahead and make more use of those indexes. In lots of cases the result will be faster than a (fast) query.

Create a custom index

As there already are a lot of examples out there, just a fast introduction on how you can create (and use) a custom index. For more information on all the possibilities, check the Sitecore docs (or the default index config files which include examples and comments).

Configuration

I usually create a separate config file where I put the index definition and configuration together. 

Example index definition:

<configuration type="Sitecore.ContentSearch.ContentSearchConfiguration, Sitecore.ContentSearch">
  <indexes hint="list:AddIndex">
    <index id="MyCustom_index" type="Sitecore.ContentSearch.LuceneProvider.LuceneIndex, Sitecore.ContentSearch.LuceneProvider">
      <param desc="name">$(id)</param>
      <param desc="folder">$(id)</param>
      <!-- This initializes index property store. Id has to be set to the index id -->
      <param desc="propertyStore" ref="contentSearch/indexConfigurations/databasePropertyStore" param1="$(id)" />
      <configuration ref="contentSearch/indexConfigurations/myCustomIndexConfiguration" />
      <strategies hint="list:AddStrategy">
       <!-- NOTE: order of these is controls the execution order -->
       <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/onPublishEndAsync" />
      </strategies>
      <commitPolicyExecutor type="Sitecore.ContentSearch.CommitPolicyExecutor, Sitecore.ContentSearch">
        <policies hint="list:AddCommitPolicy">
          <policy type="Sitecore.ContentSearch.TimeIntervalCommitPolicy, Sitecore.ContentSearch" />
        </policies>
      </commitPolicyExecutor>
      <locations hint="list:AddCrawler">
        <crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
         <Database>web</Database>
           <Root>/sitecore/content/Corporate</Root>
        </crawler>
      </locations>
    </index>
  </indexes>
</configuration>

In this example I used the LuceneProvider, the onPublishEndAsync update strategy (info on update strategies by John West here) and refer to my custom configuration.
Note that I added a crawler for the web database and gave it a root path (can also be an ID).


Example index configuration:

<indexConfigurations>
  <myCustomIndexConfiguration type="Sitecore.ContentSearch.LuceneProvider.LuceneIndexConfiguration, Sitecore.ContentSearch.LuceneProvider">
    <indexAllFields>true</indexAllFields>
    <initializeOnAdd>true</initializeOnAdd>
    <analyzer ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/analyzer" />
    <documentBuilderType>Sitecore.ContentSearch.LuceneProvider.LuceneDocumentBuilder, Sitecore.ContentSearch.LuceneProvider</documentBuilderType>
    <fieldMap type="Sitecore.ContentSearch.FieldMap, Sitecore.ContentSearch">
      <fieldNames hint="raw:AddFieldByFieldName">
        <field fieldName="_uniqueid" storageType="YES" indexType="TOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider">
          <analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.LowerCaseKeywordAnalyzer, Sitecore.ContentSearch.LuceneProvider" />
        </field>
        <field fieldName="__sortorder" storageType="YES" indexType="UNTOKENIZED" vectorType="NO" boost="1f" type="System.Integer" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider"/>
        <field fieldName="title" storageType="YES" indexType="TOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider">
          <analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.LowerCaseKeywordAnalyzer, Sitecore.ContentSearch.LuceneProvider" />
        </field>
        <field fieldName="date" storageType="YES" indexType="UNTOKENIZED" vectorType="NO" boost="1f" type="System.DateTime" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider"/>
        <field fieldName="sequence" storageType="YES" indexType="UNTOKENIZED" vectorType="NO" boost="1f" type="System.Integer" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider"/>
        <field fieldName="topics" storageType="YES" indexType="UNTOKENIZED" vectorType="NO" boost="1f" type="System.Guid" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider"/>
        <field fieldName="applications" storageType="YES" indexType="UNTOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider"/>
      </fieldNames>
    </fieldMap>
    <include hint="list:IncludeTemplate">
      <NewsTemplate>{5CD362B8-C129-437A-A0D4-4EE58E71FEB1}</NewsTemplate>
      <ProductTemplate>{18D5467C-79F9-405B-AA87-2BA4B7CDB443}</ProductTemplate>
      <EventTemplate>{6CA9AC2A-1A9D-429B-870C-FC9417D3A1C7}</EventTemplate>
    </include>
    <fieldReaders ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/fieldReaders"/>
    <indexFieldStorageValueFormatter ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/indexFieldStorageValueFormatter"/>
    <indexDocumentPropertyMapper ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/indexDocumentPropertyMapper"/>
  </myCustomIndexConfiguration>
</indexConfigurations>


Notice here:

  • the "indexAllFields" : if not true, you need to define the fields in an include (just like the IncludeTemplate, but then with includeField)
  • the "fieldMap": here we define the field options: storageType, type (can be string, int, guid, date, ...) and if needed an analyzer (all information on analyzers by Adam Conn here)
  • the IncludeTemplate section where we define the templates of the items to include in the index (the guid is important, the name is useful for understanding the config)

Standard Sitecore fields
Most of the standard Sitecore fields are included automatically. 
Some are not, but you can include them (in the example above the SortOrder field is included).


After creating the configuration you should see your index in the Index Manager in Sitecore and you can rebuild it. Check your data in the index with a tool like Luke. This way you are sure your config is good before you start using it. Luke is also handy later on to check your queries.

Computed fields

Computed fields are fields that are added to the index through custom code (the value is computed instead of just fetched from a field). Off course, computed fields can also be added to a custom index.

Querying your index

Sitecore has a class SearchResultItem that can be used to fetch results from the index, but in most cases you will want to extend this class.

Example SearchResultItem:


public class EventItem : SearchResultItem
public class EventItem : SearchResultItem
{
  [IndexField("title")]
  public string Title { get; set; }
  
  [IndexField("startdate")]
  public DateTime StartDate { get; set; }
  
  [IndexField("profile")]
  public ID Profile { get; set; }
}
We use the SearchContext to do the actual query. Example code:

private IEnumerable<EventItem> GetEventItems()
{
  var templateRestrictions = new List<ID>
  {
    new ID(applicationSettings.EventsTemplateId)
  };

  using (var context = ContentSearchManager.GetIndex("MyCustom_index").CreateSearchContext())
  {
    var templatePredicate = PredicateBuilder.False<EventItem>();
    templatePredicate = templateRestrictions.Aggregate(templatePredicate, (current, template) => current.Or(p => p.TemplateId == template));
    var datePredicate = PredicateBuilder.True<EventItem>();
    datePredicate = datePredicate.And(p => p.StartDate >= DateTime.Today);
    var predicate = PredicateBuilder.True<EventItem>();
    predicate = predicate.And(templatePredicate);
    predicate = predicate.And(datePredicate);
    predicate = predicate.And(p => p.Language == Sitecore.Context.Language.Name);
    var query = context.GetQueryable<EventItem>(new CultureExecutionContext(Sitecore.Context.Language.CultureInfo)).Where(predicate).OrderBy(p => p.StartDate);
    var queryResults = query.GetResults();
    foreach (var hit in queryResults.Hits)
    {
      if (string.IsNullOrEmpty(hit.Document.Title))
      {
        continue;
      }

      yield return hit.Document;
    }
  }
}
We use "predicates" to define our query. I find them useful to create reusable code (not shown here), especially combined with generics. Predicates are created with the PredicateBuilder (use true for "and" and false for "or" queries).

First we defined a predicate to check the templateID (from a list of possibilities). We also check a datefield and in the end we have predicate for the language.

In the example we sort (OrderBy), but the queryable has also options to use paging, facetting, ... The resultSet include a list of results, but also the facets, the total number of results (important when paging), ..   

Make sure that if you sort you are using the correct types. Sorting numbers as string will give you unexpected results..

Fetching Sitecore items
It is also important to know that the results are not yet Sitecore items - we get the items we define (our SearchResultItem's). It is however quite easy to fetch the actual Sitecore items here, also using Glass if you want. Be aware though that the part after the index is sometimes the performance bottleneck: you wouldn't be the first to lose all performance benefits from the index by fetching too many Sitecore items or writing a slow Linq query after the search.

Before fetching the real Sitecore items (or Glass-mapped-classes), consider if you really need them. In lots of cases you will, but sometimes the information from the index can be sufficient and you can save even more time not retrieving actual items.


Logs

If your query returns unexpected results a good place to start looking in the search log file. All queries that are performed are logged there and if you are using Luke you can copy/paste the query in Luke and test it. 


Issues

There are some known issues.. some unknown as well. I have a few open tickets with Sitecore support regarding indexes at the moment, so maybe more posts will follow...

No comments:

Post a Comment