Copied to clipboard

Flag this post as spam?

This post will be reported to the moderators as potential spam to be looked at


  • Jon Dunfee 199 posts 468 karma points
    Oct 22, 2012 @ 22:53
    Jon Dunfee
    0

    XSLT Lucene results using multiple terms

    I won't go into the whole story, so I'll get to my issue.  Here's the snippet of logic I'm using to generate my results in case you need a visual:

            <xsl:variable name="resultsRaw">
            <results>
              <xsl:for-each select="msxml:node-set(umbraco.examine:SearchContentOnly($searchTerm,true()))/nodes/node" >
                <xsl:copy-of select="." />
              </xsl:for-each>
              <xsl:for-each select="msxml:node-set(umbraco.examine:Search($searchTerm,true(),'PDFSearcher'))/nodes/node" >
                <xsl:copy-of select="." />
              </xsl:for-each>
            </results>
          </xsl:variable>
                
          <xsl:variable name="results" select="msxml:node-set($resultsRaw)/results" />

    I only get results for a single search term, not even a phrase with two terms.  Looking at UmbracoExamine.XsltExtensions it boils down to the following:

     

            internal static XPathNodeIterator Search(string searchText, bool useWildcards, LuceneSearcher provider, string indexType)

            {

                if (provider == null) throw new ArgumentNullException("provider");

     

                var results = provider.Search(searchText, useWildcards, indexType);

                return GetResultsAsXml(results);           

            }

    Is there something I need to set in the configuration?  Just learning this Lucene stuff.  XSLTSearch has been enough for most clients, but this one needed PDFs indexed.  I must be having an ADD moment cause I can't seem to focus on which rabbit hole to jump down in.

  • Ismail Mayat 4511 posts 10059 karma points MVP 2x admin c-trib
    Oct 23, 2012 @ 11:20
    Ismail Mayat
    0

    Jon,

    When doing a non xslt examine query you have to split the keyword on whitespace then add each term to the query eg

                    if(qsValue.Contains(" "))

                    {

                        string[] terms = qsValue.Split(' ');

                        int termCount = 0;

                        foreach (var term in terms)

                        {

                           if(termCount==0){

                               queryToBuild = queryToBuild.And().Field(key, term);

      

                           }

                           else{

                               queryToBuild = queryToBuild.Or().Field(key, term);

                           } 

                           

                           

                           termCount++;

                        }

                    }

     

    So you will need to update the xsltextensions search code to do something simlar

  • Jon Dunfee 199 posts 468 karma points
    Oct 24, 2012 @ 15:29
    Jon Dunfee
    0

    This is what I got so far looking at some of your (@Ismail) past posts and others out there: 

    public static XPathNodeIterator Search(string searchTerm)

    {

        if (string.IsNullOrEmpty(searchTerm)) return null;

     

        var criteria = ExamineManager.Instance.CreateSearchCriteria(IndexTypes.Content);

     

        string[] terms = searchTerm.Split(' ');

        string[] fields = new string[] { "nodeName", "pageTitle", "metaKeywords", "metaDescription", "bodyContent" };

     

        Examine.SearchCriteria.IBooleanOperation filter = null;

     

        int i = 0;

        for (i = 0; i < terms.Length; i++)

        {

            if (filter == null)

                filter = criteria.GroupedOr(terms, terms[i]);

            else

                filter = filter.Or().GroupedOr(terms, terms[i]);

        }

     

        var searchProvider = ExamineManager.Instance.DefaultSearchProvider.Name;

        var provider = ExamineManager.Instance.SearchProviderCollection[searchProvider] as LuceneSearcher;

        var results = provider.Search(filter.Compile());

        return GetResultsAsXml(results);

    } 

    It's not generating any results.  I read the GroupedOr may be bugged, so I'm thinking that's where I'm right, but wrong.  Anyone have suggestions? 

  • Ismail Mayat 4511 posts 10059 karma points MVP 2x admin c-trib
    Oct 24, 2012 @ 15:47
    Ismail Mayat
    0

    Can you write out the generated query just after 

    var results = provider.Search(filter.Compile());

    writeout provider.ToString() that should give you the generated query. 

    Regards

    Ismail

  • Jon Dunfee 199 posts 468 karma points
    Oct 24, 2012 @ 19:16
    Jon Dunfee
    0

    provider.ToString() outputs => "UmbracoExamine.UmbracoExamineSearcher"

    filter.ToString() outputs => "Examine.LuceneEngine.SearchCriteria.LuceneBooleanOperation"

    I don't believe the output is what you are expecting.

  • Jon Dunfee 199 posts 468 karma points
    Oct 24, 2012 @ 23:42
    Jon Dunfee
    0

    Ok, I made some slight changes, however, I'm observing some interesting results, so I think it's the query build logic I'm struggling with.

    Here's my method:

     

    public static XPathNodeIterator Search(string searchTerm)

    {

        if (string.IsNullOrEmpty(searchTerm)) return null;

     

        var criteria = ExamineManager.Instance.CreateSearchCriteria(IndexTypes.Content);

     

        string[] terms = searchTerm.Split(' ');

        string[] fields = new string[] { "nodeName", "pageTitle", "metaKeywords", "metaDescription", "bodyContent" };

     

        Examine.SearchCriteria.IBooleanOperation filter = null;

     

        int i = 0;

        foreach (string field in fields)

        {

            for (i = 0; i < terms.Length; i++)

            {

                if (filter == null)

                    filter = criteria.Field(field, terms[i]);

                else

                    filter = filter.Or().Field(field, terms[i]);

            }

        }

     

        var searchProvider = ExamineManager.Instance.DefaultSearchProvider.Name;

        var provider = ExamineManager.Instance.SearchProviderCollection[searchProvider] as LuceneSearcher;

        var results = provider.Search(filter.Compile());

               

        return GetResultsAsXml(results);

    }

     

     

    I don't get any results; however, if I change the order of the fields I start getting results:

     

        string[] fields = new string[] { "metaKeywords", "metaDescription", "pageContent", "pageTitle", "nodeName" };

     

    I get some results.  Now, if I limit the fields to just { "pageContent" }  I get even more results.  That ain't right.

  • Jon Dunfee 199 posts 468 karma points
    Oct 25, 2012 @ 06:17
    Jon Dunfee
    0

    I believe I figured it out. I needed to set the default BooleanOperator to Or.

     

    var criteria = ExamineManager.Instance.CreateSearchCriteria(IndexTypes.Content, BooleanOperation.Or);

     

     

     

  • Jon Dunfee 199 posts 468 karma points
    Oct 30, 2012 @ 16:28
    Jon Dunfee
    0

    Here's the final results to help folks on their journey.  What it does: combines content & PDF search results utilizing Lucene in XSLT.  Where it falls short: It's lacking proper scoring as it combines two sets together as a string then I convert it to a node-set to traverse through.  Since this replaced XSLTSearch for me, you'll notice some of the same markup and style references.  Hopefully I'll have some free time I can continue the transformation to apply some of the same XSLTSearch parameters and enhancements to provide a fully functioning alternative.

    You need to create a PDF Index and I don't recommend using the default index for content; create your own (for the example below, I'm using the default).  So here's my XSLT:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE xsl:stylesheet [ <!ENTITY nbsp "&#x00A0;"> ]>
    <xsl:stylesheet 
      version="1.0" 
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
      xmlns:msxml="urn:schemas-microsoft-com:xslt"
      xmlns:msxsl="urn:schemas-microsoft-com:xslt"
      xmlns:umbraco.library="urn:umbraco.library" 
      xmlns:umbraco.examine="urn:umbraco.examine"
      xmlns:umbraco.page="urn:umbraco.page"
      exclude-result-prefixes="msxml umbraco.library umbraco.examine umbraco.page">

    <xsl:output method="html" omit-xml-declaration="yes"/>

    <xsl:param name="currentPage"/>
        
    <xsl:template match="/">

      <xsl:variable name="searchTerm" select="umbraco.library:RequestQueryString('q')" />
      
      <div id="xsltsearch">
        
        <form method="get">
          <input name="q" type="text" class="input" value="{umbraco.library:StripHtml($searchTerm)}" />
          <input type="submit" class="submit" value="Search"/>
        </form>
        
        <xsl:if test="string-length($searchTerm) > 0">
          
          <xsl:variable name="resultsRaw">
            <results>
              <xsl:for-each select="msxml:node-set(umbraco.page:Search($searchTerm))/nodes/node [string(hideFromSearch) != '1']" >
                <xsl:copy-of select="." />
              </xsl:for-each>
              <xsl:for-each select="msxml:node-set(umbraco.page:SearchPDF($searchTerm))/nodes/node" >
                <xsl:copy-of select="." />
              </xsl:for-each>
            </results>
          </xsl:variable>
                      
          <xsl:variable name="results" select="msxml:node-set($resultsRaw)/results" />
          <xsl:variable name="total" select="count($results/node)" />
                
          <xsl:choose>
            <xsl:when test="$total = 0">
              <id="xsltsearch_summary">No matches were found for <strong><xsl:value-of select="$searchTerm" /></strong></p>
            </xsl:when>
            <xsl:otherwise>
              <id="xsltsearch_summary">
                Your search for <strong><xsl:value-of select="$searchTerm" /></strong>
                <xsl:value-of select="umbraco.page:Pluralize($total,' match ',' matches ')" /> <strong><xsl:value-of select="$total" /></strong<xsl:value-of select="umbraco.page:Pluralize($total,' result ',' results ')" />.
              </p>
            </xsl:otherwise>
          </xsl:choose>
          
          <div id="xsltsearch_results">
            <xsl:for-each select="$results/node">          
              <xsl:sort select="./@score" order="descending" data-type="number" />
              
              <div class="xsltsearch_result">
                <xsl:choose>
                  <xsl:when test="./data [@alias='FileTextContent'] != ''">
                    <xsl:variable name="pdf" select="umbraco.library:GetMedia(./@id,0)" />
                    <class="xsltsearch_result_title xsltsearch_pdf">
                      <href="{$pdf/umbracoFile}" class="xsltsearch_title" target="_blank">
                        <xsl:value-of select="$pdf/@nodeName" disable-output-escaping="yes" />
                      </a>
                    </p>                
                    <class="xsltsearch_result_description">
                      <span class="xsltsearch_description">
                        <xsl:variable name="sample" select="umbraco.library:StripHtml(./data [@alias='FileTextContent'])" />
                        <xsl:value-of select="substring($sample,1,255)" />
                        <xsl:if test="string-length($sample) > 255">...</xsl:if>
                      </span>
                    </p>
                  </xsl:when>
                  <xsl:otherwise>
                    <xsl:variable name="page" select="umbraco.library:GetXmlNodeById(./@id)" />
                    <class="xsltsearch_result_title">
                      <href="{umbraco.library:NiceUrl(./@id)}" class="xsltsearch_title">
                        <xsl:value-of select="$page/@nodeName" disable-output-escaping="yes" />
                      </a>
                    </p>
                    <class="xsltsearch_result_description">
                      <span class="xsltsearch_description">
                        <xsl:variable name="sample">
                          <xsl:choose>
                            <xsl:when test="$page/metaDescription != ''">
                              <xsl:value-of select="umbraco.library:StripHtml($page/metaDescription)" />
                            </xsl:when>
                            <xsl:otherwise>
                              <xsl:value-of select="umbraco.library:StripHtml($page/bodyContent)" />
                            </xsl:otherwise>
                          </xsl:choose>
                        </xsl:variable>
                        <xsl:value-of select="substring($sample,1,255)" />
                        <xsl:if test="string-length($sample) > 255">...</xsl:if>
                      </span>
                    </p>                
                  </xsl:otherwise>
                </xsl:choose>
              </div>
            </xsl:for-each>    
          </div>
          
        </xsl:if>
          
      </div>
        
    </xsl:template>


    <msxsl:script language="C#" implements-prefix="umbraco.page">
      <msxsl:assembly name="Examine" />
      <msxsl:assembly name="UmbracoExamine" />
      <msxsl:assembly name="System.Xml.Linq" />
      <msxsl:assembly name="System.Configuration" />
      <msxsl:using namespace="Examine" />
      <msxsl:using namespace="Examine.SearchCriteria" />
      <msxsl:using namespace="Examine.LuceneEngine.Providers" />
      <msxsl:using namespace="UmbracoExamine" />
      <msxsl:using namespace="UmbracoExamine.SearchCriteria" />
      <msxsl:using namespace="System.Collections.Generic" />
      <msxsl:using namespace="System.Xml.Linq" />
      <msxsl:using namespace="System.Xml.XPath" />
      <![CDATA[ 
        public static string Pluralize(int count, string singular, string plural)
        {
            return (count == 1) ? singular : plural;
        }

        public static XPathNodeIterator SearchPDF(string searchTerm)
        {
          if (string.IsNullOrEmpty(searchTerm)) return null;

          var searchProvider = "PDFSearcher";
          var provider = ExamineManager.Instance.SearchProviderCollection[searchProvider] as UmbracoExamineSearcher;
          var criteria = provider.CreateSearchCriteria(BooleanOperation.Or);

          string[] terms = searchTerm.Split(' ');
          string[] fields = new string[] { "FileTextContent" };

          Examine.SearchCriteria.IBooleanOperation filter = null;

          int i = 0;
          foreach (string field in fields)
          {
            for (i = 0; i < terms.Length; i++)
            {
              if (filter == null)
                filter = criteria.Field(field, terms[i]);
              else
                filter = filter.Or().Field(field, terms[i]);
            }
          }

          var results = provider.Search(filter.Compile());

          return GetResultsAsXml(results);
        }

        public static XPathNodeIterator Search(string searchTerm)
        {
          if (string.IsNullOrEmpty(searchTerm)) return null;

          var searchProvider = ExamineManager.Instance.DefaultSearchProvider.Name;
          var provider = ExamineManager.Instance.SearchProviderCollection[searchProvider] as UmbracoExamineSearcher;
          var criteria = provider.CreateSearchCriteria(IndexTypes.Content, BooleanOperation.Or);

          string[] terms = searchTerm.Split(' ');
          string[] fields = new string[] { "metaKeywords", "metaDescription", "pageContent", "pageTitle", "nodeName" };

          Examine.SearchCriteria.IBooleanOperation filter = null;

          int i = 0;
          foreach (string field in fields)
          {
            for (i = 0; i < terms.Length; i++)
            {
              if (filter == null)
                filter = criteria.Field(field, terms[i]);
              else
                filter = filter.Or().Field(field, terms[i]);
            }
          }

          var results = provider.Search(filter.Compile());

          return GetResultsAsXml(results);
        }

        private static XPathNodeIterator GetResultsAsXml(ISearchResults results)
        {
          XDocument doc = new XDocument();
          XElement root = new XElement("nodes");
          foreach (SearchResult result in results)
          {
            XElement node = new XElement("node");
            XAttribute nodeId = new XAttribute("id", result.Id);
            XAttribute nodeScore = new XAttribute("score", result.Score);
            node.Add(nodeId, nodeScore);

            foreach (KeyValuePair<String, String> field in result.Fields)
            {
              XElement data = new XElement("data");
              XAttribute alias = new XAttribute("alias", field.Key);
              XCData value = new XCData(field.Value);
              data.Add(alias, value);
              node.Add(data);
            }
            root.Add(node);
          }
          doc.Add(root);
          return doc.CreateNavigator().Select("/");
        }      
      ]]>
    </msxsl:script>    
        
    </xsl:stylesheet>

  • This forum is in read-only mode while we transition to the new forum.

    You can continue this topic on the new forum by tapping the "Continue discussion" link below.

Please Sign in or register to post replies