
  • Bala Gudibandla 12 posts 131 karma points
    Jun 15, 2017 @ 20:40

    Fuzzy search with SnowballAnalyzer

    I've been building user-friendly search functionality for a website and am trying to implement fuzzy matching that works together with SnowballAnalyzer. However, SnowballAnalyzer does not seem to be applied when a term has '~' appended.

    Lucene query:

    -hideFromSearch:1 +(seoMetaKeywords:patrner~0.6 pageName:patrner~0.6 bodyText:patrner~0.6 richText:patrner~0.6 FileTextContent:patrner~0.6 ) +(seoMetaKeywords:pharmacies~0.6 pageName:pharmacies~0.6 bodyText:pharmacies~0.6 richText:pharmacies~0.6 FileTextContent:pharmacies~0.6 ) 
    

    Search code:

    var model = new SearchViewModel
        {
            SearchTerm = EscapeSearchTerm(CleanseSearchTerm(("" + Request["q"]).ToLower(CultureInfo.InvariantCulture))),
            CurrentPage = int.TryParse(Request["p"], out parsedInt) ? parsedInt : 1,
    
            PageSize = GetMacroParam(Model, "pageSize", s => int.Parse(s), 10),
            RootContentNodeId = GetMacroParam(Model, "rootContentNodeId", s => int.Parse(s), -1),
            RootMediaNodeId = GetMacroParam(Model, "rootMediaNodeId", s => int.Parse(s), -1),
            IndexType = GetMacroParam(Model, "indexType", s => s.ToLower(CultureInfo.InvariantCulture), ""),
            SearchFields = GetMacroParam(Model, "searchFields", s => SplitToList(s), new List<string> { "nodeName", "metaTitle", "metaDescription", "metaKeywords", "bodyText" }),
            PreviewFields = GetMacroParam(Model, "previewFields", s => SplitToList(s), new List<string> { "bodyText" }),
            PreviewLength = GetMacroParam(Model, "previewLength", s => int.Parse(s), 250),
            HideFromSearchField = GetMacroParam(Model, "hideFromSearchField", "umbracoNaviHide"),
            SearchFormLocation = GetMacroParam(Model, "searchFormLocation", s => s.ToLower(), "bottom")
        };
    
        // Validate values
        if (model.IndexType != UmbracoExamine.IndexTypes.Content &&
            model.IndexType != UmbracoExamine.IndexTypes.Media)
        {
            model.IndexType = "";
        }
    
        if (model.SearchFormLocation != "top"
            && model.SearchFormLocation != "bottom"
            && model.SearchFormLocation != "both"
            && model.SearchFormLocation != "none")
        {
            model.SearchFormLocation = "bottom";
        }
    
        // ====================================================
        // Comment the next if statement out if you want a root
        // node id of -1 to search content across all sites
        // and not just the current site.
        // ====================================================
        if (model.RootContentNodeId <= 0)
        {
            model.RootContentNodeId = Model.Content.AncestorOrSelf(1).Id;
        }
    
        // If searching on umbracoFile, also search on umbracoFileName
        if (model.SearchFields.Contains("umbracoFile") && !model.SearchFields.Contains("umbracoFileName"))
        {
            model.SearchFields.Add("umbracoFileName");
        }
    
        // Check the search term isn't empty
        if(!string.IsNullOrWhiteSpace(model.SearchTerm))
        {
            // Tokenize the search term
            model.SearchTerms = Tokenize(model.SearchTerm);
    
            // Perform the search
            var searcher = ExamineManager.Instance.SearchProviderCollection["ContentSearcher"];
            var criteria = searcher.CreateSearchCriteria();
            var query = new StringBuilder();
            query.AppendFormat("-{0}:1 ", model.HideFromSearchField);
    
            // Set search path
            var contentPathFilter = model.RootContentNodeId > 0
                ? string.Format("__IndexType:{0} +searchPath:{1} -template:0", UmbracoExamine.IndexTypes.Content, model.RootContentNodeId)
                : string.Format("__IndexType:{0} -template:0", UmbracoExamine.IndexTypes.Content);
    
            // Ensure page contains all search terms in some way
            foreach (var term in model.SearchTerms)
            {
                var groupedOr = new StringBuilder();
                foreach (var searchField in model.SearchFields)
                {
                    groupedOr.AppendFormat("{0}:{1}~ ", searchField, term);
                }
                query.Append("+(" + groupedOr.ToString() + ") ");
            }
    
            var criteria2 = criteria.RawQuery(query.ToString());
    
            var results = searcher.Search(criteria2).ToList();
        }
    

    ExamineSettings.config:

    <add name="ExternalIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
                     analyzer="Our.Umbraco.ezSearch.SnowballAnalyzerEnglish, Our_Umbraco"/>
    
    <add name="PDFIndexer" type="UmbracoExamine.PDF.PDFIndexer, UmbracoExamine.PDF"
                     analyzer="Our.Umbraco.ezSearch.SnowballAnalyzerEnglish, Our_Umbraco"
                     extensions=".pdf"
                     umbracoFileProperty="umbracoFile"/>
    
    
    <!--Searcher-->
    <add name="ContentSearcher" type="Examine.LuceneEngine.Providers.MultiIndexSearcher, Examine" analyzer="Our.Umbraco.ezSearch.SnowballAnalyzerEnglish, Our_Umbraco"
                  enableLeadingWildcards="true" indexSets="ExternalIndexSet,PDFIndexSet"/>
    

    SnowballAnalyzer works when the query does not contain the fuzzy search symbol (~), but I want it to work with ~ as well.

    Thanks in advance for the help :)

  • Damiaan 438 posts 1290 karma points MVP 3x c-trib
    Jun 19, 2017 @ 09:15

    I guess the analyzer does not support fuzzy search?

    Where did you find the analyzer?

  • Bala Gudibandla 12 posts 131 karma points
    Jun 19, 2017 @ 13:47

    I found the analyzer at: https://our.umbraco.org/forum/umbraco-7/using-umbraco-7/61148-Search-not-returning-expected-results#comment-230133

    A fuzzy query needs the ~ symbol at the end of each search keyword, but once it is there, SnowballAnalyzer (used for stemming) no longer treats the keyword as an English word.

    If ~ is removed, stemming works, but fuzzy search doesn't.
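
One possible workaround, offered only as a sketch: as far as I know, Lucene's classic query parser does not run fuzzy terms (those ending in ~) through the analyzer, which would explain why stemming is skipped for them. Emitting both a plain clause and a fuzzy clause for every field lets a term match either way. The class name `FuzzyStemQuerySketch`, the method `BuildQuery`, and its parameters are hypothetical names made up for this illustration; the query layout mirrors the search code posted above.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical helper (not part of ezSearch or Examine): builds a raw Lucene
// query string in which every term appears twice per field, once plain
// (analyzed, so stemmed) and once with a fuzzy suffix (assumed not analyzed).
public static class FuzzyStemQuerySketch
{
    public static string BuildQuery(
        string hideFromSearchField,
        IEnumerable<string> searchFields,
        IEnumerable<string> searchTerms,
        double minSimilarity)
    {
        // Format the similarity with an invariant culture so it is always "0.6", never "0,6".
        var similarity = minSimilarity.ToString("0.0#", CultureInfo.InvariantCulture);

        var query = new StringBuilder();
        query.AppendFormat("-{0}:1 ", hideFromSearchField);

        // Each term must match in at least one field, either stemmed or fuzzy.
        foreach (var term in searchTerms)
        {
            var groupedOr = new StringBuilder();
            foreach (var field in searchFields)
            {
                // Plain clause: goes through the analyzer configured on the searcher.
                groupedOr.AppendFormat("{0}:{1} ", field, term);
                // Fuzzy clause: compared against the indexed terms as typed.
                groupedOr.AppendFormat("{0}:{1}~{2} ", field, term, similarity);
            }
            query.Append("+(" + groupedOr + ") ");
        }

        return query.ToString().TrimEnd();
    }
}
```

For the single field `bodyText` and the single term `pharmacies` with a similarity of 0.6, this produces `-hideFromSearch:1 +(bodyText:pharmacies bodyText:pharmacies~0.6 )`, which could then be passed to `criteria.RawQuery(...)` as in the code above. Whether the fuzzy clause gives useful matches against stemmed index terms would still need testing against the actual index.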
