  • Andy Welch 3 posts 73 karma points
    Nov 26, 2019 @ 19:02

    Examine MultiSearcher and PDFIndex not working

    Hi, I've followed the docs and set up the PDFIndex and a MultiSearcher.

    In the backoffice, both are reported as healthy and return results.

    However, when I query them in code, I get zero results from the PdfIndex, whether I search it directly or via the MultiSearcher.

    This is the code that fetches the external index and the PdfIndex and registers a MultiSearcher over both:

    public void Initialize()
    {
        // Get both the external index and the PDF index
        if (_examineManager.TryGetIndex(Constants.UmbracoIndexes.ExternalIndexName, out var externalIndex)
            && _examineManager.TryGetIndex(PdfIndexConstants.PdfIndexName, out var pdfIndex))
        {
            // Register a multi-searcher that spans both of them
            var multiSearcher = new MultiIndexSearcher("MultiSearcher", new IIndex[] { externalIndex, pdfIndex });
            _examineManager.AddSearcher(multiSearcher);
        }
    }
    

    This is how I'm searching the PdfIndex directly...

    var textFields = new[]
    {
        "title", "description", "content", "bodyText", "location", "pageHeading", "subHeading",
        "nodeName", "__NodeTypeAlias"
    };

    if (ExamineManager.Instance.TryGetIndex("PDFIndex", out index))
    {
        searcher = index.GetSearcher();

        var query = searcher.CreateQuery("media").GroupedOr(textFields, searchQuery.Fuzzy(0.2f));
        results = query.Execute();
    }
    

    ...and this is how I search via the MultiSearcher:

    if (_searchPdf && ExamineManager.Instance.TryGetSearcher("MultiSearcher", out searcher))
    {
        var query = searcher.CreateQuery("media,content").GroupedOr(textFields, searchQuery/*.Fuzzy(0.2f)*/);
        results = query.Execute();
    }
    

    I suspect I'm phrasing the search incorrectly. I'll keep "fiddling" and strip the code down to the minimum.

    Suggestions appreciated.

  • Andy Welch 3 posts 73 karma points
    Nov 26, 2019 @ 20:14

    A follow-up on my experiments:

    Based on the fields I see in the results when testing the PdfIndex in the backoffice, I've tried the following textFields variation:

    var PdfTextFields = new[]
    {
        "nodeName", "fileTextContent"
    };
    

    Sadly, this still returns zero results.

  • Ismail Mayat 4511 posts 10059 karma points MVP 2x admin c-trib
    Nov 27, 2019 @ 10:33

    Andy,

    I have a multi-searcher over three indexes, including the PDFIndex, working. I suspect it's your query. Try getting rid of the "media,content" bit. Also, your textFields do not include fileTextContent, which is where the PDF index stores the extracted content.
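
    A minimal sketch of that suggestion, assuming the Examine 1.x API already used in this thread (IExamineManager, TryGetSearcher, CreateQuery, GroupedOr, Execute) and the "MultiSearcher" name registered above. It drops the category argument and adds fileTextContent to the searched fields. The class and method names are hypothetical, and fuzzy matching is left out until plain matches come back.

    ```csharp
    using System;
    using Examine;

    // Hypothetical helper class; only the Examine calls come from the thread.
    public static class MultiSearchSketch
    {
        // The original content fields plus "fileTextContent",
        // the field the PDF index uses for the extracted PDF text.
        private static readonly string[] TextFields =
        {
            "nodeName", "fileTextContent",
            "title", "description", "content", "bodyText",
            "location", "pageHeading", "subHeading", "__NodeTypeAlias"
        };

        public static ISearchResults Search(IExamineManager examineManager, string searchTerm)
        {
            if (!examineManager.TryGetSearcher("MultiSearcher", out var searcher))
            {
                throw new InvalidOperationException("The MultiSearcher has not been registered.");
            }

            // No category argument, so results are not restricted to "media,content".
            var query = searcher.CreateQuery().GroupedOr(TextFields, searchTerm);
            return query.Execute();
        }
    }
    ```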

    Also, can you call query.ToString() and report back the actual generated Lucene query?
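
    A sketch of that check, assuming the same Examine 1.x API and the "PDFIndex" name from the original post. It builds the same field query with and without the "media" category, prints each query's ToString() (which should include the generated Lucene query), and prints the hit counts. If only the unfiltered version returns hits, the category filter is the likely culprit. Console.WriteLine stands in for whatever logger the site uses, and the class and method names are hypothetical.

    ```csharp
    using System;
    using Examine;

    // Hypothetical debugging helper; only the Examine calls come from the thread.
    public static class QueryDebugSketch
    {
        public static void DumpPdfQueries(IExamineManager examineManager, string searchTerm)
        {
            if (!examineManager.TryGetIndex("PDFIndex", out var index))
            {
                Console.WriteLine("PDFIndex is not registered.");
                return;
            }

            var searcher = index.GetSearcher();
            var fields = new[] { "nodeName", "fileTextContent" };

            // Same field query, with and without a category filter.
            var withCategory = searcher.CreateQuery("media").GroupedOr(fields, searchTerm);
            var withoutCategory = searcher.CreateQuery().GroupedOr(fields, searchTerm);

            // ToString() exposes the generated query so the two can be compared.
            Console.WriteLine("With \"media\":    " + withCategory.ToString());
            Console.WriteLine("Without category: " + withoutCategory.ToString());

            Console.WriteLine("Hits with \"media\":    " + withCategory.Execute().TotalItemCount);
            Console.WriteLine("Hits without category: " + withoutCategory.Execute().TotalItemCount);
        }
    }
    ```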

  • Andy Welch 3 posts 73 karma points
    Nov 27, 2019 @ 20:15

    Thanks for your help, Ismail; it's really appreciated. I'll need a couple of days to get back to you.
