DNN Blog

Mar 25

Posted by: Chris Hammond
3/25/2008 2:11 PM

With the 3.0 release of DotNetNuke, way back in March of 2005, searching was implemented in the project after a hiatus in the 2.* releases.

 

Since then, not much has changed with the search, though it is still a very mysterious system for most users and developers. I hope to clear up some of the mystery with this blog post.

 

As a DNN site administrator, you most likely won't ever worry about how searching works until users come to you and ask why they aren't seeing the results they expect. This blog post explains how search works so you can answer their questions.

 

A few things to note

 

1. The Search implementation within DNN is pretty basic: it counts the number of times a term is found within a body of "text" (more on this later).

 

2. Modules differ in how, and whether, they implement the interfaces needed to interact with the core DNN Search provider. Not all modules implement ISearchable, and those that don't are not supported by the core search.

 

3. Each module chooses what content it provides to the Search provider for indexing. It might pass everything associated with a particular object, or only specific values from that object, which can limit how effective search is for that module.

 

The scheduler:

 

The Search indexer runs on a schedule, defined under the Host/Schedule options. In most cases I've seen, the indexer is set to run every 30 minutes. If you're making changes to content within your modules and expecting them to show up in the results immediately, that is not very likely to happen. To get around this, you can shorten the time between executions of the indexer, though I wouldn't recommend running it too frequently if you have a lot of content on your website: the more often it runs, the more often each module hits the database to load its content.

 

One way to force your content to be indexed is to go to the Schedule page under the Host menu, edit the Search Indexer task, disable it, and save the task. Then edit the task again and enable it; this should force the indexer to fire immediately. If this doesn't get your content indexed as you would expect, you can clear the Search tables in the database and have them repopulate completely the next time the indexer runs. I made a blog post on how to do this quite a while ago (http://weblogs.asp.net/christoc/archive/2006/06/26/DotNetNuke-Daily-Tip-_2300_3-6_2F00_26_2F00_06-Clear-Search-Tables.aspx).

 

There is also a "re-index" option on the Host/Search Settings page, though personally I've always found it to be a bit flaky, so I take the above approach to force my content to reindex.

 

A basic overview of how the indexer works:

 

The indexer job fires and makes a request to each module on the website that supports the ISearchable interface. Each of these modules returns a collection of SearchItemInfo objects, assuming it has any content to return. How a module populates these objects is completely up to the module. As a developer, it is important to give each object a unique SearchKey value; otherwise the indexer will log an exception.
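
To make the uniqueness requirement concrete, here is a minimal sketch in Python (not DNN code, which is .NET). The `SearchItem` class and the `find_duplicate_keys` helper are hypothetical names I made up for illustration; the idea is simply the kind of check a module author could run on the items before handing them to the indexer.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class SearchItem:
    """Hypothetical stand-in for one item a module hands to the indexer."""
    search_key: str
    title: str
    content: str


def find_duplicate_keys(items):
    """Return the search keys that appear more than once, in sorted order."""
    counts = Counter(item.search_key for item in items)
    return sorted(key for key, n in counts.items() if n > 1)


items = [
    SearchItem("article-1", "First", "hello world"),
    SearchItem("article-2", "Second", "more text"),
    SearchItem("article-1", "Duplicate key", "oops"),
]
duplicates = find_duplicate_keys(items)  # keys the module should fix before returning
```

An empty result means every item has its own key, which is what the indexer expects.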

 

The search indexer then works through each object returned by each module. It checks whether the object has already been indexed by comparing the last updated date on the object with the last updated date in the SearchItem table. If they differ, the indexer updates the index entries for this object.

 

If the indexer finds an object in SearchItem that wasn't returned from a module, it presumes that item has been deleted and removes all indexing for it. This is a key point that most module developers miss: if you don't pass back an item, DNN assumes it no longer exists and removes it from the index. This functionality has changed in Cambrian; stay tuned for more Cambrian blog posts this year.
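
The compare-and-delete behaviour described above can be sketched roughly like this. It is a simplified Python illustration, not the actual DNN indexer: the `reconcile` function and its dictionary arguments are hypothetical, and I am assuming that an item the index has never seen is simply added.

```python
from datetime import datetime


def reconcile(indexed, returned):
    """Decide what to do with each item.

    indexed  -- dict of search key -> last updated date already in the index
    returned -- dict of search key -> last updated date reported by the module
    Returns three sorted lists: keys to add, keys to re-index, keys to delete.
    """
    to_add = sorted(k for k in returned if k not in indexed)
    to_update = sorted(
        k for k in returned if k in indexed and returned[k] != indexed[k]
    )
    # Anything in the index that the module did not return is presumed deleted.
    to_delete = sorted(k for k in indexed if k not in returned)
    return to_add, to_update, to_delete


indexed = {
    "a": datetime(2008, 3, 1),
    "b": datetime(2008, 3, 1),
    "c": datetime(2008, 3, 1),
}
returned = {
    "a": datetime(2008, 3, 1),   # unchanged, left alone
    "b": datetime(2008, 3, 20),  # date differs, gets re-indexed
    "d": datetime(2008, 3, 25),  # not seen before, gets added
}
to_add, to_update, to_delete = reconcile(indexed, returned)
```

Item "c" ends up in the delete list purely because the module did not pass it back, which is exactly the trap for module developers mentioned above.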

 

The indexing of content basically consists of parsing out individual words in an item and storing these words in a SearchWord table, then creating a reference for each word in the SearchItemWord table.
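
As a rough picture of that word-splitting step, here is a small Python sketch. The two dictionaries only stand in for the SearchWord and SearchItemWord tables; the `index_item` function, the way words are split, and the lower-casing are my own assumptions for illustration and may not match what the core indexer does.

```python
import re
from collections import Counter


def index_item(item_id, text, search_word, search_item_word):
    """Split text into words and record how often each occurs in the item.

    search_word      -- dict of word -> word id (stands in for SearchWord)
    search_item_word -- dict of (item id, word id) -> occurrence count
                        (stands in for SearchItemWord)
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    for word, count in Counter(words).items():
        # Reuse the id of a known word, otherwise hand out the next free id.
        word_id = search_word.setdefault(word, len(search_word) + 1)
        search_item_word[(item_id, word_id)] = count


search_word = {}
search_item_word = {}
index_item(1, "DNN search: the search indexer", search_word, search_item_word)
index_item(2, "The scheduler runs the indexer", search_word, search_item_word)
```

Each distinct word is stored once, and every item that contains it gets its own reference with a count.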

 

Search results process:

 

When you search for an individual term, DNN looks to see whether that word exists in the SearchWord table. If so, it then looks to see which "items" in SearchItem contain this word. It counts the number of times the word is found in each item and returns that item as a search result. A relevance number is passed back with the search results; this number is usually 4 digits long and is built as follows.

 

Each time a word is found in a particular item, the count is incremented. A relevance of 1001 would mean that your search term was found once in an item; 1050 would mean that your search term was found 50 times in an item.

 

If you search for multiple terms, the only difference in the process is how the relevance number is built. If you searched for two terms, and both terms were found one time in an item, the relevance number would be 2002. If the two search terms were found a total of 3 times (once for term 1, and twice for term 2), the relevance would be 2003. The key information is this:

 

For the first number (X) in Relevance (X000), X is the number of search terms found in a particular item. If you searched for 3 terms and all three were found, X would be 3.

 

For the other three numbers (YYY) in the relevance (XYYY), YYY is the total number of times any of the search terms were found in an item.

 

You might have searched for 3 terms and gotten back a relevance number of 3099. This tells you that all three terms were found in this particular result, but it gives you no further insight into how many times each individual term was found. The first term might have been found 97 times and the other two once each (97+1+1 = 99), or each term might have been found 33 times (33+33+33 = 99). With the basic nature of the core DNN search provider, you don't get this information.
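
Put as arithmetic, the relevance described above works out to X * 1000 + YYY. Here is a short Python sketch of that calculation; the `relevance` function is a hypothetical illustration, not the core provider's code. It assumes exact whole-word matching, and since the scheme above doesn't cover counts past 999, the sketch just lets them carry over into the thousands.

```python
import re
from collections import Counter


def relevance(item_text, query):
    """Return the XYYY-style relevance of one item, or 0 if no term matches."""
    words = Counter(re.findall(r"[a-z0-9]+", item_text.lower()))
    terms = set(re.findall(r"[a-z0-9]+", query.lower()))
    counts = [words[t] for t in terms if words[t] > 0]
    if not counts:
        return 0
    # X = number of distinct search terms found in the item,
    # YYY = total occurrences of those terms in the item.
    return len(counts) * 1000 + sum(counts)


one_hit = relevance("the indexer runs", "indexer")
two_terms = relevance("search indexer search", "search indexer")
skewed = relevance(" ".join(["a"] * 97 + ["b", "c"]), "a b c")
even = relevance(" ".join(["a", "b", "c"] * 33), "a b c")
```

Here `skewed` and `even` come out as the same number even though the terms are distributed very differently, which is the limitation described above.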

 

Hopefully this post has given you a bit more insight into how the core searching functionality works. If you're interested in learning more, open up the solution: because DotNetNuke is open source, you can learn a lot by opening up the code and stepping through some of the functionality.

 

A note of thanks to Charles Nurse for reviewing this blog post for accuracy and providing me some feedback before I posted it! :)


24 comment(s) so far...


Re: An Understanding of the DotNetNuke Search

Good explanation - Chris

By cnurse on   3/25/2008 3:00 PM

Re: An Understanding of the DotNetNuke Search

Thanks Charles, and thanks for reviewing the post prior to it being published.

By christoc on   3/25/2008 3:01 PM

Re: An Understanding of the DotNetNuke Search

Chris - I think you nailed it when you said that it is a basic implementation. That was intentional. It was envisioned that third parties would implement more robust versions using enterprise grade search engines and indexers. Our goal was to build interfaces that would allow this to happen. For whatever reason, there has not been much third party work around this feature.

By jbrinkman on   3/25/2008 4:57 PM

Re: An Understanding of the DotNetNuke Search

Thanks for the explanation. It works the same way as the core Microsoft Helpfile search, which I am all too familiar with. It is good to know, though; it removes some of the guesswork. I agree with jbrinkman that the 3rd party Search modules / controls are way too few and far between, nor am I finding it commonplace to see modules built to be search friendly.
Chris - You mentioned a few things were changing with Cambrian and the search. Is it just slight modifications to the way it handles / generates, or is it more of a full overhaul, which seems to be the trend for Cambrian changes? I know you will blog more about it later, but it's been a thorn in my side, and my bosses think I should try to convert one of the custom asp.net controls I built for search into our DNN installation. I really don't want to do this, as it's a full rewrite for me. Not to mention I'd hate to rewrite some of the same things that will become core later on. No details needed; just a "yes, there are significant changes" or "no, no real big change, just some efficiencies" will do for an answer.

By keeperofstars on   3/26/2008 8:39 AM

Re: An Understanding of the DotNetNuke Search

As far as I understand, the changes are fairly minor: better handling for sets of data from a module, meaning search won't delete items that aren't passed into the indexer every time.

By christoc on   3/26/2008 9:02 AM

Re: An Understanding of the DotNetNuke Search

The main change in search is as Chris suggests. Items NOT passed in are no longer assumed to be deleted. The reason for this was two-fold: (1) large modules with lots of data (like Forums) impact performance when they pass ALL their search data for indexing, even if most data is not indexed as it hasn't changed. (2) Modules with multiple instances on a site are not compatible with this behaviour.

By cnurse on   3/26/2008 10:07 AM

Re: An Understanding of the DotNetNuke Search

The Relevance thing is quite confusing. If you search the DotNetNuke site for the word "IronPython" you get Relevance score of 27054 for the top result item.

I find it hard to believe that there are 26,054 occurrences of this word on this site. There must be something else in that number.

By ecktwo on   3/27/2008 2:58 PM

Re: An Understanding of the DotNetNuke Search

ecktwo - this issue of Relevance is exactly the problem I alluded to in my comment that there are issues with multiple instances of modules - in this case the Forum module. There are probably 26 instances of the forum module on the site, so you are seeing a distorted relevance number. These should be fixed in Cambrian.

By cnurse on   3/28/2008 9:33 AM

Re: An Understanding of the DotNetNuke Search

Thanks Charles.

What a dilemma. On one hand you'd like the search result to take you to where the search term is located (each specific instance), and on the other hand, you don't really want to index the same stuff that many times. Perhaps the module developers should add a tabmodule setting to disable search for a particular instance?

By ecktwo on   4/4/2008 6:24 PM

Re: An Understanding of the DotNetNuke Search

I recall testing the search function and concluding that search only finds exact matches of the word. For example, if I search for the word procedure, the word procedures is not found. Question: Is my conclusion valid? If so, will it ever be changed to find "like" or partial matches?

By ronrysanek on   5/4/2008 12:06 PM

Re: An Understanding of the DotNetNuke Search

@ronrysanek I believe that to be correct. I do not know of any planned changes.

By christoc on   5/4/2008 12:07 PM

Re: An Understanding of the DotNetNuke Search

Does the search application detect file names, titles, or properties? If I upload a PDF file, does that title get parsed if it was added to an HTML/Text module as a hyperlink from the page?

By canamguy on   2/24/2009 11:43 AM

Re: An Understanding of the DotNetNuke Search

The link to the file would potentially get parsed, but the file itself wouldn't.

By Chris Hammond on   2/24/2009 11:43 AM

Re: An Understanding of the DotNetNuke Search

Ever see the search module only return results one time, and then after that, if you enter a search term and hit "search", it returns you to the home page every time instead of the search results page?

By TechWG on   8/29/2009 1:56 PM

Re: An Understanding of the DotNetNuke Search

TechWG, Did you happen to rename the search results module?

By Chris Hammond on   8/29/2009 1:57 PM

Re: An Understanding of the DotNetNuke Search

You said: "If the indexer finds an object in SearchItem that wasn't returned from a module it presumes that item has been deleted so it deletes all indexing for this item."
Yet in DNN 5.2.2 it seems deleted items are not deleted from the search results.

Can you explain what has changed?

By M Bouwman on   2/27/2010 11:58 PM

Re: An Understanding of the DotNetNuke Search

I've got the Google CSE working nicely for public pages, with results showing on the same page as the core search - depending on which search entry was used.

Trying to find the css for the core search has been frustrating. I don't intend to change the whole site (default.css), rather just the results page itself. The 3rd party modules seem to package Google search, not a local engine with tweaking.

Any further info from anyone?

By Daniel on   2/27/2010 11:56 PM

Re: An Understanding of the DotNetNuke Search

OK, dumb question answered by self: search for "searchresults", and there it is...SearchResults.ascx

By Daniel on   2/27/2010 11:56 PM

Re: An Understanding of the DotNetNuke Search

@M Bouwman sorry, I haven't looked into things for search lately, but to be honest I don't think anything has changed. Are you sure the indexer has recently run? Is it possible that it's failing somewhere in the process?

By Chris Hammond on   2/28/2010 12:00 AM

Re: An Understanding of the DotNetNuke Search

Have there been any updates/improvements to the search functionality since this blog post?

By Josh King on   3/14/2011 11:10 AM

Re: An Understanding of the DotNetNuke Search

Josh, nothing major that I know of, except that in the Professional Edition there is a new search mechanism available.

By Chris Hammond on   3/14/2011 11:11 AM

Re: An Understanding of the DotNetNuke Search

Have there been any updates/improvements to the search functionality since this blog post?

By Josh King on   12/9/2011 1:36 PM

Re: An Understanding of the DotNetNuke Search

Anyone know how to remove modules from search results but make them accessible by all users with the URL?

By tknman0700 on   12/9/2011 1:36 PM

Re: An Understanding of the DotNetNuke Search

@Josh King nothing has really changed in CE

@tknman0700 that would be better to ask in the Forums

By Chris Hammond on   12/9/2011 1:38 PM