116
Enhance the DNN search engine
Enhance the DNN search engine by adding support for indexing PDF and MS Word documents. This could be extended with crawling functionality to index non-DNN content (e.g. HTML or text files).
Problem: The current DNN implementation can only index modules that implement the ISearchable interface. This excludes static content such as PDFs, MS Word documents, HTML files, and text files that would benefit from indexing.
Rationale: Many sites reference static documents such as PDFs that should be included in search results. The current DNN search implementation does not index this content.
Solution: The Apache Lucene project includes a .NET implementation that provides crawling functionality covering PDFs and HTML (amongst others). This can be leveraged in the DNN search project.
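As a rough illustration of the file-discovery step such a crawler would need, here is a minimal Python sketch. It is not DNN or Lucene code: the function name `find_indexable_files` and the extension list are assumptions made up for this example, and extracting text from each file is left out entirely.

```python
import os

# Hypothetical set of static file types to index; not taken from DNN or Lucene.
INDEXABLE_EXTENSIONS = {".pdf", ".doc", ".html", ".htm", ".txt"}


def find_indexable_files(root):
    """Return the sorted paths under root whose extension is in INDEXABLE_EXTENSIONS."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Compare extensions case-insensitively so "REPORT.PDF" is also found.
            extension = os.path.splitext(name)[1].lower()
            if extension in INDEXABLE_EXTENSIONS:
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

Each returned path would then be handed to whatever component extracts the text and adds it to the search index.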
Impact: The proposed solution is a crawler, whereas the current DNN solution indexes only DNN module content marked as indexable. This change would likely have a moderate to high impact on how indexes are created and stored. The impact would be limited to search indexing and search result references.
Risk:
Created: 6/28/2007 7:22:14 AM by Richard Dorman
Scheduled For Version:
Delivered In Version: