It appears that there's at least one folder in one of our crawled network shares that's not getting indexed properly, as evidenced by searching for unique words known to be in PDF documents in that folder and not getting the documents back in the search results. Documents in this folder are showing up in the crawl log as being successfully crawled, either with a "Crawled" or "Not modified" status. There are no errors or warnings in the Windows event log, and none that I can find in the SharePoint trace logs. Of the 42 PDF files in this folder, all are less than 5MB in size and most are less than 1MB.
Can anyone offer some tips on troubleshooting this issue?
Update: It looks like SharePoint is indexing the title metadata for these PDFs, just not the content. There is text content in the PDFs -- Adobe Reader's Find command works fine within a given document. What could be causing the content not to be indexed?
Do you have the PDF iFilter installed? SharePoint's indexer doesn't read the contents of PDFs natively; it needs a PDF iFilter installed to extract their text.
are not correctly picked up by the search index. All the content is indexed but the sites themselves are not associated with the correct contentclass. Prior to SP2 the site collections (SPSites) are listed... have observed to exhibit this 'bug' is: a site collection or site is based on a publishing template (can be built-in, e.g. collaboration portal, or a custom site template); SharePoint 2007 with updates post Infrastructure Update - in my situation it was applying SP2, but it might have been one of the cumulative updates after IU. A search pre-SP2 using contentclass:STS_Site as a filter will list all
I have pages in a pages library on a publishing site which have a managed metadata (taxonomy) field in their content type. I want my custom search webpart to read the taxonomy set on its parent page... = “owstaxIdMetadataAllTagsInfo=#0[TERM GUID]” of my CoreResultWebPart with no success... I actually gave up after I was getting 0 results and am now trying to just perform a FullTextSqlQuery. Unfortunately it seems that even though pages with the managed metadata field are successfully being indexed, the managed property owstaxIdMetadataAllTagsInfo has no data in the results! I went ahead and made the property
I'm using Search Server Express + WSS 3.0. I want to crawl external public web sites. One site is: http://www.av.se/ When I try a full crawl it throws: http://www.av.se Access is denied. Check that the Default Content Access Account has access to this content, or add a crawl rule to crawl this content. Local sites and other public sites are getting crawled OK. What is wrong with that site? Can you add it to Content sources and try a full crawl for testing?
My understanding is that custom properties (=metadata) supplied for uploaded documents will become crawled properties, after a full index. Such has been done. Examining the list of crawled properties (no mean feat) has not yet yielded a single crawled property relatable to my metadata. That the content has been indexed is proven by the Search results. I want to map some of the metadata fields to managed properties, so that a search can return results within that scope. First, can I assume that these crawled properties will appear within the "SharePoint" category in Crawled Properties
to be re-crawled in its new content database? We don't want to do a full crawl as the last one we did took more than eight days and in any case we would still have the same problem after the full crawl... In line with Microsoft's recommendation to keep content databases on SharePoint 2007 under 100 GB, we often move site collections to different content databases to balance the load. However, we have noticed a problem: after we move the site collection to a new content database, the entries in the search index for everything in the site collection disappear. This seems strange as the paths
I am working on MOSS 2007 and trying to order the results returned by the people search alphabetically by name across all pages. Currently the results are sorted alphabetically on each page rather than across all pages. I have read the way to order the results is by passing parameters to the search via the URL query string rather than by the XSL. I have tried passing the following to the URL by using this page as a reference: k=Department%3A%22Legal%22&start1=52&v1=LastName But the results are still sorted by last name on each page rather than across all pages. This looks
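For reference, the query string quoted above can be reproduced with a small Python sketch. It only shows how the three parameters from the question (`k`, `start1`, `v1`) are URL-encoded; the helper name `build_people_search_query` is hypothetical, and whether `v1` actually changes the sort order across pages is exactly what the question leaves open.

```python
from urllib.parse import urlencode, parse_qs


def build_people_search_query(keyword: str, start: int, sort_by: str) -> str:
    """Encode the parameters quoted in the question into a query string.

    The parameter names k, start1 and v1 are taken from the question;
    this helper itself is a hypothetical illustration.
    """
    params = {"k": keyword, "start1": start, "v1": sort_by}
    return urlencode(params)


# Rebuild the exact string from the question.
query = build_people_search_query('Department:"Legal"', 52, "LastName")
print(query)

# Decode it again to confirm that the keyword round-trips unchanged.
decoded = parse_qs(query)
```

Building the string this way rules out an encoding mistake in the keyword (the colon and the double quotes) as the cause of the unexpected ordering.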
I've recently found that when a user searches for a phrase in double quotes, the results page highlights individual words from within the phrase that may not actually match the full quoted phrase. This can make it difficult for the user to decide which document in the results list is the one they're looking for. SharePoint 2007 does appear to be correctly filtering the results based on the complete phrase; it's just the highlighting that's off. For that matter, the text blurb that's shown in the results is also often not useful since it's based on the highlighted word. Does anyone
I want to index only the files with a specific extension, "*.las". I know that there are fewer than 20K of these files and there are a total of over 1 million files in the containing file share. I've... would keep everything else out. That version didn't index anything. When I indexed using just the include rule, I still got all 1 million files even though the rule ended with "*.las" and there aren't that many of those files on the share. I'd provide the string I used for the rule but Markdown is playing games with the slashes. How do I configure the crawl rules to only crawl one file type
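Before tuning the crawl rules, it can help to confirm what the include pattern is meant to match. The Python sketch below walks a directory tree and separates `*.las` files from everything else. It says nothing about how SharePoint evaluates crawl rules; the function name `split_by_pattern` is hypothetical, and treating the extension as case-insensitive is an assumption.

```python
import os
from fnmatch import fnmatch


def split_by_pattern(root: str, pattern: str = "*.las"):
    """Walk root and return (matching, other) lists of file paths.

    Matching is done on the file name only and is case-insensitive,
    which is an assumption about how the extension should be treated.
    """
    matching, other = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if fnmatch(name.lower(), pattern.lower()):
                matching.append(path)
            else:
                other.append(path)
    return matching, other
```

Running `split_by_pattern` against the share root and comparing `len(matching)` with the expected count of fewer than 20K gives a baseline to check the crawl log against.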
. Also, the All Content scope works fine when used by itself. There are no relevant messages I can find in the ULS logs. I know that this is a silly search to do and don't expect it to be done... improperly), than get this error message. One workaround would be to remove the All Content scope from the Advanced Search display group in the site collection settings. But that requires me... I have several search scopes set up in Search Administration, one of which is the default All Content scope which includes everything but items with a contentclass of SPSPeople. On my advanced