Only crawling a content source for files with a given extension

  • Only crawling a content source for files with a given extension Kelly French

    I want to index only the files with a specific extension, "*.las". I know there are fewer than 20,000 of these files, out of more than 1 million files total on the containing file share.

    I've used an include rule for the extension "*.las" and an exclude rule for everything located on the host. The assumption was that the include rule would trigger first to catch the desired files, and the exclude rule would keep everything else out. That version didn't index anything.

    When I indexed using just the include rule, I still got all 1 million files, even though the rule ended with "*.las" and there aren't nearly that many of those files on the share.

    I'd provide the string I used for the rule but Markdown is playing games with the slashes.

    How do I configure the crawl rules to crawl only one file type for a given host/share?
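    The behavior the include-then-exclude setup assumes is first-match-wins rule evaluation: each URL is checked against the ordered rules, and the first matching rule decides whether the item is crawled. A minimal sketch of that logic (the server name and patterns here are hypothetical, and this is an illustration of the matching model, not SharePoint's actual crawl-rule engine):

    ```python
    from fnmatch import fnmatch

    # Ordered crawl rules: the first rule whose pattern matches decides
    # the outcome. Patterns are illustrative placeholders.
    RULES = [
        ("include", "file://fileserver/share/*.las"),
        ("exclude", "file://fileserver/*"),
    ]

    def is_crawled(url: str) -> bool:
        """Return True if the first matching rule is an include rule."""
        for action, pattern in RULES:
            if fnmatch(url, pattern):
                return action == "include"
        return False  # no rule matched: item is not crawled

    print(is_crawled("file://fileserver/share/survey01.las"))  # True
    print(is_crawled("file://fileserver/share/readme.txt"))    # False
    ```

    One caveat worth checking in the real crawler: a broad exclude rule may also block the intermediate folders the crawler has to traverse to reach the *.las files, which could explain why the combined include/exclude configuration indexed nothing.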

search crawling crawl-rules