How do I configure Director to crawl a website, using the Content Sync Module?
I have already installed the Content Sync Module (CSM) of Director and would now like details on how to create a job.
What is the Content Sync Module used for in Director deployments?
Can you crawl a website to pre-populate the cache on your SG appliances?
Can you pre-populate content on your SG appliances?
What type of content can be pre-populated using the CSM feature of Director?
The Content Sync Module (CSM) of Director can be invoked and configured to create a list of HTTP or CIFS objects and folders for pre-population. After the initial scan, CSM can be used to rescan for updates to the content. Once the content list is created, it can be uploaded to the Director, either on a schedule or manually. Director then pushes the list to each ProxySG in the Content Distribution list.
Installation requires a Windows or Linux workstation on which to install the CSM component. Managing the Director appliance requires a Java applet, which is downloadable from the Director appliance upon login.
The CSM operates by crawling a CIFS or HTTP server and tracking the time that the content was last modified, pushing the content to the ProxySG appliances accordingly. It generates a list of files along with their last-modified times and keeps that list in a flat file rather than a true database. You can then push this content out to your SG network via Director. The appliances receive the list, delete the objects that should no longer be there, and download the other files.
Note: The Content Sync Module does not ship with Director, but is available as a separately downloadable module. This article assumes you have a working Director appliance and are able to log in to it using the Director Management Console (DMC) Java application. For details on how to install the CSM, see KB4481.
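The crawl-and-compare behaviour described above can be illustrated with a minimal Python sketch. It is not the CSM's actual code or file format; the function names (build_manifest, diff_manifests) and the in-memory dictionary layout are assumptions made for illustration only. It records each file's last-modified time, then compares two scans to decide which objects should be deleted and which should be downloaded again.

```python
import os
from typing import Dict, Set, Tuple


def build_manifest(root: str) -> Dict[str, float]:
    """Walk a directory tree and map each relative file path to its mtime.

    Hypothetical stand-in for the flat file of last-modified times.
    """
    manifest: Dict[str, float] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            rel_path = os.path.relpath(full_path, root).replace(os.sep, "/")
            manifest[rel_path] = os.path.getmtime(full_path)
    return manifest


def diff_manifests(
    old: Dict[str, float], new: Dict[str, float]
) -> Tuple[Set[str], Set[str]]:
    """Return (to_delete, to_fetch) when moving from the old scan to the new one."""
    # Objects that disappeared from the source should be removed.
    to_delete = set(old) - set(new)
    # New objects, or objects whose last-modified time changed, are fetched.
    to_fetch = {
        path for path, mtime in new.items()
        if path not in old or old[path] != mtime
    }
    return to_delete, to_fetch


if __name__ == "__main__":
    previous = {"a.jpg": 1.0, "b.jpg": 2.0}
    current = {"b.jpg": 3.0, "c.jpg": 1.0}
    delete, fetch = diff_manifests(previous, current)
    print(sorted(delete))  # ['a.jpg']
    print(sorted(fetch))   # ['b.jpg', 'c.jpg']
```

In this sketch an unchanged file appears in neither set, which mirrors the idea that a rescan only acts on content that was added, removed, or modified.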
To create a job that crawls a website for content, follow these steps:
Points to note:
The CSM configuration file contains all the settings for one job. Configuration files are located in C:/Program Files/Expect-5.21/bin/data. The first CSM configuration file for the job you create is named csm.cfg. Each new job has its own configuration file; for example, csm001.cfg, csm002.cfg, and so on. Each time the job is run, the csmXXX.cfg file is written to the data directory with a timestamp, so you can see what changes you made in each run of the job.
The CSM configuration file, called CSM001.cfg, is kept in the same folder and should not be edited directly. Most of the settings can be changed through the standard windows of the Management Console; a few can be changed only through its Advanced window. (These few settings generally do not need to be changed; the defaults are usually satisfactory.)
The recommended platform for the Content Sync Module and Expect is Microsoft Windows XP with Service Pack 3 installed. There are known problems when this software is installed on Windows 7 and on 64-bit Windows 2003 servers.
Frequently asked questions:
1: When we create a CIFS crawl job, what is the correct entry for the "Corresponding URL" box? If you leave this option blank, the job does not run, so what must I place here?
The Corresponding URL is used only when you are scanning directories. Because the Director appliance and SG network can only distribute URLs, the CSM must send out URLs. Each URL uses this syntax "file://<SG IP address>
Here is sample output from a CSM job pulling files from the default 'Sample Pictures' directory on a Windows workstation.
Using username "admin".
Blue Coat Systems CSM/SG-ME 184.108.40.206 #32468 2008.01.30-083843 ended: Wed Feb 08 10:23:26 India Standard Time 2012
2: Why do we see URLs like this? "file://10.125.0.51/<file:///\\10.125.0.51\>"
This is caused by the Outlook HTML format, which automatically identifies URLs and converts them to hyperlinks. When the message is viewed in plain-text mode, both the text and the link are displayed, as shown in the question.
If you have a directory (C:\MyDir\) with the contents listed below:
And you scan that directory (C:\MyDir\) using CSM with a Corresponding URL of “file://testserver/”, then CSM will generate and distribute the URLs below:
That means the directory you are scanning is replaced by the Corresponding URL.
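As a rough illustration of this substitution, the Python sketch below replaces the scanned directory prefix with the Corresponding URL. It is not the CSM's own implementation: the function name to_distribution_url and the file names photo1.jpg and holiday\beach.jpg are hypothetical, and the sketch does not attempt URL-encoding of special characters.

```python
from pathlib import PureWindowsPath


def to_distribution_url(scanned_dir: str, corresponding_url: str, file_path: str) -> str:
    """Swap the scanned directory prefix for the Corresponding URL.

    Illustrative only; the real CSM may treat encoding and edge cases differently.
    """
    # Path of the file relative to the scanned directory, e.g. holiday\beach.jpg.
    relative = PureWindowsPath(file_path).relative_to(PureWindowsPath(scanned_dir))
    # Join with forward slashes, avoiding a doubled "/" after the URL prefix.
    return corresponding_url.rstrip("/") + "/" + relative.as_posix()


if __name__ == "__main__":
    # Hypothetical files under the scanned directory C:\MyDir\.
    for path in ("C:\\MyDir\\photo1.jpg", "C:\\MyDir\\holiday\\beach.jpg"):
        print(to_distribution_url("C:\\MyDir\\", "file://testserver/", path))
    # file://testserver/photo1.jpg
    # file://testserver/holiday/beach.jpg
```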
3: Does the Content Sync Module (CSM) application create jobs on the Director appliance?
No, the CSM does not create a job on the Director appliance. A job runs only when triggered by the CSM application. Each time it runs, it uses Director CLI commands to execute the tasks on the Director appliance.
You can also use the Query option provided in the CSM to check the caching status of the URLs that you have distributed.
For details on how to create a job to scan a CIFS server, see KB4515
For a definition of what it means to crawl a webserver, see WIKI site.
For details on a known problem with CSM and timezone changes, see KB4483
For a list of ProxySG version compatibility with Director SGME 220.127.116.11, see KB1568
For details on what problems you may face launching the Director Management Console Java application, see KB4383
For details on helpful Director command-line syntax, see KB4178