The Page View Combiner (PVC) distributes HTTP requests across hours as it stores data in the database. With the Bluecoat Reporter client, it does so in a repeatable fashion from run to run, based on the data received. For all methods of receiving data into the database, we flush data to disk, primarily to prevent data loss in the event of a power outage or an unplanned server restart. See the Frequently Asked Questions below for more information on the Bluecoat Reporter client and how it flushes to disk.
Both types of flushing, data-driven or triggered after two hours of access log processing, cause the PVC algorithm to run and choose an hour in which to place the requests. If you have two different servers whose disk speeds differ enough to produce different access log processing speeds, you will see different PVC results.
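The effect of processing speed on where time-driven flushes land can be illustrated with a toy model. This is a hypothetical sketch, not Reporter's implementation: it assumes each server processes the log at a constant rate, that a flush fires every two hours of processing time, and it ignores the time spent flushing. It does not model how the PVC assigns requests to hours; it only shows that the same log is cut into different batches on fast and slow hardware.

```python
from __future__ import annotations

# Toy model (hypothetical, not Reporter code): a time-driven flush fires
# every FLUSH_INTERVAL_S seconds of processing time.
FLUSH_INTERVAL_S = 2 * 60 * 60  # two hours, as described in this article


def flush_boundaries(total_lines: int, lines_per_second: float,
                     interval_s: int = FLUSH_INTERVAL_S) -> list[int]:
    """Return how many log lines have been processed at each interval flush.

    Assumes a constant processing rate and ignores flush duration. The
    final flush at the end of the log is not counted.
    """
    lines_per_interval = int(lines_per_second * interval_s)
    if lines_per_interval <= 0:
        raise ValueError("processing rate too low for this interval")
    boundaries = []
    processed = lines_per_interval
    while processed < total_lines:
        boundaries.append(processed)
        processed += lines_per_interval
    return boundaries


if __name__ == "__main__":
    log_size = 1_000_000  # the same access log on both servers
    fast = flush_boundaries(log_size, lines_per_second=100)
    slow = flush_boundaries(log_size, lines_per_second=50)
    print("fast server flushes after lines:", fast)  # [720000]
    print("slow server flushes after lines:", slow)  # [360000, 720000]
```

In this model, because each flush activates the PVC, the two servers hand it different batches of the same log, which is consistent with the differing PVC results described above.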
Frequently asked questions:
- What are the three methods of receiving data from the SG into the Reporter database?
- From the SG directly to the FTP server running on the Reporter server. See KB2983
- Pulled down from an FTP server into the Reporter database. See FAQ822
- Using the Bluecoat Reporter client, configured on the SG, to stream data directly to the Reporter database. See KB3489
- Does the database size determine the amount of time it takes to flush the data to disk?
- Yes, larger databases take longer to flush.
- Will adding physical memory help decrease the time it takes to flush the data to disk?
- No, there is no direct correlation between the available physical memory and the dataset flush time.
- Is log processing suspended while we flush the data to disk?
- Yes, log processing is suspended while we complete this task.
- Why is log processing suspended while we flush the data to disk?
- This is done to protect the integrity of the data. We do not want to add any more data to the database while the data in memory is being flushed to disk.
- Is there anything I can do to prevent log processing from being suspended while we flush the data?
- No, log processing and the database flush are mutually exclusive.
- Data is flushed to disk periodically to prevent data loss in the event of a power outage or an unplanned server restart.
- I notice these settings in the preferences.cfg file, which appear to be related to flushing data to disk. Will changing these settings help?
- No. The settings below are related to the Bluecoat Reporter client (streaming data directly from the SG appliance to the Reporter database) and will not affect the processing of access log files read from local disk or pulled from an FTP server.
file_data_set_flush_interval = "1800"
stream_data_set_flush_threshold = "3146000"
stream_data_set_flush_interval = "1800"
- As they relate to the Bluecoat Reporter client, what do the above-mentioned preferences.cfg values mean?
- These entries relate to the streaming protocol (SGP) described above and do not apply to HFP (log files on the local system or pulled from an FTP server).
- Reporter uses whichever of stream_data_set_flush_threshold and stream_data_set_flush_interval is reached first to determine the next flush. stream_data_set_flush_threshold is the number of log lines processed, and stream_data_set_flush_interval is the time in seconds since the last flush. When either condition is reached, the streamed database is flushed.
- Bluecoat never recommends changing anything in the preferences.cfg file unless specifically asked to do so for troubleshooting purposes.
- When the log files are on the local file system or being pulled down from an FTP server, Reporter flushes the database every two hours. This is not configurable.
- Please note that while the database is flushed every two hours, the flush itself usually takes much less time than that to complete.
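The flush rules described in the answers above can be summarized as a small decision function. This is a minimal sketch, not Reporter's code: the function names, the `key = "value"` parsing of preferences.cfg, the use of `>=` comparisons, and the assumption that the two-hour file-based interval is measured in elapsed seconds are all hypothetical.

```python
from __future__ import annotations

import re

# Hypothetical parser for preferences.cfg lines shaped like the ones quoted
# in this article:  key = "value"
# Reporter's real configuration parser may accept other syntax.
_PREF_LINE = re.compile(r'^\s*(\w+)\s*=\s*"([^"]*)"\s*$')

# File-based processing (local disk or FTP pull): fixed, non-configurable
# two-hour flush interval, per this article. Assumed to be elapsed seconds.
FILE_FLUSH_INTERVAL_S = 2 * 60 * 60


def parse_prefs(text: str) -> dict[str, str]:
    """Return key -> value for every line matching key = "value"."""
    prefs = {}
    for line in text.splitlines():
        match = _PREF_LINE.match(line)
        if match:
            prefs[match.group(1)] = match.group(2)
    return prefs


def should_flush_stream(lines_since_flush: int, seconds_since_flush: float,
                        prefs: dict[str, str]) -> bool:
    """Streaming (Bluecoat Reporter client): flush when EITHER the line
    threshold or the time interval since the last flush is reached."""
    threshold = int(prefs["stream_data_set_flush_threshold"])
    interval = int(prefs["stream_data_set_flush_interval"])
    return lines_since_flush >= threshold or seconds_since_flush >= interval


def should_flush_file(seconds_since_flush: float) -> bool:
    """File-based processing: only the fixed two-hour interval applies."""
    return seconds_since_flush >= FILE_FLUSH_INTERVAL_S


if __name__ == "__main__":
    cfg = '''
file_data_set_flush_interval = "1800"
stream_data_set_flush_threshold = "3146000"
stream_data_set_flush_interval = "1800"
'''
    prefs = parse_prefs(cfg)
    print(should_flush_stream(3_146_000, 60, prefs))  # True: line threshold reached
    print(should_flush_stream(10_000, 1800, prefs))   # True: interval reached
    print(should_flush_stream(10_000, 600, prefs))    # False: neither reached
    print(should_flush_file(3600))                    # False: one hour elapsed
```

`should_flush_file` deliberately ignores `file_data_set_flush_interval`, following the statement above that the file-based flush interval is not configurable.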
In summary, if you run the same version of Reporter on two very different hardware platforms with the same access logs, you will see different results because of the interaction between the PVC algorithm and the flushing mechanism described above.
NOTE1: To prevent this from occurring, you can turn off the PVC by following the steps in FAQ828
NOTE2: For more information on the PVC, please see: KB1774
NOTE3: For information on how to set up the streaming connection (Bluecoat Reporter client), see KB3489