2003-01-20T10:59:00 Buck Buchanan:

> In an ideal world, all systems would have WWV(B), GPS, and other
> radio broadcast time signal receivers built in along with network
> synchronization for those systems that can not sync to a broadcast
> signal.

I believe Best Practice these days is to have one or more timebase
references, either "local" references like GPS, or NTP repeaters
(Stratum 1 servers, I believe they call them) slaved directly off such
references (some of which are publicly available on the internet).
This is believed to give you absolute precision to within a few tens
of milliseconds once it has had time to stabilize completely.  All
other systems are synced off your time master.

> Also in an ideal world, there would be sub-microsecond delay
> between an event and the timestamp generation of the event.

I don't think absolute time precision in the microsecond range is
achievable on current commodity hardware.

> Reality is most systems are not synchronized, and there can be
> significant delays in timestamping events.  Timestamps in logs can
> be tampered with.

All of the above are true.  People who care about their timestamps
can synchronize their systems to within a few tens of milliseconds
using commonly-available tools and technologies, and that's believed
to be as close as is practical today; that's on the same order as the
jitter of latency in network propagation.

Re tampering with logs: if you wish to make that harder, use
encryption if needed for transit, and forward all your logs to a
tightly secured log server.  There's no possibility in principle of
preventing a compromised logging host from injecting bogus entries,
but the log server can at least apply its own timestamp and add its
own notation about the source from which the messages were received.
I don't know of off-the-shelf software used for this today myself,
but I suspect some of the enhanced syslogds offer this capability.

> I envision a time synchronization tool being given logs covering
> a minimum of several hours of logs from multiple systems that are
> supposed to cover the time period of interest.  The tool would
> scan the logs to look for events that would be common to two or
> more systems.  For any pair of systems, the tool would identify
> events that originate on each system and result in a log entry on
> the other.

I think, if this sort of tool is desired, the way to work it is to
deliberately inject timestamp log entries for the sole purpose of
establishing these relationships, rather than heuristically looking
for existing entries capable of proxying for this functionality.
(There's a rough sketch of what I mean further down.)

> By analyzing these events the tool should be able to estimate the
> time difference and network delays with error bounds on those
> estimates.

I'd be very interested to learn that a log analysis tool could
discover anything useful this way --- unless of course the system
clocks aren't synchronized with the best available tools; and if
they aren't, I have trouble seeing why the operators would be
expected to care about such analysis.  A priori clock correction
really seems to be the more fruitful line, and if that's done, all
such an analysis tool could discover is that network propagation
delay jitter (in which I include the network stacks of the endpoints
--- I'm referring to the irregularities in propagation delays from
user-space code on one machine, via the network, to user-space code
on another) is of the same order as any skew left between the system
clocks.
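To make the marker-injection idea concrete, here is a rough, untested
sketch.  It assumes the peer (the hypothetical loghost.example.com
below) runs a syslogd that accepts UDP on port 514; the marker text
and identifiers are made up for illustration.

  #!/usr/bin/perl -w
  # timemark.pl -- emit a matched pair of timestamp markers: one into
  # the local syslog (stamped by the local clock) and one into a remote
  # host's syslog (stamped by *that* machine's clock).  Comparing the
  # two entries offline gives clock offset plus one-way network delay.
  use strict;
  use Sys::Syslog;
  use Sys::Hostname;
  use IO::Socket::INET;
  use Time::HiRes qw(gettimeofday);

  my $peer = 'loghost.example.com';            # hypothetical peer host
  my ($sec, $usec) = gettimeofday();
  my $marker = sprintf("TIMEMARK src=%s t=%d.%06d",
                       hostname(), $sec, $usec);

  # 1. record the marker locally, stamped by the local syslogd
  openlog('timemark', 'pid', 'daemon');
  syslog('info', '%s', $marker);
  closelog();

  # 2. ship the same marker to the peer's syslogd, which will prepend
  #    its own timestamp on arrival ("<14>" is user.info in the syslog
  #    priority encoding)
  my $sock = IO::Socket::INET->new(PeerAddr => $peer, PeerPort => 514,
                                   Proto    => 'udp')
      or die "cannot reach $peer: $!";
  $sock->send("<14>timemark: $marker");

Run something like that in both directions, and repeatedly from cron,
and you get NTP-style pairs of offset-plus-delay measurements to feed
whatever analysis you like --- though, per the above, I'd expect the
residue to be mostly propagation jitter if the clocks are already
disciplined.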
> Analyzing events several hours later to compute another time
> difference estimate would give some idea of the clock drift
> between the systems.

Squashing clock drift out of existence is what NTP and clockspeed do
best.

> Development of this tool would also require determining the
> characteristics of timestamping delays for various common
> operating systems to further aid in computing the uncertainty
> bounds on an event.

I suspect that developing this tool would be impractical except with
systems having really decoupled clocks --- no NTP, nothing like it.
As for folks without the interest to use such commonly-available sync
code, I'd expect them to be uninterested in this log analysis tool as
well.

Simple manual timestamp rewriting is easy, trivial with timestamp
parsing/formatting code (e.g. the Perl modules TimeDate or
Date::Manip) plus simple arithmetic.  Automated analysis to determine
clock skew and jitter seems like a problem isomorphic to clock
synchronization, so why not fix the clocks if you need that code
complexity anyway?

> The logs given to the tool to analyze should be given
> trustworthiness ratings relative to the other logs.

I think the easiest way to get a measure of clock precision included
in the logs would be to have a cron job that asks the sync code (ntp,
clockspeed) what the current skew looks like and logs that
periodically.  (A rough sketch of such a script follows my sig.)

-Bennett
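P.S.  Something like the following, run from cron, is roughly what I
have in mind for the skew logging.  Treat it as a sketch: the column
positions in ntpq's peer billboard are from memory, so check them
against your own "ntpq -pn" output before trusting it.

  #!/usr/bin/perl -w
  # log-skew.pl -- run from cron, e.g.:
  #   */15 * * * * /usr/local/sbin/log-skew.pl
  # Scrapes the current offset to the selected NTP peer out of
  # "ntpq -pn" and records it via syslog, so the logs carry their own
  # note about how tight the clock was at the time.
  use strict;
  use Sys::Syslog;

  my ($peer, $offset);
  open(NTPQ, "ntpq -pn |") or die "ntpq: $!";
  while (<NTPQ>) {
      next unless /^\*/;                  # '*' marks the selected peer
      my @f = split;
      ($peer, $offset) = ($f[0], $f[8]);  # offset column, milliseconds
      $peer =~ s/^\*//;
      last;
  }
  close NTPQ;

  openlog('clockskew', 'pid', 'daemon');
  if (defined $offset) {
      syslog('info', 'ntp peer %s offset %s ms', $peer, $offset);
  } else {
      syslog('warning', 'no ntp peer selected; clock may be drifting');
  }
  closelog();

Grepping the resulting entries then gives a per-host record of clock
quality around the time of whatever event you're investigating.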