A thousand Linux hosts on syslog-ng (I'm assuming TCP connections) means a thousand open connections on the collector... Generally speaking, you'll find subtle problems once you reach the thousand-connection situation, and you would need to tune the host carefully. Once that host gets the slightest hiccup, you're losing data from all thousand hosts... not a pleasant thought. Imagine an (insider) denial of service timed to coincide with an attack. That's what I would do if I wore my black hat...

Consider hierarchical collection... not a single wham-bam solution. Buffering would also be good. Basically, break the problem into smaller pieces. A thousand (or more) hosts feeding a single aggregation point does not sound good.

A thousand hosts at 1 MB a day each is only about 11.6 KB/sec, if syslog traffic were evenly distributed... which it isn't. The peaks can easily be 100x the valleys. So we're talking about roughly 1.2 MB/sec sustained during the peak hours. The good news is that this is fairly easy to digest: in my experience, about 20 MB/sec is what you can expect (sustained) from the file system.

NFS? I have not had good experiences trying to aggregate onto an NFS server... I would use a SAN (if I were serious) or local disks (with RAID or replication). Putting everything directly onto NFS introduces another network-related point of vulnerability, plus it doubles (or triples) your network traffic; the latter is obviously alleviated with a switch.

LVS... with NFS? I would steer clear of that approach.

Keywords for design: traffic analysis, peak traffic flow, reliability, buffering, staging, sustained throughput, fail-over, fail-back, acceptable outage duration

TaO

ScottO wrote:

>Okay, so here is the current task I am working on and was looking to see
>how people have tackled it, basically any ideas out there to ponder. Any
>thoughts, comments, etc. will be appreciated. Thanks.
>
>Key Highlights:
>
>- Centralized logging setup for over 1000 Linux hosts.
>- Need it to be scalable to even more eventual hosts.
>- Estimate less than 1MB of data per host per day. Want to do
>summarization with syslog-ng to reduce network traffic, to make this
>even less.
>- Need it setup so that the network isn't saturated.
>- Rollout syslog-ng to the hosts, for using filtering etc.
>
>Two ways I'm considering doing the backend right now:
>
>- Potentially some sort of Linux LVS cluster with an NFS backend. So a
>pair of Linux load balancers that will hand off the syslog data to
>centralized syslog servers in a cluster, that then dump into some shared
>NFS server/solution.
>- Or, maybe having distributed "collector" syslog servers that somehow
>dump back to a central syslog server. So a distributed architecture
>approach.
>
>
>The LVS setup seems appealing to me for the scalability potential, but
>not sure if it is overkill. What I am currently most concerned with is
>the amount of traffic over the network.
>
>Thanks for any help.
>_______________________________________________
>LogAnalysis mailing list
>LogAnalysis@private
>http://lists.shmoo.com/mailman/listinfo/loganalysis
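
The back-of-the-envelope numbers in the reply (1000 hosts, 1 MB per host per day, peaks around 100x, about 20 MB/sec from the file system) can be checked with a few lines of Python. This is only a rough sketch: the function names are made up for illustration, 1 MB is taken as 10^6 bytes, the 100x factor is applied to the daily average as a crude upper bound, and the collector counts are arbitrary examples rather than a recommendation.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(hosts, bytes_per_host_per_day):
    """Bytes/sec if the daily volume were spread perfectly evenly."""
    return hosts * bytes_per_host_per_day / SECONDS_PER_DAY


def peak_rate(avg_bytes_per_sec, peak_factor):
    """Crude peak estimate: scale the average by an assumed burst factor."""
    return avg_bytes_per_sec * peak_factor


def connections_per_collector(hosts, collectors):
    """TCP connections each collector holds if hosts are split evenly."""
    return math.ceil(hosts / collectors)


if __name__ == "__main__":
    avg = average_rate(hosts=1000, bytes_per_host_per_day=1_000_000)
    peak = peak_rate(avg, peak_factor=100)
    disk = 20_000_000  # assumed sustained file system throughput, bytes/sec

    print(f"average: {avg / 1e3:.1f} KB/sec")
    print(f"peak:    {peak / 1e6:.2f} MB/sec")
    print(f"disk headroom at peak: {disk / peak:.1f}x")

    # Hierarchical collection: fewer connections per box than one big sink.
    for collectors in (1, 4, 10):
        per_box = connections_per_collector(1000, collectors)
        print(f"{collectors:2d} collector(s): {per_box} connections each")
```

Changing the host count, per-host volume, or burst factor shows quickly whether the disk or the per-collector connection count becomes the tighter limit.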