Another alternative: http://www.kx.com

From notes on their Ktdb 'tick database':

QUOTE: "A tick database contains trades and quotes data on financial securities. A Ktdb tick database is actually two independent but closely linked databases, real-time and historical. A real-time process captures trades and quotes from a Reuters feed and updates the trade and quote tables in the real-time database. These tables can be queried in real-time. The historical database, which also contains trade and quote tables, is updated daily from the real-time database and can be updated monthly from NYSE TAQ data. It is not necessary to use both databases. However, even if you are only interested in historical data, the NYSE TAQ data lags behind by 2 months, a period that can be covered by the real-time Ktdb database...Ktdb is a high performance system that can support hundreds of traders on the real-time database and analysis of billions of historical trades and quotes. The KSQL query language applies to ordered, time-series data, and is therefore particularly well-suited for tick data analysis. Ktdb servers can be customized with stored procedures and custom analytics written in C or K." ENDQUOTE

Since K and kdb are, essentially, functional languages (KSQL being a functional subset of SQL), they take some getting used to. But they perform, they are robust, and they do not cost an arm and a leg. Because of their small footprint in memory, the hardware requirements are orders of magnitude lower than for a HOLAP/ROLAP solution (check out the Ktdb case study at http://www.kx.com/ktdb.htm - that performance is from a 500MHz single-processor machine!). Finally, because K and kdb are time-based, have Java, C and VB API hooks, and are usable on any of 65 platforms, including S/390/zSeries servers, they are far easier to code for than proprietary OLAP/ROLAP systems.
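To make the "ordered, time-series" point concrete, here is a minimal sketch in Python (not actual K/KSQL, and the table layout and data are illustrative) of the two queries a tick database makes cheap on a time-ordered trade table: last price per symbol, and a per-symbol aggregation such as VWAP.

```python
# Minimal sketch of time-series "tick" queries, assuming a trade table of
# (time, symbol, price, size) rows kept in time order -- the property the
# KSQL description above exploits. Names and data are illustrative only.
from collections import defaultdict

trades = [  # already sorted by time, as in a real-time trade table
    ("09:30:00", "IBM",  99.50, 200),
    ("09:30:01", "MSFT", 52.10, 100),
    ("09:30:02", "IBM",  99.55, 300),
    ("09:30:05", "MSFT", 52.05, 400),
    ("09:30:07", "IBM",  99.60, 100),
]

def last_price(trades):
    """Analogue of 'last price by symbol': the last tick wins because
    the input is time-ordered, so no per-row timestamp comparison is needed."""
    last = {}
    for _, sym, price, _ in trades:
        last[sym] = price
    return last

def vwap(trades):
    """Volume-weighted average price per symbol, a typical tick aggregation."""
    notional = defaultdict(float)
    volume = defaultdict(int)
    for _, sym, price, size in trades:
        notional[sym] += price * size
        volume[sym] += size
    return {sym: notional[sym] / volume[sym] for sym in volume}

print(last_price(trades))  # {'IBM': 99.6, 'MSFT': 52.05}
print(vwap(trades))
```

The same single-pass-over-ordered-data shape applies to syslog mining: time order is free, so windowed and "as-of" queries need no sort step.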
The architecture makes them faster and more appropriate for most syslog mining, and, best of all, the demo downloads, while somewhat limited in the space they can use, are fully functional and FREE (absent a sign-up and a cookie-enabled browser).

(FULL DISCLOSURE: We develop with and use Kx Systems products for our clients, especially in real-time systems and security applications, and for optimization of existing Data Warehouses and OLAP systems.)

Michael J. Cannon
Ubiquicomm
"Si vis pacem, para bellum."

----- Original Message -----
From: "Henry Dixon" <henrydat_private>
To: "Hans-Joachim Picht" <hansat_private>; "Nicolai Rasmussen" <nicolaiat_private>
Cc: <loganalysisat_private>
Sent: Friday, September 28, 2001 1:45 PM
Subject: RE: [logs] Webserver logs to database - Toward data mining

Similar idea... You could use the DTS services in MS SQL Server as well. Start the DTS package at 1:00 AM, and you should be done before you come in for coffee the next morning. Keep in mind that you'll need a nice box (dual- or quad-processor, 1+ GB of RAM).

From there, you can toss in OLAP Services and do some quick mining. In fact, if you have early risers, you can configure your cubes for HOLAP storage for decent response time and fast refresh of data. Even better -- toss the daily findings and aggregated data onto an ASP page where you can get to the reports from anywhere.

Hope this helps.

hd, CISSP

-----Original Message-----
From: Hans-Joachim Picht [mailto:hansat_private]
Sent: Friday, September 28, 2001 11:31 AM
To: Nicolai Rasmussen
Cc: loganalysisat_private
Subject: Re: [logs] Webserver logs to database - Toward data mining

On Wed, Sep 19, 2001 at 09:48:22PM +0200, Nicolai Rasmussen wrote:
> We run some websites that generate more than 5 GB of logs per day on approx.
> 50 different sites, and we would like to put them into a database so we could
> do some data mining on them.
>
> Does anyone have any ideas, input, thoughts or anything on how we should do
> this?
>
> We thought about making an optimized table definition and then dumping each line
> into the database. From there we would make some summary reports.

This is the way I implemented such a solution for an ISP. We piped around 4 GB of data into a DB2 database (on Linux) and used MS Access (*duck*) to connect to the database to generate traffic bills.

--
With best regards

Hans-Joachim Picht <hansat_private>

---------------------------------------------------------------------
To unsubscribe, e-mail: loganalysis-unsubscribeat_private
For additional commands, e-mail: loganalysis-helpat_private
---------------------------------------------------------------------
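The "optimized table plus dump each line" approach discussed in this thread can be sketched in a few lines of Python, using SQLite as a stand-in for DB2 or SQL Server; the table layout, regex, and sample log lines are illustrative, not from any poster's actual setup.

```python
# Sketch of "dump each web-log line into a table, then run summary reports",
# with SQLite standing in for DB2/MS SQL Server. Schema and data are
# illustrative assumptions, not the original poster's configuration.
import re
import sqlite3

# Common Log Format: host ident user [time] "request" status bytes
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def load_logs(conn, lines):
    """Parse CLF lines and bulk-insert them into one flat access_log table."""
    conn.execute("""CREATE TABLE IF NOT EXISTS access_log (
        host TEXT, time TEXT, request TEXT, status INTEGER, bytes INTEGER)""")
    rows = []
    for line in lines:
        m = CLF.match(line)
        if m:  # skip malformed lines rather than aborting the nightly load
            b = m.group("bytes")
            rows.append((m.group("host"), m.group("time"), m.group("request"),
                         int(m.group("status")), 0 if b == "-" else int(b)))
    conn.executemany("INSERT INTO access_log VALUES (?,?,?,?,?)", rows)
    conn.commit()

sample = [
    '10.0.0.1 - - [28/Sep/2001:01:00:00 -0700] "GET / HTTP/1.0" 200 1024',
    '10.0.0.2 - - [28/Sep/2001:01:00:01 -0700] "GET /a HTTP/1.0" 200 2048',
    '10.0.0.1 - - [28/Sep/2001:01:00:02 -0700] "GET /b HTTP/1.0" 404 512',
]

conn = sqlite3.connect(":memory:")
load_logs(conn, sample)
# Summary report: bytes served per client host -- the kind of GROUP BY
# query behind the "traffic bills" mentioned above.
for host, total in conn.execute(
        "SELECT host, SUM(bytes) FROM access_log GROUP BY host ORDER BY host"):
    print(host, total)
```

At 5 GB/day the same shape holds; the load step would read from files and commit in batches rather than hold lines in memory.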
This archive was generated by hypermail 2b30 : Fri Sep 28 2001 - 13:29:25 PDT