Re: [logs] Webserver logs to database - Toward data mining

From: Michael J. Cannon (mcannonat_private)
Date: Fri Sep 28 2001 - 13:24:21 PDT


    Another alternative:
    
    http://www.kx.com
    
    From the notes on their Ktdb 'tick database':
    
    QUOTE:
    "A tick database contains trades and quotes data on financial securities. A
    Ktdb tick database is actually two independent but closely linked databases,
    real-time and historical. A real-time process captures trades and quotes
    from a Reuters feed and updates the trade and quote tables in the real-time
    database. These tables can be queried in real-time. The historical database,
    which also contains trade and quote tables, is updated daily from the
    real-time database and can be updated monthly from NYSE TAQ data. It is not
    necessary to use both databases. However, even if you are only interested in
    historical data, the NYSE TAQ data lags behind by 2 months, a period that
    can be covered by the real-time Ktdb database...Ktdb is a high performance
    system that can support hundreds of traders on the real-time database and
    analysis of billions of historical trades and quotes. The KSQL query
    language applies to ordered, time-series data, and is therefore particularly
    well-suited for tick data analysis. Ktdb servers can be customized with
    stored procedures and custom analytics written in C or K. "
    ENDQUOTE
    
    Since k and kdb are, essentially, functional languages (as KSQL is a
    functional subset of SQL), they take some getting used to.  But they
    perform well, are robust, and do not cost an arm and a leg.  Because of
    their small in-memory footprint, the hardware requirements are orders of
    magnitude lower than for a HOLAP/ROLAP solution (check out the Ktdb case
    study at: http://www.kx.com/ktdb.htm - the performance figures are from
    a 500 MHz single-processor machine!).  Finally, because k and kdb are
    time-based, have Java, C and VB API hooks, and run on any of 65
    platforms, including S/390/zSeries servers, they are far easier to code
    for than proprietary OLAP/ROLAP systems.  The architecture makes them
    faster and more appropriate for most syslog mining, and, best of all,
    the demo downloads, while somewhat limited in the amount of space they
    can use, are fully functional and FREE (apart from requiring a sign-up
    and a cookie-enabled browser).
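
    As a rough illustration of the kind of ordered, time-series rollup this
    is aimed at, here is a plain-Python sketch (not KSQL); the log layout
    and field positions are only assumed examples:

    # Bucket syslog-style events into one-minute bins per host.
    # Assumes each line starts with "YYYY-MM-DD HH:MM:SS host ...".
    from collections import defaultdict

    def hits_per_minute(lines):
        counts = defaultdict(int)      # (date, minute, host) -> event count
        for line in lines:
            fields = line.split()
            if len(fields) < 3:
                continue               # skip malformed lines
            minute = fields[1][:5]     # "HH:MM" from "HH:MM:SS"
            counts[(fields[0], minute, fields[2])] += 1
        return counts

    if __name__ == "__main__":
        sample = [
            "2001-09-28 13:24:21 web01 httpd: GET /index.html 200",
            "2001-09-28 13:24:45 web01 httpd: GET /logo.gif 200",
            "2001-09-28 13:25:02 web02 httpd: GET /index.html 200",
        ]
        for key, n in sorted(hits_per_minute(sample).items()):
            print(key, n)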
    
    (FULL DISCLOSURE:  We develop and use Kx Systems products for our
    clients, especially in real-time systems and security applications and
    for optimization of existing data warehouses and OLAP systems.)
    
    Michael J. Cannon
    Ubiquicomm
    "Si vis pacem, para bellum."
    ----- Original Message -----
    From: "Henry Dixon" <henrydat_private>
    To: "Hans-Joachim Picht" <hansat_private>; "Nicolai Rasmussen"
    <nicolaiat_private>
    Cc: <loganalysisat_private>
    Sent: Friday, September 28, 2001 1:45 PM
    Subject: RE: [logs] Webserver logs to database - Toward data mining
    
    
    Similar Idea...
    
    You could use the DTS services in MS SQL Server as well.  Start the DTS
    package at 1:00 AM, and you should be done before you come in for coffee
    the next morning.
    
    Keep in mind that you'll need a nice box (dual- or quad-processor,
    1+ GB of RAM).
    
    From there, you can toss OLAP Services on top and do some quick mining.
    In fact, if you have early risers, you can configure your cubes for
    HOLAP storage for decent response time and fast refresh of data.
    
    Even better -- toss the daily findings and aggregated data onto an ASP
    page so you can get to the reports from anywhere.
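
    A rough sketch of the daily rollup such a report page might show.  The
    table and column names ("weblog", "site", "bytes_sent", ...) are only
    placeholders, and sqlite3 stands in here for the real SQL Server
    connection:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE weblog (
                        log_date   TEXT,
                        site       TEXT,
                        url        TEXT,
                        status     INTEGER,
                        bytes_sent INTEGER)""")
    conn.executemany(
        "INSERT INTO weblog VALUES (?, ?, ?, ?, ?)",
        [("2001-09-28", "www1", "/index.html", 200, 4213),
         ("2001-09-28", "www1", "/logo.gif",   200, 1045),
         ("2001-09-28", "www2", "/index.html", 404,  512)])

    # Per-site daily totals -- the sort of aggregate the report would render.
    for row in conn.execute("""SELECT log_date, site,
                                      COUNT(*)        AS hits,
                                      SUM(bytes_sent) AS bytes
                               FROM weblog
                               GROUP BY log_date, site
                               ORDER BY log_date, site"""):
        print(row)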
    
    Hope this helps.
    hd, CISSP
    
    -----Original Message-----
    From: Hans-Joachim Picht [mailto:hansat_private]
    Sent: Friday, September 28, 2001 11:31 AM
    To: Nicolai Rasmussen
    Cc: loganalysisat_private
    Subject: Re: [logs] Webserver logs to database - Toward data mining
    
    
    On Wed, Sep 19, 2001 at 09:48:22PM +0200, Nicolai Rasmussen wrote:
    > We run some websites that generate more than 5 GB of logs per day
    > across approx. 50 different sites, and we would like to put them into
    > a database so we could do some data mining on them.
    >
    > Does anyone have any ideas, input, thoughts or anything on how we
    > should do this?
    >
    > We thought about making an optimized table definition and then dumping
    > each line into the database.  From there we would make some summary
    > reports.
    
    This is the way I implemented such a solution for an ISP.  We piped
    around 4 GB of data into a DB2 database (on Linux) and used MS Access
    (*duck*) to connect to the database to generate traffic bills.
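
    A minimal sketch of that "table definition, then dump each line"
    approach, assuming Common Log Format input; sqlite3 stands in for the
    real DB2 (or other SQL) backend, and the column names are only
    placeholders:

    import re
    import sqlite3

    # host ident authuser [date] "request" status bytes
    CLF = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+)')

    conn = sqlite3.connect("weblogs.db")
    conn.execute("""CREATE TABLE IF NOT EXISTS hits (
                        host   TEXT,
                        ts     TEXT,
                        method TEXT,
                        url    TEXT,
                        status INTEGER,
                        bytes  INTEGER)""")

    def load(path):
        rows = []
        with open(path, errors="replace") as fh:
            for line in fh:
                m = CLF.match(line)
                if not m:
                    continue                  # skip lines that do not parse
                host, ts, method, url, status, size = m.groups()
                rows.append((host, ts, method, url, int(status),
                             0 if size == "-" else int(size)))
        conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?, ?, ?)", rows)
        conn.commit()

    # e.g. load("/var/log/httpd/access_log"), then summaries like:
    #   SELECT url, COUNT(*) FROM hits GROUP BY url ORDER BY 2 DESC LIMIT 20;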
    
    --
    With best regards
    Hans-Joachim Picht  <hansat_private>
    
    
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: loganalysis-unsubscribeat_private
    For additional commands, e-mail: loganalysis-helpat_private
    


