Professional Web Applications Themes

Application that generates 162 million records per day - MySQL

What's your take on an application that generates a minimum of 162 million records per day? I need your opinion with regards to this because the app I'm developing behaves exactly like that....

  1. #1

    Default Application that generates 162 million records per day

    What's your take on an application that generates a minimum of 162
    million records per day? I need your opinion with regards to this
    because the app I'm developing behaves exactly like that.

    Erol Guest

  2. #2

    Default Re: Application that generates 162 million records per day

    On 9 Dec 2006 08:43:10 -0800, Erol Fornoles wrote: 

    My take? How many records it is far less important than how you want to
    look at the output. Are you looking at this stuff in the aggregate only?
    Need to look at individual event records often?

    --
    99. Any data file of crucial importance will be padded to 1.45Mb in size.
    --Peter Anspach's list of things to do as an Evil Overlord
    Peter Guest

  3. #3

    Default Re: Application that generates 162 million records per day

    Erol Fornoles wrote:
     

    Is there a particular issue with generating that much data? What is the problem
    you need to solve? Home use? Business use? No one can give their "take" as
    there is insufficient data to make even the wildest of guess as to what you are
    looking for.

    I work with databases that generate 5TB+/mth. To do some quick calculations,
    162M*30=4.86B records and depending on the record size - I would say you have a
    pretty healthy size database. Let's say your nominal record length is 250
    bytes, then you are storing ~1.22Tb /mth.

    Hope you have LOTS of disc space.

    --
    Michael Austin.
    Database Consultant
    Michael Guest

  4. #4

    Default Re: Application that generates 162 million records per day

    Erol Fornoles wrote: 

    I'd recommend starting by reading this manual page:
    http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html

    There are several articles on the MySQL.com website that talk about
    scalable architectures for MySQL.
    http://dev.mysql.com/tech-resources/articles/

    There are several case studies that talk about high-volume MySQL sites:
    http://www.mysql.com/why-mysql/case-studies/

    Regards,
    Bill K.
    Bill Guest

  5. #5

    Default Re: Application that generates 162 million records per day

    The application is an advertising solution aimed for deployment on
    computer-rental shops or cybercafes. The data is an ad transaction log
    which not only records whether a particular ad was clicked, but also
    when (timestamp) and where (lan hostname, lan ip, net ip) a particular
    ad was displayed. The ads are cycled every 4 seconds.

    Most of the time we're looking at the aggregate of the data, although
    the other developer is saying that we keep the individual ad logs (the
    one which records the time and location of each display) instead of
    keeping the summaries only (how many times the ad was displayed). He is
    contending that the data is absolutely required by the clients
    (advertisers) and wrote a webapp that allow the clients to login and
    view those individual records. Yep, all 162 million records per day.

    Btw, the version of MySQL we're using is 4.0.27 MAX, due to the fact
    that the other developer's web-development tools don't work with MySQL
    5.0.

    On Dec 10, 12:58 am, "Peter H. Coffin" <com> wrote: 
    > look at the output. Are you looking at this stuff in the aggregate only?
    > Need to look at individual event records often?
    >
    > --
    > 99. Any data file of crucial importance will be padded to 1.45Mb in size.
    > --Peter Anspach's list of things to do as an Evil Overlord[/ref]

    Erol Guest

  6. #6

    Default Re: Application that generates 162 million records per day

    Sorry if my question was a bit vague.

    What I'm looking for is the feasibility of the above and your opinions
    given the following conditions:

    1. We do not have a data center of our own and our server is co-located
    instead.
    2. Due to budgetary concerns, we only have 1 server at the moment and
    the disc capacity is 160 GB.
    3. The version of MySQL we're using is 4.0.27 MAX, since the other
    developer's web-development tools don't work with MySQL 5.0.
    4. Most of the time we are looking at the aggregate of the data. The
    aggregate function is defined as the count of each record based on
    whether a particular field's value is 1 (ad displayed) or 2 (ad
    clicked).

    Thanks!

    On Dec 10, 3:45 am, Michael Austin <com> wrote: 
    > you need to solve? Home use? Business use? No one can give their "take" as
    > there is insufficient data to make even the wildest of guess as to what you are
    > looking for.
    >
    > I work with databases that generate 5TB+/mth. To do some quick calculations,
    > 162M*30=4.86B records and depending on the record size - I would say you have a
    > pretty healthy size database. Let's say your nominal record length is 250
    > bytes, then you are storing ~1.22Tb /mth.
    >
    > Hope you have LOTS of disc space.
    >
    > --
    > Michael Austin.
    > Database Consultant[/ref]

    Erol Guest

  7. #7

    Default Re: Application that generates 162 million records per day

    Erol Fornoles wrote: 

    With Michael Austin's estimate of 1.22Tb/mth, your hard drive will be
    filled more than 8 times a month. You simply do not have enough space to
    store all this information. Store summary data instead.

    Cheers,
    Nicholas Sherlock

    --
    http://www.sherlocksoftware.org
    Nicholas Guest

  8. #8

    Default Re: Application that generates 162 million records per day


    Erol Fornoles gDG
     

    My company experience this as well, we have a lot of web log to be
    written into DB (more than 162 million records per day), The only way
    is to store the summary of data

    howa Guest

  9. #9

    Default Re: Application that generates 162 million records per day

    Nicholas Sherlock wrote:
     
    >
    >
    > With Michael Austin's estimate of 1.22Tb/mth, your hard drive will be
    > filled more than 8 times a month. You simply do not have enough space to
    > store all this information. Store summary data instead.
    >
    > Cheers,
    > Nicholas Sherlock
    >[/ref]
    that can be revised down a bit since the record size appears (I am guessing
    based on the info provided by the OP) to be date(8bytes=64bits), host
    name(50??), host ip (16bbytes?)2, client ip (16bytes?), clicked (1) displayed(1)
    ?? Again pure speculation but it looks like: 92bytes*162M = ~14.9Gb/day or
    447.2GB/mth which means you are doing a LOT of purging... after the aggregate
    functions complete.

    --
    Michael Austin.
    Database Consultant
    Michael Guest

  10. #10

    Default Re: Application that generates 162 million records per day

    Michael Austin wrote: 


    I wouldn't recommend storing both client hostname and client IP if space
    and throughput are issues. Resolving the client hostname can be slow,
    and storing it is likely to be redundant if the IP is stored.
    Especially if aggregate data is the normal view mode, because individual
    hostnames probably won't be displayed anyway.

    If you do need to store the hostname as a string, read about the
    tradeoffs between char and varchar. Char is a fixed-length datatype,
    and if all your columns in a given table are of fixed length, access to
    that table can be quicker.

    Store IP addresses as unsigned integers, not strings. One should
    convert the dotted quad representation to its integer equivalent with
    the INET_ATON() function.

    You could store the date as a UNIX-style timestamp in an unsigned
    integer column, instead of a MySQL DATETIME datatype. That will reduce
    it to 4 bytes instead of 8.

    You can probably combine clicked and displayed into one byte-sized
    column by making them bitfields of a single TINYINT.

    There -- that reduced 92 bytes/row to 13 bytes/row! Now it requires
    only 2008MB/day. You should be able to purge once a month and you'll be
    fine.

    Regards,
    Bill K.
    Bill Guest

  11. #11

    Default Re: Application that generates 162 million records per day

    Thanks for the replies!

    We've normalized the hostnames to another table, so only an id
    (integer) gets saved along with the ips (integers encoded via
    inet_aton).

    Aside from the allowable date ranges, would there be a trade-off if I
    used Unix-style timestamps? Is there a conversion function available
    that converts the Unix-style timestamp to a normal timestamp or
    datetime?

    On Dec 11, 2:56 am, Bill Karwin <com> wrote: 
    > and throughput are issues. Resolving the client hostname can be slow,
    > and storing it is likely to be redundant if the IP is stored.
    > Especially if aggregate data is the normal view mode, because individual
    > hostnames probably won't be displayed anyway.
    >
    > If you do need to store the hostname as a string, read about the
    > tradeoffs between char and varchar. Char is a fixed-length datatype,
    > and if all your columns in a given table are of fixed length, access to
    > that table can be quicker.
    >
    > Store IP addresses as unsigned integers, not strings. One should
    > convert the dotted quad representation to its integer equivalent with
    > the INET_ATON() function.
    >
    > You could store the date as a UNIX-style timestamp in an unsigned
    > integer column, instead of a MySQL DATETIME datatype. That will reduce
    > it to 4 bytes instead of 8.
    >
    > You can probably combine clicked and displayed into one byte-sized
    > column by making them bitfields of a single TINYINT.
    >
    > There -- that reduced 92 bytes/row to 13 bytes/row! Now it requires
    > only 2008MB/day. You should be able to purge once a month and you'll be
    > fine.
    >
    > Regards,
    > Bill K.[/ref]

    Erol Guest

  12. #12

    Default Re: Application that generates 162 million records per day

    Erol Fornoles wrote: 

    FROM_UNIXTIME() converts from a UNIX timestamp to a MySQL date format.
    UNIX_TIMESTAMP() converts from MySQL date format to a UNIX timestamp.

    See http://dev.mysql.com/doc/refman/4.1/en/date-and-time-functions.html

    Regards,
    Bill K.
    Bill Guest

  13. #13

    Default Re: Application that generates 162 million records per day

    On 9 Dec 2006 22:08:28 -0800, Erol Fornoles wrote: 

    Guck. Okay. I can see both the problem and why all the stuff you're
    collecting would be useful.
     

    Okay, my *recommendation* for this to put all the actual data into log
    files by advertiser (possibly by time as well) and if/when someone
    actually gets curious enough to want to check when his advertisement
    appeared, your web application can slam the log files through grep or
    something. It's not hugely efficient, but it's not something I'd imagine
    people doing all day, every day. The click counts and impression counts,
    by day and total etc, can be handled very nicely with bunch of InnoDB
    tables, where you're not having to lock a whole table to exclusively
    increment one row.

    --
    91. I will not ignore the messenger that stumbles in exhausted and obviously
    agitated until my personal grooming or current entertainment is finished.
    It might actually be important.
    --Peter Anspach's list of things to do as an Evil Overlord
    Peter Guest

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139