Reasons to use innodb_file_per_table

May 21, 2009

When working with InnoDB, you have two ways for managing the tablespace storage:

  1. Throw everything in one big file (optionally split).
  2. Have one file per table.

I will discuss the advantages and disadvantages of the two options, and will strive to convince you that innodb_file_per_table is preferable.

A single tablespace

Having everything in one big file means all tables and indexes, from all schemas, are 'mixed' together in that file.

This allows for the following nice property: free space can be shared between different tables and different schemas. Thus, if I purge many rows from my log table, the now unused space can be occupied by new rows of any other table.

This same nice property also translates to a not so nice one: data can be greatly fragmented across the tablespace.

An annoying property of InnoDB's tablespaces is that they never shrink. So after purging those rows from the log table, the tablespace file (usually ibdata1) retains its size; it does not release storage to the file system.
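
To illustrate, here is a quick check one can run (a sketch only: the schema/table names and data directory path are examples, and the exact "InnoDB free" reporting depends on your MySQL version):

# free space InnoDB reports inside the shared tablespace (reused internally, never returned to the OS)
mysql -e "SHOW TABLE STATUS FROM my_schema LIKE 'log'\G" | grep -i 'InnoDB free'
# size of the tablespace file as the file system sees it
ls -lh /var/lib/mysql/ibdata1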

I've seen more than once how certain tables are left unwatched, growing until disk usage reaches 90% and SMS notifications start beeping all around.

There's little to do in this case. Well, one can always purge the rows; sure, the space would be reused by InnoDB. But having a file which consumes some 80-90% of disk space is a performance catastrophe. It means the disk head needs to travel large distances, and overall disk performance runs very low.

The best way to solve this is to set up a new slave (after purging the rows), and dump the data into that slave.

InnoDB Hot Backup

The funny thing is, the ibbackup utility will copy the tablespace file as it is. If it was 120GB, of which only 30GB are used, you still get 120GB backed up and restored.

mysqldump, mk-parallel-dump

mysqldump would be your best choice if you only had the original machine to work with. Assuming you're only using InnoDB, a dump with --single-transaction will do the job. Or you can utilize mk-parallel-dump to speed things up (depending on your dump method and accessibility needs, mind the locking).
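
For example, a minimal sketch (host, credentials and the output file name are placeholders):

# consistent, non-locking dump of InnoDB data; add --master-data if the dump is meant to seed a new slave
mysqldump --single-transaction --routines --triggers --all-databases > full_dump.sql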

innodb_file_per_table

With this parameter set, a .ibd file is created per table. What we get is this:

  • Tablespace is not shared among different tables, and certainly not among different schemas.
  • Each file is considered a tablespace of its own.
  • Again, a tablespace never reduces in size.
  • It is possible to regain space per tablespace.

Wait. The last two points seem to conflict, don't they? Let's explain.

In our log table example, we purge many rows (up to 90GB of data is removed). The .ibd file does not shrink. But we can do:

ALTER TABLE log ENGINE=InnoDB

What will happen is that a new, temporary file is created, into which the table is rebuilt. Only existing data is added to the new table. Once complete, the original table is removed, and the new table is renamed to take its place.

Sure, this takes a long time, during which the table is completely locked: no writes and no reads allowed. But still - it allows us to regain disk space.
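
To see the effect, compare the .ibd file size before and after the rebuild (a sketch; schema name, path and sizes are examples only):

ls -lh /var/lib/mysql/my_schema/log.ibd     # before: still holds the purged space
mysql my_schema -e "ALTER TABLE log ENGINE=InnoDB"
ls -lh /var/lib/mysql/my_schema/log.ibd     # after: only the remaining rows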

With the new InnoDB plugin, disk space is also regained when executing a TRUNCATE TABLE log statement.

Fragmentation is not as bad as in a single tablespace: the data is limited within the boundaries of a smaller file.

Monitoring

One other nice thing about innodb_file_per_table is that it is possible to monitor table size on the file system level. You don't need access to MySQL, to use SHOW TABLE STATUS or to query the INFORMATION_SCHEMA. You can just look up the top 10 largest files under your MySQL data directory (and subdirectories), and monitor their size. You can see which table grows fastest.
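
Something along these lines does the trick (a sketch, assuming the default data directory and GNU find):

# top 10 largest .ibd files under the MySQL data directory
find /var/lib/mysql -name '*.ibd' -printf '%s\t%p\n' | sort -nr | head -10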

Backup

Last, it is not yet possible to back up single InnoDB tables by copying their .ibd files. But hopefully work will be done in this direction.

posted in MySQL by shlomi


37 Comments to "Reasons to use innodb_file_per_table"

  1. Tom Krouper wrote:

    You talked a little bit about the negatives of using a single file (i.e. you can't reduce the disk footprint). What about the negative(s) of innodb_file_per_table? If there is no negative, why would anyone use a single file?

  2. shlomi wrote:

    @Tom,

    There is a negative which I've mentioned.
    With a single file, tablespace can be shared between rows of different tables and schemas. This means less wasted tablespace.

    With innodb_file_per_table, each table may have unused tablespace, which can only be utilized by rows of the same table. This means (sometimes much) more wasted tablespace.

  3. fwdemails wrote:

    Since reclaiming space with innodb_file_per_table is possible, it makes for an easier choice over the single tablespace.

  4. Przemek wrote:

    Yes, I agree there is only one negative about innodb_file_per_table - wasted table space - but in multi-database environments this can be a real problem, since the disk space overhead of InnoDB's separate files is quite large. We can be speaking of hundreds of gigabytes when you have something like a million tables.

  5. shlomi wrote:

    @Przemek,

    The wasted space does accumulate. A millions tables, though?
    In this case there is, obviously, an impossible number of open files (open tables / table_cache / table_definition_cache) to manage.

  6. Robert Wultsch wrote:

    @shlomi.
    One does not need all tables open at the same time to have a useful database. Assuming one has a large amount of disk space and one does not kill the fs (i.e. more than 60k-odd files in a single ext3 directory), a million tables is not all that incredible of an idea.

  7. Przemek wrote:

    Well, I haven't seen a server with *millions* of tables, just over 1 million :) And if you have only a few percent of these tables opened simultaneously, it works well :)
    And since we have flash drives on the market, similar cases are less painful.

  8. shlomi wrote:

    Robert, Przemek,

    Thanks. "A millions tables" was a typo and should have read "A million tables".
    If you do indeed only use a small fraction of these tables at any given time - then that's more reasonable.

    Can you guarantee this behavior, though? Is it possible that at some peak time some, say, 100K tables need to be accessed?

    I'm actually more concerned with the table cache: it has a single lock, as far as I know, and would therefore become a serious bottleneck in such cases.

  9. stop messing with the tablespace « domas mituzas: vaporware, inc. wrote:

    [...] keep loving and endorsing the –innodb-file-per-table. Then poor new users read about that, get confused, start using [...]

  10. domas wrote:

    I wrote a somewhat large response to this, at http://dammit.lt/2009/05/21/innodb-tablespace/ :-)

  11. Ronald Bradford wrote:

    Your post forgets to mention some very important reasons why innodb_file_per_table is bad.

    The most critical of these is the necessary fsync which, if running one per second, now has to occur on 'n' opened tables rather than a single file.

    Disk I/O is the most common resource bottleneck of a popular system; minimizing unnecessary accesses to your slowest physical resource should be a priority.

  12. shlomi wrote:

    @Ronald,

    Thanks, good point!

  13. Log Buffer #147: a Carnival of the Vanities for DBAs | Pythian Group Blog wrote:

    [...] “When working with InnoDB,” writes Schlomi Noach, “you have two ways for managing the tablespace storage.  . . .  I will discuss the advantages and disadvantages of the two options, and will strive to convince that innodb_file_per_table is preferable.” The item is, Reasons to use innodb_file_per_table. [...]

  14. Rolando Edwards wrote:

    One of the biggest headaches with InnoDB is indeed the monolithic ibdata1 file. To reduce the size of the file and leave nothing but InnoDB metadata, do the following:

    1. Run SELECT DISTINCT table_schema FROM information_schema.tables where engine='InnoDB';
    db-1
    db-2
    ...
    db-n
    2. Run mysqldump of only those databases.
    mysqldump -h... -u... -p... --routines --triggers --databases db1 db2 ... dbn > InnoDBData.sql

    Note: If there are MyISAM tables in the dump, no problem. They will get put back when reloading. The dump file will also contain the CREATE DATABASE statements for db1, db2, ..., dbn.

    3. Drop those databases
    DROP DATABASE db1;
    DROP DATABASE db2;
    ...
    DROP DATABASE dbn;

    4. Run 'service mysql stop'

    5. Delete InnoDB files
    rm -f /var/lib/mysql/ibdata1
    rm -f /var/lib/mysql/ib_logfile0
    rm -f /var/lib/mysql/ib_logfile1

    6. Add innodb_file_per_table to /etc/my.cnf

    7. Make sure the ibdata1 setting is 10M:autoextend (i.e. innodb_data_file_path = ibdata1:10M:autoextend)

    8. Run 'service mysql start'
    This rebuilds ibdata1 and the ib log files
    ibdata1 is now 10MB

    9. Run 'source InnoDBData.sql' in mysql
    This will reload the InnoDB data

    Now ibdata1 is defragged (in a convoluted way)

    Going forward, run 'OPTIMIZE TABLE' on all InnoDB tables periodically to defragment the .ibd files. ibdata1 will only contain InnoDB metadata.
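
    For reference, steps 6 and 7 translate to something like the following in /etc/my.cnf (a sketch; the exact file location and surrounding settings depend on your installation):

    [mysqld]
    innodb_file_per_table
    innodb_data_file_path = ibdata1:10M:autoextend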

  15. shlomi wrote:

    @Rolando,

    Sure, if you can take your server down for that, then life is good.
    Also, if you can allow for periodic OPTIMIZE TABLE, life is good, again.
    Using master-master replication may help out on this, and shorten your downtime.

  16. Log Buffer #148: a Carnival of the Vanities for DBAs | Pythian Group Blog wrote:

    [...] As does Shlomi Noach, with his reasons to use innodb_file_per_table. [...]

  17. Artículos destacados, Mayo de 2009 | cambrico.net wrote:

    [...] to use the innodb_per_table parameter in MySQL, at Openark. (in [...]

  18. nos wrote:

    innodb_file_per_table has one other drawback.
    You'll use more resources. Namely, mysqld will need to keep 1 open file per table. This can be a problem if you have _many_ tables. (Of course, MyISAM tables always had that issue.)

  19. Feuchttraum wrote:

    My ibdata1 is 45 GB. The MySQL server runs on my Windows development machine and the ibdata1 file is severely fragmented. Defrag programs cannot defragment it. Can I temporarily (while the MySQL service is down) move the ibdata1 file to another drive, defragment the first drive (actually a partition) and then move ibdata1 back?

  20. shlomi wrote:

    @Feuchttraum

    I'm not sure about defragmenting files on Windows. If you say that's feasible, then, yes, you can take the service down and work on the file during that time.

  21. Feuchttraum wrote:

    @shlomi, thank you for the fast reply. I also think it should work, I just wanted to be sure first. It will save me some time, since doing a mysqldump of a 45 GB database can take quite a long time.

  22. shlomi wrote:

    Whatever you do - make a backup first!

  23. Mikev wrote:

    I've been googling around and it appears you do get slower write speeds with innodb_file_per_table enabled.
    Personally I don't give a damn about gaining file space here and there; I am far more concerned about total speed performance than possible space savings.

  24. shlomi wrote:

    @Mikev,

    I'm not aware that there's a difference in write speed. Can you point out the references?
    Thanks

  25. Mikev wrote:

    I have found various blogs/sites on the net attacking the innodb_file_per_table setting due to the extra threads and disk I/O usage that can occur.
    This guy's post appears to be the most complete, with small benchmark examples of fdatasync/sec numbers:

    http://yoshinorimatsunobu.blogspot.com/2009/05/overwriting-is-much-faster-than_28.html

    I lost some of the other links; there is this other guy's blog that totally hates it, and his arguments seemed quite good when I read them, but I can't find it since I closed my browser down earlier today.

    Seems like if you value the odd amount of disk space over performance, by all means enable innodb_file_per_table, but if you value performance above easy management, don't use this setting.

    Also, I grabbed this out of the "High Performance MySQL" book:

    'Innodb_file_per_table causes each file to be fsynced separately, which means writes to multiple tables can't be combined into a single I/O operation. This may require InnoDB to perform a higher total number of fsync() operations.'

  26. shlomi wrote:

    Hi,

    The guy who absolutely hates innodb_file_per_table would be Domas Mituzas.
    I disagree with part of his arguments. But he is an expert in the field, so this does not mean I belittle him. It's just that there's some contention between "what's utterly best for performance" and "what's best to live with". The two do not work well together.

    There are reasons for and against. I've presented the reasons "for".
    As I see it, development is moving in the direction of innodb_file_per_table: the InnoDB plugin requires this option if you want to use the Barracuda table format (which allows for compression, among other things).

    Anyway, I don't get paid to support innodb_file_per_table :) I'll use the single tablespace if I think it's superior.

    Regards

  27. Gabe da Silveira wrote:

    I find the whole tone of Domas' post extremely irritating. Internet trolls are a dime a dozen, but when someone really knows their stuff, it's annoying when they go around with an attitude like anyone without that domain knowledge is an idiot. Especially when he makes hand-wavy assumptions about real world scenarios that aren't universally true, such as the idea that permanent data will always grow faster overall than temporary data. Not to mention cases where there are other infrastructure considerations in play, such as using Time Machine on OS X where monolithic files defeat the primary utility of the system.

    It sort of reminds me of the mysterious silence of Fabian Pascal's dbdebunk.com coinciding very closely with the rise of the NoSQL movement. That guy was one of the foremost experts in relational theory, and yet for all his expertise, he was not able to educate people and fell prey to his own dogma which was irreconcilable with the unprecedented scaling challenges faced by the social web. No doubt he is still sitting in his office wildly gesticulating about the failure of the market to produce a "true" RDBMS system, which, of course, would have superseded the need for the current breed of alternate database technologies.

    The lesson here, which I would have posted to Domas' post if he allowed comments, is that no one knows everything, and if you are a domain expert you should share your knowledge openly rather than belittling others. If Domas built a web application I could poke his implementation choices full of countless holes, but it wouldn't make me smarter than him.

  28. John Scott wrote:

    I hail from DB2 land and I really enjoyed the ability to put one or more tables of my choosing in a particular tablespace and others in another tablespace.

    I totally agree with many of the points here about the filesystem overhead of having a file per table. However, I also don't want all tables to be in one file.

    Is there not a happy medium in MySQL?

  29. George M wrote:

    I've been running several MySQL servers for a number of years. I have one particular box that has a little over 400 databases totaling about 350GB. The server was originally set up using 3 ibdata files back on 4.0, and was later changed to file per table. There has not been any appreciable speed difference from what I can see. The server is very busy, averaging 6-7 GB of binlogs per day. There is file system overhead for sure, but the db files are stored on a 16-spindle 6Gb/s SAS array, so I don't expect that any small gain would warrant moving back to single files again.

  30. shlomi wrote:

    @John, George,

    Ah, but it has been more than two years since this post was published. Much has happened since.
    In particular, the emergence of the InnoDB Plugin (now simply InnoDB), replacing the older "built-in" InnoDB.
    The InnoDB Plugin comes with many improvements: performance, scale up, bugfixes, blob storage, ...
    And to fully utilize its functionality (you can use half of its advantages without changing anything), you need to upgrade your tables to the "Barracuda" format.
    Guess what? It requires innodb_file_per_table.
    So file-per-table is now also a feature issue. You lose features (e.g. compression, blob storage) which can directly relate to performance issues.

  31. rahul wrote:

    I want to know one thing: if by chance the ibdata1 file has been deleted, then all the table data would be gone... how would I resolve this?

  32. shlomi wrote:

    Well, you're in serious trouble.
    I would tell you to open your backup, but I'm sure you're asking this question because you don't have one. Or you could rely on your replicating slave, if you have one.
    Depending on your file system, you may be able to restore the file itself.
    If this is made possible, you may recover your InnoDB data using innodb-tools. But if you're down to that, you may seek out help from Percona, who author those tools.

  33. MySQL performance tips for Zabbix | Zabbix Zone wrote:

    [...] Some discussions about this: http://dom.as/2009/05/21/innodb-tablespace/ http://code.openark.org/blog/mysql/reasons-to-use-innodb_file_per_table [...]

  34. MySQL标记贴 | 我爱花生米 wrote:

    [...] Phantom reads in MySQL InnoDB; The data structures and algorithms behind MySQL indexes; Jeremy Zawodny on MySQL; Reasons to use innodb_file_per_table. This entry was posted in DataBase, Paste and tagged InnoDB, MySQL. Bookmark the permalink. [...]

  35. innodb_file_per_table for existing MySQL database | Olof Larsson wrote:

    [...] http://code.openark.org/blog/mysql/reasons-to-use-innodb_file_per_table [...]

  36. Optimizing MySQL server settings | Codingpedia.org wrote:

    […] the same as if you had this disabled, but you have now the possibility to rebuild the table. Follow Reasons to use innodb_file_per_table for more […]

  37. MySQL performance tips for Zabbix | Zabbix Zone - 시스존 wrote:

    […] Some discussions about this: http://dom.as/2009/05/21/innodb-tablespace/ http://code.openark.org/blog/mysql/reasons-to-use-innodb_file_per_table […]
