oak-chunk-update

NAME

oak-chunk-update: perform long-running, non-blocking UPDATE/DELETE operations in automatically managed small chunks

SYNOPSIS

Delete rows from world.City where population is small:

oak-chunk-update --database=world --execute="DELETE FROM City WHERE Population < 10000000 AND OAK_CHUNK(City)"

Same as above, provide fully qualified table names:

oak-chunk-update --execute="DELETE FROM world.City WHERE Population < 10000000 AND OAK_CHUNK(world.City)"

Same as above, use a chunk size of 100 rows, verbose:

oak-chunk-update --database=world --execute="DELETE FROM City WHERE Population < 10000000 AND OAK_CHUNK(City)" --chunk-size=100 --verbose

Same as above, print progress:

oak-chunk-update --database=world --execute="DELETE FROM City WHERE Population < 10000000 AND OAK_CHUNK(City)" --chunk-size=100 --verbose --print-progress

Same as above, do not log to binary log:

oak-chunk-update --database=world --execute="DELETE FROM City WHERE Population < 10000000 AND OAK_CHUNK(City)" --chunk-size=100 --verbose --print-progress --no-log-bin

Same as above, sleep for 100 milliseconds between chunks:

oak-chunk-update --database=world --execute="DELETE FROM City WHERE Population < 10000000 AND OAK_CHUNK(City)" --chunk-size=100 --sleep=100 --verbose

Perform an UPDATE operation:

oak-chunk-update --database=world --execute="UPDATE City SET Population = Population+1 WHERE OAK_CHUNK(City)"

Perform a multi-table UPDATE operation, choose world.City as chunking table:

oak-chunk-update --execute="UPDATE City, Country SET City.District = 'unknown' WHERE City.CountryCode = Country.Code AND Country.Continent = 'Africa' AND OAK_CHUNK(City)"

Provide connection parameters. Prompt for password:

oak-chunk-update --user=root --ask-pass --socket=/tmp/mysql.sock --database=world --execute="DELETE FROM City WHERE Population < 10000000 AND OAK_CHUNK(City)"

Use a defaults file for parameters:

oak-chunk-update --defaults-file=/home/myuser/.my-oak.cnf --database=world --execute="DELETE FROM City WHERE Population < 10000000 AND OAK_CHUNK(City)"

DESCRIPTION

This utility splits long-running or non-indexed UPDATE/DELETE operations, optionally multi-table ones, into smaller pieces.
Long-running write queries are common. Some examples:

  • Purging old table records (e.g. purging old logs).
  • Updating a column across an entire table.
  • Deleting or updating a small number of rows, but with a non-indexed search condition.

oak-chunk-update splits such long-running tasks into small chunks and can optionally sleep between chunks. This results in shorter lock times, better replication responsiveness (less lag) and less stress on system resources (CPU, I/O).
To do so, the utility relies on a UNIQUE KEY on a given table, which drives the splitting process.
Note that the query may involve multiple (JOINed) tables, in which case at least one of the tables must have a UNIQUE KEY.
The utility therefore requires that:

  • At least one of the tables participating in the UPDATE/DELETE query has a UNIQUE KEY.
  • The query indicates to the utility which table's UNIQUE KEY is to be used.
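
The general technique of walking a unique key in small ranges can be sketched as follows. This is only an illustration under stated assumptions, not the utility's actual code: it uses Python 3 with an in-memory SQLite database instead of MySQL, and it assumes a hypothetical City table with a positive integer unique key named id. The function name chunked_delete and its parameters are made up for the example.

```python
import sqlite3
import time


def chunked_delete(conn, max_population, chunk_size=1000, sleep_millis=0):
    """Delete rows with Population < max_population, one key range at a time.

    Assumes a City table whose unique key is a positive integer column "id"
    (a hypothetical schema). Returns the total number of rows deleted.
    """
    total_deleted = 0
    last_id = 0  # lower bound of the next chunk (exclusive)
    while True:
        # Find the highest key among the next chunk_size keys.
        row = conn.execute(
            "SELECT MAX(id) FROM "
            "(SELECT id FROM City WHERE id > ? ORDER BY id LIMIT ?)",
            (last_id, chunk_size),
        ).fetchone()
        if row[0] is None:
            break  # no keys left to walk
        upper_id = row[0]
        # Apply the original condition, restricted to the current key range.
        cursor = conn.execute(
            "DELETE FROM City WHERE Population < ? AND id > ? AND id <= ?",
            (max_population, last_id, upper_id),
        )
        conn.commit()  # each chunk is its own short transaction
        total_deleted += cursor.rowcount
        last_id = upper_id
        if sleep_millis > 0:
            time.sleep(sleep_millis / 1000.0)  # give other work a chance to run
    return total_deleted
```

Each iteration touches at most chunk_size keys, so locks are held only for the duration of one small statement.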

The query must include a hint in the form OAK_CHUNK(table_name) or OAK_CHUNK(database_name.table_name). See SYNOPSIS for examples.
The table indicated in the OAK_CHUNK clause is the table which must contain a UNIQUE KEY, which is used for splitting the query. The utility rewrites the query by iteratively replacing the OAK_CHUNK(…) clause with appropriate values from the UNIQUE KEY.
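
As a rough illustration of the rewriting step, the sketch below replaces the hint with a range condition. The SQL that the utility really generates is not shown in this document, so the form of the condition, the single integer key column and the helper names (find_chunk_table, rewrite_chunk) are all assumptions made for the example.

```python
import re

# Matches OAK_CHUNK(table_name) or OAK_CHUNK(database_name.table_name).
OAK_CHUNK_PATTERN = re.compile(r"OAK_CHUNK\(\s*([\w.]+)\s*\)", re.IGNORECASE)


def find_chunk_table(query):
    """Return the table named inside the OAK_CHUNK(...) hint."""
    match = OAK_CHUNK_PATTERN.search(query)
    if match is None:
        raise ValueError("query has no OAK_CHUNK(...) hint")
    return match.group(1)


def rewrite_chunk(query, key_column, lower, upper):
    """Replace the hint with a range condition on one integer key column.

    Assumption: the chunking key is a single integer column; real unique
    keys may span several columns or use other data types.
    """
    table = find_chunk_table(query)
    condition = "(%s.%s > %d AND %s.%s <= %d)" % (
        table, key_column, lower, table, key_column, upper)
    # A function is used as the replacement so the text is inserted literally.
    return OAK_CHUNK_PATTERN.sub(lambda m: condition, query, count=1)
```

Calling rewrite_chunk repeatedly with advancing lower and upper bounds yields one executable statement per chunk.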
If more than one UNIQUE KEY is available on the table, the utility chooses one according to the following order of preference:

  • If there is a PRIMARY KEY, it is the selected key.
  • A key whose first column is non-textual is preferable to a key whose first column is textual.
  • A key with a smaller numeric data type takes precedence.
  • A key with fewer columns takes precedence.
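
The order of preference above can be expressed as a sort key. The sketch below is one possible reading, not the utility's implementation: it assumes the "smaller numeric data type" rule refers to the first column of the key, and the UniqueKey structure and the type tables are hypothetical helpers. The only facts relied on are that MySQL names the primary key index PRIMARY and that its integer types occupy 1, 2, 3, 4 and 8 bytes.

```python
from collections import namedtuple

# Hypothetical description of a candidate key: index name plus column types.
UniqueKey = namedtuple("UniqueKey", ["name", "column_types"])

TEXTUAL_TYPES = {"char", "varchar", "tinytext", "text", "mediumtext", "longtext"}
INTEGER_BYTES = {"tinyint": 1, "smallint": 2, "mediumint": 3, "int": 4, "bigint": 8}


def key_rank(key):
    """Lower tuples are preferred; the fields follow the documented order."""
    first_type = key.column_types[0].lower()
    return (
        key.name != "PRIMARY",              # 1. PRIMARY KEY wins outright
        first_type in TEXTUAL_TYPES,        # 2. non-textual first column
        INTEGER_BYTES.get(first_type, 99),  # 3. smaller numeric type (assumed: first column)
        len(key.column_types),              # 4. fewer columns
    )


def choose_unique_key(keys):
    """Pick the preferred key from a non-empty list of candidates."""
    return min(keys, key=key_rank)
```

Because Python compares tuples field by field, a later rule only matters when all earlier rules tie.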

OPTIONS

--ask-pass

Prompt for password.

-c CHUNK_SIZE, --chunk-size=CHUNK_SIZE

Number of rows to act on in each chunk (default: 1000). 0 means all rows are updated in one operation.
The lower the number, the shorter any locks are held, but the more operations are required and the longer the total running time.

-d DATABASE, --database=DATABASE

Database name (required unless table is fully qualified).

--defaults-file=DEFAULTS_FILE

Read from MySQL configuration file. Overrides --user, --password, --socket, --port.

Configuration needs to be in the following format:

[client]
user=my_user
password=my_pass
socket=/tmp/mysql.sock
port=3306
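
For illustration, a file in this format can be read with Python's standard configparser module. This is a sketch in Python 3 and not necessarily how the utility itself parses the file; the function name read_client_settings is made up, and real MySQL option files may contain constructs that configparser does not understand.

```python
import configparser


def read_client_settings(config_text):
    """Extract connection settings from the [client] section of an option file."""
    # Interpolation is disabled so that a "%" in a password is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    client = parser["client"]
    return {
        "user": client.get("user"),
        "password": client.get("password"),
        "socket": client.get("socket"),
        # Fall back to the default port documented for --port.
        "port": client.getint("port", fallback=3306),
    }
```

To read from disk instead, pass the file's contents, for example open(path).read(), to the function.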

-e EXECUTE_QUERY, --execute=EXECUTE_QUERY

Query (UPDATE or DELETE) to execute, which contains a chunk placeholder (required).

-H HOST, --host=HOST

MySQL host (default: localhost)

--no-log-bin

Do not log to the binary log (actions will not replicate). This may be useful if the slave already struggles to keep up with the master. The utility can instead be run manually on the slave machines, which uses more than one CPU core on those machines and speeds up replication through parallelism.

-p PASSWORD, --password=PASSWORD

MySQL password

-P PORT, --port=PORT

TCP/IP port (default: 3306)

--print-progress

Show the number of affected rows while the utility runs.

--sleep=SLEEP_MILLIS

Number of milliseconds to sleep between chunks (default: 0).

-S SOCKET, --socket=SOCKET

MySQL socket file. Only applies when host is localhost.

-u USER, --user=USER

MySQL user

-v, --verbose

Print user-friendly messages.
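
The trade-off between --chunk-size and --sleep comes down to simple arithmetic, sketched below. The function name and the example figures are hypothetical. The sketch assumes chunks are counted over all rows of the chunking table and that a sleep follows every chunk, so the sleep total should be read as an upper bound rather than as the utility's exact behaviour.

```python
import math


def estimate_chunks(total_rows, chunk_size=1000, sleep_millis=0):
    """Return (number of chunks, total seconds spent sleeping at most)."""
    if chunk_size <= 0:
        # A chunk size of 0 means all rows are handled in one operation.
        return 1, 0.0
    chunks = max(1, math.ceil(total_rows / chunk_size))
    sleep_seconds = chunks * sleep_millis / 1000.0
    return chunks, sleep_seconds
```

For a table of 4,000,000 rows with --chunk-size=1000 and --sleep=100, this gives 4000 chunks and up to 400 seconds of added sleep time, on top of the time the statements themselves take.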

ENVIRONMENT

Requires MySQL 5.0 or newer and Python 2.3 or newer.

python-mysqldb must be installed in order to use this tool. You can install it with:

apt-get install python-mysqldb

or

yum install mysql-python

SEE ALSO

LICENSE

This tool is released under the BSD license.

Copyright (c) 2008-2009, Shlomi Noach
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
* Neither the name of the organization nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

AUTHOR

Shlomi Noach

 