sql_util api documentation

class sql_util.sql_token[source]

Bases: object

The class tokenises the sql statements captured by mysql_engine. Several regular expressions analyse and build the elements of the token. The DDL support is purposely limited to the following.


The regular expression m_fkeys is used to remove any foreign key definition from the sql statement as we don’t enforce any foreign key on the PostgreSQL replication.


The method builds a list of dictionaries with the column definitions.

The regular expression m_fields is used to find all the column occurrences and, for each occurrence, the method parse_column is called. If parse_column returns a dictionary, this is appended to the list col_parse.

Parameters:inner_stat – The statement within the round brackets in CREATE TABLE
Returns:cols_parse the list of dictionary with the column definitions
Return type:list
build_key_dic(inner_stat, table_name)[source]

The method matches and tokenise the primary key and index/key definitions in the create table’s inner statement.

As the primary key can be defined as column or table constraint there is an initial match attempt with the regexp m_inline_pkeys. If the match is successful then the primary key dictionary is built from the match data. Otherwise the primary key dictionary is built using the eventual table key definition.

The method search for primary keys keys and indices defined in the inner_stat. The index name PRIMARY is used to tell pg_engine we are building a primary key. Otherwise the index name is built using the format (uk)idx_tablename[0:20] + counter. If there’s a match for a primary key the composing columns are save into pkey_cols.

The tablename limitation is required as PostgreSQL enforces a strict limit for the identifier name’s lenght.

Each key dictionary have three keys. index_name, the index name or PRIMARY index_columns, a list with the column names non_unique, follows the MySQL’s information schema convention and marks an index if is unique or not.

When the dictionary is built is appended to idx_list and finally returned to the calling method parse_create_table.s

  • inner_stat – The statement within the round brackets in CREATE TABLE
  • table_name – The table name

idx_list the list of dictionary with the index definitions

Return type:



The method parses the alter table match. As alter table can be composed of multiple commands the original statement (group 0 of the match object) is searched with the regexp m_alter_list. For each element in returned by findall the first word is evaluated as command. The parse alter table manages the following commands. DROP,ADD,CHANGE,MODIFY.

Each command build a dictionary alter_dic with at leaset the keys command and name defined. Those keys are respectively the commant itself and the attribute name affected by the command.

ADD defines the keys type and dimension. If type is enum then the dimension key stores the enumeration list.

CHANGE defines the key command and then runs a match with m_alter_change. If the match is successful the following keys are defined.

old is the old previous field name new is the new field name type is the new data type dimension the field’s dimensions or the enum list if type is enum

MODIFY works similarly to CHANGE except that the field is not renamed. In that case we have only the keys type and dimension defined along with name and command.s

The excluded_names list is used to skip the CONSTRAINT and PRIMARY built along the the match object.

Parameters:malter_table – The match object returned by the match method against tha alter table statement.
Returns:stat_dic the alter table dictionary tokenised from the match object.
Return type:dictionary

This method parses the column definition searching for the name, the data type and the dimensions. If there’s a match the dictionary is built with the keys column_name, the column name data_type, the column’s data type is nullable, the value is set always to yes except if the column is primary key ( column name present in key_cols) enum_list,character_maximum_length,numeric_precision are the dimensions associated with the data type. The auto increment is set if there’s a match for the auto increment specification.s

Parameters:col_def – The column definition
Returns:col_dic the column dictionary
Return type:dictionary
parse_create_table(sql_create, table_name)[source]

The method parse and generates a dictionary from the CREATE TABLE statement. The regular expression m_inner is used to match the statement within the round brackets.

This inner_stat is then cleaned from the primary keys, keys indices and foreign keys in order to get the column list. The indices are stored in the dictionary key “indices” using the method build_key_dic. The regular expression m_pars is used for finding and replacing all the commas with the | symbol within the round brackets present in the columns list. At the column list is also appended a comma as required by the regepx used in build_column_dic. The build_column_dic method is then executed and the return value is stored in the dictionary key “columns”

  • sql_create – The sql string with the CREATE TABLE statement
  • table_name – The table name

table_dic the table dictionary tokenised from the CREATE TABLE

Return type:



The method cleans and parses the sql string A regular expression replaces all the default value definitions with a space. Then the statements are split in a list using the statement separator;

For each statement a set of regular expressions remove the comments, single and multi line. Parenthesis are surrounded by spaces and commas are rewritten in order to get at least one space after the comma. The statement is then put on a single line and stripped.


The match which is successful determines the parsing of the rest of the statement. Each parse builds a dictionary with at least two keys. Name and Command. Each statement comes with specific keys.

When the token dictionary is complete is added to the class list tokenised

Parameters:sql_string – The sql string with the sql statements.

The method adds the ” quotes to the column names. The string is converted to a list using the split method with the comma separator. The columns are then stripped and quoted with the “”. Finally the list elements are rejoined in a string which is returned. The method is used in build_key_dic to sanitise the column names.

Parameters:cols – The columns string
Returns:The columns quoted between ”.
Return type:text

The method resets the lists to empty lists after a successful tokenisation.