Incorrect Precision and Scale Parsing Due to Column Comment Interference #866

Open
MrDingz opened this issue Oct 11, 2024 · 0 comments · May be fixed by #868
Labels
dev-complete Development completed
MrDingz commented Oct 11, 2024

Description

In MySqlDDLParserListenerImpl.java, at line 339, precision and scale are parsed whenever parsedDataType contains (, ), and , anywhere in the string, not only inside the datatype itself:

else if(parsedDataType.contains("(") && parsedDataType.contains(")") && parsedDataType.contains(",") ) {
    try {
        precision = Integer.parseInt(parsedDataType.substring(parsedDataType.indexOf("(") + 1, parsedDataType.indexOf(",")));
        scale = Integer.parseInt(parsedDataType.substring(parsedDataType.indexOf(",") + 1, parsedDataType.indexOf(")")));
    } catch(Exception e) {
        log.error("Error parsing precision, scale : columnName" + columnName);
    }
}

The issue arises because parsedDataType is derived from colDefTree.getText(), which includes column comments. If the comment contains characters like (, ), or ,, it incorrectly triggers the precision and scale parsing logic, even though the actual column definition does not contain these characters.

Example Scenario

Consider a column definition like the following:

`col1` varchar(45) NOT NULL COMMENT 'a column, test'

Here, parsedDataType contains the comment as well as the datatype:

"varchar(45)NOTNULLCOMMENT'a column, test'"

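The , inside the comment makes the condition match even though varchar(45) has no scale. The substring between the first ( and the first , is then 45)NOTNULLCOMMENT'a column, which is not an integer, so Integer.parseInt throws a NumberFormatException.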
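The failure can be reproduced outside the connector with a minimal sketch. The class and method names below (CommentInterferenceDemo, matchesPrecisionScaleBranch, parsePrecision) are hypothetical and only mirror the substring logic quoted above; they are not part of the connector's code.

```java
public class CommentInterferenceDemo {

    // Same condition as the else-if branch quoted in this issue.
    static boolean matchesPrecisionScaleBranch(String parsedDataType) {
        return parsedDataType.contains("(")
                && parsedDataType.contains(")")
                && parsedDataType.contains(",");
    }

    // Same substring arithmetic as the precision line quoted in this issue.
    static int parsePrecision(String parsedDataType) {
        return Integer.parseInt(parsedDataType.substring(
                parsedDataType.indexOf("(") + 1,
                parsedDataType.indexOf(",")));
    }

    public static void main(String[] args) {
        // Text as shown in the example scenario: datatype plus comment.
        String text = "varchar(45)NOTNULLCOMMENT'a column, test'";

        // The comma comes from the comment, yet the branch is taken.
        System.out.println("branch taken: " + matchesPrecisionScaleBranch(text));

        try {
            parsePrecision(text);
        } catch (NumberFormatException e) {
            // In the connector this is caught and logged as an error.
            System.out.println("parse failed: " + e.getMessage());
        }
    }
}
```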

Suggested Solution

To prevent this issue, the logic should be adjusted to ensure that only the actual datatype segment is used for precision and scale determination, excluding any comments. This can be achieved by extracting only the datatype portion of parsedDataType without comments.

Proposed Code Change

One way to handle this is to sanitize parsedDataType before checking for (, ), and ,. Here is a suggested approach:

String sanitizedDataType = parsedDataType.split("COMMENT")[0].trim();
if(sanitizedDataType.contains("(") && sanitizedDataType.contains(")") && sanitizedDataType.contains(",")) {
    try {
        precision = Integer.parseInt(sanitizedDataType.substring(sanitizedDataType.indexOf("(") + 1, sanitizedDataType.indexOf(",")));
        scale = Integer.parseInt(sanitizedDataType.substring(sanitizedDataType.indexOf(",") + 1, sanitizedDataType.indexOf(")")));
    } catch(Exception e) {
        log.error("Error parsing precision, scale : columnName" + columnName);
    }
}

This ensures that only the datatype itself is analyzed for precision and scale, avoiding issues caused by comments.
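One caveat with splitting on "COMMENT": MySQL keywords are case-insensitive, so a lowercase comment clause would not be split off, and a DEFAULT value containing a , would still trip the check. As an alternative sketch, not a tested patch, the datatype arguments could be read only from the start of the text with an anchored regular expression. The class name PrecisionScaleSketch and the method precisionAndScale are hypothetical, and the sketch assumes the column definition text begins with the datatype name.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrecisionScaleSketch {

    // Matches a leading type name followed by "(precision,scale)".
    // Anchored at the start, so commas or parentheses later in the text
    // (comments, default values) are never inspected.
    // Each number is limited to 9 digits so Integer.parseInt cannot overflow.
    private static final Pattern LEADING_TYPE_ARGS = Pattern.compile(
            "^\\s*[A-Za-z_]\\w*\\s*\\(\\s*(\\d{1,9})\\s*,\\s*(\\d{1,9})\\s*\\)");

    // Returns {precision, scale}, or empty when the datatype has no such pair.
    static Optional<int[]> precisionAndScale(String columnDefinitionText) {
        Matcher m = LEADING_TYPE_ARGS.matcher(columnDefinitionText);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new int[] {
                Integer.parseInt(m.group(1)),
                Integer.parseInt(m.group(2))
        });
    }

    public static void main(String[] args) {
        String withScale = "decimal(10,2)NOTNULLcomment'price (USD), net'";
        String withoutScale = "varchar(45)NOTNULLCOMMENT'a column, test'";

        precisionAndScale(withScale).ifPresent(ps ->
                System.out.println("precision=" + ps[0] + ", scale=" + ps[1]));
        System.out.println("varchar has pair: "
                + precisionAndScale(withoutScale).isPresent());
    }
}
```

Returning an empty Optional for varchar(45) means the caller can skip precision and scale handling without relying on a caught exception.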

Environment

  • ClickHouse Sink Connector version: [2.3]
  • Debezium version: [3.0]

Steps to Reproduce

  1. Define a MySQL column whose datatype has a length but no scale (for example, varchar(45)) and whose comment contains a , (optionally also ( and )).
  2. Observe how the ClickHouse Sink Connector processes this column and note any parsing errors in the logs.

Expected Behavior

The connector should correctly parse the precision and scale based only on the actual datatype and ignore any characters in comments.

Actual Behavior

The connector incorrectly includes comment characters in the precision and scale parsing, potentially leading to incorrect ClickHouse datatype assignments.

Additional Information

N/A
