-
Notifications
You must be signed in to change notification settings - Fork 201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgrade embedded Postgres and fix compatibility issues. #1199
Upgrade embedded Postgres and fix compatibility issues. #1199
Conversation
@@ -264,30 +264,25 @@ public void updateProcMemoryUsage(FrameInterface f, long rss, long maxRss, | |||
"SELECT pk_frame FROM proc WHERE pk_frame=? FOR UPDATE", | |||
String.class, f.getFrameId()).equals(f.getFrameId())) { | |||
|
|||
getJdbcTemplate().update(UPDATE_PROC_MEMORY_USAGE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure this was a merge mistake. This block and the one below it seem to duplicate each other, and this block is missing one of the placeholder values.
@@ -212,6 +211,7 @@ public void testPortionSorted() throws Exception { | |||
@Test | |||
@Transactional | |||
@Rollback(true) | |||
@Ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really understand this unit test. I dug into it a bit and I don't see an obvious problem, but the assertNotEquals(jobs, sortedJobs)
check fails for me sometimes in an ARM environment. After looking at the dispatch SQL query, I don't see why jobs
and sortedJobs
should be different. The jobs are all launched with the same priority, so it's expected to get them back in the same order, right?
@DiegoTavares Any idea here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test doesn't make much sense to me. I guess the first assert with sortedJobs can be removed as it's not guaranteed both lists will be different.
getJdbcTemplate().update(UPDATE_PROC_MEMORY_USAGE, | ||
rss, maxRss, vss, maxVss, | ||
usedGpuMemory, maxUsedGpuMemory, f.getFrameId()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, this is was a weird merge issue. Not sure where it came from. Good catch.
@@ -88,7 +88,7 @@ public class DispatchQuery { | |||
"LEFT JOIN (" + | |||
"SELECT " + | |||
"limit_record.pk_limit_record, " + | |||
"SUM(layer_stat.int_running_count) AS int_sum_running " + | |||
"SUM(CAST(layer_stat.int_running_count AS numeric)) AS int_sum_running " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading postgres docs raised some concerns regarding the performance of this conversion:
https://www.postgresql.org/docs/current/datatype-numeric.html :
The type numeric can store numbers with a very large number of digits. It is especially recommended for storing monetary amounts and other quantities where exactness is required. Calculations with numeric values yield exact results where possible, e.g., addition, subtraction, multiplication. However, calculations on numeric values are very slow compared to the integer types, or to the floating-point types described in the next section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to dig a bit deeper to understand where this is coming from. Was the issue impacting every query or a specific subset?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seemed to impact every query that used a SUM
on a bigint
field. I didn't test all of these queries but once I saw it a few times I just went through and changed all of them, it seemed consistent.
According to postgres docs SUM(bigint)
always returns a numeric
, so I'm assuming something is going wrong in that conversion process somewhere.
It could be a problem specific to the embedded postgres, I don't think I was able to reproduce it anywhere else. And I can't get an older version of the embedded postgres on ARM so I wasn't able to try that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DiegoTavares Any update here? Possible to do a test on your side to see if it produces any performance issues of note?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've submitted a ticket to EDB asking about the error.
It looks there's a change this update will actually have performance implications. We can try a different version of postgres and see if this is a bug specific for this version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bcipriano I have few questions for you
- Did you installed postgres 14.5 using following link https://www.enterprisedb.com/downloads/postgres-postgresql-downloads or using some other source ?
- Did you got similar issue when you was on version 12.12 ? Are you able to test it on Postgresql12.12 on your new macbook ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bcipriano Can you confirm the source of postgres download? can you try https://www.enterprisedb.com/downloads/postgres-postgresql-downloads for download postgres ? Let me know how it goes with EDB 14.5?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing to mention -- this only seems to happen in Docker. When I run tests directly on my ARM Macbook, they pass fine, in fact this PR isn't needed at all. It's only when I try to build the Cuebot Docker image (which does the Gradle build/test as part of the build process) that this problem appears.
Did you installed postgres 14.5 using following link https://www.enterprisedb.com/downloads/postgres-postgresql-downloads or using some other source ?
Postgres is installed via Gradle/Maven. You can see the versions defined in build.gradle
: https://github.com/AcademySoftwareFoundation/OpenCue/pull/1199/files#diff-7e57ab670111c27c23069909d0cc32c88d266c9a1f3e6ca38052c973418146fe
I don't know if it's possible use Postgres binaries from some other source with this embedded Postgres thing we're using.
Did you got similar issue when you was on version 12.12 ? Are you able to test it on Postgresql12.12 on your new macbook ?
Same issue on 11.13.0 and 12.12.0. But only on Docker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a bit more research into this today:
- Tried the latest Postgres 15 binaries. Same issue.
- Tried a different base image (
almalinux
instead ofgradle
). Same issue. - Created a Postgres container from the standard
postgres
image on my mac. Was not able to reproduce the issue.
Still kind of stumped. I filed zonkyio/embedded-postgres#99 to see if those folks have any idea.
From TSC meeting: we're going to hold on this PR for now while we continue to investigate. The upstream ticket has some suggestions I'll look into next. |
Had a breakthrough on this based on the advice I got on the upstream ticket. See zonkyio/embedded-postgres#99 (comment) for details. Hopefully we can get some updated Postgres binaries released soon, after which we can probably throw out this PR and do a simple Postgres version upgrade. |
Closing stale PRs |
Link the Issue(s) this Pull Request is related to.
#1196
Summarize your change.
In order to get Cuebot working on the
arm64v8
architecture, we need to upgrade the embedded Postgres we're using. This created some compatibility issues which are fixed here as well.The main issue I ran into was a strange one. On ARM only, and only in Docker, it appears that
SUM(bigint)
does not work properly. It returns anumeric
type response as expected, but the value is completely wrong -- we were getting results like20 + 9 = 368934881474632988248
, which would cause overflows in the Cuebot code. The solution I found is toCAST
the input values asnumeric
before theSUM
happens.This only happens when building the Cuebot Docker image. When I run tests directly on my Mac i.e. through IntelliJ, the tests run fine.
There are a few other minor test fixes in here, possibly due to some differences in either the Postgres version or x86/arm differences, but I didn't see anything concerning there.
This PR contains some of the changes from #1185 to make testing easier; these will be cleaned up once that PR is merged.