fix/enh(broker): cbd memory leak in sql module and new library to trace allocations (#1071)

REFS: MON-33334
jean-christophe81 authored Jan 18, 2024
1 parent 0653c65 commit e173da3
Showing 16 changed files with 1,426 additions and 0 deletions.
6 changes: 6 additions & 0 deletions CMakeLists.txt
@@ -37,6 +37,8 @@ if(NOT CMAKE_CXX_COMPILER_ID STREQUAL "GNU" AND NOT CMAKE_CXX_COMPILER_ID
FATAL_ERROR "You can build broker with g++ or clang++. CMake will exit.")
endif()

option(WITH_MALLOC_TRACE "compile centreon-malloc-trace library." OFF)

# set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++14 -stdlib=libc++")
# set(CMAKE_CXX_COMPILER "clang++")
add_definitions("-D_GLIBCXX_USE_CXX11_ABI=1")
@@ -190,6 +192,10 @@ add_subdirectory(engine)
add_subdirectory(connectors)
add_subdirectory(ccc)

if (WITH_MALLOC_TRACE)
add_subdirectory(malloc-trace)
endif()

add_custom_target(test-broker COMMAND tests/ut_broker)
add_custom_target(test-engine COMMAND tests/ut_engine)
add_custom_target(test-clib COMMAND tests/ut_clib)
1 change: 1 addition & 0 deletions broker/core/sql/src/mysql_connection.cc
@@ -848,6 +848,7 @@ void mysql_connection::_run() {
::mysql_error(_conn)));
_state = finished;
_start_condition.notify_all();
_clear_connection();
return;
}
_last_access = std::time(nullptr);
42 changes: 42 additions & 0 deletions malloc-trace/CMakeLists.txt
@@ -0,0 +1,42 @@
#
# Copyright 2024 Centreon
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may not
# use this file except in compliance with the License. You may obtain a copy of
# the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations under
# the License.
#
# For more information : [email protected]
#

# Global options.
project("Centreon malloc trace" C CXX)

# Set directories.
set(INC_DIR "${PROJECT_SOURCE_DIR}/inc/com/centreon/malloc_trace")
set(SRC_DIR "src")

add_library(centreon-malloc-trace SHARED
"src/by_thread_trace_active.cc"
"src/malloc_trace.cc"
"src/orphan_container.cc"
"src/simply_allocator.cc"
)

target_link_libraries(centreon-malloc-trace
CONAN_PKG::fmt
)

target_include_directories(centreon-malloc-trace PRIVATE
${INC_DIR}
${CMAKE_SOURCE_DIR}/common/inc
)

target_precompile_headers(centreon-malloc-trace PRIVATE "precomp_inc/precomp.hh")
67 changes: 67 additions & 0 deletions malloc-trace/README.md
@@ -0,0 +1,67 @@
# malloc-trace

## Description

The goal of this small library is to trace every orphan malloc or free call. It overrides the weak symbols **malloc**, **realloc**, **calloc** and **free**.
Every malloc and free call is stored in an in-memory container. A malloc is removed from that container when a free with the same address is called; otherwise the free is also stored in the container.
Every minute (by default), the container content is flushed to disk:
* mallocs that have not been freed and that are older than one minute
* frees that have no corresponding malloc in the container.
To use it, set the LD_PRELOAD environment variable:
```bash
export LD_PRELOAD=malloc-trace.so
```
Then launch your executable; the traced calls are recorded in /tmp/malloc-trace.csv, with ';' as the field separator.

The columns are:
* function (malloc or free)
* thread id
* address in the process memory space
* size of the allocated memory
* timestamp in ms
* call stack, stored as a JSON array
```json
[
{
"f": "free",
"s": "",
"l": 0
},
{
"f": "__gnu_cxx::new_allocator<std::_Sp_counted_ptr_inplace<boost::asio::io_context, std::allocator<boost::asio::io_context>, (__gnu_cxx::_Lock_policy)2> >::deallocate(std::_Sp_counted_ptr_inplace<boost::asio::io_context, std::allocator<boost::asio::io_context>, (__gnu_cxx::_Lock_policy)2>*, unsigned long)",
"s": "",
"l": 0
}
]
```
* `f` field is the function name
* `s` field is the source file, if available
* `l` field is the source line
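
As an illustration of the record layout described above, here is a minimal C++ sketch that builds one output line. It is not the library's code: `frame`, `json_escape` and `to_csv_line` are hypothetical names, and writing the address in decimal and the JSON array in compact form are assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical call stack frame, mirroring the f/s/l JSON fields.
struct frame {
  std::string f;  // function name
  std::string s;  // source file, empty if unavailable
  std::size_t l;  // source line
};

// Escapes only backslash and double quote, the two characters that would
// break a JSON string made of symbol names and file paths.
std::string json_escape(const std::string& in) {
  std::string out;
  for (char c : in) {
    if (c == '"' || c == '\\')
      out += '\\';
    out += c;
  }
  return out;
}

// Builds "function;thread id;address;size;timestamp;call stack".
std::string to_csv_line(const std::string& funct,
                        long thread_id,
                        std::uintptr_t address,
                        std::size_t size,
                        std::int64_t ms_timestamp,
                        const std::vector<frame>& stack) {
  std::string line = funct + ';' + std::to_string(thread_id) + ';' +
                     std::to_string(address) + ';' + std::to_string(size) +
                     ';' + std::to_string(ms_timestamp) + ";[";
  for (std::size_t i = 0; i < stack.size(); ++i) {
    if (i > 0)
      line += ',';
    line += "{\"f\":\"" + json_escape(stack[i].f) + "\",\"s\":\"" +
            json_escape(stack[i].s) + "\",\"l\":" +
            std::to_string(stack[i].l) + '}';
  }
  line += ']';
  return line;
}
```

A malloc of 16 bytes at address 4096 with a single frame `main` in `main.cc` line 42 would give a line whose last field is `[{"f":"main","s":"main.cc","l":42}]`.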

The library works as follows:
We replace malloc, realloc, calloc and free in order to trace all calls.
We store every malloc in a container. Each time free is called, if the corresponding malloc is found, it is erased from the container;
otherwise the orphan free is stored in the container.
Every 60s (by default), we flush the container: all mallocs older than 60s and not freed are dumped to disk, and all orphan frees are dumped as well.
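
The bookkeeping described above can be sketched in C++ as follows. This is only an illustration under assumptions: `record` and `orphan_tracker` are hypothetical names, not the classes in `orphan_container.cc`, and the sketch uses standard containers, which allocate and therefore could not be used as-is inside a real malloc hook. Removing a malloc from the container once it has been dumped is inferred from the description of `remove_malloc_free.py`.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using ms = std::chrono::milliseconds;

// Hypothetical record type; the real library's layout may differ.
struct record {
  std::string funct;       // "malloc" or "free"
  std::uintptr_t address;  // address in the process memory space
  std::size_t size;        // 0 for a free
  ms timestamp;            // time of the call
};

// Hypothetical sketch of the bookkeeping, not the library's container.
class orphan_tracker {
  std::map<std::uintptr_t, record> _pending_mallocs;
  std::vector<record> _orphan_frees;
  ms _peremption;

 public:
  explicit orphan_tracker(ms peremption) : _peremption(peremption) {}

  void on_malloc(std::uintptr_t address, std::size_t size, ms now) {
    _pending_mallocs[address] = record{"malloc", address, size, now};
  }

  void on_free(std::uintptr_t address, ms now) {
    // erase() returns the number of removed elements:
    // 0 means no matching malloc is known, so the free is an orphan.
    if (_pending_mallocs.erase(address) == 0)
      _orphan_frees.push_back(record{"free", address, 0, now});
  }

  // Returns what a flush would write: mallocs at least as old as the
  // peremption delay, followed by every orphan free. Returned entries
  // are removed from the tracker.
  std::vector<record> flush(ms now) {
    std::vector<record> out;
    for (auto it = _pending_mallocs.begin(); it != _pending_mallocs.end();) {
      if (now - it->second.timestamp >= _peremption) {
        out.push_back(it->second);
        it = _pending_mallocs.erase(it);
      } else {
        ++it;
      }
    }
    out.insert(out.end(), _orphan_frees.begin(), _orphan_frees.end());
    _orphan_frees.clear();
    return out;
  }
};
```

With a 60s delay, a malloc freed within the minute never reaches the output, while a malloc freed after it has been dumped produces two lines: the malloc at the first flush and an orphan free at a later one.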

The output file may be moved while the process is running; in that case it is automatically recreated.

## Environment variables
Some parameters of the library can be overridden with environment variables.
| Environment variable | default value | description |
| ------------------------ | --------------------- | ----------------------------------------------------------------------------- |
| out_file_path | /tmp/malloc-trace.csv | path of the output file |
| out_file_max_size | 0x100000000 | when the output file size exceeds this limit, the output file is truncated |
| malloc_second_peremption | one minute | delay between two flushes and delay after which malloc is considered orphaned |
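
As a rough illustration of how such overrides can be read, the C++ sketch below loads the three variables with the defaults from the table. `trace_config`, `parse_u64` and `load_config` are hypothetical names, not the library's API. Two points are assumptions: that `malloc_second_peremption` is expressed in seconds, and that numeric values may be given in decimal or `0x`-prefixed hexadecimal.

```cpp
#include <cerrno>
#include <chrono>
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical configuration holder initialised with the documented defaults.
struct trace_config {
  std::string out_file_path = "/tmp/malloc-trace.csv";
  std::uint64_t out_file_max_size = 0x100000000ULL;
  std::chrono::seconds malloc_second_peremption{60};  // assumed unit: seconds
};

// Parses an unsigned integer (decimal, or hexadecimal with a 0x prefix).
// Returns fallback when the text is missing, negative or malformed.
std::uint64_t parse_u64(const char* text, std::uint64_t fallback) {
  if (!text || !*text || *text == '-')
    return fallback;
  char* end = nullptr;
  errno = 0;
  unsigned long long value = std::strtoull(text, &end, 0);
  if (errno != 0 || end == text || *end != '\0')
    return fallback;
  return value;
}

// Reads the environment once and keeps the default for every unset variable.
trace_config load_config() {
  trace_config cfg;
  const char* path = std::getenv("out_file_path");
  if (path && *path)
    cfg.out_file_path = path;
  cfg.out_file_max_size =
      parse_u64(std::getenv("out_file_max_size"), cfg.out_file_max_size);
  cfg.malloc_second_peremption = std::chrono::seconds(
      static_cast<std::chrono::seconds::rep>(parse_u64(
          std::getenv("malloc_second_peremption"), 60)));
  return cfg;
}
```

Under these assumptions, running a program with `malloc_second_peremption=120` in its environment would double both the flush period and the age after which a malloc is considered orphaned.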

## Provided scripts

### create_malloc_trace_table.sql
This script creates a table that can store the content of an output file.
The comments in this script explain how to load the output csv file into that table.

### remove_malloc_free.py
The output file contains mallocs that are not freed within the next minute, as well as orphan frees.
So if a malloc is dumped and its corresponding free happens two minutes later, both are stored in the output file.
The purpose of this script is to remove these malloc-free pairs.
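
The pairing idea can be sketched as follows. The provided script is written in Python and its exact algorithm is not shown here, so this C++ version is only one plausible approach: `entry` and `remove_pairs` are hypothetical names, and matching a free with the most recent earlier unmatched malloc at the same address is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical reduced view of one output line.
struct entry {
  std::string funct;          // "malloc" or "free"
  std::uintptr_t address;     // address in the process memory space
  std::int64_t ms_timestamp;  // timestamp in ms
};

// Returns the entries sorted by timestamp, without the malloc-free pairs.
// A free only cancels a malloc at the same address that happened before it.
std::vector<entry> remove_pairs(std::vector<entry> entries) {
  std::stable_sort(entries.begin(), entries.end(),
                   [](const entry& a, const entry& b) {
                     return a.ms_timestamp < b.ms_timestamp;
                   });
  // For each address, indexes of the mallocs that are still unmatched.
  std::unordered_map<std::uintptr_t, std::vector<std::size_t>> open;
  std::vector<bool> drop(entries.size(), false);
  for (std::size_t i = 0; i < entries.size(); ++i) {
    if (entries[i].funct == "malloc") {
      open[entries[i].address].push_back(i);
    } else if (entries[i].funct == "free") {
      auto found = open.find(entries[i].address);
      if (found != open.end() && !found->second.empty()) {
        drop[found->second.back()] = true;  // the matched malloc
        drop[i] = true;                     // the free itself
        found->second.pop_back();
      }
    }
  }
  std::vector<entry> kept;
  for (std::size_t i = 0; i < entries.size(); ++i)
    if (!drop[i])
      kept.push_back(entries[i]);
  return kept;
}
```

What remains after this filtering is the interesting part of the trace: mallocs that were never freed and frees with no known malloc.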
32 changes: 32 additions & 0 deletions malloc-trace/create_malloc_trace_table.sql
@@ -0,0 +1,32 @@
-- Copyright 2024 Centreon
--
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
--
-- http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.
--
-- For more information : [email protected]
--


-- this SQL code creates a table into which you can load data from /tmp/malloc-trace.csv
-- first connect to the database with this command line:
-- mysql -h 127.0.0.1 --local-infile=1 -D centreon_storage -u centreon -pcentreon
-- then load the data with:
-- load data local infile '/tmp/malloc-trace.csv' into table malloc_trace fields terminated by ';';

CREATE TABLE `centreon_storage`.`malloc_trace` (
`funct_name` VARCHAR(10) NOT NULL,
`thread_id` INT UNSIGNED NULL,
`address` BIGINT UNSIGNED NULL,
`size` INT UNSIGNED NULL,
`ms_timestamp` BIGINT UNSIGNED NULL,
`call_stack` TEXT(65535) NULL,
FULLTEXT INDEX `call_stack_ind` (`call_stack`) VISIBLE);
@@ -0,0 +1,99 @@
/**
* Copyright 2024 Centreon
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* For more information : [email protected]
*/

#ifndef CMT_BY_THREAD_TRACE_ACTIVE_HH
#define CMT_BY_THREAD_TRACE_ACTIVE_HH

#include "intrusive_map.hh"

namespace com::centreon::malloc_trace {

/**
* @brief This class stores the tracing state of a thread
* The problem: when malloc is called, we explore the call stack, and that
* exploration may itself call malloc, which risks an infinite loop.
* So the first malloc sets _malloc_trace_active and explores the call stack.
* A nested malloc called by the stacktrace code tries to set
* _malloc_trace_active and, as it is already set, does not explore the call stack
*
*/
class thread_trace_active : public boost::intrusive::set_base_hook<> {
pid_t _thread_id;
mutable bool _malloc_trace_active = false;

public:
thread_trace_active() {}
thread_trace_active(pid_t thread_id) : _thread_id(thread_id) {}

pid_t get_thread_id() const { return _thread_id; }

/**
* @brief Set _malloc_trace_active
*
* @return old value of _malloc_trace_active
*/
bool set_malloc_trace_active() const {
if (_malloc_trace_active) {
return true;
}
_malloc_trace_active = true;
return false;
}

/**
* @brief reset _malloc_trace_active
*
* @return old value of _malloc_trace_active
*/
bool reset_malloc_trace_active() const {
if (!_malloc_trace_active) {
return false;
}
_malloc_trace_active = false;
return true;
}

bool is_malloc_trace_active() const { return _malloc_trace_active; }

struct key_extractor {
using type = pid_t;
type operator()(const thread_trace_active& node) const {
return node._thread_id;
}
};
};

/**
* @brief container of thread_trace_active with zero allocation
* the drawback is that at most 4096 thread trace states can be stored
*
*/
class thread_dump_active
: protected intrusive_map<thread_trace_active,
thread_trace_active::key_extractor,
4096> {
std::mutex _protect;

public:
bool set_dump_active(pid_t thread_id);
void reset_dump_active(pid_t thread_id);
};

} // namespace com::centreon::malloc_trace

#endif
50 changes: 50 additions & 0 deletions malloc-trace/inc/com/centreon/malloc_trace/funct_info_cache.hh
@@ -0,0 +1,50 @@
/**
* Copyright 2024 Centreon
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* For more information : [email protected]
*/
#ifndef CMT_FUNCT_INFO_CACHE_HH
#define CMT_FUNCT_INFO_CACHE_HH

namespace com::centreon::malloc_trace {
/**
* @brief symbol information lookup is very expensive,
* so we store function information in a cache
*
*/
class funct_info {
const std::string _funct_name;
const std::string _source_file;
const size_t _source_line;

public:
funct_info(std::string&& funct_name,
std::string&& source_file,
size_t source_line)
: _funct_name(std::move(funct_name)),
_source_file(std::move(source_file)),
_source_line(source_line) {}

const std::string& get_funct_name() const { return _funct_name; }
const std::string& get_source_file() const { return _source_file; }
size_t get_source_line() const { return _source_line; }
};

using funct_cache_map =
std::map<boost::stacktrace::frame::native_frame_ptr_t, funct_info>;

} // namespace com::centreon::malloc_trace

#endif

6 comments on commit e173da3

@github-actions

Robot Results

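| ✅ Passed | ❌ Failed | ⏭️ Skipped | Total | Pass % | ⏱️ Duration |
| --- | --- | --- | --- | --- | --- |
| 6 | 1 | 0 | 7 | 85.71 | 0s |

Failed Tests

| Name | Message | ⏱️ Duration | Suite |
| --- | --- | --- | --- |
| EBNHGU4_BBDO3 | hostgroup_1 not found in /tmp/lua-engine.log | 0.000 s | Hostgroups |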

@github-actions

Robot Results

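| ✅ Passed | ❌ Failed | ⏭️ Skipped | Total | Pass % | ⏱️ Duration |
| --- | --- | --- | --- | --- | --- |
| 19 | 1 | 0 | 20 | 95.00 | 0s |

Failed Tests

| Name | Message | ⏱️ Duration | Suite |
| --- | --- | --- | --- |
| BDBU1 | Central Broker not correctly stopped (coredump generated) | 0.000 s | Sql |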

@github-actions

Robot Results

| ✅ Passed | ❌ Failed | ⏭️ Skipped | Total | Pass % | ⏱️ Duration |
| --- | --- | --- | --- | --- | --- |
| 9 | 3 | 0 | 12 | 75.00 | 0s |

Failed Tests

| Name | Message | ⏱️ Duration | Suite |
| --- | --- | --- | --- |
| BENCH_1000STATUS_100ENGINE | AttributeError: 'NoneType' object has no attribute 'query_read_bytes' | 0.000 s | Bench |
| BRCTS1 | There should not exist queue map files. | 0.000 s | Reverse-Connection |
| BRCS1 | There should not exist queue map files. | 0.000 s | Reverse-Connection |

@github-actions

Robot Results

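| ✅ Passed | ❌ Failed | ⏭️ Skipped | Total | Pass % | ⏱️ Duration |
| --- | --- | --- | --- | --- | --- |
| 4 | 1 | 0 | 5 | 80.00 | 0s |

Failed Tests

| Name | Message | ⏱️ Duration | Suite |
| --- | --- | --- | --- |
| BABOOORREL | The 'boolean-ba' BA is not OK as expected | 0.000 s | Boolean Rules |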

@github-actions

Robot Results

| ✅ Passed | ❌ Failed | ⏭️ Skipped | Total | Pass % | ⏱️ Duration |
| --- | --- | --- | --- | --- | --- |
| 3 | 2 | 0 | 5 | 60.00 | 0s |

Failed Tests

| Name | Message | ⏱️ Duration | Suite |
| --- | --- | --- | --- |
| BRCTS1 | There should not exist queue map files. | 0.000 s | Reverse-Connection |
| BRCS1 | There should not exist queue map files. | 0.000 s | Reverse-Connection |

@github-actions

Robot Results

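| ✅ Passed | ❌ Failed | ⏭️ Skipped | Total | Pass % | ⏱️ Duration |
| --- | --- | --- | --- | --- | --- |
| 19 | 1 | 0 | 20 | 95.00 | 0s |

Failed Tests

| Name | Message | ⏱️ Duration | Suite |
| --- | --- | --- | --- |
| BDBU1 | Central Broker not correctly stopped (coredump generated) | 0.000 s | Sql |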
