Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

store counter of restarts in NVS (Non-Volatile Storage) #779

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

llange
Copy link
Contributor

@llange llange commented Nov 20, 2022

In preparation of #748, this counter will serve as part of an "unique" token in the log file name ; ensuring that after a reboot we will not overwrite an existing log file.

The counter is stored in NVS, and is incremented at each class instanciation; which should roughly translate to every reboot.
(Don't know about the deepsleep for the moment).

The counter will reset to 1 after 99.999.999 restarts, which should gives us plenty of time.

Signed-off-by: Ludovic LANGE [email protected]

@llange llange marked this pull request as ready for review November 20, 2022 20:51
@dexterbg
Copy link
Member

Ludovic,

can you please elaborate on your reasons to choose the esp-idf NVS system over our config store NVS system? While we did reserve that partition (I think it's used by the Wifi blob?), I didn't see any advantage in actually using this up to now.

I once though it could later be useful for values needed early in the boot process. You currently place it right before the actual config store init – is there a reason this needs to be done that late? Btw, is there also a reason why Init and Start cannot or should not be done by the constructor?

Thanks,
Michael

@llange
Copy link
Contributor Author

llange commented Nov 21, 2022

Regarding the choice NVS vs config, I did not give it a complete thought I'm afraid. It may be a (wrong ?) gut feeling that NVS would be more low-level, and inaccessible (protected from modifications) from the user. A kind of fire and forget.
Moreover, I thought this counter was somehow "tied" to the hardware, whereas a config (through a backup / restore) would sometime cause the restore of an old counter value which (IMHO) is not the right thing to do here.
(Also, when restoring in another module - I have 2 modules, and I sometimes backup from one and restore on the other.)

Indeed it's used mainly for the Wifi blob at the moment - when I remove the initialisation, Wifi fails with error.

I tried to make it initialized in the constructor, but it did crash or fail (don't remember exactly the details), and so I thought it had to be there for a reason and that I should not mess with it.

If you're OK for continuing to use the NVS, I'll try to give another go at the initialisation sequence in order to have it as early as possible.

Let me know.

@dexterbg
Copy link
Member

I'm fine with using the NVS. Having this counter outside the config store to exclude it from user access & backup/restore is a valid reason. Thinking about this, we also still see occasional store partition corruptions, which maybe do not occur on the NVS without the FAT FS layer.

It would be nice to have this early in the boot process, but there currently is no need for this, so don't put too much effort in it if it doesn't work easily. Getting issues from doing the init in the constructor may mean it's designed to be used in FreeRTOS mode only.

@llange
Copy link
Contributor Author

llange commented Nov 21, 2022

OK so when I move the init in the constructor, it aborts with a mutex-related assert :

I (699) nvs: Initialising NVS (10000)
/Users/llange/Projects/CANBUS/OVMS/esp-idf/components/freertos/queue.c:621 (xQueueTakeMutexRecursive)- assert failed!
abort() was called at PC 0x4008f9cb on core 0

ELF file SHA256: 9dd926330d31ae11

Backtrace: 0x4008ddc6:0x3ffe39c0 0x4008e061:0x3ffe39e0 0x4008f9cb:0x3ffe3a00 0x4018e271:0x3ffe3a20 0x40084e36:0x3ffe3a40 0x400857d0:0x3ffe3a60 0x40085a31:0x3ffe3ab0 0x4018e393:0x3ffe3ae0 0x4018e4b4:0x3ffe3b20 0x401481b9:0x3ffe3b40 0x401481e2:0x3ffe3b70 0x4
0122b47:0x3ffe3b90 0x40122dce:0x3ffe3bb0 0x4008167a:0x3ffe3bd0 0x400819bc:0x3ffe3c00 0x40078b02:0x3ffe3c40 0x40078bb5:0x3ffe3c70 0x40078d4d:0x3ffe3cb0 0x40078e96:0x3ffe3e70 0x40007c15:0x3ffe3eb0 0x4000073d:0x3ffe3f20


[OVMS] Current tasks: OVMS CanRx|OVMS Events
Rebooting...

Stack trace (long)

(gdb) list * 0x4008f9cb
0x4008f9cb is in xQueueTakeMutexRecursive (esp-idf/components/freertos/queue.c:621).
616   BaseType_t xQueueTakeMutexRecursive( QueueHandle_t xMutex, TickType_t xTicksToWait )
617   {
618   BaseType_t xReturn;
619   Queue_t * const pxMutex = ( Queue_t * ) xMutex;
620 
621     configASSERT( pxMutex );
622 
623     /* Comments regarding mutual exclusion as per those within
624     xQueueGiveMutexRecursive(). */
625 
(gdb) list *0x4008ddc6
0x4008ddc6 is in invoke_abort (esp-idf/components/esp32/panic.c:156).
151 #endif
152    while (1) {
153        if (esp_cpu_in_ocd_debug_mode()) {
154            __asm__ ("break 0,0");
155        }
156        *((int *) 0) = 0;
157    }
158 }
159 
160 void abort()
(gdb) list * 0x4008e061
0x4008e061 is in abort (esp-idf/components/esp32/panic.c:171).
166     * don't overwrite that.
167     */
168    if (esp_reset_reason_get_hint() == ESP_RST_UNKNOWN) {
169        esp_reset_reason_set_hint(ESP_RST_PANIC);
170    }
171    invoke_abort();
172 }
173 
174 
175 static const char *edesc[] = {
(gdb) list * 0x4008f9cb
0x4008f9cb is in xQueueTakeMutexRecursive (esp-idf/components/freertos/queue.c:621).
616   BaseType_t xQueueTakeMutexRecursive( QueueHandle_t xMutex, TickType_t xTicksToWait )
617   {
618   BaseType_t xReturn;
619   Queue_t * const pxMutex = ( Queue_t * ) xMutex;
620 
621     configASSERT( pxMutex );
622 
623     /* Comments regarding mutual exclusion as per those within
624     xQueueGiveMutexRecursive(). */
625 
(gdb) list * 0x4018e271
0x4018e271 is in spi_flash_op_lock (esp-idf/components/spi_flash/cache_utils.c:56).
51     assert(s_flash_op_mutex != NULL);
52  }
53  
54  void spi_flash_op_lock()
55  {
56     xSemaphoreTakeRecursive(s_flash_op_mutex, portMAX_DELAY);
57  }
58  
59  void spi_flash_op_unlock()
60  {
(gdb) list * 0x40084e36
0x40084e36 is in spi_flash_disable_interrupts_caches_and_other_cpu (esp-idf/components/spi_flash/cache_utils.c:95).
90     xTaskResumeAll();
91  }
92  
93  void IRAM_ATTR spi_flash_disable_interrupts_caches_and_other_cpu()
94  {
95     spi_flash_op_lock();
96  
97     const uint32_t cpuid = xPortGetCoreID();
98     const uint32_t other_cpuid = (cpuid == 0) ? 1 : 0;
99  #ifndef NDEBUG
(gdb) list * 0x400857d0
0x400857d0 is in spi_flash_mmap_pages (esp-idf/components/spi_flash/flash_mmap.c:151).
146    mmap_entry_t* new_entry = (mmap_entry_t*) heap_caps_malloc(sizeof(mmap_entry_t), MALLOC_CAP_INTERNAL|MALLOC_CAP_8BIT);
147    if (new_entry == 0) {
148        return ESP_ERR_NO_MEM;
149    }
150 
151    spi_flash_disable_interrupts_caches_and_other_cpu();
152 
153    spi_flash_mmap_init();
154    // figure out the memory region where we should look for pages
155    int region_begin;   // first page to check
(gdb) list * 0x40085a31
0x40085a31 is in spi_flash_mmap (esp-idf/components/spi_flash/flash_mmap.c:125).
120        return ESP_ERR_NO_MEM;
121    }
122    for (int i = 0; i < page_count; i++) {
123        pages[i] = phys_page+i;
124    }
125    ret = spi_flash_mmap_pages(pages, page_count, memory, out_ptr, out_handle);
126    free(pages);
127    return ret;
128 }
129 
(gdb) list * 0x4018e393
0x4018e393 is in esp_partition_find (esp-idf/components/spi_flash/partition.c:150).
145 static esp_err_t load_partitions()
146 {
147    const uint32_t* ptr;
148    spi_flash_mmap_handle_t handle;
149    // map 64kB block where partition table is located
150    esp_err_t err = spi_flash_mmap(ESP_PARTITION_TABLE_OFFSET & 0xffff0000,
151            SPI_FLASH_SEC_SIZE, SPI_FLASH_MMAP_DATA, (const void**) &ptr, &handle);
152    if (err != ESP_OK) {
153        return err;
154    }
(gdb) list * 0x4018e4b4
0x4018e4b4 is in esp_partition_find_first (esp-idf/components/spi_flash/partition.c:121).
116 }
117 
118 const esp_partition_t* esp_partition_find_first(esp_partition_type_t type,
119        esp_partition_subtype_t subtype, const char* label)
120 {
121    esp_partition_iterator_t it = esp_partition_find(type, subtype, label);
122    if (it == NULL) {
123        return NULL;
124    }
125    const esp_partition_t* res = esp_partition_get(it);
(gdb) list * 0x401481b9
0x401481b9 is in nvs_flash_init_partition(char const*) (esp-idf/components/nvs_flash/src/nvs_api.cpp:160).
155    if (mStorage) {
156        return ESP_OK;
157    }
158 
159    const esp_partition_t* partition = esp_partition_find_first(
160            ESP_PARTITION_TYPE_DATA, ESP_PARTITION_SUBTYPE_DATA_NVS, part_name);
161    if (partition == NULL) {
162        return ESP_ERR_NOT_FOUND;
163    }
164 
(gdb) list * 0x401481e2
0x401481e2 is in nvs_flash_init() (esp-idf/components/nvs_flash/src/nvs_api.cpp:171).
166            partition->size / SPI_FLASH_SEC_SIZE);
167 }
168 
169 extern "C" esp_err_t nvs_flash_init(void)
170 {
171    return nvs_flash_init_partition(NVS_DEFAULT_PART_NAME);
172 }
173 
174 #ifdef CONFIG_NVS_ENCRYPTION
175 extern "C" esp_err_t nvs_flash_secure_init_partition(const char *part_name, nvs_sec_cfg_t* cfg)
(gdb) list * 0x40122b47
0x40122b47 is in OvmsNonVolatileStorage::OvmsNonVolatileStorage() (OVMS-llange/vehicle/OVMS.V3/main/ovms_nvs.cpp:40).
35  
36  OvmsNonVolatileStorage MyNonVolatileStorage __attribute__ ((init_priority (10000)));
37  
38  OvmsNonVolatileStorage::OvmsNonVolatileStorage() : m_restart_counter(0){
39   ESP_LOGI(TAG, "Initialising NVS (10000)");
40   esp_err_t err = nvs_flash_init();
41   ESP_LOGI(TAG, "Initialising NVS (10000) %d", __LINE__);
42   if (err == ESP_ERR_NVS_NO_FREE_PAGES || err == ESP_ERR_NVS_NEW_VERSION_FOUND) {
43   ESP_LOGI(TAG, "Initialising NVS (10000) %d", __LINE__);
44     nvs_flash_erase();

It may mean that during initialisation, a few things are missing, or another part is lock the mutex ?
I'll need some time to understand what's going on.

@markwj
Copy link
Member

markwj commented Nov 22, 2022

I have a question on why this is required? Is it not possible / easier to simply look at the files on SD CARD and set the next counter as the highest numbered log file + 1? Apart from the issue of trying to keep two separate things (the log files on SD card and the counter) synchronised (especially when the SD CARD can be easily changed), we also have to be aware of the limited number of write cycles on the on-board flash (and try to minimise writing/logging to that wherever possible).

@llange
Copy link
Contributor Author

llange commented Nov 22, 2022

Hi @markwj,
It's not required at all, just a nice-to-have feature (or so I thought).
I thought about scanning the sdcard, and I thought it could be too much tied with the path the user choose.
But you're right, minimising writing / logging is an important goal to keep in mind.

Let me try to work on the "find the highest numbered log file" + 1 thing, and I'll keep this discussion updated with a comparison between the two approaches.

@llange llange marked this pull request as draft November 22, 2022 07:00
@dexterbg
Copy link
Member

It may mean that during initialisation, a few things are missing, or another part is lock the mutex ?

It means the esp-idf NVS needs FreeRTOS running, so there will be no way to use this early in the boot process.

Is it not possible / easier to simply look at the files on SD CARD and set the next counter as the highest numbered log file + 1?

Let me try to work on the "find the highest numbered log file" + 1 thing

…or maybe as I suggested earlier, apply the log task scheme of writing to a specific file name and renaming that on cycling?

@llange llange force-pushed the keep-track-of-restart-counter branch from 05cb95b to 13b1944 Compare November 23, 2022 22:21
@llange
Copy link
Contributor Author

llange commented Nov 23, 2022

Is it not possible / easier to simply look at the files on SD CARD and set the next counter as the highest numbered log file + 1?

Let me try to work on the "find the highest numbered log file" + 1 thing

…or maybe as I suggested earlier, apply the log task scheme of writing to a specific file name and renaming that on cycling?

Regarding post-renaming vs creating the log file with the proper name right from the start, I think I prefer having the name right. Ultimately I think this is not much of an issue, but I'm always trying to imagine worst-case scenarii such as a sudden power loss during can log.
I may be wrong, but I think that having the file with the proper name could be a win in that scenario (of course, it will somehow be corrupted but I think we could extract part of it) - while for the other approach, I fear that there is a risk to overwrite the file at next boot.
What do you think ?

Regarding the naming - you rename the file with a timestamp, which is the right thing to do. However in my case I fear that for the first few files my module would still live in the past - it does timestamp my files in the 70s and the 80s at the moment (Network disabled, GSM disabled, GPS slow slow to receive time). So I prefer to offer the option of an incrementing counter (with appended timestamp):

     156  01-Jan-1980 01:01  /sd/log/ovms-vehicle/00000345/essai/00000001.crtd
    7.5k  20-Nov-2022 23:50  /sd/log/ovms-vehicle/00000345/essai/00000002.crtd
  127.9k  21-Nov-2022 00:22  /sd/log/ovms-vehicle/00000345/essai/00000003-20221120-235058.crtd
       0  30-Nov-1979 00:00  /sd/log/ovms-vehicle/00000345/essai/00000004-20221121-002246.crtd
       0  30-Nov-1979 00:00  /sd/log/ovms-vehicle/00000346/essai/00000001-19700101-010008.crtd
       0  30-Nov-1979 00:00  /sd/log/ovms-vehicle/00000347/essai/00000001-19700101-010009.crtd
       0  30-Nov-1979 00:00  /sd/log/ovms-vehicle/00000349/essai/00000001-19700101-010008.crtd
       0  30-Nov-1979 00:00  /sd/log/ovms-vehicle/00000350/essai/00000001-19700101-010008.crtd
       0  30-Nov-1979 00:00  /sd/log/ovms-vehicle/00000351/essai/00000001-19700101-010008.crtd
    5.6k  21-Nov-2022 01:23  /sd/log/ovms-vehicle/00000353/essai/00000001-19700101-010008.crtd
       0  30-Nov-1979 00:00  /sd/log/ovms-vehicle/00000353/essai/00000002-20221121-012311.crtd
       0  30-Nov-1979 00:00  /sd/log/ovms-vehicle/00000354/essai/00000001-19700101-010008.crtd
    7.2k  21-Nov-2022 01:57  /sd/log/ovms-vehicle/00000355/essai/00000001-19700101-010008.crtd
       0  30-Nov-1979 00:00  /sd/log/ovms-vehicle/00000357/essai/00000001-19700101-010008.crtd
       0  30-Nov-1979 00:00  /sd/log/ovms-vehicle/00000360/essai/00000001-19700101-010008.crtd
       0  30-Nov-1979 00:00  /sd/log/ovms-vehicle/00000361/essai/00000001-19700101-010008.crtd
       0  30-Nov-1979 00:00  /sd/log/ovms-vehicle/00000362/essai/00000001-19700101-010009.crtd
   22.1k  22-Nov-2022 01:11  /sd/log/ovms-vehicle/00000363/essai/00000001-19700101-010008.crtd

(first counter is number of restarts; second is number of cyclings for this "file")
It's still sortable, and gives a timestamp indication when there's one.

@llange llange force-pushed the keep-track-of-restart-counter branch from 13b1944 to d57d003 Compare December 2, 2022 22:05
@llange llange force-pushed the keep-track-of-restart-counter branch from d57d003 to 6c397f8 Compare April 13, 2023 19:58
@llange llange force-pushed the keep-track-of-restart-counter branch from 6c397f8 to 0d63028 Compare June 7, 2023 07:07
@llange llange force-pushed the keep-track-of-restart-counter branch from 0d63028 to 0482f8e Compare June 22, 2023 21:05
@llange llange force-pushed the keep-track-of-restart-counter branch 2 times, most recently from 87f42bf to fe88aeb Compare January 16, 2024 23:19
@llange llange force-pushed the keep-track-of-restart-counter branch from fe88aeb to 4ce56ac Compare April 5, 2024 22:15
In preparation of openvehicles#748, this counter will serve as part of an "unique"
token in the log file name ; ensuring that after a reboot we will
not overwrite an existing log file.

The counter is stored in NVS, and is incremented at each class instanciation;
which should roughly translate to every reboot.
(Don't know about the deepsleep for the moment).

The counter will reset to 1 after 99.999.999 restarts, which should gives us
plenty of time.

Signed-off-by: Ludovic LANGE <[email protected]>
@llange llange force-pushed the keep-track-of-restart-counter branch from 4ce56ac to 66d04ed Compare April 5, 2024 22:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants