Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[humble request] add support for input from flash memory (memcpy_P) #745

Open
atesin opened this issue Oct 13, 2022 · 9 comments
Open

[humble request] add support for input from flash memory (memcpy_P) #745

atesin opened this issue Oct 13, 2022 · 9 comments

Comments

@atesin
Copy link

atesin commented Oct 13, 2022

hi... as all we know, hhXash is a very small and efficient hash algoritm, what makes it perfect for embedded systems

most embedded systems with harvard architecture (like development boards, home routers, soc's, thin clients, media centers, etc.) have eeprom/flash memory storages and have the capability to read data directly from program flash memory to save ram

recently i had a problem trying to make streaming mode work in arduino uno (stripped down example)

XXH32_state_t *state = XXH32_createState();
XXH32_reset(state, 0);
XXH32_update(state, "ardu", 4);
XXH32_update(state, (PGM_P) F("ino"), 3);

the program crashes at 4th line because it doesn't know what to do with (PGM_P) F("ino") what is a pointer to a char array (c_str) stored in higher program flash memory

for these situations where data resides in program memory, avr, xtensa, gnu c++ compilers and many other architectures and sdk's have the *_P version for some functions, like strlen_P(), strcpy_P(), memcpy_P(), etc. (i.e. for arduino, for esp8266)

i tried adding this function to xxhash.h in line ~1740 and add support for XXH32_update() as a test for my particular case and worked flawlessly...

static void* XXH_memcpy_P(void* dest, const void* src, size_t size)
{
    return memcpy_P(dest,src,size);
}

if you think it could help small systems users and worth to be added later, more *_P functions could be "oficially" added and enabled/disabled through a build modifier

please tell us what do you think, and if you think it could be a good idea

@t-mat
Copy link
Contributor

t-mat commented Oct 13, 2022

Good point.
We may be able to split the change with the following changes and additinal issues.

(1) Eliminate raw memcpy() from xxhash.h

Unfortunately, we have some raw memcpy() in xxhash.h which can be found with the following command

$ grep -n "[^_]memcpy" ./xxhash.h

We should eliminate them before other changes.

(2) Introduce new build modifier macro XXH_MEMCPY

Define the following macro somewhere in xxhash.h

#if !defined(XXH_MEMCPY)
#  define XXH_MEMCPY XXH_memcpy
#endif

And use XXH_MEMCPY everywhere instead of XXH_memcpy

(3) note: We should avoid to use XXH_MEMCPY outside of xxhash.h

Since tests and CLI are depend on "standard" stdlib,
I think we should not use XXH_MEMCPY outside of xxhash.h.

(4) problem: How can we test and keep these changes?

Since memcpy() works fine in almost all platforms, I have no idea so far.

  • Setting explicit memory protection (set source memory region as "write-only" memory)?
  • Define special memcpy() which detect unintended memory access?

@Cyan4973
Copy link
Owner

Cyan4973 commented Oct 14, 2022

If I understand correctly,
it seems that @atesin wants to circumvent a limitation of memcpy(),
which makes it unable to read from pointers of the kind (PGM_P).

I'm indeed surprised to learn that memcpy() has such limitation.
I would have expected it to be able to read from read-only ROMs for example,
hence from higher program flash memory too.

Anyway, assuming this is not classified as a memcpy() bug,
and that the acknowledged solution is to invoke a variant memcpy_P() instead,
then yes, one could imagine a way to allow a compile-time redirection of memcpy() for xxhash.

Indeed, the first step, as mentioned by @t-mat, should be to ensure that there is no direct call to memcpy(),
and everything is invoked through XXH_mempcy(),
so that the modification only needs to be done in one place.

@t-mat
Copy link
Contributor

t-mat commented Oct 14, 2022

potential issue: How can we mix memcpy (RAM to RAM) and memcpy_P(PROGMEM to RAM) ?

Both of AVR and Tensilica have families/variants which contains/supports larger RAM.
So some apllication or variant processor user may want to use memcpy (RAM to RAM) instead of memcpy_P (PROGMEM to RAM).

Possible solutions are

  • Leave it as is. I think it's okay to trust application developers who can choose proper compile options.
  • They must define their own unique XXH_NAMESPACEs for memcpy and memcpy_P variant.
  • (C++) Surround #include "xxhash.h" with namespace {}.
  • (C++/extreme) Use #embed and constexpr for read-only/AOT-immutable data. 😈

@Cyan4973
Copy link
Owner

Cyan4973 commented Oct 14, 2022

oh, so memcpy_p() (or memcp_P()?) is not a superset of memcpy(), and must be used carefully for the right circumstances ?
I presume the differentiation comes from the src pointer, which is const void* for the "normal" case, but has a different type for these specific scenarios ?

In which case, I could imagine a C11 macro solution based on _Generic,
but indeed, the exact implementation would have to be done specifically for each project.

One important thing xxhash can provide is to ensure that all memcpy() operations are controlled through XXH_memcpy(). But I suspect it won't be enough. Functions like XXH32_update() would likely need a similar treatment, and now it becomes specific enough to be dealt with in a dedicated fork.

@t-mat
Copy link
Contributor

t-mat commented Oct 14, 2022

AFAIK, AVR has dedicated instruction for each memory type.

  • Program memory : lpm (Load Program Memory) or its variant.
  • Normal memory : lds (Load direct from Data Space) or its variant.

And actual implementation of memcpy_P() explicitly uses lpm (X_lpm in the code).

http://svn.savannah.nongnu.org/viewvc/avr-libc/trunk/avr-libc/libc/pmstring/memcpy_P.S?view=markup

oh, so memcpy_p() (or memcp_P()?) is not a superset of memcpy(), and must be used carefully for the right circumstances ?

( Sorry, I mean memcpy_P(). I edited my previous post. )

Yes, application programmer must carefully recognize their situation and choose memcpy() or memcpy_P().

I presume the differentiation comes from the src pointer, which is const void* for the "normal" case, but has a different type for these specific scenarios ?

I'm not sure about ability of recent compilers.
But since their instructions are completely different, they likely should use proper functions/instructions for them manually.

One important thing xxhash can provide is to ensure that all memcpy() operations are controlled through XXH_memcpy(). But I suspect it won't be enough. Functions like XXH32_update() would likely need a similar treatment, and now it becomes specific enough to be dealt with in a dedicated fork.

Right. If we'll support these NUMA platform, I think we should define and use the following functions

  • Copy source (special, readonly memory) to main RAM (or register).
  • Copy main RAM memory to main RAM.

And I'm not sure we still have speed gain with these restrictions or not.

@atesin
Copy link
Author

atesin commented Oct 14, 2022

hi... thanks for your interest

@t-mat

(2) Introduce new build modifier macro XXH_MEMCPY

Define the following macro somewhere in xxhash.h

i suggest another name like XXH_PROGMEM more appropiate for this case, since the latter sounds specific to that function and not generic for all possibly progmem functions needed

also, i think support for progmem functions should be added, not switched or replaced... see for example the first example i posted in which we can see calls to overloaded XXH32_update() with both datatypes char * and PGM_P

be aware in this case PGM_P is just a macro replacement for char * so it doesn't work for typecasting... for a deeper explanation read THIS (this is for esp8266 but works mostly similar for arduino, be aware FPSTR macro is not present in arduino)

@Cyan4973
i guess the components/circuitry/process/timings/addresses/etc to read eeprom/flash memory are different from normal ram operations, so they deserve different opcodes/functions ... as i noted above, PGM_P doesn't work for typecasting, that is why my program did compile, but crashed at runtime... for same reason you can't rely easily on overloaded PGM_P functions (read the explanation linked above)... in this case, chances are users (i.e. app developers) know when they are working with progmem data, so writing *_P functions like XXH32_P() or XXH32_update_P() won't hurt them...

i believe *_P functions are not part of ansi c standard, but part of some gnu c++ compilers... remember available *_P functions internally use specific opcodes to work directly with flash memory through different platforms, so we can just go and use them for ease

even until i know is not optimal (especially for such a limited platform like arduino and similar), in the meanwhile i am trying the workaround below ... so i think the point all this conversation is to actually to keep the work with this lib blaze optimized and avoid heavy workarounds like this snippet:

// having a previously instantiated and resetted hash state, and another feed(char*) function
void feed(const __FlashStringHelper *inputData){
  if ( !errorCode ){
    // for code below i'm wating what happen with: https://github.com/Cyan4973/xxHash/issues/745
    //errorCode = XXH32_update_P(state, (PGM_P) inputData, strlen_P((PGM_P) inputData));
    // slimy workaround in the meanwhile...
    char bufData[17];  // yes, another temporary buffer :/
    bufData[16] = '\0';  // strncpy_P() below doesn't terminate strings
    PGM_P ptrData = (PGM_P) inputData;
    PGM_P endData = ptrData + strlen_P(ptrData);
    while ( ptrData < endData ){  // yes another iteration :/
      strncpy_P(bufData, ptrData, 16);
      feed(bufData);
      ptrData += 16;
    }
  }
}

@Cyan4973
Copy link
Owner

Closing. It seems a version differentiating memcpy() from memcpy_P() depending on memory type is out of scope for this repository, as it would significantly impact the entire code base for the benefit of a single specific use case.

@atesin
Copy link
Author

atesin commented Jul 15, 2023

hi... i know this issue is closed .. don't be so drastic yet, because the solution may be more simple than you think

since it had been some time from this issue, maybe i forgot some details (additionally, i am not a C/C++ programmer but a regular user, so to have been able to talk in same tech level with you fills me with pride :) )

... but basically, taking the example from my opening post and the commonn existence of PROGMEM macro discussed here, i think simply a code like this could be added

#ifdef PROGMEM
static void* XXH_memcpy_P(void* dest, const void* src, size_t size)
{
    return memcpy_P(dest,src,size);
}
#endif

this should make available the new XXH_memcpy_P() function only if the platform supports progmem functions (assuming a platform with no progmem functions have no PROGMEM macro defined ;) (at least avr and xtensa, maybe stm32 and others))

whaf if you reopen the issue and request it to some skilled collaborator? (i could make a blind PR if you want)

thanks

@Cyan4973
Copy link
Owner

I think the issue here is that we must first make a complete pass over the code to understand where the memcpy() function is invoked, both directly and indirectly,
in order to ensure that only the parts that read from input get impacted by the memcpy_P() change, while all other parts continue to use the regular memcpy().

I suspect this is a bit more complex to ensure than it sounds, but it would still be tractable if limited to XXH32, and likely XXH64. It tends to become more complex though with XXH3_* variants. And the issue is, all these variants share the same read functions kernel.

@Cyan4973 Cyan4973 reopened this Jul 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants