• How do the ARM Compilers handle memcpy()?

    2010. 1. 13.

    by. 꼼발남자





    Applies to: ARM Developer Suite (ADS), RealView Developer Kit (RVDK) for OKI, RealView Developer Kit (RVDK) for ST, RealView Development Suite (RVDS)

    In the general case, when compiling calls to memcpy(), the ARM C compiler generates calls to an optimised library function instead. The name of that function depends on the compiler version: ADS and RVCT 2.0 use the original name, while RVCT 2.1 and later, which comply with the ABI for the ARM Architecture (AEABI), use a new AEABI name.

    The basic optimised version can copy arbitrary amounts of data using aligned or unaligned source and destination addresses.

    Original : __rt_memcpy

    AEABI: __aeabi_memcpy 

    It is possible for the compiler to further optimize the memory copy when it detects the use of aligned source and destination addresses. If the compiler can determine that the addresses will always be word-aligned then it will generate a call to a more optimized copy function which takes advantage of LDM/STM instructions to copy several words at a time.

    Original: __rt_memcpy_w

    AEABI: __aeabi_memcpy4

    Only ARM versions of these functions are provided. Prior to RVCT 2.0, the Thumb libraries also contained __16_rt_* versions of these memcpy functions, which simply changed to ARM state and branched to the ARM version of the function. In RVCT 2.0 and later, there are no Thumb versions because the linker automatically generates inline veneers to perform the state change.
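
    To make the general and word-aligned cases above concrete, here is a minimal sketch. The helper names copy_bytes and copy_words are hypothetical, and the comments only restate which library routine the text says an ARM compiler would be expected to pick; other compilers simply call memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte pointers carry no alignment guarantee, so per the text the general
   routine (__rt_memcpy or __aeabi_memcpy) would be expected here. */
void copy_bytes(char *dst, const char *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, n);
}

/* Word pointers are assumed word-aligned, so per the text the word-aligned
   routine (__rt_memcpy_w or __aeabi_memcpy4) may be called instead. */
void copy_words(uint32_t *dst, const uint32_t *src, size_t n_words)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, n_words * sizeof *dst);
}
```

    Both helpers behave identically from the caller's point of view; only the pointer types, and therefore the alignment the compiler is allowed to assume, differ.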

    A further optimization will take place if the compiler can determine that you require a word-aligned copy of a small number of bytes (typically <= 64) which is a multiple of four (e.g. 36 bytes). In this case, rather than calling a function it will actually inline multiple LDM/STM instructions to perform the copy. The maximum number of registers that it will transfer in one LDM/STM is 4 - this is due to our desire to keep interrupt latency down.
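
    As an illustration of a copy that fits this description, the sketch below copies a 36-byte record made of nine 32-bit words. The type record36 and the function copy_record are hypothetical names. Whether the copy is actually inlined, and with which LDM/STM sequence, is up to the compiler; with at most four registers per transfer, nine words could for instance be moved as two four-word transfers plus one single word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record: nine 32-bit words = 36 bytes, word-aligned. */
typedef struct {
    uint32_t words[9];
} record36;

/* Compile-time check that the record really is 36 bytes
   (the array size becomes -1, a compile error, if it is not). */
typedef char record36_size_check[(sizeof(record36) == 36) ? 1 : -1];

/* The size is a small compile-time constant and a multiple of four, and both
   pointers are word-aligned by type, so this is the kind of call the text
   says may be expanded inline rather than turned into a function call. */
void copy_record(record36 *dst, const record36 *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof *dst);
}
```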

    Due to the tight integration between the compiler and the libraries, you must take care when copying data using unaligned pointers. The ARM compiler assumes that all pointers are naturally-aligned (i.e. int* is word-aligned, short* is halfword-aligned, etc.). You need to either explicitly tell the compiler when you are using unaligned pointers by using the __packed keyword (described in the compiler guide), or create a temporary char* pointer to access the address. For example:

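    #include <string.h>
    
    extern unsigned int * const dest;  /* word-aligned buffer, assumed to be defined elsewhere */
    
    void example (unsigned int * const unaligned_ptr)
    {
      /* __packed tells the compiler this pointer may be unaligned */
      __packed unsigned int * packed_ptr = unaligned_ptr;
      /* a char * carries no alignment assumption */
      char * temp_ptr = (char *)unaligned_ptr;
      memcpy(dest, unaligned_ptr, 32);         /* Unsafe */
      memcpy(dest, (void *)packed_ptr, 32);    /* Safe   */
      memcpy(dest, temp_ptr, 32);              /* Safe   */
    }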
    

    In both of the safe cases the compiler will call the general routine (__rt_memcpy or __aeabi_memcpy) and will not try to copy words at a time. In the unsafe case the compiler will try to call the word-aligned version, which is not safe because unaligned_ptr is not word-aligned.
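
    The temporary byte-pointer technique can also be written in portable C without the __packed keyword. The following is a minimal sketch; the helper name load_u32_unaligned is hypothetical. The source is accessed through an unsigned char pointer, so the compiler has no basis for assuming word alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reads a 32-bit value from an address that may not be word-aligned.
   The bytes are copied in memory order, so the result follows the
   byte order of the machine the code runs on. */
uint32_t load_u32_unaligned(const void *p)
{
    uint32_t value;
    const unsigned char *bytes = (const unsigned char *)p;

    assert(p != NULL);
    memcpy(&value, bytes, sizeof value);
    return value;
}
```

    A typical use is pulling a field out of a packet or file buffer at an odd offset, for example load_u32_unaligned(buffer + 1).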

    If you need to disable the automatic optimization of calls to memcpy you can use the undocumented compiler switch -Ono_memcpy. This will disable inlining of memcpy and the conversion of memcpy to the specially optimized library functions, such as __aeabi_memcpy. Instead, the compiler will just call the C library function memcpy(). This gives you the ability to provide your own implementation of memcpy(). If your project includes a new implementation of memcpy() then the linker will replace the C library version with your version.
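
    A replacement implementation can be as simple as the byte-by-byte sketch below. It is named my_memcpy here (a hypothetical name) so that it can be compiled and tested alongside the standard library; to actually replace the library version as described above, it would have to be named memcpy and the project built with -Ono_memcpy.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal byte-wise copy with the same signature as memcpy().
   It makes no alignment assumptions and, like memcpy(), does not
   support overlapping source and destination regions. */
void *my_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

    assert(n == 0 || (dst != NULL && src != NULL));
    while (n--) {
        *d++ = *s++;
    }
    return dst;
}
```

    A byte loop like this is much slower than the optimised library routines, so it is mainly useful as a starting point or for debugging.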

    In a similar way calls to memmove() and memset() may result in calls to optimized versions, as well as inlining. Calls to memset() with zero as the initializing value result in a call to an optimised memclr(). RVCT also provides versions of memcpy() and memmove() optimized for Architecture v6 processors which support unaligned access in hardware. All these optimizations are disabled with -Ono_memcpy.
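
    The three cases mentioned above look like this in source code. This is only a sketch with hypothetical helper names (clear_words, fill_bytes, drop_first_byte); which optimised routine is chosen, or whether the call is inlined, is decided by the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Zero fill: per the text, this may become a call to an optimised memclr. */
void clear_words(uint32_t *buf, size_t n_words)
{
    assert(buf != NULL);
    memset(buf, 0, n_words * sizeof *buf);
}

/* Non-zero fill: may become a call to an optimised memset variant. */
void fill_bytes(unsigned char *buf, size_t n, unsigned char value)
{
    assert(buf != NULL);
    memset(buf, value, n);
}

/* Overlapping regions need memmove(), not memcpy(): shifts buf[1..n-1]
   down by one position, leaving the last byte unchanged. */
void drop_first_byte(unsigned char *buf, size_t n)
{
    assert(buf != NULL);
    if (n > 1) {
        memmove(buf, buf + 1, n - 1);
    }
}
```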

    These optimised functions are described further in the Run-time ABI for the ARM Architecture, part of the AEABI, which can be found at http://www.arm.com/products/DevTools/ABI.html.


