Channel: Raspberry Pi Forums

Bare metal, Assembly language • Re: Bare metal toggle GPIO weird timing

In your assembly code, the up and dn loops should ideally take the same time, since each consists of the same two instructions: a str (store register) and a b (branch). In practice, their execution times can still differ.

There are several reasons why this might not be the case:

Pipeline Effects and Dependencies:

Pipeline filling: The CPU processes instructions in several stages (fetch, decode, execute, and so on). If a branch (b) jumps to an address that the branch predictor has not yet recorded, or that is not in the instruction cache, the pipeline has to be refilled from the new address, which costs extra cycles.

Cache Effects:

Instruction cache: The branch targets of the two loops may sit at different offsets within instruction cache lines or fetch blocks, so one target can be fetched less efficiently than the other.

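Data cache: The stores go to GPIO registers, which are memory-mapped peripheral registers and are normally mapped as Device memory (with the MMU off, all data accesses are treated as Device memory). Such accesses are not cached, so each str goes out to the peripheral bus, and its latency depends on the bus and the GPIO block rather than on data cache hits or misses.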

Memory Accesses:

The two loops store to different registers: [x1, #0x1c] is GPSET0 and [x1, #0x28] is GPCLR0. Because these are device registers, cache placement does not explain a difference between them; however, the two writes may still take different times to complete on the peripheral bus, and any such difference shows up directly as unequal high and low times on the pin.

Branch Prediction:

The two branches may also be handled differently by the CPU's branch predictor and Branch Target Buffer (BTB). If one loop's branch is predicted more reliably than the other's, that loop runs faster.

Revised Version to Minimize Timing Differences

Code:

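```asm
.text
.global main
main:
    ldr  x1, =0xfe200000      // GPIO base address (BCM2711, Raspberry Pi 4)
    mov  w0, #0x40000         // GPFSEL2 bits 20:18 = 001: GPIO 26 becomes an output
    str  w0, [x1, #0x08]      // write GPFSEL2
    mov  w0, #0x4000000       // bit 26 selects GPIO 26 in GPSET0/GPCLR0
loop:
    str  w0, [x1, #0x1c]      // GPSET0: drive GPIO 26 high
    b    .L1                  // branch to the next instruction, so both halves are str + b
.L1:
    str  w0, [x1, #0x28]      // GPCLR0: drive GPIO 26 low
    b    loop
```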
This looks essentially the same as your code, but this arrangement has helped me before.

Statistics: Posted by satyria — Thu Dec 05, 2024 12:52 pm


