BIO – The Bao I/O Co-Processor

(crowdsupply.com)

20 points | by hasheddan 2 days ago

4 comments

  • theamk 2 days ago
    It's a very nice write-up, but this part makes me uneasy:

    > So long as all the computation in the loop finishes before the next quantum, the timing requirements [...] are met.

    Seems like we are back to cycle counting then? But instead of having just 32 instructions at 1 IPC, we have up to 4K instructions with varying latencies, and there is a C compiler too, so even if you have enough cycles in the budget now, things might break when the compiler is upgraded.

    I wonder whether the original PIO approach would still be salvageable if binary compatibility is not a goal. Co-processors are useful, but people have done some amazing things with PIO, like fully software DVI.
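
The cycle-budget concern in the comment above can be made concrete with a little arithmetic. The sketch below is illustrative only: the core clock, quantum rate, and loop cycle counts are hypothetical numbers chosen for the example, not figures from the write-up.

```python
def cycles_per_quantum(core_hz: int, quantum_hz: int) -> int:
    """Whole core clock cycles available between two quantum ticks."""
    return core_hz // quantum_hz


def meets_deadline(worst_case_loop_cycles: int, core_hz: int, quantum_hz: int) -> bool:
    """True if the loop body always finishes before the next quantum."""
    return worst_case_loop_cycles <= cycles_per_quantum(core_hz, quantum_hz)


# Hypothetical values, for illustration only.
CORE_HZ = 100_000_000     # assumed core clock
QUANTUM_HZ = 1_000_000    # assumed quantum rate

budget = cycles_per_quantum(CORE_HZ, QUANTUM_HZ)

# Assumed worst-case loop lengths emitted by two compiler versions.
before_upgrade = meets_deadline(96, CORE_HZ, QUANTUM_HZ)
after_upgrade = meets_deadline(104, CORE_HZ, QUANTUM_HZ)

print(budget, before_upgrade, after_upgrade)
```

With these assumed numbers the budget is 100 cycles, so a loop that grows from 96 to 104 cycles after a compiler change would silently start missing its quantum.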

  • jauntywundrkind 2 days ago
    Really glad to get this write-up, adds a very nice broad picture & does a good job introducing the queue too.

    I'm an unranked unwashed neophyte at hardware design, but I did spend some time looking at BIO. One particular thing that caught my eye a while ago was Stream Semantic Registers, an instruction set extension for RISC-V where loads and stores are implicit, with data pointers that automatically advance on each instruction. This greatly increases code density, allowing for DSP-like capabilities on RISC-V. https://arxiv.org/abs/1911.08356

    I forget how exactly I was convinced, but after spending a while chatting with an LLM, I became somewhat convinced that the FIFO queues here give a lot of similar capabilities, with an additional interesting use in decoupling multiple systems: register-mapped data arrays that can be used without having to load/store each word. I felt then, and still feel now, that I have a good bit to learn about how exactly each of the FIFO registers works, but it was cool to see, and I love this idea of code that can run without having to issue endless load/stores all the time.
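
A toy model may help show the idea of register-mapped streams described in the comment above. This is a sketch under stated assumptions: `FifoRegister` is a hypothetical class invented for illustration, and it models neither BIO's actual FIFO registers nor the Stream Semantic Registers extension. It only contrasts a loop that issues an explicit memory access per operand with one whose operands arrive by reading a register.

```python
from collections import deque


class FifoRegister:
    """Hypothetical register-mapped FIFO: each read pops the next queued word."""

    def __init__(self, words):
        self._queue = deque(words)

    def read(self):
        return self._queue.popleft()


def dot_explicit(memory, a_base, b_base, n):
    """Dot product using an explicit load per operand; returns (result, loads)."""
    acc = 0
    loads = 0
    for i in range(n):
        a = memory[a_base + i]
        b = memory[b_base + i]
        loads += 2  # two explicit loads per multiply-accumulate
        acc += a * b
    return acc, loads


def dot_streamed(fifo_a, fifo_b, n):
    """Dot product where operands arrive by reading FIFO registers."""
    acc = 0
    for _ in range(n):
        acc += fifo_a.read() * fifo_b.read()  # no address arithmetic in the loop
    return acc


memory = [1, 2, 3, 4, 10, 20, 30, 40]
result_explicit, loads = dot_explicit(memory, 0, 4, 4)
result_streamed = dot_streamed(FifoRegister(memory[0:4]), FifoRegister(memory[4:8]), 4)
print(result_explicit, loads, result_streamed)
```

Both loops compute the same result; the streamed version simply has no load instructions or pointer updates in its body, which is the code-density benefit the comment points at.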

  • rasz 1 day ago
    PIOs might be heavier on hardware resources

    > The BIO uses 14597 cells, while the PIO uses 39087 cells

    and BIO might reach higher clock speeds

    > when ported to an ASIC flow, the clock rate achieved by the BIO is over 4x that of a PIO implemented in the same process node.

    but BIO is ~15x less efficient per clock. RP2350 is capable of reading IOs at 400 Mbps (https://github.com/gusmanb/logicanalyzer) and bitbanging at 800 Mbps (HSTX). According to bunnie's write-up, BIO needs 700 MHz to do a pedestrian 25 MHz SPI.
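
The last figure in the comment above can be restated as core clocks spent per SPI clock. The sketch below uses only the two numbers quoted there (700 MHz and 25 MHz); it does not attempt to reproduce the ~15x estimate, which depends on RP2350 clock assumptions the comment does not state.

```python
def core_clocks_per_io_clock(core_hz: int, io_hz: int) -> float:
    """Core clock cycles consumed per cycle of the generated I/O clock."""
    return core_hz / io_hz


# Figures as quoted in the comment.
BIO_CORE_HZ = 700_000_000
SPI_HZ = 25_000_000

bio_ratio = core_clocks_per_io_clock(BIO_CORE_HZ, SPI_HZ)
print(bio_ratio)
```

That works out to 28 core clocks per SPI clock for the quoted BIO case.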

  • fragmede 53 minutes ago