Integrating an HLS accelerator into Petalinux

In a previous post we created an HLS accelerator that was used in a bare-metal application. In this post we will examine how to integrate that peripheral into an embedded Linux system, which in our case is the Petalinux 2016.4 distribution.

The whole project with the hdf file, bootable images and test files is here.

First things first, we will use this implemented hardware for our accelerator. The block design is exactly the same as before; the only difference is the Ethernet port, which is added so that we can transfer the necessary files to our embedded system. This is optional: the alternative is to transfer the files through the serial port, which is very slow.


After this modification we can create the bitstream and export the hardware description. Now we are ready to start the Petalinux build process. In an earlier post we explained, step by step, how to create a bootable Petalinux image with a custom application.

Note that in Petalinux 2016 and later, the created application is stored in <petalinux-project dir>/project-spec/meta-user/recipes-apps/.

To make our HLS kernel accessible from the OS we have to set up a UIO driver and make a few changes to the system device tree.

We modify the system-top.dts file in <petalinux-project dir>/project-spec/meta-user/recipes-dt/device-tree/files/ as shown below:

/dts-v1/;
/include/ "system-conf.dtsi"
/ {
    chosen {
        bootargs = "console=ttyPS0,115200 earlyprintk uio_pdrv_genirq.of_id=generic-uio";
    };
    amba_pl {
        #address-cells = <0x1>;
        #size-cells = <0x1>;
        compatible = "simple-bus";
        ranges;
        sobel@43c00000 {
            compatible = "generic-uio";
            reg = <0x43c00000 0x10000>;
            xlnx,s-axi-control-bus-addr-width = <0x5>;
            xlnx,s-axi-control-bus-data-width = <0x20>;
        };
    };
};

We load the UIO modules with the following commands:

modprobe uio
modprobe uio_pdrv_genirq

We can list the loaded modules with:

lsmod

To make sure that a /dev/uioX node exists for the UIO device, we rescan the devices with:

mdev -s
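Before starting the application it can be useful to check which /dev/uioX belongs to the accelerator. The sketch below is one possible way to do that from C: it scans a sysfs-style directory for an entry whose name file matches a given string. The function name find_uio_index is our own (hypothetical) helper, and we assume that the UIO device shows up under /sys/class/uio with the device tree node name ("sobel"), which is our understanding of how uio_pdrv_genirq names its devices.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Scan <sysfs_root>/uioN/name for a device called `wanted`.
 * Returns N on success, -1 if no such device is found.
 * On the target, sysfs_root would be "/sys/class/uio" (assumption). */
int find_uio_index(const char *sysfs_root, const char *wanted)
{
    DIR *dir = opendir(sysfs_root);
    if (dir == NULL)
        return -1;

    int found = -1;
    struct dirent *entry;
    while (found < 0 && (entry = readdir(dir)) != NULL) {
        int index;
        char path[512];
        char name[64];

        /* Only entries of the form "uio<N>" are interesting. */
        if (sscanf(entry->d_name, "uio%d", &index) != 1)
            continue;

        snprintf(path, sizeof path, "%s/%s/name", sysfs_root, entry->d_name);
        FILE *fp = fopen(path, "r");
        if (fp == NULL)
            continue;
        if (fgets(name, sizeof name, fp) != NULL) {
            name[strcspn(name, "\n")] = '\0';   /* strip trailing newline */
            if (strcmp(name, wanted) == 0)
                found = index;
        }
        fclose(fp);
    }
    closedir(dir);
    return found;
}
```

Calling find_uio_index("/sys/class/uio", "sobel") on the board should then return the X of the matching /dev/uioX, or -1 if the device tree change or the module loading did not take effect.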

In our application we use the Linux driver files that HLS generates automatically, and we initialize the kernel with the following function.

void sobel_init(){
    //Kernel - Init
    printf("Ready to init kernel\n");
    //In the second argument we use the exact name
    //of the top function of the HLS
    //otherwise the app crashes
    int status = XSobel_Initialize(&Sbl,"sobel");
    if(status != XST_SUCCESS){
        printf("XSobel_Initialize failed\n");
        //Do not touch the registers of an uninitialized instance
        return;
    }
    printf("Kernel initialized\n");
    XSobel_InterruptGlobalDisable(&Sbl);
    XSobel_InterruptDisable(&Sbl, 1);
    printf("Interrupts Disabled\n");
    XSobel_Set_in_pointer(&Sbl,(u32)INPUT_BASE_ADDR);
    printf("Input initialized\n");
    XSobel_Set_out_pointer(&Sbl,(u32)OUTPUT_BASE_ADDR);
    printf("Output initialized\n");
    printf("Sobel kernel initialized with %x for input and %x for output\n",XSobel_Get_in_pointer(&Sbl),XSobel_Get_out_pointer(&Sbl));
}
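Since the interrupts are disabled, the application has to poll the kernel to find out when it has finished. With the generated driver this would normally be done through its start and is-done calls; the sketch below only illustrates what such polling amounts to at register level. It assumes the usual HLS AXI4-Lite control register at offset 0x00 (bit 0 ap_start, bit 1 ap_done, bit 2 ap_idle, bit 7 auto_restart); the function names and the max_polls timeout are our own and are not part of the generated driver.

```c
#include <stdint.h>

/* Assumed layout of the HLS control register at offset 0x00
 * (standard ap_ctrl_hs block-level interface over AXI4-Lite). */
#define AP_CTRL_WORD   0          /* word index of the control register */
#define AP_START       (1u << 0)
#define AP_DONE        (1u << 1)
#define AP_IDLE        (1u << 2)
#define AP_AUTORESTART (1u << 7)

/* Set ap_start while preserving the auto-restart bit. */
void kernel_start(volatile uint32_t *regs)
{
    uint32_t ctrl = regs[AP_CTRL_WORD] & AP_AUTORESTART;
    regs[AP_CTRL_WORD] = ctrl | AP_START;
}

/* Non-zero when the ap_done bit is set. */
int kernel_is_done(volatile uint32_t *regs)
{
    return (regs[AP_CTRL_WORD] & AP_DONE) != 0;
}

/* Poll up to max_polls times; returns 0 when done, -1 on timeout. */
int kernel_wait_done(volatile uint32_t *regs, unsigned long max_polls)
{
    while (max_polls-- > 0) {
        if (kernel_is_done(regs))
            return 0;
    }
    return -1;
}
```

Here regs would be the virtual address of the mapped control registers of the accelerator. The bounded loop is a design choice: an unbounded while loop is simpler, but it hangs the application if the kernel never finishes.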

A major difference between a bare-metal application and one running on an embedded OS is memory mapping. In the bare-metal world every address is a physical address, but under an operating system the application only sees virtual ones, so we need a way to access a specific physical address from our application. This is done with the mmap function applied to /dev/mem, which returns a virtual mapping of a physical address range. In our app we want to use 0x05000000 as the input address and 0x07000000 as the output one. We chose these addresses because they fit in the 512 MB RAM of our Zedboard and because the operating system doesn't use them. We create the mappings in the following way.

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

//Physical addresses
#define INPUT_BASE_ADDR 0x05000000
#define OUTPUT_BASE_ADDR 0x07000000
#define SIZE 1024

unsigned char *hw_addr;
int devmem = open("/dev/mem", O_RDWR | O_SYNC);
//mmap needs a page-aligned offset, so we split the address
//into the base of its page and the offset inside that page
off_t PageOffset = (off_t) INPUT_BASE_ADDR % getpagesize();
off_t PageAddress = (off_t) (INPUT_BASE_ADDR - PageOffset);

hw_addr = (unsigned char *) mmap(0, SIZE*SIZE*sizeof(unsigned char), PROT_READ|PROT_WRITE, MAP_SHARED, devmem, PageAddress);
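The snippet above leaves out error handling and, for an address that is not page aligned, the step of adding the page offset back to the returned pointer. A minimal sketch of a helper that covers both could look as follows. The names phys_map, map_region and unmap_region are hypothetical, and the file to map is passed as a parameter (it would be "/dev/mem" on the board) so that the helper can also be tried on an ordinary file.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Result of a mapping: `base`/`length` are what munmap() needs,
 * `ptr` points at the requested address inside the mapping. */
struct phys_map {
    void *base;
    size_t length;
    unsigned char *ptr;
};

/* Map `len` bytes starting at offset `phys` of the file at `path`.
 * Returns 0 on success, -1 on failure. */
int map_region(const char *path, off_t phys, size_t len, struct phys_map *out)
{
    long page = sysconf(_SC_PAGESIZE);
    off_t offset_in_page = phys % page;
    off_t page_base = phys - offset_in_page;

    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return -1;

    /* Extend the length so the requested range is fully covered. */
    size_t map_len = len + (size_t)offset_in_page;
    void *base = mmap(NULL, map_len, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, page_base);
    close(fd);                 /* the mapping stays valid after close() */
    if (base == MAP_FAILED)
        return -1;

    out->base = base;
    out->length = map_len;
    out->ptr = (unsigned char *)base + offset_in_page;
    return 0;
}

/* Release a mapping created by map_region(). */
void unmap_region(struct phys_map *m)
{
    munmap(m->base, m->length);
}
```

With this helper the input buffer would be obtained with map_region("/dev/mem", INPUT_BASE_ADDR, SIZE*SIZE, &in) and accessed through in.ptr; the output buffer is mapped the same way with OUTPUT_BASE_ADDR.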

Apart from the changes mentioned above, the application code is exactly the same as the bare-metal one. It goes without saying that we don't use the xilffs functions for file I/O, because in Petalinux we have the Linux file system.

Benchmarks

In our application we applied the Sobel filter to a 1024×1024 image both in software and in hardware, with the following results:

Hardware average time: 0.0630581s

Software average time: 0.407505s

Acceleration factor: 6.462366!!
