| Projects (2003-2004) |
|
Binary Utilities for Multiprocessor Code Generation
Code generation for a retargetable heterogeneous multiprocessor, like the one targeted in Srijan, is a challenging problem. The heterogeneity can arise at multiple levels: (i) the processors may not be identical, (ii) the memories used could be shared, local, or altogether different (say, FIFO memories), and (iii) the way each processor is connected to other components may differ. To tackle this complex problem we divide it into two major tasks: high level compilation, which generates assembly from a high level language like C, and low level code generation, which combines the assembly generated by different compilers and emits the program and data memory images. For the latter task the GNU binary utilities (binutils) are well known and have been ported to various systems, but only uniprocessor or homogeneous multiprocessor ones. The goal of this project is to create a framework and tool suite for low level code generation for retargetable heterogeneous multiprocessors, with well defined interfaces to invoke high level compilation tools. We choose the ELF format to represent the various binaries involved (relocatable object files and executables) and IMPACT as the high level compilation tool. We try to reuse as much as we can from GNU binutils, specifically the BFD library, gas and ld. The inputs to the envisaged framework are: the application described as a task network using C, the architecture description in the MD language, and the mapping of tasks to processors. The outputs are program and data memory images in ELF and SREC formats. We would like to use the multiprocessor simulator, which is under development in another project, as a testing aid.
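To make the SREC output format concrete, here is a minimal sketch of how a single Motorola S-record of type S1 (16-bit address) is laid out. It is not taken from the project or from binutils; the function name `srec_s1_record` is hypothetical and only the record layout itself is standard.

```python
def srec_s1_record(address: int, data: bytes) -> str:
    """Build one Motorola S-record of type S1 (data record, 16-bit address)."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("S1 records carry a 16-bit address")
    # The byte count covers the address (2 bytes), the data, and the checksum (1 byte).
    count = 2 + len(data) + 1
    if count > 0xFF:
        raise ValueError("too much data for one record")
    body = bytes([count, address >> 8, address & 0xFF]) + data
    # Checksum: ones' complement of the low byte of the sum of all body bytes.
    checksum = ~sum(body) & 0xFF
    return "S1" + body.hex().upper() + f"{checksum:02X}"
```

A memory image would be a sequence of such records, one per chunk of program or data memory, preceded by a header record and followed by a termination record.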
Power Library of Architectural Components for ASIPs
A part of the Srijan project is a cycle accurate simulator which can be parametrized and is flexible to a large extent, allowing the user to vary common parameters and plug in new components as needed. The goal of this project is to build an accurate power library plugin for the simulator. Power consumption is one of the rigid constraints when designing an embedded system, and an efficient design should be able to optimize it. Including a power library in the simulator would therefore be a helpful utility for an ASIP designer. This power library will be based upon optimized VHDL models of various architectural components. The power consumption of an architectural component is highly dependent on its inputs, because the inputs govern which parts of the circuit are activated. Therefore, any design of a power library would require simulations of architectural models for an exhaustive set of inputs. By the end of this project, we intend to come up with a set of empirical functions for each component which relate the power consumed to the input. These functions will be integrated with the simulator.
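One possible shape for such an empirical function is sketched below. It assumes, purely for illustration, that the activity of a component is measured as the number of input bits that toggle between consecutive cycles and that power is modelled as a straight line in that activity. Neither assumption comes from the project, and the names `toggle_counts` and `fit_line` are hypothetical.

```python
def toggle_counts(inputs):
    """Number of bits that change between each pair of consecutive input words."""
    return [bin(a ^ b).count("1") for a, b in zip(inputs, inputs[1:])]

def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

In use, the activity values would come from the input vectors applied to a VHDL model and the power values from the corresponding simulations; the fitted coefficients would then be what the simulator plugin evaluates at run time.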
A Compiler Optimization to Exploit Pipeline Registers and Forwarding Circuitry
Processors usually contain a limited number of registers. Since, in addition, all instruction operands must reside in registers, the register file is an important yet scarce resource. The register allocator must therefore make the most efficient use of these registers so that the number of accesses to memory is minimised. We can gain some optimisation in register allocation by exploiting the fact that a significant number of values used in a program have just a single use. For such variables, whose live ranges span only a few instructions (typically 2 or 3), the register write, and in some cases even the register allocation (if the variable has just one value), is unnecessary: the value will be present in the pipeline registers, and the instruction that uses it can take it directly from there. We can do this using only the comparator circuit already used for bypassing. The reason for choosing a VLIW processor for this implementation is that, because of its large ILP, the case in which the single use of a value occurs within 2 or 3 instructions immediately following its definition is more pronounced, so we can expect larger savings in power consumption than on ordinary RISC processors. Further optimisation in cost (area) can be achieved by encoding the forwarding information in the instruction itself during compilation, since the dependencies are known at compile time. In this way we can avoid the comparators, whose number grows quadratically with the number of FUs. This further reduces power consumption, as the forwarding circuitry is simplified.
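The candidate selection described above can be sketched as a scan over one basic block. This is a minimal illustration, not the project's compiler pass: instructions are modelled as hypothetical `(dest, sources)` pairs, and the sketch assumes that no value is live after the end of the block.

```python
def forwardable_defs(instrs, window=3):
    """Indices of instructions whose result is read by exactly one later
    instruction, located at most `window` instructions after the definition.

    instrs: list of (dest, sources) pairs for one basic block.
    Assumption: no value is live after the end of the block.
    """
    result = []
    for i, (dest, _) in enumerate(instrs):
        if dest is None:
            continue
        uses = []
        for j in range(i + 1, len(instrs)):
            later_dest, later_srcs = instrs[j]
            if dest in later_srcs:
                uses.append(j)
            if later_dest == dest:
                break  # a redefinition ends this live range
        if len(uses) == 1 and uses[0] - i <= window:
            result.append(i)
    return result
```

For the block `t1 = a + b; t2 = t1 * c; t3 = a + c; t4 = t2 + t3; t5 = t3 + t4`, the values `t1`, `t2` and `t4` qualify, while `t3` has two uses and `t5` has none inside the block.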
Component Library for Embedded Multiprocessor Simulator
The goal of the project is to build a component library for an embedded multiprocessor simulator. The first part of the project deals with consolidating the behavioral model of the uniprocessor simulator and testing it thoroughly on media applications. At this stage, we will have a very simplistic model of a multiprocessor, where one can just plug in uniprocessors running independent programs.
The second part deals with the addition of the following parametrizable components to the simulator:
Focus will be on keeping the simulator parametrized and extendable to a large extent, allowing the user to vary common parameters and plug in new components as needed.
Bit-width Analysis in Behavioral Synthesis from C
Bit-width is the number of bits used to represent an operand. Bit-width
analysis attempts to minimize the bits used for representing
each variable in a program while retaining program correctness.
For instance, we can use the bounds on an array to set the
index variable's maximum bit-width. This would result in data path and register
area and power savings, and better performance of the synthesized circuit.
Thus the objective of this project is to
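The array-bound example above can be sketched in a few lines. This is only an illustration of the arithmetic, not the project's analysis; the function name `index_bits` is hypothetical.

```python
def index_bits(array_length: int) -> int:
    """Bits needed for an unsigned index ranging over 0 .. array_length - 1."""
    if array_length < 1:
        raise ValueError("array must have at least one element")
    # The largest index is array_length - 1; a one-element array still needs one bit.
    return max(1, (array_length - 1).bit_length())
```

An index into a 100-element array thus needs 7 bits instead of the 32 of a C `int`. Note that a loop counter which also takes the exit value `array_length` may need one bit more than the index itself.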
Optimizations in Behavioural Synthesis from C
Loops are used frequently in behavioural descriptions and are known to account for most of the execution time. If we are able to optimize the loops, we can expect considerable improvement in the performance/area of the hardware synthesized from the description. There are many loop optimizations, such as loop unrolling, loop fusion and loop fission. In this project we would like to study the impact of loop unrolling in behavioural synthesis from a C description. It is well known that loop unrolling improves parallelism and results in better schedules with fewer clock cycles. But it also affects register allocation, the complexity of the data path interconnection, and the complexity of the control unit. Some of these effects demand more hardware area and some may lengthen the critical path. However, a systematic study of these other effects of loop unrolling is missing. Also, no methodology or heuristic is known for deciding the optimal loop unrolling factors. In this project we would like to bridge this gap. Apart from the impact on area and performance, we would also like to investigate the impact of loop unrolling on power and energy consumption.
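As a small illustration of the trade-off discussed above, the sketch below shows a loop and the same loop unrolled by a factor of 4. It is written in Python for brevity rather than in the C that the synthesis flow would consume, and the function names are hypothetical.

```python
def dot(a, b):
    """Reference loop: one multiply-accumulate per iteration."""
    acc = 0
    for i in range(len(a)):
        acc += a[i] * b[i]
    return acc

def dot_unrolled4(a, b):
    """Same computation unrolled by 4, with four independent accumulators."""
    n = len(a)
    acc0 = acc1 = acc2 = acc3 = 0
    i = 0
    while i + 4 <= n:
        # The four products have no dependence on each other, so a scheduler
        # may place them in the same control step if four multipliers exist.
        acc0 += a[i] * b[i]
        acc1 += a[i + 1] * b[i + 1]
        acc2 += a[i + 2] * b[i + 2]
        acc3 += a[i + 3] * b[i + 3]
        i += 4
    acc = acc0 + acc1 + acc2 + acc3
    while i < n:  # remainder iterations when n is not a multiple of 4
        acc += a[i] * b[i]
        i += 1
    return acc
```

The unrolled version exposes more parallelism but needs four accumulator registers, more functional units or multiplexing, and extra control for the remainder loop, which are exactly the area and critical-path effects the project sets out to study.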
Floating Point Unit (FPU) Integration with LEON
The goal of this project is to integrate a floating point unit with the LEON processor. This involves designing and implementing an interface between the generic co-processor interface of LEON and an open source floating point unit. The LEON model provides two interface options for a floating point unit: a parallel interface, or an integrated interface in which FP instructions do not execute in parallel with the integer instructions. Both interface methods expect an FPU core to have the same interface as the generic co-processor interface defined for the LEON processor.
Our project aims to create a LipSync application for the embedded ARM platform. We shall be integrating it with a text-to-speech converter. This has many uses. It can be used to create an email reading application for a handheld device. Ideally, the application would take an email as input and create an animated character to read out the email, with the facial movements synchronised with the speech readout. Applications of LipSync: a. SMS Reader - The cellphone would have your favorite cartoon character read out your SMS/email correspondence. The same message could also be read out in different downloadable tones, voices and faces. b. Mobile VideoPhone - Voice communication is very effective when the listener is able to observe the mouth of the speaker. This is particularly important for hearing-impaired users. The LipSync application would be able to generate a talking picture so that the user can understand the message more clearly.
Hardware Software Codesign for Speech Synthesis Software [Klatt]
There are two extreme approaches to implementing an application: pure software and pure hardware. A pure software version is slow and a pure hardware version is expensive. Codesign is the intermediate approach, in which one tries to optimize over both cost and speed. This helps us reduce the power consumed by the system and reduce the system size while maintaining real time performance (as in today's cellphones). In this project the target application is the Klatt speech synthesizer, which is based on formant synthesis.
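The basic building block of a formant synthesizer of this kind is a second-order digital resonator, and it is the sort of kernel a codesign flow might consider moving to hardware. The sketch below follows the resonator difference equation as published for the Klatt synthesizer; it is not taken from the project's code, and the function names are hypothetical.

```python
import math

def resonator_coeffs(freq_hz, bw_hz, sample_rate):
    """Coefficients of y[n] = a*x[n] + b*y[n-1] + c*y[n-2] for one formant."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # chosen so that the gain at 0 Hz is exactly 1
    return a, b, c

def resonate(samples, freq_hz, bw_hz, sample_rate):
    """Filter a list of samples through one formant resonator."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, sample_rate)
    y1 = y2 = 0.0  # previous two outputs
    out = []
    for x in samples:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out
```

Each output sample costs three multiplications and two additions, and several such resonators run in cascade or in parallel for every sample, which is why this loop dominates the computation.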
Extensions to Real Time Kernel RtKer
RtKer is a Real-Time Operating System (RTOS) developed at IIT Delhi. It has an extremely small footprint, making it suitable for embedded systems. Currently RtKer only runs in a uniprocessor environment. Our first objective is to port RtKer to a multiprocessor environment, in particular the LEON multiprocessor; LEON is a SPARC V8 compatible processor core freely available in the form of synthesizable VHDL. RtKer has a customizable scheduler framework under which the scheduler is not a part of the kernel but a part of the application. The kernel interacts with the application specific scheduler through a fixed Scheduler API. This customizability is particularly suited to the design of embedded systems, as it gives an embedded system designer the ability to have an optimal scheduling policy customized to the targeted application. The next part of the project aims at extending RtKer for multiprocessor embedded applications. We plan to exploit the pluggability of the scheduler. We will explore the possibility of a semi-automated framework wherein the designer specifies the constraints of the application and the resources available, along with other information about partitioning, etc. The framework should be able to analyze the information submitted to it and generate a pluggable scheduler for RtKer that satisfies the application constraints. This Scheduler Compiler will not generate the scheduler from scratch; rather, it will choose from a subset of the scheduling policies discussed in the book Real-Time Systems by Jane W. S. Liu.
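A very small sketch of what "choosing a policy from the constraints" could look like is given below. It is an assumption-laden illustration, not the Scheduler Compiler: it considers only independent, preemptable periodic tasks on a single processor with deadlines equal to periods, and it picks between rate-monotonic and earliest-deadline-first scheduling using the classical utilization tests. The function name and the returned policy names are hypothetical.

```python
def choose_policy(tasks):
    """Pick a scheduling policy for periodic tasks on one processor.

    tasks: list of (execution_time, period) pairs, deadline assumed equal to period.
    Returns a policy name, or None if the task set is not schedulable.
    """
    if not tasks:
        raise ValueError("need at least one task")
    n = len(tasks)
    utilization = sum(c / p for c, p in tasks)
    # Sufficient (not necessary) utilization bound for rate-monotonic scheduling.
    rm_bound = n * (2 ** (1 / n) - 1)
    if utilization <= rm_bound:
        return "rate-monotonic"
    # EDF schedules such a task set exactly when utilization does not exceed 1.
    if utilization <= 1.0:
        return "earliest-deadline-first"
    return None
```

In a multiprocessor setting a test of this kind would be applied per processor, after the tasks have been partitioned.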
Hardware Accelerator for Ray-Tracing
Ray-tracing is a rendering technique that calculates an image of a scene by simulating the way rays of light travel in the real world. However, it does its job backwards, following rays from the camera into the scene rather than from the light sources. PCTrace is a raytracer that creates photo-realistic images using this technique. It reads a text file containing information about the objects and lighting in the scene and generates an image of that scene from the view point of the camera, which is also specified in the text file. The goal of the project is to design hardware that accelerates the process of generating an image from a given scene by pipelining and parallelizing the computation intensive code within PCTrace. This involves splitting PCTrace into two parts: one part, which is computation intensive, is embedded into the hardware, while the other part runs on top of this dedicated hardware accelerator. At the end of the project, we will have a hardware accelerator for ray tracing with PCTrace installed on it. Given the scene as text input, this product will generate an image of the input scene in comparatively less time.
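The kind of computation intensive kernel that is a natural candidate for such an accelerator is the ray-object intersection test, executed for every ray and every object. The sketch below shows the standard ray-sphere test; it is not taken from PCTrace, and the function name is hypothetical.

```python
import math

def ray_sphere_hit(origin, direction, center, radius):
    """Distance t >= 0 along the ray origin + t * direction to the nearest
    intersection with the sphere, or None if the ray misses it."""
    oc = [o - c for o, c in zip(origin, center)]
    # Substituting the ray into |p - center|^2 = radius^2 gives a quadratic in t.
    a = sum(d * d for d in direction)
    b = 2.0 * sum(d * v for d, v in zip(direction, oc))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None  # no real roots: the ray misses the sphere
    root = math.sqrt(disc)
    for t in ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)):
        if t >= 0.0:
            return t  # nearest intersection in front of the ray origin
    return None
```

The test consists of a fixed sequence of multiplications, additions and one square root with no data-dependent memory access, so it pipelines well and can be replicated to process several rays in parallel.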
The goal of this project is to analyse a SystemC description with 'wait()' statements inside arbitrarily complex control structures, and to derive the FSM that is implied. The idea here is to look into the control structures used in the code written in SystemC. We assume that wait() statements define the boundaries of the states, so the number of states in the FSM will be equal to the number of wait() statements. The dataflow executables between the wait() statements should be extracted using a compiler toolkit (SUIF), and the respective states should show the exact control flow that the designer had implemented.
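The state-per-wait() assumption can be illustrated on the simplest possible case: a thread whose body is a straight-line sequence of statements repeated forever. The sketch below deliberately ignores the control structures that make the real problem hard, does not use SUIF, and models statements as plain strings; the function name is hypothetical.

```python
def states_from_body(body):
    """Split the body of an endlessly repeated thread into FSM states.

    body: list of statement strings, some of which are exactly "wait()".
    State k holds the statements executed after the k-th wait() and before
    the next one, wrapping around the end of the body.
    """
    waits = [i for i, stmt in enumerate(body) if stmt == "wait()"]
    if not waits:
        raise ValueError("a thread without wait() defines no states")
    n = len(body)
    states = []
    for k, w in enumerate(waits):
        next_wait = waits[(k + 1) % len(waits)]
        actions = []
        i = (w + 1) % n
        while i != next_wait:
            actions.append(body[i])
            i = (i + 1) % n
        states.append(actions)
    return states
```

For the body `a = in.read(); wait(); b = a + 1; wait(); out.write(b)` this yields two states, the second of which wraps around the loop and contains both `out.write(b)` and `a = in.read()`.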
Customization of Cache Memory for Embedded Systems
Exploration of cache memory has always been a very important aspect of system design because the cache compensates for the ever increasing gap between CPU and memory speed. Unlike in general purpose systems, where cache design targets average behaviour, i.e. the configuration that gives good performance across a variety of applications, the cache in an embedded system can be configured optimally for its application, thereby saving and effectively utilising the costly on-chip area. The goal of this project, hence, is to automate the process of customizing the cache memory for any given application. This involves analysing the given application using a compiler toolkit (SUIF2), determining the optimal cache configuration, and validating the result with the help of a cache simulator (Dinero IV or sim-cache).
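The validation step can be pictured with a toy trace-driven simulator. The sketch below models only a direct-mapped cache and counts misses for an address trace; it stands in for what a real simulator such as Dinero IV does in far more detail, and the function names and the `(cache_bytes, line_bytes)` configuration tuple are hypothetical.

```python
def miss_count(trace, cache_bytes, line_bytes):
    """Misses of a direct-mapped cache on a list of byte addresses."""
    n_lines = cache_bytes // line_bytes
    resident = [None] * n_lines  # memory block currently held by each cache line
    misses = 0
    for addr in trace:
        block = addr // line_bytes
        index = block % n_lines
        if resident[index] != block:
            resident[index] = block
            misses += 1
    return misses

def best_config(trace, configs):
    """Configuration (cache_bytes, line_bytes) with the fewest misses;
    ties go to the configuration listed first."""
    return min(configs, key=lambda cfg: miss_count(trace, *cfg))
```

A real exploration would also weigh area and energy per access against the miss count, since the smallest configuration that meets the performance target is usually the one wanted on chip.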
Extending the LEON Multiprocessor (LEON-MP) with Local Memory
LEON is a SPARC V8 compatible processor core. In a previous project it was extended from a uniprocessor system to an on-chip multiprocessor system. The main idea behind implementing Local Memory is to reduce the energy consumption of the memory unit, reduce the power and increase the performance. The complete address space will now be divided between cacheable memory and non-cacheable memory (Local Memory). In cache memory the mapping is done at run time, whereas in a Local Memory system it is done by the user. Local Memory will store certain critical data so that the access time for this data is reduced and we do not have to search for it in the cache or in main memory.
Process Network Implementation of MPEG4
The project is part of a bigger framework whose aim is to identify the modelling issues involved in creating a parallel model using Process Networks, which can then be used to expand the application set and to estimate the performance of other application models. A standard MPEG4 decoder implementation has been chosen to be remodelled with increased parallelism and hence increased performance. A detailed study of the decoder is required to deal with the overheads that parallel models impose: context switching, communication and synchronization. Just as in MPEG2, certain independent tasks can be identified, such as variable length decoding, motion estimation, compensation and IDCT. But there is much more to MPEG4, with its concept of interacting visual objects embedded in a scene.
Customization of Xtensa Microprocessors
The project studies the Xtensa Microprocessor Emulation Kit (XT2000) by Tensilica. The kit makes it possible to evaluate various processor configuration options and to begin software development and debugging early in the SOC design cycle. It is to be used to specify the configuration, including the instruction set, for the MPEG4 encoder, for which the decoder has already been made.
Project Homepage
The project aims at designing a MIPS emulator with MIPS running on an FPGA. The emulator will provide a user interface similar to SPIM. It involves:
Project Homepage
Case Study on ADM-XRC-II Board
The aims are to explore the features and capabilities of the ADM-XRC-II FPGA board, to compare the performance of applications running on LEON with and without co-processor(s), and to port some algorithms (applications) so that the computational part runs on a separate hardware unit and the software/interactive part runs on a PC. We are currently working on implementing FFT hardware on the FPGA.
Project Homepage
|
| http://embedded.cse.iitd.ernet.in |
© 1998-2009 Department of Computer Science & Engineering, IIT Delhi
|