EECS 446 Low-level Kernel Development

**Embedded FPGA-CPU Communication via an External Memory Interface**

**Abstract:** Many embedded devices are designed around the concept of heterogeneous computing for both speed and power consumption reasons. This often leads to systems containing a primary compute unit (a CPU) and a secondary compute unit that acts as a co-processor, with the two connected through some variant of a system bus. The most notable example is the standard CPU-GPU model seen in laptop and desktop computers. In more specialized cases, such as test instrumentation, the model often takes the form of CPU-FPGA, where the primary compute unit is a low-power embedded processor. In the context of test instrumentation, this model allows real-time data streamed from sensors to be processed by the FPGA and then presented to the user by way of the CPU. Signal acquisition and display, as seen in bench oscilloscopes and spectrum analyzers, is one application of the CPU-FPGA model. The design and implementation of such a system, however, is non-trivial and requires custom software to properly bridge the CPU-FPGA gap. To implement an FPGA-MPU bridge design, we consider communication between a TI AM1808 ARM CPU and an Altera FPGA. The communication is done over the exposed EMIFA bus on the ARM CPU, which provides an interface for synchronous and asynchronous RAM, as well as DMA channel support. In the end, we were able to verify successful write operations to the FPGA; however, we were unsuccessful at read operations.

EECS 468 GPU Computing

**A GPU Accelerated Non-Uniform Fourier Transform**

**Abstract:** We implement the Non-Uniform Direct Fourier Transform (NDFT) on the graphics processing unit (GPU) to provide a fast, low-error method for computing the Fourier transform of a non-uniformly spaced data set. We implement three methods on the GPU: two for the Fourier transform and one for the adjoint transform. We compare our implementation to the central processing unit (CPU) implementations from the Non-Uniform Fast Fourier Transform (NFFT) Library. We show that for sparse input data, which is prevalent in image reconstruction, we achieve a performance gain on the GPU compared to an over-sampling FFT approach on the CPU.
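As a rough illustration of what the direct transform and its adjoint compute, here is a minimal serial sketch in Python. It is not the GPU implementation described above. The function names, the centered frequency range, and the sign convention are assumptions made for this example.

```python
import cmath

def _frequencies(n):
    # Assumed convention: centered integer frequencies -n//2, ..., n - n//2 - 1.
    return range(-(n // 2), n - n // 2)

def ndft(f_hat, nodes):
    """Direct NDFT: f_j = sum_k f_hat[k] * exp(-2*pi*i * k * x_j)."""
    freqs = _frequencies(len(f_hat))
    return [
        sum(c * cmath.exp(-2j * cmath.pi * k * x) for c, k in zip(f_hat, freqs))
        for x in nodes
    ]

def ndft_adjoint(f, nodes, n):
    """Adjoint NDFT: h_k = sum_j f[j] * exp(+2*pi*i * k * x_j)."""
    return [
        sum(v * cmath.exp(2j * cmath.pi * k * x) for v, x in zip(f, nodes))
        for k in _frequencies(n)
    ]

# Example: 4 Fourier coefficients evaluated at 3 non-uniformly spaced nodes.
samples = ndft([0.5, 1.0, 2.0, -1.0], [-0.37, 0.02, 0.41])
```

Each output value is an independent sum, which is the property that makes the direct transform a natural fit for one-thread-per-output parallelization.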

EECS 452 Advanced Computer Architecture

**Applications of Fixed-Point Computation for Approximate Computing**

**Abstract:** High-performance computing, often used in scientific applications, continually seeks performance improvements to speed up the enormous calculations being performed. This paper presents and discusses the potential for floating-point operations to be approximated by fixed-point computations in order to reduce the cycle latency of arithmetic operations. While this approach may increase the error of individual computations, applications that are iterative and error-tolerant can perform their calculations much faster and thus reach an acceptable answer sooner. After performing LU decomposition on several different types of matrices of similar size, we determined the effect that fixed-point operations had on both the speed of arithmetic calculations and the error of the result. As we will show below, a fixed-point architecture can even increase the accuracy of the results if the proper precision is selected.
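To make the idea of replacing floating-point arithmetic with fixed-point arithmetic concrete, the following is a minimal sketch in Python. The Q16.16 format (16 fractional bits) and all function names are assumptions chosen for illustration; the abstract does not state which precision the project used.

```python
FRAC_BITS = 16          # assumed format: 16 fractional bits (Q16.16)
ONE = 1 << FRAC_BITS    # fixed-point representation of 1.0

def to_fixed(x):
    """Convert a float to the nearest fixed-point integer."""
    return int(round(x * ONE))

def to_float(q):
    """Convert a fixed-point integer back to a float."""
    return q / ONE

def fx_mul(a, b):
    """Multiply two fixed-point values; the shift rescales the double-width product."""
    return (a * b) >> FRAC_BITS   # note: >> rounds toward negative infinity

def fx_div(a, b):
    """Divide two fixed-point values; the dividend is pre-shifted to keep precision."""
    return (a << FRAC_BITS) // b

product = to_float(fx_mul(to_fixed(1.5), to_fixed(2.25)))   # exactly representable
third = to_float(fx_div(to_fixed(1.0), to_fixed(3.0)))      # truncated, small error
```

Multiplication and division reduce to integer operations plus a shift, which is where the latency savings come from. The truncation in `fx_div` shows the per-operation error that an error-tolerant algorithm has to absorb.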

MATH 519 / 597 – Stochastic Processes / Numerical Methods for Coupled PDEs

**On Applications of Mori-Zwanzig Formalism to Multi-grid Methods for Laplace’s Equation**

**Abstract:** We consider applications of the Mori-Zwanzig formalism to Multi-grid methods for Laplace’s equation in terms of improved prediction for the prolongation and restriction steps. This is done by effectively transforming the respective steps of the Multi-grid method into a stochastic differential equation in which the error in the prolongation step is modeled as noise. This noise is then estimated by an infinitely long memory term, which leads to the derivation of a prediction that effectively becomes an error estimation term. The added complexity of the prediction is of the same order as that of the original Multi-grid algorithm, providing hope that it may prove to be a practical accelerator for existing Multi-grid methods.

ASTRO 475W – Stars and Galaxies

**Investigation of the Relaxation Timescale under Newtonian and Modified Newtonian Gravity**

**Abstract:** We investigate the relaxation behavior of a self-gravitating N-body system with a non-softened potential, and compare the result with theoretical models. As is often the case in the literature, this behavior is studied for an isothermal sphere with a varying number of particles. In addition to relaxation under Newtonian gravity, the relaxation timescale is derived for MOND under analytic approximations and averaging. We find that MOND has behavior distinct from that of classical Newtonian gravity with respect to the relaxation timescale, and attempt to model this case numerically. Due to inconsistencies between the theoretical and numerical models for classical Newtonian gravity, we did not proceed to the stage of numerically testing MOND. The numerical result obtained could be further evidence for the lack of an equipartition solution in self-gravitating systems.

ASTRO 534 – Stellar Structure and Evolution

**Fractional Ionization of Hydrogen Gas**

**Abstract:** We investigated numerical solutions to the Saha ionization equations for neutral and ionized hydrogen and helium, in an effort to calculate position-dependent gas properties of stars.
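For a sense of what solving the Saha equation involves, here is a minimal Python sketch for the simplest case only: pure hydrogen with ground-state statistical weights, where the equation reduces to a quadratic in the ionization fraction x, namely x^2 / (1 - x) = s(T, n). Helium is not covered. The function name, the SI units, and the restriction to pure hydrogen are assumptions for this example and are not taken from the project.

```python
import math

K_B = 1.380649e-23                  # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34           # Planck constant, J s
M_E = 9.1093837015e-31              # electron mass, kg
CHI_H = 13.598 * 1.602176634e-19    # hydrogen ionization energy, J

def hydrogen_ionization_fraction(temperature, number_density):
    """Ionization fraction x of pure hydrogen from the Saha equation.

    temperature in K, number_density (all hydrogen nuclei) in m^-3.
    Solves x^2 / (1 - x) = s, assuming ground-state statistical weights.
    """
    s = ((2.0 * math.pi * M_E * K_B * temperature / H_PLANCK**2) ** 1.5
         * math.exp(-CHI_H / (K_B * temperature)) / number_density)
    if s == 0.0:
        return 0.0  # exponential underflowed: gas is fully neutral
    # Positive root of x^2 + s*x - s = 0, written to avoid cancellation for large s.
    return 2.0 * s / (s + math.sqrt(s * s + 4.0 * s))

x_cool = hydrogen_ionization_fraction(5.0e3, 1.0e20)
x_hot = hydrogen_ionization_fraction(2.0e4, 1.0e20)
```

At the assumed density of 1e20 m^-3 the gas moves from almost fully neutral at 5,000 K to almost fully ionized at 20,000 K, which is the sharp transition that makes the ionization fraction strongly position dependent inside a star.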

**Steady State Solution to a 5 Solar Mass Star**

**Abstract:** We considered solving the equations of state for a 5 solar mass star using the shooting method. The solution was obtained ignoring the transient terms for the stellar structure.

**Numerical Methods for Polytropic Neutron Star models and their Behavior under General Relativity and Rotation**

**Abstract:** Numerical solutions were obtained to the Lane-Emden equations and the Tolman-Oppenheimer-Volkoff (TOV) equations for a polytropic sphere under classical gravity and general relativity, respectively. The integrations were carried out with a complex RK-45 integrator in MATLAB. The Lane-Emden equations were transformed using the Hartle perturbation theory of neutron stars to account for the behavior with rotation. The Hartle perturbations were not applied to the TOV equations due to time constraints.
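The non-rotating Lane-Emden problem, θ'' + (2/ξ)θ' + θ^n = 0 with θ(0) = 1 and θ'(0) = 0, can be sketched in a few lines. The Python example below swaps in a fixed-step classical fourth-order Runge-Kutta scheme in place of the adaptive RK-45 integrator in MATLAB mentioned above, and it does not cover the TOV equations or the Hartle rotation terms. The function name, step size, and series start-up are assumptions for illustration.

```python
import math

def lane_emden_first_zero(n, h=1e-3, xi_max=20.0):
    """Return the first zero of the Lane-Emden function for polytropic index n."""
    def rhs(xi, theta, phi):
        # phi = d(theta)/d(xi); theta is clamped at 0 so theta**n stays real.
        return phi, -2.0 * phi / xi - max(theta, 0.0) ** n

    # Start slightly off-center with the series theta ~ 1 - xi^2/6 to avoid xi = 0.
    xi = h
    theta = 1.0 - xi**2 / 6.0
    phi = -xi / 3.0
    while xi < xi_max:
        k1t, k1p = rhs(xi, theta, phi)
        k2t, k2p = rhs(xi + h / 2, theta + h / 2 * k1t, phi + h / 2 * k1p)
        k3t, k3p = rhs(xi + h / 2, theta + h / 2 * k2t, phi + h / 2 * k2p)
        k4t, k4p = rhs(xi + h, theta + h * k3t, phi + h * k3p)
        theta_new = theta + h / 6 * (k1t + 2 * k2t + 2 * k3t + k4t)
        phi_new = phi + h / 6 * (k1p + 2 * k2p + 2 * k3p + k4p)
        if theta_new <= 0.0:
            # Linear interpolation between the last two points locates the surface.
            return xi + h * theta / (theta - theta_new)
        xi, theta, phi = xi + h, theta_new, phi_new
    return None  # no zero found before xi_max

surface_n1 = lane_emden_first_zero(1.0)
```

The case n = 1 has the closed-form solution sin(ξ)/ξ, so the returned surface radius can be checked against π. The case n = 0 gives the square root of 6.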

CSE 557 – High Performance Parallel Computing

**Implementation of Parallel QuickSort**

**Abstract:** The Parallel Quick-Sort algorithm was implemented with OpenMPI using C++. The performance of the algorithm as a function of processor distribution was analytically calculated and compared to computational results on arrays of random numbers.

**Determining Hardware Constants**

**Abstract:** Hardware constants were investigated for the Intel Nehalem architecture under random tests through OpenMPI. Tests included: Access Latencies for Single Strided Access, Access Latencies for Random Access, Communication Latencies, and Communication Bandwidth and Startup Costs. The final hardware constants obtained were used to fit efficiency models from the previously obtained parallel quick sort algorithm.

**Investigation of Graph Partitioning Efficiency with METIS**

**Abstract:** Graph partitioning efficiency was tested using METIS on several sparse matrix graphs from the University of Florida.

**SMV Timing with OpenMP**

**Abstract:** Timings of the sparse matrix vector multiplication (SMV) algorithm were obtained for several sparse matrices under various processor distributions using shared-memory OpenMP. Results appear to be consistent with the previous hardware analysis project.
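For reference, the kernel being timed has the following general shape. This is a serial Python sketch that assumes the compressed sparse row (CSR) storage format, which the abstract does not specify. The function and argument names are hypothetical.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    values  : nonzero entries, row by row
    col_idx : column index of each nonzero
    row_ptr : row i owns values[row_ptr[i]:row_ptr[i + 1]]
    """
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):                      # rows are independent of each other
        for p in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[p] * x[col_idx[p]]    # indirect, irregular read of x
    return y

# A = [[4, 0, 1],
#      [0, 3, 0],
#      [2, 0, 5]]
y = csr_matvec([4.0, 1.0, 3.0, 2.0, 5.0], [0, 2, 1, 0, 2], [0, 2, 3, 5], [1.0, 2.0, 3.0])
```

Because each row writes only its own entry of `y`, the outer loop is the natural one to split across threads in a shared-memory setting. The irregular reads of `x` are what tie the timings to memory access latencies.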

Physics 527 – Computational Physics and Astrophysics

**Modeling and Optimization of a Coilgun**

**Abstract:** We consider the problem of accelerating a ferromagnetic projectile with a Coilgun (in a vacuum) that has a temperature-dependent resistive solenoid. We create a simulation that solves for the position and velocity of the projectile as functions of time, as well as the temperature and current of the solenoid. We look at different values of capacitance for the main capacitor bank, as well as different initial voltages across it, in an effort to maximize the work done by the Coilgun on the projectile. We conclude that the optimum capacitance for the given physical parameters is around 10^−2 Farads.

**Modeling of Opinion Formation**

**Abstract:** We examine three different models for opinion formation: the Voter Model, the Agreement Model, and the Sznajd Model. All three models consider individuals with a certain opinion influencing other individuals. For each, we consider the time for convergence to a solution, as well as the appearance of the final solution. We conclude that in both the Voter Model and the Sznajd Model (each examined with 2 and 4 opinions) a consensus is eventually reached.
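As an illustration of the simplest of these models, the sketch below runs a Voter Model in Python. The one-dimensional ring topology, the update rule (a random site copies a random nearest neighbor), and all names and parameters are assumptions for this example. The abstract does not describe the lattice or update scheme used in the project.

```python
import random

def evolve(opinions, rng, max_steps):
    """Run voter-model updates on a ring until consensus or max_steps."""
    opinions = list(opinions)
    n = len(opinions)
    counts = {}
    for o in opinions:
        counts[o] = counts.get(o, 0) + 1
    steps = 0
    while len(counts) > 1 and steps < max_steps:
        i = rng.randrange(n)                    # pick a random individual
        j = (i + rng.choice((-1, 1))) % n       # pick its left or right neighbor
        old, new = opinions[i], opinions[j]
        if old != new:                          # individual i adopts j's opinion
            counts[old] -= 1
            if counts[old] == 0:
                del counts[old]
            counts[new] += 1
            opinions[i] = new
        steps += 1
    return opinions, steps

def run_voter_model(n_sites=20, n_opinions=2, max_steps=1_000_000, seed=0):
    """Start from random opinions and evolve; returns (final opinions, steps taken)."""
    rng = random.Random(seed)
    start = [rng.randrange(n_opinions) for _ in range(n_sites)]
    return evolve(start, rng, max_steps)

final_opinions, steps_taken = run_voter_model(n_sites=10, n_opinions=4, max_steps=200_000)
```

The returned step count is one way to measure the time for convergence. The loop exits early once a single opinion remains, which corresponds to the consensus state.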

**Modeling and Simulation of a Thin Wire Linear Dipole Antenna**

**Abstract:** We derive a general, geometry- and coordinate-independent version of the Pocklington integrodifferential equation, and apply it to a thin wire PEC linear dipole antenna. We solve for the current distribution via the Method of Moments and compare several basis functions for use with it. We compare the resulting current distribution and antenna directivity to known results. We also look at the asymptotic stability of the Delta Gap method. We conclude that the simulation reproduces the known results, and is thus valid for the thin wire approximation.