SNNAP: Approximate computing on programmable SoCs via neural acceleration

Many applications that can take advantage of accelerators are amenable to approximate execution. Past work has shown that neural acceleration is a viable way to accelerate approximate code. In light of the growing availability of on-chip field-programmable gate arrays (FPGAs), this paper explores neural acceleration on off-the-shelf programmable SoCs. We describe the design and implementation of SNNAP, a flexible FPGA-based neural accelerator for approximate programs. SNNAP is designed to work with a compiler workflow that configures the neural network’s topology and weights instead of the programmable logic of the FPGA itself. This approach enables effective use of neural acceleration in commercially available devices and accelerates different applications without costly FPGA reconfigurations.

No hardware expertise is required to accelerate software with SNNAP, so the effort required can be substantially lower than custom hardware design for an FPGA fabric and possibly even lower than current “C-to-gates” high-level synthesis (HLS) tools. Our measurements on a Xilinx Zynq FPGA show that SNNAP yields a geometric mean of 3.8× speedup (as high as 38.1×) and 2.8× energy savings (as high as 28×) with less than 10% quality loss across all applications but one. We also compare SNNAP with designs generated by commercial HLS tools and show that SNNAP has similar performance overall, with better resource-normalized throughput on 4 out of 7 benchmarks.
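To make the idea of configuring a network's topology and weights (rather than the hardware itself) concrete, here is a minimal software sketch. It is not SNNAP's interface: the `mlp_forward` function, the sigmoid activation, the 2-2-1 topology, and every weight value are assumptions chosen only for illustration.

```python
import math


def sigmoid(x):
    """Logistic activation (an assumed choice; the abstract does not name one)."""
    return 1.0 / (1.0 + math.exp(-x))


def mlp_forward(layers, inputs):
    """Evaluate a multilayer perceptron.

    layers: list of (weights, biases) pairs, one per layer, where
            weights[j][i] connects input i to neuron j of that layer.
    """
    activations = list(inputs)
    for weights, biases in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations


# Hypothetical configuration: a 2-2-1 network with made-up weights.
# Swapping this data retargets the evaluator to another function
# without touching the evaluator code.
CONFIG = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0]),  # hidden layer: 2 neurons
    ([[1.0, -1.0]], [0.0]),                    # output layer: 1 neuron
]


def approx_kernel(x, y):
    """Stand-in for an approximable code region replaced by a network call."""
    return mlp_forward(CONFIG, [x, y])[0]
```

The point of the sketch is the separation of concerns: the evaluator stays fixed while only `CONFIG` changes per application, which mirrors the abstract's claim that different applications can be accelerated without reconfiguring the FPGA.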

Low power and high performance MOSFET

This work analyzes leakage current and delay for a double-gate MOSFET versus a single-gate MOSFET at 45 nm in CMOS technology, using the Cadence Virtuoso simulation tool. Compared to the single-gate MOSFET, leakage current and delay are observed to be reduced in the double-gate MOSFET.

Based on Vgs, the drive current remains the same for both single- and double-gate MOSFETs, but the short-channel characteristics of the double-gate MOSFET improve. The double-gate MOSFET is therefore recommended for low-power, high-performance applications. Compared to a bulk-Si single-gate device, an evaluation of the total power consumption of an inverter, static and dynamic circuits, and a latch built with double-gate devices demonstrates that leakage current and delay are reduced by a factor of over 10×.

RF performance of InGaAs-based T-gate junctionless field-effect transistors which applicable for high frequency network systems

T-gate InGaAs-based JLFETs with high-frequency RF characteristics are demonstrated using a TCAD tool. A T-gate structure is applied to improve RF performance: it decreases the gate resistance (RG) and achieves a higher maximum oscillation frequency (fmax) than a planar-type structure.

However, the increase in parasitic gate capacitance degrades the current-gain cut-off frequency (fT), and this trade-off between parasitic components and the optimal device structure is discussed.
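The trade-off can be read off the usual first-order small-signal approximations for a FET. These are textbook expressions, not formulas taken from this abstract, and the symbols $g_m$ (transconductance), $C_{gs}$ and $C_{gd}$ (gate-source and gate-drain capacitances), and $g_{ds}$ (output conductance) are introduced here as assumptions; source and channel resistances are neglected.

```latex
f_T \approx \frac{g_m}{2\pi\,(C_{gs} + C_{gd})},
\qquad
f_{max} \approx \frac{f_T}{2\sqrt{R_G\,\bigl(g_{ds} + 2\pi f_T\, C_{gd}\bigr)}}
```

Under these approximations, a lower $R_G$ raises $f_{max}$ directly, while any added parasitic gate capacitance enters the denominator of $f_T$ and lowers it.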

Functional Constraint Extraction From Register Transfer Level for ATPG

The use of scan test patterns, generated at the gate level with automatic test pattern generation (ATPG) tools in design simulation, was proposed in our previous work to improve verification quality. A drawback of this method is the potential presence of illegal (or unreachable) states (ISEs) causing unwanted behavior and false error detection in the verification process. In this brief, we present a new automated tool that helps overcome this problem. The tool extracts functional constraints at the register transfer level on a VHDL description (it can be easily adapted to any other hardware description language).

The extracted constraints are used in the ATPG process to generate pseudofunctional scan test patterns that avoid the ISEs. The whole verification environment incorporating the proposed tool is presented. Experimental results show the tool's impact on the reduction of false error detection in verification. In addition, they show the verification quality improvements obtained with the proposed environment in terms of coverage, time, and complexity.

Dynamic power and performance back-annotation for fast and accurate functional hardware simulation

Virtual platform prototypes are widely used for early design space exploration at the system level. There is, however, a lack of accurate and fast power and performance models of hardware components at such high levels of abstraction. In this paper, we present an approach that extends fast functional hardware models with the ability to produce detailed, cycle-level timing and power estimates. Our approach is based on back-annotating behavioral hardware descriptions with a dynamic power and performance model that allows capturing cycle-accurate and data-dependent activity without a significant loss in simulation speed. By integrating with existing high-level synthesis (HLS) flows, back-annotation is fully automated for custom hardware synthesized by HLS.

We further leverage state-of-the-art machine learning techniques to synthesize abstract power models, where we introduce a structural decomposition technique to reduce model complexities and increase estimation accuracy. We have applied our back-annotation approach to several industrial-strength design examples under various architecture configurations. Results show that our models predict average power consumption to within 1% and cycle-by-cycle power dissipation to within 10% of a commercial gate-level power estimation tool, all while running several orders of magnitude faster.

Round-robin based load balancing in Software Defined Networking

These days, networks have to handle large amounts of traffic and serve thousands of clients. It is very difficult for a single server to handle such a huge load. The solution is to use multiple servers with a load balancer acting as a front end. Clients send their requests to the load balancer, which forwards them to different servers according to the load-balancing strategy. Traditional load balancers use dedicated hardware that is expensive and inflexible, and currently available load balancers offer only a few algorithms.

Network administrators cannot create their own algorithms, since traditional load balancers are vendor-locked and non-programmable. SDN load balancers, on the other hand, are programmable and allow administrators to design and implement their own load-balancing strategy. Another advantage of an SDN load balancer is that it needs no dedicated hardware: a dumb silicon device can be converted into a powerful load balancer by using an SDN controller. In this paper, we implement a round-robin load-balancing strategy and compare it with the already implemented random strategy, using an OpenFlow switch connected to a POX controller.
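The two selection strategies being compared can be sketched independently of any controller. The following is a minimal illustration in plain Python, not the paper's POX module; the class names, the `pick` method, and the server addresses are all hypothetical.

```python
import itertools
import random


class RoundRobinBalancer:
    """Hands out servers in a fixed rotating order."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._cycle = itertools.cycle(list(servers))

    def pick(self):
        # Each call returns the next server, wrapping around at the end.
        return next(self._cycle)


class RandomBalancer:
    """Picks a server uniformly at random for every request."""

    def __init__(self, servers, seed=None):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = list(servers)
        self._rng = random.Random(seed)

    def pick(self):
        return self._rng.choice(self._servers)


# Hypothetical back-end pool used only for this example.
SERVERS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
```

In a controller-based deployment, a `pick` call of this kind would sit in the handler that decides which server a new client flow is forwarded to. Round-robin spreads any run of consecutive requests evenly across the pool, whereas random selection is only even on average.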

Effective Key Management in Dynamic Wireless Sensor Networks

Recently, wireless sensor networks (WSNs) have been deployed for a wide variety of applications, including military sensing and tracking, patient status monitoring, and traffic flow monitoring, where sensory devices often move between different locations. Securing data and communications requires suitable encryption key protocols. In this paper, we propose a certificateless-effective key management (CL-EKM) protocol for secure communication in dynamic WSNs characterized by node mobility.

CL-EKM supports efficient key updates when a node leaves or joins a cluster and ensures forward and backward key secrecy. The protocol also supports efficient key revocation for compromised nodes and minimizes the impact of a node compromise on the security of other communication links. A security analysis of our scheme shows that our protocol is effective in defending against various attacks. We implement CL-EKM in Contiki OS and simulate it using the Cooja simulator to assess its time, energy, communication, and memory performance.

Design of full adder and subtractor based on MZI — SOA

A systematic model for an all-optical full adder and full subtractor is proposed, based on the principle of the Mach-Zehnder interferometer combined with a semiconductor optical amplifier (MZI-SOA). The MZI plays a role in ultrafast all-optical signal processing; here, the nonlinear properties of the SOA are used to design the full adder and the full subtractor. In this model, both circuits can be realized by properly selecting the output terminals of the MZI-SOA component.

The design is implemented in OptiSystem, a powerful software package for analyzing optical components. The proposed model shows the design performance of the full adder and full subtractor in the optical domain and appears to be a candidate for future wireless technology.
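For reference, the logic functions that any full adder and full subtractor must realize, whatever the physical implementation, can be written out as follows. This sketch says nothing about the MZI-SOA design itself; the function names and the use of 0/1 integers for bits are assumptions made for illustration.

```python
def full_adder(a, b, carry_in):
    """Return (sum, carry_out) for three input bits given as 0 or 1."""
    total = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return total, carry_out


def full_subtractor(a, b, borrow_in):
    """Return (difference, borrow_out) for a - b - borrow_in on 0/1 bits."""
    difference = a ^ b ^ borrow_in
    # A borrow is needed when a < b, or when a == b and a borrow comes in.
    borrow_out = ((1 - a) & b) | ((1 - (a ^ b)) & borrow_in)
    return difference, borrow_out


def truth_table():
    """List every input combination with both circuits' outputs."""
    rows = []
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                rows.append(((a, b, c), full_adder(a, b, c), full_subtractor(a, b, c)))
    return rows
```

The sum and difference outputs share the same XOR expression and only the carry and borrow terms differ, which is consistent with the abstract's remark that both circuits can be obtained from one structure by choosing different output terminals.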

Centralized ARP proxy server over SDN controller to cut down ARP broadcast in large-scale data center networks

Today’s cloud services are driving the wide-spread deployment of multi-tenant large-scale data centers. These data centers must have agility in order to provide diverse services to users in an efficient way, via dynamic allocation of the virtual machines (VMs) to the servers. However, as the complexity and the size of the data centers have increased, the tremendous address resolution traffic among the massive numbers of VMs has become a significant problem. Some approaches have tried to reduce the ARP broadcast traffic via distributed cache on the switches, or location specific addresses, but this has resulted in unavoidable challenging issues, such as the inconsistency problem between caches and/or address re-allocation to the VM and network reconfiguration whenever the VM migrates.

In this paper, we propose a new centralized ARP proxy model that utilizes the Software Defined Networking (SDN) controller architecture, in which we can leverage the SDN’s centralized control characteristics. In this approach, the SDN controller performs the ARP proxy function and can significantly reduce the number of ARP broadcast messages over the network. We prototyped the centralized ARP proxy module on an open-source SDN controller and performed experiments on a Mininet-based virtual testbed to evaluate our approach. The experiments show that our approach processes address resolution efficiently while reducing ARP broadcast traffic by factors ranging from dozens to hundreds.
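The core of a centralized ARP proxy can be sketched as a single IP-to-MAC table held at the controller. The code below is an illustrative model only, not the paper's module or any controller's API; the `ArpProxy` class, its method names, and the flood fallback for unknown targets are assumptions.

```python
class ArpProxy:
    """Toy model of a controller-side ARP proxy with one central table."""

    def __init__(self):
        self.table = {}      # maps IP address -> MAC address
        self.broadcasts = 0  # counts requests that still had to be flooded

    def handle_request(self, sender_ip, sender_mac, target_ip):
        """Process an ARP request that a switch forwarded to the controller."""
        # Learn the sender's mapping from every request seen.
        self.table[sender_ip] = sender_mac
        target_mac = self.table.get(target_ip)
        if target_mac is not None:
            # Known target: answer directly, so no broadcast is needed.
            return ("reply", target_mac)
        # Unknown target: fall back to flooding (an assumed policy).
        self.broadcasts += 1
        return ("flood", None)

    def handle_reply(self, sender_ip, sender_mac):
        """Learn a mapping from an ARP reply passing through the controller."""
        self.table[sender_ip] = sender_mac
```

A short usage example: the first request for an unknown address is flooded once, and later requests for it are answered from the table.

```python
proxy = ArpProxy()
proxy.handle_request("10.0.0.1", "aa:aa:aa:aa:aa:01", "10.0.0.2")  # flooded
proxy.handle_reply("10.0.0.2", "aa:aa:aa:aa:aa:02")
proxy.handle_request("10.0.0.3", "aa:aa:aa:aa:aa:03", "10.0.0.2")  # proxied reply
```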