Computer Science, asked by Nisha6246, 1 year ago

Control flow of execution with the help of a program for matrix multiplication

Answers

Answered by kgurjeet603pduuxd

Hardware techniques have been proposed to increase the parallelism obtained from these transistors. Many hardware approaches already exist for parallelizing sequential program execution, such as superscalar execution, superpipelining, Simultaneous Multithreading (SMT) [1-2], Chip Multiprocessors (CMP) [3] (also known as multi-core processors) and, most recently, many-core processors. Multi-core and many-core processors have advantages that single-core processors do not [3-5, 16-18]. Both benefit from shorter wiring, which minimizes the delay between cores because communication does not go off-chip. In addition, their power and energy consumption increases linearly with the number of cores, whereas increasing the complexity of a single-processor design leads to a quadratic or even larger increase in power and energy consumption. Moreover, using identical processing elements in a homogeneous architecture reduces the complexity of hardware design and verification, and hence shortens the entire development cycle.

The Single Chip Cloud Computer (SCC) experimental processor [6-10] is a 48-core ‘concept vehicle’ created by Intel Labs as a platform for many-core software research. The SCC is a second-generation processor design developed by the Tera-Scale Computing Research Program. It is the microprocessor system with the highest number of cores integrated onto a single chip, and it is intended to encourage more research on many-core processors and parallel programming. The many-core research topics [11] are high-performance power-efficient fabric, fine-grain power management and message-based programming support. The SCC is a research platform that allows voltage and frequency scaling. Thus, it is possible to derive the power and energy consumption characteristics of the SCC by applying different frequency and voltage configurations to different cores.
We observe the power and energy scalability of the SCC system by executing a program on different numbers of cores. This experiment also lets us verify that the message-passing interface preserves the power and energy scalability. The rest of the paper is organized as follows. In Section 2 we discuss our matrix multiplication test program. In the following section, we present our results from varying the number of cores and from running all the cores at two different frequency and voltage levels. Finally, a conclusion is presented in Section 4.

In our experiment, we implemented our work on an SCC system with SCCKit version 1.4.0. The SCC system setup consists of two components: the SCC itself and the Management Console Personal Computer (MCPC). The MCPC runs a 64-bit Linux operating system, Ubuntu 10.04 in this case. It provides the C and C++ compilers gcc, g++, icc and icpc, as well as the Fortran compiler ifort and the Intel Math Kernel Library (MKL). The executables are built on the MCPC and run on the SCC cores.

We implemented a mixed sequential and parallel workload program that multiplies two matrices (C=A*B). We tried to imitate the behavior of the original program in [12, 13]. The program is written in C. To calculate C=A*B in parallel, the program executes as follows. First, we assign one of the cores as the root core and the other cores as computing cores. At the beginning of the program, the root core creates matrix A and matrix B. The value of each element in matrix A is calculated by summing its x-y coordinates in the matrix. The value of each element in matrix B is calculated by multiplying its x-y coordinates in the matrix. Then, the root core divides matrix A into groups of rows and distributes these fragments of matrix A to the computing cores.
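The matrix setup and row split described above can be sketched as follows. This is only an illustration in Python, not the original C program: the function names `build_matrices` and `split_rows` are hypothetical, zero-based coordinates are assumed, B is assumed to be square (150 x 150) so that the product is defined, and the even split with leftover rows going to the first cores is just one possible choice. The 47 computing cores assume one of the 48 SCC cores acts as root.

```python
def build_matrices(n_rows, n_cols):
    """Create A (n_rows x n_cols) and B (n_cols x n_cols) from their indices."""
    # A[i][j] = i + j: sum of the element's coordinates (zero-based, an assumption)
    a = [[i + j for j in range(n_cols)] for i in range(n_rows)]
    # B[i][j] = i * j: product of the element's coordinates
    b = [[i * j for j in range(n_cols)] for i in range(n_cols)]
    return a, b


def split_rows(n_rows, n_workers):
    """Return one (offset, row_count) pair per computing core, covering all rows."""
    base, extra = divmod(n_rows, n_workers)
    chunks = []
    offset = 0
    for w in range(n_workers):
        # The first `extra` cores take one additional row each.
        count = base + (1 if w < extra else 0)
        chunks.append((offset, count))
        offset += count
    return chunks


# Sizes from the experiment: 3,000 rows and 150 columns.
a, b = build_matrices(3000, 150)
chunks = split_rows(len(a), 47)  # 47 computing cores plus 1 root (assumed)
```

Each `(offset, row_count)` pair identifies the fragment of A that one computing core would receive.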
The root core also sends each computing core the offset of its fragment of matrix A, the number of rows in that fragment, and the entire matrix B to multiply it with. This centralized communication through the root core is the sequential part of the program execution. A longer program execution yields a clearer plot of the experimental data. Therefore, in this experiment, we assigned 3,000 rows to matrix A and 150 columns to both matrix A and matrix B.
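The control flow of this program can be illustrated with a small single-process simulation. This is a hedged Python sketch rather than the SCC code: real message passing between cores is replaced by ordinary function calls, the names `make_message`, `worker_multiply` and `root_multiply` are hypothetical, and the step where results are placed back into C at the root is an assumption, since the text above does not describe how results are collected.

```python
def make_message(a, b, offset, count):
    """Root side: bundle what one computing core receives."""
    return {
        "offset": offset,                     # where the fragment starts in A
        "rows": count,                        # number of rows in the fragment
        "a_part": a[offset:offset + count],   # the fragment of A
        "b": b,                               # the entire matrix B
    }


def worker_multiply(msg):
    """Computing-core side: multiply the received rows of A by B."""
    a_part, b = msg["a_part"], msg["b"]
    inner, n_cols = len(b), len(b[0])
    c_part = []
    for i in range(msg["rows"]):       # outer loop: each received row of A
        row = []
        for j in range(n_cols):        # middle loop: each column of B
            total = 0
            for k in range(inner):     # inner loop: dot product of row i and column j
                total += a_part[i][k] * b[k][j]
            row.append(total)
        c_part.append(row)
    return msg["offset"], c_part


def root_multiply(a, b, chunks):
    """Root side: hand out fragments one by one and assemble C."""
    c = [None] * len(a)
    for offset, count in chunks:       # sequential part: root prepares each message
        msg = make_message(a, b, offset, count)
        start, c_part = worker_multiply(msg)   # would run on a computing core
        c[start:start + len(c_part)] = c_part  # assumed gather step at the root
    return c


a = [[1, 2], [3, 4], [5, 6]]
b = [[1, 2], [3, 4]]
c = root_multiply(a, b, [(0, 2), (2, 1)])  # two "cores": rows 0-1 and row 2
```

The three nested loops in `worker_multiply` are the core control flow of matrix multiplication; the loop in `root_multiply` corresponds to the centralized, sequential communication step.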
