
This step-by-step walkthrough demonstrates how to use C++ AMP to accelerate the execution of matrix multiplication. Two algorithms are presented, one without tiling and one with tiling.


Prerequisites

Before you start:

  • Read C++ AMP Overview.

  • Read Using Tiles.

  • Make sure that you are running at least Windows 7, or Windows Server 2008 R2.

To create the project

Instructions for creating a new project vary depending on which version of Visual Studio you have installed.

To create the project in Visual Studio 2019

  1. On the menu bar, choose File > New > Project to open the Create a New Project dialog box.

  2. At the top of the dialog, set Language to C++, set Platform to Windows, and set Project type to Console.

  3. From the filtered list of project types, choose Empty Project then choose Next. In the next page, enter MatrixMultiply in the Name box to specify a name for the project, and specify the project location if desired.

  4. Choose the Create button to create the project.

  5. In Solution Explorer, open the shortcut menu for Source Files, and then choose Add > New Item.

  6. In the Add New Item dialog box, select C++ File (.cpp), enter MatrixMultiply.cpp in the Name box, and then choose the Add button.

To create a project in Visual Studio 2017 or 2015


  1. On the menu bar in Visual Studio, choose File > New > Project.

  2. Under Installed in the templates pane, select Visual C++.

  3. Select Empty Project, enter MatrixMultiply in the Name box, and then choose the OK button.

  4. In Solution Explorer, open the shortcut menu for Source Files, and then choose Add > New Item.

  5. In the Add New Item dialog box, select C++ File (.cpp), enter MatrixMultiply.cpp in the Name box, and then choose the Add button.

Multiplication without tiling

In this section, consider the multiplication of two matrices, A and B.

A is a 3-by-2 matrix and B is a 2-by-3 matrix, so the product of A and B is a 3-by-3 matrix. Each element of the product is calculated by multiplying a row of A by a column of B element by element and adding the results.
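In index notation, the rule above can be written as follows. The subscripts i, j, and k are introduced here for illustration and do not appear in the original walkthrough.

```latex
(AB)_{ij} = \sum_{k=1}^{2} A_{ik} B_{kj}, \qquad i, j \in \{1, 2, 3\}
```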

To multiply without using C++ AMP

  1. Open MatrixMultiply.cpp and use the following code to replace the existing code.

    The algorithm is a straightforward implementation of the definition of matrix multiplication. It does not use any parallel or threaded algorithms to reduce the computation time.

  2. On the menu bar, choose File > Save All.

  3. Press F5 to start debugging and verify that the output is correct.

  4. Press Enter to exit the application.
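The listing for step 1 is not reproduced on this page. A minimal sketch of the serial algorithm might look like the following. The sample values for A and B, the function name MultiplyWithoutAMP, and the PrintMatrix helper are assumptions made for illustration.

```cpp
#include <array>
#include <iostream>

// Hypothetical names and sample values; the original listing is not shown.
using Matrix3x3 = std::array<std::array<int, 3>, 3>;

Matrix3x3 MultiplyWithoutAMP() {
    // Assumed sample data: A is 3-by-2, B is 2-by-3.
    const int aMatrix[3][2] = {{1, 4}, {2, 5}, {3, 6}};
    const int bMatrix[2][3] = {{7, 8, 9}, {10, 11, 12}};
    Matrix3x3 product{};  // value-initialized to all zeros

    for (int row = 0; row < 3; row++) {
        for (int col = 0; col < 3; col++) {
            // Multiply row `row` of A by column `col` of B and add the terms.
            for (int inner = 0; inner < 2; inner++) {
                product[row][col] += aMatrix[row][inner] * bMatrix[inner][col];
            }
        }
    }
    return product;
}

// Prints the matrix, one row per line.
void PrintMatrix(const Matrix3x3& m) {
    for (const auto& row : m) {
        for (int value : row) {
            std::cout << value << "  ";
        }
        std::cout << "\n";
    }
}
```

Calling PrintMatrix(MultiplyWithoutAMP()) from main displays the product. For these sample values the first row of the product is 47, 52, 57.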


To multiply by using C++ AMP

  1. In MatrixMultiply.cpp, add the following code before the main method.

    The AMP code resembles the non-AMP code. The call to parallel_for_each starts one thread for each element in product.extent, and replaces the for loops for row and column. The row and column of the cell that a thread computes are available in idx. You can access the elements of an array_view object by using either the [] operator and an index variable, or the () operator and the row and column variables. The example demonstrates both methods. The array_view::synchronize method copies the values of the product variable back to the productMatrix variable.

  2. Add the following include and using statements at the top of MatrixMultiply.cpp.

  3. Modify the main method to call the MultiplyWithAMP method.

  4. Press Ctrl+F5 to run the application without debugging and verify that the output is correct.

  5. Press Enter to exit the application.
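The listings for steps 1 and 2 are not reproduced on this page. The sketch below shows how MultiplyWithAMP might be written, together with the amp.h include it needs. The sample values are assumptions. The preprocessor guard and the serial fallback are additions made for this sketch so that it still builds with compilers that lack C++ AMP; they are not part of the walkthrough.

```cpp
// C++ AMP ships only with MSVC and is deprecated as of Visual Studio 2022,
// so the AMP path is guarded and the deprecation warning is silenced.
#if defined(_MSC_VER) && !defined(__clang__)
#define _SILENCE_AMP_DEPRECATION_WARNINGS
#include <amp.h>
#define HAVE_CPP_AMP 1
#endif
#include <vector>

std::vector<int> MultiplyWithAMP() {
    // Assumed sample data in row-major order: A is 3-by-2, B is 2-by-3.
    int aMatrix[] = {1, 4, 2, 5, 3, 6};
    int bMatrix[] = {7, 8, 9, 10, 11, 12};
    std::vector<int> productMatrix(9, 0);

#ifdef HAVE_CPP_AMP
    using namespace concurrency;
    array_view<int, 2> a(3, 2, aMatrix);
    array_view<int, 2> b(2, 3, bMatrix);
    array_view<int, 2> product(3, 3, productMatrix.data());

    // One thread per element of the product.
    parallel_for_each(product.extent,
        [=](index<2> idx) restrict(amp) {
            int row = idx[0];
            int col = idx[1];
            for (int inner = 0; inner < 2; inner++) {
                // [] with an index object, () with row and column values.
                product[idx] += a(row, inner) * b(inner, col);
            }
        });

    // Copy the results back to productMatrix.
    product.synchronize();
#else
    // Serial fallback (an addition for this sketch) with the same indexing.
    for (int row = 0; row < 3; row++) {
        for (int col = 0; col < 3; col++) {
            for (int inner = 0; inner < 2; inner++) {
                productMatrix[row * 3 + col] +=
                    aMatrix[row * 2 + inner] * bMatrix[inner * 3 + col];
            }
        }
    }
#endif
    return productMatrix;
}
```

The function returns the product in row-major order, so main can print it three values per row.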

Multiplication with tiling

Tiling is a technique in which you partition data into equal-sized subsets, which are known as tiles. Three things change when you use tiling.

  • You can create tile_static variables. Access to data in tile_static space can be many times faster than access to data in the global space. An instance of a tile_static variable is created for each tile, and all threads in the tile have access to the variable. The primary benefit of tiling is the performance gain due to tile_static access.

  • You can call the tile_barrier::wait method to stop all of the threads in one tile at a specified line of code. The order in which the threads run is not guaranteed; the only guarantee is that every thread in a tile reaches the call to tile_barrier::wait before any of them continues execution.

  • You have access to the index of the thread relative to the entire array_view object and the index relative to the tile. By using the local index, you can make your code easier to read and debug.
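The relationship between the global index and the tile-relative index can be sketched in plain C++. TiledIndex2D and MakeTiledIndex are hypothetical helpers written for this illustration; they mirror the global, tile, and local members of tiled_index for a square tile but are not part of C++ AMP.

```cpp
// Hypothetical helper type; not part of C++ AMP.
struct TiledIndex2D {
    int global[2];  // position within the whole array_view
    int tile[2];    // which tile the thread belongs to
    int local[2];   // position within that tile
};

// Splits a global (row, col) position into tile and local coordinates,
// assuming square tiles of tileSize-by-tileSize elements.
TiledIndex2D MakeTiledIndex(int row, int col, int tileSize) {
    TiledIndex2D t{};
    t.global[0] = row;
    t.global[1] = col;
    t.tile[0] = row / tileSize;
    t.tile[1] = col / tileSize;
    t.local[0] = row % tileSize;
    t.local[1] = col % tileSize;
    return t;
}
```

For example, with a tile size of 2, the thread at global index (3, 2) belongs to tile (1, 1) and has local index (1, 0).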

To take advantage of tiling in matrix multiplication, the algorithm must partition the matrix into tiles and then copy the tile data into tile_static variables for faster access. In this example, the matrix is partitioned into submatrices of equal size. The product is found by multiplying the submatrices. In this example, A and B are both 4x4 matrices.

Each matrix is partitioned into four 2x2 submatrices: a, b, c, and d for A, and e, f, g, and h for B.

The product of A and B can now be written and calculated as follows:
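The block form is not reproduced on this page. Assuming the submatrices are labeled a through d for A and e through h for B in row-major order (an assumption that is consistent with the ae + bg term used in the check that follows), it would read:

```latex
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \quad
B = \begin{bmatrix} e & f \\ g & h \end{bmatrix}, \quad
AB = \begin{bmatrix} ae + bg & af + bh \\ ce + dg & cf + dh \end{bmatrix}
```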

Because matrices a through h are 2x2 matrices, all of the products and sums of them are also 2x2 matrices. It also follows that the product of A and B is a 4x4 matrix, as expected. To quickly check the algorithm, calculate the value of the element in the first row, first column of the product. In the example, that is the element in the first row and first column of ae + bg, so you only have to calculate the first-row, first-column element of ae and of bg. That value for ae is (1 * 1) + (2 * 5) = 11. The value for bg is (3 * 1) + (4 * 5) = 23. The final value is 11 + 23 = 34, which is correct.

To implement this algorithm, the code:

  • Uses a tiled_extent object instead of an extent object in the parallel_for_each call.

  • Uses a tiled_index object instead of an index object in the parallel_for_each call.

  • Creates tile_static variables to hold the submatrices.

  • Uses the tile_barrier::wait method to stop the threads for the calculation of the products of the submatrices.

To multiply by using AMP and tiling

  1. In MatrixMultiply.cpp, add the following code before the main method.

    This example is significantly different from the example without tiling. The code uses these conceptual steps:

    1. Copy the elements of tile[0,0] of a into locA. Copy the elements of tile[0,0] of b into locB. Notice that product is tiled, not a and b. Therefore, you use global indices to access a, b, and product. The call to tile_barrier::wait is essential. It stops all of the threads in the tile until both locA and locB are filled.

    2. Multiply locA and locB and put the results in product.

    3. Copy the elements of tile[0,1] of a into locA. Copy the elements of tile[1,0] of b into locB.

    4. Multiply locA and locB and add them to the results that are already in product.

    5. The multiplication of tile[0,0] is complete.

    6. Repeat for the other three tiles. There is no indexing specifically for the tiles and the threads can execute in any order. As each thread executes, the tile_static variables are created for each tile appropriately and the call to tile_barrier::wait controls the program flow.

    7. As you examine the algorithm closely, notice that each submatrix is loaded into tile_static memory twice. That data transfer does take time. However, once the data is in tile_static memory, access to the data is much faster. Because calculating the products requires repeated access to the values in the submatrices, there is an overall performance gain. For each problem, experimentation is required to find the best algorithm and tile size.

    With the non-AMP and non-tile algorithms, each element of these 4x4 matrices is read four times from global memory to calculate the product. In the tile example, each element is read twice from global memory and four times from tile_static memory. That is not a significant performance gain. However, if A and B were 1024x1024 matrices and the tile size were 16, there would be a significant performance gain. In that case, each element would be copied into tile_static memory only 64 times (1024 / 16) and read from tile_static memory 1024 times.

  2. Modify the main method to call the MultiplyWithTiling method, as shown.

  3. Press Ctrl+F5 to run the application without debugging and verify that the output is correct.

  4. Press Enter to exit the application.
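The listing for step 1 is not reproduced on this page. The sketch below shows how MultiplyWithTiling might be written so that it follows the conceptual steps above. The sample values are assumptions, chosen to agree with the worked check (first element 34). The constant N, the preprocessor guard, and the CPU emulation in the fallback branch are additions made for this sketch so that it still builds with compilers that lack C++ AMP.

```cpp
// C++ AMP ships only with MSVC and is deprecated as of Visual Studio 2022.
#if defined(_MSC_VER) && !defined(__clang__)
#define _SILENCE_AMP_DEPRECATION_WARNINGS
#include <amp.h>
#define HAVE_CPP_AMP 1
#endif
#include <vector>

std::vector<int> MultiplyWithTiling() {
    static const int N = 4;   // matrix dimension (name assumed)
    static const int TS = 2;  // tile size

    // Assumed sample data in row-major order; A and B are both 4x4.
    int aMatrix[] = {1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8};
    int bMatrix[] = {1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8};
    std::vector<int> productMatrix(N * N, 0);

#ifdef HAVE_CPP_AMP
    using namespace concurrency;
    array_view<int, 2> a(N, N, aMatrix);
    array_view<int, 2> b(N, N, bMatrix);
    array_view<int, 2> product(N, N, productMatrix.data());

    // A tiled_extent and a tiled_index replace extent and index.
    parallel_for_each(product.extent.tile<TS, TS>(),
        [=](tiled_index<TS, TS> t_idx) restrict(amp) {
            int row = t_idx.local[0];
            int col = t_idx.local[1];
            int rowGlobal = t_idx.global[0];
            int colGlobal = t_idx.global[1];
            int sum = 0;

            for (int i = 0; i < N; i += TS) {
                // One copy of each submatrix per tile.
                tile_static int locA[TS][TS];
                tile_static int locB[TS][TS];
                locA[row][col] = a(rowGlobal, col + i);
                locB[row][col] = b(row + i, colGlobal);
                // Wait until locA and locB are completely filled.
                t_idx.barrier.wait();

                for (int k = 0; k < TS; k++) {
                    sum += locA[row][k] * locB[k][col];
                }
                // Wait until every thread is done before refilling.
                t_idx.barrier.wait();
            }
            product[t_idx.global] = sum;
        });
    product.synchronize();
#else
    // CPU emulation of the same steps (an addition for this sketch).
    for (int tileRow = 0; tileRow < N; tileRow += TS) {
        for (int tileCol = 0; tileCol < N; tileCol += TS) {
            for (int i = 0; i < N; i += TS) {
                int locA[TS][TS];
                int locB[TS][TS];
                // Copy one submatrix of A and one of B into local storage.
                for (int row = 0; row < TS; row++) {
                    for (int col = 0; col < TS; col++) {
                        locA[row][col] = aMatrix[(tileRow + row) * N + (col + i)];
                        locB[row][col] = bMatrix[(row + i) * N + (tileCol + col)];
                    }
                }
                // Multiply the submatrices and accumulate into the product.
                for (int row = 0; row < TS; row++) {
                    for (int col = 0; col < TS; col++) {
                        for (int k = 0; k < TS; k++) {
                            productMatrix[(tileRow + row) * N + (tileCol + col)] +=
                                locA[row][k] * locB[k][col];
                        }
                    }
                }
            }
        }
    }
#endif
    return productMatrix;
}
```

The second call to the barrier keeps a fast thread from overwriting locA and locB while slower threads in the same tile are still reading them. For these sample values the first row of the product is 34, 44, 54, 64.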

See also

C++ AMP (C++ Accelerated Massive Parallelism)
Walkthrough: Debugging a C++ AMP Application