Monday, March 28, 2011

Changing the sign of float values using SSE code

The IEEE 754 single-precision format defines the memory layout of the C++ float datatype. It consists of a 1-bit sign, an 8-bit exponent and a 23-bit fraction.
float x = [sign (1 bit) | exponent (8 bit) | fraction (23 bit)]
We can use this knowledge of the memory layout to change the sign of floating point values without any floating point arithmetic.
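
Because the sign lives in a single bit, flipping it is a plain bitwise operation. The following is a minimal sketch of that idea, not necessarily the code from the full post: the function names negate4 and abs4 are made up for illustration, and the sign mask is built from -0.0f, whose only set bit is the sign bit.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Flip the sign of all four floats in v by toggling bit 31 of each lane.
// -0.0f has only the sign bit set (0x80000000), so it serves as the mask.
// (negate4 is a hypothetical name chosen for this sketch.)
inline __m128 negate4(__m128 v)
{
    const __m128 signMask = _mm_set1_ps(-0.0f);
    return _mm_xor_ps(v, signMask);
}

// Clear the sign bit instead of toggling it, which yields the absolute value.
// _mm_andnot_ps(a, b) computes (~a) & b.
// (abs4 is a hypothetical name chosen for this sketch.)
inline __m128 abs4(__m128 v)
{
    const __m128 signMask = _mm_set1_ps(-0.0f);
    return _mm_andnot_ps(signMask, v);
}
```

Both functions process four values at once and only use bitwise instructions, so no multiplication by -1 or subtraction from zero is needed.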

Sunday, March 27, 2011

Loading two consecutive 3D vectors into SSE registers

In the previous post we showed how to load a single float3 vector into an SSE register. This required either one 64-bit and one 32-bit load, or three 32-bit loads if the data is not aligned to an 8-byte boundary. This blog entry shows how to use more efficient loads when loading two float3 values into separate SSE registers.
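
One possible way to do this is sketched below. It is an illustration under assumptions, not necessarily the approach of the full post: the float3 layout (three tightly packed floats) and the function name loadTwoFloat3 are assumed, and the sketch uses two overlapping unaligned 128-bit loads that stay inside the 24 bytes occupied by the two vectors.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Assumed layout of float3: 12 bytes, no padding.
struct float3 { float x, y, z; };

// Load two float3 values stored back to back (6 floats = 24 bytes) using two
// unaligned 128-bit loads that overlap in the middle.
//   first  = [x0, y0, z0, x1]   lane 3 is a leftover and should be ignored
//   second = [x1, y1, z1, z0]   lane 3 is a leftover and should be ignored
// (loadTwoFloat3 is a hypothetical name chosen for this sketch.)
inline void loadTwoFloat3(const float3* src, __m128& first, __m128& second)
{
    const float* p = reinterpret_cast<const float*>(src);
    first = _mm_loadu_ps(p);                    // floats 0..3: x0 y0 z0 x1
    const __m128 tail = _mm_loadu_ps(p + 2);    // floats 2..5: z0 x1 y1 z1
    // Rotate the lanes by one so that x1 ends up in lane 0.
    second = _mm_shuffle_ps(tail, tail, _MM_SHUFFLE(0, 3, 2, 1));
}
```

The fourth lane of each register holds a component of the neighbouring vector, so it has to be masked out if later code depends on it being zero.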

Saturday, March 26, 2011

Loading a 3D vector into an SSE register

In this blog entry I will show you how to load a three-element float vector into an SSE register using C++ intrinsics. This tutorial is based on the float3 datatype, which holds three floats (see Common Datatypes). An SSE register can store four float values, and there are instructions to load one, two or all four values, but not three. Therefore we need to split the load into two parts and combine them.
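
The split load described above can be sketched as follows. The float3 definition is an assumption (the Common Datatypes page is not reproduced here), and loadFloat3 is a name made up for this example.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Assumed layout of float3: three tightly packed floats.
struct float3 { float x, y, z; };

// Load x and y with one 64-bit load and z with one 32-bit load, then combine.
// Result lanes: [x, y, z, 0].
// (loadFloat3 is a hypothetical name chosen for this sketch.)
inline __m128 loadFloat3(const float3& value)
{
    // _mm_loadl_pi fills the two lower lanes and keeps the upper lanes
    // of its first argument (zero here): [x, y, 0, 0].
    const __m128 xy = _mm_loadl_pi(_mm_setzero_ps(),
                                   reinterpret_cast<const __m64*>(&value));
    // _mm_load_ss puts z into lane 0 and zeroes the other lanes: [z, 0, 0, 0].
    const __m128 z = _mm_load_ss(&value.z);
    // Move the two lower lanes of z into the two upper lanes of xy.
    return _mm_movelh_ps(xy, z);
}
```

Neither load touches memory beyond the 12 bytes of the float3, so this also works for the last element of an array.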

Friday, March 25, 2011

Matrix vector multiplication using SSE3

A common operation is the matrix-vector product. In the following example we multiply a 4-by-4 matrix and a 4-element vector. The multiplication of a 3x4 matrix with a 3D vector is covered in a separate post.

In standard C++ code, the matrix-vector product requires 16 multiplications and 12 additions:
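
As a rough sketch of such a scalar baseline, assuming a row-major float array for the matrix (the storage order and the function name multiply4x4 are assumptions made for this example):

```cpp
// Multiply a row-major 4x4 matrix by a 4-element vector in plain C++.
// Each output element takes 4 multiplications and 3 additions,
// which adds up to 16 multiplications and 12 additions in total.
// (multiply4x4 is a hypothetical name chosen for this sketch.)
inline void multiply4x4(const float matrix[16], const float vec[4], float result[4])
{
    for (int row = 0; row < 4; ++row)
    {
        const float* m = matrix + row * 4;  // start of the current row
        result[row] = m[0] * vec[0] + m[1] * vec[1]
                    + m[2] * vec[2] + m[3] * vec[3];
    }
}
```

The result array must not alias the input vector, because each row reads all four input elements.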

Fast iteration over STL vector elements

The STL class std::vector is a great container for managing dynamic arrays. Unfortunately it introduces a slight overhead when iterating through it using an index. For example, the following code is not very efficient:
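
To illustrate the kind of loop meant here, the sketch below contrasts an index-based loop with a pointer-based one. It is an assumption about what the full post compares, the function names are made up, and an optimizing compiler may well generate the same code for both variants.

```cpp
#include <cstddef>
#include <vector>

// Index-based loop: every access goes through operator[] and
// size() is evaluated in each iteration.
// (sumByIndex is a hypothetical name chosen for this sketch.)
inline float sumByIndex(const std::vector<float>& values)
{
    float sum = 0.0f;
    for (std::size_t i = 0; i < values.size(); ++i)
        sum += values[i];
    return sum;
}

// Pointer-based loop: fetch the begin and end of the contiguous storage
// once, then walk it directly.
// (sumByPointer is a hypothetical name chosen for this sketch.)
inline float sumByPointer(const std::vector<float>& values)
{
    float sum = 0.0f;
    const float* it = values.data();
    const float* const end = it + values.size();
    for (; it != end; ++it)
        sum += *it;
    return sum;
}
```

Both functions return the same result; the second one only makes explicit that the element storage of std::vector is a plain contiguous array.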

Welcome to Fast C++

Have you ever wondered how to make code in C++ efficient and fast? I have seen many programs that are supposed to run fast but contain common pitfalls that make them slower than necessary. Using just some simple tricks and basic knowledge about high performance programming, you can make your current C++ program a lot faster. So stay tuned for useful tips and tricks to make your C++ program lightning fast!