| Sign In/My Account | View Cart |
Bitwise Optimization in Java: Bitfields, Bitboards, and Beyond
Pages: 1, 2
At this point, you can grab a bottle of vodka and a shot glass, because things are getting serious.
Chess used to be a hot research area in computer science during the height of the Cold War. Apparently, independently of one another, both Soviet and American teams came up with a new data structure called a bitboard. The American team, Slate and Atkin, seem to have been first to print with a chapter in Chess Skill in Man and Machine on Chess 4.x. The Soviet team, including Mikhail Donskoy, among others, wrote a bitboard-enabled program called Kaissa. Both programs competed successfully internationally.
Before looking at bitboards, let's first look at the standard way to represent a chess board in Java (and many other languages).
//declare an integer enum for the possible
//state of the 64 individual squares.
final int EMPTY = 0;
final int WHITE_PAWN = 1;
final int WHITE_KNIGHT = 2;
final int WHITE_BISHOP = 3;
final int WHITE_ROOK = 4;
final int WHITE_QUEEN = 5;
final int WHITE_KING = 6;
final int BLACK_PAWN = 7;
final int BLACK_KNIGHT = 8;
final int BLACK_BISHOP = 9;
final int BLACK_ROOK = 10;
final int BLACK_QUEEN = 11;
final int BLACK_KING = 12;
//And we have an array of 64 squares.
int[] board= new int[64];
The array method is extremely straightforward. In contrast, the bitboard structure is made up of twelve 64-bit bitsets; one for each type of piece. They are visualized as being stacked on top of each other.
//Declare twelve 64-bit integers for one board:
long WP, WN, WB, WR, WQ, WK, BP, BN, BB, BR, BQ, BK;

Figure 1. The bitboard data structure
Why is there no layer for empty squares? Since
EMPTY is a function of the other twelve layers,
including it would create redundant data. To calculate
EMPTY, we just join the 12 layers and negate:
long NOT_EMPTY=
WP | WN | WB | WR | WQ | WK |
BP | BN | BB | BR | BQ | BK ;
long EMPTY = ~NOT_EMPTY;
Chess-playing programs work by generating lots of legal moves and then choosing the best one. A number of calculations need to be done. The squares being attacked have to be determined, so that they can be avoided and so that check and checkmate can be determined. Each chess piece has a different way of attacking. Looking at code that affects these different attacks will demonstrate some pros and cons of the bitboard. Some of the pieces fit in our bitboard scheme elegantly, while others don't.
Let's look at the kings first. They simply attack the squares adjacent to them. Depending on where they are, that's from three to eight squares. The king can be in the center, on the edge, or in the corner of the board--all need to be considered while coding.
To generate the king's 64 attacks programmatically at
runtime, we can first start with the base case, eight
attacks, and transform it into the special cases. First,
create a mask for one of the squares in the center--say the 10th square. Figure 2 shows some longs
that will be used to create this mask.

Figure 2. Finding the king's attacks
long KING_ON_B2=
1L | 1L << 1 | 1L << 2 |
1L << 7 | 1L << 9 |
1L << 15 | 1L << 16 | 1L << 17;
We want to shift that box of ones around left and right and up and down visually speaking, but when shifting left and right watch the wrap-around effect:
SHIFTED_LEFT= KING_ON_B2 >>> 1;
Oops. We wrapped the box around (see Figure 2). In chess,
a vertical row is called a file. If we shift a layer
visually left by one, we know that the rightmost file,
file H, should always be zeros. Let's code that:
final long FILE_H=
1L | (1L<<8) | (1L<<16) | (1L<<24) |
(1L<<32) | (1L<<40) | (1L<<48) | (1L<<56);
KING_ON_A2= (KING_ON_B2 >>> 1) & ~FILE_H;
If we want to shift visually right, that's a different case:
KING_ON_B3= (KING_ON_B2 >>> 1) & ~FILE_A;
To shift up and down visually, just shift by eight:
KING_ON_B1= MASK_KING_ON_B2 << 8;
KING_ON_B3= MASK_KING_ON_B2 >>> 8;
In the real world, we can avoid having to write code to generate the king attacks by just hardcoding the 64 possibilities. Also, we are trying to avoid arrays, so let's not just stuff our attacks into a 64-element array. An alternative is a 64-way switch statement--it may not be pretty, but it works well.
Let's look at the pawns next. While there is only one king
per color, there up to eight pawns. We can find all eight
attacks just as easily as we determined a single king's
attack! Note that pawns attack diagonally. To shift
visually up and down, we shift by eight, and for left or right, we
shift by one. Shift by seven or nine to go diagonally (that's
8+1 and 8-1). Figure 3 shows how this works.
PAWN_ATTACKS=
((WP << 7) & ~RANK_A) & ((WP <<9) & ~RANK_H)

Figure 3. Pawn attacks for white
This code fragment works regardless of how many pawns are in play or where they are. Robert Hyatt, the author of Crafty, calls this a bit-parallel operation because it works on a number of pieces simultaneously. Bit-parallel expressions can be very powerful and are the key thing to look for in your own projects. If you see a lot of bit-parallel operations, you might have a good candidate for bitwise optimization.
For a counter example, let's look at how we have to do the pawns with arrays:
for (int i=0; i<56; i++)
{
if (board[i]= WHITE_PAWN)
{
if ((i+1) % 8 != 0) pawnAttacks[i+9]= true;
if ((i+1) % 8 != 1) pawnAttacks[i+7]= true;
}
}
Above, we loop through the almost entire array, but not very quickly. We could rewrite it so that we keep track of the pawns and iterate through the (up to eight) pawns rather than the squares, but we would still be looking at many more bytecodes that need to be run with the bitboard method.
Knights are similar to kings and pawns. We can determine knight attacks with a pre-computed table, just like we did with the king. Since there can be more than one knight per color, the attacks are technically bit-parallel. However, since there are rarely more than two knights per color, that isn't going to help much in a practice sense (there can be more than two if the player promotes his pawns to knights, but that isn't likely).
Bishops, rooks, and queens all slide around the board. While their potential attacks are always the same, the actual attacks depend on what pieces lie along their attack lines. To determine legal moves, you must look at the squares along the attack lines individually. This is our worst-case scenario. There are no bit-parallel operations possible. We are down to accessing individual squares as we would with the array. In addition, atomic access is more cumbersome (i.e., bug-prone) with the bitboard than it is with the array.
An advantage of bitboards is that we can use masks to answer a lot of typical chess questions. For example, if you have a bishop, you might want to know how many of your opponent's pawns are on your bishop's color. Figure 4 illustrates this "attack mask" problem.

Figure 4. How many white pawns on white squares?
Chess has fairly complex rules, and this means there are pros and cons for using the bitboard. For some pieces bitboards are faster and for some they're slower. We went over the slower pieces to show that we aren't dealing with magic pixie dust. We can imagine a game that is very similar to chess, perhaps with different pieces, where bitsets might actually backfire or just not be worth the trouble. Bitset optimization must be considered carefully.
In general, bitsets have these pros and cons:
Pros:
Cons:
To generalize the chess example, think of the horizontal
and vertical positions on the board as two independent
variables or attributes. There are 8x8=64
bits required to represent this. In addition, we
required 12 layers--one for each
type of piece. There are two ways the bitboard can
expand: 1) we can add more bits per layer, or 2)
add more layers. In the real world, we are
limited to 64 bits
per layer maximum, but let's consider a fantasy
128-bit JVM with a doublelong 128-bit
primitive. With 128 bits, we would have enough space to put
all 16 pawns, black and white, on the same layer
(8x8x2=128). That would reduce the number
of layers needed and might simplify some obscure operations,
but doing so would complicate (and also slow down) functions
that work on all pawns of a certain color. All of Java's
bit operations work on all of the bits in
a primitive. Data that requires its own
operations or functions benefits from being in its
own layer. We can do bit-parallel operations on
all of the bits in the layer faster than we can do
bit-parallel operations on some of the bits. We
might find some clever use for the extra 64 bits, but
we don't want to mix the 12 types of pieces.
If we do use more than two variables per layer,
we can still change all of the variables for all of the elements
in a layer at the same time. Consider 3D tic-tac-toe,
represented in Figure 5.
There are three possible values for each of the
three axes, or 3x3x3=27 possible values. We
need a 32-bit bitset for each player.

Figure 5. A format for 3D tic-tac-toe
We can even chain together 64-bit bitsets (Figure 6) to implement Conway's Game of Life, a simulation game played on a large grid. The game goes through generations where the death or birth of a new "cell" is determined by the number of adjacent cells. A dead cell with exactly three live neighbors is (re-)born. A live cell with zero or one neighbor dies (of loneliness). A live cell with more than three neighbors dies (of overcrowding). There are many variants where a different number of neighbors result in birth, life, or death. Figure 6 shows a life construct that will actually survive and move across the grid. To generate the next turn of the simulation we can follow this algorithm:
SUM_IS_0 up to
SUM_IS_8. We can calculate these by using a recursive
algorithm that first considers two layers, then three layers,
then four, and so on up to the eighth.To sum up layers:
//S0 is short for "sum equals zero" etc.
//L1 is short for "layer one" etc.
//Look at just the first two layers:
S0= ~(L1 | L2);
S1= L1 ^ L2;
S2= L1 & L2;
//Now look at the third layer:
S3 = L3 & S2;
S2 = (S2 & ~L3) | (S1 & L3);
S1 = (S1 & ~L3) | (S0 & L3);
S0 = S0 & ~L3;
//The fourth layer.
S4 = S3 & L4;
S3 = (S3 & ~L4) | (S2 & L4);
S2 = (S2 & ~L4) | (S1 & L4);
S1 = (S1 & ~L4) | (S0 & L4);
S0 = S0 & ~L4;
//Repeat this pattern up to the 8th layer.
The algorithm in takes 42 lines of code. It could be tuned a little more if needed, but it gives a number of advantages. It uses no conditionals--which can cause processors to slow. It's a nice simple block of code that a JIT or Hotspot Compiler is sure to compile. And, most importantly, it calculates all 64 cells in parallel.

Figure 6. A format for The Game of Life
Game-related applications of bitsets are easier to find than non-game examples. The reason is that games like chess have rich relationships between each bit. For chess, there are 12 layers of 64 bits, or 768 bits. Each of the 768 are essentially related to the rest somehow. In business apps, information is usually less tightly related.
Keeping an open mind may lead to uses of bitsets in any problem domain. However, it is up to the developer's good judgment whether a particular situation is right for bitsets. Frankly, you may never need to use these techniques. However, the methods above may be just what is needed to speed up certain Java applications. If not, you can still mystify your friends with your superior hacking. Enjoy.
Glen Pepicelli is a software guru living with his dog in upstate New York.
Return to ONJava.com.
Showing messages 1 through 5 of 5.
I didn't cover rotated bitboards for a few reasons. However, I made a mistake in not even mentioning them.
One reason was just space. This isn't a site about chess or AI and we just wanted to introduce Java programmers to these methods and show a real world example. To cover the rotated bitboards in detail would have exploded the size of the article.
Another reason was the audience the article is intended for. The OnJava readers are mostly corporate programmers, probably only a few write AI and very few are writing chess software. So the article is not meant to be a how-to on writing a chess engine. Care was taken to make sure that it was correct in regards to chess, but it is not complete. In other words, if your writing chess software this could be a good introduction but you want to keep reading.
While I didn't cover the rotated boards or other chess programming issues they are interesting topics so anyone who like this article might want to check them out.
Hello,
If anyone would like to talk about this material here is a good place to.
Especially interesting would be and real world projects or applications. Or perhaps you want to see more of something that was touched on in the article?
bishops or queens are no bit-parallel operations.
Internet is full of programming tips related to
chess and there are some very interesting things
you can do with something called rotating
bitboards.
I don't know if the writer has missed the concept
of rotated bitboards or has he left them out for
some other reason, but anyways:
firsly we need a 2-dim array, named for ex.
rank_attacks[64][256]. The array contains all
possible attack that a rook could do on a rank.
the [64] part is for each square and the [256]
part is the occupansy state of the rank. It's
easiest to make an algorithm that generates the
attacks. For example if a white rook was in e4 and
there is a black pawn on b1 and h1 -> occupansy
state for the rank would look in bit notation:
01001001. Our possible attacks would be:
rank_attacks[e4][73] (73 is the bitnotation in
decimal)=01110111 (in bit notation), a 1 in each
square a rook can attack. Now we just & the
rank_attacks with WHITEPIECES. This was easily
done and took very little time, but unfortunately
we can't do it to a file sraight away because in a
file the bits are separated, but in a rank they
are adjacent. For the file part we need a bitboard
that is rotated 90 degrees, it would look
something like this:
a1,a2,a3,a4,a5,a6,a7,a8,
b1,b2,b3,b4,b5,b6,b7,b8,
c1,c2,c3,c4,c5,c6,c7,c8,
d1,d2,d3,d4,d5,d6,d7,d8,
e1,e2,e3,e4,e5,e6,e7,e8,
f1,f2,f3,f4,f5,f6,f7,f8,
g1,g2,g3,g4,g5,g6,g7,g8,
h1,h2,h3,h4,h5,h6,h7,h8
And the board BEFORE rotation was like this:
h1,g1,f1,e1,d1,c1,b1,a1,
h2,g2,f2,e2,d2,c2,b2,a2,
h3,g3,f3,e3,d3,c3,b3,a3,
h4,g4,f4,e4,d4,c4,b4,a4,
h5,g5,f5,e5,d5,c5,b5,a5,
h6,g6,f6,e6,d6,c6,b6,a6,
h7,g7,f7,e7,d7,c7,b7,a7,
h8,g8,f8,e8,d8,c8,b6,a8
Now we have a rotated bitboard, we just need to
make a new 2-dim array, named
file_attacks[64][256] and do the same thing to it
as we did for the rank_attacks array. After we
have the array we can use it similarly to the
rotated bitboard where the filessquares are
adjacent. Bum! we got our rookattacks.
Bishop's rotated bitboard is a little bit more
abstract, since we need to rotate it 45 degrees to
get the diagonals to be adjacent bits on the
bitboard. But after we have rotated it the
attacksquare calculation goes the same way as for
files and ranks with a minor exception: we need to
know the length of the diagonal since diagonals
can be 1 to 8 squares long. I'm not fullu aware of
the steps related to 45 degree rotated bitboards
but I hope you got the idea and understood it. A
rotated bitboard 45 degrees to left would be like:
a1,
b1,a2,
c1,b2,a3,
d1,c2,b3,a4,
e1,d2,c3,b4,a5,
f1,e2,d3,c4,b5,a6,
g1,f2,e3,d4,c5,b6,a7,
h1,g2,f3,e4,d5,c6,b7,a8,
h2,g3,f4,e5,d6,c7,b8,
h3,g4,f5,e6,d7,c8,
h4,g5,f6,e7,d8,
h5,g6,f7,e8,
h6,g7,f8,
h7,g8,
h8
as you can see (hopefully) all diagonal bits are
adjacent.
This might not be entirely correct, because first
of all my native language is finnish and my
english might be poor on some cases and secondly I
have studied bitboards for few days and am not
fully aware of all that is related to them and
finally I'm a newbie in programming, I introduced
myself to c++ in last january and had done only
visualbasic before.
For more information you can try:
http://www.cis.uab.edu/info/faculty/hyatt/bitmaps.html
http://www.chessbrain.net/beowulf/
those are the locations i've gotten most of my
knowledge about bitboards.