Showing posts with label Algo. Show all posts
Showing posts with label Algo. Show all posts

Saturday, February 2, 2008

Optimization

The principle of optimization:
* Cases: What the application will typically do, and if we are more concerned with average or worst case performance before optimization.
* Order: Algorithm & Design -> Local optimization -> Assembly
* Context: The true exec time for one instruction is a time measured in a certain context, and that time is meaningful only in that context. Also it is more meaningful to optimize the segment of codes instead of only one instruction.

Algorithm & Design:
* Two kinds of computing:
-- data processing (data streaming algorithm): data is processed sequentially and only once in general. The relationship of data is not the concern of algorithms. The cache seems overkill here. The digital signal processing is a typical application. How to reduce the computing cases might be an efficient way to do optimization.
-- data transaction processing (data block algorithm): The relationship of data in one block is the focus of algorithms. The data cache is very important for them. Sorting and search are among them.

Instruction Level Optimization:
* Avoid branch hazard:
-- Unroll loops: the result might be diff for W, R, W with R. In general 8~16 times unrolling depth is good.
-- Combine loops
-- Fill-in branch delay in Assembly
* Avoid data hazard:
-- Eliminate the data dependance
-- Prefetch data and instruction: use mem/cache throughput instead of latency.
* Understand each Assembly inst details and diff. Use the best.

Memory Usage Optimization:
* Optimize data structs
-- Split reference data structures, like lists, tree, etc. into small ones and access them simultaneously.
-- Compact data structs as densely as possible to reduce access increment step and increase the utilization ratio of cache lines.
-- Separate data structs to reduce access increment step.
* Use mem/cache pipeline: The basic unit between cache/TLB and memory is one cache line. So make read step increment be the length of one cache line.
* Processing mem in Int data type instead of char, esp. for R. It seems not matter for W.
* Addr alignment for src/dst
* Combine W and R together for large data block which exceeds the size of caches. For small ones, overlapping access of cache might be harmful.
* Compact data within one mem page as densely as possible, due to TLB and page table.
* Put data into diff mem banks to avoid DRAM refreshment and access multiple banks. One page size as increment is good.

Cache Usage Optimization:
* Data and instruct size not exceed cache size: For cache, the max attainable size of cached data is CacheSize + (CacheSize / Way)
* Addr alignment and no line split data
* Put data on diff cache banks in order to access multiple cache banks.
* Increment step of mem not be multiple of the size of one cache set bank, otherwise conflict miss happens.

Write Buffer Usage Optimization:
* Access same data types, eps. R after W
* Organize the codes together to take advantage of WB since the units in it might be flushed randomly.

Sunday, September 9, 2007

Random Number Generation

* BigRand and RandInt
long long bigrand()
{ return (long long)RAND_MAX * rand() + rand(); }
int randint(int l, int u)
{ return l + bigrand() % (u-l+1); }
Here rand() returns one random number with 32bits; bigrand returns one big random number with 64bits and randint(l, u) returns one random number between l and u.

* Select K random numbers in an array of N with sorted output
To select s number out of r remaining, the next number would be selected with probability s/r. It could be expressed by
if (bigrand() % r) < s
The pseudocode is:
select = K
remaining = N
for i = [0, N-1]
   if (bigrand() % remaining) < select
     print i
     select--
   remaining--
Alternatively, the data structure of set could be used:
initialize set S to empty
size = 0
while size < K do
   t = bigrand() % N
   if t is not in S
     insert t into S
     size++
print S in sorted order
The third one allowing number repeating is to shuffle the array, like this:
for i = [0, N-1]
   swap(i, randint(i, n-1))

Saturday, September 8, 2007

Searching

* Sequential Search O(N)

* Binary Search O(logN)

* Hashing O(1)

* Binary Search Tree O(logN)

* Key Index to Array O(1)
Note its different space requirement from that of hashing

* Examples
- Find the missing/duplicate element in one unsorted array
Binary search for large array. In some cases, it could be solved by calculating the difference between the sum of the array and the sum of the range of number.
- Find all sets of angrams in one dictionary
Set a key for each word, then sort by the key to group them.
- Find the angrams of one word in one dictionary
After finish Q2, use binary search to do it.
- Find the Kth largest number in one non-repeated array
Order statistics
- Find the median of two sorted arrays
- Find one number which duplicates more than half size of one array
- Bitmap sort actually uses key index

Sorting

* Selection Sort
Find the smallest one and put it in the leftest slot, then find the second smallest one and put it in the second slot, and so on. The average and worst case are O(N^2).

* Insertion Sort
Compare new element with the sorted list in its left. The worst case is O(N^2) and generally it is used for almost sorted array with O(N). It is stable.

* Heapsort
In place sort without recursion and the worst case is O(NlogN).

* Merge Sort
Divide and conquer. The worst case is O(NlogN) with O(N) space. It is very useful for sorting data on disk that is too large to fit entirely into primary memory.

* Quick Sort
Partition and recursion. The average case is O(NlogN) with O(logN) space.

* Bitmap Sort
The data in one specific range with limited duplication. O(N)

* Binary Search Tree
If the size of the array is not known in advance, the binary search tree could be used to do the sorting. Heap could be used too. However, it might be harder to implement heap with list than with array.

* Examples
- Sort a file containing at most n(10^7) positive integers, each less than n without duplication. The main memory to be used is limited to one megabyte.
Merge sort, quick sort with multiple passes, and bitmap sort (with multiple passes)
- What if the number in Q1 might occur at most ten times or each non-repeated number is associated with one of three prefix: 800, 877 and 880?
- Transpose one 4000X4000 matrix
Apply (row, col) for each element and stable sort (insertion sort) by row then by col.

Friday, September 7, 2007

Bit Operation

* Bit Operations
~, &, |, ^, << and >>.

* Examples
- Check whether the system is big endian or little endian
Why does this solution not work?
int i = 1;
char c = (char)i; /* c == 1 no matter what endianness */
return c;
- Get/clear the lowest bit of 1 (or 0 by using ~x as input)
x & -x; and x & (x-1);
- Get the index of lowest/highest bit of 1
k = 0;
if ((x & 0x1) != 0) return k;
do { ++k;} while (((x >>= 1) & 0x1) == 0) and
while (x >>= 1) ++k;
- Calculate the number of '1' in the one integer
while (x != 0)
{ x = x & (x-1); ++count; }
- Check whether the integer is the power of 2
The number of '1' in the integer is one.
- Get the next power of 2
unsigned int n = 1U << highestOne(x);
if (n == x) return n;
return n << 1;
- Reverse the bits order in one integer
x = ((x & m) << n) | ((x & ~m) >> n); /* m is masking code and n is swapping number */
- Reverse two bits in one word: swap2(x, k1, k2)
- Arithmetic Rotation
x = (x << r) | (x >> (BITNUM - r));
- Circular Block Shifting
x = (1 << n) - 1;
x = x << p | (x >> (BITNUM - p));
- Max(0, x) and Min(0, x)
return x & ~(x >> (BITNUM - 1)); and
return x & (x >> (BITNUM - 1))
- Overflow for signed num
addition: ~(a^b)&(a^s)&0x80000000
subtraction: (a^b)&(a^s)&0x80000000

Thursday, September 6, 2007

String And Array

* Examples
- To get the first letter which is not repeated in the string
Build the key index between character and its occurrence with array or hashtable.
- Remove the specified characters in the string
Build the key index between character and its status (stay or removed). In place deleting operation with two pointers.
- Reverse the order of a sequence of words separated with space OR circularly shift the string
In place reverse with single word reverse after getting the start and end of words.
- Conversion between signed integer and string
1) To convert between char 0~9 and its integer with the aid of '0'
2) Accumulate in the conversion of atoi from left to right
3) Separate digits in the conversion of itoa from right to left and so reverse order output
- Merge two sorted arrays with the length of M and N (M>>N)
Binary search in M

Tree

* Binary Search Tree
- Search in BALANCED BST: O(logN)
- Insert and delete in BALANCED BST: O(logN)
- Obtain the parent node for a given node: O(logN)
- How to get maximal and minimal value in BST
- Output all the nodes SORTED in BST: O(N)
- Recursive algorithm associated

* Heap: Complete Binary Tree
- Implemented efficiently by an array due to its order and shape properties
- Insert, delete_min/max and delete: O(logN)
- Increase and decrease key: O(logN)
- Build heap: O(NlogN)
- Search in heap: O(N)

* Traversal
Preorder, DFS, postorder, in-order, and BFS

* Set
A set contains a sorted set of unique objects, which is usually implemented by BST.

* Examples
- Preorder traversal of BST without recursion
Stack is used to store the intermediate variables.
- Find the lowest ancestor of two nodes in one BST
The one with value between these two nodes would be the answer.
- Find the first K largest numbers or to find a subset of K numbers which sums up to at most T
Partial sort with one heap of K units.
- Find the K largest/smallest number in a heap of N elements with O(KlogK)/O(1). K is much less than N
- Construct a Huffman code and compute the sum of a large set of floating point number

List

* Single Linked List
- One Head Pointer Must Be Associated
To change the head pointer in one sub-routine, the pointer itself instead of its copy must be changed. Therefore the pointer to head pointer must be passed in ANY sub-routines which changes the head/tail pointers.
- Lists End With NULL
This must be checked when the traverse happens.
- Insert And Delete
When deleting one element in the list, the element before this one must be obtained.
When inserting one element, it depends on whether it is inserted before or after a specific element.
- Free Every Element in The List
- Special Cases Processing: Head and Tail; Length of List == 0, 1, 2, ...

* Examples
- Implement stack data structure
We could implement stack with variable array or single linked list. Due to the heavy overhead of maintaining the memory of elements, variable array might be suitable for small data set.
- Insert and delete function in list with head and tail pointers
How to process special cases?
- Get the mth element in reverse order in one list
1) n = N - m; Two traverses to get N and n; O(N)
2) Two pointers with the distance of m
- How to check if the list contains loop?
Fast and slow pointers
- How to delete one element if only the pointer to this element is given, i.e. without traversing?
Copy the content of next element to this one and delete next element.
- How to reverse one list with/without recursion?