B. Mans et al. / European Journal of Operational Research 81 (1995) 617-628
find a permutation p of the set N = {1, 2, ..., n} which minimizes the global cost function:
$$\mathrm{Cost}(p) = \sum_{i=1}^{n} \sum_{j=1}^{n} f_{ij}\, d_{p(i)p(j)}.$$
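As a minimal sketch (in Python; the 3 × 3 instance below is hypothetical and is not one of the test problems), the cost function can be evaluated directly, and for very small n the QAP can be solved by enumerating all n! permutations. This enumeration is exactly what becomes impossible as n grows.

```python
from itertools import permutations

def qap_cost(F, D, p):
    """Cost(p) = sum_i sum_j F[i][j] * D[p[i]][p[j]], where p[i] is the site of facility i."""
    n = len(F)
    return sum(F[i][j] * D[p[i]][p[j]] for i in range(n) for j in range(n))

def brute_force_qap(F, D):
    """Exhaustive search over all n! permutations; only usable for very small n."""
    n = len(F)
    best = min(permutations(range(n)), key=lambda p: qap_cost(F, D, p))
    return qap_cost(F, D, best), best

# Hypothetical instance: F holds flows between facilities, D distances between three sites on a line.
F = [[0, 5, 2],
     [5, 0, 3],
     [2, 3, 0]]
D = [[0, 1, 2],
     [1, 0, 1],
     [2, 1, 0]]
print(brute_force_qap(F, D))  # (24, (0, 1, 2))
```

Both optimal permutations, (0, 1, 2) and its mirror image (2, 1, 0), put facility 1 on the middle site. The search returns the first one it meets.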
2. Parallel solution methods
The QAP has proved to be computationally very difficult. This problem, of which the Travelling Salesman problem is a special case, is NP-hard [13]. Moreover, finding an ε-approximate solution is also NP-hard [22].
Its theoretical complexity, however, does not fully convey the practical difficulty of this problem: even instances of moderate size (n = 20) could not, up to now, be solved exactly.
Therefore, many heuristics have been developed in the past thirty years. More recently, due to the
development of parallel architectures, many parallel algorithms have been fruitfully implemented to
speed up the search and, thus, overcome the difficulty of the problem.
2.1. Parallel exact methods
To date, the best way to solve a quadratic assignment problem exactly is a Branch and Bound algorithm. Other methods, such as cutting-plane methods, have not been successful and fail to solve problems of size greater than eight exactly.
As a consequence, Branch and Bound algorithms have been the only exact parallel solution methods
suggested for this problem. They are due to Roucairol (1987) [21] and to Crouse and Pardalos (1989) [8]. We describe their main characteristics in Section 4.2. These parallel Branch and Bound algorithms are, however, also limited to the solution of small problems (size < 15).
This strong limitation explains why most recent approaches to this problem have been heuristics.
2.2. Parallel heuristics
Among sequential solution methods, the adaptations of recent meta-heuristics (Simulated Annealing, Tabu Search and Genetic Algorithms) to the QAP provide the best approximate results, outperforming the earlier heuristic methods (construction methods, exchange methods).
Naturally, parallel heuristics have been derived from these meta-heuristics.
In 1989, Brown, Huntley and Spillane [3], on a 32-node Intel iPSC/2 hypercube, and Mühlenbein [17], on a 64-processor distributed-memory system, developed parallel genetic algorithms.
In 1991, Taillard [24] proposed a parallel Tabu search on a network of 10 T800C transputers, while Chakrapani and Skorin-Kapov [7] developed a massively parallel Tabu search on a Connection Machine with 16K processing units.
These algorithms always find the optimal solution for the small problems from the literature. Since optimality cannot be proved for larger instances, the quality of their results is hard to assess. Nevertheless, for special instances built with a known optimum (Palubetskes [19], Burkard et al. [5]), the results of these heuristics are very close to optimal.
We also mention a connectionist approach, developed by Wang in 1990 [25] on a MicroVAX computer, which simulates n² processing units performing nonlinear transformations.
3. Sequential Branch and Bound
Let us briefly recall the main principles of the sequential algorithm introduced by Mautor and Roucairol in 1992 [16,15], which currently provides the best sequential results for the quadratic assignment problem.
Along the way, we discuss features that matter for any Branch and Bound algorithm, such as lower bounds and branching strategies.
3.1. Bounding
Undoubtedly, the computation of the lower bound is one of the major difficulties in the exact solution
of the quadratic assignment problem.
Indeed, the bounds available so far are either too loose, so that the number of nodes in the search tree grows very quickly with the size of the problem, or, when slightly tighter, too expensive to compute at each node.
The oldest lower bound, developed independently in 1962 by Gilmore [11] and Lawler [13], is obtained by solving a linear assignment problem on an (n × n) matrix C = (c_{ik}), where c_{ik} is a lower bound on the cost of assigning facility i to site k, given by the ranked product of the row f_{i·} of F and the column d_{·k} of D.
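A minimal sketch of the Gilmore-Lawler bound at the root of the search tree follows. It assumes that the "ranked product" is the usual minimal scalar product (one vector sorted ascending, the other descending), and it adds the diagonal term f_ii d_kk. The distance vector is taken from row k of D to match the d_{p(i)p(j)} indexing of Cost(p); for a symmetric D this equals column k. The function names are hypothetical.

```python
from itertools import permutations

def min_scalar_product(a, b):
    """Smallest sum of a[j] * b[pi(j)] over all bijections pi (rearrangement inequality):
    pair a sorted ascending with b sorted descending."""
    return sum(x * y for x, y in zip(sorted(a), sorted(b, reverse=True)))

def gilmore_lawler_bound(F, D):
    """Gilmore-Lawler lower bound at the root node (no facility assigned yet)."""
    n = len(F)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        f_i = [F[i][j] for j in range(n) if j != i]        # off-diagonal flows of facility i
        for k in range(n):
            d_k = [D[k][l] for l in range(n) if l != k]    # off-diagonal distances from site k
            C[i][k] = F[i][i] * D[k][k] + min_scalar_product(f_i, d_k)
    # Linear assignment problem on C, solved exhaustively to keep the sketch dependency-free;
    # a practical code would use an O(n^3) Hungarian-type solver instead.
    return min(sum(C[i][p[i]] for i in range(n)) for p in permutations(range(n)))

# Hypothetical 3 x 3 instance.
F = [[0, 5, 2], [5, 0, 3], [2, 3, 0]]
D = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
print(gilmore_lawler_bound(F, D))  # 24, which here equals the optimal cost
```

On such a tiny instance the bound happens to be tight. On Nugent-type instances of size greater than 20 it is, as noted below, more than 20% away from the best solution.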
This bound can therefore be computed quickly, in O(n^3), but it is not very tight. For instance, the Gilmore-Lawler bound is more than 20% away from the best solution for Nugent's problems of size greater than 20.
The most interesting other lower bounds are based on an eigenvalue approach (Finke, Burkard and Rendl [10], Rendl and Wolkowicz [20]) or on equivalent dual formulations of the problem (Assad and Xu [2], Carraresi and Malucelli [6]). Although they provide slightly better evaluations, they do so at the cost of a very significant increase in computational time. The best quality/time ratio is, therefore, still achieved by the Gilmore-Lawler bound.
For this reason, the most successful Branch and Bound algorithms (Burkard and Derigs [4], Roucairol
[21], Crouse and Pardalos [8]) use the Gilmore-Lawler bound. Mautor and Roucairol use this bound and
concentrate their effort on more efficient approaches to reduce the enumeration.
3.2. Reduction tests
Symmetric equivalences
In most classical applications of the QAP, the sites are located on a regular figure (grid, circle or line) on which several symmetric or isometric equivalences can be detected.
We want to avoid creating, visiting and bounding these different but equivalent nodes in different
branches of the search tree.
For this purpose, Mautor and Roucairol [16] introduced a simple test that is quickly computed. For any partial solution, this test identifies the different isometric classes, so that a unit is assigned to only one site of each isometric class.
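The sketch below shows one possible way (not necessarily the test of [16]) to compute isometric classes on a rectangular grid of sites. It assumes that D depends only on the geometric distance between grid points. Only the grid isometries that leave every already occupied site in place are used, so two free sites fall in the same class when one partial solution is the mirror or rotated image of the other. All names are hypothetical.

```python
def grid_isometries(rows, cols):
    """Symmetries of the square (dihedral group) that map the rows x cols grid onto itself."""
    R, C = rows - 1, cols - 1
    candidates = [
        lambda r, c: (r, c),          # identity
        lambda r, c: (R - r, c),      # reflection across the horizontal axis
        lambda r, c: (r, C - c),      # reflection across the vertical axis
        lambda r, c: (R - r, C - c),  # rotation by 180 degrees
        lambda r, c: (c, r),          # main-diagonal reflection (square grids only)
        lambda r, c: (C - c, R - r),  # anti-diagonal reflection (square grids only)
        lambda r, c: (c, R - r),      # rotation by 90 degrees (square grids only)
        lambda r, c: (C - c, r),      # rotation by 270 degrees (square grids only)
    ]
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    return [t for t in candidates
            if all(0 <= t(r, c)[0] < rows and 0 <= t(r, c)[1] < cols for r, c in cells)]

def isometric_representatives(rows, cols, assigned_sites):
    """Return one free site per isometric class, given the set of occupied sites."""
    # Keep only the isometries that fix every occupied site (pointwise stabilizer).
    stabilizer = [t for t in grid_isometries(rows, cols)
                  if all(t(*s) == s for s in assigned_sites)]
    free = [(r, c) for r in range(rows) for c in range(cols) if (r, c) not in assigned_sites]
    reps, seen = [], set()
    for site in free:                 # row-major order: first site of each orbit is kept
        if site in seen:
            continue
        seen |= {t(*site) for t in stabilizer}
        reps.append(site)
    return reps

print(isometric_representatives(3, 3, set()))   # [(0, 0), (0, 1), (1, 1)]: corner, edge, centre
print(len(isometric_representatives(3, 5, set())))  # 6 of the 15 sites need to be tried
```

Once the first unit sits in a corner of a 3 × 3 grid, only the diagonal reflection through that corner remains, and 5 classes are left among the 8 free sites.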
Reduction test using the search gap
This classical test forbids some assignments and, therefore, reduces the size of the B&B tree.
Moreover, it helps to choose an efficient branching for the next level (see Section 3.3).
Let us denote:
lb: the value of the lower bound obtained,
bks: the value of the best known solution (upper bound).
Test. If the alternative cost of assigning a unit to a site is greater than or equal to the search gap (the difference between the value of the best known solution and the lower bound, bks - lb), then this assignment can be forbidden (the proof can be found in [16]).
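Here is a minimal sketch of the search-gap test. It assumes the alternative costs are already available as an n × n matrix alt_cost; how they are computed is left to [16] and is not modelled here. The function name and the toy numbers are hypothetical.

```python
def forbidden_assignments(alt_cost, lb, bks):
    """Pairs (unit, site) whose alternative cost reaches the search gap bks - lb.

    alt_cost[i][k] is assumed to be the alternative cost of assigning unit i to site k.
    """
    gap = bks - lb
    return {(i, k)
            for i, row in enumerate(alt_cost)
            for k, cost in enumerate(row)
            if cost >= gap}

# Hypothetical node: lower bound 100, best known solution 105, so the search gap is 5.
alt_cost = [[0, 4, 7],
            [3, 0, 2],
            [6, 5, 0]]
print(sorted(forbidden_assignments(alt_cost, lb=100, bks=105)))  # [(0, 2), (2, 0), (2, 1)]
```

The smaller the search gap, that is, the better the initial heuristic solution, the more assignments are forbidden.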
3.3. Branching strategies
Depth first search strategy
As mentioned above, exact methods can only deal with problems of small size, for which efficient and
recent heuristics, such as Tabu search, almost always find the optimal solution or, at least, a solution
extremely close to optimal.
As a consequence, since only the nodes of the critical tree (those whose evaluation is lower than the best solution) have to be examined, a depth first search strategy is more efficient than a best first search strategy. This approach has two advantages. First, the data structures are lighter. Second, it avoids repeatedly sorting partial solutions and rebuilding independent subproblems (our experiments have shown that building the matrices at each exploration of a B&B node takes at least 40% of the global computation time).
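The following minimal sketch of depth first Branch and Bound with an explicit stack shows why the data structures stay light: only the current frontier of partial permutations is stored, and nothing is sorted. For brevity it assumes non-negative F and D and uses the cost of the already fixed pairs as a weak lower bound instead of the Gilmore-Lawler bound. Facility d is assigned at depth d, and all names are hypothetical.

```python
def dfs_branch_and_bound(F, D, initial_bks=float("inf")):
    """Depth first B&B for the QAP; facility d is assigned at depth d.

    Lower bound used here (a simplification, not Gilmore-Lawler): the cost of the pairs
    already fixed, which is valid only when all entries of F and D are non-negative.
    Returns (best cost, best permutation); the permutation is None if nothing beats initial_bks.
    """
    n = len(F)
    bks, best = initial_bks, None
    stack = [()]                            # partial permutations still to be explored
    while stack:
        partial = stack.pop()
        d = len(partial)
        lb = sum(F[a][b] * D[partial[a]][partial[b]] for a in range(d) for b in range(d))
        if lb >= bks:
            continue                        # outside the critical tree: prune
        if d == n:
            bks, best = lb, partial         # complete assignment: lb is its exact cost
            continue
        used = set(partial)
        for site in reversed(range(n)):     # reversed, so the leftmost son is popped first
            if site not in used:
                stack.append(partial + (site,))
    return bks, best

# Hypothetical 3 x 3 instance.
F = [[0, 5, 2], [5, 0, 3], [2, 3, 0]]
D = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
print(dfs_branch_and_bound(F, D))                   # (24, (0, 1, 2))
print(dfs_branch_and_bound(F, D, initial_bks=24))   # (24, None): heuristic value proved optimal
```

Passing the value of a heuristic solution as initial_bks reproduces the situation described above. Only the critical tree is explored, and when the heuristic value is already optimal the search merely proves it.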
Polytomic branching
The branching scheme is polytomic: a selected unit is assigned to every site that is free (not already assigned), not forbidden (its alternative cost lies within the search gap) and isometrically distinct. The selected unit is the one with the lowest number of sons to be examined (i.e. the highest number of forbidden elements).
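A minimal sketch of the unit-selection rule follows. It assumes the forbidden pairs come from the search-gap test and that the isometrically distinct sites are given as an optional set. The names are hypothetical.

```python
def select_branching_unit(free_units, free_sites, forbidden, representatives=None):
    """Polytomic branching rule: choose the free unit with the fewest sons.

    forbidden: set of (unit, site) pairs removed by the search-gap test.
    representatives: optional set of isometrically distinct free sites (all free sites if None).
    Returns (unit, sites), where sites are the sons to create, one per admissible site.
    """
    sites = [s for s in free_sites if representatives is None or s in representatives]
    best_unit, best_sons = None, None
    for u in free_units:
        sons = [s for s in sites if (u, s) not in forbidden]
        if best_sons is None or len(sons) < len(best_sons):
            best_unit, best_sons = u, sons
    return best_unit, best_sons

forbidden = {(0, 2), (2, 0), (2, 1)}
print(select_branching_unit([0, 1, 2], [0, 1, 2], forbidden))  # (2, [2])
```

A unit with no admissible site yields an empty list of sons, so the node can be pruned at once.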
This scheme has several advantages. First, with respect to a dichotomic branching, fewer nodes are
created.
Second, the memory requirement is reduced, because this scheme avoids the unnecessary information on revoked assignments generated by classical dichotomic branching schemes, in which a unit either is or is not assigned to a site.
Moreover, this scheme discriminates between the lower bounds of a node and its sons and allows early
pruning of some branches. Our experiments have shown that this strategy produces a significant decrease
in the size of the search tree.
3.4. Size of the search tree
We can see in Table 1 that the reduction tests and the branching strategy, described above, lead to a
drastic decrease in the total number of created nodes compared to the previous most efficient exact
algorithms (Burkard and Derigs, Roucairol, Crouse and Pardalos).
Table 1
Total number of created nodes
Algorithm | Nugent 8 | Nugent 12 | Nugent 15 | Elshafei 19
Burkard-Derigs | 403 | 36 966 | 2 064 415 | not proved
Roucairol | 428 | 83 379 | not proved | not proved
Crouse-Pardalos | 798 | 42 706 | 1 596 353 | not proved
Mautor-Roucairol | 32 | 3 474 | 97 287 | 491
4. Parallel Branch and Bound
4.1. Main principles
We have parallelized the Mautor-Roucairol sequential algorithm on an asynchronous shared memory
multiprocessor, the Cray 2. Our main concern has been to minimize the waiting time (inactivity) of the processors and thus to obtain the highest speed-up. Our approach therefore addresses the major difficulties that arise when parallelizing B&B methods [14]: task allocation, choice of granularity, overhead detection, etc.
As previously argued, we use a depth first search strategy. In a parallel implementation there is an additional reason for this choice: with a parallel best first search strategy, the granularity (the relative number of operations done between synchronizations) would correspond to the exploration of a single B&B node, which is fast. Consequently, partially sorting the newly generated nodes into a global shared list at each branching step would cause heavy memory-access contention, and hence a bottleneck for the processors.
For these reasons, we propose that each processor execute the same depth first search algorithm on different B&B subtrees. Each processor is therefore assigned the root of a subtree and develops a local depth first search on that subtree.
When a processor has completed the exploration of its subtree, it accesses a shared data structure, the feeding tree, from which it takes a new node, the root of a new subtree to develop. Since this task allocation cannot be done statically, processors schedule themselves (self-scheduling technique) and trigger their own work requests. The consistency and fairness of this shared list of tasks are ensured by synchronizing with classical primitives such as locks.
The procedure stops when the feeding tree is empty and all processors are idle.
The main features of our algorithm are illustrated on the Nugent et al. problem of size 15. For this problem, the size of the tree and the computational time are large enough for the implementation choices to be carefully analyzed. We have obtained similar results with other problems, in that the speed-up has been linear in all cases.
4.2. Task allocation
Different approaches can be adopted for the task allocation. In Roucairol's parallel algorithm [21], a
global heap is used to memorize the generated B&B nodes. Since a best first search strategy is used,
each exploration of a B&B node requires that the shared data structure be accessed once to obtain the
required node and be accessed several times to insert the newly generated nodes. In order to limit this
global access, Crouse and Pardalos [8] initially create several heaps in the global memory, so that each
processor can select one and explore it completely locally. In this case, the parallel B&B execution
terminates when all heaps have been explored and all processors are idle.
Since our approach is quite different, we describe the way we distribute the work amongst the
processors, through the notion of the feeding tree.
The feeding tree
Definition 4.1. The Feeding Tree, a shared data structure, is the upper part of the B&B tree developed down to depth (or level) i. The leaves of the Feeding Tree (the nodes at level i of the B&B tree) are the roots of the subtrees allocated to the processors (see Fig. 1).
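If pruning is ignored, the leaves of the feeding tree at depth i are the partial permutations of length i, taken from left to right. The sketch below assumes that facility d is assigned at depth d; in the actual algorithm the unit is chosen by the polytomic rule and the tree is pruned. It relies on itertools.permutations, which produces these tuples lazily in lexicographic order from O(n) internal state, much like keeping only the path to the last allocated node. The function name is hypothetical.

```python
from itertools import islice, permutations

def feeding_tree_leaves(n, depth):
    """Lazily yield the depth-i nodes of the unpruned B&B tree, from left to right.

    A node is a tuple (site of facility 0, ..., site of facility depth-1).
    """
    return permutations(range(n), depth)

# The first five depth-2 nodes for n = 4, in the order the processors would receive them.
print(list(islice(feeding_tree_leaves(4, 2), 5)))  # [(0, 1), (0, 2), (0, 3), (1, 0), (1, 2)]
# Without pruning, a depth-3 feeding tree for n = 15 has 15 * 14 * 13 leaves.
print(sum(1 for _ in feeding_tree_leaves(15, 3)))  # 2730
```

In practice the forbidden assignments and the isometric classes remove many of these leaves before any processor receives them.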
Fig. 1. Feeding Tree and allocated subtrees.
The first free processor initializes the left part of the Feeding Tree until it generates, by successive branchings, the leftmost node at the chosen depth i. This processor then unlocks the shared structure and begins its own exploration of the subtree rooted at that node. Until the exploration of the allocated subtree is completed, the processor does not need to access the global shared structure.
Gradually, the other processors access the feeding tree, in a mutually exclusive way, and develop it until they reach a new node of depth i (the next depth-i node to the right of the last allocated node), backtracking in the feeding tree when necessary. Likewise, as soon as a processor becomes idle, having completed the exploration of its last subtree, it accesses the feeding tree again to build a new "depth-i-node".
When the development of the feeding tree is completed (at the first level, it is not possible to branch
on a new partitioned subproblem), each processor that becomes idle terminates (the main program
terminates when the last processor becomes idle).
Since an assignment is fixed at each branching step, the maximal depth of the B&B tree is n - 1,
where n is the size of the QAP problem. The maximal depth of allocated subtrees is, therefore, equal to
(n - i - 1).
Due to the depth first search strategy, in the feeding tree, only the nodes on the path from the root to
the last allocated node (the facilities assigned to reach this depth and the set of the remaining available
locations) have to be memorized. Similarly, the context memorized by a processor is the path between the current node and its ancestor at depth i.
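Thus, the memory requirement of this parallel implementation is only slightly larger than that of the sequential implementation. The feeding tree stores a path of length i and each processor a path of length at most n - i - 1, so the ratio to the sequential requirement is (i + p(n - i - 1))/(n - 1), which is O(p), where p is the number of processors.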
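As a worked example, take the memory ratio to be (i + p(n - i - 1))/(n - 1) path nodes relative to the sequential algorithm. Nugent 15 (n = 15) with a feeding tree of depth i = 3 on the four processors of the Cray 2 (p = 4) then gives:

```latex
\[
  \frac{i + p\,(n - i - 1)}{n - 1}
  = \frac{3 + 4\,(15 - 3 - 1)}{15 - 1}
  = \frac{47}{14} \approx 3.36
\]
```

So in this configuration the parallel version stores at most about 3.4 times as many path nodes as the sequential one.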
Algorithm
procedure Parallel B&B
begin
  /* initialize Feeding Tree */
  NprocsIdle := 0
  Create fid, the first leftmost descendant of root with depth i
  for each of the Nprocs processors do /* self-schedule */
    lock (Feeding Tree)
    Create rfid, right sibling of fid
    if rfid ≠ null then
      Keep context of rfid
      fid := rfid
      unlock (Feeding Tree)
    else /* no more tasks in this branch */
      Backtrack until creating a right sibling ancfid
      if ancfid ≠ null then
        Create rfid, the first leftmost descendant of ancfid with depth i
        Keep context of rfid
        fid := rfid
        unlock (Feeding Tree)
      else /* no more tasks to assign */
        NprocsIdle := NprocsIdle + 1
        unlock (Feeding Tree)
        Terminate
      endif
    endif
    /* Depth First Search exploration */
    Expand the whole B&B subtree
    if a new best solution nbs is found then
      lock (bks)
      if (nbs < bks) bks := nbs
      unlock (bks)
    endif
  enddo /* NprocsIdle = Nprocs */
end Parallel B&B
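The procedure above can be sketched with Python threads. This is a minimal illustration under several assumptions. The feeding tree is the unpruned sequence of depth-i partial permutations, with facility d assigned at depth d. The bound is the cost of the already fixed pairs, valid for non-negative F and D. Because CPython threads share a global interpreter lock, the sketch shows the control structure (self-scheduling, one lock for the feeding tree, one for bks) rather than a real speed-up. All names are hypothetical.

```python
import threading
from itertools import permutations

def partial_cost(F, D, partial):
    """Cost of the pairs fixed so far (a valid lower bound when F and D are non-negative)."""
    d = len(partial)
    return sum(F[a][b] * D[partial[a]][partial[b]] for a in range(d) for b in range(d))

def subtree_search(F, D, prefix, shared):
    """Local depth first search of the subtree rooted at one depth-i node."""
    n = len(F)
    stack = [prefix]
    while stack:
        partial = stack.pop()
        cost = partial_cost(F, D, partial)
        if cost >= shared["bks"]:          # unlocked read: bks only decreases, so a stale value just prunes less
            continue
        if len(partial) == n:
            with shared["bks_lock"]:       # lock (bks) ... unlock (bks)
                if cost < shared["bks"]:
                    shared["bks"], shared["best"] = cost, partial
            continue
        used = set(partial)
        stack.extend(partial + (s,) for s in reversed(range(n)) if s not in used)

def parallel_bb(F, D, depth, nprocs, initial_bks=float("inf")):
    """Self-scheduling workers that pull depth-i nodes from a lock-protected feeding tree."""
    feeding_tree = permutations(range(len(F)), depth)  # unpruned depth-i nodes, left to right
    tree_lock = threading.Lock()
    shared = {"bks": initial_bks, "best": None, "bks_lock": threading.Lock()}

    def worker():
        while True:
            with tree_lock:                           # lock (Feeding Tree) ... unlock (Feeding Tree)
                root = next(feeding_tree, None)
            if root is None:
                return                                # no more tasks to assign: terminate
            subtree_search(F, D, root, shared)

    threads = [threading.Thread(target=worker) for _ in range(nprocs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                                      # every worker idle: the search is complete
    return shared["bks"], shared["best"]

# Hypothetical instances.
F = [[0, 5, 2], [5, 0, 3], [2, 3, 0]]
D = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
F2 = [[0 if i == j else (3 * i + 5 * j) % 7 for j in range(5)] for i in range(5)]
D2 = [[abs(k - l) for l in range(5)] for k in range(5)]
print(parallel_bb(F, D, depth=1, nprocs=4))   # cost 24; which optimal permutation is reported depends on timing
print(parallel_bb(F2, D2, depth=2, nprocs=4))
```

Between two accesses to the feeding tree, a worker touches shared data only to read bks or, rarely, to update it under its own lock. This mirrors the argument that only the development of the feeding tree can cause significant waiting.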
4.3. Granularity - memory contentions
As previously mentioned, we want to minimize the waiting time of the processors. Now, a processor
has to wait in two cases:
• when the processor needs to access a global shared structure, locked by another processor (memory
contention),
• during the termination phase, when the processor is idle and has to wait for other processors to
complete their last local exploration (termination wait).
Let us discuss these two cases in more detail.
Memory contention
The overhead problem, which depends entirely on the parallel implementation, is decisive on the Cray 2. A lock operation requires 200 CPU cycles when the lock is free, whereas handling a conflict requires 4000 CPU cycles.
Since updates of the best known solution are extremely rare and fast, only the dynamic development of the feeding tree can take long enough to cause significant waiting for other processors.
The total waiting time of the processors, however, depends on the level i of the feeding tree.
The deeper this level, the smaller the average granularity of the tasks allocated to the processors, and the more frequently the processors access the feeding tree. Moreover, the average time a processor needs to develop the feeding tree and generate a new "depth-i-node" increases, because of numerous backtrackings.
Fig. 2. Conflict ratio depending on the level (Nugent 15); x-axis: shared level.
This phenomenon is shown in Fig. 2, which gives the access conflict ratio (the percentage of access requests made while the feeding tree is locked) with respect to the shared level.
For example, when the depth of the feeding tree is equal to 5, nearly 30% of the access requests result in waiting time.
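As a rough, illustrative estimate only, assume that an uncontended lock costs 200 cycles and a contended one 4000 cycles (the Cray 2 figures quoted above), with the conflict cost replacing rather than adding to the free-lock cost. With a conflict ratio r, the expected cost of one access to the feeding tree is then:

```latex
\[
  \mathbb{E}[\text{cycles per access}] = (1 - r)\cdot 200 + r \cdot 4000,
  \qquad
  r = 0.3 \;\Rightarrow\; 140 + 1200 = 1340 \text{ cycles} \approx 6.7 \times 200 .
\]
```

Under this assumption, an access at depth 5 costs on average several times more than an uncontended lock.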
Termination wait
This delay occurs when a processor without work (no more task) waits for the completion of the
subtree(s) allocated to other processor(s). The maximal idle time is the time needed to explore the
largest subtree which can be assigned to a processor.
When the level i of the feeding tree decreases, the average granularity of the tasks (subtrees) allocated to the processors increases, and a long termination wait becomes more and more likely.
This is illustrated in Fig. 3, which gives the difference in the numbers of processed nodes (generated and explored) between the most and the least active processors. It confirms the effect of the chosen depth i on load balancing: with a small feeding tree (level 1 or 2), this difference can reach 30% (one processor explored 5000 more subproblems than another), while a deeper feeding tree produces a very good load balance.
Fig. 3. Difference in the numbers of processed nodes with respect to the shared level (explored and generated nodes; shared level 0 to 6).
5. Experiments
Our algorithm, written in Fortran, was run on the Cray 2, a four-processor asynchronous shared-memory machine.
The test data used for the computational results is as follows:
• Nugent et al. [18]: classical QAP problems of sizes n = 12, n = 15 and n = 16 (for n = 16, the flow matrix is extracted from the problem of size 20 and the sites are located on a 4 × 4 square),
• Elshafei: hospital layout problem of size 19 from [9],
• Scriabin and Vergin: economic layout problem of size 20 [23], from Armour and Buffa [1].
5.1. Computational results
We compared the results obtained by our method with those obtained by one of the fastest sequential
algorithms (Burkard and Derigs [4]) and by the two previously most efficient parallel algorithms available
(Crouse and Pardalos [8], and Roucairol [21]). Since Burkard and Derigs' algorithm was tested in 1980 on a Cyber 76, and given the evolution of computer performance since then, we ran their sequential algorithm on the Cray 2. We also report two sets of results for the algorithm of Crouse and Pardalos: the first from a sequential run, the second from a parallel run on a four-processor machine.
Table 2 compares the running times of these algorithms. Some tests run on a Cray YMP are also included.
Remark. Since our dedicated use of the Cray 2 (i.e. a single execution on the machine) was limited to 400 seconds, cumulated over all processors used, the parallel executions of Nugent of size 16 and of Scriabin and Vergin of size 20 could not be run in dedicated mode; they were run on a machine shared with other users, so their running time depends on the number of users working on the machine. Thus, in these two cases, we could not reach the linear speed-up obtained for Nugent et al. of size 15.
Table 2
Comparison of running times (in seconds)
Algorithm | Machine | No. procs | Nugent 12 | Nugent 15 | Nugent 16 | Elshafei 19 | Scr.-Ver. 20
Optimal solution | | | 578 | 1150 | 1550 | 17212548 | 110030
Burk.-Der. | Cray 2 | 1 | 24 | 1290 | not proved | not proved | not proved
Roucairol | Cray XMP | 4 | 312 | out of time | not proved | not proved | not proved
Cr.-Pard. | IBM 3090 | 1 | 34 | 2005 | not proved | not proved | not proved
Cr.-Pard. | IBM 3090 | 4 | 10 | out of space | not proved | not proved | not proved
Mans et al. | Cray 2 | 1 | 2.68 | 109 | 969 | 1.04 | 1189
Mans et al. | Cray 2 | 4 | 0.99 | 28 | 436 | 0.68 | 560
Mans et al. | Cray YMP | 1 | not tested | 62 | not tested | not tested | not tested
Mans et al. | Cray YMP | 6 | not tested | 11 | not tested | not tested | not tested
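Speed-up and efficiency can be read directly from Table 2. The short script below (variable and function names are hypothetical) divides the single-processor time by the multiprocessor time of the same code. The Nugent 16 and Scriabin-Vergin parallel runs were made on a loaded machine (see the Remark above), and the Nugent 12 and Elshafei 19 runs last about a second, so the Nugent 15 ratios are the most meaningful.

```python
# Running times in seconds from Table 2: (1-processor time, p-processor time, p).
RUNS = {
    "Nugent 12, Cray 2": (2.68, 0.99, 4),
    "Nugent 15, Cray 2": (109, 28, 4),
    "Nugent 16, Cray 2": (969, 436, 4),
    "Elshafei 19, Cray 2": (1.04, 0.68, 4),
    "Scr.-Ver. 20, Cray 2": (1189, 560, 4),
    "Nugent 15, Cray YMP": (62, 11, 6),
}

def speedup(t1, tp):
    """Ratio of the sequential running time to the parallel running time."""
    return t1 / tp

def efficiency(t1, tp, p):
    """Speed-up divided by the number of processors (1.0 means a perfectly linear speed-up)."""
    return speedup(t1, tp) / p

for name, (t1, tp, p) in RUNS.items():
    print(f"{name}: speed-up {speedup(t1, tp):.2f} on {p} processors, "
          f"efficiency {efficiency(t1, tp, p):.2f}")
```

For Nugent 15 this gives a speed-up of about 3.89 on four Cray 2 processors and about 5.64 on six Cray YMP processors.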
Fig. 4. Running time obtained on Nugent 15 with respect to the level i (y-axis: time in sec.; x-axis: shared level).
Among the classical quadratic assignment problems from the literature, the problem of Nugent of size
15 was considered to be the hardest to solve by an exact method. Only the most efficient Branch and
Bound algorithms managed to solve this problem and they needed a long computational time (more than
20 minutes) to compute the solution.
Our parallel algorithm solves this problem in less than 30 seconds (11 seconds on the Cray YMP, 28 seconds on the Cray 2).
This result is a good illustration of the efficiency of this parallel algorithm, which, on all the classical
problems, obtains the fastest running time.
Moreover, this algorithm obtains the best solution and the proof of the optimality of this solution for
problems of size 16 to 20 which have never been solved exactly before: Nugent of size 16, Scriabin and
Vergin of size 20, Elshafei of size 19.
We have also run our algorithm on randomly generated problems with known optimal solutions (Palubetskes [19], Burkard et al. [5]). Problems of size 18 were solved sequentially in less than 6 minutes, and in parallel with a linear speed-up.
5.2. Parallel analysis
In order to analyze the effects of the granularity and of the sizes of the allocated subtrees, we treated the level i (the depth of the feeding tree) as a parameter and ran our program with different values of i.
Fig. 5. Running time (Nugent 15) with respect to the number of processors (curves for level 3 and level 4; y-axis: time in sec.; x-axis: number of processors).
Fig. 6. Speed up on Nugent 15 (curves for level 3 and level 4; x-axis: number of processors).
These tests were run on the Nugent 15 problem while our program was the only job executed on the machine.
As previously seen, the level i must not be set to extreme values: neither too high, because of memory contention (see Fig. 2), nor too low, because of poor load balance (see Fig. 3).
This influence of the level i is also shown in Fig. 4, where the running times obtained with different
depths of the feeding tree are presented.
On the Nugent 15 problem, the best compromise is achieved when the depth is equal to 3 (or 4).
Of course, the value of the level i that gives the best behaviour of the program depends on the problem and on the structure of the corresponding search tree. Nevertheless, all our experiments show that a middle-valued depth of the feeding tree, equal to [n/5] or [n/4] (where n is the size of the problem), leads to very good performance.
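The text does not specify how [n/5] and [n/4] are rounded. For Nugent 15 (n = 15) the two values are:

```latex
\[
  n = 15:\qquad \frac{n}{5} = 3, \qquad \frac{n}{4} = 3.75 .
\]
```

Rounding down gives 3 and rounding up gives 4, which matches the best depths of 3 or 4 observed in Fig. 4.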
Therefore, as can be seen in Fig. 5, with an appropriate choice of i the running time decreases significantly as the number of processors increases.
A linear speed-up in the number of processors is, thus, achieved (see Fig. 6). We emphasize the fact
that the comparisons of the parallel running times are done with the running time of the best sequential
implementation of the algorithm.
6. Conclusions
We have presented an original and very efficient parallel algorithm for solving the Quadratic
Assignment problem. We introduced the notion of the feeding tree which allows a good distribution of
work to processors.
This algorithm, which is the parallel version of a sequential B&B algorithm, achieves a linear speed-up (nearly equal to the number of processors) with very little overhead. Since the heuristics for the QAP give very good solutions, no oversearch occurs (only the critical tree is explored). The parallel implementation of the Depth First Search strategy is scalable in CPU time and memory requirement.
Nevertheless, although the parallelism is fully exploited, a global improvement of the algorithm is still required to solve problems of size greater than 20. Leaving hardware evolution aside, such problems cannot be solved with the existing tight bounds.
Acknowledgements
We thank Franz Rendl, Professor at the Technische Universität of Graz, Austria, for his helpful
comments and suggestions on this work during the last few years.
The authors are indebted to Cray France (esp. G. Simeoni) and to CCVR (Centre de Calcul Vectoriel
pour la Recherche, Palaiseau, France), for supporting this research with computer time and for helpful
discussions.
References
[1] Armour, G., and Buffa, E., "A heuristic algorithm and simulation approach to the relative allocation of facilities", Management Science 9 (1963) 294-309.
[2] Assad, A., and Xu, W., "On lower bounds for a class of quadratic 0-1 programs", Operations Research Letters 4 (1985) 175-180.
[3] Brown, D., Huntley, C., and Spillane, A., "A parallel genetic heuristic for the quadratic assignment problem", in: Proc. of the
3rd Conference on Genetic Algorithms, 406-415, Arlington, 1989.
[4] Burkard, R., and Derigs, U., Assignment and Matching Problems: Solution Methods with Fortran Programs, Springer Verlag,
Berlin 1980.
[5] Burkard, R., Karisch, S., and Rendl, F., "Qaplib - a quadratic assignment problem library", European Journal of Operational Research 55 (1991) 115-119.
[6] Carraresi, P., and Malucelli, F., "A new lower bound for the quadratic assignment problem", Operations Research 40 (1992) S22-S27.
[7] Chakrapani, J., and Skorin-Kapov, J., "Massively parallel tabu search for the quadratic assignment problem", Technical
Report HAR-91-06, Harriman School for Management and Policy, NY 11794, 1991.
[8] Crouse, J., and Pardalos, P., "A parallel algorithm for the quadratic assignment problem", in: Proceedings of Supercomputing
89, 351-360, ACM, 1989.
[9] Elshafei, A., "Hospital layout as a quadratic assignment problem", Operational Research Quarterly 28 (1977) 167-179.
[10] Finke, G., Burkard, R., and Rendl, F., "Quadratic assignment problems", Annals of Discrete Math. 28 (1987) 61-82.
[11] Gilmore, P.C., "Optimal and suboptimal algorithms for the quadratic assignment problem", SIAM Journal on Applied Math. 10
(1962) 305-313.
[12] Koopmans, T.C., and Beckman, M.J., "Assignment problems and the location of economic activities", Econometrica 25 (1957)
53-76.
[13] Lawler, E., "The quadratic assignment problem", Management Science 9 (1963) 586-599.
[14] Mans, B., "Contribution à l'algorithmique non numérique parallèle: Parallélisations de méthodes de recherche arborescentes", Thèse d'université, Université Paris VI, 4, place Jussieu, 75252 Paris cedex 05, June 1992.
[15] Mautor, T., "Contribution à la résolution des problèmes d'implantation: Algorithmes séquentiels et parallèles pour l'affectation quadratique", Thèse d'université, Université Paris VI, 4, place Jussieu, 75252 Paris cedex 05, February 1993.
[16] Mautor, T., and Roucairol, C., "A new exact algorithm for the solution of quadratic assignment problems", Discrete Applied Mathematics, to appear, 1993. MASI-RR-92-09, Université de Paris 6, 4 place Jussieu, 75252 Paris cedex 05.
[17] Mühlenbein, H., "Parallel genetic algorithms, population genetics and combinatorial optimization", in: J.D. Becker, I. Eisele, and F.W. Mundemann (eds.), Parallelism, Learning, Evolution; Workshop on Evolutionary Models and Strategies and Workshop on Parallel Processing: Logic, Organization and Technology WOPPLOT89, Springer-Verlag, Berlin, July 1989.
[18] Nugent, C., Vollmann, T., and Ruml, J., "An experimental comparison of techniques for the assignment of facilities to locations", Operations Research 16 (1968) 150-173.
[19] Palubetskes, G.S., "Generation of quadratic assignment test problems with known optimal solution", Zhurnal Vychislitel'noi Matematiki i Matematicheskoi Fiziki 28(11) (1988) 1740-1743 (in Russian).
[20] Rendl, F., and Wolkowicz, H., "Applications of parametric programming and eigenvalue maximization to the quadratic assignment problem", Mathematical Programming (1992) 63-78.
[21] Roucairol, C., "A parallel branch and bound algorithm for the quadratic assignment problem", Discrete Applied Mathematics
18 (1987) 211-225.
[22] Sahni, S., and Gonzalez, T., "P-complete approximation problems", Journal of the ACM 23 (1976) 555-565.
[23] Scriabin, M., and Vergin, R.C., "Comparison of computer algorithms and visual based methods for plant layout", Management Science 22 (1975) 172-187.
[24] Taillard, E., "Robust tabu search for the quadratic assignment problem", Parallel Computing 17 (1991) 443-455.
[25] Wang, Jun, "A parallel distributed processor for the quadratic assignment problem", in: Proceedings of the INNC90, International Neural Network Conference 1, 278-281, Paris, France, July 1990.