Challenge

Quadratic Knapsack Problem

Gene cluster analysis: Understanding genetic diseases.

Problem Overview

The quadratic knapsack problem is one of the most popular variants of the single knapsack problem, with applications in many optimization contexts.

The aim is to maximize the value of individual items placed in the knapsack while satisfying a weight constraint. However, pairs of items also have positive interaction values, contributing to the total value within the knapsack.

Challenge Overview

For our challenge, we use a version of the quadratic knapsack problem with configurable difficulty, where the following two parameters can be adjusted in order to vary the difficulty of the challenge:

  • Parameter 1: num_itemsnum\_items is the number of items from which you need to select a subset to put in the knapsack.
  • Parameter 2: better_than_baseline1better\_than\_baseline \geq 1 is the factor by which a solution must be better than the baseline value.

The larger the num_itemsnum\_items, the more number of possible SknapsackS_{knapsack}, making the challenge more difficult. Also, the higher better_than_baselinebetter\_than\_baseline, the less likely a given SknapsackS_{knapsack} will be a solution, making the challenge more difficult.

The weight wiw_i of each of the num_itemsnum\_items is an integer, chosen independently, uniformly at random, and such that each of the item weights 1wi501 \leq w_i \leq 50, for i=1,2,...,num_itemsi=1,2,...,num\_items. The values of the items are nonzero with a density of 25%, meaning they have a 25% probability of being nonzero. The nonzero individual values of the item, viv_i, and the nonzero interaction values of pairs of items, VijV_{ij}, are selected at random from the range [1,100][1,100].

The total value of a knapsack is determined by summing up the individual values of items in the knapsack, as well as the interaction values of every pair of items (i,j)(i,j), where i>ji > j, in the knapsack:

Vknapsack=iknapsackvi+(i,j)knapsackVijV_{knapsack} = \sum_{i \in knapsack}{v_i} + \sum_{(i,j)\in knapsack}{V_{ij}}

We impose a weight constraint W(Sknapsack)0.5W(Sall)W(S_{knapsack}) \leq 0.5 \cdot W(S_{all}), where the knapsack can hold at most half the total weight of all items.

Example

Consider an example of a challenge instance with num_items=4num\_items=4 and better_than_baseline=1.50better\_than\_baseline = 1.50. Let the baseline value be 46:

weights = [39, 29, 15, 43]
individual_values = [0, 14, 0, 75]
interaction_values = [ 0,  0,  0,  0
                       0,  0, 32,  0
                       0, 32,  0,  0
                       0,  0,  0,  0]
max_weight = 63
min_value = baseline*better_than_baseline = 69

The objective is to find a set of items where the total weight is at most 63 but has a total value of at least 69. Now consider the following selection:

selected_items = [2, 3]

When evaluating this selection, we can confirm that the total weight is less than 63, and the total value is more than 69, thereby this selection of items is a solution:

  • Total weight = 15 + 43 = 58
  • Total value = 0 + 75 + 0 = 75

Our Challenge

In TIG, the baseline value is determined by a two-stage approach. First, items are selected based on their value-to-weight ratio, including interaction values, until the capacity is reached. Then, a tabu-based local search refines the solution by swapping items to improve value while avoiding reversals, with early termination for unpromising swaps.

Applications

The Knapsack problems have a wide variety of practical applications. The use of knapsack in integer programming led to breakthroughs in several disciplines, including energy management and cellular network frequency planning.

Although originally studied in the context of logistics, Knapsack problems appear regularly in diverse areas of science and technology. For example, in gene expression data, there are usually thousands of genes, but only a subset of them are informative for a specific problem. The Knapsack Problem can be used to select a subset of genes (items) that maximizes the total information (value) without exceeding the limit of the number of genes that can be included in the analysis (weight limit).

Microarray clustering of differentially expressed genes in blood

Figure 1: Microarray clustering of differentially expressed genes in blood. Genes are clustered in rows, with red indicating high expression, yellow intermediate expression and blue low expression. The Knapsack problem is used to analyse gene expression clustering.