Duplicate Product Lines

This page details our solution to the Define Duplicate Product Lines challenge from dmcommunity.org

Define Duplicate Product Lines

Let’s assume that your organization handles sales orders similar to the one below:

Product SKU Price Quantity
SKU-1000 10 10
SKU-1 20 3
SKU-2 30 1
SKU-1 20 4
SKU-3 40 6
SKU-3 42 8
SKU-4 15 2

Create a decision model that is capable to decide which product lines inside such orders are duplicate. The duplicate product lines can be recognized by user-defined similarity rules, e.g. two product lines are considered “duplicate” if they have the same “sku” and “price”.

To solve this challenge, we simply need to check whether two orders have the same properties. To be able to model these properties, we first define them in our glossary. We create a type for every product (one to seven), a type containing the possible SKU’s, and a type representing the quantities.

Type
Name Type Values
Product Int [1..7]
SKU String SKU1000, SKU1, SKU2, SKU3, SKU4
Quantity Int [0..100]

Because every product has exactly one SKU, Price and Quantity, we model these properties using functions.

Function
Name Type
sku of Product SKU
price of Product Int
quantity of Product Quantity

We also need a way to express that two products are duplicates of each other. For this purpose, we introduce a relation.

Relation
Name
Product is duplicate of Product

Now our glossary is complete, and we can start to add logic. For this specific challenge, the logic is simple: if two products share the same SKU and Price, they are duplicates. We express this in a table as follows.

Define duplicates
U Product called p1 Product called p2 sku of p1 price of p1 p1 is duplicate of p2
1 - not(p1) sku of p2 price of p2 Yes

In other words, the table above states “For every Product p1 and p2 (not equal to p1), if they have the same SKU and the same Price, they are duplicates.” Or, “Duplicates are defined as two products that share the same SKU and Price”.

Now, all that remains is add in our data table containing the product information.

Data Table Product
Product sku of Product price of Product quantity of Product
1 1 SKU1000 10 10
2 2 SKU1 20 3
3 3 SKU2 30 1
4 4 SKU1 20 4
5 5 SKU3 40 6
6 6 SKU3 42 8
7 7 SKU4 15 2

Running the solver using this model gives us the following solution.

Model 1
==========
sku_of_Product := {1->SKU1000, 2->SKU1, 3->SKU2, 4->SKU1, 5->SKU3, 6->SKU3, 7->SKU4}.
price_of_Product := {1->10, 2->20, 3->30, 4->20, 5->40, 6->42, 7->15}.
quantity_of_Product := {1->10, 2->3, 3->1, 4->4, 5->6, 6->8, 7->2}.
Product_is_duplicate_of_Product := {(2,4), (4,2)}.

Elapsed Time:
0.871