Duplicate Product Lines
This page details our solution to the Define Duplicate Product Lines challenge from dmcommunity.org
Define Duplicate Product Lines
Let’s assume that your organization handles sales orders similar to the one below:
Product SKU
Price
Quantity
SKU-1000
10
10
SKU-1
20
3
SKU-2
30
1
SKU-1
20
4
SKU-3
40
6
SKU-3
42
8
SKU-4
15
2
Create a decision model that is capable to decide which product lines inside such orders are duplicate. The duplicate product lines can be recognized by user-defined similarity rules, e.g. two product lines are considered “duplicate” if they have the same “sku” and “price”.
To solve this challenge, we simply need to check whether two orders have the same properties. To be able to model these properties, we first define them in our glossary. We create a type for every product (one to seven), a type containing the possible SKU’s, and a type representing the quantities.
Type | ||
---|---|---|
Name | Type | Values |
Product | Int | [1..7] |
SKU | String | SKU1000, SKU1, SKU2, SKU3, SKU4 |
Quantity | Int | [0..100] |
Because every product has exactly one SKU, Price and Quantity, we model these properties using functions.
Function | |
---|---|
Name | Type |
sku of Product | SKU |
price of Product | Int |
quantity of Product | Quantity |
We also need a way to express that two products are duplicates of each other. For this purpose, we introduce a relation.
Relation |
---|
Name |
Product is duplicate of Product |
Now our glossary is complete, and we can start to add logic. For this specific challenge, the logic is simple: if two products share the same SKU and Price, they are duplicates. We express this in a table as follows.
Define duplicates | |||||
---|---|---|---|---|---|
U | Product called p1 | Product called p2 | sku of p1 | price of p1 | p1 is duplicate of p2 |
1 | - | not(p1) | sku of p2 | price of p2 | Yes |
In other words, the table above states “For every Product p1 and p2 (not equal to p1), if they have the same SKU and the same Price, they are duplicates.” Or, “Duplicates are defined as two products that share the same SKU and Price”.
Now, all that remains is add in our data table containing the product information.
Product | ||||
---|---|---|---|---|
D | Product | sku of Product | price of Product | quantity of Product |
1 | 1 | SKU1000 | 10 | 10 |
2 | 2 | SKU1 | 20 | 3 |
3 | 3 | SKU2 | 30 | 1 |
4 | 4 | SKU1 | 20 | 4 |
5 | 5 | SKU3 | 40 | 6 |
6 | 6 | SKU3 | 42 | 8 |
7 | 7 | SKU4 | 15 | 2 |
Running the solver using this model gives us the following solution.
Model 1
==========
sku_of_Product := {1->SKU1000, 2->SKU1, 3->SKU2, 4->SKU1, 5->SKU3, 6->SKU3, 7->SKU4}.
price_of_Product := {1->10, 2->20, 3->30, 4->20, 5->40, 6->42, 7->15}.
quantity_of_Product := {1->10, 2->3, 3->1, 4->4, 5->6, 6->8, 7->2}.
Product_is_duplicate_of_Product := {(2,4), (4,2)}.
Elapsed Time:
0.871