
Dataset Information


Machine part data with part-of relations and part dissimilarities for planted partition generation.


ABSTRACT: Identifying relationships between entities in data is a central topic across various industries and businesses, from social networks to supply chain and heavy manufacturing industries. In this paper we present data from a database of machinery represented in terms of machine parts. The machine parts are originally organised in tree structures where the vertices are machine part types and the edges are "part-of" relations. Hence, each tree represents a type of machinery broken down into its constituent machine part types. The data we present is the union over these trees, making up a directed acyclic graph describing the type hierarchy of the machine parts. The motivation for publishing the dataset is the following real-world industry problem: each tree represents a mechanical design, and over time some designs have been copy-pasted with minor modifications. The new instances have been given new identifiers with no reference to where they were copied from. In hindsight, it is desirable to recover the copy-paste links to allow for interchange between essentially identical designs. However, telling which parts are copies of which other parts has turned out to be difficult. In particular, the metadata tends to display higher similarities within a composite part than between a part and its copy. Due to non-disclosure, we cannot provide the metadata, but we provide element-wise dissimilarities that are generated from the metadata using classical methods such as Jaccard similarity on description texts, material types, etc. The dissimilarities are obtained from a data science project in the company owning the data, which aims to tackle the very problem of recovering the copy-paste links.
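As an illustration of the kind of dissimilarity mentioned above, the following is a minimal sketch of a Jaccard dissimilarity between two description texts. The whitespace tokenisation, the function name `jaccard_dissimilarity`, and the example descriptions are assumptions made for illustration; the real metadata is not published, and the dataset's dissimilarities may have been computed differently.

```python
def jaccard_dissimilarity(a: str, b: str) -> float:
    """Return 1 - |A & B| / |A | B| over lowercase whitespace tokens."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    union = tokens_a | tokens_b
    if not union:
        # Assumption: two empty descriptions are treated as identical.
        return 0.0
    return 1.0 - len(tokens_a & tokens_b) / len(union)


# Hypothetical part descriptions, not taken from the dataset.
# They share 3 of 5 distinct tokens, so the dissimilarity is 1 - 3/5.
d = jaccard_dissimilarity("hex bolt steel m8", "hex bolt steel m10")
print(round(d, 3))
```

A value of 0 means the two token sets coincide, and 1 means they share no tokens.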
Labeled data for this dataset is scarce, so, based on our in-depth knowledge of the problem domain, we present a data synthesisation method that can generate arbitrarily large instances of the copy-paste problem from the sample data while providing a realistic representation of the real-world problem. The problems are presented as planted partitions of vertices of directed acyclic graphs with vertex dissimilarities, and thus constitute a typical classification problem along the lines of graph or network clustering. The type of industry data we present is usually company confidential, bound by intellectual property rights, and generally not available to scientists. We therefore publish this anonymised dataset to offer real-world sample data and generated problem instances for researchers interested in this type of classification problem, on which theories and algorithms can be tested. The data and the problem generation methodology are backed by a Python implementation, providing both data access and an API for parameterised problem generation. The data is also available as raw files.
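The idea of a planted partition over a part-of graph can be sketched as follows. This is only a toy illustration under stated assumptions, not the published generator or its API: the function `plant_copies`, the edge-list representation, and the example part names are hypothetical, and the sketch omits the vertex dissimilarities and any design modifications. It copies a small directed acyclic graph several times under fresh integer identifiers and records which new vertices stem from the same original vertex.

```python
from itertools import count


def plant_copies(edges, n_copies):
    """Copy a DAG n_copies times under fresh integer identifiers.

    edges is a list of (part, whole) pairs, i.e. "part-of" relations.
    Returns (new_edges, partition), where partition is a list of sets,
    each holding the new identifiers copied from one original vertex.
    """
    vertices = sorted({v for edge in edges for v in edge})
    fresh = count()
    # Each (original vertex, copy index) pair gets an anonymous new id.
    new_id = {(v, k): next(fresh) for k in range(n_copies) for v in vertices}
    new_edges = [
        (new_id[(part, k)], new_id[(whole, k)])
        for k in range(n_copies)
        for part, whole in edges
    ]
    # The planted partition: group the new ids by their original vertex.
    blocks = {}
    for (v, _), i in new_id.items():
        blocks.setdefault(v, set()).add(i)
    return new_edges, list(blocks.values())


# Hypothetical part-of relations, not taken from the dataset.
edges = [("bolt", "flange"), ("flange", "pump"), ("bolt", "pump")]
new_edges, partition = plant_copies(edges, n_copies=2)
print(len(new_edges), partition)
```

A clustering algorithm under test would receive only `new_edges` (and, in the real problem, the dissimilarities) and would be scored on how well it recovers `partition`.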

SUBMITTER: Bakkelund D 

PROVIDER: S-EPMC8971586 | biostudies-literature | 2022 Jun

REPOSITORIES: biostudies-literature


Publications

Machine part data with part-of relations and part dissimilarities for planted partition generation.

Bakkelund Daniel D  

Data in Brief, 2022-03-21



Similar Datasets

| S-EPMC7946989 | biostudies-literature
| S-EPMC8655230 | biostudies-literature
| S-EPMC3309785 | biostudies-literature
| S-EPMC8132444 | biostudies-literature
| S-EPMC7934511 | biostudies-literature
| S-EPMC10673950 | biostudies-literature
| S-EPMC8610268 | biostudies-literature
| S-EPMC5227707 | biostudies-literature
| S-EPMC8984386 | biostudies-literature
| S-EPMC2815475 | biostudies-literature