
Dataset Information

Dataset for file fragment classification of textual file formats.


ABSTRACT: OBJECTIVES: Classification of textual file formats is a topic of interest in network forensics. A few datasets of files in textual formats are publicly available, but there is no public dataset of file fragments of textual file formats. As a result, a major research challenge in file fragment classification of textual file formats is comparing the performance of the developed methods on the same datasets. DATA DESCRIPTION: In this study, we present a dataset that contains file fragments of five textual file formats: Binary file format for Word 97-Word 2003, Microsoft Word open XML format, portable document format, rich text file, and standard text document. The dataset contains file fragments in three languages: English, Persian, and Chinese. For each pair of file format and language, 1500 file fragments are provided, so the dataset contains 22,500 file fragments in total (5 formats × 3 languages × 1500 fragments).
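
The dataset size stated in the abstract can be checked with a short sketch. The short format labels (`doc`, `docx`, `pdf`, `rtf`, `txt`) and the function name below are illustrative assumptions; the record does not describe how the dataset is actually organized or named on disk.

```python
from itertools import product

# Short labels are assumptions standing in for the five formats named in the abstract:
# Word 97-2003 binary, Word Open XML, PDF, rich text, and plain text.
FORMATS = ["doc", "docx", "pdf", "rtf", "txt"]
LANGUAGES = ["English", "Persian", "Chinese"]
FRAGMENTS_PER_PAIR = 1500  # stated in the abstract


def fragment_counts():
    """Map every (format, language) pair to its number of fragments."""
    return {pair: FRAGMENTS_PER_PAIR for pair in product(FORMATS, LANGUAGES)}


counts = fragment_counts()
total = sum(counts.values())
print(f"{len(counts)} format-language pairs, {total} fragments in total")
```

With five formats and three languages there are 15 pairs, which at 1500 fragments each gives the 22,500 fragments reported.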

SUBMITTER: Mansouri Hanis F 

PROVIDER: S-EPMC6907108 | biostudies-literature | 2019 Dec

REPOSITORIES: biostudies-literature

Publications

Dataset for file fragment classification of textual file formats.

Mansouri Hanis Fatemeh F, Teimouri Mehdi M

BMC Research Notes 2019-12-11 (1)



Similar Datasets

| S-EPMC6881973 | biostudies-literature
| S-EPMC7160908 | biostudies-literature
| S-EPMC6925457 | biostudies-literature
| S-EPMC4874736 | biostudies-literature
| S-EPMC4906574 | biostudies-literature
| S-EPMC1681455 | biostudies-literature
| S-EPMC5998747 | biostudies-literature
| S-EPMC8661458 | biostudies-literature
| S-EPMC6554222 | biostudies-literature