AID (Aerial Image Dataset) Use Case

Introduction to the Dataset

AID is a large-scale aerial image dataset built from sample images collected from Google Earth imagery. It contains 10,000 images divided into the following 30 classes:

airport, bare land, baseball field, beach, bridge, center, church, commercial, dense residential, desert, farmland, forest, industrial, meadow, medium residential, mountain, park, parking, playground, pond, port, railway station, resort, river, school, sparse residential, square, stadium, storage tanks, and viaduct.

Dataset Introduction Link

Experiment Introduction

In this experiment, we assume that a randomly selected 30% of the dataset is labeled and held fixed as the validation set. From the remaining 70%, a randomly selected 20% is labeled and used for training.
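The split described above can be sketched as follows. This is a minimal illustration, not the code used in the experiment: the function name `split_indices`, the seed, and the reading that the 20% is taken from the remaining 70% pool are assumptions.

```python
import random

def split_indices(n_images=10_000, val_frac=0.30, labeled_frac=0.20, seed=0):
    """Return (validation, labeled_train, unlabeled_pool) index lists.

    Hypothetical helper: the fractions follow the text, everything else
    (name, seed, use of integer indices) is an assumption.
    """
    rng = random.Random(seed)
    indices = list(range(n_images))
    rng.shuffle(indices)

    # Fixed, labeled validation set: 30% of the whole dataset.
    n_val = round(n_images * val_frac)
    val, pool = indices[:n_val], indices[n_val:]

    # Initial labeled training set: 20% of the remaining pool.
    n_labeled = round(len(pool) * labeled_frac)
    labeled, unlabeled = pool[:n_labeled], pool[n_labeled:]
    return val, labeled, unlabeled

val, labeled, unlabeled = split_indices()
print(len(val), len(labeled), len(unlabeled))
```

Under this reading, the unlabeled part of the pool is what the additional training data is later selected from.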

The efficient model was used, and the model trained with the 20% labeled data currently reaches an accuracy of 90.1%.

The goal of this experiment is to select an additional 10% of data for training, under the conditions assumed above, so as to improve the model's performance.

Results

Summary

DATUMO FST Used	Method of Selecting Additional Training Data	Additional Training Data	Accuracy
No	Before Additional Training	0%	90.1%
No	Random Selection	10%	91.3%
No	Random Selection	15%	92.3%
Yes	Curation of Data Clustered in Distribution	10%	92.6%
Yes	Selection of Variably Distributed Classes	10%	92.7%
No	Random Selection	20%	93.0%

For reference, with the same 10% of additional data, the accuracy gain with DATUMO FST (2.5 to 2.6 percentage points) was about twice the gain from random selection (1.2 percentage points). Reaching a similar level of accuracy with random selection required about 1.7 times as much labeled data.
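The two ratios can be checked against the summary table with a few lines of arithmetic. The accuracy figures below are copied from the table; the linear interpolation between the 15% and 20% random-selection results is an assumption made only for this check, and the names are illustrative.

```python
# Accuracy figures (percent) copied from the summary table.
baseline = 90.1
random_acc = {10: 91.3, 15: 92.3, 20: 93.0}      # additional data % -> accuracy
fst_acc = {"clustered": 92.6, "variable": 92.7}  # both use 10% additional data

# Gain over the baseline, in percentage points.
random_gain = random_acc[10] - baseline
fst_gain = {name: acc - baseline for name, acc in fst_acc.items()}
gain_ratio = {name: gain / random_gain for name, gain in fst_gain.items()}

def random_percent_needed(target):
    """Additional data % that random selection needs to reach `target`.

    Assumption: accuracy grows linearly between measured points.
    """
    points = sorted(random_acc.items())
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= target <= y1:
            return x0 + (x1 - x0) * (target - y0) / (y1 - y0)
    raise ValueError("target is outside the measured range")

for name, acc in fst_acc.items():
    data_ratio = random_percent_needed(acc) / 10
    print(f"{name}: gain x{gain_ratio[name]:.2f}, data x{data_ratio:.2f}")
```

Both gain ratios come out slightly above 2, and both data ratios fall between 1.7 and 1.8, which is consistent with the figures quoted above.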

Curation of Data Clustered in Distribution

Accuracy: 92.6%

After uploading the images to DATUMO FST, we examined the distribution of the pre-labeled train+validation set. It showed areas occupied by a single class as well as areas where multiple classes were clustered together.

We selected the clustered area, requested curation, and carried out additional training using 10% of the selected data.

Final Data Split Image
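In DATUMO FST the clustered areas are found by inspecting the distribution visually, and the source does not describe an underlying algorithm. As a rough, hypothetical illustration of what "areas where multiple classes are clustered together" means, the sketch below scores each labeled point by how many of its nearest neighbours carry a different label. The function `mixing_scores`, the value of `k`, and the toy 2-D points are all assumptions.

```python
import math

def mixing_scores(points, labels, k=3):
    """Fraction of each point's k nearest neighbours with a different label.

    Hypothetical helper: 0.0 means the point sits in a single-class area,
    higher values mean several classes overlap around it.
    """
    scores = []
    for i, p in enumerate(points):
        neighbours = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )[:k]
        scores.append(sum(labels[j] != labels[i] for j in neighbours) / k)
    return scores

# Toy 2-D embedding: "forest" forms its own area, "school" and "church" overlap.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (5, 5), (5, 6), (6, 5), (5.5, 5.5), (5.2, 5.4)]
labels = ["forest"] * 4 + ["school", "church", "school", "church", "school"]

scores = mixing_scores(points, labels)
mixed = [i for i, s in enumerate(scores) if s > 0]
print(mixed)  # indices of points lying in a mixed-class area
```

Unlabeled images that fall near such mixed points would be the natural candidates for curation under this interpretation.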

Selection of Variably Distributed Classes

Accuracy: 92.7%

We examined the feature space formed by the vectors extracted from the model trained on the original 20% of the data, and selected the classes that appeared to be variably distributed. We then ran inference on the remaining 80% of the data, which is unlabeled, with the previously trained model, curated the data predicted to belong to the selected classes, and selected 10% of the data for additional training.
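The procedure above can be sketched in two steps. The source does not say how a class is judged to be variably distributed, so the dispersion measure used here (mean distance to the class centroid) is an assumption, as are the helper names and the toy data.

```python
import math
import random
from collections import defaultdict

def class_dispersion(features, labels):
    """Mean Euclidean distance of each class's vectors to the class centroid."""
    groups = defaultdict(list)
    for vec, label in zip(features, labels):
        groups[label].append(vec)
    dispersion = {}
    for label, vecs in groups.items():
        centroid = [sum(col) / len(vecs) for col in zip(*vecs)]
        dispersion[label] = sum(math.dist(v, centroid) for v in vecs) / len(vecs)
    return dispersion

def select_for_labeling(predicted, chosen_classes, budget, seed=0):
    """Sample up to `budget` unlabeled items predicted as one of the chosen classes."""
    candidates = [i for i, label in enumerate(predicted) if label in chosen_classes]
    rng = random.Random(seed)
    return sorted(rng.sample(candidates, min(budget, len(candidates))))

# Step 1: feature vectors of the labeled data (toy values).
features = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (1.0, 1.0), (1.1, 1.0), (1.0, 1.1)]
labels = ["park", "park", "park", "beach", "beach", "beach"]
dispersion = class_dispersion(features, labels)
chosen = {max(dispersion, key=dispersion.get)}  # most widely spread class

# Step 2: model predictions on the unlabeled data (toy values).
predicted = ["park", "beach", "park", "park", "beach"]
picked = select_for_labeling(predicted, chosen, budget=2)
print(chosen, picked)
```

In practice the budget would correspond to the 10% of data selected for additional training, and more than one class could be chosen.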
