
AID (Aerial Image Dataset) Usage Case



Introduction to the Dataset

AID is a large-scale aerial image dataset built from sample images collected from Google Earth imagery. It consists of 10,000 images classified into the following 30 classes:

airport, bare land, baseball field, beach, bridge, center, church, commercial, dense residential, desert, farmland, forest, industrial, meadow, medium residential, mountain, park, parking, playground, pond, port, railway station, resort, river, school, sparse residential, square, stadium, storage tanks, and viaduct.




Experiment Introduction

In this experiment, we assume that a randomly selected 30% of the dataset is labeled and held fixed as the validation set. From the remaining 70% of the data, a randomly selected 20% is labeled and used for training.

The efficient model was used as the model, and its accuracy after training on this 20% of the data is currently 90.1%.

The goal of this experiment is to improve the model's performance by selecting an additional 10% of the data for training under these assumed conditions.
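The split described above can be sketched in a few lines of Python. This is only an illustration, not the code used in the experiment: the function name `split_dataset` is hypothetical, and it assumes that the 20% labeled share and the 10% additional budget are both fractions of the 70% pool left after the validation set is removed.

```python
import random

def split_dataset(n_images, val_frac=0.30, labeled_frac=0.20, extra_frac=0.10, seed=0):
    """Return index lists for the validation set, the initially labeled
    training set and the unlabeled pool, plus the additional-label budget."""
    rng = random.Random(seed)
    indices = list(range(n_images))
    rng.shuffle(indices)

    # 30% of the whole dataset is fixed as the validation set.
    n_val = round(n_images * val_frac)
    val, pool = indices[:n_val], indices[n_val:]

    # Assumption: the 20% and 10% figures are fractions of the remaining pool.
    n_labeled = round(len(pool) * labeled_frac)
    labeled, unlabeled = pool[:n_labeled], pool[n_labeled:]
    budget = round(len(pool) * extra_frac)
    return val, labeled, unlabeled, budget

# AID contains 10,000 images.
val, labeled, unlabeled, budget = split_dataset(10_000)
print(len(val), len(labeled), len(unlabeled), budget)  # 3000 1400 5600 700
```

Under these assumptions, the selection methods compared in this experiment each choose 700 of the 5,600 unlabeled images.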




Results

Summary

| Use of DATUMO FST | Method of Selecting Additional Training Data | Percentage of Additional Training Data | Accuracy |
|---|---|---|---|
| No | Before Additional Training | 0% | 90.1% |
| No | Random Selection | 10% | 91.3% |
| No | Random Selection | 15% | 92.3% |
| Yes | Curation of Data Clustered in Distribution | 10% | 92.6% |
| Yes | Selection of Classes Dispersed in Distribution | 10% | 92.7% |
| No | Random Selection | 20% | 93.0% |

For reference, with the same 10% of additional training data, DATUMO FST improved accuracy by 2.5 to 2.6 percentage points, roughly twice the 1.2-point gain from random selection. Reaching a similar level of performance with random selection required about 1.7 times as much additional labeled data.
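The two ratios can be reproduced from the accuracy figures in the summary. The sketch below is a rough check rather than the authors' calculation: it assumes that the "1.7 times" figure comes from linear interpolation between the 15% and 20% random-selection results, which the source does not state.

```python
baseline = 90.1
random_acc = {10: 91.3, 15: 92.3, 20: 93.0}       # % additional data -> accuracy
fst_acc = {"clustered": 92.6, "dispersed": 92.7}  # both use 10% additional data

# Accuracy gain of each DATUMO FST method relative to the random 10% gain.
random_gain = random_acc[10] - baseline
gain_ratio = {k: (v - baseline) / random_gain for k, v in fst_acc.items()}

def random_pct_needed(target):
    """Estimate the random-selection percentage that reaches `target` accuracy,
    interpolating linearly between the 15% and 20% rows (an assumption)."""
    lo, hi = 15, 20
    slope = (random_acc[hi] - random_acc[lo]) / (hi - lo)
    return lo + (target - random_acc[lo]) / slope

# How much more randomly selected data is needed compared with the 10% used by FST.
data_ratio = {k: random_pct_needed(v) / 10 for k, v in fst_acc.items()}

print({k: round(v, 2) for k, v in gain_ratio.items()})  # {'clustered': 2.08, 'dispersed': 2.17}
print({k: round(v, 2) for k, v in data_ratio.items()})  # {'clustered': 1.71, 'dispersed': 1.79}
```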




Curation of Data Clustered in Distribution

Accuracy: 92.6%

After uploading the images to DATUMO FST, we examined the distribution of the already labeled train + validation set. It revealed separate areas occupied by individual classes as well as areas where multiple classes were clustered together.

We selected the clustered area and requested curation, then carried out additional training using the resulting 10% of the data.

[Figure: final data split]
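The idea of picking data from an area where several classes are clustered together can be illustrated with a small stand-in. DATUMO FST's own selection and curation procedure is not described here, so the sketch below substitutes a simple k-nearest-neighbor purity score; the function names, the toy 2-D embeddings and the choice of `k` are all assumptions made for illustration.

```python
import math
from collections import Counter

def neighbor_purity(point, labeled_points, labeled_classes, k=3):
    """Share of the k nearest labeled neighbors that belong to the majority class."""
    nearest = sorted(range(len(labeled_points)),
                     key=lambda i: math.dist(point, labeled_points[i]))[:k]
    counts = Counter(labeled_classes[i] for i in nearest)
    return counts.most_common(1)[0][1] / k

def select_mixed_region(unlabeled_points, labeled_points, labeled_classes, budget, k=3):
    """Return indices of the `budget` unlabeled points with the lowest purity,
    i.e. those lying where labeled classes overlap."""
    scored = [(neighbor_purity(p, labeled_points, labeled_classes, k), i)
              for i, p in enumerate(unlabeled_points)]
    scored.sort()
    return [i for _, i in scored[:budget]]

# Toy 2-D embeddings: two well-separated classes and one overlapping area.
labeled_points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),                 # forest
                  (5.0, 5.0), (5.2, 5.1), (4.9, 5.3),                 # river
                  (2.4, 2.5), (2.6, 2.5), (2.5, 2.7), (2.5, 2.3)]     # park / square mixed
labeled_classes = ["forest"] * 3 + ["river"] * 3 + ["park", "square", "park", "square"]
unlabeled_points = [(0.1, 0.1), (5.1, 5.0), (2.5, 2.5), (2.55, 2.6)]

selected = select_mixed_region(unlabeled_points, labeled_points, labeled_classes, budget=2)
print(selected)  # [2, 3]
```

In this toy example, the two unlabeled points that fall inside the overlapping park/square area are selected, while the points sitting in single-class areas are skipped.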




Selection of Classes Dispersed in Distribution

Accuracy: 92.7%

We examined the feature space using vectors extracted from the model trained on the original 20% of the data and selected the classes that appeared to be widely dispersed in the distribution. We then ran inference on the remaining 80% of the data, which is unlabeled, with the previously trained model, curated the data predicted as the selected classes, and chose 10% of the data for additional training.
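The steps in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure implemented in DATUMO FST: dispersion is measured here as the mean distance of a class's feature vectors to their centroid, and the function names, toy features and predicted classes are hypothetical.

```python
import math
from collections import defaultdict

def class_dispersion(features, classes):
    """Mean distance from each sample to its class centroid, per class."""
    groups = defaultdict(list)
    for f, c in zip(features, classes):
        groups[c].append(f)
    dispersion = {}
    for c, pts in groups.items():
        centroid = [sum(col) / len(pts) for col in zip(*pts)]
        dispersion[c] = sum(math.dist(p, centroid) for p in pts) / len(pts)
    return dispersion

def select_from_dispersed(dispersion, predicted_classes, n_classes, budget):
    """Pick up to `budget` unlabeled indices predicted as one of the
    `n_classes` most dispersed classes (first come, first served)."""
    top = sorted(dispersion, key=dispersion.get, reverse=True)[:n_classes]
    candidates = [i for i, c in enumerate(predicted_classes) if c in top]
    return top, candidates[:budget]

# Toy feature vectors from the labeled 20% (stand-ins for model embeddings).
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),   # beach: tight
            (3.0, 3.0), (5.0, 1.0), (1.0, 5.0),   # school: dispersed
            (8.0, 8.0), (9.0, 8.0)]               # resort: in between
classes = ["beach"] * 3 + ["school"] * 3 + ["resort"] * 2

# Classes predicted by the trained model for the unlabeled data (hypothetical).
predicted = ["beach", "school", "resort", "school", "beach"]

dispersion = class_dispersion(features, classes)
top_classes, selected = select_from_dispersed(dispersion, predicted, n_classes=1, budget=2)
print(top_classes, selected)  # ['school'] [1, 3]
```

A real run would replace the toy vectors with the features extracted from the trained model and would likely rank or sample the candidates rather than take the first ones found.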