Introduced in Unsupervised Anomaly Detection for Auditing Data and Impact of Categorical Encodings (2022)
The code used to create the dataset is available here. The dataset used in the paper is available on GitHub.
Maker - Categorical - The brand of the vehicle.
GenModel - Categorical - The model of the vehicle.
Color - Categorical - The colour of the vehicle.
Reg_Year - Categorical - Year of registration.
Body_Type - Categorical - Body style, e.g. SUV, Convertible.
Runned_Miles - Numerical - Distance covered by the vehicle.
Engin_Size - Categorical - Engine size.
GearBox - Categorical - Transmission type, e.g. Automatic, Manual.
FuelType - Categorical - Fuel type, e.g. Petrol, Diesel.
Price - Numerical - Price of the vehicle.
Seat_num - Numerical - Number of seats.
Door_num - Numerical - Number of doors.
issue - Categorical - Type of damage.
issue_id - Categorical - Specific damage.
repair_complexity - Categorical - Difficulty of repairing the vehicle.
repair_hours - Numerical - Time required to complete the repair.
repair_cost - Numerical - Cost of the repair.

Other attributes are not used for evaluation in this work.
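Because most of the attributes listed above are categorical, they have to be encoded numerically before most anomaly detectors can use them. The sketch below records the column types from the attribute list and applies one-hot encoding, which is only one of several possible encodings and is not necessarily the one used in the paper. The names SCHEMA and one_hot_encode are hypothetical, and the example rows are made-up illustrative values rather than records from the dataset.

```python
from typing import Any, Dict, List

# Column types taken from the attribute list; SCHEMA is a hypothetical name.
SCHEMA: Dict[str, str] = {
    "Maker": "categorical", "GenModel": "categorical", "Color": "categorical",
    "Reg_Year": "categorical", "Body_Type": "categorical",
    "Runned_Miles": "numerical", "Engin_Size": "categorical",
    "GearBox": "categorical", "FuelType": "categorical",
    "Price": "numerical", "Seat_num": "numerical", "Door_num": "numerical",
    "issue": "categorical", "issue_id": "categorical",
    "repair_complexity": "categorical",
    "repair_hours": "numerical", "repair_cost": "numerical",
}


def one_hot_encode(rows: List[Dict[str, Any]]) -> List[Dict[str, float]]:
    """One-hot encode categorical columns and pass numerical columns through
    as floats. Columns missing from a row, or absent from SCHEMA, are skipped."""
    # Collect the sorted distinct values observed for each categorical column.
    categories: Dict[str, List[str]] = {
        col: sorted({str(r[col]) for r in rows if col in r})
        for col, kind in SCHEMA.items()
        if kind == "categorical"
    }
    encoded: List[Dict[str, float]] = []
    for r in rows:
        out: Dict[str, float] = {}
        for col, kind in SCHEMA.items():
            if col not in r:
                continue
            if kind == "numerical":
                out[col] = float(r[col])
            else:
                # One indicator feature per observed category value.
                for value in categories[col]:
                    out[f"{col}={value}"] = 1.0 if str(r[col]) == value else 0.0
        encoded.append(out)
    return encoded


# Illustrative rows only (not taken from the dataset).
rows = [
    {"Maker": "Audi", "GearBox": "Manual", "Price": 15000},
    {"Maker": "BMW", "GearBox": "Automatic", "Price": 21000},
]
encoded = one_hot_encode(rows)
```

One-hot encoding keeps every category distinct but grows the feature count with the number of distinct values, which matters for high-cardinality columns such as GenModel or issue_id.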
The breakdown_date and repair_date attributes were added so that anomalies could be inserted based on the number of days required to repair the vehicle.
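The description does not specify how such duration-based anomalies would be inserted. The sketch below shows one hypothetical scheme: compute the repair duration as repair_date minus breakdown_date, then push repair_date later by a fixed number of days for a random fraction of records and label those records as anomalies. The helper names repair_days and inject_duration_anomalies, and the parameters fraction and extra_days, are assumptions for illustration only.

```python
import random
from datetime import date, timedelta
from typing import Any, Dict, List, Tuple


def repair_days(record: Dict[str, Any]) -> int:
    """Days between breakdown_date and repair_date (hypothetical helper)."""
    return (record["repair_date"] - record["breakdown_date"]).days


def inject_duration_anomalies(
    records: List[Dict[str, Any]], fraction: float, extra_days: int, seed: int = 0
) -> Tuple[List[Dict[str, Any]], List[int]]:
    """Hypothetical scheme: delay repair_date by extra_days for a random
    fraction of records. Returns modified copies plus labels (1 = anomaly);
    the input records are left unchanged."""
    rng = random.Random(seed)
    n_anomalies = round(len(records) * fraction)
    chosen = set(rng.sample(range(len(records)), n_anomalies))
    out: List[Dict[str, Any]] = []
    labels: List[int] = []
    for i, rec in enumerate(records):
        rec = dict(rec)  # shallow copy so the original record is not mutated
        if i in chosen:
            rec["repair_date"] = rec["repair_date"] + timedelta(days=extra_days)
            labels.append(1)
        else:
            labels.append(0)
        out.append(rec)
    return out, labels


# Illustrative records: every repair takes 5 days before injection.
base = [
    {"breakdown_date": date(2020, 1, 1), "repair_date": date(2020, 1, 6)}
    for _ in range(10)
]
modified, labels = inject_duration_anomalies(base, fraction=0.2, extra_days=60)
```

With these settings, two of the ten records end up with a 65-day repair duration and are labelled 1, while the rest keep the original 5-day duration.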