<Abstract>
This study compares traditional and deep-learning imputation techniques for missing values in green algae and water quality data. Using data from the Daecheong Dam area, four variables (green algae cell count, chlorophyll-a, water temperature, and total phosphorus) were analyzed under artificially induced missing-data scenarios. Performance evaluation revealed that for green algae cell counts, NAOMI achieved the lowest RMSE (2039.66) and MAPE (164.55) for short missing periods, while BRITS underperformed for longer gaps. For chlorophyll-a, KNN outperformed the other methods, with RMSE values as low as 1.89. Linear interpolation excelled for the stable variables, water temperature and total phosphorus, with RMSEs of 1.15 and 0.0064, respectively. These results underscore the adaptability of advanced models like BRITS and NAOMI in handling complex temporal patterns, whereas simpler methods, such as KNN and linear interpolation, proved sufficient for variables with linear trends or minimal variability. The study emphasizes the importance of aligning imputation strategies with data characteristics to ensure robust water quality analysis.
Key Words: green algae, missing data imputation, BRITS, NAOMI
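The evaluation protocol summarized in the abstract (hide known values, impute them, then score the estimates with RMSE and MAPE) can be illustrated with a minimal sketch. This is not the study's implementation: the readings are hypothetical values chosen for illustration, the function names `linear_interpolate`, `rmse`, and `mape` are assumptions introduced here, and only linear interpolation is shown because it needs no external library.

```python
import math


def linear_interpolate(series):
    """Fill None gaps by straight-line interpolation between the nearest
    observed neighbours; gaps at either end copy the nearest observed value."""
    observed = [i for i, v in enumerate(series) if v is not None]
    if not observed:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        left = max((j for j in observed if j < i), default=None)
        right = min((j for j in observed if j > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            span = series[right] - series[left]
            filled[i] = series[left] + span * (i - left) / (right - left)
    return filled


def rmse(truth, pred):
    """Root mean squared error over paired values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth))


def mape(truth, pred):
    """Mean absolute percentage error, in percent; undefined if a true value is 0."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(truth, pred)) / len(truth)


# Hypothetical water-temperature readings (degrees C), NOT data from the study.
truth = [10.0, 12.0, 15.0, 15.0, 18.0, 20.0]

# Artificially induce a short contiguous gap at positions 2 and 3.
gap = [2, 3]
masked = [None if i in gap else v for i, v in enumerate(truth)]

imputed = linear_interpolate(masked)

# Score only the positions that were hidden.
held_out = [truth[i] for i in gap]
estimates = [imputed[i] for i in gap]
print("RMSE:", rmse(held_out, estimates))
print("MAPE (%):", mape(held_out, estimates))
```

Scoring only the masked positions keeps the error from being diluted by values the method never had to estimate. Note that MAPE divides by the true value, so it can become very large for variables with small or highly variable counts.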