cleansing-data

Here are 21 public repositories matching this topic...

MultiReplace is a Notepad++ plugin for advanced multi-string replacements. It supports reusable lists, CSV column targeting, highlighting, and external data lookups. All replacements can be enhanced with scriptable rules, conditional logic, and math.

  • Updated Jan 25, 2026
  • C

A notebook aimed at predicting and improving water safety by analyzing contaminants and pollution levels in water sources, enhancing public health and ensuring access to clean drinking water.

  • Updated Sep 16, 2024
  • Jupyter Notebook

Linux CLI tools that compare text files and find nearest neighbours across large directories using TF‑IDF or SimHash, with optional dedup workflows. Useful in RAG pipelines for removing duplicate documents that have different MD5/SHA-256/SHA-512 hashes but the same or similar contents. Written in C++/C for performance.

  • Updated Jan 26, 2026
  • C
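
To illustrate the idea behind the entry above, here is a minimal SimHash sketch in Python. It is not the listed tools' implementation, which this page does not show. The names `simhash` and `hamming`, the whitespace tokenizer, the lowercasing step, and the use of MD5 as the per-token hash are all assumptions made for this example.

```python
import hashlib

BITS = 64  # fingerprint width (assumption for this sketch)


def simhash(text: str) -> int:
    """Return a 64-bit SimHash fingerprint of whitespace-separated tokens."""
    weights = [0] * BITS
    for token in text.lower().split():
        # Stable per-token hash: first 8 bytes of the MD5 digest as an integer.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(BITS):
            weights[i] += 1 if (h >> i) & 1 else -1
    # A bit is set in the fingerprint when its votes are positive overall.
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    a = "Clean   drinking WATER for everyone"
    b = "clean drinking water for everyone"
    # The raw bytes differ, so cryptographic hashes differ...
    print(hashlib.sha256(a.encode()).hexdigest() == hashlib.sha256(b.encode()).hexdigest())
    # ...but the token content is identical after normalization, so the distance is 0.
    print(hamming(simhash(a), simhash(b)))
```

In a dedup workflow, two documents would typically be treated as near-duplicates when the Hamming distance between their fingerprints falls below a chosen threshold. The threshold depends on the corpus and is not specified here.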
