
List of open-source software

AdaCC: Cumulative Cost Sensitive Boosting

  • GitHub
Link to the publication
Link to the open-source library

This repository introduces AdaCC, a dynamic cost-sensitive boosting method that estimates misclassification costs from the behavior of the partial ensemble, with the goal of minimizing balanced error. Choosing fixed misclassification costs for training cost-sensitive models is difficult because of the interplay between class imbalance, misclassification penalties, and evolving data distributions. AdaCC avoids this problem by adapting the costs to the data as boosting proceeds, providing a practical route to accurate and fair results in cost-sensitive learning scenarios.
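
For intuition, here is a minimal, self-contained Scala sketch of the cumulative idea: per-class misclassification costs are derived from the partial ensemble's current error rates and then folded into the usual boosting weight update. The object and method names (CumulativeCostSketch, classCosts, updateWeights) and the cost formula are illustrative assumptions, not AdaCC's actual code.

```scala
// Sketch of a cumulative cost-sensitive weight update, assuming binary labels in {-1, +1}.
object CumulativeCostSketch {

  // Derive per-class misclassification costs from the partial ensemble's cumulative
  // error rate on each class: the harder-hit class gets the higher cost.
  def classCosts(labels: Array[Int], ensembleScores: Array[Double]): Map[Int, Double] = {
    val classes = Seq(-1, 1)
    classes.map { c =>
      val idx  = labels.indices.filter(labels(_) == c)
      val errs = idx.count(i => math.signum(ensembleScores(i)).toInt != c)
      val rate = if (idx.nonEmpty) errs.toDouble / idx.size else 0.0
      c -> (1.0 + rate) // illustrative cost rule
    }.toMap
  }

  // Exponential-loss weight update, scaled by the class-dependent cost.
  def updateWeights(weights: Array[Double], labels: Array[Int], preds: Array[Int],
                    alpha: Double, costs: Map[Int, Double]): Array[Double] = {
    val raw = weights.indices.map { i =>
      val miss = if (labels(i) != preds(i)) 1.0 else -1.0
      weights(i) * math.exp(alpha * miss * costs(labels(i)))
    }
    val z = raw.sum
    raw.map(_ / z).toArray
  }

  def main(args: Array[String]): Unit = {
    val labels = Array(1, 1, 1, -1, -1, -1, -1, -1, -1, -1) // imbalanced toy data
    val scores = Array(0.4, -0.2, -0.6, -0.9, -0.8, 0.1, -0.7, -0.5, -0.3, -0.4)
    val costs  = classCosts(labels, scores)
    println(s"per-class costs: $costs")
    val w0    = Array.fill(labels.length)(1.0 / labels.length)
    val preds = scores.map(s => if (s >= 0) 1 else -1)
    println(updateWeights(w0, labels, preds, 0.5, costs).map(v => f"$v%.3f").mkString(" "))
  }
}
```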

AdaFair: Adaptive Fairness Boosting

  • GitHub
Link to the publication
Link to the open-source library

AdaFair is a boosting algorithm designed for fairness in machine learning. Its latest update targets imbalance and misclassification in minority groups: the data distribution is adjusted at each boosting round so that misclassified minority instances receive special attention. Fairness is evaluated from a cumulative perspective over the partial ensemble, which makes the approach applicable to different fairness notions, and it also improves performance on minority classes under class imbalance. The update adds support for two fairness notions, statistical parity and equal opportunity, as well as a refined weak-learner selection process.
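
The reweighting intuition can be illustrated with a short Scala sketch: the cumulative equal-opportunity gap of the partial ensemble is measured, and misclassified positive instances of the disadvantaged group receive an extra weight bump. The names and the exact bump factor are assumptions for illustration, not AdaFair's implementation.

```scala
// Sketch of fairness-driven reweighting, assuming binary labels in {0, 1}
// and a binary protected attribute.
object FairnessReweightSketch {

  // Cumulative equal-opportunity gap of the partial ensemble:
  // difference in true-positive rates between the two groups.
  def equalOpportunityGap(labels: Array[Int], preds: Array[Int], group: Array[Int]): Double = {
    def tpr(g: Int): Double = {
      val pos = labels.indices.filter(i => labels(i) == 1 && group(i) == g)
      if (pos.isEmpty) 0.0 else pos.count(preds(_) == 1).toDouble / pos.size
    }
    tpr(0) - tpr(1) // > 0 means group 1 is disadvantaged on positives
  }

  // Extra multiplicative bump for misclassified positives of the disadvantaged group,
  // proportional to the current cumulative gap.
  def fairnessBoost(weights: Array[Double], labels: Array[Int], preds: Array[Int],
                    group: Array[Int]): Array[Double] = {
    val gap = equalOpportunityGap(labels, preds, group)
    val disadvantaged = if (gap > 0) 1 else 0
    val bumped = weights.indices.map { i =>
      val needsHelp = group(i) == disadvantaged && labels(i) == 1 && preds(i) == 0
      if (needsHelp) weights(i) * (1.0 + math.abs(gap)) else weights(i)
    }
    val z = bumped.sum
    bumped.map(_ / z).toArray
  }

  def main(args: Array[String]): Unit = {
    val labels = Array(1, 1, 1, 1, 0, 0, 0, 0)
    val preds  = Array(1, 1, 0, 0, 0, 0, 1, 0) // partial-ensemble predictions
    val group  = Array(0, 0, 1, 1, 0, 1, 0, 1) // protected attribute
    val w      = Array.fill(labels.length)(1.0 / labels.length)
    println(f"gap = ${equalOpportunityGap(labels, preds, group)}%.2f")
    println(fairnessBoost(w, labels, preds, group).map(v => f"$v%.3f").mkString(" "))
  }
}
```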

Large Scale Entity Analysis

  • GitHub
Link to the publication

Social media archives hold valuable historical material for historians, sociologists, and other researchers. This Apache Spark library supports that kind of research by providing functions for analyzing annotated short texts along three dimensions: time, entities, and sentiment. Through distributed computation, it lets users compute a range of entity-related measures over large collections.
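
As an example of the kind of measure such a library can compute, the following Spark job (a sketch, not the library's API) aggregates mention counts and average sentiment per entity and month from annotated short texts; the column names and input path are assumed.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object EntitySentimentByMonth {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("EntitySentimentByMonth").getOrCreate()

    // Assumed schema: one row per (text, entity) annotation with a timestamp and a
    // sentiment score in [-1, 1].
    val annotated = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/annotated_texts.csv") // hypothetical location

    // One entity-related measure: mentions and average sentiment per entity per month.
    val measures = annotated
      .withColumn("month", date_format(col("timestamp"), "yyyy-MM"))
      .groupBy(col("entity"), col("month"))
      .agg(
        count(lit(1)).as("mentions"),
        avg(col("sentiment")).as("avg_sentiment")
      )
      .orderBy(col("entity"), col("month"))

    measures.show(20, truncate = false)
    spark.stop()
  }
}
```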

AnnotatedTweets2RDF

  • GitHub
Link to the publication

This open-source library transforms textual data into RDF triples, focusing on extracting information from Twitter data and storing it in a structured form. It is not limited to Twitter: any dataset that meets the expected index criteria can be processed. Entities are extracted from tweets, and the text is enriched with sentiment annotations. The implementation, written in Scala and Java, targets distributed environments and builds on Apache Spark and Apache Hadoop.
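
A minimal Spark sketch of the conversion idea is shown below; it serializes (tweet, entity, sentiment) annotations as N-Triples lines. The URIs, predicates, and paths are invented for illustration and do not reflect the library's actual vocabulary.

```scala
import org.apache.spark.sql.SparkSession

object TweetsToNTriples {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("TweetsToNTriples").getOrCreate()

    // Assumed input: tab-separated lines "tweetId<TAB>entity<TAB>sentiment".
    val lines = spark.sparkContext.textFile("hdfs:///data/annotated_tweets.tsv")

    // Emit two N-Triples statements per annotation: the mentioned entity and its sentiment.
    val triples = lines.flatMap { line =>
      line.split("\t") match {
        case Array(tweetId, entity, sentiment) =>
          val subj = s"<http://example.org/tweet/$tweetId>"
          Seq(
            s"""$subj <http://example.org/mentions> "$entity" .""",
            s"""$subj <http://example.org/sentiment> "$sentiment" ."""
          )
        case _ => Seq.empty // skip malformed lines
      }
    }

    triples.saveAsTextFile("hdfs:///out/tweets_ntriples")
    spark.stop()
  }
}
```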

Textual Pseudo Augmentation

  • GitHub
Link to the publication

This open-source library generates textual pseudo instances, focusing on extracting meaningful sentences from Twitter data. The sentences are processed and stored so that class imbalance in the resulting datasets can be handled effectively. The project compares the generated pseudo instances against simpler remedies such as under-sampling and over-sampling, using a Naive Bayes classifier with a bag-of-words representation (HashingTF). The implementation is written in Scala for distributed environments and integrates with Apache Spark for scalability. For usage, the required files are consolidated in the common-files folder and must be placed in the same directory as the .jar file; datasets are expected to reside in HDFS or alongside common-files.
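
Since the building blocks above are HashingTF bag of words and Naive Bayes on Spark, here is a hedged Spark ML sketch of such a pipeline; the column names, paths, and train/test split are assumptions rather than the repository's exact setup.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.NaiveBayes
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object PseudoAugmentationPipeline {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("PseudoAugmentationPipeline").getOrCreate()

    // Assumed input: a CSV with a numeric "label" column and a "text" column holding
    // the original sentences plus any generated pseudo instances.
    val data = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/augmented_sentences.csv")
      .withColumn("label", col("label").cast("double"))

    // Bag-of-words features via HashingTF, then a Naive Bayes classifier.
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
      .setNumFeatures(1 << 16)
    val nb = new NaiveBayes().setLabelCol("label").setFeaturesCol("features")

    val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, nb))
    val Array(train, test) = data.randomSplit(Array(0.8, 0.2), seed = 42L)
    val model = pipeline.fit(train)

    val accuracy = new MulticlassClassificationEvaluator()
      .setLabelCol("label").setPredictionCol("prediction").setMetricName("accuracy")
      .evaluate(model.transform(test))
    println(f"held-out accuracy: $accuracy%.3f")

    spark.stop()
  }
}
```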

Fairness Aware Ensemble Framework

  • GitHub
Link to the publication

The Fairness-Aware Ensemble (FAE) framework is a versatile solution for combating discrimination in imbalanced datasets. It can be combined with existing classifiers and applies both pre- and post-processing steps to mitigate bias. Discrimination is measured with the equal opportunity metric, and users can tune the tolerated discrimination level. FAE supports several fairness functions and reads ARFF files; the implementation is in Java.
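
To illustrate the post-processing idea in rough terms, the sketch below lowers the decision threshold for the protected group until the equal-opportunity gap (difference in true-positive rates) drops under a tolerance. The thresholding rule and tolerance are assumptions for illustration only, not FAE's exact procedure.

```scala
// Toy post-processing sketch: shift the protected group's threshold to reduce
// the equal-opportunity gap.
object ThresholdShiftSketch {

  def tpr(labels: Array[Int], preds: Array[Int], group: Array[Int], g: Int): Double = {
    val pos = labels.indices.filter(i => labels(i) == 1 && group(i) == g)
    if (pos.isEmpty) 0.0 else pos.count(preds(_) == 1).toDouble / pos.size
  }

  // Predict 1 when the score clears the group's threshold (0.5 for the non-protected group).
  def predict(scores: Array[Double], group: Array[Int], thrProtected: Double): Array[Int] =
    scores.indices.map { i =>
      val thr = if (group(i) == 1) thrProtected else 0.5
      if (scores(i) >= thr) 1 else 0
    }.toArray

  def main(args: Array[String]): Unit = {
    val labels = Array(1, 1, 1, 1, 0, 0, 0, 0)
    val group  = Array(0, 0, 1, 1, 0, 1, 0, 1) // 1 = protected group
    val scores = Array(0.9, 0.8, 0.45, 0.3, 0.2, 0.4, 0.6, 0.1)

    // Lower the protected group's threshold until the TPR gap is within tolerance.
    var thr   = 0.5
    var preds = predict(scores, group, thr)
    while (tpr(labels, preds, group, 0) - tpr(labels, preds, group, 1) > 0.05 && thr > 0.0) {
      thr -= 0.05
      preds = predict(scores, group, thr)
    }
    println(f"protected-group threshold: $thr%.2f, " +
      f"TPR gap: ${tpr(labels, preds, group, 0) - tpr(labels, preds, group, 1)}%.2f")
  }
}
```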

FABBOO: Online Fairness-aware Learning under Class Imbalance

  • GitHub
Link to the publication

In dynamic, data-driven applications, where models are continuously updated with new data, ensuring fairness is not a one-time task but an ongoing necessity. Existing fairness-aware stream classifiers often overlook class imbalance, leading to discrimination against minority instances. FABBOO is an online fairness-aware approach, implemented in MOA, that dynamically adapts the training distribution by taking into account both the imbalance of the stream and the model's behavior over the historical data. Our experiments show that this continuous attention to class balance and fairness yields models with strong predictive and fairness-related performance. The dataset files needed to replicate the experiments are in the Data directory of this repository.
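
The stream-side bookkeeping can be illustrated with a toy Scala sketch: class priors are tracked with exponential decay as instances arrive, and minority-class instances receive a larger training weight. The decay factor and weighting rule are illustrative assumptions, not FABBOO's actual update.

```scala
// Toy sketch of online class-imbalance monitoring for an evolving stream.
object StreamImbalanceSketch {

  final case class State(posShare: Double = 0.5)

  // Decayed estimate of the positive-class share in the stream so far.
  def update(state: State, label: Int, decay: Double = 0.99): State =
    State(decay * state.posShare + (1 - decay) * (if (label == 1) 1.0 else 0.0))

  // Weight an incoming instance inversely to its class's current share.
  def instanceWeight(state: State, label: Int): Double =
    if (label == 1) 1.0 / math.max(state.posShare, 1e-3)
    else 1.0 / math.max(1.0 - state.posShare, 1e-3)

  def main(args: Array[String]): Unit = {
    val stream = Array.fill(200)(0) ++ Array.fill(20)(1) // heavily imbalanced toy stream
    var state = State()
    for (label <- stream) {
      state = update(state, label)
      // In an online learner, this weight would scale the per-instance update.
      val w = instanceWeight(state, label)
      if (label == 1) println(f"pos share = ${state.posShare}%.3f, minority weight = $w%.2f")
    }
  }
}
```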
