Research

Trustworthy AI Service Systems

By leveraging his research activities in service computing and data analytics, Dr. Badr’s current research strategy aims at designing and deploying “Trustworthy AI service systems.” To this end, he investigates research challenges and business problems from a multidisciplinary and systemic perspective, focusing on the following research areas:

AI analytics systems: aim to leverage machine learning (i.e., statistical learning and deep learning) and AI (e.g., NLP, reinforcement learning) in data analytics to draw meaningful and actionable insights from the flood of raw data, either to inform and drive smart decisions (e.g., optimization, performance) or to verify or disprove scientific models, theories, and hypotheses.
Trustworthy AI systems: aim to develop AI systems that are testable, secure, reliable, and privacy-preserving. Towards this goal, multidisciplinary research challenges investigate two different yet complementary approaches: 1) developing a risk management framework to evaluate AI cyber-threats, vulnerabilities, and cyber-risks and to devise mitigation strategies, and 2) developing a blockchain-based federated learning framework and tools that preserve the confidentiality of sensitive data in distributed environments.

Trustworthy AI systems:

Recent Research Projects and Grants

Managing Risks in AI Systems: Mitigating Vulnerabilities and Threats Using Design Tactics and Patterns

Team:
- Youakim Badr (PI), School of Graduate Professional Studies
- Raghu Sangwan, Satish Srinivasan, and Partha Mukherjee (co-PIs), School of Graduate Professional Studies
- Prasenjit Mitra, College of IST, Technical Consultant
Program: 2020 industryXchange Grant
Budget: $48,000; Period: 10/2020 – 11/2022

Summary: Advances in AI, combined with sensors, actuators, and embedded systems technologies, have made it feasible to incorporate intelligence into software-intensive systems, giving them the ability to control and adapt their behavior in real time. Designing AI-centric systems has therefore become, and will remain, the norm. These systems are likely to be distributed. Managing the complexity that comes with designing such dynamic systems requires risk management to handle uncertainty, safety, and dependability concerns that, if not addressed, can make these systems vulnerable to potential threats. Moreover, the vulnerability of AI models to adversarial attacks, which can lead models to misclassify or misbehave, has raised cybersecurity concerns.

This research project has the following objectives:

Develop an AI risk management framework from holistic and multidisciplinary perspectives to identify cyber threats, assess cyber-risks, and define mitigation strategies.
Develop fault tolerance mechanisms for AI models to ensure their resilience in production.
Extend software testability to AI testability and define new test tactics and patterns.
Develop monitoring mechanisms to detect propagation of threats and vulnerabilities in distributed environments.

Crowdlearning: Building Trustworthy AI Models from Crowdsourced Data and Edge Computing

Team:

- Youakim Badr (PI), School of Graduate Professional Studies
- Prasenjit Mitra (co-PI), College of Information Sciences and Technology
Program: Center for Security Research and Education – Impact Award
Budget: $57,467; Period: 01/2021 – 12/2022

Summary: This project introduces the concept of “crowdlearning” as a participatory method of building AI models, such as Deep Neural Networks (DNNs), with the help of a large group of contributors. In crowdlearning, contributors, whether paid freelancers or volunteers, collaboratively train a global AI model while keeping all their sample data on their own devices (personal PCs, mobile phones, self-driving cars, etc.). The project aims at developing a secure framework, called the crowdlearning platform, that uses cluster federated learning to enable collaborative training of AI models from local data in trustless environments. The platform includes pruning algorithms to mitigate backdoors and relies on blockchain-based cybersecurity mechanisms to control access to local data and edge devices and to resist DDoS attacks.

The project tackles the following challenges:

Build a trustworthy federated learning algorithm that is resistant to adversarial attacks and backdoors.
Develop digital identity and authorization protocols that do not rely on an external security authority (i.e., a third-party identity provider), giving contributors full control over how their local data and computational resources are used.
Develop a fully functional prototype that implements the crowdlearning platform and demonstrates its feasibility and performance on real-world datasets, including Protected Health Information (PHI) under HIPAA.
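
The core idea of training a global model while sample data never leaves contributors' devices can be illustrated with a minimal sketch of plain federated averaging. This is not the crowdlearning platform itself: it omits clustering, backdoor pruning, and the blockchain-based access control, and the toy linear model and all names (`local_update`, `federated_average`, `train`) are hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]   # (x, y) pair held on one contributor's device
Weights = List[float]          # [intercept, slope] of a toy linear model


def local_update(weights: Weights, data: Sequence[Sample],
                 lr: float = 0.3, epochs: int = 5) -> Weights:
    """Run a few gradient-descent steps on one contributor's local data only."""
    w0, w1 = weights
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error of y ~ w0 + w1 * x
        g0 = 2.0 * sum((w0 + w1 * x - y) for x, y in data) / n
        g1 = 2.0 * sum((w0 + w1 * x - y) * x for x, y in data) / n
        w0 -= lr * g0
        w1 -= lr * g1
    return [w0, w1]


def federated_average(updates: Sequence[Weights], sizes: Sequence[int]) -> Weights:
    """Aggregate local models, weighting each by its number of local samples."""
    total = sum(sizes)
    dim = len(updates[0])
    return [sum(u[i] * s for u, s in zip(updates, sizes)) / total
            for i in range(dim)]


def train(clients: Sequence[Sequence[Sample]], rounds: int = 200) -> Weights:
    """Only model weights travel between contributors and the aggregator."""
    global_w: Weights = [0.0, 0.0]
    for _ in range(rounds):
        updates = [local_update(global_w, data) for data in clients]
        global_w = federated_average(updates, [len(d) for d in clients])
    return global_w


if __name__ == "__main__":
    # Two contributors whose private samples both follow y = 2x + 1
    clients = [
        [(0.0, 1.0), (0.5, 2.0), (1.0, 3.0)],
        [(0.25, 1.5), (0.75, 2.5)],
    ]
    print(train(clients))
```

In this sketch the aggregator sees only weight vectors, never raw samples. A malicious contributor could still submit a poisoned update, which is the kind of threat the project's first challenge targets.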

Trusty AI: Trustworthy and reliable Federated Learning with privacy preserving

Team:

- Antoine Boutet (PI), Jan Aalmoes and Thomas Lebrun (co-PIs) – INRIA / INSA de Lyon, France
- Youakim Badr (co-PI), Robin Qiu, Prasenjit Mitra, Patrick McDaniel – Pennsylvania State University
Program: Pack Ambition International 2021 – Auvergne-Rhône-Alpes Region, France
Period: 09/01/2021 – 08/01/2023

Summary: This project aims at developing a secure Federated Machine Learning Framework and tools that preserve the confidentiality of personal data in distributed environments. To this end, we will extend different federated learning approaches and consider their limitations in terms of accuracy, confidentiality, and robustness. In addition, we will equip our Federated Machine Learning Framework with mechanisms to better understand the distributed AI learning process and to ensure fairness by mitigating biases that may arise from users’ data.

This project will also strengthen a partnership between professors from INSA Lyon and, at the Pennsylvania State University, the School of Graduate Professional Studies, the College of Information Sciences and Technology, and the School of Computer Science and Engineering, not only to develop common research topics but also to exchange Ph.D. students between research laboratories in Lyon and at Penn State.

In addition, the project aims to develop several teaching initiatives that allow students from both institutions to use the Federated Machine Learning Framework as a teaching platform to build federated learning projects and experiment with cybersecurity attacks on AI systems. The mutual visits of faculty also aim to promote double degree programs and summer programs.

Enabling Privacy Preserving in Federated Learning

Team:
- Youakim Badr (PI), Antoine Boutet (PI, INRIA / INSA de Lyon, France)
Program: Thomas Jefferson Fund – Face Foundation
Period: 2021/09 – 2023/08

Keywords: Federated Learning, Language Models, proxy-based privacy-preserving, …

AI-analytics Systems:

Research Projects (graduate students – MS Data Analytics)

A DRL-based Conversational Chatbot for Personalized Responses driven by learner’s profiles

Aim of Research:
To propose an adaptable conversational AI chatbot system that “naturally” interacts with learners and generates answers with different levels of detail based on learners’ profiles and knowledge acquired from previous conversation sessions.

Proposed Solution:

Train a conversational chatbot based on Deep Reinforcement Learning and Deep NLP techniques (text summarization, embeddings, and transformers) to improve the future path of conversations and generate answers that improve the learning experience.
Integrate learner profiles in a continual learning loop and mutually update the learner’s profile and the tailored responses, with the global reward of increasing the quality of conversations.
Proof of concept: experiment with the chatbot and apply it to a specific domain such as “Deep Learning” to answer a given question with different levels of detail.

Team: Haruka George’23 (MS Data Analytics)
Lifespan: 15/8/2022 – 15/5/2023

Misinformation Detection using NLP Semi-supervised and Supervised Learning Hybrid Approach

Aim of Research:
Social media serves as a platform for sharing all kinds of information, from politics to entertainment and from the health industry to the country’s administration. Some posts on these platforms are reliable, but most contain some proportion of misinformation. The diversity of posts, including the use of different languages, abbreviated words, and messages with hidden meanings, makes it harder to verify the authenticity of published information. Two of the main challenges in detecting misinformation are that false claims can appear in countless variations and that ground-truth labels, which are needed to build trustworthy classifiers over time, are scarce.

Proposed Solution:
This research focuses on developing an end-to-end framework to find misinformation in social media postings using an NLP-integrated hybrid approach that combines semi-supervised and deep supervised learning. More precisely, we examine the following research question: “How effective are semi-supervised algorithms in identifying misinformation in online social media postings (tweets)?” The objectives for evaluating this research question are summarized as follows:

Determine how labels can be propagated within a dataset that is largely unlabelled.
Implement hybrid models for classifying whether or not tweets on Twitter include misinformation (true for facts, false for misinformation).
Evaluate the goodness of manual labelling using probabilistic fact-checkers.
Proof of concept: experiment with the framework and apply it to tweets of potential health misinformation related to Covid-19, collected from 03/25/2020 to 10/30/2020, and to benchmark datasets from the literature.
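
To illustrate how labels can spread through a largely unlabelled dataset, here is a minimal sketch of classic graph-based label propagation with clamped seed labels. It is an assumption-laden toy, not the project's actual pipeline: the one-dimensional points stand in for text embeddings of tweets, and the function name `propagate_labels`, the RBF similarity, and the `sigma` parameter are hypothetical choices.

```python
import math
from typing import List, Optional, Sequence


def propagate_labels(points: Sequence[Sequence[float]],
                     labels: Sequence[Optional[int]],
                     sigma: float = 0.5,
                     iters: int = 50) -> List[int]:
    """Spread 0/1 labels from the few labelled points to unlabelled ones.

    labels[i] is 1 (factual), 0 (misinformation), or None (unlabelled).
    """
    n = len(points)

    def similarity(a: Sequence[float], b: Sequence[float]) -> float:
        # RBF kernel: nearby points get a weight close to 1
        d2 = sum((x - y) ** 2 for x, y in zip(a, b))
        return math.exp(-d2 / (2.0 * sigma ** 2))

    weights = [[similarity(points[i], points[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]

    # Score = current belief that a point is factual; unknowns start at 0.5
    scores = [float(l) if l is not None else 0.5 for l in labels]
    for _ in range(iters):
        updated = []
        for i in range(n):
            if labels[i] is not None:
                updated.append(float(labels[i]))   # clamp known labels
                continue
            total = sum(weights[i])
            if total == 0.0:
                updated.append(scores[i])          # isolated point: keep belief
            else:
                updated.append(
                    sum(weights[i][j] * scores[j] for j in range(n)) / total)
        scores = updated
    return [1 if s >= 0.5 else 0 for s in scores]


if __name__ == "__main__":
    # Two well-separated groups with a single labelled example each
    points = [[0.0], [0.2], [0.4], [3.0], [3.2], [3.4]]
    labels = [0, None, None, 1, None, None]
    print(propagate_labels(points, labels))
```

The propagated labels could then serve as training targets for a supervised classifier, which is one way to read the hybrid approach described above.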

Team: Deeksha Joshi’23 (MS Data Analytics)
Lifespan: 15/8/2022 – 15/5/2023

Towards a knowledge graph driven intelligent tutoring system

Aim of Research:
The rise of Massive Open Online Courses (MOOCs) has accelerated the growth of e-learning by making learning content available at all times. This calls for an always-available tutor that can resolve learners’ domain-related doubts and increase interaction with the learning content. Traditional dialogue systems lack personalization and fail to interact with the learner in an effective, natural way. Intelligent Tutoring Systems (ITS) also lack personalization and fail to adapt to newer, more sophisticated neural approaches, which reduces the effectiveness of knowledge acquisition. Therefore, it is vital to design an authoring tool that publishes a knowledge-rich, domain-specific tutoring assistant that helps users converse about a particular topic and resolve queries or doubts about what they are learning on the MOOC platform.

Proposed Solution:
We propose a methodology and algorithms to generate a knowledge graph-driven intelligent tutoring system that pedagogically uses Natural Language Understanding (NLU) techniques to improve the quality of interactions between the tutoring system and the learner. The proposed domain-specific open knowledge graph is built by scraping content from Wikipedia and Medium and contains a total of ~190,000 entities for specific domains such as machine learning, statistics, and AI. The graph is used to train and tune a Dual Intent and Entity Transformer (DIET), which achieves accuracy, F1, recall, and precision values in the range of 90-95% in classifying intents and entities in user messages.

A minimum viable prototype, called Impulso, is under development by a startup using Grakn (since renamed TypeDB) and RASA.

Team: Atharva Mungeei’21 (MS Data Analytics)
Lifespan: 15/8/2020 – 15/5/2021

Deep reinforcement learning based energy-efficient heating controller for smart buildings

Aim of Research:
Automation is the new future of the world and is completely changing people’s lifestyles by integrating technology into their daily activities. Automating building operations and converting buildings into “Smart Buildings” is one of the significant sectors gaining interest in recent years. Implementing such a strategy is challenging because multiple constraints and factors contribute to unpredictable variations of the temperature inside buildings (external weather, occupancy, day/night, seasons, appliances, etc.). Most existing solutions rely on supervised learning techniques to build models, which require historical training datasets. However, these models often become obsolete when the environment changes (new features, new appliances, etc.).

Proposed Solution:
This research proposes deep reinforcement learning based solutions that control the temperature according to user preferences while reducing energy costs. The deep reinforcement learning based energy-efficient heating controllers are evaluated with different rewards, namely a comfort-driven reward (setpoint temperature vs. indoor temperature) and an integrated cost- and comfort-driven reward, and in different settings: one building with one controller, multiple buildings with one controller, and multiple buildings with multiple controllers.
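
The two reward designs can be sketched as simple functions. The exact functional forms used in the project are not given here, so the tolerance band, the linear penalty, the weights, and the names `comfort_reward` and `cost_comfort_reward` are all illustrative assumptions.

```python
def comfort_reward(indoor_temp: float, setpoint: float,
                   tolerance: float = 0.5) -> float:
    """Comfort-driven reward: no penalty inside the tolerance band around the
    setpoint, and a penalty that grows linearly with the deviation outside it.
    (Assumed form, for illustration only.)"""
    deviation = abs(indoor_temp - setpoint)
    if deviation <= tolerance:
        return 0.0
    return -(deviation - tolerance)


def cost_comfort_reward(indoor_temp: float, setpoint: float,
                        energy_kwh: float, price_per_kwh: float,
                        comfort_weight: float = 1.0,
                        cost_weight: float = 1.0,
                        tolerance: float = 0.5) -> float:
    """Integrated reward: weighted comfort term minus weighted energy cost
    for the current control step. (Assumed form, for illustration only.)"""
    comfort = comfort_reward(indoor_temp, setpoint, tolerance)
    cost = energy_kwh * price_per_kwh
    return comfort_weight * comfort - cost_weight * cost


if __name__ == "__main__":
    # Room is 2 degrees below the setpoint; the heater used 2 kWh at 0.25 per kWh
    print(comfort_reward(19.0, 21.0))
    print(cost_comfort_reward(19.0, 21.0, energy_kwh=2.0, price_per_kwh=0.25))
```

Raising `cost_weight` relative to `comfort_weight` would push a learned controller toward saving energy at the expense of comfort, which is the trade-off the integrated reward is meant to expose.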

Team: Anchal Gupta’19 (MS Data Analytics)
Lifespan: 15/8/2018 – 15/12/2019

Publication:

Anchal Gupta, Youakim Badr, Ashkan Negahban, Robin Qiu, “Energy-efficient heating control for smart buildings with deep reinforcement learning,” Journal of Building Engineering, Vol. 34, Article number 101739, 2021 (2021 Impact Factor: 7.1)

Research Projects (Research Assistantship (RA))

Non-Verbal Behavior Analyzer (NOVOR)

Objective: develop a platform and models to detect patterns of non-verbal behaviors when humans interact with each other and/or with virtual assistants.
Team: Ambika Chundru’22, Shraddha Maurya’19 (RA), Suraj Bondugula (RA), and Dr. Minyoung Cheong

Blockchain Data Analytics (Daan.chains)

Objective: build analytics pipelines to explore, understand and get insights from Ethereum and Bitcoin blockchains.
Team: Akash Singh Baghel’20 (RA) and Dr. Partha Mukherjee

Cryptocurrencies Exchange Rates Forecasting

Objective: deploy state-of-the-art deep neural network models to forecast the Ethereum and Bitcoin exchange rates.
Team: Gauravi Bhalchandra Patil’20 (RA), Yogitha Siva Mokkapati’20 (RA), and Dr. Partha Mukherjee

Auto-Adversarial and Bias Vulnerability Detection

Objective: Build an automated tool to detect adversarial attacks and biases.
Team: Rahu Sharma’20 (RA) and Suraj Bondugula’20 (RA)

Past Projects

Cybersecurity Collaboratory: Cyberspace Threat Identification, Analysis & Proactive Response (2013-2018)

Source of Support: Partner University Fund (PUF) (USA-France)

Academic Partners: University of Lyon 1 (LIRIS Lab), University of Arizona and University of Chicago

Keywords: Information Assurance, Security by Design, Moving Target Techniques, Resilient SOA, etc.

RIOT: Resilient Security in Dynamically Networked Smart Object (IoT) (2015-2016)

Source of Support: Seed Grant, INSA-Lyon’s inter-labs funding program

Partners: LIRIS Lab (INSA-Lyon/University of Lyon 1)

Keywords: Mobile devices, Access control, Delay Tolerant Networking, Simulation, …

LLIOT: Linear Logic for the Internet of Things (2018-2020)

Source of Support: Informatic Federation of Lyon Grant

Partners: LIRIS Lab (INSA Lyon), LIP (ENS Lyon)

Keywords: Linear Logic, automated prover, proof certification, Coq, OCaml.

Brain 2.0: Brain-Smart Object Interfaces for the Elderly People to Control Home Devices with Brainwaves (2016-2018)

Source of Support: COOPERA-International Collaboration Program, AURA

Project life span: 2016-2018

Partner: The University of Pittsburgh, LIRIS lab (INSA)

Keywords: Internet of Things, formal specification, event streaming, signal processing, etc.

CT-ANALYTICS: Big Data Proactive Analytics Platform for Analyzing Citizen behaviors in Urban Worlds (2015-2017)

Source of Support: COOPERA-International Collaboration Program, AURA

Partner: The Pennsylvania State University, LIRIS Lab (INSA-Lyon)

Keywords: Big Data Analytics, Open data, sentiment Analysis, User behavior Analysis, etc.

SemEUse: Design of Semantic and Secure Enterprise Service Bus (2008-2010)

Source of Support: French National Research Agency (ANR)

Industrial Partners: Thales Communication France, France Télécom, EBM Websourcing

Academic Partners: INRIA Object Web, INRIA ARLES, Télécom SudParis, LIP6, LIESP Lab (INSA-Lyon)

Keywords: Security, Late Binding, Monitoring, QoS, SOA, ESB, Ontology, etc.

ISPRI-PLM: Services for the Integration of Industrial Processes and their Application to Product Lifecycle Management (2009-2011)

Source of Support: Rhône-Alpes Region

Academic Partners: G-SCOP, IREGE, LIESP Lab (INSA-Lyon), LISTIC, STOICA, SYMME, INRIA, SYSCOM

Industrial Partners: ASSETIUM, Arve Industries, EBM Websourcing, AIP-Primeca Rhône Ouest

Keywords: Product Life-Cycle, SOA, Interoperability, Model Driven Engineering, Standardization.

INTER-PROD: Organizational & Technological Interoperability to Support Co-Production (2006-2008)

Source of Support: Rhône-Alpes Region

Academic Partners: G2I, LIESP Lab (INSA-Lyon)

Industrial Partners: EBM Websourcing

Keywords: Co-production, Enterprise Architecture, Organizational Structure, Collaboration, …

COPILOTES: Collaboration and Information Exchange in Supply Chains (2004-2006)

Project life span: 2004-2006

Academic Partners: PRISMa/INSA de Lyon/Lyon 2, COPISORG, G.A.E.L./Grenoble, GILCO/Institut National Polytechnique de Grenoble, ENSMSE_G2I/ENS des Mines de Saint-Etienne

Industrial Partners: RHODIA, ROSET, FINMATICA, VALRHONA

Keywords: Value Network, Supply Chain, Information Sharing, Process Integration, Best Practice …

BSM: Business Collaborative Service Bus (2009-2011)

Source of Support: Seed grant (INSA-Lyon)

Partners: LIRIS Lab (Database research team, Distributed System research team, SOC research team)

Keywords: ESB, Web services, service composition, business processes, …

PERS: Service-Based Pervasive Environment to Assist Elderly People (2005-2007)

Partners: LIESP Lab (INSA-Lyon), LIRIS, EMPERE (INSA-Lyon Inter Labs research project)

Source of Support: Seed Grant, INSA-Lyon’s inter-labs funding program

Keywords: ubiquitous computing, sensors, service composition, machine learning, …