Week Ending 8.11.2024
RESEARCH WATCH: 8.11.2024
This paper addresses a common challenge in machine learning - overfitting on small datasets. The authors propose integrating distributionally robust optimization (DRO) with label smoothing to improve generalization to unseen domains. Their approach, GI-LS, shifts existing data distributions to generate new data, potentially enhancing model performance on limited datasets. This could have significant applications in fields where large, diverse datasets are difficult to obtain, such as medical imaging or rare event detection. The method's effectiveness is demonstrated on small-scale anomaly classification tasks, suggesting it could be particularly useful for detecting unusual patterns or outliers in data-scarce scenarios.
Authors: Yangdi Wang, Zhi-Hai Zhang, Su Xiu Xu, Wenming Guo
Link: https://arxiv.org/abs/2408.05082v1
Date: 2024-08-09
Summary:
Overfitting commonly occurs when applying deep neural networks (DNNs) to small-scale datasets, where DNNs do not generalize well from existing data to unseen data. The main cause of overfitting is that small-scale datasets cannot reflect the diversity of real-world situations. Label smoothing (LS) is an effective regularization method that prevents overfitting by mixing one-hot labels with uniform label vectors. However, LS focuses only on labels while ignoring the distribution of the existing data. In this paper, we introduce distributionally robust optimization (DRO) into LS, enabling the existing data distribution to be shifted flexibly toward unseen domains when training DNNs. Specifically, we prove that the regularization of LS can be extended to a regularization term over the DNN parameters when DRO is integrated. This regularization term can be used to shift existing data toward unseen domains and generate new data. Furthermore, we propose an approximate gradient-iteration label smoothing algorithm (GI-LS) to realize these findings and train DNNs, and we prove that the shift of the existing data does not affect the convergence of GI-LS. Since GI-LS involves a series of hyperparameters, we further use Bayesian optimization (BO) to find relatively optimal combinations of these hyperparameters. Taking small-scale anomaly classification tasks as a case study, we evaluate GI-LS, and the results clearly demonstrate its superior performance.
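For readers new to the baseline the paper extends, the following is a minimal PyTorch sketch of standard label smoothing, i.e., training against targets that mix the one-hot label with a uniform vector. It illustrates only the LS starting point described in the abstract, not the authors' GI-LS algorithm; the smoothing factor eps = 0.1 is an arbitrary illustration.

```python
# Minimal sketch of standard label smoothing (the LS baseline GI-LS builds on),
# not the authors' GI-LS algorithm: smoothed targets mix the one-hot label with
# a uniform vector, y_ls = (1 - eps) * one_hot + eps / K.
import torch
import torch.nn.functional as F

def label_smoothing_loss(logits: torch.Tensor, targets: torch.Tensor, eps: float = 0.1) -> torch.Tensor:
    """Cross-entropy against smoothed targets for a batch of logits [N, K]."""
    num_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    one_hot = F.one_hot(targets, num_classes).float()
    smoothed = (1.0 - eps) * one_hot + eps / num_classes   # mix with uniform vector
    return -(smoothed * log_probs).sum(dim=-1).mean()

# Example: 4 samples, 5 classes
logits = torch.randn(4, 5)
targets = torch.tensor([0, 2, 1, 4])
print(label_smoothing_loss(logits, targets, eps=0.1))
```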
--------------------------------------------------------------------------------------------------------
Transformer Explainer: Interactive Learning of Text-Generative Models
This paper introduces an interactive visualization tool to help non-experts understand Transformer models, specifically GPT-2. As AI becomes more prevalent, there's a growing need for accessible educational resources about these complex systems. The Transformer Explainer runs a live GPT-2 instance in the user's browser, allowing real-time experimentation and observation of the model's internal workings. This tool could be valuable for students, researchers, and professionals looking to gain insights into how large language models function. By making AI education more accessible, it may contribute to broader understanding and responsible development of AI technologies.
Authors: Aeree Cho, Grace C. Kim, Alexander Karpekov, Alec Helbling, Zijie J. Wang, Seongmin Lee, Benjamin Hoover, Duen Horng Chau
Link: https://arxiv.org/abs/2408.04619v1
Date: 2024-08-08
Summary:
Transformers have revolutionized machine learning, yet their inner workings remain opaque to many. We present Transformer Explainer, an interactive visualization tool designed for non-experts to learn about Transformers through the GPT-2 model. Our tool helps users understand complex Transformer concepts by integrating a model overview and enabling smooth transitions across abstraction levels of mathematical operations and model structures. It runs a live GPT-2 instance locally in the user's browser, empowering users to experiment with their own input and observe in real-time how the internal components and parameters of the Transformer work together to predict the next tokens. Our tool requires no installation or special hardware, broadening the public's access to education about modern generative AI techniques. Our open-sourced tool is available at https://poloclub.github.io/transformer-explainer/. A video demo is available at https://youtu.be/ECR4oAwocjs.
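For readers who want a programmatic counterpart to what the tool visualizes, a rough Python sketch of inspecting GPT-2's next-token probabilities with the Hugging Face transformers library is shown below. This is not the Transformer Explainer implementation (which runs GPT-2 in the browser); the example sentence and top-k value are arbitrary.

```python
# Rough sketch: inspect GPT-2's next-token probabilities with Hugging Face
# transformers. This only mirrors the kind of inspection the tool visualizes.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

text = "Data visualization empowers users to"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits           # [1, seq_len, vocab]
probs = torch.softmax(logits[0, -1], dim=-1)  # distribution over the next token
topk = torch.topk(probs, k=5)
for p, idx in zip(topk.values, topk.indices):
    print(f"{tokenizer.decode(int(idx))!r}  {p.item():.3f}")
```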
--------------------------------------------------------------------------------------------------------
Tackling Noisy Clients in Federated Learning with End-to-end Label Correction
Federated learning allows multiple parties to collaboratively train machine learning models without sharing raw data, preserving privacy. However, the varying quality of client datasets, particularly label noise, can degrade model performance. This paper proposes FedELC, a two-stage framework to address this issue. It first detects clients with higher noise rates, then applies an end-to-end label correction method. This approach could significantly improve the reliability of federated learning systems in real-world applications where data quality cannot be guaranteed, such as healthcare, finance, or mobile applications, where sensitive data is distributed across multiple devices or institutions.
Authors: Xuefeng Jiang, Sheng Sun, Jia Li, Jingjing Xue, Runhan Li, Zhiyuan Wu, Gang Xu, Yuwei Wang, Min Liu
Link: https://arxiv.org/abs/2408.04301v1
Date: 2024-08-08
Summary:
Recently, federated learning (FL) has achieved wide success in diverse privacy-sensitive applications without sacrificing clients' sensitive private information. However, the data quality of client datasets cannot be guaranteed, since the annotations of different clients often contain complex label noise of varying degrees, which inevitably causes performance degradation. Intuitively, this degradation is dominated by clients with higher noise rates, since their trained models contain more misinformation from the data, so it is necessary to devise an effective optimization scheme to mitigate the negative impact of these noisy clients. In this work, we propose a two-stage framework, FedELC, to tackle this complicated label noise issue. The first stage guides the detection of clients with higher label noise, while the second stage corrects the labels of noisy clients' data via an end-to-end label correction framework that learns the possible ground-truth labels of noisy clients' datasets via back propagation. We implement sixteen related methods and evaluate them on five datasets with three types of complicated label noise scenarios for a comprehensive comparison. Extensive experimental results demonstrate that our proposed framework achieves superior performance over its counterparts across different scenarios. Additionally, our label correction framework effectively improves the data quality of the detected noisy clients' local datasets. The code is available at https://github.com/Sprinter1999/FedELC.
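To make the "end-to-end label correction via back propagation" idea concrete, here is a generic, hedged sketch in which each noisy sample's label distribution is a learnable parameter optimized jointly with the model. It illustrates the general pattern only; the actual FedELC losses, constraints, and federated setup are not reproduced, and a real implementation would add regularization to avoid trivially collapsing labels onto the model's predictions.

```python
# Generic sketch of end-to-end label correction via backprop: each noisy
# sample gets a learnable label distribution that is optimized jointly with
# the model. Illustrative only; not the exact FedELC formulation.
import torch
import torch.nn.functional as F

num_samples, num_classes, feat_dim = 100, 10, 32
model = torch.nn.Linear(feat_dim, num_classes)

noisy_labels = torch.randint(0, num_classes, (num_samples,))
# Initialize learnable label logits from the (possibly wrong) observed labels.
label_logits = torch.nn.Parameter(F.one_hot(noisy_labels, num_classes).float() * 3.0)

opt = torch.optim.SGD(list(model.parameters()) + [label_logits], lr=0.1)

features = torch.randn(num_samples, feat_dim)   # stand-in for real client data
for step in range(50):
    pred_log_probs = F.log_softmax(model(features), dim=-1)
    soft_labels = F.softmax(label_logits, dim=-1)
    # KL between corrected labels and model predictions; gradients flow to both.
    # (A real implementation adds constraints to avoid trivial collapse.)
    loss = F.kl_div(pred_log_probs, soft_labels, reduction="batchmean")
    opt.zero_grad()
    loss.backward()
    opt.step()

corrected = soft_labels.argmax(dim=-1)          # pseudo ground-truth labels
```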
--------------------------------------------------------------------------------------------------------
Connective Viewpoints of Signal-to-Noise Diffusion Models
Diffusion models have become fundamental in generative AI, excelling in tasks like image and audio generation. This paper provides a comprehensive study of Signal-to-Noise (S2N) diffusion models, examining noise schedulers through the lens of signal-to-noise ratio and information theory. The authors develop a generalized backward equation to enhance the inference process. This research could lead to improved performance and efficiency in various generative AI applications, potentially benefiting fields such as content creation, data augmentation, and simulation. By deepening our understanding of these models, it may pave the way for more advanced and capable generative AI systems.
Authors: Khanh Doan, Long Tung Vuong, Tuan Nguyen, Anh Tuan Bui, Quyen Tran, Thanh-Toan Do, Dinh Phung, Trung Le
Link: https://arxiv.org/abs/2408.04221v1
Date: 2024-08-08
Summary:
Diffusion models (DM) have become fundamental components of generative models, excelling across various domains such as image creation, audio generation, and complex data interpolation. Signal-to-Noise diffusion models constitute a diverse family covering most state-of-the-art diffusion models. While there have been several attempts to study Signal-to-Noise (S2N) diffusion models from various perspectives, there remains a need for a comprehensive study connecting different viewpoints and exploring new perspectives. In this study, we offer a comprehensive perspective on noise schedulers, examining their role through the lens of the signal-to-noise ratio (SNR) and its connections to information theory. Building upon this framework, we have developed a generalized backward equation to enhance the performance of the inference process.
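As a concrete reference for the SNR viewpoint discussed above, the sketch below computes the signal-to-noise ratio of a standard variance-preserving forward process, SNR(t) = alpha_bar_t / (1 - alpha_bar_t), under a common linear beta schedule. The schedule and constants are illustrative defaults, not necessarily the schedulers analyzed in the paper.

```python
# Illustrative sketch: SNR of a variance-preserving diffusion forward process
# x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps, with
# SNR(t) = a_bar_t / (1 - a_bar_t). The linear beta schedule is a common default.
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)        # linear noise schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

snr = alpha_bar / (1.0 - alpha_bar)       # SNR(t), monotonically decreasing
log_snr = np.log(snr)

print(f"SNR at t=0:   {snr[0]:.1f}")
print(f"SNR at t=500: {snr[500]:.4f}")
print(f"SNR at t=999: {snr[-1]:.6f}")
```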
--------------------------------------------------------------------------------------------------------
Task-oriented Sequential Grounding in 3D Scenes
This paper introduces a new task and dataset for 3D visual grounding, focusing on following step-by-step instructions to complete daily activities in indoor scenes. Unlike previous work that centered on static, object-centric descriptions, this approach addresses the dynamic nature of task-oriented grounding. The SG3D dataset, containing over 22,000 tasks across nearly 5,000 real-world 3D scenes, could significantly advance embodied AI research. Potential applications include more capable home assistance robots, virtual reality training systems, and improved human-robot interaction in complex environments like factories or hospitals.
Authors: Zhuofan Zhang, Ziyu Zhu, Pengxiang Li, Tengyu Liu, Xiaojian Ma, Yixin Chen, Baoxiong Jia, Siyuan Huang, Qing Li
Link: https://arxiv.org/abs/2408.04034v1
Date: 2024-08-07
Summary:
Grounding natural language in physical 3D environments is essential for the advancement of embodied artificial intelligence. Current datasets and models for 3D visual grounding predominantly focus on identifying and localizing objects from static, object-centric descriptions. These approaches do not adequately address the dynamic and sequential nature of task-oriented grounding necessary for practical applications. In this work, we propose a new task: Task-oriented Sequential Grounding in 3D scenes, wherein an agent must follow detailed step-by-step instructions to complete daily activities by locating a sequence of target objects in indoor scenes. To facilitate this task, we introduce SG3D, a large-scale dataset containing 22,346 tasks with 112,236 steps across 4,895 real-world 3D scenes. The dataset is constructed using a combination of RGB-D scans from various 3D scene datasets and an automated task generation pipeline, followed by human verification for quality assurance. We adapted three state-of-the-art 3D visual grounding models to the sequential grounding task and evaluated their performance on SG3D. Our results reveal that while these models perform well on traditional benchmarks, they face significant challenges with task-oriented sequential grounding, underscoring the need for further research in this area.
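To illustrate the shape of a sequential grounding task, the following is a hypothetical record and a simple step-level accuracy metric; the field names and example task are illustrative and do not reflect the released SG3D schema or its official evaluation protocol.

```python
# Hypothetical record illustrating the shape of a sequential grounding task
# (field names are illustrative, not the released SG3D schema), plus a simple
# step-level accuracy metric.
task = {
    "scene_id": "scene_0001",
    "task": "Make coffee in the kitchen",
    "steps": [
        {"instruction": "Walk to the counter next to the sink", "target_object_id": 17},
        {"instruction": "Pick up the kettle on the counter",    "target_object_id": 42},
        {"instruction": "Place it on the stove",                "target_object_id": 58},
    ],
}

def step_accuracy(predicted_ids, task):
    """Fraction of steps where the grounded object matches the annotation."""
    gold = [s["target_object_id"] for s in task["steps"]]
    return sum(p == g for p, g in zip(predicted_ids, gold)) / len(gold)

print(step_accuracy([17, 42, 13], task))  # -> 0.666...
```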
--------------------------------------------------------------------------------------------------------
Human Speech Perception in Noise: Can Large Language Models Paraphrase to Improve It?
This study explores using Large Language Models (LLMs) to generate acoustically intelligible paraphrases, potentially improving human speech perception in noisy environments. The researchers propose a "prompt-and-select" approach that resulted in a 40% relative improvement in speech perception under challenging conditions. This work could have significant implications for various applications, including hearing aids, telecommunications in noisy environments, and speech recognition systems. It may also contribute to making AI-generated speech more intelligible in real-world settings, benefiting fields like automated customer service, public announcements, and accessibility technologies for the hearing impaired.
Authors: Anupama Chingacham, Miaoran Zhang, Vera Demberg, Dietrich Klakow
Link: https://arxiv.org/abs/2408.04029v1
Date: 2024-08-07
Summary:
Large Language Models (LLMs) can generate text by transferring style attributes like formality, resulting in formal or informal text. However, instructing LLMs to generate text that, when spoken, is more intelligible in an acoustically difficult environment is an under-explored topic. We conduct the first study to evaluate LLMs on the novel task of generating acoustically intelligible paraphrases for better human speech perception in noise. Our experiments in English demonstrate that with standard prompting, LLMs struggle to control the non-textual attribute, i.e., acoustic intelligibility, while efficiently capturing the desired textual attributes like semantic equivalence. To remedy this, we propose a simple prompting approach, prompt-and-select, which generates paraphrases by decoupling the desired textual and non-textual attributes in the text generation pipeline. Our approach yielded a 40% relative improvement in human speech perception by paraphrasing utterances that are highly distorted in a listening condition with babble noise at a signal-to-noise ratio (SNR) of -5 dB. This study reveals the limitations of LLMs in capturing non-textual attributes, and our proposed method showcases the potential of using LLMs for better human speech perception in noise.
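The prompt-and-select idea can be summarized as: sample several meaning-preserving paraphrases, then choose the one predicted to be most intelligible in noise. The sketch below captures that two-step decoupling with hypothetical stand-in functions (generate_paraphrases, intelligibility_score); it is not the authors' pipeline or their intelligibility model.

```python
# Hedged sketch of a "prompt-and-select" style pipeline: sample several
# semantically equivalent paraphrases from an LLM, then select the candidate
# with the highest predicted intelligibility in noise. Both helpers are
# hypothetical stand-ins, not the authors' code.
from typing import Callable, List

def prompt_and_select(utterance: str,
                      generate_paraphrases: Callable[[str, int], List[str]],
                      intelligibility_score: Callable[[str], float],
                      n_candidates: int = 10) -> str:
    # Step 1 (textual attribute): ask the LLM only for meaning-preserving paraphrases.
    candidates = generate_paraphrases(utterance, n_candidates)
    # Step 2 (non-textual attribute): rank candidates by predicted speech
    # intelligibility in babble noise and keep the best one.
    return max(candidates, key=intelligibility_score)

# Toy usage with dummy stand-ins:
dummy_gen = lambda text, n: [text] + [f"{text} (variant {i})" for i in range(n - 1)]
dummy_score = lambda text: -len(text)        # pretend shorter is more intelligible
print(prompt_and_select("turn the volume down please", dummy_gen, dummy_score))
```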
--------------------------------------------------------------------------------------------------------
WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models
WalledEval is a comprehensive AI safety testing toolkit for evaluating large language models (LLMs). It includes over 35 safety benchmarks covering areas like multilingual safety, exaggerated safety, and prompt injections. The toolkit also introduces WalledGuard, a new content moderation tool, and SGXSTest, a benchmark for assessing exaggerated safety in cultural contexts. This resource could be invaluable for AI developers, researchers, and policymakers working to ensure the safe deployment of LLMs. By providing a standardized, comprehensive safety evaluation framework, WalledEval may contribute to more responsible AI development and help mitigate potential risks associated with these powerful models.
Authors: Prannaya Gupta, Le Qi Yau, Hao Han Low, I-Shiang Lee, Hugo Maximus Lim, Yu Xin Teoh, Jia Hng Koh, Dar Win Liew, Rishabh Bhardwaj, Rajat Bhardwaj, Soujanya Poria
Link: https://arxiv.org/abs/2408.03837v2
Date: 2024-08-08
Summary:
WalledEval is a comprehensive AI safety testing toolkit designed to evaluate large language models (LLMs). It accommodates a diverse range of models, including both open-weight and API-based ones, and features over 35 safety benchmarks covering areas such as multilingual safety, exaggerated safety, and prompt injections. The framework supports both LLM and judge benchmarking, and incorporates custom mutators to test safety against various text-style mutations such as future tense and paraphrasing. Additionally, WalledEval introduces WalledGuard, a new, small, and performant content moderation tool, and SGXSTest, a benchmark for assessing exaggerated safety in cultural contexts. We make WalledEval publicly available at https://github.com/walledai/walledeval.
--------------------------------------------------------------------------------------------------------
LLaVA-OneVision: Easy Visual Task Transfer
LLaVA-OneVision is a family of open large multimodal models (LMMs) that pushes the boundaries of performance in single-image, multi-image, and video scenarios. Notably, it demonstrates strong transfer learning capabilities across different modalities, yielding new emerging abilities. This advancement could have significant implications for various applications, including improved image and video understanding for content moderation, more sophisticated visual search engines, and enhanced computer vision capabilities in fields like autonomous vehicles, robotics, and medical imaging. The model's ability to transfer learning across scenarios may lead to more flexible and adaptable AI systems.
Authors: Bo Li, Yuanhan Zhang, Dong Guo, Renrui Zhang, Feng Li, Hao Zhang, Kaichen Zhang, Yanwei Li, Ziwei Liu, Chunyuan Li
Link: https://arxiv.org/abs/2408.03326v1
Date: 2024-08-06
Summary:
We present LLaVA-OneVision, a family of open large multimodal models (LMMs) developed by consolidating our insights into data, models, and visual representations in the LLaVA-NeXT blog series. Our experimental results demonstrate that LLaVA-OneVision is the first single model that can simultaneously push the performance boundaries of open LMMs in three important computer vision scenarios: single-image, multi-image, and video scenarios. Importantly, the design of LLaVA-OneVision allows strong transfer learning across different modalities/scenarios, yielding new emerging capabilities. In particular, strong video understanding and cross-scenario capabilities are demonstrated through task transfer from images to videos.
--------------------------------------------------------------------------------------------------------
This paper proposes a new synapse model for Spiking Neural Networks (SNNs) that modulates weights based on Interspike Intervals (ISI). This approach aims to improve energy efficiency by reducing the number of spikes while maintaining classification accuracy. The research could have significant implications for low-power, neuromorphic computing applications, such as edge AI devices, brain-computer interfaces, and energy-efficient sensor networks. By making SNNs more energy-efficient, this work may contribute to the development of more sustainable AI systems and enable AI capabilities in resource-constrained environments.
Authors: Dylan Adams, Magda Zajaczkowska, Ashiq Anjum, Andrea Soltoggio, Shirin Dora
Link: https://arxiv.org/abs/2408.02961v1
Date: 2024-08-06
Summary:
Despite basic differences between Spiking Neural Networks (SNNs) and Artificial Neural Networks (ANNs), most research on SNNs involves adapting ANN-based methods to SNNs. Pruning (dropping connections) and quantization (reducing precision) are often used to improve the energy efficiency of SNNs. These methods are very effective for ANNs, whose energy needs are determined by signals transmitted on synapses. However, the event-driven paradigm in SNNs implies that energy is consumed by spikes. In this paper, we propose a new synapse model whose weights are modulated by Interspike Intervals (ISI), i.e., the time difference between two spikes. SNNs composed of this synapse model, termed ISI Modulated SNNs (IMSNNs), can use gradient descent to estimate how the ISI of a neuron changes after updating its synaptic parameters. A higher ISI implies fewer spikes and vice versa. The learning algorithm for IMSNNs exploits this information to selectively propagate gradients such that learning is achieved by increasing the ISIs, resulting in a network that generates fewer spikes. The performance of IMSNNs with dense and convolutional layers has been evaluated in terms of classification accuracy and the number of spikes on the MNIST and FashionMNIST datasets. The comparison with conventional SNNs shows that IMSNNs exhibit up to a 90% reduction in the number of spikes while maintaining similar classification accuracy.
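As a small illustration of the quantity being modulated, the snippet below computes interspike intervals from a spike train and shows how a larger mean ISI corresponds to a lower spike rate, which is the lever IMSNNs use to save energy. It is illustrative only and does not implement the proposed synapse model or its gradient-propagation rule.

```python
# Illustrative sketch of the interspike interval (ISI): the time between
# consecutive spikes of a neuron. Longer ISIs mean fewer spikes per window.
import numpy as np

def interspike_intervals(spike_times: np.ndarray) -> np.ndarray:
    """ISIs from an array of spike times (seconds)."""
    return np.diff(np.sort(spike_times))

spike_times = np.array([0.010, 0.022, 0.037, 0.061, 0.095])  # seconds
isi = interspike_intervals(spike_times)
window = 0.1                                                  # 100 ms window
print("mean ISI:", isi.mean())
print("approx. spike rate:", 1.0 / isi.mean(), "Hz")          # higher ISI -> lower rate
print("spikes in window:", int(np.sum(spike_times < window)))
```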
--------------------------------------------------------------------------------------------------------
This research uses artificial neural networks to accelerate signal-to-noise ratio (SNR) computation for gravitational-wave detection. The proposed model can generate waveforms or compute SNR timeseries extremely quickly on both CPUs and GPUs. This advancement could significantly improve the speed and efficiency of gravitational-wave detection pipelines, enabling faster identification of candidate events and potentially improving our ability to observe multi-messenger astronomical events. The technique might also find applications in other fields requiring rapid signal processing, such as radar systems or medical imaging.
Authors: Ryan Magee, Richard George, Alvin Li, Ritwik Sharma
Link: https://arxiv.org/abs/2408.02470v1
Date: 2024-08-05
Summary:
Matched-filter based gravitational-wave search pipelines identify candidate events within seconds of their arrival on Earth, offering a chance to guide electromagnetic follow-up and observe multi-messenger events. Understanding the detectors' response to an astrophysical signal across the searched signal manifold is paramount to inferring the parameters of the progenitor and deciding which candidates warrant telescope time. In this paper, we use artificial neural networks to accelerate waveform generation and signal-to-noise ratio (SNR) computation for sufficiently local patches of the signal manifold. Our machine-learning based model generates a single waveform (or equivalently, computes a single SNR timeseries) in 6 milliseconds on a CPU and 0.4 milliseconds on a GPU. When we use the GPU to generate batches of waveforms simultaneously, we find that we can produce $10^4$ waveforms in $\lesssim 1$ ms. This is achieved while remaining faithful, on average, to 1 part in $10^4$ (1 part in $10^5$) for binary black hole (binary neutron star) waveforms. The model we present is designed to directly utilize intermediate detection pipeline outputs, and is a step towards ultra-low-latency parameter estimation within search pipelines.
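For context on the computation being accelerated, here is a schematic matched-filter SNR timeseries in Python, assuming white noise and a simple FFT-based correlation; real pipelines weight by the detector's power spectral density and use more careful normalization. This is background illustration, not the authors' neural-network model.

```python
# Schematic matched-filter SNR timeseries for a template h in data d, under a
# white-noise assumption. Real search pipelines whiten by the detector PSD.
import numpy as np

def snr_timeseries(data: np.ndarray, template: np.ndarray) -> np.ndarray:
    n = len(data)
    d_f = np.fft.rfft(data)
    h_f = np.fft.rfft(template, n)
    corr = np.fft.irfft(d_f * np.conj(h_f), n)        # correlate data with template
    sigma = np.sqrt(np.sum(template ** 2))            # template norm (white noise)
    return corr / sigma

# Toy example: a chirp-like template injected into noise at an offset of 500 samples.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 4096)
template = np.sin(2 * np.pi * (30 + 40 * t) * t) * np.exp(-4 * (t - 0.5) ** 2)
data = rng.normal(size=t.size) + 0.5 * np.roll(template, 500)
rho = snr_timeseries(data, template)
print("peak |SNR| near sample:", np.argmax(np.abs(rho)))
```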
--------------------------------------------------------------------------------------------------------
Multimodal Gender Fairness in Depression Prediction: Insights on Data from the USA & China
This study examines gender fairness in multimodal depression prediction models across datasets from the USA and China. The researchers investigate how acoustic, textual, and visual features and their inter-modal relations vary among subjects from different cultures and genders. This work highlights the importance of considering cultural and gender differences in mental health manifestation when developing AI-based diagnostic tools. The findings could inform the development of fairer and more accurate mental health assessment systems, potentially improving the efficacy of AI-assisted mental health care across diverse populations.
Authors: Joseph Cameron, Jiaee Cheong, Micol Spitale, Hatice Gunes
Link: https://arxiv.org/abs/2408.04026v1
Date: 2024-08-07
Summary:
Social agents and robots are increasingly being used in wellbeing settings. However, a key challenge is that these agents and robots typically rely on machine learning (ML) algorithms to detect and analyse an individual's mental wellbeing. The problem of bias and fairness in ML algorithms is becoming an increasing source of concern. At the same time, existing literature has indicated that mental health conditions can manifest differently across genders and cultures. We hypothesise that the representation of features (acoustic, textual, and visual) and their inter-modal relations vary among subjects from different cultures and genders, thus impacting the performance and fairness of various ML models. We present the first evaluation of multimodal gender fairness in depression manifestation by undertaking a study on two different datasets from the USA and China. We undertake thorough statistical and ML experimentation and repeat the experiments for several different algorithms to ensure that the results are not algorithm-dependent. Our findings indicate that although there are differences between the two datasets, it is not conclusive whether this is due to differences in depression manifestation, as hypothesised, or to other external factors such as differences in data collection methodology. Our findings further motivate a call for a more consistent and culturally aware data collection process in order to address the problem of ML bias in depression detection and to promote the development of fairer agents and robots for wellbeing.
--------------------------------------------------------------------------------------------------------
EEGMobile presents a model leveraging a pre-trained MobileViT with Knowledge Distillation for EEG regression tasks, specifically gaze prediction. The model achieves performance comparable to the previous state-of-the-art while being significantly faster and smaller. This research could lead to more efficient and practical Brain-Computer Interface (BCI) applications, potentially enabling real-time neural analytics on mobile and wearable devices. Applications could include improved assistive technologies for people with motor disabilities, more intuitive human-computer interaction, and advancements in neuromarketing and cognitive state monitoring.
Authors: Teng Liang, Andrews Damoah
Link: https://arxiv.org/abs/2408.03449v1
Date: 2024-08-06
Summary:
Electroencephalography (EEG) analysis is an important domain in the realm of Brain-Computer Interface (BCI) research. To ensure BCI devices are capable of providing practical applications in the real world, brain signal processing techniques must be fast, accurate, and resource-conscious to deliver low-latency neural analytics. This study presents a model that leverages a pre-trained MobileViT alongside Knowledge Distillation (KD) for EEG regression tasks. Our results showcase that this model is capable of performing at a level comparable (only 3% lower) to the previous State-Of-The-Art (SOTA) on the EEGEyeNet Absolute Position Task while being 33% faster and 60% smaller. Our research presents a cost-effective model applicable to resource-constrained devices and contributes to expanding future research on lightweight, mobile-friendly models for EEG regression.
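To sketch the knowledge-distillation component for a regression target such as 2-D gaze position, the example below blends a task loss against ground truth with a distillation loss against a frozen teacher's predictions. The loss weighting, shapes, and use of plain MSE are placeholder assumptions, not the EEGMobile configuration.

```python
# Hedged sketch of knowledge distillation for a regression target (e.g. 2-D
# gaze position): the student learns from both the ground truth and the
# teacher's predictions. Weights and shapes here are placeholders.
import torch
import torch.nn.functional as F

def distillation_loss(student_out, teacher_out, target, alpha: float = 0.5):
    """Blend of task loss (vs. ground truth) and distillation loss (vs. teacher)."""
    task = F.mse_loss(student_out, target)
    distill = F.mse_loss(student_out, teacher_out)
    return (1 - alpha) * task + alpha * distill

# Toy shapes: batch of 8 EEG windows -> 2-D screen coordinates.
student_out = torch.randn(8, 2, requires_grad=True)
teacher_out = torch.randn(8, 2)        # frozen teacher predictions
target = torch.randn(8, 2)             # annotated gaze positions
loss = distillation_loss(student_out, teacher_out, target)
loss.backward()
```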
--------------------------------------------------------------------------------------------------------
Developing PUGG for Polish: A Modern Approach to KBQA, MRC, and IR Dataset Construction
This paper introduces PUGG, the first Polish Knowledge Base Question Answering (KBQA) dataset, along with new datasets for Machine Reading Comprehension (MRC) and Information Retrieval (IR). The researchers present a modern, semi-automated approach for creating these datasets, leveraging Large Language Models to reduce workload. This work could significantly advance natural language processing capabilities for the Polish language, potentially improving Polish-language AI applications in areas such as search engines, virtual assistants, and automated customer service. The methodology could also be adapted to create similar resources for other low-resource languages.
Authors: Albert Sawczyn, Katsiaryna Viarenich, Konrad Wojtasik, Aleksandra Domogała, Marcin Oleksy, Maciej Piasecki, Tomasz Kajdanowicz
Link: https://arxiv.org/abs/2408.02337v1
Date: 2024-08-05
Summary:
Advancements in AI and natural language processing have revolutionized machine-human language interactions, with question answering (QA) systems playing a pivotal role. The knowledge base question answering (KBQA) task, utilizing structured knowledge graphs (KG), allows for handling extensive knowledge-intensive questions. However, a significant gap exists in KBQA datasets, especially for low-resource languages. Many existing construction pipelines for these datasets are outdated and make inefficient use of human labor, and modern assisting tools like Large Language Models (LLMs) are not utilized to reduce the workload. To address this, we have designed and implemented a modern, semi-automated approach for creating datasets, encompassing tasks such as KBQA, Machine Reading Comprehension (MRC), and Information Retrieval (IR), tailored explicitly for low-resource environments. We executed this pipeline and introduce the PUGG dataset, the first Polish KBQA dataset, as well as novel datasets for MRC and IR. Additionally, we provide a comprehensive implementation, insightful findings, detailed statistics, and an evaluation of baseline models.
--------------------------------------------------------------------------------------------------------
Training on the Fly: On-device Self-supervised Learning aboard Nano-drones within 20 mW
This research presents a novel on-device fine-tuning approach for nano-drones, using self-supervised learning based on ego-motion consistency. The method addresses the challenge of domain shift in real-world applications of tiny machine learning (TinyML) systems. By enabling on-device learning with extremely low power consumption, this work could significantly enhance the adaptability and performance of nano-drones and other resource-constrained cyber-physical systems. Potential applications include improved search and rescue operations, more efficient environmental monitoring, and enhanced capabilities for miniature robots in various fields.
Authors: Elia Cereda, Alessandro Giusti, Daniele Palossi
Link: https://arxiv.org/abs/2408.03168v1
Date: 2024-08-06
Summary:
Miniaturized cyber-physical systems (CPSes) powered by tiny machine learning (TinyML), such as nano-drones, are becoming an increasingly attractive technology. Their small form factor (i.e., ~10cm diameter) ensures vast applicability, ranging from the exploration of narrow disaster scenarios to safe human-robot interaction. Simple electronics make these CPSes inexpensive, but strongly limit the computational, memory, and sensing resources available on board. In real-world applications, these limitations are further exacerbated by domain shift. This fundamental machine learning problem implies that model perception performance drops when moving from the training domain to a different deployment domain. To cope with and mitigate this general problem, we present a novel on-device fine-tuning approach that relies only on the limited ultra-low power resources available aboard nano-drones. Then, to overcome the lack of ground-truth training labels aboard our CPS, we also employ a self-supervised method based on ego-motion consistency. Although our work builds on top of a specific real-world vision-based human pose estimation task, it is widely applicable to many embedded TinyML use cases. Our 512-image on-device training procedure is fully deployed aboard an ultra-low power GWT GAP9 System-on-Chip and requires only 1MB of memory while consuming as little as 19mW or running in just 510ms (at 38mW). Finally, we demonstrate the benefits of our on-device learning approach by field-testing our closed-loop CPS, showing a reduction in horizontal position error of up to 26% vs. a non-fine-tuned state-of-the-art baseline. In the most challenging never-seen-before environment, our on-device learning procedure makes the difference between succeeding and failing the mission.
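One way to picture the ego-motion consistency idea: if the drone knows its own displacement between two frames from its state estimator, predictions about the subject's relative position at those frames should be geometrically consistent with that displacement, without any human labels. The sketch below encodes that intuition with a simplified 2-D, static-subject assumption; it is not the authors' formulation.

```python
# Rough sketch of an ego-motion consistency objective: if the drone's own
# displacement between two frames is known, the predicted relative position of
# a (assumed static) subject should shift by the opposite amount. Simplified
# 2-D geometry; not the authors' formulation.
import torch

def ego_motion_consistency_loss(pred_t: torch.Tensor,
                                pred_t1: torch.Tensor,
                                drone_displacement: torch.Tensor) -> torch.Tensor:
    """pred_t, pred_t1: predicted subject position relative to the drone at
    consecutive frames, shape [B, 2]; drone_displacement: known ego-motion [B, 2]."""
    expected_t1 = pred_t - drone_displacement      # static-subject assumption
    return torch.mean((pred_t1 - expected_t1) ** 2)

# Toy usage: predictions from two frames plus odometry, no human labels needed.
pred_t = torch.tensor([[1.0, 0.2]], requires_grad=True)
pred_t1 = torch.tensor([[0.7, 0.2]], requires_grad=True)
ego = torch.tensor([[0.3, 0.0]])
loss = ego_motion_consistency_loss(pred_t, pred_t1, ego)
loss.backward()
```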
--------------------------------------------------------------------------------------------------------
On the use of neurosymbolic AI for defending against cyber attacks
This paper advocates for combining connectionist and symbolic AI approaches using neurosymbolic AI for cyber attack detection and response. The authors identify challenges in current AI-based cybersecurity methods and propose several neurosymbolic use cases. This research could lead to more effective and adaptable cybersecurity systems, potentially improving threat detection, incident response, and overall network resilience. By leveraging the strengths of both AI paradigms, neurosymbolic approaches might offer more robust and explainable cybersecurity solutions for various industries and organizations.
Authors: Gudmund Grov, Jonas Halvorsen, Magnus Wiik Eckhoff, Bjørn Jervell Hansen, Martin Eian, Vasileios Mavroeidis
Link: https://arxiv.org/abs/2408.04996v1
Date: 2024-08-09
Summary:
It is generally accepted that not all cyber attacks can be prevented, creating a need for the ability to detect and respond to them. Both connectionist and symbolic AI are currently being used to support such detection and response. In this paper, we make the case for combining them using neurosymbolic AI. We identify a set of challenges when using AI today and propose a set of neurosymbolic use cases that we believe are both interesting research directions for the neurosymbolic AI community and can have an impact on the cyber security field. We demonstrate feasibility through two proof-of-concept experiments.
--------------------------------------------------------------------------------------------------------
The Use of Large Language Models (LLM) for Cyber Threat Intelligence (CTI) in Cybercrime Forums
This study evaluates the accuracy of an LLM system based on GPT-3.5-turbo for extracting Cyber Threat Intelligence (CTI) information from cybercrime forums. The research demonstrates the high accuracy of the LLM in summarizing conversations and coding key CTI variables. This approach could significantly enhance the efficiency and effectiveness of cyber threat analysis, potentially enabling faster identification and response to emerging cyber threats. The methodology could be applied to improve cybersecurity tools, inform policy decisions, and enhance threat intelligence sharing across organizations and sectors.
Authors: Vanessa Clairoux-Trepanier, Isa-May Beauchamp, Estelle Ruellan, Masarah Paquet-Clouston, Serge-Olivier Paquette, Eric Clay
Link: https://arxiv.org/abs/2408.03354v2
Date: 2024-08-08
Summary:
Large language models (LLMs) can be used to analyze cyber threat intelligence (CTI) data from cybercrime forums, which contain extensive information and key discussions about emerging cyber threats. However, to date, the level of accuracy and efficiency of LLMs for such critical tasks has yet to be thoroughly evaluated. Hence, this study assesses the accuracy of an LLM system built on the OpenAI GPT-3.5-turbo model [7] to extract CTI information. To do so, a random sample of 500 daily conversations from three cybercrime forums, XSS, Exploit_in, and RAMP, was extracted, and the LLM system was instructed to summarize the conversations and code 10 key CTI variables, such as whether a large organization and/or a critical infrastructure is being targeted. Then, two coders reviewed each conversation and evaluated whether the information extracted by the LLM was accurate. The LLM system performed strikingly well, with an average accuracy score of 98%. Various ways to enhance the model were uncovered, such as the need to help the LLM distinguish between stories and past events, as well as being careful with verb tenses in prompts. Nevertheless, the results of this study highlight the efficiency and relevance of using LLMs for cyber threat intelligence.
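An illustrative sketch of the kind of LLM call involved is shown below: the model is asked to summarize a forum conversation and code a couple of CTI variables as JSON. The prompt wording, variable list, and output format are hypothetical stand-ins, not the study's actual instrument, and the call requires an OpenAI API key.

```python
# Illustrative sketch of prompting an OpenAI chat model to summarize a forum
# conversation and code CTI variables. Prompt and variables are hypothetical.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def code_conversation(conversation: str) -> dict:
    prompt = (
        "Summarize the following cybercrime-forum conversation in 2-3 sentences, "
        "then answer yes/no for each variable: targets_large_organization, "
        "targets_critical_infrastructure. Respond as JSON with keys "
        "'summary', 'targets_large_organization', 'targets_critical_infrastructure'.\n\n"
        + conversation
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return json.loads(resp.choices[0].message.content)

# Usage (requires an API key; may fail if the model returns non-JSON text):
# print(code_conversation("user1: selling access to a hospital network ..."))
```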
--------------------------------------------------------------------------------------------------------
This paper examines the security implications of Retrieval Augmented Generation (RAG), a popular technique for enhancing language models with external knowledge. As RAG systems often use publicly sourced data, they may be vulnerable to indirect prompt injections. The researchers investigate the effectiveness of various attack strategies against RAG systems, finding that most attacks have a success rate of around 40-60%. This work highlights potential security risks in AI systems that rely on external data sources. It could inform the development of more robust RAG implementations and security measures for AI applications in fields like information retrieval, question-answering systems, and chatbots.
Authors: Gianluca De Stefano, Giancarlo Pellegrino, Lea Schönherr
Link: https://arxiv.org/abs/2408.05025v1
Date: 2024-08-09
Summary:
Retrieval Augmented Generation (RAG) is a technique commonly used to equip models with out-of-distribution knowledge. This process involves collecting, indexing, retrieving, and providing information to an LLM for generating responses. Despite its growing popularity due to its flexibility and low cost, the security implications of RAG have not been extensively studied. The data for such systems are often collected from public sources, providing an attacker with a gateway for indirect prompt injections to manipulate the responses of the model. In this paper, we investigate the security of RAG systems against end-to-end indirect prompt manipulations. First, we review existing RAG framework pipelines, deriving a prototypical architecture and identifying potentially critical configuration parameters. We then examine prior work for techniques that attackers can use to perform indirect prompt manipulations. Finally, we implemented Rag n Roll, a framework to determine the effectiveness of attacks against end-to-end RAG applications. Our results show that existing attacks are mostly optimized to boost the ranking of malicious documents during the retrieval phase. However, a higher rank does not immediately translate into a reliable attack. Most attacks, against various configurations, settle around a 40% success rate, which could rise to 60% when ambiguous answers (those that include the expected benign answer as well) are counted as successful attacks. Additionally, when using unoptimized documents, attackers deploying two or more of them for a target query can achieve results similar to those using optimized ones. Finally, exploring the configuration space of a RAG system showed limited impact in thwarting the attacks, and the most successful combination severely undermined functionality.
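To see where indirect prompt injection enters a RAG application, the minimal sketch below shows the usual pattern of concatenating retrieved documents into the LLM prompt, so any attacker-controlled text in the indexed corpus reaches the model. The toy keyword retriever and placeholder call_llm are illustrative; this is not the Rag n Roll framework.

```python
# Minimal sketch of a RAG pipeline, included to show the indirect-injection
# surface: retrieved (untrusted) document text is placed directly in the prompt.
from typing import Callable, List

def retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Toy retriever: rank documents by word overlap with the query."""
    q = set(query.lower().split())
    scored = sorted(corpus, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

def rag_answer(query: str, corpus: List[str], call_llm: Callable[[str], str]) -> str:
    docs = retrieve(query, corpus)
    # The attack surface: untrusted document text flows straight into the prompt.
    context = "\n---\n".join(docs)
    prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
    return call_llm(prompt)

corpus = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
]
print(rag_answer("What is the capital of France?", corpus, call_llm=lambda p: p[:80]))
```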
--------------------------------------------------------------------------------------------------------
AI for operational methane emitter monitoring from space
This research introduces MARS-S2L, an AI-driven system for monitoring methane emissions using satellite imagery. Deployed at the UN Environment Programme's International Methane Emissions Observatory, MARS-S2L significantly outperforms existing detection methods. The system has already identified numerous emission events across multiple countries, leading to formal notifications to governments and stakeholders. This technology could play a crucial role in global efforts to combat climate change by enabling more efficient and timely detection of methane leaks. It may inform policy decisions, support enforcement of environmental regulations, and help industries reduce their methane emissions more effectively.
Authors: Anna Vaughan, Gonzalo Mateo-Garcia, Itziar Irakulis-Loitxate, Marc Watine, Pablo Fernandez-Poblaciones, Richard E. Turner, James Requeima, Javier Gorroño, Cynthia Randles, Manfredi Caltagirone, Claudio Cifarelli
Link: https://arxiv.org/abs/2408.04745v1
Date: 2024-08-08
Summary:
Mitigating methane emissions is the fastest way to stop global warming in the short-term and buy humanity time to decarbonise. Despite the demonstrated ability of remote sensing instruments to detect methane plumes, no system has been available to routinely monitor and act on these events. We present MARS-S2L, an automated AI-driven methane emitter monitoring system for Sentinel-2 and Landsat satellite imagery deployed operationally at the United Nations Environment Programme's International Methane Emissions Observatory. We compile a global dataset of thousands of super-emission events for training and evaluation, demonstrating that MARS-S2L can skillfully monitor emissions in a diverse range of regions globally, providing a 216% improvement in mean average precision over a current state-of-the-art detection method. Running this system operationally for six months has yielded 457 near-real-time detections in 22 different countries of which 62 have already been used to provide formal notifications to governments and stakeholders.
--------------------------------------------------------------------------------------------------------
MLC-GCN: Multi-Level Generated Connectome Based GCN for AD Analysis
This paper presents a novel approach to detecting Alzheimer's Disease (AD) using Multi-Level Generated Connectome based Graph Convolutional Networks (MLC-GCN). The method analyzes brain functional connectivity from resting-state fMRI data, showing improved performance in differentiating between AD, Mild Cognitive Impairment (MCI), and normal aging compared to existing techniques. The high explainability of the model enhances its potential for clinical applications. This research could significantly advance early detection and diagnosis of AD, potentially leading to earlier interventions and improved patient outcomes. The underlying methodology may also be applicable to other neurological disorders, opening new avenues for brain connectivity analysis in various clinical contexts.
Authors: Wenqi Zhu, Yinghua Fu, Ze Wang
Link: https://arxiv.org/abs/2408.03358v1
Date: 2024-08-06
Summary:
Alzheimer's Disease (AD) is a currently incurable neurodegenerative disease. Accurately detecting AD, especially in the early stage, represents a high research priority. AD is characterized by progressive cognitive impairments that are related to alterations in brain functional connectivity (FC). Based on this association, many studies have been published over the decades using FC and machine learning to differentiate AD from healthy aging. The most recent developments in this detection approach highlight the use of graph neural networks (GNNs) for brain functional connectivity analysis. In this paper, we propose an AD classification model based on stacked spatio-temporal feature extraction and graph generation, using resting-state fMRI. The proposed multi-level generated connectome (MLC) based graph convolutional network (GCN) (MLC-GCN) contains a multi-graph generation block and a GCN prediction block. The multi-graph generation block consists of a hierarchy of spatio-temporal feature extraction layers that extract spatio-temporal rsfMRI features at different depths and build the corresponding connectomes. The GCN prediction block takes the learned multi-level connectomes to build and optimize GCNs at each level and concatenates the learned graphical features as the final predictive features for AD classification. Through independent cohort validations, MLC-GCN shows better performance for differentiating MCI, AD, and normal aging than state-of-the-art GCN and rsfMRI based AD classifiers. The proposed MLC-GCN also showed high explainability in terms of learning clinically reasonable connectome node and connectivity features from two independent datasets. While we only tested MLC-GCN on AD, the underlying rsfMRI-based multi-level GCN outcome prediction strategy is applicable to other diseases and clinical outcomes.
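For readers unfamiliar with the GCN building block, the sketch below applies one standard graph convolution over a functional-connectivity matrix using the usual normalized adjacency. It shows only the generic layer; the multi-level connectome generation and prediction blocks of MLC-GCN are not reproduced here.

```python
# Hedged sketch of a single graph convolution over a functional connectome:
# node features are propagated through a normalized adjacency matrix,
# A_hat = D^{-1/2} (A + I) D^{-1/2}, following the standard GCN layer.
import torch

def gcn_layer(x: torch.Tensor, adj: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    """x: [N, F] node features, adj: [N, N] connectivity, weight: [F, F_out]."""
    a_hat = adj + torch.eye(adj.size(0))                # add self-loops
    deg_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)
    a_norm = deg_inv_sqrt[:, None] * a_hat * deg_inv_sqrt[None, :]
    return torch.relu(a_norm @ x @ weight)

# Toy usage: 90 brain regions, 16 features per region.
num_nodes, in_feats, out_feats = 90, 16, 32
x = torch.randn(num_nodes, in_feats)
adj = torch.rand(num_nodes, num_nodes)
adj = (adj + adj.T) / 2                                 # symmetric connectivity
out = gcn_layer(x, adj, torch.randn(in_feats, out_feats))
print(out.shape)                                        # torch.Size([90, 32])
```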
--------------------------------------------------------------------------------------------------------
This paper introduces the Consistent Reasoning Paradox (CRP), which posits that AI systems striving for human-like intelligence through consistent reasoning will inevitably produce incorrect answers in certain scenarios. The paradox highlights the importance of AI systems being able to admit uncertainty by saying "I don't know." This research has profound implications for the development of trustworthy AI and Artificial General Intelligence (AGI). It suggests that to be truly reliable, AI systems must incorporate mechanisms for acknowledging limitations and uncertainties. This insight could influence the design of AI systems across various applications, from decision-support tools to autonomous systems, emphasizing the importance of transparent and honest AI behavior.
Authors: Alexander Bastounis, Paolo Campodonico, Mihaela van der Schaar, Ben Adcock, Anders C. Hansen
Link: https://arxiv.org/abs/2408.02357v1
Date: 2024-08-05
Summary:
We introduce the Consistent Reasoning Paradox (CRP). Consistent reasoning, which lies at the core of human intelligence, is the ability to handle tasks that are equivalent, yet described by different sentences ('Tell me the time!' and 'What is the time?'). The CRP asserts that consistent reasoning implies fallibility -- in particular, human-like intelligence in AI necessarily comes with human-like fallibility. Specifically, it states that there are problems, e.g. in basic arithmetic, where any AI that always answers and strives to mimic human intelligence by reasoning consistently will hallucinate (produce wrong, yet plausible answers) infinitely often. The paradox is that there exists a non-consistently reasoning AI (which therefore cannot be on the level of human intelligence) that will be correct on the same set of problems. The CRP also shows that detecting these hallucinations, even in a probabilistic sense, is strictly harder than solving the original problems, and that there are problems that an AI may answer correctly, but it cannot provide a correct logical explanation for how it arrived at the answer. Therefore, the CRP implies that any trustworthy AI (i.e., an AI that never answers incorrectly) that also reasons consistently must be able to say 'I don't know'. Moreover, this can only be done by implicitly computing a new concept that we introduce, termed the 'I don't know' function -- something currently lacking in modern AI. In view of these insights, the CRP also provides a glimpse into the behaviour of Artificial General Intelligence (AGI). An AGI cannot be 'almost sure', nor can it always explain itself, and therefore to be trustworthy it must be able to say 'I don't know'.
--------------------------------------------------------------------------------------------------------