Week Ending 10.27.2024
RESEARCH WATCH: 10.27.2024
NeuroClips: Towards High-fidelity and Smooth fMRI-to-Video Reconstruction
While scientists have made progress in reconstructing static images from brain activity using fMRI data, reconstructing videos has remained a significant challenge. This groundbreaking research introduces NeuroClips, a framework that can decode both high-level semantics and low-level perception flows from brain activity to reconstruct smooth, high-fidelity videos up to 6 seconds long. The potential applications are vast: from brain-computer interfaces for communication and medical tools for understanding visual processing disorders, to assistive technologies for people with visual impairments.
Authors: Zixuan Gong, Guangyin Bao, Qi Zhang, Zhongwei Wan, Duoqian Miao, Shoujin Wang, Lei Zhu, Changwei Wang, Rongtao Xu, Liang Hu, Ke Liu, Yu Zhang
Link: https://arxiv.org/abs/2410.19452v1
Date: 2024-10-25
Summary:
Reconstruction of static visual stimuli from non-invasive fMRI recordings of brain activity has achieved great success, owing to advanced deep learning models such as CLIP and Stable Diffusion. However, research on fMRI-to-video reconstruction remains limited, since decoding the spatiotemporal perception of continuous visual experiences is formidably challenging. We contend that the key to addressing these challenges lies in accurately decoding both high-level semantics and low-level perception flows, as perceived by the brain in response to video stimuli. To this end, we propose NeuroClips, an innovative framework to decode high-fidelity and smooth video from fMRI. NeuroClips utilizes a semantics reconstructor to reconstruct video keyframes, guiding semantic accuracy and consistency, and employs a perception reconstructor to capture low-level perceptual details, ensuring video smoothness. During inference, it adopts a pre-trained T2V diffusion model injected with both keyframes and low-level perception flows for video reconstruction. Evaluated on a publicly available fMRI-video dataset, NeuroClips achieves smooth high-fidelity video reconstruction of up to 6s at 8FPS, gaining significant improvements over state-of-the-art models on various metrics, e.g., a 128% improvement in SSIM and an 81% improvement in spatiotemporal metrics. Our project is available at https://github.com/gongzix/NeuroClips.
--------------------------------------------------------------------------------------------------------
Applying sparse autoencoders to unlearn knowledge in language models
This research tackles the challenging problem of selectively removing specific knowledge from language models while minimizing unintended effects on other capabilities. Using sparse autoencoders (SAEs), the team explored techniques for "unlearning" biology-related knowledge from language models. This work has important implications for AI safety and control, potentially allowing for the removal of harmful or sensitive information from models while preserving their general functionality. The findings highlight both the promise and current limitations of using SAEs for targeted knowledge removal.
Authors: Eoin Farrell, Yeu-Tong Lau, Arthur Conmy
Link: https://arxiv.org/abs/2410.19278v1
Date: 2024-10-25
Summary:
We investigate whether sparse autoencoders (SAEs) can be used to remove knowledge from language models. We use the biology subset of the Weapons of Mass Destruction Proxy dataset and test on the gemma-2b-it and gemma-2-2b-it language models. We demonstrate that individual interpretable biology-related SAE features can be used to unlearn biology-related knowledge with minimal side-effects. Our results suggest that negative scaling of feature activations is necessary and that zero ablating features is ineffective. We find that intervening using multiple SAE features simultaneously can unlearn multiple different topics, but with similar or larger unwanted side-effects than the existing Representation Misdirection for Unlearning technique. Current SAE quality or intervention techniques would need to improve to make SAE-based unlearning comparable to the existing fine-tuning based techniques.
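The core intervention is easy to sketch. Below is a minimal, hypothetical PyTorch hook illustrating negative scaling of a single SAE feature on a model's residual stream; the SAE attribute names (W_enc, b_enc, W_dec), the hook point, and the feature index are placeholder assumptions, not the authors' released code.

```python
import torch

def negative_scaling_hook(sae, feature_idx, scale=-20.0):
    """Forward hook that rescales one SAE feature's contribution.

    Assumes the hooked module returns the residual-stream tensor directly
    (shape [batch, seq, d_model]); for layers that return tuples, unpack first.
    """
    def hook(module, inputs, output):
        resid = output
        # SAE feature activations: ReLU(resid @ W_enc + b_enc)
        acts = torch.relu(resid @ sae.W_enc + sae.b_enc)
        feat = acts[..., feature_idx]                  # [batch, seq]
        direction = sae.W_dec[feature_idx]             # [d_model]
        # Replace the feature's reconstructed component f*d with scale*f*d,
        # i.e. add (scale - 1) * f * d; inactive positions are untouched.
        return resid + (scale - 1.0) * feat.unsqueeze(-1) * direction
    return hook

# Hypothetical usage on a decoder layer's residual stream:
# handle = model.layers[12].register_forward_hook(
#     negative_scaling_hook(sae, feature_idx=4242))
# ...generate as usual, then handle.remove()
```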
--------------------------------------------------------------------------------------------------------
Can Stories Help LLMs Reason? Curating Information Space Through Narrative
This innovative research examines how incorporating narrative elements into prompting can enhance Large Language Models' problem-solving abilities. By structuring complex problems as stories, the approach helps models better understand and solve challenging questions across various scientific domains. The methodology shows particular promise in education, where narrative frameworks could help AI tutoring systems explain complex concepts more effectively. The results demonstrate consistent improvements over traditional prompting techniques, suggesting a valuable new direction for making AI reasoning more effective and accessible.
Authors: Vahid Sadiri Javadi, Johanne R. Trippas, Yash Kumar Lal, Lucie Flek
Link: https://arxiv.org/abs/2410.19221v1
Date: 2024-10-25
Summary:
Narratives are widely recognized as a powerful tool for structuring information and facilitating comprehension of complex ideas in various domains such as science communication. This paper investigates whether incorporating narrative elements can assist Large Language Models (LLMs) in solving complex problems more effectively. We propose a novel approach, Story of Thought (SoT), integrating narrative structures into prompting techniques for problem-solving. This approach involves constructing narratives around problem statements and creating a framework to identify and organize relevant information. Our experiments show that using various LLMs with SoT consistently surpasses using them with other techniques on physics, chemistry, math, and biology questions in both the GPQA and JEEBench datasets. The narrative-based information curation process in SoT enhances problem comprehension by contextualizing critical in-domain information and highlighting causal relationships within the problem space.
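As a concrete illustration, a Story-of-Thought-style pipeline can be sketched as two prompting stages, one that builds the narrative and one that answers with it. The templates below are plausible reconstructions from the abstract, not the paper's exact wording.

```python
# A minimal sketch of a two-stage SoT-style prompting pipeline; the exact
# templates used in the paper may differ.

def narrative_prompt(question: str) -> str:
    return ("Read the following problem and write a short narrative around it: "
            "introduce the key entities, the background concepts they depend on, "
            "and the causal relationships connecting them.\n\n"
            f"Problem: {question}")

def answer_prompt(question: str, narrative: str) -> str:
    return ("Using the narrative below to organize the relevant information, "
            "solve the problem step by step and state the final answer.\n\n"
            f"Narrative: {narrative}\n\nProblem: {question}")

# Usage with any LLM callable `llm`:
# narrative = llm(narrative_prompt(q))
# answer = llm(answer_prompt(q, narrative))
```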
--------------------------------------------------------------------------------------------------------
From Efficiency to Equity: Measuring Fairness in Preference Learning
As AI systems increasingly influence decision-making, ensuring fair representation of diverse human preferences becomes crucial. This research introduces novel metrics adapted from economic theories to evaluate fairness in AI preference learning models. The framework offers practical tools for measuring and improving the equitable treatment of different user groups in AI systems. Applications could range from recommendation systems to content moderation, helping ensure AI systems serve diverse populations fairly and effectively.
Authors: Shreeyash Gowaikar, Hugo Berard, Rashid Mushkani, Shin Koseki
Link: https://arxiv.org/abs/2410.18841v1
Date: 2024-10-24
Summary:
As AI systems, particularly generative models, increasingly influence decision-making, ensuring that they are able to fairly represent diverse human preferences becomes crucial. This paper introduces a novel framework for evaluating epistemic fairness in preference learning models inspired by economic theories of inequality and Rawlsian justice. We propose metrics adapted from the Gini Coefficient, Atkinson Index, and Kuznets Ratio to quantify fairness in these models. We validate our approach using two datasets: a custom visual preference dataset (AI-EDI-Space) and the Jester Jokes dataset. Our analysis reveals variations in model performance across users, highlighting potential epistemic injustices. We explore pre-processing and in-processing techniques to mitigate these inequalities, demonstrating a complex relationship between model efficiency and fairness. This work contributes to AI ethics by providing a framework for evaluating and improving epistemic fairness in preference learning models, offering insights for developing more inclusive AI systems in contexts where diverse human preferences are crucial.
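To make the metrics concrete, here is a minimal numpy sketch of the Gini coefficient and Atkinson index applied to a vector of per-user losses; treating losses this way is an illustrative reading of the paper's adaptation, and the exact formulation may differ.

```python
import numpy as np

def gini(losses):
    """Gini coefficient of per-user losses (0 = perfectly equal)."""
    x = np.sort(np.asarray(losses, dtype=float))
    n = x.size
    lorenz = np.cumsum(x) / x.sum()          # Lorenz curve ordinates
    return float((n + 1 - 2 * lorenz.sum()) / n)

def atkinson(losses, eps=0.5):
    """Atkinson index with inequality aversion eps (assumes positive losses)."""
    x = np.asarray(losses, dtype=float)
    if eps == 1.0:
        ede = np.exp(np.log(x).mean())       # equally-distributed equivalent
    else:
        ede = np.mean(x ** (1 - eps)) ** (1 / (1 - eps))
    return float(1 - ede / x.mean())

per_user_loss = np.array([0.12, 0.15, 0.11, 0.40, 0.13])  # toy example
print(gini(per_user_loss), atkinson(per_user_loss))
```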
--------------------------------------------------------------------------------------------------------
Multi-agent cooperation through learning-aware policy gradients
This research addresses a fundamental challenge in multi-agent systems: achieving cooperation among self-interested learning agents. The team developed a novel policy gradient algorithm that enables agents to model and respond to each other's learning processes. This breakthrough has significant implications for applications like autonomous vehicle coordination, distributed robotics, and multi-agent economic systems. The approach shows particular promise in scenarios requiring temporal coordination and could advance the development of more sophisticated cooperative AI systems.
Authors: Alexander Meulemans, Seijin Kobayashi, Johannes von Oswald, Nino Scherrer, Eric Elmoznino, Blake Richards, Guillaume Lajoie, Blaise Agüera y Arcas, João Sacramento
Link: https://arxiv.org/abs/2410.18636v1
Date: 2024-10-24
Summary:
Self-interested individuals often fail to cooperate, posing a fundamental challenge for multi-agent learning. How can we achieve cooperation among self-interested, independent learning agents? Promising recent work has shown that in certain tasks cooperation can be established between learning-aware agents who model the learning dynamics of each other. Here, we present the first unbiased, higher-derivative-free policy gradient algorithm for learning-aware reinforcement learning, which takes into account that other agents are themselves learning through trial and error based on multiple noisy trials. We then leverage efficient sequence models to condition behavior on long observation histories that contain traces of the learning dynamics of other agents. Training long-context policies with our algorithm leads to cooperative behavior and high returns on standard social dilemmas, including a challenging environment where temporally-extended action coordination is required. Finally, we derive from the iterated prisoner's dilemma a novel explanation for how and when cooperation arises among self-interested learning-aware agents.
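To fix intuitions about the setting, the toy sketch below rolls out a "meta-episode" of the iterated prisoner's dilemma against a naive policy-gradient co-player, so the observation history contains traces of its learning; it illustrates the problem structure only, not the paper's unbiased learning-aware algorithm.

```python
import numpy as np

PAYOFF = {  # (my_action, their_action) -> my reward; 0 = cooperate, 1 = defect
    (0, 0): 3, (0, 1): 0, (1, 0): 4, (1, 1): 1,
}

def meta_episode(agent_policy, opp_logit=0.0, lr=0.5, inner_steps=50, seed=0):
    rng = np.random.default_rng(seed)
    history, total = [], 0.0
    for _ in range(inner_steps):
        p_coop = 1 / (1 + np.exp(-opp_logit))         # opponent P(cooperate)
        a_opp = int(rng.random() > p_coop)
        a_me = agent_policy(history)                   # sees the whole history
        total += PAYOFF[(a_me, a_opp)]
        # Naive REINFORCE step by the opponent on its own immediate payoff.
        r_opp = PAYOFF[(a_opp, a_me)]
        grad_logp = (1 - p_coop) if a_opp == 0 else -p_coop
        opp_logit += lr * grad_logp * r_opp
        history.append((a_me, a_opp))
    return total  # return accumulated over the co-player's learning trajectory

tit_for_tat = lambda h: h[-1][1] if h else 0  # a fixed reciprocating policy
print(meta_episode(tit_for_tat))
```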
--------------------------------------------------------------------------------------------------------
ALTA: Compiler-Based Analysis of Transformers
This research introduces ALTA, a new programming language and compiler that can translate programs into Transformer neural network weights. The innovation allows for precise control over how Transformers process information, particularly for tasks requiring length-invariant algorithms. This breakthrough could significantly impact natural language processing, making it easier to create and analyze AI models with guaranteed behavioral properties. Applications could include more reliable and transparent AI systems for critical applications where behavioral guarantees are essential.
Authors: Peter Shaw, James Cohan, Jacob Eisenstein, Kenton Lee, Jonathan Berant, Kristina Toutanova
Link: https://arxiv.org/abs/2410.18077v1
Date: 2024-10-23
Summary:
We propose a new programming language called ALTA and a compiler that can map ALTA programs to Transformer weights. ALTA is inspired by RASP, a language proposed by Weiss et al. (2021), and Tracr (Lindner et al., 2023), a compiler from RASP programs to Transformer weights. ALTA complements and extends this prior work, offering the ability to express loops and to compile programs to Universal Transformers, among other advantages. ALTA allows us to constructively show how Transformers can represent length-invariant algorithms for computing parity and addition, as well as a solution to the SCAN benchmark of compositional generalization tasks, without requiring intermediate scratchpad decoding steps. We also propose tools to analyze cases where the expressibility of an algorithm is established, but end-to-end training on a given training set fails to induce behavior consistent with the desired algorithm. To this end, we explore training from ALTA execution traces as a more fine-grained supervision signal. This enables additional experiments and theoretical analyses relating the learnability of various algorithms to data availability and modeling decisions, such as positional encodings. We make the ALTA framework -- language specification, symbolic interpreter, and weight compiler -- available to the community to enable further applications and insights.
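As a flavor of what "length-invariant" means here, the parity computation ALTA can express reduces to a constant-size state updated once per input symbol, which a compiled Universal Transformer realizes by repeatedly applying the same layer. The sketch below is plain Python for illustration, not ALTA syntax.

```python
def parity(bits):
    state = 0
    for b in bits:           # one Universal Transformer step per symbol
        state ^= b           # constant-size state update, independent of length
    return state

assert parity([1, 0, 1, 1]) == 1
assert parity([1] * 101) == 1    # works at any length, no scratchpad needed
```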
--------------------------------------------------------------------------------------------------------
POMDP-Driven Cognitive Massive MIMO Radar: Joint Target Detection-Tracking In Unknown Disturbances
This research advances radar technology by applying artificial intelligence to improve target detection and tracking in challenging environments. The system uses a POMDP framework to continuously optimize its performance while maintaining reliability. This approach could revolutionize various applications, from military surveillance to civilian air traffic control, by providing more accurate and robust target tracking in complex environments. The system's ability to adapt to unknown disturbances makes it particularly valuable for real-world deployment.
Authors: Imad Bouhou, Stefano Fortunati, Leila Gharsalli, Alexandre Renaux
Link: https://arxiv.org/abs/2410.17967v1
Date: 2024-10-23
Summary:
The joint detection and tracking of a moving target embedded in an unknown disturbance represents a key feature that motivates the development of the cognitive radar paradigm. Building upon recent advancements in robust target detection with multiple-input multiple-output (MIMO) radars, this work explores the application of a Partially Observable Markov Decision Process (POMDP) framework to enhance the tracking and detection tasks in a statistically unknown environment. In the POMDP setup, the radar system is considered as an intelligent agent that continuously senses the surrounding environment, optimizing its actions to maximize the probability of detection $(P_D)$ and improve the target position and velocity estimation, all this while keeping a constant probability of false alarm $(P_{FA})$. The proposed approach employs an online algorithm that does not require any a priori knowledge of the noise statistics, and it relies on a much more general observation model than the traditional range-azimuth-elevation model employed by conventional tracking algorithms. Simulation results clearly show substantial performance improvement of the POMDP-based algorithm compared to the State-Action-Reward-State-Action (SARSA)-based one that has been recently investigated in the context of massive MIMO (MMIMO) radar systems.
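For readers unfamiliar with the POMDP machinery, the agent's knowledge is a belief over hidden states, updated after each action and observation. A generic discrete belief update is sketched below; the paper's radar observation model is continuous and far more general.

```python
import numpy as np

def belief_update(belief, action, obs, T, O):
    """
    belief: [S] current belief over states
    T:      [A, S, S] transition model, T[a, s, s'] = P(s' | s, a)
    O:      [A, S, num_obs] observation model, O[a, s', o] = P(o | s', a)
    """
    predicted = belief @ T[action]            # predict: P(s' | b, a)
    updated = O[action][:, obs] * predicted   # correct with the observation
    return updated / updated.sum()            # normalize to a distribution
```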
--------------------------------------------------------------------------------------------------------
AI Future Envisioning with PLACARD
This paper documents an innovative workshop approach to exploring potential AI futures through structured group activities. Using a combination of futures studies techniques and interactive card games, the methodology helps participants envision and discuss possible AI developments. This approach could be valuable for AI policy planning, corporate strategy development, and public engagement with AI issues, offering a structured way to explore and prepare for various AI future scenarios.
Authors: Mary C. Tedeschi, Paola Ricaurte, Sridevi Ayloo, Joseph Corneli, Charles Jeffrey Danoff, Sergio Belich
Link: https://arxiv.org/abs/2410.17155v1
Date: 2024-10-22
Summary:
At EuroPLoP 2024, Mary Tedeschi led the "AI Future Envisioning with PLACARD" focus group in Germany. Three conference attendees joined in the room, while Sridevi, Paola, and Charles co-facilitated remotely via a web conference. The participants were introduced to a Futures Studies technique with the goal of capturing envisionments of Artificial Intelligence (AI) going forward. To set the atmosphere, a technology-focused card game was used to make the session more interactive. To close, everyone co-created a Project Action Review to recap the event and capture learnings, which have been summarized in this paper. The focus group was structured based on lessons learned over six earlier iterations.
--------------------------------------------------------------------------------------------------------
This research tackles the crucial challenge of identifying machine-generated text at the word level, going beyond simple binary classification of entire documents. The approach offers more granular detection capabilities, which could be vital for applications in academic integrity, journalism, and content moderation. The system's ability to work across different domains and generators makes it particularly valuable for real-world applications where hybrid human-AI content is increasingly common.
Authors: Ram Mohan Rao Kadiyala
Link: https://arxiv.org/abs/2410.16659v1
Date: 2024-10-22
Summary:
With the increasing use of generative models for text generation and the widespread use of machine-generated texts in various domains, being able to distinguish between human-written and machine-generated texts is a significant challenge. While existing models and proprietary systems focus on identifying whether a given text is entirely human-written or entirely machine-generated, only a few systems provide insights at the sentence or paragraph level into the likelihood of being machine-generated, at unreliable accuracy levels and working well only for a limited set of domains and generators. This paper introduces several reliable approaches for the novel task of identifying which parts of a given text are machine-generated at the word level, comparing results across different approaches and methods. We present a comparison with proprietary systems and the performance of our model on texts from unseen domains and generators. The findings reveal significant improvements in detection accuracy, along with a comparison on other aspects of detection capability. Finally, we discuss potential avenues for improvement and the implications of our work. The proposed model is also well suited for detecting which parts of a text are machine-generated in the outputs of Instruct variants of many LLMs.
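Framed generically, word-level detection is a token-classification problem (human = 0, machine = 1). The sketch below uses a standard Hugging Face token-classification head to show the shape of such a system; the backbone checkpoint is a placeholder, and the head must be fine-tuned on span-labeled data before the scores mean anything. This is not the paper's released model.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

name = "microsoft/deberta-v3-base"  # assumed backbone, for illustration only
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForTokenClassification.from_pretrained(name, num_labels=2)

text = "The results, which were compiled over several months, indicate..."
enc = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits            # [1, seq_len, 2]
p_machine = logits.softmax(-1)[0, :, 1]     # per-token P(machine-generated)
for tok, p in zip(tokenizer.convert_ids_to_tokens(enc["input_ids"][0]), p_machine):
    print(f"{tok:>12s}  {p:.2f}")
```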
--------------------------------------------------------------------------------------------------------
xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs
This research presents a breakthrough in efficient video processing for AI systems, demonstrating that just 32 tokens can effectively represent video content in multimodal language models. This dramatic reduction in computational requirements could revolutionize video-based AI applications, from content moderation to video search and analysis. The approach's efficiency makes it particularly valuable for deployment on resource-constrained devices while maintaining competitive performance with much larger models.
Authors: Michael S. Ryoo, Honglu Zhou, Shrikant Kendre, Can Qin, Le Xue, Manli Shu, Silvio Savarese, Ran Xu, Caiming Xiong, Juan Carlos Niebles
Link: https://arxiv.org/abs/2410.16267v1
Date: 2024-10-21
Summary:
We present xGen-MM-Vid (BLIP-3-Video): a multimodal language model for videos, particularly designed to efficiently capture temporal information over multiple frames. BLIP-3-Video takes advantage of the 'temporal encoder' in addition to the conventional visual tokenizer, which maps a sequence of tokens over multiple frames into a compact set of visual tokens. This enables BLIP-3-Video to use far fewer visual tokens than its competing models (e.g., 32 vs. 4608 tokens). We explore different types of temporal encoders, including learnable spatio-temporal pooling as well as sequential models like Token Turing Machines. We experimentally confirm that BLIP-3-Video obtains video question-answering accuracies comparable to much larger state-of-the-art models (e.g., 34B), while being much smaller (i.e., 4B) and more efficient by using fewer visual tokens. The project website is at https://www.salesforceairesearch.com/opensource/xGen-MM-Vid/index.html
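The compression step can be illustrated with a Perceiver-style pooling module: a fixed set of 32 learnable queries cross-attends over all frame tokens. This is an illustrative sketch of the learnable spatio-temporal pooling variant, not the released BLIP-3-Video code (which also explores Token Turing Machines).

```python
import torch
import torch.nn as nn

class TemporalTokenPooler(nn.Module):
    def __init__(self, d_model=768, num_queries=32, num_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, d_model) * 0.02)
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)

    def forward(self, frame_tokens):           # [B, frames*tokens, d_model]
        q = self.queries.unsqueeze(0).expand(frame_tokens.size(0), -1, -1)
        pooled, _ = self.attn(q, frame_tokens, frame_tokens)
        return pooled                           # [B, num_queries, d_model]

video_tokens = torch.randn(2, 8 * 576, 768)    # e.g. 8 frames x 576 ViT tokens
print(TemporalTokenPooler()(video_tokens).shape)  # torch.Size([2, 32, 768])
```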
--------------------------------------------------------------------------------------------------------
CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution
This research introduces an innovative all-in-one judge model for evaluating other AI models' performance. The system can perform various evaluation tasks, from scoring to generating detailed critiques, providing a comprehensive solution for AI assessment. This tool could significantly impact AI development by offering consistent, reproducible evaluation metrics, potentially accelerating the improvement of AI systems while maintaining quality standards.
Authors: Maosong Cao, Alexander Lam, Haodong Duan, Hongwei Liu, Songyang Zhang, Kai Chen
Link: https://arxiv.org/abs/2410.16256v1
Date: 2024-10-21
Summary:
Efficient and accurate evaluation is crucial for the continuous improvement of large language models (LLMs). Among various assessment methods, subjective evaluation has garnered significant attention due to its superior alignment with real-world usage scenarios and human preferences. However, human-based evaluations are costly and lack reproducibility, making precise automated evaluators (judgers) vital in this process. In this report, we introduce CompassJudger-1, the first open-source all-in-one judge LLM. CompassJudger-1 is a general-purpose LLM that demonstrates remarkable versatility. It is capable of: 1. Performing unitary scoring and two-model comparisons as a reward model; 2. Conducting evaluations according to specified formats; 3. Generating critiques; 4. Executing diverse tasks like a general LLM. To assess the evaluation capabilities of different judge models under a unified setting, we have also established JudgerBench, a new benchmark that encompasses various subjective evaluation tasks and covers a wide range of topics. CompassJudger-1 offers a comprehensive solution for various evaluation tasks while maintaining the flexibility to adapt to diverse requirements. Both CompassJudger and JudgerBench are released and available to the research community at https://github.com/open-compass/CompassJudger. We believe that by open-sourcing these tools, we can foster collaboration and accelerate progress in LLM evaluation methodologies.
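A pairwise judging call can be sketched as a simple prompt template; the format below is a generic illustration, not CompassJudger-1's documented interface.

```python
JUDGE_TEMPLATE = """You are an impartial judge. Compare the two responses to the
question below on helpfulness, accuracy, and clarity, briefly explain your
reasoning, then end with exactly one line: "Verdict: A", "Verdict: B", or
"Verdict: Tie".

[Question]
{question}

[Response A]
{answer_a}

[Response B]
{answer_b}
"""

def build_judge_prompt(question, answer_a, answer_b):
    return JUDGE_TEMPLATE.format(
        question=question, answer_a=answer_a, answer_b=answer_b)

# prompt = build_judge_prompt(q, a1, a2); verdict = judge_llm(prompt)
```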
--------------------------------------------------------------------------------------------------------
Small Contributions, Small Networks: Efficient Neural Network Pruning Based on Relative Importance
This research presents an innovative approach to neural network pruning, focusing on removing weights with minimal contributions to neuron outputs. The method's statistical foundation and interpretability make it particularly valuable for deploying AI models on resource-constrained devices. Applications could range from mobile AI applications to IoT devices, enabling more efficient deployment of powerful AI models while maintaining performance.
Authors: Mostafa Hussien, Mahmoud Afifi, Kim Khoa Nguyen, Mohamed Cheriet
Link: https://arxiv.org/abs/2410.16151v1
Date: 2024-10-21
Summary:
Recent advancements have scaled neural networks to unprecedented sizes, achieving remarkable performance across a wide range of tasks. However, deploying these large-scale models on resource-constrained devices poses significant challenges due to substantial storage and computational requirements. Neural network pruning has emerged as an effective technique to mitigate these limitations by reducing model size and complexity. In this paper, we introduce an intuitive and interpretable pruning method based on activation statistics, rooted in information theory and statistical analysis. Our approach leverages the statistical properties of neuron activations to identify and remove weights with minimal contributions to neuron outputs. Specifically, we build a distribution of weight contributions across the dataset and utilize its parameters to guide the pruning process. Furthermore, we propose a Pruning-aware Training strategy that incorporates an additional regularization term to enhance the effectiveness of our pruning method. Extensive experiments on multiple datasets and network architectures demonstrate that our method consistently outperforms several baseline and state-of-the-art pruning techniques.
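The scoring idea can be sketched for a single linear layer: estimate each weight's average contribution |w_ij * a_j| over a calibration set and zero out the lowest-scoring fraction. This is a minimal reading of the approach; the paper builds a full distribution of contributions and adds a pruning-aware training regularizer on top.

```python
import torch

def prune_by_contribution(linear, activations, sparsity=0.5):
    """linear: nn.Linear; activations: [num_samples, in_features] inputs."""
    with torch.no_grad():
        mean_abs_act = activations.abs().mean(dim=0)     # E|a_j| per input unit
        scores = linear.weight.abs() * mean_abs_act      # |w_ij| * E|a_j|
        threshold = torch.quantile(scores.flatten(), sparsity)
        mask = (scores >= threshold).float()
        linear.weight.mul_(mask)                         # zero small contributors
    return mask

layer = torch.nn.Linear(128, 64)
calib = torch.randn(1000, 128)                           # calibration inputs
mask = prune_by_contribution(layer, calib, sparsity=0.7)
print(f"kept {mask.mean().item():.0%} of weights")
```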
--------------------------------------------------------------------------------------------------------
This research explores various approaches to analyzing health-related social media content using transformer models and large language models. The work focuses on classifying texts about mental health impacts, children's health disorders, and age-related information. These techniques could be valuable for public health monitoring, early detection of health trends, and understanding the impact of environmental factors on mental health through social media analysis.
Authors: Ram Mohan Rao Kadiyala, M. V. P. Chandra Sekhara Rao
Link: https://arxiv.org/abs/2410.15998v1
Date: 2024-10-21
Summary:
Social media is a great source of data from users reporting information regarding their health and how various things have affected them. This paper presents various approaches using Transformers and Large Language Models and their ensembles, along with their performance, advantages, and drawbacks, for the SMM4H'24 tasks: classifying texts on the impact of nature and outdoor spaces on the author's mental health (Task 3), binary classification of tweets reporting children's health disorders such as asthma, autism, ADHD, and speech disorders (Task 5), and binary classification of users self-reporting their age (Task 6).
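The ensembling component reduces to combining several models' labels; a minimal hard-voting sketch is shown below, with the individual model predictions left as placeholders.

```python
import numpy as np

def majority_vote(predictions):
    """predictions: [num_models, num_examples] array of 0/1 labels."""
    votes = np.asarray(predictions)
    return (votes.mean(axis=0) >= 0.5).astype(int)

# Three hypothetical classifiers' labels on four tweets:
preds = [[1, 0, 1, 1], [1, 1, 0, 1], [0, 0, 1, 1]]
print(majority_vote(preds))  # -> [1 0 1 1]
```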
--------------------------------------------------------------------------------------------------------
Learning to Generate and Evaluate Fact-checking Explanations with Transformers
In our era of rampant misinformation, this research develops transformer-based models that not only fact-check claims but also generate human-readable explanations for their verdicts. The system includes innovative methods for automatically evaluating the quality of these explanations. This work could significantly impact digital literacy efforts, journalism, and social media content moderation by providing transparent, explainable fact-checking capabilities.
Authors: Darius Feher, Abdullah Khered, Hao Zhang, Riza Batista-Navarro, Viktor Schlegel
Link: https://arxiv.org/abs/2410.15669v1
Date: 2024-10-21
Summary:
In an era increasingly dominated by digital platforms, the spread of misinformation poses a significant challenge, highlighting the need for solutions capable of assessing information veracity. Our research contributes to the field of Explainable Artificial Intelligence (XAI) by developing transformer-based fact-checking models that contextualise and justify their decisions by generating human-accessible explanations. Importantly, we also develop models for the automatic evaluation of explanations for fact-checking verdicts across different dimensions, such as (self)-contradiction, hallucination, convincingness, and overall quality. By introducing human-centred evaluation methods and developing specialised datasets, we emphasise the need for aligning Artificial Intelligence (AI)-generated explanations with human judgements. This approach not only advances theoretical knowledge in XAI but also holds practical implications by enhancing the transparency, reliability, and users' trust in AI-driven fact-checking systems. Furthermore, the development of our metric learning models is a first step towards potentially increasing efficiency and reducing reliance on extensive manual assessment. Based on experimental results, our best performing generative model achieved a ROUGE-1 score of 47.77, demonstrating superior performance in generating fact-checking explanations, particularly when provided with high-quality evidence. Additionally, the best performing metric learning model showed a moderately strong correlation with human judgements on objective dimensions such as (self)-contradiction and hallucination, achieving a Matthews Correlation Coefficient (MCC) of around 0.7.
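For reference, the headline ROUGE-1 number can be computed mechanically with the standard rouge-score package; the example texts below are invented for illustration.

```python
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=True)
reference = ("The claim is false because the quoted statistic "
             "refers to 2010, not 2020.")
generated = ("The claim is false; the statistic cited is from 2010 "
             "rather than 2020.")
score = scorer.score(reference, generated)["rouge1"]
print(f"ROUGE-1 F1: {100 * score.fmeasure:.2f}")  # same scale as the paper's 47.77
```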
--------------------------------------------------------------------------------------------------------
Security of Language Models for Code: A Systematic Literature Review
This comprehensive review examines the security vulnerabilities of AI models designed for code-related tasks. By systematically analyzing attack and defense strategies, the research provides crucial insights for developing more secure coding assistance tools. This work has significant implications for software development security, particularly as AI-powered coding tools become more prevalent in professional development environments.
Authors: Yuchen Chen, Weisong Sun, Chunrong Fang, Zhenpeng Chen, Yifei Ge, Tingxu Han, Quanjun Zhang, Yang Liu, Zhenyu Chen, Baowen Xu
Link: https://arxiv.org/abs/2410.15631v1
Date: 2024-10-21
Summary:
Language models for code (CodeLMs) have emerged as powerful tools for code-related tasks, outperforming traditional methods and standard machine learning approaches. However, these models are susceptible to security vulnerabilities, drawing increasing research attention from domains such as software engineering, artificial intelligence, and cybersecurity. Despite the growing body of research focused on the security of CodeLMs, a comprehensive survey in this area remains absent. To address this gap, we systematically review 67 relevant papers, organizing them based on attack and defense strategies. Furthermore, we provide an overview of commonly used language models, datasets, and evaluation metrics, and highlight open-source tools and promising directions for future research in securing CodeLMs.
--------------------------------------------------------------------------------------------------------
As cyber threats continue to evolve, the need for effective network intrusion detection becomes increasingly critical. This comprehensive study evaluates 14 different machine learning approaches, including both individual models and ensemble methods, for detecting network intrusions. The research provides valuable insights into which combinations of techniques work best for different scenarios, using two major datasets to validate their findings. This work could significantly impact cybersecurity practices by helping organizations choose the most effective AI-based detection systems for their specific security needs.
Authors: Ismail Bibers, Osvaldo Arreche, Mustafa Abdallah
Link: https://arxiv.org/abs/2410.15597v1
Date: 2024-10-21
Summary:
The escalating frequency of intrusions in networked systems has spurred the exploration of new research avenues in devising artificial intelligence (AI) techniques for intrusion detection systems (IDS). Various AI techniques have been used to automate network intrusion detection tasks, yet each model possesses distinct strengths and weaknesses. Selecting the optimal model for a given dataset can pose a challenge, necessitating the exploration of ensemble methods to enhance generalization and applicability in network intrusion detection. This paper addresses this gap by conducting a comprehensive evaluation of diverse individual models and both simple and advanced ensemble methods for network IDS. We introduce an ensemble learning framework tailored for assessing individual models and ensemble methods in network intrusion detection tasks. Our framework encompasses the loading of input datasets, training of individual models and ensemble methods, and the generation of evaluation metrics. Furthermore, we incorporate all features across individual models and ensemble techniques. The study presents results for our framework, encompassing 14 methods, including various bagging, stacking, blending, and boosting techniques applied to multiple base learners such as decision trees and neural networks, among others. We evaluate the framework using two distinct network intrusion datasets, RoEduNet-SIMARGL2021 and CICIDS-2017, each possessing unique characteristics. Additionally, we categorize AI models based on their performance on our evaluation metrics and via their confusion matrices. Our assessment demonstrates the efficacy of ensemble learning across most setups explored in this study. Furthermore, we contribute to the community by releasing our source code, providing a foundational ensemble learning framework for network intrusion detection.
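As one concrete instance of the evaluated ensemble styles, the sketch below wires the kinds of base learners the paper mentions into a scikit-learn stacking classifier; dataset loading and the full 14-method framework are omitted.

```python
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

stack = StackingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(max_depth=10)),
        ("forest", RandomForestClassifier(n_estimators=100)),
        ("mlp", MLPClassifier(hidden_layer_sizes=(64,), max_iter=300)),
    ],
    final_estimator=LogisticRegression(),
    cv=5,  # out-of-fold base predictions feed the meta-learner
)
# stack.fit(X_train, y_train); y_pred = stack.predict(X_test)
```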
--------------------------------------------------------------------------------------------------------
In an innovative approach to cultural preservation, this research develops a RAG-enhanced chatbot specifically designed to maintain and share Taiwanese Hakka cultural heritage. By combining traditional language models with carefully curated cultural databases, the system overcomes the limitations of generic AI in handling culturally-specific content. The chatbot demonstrates significantly improved user engagement and satisfaction, offering a promising model for using AI to preserve and promote endangered cultural traditions while maintaining authenticity and depth of cultural understanding.
Authors: Chen-Chi Chang, Han-Pi Chang, Hung-Shin Lee
Link: https://arxiv.org/abs/2410.15572v1
Date: 2024-10-21
Summary:
In an era where cultural preservation is increasingly intertwined with technological innovation, this study introduces a groundbreaking approach to promoting and safeguarding the rich heritage of Taiwanese Hakka culture through the development of a Retrieval-Augmented Generation (RAG)-enhanced chatbot. Traditional large language models (LLMs), while powerful, often fall short in delivering accurate and contextually rich responses, particularly in culturally specific domains. By integrating external databases with generative AI models, RAG technology bridges this gap, empowering chatbots to not only provide precise answers but also resonate deeply with the cultural nuances that are crucial for authentic interactions. This study delves into the intricate process of augmenting the chatbot's knowledge base with targeted cultural data, specifically curated to reflect the unique aspects of Hakka traditions, language, and practices. Through dynamic information retrieval, the RAG-enhanced chatbot becomes a versatile tool capable of handling complex inquiries that demand an in-depth understanding of Hakka cultural context. This is particularly significant in an age where digital platforms often dilute cultural identities, making the role of culturally aware AI systems more critical than ever. System usability studies conducted as part of our research reveal a marked improvement in both user satisfaction and engagement, highlighting the chatbot's effectiveness in fostering a deeper connection with Hakka culture. The feedback underscores the potential of RAG technology to not only enhance user experience but also to serve as a vital instrument in the broader mission of ethnic mainstreaming and cultural celebration.
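The retrieval-augmented pattern itself is straightforward to sketch: embed the curated cultural corpus, retrieve the most relevant passages for each query, and prepend them to the LLM prompt. The corpus snippets and model name below are placeholders, not the study's actual system.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
corpus = [
    "Hakka lei cha (pounded tea) is a traditional savoury tea-based dish...",
    "The Hakka tung blossom festival is held each spring in Taiwan...",
]
corpus_emb = encoder.encode(corpus, normalize_embeddings=True)

def retrieve(query, k=2):
    q = encoder.encode([query], normalize_embeddings=True)
    ranked = np.argsort(-(corpus_emb @ q.T).ravel())[:k]  # cosine similarity
    return [corpus[i] for i in ranked]

def build_prompt(query):
    context = "\n".join(retrieve(query))
    return (f"Answer using the Hakka cultural context below.\n\n"
            f"{context}\n\nQuestion: {query}")
```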
--------------------------------------------------------------------------------------------------------
AI, Global Governance, and Digital Sovereignty
This essay explores the complex relationship between AI systems, state power, and digital sovereignty in the modern world. Rather than viewing tech companies as potential replacements for state authority, the research examines how AI creates new dynamics of cooperation and competition between public and private sectors. The analysis offers valuable insights for policymakers, international relations experts, and technology leaders in understanding how AI shapes global governance and the evolving nature of digital sovereignty.
Authors: Swati Srivastava, Justin Bullock
Link: https://arxiv.org/abs/2410.17481v1
Date: 2024-10-23
Summary:
This essay examines how Artificial Intelligence (AI) systems are becoming more integral to international affairs by affecting how global governors exert power and pursue digital sovereignty. We first introduce a taxonomy of multifaceted AI payoffs for governments and corporations related to instrumental, structural, and discursive power in the domains of violence, markets, and rights. We next leverage different institutional and practice perspectives on sovereignty to assess how digital sovereignty is variously implicated in AI-empowered global governance. States seek sovereign control over AI infrastructures in the institutional approach, while establishing sovereign competence through AI infrastructures in the practice approach. Overall, we present the digital sovereignty stakes of AI as related to entanglements of public and private power. Rather than foreseeing technology companies as replacing states, we argue that AI systems will embed in global governance to create dueling dynamics of public/private cooperation and contestation. We conclude by sketching future directions for IR research on AI and global governance.
--------------------------------------------------------------------------------------------------------
Information for Conversation Generation: Proposals Utilising Knowledge Graphs
This research proposes innovative solutions to common LLM limitations in conversational AI using knowledge graphs. By introducing three novel approaches (dynamic knowledge graph embeddings, emotional value storage, and character consistency through narrative bubbles), the work addresses key challenges in maintaining coherent, emotionally appropriate, and contextually rich conversations. These proposals could significantly improve chatbots and virtual assistants by making their responses more natural, emotionally intelligent, and consistent.
Authors: Alex Clay, Ernesto Jiménez-Ruiz
Link: https://arxiv.org/abs/2410.16196v1
Date: 2024-10-21
Summary:
LLMs are frequently used tools for conversational generation. Without additional information, LLMs can generate lower-quality responses due to a lack of relevant content and hallucinations, as well as the perception of poor emotional capability and an inability to maintain a consistent character. Knowledge graphs are commonly used forms of external knowledge and may provide solutions to these challenges. This paper introduces three proposals that utilize knowledge graphs to enhance LLM generation. Firstly, dynamic knowledge graph embeddings and recommendation could allow for the integration of new information and the selection of relevant knowledge for response generation. Secondly, storing entities with emotional values as additional features may provide knowledge that is better emotionally aligned with the user input. Thirdly, integrating character information through narrative bubbles would maintain character consistency, while introducing a structure that readily incorporates new information.
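The second proposal is the easiest to make concrete: attach a valence score to each entity and prefer knowledge whose valence matches the user's mood. The toy graph and scores below are invented purely for illustration.

```python
knowledge_graph = {
    "rainy day": {"related": ["cozy reading", "flooding"], "valence": -0.2},
    "cozy reading": {"related": ["tea", "novels"], "valence": 0.7},
    "flooding": {"related": ["evacuation"], "valence": -0.8},
}

def select_aligned(entity, user_valence, tolerance=0.5):
    """Return neighbours whose stored valence is close to the user's mood."""
    hits = []
    for nbr in knowledge_graph.get(entity, {}).get("related", []):
        v = knowledge_graph.get(nbr, {}).get("valence", 0.0)
        if abs(v - user_valence) <= tolerance:
            hits.append((nbr, v))
    return hits

print(select_aligned("rainy day", user_valence=0.6))  # [('cozy reading', 0.7)]
```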
--------------------------------------------------------------------------------------------------------
Training Free Guided Flow Matching with Optimal Control
This paper introduces OC-Flow, a sophisticated framework for controlling generative AI models using optimal control theory. The research extends beyond traditional Euclidean approaches to handle complex geometries, particularly relevant for scientific applications like protein design. The framework shows impressive results in various applications, from image manipulation to molecule generation, offering a theoretically grounded approach to guided generation without requiring additional training, potentially revolutionizing how we control and direct AI generation processes.
Authors: Luran Wang, Chaoran Cheng, Yizhen Liao, Yanru Qu, Ge Liu
Link: https://arxiv.org/abs/2410.18070v1
Date: 2024-10-23
Summary:
Controlled generation with pre-trained Diffusion and Flow Matching models has vast applications. One strategy for guiding ODE-based generative models is to optimize a target loss $R(x_1)$ while staying close to the prior distribution. Along this line, some recent work has shown the effectiveness of guiding flow models by differentiating through the ODE sampling process. Despite the superior performance, the theoretical understanding of this line of methods is still preliminary, leaving space for algorithm improvement. Moreover, existing methods predominantly focus on Euclidean data manifolds, and there is a compelling need for guided flow methods on complex geometries such as SO(3), which prevail in high-stakes scientific applications like protein design. We present OC-Flow, a general and theoretically grounded training-free framework for guided flow matching using optimal control. Building upon advances in optimal control theory, we develop effective and practical algorithms for solving optimal control in guided ODE-based generation and provide a systematic theoretical analysis of the convergence guarantee in both Euclidean and SO(3) settings. We show that existing backprop-through-ODE methods can be interpreted as special cases of Euclidean OC-Flow. OC-Flow achieved superior performance in extensive experiments on text-guided image manipulation, conditional molecule generation, and all-atom peptide design.
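The Euclidean special case the paper analyzes (backprop-through-ODE guidance) can be sketched as optimizing additive controls on the velocity field: lower the target loss R(x_1) while a quadratic penalty keeps the controlled trajectory near the prior flow. This is an illustrative pattern under those assumptions, not OC-Flow's exact algorithm, and it omits the SO(3) case.

```python
import torch

def guided_sample(velocity_field, reward_loss, x0, steps=20, iters=50,
                  lr=0.1, lam=0.1):
    """velocity_field(x, t) -> dx/dt; reward_loss(x1) -> scalar target loss."""
    dt = 1.0 / steps
    controls = torch.zeros(steps, *x0.shape, requires_grad=True)
    opt = torch.optim.Adam([controls], lr=lr)
    for _ in range(iters):
        x = x0
        for t in range(steps):
            x = x + (velocity_field(x, t * dt) + controls[t]) * dt  # controlled Euler
        loss = reward_loss(x) + lam * controls.pow(2).sum() * dt    # control cost
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():                       # final guided trajectory
        x = x0
        for t in range(steps):
            x = x + (velocity_field(x, t * dt) + controls[t]) * dt
    return x
```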
--------------------------------------------------------------------------------------------------------