Week Ending 8.18.2024
RESEARCH WATCH: 8.18.2024
Your Turn: Real-World Turning Angle Estimation for Parkinson's Disease Severity Assessment
Parkinson's Disease (PD) affects gait, and changes in turning angles can indicate disease progression. Existing assessments are limited to clinical settings. This paper proposes a deep learning approach to automatically assess turning angles in real-world videos using 3D skeletons from single-camera footage. This method could be used for continuous monitoring of PD patients at home.
Authors: Qiushuo Cheng, Catherine Morgan, Arindam Sikdar, Alessandro Masullo, Alan Whone, Majid Mirmehdi
Link: https://arxiv.org/abs/2408.08182v1
Date: 2024-08-15
Summary:
People with Parkinson's Disease (PD) often experience progressively worsening gait, including changes in how they turn around. Existing clinical rating tools are not capable of capturing hour-by-hour variations of PD symptoms, as they are confined to brief assessments within clinic settings. Measuring real-world gait turning angles continuously and passively is a component step towards using gait characteristics as sensitive indicators of disease progression in PD. This paper presents a deep learning-based approach to automatically quantify turning angles by extracting 3D skeletons from videos and calculating the rotation of hip and knee joints. We utilise state-of-the-art human pose estimation models, Fastpose and Strided Transformer, on a total of 1386 turning video clips from 24 subjects (12 people with PD and 12 healthy control volunteers), trimmed from a PD dataset of unscripted free-living videos in a home-like setting (Turn-REMAP). We also curate a turning video dataset, Turn-H3.6M, from the public Human3.6M human pose benchmark with 3D ground truth, to further validate our method. Previous gait research has primarily taken place in clinics or laboratories evaluating scripted gait outcomes, but this work focuses on real-world settings where complexities exist, such as baggy clothing and poor lighting. Due to difficulties in obtaining accurate ground truth data in a free-living setting, we quantise each angle into the nearest 45° bin based on the manual labelling of expert clinicians. Our method achieves a turning calculation accuracy of 41.6%, a Mean Absolute Error (MAE) of 34.7°, and a weighted precision (WPrec) of 68.3% on Turn-REMAP. This is the first work to explore the use of single monocular camera data to quantify turns by PD patients in a home setting.
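As a rough illustration of the geometry involved, the sketch below estimates a turning angle from the horizontal rotation of the hip line across 3D skeleton frames and snaps it to the nearest 45° bin, mirroring the paper's labelling scheme. The coordinates are synthetic, and the helper functions and hip-line heuristic are illustrative assumptions, not the authors' implementation.

```python
import math

def hip_orientation(left_hip, right_hip):
    """2D heading of the hip line, projected onto the ground plane (x, z)."""
    dx = right_hip[0] - left_hip[0]
    dz = right_hip[2] - left_hip[2]
    return math.degrees(math.atan2(dz, dx))

def turning_angle(frames):
    """Total signed rotation of the hips across a clip of skeleton frames."""
    total = 0.0
    for prev, cur in zip(frames, frames[1:]):
        d = hip_orientation(*cur) - hip_orientation(*prev)
        # wrap to (-180, 180] so a small turn never reads as ~360 degrees
        d = (d + 180.0) % 360.0 - 180.0
        total += d
    return total

def quantise_45(angle):
    """Snap an angle to the nearest 45-degree bin, as in the paper's labels."""
    return round(angle / 45.0) * 45

# A 90-degree turn, simulated by rotating the hip pair in 30-degree steps
def rotated(theta):
    r = math.radians(theta)
    return ([-math.cos(r), 0.0, -math.sin(r)], [math.cos(r), 0.0, math.sin(r)])

clip = [rotated(t) for t in (0, 30, 60, 90)]
est = turning_angle(clip)
print(quantise_45(est))  # 90
```

Real clips are far noisier, which is why the paper reports accuracy against clinician-labelled 45° bins rather than exact angles.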
--------------------------------------------------------------------------------------------------------
Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents
Large Language Models (LLMs) can solve software engineering problems, but each has strengths and weaknesses. This paper proposes a framework called DEI (Diversity Empowered Intelligence) that combines multiple LLMs to leverage their unique expertise. This approach significantly improves problem-solving capabilities compared to individual agents.
Authors: Kexun Zhang, Weiran Yao, Zuxin Liu, Yihao Feng, Zhiwei Liu, Rithesh Murthy, Tian Lan, Lei Li, Renze Lou, Jiacheng Xu, Bo Pang, Yingbo Zhou, Shelby Heinecke, Silvio Savarese, Huan Wang, Caiming Xiong
Link: https://arxiv.org/abs/2408.07060v1
Date: 2024-08-13
Summary:
Large language model (LLM) agents have shown great potential in solving real-world software engineering (SWE) problems. The most advanced open-source SWE agent can resolve over 27% of real GitHub issues in SWE-Bench Lite. However, these sophisticated agent frameworks exhibit varying strengths, excelling in certain tasks while underperforming in others. To fully harness the diversity of these agents, we propose DEI (Diversity Empowered Intelligence), a framework that leverages their unique expertise. DEI functions as a meta-module atop existing SWE agent frameworks, managing agent collectives for enhanced problem-solving. Experimental results show that a DEI-guided committee of agents is able to surpass the best individual agent's performance by a large margin. For instance, a group of open-source SWE agents, with a maximum individual resolve rate of 27.3% on SWE-Bench Lite, can achieve a 34.3% resolve rate with DEI, a 25% relative improvement that beats most closed-source solutions. Our best-performing group excels with a 55% resolve rate, securing the highest ranking on SWE-Bench Lite. Our findings contribute to the growing body of research on collaborative AI systems and their potential to solve complex software engineering challenges.
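The meta-module idea can be sketched as a committee that collects each agent's candidate fix and picks the most promising one. The agents, scoring scheme, and field names below are illustrative stubs under assumed semantics, not DEI's actual selection logic:

```python
# Each "agent" is a stub returning a candidate patch plus a confidence score
# for a given issue; real SWE agents run full repair pipelines.

def agent_a(issue):
    return {"patch": "fix: check None", "score": 0.4}

def agent_b(issue):
    return {"patch": "fix: guard index", "score": 0.9}

def dei_committee(issue, agents):
    """Collect every agent's candidate and keep the highest-scored one."""
    candidates = [agent(issue) for agent in agents]
    return max(candidates, key=lambda c: c["score"])

best = dei_committee("issue-101", [agent_a, agent_b])
print(best["patch"])
```

The point of the committee is that a weak scorer on one issue may be the strongest on another, so the ensemble's resolve rate exceeds any single agent's.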
--------------------------------------------------------------------------------------------------------
Match Point AI: A Novel AI Framework for Evaluating Data-Driven Tennis Strategies
Many AI applications focus on board or video games. This paper introduces Match Point AI, a framework for simulating tennis matches and evaluating data-driven strategies against real-world data. Early experiments show promise for using this framework to analyze and improve tennis tactics.
Authors: Carlo Nübel, Alexander Dockhorn, Sanaz Mostaghim
Link: https://arxiv.org/abs/2408.05960v1
Date: 2024-08-12
Summary:
Many works in the domain of artificial intelligence in games focus on board or video games due to the ease of reimplementing their mechanics. Decision-making problems in real-world sports share many similarities with such domains. Nevertheless, few frameworks for sports games exist. In this paper, we present the tennis match simulation environment Match Point AI, in which different agents can compete against real-world data-driven bot strategies. In addition to presenting the framework, we highlight its capabilities by illustrating how MCTS can be used in Match Point AI to optimize the shot direction selection problem in tennis. While the framework will be extended in the future, first experiments already reveal that generated shot-by-shot data of simulated tennis matches show realistic characteristics when compared to real-world data. At the same time, reasonable shot placement strategies emerge, which share similarities with the ones found in real-world tennis matches.
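For a flavour of how search applies to shot selection, the sketch below runs UCB1 (the bandit rule at the heart of MCTS node selection) over three simplified shot directions against a toy opponent model. The direction names and win probabilities are invented for illustration and are not Match Point AI's mechanics:

```python
import math
import random

DIRECTIONS = ["cross", "middle", "line"]  # simplified shot targets

def ucb_select(stats, c=1.4):
    """UCB1: balance empirical win rate against exploration bonus."""
    total = sum(n for n, _ in stats.values())
    def score(d):
        n, wins = stats[d]
        if n == 0:
            return float("inf")  # try every direction at least once
        return wins / n + c * math.sqrt(math.log(total) / n)
    return max(DIRECTIONS, key=score)

def simulate_point(direction, rng):
    # toy opponent: "cross" wins 70% of points, others 30% (made-up numbers)
    p = 0.7 if direction == "cross" else 0.3
    return rng.random() < p

rng = random.Random(0)
stats = {d: [0, 0] for d in DIRECTIONS}  # direction -> [plays, wins]
for _ in range(2000):
    d = ucb_select(stats)
    stats[d][0] += 1
    stats[d][1] += simulate_point(d, rng)

print({d: stats[d][0] for d in DIRECTIONS})
```

Under UCB1 the winning direction accumulates the vast majority of plays while the alternatives are still sampled occasionally, which is exactly the explore/exploit behaviour a shot-selection search needs.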
--------------------------------------------------------------------------------------------------------
Model Counting in the Wild
Model counting is a fundamental problem in AI with applications in areas like probabilistic inference and neural network verification. However, it is computationally difficult. This paper assesses the performance of existing model counters on real-world problems from various domains. The findings highlight the need for careful tool selection and the potential of combining different counters.
Authors: Arijit Shaw, Kuldeep S. Meel
Link: https://arxiv.org/abs/2408.07059v1
Date: 2024-08-13
Summary:
Model counting is a fundamental problem in automated reasoning with applications in probabilistic inference, network reliability, neural network verification, and more. Although model counting is computationally intractable from a theoretical perspective due to its #P-completeness, the past decade has seen significant progress in developing state-of-the-art model counters to address scalability challenges. In this work, we conduct a rigorous assessment of the scalability of model counters in the wild. To this end, we surveyed 11 application domains and collected an aggregate of 2262 benchmarks from these domains. We then evaluated six state-of-the-art model counters on these instances to assess scalability and runtime performance. Our empirical evaluation demonstrates that the performance of model counters varies significantly across different application domains, underscoring the need for careful selection by the end user. Additionally, we investigated the behavior of different counters with respect to two parameters suggested by the model counting community, finding only a weak correlation. Our analysis highlights the challenges and opportunities for portfolio-based approaches in model counting.
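For intuition about what a model counter computes, here is a deliberately naive #SAT counter that enumerates all assignments of a small DIMACS-style CNF. Real counters use techniques such as component caching and knowledge compilation to scale far beyond this brute-force baseline:

```python
from itertools import product

def count_models(n_vars, clauses):
    """Naive #SAT: count assignments satisfying a CNF formula.
    Clauses use 1-based literals; a negative literal means negation
    (DIMACS convention)."""
    count = 0
    for bits in product([False, True], repeat=n_vars):
        def holds(lit):
            val = bits[abs(lit) - 1]
            return val if lit > 0 else not val
        if all(any(holds(l) for l in clause) for clause in clauses):
            count += 1
    return count

# (x1 or x2) and (not x1 or x3): of 8 assignments, 4 are satisfying
cnf = [[1, 2], [-1, 3]]
print(count_models(3, cnf))  # 4
```

The exponential loop here is exactly why model counting is #P-complete in general, and why the paper's scalability study across 2262 real benchmarks matters.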
--------------------------------------------------------------------------------------------------------
Tell Codec What Worth Compressing: Semantically Disentangled Image Coding for Machine with LMMs
Traditional image compression focuses on human perception. This paper proposes a new approach called SDComp that uses Large Multimodal Models (LMMs) to understand the semantic content of images. By identifying and prioritizing important objects, SDComp compresses images for optimal performance in machine learning tasks like object detection.
Authors: Jinming Liu, Yuntao Wei, Junyan Lin, Shengyang Zhao, Heming Sun, Zhibo Chen, Wenjun Zeng, Xin Jin
Link: https://arxiv.org/abs/2408.08575v1
Date: 2024-08-16
Summary:
We present a new image compression paradigm to achieve "intelligently coding for machine" by cleverly leveraging the common sense of Large Multimodal Models (LMMs). We are motivated by the evidence that large language/multimodal models are powerful general-purpose semantics predictors for understanding the real world. Different from traditional image compression typically optimized for human eyes, the image coding for machines (ICM) framework we focus on requires the compressed bitstream to better comply with different downstream intelligent analysis tasks. To this end, we employ the LMM to tell the codec what to compress: 1) we first utilize the powerful semantic understanding capability of LMMs w.r.t. object grounding, identification, and importance ranking via prompts, to disentangle image content before compression; 2) based on these semantic priors, we then encode and transmit the objects of the image in order within a structured bitstream. In this way, diverse vision benchmarks including image classification, object detection, instance segmentation, etc., can be well supported with such a semantically structured bitstream. We dub our method "SDComp", for "Semantically Disentangled Compression", and compare it with state-of-the-art codecs on a wide variety of vision tasks. The SDComp codec yields more flexible reconstruction results, promising decoded visual quality, and a more generic ability to support intelligent tasks.
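The semantic-ordering idea can be sketched as sorting detected objects by an importance score before serialising them, so task-relevant content leads the bitstream. The object records, scores, and byte payloads below are hypothetical stand-ins for the LMM's output and the actual entropy-coded data:

```python
# Hypothetical object records as an LMM-based ranker might emit them
objects = [
    {"label": "background", "importance": 0.1, "bits": b"..bg.."},
    {"label": "person", "importance": 0.9, "bits": b"..p.."},
    {"label": "dog", "importance": 0.7, "bits": b"..d.."},
]

def structured_bitstream(objs):
    """Transmit the most task-relevant objects first (semantic ordering)."""
    ordered = sorted(objs, key=lambda o: o["importance"], reverse=True)
    return [o["label"] for o in ordered], b"".join(o["bits"] for o in ordered)

order, stream = structured_bitstream(objects)
print(order)
```

A decoder for a detection task can then stop reading after the high-importance prefix, which is the payoff of structuring the bitstream semantically rather than spatially.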
--------------------------------------------------------------------------------------------------------
Crystalline Material Discovery in the Era of Artificial Intelligence
Crystalline materials have diverse properties and applications. Discovering new materials is traditionally time-consuming and expensive. This paper reviews recent advances in using deep learning for crystalline material discovery. These methods can predict properties, aid in material synthesis, and accelerate scientific research.
Authors: Zhenzhong Wang, Haowei Hua, Wanyu Lin, Ming Yang, Kay Chen Tan
Link: https://arxiv.org/abs/2408.08044v1
Date: 2024-08-15
Summary:
Crystalline materials, with their symmetrical and periodic structures, possess a diverse array of properties and have been widely used in various fields, e.g., sustainable development. Traditional experimental and computational approaches to discovering crystalline materials are often time-consuming and expensive. In recent years, thanks to the explosive growth of crystalline materials data, great interest has been devoted to data-driven materials discovery. Particularly, recent advancements have exploited the expressive representation ability of deep learning to model the highly complex atomic systems within crystalline materials, opening up new avenues for fast and accurate materials discovery. These works typically focus on four types of tasks: physicochemical property prediction, crystalline material synthesis, aiding characterization, and force field development; these tasks are essential for scientific research and development in crystalline materials science. Despite the remarkable progress, there is still a lack of systematic research summarizing their correlations, distinctions, and limitations. To fill this gap, we systematically investigated the progress made in deep learning-based materials discovery in recent years. We first introduce several data representations of crystalline materials. Based on these representations, we summarize various fundamental deep learning models and their tailored usages in materials discovery tasks. We also point out the remaining challenges and propose several future directions. The main goal of this review is to offer comprehensive and valuable insights and foster progress in the intersection of artificial intelligence and materials science.
--------------------------------------------------------------------------------------------------------
The gene function prediction challenge: large language models and knowledge graphs to the rescue
Understanding gene function is crucial in plant science. However, predicting gene function is challenging. This paper reviews current methods and explores how large language models and knowledge graphs can be used to improve gene function prediction accuracy and keep pace with scientific advancements.
Authors: Rohan Shawn Sunil, Shan Chun Lim, Manoj Itharajula, Marek Mutwil
Link: https://arxiv.org/abs/2408.07222v1
Date: 2024-08-13
Summary:
Elucidating gene function is one of the ultimate goals of plant science. Despite this, only ~15% of all genes in the model plant Arabidopsis thaliana have comprehensively, experimentally verified functions. While bioinformatic gene function prediction approaches can guide biologists in their experimental efforts, neither the performance of gene function prediction methods nor the number of experimentally characterised genes has increased dramatically in recent years. In this review, we discuss the status quo and the trajectory of gene function elucidation and outline recent advances in gene function prediction approaches. We then discuss how recent artificial intelligence advances in large language models and knowledge graphs can be leveraged to accelerate gene function prediction and keep us updated with the scientific literature.
--------------------------------------------------------------------------------------------------------
Scaling Up Natural Language Understanding for Multi-Robots Through the Lens of Hierarchy
Long-term planning for robots is complex. This paper proposes a method that uses Large Language Models (LLMs) to translate complex human instructions into a structured format that robots can understand. This allows robots to handle more intricate tasks and collaborate more effectively.
Authors: Shaojun Xu, Xusheng Luo, Yutong Huang, Letian Leng, Ruixuan Liu, Changliu Liu
Link: https://arxiv.org/abs/2408.08188v1
Date: 2024-08-15
Summary:
Long-horizon planning is hindered by challenges such as uncertainty accumulation, computational complexity, delayed rewards and incomplete information. This work proposes an approach to exploit the task hierarchy in human instructions to facilitate multi-robot planning. Using Large Language Models (LLMs), we propose a two-step approach to translate multi-sentence instructions into a structured language, Hierarchical Linear Temporal Logic (LTL), which serves as a formal representation for planning. Initially, LLMs transform the instructions into a hierarchical representation called a Hierarchical Task Tree, capturing the logical and temporal relations among tasks. Next, a domain-specifically fine-tuned LLM translates the sub-tasks of each task into flat LTL formulas, which are aggregated to form hierarchical LTL specifications. These specifications are then leveraged for planning using off-the-shelf planners. Our framework not only bridges the gap between instructions and algorithmic planning but also showcases the potential of LLMs in harnessing hierarchical reasoning to automate multi-robot task planning. Through evaluations in both simulation and real-world experiments involving human participants, we demonstrate that our method can handle more complex instructions than existing methods. The results indicate that our approach achieves higher success rates and lower costs in multi-robot task allocation and plan generation. Demo videos are available at https://youtu.be/7WOrDKxIMIs .
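A minimal sketch of assembling a hierarchical specification: leaves carry flat LTL formulas (which the paper obtains from a fine-tuned LLM) and internal nodes conjoin their children. The task tree and formulas here are invented examples, not output from the authors' pipeline:

```python
# Toy hierarchical-LTL assembly. Leaves hold flat LTL formula strings;
# internal nodes aggregate their children by conjunction.

task_tree = {
    "name": "mission",
    "children": [
        {"name": "deliver", "ltl": "F (pickup & F dropoff)"},
        {"name": "return", "ltl": "F base"},
    ],
}

def to_hierarchical_ltl(node):
    """Recursively conjoin child specifications into one formula string."""
    if "ltl" in node:
        return node["ltl"]
    parts = [to_hierarchical_ltl(c) for c in node["children"]]
    return "(" + ") & (".join(parts) + ")"

spec = to_hierarchical_ltl(task_tree)
print(spec)
```

The resulting string is what an off-the-shelf LTL planner would consume; the hard part the paper addresses is producing correct leaf formulas from free-form instructions.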
--------------------------------------------------------------------------------------------------------
A Multivocal Literature Review on Privacy and Fairness in Federated Learning
Federated Learning allows AI models to be trained on data without sharing the data itself. However, privacy concerns remain. This paper reviews research on integrating privacy and fairness considerations into Federated Learning frameworks. The authors highlight the need for further research on the relationship between these aspects for real-world applications.
Authors: Beatrice Balbierer, Lukas Heinlein, Domenique Zipperling, Niklas Kühl
Link: https://arxiv.org/abs/2408.08666v1
Date: 2024-08-16
Summary:
Federated Learning presents a way to revolutionize AI applications by eliminating the necessity for data sharing. Yet, research has shown that information can still be extracted during training, making additional privacy-preserving measures such as differential privacy imperative. To implement real-world federated learning applications, fairness, ranging from a fair distribution of performance to non-discriminative behaviour, must be considered. Particularly in high-risk applications (e.g. healthcare), avoiding the repetition of past discriminatory errors is paramount. As recent research has demonstrated an inherent tension between privacy and fairness, we conduct a multivocal literature review to examine the current methods to integrate privacy and fairness in federated learning. Our analyses illustrate that the relationship between privacy and fairness has been neglected, posing a critical risk for real-world applications. We highlight the need to explore the relationship between privacy, fairness, and performance, advocating for the creation of integrated federated learning frameworks.
--------------------------------------------------------------------------------------------------------
Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning
Surgical video segmentation is crucial for computer-assisted surgery. Existing methods can be computationally expensive. This paper introduces SurgSAM-2, a model that combines the Segment Anything Model 2 (SAM2) with an Efficient Frame Pruning (EFP) mechanism. EFP reduces memory usage and processing time while maintaining high accuracy. This makes real-time surgical video segmentation more feasible in resource-constrained environments.
Authors: Haofeng Liu, Erli Zhang, Junde Wu, Mingxuan Hong, Yueming Jin
Link: https://arxiv.org/abs/2408.07931v1
Date: 2024-08-15
Summary:
Surgical video segmentation is a critical task in computer-assisted surgery and is vital for enhancing surgical quality and patient outcomes. Recently, the Segment Anything Model 2 (SAM2) framework has shown superior advancements in image and video segmentation. However, SAM2 struggles with efficiency due to the high computational demands of processing high-resolution images and the complex, long-range temporal dynamics in surgical videos. To address these challenges, we introduce Surgical SAM 2 (SurgSAM-2), an advanced model that augments SAM2 with an Efficient Frame Pruning (EFP) mechanism to facilitate real-time surgical video segmentation. The EFP mechanism dynamically manages the memory bank by selectively retaining only the most informative frames, reducing memory usage and computational cost while maintaining high segmentation accuracy. Our extensive experiments demonstrate that SurgSAM-2 significantly improves both efficiency and segmentation accuracy compared to the vanilla SAM2. Remarkably, SurgSAM-2 achieves 3× the FPS of SAM2, while also delivering state-of-the-art performance after fine-tuning with lower-resolution data. These advancements establish SurgSAM-2 as a leading model for surgical video analysis, making real-time surgical video segmentation in resource-constrained environments feasible.
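The pruning idea can be sketched as keeping only the top-k frames by an informativeness score while preserving temporal order. The scores below are hypothetical, and a real SAM2 memory bank stores feature embeddings rather than strings:

```python
def prune_memory_bank(frames, scores, capacity):
    """Keep the `capacity` most informative frames, preserving temporal
    order - a simplified stand-in for an EFP-style memory bank policy."""
    ranked = sorted(range(len(frames)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:capacity])  # restore chronological order
    return [frames[i] for i in keep]

frames = ["f0", "f1", "f2", "f3", "f4"]
scores = [0.2, 0.9, 0.1, 0.8, 0.5]  # hypothetical informativeness scores
print(prune_memory_bank(frames, scores, capacity=3))  # ['f1', 'f3', 'f4']
```

Capping the memory bank this way bounds both memory use and the cost of cross-frame attention, which is where the FPS gain comes from.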
--------------------------------------------------------------------------------------------------------
Using Advanced LLMs to Enhance Smaller LLMs: An Interpretable Knowledge Distillation Approach
This paper proposes a method to improve the performance of smaller, cheaper LLMs by learning from larger, more powerful ones. It achieves this through a "strategy teaching" approach where the advanced LLM provides strategies to the smaller LLM for various scenarios. This method is interpretable and allows for human oversight, making it potentially safer for real-world applications.
Authors: Tong Wang, K. Sudhir, Dat Hong
Link: https://arxiv.org/abs/2408.07238v1
Date: 2024-08-13
Summary:
Advanced large language models (LLMs) like GPT-4 or Llama 3 provide superior performance in complex human-like interactions. But they are costly, too large for edge devices such as smartphones, and harder to self-host, leading to security and privacy concerns. This paper introduces a novel interpretable knowledge distillation approach to enhance the performance of smaller, more economical LLMs that firms can self-host. We study this problem in the context of building a customer service agent aimed at achieving high customer satisfaction through goal-oriented dialogues. Unlike traditional knowledge distillation, where the "student" model learns directly from the "teacher" model's responses via fine-tuning, our interpretable "strategy" teaching approach involves the teacher providing strategies to improve the student's performance in various scenarios. This method alternates between a "scenario generation" step and a "strategies for improvement" step, creating a customized library of scenarios and optimized strategies for automated prompting. The method requires only black-box access to both student and teacher models; hence it can be used without manipulating model parameters. In our customer service application, the method improves performance, and the learned strategies are transferable to other LLMs and scenarios beyond the training set. The method's interpretability helps safeguard against potential harms through human audit.
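The alternating loop can be sketched as below, with both LLM calls replaced by stubs; in the actual method, a teacher model generates scenarios and strategies through black-box API calls, and the library is used to prompt the student:

```python
# Sketch of the alternation between "scenario generation" and "strategies
# for improvement". Both functions are illustrative stand-ins for LLM calls.

def generate_scenario(round_idx):
    return f"customer complaint #{round_idx}"

def teacher_strategy(scenario):
    return f"acknowledge, then resolve: {scenario}"

def build_strategy_library(n_rounds):
    """Accumulate a scenario -> strategy map for automated prompting."""
    library = {}
    for i in range(n_rounds):
        scenario = generate_scenario(i)
        library[scenario] = teacher_strategy(scenario)  # human-auditable text
    return library

lib = build_strategy_library(3)
print(len(lib))  # 3
```

Because each entry is plain text rather than weight updates, a human can audit or edit the library before deployment, which is the interpretability claim of the paper.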
--------------------------------------------------------------------------------------------------------
KAN You See It? KANs and Sentinel for Effective and Explainable Crop Field Segmentation
This paper explores using a new type of neural network called a Kolmogorov-Arnold Network (KAN) for segmenting crop fields in satellite imagery. KANs offer improved performance compared to traditional models and provide explanations for their predictions, making them more reliable for agricultural applications.
Authors: Daniele Rege Cambrin, Eleonora Poeta, Eliana Pastor, Tania Cerquitelli, Elena Baralis, Paolo Garza
Link: https://arxiv.org/abs/2408.07040v1
Date: 2024-08-13
Summary:
Segmentation of crop fields is essential for enhancing agricultural productivity, monitoring crop health, and promoting sustainable practices. Deep learning models adopted for this task must ensure accurate and reliable predictions to avoid economic losses and environmental impact. The newly proposed Kolmogorov-Arnold networks (KANs) offer promising advancements in the performance of neural networks. This paper analyzes the integration of KAN layers into the U-Net architecture (U-KAN) to segment crop fields using Sentinel-2 and Sentinel-1 satellite images and provides an analysis of the performance and explainability of these networks. Our findings indicate a 2% improvement in IoU compared to the traditional fully convolutional U-Net model, with fewer GFLOPs. Furthermore, gradient-based explanation techniques show that U-KAN predictions are highly plausible and that the network has a very high ability to focus on the boundaries of cultivated areas rather than on the areas themselves. The per-channel relevance analysis also reveals that some channels are irrelevant to this task.
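Since the comparison hinges on IoU, here is the metric for binary masks (flattened to 1D lists for brevity); this is the standard definition, not code from the paper:

```python
def iou(pred, target):
    """Intersection over Union for binary segmentation masks."""
    inter = sum(p and t for p, t in zip(pred, target))
    union = sum(p or t for p, t in zip(pred, target))
    return inter / union if union else 1.0  # two empty masks agree perfectly

pred   = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 1, 1, 0]
print(iou(pred, target))  # 2 / 4 = 0.5
```

A 2% absolute IoU gain is meaningful in field segmentation, where most errors concentrate at parcel boundaries, the very regions the explanation analysis shows U-KAN attends to.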
--------------------------------------------------------------------------------------------------------
Understanding Help-Seeking Behavior of Students Using LLMs vs. Web Search for Writing SQL Queries
This study investigates how students use different resources, including web search and Large Language Models (LLMs), to learn how to write SQL queries. The results suggest that while instructor-tuned LLMs can be effective, they may require more interaction than web search.
Authors: Harsh Kumar, Mohi Reza, Jeb Mitchell, Ilya Musabirov, Lisa Zhang, Michael Liut
Link: https://arxiv.org/abs/2408.08401v1
Date: 2024-08-15
Summary:
Growth in the use of large language models (LLMs) in programming education is altering how students write SQL queries. Traditionally, students relied heavily on web search for coding assistance, but this has shifted with the adoption of LLMs like ChatGPT. However, the comparative process and outcomes of using web search versus LLMs for coding help remain underexplored. To address this, we conducted a randomized interview study in a database classroom to compare web search and LLMs, including a publicly available LLM (ChatGPT) and an instructor-tuned LLM, for writing SQL queries. Our findings indicate that using the instructor-tuned LLM required significantly more interactions than both ChatGPT and web search, but resulted in a similar number of edits to the final SQL query. No significant differences were found in the quality of the final SQL queries between conditions, although the LLM conditions directionally showed higher query quality. Furthermore, students using the instructor-tuned LLM reported lower mental demand. These results have implications for learning and productivity in programming education.
--------------------------------------------------------------------------------------------------------
DifuzCam: Replacing Camera Lens with a Mask and a Diffusion Model
This research proposes a novel camera design that uses a mask and a diffusion model to capture images without a physical lens. This design reduces camera size and weight while maintaining high-quality image reconstruction, potentially leading to new camera applications.
Authors: Erez Yosef, Raja Giryes
Link: https://arxiv.org/abs/2408.07541v1
Date: 2024-08-14
Summary:
The flat lensless camera design significantly reduces camera size and weight. In this design, the camera lens is replaced by another optical element that interferes with the incoming light, and the image is recovered from the raw sensor measurements using a reconstruction algorithm. Yet, the quality of the reconstructed images is often unsatisfactory. To mitigate this, we propose utilizing a pre-trained diffusion model with a control network and a learned separable transformation for reconstruction. This allows us to build a prototype flat camera with high-quality imaging, presenting state-of-the-art results in terms of both quality and perceptual fidelity. We also demonstrate its ability to leverage textual descriptions of the captured scene to further enhance reconstruction. Our reconstruction method, which leverages the strong capabilities of a pre-trained diffusion model, can be used in other imaging systems for improved reconstruction results.
--------------------------------------------------------------------------------------------------------
This paper introduces a new intrusion detection system (IDS) for vehicles that can identify both known and unknown cyberattacks. The system utilizes a combination of deep learning and federated learning to achieve high accuracy and efficiency while preserving data privacy.
Authors: Muzun Althunayyan, Amir Javed, Omer Rana
Link: https://arxiv.org/abs/2408.08433v1
Date: 2024-08-15
Summary:
As connected and autonomous vehicles proliferate, the Controller Area Network (CAN) bus has become the predominant communication standard for in-vehicle networks due to its speed and efficiency. However, the CAN bus lacks basic security measures such as authentication and encryption, making it highly vulnerable to cyberattacks. To ensure in-vehicle security, intrusion detection systems (IDSs) must detect seen attacks and provide a robust defense against new, unseen attacks while remaining lightweight for practical deployment. Previous work has relied solely on the CAN ID feature or has used traditional machine learning (ML) approaches with manual feature extraction. These approaches overlook other exploitable features, making it challenging to adapt to new unseen attack variants and compromising security. This paper introduces a novel, lightweight in-vehicle IDS leveraging deep learning (DL) to address these limitations. The proposed IDS employs a multi-stage approach: an artificial neural network (ANN) in the first stage to detect seen attacks, and a Long Short-Term Memory (LSTM) autoencoder in the second stage to detect new, unseen attacks. To understand and analyze diverse driving behaviors, update the model with the latest attack patterns, and preserve data privacy, we propose a theoretical framework to deploy our IDS in a hierarchical federated learning (H-FL) environment. Experimental results demonstrate that our IDS achieves an F1-score exceeding 0.99 for seen attacks and exceeding 0.95 for novel attacks, with a detection rate of 99.99%. Additionally, the false alarm rate (FAR) is exceptionally low at 0.016%, minimizing false alarms. Despite using DL algorithms known for their effectiveness in identifying sophisticated and zero-day attacks, the IDS remains lightweight, ensuring its feasibility for real-world deployment.
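The two-stage decision logic can be sketched as: a supervised classifier handles seen attacks, and an autoencoder's reconstruction error flags what it cannot rebuild, catching unseen attacks. The classifier, error function, message format, and threshold below are hypothetical stand-ins for the trained ANN and LSTM autoencoder:

```python
def two_stage_ids(message, classifier, autoencoder_error, threshold=0.1):
    """Stage 1: supervised classifier for known attacks.
    Stage 2: anomaly detection via reconstruction error for unseen ones."""
    label = classifier(message)
    if label != "benign":
        return label                      # seen attack caught in stage 1
    if autoencoder_error(message) > threshold:
        return "unknown-attack"           # anomalous: likely unseen attack
    return "benign"

# Hypothetical stand-ins for the trained models
known = {"0xDEAD": "dos-attack"}
classifier = lambda m: known.get(m, "benign")
ae_error = lambda m: 0.5 if m == "0xBEEF" else 0.01

print(two_stage_ids("0xDEAD", classifier, ae_error))  # dos-attack
print(two_stage_ids("0xBEEF", classifier, ae_error))  # unknown-attack
print(two_stage_ids("0x100", classifier, ae_error))   # benign
```

The threshold trades detection rate against false alarms, which is why the paper's 0.016% FAR figure is reported alongside the F1-scores.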
--------------------------------------------------------------------------------------------------------
Ex3: Automatic Novel Writing by Extracting, Excelsior and Expanding
This research proposes a method for generating long-form creative text formats, like novels, using Large Language Models (LLMs). The method involves extracting structure and information from existing novels and using it to train the LLM to generate more coherent and engaging stories.
Authors: Huang Lei, Jiaming Guo, Guanhua He, Xishan Zhang, Rui Zhang, Shaohui Peng, Shaoli Liu, Tianshi Chen
Link: https://arxiv.org/abs/2408.08506v1
Date: 2024-08-16
Summary:
Generating long-form texts such as novels using artificial intelligence has always been a challenge. A common approach is to use large language models (LLMs) to construct a hierarchical framework that first plans and then writes. Although the generated novels reach a sufficient length, they exhibit poor logical coherence and appeal in their plots, as well as deficiencies in character and event depiction, ultimately compromising the overall narrative quality. In this paper, we propose a method named Extracting, Excelsior and Expanding (Ex3). Ex3 initially extracts structure information from raw novel data. By combining this structure information with the novel data, an instruction-following dataset is meticulously crafted. This dataset is then utilized to fine-tune the LLM, aiming for excelsior generation performance. In the final stage, a tree-like expansion method is deployed to facilitate the generation of arbitrarily long novels. Evaluation against previous methods showcases Ex3's ability to produce higher-quality long-form novels.
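The tree-like expansion can be sketched as a recursive split of each outline node into finer-grained passages, concatenated in order at the leaves. The `expand` stub stands in for the fine-tuned LLM, and the splitting rule is invented for illustration:

```python
def expand(summary, depth):
    """Recursively expand an outline node into leaf passages."""
    if depth == 0:
        return [summary + "."]
    # pretend the model splits every summary into two finer-grained beats
    return (expand(summary + " (part 1)", depth - 1)
            + expand(summary + " (part 2)", depth - 1))

def write_novel(premise, depth=2):
    """Join the leaves of the expansion tree into one continuous text."""
    return " ".join(expand(premise, depth))

text = write_novel("A voyage begins")
print(text.count("."))  # 4 leaf passages at depth 2
```

Because each level multiplies the passage count, the depth parameter controls the final length, which is how a tree expansion supports arbitrarily long output.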
--------------------------------------------------------------------------------------------------------
This paper introduces a new AI system for autonomous vehicles that can navigate roundabouts safely and efficiently in various traffic conditions. It utilizes a combination of reinforcement learning and a novel type of neural network (KAN) to make optimal driving decisions.
Authors: Zhihao Lin, Zhen Tian, Qi Zhang, Ziyang Ye, Hanyang Zhuang, Jianglin Lan
Link: https://arxiv.org/abs/2408.08242v1
Date: 2024-08-15
Summary:
Safety and efficiency are crucial for autonomous driving in roundabouts, especially in the context of mixed traffic where autonomous vehicles (AVs) and human-driven vehicles coexist. This paper introduces a learning-based algorithm tailored to foster safe and efficient driving behaviors across varying levels of traffic flows in roundabouts. The proposed algorithm employs a deep Q-learning network to effectively learn safe and efficient driving strategies in complex multi-vehicle roundabouts. Additionally, a KAN (Kolmogorov-Arnold network) enhances the AVs' ability to learn their surroundings robustly and precisely. An action inspector is integrated to replace dangerous actions to avoid collisions when the AV interacts with the environment, and a route planner is proposed to enhance the driving efficiency and safety of the AVs. Moreover, a model predictive control is adopted to ensure stability and precision of the driving actions. The results show that our proposed system consistently achieves safe and efficient driving whilst maintaining a stable training process, as evidenced by the smooth convergence of the reward function and the low variance in the training curves across various traffic flows. Compared to state-of-the-art benchmarks, the proposed algorithm achieves a lower number of collisions and reduced travel time to destination.
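The action-inspector idea can be sketched as a wrapper that vetoes unsafe actions before they reach the environment. The safety rule, action names, and state fields are invented for illustration and are not the paper's actual safety criteria:

```python
def action_inspector(state, proposed_action, is_safe, fallback="brake"):
    """Replace a dangerous proposed action with a safe fallback before
    execution - a simplified stand-in for the safety layer described above."""
    return proposed_action if is_safe(state, proposed_action) else fallback

# Hypothetical safety check: accelerating with a close vehicle ahead is unsafe
def is_safe(state, action):
    return not (action == "accelerate" and state["gap_m"] < 10)

print(action_inspector({"gap_m": 5}, "accelerate", is_safe))   # brake
print(action_inspector({"gap_m": 50}, "accelerate", is_safe))  # accelerate
```

Filtering actions this way keeps exploration during Q-learning from causing collisions while leaving the learned policy intact whenever it is already safe.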
--------------------------------------------------------------------------------------------------------
This research proposes a new approach for converting natural language questions into SQL queries, even for complex databases and challenging questions. The system utilizes multiple AI agents working together to break down the problem and generate accurate queries.
Authors: Wenxuan Xie, Gaochen Wu, Bowen Zhou
Link: https://arxiv.org/abs/2408.07930v2
Date: 2024-08-16
Summary:
Recent in-context learning based methods have achieved remarkable success on the Text-to-SQL task. However, there is still a large gap between the performance of these models and human performance on datasets with complex database schemas and difficult questions, such as BIRD. Besides, existing work has neglected to supervise intermediate steps when solving questions iteratively with question-decomposition methods, and the schema linking methods used in these works are very rudimentary. To address these issues, we propose MAG-SQL, a multi-agent generative approach with soft schema linking and iterative Sub-SQL refinement. In our framework, an entity-based method with table summaries is used to select the columns in the database, and a novel targets-conditions decomposition method is introduced to decompose complex questions. Additionally, we build an iterative generation module which includes a Sub-SQL Generator and a Sub-SQL Refiner, introducing external oversight for each step of generation. Through a series of ablation studies, the effectiveness of each agent in our framework has been demonstrated. When evaluated on the BIRD benchmark with GPT-4, MAG-SQL achieves an execution accuracy of 61.08%, compared to the baseline accuracy of 46.35% for vanilla GPT-4 and the baseline accuracy of 57.56% for MAC-SQL. Besides, our approach makes similar progress on Spider.
--------------------------------------------------------------------------------------------------------
A theory of understanding for artificial intelligence: composability, catalysts, and learning
This paper proposes a new framework for analyzing how AI systems achieve understanding. It focuses on the ability of AI systems to compose different pieces of information (inputs) into meaningful outputs.
Authors: Zijian Zhang, Sara Aronowitz, Alán Aspuru-Guzik
Link: https://arxiv.org/abs/2408.08463v1
Date: 2024-08-16
Summary:
Understanding is a crucial yet elusive concept in artificial intelligence (AI). This work proposes a framework for analyzing understanding based on the notion of composability. Given any subject (e.g., a person or an AI), we suggest characterizing its understanding of an object in terms of its ability to process (compose) relevant inputs into satisfactory outputs from the perspective of a verifier. This highly universal framework readily applies to non-human subjects, such as AIs, non-human animals, and institutions. Further, we propose methods for analyzing the inputs that enhance output quality in compositions, which we call catalysts. We show how the structure of a subject can be revealed by analyzing its components that act as catalysts, and argue that a subject's learning ability can be regarded as its ability to compose inputs into its inner catalysts. Finally, we examine the importance of learning ability for AIs to attain general intelligence. Our analysis indicates that models capable of generating outputs that can function as their own catalysts, such as language models, establish a foundation for potentially overcoming existing limitations in AI understanding.
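The catalyst notion above can be made concrete with a toy sketch (our illustration, not the authors' formalism): an input acts as a catalyst if removing it from a composition lowers the verifier's score of the output. All functions here are hypothetical.

```python
def compose(inputs: frozenset[str]) -> str:
    # The subject's processing: here, simply combine whatever inputs it has.
    return " ".join(sorted(inputs))

def verifier_score(output: str) -> int:
    # A verifier judging output quality: rewards outputs that mention
    # both the question and a relevant definition.
    return int("question" in output) + int("definition" in output)

def catalysts(inputs: set[str]) -> set[str]:
    # An input is a catalyst if ablating it degrades the verified output.
    full = verifier_score(compose(frozenset(inputs)))
    return {
        x for x in inputs
        if verifier_score(compose(frozenset(inputs - {x}))) < full
    }

print(catalysts({"question", "definition", "noise"}))
```

The ablation-style definition mirrors how the paper suggests a subject's structure can be probed: inputs whose removal leaves quality unchanged (here, "noise") are not catalysts.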
--------------------------------------------------------------------------------------------------------
This survey paper provides a comprehensive overview of how transformers and large language models (LLMs) are being used to develop intrusion detection systems (IDS). It discusses the various applications, challenges, and future directions for this field.
Authors: Hamza Kheddar
Link: https://arxiv.org/abs/2408.07583v1
Date: 2024-08-14
Summary:
With significant advancements in Transformers and LLMs, NLP has extended its reach into many research fields due to its enhanced capabilities in text generation and user interaction. One field benefiting greatly from these advancements is cybersecurity. In cybersecurity, many parameters that need to be protected and exchanged between senders and receivers are in the form of text and tabular data, making NLP a valuable tool for enhancing the security measures of communication protocols. This survey paper provides a comprehensive analysis of the utilization of Transformers and LLMs in cyber-threat detection systems. The methodology of paper selection and bibliometric analysis is outlined to establish a rigorous framework for evaluating existing research. The fundamentals of Transformers are discussed, including background information on various cyber-attacks and datasets commonly used in this field. The survey explores the application of Transformers in IDSs, focusing on different architectures such as attention-based models, LLMs like BERT and GPT, CNN/LSTM-Transformer hybrids, and emerging approaches like ViTs. Furthermore, it explores the diverse environments and applications where Transformer- and LLM-based IDSs have been implemented, including computer networks, IoT devices, critical infrastructure protection, cloud computing, SDN, and autonomous vehicles. The paper also addresses research challenges and future directions in this area, identifying key issues such as interpretability, scalability, and adaptability to evolving threats. Finally, the conclusion summarizes the findings and highlights the significance of Transformers and LLMs in enhancing cyber-threat detection capabilities, while also outlining potential avenues for further research and development.
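A recurring pattern in the systems the survey covers is serializing tabular network-flow records into text so a Transformer/LLM classifier can label them. A minimal sketch of that preprocessing step follows (the classifier here is a trivial hypothetical rule standing in for a fine-tuned model such as BERT):

```python
def flow_to_text(flow: dict) -> str:
    # Serialize a tabular flow record into a token-friendly string,
    # sorted so identical records always produce identical text.
    return " ".join(f"{k}={v}" for k, v in sorted(flow.items()))

def classify(text: str) -> str:
    # Placeholder for a fine-tuned Transformer classifier; here, a toy
    # rule flagging Telnet traffic (dst_port 23) as suspicious.
    return "malicious" if "dst_port=23" in text else "benign"

flow = {"src_ip": "10.0.0.5", "dst_port": 23, "bytes": 4096, "proto": "tcp"}
text = flow_to_text(flow)
print(text)
print(classify(text))
```

Serializing fields as `key=value` tokens is one common convention; real pipelines would tokenize this text and fine-tune a pretrained encoder on labeled benign/malicious flows.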
--------------------------------------------------------------------------------------------------------