Week Ending 4.14.2024
RESEARCH WATCH: 4.14.2024
SemHARQ: Semantic-Aware HARQ for Multi-task Semantic Communications
This paper proposes a novel semantic-aware hybrid automatic repeat request (SemHARQ) framework to improve the robustness and efficiency of transmitting semantic features in task-oriented communications. The key innovations include a multi-task semantic encoder and a feature importance ranking method to prioritize critical features under limited channel resources. This is complemented by a feature distortion evaluation network that detects transmission errors and triggers selective retransmissions. Evaluated in Internet of Vehicles scenarios, SemHARQ achieves significant performance gains over existing approaches, making it a promising solution for reliable and efficient semantic communications.
Authors: Jiangjing Hu, Fengyu Wang, Wenjun Xu, Hui Gao, Ping Zhang
Link: https://arxiv.org/abs/2404.08490v1
Date: 2024-04-12
Summary:
Intelligent task-oriented semantic communications (SemComs) have witnessed great progress with the development of deep learning (DL). In this paper, we propose a semantic-aware hybrid automatic repeat request (SemHARQ) framework for the robust and efficient transmission of semantic features. First, to improve the robustness and effectiveness of semantic coding, a multi-task semantic encoder is proposed. Meanwhile, a feature importance ranking (FIR) method is investigated to ensure the delivery of important features under limited channel resources. Then, to accurately detect possible transmission errors, a novel feature distortion evaluation (FDE) network is designed to identify the distortion level of each feature, based on which an efficient HARQ method is proposed. Specifically, the corrupted features are retransmitted, while the remaining channel resources are used for incremental transmissions. The system performance is evaluated under different channel conditions in multi-task Internet of Vehicles scenarios. Extensive experiments show that the proposed framework outperforms state-of-the-art works by more than 20% in rank-1 accuracy for vehicle re-identification, and 10% in vehicle color classification accuracy in the low signal-to-noise ratio regime.
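A minimal sketch of the selective-retransmission logic described above. All names here are invented for illustration, and a random number stands in for the FDE network's learned distortion scores:

```python
import random

def semharq_round(features, importance, channel_budget, distortion_threshold=0.5):
    """Hypothetical sketch of one SemHARQ-style round (not the paper's
    actual design): rank features, send the top ones, detect corrupted
    features, and split the next round's budget between retransmissions
    and incremental (not-yet-sent) features."""
    # 1. Feature importance ranking (FIR): critical features first
    #    under the limited channel budget.
    ranked = sorted(features, key=lambda f: importance[f], reverse=True)
    sent = ranked[:channel_budget]

    # 2. Stand-in for the feature distortion evaluation (FDE) network:
    #    a per-feature distortion score after the noisy channel.
    distortion = {f: random.random() for f in sent}

    # 3. HARQ decision: retransmit only features whose estimated
    #    distortion exceeds the threshold; leftover budget carries
    #    incremental transmissions of unsent features.
    corrupted = [f for f in sent if distortion[f] > distortion_threshold]
    unsent = [f for f in ranked if f not in sent]
    retransmit_slots = min(len(corrupted), channel_budget)
    incremental = unsent[:channel_budget - retransmit_slots]
    return corrupted[:retransmit_slots], incremental

random.seed(0)
feats = {f"f{i}": [0.0] for i in range(8)}
imp = {f"f{i}": i for i in range(8)}
retx, incr = semharq_round(feats, imp, channel_budget=4)
print(retx, incr)
```

The point of the split in step 3 is that channel uses are never wasted: slots not needed for repair immediately carry new semantic content.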
--------------------------------------------------------------------------------------------------------
On the Independence Assumption in Neurosymbolic Learning
This work critically examines the common assumption of conditional independence in neurosymbolic learning systems, which simplifies learning and reasoning but can lead to overconfident predictions and optimization challenges. The authors provide theoretical analysis to show the limitations of this assumption and lay the foundation for developing more expressive neurosymbolic probabilistic models. Their findings have important implications for improving uncertainty quantification and optimization in these hybrid AI systems.
Authors: Emile van Krieken, Pasquale Minervini, Edoardo M. Ponti, Antonio Vergari
Link: https://arxiv.org/abs/2404.08458v1
Date: 2024-04-12
Summary:
State-of-the-art neurosymbolic learning systems use probabilistic reasoning to guide neural networks towards predictions that conform to logical constraints over symbols. Many such systems assume that the probabilities of the considered symbols are conditionally independent given the input to simplify learning and reasoning. We study and criticise this assumption, highlighting how it can hinder optimisation and prevent uncertainty quantification. We prove that loss functions bias conditionally independent neural networks to become overconfident in their predictions. As a result, they are unable to represent uncertainty over multiple valid options. Furthermore, we prove that these loss functions are difficult to optimise: they are non-convex, and their minima are usually highly disconnected. Our theoretical analysis gives the foundation for replacing the conditional independence assumption and designing more expressive neurosymbolic probabilistic models.
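The overconfidence argument can be made concrete with a two-symbol toy example (ours, not the paper's formal construction). Under an XOR constraint the valid worlds are (1,0) and (0,1); a model with independent Bernoulli marginals can only satisfy the constraint with certainty by collapsing onto one world:

```python
from itertools import product

def p_constraint(p, q):
    """Probability that XOR(a, b) holds under independent Bernoulli
    marginals P(a=1)=p and P(b=1)=q."""
    total = 0.0
    for a, b in product([0, 1], repeat=2):
        if a ^ b:  # valid worlds: (1,0) and (0,1)
            pa = p if a else 1 - p
            pb = q if b else 1 - q
            total += pa * pb
    return total

# Spreading belief evenly over the two valid worlds caps the
# constraint probability at 0.5 ...
print(p_constraint(0.5, 0.5))
# ... so minimizing the loss pushes the network to a deterministic,
# overconfident corner that represents only one valid option:
print(p_constraint(1.0, 0.0))
```

This is exactly the failure mode the authors prove: the conditionally independent model cannot express "either of these two answers, with equal confidence" while also being certain the constraint holds.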
--------------------------------------------------------------------------------------------------------
Reducing hallucination in structured outputs via Retrieval-Augmented Generation
Hallucination, or the generation of irrelevant or factually incorrect content, is a well-known limitation of large language models. This paper presents a Retrieval-Augmented Generation (RAG) approach to significantly reduce hallucination in the context of generating structured outputs, such as workflows from natural language requirements. The authors demonstrate the effectiveness of their system in improving generalization and reducing resource requirements, making it a valuable contribution towards more reliable and deployable generative AI applications.
Authors: Patrice Béchard, Orlando Marquez Ayala
Link: https://arxiv.org/abs/2404.08189v1
Date: 2024-04-12
Summary:
A common and fundamental limitation of Generative AI (GenAI) is its propensity to hallucinate. While large language models (LLM) have taken the world by storm, without eliminating or at least reducing hallucinations, real-world GenAI systems may face challenges in user adoption. In the process of deploying an enterprise application that produces workflows based on natural language requirements, we devised a system leveraging Retrieval Augmented Generation (RAG) to greatly improve the quality of the structured output that represents such workflows. Thanks to our implementation of RAG, our proposed system significantly reduces hallucinations in the output and improves the generalization of our LLM in out-of-domain settings. In addition, we show that using a small, well-trained retriever encoder can reduce the size of the accompanying LLM, thereby making deployments of LLM-based systems less resource-intensive.
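The mechanism can be sketched as a retrieve-then-prompt pipeline. The toy lexical retriever below stands in for the paper's trained retriever encoder, and the prompt wording is ours; the key idea is that retrieved component definitions constrain the model to steps that actually exist, curbing hallucinated step names:

```python
def retrieve(query, corpus, k=2):
    """Toy word-overlap retriever (a stand-in for a trained encoder)."""
    def overlap(doc):
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(corpus, key=overlap, reverse=True)[:k]

def build_prompt(requirement, corpus):
    # Ground the LLM: only retrieved workflow steps may appear in the
    # generated structured output.
    context = retrieve(requirement, corpus)
    return (
        "Use ONLY these workflow steps:\n- "
        + "\n- ".join(context)
        + f"\n\nRequirement: {requirement}\nWorkflow (JSON):"
    )

steps = ["send email notification", "create approval task",
         "update database record", "post chat message"]
print(build_prompt("notify the manager by email after approval", steps))
```

Because the retriever does the narrowing, the generator can be smaller, which matches the paper's observation that a well-trained retriever reduces the LLM size needed for deployment.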
--------------------------------------------------------------------------------------------------------
CAT: Contrastive Adapter Training for Personalized Image Generation
Adapters like LoRA have enabled diffusion models to be personalized for image generation at low cost. However, this often leads to a loss of diversity, especially within the same class of objects. The authors introduce Contrastive Adapter Training (CAT), a strategy that preserves the base model's original knowledge during adapter training. CAT improves the quality of personalized image generation, as demonstrated through qualitative and quantitative evaluations. This work highlights the importance of maintaining model robustness and diversity when developing personalized generative AI systems.
Authors: Jae Wan Park, Sang Hyun Park, Jun Young Koh, Junha Lee, Min Song
Link: https://arxiv.org/abs/2404.07554v1
Date: 2024-04-11
Summary:
The emergence of various adapters, including Low-Rank Adaptation (LoRA) applied from the field of natural language processing, has allowed diffusion models to personalize image generation at a low cost. However, due to various challenges, including limited datasets and a shortage of regularization and computation resources, adapter training often results in unsatisfactory outcomes, leading to the corruption of the backbone model's prior knowledge. One well-known phenomenon is the loss of diversity in object generation, especially within the same class, which leads to generating almost identical objects with minor variations. This poses challenges for generation capabilities. To solve this issue, we present Contrastive Adapter Training (CAT), a simple yet effective strategy to enhance adapter training through the application of a CAT loss. Our approach facilitates the preservation of the base model's original knowledge when the model trains adapters. Furthermore, we introduce the Knowledge Preservation Score (KPS) to evaluate CAT's ability to retain the former information. We qualitatively and quantitatively compare CAT's improvements. Finally, we discuss CAT's potential for multi-concept adapters and further optimization.
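One plausible reading of such an objective (our sketch, not the paper's exact formulation) is a two-term loss: a standard reconstruction term that learns the new concept, plus a preservation term that keeps the adapted model's predictions close to the frozen base model's, so the backbone's prior is not overwritten:

```python
def cat_style_loss(adapter_pred, base_pred, target, lam=1.0):
    """Sketch of a CAT-style objective: personalization + knowledge
    preservation. `adapter_pred` is the adapted model's output,
    `base_pred` the frozen base model's output on the same input,
    `target` the personalization target; `lam` trades the two off."""
    mse = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    personalization = mse(adapter_pred, target)  # learn the new concept
    preservation = mse(adapter_pred, base_pred)  # keep the base prior
    return personalization + lam * preservation

# Perfect agreement with both target and base model gives zero loss;
# drifting from the base model is penalized even when the target fits.
print(cat_style_loss([1.0, 0.0], [0.5, 0.5], [1.0, 1.0]))
```

Setting `lam=0` recovers plain adapter training; raising it preserves more of the base model's diversity at the cost of slower concept fitting.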
--------------------------------------------------------------------------------------------------------
Active particle motion in Poiseuille flow through rectangular channels
This paper investigates the dynamics of active particles, such as microswimmers, suspended in fluid flow through rectangular channels. The authors derive a constant of motion and observe a diverse set of particle trajectories, including regular and chaotic motion. Their findings on the transition to chaotic dynamics provide insights that may be useful for understanding and controlling the behavior of natural and artificial microswimmers in microfluidic environments.
Authors: Rahil N. Valani, Brendan Harding, Yvonne M. Stokes
Link: https://arxiv.org/abs/2404.07420v1
Date: 2024-04-11
Summary:
We investigate the dynamics of a point-like active particle suspended in fluid flow through a straight channel. For this particle-fluid system, we derive a constant of motion for a general unidirectional fluid flow, and apply it to an approximation of Poiseuille flow through rectangular cross-sections. For a given rectangular cross-section, this results in a 4D nonlinear conservative dynamical system with one constant of motion and a dimensionless parameter as the ratio of maximum flow speed to intrinsic active particle speed. We observe a diverse set of active particle trajectories with variations in system parameters and initial conditions which we classify into different types of swinging, trapping, tumbling and wandering motion. Regular (periodic/quasiperiodic) motion as well as chaotic active particle motion are observed for these trajectories and quantified using largest Lyapunov exponents. We explore the transition to chaotic motion using Poincaré maps and show "sticky" chaotic tumbling trajectories that have long transients near a periodic state. Outcomes of this work may have implications for dynamics of natural and artificial microswimmers in experimental microfluidic channels that typically have rectangular cross-sections.
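The flavor of these dynamics can be reproduced with the classic reduced model for a swimmer in 2D plane Poiseuille flow (a Zöttl-Stark-type system): the particle is advected by the parabolic profile, self-propels at fixed speed, and is rotated by the local vorticity. This is only an illustrative simplification; the paper treats the full 4D problem in 3D rectangular channels:

```python
import math

def step(state, dt, u_max=1.0, v0=0.5):
    """One RK4 step for (x, y, theta): position and swimming direction
    of a point swimmer in plane Poiseuille flow with walls at y = +/-1."""
    def rhs(s):
        x, y, th = s
        u = u_max * (1.0 - y * y)        # parabolic flow profile
        shear = -2.0 * u_max * y         # du/dy
        return (u + v0 * math.cos(th),   # advection + self-propulsion
                v0 * math.sin(th),
                -0.5 * shear)            # rotation by local vorticity
    k1 = rhs(state)
    k2 = rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = rhs(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(s + dt / 6.0 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

state = (0.0, 0.2, math.pi / 2)  # start off-centre, swimming cross-stream
for _ in range(2000):
    state = step(state, dt=0.01)
print(state)
```

Varying the ratio `u_max / v0` and the initial condition switches the trajectory between the swinging and tumbling regimes the paper classifies.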
--------------------------------------------------------------------------------------------------------
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
Large language models face challenges in efficiently processing long input sequences. This work introduces a novel attention mechanism called Infini-attention, which incorporates a compressive memory into the standard attention to enable transformer-based models to handle infinitely long inputs with bounded memory and computation. The authors demonstrate the effectiveness of their approach on various long-context language tasks, making it a promising direction for scaling up language models without sacrificing efficiency.
Authors: Tsendsuren Munkhdalai, Manaal Faruqui, Siddharth Gopal
Link: https://arxiv.org/abs/2404.07143v1
Date: 2024-04-10
Summary:
This work introduces an efficient method to scale Transformer-based Large Language Models (LLMs) to infinitely long inputs with bounded memory and computation. A key component in our proposed approach is a new attention technique dubbed Infini-attention. The Infini-attention incorporates a compressive memory into the vanilla attention mechanism and builds in both masked local attention and long-term linear attention mechanisms in a single Transformer block. We demonstrate the effectiveness of our approach on long-context language modeling benchmarks, 1M sequence length passkey context block retrieval and 500K length book summarization tasks with 1B and 8B LLMs. Our approach introduces minimal bounded memory parameters and enables fast streaming inference for LLMs.
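The compressive memory at the heart of Infini-attention is a linear-attention associative matrix. The single-head sketch below follows our reading of the paper's update and retrieval rules (the gating that mixes this with local attention is omitted); memory stays a fixed d x d matrix no matter how many segments are absorbed:

```python
import math

def elu1(x):
    """ELU(x) + 1, the positive feature map used for the memory ops."""
    return x + 1.0 if x >= 0 else math.exp(x)

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

class CompressiveMemory:
    """Minimal sketch of Infini-attention's compressive memory for one
    head with model dimension d (illustrative, not the full mechanism)."""
    def __init__(self, d):
        self.M = [[0.0] * d for _ in range(d)]  # associative memory, d x d
        self.z = [0.0] * d                      # normalisation term

    def update(self, K, V):
        # M <- M + sigma(K)^T V ; z <- z + column sums of sigma(K)
        sK = [[elu1(x) for x in row] for row in K]
        for i, col in enumerate(zip(*sK)):
            for j in range(len(self.z)):
                self.M[i][j] += sum(c * V[r][j] for r, c in enumerate(col))
            self.z[i] += sum(col)

    def retrieve(self, Q):
        # A = sigma(Q) M / (sigma(Q) z): constant-cost readout of all
        # past segments, however long the input has grown.
        sQ = [[elu1(x) for x in row] for row in Q]
        num = matmul(sQ, self.M)
        den = [sum(q * z for q, z in zip(row, self.z)) for row in sQ]
        return [[n / (d + 1e-8) for n in row] for row, d in zip(num, den)]

mem = CompressiveMemory(d=2)
mem.update(K=[[1.0, 0.0]], V=[[3.0, 5.0]])
print(mem.retrieve(Q=[[1.0, 0.0]]))  # recovers the stored value ~[3, 5]
```

Because `M` and `z` have fixed size, memory and compute stay bounded while the effective context grows without limit, which is the paper's central claim.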
--------------------------------------------------------------------------------------------------------
Joint Active And Passive IRS Aided Wireless Communication: Elements Allocation and Achievable Rate
The authors investigate the problem of optimally allocating reflecting elements between active intelligent reflecting surfaces (AIRS) and passive IRS (PIRS) to maximize the achievable rate in a joint AIRS-PIRS wireless communication system. Their analysis reveals that the PIRS should be allocated more elements than the AIRS to achieve optimal performance, and they provide simulation results to validate their proposed allocation algorithm. This work offers insights for the design and deployment of hybrid AIRS-PIRS systems to enhance wireless communication performance.
Authors: Chaoying Huang, Wen Chen, Qingqing Wu
Link: https://arxiv.org/abs/2404.06880v2
Date: 2024-04-11
Summary:
Equipping the active intelligent reflecting surface (AIRS) with reflecting elements enhances its signal amplification capability but incurs non-negligible amplification noise, which complicates the allocation of elements for maximizing the achievable rate in a wireless communication system jointly aided by cooperative AIRS and passive IRS (PIRS). To tackle this issue, we consider the downlink communication from a single-antenna transmitter (Tx) to a single-antenna receiver (Rx), which is aided by a pair of AIRS and PIRS under two different deployment orders. Specifically, we aim to determine the number of AIRS/PIRS elements over both transmission orders under a given deployment budget for achievable rate maximization. Our analysis shows that the PIRS should be allocated more elements than the AIRS to achieve the optimized rate, and that linear signal-to-noise ratio (SNR) scaling orders are attained in both schemes. Simulation results are provided to evaluate the proposed algorithm and compare the rate performance of the AIRS-PIRS jointly aided wireless system with various benchmark systems.
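The qualitative trade-off can be illustrated by brute-forcing the split of a fixed element budget under a toy SNR model (the model below is entirely ours, not the paper's derivation): passive elements contribute quadratic beamforming gain, while active elements amplify but inject noise that grows with their count.

```python
def best_split(total, snr):
    """Try every split of the deployment budget between active (AIRS)
    and passive (PIRS) elements; return the active count maximizing
    the supplied SNR model."""
    return max(range(total + 1),
               key=lambda n_active: snr(n_active, total - n_active))

# Toy SNR model (illustrative only): passive gain ~ Np^2, active
# amplification in the numerator, amplification noise in the denominator.
toy_snr = lambda na, np_: (np_ ** 2) * (1 + 4 * na) / (1 + 0.5 * na * na)

n_active = best_split(100, toy_snr)
print(n_active, 100 - n_active)
```

Even this crude model reproduces the paper's qualitative conclusion: the optimum allocates far more elements to the PIRS than to the AIRS.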
--------------------------------------------------------------------------------------------------------
Physics of acceptors in GaN: Koopmans tuned HSE hybrid functional calculations and experiment
The authors employ a Koopmans-compliant version of the Heyd-Scuseria-Ernzerhof (HSE) hybrid functional to provide a more accurate theoretical description of various acceptor defects in gallium nitride (GaN). By eliminating self-interaction errors, their approach achieves significantly improved quantitative agreement with experimental photoluminescence data compared to standard HSE calculations. This work advances the understanding of defect physics in GaN, which is crucial for the development of high-performance optoelectronic and power electronic devices based on this important semiconductor material.
Authors: Denis O. Demchenko, Mykhailo Vorobiov, Oleksandr Andrieiev, Mikhail A. Reshchikov, Benjamin McEwen, Fatemeh Shahedipour-Sandvik
Link: https://arxiv.org/abs/2404.06603v1
Date: 2024-04-09
Summary:
The Heyd-Scuseria-Ernzerhof (HSE) hybrid functional has become a widely used tool for theoretical calculations of point defects in semiconductors. It generally offers a satisfactory qualitative description of defect properties, including the donor/acceptor nature of defects, lowest energy charge states, thermodynamic and optical transition levels, Franck-Condon shifts, photoluminescence (PL) band shapes, and carrier capture cross sections. However, there are noticeable quantitative discrepancies in these properties when compared to experimental results. Some of these discrepancies arise from the presence of self-interaction in various parametrizations of the HSE. Other errors are due to the use of periodic boundary conditions. In this study, we demonstrate that an error-correction scheme based on extrapolation to the dilute limit effectively eliminates the errors due to artificial electrostatic interactions of periodic images and interactions due to defect state delocalization. This yields parametrizations of HSE that satisfy the generalized Koopmans' condition, essentially eliminating self-interaction from defect state orbitals. We apply this HSE Koopmans tuning individually to a range of cation-site acceptors in GaN (Be_Ga, Mg_Ga, Zn_Ga, Ca_Ga, Cd_Ga, and Hg_Ga) and compare the HSE results with experimental data from PL spectra. The Koopmans-compliant HSE calculations show significantly improved quantitative agreement with experiment.
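The generalized Koopmans' condition that the tuning enforces can be stated compactly (our paraphrase of the standard form):

```latex
% Generalized Koopmans' condition: the highest occupied defect level
% eigenvalue must equal the total-energy difference on ionization,
% i.e. E(N) is linear in the fractional occupation n, so the spurious
% self-interaction curvature vanishes.
E(N) - E(N-1) = \varepsilon_{\mathrm{HO}}(N),
\qquad
\frac{\partial^2 E}{\partial n^2} = 0 .
```

Tuning the HSE mixing parameter until this holds for each acceptor's defect orbital is what removes the self-interaction error from the calculated transition levels.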
--------------------------------------------------------------------------------------------------------
Less is More for Improving Automatic Evaluation of Factual Consistency
Assessing the factual consistency of generated text is a crucial task for developing reliable natural language generation models. The authors find that utilizing a smaller, cleaner subset of training data can actually improve the performance of the state-of-the-art AlignScore model for factual consistency evaluation. Their proposed LIM-RA model demonstrates superior performance across various benchmarks, establishing a new state-of-the-art. This work highlights the importance of carefully curating training data and the potential benefits of taking a "less is more" approach in improving automatic evaluation of text generation.
Authors: Tong Wang, Ninad Kulkarni, Yanjun Qi
Link: https://arxiv.org/abs/2404.06579v1
Date: 2024-04-09
Summary:
Assessing the factual consistency of automatically generated texts in relation to source context is crucial for developing reliable natural language generation applications. Recent literature proposes AlignScore, which uses a unified alignment model to evaluate factual consistency and substantially outperforms previous methods across many benchmark tasks. In this paper, we take a closer look at the datasets used in AlignScore and uncover an unexpected finding: utilizing a smaller number of data points can actually improve performance. We process the original AlignScore training dataset to remove noise, augment it with robustness-enhanced samples, and utilize a subset comprising 10% of the data to train an improved factual consistency evaluation model, which we call LIM-RA (Less Is More for Robust AlignScore). LIM-RA demonstrates superior performance, consistently outperforming AlignScore and other strong baselines like ChatGPT across four benchmarks (two utilizing traditional natural language generation datasets and two focused on large language model outputs). Our experiments show that LIM-RA achieves the highest score on 24 of the 33 test datasets, while staying competitive on the rest, establishing a new state of the art.
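A "less is more" curation pass of this kind can be sketched as follows. The abstract does not spell out LIM-RA's concrete filters, so the three below (duplicates, degenerate rows, trivial claim-source copies) are generic stand-ins for the idea of denoising before subsampling:

```python
import random

def curate(records, keep_fraction=0.10, seed=13):
    """Denoise (claim, source, label) triples, then keep a small random
    subset for training. Filters are illustrative, not the paper's."""
    seen, clean = set(), []
    for claim, source, label in records:
        key = (claim.strip().lower(), source.strip().lower())
        if key in seen:
            continue                        # drop exact duplicate pairs
        if not claim.strip() or not source.strip():
            continue                        # drop empty/degenerate rows
        if claim.strip().lower() == source.strip().lower():
            continue                        # trivial copies teach nothing
        seen.add(key)
        clean.append((claim, source, label))
    random.seed(seed)
    k = max(1, int(len(clean) * keep_fraction))
    return random.sample(clean, k)

records = [(f"claim {i}", f"evidence {i}", 1) for i in range(20)]
records += [records[0], ("", "evidence 0", 0)]  # noise: dup + empty row
subset = curate(records)
print(len(subset))  # 10% of the 20 clean rows -> 2
```

The counterintuitive finding is that the evaluator trained on the small clean subset beats one trained on all of the noisier original data.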
--------------------------------------------------------------------------------------------------------
ActNetFormer: Transformer-ResNet Hybrid Method for Semi-Supervised Action Recognition in Videos
This paper presents a novel semi-supervised approach, ActNetFormer, for action recognition in videos. By combining the strengths of 3D convolutional neural networks and video transformers, ActNetFormer can effectively capture both local and global contextual information in video data. The authors introduce a cross-architecture pseudo-labeling and contrastive learning framework to leverage both labeled and unlabeled data, enabling the model to achieve state-of-the-art performance in action recognition tasks with only a fraction of labeled data. This work advances the field of video understanding and has applications in diverse domains such as surveillance, robotics, and sports analytics.
Authors: Sharana Dharshikgan Suresh Dass, Hrishav Bakul Barua, Ganesh Krishnasamy, Raveendran Paramesran, Raphael C. -W. Phan
Link: https://arxiv.org/abs/2404.06243v1
Date: 2024-04-09
Summary:
Human action or activity recognition in videos is a fundamental task in computer vision with applications in surveillance and monitoring, self-driving cars, sports analytics, human-robot interaction and many more. Traditional supervised methods require large annotated datasets for training, which are expensive and time-consuming to acquire. This work proposes a novel approach using Cross-Architecture Pseudo-Labeling with contrastive learning for semi-supervised action recognition. Our framework leverages both labeled and unlabelled data to robustly learn action representations in videos, combining pseudo-labeling with contrastive learning for effective learning from both types of samples. We introduce a novel cross-architecture approach where 3D Convolutional Neural Networks (3D CNNs) and video transformers (VIT) are utilised to capture different aspects of action representations; hence we call it ActNetFormer. The 3D CNNs excel at capturing spatial features and local dependencies in the temporal domain, while VIT excels at capturing long-range dependencies across frames. By integrating these complementary architectures within the ActNetFormer framework, our approach can effectively capture both local and global contextual information of an action. This comprehensive representation learning enables the model to achieve better performance in semi-supervised action recognition tasks by leveraging the strengths of each of these architectures. Experimental results on standard action recognition datasets demonstrate that our approach performs better than the existing methods, achieving state-of-the-art performance with only a fraction of labeled data. The official website of this work is available at: https://github.com/rana2149/ActNetFormer.
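The cross-architecture pseudo-labeling step can be sketched as follows (our reading of the idea, not the paper's exact training loop): each architecture's confident predictions on unlabeled clips become training targets for the other, so the 3D CNN's local bias and the video transformer's global bias supervise one another:

```python
def cross_pseudo_labels(probs_cnn, probs_vit, threshold=0.8):
    """Per unlabeled sample, keep a class as a pseudo-label for the
    *other* model only when this model's confidence clears `threshold`.
    Inputs are per-sample class-probability lists."""
    def confident(dist):
        best = max(range(len(dist)), key=dist.__getitem__)
        return (best, dist[best]) if dist[best] >= threshold else None

    targets_for_vit, targets_for_cnn = {}, {}
    for i, (pc, pv) in enumerate(zip(probs_cnn, probs_vit)):
        c, v = confident(pc), confident(pv)
        if c:  # CNN is confident -> supervise the transformer
            targets_for_vit[i] = c[0]
        if v:  # transformer is confident -> supervise the CNN
            targets_for_cnn[i] = v[0]
    return targets_for_cnn, targets_for_vit

cnn = [[0.9, 0.1], [0.5, 0.5]]
vit = [[0.6, 0.4], [0.15, 0.85]]
print(cross_pseudo_labels(cnn, vit))
```

The contrastive term in the full framework then pulls the two architectures' representations of the same clip together, which this sketch omits.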
--------------------------------------------------------------------------------------------------------
Band-Attention Modulated RetNet for Face Forgery Detection
The authors propose a lightweight network, BAR-Net, for efficient and effective face forgery detection using transformer-based architectures. BAR-Net introduces a novel Band-Attention Modulation mechanism that treats the input spectrogram as a series of frequency bands with learnable weights, allowing the model to effectively capture global context while maintaining spatial priors. Experiments show that BAR-Net outperforms state-of-the-art methods in face forgery detection, making it a promising solution for this important computer vision task with applications in security and content authentication.
Authors: Zhida Zhang, Jie Cao, Wenkui Yang, Qihang Fan, Kai Zhou, Ran He
Link: https://arxiv.org/abs/2404.06022v1
Date: 2024-04-09
Summary:
Transformer networks are extensively utilized in face forgery detection due to their scalability across large datasets. Despite their success, transformers face challenges in balancing the capture of global context, which is crucial for unveiling forgery clues, with computational complexity. To mitigate this issue, we introduce Band-Attention modulated RetNet (BAR-Net), a lightweight network designed to efficiently process extensive visual contexts while avoiding catastrophic forgetting. Our approach empowers the target token to perceive global information by assigning differential attention levels to tokens at varying distances. We implement self-attention along both spatial axes, thereby maintaining spatial priors and easing the computational burden. Moreover, we present the adaptive frequency Band-Attention Modulation mechanism, which treats the entire Discrete Cosine Transform spectrogram as a series of frequency bands with learnable weights. Together, BAR-Net achieves favorable performance on several face forgery datasets, outperforming current state-of-the-art methods.
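The band-attention idea reduces to scaling contiguous regions of an already-computed DCT spectrogram by per-band weights. In the sketch below, fixed numbers stand in for learned weights, and partitioning bands by row index is an illustrative simplification of the paper's frequency-band split:

```python
def band_modulate(spectrogram, band_weights):
    """Scale each frequency band of a 2D DCT spectrogram (list of rows,
    low frequencies first) by its weight. In BAR-Net these weights
    would be learnable parameters."""
    n = len(spectrogram)
    bands = len(band_weights)
    rows_per_band = max(1, n // bands)
    out = []
    for i, row in enumerate(spectrogram):
        w = band_weights[min(i // rows_per_band, bands - 1)]
        out.append([w * x for x in row])
    return out

# Two bands: keep low frequencies, damp high frequencies where many
# forgery artifacts (and also compression noise) concentrate.
spec = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
print(band_modulate(spec, band_weights=[1.0, 0.5]))
```

Making the weights learnable lets the detector discover which frequency bands carry forgery clues rather than fixing that choice by hand.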
--------------------------------------------------------------------------------------------------------
Symbolic regression, which aims to find analytical expressions that best fit the data, can provide interpretable models. However, evaluating the quality of explanations provided by these models remains a challenge. This work introduces a benchmark scheme to assess the performance of various explanation methods for symbolic regression and other regression models. The authors' findings provide insights into the strengths and limitations of different explanatory approaches, offering guidance for practitioners seeking to develop interpretable machine learning solutions, particularly in domains such as scientific discovery and decision support.
Authors: Guilherme Seidyo Imai Aldeia, Fabricio Olivetti de Franca
Link: https://arxiv.org/abs/2404.05908v1
Date: 2024-04-08
Summary:
In some situations, the interpretability of machine learning models plays a role as important as model accuracy. Interpretability comes from the need to trust the prediction model, verify some of its properties, or even enforce them to improve fairness. Many model-agnostic explanatory methods exist to provide explanations for black-box models. In the regression task, the practitioner can use white-box or gray-box models to achieve more interpretable results, which is the case of symbolic regression. When using an explanatory method, and since interpretability lacks a rigorous definition, there is a need to evaluate and compare the quality of different explainers. This paper proposes a benchmark scheme to evaluate explanatory methods for regression models, mainly symbolic regression models. Experiments were performed using 100 physics equations with different interpretable and non-interpretable regression methods and popular explanation methods, evaluating the performance of the explainers with several explanation measures. In addition, we further analyzed four benchmarks from the GP community. The results have shown that symbolic regression models can be an interesting alternative to white-box and black-box models, capable of returning accurate models with appropriate explanations. Regarding the explainers, we observed that Partial Effects and SHAP were the most robust explanation methods, with Integrated Gradients being unstable only with tree-based models. This benchmark is publicly available for further experiments.
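To make the Partial Effects explainer concrete: for a regression model it averages the local partial derivative of each input over the data. The numeric sketch below is a simplified version of that idea (real implementations on symbolic models can differentiate the expression analytically):

```python
def partial_effects(f, samples, eps=1e-5):
    """Average central-difference partial derivative of f per feature,
    a simple numeric sketch of a Partial Effects-style explanation."""
    n_features = len(samples[0])
    effects = []
    for j in range(n_features):
        grads = []
        for x in samples:
            hi, lo = list(x), list(x)
            hi[j] += eps
            lo[j] -= eps
            grads.append((f(hi) - f(lo)) / (2 * eps))
        effects.append(sum(grads) / len(grads))
    return effects

# For f(x) = x0^2 + 3*x1, the average effect of x0 is 2*mean(x0)
# and the effect of x1 is the constant slope 3.
f = lambda x: x[0] ** 2 + 3 * x[1]
print(partial_effects(f, [[0.0, 1.0], [1.0, 2.0], [2.0, 0.5]]))
```

Benchmarking then amounts to checking such explanations against the known analytic derivatives of the 100 physics equations, which is what makes that corpus a good ground truth.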
--------------------------------------------------------------------------------------------------------
Teaching Higher-Order Logic Using Isabelle
The authors present a formalization of higher-order logic in the Isabelle proof assistant, designed to be as small and readable as possible. This development serves as an accessible introduction to higher-order logic and proof assistants, targeting an audience interested in learning these foundational concepts without the complexity of the full Isabelle/HOL framework. The work showcases a sample proof and discusses the authors' experience in teaching higher-order logic using this formalization, making it a valuable resource for educators and students in the field of formal methods and logic.
Authors: Simon Tobias Lund, Jørgen Villadsen
Link: https://arxiv.org/abs/2404.05458v1
Date: 2024-04-08
Summary:
We present a formalization of higher-order logic in the Isabelle proof assistant, building directly on the foundational framework Isabelle/Pure and developed to be as small and readable as possible. It should therefore serve as a good introduction for someone looking to learn about higher-order logic and proof assistants, without having to study the much more complex Isabelle/HOL with its heavier automation. To showcase our development and approach, we explain a sample proof, describe the axioms and rules of our higher-order logic, and discuss our experience with teaching the subject in a classroom setting.
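The distinctive flavor of higher-order logic, quantification over predicates themselves, which first-order logic cannot express, can be sketched in a few lines of Lean (an analogy, not the authors' Isabelle development):

```lean
-- Higher-order: the quantifier ranges over the predicate P itself.
-- Substituting equals under an arbitrary predicate is a one-line proof.
example : ∀ (P : Prop → Prop) (a b : Prop), a = b → P a → P b :=
  fun _P _a _b hab hpa => hab ▸ hpa
```

The authors' formalization plays the same pedagogical role inside Isabelle/Pure: small enough to read in full, yet expressive enough to state and prove such higher-order facts.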
--------------------------------------------------------------------------------------------------------
Large language models (LLMs) have shown remarkable capabilities, but generating accurate step-by-step reasoning remains a challenge. This work addresses this gap by introducing two key contributions: (1) AutoRace, a method for automatically evaluating the reasoning chains generated by LLMs, and (2) LLM Reasoners, a library for standardized implementation of diverse reasoning algorithms. With these tools, the authors conduct a comprehensive analysis of different reasoning approaches, revealing insights about the factors that contribute to effective reasoning in LLMs. This work advances the state of the art in interpretable and robust reasoning with large language models.
Authors: Shibo Hao, Yi Gu, Haotian Luo, Tianyang Liu, Xiyan Shao, Xinyuan Wang, Shuhua Xie, Haodi Ma, Adithya Samavedhi, Qiyue Gao, Zhen Wang, Zhiting Hu
Link: https://arxiv.org/abs/2404.05221v1
Date: 2024-04-08
Summary:
Generating accurate step-by-step reasoning is essential for Large Language Models (LLMs) to address complex problems and enhance robustness and interpretability. Despite the flux of research on developing advanced reasoning approaches, systematically analyzing the diverse LLMs and reasoning strategies in generating reasoning chains remains a significant challenge. The difficulties stem from the lack of two key elements: (1) an automatic method for evaluating the generated reasoning chains on different tasks, and (2) a unified formalism and implementation of the diverse reasoning approaches for systematic comparison. This paper aims to close the gap: (1) We introduce AutoRace for fully automated reasoning chain evaluation. Existing metrics rely on expensive human annotations or pre-defined LLM prompts not adaptable to different tasks. In contrast, AutoRace automatically creates detailed evaluation criteria tailored for each task, and uses GPT-4 for accurate evaluation following the criteria. (2) We develop LLM Reasoners, a library for standardized modular implementation of existing and new reasoning algorithms, under a unified formulation of the search, reward, and world model components. With the new evaluation and library, (3) we conduct extensive study of different reasoning approaches (e.g., CoT, ToT, RAP). The analysis reveals interesting findings about different factors contributing to reasoning, including the reward-guidance, breadth-vs-depth in search, world model, and prompt formats, etc.
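The search/reward/world-model decomposition that the library is described as standardizing can be sketched generically (all function names here are ours): `propose` yields candidate reasoning steps, `world_model` applies a step to a state, `reward_fn` scores it, and a best-first search decides what to expand. CoT, ToT, and RAP then differ mainly in how these pieces are instantiated:

```python
import heapq

def reason_search(init_state, world_model, reward_fn, propose, is_terminal,
                  beam=8, max_steps=50):
    """Best-first search over reasoning chains under a unified
    (search, reward, world model) formulation. Returns the action
    trace reaching a terminal state and its cumulative reward."""
    frontier = [(-0.0, 0, init_state, [])]  # (-cum_reward, tiebreak, state, trace)
    tick = 0
    while frontier and max_steps > 0:
        neg_r, _, state, trace = heapq.heappop(frontier)
        if is_terminal(state):
            return trace, -neg_r
        for action in propose(state):
            nxt = world_model(state, action)
            r = -neg_r + reward_fn(state, action, nxt)
            tick += 1
            heapq.heappush(frontier, (-r, tick, nxt, trace + [action]))
        frontier = heapq.nsmallest(beam, frontier)  # keep the best `beam`
        heapq.heapify(frontier)
        max_steps -= 1
    return None, 0.0

# Toy instantiation: "reason" from 1 toward the target 10 via +1 / *2
# steps, with a reward that guides search toward the target.
trace, score = reason_search(
    init_state=1,
    world_model=lambda s, a: a(s),
    reward_fn=lambda s, a, n: -abs(n - 10),
    propose=lambda s: [lambda x: x + 1, lambda x: x * 2],
    is_terminal=lambda s: s == 10,
)
print(len(trace), score)
```

Swapping `beam=1` approximates greedy chain-of-thought, while a wide beam with a learned reward approximates tree-search variants, illustrating how one harness compares many reasoning algorithms.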
--------------------------------------------------------------------------------------------------------
WildGraph: Realistic Graph-based Trajectory Generation for Wildlife
Generating realistic trajectories of wildlife movement is crucial for various applications, but collecting real-world data is often challenging due to ethical and technical constraints. The authors propose WildGraph, a hierarchical approach that learns the global movement characteristics from a small set of real samples and recursively refines the trajectory generation in localized regions. Experiments on wildlife migration datasets demonstrate that WildGraph outperforms existing methods in generating realistic long-horizon trajectories, making it a valuable tool for wildlife movement studies and conservation efforts.
Authors: Ali Al-Lawati, Elsayed Eshra, Prasenjit Mitra
Link: https://arxiv.org/abs/2404.08068v1
Date: 2024-04-11
Summary:
Trajectory generation is an important task in movement studies; it circumvents the privacy, ethical, and technical challenges of collecting real trajectories from the target population. In particular, real trajectories in the wildlife domain are scarce as a result of ethical and environmental constraints of the collection process. In this paper, we consider the problem of generating long-horizon trajectories, akin to wildlife migration, based on a small set of real samples. We propose a hierarchical approach to learn the global movement characteristics of the real dataset and recursively refine localized regions. Our solution, WildGraph, discretizes the geographic path into a prototype network of H3 (https://www.uber.com/blog/h3/) regions and leverages a recurrent variational auto-encoder to probabilistically generate paths over the regions, based on occupancy. WildGraph successfully generates realistic months-long trajectories using a sample size as small as 60. Experiments performed on two wildlife migration datasets demonstrate that our proposed method improves the generalization of the generated trajectories in comparison to existing work while achieving superior or comparable performance in several benchmark metrics. Our code is published at: https://github.com/aliwister/wildgraph.
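The first stage, discretizing GPS paths into a prototype network of regions with occupancy and transition statistics, can be sketched as below. A plain latitude/longitude grid stands in here for Uber's H3 hexagons; in the real system a recurrent VAE would then generate paths over this graph:

```python
from collections import defaultdict

def build_region_graph(trajectories, cell_deg=1.0):
    """Discretize (lat, lng) tracks into grid cells and record cell
    occupancy counts plus observed cell-to-cell transitions."""
    def cell(lat, lng):
        return (int(lat // cell_deg), int(lng // cell_deg))

    occupancy = defaultdict(int)
    edges = defaultdict(int)
    for traj in trajectories:
        cells = [cell(lat, lng) for lat, lng in traj]
        for c in cells:
            occupancy[c] += 1
        for a, b in zip(cells, cells[1:]):
            if a != b:  # only record movement between regions
                edges[(a, b)] += 1
    return dict(occupancy), dict(edges)

tracks = [[(10.2, 20.1), (10.7, 20.3), (11.4, 20.9)],
          [(10.1, 20.2), (11.2, 20.8)]]
occ, edges = build_region_graph(tracks)
print(occ)
print(edges)
```

Discretization is what lets a tiny sample (60 tracks) generalize: many distinct GPS paths collapse onto the same region sequence, so the generative model learns over a much smaller, denser space.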
--------------------------------------------------------------------------------------------------------
Use of a Structured Knowledge Base Enhances Metadata Curation by Large Language Models
Accurate metadata are essential for dataset findability, accessibility, interoperability, and reusability. This work investigates the potential of large language models, specifically GPT-4, to improve metadata adherence to standards. The authors show that while LLMs alone may not be able to correct legacy metadata, integrating them with a structured knowledge base can significantly enhance their performance in automated metadata curation. This finding has important implications for improving the quality and usability of scientific data through the effective deployment of large language models.
Authors: Sowmya S. Sundaram, Benjamin Solomon, Avani Khatri, Anisha Laumas, Purvesh Khatri, Mark A. Musen
Link: https://arxiv.org/abs/2404.05893v1
Date: 2024-04-08
Summary:
Metadata play a crucial role in ensuring the findability, accessibility, interoperability, and reusability of datasets. This paper investigates the potential of large language models (LLMs), specifically GPT-4, to improve adherence to metadata standards. We conducted experiments on 200 random data records describing human samples relating to lung cancer from the NCBI BioSample repository, evaluating GPT-4's ability to suggest edits for adherence to metadata standards. We computed the adherence accuracy of field name-field value pairs through a peer review process, and we observed a marginal average improvement in adherence to the standard data dictionary from 79% to 80% (p<0.01). We then prompted GPT-4 with domain information in the form of the textual descriptions of CEDAR templates and recorded a significant improvement to 97% from 79% (p<0.01). These results indicate that, while LLMs may not be able to correct legacy metadata to ensure satisfactory adherence to standards when unaided, they do show promise for use in automated metadata curation when integrated with a structured knowledge base.
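The study's two ingredients, prompting the LLM with structured template descriptions and scoring adherence over field name-value pairs, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the template and record fields are hypothetical, the real work used CEDAR template descriptions and NCBI BioSample records, and adherence was judged by peer review rather than a programmatic predicate.

```python
def build_curation_prompt(record, template):
    """Assemble an LLM prompt that pairs a legacy metadata record with the
    field descriptions of a structured template (CEDAR-style, as dicts here)."""
    lines = ["Suggest edits so each field value adheres to the template.", "", "Template fields:"]
    for field, desc in template.items():
        lines.append(f"- {field}: {desc}")
    lines += ["", "Record:"]
    for field, value in record.items():
        lines.append(f"- {field}: {value}")
    return "\n".join(lines)

def adherence_accuracy(records, conforms):
    """Fraction of field name-value pairs judged adherent by `conforms(field, value)`."""
    pairs = [(f, v) for rec in records for f, v in rec.items()]
    return sum(conforms(f, v) for f, v in pairs) / len(pairs)
```

The reported jump from 79% to 97% adherence came entirely from enriching the prompt with the template's field descriptions, i.e. from the first function's "Template fields" section, not from any change to the model itself.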
--------------------------------------------------------------------------------------------------------
This paper provides a primer on non-discrimination law in Europe, addressing key characteristics and differences compared to other regions like the US. The target audience is computer scientists and AI practitioners who need to understand the legal framework around non-discrimination, which is crucial for developing fair and ethical AI systems. The paper explains EU-wide non-discrimination rules, including direct and indirect discrimination, and highlights relevant regulations like the GDPR and EU AI Act. By clarifying the European legal landscape, this work aims to help researchers and developers navigate the challenges of building non-discriminatory AI applications.
Authors: Frederik Zuiderveen Borgesius, Nina Baranowska, Philipp Hacker, Alessandro Fabris
Link: https://arxiv.org/abs/2404.08519v1
Date: 2024-04-12
Summary:
This brief paper provides an introduction to non-discrimination law in Europe. It answers the questions: What are the key characteristics of non-discrimination law in Europe, and how do the different statutes relate to one another? Our main target group is computer scientists and users of artificial intelligence (AI) interested in an introduction to non-discrimination law in Europe. Notably, non-discrimination law in Europe differs significantly from non-discrimination law in other countries, such as the US. We aim to describe the law in such a way that non-lawyers and non-European lawyers can easily grasp its contents and challenges. The paper shows that the human right to non-discrimination, to some extent, protects individuals against private actors, such as companies. We introduce the EU-wide non-discrimination rules which are included in a number of EU directives, and also explain the difference between direct and indirect discrimination. Significantly, an organization can be fined for indirect discrimination even if the company, or its AI system, discriminated by accident. The last section broadens the horizon to include bias-relevant law and cases from the GDPR, the EU AI Act, and related statutes. Finally, we give reading tips for those inclined to learn more about non-discrimination law in Europe.
--------------------------------------------------------------------------------------------------------
Reframing the Mind-Body Picture: Applying Formal Systems to the Relationship of Mind and Matter
This paper explores the relationship between mind and matter using a formal framework drawn from set theory and category theory. The author aims to provide a simple yet informative lens through which to view theories of the mind-body connection. While the abstract does not detail the specific technical approach, the work suggests a novel perspective on this longstanding philosophical and scientific question, with potential implications for fields ranging from cognitive science to metaphysics.
Authors: Ryan Williams
Link: https://arxiv.org/abs/2404.07719v1
Date: 2024-04-11
Summary:
This paper aims to show that a simple framework, utilizing basic formalisms from set theory and category theory, can clarify and inform our theories of the relation between mind and matter.
--------------------------------------------------------------------------------------------------------
Test-Time Adaptation with SaLIP: A Cascade of SAM and CLIP for Zero-shot Medical Image Segmentation
Medical image segmentation is a critical task for clinical applications, but it can be challenging when only limited labeled data are available. This work presents SaLIP, a zero-shot segmentation framework that combines the strengths of two powerful vision foundation models, the Segment Anything Model (SAM) and CLIP. By leveraging SAM's segmentation capabilities and CLIP's recognition skills, SaLIP can perform organ segmentation without extensive task-specific training or prompt engineering. The authors demonstrate significant Dice-score gains over unprompted SAM on diverse medical imaging tasks, making SaLIP a promising approach for practical zero-shot segmentation in healthcare.
Authors: Sidra Aleem, Fangyijie Wang, Mayug Maniparambil, Eric Arazo, Julia Dietlmeier, Kathleen Curran, Noel E. O'Connor, Suzanne Little
Link: https://arxiv.org/abs/2404.06362v1
Date: 2024-04-09
Summary:
The Segment Anything Model (SAM) and CLIP are remarkable vision foundation models (VFMs). SAM, a prompt-driven segmentation model, excels in segmentation tasks across diverse domains, while CLIP is renowned for its zero-shot recognition capabilities. However, their unified potential has not yet been explored in medical image segmentation. To adapt SAM to medical imaging, existing methods primarily rely on tuning strategies that require extensive data or prior prompts tailored to the specific task, making it particularly challenging when only a limited number of data samples are available. This work presents an in-depth exploration of integrating SAM and CLIP into a unified framework for medical image segmentation. Specifically, we propose a simple unified framework, SaLIP, for organ segmentation. Initially, SAM is used for part-based segmentation within the image, followed by CLIP to retrieve the mask corresponding to the region of interest (ROI) from the pool of SAM-generated masks. Finally, SAM is prompted by the retrieved ROI to segment a specific organ. Thus, SaLIP is training- and fine-tuning-free and does not rely on domain expertise or labeled data for prompt engineering. Our method shows substantial enhancements in zero-shot segmentation, showcasing notable improvements in DICE scores across diverse segmentation tasks like brain (63.46%), lung (50.11%), and fetal head (30.82%), when compared to unprompted SAM. Code and text prompts will be available online.
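The three-stage cascade (SAM in "everything" mode, CLIP retrieval of the organ mask, SAM re-prompted with that ROI) can be sketched as a small control-flow skeleton. The callables below are toy stand-ins, not the real SAM/CLIP APIs, and the mask and prompt formats are hypothetical; the sketch only demonstrates the cascade logic.

```python
def salip_segment(image, organ_prompt, sam_masks, clip_score, sam_refine):
    """SaLIP-style cascade, with the two foundation models passed in as callables."""
    candidates = sam_masks(image)  # 1) SAM "everything" mode: part-based candidate masks
    roi = max(candidates, key=lambda m: clip_score(image, m, organ_prompt))  # 2) CLIP retrieval
    return sam_refine(image, roi)  # 3) SAM prompted with the retrieved ROI

# Toy stand-ins so the cascade can be exercised without the real models.
toy_masks = lambda img: [{"label": "background"}, {"label": "lung"}, {"label": "rib"}]
toy_clip = lambda img, mask, text: 1.0 if text in mask["label"] else 0.0
toy_refine = lambda img, roi: {"organ": roi["label"], "refined": True}

result = salip_segment(None, "lung", toy_masks, toy_clip, toy_refine)
```

The key design point survives even in this sketch: no stage is trained or fine-tuned, so the only task-specific input is the text prompt handed to the CLIP-scoring step.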
--------------------------------------------------------------------------------------------------------
Diffusing in Someone Else's Shoes: Robotic Perspective Taking with Diffusion
Enabling robots to learn from human demonstrations is an important capability, but the mental translation from a third-person to a first-person perspective is challenging. This work introduces a novel diffusion model that directly learns this perspective translation, allowing robots to benefit from easily produced third-person demonstrations while still imitating from the first-person perspective. By bridging this gap, the proposed approach can enhance robot learning from human teachers, with potential applications in areas like robot teleoperation, imitation learning, and human-robot interaction.
Authors: Josua Spisak, Matthias Kerzel, Stefan Wermter
Link: https://arxiv.org/abs/2404.07735v1
Date: 2024-04-11
Summary:
Humanoid robots can benefit from their similarity to the human shape by learning from humans. When humans teach other humans how to perform actions, they often demonstrate the actions and the learning human can try to imitate the demonstration. Being able to mentally transfer from a demonstration seen from a third-person perspective to how it should look from a first-person perspective is fundamental for this ability in humans. As this is a challenging task, it is often simplified for robots by creating a demonstration in the first-person perspective. Creating these demonstrations requires more effort but allows for an easier imitation. We introduce a novel diffusion model aimed at enabling the robot to directly learn from the third-person demonstrations. Our model is capable of learning and generating the first-person perspective from the third-person perspective by translating the size and rotations of objects and the environment between two perspectives. This allows us to utilise the benefits of easy-to-produce third-person demonstrations and easy-to-imitate first-person demonstrations. The model can either represent the first-person perspective in an RGB image or calculate the joint values. Our approach significantly outperforms other image-to-image models in this task.
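What the diffusion model learns from data, re-expressing positions, sizes, and rotations between the two viewpoints, is in the simplest case a rigid-frame change of coordinates. The sketch below shows that geometric baseline in 2D, assuming the robot's pose is known explicitly; the paper's contribution is precisely that the model learns this mapping (and the full image-to-image or joint-value translation) without being given the transform.

```python
import math

def third_to_first_person(point, robot_pose):
    """Re-express a 2D point from the third-person (world) frame in the robot's
    first-person frame: translate to the robot's origin, then undo its heading."""
    x, y = point
    rx, ry, rtheta = robot_pose  # robot position and heading in the world frame
    dx, dy = x - rx, y - ry
    c, s = math.cos(-rtheta), math.sin(-rtheta)
    return (c * dx - s * dy, s * dx + c * dy)
```

For example, an object one meter along the heading of a robot standing at (1, 0) facing +y lands at (1, 0) in the robot's frame, i.e. one meter "forward". A learned model must recover this kind of relation, plus appearance, from paired demonstrations alone.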
--------------------------------------------------------------------------------------------------------