Week Ending 10.20.2024

RESEARCH WATCH: 10.20.2024

This paper explores the connection between Occam's razor - the principle that simpler explanations tend to be better - and in-context learning in large language models. The authors show that the next-token prediction loss used to train in-context learners is equivalent to a data compression technique called prequential coding. This insight provides a theoretical foundation for understanding in-context learning and suggests ways to improve current methods. By linking in-context learning to data compression and model simplicity, this work could lead to more efficient and effective language models, with potential applications in natural language processing, machine translation, and AI-assisted writing.

Authors: Eric Elmoznino, Tom Marty, Tejas Kasetty, Leo Gagnon, Sarthak Mittal, Mahan Fathi, Dhanya Sridhar, Guillaume Lajoie

Link: https://arxiv.org/abs/2410.14086v1

Date: 2024-10-17

Summary:

The goal of machine learning is generalization. While the No Free Lunch Theorem states that we cannot obtain theoretical guarantees for generalization without further assumptions, in practice we observe that simple models which explain the training data generalize best: a principle called Occam's razor. Despite the need for simple models, most current approaches in machine learning only minimize the training error, and at best indirectly promote simplicity through regularization or architecture design. Here, we draw a connection between Occam's razor and in-context learning: an emergent ability of certain sequence models like Transformers to learn at inference time from past observations in a sequence. In particular, we show that the next-token prediction loss used to train in-context learners is directly equivalent to a data compression technique called prequential coding, and that minimizing this loss amounts to jointly minimizing both the training error and the complexity of the model that was implicitly learned from context. Our theory and the empirical experiments we use to support it not only provide a normative account of in-context learning, but also elucidate the shortcomings of current in-context learning methods, suggesting ways in which they can be improved. We make our code available at https://github.com/3rdCore/PrequentialCode.

--------------------------------------------------------------------------------------------------------

FaceSaliencyAug: Mitigating Geographic, Gender and Stereotypical Biases via Saliency-Based Data Augmentation

This study addresses the critical issue of bias in computer vision models, particularly focusing on geographical, gender, and stereotypical biases. The researchers propose a novel approach called FaceSaliencyAug, which uses saliency-based data augmentation to mitigate these biases in Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). By applying masks to salient regions of face images and then restoring the original image, this method enhances data diversity and improves model performance. The approach shows promise in reducing gender bias across various occupational datasets. This research could lead to fairer and more inclusive computer vision systems, with applications in facial recognition, hiring processes, and social media content moderation.

Authors: Teerath Kumar, Alessandra Mileo, Malika Bendechache

Link: https://arxiv.org/abs/2410.14070v1

Date: 2024-10-17

Summary:

Geographical, gender and stereotypical biases in computer vision models pose significant challenges to their performance and fairness. {In this study, we present an approach named FaceSaliencyAug aimed at addressing the gender bias in} {Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). Leveraging the salient regions} { of faces detected by saliency, the propose approach mitigates geographical and stereotypical biases } {in the datasets. FaceSaliencyAug} randomly selects masks from a predefined search space and applies them to the salient region of face images, subsequently restoring the original image with masked salient region. {The proposed} augmentation strategy enhances data diversity, thereby improving model performance and debiasing effects. We quantify dataset diversity using Image Similarity Score (ISS) across five datasets, including Flickr Faces HQ (FFHQ), WIKI, IMDB, Labelled Faces in the Wild (LFW), UTK Faces, and Diverse Dataset. The proposed approach demonstrates superior diversity metrics, as evaluated by ISS-intra and ISS-inter algorithms. Furthermore, we evaluate the effectiveness of our approach in mitigating gender bias on CEO, Engineer, Nurse, and School Teacher datasets. We use the Image-Image Association Score (IIAS) to measure gender bias in these occupations. Our experiments reveal a reduction in gender bias for both CNNs and ViTs, indicating the efficacy of our method in promoting fairness and inclusivity in computer vision models.

--------------------------------------------------------------------------------------------------------

Artificial Kuramoto Oscillatory Neurons

This paper introduces Artificial Kuramoto Oscillatory Neurons (AKOrN) as a dynamic alternative to traditional threshold units in neural networks. Building on ideas from neuroscience and AI about the importance of binding between neurons and dynamic representations, AKOrN uses synchronization dynamics to bind neurons together. The authors demonstrate performance improvements across various tasks, including unsupervised object discovery, adversarial robustness, calibrated uncertainty quantification, and reasoning. This approach could revolutionize neural network design, potentially leading to more robust and versatile AI systems with applications in computer vision, natural language processing, and decision-making algorithms.

Authors: Takeru Miyato, Sindy Löwe, Andreas Geiger, Max Welling

Link: https://arxiv.org/abs/2410.13821v1

Date: 2024-10-17

Summary:

It has long been known in both neuroscience and AI that ``binding'' between neurons leads to a form of competitive learning where representations are compressed in order to represent more abstract concepts in deeper layers of the network. More recently, it was also hypothesized that dynamic (spatiotemporal) representations play an important role in both neuroscience and AI. Building on these ideas, we introduce Artificial Kuramoto Oscillatory Neurons (AKOrN) as a dynamical alternative to threshold units, which can be combined with arbitrary connectivity designs such as fully connected, convolutional, or attentive mechanisms. Our generalized Kuramoto updates bind neurons together through their synchronization dynamics. We show that this idea provides performance improvements across a wide spectrum of tasks such as unsupervised object discovery, adversarial robustness, calibrated uncertainty quantification, and reasoning. We believe that these empirical results show the importance of rethinking our assumptions at the most basic neuronal level of neural representation, and in particular show the importance of dynamical representations.

--------------------------------------------------------------------------------------------------------

RGB to Hyperspectral: Spectral Reconstruction for Enhanced Surgical Imaging

This study investigates the reconstruction of hyperspectral signatures from RGB data to enhance surgical imaging. Using datasets from porcine surgery and neurosurgery, the researchers evaluate various architectures based on convolutional neural networks (CNNs) and transformer models. The findings show that transformer models perform better at predicting accurate spectral profiles by effectively integrating spatial information. This research opens up new possibilities for real-time surgical imaging, potentially improving surgical decision-making and outcomes. Applications could include more precise tumor detection, tissue differentiation, and intraoperative guidance systems.

Authors: Tobias Czempiel, Alfie Roddan, Maria Leiloglou, Zepeng Hu, Kevin O'Neill, Giulio Anichini, Danail Stoyanov, Daniel Elson

Link: https://arxiv.org/abs/2410.13570v1

Date: 2024-10-17

Summary:

This study investigates the reconstruction of hyperspectral signatures from RGB data to enhance surgical imaging, utilizing the publicly available HeiPorSPECTRAL dataset from porcine surgery and an in-house neurosurgery dataset. Various architectures based on convolutional neural networks (CNNs) and transformer models are evaluated using comprehensive metrics. Transformer models exhibit superior performance in terms of RMSE, SAM, PSNR and SSIM by effectively integrating spatial information to predict accurate spectral profiles, encompassing both visible and extended spectral ranges. Qualitative assessments demonstrate the capability to predict spectral profiles critical for informed surgical decision-making during procedures. Challenges associated with capturing both the visible and extended hyperspectral ranges are highlighted using the MAE, emphasizing the complexities involved. The findings open up the new research direction of hyperspectral reconstruction for surgical applications and clinical use cases in real-time surgical environments.

--------------------------------------------------------------------------------------------------------

Unlocking Legal Knowledge: A Multilingual Dataset for Judicial Summarization in Switzerland

This paper introduces the Swiss Leading Decision Summarization (SLDS) dataset, featuring 18,000 court rulings from the Swiss Federal Supreme Court in German, French, and Italian, along with German headnotes. The researchers fine-tune and evaluate various language models for multilingual legal summarization. This resource could significantly enhance legal research efficiency by automating headnote creation for hundreds of thousands of decisions. Potential applications include improved legal information retrieval systems, cross-lingual legal research tools, and AI-assisted legal analysis for lawyers and researchers working with Swiss law or multilingual legal contexts.

Authors: Luca Rolshoven, Vishvaksenan Rasiah, Srinanda Brügger Bose, Matthias Stürmer, Joel Niklaus

Link: https://arxiv.org/abs/2410.13456v1

Date: 2024-10-17

Summary:

Legal research is a time-consuming task that most lawyers face on a daily basis. A large part of legal research entails looking up relevant caselaw and bringing it in relation to the case at hand. Lawyers heavily rely on summaries (also called headnotes) to find the right cases quickly. However, not all decisions are annotated with headnotes and writing them is time-consuming. Automated headnote creation has the potential to make hundreds of thousands of decisions more accessible for legal research in Switzerland alone. To kickstart this, we introduce the Swiss Leading Decision Summarization ( SLDS) dataset, a novel cross-lingual resource featuring 18K court rulings from the Swiss Federal Supreme Court (SFSC), in German, French, and Italian, along with German headnotes. We fine-tune and evaluate three mT5 variants, along with proprietary models. Our analysis highlights that while proprietary models perform well in zero-shot and one-shot settings, fine-tuned smaller models still provide a strong competitive edge. We publicly release the dataset to facilitate further research in multilingual legal summarization and the development of assistive technologies for legal professionals

--------------------------------------------------------------------------------------------------------

LLM Agent Honeypot: Monitoring AI Hacking Agents in the Wild

This study introduces an LLM Honeypot system for monitoring autonomous AI hacking agents. The researchers deployed a customized SSH honeypot and used prompt injections with temporal analysis to identify LLM-based agents among attackers. Over a few weeks, they collected 800,000 hacking attempts and identified 6 potential AI agents. This research aims to improve awareness of AI hacking agents and enhance preparedness for their risks. Potential applications include developing more robust cybersecurity measures, creating AI-resistant systems, and informing policy decisions regarding the use and regulation of AI in cybersecurity contexts.

Authors: Reworr, Dmitrii Volkov

Link: https://arxiv.org/abs/2410.13919v1

Date: 2024-10-17

Summary:

We introduce the LLM Honeypot, a system for monitoring autonomous AI hacking agents. We deployed a customized SSH honeypot and applied prompt injections with temporal analysis to identify LLM-based agents among attackers. Over a trial run of a few weeks in a public environment, we collected 800,000 hacking attempts and 6 potential AI agents, which we plan to analyze in depth in future work. Our objectives aim to improve awareness of AI hacking agents and enhance preparedness for their risks.

--------------------------------------------------------------------------------------------------------

Trust but Verify: Programmatic VLM Evaluation in the Wild

This paper introduces Programmatic VLM Evaluation (PROVE), a new benchmarking paradigm for evaluating Vision-Language Model (VLM) responses to open-ended queries. The researchers use a large language model to generate diverse question-answer pairs and verification programs based on scene-graph representations of images. This approach allows for both helpfulness and truthfulness evaluation of VLM responses. PROVE could significantly improve the assessment and development of more reliable and accurate VLMs, with potential applications in image captioning, visual question answering, and AI-assisted image analysis across various industries.

Authors: Viraj Prabhu, Senthil Purushwalkam, An Yan, Caiming Xiong, Ran Xu

Link: https://arxiv.org/abs/2410.13121v1

Date: 2024-10-17

Summary:

Vision-Language Models (VLMs) often generate plausible but incorrect responses to visual queries. However, reliably quantifying the effect of such hallucinations in free-form responses to open-ended queries is challenging as it requires visually verifying each claim within the response. We propose Programmatic VLM Evaluation (PROVE), a new benchmarking paradigm for evaluating VLM responses to open-ended queries. To construct PROVE, we provide a large language model (LLM) with a high-fidelity scene-graph representation constructed from a hyper-detailed image caption, and prompt it to generate diverse question-answer (QA) pairs, as well as programs that can be executed over the scene graph object to verify each QA pair. We thus construct a benchmark of 10.5k challenging but visually grounded QA pairs. Next, to evaluate free-form model responses to queries in PROVE, we propose a programmatic evaluation strategy that measures both the helpfulness and truthfulness of a response within a unified scene graph-based framework. We benchmark the helpfulness-truthfulness trade-offs of a range of VLMs on PROVE, finding that very few are in-fact able to achieve a good balance between the two. Project page: \url{https://prove-explorer.netlify.app/}.

--------------------------------------------------------------------------------------------------------

Reinforcement Learning with Euclidean Data Augmentation for State-Based Continuous Control

This paper explores a new data augmentation strategy for reinforcement learning in continuous control tasks. The researchers propose using Euclidean transformations like rotations on state features based on limb configurations, rather than joint configurations. This approach significantly improves both data efficiency and asymptotic performance of reinforcement learning across various continuous control tasks. Potential applications include more efficient training of robotic systems, improved performance in simulated environments, and better generalization of learned control policies to real-world scenarios in fields such as robotics and autonomous systems.

Authors: Jinzhu Luo, Dingyang Chen, Qi Zhang

Link: https://arxiv.org/abs/2410.12983v1

Date: 2024-10-16

Summary:

Data augmentation creates new data points by transforming the original ones for a reinforcement learning (RL) agent to learn from, which has been shown to be effective for the objective of improving the data efficiency of RL for continuous control. Prior work towards this objective has been largely restricted to perturbation-based data augmentation where new data points are created by perturbing the original ones, which has been impressively effective for tasks where the RL agent observes control states as images with perturbations including random cropping, shifting, etc. This work focuses on state-based control, where the RL agent can directly observe raw kinematic and task features, and considers an alternative data augmentation applied to these features based on Euclidean symmetries under transformations like rotations. We show that the default state features used in exiting benchmark tasks that are based on joint configurations are not amenable to Euclidean transformations. We therefore advocate using state features based on configurations of the limbs (i.e., the rigid bodies connected by the joints) that instead provide rich augmented data under Euclidean transformations. With minimal hyperparameter tuning, we show this new Euclidean data augmentation strategy significantly improves both data efficiency and asymptotic performance of RL on a wide range of continuous control tasks.

--------------------------------------------------------------------------------------------------------

A Prompt-Based Knowledge Graph Foundation Model for Universal In-Context Reasoning

This paper introduces KG-ICL, a prompt-based Knowledge Graph (KG) foundation model that aims to achieve universal reasoning ability across diverse KGs. The model uses a prompt graph centered on a query-related example fact as context and employs a unified tokenizer and message passing neural networks for prompt encoding and KG reasoning. KG-ICL outperforms baselines on most datasets, demonstrating strong generalization and universal reasoning capabilities. This approach could revolutionize knowledge-driven tasks across various domains, with potential applications in question-answering systems, recommendation engines, and intelligent decision support systems that require reasoning over large-scale knowledge graphs.

Authors: Yuanning Cui, Zequn Sun, Wei Hu

Link: https://arxiv.org/abs/2410.12288v1

Date: 2024-10-16

Summary:

Extensive knowledge graphs (KGs) have been constructed to facilitate knowledge-driven tasks across various scenarios. However, existing work usually develops separate reasoning models for different KGs, lacking the ability to generalize and transfer knowledge across diverse KGs and reasoning settings. In this paper, we propose a prompt-based KG foundation model via in-context learning, namely KG-ICL, to achieve a universal reasoning ability. Specifically, we introduce a prompt graph centered with a query-related example fact as context to understand the query relation. To encode prompt graphs with the generalization ability to unseen entities and relations in queries, we first propose a unified tokenizer that maps entities and relations in prompt graphs to predefined tokens. Then, we propose two message passing neural networks to perform prompt encoding and KG reasoning, respectively. We conduct evaluation on 43 different KGs in both transductive and inductive settings. Results indicate that the proposed KG-ICL outperforms baselines on most datasets, showcasing its outstanding generalization and universal reasoning capabilities. The source code is accessible on GitHub: https://github.com/nju-websoft/KG-ICL.

--------------------------------------------------------------------------------------------------------

The State of Robot Motion Generation

This paper provides a comprehensive review of robot motion generation methods developed over 50 years of robotics research. It covers a wide range of methodologies, from those using explicit models to those learning implicit ones, and discusses the current state-of-the-art as well as properties of various approaches. By highlighting opportunities for integration across different methodologies, this review could inspire new hybrid approaches to robot motion generation. Potential applications include more adaptable and efficient motion planning in industrial robotics, autonomous vehicles, and humanoid robots, leading to improved performance in complex and dynamic environments.

Authors: Kostas E. Bekris, Joe Doerr, Patrick Meng, Sumanth Tangirala

Link: https://arxiv.org/abs/2410.12172v1

Date: 2024-10-16

Summary:

This paper reviews the large spectrum of methods for generating robot motion proposed over the 50 years of robotics research culminating in recent developments. It crosses the boundaries of methodologies, typically not surveyed together, from those that operate over explicit models to those that learn implicit ones. The paper discusses the current state-of-the-art as well as properties of varying methodologies, highlighting opportunities for integration.

--------------------------------------------------------------------------------------------------------

CrediRAG: Network-Augmented Credibility-Based Retrieval for Misinformation Detection in Reddit

This paper introduces CrediRAG, a fake news detection model that combines language models with access to a political knowledge base and a dense social network. The model uses a news retriever to assign initial misinformation scores based on source credibility, then refines these estimates using a weighted post-to-post network. CrediRAG achieves an 11% increase in F1-score for detecting misinformative posts compared to state-of-the-art methods. This approach could significantly improve online misinformation detection, with potential applications in social media content moderation, fact-checking systems, and tools to help users critically evaluate information sources.

Authors: Ashwin Ram, Yigit Ege Bayiz, Arash Amini, Mustafa Munir, Radu Marculescu

Link: https://arxiv.org/abs/2410.12061v1

Date: 2024-10-15

Summary:

Fake news threatens democracy and exacerbates the polarization and divisions in society; therefore, accurately detecting online misinformation is the foundation of addressing this issue. We present CrediRAG, the first fake news detection model that combines language models with access to a rich external political knowledge base with a dense social network to detect fake news across social media at scale. CrediRAG uses a news retriever to initially assign a misinformation score to each post based on the source credibility of similar news articles to the post title content. CrediRAG then improves the initial retrieval estimations through a novel weighted post-to-post network connected based on shared commenters and weighted by the average stance of all shared commenters across every pair of posts. We achieve 11% increase in the F1-score in detecting misinformative posts over state-of-the-art methods. Extensive experiments conducted on curated real-world Reddit data of over 200,000 posts demonstrate the superior performance of CrediRAG on existing baselines. Thus, our approach offers a more accurate and scalable solution to combat the spread of fake news across social media platforms.

--------------------------------------------------------------------------------------------------------

BlendRL: A Framework for Merging Symbolic and Neural Policy Learning

This paper introduces BlendRL, a neuro-symbolic reinforcement learning framework that integrates both symbolic reasoning and neural networks within RL agents. The framework uses mixtures of logic and neural policies, aiming to combine the interpretable reasoning of symbolic agents with the flexible low-level reactions of neural agents. BlendRL outperforms both neural and symbolic baselines in standard Atari environments and shows robustness to environmental changes. This approach could lead to more versatile and interpretable AI systems, with potential applications in complex decision-making tasks, robotics, and adaptive control systems that require both high-level reasoning and low-level reactivity.

Authors: Hikaru Shindo, Quentin Delfosse, Devendra Singh Dhami, Kristian Kersting

Link: https://arxiv.org/abs/2410.11689v1

Date: 2024-10-15

Summary:

Humans can leverage both symbolic reasoning and intuitive reactions. In contrast, reinforcement learning policies are typically encoded in either opaque systems like neural networks or symbolic systems that rely on predefined symbols and rules. This disjointed approach severely limits the agents' capabilities, as they often lack either the flexible low-level reaction characteristic of neural agents or the interpretable reasoning of symbolic agents. To overcome this challenge, we introduce BlendRL, a neuro-symbolic RL framework that harmoniously integrates both paradigms within RL agents that use mixtures of both logic and neural policies. We empirically demonstrate that BlendRL agents outperform both neural and symbolic baselines in standard Atari environments, and showcase their robustness to environmental changes. Additionally, we analyze the interaction between neural and symbolic policies, illustrating how their hybrid use helps agents overcome each other's limitations.

--------------------------------------------------------------------------------------------------------

Are UFOs Driving Innovation? The Illusion of Causality in Large Language Models

This study investigates whether large language models (LLMs) develop illusions of causality, a cognitive bias where people believe in causal connections without supporting evidence. The researchers evaluated news headlines generated by different LLMs to determine if they incorrectly framed correlations as causal relationships. They also examined whether incorporating bias into prompts increases the likelihood of causal illusions. The findings show varying degrees of susceptibility to causal illusions among different models. This research highlights potential pitfalls in using LLMs for tasks requiring causal reasoning and could inform the development of more robust and unbiased language models for applications in journalism, scientific writing, and automated content generation.

Authors: María Victoria Carro, Francisca Gauna Selasco, Denise Alejandra Mester, Mario Alejandro Leiva

Link: https://arxiv.org/abs/2410.11684v1

Date: 2024-10-15

Summary:

Illusions of causality occur when people develop the belief that there is a causal connection between two variables with no supporting evidence. This cognitive bias has been proposed to underlie many societal problems including social prejudice, stereotype formation, misinformation and superstitious thinking. In this research we investigate whether large language models develop the illusion of causality in real-world settings. We evaluated and compared news headlines generated by GPT-4o-Mini, Claude-3.5-Sonnet, and Gemini-1.5-Pro to determine whether the models incorrectly framed correlations as causal relationships. In order to also measure sycophantic behavior, which occurs when a model aligns with a user's beliefs in order to look favorable even if it is not objectively correct, we additionally incorporated the bias into the prompts, observing if this manipulation increases the likelihood of the models exhibiting the illusion of causality. We found that Claude-3.5-Sonnet is the model that presents the lowest degree of causal illusion aligned with experiments on Correlation-to-Causation Exaggeration in human-written press releases. On the other hand, our findings suggest that while mimicry sycophancy increases the likelihood of causal illusions in these models, especially in GPT-4o-Mini, Claude-3.5-Sonnet remains the most robust against this cognitive bias.

--------------------------------------------------------------------------------------------------------

Depth Any Video with Scalable Synthetic Data

This paper introduces Depth Any Video, a model for video depth estimation that addresses the challenge of consistent and scalable ground truth data. The researchers developed a synthetic data pipeline generating 40,000 video clips with precise depth annotations and leveraged priors from generative video diffusion models. The model can handle videos of varying lengths and frame rates, even single frames, and uses a depth interpolation method for high-resolution video depth across long sequences. This approach could significantly improve depth estimation in videos, with potential applications in augmented reality, autonomous driving, robotics, and computer vision tasks requiring accurate depth perception.

Authors: Honghui Yang, Di Huang, Wei Yin, Chunhua Shen, Haifeng Liu, Xiaofei He, Binbin Lin, Wanli Ouyang, Tong He

Link: https://arxiv.org/abs/2410.10815v1

Date: 2024-10-14

Summary:

Video depth estimation has long been hindered by the scarcity of consistent and scalable ground truth data, leading to inconsistent and unreliable results. In this paper, we introduce Depth Any Video, a model that tackles the challenge through two key innovations. First, we develop a scalable synthetic data pipeline, capturing real-time video depth data from diverse synthetic environments, yielding 40,000 video clips of 5-second duration, each with precise depth annotations. Second, we leverage the powerful priors of generative video diffusion models to handle real-world videos effectively, integrating advanced techniques such as rotary position encoding and flow matching to further enhance flexibility and efficiency. Unlike previous models, which are limited to fixed-length video sequences, our approach introduces a novel mixed-duration training strategy that handles videos of varying lengths and performs robustly across different frame rates-even on single frames. At inference, we propose a depth interpolation method that enables our model to infer high-resolution video depth across sequences of up to 150 frames. Our model outperforms all previous generative depth models in terms of spatial accuracy and temporal consistency.

--------------------------------------------------------------------------------------------------------

SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators

This paper presents SeedLM, a novel post-training compression method for Large Language Models (LLMs) that uses seeds of pseudo-random generators to encode and compress model weights. The method reduces memory access and leverages idle compute cycles during inference, effectively speeding up memory-bound tasks. SeedLM achieves better zero-shot accuracy retention at low bit precision than state-of-the-art techniques while maintaining performance comparable to FP16 baselines. This approach could enable more efficient deployment of LLMs on resource-constrained devices, with potential applications in mobile AI, edge computing, and improved accessibility of large language models across various platforms.

Authors: Rasoul Shafipour, David Harrison, Maxwell Horton, Jeffrey Marker, Houman Bedayat, Sachin Mehta, Mohammad Rastegari, Mahyar Najibi, Saman Naderiparizi

Link: https://arxiv.org/abs/2410.10714v2

Date: 2024-10-16

Summary:

Large Language Models (LLMs) have transformed natural language processing, but face significant challenges in widespread deployment due to their high runtime cost. In this paper, we introduce SeedLM, a novel post-training compression method that uses seeds of pseudo-random generators to encode and compress model weights. Specifically, for each block of weights, we find a seed that is fed into a Linear Feedback Shift Register (LFSR) during inference to efficiently generate a random matrix. This matrix is then linearly combined with compressed coefficients to reconstruct the weight block. SeedLM reduces memory access and leverages idle compute cycles during inference, effectively speeding up memory-bound tasks by trading compute for fewer memory accesses. Unlike state-of-the-art compression methods that rely on calibration data, our approach is data-free and generalizes well across diverse tasks. Our experiments with Llama 3 70B, which is particularly challenging to compress, show that SeedLM achieves significantly better zero-shot accuracy retention at 4- and 3-bit than state-of-the-art techniques, while maintaining performance comparable to FP16 baselines. Additionally, FPGA-based tests demonstrate that 4-bit SeedLM, as model size increases to 70B, approaches a 4x speed-up over an FP16 Llama 2/3 baseline.

--------------------------------------------------------------------------------------------------------

Revisiting and Benchmarking Graph Autoencoders: A Contrastive Learning Perspective

This paper establishes connections between Graph Autoencoders (GAEs) and contrastive learning, introducing lrGAE (left-right GAE), a general and powerful GAE framework leveraging contrastive learning principles. The researchers demonstrate how contrastive learning can be applied to GAEs and provide a comprehensive benchmark for GAEs across diverse graph-based learning tasks. This work could lead to improved understanding and performance of GAEs, with potential applications in social network analysis, recommendation systems, bioinformatics, and any field requiring meaningful representations of graph-structured data.

Authors: Jintang Li, Ruofan Wu, Yuchang Zhu, Huizhe Zhang, Xinzhou Jin, Guibin Zhang, Zulun Zhu, Zibin Zheng, Liang Chen

Link: https://arxiv.org/abs/2410.10241v1

Date: 2024-10-14

Summary:

Graph autoencoders (GAEs) are self-supervised learning models that can learn meaningful representations of graph-structured data by reconstructing the input graph from a low-dimensional latent space. Over the past few years, GAEs have gained significant attention in academia and industry. In particular, the recent advent of GAEs with masked autoencoding schemes marks a significant advancement in graph self-supervised learning research. While numerous GAEs have been proposed, the underlying mechanisms of GAEs are not well understood, and a comprehensive benchmark for GAEs is still lacking. In this work, we bridge the gap between GAEs and contrastive learning by establishing conceptual and methodological connections. We revisit the GAEs studied in previous works and demonstrate how contrastive learning principles can be applied to GAEs. Motivated by these insights, we introduce lrGAE (left-right GAE), a general and powerful GAE framework that leverages contrastive learning principles to learn meaningful representations. Our proposed lrGAE not only facilitates a deeper understanding of GAEs but also sets a new benchmark for GAEs across diverse graph-based learning tasks. The source code for lrGAE, including the baselines and all the code for reproducing the results, is publicly available at https://github.com/EdisonLeeeee/lrGAE.

--------------------------------------------------------------------------------------------------------

DynamicER: Resolving Emerging Mentions to Dynamic Entities for RAG

This paper introduces a novel task aimed at resolving emerging mentions to dynamic entities in continuously updating knowledge bases, crucial for retrieval-augmented generation (RAG) with knowledge bases. The researchers propose a temporal segmented clustering method with continual adaptation to manage the temporal dynamics of evolving entities and emerging mentions. This approach could significantly improve the performance of RAG models on entity-centric knowledge-intensive tasks, with potential applications in question-answering systems, chatbots, and information retrieval systems that need to stay up-to-date with rapidly evolving knowledge and language use.

Authors: Jinyoung Kim, Dayoon Ko, Gunhee Kim

Link: https://arxiv.org/abs/2410.11494v1

Date: 2024-10-15

Summary:

In the rapidly evolving landscape of language, resolving new linguistic expressions in continuously updating knowledge bases remains a formidable challenge. This challenge becomes critical in retrieval-augmented generation (RAG) with knowledge bases, as emerging expressions hinder the retrieval of relevant documents, leading to generator hallucinations. To address this issue, we introduce a novel task aimed at resolving emerging mentions to dynamic entities and present DynamicER benchmark. Our benchmark includes dynamic entity mention resolution and entity-centric knowledge-intensive QA task, evaluating entity linking and RAG model's adaptability to new expressions, respectively. We discovered that current entity linking models struggle to link these new expressions to entities. Therefore, we propose a temporal segmented clustering method with continual adaptation, effectively managing the temporal dynamics of evolving entities and emerging mentions. Extensive experiments demonstrate that our method outperforms existing baselines, enhancing RAG model performance on QA task with resolved mentions.

--------------------------------------------------------------------------------------------------------

Digital Humanities in the TIME-US Project: Richness and Contribution of Interdisciplinary Methods for Labour History

This paper discusses the TIME-US project, which aims to quantify women's work in the past using diverse historical sources and digital humanities methods. The project mobilizes varied sources containing traces of women's professional activities in France during a specific period, including both printed and handwritten materials. By gathering and analyzing these sources, the project seeks to make invisible women's labor visible. This approach could revolutionize labor history research, providing new insights into women's economic contributions and potentially informing contemporary discussions on gender equality in the workplace.

Authors: Marie Puren

Link: https://arxiv.org/abs/2410.14222v1

Date: 2024-10-18

Summary:

In 2015, the Annales journal, traditionally open to interdisciplinary approaches in history, referred to 'the current historiographical moment [as] call [ing] for an experimentation of approaches'. 1 Although this observation did not exclusively refer to the new possibilities offered by the technological advancements of the time -particularly in the field of artificial intelligence 2 -it was nonetheless motivated by these rapid and numerous changes, which also affect the historiographical landscape. A year earlier, St\'ephane Lamass\'e and Philippe Rygiel spoke of the 'new frontiers of the historian', frontiers opened a few years earlier by the realisation of the unprecedented impact of new technologies on historical practices, leading to a 'mutation des conditions de production et de diffusion des connaissances historiques, voire de la nature de celles-ci' ('transformation of the conditions of production and dissemination of historical knowledge, and even the nature of this knowledge'). 3 It was in this fertile ground, conducive to the cross-fertilisation of approaches, that the TIME-US project was born in 2016. TIME-US is directly the result of this awareness and reflects the transformations induced by major technological advancements, disrupting not only our daily practices but also our historical practices. 1 Annales 2015, 216. 2 For example, convolutional neural networks, which have revolutionised the field of artificial intelligence, began to gain popularity just before the 2010s. 3 Translated by the author. Lamass\'e and Rygiel 2014. To quantify women's work in the past, labour historians cannot rely on the classic sources of their discipline, which allow to produce large statistical data series, systematically treatable in the form of databases. What to do when such data are not available? Should the task simply be abandoned? As Maria {\AA}gren points out, the invisibility of women's participation in the labour market does not mean non-existence 8 ; there must therefore be traces of it. To quantify women's economic activity, Sara Horrell and Jane Humphries, for example, turned to household budgets from 59 different sources (from Parliamentary Papers to autobiographical texts), which had never before been systematically used to identify women's work patterns and their contribution to family income. 9 In her study A Bitter Living: Women, Markets, and Social Capital in Early Modern Germany published in 2003, Sheilagh Ogilvie used information contained in court records to identify activities carried out by women and the time spent on these activities. Court records were not intended to record such information; yet, in their testimonies, witnesses often described in detail the activities they were engaged in while a crime was unfolding before their eyes. Sheilagh Ogilvie thus identified nearly 3000 such observations. 10 These works have opened two main avenues for the TIME-US project. First, making already digitised sources accessible in homogeneous corpora. 11 Following the example of previous research, TIME-US mobilised varied sources containing traces of professional activities carried out by women in France during the period studied: these include both printed (posters and petitions, working-class newspapers, and contemporary surveys on workers) and handwritten sources (labour court decisions, police reports, company archives, personal archives, surveys, petitions). 12 One of the project's objectives was to gather and 8 {\AA}gren 2018a, 144. 9 Horrell and Humphries 1995.

--------------------------------------------------------------------------------------------------------

Cliqueformer: Model-Based Optimization with Structured Transformers

This paper introduces Cliqueformer, a scalable transformer-based architecture that learns the structure of model-based optimization (MBO) tasks in the form of functional graphical models (FGMs). The approach aims to improve design optimization in various domains by learning the black-box function's structure, bypassing distribution shift problems. Cliqueformer demonstrates state-of-the-art performance on tasks ranging from high-dimensional black-box functions to real-world chemical and genetic design problems. This method could significantly enhance optimization processes in drug discovery, material science, and other fields requiring efficient design of complex systems based on limited data.

Authors: Jakub Grudzien Kuba, Pieter Abbeel, Sergey Levine

Link: https://arxiv.org/abs/2410.13106v1

Date: 2024-10-17

Summary:

Expressive large-scale neural networks enable training powerful models for prediction tasks. However, in many engineering and science domains, such models are intended to be used not just for prediction, but for design -- e.g., creating new proteins that serve as effective therapeutics, or creating new materials or chemicals that maximize a downstream performance measure. Thus, researchers have recently grown an interest in building deep learning methods that solve offline \emph{model-based optimization} (MBO) problems, in which design candidates are optimized with respect to surrogate models learned from offline data. However, straightforward application of predictive models that are effective at predicting in-distribution properties of a design are not necessarily the best suited for use in creating new designs. Thus, the most successful algorithms that tackle MBO draw inspiration from reinforcement learning and generative modeling to meet the in-distribution constraints. Meanwhile, recent theoretical works have observed that exploiting the structure of the target black-box function is an effective strategy for solving MBO from offline data. Unfortunately, discovering such structure remains an open problem. In this paper, following first principles, we develop a model that learns the structure of an MBO task and empirically leads to improved designs. To this end, we introduce \emph{Cliqueformer} -- a scalable transformer-based architecture that learns the black-box function's structure in the form of its \emph{functional graphical model} (FGM), thus bypassing the problem of distribution shift, previously tackled by conservative approaches. We evaluate Cliqueformer on various tasks, ranging from high-dimensional black-box functions from MBO literature to real-world tasks of chemical and genetic design, consistently demonstrating its state-of-the-art performance.

--------------------------------------------------------------------------------------------------------

A Pattern to Align Them All: Integrating Different Modalities to Define Multi-Modal Entities

This paper proposes a novel ontology design pattern for capturing the semantics of multi-modal entities in Knowledge Graphs. The pattern separates the abstract entity and its information from its physical realization across different media. This approach aims to facilitate the harmonization and integration of different existing multi-modal ontologies. The proposed model could significantly improve the representation and reasoning capabilities of multi-modal knowledge graphs, with potential applications in fields such as medicine, digital humanities, and any domain requiring the integration of diverse information modalities for more comprehensive and flexible knowledge representation.

Authors: Gianluca Apriceno, Valentina Tamma, Tania Bailoni, Jacopo de Berardinis, Mauro Dragoni

Link: https://arxiv.org/abs/2410.13803v1

Date: 2024-10-17

Summary:

The ability to reason with and integrate different sensory inputs is the foundation underpinning human intelligence and it is the reason for the growing interest in modelling multi-modal information within Knowledge Graphs. Multi-Modal Knowledge Graphs extend traditional Knowledge Graphs by associating an entity with its possible modal representations, including text, images, audio, and videos, all of which are used to convey the semantics of the entity. Despite the increasing attention that Multi-Modal Knowledge Graphs have received, there is a lack of consensus about the definitions and modelling of modalities, whose definition is often determined by application domains. In this paper, we propose a novel ontology design pattern that captures the separation of concerns between an entity (and the information it conveys), whose semantics can have different manifestations across different media, and its realisation in terms of a physical information entity. By introducing this abstract model, we aim to facilitate the harmonisation and integration of different existing multi-modal ontologies which is crucial for many intelligent applications across different domains spanning from medicine to digital humanities.

--------------------------------------------------------------------------------------------------------

EYE ON A.I. GETS READERS UP TO DATE ON THE LATEST FUNDING NEWS AND RELATED ISSUES. SUBSCRIBE FOR THE WEEKLY NEWSLETTER.

Artificial Intelligence, Research WatchCraig SmithOctober 21, 2024Comment