Week Ending 5.5.2024
RESEARCH WATCH: 5.5.2024
Towards Green Communication: Soft Decoding Scheme for OOK Signals in Zero-Energy Devices
The rise of IoT devices requires energy-efficient communication methods. This paper proposes a soft decoding scheme for on-off keying signals to enable reliable, low-power communications for IoT and 6G services without frequent battery replacements. The technique could extend the range of zero-energy devices and support massive connectivity.
Authors: Ticao Zhang, Dennis Hui, Mehrnaz Afshang, Mohammad Mozaffari
Link: https://arxiv.org/abs/2405.01785v1
Date: 2024-05-03
Summary:
The boom in the Internet of Things (IoT) is expected to provide more intelligent and reliable communication services for higher network coverage, massive connectivity, and low-cost solutions for 6G services. However, frequent charging and battery replacement of these massive IoT devices bring a series of challenges. Zero-energy devices, which rely on energy-harvesting technologies and can operate without battery replacement or charging, play a pivotal role in facilitating the massive use of IoT devices. In order to enable reliable communications of such low-power devices, Manchester-coded on-off keying (OOK) modulation and non-coherent detection are attractive techniques due to their energy efficiency, robustness in noisy environments, and simplicity in receiver design. Moreover, to extend their communication range, employing channel coding along with enhanced detection schemes is crucial. In this paper, a novel soft-decision decoder is designed for OOK-based low-power receivers to enhance their detection performance. In addition, exact closed-form expressions and two simplified approximations are derived for the log-likelihood ratio (LLR), an essential metric for soft decoding. Numerical results demonstrate the significant coverage gain achieved through soft decoding for convolutional codes.
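To make the role of the LLR concrete, here is a minimal sketch of a soft decision for one Manchester-coded OOK bit. It is not the paper's derivation: it assumes a textbook model in which the receiver measures one envelope sample per half-bit, the "on" half is Rician with a known amplitude, the "off" half is Rayleigh, and the noise variance is known. The function names and the small-signal approximation are illustrative choices.

```python
import math

def log_bessel_i0(x: float) -> float:
    """Return ln I0(x) using the power series of the modified Bessel function.

    Intended for moderate arguments; very large x would overflow a float.
    """
    half_sq = (x / 2.0) ** 2
    term = 1.0
    total = 1.0
    for k in range(1, 200):
        term *= half_sq / (k * k)
        total += term
        if term < 1e-16 * total:
            break
    return math.log(total)

def manchester_ook_llr(r_first, r_second, amplitude, noise_var):
    """LLR of bit 1 ("on" then "off") against bit 0 ("off" then "on").

    Under the assumed Rician/Rayleigh envelope model the common factors of
    the two likelihoods cancel, leaving a difference of ln I0 terms.
    """
    scale = amplitude / noise_var
    return log_bessel_i0(scale * r_first) - log_bessel_i0(scale * r_second)

def energy_difference_llr(r_first, r_second, amplitude, noise_var):
    """Small-signal approximation, using ln I0(x) ~ x^2 / 4 for small x."""
    scale = amplitude / noise_var
    return (scale ** 2) * (r_first ** 2 - r_second ** 2) / 4.0

# A positive LLR favours bit 1, a negative one favours bit 0.
print(manchester_ook_llr(2.0, 0.5, amplitude=1.0, noise_var=0.5))
```

A soft-decision decoder for a convolutional code consumes these real-valued LLRs directly rather than first thresholding them to hard bits.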
--------------------------------------------------------------------------------------------------------
Multi-user ISAC through Stacked Intelligent Metasurfaces: New Algorithms and Experiments
This work explores using stacked intelligent metasurfaces for integrated sensing and communication systems. The proposed algorithms jointly optimize transmit beamforming and metasurface configurations to enhance sensing capabilities while meeting communication user requirements. This could enable advanced signal processing for applications requiring both sensing and communication functionalities.
Authors: Ziqing Wang, Hongzheng Liu, Jianan Zhang, Rujing Xiong, Kai Wan, Xuewen Qian, Marco Di Renzo, Robert Caiming Qiu
Link: https://arxiv.org/abs/2405.01104v1
Date: 2024-05-02
Summary:
This paper investigates a Stacked Intelligent Metasurfaces (SIM)-assisted Integrated Sensing and Communications (ISAC) system. An extended target model is considered, where the base station (BS) aims to estimate the complete target response matrix relative to the SIM. Under the constraints of minimum Signal-to-Interference-plus-Noise Ratio (SINR) for the communication users (CUs) and maximum transmit power, we jointly optimize the transmit beamforming at the BS and the end-to-end transmission matrix of the SIM, to minimize the Cramér-Rao Bound (CRB) for target estimation. Effective algorithms such as the alternating optimization (AO) and semidefinite relaxation (SDR) are employed to solve the non-convex SINR-constrained CRB minimization problem. Finally, we design and build an experimental platform for SIM, and evaluate the performance of the proposed algorithms for communication and sensing tasks.
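The two constraints named above can be illustrated with a short feasibility check. This sketch assumes each user's effective channel vector already folds in the SIM's end-to-end transmission matrix, and it only checks a candidate beamformer; it does not perform the AO or SDR optimization. All names are hypothetical.

```python
def sinr_per_user(channels, beams, noise_power):
    """Return the SINR of each user.

    channels[k] and beams[k] are lists of complex numbers of equal length:
    the effective channel of user k and the beamforming vector for user k.
    """
    def inner(h, w):
        # Hermitian inner product h^H w
        return sum(hi.conjugate() * wi for hi, wi in zip(h, w))

    sinrs = []
    for k, h in enumerate(channels):
        signal = abs(inner(h, beams[k])) ** 2
        interference = sum(
            abs(inner(h, w)) ** 2 for j, w in enumerate(beams) if j != k
        )
        sinrs.append(signal / (interference + noise_power))
    return sinrs

def is_feasible(channels, beams, noise_power, min_sinr, max_power):
    """Check the minimum-SINR and maximum-transmit-power constraints."""
    total_power = sum(abs(wi) ** 2 for w in beams for wi in w)
    users_ok = all(s >= min_sinr for s in sinr_per_user(channels, beams, noise_power))
    return total_power <= max_power and users_ok

# Toy case: two users with orthogonal channels and matched beams.
channels = [[1 + 0j, 0j], [0j, 1 + 0j]]
beams = [[1 + 0j, 0j], [0j, 1 + 0j]]
print(sinr_per_user(channels, beams, noise_power=0.1))
```

In the joint problem described in the abstract, a check like this defines the feasible set over which the CRB is minimized.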
--------------------------------------------------------------------------------------------------------
Distinguishing between stars formed in the Milky Way and those accreted from other galaxies is crucial for understanding galactic evolution. This study develops machine learning models to efficiently separate in-situ and accreted star populations using simulation data. Successfully applying these to observational data could provide insights into the Milky Way's assembly history.
Authors: Andrea Sante, Andreea S. Font, Sandra Ortega-Martorell, Ivan Olier, Ian G. McCarthy
Link: https://arxiv.org/abs/2405.00102v1
Date: 2024-04-30
Summary:
We present several machine learning (ML) models developed to efficiently separate stars formed in-situ in Milky Way-type galaxies from those that were formed externally and later accreted. These models, which include examples from artificial neural networks, decision trees and dimensionality reduction techniques, are trained on a sample of disc-like, Milky Way-mass galaxies drawn from the ARTEMIS cosmological hydrodynamical zoom-in simulations. We find that the input parameters which provide an optimal performance for these models consist of a combination of stellar positions, kinematics, chemical abundances ([Fe/H] and [α/Fe]) and photometric properties. Models from all categories perform similarly well, with area under the precision-recall curve (PR-AUC) scores of ≃ 0.6. Beyond a galactocentric radius of 5 kpc, models retrieve >90% of accreted stars, with a sample purity close to 60%; however, the purity can be increased by adjusting the classification threshold. For one model, we also include host galaxy-specific properties in the training, to account for the variability of accretion histories of the hosts; however, this does not lead to an improvement in performance. The ML models can identify accreted stars even in regions heavily dominated by the in-situ component (e.g., in the disc), and perform well on an unseen suite of simulations (the Auriga simulations). The general applicability bodes well for application of such methods on observational data to identify accreted substructures in the Milky Way without the need to resort to selection cuts for minimising the contamination from in-situ stars.
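The trade-off between retrieving accreted stars and keeping the sample pure can be shown with a toy calculation. The scores and labels below are invented for illustration and are not taken from the paper; "recall" is the fraction of accreted stars retrieved, and "purity" is the precision of the selected sample.

```python
def recall_and_purity(scores, is_accreted, threshold):
    """Classify a star as accreted when its score is at least the threshold."""
    pairs = list(zip(scores, is_accreted))
    true_pos = sum(1 for s, y in pairs if s >= threshold and y)
    false_pos = sum(1 for s, y in pairs if s >= threshold and not y)
    false_neg = sum(1 for s, y in pairs if s < threshold and y)
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    purity = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 1.0
    return recall, purity

# Hypothetical classifier scores and ground-truth labels (1 = accreted).
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

print(recall_and_purity(scores, labels, threshold=0.5))   # (0.75, 0.6)
print(recall_and_purity(scores, labels, threshold=0.85))  # (0.5, 1.0)
```

Raising the threshold increases purity at the cost of recall. The PR-AUC summarizes this trade-off across all thresholds.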
--------------------------------------------------------------------------------------------------------
KAN: Kolmogorov-Arnold Networks
Kolmogorov-Arnold Networks replace the fixed activation functions in traditional neural networks with learnable functions on the network edges. This architectural change is shown to improve accuracy and interpretability over standard networks. KANs could enhance areas like data fitting, solving PDEs, and even aiding scientific discovery.
Authors: Ziming Liu, Yixuan Wang, Sachin Vaidya, Fabian Ruehle, James Halverson, Marin Soljačić, Thomas Y. Hou, Max Tegmark
Link: https://arxiv.org/abs/2404.19756v2
Date: 2024-05-02
Summary:
Inspired by the Kolmogorov-Arnold representation theorem, we propose Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs). While MLPs have fixed activation functions on nodes ("neurons"), KANs have learnable activation functions on edges ("weights"). KANs have no linear weights at all -- every weight parameter is replaced by a univariate function parametrized as a spline. We show that this seemingly simple change makes KANs outperform MLPs in terms of accuracy and interpretability. For accuracy, much smaller KANs can achieve comparable or better accuracy than much larger MLPs in data fitting and PDE solving. Theoretically and empirically, KANs possess faster neural scaling laws than MLPs. For interpretability, KANs can be intuitively visualized and can easily interact with human users. Through two examples in mathematics and physics, KANs are shown to be useful collaborators helping scientists (re)discover mathematical and physical laws. In summary, KANs are promising alternatives for MLPs, opening opportunities for further improving today's deep learning models which rely heavily on MLPs.
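The idea of putting a learnable univariate function on every edge can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: each edge function is a piecewise-linear (degree-1) spline whose knot values would be the trainable parameters, and training itself is omitted. The class and function names are hypothetical.

```python
from bisect import bisect_right

class EdgeFunction:
    """A univariate piecewise-linear spline; `values` are the learnable parameters."""

    def __init__(self, knots, values):
        self.knots = list(knots)
        self.values = list(values)

    def __call__(self, x):
        # Clamp outside the knot range, interpolate linearly inside it.
        if x <= self.knots[0]:
            return self.values[0]
        if x >= self.knots[-1]:
            return self.values[-1]
        i = bisect_right(self.knots, x) - 1
        t = (x - self.knots[i]) / (self.knots[i + 1] - self.knots[i])
        return (1 - t) * self.values[i] + t * self.values[i + 1]

def kan_layer(edge_functions, inputs):
    """edge_functions[j][i] maps input i to its contribution to output j.

    Each output node only sums its incoming edges; there are no linear weights.
    """
    return [sum(phi(x) for phi, x in zip(row, inputs)) for row in edge_functions]

knots = [-1.0, 0.0, 1.0]
abs_like = EdgeFunction(knots, [1.0, 0.0, 1.0])    # approximates |x|
identity = EdgeFunction(knots, [-1.0, 0.0, 1.0])   # approximates x
print(kan_layer([[abs_like, identity]], [-0.5, 0.25]))  # [0.75]
```

By contrast, an MLP layer would multiply each input by a scalar weight and apply one fixed nonlinearity at the node.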
--------------------------------------------------------------------------------------------------------
As AI models are increasingly deployed, this paper raises concerns about the potential for adversarial attacks to manipulate interpretation methods like partial dependence plots while retaining original model predictions. Exposing such vulnerabilities highlights the need for robust interpretation techniques to ensure trustworthy AI systems.
Authors: Xi Xin, Giles Hooker, Fei Huang
Link: https://arxiv.org/abs/2404.18702v2
Date: 2024-05-01
Summary:
The adoption of artificial intelligence (AI) across industries has led to the widespread use of complex black-box models and interpretation tools for decision making. This paper proposes an adversarial framework to uncover the vulnerability of permutation-based interpretation methods for machine learning tasks, with a particular focus on partial dependence (PD) plots. This adversarial framework modifies the original black box model to manipulate its predictions for instances in the extrapolation domain. As a result, it produces deceptive PD plots that can conceal discriminatory behaviors while preserving most of the original model's predictions. This framework can produce multiple fooled PD plots via a single model. By using real-world datasets including an auto insurance claims dataset and COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) dataset, our results show that it is possible to intentionally hide the discriminatory behavior of a predictor and make the black-box model appear neutral through interpretation tools like PD plots while retaining almost all the predictions of the original black-box model. Managerial insights for regulators and practitioners are provided based on the findings.
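For readers unfamiliar with PD plots, a minimal sketch of how a partial dependence curve is computed follows. The model and data are toy placeholders rather than anything from the paper. The point to notice is that the procedure pairs each grid value with every observed row, so the model is evaluated on synthetic combinations that may never occur in the data.

```python
def partial_dependence(model, rows, feature_index, grid):
    """Average model prediction with one feature forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature_index] = value  # may create an unrealistic instance
            total += model(modified)
        curve.append(total / len(rows))
    return curve

# Toy model and data, for illustration only.
def toy_model(row):
    return 2 * row[0] + row[1]

rows = [[0, 1], [1, 3]]
print(partial_dependence(toy_model, rows, feature_index=0, grid=[0, 1, 2]))
# [2.0, 4.0, 6.0]
```

Because many of the `modified` rows lie outside the region where real data exist, a model whose predictions are altered only in that region can shift the curve while leaving predictions on real instances largely unchanged.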
--------------------------------------------------------------------------------------------------------
Post-hoc and manifold explanations analysis of facial expression data based on deep learning
Applying deep learning to human cognitive outputs like facial expressions, this work explores how neural networks process and interpret such data. The findings provide insights into human emotional processing and demonstrate the potential of AI for psychological research by enhancing the explainability of these "black box" models.
Authors: Yang Xiao
Link: https://arxiv.org/abs/2404.18352v1
Date: 2024-04-29
Summary:
The complex information processing system of humans generates a lot of objective and subjective evaluations, making the exploration of human cognitive products of great cutting-edge theoretical value. In recent years, deep learning technologies, which are inspired by biological brain mechanisms, have made significant strides in the application of psychological or cognitive scientific research, particularly in the memorization and recognition of facial data. This paper investigates through experimental research how neural networks process and store facial expression data and associate these data with a range of psychological attributes produced by humans. The researchers used the deep learning model VGG16, demonstrating that neural networks can learn and reproduce key features of facial data, thereby storing image memories. Moreover, the experimental results reveal the potential of deep learning models in understanding human emotions and cognitive processes and establish a manifold visualization interpretation of cognitive products or psychological attributes from a non-Euclidean space perspective, offering new insights into enhancing the explainability of AI. This study not only advances the application of AI technology in the field of psychology but also provides a new psychological theoretical understanding of the information processing of AI. The code is available here: https://github.com/NKUShaw/Psychoinformatics.
--------------------------------------------------------------------------------------------------------
MFTraj: Map-Free, Behavior-Driven Trajectory Prediction for Autonomous Driving
For autonomous driving, accurately predicting vehicle trajectories in dynamic environments is critical. This paper introduces a map-free, behavior-aware deep learning model that captures complex interactions using historical trajectories. Such a model could enable safer, more reliable autonomous systems without relying on high-definition maps.
Authors: Haicheng Liao, Zhenning Li, Chengyue Wang, Huanming Shen, Bonan Wang, Dongping Liao, Guofa Li, Chengzhong Xu
Link: https://arxiv.org/abs/2405.01266v1
Date: 2024-05-02
Summary:
This paper introduces a trajectory prediction model tailored for autonomous driving, focusing on capturing complex interactions in dynamic traffic scenarios without reliance on high-definition maps. The model, termed MFTraj, harnesses historical trajectory data combined with a novel dynamic geometric graph-based behavior-aware module. At its core, an adaptive structure-aware interactive graph convolutional network captures both positional and behavioral features of road users, preserving spatial-temporal intricacies. Enhanced by a linear attention mechanism, the model achieves computational efficiency and reduced parameter overhead. Evaluations on the Argoverse, NGSIM, HighD, and MoCAD datasets underscore MFTraj's robustness and adaptability, outperforming numerous benchmarks even in data-challenged scenarios without the need for additional information such as HD maps or vectorized maps. Importantly, it maintains competitive performance even in scenarios with substantial missing data, on par with most existing state-of-the-art models. The results and methodology suggest a significant advancement in autonomous driving trajectory prediction, paving the way for safer and more efficient autonomous systems.
--------------------------------------------------------------------------------------------------------
Individual Fairness Through Reweighting and Tuning
Mitigating bias and ensuring fairness in AI systems is an increasing priority. This paper investigates enforcing individual fairness through regularization techniques like the graph Laplacian regularizer. The approach could potentially improve the fairness of models applied to different domains while maintaining accuracy.
Authors: Abdoul Jalil Djiberou Mahamadou, Lea Goetz, Russ Altman
Link: https://arxiv.org/abs/2405.01711v1
Date: 2024-05-02
Summary:
Inherent bias within society can be amplified and perpetuated by artificial intelligence (AI) systems. To address this issue, a wide range of solutions have been proposed to identify and mitigate bias and enforce fairness for individuals and groups. Recently, Graph Laplacian Regularizer (GLR), a regularization technique from the semi-supervised learning literature has been used as a substitute for the common Lipschitz condition to enhance individual fairness (IF). Notable prior work has shown that enforcing IF through a GLR can improve the transfer learning accuracy of AI models under covariate shifts. However, the prior work defines a GLR on the source and target data combined, implicitly assuming that the target data are available at train time, which might not hold in practice. In this work, we investigated whether defining a GLR independently on the train and target data could maintain similar accuracy compared to the prior work model. Furthermore, we introduced the Normalized Fairness Gain score (FGN) to measure IF for in-processing algorithmic fairness techniques. FGN quantifies the amount of gained fairness when a GLR is used versus not. We evaluated the new and original methods under FGN, the Prediction Consistency (PC), and traditional classification metrics on the German Credit Approval dataset. The results showed that the two models achieved similar statistical mean performances over five-fold cross-validation. Furthermore, the proposed metric showed that PC scores can be misleading as the scores can be high and statistically similar to fairness-enhanced models while FGN scores are small. This work therefore provides new insights into when a GLR effectively enhances IF and the pitfalls of PC.
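The graph Laplacian regularizer can be written as a penalty that grows when similar individuals receive different predictions. The sketch below uses the standard identity f^T L f = (1/2) * sum_ij W_ij (f_i - f_j)^2 with L = D - W for a symmetric similarity matrix W. How the similarity graph is built and how the penalty is weighted in the loss are not specified here and would follow the paper.

```python
def laplacian_penalty(predictions, similarity):
    """Pairwise form: 0.5 * sum_ij W_ij * (f_i - f_j)^2."""
    n = len(predictions)
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += similarity[i][j] * (predictions[i] - predictions[j]) ** 2
    return 0.5 * total

def laplacian_quadratic_form(predictions, similarity):
    """Matrix form f^T (D - W) f; should equal the pairwise form."""
    n = len(predictions)
    degree = [sum(similarity[i]) for i in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(n):
            l_ij = (degree[i] if i == j else 0.0) - similarity[i][j]
            total += predictions[i] * l_ij * predictions[j]
    return total

# Toy example: individuals 0 and 1 are very similar, 1 and 2 somewhat similar.
preds = [0.9, 0.8, 0.1]
W = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 0.5],
     [0.0, 0.5, 0.0]]
print(laplacian_penalty(preds, W))
```

Adding this term to a training loss discourages the model from treating similar individuals differently, which is the sense in which it stands in for a Lipschitz-style individual fairness condition.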
--------------------------------------------------------------------------------------------------------
Large language models have shown an impressive ability to mimic human writing styles across domains. Studying their performance on creative, witty texts like Reddit's "Showerthoughts" could shed light on their limitations for open-ended generation and their potential for style adaptation in specific contexts.
Authors: Tolga Buz, Benjamin Frost, Nikola Genchev, Moritz Schneider, Lucie-Aimée Kaffee, Gerard de Melo
Link: https://arxiv.org/abs/2405.01660v1
Date: 2024-05-02
Summary:
Recent Large Language Models (LLMs) have shown the ability to generate content that is difficult or impossible to distinguish from human writing. We investigate the ability of differently-sized LLMs to replicate human writing style in short, creative texts in the domain of Showerthoughts, thoughts that may occur during mundane activities. We compare GPT-2 and GPT-Neo fine-tuned on Reddit data as well as GPT-3.5 invoked in a zero-shot manner, against human-authored texts. We measure human preference on the texts across the specific dimensions that account for the quality of creative, witty texts. Additionally, we compare the ability of humans versus fine-tuned RoBERTa classifiers to detect AI-generated texts. We conclude that human evaluators rate the generated texts slightly worse on average regarding their creative quality, but they are unable to reliably distinguish between human-written and AI-generated texts. We further provide a dataset for creative, witty text generation based on Reddit Showerthoughts posts.
--------------------------------------------------------------------------------------------------------
Assembling Modular, Hierarchical Cognitive Map Learners with Hyperdimensional Computing
Assembling modular cognitive systems inspired by the brain's hierarchical organization is explored here. Repurposing pre-trained components expressed through hyperdimensional computing allows solving complex tasks like the Tower of Hanoi puzzle. This could inform the development of more biologically plausible artificial cognitive architectures.
Authors: Nathan McDonald, Anthony Dematteo
Link: https://arxiv.org/abs/2404.19051v1
Date: 2024-04-29
Summary:
Cognitive map learners (CML) are a collection of separate yet collaboratively trained single-layer artificial neural networks (matrices), which navigate an abstract graph by learning internal representations of the node states, edge actions, and edge action availabilities. A consequence of this atypical segregation of information is that the CML performs near-optimal path planning between any two graph node states. However, the CML does not learn when or why to transition from one node to another. This work created CMLs with node states expressed as high dimensional vectors consistent with hyperdimensional computing (HDC), a form of symbolic machine learning (ML). This work evaluated HDC-based CMLs as ML modules, capable of receiving external inputs and computing output responses which are semantically meaningful for other HDC-based modules. Several CMLs were prepared independently then repurposed to solve the Tower of Hanoi puzzle without retraining these CMLs and without explicit reference to their respective graph topologies. This work suggests a template for building levels of biologically plausible cognitive abstraction and orchestration.
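As background on hyperdimensional computing, the sketch below shows three operations commonly used with bipolar hypervectors: binding, bundling, and similarity. It is a generic illustration and makes no claim about the specific vector representation or operations used in the paper. The dimension and names are arbitrary choices.

```python
import random

DIM = 10_000  # illustrative dimensionality

def random_hypervector(rng):
    """Random bipolar vector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]

def bind(a, b):
    """Elementwise product; associates two vectors and is its own inverse."""
    return [x * y for x, y in zip(a, b)]

def bundle(*vectors):
    """Elementwise majority vote; use an odd number of vectors to avoid ties."""
    return [1 if sum(column) > 0 else -1 for column in zip(*vectors)]

def similarity(a, b):
    """Normalized dot product in [-1, 1]."""
    return sum(x * y for x, y in zip(a, b)) / len(a)

rng = random.Random(0)
node_a, node_b, action = (random_hypervector(rng) for _ in range(3))

# Random hypervectors are nearly orthogonal, while a bundle stays similar to its members.
print(similarity(node_a, node_b))
print(similarity(bundle(node_a, node_b, action), node_a))
```

Because every module reads and writes vectors in the same high-dimensional space, the output of one module can serve as a meaningful input to another, which is the property the modular assembly above relies on.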
--------------------------------------------------------------------------------------------------------
Chiral magnets exhibiting nonreciprocal transport have potential for novel computing and signal processing applications, but face challenges in materials synthesis and operating temperatures. This work demonstrates artificial chiral magnets composed of helical nickel surfaces that exhibit reprogrammable, nonreciprocal magnon transport at room temperature and zero field, overcoming previous limitations. The materials-by-design approach could enable practical realizations of chiral magnetic devices.
Authors: Mingran Xu, Axel J. M. Deenen, Huixin Guo, Dirk Grundler
Link: https://arxiv.org/abs/2404.19153v2
Date: 2024-05-01
Summary:
Chiral magnets are materials which possess unique helical arrangements of magnetic moments, which give rise to nonreciprocal transport and fascinating physics phenomena. On the one hand, their exploration is guided by the prospects of unconventional signal processing, computation schemes and magnetic memory. On the other hand, progress in applications is hindered by the challenging materials synthesis, limited scalability and typically low critical temperature. Here, we report the creation and exploration of artificial chiral magnets (ACMs) at room temperature. By employing a mass production compatible deposition technology, we synthesize ACMs, which consist of helical Ni surfaces on central cylinders. Using optical microscopy, we reveal nonreciprocal magnon transport at GHz frequencies. It is controlled by programmable toroidal moments which result from the ACM's geometrical handedness and field-dependent spin chirality. We present materials-by-design rules which optimize the helically curved ferromagnets for 3D nonreciprocal transport at room temperature and zero magnetic field.
--------------------------------------------------------------------------------------------------------
Joint sentiment analysis of lyrics and audio in music
Automatically analyzing the sentiment conveyed through music is challenging, as both the audio and lyrics contribute to the perceived mood. This study evaluates models for sentiment analysis from audio and lyrics separately, investigating approaches to combine the two modalities. Robust multimodal sentiment analysis could enhance music recommendation systems and support psychological studies on music's emotional impact.
Authors: Lea Schaab, Anna Kruspe
Link: https://arxiv.org/abs/2405.01988v1
Date: 2024-05-03
Summary:
Sentiment or mood can express themselves on various levels in music. In automatic analysis, the actual audio data is usually analyzed, but the lyrics can also play a crucial role in the perception of moods. We first evaluate various models for sentiment analysis based on lyrics and audio separately. The corresponding approaches already show satisfactory results, but they also exhibit weaknesses, the causes of which we examine in more detail. Furthermore, different approaches to combining the audio and lyrics results are proposed and evaluated. Considering both modalities generally leads to improved performance. We investigate misclassifications and (also intentional) contradictions between audio and lyrics sentiment more closely, and identify possible causes. Finally, we address fundamental problems in this research area, such as high subjectivity, lack of data, and inconsistency in emotion taxonomies.
--------------------------------------------------------------------------------------------------------
Guiding Attention in End-to-End Driving Models
End-to-end driving models hold promise for affordable autonomous vehicles, but require large training datasets and lack interpretability. This paper guides the attention of such models during training using semantic maps, improving driving performance without architectural changes or test-time maps. The technique could enhance autonomous driving capabilities, especially in data-limited scenarios, while providing more intuitive visualizations.
Authors: Diego Porres, Yi Xiao, Gabriel Villalonga, Alexandre Levy, Antonio M. López
Link: https://arxiv.org/abs/2405.00242v1
Date: 2024-04-30
Summary:
Vision-based end-to-end driving models trained by imitation learning can lead to affordable solutions for autonomous driving. However, training these well-performing models usually requires a huge amount of data, while still lacking explicit and intuitive activation maps to reveal the inner workings of these models while driving. In this paper, we study how to guide the attention of these models to improve their driving quality and obtain more intuitive activation maps by adding a loss term during training using salient semantic maps. In contrast to previous work, our method does not require these salient semantic maps to be available during testing time, nor does it require modifying the architecture of the model to which it is applied. We perform tests using perfect and noisy salient semantic maps, the latter inspired by possible errors encountered with real data, with encouraging results in both cases. Using CIL++ as a representative state-of-the-art model and the CARLA simulator with its standard benchmarks, we conduct experiments that show the effectiveness of our method in training better autonomous driving models, especially when data and computational resources are scarce.
--------------------------------------------------------------------------------------------------------
Video anomaly detection identifies unusual events in video streams, with applications in surveillance, healthcare and more. However, models often struggle with dynamic real-world conditions. This work proposes an online learning framework to continuously adapt anomaly detectors to new environments while streaming, mirroring deployment challenges. Enhancing adaptivity could improve the practical utility of video anomaly detection across diverse domains.
Authors: Shanle Yao, Ghazal Alinezhad Noghre, Armin Danesh Pazho, Hamed Tabkhi
Link: https://arxiv.org/abs/2404.18747v1
Date: 2024-04-29
Summary:
Video Anomaly Detection (VAD) identifies unusual activities in video streams, a key technology with broad applications ranging from surveillance to healthcare. Tackling VAD in real-life settings poses significant challenges due to the dynamic nature of human actions, environmental variations, and domain shifts. Many research initiatives neglect these complexities, often concentrating on traditional testing methods that fail to account for performance on unseen datasets, creating a gap between theoretical models and their real-world utility. Online learning is a potential strategy to mitigate this issue by allowing models to adapt to new information continuously. This paper assesses how well current VAD algorithms can adjust to real-life conditions through an online learning framework, particularly those based on pose analysis, for their efficiency and privacy advantages. Our proposed framework enables continuous model updates with streaming data from novel environments, thus mirroring actual world challenges and evaluating the models' ability to adapt in real-time while maintaining accuracy. We investigate three state-of-the-art models in this setting, focusing on their adaptability across different domains. Our findings indicate that, even under the most challenging conditions, our online learning approach allows a model to preserve 89.39% of its original effectiveness compared to its offline-trained counterpart in a specific target domain.
--------------------------------------------------------------------------------------------------------
Research on Intelligent Aided Diagnosis System of Medical Image Based on Computer Deep Learning
Deep learning is increasingly used to assist diagnosis from medical images. This work describes an aided-diagnosis system that pairs a Struts and Hibernate web platform with a deep network trained on a dual-mode medical image library, reporting an AUROC of 0.9985. Such a system could let physicians upload images and help them locate and characterize tumors to plan targeted treatment.
Authors: Jiajie Yuan, Linxiao Wu, Yulu Gong, Zhou Yu, Ziang Liu, Shuyao He
Link: https://arxiv.org/abs/2404.18419v1
Date: 2024-04-29
Summary:
This paper combines the Struts and Hibernate architectures, using DAO (Data Access Object) to store and access data. A dual-mode humidity medical image library suitable for deep networks is then established, and a dual-mode medical image assisted diagnosis method based on these images is proposed. In tests of various feature extraction methods, the best area under the receiver operating characteristic curve (AUROC) is 0.9985, the recall rate is 0.9814, and the accuracy is 0.9833. The method is practical and can be applied to clinical diagnosis. Through the system, each outpatient physician can quickly register or log in to the platform to upload images, thus obtaining more accurate images. The segmentation of images can guide doctors in clinical departments. The image is then analyzed to determine the location and nature of the tumor, so that targeted treatment can be given.
--------------------------------------------------------------------------------------------------------
Ferroelectrically-enhanced Schottky barrier transistors for Logic-in-Memory applications
Logic-in-memory architectures unifying logic and storage show promise for energy-efficient AI hardware. This study explores using ferroelectric Schottky barrier transistors, demonstrating tunability between n/p-type operation by varying the ferroelectric polarization. The multi-state, non-volatile memory characteristics could enable novel low-power logic-in-memory designs suitable for applications like neural network execution.
Authors: Daniele Nazzari, Lukas Wind, Masiar Sistani, Dominik Mayr, Kihye Kim, Walter M. Weber
Link: https://arxiv.org/abs/2404.19535v1
Date: 2024-04-30
Summary:
Artificial neural networks (ANNs) have had an enormous impact on a multitude of sectors, from research to industry, generating an unprecedented demand for tailor-suited hardware platforms. Their training and execution are highly memory-intensive, clearly evidencing the limitations affecting the currently available hardware based on the von Neumann architecture, which requires frequent data shuttling due to the physical separation of logic and memory units. This not only limits the achievable performance but also greatly increases the energy consumption, hindering the integration of ANNs into low-power platforms. New Logic in Memory (LiM) architectures, able to unify memory and logic functionalities into a single component, are highly promising for overcoming these limitations, by drastically reducing the need for data transfers. Recently, it has been shown that a very flexible platform for logic applications can be realized by resorting to a multi-gated Schottky-Barrier Field Effect Transistor (SBFET). If equipped with memory capabilities, this architecture could represent an ideal building block for versatile LiM hardware. To reach this goal, here we investigate the integration of a ferroelectric Hf$_{0.5}$Zr$_{0.5}$O$_2$ (HZO) layer onto Dual Top Gated SBFETs. We demonstrate that HZO polarization charges can be successfully employed to tune the height of the two Schottky barriers, influencing the injection behavior, thus defining the transistor mode, switching it between n and p-type transport. The modulation strength is strongly dependent on the polarization pulse height, allowing for the selection of multiple current levels. All these achievable states can be well retained over time, thanks to the HZO stability. The presented results show how ferroelectric-enhanced SBFETs are promising for the realization of novel LiM hardware, enabling low-power circuits for ANN execution.
--------------------------------------------------------------------------------------------------------
U-Nets excel at tasks like image segmentation and denoising, but lack a theoretical underpinning for their architecture. This paper interprets U-Nets as implementing belief propagation in generative hierarchical models, providing efficient bounds for denoising and a unified view with convolutional networks for classification. The insights could guide U-Net architecture design across language and vision domains.
Authors: Song Mei
Link: https://arxiv.org/abs/2404.18444v2
Date: 2024-05-01
Summary:
U-Nets are among the most widely used architectures in computer vision, renowned for their exceptional performance in applications such as image segmentation, denoising, and diffusion modeling. However, a theoretical explanation of the U-Net architecture design has not yet been fully established. This paper introduces a novel interpretation of the U-Net architecture by studying certain generative hierarchical models, which are tree-structured graphical models extensively utilized in both language and image domains. With their encoder-decoder structure, long skip connections, and pooling and up-sampling layers, we demonstrate how U-Nets can naturally implement the belief propagation denoising algorithm in such generative hierarchical models, thereby efficiently approximating the denoising functions. This leads to an efficient sample complexity bound for learning the denoising function using U-Nets within these models. Additionally, we discuss the broader implications of these findings for diffusion models in generative hierarchical models. We also demonstrate that the conventional architecture of convolutional neural networks (ConvNets) is ideally suited for classification tasks within these models. This offers a unified view of the roles of ConvNets and U-Nets, highlighting the versatility of generative hierarchical models in modeling complex data distributions across language and image domains.
--------------------------------------------------------------------------------------------------------
WorkBench: a Benchmark Dataset for Agents in a Realistic Workplace Setting
Evaluating AI agents' workplace capabilities is crucial for high-stakes business deployments. WorkBench introduces a sandbox with databases and office tools to benchmark agents on realistic tasks like scheduling meetings. Comprehensive evaluations reveal significant weaknesses even in state-of-the-art models, underscoring the need for advancements before enterprise AI adoption.
Authors: Olly Styles, Sam Miller, Patricio Cerda-Mardini, Tanaya Guha, Victor Sanchez, Bertie Vidgen
Link: https://arxiv.org/abs/2405.00823v1
Date: 2024-05-01
Summary:
We introduce WorkBench: a benchmark dataset for evaluating agents' ability to execute tasks in a workplace setting. WorkBench contains a sandbox environment with five databases, 26 tools, and 690 tasks. These tasks represent common business activities, such as sending emails and scheduling meetings. The tasks in WorkBench are challenging as they require planning, tool selection, and often multiple actions. If a task has been successfully executed, one (or more) of the database values may change. The correct outcome for each task is unique and unambiguous, which allows for robust, automated evaluation. We call this key contribution outcome-centric evaluation. We evaluate five existing ReAct agents on WorkBench, finding they successfully complete as few as 3% of tasks (Llama2-70B), and just 43% for the best-performing agent (GPT-4). We further find that agents' errors can result in the wrong action being taken, such as an email being sent to the wrong person. WorkBench reveals weaknesses in agents' ability to undertake common business activities, raising questions about their use in high-stakes workplace settings. WorkBench is publicly available as a free resource at https://github.com/olly-styles/WorkBench.
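The outcome-centric evaluation described in the summary can be illustrated with a small sketch. This is not WorkBench's actual code: the nested-dict "database", the (table, key, value) action format, and the function names apply_actions and outcome_matches are all hypothetical, chosen only to show the idea of judging an agent by the final database state rather than by the exact sequence of tool calls.

```python
import copy


def apply_actions(db, actions):
    """Return a copy of db after applying each (table, key, value) write.

    The action format is a hypothetical stand-in for real tool calls.
    """
    state = copy.deepcopy(db)  # never mutate the sandbox's starting state
    for table, key, value in actions:
        state.setdefault(table, {})[key] = value
    return state


def outcome_matches(db, agent_actions, reference_actions):
    """Outcome-centric check: compare final states, not action sequences."""
    return apply_actions(db, agent_actions) == apply_actions(db, reference_actions)


initial = {"calendar": {}, "email": {}}

# Ground-truth outcome: one meeting is scheduled.
reference = [("calendar", "evt-1", {"title": "Sync", "time": "10:00"})]

# A different path that ends in the same database state.
same_outcome = [
    ("calendar", "evt-1", {"title": "Draft", "time": "09:00"}),
    ("calendar", "evt-1", {"title": "Sync", "time": "10:00"}),
]

# Correct meeting, plus a side effect: an email to the wrong person.
wrong_outcome = reference + [("email", "msg-1", {"to": "wrong@example.com"})]

print(outcome_matches(initial, same_outcome, reference))
print(outcome_matches(initial, wrong_outcome, reference))
```

Comparing whole states also catches unintended side effects, such as the misdirected email the summary mentions, which a check on the intended change alone would miss.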
--------------------------------------------------------------------------------------------------------
IntraMix: Intra-Class Mixup Generation for Accurate Labels and Neighbors
Graph neural networks struggle with insufficient high-quality labels and lack of neighborhoods in graphs. IntraMix tackles both issues through intra-class Mixup data augmentation, generating enhanced labels and establishing neighborhoods. As a model-agnostic framework, IntraMix could boost GNN performance across applications by addressing fundamental graph dataset challenges.
Authors: Shenghe Zheng, Hongzhi Wang, Xianglong Liu
Link: https://arxiv.org/abs/2405.00957v1
Date: 2024-05-02
Summary:
Graph Neural Networks (GNNs) demonstrate excellent performance on graphs, with the core idea of aggregating neighborhood information and learning from labels. However, most graph datasets face two prevailing challenges, Insufficient High-Quality Labels and Lack of Neighborhoods, which result in weak GNNs. Existing data augmentation methods designed to address these two issues often tackle only one. They may either require extensive training of generators, rely on overly simplistic strategies, or demand substantial prior knowledge, leading to suboptimal generalization abilities. To address both challenges simultaneously, we propose an elegant method called IntraMix. IntraMix innovatively employs Mixup among low-quality labeled data of the same class, generating high-quality labeled data at minimal cost. Additionally, it establishes neighborhoods for the generated data by connecting them with data from the same class with high confidence, thereby enriching the neighborhoods of graphs. IntraMix efficiently tackles both challenges faced by graphs and challenges the prior notion of the limited effectiveness of Mixup in node classification. IntraMix serves as a universal framework that can be readily applied to all GNNs. Extensive experiments demonstrate the effectiveness of IntraMix across various GNNs and datasets.
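The two steps named in the summary, mixing same-class samples and then linking the result to high-confidence same-class nodes, can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the uniform mixing weight, the confidence threshold, the cap of k neighbors, and the function names intra_class_mixup and connect_to_confident are all hypothetical.

```python
import random


def intra_class_mixup(features, labels, target_class, rng):
    """Mix two nodes of the same class into one new labeled sample.

    The mixing weight is drawn uniformly from [0, 1); this choice is an
    assumption made for the sketch.
    """
    same_class = [i for i, y in enumerate(labels) if y == target_class]
    i, j = rng.sample(same_class, 2)
    lam = rng.random()
    mixed = [lam * a + (1 - lam) * b for a, b in zip(features[i], features[j])]
    # Both sources share a label, so the mixed sample keeps that label.
    return mixed, target_class, (i, j)


def connect_to_confident(new_node, labels, confidence, target_class, threshold, k=2):
    """Link the generated node to up to k high-confidence same-class nodes."""
    candidates = [
        i for i, y in enumerate(labels)
        if y == target_class and confidence[i] >= threshold
    ]
    candidates.sort(key=lambda i: confidence[i], reverse=True)
    return [(new_node, i) for i in candidates[:k]]


# Toy graph data: 2-d node features, (possibly noisy) labels, model confidence.
features = [[0.0, 1.0], [0.2, 0.8], [0.9, 0.1], [1.0, 0.0], [0.1, 0.9]]
labels = [0, 0, 1, 1, 0]
confidence = [0.95, 0.6, 0.9, 0.7, 0.85]

rng = random.Random(0)
mixed, mixed_label, sources = intra_class_mixup(features, labels, 0, rng)
new_node = len(features)  # index the generated node would receive
edges = connect_to_confident(new_node, labels, confidence, 0, threshold=0.8)
print(mixed, mixed_label, sources, edges)
```

Because both source nodes carry the same label, the mixed sample needs no label interpolation, which is what distinguishes this intra-class variant from Mixup across classes.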
--------------------------------------------------------------------------------------------------------
Authors: Dimitrios Karkalousos, Ivana Išgum, Henk A. Marquering, Matthan W. A. Caan
Link: https://arxiv.org/abs/2404.19665v1
Date: 2024-04-30
Summary:
AI is revolutionizing MRI along the acquisition and processing chain. Advanced AI frameworks have been developed to apply AI in various successive tasks, such as image reconstruction, quantitative parameter map estimation, and image segmentation. Existing frameworks are often designed to perform tasks independently or are focused on specific models or datasets, limiting generalization. We introduce ATOMMIC, an open-source toolbox that streamlines AI applications for accelerated MRI reconstruction and analysis. ATOMMIC implements several tasks using DL networks and enables MultiTask Learning (MTL) to perform related tasks in an integrated manner, targeting generalization in the MRI domain. We first review the current state of AI frameworks for MRI through a comprehensive literature search and by parsing 12,479 GitHub repositories. We benchmark 25 DL models on eight publicly available datasets to present distinct applications of ATOMMIC on accelerated MRI reconstruction, image segmentation, quantitative parameter map estimation, and joint accelerated MRI reconstruction and image segmentation utilizing MTL. Our findings demonstrate that ATOMMIC is the only MTL framework with harmonized complex-valued and real-valued data support. Evaluations on single tasks show that physics-based models, which enforce data consistency by leveraging the physical properties of MRI, outperform other models in reconstructing highly accelerated acquisitions. Physics-based models that produce high reconstruction quality can accurately estimate quantitative parameter maps. When high-performing reconstruction models are combined with robust segmentation networks utilizing MTL, performance is improved in both tasks. ATOMMIC facilitates MRI reconstruction and analysis by standardizing workflows, enhancing data interoperability, integrating unique features like MTL, and effectively benchmarking DL models.
--------------------------------------------------------------------------------------------------------