Publications
Mode Combinability: Exploring Convex Combinations of Permutation Aligned Models
Neural Networks, Elsevier
23 February 2024
We explore element-wise convex combinations of two permutation-aligned neural network parameter vectors $\Theta_A$ and $\Theta_B$ of size $d$. We conduct extensive experiments by examining various distributions of such model combinations parametrized by elements of the hypercube and its vicinity. Our findings reveal that broad regions of the hypercube form surfaces of low loss values, indicating that the notion of linear mode connectivity extends to a more general phenomenon which we call mode combinability.
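As an illustration of the basic operation, here is a minimal NumPy sketch of an element-wise convex combination parametrized by a point of the hypercube and its vicinity (variable names are ours, not the paper's):

import numpy as np

def combine(theta_a: np.ndarray, theta_b: np.ndarray, lam: np.ndarray) -> np.ndarray:
    # Element-wise convex combination of two permutation-aligned parameter
    # vectors of size d; lam lies in (or near) the hypercube [0, 1]^d.
    return lam * theta_a + (1.0 - lam) * theta_b

rng = np.random.default_rng(0)
d = 1000
theta_a, theta_b = rng.normal(size=d), rng.normal(size=d)
lam = rng.uniform(-0.1, 1.1, size=d)  # "vicinity": slightly outside [0, 1]^d
theta = combine(theta_a, theta_b, lam)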
Training BERT Models to Carry Over a Coding System Developed on One Corpus to Another
LREC-COLING
20-25 May 2024
This paper describes how we train BERT models to carry over a coding system developed on the paragraphs of one Hungarian literary journal to another. The aim of the coding system is to track trends in the perception of literary translation around the 1989 political transformation in Hungary. To evaluate not only task performance but also the consistency of the annotation, and to obtain better predictions from an ensemble, we use 10-fold cross-validation.
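A minimal sketch of such an evaluation setup, assuming hypothetical train_fn and predict_fn helpers that stand in for BERT fine-tuning and inference:

import numpy as np
from sklearn.model_selection import KFold

def crossval_ensemble(texts, labels, train_fn, predict_fn, n_splits=10):
    # Each fold trains a model on the other nine folds; the out-of-fold
    # predictions measure annotation consistency, and the ten trained
    # models can then vote as an ensemble on unseen paragraphs.
    models, oof = [], np.empty(len(texts), dtype=int)
    for train_idx, test_idx in KFold(n_splits, shuffle=True, random_state=0).split(texts):
        model = train_fn([texts[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        models.append(model)
        for i in test_idx:
            oof[i] = predict_fn(model, texts[i])
    return models, oof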
The density of planar sets avoiding unit distances
Mathematical Programming, Springer
06 October 2023
By improving upon previous estimates on a problem posed by L. Moser, we prove a conjecture of Erdős that the density of any measurable planar set avoiding unit distances is less than 1/4. Our argument implies the upper bound of 0.2470.
Lemmas: Generation, Selection, Application
TABLEAUX 2023
14 September 2023
Noting that lemmas are a key feature of mathematics, we engage in an investigation of the role of lemmas in automated theorem proving. The paper describes experiments with a combined system involving learning technology that generates useful lemmas for automated theorem provers, demonstrating improvement for several representative systems and solving a hard problem not solved by any system for twenty years. By focusing on condensed detachment problems we simplify the setting considerably, allowing us to get at the essence of lemmas and their role in proof search.
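Schematically, the selection step can be thought of as ranking candidate lemmas with a learned usefulness score and handing the best ones to the prover; a toy sketch (score_fn is a hypothetical stand-in for the learned model, not the paper's actual pipeline):

def select_lemmas(candidates, score_fn, k=10):
    # Rank candidate lemmas by predicted usefulness and keep the top k,
    # to be added to the axiom set before proof search is attempted.
    return sorted(candidates, key=score_fn, reverse=True)[:k]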
Piercing the chessboard
SIAM Journal on Discrete Mathematics
13 July 2023
We consider the minimum number of lines $h_n$ and $p_n$ needed to intersect or pierce, respectively, all the cells of the $n \times n$ chessboard. Determining these values can also be interpreted as a strengthening of the classical plank problem for integer points. Using the symmetric plank theorem of K. Ball, we prove that $h_n = \lceil \frac{n}{2} \rceil$ for each $n \geq 1$. Studying the piercing problem, we show that $0.7n \leq p_n \leq n - 1$ holds for each $n \geq 3$, where the upper bound is conjectured to be sharp.
On higher order Fourier analysis in characteristic p
Ergodic Theory and Dynamical Systems, 2023
27 January 2023
In this paper, the nilspace approach to higher-order Fourier analysis is developed in the setting of vector spaces over a prime field $\mathbb{F}_p$, with applications mainly in ergodic theory. A key requisite for this development is to identify a class of nilspaces adequate for this setting. We introduce such a class, whose members we call $p$-homogeneous nilspaces. One of our main results characterizes these objects in terms of a simple algebraic property.
Negative Sampling in Variational Autoencoders
IEEE 2nd Conference on Information Technology and Data Science (CITDS), 2022
16-18 May 2022
Modern deep artificial neural networks have achieved great success in the domain of computer vision and beyond. However, their application to many real-world tasks is undermined by certain limitations, such as overconfident uncertainty estimates on out-of-distribution data or performance deterioration under data distribution shifts. Several types of deep learning models used for density estimation through probabilistic generative modeling have been shown to fail to detect out-of-distribution samples by assigning higher likelihoods to anomalous data.
Optimal transport with f-divergence regularization and generalized Sinkhorn algorithm
AISTATS 2022
28-30 March 2022
Entropic regularization provides a generalization of the original optimal transport problem. It introduces a penalty term defined by the Kullback-Leibler divergence, making the problem more tractable via the celebrated Sinkhorn algorithm. Replacing the Kullback-Leibler divergence with a general $f$-divergence leads to a natural generalization; the case of divergences defined by superlinear functions was recently studied by Di Marino and Gerolin. Using convex analysis, we extend the theory developed so far to include all $f$-divergences defined by functions of Legendre type. We prove that, under some mild conditions, strong duality holds, optima in both the primal and dual problems are attained, and the generalization of the $c$-transform is well-defined. We also give sufficient conditions for the generalized Sinkhorn algorithm to converge to an optimal solution.
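For reference, a minimal NumPy implementation of the classical Kullback-Leibler case; the generalized algorithm in the paper replaces these scaling updates with their $f$-divergence counterparts:

import numpy as np

def sinkhorn(mu, nu, C, eps=0.1, n_iter=500):
    # Entropic optimal transport between histograms mu and nu with
    # cost matrix C; eps is the strength of the KL penalty term.
    K = np.exp(-C / eps)                # Gibbs kernel
    u = np.ones_like(mu)
    for _ in range(n_iter):
        v = nu / (K.T @ u)              # match column marginals to nu
        u = mu / (K @ v)                # match row marginals to mu
    return u[:, None] * K * v[None, :]  # approximate optimal transport plan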
Similarity and Matching of Neural Network Representations
NeurIPS, 2021
6-14 December 2021
We employ a toolset — dubbed Dr. Frankenstein — to analyse the similarity of representations in deep neural networks. With this toolset we aim to match the activations on given layers of two trained neural networks by joining them with a stitching layer. We demonstrate that the inner representations emerging in deep convolutional neural networks with the same architecture but different initialisations can be matched with a surprisingly high degree of accuracy even with a single, affine stitching layer.
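In PyTorch terms, a stitching layer between convolutional networks can be as simple as a trainable 1x1 convolution; a sketch of the idea, not necessarily the exact parametrization used in the paper:

import torch.nn as nn

class StitchingLayer(nn.Module):
    # Affine map transplanting layer-l activations of a frozen network A
    # into the input expected at the matching layer of a frozen network B;
    # only this layer is trained on the task loss.
    def __init__(self, channels_a: int, channels_b: int):
        super().__init__()
        self.affine = nn.Conv2d(channels_a, channels_b, kernel_size=1)

    def forward(self, activations_a):
        return self.affine(activations_a)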
The Role of Entropy in Guiding a Connection Prover
TABLEAUX 2021
6-9 September 2021
In this work we study how to learn good algorithms for selecting reasoning steps in theorem proving. We explore this in the connection tableau calculus implemented by leanCoP, where the partial tableau provides a clean and compact notion of a state to which a limited number of inferences can be applied. We start by incorporating a state-of-the-art learning algorithm, a graph neural network (GNN), into the plCoP theorem prover.
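Conceptually, the guidance reduces to scoring the applicable inferences at each tableau state; a hypothetical sketch in which encode and policy_net stand in for the graph construction and the GNN:

def guided_step(tableau_state, actions, policy_net, encode):
    # Encode the partial tableau as a graph, score each applicable
    # inference with the policy network, and take the best-scoring one.
    scores = policy_net(encode(tableau_state), actions)
    return max(zip(actions, scores), key=lambda pair: pair[1])[0]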
Towards Finding Longer Proofs
TABLEAUX 2021
6-9 September 2021
On measure-preserving $\mathbb{F}_p^\omega$-systems of order $k$
Journal d'Analyse Mathématique, 2024
5-11 September 2021
Building on previous work in the nilspace-theoretic approach to the study of Host–Kra factors of measure-preserving systems, we prove that every ergodic $\mathbb{F}_p^\omega$-system of order $k$ is a factor of an Abramov $\mathbb{F}_p^\omega$-system of order $k$. This answers a question of Jamneshan, Shalom and Tao.
Ordering Subgoals in a Backward Chaining Prover
Proceedings of The 6th Conference on Artificial Intelligence and Theorem Proving (AITP), 2021
5-11 September 2021
Many automated theorem provers are based on backward chaining: reasoning starts from a goal statement which we aim to prove, and each inference step reduces one of the goals to a (possibly empty) set of new subgoals. We thus maintain a set of open goals that need to be proven, and the proof is complete once the open goal set becomes empty. For each goal, there can be several valid inferences, resulting in different successor goal sets; selecting the right inference constitutes the core problem of such theorem provers, which has been thoroughly studied over the past half century.
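In pseudocode terms, the core loop looks as follows; a depth-limited sketch in which applicable enumerates the valid inferences for a goal and select imposes the ordering that constitutes the choice point:

def backward_chain(goals, applicable, select, depth=0, limit=30):
    # Reduce one open goal per step to a (possibly empty) set of subgoals;
    # the proof is complete once the open goal set becomes empty.
    if not goals:
        return True
    if depth >= limit:
        return False
    goal, rest = goals[0], goals[1:]
    for subgoals in select(applicable(goal)):  # the core choice point
        if backward_chain(subgoals + rest, applicable, select, depth + 1, limit):
            return True
    return False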
Ordering Subgoals in a Backward Chaining Prover
Proceedings of The 6th Conference on Artificial Intelligence and Theorem Proving (AITP), 2021
5-11 September 2021
World models represent the basic mechanisms of a system and can provide predictions about how transformations (actions) affect the state of the system. Such models have recently gained attention in Reinforcement Learning (RL), and in several domains model-based learning systems performed similarly to or better than highly tuned model-free variants [1, 8, 12]. World models can increase sample efficiency, since trajectories can be generated without interacting with the environment, and they can aid exploration by yielding a semantically meaningful latent structure that allows for identifying promising directions.
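The mechanism in a nutshell, as a hypothetical sketch (world_model.predict is an assumed interface returning a predicted next state and reward):

def imagine_rollout(world_model, policy, state, horizon=15):
    # Generate a trajectory entirely inside the learned model, without
    # interacting with the real environment: the source of sample efficiency.
    trajectory = []
    for _ in range(horizon):
        action = policy(state)
        state, reward = world_model.predict(state, action)
        trajectory.append((state, action, reward))
    return trajectory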
A Refinement of Cauchy-Schwarz Complexity, with Applications
European journal of combinatorics, 2022
24 August 2021
We introduce a notion of complexity for systems of linear forms called sequential Cauchy-Schwarz complexity, which is parametrized by two positive integers $k,\ell$ and refines the notion of Cauchy-Schwarz complexity introduced by Green and Tao. We prove that if a system of linear forms has sequential Cauchy-Schwarz complexity at most $(k,\ell)$ then any average of 1-bounded functions over this system is controlled by the $2^{1-\ell}$-th power of the Gowers $U^{k+1}$-norms of the functions.
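Schematically, the control statement from the abstract reads as follows, for linear forms $\phi_1, \dots, \phi_m$ of sequential Cauchy-Schwarz complexity at most $(k,\ell)$ and 1-bounded functions $f_1, \dots, f_m$:

\[
\Bigl|\, \mathbb{E}_{x} \prod_{i=1}^{m} f_i\bigl(\phi_i(x)\bigr) \Bigr| \;\leq\; \min_{1 \leq i \leq m} \| f_i \|_{U^{k+1}}^{2^{1-\ell}}.
\]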
Towards solving the 7-in-a-row game
IEEE Conference on Games (CoG), 2021
17-20 August 2021
Our paper explores the game-theoretic value of the 7-in-a-row game. We reduce the problem to solving a finite board game, which we target using Proof Number Search. We present a number of heuristic improvements to Proof Number Search and examine their effect within the context of this particular game. Although our paper does not solve the 7-in-a-row game, our experiments indicate that we have made significant progress towards it.
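For orientation, the bookkeeping at the heart of Proof Number Search; a sketch assuming a hypothetical node structure with proof and disproof counters:

def update_proof_numbers(node):
    # An OR node is proved by its cheapest child (min) and disproved only
    # if all children are (sum); AND nodes are dual. Search repeatedly
    # expands a most-proving leaf and propagates these numbers upward.
    if node.is_or:
        node.proof = min(child.proof for child in node.children)
        node.disproof = sum(child.disproof for child in node.children)
    else:
        node.proof = sum(child.proof for child in node.children)
        node.disproof = min(child.disproof for child in node.children)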
Moreau-Yosida f-divergences
International Conference on Machine Learning, 2021
18-24 Jul 2021
Variational representations of $f$-divergences are central to many machine learning algorithms, with Lipschitz constrained variants recently gaining attention. Inspired by this, we define the Moreau-Yosida approximation of $f$-divergences with respect to the Wasserstein-$1$ metric. The corresponding variational formulas provide a generalization of a number of recent results, novel special cases of interest and a relaxation of the hard Lipschitz constraint. Additionally, we prove that the so-called tight variational representation of $f$-divergences can be taken over the quotient space of Lipschitz functions, and give a characterization of functions achieving the supremum in the variational representation.
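For orientation, the generic construction (normalization conventions may differ from the paper's): the Moreau-Yosida approximation of a functional $F$ with respect to the Wasserstein-$1$ metric is the infimal convolution

\[
F_\lambda(\mu) \;=\; \inf_{\mu'} \Bigl( F(\mu') + \lambda\, W_1(\mu, \mu') \Bigr),
\]

applied here to $F = D_f(\cdot \,\|\, \nu)$; the parameter $\lambda$ realizes the relaxation of the hard Lipschitz constraint.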