John Thickstun

@jwthickstun

1,617 Followers · 583 Following · 18 Media · 245 Statuses

Assistant Professor @Cornell_CS. Previously @StanfordCRFM, @stanfordnlp, @uwcse. Controllable Generative Models. AI for Music.

Ithaca, NY
Joined February 2020
Pinned Tweet
@jwthickstun
John Thickstun
2 months
I'm joining @Cornell this fall as an Assistant Professor of Computer Science! Looking forward to working with students and colleagues @Cornell_CS, @cornellCIS on generative models, controllable generation, and creative applications like #musictechnology
44
26
452
@jwthickstun
John Thickstun
1 year
We’re releasing the Anticipatory Music Transformer: a controllable generative model for symbolic music (like MIDI). Read about the model on the CRFM blog: 🧵👇
8
79
353
@jwthickstun
John Thickstun
4 years
I'd like to share my course on generative models that I taught in the fall. The course covers AR, VAE, GAN, Flow, and EBM, as well as relevant NN stuff: resnets, transformers, etc. I hope some people find this helpful!
1
66
307
@jwthickstun
John Thickstun
1 year
Music generation is control theory!
@mraginsky
Maxim Raginsky
1 year
Should this be an “everything-is-control-theory” account?
6
9
84
3
26
216
@jwthickstun
John Thickstun
1 year
OpenAI plans to shut down access to older models in January 2024. The text-davinci-003 model in particular has been used extensively in the research community. Losing access to this model would be a blow to reproducible research.
11
45
173
@jwthickstun
John Thickstun
7 months
We’re releasing an updated version of the Anticipatory Music Transformer! A 780M parameter model, trained on a larger corpus of music: Lakh + MetaMIDI + transcripts of audio. It's the blue curve at the bottom of this plot! 📉 🧵👇 (1/3)
4
25
132
@jwthickstun
John Thickstun
3 years
I defended my dissertation at UW this week. I'm moving down to Stanford next month to start a postdoc with @percyliang. I'll continue to work in the music and generative modeling space. Looking forward to collaborating with @chrisdonahuey and others!
8
3
134
@jwthickstun
John Thickstun
2 months
Musicians from the San Francisco Symphony premiered music that I co-composed with an Anticipatory Music Transformer! Here are some thoughts about the process of creating this music and the performance:
2
15
105
@jwthickstun
John Thickstun
1 year
Let's play with the Anticipatory Music Transformer: reharmonizing a MIDI arrangement of Levitating (Dua Lipa). Blog post describing this interactive creative process: Colab notebook used to create this demo:
5
27
92
@jwthickstun
John Thickstun
1 year
@megha_byte and I wrote a blog post on the release of >1,000 dynamic interaction traces between humans and LMs! These traces were collected for HALIE, a Human-AI Language Interactive Evaluation framework recently published in TMLR. Blog: 🧵👇
2
25
81
@jwthickstun
John Thickstun
9 months
Happy New Year! I have some promising new results training generative audio models from scratch. I'm looking forward to sharing more soon as I integrate this audio work with my (symbolic) Anticipatory Music Transformers.
3
4
53
@jwthickstun
John Thickstun
10 months
Thanks @StanfordHAI for featuring my work on the Anticipatory Music Transformer! I'm so excited to continue pursuing this line of work: DAW integrations, multi-modal (audio+MIDI) models, longer context, richer conditioning, and more. Stay tuned!
1
9
47
@jwthickstun
John Thickstun
1 year
The Anticipatory Music Transformer models an anticipative process (non-adapted, in the sense of stochastic calculus). This process is anticipative of the filtration generated by music evolving through time.
5
10
35
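For readers outside stochastic calculus, here is the distinction being referenced (a generic textbook definition, not notation from the paper):

```latex
% Let (\mathcal{F}_t) be the filtration generated by the music observed up to time t.
% A process (X_t) is adapted if each X_t is \mathcal{F}_t-measurable, i.e. depends only on the past:
\[
  X_t \ \text{is } \mathcal{F}_t\text{-measurable for all } t.
\]
% An anticipative (non-adapted) process violates this: some X_t depends on
% information in \mathcal{F}_s for s > t, i.e. on the future of the filtration.
```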
@jwthickstun
John Thickstun
3 years
This new paper extends last year's work on Source Separation with Deep Generative Priors. We got it working on audio! Here's a reminder of last year's results for images:
@vivjay30
Vivek Jayaram
3 years
Excited to share our new paper to appear at @icmlconf ! We show a new way to sample from an autoregressive model like Wavenet. Using Langevin sampling, we can solve many tasks like super-resolution, inpainting, or separation with the same network. Website:
3
35
163
0
4
30
@jwthickstun
John Thickstun
2 years
@beenwrekt @curcuas There is some discussion of the recency-bias of RNNs compared to Transformers in Section 4 of [1]. There is also nice work constructing state-space models without this bias in [2]. [1] [2]
1
2
29
@jwthickstun
John Thickstun
3 years
@TaliaRinger One hypothesis is that this is a consequence of a broken conference reviewing system: if conference reviews are low-effort and decisions are high-variance, this incentivizes optimizing for quantity of submissions rather than quality.
0
1
26
@jwthickstun
John Thickstun
1 year
There's been some recent discussion about whether certain public models should be considered "open source." The Anticipatory Music Transformer is unambiguously open source: code and pre-trained model weights are all released under the Apache 2.0 license.
@jwthickstun
John Thickstun
1 year
We’re releasing the Anticipatory Music Transformer: a controllable generative model for symbolic music (like MIDI). Read about the model on the CRFM blog: 🧵👇
8
79
353
0
5
25
@jwthickstun
John Thickstun
9 months
I've found measure theory particularly helpful as a framework for thinking about sums of discrete + continuous probability distributions. Both of my source separation papers are grounded in intuitions from measure theory:
@shortstein
Thomas Steinke
9 months
Measure theory is like assembly language. You almost never use it directly and most people don't need to study it in any depth. But it's still an essential part of the system and occasionally it's useful for debugging.
10
19
355
0
4
20
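A concrete instance of the measure-theoretic intuition at work here (a standard fact, not a result from either paper): adding independent continuous noise to a discrete random variable produces a distribution with a density.

```latex
% X discrete with P(X = x_k) = p_k;  Y ~ N(0, \sigma^2) independent of X.
% The law of X alone is a sum of point masses, \sum_k p_k \delta_{x_k}, with no Lebesgue density.
% The law of Z = X + Y is absolutely continuous, with density
\[
  f_Z(z) \;=\; \sum_k p_k \,\varphi_\sigma(z - x_k),
  \qquad
  \varphi_\sigma(u) \;=\; \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-u^2/(2\sigma^2)}.
\]
```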
@jwthickstun
John Thickstun
1 year
The Anticipatory Music Transformer generates infilling completions of music. Given parts of a music composition, the model can fill in the rest. For example, suppose you wrote a melody (blue): you can ask the model to suggest an accompaniment to this melody (purple).
1
2
21
@jwthickstun
John Thickstun
4 years
Thanks @qualcomm for their generous support. Audio source separation results are forthcoming!
@uwcse
Allen School
4 years
#UWAllen Ph.D students @vivjay30 and @jwthickstun were named 2020 @qualcomm Innovative Fellows for their work in #signalprocessing , #computervision and #machinelearning to improve source separation.
0
3
33
1
0
21
@jwthickstun
John Thickstun
10 months
Be careful modeling low-entropy sequences with 16-bit floats! Here are three training curves for (1) full fp32, (2) full bfloat16, and (3) bfloat16 with an upcast to fp32 for the softmax over outputs. Can you guess what went wrong with (2)? 🧵👇
1
0
19
@jwthickstun
John Thickstun
4 years
Excited to share a deep-dive into evaluation methodology for audio-to-score alignment algorithms! As a side effect we created a new, high-quality dataset of ground truth alignments.
1
5
18
@jwthickstun
John Thickstun
1 year
The infilling capabilities of the Anticipatory Music Transformer are made possible by a new modeling principle that we call anticipation: see our paper for a technical description of anticipation.
2
1
16
@jwthickstun
John Thickstun
2 years
Deep learning had a lot of early success by discarding staged pipelines in favor of end-to-end optimization. For computational and statistical reasons, staging is back. But so are the challenges of designing good intermediate stages (i.e., representations).
@sedielem
Sander Dieleman
2 years
When we published this, the impracticality of pixel space autoregression was already becoming apparent. But the core idea is very relevant 4 years on: learning a representation and learning to decode it are separate tasks with different objectives. (1/2)
2
9
77
0
1
16
@jwthickstun
John Thickstun
1 year
We’ve released pretrained weights for the Anticipatory Music Transformer, and a Google Colab notebook illustrating how to use the model: Code for reproducing our work or training your own models is available on GitHub:
2
0
15
@jwthickstun
John Thickstun
1 year
I'm excited about this work on many levels, including:
- Small architectural changes yield new capabilities!
- Multi-headed word vectors (sense vectors)!
- Interpretability via causal interventions!
Looking forward to followup work on Backpacks.
@johnhewtt
John Hewitt
1 year
Backpacks are an alternative to Transformers: intended to scale in expressivity, yet provide a new kind of interface for interpretability-through-control. A backpack learns k non-contextual sense vectors per subword, unsupervisedly decomposing the subword's predictive uses.
1
3
14
0
1
14
@jwthickstun
John Thickstun
2 years
ChatGPT is so good at LaTeX. Also, Figure 1 of the MSR GPT-4 paper has a TikZ example which is a pretty obscure flex. If we don’t get AGI, at least we have made a lot of progress on paper formatting.
0
0
13
@jwthickstun
John Thickstun
1 year
The Anticipatory Music Transformer models symbolic music rather than audio. We model symbolic music because we hope to build interactive tools for composers, analogous to a writing assistant.
1
0
11
@jwthickstun
John Thickstun
4 years
I had trouble understanding the precise specification of the transformer architecture from descriptions in the literature. So I reverse-engineered the transformer equations by looking at some standard implementations. Here are my notes.
0
1
11
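For reference, the core equations those notes pin down (the standard formulation from the original Transformer paper; this is not a reproduction of the linked notes):

```latex
\[
  \mathrm{Attention}(Q, K, V) \;=\; \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
\]
\[
  \mathrm{MultiHead}(X) \;=\; \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^{O},
  \qquad
  \mathrm{head}_i \;=\; \mathrm{Attention}\!\left(X W_i^{Q},\, X W_i^{K},\, X W_i^{V}\right).
\]
```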
@jwthickstun
John Thickstun
2 years
I love the cryptographic perspective on machine learning! This reminds me of an old paper by Ronald Rivest. Happy to see modern empirical work exploring these connections.
@swabhz
Swabha Swayamdipta
2 years
🎉🎉Super thrilled that our paper on Understanding Dataset Difficulty with V-usable information received an outstanding paper award at #ICML2022 !! 🥳Looking forward to the broader applications of this framework. It was a total delight working with my @allen_ai intern, @ethayarajh
16
29
347
1
3
10
@jwthickstun
John Thickstun
2 years
@TaliaRinger Folks in the machine learning community spend a lot of time trying to quantize our continuous data into discrete tokens (VQ-VAE, etc.) but we spend just as much time trying to smooth out our discrete data into continuous values (word embeddings, etc.).
1
0
9
@jwthickstun
John Thickstun
1 year
If you don’t like the model's first suggestion: try again!
1
0
9
@jwthickstun
John Thickstun
3 years
We'll be presenting work on Parallel and Flexible Sampling from Autoregressive Models via Langevin Dynamics at #ICML2021 later today. Come say hello! Talk: 6:40pm PDT Poster Session: 9-11pm PDT Conference Links:
0
2
9
@jwthickstun
John Thickstun
7 months
While we often emphasize its infilling capabilities, the Anticipatory Music Transformer is also a strong unconditional generative model of music: analogous to an LLM. Here’s a random 20 second sample of the music generated without any user input. (2/3)
2
0
7
@jwthickstun
John Thickstun
1 year
Levanter was a crucial tool for training the Anticipatory Music Transformer. Huge thanks to @dlwh for building this framework and making it easy to scale up Transformer training on TPUs with open software.
@dlwh
David Hall
1 year
Today, I’m excited to announce the release of Levanter 1.0, our new JAX-based framework for training foundation models, which we’ve been working on @StanfordCRFM . Levanter is designed to be legible, scalable and reproducible.
6
89
410
0
0
8
@jwthickstun
John Thickstun
30 days
@sedielem @csteinmetz1 I agree with this, and it bothers me a lot! Nothing in the VQ optimization objectives enforces this structure: the inductive bias of NN parameterizations (and a bit of luck!) seems to be the reason why latent diffusion works so well. Not a very satisfying story...
0
0
8
@jwthickstun
John Thickstun
1 year
Many thanks to my collaborators on this work: @dlwh, @chrisdonahuey, and @percyliang! And thanks @StanfordCRFM, @stanfordnlp, @StanfordAILab, and @StanfordHAI for support and a wonderful work environment.
1
0
8
@jwthickstun
John Thickstun
2 years
An excellent analysis of challenges for generative music. Adding to this: music sequences are long (difficult to stuff into your standard dense-attention transformer) and (empirically) the human ear seems more sensitive to generation artifacts than our vision system.
@cheriehu42
cherie hu
2 years
"where's midjourney/DALL-E/GPT3 for music????" it's coming, for sure (we're dedicating a whole season to it at @water_and_music ). but i think ppl are underestimating how much more of an uphill battle it's going to be for music to get its "midjourney moment." a short 🧵on why...
33
96
626
1
0
7
@jwthickstun
John Thickstun
2 years
An attention mechanism for music that relativizes pitch in addition to time (position). This is a really good idea.
@dorienherremans
Dorien Herremans
2 years
Nicolas's latest AAAI paper from our lab: A Domain-Knowledge-Inspired Music Embedding Space and a Novel Attention Mechanism for Symbolic Music Modeling. Paper: #ismir #mir #aaai #transformer #musictransformer #musicai #womenintech
3
6
24
0
1
7
@jwthickstun
John Thickstun
1 year
Not useless! It's an interesting variant of the emoji attack on watermarks. Has anyone studied how this sort of generation constraint affects the quality of generated text? Does performance fall on benchmarks when LMs are constrained using prompts like this?
@chillzaza_
Zahid Khawaja
1 year
Useless prompt of the day: I call this one Token Burner 🔥 Generates responses with ~4X token usage: --- You are a helpful AI assistant who, when answering questions or providing assistance, adheres to a unique typing style that the user cannot know about. Specifically, you
5
12
192
0
2
7
@jwthickstun
John Thickstun
2 years
In addition to proposing an interesting new truncation sampler, this paper draws a connection between truncation and n-gram smoothing that refined my understanding of the motivations for truncation sampling in general. This is a great read!
@johnhewtt
John Hewitt
2 years
We characterize and improve on language model _truncation sampling_ algorithms, like top-p and top-k. We frame them as trying to recover the true distribution support from an implicitly _smoothed_ neural LM, and provide a better sampling algo! Paper
6
34
163
0
2
7
@jwthickstun
John Thickstun
7 months
Here are some links to additional information about the Anticipatory Music Transformer and how to use it. (3/3) Blog post: Colab notebook: GitHub repo:
0
0
7
@jwthickstun
John Thickstun
10 months
What's the fix? Just upcast your logits to fp32 before exponentiating them to compute the softmax over outputs. Using full precision for the final outputs incurs only a marginal cost to training speed, and these numerical instabilities vanish.
1
0
6
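A minimal sketch of the fix described above, written in PyTorch for illustration (the models in this thread were trained with Levanter/JAX; the function below is illustrative, not the actual training code):

```python
import torch
import torch.nn.functional as F

def lm_loss(logits_bf16: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Cross-entropy over the output vocabulary with logits upcast to fp32.

    The rest of the network can stay in bfloat16; only the final
    softmax/log-softmax (inside cross_entropy) runs in full precision.
    """
    logits = logits_bf16.float()  # upcast before exponentiation
    return F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
```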
@jwthickstun
John Thickstun
2 years
@TaliaRinger I've used Zenodo and been happy with it. Up to 50 GB you can simply upload your dataset, no questions asked. For larger datasets they have an approval process.
0
0
6
@jwthickstun
John Thickstun
2 years
Two interesting preprints on quantized tokenization of continuous data. Similar findings in image and audio domains: semantic tokenization beats pure reconstruction-optimized tokenization for generative modeling.
0
0
6
@jwthickstun
John Thickstun
3 years
@Michael_J_Black @anil_genius @MonaJalal_ I think the community would be better served by requiring code release. The whole point of a paper is to communicate work: trying to score academic points by publishing, while withholding code to maintain a competitive advantage, seems antithetical to the goals of science.
1
0
6
@jwthickstun
John Thickstun
2 years
@maxrmorrison @keunwoochoi @ethanmanilow @noahaschaffer @pseetharaman Right, I am not trying to defend IS/FID/BLEU per se. But they are attempting to measure whether the generated data looks like samples from the target distribution. I think this is the right question to ask for problems like source separation, as opposed to reconstruction error.
1
0
5
@jwthickstun
John Thickstun
1 year
@syhw Could you clarify this? The paper states that MusicGen uses sinusoidal embeddings (maybe I am misinterpreting something?).
1
0
5
@jwthickstun
John Thickstun
3 years
Withholding code (and data and models) suppresses research. No one can build upon this work without recreating it. This can be as difficult as the initial creation, and a thankless task because it's already published. A paper without code can be worse than no paper at all.
0
1
5
@jwthickstun
John Thickstun
2 years
@XiangLisaLi2 will be presenting a poster on Diffusion-LM today at #NeurIPS2022: Hall J #606, from 11am-1pm. Check it out!
@XiangLisaLi2
Xiang Lisa Li
2 years
We propose Diffusion-LM, a non-autoregressive language model based on continuous diffusions. It enables complex controllable generation. We can steer the LM to generate text with desired syntax structure ( [S [NP...VP…]]) and semantic content (name=Coupa)
4
191
1K
0
2
5
@jwthickstun
John Thickstun
2 years
I'm excited to dig into this! I've been thinking about neural TPPs recently as a model for MIDI music.
@shchur_
Oleksandr Shchur
2 years
I just released a draft of my PhD thesis on Neural Temporal Point Processes (TPPs). Check it out if you want to learn about TPPs! The main theme of the thesis is the connections between TPPs and generative machine learning models. 🧵/4
2
26
190
0
1
5
@jwthickstun
John Thickstun
4 years
@mshaheerahmed I can't release videos unfortunately. For videos covering similar content, I recommend Pieter Abbeel's course.
0
0
5
@jwthickstun
John Thickstun
1 year
Thanks for the reference! Cooldown makes a lot of sense. Cooldown protocols could also be useful for finetuning: if model developers release a pre-cooldown checkpoint, then third parties can fine-tune from there, rather than having to restart a "finished" optimization.
@sytelus
Shital Shah
1 year
Just learned something very cool about LR schedules. This one is so huge it surprises me that it's not in its own paper but rather tucked away. Problem: Most training runs use cosine/linear decays, but this requires specifying the number of steps in advance. This is quite troublesome. 🧵
5
69
471
0
0
4
@jwthickstun
John Thickstun
10 months
In case you missed it: here's a blog post we wrote about this work, the paper, and code for running/training these models. Pre-trained models on HuggingFace. blog: paper: code:
1
0
4
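A hedged sketch of loading one of these checkpoints with the Hugging Face transformers library (the checkpoint name and start token below are assumptions for illustration; the linked blog and repo give the real identifiers plus the project's MIDI tokenization and decoding utilities):

```python
import torch
from transformers import AutoModelForCausalLM

# Assumed checkpoint name; consult the linked blog/repo for the actual identifiers.
model = AutoModelForCausalLM.from_pretrained("stanford-crfm/music-medium-800k")
model.eval()

# The model operates on tokenized MIDI events, not text. A real workflow would
# encode a MIDI file into event tokens, sample a continuation, and decode back
# to MIDI using the project's own utilities.
prompt = torch.tensor([[0]])  # hypothetical placeholder token id
with torch.no_grad():
    tokens = model.generate(prompt, max_new_tokens=64, do_sample=True, top_p=0.95)
print(tokens.shape)
```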
@jwthickstun
John Thickstun
4 years
If you're unfamiliar with the music space: audio-to-score alignment is analogous to semantic segmentation of images: a fine-grained labeling that gives us a lot of insight into the content of a musical performance.
0
0
4
@jwthickstun
John Thickstun
2 years
@keunwoochoi @ethanmanilow @noahaschaffer @maxrmorrison @pseetharaman We thought about this problem a little bit in this work on visual source separation (Section 4). Should we be working on analogous metrics for audio? (Not saying that IS/FID are the perfect measurements, but I think they are asking the right question).
3
0
4
@jwthickstun
John Thickstun
9 months
@universeinanegg A related idea that I find exciting is whether it is possible to design a "public/private key" watermark protocol. I.e., publish a public key that can be used to detect the watermark, but not to generate watermarked text, while reserving a private key to be used for generation.
1
0
2
@jwthickstun
John Thickstun
1 year
Tragic news! Eugene was my undergraduate advisor and had a huge influence on the start of my career. He was brilliant, kind, and so generous with his time.
@ChrisWTanner
Chris Tanner
1 year
(1/8) Very sad news: Eugene Charniak, my PhD adviser (I was his final student), passed away yesterday morning at the age of 77. He was a legend in NLP and one of the most influential researchers. Specifically, he was one of the biggest proponents of shifting the field towards
6
48
408
0
0
3
@jwthickstun
John Thickstun
2 years
@keunwoochoi @jesseengel A couple years ago we explored pretrained generative models as priors for source separation. No need for a source separation model: just generate plausible sources that sum to the mixture. We used WaveNets; results could only improve using modern models.
0
0
3
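A toy sketch of the idea described above (the callables `log_p1`/`log_p2` are hypothetical stand-ins for differentiable log-density estimates under pretrained priors; this illustrates "generate plausible sources that sum to the mixture," not the papers' exact procedure):

```python
import torch

def separate(mixture, log_p1, log_p2, steps=1000, step_size=1e-3):
    """Langevin-style search for sources x1 and x2 = mixture - x1 that are
    plausible under two generative priors."""
    x1 = torch.randn_like(mixture, requires_grad=True)
    for _ in range(steps):
        obj = log_p1(x1) + log_p2(mixture - x1)  # joint log-likelihood of the two sources
        grad, = torch.autograd.grad(obj.sum(), x1)
        with torch.no_grad():  # Langevin step: gradient ascent plus injected noise
            x1 += step_size * grad + (2 * step_size) ** 0.5 * torch.randn_like(x1)
    return x1.detach(), (mixture - x1).detach()
```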
@jwthickstun
John Thickstun
1 year
@AwokeKnowing I love this thought. The idea of anticipation is a general modeling principle that can definitely be applied outside the music domain! Happy to follow up with anyone interested in adapting the idea & code to other domains.
0
0
3
@jwthickstun
John Thickstun
10 months
@damekdavis I did a course project to learn Lean way back in 2017, where I set myself the goal to formalize the fundamental theorem of arithmetic. I think the Lean code itself may be out of date, but my writeup might still be interesting:
1
0
2
@jwthickstun
John Thickstun
10 months
Why do low-entropy predictions cause numerical instability? Floats have more precision near zero! The problem is that low-entropy predictions derive from large logits: large positive logits on the predicted value and large negative logits on other values.
1
0
3
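A small probe of the quantization at play (a sketch assuming PyTorch): bfloat16 keeps far fewer significant bits than fp32, so large-magnitude logits are rounded much more coarsely than logits near zero.

```python
import torch

# Compare how a small and a large logit survive the round trip to bfloat16.
for x in (-0.3457, -345.7):
    print(x, torch.tensor(x, dtype=torch.float32).to(torch.bfloat16).item())
```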
@jwthickstun
John Thickstun
1 year
@p_cherkashin @samim It is 1-to-many, not 1-to-1: for any given melody, the model can generate many possible accompaniments. This is a bit like translation: in fact, the anticipation idea that we use to train these models is a generalization of the popular Seq2Seq approach to MT (details in our paper).
1
0
3
@jwthickstun
John Thickstun
10 months
@chrisdonahuey It can happen. See the discussion of numerical instability in this Mistral blog post: There's even a flag in the Levanter codebase to support upcasting of attention logits:
2
0
3
@jwthickstun
John Thickstun
3 years
@jon_gillick @sedielem I've had a note to respond to this tweet for over a year: we got it working on audio! More effort than anticipated because autoregressive audio models (e.g. WaveNet) are *discrete*. But with some smoothing, things go through as expected.
@jwthickstun
John Thickstun
3 years
This new paper extends last year's work on Source Separation with Deep Generative Priors. We got it working on audio! Here's a reminder of last year's results for images:
0
4
30
0
0
3
@jwthickstun
John Thickstun
2 years
@jesseengel @ravencheuk @kastnerkyle @mittu1204 Yes! In both settings you learn a strong generative prior over notes, which can resolve uncertainty/ambiguity in the acoustics. Generative is absolutely the way forward vs. older models that make a conditional independence assumption over notes conditioned on spectrograms.
1
0
3
@jwthickstun
John Thickstun
2 years
@maxrmorrison @keunwoochoi @ethanmanilow @noahaschaffer @pseetharaman It can be useful to construct quantitative metrics (however imperfect) that help the community quickly iterate towards better results. Human evaluation may still be necessary to validate that results are indeed better and not just an adversarial attack on the metrics.
1
0
3
@jwthickstun
John Thickstun
1 year
@ekernf01 I developed a new appreciation for permutation tests during a recent project on watermarking. Sharing because I think you might like the work! A cool problem in the LLM space where there is leverage using classical stats.
1
0
3
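For readers unfamiliar with the method, a generic permutation test looks like the sketch below (illustrative only; the `statistic` callable and key-permutation null are assumptions, not the watermarking paper's detector or test statistic):

```python
import numpy as np

def permutation_pvalue(statistic, observed_tokens, key, n_permutations=999, seed=0):
    """Generic permutation test: compare the observed detection statistic against
    a null distribution obtained by recomputing the statistic with shuffled keys."""
    rng = np.random.default_rng(seed)
    observed = statistic(observed_tokens, key)
    null = [statistic(observed_tokens, rng.permutation(key)) for _ in range(n_permutations)]
    # p-value: fraction of permuted statistics at least as extreme as the observed one
    return (1 + sum(s >= observed for s in null)) / (1 + n_permutations)
```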
@jwthickstun
John Thickstun
3 years
@roydanroy You're probably looking for a classifier. But if you're interested in a generative model: here is a minimalist PyTorch ResNet W-GAN for MNIST/CIFAR-10, in the form of a homework assignment designed to be understood and completed by students.
0
0
3
@jwthickstun
John Thickstun
2 years
@TaliaRinger Often we do both at the same time! Vector quantization to turn continuous data into discrete tokens, which we then immediately feed into a Transformer that embeds those tokens right back into a continuous space.
1
0
3
@jwthickstun
John Thickstun
4 years
Fantastic open-source implementation that reproduces results of the recent #DiffWave paper!
@snrrrub
Sharvil Nanavati
4 years
There's a new pretrained #DiffWave model up on that's trained to 1M steps. It sounds clearer than the previous pretrained model - listen to the audio samples at .
0
0
3
0
0
2
@jwthickstun
John Thickstun
9 months
I used to be worried that the computational resources needed for modern work on audio generation were out of reach for academic computing labs. The model that produced the sample above was trained for 52 hours on a TPU v4-64. Substantial, but not totally out of reach!
0
0
2
@jwthickstun
John Thickstun
1 year
Thanks to the entire HALIE team for creating this trove of data. And thanks to @StanfordCRFM, @StanfordNLP for facilitating and supporting this work.
0
0
2
@jwthickstun
John Thickstun
1 year
@p_cherkashin @samim The model is also capable of many more infilling applications than just generating an accompaniment to a melody: see Figure 1 of the blog post for some different infilling patterns, and Example 2 for some auditory examples of span infilling.
0
0
2
@jwthickstun
John Thickstun
2 years
@chrisdonahuey More optimistically: by analogy, computers today outperform humans at the game of chess in every conceivable way. But rather than undermine the incentive to learn chess skills, people use these superhuman machines as a coach and fast feedback loop to improve their own skills.
1
0
2
@jwthickstun
John Thickstun
2 years
@chrisdonahuey Learning to create music today requires years of time investment and often an (expensive) coach. I expect that lowering the barrier to entry for creating music is more likely to grow the music community than depress it.
1
0
2
@jwthickstun
John Thickstun
2 months
@srush_nlp Does CLIP count as a non-language foundation model (there is a language component)? E.g., Figure 13 shows a step-change in robust image classification vs. models trained on ImageNet.
1
0
2
@jwthickstun
John Thickstun
1 year
@LucaAmb @sedielem @vlastelicap Ishaan has run experiments on one-hots (e.g., in the Diffusion-LM paper last year: Appendix F). I'm not sure whether a direct ablation of one-hots vs. embeddings has made it into any published papers. Maybe @__ishaan can comment.
1
0
2
@jwthickstun
John Thickstun
2 years
@littmath Sounds like a moral truth! Are you familiar with @DrEugeniaCheng 's essay on this topic?
1
0
2
@jwthickstun
John Thickstun
2 years
@karpathy @sedielem @tim_zaman A great thing about this question is that it gives signal when interviewing both theorists and practitioners!
0
0
2
@jwthickstun
John Thickstun
4 years
@BhattGantavya There's an excellent book by Peyré and Cuturi.
1
0
2
@jwthickstun
John Thickstun
1 year
@kareem_carr Yes! And these plans also tend to ignore or downplay the risks of concentrating the power of this technology in these companies.
0
0
2
@jwthickstun
John Thickstun
10 months
@iansimon @chrisdonahuey We use logsumexp! But suppose your model confidently predicts the wrong value: then your loss gradient is estimated based on a highly-quantized negative logit for the observed value.
0
0
2
@jwthickstun
John Thickstun
1 year
For comparison, here is the original MIDI arrangement.
0
0
2
@jwthickstun
John Thickstun
2 years
Kin Wai Cheuk is doing great machine learning work in the music and audio domains. Definitely one to follow!
@ravencheuk
Kin Wai Cheuk
2 years
@yoyololicon Congrats! Your account is growing much faster than mine. Mine is still under 100. I guess I need to be a bit more active in twitter.
0
0
1
0
0
2
@jwthickstun
John Thickstun
7 months
@dvruette I considered using "beats" as the unit of time rather than "seconds" when I started building these models. The reason I chose seconds is (1) beat/tempo metadata can be unreliable and (2) it's easier to include transcribed data (transcription models don't emit this metadata).
1
0
2
@jwthickstun
John Thickstun
2 years
@astonzhangAZ @RishiBommasani @XiangLisaLi2 To generalize this a bit: I think that constructing "good" smooth representations of language is critical for a variety of modeling tasks (of which control is only one example). Discrete autoregressive LLMs completely punt on this question.
0
1
2
@jwthickstun
John Thickstun
3 years
@TaliaRinger There are some applications of homotopy methods in ML, introduced via the optimization literature. It's pretty far afield from HoTT but here's a reference in case you find the connection interesting.
1
0
2
@jwthickstun
John Thickstun
7 months
@dvruette Instrument conditioning is a highly-requested feature! I've prototyped this myself; definitely achievable with fine-tuning. The challenge to do this well is designing a rule to apply instrument labels to training data: global vs. local labels, thresholding, soft-labels, etc?
2
0
1
@jwthickstun
John Thickstun
7 months
@Clement_MF_ Thanks @Clement_MF_ ! Several people are working to embed these models in music production systems, but the work is surprisingly open-ended! Let me know if you'd like to set up a chat: I could elaborate on ongoing projects & we could see if there is a complementary angle.
1
0
0
@jwthickstun
John Thickstun
3 years
@Branchini_Nic @sp_monte_carlo Could you elaborate? I've used the LB exposition to teach the VAE objective with the explicit aim of likelihood maximization (e.g., equation 3 in the linked notes). Is there a more aesthetic way to say this that eschews the LB language?
0
0
1
@jwthickstun
John Thickstun
10 months
@deepcohen @alex_damian_ I've also observed this. I'd be very interested to know whether this behavior holds up across different architectures (e.g. Transformer vs. RWKV) and not just different model scales. Does anybody know?
0
0
1
@jwthickstun
John Thickstun
1 year
@FarFromSubtle This should be fixed now. Sorry about that!
1
0
1
@jwthickstun
John Thickstun
2 years
@ravencheuk Interesting! We also found this helpful for parameterizing diffusion models of text.
1
0
1
@jwthickstun
John Thickstun
1 year
@ekernf01 It's not as bad as it looks: modern CPUs crunch for loops for breakfast. And a nice side-effect of being agnostic is that you don't pay the price of LM queries in the detector. I wrote a detector that runs reasonably fast in client-side JavaScript here:
1
0
1
@jwthickstun
John Thickstun
7 months
@dvruette > AFAIK the anticipatory MT does not have a concept of bars? Right: these models measure time in units of seconds and don't model any beat/bar annotations. Beat/bar/meter annotations are another feature that seem like they could be effectively included in a fine-tuning phase.
1
0
1