Make a 909 kick on the Make Noise 0-Coast, and more drum modeling fun

Forget even all the music that was made with it for a second. How the sound of a TR-909 kick was made can open new doors.

The post Make a 909 kick on the Make Noise 0-Coast, and more drum modeling fun appeared first on CDM Create Digital Music.

AI upscaling makes this Lumiere Bros film look new – and you can use the same technique

A.I.! Good gawd y’all – what is it good for? Absolutely … upscaling, actually. Some of machine learning’s powers may prove to be simple but transformative.

And in fact, this “enhance” feature we always imagined from sci-fi becomes real. Just watch as a pioneering Lumiere Brothers film is transformed so it seems like something shot with money from the Polish government and screened at a big arty film festival, not 1896. It’s spooky.

It’s the work of Denis Shiryaev. (If you speak Russian, you can also follow his Telegram channel.) Here’s the original source, which isn’t necessarily even a perfect archive:

It’s easy to see the possibilities here – this is a dream both for archivists and people wanting to economically and creatively push the boundaries of high-framerate and slow-motion footage. What’s remarkable is that there’s a workflow here you might use on your own computer.

And while there are legitimate fears of AI in black boxes controlled by states and large corporations, here the results are either open source or available commercially. There are two tools here.

Enlarging photos and videos is the work of a commercial tool, which promises 600% scaling improvements “while preserving image quality.”

It’s US$99.99, which seems well worth it for the quality payoff. (More for commercial licenses. There’s also a free trial available.) Uniquely, that tool also is optimized for Intel Core processors with Iris Plus, so you don’t need to fire up a specific GPU like the NVIDIA. They don’t say a lot about how it works, other than it’s a deep learning neural network.

We can guess, though. The trick is that machine learning trains on existing data of high-res images to allow mathematical prediction on lower-resolution images. There’s been copious documentation of AI-powered upscaling, and why it works mathematically better than traditional interpolation algorithms. (This video is an example.) Many of those used GANs (generative adverserial networks), though, and I think it’s a safe bet that Gigapixel is closer to this (also slightly implied by the language Gigapixel uses):

Deep learning based super resolution, without using a GAN [Towards data science]

Some more expert data scientists may be able to fill in details, but at least that article would get you started if you’re curious to roll your own solution for a custom solution. (Unless you’re handy with Intel optimization, it’s worth the hundred bucks, but for those of you who are advanced coders and data scientists, knock yourself out.)

The quality of motion may be just as important, and that side of this example is free. To increase the framerate, they employ a technique developed by an academic-private partnership (Google, University of California Merced, and Shanghai’s Jiao Tong University):

Depth-Aware Video Frame Interpolation

Short version – you combine some good old-fashioned optical flow prediction together with convolutional neural networks, and then use a depth map so that big objects moving through the frame don’t totally screw up the processing.

Result – freakin’ awesome slow mo go karts, that’s what! Go, math!

This also illustrates that automation isn’t necessarily the enemy. Remember watching huge lists of low-wage animators scroll past at the end of movies? That might well be something you want to automate (in-betweening) in favor of more-skilled design. Watch this:

A lot of the public misperception of AI is that it will make the animated movie, because technology is “always getting better” (which rather confuses Moore’s Law and the human brain – not related). It may be more accurate to say that these processes will excel at pushing the boundaries of some of our tech (like CCD sensors, which eventually run into the laws of physics). And they may well automate processes that were rote work to begin with, like in-betweening frames of animation, which is a tedious task that was already getting pushed to cheap labor markets.

I don’t want to wade into that, necessarily – animation isn’t my field, let alone labor practices. But suffice to say even a quick Google search will quickly come up with stories like this article on Filipino animators and low wages and poor conditions. Of course, the bad news is, just as those workers collectivize, AI could automate their job away entirely. But it might also mean a Filipino animation company would face a level playing field using this software with the companies that once hired them, only now with the ability to do actual creative work.

Anyway, that’s only animation; you can’t outsource your crappy video and photos, so it’s a moot point there.

Another common misconception – perhaps one even shared by some sloppy programmers – is that processes improve the more computational resources you throw at them. That’s not necessarily the case – objectively even not always the case. In any event, the fact that these work now, and in ways that are pleasing to the eye, means you don’t have to mess with ill-informed hypothetical futures.

I spotted this on the VJ Union Facebook group, where Sean Caruso suggests this workflow: since you can only use Topaz on sequences of images, you can import into After Effects and go on and use Twixtor Pro to double framerate, too. Of course, coders and people handy with tools like ffmpeg won’t need the Adobe subscription. (ffmpeg, not so much? There’s a CDM story for that, with some useful comment thread, too.)

Having blabbered on like this, I’m sure someone can now say something more intelligent or something I’ve missed – which I would welcome, fire away!

Now if you’ll excuse me, I want to escape to that 1896 train platform again. Ahhhh…

The post AI upscaling makes this Lumiere Bros film look new – and you can use the same technique appeared first on CDM Create Digital Music.

In Adversarial Feelings, Lorem explores AI’s emotional undercurrents

In glitching collisions of faces, percussive bolts of lightning, Lorem has ripped open machine learning’s generative powers in a new audiovisual work. Here’s the artist on what he’s doing, as he’s about to join a new inquisitive club series in Berlin.

Machine learning that derives gestures from System Exclusive MIDI data … surprising spectacles of unnatural adversarial neural nets … Lorem’s latest AV work has it all.

And by pairing producer Francesco D’Abbraccio with a team of creators across media, it brings together a serious think tank of artist-engineers pushing machine learning and neural nets to new places. The project, as he describes it:

Lorem is a music-driven mutidisciplinary project working with neural networks and AI systems to produce sounds, visuals and texts. In the last three years I had the opportunity to collaborate with AI artists (Mario Klingemann, Yuma Kishi), AI researchers (Damien Henry, Nicola Cattabiani), Videoartists (Karol Sudolski, Mirek Hardiker) and music intruments designers (Luca Pagan, Paolo Ferrari) to produce original materials.

Adversarial Feelings is the first release by Lorem, and it’s a 22 min AV piece + 9 music tracks and a book. The record will be released on APR 19th on Krisis via Cargo Music.

And what about achieving intimacy with nets? He explains:

Neural Networks are nowadays widely used to detect, classify and reconstruct emotions, mainly in order to map users behaviours and to affect them in effective ways. But what happens when we use Machine Learning to perform human feelings? And what if we use it to produce autonomous behaviours, rather then to affect consumers? Adversarial Feelings is an attempt to inform non-human intelligence with “emotional data sets”, in order to build an “algorithmic intimacy” through those intelligent devices. The goal is to observe subjective/affective dimension of intimacy from the outside, to speak about human emotions as perceived by non-human eyes. Transposing them into a new shape helps Lorem to embrace a new perspective, and to recognise fractured experiences.

I spoke with Francesco as he made the plane trip toward Berlin. Friday night, he joins a new series called KEYS, which injects new inquiry into the club space – AV performance, talks, all mixed up with nightlife. It’s the sort of thing you get in festivals, but in festivals all those ideas have been packaged and finished. KEYS, at a new post-industrial space called Trauma Bar near Hauptbahnhof, is a laboratory. And, of course, I like laboratories. So I was pleased to hear what mad science was generating all of this – the team of humans and machines alike.

So I understand the ‘AI’ theme – am I correct in understanding that the focus to derive this emotional meaning was on text? Did it figure into the work in any other ways, too?

Neural Networks and AI were involved in almost every step of the project. On the musical side, they were used mainly to generate MIDI patterns, to deal with SysEx from a digital sampler and to manage recursive re-sampling and intelligent timestretch. Rather then generating the final audio, the goal here was to simulate musician’s behaviors and his creative processes.

On the video side, [neural networks] (especially GANs [generative adverserial networks]) were employed both to generate images and to explore the latent spaces through custom tailored algorithms, in order to let the system edit the video autonomously, according with the audio source.

What data were you training on for the musical patterns?

MIDI – basically I trained the NN on patterns I create.

And wait, SysEx, what? What were you doing with that?

Basically I record every change of state of a sampler (i.e. the automations on a knob), and I ask the machine to “play” the same patch of the sampler according to what it learned from my behavior.

What led you to getting involved in this area? And was there some education involved just given the technical complexity of machine learning, for instance?

I always tried to express my work through multidisciplinary projects. I am very fascinated by the way AI approaches data, allowing us to work across different media with the same perspective. Intelligent devices are really a great tool to melt languages. On the other hand, AI emergency discloses political questions we try to face since some years at Krisis Publishing.
I started working through the Lorem project three years ago, and I was really a newbie on the technical side. I am not a hyper-skilled programmer, and building a collaborative platform has been really important to Lorem’s development. I had the chance to collaborate with AI artists (Klingemann, Kishi), researchers (Henry, Cattabiani, Ferrari), digital artists (Sudolski, Hardiker)…

How did the collaborations work – Mario I’ve known for a while; how did you work with such a diverse team; who did what? What kind of feedback did you get from them?

To be honest, I was very surprised about how open and responsive is the AI community! Some of the people involved are really huge points of reference for me (like Mario, for instance), and I didn’t expect to really get them on Adversarial Feelings. Some of the people involved prepared original contents for the release (Mario, for instance, realised a video on “The Sky would Clear What the …”, Yuma Kishi realized the girl/flower on “Sonnet#002” and Damien Henry did the train hallucination on “Shonx – Canton” remix. With other people involved, the collaboration was more based on producing something together, such a video, a piece of code or a way to explore Latent Spaces.

What was the role of instrument builders – what are we hearing in the sound, then?

Some of the artists and researchers involved realized some videos from the audio tracks (Mario Klingemann, Yuma Kishi). Damien Henry gave me the right to use a video he made with his Next Frame Prediction model. Karol Sudolski and Nicola Cattabiani worked with me in developing respectively “Are Eyes invisible Socket Contenders” + “Natural Readers” and “3402 Selves”. Karol Sudolski also realized the video part on “Trying to Speak”. Nicola Cattabiani developed the ELERP algorithm with me (to let the network edit videos according with the music) and GRUMIDI (the network working with my midi files). Mirek Hardiker built the data set for the third chapter of the book.

I wonder what it means for you to make this an immersive performance. What’s the experience you want for that audience; how does that fit into your theme?

I would say Adversarial Feelings is a AV show totally based on emotions. I always try to prepare the most intense, emotional and direct experience I can.

You talk about the emotional content here and its role in the machine learning. How are you relating emotionally to that content; what’s your feeling as you’re performing this? And did the algorithmic material produce a different emotional investment or connection for you?

It’s a bit like when I was a kid and I was listening at my recorded voice… it was always strange: I wasn’t fully able to recognize my voice as it sounded from the outside. I think neural networks can be an interesting tool to observe our own subjectivity from external, non-human eyes.

The AI hook is of course really visible at the moment. How do you relate to other artists who have done high-profile material in this area recently (Herndon/Dryhurst, Actress, etc.)? And do you feel there’s a growing scene here – is this a medium that has a chance to flourish, or will the electronic arts world just move on to the next buzzword in a year before people get the chance to flesh out more ideas?

I messaged a couple of times Holly Herndon online… I’m really into her work since her early releases, and when I heard she was working on AI systems I was trying to finish Adversarial Feelings videos… so I was so curious to discover her way to deal with intelligent systems! She’s a really talented artist, and I love the way she’s able to embed conceptual/political frameworks inside her music. Proto is a really complex, inspiring device.

More in general, I think the advent of a new technology always discloses new possibilities in artistic practices. I directly experienced the impact of internet (and of digital culture) on art, design and music when I was a kid. I’m thrilled by the fact at this point new configurations are not yet codified in established languages, and I feel working on AI today give me the possibility to be part of a public debate about how to set new standards for the discipline.

What can we expect to see / hear today in Berlin? Is it meaningful to get to do this in this context in KEYS / Trauma Bar?

I am curious too, to be honest. I am very excited to take part of such situation, beside artists and researchers I really respect and enjoy. I think the guys at KEYS are trying to do something beautiful and challenging.

Live in Berlin, 7 June

Lorem will join Lexachast (an ongoing collaborative work by Amnesia Scanner, Bill Kouligas and Harm van den Dorpel), N1L (an A/V artist, producer/dj based between Riga, Berlin, and Cairo), and a series of other tantalizing performances and lectures at Trauma Bar.

KEYS: Artificial Intelligence | Lexachast • Lorem • N1L & more [Facebook event]

Lorem project lives here:

The post In Adversarial Feelings, Lorem explores AI’s emotional undercurrents appeared first on CDM Create Digital Music.

Now ‘AI’ takes on writing death metal, country music hits, more

Machine learning is synthesizing death metal. It might make your death metal radio DJ nervous – but it could also mean music software works with timbre and time in new ways. That news – plus some comical abuse of neural networks for writing genre-specific lyrics in genres like country – next.

Okay, first, whether this makes you urgently want to hear machine learning death metal or it drives you into a rage, either way you’ll want the death metal stream. And yes, it’s a totally live stream – you know, generative style. Tune in, bot out:

Okay, first it’s important to say, the whole point of this is, you need data sets to train on. That is, machines aren’t composing music, so much as creatively regurgitating existing samples based on fairly clever predictive mathematical models. In the case of the death metal example, this is SampleRNN – a recurrent neural network that uses sample material, repurposed from its original intended application working with speak. (Check the original project, though it’s been forked for the results here.)

This is a big, big point, actually – if this sounds a lot like existing music, it’s partly because it is actually sampling that content. The particular death metal example is nice in that the creators have published an academic article. But they’re open about saying they actually intend “overfitting” – that is, little bits of samples are actually playing back. Machines aren’t learning to generate this content from scratch; they’re actually piecing together those samples in interesting ways.

That’s relevant on two levels. One, because once you understand that’s what’s happening, you’ll recognize that machines aren’t magically replacing humans. (This works well for death metal partly because to non connoisseurs of the genre, the way angry guitar riffs and undecipherable shouting are plugged together already sounds quite random.)

But two, the fact that sample content is being re-stitched in time like this means this could suggest a very different kind of future sampler. Instead of playing the same 3-second audio on repeat or loop, for instance, you might pour hours or days of singing bowls into your sampler and then adjust dials that recreated those sounds in more organic ways. It might make for new instruments and production software.

Here’s what the creators say:

Thus, we want the out-put to overfit short timescale patterns (timbres, instruments, singers, percussion) and underfit long timescale patterns(rhythms, riffs, sections, transitions, compositions) so that it sounds like a recording of the original musicians playing new musical compositions in their style.

Sure enough, you can go check their code:

Or read the full article:

Generating Albums with SampleRNN to Imitate Metal, Rock, and Punk Bands

The reason I’m belaboring this is simple. Big corporations like Spotify might use this sort of research to develop, well, crappy mediocre channels of background music that make vaguely coherent workout soundtracks or faux Brian Eno or something that sounded like Erik Satie got caught in an opium den and re-composed his piano repertoire in a half daze. And that would, well, sort of suck.

Alternatively, though, you could make something like a sampler or DAW more human and less conventionally predictable. You know, instead of applying a sample slice to a pad and then having the same snippet repeat every eighth note. (Guilty as charged, your honor.)

It should also be understood that, perversely, this may all be raising the value of music rather than lowering it. Given the amount of recorded music currently available, and given that it can already often be licensed or played for mere cents, the machine learning re-generation of these same genres actually requires more machine computation and more human intervention – because of the amount of human work required to even select datasets and set parameters and choose results.

DADABOTS, for their part, have made an entire channel of this stuff. The funny thing is, even when they’re training on The Beatles, what you get sounds like … well, some of the sort of experimental sound you might expect on your low-power college radio station. You know, in a good way – weird, digital drones, of exactly the sort we enjoy. I think there’s a layperson impression that these processes will magically improve. That may misunderstand the nature of the mathematics involved – on the contrary, it may be that these sorts of predictive models always produce these sorts of aesthetic results. (The same team use Markov Chains to generate track names for their Bandcamp label. Markov Chains work as well as they did a century ago; they didn’t just start working better.)

I enjoy listening to The Beatles as though an alien civilization has had to digitally reconstruct their oeuvre from some fallout-shrouded, nuclear-singed remains of the number-one hits box set post apocalypse. (“Help! I need somebody! Help! The human race is dead!” You know, like that.)

As it moves to black metal and death metal, their Bandcamp labels progresses in surreal coherence:

This album gets especially interesting, as you get weird rhythmic patterns in the samples. And there’s nothing saying this couldn’t in turn inspire new human efforts. (I once met Stewart Copeland, who talked about how surreal it was hearing human drummers learn to play the rhythms, unplugged, that he could only achieve with The Police using delay pedals.)

I’m really digging this one:

So, digital sample RNN processes mostly generate angry and angular experimental sounds – in a good way. That’s certainly true now, and could be true in the future.

What’s up in other genres?

SONGULARITY is making a pop album. They’re focusing on lyrics (and a very funny faux generated Coachella poster). In this case, though, the work is constrained to text – far easier to produce convincingly than sound. Even a Markov Chain can give you interesting or amusing results; with machine learning applied character-by-character to text, what you get is a hilarious sort of futuristic Mad Libs. (It’s also clear humans are cherry-picking the best results, so these are really humans working with the algorithms much as you might use chance operations in music or poetry.)

Whether this says anything about the future of machines, though, the dadaist results are actually funny parody.

And that gives us results like You Can’t Take My Door:

Barbed whiskey good and whiskey straight.

These projects work because lyrics are already slightly surreal and nonsensical. Machines chart directly into the uncanny valley instead of away from it, creating the element of surprise and exaggerated un-realness that is fundamental to why we laugh at a lot of humor in the first place.

This also produced this Morrissey “Bored With This Desire To Get Ripped” – thanks to the ingenious idea of training the dataset not just with Morrissey lyrics, but also Amazon customer reviews of the P90X home workout DVD system. (Like I said – human genius wins, every time.)

Or there’s Dylan mixed with negative Yelp reviews from Manhattan:

And maybe in this limited sense, the machines are telling us something about how we learn. Part of the poetic flow is about drawing on all our wetware neural connections between everything we’ve heard before – as in the half-awake state of creative vibrations. That is, we follow our own predictive logic without doing the usual censoring that keeps our language rational. Thinking this way, it’s not that we would use machine learning to replace the lyricist. Rather, just as with chance operations in the past, we can use this surreal nonsense to free ourselves from the constraints that normal behavior require.

We shouldn’t underestimate, though, human intervention in using these lyrics. The neural nets are good at stringing together short bits of words, but the normal act of composition – deciding the larger scale structure, choosing funnier bits over weaker ones, recognizing patterns – remain human.

Recurrent neural networks probably won’t be playing Coachella any time soon, but if you need a band name, they’re your go-to. More funny text mangling from the Botnik crew.

My guess is, once the hype dies down, these particular approaches will wind up joining the pantheon of drunken walks and Markov Chains and fractals and other psuedo-random or generative algorithmic techniques. I sincerely hope that we don’t wait for that to happen, but use the hype to seize the opportunity to better educate ourselves about the math underneath (or collaborate with mathematicians), and see these more hardware-intensive processes in the context of some of these older ideas.

If you want to know why there’s so much hype and popular interest, though, the human brain may itself hold the answer. We are all of us hard-wired to delight in patterns, which means arguably there’s nothing more human than being endlessly entertained by what these algorithms produce.

But you know, I’m a marathon runner in my sorry way.

The post Now ‘AI’ takes on writing death metal, country music hits, more appeared first on CDM Create Digital Music.

Exploring machine learning for music, live: Gamma_LAB AI

AI in music is as big a buzzword as in other fields. So now’s the time to put it to the test – to reconnect to history, human practice, and context, and see what holds up. That’s the goal of the Gamma_LAB AI in St. Petersburg next month. An open call is running now.

Machine learning for AI has trended so fast that there are disconnects between genres and specializations. Mathematicians or coders may get going on ideas without checking whether they work with musicians or composers or musicologists – and the other way around.

I’m excited to join as one of the hosts with Gamma_LAB AI partly because it brings together all those possible disciplines, puts international participants in an intensive laboratory, and then shares the results in one of the summer’s biggest festivals for new electronic music and media. We’ll make some of those connections because those people will finally be together in one room, and eventually on one live stage. That investigation can be critical, skeptical, and can diverge from clichéd techniques – the environment is wide open and packed with skills from an array of disciplines.

Natalia Fuchs, co-producer of GAMMA Festival, founder of ARTYPICAL and media art historian, is curating Gamma_LAB AI. The lab will run in May in St. Petersburg, with an open call due this Monday April 8 (hurry!), and then there will be a full AI-stage as a part of Gamma Festival.

Image: Helena Nikonole.

Invited participants will delve into three genres – baroque, jazz, and techno. The idea is not just a bunch of mangled generative compositions, but a broad look at how machine learning could analyze deep international archives of material in these fields, and how the work might be used creatively as an instrument or improviser. We expect participants with backgrounds in musicianship and composition as well as in coding, mathematics, and engineering, and people in between, also researchers and theorists.

To guide that work, we’re working to setup collaboration and confrontation between historical approaches and today’s bleeding-edge computational work. Media artist Helena Nikonole became conceptual artist of the Lab. She will bring her interests in connecting AI with new aesthetics and media, as she has exhibited from ZKM to CTM to Garage Museum of Contemporary Art. Dr. Konstantin Yakovlev joins as one of Russia’s leading mathematicians and computer scientists working at the forefront of AI, machine learning, and smart robotics – meaning we’re guaranteed some of the top technical talent. (Warning: crash course likely.)

Russia has an extraordinarily rich culture of artistic and engineering exploration, in AI as elsewhere. Some of that work was seen recently at Berlin’s CTM Festival exhibition. Helena for her part has created work that, among others, applies machine learning to unraveling the structure of birdsong (with a bird-human translator perhaps on the horizon), and hacked into Internet-connected CCTV cameras and voice synthesis to meld machine learning-generated sacred texts with … well, some guys trapped in an elevator. See below:

Bird Language

deus X mchn

I’m humbled to get to work with them and in one of the world’s great musical cities, because I hope we also get to see how these new models relate to older ones, and where gaps lie in music theory and computation. (We’re including some musicians/composers with serious background in these fields, and some rich archives that haven’t been approached like this ever before.)

I came from a musicology background, so I see in so-called “AI” a chance to take musicology and theory closer to the music, not further away. Google recently presented a Bach ”doodle” – more on that soon, in fact – with the goal of replicating some details of Bach’s composition. To those of us with a music theory background, some of the challenges of doing that are familiar: analyzing music is different from composing it, even for the human mind. To me, part of why it’s important to attempt working in this field is that there’s a lot to learn from mistakes and failures.

It’s not so much that you’re making a robo-Bach – any more than your baroque theory class will turn all the students into honorary members of the extended Bach family. (Send your CV to your local Lutheran church!) It’s a chance to find new possibilities in this history we might not have seen before. And it lets us test (and break) our ideas about how music works with larger sets of data – say, all of Bach’s cantatas at once, or a set of jazz transcriptions, or a library full of nothing but different kick drums, if you like. This isn’t so much about testing “AI,” whatever you want that to mean – it’s a way to push our human understanding to its limits.

Oh yes, and we’ll definitely be pushing our own human limits – in a fun way, I’m sure.

A small group of participants will be involved in the heart of St. Petersburg from May 11-22, with time to investigate and collaborate, plus inputs (including at the massive Planetarium No. 1).

But where this gets really interesting – and do expect to follow along here on CDM – is that we will wind up in July with an AI mainstage at the globally celebrated Gamma Festival. Artist participants will create their own AI-inspired audiovisual performances and improvisations, acoustic and electronic hybrids, and new live scenarios. The finalists will be invited to the festival and fully covered in terms of expenses.

So just as I’ve gotten to do with partners at CTM Festival (and recently with southeast Asia’s Nusasonic), we’re making the ultimate laboratory experiment in front of a live audience. Research, make, rave, repeat.

The open call deadline is fast approaching if you think you might want to participate.

Facebook event

To apply:
Participation at GAMMA_LAB AI is free for the selected candidates. Send a letter of intent and portfolio to by end of day April 8, 2019. Participants have to bring personal computers of sufficient capacity to work on their projects during the Laboratory. Transportation and living expenses during the Laboratory are paid by the participants themselves. The organizers provide visa support, as well as the travel of the best Lab participants to GAMMA festival in July.

The post Exploring machine learning for music, live: Gamma_LAB AI appeared first on CDM Create Digital Music.

Azure Kinect promises new motion, tracking for art

Gamers’ interest may come and go, but artists are always exploring the potential of computer vision for expression. Microsoft this month has resurrected the Kinect, albeit in pricey, limited form. Let’s fit it to the family tree.

Time flies: musicians and electronic artists have now had access to readily available computer vision since the turn of this century. That initially looked like webcams, paired with libraries like the free OpenCV (still a viable option), and later repurposed gaming devices from Sony and Microsoft platforms.

And then came Kinect. Kinect was a darling of live visual projects and art installations, because of its relatively sophisticated skeletal tracking and various artist-friendly developer tools.

History time

A full ten years ago, I was writing about the Microsoft project and interactions, in its first iteration as the pre-release Project Natal. Xbox 360 support followed in 2010, Windows support in 2012 – while digital artists quickly hacked in Mac (and rudimentary Linux) support. An artists in music an digital media quickly followed.

For those of you just joining us, Kinect shines infrared light at a scene, and takes an infrared image (so it can work irrespective of other lighting) which it converts into a 3D depth map of the scene. From that depth image, Microsoft’s software can also track the skeleton image of one or two people, which lets you respond to the movement of bodies. Microsoft and partner PrimeSense weren’t the only to try this scheme, but they were the ones to ship the most units and attract the most developers.

We’re now on the third major revision of the camera hardware.

2010: Original Kinect for Xbox 360. The original. Proprietary connector with breakout to USB and power. These devices are far more common, as they were cheaper and shipped more widely. Despite the name, they do work with both open drivers for respective desktop systems.

2012: Kinect for Windows. Looks and works almost identically to Kinect for 360, with some minor differences (near mode).

Raw use of depth maps and the like for the above yielded countless music videos, and the skeletal tracking even more numerous and typically awkward “wave your hands around to play the music” examples.

Here’s me with a quick demo for the TED organization, preceded by some discussion of why I think gesture matter. It’s… slightly embarrassing, only in that it was produced on an extremely tight schedule, and I think the creative exploration of what I was saying about gesture just wasn’t ready yet. (Not only had I not quite caught up, but camera tech like what Microsoft is shipping this year is far better suited to the task than the original Kinect camera was.) But the points I’m making here have some fresh meaning for me now.

2013: Kinect for Xbox One. Here’s where things got more interesting – because of a major hardware upgrade, these cameras are far more effective at tracking and yield greater performance.

  • Active IR tracking in the dark
  • Wider field of vision
  • 6 skeletons (people) instead of two
  • More tracking features, with additional joints and creepier features like heart rate and facial expression
  • 1080p color camera
  • Faster performance/throughput (which was key to more expressive results)

Kinect One, the second camera (confusing!), definitely allowed more expressive applications. One high point for me was the simple but utterly effective work of Chris Milk and team, “The Treachery of Sanctuary.”

And then it ended. Microsoft unbundled the camera from Xbox One, meaning developers couldn’t count on gamers owning the hardware, and quietly discontinued the last camera at the end of October 2017.

Everything old is new again

I have mixed feelings – as I’m sure you do – about these cameras, even with the later results on Kinect One. For gaming, the devices were abandoned – by gamers, by developers, and by Microsoft as the company ditched the Xbox strategy. (Parallel work at Sony didn’t fare much better.)

It’s hard to keep up with consumer expectations. By implying “computer vision,” any such technology has to compete with your own brain – and your own brain is really, really, really good. “Sensors” and “computation” are all merged in organic harmony, allowing you to rapidly detect the tiniest nuance. You can read a poker player’s tell in an instant, while Kinect will lose the ability to recognize that your leg is attached to your body. Microsoft launched Project Natal talking about seeing a ball and kicking a ball, but… you can do that with a real ball, and you really can’t do that with a camera, so they quite literally got off on the wrong foot.

It’s not just gaming, either. On the art side, the very potential of these cameras to make the same demos over and over again – yet another magic mirror – might well be their downfall.

So why am I even bothering to write this?

Simple: the existing, state-of-the-art Kinect One camera is now available on the used market for well under a hundred bucks – for less than the cost of a mid-range computer mouse. Microsoft’s gaming business whims are your budget buy. The computers to process that data are faster and cheaper. And the software is more mature.

So while digital art has long been driven by novelty … who cares? Actual music and art making requires practice and maturity of both tools and artist. It takes time. So oddly while creative specialists were ahead of the curve on these sorts of devices, the same communities might well innovate in the lagging cycle of the same technology.

And oh yeah – the next generation looks very powerful.

Kinect: The Next Generation

Let’s get the bad news out of the way first: the new Kinect is both more expensive ($400) and less available (launching only in the US and China… in June). Ugh. And that continues Microsoft’s trend here of starting with general purpose hardware for mass audiences and working up to … wait, working up to increasingly expensive hardware for smaller and smaller groups of developers.

That is definitely backwards from how this is normally meant to work.

But the good news here is unexpected. Kinect was lost, and now is found.

The safe bet was that Microsoft would just abandon Kinect after the gaming failure. But to the company’s credit, they’ve pressed on, with some clear interest in letting developers, researchers, and artists decide what this thing is really for. Smart move: those folks often come up with inspiration that doesn’t fit the demands of the gaming industry.

So now Kinect is back, dubbed Azure Kinect – Microsoft is also hell-bent on turning Azure “cloud services” into a catch-all solution for all things, everywhere.

And the hardware looks … well, kind of amazing. It might be described as a first post-smartphone device. Say what? Well, now that smartphones have largely finalized their sensing capabilities, they’ve oddly left the arena open to other tech defining new areas.

For a really good write-up, you’ll want to read this great run-down:

All you need to know on Azure Kinect
[The Ghost Howls, a VR/tech blog, see also a detailed run-down of HoloLens 2 which also just came out]

Here are the highlights, though. Azure Kinect is the child of Kinect and HoloLens. It’s a VR-era sensor, but standalone – which is perfect for performance and art.

Fundamentally, the formula is the same – depth camera, conventional RGB camera, some microphones, additional sensors. But now you get more sensing capabilities and substantially beefed-up image processing.

1MP depth camera (not 640×480) – straight off of HoloLens 2, Microsoft’s augmented reality plaform
Two modes: wide and narrow field of view
4K RGB camera (with standard USB camera operation)
7 microphone array
Gyroscope + accelerometer

And it connects both by USB-C (which can also be used for power) or as a standalone camera with “cloud connection.” (You know, I’m pretty sure that means it has a wifi radio, but oddly all the tech reporters who talked to Microsoft bought the “cloud” buzzword and no one says outright. I’ll double-check.)

Also, now Microsoft supports both Windows and Linux. (Ubuntu 18.04 + OpenGL v 4.4).

Downers: 30 fps operation, limited range.

Somehing something, hospitals or assembly lines Azure services, something that looks like an IBM / Cisco ad:

That in itself is interesting. Artists using the same thing as gamers sort of … didn’t work well. But artists using the same tool as an assembly line is something new.

And here’s the best part for live performance and interaction design – you can freely combine as many cameras as you want, and sync them without any weird tricks.

All in all, this looks like it might be the best networked camera, full stop, let alone best for tracking, depth sensing, and other applications. And Microsoft are planning special SDKs for the sensor, body tracking, vision, and speech.

Also, the fact that it doesn’t plug into an Xbox is a feature, not a bug to me – it means Microsoft are finally focusing on the more innovative, experimental uses of these cameras.

So don’t write off Kinect now. In fact, with Kinect One so cheap, it might be worth picking one up and trying Microsoft’s own SDK just for practice.

Azure Kinect DK preorder / product page

The post Azure Kinect promises new motion, tracking for art appeared first on CDM Create Digital Music.

Magenta Studio lets you use AI tools for inspiration in Ableton Live

Instead of just accepting all this machine learning hype, why not put it to the test? Magenta Studio lets you experiment with open source machine learning tools, standalone or inside Ableton Live.

Magenta provides a pretty graspable way to get started with an field of research that can get a bit murky. By giving you easy access to machine learning models for musical patterns, you can generate and modify rhythms and melodies. The team at Google AI first showed Magenta Studio at Ableton’s Loop conference in LA in November, but after some vigorous development, it’s a lot more ready for primetime now, both on Mac and Windows.

If you’re working with Ableton Live, you can use Magenta Studio as a set of devices. Because they’re built with Electron (a popular cross-platform JavaScript tool), though, there’s also a standalone version. Developers can dig far deeper into the tools and modify them for your own purposes – and even if you have just a little comfort with the command line, you can also train your own models. (More on that in a bit.)

Side note of interest to developers: this is also a great showcase for doing powerful stuff with machine learning using just JavaScript, applying even GPU acceleration without having to handle a bunch of complex, platform-specific libraries.

I got to sit down with the developers in LA, and also have been playing with the latest builds of Magenta Studio. But let’s back up and first talk about what this means.

Magenta Studio is out now, with more information on the Magenta project and other Google work on musical applications on machine learning:


Artificial Intelligence – well, apologies, I could have fit the letters “ML” into the headline above but no one would know what I was talking about.

Machine learning is a better term. What Magenta and TensorFlow are based on is applying algorithmic analysis to large volumes of data. “TensorFlow” may sound like some kind of stress exercise ball you keep at your desk. But it’s really about creating an engine that can very quickly process lots of tensors – geometric units that can be combined into, for example, artificial neural networks.

Seeing the results of this machine learning in action means having a different way of generating and modifying musical information. It takes the stuff you’ve been doing in music software with tools like grids, and lets you use a mathematical model that’s more sophisticated – and that gives you different results you can hear.

You may know Magenta from its involvement in the NSynth synthesizer —

That also has its own Ableton Live device, from a couple years back.

NSynth uses models to map sounds to other sounds and interpolate between them – it actually applies the techniques we’ll see in this case (for notes/rhythms) to audio itself. Some of the grittier artifacts produced by this process even proved desirable to some users, as they’re something a bit unique – and you can again play around in Ableton Live.

But even if that particular application didn’t impress you – trying to find new instrument timbres – the note/rhythm-based ideas make this effort worth a new look.

Recurrent Neural Networks are a kind of mathematical model that algorithmically loops over and over. We say it’s “learning” in the sense that there are some parallels to very low-level understandings of how neurons work in biology, but this is on a more basic level – running the algorithm repeatedly means that you can predict sequences more and more effectively given a particular data set.

Magenta’s “musical” library applies a set of learning principles to musical note data. That means it needs a set of data to “train” on – and part of the results you get are based on that training set. Build a model based on a data set of bluegrass melodies, for instance, and you’ll have different outputs from the model than if you started with Gregorian plainchant or Indonesian gamelan.

One reason that it’s cool that Magenta and Magenta Studio are open source is, you’re totally free to dig in and train your own data sets. (That requires a little more knowledge and some time for your computer or a server to churn away, but it also means you shouldn’t judge Magenta Studio on these initial results alone.)

What’s in Magenta Studio

Magenta Studio has a few different tools. Many are based on MusicVAE – a recent research model that looked at how machine learning could be applied to how different melodies relate to one another. Music theorists have looked at melodic and rhythmic transformations for a long time, and very often use mathematical models to make more sophisticated descriptions of how these function. Machine learning lets you work from large sets of data, and then not only make a model, but morph between patterns and even generate new ones – which is why this gets interesting for music software.

Crucially, you don’t have to understand or even much care about the math and analysis going on here – expert mathematicians and amateur musicians alike can hear and judge the results. If you want to read a summary of that MusicVAE research, you can. But it’s a lot better to dive in and see what the results are like first. And now instead of just watching a YouTube demo video or song snippet example, you can play with the tools interactively.

Magenta Studio lets you work with MIDI data, right in your Ableton Live Session View. You’ll make new clips – sometimes starting from existing clips you input – and the device will spit out the results as MIDI you can use to control instruments and drum racks. There’s also a slide called “Temperature” which determines how the model is sampled mathematically. It’s not quite like adjusting randomness – hence they chose this new name – but it will give you some control over how predictable or unpredictable the results will be (if you also accept that the relationship may not be entirely linear). And you can choose number of variations, and length in bars.

The data these tools were trained on represents millions of melodies and rhythms. That is, they’ve chosen a dataset that will give you fairly generic, vanilla results – in the context of Western music, of course. (And Live’s interface is fairly set up with expectations about what a drum kit is, and with melodies around a 12-tone equal tempered piano, so this fits that interface… not to mention, arguably there’s some cultural affinity for that standardization itself and the whole idea of making this sort of machine learning model, but I digress.)

Here are your options:

Generate: This makes a new melody or rhythm with no input required – it’s the equivalent of rolling the dice (erm, machine learning style, so very much not random) and hearing what you get.

Continue: This is actually a bit closer to what Magenta Studio’s research was meant to do – punch in the beginning of a pattern, and it will fill in where it predicts that pattern could go next. It means you can take a single clip and finish it – or generate a bunch of variations/continuations of an idea quickly.

Interpolate: Instead of one clip, use two clips and merge/morph between them.

Groove: Adjust timing and velocity to “humanize” a clip to a particular feel. This is possibly the most interesting of the lot, because it’s a bit more focused – and immediately solves a problem that software hasn’t solved terribly well in the past. Since the data set is focused on 15 hours of real drummers, the results here sound more musically specific. And you get a “humanize” that’s (arguably) closer to what your ears would expect to hear than the crude percentage-based templates of the past. And yes, it makes quantized recordings sound more interesting.

Drumify: Same dataset as Groove, but this creates a new clip based on the groove of the input. It’s … sort of like if Band-in-a-Box rhythms weren’t awful, basically. (Apologies to the developers of Band-in-a-Box.) So it works well for percussion that ‘accompanies’ an input.

So, is it useful?

It may seem un-human or un-musical to use any kind of machine learning in software. But from the moment you pick up an instrument, or read notation, you’re working with a model of music. And that model will impact how you play and think.

More to the point with something like Magenta is, do you really get musically useful results?

Groove to me is really interesting. It effectively means you can make less rigid groove quantization, because instead of some fixed variations applied to a grid, you get a much more sophisticated model that adapts based on input. And with different training sets, you could get different grooves. Drumify is also compelling for the same reason.

Generate is also fun, though even in the case of Continue, the issue is that these tools don’t particularly solve a problem so much as they do give you a fun way of thwarting your own intentions. That is, much like using the I Ching (see John Cage, others) or a randomize function (see… all of us, with a plug-in or two), you can break out of your usual habits and create some surprise even if you’re alone in a studio or some other work environment.

One simple issue here is that a model of a sequence is not a complete model of music. Even monophonic music can deal with weight, expression, timbre. Yes, theoretically you can apply each of those elements as new dimensions and feed them into machine learning models, but – let’s take chant music, for example. Composers were working with less quantifiable elements as they worked, too, like the meaning and sound of the text, positions in the liturgy, multi-layered quotes and references to other compositions. And that’s the simplest case – music from punk to techno to piano sonatas will challenge these models in Magenta.

I bring this up not because I want to dismiss the Magenta project – on the contrary, if you’re aware of these things, having a musical game like this is even more fun.

The moment you begin using Magenta Studio, you’re already extending some of the statistical prowess of the machine learning engine with your own human input. You’re choosing which results you like. You’re adding instrumentation. You’re adjusting the Temperature slider using your ear – when in fact there’s often no real mathematical indication of where it “should” be set.

And that means that hackers digging into these models could also produce new results. People are still finding new applications for quantize functions, which haven’t changed since the 1980s. With tools like Magenta, we get a whole new slew of mathematical techniques to apply to music. Changing a dataset or making minor modifications to these plug-ins could yield very different results.

And for that matter, even if you play with Magenta Studio for a weekend, then get bored and return to practicing your own music, even that’s a benefit.

Where this could go next

There are lots of people out there selling you “AI” solutions and – yeah, of course, with this much buzz, a lot of it is snake oil. But that’s not the experience you have talking to the Magenta team, partly because they’re engaged in pure research. That puts them in line with the history of pure technical inquiries past, like the team at Bell Labs (with Max Mathews) that first created computer synthesis. Magenta is open, but also open-ended.

As Jesse Engel tells CDM:

We’re a research group (not a Google product group), which means that Magenta Studio is not static and has a lot more interesting models are probably on the way.

Things like more impressive MIDI generation ( – check out “Score Conditioned”)

And state of the art transcription: (

And new controller paradigms:

Our goal is just for someone to come away with an understanding that this is an avenue we’ve created to close the gap between our ongoing research and actual music making, because feedback is important to our research.

So okay, music makers – have at it:

The post Magenta Studio lets you use AI tools for inspiration in Ableton Live appeared first on CDM Create Digital Music.

Free download: A 400-page guide to experimental Eastern Europe sounds

If experimental music and Europe make you think only of cities like Paris and London, you’re missing a big part of the story. Now you can grab a huge reference on fringe and weird electronic music from the east – and it’s free. (At least that would please Marx.)

Berlin, and Europe in general, have exploded as hubs for experimental sounds. And if you want an answer to why that’s happened lately, look in no small part to the ingenuity, technical and artistic, of central and eastern Europe. These artistic cultures flourished during the Cold War, sometimes with support from Communist states, sometimes very much in the face of adversity and resistance from those same nations. And then in a more connected Europe, brought together by newly open borders and cheap road and air transit, a younger generation continues to advance the state of the art – and the state of the weird.

Old biases die hard, though. Cold War (or simply racist) attitudes often rob central and eastern Europe of deserved credit. And then there’s the simple problem of writing a history that’s fragmented by language and divisions that arose between East and West.

So it’s worth checking out this guide. It’s an amazing atlas covering history and new scenes, and the PDF edition is now available to download for free (if you can’t locate the print version).

SOUND EXCHANGE was a project from 2012-2012, connected to events in seven cities – Kraków, Bratislava, Tallinn, Vilnius, Budapest, Riga and, Prague. That’s Poland, Slovakia, Estonia, Lithuania, Hungary, Latvia, and Czech, respectively. It’s also relevant that we’re seeing these countries produce music tech alongside music – Bastl Instruments in Czech, Polyend in Poland, and Erica Synths in Latvia, just to name three that have lately gotten a lot of attention (and there are others).

There’s 400 pages – in both German and English – with a huge range of stuff. There’s fringe rock music in Germany, radio art from Czech, intermedia and multimedia art from across the region, what Latvia has been up to in experimental music since independence … and the list goes on. Technology and music practice go hand in hand, too, as workshops and music concerts intertwine to spread new ideas – both before and after the fall of communism, via different conduits.

It’s a fitting moment to rediscover this exhibition; CTM Festival here in Berlin has been a showcase for some of the east-meets-west projects including Sound Exchange’s outcomes. And CTM itself is arguably a recipient of a lot of that energy, in the one capital that sits astride east and west – even today, in some ways, minus the wall. The festival is turning 20 this year, and not incidentally, East Berlin-founded label Raster is showcasing its own artists in an exhibition and DJ sets.

Maybe it’s not bedtime reading, but even a skim is a good guide:

Download (uh, happy to re-host this if the bandwidth doesn’t hold up – I know how the kids love their Latvian experimental music book research):

The post Free download: A 400-page guide to experimental Eastern Europe sounds appeared first on CDM Create Digital Music.

The taste of music: listen to LJ Rich talk about synesthesia

Synesthesia is a term that gets thrown around a lot, usually to describe the common associations of color, image, and music. But for some people, intermingling of senses can be far more extreme. Listen to LJ Rich talk about what happens when hearing and taste intersect.

LJ Rich is most widely known as presenter of the BBC program Click. She’s also got vast musical experience, from composition to engineering. But for someone so involved in music, her experience is out of the ordinary. Her sense of taste evokes music, and her sense of music evokes taste.

The thing about our senses is, each of us assumes our own experience is the same as everyone else’s – until we hear otherwise. And then, it’s almost impossible to describe; human experience is far more relativistic than any of us can ever hope to understand. But listening to LJ talk about this will both give you insight into her unique way of taking in sound, as well as giving some clues to why music is profound and often cross-sensory for so many of us. She can describe what Debussy tastes like.

This opens up some new interdisciplinary performances; at Music Tech Fest (MTF) in Umeå, Sweden, LJ performed onstage with a bartender, who mixed a cocktail onstage.

LJ’s talk is the content of the first episode of the new MTF podcast. Listen at their site:

More on her expansive projects:

She’s also been interview on this topic:
Understanding the Connection Between Synesthesia and Absolute Pitch [The Scientist]

You Know What London Looks Like. But Have You Really Heard It? [The New York Times]

I’ll be with the MTF crew by tomorrow morning in Karlsruhe, Germany, as I participate in their #MTFLabs program, for a 24-hour laboratory – live performance in the ZKM Open Codes exhibition.

More on that soon.

The post The taste of music: listen to LJ Rich talk about synesthesia appeared first on CDM Create Digital Music.

What makes music and creativity? A talk with Susan Rogers

What makes creativity work in music? What happens in the brain? Susan Rogers has uniquely contemplated those questions both alongside artists like Prince and in research into the mind.

I got the chance to interview Dr. Rogers at SONAR+D last month, and I found my own mind wandering to how her mind works, as she characterized different kinds of intelligence. She exudes an easy sense of empathy, and in both her talks at Ableton Loop and SONAR, she’s quick to remove her own ego and move her role out of the immediate act of creativity. I imagine the ability to do so would be essential when you’re in the studio with Prince or David Byrne or the various other oversized personalities she’s managed to work with over the years. Even our audience members seemed to immediately trust her – that unique unsung talent of the best kinds of people who work behind the superstars in music.

There was a fair bit of talk about Prince at Ableton Loop. But in Barcelona, we got to focus on the mind itself – and as Susan emphasized backstage, how to define what music is in the first place. And that moves us into her work in cognition and the neuroscience that works to decipher it.

Susan is so uniquely positioned to understand this now, surrounded by young, hungry rising musical stars at Berklee atop her decades of experience.

But I also really hope we start more cross-disciplinary conversations about the topic. There’s a slide bringing up classical greats – musicology has been so caught up in comparing manuscripts and whatnot that I think there’s a vast opportunity for more interaction with fields like neuroscience. And some of what Susan describes about creativity and its variability, its interaction with depression and social isolation, the different kinds of aptitudes and thinking styles and what that means for collaboration, I suspect speaks to a lot of us on a deeply personal level. And that may be true in our lives even if we’re nothing like Prince.

Have a watch – I’m sure you’ll be as engaged throughout as I was onstage.

And I hope we look deeper into this, as what better mystery in music to explore than the mind?

Ranging from Neurology to Prince, Susan Rogers’ talk is must-watch

The post What makes music and creativity? A talk with Susan Rogers appeared first on CDM Create Digital Music.