I’m sitting by the ocean in Toucheng, in Taiwan, and really the thing I want to talk about is my experience at the residency. In some ways the biggest thing I learned was that there’s an enormous breadth of hardware that people are working on, and so many new ways in which people are working with it, and that’s really exciting. For me, the desire to come to Shenzhen was primarily driven by my curiosity about how things are made in the world, which stems from having made things myself: surfboards, backpacks, bike bags, garments, and so on, as well as software.
I just have this curiosity and fascination with how you go from an idea to a prototype, to small-scale production past the prototype, and then ultimately to large-scale manufacturing. What is that pipeline like? How accessible is it? And how does it become more accessible to people? That was kind of the main question I was coming in with.
In addition, I had some hardware ideas of my own that I was interested in. A lot of that was more an exploration of the ideas, seeing what’s possible around them, rather than executing the ideas themselves.
However, there was one that was the focus: a hardware speech-to-text device. I figured we have these models, and it should be possible to do all of the inference on a device instead of in a piece of software that I have to maintain. Largely, distributing software that depends on accelerated computing is hard, and I almost wanted to sidestep the whole thing by just putting it into a hardware device and calling it a day.
And I will share the learnings from that, which is: yes, it is possible, but it depends on what you’re looking for. Part of the exploration was figuring out what I was looking for in a device that does speech to text in hardware, and I really wasn’t sure. I didn’t know whether it would do all offline inference, or online inference, or some mode of streaming inference, so I experimented with them all. I began with the chips and boards that we had available, and I generally chose the Rockchip platform because it seemed to have fairly powerful CPUs as well as an NPU to work with, if I wanted.
I ran benchmarks, working primarily with the RADXA Zero 3 and RADXA ROCK 4, which have the RK3566 and RK3576 respectively. And I learned a lot. Like, yeah, you want powerful hardware. Right now the acceleration is not good, though you can run things on CPU only and it’s okay. Okay, not fantastic. So, largely, that’s what I explored.
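As a rough illustration of what benchmarking speech to text on a small board usually comes down to, here is a minimal sketch of measuring the real-time factor (processing time divided by audio duration). It is not the benchmark code from the residency: `real_time_factor` and `fake_transcribe` are hypothetical names, and the sleep is only a stand-in for running an actual model.

```python
import time


def real_time_factor(transcribe, audio_seconds, runs=3):
    """Return the best observed processing time divided by audio duration.

    `transcribe` is any zero-argument callable that runs inference on a
    clip of known length. A value below 1.0 means faster than real time.
    """
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        transcribe()
        best = min(best, time.perf_counter() - start)
    return best / audio_seconds


def fake_transcribe():
    # Placeholder for real model inference (an assumption for this sketch).
    time.sleep(0.05)


rtf = real_time_factor(fake_transcribe, audio_seconds=1.0)
print(f"RTF = {rtf:.2f}")
```

A real-time factor comfortably below 1.0 is what makes streaming inference feel responsive; close to or above 1.0, sentence-by-sentence processing tends to be the more forgiving mode.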
I also really wanted to explore form factors. For something like this, I think there’s a very obvious form factor: you could put it in a microphone, or a desktop microphone kind of form factor, and that would be reasonable. But boring. I really think computers should be more fun, especially after seeing people experimenting and the stuff you can find on Taobao (shout out Kelin). So I wanted to build this thing and put it into an unconventional form factor, and I went exploring for things I found interesting. I have some photos of these.
I kind of settled on this lamp, which was an octopus lamp. Basically the idea is that you touch it and start talking to it, and as you’re talking (again, shout out Kelin) it shows your voice activity and types for you. Where this project got to during the residency is that everything works: you can talk to it, and it types. I decided on streaming inference, though sentence by sentence is also not bad. And where I settled for input is basically that I have a button. I took apart the octopus lamp, ordered new parts, got the parts, got it working, blah blah blah. The only thing that doesn’t really work reliably is plugging it into a computer and having it start by default. Why this is, I’m still a bit unsure, but I will show demos of it working properly. Cue the demos.
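To make the sentence-by-sentence option concrete, here is a minimal sketch of one common way to cut audio into utterances: a simple energy threshold over fixed-size frames, closing an utterance after a short run of quiet frames. This is an illustration under assumptions, not what the lamp actually runs; the function names, the 160-sample frame, the 0.1 threshold, and the three-frame silence limit are all made up for the example.

```python
import math


def frame_energies(samples, frame_len):
    """RMS energy of each complete, non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(s * s for s in frame) / frame_len))
    return energies


def segment_utterances(samples, frame_len=160, threshold=0.1, max_silence_frames=3):
    """Return (start_frame, end_frame) pairs, end exclusive, one per utterance."""
    energies = frame_energies(samples, frame_len)
    segments, start, silence = [], None, 0
    for idx, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = idx  # first loud frame opens an utterance
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= max_silence_frames:
                # Close the utterance at the first quiet frame of this run.
                segments.append((start, idx - silence + 1))
                start, silence = None, 0
    if start is not None:
        segments.append((start, len(energies) - silence))
    return segments


def synth(frames, level, frame_len=160):
    """Synthetic test audio: a square wave at the given amplitude."""
    return [level if i % 2 == 0 else -level for i in range(frames * frame_len)]


audio = synth(5, 0.0) + synth(10, 0.5) + synth(5, 0.0) + synth(4, 0.5)
print(segment_utterances(audio))  # [(5, 15), (20, 24)]
```

Each closed segment could be handed to the model as one sentence, while the per-frame energies are the kind of signal a voice activity display can be driven from. A streaming setup would instead feed frames to the model as they arrive.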
So really, what’s next is getting the full thing working. I want a nice small demo. I also think there’s something interesting to be said for this in a keyboard form factor: maybe there’s a dedicated push-to-talk button on your keyboard, and all of the inference actually happens on the keyboard itself. I think that’s the best form factor. So I’m interested in exploring that at some point, using a Compute Module and having it attach to a PCB. I think that would be pretty interesting. I may also have everything go through a microcontroller for the keyboard input itself, because doing this on a full Linux machine is really weird and awkward, and doing it via a microcontroller probably makes way more sense.
Otherwise, after I started talking to Jonathan about Tiles, the Tiles stuff really interested me, because it connected to another project I was curious about. I really love my smartwatch, my Garmin, and the biggest problem with it is that I have to click a button to record activities. I would really love to prototype a small wearable that is basically recording all of this data all the time, so that I never have to click that button. (Also, shout out to Andy Kong (Chargerless).) It just knows if I start climbing or if I start swimming or whatever, because we have enough data overall. I want to have enough data on myself that a model can easily infer these things.
Like, if I go to the ocean and jump in the water, or you see that I’m paddling for surfing or whatever it is, you should know that I’m surfing. Obviously it’s harder than just that, but that’s the idea. Same thing with swimming: if my location all of a sudden pops up at the Nanshan sports complex and then the IMU data looks roughly like swimming, it would be pretty cool to know that that’s swimming, and to have it post-processed after the fact rather than necessarily needing to do it all on device. Because quite frankly, I don’t care to look at it on device most of the time. I want to see it, and have it processed, somewhere else. I really want to prototype this at some point, or maybe I’ll just end up using Andy’s chargerless device for this if possible. But yeah, I just like this idea, and it’s basically what got me interested in Tiles.
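As a toy version of that post-processing idea, here is a minimal sketch that labels a window of recorded data by combining a location check with a very rough IMU feature. Everything in it is an assumption for illustration: the `Venue` and `Window` types, the 200 m radius, the cadence range, and the coordinates are placeholders, and a real system would more likely use a trained model than a hand-written rule.

```python
import math
from dataclasses import dataclass


@dataclass
class Venue:
    name: str
    lat: float
    lon: float
    activity: str
    cadence_hz: tuple  # (low, high) plausible movement cadence; a guess


@dataclass
class Window:
    lat: float
    lon: float
    dominant_hz: float  # dominant frequency of the IMU signal in this window


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def label_window(window, venues, radius_m=200.0):
    """Return a venue's activity if the window is nearby and the motion fits."""
    for venue in venues:
        near = haversine_m(window.lat, window.lon, venue.lat, venue.lon) <= radius_m
        low, high = venue.cadence_hz
        if near and low <= window.dominant_hz <= high:
            return venue.activity
    return "unknown"


# Placeholder coordinates, not a real location.
pool = Venue("example pool", 10.0, 20.0, "swimming", (0.3, 1.2))
print(label_window(Window(10.001, 20.0, 0.6), [pool]))  # swimming
```

Because the labeling runs after the fact on stored data, the wearable itself only needs to record location and IMU samples, which fits the "process it somewhere else" idea.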
Jonathan and I talked about a variety of things, this being one of the projects. Another is a wearable that’s just recording something really simple and also does its processing after the fact, all local and offline. Since I have a big speech-to-text background, I am kind of curious about that. I think it’s interesting to explore hardware that is hard and to think of creative ways to mess around with it, as opposed to doing the easiest thing possible and just calling an API. Largely, this is what I was thinking about.
I also think some of the factory tours were incredible. I went to the textile market with Yi and to Huaqiangbei multiple times. Getting to go to the Seeed Studio factory with Kelin and Evan and actually see the assembly line was really cool. And getting to see the textile factory in particular was really fascinating to me; shout out to Lauren for organizing that. Yeah, overall a wonderful, wonderful, wonderful experience.