Does your app feel sluggish, have worse perceived latency, or seem oddly less satisfying than other apps installed on the same device? The common explanation is that Unity adds audio latency, but did you know you can improve the input latency as well?
A responsive game translates into more 5-star reviews from players who never cared to write a review for you before. Existing 5-star reviewers already have something specific to say, but qualities like snappiness and responsiveness must be felt. Players probably find it hard to explain exactly what causes the "fun experience" they are getting, but they will come to your review page and simply say the game was fun. For us, it is time to make that magic happen by reducing latency.
I confirmed this by creating a basic Unity app that just plays a sound on touch down, versus a basic Xcode iOS app that does the same. You can clone the project on GitHub to confirm it yourself. The difference is clear the moment I touch the screen.
If you can't feel the difference, I encourage you to do a simple waveform recording test. Keep your phone at a fixed distance from your computer's microphone and tap the screen with your fingernail. The interval between the peak of the nail's tap sound and the peak of the response sound immediately reveals the difference visually. Newer devices might have better latency, but the difference between the Unity app and the native app should be proportional on every device.
Naturally, many would suspect that Unity adds some audio processing pipeline. I went on to develop a Native Audio plugin, which certainly helps.
But why does some of the difference still remain? What else could differ between this simple Unity app and the Xcode app, given that Unity in fact outputs an Xcode project as well?
Thinking carefully (my app plays a test sound on touch), it turns out the real culprit is the touch handling. I found that instead of either polling `Input.touches` or waiting for uGUI's `TouchDown` event (these two were benchmarked to arrive in the same frame, with event triggers coming before all `Update` calls), letting the OS tell Unity directly (a "callback") whenever a touch happens is faster.
Unity receives those callbacks too, and processes them into the `Input.touches` you check each frame. That touch collection changes every frame.
But did you know that natively, all input is handed to you via callbacks? Native Touch achieves the lowest-latency input by bringing callback-style input handling back to you in Unity, hooking into the earliest line on the native side where we can cleanly add a static method callback point into C#, even before the input is given to Unity. On the Implementation page you will learn its inner workings in enough detail that you could implement the whole plugin yourself. Native stuff is scary, and being as transparent as possible to developers is important.
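To make "static method callback point" concrete, here is a minimal sketch of the general native-to-C# callback technique on iOS, assuming a hypothetical native function `_registerTouchCallback` and delegate shape; this is not Native Touch's actual source:

```csharp
using System.Runtime.InteropServices;
using AOT; // MonoPInvokeCallback is required for IL2CPP to reverse-P/Invoke
using UnityEngine;

public static class TouchCallbackSketch
{
    // Delegate matching an assumed native-side signature: (x, y, phase, timestamp).
    public delegate void OnNativeTouch(float x, float y, int phase, double timestamp);

    // Hypothetical native entry point that stores the function pointer and
    // invokes it from the OS touch handler, before the touch reaches Unity.
    [DllImport("__Internal")]
    private static extern void _registerTouchCallback(OnNativeTouch callback);

    // IL2CPP can only turn *static* methods into native function pointers,
    // hence the "static method callback point" wording in the text.
    [MonoPInvokeCallback(typeof(OnNativeTouch))]
    private static void Receive(float x, float y, int phase, double timestamp)
    {
        // Careful: on Android this may run on the OS UI thread, not Unity's
        // main thread, so touching Unity objects here is unsafe.
        Debug.Log($"Touch ({x}, {y}) phase {phase} at {timestamp}");
    }

    // Keep a reference to the delegate so the GC doesn't collect it
    // while native code still holds the function pointer.
    private static readonly OnNativeTouch keepAlive = Receive;

    public static void Register() => _registerTouchCallback(keepAlive);
}
```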
Comparing the two approaches: Unity's `Input.___` state-polling input API is easy for beginners to use, but the disadvantage is that it must be slower than the native callback-based API, and who knows how many frames Unity takes to make a touch available for polling after the touch actually happened. For you to see the states available in a particular frame, the callbacks must have happened in an earlier frame so that Unity could update the states in the first place. What if we could just act right there instead? Native Touch is exactly that: the native side is sending a callback, so Native Touch links that callback to C#.
Native Touch is much harder to use. For example, you lose the ability to check for a continuously held-down finger; in callback language there is just one "down" and that's it, whereas with `Input.touches` you could poll the down state every frame. And the callback might arrive on a completely different thread from Unity, the one where the OS handles the touch! (This is the case on Android.) Will you make this trade-off?
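If you need the held-down polling back, you can reconstruct it yourself from the callbacks. This is a sketch of the idea only; how the callback is wired up is up to you, and the lock is there because, as noted above, Android callbacks may arrive off the main thread:

```csharp
using System.Collections.Generic;
using UnityEngine;

public static class HeldFingerTracker
{
    private static readonly HashSet<int> downFingers = new HashSet<int>();
    private static readonly object gate = new object();

    // Call this from your touch callback with the finger id and phase.
    public static void OnTouch(int fingerId, TouchPhase phase)
    {
        lock (gate)
        {
            if (phase == TouchPhase.Began)
                downFingers.Add(fingerId);
            else if (phase == TouchPhase.Ended || phase == TouchPhase.Canceled)
                downFingers.Remove(fingerId);
        }
    }

    // Now anywhere in Update you can poll, like Input.touches allowed.
    public static bool IsDown(int fingerId)
    {
        lock (gate) { return downFingers.Contains(fingerId); }
    }
}
```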
Native Touch also provides a polling-style API, `NativeTouch.touches`, for platforms with bad callback efficiency such as Android IL2CPP. You will still see a touch appear in `NativeTouch.touches` much faster than in `Input.touches`, because it doesn't process the touch; it just writes to a memory area really fast.
By the way, for those not accustomed to native development on iOS and Android: from the beginning, both platforms have delivered every input via callbacks, not state polling like Unity. It is clear that the callback way could be the fastest, because it aligns closely with the native side.
After making another test app which applies both Native Touch and Native Audio, the experience is now comparable to the Xcode one.
There are hundreds of existing iOS Unity games out there with the default touch delay, and people enjoy them without complaint. Tap, flick, swipe, pinch: they all work properly. But just because that is enough does not mean it is the best.
By handling the touch in that harder-to-use callback, or even reading the touches manually from `NativeTouch.touches`, you could be getting the data earlier. The callback occurs instantly, and if for some reason callbacks are slow, you can use `NativeTouch.touches`, which is just a memory area the native side wrote to, as opposed to `Input.touches`, which who knows how much Unity has prepared for us.
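A hedged sketch of the polling alternative follows. The exact shape of `NativeTouch.touches` (how you enumerate it, what fields a touch carries) is the plugin's own detail; everything here beyond the property name quoted in the text is an assumption:

```csharp
using UnityEngine;

public class EarlyTouchReader : MonoBehaviour
{
    void Update()
    {
        // Hypothetical iteration over whatever touches the native side
        // wrote to the buffer since we last looked. The point: this data
        // arrives with no Unity-side processing, so it shows up sooner
        // than the same touch would in Input.touches.
        foreach (var t in NativeTouch.touches)
        {
            HandleEarlyTouch(t);
        }
    }

    void HandleEarlyTouch(object touch)
    {
        // React here, ahead of Unity's own input pipeline.
    }
}
```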
Depending on the platform, if that callback moment happens to be early enough to beat the frame threshold, you might even be able to use that touch data to do things like moving objects before the graphics submit occurs, and see a visible difference. That is why the red box can get ahead of the yellow box in the video demo. Android is the platform that benefits most from this, since its touch callback arrives on a different thread and is not in sync with Unity.
For perspective, 23 ms is approximately the acceptable latency for a pianist. If you can gain a 1-frame advantage with Native Touch, that makes the difference between acceptable and unacceptable responsiveness for musicians. 16.67 ms is also the score timing window of the "Marvelous" judgement in Dance Dance Revolution; there are players who can stay inside this small window 666 times consecutively. And fighting game players know how important "a frame" is.
Or if your game is based on gesture actions, like a grid-type game, or has complex line drawing as its main experience, it is always a good idea to improve an area your player will experience over and over. If you improve the core gameplay, no matter how small the improvement, it will multiply and make a difference.
It might sound strange at first that a faster touch fixes an audio latency problem. But think carefully: all the bad audio experiences start from the user's input, be it a button feedback sound, a drumming app, or a music game with sound feedback. The keyword here is "feedback".
A common solution to audio latency/delay on Android is to play the sound earlier to compensate for the latency (having your players "calibrate" it themselves). The problem is that usually the audio is issued immediately as a result of the player's interaction, which will be a touch, unless you are making some kind of device-shaking-only game.
You cannot move a response sound earlier in time, unless you are a psychic or use a neural network to predict the player's input. You have to wait for the touch, and probably some further processing of that touch (is it at the correct position? At the correct moment? "Perfect" sound or "offbeat" sound?), then finally play it as soon as possible.
The audio latency itself can be fixed by other means, such as Native Audio, but the latency the player feels also includes the touch processing time, up to the `if` that tests where the touch went and plays a sound accordingly. By using Native Touch to cut down that time, you get a FREE -16.67 ms of audio latency, independent of how much audio latency your device currently has, if you manage to land a frame advantage.
iTunes is a musical app. But does it need low-latency playback? No! No one cares whether the song starts immediately on pressing play. The goodness is in the song, and you are listening to it! In this case audio fidelity is the most important thing, and audio latency is not much of a concern. It is not an **interactive** audio application.
Also, to prevent audio crackling caused by buffer underrun, such apps usually increase the buffer size, which in turn increases latency. But a music player does not care about latency, so that quality comes essentially for free.
Applications like a digital audio workstation (DAW) on a mobile phone, or live-performance musical apps like Looper or Launchpad, fall into this category. The app **is interactive**, but the references for "correct" timing are all controllable. Imagine you start a drum loop. Each sound may be delayed depending on the device, but all delays are equal, resulting in a perfect sequence, albeit with a variable start time. When starting another loop, it is 100% possible for the software to compensate and match the beat currently playing. This class of application is immune to mobile audio latency.
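The compensation described above can be sketched with Unity's own scheduled-playback API. The bar length and source setup here are placeholders; the point is quantizing the second loop's start to the first loop's grid, so both suffer the same device latency and stay in sync:

```csharp
using UnityEngine;

public class LoopQuantizer : MonoBehaviour
{
    public AudioSource firstLoop;   // the loop already playing
    public AudioSource secondLoop;  // the loop to start on the next bar
    public double barSeconds = 2.0; // e.g. one 4-beat bar at 120 BPM (assumed)
    private double firstLoopStartDsp;

    void Start()
    {
        // Schedule instead of Play() so the start lands on a known dsp time.
        firstLoopStartDsp = AudioSettings.dspTime + 0.1;
        firstLoop.PlayScheduled(firstLoopStartDsp);
    }

    public void StartSecondLoop()
    {
        // Find the next bar boundary relative to the first loop's start,
        // then schedule the second loop exactly there.
        double elapsed = AudioSettings.dspTime - firstLoopStartDsp;
        double barsPassed = System.Math.Ceiling(elapsed / barSeconds);
        secondLoop.PlayScheduled(firstLoopStartDsp + barsPassed * barSeconds);
    }
}
```

Because both loops go through the same output path, whatever latency the device adds shifts them equally, which is exactly why this class of app is immune to it.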
Apps like GarageBand (in live-playing mode) are in this category. The sound has to respond when you touch the screen. Latency can impact the experience, but if you are rehearsing by yourself you might be able to ignore it, since if you play perfectly, the output sounds will all have equal latency and will be perfect, just with a bit of delay.
Also remember that **all** instruments in the world have their own latency. Imagine a piano whose sound must travel through its chamber, an effected guitar with a rather long chain, or a vocalist who hears their own voice immediately "in the head". How do they all manage to play together in sync at a concert? With practice, you get used to the latency naturally! So this class is not hurt as badly by a bit of latency as it sounds.
There are many mobile music games made with Unity, like Cytus, Deemo, Dynamix, VOEZ, Lanota, Arcaea, etc. If there is a sound feedback on hitting a note, this is the hardest class of the latency problem. Unlike the Sequencer class, even though the song is predictable and the game knows every note at every point in the song, you cannot predict whether the sound will play or not, since that depends on the player's performance. (Unless the sound is played regardless of hit, miss, or bad judgement, in which case this class reduces to the Sequencer class.)
It is also harder than the Instrument class, since now we have a backing track playing as a reference, and a visual indicator too. If you hit on time according to the visuals or the music, you will get a "Perfect" judgement, but the response sound will be off from the backing track. When this happens, even though you already got a Perfect, you will automatically adapt to hit earlier so that the response sound matches the song, and then you will no longer get the Perfect judgement. In the Instrument class, this can happen too when live jamming with others, but if you adapt to hit early you get an accurate sound without being punished by a judgement like in games.
Even a little latency is very obvious in a music game. Since there is a beat in the song for reference, players can tell right away that they are hearing two separate sounds (the beat in the song and the response sound), even while scoring a Perfect.
You can also think of music games as working like instruments. In games with response sounds, like Beatmania IIDX, Groove Coaster, or DJMAX Respect, if you listen to the button sounds of good players tapping to the music, the taps are audibly **before** the audio, yet they do that consistently throughout the song so that the response sounds come out perfectly in sync and they get perfect scores. These games, in a way, work like a real instrument: **you live with the latency** and get good.
On the contrary, a game like Dance Dance Revolution has no response sound and is properly calibrated so that you can step **exactly** to the music and get the Marvelous judgement. If you listen to the footsteps of good players, you can hear that they match the audio perfectly. In effect, a game like this has already accounted for the time the audio travels from the loudspeaker to your ear in its judgement calibration!
You have been polling for touches in Unity via `Input.___`, but did you know when exactly those touches were made? We have been forced to use in-frame time, asking `Time.realtimeSinceStartup` or calculating from `Time.deltaTime`, for so long since the creation of Unity that it feels like the natural thing to do.
By doing that, however, you are assuming the touch happened right at that line of code (`Time.realtimeSinceStartup`) or at the beginning of this frame (`Time.deltaTime`). Neither is true, because for the touch to be available for polling right now, it must have happened somewhere in the previous frame. This is another weakness of polling-based APIs.
And so, in a timing-based game, you are favoring early-pressing players and punishing late-pressing players, because rather than the time they touched the screen, you are using the time of the next nearest frame.
In reality, iOS and Android both natively give you a touch timestamp, in "time since device startup" units, with every touch, but Unity discards them. With the additional `NativeTouch.GetNativeTouchTime()` method, which asks for the time in that same unit without any touch, you can anchor it to your game time and make meaningful decisions with the timestamp that came with each touch.
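The anchoring idea can be sketched like this. `NativeTouch.GetNativeTouchTime()` is named above; the assumption is only that it returns seconds on the same clock as the touch timestamps:

```csharp
using UnityEngine;

public class TouchTimeAnchor : MonoBehaviour
{
    private double nativeTimeAtAnchor; // native clock at the anchor moment
    private double gameTimeAtAnchor;   // Unity clock at the same moment

    void Start()
    {
        // Sample both clocks back to back to establish the offset between them.
        nativeTimeAtAnchor = NativeTouch.GetNativeTouchTime();
        gameTimeAtAnchor = Time.realtimeSinceStartup;
    }

    // Convert a touch's native timestamp into Unity game time, so a
    // timing judgement can use the moment the finger actually landed
    // instead of the frame in which the touch surfaced.
    public double ToGameTime(double nativeTouchTimestamp)
    {
        return gameTimeAtAnchor + (nativeTouchTimestamp - nativeTimeAtAnchor);
    }
}
```

With this, a rhythm game can judge against the real touch moment, removing the early-presser bias described above.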
Unity's `Touch` struct has processed and "unified" all possible data from all platforms for us; its layout was designed as a general catch-all solution. But did you know that only on Android can you get the touch's "ellipse" area (not just a circle) with `getTouchMajor` and `getTouchMinor`? On iOS no such thing exists, but you can get `tapCount` to detect double and triple taps natively, which is not available on Android.
Native Touch contains a "full mode" (EXPERIMENTAL) which reports all the data from iOS's `UITouch` and Android's `MotionEvent` APIs with no attempt to make them consistent. You can then get everything in each platform's documentation at will, even things not available in Unity's convenient `Touch` struct. When not in full mode, you get only the essentials, saving the processing time needed to ask for the values and spending less time copying smaller data into method arguments. If the native side adds something more in the future, you will be ready to get it in Unity by editing Native Touch's source code.
(The reason it is EXPERIMENTAL is that I don't have enough devices to test it extensively, like an iPad Pro or Apple Pencil 2.)
What you get from `Input.touches` is only an interpretation of the "ground truth" native callbacks, which may have happened more than once before this frame. On both iOS and Android, Unity has its own interpretation algorithm that "summarizes" the callbacks but unfortunately discards some movement details. With Native Touch, we can get all of the original movements reported by the OS. Look at this example: from the actual 9 callbacks we received natively, `Input.GetTouch` derived only 3 representative data points; the rest were discarded.
For extreme detail on how native callbacks are translated into what we see in `Input.touches`, I welcome you to the Callback Details page. Even if you are not a Native Touch user, have you ever wondered how differently Unity prepares the `Touch` for us on each platform?
The plugin still sends the touch through to Unity's normal input pipeline, so uGUI, Event Triggers, your raycasters, etc. still work. (But of course we receive our special callback, or can even read the touch buffer, sooner than that.) For advanced users, there is also an option to completely disable Unity touch by returning immediately after the callback, if you wish to spare Unity even more processing time.
While still holding many assumptions that later proved false, I got my feet (fingers) wet with many native solutions and arrived at `AudioTrack` (Android) and `AVAudioPlayer` (iOS) as better than the rest. Watch the research video or read the first experiment's notes here. The plugin was still in development at this point, and I did not yet care about input latency.
This time the detailed write-up is hosted on GitHub with a repro project. I found that touch/input latency also plays a role in the perceived audio latency. By going native not just for audio but also for touch, we can get the touch response up to 1 frame earlier (up to 2 on Android), and that translates to FREE -16 ms of audio latency as long as the audio plays in response to a touch. (That's a lot when you "feel" it as part of an interaction.)
A one-take video that is my final proof that the plugin has real benefit. At this point the first version of Native Touch is out, along with "Native Audio" to also solve the audio latency side.
I have discovered a way to get faster touches on Android. Previously named "iOS Native Touch", the plugin is now renamed to just "Native Touch" with Android support. You are free to use Native Audio in combination if you want even more.