Starting an app with a virtual or augmented reality component is a bit daunting for most. When starting out, frameworks like Unity3D or Unreal Engine simplify a great many things. But from an Android developer's perspective, you need to forget all about the nice APIs and libraries you are used to, and shoehorn your business logic into frameworks designed to build games first, not apps. Not a great proposition.
In this talk from 360|AnDev, Etienne Caron helps you explore the path less traveled! He shows how you can build VR and AR applications in Java and Android Studio thanks to Project Tango’s Java SDKs, plus a few helpful 3D graphics libraries.
Introduction (0:00)
My name is Etienne Caron. I am a member of Google’s Developer Expert Program, and I also work for an awesome Canadian company called Shopify.
Project Tango is a platform that gives devices the capacity to understand their position in space. It does this by leveraging computer vision and specialty sensors.
The sensors are not necessarily the same on every device, but they are abstracted away from your code. Our goal today is to learn how to make a Tango app. First, we are going to look at how we can map a real space to a virtual one. Then we are going to populate that space with virtual objects. Finally, we are going to see how to share our creation with the world.
Tango (1:10)
Why make a Tango app to start with? The first Tango consumer device will be the Lenovo Phab 2 Pro (coming out in November 2016). Right away, should you try to develop a mass-market app for Tango devices? The answer is probably no, at least not yet. This is, for now, a specialty device.
Let's rule out ideas that depend on mass-market adoption and think instead of consultancy-type scenarios, where your customers buy the Tango devices and lend them out to their own customers. The kinds of use cases you get there are: real estate, interior decoration, furniture sales, in-store analytics, VR content production (such as real-time sharing), and augmented museum exhibits.
Motion Tracking (3:09)
Tango APIs are great for mapping out real-world spaces. Let's see how we can use Tango's motion tracking and depth perception features to that effect.
Tango provides the position and the orientation of a user’s device in full six degrees of freedom. It is the combination of position and orientation that is referred to as the device’s pose.
protected void onResume() {
super.onResume();
if (!isConnected) {
tango = new Tango(ControlRoomActivity.this, () -> {
try {
TangoSupport.initialize();
connectTango();
isConnected = true;
} catch (TangoOutOfDateException e) {
Log.e(TAG, getString(R.string.exception_out_of_date), e);
}
});
}
}
This is your typical, boring API initialization code. The process starts in onResume, where you create a new Tango object. You pass the constructor a Runnable, and the configuration of the Tango object instance takes place in that Runnable, which the constructor executes once it has connected to the Tango service.
On the flip side, when you hit onPause, we have to disconnect from the Tango service. That is important because Tango is resource hungry; if you lock up the services, things tend to get crashy quickly.
You might have noticed the synchronized blocks. These are a bit all over the place in Tango sample code. My own samples would probably get rid of some of them, but I did not feel confident playing around with that, so let's conveniently ignore them for now.
Typically, you are going to get data from Tango via callbacks. We have an example here, with the Tango connect listeners.
protected void onPause() {
super.onPause();
synchronized (this) {
if (isConnected) {
tango.disconnectCamera(TANGO_CAMERA_COLOR);
tango.disconnect();
isConnected = false;
}
}
}
The connectListener call below is the main listener you are going to be subscribing to. As your device moves through 3D space, Tango calculates its position and orientation up to 100 times a second. This pose data comes in two parts: a position vector, in meters, and an orientation expressed as a quaternion. The first callback, onPoseAvailable, is where we receive the device's pose. The second one, onXyzIjAvailable, delivers the depth point cloud, which we will come back to shortly.
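To make the two halves of a pose concrete, here is a minimal sketch of what the logPose() helper used in the listener below could look like. It just unpacks the translation (in meters) and the rotation quaternion from a TangoPoseData instance; the logging format is my own.
private void logPose(TangoPoseData pose) {
    // Position of the device, in meters, relative to the start-of-service frame.
    float[] translation = pose.getTranslationAsFloats();
    // Orientation of the device, as a quaternion in x, y, z, w order.
    float[] rotation = pose.getRotationAsFloats();
    Log.d(TAG, String.format("position: (%.2f, %.2f, %.2f) m",
            translation[0], translation[1], translation[2]));
    Log.d(TAG, String.format("orientation: (x=%.2f, y=%.2f, z=%.2f, w=%.2f)",
            rotation[0], rotation[1], rotation[2], rotation[3]));
}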
tango.connectListener(framePairs, new OnTangoUpdateListener() {
@Override
public void onPoseAvailable(TangoPoseData pose) {
// We could process pose data here, but we are not
// directly using onPoseAvailable() for this app.
logPose(pose);
}
@Override
public void onXyzIjAvailable(TangoXyzIjData xyzIj) {
// Save the cloud and point data for later use.
tangoPointCloudManager.updateXyzIj(xyzIj);
}
...
Euclidean Space (6:00)
We are working in three-dimensional Euclidean space (a fancy way of saying that we have points in our space, placed relative to an X, Y, and Z axis).
Vertex (6:12)
In geometry, a vertex is a point in space, but in computer graphics it is used to mean a bag of things that relate to a point. A vertex data structure could simply describe the position of a point in 2D or 3D space, but it is often used to pass around other vertex-related attributes as well, such as color.
A quaternion is the quotient of two vectors which, for all intents and purposes, fits in an array of four doubles. Instead of quaternions, you could define orientation by specifying rotations along the X, Y, and Z axes. However, you then have to be careful of gimbal lock. Gimbal lock makes it mathematically harder to define an orientation: sometimes you will try to apply a rotation and find yourself locked into a pattern. Quaternions are not subject to gimbal lock, and they are simpler to compose.
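As a tiny illustration of what "simpler to compose" means, here is a sketch using Rajawali's Quaternion class (the same class the renderer code later in this article relies on). The method names are from the Rajawali math API as I remember it, so double-check before copying.
// Two rotations expressed as quaternions (sketch; angles in degrees).
Quaternion yaw = new Quaternion().fromAngleAxis(Vector3.Axis.Y, 90);
Quaternion pitch = new Quaternion().fromAngleAxis(Vector3.Axis.X, 45);
// Composing them is a single multiplication: no axis-ordering
// bookkeeping, and no gimbal-lock special case to worry about.
Quaternion combined = new Quaternion(yaw).multiply(pitch);
// Under the hood it is still just four doubles: w, x, y, z.
Log.d(TAG, "w=" + combined.w + ", x=" + combined.x
        + ", y=" + combined.y + ", z=" + combined.z);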
Quaternion math is a bit more intense, but essentially the APIs do most of the heavy lifting. You just need to know what quaternions are, what they represent, and how they fit into, for example, an array of four doubles. Thanks to motion tracking, we have an idea where the tablet is in space. We also need to find out what is out there in the world, and that is where the depth perception APIs come in. They can tell us the shape of things around us.
Tango services provide a function to get depth data in the form of point clouds. This format gives us vertices: X, Y, Z coordinates for as many points in the scene as the tablet is able to calculate at that moment. Each dimension is a floating point value recording the position of the point, in meters, in the coordinate frame of the depth-sensing camera. Since the device knows where it is and where the camera is, it has an idea of where these points are and how far they are from the camera.
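To make the format concrete, here is a small sketch of walking over the raw buffer. TangoXyzIjData exposes the points as a packed java.nio.FloatBuffer (xyz) plus a count (xyzCount); the averaging itself is just an illustration of mine.
// Sketch: average distance (in meters) of all points in a cloud from the
// depth camera. xyz is a packed FloatBuffer of (x, y, z) triples.
private double averageDepth(TangoXyzIjData xyzIj) {
    FloatBuffer points = xyzIj.xyz;
    double total = 0;
    for (int i = 0; i < xyzIj.xyzCount; i++) {
        float x = points.get(i * 3);
        float y = points.get(i * 3 + 1);
        float z = points.get(i * 3 + 2);
        total += Math.sqrt(x * x + y * y + z * z);
    }
    return xyzIj.xyzCount > 0 ? total / xyzIj.xyzCount : 0;
}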
Depth Perception (7:30)
When you combine the depth-perception and motion-tracking APIs, you can start building a full picture of the space around you.
If you have had a chance to play around with the Tango tablet, you have probably seen the Tango Constructor application which allows you to create detailed 3D models of rooms and their contents. We’d like to allow our users to create a scene-overlay for the room they are standing in. That requires us to show, first, a live feed of the camera on-screen. Then, by moving about the room with the Tango device, our users can tap on the surface they see on the screen to place virtual objects on that plane.
// Use default configuration for Tango Service, plus low latency IMU integration.
TangoConfig config = tango.getConfig(TangoConfig.CONFIG_TYPE_DEFAULT);
config.putBoolean(TangoConfig.KEY_BOOLEAN_DEPTH, true);
config.putBoolean(TangoConfig.KEY_BOOLEAN_COLORCAMERA, true);
// NOTE: Low latency integration is necessary to achieve a precise alignment of
// NOTE: virtual objects with the RGB image and produce a good AR effect.
config.putBoolean(TangoConfig.KEY_BOOLEAN_LOWLATENCYIMUINTEGRATION, true);
// NOTE: These are extra motion tracking flags.
config.putBoolean(TangoConfig.KEY_BOOLEAN_MOTIONTRACKING, true);
config.putBoolean(TangoConfig.KEY_BOOLEAN_AUTORECOVERY, true);
tango.connect(config);
tango.connectListener(framePairs, new OnTangoUpdateListener() {
public void onPoseAvailable(TangoPoseData pose) {
// We could process pose data here, but we are not
// directly using onPoseAvailable() for this app.
logPose(pose);
}
public void onFrameAvailable(int cameraId) {
// Check if the frame available is for the camera we want and update its frame on the view.
if (cameraId == TangoCameraIntrinsics.TANGO_CAMERA_COLOR) {
// Mark a camera frame is available for rendering in the OpenGL thread
isFrameAvailableTangoThread.set(true);
surfaceView.requestRender();
}
}
public void onXyzIjAvailable(TangoXyzIjData xyzIj) {
// Save the cloud and point data for later use.
tangoPointCloudManager.updateXyzIj(xyzIj);
}
public void onTangoEvent(TangoEvent event) {
// Information about events that occur in the Tango system.
// Allows you to monitor the health of services at runtime.
}
});
First, we want to switch on the Tango services that we need. We have touched on the depth system that provides us with a point cloud; we also need the color camera feed, and something called low-latency IMU integration. Low latency is required if you want to achieve a very precise alignment between the frames from the camera, the point cloud, and the virtual objects.
Once setup is done, we set up our Tango update listener. Note that the update listener is used to subscribe not only to motion tracking, but also to image and event data. Event data has more to do with the state of the services. Tango is not always able to tell where it is; there is sometimes drift. (If there is a bright light flashing into the device, it might not understand where it is.) You can be made aware of these events through that listener.
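For instance, the empty onTangoEvent stub in the listener above could simply log what the service reports. The eventKey and eventValue field names are from memory of the Tango Java API, so treat this as a sketch.
@Override
public void onTangoEvent(TangoEvent event) {
    // Sketch: surface service-health information (e.g. feature tracking
    // being lost when the camera is blinded) in the log.
    Log.w(TAG, "Tango event: " + event.eventKey + " = " + event.eventValue);
}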
onFrameAvailable is triggered every time a new image is available from the RGB or fisheye cameras. When we get frame data, we let our surfaceView know about it, and the surfaceView will take care of updating itself. onXyzIjAvailable gets called every time a new point cloud is available.
These callbacks are not necessarily delivered at the same time, and they are not necessarily delivered on the same thread; there are no promises made there.
public void onFrameAvailable(int cameraId) {
// Check if frame is for the right camera
if (cameraId == TangoCameraIntrinsics.TANGO_CAMERA_COLOR) {
// Mark a camera frame is available for rendering
isFrameAvailableTangoThread.set(true);
surfaceView.requestRender();
}
}
public void onXyzIjAvailable(TangoXyzIjData xyzIj) {
// Save the cloud and point data for later use.
tangoPointCloudManager.updateXyzIj(xyzIj);
}
Here we use a Tango support class, TangoPointCloudManager, to store the point cloud information for later use, and then we try to match up frames from the camera.
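For reference, here is roughly what that matching-up looks like on the GL thread side; in the sample it runs once per rendered frame, and the surrounding callback wiring is omitted, so treat this as a sketch. updateTexture() copies the latest RGB frame into the camera texture and returns its timestamp, which is where the rgbTimestampGlThread value used by the touch handler below comes from.
// Runs on the OpenGL thread, once per rendered frame (sketch).
synchronized (this) {
    if (isFrameAvailableTangoThread.compareAndSet(true, false)) {
        // Bind the newest RGB frame to the camera texture and remember its
        // timestamp so touch events can be matched against this exact frame.
        rgbTimestampGlThread =
                tango.updateTexture(TangoCameraIntrinsics.TANGO_CAMERA_COLOR);
    }
}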
You might be wondering what work is needed to turn a touch event on a screen into a plane in 3D space. It is a lot less than you might think. When a user taps on the screen, we call the doFitPlane
method with a pair of coordinates, u and v, and a timestamp. The timestamp is necessary to match up that touch event with the point cloud data and the camera data. We first ask the point cloud manager for the latest point cloud, in the doFitPlane
method. Then using the Tango support classes, we get an instance of Tango pose data calculated for us. This relative pose tells us where the color camera is relative to the depth information.
Tango has all the facilities to match up this information, which would otherwise be a bit disjointed. We still need to manipulate it, but we are not doing any of the actual work. fitPlaneModelNearClick gets us the plane at the touch coordinates we provided. We only need to keep track of which is which, pass things to the right methods, and follow the process. This is almost boilerplate code, really.
One last conversion step is needed to map from the Tango coordinate system into an OpenGL-based one. Focus on the transform: we take the intersection point, the plane model, and the matrix, and combining them gives us, at the very end, an openGlTPlane. That is the surface where the user clicked in the world: you tap on the screen, and it tells you where, in 3D in front of you, that tap has landed.
public boolean onTouch(View view, MotionEvent motionEvent) {
if (motionEvent.getAction() == MotionEvent.ACTION_UP) {
// Calculate click location in u,v (0;1) coordinates.
float u = motionEvent.getX() / view.getWidth();
float v = motionEvent.getY() / view.getHeight();
try {
float[] planeFitTransform;
synchronized (this) {
planeFitTransform = doFitPlane(u, v, rgbTimestampGlThread);
}
if (planeFitTransform != null) {
// Update the position of the rendered cube
// to the pose of the detected plane
renderer.updateObjectPose(planeFitTransform);
}
} catch (TangoException t) {
...
/**
 * Use the TangoSupport library with point cloud data to calculate the plane
 * of the world feature pointed at the location the camera is looking.
 * It returns the transform of the fitted plane in a float array.
 */
private float[] doFitPlane(float u, float v, double rgbTimestamp) {
TangoXyzIjData xyzIj = tangoPointCloudManager.getLatestXyzIj();
if (xyzIj == null) {
return null;
}
...
// We need to calculate the transform between the color camera at the
// time the user clicked, and the depth camera at the time the depth
// cloud was acquired.
TangoPoseData colorTdepthPose =
TangoSupport.calculateRelativePose(
rgbTimestamp, TangoPoseData.COORDINATE_FRAME_CAMERA_COLOR,
xyzIj.timestamp, TangoPoseData.COORDINATE_FRAME_CAMERA_DEPTH);
// Perform plane fitting with the latest available point cloud data.
IntersectionPointPlaneModelPair intersectionPointPlaneModelPair =
TangoSupport.fitPlaneModelNearClick(
xyzIj, tangoCameraIntrinsics, colorTdepthPose, u, v);
// Get the transform from depth camera to OpenGL world at
// the timestamp of the cloud.
TangoMatrixTransformData transform =
TangoSupport.getMatrixTransformAtTime(
xyzIj.timestamp,
TangoPoseData.COORDINATE_FRAME_START_OF_SERVICE,
TangoPoseData.COORDINATE_FRAME_CAMERA_DEPTH,
TANGO_SUPPORT_ENGINE_OPENGL,
TANGO_SUPPORT_ENGINE_TANGO);
if (transform.statusCode == TangoPoseData.POSE_VALID) {
float[] openGlTPlane = calculatePlaneTransform(
intersectionPointPlaneModelPair.intersectionPoint,
intersectionPointPlaneModelPair.planeModel, transform.matrix);
return openGlTPlane;
} else {
...
How do we actually render all of this nice information on screen? Thankfully, we have a nice little library called Rajawali.
Rajawali (13:11)
Going back to our touch method in our activity, we can see that we are passing the coordinates we have just calculated to a renderer object. The renderer is where most of the Rajawali code lives. Without venturing too deep into OpenGL territory, it is good to understand that Rajawali is built on top of OpenGL ES for Android, and it follows a similar object model to what you see here.
public boolean onTouch(View view, MotionEvent motionEvent) {
if (motionEvent.getAction() == MotionEvent.ACTION_UP) {
// Calculate click location in u,v (0;1) coordinates.
float u = motionEvent.getX() / view.getWidth();
float v = motionEvent.getY() / view.getHeight();
try {
float[] planeFitTransform;
synchronized (this) {
planeFitTransform = doFitPlane(u, v, rgbTimestampGlThread);
}
if (planeFitTransform != null) {
// Update the position of the rendered cube
// to the pose of the detected plane
renderer.updateObjectPose(planeFitTransform);
}
} catch (TangoException t) {
...
You have something called GLSurfaceView, a descendant of View: it is the thing placed on your screen where the rendering takes place. And you have something called GLSurfaceView.Renderer (which you will be indirectly implementing), where all the drawing happens and all the rendering calls are made.
On Android, GLSurfaceViews are special views that take the renderer object we just mentioned. But rendering does not happen on the main thread: GL surfaces have their own dedicated thread, to make sure they stay performant and can hit those 60 frames per second. It is not something you typically have to deal with, but when you start passing information to your renderer you need to be careful, because you can run into weird race conditions: you are basically working with a second main thread. A second UI thread is typically not a big problem if you pay attention to it, but it is definitely worth knowing about.
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_main);
logTextView = (TextView) findViewById(R.id.log_text);
surfaceView = new RajawaliSurfaceView(this);
renderer = new ControlRoomRenderer(this);
surfaceView.setOnTouchListener(this);
surfaceView.setSurfaceRenderer(renderer);
((LinearLayout)findViewById(R.id.parent)).addView(surfaceView);
tangoPointCloudManager = new TangoPointCloudManager();
}
protected void initScene() {
// A quad covering the whole background, where the
// Tango color camera contents will be rendered.
ScreenQuad backgroundQuad = new ScreenQuad();
Material tangoCameraMaterial = new Material();
tangoCameraMaterial.setColorInfluence(0);
// We need to use Rajawali's {@code StreamingTexture} to set up
// GL_TEXTURE_EXTERNAL_OES rendering
tangoCameraTexture =
new StreamingTexture("camera",
(StreamingTexture.ISurfaceListener) null);
try {
tangoCameraMaterial.addTexture(tangoCameraTexture);
backgroundQuad.setMaterial(tangoCameraMaterial);
} catch (ATexture.TextureException e) {
Log.e(TAG, "Exception creating texture for RGB camera contents", e);
}
getCurrentScene().addChildAt(backgroundQuad, 0);
// Add a directional light in an arbitrary direction.
DirectionalLight light = new DirectionalLight(1, -0.5, -1);
light.setColor(1, 1, 1);
light.setPower(1.2f);
light.setPosition(0, 10, 0);
getCurrentScene().addLight(light);
}
private Material buildMaterial(int color) {
Material material = new Material();
material.setColor(color);
material.enableLighting(true);
material.setDiffuseMethod(new DiffuseMethod.Lambert());
material.setSpecularMethod(new SpecularMethod.Phong());
return material;
}
// Build a Sphere
sphere = new Sphere(0.25f,20,20);
sphere.setMaterial(sphereMaterial);
sphere.setPosition(0, 0, 0);
sphere.setVisible(false);
getCurrentScene().addChild(sphere);
/**
* Save the updated plane fit pose to update the AR object on the next render pass.
* This is synchronized against concurrent access in the render loop above.
*/
public synchronized void updateObjectPose(float[] planeFitTransform) {
objectTransform = new Matrix4(planeFitTransform);
objectPoseUpdated = true;
}
protected void onRender(long elapsedRealTime, double deltaTime) {
// Update the AR object if necessary
// Synchronize against concurrent access with the setter below.
synchronized (this) {
if (objectPoseUpdated) {
sphere.setPosition(objectTransform.getTranslation());
sphere.setOrientation(
new Quaternion().fromMatrix(objectTransform).conjugate());
sphere.moveForward(0.25f);
sphere.setVisible(true);
objectPoseUpdated = false;
}
}
super.onRender(elapsedRealTime, deltaTime);
}
We can see here the onCreate method, where we create our surface view instance: the RajawaliSurfaceView, which is a descendant of GLSurfaceView. We assign a new renderer to it (the renderer does the draw calls and the actual work).
Rajawali renderers are where most of the magic happens. Much of the code you are going to write for a Rajawali renderer goes into a method called initScene, which you override and implement. You can see here how we hook up the video camera feed for the Tango example: we want to overlay objects on our video feed, so we need to show that video feed somewhere.
The first building block of AR is here: we want to see the real world. The backgroundQuad object we create here is the first thing added to the scene, and you can think of it as a projection screen sitting in the backdrop. That is where all the pixels from the camera video feed are drawn. Then we initialize a couple of different components needed for the 3D scene that will be overlaid on top of all this.
You always need a source of lighting in 3D. A directional light acts very much like the sun: at construction we specify the direction it points towards, and then we can configure the light's color (typically we leave it white), its power, and its position. Then we add it to the scene, which is the top-level object we deal with in Rajawali; that is where everything lives.
Materials are very important, as they are what our 3D objects are made of. When light hits an object, it is reflected in a certain way, and the material decides how that happens. You have many options available: you can set a color or a texture, and whatever lighting effects you want. You can decide whether you want your object to react to light or only show a texture. Materials are then assigned to objects.
We see an example: a sphere material being assigned to a Sphere object. The Sphere class makes it fairly simple to create basic geometric shapes in 3D. The first parameter is the radius of the sphere, and the other two parameters are the number of segments composing it: more segments give you a smoother but more complex object, fewer give you a rougher one. We set the material; we set the position to zero because, in the examples we are going to show, we do not actually show the objects until they are placed into the world, which is also why we set them to visible false. You can see parallels with views here. Then we get the current scene and add the sphere as a child, and our sphere is now part of our hierarchy.
Posing an object (a plane, a geometric shape, or a more complex 3D object) involves the same procedure each time: you calculate a plane-fit transform from the screen touch event and call updateObjectPose on the renderer.
Here we assign the transform that says where the object will land, and we flag the change for the render loop that runs later on. The render loop looks at that flag and goes, "I have something to do now, I should be placing that object." Very often, when you create a scene, you might not even implement onRender at all in your renderer.
First we set the position; we will see that the sample code places a few things in the scene. Once we are done, we set objectPoseUpdated back to false and wait for other events before we try to change things around, so as not to waste any cycles. Finally, we call our parent's onRender, which does all the hard work of displaying the objects we created and placed in that scene.
Some extra operations can be necessary to properly place virtual objects in the real world. For example, orientation is awesome if you grab a plane on the wall with Tango, but it is not always what you might expect. You may have to create affordances. For a piece of furniture, you might have to create some way for the user to click twice and reposition, or rotate that plane on the floor.
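As a sketch of what such an affordance could look like, a second tap might simply spin the already-placed object around the plane normal. The gesture wiring and the 15-degree step are arbitrary choices of mine; rotate() is the Rajawali Object3D call I would reach for, but verify the exact signature.
// Hypothetical affordance: nudge the placed sphere around its local Y axis
// (the plane normal once the plane-fit transform has been applied).
public synchronized void rotatePlacedObject() {
    if (sphere != null && sphere.isVisible()) {
        sphere.rotate(Vector3.Axis.Y, 15.0); // degrees; step size is arbitrary
    }
}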
public void updateRenderCameraPose(TangoPoseData cameraPose) {
    float[] translation = cameraPose.getTranslationAsFloats();
    float[] rotation = cameraPose.getRotationAsFloats();
    getCurrentCamera().setPosition(translation[0], translation[1], translation[2]);
    Quaternion quaternion = new Quaternion(
            rotation[3], rotation[0], rotation[1], rotation[2]);
    getCurrentCamera().setRotation(quaternion.conjugate());
}
/**
* Sets the projection matrix for the scene camera to match the
* parameters of the color camera,
* provided by the {@code TangoCameraIntrinsics}.
*/
public void setProjectionMatrix(TangoCameraIntrinsics intrinsics) {
Matrix4 projectionMatrix = ScenePoseCalculator.calculateProjectionMatrix(
intrinsics.width, intrinsics.height,
intrinsics.fx, intrinsics.fy, intrinsics.cx, intrinsics.cy);
getCurrentCamera().setProjectionMatrix(projectionMatrix);
}
The Tango tablet itself is moving through space. Its view on the world needs to constantly update for the augmented reality effect to be complete. updateRenderCameraPose
receives the device’s pose data. We have the position, the orientation, and you will note that the orientation data needs some extra massaging to be compatible with Rajawali.
There are a couple more manipulations needed for the camera and the AR rendering to stay in tune. Thankfully, the support libraries help keep everything in sync.
Blender (23:55)
Blender is a free, open-source 3D editor, and it is my editor of choice.
Rajawali supports a decent number of 3D file formats, but when I tested them, the provided parsers proved a bit finicky. I landed on using OBJ files. That works; it is rock solid. It does not have all the nice features you might want if you are a professional animator, but it will get you most of the way.
OBJ is one of the formats that Rajawali seems to understand with the smallest amount of drama, and Blender is able to import and export it nicely. Blender is complicated, but the good news is you can get your minimum viable dose of information quickly.
private Object3D buildOBJ() {
Object3D o;
LoaderOBJ objParser = new LoaderOBJ(
mContext.getResources(), mTextureManager,
R.raw.simple_tree_obj);
try {
objParser.parse();
} catch (ParsingException e) {
e.printStackTrace();
}
o = objParser.getParsedObject();
o.setPosition(0,-8,-1);
getCurrentScene().addChild(o);
Animation3D anim = new RotateOnAxisAnimation(Vector3.Axis.Y, 360);
anim.setDurationMilliseconds(16000);
anim.setRepeatMode(Animation.RepeatMode.INFINITE);
anim.setTransformable3D(o);
getCurrentScene().registerAnimation(anim);
anim.play();
return o;
}
In the File menu, there is an import option. If you know about Google's Cardboard Design Lab, they released a Unity project that contains all of its models: nice little low-polycount models that let you build mountain-type environment scenes. Blender supports importing these FBX files.
Demo (33:48)
See the above video for a demo of putting these concepts into action.
Q&A (40:50)
Q: Is Tango hardware required for any of this?

Etienne: For the AR component, sadly, yes. There are some SDKs that allow you to do something similar with computer vision. There is something called Vuforia, but I would not necessarily recommend jumping into it; it is a multiplatform SDK, and it is wonky on Android, in my opinion. OpenCV is probably something I would look at. If we look at Pokemon Go, when you set up AR, that is the technique it is using, so you can expect that kind of result; if you think it is clunky, or maybe not that great, then that is probably what you are going to get out of OpenCV. But there are ways of doing this. Now, anything that is pure 3D (which is why I also focused on Daydream and the Google VR SDK) works fine on any regular device. All you need is a Cardboard headset, or, if you want to buy a fancy one, it might be worth mentioning that those are actually what the people working on Daydream are using with their Nexus 6Ps. If you have a spare Nexus 5 hanging around, you can use it as a remote, and you have the full stack to start developing right now.