Text-To-3D

Fletch
Artificial Intelligence in Plain English
6 min read · Feb 20, 2024


Source: Adobe Stock

Generative AI using machine learning is fast becoming a part of daily life for many computer-aided tasks; in the film industry, advanced AI techniques are already being used in software such as Houdini.

Most recently it’s Stable Diffusion that has stolen the show, in a domain where Generative Adversarial Networks (GANs) were dominant for so long, producing the stunning images we see from online services such as CivitAI and NightCafe.

But Stable Diffusion, hand in hand with language models powered by Transformer networks, has opened the door to an interesting new area of machine learning research: generative 3D. Assets are typically generated from a text prompt or an image, hence the two new terms taking 3D asset creation by storm: text-to-3D and image-to-3D.

Stable-Dreamfusion can be credited as the turning point for mass adoption of such generative models, although more recently the state of the art (SOTA) in this domain has been maintained by a project named threestudio.

A couple of online services currently make text-to-3D and image-to-3D a simple one-click experience: you input a text prompt or an image, hit return, and get a selection of results to choose from and refine into a higher-quality asset. Most notably, Luma Genie is leading the forefront of this industry at the moment, with the best-quality results and a completely free service, having recently been awarded a third round of funding totalling $68.5 million if you include the initial two rounds.

There are other services too, such as Meshy.AI, Sudo’s 3Dgen and Tripo3D, which are equally capable but lack some of the finer details that Luma Genie provides, such as a mesh re-topology service. This can be quite an important factor, because the outputs of these services tend to be a mess of uniformly spaced triangles, which is not pleasant to work with or UV unwrap. There’s no reason, though, why you can’t do this as an external post-process using 3D re-topology software such as 3DCoat or Quad Remesher.

Even if you’re not looking to generate anything specific yourself, these services can be a great source of free 3D content for your project, be it film, games, animation, you name it. You will usually find the Discord servers for these projects full of 3D content that other people have generated from text prompts and images, although the laws on copyright vary wildly from country to country, a topic I will cover later in this article.

So how does it all work? Typically the process goes like this:

You start by generating a regular Stable Diffusion output from a text prompt such as “full body of a woman tpose”.

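If you want to reproduce this step locally, a minimal sketch using Hugging Face’s diffusers library might look like the following (the checkpoint name is just illustrative; any Stable Diffusion checkpoint will do):

```python
# Minimal text-to-image sketch using the diffusers library.
# Assumes a CUDA GPU; swap "cuda" for "cpu" if needed (much slower).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # illustrative; any SD checkpoint works
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# The prompt from the article.
image = pipe("full body of a woman tpose").images[0]
image.save("tpose.png")
```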

Once you have this image, you feed it into another network that has been trained on a dataset such as Objaverse-XL; most commonly this would be the Zero123plus model, although there are other similar variants such as the Zero123-XL model. These networks generate consistent images of the same object from different viewing angles.

Different view angles of the original image produced by Zero123plus
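Zero123plus publishes a community pipeline for diffusers, so this step can be sketched roughly along these lines (model and pipeline IDs as published by the sudo-ai team; treat the exact arguments as indicative rather than authoritative):

```python
# Rough sketch: generate multi-view images from a single input image
# with Zero123plus via its diffusers community pipeline.
import torch
from PIL import Image
from diffusers import DiffusionPipeline, EulerAncestralDiscreteScheduler

pipe = DiffusionPipeline.from_pretrained(
    "sudo-ai/zero123plus-v1.2",
    custom_pipeline="sudo-ai/zero123plus-pipeline",
    torch_dtype=torch.float16,
)
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(
    pipe.scheduler.config, timestep_spacing="trailing"
)
pipe = pipe.to("cuda")

cond = Image.open("tpose.png")  # the Stable Diffusion output from the previous step
result = pipe(cond, num_inference_steps=75).images[0]

# Zero123plus returns its fixed set of novel views tiled into a single
# image, which downstream tools then split apart.
result.save("views.png")
```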

These images of multiple viewing angles are then fed into a Neural Radiance Field (NeRF), which outputs a point cloud of densities. NerfAcc by Nerfstudio is commonly used for this purpose by most projects, although Nvidia’s instant-ngp is faster at the expense of quality.
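Under the hood, this step is classic volume rendering: sample points along each camera ray, query the network for a density and colour at each point, and accumulate. Here is a toy version in plain PyTorch; the density and colour networks are stand-ins, and NerfAcc replaces this naive loop with occupancy grids and fused CUDA kernels:

```python
# Toy volume rendering along a batch of rays, in plain PyTorch.
# density_fn and rgb_fn are stand-ins for the trained NeRF network.
import torch

def render_rays(density_fn, rgb_fn, origins, dirs, near=0.1, far=3.0, n_samples=64):
    # Sample depths uniformly along each ray: shape (n_samples,)
    t = torch.linspace(near, far, n_samples)
    # Points in space: (n_rays, n_samples, 3)
    pts = origins[:, None, :] + t[None, :, None] * dirs[:, None, :]

    sigma = density_fn(pts)          # (n_rays, n_samples) densities
    rgb = rgb_fn(pts)                # (n_rays, n_samples, 3) colours

    delta = (far - near) / n_samples # constant step size between samples
    alpha = 1.0 - torch.exp(-sigma * delta)
    # Transmittance: probability the ray reaches each sample unoccluded.
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha[:, :-1]], dim=1),
        dim=1,
    )
    weights = alpha * trans          # contribution of each sample
    color = (weights[..., None] * rgb).sum(dim=1)  # (n_rays, 3)
    # The weights double as the per-point densities that form the point cloud.
    return color, weights
```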

Finally, the point cloud is turned into a triangulated mesh using Nvidia’s DMTet.
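DMTet itself is a differentiable marching-tetrahedra layer shipped with Nvidia’s Kaolin library. To get a feel for the underlying idea, here is a non-differentiable stand-in that extracts a surface from a density grid with plain marching cubes via scikit-image (the grid and threshold are invented for illustration):

```python
# Illustrative surface extraction from a density grid using marching
# cubes (the non-differentiable cousin of DMTet's marching tetrahedra).
import numpy as np
from skimage import measure

# Stand-in density volume: a sphere of radius 0.8 on a 64^3 grid.
xs = np.linspace(-1, 1, 64)
x, y, z = np.meshgrid(xs, xs, xs, indexing="ij")
density = 0.8 - np.sqrt(x**2 + y**2 + z**2)  # positive inside the sphere

# level=0.0 picks out the surface where the density crosses zero.
verts, faces, normals, values = measure.marching_cubes(density, level=0.0)
print(f"{len(verts)} vertices, {len(faces)} triangles")
```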

Future improvements to this technology would be in the area of edge refinement. Because these meshes are generated from point clouds, they tend to have soft edges, giving them all a baby-toy-like appearance when a lower-quality NeRF is performed. Increasing the number of rays reduces this effect somewhat, but the compute time spent quickly becomes disproportionate to the quality of the output model. As such, a post-process to detect and refine edges would go a long way, although this is easier said than done.

Retopology is important too, and as mentioned above, Luma Genie is already paving the way in this area.

So this leads us to the final topic of this article: the laws concerning AI-generated artworks. 🌶️🌶️🌶️ (I’ll only be covering a few countries.)

China

“a court in the People’s Republic of China decided that a work generated with Stable Diffusion had copyright, and therefore the author could sue for copyright infringement.”

Reference: Andres Guadamuz, “Chinese court declares that AI-generated image has copyright,” TechnoLlama, December 9, 2023.

United States of America

“A work of art created by artificial intelligence without any human input cannot be copyrighted under U.S. law, a U.S. court in Washington, D.C., has ruled.”

Reference: Blake Brittain, “AI-generated art cannot receive copyrights, US court says,” Reuters, August 21, 2023.

A friend of mine pointed out how silly this is: at what percentage of computer assistance does a work stop being eligible for copyright? Maybe the devil is in the details.

United Kingdom

“UK law allows for copyright in ‘computer-generated works’, with a reduced term of 50 years from when it was created (as opposed to the 70 years from the death of the author in the case of artwork created by a human)”

Reference: “UK Copyright in AI-Generated Artwork,” StrachanIP, September 14, 2023.

But this is where the law gets a little obscure. As the article points out, it is difficult to say who actually owns the artwork, because ownership would default down to the first owner of the copyright, and since the artwork is generated from a dataset of copyrighted human-made assets (Objaverse-XL), it’s not so clear where copyright would be attributed in a court of law.

Australia

“Since this technology is so new, it is not clear that works created with the help of AI will be protected by copyright. As a general rule, a work can only be protected by copyright in Australia if there is a human author who contributed ‘independent intellectual effort’. Because of this, it is possible that works generated by AI which don’t have enough human input won’t be protected by copyright.”

Reference: “Artificial Intelligence (AI) and Copyright,” Artslaw.

Very much the same situation as in the United Kingdom: an obscure reference to how much human input was involved. Not ideal, and there is yet to be solid legislation outlining these parameters. This fact sheet from Legalvision may help.

I hope you found the read informative. If you would like to learn more about this topic, I have more detailed information in my Itch.io forum thread here. Or, if you’d just like to download some free 3D assets generated by Luma Genie and Meshy.AI, I have two hand-picked asset packs of the best content produced by these services over a specific time period here and here.
