Moderated by: Yiliang Zhao & Amy Zhao

Openforum Introduction

The Openspace Ventures team consists of 24 members from 12 different nationalities. Before venture capital, we came from different backgrounds such as technology, private equity, banking, strategy consulting and media. Every week we get together virtually to “shoot the breeze” in a moderated session on a topic of interest. Sometimes it is core to our business of investing in Southeast Asian technology. Other times it is tangential – either way it helps us to talk it through Openspace style and distill a few more things about our market, our companies and ourselves.

TikTok, The Level Playing Field

TikTok is a short-form video sharing App that allows users to create and share 15- to 60-second videos. Since launching in 2016, it has taken the world by storm, growing to 2 billion downloads and 800 million MAUs. While Facebook and Twitter took 10-15 years to reach 1 billion users, TikTok took slightly more than two years. It is probably one of the few things that Gen Z and 75-year-old politicians in the White House can both talk about (although their interest in the App is for very different reasons). From a quick show of hands, most of the Openspace team members have used TikTok, and some have even created content on the platform. The remainder, who are too skeptical to install the App on their phones, seem to hold the perennial fear that when they stare into TikTok, some “wicked” AI is staring back at them. Nonetheless, we thought it would be an interesting topic to explore, to distil some key insights and draw inspiration from this viral short video platform – possibly the only consumer internet company that has managed to pierce through cultural walls and amass a vast audience in both the US and China (and beyond) at such a rapid pace. Now, with Openspace’s in-house PhD and AI expert Dr. Zhao Yiliang, there seems to be no better time to do a deep dive into TikTok and perhaps attempt to deconstruct its mysterious AI algorithm.

Before we started the discussion, a survey was sent out to each team member with one question: “In one word or sentence, what do you think is TikTok’s magic/biggest success factor?”

Figure 1 shows the word cloud of survey responses. We have to admit that the answers are far more interesting than we initially expected (praise the Openspace creativity!). However, the biggest word is predictably “algorithm”. Let’s bear this in mind when we first explore the history and the product.

Figure 1 Word Cloud of Survey Responses

History & Timeline

Figure 2 History and Milestones of TikTok

In 2016, ByteDance started a short video App, Douyin (抖音), in China. Before that, another App from ByteDance, TouTiao (头条), had already emerged as the top news aggregator in China. TouTiao was one of the first news Apps to rely heavily on AI algorithms to generate a highly targeted news feed for its users. ByteDance drew inspiration from the success of TouTiao and applied that to Douyin. Meanwhile, in the US, a lip sync video platform, Musical.ly, founded by Chinese entrepreneurs Alex Zhu and Louis Yang, was taking the teenage world by storm.

In 2017, ByteDance launched Douyin’s international version TikTok for the global market and subsequently bought Musical.ly for $1 billion in the same year. Rather interestingly, instead of merging the two Apps at the back end, ByteDance kept TikTok as a separate App and asked Musical.ly users (~100 million then) to sign up on TikTok. They also spent $1 billion in marketing campaigns everywhere to ask users to download TikTok.

Fast forward to today, TikTok has achieved some amazing numbers:

Figure 3 Growth Statistics of TikTok

TikTok’s MAUs grew much faster than the rest of the platforms. In the US market, its MAU number is much higher than that of Instagram and overtook Twitter and WhatsApp in September 2020.

Figure 4 Comparisons with other mainstream social network platforms

Product & Business Model

TikTok’s interface is a vertical video feed that occupies the entire screen, with only a few buttons through which people can comment on, share, like and view videos and follow the creator. At the bottom are the messenger, create and user profile buttons. It is very intuitive and easy to navigate.

Figure 5 TikTok / Douyin’s Intuitive UI

There are four key components of TikTok’s value proposition:

  1. Creativity: people who wanted to create videos used to have a hard time finding a tool that lets them quickly create very cute or funny videos with a lot of filters and other functionalities.
  2. Music: it has a very large bank of music which users can use to create their videos.
  3. Social media (social feature): users can follow people and share videos, and all shared videos are watermarked with the TikTok logo, so people can go back to the platform and find the creators. In other words, the watermark turns all the mainstream social networks such as Facebook, Twitter and Instagram into marketing channels for TikTok.
  4. AI algorithm: the relevance of the short videos recommended is crucial to keeping users engaged and spending time on the App. We will elaborate on this in more detail later.

Figure 6 TikTok Value Proposition
  • Brand takeover. Brand takeover ads appear upon opening the App, presenting a full-screen video to the targeted audience. They are one of TikTok’s best ad options for delivering mass awareness and driving direct sales, since advertisers can place their messaging right in front of their target audience.
  • In-feed video ads are the best way to promote a video on a large scale to an engaged audience. These native ad units perform best with vertical video material, and they have a clear call to action that can link through to a page or App of choice from content creators.
  • Hashtag challenges are one of the biggest sources of ad income for TikTok. As on platforms like Instagram, hashtags play a significant role on TikTok. They are the primary way users share and find content, and also how communities are built. There are endless hashtag challenges to be found in the discovery section. Basically, brands don’t even have to create their own ad campaign; by participating in the hashtag challenges, users create their own videos to feature the brand or campaign, with the hashtag as a title. For example, if Uniqlo wants to create a campaign to spread positivity during coronavirus using #UniqloSpreadLove, someone can create a video with the Uniqlo logo to share on the platform. Essentially, by participating in the challenge, users spread the message for the ad sponsors.
  • Branded lenses. Colgate has done a lens/filter on TikTok which makes your teeth look super white when you put it on your face. This is also a popular way for brands to market on TikTok.

Figure 7 Advertising strategy of TikTok

Other than ads, TikTok also generates revenue from virtual gifting, similar to what Kumu is doing. Users can buy TikTok coins and use them to send little emojis to the people in the video or to the content creators.

Another route of monetization is social commerce, where viewers watch a video with embedded links for products featured in the video. The link takes the viewers directly to an e-commerce page, where they can finish the entire purchase process and get it delivered.

Figure 8 TikTok’s Virtual Gifts and Social Commerce

 

Underlying AI Algorithm and LifeCycle of TikTok Videos

Below is a simplified illustration of the basic philosophy of the recommendation system that most of the social media or social networks are using.

Figure 9 Basics of Recommendation System

Figure 9 shows two basic approaches. The first one is collaborative filtering: if you and another person like many of the same things, then something else that you like may also be liked by that person, who has preferences similar to yours. A simple way to implement this is to convert the bipartite user-item network into a network where the connection between two people is weighted by the similarity of their preferences. Based on these similarities we can then provide recommendations. The same idea can be applied to item-based recommendation, where we recommend items to people based on item-item similarities.
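To make the user-based variant described above concrete, here is a minimal sketch in Python. The data, the function names and the choice of cosine similarity are our own illustrative assumptions; the discussion did not cover how any real platform computes similarity.

```python
from math import sqrt

# Hypothetical toy data: the bipartite network of users and the topics they liked.
likes = {
    "ann": {"cats", "cooking", "dance"},
    "ben": {"cats", "cooking", "travel"},
    "cal": {"chess", "travel"},
}

def cosine_similarity(a, b):
    """Cosine similarity between two sets of liked items (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

def recommend(user, likes, top_n=3):
    """Score items the user has not liked by the similarity of the users who did."""
    scores = {}
    for other, items in likes.items():
        if other == user:
            continue
        sim = cosine_similarity(likes[user], items)
        for item in items - likes[user]:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties broken alphabetically so the output is stable.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, score in ranked[:top_n] if score > 0]

print(recommend("ann", likes))
```

Here "ann" shares two likes with "ben" and none with "cal", so only ben's remaining like ("travel") is recommended to her.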

The other one is content-based filtering. For example, if a user reads and likes a book, content-based filtering will recommend books that are similar to the book liked by the user.

The problem with content-based filtering is the lack of variety, as it keeps pushing similar content. If a user has watched 50 or 100 videos that are very similar to each other, chances are he/she will get bored. At that point it is better to recommend something else, or even something very different, to the user.
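The sketch below illustrates content-based filtering and the variety problem just described. The catalog, the Jaccard similarity measure and the `explore_slots` parameter are hypothetical choices of ours for illustration, not a description of how any platform handles diversity.

```python
# Hypothetical catalog: each item is described by a set of content tags.
catalog = {
    "book_a": {"sci-fi", "space"},
    "book_b": {"sci-fi", "space", "war"},
    "book_c": {"sci-fi", "robots"},
    "book_d": {"cooking", "italian"},
}

def jaccard(a, b):
    """Jaccard similarity between two tag sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def content_based(liked, catalog, top_n=2, explore_slots=0):
    """Recommend items similar to `liked`; optionally reserve slots for variety."""
    explore_slots = min(explore_slots, top_n)
    profile = catalog[liked]
    ranked = sorted(
        (k for k in catalog if k != liked),
        key=lambda k: (-jaccard(profile, catalog[k]), k),
    )
    similar = ranked[: top_n - explore_slots]
    # Fill the reserved slots from the least similar end of the ranking.
    remaining = ranked[top_n - explore_slots:]
    explore = remaining[::-1][:explore_slots]
    return similar + explore

print(content_based("book_a", catalog))                   # pure similarity
print(content_based("book_a", catalog, explore_slots=1))  # one slot for variety
```

With no exploration the reader of a sci-fi book only ever sees more sci-fi; reserving one slot surfaces the cooking book instead.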

Coming back to TikTok, what they do is in fact not very complicated. They use machine learning algorithms to tag both the users and the content with concepts from the same space. The subsequent matchmaking is very straightforward: they match content to viewers based on whether the two are tagged with the same or similar concepts. Needless to say, the actual implementation is more involved. With its army of data scientists and software engineers, TikTok can create multiple machine learning models at different levels to handle information in a more consistent and comprehensive manner. That is to say, the AI algorithm is not an uncrackable code; it is the volume of training data the platform generates, plus being an early starter, that allowed TikTok to achieve exponential growth.
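A minimal sketch of this tag-and-match idea follows. The concept names, the weights and the dot-product scoring are all assumptions made for illustration; in practice the tags would come from ML models rather than a hand-written dictionary.

```python
# Hypothetical concept weights, as an ML tagger might assign them.
user_tags = {"dance": 0.7, "pets": 0.3}
videos = {
    "v1": {"dance": 0.9, "music": 0.1},
    "v2": {"pets": 0.8, "comedy": 0.2},
    "v3": {"cooking": 1.0},
}

def match_score(user, video):
    """Dot product over the concepts that the user and the video share."""
    return sum(weight * video[c] for c, weight in user.items() if c in video)

def rank_videos(user, videos):
    """Order videos by how well their tags match the user's tags."""
    scored = {vid: match_score(user, tags) for vid, tags in videos.items()}
    return sorted(scored, key=lambda v: (-scored[v], v))

print(rank_videos(user_tags, videos))
```

Because users and videos live in the same concept space, matching needs no understanding of the video itself, only of its tags.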

Figure 10 Life Cycle of TikTok videos

Figure 10 shows the life cycle of a video. What will happen to a newly created video? An important thing to note is that, when putting a video through the recommendation engine, TikTok does not take into consideration the creator’s existing popularity information, meaning even if someone has already created a lot of popular videos, TikTok does not automatically assign a high score to his/her next video.

This is how the recommendation process works on TikTok: after a video is uploaded, and provided no violation is detected, the video is checked for duplication. If it is very similar to a bunch of other videos already in the database, TikTok will only show it to a limited audience, usually the creator and his/her fans, meaning there is limited traffic. Otherwise, the video is sent to an initial traffic pool. The algorithm randomly selects 200 to 300 people, shows them the video and monitors its performance. A few metrics are tracked, such as the completion percentage, number of likes, amount of interaction, share count, etc. These metrics are then mapped to a final score assigned to the video. The final score is used to determine whether to push the video into a higher-order traffic pool, e.g. a pool of 10,000 viewers. Videos that consistently score high are then pushed to an even bigger traffic pool, and so on. This is how a video with high quality content goes viral in a very short period of time, maybe within a day or a few hours. This cycle continues for about one week, after which the popular video is put into a cooldown state with limited traffic, or is perhaps only available to the creator and his/her friends.

Videos with low scores after the initial few iterations may be put into what is called the “grave”. They still have a chance to become popular again if the same content creator creates another popular video and people refer back to his/her page for prior videos.
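The life cycle above can be sketched as a small scoring-and-promotion routine. Only the 200 to 300 viewer initial pool and the 10,000 pool come from the discussion; the third pool, the metric weights, the promotion threshold and the rule of sending a video to the grave after a single weak round are simplifying assumptions of ours.

```python
from dataclasses import dataclass

# Pool sizes: 300 and 10,000 follow the text; the last tier is illustrative.
POOLS = [300, 10_000, 100_000]
# Assumed weights and threshold, chosen only to make the example concrete.
WEIGHTS = {"completion": 0.4, "likes": 0.2, "comments": 0.2, "shares": 0.2}
PROMOTE_AT = 0.3

@dataclass
class PoolStats:
    views: int
    completions: int
    likes: int
    comments: int
    shares: int

def score(stats):
    """Map per-pool engagement rates to one score. Creator popularity is not an input."""
    if stats.views == 0:
        return 0.0
    rates = {
        "completion": stats.completions / stats.views,
        "likes": stats.likes / stats.views,
        "comments": stats.comments / stats.views,
        "shares": stats.shares / stats.views,
    }
    return sum(WEIGHTS[k] * rates[k] for k in WEIGHTS)

def next_step(pool_index, stats):
    """Return the next pool index, or 'grave' / 'cooldown'."""
    if score(stats) < PROMOTE_AT:
        return "grave"
    if pool_index + 1 < len(POOLS):
        return pool_index + 1
    return "cooldown"

strong = PoolStats(views=300, completions=210, likes=60, comments=15, shares=15)
weak = PoolStats(views=300, completions=60, likes=6, comments=0, shares=0)
print(next_step(0, strong), next_step(0, weak), next_step(2, strong))
```

Note that `score` looks only at how the current pool reacted, which is what makes the process the same for a first-time creator and an established one.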

The above process tells us that TikTok does not treat videos from new content creators differently from those of existing popular content creators. This is one of the key reasons why new content creators without massive existing followings are attracted to TikTok – everyone starts equal, and everyone gets unlimited chances. This also further motivates creators to continuously churn out high quality content.

TikTok, the level playing field.

How hard is it to replicate the algorithm?

It is not hard. Product features are very easy to copy. Standard machine learning models are used in the various stages of the whole pipeline. For example, concept detection, tagging and preference matching are each handled by a corresponding ML model. It may only take 6 months to code up everything before releasing it. However, it takes time to come up with a comprehensive design of the whole architecture, as well as enough data to kick off the flywheel.

It is not that competitors cannot replicate it; rather, the algorithm itself is not the only reason for TikTok’s success. The company also has great product and operation teams, which really understand users and the content creation process without necessarily having to understand the content itself. In fact, almost all their engineers are in China, yet the recommendation algorithms work very well for Hindi videos, where the engineers won’t understand a single thing of what the story is about.

In addition, the operation and content creation teams work extensively with top content creators as well as upcoming ones. They target the top 10% of content creators and attach product team members to them to help improve the content and optimize the metrics. Examples of help include advice on how to film better, so that the content improves and is able to attract more people.

What are the key differences between TikTok and other social media platforms?

Firstly, their unique onboarding process. Growth teams at many social network platforms always talk about the “magic moment”. In the case of Facebook, the magic moment is three friends in five days – if a user has three friends within five days, he/she is expected to have enough content in the feed and to understand Facebook’s functions and value proposition, so the retention rate of these users will be substantially higher. In the early days of TikTok the magic moment was simply “opening new videos”: the platform just keeps showing funny videos, the user keeps swiping to the next one and the next, and he/she instantly gets it. Only then is he/she asked to sign up.

Secondly, consistency of the content: it is snackable. The 15-second video length makes the use of TikTok addictive. Compare this to Facebook or Twitter, where a mixture of content (friends’ cat videos, politics, cooking demonstrations, etc.) is shown to users. That might be addictive in its own way, but it is not as snackable and it does not drive the same chemical release as something so casual that users can turn off 99% of their brains. YouTube is also addictive but not snackable. After a 40-minute documentary, YouTube may feed the viewer another long video, which may or may not get watched.

Thirdly, an explicit following/followed relationship (social graph) is not required for TikTok users. Recall that the reason young kids moved from Facebook to Snapchat was that all their parents started signing up on Facebook. TikTok takes this one step further: simply by swiping through a couple of videos, a user will quickly find things or people with common interests and can branch out into his/her own new space. Even if the parents are on TikTok, they will be in completely different spaces from their kids. They won’t necessarily see what their kids see, and they won’t know what kind of video content their kids are interested in if their kids are not content creators.

Fourthly, TikTok is very much source agnostic. While videos posted by users with large followings are likely to get more views, neither follower count nor prior high-performing videos are direct factors in the recommendation system. This source agnostic approach to recommendations has been crucial to TikTok’s appeal, since it creates the possibility that a user’s video can be plucked out of relative obscurity and achieve virality seemingly out of nowhere. It has at least two advantages: 1) More new content creators are motivated to put effort into creating high quality content. 2) Viewers are more likely to receive recommendations that better match their preferences.

On the For You Page (FYP), each short video occupies the entire screen, which makes it easier for TikTok to capture how the user reacts to each video: how long the user stayed on it, whether the user shared it, whether the user decided to follow the content creator, and so on. TikTok can even tell when the user does not like a video. On Twitter or Facebook, with their vertically scrolling feeds, it is hard to tell when people do not like a particular piece of content. If a user stares at a Facebook comment for a long time, does it mean he/she likes it or does not like it? Facebook does not really know when a user doesn’t like something. It can only tell when a user clicks the like button, which makes it hard to capture the actual user sentiment.

But TikTok is very different. If a user swipes away from a video very quickly, TikTok knows immediately that this user does not like the content. From a bunch of interactions on these 15-second videos, TikTok re-trains the model using this feedback in order to feed users exactly the things they want, regardless of their culture, demographics, where they are from and what languages they speak. TikTok engineers do not even have to understand the videos to create accurate recommendation engines. This is something that Twitter, Instagram or Facebook just cannot do, even with great engineers, as they don’t have that kind of rich data on users’ sentiment and feedback on the content.
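As a rough illustration of how a single full-screen view could be turned into a training signal, consider the sketch below. The function name, the 20% quick-swipe cut-off and the label scale are hypothetical; the discussion only established that watch time, shares and follows are captured, not how they are weighted.

```python
# Assumed cut-off: leaving before 20% of the video counts as a clear dislike.
QUICK_SWIPE_FRACTION = 0.2

def feedback_label(watch_seconds, video_seconds, shared=False, followed=False):
    """Map one full-screen view to a training label between 0.0 and 1.0."""
    if video_seconds <= 0:
        raise ValueError("video_seconds must be positive")
    if shared or followed:
        return 1.0  # explicit positive action
    completion = min(watch_seconds / video_seconds, 1.0)
    if completion < QUICK_SWIPE_FRACTION:
        return 0.0  # swiped away almost immediately: implicit negative
    return completion

print(feedback_label(1, 15))               # quick swipe
print(feedback_label(7.5, 15))             # watched half
print(feedback_label(3, 15, shared=True))  # shared early
```

The point of the sketch is that every view yields a label, including a negative one, which a multi-item scrolling feed cannot provide as cleanly.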

The equipment used to record content influences the way the content is produced. Over the past hundred years, there were first films, which were very much stage focused. Subsequently, there was television, and then soap operas came up on television. Next came reality television, when cameras became cheaper and we could put cameras in everybody’s rooms. Now we have videos created using smartphones and TikTok, where a lot more focus is put on creating interactive content. This is the generation of people who grew up with mobile phones as content creation tools and who create content using mobile phones that enable special effects (music, filters, etc.). This is something that previous content creation processes did not have.

In addition, TikTok was able to catch the format shift well. Facebook initially did text-only updates on the web and really focused on building the website. Instagram became popular by enabling users to put filters on their photos. From the users’ perspective, it is very easy to just snap a photo using the phone and then upload it. This was a format shift from text to photo, just as mobile networks were starting to support the bandwidth to upload photos edited or created with great tools. Snapchat was the one that captured video well: it had the best filters and other tools for making fun, ephemeral videos. But the failing of all of them was that they relied on the social network, where a user only gets content from those that he/she follows or is connected to. TikTok also does mobile video, but the format shift here is that it presents content to users much more natively while relying less on network connections.

What do we learn and how do we apply these to Southeast Asia?

  1. Unbundling TikTok and Future Form of Discovery

TikTok’s single screen display + precise tagging + high frequency video churning design has managed to train the AI algorithm to near perfection, creating the flywheel of “more accurate prediction, more user engagement, more data to train the AI algorithm, more accurate prediction”, and so on. When the precise interests of users have been mapped out, the door to unlimited possibilities is opened. It is foreseeable that more industries could adopt this shift and move towards more efficient discovery – streaming, gaming, online education, digital insurance, job searching? The evolution of e-commerce has already seen a gradual shift from search-based (Lazada, Shopee), to social-based (Instagram, WhatsApp and Facebook shops) – and now with platforms like TikTok – to algorithm-based. It is still early days in SEA but it won’t be long before the trends start to shift.

The “tipping/virtual gifting” function is certainly not new, but it is still used primarily by live streaming and short video platforms. While ads or subscription revenues are still the main sources of income for a lot of consumer platforms, virtual gifting opens up a very interesting way of monetisation and could have much wider use cases. Imagine adding a tipping function on top of an online subscription platform – other than paying an annual fee for a course or journal, you could also tip your favourite author or online tutor with virtual gifts for their best pieces or classes. More flexibility and motivation are created when high quality content is further rewarded by user dollars.

  2. Future Form of Content – Catching New Waves

TikTok has created a new form of content that is highly interactive and captivating. With an excellent go-to-market strategy of starting with lip-sync videos and targeting Millennials and Gen Z, it was able to quickly gather momentum, disrupting the existing content consumption pattern. Looking at data from China alone, short-form video and live streaming are the fastest growing segment of screen time, expected to surpass instant messaging in 2025.

Figure 11 China Mobile Internet Users’ Avg. Daily Time on Core Apps

One can certainly argue that the trend will only accelerate with increased smartphone penetration and the highly effective TikTok flywheel spinning at an ever-faster speed. However, as previously mentioned, media and content is a rapidly evolving space with quick and mass consumer adoption patterns. When a certain form of entertainment is deemed passé or its social capital has been thoroughly mined, consumers are ruthless in turning towards new territories – the switching cost is not high. There is no doubt that new waves of content are forming, and they could catch on just as TikTok caught on in the first place. TikTok rode the mobile digital wave, but for all we know, mobile phones may no longer be the primary platform for content consumption; maybe we will grow dissatisfied with 2D visuals altogether and AR/VR will become the next interactive form of shopping, learning and engaging with our favourite influencers. Instead of doubling down on the TikTok model and chasing the wave that is running far ahead of us, perhaps it could pay off more to look beyond the horizon and catch the new waves that are emerging.

  3. Future Form of Social Interaction – A More Philosophical Question

Looking back at the word cloud of survey answers – what a map of gems! Taking away the glaring word “algorithm”, we cannot help but notice words like “dopamine”, “addiction” and “ADHD”, which are thought-provoking. There is no doubt that from a business and product perspective, TikTok has almost perfected its game – effective go-to-market with simple and easy content creation tools, a full screen FYP (For You Page), content cycling every 15 seconds (or less), precise capture of user sentiment – all wrapped into a powerful algorithm at the back end, deriving an interest graph without a social graph. The videos are so short that the continued swiping motion almost becomes muscle memory – users are stuck in the 15-second dopamine hit cycle, while creators are busy collecting the social capital they couldn’t otherwise get close to harnessing on other social media platforms. This seems to be the ultimate win for users, creators, and investors.

With endless snackable and addictive consumption and desensitized brains, we might ask ourselves: is this the closest we have come to soma from Brave New World?

“the warm, the richly coloured, the infinitely friendly world of soma-holiday.”

Looking around, modern-day depression has become a more serious issue, and the teenage suicide rate has climbed steadily since the arrival of social media. A 2019 study published by JAMA, which followed over 3,800 people, found that higher amounts of social media use were associated with higher levels of depression. That was true both when the researchers compared between people and when they compared each person against his/her own mental health over time. While technology continues to improve the way we interact, endless targeted entertainment and status seeking games are only going to get us so far. As tech investors, we have the ability to power platforms, and also the responsibility to make thoughtful long-term decisions that piece technology, society and humanity together in a harmonious way. Unlike TikTok, live streaming platforms like Kumu in the Philippines are more about finding like-minded people to build communities, inspiring ordinary people to share their stories and facilitating day-to-day life (finding produce, attending online church sessions, etc.).

In the view of Openspace, building real connections is a more sustainable way of interaction which technology can empower. As Emerging puts it:

“Perhaps the best defence is curtailing the worst abuses of human vulnerability and tweaking the arenas in which we play our hyper-charged status games. Maybe versions 2.0 of social will find the one thing humans crave more than status, is connection.

Maybe a better set of tools can help us play better games.”