Machine learning for SEOs | SearchLeeds 2019

What is machine learning?

When it comes to Machine Learning (ML), everyone immediately thinks of Artificial Intelligence and futuristic products like self-driving cars or Siri. So nowadays everybody is talking about it. Looking around, only a few really know how to do it. But everyone thinks everyone else is doing it, so everyone claims they’re doing it too.

So what is Machine Learning really?

For a start, Machine Learning (ML) and Artificial Intelligence (AI) are not interchangeable terms, even though they complement each other. It’s fair to say that Machine Learning, like Deep Learning, is a subset of Artificial Intelligence.
While artificial intelligence (AI) aims to harness certain aspects of the “thinking” mind, Machine Learning focuses on the ability of machines to receive a set of data and learn for themselves, changing algorithms as they learn more about the information they’re processing.

How do we apply Machine Learning to Digital Marketing?

Machine Learning enables organisations to leverage large datasets to develop customer insights, incorporate external data sources such as competitive insights and weather data, analyse shopping histories, interpret and categorise behaviours and create actionable insights and customer-specific personalisation.

Or in simple words, ML can help you find hidden knowledge in available consumer data to streamline your digital marketing processes.

Machine learning tips for SEO

In this session by Britney Muller, you’ll understand Machine Learning basics, what ML can be used for, examples of ML solving SEO tasks and executable programs you can start using immediately.

Who is Britney Muller?

Britney Muller, a Senior SEO Scientist at Moz and a proud Minnesota native, graduated from the University of Minnesota – Twin Cities with a major in Strategic Communications (Public Relations).

Britney Muller and Omi Sido at SearchLeeds

Data is Britney’s yellow brick road. She knows just how to optimise businesses’ online and offline conversion rates through in-depth geo-local target market and customer acquisition research. Britney Muller stays on top of her game by pivoting strategy month-by-month via Google Analytics evaluations, taking any new algorithmic changes into account, dynamic targeting, A/B testing, eye tracking and multi-channel content evaluations.

Video Transcript: Machine Learning for SEOs

Search Leeds. How’s everyone doing? Good, good, good. Good. I am so excited. We’re gonna have so much fun. All right. So a lot of you have probably heard about Elon Musk. Right. Our last speaker just talked about Steve Jobs.

I’m going to say a little something about Elon.

So Musk started this company called OpenAI. And the whole initiative of this company was basically to further the space of Machine Learning and AI. And so the idea was that they want to propel things forward a little quicker and open source everything. So they want to make this available to everyone. Didn’t really go as planned. So just a couple months ago, they, unfortunately, unveiled that their text generator was too good, essentially too dangerous to release to the public. This came as somewhat shocking news, you know, based on their entire mission statement. And on one hand, some of you might be thinking, is this a PR stunt? You know? Are they full of shit? What’s going on here? Or is this legitimate? And so, I took a closer look at this because they were able to release a scaled-back version of this generator. And so I thought it might be kind of cool to show you all today what that looks like.

So in order to generate text, you have to start with a word or sentence or a paragraph, and the more you can feed it, the better. A lot of you have probably used Smart Compose in Gmail, right? Where it predicts what your next word or next few words might be based off of what you’re emailing someone. Exact same principles here. And so when I put in ‘Search Leeds is going to be the best SEO conference of all time, because’ it gives me this: because it is the only one where both experts and people who don’t know what they’re doing can learn. And they’re getting lots of useful tips from experts. You can check this page to find the best websites with SEO, and the best resources for your site at this conference.
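To make the "predict the next word from what came before" idea concrete, here is a minimal sketch of a bigram model in Python. It is a toy stand-in and not the model demonstrated in the talk: the function names, the tiny corpus and the sampling strategy are all assumptions made for illustration. A real text generator learns from vastly more text and looks at far more than the single previous word.

```python
import random
from collections import defaultdict


def build_bigram_model(text):
    """Map each word to the list of words that followed it in the training text."""
    words = text.lower().split()
    model = defaultdict(list)
    for current_word, next_word in zip(words, words[1:]):
        model[current_word].append(next_word)
    return model


def generate(model, prompt, max_words=10, seed=0):
    """Extend the prompt one word at a time by sampling a word seen after the last one."""
    rng = random.Random(seed)  # fixed seed so the toy output is repeatable
    output = prompt.lower().split()
    for _ in range(max_words):
        candidates = model.get(output[-1])
        if not candidates:  # the last word was never followed by anything in training
            break
        output.append(rng.choice(candidates))
    return " ".join(output)


# Hypothetical training text; a real model would read millions of pages.
corpus = (
    "the best seo conference is the one where experts share tips "
    "and the best tips are the ones you can use today"
)
model = build_bigram_model(corpus)
print(generate(model, "the best"))
```

The more text the model has seen, the more plausible its continuations become, which is why feeding it a longer prompt and more training data helps.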

This goes on to explain that there’s a free event called SEO on-demand in every state in America. It gives you a date and a time and a link. And some of you might be wondering, what is that all about? And this link doesn’t go anywhere. This is literally just a Machine Learning model that has read so much SEO text and so much text in general. It knows that there’s a call to action on most pages. And so it has automatically generated this stuff.

And it’s just going to get better and better and better.

It’s wild. What a time to be alive, people. This is crazy. If you take one thing away from this talk today, I hope it’s this.

Machine Learning is becoming more and more accessible to all of us. And it’s going to free us up to work on much higher level more strategic work. It’s exciting, right? And there’s a lot of chatter about the displacement of jobs. And I would like to argue that there should be a conversation around the displacement of jobs to people who use these tools. And so that’s what I want to give each and every one of you today. Is how we can start to navigate these waters and what you can do with them. And I’m confident that by the end of this, every single one of you can use Machine Learning or put a model on your computer today. It’s, it’s easier than you think.

So we’re going to go through some examples that I hope you enjoy. And then we’re going to break down what is Machine Learning? Like what is this actual stuff? What’s going on? And then I’m going to hand over all the tools and resources necessary to get you started. That sound good? Okay.

All right. So this is one great example of AI being able to read so much text and compress information, allowing, in this case, it was JP Morgan, to work on higher-level things, right? They were able to automate the absorption of hours and hours of finance material. That’s wild. You can do the same sort of thing for your topic, for your industry, for your space. It can also do things like predict what’s in front of a camera. And sometimes it gets things right. Sometimes it gets things wrong, sometimes it’s just funny. You know, it’s really strange and interesting. And I actually brought this model today just to try out because I did this at a conference recently and I like to see what it thinks the audience is. So let’s see. It says ‘wig’. That’s my hair, isn’t it? ‘Spotlight’. So it might be a little too bright? Does anyone want to come grab this so that you guys can pass it around the audience and just play with it a little bit? No. Okay. Really, no one wants to check this out. It’s pretty hilarious. Okay, I’m just gonna leave it right here. If any of you start feeling crazy, come check it out. It’s hilarious. You can do all sorts of weird stuff like that, right?

And I want to be very clear. I have absolutely no clue what I’m doing in the world of Machine Learning. This stuff is so complex and strange. And I sort of like to think of myself as just the best thief ever. Right? I come across these models and I see potential in other spaces. And so I will look at something that someone has already built and apply it to something completely different. And that’s sort of what I want to challenge you to do today as I go through these examples. One example of this is I came across the Shakespeare model. And so this was released somewhat recently, and it looks at tonnes and tonnes of text of Shakespeare. And it creates entirely new stories, new characters, new drama, new, whatever, it’s wild. And so the normal adult that I am, I was like ‘well, what could I do with this?’ Right? And I thought, what if I combined text from two different people, right, like Rand Fishkin and Beyoncé. What if I take articles of text that Rand has written about SEO and search and combine it with the entire Lemonade album? So that’s what I did. And their first album isn’t so great because I only trained it on 10 iterations, those are called epochs. And at this point in the training, it’s still making up words, it’s still just kind of not doing so great. But once I got it up to around 100 epochs, they made some pretty cool raps. You can find their entire album at that Bitly link. It’s pretty ridiculous. And it’s funny to see the output of this model. It knows how a song is laid out, right. And it even knows how to rhyme. And it will use different things that Rand has said, and it’s hilarious. But these are the sort of things I want you all to think about. Because you are all the domain experts. You are all the ones that have very specific expertise in what it is that you do on a day to day basis. And these are the tools that can make your life so much easier, that can automate some of those things and help all of us essentially level up.

So you’re probably wondering, okay, that’s great, but how can some of this stuff more aptly apply to SEO or digital marketing? So there’s all sorts of tools already available. This one for content research was mentioned by Kameron Jenkins. I think you say it fozzy, frizzy, but it automates the research of shared content around a particular topic. It can even start to outline blog posts for you, and put together questions and answers. It’s super, super powerful, again, just freeing you up to work on higher-level things.

Do you know that you can automate videos, literally automate the generation of videos? This is a tool called Lumen5. Shout out to Nikki Moser who told me about this. You basically put in a bunch of text and you can use this for free and it will generate media based on the text that you enter. So it’s using natural language processing to understand what your text is about, and to find pictures and images to go with your video that suit the material. It’s incredible stuff. And obviously, it’s not going to be perfect, right? None of this stuff is going to be perfect. But it’s going to get you a lot of the way there, most of the way there. Right. It’s exciting.

Automate transcriptions. So lots of us probably listen to podcasts. Some of us in this room probably create podcasts. And the average podcast listener consumes seven episodes a week. That is wild. And the little SEO inside of me just dies a little bit thinking about that content not being translated to search, right, aside from Google now indexing them. But for you, for your site, why would you not want to transcribe the audio content to make it work better for you? And there’s all sorts of material out there that is doing this really, really quickly and cost-effectively. My favourite go-to is Amazon Transcribe. You can literally transcribe an hour’s worth of audio within a couple minutes for pennies. And it also delineates person one from person two, all the way up to 10 people. Which is pretty incredible. There’s going to be typos, and it’s not going to be perfect. But again, it’s going to get you most of the way there.

You can automate image optimisation. This stuff is really, really exciting, especially for those of you that deal with really large websites with tens of thousands of SKUs of images, right. Think of something like Macy’s or a large department store. I came across this model years ago called TensorFlow for Poets. It’s available to all of you online and only takes about 15 minutes to do. And basically what it does is it trains a model to recognise pictures of flowers. So it will tell you what type of flower it is. Now, when you’re playing around with these models, I challenge you to break the shit. Like that’s what it’s there for. And so what I did was I looked at, well how is it doing this. What’s going on on my computer? And it was literally just a file system of folders of the specific types of flowers. And I thought, Okay, this is interesting. What if I add a folder myself.

And so this folder, named ‘pumpkin’, is after my pet snake Pumpkin, and I was able to source I think, maybe 50 images of Ball Pythons from the internet and I put them all in this folder. And then I uploaded a picture of my adorable little Pumpkin and it, sure enough, predicts with 99.7% confidence that this is Pumpkin. It’s amazing. It wasn’t that many images to get it to be this good. So just consider the fact that you can customise models like this to fit your needs, to solve problems that you’re working on, and really to help scale image optimisation.
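The "one folder per label" layout described above is what makes adding a new class as easy as adding a new folder. The sketch below only illustrates that idea in Python: it builds a throwaway directory with empty placeholder files and reads the labels back from the folder names. The helper name `load_labelled_paths`, the folder names and the `.jpg` filter are assumptions for illustration, and no actual model training happens here.

```python
import tempfile
from pathlib import Path


def load_labelled_paths(root):
    """Return (image_path, label) pairs, using each sub-folder name as the label."""
    examples = []
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue
        for image_path in sorted(class_dir.glob("*.jpg")):
            examples.append((image_path, class_dir.name))
    return examples


# Build a temporary, hypothetical dataset: two flower folders plus a custom one.
with tempfile.TemporaryDirectory() as root:
    for label, count in [("daisy", 3), ("roses", 2), ("pumpkin", 4)]:
        folder = Path(root) / label
        folder.mkdir()
        for i in range(count):
            (folder / f"{label}_{i}.jpg").touch()  # empty placeholder files, not real images

    examples = load_labelled_paths(root)
    labels = sorted({label for _, label in examples})

print(labels)         # ['daisy', 'pumpkin', 'roses']
print(len(examples))  # 9
```

Because the labels come straight from the directory names, dropping a new folder of images next to the existing ones is enough to give the training step a new category to learn.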

Pretty exciting stuff.

The ability to automate meta descriptions is incredible. So, again, for those of you that are working on really large sites, it’s not viable to write meta descriptions for every single one of those pages. It’s just not possible.

So in order to, you know, get something unique on all of those tens or hundreds of thousands of pages, you can use different models like this, that basically take these algorithms that do content summarisation. And so this was a model I found on Algorithmia, which is a great tool. You can literally just plug and play different models to do whatever you need them to do. And then we were able to plug it into AWS. And then into Google Sheets, and we have all the steps here for you.
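As a rough picture of what "content summarisation" can mean at its simplest, here is a small extractive sketch in Python: it scores each sentence by how frequent its words are across the page and keeps the best one as a candidate meta description. This is not the Algorithmia model or the AWS pipeline from the talk. The stopword list, the scoring rule and the 155-character limit are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Small, hypothetical stopword list; a real one would be much longer.
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
    "for", "on", "with", "that", "this", "are", "you", "your", "us",
}


def tokenize(text):
    """Lowercase words with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarise(text, max_chars=155):
    """Pick the sentence whose words are most frequent across the whole text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return ""
    freq = Counter(tokenize(text))

    def score(sentence):
        tokens = tokenize(sentence)
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    best = max(sentences, key=score)
    if len(best) > max_chars:
        # Cut at a word boundary and mark the truncation.
        best = best[: max_chars - 1].rsplit(" ", 1)[0] + "…"
    return best


page = (
    "Our store sells running shoes for every distance. "
    "Running shoes with extra cushioning help on long runs. "
    "Contact us for opening hours."
)
print(summarise(page))
```

Run over every page of a large site, even a crude approach like this gives each page a unique starting description that a person can then review.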

So you’d need to find an AWS Lambda developer. Granted, this looks somewhat complicated, and it is, but it is getting easier. And I have an example at the very end that is a bit simpler than this.

But this would work too if you have access to someone that can implement this, or if you yourself could implement it. Pretty exciting stuff and shout out to JR Oakes and Grayson Parks who you should definitely follow, who are really at the cutting edge of this stuff. They are building incredible applications in the world of data science and marketing, online marketing, and SEO. So this is just the tip of the iceberg. We could go on and on with these examples forever. I mean, it’s really an exciting space. And this is just the very beginning. And so again, I challenge you all to consider the ways in which these different models can assist you and co-workers and the industry. And there’s already people that are innovating in this space. Major shout out to Dan, who I think might be in here.

But he, yeah. I mean, Dan has taken it upon himself to create a Google Sheet that basically automates the intent categorisation of keywords. That’s incredible, right? There are really amazing people in our space and even geographically in this area, that are doing incredible things. So I highly, highly suggest, you know, following and supporting these people and you yourself helping all of us to innovate because, I mean, we all just level up which is so much fun. So big thanks to Dan.

So again, this Machine Learning, it’s going to free us up to work on higher-level strategy. That’s what it’s all about. So what is it? What the heck is happening under the hood? And how can we talk intelligently or accurately about this type of stuff with clients and coworkers? So to break this down a little bit. Machine learning, it’s a subset of AI. And for the record, anytime you hear the word AI, 99.9999% of the time, it’s Machine Learning. We have not yet reached AI.

But we’d love to use the word for some reason.

What exactly is Machine Learning?

So Machine Learning, it’s a subset of AI. And it combines statistics and programming to give computers the ability to learn without explicitly being programmed. These things are able to identify patterns, and compute them at a level so far beyond the human mind, that it’s just blowing different fields out of the water. It’s a really, really exciting space.

And a typical model, this is very general, looks something like this.

So it starts with tonnes and tonnes. Now, it depends on how much data your model requires. But let’s just say it starts with hundreds to hopefully thousands of labelled training examples. So whatever it is you’re trying to get this model to learn, you want to have clean labelled training data. Now you want to save a chunk of that data for the end. You can’t train the entire model on all of your data. Otherwise, you don’t know how well you did. So you have to take a set of that and save it to test and see how well the model does once you’ve trained it.
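Holding back part of the labelled data can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the helper name `train_test_split`, the 80/20 ratio and the made-up keyword labels are not from the talk, and libraries such as scikit-learn ship their own, more capable version of this step.

```python
import random


def train_test_split(examples, test_fraction=0.2, seed=42):
    """Shuffle the labelled examples and hold back a fraction for testing."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    test_size = max(1, round(len(shuffled) * test_fraction))
    return shuffled[test_size:], shuffled[:test_size]


# Hypothetical labelled data: 100 keywords tagged with a search intent.
labelled = [
    (f"keyword {i}", "informational" if i % 2 else "transactional")
    for i in range(100)
]
train, test = train_test_split(labelled)
print(len(train), len(test))  # 80 20
```

The model only ever sees `train` while it learns. Its score on `test`, which it has never seen, is what tells you how well it is likely to do on new data.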

And so how does it learn? How the heck does that work, beyond that general flow? It learns through basic linear regression. And so it’s able to identify, when it’s incorrect, how far off that is. That’s also known as the mean squared error, or the loss function. And so it tries to get a lower and lower loss. That’s the whole goal of these models. And what starts to happen is you might start out on the very left-hand side here, where your curve isn’t fitting to any of the data points very well, like it’s kind of around there. But you want a really nice smooth slope. What you don’t want to do is the far right. You don’t want to fit every single data point, because it doesn’t allow for new instances of data.
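Here is a small worked sketch of that loop in Python, assuming the simplest possible case: fitting a straight line y = slope * x + intercept. The mean squared error is the average of (prediction - actual)² over all points, and gradient descent nudges the slope and intercept a little at a time in whichever direction lowers it. The data points, learning rate and step count are made-up values for illustration.

```python
def mean_squared_error(slope, intercept, points):
    """Average squared distance between the line's predictions and the real values."""
    return sum((slope * x + intercept - y) ** 2 for x, y in points) / len(points)


def fit_line(points, learning_rate=0.01, steps=5000):
    """Fit y = slope * x + intercept by repeatedly stepping towards a lower loss."""
    slope, intercept = 0.0, 0.0
    n = len(points)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to each parameter.
        grad_slope = sum(2 * (slope * x + intercept - y) * x for x, y in points) / n
        grad_intercept = sum(2 * (slope * x + intercept - y) for x, y in points) / n
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Hypothetical data that roughly follows y = 2x + 1 with a little noise.
points = [(1, 3.1), (2, 4.9), (3, 7.2), (4, 8.8), (5, 11.1)]

loss_before = mean_squared_error(0.0, 0.0, points)
slope, intercept = fit_line(points)
loss_after = mean_squared_error(slope, intercept, points)

print(round(slope, 2), round(intercept, 2))
print(loss_before, loss_after)
```

The fitted line does not pass through every point exactly, and that is the desired outcome: a smooth fit with a small remaining loss generalises to new data better than a curve bent to hit every training point.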

And the best example of that, is this. Right? This is overfitting. It doesn’t allow for new positions. It doesn’t allow for anything in that space, which is crazy. So you want to be really careful to not overfit data. So if Machine Learning was a car, data would be the fuel. Most of Machine Learning is just all about training data. And it’s all about cleaning data and finding data. That’s why we see Google stealing stuff from us like this. Guess what they’re doing. We are labelling training data for all of the large companies right now. We’re doing it all the time. We don’t even know it. Who here did the 10 year challenge? Guess what they’re using that for? That’s now being built into this beautiful age prediction model. Right, which can be really incredible for certain instances, like you think of kidnappings and predicting what they look like years from now. That’s amazing. But it can also be used for really scary things.

So you want to be mindful that these things are happening. And we are openly feeding our data to these huge tech companies. I mean, I use the Google keyboard, and up there at IO this year in their keynote, they talk about how they are using the text that we are entering in the keyboard to be predictive of everyone else. They are literally iterating this Machine Learning model of text prediction, based on what we all say. It’s crazy, right? This stuff is happening so quickly and so fast. But again, if we can use some of this stuff to our advantage and be mindful of the things we can use it for, it’s going to help us do more strategic work.

All right, so what are the tools and stuff that you can walk away with today to help get you started?

I would like you all to think of this, as you know, it’s going to get easier and easier. It’s going to be more and more accessible. And just think of this stuff as plug and play things that you can use for the things that you do or for the services that you provide.

First and foremost, I highly suggest everyone check out Codelabs. Codelabs is a place to go on Google where they walk you through step by step implementation of Machine Learning. So go to Codelabs, I would suggest filtering by category. I believe it’s by TensorFlow or Machine Learning. Either category will get you to some of those models. But this again is kind of what teaches you what goes into something like this. This is where I found that TensorFlow for Poets model that you should definitely check out. And, again, this takes you not even 20 minutes on your local computer. And you start to have an understanding of what is required to do some of this stuff.

Colab notebooks are incredibly powerful. If there are any developers in the room or anyone that’s worked with Jupyter notebooks, this is basically the online Google Drive version. And you can literally collaborate with others, which is pretty exciting. And Google’s kind of dangling a carrot in front of our face on this one, because they’re giving us free GPU. If you see that green connected checkmark, that means that you’re able to access free GPU power, meaning you can do these things so much faster than you could on your own local computer. Again, why? I’m sure they love checking out what people are working on. Right? They love getting additional training information. Wild. MonkeyLearn is a really good tool that has basically pre-baked models. They also have a Google Sheets add-on, which is pretty powerful. And you can get it to do all sorts of things right within Google Sheets. Algorithmia, as I mentioned before, basically packages up models in these beautiful little packages that you can, again, plug and play to do whatever it is that you might want for your work.

This is the craziest thing. So I will say, exploring in this space and having so much fun, just learning about Machine Learning and testing stuff out and digging around online, I find stuff like this, that I feel like most SEOs and most marketers aren’t even aware is available. Google’s own natural language model, their API, is available online. It’s available in a way that you can just paste in your text and you can see what it categorises your content as. You can take this a step further, and you can put in your competitors and you can cross-compare, okay? How are they categorising my competitors’ content that’s ranking above me versus mine?

And then you can even look at that on an entity level. So if you literally just go to Google, if you google Natural Language API, and then you just scroll down, you get to this section. And you have all of these tabs available to look at sentiment, syntax, categories, you name it. Really powerful stuff.

This came out a couple days ago and I want to squeeze it in because lots of people have been saying that rev.com is the most accurate audio transcription tool available today, which is quite exciting. I haven’t had a chance to check it out yet, but highly suggest you do. That’s pretty cutting edge. If you need transcription in real-time, Google has Cloud Speech-to-Text. So this could literally be transcribing the things I’m saying right now, which is pretty exciting. ImageNet is the largest online source of labelled images. This is where I got those images of Ball Pythons. It’s incredibly robust. I have a love-hate relationship with ImageNet that maybe you can ask me about later if you want. But it’s a very interesting space. Then there’s g.co/teachablemachine. You can build a model like this without touching one line of code. So Google’s trying to make it so, so easy for you to play around with this stuff and make things happen just using your computer camera. So here I’m just doing different things that produce a specific type of animal. It’s goofy.

Paul Shapiro is another one of the cutting edge individuals in this space who’s applying data science to SEO. And he came out with this way to do automatic meta descriptions. A different way than what I had previously mentioned. I highly suggest you check that out and some of his other content. I swear to God, this guy is living in 2040. It’s insane. Paul, also years ago, built something that basically can automate 301 redirects. Yes, 301 redirects. It crawls the archive.org copy of your site from months previous, and it cross-compares it to your currently indexed pages. And if one is missing, it looks up a syntactically similar page and automatically redirects to it within your .htaccess file.
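The matching step of that idea can be sketched in Python as follows. This is not Paul Shapiro’s tool: it skips the archive.org crawl entirely and takes two plain lists of URL paths as input, and it uses simple string similarity from the standard library as a stand-in for whatever page matching the real tool does. The function name, the example paths and the 0.6 similarity cutoff are assumptions for illustration.

```python
from difflib import get_close_matches


def build_redirects(archived_paths, current_paths, cutoff=0.6):
    """Suggest a 301 rule for each archived path that no longer exists."""
    current = set(current_paths)
    rules = []
    for old_path in archived_paths:
        if old_path in current:
            continue  # page still exists, nothing to redirect
        # Closest current path by string similarity; empty list if nothing is close enough.
        matches = get_close_matches(old_path, current_paths, n=1, cutoff=cutoff)
        if matches:
            rules.append(f"Redirect 301 {old_path} {matches[0]}")
    return rules


# Hypothetical inputs: paths seen in an old snapshot vs. paths live today.
archived = ["/blog/seo-tips", "/blog/machine-learning-for-seo", "/about"]
current = ["/blog/seo-tips", "/blog/machine-learning-seo", "/contact"]

for rule in build_redirects(archived, current):
    print(rule)
# Redirect 301 /blog/machine-learning-for-seo /blog/machine-learning-seo
```

Note that `/about` gets no rule because nothing live is similar enough. Suggested rules like these are best reviewed by a person before they go into an .htaccess file, since a high string similarity does not guarantee the pages cover the same content.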

How wild is that? Like this stuff is happening and has been happening and the more of us that are willing to explore it and to apply it. It’s just going to help us level up and move forward. So really cool stuff highly suggest you check that out.

Kaggle is the largest platform for data science competitions. And it’s most interesting to me to just keep an eye on what the heck is happening in the space. Who is, you know, putting up millions of dollars for specific tasks and what are they. So it’s a little eerie to me that the TSA has put up a bid on passenger screening, right, and you have to remember these models are only as good as their training data. So if this model is just the least bit biased, if it’s the least bit racist, the output will be too, which is terrifying. So moving forward in the space, diversity is paramount to the success of this field. It’s really, really critical that we start having more of these conversations because this is happening more and more. The technology to make this stuff possible is just getting better and better and better and cheaper.

We are now talking about disposable AI. There are mini computers so cheap, they cost a couple dollars. And you can put entire models on the cloud that can be pushed to these disposable AI devices. There was a conference recently where they had built a disposable AI for Uber and Lyft drivers to put in their vehicles. And it would notify them if a passenger left anything behind. And there’s no security concerns because nothing is getting saved. Nothing is going to the server. It’s all sitting on these machines and vanishing as soon as it happens, similar to the demo I showed.

So I take pride in never selling on stage. I think that’s like the most uncomfortable thing to watch and to be a part of, but I have to say just this one thing about our data science team at Moz. And it is that we are working on some incredible groundbreaking features for SEOs to help make your jobs easier and more effective. And I couldn’t be more excited to release some of that stuff for you guys. So just want to let you know that. Some other material to get started. And all these slides will be made available online. Some advanced resources, some of the stuff is really, really fun, if you want to check it out.

And then finally, what did we learn? What was all of this about? So we know that Machine Learning combines statistics and programming. And the models are only as good as the data. Each and every one of you, I am so, so confident, could create a model today, even on this next break. And if you want to do that, and if you’re curious about that, please come see me. I’m happy to talk to you about that. This is going to help us level up as an industry. This is going to help us evolve into the next phases of Digital Marketing and SEO and it’s so, so damn exciting. And again, diversity could not be more paramount in Machine Learning. It’s a good thing to be aware of.

So that’s it for me. Thank you guys so much.

By Omi Sido

Omi Sido is an SEO and Web Development professional with 6 years of experience in both web and traditional advertising, promotions, events, and campaigns. He has worked on integrated campaigns for major clients such as Vectone Mobile, Delight Mobile and The Global Real Estate Institute.

Currently, Omi Sido is Senior Technical SEO at Canon Europe.
