Skip to playerSkip to main content
  • 8 months ago
Collect data from anywhere with Google Sheet's ImportXML function, which allows you to scrape data from websites and import it directly into your sheet. By specifying an XPath query, you can easily target specific elements and import data without having to manually copy and paste information. This feature is ideal for research, tracking market trends, and analyzing website performance. The ImportXML function is a versatile and efficient tool for collecting and analyzing data from various sources.

Extract City From Address
=TRIM(MID(B2,FIND(",",B2)+2,FIN D(",",B2,FIND(",",B2)+1)-FIND(",",B2)-2))

Extract Population From Webpage (USA) /html/body/article/section[1]/div[1]/p[4]/span
=IMPORTXML(("https://www.citypopulation.de/en/usa/metro"&lower(TRIM(SUBSTITUTE(C2," ","")))),"/html/body/article/section[1]/div[1]/p[4]/span")

Extract Population From Webpage (Canada) /html/body/article/section[1]/table/tbody[2]/tr/td[8]
=IMPORTXML(("https://www.citypopulation.de/en/canada/metro"&lower(TRIM(SUBSTITUTE(C22," ","")))),"/html/body/article/section[1]/table/tbody[2]/tr/td[8]")

Tier Calculation
=IF(D2>=10000000,"Tier 1",IF(D2>=5000000,"Tier 2",IF(D2>=1000000,"Tier 3","Tier 4")))


Source for population
USA
https://www.citypopulation.de/en/usa/metronewyork/
Canada
https://www.citypopulation.de/en/canada/metrocalgary/

ImportXML, google sheet, google, xpath query,
importxml google sheets,import data from password protected website to google sheets,pull data from google spreadsheet to website,importxml locale,how to use importxml,import table from website to google sheets,google sheets import from web,importxml examples,
Transcript
00:00If you want to strip down the city from the main address here like this and
00:03determine the population live from the World Wide Web and also at the same time
00:08determine the tier based on the population, this is how you do it. First
00:12and foremost of all, this is what the table looks like all empty. The blue ones
00:16is US address and the one below in red are Canadian address. So you can see in
00:22most address the city is the second one. You can see two commas here. Also the
00:27Canadian address, the city is on the second one here. If you have taken the
00:30setup it's pretty straightforward to strip down the city. You use this formula here
00:34like this and hit enter and you can see they stripped down the New York for you
00:39and if I apply to the rest of them you'll work the same way like that. There we go
00:44and to get population you have to use a function called import XML and the way to
00:52do it is that let's say import XML like that. The first argument on import XML is
00:59a URL of where you're gonna get those population from and the second one is
01:04called expat query. We're gonna leave the locale as it is and not gonna even enter
01:09them. So let me give you a quick introduction. The way I'm
01:12getting the population is from this web page called population city
01:16population.de and let's say for example if you want a US pop cities population
01:22this is New Jersey New York so if you want a particular text from this web
01:27page this is basic web scraping right what you do is that you highlight this
01:31text here like this and you go right click and you inspect. Now this is a
01:36Google Chrome Firefox is a bit different but the idea is the same I think they use
01:40the same term as inspect. Once that's done this this section of the code will be
01:44highlighted under elements here. If you right click in here you go copy and then
01:49you click on copy full path and that's what you need for the expat. So and also
01:55the thing to note is that the way that the URL is formed is like that. So what
02:01you're gonna have to do is that in this in this particular website they use city.deen
02:08US Metro New York slash that's the URL and we do have New York over here so we're
02:14gonna have to make everything lowercase and strip it down. So basically this is
02:18what the first portion of your import URL looks like like that. So basically
02:26that's a URL and you're taking the city and you are getting rid of the spaces and
02:33then after that trimming front and back of any white spaces and changing
02:39everything to lowercase and concatenating them. And then I did say that the that
02:45the expat would be something like that. See is this this thing here. This is where
02:53it looks like if you copy I mean this is where this came from was let me hit enter
02:58here and it'll give you the population of that. So where this came from was this if
03:02you go this and you go right click and you go copy and you go full path and if I
03:07were to open my notepad here let me get my notepad here and if I were to paste it
03:13see this is what it looks like. So that's what you need for your expat there. So
03:18that's for US and for Canada you're gonna have to change this to CA and this expat
03:24it's gonna be a bit different as well. Let's do this here for now and then to
03:29determine the tier I'm going to use population determine tier. So this is a
03:34formula I'm using for tier. So basically 10 million is going to be called tier 1
03:385 million tier 2 and 1 million tier 3 and 4 million for anything else. So that's how
03:45I determine the tier and then if you take this and apply to all the US address
03:50you get this population from that web page and also the tier as well as you go
03:56along. For the Canadian city, city extraction is similar same formula I'm
04:01going to copy the same formula and paste it in here and it works for all of them
04:05because the structure is same the cities at the center here but the but the what
04:11you call that the equation the formula to calculate or the formula to get the
04:14population is going to be slightly different. If you look at this web page
04:18here like I said citypopulation.deen this is Canada previously was US so
04:25that's something you have to change. Metro followed by city is the same and this
04:30one here is your XPath. This is the population in 2010. You will right click
04:34and you go inspect you will highlight this thing here for you. This is the number
04:39that you want. Right click here and what you do is you hit copy full path again and
04:44if you look at your this is this one was for US. Let me give a couple of space here and
04:50if you press Ctrl V or paste it this is what it looks like for Canadian. So what
04:57you do is this so go back to your Google sheet here I'm going to copy this
05:02formula here. As you can see I'm going to use the same formula called import XML
05:08HTTPS population city as you can see this is called Canada Metro and basically doing
05:14the same thing getting rid of space between if that's any of the city has two
05:20words in it trimming it which means removing white spaces from back and
05:24front and changing it to lower place basically forming the the URL itself and
05:29then I'm going to copy this this what I got it for expat here and then that's my
05:36second argument or my import XML and I'm going to close it with parentheses and hit
05:40enter and that didn't work at all so let me have a quick look at it what happened
05:44here is this double click here you got to put the expat query in double
05:52code don't forget that and hit enter and you'll get your total population from
05:58that web page there now this is so six million mainly because this is this is
06:03Calgary what they want here was Toronto so if you take this and paste it to
06:09Calgary you will see 1.481 million is from here okay so gonna apply the same
06:17formula to rest of the Canadian city here like this and I'm gonna keep the tier
06:23calculation the same likewise if you want to change let's say make it a bit
06:28smaller for a Canadian city maybe call it what 1 million for tier 1 and half a
06:35million for tier 2 and then 100,000 for tier 3 and so on you can change it that
06:42way too right it depends on how you want to categorize each city well that's how I
06:48do it basically city population from the World Wide Web and tier calculation
Be the first to comment
Add your comment

Recommended