Web Scraping Using Excel VBA

Introduction

Excel is the most used Business Intelligence tool. Its programming language, Visual Basic for Application (VBA), can be used to enhance and extend Excel’s functionality. This post is not about VBA nor web scraping. Web scraping seems to be one of the new frontiers of advanced analytics. Python, R and other programming are being widely used to retrieve data from the web. The main aim of this post is to show how a few VBA lines of code can equally achieve the same goal.

Getting Data from the Web

 I was looking for internet users by country data over the internet. I came across this website: http://www.internetlivestats.com/internet-users-by-country/.

This web page has internet users by country data for 2016 for 201 countries. By clicking on the country, you can drill through and get country-specific internet users data starting from 2000. 2015 and 2016 data are estimated data.

If you are interested in a very handful countries data, then you can manually click on the country name and copy paste the data. If you are interested to get all the data – for all years – for all countries – then the manual process is very inefficient.

So, how can we achieve this goal – getting all the data for all years- for all countries? Somehow, we should leverage programming language. Python or any other programing language could be used to achieve this goal. Equally, VBA could also server our interest.  

How?

First thing first. We need to download first the 2016 data for all countries. As this table has the hyperlink for country-specific data.
Step 1: Open excel – Go to the Data ribbon and click on From Web



Step 2: A new web query window will open. In the address bar add the following URL: http://www.internetlivestats.com/internet-users-by-country/
Click on Go and after the process is completed click on Import.

Step 3: Select worksheet and cell to load the data and click OK. For additional configuration, you can click on properties and finally click OK.

Step 4: Rename the worksheet as “Internet Users”. Remove the unnecessary information and format it as a table. By now you should be able to have something looks like this:

Step 5: Add a new worksheet and rename it as InternetUsersTrended. Add the following field name:

Field names from Column A to Column H will be mapped to Country specific data. CountyName field will be a calculated field.

Step 6: Go to Internet Users sheet and click on any country name – say China. You can see the following web page:

What are the tasks we need to accomplish now?

Go to each of the countries link :


a) Get the data in a staging worksheet
b)  Map and do some data cleaning 
c)  Stored the data into our InternetUsersTrended worksheet.  


Step 7: Open the VBA editor window. To do so, go to Developer ribbon and click on Visual Basic or use alt + F11.




 Step 8: A VBA editor window should be now opened. Click on Insert and add module

           
Step 9: For the sake of best practice, you can rename Model1 – to any meaningful project name. I renamed it as follow:
Step 10: I have prepared the following very basic VBA code. Here is what it does:

a)     Defining variables that will be used for getting data process, data cleaning and copy and paste process. I called by code as “getCountryData”. I also disabled any alerts, events and screen updating.  












b)     Setting the value for the variables: the actual country-specific data starts from row 3. LastRow indicates how many countries I have in my Internet Users sheet. The country name and their URL value are defined in the while loop.



c)      Getting the data: the following lines of code get the data from the web and save it in the worksheet we created in the previous step (b).
         

d)     Data cleaning: the data cleaning process will start only if we are able to get 8 columns of data, the number of rows should also be greater than 3. The data cleaning process involves formatting year (2015 and 2016), adding country name and etc.  


e)     Data load process: the data load process is a copy-paste operation to our final table – InternetUserTrended sheet. After we copied the data, the staging sheet will be deleted. And the operation resumes to the next country until we go through each of the countries one by one.

Step 11: This website doesn’t have internet users’ data for all countries since 2000. We got 3230 records. If the web page had data for all countries since 2000, the total number of records we would expect is 3417 (201 * 17). We are now missing 187 records – we are missing data for 11 countries. I consider this as a success. The final table looks like this:




Code:
Sub getCountryData()

'Date = 10/10/2017
'Website = https://www.dabis.solutions/

 With Application
    .ScreenUpdating = False
    .EnableEvents = False
    .DisplayAlerts = False
 End With

' Variables for getting data process
Dim LastRow As Long
Dim URL As String
Dim CountryName As String
Dim CName As Range   'Country Name

' variables for copy Paste process
Dim Last As Long
Dim StartRow As Long
Dim cntrywshtLast As Long ' Country worksheetLast row
Dim CopyRng As Range
Dim cntrywsht As Worksheet ' Country worksheet that will be used as data staging
Dim trendedwsht As Worksheet ' worksheet that will hold our data
Dim cntryclm As Long       ' Number of columns - should be 8 columns
Dim wb As Workbook

Set wb = ThisWorkbook
    LastRow = wb.Sheets("Internet Users").Cells(Rows.Count, "A").End(xlUp).Row
    StartRow = 3

Do While LastRow > 1
    Set CName = wb.Sheets("Internet Users").Range("B" & LastRow)
   
    CountryName = CName.Value
    URL = CName.Hyperlinks(1).Address & CName.Hyperlinks(1).SubAddress
    
    With wb.Worksheets
        .Add.Name = CountryName
        .Select
       
    End With
 
' Getting the data

     With ActiveSheet.QueryTables.Add(Connection:= _
        "URL;" & URL, Destination:= _
        Range("$A$1"))
        .Name = CountryName
        .FieldNames = True
        .PreserveFormatting = True
        .RefreshOnFileOpen = False
        .BackgroundQuery = True
        .RefreshStyle = xlInsertDeleteCells
        .SaveData = True
        .AdjustColumnWidth = True
        .RefreshPeriod = 0
        .WebSelectionType = xlAllTables 'xlEntirePage
        .WebFormatting = xlWebFormattingNone
        .WebPreFormattedTextToColumns = True
        .WebConsecutiveDelimitersAsOne = True
        .Refresh BackgroundQuery:=False
    End With
  
        Set cntrywsht = wb.Sheets(CountryName)
        Set trendedwsht = wb.Sheets("InternetUsersTrended") 'wb.Worksheets(7)
        Last = wb.Sheets("InternetUsersTrended").Cells(Rows.Count, "A").End(xlUp).Row
        cntrywshtLast = wb.Sheets(CountryName).Cells(Rows.Count, "A").End(xlUp).Row
        cntryclm = wb.Sheets(CountryName).Cells(1, Columns.Count).End(xlToLeft).Column
       
        If cntrywshtLast > 0 And cntrywshtLast >= StartRow And cntryclm = 8 Then

' Doing some data cleaning
       
            Sheets(CountryName).Select
            Range("A3").Select
            ActiveCell.FormulaR1C1 = "2016"
            Range("A4").Select
            ActiveCell.FormulaR1C1 = "2015"
            Range("I3").Select
            ActiveCell.FormulaR1C1 = CountryName
            Columns("I:I").ColumnWidth = 13.57
            Range("I3").Select
            Selection.AutoFill Destination:=Range("I3:I19"), Type:=xlFillDefault
' Data Load Process
            Set CopyRng = cntrywsht.Range("A3:I19")
             CopyRng.Copy
                With trendedwsht.Cells(Last + 1, "A")
                    .PasteSpecial xlPasteValues
                    .PasteSpecial xlPasteFormats
                    Application.CutCopyMode = False
                End With
          End If
    Sheets(CountryName).Delete
    LastRow = LastRow - 1
Loop
End Sub

Previous
Next Post »
2 Comment
avatar

Pretty blog, so many ideas in a single site, thanks for the informative article, keep updating more article.
Software testing course in chennai

Balas
avatar

Nice post. Thanks for sharing! I want people to know just how good this information is in your article. It’s interesting content and Great work.
Thanks & Regards,
VRIT Professionals,
No.1 Leading Web Designing Training Institute In Chennai.

Balas