
Web Scraping In R With Loop From Data.frame

library(rvest) df <- data.frame(Links = c('Qmobile_Noir-M6', 'Qmobile_Noir-A1', 'Qmobile_Noir-E8')) for(i in 1:3) { webpage <- read_html(paste0('https://www.whatmobile.co

Solution 1:

The problem is in how you're structuring your for loop. It's much easier to avoid the loop entirely, though, because R has good support for iterating over lists with functions such as lapply and purrr::map. Here is one way you could structure your data:

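library(tidyverse)
library(rvest)

base_url <- "https://www.whatmobile.com.pk/"

# One row per model; `page` is a list column holding each parsed HTML document
models <- tibble(model = c("Qmobile_Noir-M6", "Qmobile_Noir-A1", "Qmobile_Noir-E8"),
                 link = paste0(base_url, model),
                 page = map(link, read_html))

model_specs <- models %>% 
    # Pull the first `.specs` table from each page and give its columns uniform names
    mutate(node = map(page, html_node, ".specs"),
           specs = map(node, html_table, header = TRUE, fill = TRUE),
           specs = map(specs, set_names, c("var1", "var2", "val1", "val2"))) %>% 
    select(model, specs) %>% 
    # Name the list column explicitly; a bare unnest() is deprecated in current tidyr
    unnest(specs)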

model_specs
#> # A tibble: 119 x 5
#>              model      var1       var2
#>              <chr>     <chr>      <chr>
#>  1 Qmobile_Noir-M6     Build         OS
#>  2 Qmobile_Noir-M6     Build Dimensions
#>  3 Qmobile_Noir-M6     Build     Weight
#>  4 Qmobile_Noir-M6     Build        SIM
#>  5 Qmobile_Noir-M6     Build     Colors
#>  6 Qmobile_Noir-M6 Frequency    2G Band
#>  7 Qmobile_Noir-M6 Frequency    3G Band
#>  8 Qmobile_Noir-M6 Frequency    4G Band
#>  9 Qmobile_Noir-M6 Processor        CPU
#> 10 Qmobile_Noir-M6 Processor    Chipset
#> # ... with 109 more rows, and 2 more variables: val1 <chr>, val2 <chr>

The data is still pretty messy, but at least it's all there.
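The same iteration can be written in base R with lapply, which is mentioned above but not shown. This is only a minimal sketch: the helper make_page() and the inline HTML are made-up stand-ins so the example runs offline, and the placeholder values are not real specs. With the real site you would build each page with read_html(paste0(base_url, model)) instead.

```r
library(rvest)

# Hypothetical helper: builds a tiny page with a `.specs` table so the
# sketch runs without network access. The cell values are placeholders.
make_page <- function(value) {
  read_html(paste0(
    "<html><body><table class='specs'>",
    "<tr><td>Build</td><td>OS</td><td>", value, "</td></tr>",
    "<tr><td>Build</td><td>SIM</td><td>", value, "</td></tr>",
    "</table></body></html>"
  ))
}

# A named list of parsed pages, one per model
pages <- list(
  "Qmobile_Noir-M6" = make_page("placeholder M6"),
  "Qmobile_Noir-A1" = make_page("placeholder A1")
)

# lapply returns one result per input element and keeps the names,
# so nothing gets overwritten along the way
specs <- lapply(pages, function(page) {
  html_table(html_node(page, ".specs"))
})

names(specs)
str(specs[[1]])
```

Because the result is a named list, each table can be looked up by model name, for example specs[["Qmobile_Noir-A1"]].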

Solution 2:

The loop does capture all three values, but it overwrites the result on each iteration. That's why you only see one value: the one for the last page.
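To see the overwriting in isolation, here is a small sketch with no scraping involved. toupper() is just a stand-in for the per-page work, and the variable names result and results are illustrative only.

```r
links <- c("Qmobile_Noir-M6", "Qmobile_Noir-A1", "Qmobile_Noir-E8")

# Overwriting: `result` is replaced on every pass, so only the last one survives
for (i in seq_along(links)) {
  result <- toupper(links[i])
}
result
#> [1] "QMOBILE_NOIR-E8"

# Accumulating: each pass writes to its own slot in a list
results <- vector("list", length(links))
for (i in seq_along(links)) {
  results[[i]] <- toupper(links[i])
}
length(results)
#> [1] 3
```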

You need to initialise a variable before you enter the loop. I suggest a list, so you can store the data from each iteration. Something like:

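# Assumes library(rvest) is loaded and `df` is the data frame from the question
final_table <- list()

for (i in seq_along(df$Links)) {
  webpage <- read_html(paste0("https://www.whatmobile.com.pk/", df$Links[i]))

  # Take the first `.specs` node on the page and parse it as a table
  data <- webpage %>%
    html_nodes(".specs") %>%
    .[[1]] %>%
    html_table(fill = TRUE)

  # Store this page's table in its own slot instead of overwriting `data`
  final_table[[i]] <- data.frame(data, stringsAsFactors = FALSE)
}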

This way, each iteration adds that page's table to the list instead of overwriting the previous one.
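Once the list is filled, you will probably want a single data frame. One possible way to do that in base R is sketched below. The final_table here is a hand-built stand-in with placeholder values so the example runs offline, and it assumes every table has the same column names, which may not hold for every page.

```r
# Hypothetical stand-in for the list the loop builds: one data frame per page
final_table <- list(
  data.frame(X1 = "Build", X2 = "OS", X3 = "placeholder M6", stringsAsFactors = FALSE),
  data.frame(X1 = "Build", X2 = "OS", X3 = "placeholder A1", stringsAsFactors = FALSE)
)
links <- c("Qmobile_Noir-M6", "Qmobile_Noir-A1")

# Tag each table with the model it came from
tagged <- Map(
  function(tbl, model) data.frame(model = model, tbl, stringsAsFactors = FALSE),
  final_table, links
)

# Stack the tagged tables; rbind requires matching column names
combined <- do.call(rbind, tagged)
rownames(combined) <- NULL

combined
```

The model column keeps track of which page each row came from after the tables are stacked.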
