
R readHTMLTable() Function Error

I'm running into a problem when trying to use the readHTMLTable function in the R package XML. When running library(XML) baseurl <- 'http://www.pro-football-reference.com/tea

Solution 1:

When readHTMLTable() complains about the 'names' attribute, it's a good bet that it's having trouble matching the data with what it has parsed as header values. The simplest way around this is to turn off header parsing entirely:

table.list <- readHTMLTable(theurl, header=FALSE)

Note that I changed the name of the return value from "readtable" to "table.list". (I also skipped the getURL() call, since it didn't work for me and readHTMLTable() knows how to handle URLs itself.) The reason for the rename is that, without further direction, readHTMLTable() will hunt down and parse every HTML table it can find on the given page, returning a list containing a data.frame for each.

The page you have sent it after is fairly rich, with 8 separate tables:

> length(table.list)
[1] 8
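
To get a feel for what came back, it can help to loop over the list and look at the shape of each element. The following is only a rough sketch: the two-table page is a made-up inline string (not the page from the question) so that it runs without network access, and the table contents are placeholders.

```r
library(XML)

# Hypothetical stand-in page: a navigation table followed by a data table.
html <- paste0(
  "<html><body>",
  "<table><tr><td>Home</td><td>Search</td></tr>",
  "<tr><td>Teams</td><td>Players</td></tr></table>",
  "<table><tr><th>Player</th><th>Yds</th></tr>",
  "<tr><td>A</td><td>10</td></tr>",
  "<tr><td>B</td><td>20</td></tr></table>",
  "</body></html>"
)

# readHTMLTable() also accepts an already parsed document.
doc <- htmlParse(html, asText = TRUE)

# With header parsing off, every table on the page comes back as a list element.
table.list <- readHTMLTable(doc, header = FALSE)

length(table.list)        # number of tables found
lapply(table.list, dim)   # rows and columns of each table
```

Comparing the dimensions against what you see in the browser is usually enough to tell the real data tables from the layout ones.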

If you are only interested in a single table on the page, you can use the which argument to specify it and receive its contents as a data.frame directly.

This could also cure your original problem if it had choked on a table you're not interested in. Many pages still use tables for navigation, search boxes, etc., so it's worth taking a look at the page first.
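
As a small illustration of skipping a layout table, here is a sketch that asks only for the second table of a made-up page. The inline HTML and the name stats.df are assumptions for the example, not anything from the page in the question.

```r
library(XML)

# Hypothetical page: table 1 is navigation, table 2 holds the data of interest.
html <- paste0(
  "<html><body>",
  "<table><tr><td>Home</td><td>Search</td></tr>",
  "<tr><td>Teams</td><td>Players</td></tr></table>",
  "<table><tr><th>Player</th><th>Yds</th></tr>",
  "<tr><td>A</td><td>10</td></tr>",
  "<tr><td>B</td><td>20</td></tr></table>",
  "</body></html>"
)
doc <- htmlParse(html, asText = TRUE)

# which = 2 requests only the second table, so the navigation table
# is never part of the result and a single data.frame is returned.
stats.df <- readHTMLTable(doc, which = 2)

class(stats.df)
print(stats.df)
```

The index passed to which is simply the table's position on the page, so it is worth re-checking if the site changes its layout.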

But this is unlikely to be the case in your example, since it actually choked on all but one of them. In the unlikely event that the stars aligned and you were only interested in the successfully parsed third table on the page (passing statistics), you could grab it like this, keeping header parsing on:

> passing.df = readHTMLTable(theurl, which=3)
> print(passing.df)
  No.             Age Pos  G GS  QBrec Cmp Att  Cmp%  Yds TD TD% Int Int% Lng  Y/A AY/A  Y/C   Y/G  Rate Sk Yds NY/A  ANY/A Sk% 4QC GWD
1  12  Tom Brady*  34  QB 16 16 13-3-0 401 611  65.6 5235 39 6.4  12  2.0  99  8.6  9.0 13.1 327.2 105.6 32 173  7.9   8.2 5.0   2   3
2   8 Brian Hoyer  26      3  0          1   1 100.0   22  0 0.0   0  0.0  22 22.0 22.0 22.0   7.3 118.7  0   0 22.0  22.0 0.0
