Note: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/32127921/

Time: 2020-08-29 18:06:35  Source: igfitidea

rvest how to select a specific css node by id

Tags: html, css, r, web-scraping, rvest

Asked by Vegebird

I'm trying to use the rvest package to scrape data from a web page. In simplified form, the HTML looks like this:

<div class="style">
   <input id="a" value="123">
   <input id="b">
</div>

I want to get the value 123 from the first input. I tried the following R code:

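library(rvest)
url <- "xxx"
page <- read_html(url)  # parse the page first; html_nodes() expects a document, not a URL string
output <- html_nodes(page, ".style input")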

This will return a list of input tags:

[[1]]
<input id="a" value="123">
[[2]]
<input id="b">

Next I tried using html_node to reference the first input tag by id:

html_node(output, "#a")

Here it returned a list of nulls instead of the input tag I want.

[[1]]
NULL
[[2]]
NULL

My question is, how can I reference the input tag using its id?

Answered by Rentrop

You can use xpath:

require(rvest)
text <- '<div class="style">
   <input id="a" value="123">
   <input id="b">
</div>'

h <- read_html(text)

h %>%
  html_nodes(xpath = '//*[@id="a"]') %>%
  html_attr("value")  # rvest's own accessor; xml_attr() needs xml2 attached

The easiest way to get CSS and XPath selectors is to use http://selectorgadget.com/. For a specific attribute like yours, you can use Chrome's developer tools to copy the XPath of the element.
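
As a small variation on the XPath approach, the expression can also be scoped to input elements inside the div with class style, so an unrelated element that happens to reuse the id is not matched. This is only a sketch on the same toy HTML from the question; the variable names node and value are illustrative, not from the original answer.

```r
library(rvest)

# Same toy HTML as in the question
text <- '<div class="style">
   <input id="a" value="123">
   <input id="b">
</div>'

h <- read_html(text)

# Restrict the match to <input> elements that sit directly inside <div class="style">
node <- html_node(h, xpath = '//div[@class="style"]/input[@id="a"]')

# Read the value attribute of the matched node
value <- html_attr(node, "value")
print(value)
```

Scoping the expression this way trades a little brevity for a selector that says where the element is expected to live.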

Answered by hrbrmstr

This will work just fine with straight CSS selectors:

library(rvest)

doc <- '<div class="style">
   <input id="a" value="123">
   <input id="b">
</div>'

pg <- read_html(doc)  # read_html() replaces the deprecated html()
html_attr(html_nodes(pg, "div > input:first-of-type"), "value")

## [1] "123"
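
The :first-of-type selector depends on the position of the input in the markup. If the order of the inputs could change, a CSS selector keyed on the id may be more robust. A minimal sketch on the same toy HTML, where the names by_id and by_attribute are illustrative:

```r
library(rvest)

doc <- '<div class="style">
   <input id="a" value="123">
   <input id="b">
</div>'

pg <- read_html(doc)

# Element name plus id shorthand
by_id <- html_attr(html_node(pg, "input#a"), "value")

# Equivalent attribute selector, scoped to direct children of .style
by_attribute <- html_attr(html_node(pg, '.style > input[id="a"]'), "value")

print(c(by_id, by_attribute))
```

Both selectors pick the node by its id rather than by where it appears among its siblings.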

Answered by arvi1000

Adding an answer because I don't see the simple CSS shorthand for selecting by id, which is #your_id_name:

h %>% 
  html_node('#a') %>%
  html_attr('value')

which outputs "123" as desired.

Same setup as the others:

require(rvest)
text <- '<div class="style">
   <input id="a" value="123">
   <input id="b">
</div>'

h <- read_html(text)
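
One likely reason the attempt in the question found nothing: when html_node() is applied to an already-selected set of nodes, it looks for matches among the descendants of each node, and an input tag has no descendants. A rough sketch of two ways around that, either querying the whole document or filtering the node set by its id attribute. The names from_doc, matched and from_set are illustrative.

```r
library(rvest)

text <- '<div class="style">
   <input id="a" value="123">
   <input id="b">
</div>'

h <- read_html(text)

# The node set from the question: both <input> tags
output <- html_nodes(h, ".style input")

# Option 1: run the id selector against the document itself
from_doc <- html_attr(html_node(h, "#a"), "value")

# Option 2: keep only the nodes in the set whose id attribute is "a"
# (%in% treats a missing id as FALSE instead of NA)
matched <- output[html_attr(output, "id") %in% "a"]
from_set <- html_attr(matched, "value")

print(from_doc)
print(from_set)
```

Option 1 is the shorter route when the document is at hand; option 2 can help when only the node set is available.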