
Parsing XML with simplexml_load and reading the content of XML child elements

Author: WuBin. Published: 2021-09-24 09:00:21


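A sample RSS XML document

I have been tinkering with RSS lately and built an RSS subscription channel. If you are interested, take a look: https://www.wubin.work/rss

That work raised the question of how to parse XML, so let's start with a piece of XML: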

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
       ...
>
<channel>
    <title>张鑫旭-鑫空间-鑫生活</title>
    <atom:link href="..." rel="self" type="application/rss+xml" />
    <link>https://..</link>
    <description>it&#039;s my whole life!</description>
    <lastBuildDate>Thu, 23 Sep 2021 16:09:56 +0000</lastBuildDate>
    <language>zh-CN</language>
    <sy:updatePeriod>hourly</sy:updatePeriod>
    <sy:updateFrequency>1</sy:updateFrequency>
    <generator>https://wordpress.org/?v=5.0.4</generator>
    <item>
        <title>HTML slot 插槽元素深入</title>
        <link>https://.../</link>
        <comments>https://...</comments>
        <pubDate>Thu, 23 Sep 2021 16:04:44 +0000</pubDate>
        <dc:creator><![CDATA[张 鑫旭]]></dc:creator>
        <category><![CDATA[HTML相关]]></category>
        <category><![CDATA[customElements]]></category>
        <category><![CDATA[dialog]]></category>
        <category><![CDATA[display:contents]]></category>
        <category><![CDATA[html]]></category>
        <category><![CDATA[slot]]></category>
        <category><![CDATA[Web Components]]></category>
        <guid isPermaLink="false">https://...</guid>
        <description><![CDATA[最细致的介绍 HTML slot 插槽元素的文章]]></description>
        <content:encoded><![CDATA[
            ...a bunch of HTML elements
        ]]></content:encoded>
        <wfw:commentRss>https://...</wfw:commentRss>
        <slash:comments>0</slash:comments>
    </item>

    <item>...more item elements</item>

</channel>
</rss>

The general parsing approach

The simplest option is simplexml_load_file(). (The code is excerpted from 《PHP深度分析：101个核心技巧》, p. 280.)

$url = 'http://rss.sitepoint.com/f/sitepoint_blogs_feed';
// Returns a SimpleXMLElement, or false if the feed cannot be loaded or parsed
$xml = simplexml_load_file($url);
if ($xml === false) {
    exit("Failed to load feed\n");
}
$channel = $xml->channel;
// Cast each element to string to get its text value
echo "Title: ", (string) $channel->title, "\n",
    "Description: ", (string) $channel->description, "\n",
    "Link: ", (string) $channel->link, "\n";
foreach ($channel->item as $item)
{
  echo "Item: ", (string) $item->title, "\n",
      "Link: ", (string) $item->link, "\n",
      "Description:\n", (string) $item->description, "\n";
}

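Note: to get the actual value of the property you are accessing, you must first cast it to an appropriate type; otherwise you receive a SimpleXMLElement object that represents the value.

Similar options include simplexml_load_string() and XMLReader.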
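To make the casting note concrete, here is a minimal sketch using simplexml_load_string(). The inline feed and the variable names are made up for illustration; they are not taken from a real feed.

```php
<?php
// Hypothetical inline feed, used only for illustration.
$xmlString = <<<XML
<rss version="2.0">
<channel>
    <title>Example feed</title>
    <item><title>First post</title></item>
    <item><title>Second post</title></item>
</channel>
</rss>
XML;

// Parse XML from a string instead of a file or URL.
$xml = simplexml_load_string($xmlString);
if ($xml === false) {
    exit("Failed to parse XML\n");
}

$node  = $xml->channel->title;          // still a SimpleXMLElement object
$title = (string) $xml->channel->title; // cast to get the text value

// Collect every item title as a plain string.
$itemTitles = [];
foreach ($xml->channel->item as $item) {
    $itemTitles[] = (string) $item->title;
}

echo $title, "\n", implode(', ', $itemTitles), "\n";
```

Without the cast, comparisons such as $node === 'Example feed' fail because $node is an object, which is why the cast matters whenever the value is stored or compared.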

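Parsing child elements

The approach above works fine for ordinary elements, but not for namespaced child elements such as this one from the sample XML in the first section: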

<content:encoded><![CDATA[
            ...a bunch of HTML elements
]]></content:encoded>

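First, be clear about the name: in <content:encoded>, content is the namespace prefix and encoded is the tag name. Plain property access such as $item->title does not reach namespaced elements, so encoded has to be fetched as a child element within the content namespace.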

$url = 'https://www.uisdc.com/feed'; // used in the examples below

Method 1

Reference: https://www.dazhuanlan.com/yeahe/topics/1423564

$xml = simplexml_load_file($url);
$channel = $xml->channel;

foreach ($channel->item as $item) {
   // var_dump($item->children('content', true));
   // The second argument (true) means 'content' is a namespace prefix, not a URI
   $cc = $item->children('content', true)->encoded;
   echo $cc;
}

Use ->children('content', true)->encoded to get the content of the <content:encoded> tag.
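The following sketch shows the same lookup on a small inline feed, so it can be run without network access. The feed is hypothetical. The namespace URI is the one commonly declared for the RSS content module; a real feed may declare a different one, so treat it as an assumption and check the xmlns:content attribute of the feed you are parsing.

```php
<?php
// Hypothetical inline feed; the xmlns:content URI is assumed to be the
// usual RSS content module namespace.
$xmlString = <<<XML
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
    <item>
        <title>First post</title>
        <content:encoded><![CDATA[<p>Hello</p>]]></content:encoded>
    </item>
</channel>
</rss>
XML;

$xml  = simplexml_load_string($xmlString);
$item = $xml->channel->item[0];

// Plain property access does not find the namespaced element.
$plain = (string) $item->encoded;

// Look up by prefix: the second argument true marks 'content' as a prefix.
$byPrefix = (string) $item->children('content', true)->encoded;

// Look up by namespace URI: keeps working if the feed uses another prefix.
$byUri = (string) $item->children('http://purl.org/rss/1.0/modules/content/')->encoded;

var_dump($plain, $byPrefix, $byUri);
```

Looking up by URI is a little more verbose, but the prefix is only a local alias chosen by the feed, so the URI form is the safer choice when parsing feeds you do not control.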

Method 2

Reference: http://cn.voidcc.com/question/p-cvhtjuik-cw.html

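$feed_url = $url;
$feeds = file_get_contents($feed_url);
// Rename the namespaced tag so SimpleXML can read it as a plain property
$feeds = str_replace("<content:encoded>", "<contentEncoded>", $feeds);
$feeds = str_replace("</content:encoded>", "</contentEncoded>", $feeds);
$rss = simplexml_load_string($feeds);
foreach ($rss->channel->item as $entry) {
    // Escape the link and title before placing them in HTML
    $link = htmlspecialchars((string) $entry->link, ENT_QUOTES);
    $title = htmlspecialchars((string) $entry->title, ENT_QUOTES);
    echo "<a href='$link' title='$title'>$title</a>";
    echo $entry->contentEncoded; // the embedded HTML is output as-is
}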

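This method first fetches the XML as a string, replaces the specific tags in that string, then uses simplexml_load_string() to load the string into a SimpleXML object, after which the renamed element can be read directly, the same way as title. Note that str_replace only matches the exact tag text, so an opening tag that carries attributes would not be replaced.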
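Here is the same tag-renaming idea as a self-contained sketch on a hypothetical inline feed, so the effect of the replacement can be checked without fetching a URL. Passing arrays to str_replace is just a compact way to do both replacements in one call.

```php
<?php
// Hypothetical inline feed, used only for illustration.
$feeds = <<<XML
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
    <item>
        <title>First post</title>
        <content:encoded><![CDATA[<p>Hello</p>]]></content:encoded>
    </item>
</channel>
</rss>
XML;

// Rename the opening and closing tags in one call.
$feeds = str_replace(
    ['<content:encoded>', '</content:encoded>'],
    ['<contentEncoded>', '</contentEncoded>'],
    $feeds
);

$rss   = simplexml_load_string($feeds);
$entry = $rss->channel->item[0];

// After renaming, the element reads like any ordinary child element.
$content = (string) $entry->contentEncoded;
echo $content, "\n";
```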

Method 3

Reference: https://cloud.tencent.com/developer/ask/108517

$rss = new DOMDocument();
$rss->load($url);
$feed = array();
foreach ($rss->getElementsByTagName('item') as $node) {
    $item = array (
            'title' => $node->getElementsByTagName('title')->item(0)->nodeValue,
            'link' => $node->getElementsByTagName('link')->item(0)->nodeValue,
            'pubDate' => $node->getElementsByTagName('pubDate')->item(0)->nodeValue,
            'description' => $node->getElementsByTagName('description')->item(0)->nodeValue,
            // <content:encoded>, looked up here by the name 'encoded'
            'content' => $node->getElementsByTagName('encoded')->item(0)->nodeValue
            );
    $feed[] = $item;
}

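The idea here is to load the feed with new DOMDocument(), which parses the XML into DOM nodes (no SimpleXML object is involved), and then read the content of each node through nodeValue.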
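As a variation on the DOM approach, the sketch below looks up the element with getElementsByTagNameNS(), which takes the namespace URI and the local name explicitly, and guards against items that have no such element. The inline feed, the CONTENT_NS constant, and the namespace URI are assumptions for illustration; check the xmlns:content attribute of the real feed.

```php
<?php
// Assumed namespace URI of the RSS content module.
const CONTENT_NS = 'http://purl.org/rss/1.0/modules/content/';

// Hypothetical inline feed, used only for illustration.
$xmlString = <<<XML
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
    <item>
        <title>First post</title>
        <content:encoded><![CDATA[<p>Hello</p>]]></content:encoded>
    </item>
</channel>
</rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($xmlString);

$feed = [];
foreach ($doc->getElementsByTagName('item') as $node) {
    // item(0) returns null when the item has no <content:encoded> element.
    $encoded = $node->getElementsByTagNameNS(CONTENT_NS, 'encoded')->item(0);
    $feed[] = [
        'title'   => $node->getElementsByTagName('title')->item(0)->nodeValue,
        'content' => $encoded !== null ? $encoded->nodeValue : '',
    ];
}

print_r($feed);
```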
