@@ -18,7 +18,7 @@ Install the latest version using composer.
 $ composer require paquettg/php-html-parser
 ```

-This package can be found on [packagist](https://packagist.org/packages/paquettg/php-html-parser) and is best loaded using [composer](http://getcomposer.org/). We support php 7.1, 7.2, 7.3, and 7.4.
+This package can be found on [packagist](https://packagist.org/packages/paquettg/php-html-parser) and is best loaded using [composer](http://getcomposer.org/). We support php 7.2, 7.3, and 7.4.

 Usage
 -----
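The supported range above can also be checked at runtime when a script should fail fast on an unsupported interpreter. This is only a sketch: `meetsMinimumPhp()` is a hypothetical helper, not part of the package, and it checks only the 7.2 minimum. In practice a `php` platform requirement in `composer.json` is the usual place to express this constraint.

```php
<?php
// Hypothetical helper, not part of php-html-parser: returns true when the
// given version string satisfies the PHP 7.2 minimum stated above.
function meetsMinimumPhp(string $version = PHP_VERSION): bool
{
    return version_compare($version, '7.2.0', '>=');
}

// Fail fast on an older interpreter (CLI scripts only, since it writes to STDERR).
if (!meetsMinimumPhp()) {
    fwrite(STDERR, 'paquettg/php-html-parser needs PHP 7.2 or newer, found ' . PHP_VERSION . PHP_EOL);
    exit(1);
}
```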
@@ -28,7 +28,7 @@ You can find many examples of how to use the dom parser and any of its parts (wh
@@ -46,7 +46,7 @@ You may also seamlessly load a file into the dom instead of a string, which is m
 ```php
 // Assuming you installed from Composer:
 require "vendor/autoload.php";
-use PHPHtmlParser\Dom\Node;
+use PHPHtmlParser\Dom;

 $dom = new Dom;
 $dom->loadFromFile('tests/data/big.html');
@@ -69,8 +69,6 @@ foreach ($contents as $content)

 This example loads the html from big.html, a real page found online, and gets all the content-border classes to process. It also shows a few things you can do with a node but it is not an exhaustive list of methods that a node has available.

-Alternatively, you can always use the `load()` method to load the file. It will attempt to find the file using `file_exists` and, if successful, will call `loadFromFile()` for you. The same applies to a URL and the `loadFromUrl()` method.
-
 Loading Url
 ----------------

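With the `load()` paragraph removed, code that relied on its automatic detection has to pick a loader itself. The sketch below shows one way to reproduce that dispatch in user code. `chooseLoader()` is a hypothetical helper, and checking for `<` before touching the file system is an assumption meant to keep `file_exists()` away from long markup strings, the situation the removed "Loading Strings" warning described.

```php
<?php
// Hypothetical helper (not part of php-html-parser): picks the Dom loader
// method name for a given input, roughly mirroring the removed load() paragraph.
function chooseLoader(string $input): string
{
    // Markup is recognised first, so file_exists() never sees a long HTML string.
    if (strpos($input, '<') !== false) {
        return 'loadStr';
    }
    // Anything that validates as a URL goes to loadFromUrl().
    if (filter_var($input, FILTER_VALIDATE_URL) !== false) {
        return 'loadFromUrl';
    }
    // An existing path on disk goes to loadFromFile().
    if (file_exists($input)) {
        return 'loadFromFile';
    }
    // Otherwise treat the input as an HTML string.
    return 'loadStr';
}

// Usage with the library installed:
// $dom = new \PHPHtmlParser\Dom;
// $dom->{chooseLoader($input)}($input);
```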
@@ -79,7 +77,7 @@ Loading a url is very similar to the way you would load the html from a file.
 $html = $dom->outerHtml; // same result as the first example
 ```

-What makes the loadFromUrl method noteworthy is the `PHPHtmlParser\CurlInterface` parameter, an optional third parameter. By default, we use the `PHPHtmlParser\Curl` class to get the contents of the url. However, you can inject your own implementation of CurlInterface and we will attempt to load the url using whatever tools or settings you choose.
+`loadFromUrl` will, by default, use an implementation of `\Psr\Http\Client\ClientInterface` to perform the HTTP request and a default implementation of `\Psr\Http\Message\RequestInterface` to build the request itself. You can easily supply your own client (as the third argument), your own request, or both, to use a custom HTTP connection with `loadFromUrl`.

 ```php
 // Assuming you installed from Composer:
 require "vendor/autoload.php";
-use PHPHtmlParser\Dom\Node;
-use App\Services\Connector;
+use PHPHtmlParser\Dom;
+use App\Services\MyClient;

 $dom = new Dom;
-$dom->loadFromUrl('http://google.com', [], new Connector);
+$dom->loadFromUrl('http://google.com', null, new MyClient());
 $html = $dom->outerHtml;
 ```

-As long as the Connector object implements the `PHPHtmlParser\CurlInterface` interface properly, it will use that object to get the content of the url instead of the default `PHPHtmlParser\Curl` class.
+As long as the client object implements `\Psr\Http\Client\ClientInterface` properly, it will be used to get the content of the URL.

 Loading Strings
 ---------------

-Loading a string directly, without the checks in `load()`, is also easily done.
+Loading a string directly is also easily done.

 ```php
 // Assuming you installed from Composer:
 require "vendor/autoload.php";
-use PHPHtmlParser\Dom\Node;
+use PHPHtmlParser\Dom;

 $dom = new Dom;
-$dom->loadStr('<html>String</html>', []);
+$dom->loadStr('<html>String</html>');
 $html = $dom->outerHtml;
 ```

-If the string is too long, depending on your file system, the `load()` method will throw a warning. If this happens you can just call the above method to bypass the `is_file()` check in the `load()` method.
-
 Options
 -------

@@ -130,21 +126,24 @@ You can also set parsing option that will effect the behavior of the parsing eng
 ```php
 // Assuming you installed from Composer:
 require "vendor/autoload.php";
-use PHPHtmlParser\Dom\Node;
+use PHPHtmlParser\Dom;
+use PHPHtmlParser\Options;

 $dom = new Dom;
-$dom->setOptions([
-    'strict' => true, // Set a global option to enable strict html parsing.
-]);
+$dom->setOptions(
+    // this is set as the global option level.
+    (new Options())
+        ->setStrict(true)
+);

-$dom->loadFromUrl('http://google.com', [
-    'whitespaceTextNode' => false, // Only applies to this load.
-]);
+$dom->loadFromUrl('http://google.com',
+    (new Options())->setWhitespaceTextNode(false) // only applies to this load.
+);

 $dom->loadFromUrl('http://gmail.com'); // will not have whitespaceTextNode set to false.
 ```

-At the moment we support 8 options.
+At the moment we support 12 options.

 **Strict**

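The example above separates options set globally through `setOptions()` from options passed to a single load call, which apply to that call only. The plain-PHP sketch below illustrates that layering with arrays. It is an assumption about the observable behavior, not the library's implementation (which uses `PHPHtmlParser\Options` objects), and `effectiveOptions()` is a hypothetical name.

```php
<?php
// Hypothetical illustration of global vs. per-load options using arrays.
function effectiveOptions(array $global, array $perLoad): array
{
    // Per-load values override global ones for this call only;
    // $global is passed by value, so the caller's copy is never modified.
    return array_replace($global, $perLoad);
}

$global = ['strict' => true, 'whitespaceTextNode' => true];

// Like the google.com load: one option overridden for this call.
$first = effectiveOptions($global, ['whitespaceTextNode' => false]);

// Like the gmail.com load: no per-load options, so the global values apply.
$second = effectiveOptions($global, []);
```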
@@ -182,15 +181,17 @@ Set this to `false` if you want to preserve whitespace inside of text nodes. It

 Set this to `false` if you want to preserve smarty script found in the html content. It is set to `true` by default.

-**depthFirstSearch**
+**htmlSpecialCharsDecode**
+
+By default this is set to `false`. Setting this to `true` will apply the php function `htmlspecialchars_decode` to all attribute values and text nodes.

-By default this is set to `false` for legacy support. Setting this to `true` will change the behavior of find to order elements by depth first. This will properly preserve the order of elements as they were in the HTML.
+**selfClosing**

-This option is deprecated and will be removed in version `3.0.0` with the new behavior being as if it was set to `true`.
+This option contains an array of all self-closing tags. These tags must be self-closing, and the parser will force them to be so if you have strict turned on. You can update this list with any additional tags that can be used as a self-closing tag when using strict. You can also remove tags from this array or clear it out completely.

-**htmlSpecialCharsDecode**
+**noSlash**

-By default this is set to `false`. Setting this to `true` will apply the php function `htmlspecialchars_decode` to all attribute values and text nodes.
+This option contains an array of all tags that cannot be self-closing. The list starts off empty, but you can add elements as you wish.

 Static Facade
 -------------
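To see what the `htmlSpecialCharsDecode` option would do to a value, the underlying core PHP function can be called directly. Note that `htmlspecialchars_decode()` only decodes the special-character entities such as `&amp;`, `&lt;` and `&gt;`; named entities like `&eacute;` are left alone and need `html_entity_decode()` instead.

```php
<?php
// Core PHP only: the transformation the htmlSpecialCharsDecode option
// applies to attribute values and text nodes.
$attribute = 'Tom &amp; Jerry';
$text      = '1 &lt; 2 &gt; 0';
$named     = 'caf&eacute;';

$decodedAttribute = htmlspecialchars_decode($attribute); // "Tom & Jerry"
$decodedText      = htmlspecialchars_decode($text);      // "1 < 2 > 0"
$decodedNamed     = htmlspecialchars_decode($named);     // unchanged: "caf&eacute;"

// html_entity_decode() also handles named entities (UTF-8 output by default).
$fullyDecoded = html_entity_decode($named);
```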
@@ -200,7 +201,7 @@ You can also mount a static facade for the Dom object.
 ```PHP
 PHPHtmlParser\StaticDom::mount();

-Dom::load('tests/big.html');
+Dom::loadFromFile('tests/big.html');
 $objects = Dom::find('.content-border');

 ```
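A static facade of this kind generally works by forwarding static calls to one shared instance. The sketch below shows that general pattern with `__callStatic()`. `FakeDom` and `DomFacade` are hypothetical stand-ins, and this is not necessarily how `PHPHtmlParser\StaticDom` is implemented.

```php
<?php
// Hypothetical stand-in for a Dom object; it only records the file name.
class FakeDom
{
    private $source = '';

    public function loadFromFile(string $file): self
    {
        $this->source = $file;
        return $this;
    }

    public function getSource(): string
    {
        return $this->source;
    }
}

// Generic static facade: every unknown static call is forwarded to one mounted instance.
final class DomFacade
{
    private static $instance = null;

    public static function mount($instance): void
    {
        self::$instance = $instance;
    }

    public static function __callStatic(string $method, array $args)
    {
        if (self::$instance === null) {
            throw new LogicException('Call DomFacade::mount() first.');
        }
        return self::$instance->$method(...$args);
    }
}

DomFacade::mount(new FakeDom());
DomFacade::loadFromFile('tests/data/big.html');
```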
@@ -213,8 +214,10 @@ Modifying The Dom
 You can always modify the dom that was created from any loading method. To change the attribute of any node you can just call the `setAttribute` method.
It is also possible to remove a node from the tree. Simply call the `delete` method on any node to remove it from the tree. It is important to note that you should unset the node after removing it from the `DOM`, as it will still take up memory as long as it is not unset.
You can modify the text of `TextNode` objects easily. Please note that, if you set an encoding, the new text will be encoded using the existing encoding.
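The advice to unset a node after calling `delete` follows from PHP's reference counting: detaching an object from a tree does not free it while a variable still points to it. The plain-PHP sketch below demonstrates this with a hypothetical `TreeNode` class (not the library's node type) and `WeakReference`, which requires PHP 7.4.

```php
<?php
// Hypothetical minimal tree node, not php-html-parser's Node class.
final class TreeNode
{
    /** @var TreeNode[] */
    public $children = [];

    public function removeChild(TreeNode $child): void
    {
        // Keep every child except the one being removed.
        $this->children = array_values(array_filter(
            $this->children,
            function (TreeNode $c) use ($child) {
                return $c !== $child;
            }
        ));
    }
}

$root = new TreeNode();
$node = new TreeNode();
$root->children[] = $node;

$watch = WeakReference::create($node); // observes $node without keeping it alive (PHP 7.4+)
$root->removeChild($node);             // detached from the tree...
$stillAlive = $watch->get() !== null;  // ...but $node still holds a reference
unset($node);                          // drop the last reference
$freed = $watch->get() === null;       // the object has now been destroyed
```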