Commit c116346

Updated documentation
1 parent c2fa05a commit c116346

3 files changed: +51 -37 lines changed

CHANGELOG.md

Lines changed: 3 additions & 1 deletion
@@ -5,14 +5,15 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
-## [Unreleased]
+## 3.0.0
 
 ### Added
 - Support for PSR7 HTTP clients and requests for URL calls has been added.
 - PHAN support and fixed all issues from PHAN has been added.
 - PHP-CS-Fixer added.
 - Support for html5 charset detection.
 - Added the ability to match both parent and children.
+- Added character set conversion in load.
 
 ### Changed
 - Fixed issue with \ causing an infite loop.
@@ -28,6 +29,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Removed support for the depth first search option.
 - `findById()` method removed from Dom object.
 - Removed `load()` method in Dom object.
+- Removed support for php 7.1.
 
 ## 2.2.0
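
Code that relied on the removed `load()` method can reproduce its dispatch in user land. The following is a minimal sketch, assuming the 3.0.0 loaders can be called with a single argument as the README examples in this commit show. `loadAny()` is a hypothetical helper, not part of the library, and the URL check is a simplification.

```php
<?php
// Assuming you installed from Composer:
require "vendor/autoload.php";

use PHPHtmlParser\Dom;

/**
 * Hypothetical helper (not part of the library): picks an explicit loader
 * based on what the input looks like, roughly as the removed load() did.
 */
function loadAny(Dom $dom, string $input): Dom
{
    if (strpos($input, "\n") === false && is_file($input)) {
        // A single-line string that names an existing file.
        $dom->loadFromFile($input);
    } elseif (preg_match('#^https?://#i', $input) === 1) {
        // Anything starting with http:// or https:// is treated as a URL.
        $dom->loadFromUrl($input);
    } else {
        // Everything else is parsed as an HTML string.
        $dom->loadStr($input);
    }

    return $dom;
}

$dom = loadAny(new Dom, '<div class="all"><p>Hey bro</p></div>');
echo $dom->outerHtml;
```

As with the old method, `is_file()` may emit a warning on very long strings. Calling `loadStr()` directly avoids that check.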

README.md

Lines changed: 47 additions & 35 deletions
@@ -1,7 +1,7 @@
 PHP Html Parser
 ==========================
 
-Version 2.2.1
+Version 3.0.0
 
 [![Build Status](https://travis-ci.org/paquettg/php-html-parser.png)](https://travis-ci.org/paquettg/php-html-parser)
 [![Coverage Status](https://coveralls.io/repos/paquettg/php-html-parser/badge.png)](https://coveralls.io/r/paquettg/php-html-parser)
@@ -18,7 +18,7 @@ Install the latest version using composer.
 $ composer require paquettg/php-html-parser
 ```
 
-This package can be found on [packagist](https://packagist.org/packages/paquettg/php-html-parser) and is best loaded using [composer](http://getcomposer.org/). We support php 7.1, 7.2, 7.3, and 7.4.
+This package can be found on [packagist](https://packagist.org/packages/paquettg/php-html-parser) and is best loaded using [composer](http://getcomposer.org/). We support php 7.2, 7.3, and 7.4.
 
 Usage
 -----
@@ -28,7 +28,7 @@ You can find many examples of how to use the dom parser and any of its parts (wh
 ```php
 // Assuming you installed from Composer:
 require "vendor/autoload.php";
-use PHPHtmlParser\Dom\Node;
+use PHPHtmlParser\Dom;
 
 $dom = new Dom;
 $dom->loadStr('<div class="all"><p>Hey bro, <a href="google.com">click here</a><br /> :)</p></div>');
@@ -46,7 +46,7 @@ You may also seamlessly load a file into the dom instead of a string, which is m
 ```php
 // Assuming you installed from Composer:
 require "vendor/autoload.php";
-use PHPHtmlParser\Dom\Node;
+use PHPHtmlParser\Dom;
 
 $dom = new Dom;
 $dom->loadFromFile('tests/data/big.html');
@@ -69,8 +69,6 @@ foreach ($contents as $content)
 
 This example loads the html from big.html, a real page found online, and gets all the content-border classes to process. It also shows a few things you can do with a node but it is not an exhaustive list of methods that a node has available.
 
-Alternativly, you can always use the `load()` method to load the file. It will attempt to find the file using `file_exists` and, if successful, will call `loadFromFile()` for you. The same applies to a URL and `loadFromUrl()` method.
-
 Loading Url
 ----------------
 
@@ -79,7 +77,7 @@ Loading a url is very similar to the way you would load the html from a file.
 ```php
 // Assuming you installed from Composer:
 require "vendor/autoload.php";
-use PHPHtmlParser\Dom\Node;
+use PHPHtmlParser\Dom;
 
 $dom = new Dom;
 $dom->loadFromUrl('http://google.com');
@@ -90,38 +88,36 @@ $dom->loadFromUrl('http://google.com');
 $html = $dom->outerHtml; // same result as the first example
 ```
 
-What makes the loadFromUrl method note worthy is the `PHPHtmlParser\CurlInterface` parameter, an optional second parameter. By default, we use the `PHPHtmlParser\Curl` class to get the contents of the url. On the other hand, though, you can inject your own implementation of CurlInterface and we will attempt to load the url using what ever tool/settings you want, up to you.
+loadFromUrl will, by default, use an implementation of the `\Psr\Http\Client\ClientInterface` to do the HTTP request and a default implementation of `\Psr\Http\Message\RequestInterface` to create the body of the request. You can easely implement your own version of either the client or request to use a custom HTTP connection when using loadFromUrl.
 
 ```php
 // Assuming you installed from Composer:
 require "vendor/autoload.php";
-use PHPHtmlParser\Dom\Node;
-use App\Services\Connector;
+use PHPHtmlParser\Dom;
+use App\Services\MyClient;
 
 $dom = new Dom;
-$dom->loadFromUrl('http://google.com', [], new Connector);
+$dom->loadFromUrl('http://google.com', null, new MyClient());
 $html = $dom->outerHtml;
 ```
 
-As long as the Connector object implements the `PHPHtmlParser\CurlInterface` interface properly it will use that object to get the content of the url instead of the default `PHPHtmlParser\Curl` class.
+As long as the client object implements the interface properly it will use that object to get the content of the url.
 
 Loading Strings
 ---------------
 
-Loading a string directly, with out the checks in `load()` is also easily done.
+Loading a string directly is also easily done.
 
 ```php
 // Assuming you installed from Composer:
 require "vendor/autoload.php";
-use PHPHtmlParser\Dom\Node;
+use PHPHtmlParser\Dom;
 
 $dom = new Dom;
-$dom->loadStr('<html>String</html>', []);
+$dom->loadStr('<html>String</html>');
 $html = $dom->outerHtml;
 ```
 
-If the string is to long, depending on your file system, the `load()` method will throw a warning. If this happens you can just call the above method to bypass the `is_file()` check in the `load()` method.
-
 Options
 -------
 
@@ -130,21 +126,24 @@ You can also set parsing option that will effect the behavior of the parsing eng
 ```php
 // Assuming you installed from Composer:
 require "vendor/autoload.php";
-use PHPHtmlParser\Dom\Node;
+use PHPHtmlParser\Dom;
+use PHPHtmlParser\Options;
 
 $dom = new Dom;
-$dom->setOptions([
-'strict' => true, // Set a global option to enable strict html parsing.
-]);
+$dom->setOptions(
+// this is set as the global option level.
+(new Options())
+->setStrict(true)
+);
 
-$dom->loadFromUrl('http://google.com', [
-'whitespaceTextNode' => false, // Only applies to this load.
-]);
+$dom->loadFromUrl('http://google.com',
+(new Options())->setWhitespaceTextNode(false) // only applies to this load.
+);
 
 $dom->loadFromUrl('http://gmail.com'); // will not have whitespaceTextNode set to false.
 ```
 
-At the moment we support 8 options.
+At the moment we support 12 options.
 
 **Strict**
 
@@ -182,15 +181,17 @@ Set this to `false` if you want to preserve whitespace inside of text nodes. It
 
 Set this to `false` if you want to preserve smarty script found in the html content. It is set to `true` by default.
 
-**depthFirstSearch**
+**htmlSpecialCharsDecode**
+
+By default this is set to `false`. Setting this to `true` will apply the php function `htmlspecialchars_decode` too all attribute values and text nodes.
 
-By default this is set to `false` for legacy support. Setting this to `true` will change the behavior of find to order elements by depth first. This will properly preserve the order of elements as they where in the HTML.
+**selfClosing**
 
-This option is depricated and will be removed in version `3.0.0` with the new behavior being as if it was set to `true`.
+This option contains an array of all self closing tags. These tags must be self closing and the parser will force them to be so if you have strict turned on. You can update this list with any additional tags that can be used as a self closing tag when using strict. You can also remove tags from this array or clear it out completly.
 
-**htmlSpecialCharsDecode**
+**noSlash**
 
-By default this is set to `false`. Setting this to `true` will apply the php function `htmlspecialchars_decode` too all attribute values and text nodes.
+This option contains an array of all tags that can not be self closing. The list starts off as empty but you can add elements as you wish.
 
 Static Facade
 -------------
@@ -200,7 +201,7 @@ You can also mount a static facade for the Dom object.
 ```PHP
 PHPHtmlParser\StaticDom::mount();
 
-Dom::load('tests/big.hmtl');
+Dom::loadFromFile('tests/big.hmtl');
 $objects = Dom::find('.content-border');
 
 ```
@@ -213,8 +214,10 @@ Modifying The Dom
 You can always modify the dom that was created from any loading method. To change the attribute of any node you can just call the `setAttribute` method.
 
 ```php
+use PHPHtmlParser\Dom;
+
 $dom = new Dom;
-$dom->load('<div class="all"><p>Hey bro, <a href="google.com">click here</a><br /> :)</p></div>');
+$dom->loadStr('<div class="all"><p>Hey bro, <a href="google.com">click here</a><br /> :)</p></div>');
 $a = $dom->find('a')[0];
 $a->setAttribute('class', 'foo');
 echo $a->getAttribute('class'); // "foo"
@@ -223,8 +226,11 @@ echo $a->getAttribute('class'); // "foo"
 You may also get the `PHPHtmlParser\Dom\Tag` class directly and manipulate it as you see fit.
 
 ```php
+use PHPHtmlParser\Dom;
+
 $dom = new Dom;
-$dom->load('<div class="all"><p>Hey bro, <a href="google.com">click here</a><br /> :)</p></div>');
+$dom->loadStr('<div class="all"><p>Hey bro, <a href="google.com">click here</a><br /> :)</p></div>');
+/** @var Dom\Node\AbstractNode $a */
 $a = $dom->find('a')[0];
 $tag = $a->getTag();
 $tag->setAttribute('class', 'foo');
@@ -234,8 +240,11 @@ echo $a->getAttribute('class'); // "foo"
 It is also possible to remove a node from the tree. Simply call the `delete` method on any node to remove it from the tree. It is important to note that you should unset the node after removing it from the `DOM``, it will still take memory as long as it is not unset.
 
 ```php
+use PHPHtmlParser\Dom;
+
 $dom = new Dom;
-$dom->load('<div class="all"><p>Hey bro, <a href="google.com">click here</a><br /> :)</p></div>');
+$dom->loadStr('<div class="all"><p>Hey bro, <a href="google.com">click here</a><br /> :)</p></div>');
+/** @var Dom\Node\AbstractNode $a */
 $a = $dom->find('a')[0];
 $a->delete();
 unset($a);
@@ -245,8 +254,11 @@ echo $dom; // '<div class="all"><p>Hey bro, <br /> :)</p></div>');
 You can modify the text of `TextNode` objects easely. Please note that, if you set an encoding, the new text will be encoded using the existing encoding.
 
 ```php
+use PHPHtmlParser\Dom;
+
 $dom = new Dom;
-$dom->load('<div class="all"><p>Hey bro, <a href="google.com">click here</a><br /> :)</p></div>');
+$dom->loadStr('<div class="all"><p>Hey bro, <a href="google.com">click here</a><br /> :)</p></div>');
+/** @var Dom\Node\InnerNode $a */
 $a = $dom->find('a')[0];
 $a->firstChild()->setText('biz baz');
 echo $dom; // '<div class="all"><p>Hey bro, <a href="google.com">biz baz</a><br /> :)</p></div>'

tests/Dom/CleanerTest.php

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
 
 class CleanerTest extends TestCase
 {
-public function testLoadByURL()
+public function testCleanEregiFailureFile()
 {
 $cleaner = new Cleaner();
 $string = $cleaner->clean(\file_get_contents('tests/data/files/mvEregiReplaceFailure.html'), new Options(), 'utf-8');
