[Git][java-team/jsoup][master] 3 commits: New upstream version 1.15.3

Markus Koschany (@apo) gitlab at salsa.debian.org
Sat Sep 3 00:09:02 BST 2022



Markus Koschany pushed to branch master at Debian Java Maintainers / jsoup


Commits:
c62d162d by Markus Koschany at 2022-09-03T01:02:48+02:00
New upstream version 1.15.3
- - - - -
cc352488 by Markus Koschany at 2022-09-03T01:02:48+02:00
Update upstream source from tag 'upstream/1.15.3'

Update to upstream version '1.15.3'
with Debian dir 4c621643ef5ea6a254fe24bb6236ad2f599ec0ba
- - - - -
6da4710c by Markus Koschany at 2022-09-03T01:06:04+02:00
Update changelog

- - - - -


26 changed files:

- CHANGES
- README.md
- debian/changelog
- pom.xml
- src/main/java/org/jsoup/helper/HttpConnection.java
- src/main/java/org/jsoup/helper/Validate.java
- + src/main/java/org/jsoup/helper/ValidationException.java
- src/main/java/org/jsoup/helper/W3CDom.java
- src/main/java/org/jsoup/internal/ConstrainableInputStream.java
- src/main/java/org/jsoup/internal/StringUtil.java
- src/main/java/org/jsoup/nodes/Element.java
- src/main/java/org/jsoup/parser/TreeBuilder.java
- src/main/java/org/jsoup/safety/Cleaner.java
- src/main/java/org/jsoup/select/QueryParser.java
- src/test/java/org/jsoup/helper/DataUtilTest.java
- src/test/java/org/jsoup/helper/HttpConnectionTest.java
- src/test/java/org/jsoup/helper/ValidateTest.java
- src/test/java/org/jsoup/integration/ConnectTest.java
- src/test/java/org/jsoup/integration/SessionIT.java
- src/test/java/org/jsoup/integration/SessionTest.java
- src/test/java/org/jsoup/integration/TestServer.java
- src/test/java/org/jsoup/internal/StringUtilTest.java
- src/test/java/org/jsoup/nodes/ElementTest.java
- src/test/java/org/jsoup/nodes/FormElementTest.java
- src/test/java/org/jsoup/nodes/PositionTest.java
- src/test/java/org/jsoup/safety/CleanerTest.java


Changes:

=====================================
CHANGES
=====================================
@@ -1,6 +1,26 @@
 jsoup changelog
 
-*** Release 1.15.2 [PENDING]
+Release 1.15.3 [2022-Aug-24]
+  * Security: fixed an issue where the jsoup cleaner may incorrectly sanitize crafted XSS attempts if
+    SafeList.preserveRelativeLinks is enabled.
+    <https://github.com/jhy/jsoup/security/advisories/GHSA-gp7f-rwcx-9369>
+
+  * Improvement: the Cleaner will preserve the source position of cleaned elements, if source tracking is enabled in the
+    original parse.
+
+  * Improvement: the error messages output from Validate are more descriptive. Exceptions are now ValidationExceptions
+    (extending IllegalArgumentException). Stack traces do not include the Validate class, to make it simpler to see
+    where the exception originated. Common validation errors including malformed URLs and empty selector results have
+    more explicit error messages.
+
+  * Bugfix: the DataUtil would incorrectly read from InputStreams that emitted reads less than the requested size. This
+    led to incorrect results when parsing from chunked server responses, for example.
+    <https://github.com/jhy/jsoup/issues/1807>
+
+  * Build Improvement: added implementation version and related fields to the jar manifest.
+    <https://github.com/jhy/jsoup/issues/1809>
+
+*** Release 1.15.2 [2022-Jul-04]
   * Improvement: added the ability to track the position (line, column, index) in the original input source from where
     a given node was parsed. Accessible via Node.sourceRange() and Element.endSourceRange().
     <https://github.com/jhy/jsoup/pull/1790>


=====================================
README.md
=====================================
@@ -17,7 +17,7 @@ See [**jsoup.org**](https://jsoup.org/) for downloads and the full [API document
 [![Build Status](https://github.com/jhy/jsoup/workflows/Build/badge.svg)](https://github.com/jhy/jsoup/actions?query=workflow%3ABuild)
 
 ## Example
-Fetch the [Wikipedia](https://en.wikipedia.org/wiki/Main_Page) homepage, parse it to a [DOM](https://developer.mozilla.org/en-US/docs/Web/API/Document_Object_Model/Introduction), and select the headlines from the *In the News* section into a list of [Elements](https://jsoup.org/apidocs/index.html?org/jsoup/select/Elements.html):
+Fetch the [Wikipedia](https://en.wikipedia.org/wiki/Main_Page) homepage, parse it to a [DOM](https://developer.mozilla.org/en-US/docs/Web/API/Document_Object_Model/Introduction), and select the headlines from the *In the News* section into a list of [Elements](https://jsoup.org/apidocs/org/jsoup/select/Elements.html):
 
 ```java
 Document doc = Jsoup.connect("https://en.wikipedia.org/").get();


=====================================
debian/changelog
=====================================
@@ -1,3 +1,14 @@
+jsoup (1.15.3-1) unstable; urgency=high
+
+  * Team upload.
+  * New upstream version 1.15.3.
+    - Fix CVE-2022-36033:
+      Jsoup may incorrectly sanitize HTML including Javascript which could
+      allow XSS attacks. (Closes: #1018931)
+      Thanks to Salvatore Bonaccorso for the report.
+
+ -- Markus Koschany <apo at debian.org>  Sat, 03 Sep 2022 01:03:14 +0200
+
 jsoup (1.15.2-1) unstable; urgency=medium
 
   * Team upload.


=====================================
pom.xml
=====================================
@@ -5,7 +5,7 @@
 
   <groupId>org.jsoup</groupId>
   <artifactId>jsoup</artifactId>
-  <version>1.15.2</version><!-- remember to update previous version below for japicmp -->
+  <version>1.15.3</version><!-- remember to update previous version below for japicmp -->
   <description>jsoup is a Java library for working with real-world HTML. It provides a very convenient API for fetching URLs and extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors. jsoup implements the WHATWG HTML5 specification, and parses HTML to the same DOM as modern browsers do.</description>
   <url>https://jsoup.org/</url>
   <inceptionYear>2009</inceptionYear>
@@ -24,7 +24,7 @@
     <url>https://github.com/jhy/jsoup</url>
     <connection>scm:git:https://github.com/jhy/jsoup.git</connection>
     <!-- <developerConnection>scm:git:git at github.com:jhy/jsoup.git</developerConnection> -->
-    <tag>jsoup-1.15.2</tag>
+    <tag>jsoup-1.15.3</tag>
   </scm>
   <organization>
     <name>Jonathan Hedley</name>
@@ -123,6 +123,9 @@
         <version>3.2.2</version>
         <configuration>
           <archive>
+            <manifest>
+              <addDefaultImplementationEntries>true</addDefaultImplementationEntries>
+            </manifest>
             <manifestEntries>
               <Automatic-Module-Name>org.jsoup</Automatic-Module-Name>
             </manifestEntries>
@@ -136,7 +139,7 @@
       <plugin>
         <groupId>org.apache.felix</groupId>
         <artifactId>maven-bundle-plugin</artifactId>
-        <version>5.1.6</version>
+        <version>5.1.8</version>
         <executions>
           <execution>
             <id>bundle-manifest</id>
@@ -157,7 +160,7 @@
       <plugin>
         <groupId>org.apache.maven.plugins</groupId>
         <artifactId>maven-resources-plugin</artifactId>
-        <version>3.2.0</version>
+        <version>3.3.0</version>
       </plugin>
       <plugin>
         <artifactId>maven-release-plugin</artifactId>
@@ -317,7 +320,7 @@
     <dependency>
       <groupId>org.junit.jupiter</groupId>
       <artifactId>junit-jupiter</artifactId>
-      <version>5.8.2</version>
+      <version>5.9.0</version>
       <scope>test</scope>
     </dependency>
 
@@ -325,7 +328,7 @@
       <!-- gson, to fetch entities from w3.org -->
       <groupId>com.google.code.gson</groupId>
       <artifactId>gson</artifactId>
-      <version>2.9.0</version>
+      <version>2.9.1</version>
       <scope>test</scope>
     </dependency>
 
@@ -333,7 +336,7 @@
       <!-- jetty for webserver integration tests. 9.x is last with Java7 support -->
       <groupId>org.eclipse.jetty</groupId>
       <artifactId>jetty-server</artifactId>
-      <version>9.4.46.v20220331</version>
+      <version>9.4.48.v20220622</version>
       <scope>test</scope>
     </dependency>
 
@@ -341,7 +344,7 @@
       <!-- jetty for webserver integration tests -->
       <groupId>org.eclipse.jetty</groupId>
       <artifactId>jetty-servlet</artifactId>
-      <version>9.4.46.v20220331</version>
+      <version>9.4.48.v20220622</version>
       <scope>test</scope>
     </dependency>
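The `addDefaultImplementationEntries` setting above causes the jar manifest to carry `Implementation-Version` and related entries. As a minimal sketch, assuming nothing about how jsoup itself uses these fields, the value can be read at runtime through the standard `Package.getImplementationVersion()` call. The class and method names below are hypothetical, and the lookup returns a fallback when the class was not loaded from a jar with such a manifest.

```java
public class ManifestVersionSketch {
    /**
     * Returns the Implementation-Version recorded in the manifest of the jar
     * that supplied the given class, or "unknown" if none is available
     * (for example when running from a plain class directory).
     */
    public static String implementationVersion(Class<?> type) {
        Package pkg = type.getPackage(); // may be null on older JVMs for the default package
        String version = pkg == null ? null : pkg.getImplementationVersion();
        return version == null ? "unknown" : version;
    }

    public static void main(String[] args) {
        System.out.println(implementationVersion(ManifestVersionSketch.class));
    }
}
```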
 


=====================================
src/main/java/org/jsoup/helper/HttpConnection.java
=====================================
@@ -179,11 +179,11 @@ public class HttpConnection implements Connection {
     }
 
     public Connection url(String url) {
-        Validate.notEmpty(url, "Must supply a valid URL");
+        Validate.notEmptyParam(url, "url");
         try {
             req.url(new URL(encodeUrl(url)));
         } catch (MalformedURLException e) {
-            throw new IllegalArgumentException("Malformed URL: " + url, e);
+            throw new IllegalArgumentException(String.format("The supplied URL, '%s', is malformed. Make sure it is an absolute URL, and starts with 'http://' or 'https://'. See https://jsoup.org/cookbook/extracting-data/working-with-urls", url), e);
         }
         return this;
     }
@@ -199,7 +199,7 @@ public class HttpConnection implements Connection {
     }
 
     public Connection userAgent(String userAgent) {
-        Validate.notNull(userAgent, "User agent must not be null");
+        Validate.notNullParam(userAgent, "userAgent");
         req.header(USER_AGENT, userAgent);
         return this;
     }
@@ -220,7 +220,7 @@ public class HttpConnection implements Connection {
     }
 
     public Connection referrer(String referrer) {
-        Validate.notNull(referrer, "Referrer must not be null");
+        Validate.notNullParam(referrer, "referrer");
         req.header("Referer", referrer);
         return this;
     }
@@ -263,7 +263,7 @@ public class HttpConnection implements Connection {
     }
 
     public Connection data(Map<String, String> data) {
-        Validate.notNull(data, "Data map must not be null");
+        Validate.notNullParam(data, "data");
         for (Map.Entry<String, String> entry : data.entrySet()) {
             req.data(KeyVal.create(entry.getKey(), entry.getValue()));
         }
@@ -271,7 +271,7 @@ public class HttpConnection implements Connection {
     }
 
     public Connection data(String... keyvals) {
-        Validate.notNull(keyvals, "Data key value pairs must not be null");
+        Validate.notNullParam(keyvals, "keyvals");
         Validate.isTrue(keyvals.length %2 == 0, "Must supply an even number of key value pairs");
         for (int i = 0; i < keyvals.length; i += 2) {
             String key = keyvals[i];
@@ -284,7 +284,7 @@ public class HttpConnection implements Connection {
     }
 
     public Connection data(Collection<Connection.KeyVal> data) {
-        Validate.notNull(data, "Data collection must not be null");
+        Validate.notNullParam(data, "data");
         for (Connection.KeyVal entry: data) {
             req.data(entry);
         }
@@ -292,7 +292,7 @@ public class HttpConnection implements Connection {
     }
 
     public Connection.KeyVal data(String key) {
-        Validate.notEmpty(key, "Data key must not be empty");
+        Validate.notEmptyParam(key, "key");
         for (Connection.KeyVal keyVal : request().data()) {
             if (keyVal.key().equals(key))
                 return keyVal;
@@ -311,7 +311,7 @@ public class HttpConnection implements Connection {
     }
 
     public Connection headers(Map<String,String> headers) {
-        Validate.notNull(headers, "Header map must not be null");
+        Validate.notNullParam(headers, "headers");
         for (Map.Entry<String,String> entry : headers.entrySet()) {
             req.header(entry.getKey(),entry.getValue());
         }
@@ -324,7 +324,7 @@ public class HttpConnection implements Connection {
     }
 
     public Connection cookies(Map<String, String> cookies) {
-        Validate.notNull(cookies, "Cookie map must not be null");
+        Validate.notNullParam(cookies, "cookies");
         for (Map.Entry<String, String> entry : cookies.entrySet()) {
             req.cookie(entry.getKey(), entry.getValue());
         }
@@ -432,7 +432,7 @@ public class HttpConnection implements Connection {
         }
 
         public T url(URL url) {
-            Validate.notNull(url, "URL must not be null");
+            Validate.notNullParam(url, "url");
             this.url = punyUrl(url); // if calling url(url) directly, does not go through encodeUrl, so we punycode it explicitly. todo - should we encode here as well?
             return (T) this;
         }
@@ -442,13 +442,13 @@ public class HttpConnection implements Connection {
         }
 
         public T method(Method method) {
-            Validate.notNull(method, "Method must not be null");
+            Validate.notNullParam(method, "method");
             this.method = method;
             return (T) this;
         }
 
         public String header(String name) {
-            Validate.notNull(name, "Header name must not be null");
+            Validate.notNullParam(name, "name");
             List<String> vals = getHeadersCaseInsensitive(name);
             if (vals.size() > 0) {
                 // https://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2
@@ -460,7 +460,7 @@ public class HttpConnection implements Connection {
 
         @Override
         public T addHeader(String name, String value) {
-            Validate.notEmpty(name);
+            Validate.notEmptyParam(name, "name");
             //noinspection ConstantConditions
             value = value == null ? "" : value;
 
@@ -476,7 +476,7 @@ public class HttpConnection implements Connection {
 
         @Override
         public List<String> headers(String name) {
-            Validate.notEmpty(name);
+            Validate.notEmptyParam(name, "name");
             return getHeadersCaseInsensitive(name);
         }
 
@@ -530,14 +530,14 @@ public class HttpConnection implements Connection {
         }
 
         public T header(String name, String value) {
-            Validate.notEmpty(name, "Header name must not be empty");
+            Validate.notEmptyParam(name, "name");
             removeHeader(name); // ensures we don't get an "accept-encoding" and a "Accept-Encoding"
             addHeader(name, value);
             return (T) this;
         }
 
         public boolean hasHeader(String name) {
-            Validate.notEmpty(name, "Header name must not be empty");
+            Validate.notEmptyParam(name, "name");
             return !getHeadersCaseInsensitive(name).isEmpty();
         }
 
@@ -556,8 +556,8 @@ public class HttpConnection implements Connection {
         }
 
         public T removeHeader(String name) {
-            Validate.notEmpty(name, "Header name must not be empty");
-            Map.Entry<String, List<String>> entry = scanHeaders(name); // remove is case insensitive too
+            Validate.notEmptyParam(name, "name");
+            Map.Entry<String, List<String>> entry = scanHeaders(name); // remove is case-insensitive too
             if (entry != null)
                 headers.remove(entry.getKey()); // ensures correct case
             return (T) this;
@@ -600,24 +600,24 @@ public class HttpConnection implements Connection {
         }
 
         public String cookie(String name) {
-            Validate.notEmpty(name, "Cookie name must not be empty");
+            Validate.notEmptyParam(name, "name");
             return cookies.get(name);
         }
 
         public T cookie(String name, String value) {
-            Validate.notEmpty(name, "Cookie name must not be empty");
-            Validate.notNull(value, "Cookie value must not be null");
+            Validate.notEmptyParam(name, "name");
+            Validate.notNullParam(value, "value");
             cookies.put(name, value);
             return (T) this;
         }
 
         public boolean hasCookie(String name) {
-            Validate.notEmpty(name, "Cookie name must not be empty");
+            Validate.notEmptyParam(name, "name");
             return cookies.containsKey(name);
         }
 
         public T removeCookie(String name) {
-            Validate.notEmpty(name, "Cookie name must not be empty");
+            Validate.notEmptyParam(name, "name");
             cookies.remove(name);
             return (T) this;
         }
@@ -749,7 +749,7 @@ public class HttpConnection implements Connection {
         }
 
         public Request data(Connection.KeyVal keyval) {
-            Validate.notNull(keyval, "Key val must not be null");
+            Validate.notNullParam(keyval, "keyval");
             data.add(keyval);
             return this;
         }
@@ -778,7 +778,7 @@ public class HttpConnection implements Connection {
         }
 
         public Connection.Request postDataCharset(String charset) {
-            Validate.notNull(charset, "Charset must not be null");
+            Validate.notNullParam(charset, "charset");
             if (!Charset.isSupported(charset)) throw new IllegalCharsetNameException(charset);
             this.postDataCharset = charset;
             return this;
@@ -834,7 +834,7 @@ public class HttpConnection implements Connection {
                 Validate.isFalse(req.executing, "Multiple threads were detected trying to execute the same request concurrently. Make sure to use Connection#newRequest() and do not share an executing request between threads.");
                 req.executing = true;
             }
-            Validate.notNull(req, "Request must not be null");
+            Validate.notNullParam(req, "req");
             URL url = req.url();
             Validate.notNull(url, "URL must be specified to connect");
             String protocol = url.getProtocol();
@@ -1275,14 +1275,14 @@ public class HttpConnection implements Connection {
         }
 
         private KeyVal(String key, String value) {
-            Validate.notEmpty(key, "Data key must not be empty");
-            Validate.notNull(value, "Data value must not be null");
+            Validate.notEmptyParam(key, "key");
+            Validate.notNullParam(value, "value");
             this.key = key;
             this.value = value;
         }
 
         public KeyVal key(String key) {
-            Validate.notEmpty(key, "Data key must not be empty");
+            Validate.notEmptyParam(key, "key");
             this.key = key;
             return this;
         }
@@ -1292,7 +1292,7 @@ public class HttpConnection implements Connection {
         }
 
         public KeyVal value(String value) {
-            Validate.notNull(value, "Data value must not be null");
+            Validate.notNullParam(value, "value");
             this.value = value;
             return this;
         }
@@ -1302,7 +1302,7 @@ public class HttpConnection implements Connection {
         }
 
         public KeyVal inputStream(InputStream inputStream) {
-            Validate.notNull(value, "Data input stream must not be null");
+            Validate.notNullParam(inputStream, "inputStream");
             this.stream = inputStream;
             return this;
         }


=====================================
src/main/java/org/jsoup/helper/Validate.java
=====================================
@@ -3,7 +3,7 @@ package org.jsoup.helper;
 import javax.annotation.Nullable;
 
 /**
- * Simple validation methods. Designed for jsoup internal use.
+ * Validators to check that method arguments meet expectations. 
  */
 public final class Validate {
     
@@ -12,22 +12,34 @@ public final class Validate {
     /**
      * Validates that the object is not null
      * @param obj object to test
-     * @throws IllegalArgumentException if the object is null
+     * @throws ValidationException if the object is null
      */
     public static void notNull(@Nullable Object obj) {
         if (obj == null)
-            throw new IllegalArgumentException("Object must not be null");
+            throw new ValidationException("Object must not be null");
+    }
+
+    /**
+     Validates that the parameter is not null
+
+     * @param obj the parameter to test
+     * @param param the name of the parameter, for presentation in the validation exception.
+     * @throws ValidationException if the object is null
+     */
+    public static void notNullParam(@Nullable final Object obj, final String param) {
+        if (obj == null)
+            throw new ValidationException(String.format("The parameter '%s' must not be null.", param));
     }
 
     /**
      * Validates that the object is not null
      * @param obj object to test
      * @param msg message to include in the Exception if validation fails
-     * @throws IllegalArgumentException if the object is null
+     * @throws ValidationException if the object is null
      */
     public static void notNull(@Nullable Object obj, String msg) {
         if (obj == null)
-            throw new IllegalArgumentException(msg);
+            throw new ValidationException(msg);
     }
 
     /**
@@ -35,60 +47,75 @@ public final class Validate {
      null object. (Works around lack of Objects.requireNonNull in Android version.)
      * @param obj nullable object to cast to not-null
      * @return the object, or throws an exception if it is null
-     * @throws IllegalArgumentException if the object is null
+     * @throws ValidationException if the object is null
      */
     public static Object ensureNotNull(@Nullable Object obj) {
         if (obj == null)
-            throw new IllegalArgumentException("Object must not be null");
+            throw new ValidationException("Object must not be null");
+        else return obj;
+    }
+
+    /**
+     Verifies the input object is not null, and returns that object. Effectively this casts a nullable object to a non-
+     null object. (Works around lack of Objects.requireNonNull in Android version.)
+     * @param obj nullable object to cast to not-null
+     * @param msg the String format message to include in the validation exception when thrown
+     * @param args the arguments to the msg
+     * @return the object, or throws an exception if it is null
+     * @throws ValidationException if the object is null
+     */
+    public static Object ensureNotNull(@Nullable Object obj, String msg, Object... args) {
+        if (obj == null)
+            throw new ValidationException(String.format(msg, args));
         else return obj;
     }
 
     /**
      * Validates that the value is true
      * @param val object to test
-     * @throws IllegalArgumentException if the object is not true
+     * @throws ValidationException if the object is not true
      */
     public static void isTrue(boolean val) {
         if (!val)
-            throw new IllegalArgumentException("Must be true");
+            throw new ValidationException("Must be true");
     }
 
     /**
      * Validates that the value is true
      * @param val object to test
      * @param msg message to include in the Exception if validation fails
-     * @throws IllegalArgumentException if the object is not true
+     * @throws ValidationException if the object is not true
      */
     public static void isTrue(boolean val, String msg) {
         if (!val)
-            throw new IllegalArgumentException(msg);
+            throw new ValidationException(msg);
     }
 
     /**
      * Validates that the value is false
      * @param val object to test
-     * @throws IllegalArgumentException if the object is not false
+     * @throws ValidationException if the object is not false
      */
     public static void isFalse(boolean val) {
         if (val)
-            throw new IllegalArgumentException("Must be false");
+            throw new ValidationException("Must be false");
     }
 
     /**
      * Validates that the value is false
      * @param val object to test
      * @param msg message to include in the Exception if validation fails
-     * @throws IllegalArgumentException if the object is not false
+     * @throws ValidationException if the object is not false
      */
     public static void isFalse(boolean val, String msg) {
         if (val)
-            throw new IllegalArgumentException(msg);
+            throw new ValidationException(msg);
     }
 
     /**
      * Validates that the array contains no null elements
      * @param objects the array to test
-     * @throws IllegalArgumentException if the array contains a null element
+     * @throws ValidationException if the array contains a null element
      */
     public static void noNullElements(Object[] objects) {
         noNullElements(objects, "Array must not contain any null objects");
@@ -98,33 +125,44 @@ public final class Validate {
      * Validates that the array contains no null elements
      * @param objects the array to test
      * @param msg message to include in the Exception if validation fails
-     * @throws IllegalArgumentException if the array contains a null element
+     * @throws ValidationException if the array contains a null element
      */
     public static void noNullElements(Object[] objects, String msg) {
         for (Object obj : objects)
             if (obj == null)
-                throw new IllegalArgumentException(msg);
+                throw new ValidationException(msg);
     }
 
     /**
      * Validates that the string is not null and is not empty
      * @param string the string to test
-     * @throws IllegalArgumentException if the string is null or empty
+     * @throws ValidationException if the string is null or empty
      */
     public static void notEmpty(@Nullable String string) {
         if (string == null || string.length() == 0)
-            throw new IllegalArgumentException("String must not be empty");
+            throw new ValidationException("String must not be empty");
+    }
+
+    /**
+     Validates that the string parameter is not null and is not empty
+     * @param string the string to test
+     * @param param the name of the parameter, for presentation in the validation exception.
+     * @throws ValidationException if the string is null or empty
+     */
+    public static void notEmptyParam(@Nullable final String string, final String param) {
+        if (string == null || string.length() == 0)
+            throw new ValidationException(String.format("The '%s' parameter must not be empty.", param));
     }
 
     /**
      * Validates that the string is not null and is not empty
      * @param string the string to test
      * @param msg message to include in the Exception if validation fails
-     * @throws IllegalArgumentException if the string is null or empty
+     * @throws ValidationException if the string is null or empty
      */
     public static void notEmpty(@Nullable String string, String msg) {
         if (string == null || string.length() == 0)
-            throw new IllegalArgumentException(msg);
+            throw new ValidationException(msg);
     }
 
     /**
@@ -142,6 +180,6 @@ public final class Validate {
      @throws IllegalStateException if we reach this state
      */
     public static void fail(String msg) {
-        throw new IllegalArgumentException(msg);
+        throw new ValidationException(msg);
     }
 }
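The parameter-naming validators added in this file can be illustrated with a standalone sketch. `ParamChecks` is a hypothetical stand-in, not jsoup's `Validate`: it throws a plain `IllegalArgumentException` and reuses only the two message formats shown in the diff. The `describe` helper is likewise an assumption, added to show what a caller would see.

```java
public class ParamChecks {
    /** Throws if obj is null, naming the offending parameter in the message. */
    public static void notNullParam(Object obj, String param) {
        if (obj == null)
            throw new IllegalArgumentException(
                String.format("The parameter '%s' must not be null.", param));
    }

    /** Throws if string is null or empty, naming the offending parameter. */
    public static void notEmptyParam(String string, String param) {
        if (string == null || string.length() == 0)
            throw new IllegalArgumentException(
                String.format("The '%s' parameter must not be empty.", param));
    }

    /** Hypothetical caller: returns "ok", or the validation message on failure. */
    public static String describe(String url) {
        try {
            notEmptyParam(url, "url");
            return "ok";
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(""));
        System.out.println(describe("https://jsoup.org/"));
    }
}
```

Because the message is built from the parameter name, call sites no longer need to carry a hand-written message for each argument.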


=====================================
src/main/java/org/jsoup/helper/ValidationException.java
=====================================
@@ -0,0 +1,34 @@
+package org.jsoup.helper;
+
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ Validation exceptions, as thrown by the methods in {@link Validate}.
+ */
+public class ValidationException extends IllegalArgumentException {
+
+    public static final String Validator = Validate.class.getName();
+
+    public ValidationException(String msg) {
+        super(msg);
+    }
+
+    @Override
+    public synchronized Throwable fillInStackTrace() {
+        // Filters out the Validate class from the stacktrace, to more clearly point at the root-cause.
+
+        super.fillInStackTrace();
+
+        StackTraceElement[] stackTrace = getStackTrace();
+        List<StackTraceElement> filteredTrace = new ArrayList<>();
+        for (StackTraceElement trace : stackTrace) {
+            if (trace.getClassName().equals(Validator)) continue;
+            filteredTrace.add(trace);
+        }
+
+        setStackTrace(filteredTrace.toArray(new StackTraceElement[0]));
+
+        return this;
+    }
+}
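The stack-trace filtering in `fillInStackTrace()` above can be tried in isolation. The following is a sketch under stated assumptions: `Checks` and `FilteredException` are hypothetical stand-ins for `Validate` and `ValidationException`, and the helper `traceMentionsValidator` exists only to make the effect observable.

```java
import java.util.ArrayList;
import java.util.List;

public class StackFilterSketch {
    /** Hypothetical validator whose frames should be hidden from stack traces. */
    static final class Checks {
        static void notNull(Object obj, String param) {
            if (obj == null)
                throw new FilteredException("The parameter '" + param + "' must not be null.");
        }
    }

    /** Drops every frame that belongs to Checks when the stack trace is captured. */
    static class FilteredException extends IllegalArgumentException {
        private static final String HIDDEN = Checks.class.getName();

        FilteredException(String msg) {
            super(msg);
        }

        @Override
        public synchronized Throwable fillInStackTrace() {
            super.fillInStackTrace(); // capture the full trace first
            List<StackTraceElement> kept = new ArrayList<>();
            for (StackTraceElement frame : getStackTrace()) {
                if (!frame.getClassName().equals(HIDDEN)) kept.add(frame);
            }
            setStackTrace(kept.toArray(new StackTraceElement[0]));
            return this;
        }
    }

    /** Returns true if any frame of the thrown exception still names Checks. */
    public static boolean traceMentionsValidator() {
        try {
            Checks.notNull(null, "url");
            throw new IllegalStateException("expected a FilteredException");
        } catch (IllegalArgumentException e) {
            for (StackTraceElement frame : e.getStackTrace()) {
                if (frame.getClassName().equals(Checks.class.getName())) return true;
            }
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("validator frame present: " + traceMentionsValidator());
    }
}
```

Since the exception still extends `IllegalArgumentException`, existing `catch (IllegalArgumentException e)` blocks keep working, as the helper demonstrates.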


=====================================
src/main/java/org/jsoup/helper/W3CDom.java
=====================================
@@ -269,8 +269,8 @@ public class W3CDom {
      @return the matches nodes
      */
     public NodeList selectXpath(String xpath, Node contextNode) {
-        Validate.notEmpty(xpath);
-        Validate.notNull(contextNode);
+        Validate.notEmptyParam(xpath, "xpath");
+        Validate.notNullParam(contextNode, "contextNode");
 
         NodeList nodeList;
         try {


=====================================
src/main/java/org/jsoup/internal/ConstrainableInputStream.java
=====================================
@@ -81,14 +81,16 @@ public final class ConstrainableInputStream extends BufferedInputStream {
         final ByteArrayOutputStream outStream = new ByteArrayOutputStream(bufferSize);
 
         int read;
+        int remaining = max;
         while (true) {
-            read = read(readBuffer, 0, bufferSize);
+            read = read(readBuffer, 0, localCapped ? Math.min(remaining, bufferSize) : bufferSize);
             if (read == -1) break;
             if (localCapped) { // this local byteBuffer cap may be smaller than the overall maxSize (like when reading first bytes)
-                if (read >= max) {
-                    outStream.write(readBuffer, 0, max);
+                if (read >= remaining) {
+                    outStream.write(readBuffer, 0, remaining);
                     break;
                 }
+                remaining -= read;
             }
             outStream.write(readBuffer, 0, read);
         }
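The short-read problem addressed by this hunk can be reproduced with a stream that hands back fewer bytes than requested. The sketch below is not jsoup's `ConstrainableInputStream`; `TrickleStream`, `readUpTo` and `readAscii` are hypothetical names, and only the capped case is modelled. The key point, as in the diff, is to track the bytes remaining across reads instead of comparing each individual read against the cap.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class CappedReadSketch {
    /** Returns at most `chunk` bytes per read call, similar to a chunked server response. */
    static final class TrickleStream extends InputStream {
        private final InputStream in;
        private final int chunk;

        TrickleStream(byte[] data, int chunk) {
            this.in = new ByteArrayInputStream(data);
            this.chunk = chunk;
        }

        @Override
        public int read() throws IOException {
            return in.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return in.read(b, off, Math.min(len, chunk));
        }
    }

    /** Reads up to max bytes, tolerating reads that return less than requested. */
    static byte[] readUpTo(InputStream in, int max, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int remaining = max;
        while (remaining > 0) {
            int read = in.read(buffer, 0, Math.min(remaining, bufferSize));
            if (read == -1) break;  // end of stream
            out.write(buffer, 0, read);
            remaining -= read;      // account for short reads
        }
        return out.toByteArray();
    }

    /** Convenience wrapper for experiments: ASCII in, ASCII out. */
    public static String readAscii(String data, int chunk, int max) {
        try {
            byte[] bytes = data.getBytes(StandardCharsets.US_ASCII);
            return new String(readUpTo(new TrickleStream(bytes, chunk), max, 8), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readAscii("0123456789", 3, 5));
    }
}
```

With a check like the removed `read >= max`, a stream that never returns `max` bytes in one call would never hit the cap, so more than `max` bytes would be written.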


=====================================
src/main/java/org/jsoup/internal/StringUtil.java
=====================================
@@ -290,6 +290,7 @@ public final class StringUtil {
      * @throws MalformedURLException if an error occurred generating the URL
      */
     public static URL resolve(URL base, String relUrl) throws MalformedURLException {
+        relUrl = stripControlChars(relUrl);
         // workaround: java resolves '//path/file + ?foo' to '//path/?foo', not '//path/file?foo' as desired
         if (relUrl.startsWith("?"))
             relUrl = base.getPath() + relUrl;
@@ -308,7 +309,9 @@ public final class StringUtil {
      * @param relUrl the relative URL to resolve. (If it's already absolute, it will be returned)
      * @return an absolute URL if one was able to be generated, or the empty string if not
      */
-    public static String resolve(final String baseUrl, final String relUrl) {
+    public static String resolve(String baseUrl, String relUrl) {
+        // workaround: java will allow control chars in a path URL and may treat as relative, but Chrome / Firefox will strip and may see as a scheme. Normalize to browser's view.
+        baseUrl = stripControlChars(baseUrl); relUrl = stripControlChars(relUrl);
         try {
             URL base;
             try {
@@ -327,6 +330,11 @@ public final class StringUtil {
     }
     private static final Pattern validUriScheme = Pattern.compile("^[a-zA-Z][a-zA-Z0-9+-.]*:");
 
+    private static final Pattern controlChars = Pattern.compile("[\\x00-\\x1f]*"); // matches ascii 0 - 31, to strip from url
+    private static String stripControlChars(final String input) {
+        return controlChars.matcher(input).replaceAll("");
+    }
+
     private static final ThreadLocal<Stack<StringBuilder>> threadLocalBuilders = new ThreadLocal<Stack<StringBuilder>>() {
         @Override
         protected Stack<StringBuilder> initialValue() {
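The control-character normalization introduced above can be sketched on its own. The class name is hypothetical, and the pattern uses `+` rather than the `*` quantifier in the diff, which is a choice made here and should not change the result. The example input shows why the stripping matters: a tab inside a scheme name disappears in the browser's view of the URL.

```java
import java.util.regex.Pattern;

public class ControlCharSketch {
    // Matches runs of ASCII 0 to 31, which browsers strip from URLs before resolving them.
    private static final Pattern CONTROL_CHARS = Pattern.compile("[\\x00-\\x1f]+");

    /** Removes ASCII control characters from the input. */
    public static String stripControlChars(String input) {
        return CONTROL_CHARS.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripControlChars("java\tscript:alert(1)"));
    }
}
```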


=====================================
src/main/java/org/jsoup/nodes/Element.java
=====================================
@@ -172,7 +172,7 @@ public class Element extends Node {
      * @see Elements#tagName(String)
      */
     public Element tagName(String tagName) {
-        Validate.notEmpty(tagName, "Tag name must not be empty.");
+        Validate.notEmptyParam(tagName, "tagName");
         tag = Tag.valueOf(tagName, NodeUtils.parser(this).settings()); // maintains the case option of the original parse
         return this;
     }
@@ -468,7 +468,13 @@ public class Element extends Node {
      @since 1.15.2
      */
     public Element expectFirst(String cssQuery) {
-        return (Element) Validate.ensureNotNull(Selector.selectFirst(cssQuery, this));
+        return (Element) Validate.ensureNotNull(
+            Selector.selectFirst(cssQuery, this),
+            parent() != null ?
+                "No elements matched the query '%s' on element '%s'.":
+                "No elements matched the query '%s' in the document."
+            , cssQuery, this.tagName()
+        );
     }
 
     /**
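
The expectFirst change above passes two arguments (`cssQuery` and the tag name) to a format string that, in the document case, contains only one `%s`. That is safe because `String.format` ignores surplus arguments. A minimal sketch, assuming a varargs helper shaped like the call in the hunk; `EnsureNotNullDemo`, its `ensureNotNull` and `describeMiss` are hypothetical stand-ins, not the jsoup implementation.

```java
class EnsureNotNullDemo {
    // Hypothetical stand-in: throws with a formatted message when obj is null.
    static Object ensureNotNull(Object obj, String msg, Object... args) {
        if (obj == null)
            throw new IllegalArgumentException(String.format(msg, args));
        return obj;
    }

    // Mirrors the ternary in the hunk: pick the message by whether the element has a parent.
    static String describeMiss(boolean hasParent, String cssQuery, String tagName) {
        try {
            ensureNotNull(null,
                hasParent
                    ? "No elements matched the query '%s' on element '%s'."
                    : "No elements matched the query '%s' in the document.",
                cssQuery, tagName);
            return "";
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describeMiss(true, "span.doesNotExist", "p"));
        // The second argument is unused by the one-placeholder format and is ignored.
        System.out.println(describeMiss(false, "span.doesNotExist", "#root"));
    }
}
```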


=====================================
src/main/java/org/jsoup/parser/TreeBuilder.java
=====================================
@@ -37,8 +37,8 @@ abstract class TreeBuilder {
 
     @ParametersAreNonnullByDefault
     protected void initialiseParse(Reader input, String baseUri, Parser parser) {
-        Validate.notNull(input, "String input must not be null");
-        Validate.notNull(baseUri, "BaseURI must not be null");
+        Validate.notNullParam(input, "input");
+        Validate.notNullParam(baseUri, "baseUri");
         Validate.notNull(parser);
 
         doc = new Document(baseUri);


=====================================
src/main/java/org/jsoup/safety/Cleaner.java
=====================================
@@ -160,6 +160,13 @@ public class Cleaner {
         Attributes enforcedAttrs = safelist.getEnforcedAttributes(sourceTag);
         destAttrs.addAll(enforcedAttrs);
 
+        // Copy the original start and end range, if set
+        // TODO - might be good to make a generic Element#userData set type interface, and copy those all over
+        if (sourceEl.sourceRange().isTracked())
+            sourceEl.sourceRange().track(dest, true);
+        if (sourceEl.endSourceRange().isTracked())
+            sourceEl.endSourceRange().track(dest, false);
+
         return new ElementMeta(dest, numDiscarded);
     }
 


=====================================
src/main/java/org/jsoup/select/QueryParser.java
=====================================
@@ -360,7 +360,7 @@ public class QueryParser {
     private void has() {
         tq.consume(":has");
         String subQuery = tq.chompBalanced('(', ')');
-        Validate.notEmpty(subQuery, ":has(selector) subselect must not be empty");
+        Validate.notEmpty(subQuery, ":has(selector) sub-select must not be empty");
         evals.add(new StructuralEvaluator.Has(parse(subQuery)));
     }
 


=====================================
src/test/java/org/jsoup/helper/DataUtilTest.java
=====================================
@@ -1,11 +1,13 @@
 package org.jsoup.helper;
 
 import org.jsoup.Jsoup;
+import org.jsoup.integration.ParseTest;
 import org.jsoup.nodes.Document;
 import org.jsoup.parser.Parser;
 import org.junit.jupiter.api.Test;
 
 import java.io.*;
+import java.nio.ByteBuffer;
 import java.nio.charset.Charset;
 import java.nio.charset.StandardCharsets;
 import java.nio.file.Files;
@@ -228,4 +230,49 @@ public class DataUtilTest {
         assertEquals("This is not gzipped", doc.title());
         assertEquals("And should still be readable.", doc.selectFirst("p").text());
     }
+
+    // an input stream to give a range of output sizes, that changes on each read
+    static class VaryingReadInputStream extends InputStream {
+        final InputStream in;
+        int stride = 0;
+
+        VaryingReadInputStream(InputStream in) {
+            this.in = in;
+        }
+
+        public int read() throws IOException {
+            return in.read();
+        }
+
+        public int read(byte[] b) throws IOException {
+            return in.read(b, 0, Math.min(b.length, ++stride));
+        }
+
+        public int read(byte[] b, int off, int len) throws IOException {
+            return in.read(b, off, Math.min(len, ++stride));
+        }
+    }
+
+    @Test
+    void handlesChunkedInputStream() throws IOException {
+        File inputFile = ParseTest.getFile("/htmltests/large.html");
+        String input = ParseTest.getFileAsString(inputFile);
+        VaryingReadInputStream stream = new VaryingReadInputStream(ParseTest.inputStreamFrom(input));
+
+        Document expected = Jsoup.parse(input, "https://example.com");
+        Document doc = Jsoup.parse(stream, null, "https://example.com");
+        assertTrue(doc.hasSameValue(expected));
+    }
+
+    @Test
+    void handlesUnlimitedRead() throws IOException {
+        File inputFile = ParseTest.getFile("/htmltests/large.html");
+        String input = ParseTest.getFileAsString(inputFile);
+        VaryingReadInputStream stream = new VaryingReadInputStream(ParseTest.inputStreamFrom(input));
+
+        ByteBuffer byteBuffer = DataUtil.readToByteBuffer(stream, 0);
+        String read = new String(byteBuffer.array());
+
+        assertEquals(input, read);
+    }
 }
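
The VaryingReadInputStream helper above exercises readers against a stream that returns fewer bytes than requested on each call. A minimal sketch of a read loop that tolerates such short reads, using only the JDK; `ShortReadDemo`, `readAll` and `roundTrip` are hypothetical names, and the sketch decodes with an explicit UTF-8 charset rather than the platform default.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ShortReadDemo {
    // Returns at most `stride` bytes per call, with stride growing on each read.
    static class VaryingReadInputStream extends InputStream {
        private final InputStream in;
        private int stride = 0;

        VaryingReadInputStream(InputStream in) { this.in = in; }

        @Override public int read() throws IOException { return in.read(); }

        @Override public int read(byte[] b, int off, int len) throws IOException {
            return in.read(b, off, Math.min(len, ++stride));
        }
    }

    // Keeps reading until EOF (-1); never assumes one call fills the buffer.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf, 0, buf.length)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    static String roundTrip(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new VaryingReadInputStream(new ByteArrayInputStream(bytes))) {
            return new String(readAll(in), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```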


=====================================
src/test/java/org/jsoup/helper/HttpConnectionTest.java
=====================================
@@ -9,7 +9,13 @@ import org.junit.jupiter.api.Test;
 import java.io.IOException;
 import java.net.MalformedURLException;
 import java.net.URL;
-import java.util.*;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Locale;
+import java.util.Map;
 
 import static org.junit.jupiter.api.Assertions.*;
 
@@ -293,4 +299,15 @@ public class HttpConnectionTest {
         }
         assertTrue(urlThrew);
     }
+
+    @Test void testMalformedException() {
+        boolean threw = false;
+        try {
+            Jsoup.connect("jsoup.org/test");
+        } catch (IllegalArgumentException e) {
+            threw = true;
+            assertEquals("The supplied URL, 'jsoup.org/test', is malformed. Make sure it is an absolute URL, and starts with 'http://' or 'https://'. See https://jsoup.org/cookbook/extracting-data/working-with-urls", e.getMessage());
+        }
+        assertTrue(threw);
+    }
 }
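
The testMalformedException case above expects a URL without a scheme to be rejected with an IllegalArgumentException carrying an explanatory message. A minimal sketch of that kind of check with `java.net.URL`; `UrlCheckDemo`, `requireAbsoluteUrl` and `messageFor` are hypothetical names, and this is not how jsoup's HttpConnection is implemented. Note that this sketch only rejects strings `java.net.URL` cannot parse; it does not itself restrict the scheme to http or https.

```java
import java.net.MalformedURLException;
import java.net.URL;

class UrlCheckDemo {
    // Hypothetical helper: converts the checked MalformedURLException into an
    // IllegalArgumentException with the message text used in the test above.
    static URL requireAbsoluteUrl(String url) {
        try {
            return new URL(url);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(String.format(
                "The supplied URL, '%s', is malformed. Make sure it is an absolute URL, and starts with "
                    + "'http://' or 'https://'. See https://jsoup.org/cookbook/extracting-data/working-with-urls",
                url), e);
        }
    }

    // Returns the rejection message, or null if the URL was accepted.
    static String messageFor(String url) {
        try {
            requireAbsoluteUrl(url);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }
}
```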


=====================================
src/test/java/org/jsoup/helper/ValidateTest.java
=====================================
@@ -3,8 +3,9 @@ package org.jsoup.helper;
 import org.junit.jupiter.api.Assertions;
 import org.junit.jupiter.api.Test;
 
-public class ValidateTest {
+import static org.junit.jupiter.api.Assertions.*;
 
+public class ValidateTest {
     @Test
     public void testNotNull() {
         Validate.notNull("foo");
@@ -16,4 +17,30 @@ public class ValidateTest {
         }
         Assertions.assertTrue(threw);
     }
+
+    @Test void stacktraceFiltersOutValidateClass() {
+        boolean threw = false;
+        try {
+            Validate.notNull(null);
+        } catch (ValidationException e) {
+            threw = true;
+            assertEquals("Object must not be null", e.getMessage());
+            StackTraceElement[] stackTrace = e.getStackTrace();
+            for (StackTraceElement trace : stackTrace) {
+                assertNotEquals(trace.getClassName(), Validate.class.getName());
+            }
+            assertTrue(stackTrace.length >= 1);
+        }
+        Assertions.assertTrue(threw);
+    }
+
+    @Test void nonnullParam() {
+        boolean threw = true;
+        try {
+            Validate.notNullParam(null, "foo");
+        } catch (ValidationException e) {
+            assertEquals("The parameter 'foo' must not be null.", e.getMessage());
+        }
+        assertTrue(threw);
+    }
 }
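
The stacktraceFiltersOutValidateClass test above expects that no stack frame of the thrown exception belongs to the validating class, so the trace starts at the caller. One way to get that behaviour is to override `fillInStackTrace` and drop the unwanted frames. The sketch below is an illustration under that assumption; `StackFilterDemo`, `FilteredException` and `Checks` are hypothetical names, not the classes in this commit.

```java
import java.util.ArrayList;
import java.util.List;

class StackFilterDemo {
    // Exception that removes frames belonging to Checks from its own stack trace.
    static class FilteredException extends IllegalArgumentException {
        FilteredException(String msg) { super(msg); }

        @Override
        public synchronized Throwable fillInStackTrace() {
            super.fillInStackTrace();
            List<StackTraceElement> kept = new ArrayList<>();
            for (StackTraceElement frame : getStackTrace()) {
                if (!frame.getClassName().equals(Checks.class.getName()))
                    kept.add(frame);
            }
            setStackTrace(kept.toArray(new StackTraceElement[0]));
            return this;
        }
    }

    // Hypothetical validator whose frames should not appear in the trace.
    static class Checks {
        static void notNull(Object obj) {
            if (obj == null)
                throw new FilteredException("Object must not be null");
        }
    }

    // Triggers the failure and hands back the exception for inspection.
    static FilteredException capture() {
        try {
            Checks.notNull(null);
            throw new IllegalStateException("expected notNull to throw");
        } catch (FilteredException e) {
            return e;
        }
    }
}
```

Separately, the nonnullParam test above initialises `threw` to `true` and never sets it in the catch block, so its final `assertTrue(threw)` holds even when nothing is thrown; the message assertion inside the catch is the only effective check.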


=====================================
src/test/java/org/jsoup/integration/ConnectTest.java
=====================================
@@ -12,7 +12,6 @@ import org.jsoup.nodes.FormElement;
 import org.jsoup.parser.HtmlTreeBuilder;
 import org.jsoup.parser.Parser;
 import org.jsoup.parser.XmlTreeBuilder;
-import org.junit.jupiter.api.AfterAll;
 import org.junit.jupiter.api.BeforeAll;
 import org.junit.jupiter.api.Test;
 
@@ -41,11 +40,6 @@ public class ConnectTest {
         echoUrl = EchoServlet.Url;
     }
 
-    @AfterAll
-    public static void tearDown() {
-        TestServer.stop();
-    }
-
     @Test
     public void canConnectToLocalServer() throws IOException {
         String url = HelloServlet.Url;


=====================================
src/test/java/org/jsoup/integration/SessionIT.java
=====================================
@@ -3,11 +3,9 @@ package org.jsoup.integration;
 import org.jsoup.Connection;
 import org.jsoup.Jsoup;
 import org.jsoup.UncheckedIOException;
-import org.jsoup.integration.servlets.EchoServlet;
 import org.jsoup.integration.servlets.FileServlet;
 import org.jsoup.integration.servlets.SlowRider;
 import org.jsoup.nodes.Document;
-import org.junit.jupiter.api.AfterAll;
 import org.junit.jupiter.api.BeforeAll;
 import org.junit.jupiter.api.Test;
 
@@ -23,11 +21,6 @@ public class SessionIT {
         TestServer.start();
     }
 
-    @AfterAll
-    public static void tearDown() {
-        TestServer.stop();
-    }
-
     @Test
     public void multiThread() throws InterruptedException {
         int numThreads = 20;


=====================================
src/test/java/org/jsoup/integration/SessionTest.java
=====================================
@@ -8,7 +8,6 @@ import org.jsoup.integration.servlets.FileServlet;
 import org.jsoup.nodes.Document;
 import org.jsoup.parser.Parser;
 import org.jsoup.select.Elements;
-import org.junit.jupiter.api.AfterAll;
 import org.junit.jupiter.api.BeforeAll;
 import org.junit.jupiter.api.Test;
 
@@ -24,11 +23,6 @@ public class SessionTest {
         TestServer.start();
     }
 
-    @AfterAll
-    public static void tearDown() {
-        TestServer.stop();
-    }
-
     private static Elements keyEls(String key, Document doc) {
         return doc.select("th:contains(" + key + ") + td");
     }


=====================================
src/test/java/org/jsoup/integration/TestServer.java
=====================================
@@ -5,12 +5,11 @@ import org.eclipse.jetty.server.ServerConnector;
 import org.eclipse.jetty.servlet.ServletHandler;
 import org.jsoup.integration.servlets.BaseServlet;
 
-import java.util.concurrent.atomic.AtomicInteger;
+import java.net.InetSocketAddress;
 
 public class TestServer {
-    private static final Server jetty = new Server(0);
+    private static final Server jetty = new Server(new InetSocketAddress("localhost", 0));
     private static final ServletHandler handler = new ServletHandler();
-    private static AtomicInteger latch = new AtomicInteger(0);
 
     static {
         jetty.setHandler(handler);
@@ -21,26 +20,10 @@ public class TestServer {
 
     public static void start() {
         synchronized (jetty) {
-            int count = latch.getAndIncrement();
-            if (count == 0) {
-                try {
-                    jetty.start();
-                } catch (Exception e) {
-                    throw new IllegalStateException(e);
-                }
-            }
-        }
-    }
-
-    public static void stop() {
-        synchronized (jetty) {
-            int count = latch.getAndDecrement();
-            if (count == 0) {
-                try {
-                    jetty.stop();
-                } catch (Exception e) {
-                    throw new IllegalStateException(e);
-                }
+            try {
+                jetty.start(); // jetty will safely no-op a start on an already running instance
+            } catch (Exception e) {
+                throw new IllegalStateException(e);
             }
         }
     }
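
The TestServer change above binds to localhost on an ephemeral port (port 0) and makes `start()` safe to call from every test class, relying on Jetty to ignore a repeated start. A plain `ServerSocket` has no such no-op behaviour, so the sketch below guards it explicitly. It is an illustration only: `LoopbackServerDemo` is a hypothetical name, and it uses `InetAddress.getLoopbackAddress()` instead of the "localhost" host name to avoid a name lookup.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

class LoopbackServerDemo {
    private static final Object lock = new Object();
    private static ServerSocket socket; // guarded by lock

    // Idempotent: the first call binds, later calls return the same port.
    static int start() {
        synchronized (lock) {
            if (socket == null) {
                try {
                    ServerSocket s = new ServerSocket();
                    // Port 0 asks the OS for any free port; loopback keeps it off external interfaces.
                    s.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
                    socket = s;
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
            return socket.getLocalPort();
        }
    }

    static boolean isLoopbackBound() {
        synchronized (lock) {
            return socket != null && socket.getInetAddress().isLoopbackAddress();
        }
    }
}
```

Because no test class stops the server, there is no reference counting to get wrong; the listener lives until the JVM exits.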


=====================================
src/test/java/org/jsoup/internal/StringUtilTest.java
=====================================
@@ -147,6 +147,15 @@ public class StringUtilTest {
         assertEquals("http://example.com/b/c/g#s/../x", resolve("http://example.com/b/c/d;p?q", "g#s/../x"));
     }
 
+    @Test void stripsControlCharsFromUrls() {
+        // should resolve to an absolute url:
+        assertEquals("foo:bar", resolve("\nhttps://\texample.com/", "\r\nfo\to:ba\br"));
+    }
+
+    @Test void allowsSpaceInUrl() {
+        assertEquals("https://example.com/foo bar/", resolve("HTTPS://example.com/example/", "../foo bar/"));
+    }
+
     @Test
     void isAscii() {
         assertTrue(StringUtil.isAscii(""));


=====================================
src/test/java/org/jsoup/nodes/ElementTest.java
=====================================
@@ -2,6 +2,7 @@ package org.jsoup.nodes;
 
 import org.jsoup.Jsoup;
 import org.jsoup.TextUtil;
+import org.jsoup.helper.ValidationException;
 import org.jsoup.internal.StringUtil;
 import org.jsoup.parser.ParseSettings;
 import org.jsoup.parser.Parser;
@@ -2272,6 +2273,32 @@ public class ElementTest {
         assertTrue(threw);
     }
 
+    @Test void testExpectFirstMessage() {
+        Document doc = Jsoup.parse("<p>One</p><p>Two <span>Three</span> <span>Four</span>");
+        boolean threw = false;
+        Element p = doc.expectFirst("P");
+        try {
+            Element span = p.expectFirst("span.doesNotExist");
+        } catch (ValidationException e) {
+            threw = true;
+            assertEquals("No elements matched the query 'span.doesNotExist' on element 'p'.", e.getMessage());
+        }
+        assertTrue(threw);
+    }
+
+    @Test void testExpectFirstMessageDoc() {
+        Document doc = Jsoup.parse("<p>One</p><p>Two <span>Three</span> <span>Four</span>");
+        boolean threw = false;
+        Element p = doc.expectFirst("P");
+        try {
+            Element span = doc.expectFirst("span.doesNotExist");
+        } catch (ValidationException e) {
+            threw = true;
+            assertEquals("No elements matched the query 'span.doesNotExist' in the document.", e.getMessage());
+        }
+        assertTrue(threw);
+    }
+
     @Test void spanRunsMaintainSpace() {
         // https://github.com/jhy/jsoup/issues/1787
         Document doc = Jsoup.parse("<p><span>One</span>\n<span>Two</span>\n<span>Three</span></p>");


=====================================
src/test/java/org/jsoup/nodes/FormElementTest.java
=====================================
@@ -7,7 +7,6 @@ import org.jsoup.integration.servlets.CookieServlet;
 import org.jsoup.integration.servlets.EchoServlet;
 import org.jsoup.integration.servlets.FileServlet;
 import org.jsoup.select.Elements;
-import org.junit.jupiter.api.AfterAll;
 import org.junit.jupiter.api.BeforeAll;
 import org.junit.jupiter.api.Test;
 
@@ -27,11 +26,6 @@ public class FormElementTest {
         TestServer.start();
     }
 
-    @AfterAll
-    public static void tearDown() {
-        TestServer.stop();
-    }
-
     @Test public void hasAssociatedControls() {
         //"button", "fieldset", "input", "keygen", "object", "output", "select", "textarea"
         String html = "<form id=1><button id=1><fieldset id=2 /><input id=3><keygen id=4><object id=5><output id=6>" +


=====================================
src/test/java/org/jsoup/nodes/PositionTest.java
=====================================
@@ -1,12 +1,8 @@
 package org.jsoup.nodes;
 
 import org.jsoup.Jsoup;
-import org.jsoup.integration.TestServer;
-import org.jsoup.integration.servlets.EchoServlet;
 import org.jsoup.integration.servlets.FileServlet;
 import org.jsoup.parser.Parser;
-import org.junit.jupiter.api.AfterAll;
-import org.junit.jupiter.api.BeforeAll;
 import org.junit.jupiter.api.Test;
 
 import java.io.IOException;
@@ -145,16 +141,6 @@ class PositionTest {
         assertEquals("6,1:80-6,17:96", comment.sourceRange().toString());
     }
 
-    @BeforeAll
-    static void setUp() {
-        TestServer.start();
-    }
-
-    @AfterAll
-    static void tearDown() {
-        TestServer.stop();
-    }
-
     @Test void tracksFromFetch() throws IOException {
         String url = FileServlet.urlTo("/htmltests/large.html"); // 280 K
         Document doc = Jsoup.connect(url).parser(TrackingParser).get();


=====================================
src/test/java/org/jsoup/safety/CleanerTest.java
=====================================
@@ -4,7 +4,10 @@ import org.jsoup.Jsoup;
 import org.jsoup.MultiLocaleExtension.MultiLocaleTest;
 import org.jsoup.TextUtil;
 import org.jsoup.nodes.Document;
+import org.jsoup.nodes.Element;
 import org.jsoup.nodes.Entities;
+import org.jsoup.nodes.Range;
+import org.jsoup.parser.Parser;
 import org.junit.jupiter.api.Test;
 
 import java.util.Locale;
@@ -210,6 +213,24 @@ public class CleanerTest {
         assertEquals("<a rel=\"nofollow\">Link</a>", clean);
     }
 
+    @Test void dropsConcealedJavascriptProtocolWhenRelativesLinksEnabled() {
+        Safelist safelist = Safelist.basic().preserveRelativeLinks(true);
+        String html = "<a href=\"&#x0A;ja&Tab;va&Tab;script&#x0A;:alert(1)\">Link</a>";
+        String clean = Jsoup.clean(html, "https://", safelist);
+        assertEquals("<a rel=\"nofollow\">Link</a>", clean);
+
+        String colon = "<a href=\"ja&Tab;va&Tab;script&colon;alert(1)\">Link</a>";
+        String cleanColon = Jsoup.clean(colon, "https://", safelist);
+        assertEquals("<a rel=\"nofollow\">Link</a>", cleanColon);
+    }
+
+    @Test void dropsConcealedJavascriptProtocolWhenRelativesLinksDisabled() {
+        Safelist safelist = Safelist.basic().preserveRelativeLinks(false);
+        String html = "<a href=\"ja&Tab;vas&#x0A;cript:alert(1)\">Link</a>";
+        String clean = Jsoup.clean(html, "https://", safelist);
+        assertEquals("<a rel=\"nofollow\">Link</a>", clean);
+    }
+
     @Test public void handlesCustomProtocols() {
         String html = "<img src='cid:12345' /> <img src='data:gzzt' />";
         String dropped = Jsoup.clean(html, Safelist.basicWithImages());
@@ -339,4 +360,17 @@ public class CleanerTest {
         assertEquals(Document.OutputSettings.Syntax.xml, result.outputSettings().syntax());
         assertEquals("<p>test<br /></p>", result.body().html());
     }
+
+    @Test void preservesSourcePositionViaUserData() {
+        Document orig = Jsoup.parse("<script>xss</script>\n <p>Hello</p>", Parser.htmlParser().setTrackPosition(true));
+        Element p = orig.expectFirst("p");
+        Range origRange = p.sourceRange();
+        assertEquals("2,2:22-2,5:25", origRange.toString());
+
+        Document clean = new Cleaner(Safelist.relaxed()).clean(orig);
+        Element cleanP = clean.expectFirst("p");
+        Range cleanRange = cleanP.sourceRange();
+        assertEquals(cleanRange, origRange);
+        assertEquals(clean.endSourceRange(), orig.endSourceRange());
+    }
 }



View it on GitLab: https://salsa.debian.org/java-team/jsoup/-/compare/752f1360f06f660838f2ee45a72c91155dceb0d0...6da4710c22da2b40e276dff64311da4d8f8bc3ec
