[med-svn] [python-avro] 07/14: Imported Upstream version 1.8.0~rc0+dfsg
Afif Elghraoui
afif-guest at moszumanska.debian.org
Sun Oct 25 00:46:24 UTC 2015
This is an automated email from the git hooks/post-receive script.
afif-guest pushed a commit to branch master
in repository python-avro.
commit 9d6eb1fc1ed7c9efe1790ca15732688cb4ef5af4
Author: Afif Elghraoui <afif at ghraoui.name>
Date: Sat Oct 24 15:51:54 2015 -0700
Imported Upstream version 1.8.0~rc0+dfsg
---
.gitignore | 4 +
BUILD.txt | 18 +-
CHANGES.txt | 181 ++
README.txt | 2 +-
build.sh | 40 +-
doc/src/content/xdocs/gettingstartedpython.xml | 20 +-
doc/src/content/xdocs/spec.xml | 63 +-
lang/py/build.xml | 52 +-
lang/py/ivy.xml | 24 +
lang/py/ivysettings.xml | 30 +
lang/py/lib/pyAntTasks-1.3-LICENSE.txt | 202 --
lang/py/lib/pyAntTasks-1.3.jar | Bin 18788 -> 0 bytes
lang/py/lib/simplejson/LICENSE.txt | 19 -
lang/py/lib/simplejson/__init__.py | 318 ---
lang/py/lib/simplejson/_speedups.c | 2329 --------------------
lang/py/lib/simplejson/decoder.py | 354 ---
lang/py/lib/simplejson/encoder.py | 440 ----
lang/py/lib/simplejson/scanner.py | 65 -
lang/py/lib/simplejson/tool.py | 37 -
lang/py/src/avro/schema.py | 6 +-
lang/py/src/avro/tether/__init__.py | 7 +
lang/py/src/avro/tether/tether_task.py | 498 +++++
lang/py/src/avro/tether/tether_task_runner.py | 227 ++
lang/py/src/avro/tether/util.py | 34 +
lang/py/test/mock_tether_parent.py | 95 +
lang/py/test/set_avro_test_path.py | 40 +
lang/py/test/test_datafile.py | 3 +
lang/py/test/test_datafile_interop.py | 3 +
lang/py/test/test_io.py | 3 +
lang/py/test/test_ipc.py | 2 +
lang/py/test/test_schema.py | 6 +
lang/py/test/test_tether_task.py | 116 +
lang/py/test/test_tether_task_runner.py | 191 ++
lang/py/test/test_tether_word_count.py | 213 ++
lang/py/test/word_count_task.py | 96 +
lang/py3/avro/schema.py | 8 +-
lang/py3/avro/tests/run_tests.py | 1 +
.../avro/tests/test_enum.py} | 40 +-
lang/py3/avro/tests/test_schema.py | 5 +
lang/py3/setup.py | 5 +-
pom.xml | 8 +-
share/VERSION.txt | 2 +-
share/docker/Dockerfile | 58 +
share/rat-excludes.txt | 1 +
.../org/apache/avro/ipc/trace/avroTrace.avdl | 68 -
.../org/apache/avro/ipc/trace/avroTrace.avpr | 82 -
share/test/schemas/http.avdl | 66 +
share/test/schemas/reserved.avsc | 2 +
share/test/schemas/specialtypes.avdl | 98 +
49 files changed, 2221 insertions(+), 3961 deletions(-)
diff --git a/.gitignore b/.gitignore
index 8c6b133..372789a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,3 +1,7 @@
+*.iml
+*.ipr
+*.iws
+.idea/
.project
.settings
.classpath
diff --git a/BUILD.txt b/BUILD.txt
index a59c80c..7c3eea7 100644
--- a/BUILD.txt
+++ b/BUILD.txt
@@ -21,9 +21,25 @@ The following packages must be installed before Avro can be built:
- Apache Forrest 0.8 (for documentation)
- md5sum, sha1sum, used by top-level dist target
+To simplify this, you can run a Docker container with all of the above
+dependencies preinstalled: install docker.io, then type:
+
+ ./build.sh docker
+
+When this completes you will be in a shell running inside the
+container. Building the image the first time may take a while (20
+minutes or more) since dependencies must be downloaded and
+installed. However, subsequent invocations are much faster, as the
+cached image is reused.
+
+The working directory in the container is mounted from your host. This
+allows you to access the files in your Avro development tree from the
+Docker container.
+
BUILDING
-Once the requirements are installed, build.sh can be used as follows:
+Once the requirements are installed (or you are working inside the
+Docker container), build.sh can be used as follows:
'./build.sh test' runs tests for all languages
'./build.sh dist' creates all release distribution files in dist/
diff --git a/CHANGES.txt b/CHANGES.txt
index 188ec44..afedefb 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,5 +1,186 @@
Avro Change Log
+Avro 1.8.0 (10 August 2014)
+
+ INCOMPATIBLE CHANGES
+
+ AVRO-1334. Java: Update versions of many dependencies. (scottcarey, cutting)
+
+ AVRO-997. Java: For enum values, no longer sometimes permit any
+ Object whose toString() names an enum symbol, but rather always
+ require use of distinct enum types. (Sean Busbey via cutting)
+
+ AVRO-1602. Java: Remove Dapper-style RPC trace facility. This
+ seems unused and has been a source of build problems. (cutting)
+
+ AVRO-1586. Build against Hadoop 2. With this change the avro-mapred and
+ trevni-avro JARs without a hadoop1 or hadoop2 Maven classifier are Hadoop 2
+ artifacts. To use with Hadoop 1, set the classifier to hadoop1.
+ (tomwhite)
+
+ AVRO-1502. Java: Generated classes now implement Serializable.
+ Generated classes need to be regenerated to use this release. (cutting)
+
+ NEW FEATURES
+
+ AVRO-1555. C#: Add support for RPC over HTTP. (Dmitry Kovalev via cutting)
+
+ AVRO-739. Add date, time, timestamp, and duration binary types to
+ specification. (Dmitry Kovalev and Ryan Blue via tomwhite)
+
+ AVRO-1590. Java: In resolving records in unions, permit structural
+ and shortname matches when fullname matching fails.
+ (Ryan Blue via cutting)
+
+ AVRO-570. Python: Add connector for tethered mapreduce.
+ (Jeremy Lewi and Steven Willis via cutting)
+
+ AVRO-834. Java: Data File corruption recovery tool.
+ (scottcarey and tomwhite)
+
+ AVRO-1614. Java: In generated builder classes, add accessors to
+ field sub-builders, permitting easier creation of nested, optional
+ structures. (Niels Basjes via cutting)
+
+ AVRO-1537. Make it easier to set up a multi-language build environment.
+ Support for running a Docker container with all build dependencies.
+ (tomwhite)
+
+ AVRO-680. Java: Support non-string map keys. (Sachin Goyal via Ryan Blue).
+
+ AVRO-1497. Java: Add support for logical types. (blue)
+
+ AVRO-1685. Java: Allow specifying sync in DataFileWriter.create
+ (Sehrope Sarkuni via tomwhite)
+
+ AVRO-1683. Add microsecond time and timestamp logical types to the
+ specification. (blue)
+
+ AVRO-1672. Java: Add date/time logical types and conversions. (blue)
+
+ OPTIMIZATIONS
+
+ IMPROVEMENTS
+
+ AVRO-843. C#: Change Visual Studio project files to specify .NET 3.5.
+ (Dmitry Kovalev via cutting)
+
+ AVRO-1583. Java: Add stdin support to the tojson tool.
+ (Clément Mahtieu via cutting)
+
+ AVRO-1551. Java: Add an output encoding option to the compiler
+ command line tool. (Keegan Witt via cutting)
+
+ AVRO-1585. Java: Deprecate Jackson classes in public API. (tomwhite)
+
+ AVRO-1619. Java: Improve javadoc comments in generated code.
+ (Niels Basjes via cutting)
+
+ AVRO-1616. Add IntelliJ files to .gitignore. (Niels Basjes via cutting)
+
+ AVRO-1539. Java: Add FileSystem based FsInput constructor.
+ (Allan Shoup via cutting)
+
+ AVRO-1628. Java: Add Schema#createUnion(Schema ...) convenience method.
+ (Clément Mahtieu via cutting)
+
+ AVRO-1655. Java: Add Schema.createRecord with field list.
+ (Lars Francke via blue)
+
+ AVRO-1681. Improve generated JavaDocs.
+ (Charles Gariépy-Ikeson via tomwhite)
+
+ AVRO-1645. Ruby: Improved handling of missing named types.
+ (Daniel Schierbeck via tomwhite)
+
+ AVRO-1693. Ruby: Allow writing arbitrary metadata to data files.
+ (Daniel Schierbeck via tomwhite)
+
+ AVRO-1692. Allow more than one logical type for a Java class. (blue via
+ tomwhite)
+
+ AVRO-1697. Ruby: Add support for the Snappy codec to the Ruby library.
+ (Daniel Schierbeck via tomwhite)
+
+ BUG FIXES
+
+ AVRO-1553. Java: MapReduce never uses MapOutputValueSchema (tomwhite)
+
+ AVRO-1544. Java: Fix GenericData#validate for unions with null.
+ (Matthew Hayes via cutting)
+
+ AVRO-1589. Java: Fix ReflectData.AllowNulls to not create unions
+ for primitive types. (Ryan Blue via cutting)
+
+ AVRO-1591. Java: Fix specific RPC so that proxies implement hashCode(),
+ equals() and toString(). (Mark Spadoni via cutting)
+
+ AVRO-1489. Java: Avro fails to build with OpenJDK 8. (Ricardo Arguello via
+ tomwhite)
+
+ AVRO-1302. Python: Update documentation to open files as binary to
+ prevent EOL substitution. (Lars Francke via cutting)
+
+ AVRO-1598. Java: Fix flakiness in TestFileSpanStorage.
+ (Ryan Blue via cutting)
+
+ AVRO-1592. Java: Fix handling of Java reserved words as enum
+ constants in generated code. (Lukas Steiblys via cutting)
+
+ AVRO-1597. Java: Random data tool writes corrupt files to standard out.
+ (cutting)
+
+ AVRO-1596. Java: Cannot read past corrupted block in Avro data file.
+ (tomwhite)
+
+ AVRO-1564. Java: Fix handling of optional byte field in Thrift.
+ (Michael Pershyn via cutting)
+
+ AVRO-1407. Java: Fix infinite loop on slow connect in NettyTransceiver.
+ (Gareth Davis via cutting)
+
+ AVRO-1604. Java: Fix ReflectData.AllowNull to work with @Nullable
+ annotations. (Ryan Blue via cutting)
+
+ AVRO-1545. Python. Fix to retain schema properties on primitive types.
+ (Dustin Spicuzza via cutting)
+
+ AVRO-1623. Java: Fix GenericData#validate to correctly resolve unions.
+ (Jeffrey Mullins via cutting)
+
+ AVRO-1621. PHP: FloatIntEncodingTest fails for NAN. (tomwhite)
+
+ AVRO-1573. Javascript. Upgrade to Grunt 0.4 for testing. (tomwhite)
+
+ AVRO-1624. Java. Surefire forkMode is deprecated. (Niels Basjes via
+ tomwhite)
+
+ AVRO-1630. Java: Creating Builder from instance loses data. (Niels Basjes
+ via tomwhite)
+
+ AVRO-1653. Fix typo in spec (lenghted => length). (Sehrope Sarkuni via blue)
+
+ AVRO-1656. Fix 'How to Contribute' link. (Benjamin Clauss via blue)
+
+ AVRO-1652. Java: Do not warn or validate defaults if validation is off.
+ (Michael D'Angelo via blue)
+
+ AVRO-1655. Java: Fix NPE in RecordSchema#toString when fields are null.
+ (Lars Francke via blue)
+
+ AVRO-1689. Update Dockerfile to use official Java repository. (tomwhite)
+
+ AVRO-1576. TestSchemaCompatibility is platform dependant.
+ (Stevo Slavic via tomwhite)
+
+ AVRO-1688. Ruby test_union(TestIO) is failing. (tomwhite)
+
+ AVRO-1673. Python 3 EnumSchema changes the order of symbols.
+ (Marcin Białoń via tomwhite)
+
+ AVRO-1491. Avro.ipc.dll not included in release zip/build file.
+ (Dmitry Kovalev via tomwhite)
+
Avro 1.7.7 (23 July 2014)
NEW FEATURES
diff --git a/README.txt b/README.txt
index a8f66f7..566f192 100644
--- a/README.txt
+++ b/README.txt
@@ -6,4 +6,4 @@ Learn more about Avro, please visit our website at:
To contribute to Avro, please read:
- https://cwiki.apache.org/AVRO/how-to-contribute.html
+ https://cwiki.apache.org/confluence/display/AVRO/How+To+Contribute
diff --git a/build.sh b/build.sh
index 06961c0..cce0cfb 100755
--- a/build.sh
+++ b/build.sh
@@ -22,7 +22,7 @@ cd `dirname "$0"` # connect to root
VERSION=`cat share/VERSION.txt`
function usage {
- echo "Usage: $0 {test|dist|sign|clean}"
+ echo "Usage: $0 {test|dist|sign|clean|docker}"
exit 1
}
@@ -96,8 +96,10 @@ case "$target" in
# build lang-specific artifacts
- (cd lang/java; mvn package -DskipTests -Dhadoop.version=2; rm -rf mapred/target/classes/;
- mvn -P dist package -DskipTests -Davro.version=$VERSION javadoc:aggregate)
+ (cd lang/java; mvn package -DskipTests -Dhadoop.version=1;
+ rm -rf mapred/target/{classes,test-classes}/;
+ rm -rf trevni/avro/target/{classes,test-classes}/;
+ mvn -P dist package -DskipTests -Davro.version=$VERSION javadoc:aggregate)
(cd lang/java/trevni/doc; mvn site)
(mvn -N -P copy-artifacts antrun:run)
@@ -169,9 +171,39 @@ case "$target" in
(cd lang/php; ./build.sh clean)
- (cd lang/perl; [ -f Makefile ] && make clean)
+ (cd lang/perl; [ ! -f Makefile ] || make clean)
;;
+ docker)
+ docker build -t avro-build share/docker
+ if [ "$(uname -s)" == "Linux" ]; then
+ USER_NAME=${SUDO_USER:=$USER}
+ USER_ID=$(id -u $USER_NAME)
+ GROUP_ID=$(id -g $USER_NAME)
+ else # boot2docker uid and gid
+ USER_NAME=$USER
+ USER_ID=1000
+ GROUP_ID=50
+ fi
+ docker build -t avro-build-${USER_NAME} - <<UserSpecificDocker
+FROM avro-build
+RUN groupadd -g ${GROUP_ID} ${USER_NAME} || true
+RUN useradd -g ${GROUP_ID} -u ${USER_ID} -k /root -m ${USER_NAME}
+ENV HOME /home/${USER_NAME}
+UserSpecificDocker
+ # By mapping the .m2 directory you can do an mvn install from
+ # within the container and use the result on your normal
+ # system. This also significantly speeds up subsequent
+ # builds because the dependencies are downloaded only once.
+ docker run --rm=true -t -i \
+ -v ${PWD}:/home/${USER_NAME}/avro \
+ -w /home/${USER_NAME}/avro \
+ -v ${HOME}/.m2:/home/${USER_NAME}/.m2 \
+ -v ${HOME}/.gnupg:/home/${USER_NAME}/.gnupg \
+ -u ${USER_NAME} \
+ avro-build-${USER_NAME}
+ ;;
+
*)
usage
;;
diff --git a/doc/src/content/xdocs/gettingstartedpython.xml b/doc/src/content/xdocs/gettingstartedpython.xml
index d8d9df8..156646a 100644
--- a/doc/src/content/xdocs/gettingstartedpython.xml
+++ b/doc/src/content/xdocs/gettingstartedpython.xml
@@ -136,14 +136,14 @@ import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter
-schema = avro.schema.parse(open("user.avsc").read())
+schema = avro.schema.parse(open("user.avsc", "rb").read())
-writer = DataFileWriter(open("users.avro", "w"), DatumWriter(), schema)
+writer = DataFileWriter(open("users.avro", "wb"), DatumWriter(), schema)
writer.append({"name": "Alyssa", "favorite_number": 256})
writer.append({"name": "Ben", "favorite_number": 7, "favorite_color": "red"})
writer.close()
-reader = DataFileReader(open("users.avro", "r"), DatumReader())
+reader = DataFileReader(open("users.avro", "rb"), DatumReader())
for user in reader:
print user
reader.close()
@@ -154,10 +154,18 @@ reader.close()
{u'favorite_color': u'red', u'favorite_number': 7, u'name': u'Ben'}
</source>
<p>
+ Do make sure that you open your files in binary mode (i.e. using the modes
+ <code>wb</code> or <code>rb</code> respectively). Otherwise you might
+ generate corrupt files due to
+ <a href="http://docs.python.org/library/functions.html#open">
+ automatic replacement</a> of newline characters with their
+ platform-specific representations.
+ </p>
+ <p>
Let's take a closer look at what's going on here.
</p>
<source>
-schema = avro.schema.parse(open("user.avsc").read())
+schema = avro.schema.parse(open("user.avsc", "rb").read())
</source>
<p>
<code>avro.schema.parse</code> takes a string containing a JSON schema
@@ -167,7 +175,7 @@ schema = avro.schema.parse(open("user.avsc").read())
user.avsc schema file here.
</p>
<source>
-writer = DataFileWriter(open("users.avro", "w"), DatumWriter(), schema)
+writer = DataFileWriter(open("users.avro", "wb"), DatumWriter(), schema)
</source>
<p>
We create a <code>DataFileWriter</code>, which we'll use to write
@@ -201,7 +209,7 @@ writer.append({"name": "Ben", "favorite_number": 7, "favorite_color": "red"})
ignored.
</p>
<source>
-reader = DataFileReader(open("users.avro", "r"), DatumReader())
+reader = DataFileReader(open("users.avro", "rb"), DatumReader())
</source>
<p>
We open the file again, this time for reading back from disk. We use
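For reference, the getting-started snippets changed in the hunks above assemble into the following self-contained Python 2 script; a minimal sketch, assuming a user.avsc in the working directory that defines the User record (name, favorite_number, favorite_color) used by the appends:

    import avro.schema
    from avro.datafile import DataFileReader, DataFileWriter
    from avro.io import DatumReader, DatumWriter

    # Parse the schema; binary mode avoids newline translation.
    schema = avro.schema.parse(open("user.avsc", "rb").read())

    # Write two records, again opening the data file in binary mode.
    writer = DataFileWriter(open("users.avro", "wb"), DatumWriter(), schema)
    writer.append({"name": "Alyssa", "favorite_number": 256})
    writer.append({"name": "Ben", "favorite_number": 7, "favorite_color": "red"})
    writer.close()

    # Read the records back from disk.
    reader = DataFileReader(open("users.avro", "rb"), DatumReader())
    for user in reader:
        print user
    reader.close()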
diff --git a/doc/src/content/xdocs/spec.xml b/doc/src/content/xdocs/spec.xml
index 8c108c8..83c0420 100644
--- a/doc/src/content/xdocs/spec.xml
+++ b/doc/src/content/xdocs/spec.xml
@@ -871,7 +871,7 @@
<li>that many bytes of <em>buffer data</em>.</li>
</ul>
</li>
- <li>A message is always terminated by a zero-lenghted buffer.</li>
+ <li>A message is always terminated by a zero-length buffer.</li>
</ul>
<p>Framing is transparent to request and response message
@@ -1406,6 +1406,67 @@ void initFPTable() {
precisions match.</p>
</section>
+
+ <section>
+ <title>Date</title>
+ <p>
+ The <code>date</code> logical type represents a date within the calendar, with no reference to a particular time zone or time of day.
+ </p>
+ <p>
+ A <code>date</code> logical type annotates an Avro <code>int</code>, where the int stores the number of days from the unix epoch, 1 January 1970 (ISO calendar).
+ </p>
+ </section>
+
+ <section>
+ <title>Time (millisecond precision)</title>
+ <p>
+ The <code>time-millis</code> logical type represents a time of day, with no reference to a particular calendar, time zone or date, with a precision of one millisecond.
+ </p>
+ <p>
+ A <code>time-millis</code> logical type annotates an Avro <code>int</code>, where the int stores the number of milliseconds after midnight, 00:00:00.000.
+ </p>
+ </section>
+
+ <section>
+ <title>Time (microsecond precision)</title>
+ <p>
+ The <code>time-micros</code> logical type represents a time of day, with no reference to a particular calendar, time zone or date, with a precision of one microsecond.
+ </p>
+ <p>
+ A <code>time-micros</code> logical type annotates an Avro <code>long</code>, where the long stores the number of microseconds after midnight, 00:00:00.000000.
+ </p>
+ </section>
+
+ <section>
+ <title>Timestamp (millisecond precision)</title>
+ <p>
+ The <code>timestamp-millis</code> logical type represents an instant on the global timeline, independent of a particular time zone or calendar, with a precision of one millisecond.
+ </p>
+ <p>
+ A <code>timestamp-millis</code> logical type annotates an Avro <code>long</code>, where the long stores the number of milliseconds from the unix epoch, 1 January 1970 00:00:00.000 UTC.
+ </p>
+ </section>
+
+ <section>
+ <title>Timestamp (microsecond precision)</title>
+ <p>
+ The <code>timestamp-micros</code> logical type represents an instant on the global timeline, independent of a particular time zone or calendar, with a precision of one microsecond.
+ </p>
+ <p>
+ A <code>timestamp-micros</code> logical type annotates an Avro <code>long</code>, where the long stores the number of microseconds from the unix epoch, 1 January 1970 00:00:00.000000 UTC.
+ </p>
+ </section>
+
+ <section>
+ <title>Duration</title>
+ <p>
+ The <code>duration</code> logical type represents an amount of time defined by a number of months, days and milliseconds. This is not equivalent to a number of milliseconds, because, depending on the moment in time from which the duration is measured, the number of days in the month and number of milliseconds in a day may differ. Other standard periods such as years, quarters, hours and minutes can be expressed through these basic periods.
+ </p>
+ <p>
+ A <code>duration</code> logical type annotates an Avro <code>fixed</code> type of size 12, which stores three little-endian unsigned integers that represent durations at different granularities of time. The first stores a number of months, the second a number of days, and the third a number of milliseconds.
+ </p>
+ </section>
+
</section>
<p><em>Apache Avro, Avro, Apache, and the Avro and Apache logos are
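For illustration only (this is not part of the specification or the diff), here is a minimal Python sketch of the representations the new sections define, with hypothetical helper names: date and timestamp-millis reduce to a day or millisecond count from the unix epoch, time-millis to milliseconds after midnight, and duration packs three little-endian unsigned 32-bit integers into a 12-byte fixed:

    import datetime
    import struct

    EPOCH_DATE = datetime.date(1970, 1, 1)
    EPOCH_DATETIME = datetime.datetime(1970, 1, 1)

    def encode_date(d):
        # date: an int counting days from the unix epoch.
        return (d - EPOCH_DATE).days

    def encode_time_millis(t):
        # time-millis: an int counting milliseconds after midnight.
        return ((t.hour * 3600 + t.minute * 60 + t.second) * 1000
                + t.microsecond // 1000)

    def encode_timestamp_millis(dt):
        # timestamp-millis: a long counting milliseconds from
        # 1 January 1970 00:00:00.000 UTC (dt is assumed to be UTC).
        delta = dt - EPOCH_DATETIME
        return ((delta.days * 86400 + delta.seconds) * 1000
                + delta.microseconds // 1000)

    def encode_duration(months, days, millis):
        # duration: a fixed of size 12 holding three little-endian
        # unsigned integers: months, days, milliseconds.
        return struct.pack('<III', months, days, millis)

    assert encode_date(datetime.date(1970, 1, 2)) == 1
    assert len(encode_duration(1, 2, 3)) == 12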
diff --git a/lang/py/build.xml b/lang/py/build.xml
index 6d371ea..61c3f4c 100644
--- a/lang/py/build.xml
+++ b/lang/py/build.xml
@@ -16,7 +16,7 @@
limitations under the License.
-->
-<project name="Avro" default="dist">
+<project name="Avro" default="dist" xmlns:ivy="antlib:org.apache.ivy.ant">
<!-- Load user's default properties. -->
<property file="${user.home}/build.properties"/>
@@ -36,6 +36,9 @@
<property name="lib.dir" value="${basedir}/lib"/>
<property name="test.dir" value="${basedir}/test"/>
+ <property name="ivy.version" value="2.2.0"/>
+ <property name="ivy.jar" value="${basedir}/lib/ivy-${ivy.version}.jar"/>
+
<!-- Load shared properties -->
<loadfile srcFile="${share.dir}/VERSION.txt" property="avro.version" />
<loadfile srcFile="${share.schema.dir}/org/apache/avro/ipc/HandshakeRequest.avsc" property="handshake.request.json"/>
@@ -55,6 +58,17 @@
<target name="init" description="Create the build directory.">
<mkdir dir="${build.dir}"/>
+ <available file="${ivy.jar}" property="ivy.jar.found"/>
+ <antcall target="ivy-download"/>
+ <typedef uri="antlib:org.apache.ivy.ant">
+ <classpath>
+ <pathelement location="${ivy.jar}" />
+ </classpath>
+ </typedef>
+ </target>
+
+ <target name="ivy-download" unless="ivy.jar.found" >
+ <get src="http://repo2.maven.org/maven2/org/apache/ivy/ivy/${ivy.version}/ivy-${ivy.version}.jar" dest="${ivy.jar}" usetimestamp="true" />
</target>
<target name="build"
@@ -77,6 +91,12 @@
<fileset dir="${lib.dir}" />
</copy>
+ <!-- Copy the protocols used for tethering -->
+ <copy todir="${build.dir}/src/avro/tether">
+ <fileset dir="${share.schema.dir}/org/apache/avro/mapred/tether/">
+ <include name="*.avpr"/>
+ </fileset>
+ </copy>
<!-- Inline the handshake schemas -->
<copy file="${src.dir}/avro/ipc.py"
toFile="${build.dir}/src/avro/ipc.py"
@@ -120,6 +140,20 @@
<filter token="INTEROP_DATA_DIR" value="${interop.data.dir}"/>
</filterset>
</copy>
+
+ <!-- Ensure we have a local copy of the tools jar -->
+ <ivy:retrieve
+ pattern="${basedir}/../java/tools/target/[artifact]-[revision].[ext]"/>
+
+ <!-- Inline the location of the tools jar -->
+ <copy file="${test.dir}/test_tether_word_count.py"
+ toFile="${build.dir}/test/test_tether_word_count.py"
+ overwrite="true">
+ <filterset>
+ <filter token="AVRO_VERSION" value="${avro.version}"/>
+ <filter token="TOPDIR" value="${basedir}"/>
+ </filterset>
+ </copy>
</target>
<target name="test"
@@ -135,6 +169,22 @@
</py-test>
</target>
+ <!-- Runs just the unit tests for tethered jobs.
+ -->
+ <target name="test-tether"
+ description="Run unit tests for a hadoop python-tethered job."
+ depends="build">
+ <taskdef name="py-test" classname="org.pyant.tasks.PythonTestTask"
+ classpathref="java.classpath"/>
+ <py-test python="${python}" pythonpathref="test.path">
+ <fileset dir="${build.dir}/test">
+ <include name="test_tether*.py"/>
+ <!--<exclude name="test_datafile_interop.py"/>-->
+ </fileset>
+ </py-test>
+ </target>
+
+
<target name="interop-data-test"
description="Run python interop data tests"
depends="build">
diff --git a/lang/py/ivy.xml b/lang/py/ivy.xml
new file mode 100644
index 0000000..c37216c
--- /dev/null
+++ b/lang/py/ivy.xml
@@ -0,0 +1,24 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+<ivy-module version="2.0">
+ <info organisation="org.apache.avro" module="python"/>
+ <configurations defaultconfmapping="default"/>
+ <dependencies>
+ <dependency org="org.apache.avro" name="avro-tools"
+ rev="${avro.version}" transitive="false"/>
+ </dependencies>
+</ivy-module>
diff --git a/lang/py/ivysettings.xml b/lang/py/ivysettings.xml
new file mode 100644
index 0000000..31de16e
--- /dev/null
+++ b/lang/py/ivysettings.xml
@@ -0,0 +1,30 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+<ivysettings>
+ <settings defaultResolver="repos" />
+ <property name="m2-pattern" value="${user.home}/.m2/repository/[organisation]/[module]/[revision]/[module]-[revision](-[classifier]).[ext]" override="false" />
+ <resolvers>
+ <chain name="repos">
+ <ibiblio name="central" m2compatible="true"/>
+ <ibiblio name="apache-snapshots" m2compatible="true" root="https://repository.apache.org/content/groups/snapshots"/>
+ <filesystem name="local-maven2" m2compatible="true"> <!-- needed when building non-snapshot version for release -->
+ <artifact pattern="${m2-pattern}"/>
+ <ivy pattern="${m2-pattern}"/>
+ </filesystem>
+ </chain>
+ </resolvers>
+</ivysettings>
diff --git a/lang/py/lib/pyAntTasks-1.3-LICENSE.txt b/lang/py/lib/pyAntTasks-1.3-LICENSE.txt
deleted file mode 100644
index d645695..0000000
--- a/lang/py/lib/pyAntTasks-1.3-LICENSE.txt
+++ /dev/null
@@ -1,202 +0,0 @@
-
- Apache License
- Version 2.0, January 2004
- http://www.apache.org/licenses/
-
- TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
-
- 1. Definitions.
-
- "License" shall mean the terms and conditions for use, reproduction,
- and distribution as defined by Sections 1 through 9 of this document.
-
- "Licensor" shall mean the copyright owner or entity authorized by
- the copyright owner that is granting the License.
-
- "Legal Entity" shall mean the union of the acting entity and all
- other entities that control, are controlled by, or are under common
- control with that entity. For the purposes of this definition,
- "control" means (i) the power, direct or indirect, to cause the
- direction or management of such entity, whether by contract or
- otherwise, or (ii) ownership of fifty percent (50%) or more of the
- outstanding shares, or (iii) beneficial ownership of such entity.
-
- "You" (or "Your") shall mean an individual or Legal Entity
- exercising permissions granted by this License.
-
- "Source" form shall mean the preferred form for making modifications,
- including but not limited to software source code, documentation
- source, and configuration files.
-
- "Object" form shall mean any form resulting from mechanical
- transformation or translation of a Source form, including but
- not limited to compiled object code, generated documentation,
- and conversions to other media types.
-
- "Work" shall mean the work of authorship, whether in Source or
- Object form, made available under the License, as indicated by a
- copyright notice that is included in or attached to the work
- (an example is provided in the Appendix below).
-
- "Derivative Works" shall mean any work, whether in Source or Object
- form, that is based on (or derived from) the Work and for which the
- editorial revisions, annotations, elaborations, or other modifications
- represent, as a whole, an original work of authorship. For the purposes
- of this License, Derivative Works shall not include works that remain
- separable from, or merely link (or bind by name) to the interfaces of,
- the Work and Derivative Works thereof.
-
- "Contribution" shall mean any work of authorship, including
- the original version of the Work and any modifications or additions
- to that Work or Derivative Works thereof, that is intentionally
- submitted to Licensor for inclusion in the Work by the copyright owner
- or by an individual or Legal Entity authorized to submit on behalf of
- the copyright owner. For the purposes of this definition, "submitted"
- means any form of electronic, verbal, or written communication sent
- to the Licensor or its representatives, including but not limited to
- communication on electronic mailing lists, source code control systems,
- and issue tracking systems that are managed by, or on behalf of, the
- Licensor for the purpose of discussing and improving the Work, but
- excluding communication that is conspicuously marked or otherwise
- designated in writing by the copyright owner as "Not a Contribution."
-
- "Contributor" shall mean Licensor and any individual or Legal Entity
- on behalf of whom a Contribution has been received by Licensor and
- subsequently incorporated within the Work.
-
- 2. Grant of Copyright License. Subject to the terms and conditions of
- this License, each Contributor hereby grants to You a perpetual,
- worldwide, non-exclusive, no-charge, royalty-free, irrevocable
- copyright license to reproduce, prepare Derivative Works of,
- publicly display, publicly perform, sublicense, and distribute the
- Work and such Derivative Works in Source or Object form.
-
- 3. Grant of Patent License. Subject to the terms and conditions of
- this License, each Contributor hereby grants to You a perpetual,
- worldwide, non-exclusive, no-charge, royalty-free, irrevocable
- (except as stated in this section) patent license to make, have made,
- use, offer to sell, sell, import, and otherwise transfer the Work,
- where such license applies only to those patent claims licensable
- by such Contributor that are necessarily infringed by their
- Contribution(s) alone or by combination of their Contribution(s)
- with the Work to which such Contribution(s) was submitted. If You
- institute patent litigation against any entity (including a
- cross-claim or counterclaim in a lawsuit) alleging that the Work
- or a Contribution incorporated within the Work constitutes direct
- or contributory patent infringement, then any patent licenses
- granted to You under this License for that Work shall terminate
- as of the date such litigation is filed.
-
- 4. Redistribution. You may reproduce and distribute copies of the
- Work or Derivative Works thereof in any medium, with or without
- modifications, and in Source or Object form, provided that You
- meet the following conditions:
-
- (a) You must give any other recipients of the Work or
- Derivative Works a copy of this License; and
-
- (b) You must cause any modified files to carry prominent notices
- stating that You changed the files; and
-
- (c) You must retain, in the Source form of any Derivative Works
- that You distribute, all copyright, patent, trademark, and
- attribution notices from the Source form of the Work,
- excluding those notices that do not pertain to any part of
- the Derivative Works; and
-
- (d) If the Work includes a "NOTICE" text file as part of its
- distribution, then any Derivative Works that You distribute must
- include a readable copy of the attribution notices contained
- within such NOTICE file, excluding those notices that do not
- pertain to any part of the Derivative Works, in at least one
- of the following places: within a NOTICE text file distributed
- as part of the Derivative Works; within the Source form or
- documentation, if provided along with the Derivative Works; or,
- within a display generated by the Derivative Works, if and
- wherever such third-party notices normally appear. The contents
- of the NOTICE file are for informational purposes only and
- do not modify the License. You may add Your own attribution
- notices within Derivative Works that You distribute, alongside
- or as an addendum to the NOTICE text from the Work, provided
- that such additional attribution notices cannot be construed
- as modifying the License.
-
- You may add Your own copyright statement to Your modifications and
- may provide additional or different license terms and conditions
- for use, reproduction, or distribution of Your modifications, or
- for any such Derivative Works as a whole, provided Your use,
- reproduction, and distribution of the Work otherwise complies with
- the conditions stated in this License.
-
- 5. Submission of Contributions. Unless You explicitly state otherwise,
- any Contribution intentionally submitted for inclusion in the Work
- by You to the Licensor shall be under the terms and conditions of
- this License, without any additional terms or conditions.
- Notwithstanding the above, nothing herein shall supersede or modify
- the terms of any separate license agreement you may have executed
- with Licensor regarding such Contributions.
-
- 6. Trademarks. This License does not grant permission to use the trade
- names, trademarks, service marks, or product names of the Licensor,
- except as required for reasonable and customary use in describing the
- origin of the Work and reproducing the content of the NOTICE file.
-
- 7. Disclaimer of Warranty. Unless required by applicable law or
- agreed to in writing, Licensor provides the Work (and each
- Contributor provides its Contributions) on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
- implied, including, without limitation, any warranties or conditions
- of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
- PARTICULAR PURPOSE. You are solely responsible for determining the
- appropriateness of using or redistributing the Work and assume any
- risks associated with Your exercise of permissions under this License.
-
- 8. Limitation of Liability. In no event and under no legal theory,
- whether in tort (including negligence), contract, or otherwise,
- unless required by applicable law (such as deliberate and grossly
- negligent acts) or agreed to in writing, shall any Contributor be
- liable to You for damages, including any direct, indirect, special,
- incidental, or consequential damages of any character arising as a
- result of this License or out of the use or inability to use the
- Work (including but not limited to damages for loss of goodwill,
- work stoppage, computer failure or malfunction, or any and all
- other commercial damages or losses), even if such Contributor
- has been advised of the possibility of such damages.
-
- 9. Accepting Warranty or Additional Liability. While redistributing
- the Work or Derivative Works thereof, You may choose to offer,
- and charge a fee for, acceptance of support, warranty, indemnity,
- or other liability obligations and/or rights consistent with this
- License. However, in accepting such obligations, You may act only
- on Your own behalf and on Your sole responsibility, not on behalf
- of any other Contributor, and only if You agree to indemnify,
- defend, and hold each Contributor harmless for any liability
- incurred by, or claims asserted against, such Contributor by reason
- of your accepting any such warranty or additional liability.
-
- END OF TERMS AND CONDITIONS
-
- APPENDIX: How to apply the Apache License to your work.
-
- To apply the Apache License to your work, attach the following
- boilerplate notice, with the fields enclosed by brackets "[]"
- replaced with your own identifying information. (Don't include
- the brackets!) The text should be enclosed in the appropriate
- comment syntax for the file format. We also recommend that a
- file or class name and description of purpose be included on the
- same "printed page" as the copyright notice for easier
- identification within third-party archives.
-
- Copyright [yyyy] [name of copyright owner]
-
- Licensed under the Apache License, Version 2.0 (the "License");
- you may not use this file except in compliance with the License.
- You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License.
diff --git a/lang/py/lib/pyAntTasks-1.3.jar b/lang/py/lib/pyAntTasks-1.3.jar
deleted file mode 100644
index 53a7877..0000000
Binary files a/lang/py/lib/pyAntTasks-1.3.jar and /dev/null differ
diff --git a/lang/py/lib/simplejson/LICENSE.txt b/lang/py/lib/simplejson/LICENSE.txt
deleted file mode 100644
index ad95f29..0000000
--- a/lang/py/lib/simplejson/LICENSE.txt
+++ /dev/null
@@ -1,19 +0,0 @@
-Copyright (c) 2006 Bob Ippolito
-
-Permission is hereby granted, free of charge, to any person obtaining a copy of
-this software and associated documentation files (the "Software"), to deal in
-the Software without restriction, including without limitation the rights to
-use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
-of the Software, and to permit persons to whom the Software is furnished to do
-so, subject to the following conditions:
-
-The above copyright notice and this permission notice shall be included in all
-copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
-SOFTWARE.
diff --git a/lang/py/lib/simplejson/__init__.py b/lang/py/lib/simplejson/__init__.py
deleted file mode 100644
index d5b4d39..0000000
--- a/lang/py/lib/simplejson/__init__.py
+++ /dev/null
@@ -1,318 +0,0 @@
-r"""JSON (JavaScript Object Notation) <http://json.org> is a subset of
-JavaScript syntax (ECMA-262 3rd edition) used as a lightweight data
-interchange format.
-
-:mod:`simplejson` exposes an API familiar to users of the standard library
-:mod:`marshal` and :mod:`pickle` modules. It is the externally maintained
-version of the :mod:`json` library contained in Python 2.6, but maintains
-compatibility with Python 2.4 and Python 2.5 and (currently) has
-significant performance advantages, even without using the optional C
-extension for speedups.
-
-Encoding basic Python object hierarchies::
-
- >>> import simplejson as json
- >>> json.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}])
- '["foo", {"bar": ["baz", null, 1.0, 2]}]'
- >>> print json.dumps("\"foo\bar")
- "\"foo\bar"
- >>> print json.dumps(u'\u1234')
- "\u1234"
- >>> print json.dumps('\\')
- "\\"
- >>> print json.dumps({"c": 0, "b": 0, "a": 0}, sort_keys=True)
- {"a": 0, "b": 0, "c": 0}
- >>> from StringIO import StringIO
- >>> io = StringIO()
- >>> json.dump(['streaming API'], io)
- >>> io.getvalue()
- '["streaming API"]'
-
-Compact encoding::
-
- >>> import simplejson as json
- >>> json.dumps([1,2,3,{'4': 5, '6': 7}], separators=(',',':'))
- '[1,2,3,{"4":5,"6":7}]'
-
-Pretty printing::
-
- >>> import simplejson as json
- >>> s = json.dumps({'4': 5, '6': 7}, sort_keys=True, indent=4)
- >>> print '\n'.join([l.rstrip() for l in s.splitlines()])
- {
- "4": 5,
- "6": 7
- }
-
-Decoding JSON::
-
- >>> import simplejson as json
- >>> obj = [u'foo', {u'bar': [u'baz', None, 1.0, 2]}]
- >>> json.loads('["foo", {"bar":["baz", null, 1.0, 2]}]') == obj
- True
- >>> json.loads('"\\"foo\\bar"') == u'"foo\x08ar'
- True
- >>> from StringIO import StringIO
- >>> io = StringIO('["streaming API"]')
- >>> json.load(io)[0] == 'streaming API'
- True
-
-Specializing JSON object decoding::
-
- >>> import simplejson as json
- >>> def as_complex(dct):
- ... if '__complex__' in dct:
- ... return complex(dct['real'], dct['imag'])
- ... return dct
- ...
- >>> json.loads('{"__complex__": true, "real": 1, "imag": 2}',
- ... object_hook=as_complex)
- (1+2j)
- >>> import decimal
- >>> json.loads('1.1', parse_float=decimal.Decimal) == decimal.Decimal('1.1')
- True
-
-Specializing JSON object encoding::
-
- >>> import simplejson as json
- >>> def encode_complex(obj):
- ... if isinstance(obj, complex):
- ... return [obj.real, obj.imag]
- ... raise TypeError(repr(obj) + " is not JSON serializable")
- ...
- >>> json.dumps(2 + 1j, default=encode_complex)
- '[2.0, 1.0]'
- >>> json.JSONEncoder(default=encode_complex).encode(2 + 1j)
- '[2.0, 1.0]'
- >>> ''.join(json.JSONEncoder(default=encode_complex).iterencode(2 + 1j))
- '[2.0, 1.0]'
-
-
-Using simplejson.tool from the shell to validate and pretty-print::
-
- $ echo '{"json":"obj"}' | python -m simplejson.tool
- {
- "json": "obj"
- }
- $ echo '{ 1.2:3.4}' | python -m simplejson.tool
- Expecting property name: line 1 column 2 (char 2)
-"""
-__version__ = '2.0.9'
-__all__ = [
- 'dump', 'dumps', 'load', 'loads',
- 'JSONDecoder', 'JSONEncoder',
-]
-
-__author__ = 'Bob Ippolito <bob at redivi.com>'
-
-from decoder import JSONDecoder
-from encoder import JSONEncoder
-
-_default_encoder = JSONEncoder(
- skipkeys=False,
- ensure_ascii=True,
- check_circular=True,
- allow_nan=True,
- indent=None,
- separators=None,
- encoding='utf-8',
- default=None,
-)
-
-def dump(obj, fp, skipkeys=False, ensure_ascii=True, check_circular=True,
- allow_nan=True, cls=None, indent=None, separators=None,
- encoding='utf-8', default=None, **kw):
- """Serialize ``obj`` as a JSON formatted stream to ``fp`` (a
- ``.write()``-supporting file-like object).
-
- If ``skipkeys`` is true then ``dict`` keys that are not basic types
- (``str``, ``unicode``, ``int``, ``long``, ``float``, ``bool``, ``None``)
- will be skipped instead of raising a ``TypeError``.
-
- If ``ensure_ascii`` is false, then some chunks written to ``fp``
- may be ``unicode`` instances, subject to normal Python ``str`` to
- ``unicode`` coercion rules. Unless ``fp.write()`` explicitly
- understands ``unicode`` (as in ``codecs.getwriter()``) this is likely
- to cause an error.
-
- If ``check_circular`` is false, then the circular reference check
- for container types will be skipped and a circular reference will
- result in an ``OverflowError`` (or worse).
-
- If ``allow_nan`` is false, then it will be a ``ValueError`` to
- serialize out of range ``float`` values (``nan``, ``inf``, ``-inf``)
- in strict compliance of the JSON specification, instead of using the
- JavaScript equivalents (``NaN``, ``Infinity``, ``-Infinity``).
-
- If ``indent`` is a non-negative integer, then JSON array elements and object
- members will be pretty-printed with that indent level. An indent level
- of 0 will only insert newlines. ``None`` is the most compact representation.
-
- If ``separators`` is an ``(item_separator, dict_separator)`` tuple
- then it will be used instead of the default ``(', ', ': ')`` separators.
- ``(',', ':')`` is the most compact JSON representation.
-
- ``encoding`` is the character encoding for str instances, default is UTF-8.
-
- ``default(obj)`` is a function that should return a serializable version
- of obj or raise TypeError. The default simply raises TypeError.
-
- To use a custom ``JSONEncoder`` subclass (e.g. one that overrides the
- ``.default()`` method to serialize additional types), specify it with
- the ``cls`` kwarg.
-
- """
- # cached encoder
- if (not skipkeys and ensure_ascii and
- check_circular and allow_nan and
- cls is None and indent is None and separators is None and
- encoding == 'utf-8' and default is None and not kw):
- iterable = _default_encoder.iterencode(obj)
- else:
- if cls is None:
- cls = JSONEncoder
- iterable = cls(skipkeys=skipkeys, ensure_ascii=ensure_ascii,
- check_circular=check_circular, allow_nan=allow_nan, indent=indent,
- separators=separators, encoding=encoding,
- default=default, **kw).iterencode(obj)
- # could accelerate with writelines in some versions of Python, at
- # a debuggability cost
- for chunk in iterable:
- fp.write(chunk)
-
-
-def dumps(obj, skipkeys=False, ensure_ascii=True, check_circular=True,
- allow_nan=True, cls=None, indent=None, separators=None,
- encoding='utf-8', default=None, **kw):
- """Serialize ``obj`` to a JSON formatted ``str``.
-
- If ``skipkeys`` is true then ``dict`` keys that are not basic types
- (``str``, ``unicode``, ``int``, ``long``, ``float``, ``bool``, ``None``)
- will be skipped instead of raising a ``TypeError``.
-
- If ``ensure_ascii`` is false, then the return value will be a
- ``unicode`` instance subject to normal Python ``str`` to ``unicode``
- coercion rules instead of being escaped to an ASCII ``str``.
-
- If ``check_circular`` is false, then the circular reference check
- for container types will be skipped and a circular reference will
- result in an ``OverflowError`` (or worse).
-
- If ``allow_nan`` is false, then it will be a ``ValueError`` to
- serialize out of range ``float`` values (``nan``, ``inf``, ``-inf``) in
- strict compliance of the JSON specification, instead of using the
- JavaScript equivalents (``NaN``, ``Infinity``, ``-Infinity``).
-
- If ``indent`` is a non-negative integer, then JSON array elements and
- object members will be pretty-printed with that indent level. An indent
- level of 0 will only insert newlines. ``None`` is the most compact
- representation.
-
- If ``separators`` is an ``(item_separator, dict_separator)`` tuple
- then it will be used instead of the default ``(', ', ': ')`` separators.
- ``(',', ':')`` is the most compact JSON representation.
-
- ``encoding`` is the character encoding for str instances, default is UTF-8.
-
- ``default(obj)`` is a function that should return a serializable version
- of obj or raise TypeError. The default simply raises TypeError.
-
- To use a custom ``JSONEncoder`` subclass (e.g. one that overrides the
- ``.default()`` method to serialize additional types), specify it with
- the ``cls`` kwarg.
-
- """
- # cached encoder
- if (not skipkeys and ensure_ascii and
- check_circular and allow_nan and
- cls is None and indent is None and separators is None and
- encoding == 'utf-8' and default is None and not kw):
- return _default_encoder.encode(obj)
- if cls is None:
- cls = JSONEncoder
- return cls(
- skipkeys=skipkeys, ensure_ascii=ensure_ascii,
- check_circular=check_circular, allow_nan=allow_nan, indent=indent,
- separators=separators, encoding=encoding, default=default,
- **kw).encode(obj)
-
-
-_default_decoder = JSONDecoder(encoding=None, object_hook=None)
-
-
-def load(fp, encoding=None, cls=None, object_hook=None, parse_float=None,
- parse_int=None, parse_constant=None, **kw):
- """Deserialize ``fp`` (a ``.read()``-supporting file-like object containing
- a JSON document) to a Python object.
-
- If the contents of ``fp`` is encoded with an ASCII based encoding other
- than utf-8 (e.g. latin-1), then an appropriate ``encoding`` name must
- be specified. Encodings that are not ASCII based (such as UCS-2) are
- not allowed, and should be wrapped with
- ``codecs.getreader(fp)(encoding)``, or simply decoded to a ``unicode``
- object and passed to ``loads()``
-
- ``object_hook`` is an optional function that will be called with the
- result of any object literal decode (a ``dict``). The return value of
- ``object_hook`` will be used instead of the ``dict``. This feature
- can be used to implement custom decoders (e.g. JSON-RPC class hinting).
-
- To use a custom ``JSONDecoder`` subclass, specify it with the ``cls``
- kwarg.
-
- """
- return loads(fp.read(),
- encoding=encoding, cls=cls, object_hook=object_hook,
- parse_float=parse_float, parse_int=parse_int,
- parse_constant=parse_constant, **kw)
-
-
-def loads(s, encoding=None, cls=None, object_hook=None, parse_float=None,
- parse_int=None, parse_constant=None, **kw):
- """Deserialize ``s`` (a ``str`` or ``unicode`` instance containing a JSON
- document) to a Python object.
-
- If ``s`` is a ``str`` instance and is encoded with an ASCII based encoding
- other than utf-8 (e.g. latin-1) then an appropriate ``encoding`` name
- must be specified. Encodings that are not ASCII based (such as UCS-2)
- are not allowed and should be decoded to ``unicode`` first.
-
- ``object_hook`` is an optional function that will be called with the
- result of any object literal decode (a ``dict``). The return value of
- ``object_hook`` will be used instead of the ``dict``. This feature
- can be used to implement custom decoders (e.g. JSON-RPC class hinting).
-
- ``parse_float``, if specified, will be called with the string
- of every JSON float to be decoded. By default this is equivalent to
- float(num_str). This can be used to use another datatype or parser
- for JSON floats (e.g. decimal.Decimal).
-
- ``parse_int``, if specified, will be called with the string
- of every JSON int to be decoded. By default this is equivalent to
- int(num_str). This can be used to use another datatype or parser
- for JSON integers (e.g. float).
-
- ``parse_constant``, if specified, will be called with one of the
- following strings: -Infinity, Infinity, NaN, null, true, false.
- This can be used to raise an exception if invalid JSON numbers
- are encountered.
-
- To use a custom ``JSONDecoder`` subclass, specify it with the ``cls``
- kwarg.
-
- """
- if (cls is None and encoding is None and object_hook is None and
- parse_int is None and parse_float is None and
- parse_constant is None and not kw):
- return _default_decoder.decode(s)
- if cls is None:
- cls = JSONDecoder
- if object_hook is not None:
- kw['object_hook'] = object_hook
- if parse_float is not None:
- kw['parse_float'] = parse_float
- if parse_int is not None:
- kw['parse_int'] = parse_int
- if parse_constant is not None:
- kw['parse_constant'] = parse_constant
- return cls(encoding=encoding, **kw).decode(s)
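The bundled simplejson removed above is, per its own docstring, the externally maintained version of the json module shipped with Python 2.6+, so the same four entry points remain available from the standard library. A quick sketch of the equivalent stdlib calls, mirroring the compact-encoding example from the deleted docstring:

    import json  # stdlib counterpart of the removed bundled simplejson

    # Compact encoding, as in the removed module's docstring.
    s = json.dumps([1, 2, 3, {'4': 5, '6': 7}], separators=(',', ':'))
    assert s == '[1,2,3,{"4":5,"6":7}]'

    # And back again.
    assert json.loads(s) == [1, 2, 3, {'4': 5, '6': 7}]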
diff --git a/lang/py/lib/simplejson/_speedups.c b/lang/py/lib/simplejson/_speedups.c
deleted file mode 100644
index 23b5f4a..0000000
--- a/lang/py/lib/simplejson/_speedups.c
+++ /dev/null
@@ -1,2329 +0,0 @@
-#include "Python.h"
-#include "structmember.h"
-#if PY_VERSION_HEX < 0x02060000 && !defined(Py_TYPE)
-#define Py_TYPE(ob) (((PyObject*)(ob))->ob_type)
-#endif
-#if PY_VERSION_HEX < 0x02050000 && !defined(PY_SSIZE_T_MIN)
-typedef int Py_ssize_t;
-#define PY_SSIZE_T_MAX INT_MAX
-#define PY_SSIZE_T_MIN INT_MIN
-#define PyInt_FromSsize_t PyInt_FromLong
-#define PyInt_AsSsize_t PyInt_AsLong
-#endif
-#ifndef Py_IS_FINITE
-#define Py_IS_FINITE(X) (!Py_IS_INFINITY(X) && !Py_IS_NAN(X))
-#endif
-
-#ifdef __GNUC__
-#define UNUSED __attribute__((__unused__))
-#else
-#define UNUSED
-#endif
-
-#define DEFAULT_ENCODING "utf-8"
-
-#define PyScanner_Check(op) PyObject_TypeCheck(op, &PyScannerType)
-#define PyScanner_CheckExact(op) (Py_TYPE(op) == &PyScannerType)
-#define PyEncoder_Check(op) PyObject_TypeCheck(op, &PyEncoderType)
-#define PyEncoder_CheckExact(op) (Py_TYPE(op) == &PyEncoderType)
-
-static PyTypeObject PyScannerType;
-static PyTypeObject PyEncoderType;
-
-typedef struct _PyScannerObject {
- PyObject_HEAD
- PyObject *encoding;
- PyObject *strict;
- PyObject *object_hook;
- PyObject *parse_float;
- PyObject *parse_int;
- PyObject *parse_constant;
-} PyScannerObject;
-
-static PyMemberDef scanner_members[] = {
- {"encoding", T_OBJECT, offsetof(PyScannerObject, encoding), READONLY, "encoding"},
- {"strict", T_OBJECT, offsetof(PyScannerObject, strict), READONLY, "strict"},
- {"object_hook", T_OBJECT, offsetof(PyScannerObject, object_hook), READONLY, "object_hook"},
- {"parse_float", T_OBJECT, offsetof(PyScannerObject, parse_float), READONLY, "parse_float"},
- {"parse_int", T_OBJECT, offsetof(PyScannerObject, parse_int), READONLY, "parse_int"},
- {"parse_constant", T_OBJECT, offsetof(PyScannerObject, parse_constant), READONLY, "parse_constant"},
- {NULL}
-};
-
-typedef struct _PyEncoderObject {
- PyObject_HEAD
- PyObject *markers;
- PyObject *defaultfn;
- PyObject *encoder;
- PyObject *indent;
- PyObject *key_separator;
- PyObject *item_separator;
- PyObject *sort_keys;
- PyObject *skipkeys;
- int fast_encode;
- int allow_nan;
-} PyEncoderObject;
-
-static PyMemberDef encoder_members[] = {
- {"markers", T_OBJECT, offsetof(PyEncoderObject, markers), READONLY, "markers"},
- {"default", T_OBJECT, offsetof(PyEncoderObject, defaultfn), READONLY, "default"},
- {"encoder", T_OBJECT, offsetof(PyEncoderObject, encoder), READONLY, "encoder"},
- {"indent", T_OBJECT, offsetof(PyEncoderObject, indent), READONLY, "indent"},
- {"key_separator", T_OBJECT, offsetof(PyEncoderObject, key_separator), READONLY, "key_separator"},
- {"item_separator", T_OBJECT, offsetof(PyEncoderObject, item_separator), READONLY, "item_separator"},
- {"sort_keys", T_OBJECT, offsetof(PyEncoderObject, sort_keys), READONLY, "sort_keys"},
- {"skipkeys", T_OBJECT, offsetof(PyEncoderObject, skipkeys), READONLY, "skipkeys"},
- {NULL}
-};
-
-static Py_ssize_t
-ascii_escape_char(Py_UNICODE c, char *output, Py_ssize_t chars);
-static PyObject *
-ascii_escape_unicode(PyObject *pystr);
-static PyObject *
-ascii_escape_str(PyObject *pystr);
-static PyObject *
-py_encode_basestring_ascii(PyObject* self UNUSED, PyObject *pystr);
-void init_speedups(void);
-static PyObject *
-scan_once_str(PyScannerObject *s, PyObject *pystr, Py_ssize_t idx, Py_ssize_t *next_idx_ptr);
-static PyObject *
-scan_once_unicode(PyScannerObject *s, PyObject *pystr, Py_ssize_t idx, Py_ssize_t *next_idx_ptr);
-static PyObject *
-_build_rval_index_tuple(PyObject *rval, Py_ssize_t idx);
-static PyObject *
-scanner_new(PyTypeObject *type, PyObject *args, PyObject *kwds);
-static int
-scanner_init(PyObject *self, PyObject *args, PyObject *kwds);
-static void
-scanner_dealloc(PyObject *self);
-static int
-scanner_clear(PyObject *self);
-static PyObject *
-encoder_new(PyTypeObject *type, PyObject *args, PyObject *kwds);
-static int
-encoder_init(PyObject *self, PyObject *args, PyObject *kwds);
-static void
-encoder_dealloc(PyObject *self);
-static int
-encoder_clear(PyObject *self);
-static int
-encoder_listencode_list(PyEncoderObject *s, PyObject *rval, PyObject *seq, Py_ssize_t indent_level);
-static int
-encoder_listencode_obj(PyEncoderObject *s, PyObject *rval, PyObject *obj, Py_ssize_t indent_level);
-static int
-encoder_listencode_dict(PyEncoderObject *s, PyObject *rval, PyObject *dct, Py_ssize_t indent_level);
-static PyObject *
-_encoded_const(PyObject *const);
-static void
-raise_errmsg(char *msg, PyObject *s, Py_ssize_t end);
-static PyObject *
-encoder_encode_string(PyEncoderObject *s, PyObject *obj);
-static int
-_convertPyInt_AsSsize_t(PyObject *o, Py_ssize_t *size_ptr);
-static PyObject *
-_convertPyInt_FromSsize_t(Py_ssize_t *size_ptr);
-static PyObject *
-encoder_encode_float(PyEncoderObject *s, PyObject *obj);
-
-#define S_CHAR(c) (c >= ' ' && c <= '~' && c != '\\' && c != '"')
-#define IS_WHITESPACE(c) (((c) == ' ') || ((c) == '\t') || ((c) == '\n') || ((c) == '\r'))
-
-#define MIN_EXPANSION 6
-#ifdef Py_UNICODE_WIDE
-#define MAX_EXPANSION (2 * MIN_EXPANSION)
-#else
-#define MAX_EXPANSION MIN_EXPANSION
-#endif
-
-static int
-_convertPyInt_AsSsize_t(PyObject *o, Py_ssize_t *size_ptr)
-{
- /* PyObject to Py_ssize_t converter */
- *size_ptr = PyInt_AsSsize_t(o);
- if (*size_ptr == -1 && PyErr_Occurred())
- return 0;
- return 1;
-}
-
-static PyObject *
-_convertPyInt_FromSsize_t(Py_ssize_t *size_ptr)
-{
- /* Py_ssize_t to PyObject converter */
- return PyInt_FromSsize_t(*size_ptr);
-}
-
-static Py_ssize_t
-ascii_escape_char(Py_UNICODE c, char *output, Py_ssize_t chars)
-{
- /* Escape unicode code point c to ASCII escape sequences
- in char *output. output must have at least 12 bytes unused to
- accommodate an escaped surrogate pair "\uXXXX\uXXXX" */
- output[chars++] = '\\';
- switch (c) {
- case '\\': output[chars++] = (char)c; break;
- case '"': output[chars++] = (char)c; break;
- case '\b': output[chars++] = 'b'; break;
- case '\f': output[chars++] = 'f'; break;
- case '\n': output[chars++] = 'n'; break;
- case '\r': output[chars++] = 'r'; break;
- case '\t': output[chars++] = 't'; break;
- default:
-#ifdef Py_UNICODE_WIDE
- if (c >= 0x10000) {
- /* UTF-16 surrogate pair */
- Py_UNICODE v = c - 0x10000;
- c = 0xd800 | ((v >> 10) & 0x3ff);
- output[chars++] = 'u';
- output[chars++] = "0123456789abcdef"[(c >> 12) & 0xf];
- output[chars++] = "0123456789abcdef"[(c >> 8) & 0xf];
- output[chars++] = "0123456789abcdef"[(c >> 4) & 0xf];
- output[chars++] = "0123456789abcdef"[(c ) & 0xf];
- c = 0xdc00 | (v & 0x3ff);
- output[chars++] = '\\';
- }
-#endif
- output[chars++] = 'u';
- output[chars++] = "0123456789abcdef"[(c >> 12) & 0xf];
- output[chars++] = "0123456789abcdef"[(c >> 8) & 0xf];
- output[chars++] = "0123456789abcdef"[(c >> 4) & 0xf];
- output[chars++] = "0123456789abcdef"[(c ) & 0xf];
- }
- return chars;
-}
-
-static PyObject *
-ascii_escape_unicode(PyObject *pystr)
-{
- /* Take a PyUnicode pystr and return a new ASCII-only escaped PyString */
- Py_ssize_t i;
- Py_ssize_t input_chars;
- Py_ssize_t output_size;
- Py_ssize_t max_output_size;
- Py_ssize_t chars;
- PyObject *rval;
- char *output;
- Py_UNICODE *input_unicode;
-
- input_chars = PyUnicode_GET_SIZE(pystr);
- input_unicode = PyUnicode_AS_UNICODE(pystr);
-
- /* One char input can be up to 6 chars output, estimate 4 of these */
- output_size = 2 + (MIN_EXPANSION * 4) + input_chars;
- max_output_size = 2 + (input_chars * MAX_EXPANSION);
- rval = PyString_FromStringAndSize(NULL, output_size);
- if (rval == NULL) {
- return NULL;
- }
- output = PyString_AS_STRING(rval);
- chars = 0;
- output[chars++] = '"';
- for (i = 0; i < input_chars; i++) {
- Py_UNICODE c = input_unicode[i];
- if (S_CHAR(c)) {
- output[chars++] = (char)c;
- }
- else {
- chars = ascii_escape_char(c, output, chars);
- }
- if (output_size - chars < (1 + MAX_EXPANSION)) {
- /* There's more than four, so let's resize by a lot */
- Py_ssize_t new_output_size = output_size * 2;
- /* This is an upper bound */
- if (new_output_size > max_output_size) {
- new_output_size = max_output_size;
- }
- /* Make sure that the output size changed before resizing */
- if (new_output_size != output_size) {
- output_size = new_output_size;
- if (_PyString_Resize(&rval, output_size) == -1) {
- return NULL;
- }
- output = PyString_AS_STRING(rval);
- }
- }
- }
- output[chars++] = '"';
- if (_PyString_Resize(&rval, chars) == -1) {
- return NULL;
- }
- return rval;
-}
-
-static PyObject *
-ascii_escape_str(PyObject *pystr)
-{
- /* Take a PyString pystr and return a new ASCII-only escaped PyString */
- Py_ssize_t i;
- Py_ssize_t input_chars;
- Py_ssize_t output_size;
- Py_ssize_t chars;
- PyObject *rval;
- char *output;
- char *input_str;
-
- input_chars = PyString_GET_SIZE(pystr);
- input_str = PyString_AS_STRING(pystr);
-
- /* Fast path for a string that's already ASCII */
- for (i = 0; i < input_chars; i++) {
- Py_UNICODE c = (Py_UNICODE)(unsigned char)input_str[i];
- if (!S_CHAR(c)) {
- /* If we have to escape something, scan the string for unicode */
- Py_ssize_t j;
- for (j = i; j < input_chars; j++) {
- c = (Py_UNICODE)(unsigned char)input_str[j];
- if (c > 0x7f) {
- /* We hit a non-ASCII character, bail to unicode mode */
- PyObject *uni;
- uni = PyUnicode_DecodeUTF8(input_str, input_chars, "strict");
- if (uni == NULL) {
- return NULL;
- }
- rval = ascii_escape_unicode(uni);
- Py_DECREF(uni);
- return rval;
- }
- }
- break;
- }
- }
-
- if (i == input_chars) {
- /* Input is already ASCII */
- output_size = 2 + input_chars;
- }
- else {
- /* One char input can be up to 6 chars output, estimate 4 of these */
- output_size = 2 + (MIN_EXPANSION * 4) + input_chars;
- }
- rval = PyString_FromStringAndSize(NULL, output_size);
- if (rval == NULL) {
- return NULL;
- }
- output = PyString_AS_STRING(rval);
- output[0] = '"';
-
- /* We know that everything up to i is ASCII already */
- chars = i + 1;
- memcpy(&output[1], input_str, i);
-
- for (; i < input_chars; i++) {
- Py_UNICODE c = (Py_UNICODE)(unsigned char)input_str[i];
- if (S_CHAR(c)) {
- output[chars++] = (char)c;
- }
- else {
- chars = ascii_escape_char(c, output, chars);
- }
- /* An ASCII char can't possibly expand to a surrogate! */
- if (output_size - chars < (1 + MIN_EXPANSION)) {
-        /* Not enough room left for another fully-escaped char; grow the buffer */
- output_size *= 2;
- if (output_size > 2 + (input_chars * MIN_EXPANSION)) {
- output_size = 2 + (input_chars * MIN_EXPANSION);
- }
- if (_PyString_Resize(&rval, output_size) == -1) {
- return NULL;
- }
- output = PyString_AS_STRING(rval);
- }
- }
- output[chars++] = '"';
- if (_PyString_Resize(&rval, chars) == -1) {
- return NULL;
- }
- return rval;
-}
-
-static void
-raise_errmsg(char *msg, PyObject *s, Py_ssize_t end)
-{
- /* Use the Python function simplejson.decoder.errmsg to raise a nice
- looking ValueError exception */
- static PyObject *errmsg_fn = NULL;
- PyObject *pymsg;
- if (errmsg_fn == NULL) {
- PyObject *decoder = PyImport_ImportModule("simplejson.decoder");
- if (decoder == NULL)
- return;
- errmsg_fn = PyObject_GetAttrString(decoder, "errmsg");
- Py_DECREF(decoder);
- if (errmsg_fn == NULL)
- return;
- }
- pymsg = PyObject_CallFunction(errmsg_fn, "(zOO&)", msg, s, _convertPyInt_FromSsize_t, &end);
- if (pymsg) {
- PyErr_SetObject(PyExc_ValueError, pymsg);
- Py_DECREF(pymsg);
- }
-}
-
-static PyObject *
-join_list_unicode(PyObject *lst)
-{
- /* return u''.join(lst) */
- static PyObject *joinfn = NULL;
- if (joinfn == NULL) {
- PyObject *ustr = PyUnicode_FromUnicode(NULL, 0);
- if (ustr == NULL)
- return NULL;
-
- joinfn = PyObject_GetAttrString(ustr, "join");
- Py_DECREF(ustr);
- if (joinfn == NULL)
- return NULL;
- }
- return PyObject_CallFunctionObjArgs(joinfn, lst, NULL);
-}
-
-static PyObject *
-join_list_string(PyObject *lst)
-{
- /* return ''.join(lst) */
- static PyObject *joinfn = NULL;
- if (joinfn == NULL) {
- PyObject *ustr = PyString_FromStringAndSize(NULL, 0);
- if (ustr == NULL)
- return NULL;
-
- joinfn = PyObject_GetAttrString(ustr, "join");
- Py_DECREF(ustr);
- if (joinfn == NULL)
- return NULL;
- }
- return PyObject_CallFunctionObjArgs(joinfn, lst, NULL);
-}
-
-static PyObject *
-_build_rval_index_tuple(PyObject *rval, Py_ssize_t idx) {
-    /* Return an (rval, idx) tuple, stealing the reference to rval */
-    PyObject *tpl;
-    PyObject *pyidx;
- if (rval == NULL) {
- return NULL;
- }
- pyidx = PyInt_FromSsize_t(idx);
- if (pyidx == NULL) {
- Py_DECREF(rval);
- return NULL;
- }
- tpl = PyTuple_New(2);
- if (tpl == NULL) {
- Py_DECREF(pyidx);
- Py_DECREF(rval);
- return NULL;
- }
- PyTuple_SET_ITEM(tpl, 0, rval);
- PyTuple_SET_ITEM(tpl, 1, pyidx);
- return tpl;
-}
-
-static PyObject *
-scanstring_str(PyObject *pystr, Py_ssize_t end, char *encoding, int strict, Py_ssize_t *next_end_ptr)
-{
- /* Read the JSON string from PyString pystr.
- end is the index of the first character after the quote.
- encoding is the encoding of pystr (must be an ASCII superset)
- if strict is zero then literal control characters are allowed
- *next_end_ptr is a return-by-reference index of the character
- after the end quote
-
- Return value is a new PyString (if ASCII-only) or PyUnicode
- */
- PyObject *rval;
- Py_ssize_t len = PyString_GET_SIZE(pystr);
- Py_ssize_t begin = end - 1;
- Py_ssize_t next = begin;
- int has_unicode = 0;
- char *buf = PyString_AS_STRING(pystr);
- PyObject *chunks = PyList_New(0);
- if (chunks == NULL) {
- goto bail;
- }
- if (end < 0 || len <= end) {
- PyErr_SetString(PyExc_ValueError, "end is out of bounds");
- goto bail;
- }
- while (1) {
- /* Find the end of the string or the next escape */
- Py_UNICODE c = 0;
- PyObject *chunk = NULL;
- for (next = end; next < len; next++) {
- c = (unsigned char)buf[next];
- if (c == '"' || c == '\\') {
- break;
- }
- else if (strict && c <= 0x1f) {
- raise_errmsg("Invalid control character at", pystr, next);
- goto bail;
- }
- else if (c > 0x7f) {
- has_unicode = 1;
- }
- }
- if (!(c == '"' || c == '\\')) {
- raise_errmsg("Unterminated string starting at", pystr, begin);
- goto bail;
- }
- /* Pick up this chunk if it's not zero length */
- if (next != end) {
- PyObject *strchunk = PyString_FromStringAndSize(&buf[end], next - end);
- if (strchunk == NULL) {
- goto bail;
- }
- if (has_unicode) {
- chunk = PyUnicode_FromEncodedObject(strchunk, encoding, NULL);
- Py_DECREF(strchunk);
- if (chunk == NULL) {
- goto bail;
- }
- }
- else {
- chunk = strchunk;
- }
- if (PyList_Append(chunks, chunk)) {
- Py_DECREF(chunk);
- goto bail;
- }
- Py_DECREF(chunk);
- }
- next++;
- if (c == '"') {
- end = next;
- break;
- }
- if (next == len) {
- raise_errmsg("Unterminated string starting at", pystr, begin);
- goto bail;
- }
- c = buf[next];
- if (c != 'u') {
- /* Non-unicode backslash escapes */
- end = next + 1;
- switch (c) {
- case '"': break;
- case '\\': break;
- case '/': break;
- case 'b': c = '\b'; break;
- case 'f': c = '\f'; break;
- case 'n': c = '\n'; break;
- case 'r': c = '\r'; break;
- case 't': c = '\t'; break;
- default: c = 0;
- }
- if (c == 0) {
- raise_errmsg("Invalid \\escape", pystr, end - 2);
- goto bail;
- }
- }
- else {
- c = 0;
- next++;
- end = next + 4;
- if (end >= len) {
- raise_errmsg("Invalid \\uXXXX escape", pystr, next - 1);
- goto bail;
- }
- /* Decode 4 hex digits */
- for (; next < end; next++) {
- Py_UNICODE digit = buf[next];
- c <<= 4;
- switch (digit) {
- case '0': case '1': case '2': case '3': case '4':
- case '5': case '6': case '7': case '8': case '9':
- c |= (digit - '0'); break;
- case 'a': case 'b': case 'c': case 'd': case 'e':
- case 'f':
- c |= (digit - 'a' + 10); break;
- case 'A': case 'B': case 'C': case 'D': case 'E':
- case 'F':
- c |= (digit - 'A' + 10); break;
- default:
- raise_errmsg("Invalid \\uXXXX escape", pystr, end - 5);
- goto bail;
- }
- }
-#ifdef Py_UNICODE_WIDE
- /* Surrogate pair */
- if ((c & 0xfc00) == 0xd800) {
- Py_UNICODE c2 = 0;
- if (end + 6 >= len) {
- raise_errmsg("Unpaired high surrogate", pystr, end - 5);
- goto bail;
- }
- if (buf[next++] != '\\' || buf[next++] != 'u') {
- raise_errmsg("Unpaired high surrogate", pystr, end - 5);
- goto bail;
- }
- end += 6;
- /* Decode 4 hex digits */
- for (; next < end; next++) {
-                Py_UNICODE digit = buf[next];
-                c2 <<= 4;
- switch (digit) {
- case '0': case '1': case '2': case '3': case '4':
- case '5': case '6': case '7': case '8': case '9':
- c2 |= (digit - '0'); break;
- case 'a': case 'b': case 'c': case 'd': case 'e':
- case 'f':
- c2 |= (digit - 'a' + 10); break;
- case 'A': case 'B': case 'C': case 'D': case 'E':
- case 'F':
- c2 |= (digit - 'A' + 10); break;
- default:
- raise_errmsg("Invalid \\uXXXX escape", pystr, end - 5);
- goto bail;
- }
- }
- if ((c2 & 0xfc00) != 0xdc00) {
- raise_errmsg("Unpaired high surrogate", pystr, end - 5);
- goto bail;
- }
- c = 0x10000 + (((c - 0xd800) << 10) | (c2 - 0xdc00));
- }
- else if ((c & 0xfc00) == 0xdc00) {
- raise_errmsg("Unpaired low surrogate", pystr, end - 5);
- goto bail;
- }
-#endif
- }
- if (c > 0x7f) {
- has_unicode = 1;
- }
- if (has_unicode) {
- chunk = PyUnicode_FromUnicode(&c, 1);
- if (chunk == NULL) {
- goto bail;
- }
- }
- else {
- char c_char = Py_CHARMASK(c);
- chunk = PyString_FromStringAndSize(&c_char, 1);
- if (chunk == NULL) {
- goto bail;
- }
- }
- if (PyList_Append(chunks, chunk)) {
- Py_DECREF(chunk);
- goto bail;
- }
- Py_DECREF(chunk);
- }
-
- rval = join_list_string(chunks);
- if (rval == NULL) {
- goto bail;
- }
- Py_CLEAR(chunks);
- *next_end_ptr = end;
- return rval;
-bail:
- *next_end_ptr = -1;
- Py_XDECREF(chunks);
- return NULL;
-}
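
The recombination at the end of the surrogate-pair branch is the standard
UTF-16 formula; a quick worked check:

    hi, lo = 0xd834, 0xdd20   # from the escape sequence \ud834\udd20
    assert 0x10000 + (((hi - 0xd800) << 10) | (lo - 0xdc00)) == 0x1d120
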
-
-
-static PyObject *
-scanstring_unicode(PyObject *pystr, Py_ssize_t end, int strict, Py_ssize_t *next_end_ptr)
-{
- /* Read the JSON string from PyUnicode pystr.
- end is the index of the first character after the quote.
- if strict is zero then literal control characters are allowed
- *next_end_ptr is a return-by-reference index of the character
- after the end quote
-
- Return value is a new PyUnicode
- */
- PyObject *rval;
- Py_ssize_t len = PyUnicode_GET_SIZE(pystr);
- Py_ssize_t begin = end - 1;
- Py_ssize_t next = begin;
- const Py_UNICODE *buf = PyUnicode_AS_UNICODE(pystr);
- PyObject *chunks = PyList_New(0);
- if (chunks == NULL) {
- goto bail;
- }
- if (end < 0 || len <= end) {
- PyErr_SetString(PyExc_ValueError, "end is out of bounds");
- goto bail;
- }
- while (1) {
- /* Find the end of the string or the next escape */
- Py_UNICODE c = 0;
- PyObject *chunk = NULL;
- for (next = end; next < len; next++) {
- c = buf[next];
- if (c == '"' || c == '\\') {
- break;
- }
- else if (strict && c <= 0x1f) {
- raise_errmsg("Invalid control character at", pystr, next);
- goto bail;
- }
- }
- if (!(c == '"' || c == '\\')) {
- raise_errmsg("Unterminated string starting at", pystr, begin);
- goto bail;
- }
- /* Pick up this chunk if it's not zero length */
- if (next != end) {
- chunk = PyUnicode_FromUnicode(&buf[end], next - end);
- if (chunk == NULL) {
- goto bail;
- }
- if (PyList_Append(chunks, chunk)) {
- Py_DECREF(chunk);
- goto bail;
- }
- Py_DECREF(chunk);
- }
- next++;
- if (c == '"') {
- end = next;
- break;
- }
- if (next == len) {
- raise_errmsg("Unterminated string starting at", pystr, begin);
- goto bail;
- }
- c = buf[next];
- if (c != 'u') {
- /* Non-unicode backslash escapes */
- end = next + 1;
- switch (c) {
- case '"': break;
- case '\\': break;
- case '/': break;
- case 'b': c = '\b'; break;
- case 'f': c = '\f'; break;
- case 'n': c = '\n'; break;
- case 'r': c = '\r'; break;
- case 't': c = '\t'; break;
- default: c = 0;
- }
- if (c == 0) {
- raise_errmsg("Invalid \\escape", pystr, end - 2);
- goto bail;
- }
- }
- else {
- c = 0;
- next++;
- end = next + 4;
- if (end >= len) {
- raise_errmsg("Invalid \\uXXXX escape", pystr, next - 1);
- goto bail;
- }
- /* Decode 4 hex digits */
- for (; next < end; next++) {
- Py_UNICODE digit = buf[next];
- c <<= 4;
- switch (digit) {
- case '0': case '1': case '2': case '3': case '4':
- case '5': case '6': case '7': case '8': case '9':
- c |= (digit - '0'); break;
- case 'a': case 'b': case 'c': case 'd': case 'e':
- case 'f':
- c |= (digit - 'a' + 10); break;
- case 'A': case 'B': case 'C': case 'D': case 'E':
- case 'F':
- c |= (digit - 'A' + 10); break;
- default:
- raise_errmsg("Invalid \\uXXXX escape", pystr, end - 5);
- goto bail;
- }
- }
-#ifdef Py_UNICODE_WIDE
- /* Surrogate pair */
- if ((c & 0xfc00) == 0xd800) {
- Py_UNICODE c2 = 0;
- if (end + 6 >= len) {
- raise_errmsg("Unpaired high surrogate", pystr, end - 5);
- goto bail;
- }
- if (buf[next++] != '\\' || buf[next++] != 'u') {
- raise_errmsg("Unpaired high surrogate", pystr, end - 5);
- goto bail;
- }
- end += 6;
- /* Decode 4 hex digits */
- for (; next < end; next++) {
-                    Py_UNICODE digit = buf[next];
-                    c2 <<= 4;
- switch (digit) {
- case '0': case '1': case '2': case '3': case '4':
- case '5': case '6': case '7': case '8': case '9':
- c2 |= (digit - '0'); break;
- case 'a': case 'b': case 'c': case 'd': case 'e':
- case 'f':
- c2 |= (digit - 'a' + 10); break;
- case 'A': case 'B': case 'C': case 'D': case 'E':
- case 'F':
- c2 |= (digit - 'A' + 10); break;
- default:
- raise_errmsg("Invalid \\uXXXX escape", pystr, end - 5);
- goto bail;
- }
- }
- if ((c2 & 0xfc00) != 0xdc00) {
- raise_errmsg("Unpaired high surrogate", pystr, end - 5);
- goto bail;
- }
- c = 0x10000 + (((c - 0xd800) << 10) | (c2 - 0xdc00));
- }
- else if ((c & 0xfc00) == 0xdc00) {
- raise_errmsg("Unpaired low surrogate", pystr, end - 5);
- goto bail;
- }
-#endif
- }
- chunk = PyUnicode_FromUnicode(&c, 1);
- if (chunk == NULL) {
- goto bail;
- }
- if (PyList_Append(chunks, chunk)) {
- Py_DECREF(chunk);
- goto bail;
- }
- Py_DECREF(chunk);
- }
-
- rval = join_list_unicode(chunks);
- if (rval == NULL) {
- goto bail;
- }
- Py_DECREF(chunks);
- *next_end_ptr = end;
- return rval;
-bail:
- *next_end_ptr = -1;
- Py_XDECREF(chunks);
- return NULL;
-}
-
-PyDoc_STRVAR(pydoc_scanstring,
- "scanstring(basestring, end, encoding, strict=True) -> (str, end)\n"
- "\n"
- "Scan the string s for a JSON string. End is the index of the\n"
- "character in s after the quote that started the JSON string.\n"
- "Unescapes all valid JSON string escape sequences and raises ValueError\n"
- "on attempt to decode an invalid string. If strict is False then literal\n"
- "control characters are allowed in the string.\n"
- "\n"
- "Returns a tuple of the decoded string and the index of the character in s\n"
- "after the end quote."
-);
-
-static PyObject *
-py_scanstring(PyObject* self UNUSED, PyObject *args)
-{
- PyObject *pystr;
- PyObject *rval;
- Py_ssize_t end;
- Py_ssize_t next_end = -1;
- char *encoding = NULL;
- int strict = 1;
- if (!PyArg_ParseTuple(args, "OO&|zi:scanstring", &pystr, _convertPyInt_AsSsize_t, &end, &encoding, &strict)) {
- return NULL;
- }
- if (encoding == NULL) {
- encoding = DEFAULT_ENCODING;
- }
- if (PyString_Check(pystr)) {
- rval = scanstring_str(pystr, end, encoding, strict, &next_end);
- }
- else if (PyUnicode_Check(pystr)) {
- rval = scanstring_unicode(pystr, end, strict, &next_end);
- }
- else {
- PyErr_Format(PyExc_TypeError,
- "first argument must be a string, not %.80s",
- Py_TYPE(pystr)->tp_name);
- return NULL;
- }
- return _build_rval_index_tuple(rval, next_end);
-}
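
Both implementations are reachable from Python as scanstring (the C one when
_speedups imports, the pure-Python fallback otherwise); a usage sketch against
the bundled module this commit removes:

    from simplejson.decoder import scanstring
    # Index 1 is the first character after the opening quote.
    s, end = scanstring('"hello\\nworld"', 1)
    assert s == u'hello\nworld' and end == 14
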
-
-PyDoc_STRVAR(pydoc_encode_basestring_ascii,
- "encode_basestring_ascii(basestring) -> str\n"
- "\n"
- "Return an ASCII-only JSON representation of a Python string"
-);
-
-static PyObject *
-py_encode_basestring_ascii(PyObject* self UNUSED, PyObject *pystr)
-{
- /* Return an ASCII-only JSON representation of a Python string */
- /* METH_O */
- if (PyString_Check(pystr)) {
- return ascii_escape_str(pystr);
- }
- else if (PyUnicode_Check(pystr)) {
- return ascii_escape_unicode(pystr);
- }
- else {
- PyErr_Format(PyExc_TypeError,
- "first argument must be a string, not %.80s",
- Py_TYPE(pystr)->tp_name);
- return NULL;
- }
-}
-
-static void
-scanner_dealloc(PyObject *self)
-{
- /* Deallocate scanner object */
- scanner_clear(self);
- Py_TYPE(self)->tp_free(self);
-}
-
-static int
-scanner_traverse(PyObject *self, visitproc visit, void *arg)
-{
- PyScannerObject *s;
- assert(PyScanner_Check(self));
- s = (PyScannerObject *)self;
- Py_VISIT(s->encoding);
- Py_VISIT(s->strict);
- Py_VISIT(s->object_hook);
- Py_VISIT(s->parse_float);
- Py_VISIT(s->parse_int);
- Py_VISIT(s->parse_constant);
- return 0;
-}
-
-static int
-scanner_clear(PyObject *self)
-{
- PyScannerObject *s;
- assert(PyScanner_Check(self));
- s = (PyScannerObject *)self;
- Py_CLEAR(s->encoding);
- Py_CLEAR(s->strict);
- Py_CLEAR(s->object_hook);
- Py_CLEAR(s->parse_float);
- Py_CLEAR(s->parse_int);
- Py_CLEAR(s->parse_constant);
- return 0;
-}
-
-static PyObject *
-_parse_object_str(PyScannerObject *s, PyObject *pystr, Py_ssize_t idx, Py_ssize_t *next_idx_ptr) {
- /* Read a JSON object from PyString pystr.
- idx is the index of the first character after the opening curly brace.
- *next_idx_ptr is a return-by-reference index to the first character after
- the closing curly brace.
-
- Returns a new PyObject (usually a dict, but object_hook can change that)
- */
- char *str = PyString_AS_STRING(pystr);
- Py_ssize_t end_idx = PyString_GET_SIZE(pystr) - 1;
- PyObject *rval = PyDict_New();
- PyObject *key = NULL;
- PyObject *val = NULL;
- char *encoding = PyString_AS_STRING(s->encoding);
- int strict = PyObject_IsTrue(s->strict);
- Py_ssize_t next_idx;
- if (rval == NULL)
- return NULL;
-
- /* skip whitespace after { */
- while (idx <= end_idx && IS_WHITESPACE(str[idx])) idx++;
-
- /* only loop if the object is non-empty */
- if (idx <= end_idx && str[idx] != '}') {
- while (idx <= end_idx) {
- /* read key */
- if (str[idx] != '"') {
- raise_errmsg("Expecting property name", pystr, idx);
- goto bail;
- }
- key = scanstring_str(pystr, idx + 1, encoding, strict, &next_idx);
- if (key == NULL)
- goto bail;
- idx = next_idx;
-
- /* skip whitespace between key and : delimiter, read :, skip whitespace */
- while (idx <= end_idx && IS_WHITESPACE(str[idx])) idx++;
- if (idx > end_idx || str[idx] != ':') {
- raise_errmsg("Expecting : delimiter", pystr, idx);
- goto bail;
- }
- idx++;
- while (idx <= end_idx && IS_WHITESPACE(str[idx])) idx++;
-
- /* read any JSON data type */
- val = scan_once_str(s, pystr, idx, &next_idx);
- if (val == NULL)
- goto bail;
-
- if (PyDict_SetItem(rval, key, val) == -1)
- goto bail;
-
- Py_CLEAR(key);
- Py_CLEAR(val);
- idx = next_idx;
-
- /* skip whitespace before } or , */
- while (idx <= end_idx && IS_WHITESPACE(str[idx])) idx++;
-
- /* bail if the object is closed or we didn't get the , delimiter */
- if (idx > end_idx) break;
- if (str[idx] == '}') {
- break;
- }
- else if (str[idx] != ',') {
- raise_errmsg("Expecting , delimiter", pystr, idx);
- goto bail;
- }
- idx++;
-
- /* skip whitespace after , delimiter */
- while (idx <= end_idx && IS_WHITESPACE(str[idx])) idx++;
- }
- }
- /* verify that idx < end_idx, str[idx] should be '}' */
- if (idx > end_idx || str[idx] != '}') {
- raise_errmsg("Expecting object", pystr, end_idx);
- goto bail;
- }
- /* if object_hook is not None: rval = object_hook(rval) */
- if (s->object_hook != Py_None) {
- val = PyObject_CallFunctionObjArgs(s->object_hook, rval, NULL);
- if (val == NULL)
- goto bail;
- Py_DECREF(rval);
- rval = val;
- val = NULL;
- }
- *next_idx_ptr = idx + 1;
- return rval;
-bail:
- Py_XDECREF(key);
- Py_XDECREF(val);
- Py_DECREF(rval);
- return NULL;
-}
-
-static PyObject *
-_parse_object_unicode(PyScannerObject *s, PyObject *pystr, Py_ssize_t idx, Py_ssize_t *next_idx_ptr) {
- /* Read a JSON object from PyUnicode pystr.
- idx is the index of the first character after the opening curly brace.
- *next_idx_ptr is a return-by-reference index to the first character after
- the closing curly brace.
-
- Returns a new PyObject (usually a dict, but object_hook can change that)
- */
- Py_UNICODE *str = PyUnicode_AS_UNICODE(pystr);
- Py_ssize_t end_idx = PyUnicode_GET_SIZE(pystr) - 1;
- PyObject *val = NULL;
- PyObject *rval = PyDict_New();
- PyObject *key = NULL;
- int strict = PyObject_IsTrue(s->strict);
- Py_ssize_t next_idx;
- if (rval == NULL)
- return NULL;
-
- /* skip whitespace after { */
- while (idx <= end_idx && IS_WHITESPACE(str[idx])) idx++;
-
- /* only loop if the object is non-empty */
- if (idx <= end_idx && str[idx] != '}') {
- while (idx <= end_idx) {
- /* read key */
- if (str[idx] != '"') {
- raise_errmsg("Expecting property name", pystr, idx);
- goto bail;
- }
- key = scanstring_unicode(pystr, idx + 1, strict, &next_idx);
- if (key == NULL)
- goto bail;
- idx = next_idx;
-
- /* skip whitespace between key and : delimiter, read :, skip whitespace */
- while (idx <= end_idx && IS_WHITESPACE(str[idx])) idx++;
- if (idx > end_idx || str[idx] != ':') {
- raise_errmsg("Expecting : delimiter", pystr, idx);
- goto bail;
- }
- idx++;
- while (idx <= end_idx && IS_WHITESPACE(str[idx])) idx++;
-
- /* read any JSON term */
- val = scan_once_unicode(s, pystr, idx, &next_idx);
- if (val == NULL)
- goto bail;
-
- if (PyDict_SetItem(rval, key, val) == -1)
- goto bail;
-
- Py_CLEAR(key);
- Py_CLEAR(val);
- idx = next_idx;
-
- /* skip whitespace before } or , */
- while (idx <= end_idx && IS_WHITESPACE(str[idx])) idx++;
-
- /* bail if the object is closed or we didn't get the , delimiter */
- if (idx > end_idx) break;
- if (str[idx] == '}') {
- break;
- }
- else if (str[idx] != ',') {
- raise_errmsg("Expecting , delimiter", pystr, idx);
- goto bail;
- }
- idx++;
-
- /* skip whitespace after , delimiter */
- while (idx <= end_idx && IS_WHITESPACE(str[idx])) idx++;
- }
- }
-
- /* verify that idx < end_idx, str[idx] should be '}' */
- if (idx > end_idx || str[idx] != '}') {
- raise_errmsg("Expecting object", pystr, end_idx);
- goto bail;
- }
-
- /* if object_hook is not None: rval = object_hook(rval) */
- if (s->object_hook != Py_None) {
- val = PyObject_CallFunctionObjArgs(s->object_hook, rval, NULL);
- if (val == NULL)
- goto bail;
- Py_DECREF(rval);
- rval = val;
- val = NULL;
- }
- *next_idx_ptr = idx + 1;
- return rval;
-bail:
- Py_XDECREF(key);
- Py_XDECREF(val);
- Py_DECREF(rval);
- return NULL;
-}
-
-static PyObject *
-_parse_array_str(PyScannerObject *s, PyObject *pystr, Py_ssize_t idx, Py_ssize_t *next_idx_ptr) {
- /* Read a JSON array from PyString pystr.
-    idx is the index of the first character after the opening bracket.
-    *next_idx_ptr is a return-by-reference index to the first character after
-    the closing bracket.
-
- Returns a new PyList
- */
- char *str = PyString_AS_STRING(pystr);
- Py_ssize_t end_idx = PyString_GET_SIZE(pystr) - 1;
- PyObject *val = NULL;
- PyObject *rval = PyList_New(0);
- Py_ssize_t next_idx;
- if (rval == NULL)
- return NULL;
-
- /* skip whitespace after [ */
- while (idx <= end_idx && IS_WHITESPACE(str[idx])) idx++;
-
- /* only loop if the array is non-empty */
- if (idx <= end_idx && str[idx] != ']') {
- while (idx <= end_idx) {
-
- /* read any JSON term and de-tuplefy the (rval, idx) */
- val = scan_once_str(s, pystr, idx, &next_idx);
- if (val == NULL)
- goto bail;
-
- if (PyList_Append(rval, val) == -1)
- goto bail;
-
- Py_CLEAR(val);
- idx = next_idx;
-
- /* skip whitespace between term and , */
- while (idx <= end_idx && IS_WHITESPACE(str[idx])) idx++;
-
- /* bail if the array is closed or we didn't get the , delimiter */
- if (idx > end_idx) break;
- if (str[idx] == ']') {
- break;
- }
- else if (str[idx] != ',') {
- raise_errmsg("Expecting , delimiter", pystr, idx);
- goto bail;
- }
- idx++;
-
- /* skip whitespace after , */
- while (idx <= end_idx && IS_WHITESPACE(str[idx])) idx++;
- }
- }
-
- /* verify that idx < end_idx, str[idx] should be ']' */
- if (idx > end_idx || str[idx] != ']') {
- raise_errmsg("Expecting object", pystr, end_idx);
- goto bail;
- }
- *next_idx_ptr = idx + 1;
- return rval;
-bail:
- Py_XDECREF(val);
- Py_DECREF(rval);
- return NULL;
-}
-
-static PyObject *
-_parse_array_unicode(PyScannerObject *s, PyObject *pystr, Py_ssize_t idx, Py_ssize_t *next_idx_ptr) {
-    /* Read a JSON array from PyUnicode pystr.
-    idx is the index of the first character after the opening bracket.
-    *next_idx_ptr is a return-by-reference index to the first character after
-    the closing bracket.
-
- Returns a new PyList
- */
- Py_UNICODE *str = PyUnicode_AS_UNICODE(pystr);
- Py_ssize_t end_idx = PyUnicode_GET_SIZE(pystr) - 1;
- PyObject *val = NULL;
- PyObject *rval = PyList_New(0);
- Py_ssize_t next_idx;
- if (rval == NULL)
- return NULL;
-
- /* skip whitespace after [ */
- while (idx <= end_idx && IS_WHITESPACE(str[idx])) idx++;
-
- /* only loop if the array is non-empty */
- if (idx <= end_idx && str[idx] != ']') {
- while (idx <= end_idx) {
-
- /* read any JSON term */
- val = scan_once_unicode(s, pystr, idx, &next_idx);
- if (val == NULL)
- goto bail;
-
- if (PyList_Append(rval, val) == -1)
- goto bail;
-
- Py_CLEAR(val);
- idx = next_idx;
-
- /* skip whitespace between term and , */
- while (idx <= end_idx && IS_WHITESPACE(str[idx])) idx++;
-
- /* bail if the array is closed or we didn't get the , delimiter */
- if (idx > end_idx) break;
- if (str[idx] == ']') {
- break;
- }
- else if (str[idx] != ',') {
- raise_errmsg("Expecting , delimiter", pystr, idx);
- goto bail;
- }
- idx++;
-
- /* skip whitespace after , */
- while (idx <= end_idx && IS_WHITESPACE(str[idx])) idx++;
- }
- }
-
- /* verify that idx < end_idx, str[idx] should be ']' */
- if (idx > end_idx || str[idx] != ']') {
- raise_errmsg("Expecting object", pystr, end_idx);
- goto bail;
- }
- *next_idx_ptr = idx + 1;
- return rval;
-bail:
- Py_XDECREF(val);
- Py_DECREF(rval);
- return NULL;
-}
-
-static PyObject *
-_parse_constant(PyScannerObject *s, char *constant, Py_ssize_t idx, Py_ssize_t *next_idx_ptr) {
-    /* Handle a JSON constant that the scanner already matched.
- constant is the constant string that was found
- ("NaN", "Infinity", "-Infinity").
- idx is the index of the first character of the constant
- *next_idx_ptr is a return-by-reference index to the first character after
- the constant.
-
- Returns the result of parse_constant
- */
- PyObject *cstr;
- PyObject *rval;
- /* constant is "NaN", "Infinity", or "-Infinity" */
- cstr = PyString_InternFromString(constant);
- if (cstr == NULL)
- return NULL;
-
- /* rval = parse_constant(constant) */
- rval = PyObject_CallFunctionObjArgs(s->parse_constant, cstr, NULL);
- idx += PyString_GET_SIZE(cstr);
- Py_DECREF(cstr);
- *next_idx_ptr = idx;
- return rval;
-}
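
The parse_constant hook is still part of the json API that grew out of
simplejson, so its effect is easy to observe with the standard library:

    import json
    json.loads('Infinity')                                    # float('inf')
    json.loads('Infinity', parse_constant=lambda name: name)  # 'Infinity'
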
-
-static PyObject *
-_match_number_str(PyScannerObject *s, PyObject *pystr, Py_ssize_t start, Py_ssize_t *next_idx_ptr) {
- /* Read a JSON number from PyString pystr.
-    start is the index of the first character of the number
- *next_idx_ptr is a return-by-reference index to the first character after
- the number.
-
- Returns a new PyObject representation of that number:
- PyInt, PyLong, or PyFloat.
- May return other types if parse_int or parse_float are set
- */
- char *str = PyString_AS_STRING(pystr);
- Py_ssize_t end_idx = PyString_GET_SIZE(pystr) - 1;
- Py_ssize_t idx = start;
- int is_float = 0;
- PyObject *rval;
- PyObject *numstr;
-
- /* read a sign if it's there, make sure it's not the end of the string */
- if (str[idx] == '-') {
- idx++;
- if (idx > end_idx) {
- PyErr_SetNone(PyExc_StopIteration);
- return NULL;
- }
- }
-
- /* read as many integer digits as we find as long as it doesn't start with 0 */
- if (str[idx] >= '1' && str[idx] <= '9') {
- idx++;
- while (idx <= end_idx && str[idx] >= '0' && str[idx] <= '9') idx++;
- }
- /* if it starts with 0 we only expect one integer digit */
- else if (str[idx] == '0') {
- idx++;
- }
- /* no integer digits, error */
- else {
- PyErr_SetNone(PyExc_StopIteration);
- return NULL;
- }
-
- /* if the next char is '.' followed by a digit then read all float digits */
- if (idx < end_idx && str[idx] == '.' && str[idx + 1] >= '0' && str[idx + 1] <= '9') {
- is_float = 1;
- idx += 2;
- while (idx <= end_idx && str[idx] >= '0' && str[idx] <= '9') idx++;
- }
-
- /* if the next char is 'e' or 'E' then maybe read the exponent (or backtrack) */
- if (idx < end_idx && (str[idx] == 'e' || str[idx] == 'E')) {
-
- /* save the index of the 'e' or 'E' just in case we need to backtrack */
- Py_ssize_t e_start = idx;
- idx++;
-
- /* read an exponent sign if present */
- if (idx < end_idx && (str[idx] == '-' || str[idx] == '+')) idx++;
-
- /* read all digits */
- while (idx <= end_idx && str[idx] >= '0' && str[idx] <= '9') idx++;
-
- /* if we got a digit, then parse as float. if not, backtrack */
- if (str[idx - 1] >= '0' && str[idx - 1] <= '9') {
- is_float = 1;
- }
- else {
- idx = e_start;
- }
- }
-
- /* copy the section we determined to be a number */
- numstr = PyString_FromStringAndSize(&str[start], idx - start);
- if (numstr == NULL)
- return NULL;
- if (is_float) {
- /* parse as a float using a fast path if available, otherwise call user defined method */
- if (s->parse_float != (PyObject *)&PyFloat_Type) {
- rval = PyObject_CallFunctionObjArgs(s->parse_float, numstr, NULL);
- }
- else {
- rval = PyFloat_FromDouble(PyOS_ascii_atof(PyString_AS_STRING(numstr)));
- }
- }
- else {
- /* parse as an int using a fast path if available, otherwise call user defined method */
- if (s->parse_int != (PyObject *)&PyInt_Type) {
- rval = PyObject_CallFunctionObjArgs(s->parse_int, numstr, NULL);
- }
- else {
- rval = PyInt_FromString(PyString_AS_STRING(numstr), NULL, 10);
- }
- }
- Py_DECREF(numstr);
- *next_idx_ptr = idx;
- return rval;
-}
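
The hand-rolled scan above accepts the JSON number grammar; expressed as a
regular expression (roughly what the pure-Python scanner uses) it is:

    import re
    NUMBER_RE = re.compile(r'-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][-+]?\d+)?')
    assert NUMBER_RE.match('-12.5e+3').group() == '-12.5e+3'
    assert NUMBER_RE.match('007').group() == '0'  # only one digit after a leading 0
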
-
-static PyObject *
-_match_number_unicode(PyScannerObject *s, PyObject *pystr, Py_ssize_t start, Py_ssize_t *next_idx_ptr) {
- /* Read a JSON number from PyUnicode pystr.
- idx is the index of the first character of the number
- *next_idx_ptr is a return-by-reference index to the first character after
- the number.
-
- Returns a new PyObject representation of that number:
- PyInt, PyLong, or PyFloat.
- May return other types if parse_int or parse_float are set
- */
- Py_UNICODE *str = PyUnicode_AS_UNICODE(pystr);
- Py_ssize_t end_idx = PyUnicode_GET_SIZE(pystr) - 1;
- Py_ssize_t idx = start;
- int is_float = 0;
- PyObject *rval;
- PyObject *numstr;
-
- /* read a sign if it's there, make sure it's not the end of the string */
- if (str[idx] == '-') {
- idx++;
- if (idx > end_idx) {
- PyErr_SetNone(PyExc_StopIteration);
- return NULL;
- }
- }
-
- /* read as many integer digits as we find as long as it doesn't start with 0 */
- if (str[idx] >= '1' && str[idx] <= '9') {
- idx++;
- while (idx <= end_idx && str[idx] >= '0' && str[idx] <= '9') idx++;
- }
- /* if it starts with 0 we only expect one integer digit */
- else if (str[idx] == '0') {
- idx++;
- }
- /* no integer digits, error */
- else {
- PyErr_SetNone(PyExc_StopIteration);
- return NULL;
- }
-
- /* if the next char is '.' followed by a digit then read all float digits */
- if (idx < end_idx && str[idx] == '.' && str[idx + 1] >= '0' && str[idx + 1] <= '9') {
- is_float = 1;
- idx += 2;
-        while (idx <= end_idx && str[idx] >= '0' && str[idx] <= '9') idx++;
- }
-
- /* if the next char is 'e' or 'E' then maybe read the exponent (or backtrack) */
- if (idx < end_idx && (str[idx] == 'e' || str[idx] == 'E')) {
- Py_ssize_t e_start = idx;
- idx++;
-
- /* read an exponent sign if present */
- if (idx < end_idx && (str[idx] == '-' || str[idx] == '+')) idx++;
-
- /* read all digits */
- while (idx <= end_idx && str[idx] >= '0' && str[idx] <= '9') idx++;
-
- /* if we got a digit, then parse as float. if not, backtrack */
- if (str[idx - 1] >= '0' && str[idx - 1] <= '9') {
- is_float = 1;
- }
- else {
- idx = e_start;
- }
- }
-
- /* copy the section we determined to be a number */
- numstr = PyUnicode_FromUnicode(&str[start], idx - start);
- if (numstr == NULL)
- return NULL;
- if (is_float) {
- /* parse as a float using a fast path if available, otherwise call user defined method */
- if (s->parse_float != (PyObject *)&PyFloat_Type) {
- rval = PyObject_CallFunctionObjArgs(s->parse_float, numstr, NULL);
- }
- else {
- rval = PyFloat_FromString(numstr, NULL);
- }
- }
- else {
- /* no fast path for unicode -> int, just call */
- rval = PyObject_CallFunctionObjArgs(s->parse_int, numstr, NULL);
- }
- Py_DECREF(numstr);
- *next_idx_ptr = idx;
- return rval;
-}
-
-static PyObject *
-scan_once_str(PyScannerObject *s, PyObject *pystr, Py_ssize_t idx, Py_ssize_t *next_idx_ptr)
-{
- /* Read one JSON term (of any kind) from PyString pystr.
- idx is the index of the first character of the term
- *next_idx_ptr is a return-by-reference index to the first character after
-    the term.
-
- Returns a new PyObject representation of the term.
- */
- char *str = PyString_AS_STRING(pystr);
- Py_ssize_t length = PyString_GET_SIZE(pystr);
- if (idx >= length) {
- PyErr_SetNone(PyExc_StopIteration);
- return NULL;
- }
- switch (str[idx]) {
- case '"':
- /* string */
- return scanstring_str(pystr, idx + 1,
- PyString_AS_STRING(s->encoding),
- PyObject_IsTrue(s->strict),
- next_idx_ptr);
- case '{':
- /* object */
- return _parse_object_str(s, pystr, idx + 1, next_idx_ptr);
- case '[':
- /* array */
- return _parse_array_str(s, pystr, idx + 1, next_idx_ptr);
- case 'n':
- /* null */
- if ((idx + 3 < length) && str[idx + 1] == 'u' && str[idx + 2] == 'l' && str[idx + 3] == 'l') {
- Py_INCREF(Py_None);
- *next_idx_ptr = idx + 4;
- return Py_None;
- }
- break;
- case 't':
- /* true */
- if ((idx + 3 < length) && str[idx + 1] == 'r' && str[idx + 2] == 'u' && str[idx + 3] == 'e') {
- Py_INCREF(Py_True);
- *next_idx_ptr = idx + 4;
- return Py_True;
- }
- break;
- case 'f':
- /* false */
- if ((idx + 4 < length) && str[idx + 1] == 'a' && str[idx + 2] == 'l' && str[idx + 3] == 's' && str[idx + 4] == 'e') {
- Py_INCREF(Py_False);
- *next_idx_ptr = idx + 5;
- return Py_False;
- }
- break;
- case 'N':
- /* NaN */
- if ((idx + 2 < length) && str[idx + 1] == 'a' && str[idx + 2] == 'N') {
- return _parse_constant(s, "NaN", idx, next_idx_ptr);
- }
- break;
- case 'I':
- /* Infinity */
- if ((idx + 7 < length) && str[idx + 1] == 'n' && str[idx + 2] == 'f' && str[idx + 3] == 'i' && str[idx + 4] == 'n' && str[idx + 5] == 'i' && str[idx + 6] == 't' && str[idx + 7] == 'y') {
- return _parse_constant(s, "Infinity", idx, next_idx_ptr);
- }
- break;
- case '-':
- /* -Infinity */
- if ((idx + 8 < length) && str[idx + 1] == 'I' && str[idx + 2] == 'n' && str[idx + 3] == 'f' && str[idx + 4] == 'i' && str[idx + 5] == 'n' && str[idx + 6] == 'i' && str[idx + 7] == 't' && str[idx + 8] == 'y') {
- return _parse_constant(s, "-Infinity", idx, next_idx_ptr);
- }
- break;
- }
- /* Didn't find a string, object, array, or named constant. Look for a number. */
- return _match_number_str(s, pystr, idx, next_idx_ptr);
-}
-
-static PyObject *
-scan_once_unicode(PyScannerObject *s, PyObject *pystr, Py_ssize_t idx, Py_ssize_t *next_idx_ptr)
-{
- /* Read one JSON term (of any kind) from PyUnicode pystr.
- idx is the index of the first character of the term
- *next_idx_ptr is a return-by-reference index to the first character after
-    the term.
-
- Returns a new PyObject representation of the term.
- */
- Py_UNICODE *str = PyUnicode_AS_UNICODE(pystr);
- Py_ssize_t length = PyUnicode_GET_SIZE(pystr);
- if (idx >= length) {
- PyErr_SetNone(PyExc_StopIteration);
- return NULL;
- }
- switch (str[idx]) {
- case '"':
- /* string */
- return scanstring_unicode(pystr, idx + 1,
- PyObject_IsTrue(s->strict),
- next_idx_ptr);
- case '{':
- /* object */
- return _parse_object_unicode(s, pystr, idx + 1, next_idx_ptr);
- case '[':
- /* array */
- return _parse_array_unicode(s, pystr, idx + 1, next_idx_ptr);
- case 'n':
- /* null */
- if ((idx + 3 < length) && str[idx + 1] == 'u' && str[idx + 2] == 'l' && str[idx + 3] == 'l') {
- Py_INCREF(Py_None);
- *next_idx_ptr = idx + 4;
- return Py_None;
- }
- break;
- case 't':
- /* true */
- if ((idx + 3 < length) && str[idx + 1] == 'r' && str[idx + 2] == 'u' && str[idx + 3] == 'e') {
- Py_INCREF(Py_True);
- *next_idx_ptr = idx + 4;
- return Py_True;
- }
- break;
- case 'f':
- /* false */
- if ((idx + 4 < length) && str[idx + 1] == 'a' && str[idx + 2] == 'l' && str[idx + 3] == 's' && str[idx + 4] == 'e') {
- Py_INCREF(Py_False);
- *next_idx_ptr = idx + 5;
- return Py_False;
- }
- break;
- case 'N':
- /* NaN */
- if ((idx + 2 < length) && str[idx + 1] == 'a' && str[idx + 2] == 'N') {
- return _parse_constant(s, "NaN", idx, next_idx_ptr);
- }
- break;
- case 'I':
- /* Infinity */
- if ((idx + 7 < length) && str[idx + 1] == 'n' && str[idx + 2] == 'f' && str[idx + 3] == 'i' && str[idx + 4] == 'n' && str[idx + 5] == 'i' && str[idx + 6] == 't' && str[idx + 7] == 'y') {
- return _parse_constant(s, "Infinity", idx, next_idx_ptr);
- }
- break;
- case '-':
- /* -Infinity */
- if ((idx + 8 < length) && str[idx + 1] == 'I' && str[idx + 2] == 'n' && str[idx + 3] == 'f' && str[idx + 4] == 'i' && str[idx + 5] == 'n' && str[idx + 6] == 'i' && str[idx + 7] == 't' && str[idx + 8] == 'y') {
- return _parse_constant(s, "-Infinity", idx, next_idx_ptr);
- }
- break;
- }
- /* Didn't find a string, object, array, or named constant. Look for a number. */
- return _match_number_unicode(s, pystr, idx, next_idx_ptr);
-}
-
-static PyObject *
-scanner_call(PyObject *self, PyObject *args, PyObject *kwds)
-{
- /* Python callable interface to scan_once_{str,unicode} */
- PyObject *pystr;
- PyObject *rval;
- Py_ssize_t idx;
- Py_ssize_t next_idx = -1;
- static char *kwlist[] = {"string", "idx", NULL};
- PyScannerObject *s;
- assert(PyScanner_Check(self));
- s = (PyScannerObject *)self;
- if (!PyArg_ParseTupleAndKeywords(args, kwds, "OO&:scan_once", kwlist, &pystr, _convertPyInt_AsSsize_t, &idx))
- return NULL;
-
- if (PyString_Check(pystr)) {
- rval = scan_once_str(s, pystr, idx, &next_idx);
- }
- else if (PyUnicode_Check(pystr)) {
- rval = scan_once_unicode(s, pystr, idx, &next_idx);
- }
- else {
- PyErr_Format(PyExc_TypeError,
- "first argument must be a string, not %.80s",
- Py_TYPE(pystr)->tp_name);
- return NULL;
- }
- return _build_rval_index_tuple(rval, next_idx);
-}
-
-static PyObject *
-scanner_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
-{
- PyScannerObject *s;
- s = (PyScannerObject *)type->tp_alloc(type, 0);
- if (s != NULL) {
- s->encoding = NULL;
- s->strict = NULL;
- s->object_hook = NULL;
- s->parse_float = NULL;
- s->parse_int = NULL;
- s->parse_constant = NULL;
- }
- return (PyObject *)s;
-}
-
-static int
-scanner_init(PyObject *self, PyObject *args, PyObject *kwds)
-{
- /* Initialize Scanner object */
- PyObject *ctx;
- static char *kwlist[] = {"context", NULL};
- PyScannerObject *s;
-
- assert(PyScanner_Check(self));
- s = (PyScannerObject *)self;
-
- if (!PyArg_ParseTupleAndKeywords(args, kwds, "O:make_scanner", kwlist, &ctx))
- return -1;
-
- /* PyString_AS_STRING is used on encoding */
- s->encoding = PyObject_GetAttrString(ctx, "encoding");
- if (s->encoding == Py_None) {
- Py_DECREF(Py_None);
- s->encoding = PyString_InternFromString(DEFAULT_ENCODING);
- }
- else if (PyUnicode_Check(s->encoding)) {
- PyObject *tmp = PyUnicode_AsEncodedString(s->encoding, NULL, NULL);
- Py_DECREF(s->encoding);
- s->encoding = tmp;
- }
- if (s->encoding == NULL || !PyString_Check(s->encoding))
- goto bail;
-
- /* All of these will fail "gracefully" so we don't need to verify them */
- s->strict = PyObject_GetAttrString(ctx, "strict");
- if (s->strict == NULL)
- goto bail;
- s->object_hook = PyObject_GetAttrString(ctx, "object_hook");
- if (s->object_hook == NULL)
- goto bail;
- s->parse_float = PyObject_GetAttrString(ctx, "parse_float");
- if (s->parse_float == NULL)
- goto bail;
- s->parse_int = PyObject_GetAttrString(ctx, "parse_int");
- if (s->parse_int == NULL)
- goto bail;
- s->parse_constant = PyObject_GetAttrString(ctx, "parse_constant");
- if (s->parse_constant == NULL)
- goto bail;
-
- return 0;
-
-bail:
- Py_CLEAR(s->encoding);
- Py_CLEAR(s->strict);
- Py_CLEAR(s->object_hook);
- Py_CLEAR(s->parse_float);
- Py_CLEAR(s->parse_int);
- Py_CLEAR(s->parse_constant);
- return -1;
-}
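
scanner_init reads everything off the single context argument, so any object
with these attributes will do; a minimal hypothetical context, mirroring what
JSONDecoder supplies:

    from simplejson import _speedups  # the extension removed by this commit

    class Context(object):
        encoding = 'utf-8'
        strict = True
        object_hook = None             # None leaves dicts untouched
        parse_float = float            # takes the PyFloat fast path in _match_number_str
        parse_int = int                # takes the PyInt fast path
        parse_constant = {'NaN': float('nan'),
                          'Infinity': float('inf'),
                          '-Infinity': float('-inf')}.__getitem__

    scan_once = _speedups.make_scanner(Context())
    value, end = scan_once('{"a": [1, 2.5]}', 0)
    assert value == {'a': [1, 2.5]} and end == 15
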
-
-PyDoc_STRVAR(scanner_doc, "JSON scanner object");
-
-static
-PyTypeObject PyScannerType = {
- PyObject_HEAD_INIT(NULL)
- 0, /* tp_internal */
- "simplejson._speedups.Scanner", /* tp_name */
- sizeof(PyScannerObject), /* tp_basicsize */
- 0, /* tp_itemsize */
- scanner_dealloc, /* tp_dealloc */
- 0, /* tp_print */
- 0, /* tp_getattr */
- 0, /* tp_setattr */
- 0, /* tp_compare */
- 0, /* tp_repr */
- 0, /* tp_as_number */
- 0, /* tp_as_sequence */
- 0, /* tp_as_mapping */
- 0, /* tp_hash */
- scanner_call, /* tp_call */
- 0, /* tp_str */
- 0,/* PyObject_GenericGetAttr, */ /* tp_getattro */
- 0,/* PyObject_GenericSetAttr, */ /* tp_setattro */
- 0, /* tp_as_buffer */
- Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC, /* tp_flags */
- scanner_doc, /* tp_doc */
- scanner_traverse, /* tp_traverse */
- scanner_clear, /* tp_clear */
- 0, /* tp_richcompare */
- 0, /* tp_weaklistoffset */
- 0, /* tp_iter */
- 0, /* tp_iternext */
- 0, /* tp_methods */
- scanner_members, /* tp_members */
- 0, /* tp_getset */
- 0, /* tp_base */
- 0, /* tp_dict */
- 0, /* tp_descr_get */
- 0, /* tp_descr_set */
- 0, /* tp_dictoffset */
- scanner_init, /* tp_init */
- 0,/* PyType_GenericAlloc, */ /* tp_alloc */
- scanner_new, /* tp_new */
- 0,/* PyObject_GC_Del, */ /* tp_free */
-};
-
-static PyObject *
-encoder_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
-{
- PyEncoderObject *s;
- s = (PyEncoderObject *)type->tp_alloc(type, 0);
- if (s != NULL) {
- s->markers = NULL;
- s->defaultfn = NULL;
- s->encoder = NULL;
- s->indent = NULL;
- s->key_separator = NULL;
- s->item_separator = NULL;
- s->sort_keys = NULL;
- s->skipkeys = NULL;
- }
- return (PyObject *)s;
-}
-
-static int
-encoder_init(PyObject *self, PyObject *args, PyObject *kwds)
-{
- /* initialize Encoder object */
- static char *kwlist[] = {"markers", "default", "encoder", "indent", "key_separator", "item_separator", "sort_keys", "skipkeys", "allow_nan", NULL};
-
- PyEncoderObject *s;
- PyObject *allow_nan;
-
- assert(PyEncoder_Check(self));
- s = (PyEncoderObject *)self;
-
- if (!PyArg_ParseTupleAndKeywords(args, kwds, "OOOOOOOOO:make_encoder", kwlist,
- &s->markers, &s->defaultfn, &s->encoder, &s->indent, &s->key_separator, &s->item_separator, &s->sort_keys, &s->skipkeys, &allow_nan))
- return -1;
-
- Py_INCREF(s->markers);
- Py_INCREF(s->defaultfn);
- Py_INCREF(s->encoder);
- Py_INCREF(s->indent);
- Py_INCREF(s->key_separator);
- Py_INCREF(s->item_separator);
- Py_INCREF(s->sort_keys);
- Py_INCREF(s->skipkeys);
- s->fast_encode = (PyCFunction_Check(s->encoder) && PyCFunction_GetFunction(s->encoder) == (PyCFunction)py_encode_basestring_ascii);
- s->allow_nan = PyObject_IsTrue(allow_nan);
- return 0;
-}
-
-static PyObject *
-encoder_call(PyObject *self, PyObject *args, PyObject *kwds)
-{
- /* Python callable interface to encode_listencode_obj */
- static char *kwlist[] = {"obj", "_current_indent_level", NULL};
- PyObject *obj;
- PyObject *rval;
- Py_ssize_t indent_level;
- PyEncoderObject *s;
- assert(PyEncoder_Check(self));
- s = (PyEncoderObject *)self;
- if (!PyArg_ParseTupleAndKeywords(args, kwds, "OO&:_iterencode", kwlist,
- &obj, _convertPyInt_AsSsize_t, &indent_level))
- return NULL;
- rval = PyList_New(0);
- if (rval == NULL)
- return NULL;
- if (encoder_listencode_obj(s, rval, obj, indent_level)) {
- Py_DECREF(rval);
- return NULL;
- }
- return rval;
-}
-
-static PyObject *
-_encoded_const(PyObject *obj)
-{
- /* Return the JSON string representation of None, True, False */
- if (obj == Py_None) {
- static PyObject *s_null = NULL;
- if (s_null == NULL) {
- s_null = PyString_InternFromString("null");
- }
- Py_INCREF(s_null);
- return s_null;
- }
- else if (obj == Py_True) {
- static PyObject *s_true = NULL;
- if (s_true == NULL) {
- s_true = PyString_InternFromString("true");
- }
- Py_INCREF(s_true);
- return s_true;
- }
- else if (obj == Py_False) {
- static PyObject *s_false = NULL;
- if (s_false == NULL) {
- s_false = PyString_InternFromString("false");
- }
- Py_INCREF(s_false);
- return s_false;
- }
- else {
- PyErr_SetString(PyExc_ValueError, "not a const");
- return NULL;
- }
-}
-
-static PyObject *
-encoder_encode_float(PyEncoderObject *s, PyObject *obj)
-{
- /* Return the JSON representation of a PyFloat */
- double i = PyFloat_AS_DOUBLE(obj);
- if (!Py_IS_FINITE(i)) {
- if (!s->allow_nan) {
- PyErr_SetString(PyExc_ValueError, "Out of range float values are not JSON compliant");
- return NULL;
- }
- if (i > 0) {
- return PyString_FromString("Infinity");
- }
- else if (i < 0) {
- return PyString_FromString("-Infinity");
- }
- else {
- return PyString_FromString("NaN");
- }
- }
- /* Use a better float format here? */
- return PyObject_Repr(obj);
-}
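
Non-finite floats become the JavaScript constant names unless allow_nan is
false; the standard library json module, a descendant of this code, behaves
the same way:

    import json
    assert json.dumps(float('inf')) == 'Infinity'
    try:
        json.dumps(float('nan'), allow_nan=False)
    except ValueError:
        pass  # "Out of range float values are not JSON compliant"
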
-
-static PyObject *
-encoder_encode_string(PyEncoderObject *s, PyObject *obj)
-{
- /* Return the JSON representation of a string */
- if (s->fast_encode)
- return py_encode_basestring_ascii(NULL, obj);
- else
- return PyObject_CallFunctionObjArgs(s->encoder, obj, NULL);
-}
-
-static int
-_steal_list_append(PyObject *lst, PyObject *stolen)
-{
- /* Append stolen and then decrement its reference count */
- int rval = PyList_Append(lst, stolen);
- Py_DECREF(stolen);
- return rval;
-}
-
-static int
-encoder_listencode_obj(PyEncoderObject *s, PyObject *rval, PyObject *obj, Py_ssize_t indent_level)
-{
- /* Encode Python object obj to a JSON term, rval is a PyList */
- PyObject *newobj;
- int rv;
-
- if (obj == Py_None || obj == Py_True || obj == Py_False) {
- PyObject *cstr = _encoded_const(obj);
- if (cstr == NULL)
- return -1;
- return _steal_list_append(rval, cstr);
- }
- else if (PyString_Check(obj) || PyUnicode_Check(obj))
- {
- PyObject *encoded = encoder_encode_string(s, obj);
- if (encoded == NULL)
- return -1;
- return _steal_list_append(rval, encoded);
- }
- else if (PyInt_Check(obj) || PyLong_Check(obj)) {
- PyObject *encoded = PyObject_Str(obj);
- if (encoded == NULL)
- return -1;
- return _steal_list_append(rval, encoded);
- }
- else if (PyFloat_Check(obj)) {
- PyObject *encoded = encoder_encode_float(s, obj);
- if (encoded == NULL)
- return -1;
- return _steal_list_append(rval, encoded);
- }
- else if (PyList_Check(obj) || PyTuple_Check(obj)) {
- return encoder_listencode_list(s, rval, obj, indent_level);
- }
- else if (PyDict_Check(obj)) {
- return encoder_listencode_dict(s, rval, obj, indent_level);
- }
- else {
- PyObject *ident = NULL;
- if (s->markers != Py_None) {
- int has_key;
- ident = PyLong_FromVoidPtr(obj);
- if (ident == NULL)
- return -1;
- has_key = PyDict_Contains(s->markers, ident);
- if (has_key) {
- if (has_key != -1)
- PyErr_SetString(PyExc_ValueError, "Circular reference detected");
- Py_DECREF(ident);
- return -1;
- }
- if (PyDict_SetItem(s->markers, ident, obj)) {
- Py_DECREF(ident);
- return -1;
- }
- }
- newobj = PyObject_CallFunctionObjArgs(s->defaultfn, obj, NULL);
- if (newobj == NULL) {
- Py_XDECREF(ident);
- return -1;
- }
- rv = encoder_listencode_obj(s, rval, newobj, indent_level);
- Py_DECREF(newobj);
- if (rv) {
- Py_XDECREF(ident);
- return -1;
- }
- if (ident != NULL) {
- if (PyDict_DelItem(s->markers, ident)) {
- Py_XDECREF(ident);
- return -1;
- }
- Py_XDECREF(ident);
- }
- return rv;
- }
-}
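
The markers dict is what turns a self-referencing container into a clean
error instead of unbounded recursion; the check is observable from Python
(stdlib json shown, same lineage):

    import json
    lst = []
    lst.append(lst)   # a list that contains itself
    try:
        json.dumps(lst)
    except ValueError:
        pass  # "Circular reference detected"
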
-
-static int
-encoder_listencode_dict(PyEncoderObject *s, PyObject *rval, PyObject *dct, Py_ssize_t indent_level)
-{
-    /* Encode Python dict dct to a JSON term, rval is a PyList */
- static PyObject *open_dict = NULL;
- static PyObject *close_dict = NULL;
- static PyObject *empty_dict = NULL;
- PyObject *kstr = NULL;
- PyObject *ident = NULL;
- PyObject *key, *value;
- Py_ssize_t pos;
- int skipkeys;
- Py_ssize_t idx;
-
- if (open_dict == NULL || close_dict == NULL || empty_dict == NULL) {
- open_dict = PyString_InternFromString("{");
- close_dict = PyString_InternFromString("}");
- empty_dict = PyString_InternFromString("{}");
- if (open_dict == NULL || close_dict == NULL || empty_dict == NULL)
- return -1;
- }
- if (PyDict_Size(dct) == 0)
- return PyList_Append(rval, empty_dict);
-
- if (s->markers != Py_None) {
- int has_key;
- ident = PyLong_FromVoidPtr(dct);
- if (ident == NULL)
- goto bail;
- has_key = PyDict_Contains(s->markers, ident);
- if (has_key) {
- if (has_key != -1)
- PyErr_SetString(PyExc_ValueError, "Circular reference detected");
- goto bail;
- }
- if (PyDict_SetItem(s->markers, ident, dct)) {
- goto bail;
- }
- }
-
- if (PyList_Append(rval, open_dict))
- goto bail;
-
- if (s->indent != Py_None) {
- /* TODO: DOES NOT RUN */
- indent_level += 1;
- /*
- newline_indent = '\n' + (' ' * (_indent * _current_indent_level))
- separator = _item_separator + newline_indent
- buf += newline_indent
- */
- }
-
- /* TODO: C speedup not implemented for sort_keys */
-
- pos = 0;
- skipkeys = PyObject_IsTrue(s->skipkeys);
- idx = 0;
- while (PyDict_Next(dct, &pos, &key, &value)) {
- PyObject *encoded;
-
- if (PyString_Check(key) || PyUnicode_Check(key)) {
- Py_INCREF(key);
- kstr = key;
- }
- else if (PyFloat_Check(key)) {
- kstr = encoder_encode_float(s, key);
- if (kstr == NULL)
- goto bail;
- }
- else if (PyInt_Check(key) || PyLong_Check(key)) {
- kstr = PyObject_Str(key);
- if (kstr == NULL)
- goto bail;
- }
- else if (key == Py_True || key == Py_False || key == Py_None) {
- kstr = _encoded_const(key);
- if (kstr == NULL)
- goto bail;
- }
- else if (skipkeys) {
- continue;
- }
- else {
- /* TODO: include repr of key */
- PyErr_SetString(PyExc_ValueError, "keys must be a string");
- goto bail;
- }
-
- if (idx) {
- if (PyList_Append(rval, s->item_separator))
- goto bail;
- }
-
- encoded = encoder_encode_string(s, kstr);
- Py_CLEAR(kstr);
- if (encoded == NULL)
- goto bail;
- if (PyList_Append(rval, encoded)) {
- Py_DECREF(encoded);
- goto bail;
- }
- Py_DECREF(encoded);
- if (PyList_Append(rval, s->key_separator))
- goto bail;
- if (encoder_listencode_obj(s, rval, value, indent_level))
- goto bail;
- idx += 1;
- }
- if (ident != NULL) {
- if (PyDict_DelItem(s->markers, ident))
- goto bail;
- Py_CLEAR(ident);
- }
- if (s->indent != Py_None) {
- /* TODO: DOES NOT RUN */
- indent_level -= 1;
- /*
- yield '\n' + (' ' * (_indent * _current_indent_level))
- */
- }
- if (PyList_Append(rval, close_dict))
- goto bail;
- return 0;
-
-bail:
- Py_XDECREF(kstr);
- Py_XDECREF(ident);
- return -1;
-}
-
-
-static int
-encoder_listencode_list(PyEncoderObject *s, PyObject *rval, PyObject *seq, Py_ssize_t indent_level)
-{
- /* Encode Python list seq to a JSON term, rval is a PyList */
- static PyObject *open_array = NULL;
- static PyObject *close_array = NULL;
- static PyObject *empty_array = NULL;
- PyObject *ident = NULL;
- PyObject *s_fast = NULL;
- Py_ssize_t num_items;
- PyObject **seq_items;
- Py_ssize_t i;
-
- if (open_array == NULL || close_array == NULL || empty_array == NULL) {
- open_array = PyString_InternFromString("[");
- close_array = PyString_InternFromString("]");
- empty_array = PyString_InternFromString("[]");
- if (open_array == NULL || close_array == NULL || empty_array == NULL)
- return -1;
- }
- ident = NULL;
- s_fast = PySequence_Fast(seq, "_iterencode_list needs a sequence");
- if (s_fast == NULL)
- return -1;
- num_items = PySequence_Fast_GET_SIZE(s_fast);
- if (num_items == 0) {
- Py_DECREF(s_fast);
- return PyList_Append(rval, empty_array);
- }
-
- if (s->markers != Py_None) {
- int has_key;
- ident = PyLong_FromVoidPtr(seq);
- if (ident == NULL)
- goto bail;
- has_key = PyDict_Contains(s->markers, ident);
- if (has_key) {
- if (has_key != -1)
- PyErr_SetString(PyExc_ValueError, "Circular reference detected");
- goto bail;
- }
- if (PyDict_SetItem(s->markers, ident, seq)) {
- goto bail;
- }
- }
-
- seq_items = PySequence_Fast_ITEMS(s_fast);
- if (PyList_Append(rval, open_array))
- goto bail;
- if (s->indent != Py_None) {
- /* TODO: DOES NOT RUN */
- indent_level += 1;
- /*
- newline_indent = '\n' + (' ' * (_indent * _current_indent_level))
- separator = _item_separator + newline_indent
- buf += newline_indent
- */
- }
- for (i = 0; i < num_items; i++) {
- PyObject *obj = seq_items[i];
- if (i) {
- if (PyList_Append(rval, s->item_separator))
- goto bail;
- }
- if (encoder_listencode_obj(s, rval, obj, indent_level))
- goto bail;
- }
- if (ident != NULL) {
- if (PyDict_DelItem(s->markers, ident))
- goto bail;
- Py_CLEAR(ident);
- }
- if (s->indent != Py_None) {
- /* TODO: DOES NOT RUN */
- indent_level -= 1;
- /*
- yield '\n' + (' ' * (_indent * _current_indent_level))
- */
- }
- if (PyList_Append(rval, close_array))
- goto bail;
- Py_DECREF(s_fast);
- return 0;
-
-bail:
- Py_XDECREF(ident);
- Py_DECREF(s_fast);
- return -1;
-}
-
-static void
-encoder_dealloc(PyObject *self)
-{
- /* Deallocate Encoder */
- encoder_clear(self);
- Py_TYPE(self)->tp_free(self);
-}
-
-static int
-encoder_traverse(PyObject *self, visitproc visit, void *arg)
-{
- PyEncoderObject *s;
- assert(PyEncoder_Check(self));
- s = (PyEncoderObject *)self;
- Py_VISIT(s->markers);
- Py_VISIT(s->defaultfn);
- Py_VISIT(s->encoder);
- Py_VISIT(s->indent);
- Py_VISIT(s->key_separator);
- Py_VISIT(s->item_separator);
- Py_VISIT(s->sort_keys);
- Py_VISIT(s->skipkeys);
- return 0;
-}
-
-static int
-encoder_clear(PyObject *self)
-{
-    /* Clear the Encoder's references (GC support) */
- PyEncoderObject *s;
- assert(PyEncoder_Check(self));
- s = (PyEncoderObject *)self;
- Py_CLEAR(s->markers);
- Py_CLEAR(s->defaultfn);
- Py_CLEAR(s->encoder);
- Py_CLEAR(s->indent);
- Py_CLEAR(s->key_separator);
- Py_CLEAR(s->item_separator);
- Py_CLEAR(s->sort_keys);
- Py_CLEAR(s->skipkeys);
- return 0;
-}
-
-PyDoc_STRVAR(encoder_doc, "_iterencode(obj, _current_indent_level) -> iterable");
-
-static
-PyTypeObject PyEncoderType = {
- PyObject_HEAD_INIT(NULL)
- 0, /* tp_internal */
- "simplejson._speedups.Encoder", /* tp_name */
- sizeof(PyEncoderObject), /* tp_basicsize */
- 0, /* tp_itemsize */
- encoder_dealloc, /* tp_dealloc */
- 0, /* tp_print */
- 0, /* tp_getattr */
- 0, /* tp_setattr */
- 0, /* tp_compare */
- 0, /* tp_repr */
- 0, /* tp_as_number */
- 0, /* tp_as_sequence */
- 0, /* tp_as_mapping */
- 0, /* tp_hash */
- encoder_call, /* tp_call */
- 0, /* tp_str */
- 0, /* tp_getattro */
- 0, /* tp_setattro */
- 0, /* tp_as_buffer */
- Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC, /* tp_flags */
- encoder_doc, /* tp_doc */
- encoder_traverse, /* tp_traverse */
- encoder_clear, /* tp_clear */
- 0, /* tp_richcompare */
- 0, /* tp_weaklistoffset */
- 0, /* tp_iter */
- 0, /* tp_iternext */
- 0, /* tp_methods */
- encoder_members, /* tp_members */
- 0, /* tp_getset */
- 0, /* tp_base */
- 0, /* tp_dict */
- 0, /* tp_descr_get */
- 0, /* tp_descr_set */
- 0, /* tp_dictoffset */
- encoder_init, /* tp_init */
- 0, /* tp_alloc */
- encoder_new, /* tp_new */
- 0, /* tp_free */
-};
-
-static PyMethodDef speedups_methods[] = {
- {"encode_basestring_ascii",
- (PyCFunction)py_encode_basestring_ascii,
- METH_O,
- pydoc_encode_basestring_ascii},
- {"scanstring",
- (PyCFunction)py_scanstring,
- METH_VARARGS,
- pydoc_scanstring},
- {NULL, NULL, 0, NULL}
-};
-
-PyDoc_STRVAR(module_doc,
-"simplejson speedups\n");
-
-void
-init_speedups(void)
-{
- PyObject *m;
- PyScannerType.tp_new = PyType_GenericNew;
- if (PyType_Ready(&PyScannerType) < 0)
- return;
- PyEncoderType.tp_new = PyType_GenericNew;
- if (PyType_Ready(&PyEncoderType) < 0)
- return;
- m = Py_InitModule3("_speedups", speedups_methods, module_doc);
- Py_INCREF((PyObject*)&PyScannerType);
- PyModule_AddObject(m, "make_scanner", (PyObject*)&PyScannerType);
- Py_INCREF((PyObject*)&PyEncoderType);
- PyModule_AddObject(m, "make_encoder", (PyObject*)&PyEncoderType);
-}
diff --git a/lang/py/lib/simplejson/decoder.py b/lang/py/lib/simplejson/decoder.py
deleted file mode 100644
index b769ea4..0000000
--- a/lang/py/lib/simplejson/decoder.py
+++ /dev/null
@@ -1,354 +0,0 @@
-"""Implementation of JSONDecoder
-"""
-import re
-import sys
-import struct
-
-from simplejson.scanner import make_scanner
-try:
- from simplejson._speedups import scanstring as c_scanstring
-except ImportError:
- c_scanstring = None
-
-__all__ = ['JSONDecoder']
-
-FLAGS = re.VERBOSE | re.MULTILINE | re.DOTALL
-
-def _floatconstants():
- _BYTES = '7FF80000000000007FF0000000000000'.decode('hex')
- if sys.byteorder != 'big':
- _BYTES = _BYTES[:8][::-1] + _BYTES[8:][::-1]
- nan, inf = struct.unpack('dd', _BYTES)
- return nan, inf, -inf
-
-NaN, PosInf, NegInf = _floatconstants()
-
-
-def linecol(doc, pos):
- lineno = doc.count('\n', 0, pos) + 1
- if lineno == 1:
- colno = pos
- else:
- colno = pos - doc.rindex('\n', 0, pos)
- return lineno, colno
-
-
-def errmsg(msg, doc, pos, end=None):
- # Note that this function is called from _speedups
- lineno, colno = linecol(doc, pos)
- if end is None:
- #fmt = '{0}: line {1} column {2} (char {3})'
- #return fmt.format(msg, lineno, colno, pos)
- fmt = '%s: line %d column %d (char %d)'
- return fmt % (msg, lineno, colno, pos)
- endlineno, endcolno = linecol(doc, end)
- #fmt = '{0}: line {1} column {2} - line {3} column {4} (char {5} - {6})'
- #return fmt.format(msg, lineno, colno, endlineno, endcolno, pos, end)
- fmt = '%s: line %d column %d - line %d column %d (char %d - %d)'
- return fmt % (msg, lineno, colno, endlineno, endcolno, pos, end)
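
A worked check of the position arithmetic (the document text is hypothetical):

    doc = '{"a": 1,\n "b" }'
    assert linecol(doc, 10) == (2, 2)
    # errmsg('Expecting property name', doc, 10)
    #   -> 'Expecting property name: line 2 column 2 (char 10)'
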
-
-
-_CONSTANTS = {
- '-Infinity': NegInf,
- 'Infinity': PosInf,
- 'NaN': NaN,
-}
-
-STRINGCHUNK = re.compile(r'(.*?)(["\\\x00-\x1f])', FLAGS)
-BACKSLASH = {
- '"': u'"', '\\': u'\\', '/': u'/',
- 'b': u'\b', 'f': u'\f', 'n': u'\n', 'r': u'\r', 't': u'\t',
-}
-
-DEFAULT_ENCODING = "utf-8"
-
-def py_scanstring(s, end, encoding=None, strict=True, _b=BACKSLASH, _m=STRINGCHUNK.match):
- """Scan the string s for a JSON string. End is the index of the
- character in s after the quote that started the JSON string.
- Unescapes all valid JSON string escape sequences and raises ValueError
- on attempt to decode an invalid string. If strict is False then literal
- control characters are allowed in the string.
-
- Returns a tuple of the decoded string and the index of the character in s
- after the end quote."""
- if encoding is None:
- encoding = DEFAULT_ENCODING
- chunks = []
- _append = chunks.append
- begin = end - 1
- while 1:
- chunk = _m(s, end)
- if chunk is None:
- raise ValueError(
- errmsg("Unterminated string starting at", s, begin))
- end = chunk.end()
- content, terminator = chunk.groups()
- # Content is contains zero or more unescaped string characters
- if content:
- if not isinstance(content, unicode):
- content = unicode(content, encoding)
- _append(content)
- # Terminator is the end of string, a literal control character,
- # or a backslash denoting that an escape sequence follows
- if terminator == '"':
- break
- elif terminator != '\\':
- if strict:
- msg = "Invalid control character %r at" % (terminator,)
- #msg = "Invalid control character {0!r} at".format(terminator)
- raise ValueError(errmsg(msg, s, end))
- else:
- _append(terminator)
- continue
- try:
- esc = s[end]
- except IndexError:
- raise ValueError(
- errmsg("Unterminated string starting at", s, begin))
- # If not a unicode escape sequence, must be in the lookup table
- if esc != 'u':
- try:
- char = _b[esc]
- except KeyError:
- msg = "Invalid \\escape: " + repr(esc)
- raise ValueError(errmsg(msg, s, end))
- end += 1
- else:
- # Unicode escape sequence
- esc = s[end + 1:end + 5]
- next_end = end + 5
- if len(esc) != 4:
- msg = "Invalid \\uXXXX escape"
- raise ValueError(errmsg(msg, s, end))
- uni = int(esc, 16)
- # Check for surrogate pair on UCS-4 systems
- if 0xd800 <= uni <= 0xdbff and sys.maxunicode > 65535:
- msg = "Invalid \\uXXXX\\uXXXX surrogate pair"
- if not s[end + 5:end + 7] == '\\u':
- raise ValueError(errmsg(msg, s, end))
- esc2 = s[end + 7:end + 11]
- if len(esc2) != 4:
- raise ValueError(errmsg(msg, s, end))
- uni2 = int(esc2, 16)
- uni = 0x10000 + (((uni - 0xd800) << 10) | (uni2 - 0xdc00))
- next_end += 6
- char = unichr(uni)
- end = next_end
- # Append the unescaped character
- _append(char)
- return u''.join(chunks), end
-
-
-# Use speedup if available
-scanstring = c_scanstring or py_scanstring
-
-WHITESPACE = re.compile(r'[ \t\n\r]*', FLAGS)
-WHITESPACE_STR = ' \t\n\r'
-
-def JSONObject((s, end), encoding, strict, scan_once, object_hook, _w=WHITESPACE.match, _ws=WHITESPACE_STR):
- pairs = {}
- # Use a slice to prevent IndexError from being raised, the following
- # check will raise a more specific ValueError if the string is empty
- nextchar = s[end:end + 1]
- # Normally we expect nextchar == '"'
- if nextchar != '"':
- if nextchar in _ws:
- end = _w(s, end).end()
- nextchar = s[end:end + 1]
- # Trivial empty object
- if nextchar == '}':
- return pairs, end + 1
- elif nextchar != '"':
- raise ValueError(errmsg("Expecting property name", s, end))
- end += 1
- while True:
- key, end = scanstring(s, end, encoding, strict)
-
- # To skip some function call overhead we optimize the fast paths where
- # the JSON key separator is ": " or just ":".
- if s[end:end + 1] != ':':
- end = _w(s, end).end()
- if s[end:end + 1] != ':':
- raise ValueError(errmsg("Expecting : delimiter", s, end))
-
- end += 1
-
- try:
- if s[end] in _ws:
- end += 1
- if s[end] in _ws:
- end = _w(s, end + 1).end()
- except IndexError:
- pass
-
- try:
- value, end = scan_once(s, end)
- except StopIteration:
- raise ValueError(errmsg("Expecting object", s, end))
- pairs[key] = value
-
- try:
- nextchar = s[end]
- if nextchar in _ws:
- end = _w(s, end + 1).end()
- nextchar = s[end]
- except IndexError:
- nextchar = ''
- end += 1
-
- if nextchar == '}':
- break
- elif nextchar != ',':
- raise ValueError(errmsg("Expecting , delimiter", s, end - 1))
-
- try:
- nextchar = s[end]
- if nextchar in _ws:
- end += 1
- nextchar = s[end]
- if nextchar in _ws:
- end = _w(s, end + 1).end()
- nextchar = s[end]
- except IndexError:
- nextchar = ''
-
- end += 1
- if nextchar != '"':
- raise ValueError(errmsg("Expecting property name", s, end - 1))
-
- if object_hook is not None:
- pairs = object_hook(pairs)
- return pairs, end
-
-def JSONArray((s, end), scan_once, _w=WHITESPACE.match, _ws=WHITESPACE_STR):
- values = []
- nextchar = s[end:end + 1]
- if nextchar in _ws:
- end = _w(s, end + 1).end()
- nextchar = s[end:end + 1]
- # Look-ahead for trivial empty array
- if nextchar == ']':
- return values, end + 1
- _append = values.append
- while True:
- try:
- value, end = scan_once(s, end)
- except StopIteration:
- raise ValueError(errmsg("Expecting object", s, end))
- _append(value)
- nextchar = s[end:end + 1]
- if nextchar in _ws:
- end = _w(s, end + 1).end()
- nextchar = s[end:end + 1]
- end += 1
- if nextchar == ']':
- break
- elif nextchar != ',':
- raise ValueError(errmsg("Expecting , delimiter", s, end))
-
- try:
- if s[end] in _ws:
- end += 1
- if s[end] in _ws:
- end = _w(s, end + 1).end()
- except IndexError:
- pass
-
- return values, end
-
-class JSONDecoder(object):
- """Simple JSON <http://json.org> decoder
-
- Performs the following translations in decoding by default:
-
- +---------------+-------------------+
- | JSON | Python |
- +===============+===================+
- | object | dict |
- +---------------+-------------------+
- | array | list |
- +---------------+-------------------+
- | string | unicode |
- +---------------+-------------------+
- | number (int) | int, long |
- +---------------+-------------------+
- | number (real) | float |
- +---------------+-------------------+
- | true | True |
- +---------------+-------------------+
- | false | False |
- +---------------+-------------------+
- | null | None |
- +---------------+-------------------+
-
- It also understands ``NaN``, ``Infinity``, and ``-Infinity`` as
- their corresponding ``float`` values, which is outside the JSON spec.
-
- """
-
- def __init__(self, encoding=None, object_hook=None, parse_float=None,
- parse_int=None, parse_constant=None, strict=True):
- """``encoding`` determines the encoding used to interpret any ``str``
- objects decoded by this instance (utf-8 by default). It has no
- effect when decoding ``unicode`` objects.
-
- Note that currently only encodings that are a superset of ASCII work,
- strings of other encodings should be passed in as ``unicode``.
-
- ``object_hook``, if specified, will be called with the result
- of every JSON object decoded and its return value will be used in
- place of the given ``dict``. This can be used to provide custom
- deserializations (e.g. to support JSON-RPC class hinting).
-
- ``parse_float``, if specified, will be called with the string
- of every JSON float to be decoded. By default this is equivalent to
- float(num_str). This can be used to use another datatype or parser
- for JSON floats (e.g. decimal.Decimal).
-
- ``parse_int``, if specified, will be called with the string
- of every JSON int to be decoded. By default this is equivalent to
- int(num_str). This can be used to use another datatype or parser
- for JSON integers (e.g. float).
-
- ``parse_constant``, if specified, will be called with one of the
- following strings: -Infinity, Infinity, NaN.
- This can be used to raise an exception if invalid JSON numbers
- are encountered.
-
- """
- self.encoding = encoding
- self.object_hook = object_hook
- self.parse_float = parse_float or float
- self.parse_int = parse_int or int
- self.parse_constant = parse_constant or _CONSTANTS.__getitem__
- self.strict = strict
- self.parse_object = JSONObject
- self.parse_array = JSONArray
- self.parse_string = scanstring
- self.scan_once = make_scanner(self)
-
- def decode(self, s, _w=WHITESPACE.match):
- """Return the Python representation of ``s`` (a ``str`` or ``unicode``
- instance containing a JSON document)
-
- """
- obj, end = self.raw_decode(s, idx=_w(s, 0).end())
- end = _w(s, end).end()
- if end != len(s):
- raise ValueError(errmsg("Extra data", s, end, len(s)))
- return obj
-
- def raw_decode(self, s, idx=0):
- """Decode a JSON document from ``s`` (a ``str`` or ``unicode`` beginning
- with a JSON document) and return a 2-tuple of the Python
- representation and the index in ``s`` where the document ended.
-
- This can be used to decode a JSON document from a string that may
- have extraneous data at the end.
-
- """
- try:
- obj, end = self.scan_once(s, idx)
- except StopIteration:
- raise ValueError("No JSON object could be decoded")
- return obj, end
diff --git a/lang/py/lib/simplejson/encoder.py b/lang/py/lib/simplejson/encoder.py
deleted file mode 100644
index cf58290..0000000
--- a/lang/py/lib/simplejson/encoder.py
+++ /dev/null
@@ -1,440 +0,0 @@
-"""Implementation of JSONEncoder
-"""
-import re
-
-try:
- from simplejson._speedups import encode_basestring_ascii as c_encode_basestring_ascii
-except ImportError:
- c_encode_basestring_ascii = None
-try:
- from simplejson._speedups import make_encoder as c_make_encoder
-except ImportError:
- c_make_encoder = None
-
-ESCAPE = re.compile(r'[\x00-\x1f\\"\b\f\n\r\t]')
-ESCAPE_ASCII = re.compile(r'([\\"]|[^\ -~])')
-HAS_UTF8 = re.compile(r'[\x80-\xff]')
-ESCAPE_DCT = {
- '\\': '\\\\',
- '"': '\\"',
- '\b': '\\b',
- '\f': '\\f',
- '\n': '\\n',
- '\r': '\\r',
- '\t': '\\t',
-}
-for i in range(0x20):
- #ESCAPE_DCT.setdefault(chr(i), '\\u{0:04x}'.format(i))
- ESCAPE_DCT.setdefault(chr(i), '\\u%04x' % (i,))
-
-# Assume this produces an infinity on all machines (probably not guaranteed)
-INFINITY = float('1e66666')
-FLOAT_REPR = repr
-
-def encode_basestring(s):
- """Return a JSON representation of a Python string
-
- """
- def replace(match):
- return ESCAPE_DCT[match.group(0)]
- return '"' + ESCAPE.sub(replace, s) + '"'
-
-
-def py_encode_basestring_ascii(s):
- """Return an ASCII-only JSON representation of a Python string
-
- """
- if isinstance(s, str) and HAS_UTF8.search(s) is not None:
- s = s.decode('utf-8')
- def replace(match):
- s = match.group(0)
- try:
- return ESCAPE_DCT[s]
- except KeyError:
- n = ord(s)
- if n < 0x10000:
- #return '\\u{0:04x}'.format(n)
- return '\\u%04x' % (n,)
- else:
- # surrogate pair
- n -= 0x10000
- s1 = 0xd800 | ((n >> 10) & 0x3ff)
- s2 = 0xdc00 | (n & 0x3ff)
- #return '\\u{0:04x}\\u{1:04x}'.format(s1, s2)
- return '\\u%04x\\u%04x' % (s1, s2)
- return '"' + str(ESCAPE_ASCII.sub(replace, s)) + '"'
-
-
-encode_basestring_ascii = c_encode_basestring_ascii or py_encode_basestring_ascii
-
-class JSONEncoder(object):
- """Extensible JSON <http://json.org> encoder for Python data structures.
-
- Supports the following objects and types by default:
-
- +-------------------+---------------+
- | Python | JSON |
- +===================+===============+
- | dict | object |
- +-------------------+---------------+
- | list, tuple | array |
- +-------------------+---------------+
- | str, unicode | string |
- +-------------------+---------------+
- | int, long, float | number |
- +-------------------+---------------+
- | True | true |
- +-------------------+---------------+
- | False | false |
- +-------------------+---------------+
- | None | null |
- +-------------------+---------------+
-
- To extend this to recognize other objects, subclass and implement a
- ``.default()`` method with another method that returns a serializable
- object for ``o`` if possible, otherwise it should call the superclass
- implementation (to raise ``TypeError``).
-
- """
- item_separator = ', '
- key_separator = ': '
- def __init__(self, skipkeys=False, ensure_ascii=True,
- check_circular=True, allow_nan=True, sort_keys=False,
- indent=None, separators=None, encoding='utf-8', default=None):
- """Constructor for JSONEncoder, with sensible defaults.
-
- If skipkeys is false, then it is a TypeError to attempt
- encoding of keys that are not str, int, long, float or None. If
- skipkeys is True, such items are simply skipped.
-
- If ensure_ascii is true, the output is guaranteed to be str
- objects with all incoming unicode characters escaped. If
- ensure_ascii is false, the output will be unicode object.
-
- If check_circular is true, then lists, dicts, and custom encoded
- objects will be checked for circular references during encoding to
- prevent an infinite recursion (which would cause an OverflowError).
- Otherwise, no such check takes place.
-
- If allow_nan is true, then NaN, Infinity, and -Infinity will be
- encoded as such. This behavior is not JSON specification compliant,
- but is consistent with most JavaScript based encoders and decoders.
- Otherwise, it will be a ValueError to encode such floats.
-
- If sort_keys is true, then the output of dictionaries will be
- sorted by key; this is useful for regression tests to ensure
- that JSON serializations can be compared on a day-to-day basis.
-
- If indent is a non-negative integer, then JSON array
- elements and object members will be pretty-printed with that
- indent level. An indent level of 0 will only insert newlines.
- None is the most compact representation.
-
- If specified, separators should be a (item_separator, key_separator)
- tuple. The default is (', ', ': '). To get the most compact JSON
- representation you should specify (',', ':') to eliminate whitespace.
-
- If specified, default is a function that gets called for objects
- that can't otherwise be serialized. It should return a JSON encodable
- version of the object or raise a ``TypeError``.
-
- If encoding is not None, then all input strings will be
- transformed into unicode using that encoding prior to JSON-encoding.
- The default is UTF-8.
-
- """
-
- self.skipkeys = skipkeys
- self.ensure_ascii = ensure_ascii
- self.check_circular = check_circular
- self.allow_nan = allow_nan
- self.sort_keys = sort_keys
- self.indent = indent
- if separators is not None:
- self.item_separator, self.key_separator = separators
- if default is not None:
- self.default = default
- self.encoding = encoding
-
- def default(self, o):
- """Implement this method in a subclass such that it returns
- a serializable object for ``o``, or calls the base implementation
- (to raise a ``TypeError``).
-
- For example, to support arbitrary iterators, you could
- implement default like this::
-
- def default(self, o):
- try:
- iterable = iter(o)
- except TypeError:
- pass
- else:
- return list(iterable)
- return JSONEncoder.default(self, o)
-
- """
- raise TypeError(repr(o) + " is not JSON serializable")
-
- def encode(self, o):
- """Return a JSON string representation of a Python data structure.
-
- >>> JSONEncoder().encode({"foo": ["bar", "baz"]})
- '{"foo": ["bar", "baz"]}'
-
- """
- # This is for extremely simple cases and benchmarks.
- if isinstance(o, basestring):
- if isinstance(o, str):
- _encoding = self.encoding
- if (_encoding is not None
- and not (_encoding == 'utf-8')):
- o = o.decode(_encoding)
- if self.ensure_ascii:
- return encode_basestring_ascii(o)
- else:
- return encode_basestring(o)
- # This doesn't pass the iterator directly to ''.join() because the
- # exceptions aren't as detailed. The list call should be roughly
- # equivalent to the PySequence_Fast that ''.join() would do.
- chunks = self.iterencode(o, _one_shot=True)
- if not isinstance(chunks, (list, tuple)):
- chunks = list(chunks)
- return ''.join(chunks)
-
- def iterencode(self, o, _one_shot=False):
- """Encode the given object and yield each string
- representation as available.
-
- For example::
-
- for chunk in JSONEncoder().iterencode(bigobject):
- mysocket.write(chunk)
-
- """
- if self.check_circular:
- markers = {}
- else:
- markers = None
- if self.ensure_ascii:
- _encoder = encode_basestring_ascii
- else:
- _encoder = encode_basestring
- if self.encoding != 'utf-8':
- def _encoder(o, _orig_encoder=_encoder, _encoding=self.encoding):
- if isinstance(o, str):
- o = o.decode(_encoding)
- return _orig_encoder(o)
-
- def floatstr(o, allow_nan=self.allow_nan, _repr=FLOAT_REPR, _inf=INFINITY, _neginf=-INFINITY):
- # Check for specials. Note that this type of test is processor- and/or
- # platform-specific, so do tests which don't depend on the internals.
-
- if o != o:
- text = 'NaN'
- elif o == _inf:
- text = 'Infinity'
- elif o == _neginf:
- text = '-Infinity'
- else:
- return _repr(o)
-
- if not allow_nan:
- raise ValueError(
- "Out of range float values are not JSON compliant: " +
- repr(o))
-
- return text
-
-
- if _one_shot and c_make_encoder is not None and not self.indent and not self.sort_keys:
- _iterencode = c_make_encoder(
- markers, self.default, _encoder, self.indent,
- self.key_separator, self.item_separator, self.sort_keys,
- self.skipkeys, self.allow_nan)
- else:
- _iterencode = _make_iterencode(
- markers, self.default, _encoder, self.indent, floatstr,
- self.key_separator, self.item_separator, self.sort_keys,
- self.skipkeys, _one_shot)
- return _iterencode(o, 0)
-
-def _make_iterencode(markers, _default, _encoder, _indent, _floatstr, _key_separator, _item_separator, _sort_keys, _skipkeys, _one_shot,
- ## HACK: hand-optimized bytecode; turn globals into locals
- False=False,
- True=True,
- ValueError=ValueError,
- basestring=basestring,
- dict=dict,
- float=float,
- id=id,
- int=int,
- isinstance=isinstance,
- list=list,
- long=long,
- str=str,
- tuple=tuple,
- ):
-
- def _iterencode_list(lst, _current_indent_level):
- if not lst:
- yield '[]'
- return
- if markers is not None:
- markerid = id(lst)
- if markerid in markers:
- raise ValueError("Circular reference detected")
- markers[markerid] = lst
- buf = '['
- if _indent is not None:
- _current_indent_level += 1
- newline_indent = '\n' + (' ' * (_indent * _current_indent_level))
- separator = _item_separator + newline_indent
- buf += newline_indent
- else:
- newline_indent = None
- separator = _item_separator
- first = True
- for value in lst:
- if first:
- first = False
- else:
- buf = separator
- if isinstance(value, basestring):
- yield buf + _encoder(value)
- elif value is None:
- yield buf + 'null'
- elif value is True:
- yield buf + 'true'
- elif value is False:
- yield buf + 'false'
- elif isinstance(value, (int, long)):
- yield buf + str(value)
- elif isinstance(value, float):
- yield buf + _floatstr(value)
- else:
- yield buf
- if isinstance(value, (list, tuple)):
- chunks = _iterencode_list(value, _current_indent_level)
- elif isinstance(value, dict):
- chunks = _iterencode_dict(value, _current_indent_level)
- else:
- chunks = _iterencode(value, _current_indent_level)
- for chunk in chunks:
- yield chunk
- if newline_indent is not None:
- _current_indent_level -= 1
- yield '\n' + (' ' * (_indent * _current_indent_level))
- yield ']'
- if markers is not None:
- del markers[markerid]
-
- def _iterencode_dict(dct, _current_indent_level):
- if not dct:
- yield '{}'
- return
- if markers is not None:
- markerid = id(dct)
- if markerid in markers:
- raise ValueError("Circular reference detected")
- markers[markerid] = dct
- yield '{'
- if _indent is not None:
- _current_indent_level += 1
- newline_indent = '\n' + (' ' * (_indent * _current_indent_level))
- item_separator = _item_separator + newline_indent
- yield newline_indent
- else:
- newline_indent = None
- item_separator = _item_separator
- first = True
- if _sort_keys:
- items = dct.items()
- items.sort(key=lambda kv: kv[0])
- else:
- items = dct.iteritems()
- for key, value in items:
- if isinstance(key, basestring):
- pass
- # JavaScript is weakly typed for these, so it makes sense to
- # also allow them. Many encoders seem to do something like this.
- elif isinstance(key, float):
- key = _floatstr(key)
- elif key is True:
- key = 'true'
- elif key is False:
- key = 'false'
- elif key is None:
- key = 'null'
- elif isinstance(key, (int, long)):
- key = str(key)
- elif _skipkeys:
- continue
- else:
- raise TypeError("key " + repr(key) + " is not a string")
- if first:
- first = False
- else:
- yield item_separator
- yield _encoder(key)
- yield _key_separator
- if isinstance(value, basestring):
- yield _encoder(value)
- elif value is None:
- yield 'null'
- elif value is True:
- yield 'true'
- elif value is False:
- yield 'false'
- elif isinstance(value, (int, long)):
- yield str(value)
- elif isinstance(value, float):
- yield _floatstr(value)
- else:
- if isinstance(value, (list, tuple)):
- chunks = _iterencode_list(value, _current_indent_level)
- elif isinstance(value, dict):
- chunks = _iterencode_dict(value, _current_indent_level)
- else:
- chunks = _iterencode(value, _current_indent_level)
- for chunk in chunks:
- yield chunk
- if newline_indent is not None:
- _current_indent_level -= 1
- yield '\n' + (' ' * (_indent * _current_indent_level))
- yield '}'
- if markers is not None:
- del markers[markerid]
-
- def _iterencode(o, _current_indent_level):
- if isinstance(o, basestring):
- yield _encoder(o)
- elif o is None:
- yield 'null'
- elif o is True:
- yield 'true'
- elif o is False:
- yield 'false'
- elif isinstance(o, (int, long)):
- yield str(o)
- elif isinstance(o, float):
- yield _floatstr(o)
- elif isinstance(o, (list, tuple)):
- for chunk in _iterencode_list(o, _current_indent_level):
- yield chunk
- elif isinstance(o, dict):
- for chunk in _iterencode_dict(o, _current_indent_level):
- yield chunk
- else:
- if markers is not None:
- markerid = id(o)
- if markerid in markers:
- raise ValueError("Circular reference detected")
- markers[markerid] = o
- o = _default(o)
- for chunk in _iterencode(o, _current_indent_level):
- yield chunk
- if markers is not None:
- del markers[markerid]
-
- return _iterencode
diff --git a/lang/py/lib/simplejson/scanner.py b/lang/py/lib/simplejson/scanner.py
deleted file mode 100644
index adbc6ec..0000000
--- a/lang/py/lib/simplejson/scanner.py
+++ /dev/null
@@ -1,65 +0,0 @@
-"""JSON token scanner
-"""
-import re
-try:
- from simplejson._speedups import make_scanner as c_make_scanner
-except ImportError:
- c_make_scanner = None
-
-__all__ = ['make_scanner']
-
-NUMBER_RE = re.compile(
- r'(-?(?:0|[1-9]\d*))(\.\d+)?([eE][-+]?\d+)?',
- (re.VERBOSE | re.MULTILINE | re.DOTALL))
-
-def py_make_scanner(context):
- parse_object = context.parse_object
- parse_array = context.parse_array
- parse_string = context.parse_string
- match_number = NUMBER_RE.match
- encoding = context.encoding
- strict = context.strict
- parse_float = context.parse_float
- parse_int = context.parse_int
- parse_constant = context.parse_constant
- object_hook = context.object_hook
-
- def _scan_once(string, idx):
- try:
- nextchar = string[idx]
- except IndexError:
- raise StopIteration
-
- if nextchar == '"':
- return parse_string(string, idx + 1, encoding, strict)
- elif nextchar == '{':
- return parse_object((string, idx + 1), encoding, strict, _scan_once, object_hook)
- elif nextchar == '[':
- return parse_array((string, idx + 1), _scan_once)
- elif nextchar == 'n' and string[idx:idx + 4] == 'null':
- return None, idx + 4
- elif nextchar == 't' and string[idx:idx + 4] == 'true':
- return True, idx + 4
- elif nextchar == 'f' and string[idx:idx + 5] == 'false':
- return False, idx + 5
-
- m = match_number(string, idx)
- if m is not None:
- integer, frac, exp = m.groups()
- if frac or exp:
- res = parse_float(integer + (frac or '') + (exp or ''))
- else:
- res = parse_int(integer)
- return res, m.end()
- elif nextchar == 'N' and string[idx:idx + 3] == 'NaN':
- return parse_constant('NaN'), idx + 3
- elif nextchar == 'I' and string[idx:idx + 8] == 'Infinity':
- return parse_constant('Infinity'), idx + 8
- elif nextchar == '-' and string[idx:idx + 9] == '-Infinity':
- return parse_constant('-Infinity'), idx + 9
- else:
- raise StopIteration
-
- return _scan_once
-
-make_scanner = c_make_scanner or py_make_scanner
diff --git a/lang/py/lib/simplejson/tool.py b/lang/py/lib/simplejson/tool.py
deleted file mode 100644
index 9044331..0000000
--- a/lang/py/lib/simplejson/tool.py
+++ /dev/null
@@ -1,37 +0,0 @@
-r"""Command-line tool to validate and pretty-print JSON
-
-Usage::
-
- $ echo '{"json":"obj"}' | python -m simplejson.tool
- {
- "json": "obj"
- }
- $ echo '{ 1.2:3.4}' | python -m simplejson.tool
- Expecting property name: line 1 column 2 (char 2)
-
-"""
-import sys
-import simplejson
-
-def main():
- if len(sys.argv) == 1:
- infile = sys.stdin
- outfile = sys.stdout
- elif len(sys.argv) == 2:
- infile = open(sys.argv[1], 'rb')
- outfile = sys.stdout
- elif len(sys.argv) == 3:
- infile = open(sys.argv[1], 'rb')
- outfile = open(sys.argv[2], 'wb')
- else:
- raise SystemExit(sys.argv[0] + " [infile [outfile]]")
- try:
- obj = simplejson.load(infile)
- except ValueError, e:
- raise SystemExit(e)
- simplejson.dump(obj, outfile, sort_keys=True, indent=4)
- outfile.write('\n')
-
-
-if __name__ == '__main__':
- main()
diff --git a/lang/py/src/avro/schema.py b/lang/py/src/avro/schema.py
index 86ce86a..f946d0a 100644
--- a/lang/py/src/avro/schema.py
+++ b/lang/py/src/avro/schema.py
@@ -385,13 +385,13 @@ class Field(object):
#
class PrimitiveSchema(Schema):
"""Valid primitive types are in PRIMITIVE_TYPES."""
- def __init__(self, type):
+ def __init__(self, type, other_props=None):
# Ensure valid ctor args
if type not in PRIMITIVE_TYPES:
raise AvroException("%s is not a valid primitive type." % type)
# Call parent ctor
- Schema.__init__(self, type)
+ Schema.__init__(self, type, other_props=other_props)
self.fullname = type
@@ -723,7 +723,7 @@ def make_avsc_object(json_data, names=None):
type = json_data.get('type')
other_props = get_other_props(json_data, SCHEMA_RESERVED_PROPS)
if type in PRIMITIVE_TYPES:
- return PrimitiveSchema(type)
+ return PrimitiveSchema(type, other_props)
elif type in NAMED_TYPES:
name = json_data.get('name')
namespace = json_data.get('namespace', names.default_namespace)
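
The schema.py change above makes PrimitiveSchema pass unreserved attributes through
to the Schema base class, so custom properties on primitive types are no longer
dropped. A minimal sketch of the effect (assuming the Python 2 avro API, where
parsed schemas expose unreserved attributes as other_props):

    from avro import schema

    # A primitive type carrying a custom property, as exercised by the
    # new OTHER_PROP_EXAMPLES entry in test_schema.py below.
    s = schema.parse('{"type": "long", "date": "true"}')
    print s.type         # 'long'
    print s.other_props  # {'date': 'true'} -- dropped before this patch
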
diff --git a/lang/py/src/avro/tether/__init__.py b/lang/py/src/avro/tether/__init__.py
new file mode 100644
index 0000000..458c692
--- /dev/null
+++ b/lang/py/src/avro/tether/__init__.py
@@ -0,0 +1,7 @@
+from .util import *
+from .tether_task import *
+from .tether_task_runner import *
+
+__all__=util.__all__
+__all__+=tether_task.__all__
+__all__+=tether_task_runner.__all__
diff --git a/lang/py/src/avro/tether/tether_task.py b/lang/py/src/avro/tether/tether_task.py
new file mode 100644
index 0000000..90a8788
--- /dev/null
+++ b/lang/py/src/avro/tether/tether_task.py
@@ -0,0 +1,498 @@
+"""
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+"""
+
+__all__=["TetherTask","TaskType","inputProtocol","outputProtocol","HTTPRequestor"]
+
+from avro import schema, protocol
+from avro import io as avio
+from avro import ipc
+
+import io as pyio
+import sys
+import os
+import traceback
+import logging
+import collections
+from StringIO import StringIO
+import threading
+
+
+# create protocol objects for the input and output protocols
+# The build process should copy InputProtocol.avpr and OutputProtocol.avpr
+# into the same directory as this module
+inputProtocol=None
+outputProtocol=None
+
+TaskType=None
+if (inputProtocol is None):
+ pfile=os.path.split(__file__)[0]+os.sep+"InputProtocol.avpr"
+
+ if not(os.path.exists(pfile)):
+ raise Exception("Could not locate the InputProtocol: {0} does not exist".format(pfile))
+
+ with file(pfile,'r') as hf:
+ prototxt=hf.read()
+
+ inputProtocol=protocol.parse(prototxt)
+
+ # use a named tuple to represent the tasktype enumeration
+ taskschema=inputProtocol.types_dict["TaskType"]
+ _ttype=collections.namedtuple("_tasktype",taskschema.symbols)
+ TaskType=_ttype(*taskschema.symbols)
+
+if (outputProtocol is None):
+ pfile=os.path.split(__file__)[0]+os.sep+"OutputProtocol.avpr"
+
+ if not(os.path.exists(pfile)):
+ raise Exception("Could not locate the OutputProtocol: {0} does not exist".format(pfile))
+
+ with file(pfile,'r') as hf:
+ prototxt=hf.read()
+
+ outputProtocol=protocol.parse(prototxt)
+
+class Collector(object):
+ """
+ Collector for map and reduce output values
+ """
+ def __init__(self,scheme=None,outputClient=None):
+ """
+
+ Parameters
+ ---------------------------------------------
+ scheme - The scheme for the datums to output - can be a json string
+ - or an instance of Schema
+ outputClient - The output client used to send messages to the parent
+ """
+
+ if not(isinstance(scheme,schema.Schema)):
+ scheme=schema.parse(scheme)
+
+ if (outputClient is None):
+ raise ValueError("output client can't be none.")
+
+ self.scheme=scheme
+ self.buff=StringIO()
+ self.encoder=avio.BinaryEncoder(self.buff)
+
+ self.datum_writer = avio.DatumWriter(writers_schema=self.scheme)
+ self.outputClient=outputClient
+
+ def collect(self,record,partition=None):
+ """Collect a map or reduce output value
+
+ Parameters
+ ------------------------------------------------------
+ record - The record to write
+ partition - Indicates the partition for a pre-partitioned map output
+ - currently not supported
+ """
+
+ self.buff.truncate(0)
+    self.datum_writer.write(record, self.encoder)
+    self.buff.flush()
+ self.buff.seek(0)
+
+ # delete all the data in the buffer
+ if (partition is None):
+
+ # TODO: Is there a more efficient way to read the data in self.buff?
+ # we could use self.buff.read() but that returns the byte array as a string
+ # will that work? We can also use self.buff.readinto to read it into
+ # a bytearray but the byte array must be pre-allocated
+ # self.outputClient.output(self.buff.buffer.read())
+
+      # it's not a StringIO
+ self.outputClient.request("output",{"datum":self.buff.read()})
+ else:
+ self.outputClient.request("outputPartitioned",{"datum":self.buff.read(),"partition":partition})
+
+
+
+def keys_are_equal(rec1,rec2,fkeys):
+ """Check if the "keys" in two records are equal. The key fields
+ are all fields for which order isn't marked ignore.
+
+ Parameters
+ -------------------------------------------------------------------------
+ rec1 - The first record
+ rec2 - The second record
+ fkeys - A list of the fields to compare
+ """
+
+ for f in fkeys:
+ if not(rec1[f]==rec2[f]):
+ return False
+
+ return True
+
+
+class HTTPRequestor(object):
+ """
+  This is a small requestor subclass I created for the HTTP protocol.
+  Since the HTTP protocol isn't persistent, we need to instantiate
+  a new transceiver and a new requestor for each request.
+  But I wanted use of the requestor to be identical to that for
+  SocketTransceiver so that we can seamlessly switch between the two.
+ """
+
+ def __init__(self, server,port,protocol):
+ """
+ Instantiate the class.
+
+ Parameters
+ ----------------------------------------------------------------------
+ server - The server hostname
+ port - Which port to use
+ protocol - The protocol for the communication
+ """
+
+ self.server=server
+ self.port=port
+ self.protocol=protocol
+
+ def request(self,*args,**param):
+    transceiver=ipc.HTTPTransceiver(self.server,self.port)
+    requestor=ipc.Requestor(self.protocol, transceiver)
+ return requestor.request(*args,**param)
+
+
+class TetherTask(object):
+ """
+ Base class for python tether mapreduce programs.
+
+ ToDo: Currently the subclass has to implement both reduce and reduceFlush.
+ This is not very pythonic. A pythonic way to implement the reducer
+ would be to pass the reducer a generator (as dumbo does) so that the user
+ could iterate over the records for the given key.
+  How would we do this? We would need two threads: one would run the
+  user's reduce function, suspended whenever no reducer records were available.
+  The other thread would read in the records for the reducer, buffering only
+  so many records at a time (i.e. if the buffer is full, self.input shouldn't
+  return right away but wait for space to free up).
+ """
+
+ def __init__(self,inschema=None,midschema=None,outschema=None):
+ """
+
+ Parameters
+ ---------------------------------------------------------
+    inschema - The schema for the input to the mapper
+    midschema - The schema for the output of the mapper
+    outschema - The schema for the output of the reducer
+
+    An example schema for the prototypical word count job would be
+    inschema='{"type":"record", "name":"Pair","namespace":"org.apache.avro.mapred","fields":[
+ {"name":"key","type":"string"},
+ {"name":"value","type":"long","order":"ignore"}]
+ }'
+
+    Important: The records are split into (key,value) pairs as required by map reduce:
+    the fields with "order" set to "ignore" form the value, and the remaining fields form the key.
+
+ The subclass provides these schemas in order to tell this class which schemas it expects.
+ The configure request will also provide the schemas that the parent process is using.
+ This allows us to check whether the schemas match and if not whether we can resolve
+    the differences (see http://avro.apache.org/docs/current/spec.html#Schema+Resolution).
+
+ """
+
+
+ if (inschema is None):
+ raise ValueError("inschema can't be None")
+
+ if (midschema is None):
+ raise ValueError("midschema can't be None")
+
+ if (outschema is None):
+ raise ValueError("outschema can't be None")
+
+ # make sure we can parse the schemas
+ # Should we call fail if we can't parse the schemas?
+ self.inschema=schema.parse(inschema)
+ self.midschema=schema.parse(midschema)
+ self.outschema=schema.parse(outschema)
+
+
+ # declare various variables
+    self.clientTransceiver=None
+
+ # output client is used to communicate with the parent process
+ # in particular to transmit the outputs of the mapper and reducer
+ self.outputClient = None
+
+ # collectors for the output of the mapper and reducer
+ self.midCollector=None
+ self.outCollector=None
+
+ self._partitions=None
+
+ # cache a list of the fields used by the reducer as the keys
+ # we need the fields to decide when we have finished processing all values for
+ # a given key. We cache the fields to be more efficient
+ self._red_fkeys=None
+
+ # We need to keep track of the previous record fed to the reducer
+    # because we need to be able to determine when we start processing a new group
+ # in the reducer
+ self.midRecord=None
+
+ # create an event object to signal when
+ # http server is ready to be shutdown
+ self.ready_for_shutdown=threading.Event()
+ self.log=logging.getLogger("TetherTask")
+
+ def open(self, inputport,clientPort=None):
+    """Open the output client - i.e. the connection to the parent process
+
+ Parameters
+ ---------------------------------------------------------------
+    inputport - This is the port that the subprocess is listening on, i.e. the
+ subprocess starts a server listening on this port to accept requests from
+ the parent process
+ clientPort - The port on which the server in the parent process is listening
+ - If this is None we look for the environment variable AVRO_TETHER_OUTPUT_PORT
+ - This is mainly provided for debugging purposes. In practice
+ we want to use the environment variable
+
+ """
+
+
+ # Open the connection to the parent process
+ # The port the parent process is listening on is set in the environment
+ # variable AVRO_TETHER_OUTPUT_PORT
+ # open output client, connecting to parent
+
+ if (clientPort is None):
+ clientPortString = os.getenv("AVRO_TETHER_OUTPUT_PORT")
+ if (clientPortString is None):
+ raise Exception("AVRO_TETHER_OUTPUT_PORT env var is not set")
+
+ clientPort = int(clientPortString)
+
+ self.log.info("TetherTask.open: Opening connection to parent server on port={0}".format(clientPort))
+
+ # We use the HTTP protocol although we hope to shortly have
+    # support for SocketServer.
+ usehttp=True
+
+ if(usehttp):
+ # self.outputClient = ipc.Requestor(outputProtocol, self.clientTransceiver)
+      # since HTTP is stateless, a new transceiver
+      # is created and closed for each request. We therefore set clientTransceiver to None.
+      # We still declare clientTransceiver because for other (stateful) protocols we will need
+      # it, and we want to check, when we get the fail message, whether the transceiver
+      # needs to be closed.
+      # self.clientTransceiver=None
+ self.outputClient = HTTPRequestor("127.0.0.1",clientPort,outputProtocol)
+
+ else:
+ raise NotImplementedError("Only http protocol is currently supported")
+
+ try:
+ self.outputClient.request('configure',{"port":inputport})
+ except Exception as e:
+ estr= traceback.format_exc()
+ self.fail(estr)
+
+
+ def configure(self,taskType, inSchemaText, outSchemaText):
+ """
+
+ Parameters
+ -------------------------------------------------------------------
+ taskType - What type of task (e.g map, reduce)
+ - This is an enumeration which is specified in the input protocol
+ inSchemaText - string containing the input schema
+ - This is the actual schema with which the data was encoded
+ i.e it is the writer_schema (see http://avro.apache.org/docs/current/spec.html#Schema+Resolution)
+ This is the schema the parent process is using which might be different
+ from the one provided by the subclass of tether_task
+
+ outSchemaText - string containing the output scheme
+ - This is the schema expected by the parent process for the output
+ """
+ self.taskType = taskType
+
+ try:
+ inSchema = schema.parse(inSchemaText)
+ outSchema = schema.parse(outSchemaText)
+
+ if (taskType==TaskType.MAP):
+ self.inReader=avio.DatumReader(writers_schema=inSchema,readers_schema=self.inschema)
+ self.midCollector=Collector(outSchemaText,self.outputClient)
+
+ elif(taskType==TaskType.REDUCE):
+ self.midReader=avio.DatumReader(writers_schema=inSchema,readers_schema=self.midschema)
+ # this.outCollector = new Collector<OUT>(outSchema);
+ self.outCollector=Collector(outSchemaText,self.outputClient)
+
+        # determine which fields in the input record are the keys for the reducer
+ self._red_fkeys=[f.name for f in self.midschema.fields if not(f.order=='ignore')]
+
+ except Exception as e:
+
+ estr= traceback.format_exc()
+ self.fail(estr)
+
+ def set_partitions(self,npartitions):
+
+ try:
+ self._partitions=npartitions
+ except Exception as e:
+ estr= traceback.format_exc()
+ self.fail(estr)
+
+  def get_partitions(self):
+ """ Return the number of map output partitions of this job."""
+ return self._partitions
+
+ def input(self,data,count):
+    """ Receive input from the server
+
+    Parameters
+    ------------------------------------------------------
+    data - Should contain the bytes encoding the serialized data
+         - I think this gets represented as a string
+    count - how many input records are provided in the binary stream
+ """
+ try:
+      # wrap the data in a StringIO so it can be fed to avio.BinaryDecoder
+ bdata=StringIO(data)
+ decoder = avio.BinaryDecoder(bdata)
+
+ for i in range(count):
+ if (self.taskType==TaskType.MAP):
+ inRecord = self.inReader.read(decoder)
+
+          # Do we need to pass midCollector if it's declared as an instance variable?
+ self.map(inRecord, self.midCollector)
+
+ elif (self.taskType==TaskType.REDUCE):
+
+ # store the previous record
+ prev = self.midRecord
+
+ # read the new record
+          self.midRecord = self.midReader.read(decoder)
+ if (prev != None and not(keys_are_equal(self.midRecord,prev,self._red_fkeys))):
+ # since the key has changed we need to finalize the processing
+ # for this group of key,value pairs
+ self.reduceFlush(prev, self.outCollector)
+ self.reduce(self.midRecord, self.outCollector)
+
+ except Exception as e:
+ estr= traceback.format_exc()
+ self.log.warning("failing: "+estr)
+ self.fail(estr)
+
+ def complete(self):
+ """
+ Process the complete request
+ """
+ if ((self.taskType == TaskType.REDUCE ) and not(self.midRecord is None)):
+ try:
+        self.reduceFlush(self.midRecord, self.outCollector)
+ except Exception as e:
+ estr=traceback.format_exc()
+        self.log.warning("failing: "+estr)
+ self.fail(estr)
+
+ self.outputClient.request("complete",dict())
+
+ def map(self,record,collector):
+    """Called with input values to generate intermediate values (i.e. mapper output).
+
+ Parameters
+ ----------------------------------------------------------------------------
+ record - The input record
+ collector - The collector to collect the output
+
+ This is an abstract function which should be overloaded by the application specific
+ subclass.
+ """
+
+ raise NotImplementedError("This is an abstract method which should be overloaded in the subclass")
+
+ def reduce(self,record, collector):
+ """ Called with input values to generate reducer output. Inputs are sorted by the mapper
+ key.
+
+ The reduce function is invoked once for each value belonging to a given key outputted
+ by the mapper.
+
+ Parameters
+ ----------------------------------------------------------------------------
+ record - The mapper output
+ collector - The collector to collect the output
+
+ This is an abstract function which should be overloaded by the application specific
+ subclass.
+ """
+
+ raise NotImplementedError("This is an abstract method which should be overloaded in the subclass")
+
+ def reduceFlush(self,record, collector):
+ """
+ Called with the last intermediate value in each equivalence run.
+ In other words, reduceFlush is invoked once for each key produced in the reduce
+ phase. It is called after reduce has been invoked on each value for the given key.
+
+ Parameters
+ ------------------------------------------------------------------
+ record - the last record on which reduce was invoked.
+ """
+ raise NotImplementedError("This is an abstract method which should be overloaded in the subclass")
+
+ def status(self,message):
+ """
+ Called to update task status
+ """
+ self.outputClient.request("status",{"message":message})
+
+ def count(self,group, name, amount):
+ """
+ Called to increment a counter
+ """
+ self.outputClient.request("count",{"group":group, "name":name, "amount":amount})
+
+ def fail(self,message):
+ """
+ Call to fail the task.
+ """
+    self.log.error("TetherTask.fail: failure occurred, message follows:\n{0}".format(message))
+ try:
+ self.outputClient.request("fail",{"message":message})
+ except Exception as e:
+ estr=traceback.format_exc()
+      self.log.error("TetherTask.fail: an exception occurred while trying to send the fail message to the output server:\n{0}".format(estr))
+
+ self.close()
+
+ def close(self):
+ self.log.info("TetherTask.close: closing")
+    if not(self.clientTransceiver is None):
+      try:
+        self.clientTransceiver.close()
+
+ except Exception as e:
+ # ignore exceptions
+ pass
+
+ # http server is ready to be shutdown
+ self.ready_for_shutdown.set()
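
To make the TetherTask contract above concrete, here is a rough word-count
sketch. It is modeled on the word_count_task.py test added elsewhere in this
commit, but abbreviated here; the names and schemas are illustrative:

    from avro.tether import TetherTask

    IN_SCHEMA = '"string"'
    MID_SCHEMA = """{"type":"record", "name":"Pair", "namespace":"org.apache.avro.mapred",
      "fields":[{"name":"key","type":"string"},
                {"name":"value","type":"long","order":"ignore"}]}"""

    class WordCountTask(TetherTask):
      def __init__(self):
        TetherTask.__init__(self, inschema=IN_SCHEMA, midschema=MID_SCHEMA,
                            outschema=MID_SCHEMA)
        self.psum = 0  # running count for the current key

      def map(self, record, collector):
        # emit one (word, 1) pair per word in the input line
        for word in record.split():
          collector.collect({"key": word, "value": 1})

      def reduce(self, record, collector):
        # invoked once per value for a given key; accumulate
        self.psum += record["value"]

      def reduceFlush(self, record, collector):
        # invoked when the key changes; emit the total and reset
        collector.collect({"key": record["key"], "value": self.psum})
        self.psum = 0
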
diff --git a/lang/py/src/avro/tether/tether_task_runner.py b/lang/py/src/avro/tether/tether_task_runner.py
new file mode 100644
index 0000000..7d223d3
--- /dev/null
+++ b/lang/py/src/avro/tether/tether_task_runner.py
@@ -0,0 +1,227 @@
+"""
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+"""
+
+__all__=["TaskRunner"]
+
+if __name__ == "__main__":
+ # Relative imports don't work when being run directly
+ from avro import tether
+ from avro.tether import TetherTask, find_port, inputProtocol
+
+else:
+ from . import TetherTask, find_port, inputProtocol
+
+from avro import ipc
+from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer
+import logging
+import weakref
+import threading
+import sys
+import traceback
+
+class TaskRunnerResponder(ipc.Responder):
+ """
+  The responder for the tethered process
+ """
+ def __init__(self,runner):
+ """
+ Param
+ ----------------------------------------------------------
+ runner - Instance of TaskRunner
+ """
+ ipc.Responder.__init__(self, inputProtocol)
+
+ self.log=logging.getLogger("TaskRunnerResponder")
+
+ # should we use weak references to avoid circular references?
+    # We use weak references because self.runner owns this instance of TaskRunnerResponder
+ if isinstance(runner,weakref.ProxyType):
+ self.runner=runner
+ else:
+ self.runner=weakref.proxy(runner)
+
+ self.task=weakref.proxy(runner.task)
+
+ def invoke(self, message, request):
+ try:
+ if message.name=='configure':
+        self.log.info("TetherTaskRunner: Received configure")
+ self.task.configure(request["taskType"],request["inSchema"],request["outSchema"])
+ elif message.name=='partitions':
+        self.log.info("TetherTaskRunner: Received partitions")
+ try:
+ self.task.set_partitions(request["partitions"])
+ except Exception as e:
+          self.log.error("Exception occurred while processing the partitions message:\n"+traceback.format_exc())
+ raise
+ elif message.name=='input':
+        self.log.info("TetherTaskRunner: Received input")
+ self.task.input(request["data"],request["count"])
+ elif message.name=='abort':
+        self.log.info("TetherTaskRunner: Received abort")
+ self.runner.close()
+ elif message.name=='complete':
+        self.log.info("TetherTaskRunner: Received complete")
+ self.task.complete()
+ self.task.close()
+ self.runner.close()
+ else:
+        self.log.warning("TetherTaskRunner: received unknown message {0}".format(message.name))
+
+ except Exception as e:
+      self.log.error("Error occurred while processing message: {0}".format(message.name))
+ emsg=traceback.format_exc()
+ self.task.fail(emsg)
+
+ return None
+
+
+def HTTPHandlerGen(runner):
+ """
+ This is a class factory for the HTTPHandler. We need
+  a factory because we need a reference to the runner
+
+ Parameters
+ -----------------------------------------------------------------
+ runner - instance of the task runner
+ """
+
+ if not(isinstance(runner,weakref.ProxyType)):
+ runnerref=weakref.proxy(runner)
+ else:
+ runnerref=runner
+
+ class TaskRunnerHTTPHandler(BaseHTTPRequestHandler):
+ """Create a handler for the parent.
+ """
+
+ runner=runnerref
+ def __init__(self,*args,**param):
+ """
+ """
+ BaseHTTPRequestHandler.__init__(self,*args,**param)
+
+ def do_POST(self):
+ self.responder =TaskRunnerResponder(self.runner)
+ call_request_reader = ipc.FramedReader(self.rfile)
+ call_request = call_request_reader.read_framed_message()
+ resp_body = self.responder.respond(call_request)
+ self.send_response(200)
+ self.send_header('Content-Type', 'avro/binary')
+ self.end_headers()
+ resp_writer = ipc.FramedWriter(self.wfile)
+ resp_writer.write_framed_message(resp_body)
+
+ return TaskRunnerHTTPHandler
+
+class TaskRunner(object):
+ """This class ties together the server handling the requests from
+ the parent process and the instance of TetherTask which actually
+ implements the logic for the mapper and reducer phases
+  implements the logic for the mapper and reducer phases.
+
+ def __init__(self,task):
+ """
+ Construct the runner
+
+ Parameters
+ ---------------------------------------------------------------
+ task - An instance of tether task
+ """
+
+ self.log=logging.getLogger("TaskRunner:")
+
+ if not(isinstance(task,TetherTask)):
+ raise ValueError("task must be an instance of tether task")
+ self.task=task
+
+ self.server=None
+ self.sthread=None
+
+ def start(self,outputport=None,join=True):
+ """
+ Start the server
+
+ Parameters
+ -------------------------------------------------------------------
+ outputport - (optional) The port on which the parent process is listening
+ for requests from the task.
+      - This will typically be supplied by an environment variable;
+        we allow it to be supplied as an argument mainly for debugging.
+    join - (optional) If set to false then we don't issue a join to block
+           until the thread executing the server terminates.
+ This is mainly for debugging. By setting it to false,
+ we can resume execution in this thread so that we can do additional
+ testing
+ """
+
+ port=find_port()
+ address=("localhost",port)
+
+
+ def thread_run(task_runner=None):
+ task_runner.server = HTTPServer(address, HTTPHandlerGen(task_runner))
+ task_runner.server.allow_reuse_address = True
+ task_runner.server.serve_forever()
+
+ # create a separate thread for the http server
+ sthread=threading.Thread(target=thread_run,kwargs={"task_runner":self})
+ sthread.start()
+
+ self.sthread=sthread
+    # This needs to run in a separate thread because serve_forever() blocks
+ self.task.open(port,clientPort=outputport)
+
+ # wait for the other thread to finish
+ if (join):
+ self.task.ready_for_shutdown.wait()
+ self.server.shutdown()
+
+ # should we do some kind of check to make sure it exits
+      self.log.info("Shutting down the logger")
+ # shutdown the logging
+ logging.shutdown()
+
+ def close(self):
+ """
+ Handler for the close message
+ """
+
+ self.task.close()
+
+if __name__ == '__main__':
+  # TODO: Make the logging level a parameter we can set
+ # logging.basicConfig(level=logging.INFO,filename='/tmp/log',filemode='w')
+ logging.basicConfig(level=logging.INFO)
+
+ if (len(sys.argv)<=1):
+ print "Error: tether_task_runner.__main__: Usage: tether_task_runner task_package.task_module.TaskClass"
+ raise ValueError("Usage: tether_task_runner task_package.task_module.TaskClass")
+
+ fullcls=sys.argv[1]
+ mod,cname=fullcls.rsplit(".",1)
+
+ logging.info("tether_task_runner.__main__: Task: {0}".format(fullcls))
+
+ modobj=__import__(mod,fromlist=cname)
+
+ taskcls=getattr(modobj,cname)
+ task=taskcls()
+
+ runner=TaskRunner(task=task)
+ runner.start()
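
The runner is normally launched by the parent process via the __main__ block
above, but for debugging it can also be driven directly; a sketch under the
same assumptions as the start() docstring (the port number and the
word_count_task module here are hypothetical):

    from avro.tether.tether_task_runner import TaskRunner
    from word_count_task import WordCountTask  # hypothetical task module

    runner = TaskRunner(task=WordCountTask())
    # join=False returns control to this thread; outputport stands in for
    # the AVRO_TETHER_OUTPUT_PORT environment variable.
    runner.start(outputport=50007, join=False)
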
diff --git a/lang/py/src/avro/tether/util.py b/lang/py/src/avro/tether/util.py
new file mode 100644
index 0000000..071b4a1
--- /dev/null
+++ b/lang/py/src/avro/tether/util.py
@@ -0,0 +1,34 @@
+"""
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+"""
+
+__all__=["find_port"]
+
+import socket
+
+
+def find_port():
+ """
+ Return an unbound port
+ """
+ s=socket.socket()
+ s.bind(("127.0.0.1",0))
+
+ port=s.getsockname()[1]
+ s.close()
+
+ return port
\ No newline at end of file
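
find_port() uses the usual ephemeral-port trick: binding to port 0 lets the OS
choose a free port, which is then read back and released. Note the inherent
race: the port is only probably free, since another process could claim it
between close() and the caller's own bind, so callers should bind as soon as
possible. A usage sketch (the example port is hypothetical):

    from avro.tether.util import find_port

    port = find_port()             # an OS-assigned free port, e.g. 50321
    address = ("localhost", port)  # bind promptly; the port could be reclaimed
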
diff --git a/lang/py/test/mock_tether_parent.py b/lang/py/test/mock_tether_parent.py
new file mode 100644
index 0000000..399a03a
--- /dev/null
+++ b/lang/py/test/mock_tether_parent.py
@@ -0,0 +1,95 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import sys
+import set_avro_test_path
+from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer
+from avro import ipc
+from avro import protocol
+from avro import tether
+
+import socket
+
+def find_port():
+ """
+ Return an unbound port
+ """
+ s=socket.socket()
+ s.bind(("127.0.0.1",0))
+
+ port=s.getsockname()[1]
+ s.close()
+
+ return port
+
+SERVER_ADDRESS = ('localhost', find_port())
+
+class MockParentResponder(ipc.Responder):
+ """
+ The responder for the mocked parent
+ """
+ def __init__(self):
+ ipc.Responder.__init__(self, tether.outputProtocol)
+
+ def invoke(self, message, request):
+    if message.name=='configure':
+      print "MockParentResponder: Received 'configure': inputPort={0}".format(request["port"])
+
+    elif message.name=='status':
+      print "MockParentResponder: Received 'status': message={0}".format(request["message"])
+    elif message.name=='fail':
+      print "MockParentResponder: Received 'fail': message={0}".format(request["message"])
+    else:
+      print "MockParentResponder: Received {0}".format(message.name)
+
+ # flush the output so it shows up in the parent process
+ sys.stdout.flush()
+
+ return None
+
+class MockParentHandler(BaseHTTPRequestHandler):
+ """Create a handler for the parent.
+ """
+ def do_POST(self):
+ self.responder =MockParentResponder()
+ call_request_reader = ipc.FramedReader(self.rfile)
+ call_request = call_request_reader.read_framed_message()
+ resp_body = self.responder.respond(call_request)
+ self.send_response(200)
+ self.send_header('Content-Type', 'avro/binary')
+ self.end_headers()
+ resp_writer = ipc.FramedWriter(self.wfile)
+ resp_writer.write_framed_message(resp_body)
+
+if __name__ == '__main__':
+ if (len(sys.argv)<=1):
+ raise ValueError("Usage: mock_tether_parent command")
+
+ cmd=sys.argv[1].lower()
+  if (cmd=='start_server'):
+ if (len(sys.argv)==3):
+ port=int(sys.argv[2])
+ else:
+ raise ValueError("Usage: mock_tether_parent start_server port")
+
+ SERVER_ADDRESS=(SERVER_ADDRESS[0],port)
+ print "mock_tether_parent: Launching Server on Port: {0}".format(SERVER_ADDRESS[1])
+
+ # flush the output so it shows up in the parent process
+ sys.stdout.flush()
+ parent_server = HTTPServer(SERVER_ADDRESS, MockParentHandler)
+ parent_server.allow_reuse_address = True
+ parent_server.serve_forever()
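
The mock parent is meant to be driven from another process; the tether tests
launch it roughly like this (a sketch only; the port is hypothetical, and the
real tests poll until the server is actually up before proceeding):

    import subprocess, sys

    port = 50007
    proc = subprocess.Popen([sys.executable, "mock_tether_parent.py",
                             "start_server", str(port)])
    # ... exercise the task against this parent, then clean up:
    proc.kill()
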
diff --git a/lang/py/test/set_avro_test_path.py b/lang/py/test/set_avro_test_path.py
new file mode 100644
index 0000000..d8b0098
--- /dev/null
+++ b/lang/py/test/set_avro_test_path.py
@@ -0,0 +1,40 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""
+This module adjusts PYTHONPATH so the unittests
+will work even if an egg for AVRO is already installed.
+By default eggs always appear higher on Python's path than
+directories set via the environment variable PYTHONPATH.
+
+For reference see:
+http://www.velocityreviews.com/forums/t716589-pythonpath-and-eggs.html
+http://stackoverflow.com/questions/897792/pythons-sys-path-value.
+
+Unittests would therefore use the installed AVRO and not the AVRO
+being built. To work around this, the unittests import this module before
+importing AVRO. This module in turn adjusts the Python path so that the test
+build of AVRO is higher on the path than any installed eggs.
+"""
+import sys
+import os
+
+# determine the build directory and then make sure all paths that start with the
+# build directory are at the top of the path
+builddir=os.path.split(os.path.split(__file__)[0])[0]
+bpaths=filter(lambda s:s.startswith(builddir), sys.path)
+
+for p in bpaths:
+ sys.path.insert(0,p)
\ No newline at end of file
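
A quick, hypothetical way to confirm the reordering worked, run from
lang/py/test with an avro egg installed:

    import set_avro_test_path  # must be imported before avro
    import avro
    print avro.__file__  # should point into the build tree, not the egg
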
diff --git a/lang/py/test/test_datafile.py b/lang/py/test/test_datafile.py
index b3ce692..72994f3 100644
--- a/lang/py/test/test_datafile.py
+++ b/lang/py/test/test_datafile.py
@@ -15,6 +15,9 @@
# limitations under the License.
import os
import unittest
+
+import set_avro_test_path
+
from avro import schema
from avro import io
from avro import datafile
diff --git a/lang/py/test/test_datafile_interop.py b/lang/py/test/test_datafile_interop.py
index 8f4e883..7204529 100644
--- a/lang/py/test/test_datafile_interop.py
+++ b/lang/py/test/test_datafile_interop.py
@@ -15,6 +15,9 @@
# limitations under the License.
import os
import unittest
+
+import set_avro_test_path
+
from avro import io
from avro import datafile
diff --git a/lang/py/test/test_io.py b/lang/py/test/test_io.py
index 05a6f80..1e79d3e 100644
--- a/lang/py/test/test_io.py
+++ b/lang/py/test/test_io.py
@@ -19,6 +19,9 @@ try:
except ImportError:
from StringIO import StringIO
from binascii import hexlify
+
+import set_avro_test_path
+
from avro import schema
from avro import io
diff --git a/lang/py/test/test_ipc.py b/lang/py/test/test_ipc.py
index 2545b15..7fffe49 100644
--- a/lang/py/test/test_ipc.py
+++ b/lang/py/test/test_ipc.py
@@ -19,6 +19,8 @@ servers yet available.
"""
import unittest
+import set_avro_test_path
+
# This test does import this code, to make sure it at least passes
# compilation.
from avro import ipc
diff --git a/lang/py/test/test_schema.py b/lang/py/test/test_schema.py
index b9c84b3..204d1b1 100644
--- a/lang/py/test/test_schema.py
+++ b/lang/py/test/test_schema.py
@@ -17,6 +17,8 @@
Test the schema parsing logic.
"""
import unittest
+import set_avro_test_path
+
from avro import schema
def print_test_name(test_name):
@@ -287,6 +289,10 @@ OTHER_PROP_EXAMPLES = [
"symbols": [ "one", "two", "three" ],
"cp_float" : 1.0 }
""",True),
+ ExampleSchema("""\
+ {"type": "long",
+ "date": "true"}
+ """, True)
]
EXAMPLES = PRIMITIVE_EXAMPLES
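
The new example above exercises arbitrary attributes on a primitive schema; a
short sketch of the behaviour it pins down (py2 API, assuming avro is
importable):

    from avro import schema

    s = schema.parse('{"type": "long", "date": "true"}')
    print s.type         # long
    print s.other_props  # expected: {'date': 'true'}
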
diff --git a/lang/py/test/test_tether_task.py b/lang/py/test/test_tether_task.py
new file mode 100644
index 0000000..32265e6
--- /dev/null
+++ b/lang/py/test/test_tether_task.py
@@ -0,0 +1,116 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+
+import os
+import subprocess
+import sys
+import time
+import unittest
+
+import set_avro_test_path
+
+class TestTetherTask(unittest.TestCase):
+ """
+  TODO: We should validate the server response by looking at stdout
+ """
+ def test1(self):
+ """
+    Test that the tether_task is working. We run the mock_tether_parent in a separate
+ subprocess
+ """
+ from avro import tether
+ from avro import io as avio
+ from avro import schema
+ from avro.tether import HTTPRequestor,inputProtocol, find_port
+
+ import StringIO
+ import mock_tether_parent
+ from word_count_task import WordCountTask
+
+ task=WordCountTask()
+
+ proc=None
+ try:
+ # launch the server in a separate process
+ # env["AVRO_TETHER_OUTPUT_PORT"]=output_port
+ env=dict()
+ env["PYTHONPATH"]=':'.join(sys.path)
+ server_port=find_port()
+
+ pyfile=mock_tether_parent.__file__
+ proc=subprocess.Popen(["python", pyfile,"start_server","{0}".format(server_port)])
+ input_port=find_port()
+
+ print "Mock server started process pid={0}".format(proc.pid)
+ # Possible race condition? open tries to connect to the subprocess before the subprocess is fully started
+ # so we give the subprocess time to start up
+ time.sleep(1)
+ task.open(input_port,clientPort=server_port)
+
+      # TODO: We should validate that open worked by grabbing the STDOUT of the subprocess
+      # and ensuring that it printed the correct message.
+
+ #***************************************************************
+ # Test the mapper
+ task.configure(tether.TaskType.MAP,str(task.inschema),str(task.midschema))
+
+ # Serialize some data so we can send it to the input function
+ datum="This is a line of text"
+ writer = StringIO.StringIO()
+ encoder = avio.BinaryEncoder(writer)
+ datum_writer = avio.DatumWriter(task.inschema)
+ datum_writer.write(datum, encoder)
+
+ writer.seek(0)
+ data=writer.read()
+
+ # Call input to simulate calling map
+ task.input(data,1)
+
+ # Test the reducer
+ task.configure(tether.TaskType.REDUCE,str(task.midschema),str(task.outschema))
+
+ # Serialize some data so we can send it to the input function
+ datum={"key":"word","value":2}
+ writer = StringIO.StringIO()
+ encoder = avio.BinaryEncoder(writer)
+ datum_writer = avio.DatumWriter(task.midschema)
+ datum_writer.write(datum, encoder)
+
+ writer.seek(0)
+ data=writer.read()
+
+ # Call input to simulate calling reduce
+ task.input(data,1)
+
+ task.complete()
+
+ # try a status
+ task.status("Status message")
+
+ except Exception as e:
+ raise
+ finally:
+ # close the process
+ if not(proc is None):
+ proc.kill()
+
+ pass
+
+if __name__ == '__main__':
+ unittest.main()
\ No newline at end of file
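
For completeness, the inverse of the encode pattern used in the test, handy
when checking what input() actually receives (same py2 API):

    import StringIO
    from avro import io as avio
    from avro import schema

    sch = schema.parse('{"type": "string"}')
    decoder = avio.BinaryDecoder(StringIO.StringIO(data))  # 'data' as built above
    print avio.DatumReader(sch).read(decoder)  # -> "This is a line of text"
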
diff --git a/lang/py/test/test_tether_task_runner.py b/lang/py/test/test_tether_task_runner.py
new file mode 100644
index 0000000..a3f10fe
--- /dev/null
+++ b/lang/py/test/test_tether_task_runner.py
@@ -0,0 +1,191 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import subprocess
+import sys
+import time
+import unittest
+
+import set_avro_test_path
+
+
+class TestTetherTaskRunner(unittest.TestCase):
+ """ unit test for a tethered task runner.
+ """
+
+ def test1(self):
+ from word_count_task import WordCountTask
+ from avro.tether import TaskRunner, find_port,HTTPRequestor,inputProtocol, TaskType
+ from avro import io as avio
+ import mock_tether_parent
+ import subprocess
+ import StringIO
+ import logging
+
+ # set the logging level to debug so that debug messages are printed
+ logging.basicConfig(level=logging.DEBUG)
+
+ proc=None
+ try:
+ # launch the server in a separate process
+ env=dict()
+ env["PYTHONPATH"]=':'.join(sys.path)
+ parent_port=find_port()
+
+ pyfile=mock_tether_parent.__file__
+ proc=subprocess.Popen(["python", pyfile,"start_server","{0}".format(parent_port)])
+ input_port=find_port()
+
+ print "Mock server started process pid={0}".format(proc.pid)
+ # Possible race condition? open tries to connect to the subprocess before the subprocess is fully started
+ # so we give the subprocess time to start up
+ time.sleep(1)
+
+ runner=TaskRunner(WordCountTask())
+
+ runner.start(outputport=parent_port,join=False)
+
+ # Test sending various messages to the server and ensuring they are
+ # processed correctly
+ requestor=HTTPRequestor("localhost",runner.server.server_address[1],inputProtocol)
+
+      # TODO: We should validate that open worked by grabbing the STDOUT of the subprocess
+      # and ensuring that it printed the correct message.
+
+ # Test the mapper
+ requestor.request("configure",{"taskType":TaskType.MAP,"inSchema":str(runner.task.inschema),"outSchema":str(runner.task.midschema)})
+
+ # Serialize some data so we can send it to the input function
+ datum="This is a line of text"
+ writer = StringIO.StringIO()
+ encoder = avio.BinaryEncoder(writer)
+ datum_writer = avio.DatumWriter(runner.task.inschema)
+ datum_writer.write(datum, encoder)
+
+ writer.seek(0)
+ data=writer.read()
+
+
+ # Call input to simulate calling map
+ requestor.request("input",{"data":data,"count":1})
+
+ #Test the reducer
+ requestor.request("configure",{"taskType":TaskType.REDUCE,"inSchema":str(runner.task.midschema),"outSchema":str(runner.task.outschema)})
+
+ #Serialize some data so we can send it to the input function
+ datum={"key":"word","value":2}
+ writer = StringIO.StringIO()
+ encoder = avio.BinaryEncoder(writer)
+ datum_writer = avio.DatumWriter(runner.task.midschema)
+ datum_writer.write(datum, encoder)
+
+ writer.seek(0)
+ data=writer.read()
+
+
+ #Call input to simulate calling reduce
+ requestor.request("input",{"data":data,"count":1})
+
+ requestor.request("complete",{})
+
+
+ runner.task.ready_for_shutdown.wait()
+ runner.server.shutdown()
+ #time.sleep(2)
+ #runner.server.shutdown()
+
+ sthread=runner.sthread
+
+ #Possible race condition?
+ time.sleep(1)
+
+ #make sure the other thread terminated
+ self.assertFalse(sthread.isAlive())
+
+ #shutdown the logging
+ logging.shutdown()
+
+ except Exception as e:
+ raise
+ finally:
+ #close the process
+ if not(proc is None):
+ proc.kill()
+
+
+ def test2(self):
+ """
+ In this test we want to make sure that when we run "tether_task_runner.py"
+ as our main script everything works as expected. We do this by using subprocess to run it
+    in a separate process.
+ """
+ from word_count_task import WordCountTask
+ from avro.tether import TaskRunner, find_port,HTTPRequestor,inputProtocol, TaskType
+ from avro.tether import tether_task_runner
+ from avro import io as avio
+ import mock_tether_parent
+ import subprocess
+ import StringIO
+
+
+ proc=None
+
+ runnerproc=None
+ try:
+ #launch the server in a separate process
+ env=dict()
+ env["PYTHONPATH"]=':'.join(sys.path)
+ parent_port=find_port()
+
+ pyfile=mock_tether_parent.__file__
+ proc=subprocess.Popen(["python", pyfile,"start_server","{0}".format(parent_port)])
+
+      # Possible race condition? When we start tether_task_runner it will call
+      # open, which tries to connect to the mock parent before it is fully
+      # started, so we give the subprocess time to start up
+ time.sleep(1)
+
+
+ #start the tether_task_runner in a separate process
+ env={"AVRO_TETHER_OUTPUT_PORT":"{0}".format(parent_port)}
+ env["PYTHONPATH"]=':'.join(sys.path)
+
+ runnerproc=subprocess.Popen(["python",tether_task_runner.__file__,"word_count_task.WordCountTask"],env=env)
+
+ #possible race condition wait for the process to start
+ time.sleep(1)
+
+
+
+ print "Mock server started process pid={0}".format(proc.pid)
+ #Possible race condition? open tries to connect to the subprocess before the subprocess is fully started
+ #so we give the subprocess time to start up
+ time.sleep(1)
+
+
+ except Exception as e:
+ raise
+ finally:
+ #close the process
+ if not(runnerproc is None):
+ runnerproc.kill()
+
+ if not(proc is None):
+ proc.kill()
+
+if __name__==("__main__"):
+ unittest.main()
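
find_port, imported from avro.tether above, picks a free TCP port for the
servers in these tests; a sketch of the bind-to-port-0 idiom it is assumed to
wrap:

    import socket

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("localhost", 0))
    port = s.getsockname()[1]  # the OS picked a free port
    s.close()
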
diff --git a/lang/py/test/test_tether_word_count.py b/lang/py/test/test_tether_word_count.py
new file mode 100644
index 0000000..6e51d31
--- /dev/null
+++ b/lang/py/test/test_tether_word_count.py
@@ -0,0 +1,213 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import inspect
+import subprocess
+import sys
+import time
+import unittest
+import os
+
+import set_avro_test_path
+
+class TestTetherWordCount(unittest.TestCase):
+ """ unittest for a python tethered map-reduce job.
+ """
+
+ def _write_lines(self,lines,fname):
+ """
+ Write the lines to an avro file named fname
+
+ Parameters
+ --------------------------------------------------------
+ lines - list of strings to write
+ fname - the name of the file to write to.
+ """
+ import avro.io as avio
+ from avro.datafile import DataFileReader,DataFileWriter
+ from avro import schema
+
+ #recursively make all directories
+ dparts=fname.split(os.sep)[:-1]
+ for i in range(len(dparts)):
+ pdir=os.sep+os.sep.join(dparts[:i+1])
+ if not(os.path.exists(pdir)):
+ os.mkdir(pdir)
+
+
+    with open(fname,'wb') as hf:
+ inschema="""{"type":"string"}"""
+ writer=DataFileWriter(hf,avio.DatumWriter(inschema),writers_schema=schema.parse(inschema))
+
+ #encoder = avio.BinaryEncoder(writer)
+ #datum_writer = avio.DatumWriter()
+ for datum in lines:
+ writer.append(datum)
+
+ writer.close()
+
+
+
+
+ def _count_words(self,lines):
+ """Return a dictionary counting the words in lines
+ """
+ counts={}
+
+ for line in lines:
+ words=line.split()
+
+ for w in words:
+        w=w.strip()
+        counts[w]=counts.get(w,0)+1
+
+ return counts
+
+ def test1(self):
+ """
+ Run a tethered map-reduce job.
+
+ Assumptions: 1) bash is available in /bin/bash
+ """
+ from word_count_task import WordCountTask
+ from avro.tether import tether_task_runner
+ from avro.datafile import DataFileReader
+ from avro.io import DatumReader
+ import avro
+
+ import subprocess
+ import StringIO
+ import shutil
+ import tempfile
+ import inspect
+
+ proc=None
+
+ try:
+
+
+      # TODO: use the tempfile module to generate random names
+      # for the files
+ base_dir = "/tmp/test_tether_word_count"
+ if os.path.exists(base_dir):
+ shutil.rmtree(base_dir)
+
+ inpath = os.path.join(base_dir, "in")
+ infile=os.path.join(inpath, "lines.avro")
+ lines=["the quick brown fox jumps over the lazy dog",
+ "the cow jumps over the moon",
+ "the rain in spain falls mainly on the plains"]
+
+ self._write_lines(lines,infile)
+
+ true_counts=self._count_words(lines)
+
+ if not(os.path.exists(infile)):
+ self.fail("Missing the input file {0}".format(infile))
+
+
+ # The schema for the output of the mapper and reducer
+ oschema="""
+{"type":"record",
+ "name":"Pair","namespace":"org.apache.avro.mapred","fields":[
+ {"name":"key","type":"string"},
+ {"name":"value","type":"long","order":"ignore"}
+ ]
+}
+"""
+
+ # write the schema to a temporary file
+ osfile=tempfile.NamedTemporaryFile(mode='w',suffix=".avsc",prefix="wordcount",delete=False)
+ outschema=osfile.name
+ osfile.write(oschema)
+ osfile.close()
+
+ if not(os.path.exists(outschema)):
+ self.fail("Missing the schema file")
+
+ outpath = os.path.join(base_dir, "out")
+
+ args=[]
+
+ args.append("java")
+ args.append("-jar")
+      args.append(os.path.abspath("@TOPDIR@/../java/tools/target/avro-tools-@AVRO_VERSION@.jar"))
+
+
+ args.append("tether")
+ args.extend(["--in",inpath])
+ args.extend(["--out",outpath])
+ args.extend(["--outschema",outschema])
+ args.extend(["--protocol","http"])
+
+ # form the arguments for the subprocess
+ subargs=[]
+
+ srcfile=inspect.getsourcefile(tether_task_runner)
+
+ # Create a shell script to act as the program we want to execute
+ # We do this so we can set the python path appropriately
+ script="""#!/bin/bash
+export PYTHONPATH={0}
+python -m avro.tether.tether_task_runner word_count_task.WordCountTask
+"""
+ # We need to make sure avro is on the path
+ # getsourcefile(avro) returns .../avro/__init__.py
+ asrc=inspect.getsourcefile(avro)
+ apath=asrc.rsplit(os.sep,2)[0]
+
+ # path to where the tests lie
+ tpath=os.path.split(__file__)[0]
+
+ exhf=tempfile.NamedTemporaryFile(mode='w',prefix="exec_word_count_",delete=False)
+ exfile=exhf.name
+ exhf.write(script.format((os.pathsep).join([apath,tpath]),srcfile))
+ exhf.close()
+
+ # make it world executable
+ os.chmod(exfile,0755)
+
+ args.extend(["--program",exfile])
+
+ print "Command:\n\t{0}".format(" ".join(args))
+ proc=subprocess.Popen(args)
+
+
+ proc.wait()
+
+ # read the output
+      with open(os.path.join(outpath,"part-00000.avro"),'rb') as hf:
+ reader=DataFileReader(hf, DatumReader())
+ for record in reader:
+ self.assertEqual(record["value"],true_counts[record["key"]])
+
+ reader.close()
+
+ except Exception as e:
+ raise
+ finally:
+ # close the process
+ if proc is not None and proc.returncode is None:
+ proc.kill()
+ if os.path.exists(base_dir):
+ shutil.rmtree(base_dir)
+ if os.path.exists(exfile):
+ os.remove(exfile)
+
+if __name__== "__main__":
+ unittest.main()
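
A standalone sketch of the container-file round trip that _write_lines and the
final verification rely on (py2 API, assuming avro is importable):

    from avro import schema
    from avro.datafile import DataFileReader, DataFileWriter
    from avro.io import DatumReader, DatumWriter

    sch = schema.parse('{"type": "string"}')
    writer = DataFileWriter(open("/tmp/lines.avro", "wb"), DatumWriter(),
                            writers_schema=sch)
    writer.append("the quick brown fox")
    writer.close()  # also closes the underlying file

    for line in DataFileReader(open("/tmp/lines.avro", "rb"), DatumReader()):
        print line
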
diff --git a/lang/py/test/word_count_task.py b/lang/py/test/word_count_task.py
new file mode 100644
index 0000000..30dcc51
--- /dev/null
+++ b/lang/py/test/word_count_task.py
@@ -0,0 +1,96 @@
+"""
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+"""
+
+__all__=["WordCountTask"]
+
+from avro.tether import TetherTask
+
+import logging
+
+# TODO: Make the logging level a parameter we can set
+#logging.basicConfig(level=logging.INFO)
+class WordCountTask(TetherTask):
+ """
+  Implements the mapper and reducer for the word count example
+ """
+
+ def __init__(self):
+ """
+ """
+
+ inschema="""{"type":"string"}"""
+ midschema="""{"type":"record", "name":"Pair","namespace":"org.apache.avro.mapred","fields":[
+ {"name":"key","type":"string"},
+ {"name":"value","type":"long","order":"ignore"}]
+ }"""
+ outschema=midschema
+ TetherTask.__init__(self,inschema,midschema,outschema)
+
+
+ #keep track of the partial sums of the counts
+ self.psum=0
+
+
+ def map(self,record,collector):
+ """Implement the mapper for the word count example
+
+ Parameters
+ ----------------------------------------------------------------------------
+ record - The input record
+ collector - The collector to collect the output
+ """
+
+ words=record.split()
+
+ for w in words:
+ logging.info("WordCountTask.Map: word={0}".format(w))
+ collector.collect({"key":w,"value":1})
+
+ def reduce(self,record, collector):
+ """Called with input values to generate reducer output. Inputs are sorted by the mapper
+ key.
+
+    The reduce function is invoked once for each value belonging to a given key emitted
+ by the mapper.
+
+ Parameters
+ ----------------------------------------------------------------------------
+ record - The mapper output
+ collector - The collector to collect the output
+ """
+
+ self.psum+=record["value"]
+
+ def reduceFlush(self,record, collector):
+ """
+ Called with the last intermediate value in each equivalence run.
+ In other words, reduceFlush is invoked once for each key produced in the reduce
+ phase. It is called after reduce has been invoked on each value for the given key.
+
+ Parameters
+ ------------------------------------------------------------------
+ record - the last record on which reduce was invoked.
+ """
+
+ #collect the current record
+ logging.info("WordCountTask.reduceFlush key={0} value={1}".format(record["key"],self.psum))
+
+ collector.collect({"key":record["key"],"value":self.psum})
+
+ #reset the sum
+ self.psum=0
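
A hypothetical driver for the class above, simulating the sort-and-group
contract the tether framework provides (Collector here is a stand-in, not the
real tether class; assumes avro's tether module is importable):

    class Collector(object):
        def __init__(self):
            self.pairs = []
        def collect(self, record):
            self.pairs.append(record)

    task = WordCountTask()
    mapped = Collector()
    task.map("the cow jumps over the moon", mapped)  # one pair per word

    out = Collector()
    prev = None
    for pair in sorted(mapped.pairs, key=lambda p: p["key"]):
        if prev is not None and prev["key"] != pair["key"]:
            task.reduceFlush(prev, out)  # end of the previous key's run
        task.reduce(pair, out)
        prev = pair
    if prev is not None:
        task.reduceFlush(prev, out)
    print out.pairs  # expect {'key': 'the', 'value': 2} among the results
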
diff --git a/lang/py3/avro/schema.py b/lang/py3/avro/schema.py
index b5d17fe..c3f73c5 100644
--- a/lang/py3/avro/schema.py
+++ b/lang/py3/avro/schema.py
@@ -643,7 +643,7 @@ class PrimitiveSchema(Schema):
Valid primitive types are defined in PRIMITIVE_TYPES.
"""
- def __init__(self, type):
+ def __init__(self, type, other_props=None):
"""Initializes a new schema object for the specified primitive type.
Args:
@@ -651,7 +651,7 @@ class PrimitiveSchema(Schema):
"""
if type not in PRIMITIVE_TYPES:
raise AvroException('%r is not a valid primitive type.' % type)
- super(PrimitiveSchema, self).__init__(type)
+ super(PrimitiveSchema, self).__init__(type, other_props=other_props)
@property
def name(self):
@@ -752,7 +752,7 @@ class EnumSchema(NamedSchema):
other_props=other_props,
)
- self._props['symbols'] = tuple(sorted(symbol_set))
+ self._props['symbols'] = symbols
if doc is not None:
self._props['doc'] = doc
@@ -1153,7 +1153,7 @@ def _SchemaFromJSONObject(json_object, names):
if type in PRIMITIVE_TYPES:
# FIXME should not ignore other properties
- return PrimitiveSchema(type)
+ return PrimitiveSchema(type, other_props=other_props)
elif type in NAMED_TYPES:
name = json_object.get('name')
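
Taken together, the changes above preserve extra attributes on primitive
schemas and stop sorting enum symbols; a sketch against the py3 API, using the
same constructor arguments as the new test below:

    from avro import schema

    enum = schema.EnumSchema('Test', '', ['B', 'A'], schema.Names(), '', {})
    print(enum.symbols[0])  # 'B' -- declaration order is now preserved
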
diff --git a/lang/py3/avro/tests/run_tests.py b/lang/py3/avro/tests/run_tests.py
index 738c8e5..d7e6512 100644
--- a/lang/py3/avro/tests/run_tests.py
+++ b/lang/py3/avro/tests/run_tests.py
@@ -54,6 +54,7 @@ from avro.tests.test_ipc import *
from avro.tests.test_protocol import *
from avro.tests.test_schema import *
from avro.tests.test_script import *
+from avro.tests.test_enum import *
def SetupLogging():
diff --git a/lang/py/test/test_datafile_interop.py b/lang/py3/avro/tests/test_enum.py
similarity index 54%
copy from lang/py/test/test_datafile_interop.py
copy to lang/py3/avro/tests/test_enum.py
index 8f4e883..7e55359 100644
--- a/lang/py/test/test_datafile_interop.py
+++ b/lang/py3/avro/tests/test_enum.py
@@ -1,39 +1,35 @@
+#!/usr/bin/env python3
+# -*- mode: python -*-
+# -*- coding: utf-8 -*-
+
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
+# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
-#
+#
# http://www.apache.org/licenses/LICENSE-2.0
-#
+#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
-import os
+
import unittest
-from avro import io
-from avro import datafile
-class TestDataFileInterop(unittest.TestCase):
- def test_interop(self):
- print ''
- print 'TEST INTEROP'
- print '============'
- print ''
- for f in os.listdir('@INTEROP_DATA_DIR@'):
- print 'READING %s' % f
- print ''
+from avro import schema
+
+class TestEnum(unittest.TestCase):
+ def testSymbolsInOrder(self):
+ enum = schema.EnumSchema('Test', '', ['A', 'B'], schema.Names(), '', {})
+ self.assertEqual('A', enum.symbols[0])
- # read data in binary from file
- reader = open(os.path.join('@INTEROP_DATA_DIR@', f), 'rb')
- datum_reader = io.DatumReader()
- dfr = datafile.DataFileReader(reader, datum_reader)
- for datum in dfr:
- assert datum is not None
+ def testSymbolsInReverseOrder(self):
+ enum = schema.EnumSchema('Test', '', ['B', 'A'], schema.Names(), '', {})
+ self.assertEqual('B', enum.symbols[0])
if __name__ == '__main__':
- unittest.main()
+ raise Exception('Use run_tests.py')
diff --git a/lang/py3/avro/tests/test_schema.py b/lang/py3/avro/tests/test_schema.py
index 3aaa6b3..c836528 100644
--- a/lang/py3/avro/tests/test_schema.py
+++ b/lang/py3/avro/tests/test_schema.py
@@ -426,6 +426,11 @@ OTHER_PROP_EXAMPLES = [
""",
valid=True,
),
+ ExampleSchema("""
+ {"type": "long", "date": "true"}
+ """,
+ valid=True,
+ ),
]
EXAMPLES = PRIMITIVE_EXAMPLES
diff --git a/lang/py3/setup.py b/lang/py3/setup.py
index 426ad1d..53b76ad 100644
--- a/lang/py3/setup.py
+++ b/lang/py3/setup.py
@@ -27,6 +27,9 @@ from setuptools import setup
VERSION_FILE_NAME = 'VERSION.txt'
+# The following prevents distutils from using hardlinks (which may not always be
+# available, e.g. on a Docker volume). See http://bugs.python.org/issue8876
+del os.link
def RunsFromSourceDist():
"""Tests whether setup.py is invoked from a source distribution.
@@ -120,7 +123,7 @@ def Main():
avro_version = ReadVersion()
setup(
- name = 'avro-python3-snapshot',
+ name = 'avro-python3',
version = avro_version,
packages = ['avro'],
package_dir = {'avro': 'avro'},
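
A more defensive variant of the hardlink workaround above (hypothetical; the
shipped setup.py deletes the attribute unconditionally):

    import os

    if hasattr(os, "link"):
        del os.link  # force distutils to copy files instead of hardlinking
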
diff --git a/pom.xml b/pom.xml
index e188eb0..c3b6197 100644
--- a/pom.xml
+++ b/pom.xml
@@ -19,6 +19,10 @@
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
+ <prerequisites>
+ <maven>2.2.1</maven>
+ </prerequisites>
+
<parent>
<groupId>org.apache</groupId>
<artifactId>apache</artifactId>
@@ -27,7 +31,7 @@
<groupId>org.apache.avro</groupId>
<artifactId>avro-toplevel</artifactId>
- <version>1.7.7</version>
+ <version>1.8.0</version>
<packaging>pom</packaging>
<name>Apache Avro Toplevel</name>
@@ -47,7 +51,7 @@
<!-- plugin versions -->
<antrun-plugin.version>1.7</antrun-plugin.version>
- <enforcer-plugin.version>1.0.1</enforcer-plugin.version>
+ <enforcer-plugin.version>1.3.1</enforcer-plugin.version>
</properties>
<modules>
diff --git a/share/VERSION.txt b/share/VERSION.txt
index 73c8b4f..afa2b35 100644
--- a/share/VERSION.txt
+++ b/share/VERSION.txt
@@ -1 +1 @@
-1.7.7
\ No newline at end of file
+1.8.0
\ No newline at end of file
diff --git a/share/docker/Dockerfile b/share/docker/Dockerfile
new file mode 100644
index 0000000..3bc0b33
--- /dev/null
+++ b/share/docker/Dockerfile
@@ -0,0 +1,58 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Dockerfile for installing the necessary dependencies for building Avro.
+# See BUILD.txt.
+
+FROM java:7-jdk
+
+WORKDIR /root
+
+# Install dependencies from packages
+RUN apt-get update && apt-get install --no-install-recommends -y \
+ git subversion curl ant make maven \
+ gcc cmake asciidoc source-highlight \
+ g++ flex bison libboost-all-dev doxygen \
+ mono-devel mono-gmcs nunit \
+ nodejs nodejs-legacy npm \
+ perl \
+ php5 phpunit php5-gmp bzip2 \
+ python python-setuptools python3-setuptools \
+ ruby ruby-dev rake \
+ libsnappy1 libsnappy-dev
+
+# Install Forrest
+RUN mkdir -p /usr/local/apache-forrest
+RUN curl -O http://archive.apache.org/dist/forrest/0.8/apache-forrest-0.8.tar.gz
+RUN tar xzf *forrest* --strip-components 1 -C /usr/local/apache-forrest
+RUN echo 'forrest.home=/usr/local/apache-forrest' > build.properties
+RUN chmod -R 0777 /usr/local/apache-forrest/build /usr/local/apache-forrest/main \
+ /usr/local/apache-forrest/plugins
+ENV FORREST_HOME /usr/local/apache-forrest
+
+# Install Perl modules
+RUN curl -L http://cpanmin.us | perl - --self-upgrade # non-interactive cpan
+RUN cpanm install Module::Install Module::Install::ReadmeFromPod \
+ Module::Install::Repository \
+ Math::BigInt JSON::XS Try::Tiny Regexp::Common Encode \
+ IO::String Object::Tiny Compress::Zlib Test::More \
+ Test::Exception Test::Pod
+
+# Install Ruby modules
+RUN gem install echoe yajl-ruby multi_json snappy
+
+# Install global Node modules
+RUN npm install -g grunt-cli
diff --git a/share/rat-excludes.txt b/share/rat-excludes.txt
index c123a93..9b05e70 100644
--- a/share/rat-excludes.txt
+++ b/share/rat-excludes.txt
@@ -8,6 +8,7 @@
**/*.js
**/*.la
**/*.m4
+**/*.md
**/*.md5
**/*.pom
**/*.properties
diff --git a/share/schemas/org/apache/avro/ipc/trace/avroTrace.avdl b/share/schemas/org/apache/avro/ipc/trace/avroTrace.avdl
deleted file mode 100644
index 9fd5680..0000000
--- a/share/schemas/org/apache/avro/ipc/trace/avroTrace.avdl
+++ /dev/null
@@ -1,68 +0,0 @@
-/**
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-/**
- * A Span is our basic unit of tracing. It tracks the critical points
- * of a single RPC call and records other call meta-data. It also
- * allows arbitrary string annotations. Both the client and server create
- * Span objects, each of which is populated with half of the relevant event
- * data. They share a span ID, which allows us to merge them into one complete
- * span later on.
- */
-@namespace("org.apache.avro.ipc.trace")
-
-protocol AvroTrace {
- enum SpanEvent { SERVER_RECV, SERVER_SEND, CLIENT_RECV, CLIENT_SEND }
-
- fixed ID(8);
-
- record TimestampedEvent {
- long timeStamp; // Unix time, in nanoseconds
- union { SpanEvent, string} event;
- }
-
- /**
- * An individual span is the basic unit of testing.
- * The record is used by both \"client\" and \"server\".
- */
- record Span {
- ID traceID; // ID shared by all Spans in a given trace
- ID spanID; // Random ID for this Span
- union { ID, null } parentSpanID; // Parent Span ID (null if root Span)
- string messageName; // Function call represented
- long requestPayloadSize; // Size (bytes) of the request
- long responsePayloadSize; // Size (byts) of the response
- union { string, null} requestorHostname; // Hostname of requestor
-// int requestorPort; // Port of the requestor (currently unused)
- union { string, null } responderHostname; // Hostname of the responder
-// int responderPort; // Port of the responder (currently unused)
- array<TimestampedEvent> events; // List of critical events
- boolean complete; // Whether includes data from both sides
- }
-
- /**
- * Get all spans stored on this host.
- */
- array<Span> getAllSpans();
-
- /**
- * Get spans occuring between start and end. Each is a unix timestamp
- * in nanosecond units (for consistency with TimestampedEvent).
- */
- array<Span> getSpansInRange(long start, long end);
-}
diff --git a/share/schemas/org/apache/avro/ipc/trace/avroTrace.avpr b/share/schemas/org/apache/avro/ipc/trace/avroTrace.avpr
deleted file mode 100644
index 041f3e8..0000000
--- a/share/schemas/org/apache/avro/ipc/trace/avroTrace.avpr
+++ /dev/null
@@ -1,82 +0,0 @@
-{
- "protocol" : "AvroTrace",
- "namespace" : "org.apache.avro.ipc.trace",
- "types" : [ {
- "type" : "enum",
- "name" : "SpanEvent",
- "symbols" : [ "SERVER_RECV", "SERVER_SEND", "CLIENT_RECV", "CLIENT_SEND" ]
- }, {
- "type" : "fixed",
- "name" : "ID",
- "size" : 8
- }, {
- "type" : "record",
- "name" : "TimestampedEvent",
- "fields" : [ {
- "name" : "timeStamp",
- "type" : "long"
- }, {
- "name" : "event",
- "type" : [ "SpanEvent", "string" ]
- } ]
- }, {
- "type" : "record",
- "name" : "Span",
- "fields" : [ {
- "name" : "traceID",
- "type" : "ID"
- }, {
- "name" : "spanID",
- "type" : "ID"
- }, {
- "name" : "parentSpanID",
- "type" : [ "ID", "null" ]
- }, {
- "name" : "messageName",
- "type" : "string"
- }, {
- "name" : "requestPayloadSize",
- "type" : "long"
- }, {
- "name" : "responsePayloadSize",
- "type" : "long"
- }, {
- "name" : "requestorHostname",
- "type" : [ "string", "null" ]
- }, {
- "name" : "responderHostname",
- "type" : [ "string", "null" ]
- }, {
- "name" : "events",
- "type" : {
- "type" : "array",
- "items" : "TimestampedEvent"
- }
- }, {
- "name" : "complete",
- "type" : "boolean"
- } ]
- } ],
- "messages" : {
- "getAllSpans" : {
- "request" : [ ],
- "response" : {
- "type" : "array",
- "items" : "Span"
- }
- },
- "getSpansInRange" : {
- "request" : [ {
- "name" : "start",
- "type" : "long"
- }, {
- "name" : "end",
- "type" : "long"
- } ],
- "response" : {
- "type" : "array",
- "items" : "Span"
- }
- }
- }
-}
\ No newline at end of file
diff --git a/share/test/schemas/http.avdl b/share/test/schemas/http.avdl
new file mode 100644
index 0000000..52313e7
--- /dev/null
+++ b/share/test/schemas/http.avdl
@@ -0,0 +1,66 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/** NOTE: This structure was inspired by HTTP and deliberately skewed to produce the effects that needed testing */
+
+@namespace("org.apache.avro.test.http")
+protocol Http {
+
+ enum NetworkType {
+ IPv4,
+ IPv6
+ }
+
+ record NetworkConnection {
+ NetworkType networkType;
+ string networkAddress;
+ }
+
+ record UserAgent {
+ union { null, string } id = null;
+ string useragent;
+ }
+
+ enum HttpMethod {
+ GET,
+ POST
+ }
+
+ record QueryParameter {
+ string name;
+ union { null, string } value; // Sometimes there is no value.
+ }
+
+ record HttpURI {
+ HttpMethod method;
+ string path;
+ array<QueryParameter> parameters = [];
+ }
+
+ record HttpRequest {
+ UserAgent userAgent;
+ HttpURI URI;
+ }
+
+ record Request {
+ long timestamp;
+ NetworkConnection connection;
+ HttpRequest httpRequest;
+ }
+
+}
diff --git a/share/test/schemas/reserved.avsc b/share/test/schemas/reserved.avsc
new file mode 100644
index 0000000..40f4849
--- /dev/null
+++ b/share/test/schemas/reserved.avsc
@@ -0,0 +1,2 @@
+{"name": "org.apache.avro.test.Reserved", "type": "enum",
+ "symbols": ["default","class","int"]},
diff --git a/share/test/schemas/specialtypes.avdl b/share/test/schemas/specialtypes.avdl
new file mode 100644
index 0000000..623e016
--- /dev/null
+++ b/share/test/schemas/specialtypes.avdl
@@ -0,0 +1,98 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/** NOTE: This structure is intended to contain names that are likely to cause collisions with the generated code. */
+
+@namespace("org.apache.avro.test.specialtypes")
+protocol LetsBreakIt {
+
+ enum Enum {
+ builder,
+ Builder,
+ builderBuider,
+ value,
+ this
+ }
+
+ record One {
+ Enum this;
+ }
+
+ record Two {
+ union { null, string } this = null;
+ string String;
+ }
+
+ record Variables {
+ One this;
+
+ One Boolean;
+ One Integer;
+ One Long;
+ One Float;
+ One String;
+ }
+
+ enum Boolean {
+ Yes,
+ No
+ }
+
+ record String {
+ string value;
+ }
+
+ record builder {
+ One this;
+ Two builder;
+ }
+
+ record builderBuilder {
+ One this;
+ Two that;
+ }
+
+ record Builder {
+ One this;
+ Two that;
+ }
+
+ record value {
+ One this;
+ Two that;
+ }
+
+ record Types {
+ Boolean one;
+ builder two;
+ Builder three;
+ builderBuilder four;
+ String five;
+ value six;
+ }
+
+ record Names {
+ string Boolean;
+ string builder;
+ string Builder;
+ string builderBuilder;
+ string String;
+ string value;
+ }
+
+}
--
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/python-avro.git