[Soc-coordination] Backend Tools and Infrastructure for DEX - Report 5

Nathan Handler nhandler at ubuntu.com
Tue Aug 23 05:14:16 UTC 2011


Hello everyone,

This is my fifth and final report about dextools for the 2011 Google
Summer of Code program. During the summer, I worked with Matt
Zimmerman and Stefano Zacchiroli to create a replacement web dashboard
for the Debian Derivatives Exchange (DEX). The dashboard displays a
list of projects and their respective tasks and then allows users to
easily update the status of these tasks. The dashboard also contains
two graphs for tracking the progress of projects and providing instant
recognition of all contributions to a project.

At the start of the summer, I knew that I wanted to work on
improving the DEX infrastructure. I had worked with the team on the
ancient-merges project, and while a lot of good work was accomplished,
the process for tracking our progress was a bit clunky and difficult
to interact with. Anyone who read my initial proposal for the Summer
of Code would probably agree that it was quite vague. I really did not
have a solid plan for what I wanted to accomplish.

This began to change after my initial phone call with my mentor, Matt
Zimmerman. We decided that the first thing I would work on would be a
basic dashboard. Our plan was to have all of the data stored on the
Debian BTS. This would allow Debian Maintainers to benefit from DEX
without needing to learn about DEX's tools and workflows. The
dashboard would support multiple derivatives and multiple projects for
each derivative. Each project would be made up of a list of tasks,
where each task is linked to a BTS bug.

I prepared a mockup and some initial code based on that plan. The
dashboard was able to successfully display a list of BTS bugs and
their information. It generated the lists by assuming that bugs would
be usertagged with a user of 'debian-derivatives at lists.debian.org' and
a tag in the format of 'dex-<derivative>-<project>'. At this point, we
started trying to make sure that the dashboard would work for things
like the ancient-patches, python2.7, and the upcoming large-merges
projects. It did not take long to determine that it would not work for
all of these things. The ancient-patches project started with a simple
list of patch names. While most of these patches eventually ended up
as bugs on the BTS, they did not start this way. The dashboard would
need to support specifying a list of task names. This meant that it
would also need to store data itself and that not all data could be
located on the BTS.
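The usertag naming scheme can be illustrated with a small Python sketch. The function names are hypothetical. The sketch also assumes that derivative names never contain a hyphen, which is a guess rather than a documented rule. Project names such as ancient-patches clearly can contain hyphens.

```python
from typing import Optional, Tuple

DEX_TAG_PREFIX = "dex-"


def parse_dex_tag(tag: str) -> Optional[Tuple[str, str]]:
    """Split a usertag such as 'dex-ubuntu-ancient-patches' into
    (derivative, project), or return None if it is not a DEX tag.

    Assumes the derivative name never contains a hyphen, so everything
    after the first hyphen that follows the prefix is the project name.
    """
    if not tag.startswith(DEX_TAG_PREFIX):
        return None
    derivative, sep, project = tag[len(DEX_TAG_PREFIX):].partition("-")
    if not sep or not derivative or not project:
        return None
    return derivative, project


def make_dex_tag(derivative: str, project: str) -> str:
    """Build the usertag for a derivative/project pair."""
    return f"{DEX_TAG_PREFIX}{derivative}-{project}"
```

For example, parse_dex_tag("dex-ubuntu-python2.7") gives ("ubuntu", "python2.7"), and make_dex_tag reverses the split.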

This is when we first started to define what a 'task' is. The
dashboard tracks tasks. It does not directly track bugs or patches. A
task is nothing more than a title and status with an optional
assignee, note, and bug. This made it simple to convert a list of
patch names or a list of bug numbers to a list of tasks.
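A task definition this small maps naturally onto a data class. The following Python sketch is illustrative only. The class and helper names and the default status string are assumptions, and the title given to bug-derived tasks is a placeholder for the bug's real subject line.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    """A dashboard task: a title and a status, plus optional extras."""
    title: str
    status: str = "incomplete"      # assumed default status name
    assignee: Optional[str] = None
    note: Optional[str] = None
    bug: Optional[int] = None       # Debian BTS bug number, if any


def tasks_from_patches(patch_names: List[str]) -> List[Task]:
    """A bare list of patch names becomes tasks with no bug attached."""
    return [Task(title=name) for name in patch_names]


def tasks_from_bugs(bug_numbers: List[int]) -> List[Task]:
    """A list of BTS bug numbers becomes tasks linked to those bugs.
    The placeholder title stands in for the bug's real subject line."""
    return [Task(title=f"Debian bug #{number}", bug=number) for number in bug_numbers]
```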

During the early stages of the project, the dashboard moved
around a lot. We hoped to have it end up on dex.alioth.debian.org, but
we were not sure whether Alioth would meet all of our needs. This was
right around the time of the Alioth migration. Thanks to some tips, I
figured out that I could use a simple cronjob on wagner to pull the
git repository for vasks. This would allow the running instance of the
dashboard to always be up-to-date. The Alioth admins have also been
quite supportive. They have installed several additional packages for
me that were necessary to keep the dashboard running.

From the start, Matt felt it was important to have some public
documentation about the project. This would allow us to point people
to something other than my blog posts. I decided to put this
information on the wiki. At one point, I also had a copy of the wiki
page stored in the git repository. This file was updated via a
cronjob, and I then committed and pushed it when I had code changes. I
ultimately decided that the best approach would be to maintain the
page on the wiki. A basic README file is included in the git
repository that simply links to the wiki page. On wagner, I have a
cronjob running to pull a copy of the wiki page so that it can be
accessed via the web.

Since Matt was traveling and moving this summer, he arranged for Stefano
Zacchiroli to fill in for him as my mentor temporarily. It was at this
point in the summer that I began working on the first graph script.
The goal was to have a script to parse the list of tasks and generate
a graph showing the number of open tasks versus time. This would allow
us to easily track our progress, estimate when a project will be
complete, and detect periods of slowing down. We decided that while we
liked the Ubuntu burndown charts, they were a bit more complex than we
needed, so we did not reuse their code.
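The core of such a graph is a count of open tasks for each day. The Python sketch below is not the project's graph script. It assumes that every task can be reduced to an (opened, closed) date pair, where closed is None for a task that is still open.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional, Tuple


def open_tasks_per_day(events: List[Tuple[date, Optional[date]]],
                       start: date, end: date) -> Dict[date, int]:
    """Count open tasks on each day from start to end, inclusive.

    A task counts as open on day d if it was opened on or before d and
    either has not been closed or was closed after d.
    """
    counts: Dict[date, int] = {}
    day = start
    while day <= end:
        counts[day] = sum(
            1 for opened, closed in events
            if opened <= day and (closed is None or day < closed)
        )
        day += timedelta(days=1)
    return counts
```

The resulting day-to-count mapping can be drawn as a line chart with any plotting library. A line that flattens out shows a period where progress slowed down.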

A dashboard that can't be modified is not that useful. An early goal
of this project was to allow users to update the dashboard via the
web. This proved to be a bit more difficult than I initially thought.
First, all of the html files that make up the dashboard are generated
by a script. This means that if you modify one of the html files
directly, all changes will be lost the next time the script runs. When
you make a change on a web form and hit submit, you usually expect to
see the changes the next time you visit the page. In order to
accomplish this, I made the form processing script directly modify the
HTML files. However, at this point, we did not have a way to store
extended task information locally, so the next run of the Python
script overwrote all changes made via the web form. This issue was rectified
fairly quickly. There is now a 'changes' file that stores all
information submitted via the web form. The python script reads this
file and applies the changes when it is generating the html files. A
second problem concerned the text inputs that were being used in most
of the cells in the table. By default, some fields would cut off text
if the string was too long. Text inputs have a 'size'
attribute that is meant to specify the number of visible characters. I
attempted to set this attribute in a script to the length of the
input's value. Strangely, the inputs ended up with a lot of extra
whitespace following the text. This resulted in the table being
stretched horizontally. After many hours of research and testing, I
was unable to find a solution to this problem. We ultimately decided
to simply remove the 'size' attribute and accept that some text will be
cut off.
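The 'changes' file works like an overlay that is reapplied on every regeneration, so web edits survive when the task list is rebuilt. The Python sketch below assumes a hypothetical JSON-lines format in which each line records one edit as {"task": ..., "field": ..., "value": ...}. The real file format is not described here. Later lines win, so the most recent web edit takes effect.

```python
import json
from typing import Dict, Iterable

# Hypothetical set of fields the web form is allowed to change.
EDITABLE_FIELDS = {"status", "assignee", "note", "bug"}


def load_changes(lines: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Fold the change log into {task title: {field: value}}.

    Later lines override earlier ones; blank lines and unknown fields
    are skipped.
    """
    overrides: Dict[str, Dict[str, str]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if entry.get("field") not in EDITABLE_FIELDS:
            continue
        overrides.setdefault(entry["task"], {})[entry["field"]] = entry["value"]
    return overrides


def apply_changes(tasks: Dict[str, Dict[str, str]],
                  overrides: Dict[str, Dict[str, str]]) -> None:
    """Overlay stored web edits onto freshly generated task data in place."""
    for title, fields in overrides.items():
        if title in tasks:
            tasks[title].update(fields)
```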

My code got some early testing from Allison Randal and other people
working on the dex-ubuntu-python2.7 project. They were a huge help in
providing feedback and finding bugs in the code. While the dashboard
was unable to meet all of their needs (due to being in the early
stages of development), I hope that it at least helped to make it
easier to track the project's status.

Whenever a script is accepting input from a user, it is important to
validate it. Although the dashboard uses a select box to limit the
choices for the 'status', it is still trivial to submit an arbitrary
status for a task. That is why I added some validation to the form
processing script. It will ignore any unknown values, ensuring that
all tasks have a valid status. The other fields are a bit more tricky.
They were designed to be arbitrary text fields. As a result, almost
anything can be entered. There is no way to tell if something is a
title or a person. We thought of a few ways we could change this, but
most of them involved locking down the dashboard and restricting
changes. The old dashboard treated the fields as arbitrary text, so we
decided to not include any validation in the new one.
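This validation policy fits in a few lines of Python. The sketch assumes that the allowed statuses are 'incomplete', 'in progress' and 'complete', and that the form uses the field names shown, which are hypothetical. An unknown status is simply dropped, so the task keeps its previous status. The free-text fields pass through unchecked, as they did on the old dashboard.

```python
from typing import Dict

# Assumed set of valid statuses; anything else is ignored.
VALID_STATUSES = frozenset({"incomplete", "in progress", "complete"})

# Hypothetical names of the free-text form fields.
FREE_TEXT_FIELDS = ("title", "assignee", "note")


def clean_submission(form: Dict[str, str]) -> Dict[str, str]:
    """Return only the fields that should be stored for a task."""
    cleaned = {field: form[field] for field in FREE_TEXT_FIELDS if field in form}
    status = form.get("status")
    if status in VALID_STATUSES:
        cleaned["status"] = status
    return cleaned
```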

One popular feature request was the ability to link directly to a
project. Early versions of the dashboard had one main dex.html page
that used JavaScript to pull in the various projects, so an individual
project had no URL of its own. To get around this, I first tried using
the query string to allow users to specify a distribution and project.
While this worked, it resulted in long and ugly URLs. It took a while
for me to implement a cleaner solution, but I eventually split each
project into its own static
HTML file. This meant that people could simply copy the URL from the
address bar to link to a particular project. The main reason for doing
this was that we felt the typical person would be interested in
working on a specific project; most people won't be interested in
navigating between the different projects. This change also allowed
links that use the #graph anchor to work properly. Previously, because
of the way the page loaded the project, a URL that specified both a
project and #graph did not work.
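Writing one static file per project could look like the Python sketch below. The <distro>/<project>.html layout and the minimal page markup are assumptions, not the real generator's output. The page contains an element with id="graph", which is enough for a #graph fragment to resolve without any script loading the project first.

```python
import html
from pathlib import Path
from typing import Dict, List, Tuple


def write_project_pages(out_dir: Path,
                        projects: Dict[Tuple[str, str], List[str]]) -> List[Path]:
    """Write one static page per (distribution, project) pair.

    Each page has a stable path, so its URL can be copied from the address
    bar, and an element with id="graph" so that a #graph fragment works
    without any JavaScript having to load the project first.
    """
    written = []
    for (distro, project), titles in sorted(projects.items()):
        page = out_dir / distro / f"{project}.html"
        page.parent.mkdir(parents=True, exist_ok=True)
        rows = "\n".join(f"<tr><td>{html.escape(t)}</td></tr>" for t in titles)
        name = html.escape(f"{distro}/{project}")
        page.write_text(
            f"<html><head><title>DEX: {name}</title></head><body>\n"
            f"<table>\n{rows}\n</table>\n"
            f'<div id="graph"></div>\n'
            f"</body></html>\n",
            encoding="utf-8",
        )
        written.append(page)
    return written
```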

There is a jQuery plugin, tablesorter, that allows tables to be sorted by
particular columns. For tables that just contain text, this works fine
without any issues. However, once I started using select boxes and
text inputs, things got a bit messed up. After some research, I
finally figured out how to use addParser to tell the tablesorter
plugin how to sort each column in the table.

In some of the earlier versions, I used some zebra stripes to make the
table easy to read. Thanks to a suggestion from Paul Wise (pabs), I
got rid of this striping and modified the code to color the rows based
on the task's status. Complete tasks are green, in progress tasks are
yellow, and incomplete tasks are red. This makes it very easy to find
tasks that still need work as well as measure the overall status of a
project.
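On the generation side, the color scheme reduces to a lookup table. The Python sketch below uses an assumed status-to-color mapping and a hypothetical render_row helper. On the live dashboard, the instant color change when a status is edited happens in the browser, so this covers only the initial HTML.

```python
import html

# Status strings are assumed; unknown statuses get no background color.
ROW_COLORS = {
    "complete": "green",
    "in progress": "yellow",
    "incomplete": "red",
}


def render_row(title: str, status: str) -> str:
    """Render one table row whose background reflects the task's status."""
    color = ROW_COLORS.get(status)
    style = f' style="background-color: {color}"' if color else ""
    return f"<tr{style}><td>{html.escape(title)}</td><td>{html.escape(status)}</td></tr>"
```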

Sometimes, a large task list can feel very intimidating. That is why I
added the ability to hide completed tasks with the check of a box.
Matt wanted to take this one step further; he wanted the ability to
also hide tasks with a closed bug. This is why you will find two
checkboxes at the top of the dashboard that allow you to customize
which tasks are visible. Eventually, I might add support for
specifying this in the URL to allow groups to link to a particular
view of the dashboard.
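The two checkboxes amount to a simple filter predicate. The dashboard applies this filter in the browser. The Python version below only illustrates the logic. It also shows how the possible URL support might read the two flags. The parameter names hide_complete and hide_closed are hypothetical, and the set of closed bug numbers is assumed to come from the BTS.

```python
from typing import Dict, Iterable, List, Set, Tuple
from urllib.parse import parse_qs, urlsplit


def visible_tasks(tasks: Iterable[Dict], hide_complete: bool,
                  hide_closed_bugs: bool, closed_bugs: Set[int]) -> List[Dict]:
    """Apply the two checkbox filters to a list of task dicts."""
    shown = []
    for task in tasks:
        if hide_complete and task["status"] == "complete":
            continue
        if hide_closed_bugs and task.get("bug") in closed_bugs:
            continue
        shown.append(task)
    return shown


def flags_from_url(url: str) -> Tuple[bool, bool]:
    """Read hypothetical ?hide_complete=1&hide_closed=1 parameters."""
    query = parse_qs(urlsplit(url).query)
    hide_complete = query.get("hide_complete", ["0"])[0] == "1"
    hide_closed = query.get("hide_closed", ["0"])[0] == "1"
    return hide_complete, hide_closed
```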

In order to make it as easy as possible for new contributors to get
involved with DEX projects, I wanted to have a way to document what a
project is about and how to handle tasks. I decided to use the wiki
for this. All of the per-project documentation lives at
http://wiki.debian.org/DEX/<distro>/<project>. You can even take
advantage of wiki markup to format the documentation. The script will
parse the HTML files generated by the wiki, do some minor cleanup, and
then display the documentation at the top of the page. My original
plan was to use the ?action=raw output, but the lack of formatting
proved too restrictive. My second plan was to try to use something
like BeautifulSoup or a regex to filter non-whitelisted tags out of
the wiki-generated HTML. This did not work out, and it also turned out
to be unnecessary: the
wiki does a pretty decent job of preventing malicious markup from
ending up on the dashboard.
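Pulling the page body out of the wiki's HTML can be done with the standard library parser alone. The Python sketch below assumes that the wiki theme wraps the page text in <div id="content">. That holds for MoinMoin's stock themes, but here it is an assumption. The function names are also hypothetical, and the report does not say which "minor cleanup" steps the real script performs, so the sketch performs none.

```python
from html.parser import HTMLParser
from typing import List

WIKI_BASE = "http://wiki.debian.org/DEX"


def doc_url(distro: str, project: str) -> str:
    """Location of a project's documentation page on the wiki."""
    return f"{WIKI_BASE}/{distro}/{project}"


class ContentExtractor(HTMLParser):
    """Collect the raw markup inside <div id="content">...</div>."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=False)  # keep entities as written
        self.depth = 0  # >0 while inside the content div; tracks nested divs
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.parts.append(self.get_starttag_text())
            if tag == "div":
                self.depth += 1
        elif tag == "div" and dict(attrs).get("id") == "content":
            self.depth = 1

    def handle_startendtag(self, tag, attrs):
        if self.depth:  # self-closing tags such as <br/>
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self.depth:
            return
        if tag == "div":
            self.depth -= 1
            if not self.depth:
                return  # closing tag of the content div itself
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth:
            self.parts.append(f"&#{name};")


def extract_wiki_content(page_html: str) -> str:
    parser = ContentExtractor()
    parser.feed(page_html)
    parser.close()
    return "".join(parser.parts).strip()
```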

There is a second graph that is displayed on the dashboard. This graph
shows the number of tasks each person has completed. The graph is
sorted so that the tallest bars are on the left, and the goal is to
provide some immediate recognition of work done by contributors.
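The data behind this graph is a per-person count of completed tasks, sorted in descending order. The Python sketch below assumes that tasks are dicts with "status" and "assignee" keys. Breaking ties alphabetically is a design choice made here so that the bar order stays stable between runs.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def completed_by_person(tasks: Iterable[Dict]) -> List[Tuple[str, int]]:
    """Count completed tasks per assignee, largest count first.

    Ties are broken alphabetically so the bar order is stable between runs;
    tasks without an assignee are not credited to anyone.
    """
    counts = Counter(
        task["assignee"]
        for task in tasks
        if task["status"] == "complete" and task.get("assignee")
    )
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```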

Since we essentially allow anyone to go ahead and modify the
dashboard, we knew it was important to have a method in place for
dealing with abuse. Any malicious edits made on the wiki can easily be
reverted. This idea of reverting to an earlier revision is what
inspired me to use a second local git repository to store the
projects/ directory. This directory stores all of the changes made via
the web interface. Whenever the form is submitted or the script runs
to update the list of tasks, any changes made to the projects/
directory are committed. This means that if a malicious user decides
to delete everything on the dashboard, we can quickly and easily
revert the changes. The revision history might also be interesting to
analyze at a future date.
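The commit step could be as simple as the Python sketch below, which calls the git command-line tool. The function name and the commit messages are hypothetical. The sketch stages everything in the repository and skips the commit when nothing has changed, so running it after every form submission is cheap.

```python
import subprocess
from pathlib import Path


def commit_projects(repo: Path, message: str) -> bool:
    """Commit all changes in the projects/ repository.

    Returns True if a commit was made, False if there was nothing to commit.
    """
    def git(*args: str) -> subprocess.CompletedProcess:
        return subprocess.run(["git", "-C", str(repo), *args],
                              check=True, capture_output=True, text=True)

    git("add", "--all")
    if not git("status", "--porcelain").stdout.strip():
        return False
    git("commit", "--quiet", "-m", message)
    return True
```

To undo vandalism, one could inspect `git log` in that repository and use `git revert` on the offending commit.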

When multiple people are collaborating on a project, an issue tracker
can be quite useful. For some reason, we did not set one up until
quite late in the project. Even so, we are already using it to
track known bugs and feature requests. The issue tracker will become
even more useful once the dashboard starts to be utilized by more
people.

I spent a fair bit of time working on having the dashboard immediately
respond to changes. For example, if you change the status of a task,
the row color will instantly update. The form will also be submitted
in the background, eliminating the need to hit 'submit' to save the
change. I also use AJAX in some new dialogs that allow users to
create new projects and/or tasks via the web. Although these forms are
also submitted in the background, the new projects and tasks do not
show up on the dashboard immediately: they only appear after the
script regenerates the HTML, and generating the task list is slow
enough that running the script more often is not really a good idea.

The next step for the dashboard is to really open it up for testing.
We hope to start up a new DEX project shortly. This will allow us to
see which features work and which need fixing. It will also help us
find some of the bugs that are probably hiding in the code.

Finally, if you are interested in helping out with DEX or dextools, my
code is available in a git repository
(http://anonscm.debian.org/gitweb/?p=dex/gsoc2011.git) on alioth.
There is also the issue tracker
(https://alioth.debian.org/tracker/index.php?group_id=100600&atid=413120)
where you can report bugs and request new features. You can also
join the debian-derivatives mailing list, join #debian-derivatives on
OFTC, or contact me directly via email or on IRC (nhandler).

I have really enjoyed working on dextools this summer. I would like to
thank Matt Zimmerman, Stefano Zacchiroli, and everyone else who helped
with the project. I look forward to working with all of you more in
the future.

Thanks,
Nathan Handler (nhandler)


