[Teammetrics-discuss] Idea for improving performance.
Andreas Tille
andreas at an3as.eu
Fri Jun 29 22:12:06 UTC 2012
On Fri, Jun 29, 2012 at 01:21:00PM +0530, Vipin Nair wrote:
>
> Currently we have a script that parses data from a text file and
> produces the graphs. The data in the text file is generated by
> processing the data in the database.
Uhmm, yes - that's too slow for the purpose, definitely.
> What I propose is that we maintain another database that contains the
> data from the text files.
This sounds like a detour. I'd rather suggest using the *same*
database and letting the selects that currently create the text files
insert their results straight into an aggregation table.
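
A minimal sketch of that idea follows. It assumes a raw table named commits(project, author, month) and an aggregation table named commit_summary. Both names are hypothetical. Python's built-in sqlite3 module stands in for PostgreSQL here, and the same INSERT ... SELECT statement is also valid PostgreSQL. The grouping query that used to end up in a *.txt file writes rows directly instead.

```python
import sqlite3

def build_summary(conn: sqlite3.Connection) -> None:
    """Fill the (hypothetical) aggregation table straight from the raw data."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS commit_summary ("
        " project TEXT NOT NULL, month TEXT NOT NULL,"
        " author TEXT NOT NULL, commits INTEGER NOT NULL,"
        " PRIMARY KEY (project, month, author))"
    )
    # One INSERT ... SELECT replaces "run query, write text file, parse text file".
    conn.execute(
        "INSERT INTO commit_summary (project, month, author, commits) "
        "SELECT project, month, author, COUNT(*) FROM commits "
        "GROUP BY project, month, author"
    )
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE commits (project TEXT, author TEXT, month TEXT)")
    conn.executemany(
        "INSERT INTO commits VALUES (?, ?, ?)",
        [("project-a", "alice", "2012-05"),
         ("project-a", "alice", "2012-05"),
         ("project-a", "bob", "2012-06")],
    )
    build_summary(conn)
    print(conn.execute("SELECT * FROM commit_summary ORDER BY month, author").fetchall())
```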
You could also enhance the upload_history.py script to inject the
data into this table instead of creating *.txt files plus graphs. This
could be triggered by a command line switch.
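
Such a switch could look roughly like the sketch below. The flag name --to-table is hypothetical, because the real options of upload_history.py are not shown in this thread. The two output paths are passed in as callables, which keeps the sketch independent of the script's internals.

```python
import argparse

def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        description="Sketch: choose between text-file output and table output"
    )
    # Hypothetical flag; not an existing upload_history.py option.
    parser.add_argument(
        "--to-table",
        action="store_true",
        help="insert aggregated rows into the aggregation table "
             "instead of writing *.txt files and graphs",
    )
    return parser.parse_args(argv)

def run(args, write_text_and_graph, insert_into_table):
    """Call exactly one of the two output back ends and report which one ran."""
    if args.to_table:
        insert_into_table()
        return "table"
    write_text_and_graph()
    return "text"

if __name__ == "__main__":
    run(parse_args(),
        write_text_and_graph=lambda: print("writing *.txt + graph"),
        insert_into_table=lambda: print("inserting into aggregation table"))
```

Keeping the old text output as the default means existing cron jobs keep working until the switch is turned on.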
> So we can add a script that updates the data
> in the new database and run it with the other scripts each month. So
> instead of the heavy processing every time (some SQL queries take up to
> 9ms to fetch the data), all data access reduces to a simple select
> query and joins are avoided completely. This makes the site faster and
> reduces the load on the server drastically.
That's pretty clear - nothing else will work at a reasonable
speed.
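
The monthly update can be made safe to rerun by replacing one month of aggregated rows inside a single transaction. The sketch below uses the same hypothetical commits and commit_summary tables, with sqlite3 standing in for PostgreSQL. In PostgreSQL the web interface would keep seeing the previous rows until the transaction commits.

```python
import sqlite3

def create_schema(conn: sqlite3.Connection) -> None:
    """Hypothetical raw table plus aggregation table."""
    conn.execute("CREATE TABLE IF NOT EXISTS commits (project TEXT, author TEXT, month TEXT)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS commit_summary ("
        " project TEXT, month TEXT, author TEXT, commits INTEGER,"
        " PRIMARY KEY (project, month, author))"
    )

def refresh_month(conn: sqlite3.Connection, month: str) -> int:
    """Recompute one month of the aggregation table; rerunning gives the same rows."""
    with conn:  # commits if both statements succeed, rolls back otherwise
        conn.execute("DELETE FROM commit_summary WHERE month = ?", (month,))
        conn.execute(
            "INSERT INTO commit_summary (project, month, author, commits) "
            "SELECT project, month, author, COUNT(*) FROM commits "
            "WHERE month = ? GROUP BY project, month, author",
            (month,),
        )
    # Number of aggregated rows now stored for that month.
    return conn.execute(
        "SELECT COUNT(*) FROM commit_summary WHERE month = ?", (month,)
    ).fetchone()[0]
```

Deleting before inserting avoids primary-key conflicts when a month is recomputed after a failed or repeated run.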
> Instead of storing all this processed data in Postgres, this data can
> be stored in a NoSQL database and all our data retrieval costs can be
> made O(1). Irrespective of the choice of database, having a 2nd
> database will improve performance dramatically.
>
> Why a NoSQL database?
> 1) Our data is essentially read-only (from the user's perspective).
> 2) No data joins are required.
>
> Advantages of a NoSQL database
> 1) Fast (very fast)
> 2) Efficient (all queries will be O(1) in our use case)
Well, if we had this aggregation table, the effort would be reduced to a
pretty simple select. IMHO PostgreSQL is not so slow that it cannot bear
the load. Adopting yet another new technology to win a few microseconds
does not seem a sensible strategy to me.
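
For comparison, the key-value access pattern behind the O(1) claim is a hash lookup on a precomputed key. The stored value is JSON that was serialized once at build time. In the sketch below, a plain dict stands in for Redis or CouchDB, and the "project:month" key format is an assumption. A primary-key lookup on the aggregation table follows the same access pattern through a B-tree index, which is O(log n) rather than average-case O(1).

```python
import json

def build_store(rows):
    """Precompute one JSON document per (project, month) key."""
    grouped = {}
    for project, month, author, commits in rows:
        grouped.setdefault(f"{project}:{month}", {})[author] = commits
    # Serialize once here, so reads can return the stored string unchanged.
    return {key: json.dumps(value, sort_keys=True) for key, value in grouped.items()}

def lookup(store, project, month):
    """Average-case O(1) hash lookup; None if nothing was precomputed for the key."""
    return store.get(f"{project}:{month}")

store = build_store([
    ("project-a", "2012-05", "alice", 2),
    ("project-a", "2012-05", "bob", 1),
])
print(lookup(store, "project-a", "2012-05"))
```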
> Which NoSQL database?
> We have multiple options here; I'll pick the ones I am comfortable with.
>
> 1) CouchDB
> - CouchDB stores all the data in JSON format so we can directly serve
> the data without any serialization.
> - Client-side JavaScript can access CouchDB directly, so intermediate
> processing (in Python) can be avoided.
> - Getting the security setting right is a pain but keeping a web
> server in front helps a lot.
> - I have good experience with CouchDB and have used it in a
> production-quality app.
>
> 2) Redis - Ideal database for our use case.
> - In-memory database but persistent, so data access times are greatly reduced.
> - The version in Squeeze is old, but I am not sure if that affects us.
> - I have used it a few times but never in a production app.
>
>
> So here is the basic monthly workflow if we follow the above steps:
>
> 1) Run script to populate current database
>
> 2) Run scripts to generate the text file
>
> 3)* Run scripts to store the same data (processed in step 2) and
> populate the 2nd db.
>
> The web interface accesses the data in the 2nd database and presents
> the data to the user.
>
>
>
> What do you guys think of this?
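
The three monthly steps in the quoted workflow could be chained by a small driver that stops at the first failure, since each step needs the previous one's output. This is a sketch only. It assumes step 2 is upload_history.py, because that script writes the *.txt files. fetch_data.py and populate_second_db.py are hypothetical names for steps 1 and 3.

```python
import subprocess

# Hypothetical command list; only upload_history.py is named in this thread.
MONTHLY_STEPS = [
    ["python3", "fetch_data.py"],          # 1) populate the current database
    ["python3", "upload_history.py"],      # 2) generate the text files
    ["python3", "populate_second_db.py"],  # 3) store the step-2 data in the 2nd db
]

def run_monthly(steps=MONTHLY_STEPS, dry_run=False):
    """Run each command in order; check=True raises CalledProcessError on failure."""
    executed = []
    for cmd in steps:
        executed.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)
    return executed
```

Calling run_monthly(dry_run=True) only returns the command list, which is handy for checking the order before wiring it into cron.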
The idea is good in principle, but the solution is overengineered. Just
create an aggregation table and fill it straight after the data
import. The web interface could then do simple selects from this table.
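
On the web side, each chart would then need only one single-table select with no joins. The sketch below again uses sqlite3 in place of PostgreSQL and the hypothetical commit_summary table. The JSON list of month/commits objects is an assumed input format for the client-side graph code.

```python
import json
import sqlite3

def chart_data(conn: sqlite3.Connection, project: str) -> str:
    """Per-month totals for one project, read from the aggregation table only."""
    rows = conn.execute(
        "SELECT month, SUM(commits) FROM commit_summary "
        "WHERE project = ? GROUP BY month ORDER BY month",
        (project,),  # parameter binding instead of string formatting
    ).fetchall()
    return json.dumps([{"month": month, "commits": total} for month, total in rows])
```

If per-author detail is not needed on a page, storing monthly totals directly would reduce this to a plain filtered select.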
Kind regards
Andreas.
--
http://fam-tille.de