Tuesday, February 17, 2009

Why Django ORM Sucks: It takes a hell of a lot of memory in processing.

Following an overwhelming response from readers, this post has now moved to:

http://www.nitinh.com/2009/02/why-django-orm-sucks-it-takes-a-hell-lot-of-memory-in-processing/

15 comments:

  1. Use Paginator (http://www.djangoproject.com/documentation/models/pagination/) to process your rows in chunks of 1000 or so:

    p = Paginator(Rating.objects.all(), 1000)
    for i in xrange(p.num_pages):
        page = p.page(i)
        for rating in page.object_list:
            ....

    That should reduce the memory overhead when processing large rowsets.

  2. Sorry, should be "for i in p.page_range" (page numbers start at 1, so p.page(0) raises an error). But you get the idea; a corrected sketch follows below.

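    Putting the two comments together, a minimal sketch of the corrected loop, assuming the Rating model from the post; process() is just a placeholder for whatever is done with each row:

    from django.core.paginator import Paginator

    p = Paginator(Rating.objects.all(), 1000)
    for page_number in p.page_range:        # page numbers start at 1
        page = p.page(page_number)
        for rating in page.object_list:
            # only one chunk of ~1000 rows is materialised at a time
            process(rating)                 # process() stands in for your own handling
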
  3. Django's objects are not identity-mapped, so in effect you're loading the movie table into memory multiple times. What you should do is reuse the first example and replace this:
    cache.set(r.movie.id, [(r.user.id, r.rating)], 86400)

    with this:
    cache.set(r.movie_id, [(r.user_id, r.rating)], 86400)

    Memory usage should then be roughly the size of the rating table (see the sketch below).

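    A minimal sketch of the difference, reusing the identifiers from the post (a Rating model with ForeignKeys to Movie and User):

    from django.core.cache import cache

    for r in Rating.objects.all():
        # r.movie.id follows the ForeignKey: Django fetches and builds a Movie
        # object for every rating, and with no identity map the same movie is
        # rebuilt over and over again.
        # r.movie_id reads the raw foreign-key column already on the rating row,
        # so no Movie (or User) objects are created at all.
        cache.set(r.movie_id, [(r.user_id, r.rating)], 86400)
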
  4. Are you running with DEBUG=True, by any chance? If so, Django keeps a record of every SQL statement it executes (connection.queries) for debugging purposes. Try turning debug off in your settings and see if that helps; a quick check is sketched below.

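    If debug has to stay on, a quick way to confirm whether the query log is what is growing; connection.queries and reset_queries() are standard Django, and the loop body is the one from the post:

    from django.db import connection, reset_queries
    from django.core.cache import cache

    for i, r in enumerate(Rating.objects.all()):
        cache.set(r.movie.id, [(r.user.id, r.rating)], 86400)
        if i % 100000 == 0:
            # with DEBUG=True every executed SQL statement is appended here
            print i, len(connection.queries)
            reset_queries()   # empty the accumulated query log
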
  5. You are doing the wrong thing with the wrong tools.
    If you want an in-memory database, set up MySQL Cluster and stop worrying. If you want to store non-ORM database data in memcached, you don't need to go through the ORM at all (it builds an object for every record you process), or at the very least you could manually release the already-processed objects.

  6. As per the suggestion I changed
    cache.set(r.movie.id, [(r.user.id, r.rating)], 86400)

    to
    cache.set(r.movie_id, [(r.user_id, r.rating)], 86400)

    Memory consumption is drastically reduced, but I still don't know why it keeps gradually increasing.

  7. Are you using DEBUG=True? That might be it.

  8. I second the DEBUG=True bit, nothing kills memory faster than that. If you really need to, try forcing GC to happen after every so many rows with DEBUG disabled, and you should be able to cap your memory usage (sketched below).

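    A minimal sketch of that idea, with DEBUG off and an explicit collection pass every so many rows (the 50000 interval is arbitrary):

    import gc

    from django.core.cache import cache

    for i, r in enumerate(Rating.objects.all()):
        cache.set(r.movie_id, [(r.user_id, r.rating)], 86400)
        if i % 50000 == 0:
            gc.collect()   # force a full garbage-collection pass
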
  9. http://docs.djangoproject.com/en/dev/ref/models/querysets/#iterator

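    The link above is to QuerySet.iterator(); a minimal sketch of how it would be applied here, assuming the same Rating model and cache setup:

    from django.core.cache import cache

    # .iterator() bypasses the queryset's internal result cache, so the
    # QuerySet itself does not keep every Rating object it has yielded.
    for r in Rating.objects.all().iterator():
        cache.set(r.movie_id, [(r.user_id, r.rating)], 86400)
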
  10. DEBUG mode is False, so that is not the problem.

  11. toxik: .iterator() didn't help, I tried that already.

  12. TheOne, I don't suppose you have the memory/time stats for the .iterator() version - just out of interest.

  13. I'm also surprised that .iterator() didn't help. Using .iterator() instead of .__iter__() bypasses the queryset's internal result cache, so the cache never gets filled with the entire contents of the table; letting that cache fill up is exactly what you don't want in this situation and is the usual explanation for memory consumption that keeps increasing.

  14. This comment has been removed by the author.
