
Blog Entry for Week Ending March 31

by Broadnax, Levi A. last modified Mar 31, 2017 05:53 PM

This week I managed to exceed my expectations for my web crawler project. Many believe that computers can never truly create art, but they have never seen a collage of every image on their favorite website amalgamated into an orchestra: a chaotic yet calculated tango leaving nothing to be desired. There has been a mountain of troubles, and my commit times range across every hour of the day and night. I have woken up at my computer with bloodshot eyes, attempting to decipher seemingly impossible, illegible code. Still, I can never complain, because this project really showed me that programming is something I am good at.

The largest hurdle for me was getting the images to download in a reasonable amount of time without letting the process die. XKCD.com, for example, has ~150MB of comic strips across thousands of images, and I spent hours trying to use sleep methods like the ones I used in Java in years past. From my JavaScript experience I knew promises were a better way, but Python had a different plan. The asyncio package, featured in the initial AOSA book, allowed for the coroutine decorator pattern, so I could safely download images and yield on completion. After a night of work, downloading and rendering a new website took no more than a few minutes.
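To give a feel for the idea, here is a minimal sketch of concurrent downloads with asyncio. It is not the code from my project: it uses the newer async/await syntax rather than the coroutine decorator, and `fetch_image`, `download_all`, `run_demo`, the example.com URLs, and the concurrency limit of 5 are all made-up names and values for illustration. The "download" is just a short sleep standing in for real network I/O.

```python
import asyncio


async def fetch_image(url, limit):
    """Hypothetical stand-in for a real download: waits, then returns fake bytes."""
    async with limit:  # cap how many downloads are in flight at once
        await asyncio.sleep(0.01)  # placeholder for the actual network request
        return url, b"fake-image-bytes"


async def download_all(urls, max_concurrent=5):
    """Start every download and wait for all of them to finish."""
    limit = asyncio.Semaphore(max_concurrent)
    tasks = [fetch_image(url, limit) for url in urls]
    results = await asyncio.gather(*tasks)  # results come back in input order
    return dict(results)


def run_demo():
    # Illustrative URLs only; a real crawler would collect these from a page.
    urls = [f"https://example.com/comic/{n}.png" for n in range(20)]
    return asyncio.run(download_all(urls))
```

The point is that while one download is waiting on the network, the others keep going, instead of the whole program sleeping. The semaphore is there so a site with thousands of images does not get thousands of simultaneous requests.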

On this project I wanted to give myself a goal that I felt was possibly out of reach, and I've exceeded it. In addition to all of the goals I had, I added a robots.txt parser so the crawler complies with the robots.txt on any site, just in case a user wants to fork my project and use it in an environment where they are constantly crawling websites. My next goal is to build a constantly changing site using this web crawler, and to keep working on this project to learn more about Python, which I had not used prior to this class outside of a single file at work.
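For anyone curious what a robots.txt check looks like, here is a small sketch using the `urllib.robotparser` module from Python's standard library. This is not my own parser, just the built-in one, and the sample rules, the `build_parser` and `allowed` helpers, and the `levi-crawler` user agent name are assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content for illustration.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_text):
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def allowed(parser, url, agent="levi-crawler"):
    """Return True if the given user agent may fetch the URL."""
    return parser.can_fetch(agent, url)


parser = build_parser(SAMPLE_ROBOTS)
print(allowed(parser, "https://example.com/comics/1.png"))
print(allowed(parser, "https://example.com/private/secret.png"))
```

A crawler would run a check like this before every request and simply skip any URL the site has asked crawlers to stay away from.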

This week I plan to spend some time ironing out bugs that have been hindering progress on our group gamification project, and perhaps set up an Amazon Web Services instance with the free GitHub trial to run my web crawler faster than I can at home.

Until next time,

Levi Broadnax
