Blog Entry for the Week Ending March 17

by Broadnax, Levi A. last modified Mar 31, 2017 05:23 PM

Over the past two weeks I have accomplished quite a bit. I have made major strides with my web crawler, and I have removed many of the early bugs from our team project, a Plone gamification module. Using TeamCity (TC), I set up a local continuous integration (CI) server, which will be immensely useful once I begin using the web crawler to generate a new website. In addition to using TC for CI, I have pointed its version control system (VCS) root at the main branch of my GitHub repository. Luckily, TC makes it easy to add more VCS roots using different user agents, so I can do as little grunt work as possible and let the server handle the rest. I have also set up a performance monitor and an issue tracker with the same software. The performance monitor will become increasingly important as I progress, since it lets me check that each step does not increase the crawler's processing time by too much.

I have also made major headway on the actual development of the web-crawler-generator. The first clone of the repository was disappointing: all 17 of the pre-written unit tests failed. I fixed one cause of the failures by removing a deprecated parameter from one of the project's functions, after which 12 unit tests were still failing with an odd error: cannot register to a frozen router. The fix for those was to change requirements.txt to downgrade the aiohttp version. This taught me a valuable lesson: do not use >= in a dependency list, because you never know what will later be deprecated.
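To make the lesson about >= concrete, here is a minimal sketch of a check that flags requirement lines with no exact pin or upper bound. The function name and the sample package names and versions are hypothetical and are not taken from the project's actual requirements.txt. It is a rough text scan, not a full parser for the requirements format.

```python
import re

# A requirement counts as "bounded" if it has an exact pin (==),
# a compatible-release pin (~=), or an upper bound (< or <=).
_BOUNDED = re.compile(r"(==|~=|<=|<)")


def unbounded_requirements(lines):
    """Return the requirement lines that have no exact pin or upper bound."""
    flagged = []
    for raw in lines:
        # Drop comments and environment markers before inspecting the line.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if not line:
            continue
        if not _BOUNDED.search(line):
            flagged.append(line)
    return flagged


if __name__ == "__main__":
    # Hypothetical sample lines, for illustration only.
    sample = ["examplepkg>=1.0", "otherpkg==2.3.1", "thirdpkg>=0.5,<1.0"]
    for line in unbounded_requirements(sample):
        print("no upper bound:", line)
```

Running a check like this in CI would surface an open-ended dependency before a new upstream release breaks the test suite.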

I found that the web crawler did not comply with the robots.txt specification, so I took it upon myself to build on the urllib library the project already includes and use urllib.robotparser. On top of the usual issues that come with a new library, I found that my robots.txt parser implementation increased the runtime dramatically: a crawl now takes an hour, far longer than with the original crawler. I have some ideas for fixing this and plan to have those fixes in place before the next blog entry.
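One possible cause of a slowdown like this is downloading and parsing robots.txt again for every URL. That is an assumption; the actual cause is not stated here. The sketch below keeps one parsed urllib.robotparser.RobotFileParser per host so each robots.txt is fetched only once. The RobotsCache class and the fetch_lines callable are hypothetical names; in an aiohttp-based crawler, fetch_lines would wrap whatever the crawler already uses to download a page.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


class RobotsCache:
    """Cache one parsed robots.txt per host (hypothetical helper)."""

    def __init__(self, user_agent, fetch_lines):
        self.user_agent = user_agent
        # fetch_lines: callable taking a robots.txt URL and returning
        # the file's contents as a list of text lines.
        self.fetch_lines = fetch_lines
        self._parsers = {}

    def allowed(self, url):
        """Return True if robots.txt permits this user agent to fetch url."""
        parts = urlsplit(url)
        host = f"{parts.scheme}://{parts.netloc}"
        parser = self._parsers.get(host)
        if parser is None:
            # First URL seen for this host: fetch and parse robots.txt once.
            parser = RobotFileParser()
            parser.parse(self.fetch_lines(host + "/robots.txt"))
            self._parsers[host] = parser
        return parser.can_fetch(self.user_agent, url)
```

With this shape, the per-URL cost after the first request to a host is a dictionary lookup plus a rule match, not another network round trip.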

I have almost finished the quiet reporting functionality that I specified in my requirements document. It prints progress on a single line, the way many CLI installers do, instead of the verbose output from before. I plan to complete this reporting feature and the entire crawling part by the end of next week, and to have a proof of concept of the website generator functionality.
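Single-line reporting of this kind is commonly done by writing a carriage return before each update so the new text overwrites the old. The following is a minimal sketch of that technique, not the crawler's actual implementation; the function names, the status format, and the 79-column width are assumptions.

```python
import sys


def format_status(done, total, url, width=79):
    """Build a fixed-width status line so a short update fully covers a long one."""
    line = f"[{done}/{total}] {url}"
    if len(line) > width:
        line = line[: width - 3] + "..."
    return line.ljust(width)


def report(done, total, url, stream=sys.stdout):
    """Overwrite the current terminal line with the latest status."""
    # "\r" returns the cursor to the start of the line without advancing,
    # so the next write lands on top of the previous status.
    stream.write("\r" + format_status(done, total, url))
    stream.flush()


if __name__ == "__main__":
    pages = ["http://example.com/", "http://example.com/about"]
    for i, page in enumerate(pages, start=1):
        report(i, len(pages), page)
    sys.stdout.write("\n")  # finish the line once crawling is done
```

Padding every status to the same width matters: without it, leftover characters from a longer URL would remain visible after a shorter one is printed.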

Until next time,

Levi Broadnax
