Noah Levitt
|
6a11e1da2a
|
fix noVNC submodule path since brozzler webconsole has moved
|
2016-06-28 16:15:16 +00:00 |
|
Noah Levitt
|
cb4a16e58c
|
handle new bucket format in brozzler-webconsole
|
2016-06-28 00:13:01 +00:00 |
|
Noah Levitt
|
89474cb430
|
convert command-line executables to entry_points console_scripts, best practice according to Python Packaging Authority (eases testing, etc)
|
2016-06-27 14:12:56 -05:00 |
|
Noah Levitt
|
e4f8efe376
|
make brozzler-webconsole a part of the main brozzler package, using optional "extras_require" dependencies
|
2016-06-27 12:43:24 -05:00 |
|
Noah Levitt
|
366e467501
|
enhancements to the page thumbnail on the site page
|
2016-06-22 23:09:27 +00:00 |
|
Noah Levitt
|
9b3f3809cc
|
expose full rethinkdb entry as yaml on job page
|
2016-06-22 22:29:07 +00:00 |
|
Noah Levitt
|
510456eef2
|
order the page thumbnails on site page by least number of hops, so the seed shows up first
|
2016-06-22 21:20:00 +00:00 |
|
Noah Levitt
|
2038598f41
|
fix bug in case no outlinks are found: make brozzler.browser.browse_page() return an empty set instead of a set whose only element is the empty string, {''}
|
2016-06-22 17:43:53 +00:00 |
|
Noah Levitt
|
d198a69e45
|
recurse through all frames to find outlinks
|
2016-06-22 11:39:31 -05:00 |
|
Noah Levitt
|
8d7bb582cb
|
back to a dev version number
|
2016-06-16 14:19:00 -05:00 |
|
Noah Levitt
|
6237ed3f34
|
update url
|
2016-06-16 14:18:24 -05:00 |
|
Noah Levitt
|
9c2fe25dd0
|
back to a dev version number
|
2016-06-16 14:13:03 -05:00 |
|
Noah Levitt
|
b2d4cc5ff0
|
beta version number for pypi upload
|
2016-06-16 14:11:45 -05:00 |
|
Noah Levitt
|
81d709eed0
|
bump version number
|
2016-06-16 13:56:08 -05:00 |
|
Noah Levitt
|
182cbfd0ce
|
bump version so we can upload to pypi and fix the readme
|
2016-05-11 12:10:23 -07:00 |
|
Noah Levitt
|
dd2211df31
|
yes you can install brozzler from the outside world now!
|
2016-05-11 12:09:09 -07:00 |
|
Noah Levitt
|
6f6216e432
|
catch exception from rethinkdb when unregistering from the service registry at shutdown
|
2016-05-11 00:46:50 +00:00 |
|
Noah Levitt
|
1141c5951e
|
add psutil dependency
|
2016-05-09 17:19:53 -07:00 |
|
Noah Levitt
|
464da5c3a6
|
avoid errors with old versions of pip or non-utf-8 locales by specifying the encoding of README.rst
|
2016-05-07 01:46:15 +00:00 |
|
Noah Levitt
|
053767d393
|
bump version again
|
2016-05-05 10:37:58 -07:00 |
|
Noah Levitt
|
cea192b4b3
|
copy over latest behaviors and stuff from umbra
|
2016-05-05 00:58:26 -07:00 |
|
Noah Levitt
|
0af00bb3d5
|
support for host rules in outlink scoping
|
2016-05-03 20:52:22 +00:00 |
|
Noah Levitt
|
df61e55b6b
|
add license headers
|
2016-04-25 20:02:11 +00:00 |
|
Noah Levitt
|
2825ffea15
|
support for extra "blocks" and "accepts" scope rules
|
2016-04-21 22:22:44 +00:00 |
|
Noah Levitt
|
568a553432
|
use the uncanonicalized url as part of the sha1 input to generate the page id, since canonicalization was stripping off the #fragment, and we might want to crawl the same url with different fragments (and there's no option to GoogleURLCanonicalizer to not strip the fragment)
|
2016-04-21 22:01:49 +00:00 |
|
Noah Levitt
|
fee008266f
|
support for one-hop-off (or n-hop-off) scoping
|
2016-04-21 17:41:59 +00:00 |
|
Noah Levitt
|
35b713a2e7
|
little version bump
|
2016-04-07 23:36:05 +00:00 |
|
Noah Levitt
|
919692f9fa
|
pin rethinkdb requirement to 2.3.x (this needs to roughly track deployed version)
|
2016-04-07 23:35:20 +00:00 |
|
Noah Levitt
|
ecb2e44442
|
if youtube-dl fetches pages or makes HEAD requests, look at the responses to determine if the page is html and therefore needs to be browsed; if it doesn't need to be browsed, check if youtube-dl has already fetched it (GET request to final bounce of redirect chain that returned a 200); if not, simply fetch it
|
2016-04-06 17:50:48 -07:00 |
|
Noah Levitt
|
a43b5016e1
|
use a dev version number
|
2016-03-18 02:03:20 +00:00 |
|
Noah Levitt
|
b06381790c
|
honor crawl job stop requests
|
2016-03-08 00:18:54 +00:00 |
|
Noah Levitt
|
d2567f4a13
|
loosen surt req
|
2016-03-02 00:16:58 +00:00 |
|
Noah Levitt
|
4c2ecab856
|
surt==0.3b2 (available on pypi)
|
2015-11-12 02:58:53 +00:00 |
|
Noah Levitt
|
8c69ca3b39
|
giving up on using git revision in version number :( latest issue: when installing a package that calls git to compute a version number while cwd is some other git project, you get the wrong thing
|
2015-09-24 00:17:33 +00:00 |
|
Noah Levitt
|
9699a40645
|
remove "dev" from version number and switch README to rst
|
2015-09-23 22:35:26 +00:00 |
|
Noah Levitt
|
245078284d
|
pep440 compliant versioning
|
2015-09-23 14:46:57 -07:00 |
|
Noah Levitt
|
2863b7e422
|
goodbye requirements.txt now that we have devpi
|
2015-09-23 00:49:20 +00:00 |
|
Noah Levitt
|
cf91fb1377
|
Revert "use dependency_links instead of requirements.txt in spite of ugliness of --process-dependency-links, #egg=..., so that dependent projects can use brozzler more easily"
Ugh... too much pain, not worth the time to figure out the magic #egg= incantation.
This reverts commit 78ca0701651c35bda69122ddf652cbb8d95daeb0.
|
2015-08-26 19:44:04 +00:00 |
|
Noah Levitt
|
78ca070165
|
use dependency_links instead of requirements.txt in spite of ugliness of --process-dependency-links, #egg=..., so that dependent projects can use brozzler more easily
|
2015-08-26 19:22:59 +00:00 |
|
Noah Levitt
|
fd0c3322ee
|
update readme, s/umbra/brozzler/ in most places, delete non-brozzler stuff
|
2015-07-13 17:09:39 -07:00 |
|
Noah Levitt
|
783794ca37
|
basics of site/seed crawling with scoping
|
2015-07-09 18:36:07 -07:00 |
|
Noah Levitt
|
4022cc0162
|
simple in-memory frontier with prioritized queues by host
|
2015-07-08 17:44:38 -07:00 |
|
Noah Levitt
|
f254e2eec1
|
it's been stable, call it 1.0
|
2015-06-13 11:30:01 -07:00 |
|
Noah Levitt
|
c5c642a990
|
support for simple behavior that clicks on elements matching configured css selector; and one such behavior for acalog sites ARI-3775
|
2015-01-26 16:58:12 -08:00 |
|
Noah Levitt
|
0647df1ab9
|
behaviors.yaml to configure behaviors, in preparation for "simple" behavior support
|
2015-01-26 16:01:53 -08:00 |
|
Noah Levitt
|
ed92f3bd53
|
for the version string, use abbreviated commit hash instead of attempting to use the branch name
|
2014-05-29 23:33:14 -07:00 |
|
Noah Levitt
|
bef57e2819
|
for version string, try to handle case where head is detached
|
2014-05-29 20:57:33 -07:00 |
|
Noah Levitt
|
3127e02cbb
|
fancy --version that includes git branch and timestamp of last commit if available
|
2014-05-29 20:43:00 -07:00 |
|
Noah Levitt
|
1e18c2ca74
|
improve helper utilities
|
2014-05-20 16:44:13 -07:00 |
|
Noah Levitt
|
3e4232f32c
|
refactor umbra.py into controller.py and browser.py, improve class names
|
2014-05-20 02:42:40 -07:00 |
|