566 Commits

Author SHA1 Message Date
Noah Levitt
6a11e1da2a fix noVNC submodule path since brozzler webconsole has moved 2016-06-28 16:15:16 +00:00
Noah Levitt
cb4a16e58c handle new bucket format in brozzler-webconsole 2016-06-28 00:13:01 +00:00
Noah Levitt
98915b3d86 fix brozzler.svg symlink 2016-06-27 20:01:35 +00:00
Noah Levitt
89474cb430 convert command-line executables to entry_points console_scripts, best practice according to Python Packaging Authority (eases testing, etc) 2016-06-27 14:12:56 -05:00
Noah Levitt
e4f8efe376 make brozzler-webconsole a part of the main brozzler package, using optional "extras_require" dependencies 2016-06-27 12:43:24 -05:00
Noah Levitt
08a9636e95 remove crufty docker and no-docker scripts 2016-06-27 00:11:37 -05:00
Noah Levitt
55fda6e892 note python 3.4 requirement in readme 2016-06-25 15:44:27 -05:00
Noah Levitt
366e467501 enhancements to the page thumbnail on the site page 2016-06-22 23:09:27 +00:00
Noah Levitt
9b3f3809cc expose full rethinkdb entry as yaml on job page 2016-06-22 22:29:07 +00:00
Noah Levitt
510456eef2 order the page thumbnails on site page by least number of hops, so the seed shows up first 2016-06-22 21:20:00 +00:00
Noah Levitt
2038598f41 fix bug in case no outlinks are found, make brozzler.browser.browse_page() return an empty set instead of a set with one element which is an empty string {''} 2016-06-22 17:43:53 +00:00
Noah Levitt
d198a69e45 recurse through all frames to find outlinks 2016-06-22 11:39:31 -05:00
Noah Levitt
3b615120d4 Merge branch 'master' of github.com:internetarchive/brozzler
* 'master' of github.com:internetarchive/brozzler:
  back to a dev version number
  update url
  back to a dev version number
  beta version number for pypi upload
  bump version number
  call clearInterval when umbraBehaviorFinished is about to return true (see 1ef528eea7)
  copy over fec.gov behavior from umbra master
2016-06-20 16:48:11 +00:00
Noah Levitt
5ccf5a9dcb fix site warcprox-meta lookup now that it is a native json object rather than a string 2016-06-20 16:48:01 +00:00
Noah Levitt
8d7bb582cb back to a dev version number 2016-06-16 14:19:00 -05:00
Noah Levitt
6237ed3f34 update url (tag: 1.1b2) 2016-06-16 14:18:24 -05:00
Noah Levitt
9c2fe25dd0 back to a dev version number 2016-06-16 14:13:03 -05:00
Noah Levitt
b2d4cc5ff0 beta version number for pypi upload 2016-06-16 14:11:45 -05:00
Noah Levitt
81d709eed0 bump version number 2016-06-16 13:56:08 -05:00
Noah Levitt
1577cd8926 call clearInterval when umbraBehaviorFinished is about to return true (see 1ef528eea7) 2016-06-16 13:55:17 -05:00
Noah Levitt
98acc8dc92 copy over fec.gov behavior from umbra master 2016-06-16 13:53:28 -05:00
Noah Levitt
d75e8c394a switch flask requirement to recent release, suggest gunicorn for running the app 2016-06-15 22:00:39 +00:00
Noah Levitt
b0ed4b8128 Merge pull request #7 from galgeek/disable_extensions
disable browser extensions
2016-06-06 11:57:46 -07:00
Noah Levitt
c63c21c30a Merge pull request #6 from ato/document-config
Document the job config format
2016-06-06 11:56:45 -07:00
Barbara Miller
1c1237d07e disable browser extensions 2016-05-27 22:51:38 -07:00
Noah Levitt
92f8f7c16d Merge pull request #5 from ato/fix-brozzler-new-site
brozzler-new-site: Fix warcprox_meta's default value and json import
2016-05-17 10:14:48 -07:00
Alex Osborne
484805fbda proxy is not supposed to have http:// prefix
Looks like the prefixes are added by BrozzleWorker._fetch_url()
2016-05-17 16:20:38 +10:00
Alex Osborne
02af30edd4 Document the job config format 2016-05-17 15:20:09 +10:00
Alex Osborne
a939689d44 Fix warcprox_meta's default value and json import 2016-05-17 13:51:42 +10:00
Noah Levitt
182cbfd0ce bump version so we can upload to pypi and fix the readme 2016-05-11 12:10:23 -07:00
Noah Levitt
dd2211df31 yes you can install brozzler from the outside world now! 2016-05-11 12:09:09 -07:00
Noah Levitt
6f6216e432 catch exception from rethinkdb when unregistering from the service registry at shutdown 2016-05-11 00:46:50 +00:00
Noah Levitt
c6e0e7c507 correctly handle site with no pages (which means the seed was blocked by robots.txt) in frontier.seed_page 2016-05-11 00:45:47 +00:00
Noah Levitt
317a5eb99d without sudo, psutil.net_connections() raises psutil.AccessDenied on mac; in this case, silently try running chrome on the unvetted configured port 2016-05-09 17:25:14 -07:00
Noah Levitt
1141c5951e add psutil dependency 2016-05-09 17:19:53 -07:00
Noah Levitt
c12090b3ef oops, no "+" there 2016-05-07 01:46:36 +00:00
Noah Levitt
464da5c3a6 avoid errors with old versions of pip or non-utf-8 locales by specifying the encoding of README.rst 2016-05-07 01:46:15 +00:00
Noah Levitt
1445aa9976 make Site.warcprox_meta a special thing, replacing Site.extra_headers; this way, warcprox_meta is a dictionary in rethinkdb rather than a long json string 2016-05-05 23:24:10 +00:00
Noah Levitt
07e15e26bd Merge pull request #3 from internetarchive/AITFIVE-859
browser.py - Check for open ports before starting Chrome. Open next a…
2016-05-05 16:00:38 -07:00
Adam Miller
1f7f55a14a browser.py - Fix port search logic 2016-05-05 22:55:45 +00:00
Adam Miller
8e84465ff9 browser.py - Check for open ports before starting Chrome. Open next available on conflict 2016-05-05 22:31:07 +00:00
Noah Levitt
053767d393 bump version again 2016-05-05 10:37:58 -07:00
Noah Levitt
8d618ed135 refactor post-behavior stuff into separate interval function for clarity 2016-05-05 10:37:00 -07:00
Noah Levitt
1ef528eea7 do the clearInterval thing when umbraBehaviorFinished is about to return true on all the behaviors (that have that function)... for the record the impetus for this is to stop scrolling so we can take the screenshot 2016-05-05 10:35:06 -07:00
Noah Levitt
5b492ac6f1 remove old facebook behavior, replaced by facebook.js.template (missed this on commit cea192b) 2016-05-05 10:28:01 -07:00
Noah Levitt
5a2ea2cea4 make brozzle-page utility save the screenshot to a file 2016-05-05 10:10:53 -07:00
Noah Levitt
87af7eaa73 Merge pull request #2 from internetarchive/AITFIVE-832
Restructure browser.py to take screenshot after behavior script.
2016-05-05 10:08:21 -07:00
Noah Levitt
31356d526a Merge branch 'master' into AITFIVE-832
* master:
  copy over latest behaviors and stuff from umbra
  support for host rules in outlink scoping
  recover from rethinkdb error updating service registry
2016-05-05 10:06:12 -07:00
Noah Levitt
cea192b4b3 copy over latest behaviors and stuff from umbra 2016-05-05 00:58:26 -07:00
Adam Miller
6e4e28d2df Modifying default.js behavior to stop the interval function when umbraBehaviorFinished returns true
We should do this in all behaviors ultimately to stop the behavior script upon completion
2016-05-05 01:03:57 +00:00