Noah Levitt
|
053767d393
|
bump version again
|
2016-05-05 10:37:58 -07:00 |
|
Noah Levitt
|
cea192b4b3
|
copy over latest behaviors and stuff from umbra
|
2016-05-05 00:58:26 -07:00 |
|
Noah Levitt
|
0af00bb3d5
|
support for host rules in outlink scoping
|
2016-05-03 20:52:22 +00:00 |
|
Noah Levitt
|
df61e55b6b
|
add license headers
|
2016-04-25 20:02:11 +00:00 |
|
Noah Levitt
|
2825ffea15
|
support for extra "blocks" and "accepts" scope rules
|
2016-04-21 22:22:44 +00:00 |
|
Noah Levitt
|
568a553432
|
use the uncanonicalized url as part of the sha1 input to generate the page id, since canonicalization was stripping off the #fragment, and we might want to crawl the same url with different fragments (and there's no option to GoogleURLCanonicalizer to not strip the fragment)
|
2016-04-21 22:01:49 +00:00 |
|
Noah Levitt
|
fee008266f
|
support for one-hop-off (or n-hop-off) scoping
|
2016-04-21 17:41:59 +00:00 |
|
Noah Levitt
|
35b713a2e7
|
little version bump
|
2016-04-07 23:36:05 +00:00 |
|
Noah Levitt
|
919692f9fa
|
pin rethinkdb requirement to 2.3.x (this needs to roughly track deployed version)
|
2016-04-07 23:35:20 +00:00 |
|
Noah Levitt
|
ecb2e44442
|
if youtube-dl fetches pages or makes HEAD requests, look at the responses to determine if the page is html and therefore needs to be browsed; if it doesn't need to be browsed, check if youtube-dl has already fetched it (GET request to final bounce of redirect chain that returned a 200); if not, simply fetch it
|
2016-04-06 17:50:48 -07:00 |
|
Noah Levitt
|
a43b5016e1
|
use a dev version number
|
2016-03-18 02:03:20 +00:00 |
|
Noah Levitt
|
b06381790c
|
honor crawl job stop requests
|
2016-03-08 00:18:54 +00:00 |
|
Noah Levitt
|
d2567f4a13
|
loosen surt req
|
2016-03-02 00:16:58 +00:00 |
|
Noah Levitt
|
4c2ecab856
|
surt==0.3b2 (available on pypi)
|
2015-11-12 02:58:53 +00:00 |
|
Noah Levitt
|
8c69ca3b39
|
giving up on using git revision in version number :( latest issue is when installing a package that calls git to compute a version number, but cwd is some other git project, you get the wrong thing
|
2015-09-24 00:17:33 +00:00 |
|
Noah Levitt
|
9699a40645
|
remove "dev" from version number and switch README to rst
|
2015-09-23 22:35:26 +00:00 |
|
Noah Levitt
|
245078284d
|
pep440 compliant versioning
|
2015-09-23 14:46:57 -07:00 |
|
Noah Levitt
|
2863b7e422
|
goodbye requirements.txt now that we have devpi
|
2015-09-23 00:49:20 +00:00 |
|
Noah Levitt
|
cf91fb1377
|
Revert "use dependency_links instead of requirements.txt in spite of ugliness of --process-dependency-links, #egg=..., so that dependent projects can use brozzler more easily"
Ugh.. too much pain, not worth the time to figure out the magic #egg=
incantation.
This reverts commit 78ca0701651c35bda69122ddf652cbb8d95daeb0.
|
2015-08-26 19:44:04 +00:00 |
|
Noah Levitt
|
78ca070165
|
use dependency_links instead of requirements.txt in spite of ugliness of --process-dependency-links, #egg=..., so that dependent projects can use brozzler more easily
|
2015-08-26 19:22:59 +00:00 |
|
Noah Levitt
|
fd0c3322ee
|
update readme, s/umbra/brozzler/ in most places, delete non-brozzler stuff
|
2015-07-13 17:09:39 -07:00 |
|
Noah Levitt
|
783794ca37
|
basics of site/seed crawling with scoping
|
2015-07-09 18:36:07 -07:00 |
|
Noah Levitt
|
4022cc0162
|
simple in-memory frontier with prioritized queues by host
|
2015-07-08 17:44:38 -07:00 |
|
Noah Levitt
|
f254e2eec1
|
it's been stable, call it 1.0
|
2015-06-13 11:30:01 -07:00 |
|
Noah Levitt
|
c5c642a990
|
support for simple behavior that clicks on elements matching configured css selector; and one such behavior for acalog sites ARI-3775
|
2015-01-26 16:58:12 -08:00 |
|
Noah Levitt
|
0647df1ab9
|
behaviors.yaml to configure behaviors, in preparation for "simple" behavior support
|
2015-01-26 16:01:53 -08:00 |
|
Noah Levitt
|
ed92f3bd53
|
for the version string, use abbreviated commit hash instead of attempting to use the branch name
|
2014-05-29 23:33:14 -07:00 |
|
Noah Levitt
|
bef57e2819
|
for version string, try to handle case where head is detached
|
2014-05-29 20:57:33 -07:00 |
|
Noah Levitt
|
3127e02cbb
|
fancy --version that includes git branch and timestamp of last commit if available
|
2014-05-29 20:43:00 -07:00 |
|
Noah Levitt
|
1e18c2ca74
|
improve helper utilities
|
2014-05-20 16:44:13 -07:00 |
|
Noah Levitt
|
3e4232f32c
|
refactor umbra.py into controller.py and browser.py, improve class names
|
2014-05-20 02:42:40 -07:00 |
|
Noah Levitt
|
cc0ffee508
|
only websocket-client-py3==0.13.1 works right with python3 at the moment, see https://github.com/liris/websocket-client/issues/84
|
2014-05-20 00:57:07 -07:00 |
|
Noah Levitt
|
f3a540b92d
|
setup.py - include behaviors.d/*.js in installation
|
2014-03-13 00:00:32 -07:00 |
|
Noah Levitt
|
4935d55b6e
|
specify classifier 'Programming Language :: Python :: 3.3' since websocket-client-py3 requires python 3.3, doesn't work with 3.2
|
2014-02-12 12:17:41 -08:00 |
|
Eldon
|
bd0183058d
|
Incognito messes with currently running chromium instances, disable it
|
2014-01-23 18:26:20 -05:00 |
|
Eldon
|
4852fbf29f
|
Update setup.py, get rid of unused dependency
|
2014-01-23 16:18:13 -05:00 |
|
Eldon
|
db9eee5f2b
|
Should be full python 3 now
|
2014-01-22 01:32:41 +00:00 |
|
Eldon
|
272a9a3f42
|
Fix readme filename
|
2014-01-21 18:10:43 +00:00 |
|
Eldon
|
fdb62be2ba
|
First commit of umbra
|
2014-01-21 06:41:46 +00:00 |
|