Noah Levitt
|
1fb336cb2e
|
crawling outlinks not totally working
|
2015-07-11 02:29:19 -07:00 |
|
Noah Levitt
|
56a7bb7306
|
submit outlinks to hq
|
2015-07-10 21:31:41 -07:00 |
|
Noah Levitt
|
fd99764baa
|
brozzler-worker partially working
|
2015-07-10 21:07:47 -07:00 |
|
Noah Levitt
|
8aa1e6715a
|
feed seed url to the crawl url queue
|
2015-07-10 20:12:33 -07:00 |
|
Noah Levitt
|
1d068f4f86
|
starting work on brozzler crawl hq
|
2015-07-10 18:01:54 -07:00 |
|
Noah Levitt
|
fcc63b6675
|
fancier prioritization takes into account hops from seed, path depth; and clean shutdown
|
2015-07-09 22:35:37 -07:00 |
|
Noah Levitt
|
5f3c247e0c
|
trick to avoid crawling same url again too quickly
|
2015-07-09 21:49:55 -07:00 |
|
Noah Levitt
|
7cc777661d
|
fix dumb bug
|
2015-07-09 18:54:09 -07:00 |
|
Noah Levitt
|
783794ca37
|
basic of site/seed crawling with scoping
|
2015-07-09 18:36:07 -07:00 |
|
Noah Levitt
|
92ea701987
|
rudimentary crawling in parallel with multiple browsers
|
2015-07-08 18:50:18 -07:00 |
|
Noah Levitt
|
32abfcac8a
|
fix 'CrawlUrl' object has no attribute 'priority' bug
|
2015-07-08 17:51:09 -07:00 |
|
Noah Levitt
|
4022cc0162
|
simple in-memory frontier with prioritized queues by host
|
2015-07-08 17:44:38 -07:00 |
|
Noah Levitt
|
4042f22497
|
rudimentary link extraction and crawling
|
2015-07-07 16:45:52 -07:00 |
|
Noah Levitt
|
d8a962b29e
|
experimenting with captureScreenshot
|
2015-06-16 18:42:21 -07:00 |
|
Noah Levitt
|
f254e2eec1
|
it's been stable, call it 1.0
|
2015-06-13 11:30:01 -07:00 |
|
Hunter
|
903d2f3107
|
Merge pull request #39 from nlevitt/simple-behaviors
ARI-3775, ARI-3956 Simple behaviors
|
2015-04-16 15:01:49 -07:00 |
|
Noah Levitt
|
73bbd87d5d
|
merge in latest from master and adjust config as needed
|
2015-02-02 14:52:56 -08:00 |
|
Noah Levitt
|
776a6dac68
|
Merge branch 'master' into simple-behaviors
|
2015-02-02 14:49:34 -08:00 |
|
Noah Levitt
|
48b8754f40
|
Merge branch 'master' into simple-behaviors
|
2015-02-02 14:48:26 -08:00 |
|
Noah Levitt
|
db759f1066
|
Merge pull request #32 from adam-miller/ARI-3904
ARI-3904 Instagram behavior to scroll past two pages, and click to enla...
|
2015-02-02 14:47:44 -08:00 |
|
Adam Miller
|
ce47461656
|
Making scrolling and image loading more tolerant of slow loading.
|
2015-01-30 16:55:53 -08:00 |
|
Noah Levitt
|
9e5900c61f
|
ARI-3956 simple behavior for usask.ca slideshows (which also required enhancing the simple behavior logic)
|
2015-01-27 16:03:58 -08:00 |
|
Noah Levitt
|
0901cac2e0
|
Merge pull request #38 from nlevitt/bump-browser-timeout
increase browser start and stop timeouts, since sometimes we strand brow...
|
2015-01-26 21:22:18 -08:00 |
|
Noah Levitt
|
e9c2fc61dd
|
increase browser start and stop timeouts, since sometimes we strand browser processes after starting them, when the machine is very busy
|
2015-01-26 21:09:56 -08:00 |
|
Noah Levitt
|
d467cce221
|
Merge pull request #27 from vonrosen/ari-3774
Allow default behavior to include clicking on sound cloud player buttons embedded in 3rd party sites.
|
2015-01-26 20:58:49 -08:00 |
|
Noah Levitt
|
c5c642a990
|
support for simple behavior that clicks on elements matching configured css selector; and one such behavior for acalog sites ARI-3775
|
2015-01-26 16:58:12 -08:00 |
|
Noah Levitt
|
0647df1ab9
|
behaviors.yaml to configure behaviors, in preparation for "simple" behavior support
|
2015-01-26 16:01:53 -08:00 |
|
Hunter Stern
|
91f9788eb2
|
Add iframe css path to target id for soundcloud buttons.
|
2015-01-21 16:28:29 -08:00 |
|
Hunter Stern
|
e9451f88d8
|
Merge branch 'master' of github.com:internetarchive/umbra into ari-3774
|
2015-01-21 16:21:13 -08:00 |
|
Noah Levitt
|
cdcef934e7
|
rewrite instagram behavior to be more like a state machine; update css selectors for current instagram; refactor as a sort of singleton class for cleaner namespacing
|
2015-01-16 13:21:12 -08:00 |
|
Noah Levitt
|
ddc7064585
|
Merge branch 'master' into ARI-3904
|
2015-01-15 18:37:28 -08:00 |
|
Noah Levitt
|
ffd60d35e6
|
Merge pull request #36 from vonrosen/ari-4150
Allow scrolling down a timeline in the facebook plugin so as to capture content in third party embedded timelines.
|
2014-12-22 21:47:31 -08:00 |
|
Hunter Stern
|
5ea12fd053
|
More refinements.
|
2014-12-19 15:52:13 -08:00 |
|
Hunter Stern
|
8d225b8859
|
More debugging.
|
2014-12-19 15:13:02 -08:00 |
|
Hunter Stern
|
5304f2909d
|
Less verbose logging.
|
2014-12-19 14:35:11 -08:00 |
|
Hunter Stern
|
ae60205648
|
Fix for https://webarchive.jira.com/browse/ARI-4150
|
2014-12-19 14:17:50 -08:00 |
|
Hunter Stern
|
cf88b9968c
|
Merge branch 'master' of github.com:internetarchive/umbra
|
2014-12-12 15:59:25 -08:00 |
|
Noah Levitt
|
1108ef9362
|
Merge pull request #33 from adam-miller/ARI-4016
ARI-4016 - Support: embedded videos on marquette.edu
|
2014-11-21 15:10:53 -08:00 |
|
Adam Miller
|
7f8e6802de
|
Implementing suggestions in pull request.
|
2014-11-07 15:56:05 -08:00 |
|
vonrosen
|
8e6859ef56
|
Merge pull request #35 from nlevitt/amqp-socket-error
properly handle socket.error from amqp conn.drain_events (was previously...
|
2014-11-03 12:09:27 -08:00 |
|
Noah Levitt
|
9053279b4e
|
change default routing key to "urls"
|
2014-11-03 11:54:59 -08:00 |
|
Noah Levitt
|
ab86426475
|
properly handle socket.error from amqp conn.drain_events (was previously diagnosed as error starting browser)
|
2014-11-03 11:54:10 -08:00 |
|
Noah Levitt
|
f40bd39e1a
|
Merge pull request #34 from dhamaniasad/patch-1
Update README.md
|
2014-10-30 19:04:24 -07:00 |
|
Asad Dhamani
|
9231cc2b5c
|
Update README.md
|
2014-10-31 07:02:49 +05:30 |
|
Asad Dhamani
|
e264f09c27
|
Update README.md
|
2014-10-29 12:42:43 +05:30 |
|
Hunter Stern
|
52bb02cbbe
|
Merge branch 'master' of github.com:internetarchive/umbra
|
2014-10-16 20:09:42 +00:00 |
|
vonrosen
|
01ed5a7d4d
|
Merge pull request #28 from internetarchive/ari-3940
Ari 3940 - prioritize scrolling all the way to the bottom
|
2014-10-09 21:21:02 +00:00 |
|
Adam Miller
|
bdf3e73062
|
Wait until big image is loaded before clicking to next image.
|
2014-10-03 14:17:07 -07:00 |
|
Hunter Stern
|
1ee45053c5
|
Even more formatting changes.
|
2014-09-22 14:22:52 -07:00 |
|
Hunter Stern
|
6af3455dbf
|
Improve formatting.
|
2014-09-22 14:21:00 -07:00 |
|