Erik Johnston
4d7e1dde70
Remove unnecessary diff
2017-10-13 11:36:32 +01:00
Erik Johnston
ae5d18617a
Make things be absolute paths again
2017-10-13 11:35:44 +01:00
Erik Johnston
9732ec6797
s/write_to_file/write_to_file_and_backup/
2017-10-13 11:34:41 +01:00
Erik Johnston
0e28281a02
Fix up
2017-10-13 11:33:49 +01:00
Erik Johnston
505371414f
Fix up thumbnailing function
2017-10-13 11:23:53 +01:00
Erik Johnston
e3428d26ca
Fix typo
2017-10-13 10:39:59 +01:00
Erik Johnston
35332298ef
Fix up comments
2017-10-13 10:39:32 +01:00
Erik Johnston
64db043a71
Move makedirs to thread
2017-10-13 10:25:01 +01:00
Erik Johnston
b60859d6cc
Use make_deferred_yieldable
2017-10-13 10:24:19 +01:00
Erik Johnston
d76621a47b
Fix comments
2017-10-12 18:16:25 +01:00
Erik Johnston
4ae85ae121
Don't close prematurely..
2017-10-12 17:57:31 +01:00
Erik Johnston
cc505b4b5e
getvalue closes buffer
2017-10-12 17:52:30 +01:00
Erik Johnston
1259a76047
Get len before close
2017-10-12 17:39:23 +01:00
Erik Johnston
802ca12d05
Don't close file prematurely
2017-10-12 17:37:21 +01:00
Erik Johnston
e283b555b1
Copy everything to backup
2017-10-12 17:31:24 +01:00
Erik Johnston
b77a13812c
Typo
2017-10-12 15:32:32 +01:00
Erik Johnston
6dfde6d485
Remove dead code
2017-10-12 15:30:26 +01:00
Erik Johnston
c8eeef6947
Fix typos
2017-10-12 15:28:24 +01:00
Erik Johnston
67cb89fbdf
Fix typo
2017-10-12 15:23:41 +01:00
Erik Johnston
bf4fb1fb40
Basic implementation of backup media store
2017-10-12 15:20:59 +01:00
Erik Johnston
d5694ac5fa
Only log if we've removed media
2017-09-28 16:08:08 +01:00
Erik Johnston
7cc483aa0e
Clear up expired url cache every 10s
2017-09-28 13:56:53 +01:00
Erik Johnston
e1e7d76cf1
Actually assign result to variable
2017-09-28 13:55:29 +01:00
Erik Johnston
5f501ec7e2
Fix typo in url cache expiry timer
2017-09-28 12:59:01 +01:00
Erik Johnston
ace8079086
Support new and old style media id formats
2017-09-28 12:52:51 +01:00
Erik Johnston
ae79764fe5
Change expires column to expires_ts
2017-09-28 12:37:53 +01:00
Erik Johnston
9ccb4226ba
Delete expired url cache data
2017-09-28 12:18:06 +01:00
Erik Johnston
7fe8ed1787
Store URL cache preview downloads seperately
...
This makes it easier to clear old media out at a later date
2017-06-23 11:14:11 +01:00
Erik Johnston
b8b936a6ea
Add API to quarantine media
2017-06-19 17:39:21 +01:00
Erik Johnston
48d2949416
Throw exception when not retrying when downloading media
2017-06-13 10:23:14 +01:00
Matthew Hodgson
836d5c44b6
actually trim oversize og:description meta
2017-05-22 21:14:20 +01:00
Erik Johnston
d12ae7fd1c
Don't log exceptions for NotRetryingDestination
2017-05-15 15:42:18 +01:00
Richard van der Hoff
1d09586599
Address review comments
...
- don't blindly proxy all HTTPRequestExceptions
- log unexpected exceptions at error
- avoid `isinstance`
- improve docs on `from_http_response_exception`
2017-03-14 14:15:37 +00:00
Richard van der Hoff
170ccc9de5
Fix routing loop when fetching remote media
...
When we proxy a media request to a remote server, add a query-param, which will
tell the remote server to 404 if it doesn't recognise the server_name.
This should fix a routing loop where the server keeps forwarding back to
itself.
Also improves the error handling on remote media fetches, so that we don't
always return a rather obscure 502.
2017-03-13 16:30:36 +00:00
Jurek
aea5461488
Fix dynamic thumbnails aspect
2017-02-24 22:43:27 +01:00
Mark Haines
32019c9897
Log which files we saved attachments to in the media_repository
2017-01-10 14:19:50 +00:00
Erik Johnston
f7085ac84f
Name linearizer's for better logs
2017-01-09 17:17:10 +00:00
Marcin Bachry
24c16fc349
Fix crash in url preview when html tag has no text
...
Signed-off-by: Marcin Bachry <hegel666@gmail.com>
2016-12-14 22:38:18 +01:00
Johannes Löthberg
32c8b5507c
preview_url_resource: Ellipsis must be in unicode string
...
Signed-off-by: Johannes Löthberg <johannes@kyriasis.com>
2016-12-01 13:12:13 +01:00
Mark Haines
b1c27975d0
Set CORs headers on responses from the media repo
2016-11-02 11:29:25 +00:00
Erik Johnston
d51b8a1674
Add quotes and be explicity about script-src
2016-09-05 17:35:01 +01:00
Erik Johnston
662b031a30
Allow PDF to be rendered from media repo
2016-09-05 17:25:26 +01:00
Erik Johnston
0af9e1a637
Set Content-Security-Policy
on media repo
...
This is to inform browsers that they should sandbox the returned
media. This is particularly cruical for javascript/HTML files.
2016-08-17 16:27:39 +01:00
Erik Johnston
f90b3d83a3
Add None check to _iterate_over_text
2016-08-17 15:17:17 +01:00
Erik Johnston
109a560905
Flake8
2016-08-16 14:57:21 +01:00
Erik Johnston
48b5829aea
Fix up preview URL API. Add tests.
...
This includes:
- Splitting out methods of a class into stand alone functions, to make
them easier to test.
- Adding unit tests to split out functions, testing HTML -> preview.
- Handle the fact that elements in lxml may have tail text.
2016-08-16 14:53:24 +01:00
Erik Johnston
5bcccfde6c
Don't include html comments in description
2016-08-05 14:45:11 +01:00
Erik Johnston
b5525c76d1
Typo
2016-08-04 16:10:08 +01:00
Erik Johnston
e97648c4e2
Test summarization
2016-08-04 16:09:09 +01:00
Erik Johnston
58c9653c6b
Don't infer paragrahs from newlines
2016-08-02 18:50:24 +01:00
Erik Johnston
6b58ade2f0
Comment on why we clone
2016-08-02 18:41:22 +01:00
Erik Johnston
9e66c58ceb
Spelling.
2016-08-02 18:37:31 +01:00
Erik Johnston
f83f5fbce8
Make it actually compile
2016-08-02 18:32:42 +01:00
Erik Johnston
aecaec3e10
Change the way we summarize URLs
...
Using XPath is slow on some machines (for unknown reasons), so use a
different approach to get a list of text nodes.
Try to generate a summary that respect paragraph and then word
boundaries, adding ellipses when appropriate.
2016-08-02 18:25:53 +01:00
Erik Johnston
f52cb4cd78
Remove race
2016-06-29 15:24:50 +01:00
Erik Johnston
a70688445d
Implement purge_media_cache admin API
2016-06-29 14:57:59 +01:00
Erik Johnston
314b146b2e
Track approximate last access time for remote media
2016-06-29 11:41:20 +01:00
Erik Johnston
09a17f965c
Line lengths
2016-06-15 16:58:12 +01:00
Erik Johnston
1e9026e484
Handle floats as img widths
2016-06-15 16:58:05 +01:00
Erik Johnston
a60169ea09
Handle og props with not content
2016-06-15 16:57:48 +01:00
Erik Johnston
eba4ff1bcb
502 on /thumbnail when can't contact remote server
2016-06-09 11:29:43 +01:00
Mark Haines
eb79110beb
Clean up the blacklist/whitelist handling.
...
Always set the config key with an empty list, even if a list isn't specified.
This means that the codepaths are the same for both the empty list and
for a missing key. Since the behaviour is the same for both cases this
makes the code somewhat easier to reason about.
2016-05-16 13:03:59 +01:00
Mark Haines
8d7ad44331
Report per request metrics for all of the things using request_handler
2016-04-28 10:57:49 +01:00
Erik Johnston
e8884e5e9c
Add self.media_repo to PreviewUrlResource
2016-04-19 14:51:34 +01:00
Erik Johnston
a7001c311b
_make_dirs was moved to MediaRepository
2016-04-19 14:49:31 +01:00
Erik Johnston
9181e2f4c7
Add store to PreviewUrlResource
2016-04-19 14:48:24 +01:00
Erik Johnston
fb76a81ff7
Reorder imports
2016-04-19 14:45:05 +01:00
Erik Johnston
0c93df89b6
Move MediaRepository to media_repository module
2016-04-19 11:31:43 +01:00
Erik Johnston
43f0941e8f
Split out BaseMediaResource into MediaRepository
...
This is so that a single MediaRepository can be shared across all
resources, rather than having a "copy" per resource.
In particular this allows us to guard against both the thumbnail and
download resource triggering a download of remote content at the same
time.
2016-04-19 11:24:59 +01:00
Matthew Hodgson
aaabbd3e9e
explicitly pass in the charset from Content-Type to lxml to fix cyrillic woes better
2016-04-15 14:32:25 +01:00
Matthew Hodgson
84f9cac4d0
fix cyrillic URL previews by hardcoding all page decoding to UTF-8 for now, rather than relying on lxml's heuristics which seem to get it wrong
2016-04-15 13:20:08 +01:00
Matthew Hodgson
f78b479118
fix urlparse import thinko breaking tiny URLs
2016-04-14 15:23:55 +01:00
Matthew Hodgson
bd77216d06
comment out 2c838f6459
due to risk of https://en.wikipedia.org/wiki/Billion_laughs attacks - thanks @torhve
2016-04-14 14:39:24 +01:00
Erik Johnston
d0633e6dbe
Sanitize the optional dependencies for spider API
2016-04-13 13:38:09 +01:00
Erik Johnston
17515bae14
PEP8
2016-04-11 11:02:50 +01:00
Matthew Hodgson
5ffacc5e84
fix typos and needless try/except from PR review
2016-04-11 10:39:16 +01:00
Matthew Hodgson
83b2f83da0
actually throw meaningful errors
2016-04-08 21:36:59 +01:00
Mark Haines
b36270b5e1
Fix pep8 warning
2016-04-08 19:52:23 +01:00
Matthew Hodgson
1ccabe2965
more PR feedback
2016-04-08 18:58:08 +01:00
Matthew Hodgson
dafef5a688
Add url_preview_enabled config option to turn on/off preview_url endpoint. defaults to off.
...
Add url_preview_ip_range_blacklist to let admins specify internal IP ranges that must not be spidered.
Add url_preview_url_blacklist to let admins specify URL patterns that must not be spidered.
Implement a custom SpiderEndpoint and associated support classes to implement url_preview_ip_range_blacklist
Add commentary and generally address PR feedback
2016-04-08 18:37:15 +01:00
Matthew Hodgson
cf51c4120e
report image size (bytewise) in OG meta
2016-04-03 23:57:05 +01:00
Matthew Hodgson
0834b152fb
char encoding
2016-04-03 12:59:27 +01:00
Matthew Hodgson
8b98a7e8c3
pep8
2016-04-03 12:56:29 +01:00
Matthew Hodgson
eab4d462f8
fix etag typing error. fix timestamp typing error
2016-04-03 02:02:46 +01:00
Matthew Hodgson
c3916462f6
rebase all image URLs
2016-04-03 01:33:12 +01:00
Matthew Hodgson
110780b18b
remove stale todo
2016-04-03 00:48:31 +01:00
Matthew Hodgson
b09e29a03c
Ensure only one download for a given URL is active at a time
2016-04-03 00:47:40 +01:00
Matthew Hodgson
7426c86eb8
add a persistent cache of URL lookups, and fix up the in-memory one to work
2016-04-03 00:31:57 +01:00
Matthew Hodgson
d1b154a10f
support gzip compression, and don't pass through error msgs
2016-04-02 03:06:39 +01:00
Matthew Hodgson
9377157961
how was _respond_default_thumbnail ever meant to work?
2016-04-02 02:31:45 +01:00
Matthew Hodgson
2c838f6459
pass back SVGs as their own thumbnails
2016-04-02 02:30:07 +01:00
Matthew Hodgson
5037ee0d37
handle missing dimensions without crashing
2016-04-02 02:29:57 +01:00
Matthew Hodgson
b26e8604f1
make meta comparisons case insensitive
2016-04-02 01:35:44 +01:00
Matthew Hodgson
5fd07da764
refactor calc_og; spider image URLs; fix xpath; add a (broken) expiringcache; loads of other fixes
2016-04-02 00:35:49 +01:00
Matthew Hodgson
c60b751694
fix assorted redirect, unicode and screenscraping bugs
2016-04-01 02:17:48 +01:00
Matthew Hodgson
683e564815
handle spidered relative images correctly
2016-03-31 23:52:58 +01:00
Matthew Hodgson
72550c3803
prevent choking on invalid utf-8, and handle image thumbnailing smarter
2016-03-31 15:14:14 +01:00
Matthew Hodgson
bb9a2ca87c
synthesise basig OG metadata from pages lacking it
2016-03-31 14:15:09 +01:00
Matthew Hodgson
a8a5dd3b44
handle requests with missing content-length headers (e.g. YouTube)
2016-03-31 01:55:21 +01:00
Matthew Hodgson
ae5831d303
fix bugs
2016-03-29 03:32:55 +01:00