Commit Graph

131 Commits

Author SHA1 Message Date
Erik Johnston
ae79764fe5 Change expires column to expires_ts 2017-09-28 12:37:53 +01:00
Erik Johnston
9ccb4226ba Delete expired url cache data 2017-09-28 12:18:06 +01:00
Erik Johnston
7fe8ed1787 Store URL cache preview downloads seperately
This makes it easier to clear old media out at a later date
2017-06-23 11:14:11 +01:00
Erik Johnston
b8b936a6ea Add API to quarantine media 2017-06-19 17:39:21 +01:00
Erik Johnston
48d2949416 Throw exception when not retrying when downloading media 2017-06-13 10:23:14 +01:00
Matthew Hodgson
836d5c44b6 actually trim oversize og:description meta 2017-05-22 21:14:20 +01:00
Erik Johnston
d12ae7fd1c Don't log exceptions for NotRetryingDestination 2017-05-15 15:42:18 +01:00
Richard van der Hoff
1d09586599 Address review comments
- don't blindly proxy all HTTPRequestExceptions
- log unexpected exceptions at error
- avoid `isinstance`
- improve docs on `from_http_response_exception`
2017-03-14 14:15:37 +00:00
Richard van der Hoff
170ccc9de5 Fix routing loop when fetching remote media
When we proxy a media request to a remote server, add a query-param, which will
tell the remote server to 404 if it doesn't recognise the server_name.

This should fix a routing loop where the server keeps forwarding back to
itself.

Also improves the error handling on remote media fetches, so that we don't
always return a rather obscure 502.
2017-03-13 16:30:36 +00:00
Jurek
aea5461488 Fix dynamic thumbnails aspect 2017-02-24 22:43:27 +01:00
Mark Haines
32019c9897 Log which files we saved attachments to in the media_repository 2017-01-10 14:19:50 +00:00
Erik Johnston
f7085ac84f Name linearizer's for better logs 2017-01-09 17:17:10 +00:00
Marcin Bachry
24c16fc349 Fix crash in url preview when html tag has no text
Signed-off-by: Marcin Bachry <hegel666@gmail.com>
2016-12-14 22:38:18 +01:00
Johannes Löthberg
32c8b5507c preview_url_resource: Ellipsis must be in unicode string
Signed-off-by: Johannes Löthberg <johannes@kyriasis.com>
2016-12-01 13:12:13 +01:00
Mark Haines
b1c27975d0 Set CORs headers on responses from the media repo 2016-11-02 11:29:25 +00:00
Erik Johnston
d51b8a1674 Add quotes and be explicity about script-src 2016-09-05 17:35:01 +01:00
Erik Johnston
662b031a30 Allow PDF to be rendered from media repo 2016-09-05 17:25:26 +01:00
Erik Johnston
0af9e1a637 Set Content-Security-Policy on media repo
This is to inform browsers that they should sandbox the returned
media. This is particularly cruical for javascript/HTML files.
2016-08-17 16:27:39 +01:00
Erik Johnston
f90b3d83a3 Add None check to _iterate_over_text 2016-08-17 15:17:17 +01:00
Erik Johnston
109a560905 Flake8 2016-08-16 14:57:21 +01:00
Erik Johnston
48b5829aea Fix up preview URL API. Add tests.
This includes:

- Splitting out methods of a class into stand alone functions, to make
  them easier to test.
- Adding unit tests to split out functions, testing HTML -> preview.
- Handle the fact that elements in lxml may have tail text.
2016-08-16 14:53:24 +01:00
Erik Johnston
5bcccfde6c Don't include html comments in description 2016-08-05 14:45:11 +01:00
Erik Johnston
b5525c76d1 Typo 2016-08-04 16:10:08 +01:00
Erik Johnston
e97648c4e2 Test summarization 2016-08-04 16:09:09 +01:00
Erik Johnston
58c9653c6b Don't infer paragrahs from newlines 2016-08-02 18:50:24 +01:00
Erik Johnston
6b58ade2f0 Comment on why we clone 2016-08-02 18:41:22 +01:00
Erik Johnston
9e66c58ceb Spelling. 2016-08-02 18:37:31 +01:00
Erik Johnston
f83f5fbce8 Make it actually compile 2016-08-02 18:32:42 +01:00
Erik Johnston
aecaec3e10 Change the way we summarize URLs
Using XPath is slow on some machines (for unknown reasons), so use a
different approach to get a list of text nodes.

Try to generate a summary that respect paragraph and then word
boundaries, adding ellipses when appropriate.
2016-08-02 18:25:53 +01:00
Erik Johnston
f52cb4cd78 Remove race 2016-06-29 15:24:50 +01:00
Erik Johnston
a70688445d Implement purge_media_cache admin API 2016-06-29 14:57:59 +01:00
Erik Johnston
314b146b2e Track approximate last access time for remote media 2016-06-29 11:41:20 +01:00
Erik Johnston
09a17f965c Line lengths 2016-06-15 16:58:12 +01:00
Erik Johnston
1e9026e484 Handle floats as img widths 2016-06-15 16:58:05 +01:00
Erik Johnston
a60169ea09 Handle og props with not content 2016-06-15 16:57:48 +01:00
Erik Johnston
eba4ff1bcb 502 on /thumbnail when can't contact remote server 2016-06-09 11:29:43 +01:00
Mark Haines
eb79110beb Clean up the blacklist/whitelist handling.
Always set the config key with an empty list, even if a list isn't specified.
This means that the codepaths are the same for both the empty list and
for a missing key. Since the behaviour is the same for both cases this
makes the code somewhat easier to reason about.
2016-05-16 13:03:59 +01:00
Mark Haines
8d7ad44331 Report per request metrics for all of the things using request_handler 2016-04-28 10:57:49 +01:00
Erik Johnston
e8884e5e9c Add self.media_repo to PreviewUrlResource 2016-04-19 14:51:34 +01:00
Erik Johnston
a7001c311b _make_dirs was moved to MediaRepository 2016-04-19 14:49:31 +01:00
Erik Johnston
9181e2f4c7 Add store to PreviewUrlResource 2016-04-19 14:48:24 +01:00
Erik Johnston
fb76a81ff7 Reorder imports 2016-04-19 14:45:05 +01:00
Erik Johnston
0c93df89b6 Move MediaRepository to media_repository module 2016-04-19 11:31:43 +01:00
Erik Johnston
43f0941e8f Split out BaseMediaResource into MediaRepository
This is so that a single MediaRepository can be shared across all
resources, rather than having a "copy" per resource.

In particular this allows us to guard against both the thumbnail and
download resource triggering a download of remote content at the same
time.
2016-04-19 11:24:59 +01:00
Matthew Hodgson
aaabbd3e9e explicitly pass in the charset from Content-Type to lxml to fix cyrillic woes better 2016-04-15 14:32:25 +01:00
Matthew Hodgson
84f9cac4d0 fix cyrillic URL previews by hardcoding all page decoding to UTF-8 for now, rather than relying on lxml's heuristics which seem to get it wrong 2016-04-15 13:20:08 +01:00
Matthew Hodgson
f78b479118 fix urlparse import thinko breaking tiny URLs 2016-04-14 15:23:55 +01:00
Matthew Hodgson
bd77216d06 comment out 2c838f6459 due to risk of https://en.wikipedia.org/wiki/Billion_laughs attacks - thanks @torhve 2016-04-14 14:39:24 +01:00
Erik Johnston
d0633e6dbe Sanitize the optional dependencies for spider API 2016-04-13 13:38:09 +01:00
Erik Johnston
17515bae14 PEP8 2016-04-11 11:02:50 +01:00