Will Hunt
91206e09f2
changelog & isort
2018-12-09 17:39:44 +00:00
Will Hunt
dbf736ba66
Make /config more CORS-y
2018-12-09 13:27:22 +00:00
Amber Brown
8b1affe7d5
Fix Content-Disposition in media repository ( #4176 )
2018-11-15 15:55:58 -06:00
Amber Brown
df758e155d
Use <meta> tags to discover the per-page encoding of html previews ( #4183 )
2018-11-15 11:05:08 -06:00
Amber Brown
b3708830b8
Fix URL preview bugs (type error when loading cache from db, content-type including quotes) ( #4157 )
2018-11-08 01:37:43 +11:00
Amber Brown
4cd1c9f2ff
Delete the disused & unspecced identicon functionality ( #4106 )
2018-10-29 23:57:24 +11:00
Richard van der Hoff
ef771cc4c2
Fix a number of flake8 errors
...
Broadly three things here:
* disable W504 which seems a bit whacko
* remove a bunch of `as e` expressions from exception handlers that don't use
them
* use `r""` for strings which include backslashes
Also, we don't use pep8 any more, so we can get rid of the duplicate config
there.
2018-10-24 10:39:03 +01:00
Richard van der Hoff
5c445114d3
Correctly account for cpu usage by background threads ( #4074 )
...
Wrap calls to deferToThread() in a thing which uses a child logcontext to
attribute CPU usage to the right request.
While we're in the area, remove the logcontext_tracer stuff, which is never
used, and afaik doesn't work.
Fixes #4064
2018-10-23 13:12:32 +01:00
Erik Johnston
f6a0a02a62
Fix bug where we raised StopIteration in a generator
...
This made python 3.7 unhappy
2018-10-17 16:10:52 +01:00
Richard van der Hoff
4c3e7eeec5
Merge pull request #3932 from matrix-org/erikj/auto_start_expiring_caches
...
Fix some instances of ExpiringCache not expiring cache items
2018-09-25 12:02:57 +01:00
Jérémy Farnaud
6cf261930a
added "media-src: 'self'" to CSP for resources ( #3578 )
...
Synapse doesn’t allow for media resources to be played directly from
Chrome. It is a problem for users on other networks (e.g. IRC)
communicating with Matrix users through a gateway. The gateway sends
them the raw URL for the resource when a Matrix user uploads a video
and the video cannot be played directly in Chrome using that URL.
Chrome argues it is not authorized to play the video because of the
Content Security Policy. Chrome checks for the "media-src" policy which
is missing, and defauts to the "default-src" policy which is "none".
As Synapse already sends "object-src: 'self'" I thought it wouldn’t be
a problem to add "media-src: 'self'" to the CSP to fix this problem.
2018-09-25 11:55:02 +01:00
Erik Johnston
8601c24287
Fix some instances of ExpiringCache not expiring cache items
...
ExpiringCache required that `start()` be called before it would actually
start expiring entries. A number of places didn't do that.
This PR removes `start` from ExpiringCache, and automatically starts
backround reaping process on creation instead.
2018-09-21 14:19:46 +01:00
Amber Brown
02aa41809b
Port rest/ to Python 3 ( #3823 )
2018-09-12 20:41:31 +10:00
Amber Brown
324525f40c
Port over enough to get some sytests running on Python 3 ( #3668 )
2018-08-20 23:54:49 +10:00
Will Hunt
c151b32b1d
Add GET media/v1/config ( #3184 )
2018-08-16 14:23:38 +01:00
Amber Brown
b37c472419
Rename async to async_helpers because async
is a keyword on Python 3.7 ( #3678 )
2018-08-10 23:50:21 +10:00
Richard van der Hoff
018d75a148
Refactor code for turning HttpResponseException into SynapseError
...
This commit replaces SynapseError.from_http_response_exception with
HttpResponseException.to_synapse_error.
The new method actually returns a ProxiedRequestError, which allows us to pass
through additional metadata from the API call.
2018-08-01 16:02:46 +01:00
Amber Brown
da7785147d
Python 3: Convert some unicode/bytes uses ( #3569 )
2018-08-02 00:54:06 +10:00
Richard van der Hoff
03751a6420
Fix some looping_call calls which were broken in #3604
...
It turns out that looping_call does check the deferred returned by its
callback, and (at least in the case of client_ips), we were relying on this,
and I broke it in #3604 .
Update run_as_background_process to return the deferred, and make sure we
return it to clock.looping_call.
2018-07-26 11:48:08 +01:00
Richard van der Hoff
371da42ae4
Wrap a number of things that run in the background
...
This will reduce the number of "Starting db connection from sentinel context"
warnings, and will help with our metrics.
2018-07-25 09:41:12 +01:00
Krombel
4a27000548
check isort by travis
2018-07-16 13:57:33 +02:00
Krombel
32fd6910d0
Use parse_{int,str} and assert from http.servlet
...
parse_integer and parse_string can take a request and raise errors
in case we have wrong or missing params.
This PR tries to use them more to deduplicate some code and make it
better readable
2018-07-13 21:40:14 +02:00
Amber Brown
49af402019
run isort
2018-07-09 16:09:20 +10:00
Amber Brown
6350bf925e
Attempt to be more performant on PyPy ( #3462 )
2018-06-28 14:49:57 +01:00
Amber Brown
77ac14b960
Pass around the reactor explicitly ( #3385 )
2018-06-22 09:37:10 +01:00
Amber Brown
1f69693347
Merge pull request #3244 from NotAFile/py3-six-4
...
replace some iteritems with six
2018-05-24 13:04:07 -05:00
Adrian Tschira
933bf2dd35
replace some iteritems with six
...
Signed-off-by: Adrian Tschira <nota@notafile.com>
2018-05-19 17:59:26 +02:00
Adrian Tschira
aafb0f6b0d
py3-ize url preview
2018-05-19 17:35:20 +02:00
Richard van der Hoff
318711e139
Set Server header in SynapseRequest
...
(instead of everywhere that writes a response. Or rather, the subset of places
which write responses where we haven't forgotten it).
This also means that we don't have to have the mysterious version_string
attribute in anything with a request handler.
Unfortunately it does mean that we have to pass the version string wherever we
instantiate a SynapseSite, which has been c&ped 150 times, but that is code
that ought to be cleaned up anyway really.
2018-05-10 18:50:27 +01:00
Richard van der Hoff
645cb4bf06
Remove redundant request_handler decorator
...
This is needless complexity; we might as well use the wrapper directly.
Also rename wrap_request_handler->wrap_json_request_handler.
2018-05-10 12:19:53 +01:00
Richard van der Hoff
be31adb036
Fix logcontext leak in media repo
...
Make FileResponder.write_to_consumer uphold the logcontext contract
2018-05-02 16:14:50 +01:00
Richard van der Hoff
dbf6f28d64
Merge pull request #3155 from NotAFile/py3-bytes-1
...
more bytes strings
2018-04-30 00:38:21 +01:00
Richard van der Hoff
aab2e4da60
Merge pull request #3140 from matrix-org/rav/use_run_in_background
...
Use run_in_background in preference to preserve_fn
2018-04-30 00:34:28 +01:00
Richard van der Hoff
9e2601f830
Merge pull request #3108 from NotAFile/py3-six-urlparse
...
Use six.moves.urlparse
2018-04-30 00:33:05 +01:00
Adrian Tschira
e9143b6593
more bytes strings
...
Signed-off-by: Adrian Tschira <nota@notafile.com>
2018-04-29 00:13:57 +02:00
Richard van der Hoff
fc149b4eeb
Merge remote-tracking branch 'origin/develop' into rav/use_run_in_background
2018-04-27 14:31:23 +01:00
Richard van der Hoff
2a13af23bc
Use run_in_background in preference to preserve_fn
...
While I was going through uses of preserve_fn for other PRs, I converted places
which only use the wrapped function once to use run_in_background, to avoid
creating the function object.
2018-04-27 12:55:51 +01:00
Richard van der Hoff
9255a6cb17
Improve exception handling for background processes
...
There were a bunch of places where we fire off a process to happen in the
background, but don't have any exception handling on it - instead relying on
the unhandled error being logged when the relevent deferred gets
garbage-collected.
This is unsatisfactory for a number of reasons:
- logging on garbage collection is best-effort and may happen some time after
the error, if at all
- it can be hard to figure out where the error actually happened.
- it is logged as a scary CRITICAL error which (a) I always forget to grep for
and (b) it's not really CRITICAL if a background process we don't care about
fails.
So this is an attempt to add exception handling to everything we fire off into
the background.
2018-04-27 11:07:40 +01:00
Adrian Tschira
2a3c33ff03
Use six.moves.urlparse
...
The imports were shuffled around a bunch in py3
Signed-off-by: Adrian Tschira <nota@notafile.com>
2018-04-15 21:22:43 +02:00
Adrian Tschira
4f40d058cc
Replace old-style raise with six.reraise
...
The old style raise is invalid syntax in python3. As noted in the docs,
this adds one more frame in the traceback, but I think this is
acceptable:
<ipython-input-7-bcc5cba3de3f> in <module>()
16 except:
17 pass
---> 18 six.reraise(*x)
/usr/lib/python3.6/site-packages/six.py in reraise(tp, value, tb)
691 if value.__traceback__ is not tb:
692 raise value.with_traceback(tb)
--> 693 raise value
694 finally:
695 value = None
<ipython-input-7-bcc5cba3de3f> in <module>()
9
10 try:
---> 11 x()
12 except:
13 x = sys.exc_info()
Also note that this uses six, which is not formally a dependency yet,
but is included indirectly since most packages depend on it.
Signed-off-by: Adrian Tschira <nota@notafile.com>
2018-04-06 23:06:24 +02:00
Erik Johnston
fa72803490
Merge branch 'master' of github.com:matrix-org/synapse into develop
2018-03-19 11:41:01 +00:00
Erik Johnston
926ba76e23
Replace ujson with simplejson
2018-03-15 23:43:31 +00:00
Erik Johnston
92c52df702
Make store_file use store_into_file
2018-02-14 17:55:18 +00:00
Erik Johnston
5fa571a91b
Tell storage providers about new file so they can upload
2018-02-07 13:35:08 +00:00
Erik Johnston
1f881e0746
Merge pull request #2791 from matrix-org/erikj/media_storage_refactor
...
Ensure media is in local cache before thumbnailing
2018-02-05 11:28:52 +00:00
Richard van der Hoff
d5352cbba8
Handle url_previews with no content-type
...
avoid failing with an exception if the remote server doesn't give us a
Content-Type header.
Also, clean up the exception handling a bit.
2018-02-02 00:53:46 +00:00
Matthew Hodgson
ab9f844aaf
Add federation_domain_whitelist option ( #2820 )
...
Add federation_domain_whitelist
gives a way to restrict which domains your HS is allowed to federate with.
useful mainly for gracefully preventing a private but internet-connected HS from trying to federate to the wider public Matrix network
2018-01-22 19:11:18 +01:00
Richard van der Hoff
b0d9e633ee
Merge pull request #2814 from matrix-org/rav/fix_urlcache_thumbs
...
Use the right path for url_preview thumbnails
2018-01-19 18:57:15 +00:00
Richard van der Hoff
ad7ec63d08
Use the right path for url_preview thumbnails
...
This was introduced by #2627 : we were overwriting the original media for url
previews with the thumbnails :/
(fixes https://github.com/vector-im/riot-web/issues/6012 , hopefully)
2018-01-19 18:29:39 +00:00
Erik Johnston
cd871a3057
Fix storage provider bug introduced when renamed to store_local
2018-01-18 18:37:59 +00:00
Erik Johnston
8ff6726c0d
Merge pull request #2812 from matrix-org/erikj/media_storage_provider_config
...
Make storage providers configurable
2018-01-18 18:33:57 +00:00
Erik Johnston
3fe2bae857
Missing staticmethod
2018-01-18 17:11:45 +00:00
Erik Johnston
aae77da73f
Fixup comments
2018-01-18 17:11:29 +00:00
Erik Johnston
9a89dae8c5
Fix typo in thumbnail resource causing access times to be incorrect
2018-01-18 15:06:24 +00:00
Erik Johnston
0af5dc63a8
Make storage providers more configurable
2018-01-18 14:07:21 +00:00
Erik Johnston
2cf6a7bc20
Use better file consumer
2018-01-18 12:00:46 +00:00
Erik Johnston
4a53f3a3e8
Ensure media is in local cache before thumbnailing
2018-01-18 12:00:46 +00:00
Erik Johnston
300edc2348
Update last access time when thumbnails are viewed
2018-01-17 10:24:43 +00:00
Erik Johnston
05f98a2224
Keep track of last access time for local media
2018-01-17 10:24:43 +00:00
Erik Johnston
d728c47142
Add docstring
2018-01-17 10:06:14 +00:00
Erik Johnston
d863f68cab
Use local vars
2018-01-16 16:24:15 +00:00
Erik Johnston
6368e5c0ab
Change _generate_thumbnails to take media_type
2018-01-16 16:17:38 +00:00
Erik Johnston
0a90d9ede4
Move setting of file_id up to caller
2018-01-16 16:03:05 +00:00
Erik Johnston
5dfc83704b
Fix typo
2018-01-16 14:32:56 +00:00
Erik Johnston
307f88dfb6
Fix up log lines
2018-01-16 13:53:52 +00:00
Erik Johnston
9795b9ebb1
Correctly use server_name/file_id when generating/fetching remote thumbnails
2018-01-16 12:02:06 +00:00
Erik Johnston
c5b589f2e8
Log when we respond with 404
2018-01-16 12:01:40 +00:00
Erik Johnston
a4c5e4a645
Fix thumbnailing remote files
2018-01-16 11:37:50 +00:00
Erik Johnston
1159abbdd2
Merge pull request #2767 from matrix-org/erikj/media_storage_refactor
...
Refactor MediaRepository to separate out storage
2018-01-16 10:23:50 +00:00
Richard van der Hoff
21bf87a146
Reinstate media download on thumbnail request
...
We need to actually download the remote media when we get a request for a
thumbnail.
2018-01-12 15:38:06 +00:00
Erik Johnston
694f1c1b18
Fix up comments
2018-01-12 15:02:46 +00:00
Erik Johnston
e21370ba54
Correctly reraise exception
2018-01-12 14:44:02 +00:00
Erik Johnston
85a4d78213
Make Responder a context manager
2018-01-12 13:32:03 +00:00
Erik Johnston
dcc8eded41
Add missing class var
2018-01-12 13:16:27 +00:00
Erik Johnston
1e4edd1717
Remove unnecessary condition
2018-01-12 11:28:32 +00:00
Erik Johnston
c6c009603c
Remove unused variables
2018-01-12 11:24:05 +00:00
Erik Johnston
4d88958cf6
Make class var local
2018-01-12 11:23:54 +00:00
Erik Johnston
227c491510
Comments
2018-01-12 11:22:41 +00:00
Erik Johnston
8f03aa9f61
Add StorageProvider concept
2018-01-09 16:16:12 +00:00
Erik Johnston
2442e9876c
Make PreviewUrlResource use MediaStorage
2018-01-09 16:15:07 +00:00
Erik Johnston
9d30a7691c
Make ThumbnailResource use MediaStorage
2018-01-09 16:15:07 +00:00
Erik Johnston
9e20840e02
Use MediaStorage for remote media
2018-01-09 16:15:07 +00:00
Erik Johnston
dd3092c3a3
Use MediaStorage for local files
2018-01-09 16:15:07 +00:00
Erik Johnston
ada470bccb
Add MediaStorage class
2018-01-09 16:15:07 +00:00
Erik Johnston
1ee787912b
Add some helper classes
2018-01-09 16:15:07 +00:00
Erik Johnston
47ca5eb882
Split out add_file_headers
2018-01-09 16:15:07 +00:00
Erik Johnston
b6c9deffda
Remove dead TODO
2018-01-09 15:53:23 +00:00
Erik Johnston
b30cd5b107
Remove dead code related to default thumbnails
2018-01-09 14:38:33 +00:00
Richard van der Hoff
5a4da5bf78
Merge pull request #2697 from matrix-org/rav/fix_urlcache_index_error
...
Fix error on sqlite 3.7
2017-11-27 12:25:48 +00:00
Richard van der Hoff
8132a6b7ac
Fix OPTIONS on preview_url
...
Fixes #2706
2017-11-23 17:52:31 +00:00
Richard van der Hoff
2908f955d1
Check database in has_completed_background_updates
...
so that the right thing happens on workers.
2017-11-22 18:02:15 +00:00
Richard van der Hoff
7098b65cb8
Fix error on sqlite 3.7
...
Create the url_cache index on local_media_repository as a background update, so
that we can detect whether we are on sqlite or not and create a partial or
complete index accordingly.
To avoid running the cleanup job before we have built the index, add a bailout
which will defer the cleanup if the bg updates are still running.
Fixes https://github.com/matrix-org/synapse/issues/2572 .
2017-11-21 11:14:17 +00:00
Richard van der Hoff
5d15abb120
Bit more logging
2017-11-10 16:58:04 +00:00
Richard van der Hoff
46790f50cf
Cache failures in url_preview handler
...
Reshuffle the caching logic in the url_preview handler so that failures are
cached (and to generally simplify things and fix the logcontext leaks).
2017-11-10 16:50:50 +00:00
Maxime Vaillancourt
5287e57c86
Ignore noscript tags when generating URL previews
2017-10-25 20:44:34 -04:00
Richard van der Hoff
eaaabc6c4f
replace 'except:' with 'except Exception:'
...
what could possibly go wrong
2017-10-23 15:52:32 +01:00
Richard van der Hoff
d03cfc4258
Fix a logcontext leak in the media repo
2017-10-23 14:34:27 +01:00
Erik Johnston
bd5718d0ad
Fix typo in thumbnail generation
2017-10-19 10:27:18 +01:00
Krombel
a6245478c8
fix thumbnailing ( #2548 )
...
in commit 0e28281a
the code for thumbnailing got refactored and the
renaming of this variables was not done correctly.
Signed-Off-by: Matthias Kesler <krombel@krombel.de>
2017-10-17 12:45:33 +02:00
Erik Johnston
1b6b0b1e66
Add try/finally block to close t_byte_source
2017-10-13 15:34:08 +01:00
Erik Johnston
6b725cf56a
Remove old comment
2017-10-13 15:23:41 +01:00
Erik Johnston
2b24416e90
Don't reuse source but instead copy from primary media store to backup
2017-10-13 14:11:34 +01:00
Erik Johnston
b92a8e6e4a
PEP8
2017-10-13 13:58:57 +01:00
Erik Johnston
31aa7bd8d1
Move type into key
2017-10-13 13:47:38 +01:00
Erik Johnston
ad1911bbf4
Comment
2017-10-13 13:47:05 +01:00
Erik Johnston
c021c39cbd
Remove spurious addition
2017-10-13 13:46:53 +01:00
Erik Johnston
1f43d22397
Don't needlessly rename variable
2017-10-13 11:42:07 +01:00
Erik Johnston
a675bd08bd
Add paths back in...
2017-10-13 11:41:06 +01:00
Erik Johnston
4d7e1dde70
Remove unnecessary diff
2017-10-13 11:36:32 +01:00
Erik Johnston
ae5d18617a
Make things be absolute paths again
2017-10-13 11:35:44 +01:00
Erik Johnston
9732ec6797
s/write_to_file/write_to_file_and_backup/
2017-10-13 11:34:41 +01:00
Erik Johnston
0e28281a02
Fix up
2017-10-13 11:33:49 +01:00
Erik Johnston
505371414f
Fix up thumbnailing function
2017-10-13 11:23:53 +01:00
Erik Johnston
e3428d26ca
Fix typo
2017-10-13 10:39:59 +01:00
Erik Johnston
35332298ef
Fix up comments
2017-10-13 10:39:32 +01:00
Erik Johnston
64db043a71
Move makedirs to thread
2017-10-13 10:25:01 +01:00
Erik Johnston
b60859d6cc
Use make_deferred_yieldable
2017-10-13 10:24:19 +01:00
Erik Johnston
d76621a47b
Fix comments
2017-10-12 18:16:25 +01:00
Erik Johnston
4ae85ae121
Don't close prematurely..
2017-10-12 17:57:31 +01:00
Erik Johnston
cc505b4b5e
getvalue closes buffer
2017-10-12 17:52:30 +01:00
Erik Johnston
1259a76047
Get len before close
2017-10-12 17:39:23 +01:00
Erik Johnston
802ca12d05
Don't close file prematurely
2017-10-12 17:37:21 +01:00
Erik Johnston
e283b555b1
Copy everything to backup
2017-10-12 17:31:24 +01:00
Erik Johnston
b77a13812c
Typo
2017-10-12 15:32:32 +01:00
Erik Johnston
6dfde6d485
Remove dead code
2017-10-12 15:30:26 +01:00
Erik Johnston
c8eeef6947
Fix typos
2017-10-12 15:28:24 +01:00
Erik Johnston
67cb89fbdf
Fix typo
2017-10-12 15:23:41 +01:00
Erik Johnston
bf4fb1fb40
Basic implementation of backup media store
2017-10-12 15:20:59 +01:00
Erik Johnston
d5694ac5fa
Only log if we've removed media
2017-09-28 16:08:08 +01:00
Erik Johnston
7cc483aa0e
Clear up expired url cache every 10s
2017-09-28 13:56:53 +01:00
Erik Johnston
e1e7d76cf1
Actually assign result to variable
2017-09-28 13:55:29 +01:00
Erik Johnston
5f501ec7e2
Fix typo in url cache expiry timer
2017-09-28 12:59:01 +01:00
Erik Johnston
ace8079086
Support new and old style media id formats
2017-09-28 12:52:51 +01:00
Erik Johnston
ae79764fe5
Change expires column to expires_ts
2017-09-28 12:37:53 +01:00
Erik Johnston
9ccb4226ba
Delete expired url cache data
2017-09-28 12:18:06 +01:00
Erik Johnston
7fe8ed1787
Store URL cache preview downloads seperately
...
This makes it easier to clear old media out at a later date
2017-06-23 11:14:11 +01:00
Erik Johnston
b8b936a6ea
Add API to quarantine media
2017-06-19 17:39:21 +01:00
Erik Johnston
48d2949416
Throw exception when not retrying when downloading media
2017-06-13 10:23:14 +01:00
Matthew Hodgson
836d5c44b6
actually trim oversize og:description meta
2017-05-22 21:14:20 +01:00
Erik Johnston
d12ae7fd1c
Don't log exceptions for NotRetryingDestination
2017-05-15 15:42:18 +01:00
Richard van der Hoff
1d09586599
Address review comments
...
- don't blindly proxy all HTTPRequestExceptions
- log unexpected exceptions at error
- avoid `isinstance`
- improve docs on `from_http_response_exception`
2017-03-14 14:15:37 +00:00
Richard van der Hoff
170ccc9de5
Fix routing loop when fetching remote media
...
When we proxy a media request to a remote server, add a query-param, which will
tell the remote server to 404 if it doesn't recognise the server_name.
This should fix a routing loop where the server keeps forwarding back to
itself.
Also improves the error handling on remote media fetches, so that we don't
always return a rather obscure 502.
2017-03-13 16:30:36 +00:00
Jurek
aea5461488
Fix dynamic thumbnails aspect
2017-02-24 22:43:27 +01:00
Mark Haines
32019c9897
Log which files we saved attachments to in the media_repository
2017-01-10 14:19:50 +00:00
Erik Johnston
f7085ac84f
Name linearizer's for better logs
2017-01-09 17:17:10 +00:00
Marcin Bachry
24c16fc349
Fix crash in url preview when html tag has no text
...
Signed-off-by: Marcin Bachry <hegel666@gmail.com>
2016-12-14 22:38:18 +01:00
Johannes Löthberg
32c8b5507c
preview_url_resource: Ellipsis must be in unicode string
...
Signed-off-by: Johannes Löthberg <johannes@kyriasis.com>
2016-12-01 13:12:13 +01:00
Mark Haines
b1c27975d0
Set CORs headers on responses from the media repo
2016-11-02 11:29:25 +00:00
Erik Johnston
d51b8a1674
Add quotes and be explicity about script-src
2016-09-05 17:35:01 +01:00
Erik Johnston
662b031a30
Allow PDF to be rendered from media repo
2016-09-05 17:25:26 +01:00
Erik Johnston
0af9e1a637
Set Content-Security-Policy
on media repo
...
This is to inform browsers that they should sandbox the returned
media. This is particularly cruical for javascript/HTML files.
2016-08-17 16:27:39 +01:00
Erik Johnston
f90b3d83a3
Add None check to _iterate_over_text
2016-08-17 15:17:17 +01:00
Erik Johnston
109a560905
Flake8
2016-08-16 14:57:21 +01:00
Erik Johnston
48b5829aea
Fix up preview URL API. Add tests.
...
This includes:
- Splitting out methods of a class into stand alone functions, to make
them easier to test.
- Adding unit tests to split out functions, testing HTML -> preview.
- Handle the fact that elements in lxml may have tail text.
2016-08-16 14:53:24 +01:00
Erik Johnston
5bcccfde6c
Don't include html comments in description
2016-08-05 14:45:11 +01:00
Erik Johnston
b5525c76d1
Typo
2016-08-04 16:10:08 +01:00
Erik Johnston
e97648c4e2
Test summarization
2016-08-04 16:09:09 +01:00
Erik Johnston
58c9653c6b
Don't infer paragrahs from newlines
2016-08-02 18:50:24 +01:00
Erik Johnston
6b58ade2f0
Comment on why we clone
2016-08-02 18:41:22 +01:00
Erik Johnston
9e66c58ceb
Spelling.
2016-08-02 18:37:31 +01:00
Erik Johnston
f83f5fbce8
Make it actually compile
2016-08-02 18:32:42 +01:00
Erik Johnston
aecaec3e10
Change the way we summarize URLs
...
Using XPath is slow on some machines (for unknown reasons), so use a
different approach to get a list of text nodes.
Try to generate a summary that respect paragraph and then word
boundaries, adding ellipses when appropriate.
2016-08-02 18:25:53 +01:00
Erik Johnston
f52cb4cd78
Remove race
2016-06-29 15:24:50 +01:00
Erik Johnston
a70688445d
Implement purge_media_cache admin API
2016-06-29 14:57:59 +01:00
Erik Johnston
314b146b2e
Track approximate last access time for remote media
2016-06-29 11:41:20 +01:00
Erik Johnston
09a17f965c
Line lengths
2016-06-15 16:58:12 +01:00
Erik Johnston
1e9026e484
Handle floats as img widths
2016-06-15 16:58:05 +01:00
Erik Johnston
a60169ea09
Handle og props with not content
2016-06-15 16:57:48 +01:00
Erik Johnston
eba4ff1bcb
502 on /thumbnail when can't contact remote server
2016-06-09 11:29:43 +01:00
Mark Haines
eb79110beb
Clean up the blacklist/whitelist handling.
...
Always set the config key with an empty list, even if a list isn't specified.
This means that the codepaths are the same for both the empty list and
for a missing key. Since the behaviour is the same for both cases this
makes the code somewhat easier to reason about.
2016-05-16 13:03:59 +01:00
Mark Haines
8d7ad44331
Report per request metrics for all of the things using request_handler
2016-04-28 10:57:49 +01:00
Erik Johnston
e8884e5e9c
Add self.media_repo to PreviewUrlResource
2016-04-19 14:51:34 +01:00
Erik Johnston
a7001c311b
_make_dirs was moved to MediaRepository
2016-04-19 14:49:31 +01:00
Erik Johnston
9181e2f4c7
Add store to PreviewUrlResource
2016-04-19 14:48:24 +01:00
Erik Johnston
fb76a81ff7
Reorder imports
2016-04-19 14:45:05 +01:00
Erik Johnston
0c93df89b6
Move MediaRepository to media_repository module
2016-04-19 11:31:43 +01:00
Erik Johnston
43f0941e8f
Split out BaseMediaResource into MediaRepository
...
This is so that a single MediaRepository can be shared across all
resources, rather than having a "copy" per resource.
In particular this allows us to guard against both the thumbnail and
download resource triggering a download of remote content at the same
time.
2016-04-19 11:24:59 +01:00
Matthew Hodgson
aaabbd3e9e
explicitly pass in the charset from Content-Type to lxml to fix cyrillic woes better
2016-04-15 14:32:25 +01:00
Matthew Hodgson
84f9cac4d0
fix cyrillic URL previews by hardcoding all page decoding to UTF-8 for now, rather than relying on lxml's heuristics which seem to get it wrong
2016-04-15 13:20:08 +01:00
Matthew Hodgson
f78b479118
fix urlparse import thinko breaking tiny URLs
2016-04-14 15:23:55 +01:00
Matthew Hodgson
bd77216d06
comment out 2c838f6459
due to risk of https://en.wikipedia.org/wiki/Billion_laughs attacks - thanks @torhve
2016-04-14 14:39:24 +01:00
Erik Johnston
d0633e6dbe
Sanitize the optional dependencies for spider API
2016-04-13 13:38:09 +01:00
Erik Johnston
17515bae14
PEP8
2016-04-11 11:02:50 +01:00
Matthew Hodgson
5ffacc5e84
fix typos and needless try/except from PR review
2016-04-11 10:39:16 +01:00
Matthew Hodgson
83b2f83da0
actually throw meaningful errors
2016-04-08 21:36:59 +01:00
Mark Haines
b36270b5e1
Fix pep8 warning
2016-04-08 19:52:23 +01:00
Matthew Hodgson
1ccabe2965
more PR feedback
2016-04-08 18:58:08 +01:00
Matthew Hodgson
dafef5a688
Add url_preview_enabled config option to turn on/off preview_url endpoint. defaults to off.
...
Add url_preview_ip_range_blacklist to let admins specify internal IP ranges that must not be spidered.
Add url_preview_url_blacklist to let admins specify URL patterns that must not be spidered.
Implement a custom SpiderEndpoint and associated support classes to implement url_preview_ip_range_blacklist
Add commentary and generally address PR feedback
2016-04-08 18:37:15 +01:00
Matthew Hodgson
cf51c4120e
report image size (bytewise) in OG meta
2016-04-03 23:57:05 +01:00
Matthew Hodgson
0834b152fb
char encoding
2016-04-03 12:59:27 +01:00
Matthew Hodgson
8b98a7e8c3
pep8
2016-04-03 12:56:29 +01:00
Matthew Hodgson
eab4d462f8
fix etag typing error. fix timestamp typing error
2016-04-03 02:02:46 +01:00
Matthew Hodgson
c3916462f6
rebase all image URLs
2016-04-03 01:33:12 +01:00
Matthew Hodgson
110780b18b
remove stale todo
2016-04-03 00:48:31 +01:00
Matthew Hodgson
b09e29a03c
Ensure only one download for a given URL is active at a time
2016-04-03 00:47:40 +01:00
Matthew Hodgson
7426c86eb8
add a persistent cache of URL lookups, and fix up the in-memory one to work
2016-04-03 00:31:57 +01:00
Matthew Hodgson
d1b154a10f
support gzip compression, and don't pass through error msgs
2016-04-02 03:06:39 +01:00
Matthew Hodgson
9377157961
how was _respond_default_thumbnail ever meant to work?
2016-04-02 02:31:45 +01:00
Matthew Hodgson
2c838f6459
pass back SVGs as their own thumbnails
2016-04-02 02:30:07 +01:00
Matthew Hodgson
5037ee0d37
handle missing dimensions without crashing
2016-04-02 02:29:57 +01:00