| | fe6c0c74a7 | Merge branch 'bugfix/2-fail-on-unavailable-resource' of snegov/nevernote into master | 2019-11-09 14:40:52 +00:00 |
| | 6f917578aa | Fix failing on unavailable page resource | 2019-11-09 17:33:58 +03:00 |
| | e6db3f9d1b | Fix newlines inside div tag | 2019-10-22 16:45:40 +03:00 |
| | 3b6df3417a | Fix link tag with missing rel attribute | 2019-10-22 16:45:13 +03:00 |
| | e843abbc41 | Fix python env string | 2019-10-22 16:44:27 +03:00 |
| | 89a8dd90cc | Use BS4 for HTML parsing | 2019-10-22 16:05:29 +03:00 |
| | 3198361266 | Add --skip-dups option | 2019-10-22 14:39:36 +03:00 |
| | bdceede4f2 | Rework fetching URLs from the file | 2019-10-22 12:17:49 +03:00 |
| | 91cddfab7c | Refactor code | 2019-10-22 12:17:49 +03:00 |
| | 44b8a17841 | Use requests library | 2019-10-22 12:17:49 +03:00 |
| Maks Snegov | 56a7032b3e | Merge branch 'fix_htmlparser_strict' | 2016-03-10 19:21:48 +03:00 |
| Maks Snegov | 26e7176222 | strict argument in html.parser.HTMLParser is removed since 3.5 | 2016-03-10 19:15:03 +03:00 |
| Maks Snegov | edd12deb37 | Merge branch 'devel' | 2016-02-04 09:10:56 +03:00 |
| Maks Snegov | 1a6a7b3c9b | Merge branch 'b64script' into devel | 2014-10-04 11:08:41 -04:00 |
| Maks Snegov | 23f648e1ad | limit filename length with 128 chars plus extension | 2014-10-04 10:59:32 -04:00 |
| Maks Snegov | c1724b5921 | use base64 encoding for embedded scripts; can avoid some issues in browsers' renderers (habrahabr pages were broken because of nested </script> in script content) | 2014-10-04 03:38:34 +04:00 |
| Maks Snegov | 6b3aa602ef | add script embedding | 2014-10-04 03:24:38 +04:00 |
| Maks Snegov | cf626546e7 | use set of content-types for checking | 2014-07-23 08:45:12 +04:00 |
| Maks Snegov | fbf52e9544 | add script parsing | 2014-07-21 00:46:30 +04:00 |
| Maks Snegov | 7ce2bfb97f | fix urllib.error.HTTPError print | 2014-07-20 21:42:13 +04:00 |
| Maks Snegov | 41e984e1f0 | fix urllib.error.HTTPError calls | 2014-07-20 21:40:14 +04:00 |
| Maks Snegov | fb3870e9dd | skip http error pages | 2014-07-20 17:31:43 +04:00 |
| Maks Snegov | 09346f4a70 | fix: error with css charsets if no base charset | 2014-07-20 17:31:15 +04:00 |
| Maks Snegov | 61d3d84a9c | remove unused exception | 2014-07-20 17:30:48 +04:00 |
| Maks Snegov | b5ddae0ef8 | fix css charset error, add urllib.error.httperror | 2014-07-20 17:04:56 +04:00 |
| Maks Snegov | 964e79f97b | add gzip encoding support | 2014-07-20 14:03:49 +04:00 |
| Maks Snegov | 5c9d04cf3d | use file with links as arguments | 2014-07-20 13:48:18 +04:00 |
| Maks Snegov | 514b39d287 | use default charset utf-8 if not set in headers | 2014-07-20 13:31:20 +04:00 |
| Maks Snegov | 45f30ca9de | fix: error with urls without scheme ('//ya.ru/index.html') | 2014-07-20 13:30:22 +04:00 |
| Maks Snegov | b58188b7b7 | remove import | 2014-07-20 13:29:56 +04:00 |
| Maks Snegov | c523d025af | add duplicate checking | 2014-07-20 13:06:51 +04:00 |
| Maks Snegov | a0fbb414a7 | write url in the beginning of the file | 2014-07-20 12:17:01 +04:00 |
| Maks Snegov | 716c61f6f1 | replace http.client with urllib | 2014-07-20 08:09:07 +04:00 |
| Maks Snegov | eb2c43f438 | ignore UTF-8 errors | 2014-06-25 08:38:43 +04:00 |
| Maks Snegov | 6a818f4bb4 | fix: error with empty GET urls | 2014-06-23 00:50:21 +04:00 |
| Maks Snegov | 594ff71991 | add css embedding | 2014-06-22 23:51:18 +04:00 |
| Maks Snegov | 754411b6b7 | remove unused header from request | 2014-06-22 22:57:42 +04:00 |
| Maks Snegov | a7ef8a8b7b | separate complete_url function | 2014-06-22 22:56:43 +04:00 |
| Maks Snegov | 35f755005d | fix: do not work with GET arguments | 2014-06-22 13:12:35 +04:00 |
| Maks Snegov | fe69eff79b | fix increment postfix in filenames | 2014-06-22 12:38:05 +04:00 |
| Maks Snegov | 5c87f241d1 | clean title from multiple whitespaces | 2014-06-22 12:24:10 +04:00 |
| Maks Snegov | ae63ca6318 | skip connRefusedError pictures | 2014-06-22 12:16:10 +04:00 |
| Maks Snegov | 36be68d78d | fix title with attributes parsing | 2014-06-22 11:59:02 +04:00 |
| Maks Snegov | ab03e18ce2 | fix relative urls | 2014-06-22 11:48:04 +04:00 |
| Maks Snegov | 5b91bef896 | add infinite redirects blocking | 2014-06-22 11:47:21 +04:00 |
| Maks Snegov | 11de357865 | add image embedding | 2014-06-22 11:45:37 +04:00 |
| Maks Snegov | 5837451ed7 | add url as comment to saved pages | 2014-06-21 20:23:25 +04:00 |
| Maks Snegov | e2009e7f08 | skip fname duplicates | 2014-06-21 20:09:15 +04:00 |
| Maks Snegov | ab9a7e34c1 | get title name | 2014-06-21 09:58:47 +04:00 |
| Maks Snegov | aead01258d | remove never used if condition | 2014-06-21 09:43:12 +04:00 |
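
The c1724b5921 entry in the log says embedded scripts are base64-encoded because a nested `</script>` inside script content broke some saved pages. The commit itself is not shown here, so the following is only a minimal sketch of that idea, assuming the encoded script is carried in a `data:` URI on the `src` attribute; `embed_script` is a hypothetical name, not a function from the repository.

```python
import base64


def embed_script(source: str) -> str:
    """Return a <script> tag that carries `source` as a base64 data URI.

    Hypothetical helper for illustration only. The base64 alphabet has no
    '<' or '"' characters, so the payload cannot contain a literal
    '</script>' or break out of the quoted attribute.
    """
    encoded = base64.b64encode(source.encode("utf-8")).decode("ascii")
    return f'<script src="data:text/javascript;base64,{encoded}"></script>'


if __name__ == "__main__":
    # A script whose text contains a nested closing tag, the case the
    # commit message describes as breaking saved pages.
    js = 'document.write("<script>alert(1)</script>");'
    print(embed_script(js))
```

Inlining the raw script text would let the HTML parser end the element at the first `</script>` it meets; encoding the body avoids that without having to escape the script source.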