Downloading Twitter content
A long while back there was a Python package, twittie or something like that, that could download content from Twitter. After the Musk acquisition, most download tools of that kind were killed off, and since there was neither usable software nor anything in particular I wanted to save, I let it go. A few days ago I came across some interesting content I wanted to collect, so I went looking again and found this Python package, which also ships a Windows executable:
https://github.com/mikf/gallery-dl/releases
On Windows, create a configuration file at %APPDATA%\gallery-dl\config.json. Here is an example:
{
    "extractor": {
        "twitter": {
            "videos": true,
            "images": true
        }
    },
    "output": {
        "template": "{author[screen_name]}/{tweet_id}_{num}.{extension}"
    }
}
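To confirm that gallery-dl actually reads this file, the --config-status option listed in the help output further down can be used; a quick check, assuming the file was saved at the path above:
gallery-dl --config-status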
You also need the cookies of a logged-in x.com account, otherwise it will not work. One way to get them is to install the browser extension "Get cookies.txt Local": after installing it, visit x.com, export the cookies in Netscape format, and save them as a file such as x.txt.
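If exporting a cookies file is inconvenient, the help output below also lists a --cookies-from-browser option that reads cookies straight from a browser profile. A hedged alternative, assuming you are logged into x.com in Firefox:
gallery-dl --cookies-from-browser firefox https://x.com/account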
After that, you can run the actual collection from the command line. For example:
gallery-dl --cookies x.txt https://x.com/account
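The same command can be combined with other options from the help output below. A sketch, assuming D:\twitter is where you want the files and archive.sqlite3 is an arbitrary name for the download archive:
gallery-dl --cookies x.txt -d D:\twitter --download-archive archive.sqlite3 https://x.com/account
With --download-archive, a repeated run against the same account skips files that were already recorded as downloaded.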
Because the config.json above does not set a target folder, all downloaded files end up in the default location, C:\Users\your_account_name\gallery-dl\twitter, with one subfolder per downloaded account.
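If you would rather fix the location in the config file instead of passing -d every time, gallery-dl supports a base-directory setting under the extractor section. A minimal sketch, assuming D:/twitter as the target (forward slashes just to avoid JSON escaping; the path is only an example):
{
    "extractor": {
        "base-directory": "D:/twitter",
        "twitter": {
            "videos": true,
            "images": true
        }
    }
}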
gallery-dl --help
Usage: gallery-dl [OPTIONS] URL [URL...]
General Options:
-h, --help Print this help message and exit
--version Print program version and exit
-f, --filename FORMAT Filename format string for downloaded files ('/O' for "original" filenames)
-d, --destination PATH Target location for file downloads
-D, --directory PATH Exact location for file downloads
-X, --extractors PATH Load external extractors from PATH
-a, --user-agent UA User-Agent request header
--clear-cache MODULE Delete cached login sessions, cookies, etc. for MODULE (ALL to delete everything)
--compat Restore legacy 'category' names
Update Options:
-U, --update-check Check if a newer version is available
Input Options:
-i, --input-file FILE Download URLs found in FILE ('-' for stdin). More than one --input-file can be specified
-I, --input-file-comment FILE
Download URLs found in FILE. Comment them out after they were downloaded successfully.
-x, --input-file-delete FILE
Download URLs found in FILE. Delete them after they were downloaded successfully.
--no-input Do not prompt for passwords/tokens
Output Options:
-q, --quiet Activate quiet mode
-w, --warning Print only warnings and errors
-v, --verbose Print various debugging information
-g, --get-urls Print URLs instead of downloading
-G, --resolve-urls Print URLs instead of downloading; resolve intermediary URLs
-j, --dump-json Print JSON information
-J, --resolve-json Print JSON information; resolve intermediary URLs
-s, --simulate Simulate data extraction; do not download anything
-E, --extractor-info Print extractor defaults and settings
-K, --list-keywords Print a list of available keywords and example values for the given URLs
-e, --error-file FILE Add input URLs which returned an error to FILE
-N, --print [EVENT:]FORMAT Write FORMAT during EVENT (default 'prepare') to standard output instead of downloading
files. Can be used multiple times. Examples: 'id' or 'post:{md5[:8]}'
--Print [EVENT:]FORMAT Like --print, but downloads files as well
--print-to-file [EVENT:]FORMAT FILE
Append FORMAT during EVENT to FILE instead of downloading files. Can be used multiple
times
--Print-to-file [EVENT:]FORMAT FILE
Like --print-to-file, but downloads files as well
--list-modules Print a list of available extractor modules
--list-extractors [CATEGORIES]
Print a list of extractor classes with description, (sub)category and example URL
--write-log FILE Write logging output to FILE
--write-unsupported FILE Write URLs, which get emitted by other extractors but cannot be handled, to FILE
--write-pages Write downloaded intermediary pages to files in the current directory to debug problems
--print-traffic Display sent and read HTTP traffic
--no-colors Do not emit ANSI color codes in output
Networking Options:
-R, --retries N Maximum number of retries for failed HTTP requests or -1 for infinite retries (default: 4)
--http-timeout SECONDS Timeout for HTTP connections (default: 30.0)
--proxy URL Use the specified proxy
--source-address IP Client-side IP address to bind to
-4, --force-ipv4 Make all connections via IPv4
-6, --force-ipv6 Make all connections via IPv6
--no-check-certificate Disable HTTPS certificate validation
Downloader Options:
-r, --limit-rate RATE Maximum download rate (e.g. 500k, 2.5M, or 800k-2M)
--chunk-size SIZE Size of in-memory data chunks (default: 32k)
--sleep SECONDS Number of seconds to wait before each download. This can be either a constant value or a
range (e.g. 2.7 or 2.0-3.5)
--sleep-request SECONDS Number of seconds to wait between HTTP requests during data extraction
--sleep-429 SECONDS Number of seconds to wait when receiving a '429 Too Many Requests' response
--sleep-extractor SECONDS Number of seconds to wait before starting data extraction for an input URL
--no-part Do not use .part files
--no-skip Do not skip downloads; overwrite existing files
--no-mtime Do not set file modification times according to Last-Modified HTTP response headers
--no-download Do not download any files
Configuration Options:
-o, --option KEY=VALUE Additional options. Example: -o browser=firefox
-c, --config FILE Additional configuration files
--config-yaml FILE Additional configuration files in YAML format
--config-toml FILE Additional configuration files in TOML format
--config-create Create a basic configuration file
--config-status Show configuration file status
--config-open Open configuration file in external application
--config-ignore Do not read default configuration files
Authentication Options:
-u, --username USER Username to login with
-p, --password PASS Password belonging to the given username
--netrc Enable .netrc authentication data
Cookie Options:
-C, --cookies FILE File to load additional cookies from
--cookies-export FILE Export session cookies to FILE
--cookies-from-browser BROWSER[/DOMAIN][+KEYRING][:PROFILE][::CONTAINER]
Name of the browser to load cookies from, with optional domain prefixed with '/', keyring
name prefixed with '+', profile prefixed with ':', and container prefixed with '::'
('none' for no container (default), 'all' for all containers)
Selection Options:
-A, --abort N[:TARGET] Stop current extractor(s) after N consecutive file downloads were skipped. Specify a
TARGET to set how many levels to ascend or to which subcategory to jump to. Examples: '-A
3', '-A 3:2', '-A 3:manga'
-T, --terminate N Stop current & parent extractors and proceed with the next input URL after N consecutive
file downloads were skipped
--filesize-min SIZE Do not download files smaller than SIZE (e.g. 500k or 2.5M)
--filesize-max SIZE Do not download files larger than SIZE (e.g. 500k or 2.5M)
--download-archive FILE Record successfully downloaded files in FILE and skip downloading any file already in it
--range RANGE Index range(s) specifying which files to download. These can be either a constant value,
range, or slice (e.g. '5', '8-20', or '1:24:3')
--chapter-range RANGE Like '--range', but applies to manga chapters and other delegated URLs
--filter EXPR Python expression controlling which files to download. Files for which the expression
evaluates to False are ignored. Available keys are the filename-specific ones listed by
'-K'. Example: --filter "image_width >= 1000 and rating in ('s', 'q')"
--chapter-filter EXPR Like '--filter', but applies to manga chapters and other delegated URLs
Post-processing Options:
-P, --postprocessor NAME Activate the specified post processor
--no-postprocessors Do not run any post processors
-O, --postprocessor-option KEY=VALUE
Additional post processor options
--write-metadata Write metadata to separate JSON files
--write-info-json Write gallery metadata to a info.json file
--write-tags Write image tags to separate text files
--zip Store downloaded files in a ZIP archive
--cbz Store downloaded files in a CBZ archive
--mtime NAME Set file modification times according to metadata selected by NAME. Examples: 'date' or
'status[date]'
--rename FORMAT Rename previously downloaded files from FORMAT to the current filename format
--rename-to FORMAT Rename previously downloaded files from the current filename format to FORMAT
--ugoira FMT Convert Pixiv Ugoira to FMT using FFmpeg. Supported formats are 'webm', 'mp4', 'gif',
'vp8', 'vp9', 'vp9-lossless', 'copy', 'zip'.
--exec CMD Execute CMD for each downloaded file. Supported replacement fields are {} or {_path},
{_directory}, {_filename}. Example: --exec "convert {} {}.png && rm {}"
--exec-after CMD Execute CMD after all files were downloaded. Example: --exec-after "cd {_directory} &&
convert * ../doc.pdf"
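For reference, two of the options above that seem most useful for a Twitter account, composed into a hedged example (the URL, range, and filter are placeholders; the filter assumes that extension appears among the keys printed by -K):
gallery-dl --cookies x.txt -K https://x.com/account
gallery-dl --cookies x.txt --range 1-20 --filter "extension == 'mp4'" https://x.com/account
The first command only prints the available keywords; the second looks at the first 20 files and downloads only those that are mp4 videos.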
Although the config.json above only enables videos and images, the text content of the tweets is downloaded separately as well.
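If you want that text kept as structured data next to the media, the --write-metadata option from the help output above writes a separate JSON metadata file per downloaded file; a hedged example:
gallery-dl --cookies x.txt --write-metadata https://x.com/account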
So far I have not studied this tool in much depth; my goal was simply to collect some content, and it has done that for me. When there is a real need, I will come back and update this post.