Downloading Twitter content

A long time ago there was a Python package, twittie or something like that, that could download content from Twitter. After the Musk acquisition, most of these download methods were killed off; since there was no usable tool and nothing in particular I needed to save, I stopped bothering. Recently I came across some interesting content that I wanted to collect again, which is how I found this Python package, which also ships a Windows executable:

https://github.com/mikf/gallery-dl/releases

On Windows, create the configuration file at "%APPDATA%\gallery-dl\config.json". An example:

{
    "extractor": {
        "twitter": {
            "videos": true,
            "images": true
        }
    },
    "output": {
        "template": "{author[screen_name]}/{tweet_id}_{num}.{extension}"
    }
}
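
If you don't want to write the JSON by hand, or are unsure whether gallery-dl is actually picking this file up, the help output further down lists --config-create and --config-status; I would expect to use them roughly like this (just a sketch of how I'd verify the setup, not a required step):

gallery-dl --config-create
gallery-dl --config-status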

You also need the cookies of a logged-in x.com account, otherwise it won't work. One way to get them is the browser extension "Get cookies.txt Local": after installing it, visit x.com, export the cookies in Netscape format, and save them as a file such as x.txt.
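
An alternative I haven't tried myself: the help output below lists a --cookies-from-browser option, which should be able to read the logged-in session straight from the browser instead of an exported file, roughly like this (firefox is only an example browser name):

gallery-dl --cookies-from-browser firefox https://x.com/account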

After that you can run the actual collection from the command line.

For example:

gallery-dl --cookies x.txt https://x.com/account
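
To grab several accounts in one run, the -i/--input-file option in the help below should accept a plain text file with one URL per line; something like this, where urls.txt is just a placeholder name:

gallery-dl --cookies x.txt -i urls.txt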

Since that json file doesn't specify a target folder, the downloaded files end up by default under C:\Users\your_account_name\gallery-dl\twitter, organized into separate subfolders per target account.
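
If you'd rather put the files somewhere else without editing config.json, the -d/--destination flag from the help below should override the base location; for example (D:\twitter is only an illustrative path):

gallery-dl --cookies x.txt -d D:\twitter https://x.com/account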

gallery-dl --help
Usage: gallery-dl [OPTIONS] URL [URL...]

General Options:
  -h, --help                  Print this help message and exit
  --version                   Print program version and exit
  -f, --filename FORMAT       Filename format string for downloaded files ('/O' for "original" filenames)
  -d, --destination PATH      Target location for file downloads
  -D, --directory PATH        Exact location for file downloads
  -X, --extractors PATH       Load external extractors from PATH
  -a, --user-agent UA         User-Agent request header
  --clear-cache MODULE        Delete cached login sessions, cookies, etc. for MODULE (ALL to delete everything)
  --compat                    Restore legacy 'category' names

Update Options:
  -U, --update-check          Check if a newer version is available

Input Options:
  -i, --input-file FILE       Download URLs found in FILE ('-' for stdin). More than one --input-file can be specified
  -I, --input-file-comment FILE
                              Download URLs found in FILE. Comment them out after they were downloaded successfully.
  -x, --input-file-delete FILE
                              Download URLs found in FILE. Delete them after they were downloaded successfully.
  --no-input                  Do not prompt for passwords/tokens

Output Options:
  -q, --quiet                 Activate quiet mode
  -w, --warning               Print only warnings and errors
  -v, --verbose               Print various debugging information
  -g, --get-urls              Print URLs instead of downloading
  -G, --resolve-urls          Print URLs instead of downloading; resolve intermediary URLs
  -j, --dump-json             Print JSON information
  -J, --resolve-json          Print JSON information; resolve intermediary URLs
  -s, --simulate              Simulate data extraction; do not download anything
  -E, --extractor-info        Print extractor defaults and settings
  -K, --list-keywords         Print a list of available keywords and example values for the given URLs
  -e, --error-file FILE       Add input URLs which returned an error to FILE
  -N, --print [EVENT:]FORMAT  Write FORMAT during EVENT (default 'prepare') to standard output instead of downloading
                              files. Can be used multiple times. Examples: 'id' or 'post:{md5[:8]}'
  --Print [EVENT:]FORMAT      Like --print, but downloads files as well
  --print-to-file [EVENT:]FORMAT FILE
                              Append FORMAT during EVENT to FILE instead of downloading files. Can be used multiple
                              times
  --Print-to-file [EVENT:]FORMAT FILE
                              Like --print-to-file, but downloads files as well
  --list-modules              Print a list of available extractor modules
  --list-extractors [CATEGORIES]
                              Print a list of extractor classes with description, (sub)category and example URL
  --write-log FILE            Write logging output to FILE
  --write-unsupported FILE    Write URLs, which get emitted by other extractors but cannot be handled, to FILE
  --write-pages               Write downloaded intermediary pages to files in the current directory to debug problems
  --print-traffic             Display sent and read HTTP traffic
  --no-colors                 Do not emit ANSI color codes in output

Networking Options:
  -R, --retries N             Maximum number of retries for failed HTTP requests or -1 for infinite retries (default: 4)
  --http-timeout SECONDS      Timeout for HTTP connections (default: 30.0)
  --proxy URL                 Use the specified proxy
  --source-address IP         Client-side IP address to bind to
  -4, --force-ipv4            Make all connections via IPv4
  -6, --force-ipv6            Make all connections via IPv6
  --no-check-certificate      Disable HTTPS certificate validation

Downloader Options:
  -r, --limit-rate RATE       Maximum download rate (e.g. 500k, 2.5M, or 800k-2M)
  --chunk-size SIZE           Size of in-memory data chunks (default: 32k)
  --sleep SECONDS             Number of seconds to wait before each download. This can be either a constant value or a
                              range (e.g. 2.7 or 2.0-3.5)
  --sleep-request SECONDS     Number of seconds to wait between HTTP requests during data extraction
  --sleep-429 SECONDS         Number of seconds to wait when receiving a '429 Too Many Requests' response
  --sleep-extractor SECONDS   Number of seconds to wait before starting data extraction for an input URL
  --no-part                   Do not use .part files
  --no-skip                   Do not skip downloads; overwrite existing files
  --no-mtime                  Do not set file modification times according to Last-Modified HTTP response headers
  --no-download               Do not download any files

Configuration Options:
  -o, --option KEY=VALUE      Additional options. Example: -o browser=firefox
  -c, --config FILE           Additional configuration files
  --config-yaml FILE          Additional configuration files in YAML format
  --config-toml FILE          Additional configuration files in TOML format
  --config-create             Create a basic configuration file
  --config-status             Show configuration file status
  --config-open               Open configuration file in external application
  --config-ignore             Do not read default configuration files

Authentication Options:
  -u, --username USER         Username to login with
  -p, --password PASS         Password belonging to the given username
  --netrc                     Enable .netrc authentication data

Cookie Options:
  -C, --cookies FILE          File to load additional cookies from
  --cookies-export FILE       Export session cookies to FILE
  --cookies-from-browser BROWSER[/DOMAIN][+KEYRING][:PROFILE][::CONTAINER]
                              Name of the browser to load cookies from, with optional domain prefixed with '/', keyring
                              name prefixed with '+', profile prefixed with ':', and container prefixed with '::'
                              ('none' for no container (default), 'all' for all containers)

Selection Options:
  -A, --abort N[:TARGET]      Stop current extractor(s) after N consecutive file downloads were skipped. Specify a
                              TARGET to set how many levels to ascend or to which subcategory to jump to. Examples: '-A
                              3', '-A 3:2', '-A 3:manga'
  -T, --terminate N           Stop current & parent extractors and proceed with the next input URL after N consecutive
                              file downloads were skipped
  --filesize-min SIZE         Do not download files smaller than SIZE (e.g. 500k or 2.5M)
  --filesize-max SIZE         Do not download files larger than SIZE (e.g. 500k or 2.5M)
  --download-archive FILE     Record successfully downloaded files in FILE and skip downloading any file already in it
  --range RANGE               Index range(s) specifying which files to download. These can be either a constant value,
                              range, or slice (e.g. '5', '8-20', or '1:24:3')
  --chapter-range RANGE       Like '--range', but applies to manga chapters and other delegated URLs
  --filter EXPR               Python expression controlling which files to download. Files for which the expression
                              evaluates to False are ignored. Available keys are the filename-specific ones listed by
                              '-K'. Example: --filter "image_width >= 1000 and rating in ('s', 'q')"
  --chapter-filter EXPR       Like '--filter', but applies to manga chapters and other delegated URLs

Post-processing Options:
  -P, --postprocessor NAME    Activate the specified post processor
  --no-postprocessors         Do not run any post processors
  -O, --postprocessor-option KEY=VALUE
                              Additional post processor options
  --write-metadata            Write metadata to separate JSON files
  --write-info-json           Write gallery metadata to a info.json file
  --write-tags                Write image tags to separate text files
  --zip                       Store downloaded files in a ZIP archive
  --cbz                       Store downloaded files in a CBZ archive
  --mtime NAME                Set file modification times according to metadata selected by NAME. Examples: 'date' or
                              'status[date]'
  --rename FORMAT             Rename previously downloaded files from FORMAT to the current filename format
  --rename-to FORMAT          Rename previously downloaded files from the current filename format to FORMAT
  --ugoira FMT                Convert Pixiv Ugoira to FMT using FFmpeg. Supported formats are 'webm', 'mp4', 'gif',
                              'vp8', 'vp9', 'vp9-lossless', 'copy', 'zip'.
  --exec CMD                  Execute CMD for each downloaded file. Supported replacement fields are {} or {_path},
                              {_directory}, {_filename}. Example: --exec "convert {} {}.png && rm {}"
  --exec-after CMD            Execute CMD after all files were downloaded. Example: --exec-after "cd {_directory} &&
                              convert * ../doc.pdf"

Although the config.json above only enables videos and images, the text content of the tweets was also downloaded as separate files.
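
If you also want the tweet metadata as structured data next to each file, the --write-metadata flag in the option list above is supposed to write a separate JSON file per download. I haven't relied on it, but the invocation would presumably look like this:

gallery-dl --cookies x.txt --write-metadata https://x.com/account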

So far I haven't studied this tool in any depth; my goal was simply to collect the content, and it does that for me.

When I really need more, I'll come back and update this post.
