KiwiFarmer
KiwiFarmer is a Python package for scraping KiwiFarms threads and posts, extracting field values, and storing the results in a newly created MySQL database.
Run script
KiwiFarmer includes a script (run_smat.py) for indexing all KiwiFarms posts into an Elasticsearch instance. The script uses a Redis database to keep track of which pages have already been indexed, which avoids redundant reindexing operations.
The script can be run perpetually using the command:
.. code-block:: bash

   watch -n0 python run_smat.py

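The bookkeeping described above can be sketched as follows. This is a minimal illustration, not the logic of run_smat.py itself: the function name ``index_new_pages``, the set key ``indexed_pages``, and the ``index_page`` callback are all assumptions made for the example. The in-memory class stands in for a Redis client so the sketch runs without a server; a redis-py ``redis.Redis`` client provides ``sismember`` and ``sadd`` with compatible call shapes and could be passed in its place.

```python
class InMemorySeen:
    """Stand-in exposing the two set commands used below (illustration only)."""

    def __init__(self):
        self._sets = {}

    def sismember(self, name, value):
        # True if value was previously added to the named set.
        return value in self._sets.get(name, set())

    def sadd(self, name, *values):
        # Add values to the named set and return how many were new.
        target = self._sets.setdefault(name, set())
        before = len(target)
        target.update(values)
        return len(target) - before


def index_new_pages(page_urls, seen, index_page, key="indexed_pages"):
    """Index only pages not yet recorded in `seen`; return the URLs indexed.

    `index_page` is a hypothetical hook, for example a function that sends
    the posts on that page to Elasticsearch.
    """
    newly_indexed = []
    for url in page_urls:
        if seen.sismember(key, url):
            continue  # already indexed on an earlier run, skip it
        index_page(url)
        seen.sadd(key, url)  # record the page only after indexing succeeded
        newly_indexed.append(url)
    return newly_indexed


if __name__ == "__main__":
    seen = InMemorySeen()
    sent = []
    print(index_new_pages(["page-1", "page-2"], seen, sent.append))
    print(index_new_pages(["page-2", "page-3"], seen, sent.append))
```

Recording a page only after ``index_page`` returns means an interrupted run retries that page on the next invocation instead of silently skipping it.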
Workflow
KiwiFarmer also includes scripts for a workflow that downloads all website pages as HTML files, extracts relevant field data, and stores the data in a MySQL database.
These scripts are in the workflow/ subdirectory in the package root directory.
For more information, see docs/workflow.rst.
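As a rough sketch of the extract-and-store stages, the example below parses the names of downloaded thread files (such as those under data\downloaded_threads in the file list) and stores the result with a parameterized insert. Everything here is an assumption made for illustration: the regular expression, the ``thread_pages`` table and its columns, and the use of an in-memory SQLite database as a stand-in for MySQL. The actual workflow scripts extract fields from the HTML content itself, as described in docs/workflow.rst.

```python
import re
import sqlite3

# Assumed pattern for names like "the-universal-rules.6.html" or
# "welcome-to-the-new-permanent-cwcki-forums.18_page-2.html".
THREAD_FILE = re.compile(
    r"^(?P<slug>.+)\.(?P<thread_id>\d+)(?:_page-(?P<page>\d+))?\.html$"
)


def parse_thread_filename(filename):
    """Split a downloaded thread file name into slug, thread id, and page."""
    match = THREAD_FILE.match(filename)
    if match is None:
        raise ValueError(f"unrecognized thread file name: {filename!r}")
    return {
        "slug": match["slug"],
        "thread_id": int(match["thread_id"]),
        # Files without a "_page-N" suffix are treated as page 1.
        "page": int(match["page"] or 1),
    }


def store_pages(connection, filenames):
    """Insert one row per file into a hypothetical thread_pages table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS thread_pages ("
        "thread_id INTEGER, page INTEGER, slug TEXT, "
        "PRIMARY KEY (thread_id, page))"
    )
    rows = [parse_thread_filename(name) for name in filenames]
    # "INSERT OR IGNORE" is SQLite syntax; MySQL spells this "INSERT IGNORE".
    connection.executemany(
        "INSERT OR IGNORE INTO thread_pages (thread_id, page, slug) "
        "VALUES (:thread_id, :page, :slug)",
        rows,
    )
    connection.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_pages(conn, [
        "the-universal-rules.6.html",
        "welcome-to-the-new-permanent-cwcki-forums.18_page-2.html",
    ])
    print(conn.execute(
        "SELECT thread_id, page, slug FROM thread_pages ORDER BY thread_id"
    ).fetchall())
```

Using placeholders rather than string formatting keeps scraped values from being interpreted as SQL, which matters when the inserted text comes from arbitrary forum content.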
TODO
- add additional user fields for user signature and location
- expand unit tests
  - verify correctness of functions
- expand instructions and info of docs
- config file parsing
- analysis tools/utilities/visualizations
- improve input argument handling for classes (e.g. type conversion/checking)
File List
Here is a list of files included in this repository:
- docs\conf.py
- docs\tests\test_conf.py
- examples\export_database_csv.py
- examples\preprocess_reactions.py
- generate-file-list.py
- kiwifarmer\__init__.py
- kiwifarmer\base.py
- kiwifarmer\functions.py
- kiwifarmer\templates.py
- kiwifarmer\utils.py
- run_smat.py
- setup.py
- sort_json_database.py
- tests\old_tests\__init__.py
- tests\old_tests\base.py
- tests\old_tests\conftest.py
- tests\old_tests\functions.py
- tests\old_tests\utils.py
- tests\test_AA_get_thread_url_list.py
- tests\test_AB_download_all_threads.py
- tests\test_AC_insert_threads.py
- tests\test_BA_get_page_url_list.py
- tests\test_BB_download_all_pages.py
- tests\test_BC_insert_pages.py
- tests\test_CA_download_all_users.py
- tests\test_CB_insert_users.py
- tests\test_DA_download_all_users_about.py
- tests\test_DB_get_users_following_list.py
- tests\test_DC_download_all_users_following.py
- tests\test_DD_insert_following.py
- tests\test_EA_insert_trophies.py
- tests\test_FA_get_reaction_url_list.py
- tests\test_FB_download_all_reactions.py
- tests\test_FC_preprocess_reactions.py
- tests\test_FD_insert_reactions.py
- tests\test_FE_get_more_reactions.py
- tests\test_export_database_csv.py
- tests\test_preprocess_reactions.py
- tests\test_run_smat.py
- workflow\AA_get_thread_url_list.py
- workflow\AB_download_all_threads.py
- workflow\AC_insert_threads.py
- workflow\BA_get_page_url_list.py
- workflow\BB_download_all_pages.py
- workflow\BC_insert_pages.py
- workflow\CA_download_all_users.py
- workflow\CB_insert_users.py
- workflow\DA_download_all_users_about.py
- workflow\DB_get_users_following_list.py
- workflow\DC_download_all_users_following.py
- workflow\DD_insert_following.py
- workflow\EA_insert_trophies.py
- workflow\FA_get_reaction_url_list.py
- workflow\FB_download_all_reactions.py
- workflow\FC_preprocess_reactions.py
- workflow\FD_insert_reactions.py
- workflow\FE_get_more_reactions.py
- workflow\GA_combine_data.py
- workflow\GB_organize_data.py
- workflow\GC_visualize_data_user_specific.py
- workflow\GD_visualize_follower_networks.py
- workflow\GF_visualize_more_data.py
- workflow\old_workflows\01-A_get_thread_url_list.py
- workflow\old_workflows\01-B_download_all_threads.py
- workflow\old_workflows\01-C_insert_threads.py
- workflow\old_workflows\02-A_get_page_url_list.py
- workflow\old_workflows\02-B_download_all_pages.py
- workflow\old_workflows\02-C_insert_pages.py
- workflow\old_workflows\03-A_download_all_users.py
- workflow\old_workflows\03-B_insert_users.py
- workflow\old_workflows\04-A_download_all_users_about.py
- workflow\old_workflows\04-B_get_users_following_list.py
- workflow\old_workflows\04-C_download_all_users_following.py
- workflow\old_workflows\04-D_insert_following.py
- workflow\old_workflows\05-A_insert_trophies.py
- workflow\old_workflows\06-A_get_reaction_url_list.py
- workflow\old_workflows\06-B_download_all_reactions.py
- workflow\old_workflows\06-C_preprocess_reactions.py
- workflow\old_workflows\06-D_insert_reactions.py
- workflow\old_workflows\06-E_get_more_reactions.py
- .vscode\settings.json
- data\connection_url_list.json
- data\reaction_data.json
- kiwifarms_20210224.json
- kiwifarms_combined_database.json
- kiwifarms_following_20210224.json
- kiwifarms_reorganized_database.json
- kiwifarms_trophies_20210224.json
- kiwifarms_users_20210224.json
- .gitignore
- .pre-commit-config.yaml
- CNAME
- file_list.txt
- kiwifarms.conf
- pytest.ini
- README.md
- README.rst
- requirements.txt
- script.log
- sitemap.xml
- visualization.log
- _config.yml
- data\connection_url_list.txt
- data\member_url_list.txt
- data\page_url_list.txt
- data\reaction_url_list.txt
- data\sitemap-1.xml
- data\sitemap-2.xml
- data\sitemap-3.xml
- data\sitemap-4.xml
- data\thread_url_list.txt
- data\downloaded_members\10.html
- data\downloaded_members\11.html
- data\downloaded_members\12.html
- data\downloaded_members\3.html
- data\downloaded_members\4.html
- data\downloaded_members\5.html
- data\downloaded_members\6.html
- data\downloaded_members\7.html
- data\downloaded_members\8.html
- data\downloaded_members\9.html
- data\downloaded_members_about\members.brooklynbailiff.3.about.html
- data\downloaded_members_about\members.champthom.2.about.html
- data\downloaded_members_about\members.null.1.about.html
- data\downloaded_members_connections\1.followers.connections.page1.html
- data\downloaded_members_connections\1.followers.connections.page10.html
- data\downloaded_members_connections\1.followers.connections.page11.html
- data\downloaded_members_connections\1.followers.connections.page12.html
- data\downloaded_members_connections\1.followers.connections.page13.html
- data\downloaded_members_connections\1.followers.connections.page14.html
- data\downloaded_members_connections\1.followers.connections.page15.html
- data\downloaded_members_connections\1.followers.connections.page16.html
- data\downloaded_members_connections\1.followers.connections.page17.html
- data\downloaded_members_connections\1.followers.connections.page18.html
- data\downloaded_members_connections\1.followers.connections.page19.html
- data\downloaded_members_connections\1.followers.connections.page2.html
- data\downloaded_members_connections\1.followers.connections.page20.html
- data\downloaded_members_connections\1.followers.connections.page21.html
- data\downloaded_members_connections\1.followers.connections.page22.html
- data\downloaded_members_connections\1.followers.connections.page23.html
- data\downloaded_members_connections\1.followers.connections.page24.html
- data\downloaded_members_connections\1.followers.connections.page25.html
- data\downloaded_members_connections\1.followers.connections.page26.html
- data\downloaded_members_connections\1.followers.connections.page27.html
- data\downloaded_members_connections\1.followers.connections.page28.html
- data\downloaded_members_connections\1.followers.connections.page29.html
- data\downloaded_members_connections\1.followers.connections.page3.html
- data\downloaded_members_connections\1.followers.connections.page4.html
- data\downloaded_members_connections\1.followers.connections.page5.html
- data\downloaded_members_connections\1.followers.connections.page6.html
- data\downloaded_members_connections\1.followers.connections.page7.html
- data\downloaded_members_connections\1.followers.connections.page8.html
- data\downloaded_members_connections\1.followers.connections.page9.html
- data\downloaded_pages\steven-bonnell-ii-destiny-destiny-gg.29205_page-1.html
- data\downloaded_pages\steven-bonnell-ii-destiny-destiny-gg.29205_page-2.html
- data\downloaded_reactions\posts_100_reactions_page_1.html
- data\downloaded_reactions\posts_1_reactions_page_1.html
- data\downloaded_reactions\posts_2127398_reactions_page_1.html
- data\downloaded_reactions\posts_96_reactions_page_1.html
- data\downloaded_threads\christian-sees-the-sights.14.html
- data\downloaded_threads\christian-sees-the-sights.32.html
- data\downloaded_threads\christian-sees-the-sights.32_page-2.html
- data\downloaded_threads\christian-sees-the-sights.32_page-3.html
- data\downloaded_threads\so-this-is-completely-permanent.12.html
- data\downloaded_threads\so-this-is-completely-permanent.30.html
- data\downloaded_threads\the-general-forum-rules.10.html
- data\downloaded_threads\the-general-forum-rules.28.html
- data\downloaded_threads\the-universal-rules.24.html
- data\downloaded_threads\the-universal-rules.6.html
- data\downloaded_threads\welcome-to-the-new-permanent-cwcki-forums.1.html
- data\downloaded_threads\welcome-to-the-new-permanent-cwcki-forums.18.html
- data\downloaded_threads\welcome-to-the-new-permanent-cwcki-forums.18_page-2.html
- data\downloaded_threads\welcome-to-the-new-permanent-cwcki-forums.18_page-3.html
- data\downloaded_threads\welcome-to-the-new-permanent-cwcki-forums.1_page-2.html
- data\downloaded_threads\welcome-to-the-new-permanent-cwcki-forums.1_page-3.html
- data\downloaded_threads\what-has-chris-chan-ruined-for-you.17.html
- data\downloaded_threads\what-has-chris-chan-ruined-for-you.17_page-2.html
- data\downloaded_threads\what-has-chris-chan-ruined-for-you.17_page-3.html
- data\downloaded_threads\what-has-chris-chan-ruined-for-you.17_page-4.html
- data\downloaded_threads\what-has-chris-chan-ruined-for-you.17_page-5.html
- data\downloaded_threads\what-has-chris-chan-ruined-for-you.17_page-6.html
- data\downloaded_threads\what-has-chris-chan-ruined-for-you.17_page-7.html
- data\downloaded_threads\worst-sonichu-pages.16.html
- data\downloaded_threads\worst-sonichu-pages.16_page-10.html
- data\downloaded_threads\worst-sonichu-pages.16_page-11.html
- data\downloaded_threads\worst-sonichu-pages.16_page-12.html
- data\downloaded_threads\worst-sonichu-pages.16_page-13.html
- data\downloaded_threads\worst-sonichu-pages.16_page-14.html
- data\downloaded_threads\worst-sonichu-pages.16_page-15.html
- data\downloaded_threads\worst-sonichu-pages.16_page-16.html
- data\downloaded_threads\worst-sonichu-pages.16_page-17.html
- data\downloaded_threads\worst-sonichu-pages.16_page-18.html
- data\downloaded_threads\worst-sonichu-pages.16_page-19.html
- data\downloaded_threads\worst-sonichu-pages.16_page-2.html
- data\downloaded_threads\worst-sonichu-pages.16_page-20.html
- data\downloaded_threads\worst-sonichu-pages.16_page-3.html
- data\downloaded_threads\worst-sonichu-pages.16_page-4.html
- data\downloaded_threads\worst-sonichu-pages.16_page-5.html
- data\downloaded_threads\worst-sonichu-pages.16_page-6.html
- data\downloaded_threads\worst-sonichu-pages.16_page-7.html
- data\downloaded_threads\worst-sonichu-pages.16_page-8.html
- data\downloaded_threads\worst-sonichu-pages.16_page-9.html
- data_visuals\follower_network.png
- data_visuals\follower_network_analysis.png
- data_visuals\messages_over_time.png
- data_visuals\message_length_distribution.png
- data_visuals\sentiment_distribution.png
- data_visuals\top_contributors.png
- data_visuals\user_activity_heatmap.png
- data_visuals\user_role_distribution.png
- data_visuals\word_cloud.png
- data_visuals\old\follower_network.png
- data_visuals\old\follower_network_analysis.png
- data_visuals\old\messages_over_time.png
- data_visuals\old\message_length_distribution.png
- data_visuals\old\sentiment_distribution.png
- data_visuals\old\top_contributors.png
- data_visuals\old\user_1_activity_heatmap.png
- data_visuals\old\user_1_messages_over_time.png
- data_visuals\old\user_2_activity_heatmap.png
- data_visuals\old\user_2_messages_over_time.png
- data_visuals\old\user_activity_heatmap.png
- data_visuals\old\user_role_distribution.png
- data_visuals\old\word_cloud.png
- docs\alias.rst_
- docs\default_apidocs.sh
- docs\index.rst
- docs\introduction.rst
- docs\make.bat
- docs\Makefile
- docs\overview.rst
- docs\quickstart.rst
- docs\workflow.rst
- docs\figs\database_schema.svg
- docs\figs\favicon.ico
- docs\figs\logo.svg
- docs\tests\test_alias.rst_
- docs\tests\test_index.rst
- docs\tests\test_make.bat
- docs\tests\test_Makefile
- docs\tests\test_overview.rst
- docs\tests\test_quickstart.rst
- docs\tests\test_workflow.rst
- kiwifarmer\__pycache__\base.cpython-311.pyc
- kiwifarmer\__pycache__\functions.cpython-311.pyc
- kiwifarmer\__pycache__\templates.cpython-311.pyc
- kiwifarmer\__pycache__\utils.cpython-311.pyc
- kiwifarmer\__pycache__\__init__.cpython-311.pyc
- .github\dependabot.yml
- .github\labeler.yml
- .github\ISSUE_TEMPLATE\bug_report.md
- .github\ISSUE_TEMPLATE\feature_request.md
- .github\PULL_REQUEST_TEMPLATE\pull_request_template.md
- .github\workflows\ActionLint.yml
- .github\workflows\Bandit.yml
- .github\workflows\black-formatter.yml
- .github\workflows\codeql.yml
- .github\workflows\dependency-review.yml
- .github\workflows\greetings.yml
- .github\workflows\label.yml
- .github\workflows\ossar.yml
- .github\workflows\osv-scanner.yml
- .github\workflows\pylint.yml
- .github\workflows\scorecards.yml
- .github\workflows\sitemap.yml
- .github\workflows\sobelow.yml
- .github\workflows\stale.yml
- .github\workflows\static.yml
- .github\workflows\super-linter.yml