Pandas Reading Multiple Files With Same Extension
Learning Outcomes
- To learn how to delete single file directories.
- To learn how to delete multiple file directories.
- To practice deleting multiple .txt files and also other files.
So in the last episode we learned how to combine multiple .csv files within Python.
However, as is often the case, after we've finished wrangling the data, we'll need to delete these specific files from our local environment.
Therefore, it's vital that we can:
- Delete multiple folder directories within Python.
- Delete all of the files inside our current working directory that are of a specific file type (.csv, .txt, etc.).
Import packages
    import os
    import glob
    import pandas as pd
    import shutil

    !pwd
    /Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files

Notice how when we run !ls from the Jupyter notebook or ls from the terminal/command line, we can see that there are several .txt files and file directories that we'd like to either delete or keep safe:
!ls - Jupyter notebook
ls - Command line
Let's create a scenario where we have several folders and files that we'd like to either keep or delete.
Directories To Delete:
- ahrefs_backlink_data
- csv_data_to_delete
- digital_marketing_content
- seo_marketing_content
Directories To Keep Safe:
- I_never_want_to_delete_this_folder
Files To Delete:
- delete_me.txt
- delete_this_file.txt
- practicing_deleting.txt
Files To Keep Safe:
- keepthisfilesafe.txt
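If you would like to follow along without risking real data, the scenario above can be recreated in a throwaway directory. This is only a minimal sketch: the folder and file names come from the lists above, while the `build_scenario` helper and the temporary sandbox location are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

# Folder and file names taken from the lists above.
DIRECTORIES = [
    "ahrefs_backlink_data",
    "csv_data_to_delete",
    "digital_marketing_content",
    "seo_marketing_content",
    "I_never_want_to_delete_this_folder",
]
FILES = [
    "delete_me.txt",
    "delete_this_file.txt",
    "practicing_deleting.txt",
    "keepthisfilesafe.txt",
]


def build_scenario(base_dir):
    """Create the empty practice folders and files inside base_dir (hypothetical helper)."""
    base = Path(base_dir)
    for name in DIRECTORIES:
        (base / name).mkdir(parents=True, exist_ok=True)
    for name in FILES:
        (base / name).touch()
    return base


# A temporary sandbox (an assumption) so that nothing important can be deleted.
sandbox = build_scenario(tempfile.mkdtemp(prefix="delete_practice_"))
print(sorted(p.name for p in sandbox.iterdir()))
```

Pointing `path` at this sandbox instead of a real project folder means every deletion in the rest of the walkthrough is harmless.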
Deleting Specific File Directories With Python
Firstly, let's see if we can find some patterns within the directories that we would like to delete or keep!
As we can see, both the directories that we want to keep and the ones we want to delete contain underscores, so there is no difference here. However, we do want to delete all of the sub-directories apart from just one directory, so we can just:
- Obtain all of the file directories within the current working directory.
- Remove the I_never_want_to_delete_this_folder from our Python list and then delete the remaining file directories!
So let's code that up:
    !pwd
    /Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files

    # Let's define our current path here:
    # You will need to change this to be unique to your specific directory path:
    path = '/Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files'

This command will scan all of the files and folders within the current working directory. We will also filter the results by adding an if statement that ensures each item returned by the scan is a folder:
    [some_code_here if f.is_dir()]

    list_subfolders_with_paths = [f.path for f in os.scandir(path) if f.is_dir()]
    print(f'''These are all of the current subfolders within our current working directory: \n {list_subfolders_with_paths}''')

    These are all of the current subfolders within our current working directory:
    ['/Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/seo_marketing_content',
     '/Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/ahrefs_backlink_data',
     '/Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/I_never_want_to_delete_this_folder',
     '/Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/.ipynb_checkpoints',
     '/Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/digital_marketing_content',
     '/Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/csv_data_to_delete']

Now we can use a list comprehension to select only the folder names that do not contain the words "I_never_want" inside the string.
    subfolders_to_delete = [folder_name for folder_name in list_subfolders_with_paths if "I_never_want" not in folder_name]
    print(f"These are the subfolders that we would like to delete: \n \n {subfolders_to_delete}")

    These are the subfolders that we would like to delete:
    ['/Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/seo_marketing_content',
     '/Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/ahrefs_backlink_data',
     '/Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/.ipynb_checkpoints',
     '/Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/digital_marketing_content',
     '/Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/csv_data_to_delete']

The important syntax to note above is:
    if "I_never_want" not in folder_name

This means that as we loop over every folder_name, if "I_never_want" is not within the string, the folder is included in the list comprehension. However, the folder I_never_want_to_delete_this_folder does contain this string and is therefore excluded from the final Python list.
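One thing to watch with a substring check is that it is applied to the whole path, so a parent directory that happened to contain "I_never_want" would exclude every folder. A slightly stricter variant, sketched below, compares only the final folder name against an explicit keep-list. The `folders_to_delete` function and the `/tmp/demo` paths are hypothetical and exist purely for illustration.

```python
import os


def folders_to_delete(folder_paths, keep_names):
    """Return the paths whose final folder name is not in keep_names."""
    keep = set(keep_names)
    return [
        path
        for path in folder_paths
        # basename() of the normalised path is the folder's own name.
        if os.path.basename(os.path.normpath(path)) not in keep
    ]


# Made-up example paths; nothing is read from or written to disk here.
example_paths = [
    "/tmp/demo/seo_marketing_content",
    "/tmp/demo/I_never_want_to_delete_this_folder",
    "/tmp/demo/I_never_want_this_old_export",
]
print(folders_to_delete(example_paths, ["I_never_want_to_delete_this_folder"]))
```

With an exact match, a folder such as `I_never_want_this_old_export` is no longer protected by accident, which may or may not be what you want. Choose whichever rule matches your intent.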
Now that we've got all of the subfolders in a list, we will just create a for loop to delete every folder with the following command:

    shutil.rmtree()

    for folder in subfolders_to_delete:
        print(folder)
        print('-----')
        shutil.rmtree(folder)
        print(f"Deleted the {folder} from your hard drive")

    /Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/seo_marketing_content
    -----
    Deleted the /Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/seo_marketing_content from your hard drive
    /Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/ahrefs_backlink_data
    -----
    Deleted the /Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/ahrefs_backlink_data from your hard drive
    /Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/.ipynb_checkpoints
    -----
    Deleted the /Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/.ipynb_checkpoints from your hard drive
    /Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/digital_marketing_content
    -----
    Deleted the /Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/digital_marketing_content from your hard drive
    /Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/csv_data_to_delete
    -----
    Deleted the /Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/csv_data_to_delete from your hard drive

Pro Tip: You can only delete files once, and they don't go to your recycling bin!
So definitely make sure to use print() statements and double check that the files / folders are the ones you would like to delete before committing to it!
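One way to build that double check into the code itself is a dry-run switch. The sketch below is an assumption about how you might structure it rather than part of the original notebook: `delete_folders` and its `dry_run` parameter are hypothetical names, while `shutil.rmtree()` is the same call used above.

```python
import os
import shutil
import tempfile


def delete_folders(folders, dry_run=True):
    """Delete each folder; with dry_run=True only report what would happen."""
    deleted = []
    for folder in folders:
        if dry_run:
            print(f"[dry run] would delete {folder}")
            continue
        shutil.rmtree(folder)  # permanent: this bypasses the recycling bin
        deleted.append(folder)
        print(f"Deleted {folder}")
    return deleted


# Demonstration inside a temporary sandbox (an assumption, so nothing real is at risk).
sandbox = tempfile.mkdtemp(prefix="rmtree_demo_")
target = os.path.join(sandbox, "csv_data_to_delete")
os.mkdir(target)

delete_folders([target])                 # preview only, the folder still exists
delete_folders([target], dry_run=False)  # the folder is now removed
```

Defaulting `dry_run` to `True` means a deletion only happens when you explicitly ask for it.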
We can double check that all of the folders have been deleted by running either:
!ls in a Jupyter notebook
ls on terminal / Git Bash
How To Delete Specific File Types In Your Current Working Directory With Python
Now that we've deleted all of the folders, along with the files/folders inside of those specific folders, let's practice deleting some specific .txt files from our current working directory!
I'm going to show you two different ways we could solve this problem:
- The file that we want to keep doesn't contain underscores (_), therefore we could delete all of the files containing underscores.
- All of the files that we want to delete also contain the phrase "delet", so we could technically delete all files which match this text string.
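Both rules can be tried out on plain file names before any file is touched. The sketch below uses the file names from the scenario; the two predicate functions are hypothetical names, and the second uses `.endswith(".txt")`, which is a slightly stricter check than testing whether ".txt" appears anywhere in the name.

```python
def has_underscore(name):
    """Rule one: the file name contains an underscore."""
    return "_" in name


def is_delete_txt(name):
    """Rule two: a .txt file whose name contains the text 'delet'."""
    return name.endswith(".txt") and "delet" in name


# File names from the scenario above; no files are read or removed here.
names = [
    "delete_me.txt",
    "delete_this_file.txt",
    "practicing_deleting.txt",
    "keepthisfilesafe.txt",
]

print([n for n in names if has_underscore(n)])
print([n for n in names if is_delete_txt(n)])
```

For these four names both rules select the same three files and leave keepthisfilesafe.txt alone. Note that the underscore rule would also match any other file in the folder that has an underscore in its name.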
How To Get All Of The Files Within The Current Working Directory In Python
Firstly, let's obtain everything in the current directory and then filter it down to only the items that are files:

    files = [f for f in os.listdir('.') if os.path.isfile(f)]
    print(files)

    ['.DS_Store', 'how-to-delete-multiple-files-in-python.ipynb', 'practicing_deleting.txt', 'delete_this_file.txt', 'keepthisfilesafe.txt', 'delete_me.txt']

Remember that after you've run one of the following two methods, the second one will not work as the files have already been deleted:
Method One:

    for f in files:
        # So we will look at every file and if it contains an underscore then delete the file!
        if "_" in f:
            print(f)
            # os.remove() allows us to easily remove single files <3
            os.remove(f)

    # Output:
    # .DS_Store
    # practicing_deleting.txt
    # delete_this_file.txt
    # delete_me.txt

Method Two:

    for f in files:
        # Searching for only files that contain both .txt and delet inside the file name
        if ".txt" in f and "delet" in f:
            print(f)
            # You would need to uncomment the line below!
            # os.remove(f)

    # Output:
    # practicing_deleting.txt
    # delete_this_file.txt
    # delete_me.txt

How To Delete Multiple File Types Inside The Current Working Directory
Now let's make our method two slightly more complex. For example, let's say we wanted to delete multiple file type extensions including .pdf, .csv and .txt files!

    # The touch command allows us to create new files via the terminal:
    !touch awesomefile.pdf
    !touch text.csv
    !touch thisisatest.txt
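The touch command is a shell tool, so it may not be available in every environment (for example a plain Windows command prompt). As a hedged alternative, `pathlib.Path.touch()` creates the same empty files from Python. The `touch_files` helper and the temporary sandbox below are assumptions for illustration only.

```python
import tempfile
from pathlib import Path


def touch_files(directory, names):
    """Create an empty file for each name inside directory and return the paths."""
    created = []
    for name in names:
        path = Path(directory) / name
        path.touch()  # creates the file if it does not already exist
        created.append(path)
    return created


# A temporary sandbox (an assumption); swap in '.' to use the current directory.
sandbox = tempfile.mkdtemp(prefix="touch_demo_")
for path in touch_files(sandbox, ["awesomefile.pdf", "text.csv", "thisisatest.txt"]):
    print(path.name)
```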
    files = [f for f in os.listdir('.') if os.path.isfile(f)]
    for f in files:
        if f.endswith(('.pdf', '.csv', '.txt')):
            # This will only look at files ending with the above extensions!
            print(f, "would be deleted if os.remove() was uncommented!")
            # os.remove(f)

    text.csv would be deleted if os.remove() was uncommented!
    awesomefile.pdf would be deleted if os.remove() was uncommented!
    keepthisfilesafe.txt would be deleted if os.remove() was uncommented!
    thisisatest.txt would be deleted if os.remove() was uncommented!

Another method would be to use negation (delete everything but).
    if not (some_condition - True / False)

    for f in files:
        if not f.endswith(('.txt', '.ipynb')):
            print(f, "would be deleted if os.remove() was uncommented!")
            # os.remove(f)

    text.csv would be deleted if os.remove() was uncommented!
    awesomefile.pdf would be deleted if os.remove() was uncommented!

Remember!
.endswith() accepts a tuple of things that you want to match against and returns True if any of them match.

How To Search For Specific File Types From Current Directories Downward
If you have tons of subfolders and you would like to find matching files in any of these folders, you can use the os.walk() function.
    for root, dirnames, filenames in os.walk(folder):
        for filename in filenames:
            if filename.endswith(extensions):
                do_something()

    !pwd
    /Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files

    folder = "/Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files"

    # Create a list of matches
    matches = []
    extensions = ('.csv', '.txt')

    for root, dirnames, filenames in os.walk(folder):
        for filename in filenames:
            if filename.endswith(extensions):
                matches.append(os.path.join(root, filename))

    # After the loop, root holds the last directory that os.walk() visited
    print("This is the root folder:", root)
    print("\n These are the file matches obtained from all subsequent lower directories including the root folder:", matches)

    This is the root folder: /Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/.ipynb_checkpoints
    These are the file matches obtained from all subsequent lower directories including the root folder:
    ['/Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/text.csv',
     '/Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/keepthisfilesafe.txt',
     '/Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/thisisatest.txt']

How To Delete Specific File Types In All Of The Directories Beneath Your Current Working Directory
Now what about recursively deleting all files with a specific file type in a series of subfolders?
No problem!
We will still use the os.walk() function. However, notice that instead of appending the results to a list, we can just delete the file instead:

    for root, dirnames, filenames in os.walk(folder):
        for filename in filenames:
            if filename.endswith(extensions):
                # Notice here how we are using os.remove() instead of appending to a list!
                # The file name is joined to root so that files inside subfolders are found.
                os.remove(os.path.join(root, filename))

Hopefully you can see how easy it is to delete files and folders within your local folders at scale.
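Because a recursive delete can reach a long way down a folder tree, it may be worth previewing the matches first. The following is a minimal sketch under that assumption: `delete_by_extension` and its `dry_run` flag are hypothetical names, and the demonstration runs in a temporary sandbox rather than a real project folder.

```python
import os
import tempfile


def delete_by_extension(folder, extensions, dry_run=True):
    """Walk folder recursively and return the files ending with any of the
    given extensions (a tuple of strings); delete them unless dry_run is True."""
    matches = []
    for root, dirnames, filenames in os.walk(folder):
        for filename in filenames:
            if filename.endswith(extensions):
                # os.walk() yields bare file names, so join them to root.
                full_path = os.path.join(root, filename)
                matches.append(full_path)
                if not dry_run:
                    os.remove(full_path)
    return matches


# Demonstration in a temporary sandbox (an assumption, so nothing real is at risk).
sandbox = tempfile.mkdtemp(prefix="walk_demo_")
os.mkdir(os.path.join(sandbox, "nested"))
for relative in ("text.csv", os.path.join("nested", "notes.txt"), "report.pdf"):
    open(os.path.join(sandbox, relative), "w").close()

print(delete_by_extension(sandbox, (".csv", ".txt")))   # preview only
delete_by_extension(sandbox, (".csv", ".txt"), dry_run=False)
print(os.listdir(sandbox))                              # report.pdf and nested remain
```

Joining `root` and `filename` matters here: passing the bare file name to `os.remove()` would only work for files that sit in the current working directory.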
Reading and deleting multiple files ensures that you can create simple data pipelines such as:
- Manually download 100x .csv files.
- Automatically open all of the .csv files.
- Concatenate the .csv files together into a pandas dataframe.
- Perform some data manipulation on the merged data.
- Save the concatenated pandas dataframe as a new csv, i.e. main.csv
- Delete all of the original .csv files.
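The steps above can be sketched end to end as follows. To keep the example free of third-party dependencies it uses the standard-library csv module in place of pandas, so treat it as an illustration of the pipeline's shape rather than the exact workflow; in a pandas version, pd.concat would do the combining. The `merge_and_clean` function is a hypothetical name, and the sketch assumes every source file shares the same header row.

```python
import csv
import glob
import os
import tempfile


def merge_and_clean(folder, output_name="main.csv"):
    """Merge every .csv in folder into output_name, then delete the originals.

    Assumes all source files share the same header row.
    """
    output_path = os.path.join(folder, output_name)
    sources = sorted(
        p for p in glob.glob(os.path.join(folder, "*.csv")) if p != output_path
    )
    header, rows = None, []
    for path in sources:
        with open(path, newline="") as handle:
            reader = csv.reader(handle)
            file_header = next(reader, None)
            if file_header is None:
                continue  # skip completely empty files
            if header is None:
                header = file_header
            rows.extend(reader)
    with open(output_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        if header is not None:
            writer.writerow(header)
        writer.writerows(rows)
    # Only remove the originals once the merged file has been written.
    for path in sources:
        os.remove(path)
    return output_path, len(rows)


# Demonstration with two tiny made-up files in a temporary sandbox.
sandbox = tempfile.mkdtemp(prefix="pipeline_demo_")
for name, value in (("jan.csv", "10"), ("feb.csv", "20")):
    with open(os.path.join(sandbox, name), "w", newline="") as handle:
        csv.writer(handle).writerows([["month", "clicks"], [name[:3], value]])

print(merge_and_clean(sandbox))
print(os.listdir(sandbox))
```

Deleting the originals only after the merged file is written keeps the pipeline recoverable if the write step fails partway through.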
In the next episode, we'll be learning how we can further automate our data analysis and data pipelines by directly reading and writing data to Google Sheets in pandas!
Source: https://understandingdata.com/python-for-seo/how-to-delete-multiple-files-and-folders-in-python/