functions and built a simple Streamlit web app with them.

    ...
        # Initialize dictionary to track string to numeric conversions
        conversions = {}
        # Convert string values to numeric and track conversions in dictionary
        ...

    def handle_missing_values(df, threshold):
        ...
        # Impute missing values for records with one missing value
        for col in missing_values.index:
            ...
            df.fillna(df.median(), inplace=True)
        # Drop records with more than threshold missing values
        df.dropna(thresh=len(df.columns) - threshold, inplace=True)
        ...

    ...
        return train_test_split(df, test_size=test_size)

    ...
    st.set_page_config(page_title="Data Preprocessing", page_icon=":guardsman:", layout="wide")
    file_path = st.text_input("Enter the path/name of the dataset csv file: ")
    test_size = st.number_input("Enter the train/test split size (decimal between 0 and 1): ", step=0.01, value=0.2)
    threshold = st.number_input("Enter the threshold for the number of missing values per record: ", step=1, value=1)
    ...
    df, conversions = load_and_convert_data(file_path)
    df = handle_missing_values(df, threshold)
    train_df, test_df = split_data(df, test_size)
    st.success("Data preprocessing completed!")

This version is a Streamlit app that lets the user supply the same inputs the previous version took as command-line arguments. It uses the Streamlit library to create an interactive web app where the user enters the file path, the test/train split size, and the threshold for the number of missing values per record. The user can then click the "Process Data" button to preprocess the data. It uses the same functions as the previous version of the script to load, convert, and clean the dataset, then split it into test and train sets.

To run the script, you need to have Streamlit installed; if you don't, run the command pip install streamlit in your terminal. Once it is installed, run the script with the command streamlit run script.py.
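The missing-value threshold is the easiest part to misread, so here is a small sketch of the rule on its own. It is not the script's code: it uses plain Python lists of dicts instead of a DataFrame so it runs without any installs, and the name handle_missing_values_sketch and the sample rows are made up for illustration. It also drops sparse records before imputing, which is an assumption on my part, since filling every gap first would leave nothing for the drop step to remove.

```python
from statistics import median


def handle_missing_values_sketch(rows, threshold):
    """Hypothetical stand-in for the cleaning step, using only the stdlib.

    Drops rows with more than `threshold` missing (None) values, then fills
    the remaining gaps with the per-column median of the rows that are left.
    """
    # Keep a row only if it has at most `threshold` missing cells. This is the
    # same rule as dropna(thresh=n_columns - threshold), which keeps rows with
    # at least n_columns - threshold non-missing cells.
    kept = [dict(r) for r in rows
            if sum(v is None for v in r.values()) <= threshold]
    if not kept:
        return kept

    for col in kept[0]:
        present = [r[col] for r in kept if r[col] is not None]
        if not present:
            continue  # no values to take a median from, leave the gaps
        fill = median(present)
        for r in kept:
            if r[col] is None:
                r[col] = fill
    return kept


# Made-up sample data: three columns, some gaps.
rows = [
    {"a": 1.0, "b": 10.0, "c": 5.0},
    {"a": None, "b": 20.0, "c": 7.0},   # one gap: kept and imputed
    {"a": 3.0, "b": None, "c": None},   # two gaps: dropped when threshold=1
    {"a": 5.0, "b": 30.0, "c": 9.0},
]
cleaned = handle_missing_values_sketch(rows, threshold=1)
for row in cleaned:
    print(row)
```

With threshold=1, the third record is dropped for having two gaps, and the gap in column "a" of the second record is filled with the median of the remaining values in that column. Raising the threshold to 2 keeps all four records.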