Data Loader - Read in CSVs 📂
Read in CSV files using bamboolib without having to write any code
Button to plugin docs.
Check whether the resulting dataframe name of a transformation is a valid Python variable name and raise a human-readable error if it isn't.
Fixed bivariate plot bugs that led to errors when, e.g., columns contained missing values
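A minimal sketch of such a name check, using only the standard library. The function name `check_dataframe_name` and the error text are made up for illustration and are not bamboolib's actual implementation.

```python
import keyword


def check_dataframe_name(name: str) -> None:
    """Raise a readable error if `name` cannot be used as a Python variable name."""
    # isidentifier() accepts names such as "df_subset" and rejects "2nd df";
    # reserved words such as "class" pass isidentifier(), so check them separately.
    if not name.isidentifier() or keyword.iskeyword(name):
        raise ValueError(
            f"'{name}' is not a valid Python variable name. "
            "Names may not contain spaces, start with a digit, or be a reserved word."
        )


check_dataframe_name("df_subset")  # valid, so nothing happens

try:
    check_dataframe_name("2nd df")
except ValueError as error:
    message = str(error)
```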
We just couldn't hold back and had to crank out another feature this week!
With the Bivariate Time Series Plot, you can quickly plot your variables across time and choose any granularity level you want. Inspect your average sales by year, quarter, month, etc.
Change datetime frequency ⏱
Either expand your time series column and fill it with values (grid expand) or do a groupby and calculate aggregations. This corresponds to pandas'
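A rough sketch of the two options in plain pandas. It assumes a datetime index and made-up data; `asfreq` and `resample` are our picks for illustration and not necessarily the calls bamboolib exports.

```python
import pandas as pd

sales = pd.DataFrame(
    {"amount": [10.0, 20.0, 40.0]},
    index=pd.to_datetime(["2020-01-01", "2020-01-03", "2020-02-01"]),
)

# Option 1 (grid expand): move to a daily grid and forward fill the new rows.
daily = sales.asfreq("D", method="ffill")

# Option 2 (group and aggregate): one row per month with the summed amount.
monthly = sales.resample("MS").sum()
```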
Univariate Time series summary - quality bar and new summary plot 📊
We added a data quality bar to the univariate summary of datetime columns and created a new plot which allows you to see the value counts for different aggregation levels of your time series (e.g. day, month, or year).
Do you have text data that you want to transform into numeric labels? The LabelEncoder lets you do exactly that!
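To illustrate the idea of label encoding, here is a small sketch that uses `pd.factorize` as a stand-in. The column names are invented, and the code bamboolib generates may look different.

```python
import pandas as pd

df = pd.DataFrame({"city": ["Berlin", "Paris", "Berlin", "Rome"]})

# sort=True assigns the integer labels in alphabetical order of the text values.
codes, labels = pd.factorize(df["city"], sort=True)
df["city_label"] = codes
```

`labels` keeps the original text values, so `labels[code]` maps a numeric label back to its text.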
PlotCreator - Add multiple y's to your plots 📈
Select multiple y-Axes for your plots.
In line plots, plotly draws lines by connecting the observations in your data in the order of their appearance in the dataframe, from top to bottom. This is a feature of plotly that allows you to draw circles or other geometric shapes. However, when working with bamboolib, users usually just want a time-series-like plot. That's what we ensure now by sorting the x-axis.
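In plain pandas, the same effect can be had by sorting before plotting. This is a sketch with made-up data; the plotly call is only shown as a comment.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2020-03-01", "2020-01-01", "2020-02-01"]),
        "sales": [30, 10, 20],
    }
)

# Sort by the x-axis column so the line runs from left to right.
plot_data = df.sort_values("date")

# The sorted frame could then go to plotly express, e.g.:
# import plotly.express as px
# px.line(plot_data, x="date", y="sales")
```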
With pandas 1.0+ came the new NA type for missing values. We fixed a bug that caused the PlotCreator to throw an error when confronted with NA values.
Select Columns for preview ➡ Support 10,000+ columns dataframes 💪
Show a preview of your data on a selected subsample of your columns for more convenience and better performance when working with very (VERY) wide dataframes. This feature is also accessible via the glimpse, predictor patterns and correlation matrix.
Sampling columns on large data 🧹
When working with more than 100k rows, some interactive plots can become quite slow. In order to save you (and your computer), we randomly sample 100k rows if your data exceeds the row limit. But don't be afraid! With a click of a button, you can turn the sampling off.
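The sampling idea can be sketched like this. The helper name `sample_for_plot`, the fixed seed, and the example data are assumptions for illustration.

```python
import pandas as pd

ROW_LIMIT = 100_000


def sample_for_plot(df: pd.DataFrame, row_limit: int = ROW_LIMIT, seed: int = 0) -> pd.DataFrame:
    """Return df unchanged if it is small enough, otherwise a random sample of rows."""
    if len(df) <= row_limit:
        return df
    return df.sample(n=row_limit, random_state=seed)


big = pd.DataFrame({"x": range(250_000)})
plot_df = sample_for_plot(big)
```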
Having a history of your transformations is handy, but it comes with the cost of requiring more computer memory. If you work with large datasets, you can save memory by controlling how many steps you can go back in history or turn off the history altogether (the latter with
Fixed a pandas 1.0 regression in "extract datetime attributes".
Combine Dataframes ➡⬅
Choose to stack dataframes on top of each other or to combine them side-by-side
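In pandas terms, both options map naturally onto `pd.concat`. This is a sketch with invented frames, not necessarily the exported code.

```python
import pandas as pd

top = pd.DataFrame({"id": [1, 2], "value": [10, 20]})
bottom = pd.DataFrame({"id": [3], "value": [30]})
extra = pd.DataFrame({"label": ["a", "b"]})

# Stack on top of each other: rows are appended, the index is renumbered.
stacked = pd.concat([top, bottom], axis=0, ignore_index=True)

# Combine side by side: columns are appended, rows are aligned on the index.
side_by_side = pd.concat([top, extra], axis=1)
```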
Fixed a pandas 1.0 regression problem that caused numeric to numeric plots to break
This release adds support for two major upgrades 🥳
Support pandas 1.0+🐼
Support JupyterLab 2.0+ 🛰
Also, we show more human-friendly errors in the 'Join' transformation and the 'PivotTable' feature.
This release makes two small adjustments:
Show human-friendly errors in the 'New Column Formula' transformation when the inputs are empty.
Adjust the code to prevent an error when the 'Pivot Table' returns a pandas.Series and not a pandas.DataFrame.
In this release, we fixed a couple of bugs and prepared the infrastructure for some of the advanced enterprise features like custom plugins.
Interactive Histogram 2.0 🏋
We released an improved version of our interactive histogram in the Explorer. Zoom into your data with drag & drop or set ranges using the input fields, undo the last zooms or reset the zoom altogether, and get the code of the histogram!
Search on Tab 🔎
When you open bamboolib, you can focus the "Search transformations" search bar directly by pressing Tab.
Drop duplicates 🧹
Do you have any duplicates in your data set? Simply drop them and decide whether you want to keep the first or last of the duplicates in your data.
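A small pandas sketch of the two choices. The data is made up, and `subset` is used here only to make the difference between "first" and "last" visible.

```python
import pandas as pd

df = pd.DataFrame({"user": ["ann", "bob", "ann"], "visit": [1, 2, 3]})

# Keep the first row of each duplicated user ...
keep_first = df.drop_duplicates(subset="user", keep="first")

# ... or keep the last one instead.
keep_last = df.drop_duplicates(subset="user", keep="last")
```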
bamboolib now also normalizes dataframes when the index column is unnamed.
Fixed a bug that made scatter plots not appear in the Explorer
Fixed a bug that made "Create New Column Formula" not work when column names are substrings of other column names.
With this release, we also did a lot of house-keeping.
Meet our new glimpse, the main entry point to the Explore DataFrame feature. The new glimpse loads asymptotically and asynchronously, which comes with one huge advantage: no matter how large your data is, you get immediate visual feedback on your columns. If your data is large, the glimpse first draws a random sample of your data and then updates itself on the full data in the background.
Also, you can dive into the univariate summary of each column directly from the new glimpse.
bamboolib shows an update notice when a new version of bamboolib is available
All function hints under Explore DataFrame match the name of the dataframe under use
Fixed a bug which made the Statistics table disappear when looking at numeric column summaries
Fixed a bug that made a text box lose focus in the transformation Add Python Code
Introducing the new tab system 🥳
Users can now work inside the Plot creator, the Explorer and the Wrangler in parallel! If you click on the Plot creator or Data explorer, a new tab will open, so you can easily move between the wrangler and chart tabs. When you transform the data, the plots in the other tabs automatically adjust!
We already use the new tab system inside the Explorer as well.
More Univariate summary info on column click 🔎
When clicking on a column name, the user is taken directly to a full univariate summary tab
Support nullable Integers
Ever found yourself in that weird situation where you wanted to convert a float column to integer and just nothing happens when you run the code? In most cases, that's because there are NaNs in the column you want to convert. When converting to int, bamboolib makes sure that really happens by exploiting pandas' nullable integers.
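For illustration, this is how a float column with missing values can be converted with pandas' nullable `Int64` dtype. The column name is made up, and whether bamboolib emits exactly this call is an assumption.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"count": [1.0, 2.0, np.nan]})

# A plain astype(int) raises an error here because NaN has no integer representation.
# The nullable "Int64" dtype (capital I) keeps the missing value as <NA>.
df["count"] = df["count"].astype("Int64")
```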
Resample dataframe based on datetime column 📅
We enable users to change the frequency of time series data and to aggregate data based on datetime columns
Improve Live Code Export experience 🖥
We made the live code export look more like code the user actually typed. Among other things, we removed the bamboolib code export comment.
Specify string format when converting datetime to string 📅
When converting a datetime to string, you can now specify in what format you want to display the strings in your data, e.g. display Jan 1, 2020.
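In pandas, this kind of conversion is typically done with `dt.strftime`. The format string below is just one example; month names depend on the locale, and `%d` pads the day with a leading zero.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2020-01-01", "2020-12-24"]))

# %b = abbreviated month name, %d = zero-padded day, %Y = four-digit year
formatted = dates.dt.strftime("%b %d, %Y")
```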
All datatype conversion features can directly be accessed via the search bar⌚
For example, simply type "to datetime" in the main search bar if you want to convert a column to datetime
Fast entering bamboolib
After a user has clicked on the bamboolib UI button to enter the user interface, she doesn't have to click on it anymore when she inspects a dataframe. This behaviour resets on a kernel restart.
Fixed a bug so that the "Pivot Table" feature can count values
Filter on value in other column
When working with numeric columns, you can specify any valid Python expression to filter rows. That also includes specifying values in other columns. And of course, we provide column auto-completion in the filter. If you want that feature for other column types as well, please let us know!
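As a sketch, a filter that compares one column against another could look like this in pandas. The column names are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({"price": [10, 25, 40], "budget": [20, 20, 50]})

# Keep the rows where price does not exceed the value in the budget column.
affordable = df[df["price"] <= df["budget"]]

# The same filter written as an expression string.
affordable_query = df.query("price <= budget")
```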
Convert invalid input in number column to NaN
Have numbers stored as strings in your pandas DataFrame? When coercing them to numeric datatypes, bamboolib will automatically convert any invalid input to NaN.
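The pandas building block for this behaviour is `pd.to_numeric` with `errors="coerce"`. This is a minimal sketch on made-up data.

```python
import pandas as pd

df = pd.DataFrame({"amount": ["1", "2.5", "n/a"]})

# Anything that cannot be parsed as a number becomes NaN instead of raising an error.
df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
```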
Impute missing values 🖌
Replace missing values with fixed values or impute them with the mean, median, or mode of a column. You can also forward fill or backward fill missing values!
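The options above correspond roughly to the following pandas calls. This is a sketch on a toy series, not the exact exported code.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan])

fixed = s.fillna(0)                 # replace with a fixed value
with_mean = s.fillna(s.mean())      # impute with the column mean
with_median = s.fillna(s.median())  # impute with the column median
forward = s.ffill()                 # carry the last valid value forward
backward = s.bfill()                # pull the next valid value backward
```

Note that a backward fill leaves a trailing missing value untouched because there is no later value to copy.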
Rename column after changing dtype
When changing the datatype of a column, you can directly rename it. Also, when changing the datatype of a column to datetime, we provide examples for format strings so that you can more easily decide how you want to format your datetime column.
Fixed a bug that caused buttons in JupyterLab to not show the whole column names in the "Explore DataFrame" ~ columns tab when the column names were too long.
Python 3.8 support 🥳
Python 3.8 support is finally there! And in the future, new versions of Python will be supported much quicker.
Column formula improved
We redesigned the column formula feature to be easier to use.
Data wrangling speed improvement on large datasets 🚀
On an i7 MacBook with 16 GB RAM, wrangling datasets with more than 5 million rows is now possible without any lag.
The dimensions of the data frame are displayed in a human readable fashion
We added descriptions to the dropdown elements in our wrangler ("Search transformations" dropdown) and plot creator
Drop missing values ✂
With the new transformation "Dropping missing (NA) values", you can quickly drop all missing values in one or multiple columns
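In pandas, dropping rows with missing values in selected columns is done with `dropna(subset=...)`. This is a small sketch with invented columns.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, None, 3], "b": ["x", "y", None]})

# Drop rows where column "a" is missing.
only_a = df.dropna(subset=["a"])

# Drop rows where "a" or "b" is missing.
a_and_b = df.dropna(subset=["a", "b"])
```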
Reorder columns 🔁
Move columns to the front or back of your DataFrame or place them wherever you want.
Categorise numeric values with our new binning feature
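Binning can be sketched with `pd.cut`. The bin edges and labels below are made up, and bamboolib may generate different code.

```python
import pandas as pd

people = pd.DataFrame({"age": [5, 17, 30, 70]})

# Bins are right-inclusive by default: (0, 18], (18, 65], (65, 120]
people["age_group"] = pd.cut(
    people["age"],
    bins=[0, 18, 65, 120],
    labels=["child", "adult", "senior"],
)
```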
Groupby / aggregate
We added new selection options. Apply groupby functions on all columns, on all columns matching a data type or on all columns matching a regular expression.
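One way to express these selection options in pandas is sketched below. The column names are assumptions, and the exported code may differ.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "team": ["a", "a", "b"],
        "sales_q1": [1, 2, 3],
        "sales_q2": [4, 5, 6],
        "note": ["x", "y", "z"],
    }
)

# All columns matching a data type: here, every numeric column.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
by_dtype = df.groupby("team")[numeric_cols].sum()

# All columns matching a regular expression: here, names starting with "sales_".
regex_cols = df.filter(regex=r"^sales_").columns.tolist()
by_regex = df.groupby("team")[regex_cols].mean()
```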
We enlarged the code box in the "Add Python code" transformation so that you can add larger snippets more easily
Create Pivot Tables📑
Create full-fledged pivot tables, including code export.
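For reference, a pivot table in plain pandas looks roughly like this. The data and column names are invented.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "region": ["north", "north", "south"],
        "product": ["x", "y", "x"],
        "sales": [1, 2, 3],
    }
)

# Rows per region, columns per product, cells hold the summed sales.
pivot = df.pivot_table(
    index="region",
    columns="product",
    values="sales",
    aggfunc="sum",
    fill_value=0,  # combinations without data show 0 instead of NaN
)
```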
Create multiple aggregations for one column faster than ever 💪
Apply multiple aggregation functions on a column quickly through a multi-select dropdown
Support for categorical dtype
Convert string columns into categorical ones to save memory.
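A quick way to see the memory effect yourself, as a sketch: the actual saving depends on how many distinct values the column has.

```python
import pandas as pd

colors = pd.Series(["red", "green", "blue"] * 10_000)
as_category = colors.astype("category")

# deep=True also counts the memory of the string values themselves.
bytes_before = colors.memory_usage(deep=True)
bytes_after = as_category.memory_usage(deep=True)
```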
Rename columns during groupby / aggregate
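In pandas (0.25 or newer), renaming during an aggregation can be written with named aggregation. The column and result names below are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"team": ["a", "a", "b"], "sales": [1, 2, 3]})

# Each keyword is the new column name; the tuple is (source column, function).
summary = (
    df.groupby("team")
    .agg(total_sales=("sales", "sum"), average_sales=("sales", "mean"))
    .reset_index()
)
```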
Quick edit the last transformation from the main control panel
We added a new plot creator that lets you quickly build interactive plotly express graphs. You can also export the code for further customization. The creator works with the most important plot figure types. In case you are missing a specific figure type, please let us know.
Public license file for automatic activation
We provide a public license file under ~/.bamboolib/LICENSE that can be used for automatic license activation in docker or on other computers / VMs
We replaced the close, back, and delete buttons with icons, among other changes
Added a README to pypi.org. It was about time! 🙂
Keyboard support - Part 2⌨
In order to further improve the user experience when working with the keyboard, we replaced all transformation buttons from the main panel with one text input field, which means that all transformations are now fully searchable via the keyboard.
Better experience when working with the keyboard
When starting a transformation, the first input is always focused. Also, users can select values from the dropdown using tab.
Live code export
We made improvements on the user experience based on user feedback.
More plugin examples
We added more examples to our plugin docs, e.g. a plugin example showing how to write a custom aggregation function and a plugin showing how you can extract attributes from time delta columns. (If you want these plugins to become part of our core functionality, please send us an email.)
If the user selects a key for df_left and there is a key with the same name in df_right, we will automatically select that key. Also, the exported code simplifies when a merge with same keys has been carried out.
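To illustrate the simpler form: when both dataframes share the key name, a pandas merge only needs a single `on` argument. The frames below are made up, and the exact exported code is an assumption.

```python
import pandas as pd

df_left = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})
df_right = pd.DataFrame({"id": [2, 3], "score": [5, 6]})

# Same key name on both sides, so on="id" replaces left_on="id", right_on="id".
merged = pd.merge(df_left, df_right, how="inner", on="id")
```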
Text transformations (such as "filter" and "replace values") support case-sensitivity and regular expressions now.
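In pandas, case-sensitivity and regular expressions are exposed through the `str` accessor. This is a sketch with an invented column; the patterns are only examples.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Anna", "anna-lena", "Bob"]})

# Filter: case-insensitive regular expression match at the start of the text.
starts_with_anna = df["name"].str.contains(r"^anna", case=False, regex=True)
filtered = df[starts_with_anna]

# Replace values: remove everything from the first hyphen onwards.
cleaned = df["name"].str.replace(r"-.*$", "", regex=True)
```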
"rename", "string manipulations", and "extract datetime attributes" are top-level transformations now and available via search.
After importing bamboolib, you don't need to call bam.enable() anymore in order to show bamboolib when printing a dataframe
We removed the normalization step which made sure that the dataframe index is always a RangeIndex.
Keyboard support ⌨
Transformations can now be created by typing on the keyboard, including auto-completion at every step. Using the mouse is still possible, of course.
Plugin - beta🔌
We started implementing a plugin architecture. Starting with this release, you can write your own custom transformations. Please note that we are currently in beta mode, so the API may change over time.
Fixed an issue that caused an error when filtering values in a column that contains NAs
Fixed some CSS specificity issues in JupyterLab
Fixed an issue in JupyterLab that made bamboolib multi-select dropdowns not work properly
Some users couldn't use bamboolib for free on Kaggle. We fixed that, because we love our users :)
Rename transformation results🔤
You can now rename the dataframe after a transformation (e.g. name the result of a filter "df_subset")
Edit last transformation ✏
You can edit the last transformation when looking at the history of your transformations.
Increase memory efficiency with numpy dtypes 💾
New support for numpy data types (e.g. int8, int16, ...) so that you can reduce the memory footprint of your dataframes
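As a sketch of the memory effect: a column of small integers stored as int64 uses eight bytes per value, while int8 needs one. The column name and data are made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"rating": np.arange(1000, dtype="int64") % 5})

# Values 0..4 easily fit into int8 (range -128..127).
small = df["rating"].astype("int8")

bytes_int64 = df["rating"].memory_usage(index=False)
bytes_int8 = small.memory_usage(index=False)
```

Only downcast when you are sure that all current and future values fit into the smaller type.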
Live code export now also works on Firefox
We now support JupyterLab
Fixed an issue that made the Copy-Code-Button not work
Changed the styling of our buttons