New plugin: compute the cumulative sum of a column
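The plugin's own code is not part of these notes. In plain pandas, a cumulative sum is a one-liner; the following is a minimal sketch with a made-up "sales" column:

```python
import pandas as pd

df = pd.DataFrame({"sales": [10, 20, 5, 15]})

# Running total: each row holds the sum of all values up to and including that row
df["sales_cumsum"] = df["sales"].cumsum()
```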
Internal features for enterprise customers
Fix description of rename column(s) transformation
Fix bug that caused groupby/aggregate to fail
JupyterLab 3.0+ support 🎉
Resize the width of a column by double-clicking on the right edge of its column header
You can also resize all columns at the same time by clicking in the dataset and typing
After filtering your data, you will get some post-transformation insights that show how much of the data has been filtered.
Python 3.9 support: We now support Python 3.9 (well, that was about time! ;))
Export transformation descriptions: for each piece of pandas code we write for you, we also add a short comment line describing what the code does in plain English. You can enable that feature via the config by running
bam.set_option("global.export_transformation_descriptions", True) in your Jupyter cell.
Users can now request features from within bamboolib!
Improve plugins API
Add a human-readable error message when no aggregation is specified in the Groupby transformation
Fix warning messages created by sklearn
Fix bug about guessing the dataframe name
We are really excited to announce that bamboolib is ready for the general public after months of intense user feedback and production iterations with our private beta customers.
Our mission is to empower every professional to perform Data Analysis in Python - without the burden of becoming a programmer or the hassle of googling syntax all day long. The keystone of this mission is the bamboolib GUI that exports Python code, similar to recording macros in Excel.
But that's not all. In many companies, there are routine analyses that are highly specific to the company and that have to be performed over and over again. Previously, such analyses were often stored as internal Python libraries, created by and accessible only to programmers. With bamboolib, those features can be quickly added to the GUI via simple Python plugins. Our existing customers use this to access internal databases, perform data cleaning or even create complex machine learning models. All of this is accessible to everyone in the company, which removes the programmer bottleneck.
This frees developers from performing routine ETL tasks or creating standard plots for their colleagues and enables the colleagues to access the information they need whenever they need it in a self-service way.
But we are not only serving companies. There are millions of professionals, learners, students, and hobbyists who want to access the power of Python. Unfortunately, many of them have been locked out by the steep learning curve of becoming a programmer. For all those people, we are offering our free community version that makes it easy to analyze their data and create reproducible reports. It gives them the tools they need to bridge the gap to the fantastic world of open-source software.
We love the open-source ecosystem, and everything we do is targeted towards making it stronger and more accessible. We do this both by open-sourcing as many of our internal projects as we can (e.g. ppscore and pyforest) and, mainly, by building software that enhances the impact of open-source. There is still a long way to go, but we will not stop until everyone can access the full power of open-source software.
Thank you for all your support! For further information, please visit our website.
New plot 📊 - Treemap
Create a copy of your data 💾
Sometimes, you want to keep a copy of the original data before you start transforming it.
Create a copy of your column 💾
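In pandas terms, copying the whole dataframe and copying a single column look roughly like the sketch below. The names are illustrative, and the code bamboolib exports may differ:

```python
import pandas as pd

df = pd.DataFrame({"price": [1.5, 2.0, 3.25]})

# Independent copy of the whole dataframe: later changes to df do not affect df_original
df_original = df.copy()

# Copy of a single column under a new name
df["price_copy"] = df["price"]

# Transforming the original column leaves both copies untouched
df["price"] = df["price"] * 2
```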
"Explore data" also supports columns that contain list objects and dictionary objects.
Bulk Change Data Types ◼
Change the data type of multiple columns in one go.
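Changing several dtypes in one go maps naturally onto DataFrame.astype with a {column: dtype} mapping. The sketch below uses made-up column names and does not claim to be bamboolib's exported code:

```python
import pandas as pd

df = pd.DataFrame({"a": ["1", "2"], "b": ["3", "4"], "c": ["x", "y"]})

# One call converts every listed column; unlisted columns keep their dtype
df = df.astype({"a": "int64", "b": "float64"})
```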
All dropdowns can handle very long column names
Sort x-axis by category name in histogram and bar chart.
Make x-axis categorical (handy when you want to make the bars in a histogram have some space in between).
Rename the x-axis. You can do that by searching "Rename column label" when adding a property to the Plot Creator.
Renamed the "Find and Replace" transformations so that they are easier to find and distinguish
Fixed a bug that caused bi-variate plots to crash when missing values needed to be dropped
Make ppscore handle Int8
Sometimes, the scrollbar hid the last row in the interactive data table view. We made sure that users can always see the full table.
Read Excel Files 📑
import bamboolib as bam
You can now also read in Excel spreadsheets
Major PlotCreator Upgrade 🎉
Over the last months, we've developed a new plotting interface that makes the PlotCreator much more powerful. You can now change almost every bit of your plotly plots, including the color theme, the legend title, axis names, and tick distance.
New plot 📊- Candlestick chart
New transformations 🧹
Drop a column if it contains at least one missing value or if it contains only missing values
Calculate the cumulative product of values in a column
Calculate the percentage change of values in a column
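The three new transformations correspond to well-known pandas calls. This is a minimal sketch with hypothetical columns, not necessarily the exact code bamboolib generates:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "price": [100.0, 110.0, 99.0],
    "some_missing": [1.0, np.nan, 3.0],
    "all_missing": [np.nan, np.nan, np.nan],
})

# Drop columns that contain at least one missing value ...
any_na_dropped = df.dropna(axis=1, how="any")
# ... or only columns that consist entirely of missing values
all_na_dropped = df.dropna(axis=1, how="all")

# Cumulative product and percentage change relative to the previous row
df["price_cumprod"] = df["price"].cumprod()
df["price_pct_change"] = df["price"].pct_change()
```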
We improved the user experience for all text transformations. You can now search for them directly in the search bar (just type "text" to see all text transformations). Also, most text transformations now allow selecting multiple columns and renaming the result for a significant speedup.
Internal developments for our corporate clients and improvements to our architecture.
Fixed a bug that caused an error when loading in CSV files on Windows
Fixed a bug where plotly wasn't enabled inside the GUI after installing bamboolib.
Clean column names of your dataframe 🧹
Do you have messy column names (blanks and punctuation everywhere) and want to keep them in one clean format? "Clean column names" transforms your column names into one clean format.
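These notes don't say which "clean format" bamboolib uses. Assuming a lowercase snake_case target, a hypothetical cleaning rule could look like this:

```python
import re

import pandas as pd


def clean_column_name(name: str) -> str:
    """Hypothetical rule: lowercase, runs of non-alphanumerics become "_", trim "_"."""
    name = re.sub(r"[^0-9a-zA-Z]+", "_", str(name).strip().lower())
    return name.strip("_")


df = pd.DataFrame(columns=["First Name", "  Age (years) ", "E-Mail!"])
df.columns = [clean_column_name(c) for c in df.columns]
```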
Load Plugin support 💾
Write loader plugins yourself and add them to bamboolib so that you can read in any data source you need. There are no limits: read in an Excel file or distributed data from a Hadoop cluster. If you can express it in Python, you can call it via the bamboolib GUI.
View Plugin support 📊
Users can now write plugins that display something to the end user without altering the underlying dataframe. This can be used for creating company-specific clustering plots or for running statistical tests and displaying their results as tables.
We improved the layout of our explore dataframe features to make it look cleaner.
Fixed a bug that caused the univariate summary to throw an "unknown semantic datatype" error when faced with a boolean column that only has one unique value.
Data Loader - Read in CSVs 📂
Read in CSV files using bamboolib without having to write any code
Added a button that links to the plugin docs.
Check if the resulting dataframe name of a transformation is a valid Python variable name and raise a human-readable error if it isn't.
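A check like this can be built from the standard library alone. The helper name below is hypothetical; it is a sketch of the idea, not bamboolib's implementation:

```python
import keyword


def is_valid_variable_name(name: str) -> bool:
    # str.isidentifier() checks Python's identifier syntax; keywords such as
    # "class" pass that check but still cannot be used as variable names
    return name.isidentifier() and not keyword.iskeyword(name)
```

For example, "df_subset" passes, while "1df", "my df" and "class" are rejected.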
Fix bugs in bivariate plots that led to errors, e.g. when columns contained missing values
We just couldn't hold back and had to crank out another feature this week!
With the Bivariate Time Series Plot, you can quickly plot your variables across time and choose any granularity level you want. Inspect your average sales by year, quarter, month, etc.
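The aggregation behind such a plot can be expressed in pandas by grouping on a period of the datetime column. This is a sketch with made-up data; the plot itself is omitted:

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-15", "2020-02-10", "2020-04-05", "2020-05-20"]),
    "sales": [100, 200, 50, 150],
})

# Average sales per quarter; use "Y" or "M" instead of "Q" for another granularity
avg_by_quarter = df.groupby(df["date"].dt.to_period("Q"))["sales"].mean()
```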
Change datetime frequency ⏱
Either expand your time series column and fill it with values (grid expand) or do a groupby and calculate aggregations. This corresponds to pandas'
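Both modes can be illustrated with pandas' resample on a DatetimeIndex. The sketch below is independent of the exact call bamboolib exports and uses made-up data:

```python
import pandas as pd

df = pd.DataFrame(
    {"sales": [10, 20, 30]},
    index=pd.to_datetime(["2020-01-01", "2020-01-03", "2020-01-04"]),
)

# Grid expand: insert the missing day and forward-fill its value
expanded = df.resample("D").ffill()

# Aggregation: collapse the rows into 2-day buckets and sum each bucket
aggregated = df.resample("2D").sum()
```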
Univariate Time series summary - quality bar and new summary plot 📊
We added a data quality bar to the univariate summary of datetime columns and created a new plot which allows you to see the value counts for different aggregation levels of your time series (e.g. day, month, or year).
Do you have text data that you want to transform into numeric labels? The LabelEncoder lets you do exactly that!
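scikit-learn's LabelEncoder assigns each distinct value an integer, with the classes in sorted order. A pandas-only sketch that yields the same kind of mapping (an analogous approach, not necessarily what bamboolib exports) uses categorical codes:

```python
import pandas as pd

df = pd.DataFrame({"city": ["Berlin", "Paris", "Berlin", "Amsterdam"]})

# Categories are the sorted unique values; .codes gives each row's integer label
categorical = pd.Categorical(df["city"])
df["city_label"] = categorical.codes
```

Here "Amsterdam", "Berlin" and "Paris" become 0, 1 and 2.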
PlotCreator - Add multiple y's to your plots 📈
Select multiple y-axes for your plots.
In line plots, plotly draws lines by connecting the observations in the order in which they appear in the dataframe, from top to bottom. This plotly feature allows you to draw circles or other geometric shapes. However, when working with bamboolib, users usually just want a time-series-like plot. We now ensure that by sorting the x-axis.
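The fix boils down to sorting by the x column before plotting. A minimal sketch with hypothetical columns (the plotting call is only mentioned in a comment):

```python
import pandas as pd

df = pd.DataFrame({"day": [3, 1, 2], "value": [30, 10, 20]})

# plotly connects line points in row order, so sort by the x column first;
# e.g. plotly.express.line(df_sorted, x="day", y="value") then draws left to right
df_sorted = df.sort_values("day")
```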
With pandas 1.0+ came the new NA type for missing values. We fixed a bug that caused the PlotCreator to throw an error when confronted with this new type.
Select Columns for preview ➡ Support 10,000+ columns dataframes 💪
Show a preview of your data on a selected subsample of your columns for more convenience and better performance when working with very (VERY) wide dataframes. This feature is also accessible via the glimpse, predictor patterns and correlation matrix.
Sampling rows on large data 🧹
When working with more than 100k rows, some interactive plots can become quite slow. In order to save you (and your computer), we randomly sample 100k rows if your data exceeds the row limit. But don't be afraid! With a click of a button, you can turn the sampling off.
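The sampling rule described above can be sketched as follows. The helper name and the fixed seed are assumptions for illustration, not bamboolib internals:

```python
import pandas as pd

ROW_LIMIT = 100_000


def sample_for_plotting(df: pd.DataFrame, limit: int = ROW_LIMIT, seed: int = 0) -> pd.DataFrame:
    # Hypothetical helper: only sample when the data exceeds the limit
    if len(df) > limit:
        return df.sample(n=limit, random_state=seed)
    return df


big = pd.DataFrame({"x": range(250_000)})
small = pd.DataFrame({"x": range(10)})
```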
Having a history of your transformations is handy, but it comes at the cost of more memory. If you work with large datasets, you can save memory by limiting how many steps you can go back in the history or by turning off the history altogether (the latter with
Fixed a pandas 1.0 regression in "extract datetime attributes".
Combine Dataframes ➡⬅
Choose to stack dataframes on top of each other or to combine them side-by-side
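Both options correspond to pd.concat along different axes. A minimal sketch with made-up frames:

```python
import pandas as pd

top = pd.DataFrame({"a": [1, 2]})
bottom = pd.DataFrame({"a": [3]})
right = pd.DataFrame({"b": ["x", "y"]})

# Stack on top of each other (more rows) and renumber the index
stacked = pd.concat([top, bottom], axis=0, ignore_index=True)

# Combine side by side (more columns), aligned on the index
side_by_side = pd.concat([top, right], axis=1)
```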
Fixed a pandas 1.0 regression problem that caused numeric to numeric plots to break
This release comes with two major supports 🥳
Support pandas 1.0+🐼
Support JupyterLab 2.0+ 🛰
Also, we show more human-friendly errors in the 'Join' transformation and the 'PivotTable' feature.
This release makes two small adjustments:
Show human-friendly errors in the 'New Column Formula' transformation when the inputs are empty.
Adjust the code to prevent an error when the 'Pivot Table' returns a pandas.Series and not a pandas.DataFrame.
In this release, we fixed a couple of bugs and prepared the infrastructure for some of the advanced enterprise features like custom plugins.
Interactive Histogram 2.0 🏋
We released an improved version of our interactive histogram in the Explorer. Zoom into your data with drag & drop or set ranges using the input fields, undo the last zooms or reset the zoom altogether, and get the code of the histogram!
Search on Tab 🔎
When you open bamboolib, you can directly focus the "Search transformations" search bar by hitting Tab.
Drop duplicates 🧹
Do you have any duplicates in your data set? Simply drop them and decide whether you want to keep the first or last of the duplicates in your data.
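In pandas, this is drop_duplicates with its keep parameter. The subset argument (an assumption here) restricts the duplicate check to selected columns; without it, whole rows are compared:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 1, 2], "value": ["first", "second", "only"]})

# Keep the first or the last occurrence of each duplicated id
keep_first = df.drop_duplicates(subset=["id"], keep="first")
keep_last = df.drop_duplicates(subset=["id"], keep="last")
```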
bamboolib also normalizes dataframes when the index column is unnamed.
Fixed a bug that made scatter plots not appear in the Explorer
Fixed a bug that made "Create New Column Formula" not work when column names are substrings of other column names.
With this release, we also did a lot of house-keeping.
Meet our new glimpse, the main entry point to the Explore DataFrame feature. The new glimpse loads asymptotically and asynchronously, which comes with one huge advantage: no matter how large your data, you will get immediate visual feedback on your columns. If your data is large, the glimpse will first draw a random sample of your data and then update itself on the full data in the background.
Also, you can dive into the univariate summary of each column directly from the new glimpse.
bamboolib shows an update notice when a new version of bamboolib is available
All function hints under Explore DataFrame match the name of the dataframe under use
Fixed a bug which made the Statistics table disappear when looking at numeric column summaries
Fixed a bug that made a text box lose focus in the transformation Add Python Code
Introducing the new tab system 🥳
Users can now work inside the Plot creator, the Explorer and the Wrangler in parallel! If you click on the Plot creator or Data explorer, a new tab will open, so you can easily move between the wrangler and chart tabs. When you transform the data, the plots in the other tabs automatically adjust!
We are already using the new tab system inside the Explorer as well.
More Univariate summary info on column click 🔎
When clicking on a column name, the user will directly be navigated to a full univariate summary tab
Support nullable Integers
Ever found yourself in that weird situation where you wanted to convert a float column to integer and nothing happens when you run the code? In most cases, that's because there are NaNs in the column you want to convert. When converting to int, bamboolib makes sure that the conversion really happens by using pandas' nullable integer type.
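The underlying pandas mechanism is the nullable "Int64" dtype (capital I), which can hold missing values. A minimal sketch:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# s.astype("int64") raises, because NaN has no NumPy integer representation;
# the nullable "Int64" dtype stores the gap as <NA> instead
nullable = s.astype("Int64")
```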
Resample dataframe based on datetime column 📅
We enable users to change the frequency of time series data and to aggregate data based on datetime columns
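When the timestamps live in a regular column rather than the index, pandas' resample accepts an on= argument. A sketch with hypothetical data ("MS" means month start):

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-05", "2020-01-20", "2020-02-03"]),
    "amount": [5, 15, 10],
})

# Aggregate per month, using the "date" column instead of the index
monthly = df.resample("MS", on="date")["amount"].sum()
```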
Improve Live Code Export experience 🖥
We made the live code export look more like code the user actually typed. Among other things, we removed the bamboolib code export comment.
Specify string format when converting datetime to string 📅
When converting a datetime to string, you can now specify the format in which the strings are displayed in your data, e.g. Jan 1, 2020.
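In pandas, this corresponds to Series.dt.strftime with a format string. Note that %d zero-pads the day; dropping the padding ("%-d") works with the glibc and macOS strftime but not on Windows:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2020-01-01", "2020-12-24"]))

# %b = abbreviated month name, %d = zero-padded day, %Y = four-digit year
as_text = dates.dt.strftime("%b %d, %Y")
```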
All datatype conversion features can directly be accessed via the search bar ⌚
For example, simply type "to datetime" in the main search bar if you want to convert a column to datetime
Fast entering bamboolib
After a user has clicked on the bamboolib UI button to enter the user interface, she doesn't have to click on it anymore when she inspects a dataframe. This behaviour resets on a kernel restart.
Fixed a bug so that the "Pivot Table" feature can do a count of values
Filter on value in other column
When working with numeric columns, you can specify any valid Python expression to filter rows. That also includes referring to values in other columns. And of course, we provide column auto-completion in the filter. If you want that feature for other column types too, please let us know!
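Comparing one column against another is a plain boolean mask in pandas; DataFrame.query offers the same filter as an expression string. A sketch with made-up columns:

```python
import pandas as pd

df = pd.DataFrame({"revenue": [100, 50, 80], "cost": [60, 70, 80]})

# Keep rows where one column's value exceeds another column's value
profitable = df[df["revenue"] > df["cost"]]

# The same filter written as an expression string
profitable_query = df.query("revenue > cost")
```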
Convert invalid input in number column to NaN
Have numbers stored as strings in your pandas DataFrame? When coercing them to numeric datatypes, bamboolib will automatically convert any invalid input to NaN.
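The pandas counterpart of this behavior is pd.to_numeric with errors="coerce". A minimal sketch:

```python
import pandas as pd

s = pd.Series(["1", "2.5", "n/a", "3"])

# errors="coerce" turns anything that cannot be parsed into NaN instead of raising
numbers = pd.to_numeric(s, errors="coerce")
```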
Impute missing values 🖌
Replace missing values with fixed values or impute them with the mean, median, or mode of a column. You can also forward fill or backward fill missing values!
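Each imputation strategy maps onto a short pandas call. This sketch uses a toy Series and is not necessarily the code bamboolib exports:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan])

filled_fixed = s.fillna(0)
filled_mean = s.fillna(s.mean())      # mean of the non-missing values
filled_median = s.fillna(s.median())
filled_mode = s.fillna(s.mode()[0])   # mode() may return several values; take the first
forward = s.ffill()                   # carry the last valid value forward
backward = s.bfill()                  # use the next valid value; a trailing NaN stays
```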
Rename column after changing dtype
When changing the datatype of a column, you can directly rename it. Also, when changing the datatype of a column to datetime, we provide examples of format strings so that you can more easily decide how you want to format your datetime column.
Fixed a bug in JupyterLab where buttons in the "Explore DataFrame" ~ columns tab did not show the whole column name when the name was too long.
Python 3.8 support 🥳
Python 3.8 support is finally there! And in the future, new versions of Python will be supported much quicker.
Column formula improved
We redesigned the column formula feature to make it easier to use.
Data wrangling speed improvement on large datasets 🚀
On an i7 MacBook with 16 GB RAM, wrangling datasets with more than 5 million rows is now possible without any lag.
The dimensions of the dataframe are displayed in a human-readable fashion
We added descriptions to the dropdown elements in our wrangler ("Search transformations" dropdown) and plot creator
Drop missing values ✂
With the new transformation "Dropping missing (NA) values", you can quickly drop all missing values in one or multiple columns
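In pandas, restricting the missing-value check to selected columns is done with dropna(subset=...). A sketch with hypothetical columns:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, np.nan, 3], "b": [np.nan, 2, 3]})

# Drop rows that have a missing value in any of the selected columns
only_a = df.dropna(subset=["a"])
a_and_b = df.dropna(subset=["a", "b"])
```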
Reorder columns 🔁
Move columns to the front or back of your DataFrame or place them wherever you want.
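Reordering columns in pandas amounts to selecting them in the desired order. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({"a": [1], "b": [2], "c": [3], "d": [4]})

# Move "c" to the front, keeping the relative order of the other columns
front = df[["c"] + [col for col in df.columns if col != "c"]]

# Move "a" to the back
back = df[[col for col in df.columns if col != "a"] + ["a"]]
```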
Categorise numeric values with our new binning feature
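Binning in pandas is typically done with pd.cut. The bin edges and labels below are made up for illustration:

```python
import pandas as pd

ages = pd.Series([5, 17, 30, 65])

# Bins are (0, 18], (18, 64] and (64, 120]; labels name each bin
age_groups = pd.cut(ages, bins=[0, 18, 64, 120], labels=["minor", "adult", "senior"])
```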
Groupby / aggregate
We added new selection options. Apply groupby functions on all columns, on all columns matching a data type or on all columns matching a regular expression.
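Selecting columns by data type or by regular expression before aggregating can be sketched in pandas with select_dtypes and filter(regex=...). Column names are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    "store": ["A", "A", "B"],
    "sales_2020": [1, 2, 3],
    "sales_2021": [4, 5, 6],
    "manager": ["x", "y", "z"],
})
grouped = df.groupby("store")

# All numeric columns
by_dtype = grouped[df.select_dtypes("number").columns.tolist()].sum()

# All columns whose name matches a regular expression
by_regex = grouped[df.filter(regex=r"^sales_").columns.tolist()].sum()
```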
We enlarged the code box in the "Add Python code" transformation so that you can add larger snippets more easily
Create Pivot Tables📑
Create full-fledged pivot tables, including code export.
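The exported code for a pivot table will usually revolve around DataFrame.pivot_table. A sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "product": ["A", "B", "A", "A"],
    "sales": [10, 20, 30, 40],
})

# Rows = region, columns = product, cells = summed sales (0 where there is no data)
pivot = df.pivot_table(index="region", columns="product", values="sales",
                       aggfunc="sum", fill_value=0)
```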
Create multiple aggregations for one column faster than ever 💪
Apply multiple aggregation functions on a column quickly through a multi-select dropdown
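In pandas, several aggregations on one column can be requested by passing a list to .agg. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({"store": ["A", "B", "A"], "sales": [10, 20, 30]})

# One output column per aggregation function
summary = df.groupby("store")["sales"].agg(["sum", "mean", "max"])
```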
Support for categorical dtype
Convert string columns into categorical ones to save memory.
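The memory saving comes from storing each distinct string once and keeping only a small integer code per row. A quick pandas check on synthetic data:

```python
import pandas as pd

s = pd.Series(["red", "green", "blue"] * 10_000)
as_category = s.astype("category")

# deep=True counts the string objects themselves, not just the pointers
memory_object = s.memory_usage(deep=True)
memory_category = as_category.memory_usage(deep=True)
```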
Rename columns during groupby / aggregate
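pandas supports this directly via named aggregation, where each keyword becomes the output column name. A sketch with hypothetical names:

```python
import pandas as pd

df = pd.DataFrame({"store": ["A", "A", "B"], "sales": [10, 30, 20]})

# Named aggregation: new_column_name=(source_column, aggregation)
summary = df.groupby("store").agg(
    total_sales=("sales", "sum"),
    avg_sales=("sales", "mean"),
)
```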
Quick edit the last transformation from the main control panel
We added a new plot creator that lets you quickly build interactive plotly express graphs. You can also export the code for further customization. The creator works with the most important plot figure types. In case you are missing a specific figure type, please let us know.
Public license file for automatic activation
We provide a public license file under ~/.bamboolib/LICENSE that can be used for automatic license activation in Docker or on other computers / VMs
We replaced the close, back, and delete buttons with icons, among others
Added a README to pypi.org. It was about time! 🙂
Keyboard support - Part 2 ⌨
To further improve the user experience when working with the keyboard, we replaced all transformation buttons in the main panel with one text input field, which means that all transformations are now fully searchable via the keyboard.
Better experience when working with the keyboard
When starting a transformation, the first input is always focused. Also, users can select values from the dropdown using tab.
Live code export
We made improvements on the user experience based on user feedback.
More plugin examples
We added more examples to our plugin docs, e.g. a plugin example for how to write a custom aggregation function and a plugin showing how you can extract attributes from time delta columns (if you want these plugins to become part of our core functionalities, please send us an email).
If the user selects a key for df_left and there is a key with the same name in df_right, we will automatically select that key. Also, the exported code is simpler when the merge uses keys with the same name.
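The simplification likely comes from pandas' on= shortcut: identically named keys need only on=, whereas differently named keys require left_on and right_on. A sketch with made-up frames:

```python
import pandas as pd

df_left = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
df_right = pd.DataFrame({"customer_id": [1, 3], "orders": [5, 7]})

# Same key name in both frames, so on= is enough and a single key column remains
merged = df_left.merge(df_right, on="customer_id", how="inner")
```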
Text transformations (such as "filter" and "replace values") support case-sensitivity and regular expressions now.
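In pandas, the string accessor exposes both options through the case= and regex= parameters. A sketch on a toy Series:

```python
import pandas as pd

s = pd.Series(["Apple pie", "apple juice", "Banana", "pineapple"])

# Case-insensitive substring filter
contains_apple = s[s.str.contains("apple", case=False)]

# Regular expression filter: values that start with "apple", ignoring case
starts_with_apple = s[s.str.contains(r"^apple", case=False, regex=True)]

# Regex-based replacement (case-sensitive unless the pattern covers both cases)
replaced = s.str.replace(r"[Aa]pple", "Pear", regex=True)
```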
"rename", "string manipulations", and "extract datetime attributes" are top-level transformations now and available via search.
After importing bamboolib, you no longer need to call bam.enable() in order to show bamboolib when printing a dataframe
We removed the normalization step which made sure that the dataframe index is always a RangeIndex.
Keyboard support ⌨
Transformations can now be created by typing on the keyboard, including auto-completion at every step. Using the mouse is, of course, still possible.
Plugin - beta 🔌
We started implementing a plugin architecture. Starting with this release, you can write your own custom transformations. Please note that we are currently in beta, so the API may change over time.
Fixed an issue that caused an error when filtering values in a column that contains NAs
Fixed some CSS specificity issues in JupyterLab
Fixed an issue in JupyterLab that made bamboolib multi-select dropdowns not work properly
Some users couldn't use bamboolib for free on Kaggle. We fixed that, because we love our users :)
Rename transformation results 🔤
You can now rename the dataframe after a transformation (e.g. name the result of a filter "df_subset")
Edit last transformation ✏
You can edit the last transformation when looking at the history of your transformations.
Increase memory efficiency with numpy dtypes 💾
New support for numpy data types (e.g. int8, int16, ...) so that you can reduce the memory footprint of your dataframes
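The saving is easy to see in pandas: int8 stores one byte per value instead of eight for the default int64, as long as the values fit into -128..127. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({"small_numbers": [1, 2, 3, 120]})
before = df["small_numbers"].memory_usage(index=False)

# int8 covers -128 to 127, so every value here fits
df["small_numbers"] = df["small_numbers"].astype("int8")
after = df["small_numbers"].memory_usage(index=False)
```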
Live code export now also works on Firefox
We now support JupyterLab
Fixed an issue that made the Copy-Code-Button not work
Changed the styling of our buttons