I often work with data that is too big to fit into memory, which is a problem when a calculation, such as an average, needs all of the data at once. I run into this often when working with large ensemble models, which can span tens or sometimes hundreds of gigabytes. One tool that has greatly improved my productivity and allowed me to work efficiently with such large datasets in Python is Dask.
This article is meant to showcase how Dask can be used to improve scalability by “chunking” your data before analyzing it. …
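The core idea, which Dask automates and parallelizes, can be sketched in plain NumPy: stream the data through in chunks and keep only running totals, so the full array never has to sit in memory at once. The chunk sizes and values below are made up purely for illustration.

```python
import numpy as np

def chunked_mean(arrays):
    """Running mean over an iterable of chunks, never holding all data at once."""
    total, count = 0.0, 0
    for chunk in arrays:
        total += chunk.sum()
        count += chunk.size
    return total / count

# Simulate a dataset delivered in pieces (e.g. one file per chunk).
chunks = (np.full(1000, i, dtype=float) for i in range(4))  # values 0, 1, 2, 3
print(chunked_mean(chunks))  # 1.5
```

Dask generalizes exactly this pattern: it splits an array into chunks, builds a lazy task graph of per-chunk reductions, and only pulls each chunk into memory when `.compute()` is called.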
Zenodo is a free platform that allows anyone to upload and store data, making it searchable and providing a digital object identifier (DOI). Uploading small files from your local computer is easy with the drag-and-drop feature on the webpage. But if the files are large and stored on a remote computer, then you need to use the Zenodo REST API.
This post describes how to interact with your Zenodo account using the API via Python so you can create projects and…
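As a sketch of what the post covers, creating an empty deposition (a project awaiting files and metadata) via the REST API might look like the following. The token is a placeholder you would generate under your Zenodo account settings, and this sketch uses only the standard library; the post itself may well use the `requests` package instead.

```python
import json
import urllib.request

def create_deposition(token, base_url="https://zenodo.org/api"):
    """POST an empty deposition to Zenodo and return the parsed JSON record."""
    req = urllib.request.Request(
        f"{base_url}/deposit/depositions?access_token={token}",
        data=b"{}",  # empty metadata; can be filled in with a later PUT
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# deposition = create_deposition("your-personal-access-token")
# deposition["id"] is what subsequent file uploads and publishing reference
```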
In my day-to-day data work, I routinely find myself running a lot of
for loops. These can take minutes to complete, which isn’t necessarily a long time, but many of these loops are embarrassingly parallel. We can do better.
In this article, I will discuss how to make more efficient use of your time when working in Python. Whether you work on a laptop or a high-performance computer (HPC), you can speed up your workflow by taking full advantage of all the computing power available to you. This can be achieved with the
Dask and Dask-jobqueue libraries. …
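The pattern is easy to sketch with the standard library alone: farm independent loop iterations out to worker processes. Dask and Dask-jobqueue apply the same idea but can scale it across the nodes of an HPC cluster. `slow_task` here is a hypothetical stand-in for a real, expensive loop body.

```python
from concurrent.futures import ProcessPoolExecutor

def slow_task(i):
    # Stand-in for one iteration of an expensive, independent loop body.
    return i * i

if __name__ == "__main__":
    # Instead of `results = [slow_task(i) for i in range(8)]`, run the
    # iterations concurrently across all available cores.
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(slow_task, range(8)))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

Because each iteration is independent, no coordination is needed between workers — the defining property of an embarrassingly parallel problem.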
Homo sapiens have been apex predators since the beginning of time. This is because of our species’ uncanny ability to adapt to and overcome any challenge thrown at us. Adaptation and survival of the fittest are the cornerstones of neo-Darwinism, determining whether a species survives or goes extinct. In the case of Homo sapiens: thriving.
We are all accustomed to adaptations and transitions in the way we live our lives. My first bank account came with blank checks, later when I was in college it came with a debit card, and now I rarely use either of these and just tap…
JupyterLab is a web-based interactive development environment (IDE) for Jupyter notebooks, code, and data. Running JupyterLab on a laptop is straightforward. You simply type
jupyter lab into your terminal and JupyterLab will open in your browser, with the notebook server running in your terminal.
But what if you want to run JupyterLab on a remote computer, and how do you connect to it once it is running there? This post will walk you through launching JupyterLab on a SLURM cluster and then connecting from a browser running on your local machine.
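A minimal sketch of the workflow the post walks through — the partition name, time limit, hostnames, and port are all placeholders you would replace with your site's values:

```shell
# On the cluster: grab an interactive compute node, then start JupyterLab
# headless so it only prints a URL and token instead of opening a browser.
srun --partition=compute --time=02:00:00 --pty bash
jupyter lab --no-browser --ip=0.0.0.0 --port=8888

# On your laptop: tunnel the port through the login node, then browse to
# http://localhost:8888 and paste the token JupyterLab printed above.
ssh -N -L 8888:<compute-node>:8888 user@cluster.example.edu
```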
Miniconda is a minimal conda installer that includes only Python, conda, and their dependencies. Miniconda lets you build customized environments by installing any of the thousands of packages available through conda channels. You can create a separate environment for each project and easily share it with others for reproducibility. With conda, you can create an environment on your local machine and seamlessly copy it to a large shared high-performance computing (HPC) system.
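That local-to-HPC copy boils down to exporting the environment to a file and recreating it from that file; the environment and package names below are hypothetical examples:

```shell
# On your laptop: create an environment and write its spec out to YAML.
conda create --name analysis python=3.11 numpy xarray
conda env export --name analysis > environment.yml

# On the HPC system (after copying environment.yml over): recreate and use it.
conda env create --file environment.yml
conda activate analysis
```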
Atmospheric CO₂ is currently around 420 ppm but would likely be 150–200 μatm greater if there were no life in the ocean, demonstrating the significant role the ocean ecosystem plays in the Earth system.
This post will explain how the ocean ecosystem modulates atmospheric CO₂ and how this may be changing now and in the future.
Our story begins with an organism responsible for the lion’s share of the Earth’s oxygen production: plankton.
There are two basic classes of plankton: phytoplankton (plants) and zooplankton (animals). Just like plants on land capture sunlight and CO₂ from the atmosphere to…
Lake Superior is the largest lake in the world by surface area and holds roughly 10% of the world’s surface freshwater. Carved out during the last ice age, the lake is a mere 10,000 years old: young, considering some African lakes are millions of years old.
Something peculiar is happening to this young lake. It is tilting. We know because we can see it in historical water level measurements.
The water level in Superior is constantly changing. Lake levels naturally fluctuate due to snowmelt, rain, drought, or changes in groundwater level. …
Most of us who live in NYC are forever at the mercy of the Metro Transit Authority (MTA) and are forced to pay $2.75 per swipe to get to our destination. This adds up when you consider traveling to and from work and the occasional weekend gallivant around the city. You can alleviate the financial burden slightly by buying a $127 monthly MTA pass. However, I have figured out a way to ride the train for a mere 13 cents.
Storage space on my hard drive is precious and I don’t want to download a bunch of data when I’m just going to process it down into something manageable.
Sometimes we can paste a URL into
xr.open_dataset() and it will gladly read the data. For example, xarray includes support for OPeNDAP to access some, but not all, datasets over HTTP.
This post will describe one solution you can use when things aren’t copacetic and directly pasting the URL into xarray fails. …
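Without spoiling the post's exact solution, one generic fallback when a server exposes plain netCDF files rather than an OPeNDAP endpoint xarray can stream from is to download the bytes to a temporary file first and open that. `open_remote_dataset` is a hypothetical helper, and it assumes xarray plus a netCDF backend are installed.

```python
import tempfile
import urllib.request

def open_remote_dataset(url):
    """Download a remote netCDF file, then open the local copy with xarray.

    A fallback for when pasting the URL straight into xr.open_dataset()
    fails because the data is served as plain files rather than OPeNDAP.
    """
    import xarray as xr  # imported lazily; requires xarray + a netCDF backend

    with urllib.request.urlopen(url) as resp:
        with tempfile.NamedTemporaryFile(suffix=".nc", delete=False) as tmp:
            tmp.write(resp.read())  # stream the remote bytes to local disk
            path = tmp.name
    return xr.open_dataset(path)
```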
Climate scientist @ Columbia University. Studying the global carbon cycle. Scientist | Programmer | Dog dad