<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Notes by Dhiren]]></title><description><![CDATA[Thoughts of a curious developer
]]></description><link>https://notesbydhiren.com</link><image><url>https://substackcdn.com/image/fetch/$s_!jV-v!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb103986d-64ff-4fe2-b783-62fa57c274f2_800x800.png</url><title>Notes by Dhiren</title><link>https://notesbydhiren.com</link></image><generator>Substack</generator><lastBuildDate>Thu, 16 Apr 2026 15:15:55 GMT</lastBuildDate><atom:link href="https://notesbydhiren.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Dhiren Navani]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[notesbydhiren@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[notesbydhiren@substack.com]]></itunes:email><itunes:name><![CDATA[Dhiren Navani]]></itunes:name></itunes:owner><itunes:author><![CDATA[Dhiren Navani]]></itunes:author><googleplay:owner><![CDATA[notesbydhiren@substack.com]]></googleplay:owner><googleplay:email><![CDATA[notesbydhiren@substack.com]]></googleplay:email><googleplay:author><![CDATA[Dhiren Navani]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Andrew Ng ML Specialization (Courses 1 & 2) quick review]]></title><description><![CDATA[AI-generated podcast reviewing the concepts in Courses 1 & 2 of Andrew Ng's ML Specialization.]]></description><link>https://notesbydhiren.com/p/andrew-ng-ml-specialization-courses</link><guid isPermaLink="false">https://notesbydhiren.com/p/andrew-ng-ml-specialization-courses</guid><dc:creator><![CDATA[Dhiren Navani]]></dc:creator><pubDate>Mon, 02 Jun 2025 02:28:02 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/164977238/d48314ea22145bcb21c76c4f270147cb.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>Hello! I am trying to learn more about machine learning, given everything happening around us. I wanted to try learning in public and sharing any insights. 
</p><p>I happen to learn best through audio, and thankfully AI is making that easy. I use NotebookLM to create audio overviews. This is a review of Courses 1 &amp; 2 of Andrew Ng&#8217;s ML Specialization. Enjoy!</p>]]></content:encoded></item><item><title><![CDATA[Standards are awesome]]></title><description><![CDATA[Internet standards and the amazing value they create for the world]]></description><link>https://notesbydhiren.com/p/standards-are-awesome</link><guid isPermaLink="false">https://notesbydhiren.com/p/standards-are-awesome</guid><dc:creator><![CDATA[Dhiren Navani]]></dc:creator><pubDate>Fri, 05 Apr 2024 01:00:59 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!nkO6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00b83c29-12d5-45dc-9be3-2f8dc5eeab43_800x512" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nkO6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00b83c29-12d5-45dc-9be3-2f8dc5eeab43_800x512" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nkO6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00b83c29-12d5-45dc-9be3-2f8dc5eeab43_800x512 424w, https://substackcdn.com/image/fetch/$s_!nkO6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00b83c29-12d5-45dc-9be3-2f8dc5eeab43_800x512 848w, 
https://substackcdn.com/image/fetch/$s_!nkO6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00b83c29-12d5-45dc-9be3-2f8dc5eeab43_800x512 1272w, https://substackcdn.com/image/fetch/$s_!nkO6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00b83c29-12d5-45dc-9be3-2f8dc5eeab43_800x512 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nkO6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00b83c29-12d5-45dc-9be3-2f8dc5eeab43_800x512" width="512" height="512" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/00b83c29-12d5-45dc-9be3-2f8dc5eeab43_800x512&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:512,&quot;width&quot;:512,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nkO6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00b83c29-12d5-45dc-9be3-2f8dc5eeab43_800x512 424w, https://substackcdn.com/image/fetch/$s_!nkO6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00b83c29-12d5-45dc-9be3-2f8dc5eeab43_800x512 848w, 
https://substackcdn.com/image/fetch/$s_!nkO6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00b83c29-12d5-45dc-9be3-2f8dc5eeab43_800x512 1272w, https://substackcdn.com/image/fetch/$s_!nkO6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00b83c29-12d5-45dc-9be3-2f8dc5eeab43_800x512 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>The other day, I was listening to a podcast and wanted to cast it to my Google Home. But guess what happened? 
I couldn't figure out how to cast the podcast from the Apple Podcasts app to Google Home.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://notesbydhiren.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Software Bytes! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>What incentive does Apple have to build this capability? Google Home is not an Apple product, and worse, the two belong to rival ecosystems. Likewise, Android phones do not let you AirPlay content to an Apple TV.</p><p>I realized that there are many podcast apps on the App Store. I searched for ones with this functionality, tried a couple, and finally found one that worked (&#128226;<a href="https://pocketcasts.com/">Pocket Casts</a>)! It was such a pleasant experience! &#128588;&#127996;&#128516;</p><p>I had a similar problem with video. I have a Chromecast on my TV, and I could not find a way to cast from the Apple TV app to it, because the app only supported Apple&#8217;s AirPlay. This time, I had no real recourse, except to play Apple TV in the Chrome browser on my laptop and cast it from there.</p><p>This felt like a tale of two cities. On reflection, I realized that the reason for such different experiences was STANDARDS. Yes, podcasts are usually distributed over the internet via RSS (Really Simple Syndication) feeds. 
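</p><p>To make this concrete, here is a minimal Python sketch of what any podcast app can do with a feed, using only the standard library&#8217;s <code>xml.etree.ElementTree</code>. It assumes a bare RSS 2.0 feed in which each episode is an <code>item</code> element whose audio file is referenced by the <code>url</code> attribute of its <code>enclosure</code> element. The helper names and the example.com URLs are hypothetical, and a real app would also have to handle namespaced tags (such as iTunes metadata), redirects, and malformed feeds.</p>

```python
import xml.etree.ElementTree as ET


def build_sample_feed():
    """Build a tiny, hypothetical RSS 2.0 feed as a string."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Example Podcast"
    for number in (1, 2):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = f"Episode {number}"
        # The enclosure element carries the audio file; its attribute names come from RSS 2.0.
        ET.SubElement(item, "enclosure", url=f"https://example.com/ep{number}.mp3",
                      length="0", type="audio/mpeg")
    return ET.tostring(rss, encoding="unicode")


def list_episodes(feed_xml):
    """Return (title, audio_url) pairs for every item in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    episodes = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        enclosure = item.find("enclosure")
        audio_url = enclosure.get("url") if enclosure is not None else None
        episodes.append((title, audio_url))
    return episodes


for title, url in list_episodes(build_sample_feed()):
    print(title, url)
```

<p>Nothing in <code>list_episodes</code> is specific to one podcast host: any feed that follows the same structure works, which is what lets many independent apps play the same podcasts.</p><p>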
RSS is a web syndication format that specifies a standardized, machine-readable structure for publishing and consuming content. That's why so many different applications can read and play the same podcasts. Videos, on the other hand, are generally not distributed through an open standard. E.g., YouTube videos are accessed through YouTube&#8217;s own apps and website, and developers cannot build an app to play them without using YouTube APIs (which might be hard to get access to).</p><h2>Why should Software Engineers care?</h2><p>As a software engineer, this realization can help in a few ways.</p><h3>Develop products that adhere to standards</h3><p>Let&#8217;s say you are designing a web service. You can describe its interface using the OpenAPI Specification. Since that is a widely adopted standard, developers build tools for it that work with any compliant API, e.g., automated client generation and interactive UIs for exploring endpoints. You can leverage these tools for your own development.</p><h3>Use products behind a standard API</h3><p>Let&#8217;s say you want to make your systems observable. Instead of instrumenting your code directly with the client libraries of monitoring systems like Prometheus or Datadog, you can instrument your application with OpenTelemetry and then configure it to export to either of these backends. This makes it much easier to switch backends in the future and gives you that optionality.</p><h3>Develop/Improve tools for common standards</h3><p>Standards are shared across organizations, so if you solve a challenge for a standard, you solve it for everyone who uses it. E.g., if the email client in your preferred language is not performant and you have ideas to improve it, you can create a different client that follows the same SMTP or IMAP protocol. 
Your solution can then be used by everyone using that protocol.</p><h3>Regulation &amp; compliance</h3><p>Understanding regulations like HIPAA &amp; GDPR is important to keep your applications legally compliant.</p><p>I hope this helps you see how important standards are.</p><p>Please subscribe to my newsletter for more insightful articles.</p>]]></content:encoded></item><item><title><![CDATA[Comparing Terraform and Kubernetes CRD for Infrastructure as Code]]></title><description><![CDATA[Terraform & Kubernetes comparison]]></description><link>https://notesbydhiren.com/p/comparing-terraform-and-kubernetes</link><guid isPermaLink="false">https://notesbydhiren.com/p/comparing-terraform-and-kubernetes</guid><dc:creator><![CDATA[Dhiren Navani]]></dc:creator><pubDate>Sat, 24 Feb 2024 08:10:23 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!iGIb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1477d97-4caf-4e7b-ac96-8a2372763fc6_1366x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!iGIb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1477d97-4caf-4e7b-ac96-8a2372763fc6_1366x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iGIb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1477d97-4caf-4e7b-ac96-8a2372763fc6_1366x768.png 424w, https://substackcdn.com/image/fetch/$s_!iGIb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1477d97-4caf-4e7b-ac96-8a2372763fc6_1366x768.png 848w, https://substackcdn.com/image/fetch/$s_!iGIb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1477d97-4caf-4e7b-ac96-8a2372763fc6_1366x768.png 1272w, https://substackcdn.com/image/fetch/$s_!iGIb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1477d97-4caf-4e7b-ac96-8a2372763fc6_1366x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iGIb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1477d97-4caf-4e7b-ac96-8a2372763fc6_1366x768.png" width="1366" height="768" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e1477d97-4caf-4e7b-ac96-8a2372763fc6_1366x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1366,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:38967,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!iGIb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1477d97-4caf-4e7b-ac96-8a2372763fc6_1366x768.png 424w, https://substackcdn.com/image/fetch/$s_!iGIb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1477d97-4caf-4e7b-ac96-8a2372763fc6_1366x768.png 848w, https://substackcdn.com/image/fetch/$s_!iGIb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1477d97-4caf-4e7b-ac96-8a2372763fc6_1366x768.png 1272w, https://substackcdn.com/image/fetch/$s_!iGIb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1477d97-4caf-4e7b-ac96-8a2372763fc6_1366x768.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Kubernetes has emerged as the de facto container orchestration platform. Its flexibility, scalability, and robust ecosystem make it an ideal choice for managing complex microservices architectures. Alongside Kubernetes, Infrastructure as Code (IaC) has revolutionized the way we manage infrastructure resources, providing a more reliable, reproducible, and version-controlled approach to deployments.</p><p>Terraform is a popular IaC framework with support for most cloud providers. Kubernetes can also be extended to manage custom infrastructure (resources) via Custom Resource Definitions (CRDs), together with controllers (often called operators) that act on those resources. The wide adoption of Kubernetes as a computing platform raises the question of whether CRDs can serve as an IaC tool. 
In this post, we compare the two approaches and identify the use cases each one fits.</p><div><hr></div><h2>Developer Experience</h2><h3>Terraform</h3><p>HCL (HashiCorp Configuration Language) is the configuration syntax Terraform uses to define and manage infrastructure resources. You can package reusable configuration into modules, and extend Terraform to manage new kinds of resources by writing a provider. You can choose the&nbsp;<a href="https://developer.hashicorp.com/terraform/language/settings/backends/configuration#available-backends">backend</a>&nbsp;where Terraform stores its state. I ran into some problems when the state in my dev environment drifted out of sync with the actual resources, but eventually found my way.</p><h3>Kubernetes CRD</h3><p>The resources are expressed in YAML. For functionality like loops, you need a templating tool such as Helm or Jinja. If your services are already deployed on Kubernetes, this is a good option for managing cloud infrastructure the same way. It is also a good option for custom resources, e.g., if you wish to manage your own Kafka installation, you can use&nbsp;<a href="https://github.com/strimzi/strimzi-kafka-operator">Strimzi</a>.<br>One more advantage of CRDs is that they are served through the Kubernetes API, which gives you a <a href="https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#create-a-customresourcedefinition">RESTful interface</a> to your custom resources. This means you can manage your infrastructure with kubectl, Kubernetes client libraries, or plain HTTP calls.</p><div><hr></div><h2>Adoption &amp; support</h2><h3>Terraform</h3><p>Terraform is quite mature, so most cloud providers already have Terraform providers. You can check out the list&nbsp;<a href="https://registry.terraform.io/?product_intent=terraform">here</a>. The&nbsp;<a href="https://github.com/hashicorp/terraform">codebase</a>&nbsp;seems pretty active, and there is a lot of support on Stack Overflow.</p><h3>Kubernetes CRD</h3><p>Kubernetes itself is a very mature project, but a given operator (CRD implementation) might not be. 
There is an open-source project, <a href="https://www.crossplane.io/">Crossplane</a>, that provides CRDs for many cloud providers, in part by generating them from existing Terraform providers.&nbsp;<a href="https://github.com/aws-controllers-k8s/community">AWS</a>&nbsp;&amp;&nbsp;<a href="https://github.com/Azure/azure-service-operator">Azure</a>&nbsp;also maintain their own open-source controllers (CRDs). In terms of adoption and support, Terraform is more prevalent, but the Kubernetes CRD ecosystem is good enough to be useful. If your organization is already a Kubernetes shop, it is worth considering CRDs.</p><div><hr></div><p>If your org is heavily invested in Kubernetes, CRDs may integrate seamlessly, while Terraform offers broader support for diverse infrastructure. Assessing the complexity of the resources you manage and your team&#8217;s technical skill set can guide your decision.</p><p></p>]]></content:encoded></item><item><title><![CDATA[Pre-aggregated vs log-based metrics (Prometheus vs Splunk)]]></title><description><![CDATA[Comparing aggregated metrics solutions like Prometheus and log-based metrics solutions like Splunk]]></description><link>https://notesbydhiren.com/p/pre-aggregated-vs-log-based-metrics</link><guid isPermaLink="false">https://notesbydhiren.com/p/pre-aggregated-vs-log-based-metrics</guid><dc:creator><![CDATA[Dhiren Navani]]></dc:creator><pubDate>Sun, 28 Jan 2024 01:26:41 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!jV-v!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb103986d-64ff-4fe2-b783-62fa57c274f2_800x800.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Software observability is the ability to measure a system&#8217;s current state based on the data it generates, such as logs, metrics, and traces[1]. 
This post focuses on types of metrics systems.</p><p>Broadly, there are two ways in which metrics can be collected and monitored.</p><h2>Pre-aggregated metrics</h2><p>In this approach, metrics are explicitly instrumented in the code. The metrics are collected and/or aggregated into a time-series database. Examples include Datadog[2] and Prometheus.</p><p>Since the database is purpose-built for metrics, storing and querying them is efficient and cheap. These solutions also come with rich statistical functions (rates, percentiles, etc.), which make querying easier.</p><p>The challenge is knowing, and explicitly instrumenting, all the metrics your monitors will need.</p><h2>Log-based metrics</h2><p>In this approach, the system is designed to store logs but provides a query language to analyze and visualize them. Examples include Splunk, AWS CloudWatch Logs Insights, and CloudWatch metric filters.</p><p>Log-based monitors can help capture metrics that were never explicitly instrumented. Also, since logs are a foundation of any observability setup, log-based monitors are easier to get started with.</p><p>However, there are some challenges: queries can be complicated and brittle, and a seemingly harmless change in a log statement can break the monitoring. 
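</p><p>To make the brittleness concrete, here is a minimal, self-contained Python sketch contrasting the two approaches. It does not use any real monitoring library: an in-memory <code>Counter</code> stands in for a pre-aggregated metric (roughly what a Prometheus or Datadog client maintains for you), and a regular expression over raw log lines stands in for a log-based query (roughly what a Splunk or CloudWatch Logs Insights search does). The function names, log format, and regex are all hypothetical.</p>

```python
import re
from collections import Counter

# Pre-aggregated style: the code explicitly increments a counter when a
# request finishes, and only the running totals per status code are kept.
status_counter = Counter()


def record_request(status):
    status_counter[status] += 1


def count_5xx_preaggregated():
    # Sum every counter whose status code is in the 5XX range.
    return sum(n for status, n in status_counter.items() if status // 100 == 5)


# Log-based style: only raw log lines are kept, and the metric is computed
# at query time by matching a pattern against every line.
FIVE_XX_PATTERN = re.compile(r"status=5\d\d\b")


def count_5xx_from_logs(log_lines):
    return sum(1 for line in log_lines if FIVE_XX_PATTERN.search(line))


def handle_request(path, status, log_lines, log_format):
    record_request(status)  # explicit instrumentation
    log_lines.append(log_format.format(path=path, status=status))  # log statement


logs = []
for path, status in [("/orders", 200), ("/orders", 503), ("/pay", 500)]:
    handle_request(path, status, logs, "GET {path} status={status}")

# A later release "harmlessly" rewords the log statement.
handle_request("/orders", 502, logs, "GET {path} failed with {status}")

print(count_5xx_preaggregated())  # 3: the counter sees every 5XX response
print(count_5xx_from_logs(logs))  # 2: the query silently misses the reworded line
```

<p>Nothing fails loudly in the log-based version; the number is simply wrong, which is why such monitors need to be kept in sync with the log statements they depend on.</p><p>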
Besides, the performance of log-based metrics might not be great, because the log store is not designed to compute metrics and there is no pre-aggregation, so every query has to scan the raw logs.</p><h2>How to use these systems?</h2><p>If you have a fairly small use case, log-based monitors can be a great place to start.</p><p>However, as you scale, it makes sense to combine both of these solutions for your metrics monitoring.</p><h4>Recommended strategy</h4><ul><li><p>Use a pre-aggregated system to monitor core metrics like request counts, 5XX counts, latency percentiles, and any other useful custom metrics.</p></li><li><p>Use log-based metric monitors as a catch-all for anything unexpected in the logs, e.g., the number of warning logs, error logs, or any other generic suspicious patterns.</p></li></ul><p>This strategy helps you achieve broad monitoring coverage without sacrificing efficiency or cost.</p><p>I hope this helps you develop a framework for choosing between metric systems.</p><h1>References</h1><p>[1] https://www.dynatrace.com/news/blog/what-is-observability-2/</p><p>[2] https://www.infoq.com/presentations/datadog-metrics-db/</p>]]></content:encoded></item><item><title><![CDATA[3 Underrated IntelliJ features]]></title><description><![CDATA[Not-so-popular IntelliJ features]]></description><link>https://notesbydhiren.com/p/3-underrated-intellij-features</link><guid isPermaLink="false">https://notesbydhiren.com/p/3-underrated-intellij-features</guid><dc:creator><![CDATA[Dhiren Navani]]></dc:creator><pubDate>Sun, 10 Dec 2023 20:06:34 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!jV-v!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb103986d-64ff-4fe2-b783-62fa57c274f2_800x800.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>After over 5 years of using IntelliJ/PyCharm, I've come across several features that, in my opinion, are underrated and under-utilized.</p><h2>Code inspection/analysis</h2><p>Location: <em>On the top bar: Code &#8594; Inspect Code</em></p><p>This feature lets you run built-in static code analysis on a selected scope or directory. It compiles a list of all the identified issues, with links to the relevant code sections. What's particularly useful is its ability to streamline simple refactors, such as removing unused imports. It proves invaluable for tidying up legacy code bases.</p><h2>Version Control System (Git)</h2><p>Location: <em>On the top bar: VCS</em></p><p>When I started using IntelliJ, I initially relied on the terminal for Git operations. However, I was pleasantly surprised to discover that IntelliJ offers excellent support for version control. The Commit panel next to the Project view displays all uncommitted files, allowing for a quick review before committing and sending a merge request. Additionally, IntelliJ provides commit message editors and templates.</p><p>One particularly noteworthy feature is the Git tab on the bottom panel, which offers a user-friendly interface for committing changes, undoing commits, reverting changes, and editing commit messages.</p><p>My personal favorite is the merge conflict resolution tool, which gives a clear view of all changes and simplifies the merging process. I highly recommend giving it a try.</p><h2>Database consoles</h2><p>Location: <em>&#8220;Database&#8221; section on the side panel</em></p><p>For the longest time, I didn't realize that IntelliJ could interact with SQL databases. This functionality proves valuable for querying databases within the development context, minimizing context switching. 
Additionally, it offers code completion for SQL and makes it easy to explore database tables and columns.</p><p></p><p>These were some of my favorite underrated features. If there are others that I've overlooked, please feel free to let me know; I'm always eager to learn.</p><p></p>]]></content:encoded></item><item><title><![CDATA[Reflections on a week of using GitHub Copilot]]></title><description><![CDATA[Observations about the AI coding companion GitHub Copilot]]></description><link>https://notesbydhiren.com/p/reflections-on-a-week-of-using-github</link><guid isPermaLink="false">https://notesbydhiren.com/p/reflections-on-a-week-of-using-github</guid><dc:creator><![CDATA[Dhiren Navani]]></dc:creator><pubDate>Thu, 18 May 2023 02:55:17 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7109b7bb-2dbf-4747-a730-d802763ba6a0_224x224.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I have been using GitHub Copilot for the past week, and it has surprised me in some ways and left me unimpressed in others. 
I do find the tool useful overall, though.</p><h3>The good&nbsp;parts</h3><p>Copilot is extremely fast at coming up with multiple code-completion suggestions. Around 30% of the time, I accepted its suggestions without any modifications. It has a decent grasp of your code&#8217;s context, such as the code above and below the cursor and other open files. It also adapts to the way you code: if I do not take Copilot&#8217;s suggestion and instead do something new, it picks up on that and plays along well.</p><p>Also, it reduces my keystrokes, which is incredibly useful for people with slower typing speeds (including me). I already rely on IDE features like the version control UI, code generation, and run configurations, and Copilot is another boon for me.</p><h3>The challenges</h3><p>I am used to classic IDE auto-complete suggestions. They are usually correct because they are based on actual classes and compiled code. Also, the IDE&#8217;s code generation capabilities and refactoring rules ensure that the code before and after a change is logically equivalent. Copilot, by contrast, made things up a lot of the time. 
Because I have the muscle memory of accepting suggestions as-is (thanks to excellent IDE code suggestions), I often found myself accepting sub-optimal or flat-out incorrect code from Copilot. Changing it back and fixing it was a little frustrating, and I wondered whether I could have written it better myself. A couple of times I also missed reviewing or verifying Copilot-generated code, which was scary. We know that Large Language Models (LLMs) often hallucinate, and Copilot is based on one such LLM (OpenAI Codex), but it&#8217;s hard to stay mindful of that while coding.</p><p>Another challenge is that it usually does not follow best practices, or rather the practices that are consistent with the code base, e.g., for loops versus lambda functions. This may or may not be a big deal, but since new developers will be relying on such tools, they need to be careful.</p><p>I also found Copilot to be glitchy at times; I am sure this will be resolved as the service matures.</p>
]]></content:encoded></item><item><title><![CDATA[Clean coders dilemma]]></title><description><![CDATA[Trade off between code quality and feature development]]></description><link>https://notesbydhiren.com/p/clean-coders-dilemma</link><guid isPermaLink="false">https://notesbydhiren.com/p/clean-coders-dilemma</guid><dc:creator><![CDATA[Dhiren Navani]]></dc:creator><pubDate>Sun, 30 Apr 2023 20:21:13 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!y9of!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe418f39f-2aab-4e48-a7cb-0b81b0bfcabf_1600x1068.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!y9of!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe418f39f-2aab-4e48-a7cb-0b81b0bfcabf_1600x1068.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!y9of!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe418f39f-2aab-4e48-a7cb-0b81b0bfcabf_1600x1068.jpeg 424w, https://substackcdn.com/image/fetch/$s_!y9of!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe418f39f-2aab-4e48-a7cb-0b81b0bfcabf_1600x1068.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!y9of!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe418f39f-2aab-4e48-a7cb-0b81b0bfcabf_1600x1068.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!y9of!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe418f39f-2aab-4e48-a7cb-0b81b0bfcabf_1600x1068.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!y9of!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe418f39f-2aab-4e48-a7cb-0b81b0bfcabf_1600x1068.jpeg" width="1456" height="972" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e418f39f-2aab-4e48-a7cb-0b81b0bfcabf_1600x1068.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:972,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!y9of!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe418f39f-2aab-4e48-a7cb-0b81b0bfcabf_1600x1068.jpeg 424w, https://substackcdn.com/image/fetch/$s_!y9of!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe418f39f-2aab-4e48-a7cb-0b81b0bfcabf_1600x1068.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!y9of!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe418f39f-2aab-4e48-a7cb-0b81b0bfcabf_1600x1068.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!y9of!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe418f39f-2aab-4e48-a7cb-0b81b0bfcabf_1600x1068.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Last weekend, while listening to <em><a href="https://claytonchristensen.com/books/the-innovators-dilemma/">The 
Innovator&#8217;s Dilemma</a></em>, the classic book on the nature of technological disruption, I was thinking about its parallels with writing clean code. The core idea of the book is that established companies focus too much on the needs of their current customers and may miss out on disruptive new technology. Startups, on the other hand, do not have that &#8216;baggage&#8217; and can disrupt the incumbent&#8217;s business with innovation. The dilemma, then, is for a company to balance the needs of current customers with the need to innovate.</p><p>Surprisingly, I experienced a similar dilemma while struggling to prioritize between cleaning up an existing code base and delivering direct end-user value through feature development. Focusing on current customer needs seems like the right thing to do, but it comes at the cost of slower feature development over the long term because of poor code quality. It is also hard to convince product management of the value of cleanup, given that the incentives favor delivering end-user value faster. Ultimately, many projects are &#8216;disrupted&#8217; by new initiatives to rewrite the code base or by a SaaS offering. Both alternatives are expensive.</p><blockquote><p>Writing good code is easier in the earlier stages of a project, just as innovating is easier for a startup.</p></blockquote><p>Interestingly, innovating and writing quality code both get progressively harder over time. Once the code serves existing use cases, it is hard to remove or refactor a bad interface and improve the code structure. Just as an established company carries the &#8216;baggage&#8217; of existing users, code carries the &#8216;baggage&#8217; of existing code, features, and use cases.</p><p>Prioritizing between refactoring and feature development is hard. One thing we can all do is <em>keep the campground clean</em>, i.e., 
leave the codebase in a better state than you found it.</p>]]></content:encoded></item><item><title><![CDATA[Leaky abstractions in Apache Spark]]></title><description><![CDATA[Why understanding the Spark internals is crucial for its optimal usage.]]></description><link>https://notesbydhiren.com/p/leaky-abstractions-in-apache-spark</link><guid isPermaLink="false">https://notesbydhiren.com/p/leaky-abstractions-in-apache-spark</guid><dc:creator><![CDATA[Dhiren Navani]]></dc:creator><pubDate>Sat, 15 Apr 2023 20:01:03 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!a2jv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb55baa8d-1970-4d40-ad38-f19033bb2755_1600x1067.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Apache Spark has become the de facto data processing engine for big data use cases. Although Spark provides great abstractions, there are times when understanding its internals is essential to using it optimally. 
This post lists ways Spark exposes <em>Leaky Abstractions</em>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!a2jv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb55baa8d-1970-4d40-ad38-f19033bb2755_1600x1067.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!a2jv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb55baa8d-1970-4d40-ad38-f19033bb2755_1600x1067.jpeg 424w, https://substackcdn.com/image/fetch/$s_!a2jv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb55baa8d-1970-4d40-ad38-f19033bb2755_1600x1067.jpeg 848w, https://substackcdn.com/image/fetch/$s_!a2jv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb55baa8d-1970-4d40-ad38-f19033bb2755_1600x1067.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!a2jv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb55baa8d-1970-4d40-ad38-f19033bb2755_1600x1067.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!a2jv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb55baa8d-1970-4d40-ad38-f19033bb2755_1600x1067.jpeg" width="1456" height="971" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b55baa8d-1970-4d40-ad38-f19033bb2755_1600x1067.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!a2jv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb55baa8d-1970-4d40-ad38-f19033bb2755_1600x1067.jpeg 424w, https://substackcdn.com/image/fetch/$s_!a2jv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb55baa8d-1970-4d40-ad38-f19033bb2755_1600x1067.jpeg 848w, https://substackcdn.com/image/fetch/$s_!a2jv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb55baa8d-1970-4d40-ad38-f19033bb2755_1600x1067.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!a2jv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb55baa8d-1970-4d40-ad38-f19033bb2755_1600x1067.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Photo by <a href="https://unsplash.com/fr/@joshredd?utm_source=medium&amp;utm_medium=referral">Josh Redd</a> on&nbsp;<a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure></div><h3>Law of leaky abstractions</h3><p>The law of <a href="https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-abstractions/">leaky abstractions</a> is a principle in software engineering stating that all non-trivial <a href="https://thevaluable.dev/abstraction-type-software-example/">abstractions</a> are leaky. It is impossible to create perfect abstractions: even perfect software is not immune to natural disasters, power grid failures, and so on. Still, users have reasonable expectations of any abstraction, e.g., useful error messages on failure, reasonable performance, and correctness.</p><h3>Programming Model</h3><p>Spark started as a replacement for Hadoop MapReduce. It addressed MapReduce&#8217;s disk-based data exchange and lack of expressivity, and it provided fault tolerance without data redundancy<a href="http://people.csail.mit.edu/matei/papers/2012/nsdi_spark.pdf">&#185;</a>. The solution was an abstraction called the <a href="https://www.databricks.com/blog/2016/07/14/a-tale-of-three-apache-spark-apis-rdds-dataframes-and-datasets.html">Resilient Distributed Dataset</a> (RDD), an in-memory, distributed (partitioned across machines), immutable collection of records.</p><p>RDDs were designed to support common <a href="https://spark.apache.org/docs/latest/rdd-programming-guide.html#transformations">functional operations</a> on collections (e.g., map, reduce, and filter). It&#8217;s easy to think of them as just another collection, like a List or an Array.</p><p>For example, to check whether a collection is empty in Scala, <em>myList.size == 0</em> and <em>myList.isEmpty</em> give the same answer, and for an in-memory collection the cost difference rarely matters. But <em>rdd.count() == 0</em> and <em>rdd.isEmpty</em> do <a href="https://stackoverflow.com/a/28454358">not</a> perform similarly. There is no precomputed size, so <em>rdd.count()</em> has to compute every partition to count its records, whereas <em>rdd.isEmpty</em> can stop as soon as it finds a single record.</p><p>This is a <em>leaky abstraction</em>: <em>rdd.count() == 0</em> and <em>rdd.isEmpty</em> are logically equivalent, so one would expect their performance to be comparable.</p><p>As the project evolved, Spark SQL and the DataFrame API were introduced. Spark SQL abstracts away many details, but it has similar problems. 
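</p><p>To make the RDD emptiness example concrete, here is a toy model in Scala. It is a sketch under stated assumptions, not Spark&#8217;s code: <em>ToyRDD</em>, <em>partitionsComputed</em>, and <em>ToyRDDDemo</em> are hypothetical names, and &#8220;computing a partition&#8221; stands in for the cluster work a real job would do. In Spark itself, <em>isEmpty</em> is built on <em>take(1)</em>, which scans partitions incrementally, while <em>count()</em> runs a job over every partition; the toy only mirrors that shape.</p>

```scala
// Toy model only: this is NOT Spark's implementation. Each partition is a thunk
// that produces its records when evaluated, mimicking lazy, distributed data.
final class ToyRDD[A](partitions: Vector[() => Iterator[A]]) {

  // Number of partitions evaluated so far; stands in for the work a cluster does.
  var partitionsComputed: Int = 0

  private def compute(index: Int): Iterator[A] = {
    partitionsComputed += 1
    partitions(index)()
  }

  // Analogous to rdd.count(): there is no precomputed size, so every partition
  // is evaluated and the sizes are summed.
  def count(): Long =
    partitions.indices.map(i => compute(i).size.toLong).sum

  // Analogous to rdd.isEmpty: stop as soon as any partition yields a record.
  def isEmpty: Boolean =
    !partitions.indices.iterator.exists(i => compute(i).hasNext)
}

object ToyRDDDemo {
  def main(args: Array[String]): Unit = {
    // Hypothetical data: 100 partitions of 10 records each.
    val parts: Vector[() => Iterator[Int]] =
      Vector.tabulate(100)(p => () => Iterator.range(p * 10, p * 10 + 10))

    val viaCount = new ToyRDD(parts)
    val emptyViaCount = viaCount.count() == 0
    // prints: count() == 0 -> false, partitions computed: 100
    println(s"count() == 0 -> $emptyViaCount, partitions computed: ${viaCount.partitionsComputed}")

    val viaIsEmpty = new ToyRDD(parts)
    val emptyViaIsEmpty = viaIsEmpty.isEmpty
    // prints: isEmpty -> false, partitions computed: 1
    println(s"isEmpty -> $emptyViaIsEmpty, partitions computed: ${viaIsEmpty.partitionsComputed}")
  }
}
```

<p>Both checks give the same answer, but <em>count()</em> touches all 100 partitions while <em>isEmpty</em> stops after the first. For an RDD that really is empty, both end up scanning everything, so the savings appear only when there is data.</p><p>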
Here is my <a href="https://www.softwarebytes.dev/p/apache-spark-optimization-count">previous article</a> with an example of logically equivalent queries that perform differently.</p><h3>Configuration</h3><p>Configuring a Spark job can be complex because of the <a href="https://spark.apache.org/docs/latest/configuration.html#application-properties">sheer number of configuration options</a>. Typically, you need to set the memory of the worker JVMs (executors) and the number of tasks each executor runs concurrently.</p><p>The optimal values for these settings depend on:</p><ul><li><p>The code being executed</p></li><li><p>The number of input files (which influences the number of partitions)</p></li><li><p>The size of the files (i.e., the volume of data)</p></li><li><p>The file format (CSV, JSON, Parquet)</p></li><li><p>The data distribution</p></li></ul><p>In addition, there are other non-trivial considerations like compression, serialization, and observability.</p><p>Spark does not hide the complexity of understanding and tuning these configurations. Developers need at least a mental model of Spark&#8217;s internals to use the cluster efficiently.</p><h3>Error Handling</h3><p>Spark errors are difficult to debug if you do not understand the internals. 
Here is an excellent <a href="https://medium.com/@yhoso/resolving-weird-spark-errors-f34324943e1c#37f7">post</a> on unintuitive errors that Spark applications can throw.</p><div><hr></div><p>Code performance, cluster configuration, and error handling are three areas where Spark exhibits <em>leaky abstractions</em>.</p><div><hr></div><h3>Comparison with other SQL-based data systems</h3><p>Now that we have reviewed the challenges with Spark, let&#8217;s contrast them with other SQL-based systems. SQL databases such as PostgreSQL, Oracle DB, and Snowflake have been in use for a long time. Note that Spark SQL is a processing engine, whereas SQL databases are both processing and storage engines.</p><p>SQL databases offer better abstractions for performance because they are less flexible. For example, in SQL databases, users do not choose the physical layout of the data, and the database engine is tied to its catalog. In contrast, Spark can read data in file formats like CSV, JSON, and Parquet, and it supports any Hive-compatible metastore (catalog).</p><blockquote><p>Less flexibility -&gt; Fewer configurations -&gt; Better abstractions</p></blockquote><p>Examples of these abstractions are <a href="https://www.linode.com/docs/guides/sql-indexes/">indexes</a> and <a href="https://www.sqlshack.com/database-table-partitioning-sql-server/">partitions</a>, which structure the data optimally for the expected query patterns.</p><p>SQL databases do share some of Spark SQL&#8217;s challenges: different ways of writing the same query can lead to different query plans and, therefore, different performance. 
See this <a href="https://www.khanacademy.org/computing/computer-programming/sql/relational-queries-in-sql/a/more-efficient-sql-with-query-planning-and-optimization">example</a>.</p><p>There are no perfect abstractions.</p><div><hr></div><p>Support my work by subscribing. Thanks.</p><h3>Resources</h3><ul><li><p>RDD research paper: <a href="https://www.usenix.org/system/files/conference/nsdi12/nsdi12-final138.pdf">https://www.usenix.org/system/files/conference/nsdi12/nsdi12-final138.pdf</a></p></li><li><p>Spark errors: <a href="https://medium.com/@yhoso/resolving-weird-spark-errors-f34324943e1c">https://medium.com/@yhoso/resolving-weird-spark-errors-f34324943e1c</a></p></li><li><p>The Law of Leaky Abstractions: <a href="https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-abstractions/">https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-abstractions/</a></p></li><li><p>What Are Abstractions in Software Engineering with Examples? 
<a href="https://thevaluable.dev/abstraction-type-software-example/">https://thevaluable.dev/abstraction-type-software-example/</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Do you want it right or do you want it right now?]]></title><description><![CDATA[Thoughts on tradeoff between speed and correctness in software]]></description><link>https://notesbydhiren.com/p/do-you-want-it-right-or-do-you-want</link><guid isPermaLink="false">https://notesbydhiren.com/p/do-you-want-it-right-or-do-you-want</guid><dc:creator><![CDATA[Dhiren Navani]]></dc:creator><pubDate>Sat, 08 Apr 2023 19:00:49 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!LRij!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d7cab19-027e-4efd-bedf-c2072f2941ed_1600x1052.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!LRij!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d7cab19-027e-4efd-bedf-c2072f2941ed_1600x1052.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LRij!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d7cab19-027e-4efd-bedf-c2072f2941ed_1600x1052.jpeg 424w, https://substackcdn.com/image/fetch/$s_!LRij!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d7cab19-027e-4efd-bedf-c2072f2941ed_1600x1052.jpeg 848w, https://substackcdn.com/image/fetch/$s_!LRij!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d7cab19-027e-4efd-bedf-c2072f2941ed_1600x1052.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!LRij!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d7cab19-027e-4efd-bedf-c2072f2941ed_1600x1052.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LRij!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d7cab19-027e-4efd-bedf-c2072f2941ed_1600x1052.jpeg" width="1456" height="957" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3d7cab19-027e-4efd-bedf-c2072f2941ed_1600x1052.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:957,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!LRij!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d7cab19-027e-4efd-bedf-c2072f2941ed_1600x1052.jpeg 424w, https://substackcdn.com/image/fetch/$s_!LRij!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d7cab19-027e-4efd-bedf-c2072f2941ed_1600x1052.jpeg 848w, https://substackcdn.com/image/fetch/$s_!LRij!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d7cab19-027e-4efd-bedf-c2072f2941ed_1600x1052.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!LRij!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d7cab19-027e-4efd-bedf-c2072f2941ed_1600x1052.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>I came across this quote while listening to Pat Helland on a podcast. It captures an eternal tradeoff in distributed systems, commonly understood through the <a href="https://www.ibm.com/cloud/learn/cap-theorem">CAP theorem</a>. The core idea is that when a network partition occurs, a distributed system has to choose between keeping its data consistent and staying available; it cannot guarantee both. It&#8217;s about knowing what is more valuable: data that might be stale but is available right now, or data that reflects every update but arrives later.</p><p>One example of this tradeoff is the read consistency setting in AWS DynamoDB, a scalable key-value store. DynamoDB prices eventually consistent reads and strongly consistent reads differently. Eventually consistent reads might not reflect the most recent updates, but they are fast. Strongly consistent reads reflect the latest updates, but their latency and cost may be higher. 
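</p><p>The cost difference can be put in numbers. In DynamoDB&#8217;s provisioned capacity mode, one read capacity unit (RCU) covers one strongly consistent read per second of an item up to 4 KB, or two eventually consistent reads per second, and larger items consume capacity in 4 KB steps. The Scala sketch below applies that rule; the <em>ReadCapacity</em> object and the workload figures are hypothetical, and it ignores other cases such as transactional reads.</p>

```scala
// Sketch of DynamoDB provisioned read-capacity arithmetic (hypothetical helper).
// Assumed rule: 1 RCU = one strongly consistent read per second of an item up to
// 4 KB, or two eventually consistent reads per second of the same size.
object ReadCapacity {
  private val KbPerUnit = 4.0

  // Each read consumes capacity in 4 KB steps, so item sizes are rounded up.
  private def unitsPerRead(itemSizeKb: Double): Int =
    math.ceil(itemSizeKb / KbPerUnit).toInt

  // Strongly consistent reads pay the full units for every read.
  def strongRcus(itemSizeKb: Double, readsPerSecond: Int): Int =
    unitsPerRead(itemSizeKb) * readsPerSecond

  // Eventually consistent reads cost half as much, rounded up.
  def eventualRcus(itemSizeKb: Double, readsPerSecond: Int): Int =
    math.ceil(strongRcus(itemSizeKb, readsPerSecond) / 2.0).toInt

  def main(args: Array[String]): Unit = {
    // Hypothetical workload: 80 reads per second of 6 KB items.
    println(s"strong:   ${strongRcus(6, 80)} RCUs")   // prints: strong:   160 RCUs
    println(s"eventual: ${eventualRcus(6, 80)} RCUs") // prints: eventual: 80 RCUs
  }
}
```

<p>For this made-up workload, choosing &#8220;right&#8221; (strong consistency) doubles the provisioned read capacity compared with &#8220;right now&#8221; (eventual consistency).</p><p>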
Strongly consistent reads also have some <a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadConsistency.html">other disadvantages</a>.</p><p>This idea extends beyond databases to software quality and requirements clarification. As a developer, it is always important to understand, given the constraints, what matters more to the business and stakeholders: building the software &#8220;right&#8221;, with quality control mechanisms like unit testing and telemetry, or delivering it fast, i.e., &#8220;right now&#8221;.</p><h3>References</h3><ul><li><p><a href="https://www.infoq.com/news/2019/12/data-storage-trends-helland/">https://www.infoq.com/news/2019/12/data-storage-trends-helland/</a></p></li><li><p><a href="https://www.se-radio.net/2020/02/episode-397-pat-helland-on-data-management-with-microservices/">https://www.se-radio.net/2020/02/episode-397-pat-helland-on-data-management-with-microservices/</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Tips for Starting a New Job Successfully As A Software Engineer]]></title><description><![CDATA[Advice for successfully navigating a new team and culture in your software engineering career]]></description><link>https://notesbydhiren.com/p/tips-for-starting-a-new-job-successfully</link><guid isPermaLink="false">https://notesbydhiren.com/p/tips-for-starting-a-new-job-successfully</guid><dc:creator><![CDATA[Dhiren Navani]]></dc:creator><pubDate>Wed, 29 Mar 2023 02:40:01 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!dIFW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58a940d9-0864-4794-86a4-67eb78ec1db4_752x364.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>As professionals, we sometimes take on new challenges by joining different teams or companies. Over my relatively short career, I have been fortunate to work with various teams and have learned fundamental lessons about managing change and being an effective engineer. 
In this article, I would like to share some of my learnings, acknowledging that what worked for me may not work for everyone.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dIFW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58a940d9-0864-4794-86a4-67eb78ec1db4_752x364.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dIFW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58a940d9-0864-4794-86a4-67eb78ec1db4_752x364.jpeg 424w, https://substackcdn.com/image/fetch/$s_!dIFW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58a940d9-0864-4794-86a4-67eb78ec1db4_752x364.jpeg 848w, https://substackcdn.com/image/fetch/$s_!dIFW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58a940d9-0864-4794-86a4-67eb78ec1db4_752x364.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!dIFW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58a940d9-0864-4794-86a4-67eb78ec1db4_752x364.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dIFW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58a940d9-0864-4794-86a4-67eb78ec1db4_752x364.jpeg" width="752" height="364" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/58a940d9-0864-4794-86a4-67eb78ec1db4_752x364.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:364,&quot;width&quot;:752,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;The image is actually from a guided DIY project at a workshop and this sign is currently on my wall :)&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="The image is actually from a guided DIY project at a workshop and this sign is currently on my wall :)" title="The image is actually from a guided DIY project at a workshop and this sign is currently on my wall :)" srcset="https://substackcdn.com/image/fetch/$s_!dIFW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58a940d9-0864-4794-86a4-67eb78ec1db4_752x364.jpeg 424w, https://substackcdn.com/image/fetch/$s_!dIFW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58a940d9-0864-4794-86a4-67eb78ec1db4_752x364.jpeg 848w, https://substackcdn.com/image/fetch/$s_!dIFW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58a940d9-0864-4794-86a4-67eb78ec1db4_752x364.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!dIFW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58a940d9-0864-4794-86a4-67eb78ec1db4_752x364.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">The image is actually from a guided DIY project at a workshop and this sign is currently on my wall :)</figcaption></figure></div><h3>Setting Expectations</h3><p>Navigating a new team can be challenging, especially if you're not sure what's expected of you. There are short-term expectations like becoming familiar with the team's work and completing administrative tasks like setting up your machine and sharing your compensation preferences. 
Usually, there's a defined process for this.</p><p>Medium-term expectations may not be as well-defined. It's important to understand which deliverables are expected of you over the next few months and how they will help you get familiar with the code base. Some teams don't have an established onboarding process for new engineers, so it's crucial to ask for a plan and identify tasks that can set you up for success.</p><h3>Leverage the Past</h3><p>Learning from past mistakes and successes can help you propose improvements or start conversations. As a new team member, you bring a fresh perspective that can contribute to the team's success. For example, a code formatting template you used on a previous team could be useful on your new one.</p><p>When proposing changes, be mindful not to come across as a know-it-all or overly negative. Since you don't yet have institutional knowledge, it's best to ask for the team's opinion on a proposed change, which tends to lead to more collaborative discussions.</p><h3>Cultural Differences</h3><p>Different teams and organizations operate in different ways, and it's essential to understand and adapt to those cultural differences. 
For instance, some teams prefer unstructured discussions, while others prefer more formal meetings. It's best to learn the team's culture early and make small adjustments so that you feel included and connect better with the team.</p><h3>Better Questions</h3><p>When you are new to a team, there are no bad questions, but some questions are better than others. Open-ended questions can prompt relevant discussions and surface things you might not otherwise have thought of. However, be mindful of your colleagues' time: do some research beforehand so that your questions are focused.</p><h3>New Habits</h3><p>Starting a new job is an excellent opportunity to develop new habits that can benefit you in the long run, for example, establishing a better work-life balance or watching more tech talks. The <a href="https://faculty.wharton.upenn.edu/wp-content/uploads/2014/06/Dai_Fresh_Start_2014_Mgmt_Sci.pdf">fresh start effect</a> can make it easier to build better habits, so it's worth taking advantage of this opportunity.</p><p>I hope this article has given you some helpful tips for succeeding on your new team. Remember that everyone's experience is different, and it's best to adapt to your new team's culture while staying true to yourself.</p>
]]></content:encoded></item><item><title><![CDATA[Using Python libraries with Scala in Spark]]></title><description><![CDATA[Apache Spark has grown to be a popular framework for big data processing.]]></description><link>https://notesbydhiren.com/p/using-python-libraries-with-scala</link><guid isPermaLink="false">https://notesbydhiren.com/p/using-python-libraries-with-scala</guid><dc:creator><![CDATA[Dhiren Navani]]></dc:creator><pubDate>Sat, 11 Mar 2023 11:23:30 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7dbd109-6ebe-4a09-971c-59ad504a3890_744x324.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Apache Spark has grown to be a popular framework for big data processing. Part of its power is that it lets you express logic in several programming languages: Java, Scala, Python and R.</p><p>However, some use cases require libraries from a language other than the one the application is written in. One such scenario is a Spark application written in Scala that now needs Python libraries. This is common because Python has a richer ecosystem of libraries for statistics and machine learning. 
Examples include SciPy and NumPy.</p><h3>How can we leverage Python libraries in a Scala Spark application?</h3><p>Spark does have a Python API (PySpark), but its core is written in Scala. So how does PySpark let Python code drive an engine that runs on the JVM?</p><p>Below is the high-level data flow between the Python runtime and the JVM (Scala runtime).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!f3rD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ee989ab-b729-4517-a515-1ccd436f449d_936x702.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!f3rD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ee989ab-b729-4517-a515-1ccd436f449d_936x702.png 424w, https://substackcdn.com/image/fetch/$s_!f3rD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ee989ab-b729-4517-a515-1ccd436f449d_936x702.png 848w, 
https://substackcdn.com/image/fetch/$s_!f3rD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ee989ab-b729-4517-a515-1ccd436f449d_936x702.png 1272w, https://substackcdn.com/image/fetch/$s_!f3rD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ee989ab-b729-4517-a515-1ccd436f449d_936x702.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!f3rD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ee989ab-b729-4517-a515-1ccd436f449d_936x702.png" width="936" height="702" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2ee989ab-b729-4517-a515-1ccd436f449d_936x702.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:702,&quot;width&quot;:936,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!f3rD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ee989ab-b729-4517-a515-1ccd436f449d_936x702.png 424w, https://substackcdn.com/image/fetch/$s_!f3rD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ee989ab-b729-4517-a515-1ccd436f449d_936x702.png 848w, 
https://substackcdn.com/image/fetch/$s_!f3rD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ee989ab-b729-4517-a515-1ccd436f449d_936x702.png 1272w, https://substackcdn.com/image/fetch/$s_!f3rD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ee989ab-b729-4517-a515-1ccd436f449d_936x702.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Reference: <a 
href="https://cwiki.apache.org/confluence/display/spark/pyspark+internals">https://cwiki.apache.org/confluence/display/spark/pyspark+internals</a></figcaption></figure></div><p>PySpark uses a library called <a href="https://www.py4j.org/">Py4J</a>, which lets the Python runtime invoke code in the JVM. This communication happens over a local socket on the driver.</p><p>We can reuse the same Py4J bridge to <a href="https://www.py4j.org/advanced_topics.html#implementing-java-interfaces-from-python-callback">call back</a> into Python code from within the Scala code. Below is the data flow.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!65e0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7dbd109-6ebe-4a09-971c-59ad504a3890_744x324.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!65e0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7dbd109-6ebe-4a09-971c-59ad504a3890_744x324.png 424w, https://substackcdn.com/image/fetch/$s_!65e0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7dbd109-6ebe-4a09-971c-59ad504a3890_744x324.png 848w, https://substackcdn.com/image/fetch/$s_!65e0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7dbd109-6ebe-4a09-971c-59ad504a3890_744x324.png 1272w, https://substackcdn.com/image/fetch/$s_!65e0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7dbd109-6ebe-4a09-971c-59ad504a3890_744x324.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!65e0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7dbd109-6ebe-4a09-971c-59ad504a3890_744x324.png" width="744" height="324" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b7dbd109-6ebe-4a09-971c-59ad504a3890_744x324.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:324,&quot;width&quot;:744,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!65e0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7dbd109-6ebe-4a09-971c-59ad504a3890_744x324.png 424w, https://substackcdn.com/image/fetch/$s_!65e0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7dbd109-6ebe-4a09-971c-59ad504a3890_744x324.png 848w, https://substackcdn.com/image/fetch/$s_!65e0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7dbd109-6ebe-4a09-971c-59ad504a3890_744x324.png 1272w, https://substackcdn.com/image/fetch/$s_!65e0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7dbd109-6ebe-4a09-971c-59ad504a3890_744x324.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>When you launch the Python driver with spark-submit (passing your application's JAR, e.g. via --jars), execution starts in the Python driver code, which simply invokes the Scala driver code of your existing application. In your Scala application, you define an interface that is implemented in the Python runtime. Inside that Python implementation, you can use whichever Python libraries you need.</p><h3>Pseudocode: Python&nbsp;Driver</h3><pre><code>from pyspark.sql import SparkSession
from pyspark.java_gateway import ensure_callback_server_started

from python_implementation import PythonImplementation

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# Start Py4J's callback server so that the JVM can call into Python objects
ensure_callback_server_started(sc._gateway)

python_implementation = PythonImplementation(spark)

# your.Driver stands in for the Scala driver class of your existing application
scala_driver = sc._jvm.your.Driver(python_implementation)
scala_driver.execute(spark._jsparkSession)</code></pre><h3>Pseudocode: Scala Interface Definition that Python Implements</h3><pre><code>package your
import org.apache.spark.sql.DataFrame

trait PythonImplementation {
  def process(df: DataFrame): DataFrame
}</code></pre><h3>Pseudocode: Scala&nbsp;Driver</h3><pre><code>package your
import org.apache.spark.sql.SparkSession

class Driver(pythonImplementation: PythonImplementation) {
  def execute(sparkSession: SparkSession): Unit = {
    val scDf = sparkSession.range(10).toDF() // placeholder for your DataFrame
    pythonImplementation.process(scDf).show() // calls back into Python via Py4J
  }
}</code></pre><h3>Pseudocode: Python Implementation</h3><pre><code>from pyspark.sql import DataFrame

class PythonImplementation:
    def __init__(self, spark):
        self.spark = spark

    def process(self, jdf):
        # jdf is a Py4J handle to the JVM DataFrame; wrap it for the PySpark API
        # (passing a SparkSession here assumes Spark 3.3+)
        df = DataFrame(jdf, self.spark)
        # Your Python logic goes here, e.g. using NumPy or SciPy
        result = df
        # Hand the underlying JVM DataFrame back to the Scala caller
        return result._jdf

    class Java:
        # Py4J: the JVM interface this Python class implements
        implements = ["your.PythonImplementation"]</code></pre><h3>Caveats</h3><ul><li><p>The solution relies on private PySpark attributes such as _jvm, _gateway, _jsparkSession and _jdf, which may change in future versions of Spark and break the solution.</p></li><li><p>This approach does not address, and is not intended to address, the performance overhead of Python UDFs.</p></li></ul><p>Follow me on <a href="https://www.linkedin.com/in/dhirennavani/">LinkedIn</a></p>]]></content:encoded></item><item><title><![CDATA[Apache Spark Optimization: Don’t count your chickens]]></title><description><![CDATA[Don't count your chickens before they hatch might not just be wise in life, but can also be a great recommendation for writing performant Spark SQL jobs. 
Let&#8217;s take an example where we need to count the number of managers, engineers and sales professionals in an employee dataset.]]></description><link>https://notesbydhiren.com/p/apache-spark-optimization-count</link><guid isPermaLink="false">https://notesbydhiren.com/p/apache-spark-optimization-count</guid><dc:creator><![CDATA[Dhiren Navani]]></dc:creator><pubDate>Sat, 11 Mar 2023 10:42:19 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67c34824-2324-45e0-adce-8c817342c855_512x384.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>&#8220;Don't count your chickens before they hatch&#8221; might not just be wise advice in life; it can also be a great recommendation for writing performant Spark SQL jobs.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!o9Qf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67c34824-2324-45e0-adce-8c817342c855_512x384.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!o9Qf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67c34824-2324-45e0-adce-8c817342c855_512x384.jpeg 424w, https://substackcdn.com/image/fetch/$s_!o9Qf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67c34824-2324-45e0-adce-8c817342c855_512x384.jpeg 848w, https://substackcdn.com/image/fetch/$s_!o9Qf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67c34824-2324-45e0-adce-8c817342c855_512x384.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!o9Qf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67c34824-2324-45e0-adce-8c817342c855_512x384.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!o9Qf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67c34824-2324-45e0-adce-8c817342c855_512x384.jpeg" width="512" height="384" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/67c34824-2324-45e0-adce-8c817342c855_512x384.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:384,&quot;width&quot;:512,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;You might be wondering, what do I even mean?&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="You might be wondering, what do I even mean?" title="You might be wondering, what do I even mean?" 
srcset="https://substackcdn.com/image/fetch/$s_!o9Qf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67c34824-2324-45e0-adce-8c817342c855_512x384.jpeg 424w, https://substackcdn.com/image/fetch/$s_!o9Qf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67c34824-2324-45e0-adce-8c817342c855_512x384.jpeg 848w, https://substackcdn.com/image/fetch/$s_!o9Qf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67c34824-2324-45e0-adce-8c817342c855_512x384.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!o9Qf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67c34824-2324-45e0-adce-8c817342c855_512x384.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Let&#8217;s take an example where we need to count the number of managers, engineers and sales professionals in an employee dataset.</p><p>We could do something like the following.</p><pre><code>employees_df = spark.sql("select * from employees")
num_managers = employees_df.filter("employee_type = 'manager'").count()
num_engg = employees_df.filter("employee_type = 'engineer'").count()
num_sales = employees_df.filter("employee_type = 'sales_professional'").count()</code></pre><p>However, each count() call is a separate action that launches its own Spark job, so your data would essentially be scanned three times, once for each employee type.</p><p>There has to be a better way!</p><p>Spark can be lazy, and unlike me, it has a good reason to be &#128512;</p><p>The lowest-level abstraction that Apache Spark provides is the RDD (Resilient Distributed Dataset). Spark SQL code is ultimately converted into operations on RDDs.</p><p>RDDs support two types of operations:</p><ul><li><p>Transformations: operations that produce a new RDD from an existing one. In the example, these are the RDD operations generated for the SQL query and the filters.</p></li><li><p>Actions: operations that trigger the transformations to run and return a result, such as a count or a sum, or display the data. In the example, these are the count() calls.</p></li></ul><p>Spark does not execute any transformations until it sees an action, because it is <em>lazy</em>. This makes sense: by the time an action is called, Spark knows everything you want to compute and can optimize the execution of all the transformations together. So the later you call an action, the better (i.e. the later you count your <em>chickens</em>, the better).</p><p>The above example can be rewritten as below to reduce the number of actions to one.</p><pre><code>counts = spark.sql("""
    select count_if(employee_type = 'manager') as mgr_count,
           count_if(employee_type = 'engineer') as engg_count,
           count_if(employee_type = 'sales_professional') as sales_count
    from employees
""").first()</code></pre><p>With the code above, the only action executed is the call to first(). The query gives Spark enough information to make a single pass over your data and calculate all the counts you need.</p>]]></content:encoded></item><item><title><![CDATA[I would have written a shorter letter, but I did not have the time.]]></title><description><![CDATA[The above quote is attributed to Blaise Pascal.]]></description><link>https://notesbydhiren.com/p/i-would-have-written-a-shorter-letter</link><guid isPermaLink="false">https://notesbydhiren.com/p/i-would-have-written-a-shorter-letter</guid><dc:creator><![CDATA[Dhiren Navani]]></dc:creator><pubDate>Sun, 23 May 2021 02:02:26 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!jV-v!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb103986d-64ff-4fe2-b783-62fa57c274f2_800x800.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Software is a fast-paced industry, so software needs to change frequently. More often than not, the change is a new feature, new functionality, or an experiment. 
Developers think of these things as additions rather than changes to the software, because it is convenient to think of them that way. This is consistent with recent research [1]. Compared to adding something, figuring out what to remove carries more cognitive load and requires the most precious commodity of all: time! That is the point of the title and of my adaptation below. </p><blockquote><p>We would have to manage less code, but we did not have the time.</p></blockquote><p>In addition, when there are no safeguards such as unit or integration tests, the urge to just add things makes the situation worse. This culture results in <em>Tech Debt</em>, which incurs the compound interest of poor readability and, in some cases, poor performance. </p><p>It is misleading to say that because the customer gets immediate benefits, delivering the code by any means is justified. Developer productivity has a direct impact on the rate of delivery over time. It is okay to make such tradeoffs when an organization is just starting out and initial speed is non-negotiable. However, as organizations mature, these factors can be the difference between long-term success and failure. </p><p>So the next time someone refactors or removes code, thank them!</p><p>References</p><ol><li><p>https://www.bloomberg.com/opinion/articles/2021-04-24/research-shows-why-simplifying-is-hard-and-complicating-is-easy</p></li><li><p>https://quoteinvestigator.com/2012/04/28/shorter-letter/</p></li></ol>]]></content:encoded></item><item><title><![CDATA[It is wrong to suppose that if you can’t measure it, you can’t manage it – a costly myth.]]></title><description><![CDATA[The title is a quote by W. Edwards Deming. 
This quote applies to software as much as it applies to other industries.]]></description><link>https://notesbydhiren.com/p/it-is-wrong-to-suppose-that-if-you</link><guid isPermaLink="false">https://notesbydhiren.com/p/it-is-wrong-to-suppose-that-if-you</guid><dc:creator><![CDATA[Dhiren Navani]]></dc:creator><pubDate>Mon, 19 Apr 2021 02:16:48 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!jV-v!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb103986d-64ff-4fe2-b783-62fa57c274f2_800x800.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Like most other industries, the software industry is very data-driven: we use telemetry &amp; Key Performance Indicators (KPIs) to understand our services and operations.</p><p>Software quality is one aspect that is difficult to measure. There are metrics such as test coverage, cyclomatic complexity &amp; the number of operational incidents to measure software quality. Although these metrics are necessary, they are definitely not sufficient. </p><p>The readability of variable &amp; method names is one area of quality that is difficult to measure. Similarly, the modularity of software can technically be measured with cohesion and coupling metrics, but these carry a risk of false positives. These aspects of quality involve an element of subjectivity or depend on specific business needs.</p><p>Despite the difficulty of measuring these aspects, I believe it is still possible to manage them with the right guidelines and code review processes. More importantly, they can be managed by developing a culture of clean code that is instilled in every member of the team. 
The solution has more to do with judgment and common sense than with identifying complex proxies for measuring readability and modularity.</p><p>References</p><ul><li><p>https://deming.org/myth-if-you-cant-measure-it-you-cant-manage-it/</p></li><li><p>https://martinfowler.com/bliki/CannotMeasureProductivity.html</p></li><li><p>https://hbr.org/2010/10/what-cant-be-measured</p></li></ul><p><em>If you found this post informative, kindly express your support by subscribing to this newsletter below.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://notesbydhiren.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://notesbydhiren.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[Welcome to Software Bytes]]></title><description><![CDATA[Sharing bite-sized, digestible quotes on software and exploring them.]]></description><link>https://notesbydhiren.com/p/coming-soon</link><guid isPermaLink="false">https://notesbydhiren.com/p/coming-soon</guid><dc:creator><![CDATA[Dhiren Navani]]></dc:creator><pubDate>Sat, 06 Mar 2021 21:34:30 GMT</pubDate><enclosure url="https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/8b68d47a-d811-4f63-937e-07f1bbc91c8d_3456x4608.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hi &amp; welcome. I am going on this journey to enrich your inbox and my mind with bite-sized, digestible software quotes and explore each one in a paragraph or two. The purpose is to improve our collective understanding of software.</p><p>I have realized that quotes are the easiest way to retain information. &#8220;<em>Righty tighty, lefty loosey</em>&#8221; is a great example. However, expressing ideas this concisely is not as easy. 
Out of appreciation and fairness, I will try to provide references and credit for all the quotes I mention here.</p><p>I hope you will encourage me by subscribing below and spreading the word.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://notesbydhiren.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://notesbydhiren.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item></channel></rss>