Jekyll feed · generated 2024-01-01T21:56:27+05:30 · https://sumit-ghosh.com/feed.xml
Sumit’s Space · Brain dump of things I find interesting. · Sumit Ghosh (sumit@sumit-ghosh.com)

Takeaways from My Indiehacker Experiment · 2023-10-02 · https://sumit-ghosh.com/posts/takeaways-from-my-indiehacker-experiment

<p>My <em>birdfind-app</em> repo got its first commit on Nov 24, 2022, and its last commit on Feb 2, 2023. In this post, I’ll put down some learnings from this experiment.</p>
<p>After Solvent Protocol got closed down, I wasn’t in a hurry to find my next job. I always like to take a few months’ break between jobs to collect myself and explore, and to give myself the freedom to think open-endedly about what I want to do next. This time I decided to be an indiehacker. If you don’t know what that means, here’s a definition ChatGPT gave me.</p>
<blockquote>
<p>An indiehacker is a self-employed entrepreneur who builds and operates an independent online business, often focusing on digital products. They embrace a transparent and knowledge-sharing community while striving for financial independence and the ability to work on projects they are passionate about.</p>
</blockquote>
<p>The product I was building was called Twips at first, but then I changed the name to Birdfind. The initial idea was that it would be a one-stop platform for finding people on Twitter—a search engine for Twitter users. You could filter the list of users using a multitude of criteria, e.g. <em>followed by</em>, <em>follower of</em>, <em>follower count</em>, etc. Later on I pivoted to a similar but slightly different product where you could discover Twitter users within a niche. It was two intense months of obsessive building, and I loved it. But like all things, it came to an end, the primary catalyst being Twitter’s new API policy.</p>
<p>Here are my takeaways from this two-month journey, as bullet points.</p>
<ul>
<li>As a programmer, it was effortless for me to keep building features and obsess about getting the engineering right. I did not think enough about the <em>business</em> part of the journey, about the users and their problems. I should have kept the feature set of the MVP very small, and focused on doing sales and acquiring customers after that. Because true insights only come from interacting with the customers, and I did not do enough of that.</li>
<li>When you’re building solo, there’s no one to bounce ideas off of or who will call you out on your bullshit. This <em>can</em> be a strength: you can move fast without any teamwork or communication overhead. But in most cases it ends up being a weakness, especially if it’s your first time building a business. VCs know this; that’s why they are wary of investing in solo founders, and they do have a point.</li>
<li>Being an indiehacker makes little financial sense. You will make a lot less money than at a full-time job for a considerable amount of time, probably years. Even if you’re itching to ditch the 9-5 and dive into building a business, financially it makes a lot more sense to go the VC funding route.</li>
<li>As a solopreneur, you’re forced to limit your product’s scope and pace of development. For example, with Birdfind, I shied away from competing with the big social media management apps because I knew I couldn’t build and maintain all those features by myself. I have noticed this trend across other indiehackers on Twitter too: all of their products are very small in scope, and in most cases that translates to losing out on potential clients or dropping good ideas just because they are larger in scope.</li>
<li>There are some high-quality educational resources about startups on the internet; it’s a good idea to go through them before jumping in. It doesn’t have to be thorough; just a week or so of learning is enough to get started. But it’s very important, because there are some classic mistakes and pitfalls that everyone is prone to make, and these resources help you avoid exactly those. They include Y Combinator’s videos on YouTube and the book <em>The Mom Test</em>.</li>
</ul>
<p>All that said, building something of your own is a very rewarding experience, especially once you have some momentum. It’s a lot of fun, and I think everyone should experience it at least once in their life. Working on Birdfind gave me a taste of the joy of building something, but it also made me more realistic about how to approach creating a startup and what to expect from it.</p>

Only God Forgives · 2022-06-04 · https://sumit-ghosh.com/posts/only-god-forgives

<p>Recently I rewatched Nicolas Winding Refn’s <em>Only God Forgives</em>. It’s a very divisive movie; the critics’ score at Rotten Tomatoes is around 50% lmao. But I found it to be quite intriguing.</p>
<p>First, a brief recap for those who haven’t seen the movie or have forgotten: Julian—played by Gosling—runs a boxing club in Bangkok as a front for his drug business. He and his mother Crystal get caught up in a turf war with the local police—especially an oddball sword-wielding detective—when Julian’s older brother rapes and kills a local prostitute. The cycle of violence quickly escalates and finally results in Crystal’s death and Julian’s hands getting chopped off.</p>
<p>At its core, it’s a deeply moralistic tale which hammers in: “An eye for an eye, a tooth for a tooth.” The name of the movie is a direct reference to this. Two characters even verbalize it at different points in the movie. The mobster who planned the hit on the cops says that he’s ready to face the consequences of what he did, and Crystal towards the end also accepts that she fucked up and is due for the consequences.</p>
<p>Only God forgives; on our mortal plane there’s no forgiveness for violence. Interestingly, the detective is almost an agent of a violently retributive God; he doesn’t have any real personality other than that.</p>
<blockquote>
<p>“Do you know who he is?”<br />
Julian nods.</p>
</blockquote>
<p>As if he knows what’s coming for him. Judgement, from God.</p>
<p>Is the detective an agent of God, or even God himself? Well, he doesn’t forgive, and apparently God forgives. This aspect of the detective’s character seemed badly fleshed out to me; there are mixed signals about his true nature. In the movie’s defense, we do get a brief scene where he’s shown to be human: when he meets his daughter at home, and it’s implied that he’s a single parent who’s affectionate towards his child.</p>
<p>Julian, on the other hand, is a mortal man who’s confused as to what he should become. He looks at his hands at several points in the movie (once he washes them and sees blood), and he keeps dreaming of the detective coming one day and chopping his hands off. We later learn that he’s a drug dealer, so he knows that he’s a sinner and has to pay for it one day. But unlike his brother he has a conscience, so by the detective’s sense of morality he deserves only amputation, not death, which makes sense.</p>
<p>The movie also has lots of references to sexual trauma and incest. It’s overtly implied that Crystal has an incestuous relationship with her sons, and that she made Julian kill his father. All of this has left Julian sexually broken, as is apparent from his fetishes. When his mother dies, he makes a cut in her belly and inserts his hand into it, which is almost sexual in a grotesque way. This sexual theme of the movie seemed heavy-handed to me.</p>
<p>Although the movie has its faults regarding plot and theme, it’s held up by its pacing, atmosphere and stylistic elements; as a result the final package is gripping and entertaining. The most interesting character is that of the detective, and his presence elevates the movie to a spiritual level.</p>
<p>Overall, I’ll rate this a 7 out of 10.</p>

Output Oriented Learning · 2022-06-03 · https://sumit-ghosh.com/posts/output-oriented-learning

<p>I think learning new things is one of the most satisfying and dopamine-inducing experiences for me; many of my brightest memories are around learning. But the learning has to be of a specific kind; obviously it’s not like I enjoyed all of my college courses. So what gives?</p>
<p>I wrote <a href="https://okrefusal.com/posts/owner-vs-victim-an-epiphany/">a post</a> about this about 4 years back which goes into the ownership aspect of this phenomenon. But I recently realized there’s also another aspect, the output of the said learning.</p>
<p>If the goal of the learning is mere input, then it’s much less likely that I’ll stick with it. But when the goal of the learning is some output, then I’m much more likely to stick with it, and moreover get passionate and excited about it.</p>
<p>As a concrete example, I’ve always wanted to learn a new programming language and get good at it. I even started with Golang, but didn’t go far and just got distracted in a few days. But recently, when I had to learn Rust and Solana development for an interview assignment, I did an 18-hour coding marathon learning it all from scratch. And this pattern has emerged in the past too: whenever I was very energetic about learning something, the learning was a means to some other end.</p>
<p>I suspect this is true not just for me, but for many others, possibly everyone. So if I were to make it a general rule, it would go like this:</p>
<p>Learning cannot be an end in itself. It has to be a means to an output. Whenever we pick up a project, the end goal cannot be the input we’ll end up with (the knowledge); the goal should be some output: creating, teaching, writing, etc.</p>
<p>Of course the nature of the output matters too, but that will vary from person to person. Different kinds of output appeal to different people at different stages of their life. The output can be anything: earning money, teaching others, creating products, solving some societal issue, contributing to research, etc. Even if you’re motivating and influencing others by putting yourself out there, you’re making an impact on the world, you’re producing some output. The worst is when you’re just learning by yourself, isolated.</p>

Anthem by Ayn Rand · 2021-08-24 · https://sumit-ghosh.com/posts/anthem-by-ayn-rand

<p>Ayn Rand is intriguing to me just because of how polarising she is. Finding out that George Hotz is a big fan of Rand was a bit disappointing, but it made me reconsider reading her works just to see what the fuss is about. And I did just that today: read a very short novel of hers named <em>Anthem</em>.</p>
<p>I think Hotz <a href="https://www.youtube.com/watch?v=_L3gNaAVjQ4&t=10294s">described</a> her works best when he called them pornographic, and understandably there could be an allure in that. Most of the book paints a cartoonish version of dystopian communism that bears a lot of similarity to Orwell’s 1984, except 1984 has extensive worldbuilding which fleshes out the power of censorship in a totalitarian regime, while Anthem lacks any such redeeming qualities. I was cringing through most of the book except the last two chapters. Those chapters are very clearly just Rand speaking out her philosophy of individualism, and one can see that the novel is mostly a thin, badly-written cover around these last few chapters. It’s just that I would prefer not having to unwrap all that cover just to get to them.</p>
<p>Rand’s philosophy has a strong tone of existentialism, especially the Nietzschean kind. The following paragraphs from chapter 11 of the book give a taste:</p>
<blockquote>
<p>I stand here on the summit of the mountain. I lift my head and I spread my arms. This, my body and spirit, this is the end of the quest. I wished to know the meaning of things. I am the meaning. I wished to find a warrant for being. I need no warrant for being, and no word of sanction upon my being. I am the warrant and the sanction.</p>
<p>It is my mind which thinks, and the judgement of my mind is the only searchlight that can find the truth. It is my will which chooses, and the choice of my will is the only edict I must respect.</p>
<p>Many words have been granted me, and some are wise, and some are false, but only three are holy: “I will it!”</p>
</blockquote>
<p>Not gonna lie, I dig it. Although this isn’t anything too new, just the typical enlightenment philosophy of individualism, along with existentialism and <em>will to power</em>.</p>
<p>Some of the following stuff could be controversial, understandably.</p>
<blockquote>
<p>I do not surrender my treasures, nor do I share them. The fortune of my spirit is not to be blown into coins of brass and flung to the winds as alms for the poor of the spirit. I guard my treasures: my thought, my will, my freedom. And the greatest of these is freedom.</p>
</blockquote>
<p>It’s interesting, though, that Rand doesn’t mention any material possessions when she lists out the treasures she won’t share. But she did say that “the fortune of my spirit is not to be blown into coins of brass and flung to the winds as alms for the poor of the spirit”. An obvious reading of this would be a cry against things like taxation. But a more interesting reading would be a statement against market capitalism: how one should not sacrifice their thought, will and freedom to capitalistic ends, how one should not blow their spirit away into coins of brass. But I need to read more Rand to confirm this; I’m most probably just projecting here.</p>
<p>Maybe I will read more Rand, or maybe not.</p>

Schopenhauer: a Positive Conception of Suffering · 2021-03-18 · https://sumit-ghosh.com/posts/schopenhauer-positive-conception-of-suffering

<p>Schopenhauer is infamous for his morbid pessimism, but upon reading him I found him to be more than that. His core philosophy describes a realism, and although he certainly sometimes interprets it pessimistically, at other times he leaves it as is, without imposing any explicit value judgment on it. I recently decided that I’ll write very short posts about anything and everything, and increase the regularity and quantity of my writing rather than worrying about the novelty and quality of it. This post is about the Schopenhauerian <em>positive</em> conception of suffering. As will be evident, I’m writing mostly to clarify my own understanding, but if it does end up helping or traumatizing others, all the better! The primary source of this post is his essay “On the Suffering of the World”.</p>
<p>At this point I should note that I’m using terms such as pessimism and realism quite loosely; I’m referring to the common dictionary definitions rather than the philosophical schools of thought. In case I do refer to the philosophical schools, I’ll make sure to make it explicit.</p>
<p>With the opening sentence of the essay Schopenhauer sets the tone, “If the immediate and direct purpose of our life is not suffering then our existence is the most ill-adapted to its purpose in the world”. Upon witnessing the amount of suffering individuals and humanity as a whole goes through, it simply would be absurd if this were not the direct “purpose” of our life. Of course, “purpose” here does not signify some transcendental objective of our life. It simply states the fact that suffering is the most fundamental reality we face—“Each individual misfortune, to be sure, seems an exceptional occurrence; but misfortune in general is the rule”—thus it must be inherent in the nature of life itself.</p>
<p>Phenomenologically, whatever we take notice of thwarts our <em>will</em> in some way; when things go our way we simply don’t notice it. This can be observed at various levels of individual and social reality. Considering pure sensations, we are not conscious of our whole body but only of where it’s uncomfortable, “where the shoe pinches”. In the case of interpersonal relationships as well as socio-political developments, conflicts are much more <em>real</em> to us and they leave a far greater impression than the easy times. “History shows us the life of nations and finds nothing to narrate but wars and tumults; the peaceful years appear only as occasional brief pauses and interludes”. Childhood traumas leave such an impact that they continue to haunt us for the rest of our lives, whereas an easy childhood is just a mere backdrop to the further life that lies ahead. Even if we disregard trauma as an outlier, in general, “we think not of the totality of our successful activities but of some insignificant trifle or other which continues to vex us”. If we keep thinking about this, we can come up with lots and lots of examples.</p>
<p>Schopenhauer was gripped by this realization; he concluded that evil, pain, and suffering are the <em>positive</em> aspects of existence, only they exist of themselves; happiness, gratification and all goodness can only be defined <em>negatively</em>, as the absence of the former. This realization is not something very new; like with almost everything in philosophy, the ancient Greeks did it first. In particular, Epicurean philosophy is founded on this purely <em>negative</em> notion of happiness as the absence of pain. But Schopenhauer went one step further and created a system of metaphysics founded upon this realization, but more on that later in a separate post.</p>
<p>Moreover, men <em>require</em> this constant pressure of wants, desires, frustrations and sufferings; otherwise they would simply give in to “the most unbridled folly, indeed madness”. In a Utopia where suffering doesn’t exist and every desire is satisfied as soon as it arises, men would either kill themselves or deliberately create conflict and suffering, because suffering—when thought of as a thwarting of our will and the subsequent overcoming thereof—<em>is</em> the fundamental drive for life, the “purpose” of our lives.</p>

Machine Learning Model Management in 2020 and Beyond · 2020-11-23 · https://sumit-ghosh.com/posts/machine-learning-model-management-in-2020-and-beyond

<p><a href="https://analyticsindiamag.com/machine-learning-model-management-2020/">Model management</a> is a relatively new issue when it comes to machine learning. As the technique is now widely used in business, managing multiple experiments and optimizing dozens of parameters has become the bread and butter of data scientists around the world, and tools supporting model management have emerged on the market. The rise of open-source and relatively intuitive ML frameworks such as PyTorch and TensorFlow has lowered the entry barrier for ML development.</p>
<p>In the coming years, machine learning will only extend its influence and keep penetrating new market segments. Organizations that don’t necessarily have the expertise to build models themselves will be forced to use models made by others.</p>
<p>Deploying and using ML models at such a large, enterprise scope involves a lot of moving parts. Doing it ad hoc, without any solid framework and pipeline in place, can make ML development unwieldy and counterproductive. This is where machine learning model management comes into the picture.</p>
<h2 id="shortcomings-of-ad-hoc-machine-learning-model-development">Shortcomings of ad-hoc Machine Learning model development</h2>
<p>Doing ML model development without a management framework gets very complicated. To name just a few challenges:</p>
<ul>
<li><strong>No record of experiments</strong>: In an organization, multiple colleagues are likely to be working on the same problem, and they might be running their own set of experiments. In such a scenario, making sure that duplicate work is not being done is essential but hard to implement. For example, colleague A might want to try something that colleague B has already tried; with no record of it, A has no way of knowing that and has to run the experiment again. This should be avoided.</li>
<li><strong>Insights lost along the way</strong>: In a rapid iterative experimentation phase, an insight generated in an earlier experiment might be lost when the researcher moves on to the next iterations of that model. A solution would be to keep detailed notes manually, but no one does that just because it’s manual and requires a ton of effort.</li>
<li><strong>Difficult to reproduce results</strong>: Reproducing a particular experiment becomes problematic as that would require storing the model code along with the hyperparameters and the dataset manually, for each and every iteration of the model. This requires a ton of effort, so in practice, no one does it.</li>
<li><strong>Cannot search for or query models</strong>: Once a lot of experiments have been done, querying the past models would be a vital source of information and insights, but without a solid version control and metadata tracking system, this becomes nearly impossible.</li>
<li><strong>Difficult to collaborate</strong>: Once a candidate model has been developed and is up for review, how would the reviewer effectively review the model? This problem becomes progressively more challenging as the team size grows. Issue trackers, JIRA boards, pull requests are useful tools designed to solve this problem in the realm of software development, but we need something analogous to that for ML model development too.</li>
</ul>
<h2 id="managing-the-lifecycle-of-a-machine-learning-model-with-mlops">Managing the lifecycle of a Machine Learning model with MLops</h2>
<p>Machine learning model management can be thought of as part of a broader framework called MLOps. So before we go into the details of model management, let’s look at MLOps as a whole.</p>
<blockquote>
<p><em>MLOps can be thought of as a collection of principles, practices, and technologies that help to increase the efficiency of machine learning workflows.</em></p>
</blockquote>
<p>It was inspired by DevOps: just as DevOps tries to make the software lifecycle from development to deployment as efficient as possible, MLOps makes the machine learning pipeline—from experimentation to deployment—faster and smoother.</p>
<p>The <strong>MLOps pipeline can be broadly divided into the following three parts,</strong> each optimizing a particular aspect of ML model development.</p>
<ul>
<li><strong>Model development and management:</strong> This deals with methodologies for faster experimentation and development of models.</li>
<li><strong>Model deployment</strong>: This includes a few subtasks.
<ul>
<li>Validation and quality assurance of models that are about to be deployed, followed by automated, fast deployment of models into production.</li>
<li>Deployment of new versions of models, i.e., seamless over-the-air upgrades. This step can be thought of as analogous to the continuous integration and continuous deployment of DevOps.</li>
</ul>
</li>
<li><strong>Model monitoring</strong>: Monitoring the usage and performance of models that are running in production. Systems that alert when <a href="https://neptune.ai/blog/concept-drift-best-practices">concept drift</a> occurs are common.</li>
</ul>
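<p>As a hypothetical sketch of the monitoring step (the class name, window size, and tolerance below are illustrative choices, not from any specific tool), a minimal concept-drift alert can compare a deployed model’s rolling accuracy against its offline validation baseline:</p>

```python
from collections import deque

class DriftMonitor:
    """Alert when the rolling accuracy of a deployed model drops
    well below its offline validation baseline, a crude proxy for
    concept drift."""

    def __init__(self, baseline_acc, window=100, tolerance=0.10):
        self.baseline = baseline_acc
        self.tolerance = tolerance
        self.recent = deque(maxlen=window)  # rolling window of outcomes

    def record(self, correct: bool) -> bool:
        """Record one prediction outcome; return True if drift is suspected."""
        self.recent.append(1 if correct else 0)
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough data yet
        rolling = sum(self.recent) / len(self.recent)
        return rolling < self.baseline - self.tolerance

# Example: model validated at 90% accuracy, alert if rolling accuracy < 80%
monitor = DriftMonitor(baseline_acc=0.90, window=50)
```

<p>Production monitoring systems do considerably more (distribution tests on input features, latency and resource tracking), but they follow the same compare-against-baseline pattern.</p>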
<p>Although these principles are similar to their software engineering or data management counterparts, <strong>ML models are fundamentally different from code or data</strong>, so they should be treated differently. Software and data management tooling doesn’t work as effectively on models, so we have to <a href="https://medium.com/@ODSC/what-are-mlops-and-why-does-it-matter-8cff060d4067">rethink MLOps on its own.</a></p>
<p>In this article, we’re going to focus mostly on the first part of the MLOps pipeline: model management.</p>
<h2 id="what-is-machine-learning-model-management">What is Machine Learning model management?</h2>
<p>Now that we have a bird’s-eye view of where model management sits within MLOps, we can look into the specific practices that go into it.</p>
<p>While developing models, researchers carry out lots of experiments rapidly. By experiments I mean trying out different model architectures, tuning their hyperparameters, and then seeing their performance by training and validating. Even if a single researcher is doing these experiments independently, <strong>keeping track of all the experiments and their results can become hard.</strong></p>
<p>This <strong>challenge grows when multiple researchers are working on the same problem</strong> simultaneously. As these experiments tend to be very rapid and somewhat chaotic, traditional software management tools such as git and Kanban boards fall short when it comes to keeping researchers in sync during the experimentation phase.</p>
<p>Solving this challenge is the core premise of ML model management. And the <strong>solution turns out to be something quite simple but tremendously powerful: logging everything.</strong></p>
<p>By logging the parameters pertaining to every experiment, dashboards can be generated at a central place where everyone can keep track of the different models. In addition, there should also be a way to keep track of all the versions of the models that are developed, so that they can be reproduced later easily.</p>
<p>This is the central idea behind ML model management, and in practice, these techniques end up multiplying the productivity of model developers.</p>
<p>So, to summarize, ML model management consists of the following three ideas:</p>
<ul>
<li><a href="https://docs.neptune.ai/logging-and-managing-experiment-results/index.html">Logging</a></li>
<li><a href="https://neptune.ai/features/notebook-versioning">Version Control</a></li>
<li>Dashboard</li>
</ul>
<p>Now let’s look at them one-by-one more closely.</p>
<h2 id="logging">Logging</h2>
<p>In an ML model development workflow, the fundamental unit of operation is an experiment. An experiment can be thought of as a single train-and-validate cycle of a particular model. Logging every experiment is the first step towards having a birds-eye view of the performance of the models.</p>
<p>For each experiment, the following parameters might be logged:</p>
<ul>
<li>The model name, along with a version number.</li>
<li>Set of hyperparameters used.</li>
<li>Training accuracy at each iteration of the training process.</li>
<li>Final test accuracy, or better, the confusion matrix.</li>
<li>Training time, memory consumed, etc.</li>
<li>The model binary.</li>
<li>Code and environment configs.</li>
</ul>
<p><img src="/images/posts/model-management-logging.jpg" alt="Logging" /></p>
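<p>As a minimal sketch of the idea (the file name and record fields here are illustrative assumptions, not any particular tool’s format), experiment logging can be as simple as appending structured records to a shared, append-only log:</p>

```python
import json
import time
from pathlib import Path

LOG_FILE = Path("experiments.jsonl")  # shared, append-only experiment log

def log_experiment(model_name, version, hyperparams, metrics):
    """Append one experiment record as a JSON line."""
    record = {
        "model": model_name,
        "version": version,
        "hyperparams": hyperparams,
        "metrics": metrics,  # e.g. per-epoch train accuracy, final test accuracy
        "timestamp": time.time(),
    }
    with LOG_FILE.open("a") as f:
        f.write(json.dumps(record) + "\n")
    return record

# Example usage after one train-and-validate cycle
log_experiment(
    model_name="sentiment-classifier",
    version="1.2.0",
    hyperparams={"lr": 3e-4, "batch_size": 32},
    metrics={"train_acc": [0.81, 0.88, 0.91], "test_acc": 0.89},
)
```

<p>Tracking tools like MLflow or Neptune provide this same core idea behind richer APIs, along with storage for model binaries and environment configs.</p>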
<h2 id="version-control">Version Control</h2>
<p>Just like software, machine learning models get built incrementally. After <strong>each incremental change, the model might perform differently than before.</strong></p>
<p>So it is of utmost importance to track the versions of the models in use, and to make sure that the logged metadata and performance metrics are tagged with the name and version of the model that produced them. This is where version control comes in.</p>
<p>Version control also helps us reproduce a model later easily, as all the information pertaining to the model is stored in the VCS.</p>
<p>As git is the industry-standard version control system for software development, many ML model management frameworks provide a VCS that’s git-like, in terms of architecture and user interface.</p>
<p><img src="/images/posts/model-management-version-control.jpg" alt="Version Control" /></p>
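<p>One simple way to get reliable version tags is to derive them from the model’s code and config, so the same inputs always map to the same version. The sketch below illustrates the idea in plain Python; it’s a conceptual example, not how any particular framework implements versioning:</p>

```python
import hashlib
import json

def model_fingerprint(code: str, config: dict) -> str:
    """Derive a stable version tag from a model's code and config.

    Any change to either input yields a new tag, so logged metrics can
    always be tied back to the exact artifact that produced them.
    """
    payload = code + json.dumps(config, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()[:12]

v1 = model_fingerprint("def train(): ...", {"lr": 0.01})
v2 = model_fingerprint("def train(): ...", {"lr": 0.001})
print(v1 != v2)  # True -- a changed config means a new version
```

<p>This is essentially what git does for code; git-like model VCSs extend the same content-addressing idea to data, config, and environment.</p>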
<h2 id="dashboard">Dashboard</h2>
<p>Once all the information and metadata pertaining to the experiments has been collected, a central dashboard is used to</p>
<ul>
<li>Visualize the metrics related to the models</li>
<li>Query all the models and experiments</li>
<li>Share results with collaborators and review the models</li>
</ul>
<p>Having an intuitive and powerful dashboard can unlock tremendous potential for collaboration and insight gathering; in a sense generating a powerful and rich dashboard is the final goal of model development.</p>
<p>It also helps if the dashboard has user management, authentication, and authorization features built into it, especially for large organizations. That way, security gets reinforced, and collaboration becomes much more manageable.</p>
<p><img src="/images/posts/model-management-dashboard.jpg" alt="Dashboard" /></p>
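<p>At its core, the querying a dashboard does boils down to filtering and sorting over logged run records. A toy illustration in plain Python with hypothetical inline records (a real dashboard runs such queries against a database):</p>

```python
# Hypothetical run records, as a dashboard backend might store them.
runs = [
    {"model": "resnet", "version": "1.0", "lr": 0.01,  "test_acc": 0.91},
    {"model": "resnet", "version": "1.1", "lr": 0.001, "test_acc": 0.93},
    {"model": "vgg",    "version": "1.0", "lr": 0.01,  "test_acc": 0.89},
]

def best_run(records, metric="test_acc"):
    """Return the run with the highest value for the given metric."""
    return max(records, key=lambda r: r[metric])

def filter_runs(records, **criteria):
    """Return the runs matching every field=value criterion."""
    return [r for r in records
            if all(r.get(k) == v for k, v in criteria.items())]

print(best_run(runs)["version"])               # 1.1
print(len(filter_runs(runs, model="resnet")))  # 2
```
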
<h2 id="machine-learning-model-management-frameworks">Machine Learning model management frameworks</h2>
<p>Now that we understand what machine learning model management is, let’s look at some popular model management frameworks. Each has its own idiosyncrasies, strengths, and weaknesses, but they do follow a pattern as well.</p>
<h3 id="modeldb"><a href="https://github.com/VertaAI/modeldb"><strong>ModelDB</strong></a></h3>
<p>The ModelDB framework is one of the pioneers of making machine learning model management easy and intuitive. The Github repo of ModelDB describes itself as</p>
<blockquote>
<p>ModelDB is an open-source system to version machine learning models including their ingredients code, data, config, and environment and to track ML metadata across the model lifecycle.</p>
</blockquote>
<p>In essence, it’s a complete system that provides all the primary components of ML model management: logging, version control, and a central dashboard. It consists of a backend built to run as Docker containers, a pluggable storage system, and integrations with major ML frameworks such as Tensorflow, Pytorch, and scikit-learn. You can use ModelDB to</p>
<ul>
<li>Make your ML models <strong>reproducible</strong>.</li>
<li>Manage your ML experiments, <strong>build performance dashboards, and share reports.</strong></li>
<li><strong>Track models across their lifecycle</strong> including development, deployment, and live monitoring.</li>
</ul>
<p>ModelDB started off as a research project by the CSAIL research lab at MIT. Since then, it has been adopted by VertaAI, and currently it lives <a href="https://github.com/VertaAI/modeldb">here</a>. The latest version of ModelDB brought many features that made it enterprise-ready, including git-based model versioning and authentication features for its central dashboard; the announcement can be found <a href="https://blog.verta.ai/blog/modeldb-2.0-is-here">here</a>.</p>
<p><img src="/images/posts/model-management-modeldb.png" alt="ModelDB" /></p>
<h3 id="mlflow"><a href="https://mlflow.org/"><strong>MLflow</strong></a></h3>
<p>MLflow is another big, comprehensive ML model management framework. It was created by Databricks, the company behind Apache Spark, and their experience with large-scale data tooling shows in this giant but agile framework. From its Github repo</p>
<blockquote>
<p>MLflow is a platform to streamline machine learning development, including tracking experiments, packaging code into reproducible runs, and sharing and deploying models.</p>
</blockquote>
<p>As you can see, MLflow also provides features for deploying ML models, making this more of a full-fledged MLOps framework. MLflow consists of the following components.</p>
<ul>
<li>
<p><strong>MLflow Tracking</strong>: An API to log parameters, code, and results in machine learning experiments and compare them using an interactive UI. Essentially, this component covers the most fundamental <em>logging</em> part of the ML model management workflow.</p>
</li>
<li>
<p><strong>MLflow Projects:</strong> A code packaging format for reproducible runs using Conda and Docker, so you can share your ML code with others. This incorporates the <em>version control</em> part of the model management workflow.</p>
</li>
<li>
<p><strong>MLflow Models:</strong> A model packaging format and tools that let you easily deploy the same model (from any ML library) to batch and real-time scoring on platforms such as Docker, Apache Spark, Azure ML, and AWS SageMaker.</p>
</li>
<li>
<p><strong>MLflow Model Registry:</strong> A centralized model store, set of APIs, and UI, to collaboratively manage the full lifecycle of MLflow Models. This component, along with the previous one, makes MLflow a fully-fledged MLOps framework, rather than just a model management framework.</p>
<p><img src="/images/posts/model-management-mlflow.png" alt="MLFlow" /></p>
</li>
</ul>
<h3 id="neptune"><a href="https://neptune.ai/"><strong>Neptune</strong></a></h3>
<p>Neptune is a machine learning experiment management tool which focuses on being lightweight and easy to integrate. It consists of two components,</p>
<ul>
<li>The server, which you can use as a service or install on your own hardware.</li>
<li>The client libraries.</li>
</ul>
<p>Like I said, the strength of Neptune is in its ease of integration with all kinds of workflows. Using Neptune, team members can use vastly different ML libraries and platforms, and yet share their results and collaborate on a single dashboard.</p>
<p>Moreover, you can use their software-as-a-service offering, skipping the need to deploy it on your own hardware; and that way integrating Neptune into your workflow becomes even easier.</p>
<p>Neptune offers the following features</p>
<ul>
<li><strong>Experiment management</strong>: Keep track of all the experiments your team carries out; tag, filter, group, sort and compare your experiments.</li>
<li><strong>Notebook versioning and diffing</strong>: Compare two notebooks or two different checkpoints of the same notebook. You can even compare their output side-by-side just like source code.</li>
<li><strong>Team collaboration</strong>: Adding comments, mentioning teammates, comparing results and discovering insights through discussions, it’s all possible.</li>
</ul>
<p>As you can see, it’s a fully fledged machine learning model management framework; the ease of use doesn’t come at the cost of functionality and power. If you’re looking for an easy entry into the world of machine learning model management, Neptune could be perfect.</p>
<h3 id="azure-machine-learning"><strong><a href="https://azure.microsoft.com/en-us/services/machine-learning/">Azure Machine Learning</a></strong></h3>
<p>Azure Machine Learning is the cloud MLOps platform offered by Microsoft. As it’s a <strong>complete MLOps system</strong>, it offers services to manage and automate the whole ML lifecycle, i.e., model management, deployment, and monitoring. It provides the following MLOps capabilities.</p>
<ul>
<li>Create <strong>reproducible ML pipelines.</strong></li>
<li>Create <strong>reusable software environments</strong> for training and deploying models.</li>
<li><strong>Register, package, and deploy models</strong> from anywhere.</li>
<li><strong>Handle data governance</strong> for the end-to-end ML lifecycle.</li>
<li><strong>Notify and alert on events</strong> in the ML lifecycle.</li>
<li><strong>Monitor ML applications</strong> for operational and ML-related issues.</li>
<li>Automate the end-to-end ML lifecycle with Azure Machine Learning and Azure Pipelines.</li>
</ul>
<p>If you’re considering making your whole ML infrastructure cloud-based, or if you’re already on the cloud bandwagon, Azure Machine Learning might be an excellent platform to consider. It offers all the state-of-the-art MLOps workflows, and you won’t have to manage a single piece of hardware by yourself.</p>
<p><img src="/images/posts/model-management-azure.png" alt="Azure" /></p>
<h2 id="final-remarks">Final remarks</h2>
<p>Machine learning model management frameworks make your ML workflow much smoother and let your team collaborate and share their insights, in turn increasing their efficiency and productivity by a great margin. You should incorporate an ML model management framework into your arsenal; it’s an essential component of using ML effectively in 2020 and beyond.</p>
<p>If you’re new to model management, I would suggest you <strong>start with a framework that’s lightweight enough to not add any friction to the workflow and codebase you already have</strong> in place. Once you understand how it suits you, and how you can use it to be more productive, you can incorporate it more deeply into your workflow. Neptune is such a framework: it’s lightweight enough to not get in your way, but it’s also powerful if you use it to its full potential.</p>Sumit Ghoshsumit@sumit-ghosh.comModel management is a relatively new issue when it comes to Machine Learning. Since the technique is widely used in business, the need to manage multiple experiments and optimize dozens of parameters has become the bread and butter of data scientists around the world. And thus, tools supporting model management have emerged on the market. The rise of open-source and relatively intuitive ML frameworks such as Pytorch and Tensorflow has lowered the entry barrier for ML development.Creating a VM using Libvirt, Cloud Image and Cloud-Init2020-09-11T00:00:00+05:302020-09-11T00:00:00+05:30https://sumit-ghosh.com/posts/create-vm-using-libvirt-cloud-images-cloud-init<p>In this post, we’re going to see how we can create virtual machines in seconds, the way it’s done in modern cloud infrastructures such as EC2 and Digital Ocean. This is kind of a continuation of <a href="/articles/virtualization-hypervisors-explaining-qemu-kvm-libvirt/">my previous post</a>, where I explain the basics of virtualization and create a virtual machine in the old-school way. So let’s dive right into it!</p>
<p>The old-school, traditional way of creating virtual machines goes like the following.</p>
<ol>
<li>Create a VM from scratch by allocating a newly created blank disk image to it.</li>
<li>Install an OS on the VM, either from an ISO or via PXE network boot.</li>
</ol>
<p>In the second step, the OS installation wizard usually needs many manual inputs, which is not ideal for automation purposes. This is where automated installation tools like <a href="https://en.wikipedia.org/wiki/Preseed">Preseed</a> for Debian and derivatives, and <a href="https://en.wikipedia.org/wiki/Kickstart_(Linux)">Kickstart</a> for RHEL and <a href="https://help.ubuntu.com/community/KickstartCompatibility">Ubuntu</a>, come into play. A sysadmin can pre-generate a file containing all the required configuration options, and these tools can read that file at install time, skipping the need for manual inputs. In this way, the whole VM creation process can be automated.</p>
<p>But still, the installation takes quite some time, which is unacceptable in the age of cloud computing. If you’ve used cloud platforms such as EC2 and Digital Ocean, you know how fast they can provision a VM. You just select an OS, the disk size, number of CPUs, RAM, and provide an SSH key—and you get a VM up and running in less than a minute. We’re gonna see how they do that, and how we can do the same in our machine.</p>
<h2 id="how-it-works">How It Works</h2>
<p>In the old-school way, we created a blank disk image, and installed the OS in that. The installation step is the bottleneck, but there’s a straightforward way to skip that step entirely: import a disk that already has the OS installed in it. This kind of disk image is called a <em>cloud image</em>. You can understand what a cloud image is by looking at how one is usually created:</p>
<ol>
<li>Create a VM with a blank disk image.</li>
<li>Install a minimal version of the OS in that VM.</li>
<li>Create a snapshot of the disk just after the installation.</li>
</ol>
<p>So, a <em>cloud image</em> is different than a normal <em>installation image</em>. Most OS vendors provide both an installation image and a cloud image of their OS; you can find the Ubuntu cloud images <a href="http://cloud-images.ubuntu.com/">here</a>. As you can imagine, once we have a cloud image, we can just <em>import</em> that and create a new VM using that image as the root disk, skipping the need for installation as the OS is already installed in that image.</p>
<h3 id="cloud-init">Cloud-Init</h3>
<p>Creating a VM using a cloud image is cool, but there’s a problem with this approach: we don’t get any chance to customize the installation. All the configuration that’s usually done at install time (setting the hostname, username, and password, importing SSH keys) is skipped entirely. Kickstart used to take care of this in the old paradigm; can we use Kickstart here too? Well, no. Kickstart comes into action at install time, as a feature of the installation wizard; since we skipped the installation process entirely, we can’t use it.</p>
<p>Canonical—the parent company behind Ubuntu—developed a solution to this problem called <a href="https://cloud-init.io/">cloud-init</a>. Their official website pitches cloud-init like the following</p>
<blockquote>
<p>Cloud images are operating system templates and every instance starts out as an identical clone of every other instance. It is the user data that gives every cloud instance its personality and cloud-init is the tool that applies user data to your instances automatically.</p>
<h4 id="use-cloud-init-to-configure">Use cloud-init to configure:</h4>
<ul>
<li>Setting a default locale</li>
<li>Setting the hostname</li>
<li>Generating and setting up SSH private keys</li>
<li>Setting up ephemeral mount points</li>
</ul>
</blockquote>
<p>As we can see, this is exactly what we want. Now let’s understand how cloud-init works.</p>
<p>Cloud-init is a program that runs after every boot. What it does is the following.</p>
<ol>
<li>Fetch two configuration files, <code class="language-plaintext highlighter-rouge">user-data</code> and <code class="language-plaintext highlighter-rouge">meta-data</code>, from some pre-defined locations called <a href="https://cloudinit.readthedocs.io/en/latest/topics/datasources.html">datasources</a>. The most common datasources are the following:
<ul>
<li>Hosting the files on an HTTP endpoint and hardcoding the endpoint URL into the cloud image. Most public cloud providers use some variation of this method.</li>
<li>Attaching an ISO disk named <code class="language-plaintext highlighter-rouge">cidata</code> to the VM, containing the two config files. Cloud-init will look for a disk with the volume label <code class="language-plaintext highlighter-rouge">cidata</code>, and if it finds one, it’ll fetch the files from that disk. This datasource is called <a href="https://cloudinit.readthedocs.io/en/latest/topics/datasources/nocloud.html">NoCloud</a>, and we’re going to go with this one as it’s a lot simpler and suitable for a smaller-scale homelab.</li>
</ul>
</li>
<li>Set the configuration options according to the two aforementioned configuration files.</li>
</ol>
<p>As we discussed, cloud-init needs two config files.</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">meta-data</code> contains some basic metadata about the system, such as hostname and instance id.</li>
<li><code class="language-plaintext highlighter-rouge">user-data</code> is much more flexible; it can be anything ranging from a YAML config file to a bash script. You can find all the valid <code class="language-plaintext highlighter-rouge">user-data</code> formats <a href="https://cloudinit.readthedocs.io/en/latest/topics/format.html">here</a>.</li>
</ul>
<p>Now that we have a basic idea of cloud-init and how it works, let’s get our hands dirty and create a VM using cloud image and cloud-init.</p>
<h2 id="creating-a-vm-using-cloud-image-and-cloud-init">Creating a VM Using Cloud Image and Cloud-Init</h2>
<h3 id="prerequisites">Prerequisites</h3>
<p>First of all, we need to install the virtualization essentials: the complete Libvirt distribution with all relevant dependencies and utilities, and <code class="language-plaintext highlighter-rouge">virtinst</code>.</p>
<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span><span class="nb">sudo </span>apt <span class="nb">install </span>libvirt-daemon-system virtinst genisoimage
</code></pre></div></div>
<p>The <code class="language-plaintext highlighter-rouge">libvirt-daemon-system</code> package contains the Libvirt C libraries, the Libvirt daemon, CLI utilities to interact with the daemon, and QEMU-KVM as the default hypervisor. The <code class="language-plaintext highlighter-rouge">virtinst</code> package contains additional helper programs such as <code class="language-plaintext highlighter-rouge">virt-install</code> and <code class="language-plaintext highlighter-rouge">virt-clone</code>; tools like <code class="language-plaintext highlighter-rouge">virt-viewer</code> and <code class="language-plaintext highlighter-rouge">genisoimage</code> come from packages of the same names. If you’re unfamiliar with what these are and how they interact, check out <a href="/articles/virtualization-hypervisors-explaining-qemu-kvm-libvirt/">my previous post</a>.</p>
<p>Next, we need to download the cloud image we want. I’m going to go with the <a href="http://cloud-images.ubuntu.com/focal/current/focal-server-cloudimg-amd64.img">latest cloud image of Ubuntu</a>. You can get all the Ubuntu cloud images <a href="http://cloud-images.ubuntu.com/">here</a>. A quick tip: in the following command, just replacing <code class="language-plaintext highlighter-rouge">focal</code> with the distribution name you want should also work.</p>
<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span>wget http://cloud-images.ubuntu.com/focal/current/focal-server-cloudimg-amd64.img
</code></pre></div></div>
<h3 id="creating-a-disk-image">Creating a Disk Image</h3>
<p>Now that we have a cloud image—which is almost like a template we want to build our VM’s disk image from—the next step is creating the disk image itself. Before we do that, we need to understand some of the optimizations the QCOW image format provides.</p>
<ul>
<li>
<p>QCOW uses a disk storage optimization strategy that delays allocation of storage until it is actually needed. This allows smaller file sizes than a raw disk image in cases when the image capacity is not totally used up. Let’s see this hands-on</p>
<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span>qemu-img convert <span class="nt">-f</span> qcow2 <span class="nt">-O</span> raw focal-server-cloudimg-amd64.img focal-server-cloudimg-amd64.raw
<span class="gp">$</span><span class="w"> </span><span class="nb">ls</span> <span class="nt">-lh</span>
<span class="go">total 1.8G
-rw-rw-r-- 1 libvirt-qemu kvm 520M Sep 7 18:58 focal-server-cloudimg-amd64.img
-rw-r--r-- 1 sumit sumit 2.2G Sep 12 12:17 focal-server-cloudimg-amd64.raw
</span></code></pre></div> </div>
<p>As you can see above, <code class="language-plaintext highlighter-rouge">qemu-img</code> is the program we’re using to create, modify, and inspect disk images. I already had the <code class="language-plaintext highlighter-rouge">focal-server-cloudimg-amd64.img</code> image which is of the QCOW2 format, and I created a RAW copy of it. From the last <code class="language-plaintext highlighter-rouge">ls -lh</code> command, we can clearly see how these two formats differ—the file size of the RAW image is 2.2 GB, while the QCOW2 image takes up only 520MB, despite both having the same contents and the same <em>virtual</em> size. This is because the QCOW2 image only allocated the storage which is actually needed, and it will grow the image as more and more space is being used. Without using copy on write—as you can imagine—with every VM we create, we would have redundant copies of the original base image, and those would’ve eaten up our disk space.</p>
</li>
<li>
<p>QCOW stands for <em>QEMU Copy On Write</em>; as evident from the name, another key optimization it has is its ability to use a base image in a <em>copy on write</em> mode. From <a href="https://en.wikipedia.org/wiki/Qcow">Wikipedia</a></p>
<blockquote>
<p>The qcow format also allows storing changes made to a read-only base image on a separate qcow file by using <a href="https://en.wikipedia.org/wiki/Copy-on-write">copy on write</a>. This new qcow file contains the path to the base image to be able to refer back to it when required. When a particular piece of data has to be read from this new image, the content is retrieved from it if it is new and was stored there; if it is not, the data is fetched from the base image.</p>
</blockquote>
<p>We’ll be using this feature of QCOW, as it will save a lot of storage in case we create a lot of VMs. We’ll see how this works shortly.</p>
</li>
</ul>
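<p>Delayed allocation isn’t unique to QCOW2: ordinary sparse files on Linux filesystems behave the same way, which makes the idea easy to demonstrate without <code class="language-plaintext highlighter-rouge">qemu-img</code>. A small Python sketch, stdlib only (the last result assumes a filesystem that supports sparse files, which ext4, XFS, and tmpfs all do):</p>

```python
import os

# Create a file with a 100 MiB apparent size by seeking past the end
# and writing a single byte; the filesystem allocates blocks lazily.
with open("sparse.img", "wb") as f:
    f.seek(100 * 1024 * 1024 - 1)
    f.write(b"\0")

st = os.stat("sparse.img")
apparent = st.st_size            # what `ls -lh` reports
allocated = st.st_blocks * 512   # what `du -h` reports

print(apparent == 100 * 1024 * 1024)  # True
print(allocated < apparent)           # True on sparse-capable filesystems
```

<p>The gap between the apparent and allocated sizes here is exactly the gap we saw between <code class="language-plaintext highlighter-rouge">ls -lh</code> reporting 2.2G for the RAW image and 520M for the QCOW2 one.</p>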
<p>Now that we understand how QCOW works, let’s create a disk image for our VM.</p>
<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span>qemu-img create <span class="nt">-b</span> focal-server-cloudimg-amd64.img <span class="nt">-f</span> qcow2 <span class="nt">-F</span> qcow2 hal9000.img 10G
<span class="go">Formatting 'hal9000.img', fmt=qcow2 size=10737418240 backing_file=focal-server-cloudimg-amd64.img backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
</span></code></pre></div></div>
<p>Here, we used the cloud image <code class="language-plaintext highlighter-rouge">focal-server-cloudimg-amd64.img</code> as the base image for our new image <code class="language-plaintext highlighter-rouge">hal9000.img</code>. At first, the new image will contain only a reference to the base image, nothing else. As long as the blocks in the new image are only being <em>read</em>, QCOW will fetch those blocks directly from the base image and serve them. Only when some block in the new image is <em>written</em> to will it <em>copy</em> the block contents from the base image into itself and make the changes there. In this way, <em>copy on write</em> saves only the modifications or <em>writes</em> in the new image, keeping the image size small. We can verify this by checking their file sizes</p>
<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">sumit@HAL9000:~/images/base$</span><span class="w"> </span><span class="nb">ls</span> <span class="nt">-lh</span>
<span class="go">total 520M
-rw-rw-r-- 1 libvirt-qemu kvm 520M Sep 7 18:58 focal-server-cloudimg-amd64.img
-rw-r--r-- 1 sumit sumit 193K Sep 12 12:39 hal9000.img
</span></code></pre></div></div>
<p>The new image takes up only 193KB! This will keep growing as we write to the image, but it’s still a huge storage optimization.</p>
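<p>The read/write behaviour described above can be modelled in a few lines of Python. This is purely a conceptual sketch of copy-on-write, with blocks as dict entries; it is not how QCOW2 is actually implemented:</p>

```python
class OverlayImage:
    """Toy model of a copy-on-write overlay over a read-only base image."""

    def __init__(self, base):
        self.base = base   # read-only base image: block number -> data
        self.overlay = {}  # only blocks that were written live here

    def read(self, block):
        # Serve from the overlay if the block was ever written;
        # otherwise fall through to the base image.
        return self.overlay.get(block, self.base.get(block))

    def write(self, block, data):
        # Writes never touch the base image.
        self.overlay[block] = data

base = {0: b"boot", 1: b"kernel", 2: b"rootfs"}
vm_disk = OverlayImage(base)

assert vm_disk.read(1) == b"kernel"   # read falls through to the base
vm_disk.write(2, b"rootfs-modified")  # write lands in the overlay
assert vm_disk.read(2) == b"rootfs-modified"
assert base[2] == b"rootfs"           # the base image stays untouched
print(len(vm_disk.overlay))  # 1 -- only the modified block is stored
```

<p>Every VM we create gets its own tiny overlay like this, all sharing the one read-only base image; that’s why <code class="language-plaintext highlighter-rouge">hal9000.img</code> starts out at only 193KB.</p>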
<h3 id="creating-cloud-init-configuration-files">Creating Cloud-Init Configuration Files</h3>
<p>As we know, cloud-init uses two configuration files, <code class="language-plaintext highlighter-rouge">user-data</code> and <code class="language-plaintext highlighter-rouge">meta-data</code>. Below is the <code class="language-plaintext highlighter-rouge">meta-data</code> we’ll be using</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">instance-id</span><span class="pi">:</span> <span class="s">hal9000</span>
<span class="na">local-hostname</span><span class="pi">:</span> <span class="s">hal9000</span>
</code></pre></div></div>
<p>It’s a YAML file with a unique instance id and hostname.</p>
<p>Next, we’re going to create a <code class="language-plaintext highlighter-rouge">user-data</code></p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">#cloud-config</span>
<span class="na">users</span><span class="pi">:</span>
<span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">sumit</span>
<span class="na">ssh_authorized_keys</span><span class="pi">:</span>
<span class="pi">-</span> <span class="s">ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDDzMSSuEky2HWKy3/p01BXURLkYhDkJ+Wpd45kU7s0737LXx9zhRqWyX0pnUcGf1A5uKpy6JiaHNjRT/PBKMye0ej1CSurPZXOEyjSSK4MlW8NkRAHiLBuBAhetG3jANWKxcvsvsp172XdK8yP81B0w4qlKQz7J5GbALuwSwFEQu01tjf4aErEvV8xXxl2y1O8DMxjTiXT2WLTeoUDQldBm3m56ogtajnJz7USiZPePUZHcm6DMp9/2+ucef3/1AAtK0adQzwhnj6W+0eCTqdQz+DF9erqsMkd7QoRaQ0/ZK/rqljMwdbux6NySA1U5Zx2JaNUlClmfqxlkBm8TbY7 sumit@sumit-ghosh.com</span>
<span class="na">sudo</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">ALL=(ALL)</span><span class="nv"> </span><span class="s">NOPASSWD:ALL"</span><span class="pi">]</span>
<span class="na">groups</span><span class="pi">:</span> <span class="s">sudo</span>
<span class="na">shell</span><span class="pi">:</span> <span class="s">/bin/bash</span>
</code></pre></div></div>
<p>User-data is very versatile and supports <a href="https://cloudinit.readthedocs.io/en/latest/topics/format.html">many different formats</a>; we’re using the cloud-config format, which is the most convenient. A cloud-config user-data is basically a YAML file that begins with <code class="language-plaintext highlighter-rouge">#cloud-config</code>. The above user-data should be mostly self-explanatory.</p>
<ul>
<li>I’m creating a user with the username “sumit” using <code class="language-plaintext highlighter-rouge">name: sumit</code>; you can name the user whatever you want.</li>
<li>A list of authorized SSH public keys can be supplied using the <code class="language-plaintext highlighter-rouge">ssh_authorized_keys</code> array; I’m putting my SSH public key there, you should put yours.</li>
<li>We’re giving the user password-less <code class="language-plaintext highlighter-rouge">sudo</code> privileges using the <code class="language-plaintext highlighter-rouge">sudo: ["ALL=(ALL) NOPASSWD:ALL"]</code> line.</li>
<li>We’re adding the user to the <code class="language-plaintext highlighter-rouge">sudo</code> group.</li>
<li>The default shell of the user is set to <code class="language-plaintext highlighter-rouge">/bin/bash</code>.</li>
</ul>
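<p>If you create VMs often, hand-writing these two files gets tedious; they’re easy to template. A Python sketch using plain string formatting (the username and key below are placeholders, and a real script might build the YAML with a proper library instead):</p>

```python
def build_user_data(username, ssh_key):
    """Render a minimal cloud-config user-data document."""
    return f"""#cloud-config
users:
  - name: {username}
    ssh_authorized_keys:
      - {ssh_key}
    sudo: ["ALL=(ALL) NOPASSWD:ALL"]
    groups: sudo
    shell: /bin/bash
"""

def build_meta_data(instance_id):
    """Render the matching meta-data document."""
    return f"instance-id: {instance_id}\nlocal-hostname: {instance_id}\n"

# Placeholder values -- substitute your own username and public key.
user_data = build_user_data("sumit", "ssh-rsa AAAA... sumit@example.com")
meta_data = build_meta_data("hal9000")
print(user_data.startswith("#cloud-config"))  # True
```
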
<p>Finally, we need to create an ISO image with the volume label <code class="language-plaintext highlighter-rouge">cidata</code> containing these two files; you should recall this is how the NoCloud data source works.</p>
<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span>genisoimage <span class="nt">-output</span> cidata.iso <span class="nt">-V</span> cidata <span class="nt">-r</span> <span class="nt">-J</span> user-data meta-data
<span class="go">I: -input-charset not specified, using utf-8 (detected in locale settings)
Total translation table size: 0
Total rockridge attributes bytes: 331
Total directory bytes: 0
Path table size(bytes): 10
Max brk space used 0
183 extents written (0 MB)
</span></code></pre></div></div>
<p>The above <code class="language-plaintext highlighter-rouge">genisoimage</code> command should create a file named <code class="language-plaintext highlighter-rouge">cidata.iso</code> containing the user-data and meta-data files we created earlier. Notice that we’re setting the volume label to <code class="language-plaintext highlighter-rouge">cidata</code> using the <code class="language-plaintext highlighter-rouge">-V cidata</code> argument.</p>
<h3 id="creating-the-vm">Creating the VM</h3>
<p>Just like <a href="/articles/virtualization-hypervisors-explaining-qemu-kvm-libvirt/#creating-a-vm">before</a>, we’re gonna use <code class="language-plaintext highlighter-rouge">virt-install</code> to create our VM.</p>
<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span>virt-install <span class="nt">--name</span><span class="o">=</span>hal9000 <span class="nt">--ram</span><span class="o">=</span>2048 <span class="nt">--vcpus</span><span class="o">=</span>1 <span class="nt">--import</span> <span class="nt">--disk</span> <span class="nv">path</span><span class="o">=</span>hal9000.img,format<span class="o">=</span>qcow2 <span class="nt">--disk</span> <span class="nv">path</span><span class="o">=</span>cidata.iso,device<span class="o">=</span>cdrom <span class="nt">--os-variant</span><span class="o">=</span>ubuntu20.04 <span class="nt">--network</span> <span class="nv">bridge</span><span class="o">=</span>virbr0,model<span class="o">=</span>virtio <span class="nt">--graphics</span> vnc,listen<span class="o">=</span>0.0.0.0 <span class="nt">--noautoconsole</span>
<span class="go">
Starting install...
Domain creation completed.
</span></code></pre></div></div>
<p>Most of the CLI parameters should be self-explanatory, although there are some that you should pay attention to</p>
<ul>
<li>The VM name is set to be “hal9000” using the <code class="language-plaintext highlighter-rouge">name</code> parameter.</li>
<li>The VM is allocated 2 GB of RAM using the <code class="language-plaintext highlighter-rouge">ram</code> parameter and a single-core virtual CPU using the <code class="language-plaintext highlighter-rouge">vcpus</code> parameter.</li>
<li>Using the <code class="language-plaintext highlighter-rouge">--import</code> argument, we’re letting <code class="language-plaintext highlighter-rouge">virt-install</code> know that we’re <em>importing</em> an image which has the OS already installed within.</li>
<li>We’re attaching both of the images we created using the <code class="language-plaintext highlighter-rouge">disk</code> parameter, the QCOW image <code class="language-plaintext highlighter-rouge">hal9000.img</code> to be used as the storage, and the ISO image <code class="language-plaintext highlighter-rouge">cidata.iso</code> as a datasource for cloud-init.</li>
<li>Mentioning the <code class="language-plaintext highlighter-rouge">os-variant</code> helps <code class="language-plaintext highlighter-rouge">virt-install</code> apply some OS-specific optimizations.</li>
<li>We’re mentioning a network interface using the <code class="language-plaintext highlighter-rouge">network</code> parameter as we want our VM to have network connectivity. Using a bridge network interface is the simplest networking model we can have, and that’s what we’re going with.</li>
<li>Using the <code class="language-plaintext highlighter-rouge">--graphics vnc,listen=0.0.0.0</code> argument, we’re setting up a virtual console in the guest and exposing it as a VNC server on the host. We can connect to the VM console through this VNC server later if we want.</li>
<li>Finally, with the <code class="language-plaintext highlighter-rouge">--noautoconsole</code> argument, <code class="language-plaintext highlighter-rouge">virt-install</code> doesn’t open a console to the guest VM after creation; it just exits. This is essential for automation.</li>
</ul>
<p>After creation, give it 10-20 seconds to boot up. Then get the newly created VM’s IP address using the following command.</p>
<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span>virsh net-dhcp-leases default
<span class="go"> Expiry Time MAC address Protocol IP address Hostname Client ID or DUID
------------------------------------------------------------------------------------------------------------------------------------------------
2020-09-12 14:50:17 52:54:00:e5:0e:65 ipv4 192.168.122.103/24 hal9000 ff:56:50:4d:98:00:02:00:00:ab:11:db:1f:d2:8c:42:8a:75:70
</span></code></pre></div></div>
<p>As you can see, in my case, the IP address allocated to the <code class="language-plaintext highlighter-rouge">hal9000</code> VM is <code class="language-plaintext highlighter-rouge">192.168.122.103</code>. Now I can simply SSH into the VM:</p>
<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span>ssh sumit@192.168.122.103
</code></pre></div></div>
<p>And I have access to a brand new VM, created in under a minute! You can automate the whole process using a script, such as <a href="https://github.com/SkullTech/scripts/blob/master/create-vm.yml">this Ansible playbook</a> I wrote.</p>
<h2 id="conclusion">Conclusion</h2>
<p>Hopefully this post was helpful to you. If you have any questions, feedback, or suggestions, please leave them in the comments down below. You can also reach me via email or on Twitter. Thanks for reading!</p>Sumit Ghoshsumit@sumit-ghosh.comIn this post, we’re going to see how we can create virtual machines in seconds, the way it’s done in modern cloud infrastructures such as EC2 and Digital Ocean. This is kind of a continuation of my previous post, where I explain the basics of virtualization and create a virtual machine in the old-school way. So let’s dive right into it!Virtualization and Hypervisors :: Explaining QEMU, KVM, and Libvirt2020-09-08T00:00:00+05:302020-09-08T00:00:00+05:30https://sumit-ghosh.com/posts/virtualization-hypervisors-explaining-qemu-kvm-libvirt<p>In this post, I will explore some core concepts and terminologies regarding virtualization in Linux systems. I intend it to serve as a reference, especially for beginners who are just getting started with virtualization and system administration in general.</p>
<h2 id="hypervisors">Hypervisors</h2>
<p>Virtualization, i.e., creating and running virtual machines, is handled by something called a hypervisor, which can either be software, firmware or hardware. The system on which the hypervisor runs virtual machines is called the <em>host</em> system, and the virtual machines themselves are called <em>guest</em> systems. The hypervisor manages and provides resources to the guest OS, and it translates system calls of the guest OS to suitable system calls or hardware interrupts in the host system.</p>
<p>Hypervisors can be generally categorized into two types.</p>
<ol>
<li>
<p><strong>Type 1 hypervisor:</strong> These run on bare metal and typically leverage features of the CPU specifically built for virtualization, for example, AMD-V and Intel VT-x. Examples of type 1 hypervisors are:</p>
<ul>
<li>KVM, which is a Linux kernel module and part of the official Linux kernel.</li>
<li>VMWare ESXi.</li>
<li>Microsoft Hyper-V.</li>
</ul>
</li>
<li>
<p><strong>Type 2 hypervisor:</strong> These run on top of a host OS, and thus translate system calls made by the guest OS into system calls made to the host OS. Type 2 virtualization is also called <em>emulation</em>, to distinguish it from type 1 or <em>true</em> virtualization. Examples include:</p>
<ul>
<li>QEMU.</li>
<li>VMWare Workstation.</li>
<li>VirtualBox.</li>
</ul>
</li>
</ol>
<h3 id="differences-advantages-and-drawbacks">Differences: Advantages and Drawbacks</h3>
<ul>
<li>In general, type 1 hypervisors are a lot faster than type 2 hypervisors.</li>
<li>Type 1 hypervisors can only emulate the same architecture as that of the host CPU. For example, they can only create VMs with x86 architecture if the machine has an x86 CPU. Type 2 hypervisors, on the other hand, can emulate any architecture regardless of the host CPU, at least in theory.</li>
</ul>
<h2 id="qemu-and-kvm">QEMU and KVM</h2>
<p>When considered separately as standalone software, QEMU is a type 2 hypervisor, and KVM is a type 1 hypervisor, as we already discussed. But in the specific case when,</p>
<ul>
<li>The guest OS and the host OS have the same architecture.</li>
<li>The host CPU supports hardware virtualization extensions, such as AMD-V or Intel VT-x.</li>
<li>The host OS has KVM installed.</li>
</ul>
<p>QEMU can use KVM to translate the calls made to the CPU and the memory, thus turning that part of the virtualization process into bare-metal or type 1 virtualization. The rest of the system calls, such as calls to IO devices and disk drives, are routed through the OS in the usual type 2 fashion. As CPU and memory calls are the most critical system calls, using KVM to handle them makes the whole process a lot faster and smoother. In this way, QEMU and KVM work in tandem to bring the best of both worlds.</p>
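<p>Before relying on this QEMU-KVM fast path, it’s worth verifying that the host actually supports it. Here’s a minimal sketch, assuming a Linux host; the flags <code class="language-plaintext highlighter-rouge">vmx</code> and <code class="language-plaintext highlighter-rouge">svm</code> correspond to Intel VT-x and AMD-V respectively.</p>

```shell
# Count CPU flags that indicate hardware virtualization support:
# 'vmx' for Intel VT-x, 'svm' for AMD-V. A count of 0 means the CPU
# lacks the extensions, or they are disabled in the BIOS/UEFI firmware.
grep -c -E 'vmx|svm' /proc/cpuinfo || true

# KVM is usable only if the kernel module is loaded, which exposes
# the /dev/kvm device node.
ls -l /dev/kvm 2>/dev/null || echo "/dev/kvm not found: KVM is unavailable"
```

<p>If both checks pass, QEMU and the tools built on top of it should pick up KVM acceleration automatically; otherwise QEMU falls back to pure emulation.</p>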
<h2 id="libvirt">Libvirt</h2>
<p>Libvirt is an open-source set of software and libraries which provide a single way to manage multiple different hypervisors, such as QEMU, KVM, Xen, LXC, OpenVZ, VMWare ESXi, etc. It consists of a stable C API, a daemon, and command-line utilities to work with the Libvirt API. The <a href="https://wiki.archlinux.org/index.php/libvirt">Arch wiki page on Libvirt</a> has a lot of useful information, but keep in mind that the packages and commands may not correspond well with other Linux distributions.</p>
<h3 id="installing-libvirt-with-qemu-kvm">Installing Libvirt with QEMU-KVM</h3>
<p>The following <code class="language-plaintext highlighter-rouge">apt</code> packages are essential for creating and managing virtual machines using Libvirt with QEMU-KVM in an Ubuntu system.</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">libvirt-daemon-system</code>: The complete Libvirt distribution, i.e., the C libraries and the Libvirt daemon <code class="language-plaintext highlighter-rouge">libvirtd</code>. It also installs the following helper packages:
<ul>
<li><code class="language-plaintext highlighter-rouge">libvirt-clients</code>, which is a collection of CLI utilities to interface with the Libvirt daemon.</li>
<li><code class="language-plaintext highlighter-rouge">bridge-utils</code> for networking.</li>
<li><code class="language-plaintext highlighter-rouge">qemu-kvm</code> as the default hypervisor.</li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">virtinst</code>: For helper utilities such as <code class="language-plaintext highlighter-rouge">virt-install</code> and <code class="language-plaintext highlighter-rouge">virt-viewer</code>.</li>
</ul>
<p>Running the following command will install all of the above.</p>
<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span><span class="nb">sudo </span>apt <span class="nb">install </span>libvirt-daemon-system virtinst
</code></pre></div></div>
<h2 id="creating-a-vm">Creating a VM</h2>
<p>Now that we have installed Libvirt with QEMU and KVM, I will show how we can create a VM using this. This is mostly for demonstration purposes, so I’m going to keep the process as simple and minimal as possible.</p>
<ol>
<li>
<p>Firstly, we’ll have to create a disk image file which is to be used by the VM as its <em>secondary storage</em> or disk. The format can be either RAW or QCOW2—which has some very useful optimizations. For now, we’re going to go with RAW.</p>
<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span>qemu-img create <span class="nt">-f</span> raw <span class="nt">-o</span> <span class="nv">size</span><span class="o">=</span>10G vol.raw
<span class="go">Formatting 'vol.raw', fmt=raw size=10737418240
</span></code></pre></div> </div>
<p>As you can see, we specified the image format using the <code class="language-plaintext highlighter-rouge">-f raw</code> argument, gave it 10 GB of storage using the <code class="language-plaintext highlighter-rouge">size=10G</code> argument, and named the image file <code class="language-plaintext highlighter-rouge">vol.raw</code>.</p>
</li>
<li>
<p>Once we have a disk image, we can create the VM with the disk image as its storage using the <code class="language-plaintext highlighter-rouge">virt-install</code> command. We’ll also have to pass an installation media for installing an OS in the VM—you can use any installer image you have; I’m going to use the installer for Ubuntu Budgie 18.04.</p>
<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">sumit@HAL9000:~/Coding$</span><span class="w"> </span>virt-install <span class="nt">--name</span><span class="o">=</span>vm1 <span class="nt">--ram</span><span class="o">=</span>2048 <span class="nt">--vcpus</span><span class="o">=</span>1 <span class="nt">--disk</span> <span class="nv">path</span><span class="o">=</span>vol.raw <span class="nt">--cdrom</span><span class="o">=</span>/home/sumit/Downloads/ubuntu-budgie-18.04.3-desktop-i386.iso <span class="nt">--os-variant</span> ubuntu18.04 <span class="nt">--network</span> <span class="nv">bridge</span><span class="o">=</span>virbr0,model<span class="o">=</span>virtio
<span class="go">
Starting install...
</span></code></pre></div> </div>
<p>The CLI parameters are mostly self-explanatory.</p>
<ul>
<li>The VM name is set to be <em>vm1</em> using the <code class="language-plaintext highlighter-rouge">name</code> parameter.</li>
<li>The VM is allocated 2 GB of RAM using the <code class="language-plaintext highlighter-rouge">ram</code> parameter and a single-core virtual CPU using the <code class="language-plaintext highlighter-rouge">vcpus</code> parameter.</li>
<li>The disk image and installation image are supplied using the <code class="language-plaintext highlighter-rouge">disk</code> and <code class="language-plaintext highlighter-rouge">cdrom</code> parameters.</li>
<li>Mentioning the <code class="language-plaintext highlighter-rouge">os-variant</code> helps it make some OS-specific optimizations.</li>
<li>Finally, we have to mention a network interface using the <code class="language-plaintext highlighter-rouge">network</code> parameter so that the VM has network connectivity. Using a bridge network interface is the simplest networking model we can have, and that’s what we went with.</li>
</ul>
</li>
</ol>
<p>Once you run the <code class="language-plaintext highlighter-rouge">virt-install</code> command as shown above, a <code class="language-plaintext highlighter-rouge">virt-viewer</code> window should pop up with the installer running; go ahead and follow the installation wizard to install the OS. You can open the VM anytime later by running the following command.</p>
<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span>virt-viewer vm1
</code></pre></div></div>
<p>Shutting down, starting, and rebooting is very easy too!</p>
<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span>virsh shutdown vm1
<span class="gp">$</span><span class="w"> </span>virsh start vm1
<span class="gp">$</span><span class="w"> </span>virsh reboot vm1
</code></pre></div></div>
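<p>Deleting a VM once you’re done with it is just as easy. A sketch, assuming the VM was created as above; double-check the supported options with <code class="language-plaintext highlighter-rouge">virsh help undefine</code> on your version.</p>

```shell
# Force the VM off immediately -- the virtual equivalent of pulling the
# power cord. Prefer 'virsh shutdown' when a graceful stop is possible.
virsh destroy vm1

# Remove the VM's definition from libvirt, deleting any storage volumes
# associated with it along the way.
virsh undefine vm1 --remove-all-storage
```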
<h2 id="conclusion">Conclusion</h2>
<p>I hope this post helped you understand the basics of virtualization and get started with it. I plan to post more tutorials exploring specific use-cases; until then, have fun tinkering around! Feel free to leave a comment below if you have any questions or feedback. You can also always drop me an e-mail or contact me on Twitter if you want to talk. Cheers!</p>
<h2 id="clarifications">Clarifications</h2>
<p>I got some questions and comments about this post on Reddit. I’m including the discussions here as they shed light on some nuances regarding the topics discussed above.</p>
<p><a href="https://www.reddit.com/r/selfhosted/comments/iovfht/virtualization_and_hypervisors_explaining_qemu/g4iw4vr/?utm_source=share&utm_medium=web2x&context=3">Comment</a> by Reddit user <a href="https://www.reddit.com/user/retnikt0/">retnikt0</a>;</p>
<blockquote>
<p>QEMU and KVM are neither really type 1 nor type 2 hypervisors; they’re kind of somewhere in between. QEMU uses KVM so they certainly can’t be different types.</p>
</blockquote>
<p>Roughly, KVM can be called a type 1 hypervisor and QEMU a type 2 hypervisor, but he’s right; there’s some more nuance to it if we want to be completely correct.</p>
<p>QEMU acts as an emulator (i.e. a type 2 hypervisor) if the KVM kernel module is not available. But when the KVM module is available, it uses that to speed up system calls, so in that case it sits, in a way, halfway between a type 1 and a type 2 hypervisor.</p>
<p>When we consider KVM by itself, it’s not strictly a type 1 hypervisor either, as it still is part of an OS; it’s not a complete standalone system like VMWare ESXi built to only host VMs.</p>
<p><a href="https://www.reddit.com/r/sysadmin/comments/iovht7/i_wrote_an_blog_post_explaining_the_core_concepts/g4gtr0k?utm_source=share&utm_medium=web2x&context=3">Comment</a> by Reddit user <a href="https://www.reddit.com/user/NinjaAmbush/">NinjaAmbush</a>;</p>
<blockquote>
<p>You might want to explain why we’d want to use QEMU on top of KVM. I understand why having QEMU use KVM speeds up certain system calls, but it’s not clear to me why I want QEMU in this set up.</p>
</blockquote>
<p>KVM provides access to the virtualisation extensions available on x86 systems using an API. Using KVM without QEMU is certainly possible, but you’ll need to <a href="https://lwn.net/Articles/658511/">deal with the KVM API at an extremely low level</a>. QEMU provides a stable interface in the form of a set of binaries to interact with KVM, and a set of helper components (such as the <a href="https://www.linux-kvm.org/page/Virtio">virtio</a> interface) so that full-fledged virtual machines can be built easily. The two projects are developed together; in the real world you’re unlikely to see KVM without QEMU, because that’s the way the developers want you to use them.</p>Sumit Ghoshsumit@sumit-ghosh.comIn this post, I will explore some core concepts and terminologies regarding virtualization in Linux systems. I intend it to serve as a reference, especially for beginners who are just getting started with virtualization and system administration in general.Caste, Gender and Patriarchy in Rural and Semi-Rural India2020-08-18T00:00:00+05:302020-08-18T00:00:00+05:30https://sumit-ghosh.com/posts/caste-gender-patriarchy-in-rural-semi-rural-india<p>The institutions of caste, gender, and patriarchy in India are not as unambiguous as it might seem from the perspective of an outsider; complex coordination between them permeates the lives of almost everyone from the rural and semi-rural parts of India, despite the penetration of modern consumerist culture. We’re going to explore how these institutions of power interact with each other, and in turn how they make up the social order of the aforementioned parts of India.</p>
<h2 id="constraints-on-marriage">Constraints on Marriage</h2>
<p>As evident from many incidents from the past few decades, marriage and sexuality are tightly controlled in these societies. Marriage is almost always restricted to within the caste, the primary motivation being the solidarity of the caste and the strengthening of its power. Exogamy, in this context, breaches the traditional norms, which brings shame not only to the couple but to the whole family, and typically also to the extended family and caste. To make sure the younger population learns a lesson, these couples are sometimes punished inhumanely—by torture and death—which is justified according to the “law of the land.” The panchayats impose these punishments: the village panchayats, the caste panchayats, or in some cases, the larger <em>khap</em> panchayats. The local police do not intervene in these “social” matters. The policemen are usually recruited from the upper-caste population, so they believe these matters are best left to the traditional systems of justice. The conventions of status, tradition, honour, etc. are the usual rationale behind these, but underneath all that it’s almost always a form of assertion and retention of power.</p>
<h2 id="caste-and-gender">Caste and Gender</h2>
<p>The standards of the customs we discussed in the previous section are not uniform across the genders. The upper-caste men have a practice of taking wives from the lower-caste population, and that’s usually well tolerated; according to the folk wisdom, women have no caste—their caste is determined by the company they keep and the family they marry into. The fact that the male-to-female ratio is severely imbalanced plays a role—in the rural parts there are many more males than females, so sometimes it becomes essential for men to look outside their caste for marriage prospects; otherwise, the bloodline would stop, the family heritage would be lost, and the caste would grow weaker. As we can observe, the caste institutions are very closely aligned with patriarchy; rather, they are built on top of patriarchy. The caste system is a power dynamic within the already privileged male population; women are not much more than <em>instruments</em> from the perspective of the institution of patrilineal caste. In the next section we’ll explore how the sexuality of women is controlled and managed throughout their lives.</p>
<h2 id="feminine-sexuality">Feminine Sexuality</h2>
<p>The experiences of women and men in the relevant societies are very different from each other. Women are treated almost like a resource—an instrument—they don’t have any fixed family identity or caste identity, let alone an identity as an independent individual. Through various rituals of socialization performed throughout their childhood, this social reality is hammered into their psyche.</p>
<p>A prepubescent girl isn’t treated much differently than a boy of similar age. They’re free to roam around, play with the boys, jump and laugh as they wish. As soon as they have their first menstruation, a dramatic shift occurs in their social reality. This change is signified by some specific ritual, the details of which vary from region to region, but it usually involves social isolation and a special diet. Pubescent girls are advised to stay indoors, they are required to wear saree (so that the upper body is covered properly), they aren’t allowed to eat particular “hot” or “cold” food, they are advised not to jump, talk loudly, or ride bicycles, they are taught to practice moderation and self-denial, and so on. These practices, in part, make sure the girls don’t explore their sexuality on their own, and in part, prepare them for the probable harsh environment of <em>sasural</em> or the house of the in-laws.</p>
<p>Throughout various festivities and rituals—like the <em>Durgapuja</em> in West Bengal—the girls are mentally prepared and socialized to accept that they won’t be able to visit their natal home frequently once they’re married. The marriage ritual itself involves a lot of symbolism signifying the complete transfer of the bride: she severs all connection with her parents and the natal home, and is completely given over to her husband and his family. This transactional nature of marriage affects the relationship between the family and the daughter to a great extent: any investment in her is considered an investment just for her own sake, and moreover, for her future marital family; it is not an investment towards the natal family. That’s why girls are often given less food, less education, and overall less attention compared to the boys. Understanding this reality, girls sometimes encourage dowry, because that’s one of the ways they can gain some identity and power in <em>sasural</em>.</p>
<h2 id="conclusion">Conclusion</h2>
<p>Despite the fact that much of the population is now exposed to urbanization and modern consumerist culture, the social landscape hasn’t changed much. Only the masculine and macho parts of the urban environment have been integrated; the women’s experience remains more or less the same. As we’ve discussed, a complex interplay between caste, gender, and power maintains the status quo.</p>
<h2 id="references">References</h2>
<ul>
<li>Chowdhry, Prem. “Enforcing Cultural Codes: Gender and Violence in Northern India.” Economic and Political Weekly, vol. 32, no. 19, 1997, pp. 1019–1028. JSTOR, <a href="https://www.jstor.org/stable/4405393">www.jstor.org/stable/4405393</a>. Accessed 18 Aug. 2020.</li>
<li>Dube, Leela. “On the Construction of Gender: Hindu Girls in Patrilineal India.” Economic and Political Weekly, vol. 23, no. 18, 1988, pp. WS11–WS19. JSTOR, <a href="https://www.jstor.org/stable/4378429">www.jstor.org/stable/4378429</a>. Accessed 18 Aug. 2020.</li>
</ul>Sumit Ghoshsumit@sumit-ghosh.comThe institutions of caste, gender, and patriarchy in India are not as unambiguous as it might seem from the perspective of an outsider; complex coordination between them permeates the lives of almost everyone from the rural and semi-rural parts of India, despite the penetration of modern consumerist culture. We’re going to explore how these institutions of power interact with each other, and in turn how they make up the social order of the aforementioned parts of India.I’m Going to Use My Real Name Everywhere2020-08-04T00:00:00+05:302020-08-04T00:00:00+05:30https://sumit-ghosh.com/posts/im-going-to-use-my-real-name-everywhere<p>In <a href="https://okrefusal.com/posts/why-did-i-create-this-blog/">this</a> post, I mentioned that I had decided to use a pseudonym online to put out my political and philosophical thoughts. The central reason for this was the fear of being <em>cancelled</em> and potentially hurting my career and employment. Recently I’ve been rethinking this and decided to abandon the pseudonym and use my real name instead; let me tell you why.</p>
<h2 id="why-im-going-to-use-my-real-name">Why I’m Going to Use My Real Name</h2>
<p>As I started rethinking the pseudonym decision, I wondered why people use anonymous accounts on the Internet. This <a href="https://www.newstatesman.com/science-tech/social-media/2019/01/how-alt-anonymous-account-became-mainstream-trend-what-is-anon">article</a> came up in the search results, where the journalist had the same curiosity, and she went out and interviewed some people who use a pseudonym on the Internet regularly. The principal reasons that came up were:</p>
<ol>
<li>For awkward and socially inept teens who struggle to make friends, these alt accounts can be useful outlets; they can comfortably be themselves and be confident in their own skin.</li>
<li>For some, the reasons are political; maybe they have some extreme political views, or maybe they just don’t want their friends and family to know their political viewpoints, and inadvertently ruin the relationships. Or it might be something more nefarious, such as <a href="https://en.wikipedia.org/wiki/Sockpuppet_(Internet)">sockpuppeting</a> and <a href="https://en.wikipedia.org/wiki/Astroturfing">astroturfing</a>.</li>
<li>For people struggling with mental health, sexual issues, or issues that have some stigma attached to it, an alt account can be a good medium to discuss these issues and explore these emotions.</li>
<li>Sometimes people just want to vent, and they might not have a good enough support system in their life. For them, alt accounts can be a space for venting and having someone listen.</li>
<li>For people who have sensitive or public-facing jobs—such as government officials and political journalists—maintaining a professional image in their public social media accounts is a necessity. They can use anonymous accounts as an indulgence, for not-so-professional things.</li>
</ol>
<p>But for most people, the motivations for using pseudonyms boil down to some mental health or personal crisis.</p>
<p>As I went through these, I realized that all of these are very valid reasons, but none of them applies to me. I knew that I had to clearly figure out my personal motivations for doing this whole thing—blogging, being online and putting my thoughts out there—in the first place. Realizing that the above motivations don’t apply to me was definitely a start, because understanding what something is not can be one of the best ways to understand the thing itself. As I introspected a bit more, my motivations became clearer:</p>
<ul>
<li>I want to <strong>engage in discourses</strong> around political, philosophical or social issues; <strong>genuinely and authentically</strong>. I don’t want to hide behind layers of irony all the time. I’ll admit circle-jerking and trolling might be fun sometimes, but it doesn’t get us anywhere. Most of the anonymous political Twitter I’ve seen so far ends up doing just that, and I don’t want to be a part of that.</li>
<li>But I can do the above as an anon also, why use my real name? The answer is <a href="https://en.wikipedia.org/wiki/Skin_in_the_Game_(book)"><strong>skin in the game</strong></a>, if I may borrow Dr. Taleb’s term. When someone puts his contrarian ideas out there using his real identity, he has his reputation—something genuine—to lose. He has his skin in the game. And that carries a certain weight, people tend to take him more seriously.</li>
<li>I want to <strong>meet interesting people</strong>, get to know them, have interesting discussions, and so on. I don’t prefer the medium of the Internet for discussions; I prefer real-life conversations a lot more. The Internet is only a medium for my thoughts to be out there so that like-minded people can find me and reach out to me. And that won’t be possible if I hide behind an anonymous face.</li>
<li>I want to <strong>make my online activities as less time-consuming as I can</strong>. Maintaining multiple personas seems to only take up more bandwidth, and eventually, it becomes a source of stress and anxiety.</li>
<li>Most importantly, I want to <strong>build a sense of personal identity</strong> through all of these; these should be a process for me as well as others to get to know myself. I’m realizing that one of the main reasons I didn’t end up writing much on this blog was that I subconsciously thought it was pointless, as it wasn’t getting attributed to the real me anyway.</li>
</ul>
<p>Other than that, I mostly agree with everything Jacob Falkovich said tackling the same issue in this <a href="https://putanumonit.com/2020/07/20/real-name/">blog post</a>.</p>
<h2 id="going-forward">Going forward</h2>
<p>I’m going to keep using my eponymous website <a href="https://sumit-ghosh.com">sumit-ghosh.com</a> for my portfolio as a software developer, and for my technical articles related to programming and computers. This blog, <a href="https://okrefusal.com">okrefusal.com</a>, will be used as a catch-all space for more random musings; it’ll be more personal and <em>human</em> than the former one. I’ll also move the non-technical posts I have on <a href="https://sumit-ghosh.com">sumit-ghosh.com</a> here.</p>
<p>I’m going to stop using the <a href="https://twitter.com/R3FS1">R3FS1</a> Twitter handle; <a href="https://twitter.com/SkullTech101">SkullTech101</a> will be the de-facto space where the full, <em>uncompartmentalized</em> me can be found. Looking forward to meeting you there!</p>Sumit Ghoshsumit@sumit-ghosh.comIn this post, I mentioned that I had decided to use a pseudonym online to put out my political and philosophical thoughts. The central reason for this was the fear of being cancelled and potentially hurting my career and employment. Recently I’ve been rethinking this and decided to abandon the pseudonym and use my real name instead; let me tell you why.