Counting Sand, the podcast that tackles the hard problem of how to make meaning of all the data available today. Introducing the themes at the heart of big data, high performance, and computer science, the show highlights the most cutting-edge applications. Whether discussing the best designs for a complex data system or the social implications of bringing a diverse skillset to data science, each new episode will provide research-backed perspectives on today’s hardest problems.
Keywords
© 2022 Angelo Kastroulis
Counting Sand
Podcast information
- Number of episodes
- 22
- Subscribers
- 0
- Verified
- No
- Website
- Explicit content
- No
- Episode type
- episodic
- Podcast link
- https://podvine.com/link/..
- Last upload date
- August 9, 2022
- Last fetch date
- June 4, 2023 1:58 AM
- Upload range
- MONTHLY
- Author
- Angelo Kastroulis
- Copyright
- 2022 Angelo Kastroulis
- Bonus: Season 2 Recap
In a time crunch? Check out the time stamps below:
[00:45] - Moore's Law: where do we go from here?
[03:00] - How do we improve data system efficiency?
[10:30] - Purpose-built systems (FPGAs)
[11:13] - Insights on FPGAs
[13:32] - Event streaming
[17:50] - Data storage
[18:34] - Google’s approach to data storage
[19:00] - Downtime
[21:06] - Servers' impact on the environment and solutions to optimize
[23:00] - Improving data systems, machine learning, artificial intelligence
[24:06] - How do we regulate AI?
[26:10] - Benefits of simulations through machine learning
[28:38] - The impact computer science has on astrophysics
[31:09] - How do we defy Moore's Law? The future of quantum computing
Our Team: Host: Angelo Kastroulis; Executive Producer: Náture Kastroulis; Producer: Albert Perrotta; Communications Strategist: Albert Perrotta; Audio Engineer: Ryan Thompson; Music: All Things Grow by Oliver Worth
- The End of Moore's Law Part 2
The last time we had Manos on the program, we talked about Moore's Law coming to an end. It's important to note that we can't rely on sheer computing power doubling to meet our ever-increasing demand for data. We must find new and exciting ways to collect and compute large amounts of data. In this episode of Counting Sand, we dive deep into two questions: what does a database actually do, and what is at the core of a data system? Most importantly, how can we use new techniques to lighten the CPU's load through algorithmic trickery?
In a time crunch? Check out the time stamps below:
[00:53] - Guest intro
[01:30] - Intro to data systems
[03:00] - Hardware types
[05:00] - Why is it important to choose the right format?
[10:15] - What is column storage, and what are the benefits?
[16:30] - Injecting the CPU, the hierarchy of memory
[20:00] - Why not just duplicate data?
[22:55] - ACID properties
Notable references: Relational Memory: Native In-Memory Accesses on Rows and Columns
Our Team: Host: Angelo Kastroulis; Executive Producer: Náture Kastroulis; Producer: Albert Perrotta; Communications Strategist: Albert Perrotta; Audio Engineer: Ryan Thompson; Music: All Things Grow by Oliver Worth
- Dynamo: The Research Paper that Changed the World
Counting Sand · Jul 5 · 35m
The cycle between research and application is often too long and can take decades to complete. People often ask which piece of research or technology is the most important. Before we can answer that question, it's important to take a step back and share the story of why we believe the Dynamo paper is so essential to our modern world and how we encountered it.
Citations:
DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., Pilchin, A., ... & Vogels, W. (2007). Dynamo: Amazon’s highly available key-value store. ACM SIGOPS Operating Systems Review, 41(6), 205-220.
Karger, D., Lehman, E., Leighton, T., Panigrahy, R., Levine, M., & Lewin, D. (1997, May). Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web. In Proceedings of the twenty-ninth annual ACM symposium on Theory of computing (pp. 654-663).
Lamport, L. (2019). Time, clocks, and the ordering of events in a distributed system. In Concurrency: the Works of Leslie Lamport (pp. 179-196).
Merkle, R. C. (1987). A digital signature based on conventional encryption. In Proceedings of the USENIX Secur. Symp (pp. 369-378).
Our Team: Host: Angelo Kastroulis; Executive Producer: Náture Kastroulis; Producer: Albert Perrotta; Communications Strategist: Albert Perrotta; Audio Engineer: Ryan Thompson; Music: All Things Grow by Oliver Worth
- The Promise of AI: Opportunities and Obstacles
This show often discusses artificial intelligence and ideas to consider as technology progresses. We have discussed the deep tech of how it works and its implications on privacy. In this episode, we'll talk about the complex and controversial topic of AI policy and speak about some of the things we should be worried about regarding its future.
In a time crunch? Check out the time stamps below:
[01:15] - Guest intro
[03:38] - Western technology leadership
[04:50] - Regulating AI
[11:00] - The promise of self-driving cars
[13:05] - AI data audition
[17:50] - Neural networks to train AI
[19:00] - Reducing mathematical knowledge, AI bottleneck
[20:35] - What is in the way of the promise of AI
[24:20] - Eric Daimler book
[27:50] - The uses of trained AI models
[29:30] - Health care industry data usage
[33:25] - AI to speed up research
[33:50] - What is rural AI?
Guest Links: https://www.linkedin.com/in/ericdaimler/ https://conexus.com/
Our Team: Host: Angelo Kastroulis; Executive Producer: Náture Kastroulis; Producer: Albert Perrotta; Communications Strategist: Albert Perrotta; Audio Engineer: Ryan Thompson; Music: All Things Grow by Oliver Worth
- Energy, Edge Computing, and Data Centers
What if there were a way to reduce the amount of energy consumed and produced by servers around the world? Would these new methods positively or negatively impact the environmental footprint of today’s big data ecosystems?
In a time crunch? Check out the time stamps below:
[02:15] - Research paper
[05:55] - Power consumption of data centers and methods to save energy
[08:50] - Server cooling methods
[12:00] - Energy production from data transportation
[13:55] - The impact of location and climate through venting and cooling computers
[15:38] - Edge devices and cloud computing
[20:47] - Cost and energy optimization
[21:45] - Machine learning + A.I. productive maintenance
[24:45] - Automobile processing unit, big data
Our Team: Host: Angelo Kastroulis; Executive Producer: Náture Kastroulis; Producer: Albert Perrotta; Communications Strategist: Albert Perrotta; Audio Engineer: Ryan Thompson; Music: All Things Grow by Oliver Worth
- Cutting-Edge Data Systems: Machine Learning
Over the last couple of years, Harvard Data Systems Lab has been focused on cutting-edge research and applications of complex data systems, focusing on such areas as artificial intelligence and machine learning pipelines. In this episode of Counting Sand, Angelo and Stratos dive deep into what they have learned and what’s next in these fields.
In a time crunch? Check out the time stamps below:
[01:00] - What’s new at Harvard Data Systems Lab?
[08:20] - What are examples of general data structure applications?
[14:13] - How do we decrease the time spent from research to application?
[20:23] - What are the benefits of machine learning?
[22:15] - What are some helpful tips when writing a thesis?
[25:00] - How important is the creative process when writing a research paper?
Helpful links:
Harvard Data Systems Lab: http://daslab.seas.harvard.edu/
Harvard Data Systems Lab Twitter: https://twitter.com/HarvardDASlab
Our Team: Host: Angelo Kastroulis; Executive Producer: Náture Kastroulis; Producer: Albert Perrotta; Communications Strategist: Albert Perrotta; Audio Engineer: Ryan Thompson; Music: All Things Grow by Oliver Worth
- Kafka Event Streaming Part 1
Is Kafka a one-size-fits-all solution? Or does this event sourcing software have an inherent set of strengths? Join Angelo and Kafka guru Anna McDonald as they share use cases and swap stories about how Kafka has radically changed the field of computer science.
In a time crunch? Check out the time stamps below:
[00:54] - How did Kafka change the world?
[04:40] - What is so great about big data technology?
[07:00] - Outbox pattern 101
[10:45] - Clinical decision support use case
[13:05] - Should I build it or buy it?
[17:05] - Is Kafka a one-size-fits-all for businesses?
[21:55] - Kafka tuning 101
[25:53] - A.I. for Kafka tuning
Helpful links: https://www.confluent.io/ https://www.youtube.com/channel/UC37UjjtsxpZWS_0QGPKEHdA
Our Team: Host: Angelo Kastroulis; Executive Producer: Náture Kastroulis; Producer: Albert Perrotta; Communications Strategist: Albert Perrotta; Audio Engineer: Ryan Thompson; Music: All Things Grow by Oliver Worth
- Simulating Biological Systems Part 2
Counting Sand · Apr 12 · 27m
This episode touches on computer simulations, machine learning, and GPUs. How do these aspects of computer science relate and differ? Andy and John dive deep into how they push the boundaries of what is possible and practical in modern medicine by simulating biological systems.
Our Team: Host: Angelo Kastroulis; Executive Producer: Náture Kastroulis; Producer: Albert Perrotta; Communications Strategist: Albert Perrotta; Audio Engineer: Ryan Thompson; Music: All Things Grow by Oliver Worth
- Simulating Biological Systems Part 1
The episode starts by asking the question: what if we could use computer science to shorten the amount of time it takes to discover new medications? Angelo then shares, "If we meditate on that for just a second, our minds might wander over into the world of machine learning and artificial intelligence, where we can imagine a world where these complicated neural networks or other types of AI are trying to discover a new kind of chemical compound. We might even think about the far future things like quantum because it has application in chemistry because chemistry can be thought of as an optimization problem. Or we could do something like a simulation. What if we could simulate the chemical structures of the world, or we could even simulate the body. We could conceivably introduce new kinds of compounds to the body and see how it reacts."
Our Team: Host: Angelo Kastroulis; Executive Producer: Náture Kastroulis; Producer: Albert Perrotta; Communications Strategist: Albert Perrotta; Audio Engineer: Ryan Thompson; Music: All Things Grow by Oliver Worth
- Galaxy Evolution
The episode touches on the wondrous journey a galaxy undergoes as it evolves through its life cycle. Angelo starts off the episode by asking the question: what's an early-type galaxy? Paolo Bonfini explains that although you may think early-type galaxies are galaxies early in their evolution, they're not; they're galaxies a little later. They're the ultimate evolution of two galaxies coming together. Drawing on the topics in his paper, Paolo then explains the role that supermassive black holes play in galaxy evolution. Paolo explains, "thanks to the recent development in gravitational-wave astronomy, which opened a completely new window of exploration because it's not based on electromagnetic waves, but on gravitational waves, which are a completely different thing. We are now able to explore black holes in more detail and we're able to study when supermassive black holes merged to create a bigger one."
Relating to the idea of bringing new technology forward, Angelo asks whether any computer science techniques helped Paolo model this or put it together. Paolo explains, "there are a lot of computations involved in this process. People have in mind the romantic view of the astronomer who just looks through the scope of the telescope and notes things down on a piece of paper, but modern astronomy is completely digitalized. And recently it has been even automated by a lot of procedures that they track and scan the sky to create huge catalogs. Even the images themselves, they are captured on digital devices, like, the same as they appear in the phone, basically, the same technology, but just on a more refined scale. And the first process for which you will need a computer is to combine exposures. So you cannot expose a telescope on a specific direction in the sky for a very long time, for several reasons.
The summary is that, in order to take an image of some patch in the sky, you will have to take multiple images and then combine them. Now the modern telescopes, they are extremely accurate. So when you combine them, you need to align the stars to a sub pixel resolution. That means that you have to find the center of the star and itself be positioned within a single individual pixel. And when you combine images, you have to align them by with the precision of, let's say, a third of a pixel, which sounds impossible because you're like, how can you do that? But, there are some techniques that allow you to do that. And of course you need a lot of computational power for that. It can take several minutes to do this even a half an hour, let's say, to combine and produce the image that you see on famous websites, like the Hubble. I mean, this is just the first step. You mentioned a thing you need to actually extract, in my case for the study I was doing, in order to assess the lack of stars at the center of a given galaxy you actually have to measure it. So what you have to do is, you have to trace the light profile, starting from the outskirts of the galaxy going gradually towards the center. In this way, you can draw a light curve if you want. It's not exact, it's more like a light profile. So you have some intensity at the edge of the galaxy, which would be low intensity because the light is very diffused and all the center it grows, grows, grows. And at some point you will see doesn't grow as much. That's where you meet the depleted core, but you also need to quantify this because you want to actually extract the information about the amount of depleted mass, like comments that you would expect it to be versus how many you actually measure. So, you have to fit the light profile. And this is done by, okay. In my case, I've been doing this with some kind of basic statistical technique, which is the chi squared fitting. 
So you have our model and you just fit the model to the observation and once you have the model, you can project only the other path towards the center and you compare it with the actual model that you fit. And from the difference between the two, you have the amount of stars that are missing. So you need to explore a lot of parameters and therefore you need to have this thing automated via computer technology. There is no chance you can get this information doing it by hand."
Referencing the famous space observatory, Hubble, Paolo explains what it was like to work with such a brilliant piece of machinery. He shares, "it's really amazing because the Hubble telescope was launched in the nineties and just to give you an idea is roughly the size of a bus. There is a replica of it you can visit at, I think it's the Aerospace Museum in Washington, so if you're curious. The main mirror is 2.3 meters in diameter, just to give you an idea, the larger the diameter, the higher the resolution you can achieve. On Earth, there are bigger telescopes. The biggest telescope we have on Earth is currently 10 meters. It's on the Canary Islands. On Earth you have the atmosphere on top of you and this makes everything flicker a little bit because you know, there is air moving, and these big masses of atmosphere move and this shifts the path of the light and this causes the images to be more confused. If you are instead outside the atmosphere, you don't have that problem and you really achieve the limiting resolution of your instrument. So the Hubble Space Telescope is particularly famous because of its resolution. It doesn't have a large collecting area, it’s only two meters, let's say, so it doesn't collect a lot of light per second. So it doesn't have, let's say, the same contrast as ground-based telescopes, but it has extremely high resolution.
So when you open an image and you're saying, okay, I want to look at this galaxy and I will work on this, which is at the center of the field of view because you pointed there. But, at the edges of it, you see a lot of tiny objects and if you zoom in you can see the structure. Maybe you see a lot of spiral galaxies around the merging objects in the background. And it's not at the center of your research. You're looking at the big galaxy at the center that you're studying. But, you know, it's like a small pleasure, small candy that you have for the eye. You're looking at these things around and you are like, well, man, this is incredible. There are so many things in the universe and I'm here focusing on these big galaxies at the center, but whatever else is happening in the background, and this is really the, I think it's the most impressive thing."
Angelo concludes the episode by discussing the ups and downs of crafting a research paper. Paolo touches on everything from the rollercoaster of emotions one undergoes due to the sheer volume of work that needs to be done to the most rewarding aspect of writing such a paper. He explains, "you know that you are at the forefront of this research, and I think this is when the reward comes when you're actually presenting and you see the people being curious and asking you directly at the conference, “What is this?” “How did you get there?” “It's very interesting. Let's work together.” “This is an idea to make it even better” and so on."
Our Guest - Thank you!: Paolo Bonfini - https://www.linkedin.com/in/paolo-bonfini-phd-085a6a179/
Paolo's Paper: Connecting traces of galaxy evolution: the missing core mass-morphological fine structure relation
Our Team: Host: Angelo Kastroulis; Executive Producer: Náture Kastroulis; Producer: Albert Perrotta; Communications Strategist: Albert Perrotta; Audio Engineer: Ryan Thompson; Music: All Things Grow by Oliver Worth
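The light-profile fitting Paolo describes in this episode can be illustrated with a small sketch. This is not Paolo's actual pipeline: the exponential profile, the synthetic data, the parameter grid, and the names `model`, `fit_outer_profile`, and `missing_light` are all assumptions made for illustration. The idea it shows is the one from the conversation: fit a model to the outer part of the galaxy by minimizing chi-squared, extend that model toward the center, and measure how much light the observation falls short of it.

```python
import math

def model(r, i0, h):
    """Assumed exponential light profile: intensity drops with radius r."""
    return i0 * math.exp(-r / h)

def chi_squared(radii, observed, sigma, i0, h):
    """Sum of squared, error-weighted residuals between data and model."""
    return sum(((obs - model(r, i0, h)) / sigma) ** 2
               for r, obs in zip(radii, observed))

def fit_outer_profile(radii, observed, sigma, r_core):
    """Grid-search (i0, h) minimizing chi-squared on points with r >= r_core."""
    outer = [(r, o) for r, o in zip(radii, observed) if r >= r_core]
    rs = [r for r, _ in outer]
    obs = [o for _, o in outer]
    best = None
    for i0_step in range(50, 151):        # i0 = 50, 51, ..., 150
        for h_step in range(10, 51):      # h = 1.0, 1.1, ..., 5.0
            i0, h = float(i0_step), h_step / 10.0
            chi2 = chi_squared(rs, obs, sigma, i0, h)
            if best is None or chi2 < best[0]:
                best = (chi2, i0, h)
    return best  # (chi2, i0, h)

def missing_light(radii, observed, i0, h, r_core):
    """Extrapolated model minus observation, summed over the core (r < r_core)."""
    return sum(model(r, i0, h) - o
               for r, o in zip(radii, observed) if r < r_core)

# Synthetic, noise-free data (an assumption): the core is 40% fainter
# than the outer profile would predict.
RADII = [0.5 * k for k in range(1, 21)]   # 0.5, 1.0, ..., 10.0
R_CORE = 2.0
SIGMA = 1.0
OBSERVED = [model(r, 100.0, 3.0) * (0.6 if r < R_CORE else 1.0) for r in RADII]

if __name__ == "__main__":
    chi2, i0, h = fit_outer_profile(RADII, OBSERVED, SIGMA, R_CORE)
    deficit = missing_light(RADII, OBSERVED, i0, h, R_CORE)
    print(f"best fit: i0={i0}, h={h}, chi2={chi2:.3f}, core deficit={deficit:.2f}")
```

A brute-force grid keeps the sketch dependency-free; a real analysis would use a proper optimizer, measured uncertainties, and a profile suited to the galaxy type.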
- The End of Moore's Law - What's Next?
Angelo begins this episode with the predictions of Moore’s Law. In the early years, systems were limited by the CPU's ability to keep up. As CPUs continued to advance, the bottleneck shifted to data movement: moving data from disk to memory and from memory to cache became the big bottleneck. Then disks got faster, and eventually machines had so much RAM that the bottleneck was memory movement inside RAM. Eventually, we believe the bottleneck will return to the CPU. Quantum computing is on the rise, and that, we believe, is a game changer for Moore’s Law, because we're no longer talking about conventional computer chips and transistors but about something completely different. Additionally, as machine learning and artificial intelligence systems advance (Angelo’s thesis on using AI to tune data systems is one example), the resulting speedups will be orders of magnitude beyond traditional systems.
Talent and staffing will also change as we adapt to the future. Angelo admires Google’s practice of hiring ability over experience because the problems we face tomorrow are different from today's. The key thing is to be able to make progress independently because there isn't much room for babysitting. It's too hard to predict where the next fire will be. Angelo explains further why he hires ability over experience every single time: someone who has ability, who is brilliant and has the hunger to learn new things, can be programmed like a stem cell. They can inject themselves into whatever problem comes up.
Angelo transitions into his own personal story and his quest for fulfillment and happiness. He tells the story of a boy who was dying on the Nazi-occupied island of Chios, Greece. A doctor took pity on this boy and secretly nursed him to health. We later learn that this boy is Angelo’s father.
Angelo shares, “My father grew up in a world much different than mine. His siblings related stories of famine and suffering, but he never ever spoke of those things. What he chose to relate were accounts of human triumph, perseverance, hope, aspiration. The sea was his salvation, carrying him from Chios as a sailor, eventually to the United States.” So, what is our true potential? Intellectual achievements can be ignored or forgotten. But to be a successful family person, a husband, a father, a human, Angelo needed to be something more, something enduring. Education builds the qualities of perseverance, hard work, and accomplishment. There is no doubt you'll accomplish many things, but think about what it is that you're really trying to do. You see, building technical solutions isn't just about doing interesting stuff. Ultimately we're building these things for a reason. We're building technology. For example, if you're doing a healthcare application, it's going to touch somebody's life. That's the point of this breakthrough, right? You want to increase throughput, for example, in decision support, something Angelo spends a lot of time on. We want to say, increase throughput, build a system that can compute faster and bigger sets of data. Why are we doing that? Just because of the challenge of the data? No, we want to find out if a clinical intervention is working so that we can feed that information forward to those making the guidelines. You see, that's the real reason behind doing this. The great resignation has shown us that people care more about what it is they're doing and why they're doing it than just simply being interesting work. We owe it to our family to use our gifts, talents, and opportunities to the best of our ability, but to use them on something that matters. Angelo is really excited that we're going to have interesting conversations around things like the universe, data centers, energy and how they work. There's a reason the hard problem exists. 
Don't fixate on the fact that it's a problem. Although there is joy in having a problem and solving it. We're trying something a little bit new this season and we would love to hear which kinds of episodes you like most. Do you like interviews or do you like some of the educational discussional episodes? We're going to start a YouTube channel to help deep dive on topics like LSM Trees or RocksDB which are better served with diagrams than with just voice. Seeing the math for yourself or seeing the way that they operate for yourself on video is much more helpful. We're going to have supplementary content, bonus material that you can find on our YouTube channel, and we'll also have some bonus podcast episodes. We look forward to your feedback. Tell us what you like about the show, which topics you prefer, and what you wish we would dive a little deeper on. And we'll really try to do that. Citations Gordon Moore, Co-Founder of Intel Heisenberg, Uncertainty Principle Powell, James. (2008). The Quantum Limit to Moore's Law. Proceedings of the IEEE. 96. 1247 - 1248. 10.1109/JPROC.2008.925411. Merritt, Rick. (2013). Moore’s Law Dead by 2022, Expert Says of EETimes Atomic Hire (2019) Further Reading Moore’s Law Ending Work and Culture at Google Google Strategy to Hire About the Host Angelo Kastroulis is an award-winning technologist, inventor, entrepreneur, speaker, data scientist, and author best known for his high-performance computing and Health IT experience. He is the principal consultant, lead architect, and owner of Carrera Group , a consulting firm specializing in software modernization, event streaming (Kafka), big data, analytics (Spark, elastic Search, and Graph), and high-performance software development on many technical stacks (Java, .net, Scala, C++, and Rust). 
A Data Scientist at heart, trained at the Harvard Data Systems Lab, Angelo enjoys a research-driven approach to creating powerful, massively scalable applications and innovating new methods for superior performance. He loves to educate, discover, then see the knowledge through to practical implementation.
Our Team: Host: Angelo Kastroulis; Executive Producer: Náture Kastroulis; Producer: Albert Perrotta; Communications Strategist: Albert Perrotta; Audio Engineer: Ryan Thompson; Music: All Things Grow by Oliver Worth
- Bonus: Season 1 Recap
Angelo begins this episode with reflections on history and what brought us to the AI Winter. Why do we need a balance between research and practice? You don’t want to rediscover what has already been discovered or settle for something that could be better if you took the time to research a bit more.
In Episode 4 we meet Angelo’s friend Andy Lee, who talks about computer science predicting our biological age. Andy actually met Greg Fahy, who talked about longevity. The study focused on injecting the thymus gland with a growth hormone that produced regeneration effects. The effects were measured through the epigenetic clock known as DNA methylation. In Episode 6, Jim Shalaby talks with Angelo about how COVID-19 changed healthcare forever. Patients don’t have to wait in waiting rooms, they don’t have to find transportation to get there, and the patient has access to the clinicians. In Episode 8 we talked about the hard problems associated with explainability in artificial neural networks. Angelo’s friend Nikos walked us through five classic problems, one of which is data privacy. Another big issue is developing a machine learning system to create adversarial attacks on the existing system. In Episode 7, Angelo’s friend Manos shared how complicated it is for people to invoke their right to have their data removed from a system. Typically those systems have to schedule deletions to remove the data through tombstones and a process called compacting.
What is on the horizon and what should we be paying attention to? We are going to run up against barriers of technology. For instance, Moore's Law is coming to an end. What do we do about that? What is happening in the short term and how do we get past this barrier to the next? And then how do we blow away all those barriers with moonshots like quantum computing? Finally, wrapping up our first season, Angelo wants to reflect on gratitude. Gratitude for you, our listeners.
Thank you so much for joining us on this journey. We really want to hear about your thoughts. The show is evolving just as the world is and we want to make sure that we're covering topics that you're interested in. We would love for you to follow, rate, and review the show on your favorite podcast platform so that others can find us too. Thank you so much for listening.
Our Guests - Thank you!: Nikos Myrtakis on LinkedIn; Manos Athanassoulis on LinkedIn and Boston University; Jim Shalaby on Twitter and LinkedIn; Andy Lee on Twitter and LinkedIn
About the Host: Angelo Kastroulis is an award-winning technologist, inventor, entrepreneur, speaker, data scientist, and author best known for his high-performance computing and Health IT experience. He is the principal consultant, lead architect, and owner of Carrera Group, a consulting firm specializing in software modernization, event streaming (Kafka), big data, analytics (Spark, Elasticsearch, and Graph), and high-performance software development on many technical stacks (Java, .NET, Scala, C++, and Rust). A Data Scientist at heart, trained at the Harvard Data Systems Lab, Angelo enjoys a research-driven approach to creating powerful, massively scalable applications and innovating new methods for superior performance. He loves to educate, discover, then see the knowledge through to practical implementation.
Our Team: Host: Angelo Kastroulis; Executive Producer: Kerri Patterson; Producer: Albert Perrotta; Communications Strategist: Albert Perrotta; Audio Engineer: Ryan Thompson; Music: All Things Grow by Oliver Worth
- Machine Learning: Your Right to Explainability
How do we make the next generation of machine learning models explainable? How do you start finding new kinds of models that might be explainable? Where do you even start thinking about that process from a research perspective? Nikos begins with a discussion on how we make decisions in general. In the scientific world, we mostly reason through statistical or cause-and-effect type scenarios. We can predict outcomes and train our models to produce the results we traditionally expect. He then discusses early pioneers in this work. For example, back in the 70s, a rules engine was developed to help clinicians make diagnoses. It turns out that humans are very complex and hard to codify. Dr. Charles Forgy wrote his thesis on the Rete algorithm, from which modern-day rules-based engines stem. After the AI winter period came neural networks that would encode the rules themselves. This became an issue for explainability: why was a given rule created? A neural network builds a mathematical, weighted data model evaluated against the outcome. The challenge in explaining the results we see is that we cannot open up the network to determine why some data was weighted higher than other data. There is also a concern arising from the European Union General Data Protection Regulation (GDPR), under which a human has the right to obtain meaningful information about the logic involved, commonly interpreted as the right to an explanation.
We want to look at explainability through two factors: a local point of view and a global point of view. The global objective is to extract a general summary that is representative of some specific data set. So we explain the whole model and not just local decisions. The local objective is to explain a simple prediction as a single individual observation in the data.
But you have a decision according to a neural network or a classifier or a regression algorithm, so the objective is to explain just a single observation. There are five problems that present themselves in explainability: Instability, Transparency, Adversarial Attacks, Privacy, and Analyst Perspective. For Instability, we look at heat maps as they are very sensitive to hyperparameters, meaning the way that we tuned that network. How we adjusted the sensitivity then impacts the interpretation. Transparency becomes more difficult the more accurate machine learning is. We call that transparency because machine learning models, neural networks, are black boxes with very high dimensionality. But what's interesting is that we can say that their prediction accuracy makes explainability inversely proportional to that. An Adversarial Attacks example is to imagine that interpretability might enable people, or programs to manipulate the system. So if one knows that for instance, having three credit cards can increase his chance of getting a loan then they can game the system by increasing their chance of getting the loan without really increasing the probability of repaying the loan. Privacy can impact your access to the original data especially in complex systems where boundaries can exist between other companies. You might not have the ability to access original data. Lastly, the Analyst Perspective. When a human gets involved to explain the system, important questions include, where to start first and how ensuring the interpretation aligns with how the model actually behaved. There are some systems by which the ML has multi-use and the human is trying to understand the perspective of use for the result given. These are some specific ways we have found that create the complexity and challenges in explainability with machine learning models. We continue to learn and adjust based on those learnings. 
This is a very interesting and important topic that we will continue to explore. Citations Dr. Charles Forgy (1979), On The Efficient Implementation of Production Systems, Carnegie Mellon University, ProQuest Dissertations Publishing, 1979, 7919143 Nadia Burkart, Marco F. Huber (2020) A Survey on the Explainability of Supervised Machine Learning, arXiv:2011.07876 (cs) Further Reading https://openaccess.thecvf.com/content_CVPR_2019/papers/Pope_Explainability_Methods_for_Graph_Convolutional_Neural_Networks_CVPR_2019_paper.pdf https://towardsdatascience.com/explainable-deep-neural-networks-2f40b89d4d6f Nikos' Papers: https://www.mdpi.com/2079-9292/8/8/832/htm https://link.springer.com/article/10.1007/s11423-020-09858-2 https://arxiv.org/pdf/2011.07876.pdf https://arxiv.org/pdf/2110.09467.pdf Host: Angelo Kastroulis Executive Producer: Kerri Patterson ; Producer: Leslie Jennings Rowley ; Communications Strategist: Albert Perrotta; Audio Engineer: Ryan Thompson Music: All Things Grow by Oliver Worth
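The difference between a global and a local explanation, as described in the explainability episode above, can be illustrated with a deliberately transparent model. This is only a sketch under stated assumptions: the linear loan-scoring model, its feature names, and its weights are all hypothetical and are not taken from the episode or from Nikos' papers. For a linear model, the weights themselves act as a global summary, while the per-feature contributions for one applicant act as a local explanation of that single prediction.

```python
# Hypothetical linear scoring model; the weights and feature names are
# invented for illustration and do not come from the episode.
WEIGHTS = {"income": 0.5, "credit_cards": 0.25, "late_payments": -1.0}
BIAS = -1.0


def score(applicant):
    """Model output for one applicant: bias plus the weighted sum of features."""
    return BIAS + sum(WEIGHTS[name] * applicant[name] for name in WEIGHTS)


def global_explanation():
    """Global view: rank features by how strongly the model weighs them overall."""
    return sorted(WEIGHTS, key=lambda name: abs(WEIGHTS[name]), reverse=True)


def local_explanation(applicant):
    """Local view: how much each feature contributed to this one prediction."""
    return {name: WEIGHTS[name] * applicant[name] for name in WEIGHTS}


applicant = {"income": 2.0, "credit_cards": 3.0, "late_payments": 1.0}
print("global ranking:", global_explanation())
print("local contributions:", local_explanation(applicant))
print("score:", score(applicant))
```

A neural network offers no such direct reading of its weights, which is why the episode treats explainability as a hard problem. The local view also hints at the adversarial concern raised above: once an applicant can see that the hypothetical `credit_cards` feature raises the score, it becomes something to game.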
- Counting Sand Dec 28 · 32m The Boundaries of Personal Data
Angelo and Manos' connection began in CS265, the Big Data Systems course at Harvard University. This course inspired Angelo's thesis. The two discuss Manos' papers and how the future of Big Data is on the boundaries of Moore's Law. If you think about LSM trees (Log-Structured Merge Trees) and compacting data, what is considered acceptable deletion when users ask for their data to be removed? Is it good enough once the data is dissociated from the identifying user? In the analysis of Big Data Systems, performance is always a consideration. An extensive delete sequence will cause a significant disruption in the system. Most people would wait for current execution cycles to complete, perhaps until non-peak hours, and flag the data that is no longer valid. But if your data starts to become dirty, then what? How do you solve issues like privacy and the request for the "Right to be forgotten" or the "Right to erase"? Manos speaks about the papers he has written, which you can read in the links below. He addresses the delete question and boundaries with privacy in mind. Performance is a crucial factor, and looking at the issue holistically is just as important as encryption when protecting privacy. Manos' Research Papers https://dl.acm.org/doi/10.1145/3318464.3389757 https://disc-projects.bu.edu/lethe/ https://blogs.bu.edu/mathan/2020/06/29/lets-talk-about-deletes/ Further Reading CS265: Big Data Systems - Spring 2020 Manos Athanassoulis homepage California Consumer Privacy Act - BCLP California Consumer Protection Act Information General Data Protection Regulation (GDPR) – Final text neatly arranged Fast 21 Chen Hao
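The delete question raised in this episode can be made concrete with a toy log-structured store. The sketch below is an assumption-laden illustration, not the design from Manos' papers or from any real LSM engine: `ToyLSM`, `TOMBSTONE`, and `physically_present` are hypothetical names, and real systems add levels, sorted files, and compaction policies that are omitted here. It shows only the common out-of-place pattern in which a delete writes a marker, so the old value is hidden from reads but remains in older runs until compaction rewrites them.

```python
TOMBSTONE = object()  # marker meaning "this key was deleted" (hypothetical name)


class ToyLSM:
    """Toy key-value store with a memtable and immutable runs, newest first."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit
        self.memtable = {}
        self.runs = []

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key):
        # Logical delete: nothing is erased, a tombstone is written instead.
        self.put(key, TOMBSTONE)

    def flush(self):
        if self.memtable:
            self.runs.insert(0, dict(self.memtable))
            self.memtable = {}

    def get(self, key):
        # Search newest to oldest; the first hit wins.
        for level in [self.memtable, *self.runs]:
            if key in level:
                value = level[key]
                return None if value is TOMBSTONE else value
        return None

    def physically_present(self, key):
        """True if a real (non-tombstone) value for key still sits in any run."""
        return any(key in run and run[key] is not TOMBSTONE for run in self.runs)

    def compact(self):
        # Merge all runs oldest to newest so newer entries win, then drop tombstones.
        self.flush()
        merged = {}
        for run in reversed(self.runs):
            merged.update(run)
        self.runs = [{k: v for k, v in merged.items() if v is not TOMBSTONE}]


db = ToyLSM()
db.put("user:1", "alice@example.com")
db.put("user:2", "bob@example.com")   # memtable is full, so it is flushed to a run
db.delete("user:1")
print(db.get("user:1"), db.physically_present("user:1"))  # hidden, but still stored
db.compact()
print(db.get("user:1"), db.physically_present("user:1"))  # gone after compaction
```

In this sketch, the gap between `delete` returning and `compact` running is exactly where the "Right to be forgotten" question bites: the user can no longer read the value, yet it still exists on storage, and forcing compaction sooner costs performance.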
- How did COVID-19 change the way doctors make decisions?
Angelo begins this episode with a few questions about the changes caused by COVID-19, specifically around gathering patient data such as blood pressure. With telemedicine, how reliable is the data, who is legally responsible for the accuracy of the data gathered, and how exactly do clinical decision support (CDS) tools adjust to this change in a traditional clinician workflow? Angelo explores the topic of IoT devices and the data they bring into medical decisions. Again, how accurate is the data from IoT devices, such as Fitbit scales, that a clinician can diagnose and treat from? Jim brings up some of the challenges that came with telemedicine, such as workflow within a clinic. If the clinician seeing a patient wants the dietitian to speak with the patient, that is harder to coordinate than when the two are within a few feet of each other. Another challenge relates to the security policies and considerations patients need to agree to regarding their personal privacy. To get into a virtual visit with a clinician, a patient has to follow a security protocol, which is a barrier for some elderly and disabled patients. Lastly, with all the data a patient could be collecting on their IoT devices, the challenge is how to move that data into the EHR or into some format a CDS tool could ingest. With the use of CDS, machine learning, and AI, the future is ripe for opportunity. Further Reading What is CDS - Health Gov IT ResearchGate Publication on IoT in Health Care Privacy-Preserving Single Decision Tree Jim Shalaby on Twitter and LinkedIn
- Rosenblatt's Perceptron: What Can Neural Networks Do For Us?
In any discussion of artificial intelligence and machine learning today, artificial neural networks are bound to come up. What are artificial neural networks, how have they developed, and what are they poised to do in the future? Host Angelo Kastroulis dives into the history, compares them to biological systems that they are meant to mimic, and talks about how hard problems like this one need to be handled carefully. Angelo begins with a discussion of how biological neural networks help make our brain a powerful computer of complexity. He then talks about how artificial neural networks recruit the same structures and connections to create artificial intelligence. To understand what we mean by artificial intelligence, Angelo explains how the Turing Test works and how Turing’s work forms a foundation for modern AI. He then discusses other early pioneers in this work, namely Frank Rosenblatt, who worked on models that could learn, or “perceptrons.” Angelo then relates the history of how this work was criticized by Marvin Minsky and Seymour Papert and how mistakes in their own work put the potential advances of artificial neural networks back by about two decades. Using image recognition as a case study, Angelo ends the episode by talking about various approaches’ benefits and drawbacks to illustrate what we can do with artificial neural networks today. Citations Hebb, D.O. (1949). The organization of behavior: A neuropsychological theory. New York: Wiley. Minsky, M. (1954). Theory of neural-analog reinforcement systems and its application to the brain-model problem. Doctoral dissertation. Princeton: Princeton University. Minsky, M. and Papert, S. (1969). Perceptrons: An introduction to computational geometry. Cambridge: MIT Press. Rosenblatt, F. (1957). "The perceptron: A perceiving and recognizing automaton." Buffalo: Cornell Aeronautical Laboratory, Inc.
(Accessible at https://blogs.umass.edu/brain-wars/files/2016/03/rosenblatt-1957.pdf ) Rosenblatt, F. (1962). Principles of neurodynamics: Perceptrons and the theory of brain mechanisms. Washington, D.C.: Spartan Books. Turing, A. (1950, October). "Computing machinery and intelligence," Mind, LIX: 236, pp. 433–460. https://doi.org/10.1093/mind/LIX.236.433 Further Reading Warren McCulloch and the McCulloch-Pitts Neuron Church-Turing Thesis Turing Test XOR or Exclusive or Host: Angelo Kastroulis Executive Producer: Kerri Patterson ; Producer: Leslie Jennings Rowley ; Communications Strategist: Albert Perrotta; Audio Engineer: Ryan Thompson Music: All Things Grow by Oliver Worth © 2021, Carrera Group
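For readers who want to see what a perceptron "learning" means in practice, here is a minimal sketch of a single threshold unit trained with the classic error-correction rule. It is an illustration under assumptions, not code from the episode or from Rosenblatt's report: the function names, the learning rate, and the epoch limit are all chosen for this example. It also touches on the XOR entry in the Further Reading list: a single unit can learn AND, which is linearly separable, but no single unit can classify all four XOR cases, a limitation commonly associated with the Minsky and Papert critique.

```python
AND_GATE = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_GATE = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def predict(weights, bias, inputs):
    """Threshold unit: fire (1) when the weighted sum plus bias is positive."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0


def train_perceptron(samples, epochs=25, learning_rate=1.0):
    """Error-correction rule: nudge the weights only when a prediction is wrong."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            if error != 0:
                mistakes += 1
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, inputs)]
                bias += learning_rate * error
        if mistakes == 0:  # every sample classified correctly, so stop early
            break
    return weights, bias


def accuracy(samples, weights, bias):
    """Fraction of samples the trained unit classifies correctly."""
    correct = sum(predict(weights, bias, inputs) == target
                  for inputs, target in samples)
    return correct / len(samples)


for name, gate in [("AND", AND_GATE), ("XOR", XOR_GATE)]:
    w, b = train_perceptron(gate)
    print(name, "accuracy:", accuracy(gate, w, b))
```

Training on AND settles on weights that classify every case, while training on XOR never reaches full accuracy no matter how many epochs are allowed, because no straight line separates its two classes. Adding a hidden layer removes that limitation, which is part of why later multi-layer networks revived the field.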
- Can Computer Science Help Us Live Longer?
Can computer science help us live longer? It is a complex question that requires an understanding of our anticipated lifespan and how the health of our body at this moment compares to the average person our age, what some would call calculating our biological age. What computer science stands to help with is understanding how better to pinpoint that biological age and what factors may play into holding off the aging process. Guest Andy Lee, of NeuroInitiative and Vincere Biosciences, speaks with host Angelo Kastroulis about this hard problem of predicting our biological age and potentially reversing it. Angelo begins with a discussion of some of the classical age models that have previously looked at the question of determining one’s biological age. He talks about classic papers by Horvath and Fahy that have changed the way scientists think about aging. He introduces the idea of epigenetics, or the study of the changes in living things that are caused by the modification of the way genes are expressed rather than the more classical mode of altering the genetic code itself. With his guest Andy Lee, founding CTO of NeuroInitiative and COO of Vincere Biosciences, Angelo dives deeper into DNA methylation signatures and the patterns that we can look at to begin to determine someone’s biological age. Andy describes how computer science and neural networks are modernizing these determinations and what that means for improving our longevity. The pair note the challenges posed by the sheer volume of genetic data and the difference that advances in data science can make in our ability to push this area forward, including therapeutics for diseases such as Parkinson’s. They talk about how computer science is bringing us transformative information so that we can then intervene and act on it.
About this Episode’s Guest Andy Lee is Co-Founder, Director, and CTO of NeuroInitiative, where he is co-inventor on multiple granted and pending patents surrounding the SEED simulation platform, as well as COO at Vincere Biosciences, Inc., a Cambridge, MA, company developing disease-modifying therapies for Parkinson's disease. Previously, Andy was VP of Engineering at Black Knight through F500 acquisition, spin-out, and IPO. He has led teams of over 100 members and continues to actively code to create new data-driven solutions. You can find out more on Twitter (@Andy_D_Lee) and LinkedIn. Citations Fahy, GM, Brooke, RT, Watson, JP, et al. Reversal of epigenetic aging and immunosenescent trends in humans. Aging Cell. 2019; 18:e13028. https://doi.org/10.1111/acel.13028 Horvath S. (2013). DNA methylation age of human tissues and cell types. Genome biology, 14(10), R115. https://doi.org/10.1186/gb-2013-14-10-r115 Johnson, A.A., Shokhirev, M., and Shoshitaishvili, B. (August 2019). Revamping the Evolutionary Theories of Aging. Ageing Research Reviews. 55. doi: 10.1016/j.arr.2019.100947. The Matt Walker Podcast. https://sleepdiplomat.com/podcast Further Reading Handbook of Epigenetics, 2nd Edition NeuroInitiative Vincere Biosciences Andy Lee on Twitter and LinkedIn About the Host Angelo Kastroulis is an award-winning technologist, inventor, entrepreneur, speaker, data scientist, and author best known for his high-performance computing and Health IT experience. He is the principal consultant, lead architect, and owner of Carrera Group, a consulting firm specializing in software modernization, event streaming (Kafka), big data, analytics (Spark, Elasticsearch, and Graph), and high-performance software development on many technical stacks (Java, .NET, Scala, C++, and Rust).
A Data Scientist at heart, trained at the Harvard Data Systems Lab, Angelo enjoys a research-driven approach to creating powerful, massively scalable applications and innovating new methods for superior performance. He loves to educate, discover, then see the knowledge through to practical implementation. Host: Angelo Kastroulis Executive Producer: Kerri Patterson ; Producer: Leslie Jennings Rowley ; Audio Engineer: Ryan Thompson; Communications Strategist: Albert Perrotta Music: All Things Grow by Oliver Worth © 2021, Carrera Group
- How Can Computer Science Improve Life?
In the most recent episode, host Angelo Kastroulis made a case for computer science as a potential force for good. In this continuation of the theme, he talks about why excellence is a value to strive toward and how it differs from perfection, how simplifying the question can lead to a more valuable answer, and how questions about personalized medicine point to the potential for quantum computing to make big improvements to life. Angelo begins with a revelation that excellence is a value that he holds dear. He distinguishes “quality” (which he defines as some standard that you're measuring yourself or others against to compare similar things) from “excellence,” which means being outstanding or extremely good when compared with peers. He cautions against striving for perfection, as the benefit rarely exceeds the cost of the pursuit. This spurs musings about the limits of continuous improvement and how to balance costs and quality. He illustrates this through the example of parsing JSON in REST servers and asks how, instead of micro-optimizing a solution, we might go about eliminating serialization completely. Ultimately, he posits that the single most important factor that separates a mediocre developer from a phenomenal one is the ability of the latter to step away from the decades-old tendency to simply meet requirements and instead find ways to rethink the problem space, eliminating limitations at the outset rather than mitigating them later. Relatedly, he contends that smaller, attainable goals are far more valuable than very lofty, unreachable ones because we can actually achieve the small goals. Angelo recounts part of a conversation he had with Kerri Patterson, Chief Strategy Officer at Carrera, on the importance of solving really hard problems and how they can impact our lives. She spoke about how most healthcare data systems are set up based on insurance rules, not patient outcomes and needs.
She suggests incorporating quantum computing into clinical decision support and, ultimately, personalized medicine. This is a taste of conversations that will appear in more detail later in the series. Angelo concludes this episode with a brief discussion of what machine learning is, how its two main categories (unsupervised and supervised learning) differ, and how these concepts will fuel much of the content in the upcoming episodes of Counting Sand. About the Host Angelo Kastroulis is an award-winning technologist, inventor, entrepreneur, speaker, data scientist, and author best known for his high-performance computing and Health IT experience. He is the principal consultant, lead architect, and owner of Carrera Group, a consulting firm specializing in software modernization, event streaming (Kafka), big data, analytics (Spark, Elasticsearch, and Graph), and high-performance software development on many technical stacks (Java, .NET, Scala, C++, and Rust). A Data Scientist at heart, trained at the Harvard Data Systems Lab, Angelo enjoys a research-driven approach to creating powerful, massively scalable applications and innovating new methods for superior performance. He loves to educate, discover, then see the knowledge through to practical implementation. Citations Bruce, V., and Young, A. (1986). Understanding face recognition. British Journal of Psychology, 77, 305-327. doi: 10.1111/j.2044-8295.1986.tb02199.x Foer, J. (2012, February). Feats of Memory Anyone Can Do. TED2012. Retrieved September 16, 2021, from https://www.ted.com/talks/joshua_foer_feats_of_memory_anyone_can_do Siegler, M.G. (2010, August 4). Eric Schmidt: Every 2 Days We Create As Much Information As We Did Up To 2003. Retrieved September 16, 2021, from https://techcrunch.com/2010/08/04/schmidt-data/ SINTEF. (2013, May 22). Big Data, for better or worse: 90% of data generated over last two years.
Retrieved September 16, 2021, from https://sciencedaily.com/releases/2013/05/130522085217.htm Further Reading Urs Hölzle on Infrastructure for the Long Term More on Face Perception Hippocampus Parietal Lobe Method of Loci Host: Angelo Kastroulis Executive Producer: Kerri Patterson ; Producer: Leslie Jennings Rowley ; Communications Strategist: Albert Perrotta Music: All Things Grow by Oliver Worth © 2021, Carrera Group
- Can Computer Science Make Life Better?
Can computer science improve our lives? Most people who work in the field would like to think so. Host Angelo Kastroulis, CEO of the Carrera Group, considers the ways that computer science is poised to make our lives better but also introduces caveats such as bias in machine learning. Along the way he sets up future episodes’ themes, including predictive analytics, simulation, and health decision support systems, that he will dive into with more technical detail. Acknowledging that technology is neither a panacea nor a tool without downsides, Angelo starts with a review of some research on the psychological and sociological effects of social media. Beyond social media, he also questions the predictive ability of big data and introduces the idea of bias in machine learning. He does this through recalling a chance encounter with an old friend and fellow computer scientist, Andy Lee. Andy is the chief technology officer and founder of NeuroInitiative, a company that uses advanced simulation techniques to try to create new drug compounds, as well as chief operating officer of Vincere Biosciences, a company that takes these drug compounds all the way to human trials and hopefully to the market. Andy talks about how, since bias is unavoidable, we should find a way to make this weakness the strength of the model. In considering this, Angelo defines key concepts such as a model’s features, what accuracy means, and why it is important not to conflate correlation with causation. He shares the important axiom "All machine learning models are bad and some are less bad than others" and exhorts listeners to “Never lie with stats.” He ends by suggesting a few actionable ways that computer science can help our lives become better, setting up themes, including predictive analytics, simulation, and health decision support systems, that he will dive into with more technical detail in future episodes of Counting Sand.
About the Host Angelo Kastroulis is an award-winning technologist, inventor, entrepreneur, speaker, data scientist, and author best known for his high-performance computing and Health IT experience. He is the principal consultant, lead architect, and owner of Carrera Group, a consulting firm specializing in software modernization, event streaming (Kafka), big data, analytics (Spark, Elasticsearch, and Graph), and high-performance software development on many technical stacks (Java, .NET, Scala, C++, and Rust). A Data Scientist at heart, trained at the Harvard Data Systems Lab, Angelo enjoys a research-driven approach to creating powerful, massively scalable applications and innovating new methods for superior performance. He loves to educate, discover, then see the knowledge through to practical implementation. Citations Bruce, V., and Young, A. (1986). Understanding face recognition. British Journal of Psychology, 77, 305-327. doi: 10.1111/j.2044-8295.1986.tb02199.x Cameron, S. (2018, November 12). Shark Attacks, Ice Creams, and the Randomised Trial. Retrieved September 16, 2021, from https://the-gist.org/2018/11/shark-attacks-ice-creams-and-the-randomised-trial/ Data Never Sleeps Infographic. (n.d.). Domo.com. Retrieved September 16, 2021, from https://www.domo.com/learn/infographic/data-never-sleeps-8 McLean Hospital. (2021, February 9). Here’s How Social Media Affects Your Mental Health. Retrieved September 16, 2021, from https://www.mcleanhospital.org/essential/it-or-not-social-medias-affecting-your-mental-health World Happiness 2019 Chapter 2. (n.d.).
Retrieved September 16, 2021, from https://worldhappiness.report/ed/2019/changing-world-happiness/ Further Reading Is Social Media Bad For You: The Evidence and the Unknowns World Happiness Report 2021 Person Perception 25 Years after Bruce and Young (1986) Host: Angelo Kastroulis Executive Producer: Kerri Patterson ; Producer: Leslie Jennings Rowley ; Audio Engineer: Mert Çetinkaya; Communications Strategist: Albert Perrotta Music: All Things Grow by Oliver Worth © 2021, Carrera Group
- Counting Sand Oct 5 · 37m Inspired by Archimedes...Counting Sand
How much sand would it take to fill the universe? And what does this 2,000-year-old question have to do with a podcast on today’s big data challenges? In this kick-off episode of the Counting Sand podcast, host Angelo Kastroulis, CEO of Carrera Group, explains how an early research paper by Archimedes of Syracuse has much in common with his own approach to today’s big questions in data science and how the paper provides not only a metaphor for how we can meld research and practice in tackling today’s big problems but also the inspiration for the perfect podcast name. In order to explain the origin of the name of this podcast, Angelo starts with a little history on Archimedes, as both a practical designer and also a scientist interested in the theoretical underpinnings of mathematical principles. Angelo then talks about some important research by Archimedes but begins by explaining what a research paper is, what the history of research papers is, and why anyone undertakes writing one. He then spends time talking about Archimedes’ paper that attempts to spell out how many grains of sand would be needed to fill the universe. Of course, to answer this, Archimedes needed to approximate the size of the universe and, in order to do that, he had to develop a new number system. Angelo, who himself has both a Greek and entrepreneurial heritage, begins to draw parallels between Archimedes' approach to the sand problem and his own approach to understanding and addressing big problems today. He talks about his journey to find the balance of the theoretical and practical, just as Archimedes did, applying a rigorous methodology, dealing with disappointment, and exercising patience.
Angelo shares his first operating axiom: “When the solution isn’t readily apparent, be patient, keep researching; the solution will present itself.” In his work as a data scientist and technologist best known for his high-performance computing and Health IT experience, Angelo uses this process time and again. In this episode he gives examples from his own research career and the applications he has developed. Ultimately he shares his axiom #2: “If you find yourself doing too much theory, do more application and it will make your theory better. If you find yourself doing too much application, do more theory and it will make your application better.” As Angelo says, Counting Sand will be a bit different than other podcasts. We will talk about some big problems and both discuss the theory behind potential solutions and see how they can be applied to tackle real problems. We are excited to bring listeners along for the ride. Citations Bourne, S. (2004, December 6). A Conversation with Bruce Lindsay. A conversation with Bruce Lindsay – ACM Queue. Retrieved October 4, 2021, from https://queue.acm.org/detail.cfm?id=1036486. Heath, T.G. (2020). The Sand-Reckoner of Archimedes (Vol. 1). Library of Alexandria. Kastroulis, A. (2019). Towards Learned Access Path Selection: Using Artificial Intelligence to Determine the Decision Boundary of Scan vs Index Probes in Data Systems (Doctoral dissertation, Harvard University) Further Reading On Archimedes’ Sand Reckoner Angelo Kastroulis’ Harvard master’s thesis The Harvard Data Systems Lab “Publish or Perish” About the Host Angelo Kastroulis is an award-winning technologist, inventor, entrepreneur, speaker, data scientist, and author best known for his high-performance computing and Health IT experience.
He is the principal consultant, lead architect, and owner of Carrera Group, a consulting firm specializing in software modernization, event streaming (Kafka), big data, analytics (Spark, Elasticsearch, and Graph), and high-performance software development on many technical stacks (Java, .NET, Scala, C++, and Rust). A Data Scientist at heart, trained at the Harvard Data Systems Lab, Angelo enjoys a research-driven approach to creating powerful, massively scalable applications and innovating new methods for superior performance. He loves to educate, discover, then see the knowledge through to practical implementation. Host: Angelo Kastroulis Executive Producer: Kerri Patterson ; Producer: Leslie Jennings Rowley ; Communications Strategist: Albert Perrotta Music: All Things Grow by Oliver Worth © 2021, Carrera Group
- Introducing Counting Sand
How many grains of sand are there in the universe? To answer that hard question, Archimedes had to invent an entire number system and only then could he attempt the computation. Today’s data challenges make that look easy. On the COUNTING SAND podcast, host Angelo Kastroulis tackles the hard problem of how to make meaning of all the data available today. Every few episodes, Angelo will introduce a new theme at the heart of big data and IT infrastructure and highlight the most cutting-edge applications. Whether discussing the best designs for a complex data system or the social implications of bringing a diverse skill set to data science, each new episode will provide research-backed perspectives on today’s hardest problems. If you are pushing boundaries on methodologies that solve complex problems and want to stay up to date on the latest industry trends in Artificial Intelligence, Systems Thinking, and Big Data in general, our discussions will give you insight that you need now. Starting October 5, 2021. Follow us now on your favorite podcasting platform or find us on CountingSandShow.com About the Host Angelo Kastroulis is an award-winning technologist, inventor, entrepreneur, speaker, data scientist, and author best known for his high-performance computing and Health IT experience. He is the principal consultant, lead architect, and owner of Carrera Group, a consulting firm specializing in software modernization, event streaming (Kafka), big data, analytics (Spark, Elasticsearch, and Graph), and high-performance software development on many technical stacks (Java, .NET, Scala, C++, and Rust). A Data Scientist at heart, trained at the Harvard Data Systems Lab, Angelo enjoys a research-driven approach to creating powerful, massively scalable applications and innovating new methods for superior performance. He loves to educate, discover, then see the knowledge through to practical implementation.
Host: Angelo Kastroulis Executive Producer: Kerri Patterson; Producer: Leslie Jennings Rowley ; Communications Strategist: Albert Perrotta Music: All Things Grow by Oliver Worth © 2021, Carrera Group