Top Talent like Lakshmi Is on Pangea

Pangea, a YC company, connects companies with fractional talent. Fractional hiring allows companies to move faster and work with more specialized talent, while giving talent more flexibility and independence. If you are talent open to fractional work, apply here. If you’re a company looking for high-quality fractional talent, learn more here.

Lakshmi Muthukumar

Data Analyst | Hillsboro, OR, US
Machine Learning
Data Science
Statistical Modeling
Statistical Analysis
Numerical Analysis
Web Scraping
Data Mining
Available for hire from: Negotiable
Full-Time Roles
Contracts
Data Scientist | Machine Learning Engineer | Data Analyst
Data scientist proficient in Python, SQL, and Azure ML. I also have research experience in cheminformatics and AI, which helps me solve problems logically.

Projects

Deep Learning: Baby Cry Prediction

GitHub | Streamlit App
- Worked with Cappella (on Pangea), an early-stage, MIT-founded startup building an AI-driven baby cry translator.
- The data for the baby cry translation app comes from the donateacry-corpus by Gabor Veres.
- Performed data exploration, data preprocessing, feature scaling, and feature engineering in Python using pandas, NumPy, and librosa.
- Used librosa, a Python library for audio signal analysis, to extract features such as Mel-Frequency Cepstral Coefficients (MFCCs), Chroma Energy Normalized Statistics (CENS), Spectral Centroid, Spectral Contrast, Spectral Rolloff, and Zero Crossing Rate, along with other audio features such as tempo.
- Exploratory analysis revealed a class imbalance, so SMOTE (Synthetic Minority Over-sampling Technique), which oversamples the minority class with synthetic examples, was used to rebalance the classes.
- Built a multi-class audio classification model with a CNN in Keras/TensorFlow that assigns each baby cry clip to its corresponding category.
- Achieved an accuracy of about 93% and proposed other deep learning methods for further improving the model, along with gathering more data beyond open-source platforms.
- Built an AI-powered Streamlit app around the trained deep learning model for demo purposes.
- Communicated data collection results and requirements, working with other data sources to identify an optimal decision-making strategy.
- Responsible for developing system models, prediction algorithms, solutions to prescriptive analytics problems, and data mining techniques for the proposed business question.
Environment: Python, pandas, NumPy, Matplotlib, librosa, Keras, TensorFlow, Streamlit, scikit-learn, imblearn, GitHub.
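The feature-extraction and rebalancing steps above can be illustrated with NumPy alone. This is a simplified sketch, not the project's code: librosa computes these features directly (for example librosa.feature.zero_crossing_rate and librosa.feature.spectral_centroid), and imblearn provides SMOTE().fit_resample(X, y). The helper names, the 2048/512 frame sizes, and the 440 Hz test tone are assumptions, and the NumPy versions skip librosa's centering and padding, so their values will not match librosa exactly.

```python
import numpy as np


def frame_signal(signal, frame_length=2048, hop_length=512):
    """Split a 1-D signal into overlapping frames (no centering or padding)."""
    n_frames = 1 + (len(signal) - frame_length) // hop_length
    starts = hop_length * np.arange(n_frames)
    idx = starts[:, None] + np.arange(frame_length)[None, :]
    return signal[idx]


def zero_crossing_rate(signal, frame_length=2048, hop_length=512):
    """Fraction of adjacent sample pairs in each frame whose sign differs."""
    signs = np.signbit(frame_signal(signal, frame_length, hop_length))
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)


def spectral_centroid(signal, sr, frame_length=2048, hop_length=512):
    """Magnitude-weighted mean frequency (Hz) of each Hann-windowed frame."""
    frames = frame_signal(signal, frame_length, hop_length) * np.hanning(frame_length)
    mags = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frame_length, d=1.0 / sr)
    return (mags @ freqs) / np.maximum(mags.sum(axis=1), 1e-12)


def clip_features(signal, sr):
    """Hypothetical per-clip summary: [mean ZCR, mean spectral centroid]."""
    return np.array([zero_crossing_rate(signal).mean(),
                     spectral_centroid(signal, sr).mean()])


def smote(X, y, k=5, seed=0):
    """Oversample every minority class up to the majority-class count.

    Each synthetic point lies on the segment between a minority sample and
    one of its k nearest neighbours from the same class.
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        n_new = counts.max() - count
        if n_new == 0:
            continue
        X_c = X[y == cls]
        if len(X_c) < 2:
            raise ValueError("SMOTE needs at least two samples per class")
        k_eff = min(k, len(X_c) - 1)
        dists = np.linalg.norm(X_c[:, None, :] - X_c[None, :, :], axis=2)
        np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbour
        neighbours = np.argsort(dists, axis=1)[:, :k_eff]
        base = rng.integers(0, len(X_c), size=n_new)
        partner = neighbours[base, rng.integers(0, k_eff, size=n_new)]
        gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
        X_parts.append(X_c[base] + gap * (X_c[partner] - X_c[base]))
        y_parts.append(np.full(n_new, cls))
    return np.vstack(X_parts), np.concatenate(y_parts)


# Usage on a synthetic 1-second tone (a hypothetical stand-in for a cry clip).
sr = 22050
t = np.arange(sr) / sr
tone_feats = clip_features(np.sin(2 * np.pi * 440 * t), sr)

# Imbalanced toy feature matrix: 20 samples of class 0, 5 of class 1.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (5, 2))])
y = np.array([0] * 20 + [1] * 5)
X_bal, y_bal = smote(X, y)
```

In a real pipeline, SMOTE should be fit on the training split only, so that synthetic samples derived from test clips cannot leak into the evaluation.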

A/B Testing: Student Performance - Test Score Analysis

1. Introduction:
- This project analyzes the test performance of high school students from across the U.S.
- The data comes from Kaggle and has 1,000 observations and 8 variables.
- The analysis used visualizations, descriptive statistics, Shapiro-Wilk normality tests, and independent-sample t-tests for hypothesis testing.
2. Recommendations:
- Women do better in literature, and men do better in engineering and allied subjects.
- Students of different races perform differently in every category.
- Parents with a higher level of education have children who perform better in all categories.
- We can infer that students with higher household incomes perform better in the categories.
- We can infer that students with higher motivation to complete the test preparation course perform better in the categories.
3. Conclusions:
- Sampling bias is common in such studies.
- The sample size for such a geographically diverse group should be considered.
- The data is generic; location-specific data might improve the analysis.
- In addition to the current variables, internet access or access to external help could be important variables.
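A minimal sketch of the testing workflow described above (a Shapiro-Wilk normality check on each group, then an independent-sample t-test), assuming SciPy. The two groups are synthetic, not the Kaggle data, and the function name compare_groups, alpha = 0.05, and the choice of Welch's variant (equal_var=False) are assumptions, since the project does not say which settings were used.

```python
import numpy as np
from scipy import stats


def compare_groups(a, b, alpha=0.05):
    """Check normality of each group, then run an independent two-sample t-test."""
    # Shapiro-Wilk: a small p-value suggests the group is not normally distributed.
    normal_a = stats.shapiro(a).pvalue > alpha
    normal_b = stats.shapiro(b).pvalue > alpha
    # Welch's t-test (equal_var=False) does not assume equal group variances.
    result = stats.ttest_ind(a, b, equal_var=False)
    return {
        "normal_a": bool(normal_a),
        "normal_b": bool(normal_b),
        "t_stat": float(result.statistic),
        "p_value": float(result.pvalue),
        "significant": bool(result.pvalue < alpha),
    }


# Hypothetical scores: students who skipped vs. completed test preparation.
rng = np.random.default_rng(0)
skipped = rng.normal(loc=65, scale=12, size=300)
completed = rng.normal(loc=72, scale=12, size=200)
summary = compare_groups(skipped, completed)
```

If Shapiro-Wilk rejects normality for a small group, a rank-based test such as scipy.stats.mannwhitneyu is a common alternative to the t-test.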

Supervised Machine Learning: Credit Card Approval Prediction

1. Description:
- Credit Card Approval Prediction: used machine learning to determine which features affect the approval of customer credit card applications. The dataset is available from the UCI Machine Learning Repository.
- The data file concerns credit card applications. All attribute names and values have been changed to meaningless symbols to protect the confidentiality of the data.
- The dataset is interesting because it has a good mix of attributes: continuous, nominal with a small number of values, and nominal with a larger number of values. There are also a few missing values.
2. Machine Learning Models:
(A) Logistic Regression (B) Decision Trees (C) Random Forests (D) SVM (E) Stochastic Gradient Descent (F) KNN
Top 2 models for the dataset:
1. Logistic Regression: hyperparameters were tuned, and the parameters used in the model are kernel = 'poly', degree = 3, gamma = 'auto'. Accuracy was 86% for this model.
2. Random Forest: hyperparameters were tuned, and the parameters used in the model are criterion = 'entropy', max_depth = None, min_samples_split = 3. Accuracy was 85% for this model. The variable importance plot indicates that Prior Default is the most important variable.
3. Conclusions:
- The applicant's approval status is most strongly correlated with Prior Default.
- The data was split into training and test sets; the models were trained on the training set and evaluated for accuracy on the test set.
- The Logistic Regression model performs best, with an accuracy of 86%.
4. Future Scope:
- The models' accuracy is not high. A larger dataset with more information on the applicants would improve prediction.
- The lack of information about how the dataset was created may limit how well approval status can be predicted using only the generalized applicant data as predictors.
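The train/test comparison described above might look like the following scikit-learn sketch. It uses a synthetic dataset from make_classification as a hypothetical stand-in for the anonymized UCI data, and skips the imputation of missing values and encoding of nominal columns (for example with SimpleImputer and OneHotEncoder) that the real data would need. Note that kernel, degree, and gamma are parameters of scikit-learn's SVC rather than LogisticRegression, so this sketch pairs the reported 'poly'/3/'auto' settings with an SVM and fits a plain logistic regression alongside it; the random-forest settings are the ones listed above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_models(X, y, seed=42):
    """Fit each model on a training split and report test-set accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    models = {
        # Linear and kernel models are sensitive to feature scale, so scale first.
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "svm_poly": make_pipeline(
            StandardScaler(), SVC(kernel="poly", degree=3, gamma="auto")
        ),
        "random_forest": RandomForestClassifier(
            criterion="entropy", max_depth=None, min_samples_split=3,
            random_state=seed,
        ),
    }
    accuracy = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        accuracy[name] = model.score(X_test, y_test)  # mean test accuracy
    return accuracy, models["random_forest"]


# Synthetic stand-in data (hypothetical sizes).
X, y = make_classification(n_samples=690, n_features=15, n_informative=5,
                           random_state=0)
accuracy, forest = compare_models(X, y)
# Index of the feature the forest relies on most (the "variable importance" step).
top_feature = int(np.argmax(forest.feature_importances_))
```

Because the test set is held out from training, the reported accuracies estimate how each model generalizes rather than how well it memorizes the training data.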

Unsupervised Learning: Credit Card Customer Segmentation

1. Dataset Description:
- The dataset chosen for this project is the Credit Card Dataset from Kaggle.
- The task is to develop a customer segmentation to define a marketing strategy.
- The sample dataset summarizes the usage behavior of about 9,000 active credit card holders during the last 6 months. The file is at the customer level, with 18 behavioral variables.
2. Unsupervised Machine Learning:
- The data was first standardized for clustering analysis.
- Evaluation and model selection
- Visualization of clusters
- Interpretation of clusters
Models:
(A) KMeans
- The tuned hyperparameters for KMeans are n_clusters = 3, n_init = 1000, max_iter = 400, init = 'k-means++'.
- The silhouette score for KMeans is 0.2509.
(B) Agglomerative Hierarchical
- The tuned hyperparameters for Agglomerative Clustering are n_clusters = 3, affinity = 'euclidean', linkage = 'ward'.
- The silhouette score for Agglomerative Clustering is 0.1731.
(C) DBSCAN
- The tuned hyperparameters for DBSCAN are eps = 4, metric = 'euclidean', min_samples = 3.
- The silhouette score for DBSCAN is 0.6239.
(D) Gaussian Mixture
- The tuned hyperparameters for Gaussian Mixture Models are n_clusters = 3, affinity = 'euclidean', linkage = 'ward'.
- The silhouette score for Gaussian Mixture Models is 0.1657.
3. Final Model Selection:
- We compared the silhouette scores of all the models.
- By silhouette score alone, DBSCAN is the best model, but PCA tells a different story: the DBSCAN clusters are not distinct, and most points are labeled as noise.
- KMeans is the best model for this dataset.
4. Conclusions:
- We performed data preprocessing and feature extraction with PCA, examined clustering metrics (inertia and silhouette scores), and experimented with various clustering algorithms and data visualizations.
- We segmented the customers into three smaller groups: the Active Users, the Cautious Spenders, and the Average Joe.
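The model comparison above can be sketched with scikit-learn on synthetic blobs standing in for the Kaggle data. Hyperparameters follow the listed values where scikit-learn supports them, with stated assumptions: n_init is reduced from 1000 to 10 to keep the example fast; the Gaussian mixture uses n_components=3 because GaussianMixture has no n_clusters, affinity, or linkage parameters; and the affinity argument is omitted for agglomerative clustering because ward linkage already implies Euclidean distance (newer scikit-learn releases renamed affinity to metric). The helper names are hypothetical. The noise_fraction field shows why DBSCAN's silhouette can mislead: silhouette_score treats DBSCAN's noise label (-1) as just another cluster.

```python
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler


def safe_silhouette(X, labels):
    """Silhouette score, or NaN when it is undefined for this labelling."""
    n_labels = len(np.unique(labels))
    if 2 <= n_labels <= len(X) - 1:
        return float(silhouette_score(X, labels))
    return float("nan")


def compare_clusterings(X, seed=0):
    """Fit each clustering model on standardized X and summarize its labels."""
    models = {
        "kmeans": KMeans(n_clusters=3, init="k-means++", n_init=10,
                         max_iter=400, random_state=seed),
        "agglomerative": AgglomerativeClustering(n_clusters=3, linkage="ward"),
        "dbscan": DBSCAN(eps=4, min_samples=3, metric="euclidean"),
        "gmm": GaussianMixture(n_components=3, random_state=seed),
    }
    results = {}
    for name, model in models.items():
        labels = model.fit_predict(X)
        results[name] = {
            "silhouette": safe_silhouette(X, labels),
            # Only DBSCAN emits -1 (noise); other models always give 0.0 here.
            "noise_fraction": float(np.mean(labels == -1)),
            "n_clusters": int(len(set(labels) - {-1})),
        }
    return results


# Synthetic stand-in data (hypothetical), standardized before clustering.
X_raw, _ = make_blobs(n_samples=300, centers=3, n_features=5, random_state=0)
X = StandardScaler().fit_transform(X_raw)
results = compare_clusterings(X)
coords_2d = PCA(n_components=2).fit_transform(X)  # 2-D projection for plotting
```

Plotting coords_2d coloured by each model's labels is the kind of PCA check that can reveal a high silhouette score coming mostly from points labeled as noise.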

Work History


Instructor in Charge

Intellicircle | May 2017 - Dec 2019 • 2 yrs 8 mos
I taught and graded students from K-4 through K-12 in math and coding.

Freelancer

Upwork
I freelance on a wide range of projects, from web scraping and teaching Python to data analysis and machine learning.

Education

Texas Tech University

Professional Degree, Chemical Engineering | Jun 2010 - Mar 2015

How Pangea Works

Effortlessly discover top talent

We’ve distilled the candidate search from endless hours down to just a few minutes. Using Pangea’s AI-powered search tools, you can find top fractional talent able to take on your next project. Our system looks at your company’s niche and your needs to find the perfect match faster than any traditional hiring platform.

Start working with talent today

The top talent on Pangea is ready to get started with you right now. You can message or hire a candidate right from their profile page and start assigning work as soon as they respond. And the best part? Pangea’s fractional contract structure lets you start small and ramp up as your needs change, keeping your costs manageable and your team’s capabilities flexible.

Track work and invoices in one place

Assign tasks, track progress, and complete invoices all on Pangea. We’ve combined every part of the hiring process into one platform to eliminate the miscommunication that’s unavoidable on other freelance platforms. We even send out 1099s to your contractors at the end of the year!

Talk with a Talent Expert

Members of our team are available to help you speed through the hiring process.
Pangea empowers fractional work across the world for marketing and design roles.