
Big Data Machine Learning Engineer Resume


New York City, NY

SUMMARY

  • 9+ years of hands-on experience in Data Science and Analytics, including BigQuery, SQL, data collection, data warehousing, data cleaning, featurization, feature engineering, data mining, machine learning, and statistical analysis on large structured datasets.
  • Experience developing an internal web application that centralized data acquisition, using HTML/CSS, AngularJS, TypeScript, and Agile software development.
  • Strong experience building cloud data lakes using AWS S3, AWS Glue, Glue Data Catalog, Serverless Framework, Lambda, RDS, Aurora, EC2, SNS, SQS, IAM, ECS, CloudFormation, and CloudWatch.
  • Experienced in the Big Data ecosystem with Hadoop, HDFS, MapReduce, Pig, Hive, HBase, Impala, Sqoop, Flume, Kafka, Oozie, Spark, PySpark, and Spark Streaming.
  • Expert in Core Java with in-depth knowledge of concepts such as multithreading, synchronization, collections, and event/exception handling. Experienced in the Software Development Life Cycle (SDLC) of projects: system study, analysis, physical and logical design, resource planning, coding, and implementation of business applications.
  • Experience ingesting real-time data from various sources through Kafka data pipelines and applying transformations to normalize the data stored in an HDFS data lake (see the streaming sketch after this list).
  • Theoretical foundations and practical hands-on projects related to (i) supervised learning (linear and logistic regression, boosted decision trees, Support Vector Machines, neural networks, NLP), (ii) unsupervised learning (clustering, dimensionality reduction, recommender systems), and (iii) probability and statistics, experiment analysis.
  • Experience creating complex data pipeline processes using T-SQL scripts, SSIS packages, Alteryx workflows, PL/SQL scripts, cloud REST APIs, Python scripts, GCP Composer, and GCP Dataflow.
  • Experience creating pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark with Databricks.
  • Experience with structured (MySQL, Oracle SQL, PostgreSQL) and unstructured (NoSQL) databases, with a strong understanding of relational databases. Familiar with cross-platform ETL using Python/Java SQL connectors and PySpark DataFrames.
  • Experience with container-based deployments using Docker, working with Docker images, Docker Hub, Docker registries, and Kubernetes.
  • Experience in SQL Server Analysis Services (SSAS) and SQL Server Reporting Services (SSRS) tools.
  • Experience implementing batch and real-time data pipelines using AWS services: S3, Lambda, DynamoDB, Redshift, EMR, and Kinesis.
  • Good knowledge of Apache Hadoop ecosystem components: Spark, Cassandra, HDFS, Hive, Sqoop, and Airflow.
  • Knowledge of Google Cloud Platform (GCP) services such as Compute Engine, Cloud Load Balancing, Cloud Storage, Dataproc, Cloud Pub/Sub, Cloud SQL, BigQuery, Stackdriver monitoring, and Deployment Manager.
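
A minimal sketch of the Kafka-to-HDFS normalization pipeline described above, using PySpark Structured Streaming; the broker address, topic name, event schema, and HDFS paths are all hypothetical placeholders:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

# Hypothetical schema for the incoming JSON events.
schema = StructType([
    StructField("user_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

# Read the raw Kafka stream; broker and topic are placeholders.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "events")
       .load())

# Normalize: parse the JSON payload and flatten it into columns.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), schema).alias("e"))
          .select("e.*"))

# Land the normalized records in the HDFS data lake as Parquet.
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/lake/events")
         .option("checkpointLocation", "hdfs:///data/checkpoints/events")
         .start())
query.awaitTermination()
```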

PROFESSIONAL EXPERIENCE

Big Data Machine Learning Engineer

Confidential, New York City, NY

Responsibilities:

  • Conduct in-depth exploration of data to determine opportunities for improvement and present recommendations to supervisor and client.
  • Perform strategic data analysis and research in alignment with identified business needs and requirements.
  • Expertly extract data from multiple databases, and manipulate and explore data utilizing quantitative, statistical, and visualization tools.
  • Prioritize data sources by performing statistical analysis to determine which data to use.
  • Create intuitive static and interactive visualizations from routine program monitoring data for websites, annual and quarterly reports, and presentations to stakeholders.
  • Extract data from the Hadoop data lake and Excel; perform analysis, data cleansing, sorting, and merge reporting; and expertly utilize SQL, Hive, and Excel to create dashboards.
  • Apply mathematical and statistical modeling knowledge to select appropriate techniques and to establish and maintain processes for validating and updating predictive models.
  • Design and implement statistical models, predictive models, enterprise data models, metadata solutions, and data life-cycle management in both RDBMS and Big Data environments.
  • Skillfully utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, TensorFlow, MLlib, and Python, along with a broad variety of machine learning methods including classification, regression, and dimensionality reduction.
  • Deliver expertise working with different data formats such as JSON and XML, and apply machine learning algorithms in R.
  • Apply various machine learning algorithms and statistical models, such as decision trees, text analytics, natural language processing (NLP), supervised and unsupervised learning, regression models, neural networks, deep learning, SVM, and clustering, to identify volume using the scikit-learn package in Python.
  • Implemented Python REST APIs using Flask to serve machine learning models (see the sketch after this list).
  • Implemented models such as Logistic Regression, Random Forest, and Gradient-Boosted Trees to predict whether a given die will pass or fail testing.
  • Perform data analysis using Hive to retrieve data from the Hadoop cluster, SQL to retrieve data from the database, and ETL for data transformation.
  • Perform data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
  • Communicate results to the operations team to support better decision-making.
  • Collect data needs and requirements by interacting with other departments.
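
A minimal sketch of the kind of Flask service described above: a scikit-learn classifier (a random forest here) trained to predict die pass/fail, then exposed over a REST endpoint. The feature layout, training data, and port are hypothetical stand-ins:

```python
import numpy as np
from flask import Flask, jsonify, request
from sklearn.ensemble import RandomForestClassifier

app = Flask(__name__)

# In practice the model trains on historical wafer-test data;
# synthetic data stands in here.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 4))             # 4 hypothetical test measurements
y_train = (X_train.sum(axis=1) > 0).astype(int)  # 1 = pass, 0 = fail
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

@app.route("/predict", methods=["POST"])
def predict():
    # Expects a JSON body like {"features": [0.1, -0.3, 1.2, 0.7]}.
    features = request.get_json()["features"]
    proba = model.predict_proba([features])[0, 1]
    return jsonify({"pass_probability": float(proba),
                    "prediction": "pass" if proba >= 0.5 else "fail"})

if __name__ == "__main__":
    app.run(port=5000)
```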

Environment: Python 2.x, R, HDFS, Hadoop 2.3, Hive, Linux, Spark, Tableau Desktop, SQL Server 2012, Microsoft Excel, Spark SQL, PySpark.

Machine Learning Engineer

Confidential, Austin, TX

Responsibilities:

  • Gathering, retrieving and organizing data and using it to reach meaningful conclusions.
  • Developed a system for collecting data and compiling findings into reports that improved company decision-making.
  • Setting up the analytics system to provide insights.
  • Initially the data was stored in MongoDB. Later the data was moved to Elasticsearch.
  • Used Kibana to visualize the data collected from Twitter using Twitter REST APIs.
  • Conceptualized and created a knowledge graph database of news events extracted from tweets using Java, Virtuoso, Stanford CoreNLP, Apache Jena, RDF.
  • Producing and maintaining internal and client-based reports.
  • Creating stories with data that a non-technical team could also understand.
  • Worked on Descriptive, Diagnostic, Predictive and Prescriptive analytics.
  • Implemented character recognition using Support Vector Machines for performance optimization.
  • Monitored data quality and maintained data integrity to ensure the effective functioning of the department.
  • Managed database design and implemented a comprehensive star schema with shared dimensions.
  • Utilized various plot types, including histograms, bar plots, pie charts, scatter plots, and box plots, to assess the condition of the data.
  • Built and tested hypotheses, ensuring statistical significance, and built statistical models for business applications.
  • Developed machine learning algorithms with standalone Spark MLlib and Python.
  • Performed data pre-processing tasks such as merging, sorting, outlier detection, missing-value imputation, and data normalization to make data ready for statistical analysis (see the sketch after this list).
  • Implemented various machine learning models, including regression, classification, tree-based, and ensemble models.
  • Performed model tuning by adjusting hyperparameters, raising model accuracy.
  • Validated the models using measures such as k-fold cross-validation, AUC, and ROC to identify the best-performing model.
  • Created machine learning and statistical methods (SVM, CRF, HMM, sequential tagging).
  • Designed and implemented end-to-end systems for Data Analytics and Automation, integrating custom visualization tools using R, Tableau, and Power BI.
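
A minimal sketch of the preprocessing, hyperparameter tuning, and k-fold validation workflow described above, using scikit-learn on a hypothetical feature table (column names and parameter grid are illustrative):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical feature table with missing values.
df = pd.DataFrame({"f1": [1.0, 2.0, np.nan, 4.0] * 25,
                   "f2": [0.5, np.nan, 1.5, 2.0] * 25,
                   "label": [0, 1, 0, 1] * 25})
X, y = df[["f1", "f2"]], df["label"]

# Impute missing values and normalize, then classify.
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("clf", GradientBoostingClassifier(random_state=0)),
])

# Hyperparameter tuning with 5-fold cross-validation, scored by ROC AUC.
grid = GridSearchCV(pipe,
                    {"clf__n_estimators": [100, 200],
                     "clf__learning_rate": [0.05, 0.1]},
                    scoring="roc_auc", cv=5)
grid.fit(X, y)
print("best params:", grid.best_params_)
print("best CV ROC AUC:", grid.best_score_)
```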

Environment: Python 3, R Studio, MLLib, Regression, SQL Server, Hive, Hadoop Cluster, ETL, Tableau, NumPy, Pandas, Matplotlib, Power BI, Scikit-Learn, ggplot2, Shiny, TensorFlow, Teradata.

Data Engineer

Confidential, Harrisburg, PA

Responsibilities:

  • Experience building and architecting multiple data pipelines, with end-to-end ETL and ELT processes for data ingestion and transformation in GCP
  • Strong understanding of AWS components such as EC2 and S3
  • Implemented a Continuous Delivery pipeline with Docker and GitHub
  • Worked with Google Cloud Functions in Python to load data into BigQuery upon arrival of CSV files in a GCS bucket (see the Cloud Function sketch after this list)
  • Processed and loaded bounded and unbounded data from Google Pub/Sub topics to BigQuery using Cloud Dataflow with Python.
  • Devised simple and complex SQL scripts to check and validate data flows in various applications.
  • Performed data analysis, data migration, data cleansing, transformation, integration, data import, and data export through Python; good experience with ETL concepts, building ETL solutions, and data modeling
  • Architected several DAGs (Directed Acyclic Graphs) for automating ETL pipelines (see the Airflow DAG sketch after this list)
  • Experience in fact/dimension modeling (star schema, snowflake schema), transactional modeling, and SCDs (slowly changing dimensions)
  • Devised PL/SQL Stored Procedures, Functions, Triggers, Views and packages. Made use of Indexing, Aggregation and Materialized views to optimize query performance.
  • Developed logistic regression models in Python to predict subscription response rates based on customer variables such as past transactions, responses to prior mailings, promotions, demographics, interests, and hobbies.
  • Hands-on experience with GCP: BigQuery, GCS buckets, Cloud Functions, Cloud Dataflow, Pub/Sub, Cloud Shell, gsutil, the bq command-line utility, Dataproc, and Stackdriver
  • Implemented Apache Airflow for authoring, scheduling, and monitoring data pipelines
  • Proficient in machine learning techniques (decision trees, linear/logistic regression) and statistical modeling
  • Worked with Confluence and Jira; skilled in data visualization libraries such as Matplotlib and Seaborn
  • Hands-on experience with big data tools such as Hadoop, Spark, and Hive
  • Experience implementing machine learning back-end pipelines with pandas and NumPy
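
A minimal sketch of the GCS-triggered Cloud Function described above, loading a newly arrived CSV into BigQuery. The project, dataset, and table names are hypothetical:

```python
from google.cloud import bigquery

def load_csv_to_bq(event, context):
    """Background Cloud Function triggered when a file lands in a GCS bucket."""
    client = bigquery.Client()
    uri = f"gs://{event['bucket']}/{event['name']}"  # the file that just arrived

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,   # assume a header row
        autodetect=True,       # let BigQuery infer the schema
    )
    # Hypothetical destination table.
    load_job = client.load_table_from_uri(
        uri, "my_project.my_dataset.raw_events", job_config=job_config)
    load_job.result()  # block until the load job finishes
```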
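
And a minimal sketch of an Airflow DAG of the kind described, with hypothetical task names and placeholder extract/load callables:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    """Pull the day's records from the source system (placeholder)."""
    print("extracting...")

def load():
    """Load transformed records into the warehouse (placeholder)."""
    print("loading...")

with DAG(
    dag_id="daily_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_load = PythonOperator(task_id="load", python_callable=load)
    t_extract >> t_load  # run extract before load
```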

Environment: GCP, BigQuery, GCS buckets, Cloud Functions, Apache Beam, Cloud Dataflow, Cloud Shell, gsutil, Docker, Kubernetes, AWS, Apache Airflow, Python, Pandas, Matplotlib, Seaborn, text mining, NumPy, Scikit-learn, heat maps, bar charts, line charts, ETL workflows, linear regression, multivariate regression, Scala, Spark

Software Engineer

Confidential, Columbus, IN

Responsibilities:

  • Provided support to manage the software and databases for all warehouse operations, maintaining application servers and database servers by applying patches from time to time
  • Involved in preparing high-level design documents, coding, and analyzing business requirements.
  • Developed Python automation scripts to facilitate quality testing.
  • Loaded data into Oracle partitions organized by month.
  • Applied Oracle's Advanced Queuing to achieve integration at the application level, so that Java, C++, and PL/SQL software components could interact using a shared messaging architecture.
  • Wrote Python modules to extract and load asset data from the MySQL source database (see the sketch after this list).
  • Worked with the backend team to design, build, and implement RESTful APIs for various services.
  • Analyzed business process workflows and assisted in the development of ETL procedures for mapping data from source to target systems
  • Gathered software requirements and communicated with the client and their end users.
  • Systems analysis, design, and software engineering
  • Database Design and Development in SQL Server, Oracle, Microsoft Access and MySQL.
  • Programming in C/C++, C#, and Visual Basic.
  • Web-based solution development using C#/SQL Server, PHP/MySQL, and JavaScript.
  • Managed software development teams.
  • Worked on Docker containers to package the necessary applications into an image to set up the initial workstation.
  • Moved and copied databases, including detaching and attaching, and backing up and restoring databases.
  • Involved in resolving ETL production issues, performing recovery steps, and implementing bug fixes.
  • Performance-tuned complex SQL queries and scheduled BI jobs according to the design flow using Control-M.
  • Worked with data engineers to submit SQL statements, import and export data, and generate reports in SQL Server.
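
A minimal sketch of the kind of MySQL extract/load module described above, assuming a hypothetical `assets` source table, an `assets_copy` target table, and placeholder connection settings:

```python
import mysql.connector

def extract_assets(conn):
    """Pull asset rows from the source database."""
    cur = conn.cursor(dictionary=True)
    cur.execute("SELECT asset_id, name, location FROM assets")
    rows = cur.fetchall()
    cur.close()
    return rows

def load_assets(conn, rows):
    """Insert or refresh asset rows in the target table."""
    cur = conn.cursor()
    cur.executemany(
        "REPLACE INTO assets_copy (asset_id, name, location) "
        "VALUES (%(asset_id)s, %(name)s, %(location)s)",
        rows)
    conn.commit()
    cur.close()

if __name__ == "__main__":
    # Hypothetical connection settings.
    src = mysql.connector.connect(host="src-db", user="etl",
                                  password="...", database="assets_db")
    tgt = mysql.connector.connect(host="tgt-db", user="etl",
                                  password="...", database="warehouse")
    load_assets(tgt, extract_assets(src))
```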

Environment: Python, ETL, MySQL, SOAP, SQL, Netezza, WebSphere, Web Services, Shell Script, Control-M
