
Big Data Machine Learning Engineer Resume


New York City, NY

SUMMARY

  • 9+ years of hands-on experience in Data Science and Analytics, including BigQuery, SQL, data collection, data warehousing, data cleaning, featurization, feature engineering, data mining, machine learning, and statistical analysis on large structured datasets.
  • Experience developing an internal web application that centralized data acquisition, using HTML/CSS, AngularJS, TypeScript, and Agile software development.
  • Strong experience building cloud data lakes using AWS S3, AWS Glue, Glue Data Catalog, Serverless Framework, Lambda, RDS, Aurora, EC2, SNS, SQS, IAM, ECS, CloudFormation, and CloudWatch.
  • Experienced in the Big Data ecosystem with Hadoop, HDFS, MapReduce, Pig, Hive, HBase, Impala, Sqoop, Flume, Kafka, Oozie, Spark, PySpark, and Spark Streaming.
  • Expert in Core Java with in-depth knowledge of concepts such as multithreading, synchronization, collections, and event/exception handling. Experienced in the Software Development Life Cycle (SDLC) of projects: system study, analysis, physical and logical design, resource planning, coding, and implementation of business applications.
  • Experience ingesting real-time data from various sources through Kafka data pipelines and applying transformations to normalize the data stored in an HDFS data lake (see the streaming sketch after this list).
  • Theoretical foundations and practical hands-on projects related to (i) supervised learning (linear and logistic regression, boosted decision trees, Support Vector Machines, neural networks, NLP), (ii) unsupervised learning (clustering, dimensionality reduction, recommender systems), and (iii) probability and statistics, experiment analysis.
  • Experience creating complex data pipeline processes using T-SQL scripts, SSIS packages, Alteryx workflows, PL/SQL scripts, cloud REST APIs, Python scripts, GCP Composer, and GCP Dataflow.
  • Experience creating pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark with Databricks.
  • Experience with structured (MySQL, Oracle SQL, PostgreSQL) and unstructured (NoSQL) databases, with a strong understanding of relational databases. Familiar with cross-platform ETL using Python/Java SQL connectors and PySpark DataFrames.
  • Experience with container-based deployments using Docker, working with Docker images, Docker Hub, Docker registries, and Kubernetes.
  • Experience in SQL Server Analysis Services (SSAS) and SQL Server Reporting Services (SSRS) tools.
  • Experience implementing batch and real-time data pipelines using AWS services: S3, Lambda, DynamoDB, Redshift, EMR, and Kinesis.
  • Good knowledge of Apache Hadoop ecosystem components: Spark, Cassandra, HDFS, Hive, Sqoop, and Airflow.
  • Knowledge of Google Cloud Platform (GCP) services such as Compute Engine, Cloud Load Balancing, Cloud Storage, Dataproc, Cloud Pub/Sub, Cloud SQL, BigQuery, Stackdriver monitoring, and Deployment Manager.
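
A minimal sketch of the Kafka-to-HDFS normalization pipeline described above, using PySpark Structured Streaming; the broker address, topic name, event schema, and HDFS paths are all hypothetical placeholders:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

# Hypothetical schema for the incoming JSON events.
schema = StructType([
    StructField("user_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

# Read the raw Kafka stream; broker and topic are placeholders.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "events")
       .load())

# Normalize: parse the JSON payload and flatten it into columns.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), schema).alias("e"))
          .select("e.*"))

# Land the normalized records in the HDFS data lake as Parquet.
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/lake/events")
         .option("checkpointLocation", "hdfs:///data/checkpoints/events")
         .start())
query.awaitTermination()
```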

PROFESSIONAL EXPERIENCE

Big Data Machine Learning Engineer

Confidential, New York City, NY

Responsibilities:

  • Conduct in-depth exploration of data to determine opportunities for improvement and present recommendations to supervisor and client.
  • Perform strategic data analysis and research in alignment with identified business needs and requirements.
  • Expertly extract data from multiple databases, and manipulate and explore data utilizing quantitative, statistical, and visualization tools.
  • Prioritize data sources by performing statistical analysis to determine which data to use.
  • Create intuitive static and interactive visualizations from routine program monitoring data for websites, annual and quarterly reports, and presentations to stakeholders.
  • Extract data from the Hadoop data lake and Excel; perform analysis, data cleansing, sorting, and merge reporting; and expertly utilize SQL, Hive, and Excel to create dashboards.
  • Apply mathematical and statistical modeling knowledge to select appropriate techniques and to establish and maintain processes for validating and updating predictive models.
  • Design and implement statistical models, predictive models, enterprise data models, metadata solutions, and data life-cycle management in both RDBMS and Big Data environments.
  • Skillfully utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, TensorFlow, MLlib, and Python, along with a broad variety of machine learning methods including classification, regression, and dimensionality reduction.
  • Deliver expertise working with different data formats such as JSON and XML, and apply machine learning algorithms in R.
  • Apply various machine learning algorithms and statistical models, such as decision trees, text analytics, natural language processing (NLP), supervised and unsupervised learning, regression models, neural networks, deep learning, SVM, and clustering, to identify volume using the scikit-learn package in Python.
  • Implemented Python REST APIs using Flask to serve machine learning models (see the sketch after this list).
  • Implemented models such as Logistic Regression, Random Forest, and Gradient-Boosted Trees to predict whether a given die will pass or fail testing.
  • Perform data analysis using Hive to retrieve data from the Hadoop cluster, SQL to retrieve data from the database, and ETL for data transformation.
  • Perform data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
  • Communicate results to the operations team to support better decision-making.
  • Collect data needs and requirements by interacting with other departments.
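
A minimal sketch of the kind of Flask service described above: a scikit-learn classifier (a random forest here) trained to predict die pass/fail, then exposed over a REST endpoint. The feature layout, training data, and port are hypothetical stand-ins:

```python
import numpy as np
from flask import Flask, jsonify, request
from sklearn.ensemble import RandomForestClassifier

app = Flask(__name__)

# In practice the model trains on historical wafer-test data;
# synthetic data stands in here.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 4))             # 4 hypothetical test measurements
y_train = (X_train.sum(axis=1) > 0).astype(int)  # 1 = pass, 0 = fail
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

@app.route("/predict", methods=["POST"])
def predict():
    # Expects a JSON body like {"features": [0.1, -0.3, 1.2, 0.7]}.
    features = request.get_json()["features"]
    proba = model.predict_proba([features])[0, 1]
    return jsonify({"pass_probability": float(proba),
                    "prediction": "pass" if proba >= 0.5 else "fail"})

if __name__ == "__main__":
    app.run(port=5000)
```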

Environment: Python 2.x, R, HDFS, Hadoop 2.3, Hive, Linux, Spark, Tableau Desktop, SQL Server 2012, Microsoft Excel, Spark SQL, PySpark.

Machine Learning Engineer

Confidential, Austin, TX

Responsibilities:

  • Gathering, retrieving and organizing data and using it to reach meaningful conclusions.
  • Developed a system for collecting data and compiling findings into reports that improved company decision-making.
  • Setting up the analytics system to provide insights.
  • Initially the data was stored in MongoDB. Later the data was moved to Elasticsearch.
  • Used Kibana to visualize the data collected from Twitter using Twitter REST APIs.
  • Conceptualized and created a knowledge graph database of news events extracted from tweets using Java, Virtuoso, Stanford CoreNLP, Apache Jena, RDF.
  • Producing and maintaining internal and client-based reports.
  • Creating stories with data that a non-technical team could also understand.
  • Worked on Descriptive, Diagnostic, Predictive and Prescriptive analytics.
  • Implemented character recognition using Support Vector Machines for performance optimization.
  • Monitored data quality and maintained data integrity to ensure the effective functioning of the department.
  • Managed database design and implemented a comprehensive star schema with shared dimensions.
  • Utilized various plot types, including histograms, bar plots, pie charts, scatter plots, and box plots, to assess the condition of the data.
  • Built and tested hypotheses, ensuring statistical significance, and built statistical models for business applications.
  • Developed machine learning algorithms with standalone Spark MLlib and Python.
  • Performed data pre-processing tasks such as merging, sorting, outlier detection, missing-value imputation, and data normalization to make data ready for statistical analysis (see the sketch after this list).
  • Implemented various machine learning models, including regression, classification, tree-based, and ensemble models.
  • Performed model tuning by adjusting hyperparameters, raising model accuracy.
  • Validated the models using measures such as k-fold cross-validation, AUC, and ROC to identify the best-performing model.
  • Created machine learning and statistical methods (SVM, CRF, HMM, sequential tagging).
  • Designed and implemented end-to-end systems for Data Analytics and Automation, integrating custom visualization tools using R, Tableau, and Power BI.
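
A minimal sketch of the preprocessing, hyperparameter tuning, and k-fold validation workflow described above, using scikit-learn on a hypothetical feature table (column names and parameter grid are illustrative):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical feature table with missing values.
df = pd.DataFrame({"f1": [1.0, 2.0, np.nan, 4.0] * 25,
                   "f2": [0.5, np.nan, 1.5, 2.0] * 25,
                   "label": [0, 1, 0, 1] * 25})
X, y = df[["f1", "f2"]], df["label"]

# Impute missing values and normalize, then classify.
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("clf", GradientBoostingClassifier(random_state=0)),
])

# Hyperparameter tuning with 5-fold cross-validation, scored by ROC AUC.
grid = GridSearchCV(pipe,
                    {"clf__n_estimators": [100, 200],
                     "clf__learning_rate": [0.05, 0.1]},
                    scoring="roc_auc", cv=5)
grid.fit(X, y)
print("best params:", grid.best_params_)
print("best CV ROC AUC:", grid.best_score_)
```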

Environment: Python 3, R Studio, MLLib, Regression, SQL Server, Hive, Hadoop Cluster, ETL, Tableau, NumPy, Pandas, Matplotlib, Power BI, Scikit-Learn, ggplot2, Shiny, TensorFlow, Teradata.

Data Engineer

Confidential, Harrisburg, PA

Responsibilities:

  • Experience building and architecting multiple data pipelines, with end-to-end ETL and ELT processes for data ingestion and transformation in GCP
  • Strong understanding of AWS components such as EC2 and S3
  • Implemented a Continuous Delivery pipeline with Docker and GitHub
  • Worked with Google Cloud Functions in Python to load data into BigQuery upon arrival of CSV files in a GCS bucket (see the Cloud Function sketch after this list)
  • Processed and loaded bounded and unbounded data from Google Pub/Sub topics to BigQuery using Cloud Dataflow with Python.
  • Devised simple and complex SQL scripts to check and validate data flows in various applications.
  • Performed data analysis, data migration, data cleansing, transformation, integration, data import, and data export through Python; good experience with ETL concepts, building ETL solutions, and data modeling
  • Architected several DAGs (Directed Acyclic Graphs) for automating ETL pipelines (see the Airflow DAG sketch after this list)
  • Experience in fact/dimension modeling (star schema, snowflake schema), transactional modeling, and SCDs (slowly changing dimensions)
  • Devised PL/SQL Stored Procedures, Functions, Triggers, Views and packages. Made use of Indexing, Aggregation and Materialized views to optimize query performance.
  • Developed logistic regression models in Python to predict subscription response rates based on customer variables such as past transactions, responses to prior mailings, promotions, demographics, interests, and hobbies.
  • Hands-on experience with GCP: BigQuery, GCS buckets, Cloud Functions, Cloud Dataflow, Pub/Sub, Cloud Shell, gsutil, the bq command-line utility, Dataproc, and Stackdriver
  • Implemented Apache Airflow for authoring, scheduling, and monitoring data pipelines
  • Proficient in machine learning techniques (decision trees, linear/logistic regression) and statistical modeling
  • Worked with Confluence and Jira; skilled in data visualization libraries such as Matplotlib and Seaborn
  • Hands-on experience with big data tools such as Hadoop, Spark, and Hive
  • Experience implementing machine learning back-end pipelines with pandas and NumPy
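
A minimal sketch of the GCS-triggered Cloud Function described above, loading a newly arrived CSV into BigQuery. The project, dataset, and table names are hypothetical:

```python
from google.cloud import bigquery

def load_csv_to_bq(event, context):
    """Background Cloud Function triggered when a file lands in a GCS bucket."""
    client = bigquery.Client()
    uri = f"gs://{event['bucket']}/{event['name']}"  # the file that just arrived

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,   # assume a header row
        autodetect=True,       # let BigQuery infer the schema
    )
    # Hypothetical destination table.
    load_job = client.load_table_from_uri(
        uri, "my_project.my_dataset.raw_events", job_config=job_config)
    load_job.result()  # block until the load job finishes
```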
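
And a minimal sketch of an Airflow DAG of the kind described, with hypothetical task names and placeholder extract/load callables:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    """Pull the day's records from the source system (placeholder)."""
    print("extracting...")

def load():
    """Load transformed records into the warehouse (placeholder)."""
    print("loading...")

with DAG(
    dag_id="daily_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_load = PythonOperator(task_id="load", python_callable=load)
    t_extract >> t_load  # run extract before load
```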

Environment: GCP, BigQuery, GCS buckets, Cloud Functions, Apache Beam, Cloud Dataflow, Cloud Shell, gsutil, Docker, Kubernetes, AWS, Apache Airflow, Python, Pandas, Matplotlib, Seaborn, text mining, NumPy, Scikit-learn, heat maps, bar charts, line charts, ETL workflows, linear regression, multivariate regression, Scala, Spark

Software Engineer

Confidential, Columbus, IN

Responsibilities:

  • Provided support to manage the software and databases for all warehouse operations, maintaining application servers and database servers by applying patches from time to time
  • Involved in preparing high-level design documents, coding, and analyzing business requirements.
  • Developed Python automation scripts to facilitate quality testing.
  • Loaded data into Oracle partitions organized by month.
  • Applied Oracle's Advanced Queuing to achieve integration at the application level, so that Java, C++, and PL/SQL software components could interact using a shared messaging architecture.
  • Wrote Python modules to extract and load asset data from the MySQL source database (see the sketch after this list).
  • Worked with the backend team to design, build, and implement RESTful APIs for various services.
  • Analyzed business process workflows and assisted in the development of ETL procedures for mapping data from source to target systems
  • Gathered software requirements and communicated with the client and their end users.
  • Systems analysis, design, and software engineering
  • Database Design and Development in SQL Server, Oracle, Microsoft Access and MySQL.
  • Programming in C/C++, C#, and Visual Basic.
  • Web-based solution development using C#/SQL Server, PHP/MySQL, and JavaScript.
  • Managed software development teams.
  • Worked on Docker containers to package the necessary applications into an image to set up the initial workstation.
  • Moved and copied databases, including detaching and attaching, and backing up and restoring databases.
  • Involved in resolving ETL production issues, performing recovery steps, and implementing bug fixes.
  • Performance-tuned complex SQL queries and scheduled BI jobs according to the design flow using Control-M.
  • Worked with data engineers to submit SQL statements, import and export data, and generate reports in SQL Server.
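
A minimal sketch of the kind of MySQL extract/load module described above, assuming a hypothetical `assets` source table, an `assets_copy` target table, and placeholder connection settings:

```python
import mysql.connector

def extract_assets(conn):
    """Pull asset rows from the source database."""
    cur = conn.cursor(dictionary=True)
    cur.execute("SELECT asset_id, name, location FROM assets")
    rows = cur.fetchall()
    cur.close()
    return rows

def load_assets(conn, rows):
    """Insert or refresh asset rows in the target table."""
    cur = conn.cursor()
    cur.executemany(
        "REPLACE INTO assets_copy (asset_id, name, location) "
        "VALUES (%(asset_id)s, %(name)s, %(location)s)",
        rows)
    conn.commit()
    cur.close()

if __name__ == "__main__":
    # Hypothetical connection settings.
    src = mysql.connector.connect(host="src-db", user="etl",
                                  password="...", database="assets_db")
    tgt = mysql.connector.connect(host="tgt-db", user="etl",
                                  password="...", database="warehouse")
    load_assets(tgt, extract_assets(src))
```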

Environment: Python, ETL, MySQL, SOAP, SQL, Netezza, WebSphere, Web Services, Shell Script, Control-M
