Data Engineer Resume
Hartford, CA
SUMMARY
- 7+ years of IT experience in software development as a Big Data/Hadoop Developer, with strong knowledge of and hands-on experience in the Hadoop framework.
- Expertise in Hadoop architecture and its components, including HDFS, YARN, High Availability, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Experience with all phases of development, from requirement discovery and initial implementation through release, enhancement, and support (SDLC and Agile techniques).
- Experience in design, development, data migration, testing, support, and maintenance using Redshift databases.
- Experience with Apache Hadoop technologies such as the Hadoop Distributed File System (HDFS), the MapReduce framework, Hive, Pig, Sqoop, Oozie, HBase, Spark, Scala, and Python.
- Experience in AWS cloud solution development using Lambda, SQS, SNS, DynamoDB, Athena, S3, EMR, EC2, Redshift, Glue, and CloudFormation.
- Experience using Microsoft Azure SQL Database, Data Lake, Azure ML, Azure Data Factory, Functions, Databricks, and HDInsight.
- Working experience with big data in the cloud using AWS EC2 and Microsoft Azure; handled Redshift and DynamoDB databases holding large volumes of data.
- Extensive experience migrating on-premises Hadoop platforms to cloud solutions on AWS and Azure.
- Experience writing Python-based ETL frameworks and PySpark jobs to process large volumes of data daily (a representative sketch follows this summary).
- Strong experience implementing data models and loading unstructured data using HBase, DynamoDB, and Cassandra.
- Created multiple report dashboards, visualizations, and heat maps using the Tableau, QlikView, and Qlik Sense reporting tools.
- Strong experience extracting and loading data with complex business logic using Hive from different data sources, and built ETL pipelines that process terabytes of data daily.
- Experienced in transporting and processing real-time event streams using Kafka and Spark Streaming.
- Hands-on experience importing and exporting data between relational databases and HDFS, Hive, and HBase using Sqoop.
- Experienced in processing real-time data using Kafka 0.10.1 producers and stream processors; implemented stream processing with Kinesis, landing data in an S3 data lake.
- Experience in implementing multitenant models for the Hadoop 2.0 Ecosystem using various big data technologies.
- Designed and developed Spark pipelines to ingest real-time, event-based data from Kafka and other message queue systems, and processed large volumes with Spark batch jobs into the Hive data warehouse.
- Experienced in creating and analyzing Software Requirement Specifications (SRS) and Functional Specification Documents (FSD).
- Designed data models for both OLAP and OLTP applications using Erwin and used both star and snowflake schemas in the implementations.
- Capable of organizing, coordinating, and managing multiple tasks simultaneously.
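
A minimal sketch of the kind of PySpark batch ETL described above, assuming a Spark-with-Hive environment; the bucket, paths, and column names are hypothetical placeholders, not taken from any specific project.

```python
# Illustrative PySpark ETL sketch: read raw events from S3, apply a simple
# transformation, and write date-partitioned Parquet for downstream Hive use.
# All paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("daily-etl-sketch")
         .enableHiveSupport()
         .getOrCreate())

raw = spark.read.json("s3a://example-bucket/raw/events/")  # hypothetical path

cleaned = (raw
           .filter(F.col("event_type").isNotNull())
           .withColumn("event_date", F.to_date("event_ts")))

(cleaned.write
        .mode("overwrite")
        .partitionBy("event_date")
        .parquet("s3a://example-bucket/curated/events/"))  # hypothetical path
```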
TECHNICAL SKILLS
Programming: Python, R, SQL, HTML, CSS
Databases: SQL Server, MySQL, NoSQL, Hive, Hadoop, Redshift
Python: NLTK, spaCy, matplotlib, NumPy, Pandas, Scikit-Learn
Tools: Git, Docker, Flask, DVC, Keras, Tensorflow, PyTorch
Cloud: AWS (S3, EC2, Redshift, Lambda, EMR), Azure (Synapse Analytics, Azure SQL, ADF), GCP
Visualization: Tableau, Power BI, Sisense, Excel
Core Competencies: Supervised and Unsupervised ML, SVM, DNN, Text Analytics, MXNet, Big Data, NLP
PROFESSIONAL EXPERIENCE
Confidential | Hartford, CA
Data Engineer
Responsibilities:
- Extensively worked in all phases of data acquisition, data collection, data cleaning, model development, model validation, and visualization to deliver data science solutions.
- Involved in designing and deploying multi-tier applications using AWS services (EC2, Route 53, S3, RDS, DynamoDB, SNS, SQS, IAM), focusing on high availability, fault tolerance, and auto-scaling via AWS CloudFormation.
- Supported persistent storage in AWS using Elastic Block Store (EBS), S3, and Glacier; created volumes and configured snapshots for EC2 instances.
- Worked as a Data Engineer to review business requirements and compose source-to-target data mapping documents.
- Installed and configured Hive, wrote Hive UDFs, and used MapReduce for unit testing.
- Participated in JAD meetings to gather requirements and understand the end users' system.
- Followed an SDLC methodology for data warehouse development, tracked in Kanbanize.
- Worked on managing and reviewing Hadoop log files; tested and reported defects from an Agile methodology perspective.
- Created table structures for data marts in Netezza.
- Built S3 buckets, managed their bucket policies, and used S3 and Glacier for storage and backup on AWS (see the sketch after this section).
- Built multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP.
- Configured the Hive metastore with MySQL, which stores the metadata for Hive tables.
- Created Use Case Diagrams using UML to define the functional requirements of the application.
- Worked on configuring and managing disaster recovery and backup on Cassandra Data.
- Created PL/SQL packages and Database Triggers and developed user procedures and prepared user manuals for the new programs.
- Designed, developed, and deployed projects in the GCP suite, including BigQuery, Dataflow, Dataproc, Google Cloud Storage, Composer, and Looker.
- Created jobs and transformations in Pentaho Data Integration to generate reports and transfer data from HBase to an RDBMS.
- Designed HBase schemas based on the requirements and handled HBase data migration and validation.
- Created automated pipelines in AWS CodePipeline to deploy Docker containers to AWS ECS services.
- Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization, and user report generation.
- Performed data mapping and data design (data modeling) to integrate data across multiple databases into the EDW.
- Worked on data modeling and advanced SQL with columnar databases on AWS.
- Worked with the NoSQL database HBase for real-time data analytics.
- Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
- Generated various presentable reports and documentation using report designer and pinned reports in Erwin.
Environment: Hadoop, Agile, Hive, Netezza, PL/SQL, HBase, GCP, AWS, NoSQL, Oozie 5, MongoDB, SSRS, SSIS, OLTP, OLAP, Puppet
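
A minimal sketch, assuming boto3, of the S3 bucket, bucket policy, and Glacier backup setup referenced above; the bucket name, account ID, role name, and prefix are hypothetical placeholders.

```python
# Illustrative boto3 sketch: create an S3 bucket, attach a bucket policy, and
# add a lifecycle rule that transitions backups to Glacier after 30 days.
# Bucket name, account ID, and role name are hypothetical.
import json
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

bucket = "example-data-backup-bucket"  # hypothetical
s3.create_bucket(Bucket=bucket)

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowEtlRoleReadWrite",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::123456789012:role/example-etl-role"},  # hypothetical
        "Action": ["s3:GetObject", "s3:PutObject"],
        "Resource": f"arn:aws:s3:::{bucket}/*",
    }],
}
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))

s3.put_bucket_lifecycle_configuration(
    Bucket=bucket,
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-to-glacier",
            "Filter": {"Prefix": "backup/"},  # hypothetical prefix
            "Status": "Enabled",
            "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
        }],
    },
)
```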
Confidential | Jacksonville, FL
Data Engineer
Responsibilities:
- Involved in the full project life cycle, from design, analysis, and logical and physical architecture modeling through development, implementation, and testing.
- Set up and built AWS infrastructure across various services by writing CloudFormation templates (CFTs) in JSON and YAML.
- Developed CloudFormation scripts to build EC2 instances on demand.
- Created IAM roles, users, and groups and attached policies to provide least-privilege access to resources.
- Updated bucket policies with IAM roles to restrict user access, and configured AWS Identity and Access Management (IAM) groups and users for improved login authentication.
- Created topics in SNS to send notifications to subscribers as per the requirement.
- Moved data from Oracle to HDFS using Sqoop.
- Created Hive Tables, loaded transactional data from Oracle using Sqoop and worked with highly unstructured and semi structured data.
- Developed MapReduce (YARN) jobs for cleaning, accessing, and validating the data.
- Created and ran Sqoop jobs with incremental loads to populate Hive external tables.
- Wrote scripts to distribute queries for performance-test jobs in the Amazon data lake.
- Developed optimal strategies for distributing web log data over the cluster; imported and exported the stored web log data into HDFS and Hive using Sqoop.
- Installed and configured Apache Hadoop on multiple nodes on AWS EC2.
- Developed Pig Latin scripts to replace the existing legacy process on Hadoop, with the data fed to AWS S3.
- Worked on CDC tables using a Spark application to load data into dynamic-partition-enabled Hive tables (see the sketch after this section).
- Designed and developed automation test scripts using Python.
- Integrated Apache Storm with Kafka to perform web analytics and to move clickstream data from Kafka to HDFS.
- Analyzed the SQL scripts and designed the solution for implementation in PySpark.
- Responsible for developing data pipeline with Amazon AWS to extract the data from weblogs and store in HDFS.
- Uploaded streaming data from Kafka to HDFS, HBase, and Hive by integrating with Storm.
- Supported data analysis projects using Elastic MapReduce (EMR) on the Amazon Web Services (AWS) cloud; performed export and import of data into S3.
- Involved in designing the HBase row key to store text and JSON as key values, structuring the row key so data can be retrieved and scanned in sorted order.
- Created Hive tables and worked on them using HiveQL.
- Designed and implemented partitioning (static and dynamic) and bucketing in Hive.
- Developed multiple POCs using PySpark and deployed them on the YARN cluster; compared the performance of Spark with Hive and SQL; developed syllabus/curriculum data pipelines from syllabus/curriculum web services to HBase and Hive tables.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
Environment: AWS, Hadoop, Hive, YARN, HBase, SSRS, SSIS, Oracle Database 11g, Oracle BI tools, Tableau, MS Excel, Python, Naive Bayes, SVM, K-means, ANN, Regression, MS Access, SQL Server Management Studio.
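
A minimal sketch of the CDC-to-Hive load referenced above, assuming Spark with Hive support and dynamic partitioning; the source path, table, and column names are hypothetical placeholders.

```python
# Illustrative PySpark sketch: load CDC data into a dynamic-partition-enabled
# Hive table. Table, path, and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("cdc-to-hive-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Allow non-strict dynamic partitioning for the Hive insert.
spark.conf.set("hive.exec.dynamic.partition", "true")
spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")

cdc = (spark.read.parquet("s3a://example-bucket/cdc/orders/")  # hypothetical path
             .withColumn("load_date", F.to_date("change_ts")))

# Append by position into a Hive table partitioned by load_date; the partition
# column must be the last column selected.
(cdc.select("order_id", "status", "change_ts", "load_date")
    .write
    .mode("append")
    .insertInto("warehouse.orders_cdc"))  # hypothetical Hive table
```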
Confidential | San Diego, CA
Data Engineer
Responsibilities:
- Installed and configured Apache Hadoop to test the maintenance of log files in Hadoop cluster.
- Designed cloud architectures for customers looking to migrate or develop new PaaS, IaaS, or hybrid solutions on Amazon Web Services (AWS).
- Designed, built, configured, tested, installed, managed, and supported all aspects and components of the application development environments in AWS.
- Utilized AWS CloudFormation to create new AWS environments following best practices in VPC/subnet design.
- Analyzed the business, technical, functional, performance and infrastructure requirements needed to access and process large amounts of data.
- Coordinated with the Dev, DBA, QA, and IT Operations environments to ensure there are no resource conflicts.
- Worked within and across Agile teams to design, develop, test, implement, and support technical solutions across a full stack of development tools and technologies, tracking all stories in JIRA.
- Responsible for developing and maintaining processes and associated scripts/tools for automated build, testing, and deployment of the products to various development environments.
- Managed the production server infrastructure environment, collaborated with the development team to troubleshoot and resolve issues, and delivered product releases through frequent, zero-downtime deployments.
- Extensively involved in infrastructure as code, execution plans, resource graphs, and change automation using Terraform; managed AWS infrastructure as code with Terraform.
- Created Terraform scripts for EC2 instances, Elastic Load balancers and S3 buckets.
- Managed different infrastructure resources, such as physical machines, VMs, and Docker containers, with Terraform, which supports multiple cloud service providers including AWS.
- Built Jenkins jobs to create AWS infrastructure from GitHub repos containing Terraform code.
- Configured the ELK stack in conjunction with AWS, using Logstash to output data to AWS S3.
- Involved in AWS EC2-based automation through Terraform, Ansible, Python, and Bash scripts (see the sketch after this section); adopted new features as Amazon released them, including ELB and EBS.
- Experience in Virtualization technologies and worked with containerizing applications.
- Automated deployment of application using deployment tool (Ansible).
Environment: AWS, PaaS, IaaS, JSON, EC2, Python, Pandas, Regression, Classification, CNN, RNN, Random Forest, TensorFlow, Keras, Seaborn, NumPy, SVM, Preprocessing, SQL, AWS SageMaker, AWS S3.
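
A minimal sketch of the Python side of the EC2 automation mentioned above, assuming boto3; the AMI ID, key pair, subnet ID, and tags are hypothetical placeholders.

```python
# Illustrative boto3 sketch: launch a tagged EC2 instance and wait until it is
# running before later automation steps use it. All IDs are hypothetical.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

resp = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",      # hypothetical AMI
    InstanceType="t3.medium",
    KeyName="example-keypair",            # hypothetical key pair
    SubnetId="subnet-0123456789abcdef0",  # hypothetical subnet
    MinCount=1,
    MaxCount=1,
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "Environment", "Value": "dev"}],
    }],
)

instance_id = resp["Instances"][0]["InstanceId"]

# Block until the instance reaches the running state.
ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
print(f"Instance {instance_id} is running")
```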
Confidential
Data Analyst
Responsibilities:
- Exported the analyzed data to the Relational databases using Sqoop for performing visualization and generating reports for the Business Intelligence team.
- Collaborated with the business to define requirements and recommend optimized solutions. Ability to quickly understand complex business processes and associated data sets.
- Consulted with internal and external stakeholders to identify specific needs within customer application modules and documented requirements for data, reports, analysis, metadata, training, service levels, data quality, performance, and troubleshooting.
- Extracted data from MySQL into HDFS using Sqoop (see the sketch after this section) and developed simple to complex MapReduce jobs.
- Responsible for writing complicated SQL queries with a good understanding of transactional databases.
- Assisted reporting teams in developing Tableau visualizations and dashboards using Tableau Desktop.
- Analyzed the data by performing Hive queries and running Pig scripts to understand user behavior, and created partitioned tables in Hive as part of the role.
- Administered and supported the Hortonworks distribution.
- Wrote Korn shell, Bash, and Perl scripts to automate most database maintenance tasks.
- Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Monitored running MapReduce programs on the cluster.
- Responsible for loading data from UNIX file systems to HDFS.
- Using PHP, created documents and executed software designs involving complicated workflows or multiple product areas.
- Responsible for all requests against an alternate UNIX/Oracle-based system, including bug fixes, change requests, and tuning; performed implementation, testing, and documentation for this system.
- Consulted with project managers, business analysts, and development teams on application development and business plans.
- Installed and configured Hive and created Hive UDFs.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
- Implemented the workflows using the Apache Oozie framework to automate tasks.
- Developed scripts and automated data management from end to end and sync up between the clusters.
- Designed, developed, tested, and deployed Power BI scripts and performed detailed analytics.
- Performed DAX queries and functions in Power BI.
Environment: Apache Hadoop, Java, Bash, ETL, MapReduce, Hive, Pig, Hortonworks, deployment tools, DataStax, flat files, Oracle 11g/10g, MySQL, Windows NT, UNIX, Sqoop, Oozie, Tableau.
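
A minimal sketch of wrapping a Sqoop import from MySQL into HDFS in Python, as referenced above; the JDBC URL, credentials file, table, and target directory are hypothetical placeholders.

```python
# Illustrative Python wrapper around a Sqoop 1 import from MySQL to HDFS.
# Connection details, table, and target directory are hypothetical.
import subprocess

sqoop_cmd = [
    "sqoop", "import",
    "--connect", "jdbc:mysql://db.example.com:3306/sales",  # hypothetical JDBC URL
    "--username", "etl_user",                               # hypothetical user
    "--password-file", "/user/etl/.mysql.password",         # hypothetical HDFS path
    "--table", "orders",                                    # hypothetical table
    "--target-dir", "/data/raw/orders",                     # hypothetical HDFS dir
    "--num-mappers", "4",
    "--fields-terminated-by", "\t",
]

# Run the import and raise an error if Sqoop exits non-zero.
subprocess.run(sqoop_cmd, check=True)
```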