Senior Data Engineer Resume
Boston, MA
SUMMARY
- 10+ years of IT experience in software development, including 5+ years as a Big Data/Hadoop Developer with strong knowledge of the Hadoop framework.
- Expertise in Hadoop architecture and various components such as HDFS, YARN, High Availability, Job Tracker, Task Tracker, Name Node, Data Node, and MapReduce programming paradigm.
- Experience with all aspects of development from initial implementation and requirement discovery, through release, enhancement and support (SDLC & Agile techniques).
- 4+ years of experience in design, development, data migration, testing, support, and maintenance using Redshift databases.
- 4+ years of experience with Apache Hadoop technologies such as HDFS, the MapReduce framework, Hive, Pig, PySpark, Sqoop, Oozie, HBase, Spark, Scala, and Python.
- 3+ years of experience in AWS cloud solution development using Lambda, SQS, SNS, Dynamo DB, Athena, S3, EMR, EC2, Redshift, Glue, and CloudFormation.
- Experience in using Microsoft Azure SQL database, Data Lake, Azure ML, Azure data factory, Functions, Databricks and HDInsight.
- Working experience in big data on the cloud using AWS EC2 and Microsoft Azure; handled Redshift and DynamoDB databases holding large volumes of data (300 TB).
- Extensive experience in migrating on premise Hadoop platforms to cloud solutions using AWS And Azure.
- 3+ years of experience writing Python ETL frameworks and PySpark jobs to process large volumes of data daily.
- Strong experience implementing data models and loading unstructured data using HBase, DynamoDB, and Cassandra.
- Created multiple report dashboards, visualizations, and heat maps using Tableau, QlikView, and Qlik Sense reporting tools.
- Strong experience extracting and loading data with complex business logic using Hive from different data sources, and built ETL pipelines that process terabytes of data daily.
- Experienced in transporting, and processing real time event streaming using Kafka and Spark Streaming.
- Hands on experience with importing and exporting data from Relational databases to HDFS, Hive and HBase using Sqoop.
- Experienced in processing real-time data using Kafka 0.10.1 producers and stream processors; implemented stream processing with Kinesis, landing data into the S3 data lake.
- Experience in implementing multitenant models for the Hadoop 2.0 Ecosystem using various big data technologies.
- Designed and developed Spark pipelines to ingest real-time event-based data from Kafka and other message queue systems, and processed huge volumes of data with Spark batch processing into the Hive data warehouse.
- Experienced in creating and analyzing Software Requirement Specifications (SRS) and Functional Specification Document (FSD).
- Excellent working experience in Scrum / Agile framework, Iterative and Waterfall project execution methodologies.
- Designed data models for both OLAP and OLTP applications using Erwin and used both star and snowflake schemas in the implementations.
- Capable of organizing, coordinating and managing multiple tasks simultaneously.
- Excellent communication and inter-personal skills, self-motivated, organized and detail-oriented, able to work well under deadlines in a changing environment and perform multiple tasks effectively and concurrently.
- Strong analytical skills with ability to quickly understand client’s business needs. Involved in meetings to gather information and requirements from the clients.
TECHNICAL SKILLS
Hadoop: Hadoop, Spark (PySpark), MapReduce, Hive, Pig, Impala, Sqoop, HDFS, HBase, Oozie, Ambari, Scala, and MongoDB
Cloud Technologies: AWS Kinesis, Lambda, EMR, EC2, SNS, SQS, Dynamo DB, Step Functions, Glue, Athena, CloudWatch, Azure Data Factory, Azure Data Lake, Functions, Azure SQL Data Warehouse, Databricks and HDInsight
DBMS: Amazon Redshift, Postgres, Oracle 9i, SQL Server, IBM DB2, and Teradata
ETL Tools: DataStage, Talend, and Ab Initio
Reporting Tools: Power BI, Tableau, TIBCO Spotfire, QlikView, and Qlik Sense
Deployment Tools: Git, Jenkins, Terraform and CloudFormation
Programming Language: Python, Scala, PL/SQL and Java
Scripting: Unix Shell and Bash scripting
PROFESSIONAL EXPERIENCE
Confidential, Boston, MA
Senior Data Engineer
Responsibilities:
- Built data pipelines in Airflow on GCP for ETL jobs using Airflow operators.
- Created Spark data pipelines using GCP Dataproc.
- Used the Cloud Shell SDK in GCP to configure services such as Dataproc, Cloud Storage, and BigQuery.
- Developed a framework to generate daily ad-hoc reports and extract data from BigQuery.
- Designed advanced analytical models and coordinated with the data science team to implement them on the Hadoop cluster over large datasets.
- Wrote Hive SQL scripts to create complex tables with performance features such as partitioning, clustering, and skewing.
- Read data from BigQuery into pandas or Spark DataFrames for advanced ETL capabilities.
- Created BigQuery views for row-level security and for exposing data to other teams.
- Worked with Google Data Catalog and other Google Cloud APIs for monitoring, query, and billing analysis of BigQuery usage.
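The daily ad-hoc reporting bullet above can be illustrated with a minimal sketch. The table and column names (`analytics.events`, `event_ts`) are hypothetical; in practice the rendered SQL would be handed to a driver such as the google-cloud-bigquery client for execution.

```python
from datetime import date

# Hypothetical daily report template; the dataset/table and columns are
# illustrative assumptions, not taken from the projects described above.
REPORT_TEMPLATE = """\
SELECT event_type, COUNT(*) AS event_count
FROM `analytics.events`
WHERE DATE(event_ts) = '{report_date}'
GROUP BY event_type
ORDER BY event_count DESC"""

def build_daily_report_query(report_date: date) -> str:
    """Render the ad-hoc report SQL for a specific report date."""
    return REPORT_TEMPLATE.format(report_date=report_date.isoformat())

query = build_daily_report_query(date(2023, 5, 1))
```

Keeping the SQL as a parameterized template lets one framework serve many report dates without hand-editing queries.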
Confidential, Charlotte, NC
Data Engineer
Responsibilities:
- Wrote ETL jobs using Spark data pipelines to process data from different sources and transform it for multiple targets.
- Created streams using Spark, processed real-time data into RDDs and DataFrames, and created analytics using PySpark SQL.
- Created PySpark DataFrames to bring data from DB2 to Amazon S3.
- Optimized PySpark jobs to run on the EMR cluster for faster data processing.
- Designed a Redshift-based data delivery layer for business intelligence tools to operate directly on AWS S3.
- Implemented Kinesis data streams to read real-time data and load it into S3 for downstream processing.
- Set up AWS infrastructure on EC2 and implemented the S3 API for accessing S3 bucket data files.
- Designed “Data Services” to intermediate data exchange between the Data Clearinghouse and the Data Hubs.
- Wrote ETL flows and MapReduce jobs to process data from AWS S3 into DynamoDB and HBase.
- Involved in the ETL phase of the project; designed and analyzed data in Oracle and migrated it to Redshift and Hive.
- Created databases and tables in Redshift and DynamoDB, and wrote complex EMR scripts to process terabytes of data into AWS S3.
- Performed real-time analytics on transactional data using Python to create statistical models for predictive and reverse product analysis.
- Developed Spark applications in Python (PySpark) on a distributed environment to load large numbers of CSV files with differing schemas into Hive ORC tables.
- Worked on reading and writing multiple data formats such as JSON, ORC, and Parquet on HDFS using PySpark.
- Participated in client meetings, explained design views, and supported requirements gathering.
- Worked in an Agile methodology to understand the requirements of user stories.
- Prepared High-level design documentation for approval
- Used data visualization tools such as Tableau, QuickSight, and Kibana to bring new insights out of extracted data and represent it more clearly.
- Designed data models for dynamic, real-time data intended for use by various applications with OLAP and OLTP needs.
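The CSV-loading work described above (many files, differing schemas, one target table) boils down to unioning headers and padding missing columns. This pure-Python sketch shows the idea without a Spark cluster; in PySpark the equivalent would be `DataFrame.unionByName(..., allowMissingColumns=True)`.

```python
import csv
import io

def harmonize(csv_texts):
    """Merge CSV documents with different headers into one row list.

    Returns the unified column list (in first-seen order) and the rows,
    each padded with None for columns the source file lacked.
    """
    rows, columns = [], []
    for text in csv_texts:
        for record in csv.DictReader(io.StringIO(text)):
            for col in record:
                if col not in columns:
                    columns.append(col)
            rows.append(record)
    # Re-emit every row against the full, unified column set.
    return columns, [{c: r.get(c) for c in columns} for r in rows]

cols, data = harmonize(["id,name\n1,ann\n", "id,age\n2,34\n"])
```

After harmonizing, every row shares one schema and can be written to a single ORC-backed Hive table.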
Confidential, Herndon, VA
Cloud Support Engineer
Responsibilities:
- Assisted in incident management, including problem resolution and tracking.
- Maintained up-to-date knowledge of IT technologies and developments.
- Provided proactive communication to the customer base for wide-scale, service-affecting problems.
- Provided fast, value-add responses to inbound customer tickets, acknowledging receipt and communicating next steps through both written and verbal channels.
- Utilized monitoring tools to proactively identify problems with systems, applications, and networks.
Confidential, Auburn Hills, MI
Big Data Engineer/Hadoop Developer
Responsibilities:
- Handled the full project life cycle from design, analysis, and logical and physical architecture modeling through development, implementation, and testing.
- Conferred with data scientists and other qlikstream developers to obtain information on limitations or capabilities for data processing projects.
- Designed and developed automation test scripts using Python
- Creating Data Pipelines using Azure Data Factory.
- Automating the jobs using Python.
- Creating tables and loading data in the Azure MySQL database.
- Creating Azure Functions, Logic Apps for Automating the Data pipelines using Blob triggers.
- Analyzed SQL scripts and designed solutions to implement them using PySpark.
- Developed Spark code using Python (PySpark) for faster processing and testing of data.
- Used the Spark API to perform analytics on data in Hive.
- Optimized and tuned Hive and Spark queries using data layout techniques such as partitioning, bucketing, and other advanced techniques.
- Performed data cleansing, integration, and transformation using Pig.
- Involved in exporting and importing data from local file system and RDBMS to HDFS
- Designing and coding the pattern for inserting data into Data lake.
- Moving the data from On-Prem HDP clusters to Azure
- Building, installing, upgrading or migrating petabyte size big data systems
- Fixing Data related issues
- Loaded data into the DB2 database using DataStage.
- Monitored big data and messaging systems such as Hadoop, Kafka, and Kafka MirrorMaker to ensure they operate at peak performance at all times.
- Created Hive tables and loaded and analyzed data using Hive queries.
- Communicated regularly with the business teams to ensure that any gaps between business requirements and technical requirements were resolved.
- Read and translated data models, queried data, identified data anomalies, and provided root-cause analysis.
- Support "Qlik Sense" reporting, to gauge performance of various KPIs/facets to assist top management in decision-making.
- Engage in project planning and delivering to commitments.
- Conducted POCs on new technologies available in the market (e.g., Snowflake) to determine the best fit for the organization's needs.
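One of the Hive tuning techniques mentioned above, bucketing, can be sketched concretely. With Hive's original (version 1) bucketing scheme, a string key is hashed with Java's `String.hashCode` and taken modulo the bucket count; newer Hive versions default to Murmur hashing. The re-implementation below is for illustration only, not Hive's actual source.

```python
def java_string_hashcode(s: str) -> int:
    """Re-implement Java's String.hashCode (h = 31*h + char) in Python."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Interpret the result as a signed 32-bit integer, as Java does.
    return h - 0x100000000 if h >= 0x80000000 else h

def bucket_for(key: str, num_buckets: int) -> int:
    """Assign a string key to a bucket the way Hive (bucketing v1) does."""
    # Masking with Integer.MAX_VALUE keeps the bucket index non-negative.
    return (java_string_hashcode(key) & 0x7FFFFFFF) % num_buckets
```

Because the same key always lands in the same bucket, joins on the clustering column can be executed bucket-by-bucket instead of shuffling the whole table.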
Confidential, McLean, VA
Data Warehouse Architect - Hadoop Developer/SQL Developer
Responsibilities:
- Set up and built AWS infrastructure with various services by writing CloudFormation templates (CFTs) in JSON and YAML.
- Developed CloudFormation scripts to build EC2 instances on demand.
- Using IAM, created roles, users, and groups and attached policies to grant least-privilege access to resources.
- Updated bucket policies with IAM roles to restrict user access.
- Configured AWS Identity and Access Management (IAM) groups and users for improved login authentication.
- Created topics in SNS to send notifications to subscribers as per the requirement.
- Involved in full life cycle of the project from Design, Analysis, logical and physical architecture modeling, development, Implementation, testing.
- Moving data from Oracle to HDFS using Sqoop
- Data profiling on critical tables from time to time to check for the abnormalities
- Created Hive tables, loaded transactional data from Oracle using Sqoop, and worked with highly unstructured and semi-structured data.
- Developed MapReduce (YARN) jobs for cleaning, accessing, and validating the data.
- Created and ran Sqoop jobs with incremental load to populate Hive external tables.
- Wrote scripts to distribute queries for performance test jobs in the Amazon data lake.
- Developed optimal strategies for distributing web log data over the cluster, importing and exporting the stored web log data into HDFS and Hive using Sqoop.
- Installed and configured Apache Hadoop on multiple nodes in AWS EC2.
- Developed Pig Latin scripts to replace the existing legacy process on Hadoop, feeding the data to AWS S3.
- Worked on CDC (Change Data Capture) tables using a Spark application to load data into dynamic-partition-enabled Hive tables.
- Designed and developed automation test scripts using Python
- Integrated Apache Storm with Kafka to perform web analytics and to perform click stream data from Kafka to HDFS.
- Analyzed the SQL scripts and designed the solution to implement using Pyspark
- Implemented Hive GenericUDFs to incorporate business logic into Hive queries.
- Responsible for developing data pipeline with Amazon AWS to extract the data from weblogs and store in HDFS.
- Uploaded streaming data from Kafka to HDFS, HBase, and Hive by integrating with Storm.
- Supported data analysis projects using Elastic MapReduce on the Confidential (AWS) cloud; performed export and import of data into S3.
- Designed the HBase row key to store text and JSON as key values, structuring the key so rows can be retrieved and scanned in sorted order.
- Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Map-Reduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts).
- Creating Hive tables and working on them using Hive QL.
- Designed and Implemented Partitioning (Static, Dynamic) Buckets in HIVE.
- Developed multiple POCs using PySpark, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL.
- Developed syllabus/curriculum data pipelines from syllabus/curriculum web services to HBase and Hive tables.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Built applications using Maven and integrated with CI servers such as Jenkins to build jobs.
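The incremental Sqoop loads described above hinge on a simple watermark contract: pull only rows whose modification time exceeds the last recorded check value, then advance that value. This pure-Python sketch (field names are illustrative) mirrors what Sqoop's `--incremental lastmodified` mode does against a source table.

```python
def incremental_import(rows, last_value):
    """Return rows changed since last_value, plus the advanced watermark.

    `rows` is the source table as dicts; `last_value` is the previous
    check value (ISO date strings compare correctly as plain strings).
    """
    new_rows = [r for r in rows if r["last_modified"] > last_value]
    # If nothing changed, keep the old watermark rather than regressing it.
    new_watermark = max((r["last_modified"] for r in new_rows), default=last_value)
    return new_rows, new_watermark

source_table = [
    {"id": 1, "last_modified": "2020-01-01"},
    {"id": 2, "last_modified": "2020-02-01"},
    {"id": 3, "last_modified": "2020-03-01"},
]
pulled, watermark = incremental_import(source_table, "2020-01-15")
```

Persisting the watermark between runs (Sqoop stores it in the saved job metastore) is what makes each load pull only the delta.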
Confidential
Hadoop Analyst
Responsibilities:
- Participated in SDLC Requirements gathering, Analysis, Design, Development and Testing of application developed using AGILE methodology.
- Developed managed, external, and partitioned tables per requirements.
- Ingested structured data into appropriate schemas and tables to support rules and analytics.
- Developed custom user-defined functions (UDFs) in Hive to transform large volumes of data per business requirements.
- Developed Pig scripts, Pig UDFs, Hive scripts, and Hive UDFs to load data files.
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in loading data from edge node to HDFS using shell scripting
- Implemented scripts for loading data from the UNIX file system to HDFS.
- Load and transform large sets of structured, semi structured and unstructured data.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Actively participated in Object Oriented Analysis Design sessions of the Project, which is based on MVC Architecture using Spring Framework.
- Developed the presentation layer using HTML, CSS, JSPs, Bootstrap, and AngularJS.
- Adopted J2EE design patterns like DTO, DAO, Command and Singleton.
- Implemented object-relational mapping in the persistence layer using the Hibernate framework in conjunction with Spring functionality.
- Generated POJO classes to map to the database table.
- Configured Hibernate's second-level cache using EhCache to reduce the number of hits to the configuration table data.
- Used the ORM tool Hibernate to represent entities and tune fetching strategies for optimization.
- Implementing the transaction management in the application by applying Spring Transaction and Spring AOP methodologies.
- Wrote SQL queries and stored procedures for the application to communicate with the database.
- Used the JUnit framework for unit testing of the application.
- Used Maven to build and deploy the application.
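For the per-row transformations described above, a lightweight alternative to a Java Hive UDF is Hive's `TRANSFORM` streaming interface, which pipes tab-delimited rows through an external script (`SELECT TRANSFORM(...) USING 'python clean.py' ...`). The column layout below (an id plus a raw amount field) is an assumption for illustration.

```python
import sys

def transform_line(line: str) -> str:
    """Normalize one tab-delimited row: trim fields, default bad amounts to 0."""
    row_id, raw_amount = line.rstrip("\n").split("\t")
    try:
        amount = f"{float(raw_amount.strip()):.2f}"
    except ValueError:
        # Unparseable amounts (e.g. "N/A") are coerced to zero.
        amount = "0.00"
    return f"{row_id.strip()}\t{amount}"

if __name__ == "__main__":
    # Hive streams rows on stdin and reads transformed rows from stdout.
    for line in sys.stdin:
        print(transform_line(line))
```

Keeping the logic in a pure function makes the script unit-testable outside Hive while the `__main__` block handles the streaming contract.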
Confidential
Java Developer
Responsibilities:
- Participated in gathering business requirements, analyzing the project and creating use Cases and Class Diagrams.
- Interacted and coordinated with the design team, business analysts, and end users of the system.
- Created sequence diagrams, collaboration diagrams, class diagrams, use cases and activity diagrams using Rational Rose for the Configuration, Cache & logging Services.
- Implemented a Tiles-based framework to present layouts to the user; created the web UI using Struts, JSPs, servlets, and custom tags.
- Designed and developed Caching and Logging service using Singleton pattern, Log4j.
- Coded different Action classes in Struts and maintained deployment descriptors such as struts-config.xml, ejb-jar.xml, and web.xml.
- Used JSP, JavaScript, custom tag libraries, Tiles, and validations provided by the Struts framework.
- Wrote authentication and authorization classes and managed them in the front controller for all users according to their entitlements.
- Developed and deployed Session Beans and Entity Beans for database updates.
- Implemented caching techniques, wrote POJO classes for storing data and DAOs to retrieve it, and performed other database configurations using EJB 3.0.
- Developed stored procedures and complex packages extensively using PL/SQL and shell programs.
- Used the Struts Validator framework for all front-end validations of form entries.
- Developed SOAP based Web Services for Integrating with the Enterprise Information System Tier.
- Design and development of JAXB components for transfer objects.
- Prepared EJB deployment descriptors using XML.
- Involved in Configuration and Usage of Apache Log4J for logging and debugging purposes.
- Wrote Action Classes to service the requests from the UI, populate business objects & invoke EJBs.
- Used JAXP (DOM, XSLT) and XSD for XML data generation and presentation.