
Senior Data Engineer Resume


Boston, MA

SUMMARY

  • 10+ years of IT experience in software development, including 5+ years as a Big Data/Hadoop Developer with strong knowledge of the Hadoop framework.
  • Expertise in Hadoop architecture and its components, such as HDFS, YARN, High Availability, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Experience with all aspects of development from initial implementation and requirement discovery, through release, enhancement and support (SDLC & Agile techniques).
  • 4+ years of experience in design, development, data migration, testing, support, and maintenance of Redshift databases.
  • 4+ years of experience with Apache Hadoop technologies such as HDFS, the MapReduce framework, Hive, Pig, PySpark, Sqoop, Oozie, HBase, Spark, Scala, and Python.
  • 3+ years of experience in AWS cloud solution development using Lambda, SQS, SNS, DynamoDB, Athena, S3, EMR, EC2, Redshift, Glue, and CloudFormation.
  • Experience using Azure SQL Database, Azure Data Lake, Azure ML, Azure Data Factory, Azure Functions, Databricks, and HDInsight.
  • Working experience with big data in the cloud on AWS EC2 and Microsoft Azure; handled Redshift and DynamoDB databases holding roughly 300 TB of data.
  • Extensive experience migrating on-premises Hadoop platforms to cloud solutions on AWS and Azure.
  • 3+ years of experience writing ETL frameworks in Python and PySpark to process large volumes of data daily.
  • Strong experience implementing data models and loading unstructured data using HBase, DynamoDB, and Cassandra.
  • Created multiple report dashboards, visualizations, and heat maps using Tableau, QlikView, and Qlik Sense.
  • Strong experience extracting and loading data from different sources with complex business logic in Hive, and building ETL pipelines that process terabytes of data daily.
  • Experienced in transporting and processing real-time event streams using Kafka and Spark Streaming.
  • Hands-on experience importing and exporting data between relational databases and HDFS, Hive, and HBase using Sqoop.
  • Experienced in processing real-time data with Kafka 0.10.1 producers and stream processors; also implemented stream processing with Kinesis, landing data into an S3 data lake.
  • Experience in implementing multitenant models for the Hadoop 2.0 Ecosystem using various big data technologies.
  • Designed and developed Spark pipelines to ingest real-time, event-based data from Kafka and other message queues, and processed large volumes with Spark batch jobs into a Hive data warehouse (a sketch of this pattern follows this summary).
  • Experienced in creating and analyzing Software Requirement Specifications (SRS) and Functional Specification Document (FSD).
  • Excellent working experience in Scrum / Agile framework, Iterative and Waterfall project execution methodologies.
  • Designed data models for both OLAP and OLTP applications using Erwin and used both star and snowflake schemas in the implementations.
  • Capable of organizing, coordinating and managing multiple tasks simultaneously.
  • Excellent communication and inter-personal skills, self-motivated, organized and detail-oriented, able to work well under deadlines in a changing environment and perform multiple tasks effectively and concurrently.
  • Strong analytical skills with the ability to quickly understand clients' business needs; involved in meetings to gather information and requirements from clients.
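Below is a minimal PySpark sketch of the Kafka-to-warehouse ingestion pattern referenced in the summary; the broker address, topic, schema fields, and paths are placeholders rather than details from any specific engagement.

# Minimal sketch: ingest events from Kafka with Spark Structured Streaming
# and land them as Parquet for a Hive-backed warehouse table.
# Broker, topic, schema, and paths are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = (SparkSession.builder
         .appName("kafka-ingest-sketch")
         .enableHiveSupport()
         .getOrCreate())

event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder broker
       .option("subscribe", "events")                        # placeholder topic
       .load())

events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*"))

query = (events.writeStream
         .format("parquet")
         .option("path", "/warehouse/events")                # placeholder warehouse path
         .option("checkpointLocation", "/checkpoints/events")
         .trigger(processingTime="1 minute")
         .start())
query.awaitTermination()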

TECHNICAL SKILLS

Hadoop: Hadoop, Spark (PySpark), MapReduce, Hive, Pig, Impala, Sqoop, HDFS, HBase, Oozie, Ambari, Scala, and MongoDB

Cloud Technologies: AWS Kinesis, Lambda, EMR, EC2, SNS, SQS, DynamoDB, Step Functions, Glue, Athena, CloudWatch, Azure Data Factory, Azure Data Lake, Azure Functions, Azure SQL Data Warehouse, Databricks, and HDInsight

DBMS: Amazon Redshift, Postgres, Oracle 9i, SQL Server, IBM DB2, and Teradata

ETL Tools: DataStage, Talend, and Ab Initio

Reporting Tools: Power BI, Tableau, TIBCO Spotfire, QlikView, and Qlik Sense

Deployment Tools: Git, Jenkins, Terraform and CloudFormation

Programming Languages: Python, Scala, PL/SQL, and Java

Scripting: Unix Shell and Bash scripting

PROFESSIONAL EXPERIENCE

Confidential, Boston, MA

Senior Data Engineer

Responsibilities:

  • Built data pipelines in Airflow on GCP for ETL jobs using Airflow operators (see the sketch after this list).
  • Created Spark data pipelines using GCP Dataproc.
  • Used the Cloud Shell SDK in GCP to configure services such as Dataproc, Cloud Storage, and BigQuery.
  • Developed a framework to generate daily ad hoc reports and extract data from BigQuery.
  • Coordinated with the data science team to design and implement advanced analytical models on the Hadoop cluster over large datasets.
  • Wrote Hive SQL scripts for creating complex tables with performance features such as partitioning, clustering, and skew handling.
  • Read data from BigQuery into pandas or Spark DataFrames for advanced ETL capabilities.
  • Created BigQuery views for row-level security and for exposing data to other teams.
  • Worked with Google Data Catalog and other Google Cloud APIs for monitoring, query, and billing analysis of BigQuery usage.
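A hedged sketch of the kind of Airflow DAG described above, submitting a PySpark job to Dataproc through the Google provider operator; the project ID, region, cluster name, and GCS paths are placeholders.

# Sketch: Airflow DAG that submits a daily PySpark ETL job to Dataproc.
# Project, cluster, region, and GCS paths are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import DataprocSubmitJobOperator

PYSPARK_JOB = {
    "reference": {"project_id": "my-project"},             # placeholder project
    "placement": {"cluster_name": "etl-cluster"},           # placeholder cluster
    "pyspark_job": {"main_python_file_uri": "gs://my-bucket/jobs/daily_etl.py"},
}

with DAG(
    dag_id="daily_etl_sketch",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    run_etl = DataprocSubmitJobOperator(
        task_id="run_daily_etl",
        job=PYSPARK_JOB,
        region="us-central1",      # placeholder region
        project_id="my-project",   # placeholder project
    )

For the ad hoc reporting and pandas bullets, BigQuery results can be pulled into a pandas DataFrame with the google-cloud-bigquery client, e.g. bigquery.Client().query(sql).to_dataframe().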

Confidential, Charlotte, NC

Data Engineer

Responsibilities:

  • Wrote ETL jobs using Spark data pipelines to process data from different sources and transform it for multiple targets.
  • Created streams using Spark, processed real-time data into RDDs and DataFrames, and built analytics using PySpark SQL.
  • Created a PySpark framework to bring data from DB2 to Amazon S3 (see the sketch after this list).
  • Optimized PySpark jobs to run on the EMR cluster for faster data processing.
  • Designed a Redshift-based data delivery layer for business intelligence tools to operate directly on AWS S3.
  • Implemented Kinesis Data Streams to read real-time data and load it into S3 for downstream processing.
  • Set up AWS infrastructure on EC2 and implemented S3 API access for reading bucket data files.
  • Designed "Data Services" to intermediate data exchange between the Data Clearinghouse and the Data Hubs.
  • Wrote ETL flows and MapReduce jobs to process data from AWS S3 into DynamoDB and HBase.
  • Involved in the ETL phase of the project; designed and analyzed the data in Oracle and migrated it to Redshift and Hive.
  • Created databases and tables in Redshift and DynamoDB and wrote complex EMR scripts to process terabytes of data in AWS S3.
  • Performed real-time analytics on transactional data using Python to create statistical models for predictive and reverse product analysis.
  • Developed Spark applications in Python (PySpark) on a distributed environment to load large numbers of CSV files with differing schemas into Hive ORC tables.
  • Read and wrote multiple data formats such as JSON, ORC, and Parquet on HDFS using PySpark.
  • Participated in client meetings to explain designs and gather requirements.
  • Worked in an Agile methodology, understanding the requirements of user stories.
  • Prepared high-level design documentation for approval.
  • Used data visualization tools Tableau, QuickSight, and Kibana to surface new insights from the extracted data and represent it more effectively.
  • Designed data models for dynamic and real-time data intended for use by various applications with OLAP and OLTP needs.
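A minimal PySpark sketch of the DB2-to-S3 load mentioned above; the JDBC URL, credentials, table, partition column, and bucket are placeholders, and the IBM DB2 JDBC driver is assumed to be on the Spark classpath.

# Sketch: read a DB2 table over JDBC and write it to S3 as Parquet.
# Connection details, table, partition column, and bucket are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("db2-to-s3-sketch").getOrCreate()

db2_df = (spark.read.format("jdbc")
          .option("url", "jdbc:db2://db2-host:50000/SAMPLE")   # placeholder URL
          .option("driver", "com.ibm.db2.jcc.DB2Driver")
          .option("dbtable", "SCHEMA.ORDERS")                  # placeholder table
          .option("user", "db2user")                           # placeholder user
          .option("password", "****")                          # placeholder secret
          .load())

(db2_df.write
 .mode("overwrite")
 .partitionBy("order_date")                   # hypothetical partition column
 .parquet("s3a://my-data-lake/raw/orders/"))  # placeholder bucket/prefix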

Confidential, Herndon, VA

Cloud Support Engineer

Responsibilities:

  • Assisted in incident management, including problem resolution and tracking.
  • Maintained a good knowledge of IT technologies and developments.
  • Provided proactive communication to the customer base for wide-scale service-affecting problems.
  • Provided fast, value-added responses to inbound tickets from customers, acknowledging receipt and giving next steps through both written and verbal channels.
  • Utilized monitoring tools to proactively identify problems with systems, applications, and networks.

Confidential, Auburn Hills, MI

Big Data Engineer/Hadoop Developer

Responsibilities:

  • Worked on the full project life cycle, from design, analysis, and logical and physical architecture modeling through development, implementation, and testing.
  • Conferred with data scientists and other qlikstream developers to obtain information on limitations or capabilities for data processing projects.
  • Designed and developed automation test scripts using Python.
  • Created data pipelines using Azure Data Factory.
  • Automated jobs using Python.
  • Created tables and loaded data in the Azure MySQL database.
  • Created Azure Functions and Logic Apps to automate the data pipelines using blob triggers.
  • Analyzed SQL scripts and designed solutions to implement them using PySpark.
  • Developed Spark code in Python (PySpark) for faster processing and testing of data.
  • Used the Spark API to perform analytics on data in Hive.
  • Optimized and tuned Hive and Spark queries using data layout techniques such as partitioning and bucketing (see the sketch after this list).
  • Performed data cleansing, integration, and transformation using Pig.
  • Exported and imported data between the local file system, RDBMS sources, and HDFS.
  • Designed and coded the pattern for inserting data into the data lake.
  • Moved data from on-premises HDP clusters to Azure.
  • Built, installed, upgraded, and migrated petabyte-scale big data systems.
  • Fixed data-related issues.
  • Loaded data into the DB2 database using DataStage.
  • Monitored big data and messaging systems such as Hadoop, Kafka, and Kafka MirrorMaker to ensure they operated at peak performance at all times.
  • Created Hive tables and loaded and analyzed data using Hive queries.
  • Communicated regularly with business teams to ensure that any gaps between business and technical requirements were resolved.
  • Read and translated data models, queried data, identified data anomalies, and provided root cause analysis.
  • Supported Qlik Sense reporting to gauge performance of various KPIs and facets and assist top management in decision-making.
  • Engaged in project planning and delivered to commitments.
  • Ran POCs on new technologies available in the market (e.g., Snowflake) to determine the best fit for the organization's needs.
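A small PySpark sketch of the partitioning and bucketing layout techniques mentioned above; the table, columns, and paths are illustrative only.

# Sketch: write a DataFrame as a partitioned, bucketed Hive table so that
# date filters prune partitions and joins on customer_id shuffle less.
# Table, column, and path names are illustrative.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("layout-tuning-sketch")
         .enableHiveSupport()
         .getOrCreate())

events = spark.read.parquet("/staging/events")   # placeholder input path

(events.write
 .mode("overwrite")
 .partitionBy("event_date")     # partition pruning on date filters
 .bucketBy(64, "customer_id")   # co-locate rows for joins on customer_id
 .sortBy("customer_id")
 .format("parquet")
 .saveAsTable("analytics.events_bucketed"))      # placeholder Hive table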

Confidential, McLean, VA

Data Warehouse Architect - Hadoop Developer/SQL Developer

Responsibilities:

  • Set up and built AWS infrastructure with various services by writing CloudFormation templates (CFT) in JSON and YAML.
  • Developed CloudFormation scripts to build EC2 instances on demand.
  • Created IAM roles, users, and groups and attached policies to provide least-privilege access to resources.
  • Updated bucket policies with IAM roles to restrict user access.
  • Configured AWS Identity and Access Management (IAM) groups and users for improved login authentication.
  • Created topics in SNS to send notifications to subscribers as per the requirement.
  • Involved in full life cycle of the project from Design, Analysis, logical and physical architecture modeling, development, Implementation, testing.
  • Moved data from Oracle to HDFS using Sqoop.
  • Performed data profiling on critical tables from time to time to check for abnormalities.
  • Created Hive tables, loaded transactional data from Oracle using Sqoop, and worked with highly unstructured and semi-structured data.
  • Developed MapReduce (YARN) jobs for cleaning, accessing and validating the data.
  • Created and ran Sqoop jobs with incremental loads to populate Hive external tables.
  • Wrote scripts to distribute queries for performance test jobs in the Amazon data lake.
  • Developed optimal strategies for distributing web log data over the cluster; imported and exported the stored web log data into HDFS and Hive using Sqoop.
  • Installed and configured Apache Hadoop across multiple nodes on AWS EC2.
  • Developed Pig Latin scripts to replace the existing legacy process on Hadoop, feeding the data to AWS S3.
  • Worked on CDC (change data capture) tables using a Spark application to load data into dynamic-partition-enabled Hive tables (see the sketch after this list).
  • Designed and developed automation test scripts using Python
  • Integrated Apache Storm with Kafka to perform web analytics and move clickstream data from Kafka to HDFS.
  • Analyzed SQL scripts and designed the solution to implement them using PySpark.
  • Implemented Hive GenericUDFs to incorporate business logic into Hive queries.
  • Responsible for developing a data pipeline on AWS to extract data from web logs and store it in HDFS.
  • Uploaded streaming data from Kafka to HDFS, HBase, and Hive by integrating with Storm.
  • Supported data analysis projects using Elastic MapReduce on the Confidential (AWS) cloud; performed exports and imports of data into S3.
  • Designed the row key in HBase to store text and JSON as key values and to allow gets/scans in sorted order.
  • Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Map-Reduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts).
  • Created Hive tables and worked on them using HiveQL.
  • Designed and implemented static and dynamic partitioning and bucketing in Hive.
  • Developed multiple POCs using PySpark, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL.
  • Developed syllabus/curriculum data pipelines from syllabus/curriculum web services to HBase and Hive tables.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Built applications using Maven and integrated with CI servers such as Jenkins to build jobs.
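A hedged PySpark sketch of the CDC load into a dynamic-partition-enabled Hive table referenced above; the database, table, and column names are placeholders, and the target table is assumed to already exist with load_date as its partition column.

# Sketch: append change-data-capture records into an existing Hive table
# with dynamic partitioning enabled. Names are placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("cdc-dynamic-partition-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Let Hive create partitions from the incoming data itself.
spark.conf.set("hive.exec.dynamic.partition", "true")
spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")

cdc = spark.read.parquet("/landing/cdc/orders")   # placeholder CDC extract

# With insertInto, the partition column (load_date) must be the last column.
(cdc.select("order_id", "status", "updated_at", "load_date")
 .write
 .mode("append")
 .insertInto("warehouse.orders_cdc"))             # placeholder Hive table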

Confidential

Hadoop Analyst

Responsibilities:

  • Participated in SDLC requirements gathering, analysis, design, development, and testing of the application, developed using an Agile methodology.
  • Developed managed, external, and partitioned tables as per the requirements.
  • Ingested structured data into appropriate schemas and tables to support the rules and analytics.
  • Developed custom user-defined functions (UDFs) in Hive to transform large volumes of data per business requirements.
  • Developed Pig scripts, Pig UDFs, Hive scripts, and Hive UDFs to load data files.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Loaded data from the edge node to HDFS using shell scripting.
  • Implemented scripts for loading data from UNIX file system to HDFS.
  • Load and transform large sets of structured, semi structured and unstructured data.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
  • Actively participated in object-oriented analysis and design sessions of the project, which is based on MVC architecture using the Spring Framework.
  • Developed the presentation layer using HTML, CSS, JSPs, Bootstrap, and AngularJS.
  • Adopted J2EE design patterns like DTO, DAO, Command and Singleton.
  • Implemented object-relational mapping in the persistence layer using the Hibernate framework in conjunction with Spring functionality.
  • Generated POJO classes to map to the database table.
  • Configured Hibernate's second level cache using EHCache to reduce the number of hits to the configuration table data.
  • Used the ORM tool Hibernate to represent entities and tune fetching strategies for optimization.
  • Implemented transaction management in the application using Spring Transaction and Spring AOP.
  • Wrote SQL queries and stored procedures for the application to communicate with the database.
  • Used the JUnit framework for unit testing of the application.
  • Used Maven to build and deploy the application.

Confidential

Java Developer

Responsibilities:

  • Participated in gathering business requirements, analyzing the project, and creating use cases and class diagrams.
  • Interacted and coordinated with the design team, business analysts, and end users of the system.
  • Created sequence diagrams, collaboration diagrams, class diagrams, use cases and activity diagrams using Rational Rose for the Configuration, Cache & logging Services.
  • Implemented a Tiles-based framework to present layouts to the user; created the web UI using Struts, JSP, Servlets, and custom tags.
  • Designed and developed Caching and Logging service using Singleton pattern, Log4j.
  • Coded different Struts action classes and maintained deployment descriptors such as struts-config, ejb-jar, and web.xml using XML.
  • Used JSP, JavaScript, custom tag libraries, Tiles, and the validations provided by the Struts framework.
  • Wrote authentication and authorization classes and managed them in the front controller for all users according to their entitlements.
  • Developed and deployed Session Beans and Entity Beans for database updates.
  • Implemented caching techniques, wrote POJO classes for storing data and DAO’s to retrieve the data and did other database configurations using EJB 3.0.
  • Developed stored procedures and complex packages extensively using PL/SQL and shell programs.
  • Used the Struts Validator framework for all front-end validations of form entries.
  • Developed SOAP based Web Services for Integrating with the Enterprise Information System Tier.
  • Design and development of JAXB components for transfer objects.
  • Prepared EJB deployment descriptors using XML.
  • Involved in Configuration and Usage of Apache Log4J for logging and debugging purposes.
  • Wrote Action Classes to service the requests from the UI, populate business objects & invoke EJBs.
  • Used JAXP (DOM, XSLT), XSD for XML data generation and presentation
