Sr. Data Engineer Resume
Rockville, MD
SUMMARY
- Over 7 years of experience leveraging big data tools; AWS certified.
- Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
- Hands-on experience in test-driven development and Software Development Life Cycle (SDLC) methodologies like Agile and Scrum.
- Experienced in performing real-time analytics on NoSQL distributed databases like Cassandra, HBase, and MongoDB.
- Good understanding of designing attractive data visualization dashboards using Tableau.
- Developed Scala scripts and UDFs using both DataFrames and RDDs in Spark for data aggregation, queries, and writing data back into OLTP systems.
- Created batch data pipelines using Spark with the Scala API and developed data ingestion pipelines using Kafka.
- Hands on experience in designing and developing POCs in Spark to compare the performance of Spark with Hive and SQL/Oracle using Scala.
- Used Flume and Kafka to direct data from different sources to/from HDFS.
- Worked with the AWS cloud and created EMR clusters with Spark to process and analyze raw data accessed from S3 buckets.
- Scripted an ETL pipeline in Python that ingests files from AWS S3 into Redshift tables (a sketch of this pattern follows this summary).
- Hands on experience with various file formats such as ORC, Avro, Parquet and JSON.
- Good knowledge of SQL queries and of creating database objects like stored procedures, triggers, packages, and functions using SQL and PL/SQL to implement business logic.
- Proficient in Normalization/De-normalization techniques in relational/dimensional database environments and have done normalizations up to 3NF.
- Good understanding of the Ralph Kimball (dimensional) and Bill Inmon (relational) modeling methodologies.
- Experience in dimensional data modeling, Star Schema/Snowflake Schema, and Fact & Dimension tables.
- Expertise in working with Linux/Unix and shell commands on the Terminal.
- Experience working with CQL (Cassandra Query Language) to retrieve data from a Cassandra cluster.
- Good knowledge of Amazon AWS services like EMR and EC2, which provide fast and efficient processing.
- Good understanding of and exposure to Python programming.
- Experience using IDEs such as Eclipse and IntelliJ and repositories SVN and Git.
- Exported and imported data to and from Oracle using SQL Developer for analysis.
- Good experience in using Sqoop for traditional RDBMS data pulls and worked with different distributions of Hadoop like Hortonworks and Cloudera.
- Experience in designing components using UML Use Case, Class, Sequence, Deployment, and Component diagrams for the requirements.
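The following is a minimal sketch of the S3-to-Redshift ingestion pattern mentioned above, assuming a hypothetical bucket, staging table, IAM role, and connection string; none of these names come from an actual engagement.

```python
# Minimal sketch: load a staged S3 file into a Redshift table via COPY.
# The DSN, bucket path, IAM role, and target table are hypothetical placeholders.
import psycopg2

REDSHIFT_DSN = "dbname=analytics host=example-cluster.redshift.amazonaws.com port=5439 user=etl_user password=***"
S3_PATH = "s3://example-bucket/landing/orders/2020-01-01/"
IAM_ROLE = "arn:aws:iam::123456789012:role/example-redshift-copy-role"

COPY_SQL = f"""
    COPY staging.orders
    FROM '{S3_PATH}'
    IAM_ROLE '{IAM_ROLE}'
    FORMAT AS PARQUET;
"""

def load_to_redshift() -> None:
    """Run the COPY command inside a single transaction."""
    with psycopg2.connect(REDSHIFT_DSN) as conn:
        with conn.cursor() as cur:
            cur.execute(COPY_SQL)  # psycopg2 commits on clean exit from the block

if __name__ == "__main__":
    load_to_redshift()
```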
TECHNICAL SKILLS
Big Data Ecosystem: MapReduce, HDFS, Hive 2.3, HBase 1.2, Pig, Sqoop, Flume 1.8, HDP, Oozie, Zookeeper, Kafka, Storm, Hue
Hadoop Distributions: Cloudera (CDH3, CDH4, CDH5), Hortonworks
Cloud Services: Amazon AWS, EC2, Redshift, Docker, Kubernetes, AWS ECS, Terraform, AWS CloudFormation, AWS CloudWatch, AWS X-Ray, AWS CloudTrail.
Relational Databases: Oracle 12c, MySQL, MS SQL Server 2016
NoSQL Databases: HBase 1.2, Cassandra, and MongoDB
Version Control: GIT, GitLab, SVN
Programming Languages: Python, SQL, PL/SQL, HiveQL, UNIX Shell Scripting, Scala.
Software Development & Testing Lifecycle: UML, Design Patterns, SDLC (Waterfall and Agile), STLC
Operating Systems: Windows, UNIX/Linux and Mac OS.
PROFESSIONAL EXPERIENCE:
Confidential, Rockville, MD
Sr. Data Engineer
Responsibilities:
- Provided technical expertise in Hadoop technologies as they relate to the development of analytics.
- Implemented solutions for ingesting data from various sources and processing data at rest with big data technologies, using Hadoop, HBase, Hive, and cloud architecture.
- Responsible for building scalable distributed data solutions using Big Data technologies like Apache Hadoop, MapReduce, Shell Scripting, and Hive.
- Used Agile (SCRUM) methodologies for Software Development.
- Wrote complex Hive queries to extract data from heterogeneous sources (Data Lake) and persist the data into HDFS.
- Involved in all phases of data mining, data collection, data cleaning, developing models, validation and visualization.
- Designed and developed end-to-end ETL processing from Oracle to AWS using Amazon S3, EMR, and Spark.
- Developed the code to perform Data extractions from Oracle Database and load it into AWS platform using AWS Data Pipeline.
- Installed and configured Hadoop ecosystem components such as HBase, Flume, Pig, and Sqoop.
- Designed and developed Big Data analytic solutions on a Hadoop-based platform and engaged clients in technical discussions.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Implemented the AWS cloud computing platform using S3, RDS, DynamoDB, Redshift, and Python.
- Responsible for loading and transforming huge sets of structured, semi-structured, and unstructured data.
- Implemented business logic by writing UDFs and configuring CRON Jobs.
- Extensively involved in writing PL/SQL, stored procedures, functions and packages.
- Created logical and physical data models using Erwin and reviewed these models with business team and data architecture team.
- Led architecture and design of data processing, warehousing, and analytics initiatives.
- Developed Spark scripts using Python and Bash shell commands as per requirements (a brief PySpark sketch follows this section).
- Worked with NoSQL databases like HBase in creating tables to load large sets of semi structured data coming from source systems.
- Responsible for translating business and data requirements into logical data models in support of enterprise data models, ODS, OLAP, OLTP, and operational data structures.
- Created SSIS packages to migrate data from heterogeneous sources such as MS Excel, flat files, and CSV files.
- Provided thought leadership for the architecture and design of Big Data analytics solutions for customers, and actively drove Proof of Concept (POC) and Proof of Technology (POT) evaluations to implement Big Data solutions.
- Developed numerous MapReduce jobs in Scala for Data Cleansing and Analyzing Data in Impala.
- Created a data pipeline using processor groups and multiple processors in Apache NiFi for flat file and RDBMS sources as part of a POC on Amazon EC2.
- Worked closely with the SSIS and SSRS developers to explain complex data transformation logic.
- Designed Data Marts by following Star Schema and Snowflake Schema Methodology, using industry leading Data modeling tools like Erwin.
- Developed the Star Schema/Snowflake Schema for proposed warehouse models to meet the requirements.
- Used Microsoft Windows Server and authenticated client-server communication via the Kerberos protocol.
- Assigned names to columns using case classes in Scala.
Environment: Hive 2.3, Hadoop 3.0, HDFS, Oracle, HBase 1.2, Flume 1.8, Pig 0.17, Sqoop 1.4, Oozie 4.3, Python, PL/SQL, NoSQL, OLAP, OLTP, SSIS, MS Excel 2016, SSRS, Visio
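Below is a minimal PySpark sketch of the Python-based Spark scripting and UDF work described in this role. The input path, column names, and the cleansing rule are hypothetical placeholders rather than artifacts of the actual project.

```python
# Minimal sketch: cleanse raw records with a PySpark UDF and persist to HDFS.
# Paths, columns, and the phone-normalization rule are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("cleanse-example").getOrCreate()

# Hypothetical input: raw CSV landed in HDFS by an upstream ingestion flow.
raw = spark.read.option("header", True).csv("hdfs:///data/landing/customers/")

@F.udf(returnType=StringType())
def normalize_phone(phone):
    """Keep digits only; return None for empty or invalid values."""
    if phone is None:
        return None
    digits = "".join(ch for ch in phone if ch.isdigit())
    return digits if len(digits) == 10 else None

cleansed = (
    raw.dropDuplicates(["customer_id"])
       .withColumn("phone", normalize_phone(F.col("phone")))
       .filter(F.col("customer_id").isNotNull())
)

# Persist back to HDFS in a columnar format for downstream Hive queries.
cleansed.write.mode("overwrite").parquet("hdfs:///data/curated/customers/")
```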
Confidential, Hartford, CT
AWS Engineer
Responsibilities:
- Worked as an AWS Engineer providing solutions for big data problems.
- Responsible for automating build processes towards CI/CD automation goals.
- Involved in Agile methodologies, daily scrum meetings, and sprint planning.
- Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, HBase database and Sqoop.
- Upgraded the Hadoop cluster from CDH3 to CDH4, setting up a High Availability cluster and integrating Hive with existing applications.
- Involved in file movements between HDFS and AWS S3 and worked extensively with S3 buckets in AWS (a brief boto3 sketch follows this section).
- Developed predictive analytics using Apache Spark Scala APIs.
- Configured Sqoop and developed scripts to extract data from MySQL into HDFS.
- Created tables in HBase to store variable data formats of PII data coming from different portfolios.
- Implemented business logic using Pig scripts.
- Involved in a fully automated CI/CD pipeline process through GitHub, Jenkins, and Puppet.
- Built and deployed Docker containers to improve developer workflow, increasing scalability and optimization.
- Used AWS CloudTrail for audit findings and CloudWatch for monitoring AWS resources.
- Involved in identifying job dependencies to design workflows for Oozie and YARN resource management.
- Moved data between HDFS and relational database systems using Sqoop, and maintained and troubleshot those transfers.
- Worked extensively with importing metadata into Hive using Python and migrated existing tables and applications to work on AWS cloud (S3).
- Involved in scheduling Oozie workflow to automatically update the firewall.
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
- Worked in the BI team in the area of Big Data Hadoop cluster implementation and data integration in developing large-scale system software.
- Worked in AWS EC2, configuring the servers for Auto scaling and Elastic load balancing.
- Developed NiFi flows dealing with various kinds of data formats.
- Implemented test scripts to support test driven development and continuous integration.
- Responsible for managing data coming from different sources.
- Worked on tuning Hive and Pig to improve performance and resolved performance issues in both types of scripts.
Environment: Agile, Hadoop 3.0, HBase 2.1, CI/CD, Sqoop, Hive 1.9, AWS, HDFS, Scala, Spark, S3, CSV
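The following is a minimal boto3 sketch of the HDFS-to-S3 file movement referenced in this role; the bucket name, key prefix, and local staging directory are hypothetical placeholders.

```python
# Minimal sketch: upload files staged out of HDFS into an S3 bucket with boto3.
# Bucket, prefix, and staging path are hypothetical placeholders.
import os
import boto3

BUCKET = "example-data-lake"          # hypothetical bucket
PREFIX = "raw/claims/2020-06-01/"     # hypothetical key prefix
STAGING_DIR = "/tmp/hdfs_export"      # files already pulled out of HDFS

s3 = boto3.client("s3")

def upload_staged_files() -> None:
    """Upload every regular file in the staging directory under the S3 prefix."""
    for name in os.listdir(STAGING_DIR):
        local_path = os.path.join(STAGING_DIR, name)
        if os.path.isfile(local_path):
            s3.upload_file(local_path, BUCKET, PREFIX + name)

if __name__ == "__main__":
    upload_staged_files()
```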
Confidential, Denver, CO
Data Analyst/Data Modeler
Responsibilities:
- Analyzed the physical data model to understand the relationships between existing tables, and cleaned up unwanted tables and columns per requirements as part of the Data Analyst role.
- Established and maintained comprehensive data model documentation including detailed descriptions of business entities, attributes, and data relationships.
- Designed Star and Snowflake data models for the Enterprise Data Warehouse using ER/Studio.
- Worked on the Metadata Repository (MRM) to keep definitions and mapping rules up to standard.
- Trained a couple of colleagues on the Spotfire tool and gave guidance on creating Spotfire visualizations.
- Created DDL scripts to implement data modeling changes (a brief DDL sketch follows this section).
- Developed a data mart for the base data in Star Schema and Snowflake Schema, and was involved in developing the data warehouse for the database.
- Worked on Unit Testing for three reports and created SQL Test Scripts for each report as required
- Extensively used ER Studio as the main tool for modeling along with Visio and worked on Unit Testing for three reports and created SQL Test Scripts for each report as required
- Configured and developed triggers, workflows, and validation rules, and handled the deployment process from one sandbox to another.
- Managed Logical and PhysicalDataModels in ER Studio Repository based on the different subject area requests for integrated model.
- Created automatic field updates via workflows and triggers to satisfy internal compliance requirement of stamping certain data on a call during submission.
- Developed enhancements to MongoDB architecture to improve performance and scalability.
- Performed forward engineering of data models, reverse engineering of existing data models, and updates to the data models.
- Performed data cleaning and data manipulation using the NZSQL utility, and analyzed the project's architectural design and data flow step by step.
Environment: ER Studio, DDL, HTML, SQL, MRM, NZSQL
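Below is a minimal, generic sketch of the kind of star-schema DDL a data modeling change script might contain, with one dimension and one fact table. SQLite is used only as a runnable stand-in, and the table definitions are hypothetical rather than taken from the project's ER/Studio models.

```python
# Minimal sketch: apply star-schema DDL (one dimension, one fact table).
# SQLite is a stand-in engine; tables and columns are hypothetical placeholders.
import sqlite3

DDL_STATEMENTS = [
    """
    CREATE TABLE IF NOT EXISTS dim_customer (
        customer_key   INTEGER PRIMARY KEY,
        customer_id    TEXT NOT NULL,
        customer_name  TEXT,
        effective_date TEXT
    );
    """,
    """
    CREATE TABLE IF NOT EXISTS fact_sales (
        sales_key     INTEGER PRIMARY KEY,
        customer_key  INTEGER NOT NULL REFERENCES dim_customer(customer_key),
        date_key      INTEGER NOT NULL,
        sales_amount  REAL
    );
    """,
]

def apply_ddl(db_path="example_mart.db"):
    """Apply each DDL statement over a single connection."""
    with sqlite3.connect(db_path) as conn:
        for stmt in DDL_STATEMENTS:
            conn.execute(stmt)

if __name__ == "__main__":
    apply_ddl()
```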
Confidential
Data Analyst
Responsibilities:
- Worked with the business analysts to understand the project specification and helped them to complete the specification.
- Worked in Data Analysis, data profiling and data governance identifying Data Sets, Source Data, Source Metadata, Data Definitions and Data Formats.
- Used MS Access, MS Excel, Pivot tables and charts, MS PowerPoint, MS Outlook, MS Communicator and User Base to perform responsibilities.
- Extracted data using SSIS from DB2, XML, Excel, and flat files, performed transformations, and populated the data warehouse.
- Wrote Teradata SQL queries and created tables and views following Teradata best practices.
- Prepared Business Requirement Documentation and Functional Documentation.
- Primarily responsible for coordinating between project sponsor and stake holders.
- Conducted JAD sessions with different stakeholders such as editorial and design teams.
- Extensive SQL experience in querying, data extraction, and data transformation.
- Experienced in developing business reports by writing complex SQL queries using views, macros, volatile and global temporary tables.
- Developed numerous reports to capture the transactional data for the business analysis.
- Collaborated with a team of Business Analysts to ascertain capture of all requirements.
- Involved in writing complex SQL queries using correlated sub queries, joins, and recursive queries.
- Designed reports in Access and Excel using advanced functions including, but not limited to, pivot tables and formulas.
- Used SQL and PL/SQL to validate the data going into the data warehouse.
- Wrote complex SQL and PL/SQL testing scripts for backend testing of the data warehouse application; expert in writing complex SQL/PL/SQL scripts to query Teradata and Oracle (a brief sketch of such a validation check follows this section).
- Extensively tested the Business Objects report by running the SQL queries on the database by reviewing the report requirement documentation.
- Implemented the Data Cleansing using various transformations.
- Used DataStage Director for running jobs and monitoring performance statistics.
- Designed and implemented basic scripts for testing and report/data validation.
- Ensured the compliance of the extracts to the Data Quality Center initiatives.
- Wrote multiple SQL queries to analyze the data and presented the results using Excel, Access, and Crystal reports.
- Gathered and documented the Audit trail and traceability of extracted information for data quality.
Environment: MS PowerPoint, MS Outlook, DB2, XML, SSIS, SQL, PL/SQL
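The following is a minimal sketch of a backend validation check of the kind described in this role, issued as SQL from Python. The DSN, schema, tables, and the specific rule (a correlated subquery comparing each order against its customer's average) are hypothetical placeholders.

```python
# Minimal sketch: run a data-validation query against the warehouse via ODBC.
# DSN, schema, tables, and the threshold rule are hypothetical placeholders.
import pyodbc

CONN_STR = "DSN=example_dw;UID=qa_user;PWD=***"  # hypothetical DSN

# Correlated subquery: flag orders whose amount is far above the average
# amount for the same customer (a simple cross-row consistency check).
VALIDATION_SQL = """
    SELECT o.order_id, o.customer_id, o.order_amount
    FROM   dw.orders o
    WHERE  o.order_amount > 10 * (
             SELECT AVG(i.order_amount)
             FROM   dw.orders i
             WHERE  i.customer_id = o.customer_id
           )
"""

def run_validation():
    """Return the number of suspicious rows and print a few for review."""
    with pyodbc.connect(CONN_STR) as conn:
        rows = conn.cursor().execute(VALIDATION_SQL).fetchall()
    for row in rows[:10]:
        print("Suspicious order:", tuple(row))
    return len(rows)

if __name__ == "__main__":
    print("Rows failing the check:", run_validation())
```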