Sr Data Engineer Resume
Newport Beach, CA
SUMMARY
- 7+ years of IT experience in analysis, design, development, implementation, maintenance, and support, with experience in developing strategic methods for deploying big data technologies to efficiently solve Big Data processing requirements.
- Strong experience in the Hadoop Distributed File System (HDFS), Impala, Sqoop, Kafka, Hive, Spark, Hue, the MapReduce framework, SSIS, SSRS, ZooKeeper, and Pig.
- Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
- Well versed with big data on AWS cloud services, i.e., EC2, S3, Glue, Athena, DynamoDB, and Redshift.
- Experience in migrating other databases to Snowflake.
- Expertise in client-side and server-side scripting languages such as HTML and JavaScript.
- Experience in data analysis using Hive, Pig Latin, HBase, and custom MapReduce programs in Java.
- Extensive experience working with Informatica PowerCenter, SSIS, and SSAS.
- Experienced in processing large datasets with Spark using Python.
- Solid experience with Spark, Scala, HBase, and Kafka.
- Experienced in writing SQL queries, stored procedures, functions, packages, tables, views, and triggers using relational databases like Oracle, MySQL, and MS SQL Server.
- Experienced with Docker and Kubernetes on multiple cloud providers, from helping developers build and containerize their applications (CI/CD) to deploying on public or private clouds.
- Expertise in Java/J2EE, Oracle, and MySQL technologies. Good exposure to planning and executing all phases of the SDLC.
- Expertise in preparing interactive data visualizations from different sources using Tableau.
- Used Spark Streaming to collect data from Kafka in near real time, perform the necessary transformations and aggregations on the fly to build the common learner data model, and persist the data in a Cassandra cluster.
- Responsible for importing data into HDFS from different RDBMS servers using Sqoop and exporting aggregated data back to the RDBMS servers for other ETL operations.
- Implemented a Hadoop backup strategy to back up Hive, HDFS, HBase, Oozie, etc.
- Experienced with version control systems like Git and GitHub to keep code versions and configurations organized.
- Spark-for-ETL follower, Databricks enthusiast, and cloud adoption and data engineering enthusiast in the open-source community.
- Experience working with a number of public and private cloud platforms, such as Microsoft Azure.
- Experience analyzing data using Python, R, SQL, Microsoft Excel, Hive, and PySpark for data mining, data cleansing, data munging, and machine learning (see the PySpark sketch after this list).
- Hands-on experience handling database issues and connections with SQL and NoSQL databases like MongoDB by installing and configuring various packages in Python.
- Skilled in designing and implementing ETL architecture for a cost-effective and efficient environment.
- Created AWS Lambda functions, provisioned EC2 instances in the AWS environment, implemented security groups, and administered Amazon VPCs.
- Extensively worked on HDFS, Hive, Oozie, and Java.
- Well versed with Agile (Scrum) and the Waterfall model.
- Extensive shell/Python scripting experience for scheduling and process automation.
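For illustration, a minimal PySpark sketch of the kind of batch cleansing and aggregation work described above; the file paths, column names, and output location are hypothetical placeholders rather than details from any actual engagement.

```python
# Minimal sketch of PySpark batch cleansing and aggregation.
# All paths and column names below are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("batch-cleansing-example").getOrCreate()

# Read a raw CSV extract (hypothetical path), inferring the schema.
raw = (spark.read
       .option("header", "true")
       .option("inferSchema", "true")
       .csv("/data/raw/transactions.csv"))

# Basic cleansing: drop fully null rows, trim string keys, de-duplicate.
clean = (raw.dropna(how="all")
            .withColumn("customer_id", F.trim(F.col("customer_id")))
            .dropDuplicates(["transaction_id"]))

# Simple aggregation for downstream reporting.
daily_totals = (clean.groupBy("transaction_date")
                     .agg(F.sum("amount").alias("total_amount"),
                          F.countDistinct("customer_id").alias("unique_customers")))

# Persist as Parquet for later consumption (hypothetical location).
daily_totals.write.mode("overwrite").parquet("/data/curated/daily_totals")
```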
TECHNICAL SKILLS
Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, ZooKeeper, Spark 2.0, Ambari, Mahout, MongoDB, Cassandra, Avro, Storm, Parquet and Snappy.
Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR and Apache
Languages: Java, Python, JRuby, SQL, HTML, DHTML, Scala, JavaScript, XML and C/C++
No SQL Databases: Cassandra, MongoDB and HBase
Java Technologies: Servlets, JavaBeans, JSP, JDBC, JNDI, EJB and Struts
XML Technologies: XML, XSD, DTD, JAXP (SAX, DOM), JAXB
Development Methodology: Agile, waterfall
Development /Build Tools: Eclipse, Ant, Maven, IntelliJ, JUNIT and log4J
Frameworks: Struts, Spring and Hibernate
App/Web servers: WebSphere, WebLogic, JBoss and Tomcat
DB Languages: MySQL, PL/SQL, PostgreSQL and Oracle
Cloud Technologies: AWS, Azure, Snowflake, GCP, Databricks
PROFESSIONAL EXPERIENCE
Confidential - Newport Beach, CA
Sr DATA ENGINEER
Responsibilities:
- Designed and automated custom-built input connectors using Spark, Sqoop, and Oozie to ingest and analyze data from RDBMS sources into Azure Data Lake.
- Developed Informatica mappings, sessions, and workflows to load transformed data into the EDW from various source systems such as SQL Server and flat files.
- Used Azure Data Factory with the SQL API and MongoDB API to integrate data from MongoDB, MS SQL, and the cloud (Blob Storage, Azure SQL DB).
- Experience configuring, designing, implementing, and monitoring Kafka clusters and connectors.
- Broad experience working with SQL, with deep knowledge of T-SQL (MS SQL Server).
- Developed automated regression scripts in Python to validate ETL processes across multiple databases such as AWS Redshift, Oracle, MongoDB, and SQL Server (T-SQL).
- Created Python scripts to read CSV, JSON, and Parquet files from S3 buckets and load them into AWS S3, DynamoDB, and Snowflake.
- Created Spark clusters and configured high-concurrency clusters using Azure Databricks to speed up the preparation of high-quality data.
- Developed solutions leveraging ETL tools and identified opportunities for process improvements using Informatica and Python.
- Defined and deployed monitoring, metrics, and logging systems on AWS.
- Authored Python (PySpark) scripts with custom UDFs for row/column manipulations, merges, aggregations, stacking, data labelling, and all cleaning and conforming tasks.
- Worked on integration testing and big data integration and analytics based on Hadoop, Solr, Spark, Kafka, Storm, and webMethods.
- Extensive experience creating pipeline jobs and schedule triggers using Azure Data Factory.
- Played a crucial role in migrating legacy applications to Google Cloud Platform (GCP) and was responsible for deploying artifacts to GCP. Worked with Stackdriver Monitoring in GCP to check and monitor alerts for applications running on the platform.
- Developed NiFi workflows to pick up data from REST API servers, the data lake, and SFTP servers and send it to the Kafka broker.
- Used Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS and NoSQL databases such as HBase and Cassandra using Python (see the streaming sketch after this list).
- Developed a general-purpose Change Data Capture (CDC) process based on audit table for a standard incremental ETL process.
- Designed and developed services to persist and read data from Hadoop, HDFS, and Hive, and wrote Java-based MapReduce batch jobs using the Hortonworks Hadoop Data Platform.
- Mapped the data types between the source and target tables and reached out to the Data Architecture team for any differences or issues.
- Responsible for creating user profile and other unstructured data storage using Java and MongoDB.
- Created Azure SQL databases, performed monitoring and restoring of Azure SQL databases, and performed migration of Microsoft SQL Server to Azure SQL Database.
- Developed various automated scripts for DI (data ingestion) and DL (data loading) using Java MapReduce.
- Created new mappings and modified existing mappings/workflows (XML and DB targets), processed defect fixes, and tested and validated data.
- Designed, developed, and implemented complex SSIS packages, asynchronous ETL processing, Ad hoc reporting, and SSRS report server, and data mining in SSAS.
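For illustration, a hedged sketch of the Kafka-to-HDFS streaming pattern referenced above, written against Spark Structured Streaming rather than the older DStream API; the broker address, topic, message schema, and paths are assumed placeholders, and the HBase/Cassandra sinks are omitted.

```python
# Sketch of reading a Kafka topic with Structured Streaming and landing it in HDFS.
# Requires the spark-sql-kafka connector on the classpath; names below are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("kafka-to-hdfs-example").getOrCreate()

# Assumed JSON payload schema for the Kafka messages.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_time", TimestampType()),
    StructField("amount", DoubleType()),
])

# Subscribe to a Kafka topic (placeholder broker and topic names).
stream = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "events")
          .load())

# Kafka delivers the payload as bytes; parse the JSON value into columns.
events = (stream
          .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
          .select("e.*"))

# Continuously append the parsed events to HDFS as Parquet.
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/streaming/events")
         .option("checkpointLocation", "hdfs:///checkpoints/events")
         .outputMode("append")
         .start())

query.awaitTermination()
```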
Environment: Python, SSIS, SSRS, SSAS, PySpark, Git, Azure, Java, Databricks, Snowflake, Informatica, Hive, S3, Kafka, GCP, SQL Server, JavaScript, Shell Scripting, HBase, Cassandra.
Confidential - Pittsburgh, PA
Sr. DATA ENGINEER ANALYST
Responsibilities:
- Built a mechanism for automatically moving existing proprietary binary-format data files to HDFS using a service called the Ingestion Service.
- Performed data transformations in Hive and used partitions and buckets for performance improvements.
- Experience moving data between DCP and Azure using Azure Data Factory.
- Experienced in using the Spark application master to monitor Spark jobs and capture their logs.
- Implemented Spark using PySpark and Spark SQL for faster testing and processing of data.
- Hadoop developer with hands-on experience with major components of the Hadoop ecosystem such as MapReduce, HDFS, Hive, Pig, HBase, ZooKeeper, Oozie, and Flume.
- Strong experience in Core Java, Scala, SQL, PL/SQL, and RESTful web services.
- Hands on Experience in AWS EC2, S3, Redshift, EMR, RDS.
- Developed a data pipeline for data processing using Kafka-Spark API.
- Good exposure with Agile software development process.
- Good knowledge in using cloud shell for various tasks and deploying services.
- Expertise in designing and deploying Hadoop clusters and various big data analytics tools, including Pig, Hive, Sqoop, and Apache Spark, with the Cloudera distribution.
- Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process the data using the SQL activity.
- Wrote transformations in PySpark and Spark SQL to derive new datasets (see the sketch after this list).
- Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
- Extracted all the Transmit reporting fields from very large JSON files stored in MongoDB by applying various filters.
- Developed a PySpark application to create Payfone reporting tables with different maskings in both Hive and MySQL and made them available to the newly built fetch APIs.
- Tested and troubleshot application issues.
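For illustration, a minimal sketch of deriving a new dataset with PySpark and Spark SQL as referenced above; the Hive database, table, and column names are invented placeholders.

```python
# Sketch of deriving a summary dataset with Spark SQL over a Hive table.
# Table and column names are hypothetical.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("spark-sql-derivation-example")
         .enableHiveSupport()
         .getOrCreate())

# Load a source Hive table and expose it to Spark SQL.
orders = spark.table("staging.orders")
orders.createOrReplaceTempView("orders")

# Derive a customer-level summary with plain Spark SQL.
order_summary = spark.sql("""
    SELECT customer_id,
           COUNT(*)         AS order_count,
           SUM(order_total) AS lifetime_value
    FROM orders
    WHERE order_status = 'COMPLETED'
    GROUP BY customer_id
""")

# Write the derived dataset back to Hive for downstream consumers.
order_summary.write.mode("overwrite").saveAsTable("curated.order_summary")
```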
Environment: Python, PySpark, Spark, Spark SQL, Hive, SQL, T-SQL, SSIS, SSRS, MS SQL Server, MySQL, Oracle BI, Java, JavaScript, Git, JIRA, MS Excel, MS Access, Tableau, Sqoop, PyCharm, AWS, EMR, Azure, NoSQL, Kafka, HDFS, Jenkins, Shell Scripting, Agile.
Confidential - Nashville, TN
DATA ENGINEER
Responsibilities:
- Implemented reporting Data Warehouse with online transaction system data.
- Extensively used Apache Spark features such as RDD operations (mapping, merging, combining, aggregating, and vectorizing data) and DataFrames and Datasets for transformation, data enrichment, data storage operations, descriptive statistics, and aggregation (see the sketch after this list).
- Tuned ETL jobs in the new environment after fully understanding the existing code.
- Developed the data format file required by the model to perform analytics, using Spark SQL and Hive Query Language.
- Analyzed stored procedures to convert business logic into Hadoop jobs.
- Worked on big data on AWS cloud services, i.e., EC2, S3, EMR, and DynamoDB.
- Created Hive tables using partitions for optimal usage.
- Experience in writing MapReduce jobs using Java with Eclipse.
- Analyzed the SQL scripts and redesigned them using PySpark SQL for faster performance.
- Designed and automated custom-built input connectors using Spark, Sqoop, and Oozie to ingest data from RDBMS sources into Azure Data Lake.
- Carried out data transformation and cleansing using SQL queries, Python, and PySpark.
- Was responsible for ETL and data validation using SQL Server Integration Services (SSIS).
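For illustration, a rough sketch of the DataFrame-based enrichment, descriptive statistics, and aggregation referenced above; the input paths and column names are assumed placeholders.

```python
# Sketch of DataFrame enrichment, descriptive statistics, and aggregation.
# Paths and column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dataframe-enrichment-example").getOrCreate()

# Load two hypothetical extracts and enrich transactions with customer attributes.
transactions = spark.read.parquet("/data/raw/transactions")
customers = spark.read.parquet("/data/raw/customers")
enriched = transactions.join(customers, on="customer_id", how="left")

# Descriptive statistics over the numeric measure.
enriched.select("amount").describe().show()

# Aggregate by customer segment for the reporting warehouse.
segment_totals = (enriched.groupBy("segment")
                  .agg(F.avg("amount").alias("avg_amount"),
                       F.sum("amount").alias("total_amount")))

segment_totals.write.mode("overwrite").parquet("/data/curated/segment_totals")
```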
Environment: SQL Server, ETL, PySpark, Databricks, Azure, Spark, AWS, DynamoDB.
Confidential
DATABASE DEVELOPER
Responsibilities:
- Designed and developed mappings, mapplets, sessions, and workflows to load data for Ultimatix projects from source to target databases using Informatica PowerCenter.
- Used Business Objects to create reports based on SQL queries. Generated executive dashboard reports with the latest company financial data by business unit and by product.
- Created PL/SQL scripts for the ETL conversion/migration of data from other systems (Oracle, XML, and flat files) into Oracle database tables for data warehousing and BI purposes. Used SQL*Loader, external tables, and UTL_FILE to load flat file and XML data.
- Implemented Teradata RDBMS analysis with Business Objects to develop reports, interactive drill charts, balanced scorecards and dynamic Dashboards.
- Developed PL/SQL procedures, functions, and packages and used SQL*Loader to load data into the database.
- Designed and developed Informatica mappings to load data using Source Analyzer, Warehouse Designer, Mapping/Mapplet Designer, and Transformation Designer.
Environment: ETL, SQL, data warehousing, Oracle 11g, PL/SQL, XML.