
Sr. Data Engineer Resume

Schenectady, NY

SUMMARY

  • 8+ years of experience in Information Technology, including Big Data, the Hadoop ecosystem, and Core Java/J2EE, with strengths in design, software processes, requirements gathering, analysis, and development of software applications.
  • Excellent hands-on experience developing Hadoop architectures on Windows and Linux platforms.
  • Experience building big data solutions using the Lambda Architecture on the Cloudera distribution of Hadoop with MapReduce, Cascading, Hive, Pig, and Sqoop.
  • Experience using Microsoft Azure SQL Database, Data Lake, Azure ML, Azure Data Factory, Functions, Databricks, and HDInsight.
  • Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
  • Strong development experience in Java/JDK 7, JEE 6, Maven, Jenkins, Jersey, Servlets, JSP, Struts, Spring, Hibernate, JDBC, JavaBeans, JMS, JNDI, XML, XML Schema, Web Services, SOAP, JUnit, ANT, and Log4j.
  • Experienced in J2EE design patterns such as MVC, Business Delegate, Service Locator, Singleton, Transfer Object, Session Façade, and Data Access Object.
  • Worked on Hadoop, Hive, Java, Python, Scala, and the Struts web framework.
  • Excellent working experience in big data integration and analytics based on Hadoop, Solr, Spark, Kafka, Storm, and webMethods technologies.
  • Developed Python code to gather data from HBase and designed the solution for implementation using PySpark.
  • Experienced in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Worked on Google Cloud Platform (GCP) services such as the Vision API and compute instances.
  • Hands-on experience working with NoSQL databases including HBase, MongoDB, and Cassandra, and their integration with Hadoop clusters.
  • Hands-on experience writing ad-hoc queries to move data from HDFS to Hive and analyzing the data using HiveQL (see the sketch after this list).
  • Good understanding of cloud-based technologies such as GCP and AWS.
  • Hands-on experience with Snowflake and GCP.
  • Good knowledge of RDBMS concepts (Oracle 11g, MS SQL Server 2000) and strong SQL and PL/SQL query-writing skills (using TOAD and SQL Developer), including stored procedures and triggers.
  • Expertise in Amazon Web Services, including Elastic Compute Cloud (EC2) and DynamoDB.
  • Expertise in automating deployment of large Cassandra clusters on EC2 using the EC2 APIs.
  • Experienced in developing and using Apache Solr with data computation and transformation for use by downstream online applications.
  • Good understanding of and experience with software development methodologies such as Agile and Waterfall.
  • Experienced in importing and exporting data using Sqoop from HDFS (Hive and HBase) to relational database systems (Oracle and Teradata) and vice versa.
  • Experienced in designing and developing web services (SOAP and RESTful).
  • Expertise in various Java/J2EE technologies such as JSP, Servlets, Hibernate, Struts, and Spring.
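
Below is a minimal PySpark sketch of the ad-hoc HiveQL pattern noted above: raw HDFS files are exposed as an external table and copied into a managed ORC table for analysis. The database, table, and path names are hypothetical, not taken from the original work.

```python
# Minimal sketch only: database, table, and HDFS path names are hypothetical.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("adhoc-hdfs-to-hive")
         .enableHiveSupport()   # lets spark.sql() run HiveQL against the metastore
         .getOrCreate())

# Expose raw delimited files already sitting in HDFS as an external table.
spark.sql("CREATE DATABASE IF NOT EXISTS staging")
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS staging.raw_events (
        event_id STRING, event_ts STRING, payload STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION 'hdfs:///data/raw/events/'
""")

# Ad-hoc HiveQL: copy the staged rows into a managed ORC table and analyze them.
spark.sql("CREATE DATABASE IF NOT EXISTS analytics")
spark.sql("""
    CREATE TABLE IF NOT EXISTS analytics.events STORED AS ORC
    AS SELECT * FROM staging.raw_events
""")
spark.sql("SELECT event_id, COUNT(*) AS cnt FROM analytics.events GROUP BY event_id").show()
```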

PROFESSIONAL EXPERIENCE

Confidential, Schenectady, NY

Sr. Data Engineer

Responsibilities:

  • Worked on analyzing the Hadoop cluster using different big data analytics tools, including Kafka, Pig, Hive, and MapReduce.
  • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics); ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
  • Proactively monitored systems and services; handled architecture design and implementation of the Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Scala.
  • Installed and configured Hadoop, MapReduce, and HDFS (Hadoop Distributed File System); developed multiple MapReduce jobs in Java for data cleaning and processing.
  • Built a program with Python and the Apache Beam API and executed it on Cloud Dataflow to run data validation between raw source files and BigQuery tables (see the Beam sketch after this list).
  • Designed and configured Flume servers to collect data from the network proxy servers and store it in HDFS and HBase.
  • Experience in moving data between GCP (Google Cloud Platform) and Azure using Azure Data Factory.
  • Processed and loaded bounded and unbounded data from Google Pub/Sub topics to BigQuery using Cloud Dataflow with Python.
  • Worked on implementing Spark using Scala and Spark SQL for faster analysis and processing of data.
  • Used Java and MySQL day to day to debug and fix issues with client processes.
  • Used Java/J2EE application development skills with object-oriented analysis and was extensively involved throughout the Software Development Life Cycle (SDLC).
  • Implemented AWS EC2, key pairs, security groups, Auto Scaling, ELB, SQS, and SNS using the AWS APIs and exposed them as RESTful web services.
  • Monitored Azkaban jobs on-premises (Hortonworks distribution) and on GCP (Google Cloud Platform).
  • Involved in launching and setting up the Hadoop/HBase cluster, which included configuring different components of the Hadoop and HBase cluster.
  • Hands-on experience with WebLogic Application Server, WebSphere Application Server, WebSphere Portal Server, and J2EE application deployment technology.
  • Handled importing and exporting data into HDFS and Hive using Sqoop and Kafka.
  • Used the Kafka consumer API in Scala for consuming data from Kafka topics.
  • Involved in creating Hive tables, loading the data, and writing Hive queries, which run internally as MapReduce jobs.
  • Applied MapReduce framework jobs in Java for data processing after installing and configuring Hadoop and HDFS.
  • Developed Spark applications in Python (PySpark) on a distributed environment to load a huge number of CSV files with different schemas into Hive ORC tables (see the PySpark sketch after this list).
  • Worked on reading and writing multiple data formats such as JSON, ORC, and Parquet on HDFS using PySpark.
  • Involved in HDFS maintenance and worked with its web UI and the Hadoop Java API.
  • Implemented reporting and notification services using the AWS APIs and used AWS (Amazon Web Services) compute servers extensively.
  • Worked on designing and developing ETL workflows in Java for processing data in HDFS/HBase using Oozie.
  • Worked on downloading BigQuery data into pandas or Spark data frames for advanced ETL capabilities.
  • Worked with Google Data Catalog and other Google Cloud APIs for monitoring, query, and billing-related analysis of BigQuery usage.
  • Wrote complex Hive queries and UDFs.
  • Created an ETL framework using Azure Databricks.
  • Created snapshots of EBS volumes, monitored AWS EC2 instances using CloudWatch, and worked on AWS security groups and their rules.
  • Created an ETL pipeline using Azure Data Factory.
  • Involved in developing shell scripts to ease execution of all other scripts (Pig, Hive, and MapReduce) and move the data files within and outside of HDFS.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
  • Worked with NoSQL databases such as HBase, creating tables to load large sets of semi-structured data.
  • Built Java APIs for retrieval and analysis on NoSQL databases such as HBase.
  • Worked on loading data from the UNIX file system to HDFS.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
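
The Dataflow validation bullet above can be illustrated with a hedged Apache Beam sketch: count the rows in a raw GCS file and in the BigQuery table it was loaded into, then compare the totals. Project, bucket, region, and table names are placeholders, not values from the original engagement.

```python
# Sketch only: project, bucket, region, and table names are placeholders.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",          # use "DirectRunner" for local testing
    project="my-gcp-project",
    region="us-central1",
    temp_location="gs://my-bucket/tmp",
)

with beam.Pipeline(options=options) as p:
    raw_count = (
        p
        | "ReadRawFile" >> beam.io.ReadFromText("gs://my-bucket/raw/events.csv",
                                                skip_header_lines=1)
        | "CountRaw" >> beam.combiners.Count.Globally())

    bq_count = (
        p
        | "ReadBigQuery" >> beam.io.ReadFromBigQuery(
            query="SELECT event_id FROM `my-gcp-project.analytics.events`",
            use_standard_sql=True)
        | "CountBQ" >> beam.combiners.Count.Globally())

    # Pair the two singleton counts and flag any mismatch.
    (raw_count
     | "Compare" >> beam.Map(
         lambda raw, bq: {"raw_rows": raw, "bq_rows": bq, "match": raw == bq},
         bq=beam.pvalue.AsSingleton(bq_count))
     | "Report" >> beam.Map(print))
```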
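
And a minimal PySpark sketch of the CSV-to-Hive-ORC bullet: each landing directory is read with its own inferred schema and written to its own ORC-backed Hive table. Feed names, paths, and the target database are illustrative.

```python
# Sketch only: feed names, HDFS paths, and the target database are illustrative.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("csv-to-hive-orc")
         .enableHiveSupport()
         .getOrCreate())

# Each landing directory carries a different schema, so infer it per feed.
feeds = {
    "orders":    "hdfs:///landing/orders/*.csv",
    "customers": "hdfs:///landing/customers/*.csv",
}

for table, path in feeds.items():
    df = (spark.read
          .option("header", "true")        # first row holds the column names
          .option("inferSchema", "true")   # let each feed keep its own types
          .csv(path))
    (df.write
       .mode("overwrite")
       .format("orc")                      # Hive table stored as ORC
       .saveAsTable(f"analytics.{table}"))
```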

Confidential, Dallas, TX

Data Engineer

Responsibilities:

  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Created Spark clusters and configured high-concurrency clusters using Azure Databricks to speed up the preparation of high-quality data.
  • Created data pipelines using Azure Data Factory.
  • Designed a stream-processing job using Spark Streaming, coded in Scala.
  • Migrated an entire Oracle database to BigQuery and used Power BI for reporting (see the sketch after this list).
  • As a big data developer, implemented solutions for ingesting data from various sources and processing the data at rest using big data technologies such as Hadoop, the MapReduce framework, and MongoDB.
  • Created Python scripts to ingest data from on-premises systems to GCS and built data pipelines using the Apache Beam API and Dataflow for data transformation from GCS to BigQuery.
  • Developed a job server (REST API, Spring Boot, Oracle DB) and job shell for job submission, job profile storage, and job data (HDFS) query/monitoring.
  • Developed PySpark and Spark SQL code to process the data in Apache Spark on Amazon EMR and perform the necessary transformations based on the STMs developed.
  • Used Spark SQL with Scala to create data frames and performed transformations on them.
  • Created ETL scripts to retrieve data feeds and page metrics from Google Analytics services.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, Data Frames, pair RDDs, and Spark on YARN.
  • Deployed the application to AWS and monitored the load balancing of different EC2 instances.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted the data from SQL into HDFS using Sqoop.
  • Developed a POC for project migration from the on-premises Hadoop MapR system to GCP (Google Cloud Platform).
  • Worked on implementing the Spark Framework, a Java-based web framework.
  • Worked on big data integration and analytics based on Hadoop, Solr, Spark, Kafka, Storm, and webMethods.
  • Worked extensively on Python, built the custom ingest framework, and worked on REST APIs using Python.
  • Imported the data from different sources such as HDFS/HBase into Spark RDDs.
  • Worked on AWS Relational Database Service, AWS security groups and their rules, and implemented reporting and notification services using the AWS APIs.
  • Analyzed the SQL scripts and designed the solution to implement them using PySpark.
  • Implemented AWS EC2, key pairs, security groups, Auto Scaling, ELB, SQS, and SNS using the AWS APIs and exposed them as RESTful web services.
  • Implemented the solution using Scala and SQL for faster testing and processing of data, with real-time streaming of the data using Kafka.
  • Developed and designed an automation framework using Python and shell scripting.
  • Involved in writing a Java API for AWS Lambda to manage some of the AWS services.
  • Loaded the data into Spark RDDs and performed in-memory data computation to generate the output response.
  • Developed and wrote Apache Pig scripts and Hive scripts to process the HDFS data.
  • Used Hive to find correlations between customers' browser logs across different sites and analyzed them to build risk profiles for those sites.
  • Used Agile Scrum methodology to help manage and organize a team of four developers with regular code review sessions.
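
The Oracle-to-BigQuery migration bullet above can be sketched for a single table, assuming SQLAlchemy/cx_Oracle on the source side and the google-cloud-bigquery client on the target side; connection details and table names are placeholders, and a real migration would iterate over tables and partitions and chunk large tables.

```python
# Sketch only: connection string, project, dataset, and table names are placeholders.
import pandas as pd
import sqlalchemy
from google.cloud import bigquery

# Source: Oracle via SQLAlchemy + cx_Oracle.
engine = sqlalchemy.create_engine(
    "oracle+cx_oracle://user:password@db-host:1521/?service_name=ORCL")

# Target: BigQuery.
bq = bigquery.Client(project="my-gcp-project")

# Pull one table into pandas and load it into BigQuery.
df = pd.read_sql("SELECT * FROM sales.orders", engine)
job = bq.load_table_from_dataframe(
    df,
    "my-gcp-project.analytics.orders",
    job_config=bigquery.LoadJobConfig(write_disposition="WRITE_TRUNCATE"),
)
job.result()   # block until the load job finishes
```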

Confidential

Big Data Developer

Responsibilities:

  • Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase.
  • Involved in writing client-side scripts using JavaScript and server-side scripts using JavaBeans, and used servlets for handling the business logic.
  • Developed Scala programs with Spark for data in the Hadoop ecosystem.
  • Developed user-based web services (SOAP) through WSDL using WebLogic Application Server and JAXB as the binding framework to interact with other components.
  • Developed Spark applications using Scala for easy Hadoop transitions.
  • Implemented applications with Scala along with Akka and the Play framework.
  • Managed and reviewed Hadoop log files as part of administration for troubleshooting purposes, and communicated and escalated issues appropriately.
  • Provisioned EC2 instances on both Windows and Linux and worked on AWS Relational Database Service, AWS security groups, and their rules.
  • Implemented reporting and notification services using the AWS APIs.
  • Developed MapReduce jobs using Apache Commons components.
  • Used Service-Oriented Architecture (SOA) based SOAP and REST web services (JAX-RS) for integration with other systems.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Involved in designing and developing the application using JSTL, JSP, JavaScript, AJAX, HTML, CSS, and Java collections.
  • Implemented AWS EC2, key pairs, security groups, Auto Scaling, ELB, SQS, and SNS using the AWS APIs and exposed them as RESTful web services.
  • Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
  • Developed UDFs in Java as needed for use in Pig and Hive queries (see the sketch after this list).
  • Coordinated with various stakeholders such as the end client, DBA teams, the testing team, and business analysts.
  • Developed Java web applications using JSP and Servlets, Struts, Hibernate, Spring, REST web services, and SOAP.
  • Involved in gathering requirements and developing a project plan.
  • Involved in UI design, coding, and database handling.
  • Involved in unit testing and bug fixing.
  • Worked across the entire Software Development Life Cycle (SDLC) as part of a team as well as independently.
  • Wrote SQL queries against the database and provided data extracts to users on request.
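
The UDFs in the bullet above were written in Java; purely as an illustration of the same idea, here is a minimal PySpark sketch that registers a Python function so HiveQL run through Spark can call it. The function, table, and column names are hypothetical.

```python
# Illustration only: the original UDFs were Java; names here are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

spark = (SparkSession.builder
         .appName("udf-demo")
         .enableHiveSupport()
         .getOrCreate())

def normalize_zip(value):
    """Pad US ZIP codes back to five digits after loads strip leading zeros."""
    return None if value is None else str(value).strip().zfill(5)

# Register the function so it can be called from HiveQL run through Spark.
spark.udf.register("normalize_zip", normalize_zip, StringType())

spark.sql("""
    SELECT customer_id, normalize_zip(zip_code) AS zip_code
    FROM analytics.customers
""").show()
```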

Confidential

Java/Scala Developer

Responsibilities:

  • Developed the web tier using the Spring MVC framework.
  • Performed database operations on the consumer portal using the Spring JDBC template.
  • Implemented design patterns in Scala for the application.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (see the sketch after this list).
  • Implemented RESTful services in Spring.
  • Serialized and deserialized objects using the Play JSON library.
  • Developed traits, case classes, and related constructs in Scala.
  • Developed quality code adhering to Scala coding standards and best practices.
  • Wrote complex SQL queries.
  • Developed the GUI using jQuery, JSON, and JavaScript.
  • Performed unit testing, integration testing, and bug fixing.
  • Understood and analyzed client requirements to prepare the traceability matrix, test plans, test cases, and test reports that impacted the project deliverables.
  • Performed intensive testing with different test cases for a particular scenario to assure the quality of deliverables.
  • Identified different bugs and provided a detailed analysis of each bug, which helped the development team resolve bugs faster.
  • Performed various analyses to gain insights for data-driven decision-making on numerous automation projects, to identify feasibility and optimize business processes.
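
The Hive/SQL-to-Spark conversion noted above was done with Scala and RDDs; as a hedged illustration of the same idea, here is how a HiveQL aggregation might be rewritten as PySpark DataFrame transformations. The table and column names are hypothetical.

```python
# Illustration only: the original work used Scala/RDDs; names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("sql-to-spark")
         .enableHiveSupport()
         .getOrCreate())

# Original HiveQL:
#   SELECT site, COUNT(*) AS visits, AVG(duration_s) AS avg_duration
#   FROM logs.browser_events
#   GROUP BY site
#   HAVING COUNT(*) > 100;
events = spark.table("logs.browser_events")

summary = (events
           .groupBy("site")
           .agg(F.count("*").alias("visits"),
                F.avg("duration_s").alias("avg_duration"))
           .where(F.col("visits") > 100))

summary.show()
```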
