
Senior Data Engineer Resume

SUMMARY

  • IT Consultant with 7+ years of extensive experience in operations: developing, maintaining, monitoring, and upgrading Hadoop clusters (Hortonworks/MapR distributions).
  • Extensive domain knowledge in Healthcare and Telecom.
  • Good experience in translating clients' Big Data business requirements into Hadoop-centric solutions.
  • Hands-on experience in installing, configuring, and maintaining Apache Hadoop clusters for application development, along with Hadoop tools like Hive, Pig, Spark, Kafka, ZooKeeper, Hue, and Sqoop using Hortonworks.
  • Hands-on experience in developing and deploying enterprise applications using major components of the Hadoop ecosystem such as Hadoop 2.x, YARN, Hive, Pig, MapReduce, Spark, Kafka, Storm, Oozie, HBase, Flume, Sqoop, and ZooKeeper.
  • Experience in handling large datasets using partitioning, Spark in-memory capabilities, broadcast variables, and effective, efficient joins and transformations during the ingestion process itself.
  • Experience in converting Hive/SQL queries into Spark transformations using Java. Experience in ETL development using Kafka, Flume, and Sqoop.
  • Proficient with container systems like Docker and container orchestration like Kubernetes.
  • Used Jenkins pipelines to drive all microservice builds out to the Docker registry and then deploy to Kubernetes; created Pods and managed them using Kubernetes.
  • Used Kubernetes to orchestrate the deployment, scaling, and management of Docker containers.
  • Utilized Kubernetes as the runtime environment of the CI/CD system to build, test, and deploy.
  • Automated build and deployment using Jenkins to reduce human error and speed up production processes.
  • Managed GitHub repositories and permissions, including branching and tagging.
  • Expert in designing and developing Jenkins deployments.
  • Experience on Continuous Integration Jenkins, performed end to end automation for build and deployments.
  • Extensive experience in using Maven as a build tool for producing deployable artifacts (JAR, WAR, and EAR) from source code.
  • Good knowledge of using Artifactory repositories for Maven builds.
  • Used Kubernetes for automating application deployment, scaling, and management.
  • Hands-on experience developing and executing Shell, Perl, and Python scripts.
  • Good knowledge of Linux and UNIX administration.
  • Hands-on experience using bug-tracking tools like JIRA and HP Quality Center.
  • Experience in Amazon Web Services (AWS) and Microsoft Azure.
  • Built large - scale data processing pipelines and data storage platforms using open-source big data technologies.
  • Experience in installation, configuration, management and deployment of Big Data solutions and the underlying infrastructure of Hadoop Cluster.
  • Experience in installing and configuring Hive, its services, and the Metastore. Exposure to Hive Query Language and working with tables: importing data, altering, and dropping tables.
  • Experience in installing and running Pig, its execution types, Grunt, and Pig Latin editors. Good knowledge of loading, storing, and filtering data, as well as combining and splitting data.
  • Experience in tuning and debugging running Spark applications.
  • Experience integrating Kafka with Spark for real-time data processing.
  • Involved in all phases of the Software Development Life Cycle (SDLC) and worked on all activities related to the operations, implementation, administration, and support of ETL processes for large-scale data warehouses.
  • In-depth knowledge of database imports; worked with imported data to populate tables in Hive. Exposure to exporting data from relational databases to the Hadoop Distributed File System.
  • Experience in setting up the High-Availability Hadoop Clusters.
  • Good knowledge about planning a Hadoop cluster like choosing the distribution, hardware selection for both master as well as slave nodes and cluster sizing.
  • Experience in developing Shell Scripts for system management.
  • Experience in Hadoop administration with good knowledge about Hadoop features like safe mode, auditing.
  • Experience with Software Development Processes & Models: Agile, Waterfall & Scrum Model.
  • Good knowledge of sprint-planning tools like Rally and Jira, as well as GitHub version-control tooling.
  • Team player and a fast learner with good analytical and problem-solving skills.
  • Self-starter with the ability to work independently as well as in a team.
  • Experience in UNIX shell scripting and a good understanding of OOP and data structures.
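The Hive-style partitioning mentioned in the bullets above can be sketched with a plain shell script. This is illustrative only; the field layout, the date column position, and the staging directory are assumptions, not taken from any specific project.

```shell
#!/bin/sh
# Illustrative sketch: split a delimited feed into date-partitioned
# directories (staging/dt=YYYY-MM-DD), mirroring Hive-style partitioning.
# Field positions and the staging path are assumptions.
partition_by_date() {
    # $1 = input file with lines like: id,2021-01-15,payload
    awk -F',' '{
        dir = "staging/dt=" $2
        system("mkdir -p " dir)       # create the partition directory
        print >> (dir "/part.csv")    # append the record to its partition
    }' "$1"
}
```

A Hive external table pointed at `staging/` with a `dt` partition column could then pick these directories up directly.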

TECHNICAL SKILLS

Operating Systems: Win 95/98/NT/2000/XP, Windows 7, UNIX

CI Tools: Jenkins

Build Tools: Maven

Version Tools: Git

Project Management Tools/Methodology: MS Project, Unified Modeling Language (UML), Rational Unified Process (RUP), Software Development Life Cycle (SDLC), Agile (Scrum), Kanban

Process/Model Tools: Rational Rose, MS Visio, Rally, Jira

Hadoop/Big Data Technologies: HDFS, Spark, Scala, Hive, HBase, Pig, Sqoop, Flume, Java, Kafka, Gobblin, Kubernetes, Docker

Cloud Platforms: AWS, Azure

Languages: JDK 1.8, Java/J2EE with Web APIs, Spring Boot, Microservices, JDBC

Database: DB2, MS Access, Oracle 9i, HBase

Database Tools: SQL Developer

Testing Strategies: System Integration Testing, Regression and System Testing

Testing Tools: HP Quality Center, SONAR, Cucumber

Office Tools: MS Word, MS Excel, MS PowerPoint, MS Access, MS Project

PROFESSIONAL EXPERIENCE

Confidential

Senior Data Engineer

Responsibilities:

  • Experience in the Spark (using RDDs, DataFrames, and SQL) and Hadoop (using MapReduce) ecosystems with Scala as the underlying programming language.
  • Experienced in handling large datasets using partitioning, Spark in-memory capabilities, broadcast variables, and effective, efficient joins and transformations during the ingestion process itself.
  • Worked on Spark SQL and DataFrames for faster execution of Hive queries using Spark SQLContext.
  • Analyzed the client's existing Hadoop infrastructure, identified the performance bottlenecks, and provided performance tuning accordingly.
  • Used Jenkins pipelines to drive all microservice builds out to the Docker registry and then deploy to Kubernetes; created Pods and managed them using Kubernetes.
  • Used Kubernetes to orchestrate the deployment, scaling, and management of Docker containers.
  • Defined job flows in the Hadoop environment using tools like Oozie for data scrubbing and processing.
  • Strong knowledge in administration and development of Hive with HiveQL.
  • Used Hive to analyze data in HDFS to identify issues and behavioral patterns
  • Worked with Sqoop in importing and exporting data from different databases like MySql, Oracle into HDFS and Hive.
  • Effectively used Oozie to develop automatic workflows of Sqoop, Mapreduce and Hive jobs.
  • Scheduled jobs in Jenkins as per the deployments and set up workflows in the Jenkins pipeline.
  • Troubleshooting and monitoring the cluster.
  • Worked on Hive queries from the Hue environment.
  • Created Hive tables and was involved in data loading and writing Hive queries.
  • Monitored user jobs from the Resource Manager and optimized long-running jobs.
  • Handled data imports from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS.
  • Wrote scripts to automate processes such as taking periodic backups and setting up user batch jobs.
  • Deployed multi-module applications with build tools like Maven, integrated with continuous-integration servers like Jenkins.
  • Developed test cases using Cucumber and configured GIT for maintaining repository for the project, SONAR to pass code quality.
  • Used Kubernetes for automating application deployment, scaling, and management.
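The Sqoop imports described above are driven from the command line. A hedged sketch of how such a command might be assembled: the JDBC URL, table, and target directory below are placeholders, while `--connect`, `--table`, `--target-dir`, `--hive-import`, and `--num-mappers` are standard Sqoop import options.

```shell
#!/bin/sh
# Sketch only: assemble a Sqoop import command from parameters.
# Connection details are hypothetical placeholders.
build_sqoop_import() {
    # $1 = JDBC URL, $2 = source table, $3 = HDFS target directory
    echo "sqoop import --connect $1 --table $2 --target-dir $3 --hive-import --num-mappers 4"
}
```

Emitting the command as a string keeps the wrapper testable without a live cluster; a real deployment script would execute it directly.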

Environment: Hadoop, MapR, HDFS, MapReduce, Spark Core, Spark SQL, Scala, Hive, HBase, Sqoop, Kafka, JDK 1.8, Java/J2EE with Web APIs, Spring Boot, Microservices, JDBC, Cucumber, SONAR, Kubernetes, Maven 2.0, Azure.

Confidential

Hadoop /Spark Developer

Responsibilities:

  • Experience in deploying Hadoop clusters on public and private cloud environments like Amazon AWS, Rackspace, and OpenStack.
  • Defined automation roadmaps for the team. Worked with the teams to understand their needs and drove them towards continuous integration and delivery. Migrated over fifty applications to the DevOps standards, which included 100+ sub-applications.
  • Experience in installing Hadoop clusters using different distributions of Apache Hadoop, including Hortonworks.
  • Experience in the Spark (using RDDs, DataFrames, and SQL) and Hadoop (using MapReduce) ecosystems with Scala as the underlying programming language.
  • Experienced in handling large datasets using partitioning, Spark in-memory capabilities, broadcast variables, and effective, efficient joins and transformations during the ingestion process itself.
  • Worked on Spark SQL and DataFrames for faster execution of Hive queries using Spark SQLContext.
  • Good experience in understanding the client's Big Data business requirements and transforming them into Hadoop-centric technologies.
  • Analyzed the client's existing Hadoop infrastructure, identified the performance bottlenecks, and provided performance tuning accordingly.
  • Defined job flows in the Hadoop environment using tools like Oozie and UC4 for data scrubbing and processing.
  • Loading logs from multiple sources directly into HDFS using tools like Flume.
  • Good experience in performing minor and major upgrades.
  • In depth understanding of HDFS architecture and MapReduce framework.
  • Strong knowledge in administration and development of Hive and Pig, with HiveQL and Pig Latin scripts respectively.
  • Experience in writing Pig Latin scripts for advanced analytics on data for recommendations.
  • Used Hive and Pig to analyze data in HDFS to identify issues and behavioral patterns
  • Worked with Sqoop in importing and exporting data from different databases like MySql, Oracle into HDFS and Hive.
  • Experience in streaming data to HDFS using Flume.
  • Effectively used Oozie to develop automatic workflows of Sqoop, Mapreduce and Hive jobs.
  • Scheduled jobs in UC4 as per the deployments and set up workflows in UC4.
  • Troubleshooting and monitoring the cluster.
  • Worked on Hive queries from the Hue environment.
  • Created Hive tables and was involved in data loading and writing Hive queries.
  • Moved data between the clusters.
  • Worked on Disaster Recovery.
  • Monitored the user jobs from Resource manager and optimizing the long running jobs.
  • Worked on Toad for Oracle 11.6 for data ingestion.
  • Created Kafka topics, provided ACLs to users, and set up a REST mirror and MirrorMaker to transfer data between two Kafka clusters.
  • Helped users connect to Kerberized Hive from SQL Workbench and BI tools.
  • Migrated data across clusters using DistCp.
  • Scheduling Workflows through UC4 application.
  • Written scripts for disk monitoring and logs compression.
  • Expert in designing and developing Jenkins deployments.
  • Experience on Continuous Integration Jenkins, performed end to end automation for build and deployments.
  • Handled data imports from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS.
  • Wrote scripts to automate processes such as taking periodic backups and setting up user batch jobs.
  • Good understanding of Distributed Systems and Parallel Processing architecture.
  • Using Jenkins AWS Code Deploy plug-in to deploy to AWS.
  • Wrote UNIX shell scripts to fetch, parse, and load data from external sources.
  • Written wrapper scripts to automate the deployment.
  • Experience in developing microservices web applications using test-driven development (TDD) and JUnit as a testing framework.
  • Deployed multi-module applications with build tools like Maven, integrated with continuous-integration servers like Jenkins.
  • Developed test cases using JUnit and configured Git to maintain the repository for the project.
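The disk-monitoring scripts mentioned above can be sketched as a small filter over `df` output. The 80% threshold and the output format are assumptions for illustration.

```shell
#!/bin/sh
# Sketch of a disk-monitoring helper: read `df -P`-style output on stdin
# and print any mount point above the given usage threshold.
check_disk_usage() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)                        # strip the percent sign
        if (use + 0 > limit) print $6 " at " use "%"
    }'
}
# Typical invocation (not run here): df -P | check_disk_usage 80
```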

Environment: Hadoop 2.6.0, HDFS, MapReduce, Spark Core, Spark SQL, Scala, Pig 0.14, Hive 1.2.1, Sqoop 1.4.4, Flume 1.6.0, Kafka, Gobblin, Knox 0.6.0, Ambari 2.4.1, Storm 0.9.3, JDK 1.8, Java/J2EE with Web APIs, Microservices, JDBC, JUnit 4, Maven 2.0, Git, Git Bash

Confidential

Hadoop Developer

Responsibilities:

  • Involved in creating Hive tables, and loading and analyzing data using hive queries.
  • Experience in installing Hadoop clusters using different distributions of Apache Hadoop, including Hortonworks.
  • Developed and executed custom MapReduce programs, Pig Latin scripts, and HQL queries.
  • Implemented Java and J2EE Design patterns like Business Delegate and Data Transfer Object (DTO), Data Access Object and Service Locator.
  • Worked on importing the data from different databases into Hive Partitions directly using Sqoop.
  • Performed data analytics in Hive and then exported the metrics to RDBMS using Sqoop.
  • Involved in running Hadoop jobs for processing millions of records of text data.
  • Extensively used Pig for data cleaning and optimization.
  • Implemented complex map reduce programs to perform joins on the Map side using distributed cache.
  • Developed multiple MapReduce jobs in java for data cleaning and preprocessing.
  • Thoroughly tested MapReduce programs using the MRUnit and JUnit testing frameworks.
  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Extracted Tables from MS SQL Server through Sqoop and placed in HDFS and processed the records.
  • Used Flume to collect and aggregate weblog data from different sources and pushed to HDFS.
  • Deployed multi-module applications with build tools like Maven, integrated with continuous-integration servers like Jenkins.
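The MapReduce jobs described above follow a map, shuffle/sort, reduce flow. As an illustration of that flow (not the actual Hadoop jobs, which ran in Java on the cluster), the classic word count can be mimicked with coreutils:

```shell
#!/bin/sh
# Illustration only: the map -> shuffle/sort -> reduce pattern of
# MapReduce word count, sketched with coreutils instead of Hadoop.
wordcount() {
    tr -s ' \t' '\n' |             # map: emit one token per line
        sort |                     # shuffle/sort: group identical keys
        uniq -c |                  # reduce: count each group
        awk '{print $2 "\t" $1}'   # format as word<TAB>count
}
```

In a real job the map and reduce stages run as distributed tasks; the pipeline above only shows the data flow on a single machine.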

Environment: Hadoop 1.x, HDFS, MapReduce, Pig 0.11, Hive 0.10, Sqoop, Unix, JDK 1.8, Java/J2EE with Web APIs, Microservices, JDBC, JUnit, JSON, Maven 2.0

Confidential

Hadoop Developer

Responsibilities:

  • Coordinated with business customers to gather business requirements, interacted with other technical peers to derive technical requirements, and delivered the BRD and TDD documents.
  • Experience in installing Hadoop clusters using different distributions of Apache Hadoop, including Hortonworks.
  • Experience in the Spark (using RDDs, DataFrames, and SQL) and Hadoop (using MapReduce) ecosystems with Scala as the underlying programming language.
  • Experienced in handling large datasets using partitioning, Spark in-memory capabilities, broadcast variables, and effective, efficient joins and transformations during the ingestion process itself.
  • Worked on Spark SQL and DataFrames for faster execution of Hive queries using Spark SQLContext.
  • Hands-on experience in developing Spark applications using RDD transformations, Spark Core, Spark Streaming, and Spark SQL.
  • Experience in using DStreams, accumulators, broadcast variables, and RDD caching for Spark Streaming.
  • Developed a pipeline for continuous data ingestion utilizing Kafka and Spark Streaming.
  • Experienced in Spark Core, Spark SQL, and Spark Streaming.
  • Implemented Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, and wrote data into HDFS through Sqoop.
  • Extensively involved in Design phase and delivered Design documents.
  • Worked on analyzing Hadoop clusters and different Big Data analytic tools, including Pig, Hive, the HBase database, and Sqoop.
  • Installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Migration of huge amounts of data from different databases (i.e. Oracle, SQL Server) to Hadoop.
  • Written Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying on the log data.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Experienced in defining job flows.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Experienced in managing and reviewing the Hadoop log files.
  • Used Pig as an ETL tool to perform transformations, joins, and some pre-aggregations before storing the data in HDFS.
  • Load and Transform large sets of structured and semi structured data.
  • Responsible to manage data coming from different sources.
  • Involved in creating Hive Tables, loading data and writing Hive queries.
  • Utilized Apache Hadoop environment by Hortonworks.
  • Created Data model for Hive tables.
  • Exported data from HDFS environment into RDBMS using Sqoop for report generation and visualization purpose.
  • Expert in designing and developing Jenkins deployments.
  • Experience on Continuous Integration Jenkins, performed end to end automation for build and deployments.
  • Written helper classes using the Java Collection Framework.
  • Written JUnit Test Cases for the classes developed.
  • Wrote UNIX shell scripts to fetch, parse, and load data from external sources.
  • Written wrapper scripts to automate the deployment.
  • Worked on Oozie workflow engine for job scheduling.
  • Did unit testing for newly developed components using JUnit.
  • Involved in automation environment setup using Eclipse, Java, Selenium WebDriver Java language bindings, and TestNG JARs.
  • Involved in Unit testing and delivered Unit test plans and results documents.
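The UNIX shell scripts for fetching, parsing, and loading external data mentioned above can be sketched as a simple parse step. The input format here, an Apache-style access log, is an assumption chosen for illustration, not a specific source from the project.

```shell
#!/bin/sh
# Sketch: parse Apache-style access-log lines into tab-separated
# (ip, timestamp, status) records ready for loading into HDFS/Hive.
# The log layout is an assumed common format.
parse_access_log() {
    awk '{
        ts = $4
        sub(/^\[/, "", ts)            # strip leading bracket from timestamp
        print $1 "\t" ts "\t" $9      # ip, timestamp, HTTP status code
    }'
}
```

The tab-separated output maps directly onto a Hive table with matching columns, so a load step can follow without further reshaping.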

Environment: Hadoop 2.x, HDFS, Spark Core, Spark SQL, Scala, MapReduce, Pig 0.12.1, Hive 0.13.1, Sqoop 1.4.4, Flume 1.6.0, Unix, JDK 1.8, Java/J2EE with Web APIs, Microservices, JDBC, JUnit, JSON, Maven 2.0, Git, Git Bash
