Data Engineer/ Spark Developer Resume

Columbus, OH

PROFESSIONAL SUMMARY

  • Around 6 years of total experience as a Software Engineer in IT, including 4 years as a Hadoop Developer with Big Data / Hadoop ecosystems and 2 years as a Java Developer.
  • Experience in installing, configuring and troubleshooting Hadoop ecosystem components like MapReduce, Apache Spark, Scala, HDFS, Sqoop, Flume, Hive, Oozie, HBase, Python and ZooKeeper.
  • Experience in upgrading the existing Hadoop cluster to the latest releases.
  • Experienced in using NFS (Network File System) for NameNode metadata backup.
  • Experience in using Cloudera Manager 4.0 for installation and management of Hadoop clusters.
  • Worked with different Hadoop distributions, including Cloudera (CDH4 and CDH5) and Hortonworks.
  • Developed distributed applications using the actor model with Akka for extreme scalability.
  • Worked on migrating the old Java stack to the Typesafe stack using Scala for backend programming.
  • Used Slick to query and store data in the database in idiomatic Scala, using the Scala collection framework.
  • Used the Scala collection framework to store and process complex consumer information. Based on the offers set up for each client, the requests were post-processed and given offers.
  • Experienced in developing Spark applications using the Spark Core, Spark SQL and Spark Streaming APIs.
  • Good experience in writing Spark applications using Scala and Java; used sbt to develop Scala projects and executed them using spark-submit.
  • Excellent understanding of Hadoop and Spark architecture, including Hadoop components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, ResourceManager and MapReduce, and Spark components such as the Spark driver, executors and cluster manager.
  • Involved in importing streaming data into HDFS using Flume; good experience in analyzing and cleansing raw data using HiveQL.
  • Experience in partitioning, bucketing, join optimizations and query optimizations in Hive, and in automating Hive queries with dynamic partitioning.
  • Experience in supporting data analysis projects using Elastic MapReduce on the Amazon Web Services (AWS) cloud, including exporting and importing data to and from S3.
  • Good understanding of NoSQL databases and hands-on experience in writing applications on NoSQL databases like HBase; worked on HBase to load and retrieve data for real-time processing using the REST API.
  • Worked in a multi-cluster environment and set up a Hortonworks Hadoop system. Experienced in developing and implementing MapReduce jobs using Java to process and perform various analytics on large datasets.
  • Implemented a CI/CD pipeline with Jenkins, GitHub, Nexus, Maven and AWS AMI.
  • Good understanding of data structures and algorithms.
  • Participated in converting existing Hadoop jobs to Spark jobs using Spark Core, Spark SQL and Spark Streaming.
  • Loaded data into Spark RDDs and DataFrames in Scala and performed in-memory data computation to generate the output response.
  • Good experience in developing ETL scripts for data cleansing and transformation.
  • Experience in data migration from existing data stores and mainframe NDM (Network Data Mover) to Hadoop.
  • Experience in designing both time-driven and data-driven automated workflows using Oozie.
  • Experience in supporting analysts by administering and configuring Hive.
  • Hands-on programming experience in various technologies like Java, SQL, HTML and XML, using Eclipse and Visual Studio on Windows and UNIX.
  • Experience writing SQL queries and working with Oracle and MySQL.
  • Expertise in object-oriented analysis and design (OOAD), including UML and the use of various design patterns.
  • Experience in preparing deployment packages, deploying to Dev and QA environments, and preparing deployment instructions for the Production Deployment Team.
  • Team player with excellent analytical, communication and project documentation skills; experienced with Agile methodology and iterative development.
  • Hands-on experience in writing Python scripts to automate the entire job execution flow and integrate it into one script.
  • Experience in using Cloudera Manager for installation and management of single-node and multi-node Hadoop clusters.

TECHNICAL SKILLS

Hadoop Ecosystem: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Kafka, Oozie, ZooKeeper, Apache Spark, Spark Streaming, Spark SQL, Apache Hadoop.

Languages: Core Java, SQL, Pig Latin, HiveQL, Scala, iText, Spark SQL, Spark Core.

Databases: MySQL, Teradata, Oracle, Cassandra

Analytics Tools: RapidMiner, Weka, Apache Mahout.

Cloud Computing: AWS, EC2, S3, Azure, GCP.

Version Control Tools: SVN, CVS

Operating Systems: Sun Solaris, Red Hat Linux, Windows 98/XP/Vista/7/8, UNIX, Linux.

Development Tools: Eclipse, Visual Studio, IntelliJ, PuTTY, WinSCP

Hadoop Distribution: Cloudera, Hortonworks, Apache, AWS, MapR, Pivotal.

CI/CD Tools: Jenkins

Schedulers: Control-M, Oozie, Bedrock

PROFESSIONAL EXPERIENCE

Confidential, Columbus, OH

Data Engineer/ Spark Developer

Responsibilities:

  • Developed Spark applications using the Scala API to compare the performance of Spark with Teradata.
  • Used the Spark API over HDP to perform analytics on data in Hive.
  • Designed and created Hive external tables using a shared metastore, dynamic partitioning and buckets.
  • Implemented Apache Pig scripts to load data from and store data into Hive.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
  • Used JSON and XML SerDes for serialization and deserialization to load JSON and XML data into Hive tables.
  • Knowledge of Amazon Web Services (AWS) cloud services like EC2, S3, EBS, RDS and VPC.
  • Imported data from different sources like AWS S3 and the local file system (LFS) into Spark RDDs.
  • Worked with various HDFS file formats like Avro and SequenceFile and various compression formats like Snappy.
  • Used Spark SQL to load JSON data, create SchemaRDDs and load them into Hive tables, and handled structured data using Spark SQL.
  • Developed Spark/MapReduce jobs to parse JSON or XML data.
  • Involved in HBase setup and in storing data in HBase for use in analysis.
  • Used Scala libraries to process XML data stored in HDFS and stored the processed data back in HDFS.
  • Loaded data into Spark RDDs and performed in-memory data computation to generate the output response.
  • Used Spark for interactive queries, processing of streaming data and integration with a popular NoSQL database for huge volumes of data.
  • Wrote various Pig scripts to clean up the ingested data and created partitions for the daily data.
  • Implemented Sqoop scripts to import and export data from RDBMS.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala and Python.
  • Analyzed the SQL scripts and designed the solution to implement using PySpark.
  • Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala.
  • Developed Spark scripts as per the requirements.
  • Implemented Spark applications in Scala, using DataFrames and the Spark SQL API for faster processing of data.
  • Used Oozie workflows to coordinate Pig and Hive scripts.

Environment: Hadoop, HDFS, Spark, MapReduce, Hive, Sqoop, Kafka, HBase, Oozie, Flume, Scala, AWS, Python, Java, JSON, SQL Scripting and Linux Shell Scripting, Avro, Parquet, Hortonworks.

Confidential, Detroit, MI

Big Data / Spark Developer

Responsibilities:

  • Primary role was building data pipelines and working on advanced procedures like cluster tuning, code optimization and using the in-memory computing capabilities of Spark with Scala as per requirements.
  • Loaded data into Spark RDDs and DataFrames in Scala and performed in-memory data computation to generate the output response.
  • Worked with project managers, business owners, analyst teams and clients; built database prototypes to validate system requirements, documented code, provided progress reports, and performed code reviews and peer feedback.
  • Worked on converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Participated in the development of technical/functional requirements and design specifications as appropriate and developed the software as required.
  • Worked on stateful transformations in Spark applications and created Spark applications to load data into Hive tables.
  • Built data pipelines for different events covering ingestion, aggregation and loading of consumer response data into Hive external tables in HDFS, to serve as feeds for several dashboards and web APIs.
  • Developed Sqoop scripts to migrate data from Oracle to the big data environment.
  • Experimented with Spark APIs such as SparkContext, Spark SQL, Spark UDFs and Spark DataFrames to better optimize existing algorithms.
  • Supported Hadoop developers and assisted in optimizing MapReduce jobs and Hive scripts. Worked with Spark SQL for analyzing the data. Used Scala to write code for all Spark use cases.
  • Developed distributed applications using the actor model with Akka for extreme scalability.
  • Worked on migrating the old Java stack to the Typesafe stack using Scala for backend programming.
  • Used Slick to query and store data in the database in idiomatic Scala, using the Scala collection framework.
  • Faced and resolved challenges in migration projects.
  • Used the Scala collection framework to store and process complex consumer information. Based on the offers set up for each client, the requests were post-processed and given offers.
  • Worked with different file formats like CSV, JSON, Avro, text and Parquet and compression techniques like Snappy, according to client requests.
  • Worked on cluster tuning and the in-memory computing capabilities of Spark using Scala, based on the resources available on the cluster.
  • Developed parameterized shell scripts to automate jobs in a configurable way before moving them to production.
  • Worked extensively with data migration, data cleansing, data profiling and ETL processes for data warehouses.
  • Scheduled automated jobs on a daily and weekly basis according to requirements, using Oozie as the scheduler.
  • Worked on operational controls like job failure notifications and email notifications for failure logs and exceptions.
  • Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, pair RDDs and YARN.
  • Involved in converting HiveQL into Spark transformations using Spark RDDs and Scala.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in HDFS.
  • Loaded data into Spark RDDs and performed in-memory data computation to generate the output response.
  • Supported the project team for successful delivery of the client's business requirements through all phases of the implementation.
  • Experienced in loading and transforming large sets of structured and semi-structured data using Spark.

Environment: Hadoop, Spark Scala, Hive, Cloudera, HBase, Sqoop, HDP 2.6, HDFS.

Confidential - Detroit, MI

Data Engineer

Responsibilities:

  • Installed and configured Hadoop ecosystem components like HBase and Flume.
  • Involved in Hadoop cluster tasks like adding and removing nodes without any effect on running jobs and data.
  • Managed and reviewed Hadoop log files.
  • Loaded log data into HDFS using Flume. Worked extensively on creating MapReduce jobs to power data for search and aggregation.
  • Experience using the Hortonworks platform and its ecosystem. Hands-on experience in installing, configuring and using ecosystem components like Hadoop MapReduce, HDFS, Hive and Flume.
  • Developed Pig scripts to load data from files into HBase.
  • Developed Hive scripts to pull data from the data lake to our tenant.
  • Proficient with UNIX shell scripting.
  • Involved in design discussions for the ingestion process.
  • Developed Talend jobs to identify the gaps in the data and reload the data to clear the gaps.
  • Led a team of 4 and tracked defects to closure.
  • Developed Scala scripts and UDFs using both DataFrames and RDDs in Spark for data aggregation, queries and writing data back into OLTP systems.
  • Hands-on experience in designing and developing POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Developed shell and Python scripts to automate the jobs.
  • Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Hive.
  • Involved in installing Hadoop ecosystem components.
  • Responsible for managing data coming from different sources.
  • Involved in HDFS maintenance and loading of structured and unstructured data.
  • Built massively scalable multi-threaded applications for bulk data processing, primarily with Apache Spark and Pig on Hadoop.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Developed Spark code using Scala and Spark SQL for faster testing and processing of data.
  • Worked in an Agile development environment in two-week sprint cycles, dividing and organizing tasks. Participated in daily scrum and other design-related meetings.
  • Involved in data design for enterprise platforms like the Hadoop data lake and the Treasury Data Hub (TDH) internal data warehouses.
  • Loaded data into Spark RDDs and performed in-memory data computation to generate the output response.

Environment: AWS Athena, SNS, SQS, CloudWatch, API Gateway, Redshift, S3, EC2, EMR, Apache Spark, Python, Docker, UNIX Shell scripts, GIT, Maven.

Confidential

Big data Developer

Responsibilities:

  • Used Sqoop to import and export data between HDFS and RDBMS.
  • Participated in client-level communications to go through application-specific requirements.
  • Created Hive queries to bounce the data against end-user systems.
  • Created Hive tables as per requirements, either internal or external, defined with appropriate static and dynamic partitions for efficiency.
  • Actively participated in validating the statistics from source and target data sources.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig and Sqoop.
  • Studied the existing enterprise data warehouse setup and provided design and architecture suggestions for converting it to Hadoop using MapReduce, Hive, Sqoop and Pig Latin.
  • Performed fragment-replicate joins (map-side joins) in Pig, which use the distributed cache.
  • Worked on importing and exporting data from Oracle and DB2 into HDFS and Hive using Sqoop.
  • Created Hive external tables, loaded data into the tables and queried the data using HQL.
  • Performed data analysis by running Hive queries.
  • Worked with data delivery teams to set up new Hadoop users, including setting up Linux users, setting up Kerberos principals and testing HDFS and Hive.
  • Extended the functionality of Hive and Pig with custom UDFs and UDAFs in Java.
  • Involved in extracting data from various sources into Hadoop HDFS for processing.
  • Worked on analyzing the Hadoop cluster and different big data analytic tools including Pig, HBase and Sqoop.
  • Created and truncated HBase tables in Hue and took backups of submitter ID(s).
  • Used Amazon EMR for MapReduce jobs and tested locally using Jenkins.
  • Created and managed Azure Web Apps and provided access permissions to Azure AD users.
  • Commissioned and decommissioned nodes on a CDH5 Hadoop cluster on Red Hat Linux.
  • Involved in loading data from the Linux file system to HDFS.
  • Experience in configuring Storm to load data from MySQL to HBase using JMS.
  • Worked with BI teams on generating reports and designing ETL workflows on Tableau.
  • Experience in managing and reviewing Hadoop log files.

Environment: HDFS, MapReduce, Hive, Hue, Pig, Azure, Flume, Oozie, Sqoop, CDH5, Apache Hadoop, Spark, Python, R programming, Qlik, Hortonworks, Ambari, Cloudera Manager, Red Hat, Java, MySQL and Oracle.

Confidential

Hadoop Developer

Responsibilities:

  • Installed, configured and maintained Apache Hadoop clusters for application development, along with Hadoop tools like Hive, Pig, HBase, ZooKeeper, Cassandra and Sqoop.
  • Implemented high-availability NameNodes using Quorum Journal Managers and ZooKeeper Failover Controllers.
  • Managed a 350+ node HDP 2.3 cluster with 4 petabytes of data using Ambari 2.0 and Linux CentOS 7.
  • Familiar with Hadoop security involving LDAP, Kerberos and Ranger.
  • Strong experience using Ambari for administering large Hadoop clusters > 100
  • After transformation, the data was moved to the Spark cluster, where it was set to go live in the application using Spark Streaming and Kafka.
  • Configured LDAP user management access.
  • Wrote shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions.
  • Installed and configured Hadoop, MapReduce and HDFS (Hadoop Distributed File System).
  • Collected log data from web servers and integrated it into HDFS using Flume.
  • Set up Kerberos locally on a 5-node POC cluster using Ambari, evaluated the performance of the cluster and did an impact analysis of Kerberos enablement.
  • Modified reports and Talend ETL jobs based on the feedback from QA testers and users in development and staging environments.
  • Worked on SSL/TLS implementation.
  • Configured SSL and performed troubleshooting in Hue.
  • Responsible for building scalable distributed data solutions using Hadoop and Cloudera.
  • Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
  • Enabled Kerberos for authorization and authentication.
  • Enabled HA for NameNode, ResourceManager, YARN configuration and Hive Metastore.
  • Configured JournalNodes and ZooKeeper services for the cluster using Cloudera.
  • Monitored Hadoop cluster job performance and performed capacity planning.
  • Monitored and reviewed Hadoop log files.
  • Performed Cloudera Manager and CDH upgrades.
  • Took backups of critical data and Hive data and created snapshots.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and reviewing Hadoop log files.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
  • Extracted data using Flume and imported/exported data between HDFS and RDBMS using Sqoop.
  • Analyzed the data by performing Hive queries and running Pig scripts to understand user behavior.
  • Good knowledge of NoSQL databases like HBase.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Installed the Oozie workflow engine to run multiple Hive jobs.
  • Performance tuning of Impala jobs and resource management in cluster.

Environment: HDFS, MapReduce (MR1), Pig, Hive, Oozie, Sqoop, Cassandra, AWS, Talend, Java, UNIX Shell Scripting.

Confidential

Core Java Developer

Responsibilities:

  • Developed web applications using Spring, Spring IoC, Spring annotations, Spring MVC, Spring transactions, Hibernate, SQL and IBM WebSphere.
  • Developed the service layer using Java/J2EE.
  • Created internal routes using REST web services with Spring that accept and send objects in JSON format.
  • Very good implementation experience with object-oriented concepts, multithreading and Java/Scala.
  • Involved in multi-tiered J2EE design utilizing Spring IoC architecture and Hibernate.
  • Experienced in developing web services and worked with WebSphere Application Server.
  • Involved in analysis, design and implementation of business user requirements.
  • Designed table-less layouts using CSS and appropriate HTML tags as per W3C standards.
  • Created optimized graphic websites and application interfaces using HTML, CSS and the Spring framework.
  • Created various parser programs in Scala to extract data from Autosys, Tibco Business Objects, XML, Informatica, Java and database views.
  • Worked extensively with AJAX to implement front-end/user interface features in the application.
  • Developed CSS style sheets to give gradient effects. Developed page layouts, navigation and icons.
  • Used Bootstrap in combination with AngularJS to develop the website as a responsive website.
  • Created custom filters and directives to process the data or to render a reusable DOM.
  • Used JavaScript extensively for validation, DOM manipulation, etc.
  • Used GitHub as the version control tool.
  • Worked with build tools like Jenkins to deploy the application.

Environment: Spring, Hibernate, JMS, SOAP web service client (using JAX-WS), RESTful web services client (using JAX-RS), AngularJS, Bootstrap, HTML, CSS, AJAX, Scala, Oracle, SQL, Eclipse, Git, Jenkins, IBM WebSphere.
