Sr. Big Data Developer/Engineer Resume

NC

SUMMARY:

  • Over 8 years of experience in the information technology industry, including 5+ years as a Hadoop/Spark developer using big data technologies such as the Hadoop and Spark ecosystems, and 3+ years with Java/J2EE technologies and SQL.
  • Hands-on experience in installing, configuring and using Hadoop ecosystem components such as HDFS, MapReduce, Hive, Pig, YARN, Sqoop, Flume, HBase, Impala, Oozie, ZooKeeper, Kafka and Spark.
  • In-depth understanding of Hadoop architecture, including YARN and components such as HDFS, Resource Manager, Node Manager, Name Node and Data Node, and MRv1 and MRv2 concepts.
  • In-depth understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark MLlib and real-time processing with Spark Streaming.
  • Hands on experience in Analysis, Design, Coding and Testing phases of Software Development Life Cycle (SDLC).
  • Hands-on experience with Amazon Web Services (AWS), including Elastic MapReduce (EMR), S3 storage, EC2 instances and data warehousing.
  • Worked with AWS cloud services such as EC2, S3 and EBS.
  • Migrated an existing on-premises application to AWS. Used AWS services such as EC2 and S3 for processing and storing small data sets. Experienced in maintaining Hadoop clusters on AWS EMR.
  • Hands on experience in various Bigdata application phases like data ingestion, data analytics and data visualization.
  • Experience with Hadoop distributions such as Cloudera and Hortonworks, and with Amazon AWS.
  • Experience in transferring data from RDBMS to HDFS and Hive tables using Sqoop.
  • Migrated code from Hive to Apache Spark and Scala using Spark SQL and RDDs.
  • Experience in working with Flume to load log data from multiple sources directly into HDFS.
  • Well versed in workflow scheduling and monitoring tools such as Oozie, Hue and ZooKeeper.
  • Good knowledge of Impala, Mahout, Spark SQL, Storm, Avro, Kafka, Hue and AWS, and of development tools such as Eclipse, NetBeans and Maven.
  • Installed and configured MapReduce, HIVE and the HDFS, implemented CDH5 and HDP clusters on CentOS. Assisted with performance tuning, monitoring and troubleshooting.
  • Experience in collecting, aggregating and moving data from various sources using Apache Flume and Kafka.
  • Strong knowledge of version control systems such as SVN and GitHub.
  • Experience in streaming data to clusters through Kafka and Spark Streaming.
  • Experience in analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Java.
  • Basic knowledge of Kudu, NiFi, Kylin and Zeppelin with Apache Spark.
  • Experience with column-oriented NoSQL databases such as HBase and Cassandra and their integration with Hadoop clusters.
  • Involved in Cluster coordination services through Zookeeper.
  • Good experience in Core Java and J2EE technologies such as JDBC, Servlets and JSP.
  • Hands-on knowledge of core Java concepts such as exceptions, collections, data structures, multi-threading, serialization and deserialization.
  • Experience in designing user interfaces using HTML, CSS, JavaScript and JSP.
  • Excellent interpersonal skills in areas such as teamwork, communication and presentation to business users or management teams.
  • Good knowledge of Oracle PL/SQL and shell scripting.
  • Strong problem-solving skills; quick learner, able to work independently or as a member of teams of varying sizes.
  • Ability to plan, manage, motivate and work efficiently, both independently and collaboratively in a team.
  • Self-motivated, enthusiastic and always keen to learn new methodologies and techniques.

TECHNICAL SKILLS:

Programming Languages: C, C++, Java 1.4/1.5/1.6/1.7/1.8, SQL, PL/SQL, JavaScript

Big Data Technologies: Apache Hadoop, HDFS, Hive, Pig, HBase, Sqoop, Oozie, ZooKeeper, Crunch, Spark, Mahout.

Web Technologies: HTML, HTML5, XML, XHTML, CSS3, JSON, AJAX, XSD, WSDL, ExtJS

RDBMS/Databases: Oracle 8i/9i/10g, MySQL, PostgreSQL, SQL Server 6.5, MongoDB (NoSQL), MS Access

Server side Frameworks and Libraries: Spring 2.5/3.0/3.2, Hibernate 3x/4x, MyBatis, Spring MVC, Spring web flow, Spring Batch, Spring Integration, Spring-WS, Struts, Jersey Restful Web services, Xfire, Apache CXF, Mule ESB, Zookeeper, Curator, Apache POI, Junit, Mockito, PowerMock, Slf4j, Log4j, Gson, Jackson, UML, Selenium, Crystal Reports

UI Frameworks and Libraries: ExtJS, JQuery, JQueryUI, AngularJS, Thymeleaf, Prime Faces, Bootstrap

Application Servers: Bea WebLogic, IBM WebSphere, Apache Tomcat

Build Tools and IDEs: Maven, Ant, IntelliJ, Eclipse, Spring Tool Suite, NetBeans and Jenkins

Operating Systems: Windows, UNIX, SUN Solaris, Linux, Mac OS X

Tools: SVN, JIRA, Toad, SQL Developer, Serena Dimensions, SharePoint, ClearCase, Perforce

Methodologies: Agile, Scrum, SDLC, Object-Oriented Analysis and Design, Test-Driven Development, Continuous Integration

PROFESSIONAL EXPERIENCE:

Confidential, NC

Sr. Big Data Developer/Engineer

Responsibilities:

  • Involved in story-driven Agile development methodology and actively participated in daily scrum meetings.
  • Worked on all activities related to the development, implementation and support for Hadoop.
  • Designed custom reusable templates in NiFi for code reusability and interoperability.
  • Installed and configured the Hadoop ecosystem and Cloudera Manager using the CDH4 distribution.
  • Built frameworks using Python in Airflow to orchestrate data science pipelines.
  • Worked with teams to set up AWS EC2 instances using AWS services such as S3, EBS, Elastic Load Balancer, Auto Scaling groups, VPC subnets and CloudWatch.
  • Managed data coming from different sources; involved in HDFS maintenance and in loading structured and unstructured data.
  • Used Kafka to stream data into HDFS and exported it to MongoDB.
  • Created Hive partitions and buckets based on state for further processing with bucket-based Hive joins.
  • Installed and Configured Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
  • Implemented multiple MapReduce jobs in Java for data cleansing and pre-processing.
  • Wrote complex Hive queries and UDFs in Java and Python.
  • Provisioned EC2 infrastructure on AWS and deployed applications behind Elastic Load Balancing.
  • Generated data analysis reports using Matplotlib and Tableau, and presented the results to C-level decision makers.
  • Worked with Hadoop eco system covering HDFS, HBase, YARN and MapReduce.
  • Used Scala and Spark SQL to develop Spark code for faster processing and testing, and ran complex Hive queries on Hive tables.
  • Worked on Kerberizing applications and securing them with SSL and SAML authentication.
  • Wrote and executed SQL queries to work with structured data in relational databases and to validate the transformation and business logic.
  • Used Flume to move data from individual data sources to the Hadoop system.
  • Used the MRUnit framework to test MapReduce code.
  • Responsible for building scalable distributed data solutions using Hadoop Eco system and Spark.
  • Performance-tested APIs using Postman.
  • Involved in data acquisition and pre-processing of various types of source data using StreamSets.
  • Responsible for design and development of Spark SQL scripts in Scala/Java based on functional specifications.
  • Analysed the data by performing Hive queries (HiveQL), ran Pig scripts, Spark SQL and Spark streaming.
  • Developed tools using Python, shell scripting and XML to automate menial tasks.
  • Wrote Python scripts to extract data from HTML files.
  • Implemented MapReduce jobs in Hive by querying the available data.
  • Configured the Hive Metastore with MySQL, which stores the metadata for Hive tables.
  • Performed data analytics in Hive and exported the resulting metrics back to an Oracle database using Sqoop.
  • Tuned the performance of Hive queries and MapReduce programs for different applications.
  • Proactively involved in ongoing maintenance, support and improvements in Hadoop cluster.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Used Cloudera Manager for installation and management of Hadoop Cluster.

Environment: NiFi 1.1, Hadoop 2.6, JSON, XML, Avro, HDFS, Airflow, Teradata r15, Sqoop, Kafka, MongoDB, Hive 2.3, Pig 0.17, HBase, ZooKeeper, MapReduce, Postman, Java, Python 3.6, YARN, Flume, NoSQL, Cassandra 3.11.

Confidential, Minneapolis, MN

Hadoop Developer

Responsibilities:

  • In-depth knowledge of Hadoop architecture and components such as HDFS, Application Master, Node Manager, Resource Manager, Name Node and Data Node, and of MapReduce concepts.
  • Imported required tables from RDBMS to HDFS using Sqoop, and used Storm and Kafka to stream data into HBase in real time.
  • Good experience with the NoSQL database HBase, including creating HBase tables to load large sets of semi-structured data from various sources.
  • Wrote Hive and Pig scripts as ETL tools to perform transformations, event joins, bot-traffic filtering and some pre-aggregations before storing the data in HDFS.
  • Developed a data pipeline using Flume, Sqoop, Pig and MapReduce to ingest customer behavioural data and purchase histories into HDFS for analysis.
  • Developed Spark code using Scala and Spark-SQL for faster testing and processing of data.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
  • Developed Java code to generate, compare & merge AVRO schema files.
  • Prepared validation report queries, ran them after every ETL run, and shared the results with business users in different phases of the project.
  • Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting; applied Hive join optimization techniques and best practices when writing HiveQL scripts.
  • Imported and exported data into HDFS and Hive using Sqoop. Wrote Hive queries to extract the processed data.
  • Developed and ran MapReduce jobs on YARN and Hadoop clusters to produce daily and monthly reports as per users' needs.
  • Teamed up with Architects to design Spark model for the existing MapReduce model and Migrated MapReduce models to Spark Models using Scala.
  • Implemented Spark applications in Scala using the Spark Core, Spark Streaming and Spark SQL APIs for faster data processing than MapReduce in Java.
  • Used Spark SQL to load JSON data, create SchemaRDDs and load them into Hive tables, and handled structured data using Spark SQL.
  • Imported data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Integrated Apache Storm with Kafka to perform web analytics and to move clickstream data from Kafka to HDFS.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig and Sqoop.
  • Analyzed large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive, Pig, Sqoop, Spark and Zookeeper.
  • Expert knowledge of MongoDB NoSQL data modeling, tuning, disaster recovery and backup.

Environment: Apache Hadoop, HDFS, MapReduce, HBase, Hive, Yarn, Pig, Sqoop, Flume, Zookeeper, Kafka, Impala, SparkSQL, Spark Core, Spark Streaming, NoSQL, MySQL, Cloudera, Java, JDBC, Spring, ETL, WebLogic, Web Analytics, Avro, Cassandra, Oracle, Shell Scripting, Ubuntu.

Confidential, PA

Hadoop Developer

Responsibilities:

  • Installed and configured various components of Hadoop Ecosystem like Job Tracker, Task Tracker, Name Node and Secondary Name Node.
  • Designed and developed multiple MapReduce Jobs in Java for complex analysis.
  • Imported and exported data between HDFS and relational database systems using Sqoop.
  • Imported required tables from RDBMS to HDFS using Sqoop and also used Storm and Kafka to get real time streaming of data into HBase
  • Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
  • Developed MapReduce programs to extract and transform the data sets and results were exported back to RDBMS using Sqoop.
  • Implemented the workflows using Apache Oozie framework to automate tasks.
  • Analyzed data using HiveQL, Pig Latin and custom MapReduce programs in Java.
  • Created Hive tables, loaded data, and wrote Hive queries and Pig scripts.
  • Moved data from Oracle and MS SQL Server into HDFS using Sqoop and imported flat files in various formats into HDFS.
  • Good experience in Hive partitioning and bucketing, performing different types of joins on Hive tables, and implementing Hive SerDes such as Regex, JSON and Avro.
  • Developed a data pipeline using Flume, Sqoop, Pig and MapReduce to ingest customer behavioral data and purchase histories into HDFS for analysis.
  • Used Spark SQL to load JSON data, create SchemaRDDs and load them into Hive tables, and handled structured data using Spark SQL.
  • Created MapReduce programs to handle semi-structured and unstructured data such as XML, JSON and Avro data files, and sequence files for log files.
  • Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume
  • Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries and Pig scripts.
  • Implemented Spark RDD transformations and actions to migrate MapReduce algorithms.
  • Used Zookeeper for providing coordinating services to the cluster.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing, analyzing and training the classifier using MapReduce jobs, Pig jobs and Hive jobs

Environment: Hadoop, Cloudera Manager, Linux, Red Hat, CentOS, Ubuntu, Scala, HDFS, MapReduce, Hive, HBase, Oozie, Pig, Sqoop, Flume, ZooKeeper, Kafka, Python, Java, JSON, Oracle, SQL, Avro

Confidential, Bloomington, IL

Hadoop Developer

Responsibilities:

  • Loaded disparate data sets from different sources into the BDPaaS (Hadoop) environment using Sqoop.
  • Loaded datasets from Oracle and MySQL into HDFS and Hive, respectively.
  • Developed UNIX scripts to create batch loads that bring large amounts of data from relational databases into the big data platform.
  • Imported data from various data sources, performed transformations, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Analyzed data coming from various sources and created meta-files and control files to ingest the data into the Data Lake.
  • Configured batch jobs to ingest the source files into the Data Lake.
  • Developed Pig queries to load data to HBase.
  • Leveraged Hive queries to create ORC tables.
  • Developed Hive scripts to meet analysts' requirements.
  • Worked extensively with Hive to create, alter and drop tables, and wrote Hive queries.
  • Created and altered HBase tables on top of data residing in Data Lake.
  • Created Views from Hive Tables on top of data residing in Data Lake.
  • Created Reports with different Selection Criteria from Hive Tables on the data residing in Data Lake.
  • Worked closely with scrum master and team to gather information and perform daily activities.
  • Worked with systems analysts and business users to understand requirements.

Environment: CDH, Hadoop, MapReduce, HDFS, Hive, Sqoop, Mainframe, Oracle, Linux, DMX-h, Autosys, Spark, Scala

Confidential

Data Scientist/Data Analyst

Responsibilities:

  • Worked on data cleaning and reshaping, generated segmented subsets using Numpy and Pandas in Python
  • Wrote and optimized complex SQL queries involving multiple joins and advanced analytical functions to perform data extraction and merging from large volumes of historical data stored in Oracle 11g, validating the ETL processed data in target database
  • Good understanding of Teradata SQL Assistant, Teradata Administrator and data loading. Experience with data analytics, data reporting, ad-hoc reporting, graphs, scales, pivot tables and OLAP reporting.
  • Identified the variables that significantly affect the target.
  • Continuously collected business requirements during the whole project life cycle.
  • Conducted model optimization and comparison using a stepwise function based on AIC values.
  • Developed Python scripts to automate the data sampling process. Ensured data integrity by checking for completeness, duplication, accuracy and consistency.
  • Generated data analysis reports using Matplotlib and Tableau, and presented the results to C-level decision makers.
  • Produced a cost-benefit analysis quantifying the model implementation against the previous situation.
  • Performed model selection based on confusion matrices, minimizing Type II error.

Environment: Tableau 7, Python 2.6.8, Numpy, Pandas, Matplotlib, Scikit-Learn, MongoDB, Oracle 10g, SQL
