Big Data Developer Resume

Winnipeg, MB

SUMMARY

  • Astute and experienced IT professional with 5+ years as a Hadoop Developer in Big Data/Hadoop technology development.
  • Experience working with the Cloudera and Hortonworks distributions of Hadoop.
  • Expertise in HDFS, MapReduce, Hive, Pig, Sqoop, HBase, ZooKeeper and the wider Hadoop ecosystem.
  • Worked with Apache Hadoop along with the enterprise versions of Cloudera and Hortonworks. Good knowledge of the MapR distribution.
  • Hands-on experience with Big Data Hadoop core and ecosystem components (HDFS, MR1, MR2, YARN, Hive, Impala, Beeline, Sqoop, Flume, Oozie, HBase, ZooKeeper and Pig).
  • Experience ingesting streaming data into clusters through Flume.
  • Proficient in working with NoSQL databases such as MongoDB and HBase.
  • Experience partitioning Big Data according to business requirements using Hive indexing, partitioning and bucketing.
  • Experience with data extraction, transformation and loading in Hive, Pig and HBase.
  • Experience transforming data from HDFS, Hive, Pig, HBase and MySQL.
  • Experience creating UDFs and UDAFs for Hive and Pig.
  • Optimized the streaming of log files with minimal latency using Flume and, more importantly, managed the downstream flow of data into the Hadoop ecosystem and its analysis components.
  • Extensive experience working with Cloudera CDH 5.x on multi-node clusters.
  • Skilled in choosing efficient Hadoop ecosystem components and providing the best solutions to Big Data problems.
  • Good knowledge of Spark and Scala.
  • Developed a proof of concept (POC) in Spark and Scala.
  • Experience presenting analyzed data in clear visual formats using the Tableau reporting tool.
  • Proficient in building informative dashboards for Business Intelligence teams.
  • Developed Spark applications using Scala to ease transitions from existing Hadoop workloads. Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive. Developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.
  • Experience using design patterns, Java, Servlets, JSP, JavaScript, HTML, jQuery, AngularJS, jQuery Mobile, XML, WebLogic, JBoss 4.2.3, SQL, PL/SQL, JUnit, Apache Tomcat and Linux.
  • Expertise in relational databases such as Oracle, MySQL and SQL Server.
  • Experience in Agile methodologies.
  • Strong communication skills with the ability to lead a team and keep it motivated.
  • Extensive experience with Java IDEs such as Eclipse.
  • Adept at guiding the team through difficult situations and steering it to deliver quality output.
  • Highly motivated and versatile team player with the ability to work independently & adapt quickly to new emerging technologies.

TECHNICAL SKILLS

Big Data Tools: Hadoop, Storm, Trident, HBase, Hive, Flume, Kafka, Sqoop, Oozie, Pig, Spark, MapReduce, ZooKeeper, YARN.

Operating Systems: UNIX, Mac, Linux, Windows 2000/NT/XP/Vista, Android

Programming Languages: Java (JDK 5/JDK 6), C/C++, MATLAB, R, HTML, SQL, PL/SQL

Frameworks: Hibernate 2.x/3.x, Spring 2.x/3.x, Struts 1.x/2.x and JPA

Web Services: WSDL, SOAP, Apache CXF/XFire, Apache Axis, REST, Jersey

Databases/technologies: Oracle 8i/9i/10g, Microsoft SQL Server, DB2 & MySQL 4.x/5.x

Middleware Technologies: WebSphere Message Queue, WebSphere Message Broker, XML Gateway, JMS

Web Technologies: J2EE, SOAP and REST Web Services, JSP, Servlets, EJB, JavaScript, Struts, Spring, WebWork, Direct Web Remoting, HTML, XML, JMS, JSF, Ajax.

Testing Frameworks: Mockito, PowerMock, EasyMock.

Web/Application Servers: IBM WebSphere Application Server, JBoss, Apache Tomcat.

Other Software: Borland StarTeam, ClearCase, JUnit, Ant, Maven, Android Platform, Microsoft Office, SQL Developer, DB2 Control Center, Microsoft Visio, Gradle, Hudson, Subversion, Git, Nexus, Artifactory and Trac.

Development Strategies: Agile, Lean Agile, Pair Programming, Waterfall and Test-Driven Development

PROFESSIONAL EXPERIENCE

Big Data Developer

Confidential - Winnipeg, MB

Responsibilities:

  • Performed SDLC requirements gathering, analysis, design, development and testing of the application using Agile and Scrum methodologies.
  • Developed a detailed understanding of the existing build system and the related tools that track information on various products, releases and test results.
  • Designed and implemented MapReduce jobs to support distributed processing using Java, Hive and Apache Pig.
  • Consumed web services through RESTful APIs to transfer data between different applications.
  • Built a mechanism using Talend to automatically move existing proprietary binary-format data files to HDFS through a service called the Ingestion Service.
  • Implemented a prototype to integrate PDF documents into a web application using GitHub.
  • Actively participated in process improvement, normalization/denormalization, data extraction, data cleansing and data manipulation.
  • Performed data transformations in Scala and Hive and used partitions and buckets for performance improvements.
  • Wrote custom InputFormat and RecordReader classes for reading and processing the binary format in MapReduce.
  • Used the Mockito framework for mocking dependencies in unit tests.
  • Involved in Test Driven Development (TDD) and Acceptance Test Driven Development (ATDD).
  • Managed and deployed Amazon Web Services Elastic MapReduce (AWS EMR) clusters.
  • Built cloud-native applications using Amazon Web Services, specifically Elastic MapReduce (EMR), Lambda, DynamoDB and Elastic Beanstalk.
  • Managed data schema versions across various microservices.
  • Developed and tested the enterprise application with JUnit.
  • Wrote custom Writable classes for Hadoop serialization and deserialization of time-series tuples.
  • Implemented a custom file loader for Pig to query large data files, such as build logs, directly.
  • Used Python for pattern matching in build logs to format errors and warnings.
  • Developed Pig Latin scripts and shell scripts for validating the different query modes in Historian.
  • Created Hive external tables on the MapReduce output, then applied partitioning and bucketing on top of them.
  • Improved performance using Scala and by tuning Hive and MapReduce, working with Talend, ActiveMQ and JBoss.
  • Developed a daily test engine in Python for continuous testing.
  • Developed rich interactive visualizations integrating various reporting components from multiple data sources.
  • Used shell scripting for Jenkins job automation with Talend.
  • Built a custom calculation engine that can be programmed according to user needs.
  • Ingested data into Hadoop using shell scripting and Sqoop, and applied data transformations using Pig and Hive.
  • Handled performance improvement changes to the Pre-Ingestion service, which is responsible for generating the Big Data Format binary files from older versions of Historian.
  • Worked with support teams and resolved operational & performance issues.

Environment: Apache Hadoop, Hive, Scala, Pig, HDFS, Cloudera, Java MapReduce, Python, Maven, Git, Jenkins, UNIX, MySQL, Eclipse, Oozie, Sqoop, Flume, Oracle, JDK 1.8/1.7, Agile and Scrum Development Process, NoSQL, JBoss, Flink, JavaScript and Mockito

Hadoop Developer

Confidential - Calgary, AB

Responsibilities:

  • Evaluated Spark performance against Impala on transactional data.
  • Used Spark transformations and aggregations in Python and Scala to compute min, max and average values on transactional data.
  • Migrated HiveQL queries to Spark SQL using Scala.
  • Used Spark DataFrames to load and process data.
  • Handled Hive queries using Spark SQL integrated with the Spark environment.
  • Used Java to develop a RESTful API for the Database Utility project.
  • Responsible for performing extensive data validation using Hive.
  • Implemented a data service as a REST API project to retrieve server utilization data from a Cassandra table.
  • Collected data from an AWS S3 bucket in near real time using Spark Streaming, performed the necessary transformations and aggregations to build the data model, and persisted the data in HDFS.
  • Configured Spark Streaming to receive ongoing data from Kafka and store the streamed data in HDFS.
  • Used various Spark transformations and actions to cleanse the input data.
  • Experience creating scripts for data modeling and for data import and export.
  • Extensive experience deploying, managing and developing MongoDB clusters.
  • Developed a Spark Streaming script that consumes topics from the distributed messaging source Kafka and periodically pushes batches of data to Spark for real-time processing.
  • Implemented a shell script that calls a Python script to compute min, max and average on utilization data from thousands of hosts, and compared performance at various levels of summarization.
  • Involved in creating Oozie workflow and coordinator jobs that kick off Hive jobs on time, based on data availability.
  • Generated reports from the Hive tables for visualization purposes.
  • Migrated HiveQL to Spark SQL to validate Spark performance against Hive.
  • Implemented a proof of concept for DynamoDB, Redshift and EMR.

Environment: Hadoop, AWS, HDFS, Hive, Hue, Oozie, Java, Linux, Scala, Spark

Hadoop Developer

Confidential - Kingston, ON

Responsibilities:

  • Analyzed the Hadoop stack and different big data analytic tools, including Pig, Hive, the HBase database and Sqoop.
  • Implemented the Hortonworks distribution (HDP 2.1, HDP 2.2 and HDP 2.3).
  • Developed MapReduce programs for refined queries on big data.
  • Experienced in working with Elastic MapReduce (EMR).
  • Created Hive tables and worked on them for data analysis to meet the requirements.
  • Developed a framework to handle loading and transforming large sets of unstructured data from UNIX systems into Hive tables.
  • Worked with the business team to create Hive queries for ad hoc access.
  • In-depth understanding of the classic MapReduce and YARN architectures.
  • Implemented Hive GenericUDFs to encode business logic.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Installed and configured Pig for ETL jobs.
  • Developed Pig UDFs to pre-process the data for analysis.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Used Apache NiFi to copy the data from local file system to HDFS.
  • Involved in creating generic Sqoop import script for loading data into Hive tables from RDBMS.
  • Involved in continuous monitoring of operations using Storm.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Implemented indexing of logs from Oozie into Elasticsearch.
  • Designed, developed, unit tested and supported ETL mappings and scripts for data marts using

Environment: Hortonworks, Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, Oozie, SQL, Flume, Spark, HBase, Java, GitHub, Talend, NiFi.

Hadoop Developer

Confidential - Toronto, ON

Responsibilities:

  • Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop the application.
  • Ingested data from external data sources such as MySQL using Sqoop and loaded it into HDFS.
  • Exported the analyzed data to the relational databases using Sqoop for further visualization and to generate reports for the BI team.
  • Worked on Linux shell scripts for business processes and for loading data from different systems into HDFS.
  • Configured Pig and designed Pig Latin scripts to process the data into a universal data model.
  • Used Pig Latin scripts to convert data from JSON, XML and other formats to Avro file format.
  • Wrote custom UDFs in Pig to process the data and perform business intelligence on it.
  • Created partitioned and bucketed tables in Hive based on the hierarchy of the dataset.
  • Involved in creating Hive internal and external tables, loading them with data and writing Hive queries that require multiple join scenarios.
  • Analyzed, transformed, filtered, co-grouped and aggregated data with HiveQL.
  • Imported the transformed data into to generate visualizations for further analysis by Business analysts.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Developed Python scripts to analyze the data in the HDFS.
  • Distributed snapshot JARs across the cluster via the distributed cache to improve the performance and efficiency of the algorithms by reducing data shuffling.
  • Used the Cassandra nodetool utility to manage the Cassandra cluster.
  • Configured the Size-Tiered Compaction Strategy (STCS) for Cassandra.
  • Processed the data with MapReduce and staged the end results in Cassandra.
  • Analyzed the performance of the cluster and related resources to identify bottlenecks.
  • Performed unit testing to meet the functional, technical and business requirements.
  • Performed Integration testing of software and firmware to ensure that the product health is optimal.

Environment: Scala, MapReduce, HDFS, Sqoop, Hive, YARN, Pig, Oozie, Shell Scripting, Git, Linux, Maven, Avro.

Hadoop Developer

Confidential - Halifax, NS

Responsibilities:

  • Imported and exported data into HDFS and Hive using Sqoop.
  • Developed various system components, such as Hadoop processes involving MapReduce and Hive.
  • Developed an interface for validating incoming data in HDFS before kicking off the Hadoop process.
  • Wrote optimized Hive queries using user-defined functions and customized Hadoop shuffle and sort parameters.
  • Tuned Hive and Pig to improve performance and resolve performance issues in Hive and Pig scripts, with a good understanding of joins, grouping and aggregation and how they translate into MapReduce jobs.
  • Experience creating, dropping and altering tables at run time using Hive without blocking updates and queries.
  • Experience pre-processing logs and semi-structured content stored on HDFS using Pig.
  • Experience with structured data imports and exports into the Hive warehouse, enabling business analysts to write Hive queries.
  • Experience in managing and reviewing Hadoop log files.
  • Experience with Unix shell scripts for business processes and for loading data from different interfaces into HDFS.
  • Involved in creating Hive tables and Pig tables, loading data, and writing Hive queries and Pig scripts.
  • Hands-on experience with Eclipse, PuTTY, WinSCP, VNC Viewer, etc.

Environment: Linux, CDH, MapReduce, Hive, PIG, Shell Script, SQOOP, Eclipse, Java.
