
Hadoop/Big Data Developer Resume


Atlanta, Georgia

SUMMARY:

  • Overall 7+ years of IT experience in the analysis, design, development, implementation, and testing of software applications, including 4+ years of experience in Big Data and in the design and development of Java-based enterprise applications.
  • Strong skills in developing applications involving Big Data technologies such as Hadoop, MapReduce, YARN, Flume, Hive, Pig, Sqoop, HBase, Pivotal, Cloudera, MapR, Avro, Spark, and Scala.
  • Extensively worked on major components of the Hadoop ecosystem such as HDFS, HBase, Hive, Sqoop, Pig, and MapReduce.
  • Developed various scripts and numerous batch jobs to schedule Hadoop programs.
  • Experience in analyzing data using HiveQL and custom MapReduce programs in Java.
  • Hands-on experience in importing and exporting data between databases such as Oracle and MySQL and HDFS/Hive using Sqoop.
  • Implemented Flume for collecting, aggregating, and moving large volumes of server logs and streaming data to HDFS.
  • Hands-on experience in Spark, Scala, and MarkLogic.
  • Extensively used MapReduce design patterns to solve complex problems in MapReduce programs.
  • Developed Hive queries for data analysis to meet the business requirements.
  • Experience in extending Hive and Pig core functionality by writing custom functions (UDFs, UDAFs, and UDTFs).
  • Experienced in implementing security mechanisms for Hive data.
  • Extensively used ETL methodology for performing data profiling, data migration, extraction, transformation, and loading using Talend/SSIS, and designed data conversions from a wide variety of source systems.
  • Extensively created mappings in Talend using tMap, tJoin, tReplicate, tParallelize, tJava, tJavaRow, tDie, tAggregateRow, tWarn, tLogCatcher, tMysqlSCD, tFilter, tGlobalMap, etc.
  • Experience with Hive query performance tuning.
  • Experienced in improving data cleansing processes using Pig Latin operations, transformations, and joins.
  • Extensive knowledge of NoSQL databases such as HBase.
  • Experienced in performing CRUD operations using the HBase Java Client API and REST API (a minimal sketch follows this summary).
  • Experience in designing both time-driven and data-driven automated workflows using Bedrock and Talend.
  • Good knowledge of creating PL/SQL stored procedures, indexes, packages, functions, triggers, and cursors with Oracle (9i, 10g, 11g) and MySQL Server.
  • Expert in designing and writing on-demand UNIX shell scripts.
  • Extensively worked with Object-Oriented Analysis and Design (OOAD) and software development using UML methodology; good knowledge of J2EE and Core Java design patterns.
  • Excellent Java development skills using J2EE frameworks such as Struts, EJB, and Web Services.
  • Proficient in development methodologies such as Scrum, Agile, and Waterfall.
  • Passion to excel in any assignment, with strong debugging and problem-solving skills and the ability to work under pressure and tight deadlines.
  • Excellent adaptability, ability to learn, and good analytical and programming skills.
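
The CRUD work against HBase referenced above went through the HBase Java Client API. The snippet below is a minimal sketch of that pattern using the 1.x-style client; the customer_profile table, the info column family, and the row key are illustrative placeholders, not values from an actual project.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseCrudSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("customer_profile"))) {
                // Create/update: put one cell into the (hypothetical) info column family.
                Put put = new Put(Bytes.toBytes("cust-1001"));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("John Doe"));
                table.put(put);

                // Read: fetch the row back and extract the cell value.
                Result result = table.get(new Get(Bytes.toBytes("cust-1001")));
                System.out.println(Bytes.toString(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));

                // Delete: remove the entire row.
                table.delete(new Delete(Bytes.toBytes("cust-1001")));
            }
        }
    }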

TECHNICAL SKILLS:

Hadoop/Big Data Technologies: HDFS, MapReduce, HBase, Hive, Pig, Impala, Sqoop, Flume, Oozie, Spark, Spark SQL, ZooKeeper, AWS, Cloudera, Hortonworks, Kafka, Avro, and BigQuery.

Languages: Core Java, XML, HTML, and HiveQL.

J2EE Technologies: Servlets, JSP, JMS, JSTL, AJAX, DOJO, JSON, and BlazeDS.

Frameworks: Spring 2, Struts 2 and Hibernate 3.

Application & Web Servers: WebSphere 6.0, JBoss 4.x, and Tomcat 5.

Scripting Languages: JavaScript, AngularJS, Pig Latin, Python 2.7, and Scala.

Databases (SQL/NoSQL): Oracle 9i, SQL Server 2005, MySQL, HBase, and MongoDB 2.2.

IDE: Eclipse and EditPlus.

PM Tools: MS MPP, Risk Management, ESA.

Other Tools: SVN, Apache Ant, JUnit, StarUML, TOAD, PL/SQL Developer, Perforce, JIRA, Bugzilla, Visual Source, QC, Agile Methodology.

EAI Tools: TIBCO 5.6.

Bug Tracking/Ticketing: Mercury Quality Center and ServiceNow.

Operating Systems: Windows 98/2000, Linux/UNIX, and Mac.

PROFESSIONAL EXPERIENCE:

Confidential, Atlanta, Georgia

Hadoop/Big Data Developer

Responsibilities:

  • Translation of functional and technical requirements into detailed architecture and design.
  • Developed automated scripts for all jobs, starting with pulling data from mainframes into HDFS.
  • Designed and developed ETL applications and automated them using Oozie workflows and shell scripts with error handling and email notification.
  • Implemented a nine-node CDH4 Hadoop cluster on Ubuntu Linux.
  • Implemented MapReduce programs that join data sets from different sources.
  • Optimized MapReduce programs by tuning MapReduce configuration parameters and implementing optimized joins.
  • Implemented MapReduce solutions such as top-K, summarization, and data partitioning using MapReduce design patterns.
  • Implemented MapReduce programs to handle different file formats such as XML, Avro, and sequence files, and applied compression techniques.
  • Developed Hive queries according to business requirements.
  • Designed and created Hive internal tables and partitions to store structured data.
  • Developed custom Hive UDFs to incorporate business logic into Hive queries (see the sketch after this list).
  • Used Hive-optimized file formats such as ORC and Parquet.
  • Implemented Hive SerDes (serializers/deserializers) to handle Avro files and used XPath expressions to handle XML files.
  • Imported and exported data to and from RDBMS through Sqoop.
  • Designed Cassandra data models to support near-real-time analysis.
  • Configured Cassandra clusters, vnodes, and replication strategies, and built the data model using DataStax Community edition.
  • Ensured NFS was configured for the NameNode.
  • Designed and implemented time-series data analysis using the Cassandra file system.
  • Implemented CRUD operations on top of Cassandra data using CQL and a REST API.
  • Imported and exported structured data using Sqoop import/export options.
  • Implemented Sqoop saved jobs and incremental imports.
  • Used compression techniques (Snappy) with file formats to reduce storage in HDFS.
  • Used Cloudera Manager to perform cluster monitoring, debug MapReduce jobs, and handle job submission on the cluster.
  • Successfully migrated a legacy application to a Big Data application using Hive/Pig at the production level.
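
As an illustration of the custom Hive UDF work above, the sketch below shows the general shape of a simple Java UDF. The masking example (hiding all but the last four characters of a column value) is hypothetical and stands in for the actual business logic.

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hypothetical UDF: masks all but the last four characters of a string column.
    public final class MaskValueUDF extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;
            }
            String value = input.toString();
            if (value.length() <= 4) {
                return new Text(value);
            }
            StringBuilder masked = new StringBuilder();
            for (int i = 0; i < value.length() - 4; i++) {
                masked.append('X');
            }
            masked.append(value.substring(value.length() - 4));
            return new Text(masked.toString());
        }
    }

Once packaged into a JAR, a function like this is registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before it is referenced in queries.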

Environment: Hadoop, HDFS, MapReduce, Hive, Sqoop, Oozie, Cloudera, Pig, Java (JDK 1.6), Eclipse, MySQL, Ubuntu, and ZooKeeper.

Confidential, New York

Hadoop/Big Data Developer

Responsibilities:

  • Involved in all phases of development activities from requirements collection to production support.
  • Developed a detailed understanding of the current system and identified the different sources of data for EMR.
  • Involved in cluster setup.
  • Performed batch processing of logs from various data sources using MapReduce (a minimal sketch follows this list).
  • Automated Cloudera job submission via Jenkins scripts and Chef.
  • Built predictive analytics to monitor inventory levels and ensure product availability.
  • Analyzed customers' purchasing behaviors in JMS.
  • Delivered value-added services based on clients' profiles and purchasing habits.
  • Worked on gathering and refining requirements, interviewing business users to understand and document data requirements including elements, entities, and relationships, in addition to visualization and report specifications.
  • Defined UDFs using Pig and Hive to capture customer behavior.
  • Designed and implemented MapReduce jobs to support distributed processing using Java, Hive, Spark SQL, Apache Pig, and Oozie.
  • Integrated Apache Kafka for data ingestion.
  • Created Hive external tables (using Scala) on the MapReduce output, with partitioning and bucketing applied on top of them.
  • Provided shell scripting and Pivotal graphs to show trends.
  • Maintained data import scripts using HBase, Hive, and MapReduce jobs.
  • Developed and maintained several batch jobs that run automatically depending on business requirements.
  • Imported and exported data between environments such as MySQL and HDFS; performed unit testing and deployment, and monitored the performance of the solution for internal usage.
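
The batch log processing above follows the standard MapReduce pattern sketched below. The Apache-style access-log layout and the choice of counting requests per HTTP status code are assumptions made for illustration, not details taken from the project.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class LogStatusCount {
        // Mapper: emit (status code, 1) for each access-log line.
        public static class StatusMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
            private static final LongWritable ONE = new LongWritable(1);
            @Override
            protected void map(LongWritable key, Text value, Context ctx)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split(" ");
                if (fields.length > 8) {                     // assumes Apache combined log format
                    ctx.write(new Text(fields[8]), ONE);     // field 8 holds the HTTP status code
                }
            }
        }
        // Reducer: sum the counts per status code.
        public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
            @Override
            protected void reduce(Text key, Iterable<LongWritable> values, Context ctx)
                    throws IOException, InterruptedException {
                long sum = 0;
                for (LongWritable v : values) {
                    sum += v.get();
                }
                ctx.write(key, new LongWritable(sum));
            }
        }
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "log-status-count");
            job.setJarByClass(LogStatusCount.class);
            job.setMapperClass(StatusMapper.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(LongWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }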

Environment: Apache Hadoop, Cloudera Distribution, RHEL, Hive, HBase, Pig, HDFS, Java MapReduce, Core Java, Python, Maven, Git, Jenkins, UNIX, MySQL, Eclipse, Oozie, Sqoop, Flume, Oracle, and Teradata.

Confidential, Provo, Utah

Hadoop Developer

Responsibilities:

  • Developed a detailed understanding of the existing build system, related tools, and information on various products, releases, and test results.
  • Designed and implemented MapReduce jobs to support distributed processing using Java, Hive, and Apache Pig.
  • Developed UDFs to provide custom Hive and Pig capabilities using SOAP/RESTful services.
  • Built a mechanism in Talend for automatically moving the existing proprietary binary-format data files to HDFS using a service called the Ingestion service.
  • Implemented a prototype to integrate PDF documents into a web application using GitHub.
  • Comprehensive knowledge and experience in process improvement, normalization/de-normalization, data extraction, data cleansing, and data manipulation in a Scrum setting.
  • Performed data transformations in Scala and Hive, and used partitions and buckets for performance improvements.
  • Wrote custom InputFormat and RecordReader classes for reading and processing the binary format in MapReduce.
  • Wrote custom Writable classes for Hadoop serialization and deserialization of time-series tuples (a minimal sketch follows this list).
  • Implemented a custom file loader for Pig so that large data files such as build logs can be queried directly.
  • Used Python for pattern matching in build logs to format errors and warnings.
  • Developed Pig Latin scripts and shell scripts for validating the different query modes in Historian.
  • Created Hive external tables on the MapReduce output, with partitioning and bucketing applied on top of them.
  • Improved performance by tuning Scala, Hive, and MapReduce, using Talend, ActiveMQ, and JBoss.
  • Developed a daily test engine in Python for continuous testing.
  • Used shell scripting for Jenkins job automation with Talend.
  • Built a custom calculation engine that can be programmed according to user needs.
  • Ingested data into Hadoop using shell scripting and Sqoop, and applied data transformations using Pig and Hive.
  • Handled performance-improvement changes to the Pre-Ingestion service, which is responsible for generating Big Data Format binary files from an older version of Historian.
  • Worked with support teams and resolved operational and performance issues.
  • Researched, evaluated, and utilized new technologies, tools, and frameworks around the Hadoop ecosystem.
  • Prepared graphs from test results posted to MIA.
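
The custom Writable classes mentioned above implement Hadoop's Writable interface. The sketch below serializes a hypothetical time-series tuple of (timestamp, tag id, value); the actual Historian tuple layout differed and is not reproduced here.

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.Writable;

    // Hypothetical time-series tuple: (timestamp, tag id, measured value).
    public class TimeSeriesTupleWritable implements Writable {
        private long timestamp;
        private int tagId;
        private double value;

        public TimeSeriesTupleWritable() { }                      // no-arg constructor required by Hadoop

        public TimeSeriesTupleWritable(long timestamp, int tagId, double value) {
            this.timestamp = timestamp;
            this.tagId = tagId;
            this.value = value;
        }

        @Override
        public void write(DataOutput out) throws IOException {    // serialization
            out.writeLong(timestamp);
            out.writeInt(tagId);
            out.writeDouble(value);
        }

        @Override
        public void readFields(DataInput in) throws IOException { // deserialization
            timestamp = in.readLong();
            tagId = in.readInt();
            value = in.readDouble();
        }

        public long getTimestamp() { return timestamp; }
        public int getTagId()      { return tagId; }
        public double getValue()   { return value; }
    }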

Environment: Apache Hadoop, Hive, Scala, Pig, HDFS, Cloudera, Java MapReduce, Core Java, Python, Maven, Git, Jenkins, UNIX, MySQL, Eclipse, Oozie, Sqoop, Flume, Oracle, and CDH 4.x.

Confidential

Hadoop Developer

Responsibilities:

  • Worked on analyzing the Hadoop cluster using different Big Data analytic tools including Hive, Pig, and MapReduce.
  • Installed and configured the Hadoop cluster using Cloudera's CDH distribution and monitored cluster performance using Cloudera Manager.
  • Monitored workload, job performance, and capacity planning using Cloudera Manager.
  • Implemented schedulers on the JobTracker to share cluster resources among the MapReduce jobs submitted to the cluster.
  • Involved in designing and developing the Hive data model, loading it with data, and writing Java UDFs for Hive.
  • Handled importing and exporting data into HDFS by developing solutions, analyzed the data using MapReduce and Hive, and produced summary results from Hadoop for downstream systems.
  • Used Sqoop to import and export data between the Hadoop Distributed File System (HDFS) and RDBMS.
  • Created Hive tables and loaded data from HDFS into the Hive tables as per the requirement.
  • Built custom MapReduce programs to analyze data and used HiveQL queries for data cleansing.
  • Created components such as Hive UDFs for missing functionality in Hive to analyze and process large volumes of data extracted from the NoSQL database Cassandra.
  • Collected and aggregated substantial amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.
  • Worked on optimizing MapReduce algorithms using combiners and partitioners to deliver the best results, and worked on application performance optimization for the HDFS cluster (a minimal sketch follows this list).
  • Comprehensive knowledge and experience in process improvement, normalization/de-normalization, data extraction, data cleansing, and data manipulation in a Scrum setting.
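
The combiner and partitioner optimization mentioned above usually pairs a reducer-as-combiner with a custom Partitioner. The sketch below shows a hypothetical partitioner that routes keys to reducers by their first character; the actual partitioning logic used on the project is not reproduced here.

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    // Hypothetical partitioner: groups keys by their first character so that
    // related keys are processed by the same reducer.
    public class FirstCharPartitioner extends Partitioner<Text, LongWritable> {
        @Override
        public int getPartition(Text key, LongWritable value, int numPartitions) {
            if (key.getLength() == 0) {
                return 0;
            }
            return (key.charAt(0) & Integer.MAX_VALUE) % numPartitions;
        }
    }

In the driver, such a class is wired in with job.setPartitionerClass(FirstCharPartitioner.class), typically alongside job.setCombinerClass(...) pointing at a reducer that is safe to run on map-side partial results.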

Environment: Cloudera Distribution (CDH), HDFS, Pig, Hive, MapReduce, Sqoop, HBase, Impala, Java, SQL, Cassandra.

Confidential

Big Data/Hadoop Developer

Responsibilities:

  • Involved in the analysis, design, development, and testing phases of the Software Development Life Cycle (SDLC).
  • Analyzed, designed, and developed an application based on J2EE using Struts with Tiles, Spring 2.0, and Hibernate 3.0.
  • Developed services to run MapReduce jobs as per the daily requirement.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries.
  • Involved in optimizing Hive queries and joins to get better results for Hive ad-hoc queries.
  • Used Pig to perform data transformations, event joins, filtering, and some pre-aggregations before storing the data in HDFS.
  • Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.
  • Hands-on experience with NoSQL databases such as HBase for a proof of concept (POC) storing URLs, images, and product and supplement information in real time.
  • Developed an integrated dashboard to perform CRUD operations on HBase data using the Thrift API.
  • Implemented an error-notification module for the support team using HBase coprocessors (observers).
  • Configured and integrated Flume sources, channels, and destinations (sinks) to analyze log data in HDFS.
  • Implemented custom Flume interceptors to perform cleansing operations before moving data onto HDFS (a minimal sketch follows this list).
  • Involved in troubleshooting errors in Shell, Hive, and MapReduce.
  • Worked on debugging and performance tuning of Hive and Pig jobs.
  • Developed Oozie workflows that are scheduled monthly.
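
The custom Flume interceptors mentioned above implement Flume's Interceptor interface. The sketch below shows a hypothetical cleansing interceptor that trims whitespace and drops empty log lines before events reach the HDFS sink; the actual cleansing rules applied on the project are not reproduced here.

    import java.nio.charset.StandardCharsets;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.flume.Context;
    import org.apache.flume.Event;
    import org.apache.flume.interceptor.Interceptor;

    // Hypothetical cleansing interceptor: trims each event body and drops empty lines.
    public class CleansingInterceptor implements Interceptor {

        @Override
        public void initialize() { }

        @Override
        public Event intercept(Event event) {
            String body = new String(event.getBody(), StandardCharsets.UTF_8).trim();
            if (body.isEmpty()) {
                return null;                                   // returning null discards the event
            }
            event.setBody(body.getBytes(StandardCharsets.UTF_8));
            return event;
        }

        @Override
        public List<Event> intercept(List<Event> events) {
            List<Event> cleansed = new ArrayList<Event>();
            for (Event e : events) {
                Event out = intercept(e);
                if (out != null) {
                    cleansed.add(out);
                }
            }
            return cleansed;
        }

        @Override
        public void close() { }

        // Flume instantiates interceptors through a Builder named in the agent configuration.
        public static class Builder implements Interceptor.Builder {
            @Override
            public Interceptor build() { return new CleansingInterceptor(); }

            @Override
            public void configure(Context context) { }
        }
    }

The interceptor is attached to a source in the agent configuration by pointing the interceptor type at the Builder class (for example, agent.sources.src1.interceptors.i1.type = com.example.CleansingInterceptor$Builder).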

Environment: MapReduce, HDFS, HBase, HDP (Hortonworks), Sqoop, Data Processing Layer, Hue, Azure, Erwin, MS Visio, Tableau, SQL, MongoDB, Oozie, UNIX, MySQL, RDBMS, Ambari, SolrCloud, PL/SQL, TOAD, Java.

Confidential

Java Developer

Responsibilities:

  • Involved in coding, designing, documenting, debugging and maintenance of several applications.
  • Involved in creating SQL tables and indexes and in writing queries to read and manipulate data.
  • Used JDBC to establish the connection between the database and the application (a minimal sketch follows this list).
  • Created the user interface using HTML, CSS and JavaScript.
  • Maintenance and support of the existing applications.
  • Responsible for the development of database SQL queries.
  • Created/modified shell scripts for scheduling and automating tasks.
  • Wrote unit test cases using the JUnit framework.
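
The JDBC usage referenced above followed the standard DriverManager pattern. The sketch below is a minimal example; the connection URL, credentials, and the customers table are placeholders rather than values from the actual application.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    // Minimal JDBC read sketch; all connection details and table/column names are assumptions.
    public class CustomerQuerySketch {
        private static final String URL = "jdbc:oracle:thin:@//dbhost:1521/ORCL"; // hypothetical
        private static final String USER = "app_user";                            // hypothetical
        private static final String PASSWORD = "changeit";                        // hypothetical

        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection(URL, USER, PASSWORD);
            try {
                PreparedStatement ps = conn.prepareStatement(
                        "SELECT customer_id, name FROM customers WHERE status = ?");
                ps.setString(1, "ACTIVE");
                ResultSet rs = ps.executeQuery();
                while (rs.next()) {
                    System.out.println(rs.getLong("customer_id") + " " + rs.getString("name"));
                }
                rs.close();
                ps.close();
            } finally {
                conn.close();
            }
        }
    }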

Environment: Java, J2EE, Servlets, JSP, SQL, PL/SQL, HTML, JavaScript, CSS, Eclipse, Oracle, MySQL, IBM WebSphere, JIRA.
