Hadoop/Big Data Developer Resume
Atlanta, Georgia
SUMMARY:
- Over 7 years of proactive IT experience in analysis, design, development, implementation, and testing of software applications, including 4+ years of experience in Big Data and in the development and design of Java-based enterprise applications.
- Strong skills in developing applications involving Big Data technologies such as Hadoop, MapReduce, YARN, Flume, Hive, Pig, Sqoop, HBase, Pivotal, Cloudera, MapR, Avro, Spark, and Scala.
- Extensively worked on major components of the Hadoop ecosystem such as HDFS, HBase, Hive, Sqoop, Pig, and MapReduce.
- Developed various scripts and numerous batch jobs to schedule Hadoop programs.
- Experience in analyzing data using HiveQL and custom MapReduce programs in Java.
- Hands-on experience importing and exporting data between databases like Oracle and MySQL and HDFS/Hive using Sqoop.
- Implemented Flume for collecting, aggregating, and moving large volumes of server logs and streaming data into HDFS.
- Hands-on experience in Spark, Scala, and MarkLogic.
- Extensively used MapReduce design patterns to solve complex data processing problems.
- Developed Hive queries for data analysis to meet the business requirements.
- Experience in extending Hive and Pig core functionality by writing custom UDFs, UDAFs, and UDTFs (see the sketch after this summary).
- Experienced in implementing security mechanisms for Hive data.
- Extensively used ETL methodology for data profiling, data migration, extraction, transformation, and loading using Talend/SSIS, and designed data conversions from a wide variety of source systems.
- Extensively created mappings in Talend using tMap, tJoin, tReplicate, tParallelize, tJava, tJavaRow, tDie, tAggregateRow, tWarn, tLogCatcher, tMysqlSCD, tFilter, tGlobalmap, etc.
- Experience with Hive query performance tuning.
- Experienced in improving data cleansing processes using Pig Latin operators, transformations, and join operations.
- Extensive knowledge of NoSQL databases like HBase.
- Experienced in performing CRUD operations using the HBase Java client API and REST API.
- Experience in designing both time-driven and data-driven automated workflows using Bedrock and Talend.
- Good knowledge of creating PL/SQL stored procedures, indexes, packages, functions, triggers, and cursors with Oracle (9i, 10g, 11g) and MySQL Server.
- Expert in designing and writing on-demand UNIX shell scripts.
- Extensively worked with Object-Oriented Analysis and Design (OOAD) and software development using UML methodology; good knowledge of J2EE design patterns and core Java design patterns.
- Excellent Java development skills using J2EE Frameworks like Struts, EJBs and Web Services.
- Proficient in development methodologies such as Scrum, Agile, and Waterfall.
- Passion to excel in any assignment, with good debugging and problem-solving skills. Ability to work under high pressure and tight deadlines.
- Excellent adaptability, ability to learn, and good analytical and programming skills.
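Below is a minimal, illustrative sketch of the kind of custom Hive UDF mentioned above; the class name, masking rule, and column semantics are assumptions for demonstration, not code from a specific engagement.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical Hive UDF: replaces everything except the last four characters
// of a string with "****", e.g. for masking identifiers in query output.
public class MaskValue extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        String value = input.toString();
        if (value.length() <= 4) {
            return new Text(value);
        }
        return new Text("****" + value.substring(value.length() - 4));
    }
}
```

Once compiled into a JAR, a UDF like this is added to the Hive session and registered with CREATE TEMPORARY FUNCTION, after which it can be called inline like any built-in function.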
TECHNICAL SKILLS:
Hadoop/Big Data Technologies: HDFS, MapReduce, HBase, Hive, Pig, Impala, Sqoop, Flume, Oozie, Spark, Spark SQL, ZooKeeper, AWS, Cloudera, Hortonworks, Kafka, Avro, BigQuery.
Languages: Core Java, XML, HTML and HiveQL.
J2EE Technologies: Servlets, JSP, JMS, JSTL, AJAX, DOJO, JSON and BlazeDS.
Frameworks: Spring 2, Struts 2 and Hibernate 3.
Application & Web Services: WebSphere 6.0, JBoss 4.X and Tomcat 5.
Scripting Languages: JavaScript, AngularJS, Pig Latin, Python 2.7 and Scala.
Databases (SQL/NoSQL): Oracle 9i, SQL Server 2005, MySQL, HBase and MongoDB 2.2.
IDE: Eclipse and EditPlus.
PM Tools: MS MPP, Risk Management, ESA.
Other Tools: SVN, Apache Ant, JUnit, StarUML, TOAD, PL/SQL Developer, Perforce, JIRA, Bugzilla, Visual Source, QC, Agile Methodology.
EAI Tools: TIBCO 5.6.
Bug Tracking/Ticketing: Mercury Quality Center and ServiceNow.
Operating System: Windows 98/2000, Linux /Unix and Mac.
PROFESSIONAL EXPERIENCE:
Confidential, Atlanta, Georgia
Hadoop/Big Data Developer
Responsibilities:
- Translation of functional and technical requirements into detailed architecture and design.
- Developed automated scripts for all jobs, starting from pulling data from mainframes into HDFS.
- Designed and developed ETL applications and automated them using Oozie workflows and shell scripts with error handling and a mailing system.
- Implemented a nine-node CDH4 Hadoop cluster on Ubuntu Linux.
- Implemented MapReduce programs joining data sets from different sources.
- Optimized MapReduce programs by tuning MapReduce configuration parameters and implementing optimized joins.
- Implemented MapReduce solutions such as top-K, summarization, and data partitioning using MapReduce design patterns (see the sketch after this project).
- Implemented MapReduce programs to handle different file formats like XML, Avro, and sequence files, and applied compression techniques.
- Developed Hive queries according to business requirements.
- Designed and created Hive internal tables and partitions to store structured data.
- Developed custom Hive UDFs to incorporate business logic into Hive queries.
- Used Hive-optimized file formats like ORC and Parquet.
- Implemented Hive serializers/deserializers (SerDes) to handle Avro files and used XPath expressions to handle XML files.
- Imported and exported data from RDBMS through Sqoop.
- Designed Cassandra data models for near-real-time analysis.
- Configured Cassandra clusters, vnodes, and replication strategies, and built the data model using DataStax Community edition.
- Ensured NFS was configured for the NameNode.
- Designed and implemented time-series data analysis using the Cassandra file system.
- Implemented CRUD operations on Cassandra data using CQL and the REST API.
- Implemented import/export of structured data using Sqoop import/export options.
- Implemented Sqoop saved jobs and incremental imports to load data.
- Used compression techniques (Snappy) with file formats to make efficient use of storage in HDFS.
- Used Cloudera Manager to monitor the cluster, debug MapReduce jobs, and handle job submission on the cluster.
- Successfully migrated a legacy application to a Big Data application using Hive/Pig at the production level.
Environment: Hadoop, HDFS, MapReduce, Hive, Sqoop, Oozie, Cloudera, Pig, Java (JDK 1.6), Eclipse, MySQL, Ubuntu, ZooKeeper.
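As a companion to the top-K bullet above, here is a minimal sketch of the mapper side of the classic top-K MapReduce design pattern; the record layout (tab-separated id and score) and the choice of K = 10 are assumptions for illustration.

```java
import java.io.IOException;
import java.util.TreeMap;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical top-K mapper: each map task keeps only its local top 10 records
// (by the numeric score in the second field) and emits them in cleanup(); a single
// reducer then merges the small per-mapper lists into the global top 10.
public class TopKMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
    private final TreeMap<Long, Text> topRecords = new TreeMap<>();

    @Override
    protected void map(LongWritable key, Text value, Context context) {
        String[] fields = value.toString().split("\t");    // assumed layout: id <TAB> score
        long score = Long.parseLong(fields[1]);
        topRecords.put(score, new Text(value));             // ties on score overwrite in this sketch
        if (topRecords.size() > 10) {
            topRecords.remove(topRecords.firstKey());        // drop the current smallest score
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        for (Text record : topRecords.values()) {
            context.write(NullWritable.get(), record);
        }
    }
}
```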
Confidential, New York
Hadoop/Big Data Developer
Responsibilities:
- Involved in all phases of development activities from requirements collection to production support.
- Developed a detailed understanding of the current system and identified the different sources of data for EMR.
- Involved in cluster setup.
- Performed batch processing of logs from various data sources using MapReduce.
- Automated Cloudera job submission via Jenkins scripts and Chef.
- Built predictive analytics to monitor inventory levels and ensure product availability.
- Analyzed customers' purchasing behaviors in JMS.
- Delivered value-added services based on clients' profiles and purchasing habits.
- Worked on gathering and refining requirements, interviewing business users to understand and document data requirements including elements, entities, and relationships, in addition to visualization and report specifications.
- Defined UDFs using Pig and Hive in order to capture customer behavior.
- Designed and implemented MapReduce jobs to support distributed processing using Java, Hive, Spark SQL, Apache Pig, and Oozie.
- Integrated Apache Kafka for data ingestion (see the consumer sketch after this project).
- Created Hive external tables on the MapReduce output, with partitioning and bucketing applied on top of them.
- Provided shell-scripted Pivotal graphs in order to show trends.
- Maintained data import scripts using HBase, Hive, and MapReduce jobs.
- Developed and maintained several batch jobs to run automatically depending on business requirements.
- Imported and exported data between environments such as MySQL and HDFS; performed unit testing and deployment for internal use, monitoring the performance of the solution.
Environment: Apache Hadoop, Cloudera, RHEL, Hive, HBase, Pig, HDFS, Java MapReduce, Core Java, Python, Maven, Git, Jenkins, UNIX, MySQL, Eclipse, Oozie, Sqoop, Flume, Cloudera Distribution, Oracle, and Teradata.
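A minimal sketch of the kind of Kafka-based ingestion mentioned in this project, written against the current Kafka Java consumer API; the broker address, topic, and group id are placeholders, and the real pipeline would write the records to HDFS rather than stdout.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Hypothetical consumer: reads purchase events from a Kafka topic so they can be
// landed in HDFS for downstream Hive/Pig jobs. All names below are placeholders.
public class PurchaseEventConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        props.put("group.id", "purchase-ingestion");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("purchase-events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // In the real pipeline this would be written to HDFS (e.g. via Flume or a file writer).
                    System.out.printf("offset=%d key=%s value=%s%n",
                            record.offset(), record.key(), record.value());
                }
            }
        }
    }
}
```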
Confidential, Provo, Utah
Hadoop Developer
Responsibilities:
- Developed a detailed understanding of the existing build system and related tools, and of the information about various products, releases, and test results.
- Designed and implemented MapReduce jobs to support distributed processing using Java, Hive, and Apache Pig.
- Developed UDFs to provide custom Hive and Pig capabilities using SOAP/RESTful services.
- Built a mechanism in Talend for automatically moving the existing proprietary binary-format data files to HDFS using a service called the Ingestion service.
- Implemented a prototype to integrate PDF documents into a web application using GitHub.
- Comprehensive knowledge and experience in process improvement, normalization/de-normalization, data extraction, data cleansing, and data manipulation.
- Performed data transformations in Scala and Hive and used partitions and buckets for performance improvements.
- Wrote custom InputFormat and RecordReader classes for reading and processing the binary format in MapReduce.
- Wrote custom Writable classes for Hadoop serialization and deserialization of time-series tuples (see the sketch after this project).
- Implemented a custom file loader for Pig so we could query directly against large data files such as build logs.
- Used Python for pattern matching in build logs to format errors and warnings.
- Developed Pig Latin scripts and shell scripts for validating the different query modes in Historian.
- Created Hive external tables on the MapReduce output, with partitioning and bucketing applied on top of them.
- Improved performance by tuning Scala, Hive, and MapReduce jobs using Talend, ActiveMQ, and JBoss.
- Developed a daily test engine using Python for continuous tests.
- Used shell scripting for Jenkins job automation with Talend.
- Built a custom calculation engine that can be programmed according to user needs.
- Ingested data into Hadoop using shell scripting and Sqoop and applied data transformations using Pig and Hive.
- Handled performance-improvement changes to the Pre-Ingestion service, which is responsible for generating the Big Data-format binary files from an older version of Historian.
- Worked with support teams and resolved operational and performance issues.
- Researched, evaluated, and utilized new technologies, tools, and frameworks around the Hadoop ecosystem.
- Prepared graphs from test results posted to MIA.
Environment: Apache Hadoop, Hive, Scala, Pig, HDFS, Cloudera, Java MapReduce, Core Java, Python, Maven, Git, Jenkins, UNIX, MySQL, Eclipse, Oozie, Sqoop, Flume, Oracle, and CDH4.x.
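The custom Writable bullet above refers to Hadoop's serialization interface; the following sketch shows a hypothetical time-series tuple (timestamp, tag, value), with field names assumed purely for illustration.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

// Hypothetical custom Writable for a time-series tuple, illustrating Hadoop
// serialization/deserialization of binary records such as Historian samples.
public class TimeSeriesTupleWritable implements Writable {
    private long timestamp;
    private String tagName;
    private double value;

    public TimeSeriesTupleWritable() { }   // no-arg constructor required by Hadoop

    public TimeSeriesTupleWritable(long timestamp, String tagName, double value) {
        this.timestamp = timestamp;
        this.tagName = tagName;
        this.value = value;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(timestamp);
        out.writeUTF(tagName);
        out.writeDouble(value);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        timestamp = in.readLong();
        tagName = in.readUTF();
        value = in.readDouble();
    }

    @Override
    public String toString() {
        return timestamp + "\t" + tagName + "\t" + value;
    }
}
```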
Confidential
Hadoop Developer
Responsibilities:
- Worked on analyzing the Hadoop cluster using different Big Data analytic tools including Hive, Pig, and MapReduce.
- Installed and configured the Hadoop cluster using Cloudera's CDH distribution and monitored cluster performance using Cloudera Manager.
- Monitored workload, job performance, and capacity planning using Cloudera Manager.
- Implemented schedulers on the JobTracker to share the cluster's resources among the MapReduce jobs submitted to the cluster.
- Involved in designing and developing the Hive data model, loading it with data, and writing Java UDFs for Hive.
- Handled importing and exporting data into HDFS by developing solutions, analyzed the data using MapReduce and Hive, and produced summary results from Hadoop for downstream systems.
- Used Sqoop to import and export the data from Hadoop Distributed File System (HDFS) to RDBMS.
- Created Hive tables and loaded data from HDFS into Hive tables as per the requirement.
- Built custom MapReduce programs to analyze data and used HiveQL queries for data cleansing.
- Created components such as Hive UDFs for functionality missing in Hive to analyze and process large volumes of data extracted from the NoSQL database Cassandra.
- Collected and aggregated substantial amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Worked on optimization of MapReduce algorithms using combiners and partitioners to deliver the best results, and worked on application performance optimization for the HDFS cluster (see the partitioner sketch after this project).
- Comprehensive knowledge and experience in process improvement, normalization/de-normalization, data extraction, data cleansing, and data manipulation.
Environment: Cloudera Distribution (CDH), HDFS, Pig, Hive, MapReduce, Sqoop, HBase, Impala, Java, SQL, Cassandra.
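For the combiner/partitioner bullet above, the combiner is typically just the sum reducer reused via job.setCombinerClass(...); the sketch below shows a hypothetical custom partitioner, assuming keys of the form "region|eventId", so that all records for one region land on the same reducer.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Hypothetical partitioner: routes records to reducers by the region prefix of the
// key so that each region's counts are aggregated in a single output partition.
public class RegionPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        String region = key.toString().split("\\|")[0];          // assumed key layout: region|eventId
        return (region.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```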
Confidential
Big Data/Hadoop Developer
Responsibilities:
- Involved in the analysis, design, development, and testing phases of the Software Development Life Cycle (SDLC).
- Analysis, design, and development of an application based on J2EE using Struts with Tiles, Spring 2.0, and Hibernate 3.0.
- Developed the services to run MapReduce jobs as per the daily requirement.
- Involved in creating Hive tables, loading them with data, and writing Hive queries.
- Involved in optimizing Hive queries and joins to get better results for ad hoc Hive queries.
- Used Pig to perform data transformations, event joins, filter and some pre-aggregations before storing the data into HDFS.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Hands-on experience with NoSQL databases like HBase for a proof of concept (POC) storing URLs, images, and product and supplement information in real time.
- Developed an integrated dashboard to perform CRUD operations on HBase data using the Thrift API (a Java-client CRUD sketch follows this project).
- Implemented an error notification module for the support team using HBase coprocessors (observers).
- Configured and integrated Flume sources, channels, and sinks to analyze log data in HDFS.
- Implemented custom Flume interceptors to perform cleansing operations before moving data into HDFS.
- Involved in troubleshooting errors in Shell, Hive, and MapReduce.
- Worked on debugging, performance tuning of Hive & Pig Jobs.
- Developed Oozie workflows which are scheduled monthly.
Environment: MapReduce, HDFS, HBase, Hortonworks HDP, Sqoop, Data Processing Layer, HUE, Azure, Erwin, MS Visio, Tableau, SQL, MongoDB, Oozie, UNIX, MySQL, RDBMS, Ambari, SolrCloud, PL/SQL, TOAD, Java.
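The dashboard above used the Thrift API; as a reference point, here is a minimal CRUD round-trip using the HBase Java client API instead, with table, column family, and row key names chosen purely for illustration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical CRUD round-trip against a "products" table; names are placeholders.
public class ProductCrudExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("products"))) {

            // Create/update: write the product URL into the "info" column family.
            Put put = new Put(Bytes.toBytes("product-123"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("url"),
                    Bytes.toBytes("http://example.com/p/123"));
            table.put(put);

            // Read: fetch the same cell back.
            Result result = table.get(new Get(Bytes.toBytes("product-123")));
            byte[] url = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("url"));
            System.out.println("url = " + Bytes.toString(url));

            // Delete: remove the row.
            table.delete(new Delete(Bytes.toBytes("product-123")));
        }
    }
}
```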
Confidential
Java Developer
Responsibilities:
- Involved in coding, designing, documenting, debugging and maintenance of several applications.
- Involved in creating SQL tables and indexes and in writing queries to read and manipulate data.
- Used JDBC to establish connections between the database and the application (see the sketch after this project).
- Created the user interface using HTML, CSS and JavaScript.
- Maintenance and support of the existing applications.
- Responsible for the development of database SQL queries.
- Created/modified shell scripts for scheduling and automating tasks.
- Wrote unit test cases using the JUnit framework.
Environment: Java, J2EE, Servlets, JSP, SQL, PL/SQL, HTML, JavaScript, CSS, Eclipse, Oracle, MYSQL, IBM WebSphere, JIRA.
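A minimal sketch of the JDBC usage described in this role; the JDBC URL, credentials, and table/column names are placeholders, not details from the actual application.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Hypothetical JDBC lookup: opens a connection, runs a parameterized query,
// and prints the results, illustrating the database connectivity described above.
public class CustomerLookup {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:oracle:thin:@//dbhost:1521/ORCL";   // placeholder connection string
        try (Connection conn = DriverManager.getConnection(url, "app_user", "secret");
             PreparedStatement stmt = conn.prepareStatement(
                     "SELECT name FROM customers WHERE customer_id = ?")) {
            stmt.setLong(1, 42L);
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("name"));
                }
            }
        }
    }
}
```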