Hadoop Developer/ Big Data Analyst Resume
Basking Ridge, NJ
PROFESSIONAL SUMMARY:
- 4 years of hands-on experience with Hadoop, HDFS, MapReduce, and the Hadoop ecosystem.
- Good experience in system monitoring, development, and support activities for Hadoop and Java/J2EE technologies.
- Highly experienced Big Data Engineer with a deep understanding of the Hadoop Distributed File System and its ecosystem (HDFS, MapReduce, Hive, Sqoop, Oozie, ZooKeeper, HBase, Flume, Pig, Apache Kafka), gained in a range of industries including the retail and communications sectors.
- Experience importing and exporting data between RDBMSs and HDFS, Hive tables, and HBase using Sqoop.
- Experience ingesting streaming data into HDFS using Flume sources, interceptors, and sinks.
- Experienced in implementing complex analytical algorithms using MapReduce design patterns.
- Good expertise working with different varieties of data, including semi-structured and unstructured data, using MapReduce programs.
- Experienced in optimization techniques for the sort and shuffle phases of MapReduce programs, and implemented optimized joins that combine data from different data sources.
- Hands-on experience performing analytics on structured data in Hive using HiveQL queries, views, partitioning, bucketing, and UDFs.
- Experience in performance tuning of Hive queries and Java MapReduce programs for scalability and faster execution.
- Hands-on experience writing MapReduce jobs on the Hadoop ecosystem using Pig Latin and creating Pig scripts to carry out essential data operations and tasks.
- Worked with different file formats such as JSON, XML, Avro data files, and text files.
- Used Pig Latin scripts, join operations, and custom user-defined functions (UDFs) to perform ETL operations.
- Excellent understanding and knowledge of NoSQL databases such as HBase and Cassandra.
- Hands-on experience creating Apache Spark RDD transformations on datasets in the Hadoop data lake (see the sketch following this summary).
- Used Apache Oozie to combine multiple MapReduce, Hive, Pig, and Sqoop jobs into one logical unit of work.
- Experience working with Hadoop in standalone, pseudo-distributed, and fully distributed modes.
- Good knowledge of cloud computing with Amazon Web Services such as EC2 and S3, which provide fast and efficient processing of big data.
- Experienced in using compression codecs such as LZO and Snappy to save storage space and optimize data transfer over the network.
- Thorough understanding of the Software Development Life Cycle (SDLC), Software Test Life Cycle (STLC), and processes across multiple environments and platforms.
- Hands-on experience in database design, using PL/SQL to write stored procedures, functions, and triggers, and strong experience writing complex queries in Oracle, DB2, SQL Server, and MySQL.
- Experienced in database upgrades, migration, and patching.
- Well experienced in proactive monitoring of database health.
- Created required tablespaces, users, and roles, and granted/revoked privileges to users.
- Experience in performance tuning; tuned various large, critical OLAP and OLTP databases of terabyte size using tools such as STATSPACK, AWR, SQL Trace, TKPROF, ADDM, and explain-plan methods.
- Extensive knowledge of database backup and recovery concepts using RMAN utilities (scheduling full/incremental/cumulative backups and hot/cold backups, and writing/editing UNIX shell scripts for RMAN).
- Ability to transform complex business requirements into technical specifications.
- Used Maven extensively for building JAR files for MapReduce programs and J2EE applications.
- Used Agile methodology to work with IT and business teams to drive efficient system development.
- Resourceful and creative, with high adaptability to change; enjoy new challenges and learn new skills quickly.
- Experienced in all facets of the Software Development Life Cycle (analysis, design, development, testing, and maintenance) using Waterfall and Agile methodologies.
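Below is a minimal sketch of the kind of Spark RDD transformation referenced above. It assumes a hypothetical delimited feed in the data lake (the path, delimiter, header prefix, and field positions are illustrative) and uses the Spark 1.x Java API with Java 8 lambdas:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class RddTransformationSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("rdd-transformation-sketch");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Hypothetical pipe-delimited feed landed in the data lake
        JavaRDD<String> lines = sc.textFile("hdfs:///data/lake/feed/part-*");

        // Filter out blank/header records, then aggregate record counts per key
        JavaPairRDD<String, Long> countsByKey = lines
                .filter(line -> !line.isEmpty() && !line.startsWith("id|"))
                .mapToPair(line -> {
                    String[] fields = line.split("\\|");
                    return new Tuple2<>(fields[0], 1L);   // fields[0] assumed to be the key column
                })
                .reduceByKey((a, b) -> a + b);

        countsByKey.saveAsTextFile("hdfs:///data/lake/feed_counts");
        sc.stop();
    }
}
```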
TECHNICAL EXPERTISE:
Big Data/Hadoop: HDFS, Hadoop MapReduce, ZooKeeper, Hive, Pig, Sqoop, Flume, Oozie, Spark, Apache Kafka, Apache Solr
Cloud Computing: Amazon Web Services
Methodologies: Agile, Waterfall model
Java/J2EE Technologies: J2SE, J2EE, JDBC, JSP, Servlets, Design Patterns, MVC, SOA
Frameworks: Hibernate, Spring MVC, Struts, jQuery, CSS3, Bootstrap, Ant, Maven, JSON, JUnit, Log4j, MRUnit
Database: Oracle (SQL & PL/SQL), MySQL, HBase, Cassandra, MongoDB
Servers: Tomcat, JBoss
Version Control: SVN, AccuRev, Git
IDE: Eclipse, NetBeans
XML Related and Others: XML, DTD, XSD, XSLT, JAXB, JAXP, CSS, AJAX, JavaScript
PROFESSIONAL EXPERIENCE:
Confidential, Basking Ridge, NJ
Hadoop Developer/ Big Data Analyst
Responsibilities:
- Involved in the complete SDLC of a big data project, including requirement analysis, design, coding, testing, and production.
- Extensively used Sqoop to import/export data between RDBMS and Hive tables, performed incremental imports, and created Sqoop jobs based on the last saved value.
- Involved in implementing the data preparation solution, which is responsible for data transformation as well as handling user stories.
- Developed and tested data ingestion, preparation, and dispatch jobs.
- Worked on migrating existing mainframe data and reporting feeds to Hadoop.
- Involved in setting up IBM CDC to capture changes on the mainframe.
- Developed Pig scripts to read CDC files and ingest them into HBase.
- Worked on HBase table setup and shell scripts to automate the ingestion process.
- Created Hive external tables on top of HBase, which were used for feed generation.
- Scheduled automated runs in Talend.
- Worked on migrating an existing feed from Hive to Spark; to reduce feed latency, the existing HQL was transformed to run using Spark SQL and HiveContext.
- Worked on log monitoring using Splunk; set up Splunk forwarders and built Splunk dashboards.
- Prepared Pig scripts to build denormalized JSON documents, which were then loaded into Elasticsearch.
- Worked with Spark SQL for processing data in the Hive tables.
- Wrote Pig Latin scripts to perform ETL transformations per the use case requirements.
- Created dispatcher jobs using Sqoop export to load the data into Teradata target tables.
- Involved in indexing the files using Solr to remove duplicates in Type 1 insert jobs.
- Implemented a new Pig-based approach for SCD Type 1 jobs using Pig Latin scripts.
- Created Hive target tables using HQL to hold the data after all the Pig ETL operations.
- Created HQL scripts to perform data validation once transformations were complete, as per the use case.
- Implemented Snappy compression on HBase tables to reclaim space in the cluster.
- Hands-on experience accessing and performing CRUD operations against HBase data (see the sketch at the end of this section).
- Integrated a SQL layer on top of HBase, using the salting feature to get the best read and write performance.
- Wrote shell scripts to automate the process by scheduling and calling them from the scheduler.
- Created Hive scripts to load and partition the historical data.
- Integrated Hadoop with Tableau and SAS analytics to provide end users with analytical reports.
- Wrote rules in Hive to predict members with various ailments and their primary care providers; reports are pushed to Elasticsearch.
- Worked on provisioning and configuring Elasticsearch nodes and creating indexes; prepared Kibana dashboards for business users.
- Collaborated closely with both the onsite and offshore teams.
- Worked closely with the application support team to deploy the developed jobs to production.
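A minimal sketch of the kind of HBase CRUD access mentioned above; the table, column family, qualifier, and row key are hypothetical, and the HBase 1.x Java client API is assumed:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseCrudSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("member_feed"))) {

            // Create/update: write one cell
            Put put = new Put(Bytes.toBytes("row-001"));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("status"), Bytes.toBytes("ACTIVE"));
            table.put(put);

            // Read: fetch the same cell back
            Result result = table.get(new Get(Bytes.toBytes("row-001")));
            String status = Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("status")));
            System.out.println("status = " + status);

            // Delete: remove the row
            table.delete(new Delete(Bytes.toBytes("row-001")));
        }
    }
}
```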
Environment: Hadoop, HDFS, MapReduce, Hive, Flume, Sqoop, Pig, Java (JDK 1.6), Eclipse, MySQL, Ubuntu, ZooKeeper, Oracle, Shell Scripting, Elasticsearch, Kibana, MapR.
Confidential, Chevy Chase, MD
Hadoop Developer
Responsibilities:
- Involved in the complete SDLC of a big data project, including requirement analysis, design, coding, testing, and production.
- Extensively used Sqoop to import/export data between RDBMS and Hive tables, performed incremental imports, and created Sqoop jobs based on the last saved value.
- Built custom MapReduce programs to analyze data and used Pig Latin to clean unwanted data.
- Installed and configured Hive and wrote Hive UDFs to implement business requirements (see the UDF sketch at the end of this section).
- Involved in creating Hive tables, loading data into them, and writing Hive queries that run as MapReduce jobs.
- Used compression codecs such as LZO and Snappy on Hive tables to save storage space and optimize data transfer over the network.
- Implemented custom Flume interceptors to filter data and defined channel selectors to multiplex the data into different sinks.
- Worked with Spark SQL for processing data in the Hive tables.
- Developed scripts and Tidal jobs to schedule an Oozie bundle (a group of coordinators) consisting of various Hadoop programs.
- Involved in writing and implementing unit test cases.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs that run independently based on time and data availability.
- Hands-on experience accessing and performing CRUD operations against HBase data using the Java API.
- Analyzed the data by running Hive queries and Pig scripts to understand user behavior.
- Implemented a POC to migrate MapReduce jobs to Spark RDD transformations using Scala.
- Developed Spark applications using Scala for easy Hadoop transitions.
- Extensively used Hive queries to retrieve data according to business requirements.
- Used Pig to analyze large datasets and wrote the results back to HBase with Pig.
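A minimal sketch of a Hive UDF of the kind mentioned above; the function name and masking rule are hypothetical, and the classic org.apache.hadoop.hive.ql.exec.UDF style is assumed:

```java
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

@Description(name = "mask_id",
        value = "_FUNC_(str) - masks all but the last four characters of an identifier")
public class MaskIdUDF extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;                      // preserve NULLs, as Hive expects
        }
        String value = input.toString();
        if (value.length() <= 4) {
            return new Text(value);
        }
        String masked = value.substring(0, value.length() - 4).replaceAll(".", "*")
                + value.substring(value.length() - 4);
        return new Text(masked);
    }
}
```

After packaging the class into a JAR, such a function would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being called from HiveQL.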
Environment: Hadoop, HDFS, MapReduce, Hive, Flume, Sqoop, Pig, MySQL, Ubuntu, ZooKeeper, CDH3/4 distribution, Java, Eclipse, Oracle, Shell Scripting.
Confidential, Houston, TX
Hadoop Developer
Responsibilities:
- Developed various Big Data workflows using custom MapReduce, Pig, Hive, Sqoop, and Flume.
- Used Flume to import logs into HDFS from log files and social media sources such as Twitter.
- Implemented complex MapReduce programs to perform map-side joins using the distributed cache (see the sketch at the end of this section).
- Developed MapReduce jobs to preprocess data using Pig.
- Created Hive tables with appropriate static and dynamic partitions for efficiency and worked on them using HiveQL.
- Used Sqoop to import data from RDBMS into Hive tables.
- Created Hive UDFs to implement business requirements.
- Used the RegEx, JSON, and Avro SerDes packaged with Hive for serialization and deserialization to parse the contents of streamed log data, and implemented custom Hive UDFs.
- Monitored workload and job performance and performed capacity planning using Cloudera Manager.
- Involved in HDFS maintenance and loading of structured and unstructured data from different sources.
- Wrote Pig scripts for advanced analytics on the data for recommendations.
- Installed the Oozie workflow engine to run multiple Pig and Hive jobs.
- Implemented helper classes that access HBase directly from Java using the Java API to perform CRUD operations.
- Integrated MapReduce with HBase to bulk-import data into HBase using MapReduce programs.
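A minimal sketch of a map-side join using the distributed cache, as referenced above; the file paths, delimiter, field layout, and join-key position are hypothetical, and the Hadoop 2.x mapreduce API is assumed:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapSideJoinSketch {

    public static class JoinMapper extends Mapper<LongWritable, Text, Text, Text> {
        private final Map<String, String> lookup = new HashMap<>();

        @Override
        protected void setup(Context context) throws IOException {
            // The small reference file was cached with a "#lookup" symlink in the driver below
            try (BufferedReader reader = new BufferedReader(new FileReader("lookup"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] parts = line.split(",", 2);    // key,description
                    lookup.put(parts[0], parts[1]);
                }
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            String joined = lookup.get(fields[0]);          // fields[0] assumed to be the join key
            if (joined != null) {
                context.write(new Text(fields[0]), new Text(value + "," + joined));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "map-side-join-sketch");
        job.setJarByClass(MapSideJoinSketch.class);
        job.setMapperClass(JoinMapper.class);
        job.setNumReduceTasks(0);                           // map-only join
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.addCacheFile(new URI("/data/reference/small_table.csv#lookup"));
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```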
Environment: Hadoop, Cloudera, MapReduce, HDFS, Hive, Pig, Linux, XML, Eclipse, SQL Server, Oracle 11g, MySQL, Flume, Oozie, HBase
Confidential
Java/ J2EE SQL Developer
Responsibilities:
- Involved in writing server-side servlets that receive requests from the client and process them by interacting with the Oracle database.
- Created SQL tables and indexes and wrote queries to update and manipulate data stored in the database.
- Used JDBC to establish connections between the application and the database (see the sketch at the end of this section).
- Used XML to map pages and classes and to transfer data universally among different data sources.
- Wrote Java servlets to control and maintain session state and handle user requests.
- Created JSP pages, including the use of JSP custom tags and other methods of JavaBean presentation, and all HTML and graphical aspects of the site's user interface.
- Developed the GUI using HTML forms and frames and validated data with JavaScript.
- Used JDBC to connect to the backend database and developed stored procedures.
- Developed code to handle web requests involving Request Handlers, Business Objects, and Data Access Objects.
- Created the user interface using HTML, CSS and JavaScript.
- Created and modified shell scripts for scheduling and automating tasks.
- Involved in unit testing and documentation, and used the JUnit framework for testing.
- Handled requests and worked in an Agile process.
- Started learning and attending sessions on open-source big data technologies such as Hadoop and NoSQL databases, as they are all data-related.
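A minimal sketch of the JDBC access pattern described above; the JDBC URL, credentials, table, and column names are hypothetical:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class OrderDao {
    // Hypothetical connection string; assumes the Oracle JDBC driver is on the classpath
    private static final String URL = "jdbc:oracle:thin:@dbhost:1521:orcl";

    public String findStatus(long orderId) throws SQLException {
        String sql = "SELECT status FROM orders WHERE order_id = ?";
        try (Connection conn = DriverManager.getConnection(URL, "app_user", "app_password");
             PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setLong(1, orderId);                       // bind the lookup key
            try (ResultSet rs = stmt.executeQuery()) {
                return rs.next() ? rs.getString("status") : null;
            }
        }
    }
}
```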
Environment: Eclipse, Servlets, JSPs, HTML, CSS, JavaScript, jQuery, SQL, JDBC
Confidential
SQL Developer and Jr Business Analyst
Responsibilities:
- Assisted the senior BA with business requirements and with preparing gap analysis and impact analysis.
- Performed data analysis using MS Excel.
- Imported SQL tables into MS Excel for data mining and for working on data off site.
- Performed data mappings from the data source to the destination.
- Worked on profiling the data to understand it and to prepare the mapping document.
- Defined business requirements and addressed the root causes of current pain points (business challenges).
- Developed a detailed Business Requirements Specification comprising use cases.
- Gained experience conducting gap analysis by identifying existing technologies, documenting enhancements to meet end-state requirements, and supporting User Acceptance Testing (UAT).
- Engaged in analyzing requirements, identifying individual logical components, and expressing the system design through UML diagrams.
- Responsible for generating accurate data on company performance, project expenses, user behavior, and employment rates.
- Involved in the UAT process before rolling out to production.
- Controlled user access to resources and information.
- Managed security by allocating appropriate privileges to users upon request.
- Monitored cold and hot backups of about 30 databases.
- Involved in upgrading Oracle databases from Oracle 9i to Oracle 10g.
- Altered tablespace sizes and resized data files.
- Used Oracle Export & Import utilities for refreshing schemas and tables.
Environment: Oracle 9i, Oracle 10g, SQL Server 2005.