Hadoop Developer Resume
San Jose, CA
SUMMARY:
- 7+ years of IT experience with clients across multiple industries, involved in all phases of the SDLC on different projects, including 3 years in the big data space
- 3+ years of experience in Big Data technologies covering full project development, implementation and deployment under Agile, Scrum and Waterfall models
- Deep understanding of Hadoop architecture (versions 1.x and 2.x) and its components, including HDFS, YARN and MapReduce, along with Hive, Pig, Sqoop, Oozie, ZooKeeper and NoSQL databases such as HBase
- Expertise in writing Hive and Pig scripts and UDFs to perform data analysis on large data sets
- Worked with HDFS data formats such as Text, SequenceFile, RCFile (Row Columnar), ORC (Optimized Row Columnar) and Parquet
- Partitioned and Bucketed data sets in Apache Hive to improve performance
- Managed and Scheduled jobs on Hadoop cluster using Apache Oozie
- Worked with the NoSQL database HBase to retrieve data from sparse datasets
- Experienced in installation, configuration, support and monitoring of Hadoop clusters using Cloudera distributions, Hortonworks and AWS
- Experience in managing Hadoop clusters using Cloudera Manager and Hue
- Experience in creating SparkContext, SQLContext and StreamingContext instances to process large data sets
- Experience in performing SQL and Hive operations using Spark SQL
- Performed real time analytics on streaming data using Spark Streaming
- Created Kafka topics and distributed data to different consumer applications
- Strong experience in relational database design and development with multiple RDBMSs, including Oracle 10g, MySQL, MS SQL Server and PL/SQL
- Developed applications using Java, Python, UNIX shell scripting and relational databases
- Hands-on experience with visualization tools such as MS Excel, MS Visio and Tableau
- Experience with Scala's functional programming features, case classes and traits; leveraged Scala to develop Spark applications
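A minimal Scala sketch of the style of Spark work referenced above (a case class plus higher-order functions over an RDD); the application name, input path and record fields are illustrative assumptions rather than details from a specific project:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative record type; field names are hypothetical
case class Order(id: Long, category: String, amount: Double)

object OrderTotals {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("OrderTotals").getOrCreate()
    val sc = spark.sparkContext

    // Parse raw CSV lines into case class instances, then aggregate per category
    val totals = sc.textFile("hdfs:///data/orders.csv")
      .map(_.split(","))
      .map(f => Order(f(0).toLong, f(1), f(2).toDouble))
      .map(o => (o.category, o.amount))
      .reduceByKey(_ + _)

    totals.saveAsTextFile("hdfs:///data/order_totals")
    spark.stop()
  }
}
```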
TECHNICAL SKILLS:
Hadoop/Big Data: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Oozie, Apache Spark, Flume, Kafka, Scala
IDE Tools: Eclipse, NetBeans
Programming languages: Java/J2EE, Python, Linux shell scripts, C++
Databases: Oracle 11g/10g/9i, MySQL, DB2, MS-SQL Server, Teradata
Web Technologies: HTML, JavaScript, XML, ODBC, JDBC, JSP, Servlets, Struts, JUnit, REST API, Spring, Hibernate
Visualization: MS Excel, RAW, Tableau
PROFESSIONAL EXPERIENCE:
Confidential, San Jose CA
Hadoop Developer
Responsibilities:
- Imported data from MySQL into HDFS and exported data from HDFS back to MySQL using Apache Sqoop
- Modified and optimized source databases to speed up imports into HDFS
- Performed analysis of Twitter data by ingesting it into HDFS using Apache Flume
- Used Talend to extract, transform and load data from files, MySQL, Oracle and other sources into HDFS
- Used Talend to model data and load it into Hive and HDFS for analytics
- Cleaned and preprocessed data using MapReduce for efficient analysis
- Used Scala and Java to develop MapReduce programs for data cleansing and analysis
- Developed custom UDFs using Apache Hive to manipulate data sets
- Created Hive compact and bitmap indexes to speed up data processing
- Created/Inserted/Updated Tables in Hive using DDL, DML commands
- Improved query performance by partitioning and bucketing datasets
- Worked with Hive file formats such as ORC, SequenceFile and text files to load data into tables and run queries
- Used custom Pig loaders to handle different file types such as XML, JSON and CSV
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS
- Scheduled workflow of jobs using Oozie to create data pipelines
- Worked on the NoSQL database HBase to perform operations on sparse data sets
- Developed shell and Python scripts to check the health of Hadoop daemons and schedule jobs
- Integrated Hive with HBase to upload data and perform row level operations
- Experienced in creating SparkContexts and performing RDD transformations and actions using the Python API
- Used SparkContext to create RDDs from incoming data and applied Spark transformations and actions on them
- Created a SQLContext to load data from Parquet and JSON files and run SQL queries (see the Spark SQL sketch after this list)
- Created DataFrames from text files to execute Spark SQL queries
- Used Spark’s enableHiveSupport to execute Hive queries in Spark
- Created DStreams on incoming data using createStream
- Developed Spark Streaming applications to process data generated by sensors in real time (see the streaming sketch below)
- Linked Kafka and Flume to Spark by adding the required dependencies for data ingestion
- Performed data extraction, aggregation and log analysis on real-time data using Spark Streaming
- Created Broadcast and Accumulator variables to share data across nodes
- Used Scala case classes, higher-order functions and collections to apply map transformations on RDDs
- Used sbt to build Scala-based Spark projects and executed them with spark-submit
- Used GitHub for source code maintenance and version control
- Used Maven to configure plugins and dependencies for developing projects
- Used AWS EMR, EC2 and S3 to provision Hadoop nodes, run Hadoop jobs and store HDFS data
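A condensed sketch of the Spark SQL usage listed above (loading Parquet and JSON sources and querying them with Hive support enabled). It assumes the Spark 2.x SparkSession API; with the older SQLContext/HiveContext the calls are analogous, and all paths and table names are placeholders:

```scala
import org.apache.spark.sql.SparkSession

object ClickAnalysis {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport lets Spark SQL read and write Hive-managed tables
    val spark = SparkSession.builder()
      .appName("ClickAnalysis")
      .enableHiveSupport()
      .getOrCreate()

    // Load Parquet and JSON sources into DataFrames (paths are placeholders)
    val clicks = spark.read.parquet("hdfs:///warehouse/clicks")
    val users  = spark.read.json("hdfs:///warehouse/users.json")

    clicks.createOrReplaceTempView("clicks")
    users.createOrReplaceTempView("users")

    // Join the two sources with a SQL query and persist the result as a Hive table
    val report = spark.sql(
      """SELECT u.region, COUNT(*) AS click_count
        |FROM clicks c JOIN users u ON c.user_id = u.id
        |GROUP BY u.region""".stripMargin)

    report.write.mode("overwrite").saveAsTable("analytics.click_report")
    spark.stop()
  }
}
```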
Environment: HDFS, MapReduce, Hive, HBase, Pig, Java, Oozie, Scala, Kafka, Spark, Git, Maven, Talend, PuTTY, CentOS 6.4, SBT, AWS
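The Spark Streaming and Kafka ingestion bullets above might look like the following sketch, assuming the receiver-based spark-streaming-kafka (Kafka 0.8) integration that provides KafkaUtils.createStream; the ZooKeeper quorum, consumer group, topic name and record layout are placeholders:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object SensorStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SensorStream")
    val ssc  = new StreamingContext(conf, Seconds(10))   // 10-second micro-batches

    // Receiver-based Kafka DStream of (key, value) pairs;
    // ZooKeeper quorum, consumer group and topic name are placeholders
    val messages = KafkaUtils.createStream(
      ssc, "zk1:2181", "sensor-consumers", Map("sensor-readings" -> 1))

    // Count readings per sensor id in each batch and persist the counts to HDFS
    val counts = messages
      .map { case (_, value) => (value.split(",")(0), 1L) }
      .reduceByKey(_ + _)

    counts.saveAsTextFiles("hdfs:///streams/sensor-counts")

    ssc.start()
    ssc.awaitTermination()
  }
}
```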
Confidential, Dallas TX
Hadoop Developer
Responsibilities:
- Created Sqoop jobs and Pig and Hive scripts to ingest data from relational databases and compare it with historical data
- Responsible for building scalable distributed data solutions using Hadoop
- Transformed incoming data with Hive and Pig to make it available to internal users
- Performed extensive data mining using Hive
- Migrated HiveQL queries to Impala to minimize query response time
- Worked with sequence files, RC files, map-side joins, bucketing and partitioning to improve Hive performance and storage efficiency (see the Hive sketch after this list)
- Worked with the Avro data format in Hive to compress data and speed up processing
- Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources
- Developed Pig scripts to analyze large data sets in HDFS
- Developed multiple MapReduce programs in Java for Data Analysis
- Tuned and troubleshot MapReduce jobs by analyzing and reviewing Hadoop log files
- Used Pig as an ETL tool for transformations, event joins, filtering and pre-aggregations
- Designed and presented a plan for a proof of concept (POC) on Impala
- Implemented daily cron jobs that automate parallel data loads into HDFS using AutoSys and Oozie coordinator jobs
- Responsible for performing extensive data validation using Hive
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios
- Involved in submitting and tracking MapReduce jobs using the JobTracker
- Involved in creating Oozie workflow and coordinator jobs to launch jobs on schedule as data becomes available
- Built Spark Streaming jobs with sbt and launched them via spark-submit to process incoming data
- Handled Hive queries using Spark SQL integrated with the Spark environment
- Configured Kafka brokers and ZooKeeper to increase node utilization
- Integrated Kafka with Flume to send data to the Spark Streaming context and HDFS (see the Kafka producer sketch below)
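The Hive performance techniques noted above (partitioning, bucketing, columnar storage) might look like the following sketch, issued here through Spark SQL with Hive support since this role also handled Hive queries from Spark; the database, table and column names are placeholders:

```scala
import org.apache.spark.sql.SparkSession

object HiveTableSetup {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveTableSetup")
      .enableHiveSupport()
      .getOrCreate()

    // Partitioned, ORC-backed Hive table (names are placeholders)
    spark.sql(
      """CREATE TABLE IF NOT EXISTS finance.transactions (
        |  account_id BIGINT,
        |  amount     DOUBLE,
        |  category   STRING)
        |PARTITIONED BY (load_date STRING)
        |STORED AS ORC""".stripMargin)

    // Queries that filter on the partition column only scan the matching partitions
    spark.sql(
      "SELECT category, SUM(amount) FROM finance.transactions WHERE load_date = '2016-05-01' GROUP BY category"
    ).show()

    // Bucketing (CLUSTERED BY ... INTO n BUCKETS) and the classic map-side join settings
    //   SET hive.auto.convert.join=true;
    //   SET hive.optimize.bucketmapjoin=true;
    // would be applied in a Hive (Beeline/CLI) session.

    spark.stop()
  }
}
```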
Environment: HDFS, MapReduce, Hive, HBase, Pig, Java, Scala, Kafka, Spark, Git, Maven, Talend, PuTTY, CentOS 6.4
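A minimal sketch of publishing records to a Kafka topic with the standard Java producer client (written in Scala), matching the Kafka integration mentioned in the role above; broker addresses, topic name, key and payload are placeholders:

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object SensorProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092,broker2:9092")   // placeholder brokers
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)

    // Publish a record to the topic consumed downstream by Spark Streaming
    producer.send(new ProducerRecord[String, String]("sensor-readings", "sensor-42", "72.5,ok"))

    producer.close()
  }
}
```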
Confidential, New York NY
Hadoop Developer
Responsibilities:
- Collected log and staging data using Apache Flume and stored it in HDFS for analysis.
- Implemented helper classes that access HBase directly from Java using the HBase Java API to perform CRUD operations (see the HBase sketch after this list).
- Stored time-series data in HBase and performed time-based analytics to improve query retrieval times.
- Developed MapReduce programs to parse the raw data and store the refined data in tables.
- Debugged and fine-tuned Hive and Pig jobs to improve performance.
- Used Oozie operational services for batch processing and scheduling workflows dynamically.
- Analyzed web log data using HiveQL to extract the number of unique visitors per day.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Performed Map side joins on data in Hive to explore business insights.
- Involved in forecasting based on current results and insights derived from data analysis.
- Integrated MapReduce with HBase to bulk-import data into HBase using MapReduce programs.
- Developed several REST web services supporting both XML and JSON; these services are leveraged by both web and mobile applications.
- Participated in team discussions to develop useful insights from big data processing results.
- Suggested trends to the higher management based on social media data.
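A sketch of the kind of HBase helper described above, assuming the HBase 1.x Java client API called from Scala; the table name, row key layout, column family and qualifier are illustrative only:

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
import org.apache.hadoop.hbase.util.Bytes

object HBaseHelper {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()            // reads hbase-site.xml from the classpath
    val connection = ConnectionFactory.createConnection(conf)
    val table = connection.getTable(TableName.valueOf("web_logs"))   // placeholder table

    // Create/update: write one cell for a time-prefixed row key
    val put = new Put(Bytes.toBytes("2016-05-01#visitor-123"))
    put.addColumn(Bytes.toBytes("metrics"), Bytes.toBytes("page_views"), Bytes.toBytes("42"))
    table.put(put)

    // Read: fetch the same cell back
    val result = table.get(new Get(Bytes.toBytes("2016-05-01#visitor-123")))
    val views = Bytes.toString(result.getValue(Bytes.toBytes("metrics"), Bytes.toBytes("page_views")))
    println(s"page_views = $views")

    table.close()
    connection.close()
  }
}
```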
Environment: HDFS, MapReduce, Hive, HBase, Pig, Java, Git, Maven, Talend, PuTTY, REST, CentOS 6.3
Confidential
Java Developer
Responsibilities:
- Generated object-relational mappings (ORMs) using XML for Java classes and database tables.
- Used the Eclipse platform to design and code in the J2EE stack.
- Developed user interfaces using JSP, HTML, JavaScript and XML.
- Designed and developed an enterprise-wide common logging framework around Log4j with centralized log support (info, error and debug levels).
- Implemented Java and J2EE design patterns such as Business Delegate, Data Transfer Object (DTO), Data Access Object and Service Locator.
- Applied jQuery/JavaScript for a responsive GUI.
- Set up the distributed environment and deployed the application on a distributed system.
- Developed PL/SQL stored procedures and triggers to calculate and update tables and implement business logic.
- Used the Spring Framework AOP module to implement logging across the application and track application status.
- Developed JUnit test cases and validated user input using regular expressions in JavaScript as well as on the server side.
- Used JDBC to connect web applications to databases.
- Used SAX and DOM parsers for XML documents and performed XML transformations with XSLT.
- Designed REST APIs that allow sophisticated, effective and low cost application integration.
- Designed and documented REST/HTTP APIs, including JSON data formats and API versioning strategy.
- Gained knowledge in building sophisticated distributed systems using REST/hypermedia web APIs (SOA) and developed POCs.
Environment: Java, J2EE, Spring, Struts, PL/SQL, HTML, REST, Eclipse, Oracle DB
Confidential
Jr. Java Developer
Responsibilities:
- Analyzed business requirements and created design documents.
- Created Test plans to ensure requirements are met without impacting other areas in business.
- Implemented server-side programs using Servlets and JSP.
- Designed and developed front end using JSP, Struts (tiles), XML, JavaScript, and HTML.
- Used SVN as the version control system.
- Used Hibernate for object-relational mapping persistence.
- Designed databases to store data and optimized databases to minimize load in transfers and processing.
- Developed PL/SQL stored procedures and triggers.
- Wrote Complex SQL queries to perform various database operations using TOAD.
- Used Spring Framework for Dependency Injection and integrated with Hibernate.
- Used Spring's XmlBeanFactory to read bean configuration metadata.
- Tested complex applications using JUnit.
Environment: Java, Hibernate, Spring, Struts, PL/SQL, TOAD, SVN, Eclipse