Hadoop Developer Resume
San Jose, CA
SUMMARY:
- 7+ years of IT experience with clients across multiple industries, involved in all phases of the SDLC on different projects, including 3 years in the big data space
- 3+ years of experience in Big Data technologies covering full project development, implementation and deployment under Agile, Scrum and Waterfall models
- Deep understanding of Hadoop architecture (versions 1.x and 2.x) and its components, including HDFS, YARN and MapReduce, along with Hive, Pig, Sqoop, Oozie, ZooKeeper and NoSQL databases such as HBase
- Expertise in writing Hive and Pig scripts and UDFs to perform data analysis on large data sets
- Worked with HDFS data formats such as Text, SequenceFile, RCFile (Row Columnar), ORC (Optimized Row Columnar) and Parquet
- Partitioned and Bucketed data sets in Apache Hive to improve performance
- Managed and Scheduled jobs on Hadoop cluster using Apache Oozie
- Worked with the NoSQL database HBase to retrieve data from sparse datasets
- Experienced in installation, configuration, support and monitoring of Hadoop clusters using Cloudera distributions, Hortonworks and AWS
- Experience in managing Hadoop clusters using Cloudera Manager and Hue
- Experience in creating SparkContext, SQLContext and StreamingContext instances to process large data sets
- Experience in performing SQL and Hive operations using Spark SQL
- Performed real time analytics on streaming data using Spark Streaming
- Created Kafka topics and distributed data to different consumer applications
- Strong experience in relational database design and development with multiple RDBMSs, including Oracle 10g, MySQL, MS SQL Server and PL/SQL
- Developed applications using Java, Python, UNIX shell scripting and relational databases
- Hands-on experience with visualization tools such as MS Excel, MS Visio and Tableau
- Experience with Scala's functional programming features, case classes and traits; leveraged Scala to develop Spark applications
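A minimal Scala sketch of the style of Spark work referenced above (a case class plus higher-order functions over an RDD); the application name, input path and record fields are illustrative assumptions rather than details from a specific project:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative record type; field names are hypothetical
case class Order(id: Long, category: String, amount: Double)

object OrderTotals {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("OrderTotals").getOrCreate()
    val sc = spark.sparkContext

    // Parse raw CSV lines into case class instances, then aggregate per category
    val totals = sc.textFile("hdfs:///data/orders.csv")
      .map(_.split(","))
      .map(f => Order(f(0).toLong, f(1), f(2).toDouble))
      .map(o => (o.category, o.amount))
      .reduceByKey(_ + _)

    totals.saveAsTextFile("hdfs:///data/order_totals")
    spark.stop()
  }
}
```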
TECHNICAL SKILLS:
Hadoop/Big Data: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Oozie, Apache Spark, Flume, Kafka, Scala
IDE Tools: Eclipse, NetBeans
Programming languages: Java/J2EE, Python, Linux shell scripts, C++
Databases: Oracle 11g/10g/9i, MySQL, DB2, MS-SQL Server, Teradata
Web Technologies: HTML, JavaScript, XML, ODBC, JDBC, JSP, Servlets, Struts, JUnit, REST API, Spring, Hibernate
Visualization: MS Excel, RAW, Tableau
PROFESSIONAL EXPERIENCE:
Confidential, San Jose CA
Hadoop Developer
Responsibilities:
- Imported data from MySQL into HDFS and exported data from HDFS back to MySQL using Apache Sqoop
- Modified and optimized source databases to speed up imports into HDFS
- Performed analysis of Twitter data by ingesting it into HDFS using Apache Flume
- Used Talend to extract, transform and load data from files, MySQL, Oracle and other sources into HDFS
- Used Talend to model data and load it into Hive and HDFS for analytics
- Cleaned and preprocessed data using MapReduce for efficient analysis
- Used Scala and Java to develop MapReduce programs for data cleansing and analysis
- Developed custom UDFs using Apache Hive to manipulate data sets
- Created Hive compact and bitmap indexes to speed up data processing
- Created/Inserted/Updated Tables in Hive using DDL, DML commands
- Improved query performance by partitioning and bucketing datasets
- Worked with Hive file formats such as ORC, SequenceFile and text files to load data into tables and run queries
- Used custom Pig loaders to handle different file types such as XML, JSON and CSV
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS
- Scheduled workflow of jobs using Oozie to create data pipelines
- Worked on the NoSQL database HBase to perform operations on sparse data sets
- Developed shell and Python scripts to check the health of Hadoop daemons and schedule jobs
- Integrated Hive with HBase to upload data and perform row level operations
- Experienced in creating SparkContexts and performing RDD transformations and actions using the Python API
- Used SparkContext to create RDDs from incoming data and applied Spark transformations and actions on them
- Created a SQLContext to load data from Parquet and JSON files and run SQL queries (see the Spark SQL sketch after this list)
- Created DataFrames from text files to execute Spark SQL queries
- Used Spark’s enableHiveSupport to execute Hive queries in Spark
- Created DStreams on incoming data using createStream
- Developed Spark Streaming applications to process data generated by sensors in real time (see the streaming sketch below)
- Linked Kafka and Flume to Spark by adding the required dependencies for data ingestion
- Performed data extraction, aggregation and log analysis on real-time data using Spark Streaming
- Created Broadcast and Accumulator variables to share data across nodes
- Used Scala case classes, higher-order functions and collections to apply map transformations on RDDs
- Used sbt to build Scala-based Spark projects and executed them with spark-submit
- Used GitHub for source code maintenance and version control
- Used Maven to configure plugins and dependencies for developing projects
- Used AWS EMR, EC2 and S3 to provision Hadoop nodes, run Hadoop jobs and store HDFS data
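A condensed sketch of the Spark SQL usage listed above (loading Parquet and JSON sources and querying them with Hive support enabled). It assumes the Spark 2.x SparkSession API; with the older SQLContext/HiveContext the calls are analogous, and all paths and table names are placeholders:

```scala
import org.apache.spark.sql.SparkSession

object ClickAnalysis {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport lets Spark SQL read and write Hive-managed tables
    val spark = SparkSession.builder()
      .appName("ClickAnalysis")
      .enableHiveSupport()
      .getOrCreate()

    // Load Parquet and JSON sources into DataFrames (paths are placeholders)
    val clicks = spark.read.parquet("hdfs:///warehouse/clicks")
    val users  = spark.read.json("hdfs:///warehouse/users.json")

    clicks.createOrReplaceTempView("clicks")
    users.createOrReplaceTempView("users")

    // Join the two sources with a SQL query and persist the result as a Hive table
    val report = spark.sql(
      """SELECT u.region, COUNT(*) AS click_count
        |FROM clicks c JOIN users u ON c.user_id = u.id
        |GROUP BY u.region""".stripMargin)

    report.write.mode("overwrite").saveAsTable("analytics.click_report")
    spark.stop()
  }
}
```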
Environment: HDFS, MapReduce, Hive, HBase, Pig, Java, Oozie, Scala, Kafka, Spark, Git, Maven, Talend, PuTTY, CentOS 6.4, SBT, AWS
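The Spark Streaming and Kafka ingestion bullets above might look like the following sketch, assuming the receiver-based spark-streaming-kafka (Kafka 0.8) integration that provides KafkaUtils.createStream; the ZooKeeper quorum, consumer group, topic name and record layout are placeholders:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object SensorStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SensorStream")
    val ssc  = new StreamingContext(conf, Seconds(10))   // 10-second micro-batches

    // Receiver-based Kafka DStream of (key, value) pairs;
    // ZooKeeper quorum, consumer group and topic name are placeholders
    val messages = KafkaUtils.createStream(
      ssc, "zk1:2181", "sensor-consumers", Map("sensor-readings" -> 1))

    // Count readings per sensor id in each batch and persist the counts to HDFS
    val counts = messages
      .map { case (_, value) => (value.split(",")(0), 1L) }
      .reduceByKey(_ + _)

    counts.saveAsTextFiles("hdfs:///streams/sensor-counts")

    ssc.start()
    ssc.awaitTermination()
  }
}
```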
Confidential, Dallas TX
Hadoop Developer
Responsibilities:
- Created Sqoop jobs and Pig and Hive scripts to ingest data from relational databases and compare it with historical data
- Responsible for building scalable distributed data solutions using Hadoop
- Transformed incoming data with Hive and Pig to make it available to internal users
- Performed extensive data mining using Hive
- Migrated HiveQL queries to Impala to minimize query response time
- Worked with sequence files, RC files, map-side joins, bucketing and partitioning to improve Hive performance and storage efficiency (see the Hive sketch after this list)
- Worked with the Avro data format in Hive to compress data and speed up processing
- Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources
- Developed Pig scripts to analyze large data sets in HDFS
- Developed multiple MapReduce programs in Java for Data Analysis
- Tuned and troubleshot MapReduce jobs by analyzing and reviewing Hadoop log files
- Used Pig as an ETL tool for transformations, event joins, filtering and pre-aggregations
- Designed and presented a plan for a proof of concept (POC) on Impala
- Implemented daily cron jobs that automate parallel data loads into HDFS using AutoSys and Oozie coordinator jobs
- Responsible for performing extensive data validation using Hive
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios
- Involved in submitting and tracking MapReduce jobs using the JobTracker
- Involved in creating Oozie workflow and coordinator jobs to launch jobs on schedule as data becomes available
- Built Spark Streaming jobs with sbt and launched them via spark-submit to process incoming data
- Handled Hive queries using Spark SQL integrated with the Spark environment
- Configured Kafka brokers and ZooKeeper to increase node utilization
- Integrated Kafka with Flume to send data to the Spark Streaming context and HDFS (see the Kafka producer sketch below)
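The Hive performance techniques noted above (partitioning, bucketing, columnar storage) might look like the following sketch, issued here through Spark SQL with Hive support since this role also handled Hive queries from Spark; the database, table and column names are placeholders:

```scala
import org.apache.spark.sql.SparkSession

object HiveTableSetup {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveTableSetup")
      .enableHiveSupport()
      .getOrCreate()

    // Partitioned, ORC-backed Hive table (names are placeholders)
    spark.sql(
      """CREATE TABLE IF NOT EXISTS finance.transactions (
        |  account_id BIGINT,
        |  amount     DOUBLE,
        |  category   STRING)
        |PARTITIONED BY (load_date STRING)
        |STORED AS ORC""".stripMargin)

    // Queries that filter on the partition column only scan the matching partitions
    spark.sql(
      "SELECT category, SUM(amount) FROM finance.transactions WHERE load_date = '2016-05-01' GROUP BY category"
    ).show()

    // Bucketing (CLUSTERED BY ... INTO n BUCKETS) and the classic map-side join settings
    //   SET hive.auto.convert.join=true;
    //   SET hive.optimize.bucketmapjoin=true;
    // would be applied in a Hive (Beeline/CLI) session.

    spark.stop()
  }
}
```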
Environment: HDFS, MapReduce, Hive, HBase, Pig, Java, Scala, Kafka, Spark, Git, Maven, Talend, PuTTY, CentOS 6.4
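A minimal sketch of publishing records to a Kafka topic with the standard Java producer client (written in Scala), matching the Kafka integration mentioned in the role above; broker addresses, topic name, key and payload are placeholders:

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object SensorProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092,broker2:9092")   // placeholder brokers
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)

    // Publish a record to the topic consumed downstream by Spark Streaming
    producer.send(new ProducerRecord[String, String]("sensor-readings", "sensor-42", "72.5,ok"))

    producer.close()
  }
}
```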
Confidential, New York NY
Hadoop Developer
Responsibilities:
- Collected log and staging data using Apache Flume and stored it in HDFS for analysis.
- Implemented helper classes that access HBase directly from Java using the HBase Java API to perform CRUD operations (see the HBase sketch after this list).
- Stored time-series data in HBase and performed time-based analytics to improve query retrieval times.
- Developed MapReduce programs to parse the raw data and store the refined data in tables.
- Debugged and fine-tuned Hive and Pig jobs to improve performance.
- Used Oozie operational services for batch processing and scheduling workflows dynamically.
- Analyzed web log data using HiveQL to extract the number of unique visitors per day.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Performed Map side joins on data in Hive to explore business insights.
- Involved in forecasting based on current results and insights derived from data analysis.
- Integrated MapReduce with HBase to bulk-import data into HBase using MapReduce programs.
- Developed several REST web services supporting both XML and JSON; these services are leveraged by both web and mobile applications.
- Participated in team discussions to develop useful insights from big data processing results.
- Suggested trends to the higher management based on social media data.
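A sketch of the kind of HBase helper described above, assuming the HBase 1.x Java client API called from Scala; the table name, row key layout, column family and qualifier are illustrative only:

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
import org.apache.hadoop.hbase.util.Bytes

object HBaseHelper {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()            // reads hbase-site.xml from the classpath
    val connection = ConnectionFactory.createConnection(conf)
    val table = connection.getTable(TableName.valueOf("web_logs"))   // placeholder table

    // Create/update: write one cell for a time-prefixed row key
    val put = new Put(Bytes.toBytes("2016-05-01#visitor-123"))
    put.addColumn(Bytes.toBytes("metrics"), Bytes.toBytes("page_views"), Bytes.toBytes("42"))
    table.put(put)

    // Read: fetch the same cell back
    val result = table.get(new Get(Bytes.toBytes("2016-05-01#visitor-123")))
    val views = Bytes.toString(result.getValue(Bytes.toBytes("metrics"), Bytes.toBytes("page_views")))
    println(s"page_views = $views")

    table.close()
    connection.close()
  }
}
```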
Environment: HDFS, MapReduce, Hive, HBase, Pig, Java, Git, Maven, Talend, PuTTY, REST, CentOS 6.3
Confidential
Java Developer
Responsibilities:
- Generated object-relational mappings (ORMs) using XML for Java classes and database tables.
- Used the Eclipse platform to design and code in the J2EE stack.
- Developed user interfaces using JSP, HTML, JavaScript and XML.
- Designed and developed an enterprise-wide common logging framework around Log4j with centralized log support (info, error and debug levels).
- Implemented Java and J2EE design patterns such as Business Delegate, Data Transfer Object (DTO), Data Access Object and Service Locator.
- Applied jQuery/JavaScript for a responsive GUI.
- Set up the distributed environment and deployed the application on a distributed system.
- Developed PL/SQL stored procedures and triggers to calculate and update tables and implement business logic.
- Used the Spring Framework AOP module to implement logging across the application and track application status.
- Developed JUnit test cases and validated user input using regular expressions in JavaScript as well as on the server side.
- Used JDBC to connect web applications to databases.
- Used SAX and DOM parsers for XML documents and performed XML transformations with XSLT.
- Designed REST APIs that allow sophisticated, effective and low cost application integration.
- Designed and documented REST/HTTP APIs, including JSON data formats and API versioning strategy.
- Gained knowledge in building sophisticated distributed systems using REST/hypermedia web APIs (SOA) and developed POCs.
Environment: Java, J2EE, Spring, Struts, PL/SQL, HTML, REST, Eclipse, Oracle DB
Confidential
Jr. Java Developer
Responsibilities:
- Analyzed business requirements and created design documents.
- Created Test plans to ensure requirements are met without impacting other areas in business.
- Implemented server-side programs using Servlets and JSP.
- Designed and developed front end using JSP, Struts (tiles), XML, JavaScript, and HTML.
- Used SVN as the version control system.
- Used Hibernate for object-relational mapping persistence.
- Designed databases to store data and optimized databases to minimize load in transfers and processing.
- Developed PL/SQL stored procedures and triggers.
- Wrote Complex SQL queries to perform various database operations using TOAD.
- Used Spring Framework for Dependency Injection and integrated with Hibernate.
- Used Spring's XmlBeanFactory to read bean configuration metadata.
- Tested complex applications using JUnit.
Environment: Java, Hibernate, Spring, Struts, PL/SQL, TOAD, SVN, Eclipse