Hadoop Developer Resume
Santa Clara, CA
SUMMARY
- Over 5 years of experience with Big Data Hadoop ecosystem tools, including HDFS, MapReduce, Spark, YARN, Hive, Sqoop, Flume, Storm, Pig, HBase, and Cassandra, in distributed systems.
- Experience with Cloudera and Hortonworks Hadoop distributions and Amazon EMR on AWS.
- Experience in all phases of the SDLC, including application design, development, support, maintenance, testing, change request management, and enhancement support for the client.
- Developed Spark applications for data transformation and loading into HDFS using RDDs, DataFrames, and Datasets.
- Experience developing applications with Spark Core and Spark SQL, and good knowledge of Spark Streaming.
- Performance tuning in Hive and Impala using methods including, but not limited to, dynamic partitioning, bucketing, indexing, file compression, and cost-based optimization.
- Experience with data ingestion tools Kafka, Flume, and Sqoop.
- Experience handling different file formats such as JSON, Avro, ORC, and Parquet.
- Experience analyzing data in NoSQL databases such as HBase and MongoDB.
- Experience connecting Tableau to different data sources and creating dashboards and worksheets.
- Experience in converting Hive/SQL queries into Spark transformations (a brief sketch follows this summary).
- Programming experience in Java and Scala.
- Hands-on with UNIX commands, shell scripting, and setting up cron jobs.
- Experience in software configuration management using Git and Bitbucket.
- Good experience with relational databases Oracle, SQL Server, and PostgreSQL.
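A minimal sketch of the kind of Hive/SQL-to-Spark conversion noted above, assuming a Spark 2.x SparkSession with Hive support; the table, columns, and output path are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveToSparkExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveToSparkExample")
      .enableHiveSupport()          // read Hive tables through the metastore
      .getOrCreate()

    // Equivalent of:
    //   SELECT region, SUM(amount) AS total
    //   FROM sales WHERE year = 2017 GROUP BY region
    val totals = spark.table("sales")
      .filter(col("year") === 2017)
      .groupBy("region")
      .agg(sum("amount").alias("total"))

    totals.write.mode("overwrite").parquet("/data/output/sales_totals")
    spark.stop()
  }
}
```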
TECHNICAL SKILLS
Hadoop Components: HDFS, Hue, MapReduce, Pig, Hive, HBase, Sqoop, Impala, Oozie, ZooKeeper, Flume, Kafka, YARN, and Cloudera Manager.
Spark Components: Apache Spark, DataFrames, Spark SQL, Spark on YARN, Pair RDDs.
Databases: Microsoft SQL Server, MySQL, Oracle, HBase, MongoDB, and Cassandra
Programming Languages: C, C++, Java, Scala, Shell Scripting.
Web Servers: Apache HTTP Server and Tomcat.
IDEs: Eclipse, PyCharm, IntelliJ
OS/Platforms: Windows, Linux (all major distributions), CentOS.
PROFESSIONAL EXPERIENCE
Confidential, Santa Clara, CA
Hadoop Developer
Responsibilities:
- Created data pipelines for different mobile application events to filter and load consumer response data from Urban Airship, landed in an AWS S3 bucket, into Hive external tables at an HDFS location.
- Worked with different file formats such as JSON, Avro, and Parquet, and compression techniques such as Snappy.
- Developed UDFs in Spark to extract values of key-value pairs from encoded JSON strings.
- Developed SQL scripts for end-user/analyst requirements for ad-hoc analysis.
- Used various Hive optimization techniques such as partitioning, bucketing, and map joins.
- Developed shell scripts to add dynamic partitions to the Hive stage table, verify JSON schema changes in source files, and check for duplicate files in the source location.
- Developed a Spark application to filter JSON source data in an AWS S3 location and store it in HDFS with partitions, and used Spark to extract the schema of the JSON files (a brief sketch follows below).
Environment: Hive, Spark 2.0, AWS S3, EMR, Jenkins, shell scripting, HBase, Airflow, IntelliJ IDEA, Sqoop, Java.
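A minimal sketch of the Spark job described above (filtering JSON from S3, extracting a value from an encoded JSON payload with a UDF, and writing partitioned output to HDFS); Spark 2.x is assumed, and the bucket, paths, and field names are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object JsonFilterJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("JsonFilterJob").getOrCreate()

    // Read JSON events from S3; Spark infers the schema automatically.
    val events = spark.read.json("s3a://mobile-events-bucket/raw/")
    events.printSchema()  // inspect the extracted JSON schema

    // Simple UDF to pull one key out of an encoded JSON string column
    // (naive regex extraction for illustration only).
    val extractKey = udf((json: String, key: String) => {
      val pattern = ("\"" + key + "\"\\s*:\\s*\"([^\"]*)\"").r
      pattern.findFirstMatchIn(Option(json).getOrElse("")).map(_.group(1)).orNull
    })

    // Filter the events of interest and write them to HDFS, partitioned by date.
    events
      .filter(col("event_type") === "push_open")
      .withColumn("campaign", extractKey(col("payload"), lit("campaign_id")))
      .write
      .mode("append")
      .partitionBy("event_date")
      .parquet("hdfs:///data/mobile/push_opens")

    spark.stop()
  }
}
```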
Confidential, Charlotte, NC
Hadoop Developer
Responsibilities:
- Loaded home mortgage data from the existing DWH tables (Teradata) to HDFS using Sqoop.
- Created Hive tables and loaded retail transactional data from Teradata using Sqoop.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS.
- Wrote Impala/Hive Queries to have a consolidated view of the mortgage and retail data.
- Created multiple Hive tables, implemented Dynamic Partitioning and Buckets in Hive for efficient data access.
- Involved in creating Hive external tables and used custom SerDes based on the structure of the input files so that Hive knows how to load them into Hive tables.
- Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping, and aggregation, and how they translate to MapReduce jobs.
- Developed multiple modules in Scala for data cleaning and pre-processing jobs in the Spark environment (see the sketch below).
- Ingested the credit bureau data stream into the data lake via Flume.
Environment: Hadoop, HDFS, Spark 1.6, Sqoop, Hive, Pig, Flume, Oozie, ZooKeeper, Cloudera distribution (CDH 5.6.1), Impala, Eclipse.
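A minimal sketch of a Scala data-cleaning module of the kind described above; it is written against the newer SparkSession API for brevity (the project itself used Spark 1.6), and the table and column names are hypothetical:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

// Illustrative cleaning module; table and column names are hypothetical.
object MortgageCleaner {

  // Drop duplicates, fill missing values, and remove obviously bad records.
  def clean(raw: DataFrame): DataFrame =
    raw
      .dropDuplicates("loan_id")
      .na.fill(Map("property_state" -> "UNKNOWN"))
      .filter(col("loan_amount") > 0)
      .withColumn("property_state", upper(trim(col("property_state"))))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("MortgageCleaner")
      .enableHiveSupport()
      .getOrCreate()

    val cleaned = clean(spark.table("staging.mortgage_raw"))

    // Store the cleaned data back to HDFS, partitioned for efficient Hive/Impala access.
    cleaned.write
      .mode("overwrite")
      .partitionBy("property_state")
      .parquet("hdfs:///data/curated/mortgage_clean")

    spark.stop()
  }
}
```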
Confidential, CA
Hadoop Developer
Responsibilities:
- Designed docs and specs for the near real-time data analytics using Hadoop and HBase.
- Installed Cloudera Manager on the clusters.
- Used a 20-node cluster with Cloudera Hadoop distribution on Amazon EC2.
- Developed ad-click-based data analytics for keyword analysis and insights.
- Crawled public posts from Facebook and public tweets.
- Wrote MapReduce jobs to extract product sentiment and publish to Data Science team.
- Converted the output to structured data and imported it into Spotfire with the analytics team.
- Defined problems to identify the right data and analyzed results to scope new projects.
- Used TIBCO Spotfire with an in-house custom application to perform analysis and generate analytics.
Environment: Hadoop, HBase, HDFS, MapReduce, Java, Spotfire, Cloudera Manager, Amazon EC2
Confidential
Java Developer
Responsibilities:
- Individually worked on all the stages of a Software Development Life Cycle (SDLC).
- Used JavaScript code, HTML and CSS style declarations to enrich websites.
- Implemented the application using the Spring MVC framework, which is based on the MVC design pattern.
- Designed User Interface and the business logic for customer registration and maintenance.
- Integrated web services and worked with data across different servers.
- Involved in the design and development of SOA services using web services.
- Created tables, views, triggers, indexes, constraints, and functions in SQL Server 2005.
Environment: Java, J2EE, JSP, Spring, Struts, Hibernate, Eclipse, SOA, WebLogic, Oracle, HTML, CSS, Web Services, JUnit, SVN, Windows, UNIX.