Hadoop Developer Resume
Santa Clara, CA
SUMMARY
- Over 5 years of experience with Big Data Hadoop ecosystem tools, including HDFS, MapReduce, Spark, YARN, Hive, Sqoop, Flume, Storm, Pig, HBase, and Cassandra, in distributed systems.
- Experience with Cloudera and Hortonworks Hadoop distributions and Amazon EMR on AWS.
- Experience in all phases of the SDLC, including application design, development, support, maintenance, testing, change request management, and enhancement support for the client.
- Developed Spark applications for data transformation and loading into HDFS using RDDs, DataFrames, and Datasets.
- Experience developing applications with Spark Core and Spark SQL, and good knowledge of Spark Streaming.
- Performance tuning in Hive and Impala using methods including, but not limited to, dynamic partitioning, bucketing, indexing, file compression, and cost-based optimization.
- Experience with data ingestion tools Kafka, Flume, and Sqoop.
- Experience handling different file formats such as JSON, Avro, ORC, and Parquet.
- Experience analyzing data in NoSQL databases such as HBase and MongoDB.
- Experience connecting Tableau to different data sources and creating dashboards and worksheets.
- Experience in converting Hive/SQL queries into Spark transformations (a brief sketch follows this summary).
- Programming experience in Java and Scala.
- Hands-on with UNIX commands, shell scripting, and setting up cron jobs.
- Experience in software configuration management using Git and Bitbucket.
- Good experience with relational databases Oracle, SQL Server, and PostgreSQL.
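A minimal sketch of the kind of Hive/SQL-to-Spark conversion noted above, assuming a Spark 2.x SparkSession with Hive support; the table, columns, and output path are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveToSparkExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveToSparkExample")
      .enableHiveSupport()          // read Hive tables through the metastore
      .getOrCreate()

    // Equivalent of:
    //   SELECT region, SUM(amount) AS total
    //   FROM sales WHERE year = 2017 GROUP BY region
    val totals = spark.table("sales")
      .filter(col("year") === 2017)
      .groupBy("region")
      .agg(sum("amount").alias("total"))

    totals.write.mode("overwrite").parquet("/data/output/sales_totals")
    spark.stop()
  }
}
```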
TECHNICAL SKILLS
Hadoop Components: HDFS, Hue, MapReduce, Pig, Hive, HBase, Sqoop, Impala, Oozie, ZooKeeper, Flume, Kafka, YARN, and Cloudera Manager.
Spark Components: Apache Spark, DataFrames, Spark SQL, Spark on YARN, Pair RDDs.
Databases: Microsoft SQL Server, MySQL, Oracle, HBase, MongoDB, and Cassandra
Programming Languages: C, C++, Java, Scala, Shell Scripting.
Web Servers: Apache HTTP Server and Tomcat.
IDEs: Eclipse, PyCharm, IntelliJ
OS/Platforms: Windows, Linux (all major distributions), CentOS.
PROFESSIONAL EXPERIENCE
Confidential, Santa Clara, CA
Hadoop Developer
Responsibilities:
- Created data pipelines for different mobile application events to filter and load consumer response data from Urban Airship, landed in an AWS S3 bucket, into Hive external tables at an HDFS location.
- Worked with different file formats such as JSON, Avro, and Parquet, and compression techniques such as Snappy.
- Developed UDFs in Spark to extract values of key-value pairs from encoded JSON strings.
- Developed SQL scripts for end-user/analyst requirements for ad-hoc analysis.
- Used various Hive optimization techniques such as partitioning, bucketing, and map joins.
- Developed shell scripts to add dynamic partitions to the Hive stage table, verify JSON schema changes in source files, and check for duplicate files in the source location.
- Developed a Spark application to filter JSON source data in an AWS S3 location and store it in HDFS with partitions, and used Spark to extract the schema of the JSON files (a brief sketch follows below).
Environment: Hive, Spark 2.0, AWS S3, EMR, Jenkins, shell scripting, HBase, Airflow, IntelliJ IDEA, Sqoop, Java.
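A minimal sketch of the Spark job described above (filtering JSON from S3, extracting a value from an encoded JSON payload with a UDF, and writing partitioned output to HDFS); Spark 2.x is assumed, and the bucket, paths, and field names are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object JsonFilterJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("JsonFilterJob").getOrCreate()

    // Read JSON events from S3; Spark infers the schema automatically.
    val events = spark.read.json("s3a://mobile-events-bucket/raw/")
    events.printSchema()  // inspect the extracted JSON schema

    // Simple UDF to pull one key out of an encoded JSON string column
    // (naive regex extraction for illustration only).
    val extractKey = udf((json: String, key: String) => {
      val pattern = ("\"" + key + "\"\\s*:\\s*\"([^\"]*)\"").r
      pattern.findFirstMatchIn(Option(json).getOrElse("")).map(_.group(1)).orNull
    })

    // Filter the events of interest and write them to HDFS, partitioned by date.
    events
      .filter(col("event_type") === "push_open")
      .withColumn("campaign", extractKey(col("payload"), lit("campaign_id")))
      .write
      .mode("append")
      .partitionBy("event_date")
      .parquet("hdfs:///data/mobile/push_opens")

    spark.stop()
  }
}
```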
Confidential, Charlotte, NC
Hadoop Developer
Responsibilities:
- Loaded home mortgage data from the existing DWH tables (Teradata) to HDFS using Sqoop.
- Created Hive tables and loaded retail transactional data from Teradata using Sqoop.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS.
- Wrote Impala/Hive Queries to have a consolidated view of the mortgage and retail data.
- Created multiple Hive tables, implemented Dynamic Partitioning and Buckets in Hive for efficient data access.
- Involved in creating Hive external tables and used custom SerDes based on the structure of the input files so that Hive knows how to load them into Hive tables.
- Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping, and aggregation, and how they translate to MapReduce jobs.
- Developed multiple modules in Scala for data cleaning and pre-processing jobs in the Spark environment (see the sketch below).
- Ingested the credit bureau data stream into the data lake via Flume.
Environment: Hadoop, HDFS, Spark 1.6, Sqoop, Hive, Pig, Flume, Oozie, ZooKeeper, Cloudera distribution (CDH 5.6.1), Impala, Eclipse.
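A minimal sketch of a Scala data-cleaning module of the kind described above; it is written against the newer SparkSession API for brevity (the project itself used Spark 1.6), and the table and column names are hypothetical:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

// Illustrative cleaning module; table and column names are hypothetical.
object MortgageCleaner {

  // Drop duplicates, fill missing values, and remove obviously bad records.
  def clean(raw: DataFrame): DataFrame =
    raw
      .dropDuplicates("loan_id")
      .na.fill(Map("property_state" -> "UNKNOWN"))
      .filter(col("loan_amount") > 0)
      .withColumn("property_state", upper(trim(col("property_state"))))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("MortgageCleaner")
      .enableHiveSupport()
      .getOrCreate()

    val cleaned = clean(spark.table("staging.mortgage_raw"))

    // Store the cleaned data back to HDFS, partitioned for efficient Hive/Impala access.
    cleaned.write
      .mode("overwrite")
      .partitionBy("property_state")
      .parquet("hdfs:///data/curated/mortgage_clean")

    spark.stop()
  }
}
```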
Confidential, CA
Hadoop Developer
Responsibilities:
- Designed docs and specs for the near real-time data analytics using Hadoop and HBase.
- Installed Cloudera Manager on the clusters.
- Used a 20-node cluster with Cloudera Hadoop distribution on Amazon EC2.
- Developed ad-click-based data analytics for keyword analysis and insights.
- Crawled public posts from Facebook and public tweets.
- Wrote MapReduce jobs to extract product sentiment and publish to Data Science team.
- Converted the output to structured data and imported it into Spotfire with the analytics team.
- Defined problems to identify the right data and analyzed results to scope new projects.
- Used TIBCO Spotfire with an in-house custom application to perform analysis and generate analytics.
Environment: Hadoop, HBase, HDFS, MapReduce, Java, Spotfire, Cloudera Manager, Amazon EC2
Confidential
Java Developer
Responsibilities:
- Individually worked on all the stages of a Software Development Life Cycle (SDLC).
- Used JavaScript code, HTML and CSS style declarations to enrich websites.
- Implemented the application using the Spring MVC framework, which is based on the MVC design pattern.
- Designed User Interface and the business logic for customer registration and maintenance.
- Integrated web services and worked with data across different servers.
- Involved in the design and development of SOA services using web services.
- Created tables, views, triggers, indexes, constraints, and functions in SQL Server 2005.
Environment: Java, J2EE, JSP, Spring, Struts, Hibernate, Eclipse, SOA, WebLogic, Oracle, HTML, CSS, Web Services, JUnit, SVN, Windows, UNIX.