Hadoop Developer And Administrator Resume
OBJECTIVE:
- Seeking a position as a Hadoop Developer and Administrator that makes use of my exceptional knowledge of Hadoop components, SQL, Python, and Spark.
PROFESSIONAL SUMMARY:
- 1.5+ years of experience as a Software Engineer developing Big Data applications using Java/J2EE, Python, and HQL.
- 1.5+ years of experience with Big Data Hadoop ecosystem tools like MapReduce, YARN, HDFS, HBase, Impala, Hive, Pig, Oozie, and Apache Spark for ingestion, storage, querying, processing, and analysis of data.
- Hands-on experience with Cloudera, Hortonworks, and Apache Hadoop distributions.
- Experience installing, configuring, and testing Hadoop ecosystem components.
- Performance tuning in Hive and Impala using methods including, but not limited to, dynamic partitioning, bucketing, indexing, file compression, vectorization, and cost-based optimization.
- Good knowledge of the distributed publish-subscribe messaging system Apache Kafka and the data ingestion tools Flume and Sqoop.
- Hands-on experience with the workflow management tool Oozie.
- Hands-on experience handling different file formats like JSON, Avro, ORC, and Parquet.
- Hands-on experience using Apache Drill for low-latency, sub-second queries.
- Integrated Drill and Impala with Tableau using JDBC for data visualization.
- Experience analyzing data in NoSQL databases like HBase.
- Experience using Apache JMeter for load and performance testing.
- Hands-on experience with creating dashboards and worksheets in Tableau.
- Experience connecting Tableau to different data sources for visualization.
- Developed applications using PySpark for data filtering, creating Spark RDDs and DataFrames, and caching.
- Developed PySpark UDFs in Python for JSON data filtering.
- Hands-on experience with Python programming and different libraries.
- Hands-on experience with AWS (Amazon Web Services), using Elastic MapReduce (EMR), creating buckets in S3 and storing data in them.
- Hands-on experience creating EC2 instances and using IAM (Identity and Access Management) to create groups and users and assign permissions.
- Hands-on experience creating an ELB (Elastic Load Balancer) for Hadoop web UIs.
- Hands-on programming experience in Java.
- Good knowledge of using Jira for issue ticketing.
- Good knowledge of using Jenkins for continuous integration.
- Hands-on experience with UNIX commands, shell scripting, and setting up cron jobs.
- Experience in software configuration management using Git.
- Proficient in SDLC methodologies such as Agile, Scrum, and Waterfall.
TECHNICAL SKILLS:
Hadoop Components: HDFS, Hue, MapReduce, Pig, Hive, HCatalog, HBase, Sqoop, Impala, Oozie, Drill, Kylin, ZooKeeper, Flume, Kafka, YARN and Cloudera Manager.
Spark Components: Apache Spark, DataFrames, Spark SQL, YARN, Pair RDDs.
Server-Side Scripting: UNIX Shell Scripting.
Databases: Microsoft SQL Server, MySQL, Oracle.
Programming Languages: Java, Python.
Web Servers: Windows Server 2005/2008 and Apache Tomcat.
IDE: Eclipse, PyCharm.
OS/Platforms: Windows 2005/2008, Linux (All major distributions), Unix.
NoSQL Databases: HBase.
Currently Exploring: Apache Kylin, Flink, Drill, Alluxio.
PROFESSIONAL EXPERIENCE:
Confidential
Hadoop Developer and Administrator
Responsibilities:
- Responsible for designing and implementing the end-to-end data pipeline for this project.
- Used Apache Drill and Impala for low-latency analytical queries.
- Set up an S3 bucket to periodically archive data from HDFS through a cron job.
- Set up AWS CLI access from the Hadoop cluster to the S3 archive bucket.
- Created users and groups to restrict access to the data using AWS IAM (Identity and Access Management).
- Created partitioned tables in Hive in Parquet format with Snappy compression.
- Connected Apache Drill to Tableau as a data source using ODBC.
- Created worksheets and dashboards in Tableau for data visualization and analysis.
- Developed a PySpark application to read and filter JSON source data and store it in HDFS with partitions; also used Spark to extract the schema of the JSON files.
- Developed Hive queries to filter the data and load it into final tables for each manufacturer.
- Solved the Hive small-files problem in the final table by using Hive's file-merge parameters (such as hive.merge.mapfiles and hive.merge.mapredfiles).
- Used Apache JMeter to run load and performance tests on the Hadoop cluster to determine how many concurrent users the cluster can handle without failure.
- Used Hive optimization techniques like map-side joins, merging, and parallel execution.
- Set up cron jobs for the workflow to download data from Confidential and load it into partitioned Hive tables.
Confidential, Texas
Hadoop Developer and Administrator
Responsibilities:
- Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Loaded home mortgage data from the existing DWH tables (SQL Server) to HDFS using Sqoop.
- Created Hive tables and loaded retail transactional data from Teradata using Sqoop.
- Developed a workflow in Oozie to automate the tasks of loading the data into HDFS.
- Wrote Hive queries to produce a consolidated view of the mortgage and retail data.
- Created multiple Hive tables and implemented partitioning, dynamic partitioning, and bucketing in Hive for efficient data access.
- Created Hive external tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs; also used custom SerDes based on the structure of the input files so that Hive knows how to load them into Hive tables.
- Exported analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Generated final reporting data in Tableau for testing by connecting to the corresponding Hive tables through the Hive JDBC connector.
- Experience using the Tableau Data Integration tool for data integration, OLAP analysis, and ETL processes.
Environment: Hadoop, HDFS, MapReduce, Sqoop, Hive, Pig, Flume, Oozie, ZooKeeper, Cloudera distribution, MySQL, Eclipse.
