Hadoop Developer Resume
Fredericksburg, Virginia
SUMMARY
- Excellent experience in developing applications that perform large-scale distributed data processing using Big Data ecosystem tools: Hadoop, Hive, Pig, Sqoop, HBase, Spark, Spark Streaming, Spark SQL, Oozie, ZooKeeper, Flume, Kafka and YARN.
- Hands on experience in using various Hadoop distributions (Cloudera, Hortonworks, MapR).
- Deep knowledge of Spark architecture and how RDDs work internally; exposure to Spark Streaming, Spark SQL, and NoSQL databases like Cassandra and HBase.
- Experience in converting Hive/SQL queries into RDD and DataFrame transformations using Apache Spark, Scala and Python (see the sketch at the end of this summary).
- Hands-on experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Experience in implementing real-time event processing and analytics using Spark Streaming with messaging systems like Kafka.
- Experienced in Java, Spring Boot, Apache Tomcat, Maven, Gradle, Hibernate and other open-source frameworks and software.
- Worked on Spark scripts in Scala to find the most trending products (day-wise and week-wise).
- Exposure to analyzing data using HiveQL, HBase and custom MapReduce programs in Java.
- Experience in collecting, aggregating and moving data from various sources using Apache Flume and Kafka.
- Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
- Experience in creating HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Created User Defined Functions (UDFs) in Hive.
- Good experience working with Hortonworks Distribution and Cloudera Distribution.
- Wrote MapReduce programs in Java for data extraction, transformation and aggregation from multiple file formats, including XML, JSON, CSV, Avro, Parquet, Sequence and text files.
- Good knowledge of Oozie, including the design, development and execution of workflows.
- Experience in importing and exporting data using Sqoop from Relational Database Systems to HDFS and vice-versa.
- Experienced in all facets of the Software Development Life Cycle (Analysis, Design, Development, Testing and Maintenance) using Waterfall, Agile and Scrum methodologies.
- Experienced in performing real-time analytics on NoSQL databases like HBase and Cassandra.
- Highly adept at promptly and thoroughly mastering new technologies, with a keen awareness of new industry developments and the evolution of next-generation programming solutions.
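A minimal sketch of the Hive/SQL-to-Spark conversion noted above, assuming a hypothetical sales table with region, product_id and amount columns; the same aggregation is shown with both the DataFrame API and plain RDD transformations.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveToSparkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-spark-sketch")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Original HiveQL (hypothetical table and columns):
    //   SELECT product_id, SUM(amount) AS total
    //   FROM sales WHERE region = 'US' GROUP BY product_id
    val totalsDf = spark.table("sales")
      .filter($"region" === "US")
      .groupBy($"product_id")
      .agg(sum($"amount").as("total"))

    // The same logic expressed as RDD transformations.
    val totalsRdd = spark.table("sales").rdd
      .filter(row => row.getAs[String]("region") == "US")
      .map(row => (row.getAs[Long]("product_id"), row.getAs[Double]("amount")))
      .reduceByKey(_ + _)

    totalsDf.show()
    totalsRdd.take(10).foreach(println)
    spark.stop()
  }
}
```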
TECHNICAL SKILLS
Big Data Technologies: HDFS, MapReduce, Pig, Hive, Sqoop, Oozie, Storm, Scala, Spark, Apache Kafka, Flume, Solr, Ambari, Hue, Impala
Database: Oracle 10g/11g, MySQL, JDBC
Languages: Java, Scala
Development Methodologies: Agile, Waterfall
Testing: JUnit, Selenium WebDriver
NOSQL Databases: HBase, MongoDB
BI/Reporting Tools: Tableau
IDE Tools: Eclipse, NetBeans, IntelliJ IDEA
Modelling Tools: Rational Rose, StarUML, Visual Paradigm for UML
Architecture: Relational DBMS, Client-Server Architecture
Operating System: Windows 7/8/10, Vista, UNIX, Linux, Ubuntu, Mac OS X
PROFESSIONAL EXPERIENCE
Confidential, Fredericksburg, Virginia
Hadoop Developer
Responsibilities:
- Implemented Spark applications using Scala and Spark SQL for faster testing and processing of data, and used Spark transformations for data wrangling and for ingesting real-time data in various file formats.
- Worked on loading CSV/TXT/Avro/Parquet files in the Spark framework using Scala/Java, processed the data by creating Spark DataFrames and RDDs, and saved the output in Parquet format in HDFS for loading into fact tables via an ORC reader (see the sketch at the end of this role).
- Experienced with SparkContext, Spark SQL, DataFrames, Datasets and Spark on YARN.
- Collected data from AWS S3 buckets in near real time using Spark Streaming and performed the necessary transformations and aggregations.
- Involved in loading data from UNIX/Linux file systems to HDFS.
- Analyzed the data using Hive queries.
- Developed simple to complex MapReduce jobs using Hive and Java.
- Extended Hive functionality by writing custom UDFs in Java.
- Optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms.
- Developed data pipelines using Hive from Teradata and DB2 data sources; these pipelines used custom UDFs to extend the ETL functionality.
- Used EXPLAIN and COLLECT STATISTICS for Teradata performance tuning.
- Created, developed, modified and maintained database objects, PL/SQL packages, functions, stored procedures, triggers, views and materialized views to extract data from different sources.
- Developed Hive queries and UDFs to analyze and transform the data in HDFS, and Hive scripts to implement control-table logic in HDFS.
- Created procedures and macros in Teradata.
- Designed and implemented static and dynamic partitioning and bucketing in Hive.
- Performed sentiment analysis and trend analysis of the products and displayed the results to the user.
- Used Flume to stream log data from various sources; performed ETL on data in formats such as JSON and CSV, converting it to Parquet while loading the final tables; ran ad-hoc queries using Hive and Impala.
- Configured Flume to extract data from web server output files and load it into HDFS.
- Installed and maintained Hadoop and NoSQL applications.
- Worked on NoSQL databases like HBase and MongoDB for a POC on storing images and URIs.
- Managed and reviewed Hadoop and HBase log files; created HBase tables to load large sets of semi-structured data coming from various sources.
- Performed data analysis on HBase using Hive external tables, and exported the analyzed data using Sqoop to generate reports for the BI team.
- Imported data from relational databases to the Hadoop cluster using Sqoop.
- Developed Hive queries to process the data and generate data cubes for visualization.
- Monitored Hadoop cluster job performance and handled capacity planning; provided the architectural design to business users.
- Created and modified shell scripts to schedule data-cleansing scripts and the ETL loading process.
- Installed the Oozie workflow engine to automate MapReduce jobs.
- Built the Hadoop cluster and sized it based on the data extracted from all the sources.
Environment: MapReduce, HDFS, Hive, HBase, Sqoop, Java, Oozie, Spark, MySQL, Eclipse, JDBC, Linux, Shell Scripting, Putty, XML, HTML, JSON.
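A minimal sketch of the CSV-to-Parquet load described in this role, with hypothetical HDFS landing and output paths; Avro inputs would follow the same pattern but require the external spark-avro package.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object FileLoadToParquetSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("csv-to-parquet-sketch")
      .getOrCreate()

    // Hypothetical HDFS paths; adjust to the actual landing and curated zones.
    val inputPath  = "hdfs:///data/landing/transactions_csv"
    val outputPath = "hdfs:///data/curated/transactions_parquet"

    // Load delimited files into a DataFrame, inferring the schema from a header row.
    val rawDf = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(inputPath)

    // Basic wrangling before persisting.
    val cleanDf = rawDf.na.drop()   // drop rows with missing values
      .dropDuplicates()             // remove exact duplicates

    // Save in Parquet format in HDFS for downstream fact-table loads.
    cleanDf.write.mode(SaveMode.Overwrite).parquet(outputPath)

    spark.stop()
  }
}
```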
Confidential, Chicago, IL
Jr. Hadoop Developer
Responsibilities:
- Created Hive external and managed tables, and designed data models in Hive.
- Configured the Hive metastore with MySQL, which stores the metadata of Hive tables.
- Responsible for loading unstructured and semi-structured data coming from various sources into the Hadoop cluster using Flume, and for managing the ingested data.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
- Successfully migrated a legacy application to a Big Data application using Hive/HBase at the production level.
- Loaded and transformed large sets of structured, semi-structured and unstructured data, including Avro, sequence and XML files.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios
- The Hive tables created per requirements were internal or external tables defined with appropriate static and dynamic partitions for efficiency (see the sketch at the end of this role).
- Utilized the Cloudera distribution of the Apache Hadoop environment.
- Used Pig as an ETL tool to perform transformations, event joins and some pre-aggregations before storing the data in HDFS.
- Involved in End-to-End implementation of ETL logic.
- Developed complex MapReduce jobs in Java to perform data extraction, aggregation and transformation, and performed rule checks on multiple file formats like XML, JSON and CSV.
- Developed complex MapReduce jobs using Hive and Java.
- Transferred data from various OLTP data sources, such as Oracle, MS Access, MS Excel, flat files and CSV files, into SQL Server.
- Created workflows and sub-workflows using Spring Batch.
- Helped the Business Intelligence team design dashboards and workbooks.
- Migrated the ETL code between the development, test and production environments.
- Understood complex data structures of different types (structured, semi-structured) and de-normalized them for storage in Hadoop.
Environment: MapReduce, HDFS, Hive, HBase, Sqoop, Java, Oozie, Spark, MySQL, Eclipse, JDBC, Linux, Shell Scripting, Putty, XML, HTML, JSON.
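A minimal sketch of the external-table and partitioning work described in this role, issued here through Spark's Hive support rather than the Hive CLI; the web_logs and staging_web_logs tables, their columns and the HDFS location are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession

object HivePartitionedTableSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partitioned-table-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // External table over an existing HDFS location (names and path are placeholders).
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
        |  user_id STRING,
        |  url     STRING,
        |  ts      TIMESTAMP)
        |PARTITIONED BY (load_date STRING)
        |STORED AS PARQUET
        |LOCATION 'hdfs:///data/external/web_logs'""".stripMargin)

    // Static partition load: the partition value is fixed in the statement.
    spark.sql(
      """INSERT OVERWRITE TABLE web_logs PARTITION (load_date = '2017-01-01')
        |SELECT user_id, url, ts FROM staging_web_logs WHERE log_date = '2017-01-01'""".stripMargin)

    // Dynamic partition load: partition values come from the query itself.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql(
      """INSERT OVERWRITE TABLE web_logs PARTITION (load_date)
        |SELECT user_id, url, ts, log_date AS load_date FROM staging_web_logs""".stripMargin)

    spark.stop()
  }
}
```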
Confidential
Data Engineer
Responsibilities:
- Collaborated on insights with other Data Scientists, Business Analysts, and partners.
- Uploaded data to Hive and combined new tables with existing databases.
- Responsible for building scalable distributed data solutions using Hadoop.
- Implemented a POC by developing Scala scripts and UDFs using DataFrames, Spark SQL and Datasets in Spark 2.1 for data aggregation and queries, writing data back into the OLTP system through Sqoop (see the sketch at the end of this role).
- Implemented Apache Impala for data processing on top of Hive.
- Involved in importing and exporting structured data such as tables between Oracle and Hive, and in optimizing joins in Hive queries.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive; designed, created and maintained Git repositories according to client specifications.
- Built a data pipeline consisting of Spark, Hive, Sqoop and custom-built input adapters to ingest, transform and analyze operational data.
- Developed Spark jobs and Hive Jobs to summarize and transform data.
- Good knowledge of JDBC connectivity.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Used Spark to perform analytics on data in Hive.
- Automated jobs using Oozie.
- Performed advanced procedures like text analytics and processing using the in-memory computing capabilities of Spark in Scala.
- Extracted data from MySQL databases to HDFS using Sqoop.
Environment: MapReduce, HDFS, Hive, HBase, Sqoop, Java, Scala, Oozie, MySQL, Eclipse, JDBC, Linux, Shell Scripting
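A minimal sketch of the Spark 2.1 DataFrame/Dataset aggregation POC described in this role; the Order case class, table names and target database are hypothetical, and the staged Hive table would be pushed back to the OLTP system by a separate Sqoop export job.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.functions._

// Hypothetical record type for the Dataset API.
case class Order(orderId: Long, customerId: Long, amount: Double)

object SparkAggregationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-2.1-aggregation-sketch")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Read a Hive table as a typed Dataset (table and columns are placeholders).
    val orders = spark.table("orders").as[Order]

    // DataFrame/Dataset aggregation: total spend and order count per customer.
    val spendPerCustomer = orders
      .groupBy($"customerId")
      .agg(sum($"amount").as("total_spend"), count($"orderId").as("order_count"))

    // Stage the result as a Hive table; a separate Sqoop export job would then
    // push this table back to the OLTP database.
    spendPerCustomer.write
      .mode(SaveMode.Overwrite)
      .saveAsTable("analytics.customer_spend")

    spark.stop()
  }
}
```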