Hadoop Developer Resume
Irving, TX
SUMMARY
- Around 7 years of professional IT experience in analyzing requirements and designing and building highly distributed, mission-critical products and applications.
- Strong end-to-end experience in Hadoop development and implementing Big Data technologies.
- Expertise in core Hadoop and its ecosystem tools like Spark, Cassandra, HDFS, MapReduce, Oozie, Hive, Sqoop, Pig, Flume, Kafka, HBase, and ZooKeeper.
- Experience with NoSQL databases like Cassandra, HBase, and MongoDB and their integration with Hadoop clusters.
- Experienced in Spark Core, Spark SQL and Spark Streaming.
- Implemented Scala scripts and UDFs using both DataFrames/SQL and RDDs/MapReduce in Spark for data aggregation and queries, and wrote data into HDFS through Sqoop.
- Developed fan-out workflows using Flume and Kafka to ingest data from sources such as web servers and REST APIs via network sources, landing the data in Hadoop through an HDFS sink.
- Exposure to Cassandra CQL with Java APIs to retrieve data from Cassandra tables.
- Experience with the Oozie workflow scheduler to manage Hadoop jobs as DAGs of actions with control flow nodes.
- Experience in using ZooKeeper and Oozie operational services to coordinate clusters and schedule workflows.
- Developed pipelines for continuous data ingestion using Kafka and Spark Streaming.
- Experienced in using Spark SQL and Spark DataFrames to cleanse and integrate data.
- Hands-on experience developing Scala scripts using DataFrames/SQL/Datasets and RDDs/MapReduce in Spark for data aggregation and queries (see the sketch following this list).
- Analyzed Cassandra and compared it with other open-source NoSQL databases.
- Worked with Spark on parallel computing to deepen knowledge of RDDs backed by Cassandra.
- Good understanding of and working experience with Hadoop distributions like Cloudera and Hortonworks.
- Experience managing large sharded MongoDB clusters and the MongoDB life cycle, including sizing, automation, monitoring, and tuning.
- Developed Python scripts to monitor the health of Mongo databases and perform ad-hoc backups using mongodump and mongorestore.
- Experience with Apache Tez on Hive and Pig to achieve better response times than plain MapReduce jobs.
- Experience with Hadoop deployment and automation tools such as Ambari, Cloudbreak, and EMR.
- Hands-on experience analyzing data using HiveQL, Pig Latin, and custom MapReduce programs.
- Experienced in working with structured data using HiveQL, join operations, Hive UDFs, partitions, bucketing, and internal/external tables.
- Implemented modules using Amazon cloud components: S3, EC2, Elastic Beanstalk, and SimpleDB.
- Hands-on experience creating custom UDFs for Pig and Hive to bring Python/Java functionality into Pig Latin and HiveQL.
- Experienced with full-text search and implemented data querying with faceted search using Solr.
- Hands-on experience with monitoring tools to check cluster status using Cloudera Manager.
- Implemented data quality checks in the ETL tool Talend; good knowledge of data warehousing and ETL tools like IBM DataStage, Informatica, and Talend.
- Knowledge of integrating Kerberos into Hadoop to harden the cluster and secure it against unauthorized access.
- Monitored Hadoop cluster connectivity and security using the Ambari monitoring system.
- Experienced in backend development using SQL and stored procedures on Oracle 10g and 11g.
- Good working knowledge of multithreaded Core Java, J2EE, JDBC, jQuery, JavaScript, and web services (SOAP, REST).
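The following is a minimal sketch of the kind of Spark aggregation work described above, assuming a Spark 2.x session with Hive support; the table, column, and path names are hypothetical illustrations rather than project artifacts.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DailyAggregation {
  def main(args: Array[String]): Unit = {
    // Spark session with Hive support so existing Hive tables are visible
    val spark = SparkSession.builder()
      .appName("daily-aggregation")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Hypothetical source table previously landed in Hive (e.g., via Sqoop)
    val orders = spark.table("staging.orders")

    // DataFrame/SQL-style aggregation: totals per customer per day
    val dailyTotals = orders
      .filter($"status" === "COMPLETED")
      .groupBy($"customer_id", to_date($"order_ts").as("order_date"))
      .agg(sum($"amount").as("total_amount"), count(lit(1)).as("order_count"))

    // Write the result back to HDFS as Parquet, partitioned by date
    dailyTotals.write
      .mode("overwrite")
      .partitionBy("order_date")
      .parquet("hdfs:///warehouse/aggregates/daily_customer_totals")

    spark.stop()
  }
}
```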
TECHNICAL SKILLS
Big Data/Hadoop Ecosystem: Hortonworks, Cloudera, Apache Hadoop, EMR, Hive, Pig, Sqoop, Spark, Kafka, Oozie, Flume, ZooKeeper, NiFi, Impala, Tez
NoSQL Databases: Cassandra, HBase, MongoDB
Cloud Services: Amazon AWS
Languages: SQL, PL/SQL, Pig Latin, HiveQL, UNIX Shell Scripting, C, Java, Python, Scala
ETL Tools: Informatica, IBM DataStage, Talend
Java/J2EE Technologies: Core Java, Servlets, Hibernate, Spring, Struts, JSP, JDBC, EJB
Application Servers: WebLogic, WebSphere, Tomcat
BI Tools: Tableau, Splunk
Databases: Oracle, MySQL, DB2, Teradata, MS SQL Server
Operating Systems: UNIX, Windows, iOS, Linux
IDE’s: IntelliJ IDEA, Eclipse, NetBeans
PROFESSIONAL EXPERIENCE
Confidential - Irving, TX
Hadoop Developer
Responsibilities:
- Involved in loading data from the UNIX file system to HDFS.
- Responsible for building scalable distributed data solutions using Hadoop.
- Developed Oozie workflows for daily incremental loads, which get data from Teradata and import it into Hive tables.
- Worked extensively on Apache NiFi as an ETL tool for batch and real-time processing.
- Involved in ETL, data integration, and migration, and used Sqoop to load data from Oracle to HDFS on a regular basis.
- Managed and reviewed Hadoop log files and took part in deploying and maintaining the Hadoop cluster.
- Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX file systems and NoSQL sources.
- Worked on different file formats like Text, ORC, Avro, and Parquet, and compression codecs like Snappy, gzip, and zlib.
- Experience creating, dropping, and altering HBase tables at run time without blocking updates and queries.
- Wrote shell scripts to run multiple Hive jobs that load different Hive tables incrementally.
- Imported data from different sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the processed data back into HDFS.
- Executed Hive queries using the Hive command line and the Hue web GUI to read, write, and query data in HBase.
- Used Flume to load log data into HDFS.
- Used Pig to convert fixed-width files to delimited files.
- Developed workflows in Apache NiFi to ingest, prepare, and publish data.
- Developed Kafka producer and consumer components for real-time data processing (see the sketch at the end of this section).
Environment: Apache NiFi, Hive, Pig, HDFS, Hortonworks, Flume, ZooKeeper, Sqoop, Oozie, RDBMS, Teradata, Apache Zeppelin, Shell Scripts, NoSQL, Java, FileZilla, PuTTY, PostgreSQL.
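A minimal sketch of a Kafka producer of the kind referenced above, written in Scala against the standard Kafka clients API; the broker list, topic name, and log source are hypothetical.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer

object WebLogProducer {
  def main(args: Array[String]): Unit = {
    // Standard producer configuration; broker list is a placeholder
    val props = new Properties()
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092")
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
    props.put(ProducerConfig.ACKS_CONFIG, "all") // wait for full acknowledgment for durability

    val producer = new KafkaProducer[String, String](props)
    try {
      // Each web-server log line becomes one message on a hypothetical "weblogs" topic
      scala.io.Source.stdin.getLines().foreach { line =>
        producer.send(new ProducerRecord[String, String]("weblogs", line))
      }
    } finally {
      producer.flush()
      producer.close()
    }
  }
}
```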
Confidential - Stamford, CT
Spark/Scala Developer
Responsibilities:
- Imported data from Cassandra NoSQL databases and stored it in AWS.
- Performed transformations on the data using different Spark modules.
- Responsible for Spark Core configuration based on the type of input source.
- Executed Spark code written in Scala for Spark Streaming/Spark SQL for faster processing of data.
- Performed SQL joins among Hive tables to get input for the Spark batch process.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
- Developed Kafka producers and configured brokers for message handling.
- Analyzed Cassandra and compared it with other open-source NoSQL databases.
- Involved in importing data to Hadoop using Kafka and implemented Oozie jobs for daily imports.
- Created partitions and buckets based on state to enable bucket-based Hive joins for further processing.
- Used the AWS CLI for data transfers to and from Amazon S3 buckets.
- Involved in setup of Kafka producers and consumers along with Kafka brokers and topics.
- Developed a data pipeline using Kafka and Storm to store data in HDFS.
- Executed Hadoop/Spark jobs on AWS EMR, with data stored in S3 buckets.
- Used Hive vectorized query execution to process batches of rows.
- Used various Hive optimization techniques like partitioning and bucketing.
- Performed ad-hoc queries on structured data using HiveQL and used joins like skew joins, map-side joins, and SMB joins in Hive for faster data access.
- Pulled data from Amazon S3 buckets into the data lake, built Hive tables on top of it, and created DataFrames in Spark to perform further analysis.
- Implemented Spark RDD transformations and actions to implement business analysis logic.
- Worked with data serialization formats (Avro, Parquet, JSON, CSV) for converting complex objects into sequences of bytes.
- Developed Spark scripts using Scala as per requirements.
- Involved in converting Cassandra/Hive/SQL queries into Spark transformations using Spark RDDs in Scala and Python.
- Worked with Cassandra for non-relational data storage and retrieval.
- Used Amazon CloudWatch to monitor and track resources on AWS.
- Used Spark Streaming APIs to perform transformations on data from Kafka in real time and persist it into Cassandra (see the sketch at the end of this section).
- Worked on analyzing and examining customer behavioral data using Cassandra.
- Experienced in automating event-driven and time-based jobs using Oozie workflows.
Environment: Cassandra, Kafka, Spark 2.3, Pig, Hive, Oozie, AWS, Solr, NoSQL, SQL, Scala 2.11, Python, Java, FileZilla, PuTTY, IntelliJ, GitHub.
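A minimal sketch of the Kafka-to-Cassandra streaming flow referenced above, assuming Spark Streaming with the spark-streaming-kafka-0-10 integration and the DataStax Spark Cassandra connector; the topic, keyspace, table, message layout, and host names are hypothetical.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import com.datastax.spark.connector._ // adds saveToCassandra on RDDs

object EventsToCassandra {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("events-to-cassandra")
      .set("spark.cassandra.connection.host", "cassandra-host") // placeholder host
    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "events-consumer",
      "auto.offset.reset"  -> "latest"
    )

    // Direct stream from a hypothetical "events" topic
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Parse simple CSV-style messages and persist each micro-batch into Cassandra
    stream.map(record => record.value.split(","))
      .flatMap {
        case Array(id, ts, amount) => Seq((id, ts, amount.toDouble))
        case _                     => Seq.empty
      }
      .foreachRDD { rdd =>
        rdd.saveToCassandra("analytics", "events", SomeColumns("id", "event_ts", "amount"))
      }

    ssc.start()
    ssc.awaitTermination()
  }
}
```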
Confidential - Omaha, NE
Hadoop Developer
Responsibilities:
- Created reports for the BI team by using Sqoop to move data into HDFS and Hive.
- Migrated Hive QL into Impala to minimize query response time.
- Responsible for analyzing the performance of Hive queries using Impala.
- Developed an Apache Flume client to send data as events to the Flume server, to be stored in files and HDFS.
- Developed Flume configurations to extract log data from different sources and transfer data in different file formats (JSON, XML, Parquet) to Hive tables using different SerDes.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Involved in developing Hive DDLs to create, alter, and drop Hive tables.
- Developed Spark applications for data transformations and loading into HDFS using RDDs and DataFrames.
- Created concurrent access for Hive tables with shared and exclusive locking, enabled in Hive with the help of the ZooKeeper deployment in the cluster.
- Performed various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Developed Sqoop scripts to move data between HDFS and RDBMSs (Oracle, MySQL).
- Plan, deploy, monitor, and maintain Amazon AWS cloud infrastructure consisting of multiple EC2 nodes.
- Stored and loaded data from HDFS to Amazon S3 and backed up the namespace data to NFS.
- Implemented NameNode backup using NFS for high availability.
- Developed ETL processes to transfer data from different sources, using Sqoop, Impala, and bash.
- Used Flume to stream through the log data from various sources.
- Configured Flume to extract the data from the web server output files to load into HDFS.
- Designed and implemented the MongoDB schema.
- Used Pig to perform data validation on the data ingested with Sqoop and Flume, and pushed the cleansed data set into MongoDB.
- Wrote services to store and retrieve user data from the MongoDB for the application on devices.
- Used the Mongoose API to access MongoDB from Node.js.
- Implemented Spark operations on RDDs and converted Hive/SQL queries into Spark transformations using RDDs (see the sketch at the end of this section).
- Wrote shell scripts to monitor Hadoop daemon services and respond to any warning or failure conditions.
- Wrote shell scripts to automate rolling day-to-day processes.
Environment: Apache Flume, Hive, Pig, HDFS, ZooKeeper, Sqoop, RDBMS, AWS, Cloudera, MongoDB, Shell Scripts, Eclipse.
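A minimal sketch of rewriting a Hive-style aggregation as Spark RDD transformations, as referenced above; the HDFS paths and log layout are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Roughly equivalent to a Hive query like:
//   SELECT page, COUNT(*) FROM web_logs WHERE status = 200 GROUP BY page;
// expressed as RDD transformations over raw files in HDFS.
object PageHitCounts {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("page-hit-counts"))

    val logs = sc.textFile("hdfs:///data/web_logs/*")

    val hitsPerPage = logs
      .map(_.split("\t"))                        // tab-delimited: ts, page, status, ...
      .filter(f => f.length >= 3 && f(2) == "200")
      .map(f => (f(1), 1L))                      // key by page
      .reduceByKey(_ + _)                        // COUNT(*) ... GROUP BY page

    hitsPerPage.saveAsTextFile("hdfs:///output/page_hit_counts")
    sc.stop()
  }
}
```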
Confidential, Irving, TX
Java Developer
Responsibilities:
- Involved in End to End Design and Development of UI Layer, Service Layer, and Persistence Layer.
- Implemented Spring MVC for designing and implementing the UI Layer for the application.
- Implemented UI screens using JSF for defining and executing UI flow in the application.
- Used AJAX to retrieve data from the server asynchronously in the background without interfering with the display of the existing page.
- Used DWR (Direct Web Remoting) generated scripts to make AJAX calls to Java.
- Worked with JavaScript for dynamic manipulation of the elements on the screen and to validate the input.
- Involved in writing Spring Validator Classes for validating the input data.
- Used JAXB to marshal and unmarshal Java objects to communicate with the backend mainframe system.
- Involved in writing complex PL/SQL and SQL blocks for the application.
- Involved in Unit Testing, User Acceptance Testing and Bug Fixing.
Environment: MVC architecture, Spring IoC, Hibernate/JPA, Java REST web services, Spring Batch, XML, JavaScript, Oracle Database 10g, JSF UI, JUnit.
Confidential
Java Developer
Responsibilities:
- Used Servlets, Struts, JSP, and JavaBeans for developing the Performance module using legacy code.
- Involved in coding for JSP pages, Form Beans and Action Classes in Struts.
- Created Custom Tag Libraries to support the Struts framework.
- Involved in writing validations.
- Involved in database connectivity through JDBC.
- Implemented JDBC data access to map the object-oriented domain model to a traditional relational database.
- Created SQL queries, sequences, and views for the backend Oracle database.
- Developed JUnit Test cases and performed application testing for QC team.
- Used JavaScript for client-side validations.
- Developed dynamic JSP pages with Struts.
- Participated in weekly project meetings and updates, and provided estimates for assigned tasks.
Environment: Java, J2EE, Struts, JSP, JDBC, Ant, XML, IBM WebSphere, WSAD, JUnit, DB2, Rational Rose, CVS, SOAP, and RUP.