Sr. Hadoop Developer Resume
Plano, TX
SUMMARY
- Hadoop Developer with over 8 years of IT experience in Big Data, the Hadoop ecosystem, ETL and RDBMS technologies, with domain experience in Financial, Banking, Health Care, Retail and Non-profit organizations. Worked on design, development, implementation, testing and deployment of software applications using a wide variety of technologies in all phases of the development life cycle.
- 4+ years of exclusive working experience on Big Data technologies and the Hadoop stack.
- Strong experience working with Spark Core, Spark SQL and Spark Streaming using Scala and Python, along with HDFS, MapReduce, Elasticsearch, Hive, Pig, Sqoop, Avro, HCatalog, Flume, Kafka, Oozie, Mahout, ZooKeeper and HBase (a brief Spark SQL sketch in Scala appears at the end of this summary).
- Extensive hands-on experience with real-time streaming data using Spark and Kafka Connect.
- Very good knowledge of object-oriented concepts, with complete software development life cycle experience: requirements gathering, conceptual design, analysis, detailed design, development and mentoring.
- Good understanding of distributed systems, HDFS architecture, and the internal workings of the MapReduce and Spark processing frameworks.
- Experienced in Big Data solutions and Hadoop ecosystem technologies; well versed in Big Data solution planning, design, development and POCs.
- Experience in using Git as a source code repository.
- Experience in integrating Cassandra with Elasticsearch and Hadoop.
- Used Apache Avro to deserialize data from its compact binary format to JSON format.
- Proficient in using Cloudera Manager, an end-to-end tool to manage Hadoop operations in a Cloudera cluster.
- Extensive knowledge of the development, analysis and design of ETL methodologies in all phases of the data warehousing life cycle.
- Experience in deploying and managing the Hadoop cluster using Cloudera Manager.
- More than one year of hands-on experience using the Spark framework with Scala, with good exposure to performance tuning of Hive queries and MapReduce jobs in the Spark framework.
- Experience in designing both time-driven and data-driven automated workflows using Oozie.
- Created and ran Sqoop jobs with incremental loads to populate Hive external tables.
- Extensive experience in writing Pig scripts to transform raw data from several data sources into baseline data.
- Very good knowledge of and hands-on experience with Cassandra, Flume and Spark (YARN).
- Worked with various file formats such as delimited text files, clickstream log files, Apache log files, Avro files, JSON files and XML files.
- Good understanding of the compression techniques used in Hadoop processing, such as Gzip, Snappy and LZO.
- Expertise in inbound and outbound (importing/exporting) data from/to traditional RDBMSs using Apache Sqoop.
- Tuned Pig and Hive scripts by understanding their join, group and aggregation operations.
- Extensively worked on HiveQL and join operations, wrote custom UDFs, and have good experience optimizing Hive queries.
- Worked on various Hadoop distributions (Cloudera, Hortonworks, Amazon AWS) to implement and make use of them.
- Mastered the use of columnar file formats such as RCFile, ORC and Parquet.
- Experience in data processing: collecting, aggregating and moving data from various sources using Apache Flume and Kafka.
- Hands-on experience in installing, configuring and deploying Hadoop distributions in cloud environments (Amazon Web Services).
- Hands-on experience with NoSQL databases such as HBase, MongoDB and Cassandra.
- Extensive use of use case diagrams, use case models and sequence diagrams using Rational Rose.
- Proactive time management and problem-solving skills; self-motivated with good analytical skills.
- Analytical and organizational skills with the ability to multitask and meet deadlines.
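As referenced in the Spark bullet above, the following is a minimal Spark SQL sketch in Scala of the kind of Hive-backed batch processing summarized in this section. It is illustrative only: the database, table and output path names are hypothetical, and it assumes the Spark 1.x HiveContext API that shipped with the CDH 5 / HDP 2.3-2.4 clusters listed under Technical Skills.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveJoinToParquet {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HiveJoinToParquet"))
    val hiveContext = new HiveContext(sc)

    // Join two Hive tables (hypothetical names) with HiveQL and keep only recent rows.
    val result = hiveContext.sql(
      """SELECT t.txn_id, c.customer_id, c.region, t.amount
        |FROM raw_db.transactions t
        |JOIN raw_db.customers c ON t.customer_id = c.customer_id
        |WHERE t.txn_date >= '2016-01-01'
      """.stripMargin)

    // Persist the joined result in a columnar format (Parquet) for downstream analytics.
    result.write.mode("overwrite").parquet("hdfs:///analytics/txn_by_customer")

    sc.stop()
  }
}
```

Writing the joined result as Parquet reflects the columnar-format preference noted above (RCFile/ORC/Parquet), since columnar storage keeps downstream Hive and Spark scans inexpensive.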
TECHNICAL SKILLS
Languages: Java (J2SE, J2EE), SQL, PL/SQL, C, C++, Python, Scala, C#.
Java Technologies: JSP, JSF, JDBC, Servlets, Web Services
Frameworks: Struts, Hibernate, Spring
Big Data Ecosystem: HDFS, MapReduce, YARN, Hive, HBase, Impala, ZooKeeper, Sqoop, Oozie, Apache Cassandra, Flume, Spark, HCatalog, Hue, Kafka, Avro, Ambari, Kerberos, AWS
Scripting Languages: Pig Latin, Python, UNIX/Linux shell scripting
Hadoop Clusters: Cloudera CDH 5, Hortonworks HDP 2.3/2.4, MapR
Web Technologies: HTML, HTML5, XML, CSS, JavaScript, jQuery, JSON, Bootstrap, SOAP, RESTful
App/Web Servers: Apache Tomcat, IBM WebSphere, WebLogic
Databases: Oracle …, MS SQL Server …, MySQL, MS Access, DB2.
Methodology: Agile, Scrum.
IDE: Eclipse, Net Beans, IntelliJ.
Operating Systems: Linux (Redhat, CentOS, Ubuntu), UNIX, Mac OS, Sun Solaris and Windows
PROFESSIONAL EXPERIENCE
Confidential, Plano, TX
Sr. Hadoop Developer
Responsibilities:
- Worked on analyzing the Hadoop cluster using different big data analytics tools including Spark, Kafka, Pig, Flume, Hive and MapReduce.
- Developed parser and loader MapReduce applications to retrieve data from HDFS and store it in HBase and Hive.
- Configured Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS using Scala (sketched at the end of this section).
- Imported data from MySQL and Oracle into HDFS using Sqoop.
- Imported unstructured data into HDFS using Flume.
- Wrote MapReduce Java programs to analyze log data for large-scale data sets.
- Worked hands-on with the ETL process and was involved in developing Hive/Impala scripts for extraction, transformation and loading of data into other data warehouses.
- Used Hive join queries to join multiple tables of a source system for loading and analyzing data, and loaded the results into Elasticsearch.
- Imported and exported data into HDFS using Sqoop and Kafka.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Deployed the Hadoop cluster in pseudo-distributed and fully distributed modes.
- Involved in running ad-hoc queries through Pig Latin, Hive or Java MapReduce.
- Responsible for upgrading Hortonworks Hadoop HDP 2.2.0 and MapReduce 2.0 with YARN in a multi-node clustered environment.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
- Wrote Apache Spark Streaming API applications on the Big Data distribution in the active cluster environment.
- Used the Spark-Cassandra Connector to load data to and from Cassandra (sketched at the end of this section).
- Created data models using the Cassandra Query Language (CQL).
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Experience with the AWS cloud computing platform and its many dimensions of scalability, including but not limited to: VPC (Virtual Private Cloud), EC2, load balancing with ELB, CloudFront, S3, Glacier, messaging with SQS (and scalable non-AWS alternatives), auto-scaling architectures, using EBS under high I/O requirements, and custom monitoring metrics/analysis/alarms via CloudWatch.
- Implemented Elasticsearch to decrease query times and increase search capabilities.
- Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Loaded processed data from multiple sources into AWS S3 cloud storage.
- Automated all jobs for extracting data from different data sources such as MySQL and pushing the result sets to the Hadoop Distributed File System.
Environment: Hadoop 2.4.0, Oracle 11g/10g, Python, Hortonworks, MapReduce, Hive, HBase, Flume, Impala, Sqoop, Pig, ZooKeeper, Tableau, Cassandra, Java, ETL, SQL Server, CentOS, UNIX, Linux, Windows 7/Vista/XP.
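A minimal sketch of the Kafka-to-HDFS Spark Streaming ingestion described in this section, assuming the Spark 1.x direct-stream Kafka integration (spark-streaming-kafka) available on this Hortonworks stack; the broker list, topic name and output path are hypothetical placeholders.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    // 30-second micro-batches.
    val ssc = new StreamingContext(new SparkConf().setAppName("KafkaToHdfs"), Seconds(30))

    // Hypothetical broker list and topic.
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
    val topics = Set("events")

    // Direct (receiver-less) stream; keys and values are plain strings.
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    // Write each non-empty micro-batch to a time-stamped HDFS directory.
    stream.map { case (_, value) => value }
      .foreachRDD { (rdd, time) =>
        if (!rdd.isEmpty()) {
          rdd.saveAsTextFile(s"hdfs:///data/streaming/events/batch-${time.milliseconds}")
        }
      }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The direct (receiver-less) approach tracks Kafka offsets per micro-batch itself, so no separate receiver or write-ahead log is needed; output format, partitioning and small-file compaction concerns are omitted for brevity.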
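Similarly, a minimal sketch of loading data to and from Cassandra with the DataStax Spark-Cassandra Connector, as referenced above; the keyspace, table and column names are hypothetical, and the example assumes the connector's RDD API (cassandraTable / saveToCassandra) from that era.

```scala
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

object CassandraLoad {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("CassandraLoad")
      .set("spark.cassandra.connection.host", "cassandra-node1") // hypothetical host

    val sc = new SparkContext(conf)

    // Read a Cassandra table (hypothetical keyspace/table) as an RDD of CassandraRow.
    val events = sc.cassandraTable("analytics", "raw_events")

    // Project the columns of interest and write them to a query-oriented table.
    events
      .map(row => (row.getString("event_id"), row.getString("event_type"), row.getLong("ts")))
      .saveToCassandra("analytics", "events_by_id", SomeColumns("event_id", "event_type", "ts"))

    sc.stop()
  }
}
```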
Confidential, Fort Mill, SC
Sr. Hadoop Developer
Responsibilities:
- Worked with technology and business groups on the Hadoop migration strategy.
- Researched and recommended a suitable technology stack for the Hadoop migration, considering the current enterprise architecture.
- Designed documents and specifications for near real-time data analytics using Hadoop and HBase.
- Installed Cloudera Manager 5.5 on the clusters.
- Used a 60-node cluster with the Cloudera Hadoop distribution on Amazon EC2.
- Developed ad-click based data analytics for keyword analysis and insights.
- Crawled public Facebook posts and tweets.
- Wrote MapReduce jobs with the Data Science team to analyze this data.
- Validated and made recommendations on Hadoop infrastructure and data center planning, considering data growth.
- Transferred data to and from the cluster using Sqoop and various storage media such as Informix tables and flat files.
- Developed MapReduce programs and Hive queries to analyze sales patterns and the customer satisfaction index over the data present in various relational database tables.
- Worked extensively on performance optimization, adopting or deriving appropriate design patterns for the MapReduce jobs by analyzing I/O latency, map time, combiner time, reduce time, etc.
- Developed Pig scripts in areas where extensive coding needed to be reduced.
- Developed UDFs for Pig as needed.
- Followed agile methodology for the entire project.
- Defined problems to find the right data, and analyzed results to make room for new projects.
Environment: Hadoop 2.4.0, HBase, HDFS, MapReduce, Pig, Java, Cloudera Manager 5.5, Amazon EC2 Classic.
Confidential, Minneapolis, MN
JAVA/HADOOP Developer
Responsibilities:
- Extracted data from MySQL, Oracle and Teradata through Sqoop, placed it in HDFS and processed it.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Responsible for managing data coming from different sources.
- Assisted in exporting analyzed data to relational databases using Sqoop
- Experience in using Apache Flume for collecting, aggregating and moving large amounts of data.
- Analyzed large data sets by running Hive queries and Pig scripts.
- Worked with the Data Science team to gather requirements for various data mining projects
- Involved in creating Hive tables, and loading and analyzing data using hive queries
- Migrated ETL processes from Oracle and MySQL to Hive to test easier data manipulation.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Involved in running Hadoop jobs to process millions of records of text data.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Involved in loading data from LINUX file system to HDFS. Responsible for managing data from multiple sources
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Worked on evaluating and comparing different tools for test data management with Hadoop.
- Developed shell scripts to automate routine tasks.
- Used Oozie and ZooKeeper operational services for coordinating the cluster and scheduling workflows.
- Scheduled automated tasks with Oozie for loading data into HDFS through Sqoop and pre-processing the data with Pig and Hive.
- Worked on Spring Security for user Authentication and Authorization using LDAP authentication provider.
- Wrote Java code for file reading and writing, with extensive use of the ArrayList and HashMap data structures.
- Implemented the MVC architecture using the Spring MVC framework.
- Composed application classes as Spring beans using Spring IoC/dependency injection.
- Designed and Developed server side components using Java, REST, WSDL.
- Managed and supported infrastructure.
- Monitored and debugged Hadoop jobs/applications running in production.
- Provided user support and application support on the Hadoop infrastructure.
- Reviewed ETL application use cases before onboarding them to Hadoop.
- Helped and directed the testing team to get up to speed on Hadoop application testing.
- Worked on installing a 20-node UAT Hadoop cluster.
Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Flume, ETL tools, Linux, Big Data, shell scripting, HBase, Spring, Java Collections, REST, WSDL, ZooKeeper and MySQL.
Confidential
JAVA Developer
Responsibilities:
- Followed SCRUM process of Agile Methodology.
- Used prototypes to demonstrate and verify the behavior of the system.
- Developed RESTful web services for other systems to interact with our system and secured the service with Spring Security OAuth 2.0.
- Used the Spring Core container module to separate application configuration and dependency specification from the actual code, injecting dependencies into objects.
- Developed and deployed the Spring AOP module to implement cross-cutting concerns such as logging, security and declarative transaction management.
- Used the Spring MVC framework to push messages onto the client's browser page.
- Configured Hibernate mapping files and Hibernate configuration files to connect and query the database.
- Implemented inline validations using a jQuery plugin to make the app more user friendly.
- Wrote automation scripts using Java and WebDriver/Selenium 2, and implemented automation scripts using Sauce Labs.
- Used JUnit framework to develop and execute the unit test cases.
Environment: J2EE, JDK, Spring MVC, Hibernate, JSP, Jenkins, web services, XSD, XML, jQuery, AJAX, Maven, Log4j, JUnit.
Confidential
Software Developer Intern
Responsibilities:
- Monitored scheduled, running, completed and failed sessions using the Workflow Monitor, and debugged mappings for failed sessions.
- Performed SQL Server service pack and Windows service pack upgrades.
- Implementation of SQL Logins, Roles and Authentication Modes as a part of Security Policies for various categories of users.
- Used various transformations such as Filter, Expression, Sequence Generator and Update.
- Designed and implemented a comprehensive backup plan and disaster recovery strategies.
- Implemented and Scheduled Replication process for updating our parallel servers.
- Scheduled daily, weekly and monthly reports for executives, business analysts and customer representatives across various categories and regions, based on business needs, using SQL Server Reporting Services (SSRS).
- Actively participated in interaction with users, team lead, DBAs and technical manager to fully understand the requirements of the system.
- Developed Stored Procedures and used them in Stored Procedure transformation for data processing and have used data migration tools.
- Designed and developed ETL mappings to extract master and transactional data from heterogeneous data feeds and load it.
- Used the Extract, Transform, Load (ETL) tooling of SQL Server to populate data from various data sources, creating packages for different data loading operations for the application.
- Developed scripts to migrate data from multiple sources.
Environment: Shell scripting, Linux, SQL, SSRS, ETL