Sr. Hadoop Developer Resume
Plano, TX
PROFESSIONAL SUMMARY:
- Hadoop Developer with over 8 years of IT experience in Big Data, the Hadoop ecosystem, ETL and RDBMS-related technologies, with domain experience in Financial, Banking, Health Care, Retail and Non-profit organizations. Worked on the design, development, implementation, testing and deployment of software applications using a wide variety of technologies in all phases of the development life cycle.
- 4+ years of exclusive experience with Big Data technologies and the Hadoop stack. Strong experience working with Spark Core, Spark SQL and Spark Streaming using Scala and Python, as well as HDFS, MapReduce, Hive, Pig, Sqoop, Avro, Flume, Kafka, Oozie, Cassandra and HBase.
- 4+ years of UNIX shell scripting with hands-on experience in UNIX environments.
- Strong working experience with real-time streaming data using Spark and Kafka Connect.
- Very good knowledge of object-oriented concepts with complete software development life cycle experience: requirements gathering, conceptual design, analysis, detailed design, development and mentoring.
- Good understanding of distributed systems, HDFS architecture, and the internal workings of the MapReduce and Spark processing frameworks.
- Experienced in Big Data solutions and Hadoop ecosystem technologies; well versed in Big Data solution planning, design, development and POCs.
- Used Apache Avro to de-serialize data from compact binary format to JSON format (a sketch appears after this list).
- Proficient in using Cloudera Manager, an end-to-end tool for managing Hadoop operations in a Cloudera cluster.
- Extensive knowledge of the development, analysis and design of ETL methodologies across all phases of the data warehousing life cycle.
- Experience in deploying and managing Hadoop clusters using Cloudera Manager.
- More than one year of hands-on experience using the Spark framework with Scala and Python. Good exposure to performance tuning Hive queries and MapReduce jobs in the Spark framework.
- Experience in designing both time-driven and data-driven automated workflows using Oozie.
- Created and worked Sqoop jobs with incremental load to populate Hive external tables.
- Very good knowledge of and hands-on experience with Cassandra, Flume and Spark on YARN.
- Worked with various file formats such as delimited text files, clickstream log files, Apache log files, Avro files, JSON files and XML files.
- Good understanding of the compression techniques used in Hadoop processing, such as Gzip, Snappy and LZO.
- Expertise in importing and exporting data from/to traditional RDBMSs using Apache Sqoop.
- Tuned Pig and Hive scripts by understanding their joins, grouping and aggregation.
- Extensively worked on HiveQL and join operations, wrote custom UDFs, and have good experience optimizing Hive queries.
- Worked on various Hadoop distributions (Cloudera, Hortonworks, Amazon AWS) to implement and make use of them.
- Mastered the use of different columnar file formats such as RCFile, ORC and Parquet.
- Experience in data processing tasks such as collecting, aggregating and moving data from various sources using Apache Flume and Kafka.
- Hands-on experience installing, configuring and deploying Hadoop distributions in cloud environments (Amazon Web Services).
- Hands-on experience with NoSQL databases such as HBase, MongoDB and Cassandra.
- Extensive use of use case diagrams, use case models and sequence diagrams using Rational Rose.
- Proactive in time management and problem solving; self-motivated with good analytical skills.
- Analytical and organizational skills with the ability to multitask and meet deadlines.
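The Avro de-serialization noted above can be illustrated with a minimal sketch using the Avro GenericDatumReader API in Scala; the input file name is a hypothetical placeholder, not the exact production code.

```scala
// Minimal sketch: reading an Avro container file and printing each record as JSON.
// The file name "events.avro" is hypothetical.
import java.io.File
import org.apache.avro.file.DataFileReader
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}

object AvroToJson {
  def main(args: Array[String]): Unit = {
    val reader = new DataFileReader[GenericRecord](
      new File("events.avro"), new GenericDatumReader[GenericRecord]())
    try {
      while (reader.hasNext) {
        val record = reader.next()
        // GenericRecord.toString renders the record in JSON form.
        println(record.toString)
      }
    } finally {
      reader.close()
    }
  }
}
```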
TECHNICAL SKILLS:
Languages: Java (J2SE, J2EE), SQL, PL/SQL, C, C++, Python, Scala, C#.
Java Technologies: JSP, JSF, JDBC, Servlets, Web Services
Frameworks: Struts, Hibernate, Spring.
Big Data Ecosystem: HDFS, MapReduce, YARN, Hive, HBase, Impala, Zookeeper, Sqoop, Oozie, Apache Cassandra, Flume, Spark, HCatalog, Hue, Kafka, Avro, Ambari, Kerberos, AWS
Scripting Languages: Pig Latin, Python, UNIX/Linux shell scripting
Hadoop Clusters: Cloudera CDH 5, Hortonworks HDP 2.3/2.4, MapR
Web Technologies: HTML, HTML5, XML, CSS, JavaScript, jQuery, JSON, Bootstrap, SOAP, RESTful
Databases: Oracle, MS SQL Server, MySQL, MS Access, DB2.
Methodology: Agile, Scrum.
IDE: Eclipse, Net Beans, IntelliJ.
Operating Systems: Linux (Red Hat, CentOS, Ubuntu), UNIX, Mac OS, Sun Solaris and Windows
PROFESSIONAL EXPERIENCE:
Confidential, Plano, TX
Sr. Hadoop Developer
Responsibilities:
- Worked on analyzing the Hadoop cluster using different big data analytic tools including Spark, Kafka, Pig, Flume, Hive and MapReduce.
- Developed parser and loader MapReduce applications to retrieve data from HDFS and store it in HBase and Hive.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Scala (a sketch follows this list).
- Imported data from MySQL and Oracle into HDFS using Sqoop.
- Imported unstructured data into HDFS using Flume.
- Wrote MapReduce Java programs to analyze log data for large-scale data sets.
- Worked hands-on with the ETL process and was involved in developing Hive/Impala scripts for the extraction, transformation and loading of data into other data warehouses.
- Imported and exported data into HDFS using Sqoop and Kafka.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Scheduled automated tasks with Oozie for loading data into HDFS through Sqoop.
- Deployed the Hadoop cluster in pseudo-distributed and fully distributed modes.
- Involved in running ad-hoc queries through Pig Latin, Hive or Java MapReduce.
- Responsible for upgrading to Cloudera CDH 5.6.0 and MapReduce 2.0 with YARN in a multi-node clustered environment.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
- Wrote Apache Spark Streaming API code on the Big Data distribution in the active cluster environment.
- Used the Spark-Cassandra Connector to load data to and from Cassandra.
- Created data models using the Cassandra Query Language.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Developed custom MapReduce programs and User Defined Functions (UDFs) in Hive to transform large volumes of data per business requirements.
- Cross-examined data loaded into Hive tables against the source data in Oracle.
- Developed structured, efficient and error-free code for Big Data requirements using knowledge of Hadoop and its ecosystem.
- Stored, processed and analyzed huge data sets to derive valuable insights.
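A minimal sketch of the Spark Streaming ingestion from Kafka to HDFS described above, assuming the spark-streaming-kafka-0-10 integration on a CDH-era cluster; the topic name, broker address, consumer group and HDFS path are hypothetical placeholders.

```scala
// Minimal sketch: consume a Kafka topic with Spark Streaming and persist raw
// messages to HDFS. Topic, broker and paths are hypothetical.
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-to-hdfs")
    val ssc  = new StreamingContext(conf, Seconds(30))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "hdfs-loader",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams)
    )

    // Persist each micro-batch of raw message values to HDFS as text files.
    stream.map(_.value).saveAsTextFiles("hdfs:///data/streaming/events")

    ssc.start()
    ssc.awaitTermination()
  }
}
```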
Environment: Hadoop 2.4.0, Oracle 11g/10g, Python, Cloudera, MapReduce, Hive, HBase, Flume, Impala, Sqoop, Pig, Zookeeper, Tableau, Cassandra, Java, ETL, SQL Server, CentOS, UNIX, Linux, Windows 7/Vista/XP.
Confidential, Fort Mill, SC
Sr. Hadoop Developer
Responsibilities:
- Worked on analyzing the Hadoop cluster and different Big Data components including Pig, Hive, Spark, HBase, Kafka and Sqoop. Installed Hadoop, MapReduce and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Streamed data in real time using Spark with Kafka.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Scala.
- Designed documents and specifications for near real-time data analytics using Hadoop and HBase.
- Used a 60-node cluster with the Cloudera Hadoop distribution on Amazon EC2.
- Wrote MapReduce jobs with the Data Science team to analyze the data.
- Planned, deployed, monitored and maintained Amazon AWS cloud infrastructure consisting of multiple EC2 nodes and VMware VMs as required by the environment.
- Developed MapReduce programs and Hive queries to analyze sales patterns and customer satisfaction indexes over data present in various relational database tables.
- Implemented a project on streaming sales data from multiple online sources and physical locations.
- Stored the data in S3 buckets for staging and used EMR for further demand analytics.
- Stored processed data in Amazon Redshift and delivered it to QuickSight for business processes, fast analytics and visualization, which helps scale to hundreds of thousands of users and terabytes of data per organization.
- Followed agile methodology for the entire project.
- Experience with the AWS cloud computing platform and its many dimensions of scalability, including but not limited to: Amazon Kinesis, EMR (Elastic MapReduce), VPC (Virtual Private Cloud), EC2, load balancing with ELB, CloudFront, S3, RDS, messaging with SQS (and scalable non-AWS alternatives), auto scaling architectures, using EBS under high I/O requirements, and custom monitoring metrics/analysis/alarms via CloudWatch.
- Used the Spark API over Hadoop YARN to perform analytics on data in Hive (a sketch follows this list).
- Automated all jobs for extracting data from different data sources such as MySQL and pushing the result sets to the Hadoop Distributed File System.
- Defined problems to identify the right data and analyzed results to make room for new projects.
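A minimal sketch of the Spark-over-YARN analytics on Hive data mentioned above; the database, table and column names are hypothetical, and the job is assumed to be submitted with --master yarn.

```scala
// Minimal sketch: Spark SQL analytics over a Hive table on a YARN cluster.
// Table "sales.transactions" and its columns are hypothetical.
import org.apache.spark.sql.SparkSession

object SalesAnalytics {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sales-analytics")
      .enableHiveSupport() // read tables registered in the Hive metastore
      .getOrCreate()

    // Aggregate sales by region directly from the Hive table.
    val salesByRegion = spark.sql(
      """SELECT region, SUM(amount) AS total_sales
        |FROM sales.transactions
        |GROUP BY region
        |ORDER BY total_sales DESC""".stripMargin)

    salesByRegion.show(20)
    spark.stop()
  }
}
```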
Environment: AWS, Hadoop 2.4.0, HBase, HDFS, MapReduce, Pig, Java, Cloudera Manager.
Confidential, Minneapolis, MN
JAVA/HADOOP Developer
Responsibilities:
- Extracted files from Teradata through Sqoop, placed them in HDFS and processed them.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Responsible for managing data coming from different sources.
- Assisted in exporting analyzed data to relational databases using Sqoop.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN (a sketch appears after this list).
- Experience in using Apache Flume for collecting, aggregating and moving large amounts of data.
- Responsible for loading customer data and event logs from the Oracle database and Teradata into HDFS using Sqoop.
- Implemented Spark using Scala for faster testing and processing of data.
- Analyzed large data sets by running Hive queries and Pig scripts.
- Worked with the Data Science team to gather requirements for various data mining projects.
- Involved in creating Hive tables and loading and analyzing data using Hive queries.
- Migrated ETL processes from Oracle and MySQL to Hive to test easier data manipulation.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Involved in loading data from the Linux file system to HDFS; responsible for managing data from multiple sources.
- Developed shell scripts to automate routine tasks.
- Used Oozie and Zookeeper operational services for coordinating the cluster and scheduling workflows.
- Scheduled automated tasks with Oozie for loading data into HDFS through Sqoop and pre-processing the data with Pig and Hive.
- Wrote Java code for file reading and writing, with extensive use of the ArrayList and HashMap data structures.
- Implemented the MVC architecture using the Spring MVC framework.
- Composed the application classes as Spring beans using Spring IoC/dependency injection.
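A minimal sketch of the pair-RDD style Spark processing referenced above: counting events per customer from log files already landed in HDFS via Sqoop/Flume. The input path, record layout and output path are hypothetical.

```scala
// Minimal sketch: pair-RDD aggregation of event logs stored in HDFS.
// Assumes pipe-delimited records: customerId|eventType|timestamp (hypothetical).
import org.apache.spark.{SparkConf, SparkContext}

object EventCounts {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("event-counts"))

    val events = sc.textFile("hdfs:///data/event_logs/")

    val countsPerCustomer = events
      .map(_.split("\\|"))
      .filter(_.length >= 2)            // drop malformed records
      .map(fields => (fields(0), 1L))   // pair RDD keyed by customer id
      .reduceByKey(_ + _)

    countsPerCustomer.saveAsTextFile("hdfs:///data/event_counts")
    sc.stop()
  }
}
```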
Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Flume, ETL tools, Linux, Big Data, shell scripting, HBase, Spring, Java Collections, REST, WSDL, Zookeeper and MySQL.
Confidential
JAVA Developer
Responsibilities:
- Followed SCRUM process of Agile Methodology.
- Used prototypes to demonstrate and verify the behavior of the system.
- Developed RESTful web services for other systems to interact with our system and secured the services with Spring Security OAuth 2.0.
- Used the Spring Core Container module to separate application configuration and dependency specification from the actual code, injecting dependencies into the objects.
- Developed and deployed the Spring AOP module to implement cross-cutting concerns such as logging, security and declarative transaction management.
- Used JUnit framework to develop and execute the unit test cases.
Environment: J2EE, JDK, Spring MVC, Hibernate, JSP, Jenkins, Web services, XSD, XML, jQuery, AJAX, Maven, Log4j, JUnit.
Confidential
Software Developer Intern
Responsibilities:
- Monitored scheduled, running, completed and failed sessions using the workflow monitor, and debugged mappings for failed sessions.
- Performed SQL Server service pack and Windows service pack upgrades.
- Used various transformations such as Filter, Expression, Sequence Generator and Update.
- Used the Extract, Transform, Load (ETL) tooling of SQL Server to populate data from various data sources, creating packages for different data loading operations for the application.
- Developed scripts to migrate data from multiple sources.
Environment: Shell scripting, Linux, SQL, SSRS, ETL
