Big Data Engineer Resume
Phoenix, AZ
SUMMARY:
- 6 years of professional experience in the IT industry, including 2 years of experience in analysis, architectural design, prototyping, development, integration, and testing of applications using Java/J2EE technologies, and 3+ years of experience in Big Data analytics as a Hadoop Developer.
- Experienced in building highly scalable Big Data solutions using Hadoop with multiple distributions (Cloudera, Hortonworks) and ingestion/NoSQL platforms such as Flume, HBase, and Cassandra.
- Strong knowledge and development experience in the Hadoop and Big Data ecosystem, including MapReduce, HDFS, Hive, Pig, Spark, HBase, Zookeeper, Kafka, Scala, Sqoop, Flume, Oozie, and Impala.
- Technical skills encompass Java, J2EE (JDBC, Servlets, EJB, JMS), Web Services (SOAP, RESTful), Spring and Hibernate frameworks, HTML5, DHTMLX, JSON, jQuery, Apache Log4j, Maven, shell scripting, and JavaScript.
- Extensive working knowledge of analysis, design, development, documentation, and deployment, handling projects in domains such as Health Care Insurance, Telecom, Automobile, and Banking & Financial Services.
- Hands-on experience in developing, debugging, and tuning Spark jobs in a Hadoop environment.
- Experienced in developing enterprise applications using open-source technologies such as Maven, Log4j, and JUnit.
- Experience using SQL, Scala, Java, and Pig to manipulate data.
- Worked with relational database systems (RDBMS) such as MySQL, Teradata, and Oracle, and NoSQL database systems such as HBase and Cassandra.
- Ability to design and support the development of a data platform for data processing (ingestion and transformation) and a data repository using Big Data technologies from the Hadoop stack, including HDFS, Spark, Scala, Hive, and Impala.
- Experience developing Pig Latin and HiveQL scripts for data analysis and ETL, and extending their default functionality by writing User Defined Functions (UDFs) for data-specific processing.
- Proficient in Big Data ingestion tools such as Flume, Kafka, Spark Streaming, and Sqoop for streaming and batch data ingestion.
- Experienced in building and administering Hadoop clusters with HDFS, Spark, Kafka, Zookeeper, Impala, Hive, YARN, Hue, and Oozie.
- Experience in developing applications using Spring Framework 3.2.2, working with Spring modules such as the core container, application context, AOP, JDBC, ORM, and web modules.
- Strong SQL and Hive knowledge of query processing, optimization and execution, query performance analysis (EXPLAIN plans), and database tooling.
- Experience with the Oozie Workflow Engine, running workflow jobs with actions that execute Hadoop MapReduce and Pig jobs.
- Proficiency in UNIX/Linux fundamentals, including UNIX scripting and administration.
- Working knowledge of Hadoop HDFS admin shell commands.
- Experience in designing and developing enterprise applications using Java/J2EE technologies on Hadoop (MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Spark), J2EE tools and technologies such as JDBC, Spring MVC, Hibernate, and XML, and IDEs such as Eclipse and IntelliJ.
- Good experience in the development and support of Java/J2EE web applications, with emphasis on OOP-based web forms, business logic, and database access components.
- Good understanding of distributed systems, HDFS architecture, and the internal workings of the MapReduce and Spark processing frameworks.
- Ability to design, develop, deploy, and support solutions that leverage the client's big data platform using the Agile Scrum methodology.
TECHNICAL SKILLS:
Big Data Technologies: HDFS, MapReduce, Pig, Hive, HBase, Sqoop, Flume, Oozie, Hadoop Streaming, Zookeeper, AWS, Kafka, Impala, Apache Spark, Apache Storm, and YARN
Hadoop Distributions: Cloudera (CDH4/CDH5), Hortonworks
Programming Languages: Java, C, C++, Scala, SQL, Pig Latin, HQL
IDE Tools: Eclipse, IntelliJ
Web Technologies: HTML5, JavaScript, jQuery, JSP, JSON, XML.
Web Services: SOAP, REST, WSDL.
Operating Systems: Windows (XP, 7, 8), UNIX, Linux, CentOS
Tools: Adobe, SQL Developer, Flume, Sqoop
J2EE Technologies: JSP, Java Bean, JDBC
Databases: Oracle, MySQL, DB2, Teradata, NoSQL (HBase, Cassandra)
PROFESSIONAL EXPERIENCE:
Confidential, Phoenix, AZ
Big Data Engineer
Responsibilities:
- Performed data ingestion into the Indie-Data Lake using an open-source Hadoop distribution to process structured, semi-structured, and unstructured datasets, using open-source Apache tools such as Flume and Sqoop to load data into the Hive environment.
- Developed Spark code using Scala and Spark SQL for faster testing and data processing (see the Spark SQL sketch after this list).
- Performed batch processing of data sources using Apache Spark.
- Developed predictive analytics using the Apache Spark Scala API.
- Developed MapReduce jobs with the Java API to parse raw data and store the refined data.
- Developed Kafka producers and consumers, HBase clients, and Spark jobs using the Scala API, along with components on HDFS and Hive.
- Automated and scheduled Sqoop jobs using UNIX shell scripts.
- Involved in identifying job dependencies to design Oozie workflows and YARN resource management.
- Responsible for managing existing data extraction jobs while also playing a vital role in building new data pipelines from various structured and unstructured sources into Hadoop. Worked on a product team using the Agile Scrum methodology to design, develop, deploy, and support solutions that leverage the client's big data platform.
- Integrated Apache Spark with Kafka to perform web analytics; uploaded clickstream data from Kafka to HDFS, HBase, and Hive by integrating with Spark (see the Kafka/Spark Streaming sketch after this list).
- Designed and coded from specifications; analyzed, evaluated, tested, debugged, documented, and implemented complex software applications.
- Developed Sqoop scripts to extract data from DB2 EDW source databases onto HDFS.
- Tuned Hive and Pig scripts to improve performance and resolved performance issues, with an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
- Created partitions and buckets based on state for further processing using bucket-based Hive joins (see the partitioning/bucketing sketch after this list).
- Implemented Cloudera Manager on the existing cluster.
- Extensively worked with the Cloudera Distribution of Hadoop (CDH 4.x and 5.x).
- Debugged and fixed incorrect or missing data issues for both Oracle and Teradata databases.
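The Spark SQL bullet above refers to jobs of roughly the shape sketched below. This is a minimal illustration only, assuming a Hive-backed table; the application name, table name (transactions), columns, and output path are illustrative, not the actual project schema.

```scala
import org.apache.spark.sql.SparkSession

object TransactionSummary {
  def main(args: Array[String]): Unit = {
    // Build a SparkSession with Hive support; on the cluster this runs under YARN.
    val spark = SparkSession.builder()
      .appName("TransactionSummary")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical Hive table previously ingested via Flume/Sqoop into the data lake.
    val transactions = spark.read.table("transactions")
    transactions.createOrReplaceTempView("txn")

    // Spark SQL aggregation, analogous to the HiveQL it replaces.
    val dailyTotals = spark.sql(
      """SELECT txn_date, merchant, SUM(amount) AS total_amount
        |FROM txn
        |GROUP BY txn_date, merchant""".stripMargin)

    dailyTotals.write.mode("overwrite").parquet("/data/summaries/daily_totals")
    spark.stop()
  }
}
```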
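A minimal sketch of the Kafka-to-Spark-Streaming clickstream ingestion described above, assuming the spark-streaming-kafka-0-10 integration; the broker address, topic name, batch interval, and HDFS landing path are illustrative assumptions.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object ClickstreamIngest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ClickstreamIngest")
    val ssc = new StreamingContext(conf, Seconds(30))

    // Kafka consumer configuration; broker list and topic name are illustrative.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "clickstream-ingest",
      "auto.offset.reset" -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("clickstream"), kafkaParams))

    // Land each micro-batch of raw click events on HDFS for downstream Hive/HBase loads.
    stream.map(_.value())
      .foreachRDD { (rdd, time) =>
        if (!rdd.isEmpty())
          rdd.saveAsTextFile(s"/data/clickstream/raw/batch_${time.milliseconds}")
      }

    ssc.start()
    ssc.awaitTermination()
  }
}
```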
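A sketch of state-based partitioning and bucketing along the lines of the Hive bullet above, expressed here with Spark's DataFrameWriter API rather than raw HiveQL DDL; the source table, column names, and bucket count are assumptions.

```scala
import org.apache.spark.sql.SparkSession

object CustomerTableSetup {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CustomerTableSetup")
      .enableHiveSupport()
      .getOrCreate()

    // Read the staging data (table and column names are illustrative).
    val customers = spark.read.table("customers_staging")

    // Partition by state and bucket by customer_id so that later joins on
    // customer_id can take advantage of co-located buckets.
    customers.write
      .partitionBy("state")
      .bucketBy(32, "customer_id")
      .sortBy("customer_id")
      .format("parquet")
      .mode("overwrite")
      .saveAsTable("customers_bucketed")
  }
}
```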
Environment: HDFS, Java API, Pig, Hive, Sqoop, Oozie, HBase, Kafka, Spark Streaming, Scala, YARN, IntelliJ, Spring, Shell Scripting, Cloudera.
Confidential
Hadoop Developer
Responsibilities:
- Worked with Hive complex data types and was involved in bucketing.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala, with good experience using Spark Shell and Spark Streaming.
- Experienced in Core Java with a strong understanding of multithreading, collections, concurrency, exception handling, and object-oriented analysis, design, and development.
- Developed Spark code using Scala and Spark SQL for faster testing and data processing.
- Imported millions of structured records from relational databases using Sqoop, processed them with Spark, and stored the data in HDFS in CSV format (see the sketch after this list).
- Used Spark SQL to process large volumes of structured data.
- Responsible for the analysis, design, and testing phases and for documenting technical specifications.
- Experienced with the RDD architecture, implementing Spark operations on RDDs, and optimizing transformations and actions in Spark.
- Developed UDFs using both DataFrames/SQL and RDDs in Spark for data aggregation queries, and wrote results back to OLTP systems through Sqoop (see the UDF sketch after this list).
- Converted Hive/SQL queries into Spark transformations using Spark RDDs.
- Loaded data from the Linux file system to HDFS and vice versa.
- Implemented Spark RDD transformations and actions to migrate MapReduce algorithms.
- Created a complete processing engine based on Cloudera's distribution.
- Migrated HiveQL queries to Impala to minimize query response time.
- Developed and ran MapReduce jobs on YARN and Hadoop clusters to produce daily and monthly reports per user needs.
- Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Implemented a log producer in Scala that watches application logs, transforms incremental log entries, and sends them to a Kafka- and Zookeeper-based log collection platform (see the log producer sketch after this list).
- Worked on NoSQL databases including HBase and Cassandra.
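A minimal sketch of reading Sqoop-landed CSV data from HDFS and expressing a HiveQL-style aggregation as Spark DataFrame transformations, as mentioned above; the HDFS path, column names, and output location are assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, substring, sum}

object OrdersAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("OrdersAggregation")
      .getOrCreate()

    // CSV files landed on HDFS by a prior Sqoop import; path and schema are
    // illustrative, not the project's actual layout.
    val orders = spark.read
      .option("inferSchema", "true")
      .csv("/data/sqoop/orders")
      .toDF("order_id", "customer_id", "order_date", "amount")

    // Equivalent of a HiveQL GROUP BY, expressed as DataFrame transformations:
    //   SELECT customer_id, substr(order_date, 1, 7), SUM(amount)
    //   FROM orders GROUP BY customer_id, substr(order_date, 1, 7)
    val monthly = orders
      .withColumn("order_month", substring(col("order_date"), 1, 7))
      .groupBy("customer_id", "order_month")
      .agg(sum("amount").as("total_amount"))

    monthly.write.mode("overwrite").csv("/data/reports/monthly_order_totals")
    spark.stop()
  }
}
```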
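A small illustration of a Spark DataFrame UDF of the kind mentioned above; the masking logic, table names, and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object MaskAccountNumbers {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("MaskAccountNumbers")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // A UDF that masks all but the last four characters of an account number.
    val mask = udf((acct: String) =>
      if (acct == null || acct.length <= 4) acct
      else "*" * (acct.length - 4) + acct.takeRight(4))

    val accounts = spark.read.table("accounts_staging")
    val masked = accounts.withColumn("account_masked", mask($"account_number"))

    masked.write.mode("overwrite").saveAsTable("accounts_masked")
    spark.stop()
  }
}
```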
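A sketch of the Scala log producer described above, assuming the plain Kafka producer client; the broker address, topic, and log file path are illustrative, and a production version would also handle log rotation and delivery failures.

```scala
import java.io.RandomAccessFile
import java.util.Properties

import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object LogProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092")  // illustrative broker list
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    val log = new RandomAccessFile("/var/log/app/application.log", "r")  // hypothetical log path
    var position = log.length()  // start tailing from the current end of file

    // Poll the log file for newly appended lines and forward each one to Kafka.
    while (true) {
      if (log.length() > position) {
        log.seek(position)
        var line = log.readLine()
        while (line != null) {
          producer.send(new ProducerRecord[String, String]("app-logs", line))
          line = log.readLine()
        }
        position = log.getFilePointer
      }
      Thread.sleep(1000)
    }
  }
}
```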
Environment: Hadoop, Hive, Pig, Hortonworks (HDP 2.5/2.6), Cassandra, HDFS, DB2, Sqoop, Oozie, Spark, Scala, Linux, MySQL, Oracle.
Confidential
Hadoop/Java Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
- Developed several custom user-defined functions (UDFs) in Hive and Pig using Java.
- Installed and configured Hadoop MapReduce, HDFS, and Cassandra, and developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Worked with Hive complex data types and was involved in bucketing.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala, with good experience using Spark Shell and Spark Streaming.
- Developed Spark code using Scala and Spark SQL for faster testing and data processing.
- Imported millions of structured records from relational databases using Sqoop, processed them with Spark, and stored the data in HDFS in CSV format.
- Experienced with the RDD architecture, implementing Spark operations on RDDs, and optimizing transformations and actions in Spark.
- Developed UDFs using both DataFrames/SQL and RDDs in Spark for data aggregation queries, and wrote results back to OLTP systems through Sqoop.
- Developed REST-based web services to facilitate communication between clients and servers.
- Used Eclipse as the IDE; configured and deployed the application onto the WebLogic application server using Maven build scripts to automate the build and deployment process.
- Implemented Spark RDD transformations and actions to migrate MapReduce algorithms (see the RDD sketch after this list).
- Worked on NoSQL databases including HBase, MongoDB, and Cassandra.
- Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Implemented a log producer in Scala that watches application logs, transforms incremental log entries, and sends them to a Kafka- and Zookeeper-based log collection platform.
- Worked on SequenceFiles, RCFiles, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
- Involved in a multi-tiered J2EE design utilizing an MVC architecture (Struts Framework), Hibernate, and EJB, deployed on WebSphere Application Server and connecting to an Oracle database.
- Created Sqoop jobs and Pig and Hive scripts for data ingestion from relational databases to compare with historical data.
- Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios (see the HBase client sketch after this list).
- Submitted and tracked MapReduce jobs using JobTracker.
- Used Pig as an ETL tool for transformations, event joins, filtering, and pre-aggregations.
- Implemented Hive generic UDFs to apply business logic.
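A minimal sketch of migrating a MapReduce-style count to Spark RDD transformations, as referenced above; the input path and tab-separated record layout are assumptions.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ErrorCountByCode {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ErrorCountByCode")
    val sc = new SparkContext(conf)

    // Raw web log lines on HDFS; the path and field positions are illustrative.
    val lines = sc.textFile("/data/weblogs/raw")

    // map + reduceByKey replaces the old MapReduce mapper/reducer pair:
    // the mapper emitted (statusCode, 1) and the reducer summed the counts.
    val countsByCode = lines
      .map(_.split("\t"))
      .filter(_.length > 3)
      .map(fields => (fields(3), 1L))
      .reduceByKey(_ + _)

    countsByCode.saveAsTextFile("/data/weblogs/status_code_counts")
    sc.stop()
  }
}
```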
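A small sketch of loading a row into HBase with the standard client API, as referenced in the HBase bullet above; the table name, column family, row key, and values are hypothetical.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

object HBaseLoader {
  def main(args: Array[String]): Unit = {
    // Connection picks up hbase-site.xml from the classpath on the cluster.
    val conf = HBaseConfiguration.create()
    val connection = ConnectionFactory.createConnection(conf)
    val table = connection.getTable(TableName.valueOf("portfolio_events"))

    try {
      // One Put per row key; each column lives under the "cf" column family.
      val put = new Put(Bytes.toBytes("account-42#2017-06-01"))
      put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("source"), Bytes.toBytes("unix-feed"))
      put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("amount"), Bytes.toBytes("1250.75"))
      table.put(put)
    } finally {
      table.close()
      connection.close()
    }
  }
}
```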
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Sqoop, Flume, Teradata, Zookeeper, Kafka, Impala, Apache Spark, Hortonworks, HBase, YARN, Java, Spring, MySQL, Oracle, SQL Server, Agile Methodology, Eclipse, WebLogic Application Server, SOAP, RESTful Web Services, JDBC.