Big Data Engineer / Hadoop Developer Resume
Phoenix
SUMMARY
- 7 years of IT experience across various industries, with 4 years of hands-on experience developing Big Data and Hadoop applications.
- Strong technical foundation with in-depth knowledge of Big Data (Hadoop), data reporting, data design, data analysis, data governance, data integration, and data quality.
- Experience in setting up, configuring, and monitoring Hadoop clusters on Cloudera and Hortonworks distributions.
- Deep, extensive knowledge of HDFS, Spark, MapReduce, Hive, HBase, Sqoop, YARN, and Oozie.
- Thorough knowledge of Hadoop architecture and its components, including HDFS, NameNode, DataNode, ApplicationMaster, ResourceManager, NodeManager, JobTracker, TaskTracker, and the MapReduce programming paradigm.
- Good understanding of Hadoop MR1 and MR2 (YARN) architectures.
- Experience in developing scalable solutions using NoSQL databases, including HBase and Cosmos DB.
- Experience writing distributed Scala code and building high-performance distributed systems with Spark for efficient big data processing.
- Proficient with the Hive data warehouse: creating tables, distributing data via partitioning and bucketing strategies, and writing and optimizing HiveQL queries.
- Experienced in performing analytics on structured data using Hive queries, joins, query tuning, and UDFs.
- Good experience with Hadoop file formats such as SequenceFile, RCFile, ORC, Avro, and Parquet.
- Experience using Spark SQL to convert schema-less data into more structured files for further analysis, and Spark Streaming to receive real-time data and store the streams in HDFS (see the sketch at the end of this summary).
- Good knowledge of Hadoop cluster architecture and cluster monitoring.
- In-depth understanding of data structures and algorithms.
- Experience in managing and reviewing Hadoop log files.
- Excellent understanding and knowledge of NoSQL databases like HBase.
- Involved in setting up standards and processes for Hadoop-based application design and implementation.
- Experience using Apache Sqoop to import and export data between relational database systems, HDFS, and Hive.
- Good working experience designing Oozie workflows that clean data and store it in Hive tables for quick analysis.
- Primarily responsible for designing, implementing, testing, and maintaining database solutions on Azure.
- Primarily involved in the data migration process on Azure, integrating with GitHub repositories and Jenkins.
- Hands-on experience with real-time streaming into HDFS using Kafka and Spark Streaming.
- Developed analytical components using Spark SQL and Spark Streaming.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Spark SQL in Scala.
- Good knowledge of streaming data from multiple sources into HDFS using Flume and Kafka.
- Knowledge of processing and analyzing real-time data streams using Kafka and HBase.
- Experience with Informatica PowerCenter Big Data Edition (BDE) for high-speed data ingestion and extraction.
- Hands-on experience with Amazon EMR, Cloudera (CDH4 & CDH5), and Hortonworks Hadoop distributions.
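A minimal sketch of the Spark SQL pattern referenced above (converting schema-less JSON into structured, partitioned Parquet). The HDFS paths and the event_date column are hypothetical placeholders, not details from any specific project:

```scala
import org.apache.spark.sql.SparkSession

object JsonToParquet {
  def main(args: Array[String]): Unit = {
    // Spark session; master and resources would come from spark-submit in practice
    val spark = SparkSession.builder()
      .appName("json-to-parquet")
      .getOrCreate()

    // Read schema-less JSON logs; Spark infers the schema on load
    val raw = spark.read.json("hdfs:///data/raw/clickstream/") // hypothetical path

    // Keep only well-formed records and write them out as Parquet,
    // partitioned by event date for efficient downstream Hive queries
    raw.filter(raw("event_date").isNotNull)                    // hypothetical column
      .write
      .partitionBy("event_date")
      .mode("overwrite")
      .parquet("hdfs:///data/curated/clickstream/")            // hypothetical path

    spark.stop()
  }
}
```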
TECHNICAL SKILLS
Hadoop/Big Data: HDFS, Spark, Kafka, MapReduce, Hive, Impala, HBase, Sqoop, Oozie, YARN, Azure, AWS
Programming languages: Python, Scala, SQL, PL/SQL.
Web Services & Technologies: WSDL, SOAP and RESTful.
ETL tools: Talend, Informatica (MDM, IDQ, TPT), Teradata.
Databases: Oracle, SQL Server, MySQL, DB2, NoSQL.
Application Servers: Apache Tomcat, WebLogic, WebSphere, JBoss.
Operating Systems: Windows, UNIX, Linux, Mac OS.
PROFESSIONAL EXPERIENCE
Confidential, Phoenix
Big Data Engineer / Hadoop Developer
Responsibilities:
- Developed and successfully deployed many modules on Spark, Hive, Sqoop, Shell, Pig, Scala, and Python. Set up data transfer between databases and HDFS with Sqoop and used Flume in parallel to stream log data from servers.
- Converted Hive and SQL queries to Spark using Spark RDDs in Scala and Python.
- Loaded log data and data from UI apps into the Hadoop data lake using Spark.
- Developed analytical components using Spark SQL and Spark Streaming.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Spark SQL in Scala.
- Developed Scala applications for loading/streaming data into NoSQL databases (MongoDB) and HDFS.
- Used Spark and Scala to develop machine learning algorithms that analyze clickstream data.
- Designed and developed Scala workflows to pull data from cloud-based systems and apply transformations to it.
- Designed and deployed multiple POCs using Scala on a YARN cluster, and compared Spark performance against Cassandra and SQL.
- Primarily responsible for designing, implementing, testing, and maintaining database solutions on Azure.
- Created pipelines from different sources (Hadoop, Teradata, Oracle, Linux) to ADLS Gen2 through Azure Data Factory by creating datasets and data pipelines.
- Performed data transformations in the Databricks cluster.
- Used Azure Data Factory to schedule the flows by connecting pipelines and Databricks notebooks.
- Created a service principal to authenticate the notebooks.
- Involved in loading data from the UNIX file system into HDFS.
- Generated Sqoop scripts for data ingestion into the Hadoop environment.
- Used the Spark API over YARN to perform data analytics on Hive data.
- Developed Spark scripts to perform data validation between source and destination (see the sketch after this list).
- Created and scheduled multiple tasks for incremental load into staging tables.
- Transformed data and performed data quality checks with Pig before loading it into HDFS.
- Created partitioned Hive external tables to hold the processed data produced by MapReduce.
- Ran analytical algorithms on HDFS data using MapReduce programs.
- Merged data from different sources using Hive joins and performed ad-hoc queries.
- Designed Hive generic UDFs to perform record-level business logic operations.
- Implemented data classification algorithms using MapReduce design patterns.
- Worked extensively with combiners, partitioning, and the distributed cache to improve the performance of MapReduce jobs.
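A minimal sketch of the kind of source-to-destination validation script mentioned above. The staging and curated Hive table names and the amount column are hypothetical; a real job would compare more columns and log the mismatches:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{count, lit, sum}

object SourceDestValidation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("source-dest-validation")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical source (staging) and destination (curated) Hive tables
    val source = spark.table("staging.transactions")
    val dest   = spark.table("curated.transactions")

    // Compare row counts and a simple checksum-style column aggregate
    val srcStats = source.agg(count(lit(1)).as("rows"), sum("amount").as("total")).first()
    val dstStats = dest.agg(count(lit(1)).as("rows"), sum("amount").as("total")).first()

    val rowsMatch  = srcStats.getLong(0) == dstStats.getLong(0)
    val totalMatch = srcStats.get(1) == dstStats.get(1)

    if (rowsMatch && totalMatch)
      println("Validation passed: row counts and totals match")
    else
      println(s"Validation FAILED: source=$srcStats destination=$dstStats")

    spark.stop()
  }
}
```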
Confidential, Phoenix
Big Data Engineer / Hadoop Developer
Responsibilities:
- Secured Hadoop Cluster by implementing Kerberos with Active Directory.
- Involved in data ingestion from relational databases into HDFS using Sqoop.
- Performed data cleansing and data enrichment using Pig Latin and HiveQL.
- Responsible for managing data from various sources.
- Worked on developing machine learning algorithms to analyze clickstream data using Spark and Scala.
- Strong experience in core Java, Scala, SQL, PL/SQL, and RESTful web services.
- Created Hive external tables for semantic data, loaded the data into them, and queried it using HiveQL (see the sketch after this list).
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Worked with different data sources, including Avro files, XML files, JSON files, SQL Server, and Oracle, to load data into Hive tables.
- Used Hive to expose data for further analysis and to transform files from different analytical formats into text files.
- Built business metrics as part of the target platform using HiveQL.
- Generated final reporting data in Tableau for testing by connecting to the corresponding Hive tables through the Hive ODBC connector.
- Involved in loading data from Hive tables into HBase to evaluate performance.
- Wrote Hive jobs to parse logs and structure them in tabular format for effective querying of the log data.
- Involved in creating Hive tables and loading data as text, Parquet, and ORC for use in Hive queries.
- Wrote a custom MapReduce program to merge the data from incremental Sqoop imports.
- Responsible for loading data from the UNIX file system into HDFS.
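A minimal illustration of the partitioned external Hive table pattern described above, driven from Spark with Hive support. The database, table, columns, and HDFS location are hypothetical placeholders; equivalent DDL could also be run directly in Hive:

```scala
import org.apache.spark.sql.SparkSession

object SemanticTables {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("semantic-hive-tables")
      .enableHiveSupport()
      .getOrCreate()

    // External table over curated ORC data, partitioned by load date (hypothetical schema)
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS semantic.web_events (
        |  user_id STRING,
        |  page    STRING,
        |  ts      TIMESTAMP
        |) PARTITIONED BY (load_date STRING)
        |STORED AS ORC
        |LOCATION 'hdfs:///data/semantic/web_events'""".stripMargin)

    // Register the partition directories that already exist on HDFS
    spark.sql("MSCK REPAIR TABLE semantic.web_events")

    // Example reporting metric computed over the partitioned data
    spark.sql(
      """SELECT load_date, COUNT(DISTINCT user_id) AS daily_users
        |FROM semantic.web_events
        |GROUP BY load_date""".stripMargin).show()

    spark.stop()
  }
}
```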
Confidential, Washington, DC
Hadoop Developer
Responsibilities:
- Built a workflow to export Cassandra column family data to CSV and loaded the data into Pig; used the Avro data serialization system to work with JSON data formats.
- Created and maintained technical documentation for launching Hadoop.
- Involved in managing deployments using XML scripts.
- Developed Spark SQL scripts and was involved in converting Hive UDFs to Spark SQL UDFs (see the sketch after this list).
- Performed operations on data stored in HDFS and other NoSQL databases in both batch-oriented and ad-hoc contexts.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing; involved in loading data from the Linux file system into HDFS.
- Ran batch processes using Pig scripts and developed Pig UDFs for data manipulation per business requirements.
- Accessed Hive tables from Java applications using JDBC to perform analytics.
- Used the partitioning pattern in MapReduce to move records into categories.
- Commissioned and decommissioned nodes in the Hadoop cluster.
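A minimal sketch of the Hive-UDF-to-Spark-SQL-UDF conversion mentioned above, assuming a hypothetical string-normalization function and hypothetical table/column names used only for illustration:

```scala
import org.apache.spark.sql.SparkSession

object UdfConversion {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-spark-udf")
      .enableHiveSupport()
      .getOrCreate()

    // Logic previously packaged as a Hive UDF, reimplemented as a plain Scala function
    val normalize: String => String =
      s => Option(s).map(_.trim.toLowerCase).getOrElse("")

    // Register it so Spark SQL queries can call it just like the original Hive UDF
    spark.udf.register("normalize_str", normalize)

    // Hypothetical table and column, for illustration only
    spark.sql("SELECT normalize_str(user_agent) AS ua FROM logs.raw_events LIMIT 10").show()

    spark.stop()
  }
}
```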
Confidential
Java Developer
Responsibilities:
- Actively involved in Analysis, Detail Design, Development, System Testing and User Acceptance Testing.
- Developed an intranet web application using J2EE architecture, with JSP to design the user interfaces, JSP tag libraries to define custom tags, and JDBC for database connectivity.
- Implemented the Struts (MVC) framework: developed the ActionServlet and ActionForm beans, configured the struts-config descriptor, and implemented the Validator framework.
- Extensively involved in database design work with Oracle Database and in building the application on the J2EE architecture.
- Integrated messaging with MQSeries classes for JMS, which provide an XML message-based interface; the application uses the JMS publish-and-subscribe model.
- Developed the EJB session bean that acts as a facade and accesses the business entities through their local home interfaces.
- Evaluated and worked with EJB's container-managed persistence strategy.
- Used web services (WSDL and SOAP) to obtain loan information from a third party, and used SAX and DOM XML parsers for data retrieval.