Hadoop Big Data/Spark Developer Resume
Charlotte, NC
SUMMARY:
- 6+ years of experience in IT and 3+ years of experience in Hadoop/Big Data ecosystems and Java technologies such as Confidential, MapReduce, Oozie, Impala, Apache Pig, Hive, HBase, Spark, Kafka and Sqoop.
- In-depth knowledge of Hadoop architecture and Hadoop daemons such as Name Node, Secondary Name Node, Data Node, Job Tracker and Task Tracker.
- Hands-on experience in writing ad-hoc queries for moving data from Confidential to Hive and analyzing the data using HiveQL.
- Experience in designing and developing POCs using Scala, deploying them on the YARN cluster, and comparing the performance of Spark with Hive and SQL/Teradata.
- Experience in writing MapReduce programs using Apache Hadoop for analyzing Big Data.
- Experience in writing Hadoop jobs for analyzing data using Pig Latin commands.
- Experience in importing and exporting data using Sqoop between relational database systems and Confidential.
- Working knowledge of NoSQL databases like HBase and Cassandra.
- Good knowledge of Amazon AWS concepts like EMR, EC2, EBS, S3 and RDS web services, which provide fast and efficient processing of Big Data.
- Experience in administrative tasks such as installing Hadoop and its ecosystem components such as Hive and Pig in distributed mode.
- Good knowledge of analyzing data in HBase using Hive and Pig.
- Experience in integrating BI tools like Tableau and pulling the required data into the BI tool's in-memory engine.
- Experience in launching EC2 instances in Amazon EMR using the console.
- Expertise in using Apache NiFi for multiple data transformations before loading to Confidential.
- Extended Hive and Pig core functionality by writing custom UDFs, UDAFs and UDTFs (a minimal UDF sketch follows this list).
- Strong data warehousing ETL experience using the DM Express ETL tool.
- Experience in working on Windows and UNIX/Linux platforms with technologies such as Big Data, SQL, XML, HTML, Core Java and Shell Scripting.
- Experience in using Apache Flume for collecting, aggregating and moving large amounts of data from application servers.
- Passionate about working in Big Data and analytics environments.
- Knowledge of reporting tools like Tableau, used for analytics on data in the cloud.
- Extensive experience with SQL, PL/SQL, Shell Scripting and database concepts.
- Experience with front end technologies like HTML, CSS and JavaScript.
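The custom Hive UDFs noted above are typically small classes packaged into a JAR and registered per session. Below is a minimal sketch, assuming the hive-exec dependency is on the classpath; the class name and the ZIP-normalization logic are hypothetical examples, not code from these projects.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Minimal Hive UDF: trims a free-text field and keeps its 5-character prefix.
// Hive locates evaluate() by reflection and calls it once per row.
class NormalizeZip extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.trim.take(5))
  }
}
```

Once packaged, the function would be registered in a Hive session with ADD JAR followed by CREATE TEMPORARY FUNCTION normalize_zip AS 'NormalizeZip'.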
TECHNICAL SKILLS:
Databases: DB2, MySQL, Oracle, MS SQL Server, Teradata
Languages: Core Java, Pig Latin, Scala, SQL, HiveQL, Shell Scripting and XML
APIs/Tools: NetBeans, Eclipse, MySQL Workbench, Visual Studio, DM Express
Web Technologies: HTML, XML, JavaScript, CSS
Big Data Ecosystem: Confidential, Spark, Pig, MapReduce, Hive, Impala, Kafka, Sqoop, Flume, HBase
Operating Systems: UNIX, Linux, Windows XP
Visualization Tools: Tableau, Zeppelin
Virtualization Software: VMware, Oracle Virtual Box.
Cloud Computing Services: AWS (Amazon Web Services).
PROFESSIONAL EXPERIENCE:
Confidential, Charlotte, NC
Hadoop Big Data/Spark Developer
Responsibilities:
- Analyzed the requirements to set up a cluster.
- Developed MapReduce programs in Java for parsing the raw data and populating staging tables.
- Created Hive queries to compare the raw data with EDW tables and perform aggregations.
- Imported and exported data into Confidential and Hive using Sqoop.
- Wrote Pig scripts to process the data.
- Developed and designed Hadoop, Spark and Java components.
- Developed Spark programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW; developed Spark code using Scala and Spark-SQL for faster processing and testing (see the Spark sketch after this list).
- Developed Pig Latin scripts to extract the data from the web server output files and load it into Confidential.
- Involved in HBase setup and storing data into HBase, which is used for further analysis.
- Explored Spark for improving the performance and optimization of the existing algorithms in Hadoop using Spark Context, Spark-SQL, DataFrames, pair RDDs and Spark on YARN, and converted Hive queries into Spark transformations using Spark RDDs.
- Created an application using Kafka that monitors consumer lag within Apache Kafka clusters and is used in production by multiple companies (a minimal lag check is sketched below).
- Developed Unix/Linux Shell Scripts and PL/SQL procedures.
- Worked towards creating real-time data streaming solutions using Apache Spark/Spark Streaming and Kafka.
- Performed optimizations on Spark/Scala code and diagnosed and resolved performance issues.
- Installed and configured Hive and wrote Hive UDFs.
- Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala.
- Involved in creating Hive tables, loading them with data and writing Hive queries in HiveQL, which run internally as MapReduce jobs.
- Loaded some of the data into Cassandra for fast retrieval.
- Involved in developing Spark code using Scala and Spark-SQL for faster testing and processing of data, and explored optimizing it using Spark Context, Spark-SQL, pair RDDs and Spark on YARN.
- Exported the analyzed data to relational databases using Sqoop for visualization and report generation by the BI team.
- Extracted files from Cassandra through Sqoop, placed them in Confidential and processed them.
- Implemented Big Data solutions on the Hortonworks distribution and the AWS cloud platform.
- Designed complete end-to-end Apache NiFi flows to connect to AWS and store the final output in Confidential.
- Developed Pig Latin scripts for handling data transformation.
- Extracted the data from MySQL into Confidential using Sqoop.
- Managed and monitored the Hadoop cluster using Cloudera Manager.
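As a hedged illustration of the Hive-to-Spark conversion and partitioned EDW loads described above, the sketch below uses Spark-SQL with Hive support; the database, table and column names are assumptions, not the actual schema.

```scala
import org.apache.spark.sql.SparkSession

object RefineRawData {
  def main(args: Array[String]): Unit = {
    // Hive-enabled session so spark.sql and saveAsTable go through the metastore.
    val spark = SparkSession.builder()
      .appName("RefineRawData")
      .enableHiveSupport()
      .getOrCreate()

    // A HiveQL-style aggregate expressed through Spark SQL (placeholder schema).
    val refined = spark.sql(
      """SELECT account_id, txn_date, SUM(amount) AS total_amount
        |FROM staging.raw_transactions
        |GROUP BY account_id, txn_date""".stripMargin)

    // Store the refined data in a date-partitioned table in the EDW.
    refined.write
      .mode("overwrite")
      .partitionBy("txn_date")
      .format("parquet")
      .saveAsTable("edw.daily_account_totals")

    spark.stop()
  }
}
```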
Environment: Hadoop, Cloudera distribution, Hortonworks distribution, AWS, EMR, Azure cloud platform, Confidential, MapReduce, DocumentDB, Kafka, Pig, Hive, Sqoop, Flume, Oozie, ZooKeeper, Core Java, Impala, HiveQL, Spark, UNIX/Linux Shell Scripting.
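The consumer-lag monitoring mentioned above resembles tools such as Burrow; the sketch below shows only the core idea, computing lag as log-end offset minus committed offset via Kafka's AdminClient. The broker address, group id and group name are placeholders.

```scala
import java.util.Properties
import org.apache.kafka.clients.admin.AdminClient
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.serialization.StringDeserializer
import scala.collection.JavaConverters._

object LagCheck {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker:9092") // placeholder broker
    props.put("group.id", "lag-check-probe")
    props.put("key.deserializer", classOf[StringDeserializer].getName)
    props.put("value.deserializer", classOf[StringDeserializer].getName)

    val admin = AdminClient.create(props)
    // Committed offsets of the monitored group (group name is a placeholder).
    val committed = admin.listConsumerGroupOffsets("orders-consumer")
      .partitionsToOffsetAndMetadata().get().asScala

    // Log-end offsets for the same partitions, read with a throwaway consumer.
    val consumer = new KafkaConsumer[String, String](props)
    val ends = consumer.endOffsets(committed.keys.toSeq.asJava).asScala

    committed.foreach { case (tp, meta) =>
      println(s"$tp lag=${ends(tp) - meta.offset()}")
    }
    consumer.close(); admin.close()
  }
}
```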
Confidential, Irving, TX
Big Data Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in loading data from the Linux file system to Confidential.
- Working experience with Confidential admin shell commands.
- Experience in ETL methods for data extraction, transformation and loading in corporate-wide ETL solutions and data warehouse tools for reporting and data analysis.
- Understanding/knowledge of Hadoop architecture and various components such as Confidential, Job Tracker, Task Tracker, Name Node and Data Node concepts.
- Developed Kafka producers and consumers, HBase clients, Apache Spark and Hadoop MapReduce jobs, along with components on Confidential and Hive.
- Used the Spark Streaming API with Kafka to build live dashboards (see the streaming sketch after this list); worked on RDD transformations and actions, Spark Streaming, pair RDD operations, checkpointing, and SBT.
- Used Kafka to transfer data from different data systems to Confidential.
- Migrated complex MapReduce programs into Spark RDD transformations and actions.
- Involved in developing a Spark Streaming application for one of the data sources using Scala and Spark, applying the required transformations.
- Developed a script in Scala to read all the Parquet tables in a database and write them out as JSON files, and another script to expose them as structured tables in Hive.
- Implemented Spark RDD transformations to map business analysis logic and applied actions on top of those transformations.
- Used Spark to parse XML files, extract values from tags and load them into multiple Hive tables.
- Experience with different Hadoop distributions such as Cloudera and Hortonworks.
- Hands-on experience with Cassandra.
- Analyzed large data sets by running Hive queries and Pig scripts.
- Created Hive tables and was involved in data loading and writing Hive UDFs.
- Hands-on use of Sqoop to import and export data between RDBMS and Confidential.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Used Sqoop, Avro, Hive, Pig, Java and MapReduce daily to develop ETL, batch processing and data storage functionality.
- Supported implementation and execution of MapReduce programs in a cluster environment.
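A minimal sketch of the Spark Streaming plus Kafka dashboard pattern referenced above, using the direct stream from the spark-streaming-kafka-0-10 integration; the broker, topic, checkpoint path and per-page counting logic are assumptions for illustration.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object ClickStream {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("ClickStream"), Seconds(10))
    ssc.checkpoint("hdfs:///checkpoints/clickstream") // placeholder path

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker:9092",           // placeholder broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "dashboard",
      "auto.offset.reset" -> "latest")

    // Direct stream from a placeholder topic; each record value is one CSV event line.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("clicks"), kafkaParams))

    // Count events per page over each 10-second batch as a simple dashboard feed.
    stream.map(r => (r.value.split(',')(0), 1L))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```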
Environment: Hadoop, MapReduce, Hive, Pig, HBase, Sqoop, Kafka, Cassandra, Flume, Java, SQL, Cloudera Manager, Eclipse, Unix Shell Scripting, YARN.
Confidential, NJ
Hadoop Developer
Responsibilities:
- Wrote MapReduce code to parse the data from various sources and store the parsed data into HBase and Hive.
- Integrated MapReduce with HBase to import bulk amounts of data into HBase using MapReduce programs.
- Imported data from different relational data sources like Oracle and Teradata to Confidential using Sqoop.
- Worked on stand-alone as well as distributed Hadoop applications.
- Used Scala to convert Hive/SQL queries into RDD transformations in Apache Spark (see the sketch after this list).
- Used Oozie and Zookeeper to automate the flow of jobs and coordination in the cluster, respectively.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Extensive knowledge of Pig scripts using bags and tuples, and of Pig UDFs to pre-process the data for analysis.
- Implemented Amazon EMR for processing Big Data across a Hadoop cluster of virtual servers.
- Used Teradata in building the Hadoop project as well as the ETL project.
- Developed several shell scripts that act as wrappers to start Hadoop jobs and set the configuration parameters.
- Involved in writing queries using Impala for better and faster processing of data.
- Involved in developing Impala scripts for extraction, transformation and loading of data into the data warehouse.
- Experienced in migrating HiveQL into Impala to minimize query response time.
- Involved in moving all log files generated from various sources to Confidential for further processing through Flume.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging the data in Confidential for further analysis.
- Developed testing scripts in Python, prepared test procedures, analyzed test result data and suggested improvements to the system and software.
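To illustrate the Hive/SQL-to-RDD conversion noted above, here is a minimal sketch; the input path, delimiter and column layout are assumptions. The HiveQL being rewritten would be roughly: SELECT category, COUNT(*) FROM events WHERE status = 'OK' GROUP BY category.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object HiveToRdd {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HiveToRdd"))

    // Placeholder input: tab-delimited rows of (id, category, status).
    val events = sc.textFile("hdfs:///data/events") // placeholder path

    val counts = events
      .map(_.split('\t'))
      .filter(cols => cols.length >= 3 && cols(2) == "OK") // WHERE status = 'OK'
      .map(cols => (cols(1), 1L))                          // GROUP BY category ...
      .reduceByKey(_ + _)                                  // ... COUNT(*)

    counts.saveAsTextFile("hdfs:///data/event_counts")     // placeholder path
    sc.stop()
  }
}
```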
Environment: Confidential, MapReduce, Python, CDH5, HBase, NoSQL, Hive, Pig, Hadoop, Sqoop, Impala, YARN, Shell Scripting, Ubuntu, Red Hat Linux.
Confidential
Java / Hadoop Developer
Responsibilities:
- Involved in design and development phases of Software Development Life Cycle (SDLC) using Scrum methodology.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
- Involved in installing and configuring the Hadoop ecosystem and Cloudera Manager using the CDH4 distribution.
- Collected log data from web servers and integrated it into Confidential using Flume.
- Involved in creating Hive tables, loading them with data and writing Hive queries that invoke and run MapReduce jobs in the backend.
- Worked on analyzing the Hadoop cluster and different Big Data analytic tools including MapReduce, Hive and Spark.
- Developed MapReduce programs to parse the raw data and store the pre-aggregated data in partitioned tables (see the sketch after this list).
- Involved in the end-to-end process of Hadoop cluster installation, configuration and monitoring.
- Responsible for building scalable distributed data solutions using Hadoop, and involved in submitting and tracking MapReduce jobs using the Job Tracker.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Worked with HBase in creating tables to load large sets of semi-structured data coming from various sources.
- Created design documents and reviewed with team in addition to assisting the business analyst / project manager in explanations to line of business.
- Responsible for understanding the scope of the project and requirement gathering.
- Involved in analysis, design, construction and testing of the application.
- Developed the web tier using JSP to show account details and summary.
- Designed and developed the UI using JSP, HTML, CSS and JavaScript.
- Used Tomcat web server for development purpose.
- Involved in creation of Test Cases for JUnit Testing.
- Used Oracle as the database and Toad for query execution, and was involved in writing SQL scripts and PL/SQL code for procedures and functions.
- Developed the application using Eclipse and used Maven as the build and deployment tool.
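A hedged sketch of the parse-and-pre-aggregate MapReduce pattern described above, written here in Scala against the Hadoop MapReduce API (the original work was in Java; the tab-delimited layout and column meanings are assumptions):

```scala
import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
import org.apache.hadoop.mapreduce.{Mapper, Reducer}
import scala.collection.JavaConverters._

// Mapper: parse raw tab-delimited lines and emit (accountId, 1) for each valid row.
class ParseMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
  private val one = new IntWritable(1)
  private val outKey = new Text()

  override def map(key: LongWritable, value: Text,
                   ctx: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit = {
    val cols = value.toString.split('\t')
    if (cols.length >= 2) {   // skip malformed rows
      outKey.set(cols(1))     // placeholder: account id column
      ctx.write(outKey, one)
    }
  }
}

// Reducer: pre-aggregate the per-account counts before loading the partitioned tables.
class SumReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
  override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                      ctx: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
    val sum = values.asScala.map(_.get).sum
    ctx.write(key, new IntWritable(sum))
  }
}
```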
Environment: Hadoop, HBase, Confidential, Pig Latin, Sqoop, Hive, Java, J2EE Servlets, JSP, JUnit, AJAX, XML, JavaScript, Maven, Eclipse, Apache Tomcat and Oracle.