Hadoop/Spark/Scala Developer Resume
OH
SUMMARY:
- Pradeep has 12 years of overall IT experience across a variety of industries, including around three (3) years of hands-on experience in Big Data technologies.
- Involved in Software Development Life Cycle (SDLC) phases including Analysis, Design, Implementation, Testing and Maintenance.
- Deep understanding of Hadoop and Spark architecture, with experience in Hadoop components such as NameNode and DataNode, MapReduce concepts and the HDFS framework.
- Experience with Big Data and distributed systems using Apache Spark, and with developing solutions in Scala.
- Highly capable of processing large structured, semi-structured and unstructured datasets supporting Big Data applications.
- Experience working with Relational Database Management Systems (RDBMS) such as Oracle and SQL Server, and NoSQL databases such as HBase.
- Experience with Big Data messaging services such as Kafka Connect, and with stream processing using Apache Kafka.
- Experience working with Big Data streaming frameworks such as Spark Streaming.
- Experience with Hadoop ecosystem components such as MapReduce, YARN, ZooKeeper, Hive, Pig, Sqoop, Flume and Oozie.
- Good command of scripting languages such as Unix shell.
- Strong experience in database development, and in tuning and debugging complex applications using SQL, PL/SQL and Java.
- Expertise in schema design and data model development, with proven ability to work with complex data.
- Knowledge of file formats such as CSV, Avro, Parquet and XML, and of compression methodologies.
- Experience in developing Tableau reports for data visualization.
- Highly motivated team player with the ability to work independently and adapt quickly to new and emerging technologies.
- Creatively communicate and present models to business customers and executives, utilizing a variety of formats and visualization methodologies.
TECHNICAL SKILLS:
- Apache Spark
- Scala Programming
- Apache Flume
- Apache Sqoop
- Apache Kafka
- Hadoop HDFS
- Apache Hive
- Apache HBase
- ZooKeeper
- Apache Pig
- Tableau
- SQL and PL/SQL
- Unix Shell scripting
- Core Java
PROFESSIONAL EXPERIENCE:
Confidential, OH
HADOOP/SPARK/SCALA DEVELOPER
Responsibilities:
- Built, maintained and tested data ETL pipelines using Big Data technologies.
- Worked on a live 60-node Hadoop cluster running HDP.
- Performed data ingestion using Sqoop import, loading customer profile, customer payment and credit data from legacy warehouses into HDFS and Hive tables, and performed Sqoop export to move HDFS data into MySQL server for data visualization in Tableau.
- Collected and aggregated large amounts of log data using Flume and pushed staging data into HDFS for further analysis.
- Performed streaming analytics using Flume and Kafka with Spark Streaming to process the data, saving raw and processed data simultaneously into HDFS using a multiplexing Flume implementation model.
- Created end-to-end Spark applications in Scala to perform data cleansing, validation, transformation and summarization on customer data.
- Developed Scala programs for Apache Spark to handle live data using DStreams to analyse customer payments data.
- Loaded data into Apache Kafka queues, from which it was loaded into HDFS and a relational database for the UI team to display in the web application.
- Developed Hive scripts to transform and aggregate data to meet end-user and analyst requirements for analysis.
- Solved performance issues in Hive and Pig scripts by modifying joins, grouping and aggregation.
- Converted Hive/SQL queries into Spark transformations and actions using Spark RDDs and DataFrames.
- Compared data across data storage locations: from files, to Hive, to traditional relational databases.
- Developed Oozie workflows for scheduling and orchestrating the ETL process.
- Automated the extraction of data from warehouses and weblogs into Hive tables by developing workflows and coordinator jobs in Oozie.
- Monitored the Hadoop clusters with the help of Apache Ambari, as part of the Hortonworks Data Platform.
Environment: Hortonworks (HDP 2.5), HDFS, Apache Spark (V1.6.3), Scala (V2.10.6), Apache Kafka (V0.10.1.0), Apache Flume (V1.6.0), Apache Sqoop (V1.4.6), ZooKeeper (V3.4.9), YARN, Hive (V1.2.1), Oozie, Java, Linux, Apache Ambari (V2.4.2), Oracle Database 10g, SQL Server (V13.0), Apache HBase (V1.1.2), Apache Phoenix, JIRA, Git, UNIX, Tableau, IntelliJ (V2016.3.8).
Confidential, CO
Technical Lead
Responsibilities:
- Designed, developed and maintained data extraction and transformation processes and ensured that data was properly loaded into and extracted from our systems.
- Analysed information needed for coding and implementing procedures and functions.
- Translated business requirements into technical requirements for the development cycle.
- Identified opportunities to automate manual data processes.
- Involved in SQL query tuning and provided tuning recommendations for jobs and time- or CPU-consuming queries.
- Built a complex Enterprise Java ecosystem in collaboration with the development team.
- Coordinated with business users on day-to-day maintenance and root cause analysis to resolve application issues.
- Resolved application and connectivity problems in Java programs.
- Developed UNIX shell scripts to automate the handling of failed orders in production.
- Studied data models to develop new reports.
- Documented and maintained code using change control software.
- Conducted unit testing of reports to facilitate change management and quality assurance.
- Assisted with integration testing and data quality issues.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Used Hive to analyse the partitioned and bucketed data and compute various metrics for reporting.
- Developed Hive (V0.13.1) scripts to transform and aggregate data to meet end-user and analyst requirements for analysis.
- Developed Tableau visualizations and dashboards using Tableau Desktop and provided production support for Tableau users.
- Provided weekly status and monthly metrics reports to stakeholders.
Environment: Oracle SQL, PL/SQL, XML/HTML, Unix shell scripting, Java, Oracle reporting tools, SQL Server, PuTTY, WinSCP, TOAD, Hive (V0.13.1), Cloudera 3, Tableau.
Confidential
Senior Software Engineer
Responsibilities:
- Interacted with the Strategic Development Team and business analysts to analyse business needs and develop technical specifications.
- Tested, cleaned and standardized data to meet business standards using Execute SQL Task, Conditional Split, Data Conversion and Derived Column in different environments.
- Communicated analysis effectively to internal teams and clients.
- Supported data analysis requests from business owners and supported the management team.
- Developed and registered custom interface programs, scheduled to run every night as concurrent requests.
- Developed the required PL/SQL packages and registered them as Concurrent Programs.
- Involved in creating database links for remote database connectivity.
- Participated in unit and integration testing along with end users.
- Monitored deployment activities to QA and Production environments.
- Supported QA, load testing and UAT.
Environment: Oracle SQL, PL/SQL, Unix shell scripting, PuTTY, WinSCP, TOAD.
Confidential
Software Engineer
Responsibilities:
- Developed, tested, debugged and documented Oracle PL/SQL packages and types in accordance with company policies, company standards and industry best practices.
- Involved in creating database objects such as tables, views, procedures, triggers and functions using Oracle SQL to provide definition and structure and to maintain data efficiently.
- Fine-tuned SQL Queries for maximum efficiency and performance.
- Tested coding modifications and assisted with application and system testing to minimize errors and downtime.
- Translated business requirements into technical requirements and delivered application code that is fully tested and meets the business requirements.
- Maintained versioned code in Subversion in accordance with company policies, company standards and industry best practices.
- Designed, coded, tested and debugged enterprise-wide applications.
Environment: Oracle SQL, PL/SQL, Unix shell scripting, PuTTY, WinSCP, SQL Navigator.
