Big Data Engineer Resume
Austin, TX
SUMMARY
- 7+ years of total IT experience in Big Data development and data analysis.
- Experience in the design and development of applications using Hadoop ecosystem components such as HDFS, Hive, Spark, Sqoop, Scala, Kafka, Apache NiFi, HBase, and YARN
- Experience with Hadoop distributions HDP 2.6.x and CDH 5.x
- Experience in developing Spark Streaming applications to consume real-time transactions via Kafka topics
- Experience building applications using Spark Core, Spark SQL, DataFrames, and Spark Streaming
- Experience importing data from RDBMS sources (Oracle, SQL Server) into the Hadoop data lake using Sqoop
- Experience with the job scheduling tool Oozie
- Experienced in AWS services: S3, EC2, RDS, and EMR
- Experience in developing Spark applications using DataFrames and Datasets; transformed data using PySpark and Spark SQL
- Knowledge of NoSQL databases: HBase, MongoDB, and Cassandra
- Experience with real-time messaging systems such as Kafka to ingest streaming data into Hadoop
- Worked with bug-tracking tools such as Remedy and Jira
- Experience developing Spark batch applications to ingest data into a common data lake
- Experience in importing and exporting data using Sqoop from RDBMS to HDFS and vice-versa
- Experience working with Agile and Waterfall methodologies
- Highly motivated and detail-oriented; able to work independently or as part of a team, with excellent networking and communication skills across all levels of stakeholders, including executives, application developers, business users, and customers
TECHNICAL SKILLS
Hadoop/Big Data: HDFS, Hive, MapReduce, Spark, Sqoop, HBase, Kafka, Oozie, NiFi, Impala, Hue, Storm
NoSQL Databases: Spanner, HBase, MapR-DB
Languages: Python, Scala, Core Java, Unix Shell scripts, SQL
Web/Application Server: Apache Tomcat
Databases/ETL: Oracle, DB2, SQL Server, MySQL, DataStage, Teradata
IDEs: Eclipse, IntelliJ
Other Tools & Packages: CAWA, Bitbucket, JUnit, Maven, Ant, GitHub, StreamSets Data Collector, Grafana, Tableau
SDLC Methodologies: Agile, Waterfall
Operating Systems: Linux, UNIX, Windows
Office Tools: MS Office (Word, PowerPoint)
PROFESSIONAL EXPERIENCE
Confidential - Austin, TX
Big Data Engineer
Responsibilities:
- Imported and exported data using Sqoop to move data between Oracle 11g and HDFS on a regular basis.
- Involved in requirements analysis, design, development, and testing of the application.
- Worked on enhancing performance and optimizing existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and Spark RDDs.
- Followed a parameterized approach for schema details, file locations, and delimiter details to make the code efficient and reusable.
- Developed a PySpark application to consume data from Apache Kafka topics and publish it to HDFS and HBase (see the sketch after this list).
- Worked on DStreams (Discretized Streams), RDDs (Resilient Distributed Datasets), DataFrames, and Spark SQL to build the Spark Streaming application.
- Involved in converting Hive queries into Spark transformations using Spark RDDs.
- Used Spark SQL, Spark RDDs, and Spark DataFrames to load JSON data into Hive tables.
- Involved in the data ingestion process through DataStage to load data into HDFS from Mainframes, Teradata, DB2.
- Used Apache NiFi for data ingestion and to load data into Kafka topics.
- Developed ELT workflows using NiFi to load data into Hive and Teradata.
- Used the Python subprocess module to invoke PySpark jobs.
- Developed Spark code in the Spark SQL environment for faster testing and processing of data.
- Loaded data into Spark RDDs and performed in-memory computations to generate the output response.
- Used Hue and Cloudera Manager to monitor Spark jobs.
- Worked on an AWS POC to modernize the streaming pipeline using Kinesis, Lambda, S3, and Redshift.
- Developed and maintained system documentation and runbooks.
- Led end-user training to drive technology adoption among business users.
- Wrote UNIX shell scripts to automate Sqoop jobs.
- Worked on Tableau for reporting on top of Hive.
- Worked in Agile and used Jira to maintain project stories.
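A minimal PySpark sketch of the Kafka-to-HDFS consumer pattern referenced above, shown here with Structured Streaming; the broker, topic, schema, and path names are illustrative placeholders, and the HBase sink is omitted for brevity:

    # Run with the spark-sql-kafka connector package available to spark-submit.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StructField, StringType

    spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

    # Assumed transaction payload; the real message schema differs.
    schema = StructType([
        StructField("txn_id", StringType()),
        StructField("amount", StringType()),
    ])

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder broker
           .option("subscribe", "transactions")                 # placeholder topic
           .load())

    # Parse the JSON value column into typed fields.
    parsed = (raw.select(from_json(col("value").cast("string"), schema).alias("txn"))
                 .select("txn.*"))

    (parsed.writeStream
           .format("parquet")
           .option("path", "hdfs:///data/lake/transactions")        # placeholder HDFS path
           .option("checkpointLocation", "hdfs:///checkpoints/txn")
           .start()
           .awaitTermination())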
Environment: Cloudera Hadoop, HDFS, YARN, Hive, Spark, PySpark, Spark SQL, HBase, Sqoop, MS SQL Server, Oracle, SQL/NoSQL, Linux, Python
Confidential - Austin, TX
Big Data Developer
Responsibilities:
- Teamed up with architects to design the Spark model for a generic ETL framework.
- Implemented Spark with YARN to perform analytics on data in Hive.
- Developed the Extract process using Spark 2.0.
- Created libraries to connect to multiple databases such as DB2, SQL Server, Oracle, MongoDB, PostgreSQL, and HBase, and to invoke the Spark session along with custom UDFs
- Imported data from multiple databases (DB2, SQL Server, Oracle, MongoDB) and files.
- Created DataFrames as result sets for the extracted data.
- Applied filters and developed Spark/MapReduce jobs to process the data.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Implemented the code in Scala, using DataFrames and the Spark SQL API for faster data processing.
- Created multiple case classes to generate the data based on the object model.
- Converted the data into the required JSON structure using Jackson4j to load it into MongoDB, HBase, or PostgreSQL.
- As part of the transformation, read data from MongoDB as JSON and applied explode on the DataFrame to flatten the data (see the sketch after this list).
- Developed code using Scala APIs to compare the performance of Spark with Hive, and shell scripts for the Sqoop jobs.
- Used StructType and struct-of-array types to read the different schemas into DataFrames.
- Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases for large data volumes.
- Worked on Grafana for real-time visualizations.
- Gained good experience with ETL tools such as IBM DataStage and Talend.
- Applied expert knowledge of MongoDB and NoSQL data modeling, tuning, and indexing.
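The framework above was written in Scala; purely as an illustration, the PySpark sketch below shows the explode-based flattening of nested JSON described in the transformation step (file, column, and field names are hypothetical):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import explode, col

    spark = SparkSession.builder.appName("flatten-json").getOrCreate()

    # Hypothetical export from MongoDB: each order document carries an "items" array.
    orders = spark.read.json("orders.json")

    # explode() turns each array element into its own row, flattening the document.
    flat = (orders
            .select(col("order_id"), explode(col("items")).alias("item"))
            .select("order_id", "item.sku", "item.qty"))

    flat.show()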
Environment: Cloudera Hadoop, HDFS, YARN, Hive, Spark, PySpark, Spark SQL, HBase, Sqoop, Kafka, DB2, SQL Server, Oracle, MongoDB, DataStage, PostgreSQL, Linux, Python
Confidential - Minneapolis, MN
Big Data Developer
Responsibilities:
- Involved in requirements analysis, design, development, and testing of the application.
- Configured Kafka Connect JDBC with SAP HANA and MapR Streams for both real-time streaming and batch processing.
- Created MapR Event Streams and Kafka topics.
- Worked on Attunity Replicate to load data from SAP ECC into Apache Kafka topics.
- Developed a Spark Streaming application in Python to stream data from MapR Event Streams and Apache Kafka topics to Hive and MapR-DB, and to stream data from one topic to another within MapR Event Streams (see the sketch after this list).
- Worked on DStreams (Discretized Streams), RDDs (Resilient Distributed Datasets), DataFrames, and Spark SQL to build the Spark Streaming application.
- Involved in creating SQL queries to extract data and perform joins on tables in SAP HANA and MySQL.
- Worked with NoSQL databases like HBase, creating tables to load large sets of semi-structured data coming from source systems.
- Implemented partitioning, dynamic partitions, and bucketing in Hive for efficient data access.
- Used Hue and MapR Control System (MCS) to monitor and troubleshoot Spark jobs.
- Developed Sqoop scripts to move data from MapR-FS to SAP HANA.
- Developed workflows in Oozie to automate loading data into HDFS and pre-processing it with Pig.
- Installed and configured Kafka Connect JDBC on an AWS EC2 instance.
- Created stored procedures in MySQL to improve data handling and ETL Transactions.
- Worked on data validation using Hive and wrote Hive UDFs.
- Managed Linux and Windows virtual servers on AWS EC2.
- Built statistical models on AWS EMR by uploading data to S3 and creating instances on EC2.
- Configured the SAP HANA source connector with SAP HANA as the source and an Apache Kafka topic as the target for real-time streaming and batch processing.
- Provisioned, installed, and configured SAP HANA Enterprise Edition on an AWS EC2 instance.
- Developed a streaming application to stream data from MapR-ES to HBase.
- Streamed data from Apache Kafka topics to the time-series database OpenTSDB.
- Built dashboards and visualizations on top of MapR-DB and Hive using Oracle Data Visualizer Desktop. Built real-time visualizations on top of OpenTSDB using Grafana.
- Wrote UNIX shell scripts to automate ETL processes.
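A simplified PySpark (DStream) sketch of the Kafka-to-Hive streaming flow described above; broker, topic, and table names are placeholders, and the MapR-DB and MapR-ES sinks are omitted:

    from pyspark import SparkContext
    from pyspark.sql import SparkSession
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils   # requires the spark-streaming-kafka package

    sc = SparkContext(appName="kafka-to-hive")
    ssc = StreamingContext(sc, batchDuration=10)      # 10-second micro-batches
    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    stream = KafkaUtils.createDirectStream(
        ssc, ["sap_orders"],                          # placeholder topic
        {"metadata.broker.list": "broker1:9092"})     # placeholder broker

    def save_batch(time, rdd):
        # Each record is a (key, value) pair; keep the JSON value payload.
        if not rdd.isEmpty():
            df = spark.read.json(rdd.map(lambda kv: kv[1]))
            df.write.mode("append").saveAsTable("staging.sap_orders")   # placeholder Hive table

    stream.foreachRDD(save_batch)
    ssc.start()
    ssc.awaitTermination()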
Environment: MapR 6.0, Apache Kafka 1.0.0, Hive 2.1, HBase 1.1.8, Hue, MapR-DB, MapR-FS, Spark 2.1.0, Python, AWS, SAP HANA, Sqoop, Oozie, Pig, IntelliJ, Kafka Connect Framework, DB Visualizer, Oracle Data Visualizer Desktop, StreamSets Data Collector, MapR-ES, MySQL, Git.
Confidential - Medina, OH
Big Data Developer
Responsibilities:
- Worked on enhancing performance and optimizing existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and Spark RDDs.
- Worked on MySQL for identifying required tables and views to export into HDFS.
- Loaded data from MySQL into HDFS on the development cluster for validation and cleansing.
- Created Apache Kafka topics.
- Configured StreamSets Data Collector with Apache Kafka to stream real-time data from different sources (databases and files) into Kafka topics.
- Developed a streaming application using Spark and Python to stream data from Kafka topics to Hive.
- Processed large amounts of structured and semi-structured data using MapReduce programs.
- Worked on real-time and batch processing of data sources using Apache Spark, Elasticsearch, Spark Streaming, and Apache Kafka.
- Created scripts for importing data into HDFS/Hive using Sqoop from DB2.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Spark SQL.
- Worked on migrating Pig scripts and MapReduce programs to the Spark DataFrames API and Spark SQL to improve performance.
- Conducted POCs for real-time streaming of data from MySQL to Hive and HBase.
- Worked on SequenceFiles, RCFiles, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement (see the sketch after this list).
- Handled importing and exporting of large data sets between various data sources and HDFS using Sqoop, and performed transformations using Hive.
- Built dashboards and visualizations on top of Hive using Tableau, published the reports to Tableau Online, and embedded them in the browser via iframes.
- Monitored the Hadoop cluster through Cloudera Manager and implemented alerts based on error messages.
- Loaded data from UNIX file system to HDFS.
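As a sketch of the Hive partitioning work noted above, the PySpark snippet below creates a partitioned table and performs a dynamic-partition insert; database, table, and column names are illustrative, and bucketing and file-format tuning are omitted:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-partitioning")
             .enableHiveSupport()
             .getOrCreate())

    # Allow Hive to derive partition values from the data itself.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    spark.sql("""
        CREATE TABLE IF NOT EXISTS sales_part (
            order_id STRING,
            amount   DOUBLE
        )
        PARTITIONED BY (order_date STRING)
        STORED AS ORC
    """)

    # Dynamic-partition insert: one partition per distinct order_date in the staging table.
    spark.sql("""
        INSERT INTO TABLE sales_part PARTITION (order_date)
        SELECT order_id, amount, order_date
        FROM sales_staging
    """)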
Environment: Cloudera, Apache Kafka, HDFS, Python, Hive, Spark, Spark SQL, Pig, MapReduce, Sqoop, IntelliJ, Tableau, StreamSets Data Collector, UNIX, MySQL, Git.
Confidential
Java Developer
Responsibilities:
- Involved in implementing the design across all key phases of the software development life cycle.
- Involved in design, development and testing of the application.
- Applied object-oriented programming concepts to validate the columns of the import file.
- Used a DOM parser to parse the XML files.
- Implemented a complex back-end component using Java multi-threading to return counts quickly against a large MySQL database (about 40 million rows).
- Worked in Agile development following the Scrum process, with sprints and daily stand-up meetings.
- Developed front-end screens using JSP, HTML, jQuery, JavaScript, and CSS.
- Participated in OOAD, domain modeling, and system architecture.
- Used WinSCP to transfer files from the local system to other systems.
- Wrote test cases for unit testing before the QA release.
- Worked closely with the QA team and coordinated on fixes.
Environment: Java, Core Java, Apache Tomcat, Maven, JavaScript, RESTful Web Services, WebLogic, JBoss, Eclipse IDE, Apache CXF, FTP, HTML, CSS.